[ https://issues.apache.org/jira/browse/TIKA-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284699#comment-15284699 ]
Tim Allison edited comment on TIKA-1970 at 5/17/16 1:42 PM:
------------------------------------------------------------

This looks to be an area for improvement in the underlying James library.

{noformat}
ParsedField parsedField = LenientFieldParser.getParser().parse(
        field, DecodeMonitor.SILENT);
...
DateTimeField dateField = (DateTimeField) parsedField;
{noformat}

{{dateField.getDate()}} is returning {{null}}, which suggests that it can't read a date of format: {{16 May 2016 at 09:30:32 GMT+1}}

We're using the latest version of James, but that dates back to 2012.

Some options:
1) fix this in james (unlikely given lack of activity??)
2) add our own date parser
3) borrow a date parser from someone else (??)
4) find another rfc822 parser that is more actively maintained (??)

See also [xkcd|https://xkcd.com/1179/]


was (Author: talli...@mitre.org):
This looks to be a bug in the underlying James library.

{noformat}
ParsedField parsedField = LenientFieldParser.getParser().parse(
        field, DecodeMonitor.SILENT);
...
DateTimeField dateField = (DateTimeField) parsedField;
{noformat}

{{dateField.getDate()}} is returning {{null}}, which suggests that it can't read a date of format: {{16 May 2016 at 09:30:32 GMT+1}}

We're using the latest version of James, but that dates back to 2012.

Some options:
1) fix this in james (unlikely given lack of activity??)
2) add our own date parser
3) borrow a date parser from someone else (??)
4) find another rfc822 parser that is more actively maintained (??)
See also [xkcd|https://xkcd.com/1179/]

> Date not extracted from email saved as plain txt
> ------------------------------------------------
>
>                 Key: TIKA-1970
>                 URL: https://issues.apache.org/jira/browse/TIKA-1970
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata
>    Affects Versions: 1.14
>         Environment: Debian Linux Jessie
>                      Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
>                      Mac OS X Mail
>            Reporter: Philipp Steinkrueger
>            Priority: Minor
>         Attachments: Testemail-date.eml, Testemail-nodate.txt
>
>
> I have two email testfiles:
> (1) A file that has been created by using "save as" in Mac Mail (this creates a .txt file)
> (2) A file that has been created by dragging an email from Mac Mail to the Desktop (this creates an .eml file)
> If I feed the files with
> curl -T filename http://localhost:9998/detect/stream
> I get the response "message/rfc822" for both files.
> If I run
> curl -T filename http://localhost:9998/meta
> I get the metadata, but in the case of (1) I do not get the DATE extracted, while in case (2) I do.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
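
Option 2 from the comment above (adding our own date parser) might look roughly like the sketch below. It is only an illustration, not code from Tika or James: the class name {{MacMailDateParser}} and its {{parse}} method are hypothetical, and the accepted format is inferred from the single example value {{16 May 2016 at 09:30:32 GMT+1}}. It assumes English three-letter month abbreviations and an optional {{GMT+H}} or {{GMT+H:MM}} offset, and it returns {{null}} for anything else so that a caller could use it as a fallback when the regular header parser gives up.

```java
import java.time.DateTimeException;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical fallback parser for dates such as "16 May 2016 at 09:30:32 GMT+1".
// The format is an assumption based on the one example in this issue.
final class MacMailDateParser {

    // Groups: 1 = date, 2 = time, 3 = offset sign, 4 = offset hours, 5 = offset minutes.
    private static final Pattern FORMAT = Pattern.compile(
            "\\s*(\\d{1,2} \\p{Alpha}{3} \\d{4}) at (\\d{2}:\\d{2}:\\d{2})"
            + " GMT(?:([+-])(\\d{1,2})(?::(\\d{2}))?)?\\s*");

    // English month names are assumed; localized exports would need more work.
    private static final DateTimeFormatter LOCAL_PART =
            DateTimeFormatter.ofPattern("d MMM yyyy HH:mm:ss", Locale.ENGLISH);

    private MacMailDateParser() {
    }

    // Returns the parsed date, or null if the value does not match the assumed format.
    static OffsetDateTime parse(String value) {
        if (value == null) {
            return null;
        }
        Matcher m = FORMAT.matcher(value);
        if (!m.matches()) {
            return null;
        }
        try {
            LocalDateTime local =
                    LocalDateTime.parse(m.group(1) + " " + m.group(2), LOCAL_PART);
            ZoneOffset offset = ZoneOffset.UTC; // plain "GMT" with no offset
            if (m.group(3) != null) {
                int sign = "-".equals(m.group(3)) ? -1 : 1;
                int hours = Integer.parseInt(m.group(4));
                int minutes = m.group(5) == null ? 0 : Integer.parseInt(m.group(5));
                offset = ZoneOffset.ofHoursMinutes(sign * hours, sign * minutes);
            }
            return OffsetDateTime.of(local, offset);
        } catch (DateTimeException e) {
            // Covers invalid calendar values and out-of-range offsets.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("16 May 2016 at 09:30:32 GMT+1"));
    }
}
```

The offset is handled with a regular expression instead of a formatter pattern letter so that the sketch does not depend on how a particular JDK version parses short {{GMT+1}} style offsets.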