[ 
https://issues.apache.org/jira/browse/TIKA-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284699#comment-15284699
 ] 

Tim Allison edited comment on TIKA-1970 at 5/17/16 1:42 PM:
------------------------------------------------------------

This looks to be an area for improvement in the underlying James library.

{noformat}
ParsedField parsedField = LenientFieldParser.getParser().parse(
        field, DecodeMonitor.SILENT);
...
DateTimeField dateField = (DateTimeField) parsedField;
{noformat}

{{dateField.getDate()}} returns {{null}}, which suggests that James can't 
parse a date in this format: {{16 May 2016 at 09:30:32  GMT+1}}

We're using the latest version of James, but that dates back to 2012.

Some options:
1) fix this in James (unlikely, given the lack of activity??)
2) add our own date parser
3) borrow a date parser from someone else (??)
4) find another RFC 822 parser that is more actively maintained (??)
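
For option 2, here is a rough sketch of what a fallback parser might look like. It only targets the one format quoted above. The class name {{FallbackMailDateParser}} and its {{parse}} method are hypothetical and not part of Tika or James, and it assumes that {{GMT+1}} means one hour ahead of UTC and that month names are English three-letter abbreviations. The idea would be to call it only when {{getDate()}} returns {{null}}.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika or James.
final class FallbackMailDateParser {

    // Group 1: local date and time, e.g. "16 May 2016 at 09:30:32"
    // Group 2: signed offset hours, e.g. "+1"
    // Group 3: optional offset minutes, e.g. "30" in "GMT+5:30"
    private static final Pattern MAC_MAIL_DATE = Pattern.compile(
            "\\s*(\\d{1,2} \\p{Alpha}{3} \\d{4} at \\d{1,2}:\\d{2}:\\d{2})"
            + "\\s+GMT([+-]\\d{1,2})(?::?(\\d{2}))?\\s*");

    // Assumption: English month abbreviations ("May", "Jun", ...).
    private static final DateTimeFormatter LOCAL_PART =
            DateTimeFormatter.ofPattern("d MMM uuuu 'at' H:mm:ss", Locale.ENGLISH);

    private FallbackMailDateParser() {
    }

    // Returns the parsed instant, or empty if the value is not in the expected format.
    static Optional<Instant> parse(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        Matcher m = MAC_MAIL_DATE.matcher(raw);
        if (!m.matches()) {
            return Optional.empty();
        }
        try {
            LocalDateTime local = LocalDateTime.parse(m.group(1), LOCAL_PART);
            int hours = Integer.parseInt(m.group(2));
            int minutes = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
            // ZoneOffset requires hours and minutes to carry the same sign.
            if (m.group(2).startsWith("-")) {
                minutes = -minutes;
            }
            // Assumption: "GMT+1" means one hour ahead of UTC.
            ZoneOffset offset = ZoneOffset.ofHoursMinutes(hours, minutes);
            return Optional.of(local.atOffset(offset).toInstant());
        } catch (DateTimeException e) {
            // Unknown month name or out-of-range field: treat as unparseable.
            return Optional.empty();
        }
    }
}
```

Returning an empty {{Optional}} rather than throwing keeps the behaviour close to what we have now, where an unreadable date simply ends up missing from the metadata.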

See also [xkcd|https://xkcd.com/1179/]


was (Author: talli...@mitre.org):
This looks to be a bug in the underlying James library.

> Date not extracted from email saved as plain txt
> ------------------------------------------------
>
>                 Key: TIKA-1970
>                 URL: https://issues.apache.org/jira/browse/TIKA-1970
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata
>    Affects Versions: 1.14
>         Environment: Debian Linux Jessie
> Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
> Mac OS X Mail
>            Reporter: Philipp Steinkrueger
>            Priority: Minor
>         Attachments: Testemail-date.eml, Testemail-nodate.txt
>
>
> I have two email testfiles:
> (1) A file that has been created by using "save as" in Mac Mail (this creates 
> a .txt file)
> (2) A file that has been created by dragging an email from Mac Mail to the 
> Desktop (this creates an .eml file)
> If I feed the files with
> curl -T filename http://localhost:9998/detect/stream
> I get the response "message/rfc822" for both files.
> If I run
> curl -T filename http://localhost:9998/meta
> I get the metadata, but in the case of (1) I do not get the DATE extracted, 
> while in case (2) I do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
