[ https://issues.apache.org/jira/browse/TIKA-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781922#comment-13781922 ]

Tim Allison commented on TIKA-1162:
-----------------------------------

Dear Colleague,
  I'm on paternity leave.  Will be back part time on October 14.

   Best,

            Tim



> content-type/charset problem with RFC822Parser
> ----------------------------------------------
>
>                 Key: TIKA-1162
>                 URL: https://issues.apache.org/jira/browse/TIKA-1162
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Maciej Lizewski
>
> RFC822Parser (MIME mail) uses MailContentHandler, which internally uses
> AutoDetectParser to handle each MIME part. The problem is that
> MailContentHandler reads the MIME part headers, correctly sets the
> CONTENT_TYPE and CONTENT_ENCODING metadata, and passes this metadata to
> AutoDetectParser::parse. That method ignores those values and
> overwrites them:
>         MediaType type = this.getDetector().detect(tis, metadata);
>         metadata.set(Metadata.CONTENT_TYPE, type.toString());
> This leads to additional recursion loops (the Detector returns the
> message/rfc822 MIME type instead of the proper MIME type for the current
> part). Eventually, somehow, it exits the loop, but without the proper
> Content-Type and Content-Encoding headers.
> My proposal is to add a check in AutoDetectParser::parse: if the metadata
> already contains CONTENT_TYPE, do not override it. If this is not valid
> behavior in general, then RFC822Parser should use a custom parser in
> MailContentHandler that respects the passed content-type.
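
The check proposed in the quoted description can be sketched as follows. This is a minimal, self-contained Java sketch, not Tika's actual code: so that it runs without Tika on the classpath, it models metadata as a plain Map<String, String> and uses a hypothetical SimpleDetector interface in place of Tika's Detector (which returns a MediaType). The class and method names (ContentTypeRespectingDetect, resolveContentType) are likewise hypothetical. It only illustrates "run detection only when no Content-Type was supplied".

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: plain-Java stand-ins for Tika's Detector and Metadata.
class ContentTypeRespectingDetect {

    static final String CONTENT_TYPE = "Content-Type";

    /** Hypothetical minimal detector; Tika's real Detector returns a MediaType. */
    interface SimpleDetector {
        String detect(InputStream in, Map<String, String> metadata);
    }

    /**
     * Proposed behaviour: keep a Content-Type the caller (e.g. MailContentHandler)
     * already set from the MIME part headers; detect only when none is present.
     */
    static String resolveContentType(SimpleDetector detector, InputStream in,
                                     Map<String, String> metadata) {
        String declared = metadata.get(CONTENT_TYPE);
        if (declared != null && !declared.isEmpty()) {
            return declared; // do not overwrite the part's declared type
        }
        String detected = detector.detect(in, metadata);
        metadata.put(CONTENT_TYPE, detected);
        return detected;
    }

    public static void main(String[] args) {
        // Mimics the reported symptom: every part is detected as message/rfc822.
        SimpleDetector naive = (in, md) -> "message/rfc822";
        InputStream body = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));

        Map<String, String> part = new HashMap<>();
        part.put(CONTENT_TYPE, "text/plain; charset=ISO-8859-2");
        System.out.println(resolveContentType(naive, body, part));      // text/plain; charset=ISO-8859-2

        Map<String, String> unlabeled = new HashMap<>();
        System.out.println(resolveContentType(naive, body, unlabeled)); // message/rfc822
    }
}
```

One trade-off of this approach: a mislabeled MIME part is never re-detected. The reporter's alternative, a custom parser used only inside MailContentHandler, would confine that behaviour to mail parsing.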



--
This message was sent by Atlassian JIRA
(v6.1#6144)
