[ https://issues.apache.org/jira/browse/TIKA-863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210205#comment-13210205 ]

Nick Burch commented on TIKA-863:
---------------------------------

I'm not sure whether we should be setting it on the ParseContext as
AutoDetectParser.class.

Part of me thinks that we should be setting it as Parser.class, if the user
hasn't already set one. However, a parser the user registered under that key
may expect something slightly different, e.g. an embedded document rather than
what is effectively the core document, though you could argue that the MIME
contents are all embedded.
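
A minimal sketch of the "set it as Parser.class only if the user hasn't already set one" option. To stay self-contained it does not use Tika: `TypedContext` is a hypothetical stand-in that mimics the class-keyed get/set of Tika's ParseContext, and `Parser` here is a hypothetical one-method interface standing in for org.apache.tika.parser.Parser. It is not the code from the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.tika.parser.Parser.
interface Parser {
    String name();
}

// Hypothetical stand-in for Tika's ParseContext: values are keyed by class.
class TypedContext {
    private final Map<Class<?>, Object> values = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        values.put(key, value);
    }

    <T> T get(Class<T> key) {
        return key.cast(values.get(key)); // cast(null) simply returns null
    }
}

class ParserFallback {
    // Register the default parser under Parser.class only when the caller
    // has not already chosen one; an existing choice is left untouched.
    static Parser parserFor(TypedContext context, Parser defaultParser) {
        Parser existing = context.get(Parser.class);
        if (existing != null) {
            return existing;
        }
        context.set(Parser.class, defaultParser);
        return defaultParser;
    }
}

class ParserFallbackDemo {
    public static void main(String[] args) {
        Parser fallback = () -> "default";

        TypedContext empty = new TypedContext();
        System.out.println(ParserFallback.parserFor(empty, fallback).name());

        TypedContext chosen = new TypedContext();
        chosen.set(Parser.class, () -> "user's parser");
        System.out.println(ParserFallback.parserFor(chosen, fallback).name());
    }
}
```

Keying on Parser.class means whatever the user registered always wins; whether that parser copes with being handed the core email rather than an embedded document is exactly the open question, and the sketch does not answer it.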

The second worry is attaching additional objects to the ParseContext
automatically. A user may choose not to set a recursing parser on the
ParseContext because they don't want embedded resources. If we add one, and
the email contains a Word document with an embedded Excel spreadsheet, then
suddenly, because we added a Parser to the context, recursion will trigger.
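
To make the second worry concrete, here is a toy model, assuming (as described above) that embedded resources are only recursed into when a parser is present on the context. Everything in it is hypothetical: `Doc`, `RecursingParser`, `EmbeddedWalker` and `MailSketch` are illustrative names, the context is a plain class-keyed map rather than Tika's ParseContext, and the file names are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical document tree: a node with a name and its embedded children.
class Doc {
    final String name;
    final List<Doc> embedded = new ArrayList<>();

    Doc(String name, Doc... children) {
        this.name = name;
        for (Doc child : children) {
            embedded.add(child);
        }
    }
}

// Hypothetical stand-in for a recursing parser stored on the context.
interface RecursingParser {
    void parse(Doc doc, Map<Class<?>, Object> context, List<String> seen);
}

class EmbeddedWalker implements RecursingParser {
    @Override
    public void parse(Doc doc, Map<Class<?>, Object> context, List<String> seen) {
        seen.add(doc.name);
        // Embedded children are only visited if a parser sits on the context.
        RecursingParser next = (RecursingParser) context.get(RecursingParser.class);
        if (next != null) {
            for (Doc child : doc.embedded) {
                next.parse(child, context, seen);
            }
        }
    }
}

class MailSketch {
    // The mail parser always handles its own MIME parts, but passes the
    // caller's context through unchanged.
    static List<String> parseEmail(Doc email, Map<Class<?>, Object> context) {
        List<String> seen = new ArrayList<>();
        seen.add(email.name);
        RecursingParser internal = new EmbeddedWalker();
        for (Doc part : email.embedded) {
            internal.parse(part, context, seen);
        }
        return seen;
    }
}

class EmbeddedRecursionDemo {
    public static void main(String[] args) {
        Doc email = new Doc("email", new Doc("report.doc", new Doc("figures.xls")));

        // The user left the context empty: prints [email, report.doc]
        System.out.println(MailSketch.parseEmail(email, new HashMap<>()));

        // A parser was added to the context: prints [email, report.doc, figures.xls]
        Map<Class<?>, Object> context = new HashMap<>();
        context.put(RecursingParser.class, new EmbeddedWalker());
        System.out.println(MailSketch.parseEmail(email, context));
    }
}
```

The only difference between the two runs is the object on the context, which is why adding one behind the user's back changes what gets parsed.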
                
> MailContentHandler should not create AutoDetectParser on each call
> ------------------------------------------------------------------
>
>                 Key: TIKA-863
>                 URL: https://issues.apache.org/jira/browse/TIKA-863
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.1
>            Reporter: Andrzej Bialecki 
>         Attachments: TIKA-863.patch
>
>
> MailContentHandler is called from RFC822Parser, and it creates an
> AutoDetectParser on each call to parse(...). Creating an AutoDetectParser
> involves reading TikaConfig (not cached), which in turn involves parsing
> XML config files. Besides being wasteful and heavy, in a highly concurrent
> setup this leads to multiple threads blocking on SAX parser creation.
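
The cost difference the report describes can be sketched without Tika. `HeavyParser` is a hypothetical stand-in for AutoDetectParser whose constructor counts how often it runs (think "load and parse the TikaConfig XML"); `PerCallHandler` mirrors the behaviour in the report and `SharedHandler` builds the parser once. The sketch assumes the parser is safe to share between threads, which should be checked for the real class.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for AutoDetectParser: construction is the expensive
// step, so we count how many times it happens.
class HeavyParser {
    static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

    HeavyParser() {
        CONSTRUCTIONS.incrementAndGet();
    }

    String parse(String input) {
        return input.toUpperCase();
    }
}

class PerCallHandler {
    // The pattern described in the issue: a fresh parser on every call.
    String handle(String part) {
        return new HeavyParser().parse(part);
    }
}

class SharedHandler {
    // Holder idiom: the JVM initialises Holder lazily and exactly once,
    // so concurrent callers never build a second instance.
    private static final class Holder {
        static final HeavyParser INSTANCE = new HeavyParser();
    }

    String handle(String part) {
        return Holder.INSTANCE.parse(part);
    }
}

class ParserReuseDemo {
    public static void main(String[] args) {
        PerCallHandler perCall = new PerCallHandler();
        SharedHandler shared = new SharedHandler();

        int start = HeavyParser.CONSTRUCTIONS.get();
        for (int i = 0; i < 100; i++) {
            perCall.handle("part " + i);
        }
        int afterPerCall = HeavyParser.CONSTRUCTIONS.get();
        for (int i = 0; i < 100; i++) {
            shared.handle("part " + i);
        }
        int afterShared = HeavyParser.CONSTRUCTIONS.get();

        System.out.println("per-call constructions: " + (afterPerCall - start));
        System.out.println("shared constructions: " + (afterShared - afterPerCall));
    }
}
```

Passing one parser into the handler through its constructor would work just as well; the holder idiom is used here only to keep the sketch short.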


        
