[ https://issues.apache.org/jira/browse/TIKA-863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210205#comment-13210205 ]
Nick Burch commented on TIKA-863:
---------------------------------

I'm not sure whether we should be setting it as AutoDetectParser.class on the context. Part of me thinks that we should set it as Parser.class, if the user hasn't already set one. However, the parser set under that key may be expecting something slightly different, e.g. an embedded document rather than what is effectively the core document, though you could argue that the MIME contents are all embedded.

The second worry is with attaching additional objects to the ParseContext automatically. A user may choose not to set a recursing parser on the ParseContext because they don't want embedded resources. If we add one, and the email contains a Word document with an embedded Excel spreadsheet, then recursion will suddenly trigger because we added a Parser to the context.

> MailContentHandler should not create AutoDetectParser on each call
> ------------------------------------------------------------------
>
>                 Key: TIKA-863
>                 URL: https://issues.apache.org/jira/browse/TIKA-863
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.1
>            Reporter: Andrzej Bialecki
>         Attachments: TIKA-863.patch
>
>
> MailContentHandler is called from RFC822Parser, and it creates
> AutoDetectParser on each call to parse(...). The process of creating
> AutoDetectParser involves reading TikaConfig (not cached), which in turn
> involves parsing XML config files. Apart from the fact that this process is
> wasteful and heavy, in addition in a highly concurrent setup it leads to
> multiple threads blocking on SAX parser creation.
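
To make the two concerns concrete, here is a rough sketch of one possible shape for the lookup: use whatever parser the caller put on the context, otherwise fall back to a single shared default that is built once, and never write the fallback back into the context. This is not the attached patch and does not use Tika's classes. Parser, Context, resolve and BUILDS below are hypothetical stand-ins, with Context only loosely modelled on a type-keyed ParseContext, so that the example runs on its own. It also leaves open the question of whether a fallback should be applied to MIME parts at all.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class ParserLookupSketch {

    /** Hypothetical stand-in for a parser; not Tika's Parser interface. */
    interface Parser {
        String name();
    }

    /** Hypothetical stand-in for a type-keyed context such as ParseContext. */
    static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key) {
            return key.cast(entries.get(key)); // null when nothing was set
        }
    }

    /** Counts how often the (pretend) expensive default parser is built. */
    static final AtomicInteger BUILDS = new AtomicInteger();

    /** Holder idiom: the default is created once, on first use, thread-safely. */
    private static final class DefaultHolder {
        static final Parser INSTANCE = buildExpensiveDefault();
    }

    private static Parser buildExpensiveDefault() {
        BUILDS.incrementAndGet(); // stands in for config loading and XML parsing
        return () -> "shared-default";
    }

    /** Prefers the caller's parser; the context itself is never modified. */
    static Parser resolve(Context context) {
        Parser fromCaller = context.get(Parser.class);
        if (fromCaller != null) {
            return fromCaller;
        }
        return DefaultHolder.INSTANCE;
    }

    public static void main(String[] args) {
        Context empty = new Context();
        Parser first = resolve(empty);
        Parser second = resolve(empty);
        System.out.println("same instance: " + (first == second));
        System.out.println("default built " + BUILDS.get() + " time(s)");
        System.out.println("context untouched: " + (empty.get(Parser.class) == null));

        Context custom = new Context();
        Parser userParser = () -> "user-supplied";
        custom.set(Parser.class, userParser);
        System.out.println("resolved: " + resolve(custom).name());
    }
}
```

Because resolve only reads from the context, a caller who deliberately left Parser.class unset still sees it unset afterwards, so nothing further down the chain starts recursing into embedded resources as a side effect. The shared default means the expensive construction happens once rather than on every parse(...) call.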