[ https://issues.apache.org/jira/browse/TIKA-863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki updated TIKA-863:
-----------------------------------

    Attachment: TIKA-863.patch

Patch to address this issue. The AutoDetectParser instance is now cached in ParseContext. All tests pass with this patch. An application that I'm testing with this patch experienced a 12x speedup in mail parsing.

> MailContentHandler should not create AutoDetectParser on each call
> ------------------------------------------------------------------
>
>                 Key: TIKA-863
>                 URL: https://issues.apache.org/jira/browse/TIKA-863
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.1
>            Reporter: Andrzej Bialecki
>         Attachments: TIKA-863.patch
>
>
> MailContentHandler is called from RFC822Parser, and it creates an
> AutoDetectParser on each call to parse(...). The process of creating
> AutoDetectParser involves reading TikaConfig (not cached), which in turn
> involves parsing XML config files. Besides being wasteful and heavy, in a
> highly concurrent setup this process leads to multiple threads blocking
> on SAX parser creation.
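
The patch itself is not reproduced in this message. As a rough illustration of the caching idea described above (build the expensive parser once, store it in the context, and reuse it on later calls), here is a minimal sketch in plain Java. ExpensiveParser and Context are hypothetical stand-ins for AutoDetectParser and ParseContext; they are not Tika classes, and parserFor is an invented helper name. The sketch assumes one context per thread and is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch only; none of these classes belong to Tika. */
final class ParserCacheSketch {

    /** Hypothetical stand-in for a parser that is costly to construct. */
    static final class ExpensiveParser {
        /** Counts constructions so the effect of caching can be observed. */
        static final AtomicInteger CONSTRUCTED = new AtomicInteger();

        ExpensiveParser() {
            // A real parser would load and parse XML configuration here.
            CONSTRUCTED.incrementAndGet();
        }
    }

    /** Hypothetical stand-in for a per-parse context keyed by class. */
    static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key) {
            // Class.cast returns null when no entry is stored for the key.
            return key.cast(entries.get(key));
        }
    }

    /** Returns the parser cached in the context, creating it on first use only. */
    static ExpensiveParser parserFor(Context context) {
        ExpensiveParser parser = context.get(ExpensiveParser.class);
        if (parser == null) {
            parser = new ExpensiveParser();
            context.set(ExpensiveParser.class, parser);
        }
        return parser;
    }

    public static void main(String[] args) {
        Context context = new Context();
        // Simulate many parse(...) calls that share one context.
        for (int i = 0; i < 1000; i++) {
            parserFor(context);
        }
        System.out.println("parsers constructed: " + ExpensiveParser.CONSTRUCTED.get());
    }
}
```

With a shared context, the 1000 simulated calls in main construct the parser only once; without the cache lookup in parserFor, each call would pay the construction cost again.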