[ 
https://issues.apache.org/jira/browse/TIKA-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177574#comment-13177574
 ] 

Nick Burch commented on TIKA-835:
---------------------------------

winmail.dat is a TNEF file, which POI supports through HMEF. All the sample 
files we have in POI open fine, but it appears that the files we have don't 
cover all the possible cases... (TNEF is partly documented, but not fully)

If you're able to help look into it, then either the d...@poi.apache.org 
mailing list or a new bug in the POI bugzilla are the appropriate places to do 
this (it's a POI issue not a Tika one)
                
> TNEF parsing unstable
> ---------------------
>
>                 Key: TIKA-835
>                 URL: https://issues.apache.org/jira/browse/TIKA-835
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>         Environment: CentOS 4.x/5.x/6.x 
> Java 6
>            Reporter: Rob Tulloh
>
> We are seeing problems in Solr with tika throwing exceptions. Sometimes we 
> see OOM like this:
> {noformat}
> SEVERE: java.lang.OutOfMemoryError: Java heap space
>         at 
> org.apache.poi.hmef.attribute.TNEFAttribute.<init>(TNEFAttribute.java:50)
>         at 
> org.apache.poi.hmef.attribute.TNEFAttribute.create(TNEFAttribute.java:76)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:74)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>         at org.apache.poi.hmef.HMEFMessage.<init>(HMEFMessage.java:63)
>         at 
> org.apache.tika.parser.microsoft.TNEFParser.parse(TNEFParser.java:79)
>         at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>         at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>         at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:129)
>         at 
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:195)
>         at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
>         at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>         at 
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1478)
>         at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
>         at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
>         at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>         at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>         at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>         at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>         at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>         at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>         at 
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>         at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> {noformat}
> Other times, we see errors like this one:
> {noformat}
> Caused by: org.apache.poi.util.LittleEndian$BufferUnderrunException: buffer 
> underrun
>         at org.apache.poi.util.LittleEndian.readUShort(LittleEndian.java:302)
>         at 
> org.apache.poi.hmef.attribute.TNEFAttribute.<init>(TNEFAttribute.java:53)
>         at 
> org.apache.poi.hmef.attribute.TNEFAttribute.create(TNEFAttribute.java:76)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:74)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>         at org.apache.poi.hmef.HMEFMessage.<init>(HMEFMessage.java:63)
>         at 
> org.apache.tika.parser.microsoft.TNEFParser.parse(TNEFParser.java:79)
>         at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>         ... 26 more
> {noformat}
> I am able to reproduce these failures with tika-app-1.0.jar. I am not able to 
> share the content as the content is proprietary in nature. The OOM error is 
> particularly problematic as it crashes Solr and causes our document indexing 
> pipeline to get congested while it waits for Solr to restart. Please see also 
> Solr ticket https://issues.apache.org/jira/browse/SOLR-2990 as this contains 
> the original posting of the problem and some details of our environment where 
> the tests are being performed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to