[ https://issues.apache.org/jira/browse/TIKA-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532300#comment-13532300 ]
David Morana commented on TIKA-1043:
------------------------------------

OK, I looked at a lot of my ppt files and the parser was functioning normally for ppts from as early as 1998! I also discovered that my crawler was corrupted. So I rebuilt it and ran the crawl overnight and, thankfully, there were no Tika errors! In fact, there were no errors at all.

Please close the ticket. If the errors come back, I'll open a new ticket. Sorry for any confusion.

P.S. Currently, I only have the Tomcat logs with the error. Would the Tomcat logs be sufficient, or will you need the stack trace from jstack.exe?

Thanks,

> Tika parser v1.2 fails on legacy power point documents
> ------------------------------------------------------
>
>                 Key: TIKA-1043
>                 URL: https://issues.apache.org/jira/browse/TIKA-1043
>             Project: Tika
>          Issue Type: Bug
>   Affects Versions: 1.2
>         Environment: Solr 4.0 on Tomcat 7 with manifoldcf v1.1 dev
>            Reporter: David Morana
>             Fix For: 1.2
>
>
> I can't index "older" powerpoint documents
> I did some research and the current "fix" is to open the legacy ppt and save
> it as a newer version of PowerPoint.
> I have over 3000 ppt docs in my development environment alone so that's not
> an option.
> Here's the error in solr:
> SEVERE: null:org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@e86b202
>       at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:215)
>       at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>       at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
>       at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
>       at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:244)
>       at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>       at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
>       at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
>       at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
>       at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
>       at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:541)
>       at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>       at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:383)
>       at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:243)
>       at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188)
>       at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:166)
>       at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:288)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>       at java.lang.Thread.run(Thread.java:722)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
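
The trace quoted in the ticket contains only Solr, Tomcat, and JDK frames and no "Caused by:" section, so the underlying parser failure is not visible in it. If the errors return, one way to check whether the Tomcat logs are sufficient is to pull out every log entry that mentions TikaException and see whether a root cause was logged. The following is a minimal sketch, not part of Tika or Solr; the function names are hypothetical, and it assumes the log uses the standard Java stack trace layout ("at ..." frames, "Caused by:" lines, "... N more").

```python
import re

# Lines that continue a Java stack trace: frames, chained causes, elided frames.
# Assumption: the log follows the standard Throwable.printStackTrace layout.
_CONTINUATION = re.compile(r"^\s*(at\s|Caused by:|\.\.\. \d+ more)")


def extract_exception_blocks(log_text, needle="TikaException"):
    """Return log entries (header line plus its stack trace) that mention `needle`."""
    blocks, current = [], []
    for line in log_text.splitlines():
        if _CONTINUATION.match(line) and current:
            # Stack trace line: it belongs to the entry being collected.
            current.append(line)
        else:
            # Any other line starts a new entry.
            if current:
                blocks.append("\n".join(current))
            current = [line]
    if current:
        blocks.append("\n".join(current))
    return [block for block in blocks if needle in block]


def root_cause(block):
    """Return the last 'Caused by:' line of an entry, or None if none was logged."""
    causes = [
        line.strip()
        for line in block.splitlines()
        if line.lstrip().startswith("Caused by:")
    ]
    return causes[-1] if causes else None
```

To use it, read the Tomcat log file into a string, pass it to `extract_exception_blocks`, and call `root_cause` on each returned entry. An entry for which `root_cause` returns `None` has the same limitation as the trace above, and in that case the log alone probably does not identify the failing parser code.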