Re: Text Only Extraction Using Solr and Tika
Hi Emyr,

You could try using the extractOnly=true parameter [1]. Of course, you'll need to repost the extracted text manually.

--jay

[1] http://wiki.apache.org/solr/ExtractingRequestHandler#Extract_Only

On Thu, May 5, 2011 at 9:36 AM, Emyr James emyr.ja...@sussex.ac.uk wrote:

Hi All,

I have solr and tika installed and am happily extracting and indexing various files. Unfortunately on some word documents it blows up since it tries to auto-generate a 'title' field but my title field in the schema is single valued.

Here is my config for the extract handler...

    <requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
      <lst name="defaults">
        <str name="uprefix">ignored_</str>
      </lst>
    </requestHandler>

Is there a config option to make it only extract text, or ideally to allow me to specify which metadata fields to accept? E.g. I'd like to use any author metadata it finds but to not use any title metadata it finds, as I want title to be single valued and set explicitly using a literal.title in the post request.

I did look around for some docs but all I can find are very basic examples. There's no comprehensive configuration documentation out there as far as I can tell.

ALSO...

I get some other bad responses coming back such as...

    Apache Tomcat/6.0.28 - Error report

    HTTP Status 500 - org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;

    java.lang.NoSuchMethodError: org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:168)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:148)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:636)

    type: Status report
    message: org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;

For the above my url was...

    http://localhost:8080/solr/update/extract?literal.id=3922&defaultField=content&fmap.content=content&uprefix=ignored_&stream.contentType=application%2Fvnd.ms-powerpoint&commit=true&literal.title=Reactor+cycle+141&literal.notes=&literal.tag=UCN_production&literal.author=Maurits+van+der+Grinten

I guess there's something special I need to be able to process power point files? Maybe I need to get the latest apache POI?

Any suggestions welcome...

Regards,
Emyr
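For readers following along, here is a minimal Python sketch of how the two request URLs discussed above could be assembled with correct separators and escaping. It only builds strings and does not contact Solr. The helper name build_extract_url is made up for illustration, and the base URL is simply the one from the original post; the parameter names (extractOnly, literal.*, fmap.*, uprefix, defaultField, stream.contentType, commit) are the ones that appear in the thread.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, params):
    """Return base_url plus an encoded query string.

    params is a list of (name, value) pairs so that the order is kept
    and a name may appear more than once.
    """
    return base_url + "?" + urlencode(params)


# Base URL taken from the original post; adjust host, port and path as needed.
BASE = "http://localhost:8080/solr/update/extract"

# Step 1 of the extractOnly approach: extract without indexing.
# Step 2 (posting the returned text to the update handler) is not shown,
# since it needs a running Solr instance.
extract_only_url = build_extract_url(BASE, [("extractOnly", "true")])

# The indexing request from the original post, with explicit "&" separators.
index_url = build_extract_url(BASE, [
    ("literal.id", "3922"),
    ("defaultField", "content"),
    ("fmap.content", "content"),
    ("uprefix", "ignored_"),
    ("stream.contentType", "application/vnd.ms-powerpoint"),
    ("commit", "true"),
    ("literal.title", "Reactor cycle 141"),
    ("literal.notes", ""),
    ("literal.tag", "UCN_production"),
    ("literal.author", "Maurits van der Grinten"),
])

print(extract_only_url)
print(index_url)
```

Building the query from pairs rather than by string concatenation avoids the kind of run-together parameters that are easy to produce by hand.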
Re: Text Only Extraction Using Solr and Tika
Thanks for the suggestion but there surely must be a better way than that to do it? I don't want to post the whole file up, get it extracted on the server, send the extracted text back to the client then send it all back up to the server again as plain text.
Re: Text Only Extraction Using Solr and Tika
Hi Emyr,

You can try the XPath based approach and see if that works. Also, see if dynamic fields can help you for the meta data fields.

References-
http://wiki.apache.org/solr/SchemaXml#Dynamic_fields
http://wiki.apache.org/solr/ExtractingRequestHandler#Input_Parameters
http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput

Regards,
Anuj
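To make the dynamic-field suggestion more concrete, the following Python sketch models, in a deliberately simplified way, how an extracted field name might be resolved when fmap.* mappings, a uprefix and an ignored_* dynamic field are in play. This is an illustrative model based on the documented parameter descriptions, not Solr's actual code; the schema field set, the ignored_* pattern and the function name resolve_field are all assumptions made for the example.

```python
from fnmatch import fnmatchcase


def resolve_field(name, schema_fields, dynamic_patterns, fmap=None, uprefix=""):
    """Simplified model: apply fmap, then prefix names unknown to the schema."""
    mapped = (fmap or {}).get(name, name)
    known = mapped in schema_fields or any(
        fnmatchcase(mapped, pattern) for pattern in dynamic_patterns
    )
    if not known and uprefix:
        mapped = uprefix + mapped
    return mapped


SCHEMA = {"id", "title", "author", "content"}  # hypothetical explicit fields
DYNAMIC = ["ignored_*"]                        # hypothetical dynamicField pattern

# "title" exists in the schema, so uprefix alone does not divert it.
print(resolve_field("title", SCHEMA, DYNAMIC, uprefix="ignored_"))

# With a mapping equivalent to fmap.title=ignored_title it goes to the dynamic field.
print(resolve_field("title", SCHEMA, DYNAMIC, {"title": "ignored_title"}, "ignored_"))

# Metadata the schema does not know is caught by uprefix.
print(resolve_field("Last-Author", SCHEMA, DYNAMIC, uprefix="ignored_"))
```

If the model holds, it would explain why uprefix=ignored_ by itself does not keep extracted titles out of a single-valued title field. Whether an fmap.title mapping also affects a literal.title value may depend on the Solr version, so that is worth testing before relying on it.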
Re: Text Only Extraction Using Solr and Tika
Hi,

I'm not really sure how these can help with my problem. Can you give a bit more info on this?

I think what I'm after is a fairly common request...

http://lucene.472066.n3.nabble.com/Controlling-Tika-s-metadata-td2378677.html
http://lucene.472066.n3.nabble.com/Select-tika-output-for-extract-only-td499059.html#a499062

Did the change that Yonik Seeley mentions, to allow more control over the output, ever make it into 1.4?

Regards,
Emyr
Re: Text Only Extraction Using Solr and Tika
Hey Emyr,

Looking at your stack trace below my guess is that you have two conflicting Apache POI jars in your classpath. The odd stack trace is indicative of that as the class loader is likely loading some other version of the DirectoryNode class that doesn't have the iterator method.

java.lang.NoSuchMethodError: org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;

Thanks,
Paul Ramirez
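One way to check for the conflicting-jar situation described above is to look for every jar that ships the DirectoryNode class. The Python sketch below does this by treating jars as zip archives; the directory paths in the usage section are placeholders, not locations confirmed anywhere in this thread, and the function name jars_containing is made up for illustration.

```python
import zipfile
from pathlib import Path

# Class named in the NoSuchMethodError, as a path inside a jar.
CLASS_ENTRY = "org/apache/poi/poifs/filesystem/DirectoryNode.class"


def jars_containing(directories, class_entry=CLASS_ENTRY):
    """Return sorted paths of jars under the directories that contain class_entry."""
    hits = []
    for directory in directories:
        root = Path(directory)
        if not root.is_dir():
            continue  # silently skip locations that do not exist
        for jar in root.rglob("*.jar"):
            try:
                with zipfile.ZipFile(jar) as archive:
                    if class_entry in archive.namelist():
                        hits.append(str(jar))
            except zipfile.BadZipFile:
                continue  # not a readable archive, ignore it
    return sorted(hits)


if __name__ == "__main__":
    # Placeholder locations; replace with the lib directories of your install.
    search_dirs = ["/path/to/tomcat/webapps/solr/WEB-INF/lib", "/path/to/solr/lib"]
    for path in jars_containing(search_dirs):
        print(path)
```

If more than one jar is listed, that would be consistent with two POI versions being visible to the class loader; which one to remove depends on the Tika version in use.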