Luca created SOLR-12985:
---------------------------

             Summary: ClassNotFound indexing crypted documents
                 Key: SOLR-12985
                 URL: https://issues.apache.org/jira/browse/SOLR-12985
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: contrib - DataImportHandler
    Affects Versions: 7.3.1
            Reporter: Luca

When indexing a BLOB that contains an encrypted Office document (xls or xlsx, but I think all types are affected), the import fails with the exception below. If the document is not encrypted, it works fine. I'm using the DataImportHandler.

The exception also seems to bypass onError=skip and onError=continue, so the whole import fails.

I tried moving the libraries from contrib/extraction/lib/ to server/lib, and the class that cannot be found changes, so this looks like a class-loading issue.

This is the base exception:

Exception while processing: document_index document : SolrInputDocument(fields: [site=187, index_type=document, resource_id=3, title_full=Dati cliente.docx, id=d-XXX-3, publish_date=2018-09-28 00:00:00.0, abstract= Azioni di recupero intraprese sulle Fatture telefoniche, insert_date=2019-09-28 00:00:00.0, type=Documenti, url=http://]):org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:171)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
    at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:364)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
    at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:452)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:485)
    at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.OfficeParser@500efcf1
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)
    ... 10 more
Caused by: java.io.IOException: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
    at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:150)
    at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ... 13 more
Caused by: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:565)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.poi.poifs.crypt.EncryptionInfo.getBuilder(EncryptionInfo.java:222)
    at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:148)
    ... 17 more
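
The class-loading suspicion could be checked with a small diagnostic along the following lines. This is only a sketch and not part of Solr, Tika, or POI: the class name ClassLocator and the method locate are hypothetical, and the only class names used are the two that appear in the trace above. It reports which location and class loader each class resolves from. The result is only meaningful when the code runs with the same classpath and class loader as the failing import.

```java
import java.security.CodeSource;

public class ClassLocator {

    /** Hypothetical helper: reports where a class resolves from, or that it cannot be found. */
    static String locate(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers
            Class<?> c = Class.forName(className, false,
                    Thread.currentThread().getContextClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader have no code source
            String where = (src == null || src.getLocation() == null)
                    ? "bootstrap/JDK"
                    : src.getLocation().toString();
            return className + " -> " + where + " (loader: " + c.getClassLoader() + ")";
        } catch (ClassNotFoundException | LinkageError e) {
            return "NOT FOUND: " + className + " (" + e + ")";
        }
    }

    public static void main(String[] args) {
        // The two POI classes named in the stack trace
        String[] names = {
            "org.apache.poi.poifs.crypt.EncryptionInfo",
            "org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder"
        };
        for (String name : names) {
            System.out.println(locate(name));
        }
    }
}
```

If the two classes resolve from different locations or different class loaders, or only one of them is found, that would be consistent with the class-loading explanation above.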