[
https://issues.apache.org/jira/browse/SOLR-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687924#comment-16687924
]
Jan Høydahl commented on SOLR-12985:
------------------------------------
I propose that we document this limitation in a general section of the Ref Guide
(if it is not already there; I have not checked) and perhaps in particular in the
Extraction Handler documentation.
Perhaps we should also open a new Jira with the aim of fixing Solr's class
loading. I think Jetty gives our code control over class loading so we perhaps
could make sure we use the same class loader for loading everything, including
plugin jars?
I'm a bit puzzled that POI's own class was not able to load this by reflection,
since both JARs are loaded by the same loader. You can see from the trace that it
looks the class up using WebAppClassLoader:
{code:java}
Caused by: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:565)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.poi.poifs.crypt.EncryptionInfo.getBuilder(EncryptionInfo.java:222)
{code}
Fortunately, this was fixed in POI 4.0.0, which the latest Tika uses. That Tika
version will be included in Solr 8 (master) and will also ship with Solr 7.6.0,
which is scheduled for release next week :) See
[https://github.com/apache/poi/commit/d3b5a0141ed28c19b1afe57a0c4dc7b08937b704]
for the related class loader change in POI; hopefully this does the trick.
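
To illustrate how a by-name lookup can fail even though the class is present, here is a minimal sketch. It is not POI or Solr code: LoaderLookupSketch, PluginBuilder and BlindLoader are hypothetical names, and BlindLoader only simulates a loader that cannot see a plugin class. Whether the pre-4.0.0 POI lookup took exactly this path is an assumption I have not verified; the sketch only shows that the result of a reflective lookup depends on which loader is asked.

```java
import java.util.Collections;
import java.util.Set;

final class LoaderLookupSketch {

    /** Hypothetical stand-in for a class that lives in a plugin jar. */
    static final class PluginBuilder {}

    /** Simulates a loader that cannot see the given class names (assumption, not Jetty's real behaviour). */
    static final class BlindLoader extends ClassLoader {
        private final Set<String> invisible;

        BlindLoader(ClassLoader parent, Set<String> invisible) {
            super(parent);
            this.invisible = invisible;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (invisible.contains(name)) {
                throw new ClassNotFoundException(name);
            }
            return super.loadClass(name, resolve);
        }
    }

    /** Looks the name up through the thread context class loader. */
    static boolean visibleViaContextLoader(String name) {
        try {
            Thread.currentThread().getContextClassLoader().loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Looks the name up through the loader that defined the calling class. */
    static boolean visibleViaOwnLoader(String name) {
        try {
            Class.forName(name, false, LoaderLookupSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = PluginBuilder.class.getName();
        Thread current = Thread.currentThread();
        ClassLoader original = current.getContextClassLoader();
        // Install a context loader that cannot see the "plugin" class.
        current.setContextClassLoader(new BlindLoader(original, Collections.singleton(name)));
        try {
            System.out.println("context loader: " + visibleViaContextLoader(name));
            System.out.println("own loader:     " + visibleViaOwnLoader(name));
        } finally {
            current.setContextClassLoader(original);
        }
    }
}
```

In this sketch the same class name fails through the restricted context loader and resolves through the loader that defined the calling class, which is why making all lookups go through one consistent loader should avoid this kind of ClassNotFoundException.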
> ClassNotFound indexing crypted documents
> ----------------------------------------
>
> Key: SOLR-12985
> URL: https://issues.apache.org/jira/browse/SOLR-12985
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: contrib - DataImportHandler
> Affects Versions: 7.3.1
> Reporter: Luca
> Priority: Critical
> Attachments: crypted.xlsx, db.sql, logs.zip, notcrypted.docx,
> schema.zip
>
>
> When indexing a BLOB containing an encrypted Office document (xls or xlsx, but
> I think all types), it fails with a very bad exception; if the document is not
> encrypted, it works fine.
> I'm using the DataImportHandler.
> The exception also seems to bypass onError=skip or continue, making the
> import fail.
> I tried moving the libraries from contrib/extraction/lib/ to server/lib, and
> the class that cannot be found changes, so it's a class loading issue.
> This is the base exception:
> Exception while processing: document_index document :
> SolrInputDocument(fields: [site=187, index_type=document, resource_id=3,
> title_full=Dati cliente.docx, id=d-XXX-3, publish_date=2018-09-28 00:00:00.0,
> abstract= Azioni di recupero intraprese sulle Fatture telefoniche,
> insert_date=2019-09-28 00:00:00.0, type=Documenti,
> url=http://]):org.apache.solr.handler.dataimport.DataImportHandlerException:
> Unable to read content Processing Document # 1
> at
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
> at
> org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:171)
> at
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
> at
> org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:364)
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
> at
> org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:452)
> at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:485)
> at
> org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal
> IOException from org.apache.tika.parser.microsoft.OfficeParser@500efcf1
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> at
> org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)
> ... 10 more
> Caused by: java.io.IOException: java.lang.ClassNotFoundException:
> org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
> at
> org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:150)
> at
> org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102)
> at
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203)
> at
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ... 13 more
> Caused by: java.lang.ClassNotFoundException:
> org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at
> org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:565)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at
> org.apache.poi.poifs.crypt.EncryptionInfo.getBuilder(EncryptionInfo.java:222)
> at
> org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:148)
> ... 17 more