[ https://issues.apache.org/jira/browse/TIKA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638883#comment-13638883 ]
Niels Beekman commented on TIKA-1111:
-------------------------------------

This was not enough however for the parser, as it was missing an import for the org.w3c.dom package:

java.lang.NoClassDefFoundError
    at org.apache.xmlbeans.XmlBeans.class$(XmlBeans.java:43)
    at org.apache.xmlbeans.XmlBeans.buildNodeMethod(XmlBeans.java:195)
    at org.apache.xmlbeans.XmlBeans.buildNodeToCursorMethod(XmlBeans.java:232)
    at org.apache.xmlbeans.XmlBeans.<clinit>(XmlBeans.java:131)
    at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Unknown Source)
    at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:134)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
    at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:116)
    at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:53)
    at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:180)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:87)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.parser.ParsingReader$ParsingTask.run(ParsingReader.java:221)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: org.w3c.dom.Node
    at org.apache.felix.framework.ModuleImpl.findClassOrResourceByDelegation(ModuleImpl.java:814)
    at org.apache.felix.framework.ModuleImpl.access$100(ModuleImpl.java:61)
    at org.apache.felix.framework.ModuleImpl$ModuleClassLoader.loadClass(ModuleImpl.java:1733)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:169)
    ... 18 more

This appears to be the same issue as TIKA-1086

> Class loading issues when running in OSGi environment
> -----------------------------------------------------
>
>                 Key: TIKA-1111
>                 URL: https://issues.apache.org/jira/browse/TIKA-1111
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.3
>         Environment: Tika 1.3 (tika-core and tika-bundle OSGi bundles)
> Felix 2.0.5
>            Reporter: Niels Beekman
>
> When dom4j is on the system classpath, a class loading error occurs during
> detection of Office Open XML files:
> java.lang.ExceptionInInitializerError
>     at org.apache.poi.openxml4j.opc.internal.unmarshallers.PackagePropertiesUnmarshaller.<clinit>(PackagePropertiesUnmarshaller.java:49)
>     at org.apache.poi.openxml4j.opc.OPCPackage.init(OPCPackage.java:154)
>     at org.apache.poi.openxml4j.opc.OPCPackage.<init>(OPCPackage.java:141)
>     at org.apache.poi.openxml4j.opc.Package.<init>(Package.java:54)
>     at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:99)
>     at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:207)
>     at org.apache.tika.parser.pkg.ZipContainerDetector.detectOfficeOpenXML(ZipContainerDetector.java:194)
>     at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:134)
>     at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:77)
>     at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
>     at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
>     at org.apache.tika.parser.ParsingReader$ParsingTask.run(ParsingReader.java:221)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.ClassCastException: org.dom4j.DocumentFactory cannot be cast to org.dom4j.DocumentFactory
>     at org.dom4j.DocumentFactory.getInstance(DocumentFactory.java:97)
>     at org.dom4j.tree.AbstractNode.<clinit>(AbstractNode.java:39)
>     ... 14 more
> As a workaround (maybe a solution), I modified the context classloader when
> running the detection (wrapped the detector and parser). This appears to be
> the common fix for dom4j, as it uses the context classloader during
> initialization. Ideally, the detectors and parsers would be running with
> their original loader (from ServiceLoader) as context class loader.
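
For reference, the context class loader workaround described in the issue could look roughly like the sketch below. This is only an illustration under assumptions, not the code actually used: the class name ContextClassLoaderSwap and the method callWith are hypothetical and not part of Tika. It sets the thread's context class loader for the duration of one call and restores the previous loader afterwards, even if the call throws.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical helper (not part of Tika): runs a task with a given
 * context class loader and always restores the previous one.
 */
final class ContextClassLoaderSwap {

    private ContextClassLoaderSwap() {
        // static helper only
    }

    static <T> T callWith(ClassLoader loader, Callable<T> task) throws Exception {
        Thread current = Thread.currentThread();
        // Remember the loader that was active before the call.
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            // Code that consults the context class loader during
            // initialization (as dom4j does, per the issue) now sees 'loader'.
            return task.call();
        } finally {
            // Restore the original loader on success and on failure.
            current.setContextClassLoader(previous);
        }
    }
}
```

In a wrapper around a detector or parser, the task would be the delegated detect or parse call, and the loader would presumably be the one that loaded the wrapped instance, for example the result of getClass().getClassLoader() on it. Which loader is correct depends on how the bundle is wired, so treat that choice as an assumption.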