Hi there,

Recently I've been trying to get enhancements from a PDF file, but the Apache Tika engine fails and logs the following error:

21.06.2016 17:02:56.824 *ERROR* [Thread-12] org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler Unexpected Exception while processing ContentItem <urn:content-item-sha1-11f821c604fcbcbaeca7a1c29909065ac7fedafc> with EnhancementJobManager: class org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl
java.lang.NoClassDefFoundError: Could not initialize class org.apache.pdfbox.pdmodel.PDPage
    at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:212)
    at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:218)
    at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:184)
    at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:212)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
    at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:106)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:143)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.stanbol.enhancer.engines.tika.TikaEngine$1.run(TikaEngine.java:275)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.apache.stanbol.enhancer.engines.tika.TikaEngine.computeEnhancements(TikaEngine.java:256)
    at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.processEvent(EnhancementJobHandler.java:280)
    at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.handleEvent(EnhancementJobHandler.java:198)
    at org.apache.felix.eventadmin.impl.handler.EventHandlerProxy.sendEvent(EventHandlerProxy.java:415)
    at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:118)
    at org.apache.felix.eventadmin.impl.tasks.AsyncDeliverTasks$TaskExecuter.run(AsyncDeliverTasks.java:159)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl Execution of Chain tikaChain failed after 14ms for ContentItem <urn:content-item-sha1-11f821c604fcbcbaeca7a1c29909065ac7fedafc>
21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl finished: true
21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl state: failed
21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl chain: tikaChain
21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl content-item: <urn:content-item-sha1-11f821c604fcbcbaeca7a1c29909065ac7fedafc>
21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl executions:
21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl - tika completed
21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl Error Message: Enhancement Chain failed because of required Engi
    at org.apache.stanbol.enhancer.jersey.resource.AbstractEnhancerResource.enhanceFromData(AbstractEnhancerResource.java:213)
    at sun.reflect.GeneratedMethodAccessor105.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:151)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:171)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:152)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:104)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:406)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:350)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:106)
    at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:259)
    ... 59 common frames omitted
Caused by: java.lang.IllegalStateException: Unexpected Exception while processing ContentItem <urn:content-item-sha1-11f821c604fcbcbaeca7a1c29909065ac7fedafc> with EnhancementJobManager: class org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl
    at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.handleEvent(EnhancementJobHandler.java:204)
    at org.apache.felix.eventadmin.impl.handler.EventHandlerProxy.sendEvent(EventHandlerProxy.java:415)
    at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:118)
    at org.apache.felix.eventadmin.impl.tasks.AsyncDeliverTasks$TaskExecuter.run(AsyncDeliverTasks.java:159)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ... 1 common frames omitted
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.pdfbox.pdmodel.PDPage
    at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:212)
    at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:218)
    at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:184)
    at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:212)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
    at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:106)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:143)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.stanbol.enhancer.engines.tika.TikaEngine$1.run(TikaEngine.java:275)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.apache.stanbol.enhancer.engines.tika.TikaEngine.computeEnhancements(TikaEngine.java:256)
    at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.processEvent(EnhancementJobHandler.java:280)
    at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.handleEvent(EnhancementJobHandler.java:198)
    ... 8 common frames omitted
21.06.2016 17:02:56.825 *WARN* [qtp158698819-2677] org.eclipse.jetty.server.HttpChannel Could not send response error 500: javax.servlet.ServletException: org.glassfish.jersey.server.ContainerException: org.apache.stanbol.enhancer.servicesapi.ChainException: Enhancement Chain failed because of required Engine 'tika' failed with Message: Unable to process ContentItem '<urn:content-item-sha1-11f821c604fcbcbaeca7a1c29909065ac7fedafc>' with Enhancement Engine 'tika' because the engine is currently not active(Reason: Unexpected Exception while processing ContentItem <urn:content-item-sha1-11f821c604fcbcbaeca7a1c29909065ac7fedafc> with EnhancementJobManager: class org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl)!

I think the main problem here is "Could not initialize class org.apache.pdfbox.pdmodel.PDPage". Because of that, the Tika engine fails, and because of that failure, it is reported as not active. The engine works fine with other file formats, such as Office Word documents and Excel spreadsheets.

I'm wondering if anyone else has had this problem. I am using a build checked out from trunk that is around 2 months old, so version 1.0, not 0.12. I am also using a custom launcher, and I can provide details on it if anyone thinks it might be relevant, even though I basically just stripped out the External Engines.

If there is anything else I can provide, just ask.

Best Regards,
Antero Duarte
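
For reference, here is a minimal Java sketch of how the JVM comes to report "Could not initialize class". It assumes the usual mechanism defined by the Java class initialization rules: when a static initializer throws, the first access fails with ExceptionInInitializerError (which carries the real cause), and every later access to the same class fails with NoClassDefFoundError. The class Fragile and the method touch() are hypothetical stand-ins, not PDFBox or Stanbol code. Whether this is what happened to PDPage here is only an assumption; if it is, an earlier ExceptionInInitializerError for PDPage should appear further up in the log.

```java
public class InitFailureDemo {

    // Hypothetical stand-in for a class whose static initializer fails.
    static class Fragile {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final int VALUE = compute();

        static int compute() {
            throw new IllegalStateException("simulated static-init failure");
        }
    }

    // Touches Fragile and returns the simple name of whatever Throwable is raised.
    static String touch() {
        try {
            return "ok:" + Fragile.VALUE;
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // First access: the initializer runs and fails; the root cause is attached.
        System.out.println(touch()); // ExceptionInInitializerError
        // Later access: the class is marked as failed; only the symptom is reported.
        System.out.println(touch()); // NoClassDefFoundError
    }
}
```

If the sketch applies, the stack trace above would be the later symptom, and the first failure logged for PDPage would be the entry worth looking at.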