Hi Antero,

It seems clear that some PDFBox dependencies are not being included as bundles in the launcher you are using. I have just compiled and unit tested a fresh copy of the Tika engine from trunk, and it did not fail; the engine's unit tests do use PDF files. So it could be an old issue that has already been fixed. I have not used the Tika engine much. Did you configure it in some way that could be causing this?
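
One general Java detail that may help while debugging, offered as a sketch and not as a diagnosis of your launcher: "Could not initialize class X" is normally the second error reported for a class. The first attempt to initialize the class throws an ExceptionInInitializerError that carries the real cause, and the JVM then reports NoClassDefFoundError on every later use. So an earlier entry in your log may name what is actually missing. The snippet below reproduces the behaviour with a hypothetical class, Fragile, standing in for PDPage; the class names and the simulated failure are made up for illustration and have nothing to do with the PDFBox sources.

```java
public class InitFailureDemo {

    // Hypothetical stand-in for a class such as PDPage whose static
    // initializer depends on something that is missing at runtime.
    static class Fragile {
        static final Object RESOURCE = load();

        private static Object load() {
            // Simulates a dependency that cannot be resolved.
            throw new IllegalStateException("simulated missing dependency");
        }

        // Calling any static method forces class initialization.
        static void touch() { }
    }

    // Triggers initialization of Fragile and reports what was thrown.
    static String attempt() {
        try {
            Fragile.touch();
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getName() + ": " + t.getMessage();
        }
    }

    // First element: result of the first use; second: result of a later use.
    static String[] probe() {
        return new String[] { attempt(), attempt() };
    }

    public static void main(String[] args) {
        for (String line : probe()) {
            System.out.println(line);
        }
    }
}
```

Running main prints an ExceptionInInitializerError entry for the first use and a NoClassDefFoundError "Could not initialize class ..." entry for the second. If your log follows the same pattern, searching it for the first ExceptionInInitializerError (or an earlier NoClassDefFoundError naming a different class) after the launcher starts may point at the package that is not being provided as a bundle.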

Cheers,
Rafa

On Tue, Jun 21, 2016 at 6:14 PM Antero Duarte <a.fduar...@gmail.com> wrote:

> Hi there,
>
> Recently I've been trying to get enhancements from a PDF file, and the
> Apache tika engine fails and logs the following error:
>
> 21.06.2016 17:02:56.824 *ERROR* [Thread-12]
> org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler
> Unexpected Exception while processing ContentItem
> <urn:content-item-sha1-11f821c604fcbcbaeca7a1c29909065ac7fedafc> with
> EnhancementJobManager: class
> org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl
> java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.pdfbox.pdmodel.PDPage
>     at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:212)
>     at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:218)
>     at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:184)
>     at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:212)
>     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:106)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:143)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.stanbol.enhancer.engines.tika.TikaEngine$1.run(TikaEngine.java:275)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at org.apache.stanbol.enhancer.engines.tika.TikaEngine.computeEnhancements(TikaEngine.java:256)
>     at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.processEvent(EnhancementJobHandler.java:280)
>     at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.handleEvent(EnhancementJobHandler.java:198)
>     at org.apache.felix.eventadmin.impl.handler.EventHandlerProxy.sendEvent(EventHandlerProxy.java:415)
>     at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:118)
>     at org.apache.felix.eventadmin.impl.tasks.AsyncDeliverTasks$TaskExecuter.run(AsyncDeliverTasks.java:159)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677]
> org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl
> Execution of Chain tikaChain failed after 14ms for ContentItem
> <urn:content-item-sha1-11f821c604fcbcbaeca7a1c29909065ac7fedafc>
> 21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677]
> org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl
> finished: true
> 21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677]
> org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl
> state: failed
> 21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677]
> org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl
> chain: tikaChain
> 21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677]
> org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl
> content-item:
> <urn:content-item-sha1-11f821c604fcbcbaeca7a1c29909065ac7fedafc>
> 21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677]
> org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl
> executions:
> 21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677]
> org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl -
> tika completed
> 21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677]
> org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl Error
> Message: Enhancement Chain failed because of required Engi
>     at org.apache.stanbol.enhancer.jersey.resource.AbstractEnhancerResource.enhanceFromData(AbstractEnhancerResource.java:213)
>     at sun.reflect.GeneratedMethodAccessor105.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
>     at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:151)
>     at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:171)
>     at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:152)
>     at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:104)
>     at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:406)
>     at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:350)
>     at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:106)
>     at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:259)
>     ... 59 common frames omitted
> Caused by: java.lang.IllegalStateException: Unexpected Exception while
> processing ContentItem
> <urn:content-item-sha1-11f821c604fcbcbaeca7a1c29909065ac7fedafc> with
> EnhancementJobManager: class
> org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl
>     at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.handleEvent(EnhancementJobHandler.java:204)
>     at org.apache.felix.eventadmin.impl.handler.EventHandlerProxy.sendEvent(EventHandlerProxy.java:415)
>     at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:118)
>     at org.apache.felix.eventadmin.impl.tasks.AsyncDeliverTasks$TaskExecuter.run(AsyncDeliverTasks.java:159)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     ... 1 common frames omitted
> Caused by: java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.pdfbox.pdmodel.PDPage
>     at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:212)
>     at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:218)
>     at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:184)
>     at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:212)
>     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:106)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:143)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.stanbol.enhancer.engines.tika.TikaEngine$1.run(TikaEngine.java:275)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at org.apache.stanbol.enhancer.engines.tika.TikaEngine.computeEnhancements(TikaEngine.java:256)
>     at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.processEvent(EnhancementJobHandler.java:280)
>     at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.handleEvent(EnhancementJobHandler.java:198)
>     ... 8 common frames omitted
> 21.06.2016 17:02:56.825 *WARN* [qtp158698819-2677]
> org.eclipse.jetty.server.HttpChannel Could not send response error 500:
> javax.servlet.ServletException:
> org.glassfish.jersey.server.ContainerException:
> org.apache.stanbol.enhancer.servicesapi.ChainException: Enhancement Chain
> failed because of required Engine 'tika' failed with Message: Unable to
> process ContentItem
> '<urn:content-item-sha1-11f821c604fcbcbaeca7a1c29909065ac7fedafc>' with
> Enhancement Engine 'tika' because the engine is currently not
> active(Reason: Unexpected Exception while processing ContentItem
> <urn:content-item-sha1-11f821c604fcbcbaeca7a1c29909065ac7fedafc> with
> EnhancementJobManager: class
> org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl)!
>
> I think the main problem here is: Could not initialize class
> org.apache.pdfbox.pdmodel.PDPage, then because of that, the tika engine
> fails, and because of that, it is seen as not active.
> Works fine with other file formats, like Office Word Docs, Excel
> Spreadsheets... etc
>
> I'm wondering if anyone else has had this problem. I am using a build
> checked out from trunk and it is around 2 months old, so, V 1.0, not 0.12.
> I am also using a custom launcher and I can provide details on that
> launcher if anyone thinks it might be relevant, even though I basically
> just stripped out the External Engines.
>
> If there is anything else I can provide, just ask.
>
> Best Regards,
> Antero Duarte