Cheers Rafa,

I just figured out the problem, thanks for your pointers. It was indeed a configuration issue in the custom launcher, but not an obvious one. All I changed to make it work was to include the contenthub module again, which I had previously removed because I wasn't using it. Could the PDF parser be a dependency of the contenthub instead of a direct dependency of the tika bundle? If that is the case, I don't think it makes much sense, since you can use tika with the enhancer and not necessarily with the contenthub.
But yeah, adding that back and rebuilding Stanbol and the custom launcher with mvn clean install -U fixed it. I did update Stanbol before rebuilding, so something else might have changed, but I'm assuming that commenting out the contenthub module is what broke it in the first place.

Just for the record, I had commented out the contenthub entry in stanbol/launchers/bundlelists/pom.xml

Anyway, it's fixed, thanks for your answer.

Regards,
Antero

On Wed, 22 Jun 2016 at 10:00 Rafa Haro <rh...@apache.org> wrote:

> Hi Antero,
>
> It seems clear that there are some PDFBox dependencies that are not being
> included as bundles in the specific launcher that you are using. I have
> just compiled and unit tested a fresh copy (from the trunk) of the tika
> engine and it hasn't failed. PDF files are used in the engine unit tests.
> So, it could be an old issue already solved. I have not used tika engine
> too much, did you configure it in some way that could be causing this?
>
> Cheers,
> Rafa
>
> On Tue, Jun 21, 2016 at 6:14 PM Antero Duarte <a.fduar...@gmail.com> wrote:
>
> > Hi there,
> >
> > Recently I've been trying to get enhancements from a PDF file, and the
> > Apache tika engine fails and logs the following error:
> >
> > 21.06.2016 17:02:56.824 *ERROR* [Thread-12] org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler Unexpected Exception while processing ContentItem <urn:content-item-sha1-11f821c604fcbcbaeca7a1c29909065ac7fedafc> with EnhancementJobManager: class org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl
> > java.lang.NoClassDefFoundError: Could not initialize class org.apache.pdfbox.pdmodel.PDPage
> >     at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:212)
> >     at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:218)
> >     at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:184)
> >     at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:212)
> >     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
> >     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:106)
> >     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:143)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> >     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> >     at org.apache.stanbol.enhancer.engines.tika.TikaEngine$1.run(TikaEngine.java:275)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at org.apache.stanbol.enhancer.engines.tika.TikaEngine.computeEnhancements(TikaEngine.java:256)
> >     at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.processEvent(EnhancementJobHandler.java:280)
> >     at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.handleEvent(EnhancementJobHandler.java:198)
> >     at org.apache.felix.eventadmin.impl.handler.EventHandlerProxy.sendEvent(EventHandlerProxy.java:415)
> >     at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:118)
> >     at org.apache.felix.eventadmin.impl.tasks.AsyncDeliverTasks$TaskExecuter.run(AsyncDeliverTasks.java:159)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >     at java.lang.Thread.run(Thread.java:745)
> > 21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl Execution of Chain tikaChain failed after 14ms for ContentItem <urn:content-item-sha1-11f821c604fcbcbaeca7a1c29909065ac7fedafc>
> > 21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl finished: true
> > 21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl state: failed
> > 21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl chain: tikaChain
> > 21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl content-item: <urn:content-item-sha1-11f821c604fcbcbaeca7a1c29909065ac7fedafc>
> > 21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl executions:
> > 21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl - tika completed
> > 21.06.2016 17:02:56.824 *INFO* [qtp158698819-2677] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl Error Message: Enhancement Chain failed because of required Engi
> >     at org.apache.stanbol.enhancer.jersey.resource.AbstractEnhancerResource.enhanceFromData(AbstractEnhancerResource.java:213)
> >     at sun.reflect.GeneratedMethodAccessor105.invoke(Unknown Source)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >     at java.lang.reflect.Method.invoke(Method.java:498)
> >     at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
> >     at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:151)
> >     at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:171)
> >     at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:152)
> >     at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:104)
> >     at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:406)
> >     at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:350)
> >     at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:106)
> >     at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:259)
> >     ... 59 common frames omitted
> > Caused by: java.lang.IllegalStateException: Unexpected Exception while processing ContentItem <urn:content-item-sha1-11f821c604fcbcbaeca7a1c29909065ac7fedafc> with EnhancementJobManager: class org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl
> >     at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.handleEvent(EnhancementJobHandler.java:204)
> >     at org.apache.felix.eventadmin.impl.handler.EventHandlerProxy.sendEvent(EventHandlerProxy.java:415)
> >     at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:118)
> >     at org.apache.felix.eventadmin.impl.tasks.AsyncDeliverTasks$TaskExecuter.run(AsyncDeliverTasks.java:159)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >     ... 1 common frames omitted
> > Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.pdfbox.pdmodel.PDPage
> >     at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:212)
> >     at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:218)
> >     at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:184)
> >     at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:212)
> >     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
> >     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:106)
> >     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:143)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> >     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> >     at org.apache.stanbol.enhancer.engines.tika.TikaEngine$1.run(TikaEngine.java:275)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at org.apache.stanbol.enhancer.engines.tika.TikaEngine.computeEnhancements(TikaEngine.java:256)
> >     at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.processEvent(EnhancementJobHandler.java:280)
> >     at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.handleEvent(EnhancementJobHandler.java:198)
> >     ... 8 common frames omitted
> > 21.06.2016 17:02:56.825 *WARN* [qtp158698819-2677] org.eclipse.jetty.server.HttpChannel Could not send response error 500: javax.servlet.ServletException: org.glassfish.jersey.server.ContainerException: org.apache.stanbol.enhancer.servicesapi.ChainException: Enhancement Chain failed because of required Engine 'tika' failed with Message: Unable to process ContentItem '<urn:content-item-sha1-11f821c604fcbcbaeca7a1c29909065ac7fedafc>' with Enhancement Engine 'tika' because the engine is currently not active(Reason: Unexpected Exception while processing ContentItem <urn:content-item-sha1-11f821c604fcbcbaeca7a1c29909065ac7fedafc> with EnhancementJobManager: class org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl)!
> >
> > I think the main problem here is: Could not initialize class
> > org.apache.pdfbox.pdmodel.PDPage, then because of that, the tika engine
> > fails, and because of that, it is seen as not active.
> > Works fine with other file formats, like Office Word Docs, Excel
> > Spreadsheets... etc
> >
> > I'm wondering if anyone else has had this problem. I am using a build
> > checked out from trunk and it is around 2 months old, so, V 1.0, not 0.12.
> > I am also using a custom launcher and I can provide details on that
> > launcher if anyone thinks it might be relevant, even though I basically
> > just stripped out the External Engines.
> >
> > If there is anything else I can provide, just ask.
> >
> > Best Regards,
> > Antero Duarte
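
A side note on the error text in the log above, for anyone who hits the same thing. On the JVM, "NoClassDefFoundError: Could not initialize class X" generally means that X itself was found, but its static initializer already failed on an earlier access. That first access normally throws an ExceptionInInitializerError carrying the real cause, so it may be worth searching further back in the log for it. The sketch below only illustrates that general JVM behaviour with a hypothetical class named Fragile; it is not Stanbol or PDFBox code, and it is an assumption that the PDPage failure followed this pattern.

```java
// Minimal sketch of the JVM behaviour, not the Stanbol/PDFBox setup.
// Fragile and InitFailureDemo are hypothetical names used for illustration.
final class InitFailureDemo {

    // Stand-in for a class whose static initializer fails, for example
    // because something it needs is not available at runtime.
    static final class Fragile {
        static final Object RESOURCE = load();

        private static Object load() {
            throw new IllegalStateException("simulated missing dependency");
        }

        static void touch() {
            // Any static access triggers class initialization.
        }
    }

    // Tries to use Fragile and reports the type of error raised, if any.
    static String attempt() {
        try {
            Fragile.touch();
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getName();
        }
    }

    public static void main(String[] args) {
        // The first access runs the static initializer and fails with
        // ExceptionInInitializerError, which carries the underlying cause.
        System.out.println("first:  " + attempt());
        // Later accesses fail with NoClassDefFoundError
        // ("Could not initialize class ..."), as in the log above.
        System.out.println("second: " + attempt());
    }
}
```

If the same pattern applies here, the earliest PDPage-related error after startup would be the one that names what was actually missing from the launcher.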