[ https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102137#comment-17102137 ]
Bob Paulin edited comment on TIKA-3094 at 5/8/20, 1:02 AM:
-----------------------------------------------------------

Looks like the JAXB error is not so much an issue with Tika as it is with the test OSGi container. There are a few different ways to address the jars removed in Java 11, but the simplest, I think, is to just add the missing jars to the classpath and expose them to the bundle from the system packages. I do not see the error on Java 11 or 8 now.

was (Author: bob):
Looks like the JAXB error is not so much an issue with Tika as it is with the test OSGi container. There are a few different ways to address the jars removed in Java 11, but the simplest, I think, is to just add the missing jars to the classpath and expose them to the bundle from the system class loader. I do not see the error on Java 11 or 8 now.

> Apache Tika fails to extract text for pptx extension.
> -----------------------------------------------------
>
> Key: TIKA-3094
> URL: https://issues.apache.org/jira/browse/TIKA-3094
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.24, 1.24.1
> Reporter: Abhishek Chauhan
> Assignee: Bob Paulin
> Priority: Critical
> Attachments: Sample PPT.pptx
>
>
> This is a regression from version 1.23 of Apache Tika. Text extraction for .pptx
> extensions, which worked in Apache Tika 1.23, no longer works in 1.24.
> For the .ppt extension, it works fine in both 1.23 and 1.24.
>
> Looking at the release notes [https://tika.apache.org/1.24/index.html], you
> have updated POI to 4.1.2. That might be the root cause of this problem.
> POI requires [https://mvnrepository.com/artifact/com.zaxxer/SparseBitSet/1.2],
> which I guess is not present in the bundle.
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
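The workaround described in the comment (put the missing JAXB jar on the classpath, then expose it to bundles through the system packages) can be sketched in Java roughly as follows. This is a hypothetical sketch, not the change committed to Tika: it assumes a plain OSGi launcher rather than Tika's actual test harness, the class `JaxbSystemPackages` and its helpers are invented for illustration, and the package list is an assumption about which JAXB packages the bundles import. It relies on the standard OSGi launching property `org.osgi.framework.system.packages.extra`, after first checking that the JAXB API is loadable from the launcher's class loader, since packages exported by the system bundle are served from there.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Tika's test code): builds OSGi launch properties
 * that make the system bundle export the JAXB API packages.
 */
public class JaxbSystemPackages {

    // Standard OSGi framework launching property (OSGi Core specification).
    public static final String SYSTEM_PACKAGES_EXTRA =
            "org.osgi.framework.system.packages.extra";

    // JAXB API packages that are no longer shipped with the JDK as of Java 11.
    // Assumption: adjust this list to whatever the bundles actually import.
    public static final List<String> JAXB_PACKAGES = Arrays.asList(
            "javax.xml.bind",
            "javax.xml.bind.annotation",
            "javax.xml.bind.annotation.adapters",
            "javax.xml.bind.attachment",
            "javax.xml.bind.helpers",
            "javax.xml.bind.util");

    // True if the named class can be located by the given loader.
    // initialize=false avoids running static initializers during the probe.
    public static boolean isVisible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns a copy of the base properties with the JAXB packages appended
    // to any extra system packages that were already configured.
    public static Map<String, String> withJaxbExports(Map<String, String> base) {
        Map<String, String> config = new HashMap<>(base);
        String joined = String.join(",", JAXB_PACKAGES);
        String existing = config.get(SYSTEM_PACKAGES_EXTRA);
        config.put(SYSTEM_PACKAGES_EXTRA,
                (existing == null || existing.isEmpty()) ? joined : existing + "," + joined);
        return config;
    }

    public static void main(String[] args) {
        ClassLoader loader = JaxbSystemPackages.class.getClassLoader();
        // Exporting the packages is pointless unless the jaxb-api jar is on
        // the launcher's classpath (on Java 8 the JDK still provides it).
        if (!isVisible("javax.xml.bind.JAXBContext", loader)) {
            System.err.println("JAXB API not on classpath; add the jaxb-api jar.");
            return;
        }
        Map<String, String> config = withJaxbExports(new HashMap<>());
        System.out.println(SYSTEM_PACKAGES_EXTRA + "=" + config.get(SYSTEM_PACKAGES_EXTRA));
        // With an OSGi framework available, this map would be passed to
        // FrameworkFactory.newFramework(config) before starting the framework.
    }
}
```

Appending to `org.osgi.framework.system.packages.extra`, rather than overriding `org.osgi.framework.system.packages`, keeps the framework's default JDK package list intact. The same `isVisible` probe could be pointed at `com.zaxxer.sparsebits.SparseBitSet` to test the reporter's SparseBitSet hypothesis, though for that question the bundle's own class loader is the one that matters.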