[jira] [Commented] (TIKA-1191) ForkParser / ClassLoaderProxy does not define package
[ https://issues.apache.org/jira/browse/TIKA-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321839#comment-16321839 ]

Hudson commented on TIKA-1191:

SUCCESS: Integrated in Jenkins build Tika-trunk #1420 (See [https://builds.apache.org/job/Tika-trunk/1420/])
fix for TIKA-1191 contributed by BenRomberg (ben: [https://github.com/apache/tika/commit/6a398bd3f6245543091fd7c0e9e4facb34a26882])
* (edit) tika-core/src/test/java/org/apache/tika/fork/ForkTestParser.java
* (add) tika-core/src/test/java/org/apache/tika/fork/unusedpackage/ClassInUnusedPackage.java
* (edit) tika-core/src/test/java/org/apache/tika/fork/ForkParserTest.java
* (edit) tika-core/src/main/java/org/apache/tika/fork/ClassLoaderProxy.java

> ForkParser / ClassLoaderProxy does not define package
>
> Key: TIKA-1191
> URL: https://issues.apache.org/jira/browse/TIKA-1191
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.4, 1.5
> Reporter: Nicolas Belisle
> Fix For: 1.18
>
> Attachments: ClassLoaderProxy.java.patch, Test.java, test.eml
>
> ForkParser will throw an Exception in some cases :
> org.apache.tika.exception.TikaException: Invalid embedded resource
>     at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedOfficeDoc(AbstractPOIFSExtractor.java:189)
>     at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:135)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:186)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.tika.fork.ForkServer.call(ForkServer.java:144)
>     at org.apache.tika.fork.ForkServer.processRequests(ForkServer.java:124)
>     at org.apache.tika.fork.ForkServer.main(ForkServer.java:69)
> Caused by: java.lang.NullPointerException
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:136)
>     at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:499)
>     at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:60)
>     at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:169)
>     at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:268)
>     at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.getTikaConfig(AbstractPOIFSExtractor.java:72)
>     at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.getDetector(AbstractPOIFSExtractor.java:79)
>     at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedOfficeDoc(AbstractPOIFSExtractor.java:176)
>     ... 10 more
>
> A patch will follow

-- 
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TIKA-1191) ForkParser / ClassLoaderProxy does not define package
[ https://issues.apache.org/jira/browse/TIKA-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321797#comment-16321797 ]

Ben Romberg commented on TIKA-1191:

Thank you! Feels good to contribute at least a little to such a great project!
[jira] [Commented] (TIKA-1191) ForkParser / ClassLoaderProxy does not define package
[ https://issues.apache.org/jira/browse/TIKA-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321769#comment-16321769 ]

ASF GitHub Bot commented on TIKA-1191:

Gagravarr closed pull request #215: TIKA-1191 fix package access in ForkParser
URL: https://github.com/apache/tika/pull/215

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/tika-core/src/main/java/org/apache/tika/fork/ClassLoaderProxy.java b/tika-core/src/main/java/org/apache/tika/fork/ClassLoaderProxy.java
index 920926d74..01b0ba548 100644
--- a/tika-core/src/main/java/org/apache/tika/fork/ClassLoaderProxy.java
+++ b/tika-core/src/main/java/org/apache/tika/fork/ClassLoaderProxy.java
@@ -112,7 +112,9 @@ protected synchronized URL findResource(String name) {
             // Receive the response
             if (input.readBoolean()) {
                 byte[] data = readStream();
-                return defineClass(name, data, 0, data.length);
+                Class clazz = defineClass(name, data, 0, data.length);
+                definePackageIfNecessary(name, clazz);
+                return clazz;
             } else {
                 throw new ClassNotFoundException("Unable to find class " + name);
             }
@@ -121,6 +123,21 @@ protected synchronized URL findResource(String name) {
         }
     }
 
+    private void definePackageIfNecessary(String className, Class clazz) {
+        String packageName = toPackageName(className);
+        if (packageName != null && getPackage(packageName) == null) {
+            definePackage(packageName, null, null, null, null, null, null, null);
+        }
+    }
+
+    private String toPackageName(String className) {
+        int packageEndIndex = className.lastIndexOf('.');
+        if (packageEndIndex > 0) {
+            return className.substring(0, packageEndIndex);
+        }
+        return null;
+    }
+
     private byte[] readStream() throws IOException {
         ByteArrayOutputStream stream = new ByteArrayOutputStream();
         byte[] buffer = new byte[0x];
diff --git a/tika-core/src/test/java/org/apache/tika/fork/ForkParserTest.java b/tika-core/src/test/java/org/apache/tika/fork/ForkParserTest.java
index 5883c75d0..01e08d9d5 100644
--- a/tika-core/src/test/java/org/apache/tika/fork/ForkParserTest.java
+++ b/tika-core/src/test/java/org/apache/tika/fork/ForkParserTest.java
@@ -218,4 +218,21 @@ public void testPulse() throws Exception {
         }
     }
 
+    @Test
+    public void testPackageCanBeAccessed() throws Exception {
+        ForkParser parser = new ForkParser(
+                ForkParserTest.class.getClassLoader(),
+                new ForkTestParser.ForkTestParserAccessingPackage());
+        try {
+            Metadata metadata = new Metadata();
+            ContentHandler output = new BodyContentHandler();
+            InputStream stream = new ByteArrayInputStream(new byte[0]);
+            ParseContext context = new ParseContext();
+            parser.parse(stream, output, metadata, context);
+            assertEquals("Hello, World!", output.toString().trim());
+            assertEquals("text/plain", metadata.get(Metadata.CONTENT_TYPE));
+        } finally {
+            parser.close();
+        }
+    }
 }
diff --git a/tika-core/src/test/java/org/apache/tika/fork/ForkTestParser.java b/tika-core/src/test/java/org/apache/tika/fork/ForkTestParser.java
index 0948cdd64..7e9c0bf2f 100644
--- a/tika-core/src/test/java/org/apache/tika/fork/ForkTestParser.java
+++ b/tika-core/src/test/java/org/apache/tika/fork/ForkTestParser.java
@@ -22,11 +22,13 @@
 import java.util.Set;
 
 import org.apache.tika.exception.TikaException;
+import org.apache.tika.fork.unusedpackage.ClassInUnusedPackage;
 import org.apache.tika.metadata.Metadata;
 import org.apache.tika.mime.MediaType;
 import org.apache.tika.parser.AbstractParser;
 import org.apache.tika.parser.ParseContext;
 import org.apache.tika.sax.XHTMLContentHandler;
+import org.junit.Assert;
 import org.xml.sax.ContentHandler;
 import org.xml.sax.SAXException;
 
@@ -54,4 +56,12 @@ public void parse(
         xhtml.endDocument();
     }
 
+    static class ForkTestParserAccessingPackage extends ForkTestParser {
+        @Override
+        public void parse(InputStream stream, ContentHandler handler, Metadata metadata,
+                          ParseContext context) throws IOException, SAXException, TikaException {
+            Assert.assertNotNull(ClassInUnusedPackage.class.getPackage());
+            super.parse(stream, handler, metadata, context);
+        }
+    }
 }
\ No newline at end of file
diff --git a/tika-core/src/test/java/org/apache/tika/fork/unusedpackage/ClassInUnusedPackage.java b/tika-core/src/test/
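The heart of the patch is small: after `defineClass`, derive the package name from the binary class name and call `definePackage` only if no `Package` of that name is visible yet. The sketch below isolates that logic in a stand-alone `ClassLoader` subclass so it can be exercised without Tika or a forked JVM. The class name `PackageDefiningLoader`, its `main` demo and the package name `com.example.hypothetical` are hypothetical. Like the patch, it passes `null` for all specification/implementation metadata and relies on `ClassLoader.getPackage(String)`, which still exists but has been deprecated since Java 9.

```java
import java.net.URL;

/**
 * Hypothetical stand-alone sketch of the package-definition logic that
 * PR #215 adds to ClassLoaderProxy. It is not the Tika class itself.
 */
class PackageDefiningLoader extends ClassLoader {

    PackageDefiningLoader(ClassLoader parent) {
        super(parent);
    }

    /** "a.b.C" -> "a.b"; a class in the default package yields null. */
    static String toPackageName(String className) {
        int packageEndIndex = className.lastIndexOf('.');
        if (packageEndIndex > 0) {
            return className.substring(0, packageEndIndex);
        }
        return null;
    }

    /**
     * Defines a Package (with no metadata) for the given class name unless this
     * loader or one of its ancestors already knows one. Returns the Package now
     * visible under that name, or null for the default package.
     */
    @SuppressWarnings("deprecation") // getPackage(String) is deprecated since Java 9
    Package definePackageIfNecessary(String className) {
        String packageName = toPackageName(className);
        if (packageName == null) {
            return null;
        }
        Package existing = getPackage(packageName);
        if (existing != null) {
            return existing;
        }
        // As in the patch: no spec/impl title, version or vendor, and no seal base.
        return definePackage(packageName, null, null, null, null, null, null, (URL) null);
    }

    public static void main(String[] args) {
        PackageDefiningLoader loader =
                new PackageDefiningLoader(PackageDefiningLoader.class.getClassLoader());
        Package first = loader.definePackageIfNecessary("com.example.hypothetical.Foo");
        // A second class in the same package must not reach definePackage again:
        // defining the same package twice in one loader throws IllegalArgumentException.
        Package second = loader.definePackageIfNecessary("com.example.hypothetical.Bar");
        System.out.println(first.getName() + " " + second.getName());
    }
}
```

The "if necessary" check is what makes the helper safe to call once per loaded class. Because every metadata argument is `null`, accessors such as `getImplementationVersion()` on these packages return `null`, which matches the caveat in Eric Biggers' comment that the packages are not defined with their full original metadata.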
[jira] [Commented] (TIKA-1191) ForkParser / ClassLoaderProxy does not define package
[ https://issues.apache.org/jira/browse/TIKA-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320956#comment-16320956 ]

Tim Allison commented on TIKA-1191:

+1 I've been meaning to do this. Looks good to me. Thank you!
[jira] [Commented] (TIKA-1191) ForkParser / ClassLoaderProxy does not define package
[ https://issues.apache.org/jira/browse/TIKA-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319839#comment-16319839 ]

Nick Burch commented on TIKA-1191:

[~talli...@mitre.org] I'm minded to apply Ben Romberg's patch from pull #215, any thoughts/comments/objections?
[jira] [Commented] (TIKA-1191) ForkParser / ClassLoaderProxy does not define package
[ https://issues.apache.org/jira/browse/TIKA-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16307557#comment-16307557 ]

ASF GitHub Bot commented on TIKA-1191:

BenRomberg opened a new pull request #215: TIKA-1191 fix package access in ForkParser
URL: https://github.com/apache/tika/pull/215

`ForkParser` can not be used right now when using `AutoDetectParser` together with the optional `jai-imageio-core` dependency.

This fix enhances the patch provided in TIKA-1191 with unit tests.

Thanks for the great work with Apache Tika! It would be really helpful for us to be able to use `ForkParser` with all optional dependencies in a future version.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[jira] [Commented] (TIKA-1191) ForkParser / ClassLoaderProxy does not define package
[ https://issues.apache.org/jira/browse/TIKA-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703807#comment-14703807 ]

Eric Biggers commented on TIKA-1191:

I am using Tika 1.7 and I encountered this problem while testing ForkParser on the files in the test-documents directory distributed with the Tika sources. An example of a file that causes the problem is "testBinControlWord.rtf". Applying the ClassLoaderProxy.java.patch attached to this ticket appears to solve the problem (or at least work around it, since the packages won't be defined with their full original metadata).

The stacktrace given above for Tika 1.8-SNAPSHOT looks like an unrelated problem.
[jira] [Commented] (TIKA-1191) ForkParser / ClassLoaderProxy does not define package
[ https://issues.apache.org/jira/browse/TIKA-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362588#comment-14362588 ]

Tyler Palsulich commented on TIKA-1191:

Here is an updated stacktrace for Tika 1.8-SNAPSHOT. It looks like something is trying to mark/reset a stream that doesn't support it:
{code}
➜ trunk tika -z https://issues.apache.org/jira/secure/attachment/12657409/test.eml
Exception in thread "main" org.apache.tika.exception.TikaException: Failed to parse an email message
    at org.apache.tika.parser.mail.RFC822Parser.parse(RFC822Parser.java:79)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:270)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:270)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:153)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:450)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:123)
Caused by: java.io.IOException: mark/reset not supported
    at java.io.InputStream.reset(InputStream.java:347)
    at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:161)
    at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
    at org.apache.tika.cli.TikaCLI$FileEmbeddedDocumentExtractor.parseEmbedded(TikaCLI.java:918)
    at org.apache.tika.parser.mail.MailContentHandler.body(MailContentHandler.java:110)
    at org.apache.james.mime4j.parser.MimeStreamParser.parse(MimeStreamParser.java:133)
    at org.apache.tika.parser.mail.RFC822Parser.parse(RFC822Parser.java:76)
    ... 6 more
{code}
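The `Caused by` line above comes from the default `java.io.InputStream.reset()`, which throws `IOException("mark/reset not supported")` for any stream that does not override mark/reset. The sketch below is a generic JDK illustration, not Tika code and not the fix applied for this ticket; `MarkResetDemo` and its helpers are hypothetical. It reproduces that failure with a detector-style "peek, then rewind" and shows one common remedy: wrapping the stream in a `BufferedInputStream` before peeking.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical demo of the "mark/reset not supported" failure and a common fix. */
class MarkResetDemo {

    /** A minimal stream that, like many raw streams, inherits InputStream's mark/reset. */
    static InputStream nonMarkableStream(final byte[] data) {
        return new InputStream() {
            private int pos = 0;

            @Override
            public int read() {
                return pos < data.length ? (data[pos++] & 0xFF) : -1;
            }
        };
    }

    /** Returns the stream itself if it supports mark/reset, otherwise a buffering wrapper. */
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Reads the first byte and rewinds, the way a content detector peeks at a header. */
    static int peekFirstByte(InputStream in) throws IOException {
        in.mark(1);
        int b = in.read();
        in.reset(); // throws IOException on streams without mark/reset support
        return b;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x50, 0x4B};
        try {
            peekFirstByte(nonMarkableStream(data));
        } catch (IOException e) {
            System.out.println("raw stream: " + e.getMessage());
        }
        InputStream safe = ensureMarkSupported(nonMarkableStream(data));
        int peeked = peekFirstByte(safe);
        int firstRead = safe.read(); // same byte again, because reset() rewound the buffer
        System.out.println(peeked + " " + firstRead);
    }
}
```

Callers must keep reading from the returned wrapper rather than from the original stream; otherwise the bytes already pulled into the buffer are lost.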
[jira] [Commented] (TIKA-1191) ForkParser / ClassLoaderProxy does not define package
[ https://issues.apache.org/jira/browse/TIKA-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072104#comment-14072104 ]

Nicolas Belisle commented on TIKA-1191:

I was able to reproduce a similar issue with another file using Tika 1.5. See the attached test.eml and the test (Test.java).

The exception:
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.mail.RFC822Parser@6743bc0f
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.tika.fork.ForkServer.call(ForkServer.java:144)
    at org.apache.tika.fork.ForkServer.processRequests(ForkServer.java:124)
    at org.apache.tika.fork.ForkServer.main(ForkServer.java:69)
Caused by: java.lang.NullPointerException
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:158)
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:516)
    at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:60)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:169)
    at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:268)
    at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:51)
    at org.apache.tika.parser.mail.RFC822Parser.adaptedExtractMultipart(RFC822Parser.java:167)
    at org.apache.tika.parser.mail.RFC822Parser.adaptedExtractMultipart(RFC822Parser.java:156)
    at org.apache.tika.parser.mail.RFC822Parser.parse(RFC822Parser.java:101)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    ... 9 more
[jira] [Commented] (TIKA-1191) ForkParser / ClassLoaderProxy does not define package
[ https://issues.apache.org/jira/browse/TIKA-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813484#comment-13813484 ]

Nicolas Belisle commented on TIKA-1191:

Unfortunately, I cannot upload an example (in my case, a Word 97-2003 document) that triggers the issue.