unit tests and classpaths
Hi dev group,

I'm working on a very simple starter unit test for a new parser and am coming up with some roadblocks. I suspect it may be classpath related, but have tried many iterations and am coming up short.

My unit test:

package edu.usc.sunset.burgess.tika;

//JDK imports
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;
import junit.framework.TestCase;
import java.io.InputStream;

//TIKA imports
import org.apache.tika.metadata.Metadata;
import org.apache.tika.metadata.TikaCoreProperties;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.junit.Test;
import org.xml.sax.ContentHandler;
import java.io.IOException;

/*
 * Test cases to exercise the {@link EnviHeaderParser}.
 *
 */
public class EnviHeaderParserTest extends TestCase {

    public static final String TEST_STRING = "{GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]}";

    @Test
    public void testParser() throws Exception {
        Parser parser = new EnviHeaderParser();
        ContentHandler handler = new BodyContentHandler();
        Metadata metadata = new Metadata();

        InputStream stream = EnviHeaderParser.class
                .getResourceAsStream("/test-documents/envi_test_header.hdr");
        try {
            parser.parse(stream, handler, metadata, new ParseContext());
        } finally {
            stream.close();
        }

        // Check text
        String content = handler.toString();
        assertTrue(content.contains(TEST_STRING));
    }
}

---

Files are located as follows:

tika/tika-parsers/src/test/java/org/apache/tika/parser/envi/EnviHeaderParserTest.java
/tika/tika-parsers/src/test/resources/test-documents/envi_test_header.hdr
/tika/anniedev/src/main/java/edu/usc/sunset/burgess/tika/EnviHeaderParser.java

To compile and test code I do:

cd /tika/tika/tika-parsers
mvn -Dtest=EnviHeaderParserTest compile
mvn -Dtest=EnviHeaderParserTest test

-

I get the following output:

Running edu.usc.sunset.burgess.tika.EnviHeaderParserTest
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.172 sec <<< FAILURE!

Results :

Failed tests: testParser(edu.usc.sunset.burgess.tika.EnviHeaderParserTest)

Tests run: 1, Failures: 1, Errors: 0, Skipped:0

-

Please let me know if any additional information would be helpful. Any insights are appreciated.

Annie

--
--
Ann Bryant Burgess, PhD

Postdoctoral Fellow
Computer Science Department
University of Southern California
Viterbi School of Engineering
Los Angeles, CA

Alaska Science Center/USGS
Anchorage, AK

Cell: (585) 738-7549
Office: (907) 786-7059
Fax: (907) 786-7150
E-mail: anniebryant.burg...@gmail.com
Office Address: 4210 University Dr., Anchorage, AK 99508-4626

---
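A note on reading that summary: JUnit counts a failed assertion as a "Failure" and any other exception as an "Error", so "Failures: 1, Errors: 0" suggests the test ran and an assertion did not hold, rather than a class failing to load. One way to narrow it down is to make both the resource lookup and the assertion report what they saw. The following is only a minimal sketch; `ResourceCheck`, `open`, and `describeMismatch` are hypothetical helper names, not part of Tika or JUnit.

```java
import java.io.InputStream;

final class ResourceCheck {

    /**
     * Opens a classpath resource relative to the given class.
     * getResourceAsStream returns null when the resource is missing,
     * so fail here with the path instead of later with a bare NullPointerException.
     */
    static InputStream open(Class<?> anchor, String path) {
        InputStream stream = anchor.getResourceAsStream(path);
        if (stream == null) {
            throw new IllegalStateException("Resource not found on classpath: " + path);
        }
        return stream;
    }

    /** Builds an assertion message that shows what the parser actually produced. */
    static String describeMismatch(String expected, String actual) {
        return "expected content to contain <" + expected + "> but was <" + actual + ">";
    }
}
```

In the test above this could be used as `assertTrue(ResourceCheck.describeMismatch(TEST_STRING, content), content.contains(TEST_STRING));`, so that the Surefire report shows the extracted text next to the expected string.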
[jira] [Commented] (TIKA-1279) Missing return lines at output of SourceCodeParser
    [ https://issues.apache.org/jira/browse/TIKA-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979880#comment-13979880 ]

Tim Allison commented on TIKA-1279:
---

Fixed caps in "testJAVA.java" in test cases so that tests pass in *nix. r1589778

> Missing return lines at output of SourceCodeParser
> --
>
> Key: TIKA-1279
> URL: https://issues.apache.org/jira/browse/TIKA-1279
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.5
> Reporter: Hong-Thai Nguyen
> Assignee: Hong-Thai Nguyen
> Priority: Trivial
> Fix For: 1.6
>
>
> xhtml output is on a single line.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Resolved] (TIKA-1279) Missing return lines at output of SourceCodeParser
    [ https://issues.apache.org/jira/browse/TIKA-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong-Thai Nguyen resolved TIKA-1279.
---
    Resolution: Fixed

Thanks [~rgauss] for the good catch. I fixed it, with more tests, in r1589742.

Hoping that we can move away from Java 6 soon :)
[jira] [Reopened] (TIKA-1279) Missing return lines at output of SourceCodeParser
    [ https://issues.apache.org/jira/browse/TIKA-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Gauss II reopened TIKA-1279:
---
    Assignee: Hong-Thai Nguyen

[~thaichat04], I believe we still have to support Java 6 and {{System.lineSeparator()}} appears to have been added in Java 7.

I think {{System.getProperty("line.separator")}} would be equivalent.
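As a small illustration of that suggestion, the sketch below reads the separator through the system property, which is available on Java 6, and uses it to join lines. `LineSeparators`, `platformSeparator`, and `join` are names made up for this example; this is not the code committed for TIKA-1279.

```java
final class LineSeparators {

    /** Java 6-compatible lookup of the platform line separator. */
    static String platformSeparator() {
        return System.getProperty("line.separator");
    }

    /** Joins the given lines, terminating each one with the platform separator. */
    static String join(String... lines) {
        String separator = platformSeparator();
        StringBuilder out = new StringBuilder();
        for (String line : lines) {
            out.append(line).append(separator);
        }
        return out.toString();
    }
}
```

On Java 7 and later, `System.lineSeparator()` returns the value this property had at JVM start-up, so the two agree unless the property is changed at runtime.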
[jira] [Comment Edited] (TIKA-1278) Expose PDF Avg Char and Spacing Tolerance Config Params
    [ https://issues.apache.org/jira/browse/TIKA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979700#comment-13979700 ]

Ray Gauss II edited comment on TIKA-1278 at 4/24/14 1:31 PM:
---

Resolved in r1589722.

The setting of {{PDF2XHTML}} params was also moved from {{PDF2XHTML.process}} to a new {{PDFParserConfig.configure}} method which should allow developers to extend {{PDFParserConfig}} for custom behavior.

was (Author: rgauss):
Resolved in r1589722.

> Expose PDF Avg Char and Spacing Tolerance Config Params
> ---
>
> Key: TIKA-1278
> URL: https://issues.apache.org/jira/browse/TIKA-1278
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.5
> Reporter: Ray Gauss II
> Assignee: Ray Gauss II
> Fix For: 1.6
>
>
> {{PDFParserConfig}} should allow for override of PDFBox's {{averageCharTolerance}} and {{spacingTolerance}} settings as noted by a TODO comment in {{PDF2XHTML}}.
> Additionally, {{PDF2XHTML}}'s use of {{PDFParserConfig}} should be changed slightly to allow for extension of that config class and its configuration behavior.
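The extension pattern described in that comment can be sketched roughly as follows. Every class here is a hypothetical stand-in so that the example compiles on its own: `TextExtractor` plays the role of {{PDF2XHTML}}, `ExtractorConfig` the role of {{PDFParserConfig}}, and the default values are illustrative only. The real Tika signatures may well differ.

```java
/** Hypothetical stand-in for the extractor that the config is applied to. */
class TextExtractor {
    float averageCharTolerance = 0.3f; // illustrative default only
    float spacingTolerance = 0.5f;     // illustrative default only
}

/** Hypothetical stand-in for a parser config with an overridable configure step. */
class ExtractorConfig {
    private Float averageCharTolerance; // null means: keep the extractor's default
    private Float spacingTolerance;

    void setAverageCharTolerance(Float value) { this.averageCharTolerance = value; }

    void setSpacingTolerance(Float value) { this.spacingTolerance = value; }

    /** Applies every explicitly set value to the extractor. */
    void configure(TextExtractor extractor) {
        if (averageCharTolerance != null) {
            extractor.averageCharTolerance = averageCharTolerance;
        }
        if (spacingTolerance != null) {
            extractor.spacingTolerance = spacingTolerance;
        }
    }
}

/** Custom behavior: run the base configuration, then adjust the result. */
class WideSpacingConfig extends ExtractorConfig {
    @Override
    void configure(TextExtractor extractor) {
        super.configure(extractor);
        extractor.spacingTolerance = extractor.spacingTolerance * 2;
    }
}
```

Keeping the "apply settings" step on the config object, rather than inside the extractor, is what lets a subclass add behavior without touching the extractor itself.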
[jira] [Resolved] (TIKA-1278) Expose PDF Avg Char and Spacing Tolerance Config Params
    [ https://issues.apache.org/jira/browse/TIKA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Gauss II resolved TIKA-1278.
---
    Resolution: Fixed

Resolved in r1589722.
[jira] [Updated] (TIKA-1278) Expose PDF Avg Char and Spacing Tolerance Config Params
    [ https://issues.apache.org/jira/browse/TIKA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Gauss II updated TIKA-1278:
---
    Description:
{{PDFParserConfig}} should allow for override of PDFBox's {{averageCharTolerance}} and {{spacingTolerance}} settings as noted by a TODO comment in {{PDF2XHTML}}.

Additionally, {{PDF2XHTML}}'s use of {{PDFParserConfig}} should be changed slightly to allow for extension of that config class and its configuration behavior.

  was:
{{PDFParserConfig}} should allow for override of PDFBox's {{averageCharTolerance}} and {{spacingTolerance}} settings as noted by a TODO comment in {{PDF2XHTML}}.

Additionally, {{PDF2XHTML}}'s use of {{PDFParserConfig}} should be changed slightly to allow for extension of that config class and it's configuration behavior.
[jira] [Reopened] (TIKA-1276) Missing embedded dependencies in tika-bundle
    [ https://issues.apache.org/jira/browse/TIKA-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch reopened TIKA-1276:
--

Re-opening, as we don't yet have any unit tests for this problem

> Missing embedded dependencies in tika-bundle
> 
>
> Key: TIKA-1276
> URL: https://issues.apache.org/jira/browse/TIKA-1276
> Project: Tika
> Issue Type: Bug
> Components: packaging
> Affects Versions: 1.5
> Environment: OSGI, Apache Felix via Apache Sling Launcher
> Reporter: Rupert Westenthaler
> Fix For: 1.6
>
> Attachments: TIKA-1276_20140423_rwesten.diff
>
>
> While updating from tika 1.2 to 1.5 I noticed that the `org.apache.tika:tika-bundle:1.5` module has some missing dependencies.
>
> 1. `com.uwyn:jhighlight:1.0` is not embedded
> Because of that, installing the bundle results in the following exception
> {code}
> org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.tika.bundle [103]: Unable to resolve 103.0: missing requirement [103.0] osgi.wiring.package; (osgi.wiring.package=com.uwyn.jhighlight.renderer))
> org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.tika.bundle [103]: Unable to resolve 103.0: missing requirement [103.0] osgi.wiring.package; (osgi.wiring.package=com.uwyn.jhighlight.renderer)
>     at org.apache.felix.framework.Felix.resolveBundleRevision(Felix.java:3962)
>     at org.apache.felix.framework.Felix.startBundle(Felix.java:2025)
>     at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1279)
>     at org.apache.felix.framework.FrameworkStartLevelImpl.run(FrameworkStartLevelImpl.java:304)
>     at java.lang.Thread.run(Thread.java:744)
> {code}
>
> 2. `org.ow2.asm:asm:4.1` is not embedded because `org.apache.tika:tika-core:1.5` uses `org.ow2.asm:asm-debug-all:4.1` and therefore the `Embed-Dependency` directive `asm` does not match any dependency.
> Because of that, one gets the following exception (after fixing (1))
> {code}
> org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.tika.bundle [96]: Unable to resolve 96.0: missing requirement [96.0] osgi.wiring.package; (&(osgi.wiring.package=org.objectweb.asm)(version>=4.1.0)(!(version>=5.0.0
> org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.tika.bundle [96]: Unable to resolve 96.0: missing requirement [96.0] osgi.wiring.package; (&(osgi.wiring.package=org.objectweb.asm)(version>=4.1.0)(!(version>=5.0.0)))
>     at org.apache.felix.framework.Felix.resolveBundleRevision(Felix.java:3962)
>     at org.apache.felix.framework.Felix.startBundle(Felix.java:2025)
>     at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1279)
>     at org.apache.felix.framework.FrameworkStartLevelImpl.run(FrameworkStartLevelImpl.java:304)
>     at java.lang.Thread.run(Thread.java:744)
> {code}
> There are two possibilities to fix this: (a) change the `Embed-Dependency` to `asm-debug-all`, or (b) add a dependency on `org.ow2.asm:asm:4.1` to the tika-bundle pom file.
>
> 3. `edu.ucar:netcdf:4.2-min` is not embedded
> Because of that, one gets the following exception (after fixing (1) and (2))
> {code}
> org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.tika.bundle [96]: Unable to resolve 96.0: missing requirement [96.0] osgi.wiring.package; (osgi.wiring.package=ucar.ma2))
> org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.tika.bundle [96]: Unable to resolve 96.0: missing requirement [96.0] osgi.wiring.package; (osgi.wiring.package=ucar.ma2)
>     at org.apache.felix.framework.Felix.resolveBundleRevision(Felix.java:3962)
>     at org.apache.felix.framework.Felix.startBundle(Felix.java:2025)
>     at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1279)
>     at org.apache.felix.framework.FrameworkStartLevelImpl.run(FrameworkStartLevelImpl.java:304)
>     at java.lang.Thread.run(Thread.java:744)
> {code}
>
> 4. The `com.adobe.xmp:xmpcore:5.1.2` dependency is required at runtime
> After fixing the above issues the tika-bundle started successfully. However, when extracting EXIF metadata from a jpeg image I got the following exception.
> {code}
> java.lang.NoClassDefFoundError: com/adobe/xmp/XMPException
>     at com.drew.imaging.jpeg.JpegMetadataReader.extractMetadataFromJpegSegmentReader(JpegMetadataReader.java:112)
>     at com.drew.imaging.jpeg.JpegMetadataReader.readMetadata(JpegMetadataReader.java:71)
>     at org.apache.tika.parser.image.ImageMetadataExtractor.parseJpeg(ImageMetadataExtractor.java:91)
>     at org.apache.tika.parser.jpeg.JpegParser.parse(JpegParser.java:56)
>     [..]
> {code}
> Embedding xmpcore in the tika-bundle solved this iss
[jira] [Resolved] (TIKA-1276) Missing embedded dependencies in tika-bundle
    [ https://issues.apache.org/jira/browse/TIKA-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong-Thai Nguyen resolved TIKA-1276.
---
    Resolution: Fixed

Thanks [~rwesten], I added your patch in r1589717.
[jira] [Updated] (TIKA-1276) Missing embedded dependencies in tika-bundle
    [ https://issues.apache.org/jira/browse/TIKA-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong-Thai Nguyen updated TIKA-1276:
---
    Fix Version/s: 1.6
[jira] [Commented] (TIKA-1278) Expose PDF Avg Char and Spacing Tolerance Config Params
    [ https://issues.apache.org/jira/browse/TIKA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979676#comment-13979676 ]

Tim Allison commented on TIKA-1278:
---

+1. Let me know if there's anything I can do to help.
[jira] [Resolved] (TIKA-1279) Missing return lines at output of SourceCodeParser
    [ https://issues.apache.org/jira/browse/TIKA-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong-Thai Nguyen resolved TIKA-1279.
---
    Resolution: Fixed

Fixed at r1589687
[jira] [Commented] (TIKA-1224) Adding Source code (Java, Groovy, C) parser
    [ https://issues.apache.org/jira/browse/TIKA-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979614#comment-13979614 ]

Hong-Thai Nguyen commented on TIKA-1224:
---

Thanks [~ben.12] for the feedback.

For the line return problem in the output, I created a new issue: TIKA-1279

For the -t option in TikaCLI, the mimetype of a Java file is ambiguous. It could be text/plain (in which case TxtParser is used and returns the original text as is) or x-java-source (in which case SourceCodeParser is used).

For the -h option, the output normally looks something like:
{code}
Author: Hong-Thai.Nguyen
Content-Encoding: windows-1252
Content-Length: 4899
Content-Type: text/x-java-source
LoC: 133
creator: Hong-Thai.Nguyen
dc:creator: Hong-Thai.Nguyen
meta:author: Hong-Thai.Nguyen
resourceName: SourceCodeParser.java
{code}
The creator comes from the 'author' annotation in the javadoc.

This parser is quite generic (quick and dirty, as mentioned by [~kkrugler]) and simplistic. We could make a more dedicated Java source parser that extracts more metadata (members, attributes...). If you are interested in this kind of parser, please create a new issue; an investigation into this work would be warmly welcome.

Regards,

> Adding Source code (Java, Groovy, C) parser
> ---
>
> Key: TIKA-1224
> URL: https://issues.apache.org/jira/browse/TIKA-1224
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.5
> Reporter: Hong-Thai Nguyen
> Priority: Minor
>
> We can parse some source code file formats:
> text/x-java-source
> text/x-groovy
> text/x-c
> For HTML rendering from code, we can use jhighlight: http://www.ohloh.net/p/jhighlight
[jira] [Created] (TIKA-1279) Missing return lines at output of SourceCodeParser
Hong-Thai Nguyen created TIKA-1279:
--

             Summary: Missing return lines at output of SourceCodeParser
                 Key: TIKA-1279
                 URL: https://issues.apache.org/jira/browse/TIKA-1279
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.5
            Reporter: Hong-Thai Nguyen
            Priority: Trivial
             Fix For: 1.6

xhtml output is on a single line.
[jira] [Created] (TIKA-1278) Expose PDF Avg Char and Spacing Tolerance Config Params
Ray Gauss II created TIKA-1278:
--

             Summary: Expose PDF Avg Char and Spacing Tolerance Config Params
                 Key: TIKA-1278
                 URL: https://issues.apache.org/jira/browse/TIKA-1278
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 1.5
            Reporter: Ray Gauss II
            Assignee: Ray Gauss II
             Fix For: 1.6

{{PDFParserConfig}} should allow for override of PDFBox's {{averageCharTolerance}} and {{spacingTolerance}} settings as noted by a TODO comment in {{PDF2XHTML}}.

Additionally, {{PDF2XHTML}}'s use of {{PDFParserConfig}} should be changed slightly to allow for extension of that config class and it's configuration behavior.
[jira] [Commented] (TIKA-1224) Adding Source code (Java, Groovy, C) parser
    [ https://issues.apache.org/jira/browse/TIKA-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979379#comment-13979379 ]

Benoit Moreau commented on TIKA-1224:
---

In the debugger, Tika uses org.apache.tika.SourceCodeParser for the "x-java-source" mime-type.
It removes all line endings (why? A mistake? readLine() doesn't return \n and/or \r), then gives the result to JHighlight.
The JHighlight result (an entire HTML document) is passed as the argument to the characters() method of the ContentHandler.

I'm just starting with Tika, but I don't think that is good.
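The readLine() behavior mentioned here is easy to reproduce outside Tika. Below is a minimal sketch using only the JDK; `ReadLineDemo` and its methods are names invented for the example, not part of SourceCodeParser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

final class ReadLineDemo {

    /** Concatenates lines exactly as readLine() returns them, so terminators are lost. */
    static String joinRaw(String source) {
        return join(source, "");
    }

    /** Re-appends a separator after every line so the line structure survives. */
    static String joinWithSeparator(String source, String separator) {
        return join(source, separator);
    }

    private static String join(String source, String separator) {
        StringBuilder out = new StringBuilder();
        BufferedReader reader = new BufferedReader(new StringReader(source));
        try {
            String line;
            // readLine() strips the trailing \n, \r or \r\n from each line it returns.
            while ((line = reader.readLine()) != null) {
                out.append(line).append(separator);
            }
        } catch (IOException e) {
            // Not expected for an in-memory StringReader.
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

With the input "a\nb\r\nc", `joinRaw` yields "abc", while `joinWithSeparator(input, "\n")` yields "a\nb\nc\n"; any code that feeds readLine() output to a highlighter has to add the separators back itself.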