unit tests and classpaths

2014-04-24 Thread Annie Burgess
Hi dev group,

I'm working on a very simple starter unit test for a new parser and am
running into some roadblocks.  I suspect the problem may be classpath related,
but I have tried many iterations and am coming up short.

My unit test:

package edu.usc.sunset.burgess.tika;

//JUnit imports
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;
import junit.framework.TestCase;

import java.io.InputStream;

//TIKA imports
import org.apache.tika.metadata.Metadata;
import org.apache.tika.metadata.TikaCoreProperties;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.junit.Test;
import org.xml.sax.ContentHandler;
import java.io.IOException;
/*
 * Test cases to exercise the {@link EnviHeaderParser}.
 *
 */
public class EnviHeaderParserTest extends TestCase
{
 public static final String TEST_STRING = "{GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]}";

@Test
public void testParser() throws Exception {

Parser parser = new EnviHeaderParser();
ContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();

InputStream stream = EnviHeaderParser.class
.getResourceAsStream("/test-documents/envi_test_header.hdr");
try {
parser.parse(stream, handler, metadata, new ParseContext());
} finally {
stream.close();
}

// Check text
String content = handler.toString();
assertTrue(content.contains(TEST_STRING));
}
}
---
Files are located as follows:

 
tika/tika-parsers/src/test/java/org/apache/tika/parser/envi/EnviHeaderParserTest.java

/tika/tika-parsers/src/test/resources/test-documents/envi_test_header.hdr

/tika/anniedev/src/main/java/edu/usc/sunset/burgess/tika/EnviHeaderParser.java


To compile and test the code, I run:

cd /tika/tika/tika-parsers
mvn -Dtest=EnviHeaderParserTest compile
mvn -Dtest=EnviHeaderParserTest test

-
I get the following output:

Running edu.usc.sunset.burgess.tika.EnviHeaderParserTest
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.172 sec <<< FAILURE!

Results :

Failed tests:   testParser(edu.usc.sunset.burgess.tika.EnviHeaderParserTest)

Tests run: 1, Failures: 1, Errors: 0, Skipped:0
-

Please let me know if any additional information would be helpful.
Any insights are appreciated.

Annie

-- 
--
Ann Bryant Burgess, PhD

Postdoctoral Fellow
Computer Science Department
University of Southern California
Viterbi School of Engineering
Los Angeles, CA

Alaska Science Center/USGS
Anchorage, AK

Cell:  (585) 738-7549
Office:  (907) 786-7059
Fax:  (907) 786-7150
E-mail: anniebryant.burg...@gmail.com
Office Address: 4210 University Dr., Anchorage, AK 99508-4626
---
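
Since the message above suspects a classpath problem, one way to rule that in or out is to check whether the test resource resolves at all: `Class.getResource` returns null when a resource is not on the classpath. This is only a minimal sketch; the class name `ResourceCheck` and its method are hypothetical, and the resource path is simply the one quoted in the test.

```java
import java.net.URL;

// Hypothetical helper, not part of Tika or of the original test.
final class ResourceCheck {

    // Returns true if the absolute resource path (leading '/') can be
    // resolved from the classpath visible to this class.
    public static boolean isOnClasspath(String absolutePath) {
        URL url = ResourceCheck.class.getResource(absolutePath);
        return url != null;
    }

    public static void main(String[] args) {
        // Path taken from the test in the message; whether it resolves
        // depends on where this class is run from.
        String path = "/test-documents/envi_test_header.hdr";
        System.out.println(path + " found: " + isOnClasspath(path));
    }
}
```

If the resource is found, the failure is more likely to come from the assertion on the extracted content than from the classpath.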


[jira] [Commented] (TIKA-1279) Missing return lines at output of SourceCodeParser

2014-04-24 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979880#comment-13979880
 ] 

Tim Allison commented on TIKA-1279:
---

Fixed caps in "testJAVA.java" in test cases so that tests pass in *nix.

r1589778

> Missing return lines at output of SourceCodeParser
> --
>
> Key: TIKA-1279
> URL: https://issues.apache.org/jira/browse/TIKA-1279
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.5
>Reporter: Hong-Thai Nguyen
>Assignee: Hong-Thai Nguyen
>Priority: Trivial
> Fix For: 1.6
>
>
> xhtml output is on a single line.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (TIKA-1279) Missing return lines at output of SourceCodeParser

2014-04-24 Thread Hong-Thai Nguyen (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong-Thai Nguyen resolved TIKA-1279.


Resolution: Fixed

Thanks [~rgauss] for the good catch. I fixed it, with more tests, in r1589742.
Hoping that we can move away from Java 6 soon :)

> Missing return lines at output of SourceCodeParser
> --
>
> Key: TIKA-1279
> URL: https://issues.apache.org/jira/browse/TIKA-1279
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.5
>Reporter: Hong-Thai Nguyen
>Assignee: Hong-Thai Nguyen
>Priority: Trivial
> Fix For: 1.6
>
>
> xhtml output is on a single line.





[jira] [Reopened] (TIKA-1279) Missing return lines at output of SourceCodeParser

2014-04-24 Thread Ray Gauss II (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Gauss II reopened TIKA-1279:


  Assignee: Hong-Thai Nguyen

[~thaichat04], I believe we still have to support Java 6 and 
{{System.lineSeparator()}} appears to have been added in Java 7.

I think {{System.getProperty("line.separator")}} would be equivalent.
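
To make the suggestion concrete, here is a minimal sketch of a Java 6 compatible way to obtain the line separator. The class and method names (`LineSeparators`, `java6LineSeparator`) are hypothetical and are not part of Tika.

```java
// Hypothetical helper, not part of Tika.
final class LineSeparators {

    // Reads the platform line separator through the system property,
    // which is available on Java 6 (System.lineSeparator() needs Java 7).
    public static String java6LineSeparator() {
        return System.getProperty("line.separator");
    }

    public static void main(String[] args) {
        String sep = java6LineSeparator();
        // Print the length rather than the raw characters so the output is readable.
        System.out.println("separator length: " + sep.length());
    }
}
```

On Java 7 and later the two calls should return the same value, provided the `line.separator` property has not been changed at runtime.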

> Missing return lines at output of SourceCodeParser
> --
>
> Key: TIKA-1279
> URL: https://issues.apache.org/jira/browse/TIKA-1279
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.5
>Reporter: Hong-Thai Nguyen
>Assignee: Hong-Thai Nguyen
>Priority: Trivial
> Fix For: 1.6
>
>
> xhtml output is on a single line.





[jira] [Comment Edited] (TIKA-1278) Expose PDF Avg Char and Spacing Tolerance Config Params

2014-04-24 Thread Ray Gauss II (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979700#comment-13979700
 ] 

Ray Gauss II edited comment on TIKA-1278 at 4/24/14 1:31 PM:
-

Resolved in r1589722.

The setting of {{PDF2XHTML}} params was also moved from {{PDF2XHTML.process}} 
to a new {{PDFParserConfig.configure}} method which should allow developers to 
extend {{PDFParserConfig}} for custom behavior.
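
The extension pattern described above can be sketched roughly as follows. This is not the Tika code: `Stripper` and `Config` are hypothetical stand-ins for {{PDF2XHTML}} and {{PDFParserConfig}}, and the default values are arbitrary placeholders.

```java
// Illustrative stand-ins only; none of these classes exist in Tika.
final class ConfigExtensionSketch {

    // Stand-in for the object being configured.
    static class Stripper {
        float averageCharTolerance;
        float spacingTolerance;
    }

    // Stand-in for the config class: it pushes its own settings into the
    // stripper, so a subclass can change how configuration is applied.
    static class Config {
        float averageCharTolerance = 0.3f; // placeholder value
        float spacingTolerance = 0.5f;     // placeholder value

        void configure(Stripper stripper) {
            stripper.averageCharTolerance = averageCharTolerance;
            stripper.spacingTolerance = spacingTolerance;
        }
    }

    // A custom config that reuses the base behavior and then overrides one setting.
    static class WideSpacingConfig extends Config {
        @Override
        void configure(Stripper stripper) {
            super.configure(stripper);
            stripper.spacingTolerance = 1.0f;
        }
    }

    public static void main(String[] args) {
        Stripper stripper = new Stripper();
        new WideSpacingConfig().configure(stripper);
        System.out.println("spacingTolerance = " + stripper.spacingTolerance);
    }
}
```

The point of the design is that the code doing the parsing only calls `configure`, so custom behavior lives entirely in the config subclass.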


was (Author: rgauss):
Resolved in r1589722.

> Expose PDF Avg Char and Spacing Tolerance Config Params
> ---
>
> Key: TIKA-1278
> URL: https://issues.apache.org/jira/browse/TIKA-1278
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.5
>Reporter: Ray Gauss II
>Assignee: Ray Gauss II
> Fix For: 1.6
>
>
> {{PDFParserConfig}} should allow for override of PDFBox's 
> {{averageCharTolerance}} and {{spacingTolerance}} settings as noted by a TODO 
> comment in {{PDF2XHTML}}.
> Additionally, {{PDF2XHTML}}'s use of {{PDFParserConfig}} should be changed 
> slightly to allow for extension of that config class and its configuration 
> behavior.





[jira] [Resolved] (TIKA-1278) Expose PDF Avg Char and Spacing Tolerance Config Params

2014-04-24 Thread Ray Gauss II (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Gauss II resolved TIKA-1278.


Resolution: Fixed

Resolved in r1589722.

> Expose PDF Avg Char and Spacing Tolerance Config Params
> ---
>
> Key: TIKA-1278
> URL: https://issues.apache.org/jira/browse/TIKA-1278
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.5
>Reporter: Ray Gauss II
>Assignee: Ray Gauss II
> Fix For: 1.6
>
>
> {{PDFParserConfig}} should allow for override of PDFBox's 
> {{averageCharTolerance}} and {{spacingTolerance}} settings as noted by a TODO 
> comment in {{PDF2XHTML}}.
> Additionally, {{PDF2XHTML}}'s use of {{PDFParserConfig}} should be changed 
> slightly to allow for extension of that config class and it's configuration 
> behavior.





[jira] [Updated] (TIKA-1278) Expose PDF Avg Char and Spacing Tolerance Config Params

2014-04-24 Thread Ray Gauss II (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Gauss II updated TIKA-1278:
---

Description: 
{{PDFParserConfig}} should allow for override of PDFBox's 
{{averageCharTolerance}} and {{spacingTolerance}} settings as noted by a TODO 
comment in {{PDF2XHTML}}.

Additionally, {{PDF2XHTML}}'s use of {{PDFParserConfig}} should be changed 
slightly to allow for extension of that config class and its configuration 
behavior.

  was:
{{PDFParserConfig}} should allow for override of PDFBox's 
{{averageCharTolerance}} and {{spacingTolerance}} settings as noted by a TODO 
comment in {{PDF2XHTML}}.

Additionally, {{PDF2XHTML}}'s use of {{PDFParserConfig}} should be changed 
slightly to allow for extension of that config class and it's configuration 
behavior.


> Expose PDF Avg Char and Spacing Tolerance Config Params
> ---
>
> Key: TIKA-1278
> URL: https://issues.apache.org/jira/browse/TIKA-1278
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.5
>Reporter: Ray Gauss II
>Assignee: Ray Gauss II
> Fix For: 1.6
>
>
> {{PDFParserConfig}} should allow for override of PDFBox's 
> {{averageCharTolerance}} and {{spacingTolerance}} settings as noted by a TODO 
> comment in {{PDF2XHTML}}.
> Additionally, {{PDF2XHTML}}'s use of {{PDFParserConfig}} should be changed 
> slightly to allow for extension of that config class and its configuration 
> behavior.





[jira] [Reopened] (TIKA-1276) Missing embedded dependencies in tika-bundle

2014-04-24 Thread Nick Burch (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Burch reopened TIKA-1276:
--


Re-opening, as we don't yet have any unit tests for this problem

> Missing embedded dependencies in tika-bundle
> 
>
> Key: TIKA-1276
> URL: https://issues.apache.org/jira/browse/TIKA-1276
> Project: Tika
>  Issue Type: Bug
>  Components: packaging
>Affects Versions: 1.5
> Environment: OSGI, Apache Felix via Apache Sling Launcher
>Reporter: Rupert Westenthaler
> Fix For: 1.6
>
> Attachments: TIKA-1276_20140423_rwesten.diff
>
>
> While updating from tika 1.2 to 1.5 I noticed that the 
> `org.apache.tika:tika-bundle:1.5` module has some missing dependencies.
> 1. `com.uwyn:jhighlight:1.0` is not embedded
> Because of that installing the bundle results in the following exception
> {code}
> org.osgi.framework.BundleException: Unresolved constraint in bundle 
> org.apache.tika.bundle [103]: Unable to resolve 103.0: missing requirement 
> [103.0] osgi.wiring.package; 
> (osgi.wiring.package=com.uwyn.jhighlight.renderer))
> org.osgi.framework.BundleException: Unresolved constraint in bundle 
> org.apache.tika.bundle [103]: Unable to resolve 103.0: missing requirement 
> [103.0] osgi.wiring.package; 
> (osgi.wiring.package=com.uwyn.jhighlight.renderer)
>   at 
> org.apache.felix.framework.Felix.resolveBundleRevision(Felix.java:3962)
>   at org.apache.felix.framework.Felix.startBundle(Felix.java:2025)
>   at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1279)
>   at 
> org.apache.felix.framework.FrameworkStartLevelImpl.run(FrameworkStartLevelImpl.java:304)
>   at java.lang.Thread.run(Thread.java:744)
> {code}
> 2. `org.ow2.asm:asm:4.1` is not embedded because 
> `org.apache.tika:tika-core:1.5` uses `org.ow2.asm:asm-debug-all:4.1` and 
> therefore the `Embed-Dependency` directive `asm` does not match any 
> dependency. 
> Because of that one gets the following exception (after fixing (1))
> {code}
> org.osgi.framework.BundleException: Unresolved constraint in bundle 
> org.apache.tika.bundle [96]: Unable to resolve 96.0: missing requirement 
> [96.0] osgi.wiring.package; 
> (&(osgi.wiring.package=org.objectweb.asm)(version>=4.1.0)(!(version>=5.0.0
> org.osgi.framework.BundleException: Unresolved constraint in bundle 
> org.apache.tika.bundle [96]: Unable to resolve 96.0: missing requirement 
> [96.0] osgi.wiring.package; 
> (&(osgi.wiring.package=org.objectweb.asm)(version>=4.1.0)(!(version>=5.0.0)))
>   at 
> org.apache.felix.framework.Felix.resolveBundleRevision(Felix.java:3962)
>   at org.apache.felix.framework.Felix.startBundle(Felix.java:2025)
>   at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1279)
>   at 
> org.apache.felix.framework.FrameworkStartLevelImpl.run(FrameworkStartLevelImpl.java:304)
>   at java.lang.Thread.run(Thread.java:744)
> {code}
> There are two possibilities to fix this: (a) change the `Embed-Dependency` to 
> `asm-debug-all` or (b) add a dependency on `org.ow2.asm:asm:4.1` to the 
> tika-bundle pom file.
> 3. `edu.ucar:netcdf:4.2-min` is not embedded
> Because of that one gets the following exception (after fixing (1) and 
> (2))
> {code}
> org.osgi.framework.BundleException: Unresolved constraint in bundle 
> org.apache.tika.bundle [96]: Unable to resolve 96.0: missing requirement 
> [96.0] osgi.wiring.package; (osgi.wiring.package=ucar.ma2))
> org.osgi.framework.BundleException: Unresolved constraint in bundle 
> org.apache.tika.bundle [96]: Unable to resolve 96.0: missing requirement 
> [96.0] osgi.wiring.package; (osgi.wiring.package=ucar.ma2)
>   at 
> org.apache.felix.framework.Felix.resolveBundleRevision(Felix.java:3962)
>   at org.apache.felix.framework.Felix.startBundle(Felix.java:2025)
>   at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1279)
>   at 
> org.apache.felix.framework.FrameworkStartLevelImpl.run(FrameworkStartLevelImpl.java:304)
>   at java.lang.Thread.run(Thread.java:744)
> {code}
> 4. The `com.adobe.xmp:xmpcore:5.1.2` dependency is required at runtime
> After fixing the above issues the tika-bundle started successfully. 
> However, when extracting EXIF metadata from a JPEG image I got the following 
> exception.
> {code}
> java.lang.NoClassDefFoundError: com/adobe/xmp/XMPException
>   at 
> com.drew.imaging.jpeg.JpegMetadataReader.extractMetadataFromJpegSegmentReader(JpegMetadataReader.java:112)
>   at 
> com.drew.imaging.jpeg.JpegMetadataReader.readMetadata(JpegMetadataReader.java:71)
>   at 
> org.apache.tika.parser.image.ImageMetadataExtractor.parseJpeg(ImageMetadataExtractor.java:91)
>   at org.apache.tika.parser.jpeg.JpegParser.parse(JpegParser.java:56)
>   [..]
> {code}
> Embedding xmpcore in the tika-bundle solved this iss

[jira] [Resolved] (TIKA-1276) Missing embedded dependencies in tika-bundle

2014-04-24 Thread Hong-Thai Nguyen (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong-Thai Nguyen resolved TIKA-1276.


Resolution: Fixed

Thanks [~rwesten], I added your patch in r1589717


[jira] [Updated] (TIKA-1276) Missing embedded dependencies in tika-bundle

2014-04-24 Thread Hong-Thai Nguyen (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong-Thai Nguyen updated TIKA-1276:
---

Fix Version/s: 1.6


[jira] [Commented] (TIKA-1278) Expose PDF Avg Char and Spacing Tolerance Config Params

2014-04-24 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979676#comment-13979676
 ] 

Tim Allison commented on TIKA-1278:
---

+1.  Let me know if there's anything I can do to help.

> Expose PDF Avg Char and Spacing Tolerance Config Params
> ---
>
> Key: TIKA-1278
> URL: https://issues.apache.org/jira/browse/TIKA-1278
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.5
>Reporter: Ray Gauss II
>Assignee: Ray Gauss II
> Fix For: 1.6
>
>
> {{PDFParserConfig}} should allow for override of PDFBox's 
> {{averageCharTolerance}} and {{spacingTolerance}} settings as noted by a TODO 
> comment in {{PDF2XHTML}}.
> Additionally, {{PDF2XHTML}}'s use of {{PDFParserConfig}} should be changed 
> slightly to allow for extension of that config class and it's configuration 
> behavior.





[jira] [Resolved] (TIKA-1279) Missing return lines at output of SourceCodeParser

2014-04-24 Thread Hong-Thai Nguyen (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong-Thai Nguyen resolved TIKA-1279.


Resolution: Fixed

Fixed at r1589687

> Missing return lines at output of SourceCodeParser
> --
>
> Key: TIKA-1279
> URL: https://issues.apache.org/jira/browse/TIKA-1279
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.5
>Reporter: Hong-Thai Nguyen
>Priority: Trivial
> Fix For: 1.6
>
>
> xhtml output is on a single line.





[jira] [Commented] (TIKA-1224) Adding Source code (Java, Groovy, C) parser

2014-04-24 Thread Hong-Thai Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979614#comment-13979614
 ] 

Hong-Thai Nguyen commented on TIKA-1224:


Thanks [~ben.12] for the feedback.
For the line return problem in the output, I created a new issue: TIKA-1279
For the -t option in TikaCLI, the mimetype of a Java file is ambiguous. It could 
be text/plain (in which case TxtParser is used and returns the original text as 
is) or x-java-source (in which case SourceCodeParser is used).

For the -h option, the output is normally something like:
{code}
Author: Hong-Thai.Nguyen
Content-Encoding: windows-1252
Content-Length: 4899
Content-Type: text/x-java-source
LoC: 133
creator: Hong-Thai.Nguyen
dc:creator: Hong-Thai.Nguyen
meta:author: Hong-Thai.Nguyen
resourceName: SourceCodeParser.java
{code}
The creator comes from the 'author' annotation in the Javadoc.
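
As an illustration only, pulling an author name out of a Javadoc `@author` tag could look like the sketch below. This is not how SourceCodeParser is implemented; the class `AuthorTagSketch` and its regular expression are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the SourceCodeParser implementation.
final class AuthorTagSketch {

    // Matches "@author", then spaces or tabs, then the rest of that line.
    private static final Pattern AUTHOR =
            Pattern.compile("@author[ \\t]+([^\\r\\n]+)");

    // Returns the value of the first @author tag, or null if there is none.
    public static String firstAuthor(String source) {
        Matcher m = AUTHOR.matcher(source);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String source = "/**\n * @author Hong-Thai.Nguyen\n */\nclass A {}";
        System.out.println("creator: " + firstAuthor(source));
    }
}
```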

This parser is quite generic (quick and dirty, as mentioned by [~kkrugler]) and 
simplistic. We can make a more dedicated Java source parser and extract more 
metadata (members, attributes...). If you are interested in this kind of parser, 
please create a new issue; an investigation into this work would be warmly welcome.

Regards,

> Adding Source code (Java, Groovy, C) parser
> ---
>
> Key: TIKA-1224
> URL: https://issues.apache.org/jira/browse/TIKA-1224
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.5
>Reporter: Hong-Thai Nguyen
>Priority: Minor
>
> We can parse some source code file formats:
> text/x-java-source
> text/x-groovy
> text/x-c
> for HTML rendering from code, we can use jhighlight: 
> http://www.ohloh.net/p/jhighlight





[jira] [Created] (TIKA-1279) Missing return lines at output of SourceCodeParser

2014-04-24 Thread Hong-Thai Nguyen (JIRA)
Hong-Thai Nguyen created TIKA-1279:
--

 Summary: Missing return lines at output of SourceCodeParser
 Key: TIKA-1279
 URL: https://issues.apache.org/jira/browse/TIKA-1279
 Project: Tika
  Issue Type: Bug
  Components: parser
Affects Versions: 1.5
Reporter: Hong-Thai Nguyen
Priority: Trivial
 Fix For: 1.6


xhtml output is on a single line.





[jira] [Created] (TIKA-1278) Expose PDF Avg Char and Spacing Tolerance Config Params

2014-04-24 Thread Ray Gauss II (JIRA)
Ray Gauss II created TIKA-1278:
--

 Summary: Expose PDF Avg Char and Spacing Tolerance Config Params
 Key: TIKA-1278
 URL: https://issues.apache.org/jira/browse/TIKA-1278
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.5
Reporter: Ray Gauss II
Assignee: Ray Gauss II
 Fix For: 1.6


{{PDFParserConfig}} should allow for override of PDFBox's 
{{averageCharTolerance}} and {{spacingTolerance}} settings as noted by a TODO 
comment in {{PDF2XHTML}}.

Additionally, {{PDF2XHTML}}'s use of {{PDFParserConfig}} should be changed 
slightly to allow for extension of that config class and it's configuration 
behavior.





[jira] [Commented] (TIKA-1224) Adding Source code (Java, Groovy, C) parser

2014-04-24 Thread Benoit Moreau (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979379#comment-13979379
 ] 

Benoit Moreau commented on TIKA-1224:
-

In debug mode, Tika uses org.apache.tika.SourceCodeParser for the "x-java-source" 
mime-type. It removes all line endings (why? Is it a mistake? readLine() doesn't 
return \n or \r), then passes the result to JHighlight. The JHighlight result 
(the entire HTML) is used as the argument of the characters() method of ContentHandler.

I'm just starting with Tika, but I don't think that is good.
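
The readLine() behavior mentioned above can be shown in a small standalone sketch. It is not the SourceCodeParser code; the class `KeepLineBreaks` is hypothetical and the choice of "\n" as the re-added separator is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical sketch, not the SourceCodeParser implementation.
final class KeepLineBreaks {

    // BufferedReader.readLine() returns each line WITHOUT its terminator,
    // so a separator has to be appended again when rebuilding the text.
    public static String rejoinLines(String source) {
        StringBuilder out = new StringBuilder();
        BufferedReader reader = new BufferedReader(new StringReader(source));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(rejoinLines("int a;\r\nint b;"));
    }
}
```

Without the `append('\n')` call the two statements would end up on a single line, which matches the symptom reported in TIKA-1279.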

> Adding Source code (Java, Groovy, C) parser
> ---
>
> Key: TIKA-1224
> URL: https://issues.apache.org/jira/browse/TIKA-1224
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.5
>Reporter: Hong-Thai Nguyen
>Priority: Minor
>
> We can parse some source code file formats:
> text/x-java-source
> text/x-groovy
> text/x-c
> for HTML rendering from code, we can use jhighlight: 
> http://www.ohloh.net/p/jhighlight


