[ https://issues.apache.org/jira/browse/TIKA-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643470#comment-13643470 ]
Alexander Chow commented on TIKA-1112:
--------------------------------------
The file here and the one in TIKA-1113 are two public-domain test files I have
used. I have tried other files as well and see similar problems.
When I run {{ogginfo}} and {{oggdec}} against these two OGV files, both check out
fine overall. For Typing_example.ogv there is a warning that "EOS not set on
stream 1", but otherwise the file seems fine. Are there any other tools you
think I should verify them against?
These are just two of the test files I have been using for a while. The problem
only appeared when I upgraded from Tika 0.9 to 1.3 (which now includes the
VorbisJava library).
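To compare against the "EOS not set on stream 1" warning, here is a minimal sketch that walks the Ogg pages of a file and reports, for each logical stream serial number, whether any of its pages carries the end-of-stream flag (bit 0x04 of the header_type byte, as defined in the Ogg framing spec, RFC 3533). The class name OggEosCheck is hypothetical and is not part of Tika or VorbisJava. The sketch reads the whole file into memory and does not verify page checksums.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Tika or VorbisJava): EOS flag per Ogg stream. */
public class OggEosCheck {
    // header_type flag bit for "last page of logical bitstream" (RFC 3533)
    static final int FLAG_EOS = 0x04;

    /** Maps each stream serial number to true if any of its pages has EOS set. */
    public static Map<Long, Boolean> eosByStream(byte[] data) {
        Map<Long, Boolean> eos = new LinkedHashMap<Long, Boolean>();
        int pos = 0;
        while (pos + 27 <= data.length) {
            // Every page starts with the capture pattern "OggS"
            if (data[pos] != 'O' || data[pos + 1] != 'g'
                    || data[pos + 2] != 'g' || data[pos + 3] != 'S') {
                throw new IllegalArgumentException("No OggS page at offset " + pos);
            }
            int flags = data[pos + 5] & 0xFF;
            long serial = u32(data, pos + 14);
            int segments = data[pos + 26] & 0xFF;
            if (pos + 27 + segments > data.length) {
                break; // truncated segment table
            }
            // Body length is the sum of the lacing values in the segment table
            int bodyLen = 0;
            for (int i = 0; i < segments; i++) {
                bodyLen += data[pos + 27 + i] & 0xFF;
            }
            boolean seen = Boolean.TRUE.equals(eos.get(serial));
            eos.put(serial, seen || (flags & FLAG_EOS) != 0);
            pos += 27 + segments + bodyLen;
        }
        return eos;
    }

    /** Reads an unsigned 32-bit little-endian value. */
    static long u32(byte[] d, int off) {
        return (d[off] & 0xFFL) | (d[off + 1] & 0xFFL) << 8
             | (d[off + 2] & 0xFFL) << 16 | (d[off + 3] & 0xFFL) << 24;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (Map.Entry<Long, Boolean> e : eosByStream(data).entrySet()) {
            System.out.printf("stream %x (%d): EOS %s%n", e.getKey(), e.getKey(),
                    e.getValue() ? "set" : "not set");
        }
    }
}
```

The serial is printed in hex and decimal, matching the "stream 155f (5471)" style of the warnings in the report (0x155f = 5471).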
> Parsing for OGV file with invalid checksum
> ------------------------------------------
>
> Key: TIKA-1112
> URL: https://issues.apache.org/jira/browse/TIKA-1112
> Project: Tika
> Issue Type: Bug
> Components: metadata, parser
> Affects Versions: 1.3
> Environment: OS X 10.8.3
> JDK 1.6.0_45 64-bit
> Reporter: Alexander Chow
>
> When parsing any OGV file (e.g.,
> [Typing_example.ogv|http://commons.wikimedia.org/wiki/File:Typing_example.ogv]),
> the log outputs something like the following:
> {code}
> Warning - invalid checksum on page 2 of stream 155f (5471)
> Warning - invalid checksum on page 3 of stream 155f (5471)
> Warning - invalid checksum on page 4 of stream 155f (5471)
> Warning - invalid checksum on page 5 of stream 155f (5471)
> Warning - invalid checksum on page 6 of stream 155f (5471)
> Warning - invalid checksum on page 7 of stream 155f (5471)
> ...
> Warning - invalid checksum on page 3071 of stream 155f (5471)
> Warning - invalid checksum on page 3072 of stream 155f (5471)
> Warning - invalid checksum on page 3073 of stream 155f (5471)
> Warning - invalid checksum on page 3074 of stream 155f (5471)
> Exception in thread "main" java.io.IOException: Asked to read 4228 bytes from 0 but hit EoF at 2884
>     at org.gagravarr.ogg.IOUtils.readFully(IOUtils.java:39)
>     at org.gagravarr.ogg.IOUtils.readFully(IOUtils.java:31)
>     at org.gagravarr.ogg.OggPage.<init>(OggPage.java:82)
>     at org.gagravarr.ogg.OggPacketReader.getNextPacket(OggPacketReader.java:116)
>     at org.gagravarr.tika.OggDetector.detect(OggDetector.java:79)
>     at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
>     at com.test.OGVTest.main(OGVTest.java:31)
> {code}
> My test code was the following:
> {code:java}
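> import java.io.FileInputStream;
> import java.io.InputStream;
>
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.AutoDetectParser;
> import org.apache.tika.parser.ParseContext;
> import org.apache.tika.parser.Parser;
> import org.apache.tika.sax.WriteOutContentHandler;
> import org.xml.sax.ContentHandler;
>
> void parse(String fileName) throws Exception {
>     InputStream inputStream = new FileInputStream(fileName);
>     try {
>         Metadata metadata = new Metadata();
>         Parser parser = new AutoDetectParser();
>
>         // Register the parser so that embedded documents are parsed as well
>         ParseContext parserContext = new ParseContext();
>         parserContext.set(Parser.class, parser);
>
>         // DummyWriter is the reporter's own Writer (not shown); presumably it discards the text
>         ContentHandler contentHandler = new WriteOutContentHandler(new DummyWriter());
>         parser.parse(inputStream, contentHandler, metadata, parserContext);
>
>         System.out.println(metadata);
>     } finally {
>         // Close the stream even if detection or parsing throws
>         inputStream.close();
>     }
> }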
> {code}
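The "invalid checksum" warnings in the log above concern the CRC stored in each Ogg page header. Since {{ogginfo}} accepts the same files, one way to cross-check the warnings is to recompute the CRC independently. This is a minimal sketch, assuming the Ogg framing rules from RFC 3533 and libogg: CRC-32 with generator polynomial 0x04C11DB7, initial value 0, no bit reflection and no final XOR, computed over the whole page (header plus body) with the four checksum bytes at offsets 22..25 treated as zero. OggPageCrc is a hypothetical helper, not part of Tika or VorbisJava.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper (not part of Tika or VorbisJava): recomputes Ogg page CRCs. */
public class OggPageCrc {
    // Lookup table for CRC-32 with generator polynomial 0x04C11DB7, MSB-first
    private static final int[] TABLE = new int[256];
    static {
        for (int i = 0; i < 256; i++) {
            int r = i << 24;
            for (int j = 0; j < 8; j++) {
                r = (r & 0x80000000) != 0 ? (r << 1) ^ 0x04C11DB7 : r << 1;
            }
            TABLE[i] = r;
        }
    }

    /** Ogg CRC: init 0, no reflection, no final XOR; bytes 22..25 count as zero. */
    public static int compute(byte[] data, int off, int len) {
        int crc = 0;
        for (int i = 0; i < len; i++) {
            int b = (i >= 22 && i < 26) ? 0 : data[off + i] & 0xFF;
            crc = (crc << 8) ^ TABLE[((crc >>> 24) ^ b) & 0xFF];
        }
        return crc;
    }

    /** Reads an unsigned 32-bit little-endian value. */
    static long u32(byte[] d, int off) {
        return (d[off] & 0xFFL) | (d[off + 1] & 0xFFL) << 8
             | (d[off + 2] & 0xFFL) << 16 | (d[off + 3] & 0xFFL) << 24;
    }

    /** 27-byte header + segment table + sum of lacing values; -1 if the table is truncated. */
    public static int pageLength(byte[] data, int off) {
        int segments = data[off + 26] & 0xFF;
        if (off + 27 + segments > data.length) {
            return -1;
        }
        int len = 27 + segments;
        for (int i = 0; i < segments; i++) {
            len += data[off + 27 + i] & 0xFF;
        }
        return len;
    }

    /** True if the checksum stored at bytes 22..25 matches the recomputed one. */
    public static boolean isValid(byte[] data, int off, int len) {
        return compute(data, off, len) == (int) u32(data, off + 22);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int off = 0;
        while (off + 27 <= data.length) {
            int len = pageLength(data, off);
            if (len < 0 || off + len > data.length) {
                System.out.printf("truncated page at offset %d%n", off);
                break;
            }
            if (!isValid(data, off, len)) {
                long serial = u32(data, off + 14);
                System.out.printf("invalid checksum on page %d of stream %x (%d)%n",
                        u32(data, off + 18), serial, serial);
            }
            off += len;
        }
    }
}
```

The page number printed here is the page sequence number from the header, which may not match the numbering VorbisJava uses in its warnings. If the sketch reports no mismatches for a file that VorbisJava warns about, that would suggest the warnings come from how the pages are read rather than from the file itself.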