[ 
https://issues.apache.org/jira/browse/TIKA-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643470#comment-13643470
 ] 

Alexander Chow commented on TIKA-1112:
--------------------------------------

The files attached here and to TIKA-1113 are two public-domain test files I 
have used.  I have tried other files as well and hit similar problems.

Running {{ogginfo}} and {{oggdec}} against these two OGV files shows them to be 
fine overall.  For Typing_example.ogv, the check does return the warning "EOS 
not set on stream 1", but the file seems fine otherwise.  Are there any other 
tools you think I should verify things against?
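
One more independent check could be a small page walker along the lines below. This is only a rough sketch under my own assumptions: it follows the page layout and CRC described in the Ogg framing specification (RFC 3533), the class and method names ({{OggPageCheck}}, {{check}}, {{crc}}) are made up for illustration and are not part of Tika or VorbisJava, and unlike real tools it does not try to resynchronise after garbage between pages.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Tika or VorbisJava.
final class OggPageCheck {

    // Ogg page CRC: polynomial 0x04C11DB7, initial value 0,
    // no bit reflection, no final XOR. "crc" is the running value.
    static int crc(byte[] data, int crc) {
        for (byte b : data) {
            crc ^= (b & 0xFF) << 24;
            for (int i = 0; i < 8; i++) {
                crc = (crc & 0x80000000) != 0 ? (crc << 1) ^ 0x04C11DB7 : crc << 1;
            }
        }
        return crc;
    }

    // Walks every page and returns {pageCount, badChecksumCount}.
    // A page cut short by end of file surfaces as an EOFException.
    static int[] check(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] header = new byte[27]; // fixed-size part of the page header
        int pages = 0;
        int bad = 0;
        while (true) {
            int first = data.read();
            if (first < 0) {
                break; // clean end of file on a page boundary
            }
            header[0] = (byte) first;
            data.readFully(header, 1, 26);
            if (header[0] != 'O' || header[1] != 'g' || header[2] != 'g' || header[3] != 'S') {
                throw new IOException("No OggS capture pattern at page index " + pages);
            }
            // Byte 26 is the segment count; the segment table gives the body length.
            byte[] table = new byte[header[26] & 0xFF];
            data.readFully(table);
            int bodyLength = 0;
            for (byte lacing : table) {
                bodyLength += lacing & 0xFF;
            }
            byte[] body = new byte[bodyLength];
            data.readFully(body);
            // The stored CRC is little-endian in bytes 22..25 and is
            // computed over the whole page with those bytes set to zero.
            int stored = (header[22] & 0xFF) | (header[23] & 0xFF) << 8
                    | (header[24] & 0xFF) << 16 | (header[25] & 0xFF) << 24;
            header[22] = header[23] = header[24] = header[25] = 0;
            int computed = crc(body, crc(table, crc(header, 0)));
            if (computed != stored) {
                bad++;
            }
            pages++;
        }
        return new int[] {pages, bad};
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new BufferedInputStream(new FileInputStream(args[0]));
        try {
            int[] result = check(in);
            System.out.println("pages=" + result[0] + " badChecksums=" + result[1]);
        } finally {
            in.close();
        }
    }
}
```

If a walker like this reports no bad checksums and no truncated page for a file that still produces the warnings under Tika 1.3, that would point at the reading side rather than at the file.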

These are just a couple of the test files I have used for a while.  We only 
started seeing a problem when I upgraded from Tika 0.9 to 1.3 (which of course 
now includes the VorbisJava library).
                
> Parsing for OGV file with invalid checksum
> ------------------------------------------
>
>                 Key: TIKA-1112
>                 URL: https://issues.apache.org/jira/browse/TIKA-1112
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata, parser
>    Affects Versions: 1.3
>         Environment: OS X 10.8.3
> JDK 1.6.0_45 64-bit
>            Reporter: Alexander Chow
>
> When parsing any OGV file (e.g., 
> [Typing_example.ogv|http://commons.wikimedia.org/wiki/File:Typing_example.ogv]),
> the log shows something like the following:
> {code}
> Warning - invalid checksum on page 2 of stream 155f (5471)
> Warning - invalid checksum on page 3 of stream 155f (5471)
> Warning - invalid checksum on page 4 of stream 155f (5471)
> Warning - invalid checksum on page 5 of stream 155f (5471)
> Warning - invalid checksum on page 6 of stream 155f (5471)
> Warning - invalid checksum on page 7 of stream 155f (5471)
> ...
> Warning - invalid checksum on page 3071 of stream 155f (5471)
> Warning - invalid checksum on page 3072 of stream 155f (5471)
> Warning - invalid checksum on page 3073 of stream 155f (5471)
> Warning - invalid checksum on page 3074 of stream 155f (5471)
> Exception in thread "main" java.io.IOException: Asked to read 4228 bytes from 0 but hit EoF at 2884
>       at org.gagravarr.ogg.IOUtils.readFully(IOUtils.java:39)
>       at org.gagravarr.ogg.IOUtils.readFully(IOUtils.java:31)
>       at org.gagravarr.ogg.OggPage.<init>(OggPage.java:82)
>       at org.gagravarr.ogg.OggPacketReader.getNextPacket(OggPacketReader.java:116)
>       at org.gagravarr.tika.OggDetector.detect(OggDetector.java:79)
>       at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
>       at com.test.OGVTest.main(OGVTest.java:31)
> {code}
> My test code was the following:
> {code:java}
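>       // Parses the given file with Tika's auto-detecting parser and prints its metadata.
>       void parse(String fileName) throws Exception {
>               InputStream inputStream = new FileInputStream(fileName);
>               try {
>                       Metadata metadata = new Metadata();
>                       Parser parser = new AutoDetectParser();
>
>                       // Register the parser so embedded documents are parsed as well.
>                       ParseContext parserContext = new ParseContext();
>                       parserContext.set(Parser.class, parser);
>
>                       // DummyWriter is a helper class not shown in this report.
>                       ContentHandler contentHandler = new WriteOutContentHandler(
>                               new DummyWriter());
>                       parser.parse(inputStream, contentHandler, metadata, parserContext);
>
>                       System.out.println(metadata);
>               } finally {
>                       // Close the stream even if detection or parsing throws.
>                       inputStream.close();
>               }
>       }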
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
