[ https://issues.apache.org/jira/browse/TIKA-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643470#comment-13643470 ]
Alexander Chow commented on TIKA-1112:
--------------------------------------

The files attached here and in TIKA-1113 are two public-domain test files I have used. I have tried other files as well and see similar problems.

Checking these two OGV files with {{ogginfo}} and {{oggdec}} shows them to be fine overall. For Typing_example.ogv, the check does return the warning "EOS not set on stream 1", but the file seems fine otherwise. Are there any other tools you think I should verify things against?

These are just a couple of the test files I have used for a while. We only started having a problem when I upgraded from Tika 0.9 to 1.3 (which of course now includes the VorbisJava library).

> Parsing for OGV file with invalid checksum
> ------------------------------------------
>
>                 Key: TIKA-1112
>                 URL: https://issues.apache.org/jira/browse/TIKA-1112
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata, parser
>    Affects Versions: 1.3
>         Environment: OS X 10.8.3
>                      JDK 1.6.0_45 64-bit
>            Reporter: Alexander Chow
>
> When parsing any OGV file (e.g., [Typing_example.ogv|http://commons.wikimedia.org/wiki/File:Typing_example.ogv]), the log will output something like the following:
> {code}
> Warning - invalid checksum on page 2 of stream 155f (5471)
> Warning - invalid checksum on page 3 of stream 155f (5471)
> Warning - invalid checksum on page 4 of stream 155f (5471)
> Warning - invalid checksum on page 5 of stream 155f (5471)
> Warning - invalid checksum on page 6 of stream 155f (5471)
> Warning - invalid checksum on page 7 of stream 155f (5471)
> ...
> Warning - invalid checksum on page 3071 of stream 155f (5471)
> Warning - invalid checksum on page 3072 of stream 155f (5471)
> Warning - invalid checksum on page 3073 of stream 155f (5471)
> Warning - invalid checksum on page 3074 of stream 155f (5471)
> Exception in thread "main" java.io.IOException: Asked to read 4228 bytes from 0 but hit EoF at 2884
> 	at org.gagravarr.ogg.IOUtils.readFully(IOUtils.java:39)
> 	at org.gagravarr.ogg.IOUtils.readFully(IOUtils.java:31)
> 	at org.gagravarr.ogg.OggPage.<init>(OggPage.java:82)
> 	at org.gagravarr.ogg.OggPacketReader.getNextPacket(OggPacketReader.java:116)
> 	at org.gagravarr.tika.OggDetector.detect(OggDetector.java:79)
> 	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
> 	at com.test.OGVTest.main(OGVTest.java:31)
> {code}
> My test code was the following:
> {code:java}
> void parse(String fileName) throws Exception {
>     InputStream inputStream = new FileInputStream(fileName);
>
>     Metadata metadata = new Metadata();
>
>     Parser parser = new AutoDetectParser();
>
>     ParseContext parserContext = new ParseContext();
>     parserContext.set(Parser.class, parser);
>     ContentHandler contentHandler = new WriteOutContentHandler(
>             new DummyWriter());
>     parser.parse(inputStream, contentHandler, metadata,
>             parserContext);
>
>     System.out.println(metadata);
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira