[jira] [Commented] (TIKA-1856) Error while parsing an ogg file
[ https://issues.apache.org/jira/browse/TIKA-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151071#comment-15151071 ]

Hudson commented on TIKA-1856:
------------------------------

SUCCESS: Integrated in tika-trunk-jdk1.7 #910 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/910/])
TIKA-1856 Upgrade the Ogg dependency for the truncated files fix (nick: rev 2eb49a721b77edf23c3588326c8d480332d79722)
* tika-parsers/pom.xml

> Error while parsing an ogg file
> -------------------------------
>
>                 Key: TIKA-1856
>                 URL: https://issues.apache.org/jira/browse/TIKA-1856
>             Project: Tika
>          Issue Type: Bug
>          Components: detector, parser
>    Affects Versions: 1.12
>         Environment: python
>            Reporter: Yash Tanna
>              Labels: newbie, tika
>             Fix For: 1.13
>
>         Attachments:
> 1B7A7AE8FE999D22E2A677EFDA38982C8957CF77BEF3371E48852F7D67A7,
> 1DE811ACAB8432D526EFE9D941E5EFE58F3C89F1AAB6CB7152091961DD854431,
> 4600B9FF184F6AB71AA0CF6873E580FB0A31D75CE1218998057E9A185A5FFBB2,
> 5E5892EA6C2B4A07BE998403A04127C7924E5539DB3EB0D27B9BD34D11A1575B,
> CA3065B754E6CE79E4BF128464F4A202B0F2CF0336FBE73FA33F13776CD01CE8,
> F036789D92EE18032556D9D0ECAC75073CED52226E1833001E379740E23E183D,
> F33BFE4B1AF562D40E5B9D9F5D4B34EA6734F8F3A06F99535F100F957958D9BA,
> F47F833BFD4A7E55C128DD76DB3666EEFFD0F5EDA24BF31D6F2427BA092D,
> FA9D1D2B8D0FB50CFE306FA6024EC48BD771562878B9B70D38D106DF4E61147A
>
>
> Unable to detect a malformed ogg file. The error thrown was
> Exception in thread "main" java.io.IOException: Asked to read 4335 bytes from 0 but hit EoF at 780
>         at org.gagravarr.ogg.IOUtils.readFully(IOUtils.java:39)
>         at org.gagravarr.ogg.IOUtils.readFully(IOUtils.java:31)
>         at org.gagravarr.ogg.OggPage.<init>(OggPage.java:82)
>         at org.gagravarr.ogg.OggPacketReader.getNextPacket(OggPacketReader.java:116)
>         at org.gagravarr.tika.OggDetector.detect(OggDetector.java:97)
>         at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
>         at org.apache.tika.cli.TikaCLI$10.process(TikaCLI.java:291)
>         at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:477)
>         at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:134)
> [xdatadeploy@xdata upload]$

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TIKA-1856) Error while parsing an ogg file
[ https://issues.apache.org/jira/browse/TIKA-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149012#comment-15149012 ]

Chris A. Mattmann commented on TIKA-1856:
-----------------------------------------

Hey Nick,

It's possible they were truncated during the Nutch crawls by content limits. See http://github.com/chrismattmann/trec-dd-polar/ for a description of the dataset.
[jira] [Commented] (TIKA-1856) Error while parsing an ogg file
[ https://issues.apache.org/jira/browse/TIKA-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148861#comment-15148861 ]

Yash Tanna commented on TIKA-1856:
----------------------------------

The files are part of the TREC Dynamic Domain Polar Dataset, which was collected by [~chrismattmann] and his students.
[jira] [Commented] (TIKA-1856) Error while parsing an ogg file
[ https://issues.apache.org/jira/browse/TIKA-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148629#comment-15148629 ]

Nick Burch commented on TIKA-1856:
----------------------------------

Picking one of those files to look at, {{oggz-info}} processes it without warning. {{ogginfo}} warns about the EOS being missing on both streams, but otherwise gives no errors.

Trying with mplayer, it reports some issues with the file:
{code}
[vorbis @ 0x7f1470f5cb00]partition out of bounds: type, begin, end, size, blocksize: 2, 0, 192, 16, 1024
[vorbis @ 0x7f1470f5cb00] Vorbis setup header packet corrupt (residues).
[vorbis @ 0x7f1470f5cb00]Setup header corrupt.
Could not open codec.
{code}

Do you know where these files came from? It looks like they have been truncated somehow, could that be the case? (If so, we'd probably just need to improve the truncation error handling.)
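The reported exception is consistent with an Ogg page header that declares more payload than the file actually contains. As a rough illustration of what more forgiving truncation handling could look like, here is a minimal sketch that classifies the first page of a byte buffer instead of throwing. It is not the fix that went into the Ogg dependency; the class {{OggTruncationCheck}}, its {{Status}} enum and the method {{checkFirstPage}} are hypothetical names made up for this example. The only facts it relies on are the Ogg page layout: the {{OggS}} capture pattern, a 27-byte fixed header whose last byte is the segment count, and a segment table whose entries add up to the payload size.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of Tika or vorbis-java. */
final class OggTruncationCheck {
    enum Status { NOT_OGG, TRUNCATED, COMPLETE }

    // Every Ogg page starts with this 4-byte capture pattern.
    private static final byte[] CAPTURE = "OggS".getBytes(StandardCharsets.US_ASCII);
    // Size of the fixed part of the page header; byte 26 holds the segment count.
    private static final int HEADER_SIZE = 27;

    /** Classifies the first Ogg page in data without throwing on short input. */
    static Status checkFirstPage(byte[] data) {
        // Too short to hold the capture pattern, or the pattern does not match.
        if (data.length < CAPTURE.length) {
            return Status.NOT_OGG;
        }
        for (int i = 0; i < CAPTURE.length; i++) {
            if (data[i] != CAPTURE[i]) {
                return Status.NOT_OGG;
            }
        }
        // Looks like Ogg, but the fixed header itself is cut short.
        if (data.length < HEADER_SIZE) {
            return Status.TRUNCATED;
        }
        int segments = data[26] & 0xFF;
        // The segment table (one byte per segment) must be fully present.
        if (data.length < HEADER_SIZE + segments) {
            return Status.TRUNCATED;
        }
        // The payload size is the sum of the segment table entries.
        int payload = 0;
        for (int i = 0; i < segments; i++) {
            payload += data[HEADER_SIZE + i] & 0xFF;
        }
        int pageSize = HEADER_SIZE + segments + payload;
        return data.length >= pageSize ? Status.COMPLETE : Status.TRUNCATED;
    }

    public static void main(String[] args) {
        // Synthetic page: fixed header, one segment declaring 10 payload bytes.
        byte[] page = new byte[HEADER_SIZE + 1 + 10];
        System.arraycopy(CAPTURE, 0, page, 0, CAPTURE.length);
        page[26] = 1;
        page[27] = 10;
        System.out.println(checkFirstPage(page));                    // COMPLETE
        System.out.println(checkFirstPage(Arrays.copyOf(page, 30))); // TRUNCATED
    }
}
```

A detector built along these lines could still report the stream as Ogg when the capture pattern matches and treat {{TRUNCATED}} as a warning rather than a fatal error. Whether that is the right behaviour for Tika is a design decision this sketch does not settle.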
[jira] [Commented] (TIKA-1856) Error while parsing an ogg file
[ https://issues.apache.org/jira/browse/TIKA-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148009#comment-15148009 ]

Chris A. Mattmann commented on TIKA-1856:
-----------------------------------------

Thanks [~yashtanna93], can you please attach the file? [~gagravarr], can you have a look?