[ 
https://issues.apache.org/jira/browse/TIKA-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149012#comment-15149012
 ] 

Chris A. Mattmann commented on TIKA-1856:
-----------------------------------------

Hey Nick, it's possible they were truncated by content limits during the 
Nutch crawls. See http://github.com/chrismattmann/trec-dd-polar/ for a 
description of the dataset.
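
If it helps to confirm that, here is a minimal sketch (not Tika or 
gagravarr code) for checking whether the first Ogg page in a file declares 
more data than the file actually holds, which is what the "Asked to read 
4335 bytes ... hit EoF at 780" message in the quoted trace suggests. It 
assumes the standard Ogg page layout: a 27-byte header starting with 
"OggS", whose last byte is the segment count, followed by that many lacing 
values whose sum is the payload size. The function name 
first_page_truncated is made up for illustration.

```python
import io

OGG_CAPTURE = b"OggS"   # capture pattern that starts every Ogg page
OGG_HEADER_LEN = 27     # fixed-size part of an Ogg page header


def first_page_truncated(stream):
    """Return True if the first Ogg page declares more bytes than the stream holds.

    Hypothetical helper for illustration only.
    Raises ValueError if the stream does not begin with the Ogg capture pattern.
    """
    header = stream.read(OGG_HEADER_LEN)
    if header[:4] != OGG_CAPTURE:
        raise ValueError("not an Ogg stream")
    if len(header) < OGG_HEADER_LEN:
        return True  # cut off inside the fixed header

    n_segments = header[26]          # number of lacing values that follow
    lacing = stream.read(n_segments)
    if len(lacing) < n_segments:
        return True  # cut off inside the segment table

    expected = sum(lacing)           # payload size declared by the page
    body = stream.read(expected)
    return len(body) < expected      # fewer payload bytes than declared


if __name__ == "__main__":
    # Synthetic page: declares one 200-byte segment but carries only 50 bytes.
    page = OGG_CAPTURE + bytes(22) + bytes([1]) + bytes([200]) + bytes(50)
    print(first_page_truncated(io.BytesIO(page)))
```

Running something like this over the attachments, opened in binary mode, 
would show whether they are all cut short in the first page. It only looks 
at the first page, so a file truncated further in would not be flagged.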

> Error while parsing an ogg file
> -------------------------------
>
>                 Key: TIKA-1856
>                 URL: https://issues.apache.org/jira/browse/TIKA-1856
>             Project: Tika
>          Issue Type: Bug
>          Components: detector, parser
>    Affects Versions: 1.12
>         Environment: python
>            Reporter: Yash Tanna
>              Labels: newbie, tika
>         Attachments: 
> 1B7A7AE8FE999D22E2A677EFDA38982C8957CF77BEF33717777E48852F7D67A7, 
> 1DE811ACAB8432D526EFE9D941E5EFE58F3C89F1AAB6CB7152091961DD854431, 
> 4600B9FF184F6AB71AA0CF6873E580FB0A31D75CE1218998057E9A185A5FFBB2, 
> 5E5892EA6C2B4A07BE998403A04127C7924E5539DB3EB0D27B9BD34D11A1575B, 
> CA3065B754E6CE79E4BF128464F4A202B0F2CF0336FBE73FA33F13776CD01CE8, 
> F036789D92EE18032556D9D0ECAC75073CED52226E1833001E379740E23E183D, 
> F33BFE4B1AF562D40E5B9D9F5D4B34EA6734F8F3A06F99535F100F957958D9BA, 
> F47F833BFD4A7E55C128DD76DB3666EEFFD0F5EDA24BF3EEEE1D6F2427BA092D, 
> FA9D1D2B8D0FB50CFE306FA6024EC48BD771562878B9B70D38D106DF4E61147A
>
>
> Unable to detect a malformed ogg file. The error thrown was 
> Exception in thread "main" java.io.IOException: Asked to read 4335 bytes from 0 but hit EoF at 780
>         at org.gagravarr.ogg.IOUtils.readFully(IOUtils.java:39)
>         at org.gagravarr.ogg.IOUtils.readFully(IOUtils.java:31)
>         at org.gagravarr.ogg.OggPage.<init>(OggPage.java:82)
>         at org.gagravarr.ogg.OggPacketReader.getNextPacket(OggPacketReader.java:116)
>         at org.gagravarr.tika.OggDetector.detect(OggDetector.java:97)
>         at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
>         at org.apache.tika.cli.TikaCLI$10.process(TikaCLI.java:291)
>         at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:477)
>         at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:134)
> [xdatadeploy@xdata upload]$



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
