[jira] [Commented] (TIKA-1856) Error while parsing an ogg file

2016-02-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151071#comment-15151071
 ] 

Hudson commented on TIKA-1856:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #910 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/910/])
TIKA-1856 Upgrade the Ogg dependency for the truncated files fix (nick: rev 
2eb49a721b77edf23c3588326c8d480332d79722)
* tika-parsers/pom.xml


> Error while parsing an ogg file
> ---
>
> Key: TIKA-1856
> URL: https://issues.apache.org/jira/browse/TIKA-1856
> Project: Tika
>  Issue Type: Bug
>  Components: detector, parser
>Affects Versions: 1.12
> Environment: python
>Reporter: Yash Tanna
>  Labels: newbie, tika
> Fix For: 1.13
>
> Attachments: 
> 1B7A7AE8FE999D22E2A677EFDA38982C8957CF77BEF3371E48852F7D67A7, 
> 1DE811ACAB8432D526EFE9D941E5EFE58F3C89F1AAB6CB7152091961DD854431, 
> 4600B9FF184F6AB71AA0CF6873E580FB0A31D75CE1218998057E9A185A5FFBB2, 
> 5E5892EA6C2B4A07BE998403A04127C7924E5539DB3EB0D27B9BD34D11A1575B, 
> CA3065B754E6CE79E4BF128464F4A202B0F2CF0336FBE73FA33F13776CD01CE8, 
> F036789D92EE18032556D9D0ECAC75073CED52226E1833001E379740E23E183D, 
> F33BFE4B1AF562D40E5B9D9F5D4B34EA6734F8F3A06F99535F100F957958D9BA, 
> F47F833BFD4A7E55C128DD76DB3666EEFFD0F5EDA24BF31D6F2427BA092D, 
> FA9D1D2B8D0FB50CFE306FA6024EC48BD771562878B9B70D38D106DF4E61147A
>
>
> Unable to detect the type of a malformed Ogg file. The error thrown was:
> Exception in thread "main" java.io.IOException: Asked to read 4335 bytes from 0 but hit EoF at 780
> at org.gagravarr.ogg.IOUtils.readFully(IOUtils.java:39)
> at org.gagravarr.ogg.IOUtils.readFully(IOUtils.java:31)
> at org.gagravarr.ogg.OggPage.<init>(OggPage.java:82)
> at org.gagravarr.ogg.OggPacketReader.getNextPacket(OggPacketReader.java:116)
> at org.gagravarr.tika.OggDetector.detect(OggDetector.java:97)
> at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
> at org.apache.tika.cli.TikaCLI$10.process(TikaCLI.java:291)
> at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:477)
> at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:134)
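In the trace, {{OggDetector}} asks the Ogg library for the next packet, and the {{OggPage}} constructor then tries to read 4335 bytes from a stream that ends after 780, so the IOException escapes all the way out of TikaCLI. One way to stop a short read like this from aborting detection is to treat end-of-data during detection as "type not fully determinable" and fall back to a coarser type. The Java below is a minimal sketch of that idea with hypothetical names ({{TolerantDetection}}, {{strictDetect}}, {{tolerantDetect}}); its {{readFully}} is a stand-in that only mimics the message shape in the trace. It is not the code in the upgraded Ogg dependency or in Tika, and the MIME type strings are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical sketch: downgrade a short read during detection to a fallback type. */
final class TolerantDetection {

    /** Reads exactly buf.length bytes, or throws EOFException (message mimics the trace). */
    static void readFully(InputStream in, byte[] buf) throws IOException {
        int read = 0;
        while (read < buf.length) {
            int n = in.read(buf, read, buf.length - read);
            if (n == -1) {
                throw new EOFException("Asked to read " + buf.length
                        + " bytes from 0 but hit EoF at " + read);
            }
            read += n;
        }
    }

    /** Strict step: needs the whole first packet before it can name the codec. */
    static String strictDetect(InputStream in, int packetLen) throws IOException {
        byte[] packet = new byte[packetLen];
        readFully(in, packet);
        // A Vorbis identification header starts with packet type 0x01 then "vorbis".
        byte[] magic = {0x01, 'v', 'o', 'r', 'b', 'i', 's'};
        if (packet.length >= magic.length
                && Arrays.equals(Arrays.copyOf(packet, magic.length), magic)) {
            return "audio/vorbis";
        }
        return "application/ogg";
    }

    /** Tolerant wrapper: truncated input yields the fallback instead of an exception. */
    static String tolerantDetect(InputStream in, int packetLen, String fallback)
            throws IOException {
        try {
            return strictDetect(in, packetLen);
        } catch (EOFException e) {
            // Only end-of-data is swallowed; other I/O errors still propagate.
            return fallback;
        }
    }

    public static void main(String[] args) throws IOException {
        // Only 780 bytes are available but 4335 are requested, as in the reported trace.
        InputStream shortInput = new ByteArrayInputStream(new byte[780]);
        System.out.println(tolerantDetect(shortInput, 4335, "application/ogg"));
        // prints application/ogg
    }
}
```

Catching only {{EOFException}} rather than every {{IOException}} is the deliberate choice here: a genuine I/O failure still surfaces, while a truncated file degrades to the container-level type.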



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TIKA-1856) Error while parsing an ogg file

2016-02-16 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149012#comment-15149012
 ] 

Chris A. Mattmann commented on TIKA-1856:
-

Hey Nick, it's possible they were truncated during the Nutch crawls because of 
content limits. See http://github.com/chrismattmann/trec-dd-polar/ for a 
description of the dataset.


[jira] [Commented] (TIKA-1856) Error while parsing an ogg file

2016-02-16 Thread Yash Tanna (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148861#comment-15148861
 ] 

Yash Tanna commented on TIKA-1856:
--

The files are part of the TREC Dynamic Domain Polar Dataset, which was collected 
by [~chrismattmann] and his students.


[jira] [Commented] (TIKA-1856) Error while parsing an ogg file

2016-02-16 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148629#comment-15148629
 ] 

Nick Burch commented on TIKA-1856:
--

I picked one of those files to look at. {{oggz-info}} processes it without 
warnings. {{ogginfo}} warns that the EOS (end-of-stream) flag is missing on both 
streams, but otherwise reports no errors.

mplayer, however, reports some problems with the file:
{code}
[vorbis @ 0x7f1470f5cb00]partition out of bounds: type, begin, end, size, blocksize: 2, 0, 192, 16, 1024
[vorbis @ 0x7f1470f5cb00] Vorbis setup header packet corrupt (residues). 
[vorbis @ 0x7f1470f5cb00]Setup header corrupt.
Could not open codec.
{code}

Do you know where these files came from? It looks like they have been truncated 
somehow; could that be the case?

(If so, we'd probably just need to improve the truncation error handling.)
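The truncation theory can be checked on an attachment without a player, using only the page layout from RFC 3533: every Ogg page has a 27-byte fixed header (starting with the {{OggS}} capture pattern and ending with the segment count), followed by a segment table whose lacing values sum to the body length. The sketch below walks a file's pages and reports whether the data ends partway through a page, and whether any page had the EOS flag (0x04) set. {{OggTruncationCheck}} is a hypothetical helper, not part of Tika or the Ogg library; it does not verify page CRCs and does not track streams separately by serial number.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: scans Ogg pages (RFC 3533 layout) for truncation and EOS. */
final class OggTruncationCheck {
    static final int HEADER_LEN = 27;   // fixed part of every Ogg page header
    static final int EOS_FLAG = 0x04;   // header_type_flag bit: last page of a stream

    static final class Report {
        int completePages;
        boolean truncated;
        boolean sawEos;
        int bytesNeeded;     // size of the page (or header) that could not be read in full
        int bytesAvailable;  // bytes actually left when the data ran out
    }

    static Report scan(byte[] data) {
        Report r = new Report();
        int pos = 0;
        while (pos < data.length) {
            int left = data.length - pos;
            if (left < HEADER_LEN) {
                return truncated(r, HEADER_LEN, left);
            }
            if (data[pos] != 'O' || data[pos + 1] != 'g'
                    || data[pos + 2] != 'g' || data[pos + 3] != 'S') {
                throw new IllegalArgumentException("No OggS capture pattern at offset " + pos);
            }
            int flags = data[pos + 5] & 0xFF;      // header_type_flag
            int segments = data[pos + 26] & 0xFF;  // number of lacing values
            int headerLen = HEADER_LEN + segments;
            if (left < headerLen) {
                return truncated(r, headerLen, left);
            }
            // The body size is the sum of the lacing values in the segment table.
            int bodyLen = 0;
            for (int i = 0; i < segments; i++) {
                bodyLen += data[pos + HEADER_LEN + i] & 0xFF;
            }
            int pageLen = headerLen + bodyLen;
            if (left < pageLen) {
                return truncated(r, pageLen, left);
            }
            if ((flags & EOS_FLAG) != 0) {
                r.sawEos = true;
            }
            r.completePages++;
            pos += pageLen;
        }
        return r;
    }

    private static Report truncated(Report r, int needed, int available) {
        r.truncated = true;
        r.bytesNeeded = needed;
        r.bytesAvailable = available;
        return r;
    }

    public static void main(String[] args) throws IOException {
        Report r = scan(Files.readAllBytes(Paths.get(args[0])));
        System.out.printf("complete pages: %d, EOS seen: %b%n", r.completePages, r.sawEos);
        if (r.truncated) {
            System.out.printf("truncated: last page needs %d bytes, only %d left%n",
                    r.bytesNeeded, r.bytesAvailable);
        }
    }
}
```

Run it with the path of an attachment as the only argument. A file cut off mid-page reports both the declared page size and the bytes actually left, which is the same kind of short read behind the "Asked to read ... but hit EoF" error, and a missing EOS flag is consistent with what {{ogginfo}} reports.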


[jira] [Commented] (TIKA-1856) Error while parsing an ogg file

2016-02-15 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148009#comment-15148009
 ] 

Chris A. Mattmann commented on TIKA-1856:
-

Thanks [~yashtanna93], can you please attach the file? [~gagravarr], can you 
have a look?
