[ 
https://issues.apache.org/jira/browse/TIKA-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Gauss II resolved TIKA-1179.
--------------------------------

    Resolution: Cannot Reproduce
      Assignee: Ray Gauss II

I've just confirmed the described behavior in Tika 1.4; however, the file 
appears to be parsed just fine in 1.5!

You can verify by downloading a 1.5 snapshot of {{tika-app}} ([current 
link|https://repository.apache.org/content/groups/snapshots/org/apache/tika/tika-app/1.5-SNAPSHOT/tika-app-1.5-20130927.201341-30.jar]),
 running the app, i.e.:
{code}
java -jar tika-app-1.5-20130927.201341-30.jar
{code}
and dropping {{corrupt.mp3}} onto the app window.
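
For a scripted check rather than the GUI, the parse call can be run with a deadline, so that a hang shows up as a timeout instead of a stuck process. Below is a minimal sketch, assuming the parse is passed in as a {{Callable}}. {{ParseGuard}} and {{parseWithTimeout}} are hypothetical names, not part of Tika, and the stand-in task in {{main}} would be replaced by a real call such as {{Tika.parseToString}} (the entry point seen in the reporter's stack trace):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of Tika): runs a parse task with a deadline. */
final class ParseGuard {

    /**
     * Runs the task on a daemon worker thread and waits at most timeoutMillis.
     * Returns the task's result, or null if the deadline passes first.
     */
    public static String parseWithTimeout(Callable<String> task, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-guard");
            t.setDaemon(true); // a stuck parse must not keep the JVM alive
            return t;
        });
        Future<String> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; a loop that ignores interrupts keeps spinning.
            future.cancel(true);
            return null;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in task (assumption): a real check would parse corrupt.mp3 here.
        String text = parseWithTimeout(() -> "parsed", 5000);
        System.out.println(text == null ? "TIMED OUT" : "OK");
    }
}
```

Note that cancelling only interrupts the worker. A parser loop that never checks for interruption keeps consuming CPU until the JVM exits, which is why the sketch uses a daemon thread.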

> A corrupt mp3 file can cause an infinite loop in Mp3Parser
> ----------------------------------------------------------
>
>                 Key: TIKA-1179
>                 URL: https://issues.apache.org/jira/browse/TIKA-1179
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.4
>            Reporter: Marius Dumitru Florea
>            Assignee: Ray Gauss II
>             Fix For: 1.5
>
>         Attachments: corrupt.mp3
>
>
> I have a thread that indexes (among other things) files using Apache Solr. 
> This thread hangs (still running but with no progress) when trying to extract 
> metadata from the mp3 file attached to this issue. Here are a couple of 
> thread dumps taken at various moments:
> {noformat}
> "XWiki Solr index thread" daemon prio=10 tid=0x0000000003b72800 nid=0x64b5 runnable [0x00007f46f4617000]
>    java.lang.Thread.State: RUNNABLE
>       at org.apache.commons.io.input.AutoCloseInputStream.close(AutoCloseInputStream.java:63)
>       at org.apache.commons.io.input.AutoCloseInputStream.afterRead(AutoCloseInputStream.java:77)
>       at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:99)
>       at java.io.BufferedInputStream.fill(Unknown Source)
>       at java.io.BufferedInputStream.read1(Unknown Source)
>       at java.io.BufferedInputStream.read(Unknown Source)
>       - locked <0x00000000cb7094e8> (a java.io.BufferedInputStream)
>       at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
>       at java.io.FilterInputStream.read(Unknown Source)
>       at org.apache.tika.io.TailStream.read(TailStream.java:117)
>       at org.apache.tika.io.TailStream.skip(TailStream.java:140)
>       at org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
>       at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
>       at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
>       at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>       at org.apache.tika.Tika.parseToString(Tika.java:380)
>       ...
> {noformat}
> {noformat}
> "XWiki Solr index thread" daemon prio=10 tid=0x0000000003b72800 nid=0x64b5 runnable [0x00007f46f4618000]
>    java.lang.Thread.State: RUNNABLE
>       at org.apache.tika.io.TailStream.skip(TailStream.java:133)
>       at org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
>       at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
>       at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
>       at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>       at org.apache.tika.Tika.parseToString(Tika.java:380)
>       ...
> {noformat}
> {noformat}
> "XWiki Solr index thread" daemon prio=10 tid=0x0000000003b72800 nid=0x64b5 runnable [0x00007f46f4617000]
>    java.lang.Thread.State: RUNNABLE
>       at java.io.BufferedInputStream.read1(Unknown Source)
>       at java.io.BufferedInputStream.read(Unknown Source)
>       - locked <0x00000000cb1be170> (a java.io.BufferedInputStream)
>       at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
>       at java.io.FilterInputStream.read(Unknown Source)
>       at org.apache.tika.io.TailStream.read(TailStream.java:117)
>       at org.apache.tika.io.TailStream.skip(TailStream.java:140)
>       at org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
>       at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
>       at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
>       at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>       at org.apache.tika.Tika.parseToString(Tika.java:380)
>       ...
> {noformat}
> This makes our Solr indexer very fragile, as the hang prevents it from indexing 
> other files, leading to incomplete search results.
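
All three dumps sit under {{MpegStream.skipStream}} and {{TailStream.skip}}, which is the kind of place where a skip loop can spin if it does not treat end-of-stream as a stop condition. Whether that is the actual cause in 1.4 is not established here. Purely as an illustration of the general pattern, and not Tika's code, a skip loop that terminates on a truncated stream might look like this ({{SafeSkip}} and {{skipFully}} are hypothetical names):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for illustration; this is not Tika's TailStream code. */
final class SafeSkip {

    /**
     * Tries to skip n bytes and returns how many were actually skipped.
     * Stops as soon as the stream reports end-of-stream instead of retrying.
     */
    static long skipFully(InputStream in, long n) throws IOException {
        byte[] scratch = new byte[4096];
        long remaining = n;
        while (remaining > 0) {
            int want = (int) Math.min(scratch.length, remaining);
            int got = in.read(scratch, 0, want);
            if (got < 0) {
                break; // end of stream: without this check the loop never ends
            }
            remaining -= got;
        }
        return n - remaining;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[10]);
        // Ask for more than the stream holds, as a corrupt frame length might.
        System.out.println(skipFully(in, 1000)); // prints 10
    }
}
```

Returning the number of bytes actually skipped lets the caller notice a short skip and stop scanning for further frames.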



--
This message was sent by Atlassian JIRA
(v6.1#6144)
