Marius Dumitru Florea created TIKA-1179:
-------------------------------------------

             Summary: A corrupt mp3 file can cause an infinite loop in Mp3Parser
                 Key: TIKA-1179
                 URL: https://issues.apache.org/jira/browse/TIKA-1179
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.4
            Reporter: Marius Dumitru Florea
             Fix For: 1.5


I have a thread that indexes (among other things) files using Apache Solr. This 
thread hangs (still running, but making no progress) when it tries to extract 
metadata from the mp3 file attached to this issue. Here are three thread dumps 
taken at different moments:

{noformat}
"XWiki Solr index thread" daemon prio=10 tid=0x0000000003b72800 nid=0x64b5 runnable [0x00007f46f4617000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.commons.io.input.AutoCloseInputStream.close(AutoCloseInputStream.java:63)
        at org.apache.commons.io.input.AutoCloseInputStream.afterRead(AutoCloseInputStream.java:77)
        at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:99)
        at java.io.BufferedInputStream.fill(Unknown Source)
        at java.io.BufferedInputStream.read1(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        - locked <0x00000000cb7094e8> (a java.io.BufferedInputStream)
        at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
        at java.io.FilterInputStream.read(Unknown Source)
        at org.apache.tika.io.TailStream.read(TailStream.java:117)
        at org.apache.tika.io.TailStream.skip(TailStream.java:140)
        at org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
        at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
        at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
        at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.tika.Tika.parseToString(Tika.java:380)
        ...
{noformat}

{noformat}
"XWiki Solr index thread" daemon prio=10 tid=0x0000000003b72800 nid=0x64b5 runnable [0x00007f46f4618000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.tika.io.TailStream.skip(TailStream.java:133)
        at org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
        at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
        at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
        at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.tika.Tika.parseToString(Tika.java:380)
        ...
{noformat}

{noformat}
"XWiki Solr index thread" daemon prio=10 tid=0x0000000003b72800 nid=0x64b5 runnable [0x00007f46f4617000]
   java.lang.Thread.State: RUNNABLE
        at java.io.BufferedInputStream.read1(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        - locked <0x00000000cb1be170> (a java.io.BufferedInputStream)
        at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
        at java.io.FilterInputStream.read(Unknown Source)
        at org.apache.tika.io.TailStream.read(TailStream.java:117)
        at org.apache.tika.io.TailStream.skip(TailStream.java:140)
        at org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
        at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
        at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
        at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.tika.Tika.parseToString(Tika.java:380)
        ...
{noformat}
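
All three dumps sit under TailStream.skip, and in the first one 
AutoCloseInputStream.afterRead is closing the stream, which commons-io does 
when a read returns -1. That is consistent with (though it does not prove) a 
skip loop that keeps calling read() after the end of a truncated file. The 
following is only a sketch of a skip loop that stops at end-of-file; it is not 
the TailStream code, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Tika: skips up to n bytes by reading
// them, and stops early when the stream reports end-of-file.
final class EofSafeSkip {
    private static final int BUF_SIZE = 4096;

    static long skipFully(InputStream in, long n) throws IOException {
        byte[] buf = new byte[BUF_SIZE];
        long skipped = 0;
        while (skipped < n) {
            int want = (int) Math.min(buf.length, n - skipped);
            int read = in.read(buf, 0, want);
            if (read < 0) {
                break; // EOF: return a short count instead of looping forever
            }
            skipped += read;
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        // A 10-byte "file" whose frame header claims 1000 more bytes.
        InputStream truncated = new ByteArrayInputStream(new byte[10]);
        System.out.println(skipFully(truncated, 1000)); // prints 10
    }
}
```

A caller would compare the returned count with the requested length and treat 
a short skip as the end of the audio data.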

This makes our Solr indexer very fragile: the hung thread prevents it from 
indexing any other files, which leads to incomplete search results.
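
On the indexer side, one possible mitigation is to run each extraction on a 
worker thread and give up after a timeout. The sketch below uses only 
java.util.concurrent; the Callable stands in for the call to 
Tika.parseToString, and the class name, method name and timeout values are 
assumptions made for illustration. Note that cancelling only interrupts the 
worker: a loop that never checks for interruption keeps consuming CPU, so this 
limits the damage rather than fixing the parser.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical wrapper: runs an extraction task with a time limit.
final class TimedExtraction {
    // Daemon threads, so a stuck worker cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "extraction-worker");
        t.setDaemon(true);
        return t;
    });

    // Returns the extracted text, or null if the task failed or timed out.
    static String extractOrNull(Callable<String> extract, long timeoutMillis) {
        Future<String> future = POOL.submit(extract);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker and move on
            return null;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve interrupt status
            return null;
        } catch (ExecutionException e) {
            return null; // the extraction itself threw
        }
    }

    public static void main(String[] args) {
        System.out.println(extractOrNull(() -> "some text", 1000));
        // Simulates a hung parse; the caller gets null after about 100 ms.
        System.out.println(extractOrNull(() -> {
            Thread.sleep(60_000);
            return "never";
        }, 100));
    }
}
```

With this in place the indexing thread can log the file that timed out and 
continue with the next one.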


