[jira] [Commented] (TIKA-1215) Regression: Unable to parse a mp3 file on 1.5 which parsed successfully on 1.4

2014-01-14 Thread Hong-Thai Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870573#comment-13870573
 ] 

Hong-Thai Nguyen commented on TIKA-1215:


Great catch. Thanks, [~jukkaz]!

 Regression: Unable to parse a mp3 file on 1.5 which parsed successfully on 1.4
 --

 Key: TIKA-1215
 URL: https://issues.apache.org/jira/browse/TIKA-1215
 Project: Tika
  Issue Type: Bug
  Components: parser
Affects Versions: 1.5
Reporter: Hong-Thai Nguyen
Priority: Critical
 Attachments: Centres 080805@0650 RTBF Matin Première - A propos des 
 rues de Dublin et Dubreucq.mp3, TIKA-1215-fix-prefix-namespaces.patch, 
 tika-1215-without-wildcard.patch


 With attached file, 1.5 raises this exception on parsing. This file has no 
 problem on 1.4
 {code}
 ...
 Caused by: org.xml.sax.SAXException: Namespace http://www.w3.org/1999/xhtml 
 not declared
   at 
 org.apache.tika.sax.ToXMLContentHandler$ElementInfo.getPrefix(ToXMLContentHandler.java:62)
   at 
 org.apache.tika.sax.ToXMLContentHandler$ElementInfo.getQName(ToXMLContentHandler.java:68)
   at 
 org.apache.tika.sax.ToXMLContentHandler.startElement(ToXMLContentHandler.java:148)
   at 
 org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
   at 
 org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
   at 
 org.apache.tika.sax.xpath.MatchingContentHandler.startElement(MatchingContentHandler.java:60)
   at 
 org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
   at 
 org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
   at 
 org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
   at 
 org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264)
   at 
 org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:254)
   at 
 org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:284)
   at 
 org.apache.tika.sax.XHTMLContentHandler.element(XHTMLContentHandler.java:323)
   at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:107)
   at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
   at 
 org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
   at 
 com.polyspot.document.converter.DocumentConverter.realizeTikaConversion(DocumentConverter.java:221)
   ... 15 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (TIKA-1215) Regression: Unable to parse a mp3 file on 1.5 which parsed successfully on 1.4

2014-01-13 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869528#comment-13869528
 ] 

Tim Allison commented on TIKA-1215:
---

[~thaichat04] thank you for sending a clean patch. I'm not very familiar with 
this area of the code base, but if I understand Tika's history and your code 
correctly, your if statement wasn't necessary in 1.4, and (based on a very 
quick look) nothing else in the relevant lines of the MP3 parser appears to 
have changed between 1.4 and trunk. Are you able to determine what changed 
between 1.4 and trunk that led to this regression? Thank you!



[jira] [Commented] (TIKA-1215) Regression: Unable to parse a mp3 file on 1.5 which parsed successfully on 1.4

2014-01-13 Thread Hong-Thai Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869590#comment-13869590
 ] 

Hong-Thai Nguyen commented on TIKA-1215:


[~talli...@apache.org], here's the XML of the input to parse:
{noformat}
<h1 xmlns="http://www.w3.org/1999/xhtml">Matin Première - Tour des régions 
080806</h1>
<p>RTBF - La Première</p>
<p>Speech</p>
<p>101698.914</p>
<p>XXX - 
A propos du contrat de quartier rues Dublin/Dubreucq</p>
{noformat}

I think this regression came from TIKA-1070:
{code}
currentElement = currentElement.parent;
{code}

The parent element of p is null, so getPrefix() raises the exception. That's 
different from 1.4.
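
To make the mechanism concrete, here is a minimal, self-contained sketch of 
how I understand the failure. It is a simplified model, not the actual 
ToXMLContentHandler source: the names PrefixScopeSketch, Scope and replay are 
made up for illustration, and only getPrefix() and the exception message are 
taken from the stack trace. The assumption is that namespace declarations are 
stored on the element that declares them and are looked up through the parent 
chain, so once h1 is closed and the current element falls back to its null 
parent, the following p no longer sees the XHTML declaration.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.SAXException;

// Simplified, hypothetical model of per-element namespace scoping.
final class PrefixScopeSketch {

    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // One open element: its own declarations plus a link to its parent.
    static final class Scope {
        final Scope parent;
        final Map<String, String> prefixesByUri;

        Scope(Scope parent, Map<String, String> declared) {
            this.parent = parent;
            this.prefixesByUri = new HashMap<>(declared);
        }

        // Walk up the parent chain; fail if no open element declares the URI.
        String getPrefix(String uri) throws SAXException {
            String prefix = prefixesByUri.get(uri);
            if (prefix != null) {
                return prefix;
            }
            if (parent != null) {
                return parent.getPrefix(uri);
            }
            if (uri == null || uri.isEmpty()) {
                return "";
            }
            throw new SAXException("Namespace " + uri + " not declared");
        }
    }

    private Scope current = null;
    private final Map<String, String> pending = new HashMap<>();

    void startPrefixMapping(String prefix, String uri) {
        pending.put(uri, prefix);
    }

    // Returns the qualified name that would be written for this element.
    String startElement(String uri, String localName) throws SAXException {
        current = new Scope(current, pending);
        pending.clear();
        String prefix = current.getPrefix(uri);
        return prefix.isEmpty() ? localName : prefix + ":" + localName;
    }

    void endElement() {
        // The step quoted above: leaving an element drops its declarations.
        current = current.parent;
    }

    // Replays h1 followed by p, optionally inside a root element that
    // carries the declaration and stays open.
    static String replay(boolean wrapInRoot) {
        PrefixScopeSketch handler = new PrefixScopeSketch();
        try {
            handler.startPrefixMapping("", XHTML);
            if (wrapInRoot) {
                handler.startElement(XHTML, "html");
            }
            handler.startElement(XHTML, "h1");
            handler.endElement();
            return handler.startElement(XHTML, "p");
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replay(false));
        System.out.println(replay(true));
    }
}
```

In this model, replay(false) returns the same "Namespace 
http://www.w3.org/1999/xhtml not declared" message as the stack trace, while 
replay(true) returns "p", because the still-open root element keeps the 
declaration reachable through the parent chain.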



[jira] [Commented] (TIKA-1215) Regression: Unable to parse a mp3 file on 1.5 which parsed successfully on 1.4

2014-01-11 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868868#comment-13868868
 ] 

Nick Burch commented on TIKA-1215:
--

Are you able to reproduce the problem with a smaller MP3 than the one in your 
patch?

Also, your patch is a bit hard to review, as most of it is whitespace changes. 
If there is inconsistent whitespace in a file that needs fixing, it's normally 
better to post separate patches for the whitespace changes and the bug fix, to 
make it easier to see what changed where, and hence focus the review on the 
important parts.
