[ https://issues.apache.org/jira/browse/TIKA-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061531#comment-14061531 ]
Tien Nguyen Manh commented on TIKA-1365:
----------------------------------------

[~tpalsulich] Ah yes, I tried with the URL directly:

    java -jar tika-app-1.5.jar http://lucene.apache.org/core/discussion.html

and it failed.

> Incorrectly MimeType detection for Apache Lucene web site
> ---------------------------------------------------------
>
>                 Key: TIKA-1365
>                 URL: https://issues.apache.org/jira/browse/TIKA-1365
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 1.5
>            Reporter: Tien Nguyen Manh
>         Attachments: discussion.html
>
>
> Tika 1.5 detects many pages from the Apache Lucene web site as XML, for example this page:
> http://lucene.apache.org/core/discussion.html
> Here is the error log; parsing failed because Tika used the XML parser:
> Apache Tika was unable to parse the document
> at http://lucene.apache.org/core/discussion.html.
> The full exception stack trace is included below:
> org.apache.tika.exception.TikaException: XML parse error
>       at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:78)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>       at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:320)
>       at org.apache.tika.gui.TikaGUI.openURL(TikaGUI.java:293)
>       at org.apache.tika.gui.TikaGUI.actionPerformed(TikaGUI.java:247)
>       at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2018)