[ 
https://issues.apache.org/jira/browse/TIKA-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062979#comment-14062979
 ] 

Tien Nguyen Manh commented on TIKA-1365:
----------------------------------------

i think XMLParser throws that exception is correct. But the problem is that  
tika shouldn't detect shouldn't it as XML type, it should be HTML type so that 
it will use HTMLParser instead of XMLParser

> Incorrectly MimeType detection for Apache Lucene web site
> ---------------------------------------------------------
>
>                 Key: TIKA-1365
>                 URL: https://issues.apache.org/jira/browse/TIKA-1365
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 1.5
>            Reporter: Tien Nguyen Manh
>         Attachments: discussion.html
>
>
> Tika 1.5 detect many page from apache lucene web site as xml, for example 
> this page 
> http://lucene.apache.org/core/discussion.html
> Here are error log:, it failed to parse becuase it use xml parser
> Apache Tika was unable to parse the document
> at http://lucene.apache.org/core/discussion.html.
> The full exception stack trace is included below:
> org.apache.tika.exception.TikaException: XML parse error
>       at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:78)
>       at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>       at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:320)
>       at org.apache.tika.gui.TikaGUI.openURL(TikaGUI.java:293)
>       at org.apache.tika.gui.TikaGUI.actionPerformed(TikaGUI.java:247)
>       at 
> javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2018)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to