[ 
https://issues.apache.org/jira/browse/NUTCH-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694564#comment-13694564
 ] 

Julien Nioche commented on NUTCH-1592:
--------------------------------------

Hi Seb

That's a very plausible explanation. Ideally, we should add a test covering 
both parse-tika and parse-html to make sure that they produce the same DOM 
tree. I believe the place to hack in parse-tika would be 
org.apache.nutch.parse.tika.DOMBuilder. I'm not sure when I'll find the time 
to do that, but at least it's now in JIRA.
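
As a rough illustration of what such a test could check, here is a minimal 
sketch that uses only the JDK's DOM API. It does not call parse-html or 
parse-tika: the parse() helper and the two sample documents are hypothetical 
stand-ins for the DOM each plugin would produce. The namespace difference 
shown is only one example of how two trees can diverge, not a confirmed 
diagnosis of this issue.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomTreeComparison {

    // Hypothetical stand-in for "run a parser plugin and get its DOM".
    // Here we simply parse a well-formed XML string with the JDK parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // keep namespace URIs on the nodes
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        doc.getDocumentElement().normalize(); // merge adjacent text nodes
        return doc;
    }

    // Deep structural comparison: node names, namespace URIs, attributes,
    // values and children must all match for the trees to be equal.
    static boolean sameTree(Document a, Document b) {
        return a.getDocumentElement().isEqualNode(b.getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        String plain = "<html><body><p>text</p></body></html>";
        String namespaced = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<body><p>text</p></body></html>";

        System.out.println("identical input:  "
                + sameTree(parse(plain), parse(plain)));
        System.out.println("namespaced input: "
                + sameTree(parse(plain), parse(namespaced)));
    }
}
```

A real test would replace parse() with calls into the two plugins on the same 
HTML input and assert that sameTree() holds for the resulting trees.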

Thanks
                
> XPath works on documents parsed with parse-html but not parse-tika
> ------------------------------------------------------------------
>
>                 Key: NUTCH-1592
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1592
>             Project: Nutch
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Julien Nioche
>             Fix For: 1.8
>
>
> The title says it all. The behaviour should be the same regardless of which 
> parser is used.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
