[ https://issues.apache.org/jira/browse/NUTCH-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694564#comment-13694564 ]
Julien Nioche commented on NUTCH-1592:
--------------------------------------

Hi Seb

That's a very plausible explanation. Ideally we should add a test to parse-tika and parse-html to make sure that they produce the same DOM tree. The place to hack in parse-tika would be org.apache.nutch.parse.tika.DOMBuilder I believe. Not sure when I'll find the time to do that but at least it's now in JIRA.

Thanks

> XPath works on documents parsed with parse-html but not parse-tika
> ------------------------------------------------------------------
>
>                 Key: NUTCH-1592
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1592
>             Project: Nutch
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Julien Nioche
>             Fix For: 1.8
>
>
> The title says it all. The behaviour should be the same regardless of which parser is used

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
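
The test suggested in the comment, checking that both parsers produce the same DOM tree, could be sketched roughly as follows. This is only an illustration using the JDK's own DOM classes and does not call any Nutch code: `DomTreeCompare`, `build` and `signature` are hypothetical names, and `build` merely stands in for the trees that parse-html and parse-tika would return. The differences it demonstrates (element-name case and namespace) are assumed examples of how two trees can diverge, not a confirmed diagnosis of NUTCH-1592.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomTreeCompare {

    /**
     * Hypothetical stand-in for a parser's output: builds a single chain of
     * nested elements (path[0] contains path[1], and so on) in the given
     * namespace. A real test would obtain the trees from the two parsers.
     */
    static Document build(String namespace, String... path) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Node parent = doc;
            for (String name : path) {
                Element element = doc.createElementNS(namespace, name);
                parent.appendChild(element);
                parent = element;
            }
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serialises the element structure (namespace + name, nested) to a string. */
    static String signature(Node node) {
        StringBuilder out = new StringBuilder();
        append(node, out);
        return out.toString();
    }

    private static void append(Node node, StringBuilder out) {
        boolean isElement = node.getNodeType() == Node.ELEMENT_NODE;
        if (isElement) {
            String ns = node.getNamespaceURI();
            out.append('<');
            if (ns != null) {
                out.append('{').append(ns).append('}');
            }
            out.append(node.getNodeName()).append('>');
        }
        // Recurse into children; text and other non-element nodes are skipped.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            append(children.item(i), out);
        }
        if (isElement) {
            out.append("</").append(node.getNodeName()).append('>');
        }
    }

    public static void main(String[] args) {
        // Assumed example inputs: same structure, different case and namespace.
        Document first = build(null, "HTML", "BODY", "P");
        Document second = build("http://www.w3.org/1999/xhtml", "html", "body", "p");
        System.out.println(signature(first));
        System.out.println(signature(second));
        System.out.println("same tree: " + signature(first).equals(signature(second)));
    }
}
```

Comparing string signatures rather than calling `Node.isEqualNode` directly keeps a failing test readable, since the two signatures can be printed side by side to see where the trees diverge.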