[
https://issues.apache.org/jira/browse/NUTCH-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046773#comment-17046773
]
ASF GitHub Bot commented on NUTCH-2772:
---------------------------------------
sebastian-nagel commented on pull request #500: NUTCH-2772 Debugging parse
filter to show serialized DOM tree
URL: https://github.com/apache/nutch/pull/500
> Debugging parse filter to show serialized DOM tree
> --------------------------------------------------
>
> Key: NUTCH-2772
> URL: https://issues.apache.org/jira/browse/NUTCH-2772
> Project: Nutch
> Issue Type: Improvement
> Components: parser, plugin
> Affects Versions: 1.16
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Major
> Fix For: 1.17
>
>
> A tool to show the DOM tree (e.g., serialized as XML/HTML) might be helpful for
> debugging; see, e.g., NUTCH-2769. The DOM tree is available in the parse
> plugins and is also passed to the HtmlParseFilter plugins. We could provide a
> parsefilter-debug plugin that logs the DOM tree and adds the serialized
> string representation to the parse data.
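
As a rough illustration of the serialization step described above, the sketch below turns a DOM node into an XML string using only the JDK's javax.xml.transform API. It is not the code from the pull request: the class name DomSerializer and the sample document built in main are hypothetical, and a real parsefilter-debug plugin would receive the DOM tree from the parser instead of building one.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of Nutch: serializes a DOM subtree to a string.
public class DomSerializer {

  /** Serializes the given node and its descendants as XML text. */
  public static String serialize(Node node) throws TransformerException {
    // An identity transformer copies the DOM source to the output unchanged.
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    // Omit the <?xml ...?> declaration so the output is easier to read in a log.
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    StringWriter out = new StringWriter();
    transformer.transform(new DOMSource(node), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    // Build a tiny stand-in document; a parse filter would be handed the real tree.
    Document doc =
        DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    Element html = doc.createElement("html");
    Element body = doc.createElement("body");
    body.setTextContent("Hello");
    html.appendChild(body);
    doc.appendChild(html);
    System.out.println(serialize(doc));
  }
}
```

The returned string could then be written to a log or stored with the parse data, which are the two uses the description proposes.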