Rename HTMLParserFilter
------------------------
Key: NUTCH-861
URL: https://issues.apache.org/jira/browse/NUTCH-861
Project: Nutch
Issue Type: Wish
Components: parser
Affects Versions: 2.0
Reporter: Julien Nioche
Fix For: 2.0
The name 'HTMLParserFilter' is slightly confusing, as it gives the impression
that implementations of this extension point receive only HTML documents.
The parse-tika plugin calls the HTMLParserFilters and passes them a DOM
representation of the XHTML-like document it gets from the underlying Tika
parsers. This means that the filters get a DOM representation for documents in
any format recognised by Tika, not only HTML.
What about renaming HTMLParserFilter to ParserFilter? Any other suggestions?
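
To illustrate the point, here is a minimal sketch of what a format-neutral
filter could look like. It uses only the JDK's DOM classes. The interface
shape, the filter(contentType, doc) signature, and the TextCollector and
toFragment names are all assumptions made up for this example; they are not
the actual Nutch extension point or the parse-tika code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.xml.sax.InputSource;

public class ParserFilterSketch {

    /** Hypothetical format-neutral extension point (not the real Nutch interface). */
    public interface ParserFilter {
        String filter(String contentType, DocumentFragment doc);
    }

    /** Example filter: it only looks at the DOM, never at the original format. */
    public static class TextCollector implements ParserFilter {
        @Override
        public String filter(String contentType, DocumentFragment doc) {
            return contentType + ": " + doc.getTextContent().trim();
        }
    }

    /** Stand-in for the parser: turns XHTML-like markup into a DOM fragment. */
    public static DocumentFragment toFragment(String xhtml) throws Exception {
        Document d = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        DocumentFragment frag = d.createDocumentFragment();
        // Moves the root element out of the document and into the fragment.
        frag.appendChild(d.getDocumentElement());
        return frag;
    }

    public static void main(String[] args) throws Exception {
        // Pretend this XHTML-like markup was produced from a PDF, not from HTML.
        DocumentFragment frag =
                toFragment("<html><body><p>Invoice 42</p></body></html>");
        System.out.println(new TextCollector().filter("application/pdf", frag));
    }
}
```

The filter never inspects HTML-specific input, which is why a name without
"HTML" in it would describe the contract more accurately.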