Rename HTMLParserFilter 
------------------------

                 Key: NUTCH-861
                 URL: https://issues.apache.org/jira/browse/NUTCH-861
             Project: Nutch
          Issue Type: Wish
          Components: parser
    Affects Versions: 2.0
            Reporter: Julien Nioche
             Fix For: 2.0


The name 'HTMLParserFilter' is slightly confusing, as it gives the impression 
that implementations of this extension point receive only HTML documents. 
The parse-tika plugin calls the HTMLParserFilters and passes them a DOM 
representation of the XHTML-like documents it gets from the underlying Tika 
parsers. This means the filters receive a DOM representation of documents in 
any format recognised by Tika, not only HTML.

What about renaming HTMLParserFilter to ParserFilter? Any other suggestions?
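
To make the point concrete, here is a rough sketch of a format-agnostic filter 
that only ever sees a DOM. The ParserFilter interface below is a simplified, 
hypothetical stand-in (the real Nutch interface takes more arguments), and 
TextCollector, toFragment and the sample markup are made up for illustration. 
The content type is passed in only to show that the filter's input need not 
have started life as HTML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.xml.sax.InputSource;

// Hypothetical, simplified stand-in for the renamed extension point.
// Assumption: only the DOM argument matters for this illustration.
interface ParserFilter {
    String filter(String contentType, DocumentFragment doc);
}

// Illustrative filter: collects the text of the DOM it is given,
// regardless of the format the document was originally parsed from.
class TextCollector implements ParserFilter {
    @Override
    public String filter(String contentType, DocumentFragment doc) {
        return contentType + ": " + doc.getTextContent().trim();
    }
}

public class ParserFilterSketch {
    // Builds a DocumentFragment from XHTML-like markup, roughly the kind of
    // structure a parser plugin would hand to its filters.
    static DocumentFragment toFragment(String xhtml) throws Exception {
        Document d = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        DocumentFragment fragment = d.createDocumentFragment();
        // appendChild moves the root element out of the document into the fragment.
        fragment.appendChild(d.getDocumentElement());
        return fragment;
    }

    public static void main(String[] args) throws Exception {
        // Pretend this XHTML-like markup was produced from a PDF, not an HTML page.
        DocumentFragment doc =
                toFragment("<html><body><p>Report text</p></body></html>");
        System.out.println(new TextCollector().filter("application/pdf", doc));
    }
}
```

Nothing in the filter depends on the source being HTML, which is why a 
neutral name seems a better fit for the extension point.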

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
