Jorge Luis Betancourt Gonzalez created NUTCH-1985:
-----------------------------------------------------

             Summary: Adding a main() method to the MimeTypeIndexingFilter
                 Key: NUTCH-1985
                 URL: https://issues.apache.org/jira/browse/NUTCH-1985
             Project: Nutch
          Issue Type: Improvement
          Components: indexer, metadata, plugin
    Affects Versions: 1.10
            Reporter: Jorge Luis Betancourt Gonzalez
            Priority: Minor
             Fix For: 1.10


This makes it easy to test different rules files and check the expressions 
used to filter content based on the detected MIME type. Until now, the only way 
to check this was to run test crawls and inspect the data stored in 
Solr/Elasticsearch.

This allows invoking the class through the {{bin/nutch plugin}} command, for 
example:

{{bin/nutch plugin mimetype-filter org.apache.nutch.indexer.filter.MimeTypeIndexingFilter -h}}

Two options are accepted: {{-h, --help}} to show the help, and {{-rules}} to 
specify the rules file to use. This makes it easy to experiment with different 
rules files until you get the desired behavior.
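The option handling described above could look roughly like the following. This is a minimal, hypothetical Java sketch, not the code attached to this issue: the class name {{MimeFilterOptions}}, its fields and the usage string are illustrative assumptions. It only shows {{-h}}/{{--help}} as a flag and {{-rules}} as an option that takes a file path.

```java
import java.util.Optional;

/**
 * Hypothetical sketch of the command-line handling: "-h"/"--help" requests
 * the help text, "-rules <file>" selects a rules file. Names are illustrative
 * and do not come from the Nutch code base.
 */
public class MimeFilterOptions {
    final boolean help;
    final Optional<String> rulesFile;

    MimeFilterOptions(boolean help, Optional<String> rulesFile) {
        this.help = help;
        this.rulesFile = rulesFile;
    }

    /** Parses the arguments; throws IllegalArgumentException on unknown or incomplete options. */
    static MimeFilterOptions parse(String[] args) {
        boolean help = false;
        String rules = null;
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-h":
                case "--help":
                    help = true;
                    break;
                case "-rules":
                    // The option value is the next argument, so make sure it exists.
                    if (i + 1 >= args.length) {
                        throw new IllegalArgumentException("-rules requires a file argument");
                    }
                    rules = args[++i];
                    break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
        }
        return new MimeFilterOptions(help, Optional.ofNullable(rules));
    }

    public static void main(String[] args) {
        MimeFilterOptions opts = parse(args);
        if (opts.help) {
            System.out.println("Usage: MimeTypeIndexingFilter [-h|--help] [-rules <file>]");
            return;
        }
        System.out.println("Rules file: " + opts.rulesFile.orElse("(default)"));
    }
}
```

Rejecting unknown options, rather than silently ignoring them, keeps a mistyped flag from quietly falling back to the default rules while you are testing.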

After invoking the class, enter one valid MIME type per line. For each line, the 
output is the same MIME type prefixed with a {{+}} or {{-}} sign, indicating 
whether that MIME type is allowed or denied, respectively.
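The interactive loop described above can be sketched as follows. This is a hypothetical Java illustration, not the plugin's implementation. The rules syntax it uses (each line is {{+}} or {{-}} followed by a Java regular expression, the first matching rule wins, and unmatched types are denied) is an assumption made for this example and may differ from the format mimetype-filter actually reads. The class name {{MimeRuleChecker}} is also made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical checker: echoes each MIME type prefixed with '+' (allowed) or '-' (denied). */
public class MimeRuleChecker {
    /** One rule: allow (+) or deny (-) MIME types fully matching a regex. */
    static final class Rule {
        final boolean allow;
        final Pattern pattern;

        Rule(boolean allow, Pattern pattern) {
            this.allow = allow;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses assumed-format lines like "+text/.*" or "-image/.*"; skips blanks and '#' comments. */
    MimeRuleChecker(List<String> ruleLines) {
        for (String raw : ruleLines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** First matching rule wins; a MIME type that matches no rule is denied. */
    boolean isAllowed(String mimeType) {
        for (Rule r : rules) {
            if (r.pattern.matcher(mimeType).matches()) {
                return r.allow;
            }
        }
        return false;
    }

    /** Reads one MIME type per line and prints it with a '+' or '-' prefix. */
    void run(Reader in, PrintStream out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            String mime = line.trim();
            if (mime.isEmpty()) {
                continue;
            }
            out.println((isAllowed(mime) ? "+" : "-") + mime);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hard-coded example rules; a real tool would load them from the -rules file.
        MimeRuleChecker checker = new MimeRuleChecker(
                Arrays.asList("+text/.*", "+application/pdf", "-.*"));
        checker.run(new InputStreamReader(System.in, StandardCharsets.UTF_8), System.out);
    }
}
```

With these example rules, feeding {{text/html}} on standard input prints {{+text/html}}, and {{image/png}} prints {{-image/png}}. Keeping the matching logic in {{isAllowed}}, separate from the I/O loop, means the same rule evaluation could be reused by the indexing filter itself.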



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
