Markus Jelsma created NUTCH-2214:
------------------------------------

             Summary: Index clean to be flexible on what it deletes
                 Key: NUTCH-2214
                 URL: https://issues.apache.org/jira/browse/NUTCH-2214
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 1.11
            Reporter: Markus Jelsma
            Assignee: Markus Jelsma
             Fix For: 1.13


Nutch clean removes all useless records, but if Nutch is configured correctly 
(-deleteGone etc.), the only such records left in the index should be 
duplicates, if any exist. On a large index, this could result in Nutch sending 
millions of getById requests to Solr for records that don't exist in the first 
place.

This issue will make it configurable which records to delete, e.g. useless 
records (404, 30x) or duplicates.
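
As a rough illustration of the idea, here is a minimal Java sketch of a 
configurable delete policy. It is not the actual patch: the class 
CleanPolicySketch, the enum RecordKind, and the comma-separated value format 
are all hypothetical names chosen for this example, and no Nutch or Solr API 
is used.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

public class CleanPolicySketch {

  /** Hypothetical record categories; not taken from the Nutch code base. */
  enum RecordKind { GONE, REDIRECT, DUPLICATE, LIVE }

  /**
   * Parses a comma-separated value such as "gone,redirect" into the set of
   * record kinds that should be deleted. The value format is an assumption.
   * Unknown names make RecordKind.valueOf throw IllegalArgumentException.
   */
  static Set<RecordKind> parseDeleteKinds(String value) {
    Set<RecordKind> kinds = EnumSet.noneOf(RecordKind.class);
    if (value == null) {
      return kinds; // nothing configured, nothing deleted
    }
    for (String token : value.split(",")) {
      String name = token.trim().toUpperCase(Locale.ROOT);
      if (!name.isEmpty()) {
        kinds.add(RecordKind.valueOf(name));
      }
    }
    return kinds;
  }

  /** Returns true only if the record's kind was selected for deletion. */
  static boolean shouldDelete(RecordKind kind, Set<RecordKind> deleteKinds) {
    return deleteKinds.contains(kind);
  }

  public static void main(String[] args) {
    // Index already cleaned via -deleteGone: only ask for duplicates.
    Set<RecordKind> onlyDuplicates = parseDeleteKinds("duplicate");
    System.out.println(shouldDelete(RecordKind.DUPLICATE, onlyDuplicates));
    System.out.println(shouldDelete(RecordKind.GONE, onlyDuplicates));
  }
}
```

With a filter like this applied before any delete is issued, records of a 
kind that was not selected would never produce a request to Solr.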



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
