Hi dev@,
Reading the code for the IndexingJob -deleteGone flag [0], you can see
that we bundle deletion requests for 404s, redirects and duplicates
into one option.
This of course has pros and cons.
Does anyone wish to share their opinion on how this is implemented?
My opinion is that we should either
1. Rename the flag, because -deleteGone does not describe everything it
does, or
2. Break out the individual functions into separate options, e.g.,
-deleteGone (for records with an HTTP 404), -deleteRedirect (for
permanently or temporarily redirected records) and -deleteDuplicate (for
duplicate records).
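
To make option 2 a bit more concrete, here is a rough sketch of how the
three flags could be parsed independently. This is not the current
IndexingJob code: the DeleteOptions class, its fields and its parse
method are hypothetical and exist only for illustration. Only the three
flag names come from the proposal above.

```java
// Hypothetical holder for the three proposed flags; not part of Nutch.
final class DeleteOptions {
  final boolean deleteGone;      // -deleteGone: records with an HTTP 404
  final boolean deleteRedirect;  // -deleteRedirect: permanently or temporarily redirected records
  final boolean deleteDuplicate; // -deleteDuplicate: duplicate records

  private DeleteOptions(boolean gone, boolean redirect, boolean duplicate) {
    this.deleteGone = gone;
    this.deleteRedirect = redirect;
    this.deleteDuplicate = duplicate;
  }

  // Scans the argument list and switches on each proposed flag separately.
  // Arguments this sketch does not know about are ignored.
  static DeleteOptions parse(String[] args) {
    boolean gone = false;
    boolean redirect = false;
    boolean duplicate = false;
    for (String arg : args) {
      switch (arg) {
        case "-deleteGone":
          gone = true;
          break;
        case "-deleteRedirect":
          redirect = true;
          break;
        case "-deleteDuplicate":
          duplicate = true;
          break;
        default:
          break; // some other option, handled elsewhere
      }
    }
    return new DeleteOptions(gone, redirect, duplicate);
  }

  public static void main(String[] args) {
    DeleteOptions o = parse(args);
    System.out.println("deleteGone=" + o.deleteGone
        + " deleteRedirect=" + o.deleteRedirect
        + " deleteDuplicate=" + o.deleteDuplicate);
  }
}
```

With this split, passing only -deleteGone would no longer delete
redirects or duplicates, so existing users who rely on the bundled
behaviour would need to pass all three flags.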
Thanks for any consideration.
lewismc

[0] https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/indexer/IndexingJob.java#L205-L206


-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc
