I should also note that the -deleteGone setting cannot be overridden via
nutch-site.xml, whereas similar settings do have equivalent configuration
properties in nutch-default.xml:

https://github.com/apache/nutch/blob/master/conf/nutch-default.xml#L1361-L1373

On 2021/12/29 17:08:20 lewis john mcgibbney wrote:
> Hi dev@,
> Reading the code for the IndexerJob -deleteGone flag [0] you can clearly
> see that we bundle deletion requests for 404s, redirects and duplicates
> into one option.
> This of course has pros and cons.
> Does anyone wish to share their opinion on how this is implemented?
> My opinion is that
> 1. The flag is either inappropriately named and should be renamed, or
> 2. We break out the individual functions into separate options e.g.,
> -deleteGone (for records with an HTTP 404), -deleteRedirect (for
> permanently or temporarily redirected records) and -deleteDuplicate (for
> duplicate records).
> Thanks for any consideration.
> lewismc
> 
> [0]
> https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/indexer/IndexingJob.java#L205-L206
> 
> 
> -- 
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
> 
