I should also note that the -deleteGone setting cannot be overridden via nutch-site.xml, whereas similar settings do have equivalent configuration properties in nutch-default.xml:
https://github.com/apache/nutch/blob/master/conf/nutch-default.xml#L1361-L1373

On 2021/12/29 17:08:20 lewis john mcgibbney wrote:
> Hi dev@,
> Reading the code for the IndexerJob -deleteGone flag [0] you can clearly
> see that we bundle deletion requests for 404s, redirects and duplicates
> into one option.
> This of course has pros and cons.
> Does anyone wish to share their opinion on how this is implemented?
> My opinion is that
> 1. The flag is either inappropriately named and should be renamed, or
> 2. We break out the individual functions into separate options e.g.,
> -deleteGone (for records with an HTTP 404), -deleteRedirect (for
> permanently or temporarily redirected records) and -deleteDuplicate (for
> duplicate records).
> Thanks for any consideration.
> lewismc
>
> [0]
> https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/indexer/IndexingJob.java#L205-L206
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
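
To make option 2 and the missing configuration override a bit more concrete, here is a rough sketch in plain Java. It is not the current IndexingJob code and uses no Nutch or Hadoop classes. The class name DeleteOptions, the -deleteRedirect and -deleteDuplicate flags, and the three property names (indexer.delete.gone, indexer.delete.redirect, indexer.delete.duplicate) are all hypothetical and only serve to illustrate the idea: each deletion type gets its own switch, a configuration value supplies the default, and a command-line flag turns the behaviour on.

```java
import java.util.Map;

// Hypothetical sketch only: none of these names exist in Nutch today.
final class DeleteOptions {
    public final boolean deleteGone;      // records with an HTTP 404
    public final boolean deleteRedirect;  // permanently or temporarily redirected records
    public final boolean deleteDuplicate; // duplicate records

    DeleteOptions(boolean gone, boolean redirect, boolean duplicate) {
        this.deleteGone = gone;
        this.deleteRedirect = redirect;
        this.deleteDuplicate = duplicate;
    }

    // 'conf' stands in for values that would come from nutch-site.xml;
    // the property names are assumptions made for this example.
    static DeleteOptions parse(String[] args, Map<String, String> conf) {
        boolean gone = Boolean.parseBoolean(conf.getOrDefault("indexer.delete.gone", "false"));
        boolean redirect = Boolean.parseBoolean(conf.getOrDefault("indexer.delete.redirect", "false"));
        boolean duplicate = Boolean.parseBoolean(conf.getOrDefault("indexer.delete.duplicate", "false"));

        // A flag on the command line enables the behaviour regardless of the configured default.
        for (String arg : args) {
            switch (arg) {
                case "-deleteGone":
                    gone = true;
                    break;
                case "-deleteRedirect":
                    redirect = true;
                    break;
                case "-deleteDuplicate":
                    duplicate = true;
                    break;
                default:
                    // Unrelated arguments are ignored in this sketch.
                    break;
            }
        }
        return new DeleteOptions(gone, redirect, duplicate);
    }
}
```

With something along these lines, a deployment could enable 404 deletion in its site configuration and still opt in to redirect deletion for a single run by passing only -deleteRedirect.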