Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "bin/nutch solrclean" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/bin/nutch%20solrclean

Comment:
Update to reflect Nutch 1.3 API

New page:
solrclean is an alias for the class org.apache.nutch.indexer.solr.SolrClean.

The class scans a crawldb directory for entries with status DB_GONE (pages that 
returned HTTP 404) and sends a delete request to Solr for each of those 
documents. Solr then removes them from the index, so the index no longer 
holds entries for pages that have disappeared.
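
To make the selection step concrete, here is a minimal sketch of the idea, not the actual SolrClean code. It assumes a made-up plain-text input of tab-separated "URL, status" lines (the real crawldb is a binary format) and assumes the URL serves as the Solr document id. The helper names `xml_escape` and `build_deletes` are hypothetical. For every DB_GONE entry it prints a Solr XML delete message.

```shell
#!/bin/sh
# Hypothetical helper: escape the characters that are special in XML text.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Hypothetical helper: read "URL<TAB>status" lines on stdin (an assumed
# format) and print one Solr XML delete message per DB_GONE entry.
build_deletes() {
    while IFS="$(printf '\t')" read -r url status; do
        if [ "$status" = "DB_GONE" ]; then
            printf '<delete><id>%s</id></delete>\n' "$(xml_escape "$url")"
        fi
    done
}

# Example input: one page that is still live and one that is gone.
printf 'http://example.com/a\tDB_FETCHED\nhttp://example.com/old\tDB_GONE\n' | build_deletes
```

Only the second URL produces a delete message; entries with any other status are left alone.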

Usage:

{{{ 
bin/nutch solrclean <crawldb> <solrurl>
}}}

'''<crawldb>''': The path to a crawldb directory. It is scanned for URLs that 
returned 404 so that the Solr index can be updated accordingly.

'''<solrurl>''': The URL of the Solr instance from which the 404 pages should 
be removed, e.g. ''http://localhost:8983/solr/''
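
As an illustration of how the two arguments fit together, the sketch below checks both values and then prints the resulting command line instead of running it. The function name `solrclean_dry_run` and the path `crawl/crawldb` are assumptions made for this example, and the checks are an optional convenience rather than something Nutch requires.

```shell
#!/bin/sh
# Hypothetical wrapper: validate both arguments, then print the solrclean
# command line. Remove the "echo" to run it from the Nutch home directory.
solrclean_dry_run() {
    crawldb=$1
    solrurl=$2
    # The first argument must be an existing directory.
    if [ ! -d "$crawldb" ]; then
        echo "crawldb directory not found: $crawldb" >&2
        return 1
    fi
    # The second argument must look like an HTTP(S) URL.
    case $solrurl in
        http://*|https://*) ;;
        *) echo "not an HTTP(S) URL: $solrurl" >&2; return 1 ;;
    esac
    echo bin/nutch solrclean "$crawldb" "$solrurl"
}

# Example call; crawl/crawldb is an assumed location for the crawldb.
solrclean_dry_run crawl/crawldb http://localhost:8983/solr/ || true
```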


CommandLineOptions
