> Anybody know how to delete an index document in a distributed search 
> server?  Is that even possible?

I will assume that by "index document" you mean
a document that has already been indexed.
If it has not been indexed yet, just delete it and forget about it.

When we need to remove a document, we filter it
out of the crawl data and rebuild the index,
using the following procedure:

1. build temporary nutch configuration directory
     build special filter files based on document(s) to be filtered out
     point NUTCH_CONF_DIR env var to temporary nutch configuration directory
2. run bin/nutch mergedb $NEWCRAWLDBDIR $CRAWLDBDIR -filter
3. run bin/nutch mergesegs $NEWSEGMENTSDIR -dir $SEGMENTSDIR -filter
4. run bin/nutch mergelinkdb $NEWLINKDBDIR $LINKDBDIR -filter
5. run the standard set of commands to rebuild the index:
     bin/nutch index $NEWINDEXESDIR $CRAWLDBDIR $LINKDBDIR $NEWSEGLIST
     bin/nutch dedup $NEWINDEXESDIR
     bin/nutch merge -workingdir $NUTCHTMPDIR $NEWINDEXDIR $NEWINDEXESDIR
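
As a rough illustration of step 1 feeding steps 2-5, here is a minimal
shell sketch. It rests on assumptions that are not part of the procedure
above: that the active URL filter is the regex one and reads a file named
regex-urlfilter.txt from the configuration directory, and that replacing
that file with "reject the listed URLs, accept everything else" is what
you want. The helper names (escape_url, build_filter_conf,
rebuild_without_urls) are made up for this example, and the directory
variables are the same ones used in the steps above.

```shell
#!/bin/sh
# Sketch only. Assumed, not taken from the procedure above:
#   - the regex URL filter is enabled and reads regex-urlfilter.txt
#   - all helper function names are hypothetical

# Escape regex metacharacters so a URL is matched literally.
escape_url() {
    printf '%s\n' "$1" | sed 's/[][\.*^$+?(){}|]/\\&/g'
}

# Step 1: copy the normal configuration into a temporary directory and
# write a filter file that rejects each listed URL and accepts the rest.
#   $1 = existing conf dir, $2 = temporary conf dir, $3 = file of URLs
build_filter_conf() {
    conf_src=$1
    conf_tmp=$2
    url_list=$3
    mkdir -p "$conf_tmp"
    cp -R "$conf_src"/. "$conf_tmp"/
    {
        while IFS= read -r url; do
            if [ -n "$url" ]; then
                printf '%s\n' "-^$(escape_url "$url")\$"
            fi
        done < "$url_list"
        echo '+.'
    } > "$conf_tmp/regex-urlfilter.txt"
}

# Steps 2-5: the same commands as listed above, run with NUTCH_CONF_DIR
# pointing at the temporary configuration. Stops at the first failure.
#   $1 = temporary conf dir
rebuild_without_urls() {
    NUTCH_CONF_DIR=$1
    export NUTCH_CONF_DIR
    bin/nutch mergedb "$NEWCRAWLDBDIR" "$CRAWLDBDIR" -filter &&
    bin/nutch mergesegs "$NEWSEGMENTSDIR" -dir "$SEGMENTSDIR" -filter &&
    bin/nutch mergelinkdb "$NEWLINKDBDIR" "$LINKDBDIR" -filter &&
    # NEWSEGLIST is left unquoted on purpose: it holds several segment paths.
    bin/nutch index "$NEWINDEXESDIR" "$CRAWLDBDIR" "$LINKDBDIR" $NEWSEGLIST &&
    bin/nutch dedup "$NEWINDEXESDIR" &&
    bin/nutch merge -workingdir "$NUTCHTMPDIR" "$NEWINDEXDIR" "$NEWINDEXESDIR"
}
```

Typical use would be something like
build_filter_conf conf /tmp/nutch-conf-filter urls-to-remove.txt
followed by rebuild_without_urls /tmp/nutch-conf-filter, run from the
nutch install directory. If your normal regex-urlfilter.txt carries rules
you still need, append the reject lines to the top of a copy instead of
replacing the file.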

The variable names should be self-explanatory.  If not,
just let me know.

JohnM

-- 
john mendenhall
[EMAIL PROTECTED]
surf utopia
internet services
