Hello - I asked this on nutch-user but didn't get a response.  I am
using nutch-0.8.  I would like to fetch a few segments each night,
then update one large index.  Is it safe to run index on a group of
segments, then run index again on a different group of segments, then
merge?  I haven't found where this procedure is documented.  I would
like to do something like this:

Assume I have four segments.  I'll label them s0, s1, s2 and s3 instead
of using their timestamp names.

The first night I would index s0, s1 and rename the index to "A":
  nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/s0 crawl/segments/s1
  mv crawl/indexes/part-00000 crawl/indexes/A

Then on the second night I would index s2, s3 and rename the index to "B":
  nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/s2 crawl/segments/s3
  mv crawl/indexes/part-00000 crawl/indexes/B

Finally I would merge the two:
  nutch merge crawl/index crawl/indexes
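
To make the intended sequence concrete, here is a rough dry-run sketch of it as a shell script.  It only prints the commands listed above and does not run nutch.  The helper name nightly_cmds and its label argument are made up for illustration, and whether the sequence itself is safe is exactly what I'm asking about.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for one night's indexing run.
# nightly_cmds is a hypothetical helper, not part of nutch.
# Usage: nightly_cmds <index-label> <segment> [<segment> ...]
nightly_cmds() {
  label=$1
  shift
  segs=""
  # Build the list of segment paths under crawl/segments.
  for s in "$@"; do
    segs="$segs crawl/segments/$s"
  done
  echo "nutch index crawl/indexes crawl/crawldb crawl/linkdb$segs"
  echo "mv crawl/indexes/part-00000 crawl/indexes/$label"
}

nightly_cmds A s0 s1      # first night
nightly_cmds B s2 s3      # second night
echo "nutch merge crawl/index crawl/indexes"   # final merge
```

In a real run the segment names would be the timestamp directory names rather than s0..s3.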

Is this safe to do?  Is this how you're supposed to crawl nightly? 
Any docs I'm missing on this?  Again, this is all for nutch-0.8, so
some of the docs from 0.7 no longer apply.

Thank you

-- Derek Young

