I am by no means a Nutch expert yet, but this is how I merged two
separate segments so I could search through them:

Step 1: 
$ bin/nutch mergesegs -local -o testmerge -i \
    ../crawls/foo/segments/20051018224434/ \
    ../crawls/bar/segments/20051018225505/
< bunch of stuff happens >

This creates a segment 20051023112848 in the testmerge folder. The
segment contains a combined index as well as copies of all information
from the two input segments. 

Step 2:
That alone wasn't enough to search with, however. I copied the index
folder up to the top of testmerge and arranged the directories in the
same layout a crawl produces (an index directory next to a segments
directory). After that, the Tomcat searcher ran against the new segment.

After the reorganization, testmerge looks like this:

$ ls -l testmerge/
total 0
drwxrwxrwx+ 2 Oct 23 11:42 index
drwxrwxrwx+ 3 Oct 23 11:42 segments

$ ls -l testmerge/segments/
total 0
drwxrwxrwx+ 7 Oct 23 11:28 20051023112848 
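If you want to script Step 2, here is a rough POSIX shell sketch of the
same reshuffle. It assumes, as described in Step 1, that the merged
segment sits directly under the output folder (e.g.
testmerge/20051023112848) and holds its combined index in an "index"
subfolder. The function name reorganize_merge is made up for this
example and is not part of Nutch.

```shell
# Hypothetical helper (not a Nutch command): rearrange the output of
# "bin/nutch mergesegs -o DIR" into the crawl-style layout:
#   DIR/index            <- copy of the merged segment's index
#   DIR/segments/<name>  <- the merged segment itself
# Assumption: DIR/<name>/index exists after the merge.
reorganize_merge() {
  out_dir=$1
  seg_name=$2
  seg_path="$out_dir/$seg_name"

  if [ ! -d "$seg_path/index" ]; then
    echo "error: $seg_path/index not found" >&2
    return 1
  fi
  if [ -e "$out_dir/index" ]; then
    echo "error: $out_dir/index already exists" >&2
    return 1
  fi

  mkdir -p "$out_dir/segments"
  # Copy (not move) the combined index to the top level, so the
  # segment keeps its own copy as well.
  cp -r "$seg_path/index" "$out_dir/index"
  # Move the segment under segments/, as a crawl would have it.
  mv "$seg_path" "$out_dir/segments/$seg_name"
}
```

With the names from above: reorganize_merge testmerge 20051023112848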


Step 3:
Then point searcher.dir at the testmerge folder in the nutch-site.xml
file used by the Nutch web app in Tomcat:

<nutch-conf>
        <property>
                <name>searcher.dir</name>
                <value>C:\path_to_testmerge\testmerge</value>
        </property>
</nutch-conf>

Run Tomcat and search away.

Hope this helps,
-Graham

> -----Original Message-----
> From: AJ Chen [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, October 25, 2005 4:03 PM
> To: nutch-dev@lucene.apache.org
> Subject: merge indices from multiple webdb
> 
> Has anyone merged indices from two separate webdb? I have two 
> separate webdb and need to find a good way to combine them 
> for unified search.
> AJ
> 
