If you merge two segments, the page ranks will be off. You have to build a
new webdb, calculate page rank, and then build a new segment.

Thank you,
  Andrey

-----Original Message-----
From: AJ Chen [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 25, 2005 2:02 PM
To: [email protected]
Subject: Re: merge indices from multiple webdb


Thanks so much, Graham. This should do it.
A related question: after the merge, is it possible to build a new webdb
as well? The link data for the merged db can differ from that of the two
original dbs. To get accurate page ranking, the link data should be
updated.

AJ

On 10/25/05, Graham Stead <[EMAIL PROTECTED]> wrote:
>
> I am by no means a Nutch expert yet, but this is how I merged two
> separate segments so I could search through them:
>
> Step 1:
> $ bin/nutch mergesegs -local -o testmerge -i \
>     ../crawls/foo/segments/20051018224434/ \
>     ../crawls/bar/segments/20051018225505/
> < bunch of stuff happens >
>
> This creates a segment 20051023112848 in the testmerge folder. The
> segment contains a combined index as well as copies of all information
> from the two input segments.
>
> Step 2:
> This wasn't quite enough to search with, however. I copied the index
> folder and organized the directories into the same structure as used
> during a crawl, then was able to run the Tomcat searcher on the new
> segment.
>
> After copying/moving/reorganizing I have:
>
> $ ls -l testmerge/
> total 0
> drwxrwxrwx+ 2 Oct 23 11:42 index
> drwxrwxrwx+ 3 Oct 23 11:42 segments
>
> $ ls -l testmerge/segments/
> total 0
> drwxrwxrwx+ 7 Oct 23 11:28 20051023112848
>
>
> Step 3:
> Then place this in Tomcat's nutch-site.xml file:
>
> <nutch-conf>
> <property>
> <name>searcher.dir</name>
> <value>C:\path_to_testmerge\testmerge</value>
> </property>
> </nutch-conf>
>
> Run Tomcat and search away.
>
> Hope this helps,
> -Graham
>
> > -----Original Message-----
> > From: AJ Chen [mailto:[EMAIL PROTECTED]]
> > Sent: Tuesday, October 25, 2005 4:03 PM
> > To: [email protected]
> > Subject: merge indices from multiple webdb
> >
> > Has anyone merged indices from two separate webdbs? I have two
> > separate webdbs and need to find a good way to combine them
> > for unified search.
> > AJ
> >
>
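
The directory reorganization in Graham's Step 2 could look roughly like the
sketch below. It runs against a mock layout in a scratch directory, so it is
safe to try. It assumes that mergesegs leaves the new segment directly under
testmerge and that the combined index is a folder named "index" inside that
segment; the thread implies both but does not state them, and the scratch
path is purely hypothetical.

```shell
#!/bin/sh
# Sketch only: reproduces the Step 2 reorganization on a mock layout.
set -eu

# Hypothetical scratch location; substitute the real mergesegs output folder.
work=$(mktemp -d)
merged="$work/testmerge"
seg="20051023112848"   # segment name taken from the thread

# Mock the layout mergesegs is described as producing.
# ASSUMPTION: the combined index is a folder named "index" inside the segment.
mkdir -p "$merged/$seg/index"

# Move the segment under segments/, mirroring a crawl directory.
mkdir -p "$merged/segments"
mv "$merged/$seg" "$merged/segments/$seg"

# Copy the segment's index to the top level, next to segments/.
cp -R "$merged/segments/$seg/index" "$merged/index"

# Show the resulting structure for comparison with the listing in the thread.
ls "$merged"
ls "$merged/segments"
```

The result should match the "ls" listings in Step 2: an index folder and a
segments folder at the top level, with the merged segment under segments.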


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers