How do you build a new webdb from the merged segment/index? Could you provide detailed steps for the process you described? Thanks.
AJ

On 10/25/05, Andrey Ilinykh <[EMAIL PROTECTED]> wrote:
>
> If you merge two segments page ranks are off. You have to build new webdb,
> calculate page rank and then build one more segment again.
>
> Thank you,
> Andrey
>
> -----Original Message-----
> From: AJ Chen [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, October 25, 2005 2:02 PM
> To: nutch-dev@lucene.apache.org
> Subject: Re: merge indices from multiple webdb
>
>
> Thanks so much, Graham. This should do it.
> A related question: After the merge, is it possible to build the new webdb
> as well? The link data for the merged db can be different from the two
> original db. In order to have accurate page ranking, the link data should
> be updated.
>
> AJ
>
> On 10/25/05, Graham Stead <[EMAIL PROTECTED]> wrote:
> >
> > I am by no means a Nutch expert yet, but this is how I merged two
> > separate segments so I could search through them:
> >
> > Step 1:
> > $ bin/nutch mergesegs -local -o testmerge -i
> > ../crawls/foo/segments/20051018224434/
> > ../crawls/bar/segments/20051018225505/
> > < bunch of stuff happens >
> >
> > This creates a segment 20051023112848 in the testmerge folder. The
> > segment contains a combined index as well as copies of all information
> > from the two input segments.
> >
> > Step 2:
> > This wasn't quite enough to search with, however. I copied the index
> > folder and organized the directories into the same structure as used
> > during a crawl, then was able to run the Tomcat searcher on the new
> > segment.
> >
> > After copying/moving/reorganizing I have:
> >
> > $ ls -l testmerge/
> > total 0
> > drwxrwxrwx+ 2 Oct 23 11:42 index
> > drwxrwxrwx+ 3 Oct 23 11:42 segments
> >
> > $ ls -l testmerge/segments/
> > total 0
> > drwxrwxrwx+ 7 Oct 23 11:28 20051023112848
> >
> >
> > Step 3:
> > Then place this in Tomcat's nutch-site.xml file:
> >
> > <nutch-conf>
> > <property>
> > <name>searcher.dir</name>
> > <value>C:\path_to_testmerge\testmerge</value>
> > </property>
> > </nutch-conf>
> >
> > Run Tomcat and search away.
> >
> > Hope this helps,
> > -Graham
> >
> > > -----Original Message-----
> > > From: AJ Chen [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday, October 25, 2005 4:03 PM
> > > To: nutch-dev@lucene.apache.org
> > > Subject: merge indices from multiple webdb
> > >
> > > Has anyone merged indices from two separate webdb? I have two
> > > separate webdb and need to find a good way to combine them
> > > for unified search.
> > > AJ
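
For reference, the reorganizing in Graham's Step 2 could be sketched roughly as below. This is only a guess at the manual steps, not something confirmed in the thread: it assumes the merged segment lands directly under the mergesegs output folder and that the index folder being copied is the one inside that segment. The function name reorganize_merged_segment is made up for illustration.

```shell
#!/bin/sh
# Sketch only: rearrange a merged segment into the crawl-style layout
# shown in Step 2 (top-level index/ plus segments/<name>/).
# ASSUMPTIONS (not confirmed in the thread): the merged segment sits
# directly under the output folder and contains an "index" subfolder.

reorganize_merged_segment() {
    merge_dir=$1   # output folder given to mergesegs, e.g. testmerge
    segment=$2     # name of the merged segment, e.g. 20051023112848

    # Refuse to run if a top-level index already exists, so that
    # cp does not nest a second index folder inside it.
    if [ -e "$merge_dir/index" ]; then
        echo "error: $merge_dir/index already exists" >&2
        return 1
    fi

    # Move the merged segment under segments/, as in a normal crawl folder.
    mkdir -p "$merge_dir/segments" || return 1
    mv "$merge_dir/$segment" "$merge_dir/segments/$segment" || return 1

    # Copy the segment's index folder to the top level for the searcher.
    cp -R "$merge_dir/segments/$segment/index" "$merge_dir/index" || return 1
}
```

With the names from the thread, the call would be `reorganize_merged_segment testmerge 20051023112848`, after which searcher.dir can point at the testmerge folder as in Step 3.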