Aled,

I'll try that today. Excellent, and thanks for the heads-up on the db
directory. I'll let you know how it goes.

r/d



-----Original Message-----
From: Aled Jones [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 30, 2006 12:24 AM
To: nutch-user@lucene.apache.org
Subject: RE: Multiple crawls how to get them to work together

Hi Dan

I'll assume you've done the crawls already.

Each resulting crawl folder should contain three folders: db, index and
segments.

Create your search.dir folder and create a segments folder inside it.

Each segments folder in each crawl folder should contain folders with
timestamps as the names.  Copy the contents of:

crawlA/segments
crawlB/segments
crawlC/segments

(i.e. the folders with timestamps as names) into:

search.dir/segments
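A minimal shell sketch of the copy step above, assuming a POSIX shell and the crawl folder names used in this thread. The function name gather_segments is hypothetical (it is not part of Nutch). It copies only the timestamp-named folders and, as an extra precaution not mentioned above, skips any segment whose name already exists in the destination instead of overwriting it.

```shell
#!/bin/sh
# Hypothetical helper (not a Nutch command).
# Usage: gather_segments SEARCH_DIR CRAWL_DIR...
# Copies every timestamp-named folder from CRAWL_DIR/segments into
# SEARCH_DIR/segments, creating SEARCH_DIR/segments if needed.
gather_segments() {
  dest="$1/segments"
  shift
  mkdir -p "$dest"
  for crawl in "$@"; do
    for seg in "$crawl"/segments/*; do
      # Skip non-directories, including an unmatched glob.
      [ -d "$seg" ] || continue
      name=$(basename "$seg")
      if [ -e "$dest/$name" ]; then
        # Same timestamp already present: warn and keep the existing copy.
        echo "segment $name already exists in $dest, skipping" >&2
        continue
      fi
      cp -R "$seg" "$dest/"
    done
  done
}
```

For example, running `gather_segments search.dir crawlA crawlB crawlC` from the folder that holds the crawls, then continuing with the dedup step.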

Next, delete the duplicates from the segments by running the command:

bin/nutch dedup -local search.dir/segments

Then you need to merge the segments to create an index folder, so run
the command:

bin/nutch merge -local search.dir/index search.dir/segments/*

You should now have two folders in your search.dir:
search.dir/segments
search.dir/index

That's all you need for serving pages (db folder is only used when
fetching).
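A quick sanity check of that layout can be sketched as another hypothetical POSIX shell helper, check_search_dir, written only for illustration. It checks for segments and index and deliberately ignores db, since db is only used when fetching.

```shell
#!/bin/sh
# Hypothetical helper (not a Nutch command).
# Usage: check_search_dir SEARCH_DIR
# Returns 0 if SEARCH_DIR has both segments and index folders, 1 otherwise.
check_search_dir() {
  for sub in segments index; do
    if [ ! -d "$1/$sub" ]; then
      echo "missing $1/$sub" >&2
      return 1
    fi
  done
  echo "$1 has segments and index"
}
```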

Now just set the searcher.dir property in nutch-site.xml to the
location of search.dir.

That's how I've been doing it, although it may not be the "right" way.
:-) Hope this helps.

Cheers
Aled


> -----Original Message-----
> From: Dan Morrill [mailto:[EMAIL PROTECTED] 
> Sent: 29 March 2006 18:06
> To: nutch-user@lucene.apache.org
> Cc: [EMAIL PROTECTED]
> Subject: Multiple crawls how to get them to work together
> 
> Hi folks,
>
> I have 3 crawls, crawlA, crawlB, and crawlC. I would like all 
> of them to be available to the search.jsp page. 
>
> I went through the site, saw the merge, index, and make-new-db
> instructions, and followed all the directions I could find, but this
> is still unresolved. So what I need are some ideas on where to go from
> here. I intend to have 2 or 3 boxes each make a crawl, then somehow
> merge the crawls together to form a "master" under search.dir. I would
> also want to update this master on a regular basis.
>
> Unfortunately, all the instructions to date have been tried, and none
> of them has made this work. There is also no indexmerger or
> indexsegments directive in Nutch 0.7.1. Any support ideas, direct
> pointers, or even step-by-step instructions on how to do this would be
> welcome (outside of what is in the tutorials, because that has been
> tried already, including suggestions from the user mailing list).
>
> Cheers/r/dan