So the 'db' folder is never used when searching. Interesting.
The 'segments' folder is what gets used at run time.

On 3/30/06, Aled Jones <[EMAIL PROTECTED]> wrote:
> Hi Dan
>
> I'll presume you've done the crawls already.
>
> Each resulting crawl folder should have 3 folders: db, index and
> segments.
>
> Create your search.dir folder and create a segments folder in that.
>
> Each segments folder in each crawl folder should contain folders with
> timestamps as the names.  Copy the contents of:
>
> crawlA/segments
> crawlB/segments
> crawlC/segments
>
> (i.e. the folders with timestamps as names) into:
>
> search.dir/segments
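A minimal sketch of the copy step as a POSIX shell function, assuming the crawlA/crawlB/crawlC layout described above (the function name collect_segments is hypothetical, not part of Nutch). It copies every timestamped folder and, rather than overwriting, skips with a warning any name that already exists in search.dir/segments:

```shell
# Hypothetical helper (not a Nutch command): copy the timestamped segment
# folders of each crawl into DEST/segments.
# Usage: collect_segments DEST CRAWL...
collect_segments() {
  dest="$1"
  shift
  mkdir -p "$dest/segments" || return 1
  for crawl in "$@"; do
    for seg in "$crawl"/segments/*; do
      # Skip stray files, and the literal pattern if the glob matched nothing.
      [ -d "$seg" ] || continue
      name=$(basename "$seg")
      # Two crawls could in principle share a timestamp; never overwrite.
      if [ -e "$dest/segments/$name" ]; then
        echo "warning: $dest/segments/$name already exists, skipping" >&2
        continue
      fi
      # No trailing slash on the source, so cp copies the folder itself.
      cp -R "$seg" "$dest/segments/$name" || return 1
    done
  done
}
```

Run from the directory that holds the crawls, this would be `collect_segments search.dir crawlA crawlB crawlC`.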
>
> Next, delete the duplicates from the segments by running the command:
>
> bin/nutch dedup -local search.dir/segments
>
> Then you need to merge the segments to create an index folder, so run
> the command:
>
> bin/nutch merge -local search.dir/index search.dir/segments/*
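Run from the Nutch install directory, the two commands above can be chained so that the merge only runs if dedup succeeded. A sketch that reproduces the commands exactly as given; the function name dedup_and_merge and the NUTCH override variable are hypothetical conveniences, not Nutch options:

```shell
# Hypothetical helper: run dedup, then merge, on the segments under DIR.
# NUTCH defaults to bin/nutch; set it to use a different launcher path.
# Usage: dedup_and_merge DIR
dedup_and_merge() {
  dir="$1"
  nutch="${NUTCH:-bin/nutch}"
  # Same two commands as above, with search.dir replaced by "$dir".
  "$nutch" dedup -local "$dir/segments" || return 1
  "$nutch" merge -local "$dir/index" "$dir"/segments/* || return 1
}
```

For example, `dedup_and_merge search.dir`. Stopping on the first failure keeps a failed dedup from being silently followed by a merge over duplicated segments.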
>
> You should now have two folders in your search.dir:
> search.dir/segments
> search.dir/index
>
> That's all you need for serving pages (the db folder is only used when
> fetching).
>
> Now just set the searcher.dir property in nutch-site.xml to the
> location of search.dir.
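For reference, searcher.dir is set with an ordinary property element placed inside the root element of conf/nutch-site.xml. A small sketch that prints that element for a given path; the function name and the example path /data/nutch/search.dir are hypothetical. An absolute path avoids depending on the directory the servlet container was started from:

```shell
# Hypothetical helper: print the searcher.dir property for nutch-site.xml.
# Paste the output inside the root element of conf/nutch-site.xml.
# Assumes the path contains no XML special characters such as & or <.
# Usage: searcher_dir_property /absolute/path/to/search.dir
searcher_dir_property() {
  cat <<EOF
<property>
  <name>searcher.dir</name>
  <value>$1</value>
</property>
EOF
}
```

For example, `searcher_dir_property /data/nutch/search.dir`.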
>
> That's how I've been doing it, although it may not be the "right" way.
> :-) Hope this helps.
>
> Cheers
> Aled
>
>
> > -----Original Message-----
> > From: Dan Morrill [mailto:[EMAIL PROTECTED]
> > Sent: 29 March 2006 18:06
> > To: nutch-user@lucene.apache.org
> > Cc: [EMAIL PROTECTED]
> > Subject: Multiple crawls how to get them to work together
> >
> > Hi folks,
> >
> > I have 3 crawls: crawlA, crawlB, and crawlC. I would like all
> > of them to be available to the search.jsp page.
> >
> > I went through the site, saw merge, index, and make new db, and
> > followed all the directions I could find, but still have no
> > resolution on this one. So what I need are some ideas on where
> > to proceed from here. I intend to have 2 or 3 boxes each make
> > a crawl, then somehow merge the crawls together and form a
> > "master" under search.dir. I would also want to update this
> > master on a regular basis.
> >
> > Unfortunately, the instructions to date have all been tried,
> > and have all led to the idea not working. There are also no
> > indexmerger or indexsegments directives in Nutch 0.7.1. Any
> > support ideas, direct pointers, or even step-by-step
> > instructions on how to do this would be appreciated (beyond
> > what is in the tutorials, because that has been tried already,
> > including support ideas from the user web mail list).
> >
> > Cheers/r/dan