Hello Chun Wei,
 
Updating indexes and segments on the server is something I would have
expected just about every Nutch installation to need.  It's a shame
that Nutch doesn't handle this particularly well.
 
For my installation, I've modified the JSP pages so that a new
NutchBean is created for every search.  The performance is not as bad as
you might think, because we have fast machines and our segments are not
too big.  To update, we copy the new crawl data into a new directory on
the search machine and rename that directory when we're ready to use it.
So the downtime is just the time it takes to rename a directory.
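
A minimal sketch of the rename step, assuming the live data sits in one
directory and the freshly copied data in a sibling directory on the same
file system.  The class name CrawlSwap and the three paths are
hypothetical, and java.nio.file is a modern Java API rather than
anything Nutch itself provides:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; not part of Nutch.
final class CrawlSwap {

    // Retires the live crawl directory and puts the staged one in its place.
    // All three paths are assumed to be on the same file system.
    static void swap(Path live, Path staged, Path retired) throws IOException {
        if (Files.exists(retired)) {
            // Refuse to overwrite an earlier retired copy.
            throw new FileAlreadyExistsException(retired.toString());
        }
        if (Files.exists(live)) {
            // ATOMIC_MOVE throws AtomicMoveNotSupportedException instead of
            // silently falling back to a slow copy.
            Files.move(live, retired, StandardCopyOption.ATOMIC_MOVE);
        }
        // Searches that start between the two moves see no live directory;
        // that short gap is the downtime.
        Files.move(staged, live, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

A call might look like CrawlSwap.swap(Paths.get("crawl"),
Paths.get("crawl.new"), Paths.get("crawl.old")), with the retired
directory deleted once no search is still reading from it.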
 
There's a small trap with this approach.  The NutchBean doesn't close
its files once a search is finished.  Therefore, if you create a new
NutchBean every time, you'll eventually get an exception.  It's
something like "too many open file handles"; I can't remember the exact
wording.  To avoid it, add a closeFiles() method to the NutchBean and
call it after each search.
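
The create, search, close pattern can be sketched as follows.  This is
not the real NutchBean API: PerRequestSearcher is a hypothetical
stand-in, its counter stands in for open file handles, and its close()
plays the role of the closeFiles() method described above.  The point
is only that the close call runs even when the search throws:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a searcher created per request; not NutchBean.
final class PerRequestSearcher implements AutoCloseable {

    // Counts instances that are open; stands in for open file handles.
    static final AtomicInteger OPEN = new AtomicInteger();

    PerRequestSearcher() {
        OPEN.incrementAndGet(); // stands in for opening index and segment files
    }

    String search(String query) {
        if (query == null || query.isEmpty()) {
            throw new IllegalArgumentException("empty query");
        }
        return "results for " + query;
    }

    // Plays the role of closeFiles(): releases whatever the constructor opened.
    @Override
    public void close() {
        OPEN.decrementAndGet();
    }

    // One search per instance; try-with-resources calls close() on both the
    // normal path and the exception path.
    static String searchOnce(String query) {
        try (PerRequestSearcher searcher = new PerRequestSearcher()) {
            return searcher.search(query);
        }
    }
}
```

In a JSP page the same effect comes from a try/finally block around the
search, with the close call in the finally part.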
 
Hope this helps,
David.
 
 
Date: Fri, 27 Jan 2006 12:03:13 +0800
From: Chun Wei Ho <[EMAIL PROTECTED]>
To: [email protected]
Subject: [Nutch-general] Updating the search index
Reply-To: [EMAIL PROTECTED]

Hi,

We are running a Nutch crawler on one machine and a web search
application that searches an index using NutchBean on another.
Periodically we would like to copy the updated crawl index from the
crawl machine to replace the one used by the search application,
without causing any broken queries or search application downtime.

Could anyone give us some pointers on how they are doing this or
setting it up? Thanks :)

