Hi,

I am not sure whether creating a new NutchBean/IndexSearcher for each
search would have any performance implications (would it defeat any
disk/memory/query caching used to speed up searches?).

Maybe I could periodically check whether a new index is available and
replace the cached NutchBean (I think the JSP caches it in the
application context) with one using the new index. But is there a good
way to ensure that the old NutchBean instance and its corresponding
IndexSearcher are closed properly once the searches still using them
have finished? (Presumably the objects would be garbage collected and
the files closed once all users were done?)
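Roughly the kind of swap I have in mind is sketched below. This is only a
self-contained sketch: since NutchBean has no public close method today, I
use a hypothetical stand-in Searcher interface in place of the real
NutchBean/IndexSearcher, and reference-count each active searcher so the
retired one is closed only after the last in-flight search releases it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Stand-in for the real NutchBean/IndexSearcher pair (hypothetical: the
// real bean would open the index files on construction and close them
// in close()).
interface Searcher {
    String search(String query);
    void close();
}

// Reference-counted wrapper: the searcher is closed only when the last
// in-flight search releases it AND it is no longer the "current" one.
final class CountedSearcher {
    final Searcher searcher;
    private final AtomicInteger refs = new AtomicInteger(1); // the "current" reference

    CountedSearcher(Searcher s) { searcher = s; }

    // Returns false if this instance was already retired and fully released;
    // the caller should re-fetch the current searcher and try again.
    boolean tryAcquire() {
        int r;
        do {
            r = refs.get();
            if (r == 0) return false;
        } while (!refs.compareAndSet(r, r + 1));
        return true;
    }

    void release() {
        if (refs.decrementAndGet() == 0) searcher.close();
    }
}

// What the application context would hold instead of a bare NutchBean.
final class SearcherHolder {
    private final AtomicReference<CountedSearcher> current = new AtomicReference<>();

    // Called by the periodic index check when a new index directory appears.
    void swap(Searcher fresh) {
        CountedSearcher old = current.getAndSet(new CountedSearcher(fresh));
        if (old != null) old.release(); // drop the "current" ref; closes when idle
    }

    // Called once per search request.
    CountedSearcher acquireCurrent() {
        for (;;) {
            CountedSearcher cs = current.get();
            if (cs == null) return null;          // no index installed yet
            if (cs.tryAcquire()) return cs;       // else a swap just happened; retry
        }
    }
}
```

Each request would then acquire, search, and release in a finally block, so
the old index's files get closed as soon as the last search against it
finishes, without blocking new searches against the new index.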

Thanks for the information :)


On 1/30/06, David Wallace <[EMAIL PROTECTED]> wrote:
> Hello Chun Wei,
>
> Updating indexes and segments on the server is something I would have
> expected just about every Nutch installation would need to do.  It's a
> shame that Nutch doesn't handle this particularly well.
>
> For my installation, I've modified the JSP stuff so that a new
> NutchBean is created for every search.  The performance is not as bad as
> you might think, because we have fast machines and our segments are not
> too big.  Then all we do is copy our new crawl data into a new directory
> on the search machine; and rename this directory when we're ready to use
> it.  So the downtime is just the time it takes to rename a directory.
>
> There's a small trap with this approach.  The NutchBean doesn't close
> its files after it's done its search.  Therefore, if you create a new
> NutchBean every time, you'll eventually get an exception.  It's
> something like "too many open file handles"; I can't remember the exact
> wording.  You need to add a closeFiles() method to the NutchBean, and
> call it after the search.
>
> Hope this helps,
> David.
>
>
> Date: Fri, 27 Jan 2006 12:03:13 +0800
> From: Chun Wei Ho <[EMAIL PROTECTED]>
> To: [email protected]
> Subject: [Nutch-general] Updating the search index
> Reply-To: [EMAIL PROTECTED]
>
> Hi,
>
> We are running a nutch crawler on one machine and a web search
> application searching an index (using NutchBean) on another.
> Periodically we would like to copy the updated crawl index from the
> crawl machine to replace the one used by the search application,
> without resulting in any broken queries or search application
> downtime.
>
> Could anyone give us some pointers on how they are doing this or
> setting it up? Thanks :)
>
>


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
