----- Original Message ----- 
From: "Manoharam Reddy" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, May 26, 2007 6:23 PM

> After I create the crawldb after running bin/nutch crawl, I start my
> Tomcat server. It gives proper search results.
>
> What I am wondering is that even after I delete the 'crawl' folder,
> the search page still gives proper search results. How is this
> possible? Only after I restart the Tomcat server does it stop giving
> results.

The webapp seems to cache data. I have a related problem: updates to the
indexes are only noticed after restarting Tomcat (so I have scheduled a
nightly cron job to do that).
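For reference, the nightly restart can be scheduled with a crontab entry along
these lines (the Tomcat path and the 3 a.m. time are assumptions for
illustration, not from my actual setup):

```shell
# Restart Tomcat nightly at 03:00 so the webapp re-opens the index files.
# /opt/tomcat is an assumed CATALINA_HOME -- adjust to your installation.
0 3 * * * /opt/tomcat/bin/shutdown.sh && sleep 30 && /opt/tomcat/bin/startup.sh
```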

Question for the Ones Who Know: in "bin/nutch mergesegs", can I use the same
directory for input and output?

For example:

 bin/nutch mergesegs crawl/segments -dir crawl/segments

Same for mergedb: can I issue:

  bin/nutch mergedb crawl/crawldb crawl/crawldb

At present I merge into temporary directories and then swap them in place of
the old ones with a couple of "mv", but I don't know whether that is
necessary, or whether it may even be harmful (for example, leaving the webapp,
unaware of the "mv", pointing to the inode of the old directory). I also
noticed that "bin/nutch mergedb" does not create the output directory until
it's done, so I wonder if the explicit use of a temporary directory in my
scripts is redundant.
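For concreteness, this is a sketch of the temp-directory-and-swap workflow I
described (the directory names and the backup/cleanup steps are just examples
of one way to do it, assuming the stock mergesegs/mergedb invocations):

```shell
#!/bin/sh
# Merge into temporary directories, then swap them into place with mv.
CRAWL=crawl

# mergesegs writes the merged segment(s) to the first argument;
# -dir names the directory of input segments.
bin/nutch mergesegs $CRAWL/segments_tmp -dir $CRAWL/segments
mv $CRAWL/segments $CRAWL/segments_old
mv $CRAWL/segments_tmp $CRAWL/segments
rm -rf $CRAWL/segments_old

# Same pattern for the crawldb: output directory first, inputs after.
bin/nutch mergedb $CRAWL/crawldb_tmp $CRAWL/crawldb
mv $CRAWL/crawldb $CRAWL/crawldb_old
mv $CRAWL/crawldb_tmp $CRAWL/crawldb
rm -rf $CRAWL/crawldb_old
```

The open question above is whether the temporary directories can be dropped
and the merge pointed straight back at its own input.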

Enzo



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
