Does the webapp cache the whole crawldb? If so, can anyone please tell me where it caches it? I don't think it is possible to hold all of it in RAM. Is it cached in some location on the hard disk? Please clarify this point.

On 5/27/07, Enzo Michelangeli <[EMAIL PROTECTED]> wrote:
----- Original Message -----
From: "Manoharam Reddy" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Saturday, May 26, 2007 6:23 PM

> After I create the crawldb after running bin/nutch crawl, I start my
> Tomcat server. It gives proper search results.
>
> What I am wondering is that even after I delete, the 'crawl' folder,
> the search page still gives proper search results. How is this
> possible? Only after I restart the Tomcat server, it stops giving
> results.

The webapp seems to cache data. I have a related problem: updates to the indexes are only noticed after restarting Tomcat (so I have scheduled a nightly cron job to do that).

Question for the Ones Who Know: in "bin/nutch mergesegs", can I use the same directory for input and output? For example:

    bin/nutch mergesegs crawl/segments -dir crawl/segments

Same for mergedb: can I issue:

    bin/nutch mergedb crawl/crawldb crawl/crawldb

At present I pass through temporary directories, and then I switch them in place of the old ones with a couple of "mv", but I don't know if that's necessary, or may even be harmful (for example, leaving the webapp, unaware of the "mv", pointing to the inode of the old directory).

And I noticed that "bin/nutch mergedb" does not create the output directory until it's done, so I wonder if the explicit use of a temporary directory in my scripts is redundant.

Enzo
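
The inode concern above can be tried out in isolation. The following is only a sketch of general Unix file semantics, not of anything confirmed about the Nutch webapp: it assumes a long-running process keeps a file open while the directory is swapped with "mv" and the old copy is deleted. The scratch paths and the file name "data" are made up for the illustration, and the behaviour shown is what a typical local Linux or Unix filesystem does.

```shell
#!/bin/sh
# Hypothetical scratch layout; none of these paths come from Nutch itself.
tmp=$(mktemp -d)
mkdir "$tmp/crawl"
echo "old index" > "$tmp/crawl/data"

# Hold the file open on descriptor 3, as a long-running process might.
exec 3< "$tmp/crawl/data"

# Build a replacement directory, swap it in with mv, then delete the old one.
mkdir "$tmp/crawl.new"
echo "new index" > "$tmp/crawl.new/data"
mv "$tmp/crawl" "$tmp/crawl.old"
mv "$tmp/crawl.new" "$tmp/crawl"
rm -rf "$tmp/crawl.old"

# The descriptor opened before the swap still refers to the old file.
read -r held <&3
# A fresh open by path resolves to the new file.
read -r fresh < "$tmp/crawl/data"

echo "held descriptor: $held"
echo "fresh open:      $fresh"

# Close the descriptor and clean up the scratch directory.
exec 3<&-
rm -rf "$tmp"
```

If the webapp behaves like the descriptor holder here, that would be consistent with both observations: results surviving the deletion of the 'crawl' folder, and index updates going unnoticed until Tomcat is restarted. Whether it actually does is an assumption that someone familiar with the webapp's internals would need to confirm.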
