Hi,

I was able to get Nutch to crawl my company's intranet and set up a
search webapp without much trouble. However, I have some questions about
maintaining that web app.

I'd like to be able to update the crawl periodically (probably nightly)
with minimal fuss. I found two bash scripts for updating a crawl:

    http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html
    http://wiki.apache.org/nutch/Nutch_-_The_Java_Search_Engine

However, both failed when I tried to use them; apparently they use Nutch
commands that are no longer supported in 0.9. Simple modifications of
the scripts didn't help much. So...

Question #1: How do I update a previous crawl with Nutch 0.9? Does
anyone have an updated version of the bash scripts in the links above?
Or does Nutch now do an all-in-one recrawl and I just haven't found the
documentation yet?

The second question is about refreshing my webapp.

The java.net article above says that

    Even with the re-crawl script, we have a problem with updating the
    live search index. As mentioned above, the |NutchBean| class opens
    the index to search when it is initialized. Since the Nutch web app
    caches the |NutchBean| in the application servlet context, updates
    to the index will never be picked up as long as the servlet
    container is running. This problem is recognized by the Nutch
    community, so it will likely be fixed in an upcoming release (Nutch
    0.7.1 was the stable release at the time of writing).
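If I'm reading the article right, the problem is a cache-forever pattern:
the bean is built once and never reopened. Below is a minimal sketch of the
kind of change I'd consider making to the webapp if there's no built-in fix:
wrap the cached object so it is reopened whenever a marker file changes.
ReloadingHolder and the "done" marker file are my own hypothetical names,
not part of Nutch, and the opener function only stands in for however the
webapp actually constructs its |NutchBean|.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.function.Function;

/**
 * Hypothetical sketch, not Nutch's API: caches an expensive object opened
 * from an index directory and reopens it when a marker file's modification
 * time changes, instead of holding the first instance forever.
 */
final class ReloadingHolder<T> {
    private final Path marker;              // assumed: touched by the recrawl script when it finishes
    private final Function<Path, T> opener; // stand-in for constructing the searcher/NutchBean
    private T current;
    private FileTime openedAt;

    ReloadingHolder(Path marker, Function<Path, T> opener) {
        this.marker = marker;
        this.opener = opener;
    }

    /** Returns the cached instance, reopening it if the marker file changed. */
    public synchronized T get() throws IOException {
        FileTime stamp = Files.getLastModifiedTime(marker);
        if (current == null || !stamp.equals(openedAt)) {
            // Note: the old instance is simply dropped here; a real version
            // would need to close it to release the open index files.
            current = opener.apply(marker.getParent());
            openedAt = stamp;
        }
        return current;
    }
}
```

The recrawl script would then only need to touch the marker file after
swapping in the new index, and the next search request would pick it up
without restarting the servlet container.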

Question #2: Has this issue been resolved in Nutch 0.9? What's the
easiest way to get the 0.9 webapp to pick up changes to a crawl? I'm
comfortable monkeying around with the webapp a bit if necessary, but if
there is a simple way of updating the web app, I'd prefer that.


Thanks for any help,

Michael

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
