Hi Nils,
As far as I know, Nutch does not support this feature to date.
If it does, please let me know. I also need Nutch to support this feature;
otherwise I am planning to move to the same setup you used, with wget
and Lucene.
Keep in touch.
./Arun
On 11/7/06, Nils Höller <[EMAIL PROTECTED]> wrote:
Hi,
I've worked with Nutch until last year, and
I am now trying to do something new with it (about continuous queries).
So far I have only used Nutch to build the index and to search in a
generated site map (with the WebDB).
Now I want to use it to build an archive of a certain number of sites.
So I want Nutch to crawl the sites every day (as I used it
before) but also to download and save the REAL content of the sites (all
HTML and pictures), so that I can work with this real content.
Is it possible to make Nutch also save the content as it is
crawled, and not only create the WebDB and the index?
I currently have a solution built from a Perl script, wget, and Lucene, but
it would be perfect if I could use Nutch from now on.
Thanks for your help.
Nils
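
For illustration only, the "save the real content" step of a daily archive like the one described above could look roughly like the sketch below. It is plain Python, not Nutch and not the Perl/wget script from the message; the function names (archive_path, save_raw, fetch) and the directory layout (root/date/host/hash.ext) are assumptions made up for this example.

```python
import hashlib
import tempfile
import urllib.request
from datetime import date
from pathlib import Path
from urllib.parse import urlsplit


def archive_path(root: Path, url: str, day: date) -> Path:
    """Map a URL to a file under root/YYYY-MM-DD/host/ (hypothetical layout)."""
    parts = urlsplit(url)
    # Hash the full URL so query strings and long paths stay unique and filesystem-safe.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:16]
    # Keep the original extension (e.g. .png); fall back to .html for pages like "/".
    suffix = Path(parts.path).suffix or ".html"
    host = parts.netloc.replace(":", "_")
    return root / day.isoformat() / host / f"{digest}{suffix}"


def save_raw(root: Path, url: str, body: bytes, day: date) -> Path:
    """Write the unmodified response bytes into the archive and return the path."""
    target = archive_path(root, url, day)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(body)
    return target


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download one resource as raw bytes (HTML and images are treated alike)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    # Offline demo: store a fake page body instead of hitting the network.
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_raw(Path(tmp), "http://example.org/", b"<html></html>", date.today())
        print(saved)
```

Running something like save_raw(root, url, fetch(url), date.today()) once per day for each URL would give one dated snapshot directory per crawl, which could then be indexed separately.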
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general