Hi,

I am using NutchWAX, which uses Nutch v0.7,
together with Heritrix and WERA for a web archive
system.

Since we are archiving the websites that we crawl,
storage is a concern. Which files inside the Index
folder can safely be deleted? Through trial and error,
I found that search and retrieval in WERA still work
without the following folders:
webdb, segment-*-indexs, segment-*-parse_data, and
segment-*-fetcher.

I hope someone can advise me whether what I am doing
is correct.

Best Regards,
Alexis Artes
