Hi,

I am using NutchWAX, which uses Nutch v0.7,
together with Heritrix and WERA, for a web archive
system.

Since we are archiving the websites that we crawl,
storage is a concern. Which files inside the Index
folder can be deleted? By trial and error, I found
that search and retrieval in WERA still work without
the following folders:
webdb, segment-*-indexs, segment-*-parse_data, and
segment-*-fetcher.
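
To show how much space is at stake, here is a rough sketch of how
the size of those folders could be measured before anything is
removed. The folder patterns are the ones listed above; the function
names and the example path are only illustrative assumptions and are
not part of NutchWAX or Nutch.

```python
import fnmatch
import os

# Folder-name patterns copied from the list above. Whether they match
# the real on-disk layout is an assumption; adjust as needed.
CANDIDATE_PATTERNS = [
    "webdb",
    "segment-*-indexs",
    "segment-*-parse_data",
    "segment-*-fetcher",
]


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def candidate_sizes(index_dir, patterns=CANDIDATE_PATTERNS):
    """Map each matching top-level folder name to its size in bytes."""
    sizes = {}
    for entry in sorted(os.listdir(index_dir)):
        full = os.path.join(index_dir, entry)
        if os.path.isdir(full) and any(
            fnmatch.fnmatch(entry, pattern) for pattern in patterns
        ):
            sizes[entry] = dir_size(full)
    return sizes
```

Calling candidate_sizes("/path/to/Index") (a hypothetical path) only
reads the folders and reports sizes; it deletes nothing.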

I hope someone can advise me whether what I am doing
is correct.

Best Regards,
Alexis Artes
