I am using Nutch 1.0.  I want to perform a 'clean' crawl.  

 

I see the force option in this patch: NUTCH-601v1.0.patch
<https://issues.apache.org/jira/secure/attachment/12375717/NUTCH-601v1.0.patch>

Do I have to make those code changes, or does Nutch 1.0 have another way
to do this?

 

Also, every time I run another crawl, I see the same file being fetched
over and over again. Is the same URL being appended to the fetch list
each time?

 

Thanks,

- Vijaya

 

 

Vijaya Peters
SRA International, Inc.
4350 Fair Lakes Court North
Room 4004
Fairfax, VA  22033
Tel:  703-502-1184

www.sra.com <http://www.sra.com/> 

 
