Peters, Vijaya wrote:
> I am using Nutch 1.0. I want to perform a 'clean' crawl.
>
> I see the force option in this patch: NUTCH-601v1.0.patch
> <https://issues.apache.org/jira/secure/attachment/12375717/NUTCH-601v1.0.patch>
>
> Do I have to make those code changes, or does Nutch 1.0 have another way
> to do this?
>
> Also, every time I do another crawl, I see the same file being fetched
> over and over again. Is it appending the same URL over and over to the
> fetch list?

Which file? You can check the crawl date of this file with

reinh...@thord:>bin/nutch readdb <crawldb> -url <url>

> Thanks,
>
> - Vijaya
