Hi all,
 
I am trying to create a recrawl script. It needs to run every night,
go through the generate, fetch, index etc. steps for the sites, pick up
new and updated docs, and update the Nutch DB.
I cannot find the right value for -adddays in the loop below. If I use
30-31, the run takes more than 24 hours. If I use 5, 10, or 15, it
doesn't pick up any new or modified files.
Do I just need to find the right number, or am I going wrong somewhere
else? Any guidance would be a great help.
 
depth=10

# Run the generate/fetch/update cycle $depth times
for ((i=1; i <= depth; i++))
do
  # Generate a fetch list of at most 1000 URLs
  bin/nutch generate crawl/crawldb crawl/segments -topN 1000 -adddays 20

  # Pick the most recently created segment
  segment=$(ls -d crawl/segments/* | tail -1)

  bin/nutch fetch "$segment"
  bin/nutch updatedb crawl/crawldb "$segment"
done
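
In case it helps with diagnosing this, here is a rough sketch of how I
understand -adddays. It is only my mental model, not Nutch's actual
code. I am assuming that generate treats a page as due when its last
fetch time plus the fetch interval is no later than "now plus adddays",
and that the fetch interval is the 30-day default (I have not checked
my config). The is_due function and its arguments are made up for
illustration.

```shell
#!/usr/bin/env bash
# Sketch only: models my understanding of -adddays, not Nutch itself.
# Assumption: a page is due when
#   days_since_fetch + adddays >= fetch_interval_days

# is_due DAYS_SINCE_FETCH INTERVAL_DAYS ADDDAYS
# Exits 0 if the page would be selected under the assumption above.
is_due() {
  local since_fetch=$1 interval=$2 adddays=$3
  (( since_fetch + adddays >= interval ))
}

interval=30   # assumed default fetch interval in days; check your config
for adddays in 5 15 20 31; do
  if is_due 1 "$interval" "$adddays"; then
    echo "adddays=$adddays: page fetched 1 day ago is due"
  else
    echo "adddays=$adddays: page fetched 1 day ago is NOT due"
  fi
done
```

If this model is right, it would match what I see: with 30-31 every
page becomes due again on every run, and with smaller values nothing
fetched recently is ever selected.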

Thanks a lot.
 
--Sanjay 


