Hi all,

I am trying to create a recrawl script. It needs to run every night, run generate, fetch, index, etc. on the sites, pick up the new and updated docs, and update the Nutch DB.

I am not able to find the proper value for -adddays in the loop below. If I set it to 30-31, it keeps processing for more than 24 hours. If I set it to 5, 10, or 15, it doesn't pick up any new or modified files.

Do I need to find the right number, or am I going wrong somewhere else? Any guidance will be of great help.

depth=10
# Generate/fetch/update cycle
for ((i=1; i <= depth ; i++))
do
  bin/nutch generate crawl/crawldb crawl/segments -topN 1000 -adddays 20
  segment=`ls -d crawl/segments/* | tail -1`
  bin/nutch fetch $segment
  bin/nutch updatedb crawl/crawldb $segment
done

Thanks a lot,
--Sanjay
