Hi, I'm new to Nutch and am wondering about seeding the database by running a crawl with a very shallow depth, then growing the database each time a periodic update script runs. I have two scripts that I'm currently using, but I'm not sure whether the update script is actually adding searchable data. The initial crawl script is doing a great job, and I can verify that it works using the search app that comes with Nutch, but my maintenance script doesn't seem to add any results, although it throws no errors.
Below are the two small scripts. Am I missing any simple errors?

-- initial crawl script << END1 --
#!/bin/sh
./../bin/nutch crawl urls -dir crawl -depth 2 -topN 10000
END1

-- updater script << END2 --
first="crawl"
second="100000"
../bin/nutch generate $first/crawldb $first/segments -topN $second
segment=`ls -d $first/segments/* | tail -1 | grep "[a-zA-Z0-9/]*"`
../bin/nutch fetch $segment
../bin/nutch updatedb $first/crawldb $segment
rm -r $first/indexes
../bin/nutch invertlinks $first/linkdb $first/segments/*
../bin/nutch index $first/indexes $first/crawldb $first/linkdb $first/segments/*
END2

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
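The segment-selection line in the updater script above can be tried on its own, away from a real crawl. This is only a minimal sketch: it assumes segment directories are named by timestamp (yyyyMMddHHmmss), so the lexically last entry is the newest, and the helper name `latest_segment` and the throwaway directory names are made up for illustration. The `grep "[a-zA-Z0-9/]*"` filter is left out because that pattern can match an empty string, so it passes every line through unchanged.

```shell
#!/bin/sh
# Sketch: pick the newest segment directory under a segments dir.
# Assumes segment names are timestamps, so a plain lexical sort
# puts the most recent one last.
latest_segment() {
    # $1 is the segments directory, e.g. crawl/segments
    # Prints nothing if the directory is missing or empty.
    ls -d "$1"/* 2>/dev/null | sort | tail -1
}

# Throwaway tree standing in for crawl/segments (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/segments/20070101120000" "$demo/segments/20070215093000"

segment=$(latest_segment "$demo/segments")
echo "would fetch: ${segment##*/}"
```

If the printed name is not the segment that the generate step just created, the fetch and updatedb steps would be operating on an older segment, which might be worth ruling out before looking at the indexing steps.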
