Hi, what I actually want is to crawl a web page, say 'page A', and all of its outlinks. I want to index all the content gathered by crawling the outlinks, but not 'page A' itself. Is there any way to do this in a single run?
With regards,
Beats
be...@yahoo.com

SunGod wrote:
>
> 1. create the work dir 'test' first
>
> 2. insert the URL
> ../bin/nutch inject test -urlfile urls
>
> 3. create the fetchlist
> ../bin/nutch generate test test/segments
>
> 4. fetch the URLs
> s1=`ls -d crawl/segments/2* | tail -1`
> echo $s1
> ../bin/nutch fetch test/segments/20090628160619
>
> 5. update the crawldb
> ../bin/nutch updatedb test test/segments/20090628160619
>
> Loop steps 3 - 5; writing a bash script to run this is best!
>
> Next time please use Google search first.
>
> 2009/7/13 Beats <tarun_agrawal...@yahoo.com>
>
>> Can anyone help me with this?
>>
>> I'm using Solr to index the Nutch documents,
>> so I think the prune tool will not work.
>>
>> I do not want to index documents taken from a particular set of sites.
>>
>> With regards,
>> Beats
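
For reference, a minimal sketch of the generate/fetch/updatedb loop SunGod describes, assuming the work dir 'test' and the Nutch 0.x command syntax quoted above; the DEPTH variable is just an illustrative name for how many rounds of steps 3-5 you want to run, so adjust it and the paths to your own setup:

    #!/bin/bash
    # Sketch of looping steps 3-5 from the reply above.
    # Assumes the crawl dir is "test" and Nutch 0.x CLI syntax.

    DEPTH=3   # hypothetical: number of generate/fetch/updatedb rounds

    for ((i = 1; i <= DEPTH; i++)); do
      # 3. create the fetchlist
      ../bin/nutch generate test test/segments

      # pick the newest segment that generate just created
      s1=`ls -d test/segments/2* | tail -1`
      echo "fetching segment $s1"

      # 4. fetch the URLs in that segment
      ../bin/nutch fetch $s1

      # 5. update the crawldb with what was fetched
      ../bin/nutch updatedb test $s1
    done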