Thanks! It worked.
On 5/28/07, Doğacan Güney <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On 5/28/07, Manoharam Reddy <[EMAIL PROTECTED]> wrote:
> > In my crawl-urlfilter.txt I have put a statement like
> >
> > -^http://cdserver
> >
> > Still, while running the crawl, it fetches this site. I am running the
> > crawl using these commands:
> >
> > bin/nutch inject crawl/crawldb urls
> >
> > Inside a loop:
> >
> > bin/nutch generate crawl/crawldb crawl/segments -topN 10
> > segment=`ls -d crawl/segments/* | tail -1`
> > bin/nutch fetch $segment -threads 10
> > bin/nutch updatedb crawl/crawldb $segment
> >
> > Why does it fetch http://cdserver even though I have blocked it? Is it
> > becoming "allowed" by some other filter file? If so, what do I need
> > to check? Please help.
>
> In your case, crawl-urlfilter.txt is not read because you are not
> running the 'crawl' command (as in bin/nutch crawl). You have to update
> regex-urlfilter.txt or prefix-urlfilter.txt and make sure that you
> enable them in your conf.
>
> --
> Doğacan Güney
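For anyone hitting the same problem when running the step-by-step commands, a minimal sketch of the fix the reply describes: put the block rule in regex-urlfilter.txt (rules are applied in order; the first matching rule wins), and make sure the urlfilter-regex plugin is enabled. The exact plugin list below is an assumption and should be adapted to your own setup.

conf/regex-urlfilter.txt:

    # block the internal CD server (first match wins)
    -^http://cdserver

    # accept everything else
    +.

conf/nutch-site.xml (the value shown is an illustrative plugin list, not a prescribed one; the key point is that urlfilter-regex appears in it):

    <property>
      <name>plugin.includes</name>
      <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
    </property>

Note that the exclude rule must come before any catch-all `+` rule, otherwise the URL is accepted before the block rule is ever consulted.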