I have tried to implement it in BasicIndexingFilter, which returns null if such a string is found in the URL.
But no data is shown.

Thanks,
Tarun

Doğacan Güney-3 wrote:
>
> On Thu, Jul 16, 2009 at 14:11, Beats<[email protected]> wrote:
>>
>> Hi,
>>
>> I am using nutch-1.0.
>>
>> I am trying to crawl pages, but further restrict them while indexing.
>> That is, pages containing a certain phrase in their URL or content should
>> not be indexed.
>>
>> I have tried stopping output.collect() in IndexerMapReduce.java at line
>> 162, but it gives a null pointer error.
>>
>> Is there a way to do this? Could someone please give me the code for it?
>>
>
> You may write a new indexing filter. If an indexing filter returns
> null, then that page is not indexed.
>
>>
>> With regards,
>>
>> Tarun
>> --
>> View this message in context:
>> http://www.nabble.com/how-to-filter-pages-before-indexing-tp24514463p24514463.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>
> --
> Doğacan Güney
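A minimal sketch of the "return null to skip the page" decision, written as plain Java so it runs without Nutch on the classpath. The class `PhraseBlockingFilter`, its methods, and the phrase list are hypothetical illustrations, not the Nutch API. In an actual Nutch 1.0 plugin the same check would sit inside the `filter(...)` method of a class implementing `org.apache.nutch.indexer.IndexingFilter`, using the URL and parsed text that method receives (check the exact signature against your Nutch version), and the plugin must be listed in `plugin.includes`. One possible reason for "no data is shown" is a check that matches every page: for example, `"anything".contains("")` is `true` in Java, so an empty phrase (say, from splitting an empty config value) would drop every document. The sketch skips blank phrases for that reason.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical stand-in for the decision logic of a custom indexing filter.
 * Not the Nutch API: it only illustrates "return the doc to keep it, null to drop it".
 */
public class PhraseBlockingFilter {

    private final List<String> blockedPhrases;

    public PhraseBlockingFilter(List<String> blockedPhrases) {
        this.blockedPhrases = blockedPhrases;
    }

    /** Returns true if neither the URL nor the content contains a blocked phrase (case-insensitive). */
    public boolean shouldIndex(String url, String content) {
        String u = url == null ? "" : url.toLowerCase(Locale.ROOT);
        String c = content == null ? "" : content.toLowerCase(Locale.ROOT);
        for (String phrase : blockedPhrases) {
            if (phrase == null || phrase.trim().isEmpty()) {
                // "abc".contains("") is true, so an empty phrase would block every page
                continue;
            }
            String p = phrase.trim().toLowerCase(Locale.ROOT);
            if (u.contains(p) || c.contains(p)) {
                return false;
            }
        }
        return true;
    }

    /** Mimics the indexing-filter contract: return the document to keep it, null to skip it. */
    public <D> D filter(D doc, String url, String content) {
        return shouldIndex(url, content) ? doc : null;
    }

    public static void main(String[] args) {
        // The empty string is deliberately included to show it is ignored
        PhraseBlockingFilter f = new PhraseBlockingFilter(Arrays.asList("private", ""));
        System.out.println(f.filter("doc1", "http://example.com/private/a.html", "hello")); // prints null
        System.out.println(f.filter("doc2", "http://example.com/public/b.html", "hello"));  // prints doc2
        System.out.println(f.filter("doc3", "http://example.com/c.html", "Some PRIVATE notes")); // prints null
    }
}
```

Matching is lower-cased on both sides so "PRIVATE" in the page text is caught; drop the `toLowerCase` calls if case-sensitive matching is wanted.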
