On Thu, Jul 16, 2009 at 14:11, Beats <[email protected]> wrote:
>
> Hi,
>
> I am using nutch-1.0.
>
> I am trying to crawl pages, but restrict which of them get indexed:
> pages containing a certain phrase in their URL or content should not be
> indexed.
>
> I have tried skipping output.collect() in IndexerMapReduce.java at line 162,
> but it gives a null pointer error.
>
> Is there a way to do this?
>
> Could someone please give me the code to do this?
>
You could write a new indexing filter plugin. If an indexing filter returns
null from its filter() method, that page is not indexed.

> With regards,
>
> Tarun
> --
> View this message in context:
> http://www.nabble.com/how-to-filter-pages-before-indexing-tp24514463p24514463.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
Doğacan Güney
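To illustrate the idea, here is a minimal standalone Java sketch of the "return null to skip indexing" contract. `Doc`, `DocFilter`, `PhraseBlockingFilter` and `runFilters` are hypothetical stand-ins made up for this example, not Nutch classes, and the blocked phrases are only examples. The filter drops a page when its URL or text contains any blocked phrase (case-insensitive), and the loop stops as soon as one filter returns null, which as far as I know mirrors how Nutch 1.0 runs its indexing filter chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Standalone sketch of the "return null to skip indexing" idea.
 * Doc and DocFilter are HYPOTHETICAL stand-ins for Nutch's NutchDocument
 * and IndexingFilter; they are not the Nutch API.
 */
public class PhraseBlockingFilterSketch {

  /** Hypothetical document: just a URL and its extracted text. */
  static final class Doc {
    final String url;
    final String content;

    Doc(String url, String content) {
      this.url = url;
      this.content = content;
    }
  }

  /** Hypothetical filter contract: return the doc to keep it, null to drop it. */
  interface DocFilter {
    Doc filter(Doc doc);
  }

  /** Drops any page whose URL or text contains a blocked phrase (case-insensitive). */
  static final class PhraseBlockingFilter implements DocFilter {
    private final List<String> blocked = new ArrayList<>();

    PhraseBlockingFilter(List<String> phrases) {
      for (String p : phrases) {
        blocked.add(p.toLowerCase(Locale.ROOT));
      }
    }

    @Override
    public Doc filter(Doc doc) {
      String url = doc.url.toLowerCase(Locale.ROOT);
      String text = doc.content == null ? "" : doc.content.toLowerCase(Locale.ROOT);
      for (String phrase : blocked) {
        if (url.contains(phrase) || text.contains(phrase)) {
          return null; // null means "do not index this page"
        }
      }
      return doc; // keep the document unchanged
    }
  }

  /** Runs filters in order and returns null as soon as one of them discards the doc. */
  static Doc runFilters(List<DocFilter> filters, Doc doc) {
    for (DocFilter f : filters) {
      doc = f.filter(doc);
      if (doc == null) {
        return null;
      }
    }
    return doc;
  }

  public static void main(String[] args) {
    List<DocFilter> filters = Arrays.<DocFilter>asList(
        new PhraseBlockingFilter(Arrays.asList("private", "do not index")));
    Doc kept = runFilters(filters, new Doc("http://example.com/news", "Public story"));
    Doc dropped = runFilters(filters, new Doc("http://example.com/private/a", "Hello"));
    System.out.println(kept != null ? "indexed" : "skipped");
    System.out.println(dropped != null ? "indexed" : "skipped");
  }
}
```

Running main prints `indexed` and then `skipped`. In a real Nutch 1.0 plugin the same check would go in the filter(...) method of a class implementing org.apache.nutch.indexer.IndexingFilter, using url.toString() for the URL and parse.getText() for the content; the plugin is declared in its plugin.xml and enabled through the plugin.includes property in nutch-site.xml. Please check the exact interface signature against the Nutch version you are running.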
