On Thu, Jul 16, 2009 at 14:11, Beats<[email protected]> wrote:
>
> Hi,
>
> I am using Nutch 1.0.
>
> I am trying to crawl pages but then restrict which of them get indexed:
> pages containing certain phrases in their URL or content should not be
> indexed.
>
> I have tried skipping the output.collect() call in IndexerMapReduce.java
> at line 162, but it gives a null pointer error.
>
> Is there a way to do this? Could someone please give me the code for it?
>
>

You can write a new indexing filter. If an indexing filter returns
null for a page, then that page is not indexed.
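
A rough sketch of the decision logic such a filter could use is below. It
is deliberately plain Java with no Nutch imports, so it is not a drop-in
plugin: the class name PhraseBlockFilter, its methods, and the phrase list
are all hypothetical. In a real plugin the same check would sit inside your
IndexingFilter implementation's filter method (check the exact signature
in the Nutch 1.0 source), and the plugin would have to be enabled through
plugin.includes in your configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: decides whether a page should be
// dropped from the index based on phrases found in its URL or content.
public class PhraseBlockFilter {

    private final List<String> blockedPhrases = new ArrayList<String>();

    public PhraseBlockFilter(List<String> phrases) {
        for (String p : phrases) {
            // Ignore null/empty entries; an empty phrase would match every page.
            if (p != null && !p.trim().isEmpty()) {
                blockedPhrases.add(p.toLowerCase(Locale.ROOT));
            }
        }
    }

    // True if the URL or the content contains any blocked phrase
    // (case-insensitive substring match).
    public boolean shouldSkip(String url, String content) {
        String u = url == null ? "" : url.toLowerCase(Locale.ROOT);
        String c = content == null ? "" : content.toLowerCase(Locale.ROOT);
        for (String phrase : blockedPhrases) {
            if (u.contains(phrase) || c.contains(phrase)) {
                return true;
            }
        }
        return false;
    }

    // Mirrors the contract described above: return the document to keep it,
    // return null to keep the page out of the index. The generic type D
    // stands in for whatever document type the real filter receives.
    public <D> D filter(D doc, String url, String content) {
        return shouldSkip(url, content) ? null : doc;
    }
}
```

Inside the real filter you would pass in the page URL and the parsed text,
and return null whenever shouldSkip is true.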

>
> with Regards
>
> Tarun
> --
> View this message in context: 
> http://www.nabble.com/how-to-filter-pages-before-indexing-tp24514463p24514463.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>



-- 
Doğacan Güney
