I have tried to implement it in BasicIndexingFilter,

which returns null if such a string is found in the URL.

But no data is shown.

Thanks,

Tarun 
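
For illustration only, here is a minimal sketch of the "return null to skip the page" decision discussed in this thread. It is not the Nutch API: the class PhraseFilterSketch, its filter method, and the Map standing in for the index document are all hypothetical names chosen so that the example runs on its own. It assumes a case-insensitive substring match against both the URL and the content.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for the filtering decision; not a Nutch class. */
class PhraseFilterSketch {

    /**
     * Returns the document unchanged, or null to signal "do not index".
     * The Map is an assumed placeholder for the real index document type.
     */
    static Map<String, String> filter(Map<String, String> doc,
                                      String url,
                                      String content,
                                      List<String> blockedPhrases) {
        String lowerUrl = url == null ? "" : url.toLowerCase(Locale.ROOT);
        String lowerContent = content == null ? "" : content.toLowerCase(Locale.ROOT);
        for (String phrase : blockedPhrases) {
            if (phrase == null || phrase.isEmpty()) {
                continue; // an empty phrase would match every page
            }
            String needle = phrase.toLowerCase(Locale.ROOT);
            if (lowerUrl.contains(needle) || lowerContent.contains(needle)) {
                return null; // caller should skip this page
            }
        }
        return doc; // no blocked phrase found, keep the page
    }

    public static void main(String[] args) {
        List<String> blocked = List.of("private");
        Map<String, String> doc = new HashMap<>();
        doc.put("title", "Example");
        System.out.println(filter(doc, "http://example.com/private/a", "text", blocked));
        System.out.println(filter(doc, "http://example.com/public/a", "text", blocked));
    }
}
```

In Nutch itself, the same check would presumably live in a class implementing org.apache.nutch.indexer.IndexingFilter, and the plugin would need to be enabled through the plugin.includes property. An empty or overly broad phrase would drop every page, which is one thing worth checking when nothing shows up in the index.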



Doğacan Güney-3 wrote:
> 
> On Thu, Jul 16, 2009 at 14:11, Beats<[email protected]> wrote:
>>
>> Hi,
>>
>> I am using Nutch 1.0.
>>
>> I am trying to crawl pages, but restrict them further while indexing.
>>
>> That is, pages containing a certain phrase in their URL or content should
>> not be indexed.
>>
>> I have tried skipping output.collect() in IndexerMapReduce.java at line
>> 162,
>>
>> but it gives a null pointer error.
>>
>> Is there a way to do this?
>>
>> Could someone please give me the code to do this?
>>
>>
> 
> You may write a new indexing filter. If an indexing filter returns
> null, then that page
> is not indexed.
> 
>>
>> With regards,
>>
>> Tarun
>> --
>> View this message in context:
>> http://www.nabble.com/how-to-filter-pages-before-indexing-tp24514463p24514463.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> Doğacan Güney
> 
> 

-- 
View this message in context: 
http://www.nabble.com/how-to-filter-pages-before-indexing-tp24514463p24515701.html
Sent from the Nutch - User mailing list archive at Nabble.com.
