Doğacan Güney-3 wrote:
>
> On Thu, Jul 16, 2009 at 14:11, Beats<[email protected]> wrote:
>>
>> Hi,
>>
>> I am using Nutch 1.0.
>>
>> I am trying to crawl pages, but I want to restrict further which of
>> them get indexed.
>>
>> That is, pages containing a certain phrase in their URL or content
>> should not be indexed.
>>
>> I have tried skipping the output.collect() call in
>> IndexerMapReduce.java at line 162,
>>
>> but it gives a null pointer error.
>>
>> Is there a way to do this?
>>
>> Could someone please give me the code to do this?
>>
>>
>
> You may write a new indexing filter. If an indexing filter returns
> null, then that page
> is not indexed.
>
>
> I have tried to implement it in BasicIndexingFilter,
>
> which now returns null if such a string is found in the URL.
>
> But no data is shown.
>
> Thanks,
>
> Tarun
>
>
>
>>
>> With regards,
>>
>> Tarun
>> --
>> View this message in context:
>> http://www.nabble.com/how-to-filter-pages-before-indexing-tp24514463p24514463.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
>>
>
>
>
> --
> Doğacan Güney
>
>
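For reference, the check such an indexing filter would make can be sketched
in plain Java. This is only a sketch, not Nutch code: the class name
PhraseExclusionRule and the method shouldSkip are hypothetical, and the
phrases and URLs are made up. The assumption is that a custom indexing
filter (kept separate from BasicIndexingFilter) would call shouldSkip with
the page URL and parsed text and return null when it is true; the exact
filter method signature depends on the Nutch version and is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper (not part of Nutch): decides whether a page should be
 * kept out of the index because its URL or content contains a blocked phrase.
 */
class PhraseExclusionRule {
    // Blocked phrases, stored lower-cased so matching is case-insensitive.
    private final List<String> blockedPhrases = new ArrayList<String>();

    PhraseExclusionRule(List<String> phrases) {
        for (String phrase : phrases) {
            // Ignore null or blank entries; an empty phrase would match every page.
            if (phrase != null && !phrase.trim().isEmpty()) {
                blockedPhrases.add(phrase.trim().toLowerCase(Locale.ROOT));
            }
        }
    }

    /** Returns true if the URL or the content contains any blocked phrase. */
    boolean shouldSkip(String url, String content) {
        // Treat a missing URL or content as empty text.
        String u = (url == null) ? "" : url.toLowerCase(Locale.ROOT);
        String c = (content == null) ? "" : content.toLowerCase(Locale.ROOT);
        for (String phrase : blockedPhrases) {
            if (u.contains(phrase) || c.contains(phrase)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PhraseExclusionRule rule =
            new PhraseExclusionRule(Arrays.asList("private", "draft"));
        // Blocked phrase in the URL: prints true
        System.out.println(rule.shouldSkip("http://example.com/private/a.html", "text"));
        // No blocked phrase anywhere: prints false
        System.out.println(rule.shouldSkip("http://example.com/a.html", "public text"));
    }
}
```

Keeping the rule in its own class makes it possible to test the matching
outside of a crawl. A new indexing filter plugin also has to be listed in
the plugin.includes property before Nutch will load it.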
--
View this message in context:
http://www.nabble.com/how-to-filter-pages-before-indexing-tp24514463p24515213.html
Sent from the Nutch - User mailing list archive at Nabble.com.