Thanks, but that's a bit too complex for me right now. I don't yet fully
understand the map/reduce technique, but I'll keep the idea in mind for
future development.

2009/12/4 Dennis Kubes <[email protected]>

> Sorry, segments, not indexes.
>
>
> Dennis Kubes wrote:
>
>> You would need to write a custom MapReduce job to run through the indexes
>> and keep only the ones identified by your plugin.  Be sure to update the
>> CrawlDb with the extracted URLs before you drop the content from the
>> segments.
>>
>> Dennis
>>
>> MilleBii wrote:
>>
>>> Hi guys,
>>>
>>> I'm looking into whether I can reduce the disk space used by my segments.
>>> I have implemented a topical-scoring plugin, which means I know at that
>>> step whether I should keep a page's content or not.
>>> Is there a way to drop some pages' content after parsing, but of course
>>> keep the links, because I want to follow the graph?
>>>
>>> PS: Prune is not an option for me because it only cleans up the indexes,
>>> not the segments, and my indexer does that clean-up very well.
>>>
>>>


-- 
-MilleBii-
