Sorry, segments, not indexes.
Dennis Kubes wrote:
You would need to write a custom MapReduce job that runs through the
indexes and keeps only the ones identified by your plugin. Be sure to
update the CrawlDb with the extracted urls before you drop the content
from the segments.
Dennis
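
A minimal sketch of the ordering described above, in plain Java with no Hadoop or Nutch classes. Everything here is an assumption made for illustration: the Page class, the collectOutlinks and dropUnwanted methods, and the idea that the plugin's decision is available as a set of URLs to keep. A real job would read and write the actual segment data through MapReduce; this only shows the per-record logic and why the outlinks are collected first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only: hypothetical types, not the Nutch segment format.
final class SegmentPruneSketch {

    // Hypothetical stand-in for one parsed page stored in a segment.
    static final class Page {
        final String content;        // the bulky part we may want to drop
        final List<String> outlinks; // URLs extracted during parsing

        Page(String content, List<String> outlinks) {
            this.content = content;
            this.outlinks = outlinks;
        }
    }

    // Step 1: gather the outlinks of every page, kept or not, so the link
    // graph can be recorded before any content is discarded.
    static List<String> collectOutlinks(Map<String, Page> segment) {
        List<String> urls = new ArrayList<>();
        for (Page page : segment.values()) {
            urls.addAll(page.outlinks);
        }
        return urls;
    }

    // Step 2: keep only the pages whose URL the scoring step marked as worth
    // keeping. keepUrls is an assumed representation of the plugin's verdict.
    static Map<String, Page> dropUnwanted(Map<String, Page> segment,
                                          Set<String> keepUrls) {
        Map<String, Page> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Page> entry : segment.entrySet()) {
            if (keepUrls.contains(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```

Calling collectOutlinks before dropUnwanted mirrors the advice to update the CrawlDb first: once a page is dropped, its extracted links are gone with it.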
MilleBii wrote:
Hi guys,
I'm looking at whether I can reduce the disk space occupied by my segments.
I have implemented a topical-scoring plugin, which means I know at that
step whether I should keep a page's content or not.
Is there a way to drop some pages' content after parsing, but of course
keep the links, because I want to follow the graph?
PS: Prune is not an option for me because it only cleans up the indexes,
not the segments, and my indexer already does that clean-up very well.