Thanks, but that is a bit too complex for me right now. I don't yet fully understand the map/reduce technique, but I'll keep the idea for future development.

2009/12/4 Dennis Kubes <[email protected]>

> Sorry, segments, not indexes.
>
>
> Dennis Kubes wrote:
>
>> You would need to write a custom MapReduce job to run through the indexes
>> and only keep the ones identified by your plugin. Be sure to update the
>> CrawlDb with the extracted urls before you drop the content from the
>> segments.
>>
>> Dennis
>>
>> MilleBii wrote:
>>
>>> Hi guys,
>>>
>>> I'm looking at whether I can optimize the size occupied on disk by my
>>> segments. I have implemented a topical-scoring plugin... this means I
>>> know at that step whether I should keep that page's content or not.
>>> Is there a way to drop some pages' content after parsing it, but of
>>> course keep the links because I want to follow the graph?
>>>
>>> PS: Prune is not an option for me because it only cleans up the indexes,
>>> not the segments, and my indexer does that clean-up very well.
>>>

--
-MilleBii-
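
For anyone following the thread who, like me, is new to map/reduce, the idea Dennis describes can be sketched roughly as below. This is plain Python and only illustrates the concept; it is not Nutch or Hadoop code, and the record layout (`url`, `score`, `content`, `outlinks`), the function names and the `keep_threshold` value are all assumptions made up for the example. The map step emits outlinks for every page (so the graph can still be followed and the CrawlDb updated) but emits content only for pages the topical score accepts; the reduce step groups what was emitted.

```python
from collections import defaultdict


def map_page(page, keep_threshold):
    """Map step: emit (key, value) pairs for one parsed page.

    `page` is a hypothetical dict, not a real Nutch segment record.
    """
    url = page["url"]
    # Outlinks are emitted for every page so the link graph stays intact.
    for link in page["outlinks"]:
        yield ("link", url), link
    # Content is emitted only when the assumed topical score passes.
    if page["score"] >= keep_threshold:
        yield ("content", url), page["content"]


def reduce_records(mapped):
    """Reduce step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    return dict(grouped)


def prune_segment(pages, keep_threshold=0.5):
    """Run map then reduce; return (links per url, kept content per url)."""
    mapped = (rec for page in pages for rec in map_page(page, keep_threshold))
    grouped = reduce_records(mapped)
    links = {url: vals for (kind, url), vals in grouped.items() if kind == "link"}
    content = {url: vals[0] for (kind, url), vals in grouped.items() if kind == "content"}
    return links, content


# Example input: one on-topic page and one off-topic page.
pages = [
    {"url": "http://a.example/", "score": 0.9, "content": "on topic",
     "outlinks": ["http://b.example/"]},
    {"url": "http://b.example/", "score": 0.1, "content": "off topic",
     "outlinks": ["http://c.example/"]},
]
links, content = prune_segment(pages)
```

In this sketch the off-topic page loses its content but its outlink is still present in `links`, which is the part that would have to reach the CrawlDb before the content is dropped. A real job would read and write segment data through Hadoop rather than Python dicts.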
