Marko Bauhardt wrote:
Hi,
I know you are working on the new "plugin system", OSGi etc., but I want to talk about new extension points.

I think it would be helpful if we had, for example, the extension points IPreCrawl and IPostCrawl. These extension points could be used to implement some helpful jobs.

For example, before starting a new crawl, one implementation of IPreCrawl could
+ export URLs from a "database" into a URL file, to inject this file into the crawldb
+ or create statistics.

When a crawl is finished, one implementation of IPostCrawl could
+ restart search servers
+ switch the index
+ create statistics from the complete crawl
+ or send an email or whatever to an administrator...

This looks to me less like an extension point and more like a notification system, e.g. JMS-based. Currently the execution of plugins in extension points is synchronous, i.e. the calling application will be blocked until the plugin completes its execution. Most likely you want an asynchronous execution here?
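
To make the difference concrete, here is a minimal Java sketch of the two calling styles. It is only an illustration: the IPostCrawl method signature, the CrawlNotifier class and the crawlDir argument are assumptions made up for this example, not existing Nutch APIs, and a plain in-process ExecutorService stands in for a real messaging system such as JMS.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical listener interface; the name follows the proposal above,
// but the method signature is an assumption for this sketch.
interface IPostCrawl {
    void crawlFinished(String crawlDir);
}

// Hypothetical dispatcher, not part of Nutch.
class CrawlNotifier {
    private final List<IPostCrawl> listeners = new CopyOnWriteArrayList<>();
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    void register(IPostCrawl listener) {
        listeners.add(listener);
    }

    // Synchronous: the caller is blocked until every listener has returned.
    void notifySync(String crawlDir) {
        for (IPostCrawl l : listeners) {
            l.crawlFinished(crawlDir);
        }
    }

    // Asynchronous: each call is queued on a worker thread and the
    // caller returns immediately.
    void notifyAsync(String crawlDir) {
        for (IPostCrawl l : listeners) {
            pool.submit(() -> l.crawlFinished(crawlDir));
        }
    }

    // Stops accepting work and waits for queued notifications to finish.
    // Returns false on timeout or interruption.
    boolean shutdown(long timeoutSeconds) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With notifySync a slow job such as "restart search servers" holds up the crawl tool itself; with notifyAsync the crawl tool carries on and the job completes in the background.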



Also, I think statistics of a segment or the crawldb are very important to get an overview of the URL space. So maybe another extension point (e.g. ISegmentStatistic) could be used to create statistics for every segment after that segment is fetched.

I agree - segment parts are immutable, so once they are created their statistics are also immutable. It would make even more sense to collect such stats on-the-fly as each part is being created, and then write them out to a per-segment metadata file.
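
As a rough sketch of the on-the-fly idea in Java: a small collector is updated once per record while the part is being written, and is dumped to a metadata file when the part is complete. The SegmentStats class, the status strings and the key=value file layout are all assumptions for illustration, not an existing Nutch format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical collector; not an existing Nutch class.
class SegmentStats {
    // Sorted map so the output order is stable.
    private final Map<String, Long> counts = new TreeMap<>();

    // Called once per record while the segment part is being created.
    void record(String status) {
        counts.merge(status, 1L, Long::sum);
    }

    long count(String status) {
        return counts.getOrDefault(status, 0L);
    }

    // Renders the counters as one "key=value" line per status.
    String render() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    // Written once when the part is complete. Because the part is
    // immutable, the file never needs to be updated afterwards.
    void writeTo(Path metadataFile) throws IOException {
        Files.write(metadataFile, render().getBytes(StandardCharsets.UTF_8));
    }
}
```

Since the counters are gathered in the same pass that writes the data, no extra job has to re-read the segment just to compute them.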

BTW: we really need to move away from using the name "segment", which is for many reasons confusing, and towards using the name "shard" which seems to be the commonly used name for this kind of data.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
