Marko Bauhardt wrote:
Hi,
I know you are working on the new "plugin system", OSGi etc., but I want
to talk about new extension points.
I think it would be helpful to have, for example, the extension points
IPreCrawl and IPostCrawl. These extension points could be used to
implement some helpful jobs.
For example, before starting a new crawl, an implementation of IPreCrawl
could
+ export URLs from a "database" into a URL file, so that this file can be
injected into the crawldb
+ or create statistics.
When a crawl is finished, an implementation of IPostCrawl could
+ restart search servers
+ switch the index
+ create statistics from the complete crawl
+ or send an email or whatever to an administrator...
This looks to me less like an extension point and more like a
notification system, e.g. JMS-based. Currently the execution of plugins
in extension points is synchronous, i.e. the calling application will be
blocked until the plugin completes its execution. Most likely you want
an asynchronous execution here?
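To make the synchronous vs. asynchronous distinction concrete, here is a
minimal sketch in plain Java. It is not the Nutch plugin API and not JMS:
the PostCrawlListener interface, the CrawlHooks class and the
"crawl/example" path are all hypothetical names chosen for illustration,
and a thread pool stands in for whatever messaging layer would really be
used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CrawlHooks {

    /** Hypothetical listener interface; an assumption, not part of Nutch. */
    public interface PostCrawlListener {
        void crawlFinished(String crawlDir);
    }

    private final List<PostCrawlListener> listeners = new ArrayList<>();

    public void register(PostCrawlListener listener) {
        listeners.add(listener);
    }

    /** Synchronous style: the caller is blocked until every listener returns. */
    public void notifySync(String crawlDir) {
        for (PostCrawlListener listener : listeners) {
            listener.crawlFinished(crawlDir);
        }
    }

    /**
     * Asynchronous style: each listener is submitted to the pool and this
     * method returns at once. The returned future completes when all
     * listeners have finished.
     */
    public CompletableFuture<Void> notifyAsync(String crawlDir, ExecutorService pool) {
        CompletableFuture<?>[] pending = listeners.stream()
                .map(l -> CompletableFuture.runAsync(() -> l.crawlFinished(crawlDir), pool))
                .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(pending);
    }

    public static void main(String[] args) {
        CrawlHooks hooks = new CrawlHooks();
        hooks.register(dir -> System.out.println("switch index for " + dir));
        hooks.register(dir -> System.out.println("notify admin about " + dir));

        // Blocks the crawl tool until both jobs are done.
        hooks.notifySync("crawl/example");

        // Returns immediately; join() here only so the demo exits cleanly.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        hooks.notifyAsync("crawl/example", pool).join();
        pool.shutdown();
    }
}
```

With notifyAsync a slow job such as restarting search servers would not
hold up the crawl tool, at the cost of having to decide what happens when
a listener fails after the caller has already moved on.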
Also, I think statistics of a segment or the crawldb are very important
to get an overview of the URL space. So maybe another extension point
(e.g. ISegmentStatistic) could be used to create statistics for every
segment after this segment is fetched.
I agree - segment parts are immutable, so once they are created their
statistics are also immutable. It would make even more sense to collect
such stats on-the-fly as each part is being created, and then write them
out to a per-segment metadata file.
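As a rough sketch of what collecting stats on-the-fly could look like:
the writer of a segment part calls record() once per entry, and the
counts are flushed to a small file next to the data when the part is
closed. The SegmentStats class, the status strings and the
"stats.properties" file name are assumptions made up for this example,
not an existing Nutch format.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

public class SegmentStats {

    // Sorted map so the metadata file has a stable key order.
    private final Map<String, Long> counts = new TreeMap<>();

    /** Call once per record while the segment part is being written. */
    public void record(String status) {
        counts.merge(status, 1L, Long::sum);
    }

    /** Number of records seen so far with the given status. */
    public long get(String status) {
        return counts.getOrDefault(status, 0L);
    }

    /** Writes key=value lines to a (hypothetical) per-segment metadata file. */
    public void writeTo(Path segmentDir) throws IOException {
        Path out = segmentDir.resolve("stats.properties");
        try (Writer w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                w.write(e.getKey() + "=" + e.getValue() + "\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        SegmentStats stats = new SegmentStats();
        for (String status : new String[] {"fetched", "fetched", "gone", "redirect"}) {
            stats.record(status);
        }
        Path dir = Files.createTempDirectory("segment");
        stats.writeTo(dir);
        System.out.println(Files.readAllLines(dir.resolve("stats.properties")));
    }
}
```

Because the part never changes after it is written, the file can be
produced once and read by any later tool without rescanning the data.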
BTW: we really need to move away from using the name "segment", which is
for many reasons confusing, and towards using the name "shard" which
seems to be the commonly used name for this kind of data.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com