Hi nutch community,

Since I had asked for improvement suggestions, I will take the first step and bring some fresh ideas to the plugin system. ;)
I mean it seriously, and I am here to implement it myself.


First, let us define a goal and its requirements.

Nutch should be an enterprise search engine.
To me, enterprise means:

+ high performance
+ stable / fall back
+ easy maintenance at runtime
+ easy extension at runtime
to name a few.

To translate these "business" keywords into technology:

+ high performance => good algorithms, load balancing
+ stable / fall back => fallback solutions for 24/7 operation
+ easy maintenance at runtime => independent components that can be managed (start, stop, redeploy) at runtime
+ easy extension at runtime => deploy components at runtime


Most application servers address these issues as well.

How do they do it?
Most application servers can be clustered to provide load balancing and fallback solutions. They also have a kind of plugin mechanism that allows deployment and management at runtime.


A technology specification was developed especially for these requirements: the Java Management Extensions (JMX).
http://java.sun.com/products/JavaManagement/
http://www.amazon.com/exec/obidos/tg/detail/-/1930110561/qid=1075409894/sr=1-1/ref=sr_1_1/102-6533159-6947339?v=glance&s=books
http://www.amazon.com/exec/obidos/tg/detail/-/0672322889/qid=1075409894/sr=1-2/ref=sr_1_2/102-6533159-6947339?v=glance&s=books
(remember to support your local book shop! Globalization of information and communication is good, but Globalization??? hmm! )


So I have been thinking for a while about how we can put Nutch on top of a JMX layer by using the existing plugin mechanism.
http://www.media-style.com/index.jsp?folderPK=422


Suppose everything in Nutch is a plugin. Each plugin has an API and, in most cases, uses the API of another plugin.
I call the use of a plugin API an "extension".
I call providing an API providing an "extension-point".
This vocabulary was defined by OTI/IBM in the Eclipse framework.


The APIs are described in a language that is independent of any programming language (XML / XML Schema), similar to web services.
I call this API description plugin.xml or the plugin manifest. (A Java manifest file is something different!)
All plugins live in a container that loads, starts and stops them.


JMX provides all of these features as well. Furthermore, it makes it possible to distribute the plugins over a set of machines, abstracts the communication model and provides a deployment mechanism at runtime.

So there are just some differences in vocabulary.
A container is called a JMX server (the MBean server).
A plugin is called an MBean.
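
To make the vocabulary mapping concrete, here is a minimal sketch of a plugin exposed as an MBean and managed only through the MBean server. It assumes a standard MBean for brevity (not the Model MBean mentioned in this mail), and the names IndexWriter, IndexPath and the nutch.plugins domain are hypothetical, not existing Nutch code.

```java
import javax.management.Attribute;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

public class PluginAsMBean {

    /** Management interface of a hypothetical plugin; a standard MBean interface name ends in "MBean". */
    public interface IndexWriterMBean {
        String getIndexPath();
        void setIndexPath(String path);
        boolean isRunning();
        void start();
        void stop();
    }

    /** Hypothetical plugin; it only tracks state and does no real indexing. */
    public static class IndexWriter implements IndexWriterMBean {
        private volatile String indexPath = "/data/index";
        private volatile boolean running;

        public String getIndexPath() { return indexPath; }
        public void setIndexPath(String path) { indexPath = path; }
        public boolean isRunning() { return running; }
        public void start() { running = true; }
        public void stop() { running = false; }
    }

    /** Registers the plugin in a fresh MBean server, starts it and changes its index path. */
    public static String manage(String newPath) {
        try {
            // The "container": an MBean server living in this JVM.
            MBeanServer server = MBeanServerFactory.newMBeanServer();
            ObjectName name = new ObjectName("nutch.plugins:type=IndexWriter");
            server.registerMBean(new IndexWriter(), name);

            // From here on the plugin is addressed by name only, never by reference.
            server.invoke(name, "start", new Object[0], new String[0]);
            server.setAttribute(name, new Attribute("IndexPath", newPath));

            return server.getAttribute(name, "IndexPath")
                    + " running=" + server.getAttribute(name, "Running");
        } catch (JMException e) {
            throw new IllegalStateException("JMX call failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(manage("/mnt/backup/index"));
    }
}
```

Because the caller only knows the ObjectName, the same calls could in principle be issued by a management console instead of local code; that remote part is not shown here.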


Moving the existing code to JMX wouldn't be much work (a plugin is a Model MBean). Essentially, only the plugin repository and the plugin communication need to be refactored.


Features that would come with porting to JMX (please see the JMX literature for technical details):

+ deployment of plugins at runtime
+ shutdown of plugins at runtime
+ remote management of attributes at runtime
For example, changing the folder path where the index is stored when a network storage device has crashed.
+ notification mechanism
A set of plugin instances can subscribe to a notification publisher.
Suppose an index writer gets its data via notifications from the fetcher. Then two instances of the index writer running on different machines could subscribe to the same fetcher.
That would be a software RAID 1. ;)
Another case could be Nutch running on 6 machines, where 5 machines run a fetcher and just one machine writes the results to disk, or vice versa.
+ notification of attribute changes
If a hard drive is full, another machine can take over and provide a fallback index writer/reader.


+ Nutch integration in a J2EE environment
If someone wishes to run a commercial search service, it would be easy to integrate a J2EE e-commerce solution.
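
The "software RAID 1" idea from the notification item above can be sketched with the standard JMX notification classes. This is only an illustration under assumptions: the Fetcher and IndexWriter classes and the notification type "nutch.fetcher.page" are hypothetical, and everything runs in one JVM, so the distribution over several machines is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;
import javax.management.NotificationListener;

public class FetcherNotificationSketch {

    /** Hypothetical fetcher that publishes one notification per fetched page. */
    public static class Fetcher extends NotificationBroadcasterSupport {
        private long sequence;

        public void pageFetched(String url) {
            // The message carries the URL; a real fetcher would attach page data.
            sendNotification(new Notification("nutch.fetcher.page", this, ++sequence, url));
        }
    }

    /** Hypothetical index writer that records every page it is notified about. */
    public static class IndexWriter implements NotificationListener {
        public final List<String> received = new ArrayList<String>();

        public void handleNotification(Notification notification, Object handback) {
            received.add(notification.getMessage());
        }
    }

    /** Subscribes two writers to one fetcher and returns what each of them received. */
    public static List<List<String>> mirror(String... urls) {
        Fetcher fetcher = new Fetcher();
        IndexWriter first = new IndexWriter();
        IndexWriter second = new IndexWriter();

        // No filter and no handback: both writers get every notification.
        fetcher.addNotificationListener(first, null, null);
        fetcher.addNotificationListener(second, null, null);

        for (String url : urls) {
            fetcher.pageFetched(url);
        }
        return Arrays.asList(first.received, second.received);
    }

    public static void main(String[] args) {
        System.out.println(mirror("http://example.org/a", "http://example.org/b"));
    }
}
```

Both writers end up with the same list of pages, which is the mirroring effect described above. The reverse setup (several fetchers, one writer) would simply register the same listener with each fetcher.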


We will get a set of other great features once we use JMX. Best of all, we do not need to code them, since JMX provides them ready to use.

If you wish to see JMX in action, I suggest downloading a JBoss release from jboss.org, starting it with run.sh or run.bat, and waiting until it is up.
Open your web browser and go to: http://localhost:8080/jmx-console/
Just click around!
Take a look at http://localhost:8080/web-console/ as well!



What do you think about this improvement suggestion?


Regards
Stefan



open technology:   www.media-style.com
open source:           www.weta-group.net
open discussion:    www.text-mining.org



_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
