Grigory Begelman wrote:
I am trying to understand the concept of distributed nutch operation.
As far as I understand from available documentation and the source
code, there are the following (high-level) components:
1. Distributed WebDB,

The distributed WebDB is more of a work-in-progress. I think it works, but it has not been tested heavily, nor is there much documentation on how to use it yet.


2. Distributed search servers.

This is pretty solid.

How do I perform database population from scratch:
1. I create distributed webdb and make it accessible for all computers via nfs,
2. Inject URLs in the webdb (though WebDBInjector does not support
distributed operation)
3. Start fetching.

Mike, can you provide some assistance here? I have not personally used the distributed webdb yet.


So, should I run a fetcher on each search server so that they properly
build document indexes (locally on each search server)? Or is it
possible to run fewer fetchers? Or do I just misunderstand the whole
concept?

Logically, fetchers, db updaters, indexers and searchers are separate machines. The fetcher output needs to be accessible to the db updaters, indexers and searchers. Except when searching, NFS access is plenty fast. When searching, the index files should be local.
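
As a rough illustration of the last point, here is a minimal sketch of copying an index from a shared NFS mount onto a search server's local disk before searching it. This is not part of Nutch: the function name `stage_index_locally` and the example paths are hypothetical, and the sketch assumes the index is simply a directory of files.

```python
import shutil
from pathlib import Path


def stage_index_locally(shared_index: Path, local_root: Path) -> Path:
    """Copy an index directory from a shared (e.g. NFS) mount to local disk.

    Returns the local path that a search server would then open.
    Hypothetical helper for illustration; not a Nutch API.
    """
    local_index = local_root / shared_index.name
    if local_index.exists():
        # Remove any stale copy so old and new index files are never mixed.
        shutil.rmtree(local_index)
    shutil.copytree(shared_index, local_index)
    return local_index


# Example call (both paths are assumptions, adjust to your layout):
# stage_index_locally(Path("/mnt/nfs/segments/index"), Path("/var/local/search"))
```

The fetchers, db updaters and indexers could keep reading and writing the shared directory directly; only the search servers would need this extra copy step.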


I hope this helps!

Doug


Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
