howard chen wrote:
Can you suggest if using Hadoop + Lucene, how to make a simple
distributed indexing & searching program, i.e. what are the mapping /
reducing processes involved in both indexing and searching?
There is not yet a universal best practice for this.
Nutch provides an example of how to use Lucene for distributed indexing.
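To make the map/reduce split for indexing concrete, here is a minimal sketch in plain Python. It is not Nutch's code and uses neither Hadoop nor Lucene. It assumes documents are routed to shards by hashing their ids in the map step, and that each reduce call builds one small inverted index per shard. The names map_doc, reduce_shard and build_indexes are hypothetical, and the driver only imitates the framework's shuffle.

```python
from collections import defaultdict
import zlib


def map_doc(doc_id, text, num_shards):
    """Map step: route a document to a shard by hashing its id (an assumption)."""
    shard = zlib.crc32(doc_id.encode("utf-8")) % num_shards
    yield shard, (doc_id, text)


def reduce_shard(docs):
    """Reduce step: build one inverted index (term -> sorted doc ids) for a shard."""
    index = defaultdict(set)
    for doc_id, text in docs:
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def build_indexes(corpus, num_shards):
    """Toy driver standing in for the shuffle: group map output by shard key."""
    grouped = defaultdict(list)
    for doc_id, text in corpus.items():
        for shard, value in map_doc(doc_id, text, num_shards):
            grouped[shard].append(value)
    return {shard: reduce_shard(docs) for shard, docs in grouped.items()}


if __name__ == "__main__":
    corpus = {"a": "hadoop lucene", "b": "lucene search", "c": "hadoop rpc"}
    for shard, index in sorted(build_indexes(corpus, 2).items()):
        print(shard, index)
```

In a real job the reduce step would write a Lucene index for its shard rather than return a dictionary; the point here is only that each reducer ends up owning one independent index.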
Nutch's current distributed search implementation builds on Hadoop's
RPC mechanism, but is not based on Hadoop's MapReduce.
http://lucene.apache.org/nutch/apidocs/org/apache/nutch/searcher/DistributedSearch.html
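The general shape of a non-MapReduce distributed search is scatter-gather: send the query to every index shard, collect each shard's top hits, and merge them. The following plain-Python sketch illustrates that pattern only; it is not the DistributedSearch API. Threads stand in for remote calls, raw term frequency stands in for a real relevance score, and search_shard and distributed_search are hypothetical names.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_shard(index, term, k):
    """Stand-in for a remote call: return the shard's top-k (score, doc_id) hits.

    `index` maps term -> {doc_id: term_frequency}; the frequency is the score.
    """
    hits = [(tf, doc_id) for doc_id, tf in index.get(term, {}).items()]
    return heapq.nlargest(k, hits)


def distributed_search(shards, term, k):
    """Query all shards in parallel, then merge the partial results into a global top-k."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda idx: search_shard(idx, term, k), shards))
    merged = [hit for part in partials for hit in part]
    return heapq.nlargest(k, merged)


if __name__ == "__main__":
    shards = [
        {"lucene": {"a": 3, "b": 1}},
        {"lucene": {"c": 2}},
        {"hadoop": {"d": 5}},
    ]
    print(distributed_search(shards, "lucene", 2))
```

Merging like this is only sound if scores from different shards are comparable, which a real system has to arrange, for example by sharing collection-wide term statistics.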
There has been some discussion of MapReduce-based distributed search on
the Nutch lists, e.g.:
http://mail-archives.apache.org/mod_mbox/lucene-nutch-user/200604.mbox/[EMAIL PROTECTED]
I think Andrzej Bialecki has explored this approach to some extent.
Another approach is to build a non-MapReduce-based system specifically
for supporting distributed search and indexing. I started a discussion
about this a few months ago and hope to start work on this project
before long.
http://www.nabble.com/-PROPOSAL--index-server-project-tf2469695.html
Doug