Thank you for your response. I was imagining the two concepts, i) using hadoop.contrib.index to index documents and ii) providing search in a distributed fashion, to be all in one box.
So basically, hadoop.contrib.index is used to create Lucene indexes in a distributed fashion (by creating shards, each shard being a Lucene instance). And then I can use Katta or any other distributed Lucene application to serve Lucene indexes distributed over many servers.

Deepika

-----Original Message-----
From: Ning Li [mailto:[EMAIL PROTECTED]
Sent: Friday, August 08, 2008 7:08 AM
To: core-user@hadoop.apache.org
Subject: Re: Distributed Lucene - from hadoop contrib

> 1) Katta and Distributed Lucene are different projects though, right? Both
> being based on kind of the same paradigm (Distributed Index)?

The design of Katta and that of Distributed Lucene were quite different last time I checked. I pointed out the Katta project because you can find the code for Distributed Lucene there.

> 2) So, I should be able to use the hadoop.contrib.index with HDFS.
> Though, it would be much better if it were integrated with "Distributed
> Lucene" or the "Katta project", as these are designed keeping the
> structure and behavior of indexes in mind. Right?

As described in the README file, hadoop.contrib.index uses map/reduce to build Lucene instances. It does not contain a component that serves queries. If that's not sufficient for you, you can check out the designs of Katta and Distributed Lucene and see which one suits your use better.

Ning
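For intuition, the sharding idea discussed above can be sketched roughly as follows. This is purely illustrative Python, not the actual hadoop.contrib.index code (which is Java map/reduce building real Lucene indexes); the names NUM_SHARDS, route_to_shard, and build_shard_indexes are hypothetical, and a hash-modulo routing scheme is only one possible way documents could be assigned to shards.

```python
# Illustrative sketch: partition documents into shards, where in the real
# system each shard would become one Lucene index built by a map/reduce job.
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count


def route_to_shard(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map-side idea: deterministically pick a shard by hashing the doc id."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def build_shard_indexes(docs):
    """Reduce-side idea: group documents by shard; each group corresponds
    to one index (one shard) that a serving layer like Katta could host."""
    shards = defaultdict(list)
    for doc_id, text in docs:
        shards[route_to_shard(doc_id)].append((doc_id, text))
    return dict(shards)


docs = [("doc-1", "hello"), ("doc-2", "world"), ("doc-3", "hadoop")]
shards = build_shard_indexes(docs)
# Every document lands in exactly one shard; routing is deterministic,
# so re-indexing the same doc id always hits the same shard.
```

Note that this only covers index building; as Ning says below, serving queries over those shards is a separate component (which is where Katta or Distributed Lucene come in).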