On Wed, Apr 1, 2009 at 17:42, Ken Krugler <kkrugler_li...@transpac.com> wrote:

> On Fri, 2009-03-13 at 19:42 -0700, buddha1021 wrote:
>>> hi dennis:
>>> ...
>>> I am confident that Hadoop can process the large data of a www search
>>> engine! But Lucene? I am afraid that the size limit of a Lucene index
>>> per server is very small, 10G? or 30G? This is not enough for a www
>>> search engine! IMO, this is a bottleneck!
>>
>> I agree that the actual problem/solution for accessing Lucene indexes is
>> to keep them small. What good is a distributed index if accessing it
>> takes hours?
>>
>> For me, here should lie one of Nutch's core competences: making search
>> in BIG indexes as fast as in SMALL indexes.
>
> I would suggest looking at Katta (http://katta.sourceforge.net/). It's
> one of several projects whose goal is to support very large Lucene
> indexes via distributed shards. Solr has also added federated search
> support.

I agree. I think the new index framework should be flexible enough that we
can support Katta along with Solr. Actually, this is one of the things I
want to do before the next major release.

> -- Ken
>
> --
> Ken Krugler
> +1 530-210-6378

--
Doğacan Güney
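For readers less familiar with the shard approach Ken mentions, here is a minimal sketch of the idea behind it (plain Python, no Lucene; the shard contents, term matching, and scores are made up for illustration): each shard computes its own local top-k hits, and the federating searcher merges those partial lists into a global top-k, so no single server ever has to hold the whole index.

```python
import heapq

# Hypothetical toy shards: each entry is (doc_id, set of terms, static score),
# standing in for an independent Lucene index searched separately
# (as Katta or Solr distributed search would do).
def search_shard(shard, query_terms, k):
    """Return this shard's local top-k (score, doc_id) hits for the query."""
    hits = [(score, doc_id)
            for doc_id, terms, score in shard
            if query_terms & terms]  # match: any query term present
    return heapq.nlargest(k, hits)

def federated_search(shards, query_terms, k):
    """Merge the per-shard top-k lists into one global top-k ranking."""
    all_hits = []
    for shard in shards:
        all_hits.extend(search_shard(shard, query_terms, k))
    return heapq.nlargest(k, all_hits)

shards = [
    [("d1", {"hadoop", "index"}, 0.9), ("d2", {"lucene"}, 0.4)],
    [("d3", {"lucene", "index"}, 0.7), ("d4", {"nutch"}, 0.2)],
]
top = federated_search(shards, {"lucene", "index"}, k=2)
# → [(0.9, 'd1'), (0.7, 'd3')]
```

The key property is that only k hits per shard cross the network, so query latency stays close to that of a single small index even as the total corpus grows.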