On Wed, Apr 1, 2009 at 17:42, Ken Krugler <kkrugler_li...@transpac.com> wrote:

> On Fri, 2009-03-13 at 19:42 -0700, buddha1021 wrote:
>>> hi dennis:
>>> ...
>>> I am confident that Hadoop can process the large data of a www search
>>> engine! But Lucene? I am afraid that the size limit of a Lucene index
>>> per server is very small, 10G? or 30G? This is not enough for a www
>>> search engine! IMO, this is a bottleneck!
>>
>> I agree that the actual problem/solution for accessing Lucene indexes is
>> to keep them small. What good is a distributed index if accessing it
>> takes hours?
>>
>> For me, here should lie one of Nutch's core competences: making search
>> in BIG indexes as fast as in SMALL indexes.
>
> I would suggest looking at Katta (http://katta.sourceforge.net/). It's
> one of several projects whose goal is to support very large Lucene
> indexes via distributed shards. Solr has also added federated search
> support.

I agree. I think the new index framework should be flexible enough that we
can support Katta along with Solr. Actually, this is one of the things I
want to do before the next major release.

> -- Ken
>
> --
> Ken Krugler
> +1 530-210-6378

--
Doğacan Güney
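For readers less familiar with the shard approach Ken mentions, here is a minimal sketch of the idea behind it (plain Python, no Lucene; the shard contents, term matching, and scores are made up for illustration): each shard computes its own local top-k hits, and the federating searcher merges those partial lists into a global top-k, so no single server ever has to hold the whole index.

```python
import heapq

# Hypothetical toy shards: each entry is (doc_id, set of terms, static score),
# standing in for an independent Lucene index searched separately
# (as Katta or Solr distributed search would do).
def search_shard(shard, query_terms, k):
    """Return this shard's local top-k (score, doc_id) hits for the query."""
    hits = [(score, doc_id)
            for doc_id, terms, score in shard
            if query_terms & terms]  # match: any query term present
    return heapq.nlargest(k, hits)

def federated_search(shards, query_terms, k):
    """Merge the per-shard top-k lists into one global top-k ranking."""
    all_hits = []
    for shard in shards:
        all_hits.extend(search_shard(shard, query_terms, k))
    return heapq.nlargest(k, all_hits)

shards = [
    [("d1", {"hadoop", "index"}, 0.9), ("d2", {"lucene"}, 0.4)],
    [("d3", {"lucene", "index"}, 0.7), ("d4", {"nutch"}, 0.2)],
]
top = federated_search(shards, {"lucene", "index"}, k=2)
# → [(0.9, 'd1'), (0.7, 'd3')]
```

The key property is that only k hits per shard cross the network, so query latency stays close to that of a single small index even as the total corpus grows.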