- Original Message -
> From: Andrzej Bialecki
> To: nutch-dev@lucene.apache.org
> Sent: Thursday, May 14, 2009 9:59:11 AM
> Subject: The Future of Nutch, reactivated
>
> Hi all,
>
> I'd like to revive this thread and gather additional feedback so
> From: Aaron Binns
> To: nutch-dev@lucene.apache.org
> Sent: Tue May 19 13:23:37 2009
> Subject: Re: The Future of Nutch, reactivated
>
>
> Andrzej Bialecki writes:
>
> >> One of the biggest boons of Nutch is the Hadoop infrastructure. When
> >>
- Original Message -
From: Aaron Binns
To: nutch-dev@lucene.apache.org
Sent: Tue May 19 13:23:37 2009
Subject: Re: The Future of Nutch, reactivated
Andrzej Bialecki writes:
>> One of the biggest boons of Nutch is the Hadoop infrastructure. When
>> indexing massive
Andrzej Bialecki writes:
>> One of the biggest boons of Nutch is the Hadoop infrastructure. When
>> indexing massive data sets, being able to fire up 60+ nodes in a
>> Hadoop system helps tremendously.
>
> Are you familiar with the distributed indexing package in Hadoop
> contrib/?
Only super
Aaron Binns wrote:
Our usage of Nutch is focused on index building and search services. We
don't use the crawling/fetching features at all. We use Heritrix.
Typically, our large-scale harvests are performed over 8-12 week
periods, then the archived data is handed off to me for full-text search
Andrzej Bialecki writes:
> Target audience
> ===
> I think that the Nutch project experiences a crisis of personality now -
> we are not sure what the target audience is, and we cannot satisfy
> everyone. I think that there are the following groups of Nutch users:
>
> 1. Large-scale Inte
All,
Sorry that I didn't reply, and thus this isn't threaded properly.
I've lurked on the list via the RSS feed; I subscribed so I could put
in my two cents' worth. I've recently started using git to maintain a
local branch of Nutch. My hope is to get my employer to let me
contribute "just engin
Hi Andrzej,
Great summary. My general feeling on this is similar to my prior comments on
similar threads from Otis and from Dennis. My personal pet projects for
Nutch2:
* refactored Nutch core data structures, modeled as POJOs
* refactored Nutch architecture where crawling/indexing/parsing/scorin
Hi all,
I'd like to revive this thread and gather additional feedback so that we
end up with concrete conclusions. Much of what I write below others have
said before; I'm trying here to express it as it looks from my point
of view.
Target audience
===
I think that the Nutch project