Re: The Future of Nutch, reactivated

2009-05-23 Thread Otis Gospodnetic
Solr - Nutch - Original Message > From: Andrzej Bialecki > To: nutch-dev@lucene.apache.org > Sent: Thursday, May 14, 2009 9:59:11 AM > Subject: The Future of Nutch, reactivated > > Hi all, > > I'd like to revive this thread and gather additional feedback so

Re: The Future of Nutch, reactivated

2009-05-19 Thread Bradford Stephens
> From: Aaron Binns > To: nutch-dev@lucene.apache.org > Sent: Tue May 19 13:23:37 2009 > Subject: Re: The Future of Nutch, reactivated > > > Andrzej Bialecki writes: > > >> One of the biggest boons of Nutch is the Hadoop infrastructure. When > >>

Re: The Future of Nutch, reactivated

2009-05-19 Thread Mark Olson
R - Original Message - From: Aaron Binns To: nutch-dev@lucene.apache.org Sent: Tue May 19 13:23:37 2009 Subject: Re: The Future of Nutch, reactivated Andrzej Bialecki writes: >> One of the biggest boons of Nutch is the Hadoop infrastructure. When >> indexing massi

Re: The Future of Nutch, reactivated

2009-05-19 Thread Mark Olson
- Original Message - From: Aaron Binns To: nutch-dev@lucene.apache.org Sent: Tue May 19 13:23:37 2009 Subject: Re: The Future of Nutch, reactivated Andrzej Bialecki writes: >> One of the biggest boons of Nutch is the Hadoop infrastructure. When >> indexing massi

Re: The Future of Nutch, reactivated

2009-05-19 Thread Aaron Binns
Andrzej Bialecki writes: >> One of the biggest boons of Nutch is the Hadoop infrastructure. When >> indexing massive data sets, being able to fire up 60+ nodes in a >> Hadoop system helps tremendously. > > Are you familiar with the distributed indexing package in Hadoop > contrib/ ? Only super

Re: The Future of Nutch, reactivated

2009-05-19 Thread Andrzej Bialecki
Aaron Binns wrote: Our usage of Nutch is focused on index building and search services. We don't use the crawling/fetching features at all. We use Heritrix. Typically, our large-scale harvests are performed over 8-12 week periods, then the archived data is handed off to me for full-text search

Re: The Future of Nutch, reactivated

2009-05-18 Thread Aaron Binns
Andrzej Bialecki writes: > Target audience > === > I think that the Nutch project experiences a crisis of personality now - > we are not sure what is the target audience, and we cannot satisfy > everyone. I think that there are the following groups of Nutch users: > > 1. Large-scale Inte

The Future of Nutch, reactivated

2009-05-14 Thread Kirby Bohling
All, Sorry that I didn't reply, and thus this isn't threaded properly. I've lurked on the list via the RSS feed; I subscribed so I could put in my two cents' worth. I've recently started using git to maintain a local branch of Nutch. My hope is to get my employer to let me contribute "just engin

Re: The Future of Nutch, reactivated

2009-05-14 Thread Mattmann, Chris A
Hi Andrzej, Great summary. My general feeling on this is similar to my prior comments on similar threads from Otis and from Dennis. My personal pet projects for Nutch2: * refactored Nutch core data structures, modeled as POJOs * refactored Nutch architecture where crawling/indexing/parsing/scorin

The Future of Nutch, reactivated

2009-05-14 Thread Andrzej Bialecki
Hi all, I'd like to revive this thread and gather additional feedback so that we end up with concrete conclusions. Much of what I write below others have said before; I'm trying here to express this as it looks from my point of view. Target audience === I think that the Nutch project