I actually use Nutch as a large-scale search engine on two products.  I think a
few things that would be nice to have are built-in options to produce an
incremental index and maybe a Quartz scheduler to automate it completely.

One thing that would be nice: when one of us figures something out, like doing
an incremental index, we could write it up and post it to the wiki.
Documentation has been one of the big hurdles for me.

Thanks for all your hard work and I hope to contribute to the project soon.

Alex

--- On Fri, 3/13/09, Dennis Kubes <ku...@apache.org> wrote:

> From: Dennis Kubes <ku...@apache.org>
> Subject: The Future of Nutch
> To: nutch-user@lucene.apache.org
> Date: Friday, March 13, 2009, 7:19 PM
> With the release of Nutch 1.0 I think it is a good time to
> begin a discussion about the future of Nutch.  Here are some
> things to consider, and I would love to hear everyone's views on
> this.
> 
> Nutch's original intention was as a large-scale www
> search engine.  That is a very specific goal.  Only a few
> people and organizations actually use it on that level.  (I
> just happen to be one of them as most of my work focuses on
> large scale web search as opposed to vertical search). Many,
> perhaps most, people using Nutch these days are either using
> parts of Nutch, such as the crawler, or are targeting
> vertical or intranet-type search engines.  This can
> be seen in how many people have already started using the
> Solr integration features.  So while Nutch was originally
> intended as a www search engine, IMO most people aren't using
> it for that purpose.
> 
> Since there are different purposes for different users,
> would it be good to consider moving Nutch out from under the
> Lucene umbrella to become a top-level Apache project?  This
> would then allow the creation of Nutch sub-projects, such as
> nutch-solr and nutch-hbase.  Thoughts?
> 
> Many parts of Nutch have also been implemented in other
> projects: for example, Tika for the parsers and Droids for the
> crawler.  It begs the question of what Nutch's core
> features are going forward.  When I think about search (again, my
> perspective is large-scale), I think of crawling or acquisition
> of data, parsing, analysis, indexing, deployment, and
> searching.  I personally think that there is much room for
> improvement in crawling and especially analysis.  Nutch
> shouldn't just be about the shell but also the brains.
> 
> And one of the biggest things I see is that many newcomers to
> Nutch have a very hard time getting started.  Part of this
> is understanding the MapReduce mentality, part is documentation,
> and part is that there is only so much time some of us have to
> answer questions, so some questions go unanswered on the lists.
> How might this be improved going forward?
> 
> Any other thoughts are also welcome.  Really, I want to start a
> discussion about where everyone thinks we are with the state
> of Nutch and its future.
> 
> Dennis


      
