With the release of Nutch 1.0, I think it is a good time to begin a discussion about the future of Nutch. Here are some things to consider; I would love to hear everyone's views on them.

Nutch's original intention was as a large-scale www search engine. That is a very specific goal. Only a few people and organizations actually use it on that level. (I just happen to be one of them, as most of my work focuses on large-scale web search as opposed to vertical search.) Many, perhaps most, people using Nutch these days are either using parts of Nutch, such as the crawler, or are targeting vertical or intranet-type search engines. This can be seen in how many people have already started using the Solr integration features. So while Nutch was originally intended as a www search engine, IMO most people aren't using it for that purpose.

Since there are different purposes for different users, would it be good to consider moving Nutch to a top-level Apache project, out from under the Lucene umbrella? This would then allow the creation of Nutch sub-projects, such as nutch-solr and nutch-hbase. Thoughts?

Many parts of Nutch have also been implemented in other projects, for example Tika for the parsers and Droids for the crawler. It begs the question: what are Nutch's core features going forward? When I think about search (again, my perspective is large scale), I think of crawling or acquisition of data, parsing, analysis, indexing, deployment, and searching. I personally think that there is much room for improvement in crawling and especially analysis. Nutch shouldn't just be about the shell but also the brains.

And one of the biggest things I see is that many newcomers to Nutch have a very hard time getting started. Part of this is understanding the MapReduce mentality, part is documentation, and part is that some of us only have so much time to answer questions, so some questions go unanswered on the lists. How might this be improved going forward?

Any other thoughts are also welcome. Really, I want to start a discussion about where everyone thinks we are with the state of Nutch and its future.

Dennis
