Hi,

First of all, thanks! We greatly appreciate it.

We have been using Nutch for a very long time now, but we have diverged a lot from the default codebase in order to suit our purposes. Because of that, we could never really integrate our work with Nutch development itself. For example, one custom component we built is a "component fetcher", which fetches outlink URLs directly within the fetcher job itself to speed up certain vertical crawls. The way we implemented it prevented us from integrating it into Nutch. (We did make some attempts; see the mailing-list thread [1].) Other examples include persisting parsed results to HBase and creating a Lucene index from HBase.

However, the recent developments with Nutchgora sparked our interest and made us decide to become more involved. We especially like the fact that crawl state can be maintained entirely within HBase. (We are big fans of Hadoop and Lucene, too.) Staying closer to an actively maintained codebase is of course the best way to go. Our main goal for now is a healthy Nutchgora branch that can crawl at large scale (40+ machines) with HBase as its backend!

By the way, Mathijs and I will be attending the upcoming HadoopWorld, so if any of you are going too, please let us know. Maybe we could meet up for a meet and greet.

Cheers!

1. http://lucene.472066.n3.nabble.com/Component-fetching-during-parsing-vertical-crawling-td981098.html

On 10/28/2011 02:26 PM, Markus Jelsma wrote:
Cheers!

On Friday 28 October 2011 14:21:25 Julien Nioche wrote:
Hi,

A while back the NUTCH PMC nominated Ferdy Galema for Nutch committership
and PMC membership. The VOTE tallies in Nutch PMC have occurred and I'm
happy to announce that Ferdy is now a Nutch committer.

Ferdy, feel free to say a little bit about yourself. Your account has been
created and you should have committer rights. Your first task will be to
check that it works by adding yourself to the list of committers on the
website (see Wiki for instructions).

Well done, and welcome on board!

Julien
