Re: Does Nutch 2.0 in good enough shape to test?

Andrzej Bialecki Fri, 17 Dec 2010 02:10:22 -0800

(switching to devs)

On 12/17/10 10:18 AM, Alexis wrote:

Hi,


I've spent some time working on this as well. I've just put together a
blog entry addressing the issues I ran into. See
http://techvineyard.blogspot.com/2010/12/build-nutch-20.html

In a nutchsell, I changed three pieces in Gora and Nutch code:
- flush the datastore regularly in the Hadoop RecordWriter (in GoraOutputFormat)

Careful here. A DataStore flush may be very expensive, so it should be done only when we are finished with the output. If you see that data is lost without this flush, then this should be reported as a Gora bug.
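
To illustrate the point about flushing only once the output is finished, here is a minimal sketch. It does not use the real Gora or Hadoop classes: SketchStore, CountingStore and SketchRecordWriter are hypothetical stand-ins invented for this example, and the in-memory "flush" only imitates an expensive commit. The idea is simply that write() buffers into the store and the single flush happens in close().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a datastore; NOT the real Gora DataStore interface.
interface SketchStore<K, V> {
    void put(K key, V value);
    void flush();
}

// In-memory store that counts flushes so the cost pattern is visible.
class CountingStore implements SketchStore<String, String> {
    final Map<String, String> pending = new HashMap<>();
    final Map<String, String> persisted = new HashMap<>();
    int flushCount = 0;

    @Override
    public void put(String key, String value) {
        pending.put(key, value);
    }

    @Override
    public void flush() {
        // Imitates the expensive step: move everything buffered to "storage".
        persisted.putAll(pending);
        pending.clear();
        flushCount++;
    }
}

// Hypothetical writer: records are only buffered on write, flushed once on close.
class SketchRecordWriter implements AutoCloseable {
    private final SketchStore<String, String> store;

    SketchRecordWriter(SketchStore<String, String> store) {
        this.store = store;
    }

    void write(String key, String value) {
        store.put(key, value); // no flush here
    }

    @Override
    public void close() {
        store.flush(); // single flush when the output is finished
    }
}

class FlushOnCloseDemo {
    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        try (SketchRecordWriter writer = new SketchRecordWriter(store)) {
            for (int i = 0; i < 1000; i++) {
                writer.write("url-" + i, "page-" + i);
            }
        }
        // prints "flushes=1 persisted=1000"
        System.out.println("flushes=" + store.flushCount
                + " persisted=" + store.persisted.size());
    }
}
```

If records went missing with a writer shaped like this, the problem would be in the store's own buffering rather than in the writer, which is why it would be worth reporting against the store.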

- wait for Hadoop job completion in the Fetcher job

I missed your previous email... I'll fix this shortly - thanks for spotting it.


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
