(switching to devs)

On 12/17/10 10:18 AM, Alexis wrote:
> Hi,
>
> I've spent some time working on this as well. I've just put together a
> blog entry addressing the issues I ran into. See
> http://techvineyard.blogspot.com/2010/12/build-nutch-20.html
>
> In a nutchsell, I changed three pieces in Gora and Nutch code:
> - flush the datastore regularly in the Hadoop RecordWriter (in GoraOutputFormat)

Careful here. A DataStore flush may be very expensive, so it should be done only when we are finished with the output. If you see data being lost without this flush, that should be reported as a Gora bug.
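
To illustrate the point, here is a minimal sketch of a writer that buffers every record and flushes exactly once, when the output is closed. Everything in it is a hypothetical stand-in: SketchStore, CountingStore and FlushOnCloseWriter are invented for this example and are not the Gora DataStore or the Hadoop RecordWriter API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a data store with an expensive flush().
// This is NOT the real Gora DataStore interface.
interface SketchStore<K, V> {
    void put(K key, V value);
    void flush();
    void close();
}

// In-memory store that records how often flush() is called.
class CountingStore<K, V> implements SketchStore<K, V> {
    final List<String> buffered = new ArrayList<>();
    final List<String> persisted = new ArrayList<>();
    int flushCount = 0;
    boolean closed = false;

    @Override
    public void put(K key, V value) {
        // Cheap: only buffers the record in memory.
        buffered.add(key + "=" + value);
    }

    @Override
    public void flush() {
        // Expensive in a real backend: pushes all buffered records out.
        persisted.addAll(buffered);
        buffered.clear();
        flushCount++;
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical writer: no flush per record, one flush when output is finished.
class FlushOnCloseWriter<K, V> {
    private final SketchStore<K, V> store;

    FlushOnCloseWriter(SketchStore<K, V> store) {
        this.store = store;
    }

    void write(K key, V value) {
        store.put(key, value);
    }

    void close() {
        store.flush(); // the single flush, once all output has been written
        store.close();
    }
}
```

With this shape, writing N records costs one flush instead of N. If records went missing with a single flush at close in the real code, that would point at the store rather than at the writer.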

> - wait for Hadoop job completion in the Fetcher job

I missed your previous email... I'll fix this shortly - thanks for spotting it.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
