(switching to devs)
On 12/17/10 10:18 AM, Alexis wrote:
> Hi,
> I've spent some time working on this as well. I've just put together a
> blog entry addressing the issues I ran into. See
> http://techvineyard.blogspot.com/2010/12/build-nutch-20.html
> In a nutchsell, I changed three pieces in Gora and Nutch code:
> - flush the datastore regularly in the Hadoop RecordWriter (in GoraOutputFormat)
Careful here. A DataStore flush may be very expensive, so it should be
done only when we are finished with the output. If you see that data is
lost without this flush, then this should be reported as a Gora bug.
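To illustrate what I mean by flushing only when the output is finished, here is a minimal sketch in plain Java. It does not use the real Gora or Hadoop classes: Store, CountingStore and StoreWriter are hypothetical stand-ins invented for this example, and the actual GoraOutputFormat code may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushOnCloseSketch {

    /** Hypothetical stand-in for a data store; NOT the real Gora DataStore API. */
    interface Store<K, V> {
        void put(K key, V value);
        void flush();
    }

    /** Toy store that buffers records and counts how often flush() is called. */
    static class CountingStore implements Store<String, String> {
        final List<String> buffered = new ArrayList<>();
        final List<String> persisted = new ArrayList<>();
        int flushCount = 0;

        @Override
        public void put(String key, String value) {
            buffered.add(key + "=" + value);
        }

        @Override
        public void flush() {
            // Pretend this is the expensive part (e.g. a round trip to the backend).
            persisted.addAll(buffered);
            buffered.clear();
            flushCount++;
        }
    }

    /** Hypothetical writer: write() only buffers, close() performs the single flush. */
    static class StoreWriter<K, V> implements AutoCloseable {
        private final Store<K, V> store;

        StoreWriter(Store<K, V> store) {
            this.store = store;
        }

        void write(K key, V value) {
            store.put(key, value); // no flush here
        }

        @Override
        public void close() {
            store.flush(); // one flush, once the output is complete
        }
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        try (StoreWriter<String, String> writer = new StoreWriter<>(store)) {
            for (int i = 0; i < 1000; i++) {
                writer.write("url" + i, "page" + i);
            }
        }
        System.out.println("records persisted: " + store.persisted.size());
        System.out.println("flush calls: " + store.flushCount);
    }
}
```

In this sketch all records reach the store with a single flush at close. If the real store loses data under this pattern, that points at a bug in the store rather than at a need for periodic flushes in the writer.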
> - wait for Hadoop job completion in the Fetcher job
I missed your previous email... I'll fix this shortly - thanks for
spotting it.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com