You're right, I forgot to apply 719 manually when I moved to my Linux box. Thanks Julien. We really ought to have a patch for this one, and probably also in Nutch 1.1.
I will comment on the JIRA for 770, bear with me, I've never done that before.

Now to the bandwidth issue: I found a way to greatly improve it by raising the number of fetcher threads to 800. The bandwidth is above 6 Mb/s at around 35 fetches/s, bearing in mind that I have two map tasks running concurrently (the default in Hadoop pseudo-distributed mode), so it is really half of that per map task. Not sure exactly what that means, but it looks like there is a lot of waiting involved for each fetch (see the rough arithmetic sketch at the end of this thread).

2009/11/28 Julien Nioche <[email protected]>

> NUTCH-721 is a different issue. 719 has no patch, but it describes the
> solution to the problem you encountered.
> If you get errors with 770 it would be helpful to comment on the JIRA.
>
> 2009/11/27 MilleBii <[email protected]>
>
> > I already applied that patch, which is actually 721; I was part of that
> > discussion at the time. The difference now is that I moved to a Linux box,
> > working with pseudo-distributed Hadoop, and I also took a later Nutch
> > snapshot.
> >
> > By the way, I could not apply the Time-Bomb 770 patch; the patch command
> > gives me errors.
> >
> > I applied 769 and tried it with a threshold level of 5; no real
> > improvement either.
> >
> > 2009/11/27 Julien Nioche <[email protected]>
> >
> > > There is a JIRA + a discussion on the mailing list about this. It is a
> > > synchronisation problem which has already been reported and patched,
> > > but not yet committed. See https://issues.apache.org/jira/browse/NUTCH-719
> > >
> > > J.
> > >
> > > 2009/11/27 MilleBii <[email protected]>
> > >
> > > > My fetch run is getting to the end now; I have the following logs
> > > > towards the end:
> > > >
> > > > 2009-11-27 19:07:43,866 INFO fetcher.Fetcher - -activeThreads=100, spinWaiting=100, fetchQueues.totalSize=12
> > > > 2009-11-27 19:07:44,866 INFO fetcher.Fetcher - -activeThreads=100, spinWaiting=100, fetchQueues.totalSize=12
> > > > 2009-11-27 19:07:45,866 INFO fetcher.Fetcher - -activeThreads=100, spinWaiting=100, fetchQueues.totalSize=12
> > > > 2009-11-27 19:07:46,866 INFO fetcher.Fetcher - -activeThreads=100, spinWaiting=100, fetchQueues.totalSize=12
> > > > 2009-11-27 19:07:47,866 INFO fetcher.Fetcher - -activeThreads=100, spinWaiting=100, fetchQueues.totalSize=12
> > > > 2009-11-27 19:07:47,867 WARN fetcher.Fetcher - Aborting with 100 hung threads.
> > > >
> > > > It was the same on the previous run: the fetch queue is not "empty".
> > > > What does that mean? It looks like there is a 'problem'.
> > > >
> > > > 2009/11/27 Andrzej Bialecki <[email protected]>
> > > >
> > > > > MilleBii wrote:
> > > > >
> > > > >> You mean map/reduce tasks???
> > > > >
> > > > > Yes.
> > > > >
> > > > >> Being in pseudo-distributed / single-node mode I only have two maps
> > > > >> during the fetch phase... so it would be back to the URL distribution.
> > > > >
> > > > > Well, yes, but my explanation is still valid. Which unfortunately
> > > > > doesn't change the situation.
> > > > >
> > > > > Next week I will be working on integrating the patches from Julien,
> > > > > and if time permits I could perhaps start working on speed monitoring
> > > > > to lock out slow servers.
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Andrzej Bialecki <><
> > > > > Information Retrieval, Semantic Web | Embedded Unix, System Integration
> > > > > http://www.sigram.com  Contact: info at sigram dot com
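
As an illustration of the speed-monitoring idea Andrzej mentions above, here is a rough sketch of what a per-host speed monitor could look like. Nothing in it is actual Nutch code: the class, the method names and the thresholds (HostSpeedMonitor, report, isLockedOut, MIN_BYTES_PER_SEC, MIN_SAMPLES) are all hypothetical.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical sketch only -- not actual Nutch code. Tracks the average
 * download speed per host and flags hosts that stay below a minimum
 * bytes/second threshold, so a fetcher could skip ("lock out") them.
 */
public class HostSpeedMonitor {

  private static final double MIN_BYTES_PER_SEC = 1024.0; // assumed threshold
  private static final int MIN_SAMPLES = 5;               // need a few fetches before judging

  private static class Stats {
    long bytes;
    long millis;
    int samples;
  }

  private final ConcurrentMap<String, Stats> perHost =
      new ConcurrentHashMap<String, Stats>();

  /** Record one completed fetch for a host. */
  public void report(String host, long bytes, long elapsedMillis) {
    Stats s = perHost.get(host);
    if (s == null) {
      Stats fresh = new Stats();
      s = perHost.putIfAbsent(host, fresh);
      if (s == null) {
        s = fresh;
      }
    }
    synchronized (s) {
      s.bytes += bytes;
      s.millis += elapsedMillis;
      s.samples++;
    }
  }

  /** True if the host has enough samples and its average speed is below the threshold. */
  public boolean isLockedOut(String host) {
    Stats s = perHost.get(host);
    if (s == null) {
      return false;
    }
    synchronized (s) {
      if (s.samples < MIN_SAMPLES || s.millis == 0) {
        return false;
      }
      double bytesPerSec = (s.bytes * 1000.0) / s.millis;
      return bytesPerSec < MIN_BYTES_PER_SEC;
    }
  }
}

The fetcher would call report() after each completed fetch and consult isLockedOut() before queuing further URLs for that host; how that would actually hook into the Nutch fetch queues is left open here.
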
> > > >
> > > > --
> > > > -MilleBii-
> > >
> > > --
> > > DigitalPebble Ltd
> > > http://www.digitalpebble.com
> >
> > --
> > -MilleBii-
>
> --
> DigitalPebble Ltd
> http://www.digitalpebble.com

--
-MilleBii-
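
One way to see why the run ends with 100 spin-waiting threads over a 12-entry queue, and why raising the thread count alone eventually stops paying off: with per-host politeness delays, each host is fetched at most once per delay interval, so the aggregate fetch rate is roughly bounded by the number of distinct hosts still queued divided by the per-host delay, whatever the thread count. A minimal sketch of that arithmetic follows; the 5-second delay and the host counts are illustrative assumptions, not values measured from this crawl.

/**
 * Back-of-the-envelope only: upper bound on fetch rate when the per-host
 * politeness delay is the limiting factor. All values below are assumptions.
 */
public class FetchRateEstimate {
  public static void main(String[] args) {
    double perHostDelaySec = 5.0;          // assumed politeness delay between fetches to one host
    int[] distinctHosts = {12, 175, 1000}; // e.g. only 12 entries left at the very end of the run

    for (int hosts : distinctHosts) {
      // Each host yields at most one fetch per delay interval,
      // so the total rate is capped regardless of the thread count.
      double maxFetchesPerSec = hosts / perHostDelaySec;
      System.out.printf("%5d hosts -> at most %.1f fetches/s, however many threads%n",
          hosts, maxFetchesPerSec);
    }
  }
}

Under those assumptions, 12 remaining hosts allow at most a couple of fetches per second, which is why the 100 idle threads simply spin-wait until the time limit aborts them.
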
