You're right, I forgot to apply 719 manually when I moved to my Linux box.
Thanks, Julien.
We really ought to have a patch for this one, and it should probably also go
into Nutch 1.1.

I will comment on the JIRA for 770. Bear with me, I've never done that
before.

Now to the bandwidth issue: I found a way to greatly improve it by raising
the fetcher to 800 threads. The bandwidth is now above 6 Mb/s at around
35 fetches/s, with two maps running concurrently (the default in Hadoop
pseudo-distributed mode), so each map task really gets about half of that.
I'm not sure what it means, but it looks like there is a lot of waiting
involved in each fetch.
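
To put rough numbers on that "lot of waiting", here is a back-of-the-envelope
sketch in Python. It is only an estimate under assumptions that are not
confirmed above: that "Mb/s" means megabits per second, that 6 Mb/s and
35 fetches/s are totals over both maps, and that the 800 threads are per map
task. The function name and its parameters are made up for illustration.

```python
def fetch_stats(total_mbit_per_s, total_fetches_per_s, map_tasks, threads_per_map):
    """Rough per-map figures derived from the totals quoted in this mail."""
    per_map_mbit = total_mbit_per_s / map_tasks
    per_map_fetches = total_fetches_per_s / map_tasks
    # Little's law: threads in flight = throughput * time per fetch, so
    # time per fetch = threads / throughput (assumes every thread is busy
    # or waiting on exactly one fetch at a time).
    secs_per_fetch = threads_per_map / per_map_fetches
    # Average payload per fetch, in kilobytes (1 Mbit = 1000 kbit = 125 kB).
    avg_kbytes = total_mbit_per_s * 125 / total_fetches_per_s
    return {
        "per_map_mbit_per_s": per_map_mbit,
        "per_map_fetches_per_s": per_map_fetches,
        "secs_per_fetch": secs_per_fetch,
        "avg_kbytes_per_fetch": avg_kbytes,
    }


stats = fetch_stats(total_mbit_per_s=6, total_fetches_per_s=35,
                    map_tasks=2, threads_per_map=800)
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

Under those assumptions each thread spends something like 45 seconds per
fetch for a page of roughly 21 kB. If the 800 threads were instead the total
across both maps, the figure would halve, which is still far more time than
the transfer itself should need.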

2009/11/28 Julien Nioche <[email protected]>

> nutch-721 is a different issue. 719 has no patch but describes the solution
> to the problem you encountered.
> if you get errors with 770 it would be helpful to comment on the JIRA
>
> 2009/11/27 MilleBii <[email protected]>
>
> > Already applied that patch which is actually 721, I was part of that
> > discussion at the time. The difference now is that I moved on a linux
> > box, and working pseudo-distributed hadoop, also I took a later nutch
> > snapshot.
> >
> > By the way I could not apply Time-Bomb 770 patch command gives me errors.
> >
> > I applied 769 and tried it with a level at threshold at 5 no real
> > improvement either.
> >
> >
> > 2009/11/27 Julien Nioche <[email protected]>
> >
> > > there is a jira + a discussion on the mailing list on this. This is a
> > > synchronisation problem which has already been reported, patched but
> > > not yet committed.
> > > See https://issues.apache.org/jira/browse/NUTCH-719
> > >
> > > J.
> > >
> > > 2009/11/27 MilleBii <[email protected]>
> > >
> > > > My fetch run is getting to the end now I have the following logs
> > > > towards the end
> > > >
> > > > 2009-11-27 19:07:43,866 INFO  fetcher.Fetcher - -activeThreads=100,
> > > > spinWaiting=100, fetchQueues.totalSize=12
> > > > 2009-11-27 19:07:44,866 INFO  fetcher.Fetcher - -activeThreads=100,
> > > > spinWaiting=100, fetchQueues.totalSize=12
> > > > 2009-11-27 19:07:45,866 INFO  fetcher.Fetcher - -activeThreads=100,
> > > > spinWaiting=100, fetchQueues.totalSize=12
> > > > 2009-11-27 19:07:46,866 INFO  fetcher.Fetcher - -activeThreads=100,
> > > > spinWaiting=100, fetchQueues.totalSize=12
> > > > 2009-11-27 19:07:47,867 INFO  fetcher.Fetcher - -activeThreads=100,
> > > > spinWaiting=100, fetchQueues.totalSize=12
> > > > 2009-11-27 19:07:47,867 WARN  fetcher.Fetcher - Aborting with 100
> > > > hung threads.
> > > >
> > > > It was same on previous run, the fetchqueue is not "empty", what does
> > > > it mean ? Looks like there is 'problem'
> > > >
> > > >
> > > > 2009/11/27 Andrzej Bialecki <[email protected]>
> > > >
> > > > > MilleBii wrote:
> > > > >
> > > > >> You mean map/reduce tasks ???
> > > > >>
> > > > >
> > > > > Yes.
> > > > >
> > > > >
> > > > >  Being in pseudo-distributed / single node I only have two maps
> > during
> > > > the
> > > > >> fetch phase... so it would be back to the URLs distribution.
> > > > >>
> > > > >
> > > > > > Well, yes, but my explanation is still valid. Which unfortunately
> > > > > > doesn't change the situation.
> > > > >
> > > > > > Next week I will be working on integrating the patches from Julien,
> > > > > > and if time permits I could perhaps start working on a speed
> > > > > > monitoring to lock out slow servers.
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Andrzej Bialecki     <><
> > > > >  ___. ___ ___ ___ _ _   __________________________________
> > > > > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > > > > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > > > > http://www.sigram.com  Contact: info at sigram dot com
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -MilleBii-
> > > >
> > >
> > >
> > >
> > > --
> > > DigitalPebble Ltd
> > > http://www.digitalpebble.com
> > >
> >
> >
> >
> > --
> > -MilleBii-
> >
>
>
>
> --
> DigitalPebble Ltd
> http://www.digitalpebble.com
>



-- 
-MilleBii-
