On Fri, 2006-12-08 at 11:22 +0100, Andrzej Bialecki wrote:
> This should be shown in the logs - the map xx% or reduce xx% progress is 
> printed to the logs.
> 
> The reduce phase consists of copying map outputs (reduce 0-33%), then 
> sorting them - and here's where most CPU and disk IO and time is spent - 
> which happens between 33%-66%, and finally copying sorted outputs to 
> form the final result.
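
To make those ranges concrete, here is a minimal Python sketch that maps a logged "reduce xx%" value to the sub-phase described above. The function name, the phase labels, and the exact cut-offs at 33 and 66 are assumptions for illustration, taken only from the description in this thread and not from the Hadoop source.

```python
def reduce_phase(progress_percent):
    """Name the reduce sub-phase for a logged 'reduce xx%' value.

    The thresholds follow the rough thirds described in the thread;
    they are illustrative and not taken from the Hadoop code.
    """
    if not 0 <= progress_percent <= 100:
        raise ValueError("progress must be between 0 and 100")
    if progress_percent < 33:
        return "copy"    # copying map outputs to the reducer
    if progress_percent < 66:
        return "sort"    # sorting: most CPU, disk IO and time
    return "output"      # copying sorted outputs into the final result


for pct in (10, 40, 90):
    print(pct, reduce_phase(pct))
```

So a job whose log stays at, say, "reduce 40%" for a long time would be in the sort step by this reading.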

The last entries from hadoop.log are:

2006-12-07 16:34:50,547 INFO  fetcher.Fetcher - fetching http://zut.languageskills.co.uk/press.html
2006-12-07 16:34:50,582 INFO  fetcher.Fetcher - fetching http://zwartelijst-vliegtuigen.capita-pc.co.uk/
2006-12-07 16:34:50,614 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/
2006-12-07 16:34:51,005 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/comp/
2006-12-07 16:34:51,582 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/contact/
2006-12-07 16:34:51,584 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/courses/
2006-12-07 16:34:51,586 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/essinfo/
2006-12-07 16:34:51,740 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/funds/
2006-12-07 16:34:51,816 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/intro/
2006-12-07 16:34:51,876 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/javascript.js
2006-12-07 16:34:51,934 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/js/CreateTrail.js
2006-12-07 16:34:52,186 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/societies/


This pretty much matches what I see on stdout. Here's an strace:

Process 18245 attached - interrupt to quit
clock_gettime(CLOCK_REALTIME, {1165573641, 880921000}) = 0
futex(0x2aab361b2e94, FUTEX_WAIT, 1556713, {0, 120816000}) = -1 ETIMEDOUT (Connection timed out)
futex(0x40118e98, FUTEX_WAKE, 1)        = 0
clock_gettime(CLOCK_REALTIME, {1165573642, 6244000}) = 0
futex(0x2aab361b2e94, FUTEX_WAIT, 1556715, {0, 999931000}) = -1 ETIMEDOUT (Connection timed out)
futex(0x40118e98, FUTEX_WAKE, 1)        = 0
clock_gettime(CLOCK_REALTIME, {1165573643, 10317000}) = 0
futex(0x2aab361b2e94, FUTEX_WAIT, 1556717, {0, 999934000}) = -1 ETIMEDOUT (Connection timed out)
futex(0x40118e98, FUTEX_WAKE, 1)        = 0
clock_gettime(CLOCK_REALTIME, {1165573644, 14358000}) = 0
futex(0x2aab361b2e94, FUTEX_WAIT, 1556719, {0, 999935000}) = -1 ETIMEDOUT (Connection timed out)
futex(0x40118e98, FUTEX_WAKE, 1)        = 0
clock_gettime(CLOCK_REALTIME, {1165573645, 18481000}) = 0
futex(0x2aab361b2e94, FUTEX_WAIT, 1556721, {0, 999936000}) = -1 ETIMEDOUT (Connection timed out)
futex(0x40118e98, FUTEX_WAKE, 1)        = 0
clock_gettime(CLOCK_REALTIME, {1165573646, 22537000}) = 0
futex(0x2aab361b2e94, FUTEX_WAIT, 1556723, {0, 999931000} <unfinished ...>
Process 18245 detached


That's from the process that is consuming loads of CPU.
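
One tentative way to read the trace: every FUTEX_WAIT asks for a relative timeout of just under a second and comes back with ETIMEDOUT, which looks like a thread sleeping and not one doing work. If that reading is right, the CPU may be going to another thread of the same JVM that this strace did not attach to. The small Python sketch below pulls the requested timeouts out of strace text in the format shown above; the function name and the regular expression are my own and are only an assumption about that format.

```python
import re

# Matches timed waits such as:
#   futex(0x2aab361b2e94, FUTEX_WAIT, 1556715, {0, 999931000}) = -1 ...
# The {sec, nsec} pair is the relative timeout passed to the call.
FUTEX_WAIT = re.compile(
    r"futex\(0x[0-9a-f]+, FUTEX_WAIT, \d+, \{(\d+), (\d+)\}"
)


def futex_wait_timeouts(trace):
    """Return the timeout in seconds requested by each timed FUTEX_WAIT."""
    return [int(sec) + int(nsec) / 1e9
            for sec, nsec in FUTEX_WAIT.findall(trace)]


sample = """\
futex(0x2aab361b2e94, FUTEX_WAIT, 1556713, {0, 120816000}) = -1 ETIMEDOUT
futex(0x40118e98, FUTEX_WAKE, 1)        = 0
futex(0x2aab361b2e94, FUTEX_WAIT, 1556715, {0, 999931000}) = -1 ETIMEDOUT
"""

timeouts = futex_wait_timeouts(sample)
print(timeouts)
```

If the requested timeouts add up to roughly the wall-clock span of the trace, the traced thread was idle for nearly all of it.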

What do you think?

Thanks

-Rob

