On Fri, 2006-12-08 at 11:22 +0100, Andrzej Bialecki wrote:
> This should be shown in the logs - the map xx% or reduce xx% progress is
> printed to the logs.
>
> The reduce phase consists of copying map outputs (reduce 0-33%), then
> sorting them - and here's where most CPU and disk IO and time is spent -
> which happens between 33%-66%, and finally copying sorted outputs to
> form the final result.

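To make the quoted breakdown concrete, here is a rough sketch that maps a "reduce xx%" figure from the logs to the phase it would fall in. The function name reduce_phase and the phase labels are made up for illustration, and the exact cut-offs (33 and 66, with the final third assumed to run from 66% to 100%) are only my reading of the description above, not something taken from the Hadoop source.

```python
def reduce_phase(progress_percent: float) -> str:
    """Map a 'reduce xx%' log figure to the phase described in the quote.

    Assumed boundaries: below 33 is copying map outputs, 33 up to 66 is
    sorting, and 66 or above is writing the final result.
    """
    if not 0 <= progress_percent <= 100:
        raise ValueError("progress must be between 0 and 100")
    if progress_percent < 33:
        return "copy"
    if progress_percent < 66:
        return "sort"
    return "final output"


if __name__ == "__main__":
    # Print the assumed phase for a few sample progress values.
    for pct in (10, 33, 50, 90):
        print(pct, reduce_phase(pct))
```

If that reading is right, a job sitting somewhere between 33% and 66% would be in the sort phase, where heavy CPU and disk use is expected.
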
The last entries from hadoop.log are:

2006-12-07 16:34:50,547 INFO fetcher.Fetcher - fetching http://zut.languageskills.co.uk/press.html
2006-12-07 16:34:50,582 INFO fetcher.Fetcher - fetching http://zwartelijst-vliegtuigen.capita-pc.co.uk/
2006-12-07 16:34:50,614 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/
2006-12-07 16:34:51,005 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/comp/
2006-12-07 16:34:51,582 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/contact/
2006-12-07 16:34:51,584 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/courses/
2006-12-07 16:34:51,586 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/essinfo/
2006-12-07 16:34:51,740 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/funds/
2006-12-07 16:34:51,816 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/intro/
2006-12-07 16:34:51,876 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/javascript.js
2006-12-07 16:34:51,934 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/js/CreateTrail.js
2006-12-07 16:34:52,186 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/societies/

This pretty much corresponds to my stdout output.

Here's a strace:

Process 18245 attached - interrupt to quit
clock_gettime(CLOCK_REALTIME, {1165573641, 880921000}) = 0
futex(0x2aab361b2e94, FUTEX_WAIT, 1556713, {0, 120816000}) = -1 ETIMEDOUT (Connection timed out)
futex(0x40118e98, FUTEX_WAKE, 1) = 0
clock_gettime(CLOCK_REALTIME, {1165573642, 6244000}) = 0
futex(0x2aab361b2e94, FUTEX_WAIT, 1556715, {0, 999931000}) = -1 ETIMEDOUT (Connection timed out)
futex(0x40118e98, FUTEX_WAKE, 1) = 0
clock_gettime(CLOCK_REALTIME, {1165573643, 10317000}) = 0
futex(0x2aab361b2e94, FUTEX_WAIT, 1556717, {0, 999934000}) = -1 ETIMEDOUT (Connection timed out)
futex(0x40118e98, FUTEX_WAKE, 1) = 0
clock_gettime(CLOCK_REALTIME, {1165573644, 14358000}) = 0
futex(0x2aab361b2e94, FUTEX_WAIT, 1556719, {0, 999935000}) = -1 ETIMEDOUT (Connection timed out)
futex(0x40118e98, FUTEX_WAKE, 1) = 0
clock_gettime(CLOCK_REALTIME, {1165573645, 18481000}) = 0
futex(0x2aab361b2e94, FUTEX_WAIT, 1556721, {0, 999936000}) = -1 ETIMEDOUT (Connection timed out)
futex(0x40118e98, FUTEX_WAKE, 1) = 0
clock_gettime(CLOCK_REALTIME, {1165573646, 22537000}) = 0
futex(0x2aab361b2e94, FUTEX_WAIT, 1556723, {0, 999931000} <unfinished ...>
Process 18245 detached

That's from the process consuming loads of CPU.

What do you think?

Thanks
-Rob

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
