Robin Haswell wrote:
> On Fri, 2006-12-08 at 11:22 +0100, Andrzej Bialecki wrote:
>
>> This should be visible in the logs: the "map xx%" and "reduce xx%"
>> progress is printed there.
>>
>> The reduce phase consists of copying map outputs (reduce 0-33%), then
>> sorting them (33-66%), which is where most of the CPU, disk IO and time
>> is spent, and finally copying the sorted outputs to form the final
>> result (66-100%).
>>
>
> The last entries from hadoop.log are:
>
> 2006-12-07 16:34:50,547 INFO fetcher.Fetcher - fetching
> http://zut.languageskills.co.uk/press.html
> 2006-12-07 16:34:50,582 INFO fetcher.Fetcher - fetching
> http://zwartelijst-vliegtuigen.capita-pc.co.uk/
> 2006-12-07 16:34:50,614 INFO fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/
> 2006-12-07 16:34:51,005 INFO fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/comp/
> 2006-12-07 16:34:51,582 INFO fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/contact/
> 2006-12-07 16:34:51,584 INFO fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/courses/
> 2006-12-07 16:34:51,586 INFO fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/essinfo/
> 2006-12-07 16:34:51,740 INFO fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/funds/
> 2006-12-07 16:34:51,816 INFO fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/intro/
> 2006-12-07 16:34:51,876 INFO fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/javascript.js
> 2006-12-07 16:34:51,934 INFO fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/js/CreateTrail.js
> 2006-12-07 16:34:52,186 INFO fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/societies/
>
>
> This pretty much corresponds to my stdout output. Here's a strace:
>
No lines like "INFO map 100%"? Strange.
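To check this quickly, something like the following rough sketch could scan hadoop.log for the last "map xx%" / "reduce xx%" values and name the reduce phase using the 0-33 / 33-66 / 66-100 split described above. The function names and the two sample lines are placeholders made up for illustration, not real log output; the exact layout of the progress lines in your log may differ.

```python
import re

# Matches the "map xx%" / "reduce xx%" fragments mentioned in this thread.
PROGRESS = re.compile(r"\b(map|reduce) (\d+)%")


def reduce_phase(percent):
    """Name the reduce phase for a given reduce percentage (split as described above)."""
    if percent < 33:
        return "copying map outputs"
    if percent < 66:
        return "sorting"
    return "writing final output"


def last_progress(lines):
    """Return the most recent map/reduce percentages found in the given log lines."""
    latest = {}
    for line in lines:
        for kind, pct in PROGRESS.findall(line):
            latest[kind] = int(pct)
    return latest


# Placeholder lines (assumed layout); in practice pass open("hadoop.log") instead.
sample = [
    "INFO map 100%",
    "INFO reduce 40%",
]
progress = last_progress(sample)
print(progress)
if "reduce" in progress:
    print(reduce_phase(progress["reduce"]))
```

An empty result means no progress lines were logged at all, which is the strange case here.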
> Process 18245 attached - interrupt to quit
> clock_gettime(CLOCK_REALTIME, {1165573641, 880921000}) = 0
>
[...]
> That's from the process consuming loads of CPU
>
> What do you think?
>
I think that instead of running strace you should get a thread dump ;)
strace cannot tell you what each JVM thread is doing.
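
On a Sun JVM under Linux, sending SIGQUIT to the Java process (the same as `kill -QUIT <pid>`) makes it print a dump of all threads to its own stdout without stopping it. A minimal sketch follows; the helper names are made up for illustration, and 18245 is simply the id from the strace above. Whether that id is the whole process or a single thread is an assumption. If it is a thread id, it should show up in the dump in hex, as the `nid=0x...` value.

```python
import os
import signal


def request_thread_dump(pid):
    """Send SIGQUIT to the JVM; the dump is printed to the JVM's stdout, not returned here."""
    os.kill(pid, signal.SIGQUIT)


def nid_for_thread(thread_id):
    """Convert a decimal native thread id to the hex form used in the dump's nid= field."""
    return hex(thread_id)


# Look for this value in the dump to find the thread that is burning CPU.
print(nid_for_thread(18245))  # 0x4745
```

Call `request_thread_dump(pid)` with the pid of the Java process, then look at the console (or the file its stdout is redirected to) for the stack traces.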
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com