Robin Haswell wrote:
> On Fri, 2006-12-08 at 11:22 +0100, Andrzej Bialecki wrote:
>   
>> This should be shown in the logs - the map xx% or reduce xx% progress is 
>> printed to the logs.
>>
>> The reduce phase consists of copying map outputs (reduce 0-33%), then 
>> sorting them - and here's where most CPU and disk IO and time is spent - 
>> which happens between 33%-66%, and finally copying sorted outputs to 
>> form the final result.
>>     
>
> The last entries from hadoop.log are:
>
> 2006-12-07 16:34:50,547 INFO  fetcher.Fetcher - fetching
> http://zut.languageskills.co.uk/press.html
> 2006-12-07 16:34:50,582 INFO  fetcher.Fetcher - fetching
> http://zwartelijst-vliegtuigen.capita-pc.co.uk/
> 2006-12-07 16:34:50,614 INFO  fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/
> 2006-12-07 16:34:51,005 INFO  fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/comp/
> 2006-12-07 16:34:51,582 INFO  fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/contact/
> 2006-12-07 16:34:51,584 INFO  fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/courses/
> 2006-12-07 16:34:51,586 INFO  fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/essinfo/
> 2006-12-07 16:34:51,740 INFO  fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/funds/
> 2006-12-07 16:34:51,816 INFO  fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/intro/
> 2006-12-07 16:34:51,876 INFO  fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/javascript.js
> 2006-12-07 16:34:51,934 INFO  fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/js/CreateTrail.js
> 2006-12-07 16:34:52,186 INFO  fetcher.Fetcher - fetching
> http://zzz.grad.ucl.ac.uk/societies/
>
>
> This pretty much corresponds to my stdout output. Here's a strace:
>   

No lines like "INFO map 100%"? Strange.

> Process 18245 attached - interrupt to quit
> clock_gettime(CLOCK_REALTIME, {1165573641, 880921000}) = 0
>   
[...]
> That's of the process consuming loads of CPU
>
> What do you think?
>   

I think that instead of running strace you should get a thread dump ;) 
strace cannot tell you what each JVM thread is doing.
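
From outside the process, sending SIGQUIT to the JVM (kill -QUIT <pid>) or running jstack <pid> from the JDK should print the stack of every thread. To show what that gives you that strace does not, here is a minimal sketch that builds a similar dump from inside a JVM using Thread.getAllStackTraces(). The class name ThreadDumpSketch and the output layout are made up for illustration; this is not Nutch or Hadoop code.

```java
import java.util.Map;

public class ThreadDumpSketch {

    /** Formats the name, state and stack frames of every live thread in this JVM. */
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            // Header line: thread name and current state (RUNNABLE, WAITING, ...).
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            // One line per stack frame, innermost call first.
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

In a dump of the busy process, the threads in state RUNNABLE with the same frames across a few consecutive dumps are the likely CPU consumers.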

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
