about the overhead

2008-07-24 Thread Wei Jiang
Hi all,

Does Hadoop provide a way for users to see how much time a job spends on
computation (the map/reduce functions) versus the different kinds of
overhead (such as startup, sorting, and disk I/O)?

Thanks~~

Best regards,

-- 
---
Wei


Two questions about hadoop

2008-07-15 Thread Wei Jiang
Hi all,

I am new to Hadoop and have some questions about it.

1) About setting the number of maps/reduces: Running Hadoop on an 8-node
cluster, I set mapred.map.tasks to 64 and
mapred.tasktracker.map.tasks.maximum to 8, but the "Launched map tasks"
counter in the job output showed that Hadoop launched between 96 and 110
map tasks across different jobs. The dataset is 6.4 GB and dfs.block.size
is set to 64 MB. Why does the number of launched map tasks differ between
runs with the same dataset size and block size? Is there a way to make
Hadoop launch exactly the number of map tasks specified?

2) About the launched map tasks: Does the number of launched map tasks imply
that Hadoop spawns a new thread for each map task? How can I find out how
many threads Hadoop launched for a particular job?
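
For reference on question 1, the arithmetic behind the figures above can be
sketched as follows. This is only a rough estimate, not Hadoop's actual split
logic: it assumes one map task per 64 MB block and a single input file, and
the class and method names (SplitEstimate, estimateSplits) are made up for
illustration. Since "6.4GB" could mean binary or decimal gigabytes, both
readings are computed.

```java
public class SplitEstimate {
    // Ceiling division: how many fixed-size blocks are needed to hold totalBytes.
    // Assumption: one map task per block, which ignores per-file boundaries.
    static long estimateSplits(long totalBytes, long blockBytes) {
        return (totalBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long blockBytes = 64L * 1024 * 1024;                // dfs.block.size = 64 MB
        long binaryGb = (long) (6.4 * 1024 * 1024 * 1024);  // 6.4 GB read as 2^30-byte units
        long decimalGb = 6_400_000_000L;                    // 6.4 GB read as 10^9-byte units

        System.out.println("binary reading:  " + estimateSplits(binaryGb, blockBytes));
        System.out.println("decimal reading: " + estimateSplits(decimalGb, blockBytes));
    }
}
```

Either reading gives a block count well above 64, which fits with
mapred.map.tasks acting as a hint rather than a hard setting. If the dataset
is spread over several files, each file's final partial block would likely
add a task, and retried or speculative task attempts may raise the launched
count further from run to run.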

Thanks very much~~

-- 
---
Wei