about the overhead
Hi all,

Does Hadoop provide a way to report the time spent on computation (the map/reduce functions) separately from the time spent on the different kinds of overhead (such as startup, sorting, disk I/O, etc.)?

Thanks.

Best regards,
--
Wei
Two questions about Hadoop
Hi all,

I am a new Hadoop user and have some questions about it.

1) About setting the number of maps/reduces: Running Hadoop on an 8-node cluster, I set mapred.map.tasks to 64 and mapred.tasktracker.map.tasks.maximum to 8, but when I examined the "launched map tasks" counter in the job output, I found that Hadoop launched between 96 and 110 map tasks in different jobs. The dataset is 6.4 GB and dfs.block.size is set to 64 MB. Why does the number of launched map tasks differ between runs with the same dataset size and block size? Is there a way to make Hadoop launch exactly the number of map tasks specified?

2) About the launched map tasks: Does the number of launched map tasks imply that Hadoop spawns a new thread for each map task? How can I find out the number of threads launched by Hadoop in a particular job?

Thanks very much.

--
Wei
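As a rough sanity check on the numbers in question 1, the sketch below estimates how many input splits a dataset of this size would produce. It assumes one split per block, counted per file, which is roughly how file-based input formats behave; the function name `expected_splits` and the file layouts are hypothetical, since the post does not say how the 6.4 GB is divided into files. It also ignores retried or speculative task attempts, which the launched-map-tasks counter may include.

```python
def expected_splits(file_sizes_bytes, block_size_bytes):
    """Estimate input splits, assuming one split per block within each file.

    This is a simplification: it does not model any input-format-specific
    rounding of the last split, nor retried or speculative attempts.
    """
    total = 0
    for size in file_sizes_bytes:
        if size > 0:
            # Integer ceiling division: a partial trailing block still needs a split.
            total += -(-size // block_size_bytes)
    return total


BLOCK = 64 * 1024 * 1024  # dfs.block.size of 64 MB, taken as 64 * 2**20 bytes

# Hypothetical layout A: one file of 6.4 * 10**9 bytes.
one_file = [6_400_000_000]

# Hypothetical layout B: the same total spread over ten equal files.
ten_files = [640_000_000] * 10

print(expected_splits(one_file, BLOCK))
print(expected_splits(ten_files, BLOCK))
```

Under these assumptions the estimate depends on the block count and the file layout rather than on mapred.map.tasks, so a requested value of 64 would not be expected to match the launched count.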