[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12926119#action_12926119 ] Hudson commented on MAPREDUCE-220: -- Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See [https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/]) Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Components: task, tasktracker Reporter: Hong Tang Assignee: Scott Chen Fix For: 0.22.0 Attachments: ant-test-patch.log, ant-test.log, MAPREDUCE-220-20100616.txt, MAPREDUCE-220-20100804.txt, MAPREDUCE-220-20100806.txt, MAPREDUCE-220-20100809.txt, MAPREDUCE-220-20100811.txt, MAPREDUCE-220-20100812.txt, MAPREDUCE-220-20100817.txt, MAPREDUCE-220-20100818.txt, MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900796#action_12900796 ] Scott Chen commented on MAPREDUCE-220: Thanks for the help :)
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899013#action_12899013 ] Scott Chen commented on MAPREDUCE-220: Hey Arun, Thanks, I will run the tests and attach them. Scott
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898498#action_12898498 ] Arun C Murthy commented on MAPREDUCE-220: Hudson might be stuck. Can you please attach the output of 'ant test' and 'ant test-patch' here? Thanks.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897877#action_12897877 ] Scott Chen commented on MAPREDUCE-220: Hey Eli, I think it will still work. The process tree will be initialized in Task.initialized(), so it will get the correct process id. Scott
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897328#action_12897328 ] Arun C Murthy commented on MAPREDUCE-220: Scott, sorry for coming in late. I have a nit: we seem to create a new ProcfsBasedProcessTree each time - wouldn't it be easier to re-use the object? Create it once and re-use it each time?
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897492#action_12897492 ] Scott Chen commented on MAPREDUCE-220: Thanks, Arun. I will update the patch soon.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897501#action_12897501 ] Scott Chen commented on MAPREDUCE-220: Updated the patch to address Arun's comment.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897509#action_12897509 ] Eli Collins commented on MAPREDUCE-220: Does caching the process tree this way work with JVM re-use?
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896659#action_12896659 ] Eli Collins commented on MAPREDUCE-220: Looks good. Minor nit: I might rename ProcResourceStatus to something like ProcResourceValues. Also, this inner class technically needs interface annotations (private and unstable). Sanjay and Tom can correct me if I'm wrong but I don't think we decided that classes inherit the annotations of the outer class.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896683#action_12896683 ] Eli Collins commented on MAPREDUCE-220: +1 Latest patch looks good to me. Thanks Scott.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896084#action_12896084 ] Eli Collins commented on MAPREDUCE-220: Hey Scott, Latest patch looks good to me. I assume the redundant calls to getProcessTree will be handled in MR-901; is it worth returning the values as a tuple in the meantime? Out of curiosity, for the test, why did the map and reduce sleep times need to be bumped to 5s? Wouldn't anything 1s pass? Thanks, Eli
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894990#action_12894990 ] Scott Chen commented on MAPREDUCE-220:
Hey Philip, We haven't tested this with JVM re-use, but I think you are right: we need to do some more work for this case. We can still get the correct PID in the JVM reuse case, because we use
{code}
String pid = System.getenv().get(JVM_PID);
{code}
which is invoked from Task.updateCounters(). So we should be able to get the correct PID for the task whether or not the JVM is reused. The problem is the cumulative CPU time, because the process may have been used by another task for a while. One way to solve this is to send only the current value instead of the cumulative value. Does this sound correct to you? Scott
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895129#action_12895129 ] Philip Zeyliger commented on MAPREDUCE-220:
Hi Scott, You could also reset the counters to 0 when the new task is started (sort of like a tare button on a scale). If resourceCalculator.getProcCumulativeCpuTime() were instead resourceCalculator.getCumulativeCpuTimeDelta() [cumulative CPU time since last call], you could use counter.incr() for the CPU usage.
It's also worth mentioning that the memory usage here is the last-known memory usage value. It's not byte-seconds (which wouldn't be that useful), nor is it maximum memory. That seems useful, but it's a bit unintuitive.
{noformat}
+long cpuTime = resourceCalculator.getProcCumulativeCpuTime();
+long pMem = resourceCalculator.getProcPhysicalMemorySize();
+long vMem = resourceCalculator.getProcVirtualMemorySize();
+counters.findCounter(TaskCounter.CPU_MILLISECONDS).setValue(cpuTime);
+counters.findCounter(TaskCounter.PHYSICAL_MEMORY_BYTES).setValue(pMem);
+counters.findCounter(TaskCounter.VIRTUAL_MEMORY_BYTES).setValue(vMem);
{noformat}
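The "tare" idea above could be sketched roughly as follows. This is only an illustration, not code from the patch: the class name CpuTimeDeltaTracker and the LongSupplier used as the source of cumulative readings are hypothetical; only the method name getCumulativeCpuTimeDelta() is taken from the suggestion in the comment.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: turns a cumulative CPU reading into per-call deltas. */
final class CpuTimeDeltaTracker {
    // Assumed source of cumulative CPU milliseconds for the process.
    private final LongSupplier cumulativeCpuMillis;
    private long lastReading;

    CpuTimeDeltaTracker(LongSupplier cumulativeCpuMillis) {
        this.cumulativeCpuMillis = cumulativeCpuMillis;
        // Take the first baseline at construction time.
        this.lastReading = cumulativeCpuMillis.getAsLong();
    }

    /** Re-zero the baseline, e.g. when a reused JVM picks up a new task. */
    void tare() {
        lastReading = cumulativeCpuMillis.getAsLong();
    }

    /** CPU milliseconds consumed since the previous call or the last tare. */
    long getCumulativeCpuTimeDelta() {
        long now = cumulativeCpuMillis.getAsLong();
        // Clamp at zero in case the underlying reading ever goes backwards.
        long delta = Math.max(0L, now - lastReading);
        lastReading = now;
        return delta;
    }
}
```

With a tracker like this, each delta could be added to an increment-only counter, and calling tare() at task start would keep CPU time spent by an earlier task in the same JVM out of the new task's total.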
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894821#action_12894821 ] Philip Zeyliger commented on MAPREDUCE-220: Scott, Quick question: have you tried this patch with JVM re-use enabled? On a quick reading, this patch doesn't handle that case; I don't know if it's a real problem or not. Cheers, -- Philip
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887672#action_12887672 ] M. C. Srivas commented on MAPREDUCE-220:
We've found that disk bandwidth is virtually unlimited compared to other factors, especially network, so measuring/collecting it is not worthwhile for scheduling. More interesting is disk-ops-per-second-per-drive. It identifies bad data layout immediately (i.e., one disk will be very hot even though it might be transferring very little data). Unfortunately, using ops / second / disk to schedule work is still not very useful, since bad data layout will not change because we schedule less.
Network is a big bottleneck, but bytes-in/bytes-out per unit of time is not representative of a problem. If we had some measure of the congestion, we could use it to increase/decrease scheduling locality (e.g., if the network gets congested, reduce the percentage of non-local tasks). We need to know round-trip times under normal vs congested situations, dropped packet counts, retransmit counts, etc. to figure out metrics for congestion. (Perhaps add some sockopts to tell us this? TCP knows this, after all.)
CPU/memory/swapping therefore still seem to be the most useful.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887674#action_12887674 ] dhruba borthakur commented on MAPREDUCE-220: +1 to srivas's proposal. Let this JIRA focus on cpu/memory metrics, and then maybe continue the discussion about disk bandwidth in another JIRA. Evan: If this is acceptable to you, can you please create a new JIRA for it? Thanks.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887349#action_12887349 ] Evan Wang commented on MAPREDUCE-220: Why not collect disk I/O and bandwidth for scheduling as well?
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887572#action_12887572 ] Scott Chen commented on MAPREDUCE-220: @Evan: That's a very good idea. We can file another JIRA on this one. What do you think?
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887613#action_12887613 ] Evan Wang commented on MAPREDUCE-220: @Scott: I've already made a Java tool for MR profiling using Linux OS tools, which is independent of Hadoop. However, the overhead of the network monitor, tcpdump, is really high. When running gridmix2, tcpdump costs 20% of the CPU on one core. The disk monitor also ran into some problems. So I am not sure that MR performance is influenced by all of those factors: cpu, memory, disk, network. I'd like to complete my base experiment first. Could you give me some advice about network and disk monitoring?
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887620#action_12887620 ] Scott Chen commented on MAPREDUCE-220:
@Evan: This sounds like a good experiment. The CPU and memory figures collected in this JIRA are obtained by parsing the /proc/ directory. This works well because /proc/ is in memory, so the overhead is small. However, there is no per-process IO and network information in /proc/. And as you mentioned, using tools like tcpdump can be very expensive.
Another approach is to count the {non,rack,data}-local bytes fetched from HDFS and fetched/served for map output. This way we can estimate the IO and network traffic from these numbers. The drawback of this approach is that it doesn't capture IO and network traffic that is not introduced by the framework. People can write user scripts that do lots of IO, and that will not be captured. Thoughts?
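To make the /proc/ parsing concrete, here is a minimal sketch that pulls CPU time and memory sizes out of one /proc/<pid>/stat line. It is not the code in the patch: the class name ProcStatSample is hypothetical, it covers a single process rather than a whole process tree, and the clock-tick rate and page size are passed in as parameters because plain Java cannot query them (100 ticks per second and 4096-byte pages are common values, but they are assumptions here). The field positions follow the Linux proc(5) manual page.

```java
/** Hypothetical value holder for one sample of a /proc/<pid>/stat line. */
final class ProcStatSample {
    final long cpuMillis;      // user + system CPU time, in milliseconds
    final long virtualBytes;   // vsize, already in bytes
    final long physicalBytes;  // rss pages multiplied by the page size

    private ProcStatSample(long cpuMillis, long virtualBytes, long physicalBytes) {
        this.cpuMillis = cpuMillis;
        this.virtualBytes = virtualBytes;
        this.physicalBytes = physicalBytes;
    }

    /**
     * Parses the contents of /proc/<pid>/stat.
     * ticksPerSecond and pageSizeBytes are caller-supplied assumptions.
     */
    static ProcStatSample parse(String statLine, long ticksPerSecond, long pageSizeBytes) {
        // The command name (field 2) is wrapped in parentheses and may itself
        // contain spaces, so split only what follows the last ')'.
        int close = statLine.lastIndexOf(')');
        if (close < 0) {
            throw new IllegalArgumentException("no command field in stat line");
        }
        String[] f = statLine.substring(close + 1).trim().split("\\s+");
        if (f.length < 22) {
            throw new IllegalArgumentException("stat line has too few fields");
        }
        // f[0] is field 3 (state), so field n of proc(5) lives at f[n - 3].
        long utimeTicks = Long.parseLong(f[11]); // field 14
        long stimeTicks = Long.parseLong(f[12]); // field 15
        long vsizeBytes = Long.parseLong(f[20]); // field 23
        long rssPages = Long.parseLong(f[21]);   // field 24
        long cpuMillis = (utimeTicks + stimeTicks) * 1000L / ticksPerSecond;
        return new ProcStatSample(cpuMillis, vsizeBytes, rssPages * pageSizeBytes);
    }
}
```

A caller would read the file contents (for example with java.nio.file.Files) and pass the line to parse(). Covering a task's child processes as well would mean repeating this for every process in the tree and summing the results.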
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881907#action_12881907 ] Scott Chen commented on MAPREDUCE-220: Allen: I agree. LinuxProcfsBasedProcessTree is a better name.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880017#action_12880017 ] Scott Chen commented on MAPREDUCE-220:
The failed contrib test is TestSimulatorDeterministicReplay.testMain. It is a known issue, tracked in MAPREDUCE-1834.
In the patch we put task cumulative CPU time, current physical memory and current virtual memory in task counters, so they will be aggregated in JobInProgress.getJobCounter(). We will get the total CPU time and current total memory usage. They will go to both the web UI and the history as part of the counters. We can also access the task counters to obtain this information in places like the task scheduler.
@Vinod: I didn't do the refactoring of the ProcTree, because LinuxResourceCalculatorPlugin is now called by the task (updateCounters is in Task.java). It runs in a different process than the TaskTracker, so we cannot reuse the ProcTree. But since the /proc/ directory is in memory, this may not be so bad in terms of performance. What do you think?
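The aggregation described above amounts to summing each counter across tasks. The sketch below only illustrates that idea and is not Hadoop's JobInProgress logic: SketchTaskCounter and JobCounterSketch are hypothetical names, and only the three counter names are borrowed from the patch excerpt quoted elsewhere in this thread.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the three task counters named in the patch. */
enum SketchTaskCounter { CPU_MILLISECONDS, PHYSICAL_MEMORY_BYTES, VIRTUAL_MEMORY_BYTES }

final class JobCounterSketch {
    /** Sums each counter over all tasks to get job-level totals. */
    static Map<SketchTaskCounter, Long> aggregate(List<Map<SketchTaskCounter, Long>> perTask) {
        Map<SketchTaskCounter, Long> total = new EnumMap<>(SketchTaskCounter.class);
        for (Map<SketchTaskCounter, Long> task : perTask) {
            for (Map.Entry<SketchTaskCounter, Long> e : task.entrySet()) {
                // merge() inserts the value if absent, otherwise adds to it.
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }
}
```

Summed this way, the CPU counter gives total CPU time for the job, while the memory counters give the sum of each task's most recently reported value.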
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880022#action_12880022 ] Allen Wittenauer commented on MAPREDUCE-220: (You know, it is a shame this was called ProcfsBasedProcessTree with the disclaimer that it only works on Linux. It probably should be renamed LinuxProcfsBasedProcessTree so that other operating systems with /proc could work. I suppose the alternative is to hack this code to be multi-OS aware)
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12879644#action_12879644 ] Hadoop QA commented on MAPREDUCE-220:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12447289/MAPREDUCE-220-20100616.txt against trunk revision 955198.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/576/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/576/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/576/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/576/console
This message is automatically generated.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864985#action_12864985 ] Scott Chen commented on MAPREDUCE-220: I had some discussion with Arun. The problem with the Counter is that it can only be incremented, so it is difficult to use for transmitting CPU and memory information (these values go up and down). We filed another JIRA, MAPREDUCE-1762, to allow setValue() in Counter. Then we may use Counters to send this information. What do you think?
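The difference between an increment-only counter and one that also allows setValue() can be shown with a toy class. GaugeCounterSketch is a hypothetical name and this is not Hadoop's Counter API; it only illustrates why a value that can fall, such as current memory, needs an overwrite operation.

```java
/** Hypothetical toy counter, not Hadoop's Counter class. */
final class GaugeCounterSketch {
    private long value;

    /** Increment-only path: suitable for totals that never decrease. */
    void increment(long delta) {
        if (delta < 0) {
            throw new IllegalArgumentException("increment must be non-negative");
        }
        value += delta;
    }

    /** Overwrite path: suitable for readings that go up and down. */
    void setValue(long newValue) {
        value = newValue;
    }

    long getValue() {
        return value;
    }
}
```

A memory reading that drops from 100 to 80 cannot be expressed through increment() alone, but setValue(80) records it directly.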
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863894#action_12863894 ] Vinod K V commented on MAPREDUCE-220: - bq. We probably should convert the fields in ResourceStatus into Counters and use that as the primary interface for the end-user, also we should store them into JobHistory etc. I second that. It will also solve two other issues with the patch: - the cpu and memory usage details of each task are sent in every heartbeat, making it bulky. Translating them into Counters means they are sent only once every minute - with Counters, we get for free the logging into JobHistory as well as the display on the web UI. Leaving that aside, I have one more comment on the TT side: To get the cpu/memory usage of a task, we construct the process-tree of the task every time a heartbeat is sent. - For one, if we go the Counters way, we only need to do the calculations once a minute. - Otherwise, the process-trees for all tasks are now constructed by both TaskMemoryManager and the TT main thread. It can become costly depending on the size of the process-tree. There is an opportunity for refactoring this, I guess - maybe a single class which maintains all the process-trees (TaskMemoryManager.ProcessTreeInfo?) and the corresponding statistics, within a given precision, time-wise. Scott? Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Components: task, tasktracker Reporter: Hong Tang Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
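The refactoring idea in this comment (one class that owns the process-tree statistics and recomputes them only when they are older than a given time precision) might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the TaskTracker implementation: ProcessTreeStatsCache, compute_stats and max_age_seconds are hypothetical names, and the expensive process-tree walk is represented by a caller-supplied function.

```python
import time


class ProcessTreeStatsCache:
    """Hypothetical shared cache of per-task process-tree statistics."""

    def __init__(self, compute_stats, max_age_seconds, clock=time.monotonic):
        self._compute_stats = compute_stats  # expensive process-tree walk
        self._max_age = max_age_seconds      # the time-wise precision
        self._clock = clock                  # injectable for testing
        self._entries = {}                   # task_id -> (timestamp, stats)

    def get(self, task_id):
        now = self._clock()
        entry = self._entries.get(task_id)
        # Reuse the cached statistics while they are still fresh enough.
        if entry is not None and now - entry[0] <= self._max_age:
            return entry[1]
        stats = self._compute_stats(task_id)
        self._entries[task_id] = (now, stats)
        return stats
```

Both the memory manager and the heartbeat path could then call get() on the same instance, so the process-tree for a task is rebuilt at most once per max_age_seconds regardless of how many callers ask.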
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863897#action_12863897 ] dhruba borthakur commented on MAPREDUCE-220: I like the idea of sending this information via Counters. This data could be used by schedulers to make decisions on what/when to schedule new tasks or preempt existing tasks. For this use-case, it would be nice if we can send them to the JT more frequently than once a minute. Any ideas here? Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Components: task, tasktracker Reporter: Hong Tang Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864065#action_12864065 ] Scott Chen commented on MAPREDUCE-220: -- Hey guys, Thanks for the help. I am not familiar with the counters, but from Arun and Vinod's comments I can see the benefits: 1. Reuse of the counter logging and transmitting 2. Easier to expose to end users This is really good! But as Dhruba mentioned, we want to use this information for scheduling. Measuring it and then sending it with the heartbeat ensures the scheduler gets the latest information. One minute may be too slow for scheduling. The other question I have is: using counters, can we aggregate with other methods (e.g. max) rather than just incrementing values? My original plan is to report this information in this issue and aggregate it into job level status in MAPREDUCE-1739. I am planning to generate these fields after aggregation: 1. Total CPU cycles (# of giga-cycles) 2. Total Memory occupied time (GB-sec) 3. Maximum peak memory on one task (GB) 4. Maximum peak CPU on one task (GHz) Is it possible to get these fields by using the counters? I will read the relevant code and think more about it. Maybe there's a way to get both benefits. Vinod: I also feel that there is a lot of redundant creation/computation of processTree. Maybe we should refactor the code and use one thread to compute it and expose the information to others. Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Components: task, tasktracker Reporter: Hong Tang Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA.
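The four job-level fields listed in the preceding comment mix two kinds of aggregation: sums (total CPU cycles, memory occupied time) and maxima (peak memory, peak CPU). A minimal Python sketch of that aggregation follows. The TaskUsage record and aggregate_job_usage function are hypothetical names chosen for illustration, and the sample values are made up; nothing here is taken from the MAPREDUCE-220 or MAPREDUCE-1739 patches.

```python
from dataclasses import dataclass


@dataclass
class TaskUsage:
    """Hypothetical per-task report; field units follow the list above."""
    cpu_gigacycles: float      # total CPU cycles, in giga-cycles
    memory_gb_seconds: float   # memory occupied time, in GB-sec
    peak_memory_gb: float      # peak memory, in GB
    peak_cpu_ghz: float        # peak CPU, in GHz


def aggregate_job_usage(tasks):
    """Combine per-task reports into one job-level record."""
    tasks = list(tasks)
    return TaskUsage(
        # Totals are additive across tasks.
        cpu_gigacycles=sum(t.cpu_gigacycles for t in tasks),
        memory_gb_seconds=sum(t.memory_gb_seconds for t in tasks),
        # Peaks need max, which plain increment-style counters cannot express.
        peak_memory_gb=max((t.peak_memory_gb for t in tasks), default=0.0),
        peak_cpu_ghz=max((t.peak_cpu_ghz for t in tasks), default=0.0),
    )


job = aggregate_job_usage([
    TaskUsage(2.0, 10.0, 1.5, 2.0),
    TaskUsage(3.0, 20.0, 0.5, 2.5),
])
```

The first two fields fit a sum-only aggregation model; the last two are the ones that motivate the question about aggregating with max.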
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863498#action_12863498 ] Scott Chen commented on MAPREDUCE-220: -- Vinod: I think you are very familiar with this part of the code. Could you help me review the patch? Thanks. Arun: Let me know if there is anything that might cause trouble for MAPREDUCE-901. Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Hong Tang Assignee: Scott Chen Attachments: MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862651#action_12862651 ] Hadoop QA commented on MAPREDUCE-220: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12443235/MAPREDUCE-220-v1.txt against trunk revision 939505. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/363/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/363/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/363/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/363/console This message is automatically generated. Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Hong Tang Assignee: Scott Chen Attachments: MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. 
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862093#action_12862093 ] Hadoop QA commented on MAPREDUCE-220: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12443120/MAPREDUCE-220.txt against trunk revision 938805. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/155/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/155/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/155/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/155/console This message is automatically generated. Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Hong Tang Assignee: Scott Chen Attachments: MAPREDUCE-220.txt It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. 
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12859454#action_12859454 ] dhruba borthakur commented on MAPREDUCE-220: Hi folks, we would like to start work on this one. In the earlier discussion, we said that a prerequisite is to refactor all the metric reporting via MAPREDUCE-901, but MAPREDUCE-901 is not moving forward. Given that, is it OK with people if we start working on this JIRA using the existing reporting framework even before MAPREDUCE-901 is done? Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Hong Tang Assignee: Scott Chen It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12859458#action_12859458 ] Arun C Murthy commented on MAPREDUCE-220: - Dhruba, I think that makes sense. Having said that, if we do manage to get MAPREDUCE-901 committed before this gets in, would it be reasonable to ask for a bit of re-work on this one? Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Hong Tang Assignee: Scott Chen It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12859464#action_12859464 ] dhruba borthakur commented on MAPREDUCE-220: Agreed. We can start work on this one and get it reviewed by the community. By the time it is ready, if MAPREDUCE-901 is already committed, then we refactor this patch to be compatible with MAPREDUCE-901. On the other hand, if MAPREDUCE-901 is not committed by the time this one is ready, then we do not hold this one up. Sounds fair? Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Hong Tang Assignee: Scott Chen It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12859467#action_12859467 ] Arun C Murthy commented on MAPREDUCE-220: - Precisely. Thanks! Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Hong Tang Assignee: Scott Chen It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12859468#action_12859468 ] Scott Chen commented on MAPREDUCE-220: -- Thanks, Dhruba and Arun. I will start working on this one now. Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Hong Tang Assignee: Scott Chen It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781548#action_12781548 ] Scott Chen commented on MAPREDUCE-220: -- bq. What can these two stats possibly be used for? B2 can also allow us to compute the current CPU usage (by taking the difference between two samples). +1 on Hong's idea of collecting A4, the total number of child processes. Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Hong Tang Assignee: Scott Chen It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
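The "taking difference" idea for B2 (cumulative used CPU time across all cores, in milliseconds) can be sketched in Python as follows. The function name and parameters are hypothetical, and the sketch assumes the caller also records a wall-clock timestamp with each cumulative sample and knows the number of cores.

```python
def cpu_usage_percent(prev_cpu_ms, curr_cpu_ms, prev_wall_ms, curr_wall_ms, num_cores):
    """Current CPU usage as a percentage of total machine capacity.

    prev_cpu_ms / curr_cpu_ms: cumulative CPU time over all cores, in ms.
    prev_wall_ms / curr_wall_ms: wall-clock time of each sample, in ms.
    """
    elapsed_ms = curr_wall_ms - prev_wall_ms
    if elapsed_ms <= 0 or num_cores <= 0:
        raise ValueError("need a positive interval and core count")
    used_ms = curr_cpu_ms - prev_cpu_ms
    # Capacity over the interval is elapsed wall time times the core count.
    return 100.0 * used_ms / (elapsed_ms * num_cores)


# 2000 ms of CPU time consumed during a 1000 ms interval on 4 cores.
usage = cpu_usage_percent(1000, 3000, 0, 1000, 4)
```

Because only the cumulative value is reported, the consumer can pick any sampling interval and still derive an average usage for that interval.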
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780938#action_12780938 ] Hong Tang commented on MAPREDUCE-220: - bq. What can these two stats possibly be used for? This would allow us to tell whether a sudden decrease/increase of cpu or memory usage is caused by the spawning of new processes. Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Hong Tang Assignee: Scott Chen It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779572#action_12779572 ] Olga Natkovich commented on MAPREDUCE-220: -- It would be very useful for profiling purposes if applications could get resource utilization information via counters. Either detailed information for each map/reduce or min/max/average could be useful. - Average CPU utilization - Max memory usage - Average inbound and outbound I/O. (Not sure if it is possible to obtain this information on a per-process basis.) Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Hong Tang Assignee: Scott Chen It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779109#action_12779109 ] dhruba borthakur commented on MAPREDUCE-220: Are we proposing that we add the following metrics to the heartbeat message? A1. virtual memory used by each task (in bytes) A2. physical memory used by each task (in bytes) A3. cpu used by each task (as a percentage of total CPU on that machine) B1. available physical memory on this machine (in bytes) B2. available cpu on this machine (as a percentage of total CPU on that machine) Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Hong Tang Assignee: Scott Chen It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779230#action_12779230 ] Scott Chen commented on MAPREDUCE-220: -- I posted the above comment in the wrong place. It's supposed to go to MAPREDUCE-961. Sorry for the confusion and spam. Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Hong Tang Assignee: Scott Chen It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779232#action_12779232 ] dhruba borthakur commented on MAPREDUCE-220: I am proposing that we hold off doing anything to this JIRA until MAPREDUCE-901 is committed. In the meantime, the items marked B1 - B4 can be done as part of MAPREDUCE-1218. The B1-B4 are not task related metrics (rather, they are TaskTracker related) and are not dependent on TaskMetrics changes proposed in MAPREDUCE-901. Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Hong Tang Assignee: Scott Chen It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779292#action_12779292 ] Vinod K V commented on MAPREDUCE-220: - bq. B2. cumulative used cpu time (for all cores) since the machine is up (in millisecond) bq. I'd also propose A4. Total number of child processes in the process tree rooted from the main task process. What can these two stats possibly be used for? Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Hong Tang Assignee: Scott Chen It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.