[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-10-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12926119#action_12926119
 ] 

Hudson commented on MAPREDUCE-220:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/])


 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: task, tasktracker
Reporter: Hong Tang
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: ant-test-patch.log, ant-test.log, 
 MAPREDUCE-220-20100616.txt, MAPREDUCE-220-20100804.txt, 
 MAPREDUCE-220-20100806.txt, MAPREDUCE-220-20100809.txt, 
 MAPREDUCE-220-20100811.txt, MAPREDUCE-220-20100812.txt, 
 MAPREDUCE-220-20100817.txt, MAPREDUCE-220-20100818.txt, MAPREDUCE-220-v1.txt, 
 MAPREDUCE-220.txt


 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-20 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900796#action_12900796
 ] 

Scott Chen commented on MAPREDUCE-220:
--

Thanks for the help :)




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-16 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899013#action_12899013
 ] 

Scott Chen commented on MAPREDUCE-220:
--

Hey Arun,
Thanks, I will run the tests and attach them.
Scott




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898498#action_12898498
 ] 

Arun C Murthy commented on MAPREDUCE-220:
-

Hudson might be stuck. Can you please attach the output of 'ant test' and 'ant 
test-patch' here? Thanks.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-12 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897877#action_12897877
 ] 

Scott Chen commented on MAPREDUCE-220:
--

Hey Eli,

I think it will still work. The process tree is initialized in
Task.initialize(), so it will get the correct process ID.

Scott




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-11 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897328#action_12897328
 ] 

Arun C Murthy commented on MAPREDUCE-220:
-

Scott, sorry for coming in late. 

I have a nit: we seem to create a new ProcfsBasedProcessTree each time.
Wouldn't it be easier to create the object once and re-use it each time?
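Arun's suggestion can be sketched as follows: build the process-tree object once, then refresh that same object on every sample instead of constructing a new one. This is a minimal sketch built on several assumptions. ProcessTree is a hypothetical interface with refresh() and cumulativeCpuTimeMs() methods, not the actual ProcfsBasedProcessTree API. ResourceSampler is also an illustrative name.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a reader such as ProcfsBasedProcessTree. */
interface ProcessTree {
  /** Re-reads this tree's state (for example from /proc) in place. */
  void refresh();

  /** CPU time consumed by the whole tree so far, in milliseconds. */
  long cumulativeCpuTimeMs();
}

/** Builds the process tree once and refreshes that same object on every sample. */
class ResourceSampler {
  private final Supplier<ProcessTree> factory;
  private ProcessTree tree; // created on first use, then re-used
  private int creations;    // how many trees were built (for illustration)

  ResourceSampler(Supplier<ProcessTree> factory) {
    this.factory = factory;
  }

  long sampleCpuTimeMs() {
    if (tree == null) {
      tree = factory.get();
      creations++;
    }
    tree.refresh();
    return tree.cumulativeCpuTimeMs();
  }

  int creations() {
    return creations;
  }
}
```

With JVM re-use, a cached tree like this still has to point at the right process when a new task starts. That is the question Eli raises in this thread.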




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-11 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897492#action_12897492
 ] 

Scott Chen commented on MAPREDUCE-220:
--

Thanks, Arun. I will update the patch soon.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-11 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897501#action_12897501
 ] 

Scott Chen commented on MAPREDUCE-220:
--

Update to address Arun's comment.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-11 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897509#action_12897509
 ] 

Eli Collins commented on MAPREDUCE-220:
---

Does caching the process tree this way work with JVM re-use?




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-09 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896659#action_12896659
 ] 

Eli Collins commented on MAPREDUCE-220:
---

Looks good. Minor nit: I might rename ProcResourceStatus to something like
ProcResourceValues. Also, this inner class technically needs interface
annotations (private and unstable). Sanjay and Tom can correct me if I'm wrong,
but I don't think we decided that inner classes inherit the annotations of the
outer class.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-09 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896683#action_12896683
 ] 

Eli Collins commented on MAPREDUCE-220:
---

+1   

Latest patch looks good to me. Thanks Scott.  




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-06 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896084#action_12896084
 ] 

Eli Collins commented on MAPREDUCE-220:
---

Hey Scott, 

Latest patch looks good to me. I assume the redundant calls to getProcessTree
will be handled in MR-901; is it worth returning the values as a tuple in the
meantime? Out of curiosity, why did the map and reduce sleep times in the test
need to be bumped to 5s? Wouldn't anything over 1s pass?

Thanks,
Eli 
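Eli's tuple suggestion could look like the minimal sketch below. It is a small immutable holder, using the ProcResourceValues name proposed in this thread, so that one process-tree scan yields CPU time, physical memory and virtual memory together. The field, constructor and getter names are assumptions, not the patch's actual API. The private/unstable interface annotations mentioned in this thread are left out so the sketch compiles on its own.

```java
/**
 * Immutable snapshot of one process tree's resource usage, returned as a unit
 * so that all three values come from the same /proc scan.
 */
final class ProcResourceValues {
  private final long cumulativeCpuTimeMs;
  private final long physicalMemoryBytes;
  private final long virtualMemoryBytes;

  ProcResourceValues(long cumulativeCpuTimeMs, long physicalMemoryBytes,
      long virtualMemoryBytes) {
    this.cumulativeCpuTimeMs = cumulativeCpuTimeMs;
    this.physicalMemoryBytes = physicalMemoryBytes;
    this.virtualMemoryBytes = virtualMemoryBytes;
  }

  long getCumulativeCpuTimeMs() { return cumulativeCpuTimeMs; }

  long getPhysicalMemoryBytes() { return physicalMemoryBytes; }

  long getVirtualMemoryBytes() { return virtualMemoryBytes; }

  @Override
  public String toString() {
    return "cpu=" + cumulativeCpuTimeMs + "ms pmem=" + physicalMemoryBytes
        + "B vmem=" + virtualMemoryBytes + "B";
  }
}
```

A resource calculator would then expose a single method returning this object. That replaces three separate getters, each of which refreshes the process tree.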




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-03 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894990#action_12894990
 ] 

Scott Chen commented on MAPREDUCE-220:
--

Hey Philip,

We haven't tested this with JVM re-use, but I think you are right about this.
We need to do some more work for this case.

We can still get the correct PID in the JVM re-use case, because we use
{code}
String pid = System.getenv().get(JVM_PID);
{code}
which is invoked from Task.updateCounters().
So we should be able to get the correct PID for the task whether or not the JVM
is reused.

The problem is the cumulative CPU time, because the process may already have
been used by another task for a while.
One way to solve this is to send only the current value instead of the
cumulative value.
Does this sound correct to you?

Scott




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-03 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895129#action_12895129
 ] 

Philip Zeyliger commented on MAPREDUCE-220:
---

Hi Scott,

You could also reset the counters to 0 when the new task is started (sort of
like a tare button on a scale). If
resourceCalculator.getProcCumulativeCpuTime() were instead
resourceCalculator.getCumulativeCpuTimeDelta() [cumulative CPU time since the
last call], you could use counter.incr() for the CPU usage.

It's also worth mentioning that the memory usage here is the last-known memory 
usage value.  It's not byte-seconds (which wouldn't be that useful), nor is it 
maximum memory.  That seems useful, but it's a bit unintuitive.

{noformat}
+long cpuTime = resourceCalculator.getProcCumulativeCpuTime();
+long pMem = resourceCalculator.getProcPhysicalMemorySize();
+long vMem = resourceCalculator.getProcVirtualMemorySize();
+counters.findCounter(TaskCounter.CPU_MILLISECONDS).setValue(cpuTime);
+counters.findCounter(TaskCounter.PHYSICAL_MEMORY_BYTES).setValue(pMem);
+counters.findCounter(TaskCounter.VIRTUAL_MEMORY_BYTES).setValue(vMem);
{noformat}
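Philip's "tare" idea can be sketched as a per-task tracker. It records the JVM's cumulative CPU time when a task starts and reports per-sample deltas that could be fed to a counter increment. It also keeps both the last-known and the peak physical-memory reading, which makes the distinction he points out explicit. This is a hypothetical sketch in plain Java. TaskResourceTracker and its methods are illustrative names, not Hadoop APIs.

```java
/**
 * Per-task view of a process whose CPU counter is cumulative over the whole JVM.
 * The baseline is "tared" when a task starts, so a re-used JVM does not charge
 * earlier tasks' CPU time to the current one.
 */
class TaskResourceTracker {
  private long cpuBaselineMs;     // JVM cumulative CPU time when this task started
  private long lastCpuMs;         // JVM cumulative CPU time at the previous sample
  private long lastPhysicalBytes; // most recent physical-memory reading
  private long peakPhysicalBytes; // maximum physical memory seen during this task

  /** Call when a task starts, with the JVM's current cumulative CPU time. */
  void startTask(long jvmCumulativeCpuMs) {
    cpuBaselineMs = jvmCumulativeCpuMs;
    lastCpuMs = jvmCumulativeCpuMs;
    lastPhysicalBytes = 0;
    peakPhysicalBytes = 0;
  }

  /** Records a sample and returns the CPU time consumed since the previous sample. */
  long sample(long jvmCumulativeCpuMs, long physicalBytes) {
    long deltaMs = jvmCumulativeCpuMs - lastCpuMs;
    lastCpuMs = jvmCumulativeCpuMs;
    lastPhysicalBytes = physicalBytes;
    peakPhysicalBytes = Math.max(peakPhysicalBytes, physicalBytes);
    return deltaMs;
  }

  /** CPU time charged to the current task only. */
  long taskCpuMs() { return lastCpuMs - cpuBaselineMs; }

  long lastPhysicalBytes() { return lastPhysicalBytes; }

  long peakPhysicalBytes() { return peakPhysicalBytes; }
}
```

Summing the returned deltas gives the same value as taskCpuMs(). The delta form simply suits an increment-style counter, whereas the original patch calls setValue with a cumulative number.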




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-02 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894821#action_12894821
 ] 

Philip Zeyliger commented on MAPREDUCE-220:
---

Scott,

Quick question: have you tried this patch with JVM re-use enabled?  On my 
quick-reading, this patch doesn't handle that case; I don't know if it's a real 
problem or not.

Cheers,

-- Philip




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-07-13 Thread M. C. Srivas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887672#action_12887672
 ] 

M. C. Srivas commented on MAPREDUCE-220:


We've found that disk bandwidth is virtually unlimited compared to other
factors, especially network, so measuring/collecting it is not worthwhile for
scheduling. More interesting is disk-ops-per-second-per-drive. It identifies
bad data layout immediately (i.e., one disk will be very hot even though it
might be transferring very little data).

Unfortunately, using ops/second/disk to schedule work is still not very
useful, since bad data layout will not change because we schedule less.

Network is a big bottleneck. But bytes-in/bytes-out per unit of time is not
representative of a problem. If we had some measure of congestion, we could
use it to increase/decrease scheduling locality (e.g., if the network gets
congested, reduce the percentage of non-local tasks). We need to know
round-trip times under normal vs. congested conditions, dropped packet counts,
retransmit counts, etc. to figure out metrics for congestion. (Perhaps add some
sockopts to tell us this? TCP knows this, after all.)

Therefore CPU/memory/swapping still seem to be the most useful.







[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-07-13 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887674#action_12887674
 ] 

dhruba borthakur commented on MAPREDUCE-220:


+1 to Srivas's proposal. Let this JIRA focus on CPU/memory metrics, and maybe
continue the discussion about disk bandwidth in another JIRA.

Evan: If this is acceptable to you, can you please create a new JIRA for it?
Thanks.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-07-12 Thread Evan Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887349#action_12887349
 ] 

Evan Wang commented on MAPREDUCE-220:
-

Why not collect disk I/O and bandwidth for scheduling as well?




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-07-12 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887572#action_12887572
 ] 

Scott Chen commented on MAPREDUCE-220:
--

@Evan: That's a very good idea. We can file another JIRA on this one. What do 
you think?




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-07-12 Thread Evan Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887613#action_12887613
 ] 

Evan Wang commented on MAPREDUCE-220:
-

@Scott: I've already made a Java tool for MR profiling through Linux OS tools,
which is independent of Hadoop. However, the overhead of the network monitor,
tcpdump, is really high: when running gridmix2, tcpdump uses 20% of one CPU
core. The disk monitor also ran into some problems. So I am not sure that MR
performance is influenced by all of these factors (CPU, memory, disk, and
network). I'd like to complete my baseline experiment first. Could you give me
some advice about network and disk monitoring?




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-07-12 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887620#action_12887620
 ] 

Scott Chen commented on MAPREDUCE-220:
--

@Evan: This sounds like a good experiment.

The CPU and memory usage collected in this JIRA is obtained by parsing the
/proc/ directory.
This works well because /proc/ is in memory, so the overhead is small.
However, /proc/ has no per-process I/O and network information.
And as you mentioned, using tools like tcpdump can be very expensive.

Another approach is to count the {non,rack,data}-local bytes fetched from HDFS
and the bytes fetched/served for map output.
From these numbers we can estimate the I/O and network traffic.
The drawback of this approach is that it doesn't capture I/O and network
traffic that is not generated by the framework.
Users can write scripts that do lots of I/O, and that will not be captured.
Thoughts?
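The per-process numbers behind the /proc approach come from /proc/<pid>/stat. The minimal sketch below parses one stat line using the field positions documented in the Linux proc(5) manual page:

* utime and stime are fields 14 and 15, in clock ticks.
* vsize is field 23, in bytes.
* rss is field 24, in pages.

ProcStat is a hypothetical name. Unlike a process-tree reader, it looks at a single process and ignores its children. The clock-tick rate and page size are passed in because they vary by system; on Linux, getconf CLK_TCK and getconf PAGESIZE report them.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** The few /proc/<pid>/stat fields needed for CPU and memory accounting. */
final class ProcStat {
  final long utimeTicks; // field 14: user-mode CPU time, in clock ticks
  final long stimeTicks; // field 15: kernel-mode CPU time, in clock ticks
  final long vsizeBytes; // field 23: virtual memory size, in bytes
  final long rssPages;   // field 24: resident set size, in pages

  private ProcStat(long utimeTicks, long stimeTicks, long vsizeBytes, long rssPages) {
    this.utimeTicks = utimeTicks;
    this.stimeTicks = stimeTicks;
    this.vsizeBytes = vsizeBytes;
    this.rssPages = rssPages;
  }

  /** Reads and parses /proc/<pid>/stat (Linux only). */
  static ProcStat read(String pid) throws IOException {
    byte[] raw = Files.readAllBytes(Paths.get("/proc", pid, "stat"));
    return parse(new String(raw, StandardCharsets.UTF_8));
  }

  /** Parses one stat line; comm (field 2) may contain spaces or ')', so split after the last ')'. */
  static ProcStat parse(String statLine) {
    int close = statLine.lastIndexOf(')');
    if (close < 0) {
      throw new IllegalArgumentException("missing comm field: " + statLine);
    }
    // rest[0] is field 3 (state), so field N is rest[N - 3].
    String[] rest = statLine.substring(close + 1).trim().split("\\s+");
    if (rest.length < 24 - 2) {
      throw new IllegalArgumentException("too few fields: " + statLine);
    }
    return new ProcStat(
        Long.parseLong(rest[14 - 3]),
        Long.parseLong(rest[15 - 3]),
        Long.parseLong(rest[23 - 3]),
        Long.parseLong(rest[24 - 3]));
  }

  /** Total user + kernel CPU time in milliseconds. */
  long cpuTimeMs(long clockTicksPerSecond) {
    return (utimeTicks + stimeTicks) * 1000L / clockTicksPerSecond;
  }

  /** Resident memory in bytes. */
  long rssBytes(long pageSizeBytes) {
    return rssPages * pageSizeBytes;
  }
}
```

Only fields up to 24 are converted to long. Later fields such as rsslim can hold values larger than Long.MAX_VALUE.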





[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-06-23 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881907#action_12881907
 ] 

Scott Chen commented on MAPREDUCE-220:
--

Allen: I agree. LinuxProcfsBasedProcessTree is a better name.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-06-17 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880017#action_12880017
 ] 

Scott Chen commented on MAPREDUCE-220:
--

The failed contrib test is TestSimulatorDeterministicReplay.testMain.
It is a known issue in MAPREDUCE-1834.

In the patch we put the task's cumulative CPU time, current physical memory,
and current virtual memory in task counters.
So they will be aggregated in JobInProgress.getJobCounter(), and we will get
the total CPU time and current total memory usage.
They will go to both the web UI and job history as part of the counters.
Other components, such as the task scheduler, can also read the task counters
to obtain this information.
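The job-level aggregation described above amounts to summing each resource counter across tasks. This is a minimal sketch, assuming per-task values are available as maps. The ResourceCounter enum mirrors the counter names in the patch excerpt quoted in this thread (CPU_MILLISECONDS, PHYSICAL_MEMORY_BYTES, VIRTUAL_MEMORY_BYTES). JobResourceTotals and the map representation are illustrative, not the JobInProgress implementation.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Resource counters mirroring the names used in the patch excerpt. */
enum ResourceCounter { CPU_MILLISECONDS, PHYSICAL_MEMORY_BYTES, VIRTUAL_MEMORY_BYTES }

final class JobResourceTotals {
  private JobResourceTotals() {}

  /** Sums each counter over all tasks; counters a task did not report count as 0. */
  static Map<ResourceCounter, Long> sum(List<Map<ResourceCounter, Long>> perTask) {
    Map<ResourceCounter, Long> totals = new EnumMap<>(ResourceCounter.class);
    for (ResourceCounter c : ResourceCounter.values()) {
      totals.put(c, 0L);
    }
    for (Map<ResourceCounter, Long> task : perTask) {
      for (Map.Entry<ResourceCounter, Long> e : task.entrySet()) {
        totals.merge(e.getKey(), e.getValue(), Long::sum);
      }
    }
    return totals;
  }
}
```

The memory counters hold a last-known value per task. Their sum is therefore the current total only across tasks that are still running; for finished tasks it is a sum of final snapshots.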

@Vinod: I didn't do the refactoring of the ProcTree, because
LinuxResourceCalculatorPlugin is now called by the task (updateCounters is in
Task.java). The task runs in a different process from the TaskTracker, so we
cannot reuse the ProcTree. But /proc/ is in memory, so this may not be so bad
in terms of performance. What do you think?




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-06-17 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880022#action_12880022
 ] 

Allen Wittenauer commented on MAPREDUCE-220:


(You know, it is a shame this was called ProcfsBasedProcessTree with the 
disclaimer that it only works on Linux.  It probably should be renamed 
LinuxProcfsBasedProcessTree so that other operating systems with /proc could 
work.  I suppose the alternative is to hack this code to be multi-OS aware)




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-06-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12879644#action_12879644
 ] 

Hadoop QA commented on MAPREDUCE-220:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12447289/MAPREDUCE-220-20100616.txt
  against trunk revision 955198.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/576/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/576/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/576/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/576/console

This message is automatically generated.

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: task, tasktracker
Reporter: Hong Tang
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: MAPREDUCE-220-20100616.txt, MAPREDUCE-220-v1.txt, 
 MAPREDUCE-220.txt


 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-05-06 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864985#action_12864985
 ] 

Scott Chen commented on MAPREDUCE-220:
--

I had some discussion with Arun. The problem with Counter is that it can 
only be incremented,
so it is difficult to use it to transmit CPU and memory information, which goes up 
and down.
We filed another JIRA, MAPREDUCE-1762, to allow setValue() in Counter.
Then we could use Counters to send this information.

What do you think?
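To make the increment-only limitation concrete, here is a small Java sketch. IncrementOnlyCounter and SettableCounter are hypothetical stand-ins, not Hadoop's Counter class. The sketch assumes "can only be incremented" means negative deltas are rejected. It shows that such a counter can still follow a running peak but cannot follow a current value that drops, which is what a setValue() would allow.

```java
// Hypothetical sketch: these two classes are stand-ins, not Hadoop's Counter.
final class CounterSemanticsSketch {
    // Models a counter that can only move upward.
    static final class IncrementOnlyCounter {
        private long value;

        void increment(long delta) {
            if (delta < 0) {
                throw new IllegalArgumentException("increment-only counter");
            }
            value += delta;
        }

        long getValue() { return value; }
    }

    // Models a counter with the kind of setValue() MAPREDUCE-1762 asks for.
    static final class SettableCounter {
        private long value;

        void setValue(long newValue) { value = newValue; }

        long getValue() { return value; }
    }

    // Feeds memory samples to both counters; returns {peak, current}.
    static long[] track(long[] memorySamplesMb) {
        IncrementOnlyCounter peak = new IncrementOnlyCounter();
        SettableCounter current = new SettableCounter();
        for (long sample : memorySamplesMb) {
            // An increment-only counter can still follow a running maximum...
            if (sample > peak.getValue()) {
                peak.increment(sample - peak.getValue());
            }
            // ...but only a settable counter can follow a value that drops.
            current.setValue(sample);
        }
        return new long[] {peak.getValue(), current.getValue()};
    }

    public static void main(String[] args) {
        long[] result = track(new long[] {512, 768, 256});
        System.out.println("peak=" + result[0] + " current=" + result[1]); // peak=768 current=256
    }
}
```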

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: task, tasktracker
Reporter: Hong Tang
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt


 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-05-04 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863894#action_12863894
 ] 

Vinod K V commented on MAPREDUCE-220:
-

bq. We probably should convert the fields in ResourceStatus into Counters and 
use that as the primary interface for the end-user, also we should store them 
into JobHistory etc.
I second that. It will also solve two other issues with the patch:
 - the cpu and memory usage details of each task are sent in every heartbeat, 
making it bulky. Translating them into Counters will mean they are sent only 
once every minute
 - with Counters, we get the logging into JobHistory as well as the display 
on the web UI for free.

Leaving that aside, I have one more comment on the TT side: for getting the 
cpu/memory usage of a task, we construct the process-tree of the task 
again every time a heartbeat is sent.
 - For one, if we go the Counters way, we only need to do the calculations 
once every minute.
 - Otherwise, the process-trees for all tasks are now constructed both by 
TaskMemoryManager and by the TT main thread. This can become costly depending 
on the size of the process-tree. There is an opportunity for refactoring this, 
I guess: maybe a single class which maintains all the process-trees 
(TaskMemoryManager.ProcessTreeInfo?) and the corresponding statistics, within a 
given precision, time-wise.

Scott?
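The refactoring Vinod describes (one owner of the process-trees, statistics kept "within a given precision, time-wise") can be sketched as a shared sampler that recomputes at most once per refresh interval. This is a hypothetical illustration, not TaskMemoryManager code. CachedSampler, the injected clock and the counting stand-in for "walk the process tree" are all assumptions.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

// Hypothetical sketch, not TaskMemoryManager code: one shared object that walks
// a task's process tree at most once per refresh interval and hands the cached
// result to every caller (heartbeat thread, memory manager, ...).
final class CachedUsageSketch {
    static final class CachedSampler {
        private final LongSupplier expensiveSample; // e.g. rebuild the process tree
        private final LongSupplier clockMillis;
        private final long refreshIntervalMillis;
        private boolean hasSample;
        private long lastSampleMillis;
        private long cachedValue;

        CachedSampler(LongSupplier expensiveSample, LongSupplier clockMillis,
                      long refreshIntervalMillis) {
            this.expensiveSample = expensiveSample;
            this.clockMillis = clockMillis;
            this.refreshIntervalMillis = refreshIntervalMillis;
        }

        // Recompute only when the cached value is older than the interval.
        synchronized long get() {
            long now = clockMillis.getAsLong();
            if (!hasSample || now - lastSampleMillis >= refreshIntervalMillis) {
                cachedValue = expensiveSample.getAsLong();
                lastSampleMillis = now;
                hasSample = true;
            }
            return cachedValue;
        }
    }

    // Calls get() at the given fake times; returns how many tree walks happened.
    static int simulate(long refreshIntervalMillis, long[] callTimesMillis) {
        AtomicInteger walks = new AtomicInteger();
        long[] now = {0};
        CachedSampler sampler = new CachedSampler(
            () -> walks.incrementAndGet(), () -> now[0], refreshIntervalMillis);
        for (long t : callTimesMillis) {
            now[0] = t;
            sampler.get();
        }
        return walks.get();
    }

    public static void main(String[] args) {
        // Calls at 0, 1, 2.999 and 3 s with a 3 s refresh: 2 walks, not 4.
        System.out.println(simulate(3000, new long[] {0, 1000, 2999, 3000}));
    }
}
```

The refresh interval is the knob for the trade-off raised later in this thread: a shorter interval gives schedulers fresher data at the cost of more tree walks.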

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: task, tasktracker
Reporter: Hong Tang
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt


 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-05-04 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863897#action_12863897
 ] 

dhruba borthakur commented on MAPREDUCE-220:


I like the idea of sending this information via Counters.

This data could be used by schedulers to decide what/when to 
schedule new tasks or preempt existing tasks. For this use-case, it would be 
nice if we could send them to the JT more frequently than once a minute. Any 
ideas here?



 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: task, tasktracker
Reporter: Hong Tang
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt


 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-05-04 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864065#action_12864065
 ] 

Scott Chen commented on MAPREDUCE-220:
--

Hey guys, thanks for the help.

I am not familiar with the counters. But from Arun's and Vinod's comments I can 
see the benefits:
1. Reuse of the counter logging and transmission
2. Easier to expose to end users
This is really good!

But as Dhruba mentioned, we want to use this information for scheduling.
Measuring it and then sending it with the heartbeat ensures the scheduler 
gets the latest information.
One minute may be too slow for scheduling.

The other question I have: 
using counters, can we aggregate with other methods (e.g. max) rather than just 
incrementing values?

My original plan is to report this information in this issue and aggregate 
it into job-level status in MAPREDUCE-1739.
And I am planning to generate these fields after aggregation:
1. Total CPU cycles (# of giga-cycles)
2. Total Memory occupied time (GB-sec)
3. Maximum peak memory on one task (GB)
4. Maximum peak CPU on one task (GHz)
Is it possible to get these fields by using the counters?
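The four job-level fields split into two kinds of aggregation: the totals are sums of rate times sampling interval (GHz x s = giga-cycles, GB x s = GB-sec), while the peaks need max rather than addition. A minimal Java sketch follows. It assumes each task reports periodic samples of CPU (GHz) and memory (GB) at a fixed interval. The Sample class and the interval are illustrative assumptions, not part of the patch.

```java
import java.util.List;

// Hypothetical sketch of the job-level aggregation described above; the
// Sample class and fixed sampling interval are assumptions for illustration.
final class JobResourceAggregationSketch {
    // One periodic sample of a task: CPU in GHz, memory in GB.
    static final class Sample {
        final double cpuGhz;
        final double memoryGb;

        Sample(double cpuGhz, double memoryGb) {
            this.cpuGhz = cpuGhz;
            this.memoryGb = memoryGb;
        }
    }

    // Returns {total giga-cycles, total GB-sec, peak memory GB, peak CPU GHz}.
    static double[] aggregate(List<List<Sample>> tasks, double intervalSec) {
        double gigaCycles = 0, gbSec = 0, peakMemoryGb = 0, peakCpuGhz = 0;
        for (List<Sample> task : tasks) {
            for (Sample s : task) {
                gigaCycles += s.cpuGhz * intervalSec;               // sums: counter-friendly
                gbSec += s.memoryGb * intervalSec;
                peakMemoryGb = Math.max(peakMemoryGb, s.memoryGb);  // peaks: max, not +
                peakCpuGhz = Math.max(peakCpuGhz, s.cpuGhz);
            }
        }
        return new double[] {gigaCycles, gbSec, peakMemoryGb, peakCpuGhz};
    }

    public static void main(String[] args) {
        List<List<Sample>> tasks = List.of(
            List.of(new Sample(1.0, 0.5), new Sample(2.0, 1.0)),
            List.of(new Sample(0.5, 1.5)));
        double[] r = aggregate(tasks, 10.0);
        // 35.0 giga-cycles, 30.0 GB-sec, 1.5 GB peak, 2.0 GHz peak
        System.out.printf("%.1f giga-cycles, %.1f GB-sec, %.1f GB peak, %.1f GHz peak%n",
            r[0], r[1], r[2], r[3]);
    }
}
```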

I will read the relevant code and think more about it.
Maybe there's a way to obtain both benefits.

Vinod: I also feel that there is a lot of redundant creation/computation of 
processTree.
Maybe we should refactor the code to use one thread to compute it and expose 
the information to others.



 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: task, tasktracker
Reporter: Hong Tang
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt


 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-05-03 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863498#action_12863498
 ] 

Scott Chen commented on MAPREDUCE-220:
--

Vinod: I think you are very familiar with this part of the code. Could you 
help me review the patch? Thanks.

Arun: Let me know if there is anything that might cause trouble to 
MAPREDUCE-901.

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Hong Tang
Assignee: Scott Chen
 Attachments: MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt


 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862651#action_12862651
 ] 

Hadoop QA commented on MAPREDUCE-220:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12443235/MAPREDUCE-220-v1.txt
  against trunk revision 939505.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/363/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/363/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/363/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/363/console

This message is automatically generated.

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Hong Tang
Assignee: Scott Chen
 Attachments: MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt


 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-04-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862093#action_12862093
 ] 

Hadoop QA commented on MAPREDUCE-220:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12443120/MAPREDUCE-220.txt
  against trunk revision 938805.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/155/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/155/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/155/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/155/console

This message is automatically generated.

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Hong Tang
Assignee: Scott Chen
 Attachments: MAPREDUCE-220.txt


 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-04-21 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12859454#action_12859454
 ] 

dhruba borthakur commented on MAPREDUCE-220:


Hi folks, we would like to start work on this one. In the earlier discussion, 
we said that a prerequisite is to refactor all the metric reporting via 
MAPREDUCE-901. But 901 is not moving forward. Given that, is it OK with 
people if we start working on this JIRA using the existing reporting framework 
even before MAPREDUCE-901 is done?

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Hong Tang
Assignee: Scott Chen

 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-04-21 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12859458#action_12859458
 ] 

Arun C Murthy commented on MAPREDUCE-220:
-

Dhruba, I think that makes sense.

Having said that, if we do manage to get MAPREDUCE-901 committed before this 
gets in, would it be reasonable to ask for a bit of re-work on this one?

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Hong Tang
Assignee: Scott Chen

 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-04-21 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12859464#action_12859464
 ] 

dhruba borthakur commented on MAPREDUCE-220:


Agreed. We can start work on this one and get it reviewed by the community. If 
M-901 is already committed by the time it is ready, then we refactor this patch 
to be compatible with M-901. On the other hand, if M-901 is not committed by 
the time this one is ready, then we do not hold this one up. Sounds fair?

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Hong Tang
Assignee: Scott Chen

 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-04-21 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12859467#action_12859467
 ] 

Arun C Murthy commented on MAPREDUCE-220:
-

Precisely. Thanks!

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Hong Tang
Assignee: Scott Chen

 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-04-21 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12859468#action_12859468
 ] 

Scott Chen commented on MAPREDUCE-220:
--

Thanks, Dhruba and Arun.
I will start working on this one now.

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Hong Tang
Assignee: Scott Chen

 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2009-11-23 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781548#action_12781548
 ] 

Scott Chen commented on MAPREDUCE-220:
--

bq. What can these two stats possibly be used for?

B2 can also allow us to compute the current CPU usage (by taking the difference 
between two samples).
+1 on Hong's idea of collecting A4, the total number of child processes.
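Taking the difference works because B2, as quoted in Vinod's comment, is a cumulative CPU time summed over all cores. Two readings divided by the CPU capacity of the interval (wall time times core count) give a utilization percentage. A minimal Java sketch follows. The method name and the millisecond units are assumptions based on that quoted definition.

```java
// Hypothetical helper: converts two readings of a cumulative CPU-time counter
// (summed over all cores, in ms, like the proposed B2) into a utilization percentage.
final class CpuUsageFromCumulativeSketch {
    static double cpuUsagePercent(long cpuMsBefore, long wallMsBefore,
                                  long cpuMsAfter, long wallMsAfter, int numCores) {
        long wallDeltaMs = wallMsAfter - wallMsBefore;
        if (wallDeltaMs <= 0 || numCores <= 0) {
            throw new IllegalArgumentException("need a positive interval and core count");
        }
        // CPU capacity over the interval is wall time times the number of cores.
        double capacityMs = (double) wallDeltaMs * numCores;
        return 100.0 * (cpuMsAfter - cpuMsBefore) / capacityMs;
    }

    public static void main(String[] args) {
        // 8 cores, readings 3 s apart, 12 s of CPU time used in between: 50% busy.
        System.out.println(cpuUsagePercent(120_000, 0, 132_000, 3_000, 8)); // 50.0
    }
}
```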

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Hong Tang
Assignee: Scott Chen

 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2009-11-21 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780938#action_12780938
 ] 

Hong Tang commented on MAPREDUCE-220:
-

bq. What can these two stats possibly be used for?
This would allow us to tell whether a sudden decrease/increase of cpu or memory 
usage is caused by the spawning of new processes.
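One way to act on Hong's point is to check whether a large change in a task's usage coincides with a change in A4, the process count. The Java sketch below is a hypothetical heuristic that does not come from any patch. The method name, the memory-only view and the fixed jump threshold are all illustrative assumptions.

```java
// Hypothetical heuristic, not from any patch: use A4 (process count) to tell
// whether a jump in a task's memory coincides with processes starting or exiting.
final class UsageJumpSketch {
    static String explain(long prevMemoryMb, int prevProcessCount,
                          long memoryMb, int processCount, long jumpThresholdMb) {
        if (Math.abs(memoryMb - prevMemoryMb) < jumpThresholdMb) {
            return "steady";
        }
        return processCount != prevProcessCount
            ? "jump with process-count change"
            : "jump within the same processes";
    }

    public static void main(String[] args) {
        System.out.println(explain(1000, 1, 2500, 3, 500)); // children were spawned
        System.out.println(explain(1000, 1, 2500, 1, 500)); // an existing process grew
    }
}
```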

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Hong Tang
Assignee: Scott Chen

 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2009-11-18 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779572#action_12779572
 ] 

Olga Natkovich commented on MAPREDUCE-220:
--

It would be very useful for profiling purposes if applications could get 
resource utilization information via counters. Either detailed information for 
each map/reduce or min/max/average would be useful:

- Average CPU utilization
- Max memory usage
- Average inbound and outbound I/O. (Not sure if it is possible to obtain this 
information on a per-process basis.)
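If per-task values were available, the min/max/average summary Olga asks for maps directly onto the JDK's DoubleStream/LongStream summaryStatistics(). The Java sketch below uses made-up per-task numbers. It assumes each task already reports an average CPU percentage and a peak memory figure, and it does not claim any Hadoop counter API.

```java
import java.util.DoubleSummaryStatistics;
import java.util.LongSummaryStatistics;
import java.util.stream.DoubleStream;
import java.util.stream.LongStream;

// Sketch: summarize hypothetical per-task values into min/max/average
// figures using the JDK's summary-statistics classes.
final class TaskUtilizationSummarySketch {
    static DoubleSummaryStatistics summarize(double[] perTaskValues) {
        return DoubleStream.of(perTaskValues).summaryStatistics();
    }

    static LongSummaryStatistics summarize(long[] perTaskValues) {
        return LongStream.of(perTaskValues).summaryStatistics();
    }

    public static void main(String[] args) {
        double[] avgCpuPercentPerTask = {35.0, 80.0, 50.0}; // made-up values
        long[] maxMemoryMbPerTask = {512, 1024, 768};        // made-up values
        DoubleSummaryStatistics cpu = summarize(avgCpuPercentPerTask);
        LongSummaryStatistics mem = summarize(maxMemoryMbPerTask);
        System.out.printf("CPU %% min/avg/max: %.1f/%.1f/%.1f%n",
            cpu.getMin(), cpu.getAverage(), cpu.getMax());
        System.out.printf("Max memory MB min/avg/max: %d/%.1f/%d%n",
            mem.getMin(), mem.getAverage(), mem.getMax());
    }
}
```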

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Hong Tang
Assignee: Scott Chen

 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2009-11-17 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779109#action_12779109
 ] 

dhruba borthakur commented on MAPREDUCE-220:


Are we proposing that we add the following metrics to the heartbeat message?

A1. virtual memory used by each task (in bytes)
A2. physical memory used by each task (in bytes)
A3. cpu used by each task (as a percentage of total CPU on that machine)

B1. available physical memory on this machine (in bytes)
B2. available cpu on this machine (as a percentage of total CPU on that machine)
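The proposed metrics split into per-task fields (A1-A3) and per-machine fields (B1-B2). The Java sketch below shows how they might be carried in a heartbeat. It is modelled on the write(DataOutput)/readFields(DataInput) pattern of Hadoop's Writable but depends only on java.io. The class and field names are hypothetical, and the wire layout is an assumption rather than the patch's format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical heartbeat payload; field names and layout are illustrative only.
final class HeartbeatResourceSketch {
    // A1-A3: one entry per running task.
    static final class TaskUsage {
        long virtualMemoryBytes;  // A1
        long physicalMemoryBytes; // A2
        float cpuPercent;         // A3, percent of the machine's total CPU

        void write(DataOutput out) throws IOException {
            out.writeLong(virtualMemoryBytes);
            out.writeLong(physicalMemoryBytes);
            out.writeFloat(cpuPercent);
        }

        void readFields(DataInput in) throws IOException {
            virtualMemoryBytes = in.readLong();
            physicalMemoryBytes = in.readLong();
            cpuPercent = in.readFloat();
        }
    }

    // B1-B2: one entry per TaskTracker per heartbeat.
    static final class MachineUsage {
        long availablePhysicalMemoryBytes; // B1
        float availableCpuPercent;         // B2

        void write(DataOutput out) throws IOException {
            out.writeLong(availablePhysicalMemoryBytes);
            out.writeFloat(availableCpuPercent);
        }

        void readFields(DataInput in) throws IOException {
            availablePhysicalMemoryBytes = in.readLong();
            availableCpuPercent = in.readFloat();
        }
    }

    // Serializes a TaskUsage and reads it back, as the receiving side would.
    static TaskUsage roundTrip(TaskUsage original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            TaskUsage copy = new TaskUsage();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TaskUsage usage = new TaskUsage();
        usage.virtualMemoryBytes = 2L << 30;  // 2 GiB
        usage.physicalMemoryBytes = 1L << 30; // 1 GiB
        usage.cpuPercent = 12.5f;
        TaskUsage back = roundTrip(usage);
        System.out.println(back.physicalMemoryBytes + " bytes, " + back.cpuPercent + "%");
    }
}
```

Keeping the B fields in their own record reflects the later point in this thread that they are TaskTracker-level rather than task-level metrics, so they need to be sent only once per heartbeat regardless of how many tasks are running.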

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Hong Tang
Assignee: Scott Chen

 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2009-11-17 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779230#action_12779230
 ] 

Scott Chen commented on MAPREDUCE-220:
--

I posted the above comment in the wrong place. It's supposed to go to 
MAPREDUCE-961. Sorry for the confusion and spam.

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Hong Tang
Assignee: Scott Chen

 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2009-11-17 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779232#action_12779232
 ] 

dhruba borthakur commented on MAPREDUCE-220:


I am proposing that we hold off doing anything to this JIRA until MAPREDUCE-901 
is committed.

In the meantime, the items marked B1 - B4 can be done as part of 
MAPREDUCE-1218. B1-B4 are not task-related metrics (rather, they are 
TaskTracker-related) and do not depend on the TaskMetrics changes proposed in 
MAPREDUCE-901.

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Hong Tang
Assignee: Scott Chen

 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.




[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2009-11-17 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779292#action_12779292
 ] 

Vinod K V commented on MAPREDUCE-220:
-

bq. B2. cumulative used cpu time (for all cores) since the machine is up (in 
millisecond)
bq. I'd also propose A4. Total number of child processes in the process tree 
rooted from the main task process.
What can these two stats possibly be used for?

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Hong Tang
Assignee: Scott Chen

 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.
