[jira] [Updated] (TEZ-1698) Use ResourceCalculatorPlugin instead of ResourceCalculatorProcessTree in Tez

2014-10-23 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1698:
-
Attachment: ProcfsBasedProcessTree.png

> Use ResourceCalculatorPlugin instead of ResourceCalculatorProcessTree in Tez
> 
>
> Key: TEZ-1698
> URL: https://issues.apache.org/jira/browse/TEZ-1698
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.2
> Reporter: Gopal V
> Attachments: ProcfsBasedProcessTree.png
>
>
> ResourceCalculatorProcessTree scrapes all of /proc/ for PIDs which are part of 
> the current task's process group.
> This work is mostly wasted in Tez. Unlike Tez, YARN has to do this because it 
> only has the PID of the container-executor process (bash) and has to trace the 
> bash -> java spawn inheritance.
> !ProcfsBasedProcessTree.png!
> The effect is less clearly visible with the profiler turned on, because it 
> is primarily related to syscall overhead in the kernel (via the 
> following code path in YARN).
> {code}
>   private List<String> getProcessList() {
>     String[] processDirs = (new File(procfsDir)).list();
>     ...
>     for (String dir : processDirs) {
>       try {
>         if ((new File(procfsDir, dir)).isDirectory()) {
>           processList.add(dir);
>         }
>     ...
>   public void updateProcessTree() {
>     if (!pid.equals(deadPid)) {
>       // Get the list of processes
>       List<String> processList = getProcessList();
>       ...
>       for (String proc : processList) {
>         // Get information for each process
>         ProcessInfo pInfo = new ProcessInfo(proc);
>         if (constructProcessInfo(pInfo, procfsDir) != null) {
>           allProcessInfo.put(proc, pInfo);
>           if (proc.equals(this.pid)) {
>             me = pInfo; // cache 'me'
>             processTree.put(proc, pInfo);
>           }
>         }
>       }
> {code}
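
To get a rough feel for the cost described in the quoted description, the sketch below counts how many per-process directories a full /proc scan visits in one pass. It is only an illustration and not the YARN code: the class name ProcScanCost, the helper listPidDirs, and the numeric-name filter are assumptions made for this example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ProcScanCost {

  /**
   * Hypothetical helper: returns the names of numeric subdirectories of
   * procfsDir, i.e. the candidate PIDs a full scan would have to inspect.
   * Each candidate costs at least one directory check on top of the listing.
   */
  static List<String> listPidDirs(String procfsDir) {
    List<String> pids = new ArrayList<>();
    String[] entries = new File(procfsDir).list();
    if (entries == null) {
      return pids; // directory missing or unreadable (e.g. not on Linux)
    }
    for (String name : entries) {
      // Assumed filter: PID directories have purely numeric names.
      if (name.matches("[1-9][0-9]*") && new File(procfsDir, name).isDirectory()) {
        pids.add(name);
      }
    }
    return pids;
  }

  public static void main(String[] args) {
    String procfsDir = args.length > 0 ? args[0] : "/proc";
    List<String> pids = listPidDirs(procfsDir);
    System.out.println(pids.size()
        + " process directories would be visited per update under " + procfsDir);
  }
}
```

On a busy host the count grows with every process on the machine, not only with the task's own process tree, which is why the scan is expensive relative to what Tez needs.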



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1698) Use ResourceCalculatorPlugin instead of ResourceCalculatorProcessTree in Tez

2014-10-23 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1698:
-
Description: 
ResourceCalculatorProcessTree scrapes all of /proc/ for PIDs which are part of 
the current task's process group.

This work is mostly wasted in Tez. Unlike Tez, YARN has to do this because it 
only has the PID of the container-executor process (bash) and has to trace the 
bash -> java spawn inheritance.

!ProcfsBasedProcessTree.png!

The latency effect is less clearly visible with the profiler turned on, 
because it is primarily related to the rate of syscalls plus overhead in the 
kernel (via the following code path in YARN).

!ProcfsFiles.png!

{code}
  private List<String> getProcessList() {
    String[] processDirs = (new File(procfsDir)).list();
    ...
    for (String dir : processDirs) {
      try {
        if ((new File(procfsDir, dir)).isDirectory()) {
          processList.add(dir);
        }
    ...

  public void updateProcessTree() {
    if (!pid.equals(deadPid)) {
      // Get the list of processes
      List<String> processList = getProcessList();
      ...
      for (String proc : processList) {
        // Get information for each process
        ProcessInfo pInfo = new ProcessInfo(proc);
        if (constructProcessInfo(pInfo, procfsDir) != null) {
          allProcessInfo.put(proc, pInfo);
          if (proc.equals(this.pid)) {
            me = pInfo; // cache 'me'
            processTree.put(proc, pInfo);
          }
        }
      }
{code}


> Use ResourceCalculatorPlugin instead of ResourceCalculatorProcessTree in Tez
> 
>
> Key: TEZ-1698
> URL: https://issues.apache.org/jira/browse/TEZ-1698
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.2
> Reporter: Gopal V
> Attachments: ProcfsBasedProcessTree.png, ProcfsFiles.png





[jira] [Updated] (TEZ-1698) Use ResourceCalculatorPlugin instead of ResourceCalculatorProcessTree in Tez

2014-10-23 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1698:
-
Attachment: ProcfsFiles.png

> Use ResourceCalculatorPlugin instead of ResourceCalculatorProcessTree in Tez
> 
>
> Key: TEZ-1698
> URL: https://issues.apache.org/jira/browse/TEZ-1698
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.2
> Reporter: Gopal V
> Attachments: ProcfsBasedProcessTree.png, ProcfsFiles.png





[jira] [Updated] (TEZ-1698) Use ResourceCalculatorPlugin instead of ResourceCalculatorProcessTree in Tez

2014-10-28 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1698:
--
Attachment: TEZ-1698.1.patch

Both ResourceCalculatorPlugin and ResourceCalculatorProcessTree end up opening 
lots of file handles, and both of them live in YARN. Attaching a simple patch 
that allows users to disable the resource calculator in TaskCounterUpdater.

> Use ResourceCalculatorPlugin instead of ResourceCalculatorProcessTree in Tez
> 
>
> Key: TEZ-1698
> URL: https://issues.apache.org/jira/browse/TEZ-1698
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.2
> Reporter: Gopal V
> Assignee: Rajesh Balamohan
> Attachments: ProcfsBasedProcessTree.png, ProcfsFiles.png, 
> TEZ-1698.1.patch





[jira] [Updated] (TEZ-1698) Use ResourceCalculatorPlugin instead of ResourceCalculatorProcessTree in Tez

2014-10-30 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1698:
--
Attachment: TEZ-1698.2.patch

Attaching a patch that adds TezMxBeanResourceCalculator, which gets CPU/memory 
info from JVM MBeans and should be less intrusive. It can be enabled with 
"tez.task.resource.calculator.process-tree.class=org.apache.tez.util.TezMxBeanResourceCalculator"

[~gopalv] - Can you please review?
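
For readers unfamiliar with the MBean approach mentioned above, here is a minimal sketch of reading CPU and memory figures from inside the JVM without touching /proc. It is not the code from the attached patch: the class name MxBeanUsageSketch and both method names are made up for illustration, and the sketch assumes a JDK that exposes com.sun.management.OperatingSystemMXBean (it returns -1 for CPU time otherwise).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.OperatingSystemMXBean;

public class MxBeanUsageSketch {

  /** Cumulative CPU time of this JVM in milliseconds, or -1 if unavailable. */
  static long cumulativeCpuTimeMillis() {
    OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    // The process CPU time is only exposed by the com.sun.management extension.
    if (os instanceof com.sun.management.OperatingSystemMXBean) {
      long nanos = ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuTime();
      return nanos < 0 ? -1 : nanos / 1_000_000L;
    }
    return -1;
  }

  /** Heap plus non-heap memory currently used, as reported by the JVM itself. */
  static long usedMemoryBytes() {
    MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
    return memory.getHeapMemoryUsage().getUsed()
        + memory.getNonHeapMemoryUsage().getUsed();
  }

  public static void main(String[] args) {
    System.out.println("CPU time (ms): " + cumulativeCpuTimeMillis());
    System.out.println("Used memory (bytes): " + usedMemoryBytes());
  }
}
```

Note that JVM-reported memory is not the same as the resident set size read from /proc, so counters produced this way are only comparable with each other, not with process-tree based numbers.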

> Use ResourceCalculatorPlugin instead of ResourceCalculatorProcessTree in Tez
> 
>
> Key: TEZ-1698
> URL: https://issues.apache.org/jira/browse/TEZ-1698
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.2
> Reporter: Gopal V
> Assignee: Rajesh Balamohan
> Attachments: ProcfsBasedProcessTree.png, ProcfsFiles.png, 
> TEZ-1698.1.patch, TEZ-1698.2.patch


