Re: CPU/Disk/network performance instrumentation

2014-07-09 Thread Reynold Xin
Maybe it's time to create an advanced mode in the UI.


On Wed, Jul 9, 2014 at 12:23 PM, Kay Ousterhout k...@eecs.berkeley.edu
wrote:

 Hi all,

 I've been doing a bunch of performance measurement of Spark and, as part of
 doing this, added metrics that record the average CPU utilization, disk
 throughput and utilization for each block device, and network throughput
 while each task is running.  These metrics are collected by reading the
 /proc filesystem, so they work only on Linux.  I'm happy to submit a pull
 request with the appropriate changes, but first wanted to see whether enough
 people think this would be useful.  I know the metrics reported by Spark
 (and in the UI) are already overwhelming to some folks, so I don't want to
 add more instrumentation if it's not widely useful.
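
 For the curious, the gist of what gets parsed is something like the following.
 This is a simplified sketch, not the actual code on the branch; it just
 illustrates computing CPU utilization from two snapshots of /proc/stat:

     import scala.io.Source

     // Snapshot of machine-wide CPU counters from the first line of /proc/stat.
     case class CpuSnapshot(totalJiffies: Long, idleJiffies: Long)

     object CpuSnapshot {
       def read(): CpuSnapshot = {
         val src = Source.fromFile("/proc/stat")
         try {
           // First line: "cpu  user nice system idle iowait irq softirq steal ..."
           val fields = src.getLines().next().split("\\s+").tail.map(_.toLong)
           CpuSnapshot(totalJiffies = fields.sum, idleJiffies = fields(3))
         } finally {
           src.close()
         }
       }

       // Fraction of jiffies spent non-idle between two snapshots.
       def utilization(start: CpuSnapshot, end: CpuSnapshot): Double = {
         val total = end.totalJiffies - start.totalJiffies
         val idle = end.idleJiffies - start.idleJiffies
         if (total == 0) 0.0 else (total - idle).toDouble / total
       }
     }

 Taking one snapshot when a task starts and one when it finishes gives the
 average utilization over the task's lifetime.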

 These metrics are slightly more difficult to interpret for Spark than
 similar metrics reported by Hadoop because, with Spark, multiple tasks run
 in the same JVM and therefore as part of the same process.  This means
 that, for example, the CPU utilization metrics reflect the CPU use across
 all tasks in the JVM, rather than only the CPU time used by the particular
 task.  This is a pro and a con -- it makes it harder to determine why
 utilization is high (it may be from a different task) but it also makes the
 metrics useful for diagnosing straggler problems.  Just wanted to clarify
 this before asking folks to weigh in on whether the added metrics would be
 useful.
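
 To make the JVM-wide caveat concrete: the per-process counters come from
 /proc/self/stat, which covers the whole executor JVM.  A simplified sketch
 (again, not the branch code; it also assumes the comm field contains no
 spaces, which holds for a process named "java"):

     import scala.io.Source

     object ProcessCpu {
       // Total user + system jiffies consumed by this JVM process so far.
       // Fields 14 and 15 (1-indexed) of /proc/self/stat are utime and stime.
       def processJiffies(): Long = {
         val src = Source.fromFile("/proc/self/stat")
         try {
           val fields = src.getLines().next().split(" ")
           fields(13).toLong + fields(14).toLong
         } finally {
           src.close()
         }
       }
     }

 A task that records this value at its start and end sees the delta for the
 whole JVM, so CPU burned by other tasks in the same executor during that
 window shows up in its numbers too.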

 -Kay

 (if you're curious, the instrumentation code is on a very messy branch
 here:

 https://github.com/kayousterhout/spark-1/tree/proc_logging_perf_minimal_temp/core/src/main/scala/org/apache/spark/performance_logging
 )



Re: CPU/Disk/network performance instrumentation

2014-07-09 Thread Shivaram Venkataraman
I think it would be very useful to have this. We could put the UI display
either behind a flag or a URL parameter.

Shivaram


On Wed, Jul 9, 2014 at 12:25 PM, Reynold Xin r...@databricks.com wrote:

 Maybe it's time to create an advanced mode in the UI.

