[GitHub] spark pull request #22279: [SPARK-25277][YARN] YARN applicationMaster metric...

2018-10-15 Thread LucaCanali
Github user LucaCanali commented on a diff in the pull request:

https://github.com/apache/spark/pull/22279#discussion_r225280136
  
--- Diff: core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala 
---
@@ -103,6 +103,14 @@ private[spark] class MetricsSystem private (
 sinks.foreach(_.start)
   }
 
+  // Same as start but this method only registers sinks
--- End diff --

I would split this into two topics:
1. I believe that the CodeGenerator and HiveExternalCatalog metrics don't make sense in the context of the AM, so they can be safely removed.
2. The JVM metrics may be relevant, as you mentioned. However, in the current version the JVM metrics for the AM appear without any application id or prefix, so they are difficult to process. This part can be improved if we think JVM metrics for the AM are of interest.


---




[GitHub] spark pull request #22279: [SPARK-25277][YARN] YARN applicationMaster metric...

2018-10-15 Thread LucaCanali
Github user LucaCanali commented on a diff in the pull request:

https://github.com/apache/spark/pull/22279#discussion_r225088430
  
--- Diff: core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala 
---
@@ -103,6 +103,14 @@ private[spark] class MetricsSystem private (
 sinks.foreach(_.start)
   }
 
+  // Same as start but this method only registers sinks
--- End diff --

Thanks @attilapiros for looking at this. I agree with your proposal. I'll 
provide an update to the PR.


---




[GitHub] spark issue #22279: [SPARK-25277][YARN] YARN applicationMaster metrics shoul...

2018-10-15 Thread LucaCanali
Github user LucaCanali commented on the issue:

https://github.com/apache/spark/pull/22279
  
@attilapiros would you be interested to review this as a follow-up of your 
work on [SPARK-24594][YARN] Introducing metrics for YARN ?


---




[GitHub] spark issue #22279: [SPARK-25277][YARN] YARN applicationMaster metrics shoul...

2018-10-05 Thread LucaCanali
Github user LucaCanali commented on the issue:

https://github.com/apache/spark/pull/22279
  
@jerryshao would you have any additional comments on this?


---




[GitHub] spark issue #22397: [SPARK-25170][DOC] Add list and short description of Spa...

2018-09-12 Thread LucaCanali
Github user LucaCanali commented on the issue:

https://github.com/apache/spark/pull/22397
  
Thanks @srowen for reviewing this. The metrics are documented in the source code of the TaskMetrics class; I took most of the descriptions from there and added some extra explanations where needed. I agree that the non-uniform time units are a bit inconvenient.


---




[GitHub] spark pull request #22167: [SPARK-25170][DOC] Add list and short description...

2018-09-11 Thread LucaCanali
Github user LucaCanali closed the pull request at:

https://github.com/apache/spark/pull/22167


---




[GitHub] spark issue #22167: [SPARK-25170][DOC] Add list and short description of Spa...

2018-09-11 Thread LucaCanali
Github user LucaCanali commented on the issue:

https://github.com/apache/spark/pull/22167
  
Thanks @kiszk for reviewing this. I have addressed your comments in a new commit. Apologies, I have now moved the change to a new PR, https://github.com/apache/spark/pull/22397, and I am closing this one to avoid confusion.


---




[GitHub] spark pull request #22397: [SPARK-25170][DOC] Add list and short description...

2018-09-11 Thread LucaCanali
GitHub user LucaCanali opened a pull request:

https://github.com/apache/spark/pull/22397

[SPARK-25170][DOC] Add list and short description of Spark Executor Task 
Metrics to the documentation.

## What changes were proposed in this pull request?

Add description of Executor Task Metrics to the documentation.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LucaCanali/spark docMonitoringTaskMetrics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22397.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22397


commit a8db1605adbc271c785fda24b4945bf87149a4cd
Author: LucaCanali 
Date:   2018-08-20T14:12:52Z

Document Spark Executor Task Metrics

commit 27130bac74fda2a21dc1443b613c0aee4df1e17a
Author: LucaCanali 
Date:   2018-09-10T19:30:43Z

Addressing review comments.




---




[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.

2018-09-03 Thread LucaCanali
Github user LucaCanali commented on a diff in the pull request:

https://github.com/apache/spark/pull/22218#discussion_r214605458
  
--- Diff: 
core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala ---
@@ -73,6 +76,28 @@ class ExecutorSource(threadPool: ThreadPoolExecutor, 
executorId: String) extends
 registerFileSystemStat(scheme, "write_ops", _.getWriteOps(), 0)
   }
 
+  // Dropwizard metrics gauge measuring the executor's process CPU time.
+  // This Gauge will try to get and return the JVM Process CPU time or 
return -1 otherwise.
+  // The CPU time value is returned in nanoseconds.
+  // It will use proprietary extensions such as 
com.sun.management.OperatingSystemMXBean or
+  // com.ibm.lang.management.OperatingSystemMXBean, if available.
+  metricRegistry.register(MetricRegistry.name("jvmCpuTime"), new 
Gauge[Long] {
+val mBean: MBeanServer = ManagementFactory.getPlatformMBeanServer
+val name = new ObjectName("java.lang", "type", "OperatingSystem")
+override def getValue: Long = {
+  try {
+val attribute = mBean.getAttribute(name, "ProcessCpuTime")
+if (attribute != null) {
--- End diff --

Indeed good point. I'll remove this additional check for null value.
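
For reference, here is a minimal self-contained sketch of the gauge without that null check. It is illustrative only: the standalone object and the registry creation are just for the example (in Spark this code lives in ExecutorSource).

```scala
import java.lang.management.ManagementFactory
import javax.management.{MBeanServer, ObjectName}

import com.codahale.metrics.{Gauge, MetricRegistry}

object JvmCpuTimeGaugeSketch {
  // In ExecutorSource this would be the source's existing metricRegistry.
  val metricRegistry = new MetricRegistry()

  val mBean: MBeanServer = ManagementFactory.getPlatformMBeanServer
  val osName = new ObjectName("java.lang", "type", "OperatingSystem")

  // JVM process CPU time in nanoseconds, or -1L if the attribute cannot be read
  // on this JVM; a failed read throws, so no separate null check is needed.
  metricRegistry.register(MetricRegistry.name("jvmCpuTime"), new Gauge[Long] {
    override def getValue: Long = {
      try {
        mBean.getAttribute(osName, "ProcessCpuTime").asInstanceOf[Long]
      } catch {
        case _: Exception => -1L
      }
    }
  })
}
```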


---




[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.

2018-09-02 Thread LucaCanali
Github user LucaCanali commented on a diff in the pull request:

https://github.com/apache/spark/pull/22218#discussion_r214549896
  
--- Diff: 
core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala ---
@@ -73,6 +76,28 @@ class ExecutorSource(threadPool: ThreadPoolExecutor, 
executorId: String) extends
 registerFileSystemStat(scheme, "write_ops", _.getWriteOps(), 0)
   }
 
+  // Dropwizard metrics gauge measuring the executor's process CPU time.
+  // This Gauge will try to get and return the JVM Process CPU time or 
return -1 otherwise.
+  // The CPU time value is returned in nanoseconds.
+  // It will use proprietary extensions such as 
com.sun.management.OperatingSystemMXBean or
+  // com.ibm.lang.management.OperatingSystemMXBean, if available.
+  metricRegistry.register(MetricRegistry.name("jvmCpuTime"), new 
Gauge[Long] {
--- End diff --

Indeed, this is exposed only through the Dropwizard metrics system and not used otherwise in the Spark code. Another point worth mentioning is that currently the executorSource is not registered when running in local mode.
On a related topic (although maybe for a broader discussion than the scope of this PR), I was wondering whether it would make sense to introduce a few SparkConf properties to switch certain families of (Dropwizard) metrics on/off, as the list of available metrics is becoming long in recent versions.


---




[GitHub] spark issue #22218: [SPARK-25228][CORE]Add executor CPU time metric.

2018-09-01 Thread LucaCanali
Github user LucaCanali commented on the issue:

https://github.com/apache/spark/pull/22218
  
Thanks @srowen 


---




[GitHub] spark issue #22218: [SPARK-25228][CORE]Add executor CPU time metric.

2018-09-01 Thread LucaCanali
Github user LucaCanali commented on the issue:

https://github.com/apache/spark/pull/22218
  
I have implemented the changes following the latest comments by @maropu and @srowen.


---




[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.

2018-09-01 Thread LucaCanali
Github user LucaCanali commented on a diff in the pull request:

https://github.com/apache/spark/pull/22218#discussion_r214521537
  
--- Diff: 
core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala ---
@@ -73,6 +75,29 @@ class ExecutorSource(threadPool: ThreadPoolExecutor, 
executorId: String) extends
 registerFileSystemStat(scheme, "write_ops", _.getWriteOps(), 0)
   }
 
+  /** Dropwizard metrics gauge measuring the executor's process CPU time.
+   *  This code will try to get JVM Process CPU time or return -1 
otherwise.
+   *  The CPU time value is returned in nanoseconds.
+   *  It will use proprietary extensions as 
com.sun.management.OperatingSystemMXBean or
+   *  com.ibm.lang.management.OperatingSystemMXBean if available
+   */
+  val mBean: MBeanServer = ManagementFactory.getPlatformMBeanServer
+  val name = new ObjectName("java.lang", "type", "OperatingSystem")
+  metricRegistry.register(MetricRegistry.name("executorCPUTime" ), new 
Gauge[Long] {
+override def getValue: Long = {
+  try {
+val attribute = mBean.getAttribute(name, "ProcessCpuTime")
+if (attribute != null) {
+  attribute.asInstanceOf[Long]
+} else {
+  -1L
--- End diff --

I took the idea from com.sun.management.OperatingSystemMXBean.getProcessCpuTime, whose documentation states: "Returns: the CPU time used by the process in nanoseconds, or -1 if this operation is not supported."
I think it makes sense to return a sentinel value of -1L if something goes wrong while gathering the CPU time, so the error condition is evident to the end user of the metric. Returning 0 is also possible, of course.


---




[GitHub] spark issue #22218: [SPARK-25228][CORE]Add executor CPU time metric.

2018-08-31 Thread LucaCanali
Github user LucaCanali commented on the issue:

https://github.com/apache/spark/pull/22218
  
I have implemented the changes from the latest comments, namely inlining the method.


---




[GitHub] spark issue #22218: [SPARK-25228][CORE]Add executor CPU time metric.

2018-08-31 Thread LucaCanali
Github user LucaCanali commented on the issue:

https://github.com/apache/spark/pull/22218
  
I have refactored the code to use the platform MBeanServer, which should address the comments about the availability of com.sun.management.OperatingSystemMXBean across different JDKs.


---




[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.

2018-08-31 Thread LucaCanali
Github user LucaCanali commented on a diff in the pull request:

https://github.com/apache/spark/pull/22218#discussion_r214393554
  
--- Diff: 
core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala ---
@@ -17,11 +17,13 @@
 
 package org.apache.spark.executor
 
+import java.lang.management.ManagementFactory
 import java.util.concurrent.ThreadPoolExecutor
 
 import scala.collection.JavaConverters._
 
 import com.codahale.metrics.{Gauge, MetricRegistry}
+import com.sun.management.OperatingSystemMXBean
--- End diff --

I have refactored the code with a different approach using the platform MBeanServer, which should address the comments about the availability of com.sun.management.OperatingSystemMXBean across different JDKs.


---




[GitHub] spark pull request #22279: [SPARK-25277][YARN] YARN applicationMaster metric...

2018-08-31 Thread LucaCanali
Github user LucaCanali commented on a diff in the pull request:

https://github.com/apache/spark/pull/22279#discussion_r214289112
  
--- Diff: core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala 
---
@@ -103,6 +103,14 @@ private[spark] class MetricsSystem private (
 sinks.foreach(_.start)
   }
 
+  // Same as start but this method only registers sinks
--- End diff --

What I am trying to do is avoid the registration of the "static" metrics sources (CodeGenerator and HiveExternalCatalog) and of the JVM source.


---




[GitHub] spark issue #22279: [SPARK-25277][YARN] YARN applicationMaster metrics shoul...

2018-08-31 Thread LucaCanali
Github user LucaCanali commented on the issue:

https://github.com/apache/spark/pull/22279
  
Hi @jerryshao, below is an example of the metrics currently reported by the applicationMaster, illustrating the issue described here. It shows the list of AM metrics (with the application ID as a prefix by default). In addition, metrics for CodeGenerator and HiveExternalCatalog are reported even though they do not make sense in this context, and they have no prefix. JVM metrics are reported too (also without the application_id prefix), which I am not sure is wanted either.

```
bin/spark-shell --master yarn \
--conf 
"spark.metrics.conf.applicationMaster.sink.graphite.class"="org.apache.spark.metrics.sink.GraphiteSink"
 \
--conf "spark.metrics.conf.*.sink.graphite.host"=lc-mytest5 \
--conf "spark.metrics.conf.*.sink.graphite.port"=2003 \
--conf "spark.metrics.conf.*.sink.graphite.period"=10 \
--conf "spark.metrics.conf.*.sink.graphite.unit"=seconds \
--conf "spark.metrics.conf.*.sink.graphite.prefix"="luca" \
--conf 
"spark.metrics.conf.*.source.jvm.class"="org.apache.spark.metrics.source.JvmSource"
```

I have used InfluxDB to collect the metrics. This is the output of "show 
measurements" in InfluxDB:

```
name: measurements
name

CodeGenerator.compilationTime.count
CodeGenerator.compilationTime.max
CodeGenerator.compilationTime.mean
CodeGenerator.compilationTime.min
CodeGenerator.compilationTime.p50
CodeGenerator.compilationTime.p75
CodeGenerator.compilationTime.p95
CodeGenerator.compilationTime.p98
CodeGenerator.compilationTime.p99
CodeGenerator.compilationTime.p999
CodeGenerator.compilationTime.stddev
CodeGenerator.generatedClassSize.count
CodeGenerator.generatedClassSize.max
CodeGenerator.generatedClassSize.mean
CodeGenerator.generatedClassSize.min
CodeGenerator.generatedClassSize.p50
CodeGenerator.generatedClassSize.p75
CodeGenerator.generatedClassSize.p95
CodeGenerator.generatedClassSize.p98
CodeGenerator.generatedClassSize.p99
CodeGenerator.generatedClassSize.p999
CodeGenerator.generatedClassSize.stddev
CodeGenerator.generatedMethodSize.count
CodeGenerator.generatedMethodSize.max
CodeGenerator.generatedMethodSize.mean
CodeGenerator.generatedMethodSize.min
CodeGenerator.generatedMethodSize.p50
CodeGenerator.generatedMethodSize.p75
CodeGenerator.generatedMethodSize.p95
CodeGenerator.generatedMethodSize.p98
CodeGenerator.generatedMethodSize.p99
CodeGenerator.generatedMethodSize.p999
CodeGenerator.generatedMethodSize.stddev
CodeGenerator.sourceCodeSize.count
CodeGenerator.sourceCodeSize.max
CodeGenerator.sourceCodeSize.mean
CodeGenerator.sourceCodeSize.min
CodeGenerator.sourceCodeSize.p50
CodeGenerator.sourceCodeSize.p75
CodeGenerator.sourceCodeSize.p95
CodeGenerator.sourceCodeSize.p98
CodeGenerator.sourceCodeSize.p99
CodeGenerator.sourceCodeSize.p999
CodeGenerator.sourceCodeSize.stddev
HiveExternalCatalog.fileCacheHits.count
HiveExternalCatalog.filesDiscovered.count
HiveExternalCatalog.hiveClientCalls.count
HiveExternalCatalog.parallelListingJobCount.count
HiveExternalCatalog.partitionsFetched.count

application_1516620698330_110908.applicationMaster.numContainersPendingAllocate
application_1516620698330_110908.applicationMaster.numExecutorsFailed
application_1516620698330_110908.applicationMaster.numExecutorsRunning
application_1516620698330_110908.applicationMaster.numLocalityAwareTasks
application_1516620698330_110908.applicationMaster.numReleasedContainers
jvm.PS-MarkSweep.count
jvm.PS-MarkSweep.time
jvm.PS-Scavenge.count
jvm.PS-Scavenge.time
jvm.direct.capacity
jvm.direct.count
jvm.direct.used
jvm.heap.committed
jvm.heap.init
jvm.heap.max
jvm.heap.usage
jvm.heap.used
jvm.mapped.capacity
jvm.mapped.count
jvm.mapped.used
jvm.non-heap.committed
jvm.non-heap.init
jvm.non-heap.max
jvm.non-heap.usage
jvm.non-heap.used
jvm.pools.Code-Cache.committed
jvm.pools.Code-Cache.init
jvm.pools.Code-Cache.max
jvm.pools.Code-Cache.usage
jvm.pools.Code-Cache.used
jvm.pools.Compressed-Class-Space.committed
jvm.pools.Compressed-Class-Space.init
jvm.pools.Compressed-Class-Space.max
jvm.pools.Compressed-Class-Space.usage
jvm.pools.Compressed-Class-Space.used
jvm.pools.Metaspace.committed
jvm.pools.Metaspace.init
jvm.pools.Metaspace.max
jvm.pools.Metaspace.usage
jvm.pools.Metaspace.used
jvm.pools.PS-Eden-Space.committed
jvm.pools.PS-Eden-Space.init
jvm.pools.PS-Eden-Space.max
jvm.pools.PS-Eden-Space.usage
jvm.pools.PS
```

[GitHub] spark pull request #22290: [SPARK-25285][CORE] Add executor task metrics, su...

2018-08-30 Thread LucaCanali
GitHub user LucaCanali opened a pull request:

https://github.com/apache/spark/pull/22290

[SPARK-25285][CORE] Add executor task metrics, successfulTasks and 
threadpool.startedTasks

## What changes were proposed in this pull request?

The motivation for these additional metrics is to help in troubleshooting situations where tasks fail, are killed, and/or restarted. The currently available metrics include executor threadpool metrics for completed tasks and for active tasks. Adding the threadpool startedTasks metric allows, for example, collecting the (approximate) number of failed tasks by computing startedTasks - (activeTasks + completedTasks and/or successfulTasks).
The proposed metric successfulTasks is also intended for this type of troubleshooting. The difference between successfulTasks and threadpool.completeTasks is that the latter is a (Dropwizard) gauge read from the threadpool, while the former is a (Dropwizard) counter incremented in the [[Executor]] class when a task completes successfully, together with several other task metrics counters.
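
As a rough sketch of how the two metrics could be wired up (illustrative only, not the exact patch; the class name and metric names below are assumptions):

```scala
import java.util.concurrent.ThreadPoolExecutor

import com.codahale.metrics.{Counter, Gauge, MetricRegistry}

// Hypothetical source class, mirroring how ExecutorSource receives the executor's thread pool.
class ExecutorTaskCountSourceSketch(threadPool: ThreadPoolExecutor) {
  val metricRegistry = new MetricRegistry()

  // Approximate total number of tasks ever scheduled on the executor's thread pool.
  metricRegistry.register(MetricRegistry.name("threadpool", "startedTasks"),
    new Gauge[Long] { override def getValue: Long = threadPool.getTaskCount })

  // Counter incremented by the executor when a task completes successfully,
  // alongside the other per-task counters.
  val successfulTasks: Counter = metricRegistry.counter(MetricRegistry.name("successfulTasks"))

  // Failed/killed tasks can then be approximated as:
  //   startedTasks - (activeTasks + completedTasks)
}
```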

## How was this patch tested?

Manually tested on a YARN cluster

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LucaCanali/spark AddMetricExecutorStartedTasks

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22290.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22290


commit 72bacd30eea3abf49c00c88ffcbfdefea7e758ff
Author: LucaCanali 
Date:   2018-08-30T15:16:24Z

Add executor metrics, successfulTasks and threadpool.startedTasks




---




[GitHub] spark pull request #22279: [SPARK-25277][YARN] YARN applicationMaster metric...

2018-08-30 Thread LucaCanali
GitHub user LucaCanali opened a pull request:

https://github.com/apache/spark/pull/22279

[SPARK-25277][YARN] YARN applicationMaster metrics should not register 
static metrics

## What changes were proposed in this pull request?

The YARN applicationMaster metrics registration introduced in SPARK-24594 causes the additional registration of static metrics (CodeGenerator and HiveExternalCatalog) and of JVM metrics, which I believe do not belong in this context.
This looks like an unintended side effect of using the start method of [[MetricsSystem]].
A possible solution, proposed here, is to introduce a startNoRegisterSources method to avoid these additional registrations of static sources and of JVM sources in the case of the YARN applicationMaster metrics (this could also be useful for other metrics that may be added in the future).
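
A minimal sketch of the proposed method, to be added inside MetricsSystem (illustrative only, not the final patch; it assumes the `running` flag, the `registerSinks()` helper, and the `sinks` list that the existing start() method uses):

```scala
// Same as start(), but skips the registration of static sources (CodeGenerator,
// HiveExternalCatalog) and of the JVM source: it only registers and starts sinks.
def startNoRegisterSources(): Unit = {
  require(!running, "Attempting to start a MetricsSystem that is already running")
  running = true
  registerSinks()
  sinks.foreach(_.start)
}
```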

## How was this patch tested?

Manually tested on a YARN cluster.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LucaCanali/spark 
YarnMetricsRemoveExtraSourceRegistration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22279.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22279


commit a99075873fa1519fe07344eb90d5c8af02b9d7e5
Author: LucaCanali 
Date:   2018-08-30T07:07:47Z

YARN applicationMaster metrics should not register static metrics




---




[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.

2018-08-27 Thread LucaCanali
Github user LucaCanali commented on a diff in the pull request:

https://github.com/apache/spark/pull/22218#discussion_r212939411
  
--- Diff: 
core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala ---
@@ -17,11 +17,13 @@
 
 package org.apache.spark.executor
 
+import java.lang.management.ManagementFactory
 import java.util.concurrent.ThreadPoolExecutor
 
 import scala.collection.JavaConverters._
 
 import com.codahale.metrics.{Gauge, MetricRegistry}
+import com.sun.management.OperatingSystemMXBean
--- End diff --

Indeed this is a very good point that I had overlooked. I have now checked directly, and this appears to work fine on OpenJDK (and on the Oracle JVM, of course). In addition, I tested manually with the IBM JDK (IBM J9 VM, Java 1.8.0_181), where one would indeed suspect incompatibilities, and surprisingly it appears to work in that case too. I believe this may come from recent work by IBM to make `com.ibm.lang.management.OperatingSystemMXBean.getProcessCpuTime` compatible with `com.sun.management.OperatingSystemMXBean.getProcessCpuTime`; see also [this link](https://www.ibm.com/support/knowledgecenter/en/SSYKE2_8.0.0/com.ibm.java.vm.80.doc/docs/dcomibmlangmanagementosmxbeaniscputime100ns.html).

If this is confirmed, we should be fine with a large fraction of the commonly used JDKs. In addition, we could handle the exception in case getProcessCpuTime is not available on a particular platform where the executor is running, for example by returning -1 for this gauge in that case. Any thoughts/suggestions on this proposal?



---




[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.

2018-08-27 Thread LucaCanali
Github user LucaCanali commented on a diff in the pull request:

https://github.com/apache/spark/pull/22218#discussion_r212933292
  
--- Diff: 
core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala ---
@@ -73,6 +75,13 @@ class ExecutorSource(threadPool: ThreadPoolExecutor, 
executorId: String) extends
 registerFileSystemStat(scheme, "write_ops", _.getWriteOps(), 0)
   }
 
+  // Dropwizard metrics gauge measuring the executor's process (JVM) CPU 
time.
+  // The value is returned in nanoseconds, the method return -1 if this 
operation is not supported.
+  val osMXBean = 
ManagementFactory.getOperatingSystemMXBean.asInstanceOf[OperatingSystemMXBean]
+  metricRegistry.register(MetricRegistry.name("executorCPUTime" ), new 
Gauge[Long] {
+override def getValue: Long = osMXBean.getProcessCpuTime()
--- End diff --

I believe the proposed metric tracking the executor CPU time is useful and adds information and convenience on top of the task CPU metric implemented in SPARK-22190. A couple of considerations from recent findings and experimentation support this:
- The process CPU time includes all the CPU consumed by the JVM, notably the CPU consumed by garbage collection, which can be significant in some cases and is definitely something we want to measure and analyze.
- The CPU time collected from the tasks is harder to consume in a dashboard, as the value is only updated at the end of a task's successful execution, which is awkward for long-running tasks. In contrast, the executor process CPU time gauge gives an up-to-date value of the CPU consumed by the executor at any time, as it reads it directly from the OS.



---




[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.

2018-08-24 Thread LucaCanali
GitHub user LucaCanali opened a pull request:

https://github.com/apache/spark/pull/22218

[SPARK-25228][CORE]Add executor CPU time metric.

## What changes were proposed in this pull request?

Add a new metric to measure the executor's process (JVM) CPU time.

## How was this patch tested?

Manually tested on a Spark cluster (see SPARK-25228 for an example 
screenshot).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LucaCanali/spark AddExecutrCPUTimeMetric

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22218.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22218


commit c715096657c36beedd43c64104f56a78b2eb268d
Author: LucaCanali 
Date:   2018-08-24T12:05:59Z

Add Executor CPU Time metric




---




[GitHub] spark pull request #22217: [SPARK-25228][CORE]Add executor CPU time metric

2018-08-24 Thread LucaCanali
Github user LucaCanali closed the pull request at:

https://github.com/apache/spark/pull/22217


---




[GitHub] spark pull request #22217: [SPARK-25228][CORE]Add executor CPU time metric

2018-08-24 Thread LucaCanali
GitHub user LucaCanali opened a pull request:

https://github.com/apache/spark/pull/22217

[SPARK-25228][CORE]Add executor CPU time metric

## What changes were proposed in this pull request?

Add a new metric to measure the executor's process (JVM) CPU time

## How was this patch tested?

Manually tested on a Spark cluster (see SPARK-25228 for an example 
screenshot).

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LucaCanali/spark executorCPUTImeNewMetric

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22217.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22217


commit a8db1605adbc271c785fda24b4945bf87149a4cd
Author: LucaCanali 
Date:   2018-08-20T14:12:52Z

Document Spark Executor Task Metrics

commit 51454e7629e38f92165fc48676fc8a1af91edf26
Author: LucaCanali 
Date:   2018-08-24T11:34:41Z

add executorCPUTime metric




---




[GitHub] spark pull request #22167: [SPARK-25170][DOC] Add list and short description...

2018-08-21 Thread LucaCanali
GitHub user LucaCanali opened a pull request:

https://github.com/apache/spark/pull/22167

[SPARK-25170][DOC] Add list and short description of Spark Executor Task 
Metrics to the documentation

## What changes were proposed in this pull request?

Add description of Task Metrics to the documentation.

## How was this patch tested?

None.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LucaCanali/spark docMonitoringTaskMetrics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22167.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22167


commit a8db1605adbc271c785fda24b4945bf87149a4cd
Author: LucaCanali 
Date:   2018-08-20T14:12:52Z

Document Spark Executor Task Metrics




---




[GitHub] spark issue #19426: [SPARK-22190][CORE] Add Spark executor task metrics to D...

2017-10-31 Thread LucaCanali
Github user LucaCanali commented on the issue:

https://github.com/apache/spark/pull/19426
  
Thanks @umehrot2 for the review and comments. Well spotted that I had forgotten a couple of metrics and added one of them twice; this is hopefully fixed with the latest commit. As for your comment on the Spark executor task metric JVMGCTime, my first reaction is that (1) I think it is good to have this type of information in the context of the executor metrics, and (2) it does not appear to me that JVMGCTime measured at the executor task level duplicates information already available in the JVM source (I am happy to investigate this in more detail in follow-up comments if you wish).


---




[GitHub] spark issue #19426: [SPARK-22190][CORE] Add Spark executor task metrics to D...

2017-10-27 Thread LucaCanali
Github user LucaCanali commented on the issue:

https://github.com/apache/spark/pull/19426
  
Thanks @hvanhovell for pushing the test. I see that the test build threw an error in the Scala style checks: the logs report "/Executor.scala:443:0: Whitespace at end of line". However, I cannot reproduce the error when I run scalastyle locally; moreover, line Executor.scala:443 appears to be empty (so no trailing space there).



---




[GitHub] spark pull request #19426: [SPARK-22190][CORE] Add Spark executor task metri...

2017-10-04 Thread LucaCanali
GitHub user LucaCanali opened a pull request:

https://github.com/apache/spark/pull/19426

[SPARK-22190][CORE] Add Spark executor task metrics to Dropwizard metrics

## What changes were proposed in this pull request?

This proposed patch is about making Spark executor task metrics available 
as Dropwizard metrics. This is intended to be of aid in monitoring Spark jobs 
and when drilling down on performance troubleshooting issues.
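
As a hedged illustration of the general pattern (not the actual patch; the class and metric names below are made up for the example), a task metric can be surfaced as a Dropwizard counter that the executor increments when a task finishes:

```scala
import com.codahale.metrics.{Counter, MetricRegistry}

// Hypothetical executor-side source holding one Dropwizard counter per task metric.
class TaskMetricCountersSketch {
  val metricRegistry = new MetricRegistry()

  // Example counter; in practice there would be one per task metric of interest.
  val cpuTime: Counter = metricRegistry.counter(MetricRegistry.name("executor", "cpuTime"))
}

object TaskMetricCountersSketchUsage {
  val source = new TaskMetricCountersSketch
  // When a task completes, the value accumulated in the task's TaskMetrics
  // (here a placeholder argument) is added to the corresponding counter.
  def onTaskCompleted(taskCpuTime: Long): Unit = source.cpuTime.inc(taskCpuTime)
}
```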

## How was this patch tested?

Manually tested on a Spark cluster (see JIRA for an example screenshot).


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LucaCanali/spark SparkTaskMetricsDropWizard

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19426.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19426


commit ca43764327e8796ea01d5053fc73ca7136b57835
Author: LucaCanali <luca.can...@cern.ch>
Date:   2017-10-03T13:48:44Z

Add Spark executor task metrics to Dropwizard metrics




---




[GitHub] spark issue #19039: [SPARK-21829][CORE] Enable config to permanently blackli...

2017-08-29 Thread LucaCanali
Github user LucaCanali commented on the issue:

https://github.com/apache/spark/pull/19039
  
@jiangxb1987 Indeed good suggestion by @jerryshao - I have replied on 
SPARK-21829 


---



[GitHub] spark issue #19039: [SPARK-21829][CORE] Enable config to permanently blackli...

2017-08-24 Thread LucaCanali
Github user LucaCanali commented on the issue:

https://github.com/apache/spark/pull/19039
  
Thanks @jiangxb1987 for the review. I have tried to address the comments in 
a new commit, in particular adding the configuration to internal/config and 
building a private function to handle processing of the node list in 
`spark.blacklist.alwaysBlacklistedNodes`. As for setting `_nodeBlacklist` I 
think it makes sense to use 
`_nodeBlacklist.set(nodeIdToBlacklistExpiryTime.keySet.toSet)` to keep it 
consistent with the rest of the code in BlacklistTracker. Also 
`nodeIdToBlacklistExpiryTime` needs to be initialized with the blacklisted 
nodes.

As for the usefulness of the feature, I understand your comment and I have added some comments in SPARK-21829. My need for this feature comes from a production issue which, I realize, is not very common, but which I expect can happen again in my environment and maybe in others'.
We have a shared YARN cluster with a workload that runs slowly on a couple of nodes; however, those nodes run other types of jobs fine, so we want to keep them in the cluster. The actual problem comes from reading from an external file system, and apparently only for this specific workload (which is only one of many workloads that run on that cluster). The workaround I have used so far is simply killing the executors on the two "slow nodes", which lets the job finish faster by avoiding the painfully slow long tail of execution on the affected nodes. The proposed patch/feature is an attempt to address this case in a more structured way than logging on to the nodes and killing executors.


---



[GitHub] spark pull request #19039: Add feature to permanently blacklist a user-speci...

2017-08-24 Thread LucaCanali
GitHub user LucaCanali opened a pull request:

https://github.com/apache/spark/pull/19039

Add feature to permanently blacklist a user-specified list of nodes, SPARK-21829

## What changes were proposed in this pull request?

This feature introduces a mechanism that allows users to specify a list of nodes in the cluster where executors/tasks should not run for a specific job.
The proposed implementation that I tested (see PR) uses the Spark blacklist mechanism. With the parameter spark.blacklist.alwaysBlacklistedNodes, a user-specified list of nodes is added to the blacklist at the start of the Spark Context and never expires.
I have tested this on a YARN cluster on a case taken from the original production problem, and I measured a performance improvement of about 5x for my specific test case. I imagine there are other cases where Spark users may want to blacklist a set of nodes. This can be used for troubleshooting, including cases where certain nodes/executors are slow for a given workload because of external agents, so the anomaly is not picked up by the cluster manager.
See also SPARK-21829.
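
For illustration, a hedged usage sketch of the proposed parameter (the configuration key comes from this proposal and is not part of stock Spark; the node names are placeholders, and whether spark.blacklist.enabled also needs to be set is an assumption):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BlacklistedNodesExample {
  def main(args: Array[String]): Unit = {
    // The listed nodes are added to the blacklist at SparkContext startup and never expire.
    val conf = new SparkConf()
      .setAppName("always-blacklisted-nodes-example")
      .set("spark.blacklist.enabled", "true")
      .set("spark.blacklist.alwaysBlacklistedNodes", "node1.example.com,node2.example.com")
    val sc = new SparkContext(conf)
    try {
      println(sc.parallelize(1 to 10).sum())  // any job; tasks avoid the blacklisted nodes
    } finally {
      sc.stop()
    }
  }
}
```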

## How was this patch tested?

A test has been added to the BlackListTrackerSuite.
The patch has also been successfully tested manually on a YARN cluster.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LucaCanali/spark 
SchedulerAlwaysBlacklistedNodes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19039.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19039


commit 4932d3767e3c3bb24a94a7422b7b8e6f7a5bf08b
Author: LucaCanali <luca.can...@cern.ch>
Date:   2017-08-24T13:27:54Z

Add feature to permanently blacklist a user-specified list of nodes, 
SPARK-21829




---



[GitHub] spark pull request #18724: [SPARK-21519][SQL] Add an option to the JDBC data...

2017-08-11 Thread LucaCanali
Github user LucaCanali commented on a diff in the pull request:

https://github.com/apache/spark/pull/18724#discussion_r132679699
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala 
---
@@ -1007,4 +1007,23 @@ class JDBCSuite extends SparkFunSuite
   assert(sql("select * from people_view").count() == 3)
 }
   }
+
+  test("SPARK-21519: option sessionInitStatement, run SQL to initialize 
the database session.") {
+val initSQL1 = "SET @MYTESTVAR 21519"
+val df1 = spark.read.format("jdbc")
+  .option("url", urlWithUserAndPass)
+  .option("dbtable", "(SELECT NVL(@MYTESTVAR, -1))")
+  .option("sessionInitStatement", initSQL1)
+  .load()
+assert(df1.collect() === Array(Row(21519)))
+
+val initSQL2 = "SET SCHEMA DUMMY"
--- End diff --

Thanks for the clarification. I have now added a test that runs two SQL statements.
For future reference, I'd like to stress that the code executed by the "sessionInitStatement" option is just the user-provided string fed through the execute method of the JDBC connection, so it can use the features of the target database language/syntax. In the test I wrote for the H2 database, I simply put together two commands separated by ";". When using sessionInitStatement to query Oracle, for example, the user-provided command can be a SQL statement or a PL/SQL block grouping multiple commands and logic.
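
For example, a hedged usage sketch with an Oracle source (assuming a `spark` session, e.g. in spark-shell; the URL, credentials, table name, and PL/SQL block are placeholders):

```scala
// The sessionInitStatement string is executed once per JDBC session,
// after the connection is opened and before any data is read.
val initSql =
  """BEGIN EXECUTE IMMEDIATE 'ALTER SESSION SET TIME_ZONE = ''UTC'''; END;"""

val df = spark.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost.example.com:1521/service")
  .option("dbtable", "myschema.mytable")
  .option("user", "scott")
  .option("password", "secret")
  .option("sessionInitStatement", initSql)
  .load()
```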



---



[GitHub] spark issue #18724: [SPARK-21519][SQL] Add an option to the JDBC data source...

2017-08-06 Thread LucaCanali
Github user LucaCanali commented on the issue:

https://github.com/apache/spark/pull/18724
  
Thank you very much @gatorsmile for the review. I plan to provide the required changes and add a test case; however, it will probably take one more week before I can do that.



---



[GitHub] spark pull request #18724: [SPARK-21519][SQL] Add an option to the JDBC data...

2017-07-24 Thread LucaCanali
GitHub user LucaCanali opened a pull request:

https://github.com/apache/spark/pull/18724

[SPARK-21519][SQL] Add an option to the JDBC data source to initialize the 
target DB environment

Add an option to the JDBC data source to initialize the environment of the 
remote database session

## What changes were proposed in this pull request?

This proposes an option for the JDBC data source, tentatively called "sessionInitStatement", to implement the session initialization functionality present, for example, in the Sqoop connector for Oracle (see https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_oraoop_oracle_session_initialization_statements). After each database session to the remote DB is opened, and before starting to read data, this option executes a custom SQL statement (or a PL/SQL block in the case of Oracle).

See also https://issues.apache.org/jira/browse/SPARK-21519 

## How was this patch tested?

Manually tested using Spark SQL data source and Oracle JDBC

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LucaCanali/spark 
JDBC_datasource_sessionInitStatement

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18724.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18724


commit 92d082a67910bd7e5f3aeefc204f97a60f144e08
Author: LucaCanali <luca.can...@cern.ch>
Date:   2017-07-24T08:46:17Z

Add an option to the JDBC data source to initialize the environment of the 
remote database session




---