[GitHub] spark pull request #22279: [SPARK-25277][YARN] YARN applicationMaster metric...
Github user LucaCanali commented on a diff in the pull request: https://github.com/apache/spark/pull/22279#discussion_r225280136
--- Diff: core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala ---
@@ -103,6 +103,14 @@ private[spark] class MetricsSystem private (
     sinks.foreach(_.start)
   }
+  // Same as start but this method only registers sinks
--- End diff --
I would split this into two topics:
1. I believe that the CodeGenerator and HiveExternalCatalog metrics don't make sense in the context of the AM, so they can safely be removed.
2. The JVM metrics may be relevant, as you mentioned. However, in the current version the JVM metrics for the AM appear without any application id or prefix, so they are difficult to process. This part can be improved if we think JVM metrics for the AM are of interest.
[GitHub] spark pull request #22279: [SPARK-25277][YARN] YARN applicationMaster metric...
Github user LucaCanali commented on a diff in the pull request: https://github.com/apache/spark/pull/22279#discussion_r225088430
--- Diff: core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala ---
@@ -103,6 +103,14 @@ private[spark] class MetricsSystem private (
     sinks.foreach(_.start)
   }
+  // Same as start but this method only registers sinks
--- End diff --
Thanks @attilapiros for looking at this. I agree with your proposal. I'll provide an update to the PR.
[GitHub] spark issue #22279: [SPARK-25277][YARN] YARN applicationMaster metrics shoul...
Github user LucaCanali commented on the issue: https://github.com/apache/spark/pull/22279 @attilapiros would you be interested in reviewing this as a follow-up of your work on [SPARK-24594][YARN] Introducing metrics for YARN?
[GitHub] spark issue #22279: [SPARK-25277][YARN] YARN applicationMaster metrics shoul...
Github user LucaCanali commented on the issue: https://github.com/apache/spark/pull/22279 @jerryshao would you have any additional comments on this?
[GitHub] spark issue #22397: [SPARK-25170][DOC] Add list and short description of Spa...
Github user LucaCanali commented on the issue: https://github.com/apache/spark/pull/22397 Thanks @srowen for reviewing this. The metrics are documented in comments in the source code of the TaskMetrics class; I took most of the descriptions from there, adding further explanations where needed. I agree that the non-uniform time units are a bit inconvenient.
[GitHub] spark pull request #22167: [SPARK-25170][DOC] Add list and short description...
Github user LucaCanali closed the pull request at: https://github.com/apache/spark/pull/22167
[GitHub] spark issue #22167: [SPARK-25170][DOC] Add list and short description of Spa...
Github user LucaCanali commented on the issue: https://github.com/apache/spark/pull/22167 Thanks @kiszk for reviewing this. I have addressed your comments in a new commit. Apologies: I have now moved this work to a new PR, https://github.com/apache/spark/pull/22397, and am closing this one to avoid confusion.
[GitHub] spark pull request #22397: [SPARK-25170][DOC] Add list and short description...
GitHub user LucaCanali opened a pull request: https://github.com/apache/spark/pull/22397
[SPARK-25170][DOC] Add list and short description of Spark Executor Task Metrics to the documentation.
## What changes were proposed in this pull request?
Add description of Executor Task Metrics to the documentation.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/LucaCanali/spark docMonitoringTaskMetrics
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/22397.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #22397
commit a8db1605adbc271c785fda24b4945bf87149a4cd
Author: LucaCanali
Date: 2018-08-20T14:12:52Z
    Document Spark Executor Task Metrics
commit 27130bac74fda2a21dc1443b613c0aee4df1e17a
Author: LucaCanali
Date: 2018-09-10T19:30:43Z
    Addressing review comments.
[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.
Github user LucaCanali commented on a diff in the pull request: https://github.com/apache/spark/pull/22218#discussion_r214605458
--- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala ---
@@ -73,6 +76,28 @@ class ExecutorSource(threadPool: ThreadPoolExecutor, executorId: String) extends
     registerFileSystemStat(scheme, "write_ops", _.getWriteOps(), 0)
   }
+  // Dropwizard metrics gauge measuring the executor's process CPU time.
+  // This Gauge will try to get and return the JVM Process CPU time or return -1 otherwise.
+  // The CPU time value is returned in nanoseconds.
+  // It will use proprietary extensions such as com.sun.management.OperatingSystemMXBean or
+  // com.ibm.lang.management.OperatingSystemMXBean, if available.
+  metricRegistry.register(MetricRegistry.name("jvmCpuTime"), new Gauge[Long] {
+    val mBean: MBeanServer = ManagementFactory.getPlatformMBeanServer
+    val name = new ObjectName("java.lang", "type", "OperatingSystem")
+    override def getValue: Long = {
+      try {
+        val attribute = mBean.getAttribute(name, "ProcessCpuTime")
+        if (attribute != null) {
--- End diff --
Indeed, good point. I'll remove this additional check for a null value.
[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.
Github user LucaCanali commented on a diff in the pull request: https://github.com/apache/spark/pull/22218#discussion_r214549896
--- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala ---
@@ -73,6 +76,28 @@ class ExecutorSource(threadPool: ThreadPoolExecutor, executorId: String) extends
     registerFileSystemStat(scheme, "write_ops", _.getWriteOps(), 0)
   }
+  // Dropwizard metrics gauge measuring the executor's process CPU time.
+  // This Gauge will try to get and return the JVM Process CPU time or return -1 otherwise.
+  // The CPU time value is returned in nanoseconds.
+  // It will use proprietary extensions such as com.sun.management.OperatingSystemMXBean or
+  // com.ibm.lang.management.OperatingSystemMXBean, if available.
+  metricRegistry.register(MetricRegistry.name("jvmCpuTime"), new Gauge[Long] {
--- End diff --
Indeed, this is exposed only through the Dropwizard metrics system and not used otherwise in the Spark code. Another point worth mentioning is that currently the ExecutorSource is not registered when running in local mode. On a related topic (although maybe for a more general discussion than the scope of this PR), I was wondering if it would make sense to introduce a few SparkConf properties to switch certain families of (Dropwizard) metrics on and off, as the list of available metrics is becoming long in recent versions.
[GitHub] spark issue #22218: [SPARK-25228][CORE]Add executor CPU time metric.
Github user LucaCanali commented on the issue: https://github.com/apache/spark/pull/22218 Thanks @srowen
[GitHub] spark issue #22218: [SPARK-25228][CORE]Add executor CPU time metric.
Github user LucaCanali commented on the issue: https://github.com/apache/spark/pull/22218 I have implemented the changes from the latest comments by @maropu and @srowen.
[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.
Github user LucaCanali commented on a diff in the pull request: https://github.com/apache/spark/pull/22218#discussion_r214521537
--- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala ---
@@ -73,6 +75,29 @@ class ExecutorSource(threadPool: ThreadPoolExecutor, executorId: String) extends
     registerFileSystemStat(scheme, "write_ops", _.getWriteOps(), 0)
   }
+  /** Dropwizard metrics gauge measuring the executor's process CPU time.
+   * This code will try to get JVM Process CPU time or return -1 otherwise.
+   * The CPU time value is returned in nanoseconds.
+   * It will use proprietary extensions as com.sun.management.OperatingSystemMXBean or
+   * com.ibm.lang.management.OperatingSystemMXBean if available
+   */
+  val mBean: MBeanServer = ManagementFactory.getPlatformMBeanServer
+  val name = new ObjectName("java.lang", "type", "OperatingSystem")
+  metricRegistry.register(MetricRegistry.name("executorCPUTime" ), new Gauge[Long] {
+    override def getValue: Long = {
+      try {
+        val attribute = mBean.getAttribute(name, "ProcessCpuTime")
+        if (attribute != null) {
+          attribute.asInstanceOf[Long]
+        } else {
+          -1L
--- End diff --
I took the idea from com.sun.management.OperatingSystemMXBean.getProcessCpuTime; according to the documentation: "Returns: the CPU time used by the process in nanoseconds, or -1 if this operation is not supported." I think it makes sense to return the sentinel value -1L for the CPU time if something goes wrong while gathering CPU time values, so the error condition is evident to the end user of the metric. Returning 0 is also possible, of course.
[GitHub] spark issue #22218: [SPARK-25228][CORE]Add executor CPU time metric.
Github user LucaCanali commented on the issue: https://github.com/apache/spark/pull/22218 I have implemented the changes from the latest comments, namely inlining the method.
[GitHub] spark issue #22218: [SPARK-25228][CORE]Add executor CPU time metric.
Github user LucaCanali commented on the issue: https://github.com/apache/spark/pull/22218 I have refactored the code to use the platform MBeanServer, which should address the comments about the availability of com.sun.management.OperatingSystemMXBean across different JDKs.
[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.
Github user LucaCanali commented on a diff in the pull request: https://github.com/apache/spark/pull/22218#discussion_r214393554
--- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala ---
@@ -17,11 +17,13 @@ package org.apache.spark.executor
+import java.lang.management.ManagementFactory
 import java.util.concurrent.ThreadPoolExecutor
 import scala.collection.JavaConverters._
 import com.codahale.metrics.{Gauge, MetricRegistry}
+import com.sun.management.OperatingSystemMXBean
--- End diff --
I have refactored the code with a different approach using the platform MBeanServer, which should address the comments about the availability of com.sun.management.OperatingSystemMXBean across different JDKs.
[GitHub] spark pull request #22279: [SPARK-25277][YARN] YARN applicationMaster metric...
Github user LucaCanali commented on a diff in the pull request: https://github.com/apache/spark/pull/22279#discussion_r214289112
--- Diff: core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala ---
@@ -103,6 +103,14 @@ private[spark] class MetricsSystem private (
     sinks.foreach(_.start)
   }
+  // Same as start but this method only registers sinks
--- End diff --
What I am trying to do is to avoid the registration of the "static" sources, i.e. CodeGenerator and HiveExternalCatalog, and also of the JVM source.
[GitHub] spark issue #22279: [SPARK-25277][YARN] YARN applicationMaster metrics shoul...
Github user LucaCanali commented on the issue: https://github.com/apache/spark/pull/22279
Hi @jerryshao, below is an example of the metrics currently reported by the applicationMaster, illustrating the issue reported here. You can see the list of AM metrics reported (with the application ID as a prefix, by default). In addition, metrics for CodeGenerator and HiveExternalCatalog are also reported; these metrics do not make sense in this context and, moreover, they have no prefix. Metrics for the JVM are reported too (without the application_id prefix), which I am not sure is wanted either.
```
bin/spark-shell --master yarn \
  --conf "spark.metrics.conf.applicationMaster.sink.graphite.class"="org.apache.spark.metrics.sink.GraphiteSink" \
  --conf "spark.metrics.conf.*.sink.graphite.host"=lc-mytest5 \
  --conf "spark.metrics.conf.*.sink.graphite.port"=2003 \
  --conf "spark.metrics.conf.*.sink.graphite.period"=10 \
  --conf "spark.metrics.conf.*.sink.graphite.unit"=seconds \
  --conf "spark.metrics.conf.*.sink.graphite.prefix"="luca" \
  --conf "spark.metrics.conf.*.source.jvm.class"="org.apache.spark.metrics.source.JvmSource"
```
I have used InfluxDB to collect the metrics. This is the output of "show measurements" in InfluxDB:
```
name: measurements
name
CodeGenerator.compilationTime.count
CodeGenerator.compilationTime.max
CodeGenerator.compilationTime.mean
CodeGenerator.compilationTime.min
CodeGenerator.compilationTime.p50
CodeGenerator.compilationTime.p75
CodeGenerator.compilationTime.p95
CodeGenerator.compilationTime.p98
CodeGenerator.compilationTime.p99
CodeGenerator.compilationTime.p999
CodeGenerator.compilationTime.stddev
CodeGenerator.generatedClassSize.count
CodeGenerator.generatedClassSize.max
CodeGenerator.generatedClassSize.mean
CodeGenerator.generatedClassSize.min
CodeGenerator.generatedClassSize.p50
CodeGenerator.generatedClassSize.p75
CodeGenerator.generatedClassSize.p95
CodeGenerator.generatedClassSize.p98
CodeGenerator.generatedClassSize.p99
CodeGenerator.generatedClassSize.p999
CodeGenerator.generatedClassSize.stddev
CodeGenerator.generatedMethodSize.count
CodeGenerator.generatedMethodSize.max
CodeGenerator.generatedMethodSize.mean
CodeGenerator.generatedMethodSize.min
CodeGenerator.generatedMethodSize.p50
CodeGenerator.generatedMethodSize.p75
CodeGenerator.generatedMethodSize.p95
CodeGenerator.generatedMethodSize.p98
CodeGenerator.generatedMethodSize.p99
CodeGenerator.generatedMethodSize.p999
CodeGenerator.generatedMethodSize.stddev
CodeGenerator.sourceCodeSize.count
CodeGenerator.sourceCodeSize.max
CodeGenerator.sourceCodeSize.mean
CodeGenerator.sourceCodeSize.min
CodeGenerator.sourceCodeSize.p50
CodeGenerator.sourceCodeSize.p75
CodeGenerator.sourceCodeSize.p95
CodeGenerator.sourceCodeSize.p98
CodeGenerator.sourceCodeSize.p99
CodeGenerator.sourceCodeSize.p999
CodeGenerator.sourceCodeSize.stddev
HiveExternalCatalog.fileCacheHits.count
HiveExternalCatalog.filesDiscovered.count
HiveExternalCatalog.hiveClientCalls.count
HiveExternalCatalog.parallelListingJobCount.count
HiveExternalCatalog.partitionsFetched.count
application_1516620698330_110908.applicationMaster.numContainersPendingAllocate
application_1516620698330_110908.applicationMaster.numExecutorsFailed
application_1516620698330_110908.applicationMaster.numExecutorsRunning
application_1516620698330_110908.applicationMaster.numLocalityAwareTasks
application_1516620698330_110908.applicationMaster.numReleasedContainers
jvm.PS-MarkSweep.count
jvm.PS-MarkSweep.time
jvm.PS-Scavenge.count
jvm.PS-Scavenge.time
jvm.direct.capacity
jvm.direct.count
jvm.direct.used
jvm.heap.committed
jvm.heap.init
jvm.heap.max
jvm.heap.usage
jvm.heap.used
jvm.mapped.capacity
jvm.mapped.count
jvm.mapped.used
jvm.non-heap.committed
jvm.non-heap.init
jvm.non-heap.max
jvm.non-heap.usage
jvm.non-heap.used
jvm.pools.Code-Cache.committed
jvm.pools.Code-Cache.init
jvm.pools.Code-Cache.max
jvm.pools.Code-Cache.usage
jvm.pools.Code-Cache.used
jvm.pools.Compressed-Class-Space.committed
jvm.pools.Compressed-Class-Space.init
jvm.pools.Compressed-Class-Space.max
jvm.pools.Compressed-Class-Space.usage
jvm.pools.Compressed-Class-Space.used
jvm.pools.Metaspace.committed
jvm.pools.Metaspace.init
jvm.pools.Metaspace.max
jvm.pools.Metaspace.usage
jvm.pools.Metaspace.used
jvm.pools.PS-Eden-Space.committed
jvm.pools.PS-Eden-Space.init
jvm.pools.PS-Eden-Space.max
jvm.pools.PS-Eden-Space.usage
jvm.pools.PS
```
[GitHub] spark pull request #22290: [SPARK-25285][CORE] Add executor task metrics, su...
GitHub user LucaCanali opened a pull request: https://github.com/apache/spark/pull/22290
[SPARK-25285][CORE] Add executor task metrics, successfulTasks and threadpool.startedTasks
## What changes were proposed in this pull request?
The motivation for these additional metrics is to help in troubleshooting situations when tasks fail, are killed and/or restarted. Currently available metrics include executor threadpool metrics for completed tasks and for active tasks. The addition of the threadpool startedTasks metric will allow, for example, collecting information on the (approximate) number of failed tasks by computing the difference: threads started - (active threads + completed tasks and/or successfully completed tasks). The proposed metric successfulTasks is also intended for this type of troubleshooting. The difference between successfulTasks and threadpool.completeTasks is that the latter is a (Dropwizard library) gauge taken from the threadpool, while the former is a (Dropwizard) counter computed in the [[Executor]] class, when a task successfully completes, together with several other task metric counters.
## How was this patch tested?
Manually tested on a YARN cluster.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/LucaCanali/spark AddMetricExecutorStartedTasks
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/22290.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #22290
commit 72bacd30eea3abf49c00c88ffcbfdefea7e758ff
Author: LucaCanali
Date: 2018-08-30T15:16:24Z
    Add executor metrics, successfulTasks and threadpool.startedTasks
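To make the difference computation above concrete, here is a minimal sketch (not the PR's code) of how such gauges can be derived from the executor's ThreadPoolExecutor with the Dropwizard API; the registry wiring and the "approxFailedOrKilledTasks" name are illustrative, only threadpool.startedTasks comes from the PR.
```scala
import java.util.concurrent.ThreadPoolExecutor

import com.codahale.metrics.{Gauge, MetricRegistry}

// Sketch: registry and threadPool stand in for the executor's MetricRegistry
// and its task-launch thread pool.
def registerTaskGauges(registry: MetricRegistry, threadPool: ThreadPoolExecutor): Unit = {
  // Approximate number of tasks ever handed to the thread pool.
  registry.register(MetricRegistry.name("threadpool", "startedTasks"), new Gauge[Long] {
    override def getValue: Long = threadPool.getTaskCount
  })
  // startedTasks - (activeTasks + completedTasks) approximates the number of tasks
  // that failed or were killed, the troubleshooting signal described in the PR text.
  registry.register(MetricRegistry.name("threadpool", "approxFailedOrKilledTasks"), new Gauge[Long] {
    override def getValue: Long =
      threadPool.getTaskCount - (threadPool.getActiveCount + threadPool.getCompletedTaskCount)
  })
}
```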
[GitHub] spark pull request #22279: [SPARK-25277][YARN] YARN applicationMaster metric...
GitHub user LucaCanali opened a pull request: https://github.com/apache/spark/pull/22279
[SPARK-25277][YARN] YARN applicationMaster metrics should not register static metrics
## What changes were proposed in this pull request?
The YARN applicationMaster metrics registration introduced in SPARK-24594 causes further registration of static metrics (CodeGenerator and HiveExternalCatalog) and of JVM metrics, which I believe do not belong in this context. This looks like an unintended side effect of using the start method of [[MetricsSystem]]. A possible solution proposed here is to introduce startNoRegisterSources to avoid these additional registrations of static sources and of JVM sources in the case of the YARN applicationMaster metrics (this could be useful for other metrics that may be added in the future).
## How was this patch tested?
Manually tested on a YARN cluster.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/LucaCanali/spark YarnMetricsRemoveExtraSourceRegistration
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/22279.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #22279
commit a99075873fa1519fe07344eb90d5c8af02b9d7e5
Author: LucaCanali
Date: 2018-08-30T07:07:47Z
    YARN applicationMaster metrics should not register static metrics
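A minimal sketch of the proposed startNoRegisterSources method, assuming the general structure of MetricsSystem.start() (register and start the configured sinks, while skipping the registration of static and JVM sources); this illustrates the idea, not the exact patch.
```scala
// Sketch only: mirrors start() but omits the source registration step, so the
// static sources (CodeGenerator, HiveExternalCatalog) and the JVM source are
// not registered when the applicationMaster metrics system is started.
def startNoRegisterSources(): Unit = {
  require(!running, "Attempting to start a MetricsSystem that is already running")
  running = true
  registerSinks()
  sinks.foreach(_.start)
}
```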
[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.
Github user LucaCanali commented on a diff in the pull request: https://github.com/apache/spark/pull/22218#discussion_r212939411
--- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala ---
@@ -17,11 +17,13 @@ package org.apache.spark.executor
+import java.lang.management.ManagementFactory
 import java.util.concurrent.ThreadPoolExecutor
 import scala.collection.JavaConverters._
 import com.codahale.metrics.{Gauge, MetricRegistry}
+import com.sun.management.OperatingSystemMXBean
--- End diff --
Indeed this is a very good point that I had overlooked. I have now checked directly, and this appears to work OK on OpenJDK (and on the Oracle JVM, of course). In addition, I tested manually with the IBM JDK (IBM J9 VM, Java 1.8.0_181), where one would indeed suspect incompatibilities, and surprisingly this appears to work in that case too. I believe this may come from recent work by IBM to make `com.ibm.lang.management.OperatingSystemMXBean.getProcessCpuTime` compatible with `com.sun.management.OperatingSystemMXBean.getProcessCpuTime`; see also [this link](https://www.ibm.com/support/knowledgecenter/en/SSYKE2_8.0.0/com.ibm.java.vm.80.doc/docs/dcomibmlangmanagementosmxbeaniscputime100ns.html). If this is confirmed, we should be fine with a large fraction of the commonly used JDKs. In addition, we could handle the exception in case getProcessCpuTime is not available on a particular platform where the executor is running, for example returning the value -1 for this gauge in that case. Any thoughts/suggestions on this proposal?
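As a concrete illustration of the MBeanServer-based approach with the -1 fallback discussed above, a self-contained sketch (the helper name is made up; the attribute lookup follows the approach described in this thread):
```scala
import java.lang.management.ManagementFactory

import javax.management.{MBeanServer, ObjectName}

// Reads the JVM process CPU time (in nanoseconds) via the platform MBeanServer,
// avoiding a compile-time dependency on com.sun.management.OperatingSystemMXBean.
// Returns -1 if the attribute cannot be read on the running JVM.
def processCpuTimeNanos(): Long = {
  val mBean: MBeanServer = ManagementFactory.getPlatformMBeanServer
  val osName = new ObjectName("java.lang", "type", "OperatingSystem")
  try {
    mBean.getAttribute(osName, "ProcessCpuTime").asInstanceOf[Long]
  } catch {
    case _: Exception => -1L
  }
}
```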
[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.
Github user LucaCanali commented on a diff in the pull request: https://github.com/apache/spark/pull/22218#discussion_r212933292
--- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala ---
@@ -73,6 +75,13 @@ class ExecutorSource(threadPool: ThreadPoolExecutor, executorId: String) extends
     registerFileSystemStat(scheme, "write_ops", _.getWriteOps(), 0)
   }
+  // Dropwizard metrics gauge measuring the executor's process (JVM) CPU time.
+  // The value is returned in nanoseconds, the method return -1 if this operation is not supported.
+  val osMXBean = ManagementFactory.getOperatingSystemMXBean.asInstanceOf[OperatingSystemMXBean]
+  metricRegistry.register(MetricRegistry.name("executorCPUTime" ), new Gauge[Long] {
+    override def getValue: Long = osMXBean.getProcessCpuTime()
--- End diff --
I believe the proposed metric tracking the executor CPU time is useful and adds information and convenience on top of the task CPU metric implemented in SPARK-22190. A couple of considerations to support this argument, from recent findings and experimentation:
- the process CPU time contains all the CPU consumed by the JVM, notably including the CPU consumed by garbage collection, which can be important in some cases and is definitely something we want to measure and analyze;
- the CPU time collected from the tasks is "harder to consume" in a dashboard, as the CPU value is only updated at the end of the successful execution of a task, which makes it harder to handle in the case of long-running tasks. In contrast, the executor process CPU time "Dropwizard gauge" gives an up-to-date value of the CPU consumed by the executor at any time, as it takes it directly from the OS.
[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.
GitHub user LucaCanali opened a pull request: https://github.com/apache/spark/pull/22218
[SPARK-25228][CORE]Add executor CPU time metric.
## What changes were proposed in this pull request?
Add a new metric to measure the executor's process (JVM) CPU time.
## How was this patch tested?
Manually tested on a Spark cluster (see SPARK-25228 for an example screenshot).
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/LucaCanali/spark AddExecutrCPUTimeMetric
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/22218.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #22218
commit c715096657c36beedd43c64104f56a78b2eb268d
Author: LucaCanali
Date: 2018-08-24T12:05:59Z
    Add Executor CPU Time metric
[GitHub] spark pull request #22217: [SPARK-25228][CORE]Add executor CPU time metric
Github user LucaCanali closed the pull request at: https://github.com/apache/spark/pull/22217
[GitHub] spark pull request #22217: [SPARK-25228][CORE]Add executor CPU time metric
GitHub user LucaCanali opened a pull request: https://github.com/apache/spark/pull/22217
[SPARK-25228][CORE]Add executor CPU time metric
## What changes were proposed in this pull request?
Add a new metric to measure the executor's process (JVM) CPU time.
## How was this patch tested?
Manually tested on a Spark cluster (see SPARK-25228 for an example screenshot).
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/LucaCanali/spark executorCPUTImeNewMetric
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/22217.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #22217
commit a8db1605adbc271c785fda24b4945bf87149a4cd
Author: LucaCanali
Date: 2018-08-20T14:12:52Z
    Document Spark Executor Task Metrics
commit 51454e7629e38f92165fc48676fc8a1af91edf26
Author: LucaCanali
Date: 2018-08-24T11:34:41Z
    add executorCPUTime metric
[GitHub] spark pull request #22167: [SPARK-25170][DOC] Add list and short description...
GitHub user LucaCanali opened a pull request: https://github.com/apache/spark/pull/22167
[SPARK-25170][DOC] Add list and short description of Spark Executor Task Metrics to the documentation
## What changes were proposed in this pull request?
Add description of Task Metrics to the documentation.
## How was this patch tested?
None.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/LucaCanali/spark docMonitoringTaskMetrics
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/22167.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #22167
commit a8db1605adbc271c785fda24b4945bf87149a4cd
Author: LucaCanali
Date: 2018-08-20T14:12:52Z
    Document Spark Executor Task Metrics
[GitHub] spark issue #19426: [SPARK-22190][CORE] Add Spark executor task metrics to D...
Github user LucaCanali commented on the issue: https://github.com/apache/spark/pull/19426 Thanks @umehrot2 for the review and comments. Indeed, well spotted that I had forgotten a couple of metrics and added one of them twice; this is hopefully fixed with the latest commit. As for your comment on the Spark executor task metric JVMGCTime, my first reaction is that (1) I think it is good to have this type of information in the context of the executor metrics, and (2) it does not appear to me that JVMGCTime measured at the executor task level is a direct duplication of the information available in the JVM source (I am happy to investigate this in more detail in follow-up comments if you wish).
[GitHub] spark issue #19426: [SPARK-22190][CORE] Add Spark executor task metrics to D...
Github user LucaCanali commented on the issue: https://github.com/apache/spark/pull/19426 Thanks @hvanhovell for pushing the test. I see that the test build threw an error on the Scala style tests: the logs report "/Executor.scala:443:0: Whitespace at end of line". However, I cannot reproduce the error when I run scalastyle locally; moreover, it appears to me that line Executor.scala:443 is empty (so there is no extra space there).
[GitHub] spark pull request #19426: [SPARK-22190][CORE] Add Spark executor task metri...
GitHub user LucaCanali opened a pull request: https://github.com/apache/spark/pull/19426
[SPARK-22190][CORE] Add Spark executor task metrics to Dropwizard metrics
## What changes were proposed in this pull request?
This proposed patch is about making Spark executor task metrics available as Dropwizard metrics. This is intended to be of aid in monitoring Spark jobs and when drilling down on performance troubleshooting issues.
## How was this patch tested?
Manually tested on a Spark cluster (see JIRA for an example screenshot).
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/LucaCanali/spark SparkTaskMetricsDropWizard
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/19426.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #19426
commit ca43764327e8796ea01d5053fc73ca7136b57835
Author: LucaCanali <luca.can...@cern.ch>
Date: 2017-10-03T13:48:44Z
    Add Spark executor task metrics to Dropwizard metrics
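As background for the pattern described in this PR, a heavily simplified sketch of exposing per-task values as Dropwizard counters updated at task completion; the class and metric names here are examples, not the exact set added by the patch.
```scala
import com.codahale.metrics.MetricRegistry

// Sketch: counters that mirror selected task metrics, incremented from the
// executor's task-completion path with the task's accumulated values.
class TaskMetricsCountersSketch(registry: MetricRegistry) {
  private val bytesRead = registry.counter(MetricRegistry.name("executor", "bytesRead"))
  private val recordsRead = registry.counter(MetricRegistry.name("executor", "recordsRead"))
  private val cpuTime = registry.counter(MetricRegistry.name("executor", "cpuTime"))

  def onTaskEnd(taskBytesRead: Long, taskRecordsRead: Long, taskCpuTime: Long): Unit = {
    bytesRead.inc(taskBytesRead)
    recordsRead.inc(taskRecordsRead)
    cpuTime.inc(taskCpuTime)
  }
}
```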
[GitHub] spark issue #19039: [SPARK-21829][CORE] Enable config to permanently blackli...
Github user LucaCanali commented on the issue: https://github.com/apache/spark/pull/19039 @jiangxb1987 Indeed, a good suggestion by @jerryshao - I have replied on SPARK-21829.
[GitHub] spark issue #19039: [SPARK-21829][CORE] Enable config to permanently blackli...
Github user LucaCanali commented on the issue: https://github.com/apache/spark/pull/19039
Thanks @jiangxb1987 for the review. I have tried to address the comments in a new commit, in particular adding the configuration to internal/config and building a private function to handle processing of the node list in `spark.blacklist.alwaysBlacklistedNodes`. As for setting `_nodeBlacklist`, I think it makes sense to use `_nodeBlacklist.set(nodeIdToBlacklistExpiryTime.keySet.toSet)` to keep it consistent with the rest of the code in BlacklistTracker. Also, `nodeIdToBlacklistExpiryTime` needs to be initialized with the blacklisted nodes (see the sketch below).
As for the usefulness of the feature, I understand your comment and I have added some comments in SPARK-21829. The need for this feature comes from a production issue, which I realize is not very common, but I guess it can happen again in my environment and maybe in others'. What we have is a shared YARN cluster, and we have a workload that runs slowly on a couple of nodes; however, the nodes are fine for running other types of jobs, so we want to keep them in the cluster. The actual problem comes from reading from an external file system, and apparently only for this specific workload (which is only one of many workloads that run on that cluster). The workaround I have used so far to make the job run faster is simply killing the executors on the two "slow" nodes; the job could then finish faster, as it avoided the painfully slow long tail of execution on the affected nodes. The proposed patch/feature is an attempt to address this case in a more structured way than just going onto the nodes and killing executors.
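A rough sketch of the seeding described above; the configuration key comes from the PR, while the helper name and the stand-in fields are hypothetical and only illustrate the intended wiring inside BlacklistTracker.
```scala
import java.util.concurrent.atomic.AtomicReference

import scala.collection.mutable

import org.apache.spark.SparkConf

object AlwaysBlacklistedNodesSketch {
  // Parse the comma-separated node list from the proposed configuration key.
  def alwaysBlacklistedNodes(conf: SparkConf): Set[String] =
    conf.get("spark.blacklist.alwaysBlacklistedNodes", "")
      .split(",").map(_.trim).filter(_.nonEmpty).toSet

  // Stand-ins for the BlacklistTracker state referenced in the comment above.
  val nodeIdToBlacklistExpiryTime = new mutable.HashMap[String, Long]()
  val _nodeBlacklist = new AtomicReference[Set[String]](Set.empty)

  def seed(conf: SparkConf): Unit = {
    // Never-expiring entries: use Long.MaxValue as the expiry time, then keep
    // _nodeBlacklist consistent with the map, as in the rest of BlacklistTracker.
    alwaysBlacklistedNodes(conf).foreach { node =>
      nodeIdToBlacklistExpiryTime.put(node, Long.MaxValue)
    }
    _nodeBlacklist.set(nodeIdToBlacklistExpiryTime.keySet.toSet)
  }
}
```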
[GitHub] spark pull request #19039: Add feature to permanently blacklist a user-speci...
GitHub user LucaCanali opened a pull request: https://github.com/apache/spark/pull/19039
Add feature to permanently blacklist a user-specified list of nodes, SPARK-21829
## What changes were proposed in this pull request?
With this new feature I propose to introduce a mechanism to allow users to specify a list of nodes in the cluster where executors/tasks should not run for a specific job. The proposed implementation that I tested (see PR) uses the Spark blacklist mechanism. With the parameter spark.blacklist.alwaysBlacklistedNodes, a list of user-specified nodes is added to the blacklist at the start of the Spark Context and it never expires. I have tested this on a YARN cluster on a case taken from the original production problem, and I confirm a performance improvement of about 5x for the specific test case I have. I imagine that there can be other cases where Spark users may want to blacklist a set of nodes. This can be used for troubleshooting, including cases where certain nodes/executors are slow for a given workload and this is caused by external agents, so the anomaly is not picked up by the cluster manager. See also SPARK-21829.
## How was this patch tested?
A test has been added to the BlackListTrackerSuite. The patch has also been successfully tested manually on a YARN cluster.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/LucaCanali/spark SchedulerAlwaysBlacklistedNodes
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/19039.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #19039
commit 4932d3767e3c3bb24a94a7422b7b8e6f7a5bf08b
Author: LucaCanali <luca.can...@cern.ch>
Date: 2017-08-24T13:27:54Z
    Add feature to permanently blacklist a user-specified list of nodes, SPARK-21829
[GitHub] spark pull request #18724: [SPARK-21519][SQL] Add an option to the JDBC data...
Github user LucaCanali commented on a diff in the pull request: https://github.com/apache/spark/pull/18724#discussion_r132679699
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala ---
@@ -1007,4 +1007,23 @@ class JDBCSuite extends SparkFunSuite
     assert(sql("select * from people_view").count() == 3)
   }
 }
+
+  test("SPARK-21519: option sessionInitStatement, run SQL to initialize the database session.") {
+    val initSQL1 = "SET @MYTESTVAR 21519"
+    val df1 = spark.read.format("jdbc")
+      .option("url", urlWithUserAndPass)
+      .option("dbtable", "(SELECT NVL(@MYTESTVAR, -1))")
+      .option("sessionInitStatement", initSQL1)
+      .load()
+    assert(df1.collect() === Array(Row(21519)))
+
+    val initSQL2 = "SET SCHEMA DUMMY"
--- End diff --
Thanks for the clarification. I have now added a test that runs two SQL statements. For future reference, I'd like to stress the fact that the code executed by the option "sessionInitStatement" is just the user-provided string fed through the execute method of the JDBC connection, so it can use the features of the target database language/syntax. In the test I wrote for the H2 database, I simply put together two commands separated by ";". When using sessionInitStatement to query Oracle, for example, the user-provided command can be a SQL statement or a PL/SQL block grouping multiple commands and logic.
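For reference, an illustrative read using the sessionInitStatement option discussed above; the connection details and the Oracle PL/SQL block are placeholders, shown only to demonstrate how multiple statements can be grouped in a single init block.
```scala
// Assumes an existing SparkSession named `spark`; URL, credentials, and table are placeholders.
val df = spark.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/myservice")
  .option("dbtable", "MYSCHEMA.MYTABLE")
  .option("user", "myuser")
  .option("password", "mypassword")
  // Executed once per JDBC session, before reading data.
  .option("sessionInitStatement",
    """BEGIN
      |  EXECUTE IMMEDIATE 'ALTER SESSION SET TIME_ZONE = ''UTC''';
      |END;""".stripMargin)
  .load()
```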
[GitHub] spark issue #18724: [SPARK-21519][SQL] Add an option to the JDBC data source...
Github user LucaCanali commented on the issue: https://github.com/apache/spark/pull/18724 Thank you very much @gatorsmile for the review. I plan to provide the required changes and add a test case; however, it will probably take one more week before I can do that.
[GitHub] spark pull request #18724: [SPARK-21519][SQL] Add an option to the JDBC data...
GitHub user LucaCanali opened a pull request: https://github.com/apache/spark/pull/18724
[SPARK-21519][SQL] Add an option to the JDBC data source to initialize the target DB environment
Add an option to the JDBC data source to initialize the environment of the remote database session.
## What changes were proposed in this pull request?
This proposes an option to the JDBC data source, tentatively called "sessionInitStatement", to implement the session-initialization functionality present, for example, in the Sqoop connector for Oracle (see https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_oraoop_oracle_session_initialization_statements). After each database session is opened to the remote DB, and before starting to read data, this option executes a custom SQL statement (or a PL/SQL block in the case of Oracle). See also https://issues.apache.org/jira/browse/SPARK-21519
## How was this patch tested?
Manually tested using the Spark SQL data source and Oracle JDBC.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/LucaCanali/spark JDBC_datasource_sessionInitStatement
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/18724.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #18724
commit 92d082a67910bd7e5f3aeefc204f97a60f144e08
Author: LucaCanali <luca.can...@cern.ch>
Date: 2017-07-24T08:46:17Z
    Add an option to the JDBC data source to initialize the environment of the remote database session