[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21635


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-22 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r204265925
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala ---
@@ -152,6 +152,11 @@ package object config {
 .timeConf(TimeUnit.MILLISECONDS)
 .createWithDefaultString("100s")
 
+  private[spark] val YARN_METRICS_NAMESPACE = ConfigBuilder("spark.yarn.metrics.namespace")
--- End diff --

Can you please add this configuration to the yarn doc?
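For reference, a hedged sketch of what such an entry could look like in docs/running-on-yarn.md, following its existing property-table format (the default and wording here are assumptions, not the PR's final text):

```
<tr>
  <td><code>spark.yarn.metrics.namespace</code></td>
  <td>(none)</td>
  <td>
    The root namespace for AM metrics reporting. Falls back to the YARN
    application id when not set.
  </td>
</tr>
```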


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-19 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r203840572
  
--- Diff: docs/monitoring.md ---
@@ -435,6 +435,7 @@ set of sinks to which metrics are reported. The following instances are currentl
 * `executor`: A Spark executor.
 * `driver`: The Spark driver process (the process in which your SparkContext is created).
 * `shuffleService`: The Spark shuffle service.
+* `applicationMaster`: The Spark application master on YARN.
 
--- End diff --

Thanks, updated accordingly.


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r203744555
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class ApplicationMasterSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "applicationMaster"
--- End diff --

Ah, for client mode, yes, there is an ordering issue with spark.app.id. I'm 
fine with using the YARN app id since that is essentially what the driver and 
executors use anyway, but I think we should also make it configurable. I'd like 
to see these stay consistent: if the user can set the driver/executor metrics 
namespace with spark.metrics.namespace, we should allow them to set the YARN 
ones as well, so they all can have a similar prefix. Perhaps we add a 
spark.yarn.metrics.namespace?


application_1530654167152_24008.driver.LiveListenerBus.listenerProcessingTime.org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener
application_1530654167152_25538.2.executor.recordsRead
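
For illustration, a minimal Scala sketch of the fallback being discussed (hedged; `resolveYarnMetricsNamespace` and its call site are hypothetical, not the PR's code):

```scala
import org.apache.spark.SparkConf

// Hedged sketch: resolve the namespace for YARN metrics, preferring the
// proposed spark.yarn.metrics.namespace, then the generic
// spark.metrics.namespace, and finally the YARN application id.
def resolveYarnMetricsNamespace(conf: SparkConf, appId: String): String =
  conf.getOption("spark.yarn.metrics.namespace")
    .orElse(conf.getOption("spark.metrics.namespace"))
    .getOrElse(appId) // e.g. application_1530654167152_24008
```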


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-18 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r203594956
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class ApplicationMasterSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "applicationMaster"
--- End diff --

I like the idea of making the metric names more app specific, so I will 
prepend the app ID to the source name and rerun my test.


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-18 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r203580155
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class ApplicationMasterSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "applicationMaster"
--- End diff --

I see. But I think we may not be able to get "spark.app.id" on the AM side; 
instead I think we can get the YARN application id. So either we can set this 
configuration with the application id, or directly prepend it to the source name.
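
For illustration, one way to recover the application id on the AM side without `spark.app.id` (a hedged sketch using Hadoop's `ConverterUtils`; the helper name is hypothetical):

```scala
import org.apache.hadoop.yarn.api.records.ApplicationId
import org.apache.hadoop.yarn.util.ConverterUtils

// Hedged sketch: YARN sets the CONTAINER_ID environment variable in every
// container (including the AM), and the application id can be derived from it.
def currentApplicationId(): ApplicationId = {
  val containerId = ConverterUtils.toContainerId(sys.env("CONTAINER_ID"))
  containerId.getApplicationAttemptId.getApplicationId
}
```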


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-18 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r203495430
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class ApplicationMasterSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "applicationMaster"
--- End diff --

The config spark.metrics.namespace already exists; see the metrics section 
in http://spark.apache.org/docs/latest/monitoring.html. But if you look at the 
code at 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala#L129
it's only applied to executor and driver metrics. I think we should have it 
apply to the YARN metrics as well.
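
To make the point concrete, a hedged paraphrase (simplified, not the actual implementation) of the prefixing logic at the linked line:

```scala
// Hedged paraphrase of MetricsSystem.buildRegistryName: only the driver and
// executor instances currently receive the namespace/instance-id prefix;
// every other instance (like the new YARN one) keeps the bare source name.
def buildRegistryName(instance: String, namespace: String,
    executorId: Option[String], sourceName: String): String = instance match {
  case "driver"   => s"$namespace.driver.$sourceName"
  case "executor" => s"$namespace.${executorId.getOrElse("unknown")}.$sourceName"
  case _          => sourceName
}
```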


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-17 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r203232034
  
--- Diff: docs/monitoring.md ---
@@ -435,6 +435,7 @@ set of sinks to which metrics are reported. The following instances are currentl
 * `executor`: A Spark executor.
 * `driver`: The Spark driver process (the process in which your SparkContext is created).
 * `shuffleService`: The Spark shuffle service.
+* `applicationMaster`: The Spark application master on YARN.
 
--- End diff --

I think it would be better to clarify it as "The Spark ApplicationMaster when 
running on YARN."


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-17 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r203228423
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class ApplicationMasterSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "applicationMaster"
--- End diff --

@tgravescs Would you please explain more: are you going to add a new 
configuration "spark.metrics.namespace", and how would you use this configuration?


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-17 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r20309
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class ApplicationMasterSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "applicationMaster"
--- End diff --

Ah, good catch. I was thinking it automatically added the namespace, but it 
looks like that is only done for executor and driver instances. Perhaps we should 
just add it as a system that will append the spark.metrics.namespace setting. 
For YARN I see the ApplicationMaster metrics the same as the DAG scheduler 
source, executor allocation manager, etc. Allowing the user to control this makes 
sense to me. Thoughts?


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-16 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r202886551
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class ApplicationMasterSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "applicationMaster"
--- End diff --

In case this is the metrics output:

```
-- Gauges --
applicationMaster.numContainersPendingAllocate
 value = 0
applicationMaster.numExecutorsFailed
 value = 3
applicationMaster.numExecutorsRunning
 value = 9
applicationMaster.numLocalityAwareTasks
 value = 0
applicationMaster.numReleasedContainers
 value = 0
...
```

I would suggest adding the application id as a prefix to differentiate between 
different apps.
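
A minimal sketch of that suggestion (hedged; the `appId` constructor parameter is an illustration, not the PR's final signature):

```scala
import com.codahale.metrics.MetricRegistry

import org.apache.spark.metrics.source.Source

// Hedged sketch: prefix the source name with the application id so different
// apps reporting into the same sink can be told apart.
private[spark] class ApplicationMasterSource(appId: String, yarnAllocator: YarnAllocator)
    extends Source {
  override val sourceName: String = s"$appId.applicationMaster"
  override val metricRegistry: MetricRegistry = new MetricRegistry()
}
```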


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-16 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r202886077
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -309,6 +312,16 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
 finish(FinalApplicationStatus.FAILED,
   ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
   "Uncaught exception: " + StringUtils.stringifyException(e))
+} finally {
+  try {
+    metricsSystem.foreach { ms =>
+      ms.report()
+      ms.stop()
+    }
+  } catch {
+    case e: Exception =>
+      logInfo("Exception during stopping of the metric system: ", e)
--- End diff --

I would suggest changing this to a warning log if an exception occurs.
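
For reference, the suggested change is a one-liner; Spark's `Logging` trait has a matching `logWarning(msg, throwable)` overload:

```scala
} catch {
  case e: Exception =>
    // Log at WARN rather than INFO, as suggested above.
    logWarning("Exception during stopping of the metric system: ", e)
}
```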


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-12 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r202056881
  
--- Diff: docs/monitoring.md ---
@@ -435,6 +435,7 @@ set of sinks to which metrics are reported. The following instances are currentl
 * `executor`: A Spark executor.
 * `driver`: The Spark driver process (the process in which your SparkContext is created).
 * `shuffleService`: The Spark shuffle service.
+* `yarn`: Spark resource allocations on YARN.
--- End diff --

Successfully retested on cluster.  


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-12 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r202035811
  
--- Diff: docs/monitoring.md ---
@@ -435,6 +435,7 @@ set of sinks to which metrics are reported. The following instances are currentl
 * `executor`: A Spark executor.
 * `driver`: The Spark driver process (the process in which your SparkContext is created).
 * `shuffleService`: The Spark shuffle service.
+* `yarn`: Spark resource allocations on YARN.
--- End diff --

Sure, we can do that. After this many changes I would like to test it on a 
cluster again. I will come back with the result soon.


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-12 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r202015739
  
--- Diff: docs/monitoring.md ---
@@ -435,6 +435,7 @@ set of sinks to which metrics are reported. The following instances are currentl
 * `executor`: A Spark executor.
 * `driver`: The Spark driver process (the process in which your SparkContext is created).
 * `shuffleService`: The Spark shuffle service.
+* `yarn`: Spark resource allocations on YARN.
--- End diff --

Would it be better to change this to application master for better understanding?


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-12 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r201979140
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -309,6 +312,11 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
 finish(FinalApplicationStatus.FAILED,
   ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
   "Uncaught exception: " + StringUtils.stringifyException(e))
+} finally {
+  metricsSystem.foreach { ms =>
+    ms.report()
--- End diff --

In case of an OOM or any interrupt exception I expect [Line 309](https://github.com/attilapiros/spark/blob/f3781bdcadb84f90cbb3402f38df9c6493fb3ad2/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L309)
to catch the exception and log the error. But the exit code can be lost if the 
metrics system throws a new exception, so to be on the safe side I added the 
try/catch to avoid this.


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r201339532
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -309,6 +312,11 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
 finish(FinalApplicationStatus.FAILED,
   ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
   "Uncaught exception: " + StringUtils.stringifyException(e))
+} finally {
+  metricsSystem.foreach { ms =>
+    ms.report()
--- End diff --

Are there any issues with this if the AM gets killed badly or OOMs? Basically, 
would we want to wrap this in a try/catch and ignore any exceptions from it?


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-10 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r201331447
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "yarn"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsFailed"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsFailed
+  })
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsRunning"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsRunning
+  })
+
+  metricRegistry.register(MetricRegistry.name("numReleasedContainers"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumReleasedContainers
+  })
+
+  metricRegistry.register(MetricRegistry.name("numLocalityAwareTasks"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.numLocalityAwareTasks
+  })
+
+  metricRegistry.register(MetricRegistry.name("numPendingAllocate"), new Gauge[Int] {
--- End diff --

Sure we can do that.


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-09 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r201065806
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "yarn"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsFailed"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsFailed
+  })
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsRunning"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsRunning
+  })
+
+  metricRegistry.register(MetricRegistry.name("numReleasedContainers"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumReleasedContainers
+  })
+
+  metricRegistry.register(MetricRegistry.name("numLocalityAwareTasks"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.numLocalityAwareTasks
+  })
+
+  metricRegistry.register(MetricRegistry.name("numPendingAllocate"), new Gauge[Int] {
--- End diff --

Should we give these a clearer name, like numContainersPendingAllocate?


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-02 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r199543397
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "yarn"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsFailed"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsFailed
+  })
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsRunning"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsRunning
+  })
+
+  metricRegistry.register(MetricRegistry.name("numReleasedContainers"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumReleasedContainers
+  })
+
+  metricRegistry.register(MetricRegistry.name("numPendingLossReasonRequests"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumPendingLossReasonRequests
--- End diff --

Yeah, I would leave it out if no one specifically requested it and we can't 
think of a use case. It's easier to add later than to remove.


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-02 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r199527182
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "yarn"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsFailed"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsFailed
+  })
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsRunning"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsRunning
+  })
+
+  metricRegistry.register(MetricRegistry.name("numReleasedContainers"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumReleasedContainers
+  })
+
+  metricRegistry.register(MetricRegistry.name("numPendingLossReasonRequests"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumPendingLossReasonRequests
--- End diff --

Sorry, I have no use case for it. I added it because in previous comments it was 
requested to have more metrics, and this one was easy to collect. If it is 
totally useless then it is better to remove it.



---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-02 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r199524583
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "yarn"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsFailed"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsFailed
+  })
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsRunning"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsRunning
+  })
+
+  metricRegistry.register(MetricRegistry.name("numReleasedContainers"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumReleasedContainers
+  })
+
+  metricRegistry.register(MetricRegistry.name("numPendingLossReasonRequests"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumPendingLossReasonRequests
--- End diff --

I'm not sure how useful this metric is; did you have a specific use case?


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-06-27 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r198641094
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "yarn"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsFailed"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsFailed
+  })
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsRunning"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsRunning
+  })
+
+  metricRegistry.register(MetricRegistry.name("numReleasedContainers"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumReleasedContainers
+  })
+
+  metricRegistry.register(MetricRegistry.name("numPendingLossReasonRequests"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumPendingLossReasonRequests
+  })
+
+  metricRegistry.register(MetricRegistry.name("numLocalityAwareTasks"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.numLocalityAwareTasks
+  })
+
+}
--- End diff --

The getPendingAllocate seems quite cheap to me, as it just uses local maps 
(and tables) to calculate a list of ContainerRequests.
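
A minimal sketch of such a gauge, assuming `getPendingAllocate` returns the outstanding `ContainerRequest`s as a collection:

```scala
// Hedged sketch: expose the number of pending container requests.
metricRegistry.register(MetricRegistry.name("numContainersPendingAllocate"), new Gauge[Int] {
  override def getValue: Int = yarnAllocator.getPendingAllocate.size
})
```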


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-06-27 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r198630307
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -309,6 +312,9 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
 finish(FinalApplicationStatus.FAILED,
   ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
   "Uncaught exception: " + StringUtils.stringifyException(e))
+} finally {
+  metricsSystem.report()
--- End diff --

Yes, it is better and more elegant to store the metrics system in an Option.
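
A hedged sketch of the Option-based lifecycle (field and helper names are illustrative, not the PR's exact code):

```scala
import org.apache.spark.metrics.MetricsSystem

// Hedged sketch: keep the metrics system in an Option so the shutdown path
// safely no-ops if startup never got far enough to create it.
private var metricsSystem: Option[MetricsSystem] = None

def stopMetrics(): Unit = metricsSystem.foreach { ms =>
  ms.report()
  ms.stop()
}
```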


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-06-27 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r198629311
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "yarn"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsFailed"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsFailed
+  })
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsRunning"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsRunning
+  })
+
+  metricRegistry.register(MetricRegistry.name("numReleasedContainers"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumReleasedContainers
+  })
+
+  metricRegistry.register(MetricRegistry.name("numPendingLossReasonRequests"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumPendingLossReasonRequests
+  })
+
+  metricRegistry.register(MetricRegistry.name("numLocalityAwareTasks"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.numLocalityAwareTasks
+  })
+
+}
--- End diff --

Yes, I have seen that the call goes to YARN, and I was also afraid about its 
execution time, which is why I finally decided to leave it out. 

But I will check whether there is something better to get it.


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-06-27 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r198626433
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -309,6 +312,9 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
 finish(FinalApplicationStatus.FAILED,
   ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
   "Uncaught exception: " + StringUtils.stringifyException(e))
+} finally {
+  metricsSystem.report()
--- End diff --

`metricsSystem` can be null at this point, can't it? In case of some issue 
during startup.


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-06-27 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r198627306
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "yarn"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsFailed"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsFailed
+  })
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsRunning"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsRunning
+  })
+
+  metricRegistry.register(MetricRegistry.name("numReleasedContainers"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumReleasedContainers
+  })
+
+  metricRegistry.register(MetricRegistry.name("numPendingLossReasonRequests"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumPendingLossReasonRequests
+  })
+
+  metricRegistry.register(MetricRegistry.name("numLocalityAwareTasks"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.numLocalityAwareTasks
+  })
+
+}
--- End diff --

The size of `getPendingAllocate` might be an interesting metric, but we need 
to check whether it requires synchronization... and it may be an expensive 
operation; I'm not sure if the AM client has a better API to get the number of 
pending requests.


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-06-27 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r198538655
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -309,6 +312,9 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
 finish(FinalApplicationStatus.FAILED,
   ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
   "Uncaught exception: " + StringUtils.stringifyException(e))
+} finally {
+  metricsSystem.report()
--- End diff --

Thanks Tom, documentation is added. 


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-06-27 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r198494546
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -309,6 +312,9 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments) extends
 finish(FinalApplicationStatus.FAILED,
   ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
   "Uncaught exception: " + StringUtils.stringifyException(e))
+} finally {
+  metricsSystem.report()
--- End diff --

Add yarn to the monitoring.md doc as a sink.


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-06-26 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r198116931
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.deploy.yarn.FailureTracker
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(failureTracker: FailureTracker) extends Source {
+
+  override val sourceName: String = "yarn_cluster"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(
--- End diff --

I have added some new metrics. So the current metrics are:
- yarn.numExecutorsFailed
- yarn.numExecutorsRunning
- yarn.numLocalityAwareTasks
- yarn.numPendingLossReasonRequests
- yarn.numReleasedContainers



---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-06-26 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r198116371
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -67,6 +68,14 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
 
   private val securityMgr = new SecurityManager(sparkConf)
 
+  private[spark] val failureTracker = new FailureTracker(sparkConf, new SystemClock)
--- End diff --

As other metrics from YarnAllocator will be added, this change is reverted.


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-06-26 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r198115953
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -771,6 +784,7 @@ object ApplicationMaster extends Logging {
 
   private var master: ApplicationMaster = _
 
+
--- End diff --

ok


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN ...

2018-06-25 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r198004697
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.deploy.yarn.FailureTracker
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(failureTracker: FailureTracker) extends Source {
+
+  override val sourceName: String = "yarn_cluster"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(
--- End diff --

Agreed, creating a metrics source with only one metric seems like overkill. 
Maybe we can mix this into `FailureTracker`.
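
A minimal sketch of that idea, assuming `FailureTracker` exposes a `numFailedExecutors` accessor (as the `yarn.numFailedExecutors` gauge elsewhere in this thread suggests):

```scala
// Hedged sketch: register the failure count straight off the tracker instead
// of keeping a dedicated single-metric source.
metricRegistry.register(MetricRegistry.name("numFailedExecutors"), new Gauge[Int] {
  override def getValue: Int = failureTracker.numFailedExecutors
})
```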


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN ...

2018-06-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r197975990
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -771,6 +784,7 @@ object ApplicationMaster extends Logging {
 
   private var master: ApplicationMaster = _
 
+
--- End diff --

nit: remove


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN ...

2018-06-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r197976701
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.deploy.yarn.FailureTracker
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(failureTracker: FailureTracker) extends Source {
+
+  override val sourceName: String = "yarn_cluster"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(
--- End diff --

The mechanics of adding the metric source are ok, but have you thought of 
other metrics to expose? `YarnAllocator` has a lot of things that could be 
easily hooked up here.


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN ...

2018-06-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r197976137
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -67,6 +68,14 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
 
   private val securityMgr = new SecurityManager(sparkConf)
 
+  private[spark] val failureTracker = new FailureTracker(sparkConf, new SystemClock)
--- End diff --

No need to specify the clock here?


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN ...

2018-06-25 Thread attilapiros
GitHub user attilapiros opened a pull request:

https://github.com/apache/spark/pull/21635

[SPARK-24594][YARN] Introducing metrics for YARN executor allocation problems

## What changes were proposed in this pull request?

In this PR metrics are introduced for YARN allocation failures. As there were 
no metrics in the YARN module up to now, a new metrics system is created with 
the name "yarn".
To support both client and cluster mode, the metrics system lifecycle is 
bound to the AM.
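
A hedged sketch of that wiring (names follow the diffs in this thread; the exact call site in the AM is an assumption):

```scala
// Hedged sketch: create and start a dedicated "yarn" metrics system in the AM,
// registering the allocator-backed source introduced in this PR.
val ms = MetricsSystem.createMetricsSystem("yarn", sparkConf, securityMgr)
ms.registerSource(new YarnClusterSchedulerSource(yarnAllocator))
ms.start()
```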

## How was this patch tested?

Both client and cluster mode were tested manually. 
Before the test, spark-core was removed from one of the YARN nodes to cause an 
allocation failure. 
Spark was started as follows (in client mode): 

```
spark2-submit \
  --class org.apache.spark.examples.SparkPi \
  --conf "spark.yarn.blacklist.executor.launch.blacklisting.enabled=true" 
--conf "spark.blacklist.application.maxFailedExecutorsPerNode=2" --conf 
"spark.dynamicAllocation.enabled=true" --conf 
"spark.metrics.conf.*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink"
 \
  --master yarn \
  --deploy-mode client \
  original-spark-examples_2.11-2.4.0-SNAPSHOT.jar \
  1000
```

In both cases the YARN logs contained the new metrics as:

```
$ yarn logs --applicationId application_1529926424933_0015 | grep -A1 -B1 yarn.numFailedExecutors
18/06/25 07:08:29 INFO client.RMProxy: Connecting to ResourceManager at ...
-- Gauges --
yarn.numFailedExecutors
 value = 0
--
-- Gauges --
yarn.numFailedExecutors
 value = 3
--
-- Gauges --
yarn.numFailedExecutors
 value = 3
--
-- Gauges --
yarn.numFailedExecutors
 value = 3
``` 




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/attilapiros/spark SPARK-24594

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21635.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21635


commit 9b033ccfa572c93d7c2dc7bca06f9be1e363f88a
Author: “attilapiros” 
Date:   2018-06-19T19:40:20Z

Initial commit (yarn metrics)




---
