[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21635
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r204265925

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala ---
@@ -152,6 +152,11 @@ package object config {
     .timeConf(TimeUnit.MILLISECONDS)
     .createWithDefaultString("100s")

+  private[spark] val YARN_METRICS_NAMESPACE = ConfigBuilder("spark.yarn.metrics.namespace")
--- End diff --

Can you please add this configuration to the YARN docs?
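For reference, a complete `ConfigBuilder` entry of this shape would look roughly like the sketch below; the doc text and the choice of an optional string config are illustrative assumptions, not necessarily the PR's final code.

```
private[spark] val YARN_METRICS_NAMESPACE = ConfigBuilder("spark.yarn.metrics.namespace")
  // Illustrative doc string: when unset, callers would fall back to the
  // YARN application ID as the metrics prefix.
  .doc("The root namespace for AM metrics reporting.")
  .stringConf
  .createOptional
```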
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user attilapiros commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r203840572

--- Diff: docs/monitoring.md ---
@@ -435,6 +435,7 @@ set of sinks to which metrics are reported. The following instances are currently supported:
 * `executor`: A Spark executor.
 * `driver`: The Spark driver process (the process in which your SparkContext is created).
 * `shuffleService`: The Spark shuffle service.
+* `applicationMaster`: The Spark application master on YARN.
--- End diff --

Thanks, updated accordingly.
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r203744555

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class ApplicationMasterSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "applicationMaster"
--- End diff --

Ah, for client mode, yes, there is an ordering issue with spark.app.id. I'm fine with using the YARN app id, since that is essentially what the driver and executors use anyway, but I think we should also make it configurable. I'd like to see these stay consistent: if the user can set the driver/executor metrics namespace with spark.metrics.namespace, we should allow them to set the YARN ones too, so that they can all have a similar prefix. Perhaps we add a spark.yarn.metrics.namespace?

application_1530654167152_24008.driver.LiveListenerBus.listenerProcessingTime.org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener
application_1530654167152_25538.2.executor.recordsRead
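A minimal sketch of the fallback being proposed, assuming `sparkConf: SparkConf` is available and `appId` is the YARN application ID obtained from the RM client (names are illustrative):

```
// An explicit spark.yarn.metrics.namespace wins; otherwise fall back to the
// YARN application ID, mirroring how spark.metrics.namespace defaults to
// spark.app.id for the driver and executors.
val prefix: String = sparkConf
  .getOption("spark.yarn.metrics.namespace")
  .getOrElse(appId.toString)
```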
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user attilapiros commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r203594956

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class ApplicationMasterSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "applicationMaster"
--- End diff --

I like the idea of making the metric names more app-specific, so I will prepend the app ID to the source name and rerun my test.
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r203580155

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class ApplicationMasterSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "applicationMaster"
--- End diff --

I see. But I think we may not be able to get "spark.app.id" on the AM side; instead, I think we can get the YARN application id. So we can either set this configuration with the application id, or directly prepend it to the source name.
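For context, the YARN application id is indeed available in the AM without a SparkContext, because YARN exports the AM container's id in an environment variable; a sketch (`ContainerId.fromString` requires Hadoop 2.8+; older releases use `ConverterUtils.toContainerId`):

```
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment
import org.apache.hadoop.yarn.api.records.ContainerId

// The AM container's ID is exported by YARN in the CONTAINER_ID env var; the
// application ID (e.g. "application_1530654167152_24008") is derived from it.
val containerId = ContainerId.fromString(System.getenv(Environment.CONTAINER_ID.name()))
val appId = containerId.getApplicationAttemptId.getApplicationId
```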
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r203495430

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class ApplicationMasterSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "applicationMaster"
--- End diff --

The config spark.metrics.namespace already exists; see the metrics section in http://spark.apache.org/docs/latest/monitoring.html. But if you look at the code (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala#L129), it is only applied to executor and driver metrics. I think we should have it apply to the YARN metrics as well.
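The logic being referenced looks roughly like the paraphrase below (not a verbatim copy of MetricsSystem.scala; `conf` and `instance` are fields of MetricsSystem): the namespace prefix is only applied for the driver and executor instances, so any other instance gets the bare source name.

```
def buildRegistryName(source: Source): String = {
  // spark.metrics.namespace falls back to spark.app.id when unset.
  val namespace = conf.getOption("spark.metrics.namespace")
    .orElse(conf.getOption("spark.app.id"))
  if (instance == "driver" || instance == "executor") {
    val executorId = conf.getOption("spark.executor.id")
    (namespace, executorId) match {
      case (Some(ns), Some(id)) => MetricRegistry.name(ns, id, source.sourceName)
      case _ => MetricRegistry.name(source.sourceName)
    }
  } else {
    // YARN AM sources land here today, which is why no prefix is applied.
    MetricRegistry.name(source.sourceName)
  }
}
```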
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r203232034

--- Diff: docs/monitoring.md ---
@@ -435,6 +435,7 @@ set of sinks to which metrics are reported. The following instances are currently supported:
 * `executor`: A Spark executor.
 * `driver`: The Spark driver process (the process in which your SparkContext is created).
 * `shuffleService`: The Spark shuffle service.
+* `applicationMaster`: The Spark application master on YARN.
--- End diff --

I think it would be better to clarify it as "The Spark ApplicationMaster when running on YARN."
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r203228423

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class ApplicationMasterSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "applicationMaster"
--- End diff --

@tgravescs Would you please explain more: are you proposing to add a new configuration, "spark.metrics.namespace", and how would this configuration be used?
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r20309

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class ApplicationMasterSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "applicationMaster"
--- End diff --

Ah, good catch. I was thinking it automatically added the namespace, but it looks like that is only done for the executor and driver instances. Perhaps we should just add it as a system that appends the spark.metrics.namespace setting. For YARN, I see the ApplicationMaster metrics the same as the DAGScheduler source, executor allocation manager, etc. Allowing the user to control this makes sense to me. Thoughts?
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r202886551

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class ApplicationMasterSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "applicationMaster"
--- End diff --

Given that this is the metrics output:

```
-- Gauges --
applicationMaster.numContainersPendingAllocate
             value = 0
applicationMaster.numExecutorsFailed
             value = 3
applicationMaster.numExecutorsRunning
             value = 9
applicationMaster.numLocalityAwareTasks
             value = 0
applicationMaster.numReleasedContainers
             value = 0
...
```

I would suggest adding the application id as a prefix to differentiate between different apps.
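A minimal sketch of the suggested prefixing (the extra constructor parameter and its name are assumptions for illustration):

```
import com.codahale.metrics.MetricRegistry

import org.apache.spark.metrics.source.Source

// With an app-specific prefix, a shared sink shows e.g.
// "application_1530654167152_24008.applicationMaster.numExecutorsFailed".
private[spark] class ApplicationMasterSource(prefix: String, yarnAllocator: YarnAllocator)
  extends Source {

  override val sourceName: String = s"$prefix.applicationMaster"
  override val metricRegistry: MetricRegistry = new MetricRegistry()
}
```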
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r202886077

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -309,6 +312,16 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
         finish(FinalApplicationStatus.FAILED,
           ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
           "Uncaught exception: " + StringUtils.stringifyException(e))
+    } finally {
+      try {
+        metricsSystem.foreach { ms =>
+          ms.report()
+          ms.stop()
+        }
+      } catch {
+        case e: Exception =>
+          logInfo("Exception during stopping of the metric system: ", e)
--- End diff --

I would suggest changing this to a warning log when an exception occurs.
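The requested change amounts to raising the log level in the catch block; a sketch, assuming the surrounding class mixes in org.apache.spark.internal.Logging (the helper method name is illustrative):

```
private def stopMetricsSystem(metricsSystem: Option[MetricsSystem]): Unit = {
  try {
    metricsSystem.foreach { ms =>
      ms.report()
      ms.stop()
    }
  } catch {
    case e: Exception =>
      // WARN rather than INFO: this is an abnormal condition, but it must
      // not mask the application's real exit status recorded earlier.
      logWarning("Exception during stopping of the metric system: ", e)
  }
}
```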
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user attilapiros commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r202056881

--- Diff: docs/monitoring.md ---
@@ -435,6 +435,7 @@ set of sinks to which metrics are reported. The following instances are currently supported:
 * `executor`: A Spark executor.
 * `driver`: The Spark driver process (the process in which your SparkContext is created).
 * `shuffleService`: The Spark shuffle service.
+* `yarn`: Spark resource allocations on YARN.
--- End diff --

Successfully retested on cluster.
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user attilapiros commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r202035811

--- Diff: docs/monitoring.md ---
@@ -435,6 +435,7 @@ set of sinks to which metrics are reported. The following instances are currently supported:
 * `executor`: A Spark executor.
 * `driver`: The Spark driver process (the process in which your SparkContext is created).
 * `shuffleService`: The Spark shuffle service.
+* `yarn`: Spark resource allocations on YARN.
--- End diff --

Sure, we can do that. After this many changes I would like to test it on a cluster again. I will come back with the results soon.
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r202015739

--- Diff: docs/monitoring.md ---
@@ -435,6 +435,7 @@ set of sinks to which metrics are reported. The following instances are currently supported:
 * `executor`: A Spark executor.
 * `driver`: The Spark driver process (the process in which your SparkContext is created).
 * `shuffleService`: The Spark shuffle service.
+* `yarn`: Spark resource allocations on YARN.
--- End diff --

Would it be better to change this to "application master" for easier understanding?
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user attilapiros commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r201979140

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -309,6 +312,11 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
         finish(FinalApplicationStatus.FAILED,
           ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
           "Uncaught exception: " + StringUtils.stringifyException(e))
+    } finally {
+      metricsSystem.foreach { ms =>
+        ms.report()
--- End diff --

In case of an OOM or any interrupt exception, I expect [Line 309](https://github.com/attilapiros/spark/blob/f3781bdcadb84f90cbb3402f38df9c6493fb3ad2/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L309) to catch the exception and log the error. But the exit code can be lost if the metrics system throws a new exception, so to be on the safe side I added the try/catch to avoid this.
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r201339532

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -309,6 +312,11 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
         finish(FinalApplicationStatus.FAILED,
           ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
           "Uncaught exception: " + StringUtils.stringifyException(e))
+    } finally {
+      metricsSystem.foreach { ms =>
+        ms.report()
--- End diff --

Are there any issues with this if the AM gets killed badly or OOMs? Basically, would we want to wrap this in a try/catch and ignore any exceptions from it?
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user attilapiros commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r201331447

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "yarn"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsFailed"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsFailed
+  })
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsRunning"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsRunning
+  })
+
+  metricRegistry.register(MetricRegistry.name("numReleasedContainers"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumReleasedContainers
+  })
+
+  metricRegistry.register(MetricRegistry.name("numLocalityAwareTasks"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.numLocalityAwareTasks
+  })
+
+  metricRegistry.register(MetricRegistry.name("numPendingAllocate"), new Gauge[Int] {
--- End diff --

Sure we can do that.
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r201065806

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "yarn"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsFailed"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsFailed
+  })
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsRunning"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsRunning
+  })
+
+  metricRegistry.register(MetricRegistry.name("numReleasedContainers"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumReleasedContainers
+  })
+
+  metricRegistry.register(MetricRegistry.name("numLocalityAwareTasks"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.numLocalityAwareTasks
+  })
+
+  metricRegistry.register(MetricRegistry.name("numPendingAllocate"), new Gauge[Int] {
--- End diff --

Should we give these a clearer name, like numContainersPendingAllocate?
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r199543397

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "yarn"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsFailed"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsFailed
+  })
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsRunning"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsRunning
+  })
+
+  metricRegistry.register(MetricRegistry.name("numReleasedContainers"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumReleasedContainers
+  })
+
+  metricRegistry.register(MetricRegistry.name("numPendingLossReasonRequests"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumPendingLossReasonRequests
--- End diff --

Yeah, I would leave it out if no one specifically requested it and we can't think of a use case. It's easier to add it later than to remove it.
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user attilapiros commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r199527182

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "yarn"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsFailed"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsFailed
+  })
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsRunning"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsRunning
+  })
+
+  metricRegistry.register(MetricRegistry.name("numReleasedContainers"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumReleasedContainers
+  })
+
+  metricRegistry.register(MetricRegistry.name("numPendingLossReasonRequests"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumPendingLossReasonRequests
--- End diff --

Sorry, I have no use case for it. I added it because in previous comments it was requested to have more metrics, and this one was easy to collect. If it is totally useless, then it is better to remove it.
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r199524583

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "yarn"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsFailed"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsFailed
+  })
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsRunning"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsRunning
+  })
+
+  metricRegistry.register(MetricRegistry.name("numReleasedContainers"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumReleasedContainers
+  })
+
+  metricRegistry.register(MetricRegistry.name("numPendingLossReasonRequests"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumPendingLossReasonRequests
--- End diff --

I'm not sure how useful this metric is; did you have a specific use case?
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user attilapiros commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r198641094

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "yarn"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsFailed"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsFailed
+  })
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsRunning"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsRunning
+  })
+
+  metricRegistry.register(MetricRegistry.name("numReleasedContainers"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumReleasedContainers
+  })
+
+  metricRegistry.register(MetricRegistry.name("numPendingLossReasonRequests"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumPendingLossReasonRequests
+  })
+
+  metricRegistry.register(MetricRegistry.name("numLocalityAwareTasks"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.numLocalityAwareTasks
+  })
+
+}
--- End diff --

getPendingAllocate seems quite cheap to me, as it just uses local maps (and tables) to compute a list of ContainerRequests.
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user attilapiros commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r198630307

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -309,6 +312,9 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
         finish(FinalApplicationStatus.FAILED,
           ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
           "Uncaught exception: " + StringUtils.stringifyException(e))
+    } finally {
+      metricsSystem.report()
--- End diff --

Yes, it is better and more elegant to store metricsSystem in an Option.
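A sketch of the Option-based shape being agreed on here, inside ApplicationMaster (the creation point and instance name are illustrative):

```
// None until startup succeeds, so the shutdown path can skip reporting
// cleanly if the AM failed before the metrics system existed.
private var metricsSystem: Option[MetricsSystem] = None

// ... during startup, once the allocator is available:
metricsSystem = Some(MetricsSystem.createMetricsSystem("applicationMaster", sparkConf, securityMgr))

// ... on shutdown:
metricsSystem.foreach { ms =>
  ms.report()
  ms.stop()
}
```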
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user attilapiros commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r198629311

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "yarn"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsFailed"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsFailed
+  })
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsRunning"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsRunning
+  })
+
+  metricRegistry.register(MetricRegistry.name("numReleasedContainers"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumReleasedContainers
+  })
+
+  metricRegistry.register(MetricRegistry.name("numPendingLossReasonRequests"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumPendingLossReasonRequests
+  })
+
+  metricRegistry.register(MetricRegistry.name("numLocalityAwareTasks"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.numLocalityAwareTasks
+  })
+
+}
--- End diff --

Yes, I have seen that the call goes to YARN, and I was also afraid about its execution time, which is why I finally decided to leave it out. But I will check whether there is a better way to get it.
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r198626433

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -309,6 +312,9 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
         finish(FinalApplicationStatus.FAILED,
           ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
           "Uncaught exception: " + StringUtils.stringifyException(e))
+    } finally {
+      metricsSystem.report()
--- End diff --

`metricsSystem` can be null at this point, can't it? In case of some issue during startup.
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r198627306

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "yarn"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsFailed"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsFailed
+  })
+
+  metricRegistry.register(MetricRegistry.name("numExecutorsRunning"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumExecutorsRunning
+  })
+
+  metricRegistry.register(MetricRegistry.name("numReleasedContainers"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumReleasedContainers
+  })
+
+  metricRegistry.register(MetricRegistry.name("numPendingLossReasonRequests"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.getNumPendingLossReasonRequests
+  })
+
+  metricRegistry.register(MetricRegistry.name("numLocalityAwareTasks"), new Gauge[Int] {
+    override def getValue: Int = yarnAllocator.numLocalityAwareTasks
+  })
+
+}
--- End diff --

The size of `getPendingAllocate` might be an interesting metric, but we need to check whether it requires synchronization... and it may be an expensive operation; I'm not sure whether the AM client has a better API to get the number of pending requests.
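If synchronization turns out to be needed, one option is a synchronized counter on YarnAllocator so the gauge never walks the request tables mid-update; a sketch (the method name is illustrative):

```
// Reads the allocator-local request tables under the allocator's lock;
// no RPC to the ResourceManager is involved.
def getNumContainersPendingAllocate: Int = synchronized {
  getPendingAllocate.size
}
```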
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user attilapiros commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r198538655

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -309,6 +312,9 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
         finish(FinalApplicationStatus.FAILED,
           ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
           "Uncaught exception: " + StringUtils.stringifyException(e))
+    } finally {
+      metricsSystem.report()
--- End diff --

Thanks Tom, the documentation has been added.
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r198494546

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -309,6 +312,9 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
         finish(FinalApplicationStatus.FAILED,
           ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
           "Uncaught exception: " + StringUtils.stringifyException(e))
+    } finally {
+      metricsSystem.report()
--- End diff --

Add `yarn` to the monitoring.md doc as an instance.
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user attilapiros commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r198116931

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.deploy.yarn.FailureTracker
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(failureTracker: FailureTracker) extends Source {
+
+  override val sourceName: String = "yarn_cluster"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(
--- End diff --

I have added some new metrics. So the current metrics are:

- yarn.numExecutorsFailed
- yarn.numExecutorsRunning
- yarn.numLocalityAwareTasks
- yarn.numPendingLossReasonRequests
- yarn.numReleasedContainers
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user attilapiros commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r198116371

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -67,6 +68,14 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends

   private val securityMgr = new SecurityManager(sparkConf)

+  private[spark] val failureTracker = new FailureTracker(sparkConf, new SystemClock)
--- End diff --

As other metrics from YarnAllocator will be added, this change has been reverted.
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user attilapiros commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r198115953

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -771,6 +784,7 @@ object ApplicationMaster extends Logging {

   private var master: ApplicationMaster = _

+
--- End diff --

ok
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN ...
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r198004697

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.deploy.yarn.FailureTracker
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(failureTracker: FailureTracker) extends Source {
+
+  override val sourceName: String = "yarn_cluster"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(
--- End diff --

Agreed; creating a metric source with only one metric seems overkill. Maybe we can mix this into `FailureTracker`.
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN ...
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r197975990

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -771,6 +784,7 @@ object ApplicationMaster extends Logging {

   private var master: ApplicationMaster = _

+
--- End diff --

nit: remove
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN ...
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r197976701

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClusterSchedulerSource.scala ---
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.deploy.yarn.FailureTracker
+import org.apache.spark.metrics.source.Source
+
+private[spark] class YarnClusterSchedulerSource(failureTracker: FailureTracker) extends Source {
+
+  override val sourceName: String = "yarn_cluster"
+  override val metricRegistry: MetricRegistry = new MetricRegistry()
+
+  metricRegistry.register(
--- End diff --

The mechanics of adding the metric source are ok, but have you thought of other metrics to expose? `YarnAllocator` has a lot of things that could be easily hooked up here.
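A small helper keeps hooking up additional YarnAllocator counters cheap as more come up in review; a sketch (the helper is illustrative, not part of the PR):

```
import com.codahale.metrics.{Gauge, MetricRegistry}

// Registers a pull-based gauge backed by a by-name Int expression.
private def registerGauge(registry: MetricRegistry, name: String)(value: => Int): Unit = {
  registry.register(MetricRegistry.name(name), new Gauge[Int] {
    override def getValue: Int = value
  })
}

registerGauge(metricRegistry, "numExecutorsFailed")(yarnAllocator.getNumExecutorsFailed)
registerGauge(metricRegistry, "numExecutorsRunning")(yarnAllocator.getNumExecutorsRunning)
registerGauge(metricRegistry, "numReleasedContainers")(yarnAllocator.getNumReleasedContainers)
```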
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN ...
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21635#discussion_r197976137

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -67,6 +68,14 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends

   private val securityMgr = new SecurityManager(sparkConf)

+  private[spark] val failureTracker = new FailureTracker(sparkConf, new SystemClock)
--- End diff --

No need to specify the clock here?
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN ...
GitHub user attilapiros opened a pull request:

https://github.com/apache/spark/pull/21635

[SPARK-24594][YARN] Introducing metrics for YARN executor allocation problems

## What changes were proposed in this pull request?

In this PR, metrics are introduced for YARN allocation failures. As up to now there were no metrics in the YARN module, a new metrics system is created with the name "yarn". To support both client and cluster mode, the metrics system lifecycle is bound to the AM.

## How was this patch tested?

Both client and cluster mode were tested manually. Before the test, spark-core was removed from one of the YARN nodes to cause the allocation failure. Spark was started as follows (in case of client mode):

```
spark2-submit \
  --class org.apache.spark.examples.SparkPi \
  --conf "spark.yarn.blacklist.executor.launch.blacklisting.enabled=true" \
  --conf "spark.blacklist.application.maxFailedExecutorsPerNode=2" \
  --conf "spark.dynamicAllocation.enabled=true" \
  --conf "spark.metrics.conf.*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink" \
  --master yarn \
  --deploy-mode client \
  original-spark-examples_2.11-2.4.0-SNAPSHOT.jar \
  1000
```

In both cases the YARN logs contained the new metrics:

```
$ yarn logs --applicationId application_1529926424933_0015 | grep -A1 -B1 yarn.numFailedExecutors
18/06/25 07:08:29 INFO client.RMProxy: Connecting to ResourceManager at ...
-- Gauges --
yarn.numFailedExecutors
             value = 0
--
-- Gauges --
yarn.numFailedExecutors
             value = 3
--
-- Gauges --
yarn.numFailedExecutors
             value = 3
--
-- Gauges --
yarn.numFailedExecutors
             value = 3
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/attilapiros/spark SPARK-24594

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21635.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21635

commit 9b033ccfa572c93d7c2dc7bca06f9be1e363f88a
Author: attilapiros
Date: 2018-06-19T19:40:20Z

    Initial commit (yarn metrics)
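A sketch of the lifecycle binding described above, as it would look inside the AM (variable names are illustrative): the system is created once the allocator exists, started, and reported/stopped on AM exit, which covers both client and cluster mode since the AM runs in both.

```
// Create the "yarn" metrics instance, register the allocator-backed source,
// and keep the handle so the shutdown path can flush and stop it.
val ms = MetricsSystem.createMetricsSystem("yarn", sparkConf, securityMgr)
ms.registerSource(new YarnClusterSchedulerSource(failureTracker))
ms.start()
metricsSystem = Some(ms)
```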