[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system
dongjoon-hyun commented on a change in pull request #28416: URL: https://github.com/apache/spark/pull/28416#discussion_r422409727 ## File path: resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala ## @@ -402,7 +402,9 @@ class YarnShuffleServiceSuite extends SparkFunSuite with Matchers with BeforeAnd "numRegisteredConnections", "openBlockRequestLatencyMillis", "registeredExecutorsSize", - "registerExecutorRequestLatencyMillis" + "registerExecutorRequestLatencyMillis", + "shuffle-server.usedDirectMemory", + "shuffle-server.usedHeapMemory" Review comment: Got it. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system
dongjoon-hyun commented on a change in pull request #28416: URL: https://github.com/apache/spark/pull/28416#discussion_r421880370 ## File path: resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala ## @@ -402,7 +402,9 @@ class YarnShuffleServiceSuite extends SparkFunSuite with Matchers with BeforeAnd "numRegisteredConnections", "openBlockRequestLatencyMillis", "registeredExecutorsSize", - "registerExecutorRequestLatencyMillis" + "registerExecutorRequestLatencyMillis", + "shuffle-server.usedDirectMemory", + "shuffle-server.usedHeapMemory" Review comment: Apparently, the naming looks inconsistent from the others. Do you think we have a better way? For example, do we need `shuffle-server.` prefix? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system
dongjoon-hyun commented on a change in pull request #28416: URL: https://github.com/apache/spark/pull/28416#discussion_r418363468 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java ## @@ -242,7 +240,6 @@ public ShuffleMetrics() { allMetrics.put("registeredExecutorsSize", (Gauge) () -> blockManager.getRegisteredExecutorsSize()); allMetrics.put("numActiveConnections", activeConnections); - allMetrics.put("numRegisteredConnections", registeredConnections); Review comment: We will review there if that is really never exposed to any one or not. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system
dongjoon-hyun commented on a change in pull request #28416: URL: https://github.com/apache/spark/pull/28416#discussion_r418363468 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java ## @@ -242,7 +240,6 @@ public ShuffleMetrics() { allMetrics.put("registeredExecutorsSize", (Gauge) () -> blockManager.getRegisteredExecutorsSize()); allMetrics.put("numActiveConnections", activeConnections); - allMetrics.put("numRegisteredConnections", registeredConnections); Review comment: We will review there if that is really never exposed to any one. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system
dongjoon-hyun commented on a change in pull request #28416: URL: https://github.com/apache/spark/pull/28416#discussion_r418362998 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java ## @@ -242,7 +240,6 @@ public ShuffleMetrics() { allMetrics.put("registeredExecutorsSize", (Gauge) () -> blockManager.getRegisteredExecutorsSize()); allMetrics.put("numActiveConnections", activeConnections); - allMetrics.put("numRegisteredConnections", registeredConnections); Review comment: Please create a removal PR first for the following. That will make the review easier~ > After digging into it, I find it's the Counter from TransportContext that's really counting the numbers. That created here in ExternalBlockHandler#ShuffleMetrics is never used anywhere. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system
dongjoon-hyun commented on a change in pull request #28416: URL: https://github.com/apache/spark/pull/28416#discussion_r418313334 ## File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java ## @@ -198,6 +198,7 @@ protected void serviceInit(Configuration conf) throws Exception { // register metrics on the block handler into the Node Manager's metrics system. blockHandler.getAllMetrics().getMetrics().put("numRegisteredConnections", Review comment: If we put all metrics at line 201, do we need to have this line? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system
dongjoon-hyun commented on a change in pull request #28416: URL: https://github.com/apache/spark/pull/28416#discussion_r418310844 ## File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java ## @@ -198,6 +198,7 @@ protected void serviceInit(Configuration conf) throws Exception { // register metrics on the block handler into the Node Manager's metrics system. blockHandler.getAllMetrics().getMetrics().put("numRegisteredConnections", shuffleServer.getRegisteredConnections()); + blockHandler.getAllMetrics().getMetrics().putAll(shuffleServer.getAllMetrics().getMetrics()); Review comment: This one-line seems to be the intended contribution. Did you face some error with `numRegisteredConnections` while adding this? I'm wondering if there is a reason to remove `numRegisteredConnections`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system
dongjoon-hyun commented on a change in pull request #28416: URL: https://github.com/apache/spark/pull/28416#discussion_r418308100 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java ## @@ -242,7 +240,6 @@ public ShuffleMetrics() { allMetrics.put("registeredExecutorsSize", (Gauge) () -> blockManager.getRegisteredExecutorsSize()); allMetrics.put("numActiveConnections", activeConnections); - allMetrics.put("numRegisteredConnections", registeredConnections); Review comment: If you need to fix some YARN issue, I'd like to recommend to remove from YARN side and leave this shared class AS-IS. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system
dongjoon-hyun commented on a change in pull request #28416: URL: https://github.com/apache/spark/pull/28416#discussion_r418307375 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java ## @@ -242,7 +240,6 @@ public ShuffleMetrics() { allMetrics.put("registeredExecutorsSize", (Gauge) () -> blockManager.getRegisteredExecutorsSize()); allMetrics.put("numActiveConnections", activeConnections); - allMetrics.put("numRegisteredConnections", registeredConnections); Review comment: Ur, this might remove this metric from the other resource managers like `Mesos`. Are you sure about the side-effect? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org