[GitHub] [spark] cloud-fan commented on a change in pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE

2021-07-26 Thread GitBox


cloud-fan commented on a change in pull request #32776:
URL: https://github.com/apache/spark/pull/32776#discussion_r676519068



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
##
@@ -182,6 +193,17 @@ case class CustomShuffleReaderExec private(
 } else {
   Map.empty
 }
+  } ++ {
+if (isLocalReader) {

Review comment:
   do we need this if condition here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE

2021-06-10 Thread GitBox


cloud-fan commented on a change in pull request #32776:
URL: https://github.com/apache/spark/pull/32776#discussion_r649352199



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
##
@@ -76,19 +76,76 @@ case class CustomShuffleReaderExec private(
 val desc = if (isLocalReader) {
   "local"
 } else if (hasCoalescedPartition && hasSkewedPartition) {
-  "coalesced and skewed"
+  s"$coalescedDetail and $skewedDetail"

Review comment:
   No other plan node does this: metrics are only available through UI or 
programing APIs (e.g. `QueryExecutionListener`).
   
   I can't find a strong reason to start doing it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE

2021-06-09 Thread GitBox


cloud-fan commented on a change in pull request #32776:
URL: https://github.com/apache/spark/pull/32776#discussion_r648846235



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
##
@@ -76,19 +76,76 @@ case class CustomShuffleReaderExec private(
 val desc = if (isLocalReader) {
   "local"
 } else if (hasCoalescedPartition && hasSkewedPartition) {
-  "coalesced and skewed"
+  s"$coalescedDetail and $skewedDetail"
 } else if (hasCoalescedPartition) {
-  "coalesced"
+  coalescedDetail
 } else if (hasSkewedPartition) {
-  "skewed"
+  skewedDetail
 } else {
   ""
 }
 Iterator(desc)
   }
+  private def isCoalesced(spec: ShufflePartitionSpec) = coalesceRange(spec) > 1
+  /**
+   * How many partitions were coalesced; 0 if not [[CoalescedPartitionSpec]]
+   */
+  private def coalesceRange(spec: ShufflePartitionSpec) = spec match {
+case s: CoalescedPartitionSpec => s.endReducerIndex - s.startReducerIndex
+case _ => 0
+  }
+
+  /* This is left as documentation
+   * Is it worth reporting this?  For example, if we have
+   * MapOutputStatistics 0,0,0,72,0
+   * MapOutputStatistics 0,0,0,138,138
+   * with target partition size 10, we'll have
+   * CoalescedPartitionSpec(3,4) & CoalescedPartitionSpec(4,5)
+   * So pre-shuffle partitions 0,1,2 are dropped
+   * Another example, (target size 10)
+   * MapOutputStatistics 0,3,0,2,7
+   * MapOutputStatistics 0,2,0,2,7
+   * Results in CoalescedPartitionSpec(1,4) & CoalescedPartitionSpec(4,5)
+   * So pre-shuffle partition 2 is included
+   * We could figure out dropped partitions but doesn't seem that useful.

Review comment:
   I don't think it's useful to report this metrics.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE

2021-06-09 Thread GitBox


cloud-fan commented on a change in pull request #32776:
URL: https://github.com/apache/spark/pull/32776#discussion_r648845848



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
##
@@ -76,19 +76,76 @@ case class CustomShuffleReaderExec private(
 val desc = if (isLocalReader) {
   "local"
 } else if (hasCoalescedPartition && hasSkewedPartition) {
-  "coalesced and skewed"
+  s"$coalescedDetail and $skewedDetail"

Review comment:
   AFAIK plan node string never contains metrics before this PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE

2021-06-09 Thread GitBox


cloud-fan commented on a change in pull request #32776:
URL: https://github.com/apache/spark/pull/32776#discussion_r648845537



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
##
@@ -171,6 +238,19 @@ case class CustomShuffleReaderExec private(
 } else {
   Map.empty
 }
+  } ++ {
+if (isLocalReader) {
+  Map.empty
+} else {
+  if (hasCoalescedPartition) {
+Map("numCoalescedPartitions" ->
+  SQLMetrics.createMetric(sparkContext, "number of coalesced 
partitions"),
+  "numPartitionsToCoalesce" ->

Review comment:
   There is always a shuffle node below `CustomShuffleReader`. I think it 
makes more sense to let the shuffle node to report the metrics of the number of 
partitions/reducers.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32776: [SPARK-35639][SQL] Add metrics about coalesced partitions to CustomShuffleReader in AQE

2021-06-09 Thread GitBox


cloud-fan commented on a change in pull request #32776:
URL: https://github.com/apache/spark/pull/32776#discussion_r648844909



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
##
@@ -76,19 +76,76 @@ case class CustomShuffleReaderExec private(
 val desc = if (isLocalReader) {
   "local"
 } else if (hasCoalescedPartition && hasSkewedPartition) {
-  "coalesced and skewed"
+  s"$coalescedDetail and $skewedDetail"

Review comment:
   It makes sense to add more metrics but it doesn't make sense to include 
metrics in the plan node string.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org