szehon-ho commented on code in PR #51285:
URL: https://github.com/apache/spark/pull/51285#discussion_r2208191712
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/MergeRowsExec.scala:
##########
@@ -257,4 +272,27 @@ case class MergeRowsExec(
null
}
}
+
+ // For group based merge, copy is inserted if row matches no other case
+ private def incrementCopyMetric(): Unit = longMetric("numTargetRowsCopied") += 1
Review Comment:
For that one, I took a closer look. It looks like it came from this
discussion: https://github.com/apache/spark/pull/31451#discussion_r604572473.
In that case, the concern was about reading the metric from the DSV2 connector:
they wanted to avoid calling currentMetricsValue too often, because the external
implementation can be heavy.
In our case, reading the metric is an in-memory operation, so it should be quick.
On the sending end, SQLMetric is an accumulator, and updating it also just sets
an in-memory value.
So at first glance, I think the added complexity of batching the metric update
every 100 rows may not be worth it. But I may be missing something.
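For illustration, here is a minimal sketch of the batching pattern being discussed. This is not Spark's actual SQLMetric; `SimpleMetric` and `BatchedUpdater` are hypothetical stand-ins showing what "update the metric per 100 rows" would look like compared to incrementing per row:

```scala
// Minimal sketch (NOT Spark's SQLMetric): a plain in-memory long accumulator.
object BatchedMetricSketch {
  final class SimpleMetric {
    private var value = 0L
    def add(v: Long): Unit = value += v
    def get: Long = value
  }

  // Hypothetical helper: accumulates increments locally and only touches the
  // underlying metric once per `batchSize` rows (plus a final flush).
  final class BatchedUpdater(metric: SimpleMetric, batchSize: Int = 100) {
    private var pending = 0L
    def increment(): Unit = {
      pending += 1
      if (pending >= batchSize) flush()
    }
    def flush(): Unit = {
      metric.add(pending)
      pending = 0L
    }
  }

  def main(args: Array[String]): Unit = {
    val metric  = new SimpleMetric
    val updater = new BatchedUpdater(metric)
    (1 to 250).foreach(_ => updater.increment()) // flushes at rows 100 and 200
    updater.flush()                              // flush the remaining 50
    println(metric.get)                          // 250
  }
}
```

Since a SQLMetric update is itself just an in-memory write, the batching layer saves very little here, which is the point being made above.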
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]