szehon-ho commented on code in PR #51285:
URL: https://github.com/apache/spark/pull/51285#discussion_r2208191712
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/MergeRowsExec.scala:
##########
@@ -257,4 +272,27 @@ case class MergeRowsExec(
null
}
}
+
+ // For group based merge, copy is inserted if row matches no other case
+ private def incrementCopyMetric(): Unit = longMetric("numTargetRowsCopied") += 1
Review Comment:
For that one, I took a closer look. It looks like it came from this
discussion: https://github.com/apache/spark/pull/31451#discussion_r604572473.
In that case, the concern was about reading the metric from the DSV2 connector:
they wanted to avoid calling currentMetricsValue too often, because the external
implementation can be heavy.
In our case, reading the metric is an in-memory operation, so it should be quick.
On the sending end, SQLMetric is an accumulator, and updating it also just sets
an in-memory value.
So at first glance, I think the added complexity of batching the metric update
every 100 rows may not be worth it. But I may be missing something.
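For illustration, here is a minimal sketch of the batching pattern being discussed. This is not Spark's actual SQLMetric; `SimpleMetric` and `BatchedUpdater` are hypothetical stand-ins showing what "update the metric per 100 rows" would look like compared to incrementing per row:

```scala
// Minimal sketch (NOT Spark's SQLMetric): a plain in-memory long accumulator.
object BatchedMetricSketch {
  final class SimpleMetric {
    private var value = 0L
    def add(v: Long): Unit = value += v
    def get: Long = value
  }

  // Hypothetical helper: accumulates increments locally and only touches the
  // underlying metric once per `batchSize` rows (plus a final flush).
  final class BatchedUpdater(metric: SimpleMetric, batchSize: Int = 100) {
    private var pending = 0L
    def increment(): Unit = {
      pending += 1
      if (pending >= batchSize) flush()
    }
    def flush(): Unit = {
      metric.add(pending)
      pending = 0L
    }
  }

  def main(args: Array[String]): Unit = {
    val metric  = new SimpleMetric
    val updater = new BatchedUpdater(metric)
    (1 to 250).foreach(_ => updater.increment()) // flushes at rows 100 and 200
    updater.flush()                              // flush the remaining 50
    println(metric.get)                          // 250
  }
}
```

Since a SQLMetric update is itself just an in-memory write, the batching layer saves very little here, which is the point being made above.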
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]