ZiyaZa commented on code in PR #55576:
URL: https://github.com/apache/spark/pull/55576#discussion_r3159524955
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala:
##########
@@ -350,9 +350,14 @@ case class ReplaceDataExec(
// One of the metrics couldn't be found, also mark numDeletedRows as
not found.
-1L
}
+
+ // SQLMetric.set call is a no-op if value is -1. Override numDeletedRows
value in summary.
Review Comment:
If my understanding is correct, setting the initial value as -1 will result
it in the final numDeletedRows metric value being equal to -1 x num_of_tasks as
no task calls .add() on it and each task will send -1 as the value and finally
sum these up. I'm not sure how it behaves with task failures etc.
There are also places where we legitimately rely on metric values being
initialized to 0 unless updated. For instance, MERGE numTargetRowsCopied should
stay as 0, unless we have a row that is being copied and then we increment it.
Since the meaning of `-1` is only documented at Summary classes as not
found, I don't think we have to make the metrics behave the same way. It looks
like metrics were not built to support negative values properly.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]