ZiyaZa commented on code in PR #55576:
URL: https://github.com/apache/spark/pull/55576#discussion_r3159524955


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala:
##########
@@ -350,9 +350,14 @@ case class ReplaceDataExec(
         // One of the metrics couldn't be found, also mark numDeletedRows as 
not found.
         -1L
       }
+
+      // SQLMetric.set call is a no-op if value is -1. Override numDeletedRows 
value in summary.

Review Comment:
   If my understanding is correct, setting the initial value as -1 will result 
it in the final numDeletedRows metric value being equal to -1 x num_of_tasks as 
no task calls .add() on it and each task will send -1 as the value and finally 
sum these up. I'm not sure how it behaves with task failures etc.
   
   There are also places where we legitimately rely on metric values being 
initialized to 0 unless updated. For instance, MERGE numTargetRowsCopied should 
stay as 0, unless we have a row that is being copied and then we increment it.
   
   Since the meaning of `-1` is only documented at Summary classes as not 
found, I don't think we have to make the metrics behave the same way. It looks 
like metrics were not built to support negative values properly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to