juliuszsompolski commented on code in PR #51091:
URL: https://github.com/apache/spark/pull/51091#discussion_r2153892692


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/MergeRowsExec.scala:
##########
@@ -69,6 +71,13 @@ case class MergeRowsExec(
     copy(child = newChild)
   }
 
+  override lazy val metrics: Map[String, SQLMetric] = Map(
+    "numTargetRowsCopied" -> SQLMetrics.createMetric(sparkContext,
+      "Number of target rows copied over because they did not match any 
condition."),
+    "numTargetRowsUnmatched" -> SQLMetrics.createMetric(sparkContext,
+      "Number of target rows processed that do not match any condition. " +
+        "These will be dropped for delta-based merge and retained for 
group-based merge."))

Review Comment:
   Hm, I think the "Unmatched" in the name is a bit confusing because of 
overload with the "MATCHED", "NOT MATCHED", "NOT MATCHED BY SOURCE" of the 
whole MERGE.
   
   When looking at it from the perspective of the whole MERGE:
   - you are already after the target-source join, so these are rows that can 
be MATCHED (or NOT MATCHED BY SOURCE), but then didn't pass any of the extra 
conditions of "WHEN MATCHED AND", and "WHEN NOT MATCHED BY SOURCE AND".
   - these are not really all target rows "processed" in the MERGE, because 
there are more rows scanned from the target that then may have been dropped in 
the join.
   
   But from the perspective of the MergeRows operator:
   - "processed" is fine, because these are rows processed by the MergeRows 
operator.
   - I asked my teammates what name could be better than "Unmatched" and I 
think "Unchanged". Because these are the rows that passed into the MergeRows 
operator, but then fell through all the instructions.
   
   So I think "numTargetRowsUnchanged". Maybe while at it also have a 
corresponding "numSourceRowsUnchanged"?



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/MergeRows.scala:
##########
@@ -87,7 +88,22 @@ object MergeRows {
     override def dataType: DataType = NullType
   }
 
-  case class Keep(condition: Expression, output: Seq[Expression]) extends 
Instruction {
+  // A special case of Keep where the row is kept as is.
+  case class CarryOver(output: Seq[Expression]) extends Instruction  {

Review Comment:
   nit: since then we name it "copied rows" in other places, maybe name it 
`Copy`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to