EnricoMi commented on PR #47688:
URL: https://github.com/apache/spark/pull/47688#issuecomment-2296226981

   The [spark-extension](https://github.com/G-Research/spark-extension) 
package provides some [Dataset diff 
tooling](https://github.com/G-Research/spark-extension/blob/master/DIFF.md). 
There, a user-defined comparison can be defined simply by implementing the 
[`scala.math.Equiv` 
interface](https://www.scala-lang.org/api/2.13.5/scala/math/Equiv.html): 
https://github.com/G-Research/spark-extension/blob/master/src/main/scala/uk/co/gresearch/spark/diff/DiffComparators.scala#L41
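
   For illustration, all a user has to supply under this scheme is an `Equiv` implementation. The epsilon comparison below is a hypothetical example, not taken from the package; the actual registration API is in the linked sources:

```scala
import scala.math.Equiv

// Hypothetical user-defined comparator: treat two doubles as equal
// when they differ by less than an epsilon.
val epsilonEquiv: Equiv[Double] = new Equiv[Double] {
  def equiv(x: Double, y: Double): Boolean = math.abs(x - y) < 1e-6
}
```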
   
   That `Equiv` implementation is wrapped into an `Expression` (including 
codegen) and turned into a `Comparator` that the package then uses to 
diff columns: given two columns `left` and `right`, it returns a `Column` that 
evaluates to `Boolean` (comparing the two columns):
   - 
[EquivDiffComparator.scala:34](https://github.com/G-Research/spark-extension/blob/master/src/main/scala/uk/co/gresearch/spark/diff/comparator/EquivDiffComparator.scala#L34)
   - 
[EquivDiffComparator.scala:67](https://github.com/G-Research/spark-extension/blob/master/src/main/scala/uk/co/gresearch/spark/diff/comparator/EquivDiffComparator.scala#L67)
   
   This obviously won't work for Spark Connect, but with the Column Node API it 
no longer works for the classic Spark client either.
   
   The package supports Spark 3.0 - 3.5. Being able to create a `Column` from an 
`Expression` would allow keeping this working for Spark 4.0 with a non-Connect 
client with minimal changes. This is what I meant by backward compatibility.
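
   A minimal sketch of the pattern in question, assuming a classic (non-Connect) Spark 3.x client where `Column` exposes a constructor taking a Catalyst `Expression` (`columnFor` is a hypothetical helper, not part of either API):

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Expression

// Hypothetical helper: wrap an arbitrary Catalyst Expression in a Column,
// as was possible up to Spark 3.5. The ask here is an equivalent public
// way to do this in Spark 4.0 for the classic client.
def columnFor(expr: Expression): Column = new Column(expr)
```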
   
   In order to support Spark Connect, there is no way around using the Spark 
Connect plugin / extensions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

