Enrico Minack created SPARK-30296:
-------------------------------------

             Summary: Dataset diffing transformation
                 Key: SPARK-30296
                 URL: https://issues.apache.org/jira/browse/SPARK-30296
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 2.4.4
            Reporter: Enrico Minack
             Fix For: 3.0.0

Evolving Spark code needs frequent regression testing to prove that it still produces identical results or, if changes are expected, to investigate those changes. Diffing the Datasets produced by two code paths provides that confidence.

Diffing Datasets with small schemata is easy, but with wide schemata the Spark query becomes laborious and error-prone. A single proven and tested method makes diffing an easier and more reliable operation. As a Dataset transformation, the operation is available directly through the Dataset API. It has proven useful for interactive Spark sessions as well as for deployed production code.
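
To make the idea concrete, here is a minimal sketch of what a keyed diff could compute. It is plain Python rather than Spark, and it is not the API proposed in this issue: the function name `diff_rows`, the `id_columns` parameter, and the four labels are hypothetical and chosen only for illustration. It assumes each row is uniquely identified by its id columns, which is roughly what a full outer join on those columns would rely on in a Spark query.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]
DiffRow = Tuple[str, Optional[Row], Optional[Row]]


def diff_rows(left: Sequence[Row], right: Sequence[Row],
              id_columns: Sequence[str]) -> List[DiffRow]:
    """Pair rows of `left` and `right` by their id columns and label each pair.

    Hypothetical helper: emulates a full outer join on `id_columns`.
    Assumes the id columns identify each row uniquely on both sides.
    """
    def key(row: Row) -> tuple:
        return tuple(row[c] for c in id_columns)

    left_by_key = {key(r): r for r in left}
    right_by_key = {key(r): r for r in right}

    result: List[DiffRow] = []
    # Sorting the union of keys only makes the output order deterministic.
    for k in sorted(set(left_by_key) | set(right_by_key)):
        l_row = left_by_key.get(k)
        r_row = right_by_key.get(k)
        if l_row is None:
            label = "inserted"    # key exists only on the right side
        elif r_row is None:
            label = "deleted"     # key exists only on the left side
        elif l_row == r_row:
            label = "unchanged"   # same key, all columns equal
        else:
            label = "changed"     # same key, at least one column differs
        result.append((label, l_row, r_row))
    return result


if __name__ == "__main__":
    before = [{"id": 1, "value": "one"},
              {"id": 2, "value": "two"},
              {"id": 3, "value": "three"}]
    after = [{"id": 1, "value": "one"},
             {"id": 2, "value": "Two"},
             {"id": 4, "value": "four"}]
    for label, l_row, r_row in diff_rows(before, after, ["id"]):
        print(label, l_row, r_row)
```

With a wide schema, the equivalent Spark query has to compare every non-id column by hand, which is the laborious and error-prone part that a single tested transformation is meant to remove.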