Enrico Minack created SPARK-30296:
-------------------------------------

             Summary: Dataset diffing transformation
                 Key: SPARK-30296
                 URL: https://issues.apache.org/jira/browse/SPARK-30296
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 2.4.4
            Reporter: Enrico Minack
             Fix For: 3.0.0


Evolving Spark code needs frequent regression testing to prove that it still 
produces identical results or, where changes are expected, to investigate those 
changes. Diffing the Datasets produced by the two code paths provides that 
confidence.

Diffing Datasets with a small schema is easy, but with a wide schema the Spark 
query becomes laborious and error-prone. A single proven and tested method 
makes diffing easier and more reliable. Offered as a Dataset transformation, 
the operation is available directly through the Dataset API.
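
To illustrate what such a diff computes, here is a minimal sketch in plain 
Python rather than Spark. It is not the proposed API: the function name 
diff_rows, the id_column parameter and the marker strings are all hypothetical, 
and rows are modelled as dictionaries keyed by column name. The idea is a full 
outer join of both sides on an id column, with each resulting pair labelled.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
DiffEntry = Tuple[str, Optional[Row], Optional[Row]]


def diff_rows(left: List[Row], right: List[Row], id_column: str) -> List[DiffEntry]:
    """Pair rows from both sides by id_column and label each pair.

    Hypothetical helper for illustration only; assumes ids are unique
    within each side and mutually comparable (needed for sorting).
    """
    left_by_id = {row[id_column]: row for row in left}
    right_by_id = {row[id_column]: row for row in right}

    result: List[DiffEntry] = []
    # Visit every id seen on either side (a full outer join), in sorted order.
    for key in sorted(left_by_id.keys() | right_by_id.keys()):
        l, r = left_by_id.get(key), right_by_id.get(key)
        if l is None:
            marker = "inserted"   # only present on the right side
        elif r is None:
            marker = "deleted"    # only present on the left side
        elif l == r:
            marker = "unchanged"  # all columns equal
        else:
            marker = "changed"    # same id, at least one column differs
        result.append((marker, l, r))
    return result


before = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}, {"id": 3, "value": "c"}]
after = [{"id": 1, "value": "a"}, {"id": 2, "value": "B"}, {"id": 4, "value": "d"}]

for marker, l, r in diff_rows(before, after, "id"):
    print(marker, l, r)
```

With a wide schema, the hand-written Spark equivalent has to repeat the 
column-by-column comparison for every field, which is where a single tested 
transformation would save effort.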

This has proven useful in interactive Spark sessions as well as in deployed 
production code.


