[ 
https://issues.apache.org/jira/browse/SPARK-30296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30296:
----------------------------------
    Affects Version/s:     (was: 2.4.4)
                       3.0.0

> Dataset diffing transformation
> ------------------------------
>
>                 Key: SPARK-30296
>                 URL: https://issues.apache.org/jira/browse/SPARK-30296
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Enrico Minack
>            Priority: Major
>
> Evolving Spark code needs frequent regression testing to prove that it still 
> produces identical results or, if changes are expected, to investigate those 
> changes. Diffing the Datasets of two code paths provides that confidence.
> Diffing Datasets with a small schema is easy, but with a wide schema the 
> Spark query becomes laborious and error-prone. With a single proven and 
> tested method, diffing becomes an easier and more reliable operation. As a 
> Dataset transformation, the operation is available directly through the 
> Dataset API.
> This has proven useful for interactive Spark sessions as well as deployed 
> production code.
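
To make the idea in the description concrete, here is a minimal sketch in plain Python of what a key-based diff computes. It is not Spark code and not the implementation proposed in this issue. The function name `diff_rows`, the `id` key column, and the one-letter markers (N = no change, C = changed, D = deleted, I = inserted) are assumptions chosen for illustration only.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

Row = Dict[str, object]
DiffEntry = Tuple[str, Tuple[object, ...], Optional[Row], Optional[Row]]


def diff_rows(left: Iterable[Row], right: Iterable[Row],
              id_cols: Sequence[str]) -> List[DiffEntry]:
    """Compare two collections of rows that share the key columns `id_cols`.

    Hypothetical helper: it mimics a full outer join on the key columns
    followed by a comparison of the remaining columns.
    """
    def key(row: Row) -> Tuple[object, ...]:
        return tuple(row[c] for c in id_cols)

    left_by_key = {key(r): r for r in left}
    right_by_key = {key(r): r for r in right}

    result: List[DiffEntry] = []
    # Visit every key that occurs on either side (the "full outer join").
    for k in sorted(left_by_key.keys() | right_by_key.keys()):
        l, r = left_by_key.get(k), right_by_key.get(k)
        if r is None:
            marker = "D"   # present only on the left: deleted
        elif l is None:
            marker = "I"   # present only on the right: inserted
        elif l == r:
            marker = "N"   # identical on both sides: no change
        else:
            marker = "C"   # same key, different values: changed
        result.append((marker, k, l, r))
    return result


# Made-up sample data standing in for the outputs of two code paths.
before = [{"id": 1, "value": "one"},
          {"id": 2, "value": "two"},
          {"id": 3, "value": "three"}]
after = [{"id": 1, "value": "one"},
         {"id": 2, "value": "Two"},
         {"id": 4, "value": "four"}]

changes = diff_rows(before, after, ["id"])
for marker, k, l, r in changes:
    print(marker, k, l, r)
```

With a wide schema, the comparison step is where a hand-written query tends to become long and error-prone, because every value column has to be compared explicitly. That is the part a single tested transformation would take over.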



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
