[ https://issues.apache.org/jira/browse/DATAFU-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eyal Allweil closed DATAFU-159.
-------------------------------
    Resolution: Won't Do

I can see that the spark-extension library has artifacts in Maven Central for Spark 2.4. So there's no reason I can see for implementing it here. Since there are no objections, I am closing this issue.

> Add diff functionality to datafu-spark
> --------------------------------------
>
>                 Key: DATAFU-159
>                 URL: https://issues.apache.org/jira/browse/DATAFU-159
>             Project: DataFu
>          Issue Type: New Feature
>            Reporter: Eyal Allweil
>            Priority: Major
>
> A useful feature when examining results is the ability to clearly understand
> the differences between two datasets - for example, doing regressions between
> expected and actual results.
> Spark provides the _except_ functionality, but this is often not enough for
> this - for example, see [this question on Stack Overflow.|https://stackoverflow.com/questions/44338412/how-to-compare-two-dataframe-and-print-columns-that-are-different-in-scala]
> Datafu-pig had a macro for doing this, and this could be a useful addition to
> datafu-spark.
>

--
This message was sent by Atlassian Jira
(v8.20.7#820007)