Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20254#discussion_r161377477

    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1364,7 +1364,9 @@ def subtract(self, other):
             """ Return a new :class:`DataFrame` containing rows in this frame
             but not in another frame.
     
    -        This is equivalent to `EXCEPT` in SQL.
    +        This is equivalent to `EXCEPT DISTINCT` in SQL.
    +
    +        (Note: Before Spark 2.0, the behavior was equivalent to `EXCEPT ALL` in SQL.)
    --- End diff --

    Actually, before 2.0, it is not equivalent to `EXCEPT ALL`. For details, see the PR: https://github.com/apache/spark/pull/12736
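For readers comparing the two SQL forms named in the diff, here is a minimal sketch of their semantics on plain Python lists. It is only a model of standard SQL `EXCEPT DISTINCT` and `EXCEPT ALL` behavior, not PySpark's implementation, and it says nothing about what `subtract` did before Spark 2.0 (see the linked PR for that). The helper names `except_distinct` and `except_all` are hypothetical.

```python
from collections import Counter


def except_distinct(left, right):
    """Model of SQL EXCEPT DISTINCT (hypothetical helper).

    Keeps rows of `left` that never appear in `right`, with duplicates removed.
    Order is preserved here for readability; SQL itself guarantees no order.
    """
    right_set = set(right)
    seen = set()
    result = []
    for row in left:
        if row not in right_set and row not in seen:
            seen.add(row)
            result.append(row)
    return result


def except_all(left, right):
    """Model of SQL EXCEPT ALL (hypothetical helper).

    Multiset difference: each occurrence in `right` cancels exactly one
    occurrence in `left`; leftover duplicates are kept.
    """
    remaining = Counter(right)
    result = []
    for row in left:
        if remaining[row] > 0:
            remaining[row] -= 1  # consume one matching copy from `right`
        else:
            result.append(row)
    return result


left = [1, 1, 1, 2, 3]
right = [1, 3]
print(except_distinct(left, right))  # [2]
print(except_all(left, right))       # [1, 1, 2]
```

The difference shows up only when `left` contains duplicates: `EXCEPT DISTINCT` drops every copy of a matched row and deduplicates the rest, while `EXCEPT ALL` removes one copy per match.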