Repository: spark
Updated Branches:
  refs/heads/branch-2.3 8ef323c57 -> bfbc2d41b


[SPARK-23062][SQL] Improve EXCEPT documentation

## What changes were proposed in this pull request?

Make the default behavior of EXCEPT (i.e. EXCEPT DISTINCT) more
explicit in the documentation, and call out the change in behavior
from 1.x.
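
As an illustration of the distinction the documentation now spells out, here is a
minimal pure-Python sketch of standard SQL `EXCEPT DISTINCT` versus `EXCEPT ALL`
semantics. It does not use Spark; the helper names `except_distinct` and
`except_all` and the sample rows are hypothetical and exist only for this example.

```python
from collections import Counter


def except_distinct(left, right):
    """Rows of `left` that do not appear in `right`, with duplicates removed."""
    right_set = set(right)
    seen = set()
    result = []
    for row in left:
        # Keep a row only if it is absent from `right` and not already emitted.
        if row not in right_set and row not in seen:
            seen.add(row)
            result.append(row)
    return result


def except_all(left, right):
    """Multiset difference: each row in `right` cancels one matching row in `left`."""
    remaining = Counter(right)
    result = []
    for row in left:
        if remaining[row] > 0:
            # One occurrence in `right` removes one occurrence from `left`.
            remaining[row] -= 1
        else:
            result.append(row)
    return result


# Hypothetical sample data: rows are plain tuples.
left = [("a", 1), ("a", 1), ("a", 1), ("b", 2), ("b", 2), ("c", 3)]
right = [("a", 1), ("c", 3)]

distinct_rows = except_distinct(left, right)  # duplicates of ("b", 2) collapse
all_rows = except_all(left, right)            # duplicates are preserved
```

Under `EXCEPT DISTINCT`, the result holds at most one copy of each surviving row,
which is the behavior the updated docs attribute to `except`.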

Author: Henry Robinson <he...@cloudera.com>

Closes #20254 from henryr/spark-23062.

(cherry picked from commit 1f3d933e0bd2b1e934a233ed699ad39295376e71)
Signed-off-by: gatorsmile <gatorsm...@gmail.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bfbc2d41
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bfbc2d41
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bfbc2d41

Branch: refs/heads/branch-2.3
Commit: bfbc2d41b8a9278b347b6df2d516fe4679b41076
Parents: 8ef323c
Author: Henry Robinson <he...@cloudera.com>
Authored: Wed Jan 17 16:01:41 2018 +0800
Committer: gatorsmile <gatorsm...@gmail.com>
Committed: Wed Jan 17 16:02:04 2018 +0800

----------------------------------------------------------------------
 R/pkg/R/DataFrame.R                                        | 2 +-
 python/pyspark/sql/dataframe.py                            | 3 ++-
 sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/bfbc2d41/R/pkg/R/DataFrame.R
----------------------------------------------------------------------
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 6caa125..29f3e98 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -2853,7 +2853,7 @@ setMethod("intersect",
 #' except
 #'
 #' Return a new SparkDataFrame containing rows in this SparkDataFrame
-#' but not in another SparkDataFrame. This is equivalent to \code{EXCEPT} in SQL.
+#' but not in another SparkDataFrame. This is equivalent to \code{EXCEPT DISTINCT} in SQL.
 #'
 #' @param x a SparkDataFrame.
 #' @param y a SparkDataFrame.

http://git-wip-us.apache.org/repos/asf/spark/blob/bfbc2d41/python/pyspark/sql/dataframe.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 95eca76..2d5e9b9 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -1364,7 +1364,8 @@ class DataFrame(object):
         """ Return a new :class:`DataFrame` containing rows in this frame
         but not in another frame.
 
-        This is equivalent to `EXCEPT` in SQL.
+        This is equivalent to `EXCEPT DISTINCT` in SQL.
+
         """
         return DataFrame(getattr(self._jdf, "except")(other._jdf), self.sql_ctx)
 

http://git-wip-us.apache.org/repos/asf/spark/blob/bfbc2d41/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index 34f0ab5..912f411 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1903,7 +1903,7 @@ class Dataset[T] private[sql](
 
   /**
    * Returns a new Dataset containing rows in this Dataset but not in another Dataset.
-   * This is equivalent to `EXCEPT` in SQL.
+   * This is equivalent to `EXCEPT DISTINCT` in SQL.
    *
    * @note Equality checking is performed directly on the encoded representation of the data
    * and thus is not affected by a custom `equals` function defined on `T`.


