Repository: spark
Updated Branches:
  refs/heads/branch-2.3 a81ace196 -> bb7502f9a


[SPARK-23157][SQL] Explain restriction on column expression in withColumn()

## What changes were proposed in this pull request?

It's not obvious from the comments that any added column must be a
function of the dataset that we are adding it to. Add a comment to
that effect to Scala, Python and R Data* methods.
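
The restriction can be sketched as a small plain-Python model. This is
not Spark code: ToyFrame, ToyColumn and with_column are hypothetical
stand-ins, and the assumption that each column carries a unique attribute
id is a simplification made only for illustration.

```python
import itertools

# Hypothetical global source of attribute ids (illustration only).
_ids = itertools.count()


class ToyColumn:
    """An expression that remembers which attribute ids it reads."""

    def __init__(self, refs):
        self.refs = frozenset(refs)

    def __add__(self, other):
        # Literals such as 2 reference no attributes.
        other_refs = other.refs if isinstance(other, ToyColumn) else frozenset()
        return ToyColumn(self.refs | other_refs)


class ToyFrame:
    """A dataset described only by its column names and attribute ids."""

    def __init__(self, names=(), attrs=None):
        if attrs is not None:
            self.attrs = dict(attrs)
        else:
            self.attrs = {name: next(_ids) for name in names}

    def __getitem__(self, name):
        return ToyColumn({self.attrs[name]})

    def with_column(self, name, col):
        # Reject expressions that read attributes this frame does not supply.
        missing = col.refs - set(self.attrs.values())
        if missing:
            raise ValueError("column refers to attributes not supplied by this frame")
        new_attrs = dict(self.attrs)
        new_attrs[name] = next(_ids)
        return ToyFrame(attrs=new_attrs)


people = ToyFrame(["age", "name"])
other = ToyFrame(["age"])

# An expression over the same frame is accepted.
ok = people.with_column("age2", people["age"] + 2)

# A column taken from an unrelated frame is rejected, even though the name matches.
try:
    people.with_column("bad", other["age"])
    rejected = False
except ValueError:
    rejected = True
```

In this model the check is on attribute identity rather than on column
name, which is why other["age"] is rejected although people also has a
column called age.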

Author: Henry Robinson <he...@cloudera.com>

Closes #20429 from henryr/SPARK-23157.

(cherry picked from commit 8b983243e45dfe2617c043a3229a7d87f4c4b44b)
Signed-off-by: gatorsmile <gatorsm...@gmail.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bb7502f9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bb7502f9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bb7502f9

Branch: refs/heads/branch-2.3
Commit: bb7502f9a506d52365d7532b3b0281098dd85763
Parents: a81ace1
Author: Henry Robinson <he...@cloudera.com>
Authored: Mon Jan 29 22:19:59 2018 -0800
Committer: gatorsmile <gatorsm...@gmail.com>
Committed: Mon Jan 29 22:20:09 2018 -0800

----------------------------------------------------------------------
 R/pkg/R/DataFrame.R                                        | 3 ++-
 python/pyspark/sql/dataframe.py                            | 4 ++++
 sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala | 3 +++
 3 files changed, 9 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/bb7502f9/R/pkg/R/DataFrame.R
----------------------------------------------------------------------
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 29f3e98..547b5ea 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -2090,7 +2090,8 @@ setMethod("selectExpr",
 #'
 #' @param x a SparkDataFrame.
 #' @param colName a column name.
-#' @param col a Column expression, or an atomic vector in the length of 1 as literal value.
+#' @param col a Column expression (which must refer only to this DataFrame), or an atomic vector in
+#' the length of 1 as literal value.
 #' @return A SparkDataFrame with the new column added or the existing column replaced.
 #' @family SparkDataFrame functions
 #' @aliases withColumn,SparkDataFrame,character-method

http://git-wip-us.apache.org/repos/asf/spark/blob/bb7502f9/python/pyspark/sql/dataframe.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index ac40308..055b2c4 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -1829,11 +1829,15 @@ class DataFrame(object):
         Returns a new :class:`DataFrame` by adding a column or replacing the
         existing column that has the same name.
 
+        The column expression must be an expression over this dataframe; attempting to add
+        a column from some other dataframe will raise an error.
+
         :param colName: string, name of the new column.
         :param col: a :class:`Column` expression for the new column.
 
         >>> df.withColumn('age2', df.age + 2).collect()
         [Row(age=2, name=u'Alice', age2=4), Row(age=5, name=u'Bob', age2=7)]
+
         """
         assert isinstance(col, Column), "col should be Column"
         return DataFrame(self._jdf.withColumn(colName, col._jc), self.sql_ctx)

http://git-wip-us.apache.org/repos/asf/spark/blob/bb7502f9/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index cc5b647..d47cd0a 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -2150,6 +2150,9 @@ class Dataset[T] private[sql](
    * Returns a new Dataset by adding a column or replacing the existing column that has
    * the same name.
    *
+   * `column`'s expression must only refer to attributes supplied by this Dataset. It is an
+   * error to add a column that refers to some other Dataset.
+   *
    * @group untypedrel
    * @since 2.0.0
    */


