spark git commit: [SPARK-24392][PYTHON] Label pandas_udf as Experimental

gurwls223 Sun, 27 May 2018 22:02:03 -0700

Repository: spark
Updated Branches:
  refs/heads/branch-2.3 9b0f6f530 -> 8bb6c2285



[SPARK-24392][PYTHON] Label pandas_udf as Experimental

The pandas_udf functionality was introduced in 2.3.0, but it is not completely 
stable and is still evolving. This adds a label to indicate that it is still an 
experimental API.

NA

Author: Bryan Cutler <cutl...@gmail.com>

Closes #21435 from BryanCutler/arrow-pandas_udf-experimental-SPARK-24392.

(cherry picked from commit fa2ae9d2019f839647d17932d8fea769e7622777)
Signed-off-by: hyukjinkwon <gurwls...@apache.org>
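
For illustration only: the commit adds the `.. note:: Experimental` label by
hand-editing each docstring. A minimal sketch of the same labeling idea,
assuming a hypothetical `experimental` decorator and a hypothetical
`my_vectorized_udf` function (neither is part of PySpark or of this commit),
might look like this:

```python
import textwrap

# The Sphinx directive text the commit adds to the affected docstrings.
EXPERIMENTAL_NOTE = ".. note:: Experimental"


def experimental(func):
    """Hypothetical helper: append the Experimental note to func's docstring."""
    doc = textwrap.dedent(func.__doc__ or "").strip()
    prefix = doc + "\n\n" if doc else ""
    func.__doc__ = prefix + EXPERIMENTAL_NOTE + "\n"
    return func


@experimental
def my_vectorized_udf(values):
    """Return each value plus one."""
    # Stand-in for a user function; plain Python, no Spark required.
    return [v + 1 for v in values]


print(my_vectorized_udf.__doc__)
```

The decorator changes only the docstring, so the function's behavior is
untouched; this mirrors the commit, which changes documentation and no code
paths.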


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8bb6c228
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8bb6c228
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8bb6c228

Branch: refs/heads/branch-2.3
Commit: 8bb6c2285c6017f28d8c94f4030df518f6d3048d
Parents: 9b0f6f5
Author: Bryan Cutler <cutl...@gmail.com>
Authored: Mon May 28 12:56:05 2018 +0800
Committer: hyukjinkwon <gurwls...@apache.org>
Committed: Mon May 28 12:57:18 2018 +0800

----------------------------------------------------------------------
 docs/sql-programming-guide.md   | 4 ++++
 python/pyspark/sql/dataframe.py | 2 ++
 python/pyspark/sql/functions.py | 2 ++
 python/pyspark/sql/group.py     | 2 ++
 python/pyspark/sql/session.py   | 2 ++
 5 files changed, 12 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/8bb6c228/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 651e440..14bc5e6 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1797,6 +1797,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see
 
 # Migration Guide
 
+## Upgrading From Spark SQL 2.3.0 to 2.3.1 and above
+
+  - As of version 2.3.1 Arrow functionality, including `pandas_udf` and `toPandas()`/`createDataFrame()` with `spark.sql.execution.arrow.enabled` set to `True`, has been marked as experimental. These are still evolving and not currently recommended for use in production.
+
 ## Upgrading From Spark SQL 2.2 to 2.3
 
   - Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the referenced columns only include the internal corrupt record column (named `_corrupt_record` by default). For example, `spark.read.schema(schema).json(file).filter($"_corrupt_record".isNotNull).count()` and `spark.read.schema(schema).json(file).select("_corrupt_record").show()`. Instead, you can cache or save the parsed results and then send the same query. For example, `val df = spark.read.schema(schema).json(file).cache()` and then `df.filter($"_corrupt_record".isNotNull).count()`.

http://git-wip-us.apache.org/repos/asf/spark/blob/8bb6c228/python/pyspark/sql/dataframe.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 9bb0dca..d416b3b 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -1924,6 +1924,8 @@ class DataFrame(object):
         .. note:: This method should only be used if the resulting Pandas's DataFrame is expected
             to be small, as all the data is loaded into the driver's memory.
 
+        .. note:: Usage with spark.sql.execution.arrow.enabled=True is experimental.
+
         >>> df.toPandas()  # doctest: +SKIP
            age   name
         0    2  Alice

http://git-wip-us.apache.org/repos/asf/spark/blob/8bb6c228/python/pyspark/sql/functions.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 365be7b..cf26523 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -2172,6 +2172,8 @@ def pandas_udf(f=None, returnType=None, functionType=None):
     :param functionType: an enum value in :class:`pyspark.sql.functions.PandasUDFType`.
                          Default: SCALAR.
 
+    .. note:: Experimental
+
     The function type of the UDF can be one of the following:
 
     1. SCALAR

http://git-wip-us.apache.org/repos/asf/spark/blob/8bb6c228/python/pyspark/sql/group.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/group.py b/python/pyspark/sql/group.py
index 330faf2..bc6c094 100644
--- a/python/pyspark/sql/group.py
+++ b/python/pyspark/sql/group.py
@@ -212,6 +212,8 @@ class GroupedData(object):
         This function does not support partial aggregation, and requires shuffling all the data in
         the :class:`DataFrame`.
 
+        .. note:: Experimental
+
         :param udf: a grouped map user-defined function returned by
             :func:`pyspark.sql.functions.pandas_udf`.
 

http://git-wip-us.apache.org/repos/asf/spark/blob/8bb6c228/python/pyspark/sql/session.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py
index 2ac2ec2..a459cb5 100644
--- a/python/pyspark/sql/session.py
+++ b/python/pyspark/sql/session.py
@@ -578,6 +578,8 @@ class SparkSession(object):
         .. versionchanged:: 2.1
            Added verifySchema.
 
+        .. note:: Usage with spark.sql.execution.arrow.enabled=True is experimental.
+
         >>> l = [('Alice', 1)]
         >>> spark.createDataFrame(l).collect()
         [Row(_1=u'Alice', _2=1)]


