Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22369#discussion_r216147674

    --- Diff: docs/sql-programming-guide.md ---
    @@ -1901,6 +1901,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see

     ## Upgrading From Spark SQL 2.3.0 to 2.3.1 and above

       - As of version 2.3.1 Arrow functionality, including `pandas_udf` and `toPandas()`/`createDataFrame()` with `spark.sql.execution.arrow.enabled` set to `True`, has been marked as experimental. These are still evolving and not currently recommended for use in production.
    +  - In version 2.3.1 and earlier, it is possible for PySpark to create a Row object by providing more value than column number through the customized Row class. Since Spark 2.3.3, Spark will confirm value length is less or equal than column length in PySpark. See [SPARK-25072](https://issues.apache.org/jira/browse/SPARK-25072) for details.
    --- End diff --

    Maybe say `..by providing more values than number of fields through a customized Row class. As of Spark 2.3.3, PySpark will raise a ValueError if the number of values are more than the number of fields. See...`
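
    As context for the suggested wording, a minimal sketch of the behavior being documented (assuming the Spark 2.3.3 check from SPARK-25072; the field names `name` and `age` are only illustrative):

    ```python
    from pyspark.sql import Row

    # A customized Row class with two fields.
    Person = Row("name", "age")

    # Providing the same number of values as fields works in all versions.
    alice = Person("Alice", 30)

    # In Spark 2.3.1 and earlier, extra values were silently accepted.
    # Since Spark 2.3.3, this raises a ValueError because three values
    # are supplied for a Row class with only two fields.
    try:
        bob = Person("Bob", 25, "extra-value")
    except ValueError as e:
        print(e)
    ```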