Bryan Cutler created SPARK-22221:
------------------------------------

             Summary: Add User Documentation for Working with Arrow in Spark
                 Key: SPARK-22221
                 URL: https://issues.apache.org/jira/browse/SPARK-22221
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark, SQL
    Affects Versions: 2.3.0
            Reporter: Bryan Cutler

There needs to be user-facing documentation that shows how to enable and use Arrow with Spark, explains what the user should expect, and describes any differences from similar existing functionality.

A comment from Xiao Li on https://github.com/apache/spark/pull/18664:

Suppose users/applications have Timestamp columns in their Datasets, and their processing algorithms contain code that relies on the corresponding time-zone assumptions.

* New users/applications: they enable Arrow first and later hit an Arrow bug. Can they simply turn off spark.sql.execution.arrow.enable? If not, what should they do?
* Existing users/applications: they want to use Arrow for better performance. Can they just turn on spark.sql.execution.arrow.enable? What should they do?

Note: Hopefully, the guides/solutions are user-friendly. That means they must be very simple for most users to understand.
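
As a possible starting point for such a guide, the on/off question raised in the comment could be illustrated with a small PySpark sketch like the one below. It is only a sketch under stated assumptions: the configuration key is copied verbatim from this issue and may be named differently in a released version (for example, spark.sql.execution.arrow.enabled), and the helpers set_arrow and to_pandas_with_fallback are hypothetical names, not part of Spark. It uses only spark.conf.set and DataFrame.toPandas, and it does not address the time-zone differences mentioned above.

```python
# Config key as quoted in this issue; released Spark versions may name it
# differently, so check the configuration docs for your version.
ARROW_KEY = "spark.sql.execution.arrow.enable"


def set_arrow(spark, enabled):
    """Hypothetical helper: turn Arrow-based conversion on or off at runtime."""
    spark.conf.set(ARROW_KEY, "true" if enabled else "false")


def to_pandas_with_fallback(spark, df):
    """Hypothetical helper: try toPandas() with Arrow, retry without it on error."""
    set_arrow(spark, True)
    try:
        return df.toPandas()
    except Exception:
        # Assumption: any failure on the Arrow path is worth one retry on the
        # non-Arrow path. This does not reconcile timestamp/time-zone behavior
        # between the two paths.
        set_arrow(spark, False)
        return df.toPandas()
```

With a live session, this would be called as to_pandas_with_fallback(spark, df), where spark is an existing SparkSession and df is a DataFrame created from it. Whether a silent fallback is acceptable depends on the application; a user guide might instead recommend failing loudly so that result differences between the two paths are noticed.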