Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19575
  
    I have two major comments.
    - Rename `group map` to `grouped map`. We also need to update the constants in `PythonEvalType` accordingly (see the sketch after this list):
        > SQL_PANDAS_GROUP_MAP_UDF -> SQL_PANDAS_GROUPED_MAP_UDF
        > SQL_PANDAS_GROUP_AGG_UDF -> SQL_PANDAS_GROUPED_AGG_UDF
    
    - Open a JIRA to add another limit in the next release (2.4) based on memory consumption instead of number of rows. My main reason is that row sizes can differ across queries, so the session-based SQLConf `spark.sql.execution.arrow.maxRecordsPerBatch` may need to be adjusted per query, and such a conf is hard for users to tune (an illustration follows this list).
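
    A minimal sketch of the first rename, assuming the enum-style constants on the Python side of `PythonEvalType`; the numeric values shown are illustrative, not confirmed from this PR:

    ```python
    class PythonEvalType(object):
        """Eval types for Python UDFs (values are illustrative)."""
        NON_UDF = 0
        SQL_BATCHED_UDF = 100
        SQL_PANDAS_SCALAR_UDF = 200
        # Renamed from SQL_PANDAS_GROUP_MAP_UDF / SQL_PANDAS_GROUP_AGG_UDF:
        SQL_PANDAS_GROUPED_MAP_UDF = 201
        SQL_PANDAS_GROUPED_AGG_UDF = 202
    ```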
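
    And a minimal illustration of the second point. The conf name comes from this comment, but the value below is hypothetical; because the cap is session-scoped and counts rows rather than bytes, a setting that suits narrow rows can still exhaust memory on wide rows:

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Session-wide cap on rows per Arrow batch. One value applies to every
    # query in the session, even though a batch's memory footprint is
    # roughly rows * average row size, which varies query by query.
    spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", "1000")
    ```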

