[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213199#comment-15213199 ]

Luke Miner commented on SPARK-14141:
------------------------------------

If that's the case, it sounds doable. One way would be to convert each chunk 
into a dataframe using the `from_records` constructor, coerce each column to 
the specified datatype, and then append the chunks to the final dataframe.
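
Something along these lines, as a rough sketch (the chunk iterator, `columns`, 
and `dtypes` names are hypothetical stand-ins for whatever the collection side 
would provide; a single `concat` at the end stands in for repeated appends, 
which copy quadratically):

{code}
import pandas as pd

def to_pandas_with_dtypes(iter_chunks, columns, dtypes):
    # Build each chunk via from_records and coerce its columns right away,
    # so memory-heavy object columns are shrunk as early as possible.
    frames = [
        pd.DataFrame.from_records(chunk, columns=columns).astype(dtypes)
        for chunk in iter_chunks
    ]
    # One concat at the end instead of appending frame-by-frame.
    return pd.concat(frames, ignore_index=True)
{code}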

Or maybe it's worth opening an issue against pandas to allow type coercion as 
part of the `from_records` constructor...

> Let user specify datatypes of pandas dataframe in toPandas()
> ------------------------------------------------------------
>
>                 Key: SPARK-14141
>                 URL: https://issues.apache.org/jira/browse/SPARK-14141
>             Project: Spark
>          Issue Type: New Feature
>          Components: Input/Output, PySpark, SQL
>            Reporter: Luke Miner
>            Priority: Minor
>
> It would be nice to be able to specify the dtypes of the pandas dataframe 
> during the toPandas() call. Something like:
> {code}
> pdf = df.toPandas(dtypes={'a': 'float64', 'b': 'datetime64', 'c': 'bool',
>                           'd': 'category'})
> {code}
> Since dtypes like `category` are more memory efficient, this option could 
> let you load many more rows into a pandas dataframe without running out 
> of memory.
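
For context on the quoted proposal: the workaround today is a two-step 
collect-then-coerce, which materializes the whole frame at default dtypes 
before `astype` can shrink it; that peak-memory cost is exactly what a 
`dtypes` argument would avoid. A hedged sketch, assuming `df` is a PySpark 
DataFrame with matching columns:

{code}
# Today's workaround: collect everything, then coerce. The full frame
# exists at default dtypes before astype() runs, so the savings from
# 'category' etc. only kick in after the memory peak has already passed.
pdf = df.toPandas().astype({
    'a': 'float64',
    'b': 'datetime64[ns]',  # unit-qualified spelling that astype expects
    'c': 'bool',
    'd': 'category',
})
{code}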


