[ https://issues.apache.org/jira/browse/ARROW-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040835#comment-17040835 ]
Benjamin commented on ARROW-7885:
---------------------------------

Thanks for the feedback. The Flight project indeed looks very interesting! Maybe it is something we can migrate to in the future.

Regarding Parquet: I would like to be able to do something like this (similar to [https://arrow.apache.org/docs/python/ipc.html#arbitrary-object-serialization]):

{code:python}
import dask
import pyarrow as pa

df = dask.datasets.timeseries()
buf = pa.serialize(df).to_buffer()
{code}

Writing out individual tables (using Parquet) would mean that I have to handle the metadata of the dask dataframe myself. Since pickle works fine for dask dataframe serialization (and apparently has improved performance, which I didn't know), I see that extending the arbitrary object serialization to dask is not a priority.

> [Python] pyarrow.serialize does not support dask dataframe
> ----------------------------------------------------------
>
>                 Key: ARROW-7885
>                 URL: https://issues.apache.org/jira/browse/ARROW-7885
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: Python
>            Reporter: Benjamin
>            Priority: Minor
>
> Currently pyarrow knows how to serialize pandas dataframes but not dask dataframes.
> {code:java}
> SerializationCallbackError: pyarrow does not know how to serialize objects of type <class 'dask.dataframe.core.DataFrame'>. {code}
> Pickling the dask dataframe foregoes the benefits of using pyarrow for the sub-dataframes.
> Pyarrow support for serializing dask dataframes would allow storing dataframes efficiently in a database instead of a file system (e.g. Parquet).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)