Re: question about pyarrow.Table to pyspark.DataFrame conversion

2020-10-24 Thread shouheng
Hi Bryan, I came across SPARK-29040 and I'm very excited that others are looking for such a feature as well. It would be tremendously useful if we could implement it. Currently, my workaround is to serialize `pyarrow.Table` to a parquet
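
For readers who land on this thread, a minimal sketch of that workaround, assuming a local Parquet path and an illustrative nested column (the path and column names below are not from the original message):

```python
import pyarrow as pa
import pyarrow.parquet as pq
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# An Arrow table with a nested list<int64> column, which pandas can only
# hold as a column of Python objects.
table = pa.table({
    "id": [1, 2, 3],
    "scores": [[10, 20], [30], [40, 50, 60]],
})

# The workaround: round-trip through Parquet. pyarrow writes the file,
# Spark reads it back and exposes the nested column as ArrayType(LongType).
pq.write_table(table, "/tmp/scores.parquet")
df = spark.read.parquet("/tmp/scores.parquet")
df.printSchema()
df.show()
```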

Re: question about pyarrow.Table to pyspark.DataFrame conversion

2019-09-10 Thread Bryan Cutler
Hi Artem, I don't believe this is currently possible, but it could be a great addition to PySpark since this would offer a convenient and efficient way to parallelize nested column data. I created the JIRA https://issues.apache.org/jira/browse/SPARK-29040 for this. On Tue, Aug 27, 2019 at 7:55

question about pyarrow.Table to pyspark.DataFrame conversion

2019-08-27 Thread Artem Kozhevnikov
I wonder if there's a recommended method to convert an in-memory pyarrow.Table (or pyarrow.RecordBatch) to a pyspark.DataFrame without going through pandas? My motivation is converting nested data (like List[int]) that has an efficient representation in pyarrow which is not possible with Pandas (I
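
For context, a minimal sketch of the pandas-based route the question is trying to avoid (column names here are illustrative, not from the original message):

```python
import pyarrow as pa
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

table = pa.table({"id": [1, 2], "tags": [[1, 2, 3], [4]]})

# to_pandas() materializes the list<int64> column as Python lists in an
# object-dtype column, so the subsequent conversion relies on row-by-row
# schema inference rather than Arrow's columnar representation.
pdf = table.to_pandas()
df = spark.createDataFrame(pdf)
df.printSchema()  # id: long, tags: array<long>
```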