Re: Arrow RecordBatches/Pandas Dataframes to (Arrow enabled) Spark Dataframe conversion in streaming fashion

Jorge Machado Mon, 25 May 2020 07:52:33 -0700

Hey, as far as I know, you can union them: df.union(df2)

Not sure if this is what you need.


> On 25. May 2020, at 13:53, Tanveer Ahmad - EWI <t.ah...@tudelft.nl> wrote:
> 
> Hi all,
> 
> I need some help with converting Arrow RecordBatches/Pandas DataFrames to
> (Arrow-enabled) Spark DataFrames.
> The example in [1] explains very well how to convert a single Pandas DataFrame
> to a Spark DataFrame.
> 
> But in my case, some external applications are sending Arrow RecordBatches
> to my PySpark application in a streaming fashion. Each time I receive an Arrow
> RecordBatch, I want to transfer/append it to a Spark DataFrame. Is it possible
> to create a Spark DataFrame from one initial Arrow RecordBatch and then keep
> appending the incoming Arrow RecordBatches to that Spark DataFrame (in a
> streaming fashion)? Thanks!
> 
> I saw another example [2] in which all the Arrow RecordBatches are converted to
> a Spark DataFrame, but my case is a little bit different from this.
> 
> [1] https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html
> [2] https://gist.github.com/linar-jether/7dd61ed6fa89098ab9c58a1ab428b2b5
> 
> ---
> Regards,
> Tanveer Ahmad
