We have a similar use case, and we use ArrowConverters.scala mentioned by Wes. 
However, the overhead of the conversion is kinda high.
________________________________
From: Wes McKinney <wesmck...@gmail.com>
Sent: Thursday, December 5, 2019 6:53 AM
To: dev <dev@arrow.apache.org>
Cc: Fan Liya <liya.fa...@gmail.com>; jeetendra.jais...@impetus.co.in.invalid 
<jeetendra.jais...@impetus.co.in.invalid>
Subject: Re: Java - Spark dataframe to Arrow format

hi folks,

I understand the question to be about serialization.

see

* 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
* 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala
* 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala

This code is used to convert between Spark Data Frames and Arrow
columnar format for UDF evaluation purposes

On Thu, Dec 5, 2019 at 6:58 AM GaoXiang Wang <wgx...@gmail.com> wrote:
>
> Hi Jeetendra and Liya,
>
> I am actually having a similar use case. We have some data stored as *parquet
> format in HDFS* and would like to make use of Apache Arrow to improve
> compute performance if possible. Right now, I didn't see there is a direct
> way to do in Java with Spark.
>
> I have search the Spark documentation, it looks like python support is
> added after 2.3.0 (
> https://urldefense.proofpoint.com/v2/url?u=https-3A__spark.apache.org_docs_latest_sql-2Dpyspark-2Dpandas-2Dwith-2Darrow.html&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=JX5y-LzqAOulZIcSbMRGYA&m=E587baUuoFDKcpKDDIF4Su8nBHs0I9bGTBqEhtErCuY&s=n28eUF_7egcwK6LLh63Wra3oWTzZWBlB6en3xCxDEdE&e=
>  ),
> any plan from Apache Arrow team to provide *Spark integration for Java*?
>
> Thank you very much.
>
>
> *Best Regards,WANG GAOXIANG*
> * (Eric) *
> National University of Singapore Graduate ::
> API Craft Singapore Co-organiser ::
> Singapore Python User Group Co-organiser
> *+6597685360 (P) :: wgx...@gmail.com <wgx...@gmail.com> (E) ::
> **https://urldefense.proofpoint.com/v2/url?u=https-3A__medium.com_-40wgx731&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=JX5y-LzqAOulZIcSbMRGYA&m=E587baUuoFDKcpKDDIF4Su8nBHs0I9bGTBqEhtErCuY&s=thoJd3JhOJ8HBCsAJTzhnfw91reStRfH0pUj9v-v5xE&e=
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__medium.com_-40wgx731&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=JX5y-LzqAOulZIcSbMRGYA&m=E587baUuoFDKcpKDDIF4Su8nBHs0I9bGTBqEhtErCuY&s=thoJd3JhOJ8HBCsAJTzhnfw91reStRfH0pUj9v-v5xE&e=
>  > **(W)*
>
>
> On Thu, Dec 5, 2019 at 6:58 PM Fan Liya <liya.fa...@gmail.com> wrote:
>
> > Hi Jeetendra,
> >
> > I am not sure if I understand your question correctly.
> >
> > Arrow is an in-memory columnar data format, and Spark has its own in-memory
> > data format for DataFrame, which is invisible to end users.
> > So the Spark user has no control over the underlying in-memory layout.
> >
> > If you really want to convert a DataFrame into Arrow format, maybe you can
> > save the results of a Spark job to some external store (e.g. in ORC
> > format), and then load it back to memory in Arrow format (if this is what
> > you want).
> >
> > Best,
> > Liya Fan
> >
> > On Thu, Dec 5, 2019 at 5:53 PM Jeetendra Kumar Jaiswal
> > <jeetendra.jais...@impetus.co.in.invalid> wrote:
> >
> > > Hi Dev Team,
> > >
> > > Can someone please let me know how to convert spark data frame to Arrow
> > > format. I am coding in Java.
> > >
> > > Java documentation of Arrow just has function API information. It is
> > > little hard to develop without proper documentation.
> > >
> > > Is there a way to directly convert spark dataframe to Arrow format
> > > dataframes.
> > >
> > > Thanks,
> > > Jeetendra
> > >
> > > ________________________________
> > >
> > >
> > >
> > >
> > >
> > >
> > > NOTE: This message may contain information that is confidential,
> > > proprietary, privileged or otherwise protected by law. The message is
> > > intended solely for the named addressee. If received in error, please
> > > destroy and notify the sender. Any use of this email is prohibited when
> > > received in error. Impetus does not represent, warrant and/or guarantee,
> > > that the integrity of this communication has been maintained nor that the
> > > communication is free of errors, virus, interception or interference.
> > >
> >

Reply via email to