Re: Apache Arrow data in buffer to RDD/DataFrame/Dataset?

Jim Pivarski Fri, 05 Aug 2016 14:18:37 -0700

I see. I've already started working with Arrow-C++ and talking to members
of the Arrow community, so I'll keep doing that.


As a follow-up question, is there an approximate timescale for when Spark
will support Arrow? I'd just like to know that all the pieces will come
together eventually.

(In this forum, most of the discussion about Arrow is about PySpark and
Pandas, not Spark in general.)

Best,
Jim

On Aug 5, 2016 2:43 PM, "Holden Karau" <[email protected]> wrote:

> Spark does not currently support Apache Arrow. A good place to discuss this
> is probably the Arrow mailing list, where they are making progress towards
> unified JVM and Python/R support, which is more or less a precondition for a
> functioning Arrow interface between Spark and Python.
>
> On Fri, Aug 5, 2016 at 12:40 PM, [email protected] <[email protected]>
> wrote:
>
>> In a few earlier posts [1] [2], I asked about moving data from C++ into a
>> Spark data source (RDD, DataFrame, or Dataset). The issue is that even the
>> off-heap cache might not have a stable representation: it might change from
>> one version to the next.
>>
>> [1] http://apache-spark-developers-list.1001551.n3.nabble.com/Tungsten-off-heap-memory-access-for-C-libraries-td13898.html
>> [2] http://apache-spark-developers-list.1001551.n3.nabble.com/How-to-access-the-off-heap-representation-of-cached-data-in-Spark-2-0-td17701.html
>>
>> I recently learned about Apache Arrow, a data layer that Spark currently
>> shares, or will someday share, with Pandas, Impala, etc. Suppose that I can
>> fill a buffer (such as a direct ByteBuffer) with Arrow-formatted data. Is
>> there an easy, or even zero-copy, way to use that in Spark? Is that an API
>> that could be developed?
>>
>> I'll be at the KDD Spark 2.0 tutorial on August 15. Is that a good place to
>> ask this question?
>>
>> Thanks,
>> -- Jim
>>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>

Re: Apache Arrow data in buffer to RDD/DataFrame/Dataset?
