In a few earlier posts [1
<http://apache-spark-developers-list.1001551.n3.nabble.com/Tungsten-off-heap-memory-access-for-C-libraries-td13898.html>]
[2
<http://apache-spark-developers-list.1001551.n3.nabble.com/How-to-access-the-off-heap-representation-of-cached-data-in-Spark-2-0-td17701.html>],
I asked about moving data from C++ into a Spark data source (RDD,
DataFrame, or Dataset). The issue is that even the off-heap cache might not
have a stable representation: it might change from one version to the next.

I recently learned about Apache Arrow, a data layer that Spark currently
shares or will someday share with Pandas, Impala, etc. Suppose that I can
fill a buffer (such as a direct ByteBuffer) with Arrow-formatted data. Is
there an easy, or even zero-copy, way to use that in Spark? Is that an API
that could be developed?

I'll be at the KDD Spark 2.0 tutorial on August 15. Is that a good place to
ask this question?

Thanks,
-- Jim




