In a few earlier posts [1 <http://apache-spark-developers-list.1001551.n3.nabble.com/Tungsten-off-heap-memory-access-for-C-libraries-td13898.html>] [2 <http://apache-spark-developers-list.1001551.n3.nabble.com/How-to-access-the-off-heap-representation-of-cached-data-in-Spark-2-0-td17701.html>], I asked about moving data from C++ into a Spark data source (RDD, DataFrame, or Dataset). The problem is that even the off-heap cache is not guaranteed to have a stable representation: it might change from one Spark version to the next.
I recently learned about Apache Arrow, a data layer that Spark currently shares, or will someday share, with Pandas, Impala, etc. Suppose that I can fill a buffer (such as a direct ByteBuffer) with Arrow-formatted data. Is there an easy (or even zero-copy) way to use that data in Spark? Is that an API that could be developed?

I'll be at the KDD Spark 2.0 tutorial on August 15. Is that a good place to ask this question?

Thanks,
-- Jim

View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Apache-Arrow-data-in-buffer-to-RDD-DataFrame-Dataset-tp18563.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.