More thoughts. I took a deeper look at BlockManager, RDD, and friends. Suppose one wanted native code to get access to blocks before they are deserialized, i.e. to the raw serialized bytes. That looks very hard. An RDD behaves much like a Scala iterator of deserialized values, and all interaction with BlockManager happens in terms of deserialized data. One would probably need to rewrite much of RDD, CacheManager, etc. in native code; an RDD subclass (e.g. PythonRDD) probably wouldn't work.
So exposing raw blocks to native code looks intractable. I wonder how fast Java/Kryo can serialize and deserialize byte arrays. For example, suppose we have an RDD<T> where T is immutable and most of the memory for a single T is a byte array. What is the overhead of SerDe-ing T? (Does Java/Kryo copy the underlying memory?) If the overhead is small, then native access to raw blocks wouldn't really yield any advantage.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Native-C-C-code-integration-tp18347p18640.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
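A rough way to probe the SerDe question above, for plain Java serialization only (Kryo is not covered here), is to round-trip such a T in memory and look at the size and identity of the result. This is a minimal sketch, not a benchmark: `Blob` is a hypothetical stand-in for T and `SerDeOverhead` is an illustrative helper, neither is a Spark class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class SerDeOverhead {
    // Serialize with plain Java serialization into an in-memory buffer.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray(); // stream was flushed by close()
    }

    // Deserialize; ObjectInputStream allocates a new byte[] and reads the bytes into it.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int n = 1 << 20; // 1 MiB payload
        Blob original = new Blob(new byte[n]);
        byte[] wire = serialize(original);
        Blob copy = (Blob) deserialize(wire);

        // Per-object overhead (stream header, class descriptor, array header) does not grow with n.
        System.out.println("payload bytes:     " + n);
        System.out.println("serialized bytes:  " + wire.length);
        System.out.println("overhead bytes:    " + (wire.length - n));
        // The payload is copied: the deserialized array is a distinct object with equal contents.
        System.out.println("same array object? " + (copy.payload == original.payload));
        System.out.println("same contents?     " + Arrays.equals(copy.payload, original.payload));

        // Crude timing with no warm-up control; treat the result as an order of magnitude only.
        int rounds = 50;
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            deserialize(serialize(original));
        }
        double msPerRound = (System.nanoTime() - start) / 1e6 / rounds;
        System.out.printf("avg round trip:    %.3f ms%n", msPerRound);
    }
}

// Hypothetical stand-in for T: an immutable wrapper whose memory is almost all one byte array.
final class Blob implements Serializable {
    private static final long serialVersionUID = 1L;
    final byte[] payload;

    Blob(byte[] payload) {
        this.payload = payload;
    }
}
```

With standard Java serialization the deserialized `payload` is a fresh array with identical contents, so the bytes are copied at least once on write and once on read, while the size overhead is a small constant per object. Whether that copying matters relative to the rest of a job would need a proper benchmark (e.g. JMH) and a separate measurement with Kryo.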