More thoughts.  I took a deeper look at BlockManager, RDD, and friends. 
Suppose one wanted native code to access blocks before they are deserialized.
This looks very hard.  An RDD behaves much like a Scala iterator of
deserialized values, and all interop with BlockManager happens on deserialized
data.  One would probably need to rewrite much of RDD, CacheManager, etc. in
native code; an RDD subclass (e.g. PythonRDD) probably wouldn't work.
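
To make the "iterator of deserialized values" point concrete, here is a plain-Java sketch. It is hypothetical and is not Spark's BlockManager or RDD API: a "block" is just a byte[] of serialized records, and a consumer only ever sees it through an Iterator that deserializes one record per next() call. The raw bytes never reach the consumer, which is roughly the situation native code would be in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only; names here are not Spark classes.
public class SerializedBlockIterator {

    // Stand-in for a stored block: all records serialized into one byte[].
    public static byte[] writeBlock(List<String> records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(records.size()); // record count, so the reader knows when to stop
            for (String r : records) {
                out.writeObject(r);
            }
        }
        return bytes.toByteArray();
    }

    // Consumers see the block only as an Iterator of deserialized values;
    // each next() call deserializes one record from the raw bytes.
    public static Iterator<String> readBlock(byte[] block) throws IOException {
        final ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(block));
        final int count = in.readInt();
        return new Iterator<String>() {
            private int remaining = count;

            @Override
            public boolean hasNext() {
                return remaining > 0;
            }

            @Override
            public String next() {
                if (remaining <= 0) {
                    throw new NoSuchElementException();
                }
                remaining--;
                try {
                    return (String) in.readObject();
                } catch (IOException | ClassNotFoundException e) {
                    throw new RuntimeException("corrupt block", e);
                }
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] block = writeBlock(Arrays.asList("a", "b", "c"));
        StringBuilder seen = new StringBuilder();
        Iterator<String> it = readBlock(block);
        while (it.hasNext()) {
            seen.append(it.next());
        }
        System.out.println(seen); // prints "abc"
    }
}
```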

So exposing raw blocks to native code looks intractable.  I wonder how fast
Java/Kryo serialization handles byte arrays.  For example, suppose we have an
RDD<T> where T is immutable and most of the memory for a single T is a byte
array.  What is the overhead of SerDe-ing T?  (Does Java/Kryo copy the
underlying memory?)  If the overhead is small, then native access to raw
blocks wouldn't yield much advantage.
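
One rough way to probe the question is to round-trip such a T through serialization and check both the size on the wire and whether the deserialized array is the same object. The sketch below uses only java.io serialization, since that ships with the JDK; it says nothing about Kryo, which would need its own test. Blob is a hypothetical stand-in for T. For Java serialization, at least, the answer to the copy question is yes: the payload is written into the output stream, and deserialization allocates a fresh array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class ByteArraySerDe {

    // Hypothetical immutable T whose memory is dominated by a byte array.
    static final class Blob implements Serializable {
        private static final long serialVersionUID = 1L;
        final byte[] payload;

        Blob(byte[] payload) {
            this.payload = payload;
        }
    }

    // Java serialization: copies the object graph (including the array) into a new byte[].
    public static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Java deserialization: allocates new objects, including a new payload array.
    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = new byte[1 << 20]; // 1 MiB payload
        Arrays.fill(payload, (byte) 7);
        Blob original = new Blob(payload);

        long t0 = System.nanoTime();
        byte[] wire = serialize(original);
        Blob copy = (Blob) deserialize(wire);
        long t1 = System.nanoTime();

        System.out.println("serialized bytes: " + wire.length + " (payload " + payload.length + ")");
        System.out.println("round trip ms: " + (t1 - t0) / 1_000_000.0);
        System.out.println("same array object? " + (copy.payload == original.payload)); // false
        System.out.println("equal contents? " + Arrays.equals(copy.payload, original.payload)); // true
    }
}
```

A single timed round trip like this includes JIT warm-up, so the timing is rough and probably pessimistic; a harness such as JMH would give steadier numbers.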




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Native-C-C-code-integration-tp18347p18640.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
