Paul:

I've worked on running C++ code on Spark at scale before (via JNA, ~200 cores) and am working on something more contribution-oriented now (via JNI). A few comments:

* If you need something *today*, try JNA. It can be slow (e.g. a short native function called in a tight loop) but it works if you have an existing C library.
* If you want true zero-copy nested data structures (with an explicit schema), you probably want to look at Google FlatBuffers or Cap'n Proto. Protobuf does copies; not sure about Avro. However, if instances of your nested messages fit completely in CPU cache, there might not be much benefit to zero-copy.
* Tungsten numeric arrays and UTF-8 strings should be portable but likely need some special handling. (A major benefit of Protobuf, Avro, FlatBuffers, Cap'n Proto, etc., is that these libraries already handle endianness and UTF-8 for C++.)
* NB: Don't try to dive into converting (standard) Java String <-> std::string using JNI. It's a very messy problem :)
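To illustrate the "special handling" point for numeric arrays and UTF-8 strings, here is a minimal C++ sketch. It assumes the native side receives a raw byte pointer to a buffer of back-to-back little-endian 64-bit integers, and a separate pointer and length for UTF-8 bytes. That layout and the helper names (`load_le_i64`, `sum_le_i64`, `utf8_view`) are hypothetical and chosen for illustration only; this is not Tungsten's actual memory format or any Spark API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical helper: decode one little-endian 64-bit integer from raw bytes.
// Assembling the value with shifts gives the same result on little-endian and
// big-endian hosts, and avoids unaligned or type-punned loads.
inline std::int64_t load_le_i64(const unsigned char* p) {
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i) {
        v = (v << 8) | p[i];
    }
    std::int64_t out;
    std::memcpy(&out, &v, sizeof out);  // reinterpret the bits as signed
    return out;
}

// Hypothetical helper: sum `count` little-endian int64 values stored back to
// back starting at `base` (assumed layout, see the note above).
inline std::int64_t sum_le_i64(const unsigned char* base, std::size_t count) {
    std::int64_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += load_le_i64(base + i * 8);
    }
    return total;
}

// Hypothetical helper: view UTF-8 bytes in place without copying.
// The view does not own the memory and does not validate the encoding;
// it is only valid while the underlying buffer stays alive.
inline std::string_view utf8_view(const unsigned char* base, std::size_t len) {
    return std::string_view(reinterpret_cast<const char*>(base), len);
}
```

The string helper only works this simply because the bytes are assumed to be standard UTF-8 already. Going through a Java String in JNI is different: `GetStringUTFChars` returns modified UTF-8, which is part of why the last bullet above recommends staying away from that conversion.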
Was there indeed a JIRA started to track this issue? Can't find it at the moment ...

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Tungsten-off-heap-memory-access-for-C-libraries-tp13898p13929.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.