Re: Tungsten off heap memory access for C++ libraries
Update for those who are still interested: djinni is a nice tool for generating Java/C++ bindings. Before today, djinni's Java support was aimed only at Android, but now djinni works with (at least) Debian, Ubuntu, and CentOS. djinni will help you run C++ code in-process, with the caveat that djinni only supports deep copies of on-JVM-heap data (and no special off-heap features yet). However, you can in theory use Unsafe to get pointers to off-heap memory and pass those (as longs) to native code.

So if you need a solution *today*, try checking out a small demo:
https://github.com/dropbox/djinni/tree/master/example/localhost

For the long deets, see:
https://github.com/dropbox/djinni/pull/140

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Tungsten-off-heap-memory-access-for-C-libraries-tp13898p14427.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
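The Unsafe approach mentioned above can be sketched like this (a minimal illustration; the class and its `roundTrip` helper are made up for this example and are not part of the djinni demo). The key point is that `Unsafe.allocateMemory` hands back the raw address as a `long`, which is what you would pass across JNI/JNA:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffHeapDemo {
    // Grab the Unsafe singleton via reflection (there is no public accessor).
    static Unsafe unsafe() throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        return (Unsafe) f.get(null);
    }

    // Allocate off-heap memory, write a value, read it back, and free it.
    // The `address` long is what native code would receive via JNI/JNA.
    static long roundTrip(long value) throws Exception {
        Unsafe u = unsafe();
        long address = u.allocateMemory(8);   // 8 bytes, off the JVM heap
        try {
            u.putLong(address, value);        // native code could read this address
            return u.getLong(address);
        } finally {
            u.freeMemory(address);            // off-heap memory is not GC-managed
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(42L));
    }
}
```

Note that the memory must be freed manually; nothing is tracking that address for the GC.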
Re: Code generation for GPU
In order to get a major speedup from applying *single-pass* map/filter/reduce operations on an array in GPU memory, wouldn't you need to stream the columnar data directly into GPU memory somehow? You might find in your experiments that GPU memory allocation is a bottleneck. See e.g. John Canny's paper here (Section 1.1, paragraph 2):
http://www.cs.berkeley.edu/~jfc/papers/13/BIDMach.pdf

If the per-item operation is very non-trivial, though, a dramatic GPU speedup may be more likely.

Something related (and perhaps easier to contribute to Spark) might be a GPU-accelerated sorter for Unsafe records, especially since that machinery is already broken out fairly well, e.g. `UnsafeInMemorySorter`. Spark appears to use (single-threaded) Timsort for sorting Unsafe records, so I imagine a parallel GPU solution could handily beat that.

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Re-Code-generation-for-GPU-tp13954p14030.html
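On the Unsafe-records point: Spark's in-memory sorter works over packed words (roughly, a sort-key prefix plus a record locator), so an accelerated sorter mostly needs to sort an array of longs. Here is a rough CPU-side sketch of that idea with a hypothetical packing layout (the 32/32-bit split and class are invented for illustration); `Arrays.parallelSort` stands in for where a GPU radix sort would go:

```java
import java.util.Arrays;

public class PackedRecordSort {
    // Pack a non-negative 32-bit key prefix into the high bits and a
    // 32-bit record offset into the low bits, so that sorting the longs
    // numerically sorts the records by key.
    static long pack(int keyPrefix, int recordOffset) {
        return ((long) keyPrefix << 32) | (recordOffset & 0xFFFFFFFFL);
    }

    static int keyOf(long packed)    { return (int) (packed >>> 32); }
    static int offsetOf(long packed) { return (int) packed; }

    public static void main(String[] args) {
        long[] records = {
            pack(30, 0), pack(10, 1), pack(20, 2),
        };
        // Multi-core sort; a GPU sorter would replace this call.
        Arrays.parallelSort(records);
        for (long r : records) {
            System.out.println(keyOf(r) + " -> record " + offsetOf(r));
        }
    }
}
```

Because the comparison is just a numeric `long` compare, the sort is branch-light and maps well onto parallel radix-sort implementations.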
Re: Tungsten off heap memory access for C++ libraries
Paul: I've worked on running C++ code on Spark at scale before (via JNA, ~200 cores) and am working on something more contribution-oriented now (via JNI). A few comments:

* If you need something *today*, try JNA. It can be slow (e.g. calling a short native function in a tight loop), but it works if you have an existing C library.
* If you want true zero-copy nested data structures (with explicit schema), you probably want to look at Google FlatBuffers or Cap'n Proto. Protobuf does copies; not sure about Avro. However, if instances of your nested messages fit completely in CPU cache, there might not be much benefit to zero-copy.
* Tungsten numeric arrays and UTF-8 strings should be portable but likely need some special handling. (A major benefit of Protobuf, Avro, FlatBuffers, Cap'n Proto, etc. is that these libraries already handle endianness and UTF-8 for C++.)
* NB: Don't dive into converting (standard) Java String <-> std::string via JNI. It's a very messy problem :)

Was there indeed a JIRA started to track this issue? I can't find it at the moment ...

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Tungsten-off-heap-memory-access-for-C-libraries-tp13898p13929.html
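To make the zero-copy point concrete, here is the general fixed-offset technique (not FlatBuffers' or Cap'n Proto's actual wire formats; the field layout is invented for this example). With an explicit schema, a reader pulls fields straight out of a buffer at known offsets, with endianness pinned down up front and no intermediate Java objects or copies:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ZeroCopyRead {
    // Hypothetical fixed layout: [int id @0][long timestamp @4][double score @12]
    static final int ID_OFF = 0, TS_OFF = 4, SCORE_OFF = 12, RECORD_SIZE = 20;

    public static void main(String[] args) {
        // A direct buffer lives off the JVM heap, like a Tungsten page.
        ByteBuffer buf = ByteBuffer.allocateDirect(RECORD_SIZE)
                                   .order(ByteOrder.LITTLE_ENDIAN); // fixed by the "schema"
        // Writer side: place each field at its agreed offset.
        buf.putInt(ID_OFF, 7);
        buf.putLong(TS_OFF, 1_000_000L);
        buf.putDouble(SCORE_OFF, 0.5);

        // Reader side: no parsing, no copies, just absolute offset reads.
        int id       = buf.getInt(ID_OFF);
        long ts      = buf.getLong(TS_OFF);
        double score = buf.getDouble(SCORE_OFF);
        System.out.println(id + " " + ts + " " + score);
    }
}
```

A C++ reader given the same address and the same layout could read the identical bytes, which is the appeal of schema-driven zero-copy formats over copy-on-decode ones like Protobuf.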