Re: Tungsten off heap memory access for C++ libraries

2015-10-01 Thread Paul Wais
Update for those who are still interested: djinni is a nice tool for
generating Java/C++ bindings.  Before today, djinni's Java support was aimed
only at Android, but now djinni works on (at least) Debian, Ubuntu, and
CentOS.

djinni will help you run C++ code in-process, with the caveat that djinni
only supports deep copies of on-JVM-heap data (and no special off-heap
features yet).  However, you can in theory use Unsafe to get pointers to
off-heap memory and pass those (as 64-bit long addresses) to native code.
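
A minimal sketch of that idea in plain Java (the processBuffer native method
below is hypothetical; a real binding would come from a JNI- or
djinni-generated stub):

    import java.lang.reflect.Field;
    import sun.misc.Unsafe;

    public class OffHeapDemo {
      // Hypothetical native entry point taking a raw address + length.
      private static native void processBuffer(long address, long numBytes);

      private static Unsafe getUnsafe() throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        return (Unsafe) f.get(null);
      }

      public static void main(String[] args) throws Exception {
        Unsafe unsafe = getUnsafe();
        long numBytes = 1L << 20;                      // 1 MiB off the JVM heap
        long addr = unsafe.allocateMemory(numBytes);
        try {
          unsafe.setMemory(addr, numBytes, (byte) 0);  // zero-fill the buffer
          // Hand the raw 64-bit address to native code; the C++ side just does
          // reinterpret_cast<char*>(address).
          // processBuffer(addr, numBytes);             // needs the native lib loaded
        } finally {
          unsafe.freeMemory(addr);                     // off-heap memory is not GC-managed
        }
      }
    }

Note the C++ side only ever sees a naked pointer, so lifetime and layout are
entirely on you.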

So if you need a solution *today*, try checking out a small demo:
https://github.com/dropbox/djinni/tree/master/example/localhost

For the long deets, see:
 https://github.com/dropbox/djinni/pull/140



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Tungsten-off-heap-memory-access-for-C-libraries-tp13898p14427.html




Re: Code generation for GPU

2015-09-10 Thread Paul Wais
In order to get a major speedup from applying *single-pass* map/filter/reduce
operations on an array in GPU memory, wouldn't you need to stream the
columnar data directly into GPU memory somehow?  You might find in your
experiments that GPU memory allocation is a bottleneck.  See e.g. John
Canny's paper here (Section 1.1, paragraph 2):
http://www.cs.berkeley.edu/~jfc/papers/13/BIDMach.pdf
If the per-item operation is very non-trivial, though, a dramatic GPU speedup
may be more likely.
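
To make the allocation/transfer concern concrete, here's a rough
back-of-envelope sketch; the bandwidth figures are my own assumptions, not
measurements:

    public class TransferVsCompute {
      public static void main(String[] args) {
        // Assumed, era-appropriate numbers -- adjust for your hardware.
        double pcieGBperSec   = 12.0;   // host -> GPU copy over PCIe 3.0
        double gpuMemGBperSec = 300.0;  // on-device memory bandwidth

        double batchGB    = 4.0;                        // one columnar batch
        double copyInSec  = batchGB / pcieGBperSec;     // ~0.33 s just to copy it in
        double onePassSec = batchGB / gpuMemGBperSec;   // ~0.013 s for one streaming pass

        System.out.printf("copy-in %.3fs vs single pass %.3fs (~%.0fx)%n",
            copyInSec, onePassSec, copyInSec / onePassSec);
        // With numbers like these the copy dominates a single map/filter pass by
        // ~25x, which is why the data needs to already live in GPU memory (or the
        // per-item work must be heavy) before a big speedup shows up.
      }
    }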

Something related (and perhaps easier to contribute to Spark) might be a
GPU-accelerated sorter for sorting Unsafe records, especially since that code
is already broken out fairly well -- e.g. `UnsafeInMemorySorter`.  Spark
appears to use a single-threaded Timsort for sorting Unsafe records, so I
imagine a multi-threaded, multi-core GPU solution could handily beat that.
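
For a sense of the CPU-side baseline, here is a purely illustrative sketch
(the packed "record pointer" longs are a stand-in, not Spark's actual layout)
that swaps the single-threaded sort for the JDK's multi-core
Arrays.parallelSort:

    import java.util.Arrays;
    import java.util.Random;

    public class ParallelPointerSort {
      public static void main(String[] args) {
        // Stand-ins for packed record pointers (address + key prefix).
        long[] recordPointers = new long[1 << 24];   // ~16M records
        Random rng = new Random(42);
        for (int i = 0; i < recordPointers.length; i++) {
          recordPointers[i] = rng.nextLong();
        }
        // Fork/join-based multi-core sort from the JDK; a GPU radix sort
        // would play the same role with far more parallelism.
        Arrays.parallelSort(recordPointers);
      }
    }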



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Re-Code-generation-for-GPU-tp13954p14030.html




Re: Tungsten off heap memory access for C++ libraries

2015-09-01 Thread Paul Wais
Paul: I've worked on running C++ code on Spark at scale before (via JNA, ~200
cores) and am working on something more contribution-oriented now (via JNI). 
A few comments:
 * If you need something *today*, try JNA (see the sketch after this list).
It can be slow (e.g. calling a short native function in a tight loop) but it
works if you have an existing C library.
 * If you want true zero-copy nested data structures (with explicit schemas),
you probably want to look at Google FlatBuffers or Cap'n Proto.  Protobuf
does copies; I'm not sure about Avro.  However, if instances of your nested
messages fit completely in CPU cache, there might not be much benefit to
zero-copy.
 * Tungsten numeric arrays and UTF-8 strings should be portable but likely
need some special handling.  (A major benefit of Protobuf, Avro, FlatBuffers,
Cap'n Proto, etc. is that these libraries already handle endianness and UTF-8
for C++.)
 * NB: Don't dive into hand-converting (standard) Java String <->
std::string using JNI.  It's a very messy problem :)
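
To make the "need something today" JNA point concrete, here's a minimal
sketch binding an existing libc function (Native.load is the JNA 5.x entry
point; older releases use Native.loadLibrary):

    import com.sun.jna.Library;
    import com.sun.jna.Native;

    public class JnaDemo {
      // Declare a Java interface whose methods mirror the C functions you need.
      public interface CLib extends Library {
        CLib INSTANCE = Native.load("c", CLib.class);   // bind to libc
        void printf(String format, Object... args);     // JNA marshals String -> const char*
      }

      public static void main(String[] args) {
        CLib.INSTANCE.printf("hello from libc: %d\n", 200);
      }
    }

The per-call marshalling that makes this so convenient is also where the
overhead comes from, hence the tight-loop caveat above.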

Was there indeed a JIRA started to track this issue?  Can't find it at the
moment ...



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Tungsten-off-heap-memory-access-for-C-libraries-tp13898p13929.html
