[ https://issues.apache.org/jira/browse/SPARK-10399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744634#comment-14744634 ]

Paul Wais edited comment on SPARK-10399 at 9/15/15 1:16 AM:
------------------------------------------------------------

After investigating this issue a bit further, it might be feasible to expose 
*on-heap* Spark memory (without a copy) to native code through the 
Get/Release*Critical() JNI interface.  Android [1] uses this interface for 
copying on-heap data to devices (e.g. the GPU).  It's important to note that 
the interface is not necessarily zero-copy, and on some JVMs (e.g. HotSpot [2]) 
it blocks GC, which could lead to longer Spark GC pauses.  In any case, this 
feature might help expose the individual elements of an RDD to native code 
without any major changes to Spark (e.g. to the BlockManager).
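As a rough sketch of the pattern (the class and method names below are 
hypothetical, not part of any proposed API), a native function could pin a 
Spark on-heap byte[] like this:

```cpp
#include <jni.h>
#include <cstdint>

// Hypothetical native method that processes a Spark on-heap byte[] without a
// copy, when the JVM supports pinning.  Class/method names are illustrative.
extern "C" JNIEXPORT void JNICALL
Java_org_example_NativeScan_processBlock(JNIEnv* env, jclass, jbyteArray block) {
    // Any JNI calls must happen BEFORE entering the critical region.
    const jsize len = env->GetArrayLength(block);

    // May pin the array (zero-copy) or fall back to a copy, depending on the
    // JVM.  On HotSpot this enters a "critical region" that can delay GC.
    jbyte* data = static_cast<jbyte*>(
        env->GetPrimitiveArrayCritical(block, nullptr));
    if (data == nullptr) return;  // out of memory

    // ... process data[0..len) natively; no JNI calls allowed in here ...
    (void)len;

    // JNI_ABORT: read-only access, so discard any copy without writing back.
    env->ReleasePrimitiveArrayCritical(block, data, JNI_ABORT);
}
```

Because arbitrary JNI calls (and anything that may block on GC) are forbidden 
inside the critical region, the region should be kept short, which feeds 
directly into the per-item vs. per-segment question below.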

Nevertheless, native code would ideally not make a JNI call per item (e.g. per 
row); instead it should get access to a segment of rows, or an entire 
partition, in a single call.  However, blocking the GC while processing an 
entire partition would probably not work well in practice.

[1] 
https://github.com/android/platform_frameworks_base/search?p=3&q=GetPrimitiveArrayCritical&utf8=%E2%9C%93
[2] 
https://github.com/openjdk-mirror/jdk7u-hotspot/blob/50bdefc3afe944ca74c3093e7448d6b889cd20d1/src/share/vm/prims/jni.cpp#L4235



> Off Heap Memory Access for non-JVM libraries (C++)
> --------------------------------------------------
>
>                 Key: SPARK-10399
>                 URL: https://issues.apache.org/jira/browse/SPARK-10399
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Paul Weiss
>
> *Summary*
> Provide direct off-heap memory access to an external non-JVM program, such 
> as a C++ library, within the running Spark JVM/executor.  As Spark moves 
> toward storing all data in off-heap memory, it makes sense to provide access 
> points to that memory for non-JVM programs.
> ----
> *Assumptions*
> * Zero copies will be made during the call into non-JVM library
> * Access into non-JVM libraries will be accomplished via JNI
> * A generic JNI interface will be created so that developers will not need to 
> deal with the raw JNI call
> * C++ will be the initial target non-JVM use case
> * Memory management will remain on the JVM/Spark side
> * The API from C++ will resemble DataFrames as far as feasible and NOT 
> require expert knowledge of JNI
> * Data organization and layout will support complex (multi-type, nested, 
> etc.) types
> ----
> *Design*
> * Initially Spark JVM -> non-JVM will be supported 
> * Creating an embedded JVM with Spark running from a non-JVM program is 
> initially out of scope
> ----
> *Technical*
> * GetDirectBufferAddress is the JNI call used to access a direct byte buffer 
> without a copy
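The GetDirectBufferAddress point quoted above can be sketched roughly as 
follows (the class and method names are hypothetical; the buffer must have 
been allocated with ByteBuffer.allocateDirect() on the Java side for the call 
to succeed):

```cpp
#include <jni.h>
#include <cstdint>

// Hypothetical native method receiving a direct java.nio.ByteBuffer that
// Spark would use for off-heap storage.  Names are illustrative.
extern "C" JNIEXPORT jlong JNICALL
Java_org_example_OffHeapAccess_sumBytes(JNIEnv* env, jclass, jobject buf) {
    // Zero-copy: returns the raw off-heap address backing the direct buffer,
    // or nullptr if the buffer is not a direct buffer.
    void* addr = env->GetDirectBufferAddress(buf);
    jlong capacity = env->GetDirectBufferCapacity(buf);
    if (addr == nullptr || capacity < 0) return -1;

    // Native code can now read/write this memory directly; the JVM still owns
    // the allocation (it is freed when the ByteBuffer is garbage collected).
    const uint8_t* bytes = static_cast<const uint8_t*>(addr);
    jlong sum = 0;
    for (jlong i = 0; i < capacity; ++i) sum += bytes[i];
    return sum;
}
```

Unlike the Get/Release*Critical() approach, this path involves no GC-blocking 
critical region, since the memory is already outside the Java heap.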



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
