Re: Shared memory between C++ process and Spark

Kazuaki Ishizaki Mon, 07 Dec 2015 10:28:40 -0800

Is this JIRA entry related to what you want?
https://issues.apache.org/jira/browse/SPARK-10399

Regards,
Kazuaki Ishizaki

From:   Jia <jacqueline...@gmail.com>
To:     Dewful <dew...@gmail.com>
Cc:     "user @spark" <u...@spark.apache.org>, dev@spark.apache.org, Robin East <robin.e...@xense.co.uk>
Date:   2015/12/08 03:17
Subject:        Re: Shared memory between C++ process and Spark

Thanks, Dewful!

My impression is that Tachyon is a very nice in-memory file system that 
can connect to multiple storage systems.
However, because our data is also held in memory, I suspect that 
connecting to Spark directly may perform better.
But I definitely need to look at Tachyon more carefully, in case it has a 
very efficient C++ binding mechanism.

Best Regards,
Jia

On Dec 7, 2015, at 11:46 AM, Dewful <dew...@gmail.com> wrote:

Maybe looking into something like Tachyon would help. I see some sample 
C++ bindings, but I'm not sure how much of the current functionality they 
support...
Hi, Robin, 
Thanks for your reply and thanks for copying my question to user mailing 
list.
Yes, we have a distributed C++ application that stores data on each 
node in the cluster, and we hope to leverage Spark to run more sophisticated 
analytics on that data. But we need high performance, which is why we want 
shared memory.
Suggestions will be highly appreciated!

Best Regards,
Jia

On Dec 7, 2015, at 10:54 AM, Robin East <robin.e...@xense.co.uk> wrote:

-dev, +user (this is not a question about development of Spark itself, so 
you’ll get more answers on the user mailing list)

First, let me say that I don’t really know how this could be done; I’m 
sure it would be possible with enough tinkering, but it’s not clear what 
you are trying to achieve. Spark is a distributed processing system: it 
has multiple JVMs running on different machines, each of which runs a small 
part of the overall processing. Unless you plan to have multiple C++ 
processes collocated with the distributed JVMs, using named memory-mapped 
files doesn’t make architectural sense.
-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action

On 6 Dec 2015, at 20:43, Jia <jacqueline...@gmail.com> wrote:

Dear all, for one project I need to implement something so that Spark can 
read data from a C++ process. 
To provide high performance, I would really like to implement this through 
shared memory between the C++ process and the JVM process.
It seems possible to do this using named memory-mapped files and JNI, but I 
wonder whether there are any existing efforts or a more efficient approach?
Thank you very much!

Best Regards,
Jia
