Thanks, Annabel, but I may need to clarify that I have no intention of writing and running Spark UDFs in C++; I'm just wondering whether Spark can read and write data to a C++ process with zero copy.
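For concreteness, the usual JVM-side zero-copy primitive for this is java.nio's MappedByteBuffer, obtained via FileChannel.map: pages mapped from a named file are shared with any other process that maps the same file, so a C++ producer and a JVM consumer can exchange data through the OS page cache without copying into the Java heap. Below is a minimal self-contained sketch; the count-plus-doubles layout is an assumption for illustration, and the in-process writer merely stands in for the C++ side.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SharedMmapReader {

    // Consumer side: map the file into this JVM's address space and read the
    // values in place -- nothing is copied into an on-heap array.
    static double sum(Path path) throws Exception {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.order(ByteOrder.LITTLE_ENDIAN);  // must match the producer
            int count = buf.getInt();            // assumed header: number of doubles
            double total = 0;
            for (int i = 0; i < count; i++) {
                total += buf.getDouble();
            }
            return total;
        }
    }

    public static void main(String[] args) throws Exception {
        Path path = Files.createTempFile("shared", ".bin");
        // Stand-in for the C++ producer: an int count followed by doubles.
        ByteBuffer out = ByteBuffer.allocate(4 + 3 * 8).order(ByteOrder.LITTLE_ENDIAN);
        out.putInt(3).putDouble(1.5).putDouble(2.5).putDouble(4.0);
        out.flip();
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.WRITE)) {
            ch.write(out);
        }
        System.out.println(sum(path));  // prints 8.0
    }
}
```

A real C++ producer would mmap (or shm_open plus mmap) the same named file and write the identical layout; both mappings resolve to the same physical pages, so the JVM reads the C++ process's writes without an intermediate copy. Coordinating when the data is ready (e.g., via a flag file or semaphore) is left out of this sketch.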
Best Regards,
Jia

On Dec 7, 2015, at 12:26 PM, Annabel Melongo <melongo_anna...@yahoo.com> wrote:

> My guess is that Jia wants to run C++ on top of Spark. If that's the case,
> I'm afraid this is not possible. Spark has support for Java, Python, Scala
> and R.
>
> The best way to achieve this is to run your application in C++ and use the
> data created by said application to do manipulation within Spark.
>
> On Monday, December 7, 2015 1:15 PM, Jia <jacqueline...@gmail.com> wrote:
>
> Thanks, Dewful!
>
> My impression is that Tachyon is a very nice in-memory file system that can
> connect to multiple storage backends.
> However, because our data is also held in memory, I suspect that connecting
> to Spark directly may be more efficient in performance.
> But I definitely need to look at Tachyon more carefully, in case it has a
> very efficient C++ binding mechanism.
>
> Best Regards,
> Jia
>
> On Dec 7, 2015, at 11:46 AM, Dewful <dew...@gmail.com> wrote:
>
>> Maybe looking into something like Tachyon would help; I see some sample C++
>> bindings, but I'm not sure how much of the current functionality they
>> support...
>
>> Hi, Robin,
>> Thanks for your reply, and thanks for copying my question to the user
>> mailing list.
>> Yes, we have a distributed C++ application that will store data on each
>> node in the cluster, and we hope to leverage Spark to do more fancy
>> analytics on those data. But we need high performance; that's why we want
>> shared memory.
>> Suggestions will be highly appreciated!
>>
>> Best Regards,
>> Jia
>>
>> On Dec 7, 2015, at 10:54 AM, Robin East <robin.e...@xense.co.uk> wrote:
>>
>>> -dev, +user (this is not a question about development of Spark itself, so
>>> you'll get more answers on the user mailing list)
>>>
>>> First up, let me say that I don't really know how this could be done. I'm
>>> sure it would be possible with enough tinkering, but it's not clear what
>>> you are trying to achieve. Spark is a distributed processing system: it
>>> has multiple JVMs running on different machines, each running a small part
>>> of the overall processing. Unless you have some sort of plan to collocate
>>> multiple C++ processes with the distributed JVMs, using named memory-mapped
>>> files doesn't make architectural sense.
>>> -------------------------------------------------------------------------------
>>> Robin East
>>> Spark GraphX in Action, by Michael Malak and Robin East
>>> Manning Publications Co.
>>> http://www.manning.com/books/spark-graphx-in-action
>>>
>>>> On 6 Dec 2015, at 20:43, Jia <jacqueline...@gmail.com> wrote:
>>>>
>>>> Dears, for one project I need to implement something so Spark can read
>>>> data from a C++ process.
>>>> To provide high performance, I really hope to implement this through
>>>> shared memory between the C++ process and the JVM process.
>>>> It seems it may be possible to use named memory-mapped files and JNI to
>>>> do this, but I wonder whether there are any existing efforts or a more
>>>> efficient approach?
>>>> Thank you very much!
>>>>
>>>> Best Regards,
>>>> Jia
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: dev-h...@spark.apache.org