Hi, Robin,
Thanks for your reply, and thanks for copying my question to the user mailing list.
Yes, we have a distributed C++ application that stores data on each node
in the cluster, and we hope to leverage Spark to do more fancy analytics on
that data. But we need high performance, that’s
-dev, +user (this is not a question about development of Spark itself so you’ll
get more answers in the user mailing list)
First up let me say that I don’t really know how this could be done - I’m sure
it would be possible with enough tinkering but it’s not clear what you are
trying to
Robin,
Maybe you didn't read my post in which I stated that Spark works on top of
HDFS. What Jia wants is to have Spark interact with a C++ process to read and
write data.
I've never heard of Jia's use case in Spark. If you know of one, please share
it with me.
Thanks
On Monday,
Hi Annabel
I certainly did read your post. My point was that Spark can read from HDFS but
is in no way tied to that storage layer. A very interesting use case that
sounds very similar to Jia's (as mentioned by another poster) is contained in
https://issues.apache.org/jira/browse/SPARK-10399.
Robin,
To prove my point, this is an unresolved issue still in the implementation
stage.
On Monday, December 7, 2015 2:49 PM, Robin East
wrote:
Hi Annabel
I certainly did read your post. My point was that Spark can read from HDFS but
is in no way tied to that
SparkNet may have some interesting ideas - https://github.com/amplab/SparkNet.
Haven't had a deep look at it yet but it seems to have some functionality
allowing caffe to read data from RDDs, though I'm not certain the memory is
shared.
On Mon, Dec 7, 2015 at 9:55 PM,
<jacqueline...@gmail.com>; Dewful <dew...@gmail.com>; "user @spark"
<user@spark.apache.org>; "d...@spark.apache.org" <d...@spark.apache.org>
Sent: Monday, December 7, 2015 10:57 AM
Subject: Re: Shared memory between C++ process and Spark
Thanks, Annabel, but I should clarify that I have no intention of writing
and running Spark UDFs in C++. I'm just wondering whether Spark can read data
from and write data to a C++ process with zero copy.
Best Regards,
Jia
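For readers following the thread, here is a minimal sketch of what "zero copy" means at the operating-system level, using Python's standard `multiprocessing.shared_memory` module (Python 3.8+). It is not a Spark integration: Spark executors run in the JVM, and nothing in this thread confirms an existing Spark API for this. The function names `publish` and `read_total` and the flat array-of-doubles layout are assumptions made for illustration; the producer stands in for the C++ process, which on POSIX would typically use `shm_open` and `mmap`.

```python
from multiprocessing import shared_memory
import struct


def publish(values, name):
    """Producer side (stand-in for the C++ process): write doubles into a
    named shared-memory segment. `name` and the layout are hypothetical."""
    shm = shared_memory.SharedMemory(name=name, create=True, size=8 * len(values))
    # Native byte order, so the consumer can cast the buffer directly.
    struct.pack_into(f"{len(values)}d", shm.buf, 0, *values)
    return shm  # the caller keeps the segment alive and unlinks it later


def read_total(name, count):
    """Consumer side: attach to the segment by name and read the doubles
    in place, without first copying the buffer into a bytes object."""
    shm = shared_memory.SharedMemory(name=name)  # attach only, no create
    raw = shm.buf[: 8 * count]   # the segment may be rounded up to a page
    view = raw.cast("d")         # reinterpret the same memory as doubles
    try:
        return sum(view)
    finally:
        # Every exported view must be released before the mapping is closed.
        view.release()
        raw.release()
        shm.close()
```

The consumer never duplicates the buffer; it only materialises one Python float at a time while summing. Whether a similar arrangement pays off for Spark depends on how the data would cross into the JVM, which is the open question in this thread.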
On Dec 7, 2015, at 12:26 PM, Annabel Melongo wrote:
Thanks, Robin, you have a very good point!
We feel that the data copy and allocation overhead may become a performance
bottleneck, and we are evaluating it right now.
We will do the shared memory stuff only if we’re sure about the potential
performance gain and sure that there is no existing stuff
To: Dewful <dew...@gmail.com>
> Cc: "user @spark" <user@spark.apache.org>, d...@spark.apache.org,
> Robin East <robin.e...@xense.co.uk>
> Date: 2015/12/08 03:17
> Subject: Re: Shared memory between C++ process and Spark
Thanks, Dewful!
My impression is that Tachyon is a very nice in-memory file system that can
connect to multiple storage systems.
However, because our data is also held in memory, I suspect that connecting to
Spark directly may be more efficient.
But I definitely need to look at Tachyon
My guess is that Jia wants to run C++ on top of Spark. If that's the case, I'm
afraid this is not possible. Spark has support for Java, Python, Scala and R.
The best way to achieve this is to run your application in C++ and use the
data created by said application to do manipulation within
I guess you could write a custom RDD that can read data from a memory-mapped
file - not really my area of expertise so I’ll leave it to other members of the
forum to chip in with comments as to whether that makes sense.
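As a rough illustration of the memory-mapped-file idea, the sketch below uses only Python's standard `mmap` and `struct` modules and no Spark API. It shows the two pieces a custom RDD would need in some form: splitting the file into per-partition record ranges, and decoding one range straight from the mapping. The record layout (`RECORD`) and the helper names `split_ranges` and `read_partition` are hypothetical.

```python
import mmap
import struct

# Hypothetical fixed-width record: int64 key followed by float64 value.
RECORD = struct.Struct("=qd")


def split_ranges(num_records, num_partitions):
    """Split record indices into contiguous [start, end) ranges,
    one per partition, spreading any remainder over the first ranges."""
    base, extra = divmod(num_records, num_partitions)
    ranges, start = [], 0
    for i in range(num_partitions):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def read_partition(path, start, end):
    """Map the file read-only and decode records [start, end) in place."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return [RECORD.unpack_from(mm, i * RECORD.size)
                    for i in range(start, end)]
```

In a real custom RDD, the range computation would correspond to defining the partitions and the read step to computing one partition on an executor that has the file available locally; both mappings are assumptions here, not something confirmed in this thread.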
But if you want ‘fancy analytics’ then won’t the processing time more than
Jia,
I'm so confused about this. The architecture of Spark is to run on top of HDFS.
What you're requesting, reading and writing to a C++ process, is not part of
that requirement.
On Monday, December 7, 2015 1:42 PM, Jia wrote:
Thanks, Annabel, but I may need
Annabel
Spark works very well with data stored in HDFS but is certainly not tied to it.
Have a look at the wide variety of connectors to things like Cassandra, HBase,
etc.
Robin
> On 7 Dec 2015, at 18:50, Annabel Melongo wrote:
>
> Jia,
>
>