I’m not sure what point you’re trying to prove, and I’m not particularly interested in getting into a protracted discussion. Here is what you wrote: "The architecture of Spark is to run on top of HDFS."

I interpreted that as a statement implying that Spark has to run on HDFS, which is definitely not the case. If you didn’t mean that, then we are both in agreement.

-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action

> On 7 Dec 2015, at 19:56, Annabel Melongo <[email protected]> wrote:
>
> Robin,
>
> To prove my point, this is an unresolved issue still in the implementation
> stage.
>
> On Monday, December 7, 2015 2:49 PM, Robin East <[email protected]>
> wrote:
>
> Hi Annabel
>
> I certainly did read your post. My point was that Spark can read from HDFS
> but is in no way tied to that storage layer. A very interesting use case
> that sounds very similar to Jia's (as mentioned by another poster) is
> contained in https://issues.apache.org/jira/browse/SPARK-10399. The
> comments section provides a specific example of processing very large
> images using a pre-existing C++ library.
>
> Robin
>
> Sent from my iPhone
>
> On 7 Dec 2015, at 18:50, Annabel Melongo <[email protected]> wrote:
>
>> Jia,
>>
>> I'm so confused on this. The architecture of Spark is to run on top of HDFS.
>> What you're requesting, reading and writing to a C++ process, is not part of
>> that requirement.
>>
>> On Monday, December 7, 2015 1:42 PM, Jia <[email protected]> wrote:
>>
>> Thanks, Annabel, but I may need to clarify that I have no intention to write
>> and run Spark UDFs in C++. I'm just wondering whether Spark can read and
>> write data to a C++ process with zero copy.
>>
>> Best Regards,
>> Jia
>>
>> On Dec 7, 2015, at 12:26 PM, Annabel Melongo <[email protected]> wrote:
>>
>>> My guess is that Jia wants to run C++ on top of Spark. If that's the case,
>>> I'm afraid this is not possible. Spark has support for Java, Python, Scala
>>> and R.
>>>
>>> The best way to achieve this is to run your application in C++ and use the
>>> data created by said application to do manipulation within Spark.
>>>
>>> On Monday, December 7, 2015 1:15 PM, Jia <[email protected]> wrote:
>>>
>>> Thanks, Dewful!
>>>
>>> My impression is that Tachyon is a very nice in-memory file system that can
>>> connect to multiple storage systems.
>>> However, because our data is also held in memory, I suspect that connecting
>>> to Spark directly may be more efficient.
>>> But I definitely need to look at Tachyon more carefully, in case it has a
>>> very efficient C++ binding mechanism.
>>>
>>> Best Regards,
>>> Jia
>>>
>>> On Dec 7, 2015, at 11:46 AM, Dewful <[email protected]> wrote:
>>>
>>>> Maybe looking into something like Tachyon would help. I see some sample
>>>> C++ bindings, not sure how much of the current functionality they
>>>> support...
>>>>
>>>> Hi, Robin,
>>>> Thanks for your reply and thanks for copying my question to the user
>>>> mailing list.
>>>> Yes, we have a distributed C++ application that will store data on each
>>>> node in the cluster, and we hope to leverage Spark to do more fancy
>>>> analytics on that data. But we need high performance; that’s why we want
>>>> shared memory.
>>>> Suggestions will be highly appreciated!
>>>>
>>>> Best Regards,
>>>> Jia
>>>>
>>>> On Dec 7, 2015, at 10:54 AM, Robin East <[email protected]> wrote:
>>>>
>>>>> -dev, +user (this is not a question about development of Spark itself so
>>>>> you’ll get more answers in the user mailing list)
>>>>>
>>>>> First up let me say that I don’t really know how this could be done - I’m
>>>>> sure it would be possible with enough tinkering but it’s not clear what
>>>>> you are trying to achieve. Spark is a distributed processing system; it
>>>>> has multiple JVMs running on different machines that each run a small
>>>>> part of the overall processing. Unless you have some sort of idea to have
>>>>> multiple C++ processes collocated with the distributed JVMs, using named
>>>>> memory mapped files doesn’t make architectural sense.
>>>>> -------------------------------------------------------------------------------
>>>>> Robin East
>>>>> Spark GraphX in Action Michael Malak and Robin East
>>>>> Manning Publications Co.
>>>>> http://www.manning.com/books/spark-graphx-in-action
>>>>>
>>>>>> On 6 Dec 2015, at 20:43, Jia <[email protected]> wrote:
>>>>>>
>>>>>> Dears, for one project, I need to implement something so Spark can read
>>>>>> data from a C++ process.
>>>>>> To provide high performance, I really hope to implement this through
>>>>>> shared memory between the C++ process and the Java JVM process.
>>>>>> It seems it may be possible to use named memory mapped files and JNI to
>>>>>> do this, but I wonder whether there are any existing efforts or a more
>>>>>> efficient approach to do this?
>>>>>> Thank you very much!
>>>>>>
>>>>>> Best Regards,
>>>>>> Jia
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>> For additional commands, e-mail: [email protected]
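
Jia's original question mentions named memory-mapped files as one way to share data between a C++ process and the JVM. For illustration only, here is a minimal sketch of the JVM side using the standard java.nio API. The class name, the helper methods, and the file layout (a 4-byte little-endian count followed by that many 8-byte doubles) are all assumptions made up for this example, not something from Spark or from Jia's application. In the real setup the writer would be the C++ process mapping the same file; here Java plays both roles so the sketch runs on its own.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class SharedMemorySketch {

    // Hypothetical layout: int32 count, then `count` float64 values,
    // all little-endian (the usual native order for x86 C++ code).
    public static void writeDoubles(Path file, double[] values) throws IOException {
        long size = Integer.BYTES + (long) values.length * Double.BYTES;
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // A READ_WRITE mapping grows the file to `size` if it is smaller.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            buf.order(ByteOrder.LITTLE_ENDIAN);
            buf.putInt(values.length);
            for (double v : values) {
                buf.putDouble(v);
            }
            buf.force(); // ask the OS to flush the mapped pages to the file
        }
    }

    public static double[] readDoubles(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.order(ByteOrder.LITTLE_ENDIAN);
            int count = buf.getInt();
            // Copying into a heap array is for demonstration only; code that
            // wants to avoid the copy would read from `buf` directly.
            double[] out = new double[count];
            for (int i = 0; i < count; i++) {
                out[i] = buf.getDouble();
            }
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the named file both processes would agree on.
        Path file = Files.createTempFile("shared-region", ".bin");
        writeDoubles(file, new double[] {1.5, 2.5, 3.5});
        System.out.println(Arrays.toString(readDoubles(file)));
        Files.deleteIfExists(file);
    }
}
```

Note that this covers only a single JVM on a single machine. As Robin points out, a Spark job runs many JVMs across the cluster, so an approach like this would need a C++ process and a mapped file on every node, plus some agreed synchronization between writer and reader, which the sketch does not address.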
