Re: Shared memory between C++ process and Spark

2015-12-07 Thread Annabel Melongo
My guess is that Jia wants to run C++ on top of Spark. If that's the case, I'm afraid this is not possible. Spark has support for Java, Python, Scala and R. The best way to achieve this is to run your application in C++ and use the data created by said application to do manipulation within

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Robin East
I guess you could write a custom RDD that can read data from a memory-mapped file - not really my area of expertise so I’ll leave it to other members of the forum to chip in with comments as to whether that makes sense. But if you want ‘fancy analytics’ then won’t the processing time more than
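A full custom RDD would mean subclassing RDD and implementing its partition/compute methods, which is beyond a quick sketch, but the memory-mapped read itself can be shown in plain Python with the standard `mmap` module. This is a minimal sketch, not Spark code; the record format (little-endian 32-bit ints, fixed width) and the file path are purely hypothetical:

```python
import mmap
import os
import struct
import tempfile

# Hypothetical setup: write a few fixed-width binary records to a file.
path = os.path.join(tempfile.mkdtemp(), "records.bin")
with open(path, "wb") as f:
    for i in range(4):
        f.write(struct.pack("<i", i * 10))  # little-endian 32-bit int

# Read the records back through a memory map instead of normal file I/O.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    records = [struct.unpack_from("<i", mm, off)[0]
               for off in range(0, len(mm), 4)]
    mm.close()

print(records)  # [0, 10, 20, 30]
```

Inside a custom RDD, each partition could map its own byte range of the file this way, so the C++ process and Spark share the data without a copy through sockets or disk serialization.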

Fwd: Oozie SparkAction not able to use spark conf values

2015-12-07 Thread Rajadayalan Perumalsamy
Hi We are trying to change our existing oozie workflows to use SparkAction instead of ShellAction. We are passing spark configuration in spark-opts with --conf, but these values are not accessible in Spark and it is throwing an error. Please note we are able to use SparkAction successfully in
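For reference, a minimal SparkAction sketch using the `uri:oozie:spark-action:0.1` schema; the action name, class, jar property, and the two `--conf` settings below are hypothetical placeholders, not values from the original thread:

```xml
<action name="spark-job">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>MySparkJob</name>
        <class>com.example.MySparkJob</class>
        <jar>${appJar}</jar>
        <!-- Multiple --conf entries go in a single spark-opts element -->
        <spark-opts>--conf spark.executor.memory=4g --conf spark.myapp.setting=foo</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
```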

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Annabel Melongo
Jia, I'm so confused on this. The architecture of Spark is to run on top of HDFS. What you're requesting, reading and writing to a C++ process, is not part of that requirement. On Monday, December 7, 2015 1:42 PM, Jia wrote: Thanks, Annabel, but I may need

RE: How to create dataframe from SQL Server SQL query

2015-12-07 Thread Wang, Ningjun (LNG-NPV)
This is a very helpful article. Thanks for the help. Ningjun From: Sujit Pal [mailto:sujitatgt...@gmail.com] Sent: Monday, December 07, 2015 12:42 PM To: Wang, Ningjun (LNG-NPV) Cc: user@spark.apache.org Subject: Re: How to create dataframe from SQL Server SQL query Hi Ningjun, Haven't done

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Robin East
Annabel Spark works very well with data stored in HDFS but is certainly not tied to it. Have a look at the wide variety of connectors to things like Cassandra, HBase, etc. Robin Sent from my iPhone > On 7 Dec 2015, at 18:50, Annabel Melongo wrote: > > Jia, > >

How to build Spark with Ganglia to enable monitoring using Ganglia

2015-12-07 Thread SRK
Hi, How do I do a Maven build to enable monitoring using Ganglia? What is the command for that? Thanks, Swetha -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-build-Spark-with-Ganglia-to-enable-monitoring-using-Ganglia-tp25625.html
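The Ganglia sink is LGPL-licensed, so it is kept out of the default Spark build behind a Maven profile. A sketch of the build command, assuming the `spark-ganglia-lgpl` profile name from the Spark build; the Hadoop profile and the use of `build/mvn` are assumptions to adjust for your cluster:

```
# Build Spark with the Ganglia metrics sink enabled (profile name from the
# Spark build; the Hadoop profile is an assumption -- match your cluster).
./build/mvn -Pspark-ganglia-lgpl -Phadoop-2.6 -DskipTests clean package
```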

Removing duplicates from dataframe

2015-12-07 Thread Ross.Cramblit
I have a pyspark app loading a large-ish (100GB) dataframe from JSON files, and it turns out there are a number of duplicate JSON objects in the source data. I am trying to find the best way to remove these duplicates before using the dataframe. With both df.dropDuplicates() and

python rdd.partionBy(): any examples of a custom partitioner?

2015-12-07 Thread Keith Freeman
I'm not a python expert, so I'm wondering if anybody has a working example of a partitioner for the "partitionFunc" argument (default "portable_hash") to rdd.partitionBy()?
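A partitionFunc for `rdd.partitionBy(numPartitions, partitionFunc)` is just a callable that maps a key to an int; Spark takes that value modulo numPartitions to pick the target partition. A minimal sketch in plain Python (the keys and the first-letter scheme are hypothetical, and the commented usage assumes a pyspark pair RDD):

```python
def first_letter_partitioner(key):
    # Route string keys by their first letter so that, e.g., "apple" and
    # "apricot" land in the same partition.
    return ord(key[0].lower()) - ord("a")

# With a pyspark RDD of (key, value) pairs, usage would look like:
#   rdd.partitionBy(8, first_letter_partitioner)

print(first_letter_partitioner("apple"))   # 0
print(first_letter_partitioner("Banana"))  # 1
```

The function only needs to be deterministic and picklable; any Python def at module level (not a lambda defined in the shell) works.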
