Re: Using data frames to join separate RDDs in spark streaming

2016-06-05 Thread Cyril Scetbon
Problem solved by creating only one RDD. > On Jun 1, 2016, at 14:05, Cyril Scetbon <cyril.scet...@free.fr> wrote: > > It seems that to join a DStream with an RDD I can use: > > mgs.transform(rdd => rdd.join(rdd1)) > > or > > mgs.foreachRDD(rdd => rdd.joi
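The transform-based join quoted above combines each micro-batch RDD with a static pair RDD by key. As a plain-Python sketch of the pair-join semantics that `rdd.join(rdd1)` implements (no Spark involved; `batch` and `rdd1` are hypothetical sample data):

```python
def pair_join(left, right):
    # Inner join of two lists of (key, value) pairs,
    # mimicking Spark's pair-RDD join semantics.
    index = {}
    for k, v in right:
        index.setdefault(k, []).append(v)
    return [(k, (v, w)) for k, v in left for w in index.get(k, [])]

# Hypothetical sample data: one micro-batch and a static lookup RDD.
batch = [("a", 1), ("b", 2), ("c", 3)]
rdd1 = [("a", "x"), ("b", "y")]

print(pair_join(batch, rdd1))  # [('a', (1, 'x')), ('b', (2, 'y'))]
```

Keys present on only one side ("c" here) are dropped, as in any inner join.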

Spark Streaming w/variables used as dynamic queries

2016-06-03 Thread Cyril Scetbon
Hey guys, can someone help me solve the following issue? My code is: var arr = new ArrayBuffer[String]() sa_msgs.map(x => x._1) .foreachRDD { rdd => arr = new ArrayBuffer[String]() } (2) sa_msgs.map(x => x._1) .foreachRDD { rdd => arr ++=
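The snippet above reassigns a driver-side ArrayBuffer inside foreachRDD and reuses it elsewhere as a dynamic query. A plain-Python sketch (hypothetical names, no Spark) of the underlying pitfall: a closure that snapshots a mutable variable at creation time diverges from one that re-reads it on every call.

```python
arr = []  # mutable driver-side state, updated batch by batch

def make_filter_stale(values):
    snapshot = list(values)        # freezes the value at creation time
    return lambda x: x in snapshot

def make_filter_live(values):
    return lambda x: x in values   # re-reads the mutable list on each call

stale = make_filter_stale(arr)
live = make_filter_live(arr)
arr.append("a")                    # a later "batch" updates the state

print(stale("a"), live("a"))  # False True
```

Whether the streaming transformation sees the updated variable depends on when and where its closure is evaluated, which is exactly what makes this pattern fragile.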

Re: Using data frames to join separate RDDs in spark streaming

2016-06-01 Thread Cyril Scetbon
It seems that to join a DStream with an RDD I can use: mgs.transform(rdd => rdd.join(rdd1)) or mgs.foreachRDD(rdd => rdd.join(rdd1)) But I can't see why rdd1.toDF("id","aid") really causes SPARK-5063 > On Jun 1, 2016, at 12:00, Cyril Scetbon <cyril.scet

Using data frames to join separate RDDs in spark streaming

2016-06-01 Thread Cyril Scetbon
t, or do I need to do everything using RDD joins only? And if so, how do I do it, given that I have an RDD (rdd1) and a DStream (mgs)? Thanks -- Cyril SCETBON - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org

Re: Joining a RDD to a Dataframe

2016-05-12 Thread Cyril Scetbon
ataFrame.scala:152) Is there a better way to query nested objects and to join a DF containing nested objects with another regular data frame (yes, that's the current case)? > On May 9, 2016, at 00:42, Cyril Scetbon <cyril.scet...@free.fr> wrote: > > Hi Ashish, > > The issue

Re: Joining a RDD to a Dataframe

2016-05-08 Thread Cyril Scetbon
ent on it and maybe give me advice. Thank you. > On May 8, 2016, at 22:12, Ashish Dubey <ashish@gmail.com> wrote: > > Is there any reason you don't want to convert this - I don't think a join b/w an RDD > and a DF is supported. > > On Sat, May 7, 2016 at 11:41 PM, Cyril S

Joining a RDD to a Dataframe

2016-05-08 Thread Cyril Scetbon
Hi, I have an RDD built during a spark streaming job and I'd like to join it to a DataFrame (E/S input) to enrich it. It seems that I can't join the RDD and the DF without first converting the RDD to a DF (tell me if I'm wrong). Here are the schemas of both DFs: scala> df res32:
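Once the RDD has been turned into a DataFrame, the enrichment itself is an ordinary keyed join. A plain-Python sketch of that enrichment step (hypothetical field names; no Spark or Elasticsearch involved):

```python
def enrich(records, lookup, key):
    # Left-join a list of dict records against a lookup table,
    # mimicking the DataFrame join used to enrich the stream.
    return [{**r, **lookup.get(r[key], {})} for r in records]

# Hypothetical sample data: streamed records and E/S reference data.
stream_records = [{"id": 1, "msg": "hello"}, {"id": 2, "msg": "bye"}]
es_reference = {1: {"user": "alice"}, 2: {"user": "bob"}}

print(enrich(stream_records, es_reference, "id"))
```

Records whose key is missing from the lookup table pass through unchanged, which is the left-join behaviour you usually want when enriching.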

Re: spark-shell failing but pyspark works

2016-04-04 Thread Cyril Scetbon
The only way I've found to make it work now is by using the current spark context and changing its configuration using spark-shell options, which is really different from pyspark, where you can't instantiate a new one, initialize it, etc. > On Apr 4, 2016, at 18:16, Cyril Scetbon <cyri

Re: spark-shell failing but pyspark works

2016-04-04 Thread Cyril Scetbon
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw> > > http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/> > > > On 4 April 2016 at 21:11, Cyril Scetbon <cyril.scet...@free.fr > <mai

Re: spark-shell failing but pyspark works

2016-04-04 Thread Cyril Scetbon
> Dr Mich Talebzadeh > > LinkedIn > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw> > > http://talebzadehmich.wordpress.com <http://tal

Re: spark-shell failing but pyspark works

2016-04-02 Thread Cyril Scetbon
Nobody has any idea? > On Mar 31, 2016, at 23:22, Cyril Scetbon <cyril.scet...@free.fr> wrote: > > Hi, > > I'm having issues creating a StreamingContext with Scala using spark-shell. > It tries to access the localhost interface and the Application Master is not >

spark-shell failing but pyspark works

2016-03-31 Thread Cyril Scetbon
e differences. Are there any parameters set using Scala but not Python? Thanks -- Cyril SCETBON

Re: StateSpec raises error "missing arguments for method"

2016-03-27 Thread Cyril Scetbon
scala> val spec = StateSpec.function(mappingFunction) spec: org.apache.spark.streaming.StateSpec[String,Int,Int,Option[String]] = StateSpecImpl() This code comes from one of the Spark examples. Can anyone explain the difference between the two? Regards > On Mar 27, 2016, at 23:57, Cyril Scet

StateSpec raises error "missing arguments for method"

2016-03-27 Thread Cyril Scetbon
Hi, I'm testing some sample code and I get this error. Here is the code I use: scala> import org.apache.spark.streaming._ import org.apache.spark.streaming._ scala> def mappingFunction(key: String, value: Option[Int], state: State[Int]): Option[String] = { | Option(key) | }
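For context, the mappingFunction signature above is what StateSpec.function expects: it receives a key, the new value, and the per-key state, and returns an output element. A toy plain-Python model of that stateful mapping over micro-batches (a hypothetical helper, not Spark's implementation):

```python
def map_with_state(batches, mapping_fn):
    # Fold each (key, value) micro-batch through per-key state,
    # collecting whatever mapping_fn emits - a toy model of mapWithState.
    state = {}
    emitted = []
    for batch in batches:
        for key, value in batch:
            out, state[key] = mapping_fn(key, value, state.get(key, 0))
            emitted.append(out)
    return emitted, state

# Running count per key, emitting the key like the Option(key) example.
def mapping_fn(key, value, count):
    return key, count + value

emitted, final_state = map_with_state([[("a", 1), ("b", 2)], [("a", 3)]], mapping_fn)
print(emitted, final_state)  # ['a', 'b', 'a'] {'a': 4, 'b': 2}
```

The separation between the emitted value and the updated state mirrors why the Scala function both returns an Option[String] and mutates the State[Int] argument.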

Re: python support of mapWithState

2016-03-24 Thread Cyril Scetbon
tend to be easier to keep in sync since they are both in the JVM and a > bit more work is needed to expose the same functionality from the JVM in > Python (or re-implement the Scala code in Python where appropriate). > > On Thursday, March 24, 2016, Cyril Scetbon <cyril.scet...@free.fr

python support of mapWithState

2016-03-24 Thread Cyril Scetbon
Hi guys, it seems that mapWithState is not supported in Python. Am I right? Is there a reason there are 3 languages supported and one is behind the two others? Thanks

Re: Spark Streaming with Kafka DirectStream

2016-02-17 Thread Cyril Scetbon
I don't think we can print an integer value in a spark streaming process, as opposed to a spark job. I think I can print the content of an rdd but not debug messages. Am I wrong? Cyril Scetbon > On Feb 17, 2016, at 12:51 AM, ayan guha <guha.a...@gmail.com> wrote: > > Hi >

Re: Spark Streaming with Kafka DirectStream

2016-02-16 Thread Cyril Scetbon
slightly different understanding. > > Direct stream generates 1 RDD per batch; however, the number of partitions in > that RDD = the number of partitions in the kafka topic. > > On Wed, Feb 17, 2016 at 12:18 PM, Cyril Scetbon <cyril.scet...@free.fr > <mailto:cyril.scet...@free.fr>>
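The reply above describes the direct approach's layout: one RDD per batch, one RDD partition per Kafka partition, each covering an offset range. A plain-Python sketch of that bookkeeping (hypothetical names; not Spark's actual code):

```python
def plan_batch(kafka_partitions, from_offsets, until_offsets):
    # One RDD per batch: each Kafka partition becomes one RDD partition
    # holding the offset range to read for this batch.
    return [(p, from_offsets[p], until_offsets[p]) for p in kafka_partitions]

partitions = ["p0", "p1", "p2"]
batch_rdd = plan_batch(partitions,
                       {"p0": 0, "p1": 10, "p2": 5},
                       {"p0": 4, "p1": 12, "p2": 5})

# Number of RDD partitions equals the number of Kafka partitions.
print(len(batch_rdd))  # 3
```

In real code, checking the partition count of each batch RDD (e.g. from inside foreachRDD) is the natural way to confirm this one-to-one mapping.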

Spark Streaming with Kafka DirectStream

2016-02-16 Thread Cyril Scetbon
Hi guys, I'm making some tests with Spark and Kafka using a Python script. I use the second method, which doesn't need any receiver (the Direct Approach). It should adapt the number of RDDs to the number of partitions in the topic. I'm trying to verify it. What's the easiest way to verify it? I