Re: Nested RDD operation

2017-09-19 Thread Jean Georges Perrin
Have you tried to cache? Maybe after the collect() and before the map?

> On Sep 19, 2017, at 7:20 AM, Daniel O' Shaughnessy wrote:
>
> Thanks for your response Jean.
>
> I managed to figure this out in the end but it's an extremely slow solution
> and not

Re: Nested RDD operation

2017-09-19 Thread ayan guha
How big is the list of fruits in your example? Can you broadcast it?

On Tue, 19 Sep 2017 at 9:21 pm, Daniel O' Shaughnessy <danieljamesda...@gmail.com> wrote:

> Thanks for your response Jean.
>
> I managed to figure this out in the end but it's an extremely slow
> solution and not tenable for
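The broadcast suggestion above could look roughly like the following. This is only a sketch under assumptions: the object and helper names (`BroadcastLookupSketch`, `indexWithBroadcast`) and the hard-coded label map are made up for illustration, and it presumes the lookup table is small enough to ship to every executor.

```scala
import org.apache.spark.sql.SparkSession

object BroadcastLookupSketch {
  // Hypothetical helper: replaces each label with its index using a
  // broadcast lookup table instead of touching a second RDD inside map().
  def indexWithBroadcast(spark: SparkSession,
                         rows: Seq[List[String]],
                         labelToIndex: Map[String, Int]): Array[List[Int]] = {
    val sc = spark.sparkContext
    // Ship the small lookup table to every executor once.
    val lookup = sc.broadcast(labelToIndex)
    sc.parallelize(rows)
      .map(fruits => fruits.map(f => lookup.value.getOrElse(f, -1))) // -1 marks unknown labels
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-sketch")
      .master("local[2]") // assumption: local mode, for illustration only
      .getOrCreate()
    val result = indexWithBroadcast(
      spark,
      Seq(List("apple", "orange", "apple", "pear", "pear")),
      Map("apple" -> 0, "pear" -> 1, "orange" -> 2))
    result.foreach(println) // prints List(0, 2, 0, 1, 1)
    spark.stop()
  }
}
```

Only plain Scala values are referenced inside the closure, so nothing on the executors needs access to the SparkContext.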

Re: Nested RDD operation

2017-09-19 Thread Daniel O' Shaughnessy
Thanks for your response Jean. I managed to figure this out in the end but it's an extremely slow solution and not tenable for my use-case:

    val rddX = dfWithSchema.select("event_name").rdd.map(_.getString(0).split(",").map(_.trim replaceAll ("[\\[\\]\"]", "")).toList)
    //val oneRow =

Re: Nested RDD operation

2017-09-15 Thread Jean Georges Perrin
Hey Daniel, not sure this will help, but... I had a similar need where I wanted the content of a dataframe to become a "cell" or a row in the parent dataframe. I grouped by the child dataframe, then collected it as a list in the parent dataframe after a join operation. As I said, not sure it
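One possible reading of the group / collect-as-list / join idea described above is sketched here. The table layout, the column names (`id`, `parent_id`, `fruit`, `fruits`), and the helper `nestChildren` are assumptions for illustration and are not taken from the thread.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.collect_list

object CollectListJoinSketch {
  // Hypothetical helper: folds all child rows of a parent into one array "cell".
  def nestChildren(parents: DataFrame, children: DataFrame): DataFrame = {
    // One row per parent_id, with the child values gathered into a list.
    val grouped = children
      .groupBy("parent_id")
      .agg(collect_list("fruit").as("fruits"))
    // A left join keeps parents that have no children (their "fruits" is null).
    parents
      .join(grouped, parents("id") === grouped("parent_id"), "left")
      .drop("parent_id")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("collect-list-sketch")
      .master("local[2]") // assumption: local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    val parents = Seq((1, "basket-a"), (2, "basket-b")).toDF("id", "name")
    val children = Seq((1, "apple"), (1, "pear"), (2, "orange")).toDF("parent_id", "fruit")

    nestChildren(parents, children).show(truncate = false)
    spark.stop()
  }
}
```

Note that `collect_list` does not promise any particular element order after a shuffle, so this fits cases where order within the list does not matter.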

Nested RDD operation

2017-09-15 Thread Daniel O' Shaughnessy
Hi guys, I'm having trouble implementing this scenario: I have a column with a typical entry being:

    ['apple', 'orange', 'apple', 'pear', 'pear']

I need to use a StringIndexer to transform this to:

    [0, 2, 0, 1, 1]

I'm attempting to do this but because of the nested operation on another RDD I
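One way to index the elements of an array column without nesting RDD operations is to fit the StringIndexer on the flattened values once and then apply its label table row by row. The sketch below is not the poster's solution: the helper name `indexArrayColumn` and the column names are assumptions, and it reads the fitted labels through `labels`, which Spark 3 deprecates in favor of `labelsArray`.

```scala
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, udf}

object IndexArrayColumnSketch {
  // Hypothetical helper: adds outputCol holding the StringIndexer index
  // of every element of the array-of-strings column inputCol.
  def indexArrayColumn(df: DataFrame, inputCol: String, outputCol: String): DataFrame = {
    // Flatten the arrays so the indexer sees one label per row.
    val flat = df.select(explode(col(inputCol)).as("label"))
    val model = new StringIndexer()
      .setInputCol("label")
      .setOutputCol("labelIndex")
      .fit(flat)

    // Position in model.labels is the index StringIndexer assigns
    // (most frequent label first by default).
    val labelToIndex: Map[String, Double] =
      model.labels.zipWithIndex.map { case (label, i) => label -> i.toDouble }.toMap

    // The map is a plain Scala value, so the UDF closure can carry it to executors.
    val indexAll = udf((labels: Seq[String]) => labels.map(labelToIndex))
    df.withColumn(outputCol, indexAll(col(inputCol)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("index-array-sketch")
      .master("local[2]") // assumption: local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    val df = Seq(Seq("apple", "orange", "apple", "pear", "pear")).toDF("fruits")
    indexArrayColumn(df, "fruits", "fruitIndexes").show(truncate = false)
    spark.stop()
  }
}
```

A label that was not present when the indexer was fitted would make the lookup throw, so the indexer should be fitted on data covering every value that will later be transformed.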

nested rdd operation

2014-09-10 Thread Pavlos Katsogridakis
Hi, I have a question on Spark. This program on spark-shell:

    val filerdd = sc.textFile(NOTICE,2)
    val maprdd = filerdd.map( word => filerdd.map( word2 => (word2+word) ) )
    maprdd.collect()

throws a null pointer exception. Can somebody explain why I cannot have a nested RDD operation? --pavlos

Re: nested rdd operation

2014-09-10 Thread Sean Owen
cannot have a nested rdd operation ? --pavlos
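For context: the function passed to `map` runs on the executors, where an RDD cannot be used to start further work because RDD operations have to be invoked from the driver. The nested map from the question can usually be rewritten as a single operation over all pairs. The following is a minimal sketch using `cartesian`; the helper name `concatAllPairs` and the sample data are assumptions for illustration.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object CartesianSketch {
  // Hypothetical helper: builds word2 + word for every pair of elements,
  // which is what the nested map in the question was trying to compute.
  def concatAllPairs(sc: SparkContext, words: Seq[String]): Array[String] = {
    val rdd = sc.parallelize(words, 2)
    rdd.cartesian(rdd) // every (word, word2) combination, built by Spark itself
      .map { case (word, word2) => word2 + word }
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cartesian-sketch")
      .master("local[2]") // assumption: local mode, for illustration only
      .getOrCreate()
    concatAllPairs(spark.sparkContext, Seq("a", "b", "c")).sorted.foreach(println)
    spark.stop()
  }
}
```

`cartesian` produces n × m pairs, so it is only practical when at least one side is small; for a small side, collecting it and broadcasting the values is a common alternative.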