Have you tried to cache? Maybe after the collect() and before the map?
> On Sep 19, 2017, at 7:20 AM, Daniel O' Shaughnessy wrote:
>
> Thanks for your response Jean.
>
> I managed to figure this out in the end but it's an extremely slow solution
> and not
How big is the list of fruits in your example? Can you broadcast it?
On Tue, 19 Sep 2017 at 9:21 pm, Daniel O' Shaughnessy <
danieljamesda...@gmail.com> wrote:
> Thanks for your response Jean.
>
> I managed to figure this out in the end but it's an extremely slow
> solution and not tenable for
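If the distinct fruit list is small, the whole label-to-index mapping fits on the driver and can be shipped once to each executor instead of once per task. A Spark-free sketch of the idea (in Spark you would wrap the map as `val bc = sc.broadcast(labelToIndex)` and read `bc.value(label)` inside the closure; the names here are illustrative):

```scala
// The lookup table is built once on the driver and is read-only
// inside the map closure -- exactly the broadcast-variable pattern.
val labelToIndex = Map("apple" -> 0, "pear" -> 1, "orange" -> 2)
val indexedRow = List("apple", "orange", "pear").map(labelToIndex)
// indexedRow: List(0, 2, 1)
```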
Thanks for your response Jean.
I managed to figure this out in the end but it's an extremely slow solution
and not tenable for my use-case:
val rddX = dfWithSchema.select("event_name").rdd
  .map(_.getString(0).split(",").map(_.trim.replaceAll("[\\[\\]\"]", "")).toList)
//val oneRow =
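For reference, StringIndexer's default ordering assigns index 0 to the most frequent label; ties are broken here alphabetically so the result is deterministic (hedge: exact tie-breaking varies by Spark version). The mapping itself is cheap to build on the driver, as this plain-Scala sketch of one row shows:

```scala
// Plain-Scala model of the index a StringIndexer would build for one row.
val row = List("apple", "orange", "apple", "pear", "pear")

// Count occurrences of each label.
val counts: Map[String, Int] =
  row.groupBy(identity).map { case (label, occs) => label -> occs.size }

// Sort labels by descending frequency, then alphabetically:
// apple and pear both occur twice, orange once,
// so apple -> 0, pear -> 1, orange -> 2.
val labelToIndex: Map[String, Int] =
  row.distinct.sortBy(label => (-counts(label), label)).zipWithIndex.toMap

val indexed = row.map(labelToIndex)
// indexed: List(0, 2, 0, 1, 1)
```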
Hey Daniel, not sure this will help, but... I had a similar need where I wanted
the content of a dataframe to become a "cell" or a row in the parent dataframe.
I grouped the child dataframe, then collected it as a list in the parent
dataframe after a join operation. As I said, not sure it
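A Spark-free model of that group-then-collect step (names hypothetical; the Spark SQL equivalent would be along the lines of `child.groupBy("parentId").agg(collect_list("value"))`, joined back onto the parent dataframe):

```scala
// Collapse child rows into one list per parent id,
// preserving the child rows' original order within each group.
val childRows = List((1, "a"), (1, "b"), (2, "c"))
val collected: Map[Int, List[String]] =
  childRows.groupBy(_._1).map { case (id, rows) => id -> rows.map(_._2) }
// collected: Map(1 -> List("a", "b"), 2 -> List("c"))
```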
Hi guys,
I'm having trouble implementing this scenario:
I have a column with a typical entry being : ['apple', 'orange', 'apple',
'pear', 'pear']
I need to use a StringIndexer to transform this to : [0, 2, 0, 1, 1]
I'm attempting to do this but because of the nested operation on another
RDD I
Hi,
I have a question on Spark. This program on spark-shell
val filerdd = sc.textFile("NOTICE", 2)
val maprdd = filerdd.map(word => filerdd.map(word2 => word2 + word))
maprdd.collect()
throws a NullPointerException.
Can somebody explain why I cannot have a nested RDD operation?
--pavlos
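For context: an RDD handle (and the SparkContext behind it) exists only on the driver. When `filerdd` is captured inside the outer map's closure, the copy that reaches the executors has a null context, hence the NPE. The usual rewrite is to form the pairs first and then apply one map; in Spark that would be something like `filerdd.cartesian(filerdd).map { case (w, w2) => w2 + w }`. A Spark-free sketch of that reshaping:

```scala
// Build all (word, word2) pairs first, then map over the pairs --
// no nested traversal inside a closure.
val words = List("a", "b")
val combined = for (w <- words; w2 <- words) yield w2 + w
// combined: List("aa", "ba", "ab", "bb")
```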
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org