Hi,
I have analytics data with timestamps on each element. I'd like to analyze
consecutive elements using Spark, but haven't figured out how to do this.
Essentially what I'd want is a transform from a sorted RDD [A, B, C, D, E]
to an RDD [(A,B), (B,C), (C,D), (D,E)]. (Or some other way to analyze
consecutive elements.)
Hi Sampo,
You could try zipWithIndex followed by a self join with shifted index
values like this:
val arr = Array((1, "A"), (8, "D"), (7, "C"), (3, "B"), (9, "E"))
val rdd = sc.parallelize(arr)
val sorted = rdd.sortByKey(true)
val zipped = sorted.zipWithIndex.map(x => (x._2, x._1))
val pairs =
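(The snippet is cut off at "val pairs =". A minimal sketch of how the shifted-index
self join could be finished, continuing from the "zipped" value above; this is an
assumption based on the WrappedArray output quoted later in the thread, not
necessarily the original code.)

val shifted = zipped.map(x => (x._1 - 1, x._2))           // re-key each element to the previous index
val pairs = zipped.join(shifted).sortByKey(true).values   // (element, next element), in timestamp order
println(pairs.collect().toSeq)
// prints WrappedArray(((1,A),(3,B)), ((3,B),(7,C)), ((7,C),(8,D)), ((8,D),(9,E)))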
Hi,
Sorry, I'm not very familiar with those methods and cannot find the 'drop'
method anywhere.
As an example:
val arr = Array((1, "A"), (8, "D"), (7, "C"), (3, "B"), (9, "E"))
val rdd = sc.parallelize(arr)
val sorted = rdd.sortByKey(true)
// ... then what?
Thanks.
Best regards,
Sampo
Otherwise you could try converting your RDD to a DataFrame and then using the
window functions in Spark SQL, such as LEAD/LAG.
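(Not part of the original message: a minimal sketch of the window-function route,
assuming Spark 1.4+ with a HiveContext available as sqlContext in the shell; the
column names "ts" and "value" are illustrative.)

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lead, col}

val df = sqlContext.createDataFrame(Seq((1, "A"), (3, "B"), (7, "C"), (8, "D"), (9, "E")))
  .toDF("ts", "value")
val w = Window.orderBy("ts")                                  // order rows by timestamp
val pairs = df.withColumn("next", lead("value", 1).over(w))  // attach the following element to each row
  .where(col("next").isNotNull)                              // the last row has no successor
pairs.show()

Note that a window with no partitionBy clause pulls all rows into a single
partition, so this is only comfortable when the ordered data fits on one executor.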
Best regards,
Fanilo
From: Dylan Hogg [mailto:dylanh...@gmail.com]
Sent: Thursday, October 22, 2015 13:44
To: Sampo Niskanen
Cc:
> sc.stop()
> }
> }
>
> prints
>
> WrappedArray(((1,A),(3,B)), ((3,B),(7,C)), ((7,C),(8,D)), ((8,D),(9,E)))