Analyzing consecutive elements

2015-10-22 Thread Sampo Niskanen
Hi, I have analytics data with timestamps on each element. I'd like to analyze consecutive elements using Spark, but haven't figured out how to do this. Essentially what I'd want is a transform from a sorted RDD [A, B, C, D, E] to an RDD [(A,B), (B,C), (C,D), (D,E)]. (Or some other way to
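For a plain Scala collection the transform Sampo describes is just zipping the sequence with its own tail; the open question in the thread is how to get the same result for a distributed RDD, which does not expose collection methods such as drop or tail. A minimal local (non-Spark) sketch of the intended input/output, using letters as in the example:

    val xs = List("A", "B", "C", "D", "E")
    // pair each element with its successor
    val pairs = xs.zip(xs.tail)
    // pairs == List((A,B), (B,C), (C,D), (D,E))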

Re: Analyzing consecutive elements

2015-10-22 Thread Dylan Hogg
Hi Sampo, You could try zipWithIndex followed by a self join with shifted index values like this: val arr = Array((1, "A"), (8, "D"), (7, "C"), (3, "B"), (9, "E")) val rdd = sc.parallelize(arr) val sorted = rdd.sortByKey(true) val zipped = sorted.zipWithIndex.map(x => (x._2, x._1)) val pairs =
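The preview is cut off at the join step. Below is a sketch of how the zipWithIndex-plus-shifted-index self join could be completed; the shift by one and the final inner join are filled in from the description and are not Dylan's exact code:

    val arr = Array((1, "A"), (8, "D"), (7, "C"), (3, "B"), (9, "E"))
    val rdd = sc.parallelize(arr)
    val sorted = rdd.sortByKey(true)
    // key each element by its position in the sorted order
    val zipped = sorted.zipWithIndex.map(x => (x._2, x._1))
    // shift one copy's index by -1 so that position i lines up with position i+1
    val shifted = zipped.map { case (i, v) => (i - 1, v) }
    // an inner join on the index keeps only positions that have a successor
    val pairs = zipped.join(shifted).sortByKey(true).values
    // pairs.collect() gives ((1,A),(3,B)), ((3,B),(7,C)), ((7,C),(8,D)), ((8,D),(9,E)),
    // matching the output reported later in the thread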

Re: Analyzing consecutive elements

2015-10-22 Thread Sampo Niskanen
Hi, sorry, I'm not very familiar with those methods and cannot find the 'drop' method anywhere. As an example: val arr = Array((1, "A"), (8, "D"), (7, "C"), (3, "B"), (9, "E")) val rdd = sc.parallelize(arr) val sorted = rdd.sortByKey(true) // ... then what? Thanks. Best regards, Sampo

Re: Analyzing consecutive elements

2015-10-22 Thread Adrian Tanase
-adrian [preview contains only the quoted text of Sampo Niskanen's message above]

RE: Analyzing consecutive elements

2015-10-22 Thread Andrianasolo Fanilo
((7,C),(8,D)), ((8,D),(9,E))) Otherwise you could try to convert your RDD to a DataFrame then use windowing functions in SparkSQL with the LEAD/LAG functions. Best regards, Fanilo From: Dylan Hogg [mailto:dylanh...@gmail.com] Sent: Thursday, October 22, 2015 13:44 To: Sampo Niskanen Cc:
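A sketch of the DataFrame alternative Fanilo mentions, using the lag window function ordered by the timestamp column; the column names ("ts", "value") and the sqlContext used here are assumptions, and on Spark versions of that era window functions may require a HiveContext:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.lag

    // assumes a SQLContext/HiveContext named sqlContext is available
    import sqlContext.implicits._

    val df = sc.parallelize(Array((1, "A"), (8, "D"), (7, "C"), (3, "B"), (9, "E")))
      .toDF("ts", "value")

    // note: an ORDER BY window with no PARTITION BY moves all rows to a single partition
    val w = Window.orderBy("ts")

    val consecutive = df
      .withColumn("prev_ts", lag("ts", 1).over(w))
      .withColumn("prev_value", lag("value", 1).over(w))
      .where($"prev_ts".isNotNull) // the first row has no predecessor

    consecutive.show()

Each output row then carries both the previous and the current element, which is the same pairing the RDD-based approaches above produce.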

Re: Analyzing consecutive elements

2015-10-22 Thread Sampo Niskanen
sc.stop() } } prints WrappedArray(((1,A),(3,B)), ((3,B),(7,C)), ((7,C),(8,D)), ((8,D),(9,E))) Otherwise you could try to convert your RDD to a DataFrame then use windowing functions in SparkSQL with the LEAD/LAG f