Drop is a method on Scala's collections (Array, List, etc.), not on Spark's RDDs. You can use it inside mapPartitions or in something like reduceByKey, where you are working with plain Scala collections, but whether that fits depends entirely on your analytics use cases.
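To illustrate the point about Scala collections, here is a minimal sketch with no Spark involved. The object and method names (ConsecutivePairs, ofList, ofIterator) are made up for this example. It assumes the data is either a strict List, or an Iterator such as the one mapPartitions hands to your function.

```scala
object ConsecutivePairs {
  // Strict collection: zip the list with itself shifted by one element.
  // zip stops at the shorter side, so the last element has no partner.
  def ofList[T](xs: List[T]): List[(T, T)] = xs.zip(xs.drop(1))

  // An Iterator can only be traversed once, so zip/drop on the same
  // iterator is not safe. sliding(2) yields overlapping windows instead;
  // collect keeps only the full windows of two elements.
  def ofIterator[T](it: Iterator[T]): Iterator[(T, T)] =
    it.sliding(2).collect { case Seq(a, b) => (a, b) }
}
```

For example, ConsecutivePairs.ofList(List("A", "B", "C")) gives List((A,B), (B,C)). If ofIterator is passed to mapPartitions, the pair that straddles two partitions is never produced, which is why this only works when each partition (or key) can be analyzed on its own.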
The others have suggested better solutions using only Spark's APIs.

-adrian

From: Sampo Niskanen
Date: Thursday, October 22, 2015 at 2:12 PM
To: Adrian Tanase
Cc: user
Subject: Re: Analyzing consecutive elements

Hi,

Sorry, I'm not very familiar with those methods and cannot find the 'drop' method anywhere.

As an example:

    val arr = Array((1, "A"), (8, "D"), (7, "C"), (3, "B"), (9, "E"))
    val rdd = sc.parallelize(arr)
    val sorted = rdd.sortByKey(true)
    // ... then what?

Thanks.

Best regards,

Sampo Niskanen
Lead developer / Wellmo
sampo.niska...@wellmo.com
+358 40 820 5291

On Thu, Oct 22, 2015 at 10:43 AM, Adrian Tanase <atan...@adobe.com> wrote:

I'm not sure if there is a better way to do it directly using Spark APIs, but I would try to use mapPartitions and then, within each partition's Iterable, do:

    rdd.zip(rdd.drop(1))

using the Scala collection APIs. This should give you what you need inside a partition.

I'm hoping that you can partition your data somehow (e.g. by user id or session id) in a way that makes your algorithm parallel. In that case you can use the snippet above in a reduceByKey.

Hope this helps,
-adrian

On 22 Oct 2015, at 09:36, Sampo Niskanen <sampo.niska...@wellmo.com> wrote:

Hi,

I have analytics data with timestamps on each element. I'd like to analyze consecutive elements using Spark, but haven't figured out how to do this.

Essentially what I'd want is a transform from a sorted RDD [A, B, C, D, E] to an RDD [(A,B), (B,C), (C,D), (D,E)]. (Or some other way to analyze time-related elements.)

How can this be achieved?

Sampo Niskanen
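For the transform asked about in the original question, one possible approach using only core Spark APIs is to index the sorted RDD and join it with a copy of itself shifted by one position. This is a sketch under assumptions, not necessarily what others on the thread proposed: the object and method names (ConsecutiveRddPairs, pairs) are invented here, and the join causes a shuffle, so it may be costly on large data.

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

object ConsecutiveRddPairs {
  // Turns an ordered RDD [A, B, C, ...] into [(A,B), (B,C), ...].
  // Unlike a per-partition approach, pairs that span partition
  // boundaries are included.
  def pairs[T: ClassTag](sorted: RDD[T]): RDD[(T, T)] = {
    // Key every element by its position in the RDD's current order.
    val indexed = sorted.zipWithIndex().map { case (elem, i) => (i, elem) }
    // Re-key every element with the index of its predecessor.
    val shifted = indexed.map { case (i, elem) => (i - 1, elem) }
    // Joining on the index pairs element i with element i + 1;
    // sortByKey restores the original order after the shuffle.
    indexed.join(shifted).sortByKey().values
  }
}
```

With the example from the thread, ConsecutiveRddPairs.pairs(sorted) would pair (1,A) with (3,B), (3,B) with (7,C), and so on, and the timestamps of both elements remain available in each pair.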