Re: Implementing FIRST_VALUE, LEAD, LAG in Spark

2015-02-17 Thread Dmitry Tolpeko
I ended up with the following: def firstValue(items: Iterable[String]) = for { i - items } yield (i, items.head) data.groupByKey().map{case(a, b)=(a, firstValue(b))}.collect More details: http://dmtolpeko.com/2015/02/17/first_value-last_value-lead-and-lag-in-spark/ I would appreciate any

Implementing FIRST_VALUE, LEAD, LAG in Spark

2015-02-13 Thread Dmitry Tolpeko
Hello, To convert existing Map Reduce jobs to Spark, I need to implement window functions such as FIRST_VALUE, LEAD, LAG and so on. For example, FIRST_VALUE function: Source (1st column is key): A, A1 A, A2 A, A3 B, B1 B, B2 C, C1 and the result should be A, A1, A1 A, A2, A1 A, A3, A1 B, B1,