A not-so-efficient way would be this:

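val r0: RDD[OriginalRow] = ...
// Pair each row with its string key.
val r1 = r0.keyBy(row => extractKeyFromOriginalRow(row))
// Give each distinct key an index (zipWithIndex starts at 0, not 1).
val r2 = r1.keys.distinct().zipWithIndex()
// Join back on the string key and keep (index, row).
val r3 = r2.join(r1).values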
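
If the ids have to run from 1 to N as in the question, the same idea can be sketched as below. This is only a rough sketch: the fields of OriginalRow, the object name NumericKeys, and the helper names assignIds and assignIdsBroadcast are made up for illustration. The second helper swaps the join for a broadcast lookup table, which I would only expect to help when the set of distinct keys fits comfortably in driver memory.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits on older Spark releases
import org.apache.spark.rdd.RDD

object NumericKeys {
  // Hypothetical row type and key extractor, for illustration only.
  case class OriginalRow(key: String, payload: String)
  def extractKeyFromOriginalRow(row: OriginalRow): String = row.key

  // Join-based variant: same steps as above, shifted so ids start at 1.
  def assignIds(rows: RDD[OriginalRow]): RDD[(Long, OriginalRow)] = {
    val keyed = rows.keyBy(row => extractKeyFromOriginalRow(row))
    val ids = keyed.keys.distinct().zipWithIndex().mapValues(_ + 1L)
    ids.join(keyed).values
  }

  // Broadcast variant (assumes the distinct keys fit in driver memory):
  // collect the key -> id table once and look ids up in a plain map().
  def assignIdsBroadcast(rows: RDD[OriginalRow]): RDD[(Long, OriginalRow)] = {
    val table = rows.map(row => extractKeyFromOriginalRow(row))
      .distinct().zipWithIndex().mapValues(_ + 1L).collectAsMap()
    val lookup = rows.sparkContext.broadcast(table)
    rows.map(row => (lookup.value(extractKeyFromOriginalRow(row)), row))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("numeric-keys").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val rows = sc.parallelize(Seq(
      OriginalRow("apple", "r1"), OriginalRow("pear", "r2"), OriginalRow("apple", "r3")))
    // Which key gets which id is not guaranteed; only that ids are 1..N
    // and that rows sharing a string key share an id.
    assignIds(rows).collect().foreach(println)
    sc.stop()
  }
}
```

Both variants still need a pass over the data to number the distinct keys, so neither is free; the broadcast one just avoids shuffling the full rows in the join.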

On 11/18/14 8:54 PM, shahab wrote:


In my Spark application, I am loading some rows from a database into Spark RDDs. Each row has several fields and a string key. Due to my requirements, I need to work with consecutive numeric ids (from 1 to N, where N is the number of unique keys) instead of string keys. Several rows can also have the same string key.

In Spark, how can I map each row to (Numeric_Key, OriginalRow) using map/reduce tasks, such that rows with the same original string key get the same consecutive numeric key?

Any hints?

