One way, though not especially efficient, could be this:
val r0: RDD[OriginalRow] = ...
val r1 = r0.keyBy(row => extractKeyFromOriginalRow(row))  // (stringKey, row)
val r2 = r1.keys.distinct().zipWithIndex()                // (stringKey, numericId)
val r3 = r2.join(r1).values                               // (numericId, row)
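
For reference, here is a self-contained sketch of the same idea (the OriginalRow case class, the sample data, and extractKeyFromOriginalRow are placeholders I made up, not from your code). Since zipWithIndex numbers from 0, adding 1 to the index gives ids running from 1 to N as you asked:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ConsecutiveIds {
  // hypothetical row type standing in for the rows loaded from the database
  case class OriginalRow(key: String, payload: String)

  def extractKeyFromOriginalRow(row: OriginalRow): String = row.key

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("consecutive-ids").setMaster("local[*]"))

    val r0: RDD[OriginalRow] = sc.parallelize(Seq(
      OriginalRow("a", "row1"), OriginalRow("b", "row2"), OriginalRow("a", "row3")))

    // pair every row with its string key
    val r1: RDD[(String, OriginalRow)] = r0.keyBy(row => extractKeyFromOriginalRow(row))

    // one (stringKey, numericId) pair per distinct key; +1 makes the ids run 1..N
    val r2: RDD[(String, Long)] = r1.keys.distinct().zipWithIndex().mapValues(_ + 1)

    // join the ids back onto the rows and keep (numericId, row)
    val r3: RDD[(Long, OriginalRow)] = r2.join(r1).values

    r3.collect().foreach(println)  // e.g. (1,OriginalRow(a,row1)), (2,OriginalRow(b,row2)), ...
    sc.stop()
  }
}

Note that which key gets which id depends on how zipWithIndex orders the distinct keys across partitions, so the ids are consecutive but not tied to any particular key ordering.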
On 11/18/14 8:54 PM, shahab wrote:
Hi,
In my Spark application, I am loading some rows from a database into
Spark RDDs.
Each row has several fields and a string key. Due to my requirements,
I need to work with consecutive numeric ids (starting from 1 to N,
where N is the number of unique keys) instead of string keys. Also,
several rows can have the same string key.
In the Spark context, how can I map each row into (Numeric_Key,
OriginalRow) with map/reduce tasks such that rows with the same original
string key get the same consecutive numeric key?
Any hints?
best,
/Shahab