Hi,

In my Spark application, I load some rows from a database into Spark RDDs. Each row has several fields and a string key. Due to my requirements, I need to work with consecutive numeric IDs (from 1 to N, where N is the number of unique keys) instead of the string keys. Note that several rows can share the same string key.
Within a Spark context, how can I map each row to (numericKey, originalRow) using map/reduce operations, such that rows with the same original string key get the same consecutive numeric key? Any hints?

Best,
/Shahab
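P.S. To make the requirement concrete, here is a rough sketch of the mapping I am after, in plain Python rather than Spark (the sample `rows` data and the sorted, 1-based numbering are my own assumptions). In Spark terms, I imagine this corresponds to building a key-to-id table with something like `keys.distinct().zipWithIndex()` and then joining it back onto the original rows:

```python
# Sample (stringKey, row) pairs; several rows share the key "apple".
# (Hypothetical data, just to illustrate the desired transformation.)
rows = [("apple", "row1"), ("banana", "row2"), ("apple", "row3")]

# Step 1: collect the distinct string keys
# (Spark equivalent: rows.keys().distinct()).
# Sorting is only for a deterministic assignment in this sketch;
# any stable assignment of distinct keys would do.
distinct_keys = sorted({key for key, _ in rows})

# Step 2: assign consecutive ids 1..N
# (Spark equivalent: zipWithIndex(), which is 0-based, then add 1).
key_to_id = {key: i + 1 for i, key in enumerate(distinct_keys)}

# Step 3: replace each row's string key with its numeric id
# (Spark equivalent: rows.join(key_ids), or a broadcast-variable
# lookup inside a map() if the key table is small).
numbered = [(key_to_id[key], row) for key, row in rows]

print(numbered)  # -> [(1, 'row1'), (2, 'row2'), (1, 'row3')]
```

Rows with the same string key end up with the same numeric id, and the ids are consecutive from 1 to N. Is the `distinct` + `zipWithIndex` + `join` route the idiomatic way to do this on RDDs, or is there a better pattern?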