Thanks a lot, both solutions work.
best,
/Shahab
On Tue, Nov 18, 2014 at 5:28 PM, Daniel Siegmann <daniel.siegm...@velos.io> wrote:
I think zipWithIndex is zero-based, so if you want 1 to N, you'll need to
increment them like so:
val r2 = r1.keys.distinct().zipWithIndex().mapValues(_ + 1)
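The zero-based behaviour is easy to check in isolation: Scala's plain collections have the same `zipWithIndex`, so the `+ 1` shift can be seen without a SparkContext (a minimal sketch; the fruit keys are made up):

```scala
// Plain-Scala sketch (no Spark needed): collections share the zero-based
// zipWithIndex semantics, so the shift to 1..N is easy to verify locally.
object ZipWithIndexDemo {
  def main(args: Array[String]): Unit = {
    val keys = List("apple", "banana", "cherry")

    // Zero-based: List((apple,0), (banana,1), (cherry,2))
    val zeroBased = keys.zipWithIndex

    // Shift to 1..N, mirroring mapValues(_ + 1) on the RDD
    val oneBased = zeroBased.map { case (k, i) => (k, i + 1) }

    println(oneBased) // List((apple,1), (banana,2), (cherry,3))
  }
}
```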
Hi,
In my Spark application, I am loading some rows from a database into Spark RDDs. Each row has several fields and a string key. Due to my requirements I need to work with consecutive numeric ids (from 1 to N, where N is the number of unique keys) instead of string keys.
A not-so-efficient way could be this:
val r0: RDD[OriginalRow] = ...
val r1 = r0.keyBy(row => extractKeyFromOriginalRow(row))
val r2 = r1.keys.distinct().zipWithIndex()
val r3 = r2.join(r1).values
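The same pipeline can be traced end-to-end with plain Scala collections, no Spark required. `Row` and `extractKey` below are hypothetical stand-ins for `OriginalRow` and `extractKeyFromOriginalRow`, and the final `join` is simulated with a map lookup:

```scala
// Plain-Scala sketch of the RDD pipeline above (no SparkContext needed).
// Row and extractKey are made-up stand-ins for the original types.
object JoinPipelineSketch {
  case class Row(key: String, payload: Int)
  def extractKey(r: Row): String = r.key

  def main(args: Array[String]): Unit = {
    val r0 = List(Row("a", 10), Row("b", 20), Row("a", 30))

    val r1 = r0.map(row => (extractKey(row), row))         // keyBy
    val r2 = r1.map(_._1).distinct.zipWithIndex            // distinct keys -> 0..N-1
      .map { case (k, i) => (k, i + 1) }                   // shift to 1..N
    val idByKey = r2.toMap
    val r3 = r1.map { case (k, row) => (idByKey(k), row) } // "join", keeping values

    println(r3) // List((1,Row(a,10)), (2,Row(b,20)), (1,Row(a,30)))
  }
}
```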
If the number of distinct keys is relatively small, you might consider collecting them into a map and broadcasting it rather than using a join.
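A minimal sketch of that alternative, using plain Scala collections so it runs locally. In Spark, the small `idByKey` map would be wrapped with `sc.broadcast(idByKey)` and read via `.value` inside a `map` over the RDD, avoiding the shuffle a join would cause (the keys here are made up):

```scala
// Sketch of the broadcast-map alternative with plain Scala collections.
// In Spark you would broadcast idByKey instead of capturing it directly.
object BroadcastMapSketch {
  def main(args: Array[String]): Unit = {
    val keys = List("x", "y", "x", "z")

    // Collect distinct keys once and assign consecutive ids 1..N
    val idByKey: Map[String, Long] =
      keys.distinct.zipWithIndex.map { case (k, i) => (k, i.toLong + 1) }.toMap

    // Each record then looks up its id locally; no join required
    val withIds = keys.map(k => (idByKey(k), k))
    println(withIds) // List((1,x), (2,y), (1,x), (3,z))
  }
}
```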