You can use the RDD id as a prefix; it is unique within the same
SparkContext. Suppose no single RDD generates an id of 1e9 or more
(note that zipWithUniqueId can return ids larger than the record
count: items in the kth partition get ids k, n+k, 2n+k, ..., where n
is the number of partitions). Then you can use:

val offset = rdd.id * 1e9.toLong  // hoisted so the closure doesn't capture the RDD itself
rdd.zipWithUniqueId().mapValues(_ + offset)

Just a hack ..
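
If it helps, here is a rough end-to-end sketch of the same hack inside
a streaming job (untested; the app name, socket source, and batch
interval are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._  // for mapValues on pair RDDs
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("unique-ids")
val ssc = new StreamingContext(conf, Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999)

// rdd.id is unique per SparkContext, so prefixing with it keeps ids
// from different batches disjoint, as long as no single RDD produces
// an id >= 1e9.
val indexed = lines.transform { rdd =>
  val offset = rdd.id * 1e9.toLong
  rdd.zipWithUniqueId().mapValues(_ + offset)
}
indexed.print()

ssc.start()
ssc.awaitTermination()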

On Wed, Aug 27, 2014 at 2:59 PM, Soumitra Kumar
<kumar.soumi...@gmail.com> wrote:
> So, I guess zipWithUniqueId will be similar.
>
> Is there a way to get a unique index?
>
>
> On Wed, Aug 27, 2014 at 2:39 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>
>> No. The indices start at 0 for every RDD. -Xiangrui
>>
>> On Wed, Aug 27, 2014 at 2:37 PM, Soumitra Kumar
>> <kumar.soumi...@gmail.com> wrote:
>> > Hello,
>> >
>> > If I do:
>> >
>> > dstream.transform { rdd =>
>> >     rdd.zipWithIndex.map { ... }
>> > }
>> >
>> > Is the index guaranteed to be unique across all RDDs here?
>> >
>> > Thanks,
>> > -Soumitra.
>
>
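
For the record, a quick way to see the collision in the shell (sc is
any live SparkContext): zipWithIndex restarts at 0 for every RDD, so
two batches overlap.

sc.parallelize(Seq("a", "b")).zipWithIndex().collect()
// Array((a,0), (b,1))
sc.parallelize(Seq("c", "d")).zipWithIndex().collect()
// Array((c,0), (d,1))  <- same indices as the first RDD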
