Thanks.

Just to double-check: would rdd.id be unique for each batch RDD in a DStream?
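The hack below can be sketched as a small helper (a rough sketch, assuming no single RDD holds more than 1e9 records; `globalId` is a hypothetical name, and the Spark usage lines are illustrative only, since capturing `rdd` inside its own closure can fail to serialize):

```scala
// Global id scheme: rdd.id * 1e9 + per-RDD unique id.
// Assumes every RDD has fewer than 1e9 records, so id ranges never overlap.
def globalId(rddId: Int, uid: Long): Long = rddId.toLong * 1000000000L + uid

// Hypothetical Spark usage: pull rdd.id into a local val first, so the
// map closure captures only a Long, not the RDD itself.
// val offset = rdd.id.toLong * 1000000000L
// val withIds = rdd.zipWithUniqueId().mapValues(uid => offset + uid)
```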


On Wed, Aug 27, 2014 at 3:04 PM, Xiangrui Meng <men...@gmail.com> wrote:

> You can use RDD id as the seed, which is unique in the same spark
> context. Suppose none of the RDDs would contain more than 1 billion
> records. Then you can use
>
> rdd.zipWithUniqueId().mapValues(uid => rdd.id * 1e9.toLong + uid)
>
> Just a hack ..
>
> On Wed, Aug 27, 2014 at 2:59 PM, Soumitra Kumar
> <kumar.soumi...@gmail.com> wrote:
> > So, I guess zipWithUniqueId will be similar.
> >
> > Is there a way to get unique index?
> >
> >
> > On Wed, Aug 27, 2014 at 2:39 PM, Xiangrui Meng <men...@gmail.com> wrote:
> >>
> >> No. The indices start at 0 for every RDD. -Xiangrui
> >>
> >> On Wed, Aug 27, 2014 at 2:37 PM, Soumitra Kumar
> >> <kumar.soumi...@gmail.com> wrote:
> >> > Hello,
> >> >
> >> > If I do:
> >> >
> >> > dstream.transform { rdd =>
> >> >     rdd.zipWithIndex.map { case (value, index) =>
> >> >         // Is the index guaranteed to be unique across all RDDs here?
> >> >         ...
> >> >     }
> >> > }
> >> >
> >> > Thanks,
> >> > -Soumitra.
> >
> >
>