If you can use a DataFrame, then you could use rank + a window function, at the
expense of an extra sort. Do you have an example of zipWithIndex not
working? That seems surprising.
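
For reference, RDD.zipWithIndex does produce globally sequential indices: it runs a small job to count each partition, then offsets each partition's local indices by the totals of the partitions before it. (It is zipWithUniqueId that returns unique but non-consecutive ids.) A minimal plain-Python sketch of that offset scheme, using toy lists as stand-ins for partitions so it has no Spark dependency:

```python
# Plain-Python sketch of how RDD.zipWithIndex assigns globally
# sequential indices: count each partition, turn the counts into
# per-partition starting offsets, then number records locally.
partitions = [["a", "b"], ["c"], ["d", "e", "f"]]  # toy stand-in for RDD partitions

counts = [len(p) for p in partitions]
offsets = [sum(counts[:i]) for i in range(len(counts))]  # prefix sums: [0, 2, 3]

indexed = [
    (offsets[pid] + i, rec)
    for pid, part in enumerate(partitions)
    for i, rec in enumerate(part)
]
# indexed == [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')]
```

In the DataFrame API, the rank/window alternative mentioned above is `row_number().over(Window.orderBy(...))`, which gives a 1-based sequential key at the cost of the extra sort.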
On Jul 23, 2016 10:24 PM, "Andrew Ehrlich" <and...@aehrlich.com> wrote:

> It’s hard to do in a distributed system. Maybe try generating a meaningful
> key using a timestamp + hashed unique key fields in the record?
>
> > On Jul 23, 2016, at 7:53 PM, yeshwanth kumar <yeshwant...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > I am doing a bulk load to HBase using Spark, in which I need to
> > generate a sequential key for each record;
> > the key should be sequential across all the executors.
> >
> > I tried zipWithIndex; it didn't work because zipWithIndex gives an index
> > per executor, not across all executors.
> >
> > looking for some suggestions.
> >
> >
> > Thanks,
> > -Yeshwanth
>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
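The timestamp-plus-hashed-fields suggestion above could be sketched like this. The `make_key` helper and the field names are hypothetical, not from the thread; it assumes the hashed fields are unique per record, so the resulting keys sort roughly by time and collide only if two records share both a millisecond and their key fields:

```python
import hashlib
import time

def make_key(record, key_fields):
    # Hypothetical helper: combine an event timestamp with a hash of
    # the record's unique fields. Keys sort roughly by time; uniqueness
    # comes from the hashed fields, not from any global counter.
    ts = int(time.time() * 1000)  # milliseconds since epoch
    payload = "|".join(str(record[f]) for f in key_fields)
    digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:12]
    return f"{ts:013d}-{digest}"

key = make_key({"user": "yeshwanth", "event": 42}, ["user", "event"])
```

Note this gives unique, roughly time-ordered keys, not strictly sequential ones; if the keys must be consecutive integers, a post-sort rank (as in the reply above) is still needed.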
