Re: Assigning a unique row ID

Ankur Srivastava Fri, 07 Apr 2017 16:28:47 -0700

You can use zipWithIndex, the approach Tim suggested, or even the one you
are using, but I believe the issue is that table A is being recomputed
every time you run one of the new transformations on it. Are you
caching/persisting table A? If you do that, you should not see this behavior.
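A rough sketch of both points, as a toy pure-Python model rather than Spark code: `zip_with_index` mimics what RDD.zipWithIndex is documented to do (count rows per partition, then number rows consecutively by partition index and then position), and `Lineage` mimics a lazily evaluated table that re-runs its whole computation on every action unless it has been cached. `make_upstream`, `Lineage`, and `zip_with_index` are hypothetical names, and the "shuffle that returns rows in a different order each run" is an assumption used to show why recomputation changes the IDs. Even in real Spark, cached partitions can be evicted and recomputed, so caching narrows the problem rather than strictly guaranteeing stable IDs; writing table A out once and reading it back is the sturdier option.

```python
from typing import Callable, List, Optional, Tuple

Partitions = List[List[str]]
Indexed = List[Tuple[str, int]]


def zip_with_index(partitions: Partitions) -> Indexed:
    """Toy model of RDD.zipWithIndex: consecutive indices ordered by
    partition index first, then by position inside each partition."""
    # Pass 1: count rows per partition (Spark runs an extra job for this
    # when there is more than one partition).
    sizes = [len(part) for part in partitions]
    # Each partition starts numbering where the previous ones left off.
    offsets = [sum(sizes[:i]) for i in range(len(sizes))]
    # Pass 2: number the rows inside each partition from its offset.
    return [
        (row, offsets[i] + j)
        for i, part in enumerate(partitions)
        for j, row in enumerate(part)
    ]


def make_upstream() -> Callable[[], Partitions]:
    """Hypothetical upstream stage (e.g. a shuffle): the same rows come back
    on every evaluation, but in a different order each time."""
    runs = 0

    def compute() -> Partitions:
        nonlocal runs
        runs += 1
        rows = ["a", "b", "c", "d"]
        k = runs % len(rows)
        rotated = rows[k:] + rows[:k]  # order depends on the run count
        return [rotated[:2], rotated[2:]]

    return compute


class Lineage:
    """Hypothetical stand-in for a lazily evaluated table such as table A:
    every action re-runs `compute` unless the table has been cached."""

    def __init__(self, compute: Callable[[], Indexed]) -> None:
        self._compute = compute
        self._cache_requested = False
        self._cached: Optional[Indexed] = None

    def cache(self) -> "Lineage":
        # Like Spark's cache(), this only marks the table; the data is
        # stored the first time an action runs.
        self._cache_requested = True
        return self

    def collect(self) -> Indexed:
        if self._cached is not None:
            return self._cached  # reuse the stored result, no recomputation
        result = self._compute()
        if self._cache_requested:
            self._cached = result
        return result


upstream = make_upstream()
table_a = Lineage(lambda: zip_with_index(upstream()))
x, y = dict(table_a.collect()), dict(table_a.collect())
print(x["a"], y["a"])  # recomputed twice: row "a" gets different IDs

cached_a = Lineage(lambda: zip_with_index(upstream())).cache()
x, y = dict(cached_a.collect()), dict(cached_a.collect())
print(x["a"], y["a"])  # computed once and reused: the IDs match
```

The point of the model is that the ID assignment itself is deterministic for a given input layout; what varies is the input layout across separate evaluations, which is why persisting the result once matters more than which ID function is used.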


Thanks
Ankur

On Fri, Apr 7, 2017 at 4:24 PM, Tim Smith <secs...@gmail.com> wrote:

> http://stackoverflow.com/questions/37231616/add-a-new-column-to-a-dataframe-new-column-i-want-it-to-be-a-uuid-generator
>
>
> On Fri, Apr 7, 2017 at 3:56 PM, Everett Anderson <ever...@nuna.com.invalid> wrote:
>
>> Hi,
>>
>> What's the best way to assign a truly unique row ID (rather than a hash)
>> to a DataFrame/Dataset?
>>
>> I originally thought that functions.monotonically_increasing_id would do
>> this, but it seems to have a rather unfortunate property that if you add it
>> as a column to table A and then derive tables X, Y, Z and save those, the
>> row ID values in X, Y, and Z may end up different. I assume this is because
>> it delays the actual computation to the point where each of those tables is
>> computed.
>>
>>
>
>
> --
> Thanks,
>
> Tim
>
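Everett's observation can be illustrated with a toy pure-Python model of how monotonically_increasing_id composes its values. The Spark API documentation states that the current implementation puts the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits, so the ID a row receives depends on which partition it lands in and at what position. `monotonic_id` and `assign_ids` are hypothetical helpers written for this sketch, not Spark functions, and the two partition layouts below are assumed examples of what separate recomputations of table A might produce.

```python
from typing import List, Tuple


def monotonic_id(partition_index: int, row_in_partition: int) -> int:
    """Toy model of the documented layout of monotonically_increasing_id:
    partition ID in the upper 31 bits, record number in the lower 33 bits."""
    assert 0 <= row_in_partition < (1 << 33), "record number must fit in 33 bits"
    return (partition_index << 33) | row_in_partition


def assign_ids(partitions: List[List[str]]) -> List[Tuple[str, int]]:
    # Each row's ID depends only on where it sits during this evaluation.
    return [
        (row, monotonic_id(p, j))
        for p, part in enumerate(partitions)
        for j, row in enumerate(part)
    ]


# The same three rows, split across partitions in two different ways, as
# could happen when table A is recomputed separately for X and for Y.
for_x = dict(assign_ids([["a", "b"], ["c"]]))
for_y = dict(assign_ids([["a"], ["b", "c"]]))
print(for_x["b"], for_y["b"])  # prints: 1 8589934592
```

Within a single evaluation the IDs are unique and increasing but not consecutive (the second partition starts at 2^33), and nothing ties a row to its ID across evaluations, which matches the behavior described above.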
