You can use zipWithIndex, the approach Tim suggested, or even the one you are using, but I believe the issue is that tableA is being re-materialized every time you run the new transformations. Are you caching/persisting table A? If you do that, you should not see this behavior.

Thanks
Ankur

On Fri, Apr 7, 2017 at 4:24 PM, Tim Smith <secs...@gmail.com> wrote:

> http://stackoverflow.com/questions/37231616/add-a-new-column-to-a-dataframe-new-column-i-want-it-to-be-a-uuid-generator
>
> On Fri, Apr 7, 2017 at 3:56 PM, Everett Anderson <ever...@nuna.com.invalid> wrote:
>
>> Hi,
>>
>> What's the best way to assign a truly unique row ID (rather than a hash)
>> to a DataFrame/Dataset?
>>
>> I originally thought that functions.monotonically_increasing_id would do
>> this, but it seems to have a rather unfortunate property that if you add it
>> as a column to table A and then derive tables X, Y, Z and save those, the
>> row ID values in X, Y, and Z may end up different. I assume this is because
>> it delays the actual computation to the point where each of those tables is
>> computed.
>>
>
> --
> Thanks,
>
> Tim