You could use either monotonically_increasing_id or a window function with rank. The first is a simple Spark SQL function; Databricks has a pretty helpful post on how to use window functions (in this case the whole dataset is the window).
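A rough sketch of both options in Java, assuming the `bid` and `name` columns from the query quoted below. The class name `IndexNames` and the column name `idx` are made up for illustration. For the window variant I used row_number rather than rank, since rank gives tied rows the same number. Note that a window with no partitioning pulls all rows into a single partition, so it is probably only reasonable for modest data sizes.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.expressions.Window;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.monotonically_increasing_id;
import static org.apache.spark.sql.functions.row_number;

// Hypothetical helper class; "idx" is an illustrative column name.
public final class IndexNames {

    private IndexNames() {}

    // Option 1: ids are unique and increasing, but NOT consecutive
    // (the partition id is encoded in the upper bits of each value).
    public static Dataset<Row> withUniqueId(Dataset<Row> df) {
        return df.withColumn("idx", monotonically_increasing_id());
    }

    // Option 2: consecutive indices 0..n-1, using the whole dataset as
    // one window ordered by bid (assumes a "bid" column exists).
    public static Dataset<Row> withConsecutiveIndex(Dataset<Row> df) {
        return df.withColumn(
            "idx",
            row_number().over(Window.orderBy(col("bid"))).minus(1));
    }
}
```

Either way the index stays in the DataFrame as an ordinary column next to `name`, so the later lookup can be a plain select or a join on `bid` instead of going through a separate RDD of pairs.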

On Fri, Jul 22, 2016 at 12:20 PM, Marco Mistroni <mmistr...@gmail.com> wrote:

> Hi
> So u u have a data frame, then use zipwindex and create a tuple ....
> I m not sure if df API has something useful for zip w index.
> But u can
> - get a data frame
> - convert it to rdd (there's a tordd )
> - do a zip with index
>
> That will give u a rdd with 3 fields...
> I don't think you can update df columns....
> Hth
>
> On 22 Jul 2016 5:19 pm, "VG" <vlin...@gmail.com> wrote:
>
>> Hi All,
>>
>> Any suggestions for this
>>
>> Regards,
>> VG
>>
>> On Fri, Jul 22, 2016 at 6:40 PM, VG <vlin...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I am really confused how to proceed further. Please help.
>>>
>>> I have a dataset created as follows:
>>> Dataset<Row> b = sqlContext.sql("SELECT bid, name FROM business");
>>>
>>> Now I need to map each name with a unique index and I did the following
>>> JavaPairRDD<Row, Long> indexedBId = business.javaRDD()
>>>                                             .zipWithIndex();
>>>
>>> In later part of the code I need to change a datastructure and update
>>> name with index value generated above .
>>> I am unable to figure out how to do a look up here..
>>>
>>> Please suggest /.
>>>
>>> If there is a better way to do this please suggest that.
>>>
>>> Regards
>>> VG

--
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn: https://www.linkedin.com/in/pedrorodriguezscience