You might look at monotonically_increasing_id() (documented here: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions) instead of converting the Dataset to an RDD, since you pay a performance penalty for that conversion.
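As a rough sketch of how that could cover the name -> index and index -> name lookups asked about in the quoted thread: the snippet below attaches an id to each distinct name and collects the pairs into two in-memory maps. It assumes Spark 2.x, that the set of names is small enough to collect on the driver, and it uses made-up sample rows as a stand-in for the business table (the sample data and the names `indexed`, `nameToId`, and `idToName` are illustrative, not from the original code). Note that the generated ids are unique and increasing but not consecutive, unlike the indices from zipWithIndex.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.monotonically_increasing_id

// Local session for illustration; in spark-shell this returns the existing session.
val spark = SparkSession.builder()
  .appName("name-index-lookup")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample rows standing in for the `business` table.
val business = Seq(
  ("b1", "Cafe Uno"),
  ("b2", "Book Nook"),
  ("b3", "Taco Hut")
).toDF("bid", "name")

// One row per distinct name, each tagged with a unique (non-consecutive) Long id.
val indexed = business
  .select("name")
  .distinct()
  .withColumn("id", monotonically_increasing_id())

// Collect to the driver; only reasonable when the number of names is modest.
val pairs: Array[(String, Long)] =
  indexed.collect().map(row => (row.getString(0), row.getLong(1)))

// Lookups in both directions.
val nameToId: Map[String, Long] = pairs.toMap
val idToName: Map[Long, String] = pairs.map(_.swap).toMap
```

With these maps, `nameToId("Taco Hut")` gives the id and `idToName(...)` maps it back. If the names do not fit on the driver, joining against `indexed` as a DataFrame would be the alternative.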
If you want to change the name, you can do something like this (in Scala, since I am not familiar with the Java API, but it should be similar in Java):

    import org.apache.spark.sql.functions.monotonically_increasing_id
    import sqlContext.implicits._  // needed for the $"id" syntax

    val df = sqlContext.sql("select bid, name from business")
      .withColumn("id", monotonically_increasing_id())

    // some steps later on
    df.withColumn("name", $"id")

I am not 100% sure what you mean by updating the data structure; I am guessing you mean replacing the name column with the id column? Note that the second withColumn call uses $"id", which in Scala converts to a Column. In Java maybe it's something like new Column("id"), not sure.

Pedro

On Fri, Jul 22, 2016 at 12:21 PM, VG <vlin...@gmail.com> wrote:

> Any suggestions here please
>
> I basically need an ability to look up *name -> index* and *index -> name*
> in the code
>
> -VG
>
> On Fri, Jul 22, 2016 at 6:40 PM, VG <vlin...@gmail.com> wrote:
>
>> Hi All,
>>
>> I am really confused how to proceed further. Please help.
>>
>> I have a dataset created as follows:
>> Dataset<Row> b = sqlContext.sql("SELECT bid, name FROM business");
>>
>> Now I need to map each name with a unique index and I did the following
>> JavaPairRDD<Row, Long> indexedBId = business.javaRDD()
>>     .zipWithIndex();
>>
>> In later part of the code I need to change a datastructure and update
>> name with index value generated above .
>> I am unable to figure out how to do a look up here..
>>
>> Please suggest /.
>>
>> If there is a better way to do this please suggest that.
>>
>> Regards
>> VG
>>

-- 
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn: https://www.linkedin.com/in/pedrorodriguezscience