Use zipWithIndex, but cache the data before you run it. That way your ordering will be consistent (unless the bug has since been fixed and you no longer have to cache the data).
Normally these operations are used for dictionary building, so I am hoping you can cache the dictionary RDD[String] before you run zipWithIndex. The indices run from 0 to n-1 for an RDD of n elements; if you want them to start at 1, you have to map each index to index + 1 afterwards.

On Tue, Nov 18, 2014 at 8:56 AM, Blind Faith <person.of.b...@gmail.com> wrote:

> As it is difficult to explain this, I will show what I want. Let us say
> I have an RDD A with the following value:
>
> A = ["word1", "word2", "word3"]
>
> I want to have an RDD with the following value:
>
> B = [(1, "word1"), (2, "word2"), (3, "word3")]
>
> That is, it gives a unique number to each entry as a key value. Can we do
> such a thing with Python or Scala?
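
For the A and B in the quoted question, the suggestion could look roughly like the PySpark sketch below. It is only an illustration under a few assumptions: the names number_from_one and main are made up for this example, the local master and app name are arbitrary, and the cache() call is the precaution described in the reply, which may not be needed on your Spark version. Note that zipWithIndex returns (element, index) pairs, so the pair is swapped as well as shifted by one.

```python
def number_from_one(rdd):
    """Pair each element with a unique 1-based index as (index, element).

    Hypothetical helper for illustration; works on any object that offers
    cache(), zipWithIndex() and map() the way a PySpark RDD does.
    """
    # Cache first so the ordering seen by zipWithIndex stays the same
    # if the RDD has to be recomputed (the precaution from the reply).
    cached = rdd.cache()
    # zipWithIndex yields (element, index) with the index starting at 0,
    # so swap the pair and add 1 to the index.
    return cached.zipWithIndex().map(lambda pair: (pair[1] + 1, pair[0]))


def main():
    # Assumes a local PySpark installation; imported here so the helper
    # above can be read and reused without Spark on the path.
    from pyspark import SparkContext

    sc = SparkContext("local[1]", "zip-with-index-sketch")
    try:
        a = sc.parallelize(["word1", "word2", "word3"])
        b = number_from_one(a)
        print(b.collect())
    finally:
        sc.stop()
```

Calling main() should print [(1, 'word1'), (2, 'word2'), (3, 'word3')]. If the numbers only need to be unique and not consecutive, zipWithUniqueId is another RDD method worth looking at.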