On Tue, Nov 18, 2014 at 9:06 AM, Debasish Das <debasish.da...@gmail.com> wrote:
> Use zipWithIndex but cache the data before you run zipWithIndex...that way
> your ordering will be consistent (unless the bug has been fixed where you
> don't have to cache the data)...
Could you point me to a link about the bug?

> Normally these operations are used for dictionary building and so I am
> hoping you can cache the dictionary of RDD[String] before you run
> zipWithIndex...
>
> indices are within 0 till maxIndex-1...if you want 1 you have to later map
> the index to index + 1
>
> On Tue, Nov 18, 2014 at 8:56 AM, Blind Faith <person.of.b...@gmail.com> wrote:
>>
>> As it is difficult to explain this, I will show what I want. Let us say
>> I have an RDD A with the following value
>>
>> A = ["word1", "word2", "word3"]
>>
>> I want to have an RDD with the following value
>>
>> B = [(1, "word1"), (2, "word2"), (3, "word3")]
>>
>> That is, it gives a unique number to each entry as a key value. Can we do
>> such a thing with Python or Scala?

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
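For illustration, here is a minimal sketch of the approach described in the thread: pair each element with an index, then map it to (index + 1, element). It uses plain Python's enumerate to stand in for RDD.zipWithIndex, since no Spark cluster is assumed here; the PySpark equivalent is shown in the comments.

```python
# Sketch of the zipWithIndex-then-map approach from the thread, using a
# plain Python list in place of an RDD. With PySpark the equivalent
# (after caching A so ordering stays consistent) would be roughly:
#   B = A.zipWithIndex().map(lambda kv: (kv[1] + 1, kv[0]))
# Note: RDD.zipWithIndex yields (element, index); enumerate yields
# (index, element), so the reordering below differs slightly.

A = ["word1", "word2", "word3"]

# enumerate pairs each element with its 0-based index: (index, element)
zipped = list(enumerate(A))

# shift to a 1-based index and keep it in the key position
B = [(i + 1, word) for i, word in zipped]

print(B)  # [(1, 'word1'), (2, 'word2'), (3, 'word3')]
```

Note that indices start at 0, so the `i + 1` map is what produces the 1-based keys the original question asked for.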