On Tue, Nov 18, 2014 at 9:06 AM, Debasish Das <debasish.da...@gmail.com> wrote:
> Use zipWithIndex but cache the data before you run zipWithIndex...that way
> your ordering will be consistent (unless the bug has been fixed where you
> don't have to cache the data)...
Could you point me to a link about the bug?

> Normally these operations are used for dictionary building and so I am
> hoping you can cache the dictionary of RDD[String] before you run
> zipWithIndex...
>
> indices are within 0 till maxIndex-1...if you want 1 you have to later map
> the index to index + 1
>
> On Tue, Nov 18, 2014 at 8:56 AM, Blind Faith <person.of.b...@gmail.com> wrote:
>>
>> As it is difficult to explain this, I will show what I want. Let us say
>> I have an RDD A with the following value
>>
>> A = ["word1", "word2", "word3"]
>>
>> I want to have an RDD with the following value
>>
>> B = [(1, "word1"), (2, "word2"), (3, "word3")]
>>
>> That is, it gives a unique number to each entry as a key value. Can we do
>> such a thing with Python or Scala?

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
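For illustration, here is a minimal sketch of the approach described in the thread: pair each element with an index, then map it to (index + 1, element). It uses plain Python's enumerate to stand in for RDD.zipWithIndex, since no Spark cluster is assumed here; the PySpark equivalent is shown in the comments.

```python
# Sketch of the zipWithIndex-then-map approach from the thread, using a
# plain Python list in place of an RDD. With PySpark the equivalent
# (after caching A so ordering stays consistent) would be roughly:
#   B = A.zipWithIndex().map(lambda kv: (kv[1] + 1, kv[0]))
# Note: RDD.zipWithIndex yields (element, index); enumerate yields
# (index, element), so the reordering below differs slightly.

A = ["word1", "word2", "word3"]

# enumerate pairs each element with its 0-based index: (index, element)
zipped = list(enumerate(A))

# shift to a 1-based index and keep it in the key position
B = [(i + 1, word) for i, word in zipped]

print(B)  # [(1, 'word1'), (2, 'word2'), (3, 'word3')]
```

Note that indices start at 0, so the `i + 1` map is what produces the 1-based keys the original question asked for.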