I think I found the root cause: Hadoop reuses the same Text object for every
record, so when the RDD is cached, all cached entries end up referencing that
one shared object and the last record is displayed multiple times. You can
call Text.toString() (i.e. copy the value out) before caching to solve this
issue.
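
Roughly something like this (just a sketch; the file path and the Text
key/value types here are my assumptions, not from your question, so adapt
them to your job):

    import org.apache.hadoop.io.Text
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("seqfile-cache-demo"))

    // Hadoop's record reader reuses the same Text instances for every record,
    // so caching the raw (Text, Text) pairs stores many references to one
    // mutable object, and every cached entry shows the last record read.
    val bad = sc.sequenceFile("/tmp/data.seq", classOf[Text], classOf[Text]).cache()

    // Copy the contents into immutable Strings before caching to avoid this.
    val good = sc.sequenceFile("/tmp/data.seq", classOf[Text], classOf[Text])
      .map { case (k, v) => (k.toString, v.toString) }
      .cache()

This is also why a plain text file works: textFile already hands you Strings,
so there is no shared mutable object to cache.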

On Wed, Mar 23, 2016 at 11:37 AM, Jeff Zhang <zjf...@gmail.com> wrote:

> Looks like a spark bug. I can reproduce it for sequence file, but it works
> for text file.
>
> On Wed, Mar 23, 2016 at 10:56 AM, Thamme Gowda N. <tgow...@gmail.com>
> wrote:
>
>> Hi spark experts,
>>
>> I am facing issues with cached RDDs. I noticed that few entries
>> get duplicated for n times when the RDD is cached.
>>
>> I asked a question on Stackoverflow with my code snippet to reproduce it.
>>
>> I really appreciate  if you can visit
>> http://stackoverflow.com/q/36168827/1506477
>> and answer my question / give your comments.
>>
>> Or at the least confirm that it is a bug.
>>
>> Thanks in advance for your help!
>>
>> --
>> Thamme
>>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>



-- 
Best Regards

Jeff Zhang
