I think I found the root cause: you can call Text.toString() to work around this issue. The Text object is shared (the record reader reuses the same mutable instance for every record), so when the raw Text values are cached, every cached entry points at that one instance and the last record is displayed multiple times.
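For what it's worth, here is a minimal sketch of the object-reuse pitfall in plain Java, with no Spark or Hadoop on the classpath. `ReusedText` below is a hypothetical stand-in for Hadoop's mutable `Text`, not the real class; it only exists to show why caching the shared object duplicates the last record and why copying via toString() fixes it:

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    // Hypothetical stand-in for org.apache.hadoop.io.Text:
    // one mutable buffer that gets reused for every record.
    static class ReusedText {
        private final StringBuilder buf = new StringBuilder();
        void set(String s) { buf.setLength(0); buf.append(s); }
        @Override public String toString() { return buf.toString(); }
    }

    // Buggy: cache references to the single shared mutable object.
    // Every cached entry ends up showing whatever was set last.
    static List<String> buggyCache(List<String> records) {
        ReusedText reused = new ReusedText();
        List<ReusedText> cached = new ArrayList<>();
        for (String r : records) {
            reused.set(r);
            cached.add(reused); // same object added N times
        }
        List<String> out = new ArrayList<>();
        for (ReusedText t : cached) out.add(t.toString());
        return out;
    }

    // Fixed: materialize an immutable String copy before caching.
    static List<String> fixedCache(List<String> records) {
        ReusedText reused = new ReusedText();
        List<String> cached = new ArrayList<>();
        for (String r : records) {
            reused.set(r);
            cached.add(reused.toString()); // independent copy per record
        }
        return cached;
    }

    public static void main(String[] args) {
        List<String> records = List.of("a", "b", "c");
        System.out.println("buggy: " + buggyCache(records)); // [c, c, c]
        System.out.println("fixed: " + fixedCache(records)); // [a, b, c]
    }
}
```

In real Spark code the equivalent fix is to map the Text values to Strings (or otherwise copy them) before calling cache().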
On Wed, Mar 23, 2016 at 11:37 AM, Jeff Zhang <zjf...@gmail.com> wrote:

> Looks like a Spark bug. I can reproduce it for a sequence file, but it works
> for a text file.
>
> On Wed, Mar 23, 2016 at 10:56 AM, Thamme Gowda N. <tgow...@gmail.com> wrote:
>
>> Hi Spark experts,
>>
>> I am facing issues with cached RDDs. I noticed that a few entries
>> get duplicated n times when the RDD is cached.
>>
>> I asked a question on Stack Overflow with a code snippet to reproduce it.
>>
>> I would really appreciate it if you could visit
>> http://stackoverflow.com/q/36168827/1506477
>> and answer my question / give your comments.
>>
>> Or at the least confirm that it is a bug.
>>
>> Thanks in advance for your help!
>>
>> --
>> Thamme
>
> --
> Best Regards
>
> Jeff Zhang

--
Best Regards

Jeff Zhang