Re: [Critical] Issue with cached RDDs created from hadoop sequence files

2016-03-22 Thread Thamme Gowda N.
Hi Jeff,

Yes, you are absolutely right. The problem is caused by the RecordReader reusing the same Writable instance for every record. I did not anticipate this, since the same code worked for text files. Thank you so much for doing this. Your answer is accepted!

Best,
Thamme

--
Thamme Gowda N.
Grad Student at usc.edu
Twitter
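For readers landing here later, the pitfall can be reproduced without Spark. The sketch below is a simplified Python simulation, not Spark or Hadoop code: `MutableRecord` and `reusing_reader` are hypothetical stand-ins for a Writable and a RecordReader that refills one object per record. Storing the yielded objects directly (the analogue of calling `cache()` on the raw RDD) keeps many references to a single instance, while copying the value out first keeps every record distinct. In Spark, the corresponding fix is to convert each Writable to an immutable value with a `map` before caching, sorting, or aggregating, as the `SparkContext.sequenceFile`/`hadoopFile` documentation advises. This also explains why text files behaved: `SparkContext.textFile` already converts each `Text` value to a `String`.

```python
class MutableRecord:
    """Hypothetical stand-in for a Hadoop Writable: a mutable box the reader refills."""

    def __init__(self):
        self.value = None

    def set(self, value):
        self.value = value

    def get(self):
        return self.value


def reusing_reader(values):
    """Hypothetical reader that, like Hadoop's RecordReader, reuses ONE record object."""
    record = MutableRecord()
    for v in values:
        record.set(v)  # overwrite the shared object in place
        yield record   # every iteration yields the very same instance


def cache_naively(values):
    # Keeps references to the shared object, so all entries end up
    # showing whatever value was written last.
    return list(reusing_reader(values))


def cache_with_copy(values):
    # Copies the current contents out of the shared object before storing,
    # which is the role of map(...) before cache() in Spark.
    return [rec.get() for rec in reusing_reader(values)]


data = ["a", "b", "c"]
print([r.get() for r in cache_naively(data)])  # ['c', 'c', 'c']
print(cache_with_copy(data))                     # ['a', 'b', 'c']
```

In real Spark the symptom can look partial (only some entries duplicated) rather than every entry collapsing to the last value, because records are materialized in batches; the simulation only illustrates the underlying shared-reference mechanism.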

[Critical] Issue with cached RDDs created from hadoop sequence files

2016-03-22 Thread Thamme Gowda N.
Hi Spark experts,

I am facing an issue with cached RDDs: when the RDD is cached, a few entries get duplicated n times. I asked a question on Stack Overflow with a code snippet that reproduces it. I would really appreciate it if you could visit http://stackoverflow.com/q/36168827/1506477 and

Issue regarding removal of duplicates from RDD

2016-03-19 Thread Thamme Gowda N.
low.com/a/31656056/1506477

Thanks and regards,
Thamme

--
Thamme Gowda N.
Grad Student at usc.edu
Twitter: @thammegowda <https://twitter.com/thammegowda>
Website: http://scf.usc.edu/~tnarayan/