Hi Jeff,
Yes, you are absolutely right.
It happens because the RecordReader reuses the same Writable instance for every
record. I did not anticipate this because the same code worked for text files.
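Here is a minimal, Spark-free Python sketch of the effect. `MutableRecord` and `reusing_reader` are hypothetical stand-ins for a Hadoop Writable and a RecordReader; they are not real Hadoop or Spark APIs. If you store the references the reader hands back, as a deserialized in-memory cache does, every slot points at one object, so the last value shows up repeatedly. Copying each record before storing it keeps the values distinct.

```python
class MutableRecord:
    """Hypothetical stand-in for a Hadoop Writable: a mutable holder the reader refills."""

    def __init__(self):
        self.value = None

    def set(self, value):
        self.value = value

    def copy(self):
        clone = MutableRecord()
        clone.set(self.value)
        return clone


def reusing_reader(values):
    """Yields the SAME MutableRecord for every input, like a RecordReader reusing its Writable."""
    record = MutableRecord()
    for v in values:
        record.set(v)  # overwrite the shared object in place
        yield record


data = ["a", "b", "c"]

# Storing the yielded references: all three slots point at one object,
# which holds whatever value was written last.
cached_refs = list(reusing_reader(data))
print([r.value for r in cached_refs])    # ['c', 'c', 'c']

# Copying each record as it is read preserves the distinct values.
cached_copies = [r.copy() for r in reusing_reader(data)]
print([r.value for r in cached_copies])  # ['a', 'b', 'c']
```

In Spark itself, the documentation for `SparkContext.hadoopFile` / `sequenceFile` recommends copying Writable objects with a `map` (for example, converting `Text` to `String`) before caching, sorting, or aggregating them. `textFile` already does that conversion internally, which would explain why text files were not affected.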
Thank you so much for doing this.
Your answer is accepted!
Best,
Thamme
--
*Thamme Gowda N.*
Grad Student at usc.edu
Twitter
Hi spark experts,
I am facing issues with cached RDDs. I noticed that a few entries
get duplicated n times when the RDD is cached.
I have asked a question on Stack Overflow, with a code snippet to reproduce the issue.
I would really appreciate it if you could take a look at
http://stackoverflow.com/q/36168827/1506477
and
low.com/a/31656056/1506477
Thanks and regards
Thamme
--
*Thamme Gowda N.*
Grad Student at usc.edu
Twitter: @thammegowda <https://twitter.com/thammegowda>
Website: http://scf.usc.edu/~tnarayan/