The issue was solved by clearing the HashMap and HashSet at the beginning of the
call method.
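The fix suggests the function object held member collections that were reused across calls, so entries from earlier records leaked into later outputs. A minimal sketch of the pitfall and the fix, without Spark itself; the class name `KeyExpander`, the comma-split record format, and the `String` types are all hypothetical, chosen only to illustrate the pattern:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a FlatMapFunction-style object whose call method
// is invoked once per record. The member set persists between calls, so
// without the clear() below, keys from earlier records would be re-emitted
// as duplicates on later records.
class KeyExpander {
    private final Set<String> seen = new HashSet<>(); // member state, reused across calls

    public List<String> call(String record) {
        seen.clear(); // the fix: reset member state at the start of every call
        for (String token : record.split(",")) {
            seen.add(token);
        }
        return new ArrayList<>(seen);
    }
}
```

With `seen.clear()` removed, a second call on a new record would also emit the tokens of the first record, which matches the duplicate keys described below.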
From: Jacob Maloney [mailto:jmalo...@conversantmedia.com]
Sent: Thursday, October 16, 2014 5:09 PM
To: user@spark.apache.org
Subject: Strange duplicates in data when scaling up
I have a flatMap function that shouldn't possibly emit duplicates, and yet it
does. The output of my function is a HashSet, so the function itself cannot
output duplicates, yet I see many copies of keys emitted from it (in one
case up to 62). The curious thing is I can't get this to happen