Update: I just tested with HashPartitioner(8) and counted the elements in each partition:
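A minimal sketch of how per-partition counts like these can be collected, not necessarily the exact code that was run. The object name PartitionCounts, the helper countsPerPartition, and the synthetic (Int, Int) input are assumptions for illustration only; the real data set is not shown in this thread.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PartitionCounts {
  // Repartition a pair RDD by key hash and return (partitionIndex, elementCount)
  // for every partition, in partition order.
  def countsPerPartition(pairs: RDD[(Int, Int)], numPartitions: Int): List[(Int, Int)] =
    pairs
      .partitionBy(new HashPartitioner(numPartitions))
      .mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size)))
      .collect()
      .toList

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("partition-counts").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical input: stands in for the real key/value data.
      val pairs = sc.parallelize(1 to 100000).map(i => (i, i))
      // Repeat the whole job several times and print one list per execution.
      (1 to 8).foreach(_ => println(countsPerPartition(pairs, 8)))
    } finally {
      sc.stop()
    }
  }
}
```

On older Spark releases, partitionBy on a pair RDD may additionally need `import org.apache.spark.SparkContext._` for the implicit conversions. The counts from the successive executions: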
List((0,657824), (1,658549), (2,659199), (3,658684), (4,659394), *(5,657591)*, *(6,658327)*, *(7,658434)*),
List((0,657824), (1,658549), (2,659199), (3,658684), (4,659394), *(5,657594)*, (6,658326), *(7,658434)*),
List((0,657824), (1,658549), (2,659199), (3,658684), (4,659394), *(5,657592)*, (6,658326), *(7,658435)*),
List((0,657824), (1,658549), (2,659199), (3,658684), (4,659394), *(5,657591)*, (6,658326), (7,658434)),
List((0,657824), (1,658549), (2,659199), (3,658684), (4,659394), *(5,657592)*, (6,658326), (7,658435)),
List((0,657824), (1,658549), (2,659199), (3,658684), (4,659394), *(5,657592)*, (6,658326), (7,658435)),
List((0,657824), (1,658549), (2,659199), (3,658684), (4,659394), *(5,657592)*, (6,658326), (7,658435)),
List((0,657824), (1,658549), (2,659199), (3,658684), (4,659394), *(5,657591)*, (6,658326), (7,658435))

The result is not identical for each execution.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/groupBy-gives-non-deterministic-results-tp13698p13702.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.