Github user mridulm commented on the issue: https://github.com/apache/spark/pull/20414

@jiangxb1987 Unfortunately I am unable to analyze this in detail, but I can hopefully give some pointers that help!

One example I can think of is a shuffle that uses an Aggregator (like `combineByKey`), via `ExternalAppendOnlyMap`. From what I remember, the order in which we replay keys with the same hash is non-deterministic: for example, if the first run did not spill at all, the second run spilled 3 times, and the third run spilled 7 times, the order of keys (with the same hash) could be different in each run.

Similarly, with sort-based shuffle, depending on the length of the data array in `AppendOnlyMap` (which is determined by whether we spilled or not), we can get different sort orders.

Likewise for the actual sort itself, `merge` is quite clearly sensitive to the number of spills; for example, when there is no aggregator or ordering, it is simply `iterators.iterator.flatten`.

There might be other cases where this happens - unfortunately I have not looked at this part of the codebase regularly in a while. Please note that in all the cases above, there is no ordering defined.
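To make the hash-collision point concrete, here is a minimal sketch (in Python, not Spark's Scala, and not Spark code) of why sorting purely by key hash leaves the relative order of colliding keys up to the in-memory layout. The `Key` class and `sort_by_hash` helper are hypothetical, for illustration only:

```python
# Two distinct keys with the same hash compare equal under a hash-only
# sort key, so a stable sort leaves them in whatever order the in-memory
# layout happened to produce -- which can differ between task attempts.
class Key:
    def __init__(self, name, h):
        self.name = name
        self._h = h  # forced hash value, to simulate a collision

    def __hash__(self):
        return self._h

def sort_by_hash(keys):
    # Stable sort by hash only: ties keep their input order.
    return [k.name for k in sorted(keys, key=hash)]

a, b = Key("a", 7), Key("b", 7)  # hash collision
print(sort_by_hash([a, b]))  # ['a', 'b']
print(sort_by_hash([b, a]))  # ['b', 'a'] -- same keys, different order
```

If spilling changes the order in which entries end up in the map's data array, the tie-break order among equal-hash keys changes with it.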
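And for the `merge` point: when there is no aggregator or ordering, a merge that just concatenates the spill iterators emits records in an order that depends directly on how many spills there were. A rough Python sketch (the `spill_into` helper is hypothetical; real spill boundaries depend on memory pressure, not round-robin):

```python
# Sketch of why a merge that simply concatenates spill iterators
# (analogous to `iterators.iterator.flatten`) is sensitive to the
# number of spills.

def spill_into(records, num_spills):
    """Split a record stream into num_spills spill files (round-robin,
    purely for illustration)."""
    spills = [[] for _ in range(num_spills)]
    for i, rec in enumerate(records):
        spills[i % num_spills].append(rec)
    return spills

def merge_no_ordering(spills):
    """Concatenate the spill iterators, with no ordering imposed."""
    return [rec for spill in spills for rec in spill]

records = list(range(6))
print(merge_no_ordering(spill_into(records, 1)))  # [0, 1, 2, 3, 4, 5]
print(merge_no_ordering(spill_into(records, 3)))  # [0, 3, 1, 4, 2, 5]
```

Same input records, different spill count, different output order - so a re-run of the task that happens to spill a different number of times can hand downstream stages the same data in a different order.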