GitHub user aljoscha opened a pull request: https://github.com/apache/incubator-beam/pull/409
[BEAM-321] Fix Flink Comparators `KvCoderComparator` and `CoderComparator` were hashing the key directly while doing comparisons on the encoded form. This lead to inconsistencies in GroupByKey results with large numbers of elements per key. This changes the comparators to hash on the encoded form and also adds a test case to verify correct behavior. I made the test a `RunnableOnService` test that verifies that runners use the encoded form of a value for hashing/comparison. In the test, I use a bogus key that returns garbage from `equals()` and `hashCode()`. The result of the test is correct if only the encoded form is used for hashing/comparison. R: @tgroh for the test, do you think there's a better way of doing it? CC: @tgroh the newly introduced test hangs on the `InProcessPipelineRunner`, see `InProcessGroupByKeyTest` that I added for quickly checking this CC: @francesperry is this new test in line with the Beam model You can merge this pull request into a Git repository by running: $ git pull https://github.com/aljoscha/incubator-beam flink/fix-comparator Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/409.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #409 ---- commit 8b269b542032c887ea149c50245de8d07a94fd51 Author: Aljoscha Krettek <aljoscha.kret...@gmail.com> Date: 2016-06-01T09:56:18Z [BEAM-321] Fix Flink Comparators KvCoderComparator and CoderComparator were hashing the key directly while doing comparisons on the encoded form. This lead to inconsistencies in GroupByKey results with large numbers of elements per key. This changes the comparators to hash on the encoded form and also adds a test case to verify correct behavior. commit 6866fd740af07a214ef5a9be39983260878ecb8d Author: Aljoscha Krettek <aljoscha.kret...@gmail.com> Date: 2016-06-01T09:58:48Z Fix typo in KvCoderComperator, is now KvCoderComparator commit 6334b87ef3d2a90c540cf89aadab013dcded1483 Author: Aljoscha Krettek <aljoscha.kret...@gmail.com> Date: 2016-06-01T15:54:37Z move to RunnableOnService test commit 85defbc67323e10a87835d2514f45ad238e47842 Author: Aljoscha Krettek <aljoscha.kret...@gmail.com> Date: 2016-06-01T16:14:51Z make test class package private ---- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---