GitHub user aljoscha opened a pull request:

    https://github.com/apache/incubator-beam/pull/409

    [BEAM-321] Fix Flink Comparators

    `KvCoderComparator` and `CoderComparator` were hashing the key directly
    while doing comparisons on the encoded form. This lead to
    inconsistencies in GroupByKey results with large numbers of elements per
    key.
    
    This changes the comparators to hash on the encoded form and also adds a
    test case to verify correct behavior. 
    
    I made the test a `RunnableOnService` test that verifies that runners use 
the encoded form of a value for hashing/comparison. In the test, I use a bogus 
key that returns garbage from `equals()` and `hashCode()`. The result of the 
test is correct if only the encoded form is used for hashing/comparison.
    
    R: @tgroh for the test, do you think there's a better way of doing it?
    CC: @tgroh the newly introduced test hangs on the 
`InProcessPipelineRunner`, see `InProcessGroupByKeyTest` that I added for 
quickly checking this
    CC: @francesperry is this new test in line with the Beam model

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aljoscha/incubator-beam flink/fix-comparator

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/409.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #409
    
----
commit 8b269b542032c887ea149c50245de8d07a94fd51
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date:   2016-06-01T09:56:18Z

    [BEAM-321] Fix Flink Comparators
    
    KvCoderComparator and CoderComparator were hashing the key directly
    while doing comparisons on the encoded form. This lead to
    inconsistencies in GroupByKey results with large numbers of elements per
    key.
    
    This changes the comparators to hash on the encoded form and also adds a
    test case to verify correct behavior.

commit 6866fd740af07a214ef5a9be39983260878ecb8d
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date:   2016-06-01T09:58:48Z

    Fix typo in KvCoderComperator, is now KvCoderComparator

commit 6334b87ef3d2a90c540cf89aadab013dcded1483
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date:   2016-06-01T15:54:37Z

    move to RunnableOnService test

commit 85defbc67323e10a87835d2514f45ad238e47842
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date:   2016-06-01T16:14:51Z

    make test class package private

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Reply via email to