Github user greghogan commented on the issue:

    https://github.com/apache/flink/pull/4372
  
    @StephanEwen I like the new template. I much prefer free form over 
checkboxes.
    
    @fhueske I'm questioning my understanding of the heuristic for using a 
hash-combine. For a fixed number of keys the hash-combine can be beneficial 
independent of the size of the data set. When the decision is instead based on 
the ratio of keys to values, the likelihood of matching keys and values 
occurring in the same combine operation (before the combiner fills and is 
flushed to the reducer) decreases as the size of the data set increases.
    
    This is often the case for graphs. I'm thinking that the improvement from 
using hash-combine on larger data sets may have been due to hashing performing 
better than sorting in cases where we wanted to disable the combiner.
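
    To make the concern about the heuristic concrete, here is a toy simulation. 
It is only a sketch under stated assumptions, not Flink's hash-combine: the 
class `CombineSketch`, the method `emittedRecords`, the capacity of 1,000 
entries, the uniform key distribution and the 1:10 key-to-value ratio are all 
hypothetical choices made for illustration. It models a combiner as a bounded 
hash table that is flushed downstream whenever a new key arrives and the table 
is full, and it reports the fraction of input records that still reach the 
reducer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Toy model of a bounded hash-combine (hypothetical; not Flink's implementation).
class CombineSketch {

    /**
     * Feeds `records` uniformly random (key, 1) pairs through a hash table that
     * holds at most `capacity` distinct keys. When a new key arrives and the
     * table is full, every entry is flushed downstream and the table is cleared.
     * Returns the number of records emitted to the reducer.
     */
    static long emittedRecords(long records, int distinctKeys, int capacity, long seed) {
        Random rnd = new Random(seed);
        Map<Integer, Long> table = new HashMap<>();
        long emitted = 0;
        for (long i = 0; i < records; i++) {
            int key = rnd.nextInt(distinctKeys);
            if (!table.containsKey(key) && table.size() == capacity) {
                emitted += table.size(); // flush partial aggregates
                table.clear();
            }
            table.merge(key, 1L, Long::sum); // combine with any matching key
        }
        return emitted + table.size(); // final flush at end of input
    }

    public static void main(String[] args) {
        int capacity = 1_000; // assumed combiner capacity, in distinct keys
        for (long n : new long[] {10_000L, 100_000L, 1_000_000L}) {
            // Case 1: the number of keys stays fixed as the data set grows.
            long fixedKeys = emittedRecords(n, 500, capacity, 42L);
            // Case 2: the key-to-value ratio stays fixed at 1:10.
            long fixedRatio = emittedRecords(n, (int) (n / 10), capacity, 42L);
            System.out.printf("n=%d  fixed keys: %.3f  fixed ratio: %.3f%n",
                    n, (double) fixedKeys / n, (double) fixedRatio / n);
        }
    }
}
```

    In this model the fixed-key case emits at most one record per key no matter 
how large the input is, while in the fixed-ratio case the emitted fraction 
approaches 1 once the number of distinct keys far exceeds the table capacity, 
so the combiner does little more than pass records through.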


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
