[ https://issues.apache.org/jira/browse/FLINK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156306#comment-15156306 ]

ASF GitHub Bot commented on FLINK-3422:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1685#issuecomment-186948795
  
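    It is pretty crucial that different hash functions are used for partitioning 
data across machines and for the internal partitioning of data structures. 
If the same hash function is used for both, all records that reach one machine 
already share the same hash value modulo the number of machines, so (whenever 
the two partition counts share a common factor) many internal data structure 
partitions will stay empty.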
    
    So far we have divided it the following way (admittedly not documented):
      - Murmur hash across machines
      - Jenkins hash internally in data structures
    
    How about we stick with that division and use Murmur Hash in the streaming 
partitioner as well?
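The empty-partition effect described above can be reproduced with a small, self-contained Java sketch. It assumes integer keys whose hash code is the key itself, uses the MurmurHash3 32-bit finalizer (fmix32) as the machine-level hash and Bob Jenkins' 32-bit integer hash as the internal hash. The class and method names (`TwoLevelHashing`, `nonEmptyInternalPartitions`) are hypothetical and are not Flink's API. With 4 machines and 8 internal partitions, reusing the machine-level hash inside confines machine 0's keys to internal partitions 0 and 4.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: two-level hash partitioning with the same vs. different hash functions.
class TwoLevelHashing {

    // MurmurHash3 32-bit finalizer (fmix32), used here as the "across machines" hash.
    static int murmurMix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Bob Jenkins' 32-bit integer hash, used here as the "inside data structures" hash.
    static int jenkinsMix(int a) {
        a = (a + 0x7ed55d16) + (a << 12);
        a = (a ^ 0xc761c23c) ^ (a >>> 19);
        a = (a + 0x165667b1) + (a << 5);
        a = (a + 0xd3a2646c) ^ (a << 9);
        a = (a + 0xfd7046c5) + (a << 3);
        a = (a ^ 0xb55a4f09) ^ (a >>> 16);
        return a;
    }

    // Routes keys 0..numKeys-1 to machines with murmurMix, then counts how many of
    // machine 0's internal partitions receive at least one key.
    static int nonEmptyInternalPartitions(int numKeys, int machines, int internalPartitions,
                                          boolean sameHashInside) {
        Set<Integer> used = new HashSet<>();
        for (int key = 0; key < numKeys; key++) {
            int machine = Math.floorMod(murmurMix(key), machines);
            if (machine != 0) {
                continue; // only inspect the keys that landed on machine 0
            }
            int inner = sameHashInside ? murmurMix(key) : jenkinsMix(key);
            used.add(Math.floorMod(inner, internalPartitions));
        }
        return used.size();
    }

    public static void main(String[] args) {
        // 4 machines, 8 internal partitions per machine.
        System.out.println("same hash:      " + nonEmptyInternalPartitions(10_000, 4, 8, true));
        System.out.println("different hash: " + nonEmptyInternalPartitions(10_000, 4, 8, false));
    }
}
```

The same-hash case can use at most 2 of the 8 internal partitions because the machine count divides the partition count; the exact count for the different-hash case depends on the key set.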
    



> Scramble HashPartitioner hashes
> -------------------------------
>
>                 Key: FLINK-3422
>                 URL: https://issues.apache.org/jira/browse/FLINK-3422
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 0.10.2
>            Reporter: Stephan Ewen
>            Assignee: Gabor Horvath
>            Priority: Critical
>             Fix For: 1.0.0
>
>
> The {{HashPartitioner}} used by the streaming API does not apply any hash 
> scrambling against bad user hash functions.
> We should apply a Murmur or Jenkins hash on top of the hash code, similar to 
> the {{DataSet}} API.
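A minimal Java sketch of the scrambling the issue asks for. It assumes a deliberately bad user key whose hash code is always a multiple of 16, and applies MurmurHash3 (x86_32, seed 0) to the int hash code, treated as a single 4-byte block, before taking the modulo. `ScrambledHashPartitioning`, `plainChannel`, `scrambledChannel`, and `BadKey` are hypothetical names for illustration, not the actual `HashPartitioner` code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: channel selection with and without scrambling the user hash code.
class ScrambledHashPartitioning {

    // MurmurHash3_x86_32 of a single 4-byte block (the hash code), seed 0.
    static int murmur3(int code) {
        int k = code * 0xcc9e2d51;
        k = Integer.rotateLeft(k, 15);
        k *= 0x1b873593;

        int h = 0;                 // seed
        h ^= k;
        h = Integer.rotateLeft(h, 13);
        h = h * 5 + 0xe6546b64;

        h ^= 4;                    // input length in bytes
        h ^= h >>> 16;             // fmix32 finalizer
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Without scrambling: user hash code modulo the channel count.
    static int plainChannel(Object key, int numChannels) {
        return Math.floorMod(key.hashCode(), numChannels);
    }

    // With scrambling: the hash code is passed through murmur3 first.
    static int scrambledChannel(Object key, int numChannels) {
        return Math.floorMod(murmur3(key.hashCode()), numChannels);
    }

    // A deliberately bad user key: every hash code is a multiple of 16.
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) { return o instanceof BadKey && ((BadKey) o).id == id; }
        @Override public int hashCode() { return id * 16; }
    }

    // Counts how many distinct channels receive at least one of numKeys BadKeys.
    static int channelsUsed(boolean scramble, int numKeys, int numChannels) {
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < numKeys; i++) {
            BadKey key = new BadKey(i);
            used.add(scramble ? scrambledChannel(key, numChannels) : plainChannel(key, numChannels));
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println("plain:     " + channelsUsed(false, 1_000, 8) + " of 8 channels");
        System.out.println("scrambled: " + channelsUsed(true, 1_000, 8) + " of 8 channels");
    }
}
```

Without scrambling, every `BadKey` lands on channel 0 because each hash code is divisible by 8; with the hash code scrambled first, the keys spread across the channels.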



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
