[ https://issues.apache.org/jira/browse/FLINK-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15369669#comment-15369669 ]
ASF GitHub Bot commented on FLINK-3477:
---------------------------------------

Github user ggevay commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1517#discussion_r70182801

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/operators/hash/MutableHashTable.java ---
    @@ -1480,28 +1480,17 @@ public static int getInitialTableSize(int numBuffers, int bufferSize, int numPar
         public static byte assignPartition(int bucket, byte numPartitions) {
             return (byte) (bucket % numPartitions);
         }
    -
    +
         /**
    -     * This function hashes an integer value. It is adapted from Bob Jenkins' website
    -     * <a href="http://www.burtleburtle.net/bob/hash/integer.html">http://www.burtleburtle.net/bob/hash/integer.html</a>.
    -     * The hash function has the <i>full avalanche</i> property, meaning that every bit of the value to be hashed
    -     * affects every bit of the hash value.
    -     *
    -     * @param code The integer to be hashed.
    -     * @return The hash code for the integer.
    -     */
    +     * The level parameter is needed so that we can have different hash functions when we recursively apply
    +     * the partitioning, so that the working set eventually fits into memory.
    +     */
         public static int hash(int code, int level) {
             final int rotation = level * 11;
             code = (code << rotation) | (code >>> -rotation);
    --- End diff --

    abfd1ff825bf63c5cda11c2b5a556990ca5df3e1


> Add hash-based combine strategy for ReduceFunction
> --------------------------------------------------
>
>                 Key: FLINK-3477
>                 URL: https://issues.apache.org/jira/browse/FLINK-3477
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Local Runtime
>            Reporter: Fabian Hueske
>            Assignee: Gabor Gevay
>
> This issue is about adding a hash-based combine strategy for ReduceFunctions.
> The interface of the {{reduce()}} method is as follows:
> {code}
> public T reduce(T v1, T v2)
> {code}
> Input type and output type are identical and the function returns only a
> single value.
> A Reduce function is incrementally applied to compute a final
> aggregated value. This makes it possible to hold the preaggregated value in a
> hash table and update it with each function call.
> The hash-based strategy requires a special implementation of an in-memory hash
> table. The hash table should support in-place updates of elements (if the
> updated value has the same binary length as the old value) but also appending
> updates with invalidation of the old value (if the binary length of the new
> value differs). The hash table needs to be able to evict and emit all elements
> if it runs out of memory.
> We should also add {{HASH}} and {{SORT}} compiler hints to
> {{DataSet.reduce()}} and {{Grouping.reduce()}} to allow users to pick the
> execution strategy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
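
The level-dependent hashing described in the diff's new Javadoc can be illustrated with a small sketch. The rotation step is taken from the diff; everything after it is an assumption, because the diff is cut off before the rest of the method. The class name `LevelHash` and the helper `mix` are hypothetical, and the mixing step is the MurmurHash3 32-bit finalizer, swapped in here for illustration rather than the function the pull request actually uses.

```java
// Sketch only: not the Flink implementation.
final class LevelHash {

    // Hypothetical mixing step (MurmurHash3 32-bit finalizer), used here
    // as a stand-in for whatever integer hash the real method applies.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Rotating the input by a level-dependent amount before mixing gives a
    // different hash function per recursion level, so keys that collided in
    // one partitioning pass are spread differently in the next.
    static int hash(int code, int level) {
        final int rotation = level * 11;
        // Equivalent to (code << rotation) | (code >>> -rotation), because
        // Java only uses the low 5 bits of an int shift distance.
        code = Integer.rotateLeft(code, rotation);
        // Clear the sign bit so the result can be used as a bucket index
        // (an assumption of this sketch).
        return mix(code) & 0x7fffffff;
    }
}
```

Calling `LevelHash.hash(key, 0)` for the first partitioning pass and `LevelHash.hash(key, 1)` for the first recursive pass hashes the same key through two different functions, which is the property the `level` parameter exists to provide.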
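
The combine strategy outlined in the issue description (keep one preaggregated value per key in a hash table, update it on every call, and emit everything when memory runs out) can be sketched as follows. This is a simplified illustration under stated assumptions, not the proposed Flink code: `HashCombinerSketch`, `keyOf`, and `maxEntries` are hypothetical names, a `java.util.HashMap` stands in for the special in-memory hash table, and an entry limit stands in for the memory budget. The in-place versus appending update distinction for serialized records is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Sketch only: a heap-based stand-in for a hash-based combiner.
final class HashCombinerSketch<K, T> {
    private final Map<K, T> table = new HashMap<>();
    private final List<T> emitted = new ArrayList<>();
    private final Function<T, K> keyOf;      // hypothetical key extractor
    private final BinaryOperator<T> reduce;  // plays the role of reduce(T v1, T v2)
    private final int maxEntries;            // stand-in for the memory budget

    HashCombinerSketch(Function<T, K> keyOf, BinaryOperator<T> reduce, int maxEntries) {
        this.keyOf = keyOf;
        this.reduce = reduce;
        this.maxEntries = maxEntries;
    }

    // Fold one record into the preaggregated value stored for its key.
    void add(T value) {
        K key = keyOf.apply(value);
        if (!table.containsKey(key) && table.size() >= maxEntries) {
            flush(); // "out of memory": evict and emit all partial aggregates
        }
        table.merge(key, value, reduce);
    }

    // Emit every partial aggregate and clear the table.
    void flush() {
        emitted.addAll(table.values());
        table.clear();
    }

    List<T> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        HashCombinerSketch<String, Map.Entry<String, Integer>> combiner =
            new HashCombinerSketch<String, Map.Entry<String, Integer>>(
                Map.Entry::getKey,
                (a, b) -> Map.entry(a.getKey(), a.getValue() + b.getValue()),
                2);
        combiner.add(Map.entry("a", 1));
        combiner.add(Map.entry("b", 1));
        combiner.add(Map.entry("a", 2)); // updates the stored value for "a"
        combiner.add(Map.entry("c", 5)); // table is full, so "a" and "b" are emitted first
        combiner.flush();
        System.out.println(combiner.emitted());
    }
}
```

Because a flush can emit a partial aggregate for a key that appears again later, the output of such a combiner is only preaggregated; a final reduce step downstream still has to merge values with the same key.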