Gopal V created HIVE-21531:
------------------------------

             Summary: Vectorization: all NULL hashcodes are not computed using Murmur3
                 Key: HIVE-21531
                 URL: https://issues.apache.org/jira/browse/HIVE-21531
             Project: Hive
          Issue Type: Bug
            Reporter: Gopal V

The comments in Vectorized hash computation call out the MurmurHash implementation (the one using 0x5bd1e995), while the non-vectorized codepath calls out the Murmur3 one (using 0xcc9e2d51).

The comments here are wrong:

{code}
  /**
   * Batch compute the hash codes for all the serialized keys.
   *
   * NOTE: MAJOR MAJOR ASSUMPTION:
   *     We assume that HashCodeUtil.murmurHash produces the same result
   *     as MurmurHash.hash with seed = 0 (the method used by ReduceSinkOperator for
   *     UNIFORM distribution).
   */
  protected void computeSerializedHashCodes() {
    int offset = 0;
    int keyLength;
    byte[] bytes = output.getData();
    for (int i = 0; i < nonNullKeyCount; i++) {
      keyLength = serializedKeyLengths[i];
      hashCodes[i] = Murmur3.hash32(bytes, offset, keyLength, 0);
      offset += keyLength;
    }
  }
{code}

but the wrong comment is followed in the Vector RS operator:

{code}
      System.arraycopy(nullKeyOutput.getData(), 0, nullBytes, 0, nullBytesLength);
      nullKeyHashCode = HashCodeUtil.calculateBytesHashCode(nullBytes, 0, nullBytesLength);
{code}
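
To illustrate why the two code paths above cannot agree, here is a minimal standalone sketch that hashes the same bytes with a 32-bit Murmur3 (the 0xcc9e2d51 family) and with a MurmurHash2-style function (the 0x5bd1e995 family) at seed 0. The class `HashMismatchSketch` and its methods are hypothetical stand-ins written for this example. They are not Hive's `Murmur3` or `HashCodeUtil` classes, and the single zero byte used as the "all NULL" key is an assumption, not the actual serialized null key.

```java
final class HashMismatchSketch {

    // Read 4 bytes as a little-endian int.
    private static int readIntLE(byte[] d, int p) {
        return (d[p] & 0xff) | ((d[p + 1] & 0xff) << 8)
             | ((d[p + 2] & 0xff) << 16) | ((d[p + 3] & 0xff) << 24);
    }

    // 32-bit Murmur3 (x86 variant), identified by the 0xcc9e2d51 constant.
    static int murmur3(byte[] data, int offset, int length, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int blocks = length >>> 2;
        for (int i = 0; i < blocks; i++) {
            int k = readIntLE(data, offset + (i << 2));
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }
        int tail = offset + (blocks << 2);
        int k = 0;
        switch (length & 3) {
            case 3: k ^= (data[tail + 2] & 0xff) << 16; // fall through
            case 2: k ^= (data[tail + 1] & 0xff) << 8;  // fall through
            case 1:
                k ^= data[tail] & 0xff;
                k *= c1;
                k = Integer.rotateLeft(k, 15);
                k *= c2;
                h ^= k;
                break;
            default:
                break;
        }
        // Finalization mix.
        h ^= length;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // MurmurHash2-style hash with seed 0, identified by the 0x5bd1e995 constant.
    static int murmur2(byte[] data, int offset, int length) {
        final int m = 0x5bd1e995;
        int h = length; // seed (0) XOR length
        int blocks = length >>> 2;
        for (int i = 0; i < blocks; i++) {
            int k = readIntLE(data, offset + (i << 2));
            k *= m;
            k ^= k >>> 24;
            k *= m;
            h *= m;
            h ^= k;
        }
        int tail = offset + (blocks << 2);
        switch (length & 3) {
            case 3: h ^= (data[tail + 2] & 0xff) << 16; // fall through
            case 2: h ^= (data[tail + 1] & 0xff) << 8;  // fall through
            case 1:
                h ^= data[tail] & 0xff;
                h *= m;
                break;
            default:
                break;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    public static void main(String[] args) {
        // Assumed stand-in for a serialized all-NULL key: one zero byte.
        byte[] nullKey = new byte[] {0};
        int viaMurmur3 = murmur3(nullKey, 0, nullKey.length, 0);
        int viaMurmur2 = murmur2(nullKey, 0, nullKey.length);
        System.out.println(String.format("murmur3 = 0x%08x", viaMurmur3));
        System.out.println(String.format("murmur2 = 0x%08x", viaMurmur2));
        System.out.println("same hash code: " + (viaMurmur3 == viaMurmur2));
    }
}
```

If the non-null keys are hashed with one function and the all-NULL key with the other, rows with the same key distribution logic end up with hash codes from two unrelated families, which is the inconsistency this issue reports.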