yifan-c commented on code in PR #46: URL: https://github.com/apache/cassandra-analytics/pull/46#discussion_r1541824843
##########
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/utils/XXHash32DigestAlgorithm.java:
##########
@@ -61,7 +61,10 @@ public Digest calculateFileDigest(Path path) throws IOException
             {
                 hasher.update(buffer, 0, len);
             }
-            return new XXHash32Digest(Long.toHexString(hasher.getValue()), SEED);
+            // lz4 library doesn't mask the hash value, so we need to mask it to
+            // prevent forwarding the negative sign bit when converting to a long value
+            long hash = hasher.getValue() & 0xffffffffL;

Review Comment:
The comment is not accurate. No mask is required for lz4's implementation, since it returns `int`. Masking is required only because our code widens that `int` to a `long`, and the sign bit must not be extended into the upper 32 bits.

Alternatively, you can skip the masking and just call:

```java
Integer.toHexString(hasher.getValue())
```

And the proof:

```
jshell> int i = -1;
i ==> -1

jshell> Integer.toHexString(i);
$2 ==> "ffffffff"

jshell> long l = i & 0xffffffffL;
l ==> 4294967295

jshell> Long.toHexString(l);
$4 ==> "ffffffff"
```
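To make the sign-extension point in the review comment concrete, here is a minimal sketch of what happens when a negative `int` is passed straight to `Long.toHexString`. The class name, the helper method names, and the sample value `0x80000001` are hypothetical stand-ins for illustration; they are not taken from the PR or from lz4.

```java
public class SignExtensionDemo
{
    // Passing an int where a long is expected widens it with sign extension,
    // so a negative value gains eight leading 'f' hex digits.
    static String unmaskedHex(int hash)
    {
        return Long.toHexString(hash);
    }

    // Masking with 0xffffffffL keeps only the low 32 bits of the widened value.
    static String maskedHex(int hash)
    {
        return Long.toHexString(hash & 0xffffffffL);
    }

    // Formatting the int directly avoids the widening altogether.
    static String intHex(int hash)
    {
        return Integer.toHexString(hash);
    }

    public static void main(String[] args)
    {
        int hash = 0x80000001; // hypothetical hash value with the high bit set
        System.out.println(unmaskedHex(hash)); // ffffffff80000001
        System.out.println(maskedHex(hash));   // 80000001
        System.out.println(intHex(hash));      // 80000001
    }
}
```

The masked and `Integer.toHexString` variants produce the same string, which is why either one should work here as long as `getValue()` returns an `int`.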