[ https://issues.apache.org/jira/browse/KAFKA-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viktor Somogyi-Vass updated KAFKA-10650:
----------------------------------------
    Description: 
The use of MD5 was uncovered while testing Kafka for FIPS (Federal Information Processing Standards) verification.

Although MD5 is not a FIPS incompatibility here, because it is not used for cryptographic purposes, I spent some time on it because it is not ideal either. MD5 is a relatively fast cryptographic hashing algorithm, but there are much better-performing algorithms for hash tables, which is how it is used in SkimpyOffsetMap.

By applying Murmur3 (which is already implemented in Streams) I achieved a 3x faster {{put}} operation, and overall segment cleaning sped up by 30%, while preserving the same collision rate (both performed within 0.0015 - 0.007, mostly with a 0.004 median).

Murmur3 was chosen because research paper [1] shows that Murmur2 is a relatively good choice for hash tables. Since Murmur3 is available in the project, I used that.

[1] https://www.researchgate.net/publication/235663569_Performance_of_the_most_common_non-cryptographic_hash_functions

  was:
The use of MD5 was uncovered while testing Kafka for FIPS (Federal Information Processing Standards) verification.

Although MD5 is not a FIPS incompatibility here, because it is not used for cryptographic purposes, I spent some time on it because it is not ideal either. MD5 is a relatively fast cryptographic hashing algorithm, but there are much better-performing algorithms for hash tables, which is how it is used in SkimpyOffsetMap.

By applying Murmur3 (which is already implemented in Streams) I achieved a 3x faster {{put}} operation, and overall segment cleaning sped up by 30%, while preserving the same collision rate (both performed within 0.0015 - 0.007, mostly with a 0.004 median).
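As a rough illustration of the swap described above (this is not the actual patch), the sketch below derives a hash-table slot for a key once with MD5 and once with a 32-bit Murmur3. The class name HashPositionSketch, the helper names, the seed 0, the slot count of 1024, and the choice to use only the first four MD5 bytes are all assumptions made for the example. The Murmur3 routine is the public x86 32-bit variant written out inline so the example is self-contained; it is not the implementation that ships in Streams, and SkimpyOffsetMap's real slot layout and probing are not reproduced here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: compares how a slot could be derived from MD5 vs. Murmur3.
class HashPositionSketch {

    // Murmur3 x86 32-bit variant, inlined so the sketch has no dependencies.
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int nblocks = data.length / 4;

        // Body: mix each little-endian 4-byte block into the state.
        for (int i = 0; i < nblocks; i++) {
            int off = i * 4;
            int k = (data[off] & 0xff)
                  | (data[off + 1] & 0xff) << 8
                  | (data[off + 2] & 0xff) << 16
                  | (data[off + 3] & 0xff) << 24;
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }

        // Tail: mix the remaining 1-3 bytes, if any.
        int tail = nblocks * 4;
        int k = 0;
        switch (data.length & 3) {
            case 3: k ^= (data[tail + 2] & 0xff) << 16; // fall through
            case 2: k ^= (data[tail + 1] & 0xff) << 8;  // fall through
            case 1: k ^= (data[tail] & 0xff);
                    k *= c1;
                    k = Integer.rotateLeft(k, 15);
                    k *= c2;
                    h ^= k;
        }

        // Finalization: fold in the length and avalanche the bits.
        h ^= data.length;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Assumption for this sketch: take the first 4 digest bytes (little-endian) as the hash.
    static int md5Hash32(byte[] key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key);
            return (d[0] & 0xff) | (d[1] & 0xff) << 8 | (d[2] & 0xff) << 16 | (d[3] & 0xff) << 24;
        } catch (NoSuchAlgorithmException e) {
            // MD5 can be unavailable, for example under a FIPS-restricted provider.
            throw new IllegalStateException(e);
        }
    }

    // Map a possibly negative 32-bit hash onto a slot index in [0, slots).
    static int slot(int hash, int slots) {
        return Math.floorMod(hash, slots);
    }

    public static void main(String[] args) {
        byte[] key = "user-42".getBytes(StandardCharsets.UTF_8);
        int slots = 1024; // hypothetical table size
        System.out.println("md5 slot     = " + slot(md5Hash32(key), slots));
        System.out.println("murmur3 slot = " + slot(murmur3_32(key, 0), slots));
    }
}
```

Both functions yield a 32-bit value that is reduced to a slot in the same way, so the difference a benchmark would see is the cost of computing the hash itself, which is the cost the {{put}} measurement above reflects.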
> Use Murmur3 hashing instead of MD5 in SkimpyOffsetMap
> -----------------------------------------------------
>
>                 Key: KAFKA-10650
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10650
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>            Reporter: Viktor Somogyi-Vass
>            Assignee: Viktor Somogyi-Vass
>            Priority: Major