[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member
[ https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258173#comment-15258173 ] ASF GitHub Bot commented on KAFKA-3499: --- Github user asfgit closed the pull request at: https://github.com/apache/kafka/pull/1229 > byte[] should not be used as Map key nor Set member > --- > > Key: KAFKA-3499 > URL: https://issues.apache.org/jira/browse/KAFKA-3499 > Project: Kafka > Issue Type: Sub-task > Components: streams >Reporter: josh gruenberg >Assignee: Guozhang Wang > Labels: user-experience > Fix For: 0.10.0.0 > > > On the JVM, Array.equals and Array.hashCode do not incorporate array > contents; they inherit Object.equals/hashCode. This implies that Collections > that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two > arrays with equal contents as distinct elements. > Many of the Kafka Streams internal classes currently use generic HashMaps and > Sets to manage caches and invalidation status. For example, > RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the > Elements are constructed as RocksDBStore. > Similarly, the MemoryLRUCache internally holds a > LinkedHashMap map, and a HashSet keys, and these end up holding > byte[] keys. Finally, user-code may attempt to use any of these provided > types with byte[], with undesirable results. > Keys that are byte-arrays should be wrapped in a type that incorporates the > content in their computation of equals/hashCode. java.nio.ByteBuffer is one > such type that could be used, but a purpose-built immutable class would > likely be a better solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member
[ https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243866#comment-15243866 ] josh gruenberg commented on KAFKA-3499: --- Sure, I'll take a look later tonight. > byte[] should not be used as Map key nor Set member > --- > > Key: KAFKA-3499 > URL: https://issues.apache.org/jira/browse/KAFKA-3499 > Project: Kafka > Issue Type: Sub-task > Components: streams >Reporter: josh gruenberg >Assignee: Guozhang Wang > Labels: user-experience > Fix For: 0.10.0.0 > > > On the JVM, Array.equals and Array.hashCode do not incorporate array > contents; they inherit Object.equals/hashCode. This implies that Collections > that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two > arrays with equal contents as distinct elements. > Many of the Kafka Streams internal classes currently use generic HashMaps and > Sets to manage caches and invalidation status. For example, > RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the > Elements are constructed as RocksDBStore. > Similarly, the MemoryLRUCache internally holds a > LinkedHashMap map, and a HashSet keys, and these end up holding > byte[] keys. Finally, user-code may attempt to use any of these provided > types with byte[], with undesirable results. > Keys that are byte-arrays should be wrapped in a type that incorporates the > content in their computation of equals/hashCode. java.nio.ByteBuffer is one > such type that could be used, but a purpose-built immutable class would > likely be a better solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member
[ https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243856#comment-15243856 ] Guozhang Wang commented on KAFKA-3499: -- [~joshng] Would you like to take a look at the PR? > byte[] should not be used as Map key nor Set member > --- > > Key: KAFKA-3499 > URL: https://issues.apache.org/jira/browse/KAFKA-3499 > Project: Kafka > Issue Type: Sub-task > Components: streams >Reporter: josh gruenberg >Assignee: Guozhang Wang > Labels: user-experience > Fix For: 0.10.0.0 > > > On the JVM, Array.equals and Array.hashCode do not incorporate array > contents; they inherit Object.equals/hashCode. This implies that Collections > that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two > arrays with equal contents as distinct elements. > Many of the Kafka Streams internal classes currently use generic HashMaps and > Sets to manage caches and invalidation status. For example, > RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the > Elements are constructed as RocksDBStore. > Similarly, the MemoryLRUCache internally holds a > LinkedHashMap map, and a HashSet keys, and these end up holding > byte[] keys. Finally, user-code may attempt to use any of these provided > types with byte[], with undesirable results. > Keys that are byte-arrays should be wrapped in a type that incorporates the > content in their computation of equals/hashCode. java.nio.ByteBuffer is one > such type that could be used, but a purpose-built immutable class would > likely be a better solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member
[ https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243852#comment-15243852 ] ASF GitHub Bot commented on KAFKA-3499: --- GitHub user guozhangwang opened a pull request: https://github.com/apache/kafka/pull/1229 KAFKA-3499: prevent array typed keys in KeyValueStore You can merge this pull request into a Git repository by running: $ git pull https://github.com/guozhangwang/kafka K3499 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/1229.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1229 commit 718b8f1423d885c5bee32bacbcacf4fb78598372 Author: Guozhang WangDate: 2016-04-15T00:26:57Z wrap segments keys with ByteBuffer commit 9302bc18c8fa400b1fb7d4b36f29c8f0c812785c Author: Guozhang Wang Date: 2016-04-15T23:35:36Z prevent array keys in key value store > byte[] should not be used as Map key nor Set member > --- > > Key: KAFKA-3499 > URL: https://issues.apache.org/jira/browse/KAFKA-3499 > Project: Kafka > Issue Type: Sub-task > Components: streams >Reporter: josh gruenberg >Assignee: Guozhang Wang > Labels: user-experience > Fix For: 0.10.0.0 > > > On the JVM, Array.equals and Array.hashCode do not incorporate array > contents; they inherit Object.equals/hashCode. This implies that Collections > that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two > arrays with equal contents as distinct elements. > Many of the Kafka Streams internal classes currently use generic HashMaps and > Sets to manage caches and invalidation status. For example, > RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the > Elements are constructed as RocksDBStore . > Similarly, the MemoryLRUCache internally holds a > LinkedHashMap map, and a HashSet keys, and these end up holding > byte[] keys. Finally, user-code may attempt to use any of these provided > types with byte[], with undesirable results. > Keys that are byte-arrays should be wrapped in a type that incorporates the > content in their computation of equals/hashCode. java.nio.ByteBuffer is one > such type that could be used, but a purpose-built immutable class would > likely be a better solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member
[ https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15235799#comment-15235799 ] Michael Noll commented on KAFKA-3499: - FWIW, we ran into the same problem when handling byte[] in Twitter Algebird. Back then I introduced a custom [Bytes|https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/Bytes.scala] wrapper for byte arrays ([original PR|https://github.com/twitter/algebird/pull/399/files]), which happens to use java.nio.ByteBuffer. The Bytes.scala code might be a good starting point; it includes a sane implementations of hashCode, ordering/compare, equals, etc. > byte[] should not be used as Map key nor Set member > --- > > Key: KAFKA-3499 > URL: https://issues.apache.org/jira/browse/KAFKA-3499 > Project: Kafka > Issue Type: Sub-task > Components: streams >Reporter: josh gruenberg > Labels: user-experience > Fix For: 0.10.0.0 > > > On the JVM, Array.equals and Array.hashCode do not incorporate array > contents; they inherit Object.equals/hashCode. This implies that Collections > that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two > arrays with equal contents as distinct elements. > Many of the Kafka Streams internal classes currently use generic HashMaps and > Sets to manage caches and invalidation status. For example, > RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the > Elements are constructed as RocksDBStore. > Similarly, the MemoryLRUCache internally holds a > LinkedHashMap map, and a HashSet keys, and these end up holding > byte[] keys. Finally, user-code may attempt to use any of these provided > types with byte[], with undesirable results. > Keys that are byte-arrays should be wrapped in a type that incorporates the > content in their computation of equals/hashCode. java.nio.ByteBuffer is one > such type that could be used, but a purpose-built immutable class would > likely be a better solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member
[ https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225135#comment-15225135 ] Guozhang Wang commented on KAFKA-3499: -- Converting to Kafka Streams subtask to keep track of double checking byte arrays. > byte[] should not be used as Map key nor Set member > --- > > Key: KAFKA-3499 > URL: https://issues.apache.org/jira/browse/KAFKA-3499 > Project: Kafka > Issue Type: Sub-task > Components: kafka streams >Reporter: josh gruenberg > Fix For: 0.10.0.0 > > > On the JVM, Array.equals and Array.hashCode do not incorporate array > contents; they inherit Object.equals/hashCode. This implies that Collections > that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two > arrays with equal contents as distinct elements. > Many of the Kafka Streams internal classes currently use generic HashMaps and > Sets to manage caches and invalidation status. For example, > RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the > Elements are constructed as RocksDBStore. > Similarly, the MemoryLRUCache internally holds a > LinkedHashMap map, and a HashSet keys, and these end up holding > byte[] keys. Finally, user-code may attempt to use any of these provided > types with byte[], with undesirable results. > Keys that are byte-arrays should be wrapped in a type that incorporates the > content in their computation of equals/hashCode. java.nio.ByteBuffer is one > such type that could be used, but a purpose-built immutable class would > likely be a better solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member
[ https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224928#comment-15224928 ] Tommy Becker commented on KAFKA-3499: - This was/is a problem in Samza as well. Though many classes may have this problem, byte[] seems particularly common in this scenario. It's hard to imagine a situation where byte array keys are ever going to be what you want in a cache. https://issues.apache.org/jira/browse/SAMZA-505 > byte[] should not be used as Map key nor Set member > --- > > Key: KAFKA-3499 > URL: https://issues.apache.org/jira/browse/KAFKA-3499 > Project: Kafka > Issue Type: Bug > Components: kafka streams >Reporter: josh gruenberg > Fix For: 0.10.0.0 > > > On the JVM, Array.equals and Array.hashCode do not incorporate array > contents; they inherit Object.equals/hashCode. This implies that Collections > that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two > arrays with equal contents as distinct elements. > Many of the Kafka Streams internal classes currently use generic HashMaps and > Sets to manage caches and invalidation status. For example, > RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the > Elements are constructed as RocksDBStore. > Similarly, the MemoryLRUCache internally holds a > LinkedHashMap map, and a HashSet keys, and these end up holding > byte[] keys. Finally, user-code may attempt to use any of these provided > types with byte[], with undesirable results. > Keys that are byte-arrays should be wrapped in a type that incorporates the > content in their computation of equals/hashCode. java.nio.ByteBuffer is one > such type that could be used, but a purpose-built immutable class would > likely be a better solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member
[ https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224809#comment-15224809 ] Guozhang Wang commented on KAFKA-3499: -- [~ijuma] We realized this problem before in the unit tests, and hence used {code}RawStoreChangeLogger{code} which has a customized comparator for such cases. I thought it should be sufficient for case 2), i.e. our own use case here. But let me double check the source code again to make sure. > byte[] should not be used as Map key nor Set member > --- > > Key: KAFKA-3499 > URL: https://issues.apache.org/jira/browse/KAFKA-3499 > Project: Kafka > Issue Type: Bug > Components: kafka streams >Reporter: josh gruenberg > Fix For: 0.10.0.0 > > > On the JVM, Array.equals and Array.hashCode do not incorporate array > contents; they inherit Object.equals/hashCode. This implies that Collections > that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two > arrays with equal contents as distinct elements. > Many of the Kafka Streams internal classes currently use generic HashMaps and > Sets to manage caches and invalidation status. For example, > RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the > Elements are constructed as RocksDBStore. > Similarly, the MemoryLRUCache internally holds a > LinkedHashMap map, and a HashSet keys, and these end up holding > byte[] keys. Finally, user-code may attempt to use any of these provided > types with byte[], with undesirable results. > Keys that are byte-arrays should be wrapped in a type that incorporates the > content in their computation of equals/hashCode. java.nio.ByteBuffer is one > such type that could be used, but a purpose-built immutable class would > likely be a better solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member
[ https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224768#comment-15224768 ] Jay Kreps commented on KAFKA-3499: -- Yeah, gotcha, that would be totally broken. > byte[] should not be used as Map key nor Set member > --- > > Key: KAFKA-3499 > URL: https://issues.apache.org/jira/browse/KAFKA-3499 > Project: Kafka > Issue Type: Bug > Components: kafka streams >Reporter: josh gruenberg > Fix For: 0.10.0.0 > > > On the JVM, Array.equals and Array.hashCode do not incorporate array > contents; they inherit Object.equals/hashCode. This implies that Collections > that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two > arrays with equal contents as distinct elements. > Many of the Kafka Streams internal classes currently use generic HashMaps and > Sets to manage caches and invalidation status. For example, > RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the > Elements are constructed as RocksDBStore. > Similarly, the MemoryLRUCache internally holds a > LinkedHashMap map, and a HashSet keys, and these end up holding > byte[] keys. Finally, user-code may attempt to use any of these provided > types with byte[], with undesirable results. > Keys that are byte-arrays should be wrapped in a type that incorporates the > content in their computation of equals/hashCode. java.nio.ByteBuffer is one > such type that could be used, but a purpose-built immutable class would > likely be a better solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member
[ https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224758#comment-15224758 ] Ismael Juma commented on KAFKA-3499: [~jkreps], I think there are two separate issues: 1. Third-party code that misuses the API. I think documenting is fine for those cases. 2. Our own code doing this. We need to fix those cases. I intrepreted the issue as saying that we have cases of 2, but I didn't check. > byte[] should not be used as Map key nor Set member > --- > > Key: KAFKA-3499 > URL: https://issues.apache.org/jira/browse/KAFKA-3499 > Project: Kafka > Issue Type: Bug > Components: kafka streams >Reporter: josh gruenberg > Fix For: 0.10.0.0 > > > On the JVM, Array.equals and Array.hashCode do not incorporate array > contents; they inherit Object.equals/hashCode. This implies that Collections > that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two > arrays with equal contents as distinct elements. > Many of the Kafka Streams internal classes currently use generic HashMaps and > Sets to manage caches and invalidation status. For example, > RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the > Elements are constructed as RocksDBStore. > Similarly, the MemoryLRUCache internally holds a > LinkedHashMap map, and a HashSet keys, and these end up holding > byte[] keys. Finally, user-code may attempt to use any of these provided > types with byte[], with undesirable results. > Keys that are byte-arrays should be wrapped in a type that incorporates the > content in their computation of equals/hashCode. java.nio.ByteBuffer is one > such type that could be used, but a purpose-built immutable class would > likely be a better solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member
[ https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224741#comment-15224741 ] Jay Kreps commented on KAFKA-3499: -- I think this problem is more general than byte[] since lots of things don't implement equals/hc. A couple of other options: 1. Just document it, this, after all, is how Java handles it. 2. Do a one-time check, something like {code} if(firstExecution) { checkOverridesEqualsAndHC(key); firstExecution = false; } {code} > byte[] should not be used as Map key nor Set member > --- > > Key: KAFKA-3499 > URL: https://issues.apache.org/jira/browse/KAFKA-3499 > Project: Kafka > Issue Type: Bug > Components: kafka streams >Reporter: josh gruenberg > Fix For: 0.10.0.0 > > > On the JVM, Array.equals and Array.hashCode do not incorporate array > contents; they inherit Object.equals/hashCode. This implies that Collections > that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two > arrays with equal contents as distinct elements. > Many of the Kafka Streams internal classes currently use generic HashMaps and > Sets to manage caches and invalidation status. For example, > RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the > Elements are constructed as RocksDBStore. > Similarly, the MemoryLRUCache internally holds a > LinkedHashMap map, and a HashSet keys, and these end up holding > byte[] keys. Finally, user-code may attempt to use any of these provided > types with byte[], with undesirable results. > Keys that are byte-arrays should be wrapped in a type that incorporates the > content in their computation of equals/hashCode. java.nio.ByteBuffer is one > such type that could be used, but a purpose-built immutable class would > likely be a better solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member
[ https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224691#comment-15224691 ] Ismael Juma commented on KAFKA-3499: [~guozhang], this looks important to fix so I set the fix version to 0.10.0.0. Please change it if you disagree. > byte[] should not be used as Map key nor Set member > --- > > Key: KAFKA-3499 > URL: https://issues.apache.org/jira/browse/KAFKA-3499 > Project: Kafka > Issue Type: Bug > Components: kafka streams >Reporter: josh gruenberg > Fix For: 0.10.0.0 > > > On the JVM, Array.equals and Array.hashCode do not incorporate array > contents; they inherit Object.equals/hashCode. This implies that Collections > that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two > arrays with equal contents as distinct elements. > Many of the Kafka Streams internal classes currently use generic HashMaps and > Sets to manage caches and invalidation status. For example, > RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the > Elements are constructed as RocksDBStore. > Similarly, the MemoryLRUCache internally holds a > LinkedHashMap map, and a HashSet keys, and these end up holding > byte[] keys. Finally, user-code may attempt to use any of these provided > types with byte[], with undesirable results. > Keys that are byte-arrays should be wrapped in a type that incorporates the > content in their computation of equals/hashCode. java.nio.ByteBuffer is one > such type that could be used, but a purpose-built immutable class would > likely be a better solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)