[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member

2016-04-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258173#comment-15258173
 ] 

ASF GitHub Bot commented on KAFKA-3499:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1229


> byte[] should not be used as Map key nor Set member
> ---
>
> Key: KAFKA-3499
> URL: https://issues.apache.org/jira/browse/KAFKA-3499
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: josh gruenberg
>Assignee: Guozhang Wang
>  Labels: user-experience
> Fix For: 0.10.0.0
>
>
> On the JVM, Array.equals and Array.hashCode do not incorporate array 
> contents; they inherit Object.equals/hashCode. This implies that Collections 
> that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two 
> arrays with equal contents as distinct elements.
> Many of the Kafka Streams internal classes currently use generic HashMaps and 
> Sets to manage caches and invalidation status. For example, 
> RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the 
> Elements are constructed as RocksDBStore.
> Similarly, the MemoryLRUCache internally holds a 
> LinkedHashMap map, and a HashSet keys, and these end up holding 
> byte[] keys. Finally, user-code may attempt to use any of these provided 
> types with byte[], with undesirable results.
> Keys that are byte-arrays should be wrapped in a type that incorporates the 
> content in their computation of equals/hashCode. java.nio.ByteBuffer is one 
> such type that could be used, but a purpose-built immutable class would 
> likely be a better solution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member

2016-04-15 Thread josh gruenberg (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243866#comment-15243866
 ] 

josh gruenberg commented on KAFKA-3499:
---

Sure, I'll take a look later tonight.

> byte[] should not be used as Map key nor Set member
> ---
>
> Key: KAFKA-3499
> URL: https://issues.apache.org/jira/browse/KAFKA-3499
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: josh gruenberg
>Assignee: Guozhang Wang
>  Labels: user-experience
> Fix For: 0.10.0.0
>
>
> On the JVM, Array.equals and Array.hashCode do not incorporate array 
> contents; they inherit Object.equals/hashCode. This implies that Collections 
> that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two 
> arrays with equal contents as distinct elements.
> Many of the Kafka Streams internal classes currently use generic HashMaps and 
> Sets to manage caches and invalidation status. For example, 
> RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the 
> Elements are constructed as RocksDBStore.
> Similarly, the MemoryLRUCache internally holds a 
> LinkedHashMap map, and a HashSet keys, and these end up holding 
> byte[] keys. Finally, user-code may attempt to use any of these provided 
> types with byte[], with undesirable results.
> Keys that are byte-arrays should be wrapped in a type that incorporates the 
> content in their computation of equals/hashCode. java.nio.ByteBuffer is one 
> such type that could be used, but a purpose-built immutable class would 
> likely be a better solution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member

2016-04-15 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243856#comment-15243856
 ] 

Guozhang Wang commented on KAFKA-3499:
--

[~joshng] Would you like to take a look at the PR?

> byte[] should not be used as Map key nor Set member
> ---
>
> Key: KAFKA-3499
> URL: https://issues.apache.org/jira/browse/KAFKA-3499
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: josh gruenberg
>Assignee: Guozhang Wang
>  Labels: user-experience
> Fix For: 0.10.0.0
>
>
> On the JVM, Array.equals and Array.hashCode do not incorporate array 
> contents; they inherit Object.equals/hashCode. This implies that Collections 
> that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two 
> arrays with equal contents as distinct elements.
> Many of the Kafka Streams internal classes currently use generic HashMaps and 
> Sets to manage caches and invalidation status. For example, 
> RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the 
> Elements are constructed as RocksDBStore.
> Similarly, the MemoryLRUCache internally holds a 
> LinkedHashMap map, and a HashSet keys, and these end up holding 
> byte[] keys. Finally, user-code may attempt to use any of these provided 
> types with byte[], with undesirable results.
> Keys that are byte-arrays should be wrapped in a type that incorporates the 
> content in their computation of equals/hashCode. java.nio.ByteBuffer is one 
> such type that could be used, but a purpose-built immutable class would 
> likely be a better solution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member

2016-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243852#comment-15243852
 ] 

ASF GitHub Bot commented on KAFKA-3499:
---

GitHub user guozhangwang opened a pull request:

https://github.com/apache/kafka/pull/1229

KAFKA-3499: prevent array typed keys in KeyValueStore



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guozhangwang/kafka K3499

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1229.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1229


commit 718b8f1423d885c5bee32bacbcacf4fb78598372
Author: Guozhang Wang 
Date:   2016-04-15T00:26:57Z

wrap segments keys with ByteBuffer

commit 9302bc18c8fa400b1fb7d4b36f29c8f0c812785c
Author: Guozhang Wang 
Date:   2016-04-15T23:35:36Z

prevent array keys in key value store




> byte[] should not be used as Map key nor Set member
> ---
>
> Key: KAFKA-3499
> URL: https://issues.apache.org/jira/browse/KAFKA-3499
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: josh gruenberg
>Assignee: Guozhang Wang
>  Labels: user-experience
> Fix For: 0.10.0.0
>
>
> On the JVM, Array.equals and Array.hashCode do not incorporate array 
> contents; they inherit Object.equals/hashCode. This implies that Collections 
> that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two 
> arrays with equal contents as distinct elements.
> Many of the Kafka Streams internal classes currently use generic HashMaps and 
> Sets to manage caches and invalidation status. For example, 
> RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the 
> Elements are constructed as RocksDBStore.
> Similarly, the MemoryLRUCache internally holds a 
> LinkedHashMap map, and a HashSet keys, and these end up holding 
> byte[] keys. Finally, user-code may attempt to use any of these provided 
> types with byte[], with undesirable results.
> Keys that are byte-arrays should be wrapped in a type that incorporates the 
> content in their computation of equals/hashCode. java.nio.ByteBuffer is one 
> such type that could be used, but a purpose-built immutable class would 
> likely be a better solution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member

2016-04-11 Thread Michael Noll (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15235799#comment-15235799
 ] 

Michael Noll commented on KAFKA-3499:
-

FWIW, we ran into the same problem when handling byte[] in Twitter Algebird.  
Back then I introduced a custom 
[Bytes|https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/Bytes.scala]
 wrapper for byte arrays ([original 
PR|https://github.com/twitter/algebird/pull/399/files]), which happens to use 
java.nio.ByteBuffer.  The Bytes.scala code might be a good starting point; it 
includes a sane implementations of hashCode, ordering/compare, equals, etc.

> byte[] should not be used as Map key nor Set member
> ---
>
> Key: KAFKA-3499
> URL: https://issues.apache.org/jira/browse/KAFKA-3499
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: josh gruenberg
>  Labels: user-experience
> Fix For: 0.10.0.0
>
>
> On the JVM, Array.equals and Array.hashCode do not incorporate array 
> contents; they inherit Object.equals/hashCode. This implies that Collections 
> that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two 
> arrays with equal contents as distinct elements.
> Many of the Kafka Streams internal classes currently use generic HashMaps and 
> Sets to manage caches and invalidation status. For example, 
> RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the 
> Elements are constructed as RocksDBStore.
> Similarly, the MemoryLRUCache internally holds a 
> LinkedHashMap map, and a HashSet keys, and these end up holding 
> byte[] keys. Finally, user-code may attempt to use any of these provided 
> types with byte[], with undesirable results.
> Keys that are byte-arrays should be wrapped in a type that incorporates the 
> content in their computation of equals/hashCode. java.nio.ByteBuffer is one 
> such type that could be used, but a purpose-built immutable class would 
> likely be a better solution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member

2016-04-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225135#comment-15225135
 ] 

Guozhang Wang commented on KAFKA-3499:
--

Converting to Kafka Streams subtask to keep track of double checking byte 
arrays.

> byte[] should not be used as Map key nor Set member
> ---
>
> Key: KAFKA-3499
> URL: https://issues.apache.org/jira/browse/KAFKA-3499
> Project: Kafka
>  Issue Type: Sub-task
>  Components: kafka streams
>Reporter: josh gruenberg
> Fix For: 0.10.0.0
>
>
> On the JVM, Array.equals and Array.hashCode do not incorporate array 
> contents; they inherit Object.equals/hashCode. This implies that Collections 
> that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two 
> arrays with equal contents as distinct elements.
> Many of the Kafka Streams internal classes currently use generic HashMaps and 
> Sets to manage caches and invalidation status. For example, 
> RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the 
> Elements are constructed as RocksDBStore.
> Similarly, the MemoryLRUCache internally holds a 
> LinkedHashMap map, and a HashSet keys, and these end up holding 
> byte[] keys. Finally, user-code may attempt to use any of these provided 
> types with byte[], with undesirable results.
> Keys that are byte-arrays should be wrapped in a type that incorporates the 
> content in their computation of equals/hashCode. java.nio.ByteBuffer is one 
> such type that could be used, but a purpose-built immutable class would 
> likely be a better solution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member

2016-04-04 Thread Tommy Becker (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224928#comment-15224928
 ] 

Tommy Becker commented on KAFKA-3499:
-

This was/is a problem in Samza as well. Though many classes may have this 
problem, byte[] seems particularly common in this scenario. It's hard to 
imagine a situation where byte array keys are ever going to be what you want in 
a cache.

https://issues.apache.org/jira/browse/SAMZA-505

> byte[] should not be used as Map key nor Set member
> ---
>
> Key: KAFKA-3499
> URL: https://issues.apache.org/jira/browse/KAFKA-3499
> Project: Kafka
>  Issue Type: Bug
>  Components: kafka streams
>Reporter: josh gruenberg
> Fix For: 0.10.0.0
>
>
> On the JVM, Array.equals and Array.hashCode do not incorporate array 
> contents; they inherit Object.equals/hashCode. This implies that Collections 
> that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two 
> arrays with equal contents as distinct elements.
> Many of the Kafka Streams internal classes currently use generic HashMaps and 
> Sets to manage caches and invalidation status. For example, 
> RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the 
> Elements are constructed as RocksDBStore.
> Similarly, the MemoryLRUCache internally holds a 
> LinkedHashMap map, and a HashSet keys, and these end up holding 
> byte[] keys. Finally, user-code may attempt to use any of these provided 
> types with byte[], with undesirable results.
> Keys that are byte-arrays should be wrapped in a type that incorporates the 
> content in their computation of equals/hashCode. java.nio.ByteBuffer is one 
> such type that could be used, but a purpose-built immutable class would 
> likely be a better solution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member

2016-04-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224809#comment-15224809
 ] 

Guozhang Wang commented on KAFKA-3499:
--

[~ijuma] We realized this problem before in the unit tests, and hence used 
{code}RawStoreChangeLogger{code} which has a customized comparator for such 
cases. I thought it should be sufficient for case 2), i.e. our own use case 
here. But let me double check the source code again to make sure.

> byte[] should not be used as Map key nor Set member
> ---
>
> Key: KAFKA-3499
> URL: https://issues.apache.org/jira/browse/KAFKA-3499
> Project: Kafka
>  Issue Type: Bug
>  Components: kafka streams
>Reporter: josh gruenberg
> Fix For: 0.10.0.0
>
>
> On the JVM, Array.equals and Array.hashCode do not incorporate array 
> contents; they inherit Object.equals/hashCode. This implies that Collections 
> that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two 
> arrays with equal contents as distinct elements.
> Many of the Kafka Streams internal classes currently use generic HashMaps and 
> Sets to manage caches and invalidation status. For example, 
> RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the 
> Elements are constructed as RocksDBStore.
> Similarly, the MemoryLRUCache internally holds a 
> LinkedHashMap map, and a HashSet keys, and these end up holding 
> byte[] keys. Finally, user-code may attempt to use any of these provided 
> types with byte[], with undesirable results.
> Keys that are byte-arrays should be wrapped in a type that incorporates the 
> content in their computation of equals/hashCode. java.nio.ByteBuffer is one 
> such type that could be used, but a purpose-built immutable class would 
> likely be a better solution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member

2016-04-04 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224768#comment-15224768
 ] 

Jay Kreps commented on KAFKA-3499:
--

Yeah, gotcha, that would be totally broken.

> byte[] should not be used as Map key nor Set member
> ---
>
> Key: KAFKA-3499
> URL: https://issues.apache.org/jira/browse/KAFKA-3499
> Project: Kafka
>  Issue Type: Bug
>  Components: kafka streams
>Reporter: josh gruenberg
> Fix For: 0.10.0.0
>
>
> On the JVM, Array.equals and Array.hashCode do not incorporate array 
> contents; they inherit Object.equals/hashCode. This implies that Collections 
> that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two 
> arrays with equal contents as distinct elements.
> Many of the Kafka Streams internal classes currently use generic HashMaps and 
> Sets to manage caches and invalidation status. For example, 
> RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the 
> Elements are constructed as RocksDBStore.
> Similarly, the MemoryLRUCache internally holds a 
> LinkedHashMap map, and a HashSet keys, and these end up holding 
> byte[] keys. Finally, user-code may attempt to use any of these provided 
> types with byte[], with undesirable results.
> Keys that are byte-arrays should be wrapped in a type that incorporates the 
> content in their computation of equals/hashCode. java.nio.ByteBuffer is one 
> such type that could be used, but a purpose-built immutable class would 
> likely be a better solution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member

2016-04-04 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224758#comment-15224758
 ] 

Ismael Juma commented on KAFKA-3499:


[~jkreps], I think there are two separate issues:
1. Third-party code that misuses the API. I think documenting is fine for those 
cases.
2. Our own code doing this. We need to fix those cases.

I intrepreted the issue as saying that we have cases of 2, but I didn't check.

> byte[] should not be used as Map key nor Set member
> ---
>
> Key: KAFKA-3499
> URL: https://issues.apache.org/jira/browse/KAFKA-3499
> Project: Kafka
>  Issue Type: Bug
>  Components: kafka streams
>Reporter: josh gruenberg
> Fix For: 0.10.0.0
>
>
> On the JVM, Array.equals and Array.hashCode do not incorporate array 
> contents; they inherit Object.equals/hashCode. This implies that Collections 
> that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two 
> arrays with equal contents as distinct elements.
> Many of the Kafka Streams internal classes currently use generic HashMaps and 
> Sets to manage caches and invalidation status. For example, 
> RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the 
> Elements are constructed as RocksDBStore.
> Similarly, the MemoryLRUCache internally holds a 
> LinkedHashMap map, and a HashSet keys, and these end up holding 
> byte[] keys. Finally, user-code may attempt to use any of these provided 
> types with byte[], with undesirable results.
> Keys that are byte-arrays should be wrapped in a type that incorporates the 
> content in their computation of equals/hashCode. java.nio.ByteBuffer is one 
> such type that could be used, but a purpose-built immutable class would 
> likely be a better solution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member

2016-04-04 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224741#comment-15224741
 ] 

Jay Kreps commented on KAFKA-3499:
--

I think this problem is more general than byte[] since lots of things don't 
implement equals/hc. A couple of other options:
1. Just document it, this, after all, is how Java handles it.
2. Do a one-time check, something like

{code}
if(firstExecution) {
  checkOverridesEqualsAndHC(key);
  firstExecution = false;
}
{code}

> byte[] should not be used as Map key nor Set member
> ---
>
> Key: KAFKA-3499
> URL: https://issues.apache.org/jira/browse/KAFKA-3499
> Project: Kafka
>  Issue Type: Bug
>  Components: kafka streams
>Reporter: josh gruenberg
> Fix For: 0.10.0.0
>
>
> On the JVM, Array.equals and Array.hashCode do not incorporate array 
> contents; they inherit Object.equals/hashCode. This implies that Collections 
> that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two 
> arrays with equal contents as distinct elements.
> Many of the Kafka Streams internal classes currently use generic HashMaps and 
> Sets to manage caches and invalidation status. For example, 
> RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the 
> Elements are constructed as RocksDBStore.
> Similarly, the MemoryLRUCache internally holds a 
> LinkedHashMap map, and a HashSet keys, and these end up holding 
> byte[] keys. Finally, user-code may attempt to use any of these provided 
> types with byte[], with undesirable results.
> Keys that are byte-arrays should be wrapped in a type that incorporates the 
> content in their computation of equals/hashCode. java.nio.ByteBuffer is one 
> such type that could be used, but a purpose-built immutable class would 
> likely be a better solution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3499) byte[] should not be used as Map key nor Set member

2016-04-04 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224691#comment-15224691
 ] 

Ismael Juma commented on KAFKA-3499:


[~guozhang], this looks important to fix so I set the fix version to 0.10.0.0. 
Please change it if you disagree.

> byte[] should not be used as Map key nor Set member
> ---
>
> Key: KAFKA-3499
> URL: https://issues.apache.org/jira/browse/KAFKA-3499
> Project: Kafka
>  Issue Type: Bug
>  Components: kafka streams
>Reporter: josh gruenberg
> Fix For: 0.10.0.0
>
>
> On the JVM, Array.equals and Array.hashCode do not incorporate array 
> contents; they inherit Object.equals/hashCode. This implies that Collections 
> that rely upon equals/hashCode (eg, HashMap/HashSet and variants) treat two 
> arrays with equal contents as distinct elements.
> Many of the Kafka Streams internal classes currently use generic HashMaps and 
> Sets to manage caches and invalidation status. For example, 
> RocksDBStore.cacheDirtyKeys is a HashSet. Then, in RocksDBWindowStore, the 
> Elements are constructed as RocksDBStore.
> Similarly, the MemoryLRUCache internally holds a 
> LinkedHashMap map, and a HashSet keys, and these end up holding 
> byte[] keys. Finally, user-code may attempt to use any of these provided 
> types with byte[], with undesirable results.
> Keys that are byte-arrays should be wrapped in a type that incorporates the 
> content in their computation of equals/hashCode. java.nio.ByteBuffer is one 
> such type that could be used, but a purpose-built immutable class would 
> likely be a better solution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)