[
https://issues.apache.org/jira/browse/SAMZA-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274295#comment-14274295
]
Jay Kreps commented on SAMZA-505:
---------------------------------
I think that, since there is no way to implement a class-agnostic equality
check, the best you can do is special-case byte[] or arrays in general. This is a
common special case so it may be worth doing, but I think the problem will
continue to exist for any user-defined class that doesn't implement or inherit
a reasonable equals/hashCode.
An alternative would be to cache the serialized form. This would fix several
problems including the need to flush to implement range scan (I think), but it
kind of kills a lot of the point of the cache in the first place since the
RocksDB data is cached too...
Another take would just be to say: this sucks but it is no worse than HashMap
which Java programmers are pretty familiar with. So rather than trying to fix
special cases just make the warning in the docs bigger and bolder.
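To make the HashMap comparison concrete, here is a minimal Java sketch of the behavior in question. The class and method names (ArrayKeyDemo, lookupWithRawArray, lookupWithByteBuffer) are made up for illustration and are not part of Samza. It assumes nothing beyond the JDK: a byte[] key is matched by identity, while the same bytes wrapped with ByteBuffer.wrap are matched by content.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of Samza.
public class ArrayKeyDemo {
    // Stores a value under a byte[] key, then looks it up with a
    // different array instance holding the same bytes.
    static Integer lookupWithRawArray() {
        Map<byte[], Integer> map = new HashMap<>();
        map.put("a".getBytes(StandardCharsets.UTF_8), 1);
        // byte[] inherits Object.equals, so this new instance never matches.
        return map.get("a".getBytes(StandardCharsets.UTF_8));
    }

    // Same experiment, but each array is wrapped in a ByteBuffer, whose
    // equals/hashCode are computed from the remaining bytes.
    static Integer lookupWithByteBuffer() {
        Map<ByteBuffer, Integer> map = new HashMap<>();
        map.put(ByteBuffer.wrap("a".getBytes(StandardCharsets.UTF_8)), 1);
        return map.get(ByteBuffer.wrap("a".getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        System.out.println(lookupWithRawArray());   // null
        System.out.println(lookupWithByteBuffer()); // 1
    }
}
```

The same miss would happen for any user-defined key class that inherits Object's equals/hashCode, which is why special-casing arrays only covers part of the problem.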
> CachedStore doesn't support Array keys well
> -------------------------------------------
>
> Key: SAMZA-505
> URL: https://issues.apache.org/jira/browse/SAMZA-505
> Project: Samza
> Issue Type: Bug
> Components: kv
> Affects Versions: 0.8.0
> Reporter: Chris Riccomini
> Fix For: 0.9.0
>
>
> Several people have hit an issue when using the Key/Value store with byte[]
> keys. Since CachedStore uses a HashMap, and arrays inherit equals/hashCode
> from Object (which are based on object identity, not contents), the HashMap
> behaves unexpectedly. This isn't really a bug, just a common misunderstanding
> of how things work. It's compounded by the fact that we default caches to
> "on". This yields the behavior:
> {code}
> store.put("a".getBytes, 1)
> store.get("a".getBytes) // returns null
> {code}
> See [this
> discussion|http://stackoverflow.com/questions/1058149/using-a-byte-array-as-hashmap-key-java]
> for details.
> Our TestKeyValueStore uses byte[] keys, but it keeps them in a list, and
> re-uses the same exact instance, so we don't hit this problem.
> I think we should wrap array keys in ByteBuffer, or use our own wrapper.
> We'll have to make sure to unwrap before calling the put/get/delete
> operations on the underlying store.
> Initially, I was thinking that the safest thing to do would be to have
> CachedStore check all keys, and throw an exception. This would allow
> individuals to choose the best course of action (ByteBuffer.wrap, use an
> alternative key, write a custom wrapper class, etc). But, I think this
> approach doesn't work in some cases. If there's a cache with a JSON serde,
> and the user is using a key of type Array[Int], that key is perfectly
> valid. A JSON serde would just serialize it as [1,2,3], and everything should
> work in this case.
> Since this problem is basically an implementation detail introduced by
> CachedStore, I think it should be fixed internally by wrapping/unwrapping
> array keys.
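
The wrapping/unwrapping idea described in the issue could look roughly like the Java sketch below. Everything here is hypothetical: WrappingCache, ArrayKey, and the BiConsumer standing in for the underlying store are illustrative names, not Samza's actual CachedStore API. The sketch handles only byte[] keys and uses a private wrapper class rather than ByteBuffer, so that a user-supplied ByteBuffer key is never unwrapped by mistake.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch of wrap/unwrap around a cache; not Samza's CachedStore.
public class WrappingCache<V> {
    // Private wrapper that gives byte[] content-based equals/hashCode.
    private static final class ArrayKey {
        final byte[] bytes;
        ArrayKey(byte[] bytes) { this.bytes = bytes; }
        @Override public boolean equals(Object o) {
            return o instanceof ArrayKey && Arrays.equals(bytes, ((ArrayKey) o).bytes);
        }
        @Override public int hashCode() { return Arrays.hashCode(bytes); }
    }

    private final Map<Object, V> cache = new HashMap<>();
    // Stand-in for the underlying store's put operation (an assumption).
    private final BiConsumer<Object, V> underlyingPut;

    public WrappingCache(BiConsumer<Object, V> underlyingPut) {
        this.underlyingPut = underlyingPut;
    }

    // Wrap byte[] keys; every other key type passes through unchanged.
    private static Object wrap(Object key) {
        return (key instanceof byte[]) ? new ArrayKey((byte[]) key) : key;
    }

    // Recover the caller's original key before touching the underlying store.
    private static Object unwrap(Object key) {
        return (key instanceof ArrayKey) ? ((ArrayKey) key).bytes : key;
    }

    public void put(Object key, V value) { cache.put(wrap(key), value); }

    public V get(Object key) { return cache.get(wrap(key)); }

    // Write cached entries to the underlying store with unwrapped keys.
    public void flush() {
        for (Map.Entry<Object, V> e : cache.entrySet()) {
            underlyingPut.accept(unwrap(e.getKey()), e.getValue());
        }
    }
}
```

Note that the wrapper keeps a reference to the caller's array rather than copying it, so mutating the array after a put would still corrupt the lookup; a real implementation would have to decide whether a defensive copy is worth the cost.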