[
https://issues.apache.org/jira/browse/KAFKA-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063945#comment-16063945
]
Guozhang Wang commented on KAFKA-4750:
--------------------------------------
[~mjsax] [~evis] [~mihbor] Thanks for your comments. Before reviewing [~evis]'s
patch, though, I would like to think a bit more about the general resolution
for this case:
1. In Kafka messages, "null" byte arrays indicate tombstones. Note that this
means that if a user's serde serializes any object into null for a
log-compacted topic (e.g. the changelog topic of a state store), the record is
meant to be deleted from the store.
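To make point 1 concrete, here is a toy model of a compacted changelog. It is a sketch under simplifying assumptions: {{CompactionSketch}}, {{Record}} and the serializer that maps the empty string to a null byte array are hypothetical, and real compaction also keeps tombstones for a configurable period before dropping them. It only shows that a serde returning null bytes for a non-null object effectively writes a delete.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a compacted changelog (not Kafka code): a null value byte array is a tombstone.
class CompactionSketch {
    // One changelog record; value == null marks a tombstone.
    static final class Record {
        final String key;
        final byte[] value;

        Record(String key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    // Hypothetical user serializer that maps the non-null object "" to a null byte array.
    static byte[] serialize(String s) {
        if (s == null || s.isEmpty()) {
            return null;
        }
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Simplified compaction: keep the latest value per key and drop keys
    // whose latest value is a tombstone.
    static Map<String, byte[]> compact(List<Record> log) {
        Map<String, byte[]> latest = new LinkedHashMap<>();
        for (Record r : log) {
            if (r.value == null) {
                latest.remove(r.key);
            } else {
                latest.put(r.key, r.value);
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Record> log = new ArrayList<>();
        log.add(new Record("a", serialize("hello")));
        log.add(new Record("b", serialize("world")));
        log.add(new Record("a", serialize(""))); // serde yields null bytes, so this acts as a delete
        System.out.println(compact(log).keySet()); // prints [b]
    }
}
```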
2. In Kafka Streams state stores, we do NOT specify in the javadoc that "null"
indicates deletion:
{code}
/**
* Update the value associated with this key
*
* @param key The key to associate the value to
* @param value The value, it can be null.
* @throws NullPointerException If null is used for key.
*/
void put(K key, V value);
{code}
However, our implementation does treat a value-typed "null" (note that this is
not a "null" byte array as in serialized messages) as a deletion, since we
implement {{delete(key)}} as {{put(key, null)}}. As Evgeny / Michal mentioned,
it would be more intuitive if our {{put}} semantics aligned with Java's map
operations:
{code}
... // store initialized as empty
store.get(key);       // returns null
store.put(key, value);
store.delete(key);
store.get(key);       // returns null
store.put(key, value);
store.put(key, null); // we can interpret this as "associate the key with null",
                      // or simply as "delete this key"
store.get(key);       // returns null; in general this could mean either that the
                      // key is associated with null or that the key does not exist
{code}
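For comparison, the sketch below contrasts {{java.util.HashMap}}, where {{put(key, null)}} keeps the key mapped to null, with a hypothetical {{DeleteAsPutStore}} that mirrors the current behaviour of implementing {{delete(key)}} as {{put(key, null)}}. The store class and its {{contains}} helper are illustrations, not the Kafka Streams implementation; {{get}} returns null in both cases, and only a membership check tells them apart.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store mirroring "delete(key) is put(key, null)"; not the Kafka Streams class.
class DeleteAsPutStore<K, V> {
    private final Map<K, V> inner = new HashMap<>();

    void put(K key, V value) {
        if (key == null) {
            throw new NullPointerException("null key");
        }
        if (value == null) {
            inner.remove(key); // a null value is treated as a deletion
        } else {
            inner.put(key, value);
        }
    }

    void delete(K key) {
        put(key, null);
    }

    V get(K key) {
        return inner.get(key);
    }

    boolean contains(K key) {
        return inner.containsKey(key);
    }
}

class MapSemanticsDemo {
    public static void main(String[] args) {
        // java.util.HashMap: put(key, null) keeps the key, now mapped to null.
        Map<String, String> map = new HashMap<>();
        map.put("k", "v");
        map.put("k", null);
        System.out.println(map.get("k") + " " + map.containsKey("k")); // prints "null true"

        // Delete-as-put store: put(key, null) removes the key.
        DeleteAsPutStore<String, String> store = new DeleteAsPutStore<>();
        store.put("k", "v");
        store.put("k", null);
        System.out.println(store.get("k") + " " + store.contains("k")); // prints "null false"
    }
}
```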
Now assume you have a customized serde that maps a "null" object to a non-null
byte array; in this case the above still holds:
{code}
store.put(key, value);
store.put(key, null); // now the "null" object is just a special value that does
                      // not indicate deletion
store.get(key);       // returns null, but this should be interpreted as "the key
                      // is associated with null"
{code}
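One way such a serde could look, as a hedged sketch: the hypothetical {{NullTaggingSerde}} prefixes every payload with a one-byte tag, so a null object is encoded as the non-null array {{[0]}} and is therefore never a tombstone on the changelog. The tag layout is an assumption made purely for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical serde: a leading tag byte distinguishes a null object from a real string,
// so serialize(null) is never a null byte array (and so never a tombstone).
class NullTaggingSerde {
    private static final byte NULL_TAG = 0;
    private static final byte VALUE_TAG = 1;

    static byte[] serialize(String value) {
        if (value == null) {
            return new byte[] {NULL_TAG};
        }
        byte[] body = value.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 1];
        out[0] = VALUE_TAG;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    static String deserialize(byte[] bytes) {
        if (bytes == null) {
            return null; // no stored bytes at all
        }
        if (bytes.length == 0) {
            throw new IllegalArgumentException("missing tag byte");
        }
        if (bytes[0] == NULL_TAG) {
            return null; // key explicitly associated with null
        }
        return new String(bytes, 1, bytes.length - 1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] encodedNull = serialize(null);
        System.out.println(Arrays.toString(encodedNull)); // prints [0]
        System.out.println(deserialize(encodedNull));     // prints null
        System.out.println(deserialize(serialize("v")));  // prints v
    }
}
```

With this serde, a store that deletes only on null byte arrays keeps the entry after {{put(key, null)}}, and {{get}} returns null meaning "associated with null".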
Now assume you have a customized serde that maps a non-null object to a "null"
byte array; in this case that non-null object is effectively a dummy value, and
the above still holds:
{code}
store.put(key, value);
store.put(key, MY_DUMMY); // serialized into a "null" byte array
store.get(key);           // returns MY_DUMMY, since the "null" byte array is
                          // deserialized symmetrically
{code}
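A sketch of this third case, assuming a string serde with a hypothetical sentinel {{MY_DUMMY}} and a toy byte-level store ({{DummyBackedStore}}, also hypothetical) in which a null byte array removes the entry, mirroring tombstone semantics. Because the deserializer maps null bytes back to {{MY_DUMMY}}, {{get}} returns the sentinel even though nothing is physically stored; in this toy model a never-written key would read back as {{MY_DUMMY}} too.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical serde: the non-null sentinel MY_DUMMY maps to a null byte array and back.
class DummySerde {
    static final String MY_DUMMY = "<dummy>"; // assumed sentinel value

    static byte[] serialize(String value) {
        if (MY_DUMMY.equals(value)) {
            return null; // sentinel becomes a "null" byte array
        }
        return value.getBytes(StandardCharsets.UTF_8); // a null object is not supported (NPE)
    }

    static String deserialize(byte[] bytes) {
        if (bytes == null) {
            return MY_DUMMY; // symmetric: null bytes map back to the sentinel
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

// Toy byte-level store: a null byte array removes the entry, like a tombstone.
class DummyBackedStore {
    private final Map<String, byte[]> bytes = new HashMap<>();

    void put(String key, String value) {
        byte[] encoded = DummySerde.serialize(value);
        if (encoded == null) {
            bytes.remove(key);
        } else {
            bytes.put(key, encoded);
        }
    }

    String get(String key) {
        return DummySerde.deserialize(bytes.get(key));
    }

    boolean containsBytes(String key) {
        return bytes.containsKey(key);
    }

    public static void main(String[] args) {
        DummyBackedStore store = new DummyBackedStore();
        store.put("k", "value");
        store.put("k", DummySerde.MY_DUMMY);
        System.out.println(store.get("k"));           // prints <dummy>
        System.out.println(store.containsBytes("k")); // prints false
    }
}
```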
So I think that if we want to allow the customized interpretations above, we
should not implement {{delete()}} as {{put(key, null)}}, since "null" objects
may not indicate deletions; if we want to be more restrictive, we should state
this explicitly in the javadoc above, e.g. "@param value The value; it can be
null, which indicates deletion of the key".
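The first option could look roughly like the hypothetical {{DecoupledDeleteStore}} below: {{delete()}} removes the entry directly, while {{put(key, null)}} stores whatever the serde produces for null, and only a null byte array is treated as a tombstone. The class, its tag-based serde, and the {{contains}} helper are assumptions for illustration, not a proposed patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical store for the first option: delete() no longer delegates to put(key, null).
class DecoupledDeleteStore {
    private final Map<String, byte[]> bytes = new HashMap<>();

    // Assumed serde: tag 0 for a null object, tag 1 followed by UTF-8 for a string.
    static byte[] serialize(String value) {
        if (value == null) {
            return new byte[] {0};
        }
        byte[] body = value.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 1];
        out[0] = 1;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    static String deserialize(byte[] b) {
        return b[0] == 0 ? null : new String(b, 1, b.length - 1, StandardCharsets.UTF_8);
    }

    void put(String key, String value) {
        byte[] encoded = serialize(value);
        if (encoded == null) {
            bytes.remove(key); // only a null byte array is a tombstone (this serde never produces one)
        } else {
            bytes.put(key, encoded); // put(key, null) stores the encoded null
        }
    }

    void delete(String key) {
        bytes.remove(key); // removes the entry directly
    }

    String get(String key) {
        byte[] b = bytes.get(key);
        return b == null ? null : deserialize(b);
    }

    boolean contains(String key) {
        return bytes.containsKey(key);
    }

    public static void main(String[] args) {
        DecoupledDeleteStore store = new DecoupledDeleteStore();
        store.put("k", null);
        System.out.println(store.get("k") + " " + store.contains("k")); // prints "null true"
        store.delete("k");
        System.out.println(store.get("k") + " " + store.contains("k")); // prints "null false"
    }
}
```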
WDYT?
> KeyValueIterator returns null values
> ------------------------------------
>
> Key: KAFKA-4750
> URL: https://issues.apache.org/jira/browse/KAFKA-4750
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 0.10.1.1, 0.11.0.0, 0.10.2.1
> Reporter: Michal Borowiecki
> Assignee: Evgeny Veretennikov
> Labels: newbie
> Attachments: DeleteTest.java
>
>
> The API for the ReadOnlyKeyValueStore.range method promises that the returned
> iterator will not return null values. However, after upgrading from 0.10.0.0
> to 0.10.1.1 we found that null values are returned, causing NPEs on our side.
> I found this happens after removing entries from the store, and it resembles
> the SAMZA-94 defect. The problem seems to be the same as there: when deleting
> entries with a serializer that does not return null when null is passed in,
> the state store doesn't actually delete the key/value pair, and the iterator
> returns a null value for that key.
> When I modified our serializer to return null when null is passed in, the
> problem went away. However, I believe this should be fixed in Kafka Streams,
> perhaps with an approach similar to SAMZA-94's.
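The reported behaviour can be reproduced in a small toy model, assuming a store that implements {{delete(key)}} as {{put(key, null)}} and a hypothetical serializer that encodes null as an empty (non-null) byte array. {{NullValueRangeRepro}} and {{rangeSkippingNulls}} are illustrative names; the null-skipping iterator is just one possible defensive approach and is not claimed to be how SAMZA-94 was fixed.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy reproduction (not Kafka Streams code) of null values showing up in range().
class NullValueRangeRepro {
    private final TreeMap<String, byte[]> bytes = new TreeMap<>();

    // Hypothetical serializer that does NOT return null for null (it also conflates "" with null).
    static byte[] serialize(String v) {
        return v == null ? new byte[0] : v.getBytes(StandardCharsets.UTF_8);
    }

    static String deserialize(byte[] b) {
        return b.length == 0 ? null : new String(b, StandardCharsets.UTF_8);
    }

    void put(String key, String value) {
        byte[] b = serialize(value);
        if (b == null) {
            bytes.remove(key); // only null bytes remove the entry
        } else {
            bytes.put(key, b);
        }
    }

    void delete(String key) {
        put(key, null); // delete implemented as put(key, null)
    }

    // Range over [from, to], deserializing each value; may yield null values.
    List<Map.Entry<String, String>> range(String from, String to) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<String, byte[]> e : bytes.subMap(from, true, to, true).entrySet()) {
            out.add(new SimpleEntry<>(e.getKey(), deserialize(e.getValue())));
        }
        return out;
    }

    // One possible defensive approach (an assumption): skip entries whose value deserializes to null.
    List<Map.Entry<String, String>> rangeSkippingNulls(String from, String to) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<String, String> e : range(from, to)) {
            if (e.getValue() != null) {
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NullValueRangeRepro store = new NullValueRangeRepro();
        store.put("a", "1");
        store.put("b", "2");
        store.delete("a");
        System.out.println(store.range("a", "z"));              // prints [a=null, b=2]
        System.out.println(store.rangeSkippingNulls("a", "z")); // prints [b=2]
    }
}
```

Skipping nulls in the iterator hides the symptom, but it conflicts with the "key associated with null" interpretation discussed above; in this model, having {{delete()}} remove the stored bytes directly would avoid the stale entry altogether.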