[ https://issues.apache.org/jira/browse/KAFKA-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Rao reassigned KAFKA-14460:
---------------------------------

    Assignee:     (was: Sagar Rao)

> In-memory store iterators can return results with null values
> -------------------------------------------------------------
>
>                 Key: KAFKA-14460
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14460
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Priority: Major
>
> Due to the thread-safety model we adopted in our in-memory stores to avoid
> scaling issues, we synchronize all read/write methods and then during range
> scans, copy the keyset of all results rather than returning a direct iterator
> over the underlying map. When users call #next to read out the iterator
> results, we issue a point lookup on the next key and then simply return a new
> KeyValue<>(key, get(key))
>
> This lets the range scan return results without blocking access to the store
> by other threads and without risk of ConcurrentModification, as a writer can
> modify the real store without affecting the keyset copy of the iterator. This
> also means that those changes won't be reflected in what the iterator sees or
> returns, which in itself is fine as we don't guarantee consistency semantics
> of any kind.
>
> However, we _do_ guarantee that range scans "must not return null values" –
> and this contract may be violated if the StreamThread deletes a record that
> the iterator was going to return.
>
> tl;dr we should check get(key) for null and skip to the next result if
> necessary in the in-memory store iterators. See for example
> InMemoryKeyValueIterator (note that we'll probably need to buffer one record
> in advance before we return true from #hasNext)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
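
For illustration, here is a minimal sketch of the fix proposed in the tl;dr: an iterator over a copied keyset that does the point lookup in hasNext, skips keys whose value has since been deleted, and buffers one record so that hasNext never returns true for a null result. This is not the actual Kafka Streams code. The names Store and SkippingIterator are hypothetical stand-ins, and String keys and values are assumed to keep it short.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

final class NullSkippingIteratorSketch {

    // Hypothetical stand-in for a synchronized in-memory store.
    static final class Store {
        private final TreeMap<String, String> map = new TreeMap<>();

        synchronized void put(String key, String value) { map.put(key, value); }
        synchronized void delete(String key) { map.remove(key); }
        synchronized String get(String key) { return map.get(key); }

        // Copies the keys in [from, to] so the iterator never touches the live map.
        synchronized Iterator<Map.Entry<String, String>> range(String from, String to) {
            List<String> keys = new ArrayList<>(map.subMap(from, true, to, true).keySet());
            return new SkippingIterator(keys.iterator(), this);
        }
    }

    // Hypothetical iterator: looks up each copied key and skips deleted ones.
    static final class SkippingIterator implements Iterator<Map.Entry<String, String>> {
        private final Iterator<String> keys;
        private final Store store;
        private Map.Entry<String, String> buffered; // next non-null record, if any

        SkippingIterator(Iterator<String> keys, Store store) {
            this.keys = keys;
            this.store = store;
        }

        @Override
        public boolean hasNext() {
            // Advance until a key still has a value, or the keyset copy is exhausted.
            while (buffered == null && keys.hasNext()) {
                String key = keys.next();
                String value = store.get(key);
                if (value != null) {
                    buffered = new AbstractMap.SimpleImmutableEntry<>(key, value);
                }
            }
            return buffered != null;
        }

        @Override
        public Map.Entry<String, String> next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            Map.Entry<String, String> result = buffered;
            buffered = null;
            return result;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.put("a", "1");
        store.put("b", "2");
        store.put("c", "3");

        Iterator<Map.Entry<String, String>> it = store.range("a", "c");
        store.delete("b"); // deleted after the keyset was copied

        while (it.hasNext()) {
            Map.Entry<String, String> entry = it.next();
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

In this sketch a record deleted after it has been buffered is still returned with the value seen at lookup time. That matches the description above: no consistency guarantee, but never a null value.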