A. Sophie Blee-Goldman created KAFKA-14460:
----------------------------------------------
Summary: In-memory store iterators can return results with null values
Key: KAFKA-14460
URL: https://issues.apache.org/jira/browse/KAFKA-14460
Project: Kafka
Issue Type: Bug
Components: streams
Reporter: A. Sophie Blee-Goldman

Due to the thread-safety model we adopted in our in-memory stores to avoid scaling issues, we synchronize all read/write methods and, during range scans, copy the keyset of all results rather than returning a direct iterator over the underlying map. When users call #next to read out the iterator results, we issue a point lookup on the next key and simply return a new KeyValue<>(key, get(key)).

This lets the range scan return results without blocking access to the store by other threads and without risk of a ConcurrentModificationException, since a writer can modify the real store without affecting the iterator's keyset copy. It also means that those changes won't be reflected in what the iterator sees or returns, which in itself is fine as we don't guarantee consistency semantics of any kind.

However, we _do_ guarantee that range scans "must not return null values", and this contract may be violated if the StreamThread deletes a record that the iterator was going to return.

tl;dr we should check get(key) for null and skip to the next result if necessary in the in-memory store iterators. See for example InMemoryKeyValueIterator (note that we'll probably need to buffer one record in advance before we return true from #hasNext).
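
To make the proposed fix concrete, here is a minimal sketch of a keyset-copy iterator that buffers one record ahead and skips keys whose value has since been deleted. It is not the actual Kafka Streams code: SketchStore and NullSkippingIterator are hypothetical names, String keys and values stand in for the real types, and Map.Entry is used in place of KeyValue so the example compiles on its own.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical stand-in for an in-memory store with synchronized access.
class SketchStore {
    private final TreeMap<String, String> map = new TreeMap<>();

    synchronized void put(String key, String value) { map.put(key, value); }
    synchronized String get(String key) { return map.get(key); }
    synchronized void delete(String key) { map.remove(key); }

    // Copies the matching keys while holding the lock, so later writes
    // cannot cause a ConcurrentModificationException in the iterator.
    synchronized Iterator<Map.Entry<String, String>> range(String from, String to) {
        List<String> keys = new ArrayList<>(map.subMap(from, true, to, true).keySet());
        return new NullSkippingIterator(keys.iterator(), this);
    }
}

// Hypothetical iterator: looks up each copied key and skips deleted ones.
class NullSkippingIterator implements Iterator<Map.Entry<String, String>> {
    private final Iterator<String> keys;
    private final SketchStore store;
    // One record buffered in advance; null means nothing is buffered yet.
    private Map.Entry<String, String> buffered;

    NullSkippingIterator(Iterator<String> keys, SketchStore store) {
        this.keys = keys;
        this.store = store;
    }

    @Override
    public boolean hasNext() {
        // Advance until a key with a live value is found or the keys run out.
        while (buffered == null && keys.hasNext()) {
            String key = keys.next();
            String value = store.get(key); // point lookup, may be null after a delete
            if (value != null) {
                buffered = new AbstractMap.SimpleImmutableEntry<>(key, value);
            }
        }
        return buffered != null;
    }

    @Override
    public Map.Entry<String, String> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Map.Entry<String, String> result = buffered;
        buffered = null;
        return result;
    }
}
```

With this sketch, if a store holds keys "a", "b" and "c", and "b" is deleted after range("a", "c") is called but before iteration, the iterator yields only "a" and "c". Buffering in #hasNext is what keeps it from returning true for a trailing key that has already been removed.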