[ 
https://issues.apache.org/jira/browse/KAFKA-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859725#comment-17859725
 ] 

Ayoub Omari commented on KAFKA-14460:
-------------------------------------

[~ableegoldman] Is this ticket only about KeyValueStore? I see that for Window 
and Session stores, iterators work directly on the underlying segments.

> In-memory store iterators can return results with null values
> -------------------------------------------------------------
>
>                 Key: KAFKA-14460
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14460
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Assignee: Ayoub Omari
>            Priority: Major
>
> Due to the thread-safety model we adopted in our in-memory stores to avoid 
> scaling issues, we synchronize all read/write methods and then during range 
> scans, copy the keyset of all results rather than returning a direct iterator 
> over the underlying map. When users call #next to read out the iterator 
> results, we issue a point lookup on the next key and then simply return a new 
> KeyValue<>(key, get(key)).
> This lets the range scan return results without blocking access to the store 
> by other threads and without risk of a ConcurrentModificationException, as a 
> writer can modify the real store without affecting the iterator's keyset copy. This 
> also means that those changes won't be reflected in what the iterator sees or 
> returns, which in itself is fine as we don't guarantee consistency semantics 
> of any kind.
> However, we _do_ guarantee that range scans "must not return null values" – 
> and this contract may be violated if the StreamThread deletes a record that 
> the iterator was going to return.
> tl;dr: we should check get(key) for null and skip to the next result if 
> necessary in the in-memory store iterators. See for example 
> InMemoryKeyValueIterator (note that we'll probably need to buffer one record 
> in advance before we return true from #hasNext).
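
The fix proposed in the description could look roughly like the sketch below. It
is not the actual Kafka Streams code: SketchStore and NullSkippingIterator are
hypothetical names, and String keys and values are assumed for brevity. The
iterator walks a copy of the keyset, does a point lookup per key, and buffers
the next non-null result so that hasNext only returns true when next can
return a record.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical stand-in for an in-memory key-value store (not Kafka's class).
class SketchStore {
    private final TreeMap<String, String> map = new TreeMap<>();

    synchronized void put(String key, String value) { map.put(key, value); }
    synchronized void delete(String key) { map.remove(key); }
    synchronized String get(String key) { return map.get(key); }

    // Range scan: copy the matching keys under the lock, iterate over the copy.
    synchronized Iterator<Map.Entry<String, String>> range(String from, String to) {
        List<String> keys = new ArrayList<>(map.subMap(from, true, to, true).keySet());
        return new NullSkippingIterator(this, keys.iterator());
    }
}

// Hypothetical iterator that never hands out a null value.
class NullSkippingIterator implements Iterator<Map.Entry<String, String>> {
    private final SketchStore store;
    private final Iterator<String> keys;
    private Map.Entry<String, String> buffered; // next non-null result, if any

    NullSkippingIterator(SketchStore store, Iterator<String> keys) {
        this.store = store;
        this.keys = keys;
    }

    @Override
    public boolean hasNext() {
        // Advance until a key that still has a value is found, or keys run out.
        while (buffered == null && keys.hasNext()) {
            String key = keys.next();
            String value = store.get(key); // point lookup; null if deleted meanwhile
            if (value != null) {
                buffered = new SimpleImmutableEntry<>(key, value);
            }
        }
        return buffered != null;
    }

    @Override
    public Map.Entry<String, String> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Map.Entry<String, String> result = buffered;
        buffered = null;
        return result;
    }
}
```

With this shape, a key deleted after the scan started is silently skipped, and a
record that was already buffered by hasNext is still returned with the value it
had at lookup time, which matches the "no consistency guarantees" note above.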



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
