[jira] [Commented] (KAFKA-14460) In-memory store iterators can return results with null values

2024-06-24 Thread Ayoub Omari (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859725#comment-17859725
 ] 

Ayoub Omari commented on KAFKA-14460:
-

[~ableegoldman] Is this ticket only about KeyValueStore? I see that for Window 
and Session stores, the iterators work directly on the underlying segments.

> In-memory store iterators can return results with null values
> -
>
> Key: KAFKA-14460
> URL: https://issues.apache.org/jira/browse/KAFKA-14460
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Reporter: A. Sophie Blee-Goldman
> Assignee: Ayoub Omari
> Priority: Major
>
> Due to the thread-safety model we adopted in our in-memory stores to avoid 
> scaling issues, we synchronize all read/write methods and then, during range 
> scans, copy the keyset of all results rather than returning a direct iterator 
> over the underlying map. When users call #next to read out the iterator 
> results, we issue a point lookup on the next key and then simply return a new 
> KeyValue<>(key, get(key)).
>
> This lets the range scan return results without blocking access to the store 
> by other threads and without risk of a ConcurrentModificationException, as a 
> writer can modify the real store without affecting the iterator's copy of the 
> keyset. It also means that those changes won't be reflected in what the 
> iterator sees or returns, which in itself is fine as we don't guarantee 
> consistency semantics of any kind.
>
> However, we _do_ guarantee that range scans "must not return null values", 
> and this contract may be violated if the StreamThread deletes a record that 
> the iterator was going to return.
>
> tl;dr: we should check get(key) for null and skip to the next result if 
> necessary in the in-memory store iterators. See for example 
> InMemoryKeyValueIterator (note that we'll probably need to buffer one record 
> in advance before we return true from #hasNext).
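
The fix proposed in the description can be sketched roughly as follows. This is only an illustration in plain Java, not the actual Kafka Streams code: the class names NullSkippingIterator and NullSkippingDemo are hypothetical, and a plain Map stands in for the in-memory store. The iterator keeps a copy of the keyset, looks each key up at read time, and buffers one non-null record ahead so that hasNext() only returns true when next() has something to return.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical stand-in for an in-memory store iterator (not Kafka's class).
final class NullSkippingIterator<K, V> implements Iterator<Map.Entry<K, V>> {
    private final Iterator<K> keys;   // iterates over a copy of the keyset
    private final Map<K, V> store;    // live store; other threads may modify it
    private Map.Entry<K, V> buffered; // one record read ahead, or null if none

    NullSkippingIterator(final Collection<K> keySnapshot, final Map<K, V> store) {
        this.keys = new ArrayList<>(keySnapshot).iterator();
        this.store = store;
    }

    @Override
    public boolean hasNext() {
        // Advance until a key that still has a value is found, skipping
        // keys whose records were deleted after the keyset was copied.
        while (buffered == null && keys.hasNext()) {
            final K key = keys.next();
            final V value = store.get(key);
            if (value != null) {
                buffered = new AbstractMap.SimpleImmutableEntry<>(key, value);
            }
        }
        return buffered != null;
    }

    @Override
    public Map.Entry<K, V> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        final Map.Entry<K, V> result = buffered;
        buffered = null;
        return result;
    }
}

final class NullSkippingDemo {
    // Deletes "b" after the iterator was created and returns the keys it yields.
    static List<String> run() {
        final Map<String, String> store = new ConcurrentSkipListMap<>();
        store.put("a", "1");
        store.put("b", "2");
        store.put("c", "3");
        final Iterator<Map.Entry<String, String>> it =
            new NullSkippingIterator<>(store.keySet(), store);
        store.remove("b"); // simulates a concurrent delete by another thread
        final List<String> seen = new ArrayList<>();
        while (it.hasNext()) {
            seen.add(it.next().getKey());
        }
        return seen;
    }

    public static void main(final String[] args) {
        System.out.println(run());
    }
}
```

One consequence of buffering: a record deleted between hasNext() and next() is still returned with the value read during hasNext(). That matches the description's point that no consistency semantics are guaranteed, while keeping the "no null values" contract.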



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14460) In-memory store iterators can return results with null values

2024-06-20 Thread Lucia Cerchie (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856496#comment-17856496
 ] 

Lucia Cerchie commented on KAFKA-14460:
---

[~ayoubomari] Sure! Thank you.

> In-memory store iterators can return results with null values
> -
>
> Key: KAFKA-14460
> URL: https://issues.apache.org/jira/browse/KAFKA-14460
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Reporter: A. Sophie Blee-Goldman
> Priority: Major





[jira] [Commented] (KAFKA-14460) In-memory store iterators can return results with null values

2024-06-19 Thread Ayoub Omari (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856330#comment-17856330
 ] 

Ayoub Omari commented on KAFKA-14460:
-

Hi [~Cerchie], can I pick this up?

> In-memory store iterators can return results with null values
> -
>
> Key: KAFKA-14460
> URL: https://issues.apache.org/jira/browse/KAFKA-14460
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Reporter: A. Sophie Blee-Goldman
> Assignee: Lucia Cerchie
> Priority: Major


