[jira] [Commented] (KAFKA-9356) Potential data loss in InMemoryWindowStore and InMemorySessionStore
[ https://issues.apache.org/jira/browse/KAFKA-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17019177#comment-17019177 ]

Guozhang Wang commented on KAFKA-9356:
--------------------------------------

Thanks [~ableegoldman], I cannot find the exact ticket so I'm just gonna create a new one: https://issues.apache.org/jira/browse/KAFKA-9455

> Potential data loss in InMemoryWindowStore and InMemorySessionStore
> -------------------------------------------------------------------
>
>                 Key: KAFKA-9356
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9356
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Roman Leventov
>            Priority: Major
>
> {{InMemoryWindowStore.put()}} and {{InMemorySessionStore.put()}} call
> {{computeIfAbsent()}} method on {{ConcurrentSkipListMap}} objects which opens
> up possibility for data loss because
> {{ConcurrentSkipListMap.computeIfAbsent()}} is not an atomic operation.
> Possible fix: replace {{ConcurrentSkipListMaps with synchronized[Sorted,
> Navigable]Map(new TreeMap<>())}}.
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (KAFKA-9356) Potential data loss in InMemoryWindowStore and InMemorySessionStore
[ https://issues.apache.org/jira/browse/KAFKA-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017748#comment-17017748 ]

Sophie Blee-Goldman commented on KAFKA-9356:
--------------------------------------------

As [~bchen225242] said, Streams is single writer so this should not cause any problems. I'm going to close this bug as "Not a Problem".

It's worth noting that it might be a good idea to switch to TreeMap for different reasons. Right now the ConcurrentSkipListMap allows us to safely perform range queries without copying over the entire keyset, but the performance on point queries seems to scale noticeably worse with the number of unique keys. Point queries are used by aggregations while range queries are used by windowed joins, but of course both are available within the PAPI and for interactive queries, so it's hard to say which we should prefer.

Maybe rather than make that tradeoff we should have one version for efficient range queries (a "JoinWindowStore") and one for efficient point queries (an "AggWindowStore"), or something. I know we've had similar thoughts for a different RocksDB store layout for Joins (although I can't find that ticket anywhere). It seems like the in-memory stores could benefit from a special "Join" version as well.

cc [~guozhang]
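For readers following the range-query point above, here is a minimal sketch of why a ConcurrentSkipListMap can serve a range scan without copying the keyset while a plain TreeMap cannot. It is not Kafka code: the class RangeQuerySketch and the method rangeScanSurvivesWrite are hypothetical names, and the "writer" is simulated by a put() issued in the middle of the scan on the same thread so that the outcome is deterministic.

```java
import java.util.ConcurrentModificationException;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class RangeQuerySketch {

    // Hypothetical helper: scans the range [from, to] over a live sub-map view
    // (no copy of the keys) and inserts a new key into the backing map on every
    // step, standing in for a write that lands during a range query.
    // Returns true if the scan completed, false if the iterator failed fast.
    static boolean rangeScanSurvivesWrite(NavigableMap<Long, String> store, long from, long to) {
        for (long ts = 0; ts < 10; ts++) {
            store.put(ts, "v" + ts);
        }
        try {
            for (Map.Entry<Long, String> e : store.subMap(from, true, to, true).entrySet()) {
                // Structural modification of the backing map while the view is being read.
                store.put(100L + e.getKey(), "late");
            }
            return true;
        } catch (ConcurrentModificationException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Weakly consistent iterators: the scan tolerates the interleaved writes.
        System.out.println("ConcurrentSkipListMap survives: "
            + rangeScanSurvivesWrite(new ConcurrentSkipListMap<>(), 2, 5));
        // Fail-fast iterators: the scan aborts, so a reader would need a copy or a lock.
        System.out.println("TreeMap survives: "
            + rangeScanSurvivesWrite(new TreeMap<>(), 2, 5));
    }
}
```

The sketch only illustrates the iteration semantics. It says nothing about the point-query cost mentioned in the comment, which would need a benchmark to evaluate.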
[jira] [Commented] (KAFKA-9356) Potential data loss in InMemoryWindowStore and InMemorySessionStore
[ https://issues.apache.org/jira/browse/KAFKA-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006972#comment-17006972 ]

Boyang Chen commented on KAFKA-9356:
------------------------------------

Thanks for the report. For Streams there is only one writer (the stream thread) and one reader (IQ) working on the same in-memory store, so a read-write operation should be safe in this case.

cc [~ableegoldman] [~guozhang]
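The single-writer argument above can be sketched roughly as follows. This is not the actual store implementation: the class SingleWriterSketch, the nested "window start -> (key -> value)" layout, and the run method are assumptions made for illustration. The Javadoc of ConcurrentSkipListMap.computeIfAbsent() notes that the mapping function is not guaranteed to be applied once atomically, which matters when several threads race on the same absent key. With one writer thread there is no such race, so the function runs once per new window regardless of a concurrent reader.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class SingleWriterSketch {

    // Hypothetical experiment: one writer thread fills the store while one
    // reader thread scans it. Returns {inner maps allocated, entries stored}.
    static int[] run(int windows, int keysPerWindow) {
        ConcurrentSkipListMap<Long, ConcurrentSkipListMap<String, String>> segments =
            new ConcurrentSkipListMap<>();
        AtomicInteger allocations = new AtomicInteger();
        AtomicBoolean done = new AtomicBoolean(false);

        Thread writer = new Thread(() -> {
            try {
                for (long w = 0; w < windows; w++) {
                    for (int k = 0; k < keysPerWindow; k++) {
                        // Only this thread ever calls computeIfAbsent(), so the
                        // mapping function cannot race with another writer.
                        segments.computeIfAbsent(w, t -> {
                            allocations.incrementAndGet();
                            return new ConcurrentSkipListMap<>();
                        }).put("key-" + k, "value-" + k);
                    }
                }
            } finally {
                done.set(true);
            }
        });

        Thread reader = new Thread(() -> {
            // Stand-in for an interactive query: keep scanning until the writer is done.
            while (!done.get()) {
                for (ConcurrentSkipListMap<String, String> segment : segments.values()) {
                    segment.size();
                }
            }
        });

        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        int entries = 0;
        for (ConcurrentSkipListMap<String, String> segment : segments.values()) {
            entries += segment.size();
        }
        return new int[] {allocations.get(), entries};
    }

    public static void main(String[] args) {
        int[] result = run(50, 20);
        System.out.println("inner maps allocated: " + result[0] + ", entries stored: " + result[1]);
    }
}
```

Under these assumptions, run(50, 20) allocates exactly 50 inner maps and stores 1000 entries. With multiple writer threads the allocation count could exceed the number of windows, which is the scenario the original report was concerned about.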