[jira] [Created] (KAFKA-9455) Consider using TreeMap for In-memory stores of Streams

Guozhang Wang (Jira) Sun, 19 Jan 2020 17:56:29 -0800

Guozhang Wang created KAFKA-9455:
------------------------------------

             Summary: Consider using TreeMap for In-memory stores of Streams
                 Key: KAFKA-9455
                 URL: https://issues.apache.org/jira/browse/KAFKA-9455
             Project: Kafka
          Issue Type: Improvement
          Components: streams
            Reporter: Guozhang Wang



>From [~ableegoldman]: It's worth noting that it might be a good idea to switch 
>to TreeMap for different reasons. Right now the ConcurrentSkipListMap allows 
>us to safely perform range queries without copying over the entire keyset, but 
>the performance on point queries seems to scale noticeably worse with the 
>number of unique keys. Point queries are used by aggregations while range 
>queries are used by windowed joins, but of course both are available within 
>the PAPI and for interactive queries so it's hard to say which we should 
>prefer. Maybe rather than make that tradeoff we should have one version for 
>efficient range queries (a "JoinWindowStore") and one for efficient point 
>queries ("AggWindowStore") - or something. I know we've had similar thoughts 
>for a different RocksDB store layout for Joins (although I can't find that 
>ticket anywhere..), it seems like the in-memory stores could benefit from a 
>special "Join" version as well cc/ Guozhang Wang
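
To make the tradeoff in the quote concrete, here is a minimal sketch using only java.util classes. It is not Kafka Streams code: the class name StoreMapSketch and its helper methods are hypothetical, and the Long keys are a stand-in for whatever the store actually keys on. Both maps implement NavigableMap, so point queries (get) and range queries (subMap) look the same; the difference shown here is that a ConcurrentSkipListMap range view tolerates writes while it is being iterated, whereas a TreeMap range view is fail-fast.

```java
import java.util.ConcurrentModificationException;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class StoreMapSketch {

    // Fill a map with keys 0..n-1 (hypothetical stand-in for store entries).
    static void fill(NavigableMap<Long, String> map, int n) {
        for (long k = 0; k < n; k++) {
            map.put(k, "v" + k);
        }
    }

    // Point query: the same call on either map implementation.
    static String pointQuery(NavigableMap<Long, String> map, long key) {
        return map.get(key);
    }

    // Range query over [from, to] that inserts a new key while iterating.
    // The inserted keys lie outside the scanned range, so the scan terminates.
    // Returns true if the scan completed, false if the iterator failed fast.
    static boolean rangeScanWhileWriting(NavigableMap<Long, String> map, long from, long to) {
        try {
            for (Map.Entry<Long, String> e : map.subMap(from, true, to, true).entrySet()) {
                map.put(1_000_000L + e.getKey(), "late write");
            }
            return true;
        } catch (ConcurrentModificationException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> skipList = new ConcurrentSkipListMap<>();
        NavigableMap<Long, String> treeMap = new TreeMap<>();
        fill(skipList, 10);
        fill(treeMap, 10);

        System.out.println("skip list point query: " + pointQuery(skipList, 3));
        System.out.println("tree map point query:  " + pointQuery(treeMap, 3));
        System.out.println("skip list scan completed: " + rangeScanWhileWriting(skipList, 2, 5));
        System.out.println("tree map scan completed:  " + rangeScanWhileWriting(treeMap, 2, 5));
    }
}
```

The sketch only demonstrates the iteration semantics. It says nothing about the point-query performance difference reported above, which would need a benchmark against the real stores.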

Here are some random thoughts:

1. For Kafka Streams processing logic (i.e. without IQ), it's better to make 
all processing logic rely on point queries rather than range queries. Right 
now the only processors that use range queries are, as mentioned above, the 
windowed stream-stream joins. I think we should consider using a different 
window implementation for these (and as a result also get rid of the 
retainDuplicates flag) to refactor the windowed stream-stream join operation.

2. With 1), range queries would only be exposed through IQ. Depending on how 
frequently they are used there, I think it makes a lot of sense to optimize 
for single-point queries.

Of course, even without step 1) we should still consider using TreeMap for 
windowed in-memory stores, so that point queries scale better with the number 
of unique keys.
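
As a rough illustration of what a TreeMap-backed store might have to do to stay safe for range queries, here is a minimal sketch. It assumes that every access goes through one lock and that range queries return a copy of the requested range, since TreeMap is not thread-safe. The class TreeMapStoreSketch and its methods are hypothetical and are not part of the Kafka Streams API.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class TreeMapStoreSketch {

    // Plain TreeMap: not thread-safe, so every access below is synchronized.
    private final TreeMap<Long, String> entries = new TreeMap<>();

    public synchronized void put(long key, String value) {
        entries.put(key, value);
    }

    // Point query.
    public synchronized String get(long key) {
        return entries.get(key);
    }

    // Range query over [from, to]. Copies only the requested range so the
    // caller can iterate the result without holding the lock.
    public synchronized NavigableMap<Long, String> range(long from, long to) {
        return new TreeMap<>(entries.subMap(from, true, to, true));
    }

    public static void main(String[] args) {
        TreeMapStoreSketch store = new TreeMapStoreSketch();
        for (long k = 0; k < 5; k++) {
            store.put(k, "v" + k);
        }
        NavigableMap<Long, String> snapshot = store.range(1, 3);
        store.put(2, "updated"); // does not affect the snapshot taken above
        System.out.println("snapshot: " + snapshot);
        System.out.println("current value for key 2: " + store.get(2));
    }
}
```

The copy makes each range query cost time and memory proportional to the size of the range, which is the price paid here for not using ConcurrentSkipListMap. Whether that is acceptable depends on how often range queries are actually issued.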



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
