[ https://issues.apache.org/jira/browse/KAFKA-9455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036300#comment-17036300 ]
highluck commented on KAFKA-9455: --------------------------------- [~guozhang] I don't know if I understand.. I'm trying to replace JoinWindowStore and existing InMemoryWindowStore with TreeMap. What do you think? single-point query -> WindowStore with TreeMap range-query -> JoinWindowStore If it's not what you think, please give me a hint > Consider using TreeMap for in-memory stores of Streams > ------------------------------------------------------ > > Key: KAFKA-9455 > URL: https://issues.apache.org/jira/browse/KAFKA-9455 > Project: Kafka > Issue Type: Improvement > Components: streams > Reporter: Guozhang Wang > Assignee: highluck > Priority: Major > Labels: newbie++ > > From [~ableegoldman]: It's worth noting that it might be a good idea to > switch to TreeMap for different reasons. Right now the ConcurrentSkipListMap > allows us to safely perform range queries without copying over the entire > keyset, but the performance on point queries seems to scale noticeably worse > with the number of unique keys. Point queries are used by aggregations while > range queries are used by windowed joins, but of course both are available > within the PAPI and for interactive queries so it's hard to say which we > should prefer. Maybe rather than make that tradeoff we should have one > version for efficient range queries (a "JoinWindowStore") and one for > efficient point queries ("AggWindowStore") - or something. I know we've had > similar thoughts for a different RocksDB store layout for Joins (although I > can't find that ticket anywhere..), it seems like the in-memory stores could > benefit from a special "Join" version as well cc/ Guozhang Wang > Here are some random thoughts: > 1. For kafka streams processing logic (i.e. without IQ), it's better to make > all processing logic relying on point queries rather than range queries. > Right now the only processor that use range queries are, as mentioned above, > windowed stream-stream joins. I think we should consider using a different > window implementation for this (and as a result also get rid of the > retainDuplicate flags) to refactor the windowed stream-stream join operation. > 2. With 1), range queries would only be exposed as IQ. Depending on its usage > frequency I think it makes lots of sense to optimize for single-point queries. > Of course, even without step 1) we should still consider using tree-map for > windowed in-memory stores to have a better scaling effect. -- This message was sent by Atlassian Jira (v8.3.4#803005)