[jira] [Created] (KAFKA-8731) InMemorySessionStore throws NullPointerException on startup
Jonathan Gordon created KAFKA-8731:
--

             Summary: InMemorySessionStore throws NullPointerException on startup
                 Key: KAFKA-8731
                 URL: https://issues.apache.org/jira/browse/KAFKA-8731
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 2.3.0
            Reporter: Jonathan Gordon

I receive a NullPointerException on startup when trying to use the new InMemorySessionStore via Stores.inMemorySessionStore(...) using the DSL.

Here's the stack trace:

{{ERROR [2019-07-29 21:56:52,246] org.apache.kafka.streams.processor.internals.StreamThread: stream-thread [trace_indexer-c8439020-12af-4db2-ad56-3e58cd56540f-StreamThread-1] Encountered the following error during processing:}}
{{! java.lang.NullPointerException: null}}
{{! at org.apache.kafka.streams.state.internals.InMemorySessionStore.remove(InMemorySessionStore.java:123)}}
{{! at org.apache.kafka.streams.state.internals.InMemorySessionStore.put(InMemorySessionStore.java:115)}}
{{! at org.apache.kafka.streams.state.internals.InMemorySessionStore.lambda$init$0(InMemorySessionStore.java:93)}}
{{! at org.apache.kafka.streams.processor.internals.StateRestoreCallbackAdapter.lambda$adapt$1(StateRestoreCallbackAdapter.java:47)}}
{{! at org.apache.kafka.streams.processor.internals.CompositeRestoreListener.restoreBatch(CompositeRestoreListener.java:89)}}
{{! at org.apache.kafka.streams.processor.internals.StateRestorer.restore(StateRestorer.java:92)}}
{{! at org.apache.kafka.streams.processor.internals.StoreChangelogReader.processNext(StoreChangelogReader.java:317)}}
{{! at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:92)}}
{{! at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:328)}}
{{! at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:867)}}
{{! at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:805)}}
{{! at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:774)}}

Here's the Slack thread:

[https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1564438647169600]

Here's a current PR aimed at fixing the issue:

[https://github.com/apache/kafka/pull/7132]

--
This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782139#comment-16782139 ]

Jonathan Gordon commented on KAFKA-7652:

That did it! This is really encouraging. Any chance it'll make it into 2.2.0?

[^2.3.0-7652-NamedCache.txt]

> Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
>
>                 Key: KAFKA-7652
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7652
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.11.0.0, 0.11.0.1, 0.11.0.2, 0.11.0.3, 1.1.1, 2.0.0, 2.0.1
>            Reporter: Jonathan Gordon
>            Assignee: Guozhang Wang
>            Priority: Major
>              Labels: kip
>             Fix For: 2.2.0
>         Attachments: 0.10.2.1-NamedCache.txt, 2.2.0-rc0_b-NamedCache.txt, 2.3.0-7652-NamedCache.txt, kafka_10_2_1_flushes.txt, kafka_11_0_3_flushes.txt
>
> I'm creating this issue in response to [~guozhang]'s request on the mailing list:
> [https://lists.apache.org/thread.html/97d620f4fd76be070ca4e2c70e2fda53cafe051e8fc4505dbcca0321@%3Cusers.kafka.apache.org%3E]
>
> We are attempting to upgrade our Kafka Streams application from 0.10.2.1 but experience a severe performance degradation. The highest amount of CPU time seems to be spent in retrieving from the local cache. Here's an example thread profile with 0.11.0.0:
> [https://i.imgur.com/l5VEsC2.png]
>
> When things are running smoothly we're gated by retrieving from the state store with acceptable performance. Here's an example thread profile with 0.10.2.1:
> [https://i.imgur.com/IHxC2cZ.png]
>
> Some investigation reveals that it appears we're performing about 3 orders of magnitude more lookups on the NamedCache over a comparable time period. I've attached the NamedCache flush logs for 0.10.2.1 and 0.11.0.3.
>
> We're using session windows and have the app configured for commit.interval.ms = 30 * 1000 and cache.max.bytes.buffering = 10485760
>
> I'm happy to share more details if they would be helpful. Also happy to run tests on our data.
>
> I also found this issue, which seems like it may be related:
> https://issues.apache.org/jira/browse/KAFKA-4904
>
> KIP-420:
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-420%3A+Add+Single+Value+Fetch+in+Session+Stores]
[jira] [Updated] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gordon updated KAFKA-7652:
---
    Attachment: 2.3.0-7652-NamedCache.txt
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780115#comment-16780115 ]

Jonathan Gordon commented on KAFKA-7652:

{quote}1) when you profile on latest trunk did you see the same pattern as observed in [https://i.imgur.com/IHxC2cZ.png] as well as in the trace logging compared with 0.10.2.x?
{quote}
The image you linked is actually for 0.10.2.x, which is our current deployment. It shows us gated by RocksDB, but that's actually *faster* than what we saw in 0.11.0.0, the recent trunk, or the test I just ran against 2.2.0-rc0:

[https://i.imgur.com/L6PWIEF.png]

{quote}2) practically the lookups in the caching layer is very cheap and hence even increased a lot it should not contribute to much overhead, whereas the fetches on the underlying store would be much more expensive. Could you confirm if the performance bottleneck is from the underlying rocksDB, or from the caching layer access?
{quote}
For 2.2.0-rc0, we're spending the bulk of our time trying to retrieve records from the NamedCache. See:

[^0.10.2.1-NamedCache.txt]
[^2.2.0-rc0_b-NamedCache.txt]

While I agree it seems it should be more performant per retrieval, as you can see from the latest logs, it's the difference between 1,096,089 (2.2.0-rc0) and 19,245 (0.10.2.1) hits per second to the cache. The two orders of magnitude appear to outweigh whatever performance benefit we'd receive from the caching layer. This is just one of 8 tasks.

During their respective runs, the services consumed 8.4M messages (0.10.2.1) with no lag vs 637K messages (2.2.0-rc0) with considerable lag.

I'd be happy to run again with whatever custom logging or configuration you suggest to help further pinpoint the problem.
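For readers who want to put the figures from the comment above side by side, here is a minimal sketch that computes the ratios. The class and method names are hypothetical, and the only inputs are the numbers quoted in the comment (with "8.4M" and "637K" taken as 8,400,000 and 637,000, which is an assumption about rounding).

```java
// Hypothetical helper: compares the cache-hit and throughput figures quoted in the comment.
final class CacheHitRatio {
    // Cache hits per second reported for one of 8 tasks.
    static final double HITS_PER_SEC_2_2_0_RC0 = 1_096_089;
    static final double HITS_PER_SEC_0_10_2_1 = 19_245;

    // Messages consumed during the respective runs (assumed expansions of "8.4M" and "637K").
    static final double MESSAGES_0_10_2_1 = 8_400_000;
    static final double MESSAGES_2_2_0_RC0 = 637_000;

    // How many times larger a is than b.
    static double ratio(double a, double b) {
        return a / b;
    }

    // A ratio expressed in orders of magnitude (base-10 logarithm).
    static double ordersOfMagnitude(double ratio) {
        return Math.log10(ratio);
    }

    public static void main(String[] args) {
        double hitRatio = ratio(HITS_PER_SEC_2_2_0_RC0, HITS_PER_SEC_0_10_2_1);
        double throughputRatio = ratio(MESSAGES_0_10_2_1, MESSAGES_2_2_0_RC0);
        System.out.printf("cache hits/sec, 2.2.0-rc0 vs 0.10.2.1: %.1fx (%.2f orders of magnitude)%n",
                hitRatio, ordersOfMagnitude(hitRatio));
        System.out.printf("messages consumed, 0.10.2.1 vs 2.2.0-rc0: %.1fx%n", throughputRatio);
    }
}
```

On these figures the cache sees roughly 57 times more hits per second on 2.2.0-rc0, while the same service consumes roughly 13 times fewer messages.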
[jira] [Updated] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gordon updated KAFKA-7652:
---
    Attachment: 0.10.2.1-NamedCache.txt
                2.2.0-rc0_b-NamedCache.txt
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777630#comment-16777630 ]

Jonathan Gordon commented on KAFKA-7652:

I tested out with trunk on Feb 22 (commit 0d461e4ea0a8353c358ae661837f471995943bb0) and we're still seeing the same performance issue.

Aside from logging the output of the NamedCache stats, is there data I can provide to help further narrow down the issue? Any other ideas?
[jira] [Commented] (KAFKA-7748) Add wall clock TimeDefinition for suppression of intermediate events
[ https://issues.apache.org/jira/browse/KAFKA-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16741781#comment-16741781 ]

Jonathan Gordon commented on KAFKA-7748:

[~vvcephei] It doesn't appear I have perms to create a KIP. Is that something you were hoping I would do or are you planning on taking that on yourself?
[jira] [Created] (KAFKA-7748) Add wall clock TimeDefinition for suppression of intermediate events
Jonathan Gordon created KAFKA-7748:
--

             Summary: Add wall clock TimeDefinition for suppression of intermediate events
                 Key: KAFKA-7748
                 URL: https://issues.apache.org/jira/browse/KAFKA-7748
             Project: Kafka
          Issue Type: New Feature
          Components: streams
    Affects Versions: 2.1.0
            Reporter: Jonathan Gordon

Currently, Kafka Streams offers the ability to suppress intermediate events based on either RecordTime or WindowEndTime, which are in turn defined by stream time:

{{Suppressed.untilTimeLimit(final Duration timeToWaitForMoreEvents, final BufferConfig bufferConfig)}}

It would be helpful to have another option that would allow suppression of intermediate events based on wall clock time. This would allow us to only produce a limited number of aggregates independent of their stream time (which in our case is event time).

For reference, here's the relevant KIP:

[https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables#KIP-328:AbilitytosuppressupdatesforKTables-Best-effortratelimitperkey]

And here's the relevant Confluent Slack thread:

https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1544468349201700
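To make the request above concrete, here is a minimal, library-free sketch of what wall-clock-based suppression could look like. It is not Kafka Streams code and not the API proposed in any KIP: the class `WallClockSuppressionBuffer`, its methods, and the injected clock are all hypothetical names chosen for illustration. The idea is that the latest value per key is held back until a wall-clock wait has elapsed since the key was first buffered, regardless of record timestamps.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch: buffers the latest value per key and releases it only after
// a wall-clock wait, independent of the stream time carried by the records.
final class WallClockSuppressionBuffer<K, V> {
    private final long waitMs;
    private final LongSupplier wallClockMs; // injected so the clock can be faked
    private final Map<K, V> latest = new LinkedHashMap<>();
    private final Map<K, Long> firstBufferedAtMs = new HashMap<>();

    WallClockSuppressionBuffer(Duration timeToWait, LongSupplier wallClockMs) {
        this.waitMs = timeToWait.toMillis();
        this.wallClockMs = wallClockMs;
    }

    // Records an intermediate update; only the most recent value per key is kept.
    void put(K key, V value) {
        latest.put(key, value);
        firstBufferedAtMs.putIfAbsent(key, wallClockMs.getAsLong());
    }

    // Removes and returns every entry whose wall-clock wait has elapsed.
    Map<K, V> drainExpired() {
        long now = wallClockMs.getAsLong();
        Map<K, V> expired = new LinkedHashMap<>();
        Iterator<Map.Entry<K, V>> it = latest.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<K, V> entry = it.next();
            if (now - firstBufferedAtMs.get(entry.getKey()) >= waitMs) {
                expired.put(entry.getKey(), entry.getValue());
                firstBufferedAtMs.remove(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        AtomicLong fakeClock = new AtomicLong(0);
        WallClockSuppressionBuffer<String, Integer> buffer =
                new WallClockSuppressionBuffer<>(Duration.ofSeconds(30), fakeClock::get);

        buffer.put("session-a", 1);
        fakeClock.set(10_000);
        buffer.put("session-a", 2);                // overwrites the earlier intermediate value
        System.out.println(buffer.drainExpired()); // still within the 30 s wait

        fakeClock.set(30_000);
        System.out.println(buffer.drainExpired()); // wait elapsed, latest value released
    }
}
```

In a real topology something would have to call `drainExpired()` periodically; a wall-clock punctuator is one plausible trigger, but that wiring is left out of this sketch.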
[jira] [Updated] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gordon updated KAFKA-7652:
---
    Description:
I'm creating this issue in response to [~guozhang]'s request on the mailing list:

[https://lists.apache.org/thread.html/97d620f4fd76be070ca4e2c70e2fda53cafe051e8fc4505dbcca0321@%3Cusers.kafka.apache.org%3E]

We are attempting to upgrade our Kafka Streams application from 0.10.2.1 but experience a severe performance degradation. The highest amount of CPU time seems to be spent in retrieving from the local cache. Here's an example thread profile with 0.11.0.0:

[https://i.imgur.com/l5VEsC2.png]

When things are running smoothly we're gated by retrieving from the state store with acceptable performance. Here's an example thread profile with 0.10.2.1:

[https://i.imgur.com/IHxC2cZ.png]

Some investigation reveals that it appears we're performing about 3 orders of magnitude more lookups on the NamedCache over a comparable time period. I've attached the NamedCache flush logs for 0.10.2.1 and 0.11.0.3.

We're using session windows and have the app configured for commit.interval.ms = 30 * 1000 and cache.max.bytes.buffering = 10485760

I'm happy to share more details if they would be helpful. Also happy to run tests on our data.

I also found this issue, which seems like it may be related:

https://issues.apache.org/jira/browse/KAFKA-4904

  was:
Here's the original thread from the mailing list:

https://lists.apache.org/thread.html/97d620f4fd76be070ca4e2c70e2fda53cafe051e8fc4505dbcca0321@%3Cusers.kafka.apache.org%3E

We are attempting to upgrade our Kafka Streams application from 0.10.2.1 but experience a severe performance degradation. The highest amount of CPU time seems to be spent in retrieving from the local cache. Here's an example with 0.11.0.0:

[https://i.imgur.com/l5VEsC2.png]

When things are running smoothly we're gated by retrieving from the state store with acceptable performance. Here's an example with 0.10.2.1:

[https://i.imgur.com/IHxC2cZ.png]

Some investigation reveals that it appears we're performing about 3 orders of magnitude more lookups on the NamedCache over a comparable time period. I've attached the NamedCache flush logs for 0.10.2.1 and 0.11.0.3.

We're using session windows and have the app configured for commit.interval.ms of 30 * 1000 and cache.max.bytes.buffering = 10485760

I'm happy to share more details if they would be helpful. Also happy to run tests on our data.
[jira] [Created] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
Jonathan Gordon created KAFKA-7652:
--

             Summary: Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
                 Key: KAFKA-7652
                 URL: https://issues.apache.org/jira/browse/KAFKA-7652
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 2.0.1, 2.0.0, 1.1.1, 0.11.0.3, 0.11.0.2, 0.11.0.1, 0.11.0.0
            Reporter: Jonathan Gordon
         Attachments: kafka_10_2_1_flushes.txt, kafka_11_0_3_flushes.txt

Here's the original thread from the mailing list:

https://lists.apache.org/thread.html/97d620f4fd76be070ca4e2c70e2fda53cafe051e8fc4505dbcca0321@%3Cusers.kafka.apache.org%3E

We are attempting to upgrade our Kafka Streams application from 0.10.2.1 but experience a severe performance degradation. The highest amount of CPU time seems to be spent in retrieving from the local cache. Here's an example with 0.11.0.0:

[https://i.imgur.com/l5VEsC2.png]

When things are running smoothly we're gated by retrieving from the state store with acceptable performance. Here's an example with 0.10.2.1:

[https://i.imgur.com/IHxC2cZ.png]

Some investigation reveals that it appears we're performing about 3 orders of magnitude more lookups on the NamedCache over a comparable time period. I've attached the NamedCache flush logs for 0.10.2.1 and 0.11.0.3.

We're using session windows and have the app configured for commit.interval.ms of 30 * 1000 and cache.max.bytes.buffering = 10485760

I'm happy to share more details if they would be helpful. Also happy to run tests on our data.
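For anyone trying to reproduce the setup, the two settings mentioned in the report can be written out as plain properties. This is only a sketch: the class name `SessionAppConfig` is hypothetical, the values are the ones quoted above (10485760 bytes is 10 MiB), and a real Kafka Streams application would also need settings the report does not mention, such as the application id and bootstrap servers.

```java
import java.util.Properties;

// Hypothetical holder for the two settings quoted in the report.
final class SessionAppConfig {
    static Properties streamsProperties() {
        Properties props = new Properties();
        // Commit (and flush the record caches) every 30 seconds.
        props.setProperty("commit.interval.ms", String.valueOf(30 * 1000));
        // Total bytes available for record caching: 10 * 1024 * 1024 = 10485760.
        props.setProperty("cache.max.bytes.buffering", String.valueOf(10485760));
        return props;
    }

    public static void main(String[] args) {
        streamsProperties().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```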