[jira] [Created] (KAFKA-8731) InMemorySessionStore throws NullPointerException on startup

2019-07-29 Thread Jonathan Gordon (JIRA)
Jonathan Gordon created KAFKA-8731:
--

 Summary: InMemorySessionStore throws NullPointerException on 
startup
 Key: KAFKA-8731
 URL: https://issues.apache.org/jira/browse/KAFKA-8731
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 2.3.0
Reporter: Jonathan Gordon


I receive a NullPointerException on startup when trying to use the new 
InMemorySessionStore via Stores.inMemorySessionStore(...) using the DSL.

Here's the stack trace:

{{ERROR [2019-07-29 21:56:52,246] org.apache.kafka.streams.processor.internals.StreamThread: stream-thread [trace_indexer-c8439020-12af-4db2-ad56-3e58cd56540f-StreamThread-1] Encountered the following error during processing:}}
{{! java.lang.NullPointerException: null}}
{{! at org.apache.kafka.streams.state.internals.InMemorySessionStore.remove(InMemorySessionStore.java:123)}}
{{! at org.apache.kafka.streams.state.internals.InMemorySessionStore.put(InMemorySessionStore.java:115)}}
{{! at org.apache.kafka.streams.state.internals.InMemorySessionStore.lambda$init$0(InMemorySessionStore.java:93)}}
{{! at org.apache.kafka.streams.processor.internals.StateRestoreCallbackAdapter.lambda$adapt$1(StateRestoreCallbackAdapter.java:47)}}
{{! at org.apache.kafka.streams.processor.internals.CompositeRestoreListener.restoreBatch(CompositeRestoreListener.java:89)}}
{{! at org.apache.kafka.streams.processor.internals.StateRestorer.restore(StateRestorer.java:92)}}
{{! at org.apache.kafka.streams.processor.internals.StoreChangelogReader.processNext(StoreChangelogReader.java:317)}}
{{! at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:92)}}
{{! at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:328)}}
{{! at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:867)}}
{{! at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:805)}}
{{! at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:774)}}
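
The trace above runs from the restore callback through put() into remove(). One plausible reading, which is an assumption and not something the trace confirms, is that a delete record is replayed during restoration for a session the store does not hold. The sketch below uses a hypothetical, simplified store (SketchSessionStore is not the Kafka Streams class, and its nested-map layout is invented for illustration) to show how that pattern can throw, next to a null-safe variant. The actual fix is whatever the linked PR contains.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for a session store; NOT the Kafka Streams class.
final class SketchSessionStore {
    // end time -> (key -> value); this layout is an assumption made for illustration
    private final TreeMap<Long, Map<String, byte[]>> byEndTime = new TreeMap<>();

    // Replaying a changelog record: a null value is treated as a delete.
    void put(long endTime, String key, byte[] value) {
        if (value == null) {
            remove(endTime, key);
        } else {
            byEndTime.computeIfAbsent(endTime, t -> new HashMap<>()).put(key, value);
        }
    }

    // Unsafe variant: assumes the session exists, so an unknown session throws.
    void removeUnsafe(long endTime, String key) {
        Map<String, byte[]> keys = byEndTime.get(endTime);
        keys.remove(key); // NullPointerException when keys == null
    }

    // Null-safe variant: removing an absent session is a no-op.
    void remove(long endTime, String key) {
        Map<String, byte[]> keys = byEndTime.get(endTime);
        if (keys == null) {
            return;
        }
        keys.remove(key);
        if (keys.isEmpty()) {
            byEndTime.remove(endTime);
        }
    }

    // Total number of sessions currently held.
    int size() {
        return byEndTime.values().stream().mapToInt(Map::size).sum();
    }

    public static void main(String[] args) {
        SketchSessionStore store = new SketchSessionStore();
        store.put(100L, "a", new byte[] {1});
        store.put(200L, "missing", null); // delete for a session never stored
        System.out.println(store.size());
    }
}
```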

 

Here's the Slack thread:

[https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1564438647169600]

 

Here's a current PR aimed at fixing the issue:

[https://github.com/apache/kafka/pull/7132]

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0

2019-03-01 Thread Jonathan Gordon (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782139#comment-16782139
 ] 

Jonathan Gordon commented on KAFKA-7652:


That did it! This is really encouraging. Any chance it'll make it into 2.2.0?

[^2.3.0-7652-NamedCache.txt]

> Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
> -
>
> Key: KAFKA-7652
> URL: https://issues.apache.org/jira/browse/KAFKA-7652
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0, 0.11.0.1, 0.11.0.2, 0.11.0.3, 1.1.1, 2.0.0, 
> 2.0.1
>Reporter: Jonathan Gordon
>Assignee: Guozhang Wang
>Priority: Major
>  Labels: kip
> Fix For: 2.2.0
>
> Attachments: 0.10.2.1-NamedCache.txt, 2.2.0-rc0_b-NamedCache.txt, 
> 2.3.0-7652-NamedCache.txt, kafka_10_2_1_flushes.txt, kafka_11_0_3_flushes.txt
>
>
> I'm creating this issue in response to [~guozhang]'s request on the mailing 
> list:
> [https://lists.apache.org/thread.html/97d620f4fd76be070ca4e2c70e2fda53cafe051e8fc4505dbcca0321@%3Cusers.kafka.apache.org%3E]
> We are attempting to upgrade our Kafka Streams application from 0.10.2.1 but 
> experience a severe performance degradation. The highest amount of CPU time 
> seems spent in retrieving from the local cache. Here's an example thread 
> profile with 0.11.0.0:
> [https://i.imgur.com/l5VEsC2.png]
> When things are running smoothly we're gated by retrieving from the state 
> store with acceptable performance. Here's an example thread profile with 
> 0.10.2.1:
> [https://i.imgur.com/IHxC2cZ.png]
> Some investigation suggests we're performing about 3 orders of magnitude
> more lookups on the NamedCache over a comparable time period. I've attached
> the NamedCache flush logs for 0.10.2.1 and 0.11.0.3.
> We're using session windows and have the app configured for 
> commit.interval.ms = 30 * 1000 and cache.max.bytes.buffering = 10485760
> I'm happy to share more details if they would be helpful. Also happy to run 
> tests on our data.
> I also found this issue, which seems like it may be related:
> https://issues.apache.org/jira/browse/KAFKA-4904
>  
> KIP-420: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-420%3A+Add+Single+Value+Fetch+in+Session+Stores]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0

2019-03-01 Thread Jonathan Gordon (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gordon updated KAFKA-7652:
---
Attachment: 2.3.0-7652-NamedCache.txt






[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0

2019-02-27 Thread Jonathan Gordon (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780115#comment-16780115
 ] 

Jonathan Gordon commented on KAFKA-7652:


{quote}1) when you profile on latest trunk did you see the same pattern as 
observed in [https://i.imgur.com/IHxC2cZ.png] as well as in the trace logging 
compared with 0.10.2.x?
{quote}
The image you linked is actually for 0.10.2.x, which is our current deployment. 
It shows us gated by RocksDB, but that's actually *faster* than what we saw in 
0.11.0.0, the recent trunk, or the test I just ran against 2.2.0-rc0:

[https://i.imgur.com/L6PWIEF.png]
{quote}2) practically the lookups in the caching layer is very cheap and hence 
even increased a lot it should not contribute to much overhead, whereas the 
fetches on the underlying store would be much more expensive. Could you confirm 
if the performance bottleneck is from the underlying rocksDB, or from the 
caching layer access?
{quote}
For 2.2.0-rc0, we're spending the bulk of our time trying to retrieve records 
from the NamedCache. See:

[^0.10.2.1-NamedCache.txt]

[^2.2.0-rc0_b-NamedCache.txt]

While I agree each retrieval should be cheap, the latest logs show 1,096,089
(2.2.0-rc0) versus 19,245 (0.10.2.1) cache hits per second, a difference of
nearly two orders of magnitude. That appears to outweigh whatever performance
benefit we'd receive from the caching layer.

This is just one of 8 tasks. During their respective runs, the services 
consumed 8.4M messages (0.10.2.1) with no lag vs 637K messages (2.2.0-rc0) with 
considerable lag. I'd be happy to run again with whatever custom logging or 
configuration you suggest to help further pinpoint the problem. 
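
For reference, the ratios implied by the figures above can be checked with a few lines of arithmetic. This is a minimal sketch; the class and method names are made up for illustration, and the inputs are the numbers quoted in this comment.

```java
// Hypothetical helper for comparing the figures quoted in this comment.
final class CacheHitRatioSketch {
    // How many times larger a is than b.
    static double ratio(double a, double b) {
        return a / b;
    }

    // The same comparison expressed in orders of magnitude (base-10 logarithm).
    static double ordersOfMagnitude(double a, double b) {
        return Math.log10(a / b);
    }

    public static void main(String[] args) {
        // Cache hits per second: 2.2.0-rc0 vs 0.10.2.1
        System.out.printf("hits ratio: %.1fx (%.2f orders of magnitude)%n",
                ratio(1_096_089, 19_245), ordersOfMagnitude(1_096_089, 19_245));
        // Messages consumed during the run: 0.10.2.1 vs 2.2.0-rc0
        System.out.printf("throughput ratio: %.1fx%n", ratio(8_400_000, 637_000));
    }
}
```

The hit rate differs by a factor of roughly 57, while throughput differs by a factor of roughly 13 in the opposite direction.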

 

 

 

> Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
> -
>
> Key: KAFKA-7652
> URL: https://issues.apache.org/jira/browse/KAFKA-7652
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0, 0.11.0.1, 0.11.0.2, 0.11.0.3, 1.1.1, 2.0.0, 
> 2.0.1
>Reporter: Jonathan Gordon
>Assignee: Guozhang Wang
>Priority: Major
>  Labels: kip
> Fix For: 2.2.0
>
> Attachments: 0.10.2.1-NamedCache.txt, 2.2.0-rc0_b-NamedCache.txt, 
> kafka_10_2_1_flushes.txt, kafka_11_0_3_flushes.txt
>
>
> I'm creating this issue in response to [~guozhang]'s request on the mailing 
> list:
> [https://lists.apache.org/thread.html/97d620f4fd76be070ca4e2c70e2fda53cafe051e8fc4505dbcca0321@%3Cusers.kafka.apache.org%3E]
> We are attempting to upgrade our Kafka Streams application from 0.10.2.1 but 
> experience a severe performance degradation. The highest amount of CPU time 
> seems spent in retrieving from the local cache. Here's an example thread 
> profile with 0.11.0.0:
> [https://i.imgur.com/l5VEsC2.png]
> When things are running smoothly we're gated by retrieving from the state 
> store with acceptable performance. Here's an example thread profile with 
> 0.10.2.1:
> [https://i.imgur.com/IHxC2cZ.png]
> Some investigation suggests we're performing about 3 orders of magnitude
> more lookups on the NamedCache over a comparable time period. I've attached
> the NamedCache flush logs for 0.10.2.1 and 0.11.0.3.
> We're using session windows and have the app configured for 
> commit.interval.ms = 30 * 1000 and cache.max.bytes.buffering = 10485760
> I'm happy to share more details if they would be helpful. Also happy to run 
> tests on our data.
> I also found this issue, which seems like it may be related:
> https://issues.apache.org/jira/browse/KAFKA-4904
>  
> KIP-420: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-420%3A+Add+Single+Value+Fetch+in+Session+Stores]
>  
>  





[jira] [Updated] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0

2019-02-27 Thread Jonathan Gordon (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gordon updated KAFKA-7652:
---
Attachment: 0.10.2.1-NamedCache.txt
2.2.0-rc0_b-NamedCache.txt






[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0

2019-02-25 Thread Jonathan Gordon (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777630#comment-16777630
 ] 

Jonathan Gordon commented on KAFKA-7652:


I tested with trunk as of Feb 22 (commit
0d461e4ea0a8353c358ae661837f471995943bb0) and we're still seeing the same
performance issue. Aside from logging the NamedCache stats, is there other
data I can provide to help narrow down the issue? Any other ideas?

> Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
> -
>
> Key: KAFKA-7652
> URL: https://issues.apache.org/jira/browse/KAFKA-7652
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0, 0.11.0.1, 0.11.0.2, 0.11.0.3, 1.1.1, 2.0.0, 
> 2.0.1
>Reporter: Jonathan Gordon
>Assignee: Guozhang Wang
>Priority: Major
>  Labels: kip
> Fix For: 2.2.0
>
> Attachments: kafka_10_2_1_flushes.txt, kafka_11_0_3_flushes.txt
>
>
> I'm creating this issue in response to [~guozhang]'s request on the mailing 
> list:
> [https://lists.apache.org/thread.html/97d620f4fd76be070ca4e2c70e2fda53cafe051e8fc4505dbcca0321@%3Cusers.kafka.apache.org%3E]
> We are attempting to upgrade our Kafka Streams application from 0.10.2.1 but 
> experience a severe performance degradation. The highest amount of CPU time 
> seems spent in retrieving from the local cache. Here's an example thread 
> profile with 0.11.0.0:
> [https://i.imgur.com/l5VEsC2.png]
> When things are running smoothly we're gated by retrieving from the state 
> store with acceptable performance. Here's an example thread profile with 
> 0.10.2.1:
> [https://i.imgur.com/IHxC2cZ.png]
> Some investigation suggests we're performing about 3 orders of magnitude
> more lookups on the NamedCache over a comparable time period. I've attached
> the NamedCache flush logs for 0.10.2.1 and 0.11.0.3.
> We're using session windows and have the app configured for 
> commit.interval.ms = 30 * 1000 and cache.max.bytes.buffering = 10485760
> I'm happy to share more details if they would be helpful. Also happy to run 
> tests on our data.
> I also found this issue, which seems like it may be related:
> https://issues.apache.org/jira/browse/KAFKA-4904
>  
> KIP-420: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-420%3A+Add+Single+Value+Fetch+in+Session+Stores]
>  
>  





[jira] [Commented] (KAFKA-7748) Add wall clock TimeDefinition for suppression of intermediate events

2019-01-13 Thread Jonathan Gordon (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16741781#comment-16741781
 ] 

Jonathan Gordon commented on KAFKA-7748:


[~vvcephei] It doesn't appear I have perms to create a KIP. Is that something 
you were hoping I would do or are you planning on taking that on yourself?

> Add wall clock TimeDefinition for suppression of intermediate events
> 
>
> Key: KAFKA-7748
> URL: https://issues.apache.org/jira/browse/KAFKA-7748
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Affects Versions: 2.1.0
>Reporter: Jonathan Gordon
>Priority: Major
>  Labels: needs-kip
>
> Currently, Kafka Streams offers the ability to suppress intermediate events 
> based on either RecordTime or WindowEndTime, which are in turn defined by 
> stream time:
> {{Suppressed.untilTimeLimit(final Duration timeToWaitForMoreEvents, final 
> BufferConfig bufferConfig)}}
> It would be helpful to have another option that would allow suppression of 
> intermediate events based on wall clock time. This would allow us to only 
> produce a limited number of aggregates independent of their stream time 
> (which in our case is event time).
> For reference, here's the relevant KIP:
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables#KIP-328:AbilitytosuppressupdatesforKTables-Best-effortratelimitperkey]
> And here's the relevant Confluent Slack thread:
> https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1544468349201700
>  





[jira] [Created] (KAFKA-7748) Add wall clock TimeDefinition for suppression of intermediate events

2018-12-17 Thread Jonathan Gordon (JIRA)
Jonathan Gordon created KAFKA-7748:
--

 Summary: Add wall clock TimeDefinition for suppression of 
intermediate events
 Key: KAFKA-7748
 URL: https://issues.apache.org/jira/browse/KAFKA-7748
 Project: Kafka
  Issue Type: New Feature
  Components: streams
Affects Versions: 2.1.0
Reporter: Jonathan Gordon


Currently, Kafka Streams offers the ability to suppress intermediate events 
based on either RecordTime or WindowEndTime, which are in turn defined by 
stream time:

{{Suppressed.untilTimeLimit(final Duration timeToWaitForMoreEvents, final 
BufferConfig bufferConfig)}}

It would be helpful to have another option that would allow suppression of 
intermediate events based on wall clock time. This would allow us to only 
produce a limited number of aggregates independent of their stream time (which 
in our case is event time).
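
To make the request concrete, here is a minimal sketch of the behaviour being asked for: keep only the latest value per key and release it once a wall-clock interval has elapsed, regardless of record timestamps. The class WallClockSuppressorSketch and its methods are hypothetical and not part of Kafka Streams; the clock is injected so the logic can be exercised without waiting.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical illustration of wall-clock suppression; not a Kafka Streams API.
final class WallClockSuppressorSketch<K, V> {
    private final long intervalMs;
    private final LongSupplier wallClockMs;
    private final Map<K, V> latest = new HashMap<>();
    private final Map<K, Long> firstBufferedMs = new HashMap<>();

    WallClockSuppressorSketch(long intervalMs, LongSupplier wallClockMs) {
        this.intervalMs = intervalMs;
        this.wallClockMs = wallClockMs;
    }

    // Buffer the newest value and remember when the key was first buffered.
    void update(K key, V value) {
        latest.put(key, value);
        firstBufferedMs.putIfAbsent(key, wallClockMs.getAsLong());
    }

    // Emit and clear every key that has been buffered for at least intervalMs.
    List<Map.Entry<K, V>> poll() {
        long now = wallClockMs.getAsLong();
        List<Map.Entry<K, V>> out = new ArrayList<>();
        Iterator<Map.Entry<K, Long>> it = firstBufferedMs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<K, Long> e = it.next();
            if (now - e.getValue() >= intervalMs) {
                K key = e.getKey();
                out.add(new AbstractMap.SimpleEntry<>(key, latest.remove(key)));
                it.remove();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        WallClockSuppressorSketch<String, Integer> s =
                new WallClockSuppressorSketch<>(0L, System::currentTimeMillis);
        s.update("session-1", 1);
        s.update("session-1", 2); // overwrites the intermediate value
        System.out.println(s.poll());
    }
}
```

With this shape, the number of emitted aggregates per key is bounded by elapsed wall-clock time rather than by how far stream time advances.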

For reference, here's the relevant KIP:

[https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables#KIP-328:AbilitytosuppressupdatesforKTables-Best-effortratelimitperkey]

And here's the relevant Confluent Slack thread:

https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1544468349201700

 





[jira] [Updated] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0

2018-11-17 Thread Jonathan Gordon (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gordon updated KAFKA-7652:
---
Description: 
I'm creating this issue in response to [~guozhang]'s request on the mailing 
list:

[https://lists.apache.org/thread.html/97d620f4fd76be070ca4e2c70e2fda53cafe051e8fc4505dbcca0321@%3Cusers.kafka.apache.org%3E]

We are attempting to upgrade our Kafka Streams application from 0.10.2.1 but
are experiencing severe performance degradation. Most of the CPU time appears
to be spent retrieving from the local cache. Here's an example thread profile
with 0.11.0.0:

[https://i.imgur.com/l5VEsC2.png]

When things are running smoothly we're gated by retrieving from the state store 
with acceptable performance. Here's an example thread profile with 0.10.2.1:

[https://i.imgur.com/IHxC2cZ.png]

Some investigation suggests we're performing about 3 orders of magnitude more
lookups on the NamedCache over a comparable time period. I've attached the
NamedCache flush logs for 0.10.2.1 and 0.11.0.3.

We're using session windows and have the app configured for commit.interval.ms 
= 30 * 1000 and cache.max.bytes.buffering = 10485760
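
As a sketch, the two settings above could be expressed as plain properties like this. The property names and values are the ones given in this description; the class and method names are made up for illustration, and a real application would also need its other required settings, which are not shown.

```java
import java.util.Properties;

// Hypothetical holder for the two settings mentioned in this issue.
final class SessionAppConfigSketch {
    static Properties streamsProps() {
        Properties props = new Properties();
        // Commit every 30 seconds.
        props.put("commit.interval.ms", String.valueOf(30 * 1000));
        // 10485760 bytes = 10 * 1024 * 1024 of record cache across all threads.
        props.put("cache.max.bytes.buffering", String.valueOf(10485760));
        return props;
    }

    public static void main(String[] args) {
        streamsProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```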

I'm happy to share more details if they would be helpful. Also happy to run 
tests on our data.

I also found this issue, which seems like it may be related:

https://issues.apache.org/jira/browse/KAFKA-4904

 

  was:
Here's the original thread from the mailing list:

https://lists.apache.org/thread.html/97d620f4fd76be070ca4e2c70e2fda53cafe051e8fc4505dbcca0321@%3Cusers.kafka.apache.org%3E

We are attempting to upgrade our Kafka Streams application from 0.10.2.1 but 
experience a severe performance degradation. The highest amount of CPU time 
seems spent in retrieving from the local cache. Here's an example with 0.11.0.0:

[https://i.imgur.com/l5VEsC2.png]

When things are running smoothly we're gated by retrieving from the state store 
with acceptable performance. Here's an example with 0.10.2.1:

[https://i.imgur.com/IHxC2cZ.png]

Some investigation suggests we're performing about 3 orders of magnitude more
lookups on the NamedCache over a comparable time period. I've attached the
NamedCache flush logs for 0.10.2.1 and 0.11.0.3.

We're using session windows and have the app configured for commit.interval.ms 
of 30 * 1000 and cache.max.bytes.buffering = 10485760

I'm happy to share more details if they would be helpful. Also happy to run 
tests on our data.

 

 


> Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
> -
>
> Key: KAFKA-7652
> URL: https://issues.apache.org/jira/browse/KAFKA-7652
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0, 0.11.0.1, 0.11.0.2, 0.11.0.3, 1.1.1, 2.0.0, 
> 2.0.1
>Reporter: Jonathan Gordon
>Priority: Major
> Attachments: kafka_10_2_1_flushes.txt, kafka_11_0_3_flushes.txt
>
>
> I'm creating this issue in response to [~guozhang]'s request on the mailing 
> list:
> [https://lists.apache.org/thread.html/97d620f4fd76be070ca4e2c70e2fda53cafe051e8fc4505dbcca0321@%3Cusers.kafka.apache.org%3E]
> We are attempting to upgrade our Kafka Streams application from 0.10.2.1 but 
> experience a severe performance degradation. The highest amount of CPU time 
> seems spent in retrieving from the local cache. Here's an example thread 
> profile with 0.11.0.0:
> [https://i.imgur.com/l5VEsC2.png]
> When things are running smoothly we're gated by retrieving from the state 
> store with acceptable performance. Here's an example thread profile with 
> 0.10.2.1:
> [https://i.imgur.com/IHxC2cZ.png]
> Some investigation suggests we're performing about 3 orders of magnitude
> more lookups on the NamedCache over a comparable time period. I've attached
> the NamedCache flush logs for 0.10.2.1 and 0.11.0.3.
> We're using session windows and have the app configured for 
> commit.interval.ms = 30 * 1000 and cache.max.bytes.buffering = 10485760
> I'm happy to share more details if they would be helpful. Also happy to run 
> tests on our data.
> I also found this issue, which seems like it may be related:
> https://issues.apache.org/jira/browse/KAFKA-4904
>  





[jira] [Created] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0

2018-11-17 Thread Jonathan Gordon (JIRA)
Jonathan Gordon created KAFKA-7652:
--

 Summary: Kafka Streams Session store performance degradation from 
0.10.2.2 to 0.11.0.0
 Key: KAFKA-7652
 URL: https://issues.apache.org/jira/browse/KAFKA-7652
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.0.1, 2.0.0, 1.1.1, 0.11.0.3, 0.11.0.2, 0.11.0.1, 
0.11.0.0
Reporter: Jonathan Gordon
 Attachments: kafka_10_2_1_flushes.txt, kafka_11_0_3_flushes.txt

Here's the original thread from the mailing list:

https://lists.apache.org/thread.html/97d620f4fd76be070ca4e2c70e2fda53cafe051e8fc4505dbcca0321@%3Cusers.kafka.apache.org%3E

We are attempting to upgrade our Kafka Streams application from 0.10.2.1 but
are experiencing severe performance degradation. Most of the CPU time appears
to be spent retrieving from the local cache. Here's an example with 0.11.0.0:

[https://i.imgur.com/l5VEsC2.png]

When things are running smoothly we're gated by retrieving from the state store 
with acceptable performance. Here's an example with 0.10.2.1:

[https://i.imgur.com/IHxC2cZ.png]

Some investigation suggests we're performing about 3 orders of magnitude more
lookups on the NamedCache over a comparable time period. I've attached the
NamedCache flush logs for 0.10.2.1 and 0.11.0.3.

We're using session windows and have the app configured for commit.interval.ms 
of 30 * 1000 and cache.max.bytes.buffering = 10485760

I'm happy to share more details if they would be helpful. Also happy to run 
tests on our data.

 

 


