[ 
https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782175#comment-16782175
 ] 

Guozhang Wang commented on KAFKA-7652:
--------------------------------------

Hi [~jonathanpdx], we are voting on the 2.2.0 RC1 right now. If it is 
accepted then 2.2.0 is final and this PR will not be included; if it is 
cancelled we will see if we can push it into 2.2.0.

On the other hand, the PR I gave you is a bit hacky, as it was only meant to 
validate the root cause. I'd like to do a thorough profiling and see whether 
we should treat this as a general regression fix, not only for session stores 
but also for window stores. We will start the investigation right away, but in 
the worst case, if we cannot get a clean fix into 2.2.0, we will cut a 2.2.1 
release immediately for this purpose. In the meantime, I think it is safe for 
your application to turn off caching. In session-windowed aggregations, as 
long as your record timestamps are monotonically increasing and there is 
little out-of-order data, you will keep merging / expanding your sessions as 
you accept new data, which means you would not have many overwrites on the 
store that could be de-duplicated. If you see the downstream traffic increase 
by a lot when caching is not used, please let me know and we can look into 
that as well.
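
For reference, a minimal sketch of what turning off caching could look like, 
assuming the commit interval from the issue description is kept. Setting 
cache.max.bytes.buffering to 0 disables the record cache. The class name, 
application id, and broker address below are hypothetical placeholders, and 
plain string keys are used so the snippet has no Kafka dependency.

```java
import java.util.Properties;

public class CachingOffConfig {
    // Builds Streams properties with the record cache disabled.
    static Properties build() {
        Properties props = new Properties();
        // Hypothetical values; replace with your own application id and brokers.
        props.setProperty("application.id", "session-agg-app");
        props.setProperty("bootstrap.servers", "localhost:9092");
        // Same commit interval as in the issue description (30 seconds).
        props.setProperty("commit.interval.ms", String.valueOf(30 * 1000));
        // 0 bytes of cache: every update is forwarded downstream, no de-duplication.
        props.setProperty("cache.max.bytes.buffering", "0");
        return props;
    }

    public static void main(String[] args) {
        build().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The resulting Properties object would be passed to the KafkaStreams 
constructor in place of your current configuration.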

cc [~ableegoldman].

> Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-7652
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7652
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.11.0.0, 0.11.0.1, 0.11.0.2, 0.11.0.3, 1.1.1, 2.0.0, 
> 2.0.1
>            Reporter: Jonathan Gordon
>            Assignee: Guozhang Wang
>            Priority: Major
>              Labels: kip
>             Fix For: 2.2.0
>
>         Attachments: 0.10.2.1-NamedCache.txt, 2.2.0-rc0_b-NamedCache.txt, 
> 2.3.0-7652-NamedCache.txt, kafka_10_2_1_flushes.txt, kafka_11_0_3_flushes.txt
>
>
> I'm creating this issue in response to [~guozhang]'s request on the mailing 
> list:
> [https://lists.apache.org/thread.html/97d620f4fd76be070ca4e2c70e2fda53cafe051e8fc4505dbcca0321@%3Cusers.kafka.apache.org%3E]
> We are attempting to upgrade our Kafka Streams application from 0.10.2.1 but 
> are experiencing a severe performance degradation. Most of the CPU time 
> seems to be spent retrieving from the local cache. Here's an example thread 
> profile with 0.11.0.0:
> [https://i.imgur.com/l5VEsC2.png]
> When things are running smoothly we're gated by retrieving from the state 
> store with acceptable performance. Here's an example thread profile with 
> 0.10.2.1:
> [https://i.imgur.com/IHxC2cZ.png]
> Some investigation reveals that we're performing about 3 orders of 
> magnitude more lookups on the NamedCache over a comparable time period. I've 
> attached the NamedCache flush logs for 0.10.2.1 and 0.11.0.3.
> We're using session windows and have the app configured for 
> commit.interval.ms = 30 * 1000 and cache.max.bytes.buffering = 10485760
> I'm happy to share more details if they would be helpful. Also happy to run 
> tests on our data.
> I also found this issue, which seems like it may be related:
> https://issues.apache.org/jira/browse/KAFKA-4904
>  
> KIP-420: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-420%3A+Add+Single+Value+Fetch+in+Session+Stores]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
