[jira] [Commented] (KAFKA-3769) KStream job spending 60% of time writing metrics
[ https://issues.apache.org/jira/browse/KAFKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15468374#comment-15468374 ]

Greg Fodor commented on KAFKA-3769:
-----------------------------------

It's just the sensor calls inside of Selector, not Kafka Streams specific. I'll verify as much as I can from the profiler snapshot that it's the same issue and will open a jira.

> KStream job spending 60% of time writing metrics
> ------------------------------------------------
>
>                 Key: KAFKA-3769
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3769
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.0.0
>            Reporter: Greg Fodor
>            Assignee: Guozhang Wang
>            Priority: Critical
>             Fix For: 0.10.1.0
>
> I've been profiling a complex streams job, and found two major hotspots when
> writing metrics, which take up about 60% of the CPU time of the job. (!) A PR
> is attached.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (KAFKA-3769) KStream job spending 60% of time writing metrics
[ https://issues.apache.org/jira/browse/KAFKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15468157#comment-15468157 ]

Guozhang Wang commented on KAFKA-3769:
--------------------------------------

Yes please :) Also, is it a general issue in the {{Selector}} class, or an issue specific to Kafka Streams?
[jira] [Commented] (KAFKA-3769) KStream job spending 60% of time writing metrics
[ https://issues.apache.org/jira/browse/KAFKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15463861#comment-15463861 ]

Greg Fodor commented on KAFKA-3769:
-----------------------------------

I've done some additional profiling and found that, in complex Kafka Streams jobs, this problem also seems to crop up within the Kafka core Selector class. Should I open another JIRA?
[jira] [Commented] (KAFKA-3769) KStream job spending 60% of time writing metrics
[ https://issues.apache.org/jira/browse/KAFKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423136#comment-15423136 ]

ASF GitHub Bot commented on KAFKA-3769:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/kafka/pull/1530
[jira] [Commented] (KAFKA-3769) KStream job spending 60% of time writing metrics
[ https://issues.apache.org/jira/browse/KAFKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340675#comment-15340675 ]

ASF GitHub Bot commented on KAFKA-3769:
---------------------------------------

Github user guozhangwang closed the pull request at:

    https://github.com/apache/kafka/pull/1490
[jira] [Commented] (KAFKA-3769) KStream job spending 60% of time writing metrics
[ https://issues.apache.org/jira/browse/KAFKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340672#comment-15340672 ]

ASF GitHub Bot commented on KAFKA-3769:
---------------------------------------

GitHub user guozhangwang opened a pull request:

    https://github.com/apache/kafka/pull/1530

    KAFKA-3769: Create new sensors per-thread in KafkaStreams

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/guozhangwang/kafka K3769-per-thread-metrics

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/1530.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1530

commit 8a8efa52bc5538ba933f85a0cdb9196d4a222e10
Author: Guozhang Wang
Date:   2016-06-20T23:02:34Z

    use per-thread sensor names

commit 25859f4b5121e19602741809c18864d99a84574f
Author: Guozhang Wang
Date:   2016-06-20T23:02:40Z

    Merge branch 'trunk' of https://git-wip-us.apache.org/repos/asf/kafka into K3769-per-thread-metrics

commit 5fae56123fabc0568a99c27ba5b2e792b9f6a685
Author: Guozhang Wang
Date:   2016-06-20T23:29:47Z

    remove unused imports
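The pull request title ("Create new sensors per-thread in KafkaStreams") and the first commit ("use per-thread sensor names") suggest giving each stream thread its own sensors instead of sharing one set. The diff itself is not part of this message, so the following is only a rough sketch of that general idea. `PerThreadSensors` and its methods are hypothetical names, and a plain `LongAdder` stands in for a real Kafka `Sensor`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical registry for illustration only; this is NOT Kafka's Metrics class.
final class PerThreadSensors {
    // Sensors keyed by a thread-scoped name; LongAdder stands in for a real sensor.
    private final ConcurrentMap<String, LongAdder> sensors = new ConcurrentHashMap<>();

    // Builds a sensor name scoped to one thread, e.g. "StreamThread-1.process-latency".
    static String sensorName(String threadName, String metric) {
        return threadName + "." + metric;
    }

    // Returns the calling thread's own sensor for this metric, creating it on first use.
    LongAdder sensorFor(String metric) {
        String name = sensorName(Thread.currentThread().getName(), metric);
        return sensors.computeIfAbsent(name, k -> new LongAdder());
    }

    // Number of distinct (thread, metric) sensors registered so far.
    int sensorCount() {
        return sensors.size();
    }
}
```

With this layout, two threads that record the same metric update different objects, so they do not contend on a shared sensor. The trade-off is more registered metrics, one set per thread.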
[jira] [Commented] (KAFKA-3769) KStream job spending 60% of time writing metrics
[ https://issues.apache.org/jira/browse/KAFKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326778#comment-15326778 ]

Greg Fodor commented on KAFKA-3769:
-----------------------------------

Discussion/resolution moved to: https://issues.apache.org/jira/browse/KAFKA-3811
[jira] [Commented] (KAFKA-3769) KStream job spending 60% of time writing metrics
[ https://issues.apache.org/jira/browse/KAFKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325148#comment-15325148 ]

ASF GitHub Bot commented on KAFKA-3769:
---------------------------------------

GitHub user guozhangwang opened a pull request:

    https://github.com/apache/kafka/pull/1490

    KAFKA-3769: Optimize metrics recording overhead

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/guozhangwang/kafka K3769-optimize-metrics

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/1490.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1490

commit 68c889e1cba59280e4c1a37007efcdec6c784878
Author: Guozhang Wang
Date:   2016-06-10T17:50:29Z

    reduce time.milliseconds

commit ad5403c919fe6ee9e5c5e7e3d9c9f2534a5a3717
Author: Guozhang Wang
Date:   2016-06-10T19:38:05Z

    use milliseconds instead of nanoseconds for state store metrics
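The first commit title ("reduce time.milliseconds") suggests cutting the number of clock reads on the hot path. What the commit actually changes is not shown in this message, so the following is only a generic sketch of that idea: read the clock once and reuse the timestamp for a whole batch of metric updates. `ClockReadBatching` and `Meter` are hypothetical names, not classes from the pull request or from Kafka.

```java
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical illustration only; this is NOT code from the pull request.
final class ClockReadBatching {
    // Hypothetical stand-in for a metric that wants a timestamp with every update.
    static final class Meter {
        long lastUpdateMs;
        long total;

        void record(long value, long nowMs) {
            total += value;
            lastUpdateMs = nowMs;
        }
    }

    // Baseline: one clock read per recorded value.
    static void recordEach(List<Long> values, Meter meter, LongSupplier clock) {
        for (long v : values) {
            meter.record(v, clock.getAsLong());
        }
    }

    // Batched: one clock read per batch, reused for every value in it.
    static void recordBatch(List<Long> values, Meter meter, LongSupplier clock) {
        long nowMs = clock.getAsLong();
        for (long v : values) {
            meter.record(v, nowMs);
        }
    }
}
```

In real use the clock argument would be something like `System::currentTimeMillis`. Passing it in as a `LongSupplier` is a choice made here so the number of reads can be counted in a test. The cost of batching is that timestamps are only as fresh as the start of the batch.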
[jira] [Commented] (KAFKA-3769) KStream job spending 60% of time writing metrics
[ https://issues.apache.org/jira/browse/KAFKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313369#comment-15313369 ]

ASF GitHub Bot commented on KAFKA-3769:
---------------------------------------

Github user gfodor closed the pull request at:

    https://github.com/apache/kafka/pull/1447
[jira] [Commented] (KAFKA-3769) KStream job spending 60% of time writing metrics
[ https://issues.apache.org/jira/browse/KAFKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313295#comment-15313295 ]

Guozhang Wang commented on KAFKA-3769:
--------------------------------------

Hello Greg, I will continue the discussion in your PR.
[jira] [Commented] (KAFKA-3769) KStream job spending 60% of time writing metrics
[ https://issues.apache.org/jira/browse/KAFKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311179#comment-15311179 ]

Greg Fodor commented on KAFKA-3769:
-----------------------------------

Thanks Jay! Guozhang, instead of trying to reduce the granularity of the metrics, what are your thoughts on having a way to just disable the process/latency metrics collection? I'm still pretty new to KStreams and haven't used these metrics, but I'm guessing they will be used for occasionally tuning the job against production data rather than for operational monitoring. (I could be wrong about this.) If so, it seems you may want a switch that you flip when running in production to disable the metrics and maximize the throughput of the job, and then turn back on selectively when you want to measure performance.
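The switch proposed in this comment could look roughly like the following. This is a minimal sketch under stated assumptions: `ToggleableLatencyRecorder` and its methods are hypothetical names, not part of Kafka's metrics API, and a pair of `LongAdder` counters stands in for real latency sensors. The point it illustrates is that, while recording is disabled, the hot path skips both the clock reads and the metric update.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; this is NOT a Kafka class.
final class ToggleableLatencyRecorder {
    // Volatile so a flip from another thread becomes visible to the hot path.
    private volatile boolean enabled;
    private final LongAdder totalNanos = new LongAdder();
    private final LongAdder count = new LongAdder();

    ToggleableLatencyRecorder(boolean enabled) {
        this.enabled = enabled;
    }

    void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }

    // Runs the operation and measures it only while recording is enabled.
    <T> T record(Supplier<T> operation) {
        if (!enabled) {
            return operation.get(); // no clock read, no metric update
        }
        long start = System.nanoTime();
        try {
            return operation.get();
        } finally {
            totalNanos.add(System.nanoTime() - start);
            count.increment();
        }
    }

    // Number of operations that were actually measured.
    long recordedCount() {
        return count.sum();
    }

    // Total measured time in nanoseconds.
    long totalNanos() {
        return totalNanos.sum();
    }
}
```

A job would wrap its hot operations, for example a state store fetch, in `record(...)` and leave the recorder disabled in production, enabling it only for a tuning session. The remaining cost when disabled is one volatile read per call.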
[jira] [Commented] (KAFKA-3769) KStream job spending 60% of time writing metrics
[ https://issues.apache.org/jira/browse/KAFKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309288#comment-15309288 ]

Jay Kreps commented on KAFKA-3769:
----------------------------------

Nice catch.
[jira] [Commented] (KAFKA-3769) KStream job spending 60% of time writing metrics
[ https://issues.apache.org/jira/browse/KAFKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306381#comment-15306381 ]

Greg Fodor commented on KAFKA-3769:
-----------------------------------

Consider the PR a first pass. Please advise on how we may want to deal with the fact that, for KStream jobs with lots of tasks etc., the overhead of writing the various process/poll/latency metrics is immense.
[jira] [Commented] (KAFKA-3769) KStream job spending 60% of time writing metrics
[ https://issues.apache.org/jira/browse/KAFKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306321#comment-15306321 ]

Greg Fodor commented on KAFKA-3769:
-----------------------------------

It seems it might be desirable to have a way to just flip off some or all of the metrics.
[jira] [Commented] (KAFKA-3769) KStream job spending 60% of time writing metrics
[ https://issues.apache.org/jira/browse/KAFKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306318#comment-15306318 ]

Greg Fodor commented on KAFKA-3769:
-----------------------------------

Additionally, it looks like the code path for fetching from RocksDB spends most of its time recording the latency metrics :(
[jira] [Commented] (KAFKA-3769) KStream job spending 60% of time writing metrics
[ https://issues.apache.org/jira/browse/KAFKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306306#comment-15306306 ]

Greg Fodor commented on KAFKA-3769:
-----------------------------------

https://github.com/apache/kafka/pull/1447