David Arthur created KAFKA-18921:
------------------------------------
Summary: Low CPU utilization during streams integration tests
Key: KAFKA-18921
URL: https://issues.apache.org/jira/browse/KAFKA-18921
Project: Kafka
Issue Type: Improvement
Components: build, streams
Reporter: David Arthur
Attachments: image-2025-03-04-17-05-48-768.png
While analyzing some build performance, I noticed that our CPU is not well
utilized during the streams integration tests. We should investigate to make
sure we're not needlessly waiting during the tests.
Here is an example from this build scan:
https://develocity.apache.org/s/rqe4e2vh5jr72/timeline?page=1&sort=longest
!image-2025-03-04-17-05-48-768.png!
Notice that the ":streams:integration-tests:test" worker runs longer than
the other workers, and during that time the CPU is poorly utilized.
This suggests that the tests are in some idle/waiting state.
I tried to reproduce this locally and found a pause of roughly 40 seconds in a random test:
[2025-03-04 16:56:11,606] INFO [Consumer
clientId=consumer-f2eaa005-b1e1-406f-8c52-e50c71b4d807-58,
groupId=f2eaa005-b1e1-406f-8c52-e50c71b4d807] Resetting offset for partition
outputTopic_3-0 to position FetchPosition{offset=0,
offsetEpoch=Optional.empty,
currentLeader=LeaderAndEpoch{leader=Optional[localhost:59515 (id: 0 rack: null
isFenced: false)], epoch=0}}.
(org.apache.kafka.clients.consumer.internals.SubscriptionState:447)
[2025-03-04 16:56:55,214] INFO [Consumer
clientId=testAutoOffsetId-a426e1fb-61ad-4ce8-92ca-f0fe337c2565-StreamThread-1-consumer,
groupId=testAutoOffsetId] Successfully joined group with generation
Generation{generationId=4,
memberId='testAutoOffsetId-a426e1fb-61ad-4ce8-92ca-f0fe337c2565-StreamThread-1-consumer-2f798603-d538-4041-a90d-2e1b922dd5af',
protocol='stream'}
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:666)
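To put a number on the gap between the two log lines above, here is a small sketch that parses the two timestamps and prints the difference. The class and method names (LogGap, gapMillis) are made up for illustration and are not part of the Kafka code base.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class LogGap {
    // Timestamp layout seen in the log excerpt, e.g. "2025-03-04 16:56:11,606".
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Hypothetical helper: milliseconds elapsed between two log timestamps.
    static long gapMillis(String earlier, String later) {
        LocalDateTime a = LocalDateTime.parse(earlier, FMT);
        LocalDateTime b = LocalDateTime.parse(later, FMT);
        return Duration.between(a, b).toMillis();
    }

    public static void main(String[] args) {
        // The two timestamps are copied from the log lines in this report.
        long gap = gapMillis("2025-03-04 16:56:11,606", "2025-03-04 16:56:55,214");
        System.out.println("gap ms: " + gap); // prints "gap ms: 43608"
    }
}
```

With these two timestamps the gap comes out to about 43.6 seconds, which matches the roughly 40-second pause described above.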
I grabbed a thread dump:
at org.apache.kafka.streams.integration.utils.IntegrationTestUtils.readRecords(IntegrationTestUtils.java:1177)
at org.apache.kafka.streams.integration.utils.IntegrationTestUtils.readKeyValues(IntegrationTestUtils.java:1136)
at org.apache.kafka.streams.integration.utils.IntegrationTestUtils.lambda$waitUntilMinKeyValueRecordsReceived$4(IntegrationTestUtils.java:710)
at org.apache.kafka.streams.integration.utils.IntegrationTestUtils$$Lambda$3338/0x00000070018f4fb0.call(Unknown Source)
at org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:483)
at org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:451)
at org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:708)
at org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:681)
at org.apache.kafka.streams.integration.FineGrainedAutoResetIntegrationTest.shouldResetByDuration(FineGrainedAutoResetIntegrationTest.java:334)
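The stack above sits in a retry-until-timeout helper that repeatedly reads records. As a rough illustration of why such a pattern can leave the CPU idle, here is a minimal sketch of a generic retry loop. It is not the actual TestUtils.retryOnExceptionWithTimeout implementation; the names RetrySketch and retryWithTimeout, the Supplier-based signature, and the fixed sleep between attempts are all assumptions made for this example.

```java
import java.util.function.Supplier;

public class RetrySketch {
    // Hypothetical helper: call `attempt` until it succeeds or `timeoutMs` elapses.
    // Between failed attempts the thread sleeps, so it consumes almost no CPU.
    static <T> T retryWithTimeout(long timeoutMs, long pollIntervalMs, Supplier<T> attempt) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            try {
                // In an integration test this call may itself block,
                // for example on a consumer poll that is waiting for records.
                return attempt.get();
            } catch (RuntimeException e) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    throw e; // out of time: surface the last failure
                }
                try {
                    Thread.sleep(Math.min(pollIntervalMs, remaining));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated condition that only becomes true on the third attempt.
        String result = retryWithTimeout(2000, 100, () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("records not available yet");
            }
            return "ok";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In a loop like this, wall-clock time is spent either blocked inside the attempt or sleeping between attempts, which would be consistent with the low CPU utilization in the build scan. Whether that is what happens in this particular test still needs to be confirmed.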
--
This message was sent by Atlassian Jira
(v8.20.10#820010)