Re: [PR] MINOR: Change test logging capture to per-test, reducing jenkins truncation [kafka]
github-actions[bot] commented on PR #14795: URL: https://github.com/apache/kafka/pull/14795#issuecomment-1977907234 This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please ask a committer for review. If the PR has merge conflicts, please update it with the latest from trunk (or appropriate release branch). If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] MINOR: Change test logging capture to per-test, reducing jenkins truncation [kafka]
gharris1727 commented on PR #14795: URL: https://github.com/apache/kafka/pull/14795#issuecomment-1841693899 @mimaison I've raised a ticket to ask the Infra team what they think about this change: https://issues.apache.org/jira/browse/INFRA-25245
Re: [PR] MINOR: Change test logging capture to per-test, reducing jenkins truncation [kafka]
mimaison commented on PR #14795: URL: https://github.com/apache/kafka/pull/14795#issuecomment-1839170339 Thanks @gharris1727 for looking into this. That seems like an interesting option, and it would definitely help debugging. The only concern I have is about the retained size per build: x5 is a significant increase. Do you know how much space builds from other Apache projects use? That is, is >100MB going to make Kafka an outlier, or is it a pretty common/acceptable size?
Re: [PR] MINOR: Change test logging capture to per-test, reducing jenkins truncation [kafka]
gharris1727 commented on PR #14795: URL: https://github.com/apache/kafka/pull/14795#issuecomment-1832781742

A worthwhile follow-up may be to tackle the outliers that do experience truncation and try to reduce their log volume to a more reasonable level. The biggest offenders above 10MB appear to be:

Size | Test
-- | --
153.56MB | org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldWaitForMissingInputTopicsToBeCreated()
142.97MB | org.apache.kafka.streams.processor.internals.assignment.TaskAssignorConvergenceTest.randomClusterPerturbationsShouldConverge[enableRackAwareTaskAssignor=false]
103.84MB | org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testAlterSourceConnectorOffsetsExactlyOnceSupportEnabled
83.26MB | org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperatorTest.testThreadSafety
82.94MB | org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology()
66.77MB | kafka.coordinator.group.GroupCoordinatorConcurrencyTest.testConcurrentRandomSequence()
66.68MB | org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldAddNamedTopologyToRunningApplicationWithMultipleInitialNamedTopologies()
45.17MB | org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldAddToEmptyInitialTopologyRemoveResetOffsetsThenAddSameNamedTopologyWithRepartitioning()
41.76MB | org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldAllowRemovingAndAddingNamedTopologyToRunningApplicationWithMultipleNodesAndResetsOffsets()
39.36MB | org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldRemoveAndReplaceTopologicallyIncompatibleNamedTopology()
34.26MB | org.apache.kafka.streams.processor.internals.assignment.TaskAssignorConvergenceTest.randomClusterPerturbationsShouldConverge[enableRackAwareTaskAssignor=true]
29.29MB | org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest.testHighAvailabilityTaskAssignorLargeNumConsumers
28.11MB | org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldAllowPatternSubscriptionWithMultipleNamedTopologies()
24.25MB | org.apache.kafka.tools.MetadataQuorumCommandTest.testDescribeQuorumStatusSuccessful()[6]
22.99MB | org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testResetSourceConnectorOffsetsExactlyOnceSupportEnabled
21.63MB | org.apache.kafka.tools.reassign.ReassignPartitionsIntegrationTest.testAlterLogDirReassignmentThrottle(String)[1]
20.61MB | org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldBackOffTaskAndEmitDataWithinSameTopology()
20.39MB | org.apache.kafka.tools.reassign.ReassignPartitionsIntegrationTest.testLogDirReassignment(String)[1]
19.17MB | org.apache.kafka.streams.processor.internals.HandlingSourceTopicDeletionIntegrationTest.shouldThrowErrorAfterSourceTopicDeleted
17.75MB | org.apache.kafka.tools.MetadataQuorumCommandTest.testDescribeQuorumStatusSuccessful()[2]
17.32MB | org.apache.kafka.connect.integration.ConnectWorkerIntegrationTest.testBrokerCoordinator
16.30MB | org.apache.kafka.tools.MetadataQuorumCommandTest.testDescribeQuorumReplicationSuccessful()[2]
15.47MB | org.apache.kafka.tools.reassign.ReassignPartitionsIntegrationTest.testCancellation(String)[1]
13.77MB | org.apache.kafka.tools.reassign.ReassignPartitionsIntegrationTest.testThrottledReassignment(String)[1]
12.64MB | org.apache.kafka.tools.reassign.ReassignPartitionsIntegrationTest.testCancellation(String)[2]
12.46MB | org.apache.kafka.tools.MetadataQuorumCommandTest.testDescribeQuorumReplicationSuccessful()[6]
10.22MB | org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldAddNamedTopologyToRunningApplicationWithEmptyInitialTopology()
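The thread does not say how the per-test sizes above were measured. One possible way to produce a similar ranking is sketched below in Python. It assumes the JUnit XML reports were generated with per-test output enabled, so that `<system-out>` and `<system-err>` appear as children of each `<testcase>` element. The function name `per_test_output_sizes` and the sample report are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET


def per_test_output_sizes(report_xml: str) -> list[tuple[int, str]]:
    """Return (captured_bytes, test_name) pairs, largest first.

    Assumption: stdout/stderr are nested inside each <testcase>,
    which is the layout expected when output is captured per test.
    """
    suite = ET.fromstring(report_xml)
    sizes = []
    for case in suite.iter("testcase"):
        captured = 0
        for tag in ("system-out", "system-err"):
            for node in case.findall(tag):
                captured += len((node.text or "").encode("utf-8"))
        name = f"{case.get('classname')}.{case.get('name')}"
        sizes.append((captured, name))
    return sorted(sizes, reverse=True)


# Hypothetical two-test report used only to exercise the function.
SAMPLE_REPORT = """<testsuite name="example.SampleTest" tests="2">
  <testcase name="quietTest" classname="example.SampleTest" time="0.1"/>
  <testcase name="noisyTest" classname="example.SampleTest" time="0.2">
    <system-out><![CDATA[line one
line two
]]></system-out>
  </testcase>
</testsuite>"""

ranking = per_test_output_sizes(SAMPLE_REPORT)
for size, name in ranking:
    print(f"{size:>8} bytes | {name}")
```

Running the same function over every report file in a build and merging the results would give a table of the same shape as the one above.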
Re: [PR] MINOR: Change test logging capture to per-test, reducing jenkins truncation [kafka]
gharris1727 commented on PR #14795: URL: https://github.com/apache/kafka/pull/14795#issuecomment-1832757011

I did some statistics on the current state of log truncation in CI. I learned that:
* A full run of `./gradlew test` writes 1.73GB of logs
* 814 of 1532 test suites (53%) produce 0 logs

For the existing suite-level truncation:
* 1.71GB (98%) of these logs are discarded due to truncation
* 27MB (1.5%) of these logs are kept after truncation
* 175 of 1532 test suites (11%) experience truncation
* 5721 of 26298 tests (22%) are in test suites that experience truncation
* Test suites which produce logs average 37kb of logs after truncation

With the test-level truncation proposed here:
* 1.61GB (93%) of these logs are discarded due to test-level truncation
* 126MB (7%) of these logs are kept after truncation
* 452 of 26298 tests (1.7%) experience truncation
* Tests which produce logs average 15kb of logs after truncation

So, assuming a worst-case run with every test failing (as logs are only kept for failed tests) and log volume similar to successful runs, this change would cost 5 times (126MB/27MB) as much log storage space. However, any particular test would be **~12 times (5721/452) less likely to experience truncation**.

We don't regularly see fully-failed test suites, and instead typically see small numbers of test failures. If we assume test failures to be uniformly distributed among all tests (which they almost certainly aren't, but I don't have statistics for that) we can use averages to calculate the expected persisted logs per test failure. Since a test failure in suite-truncation keeps the truncated logs for the whole suite, each test failure adds on average 37kb of logs, or less if multiple tests in the same suite fail. Test failures under test-truncation only keep logs for the individual tests, which averages 15kb, and receives no discount for multiple test failures.
So, assuming a small number of test failures, which is typical, **the cost of storing these logs is roughly 2.5 times (37kb/15kb) lower**.

I believe that this change should not be harmful to the Jenkins test infrastructure, and will immediately benefit our ability to debug tests via CI, especially flaky failures.

@ijuma @mimaison @divijvaidya Could you take a look at this when you have a chance?
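The ratios in this comment can be rechecked with a few lines of arithmetic. The Python sketch below uses only the figures quoted in the comment; the variable names are illustrative and not taken from the PR.

```python
# Figures quoted in the comment (sizes in MB and kb, counts as given).
SUITE_LEVEL_KEPT_MB = 27.0      # logs kept under suite-level truncation
TEST_LEVEL_KEPT_MB = 126.0      # logs kept under test-level truncation
TESTS_IN_TRUNCATED_SUITES = 5721
TESTS_TRUNCATED_PER_TEST = 452
AVG_KB_PER_FAILING_SUITE = 37.0
AVG_KB_PER_FAILING_TEST = 15.0

# Worst case: every test fails, so everything that is kept gets stored.
worst_case_storage_factor = TEST_LEVEL_KEPT_MB / SUITE_LEVEL_KEPT_MB

# How much less often an individual test is affected by truncation.
truncation_reduction = TESTS_IN_TRUNCATED_SUITES / TESTS_TRUNCATED_PER_TEST

# Typical case: a few isolated failures, compared by average kept size.
typical_storage_saving = AVG_KB_PER_FAILING_SUITE / AVG_KB_PER_FAILING_TEST

print(f"worst-case storage factor: {worst_case_storage_factor:.2f}x")
print(f"truncation reduction:      {truncation_reduction:.2f}x")
print(f"typical storage saving:    {typical_storage_saving:.2f}x")
```

The three values come out at about 4.7, 12.7 and 2.5, which is consistent with the rounded "5 times" and "~12 times" figures and puts the typical-case saving at about 2.5 times.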
[PR] MINOR: Change test logging capture to per-test, reducing jenkins truncation [kafka]
gharris1727 opened a new pull request, #14795: URL: https://github.com/apache/kafka/pull/14795

Jenkins truncates stdout/stderr from tests which exceed 100,000 bytes. This truncation is computed once per suite, meaning that each suite gets a 100kb budget for logs, and suites that log too much have the middle of the log truncated. This unfairly discards complete logs for tests in the middle of the suite, while keeping logs from the beginning and end of the suite. If a failure occurs in a single test in the middle of a suite, the relevant logs may be completely elided, making investigation of the failure more difficult. This has made debugging with the CI logging almost completely ineffective, as the relevant logs are often swallowed by Jenkins, and irrelevant logs are shown.

Instead, we can enable this feature in the Gradle JUnitXmlReport: https://docs.gradle.org/current/javadoc/org/gradle/api/tasks/testing/JUnitXmlReport.html#setOutputPerTestCase-boolean- This changes the way that stdout/stderr is embedded in the XML report, separating the output for each test into different XML tags. This may enable Jenkins to perform truncation on a per-test basis, so that each test in a suite gets a fair distribution of the logging budget.

This could increase the size of the logs persisted by Jenkins, as each suite is currently capped at 100kb, but after this change you could receive N*100kb logs overall, if there are N tests in the suite. However, it appears that Jenkins cannot show the stdout for passing tests, so it probably isn't capturing it (I couldn't find a configuration which would confirm this). If this is true, the size of logs persisted will only increase for test failures, when the additional logs would be useful. This change may also reduce the total amount of logs captured, since logs from tests that passed won't be kept when another test in the same suite fails.
Regardless, the more effective usage of the logging budget will be beneficial even if the total amount of logs persisted increases.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
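To make the fairness argument concrete, here is a small Python model of the two truncation schemes. It is a sketch under stated assumptions, not Jenkins's actual implementation: the 100,000-byte limit comes from the description above, but the even head/tail split and the function names are assumptions made for illustration.

```python
LIMIT = 100_000  # bytes; the budget quoted in the PR description


def truncate_middle(data: bytes, limit: int = LIMIT) -> bytes:
    """Keep the head and tail of data, dropping the middle.

    Assumption: the budget is split evenly between head and tail.
    """
    if len(data) <= limit:
        return data
    head = limit // 2
    tail = limit - head
    return data[:head] + data[-tail:]


def kept_suite_level(test_logs: list[bytes], limit: int = LIMIT) -> list[int]:
    """Bytes of each test's log that survive when the suite shares one budget."""
    total = sum(len(log) for log in test_logs)
    if total <= limit:
        return [len(log) for log in test_logs]
    head_end = limit // 2                      # end of the retained head
    tail_start = total - (limit - head_end)    # start of the retained tail
    kept, offset = [], 0
    for log in test_logs:
        start, end = offset, offset + len(log)
        in_head = max(0, min(end, head_end) - start)
        in_tail = max(0, end - max(start, tail_start))
        kept.append(in_head + in_tail)
        offset = end
    return kept


def kept_test_level(test_logs: list[bytes], limit: int = LIMIT) -> list[int]:
    """Bytes of each test's log that survive when every test has its own budget."""
    return [len(truncate_middle(log, limit)) for log in test_logs]


# Three tests in one suite, each logging 80,000 bytes.
logs = [b"x" * 80_000] * 3
suite_result = kept_suite_level(logs)
test_result = kept_test_level(logs)
print("suite-level:", suite_result)
print("test-level: ", test_result)
```

In this model the middle test loses its entire log under the shared suite budget, while per-test budgets keep all three logs intact at the cost of storing more bytes overall.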