gharris1727 commented on PR #14795: URL: https://github.com/apache/kafka/pull/14795#issuecomment-1832757011
I did some statistics on the current state of log truncation in CI. I learned that:

* A full run of `./gradlew test` writes 1.73GB of logs
* 814 of 1532 test suites (53%) produce 0 logs

For the existing suite-level truncation:

* 1.71GB (98%) of these logs are discarded due to truncation
* 27MB (1.5%) of these logs are kept after truncation
* 175 of 1532 test suites (11%) experience truncation
* 5721 of 26298 tests (22%) are in test suites that experience truncation
* Test suites which produce logs average 37kb of logs after truncation

With the test-level truncation proposed here:

* 1.61GB (93%) of these logs are discarded due to test-level truncation
* 126MB (7%) of these logs are kept after truncation
* 452 of 26298 tests (1.7%) experience truncation
* Tests which produce logs average 15kb of logs after truncation

So, assuming a worst-case run with every test failing (as logs are only kept for failed tests) and log volume similar to successful runs, this change would cost roughly 5 times (126MB/27MB) as much log storage space. However, any particular test would be **~13 times (5721/452) less likely to experience truncation**.

We don't regularly see fully-failed test suites, and instead typically see small numbers of test failures. If we assume test failures to be uniformly distributed among all tests (which they almost certainly aren't, but I don't have statistics for that), we can use averages to calculate the expected persisted logs per test failure.

Under suite-level truncation, a test failure keeps the truncated logs for the whole suite, so each test failure adds on average 37kb of logs, or less if multiple tests in the same suite fail. Under test-level truncation, a test failure keeps logs only for the individual test, which averages 15kb, and there is no discount for multiple failures in the same suite. So, assuming the typical case of a small number of test failures, **the cost of storing these logs is roughly 2.5 times (37kb/15kb) lower**.
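The ratios quoted above can be re-derived with a few lines of arithmetic. The following is only a sketch: the constants are copied from the figures in this comment, 1GB is assumed to be 1000MB, and the helper names (`summarize`, `expected_kept_kb`) are hypothetical and not part of the PR. `expected_kept_kb` additionally assumes every failure lands in a different suite, which is the upper bound for suite-level truncation.

```python
# Figures copied from the comment; sizes in MB/kb as reported there.
SUITE_KEPT_MB = 27.0               # kept after suite-level truncation
TEST_KEPT_MB = 126.0               # kept after test-level truncation
TESTS_IN_TRUNCATED_SUITES = 5721   # tests affected by suite-level truncation
TESTS_TRUNCATED = 452              # tests affected by test-level truncation
SUITE_AVG_KB = 37.0                # average kept per log-producing suite
TEST_AVG_KB = 15.0                 # average kept per log-producing test


def summarize():
    """Return the three ratios discussed in the comment."""
    return {
        # Worst case: every test fails, so all kept logs are stored.
        "worst_case_storage_ratio": TEST_KEPT_MB / SUITE_KEPT_MB,
        # How much less often a given test sees truncated logs.
        "truncation_likelihood_ratio": TESTS_IN_TRUNCATED_SUITES / TESTS_TRUNCATED,
        # Typical case: storage per isolated failure, suite-level vs test-level.
        "per_failure_storage_ratio": SUITE_AVG_KB / TEST_AVG_KB,
    }


def expected_kept_kb(failures):
    """Expected kept logs (suite-level, test-level) in kb for `failures`
    failed tests, assuming each failure is in a distinct suite."""
    return failures * SUITE_AVG_KB, failures * TEST_AVG_KB


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
    print(expected_kept_kb(3))
```

With these inputs the three ratios come out to roughly 4.7, 12.7, and 2.5. When several failures share a suite, the suite-level figure from `expected_kept_kb` overstates the real cost, so the per-failure advantage of test-level truncation shrinks in that case.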
I believe that this change should not be harmful to the Jenkins test infrastructure, and will immediately benefit our ability to debug tests via CI, especially flaky failures.

@ijuma @mimaison @divijvaidya Could you take a look at this when you have a chance?