Re: [PR] MINOR: Change test logging capture to per-test, reducing jenkins truncation [kafka]

2024-03-04 Thread via GitHub


github-actions[bot] commented on PR #14795:
URL: https://github.com/apache/kafka/pull/14795#issuecomment-1977907234

   This PR is being marked as stale since it has not had any activity in 90 
days. If you would like to keep this PR alive, please ask a committer for 
review. If the PR has merge conflicts, please update it with the latest from 
trunk (or the appropriate release branch). If this PR is no longer valid or 
desired, please feel free to close it. If no activity occurs in the next 30 
days, it will be automatically closed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] MINOR: Change test logging capture to per-test, reducing jenkins truncation [kafka]

2023-12-05 Thread via GitHub


gharris1727 commented on PR #14795:
URL: https://github.com/apache/kafka/pull/14795#issuecomment-1841693899

   @mimaison I've raised a ticket to ask the Infra team what they think about 
this change: https://issues.apache.org/jira/browse/INFRA-25245





Re: [PR] MINOR: Change test logging capture to per-test, reducing jenkins truncation [kafka]

2023-12-04 Thread via GitHub


mimaison commented on PR #14795:
URL: https://github.com/apache/kafka/pull/14795#issuecomment-1839170339

   Thanks @gharris1727 for looking into this. That seems an interesting option 
and it would definitely help debugging.
   The only concern I have is about the retained size per build. x5 is a 
significant increase. Do you know how much space builds from other Apache 
projects use? I.e., is >100MB going to make Kafka an outlier, or is it a pretty 
common/acceptable size?





Re: [PR] MINOR: Change test logging capture to per-test, reducing jenkins truncation [kafka]

2023-11-29 Thread via GitHub


gharris1727 commented on PR #14795:
URL: https://github.com/apache/kafka/pull/14795#issuecomment-1832781742

   A worthwhile follow-up would be to tackle the outliers that do experience 
truncation and try to reduce their log volume to a more reasonable level.
   The biggest offenders, each above 10MB, appear to be:
   
   Size | Test
   -- | --
   153.56MB | org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldWaitForMissingInputTopicsToBeCreated()
   142.97MB | org.apache.kafka.streams.processor.internals.assignment.TaskAssignorConvergenceTest.randomClusterPerturbationsShouldConverge[enableRackAwareTaskAssignor=false]
   103.84MB | org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testAlterSourceConnectorOffsetsExactlyOnceSupportEnabled
   83.26MB | org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperatorTest.testThreadSafety
   82.94MB | org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology()
   66.77MB | kafka.coordinator.group.GroupCoordinatorConcurrencyTest.testConcurrentRandomSequence()
   66.68MB | org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldAddNamedTopologyToRunningApplicationWithMultipleInitialNamedTopologies()
   45.17MB | org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldAddToEmptyInitialTopologyRemoveResetOffsetsThenAddSameNamedTopologyWithRepartitioning()
   41.76MB | org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldAllowRemovingAndAddingNamedTopologyToRunningApplicationWithMultipleNodesAndResetsOffsets()
   39.36MB | org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldRemoveAndReplaceTopologicallyIncompatibleNamedTopology()
   34.26MB | org.apache.kafka.streams.processor.internals.assignment.TaskAssignorConvergenceTest.randomClusterPerturbationsShouldConverge[enableRackAwareTaskAssignor=true]
   29.29MB | org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest.testHighAvailabilityTaskAssignorLargeNumConsumers
   28.11MB | org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldAllowPatternSubscriptionWithMultipleNamedTopologies()
   24.25MB | org.apache.kafka.tools.MetadataQuorumCommandTest.testDescribeQuorumStatusSuccessful()[6]
   22.99MB | org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testResetSourceConnectorOffsetsExactlyOnceSupportEnabled
   21.63MB | org.apache.kafka.tools.reassign.ReassignPartitionsIntegrationTest.testAlterLogDirReassignmentThrottle(String)[1]
   20.61MB | org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldBackOffTaskAndEmitDataWithinSameTopology()
   20.39MB | org.apache.kafka.tools.reassign.ReassignPartitionsIntegrationTest.testLogDirReassignment(String)[1]
   19.17MB | org.apache.kafka.streams.processor.internals.HandlingSourceTopicDeletionIntegrationTest.shouldThrowErrorAfterSourceTopicDeleted
   17.75MB | org.apache.kafka.tools.MetadataQuorumCommandTest.testDescribeQuorumStatusSuccessful()[2]
   17.32MB | org.apache.kafka.connect.integration.ConnectWorkerIntegrationTest.testBrokerCoordinator
   16.30MB | org.apache.kafka.tools.MetadataQuorumCommandTest.testDescribeQuorumReplicationSuccessful()[2]
   15.47MB | org.apache.kafka.tools.reassign.ReassignPartitionsIntegrationTest.testCancellation(String)[1]
   13.77MB | org.apache.kafka.tools.reassign.ReassignPartitionsIntegrationTest.testThrottledReassignment(String)[1]
   12.64MB | org.apache.kafka.tools.reassign.ReassignPartitionsIntegrationTest.testCancellation(String)[2]
   12.46MB | org.apache.kafka.tools.MetadataQuorumCommandTest.testDescribeQuorumReplicationSuccessful()[6]
   10.22MB | org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldAddNamedTopologyToRunningApplicationWithEmptyInitialTopology()
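
The comment does not say how these sizes were measured. As a rough sketch of one way to find such outliers, the Python below sums the captured stdout/stderr per test case in a JUnit XML report. It assumes the per-test-case layout, where each `<testcase>` element carries its own `<system-out>`/`<system-err>` children. The sample report, the function names, and the threshold are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, tiny stand-in for a real JUnit XML report file.
SAMPLE_REPORT = """<testsuite name="example.SuiteTest" tests="2">
  <testcase name="quietTest" classname="example.SuiteTest" time="0.1">
    <system-out>ok</system-out>
  </testcase>
  <testcase name="noisyTest" classname="example.SuiteTest" time="0.2">
    <system-out>aaaaaaaaaa</system-out>
    <system-err>bbbbb</system-err>
  </testcase>
</testsuite>"""


def log_bytes_per_test(report_xml):
    """Map 'classname.name' to the UTF-8 byte count of its captured output."""
    root = ET.fromstring(report_xml)
    sizes = {}
    for case in root.iter("testcase"):
        name = f"{case.get('classname')}.{case.get('name')}"
        total = 0
        for tag in ("system-out", "system-err"):
            for node in case.findall(tag):
                total += len((node.text or "").encode("utf-8"))
        sizes[name] = total
    return sizes


def biggest_offenders(sizes, threshold):
    """Return (size, test) pairs above the threshold, largest first."""
    return sorted(((s, n) for n, s in sizes.items() if s > threshold), reverse=True)


sizes = log_bytes_per_test(SAMPLE_REPORT)
for size, test in biggest_offenders(sizes, threshold=10):
    print(f"{size} bytes | {test}")
```

For a real build, the same functions would be applied to each report file and the threshold raised to something like 10MB.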
   
   
   





Re: [PR] MINOR: Change test logging capture to per-test, reducing jenkins truncation [kafka]

2023-11-29 Thread via GitHub


gharris1727 commented on PR #14795:
URL: https://github.com/apache/kafka/pull/14795#issuecomment-1832757011

   I did some statistics on the current state of log truncation in CI. I 
learned that:
   
   * A full run of `./gradlew test` writes 1.73GB of logs
   * 814 of 1532 test suites (53%) produce 0 logs
   
   For the existing suite-level truncation:
   * 1.71GB (98%) of these logs are discarded due to truncation
   * 27MB (1.5%) of these logs are kept after truncation
   * 175 of 1532 test suites (11%) experience truncation
   * 5721 of 26298 tests (22%) are in test suites that experience truncation
   * Test suites which produce logs average 37kb of logs after truncation
   
   With the test-level truncation proposed here:
   * 1.61GB (93%) of these logs are discarded due to test-level truncation
   * 126MB (7%) of these logs are kept after truncation
   * 452 of 26298 tests (1.7%) experience truncation
   * Tests which produce logs average 15kb of logs after truncation
   
   So, assuming a worst-case run with every test failing (as logs are only kept 
for failed tests) and log volume similar to successful runs, this change would 
cost about 5 times (126MB/27MB) as much log storage space. However, any 
particular test would be **~12 times (5721/452) less likely to experience 
truncation**.
   
   We don't regularly see fully-failed test suites, and instead typically see 
small numbers of test failures. If we assume test failures are uniformly 
distributed among all tests (which they almost certainly aren't, but I don't 
have statistics for that), we can use averages to calculate the expected 
persisted logs per test failure. Since a test failure under suite-level 
truncation keeps the truncated logs for the whole suite, each test failure adds 
on average 37kb of logs, or less if multiple tests in the same suite fail. A 
test failure under test-level truncation only keeps logs for the individual 
test, which averages 15kb, and receives no discount for multiple test failures. 
So assuming a small number of test failures, which is typical, **the cost of 
storing these logs is roughly 2.5 times (37kb/15kb) lower**.
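
The three ratios quoted above can be rechecked with a few lines of Python. This only repeats the arithmetic on the figures reported in this comment; the variable names are made up for the sketch.

```python
# Figures copied from the statistics in this comment.
tests_in_truncated_suites = 5721   # tests affected by suite-level truncation
tests_truncated_per_test = 452     # tests affected by test-level truncation
kept_suite_level_mb = 27           # logs kept after suite-level truncation
kept_test_level_mb = 126           # logs kept after test-level truncation
avg_suite_level_kb = 37            # average kept per logging suite
avg_test_level_kb = 15             # average kept per logging test

# Worst case (every test fails): how much more storage test-level capture needs.
worst_case_storage_ratio = kept_test_level_mb / kept_suite_level_mb

# How much less often a given test has its logs truncated.
truncation_likelihood_ratio = tests_in_truncated_suites / tests_truncated_per_test

# Typical case (a few isolated failures): storage saved per failing test.
per_failure_cost_ratio = avg_suite_level_kb / avg_test_level_kb

print(f"worst-case storage:    x{worst_case_storage_ratio:.2f}")
print(f"truncation likelihood: /{truncation_likelihood_ratio:.2f}")
print(f"per-failure cost:      /{per_failure_cost_ratio:.2f}")
```

The per-failure ratio works out to a little under 2.5, and the worst-case storage ratio to a little under 5.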
   
   I believe that this change should not be harmful to the Jenkins test 
infrastructure, and will immediately benefit our ability to debug tests via CI, 
especially flaky failures.
   
   @ijuma @mimaison @divijvaidya Could you take a look at this when you have a 
chance?





[PR] MINOR: Change test logging capture to per-test, reducing jenkins truncation [kafka]

2023-11-17 Thread via GitHub


gharris1727 opened a new pull request, #14795:
URL: https://github.com/apache/kafka/pull/14795

   Jenkins truncates test stdout/stderr that exceeds 100,000 bytes. This 
truncation is computed once per suite, meaning that each suite gets a 100kb 
budget for logs, and suites that log too much have the middle of the log 
truncated. This unfairly discards complete logs for tests in the middle of the 
suite, while keeping logs from the beginning and end of the suite.
   
   If a failure occurs in a single test in the middle of a suite, the relevant 
logs may be completely elided, making investigation of the failure more 
difficult. This has made debugging with the CI logging almost completely 
ineffective, as the relevant logs are often swallowed by Jenkins, and 
irrelevant logs are shown.
   
   Instead, we can enable the `outputPerTestCase` option in the Gradle 
JUnitXmlReport: 
https://docs.gradle.org/current/javadoc/org/gradle/api/tasks/testing/JUnitXmlReport.html#setOutputPerTestCase-boolean-
   This changes the way that stdout/stderr is embedded in the XML report, 
separating the output for each test into different XML tags. This may enable 
Jenkins to perform truncation on a per-test basis, so that each test in a suite 
gets a fair distribution of the logging budget.
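
To illustrate the difference, here is a minimal Python model of the two truncation schemes. It is a sketch under stated assumptions, not Jenkins' actual implementation: it assumes the budget is split evenly between the head and the tail of the log and that no truncation marker is inserted. The function names are hypothetical.

```python
LIMIT = 100_000  # byte budget described above


def truncate_middle(log, limit=LIMIT):
    """Keep the head and tail of an over-budget log, dropping the middle.

    Assumption: the budget is split evenly between head and tail.
    """
    if len(log) <= limit:
        return log
    head = limit // 2
    tail = limit - head
    return log[:head] + log[len(log) - tail:]


def per_suite(test_logs):
    """Suite-level: one budget shared by the concatenated logs of all tests."""
    return truncate_middle(b"".join(test_logs))


def per_test(test_logs):
    """Test-level: each test gets its own budget."""
    return [truncate_middle(log) for log in test_logs]


# Three tests in one suite, each logging 60,000 bytes.
logs = [b"A" * 60_000, b"B" * 60_000, b"C" * 60_000]
suite_view = per_suite(logs)
test_view = per_test(logs)

print("bytes of middle test kept, suite-level:", suite_view.count(b"B"))
print("bytes of middle test kept, test-level: ", len(test_view[1]))
```

In this model the middle test loses all of its output under suite-level truncation, while under test-level truncation every test stays below the budget and is kept whole.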
   
   This could increase the size of the logs persisted by Jenkins, as each suite 
is currently capped at 100kb, but after this change you could receive N*100kb 
of logs overall, if there are N tests in the suite. However, it appears that 
Jenkins cannot show the stdout for passing tests, so it probably isn't 
capturing it (I couldn't find a configuration which would confirm this). If 
this is true, the size of logs persisted will only increase for test failures, 
when the additional logs would be useful.
   
   This change may also reduce the total amount of logs captured, since logs 
from tests that passed won't be kept when another test in the same suite fails. 
Regardless, the more effective usage of the logging budget will be beneficial 
even if the total amount of logs persisted increases.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   

