Ibuki Kaji created KAFKA-19973:
----------------------------------
Summary: Improve CI performance
Key: KAFKA-19973
URL: https://issues.apache.org/jira/browse/KAFKA-19973
Project: Kafka
Issue Type: Improvement
Reporter: Ibuki Kaji
h3. Problem
The GitHub Actions CI workflow takes approximately 1.5 hours to complete,
primarily because of the JUnit test jobs ('JUnit tests Java 17/25'), according to
the [GitHub Actions performance metrics|https://github.com/apache/kafka/actions/metrics/performance].
h4. Current Configuration
* Test parallelism:
** [Currently set to|https://github.com/apache/kafka/blob/25da7051785b35e7097ee41b430f212e7eafb2f4/.github/actions/run-gradle/action.yml#L94]:
{code:bash}
-PmaxParallelForks=4
{code}
* Runner specifications:
** Using
[ubuntu-latest|https://docs.github.com/en/actions/reference/runners/github-hosted-runners#standard-github-hosted-runners-for-public-repositories]
runners with a 4-core CPU and 16 GB of RAM
h4. Note on root cause analysis
Since GitHub Actions does not provide CPU/memory metrics during job execution,
it would be ideal to investigate the root cause using tools like [Workflow
Telemetry|https://github.com/marketplace/actions/workflow-telemetry].
However, if using third-party actions is not feasible for this project, we
could try the potential solutions below as a starting point to improve CI
performance.
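If a third-party action cannot be used, one possible fallback is a small sampler started in the background before the Gradle step, so that load and memory use show up in the job log. The sketch below is only an illustration and is not part of the current workflow. It assumes a Linux runner that exposes {{/proc/meminfo}} with a {{MemAvailable}} field; the function names and the 30-second interval are hypothetical.

```python
import os
import time


def parse_meminfo(text):
    """Turn the text of /proc/meminfo into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def used_fraction(mem):
    """Fraction of RAM in use, based on MemTotal and MemAvailable."""
    return 1.0 - mem["MemAvailable"] / mem["MemTotal"]


def log_samples(count, interval_s=30, meminfo_path="/proc/meminfo"):
    """Print `count` samples of the 1-minute load average and memory use."""
    for i in range(count):
        with open(meminfo_path) as f:
            mem = parse_meminfo(f.read())
        load1 = os.getloadavg()[0]
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} load1={load1:.2f} mem_used={used_fraction(mem):.0%}", flush=True)
        if i + 1 < count:
            time.sleep(interval_s)
```

Calling {{log_samples(180)}} would cover roughly 90 minutes at the default interval. Memory use that stays close to 100% would be a hint that adding more forks on the same runner is risky.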
h3. Potential Solutions
h4. Option 1: Increase Gradle parallel forks
Increase {{maxParallelForks}} above its current value of 4, for example:
{code:bash}
-PmaxParallelForks=8 # or 10, 12, ...
{code}
* Pros:
** No cost increase (same runner, higher parallelism)
** Many Kafka tests are likely I/O-bound (this still needs to be confirmed by
measurement), so parallelism beyond the CPU core count may be beneficial
** Easy to implement and test
* Cons:
** Potential memory pressure (needs monitoring)
** Slightly increased risk of test interference
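The memory-pressure risk can be estimated with simple arithmetic before running anything. The sketch below computes a worst-case bound on the number of forks that fit in RAM. The 2 GB heap per fork, the 3 GB reserved for the OS and the Gradle daemon, and the 25% non-heap overhead are all hypothetical placeholders; the real values would have to come from the Kafka build configuration and from measurements.

```python
def max_forks_by_memory(total_ram_gb, reserved_gb, heap_per_fork_gb, overhead_factor=1.25):
    """Worst-case number of test JVMs that fit in RAM if every fork fills its heap."""
    per_fork_gb = heap_per_fork_gb * overhead_factor
    return int((total_ram_gb - reserved_gb) // per_fork_gb)


# Hypothetical inputs for a 16 GB runner; none of these are measured values.
print(max_forks_by_memory(16, 3, 2))  # 5
print(max_forks_by_memory(16, 3, 1))  # 10
```

Because forks rarely fill their heaps at the same time, this is a conservative bound. It mainly shows which values of {{maxParallelForks}} deserve closer monitoring.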
h4. Option 2: Use larger GitHub runners
Switch from {{ubuntu-latest}} runners to [larger
runners|https://docs.github.com/en/actions/reference/runners/github-hosted-runners#larger-runners].
* Pros:
** More resources for stable parallel execution
** Can further increase {{maxParallelForks}} safely
* Cons:
** Increased GitHub Actions billing costs
** Need to verify available quota for Apache projects
I'm willing to implement and test Option 1 (increasing
{{maxParallelForks}}) if the community thinks this is a reasonable approach.
However, since external contributors require approval for each CI run, it might
be more efficient for a maintainer with CI execution permissions to experiment
with different parallelism values to find the optimal configuration.
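Such an experiment could be scripted so that each candidate value is timed the same way. The following is a minimal sketch and not the project's CI logic: the {{./gradlew test}} invocation and the candidate values are assumptions, and the helper names are hypothetical. {{--rerun-tasks}} is included so that Gradle does not skip the tests as up-to-date on repeated local runs.

```python
import subprocess
import time


def build_command(forks):
    """Gradle invocation for one run; the `test` task is an assumed stand-in."""
    return ["./gradlew", "test", "--rerun-tasks", f"-PmaxParallelForks={forks}"]


def time_command(cmd):
    """Run a command and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    result = subprocess.run(cmd)
    return result.returncode, time.monotonic() - start


def sweep(candidates, runner=time_command):
    """Time one run per candidate value and return {forks: (exit_code, seconds)}."""
    return {forks: runner(build_command(forks)) for forks in candidates}
```

For example, {{sweep([4, 8, 12])}} run from the Kafka source root would produce one timing per value. Each value should be run more than once, since single runs on shared runners tend to be noisy.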
Please let me know your thoughts on these options and the best way to proceed.