Ibuki Kaji created KAFKA-19973:
----------------------------------

             Summary: Improve CI performance
                 Key: KAFKA-19973
                 URL: https://issues.apache.org/jira/browse/KAFKA-19973
             Project: Kafka
          Issue Type: Improvement
            Reporter: Ibuki Kaji


h3. Problem

The GitHub Actions CI workflow takes approximately 1.5 hours to complete, 
primarily due to the JUnit test jobs ('JUnit tests Java 17/25'), according to the 
[GitHub Actions performance metrics|https://github.com/apache/kafka/actions/metrics/performance].

h4. Current Configuration
 * Test parallelism:
 ** [Currently set 
to|https://github.com/apache/kafka/blob/25da7051785b35e7097ee41b430f212e7eafb2f4/.github/actions/run-gradle/action.yml#L94]
 
{code:java}
-PmaxParallelForks=4{code}

 * Runner specifications:
 ** Using 
[`ubuntu-latest`|https://docs.github.com/en/actions/reference/runners/github-hosted-runners#standard-github-hosted-runners-for-public-repositories]
 runners with 4-core CPU and 16GB RAM

h4. Note on root cause analysis

Since GitHub Actions does not provide CPU/memory metrics during job execution, 
it would be ideal to investigate the root cause using tools like [Workflow
Telemetry|https://github.com/marketplace/actions/workflow-telemetry].

However, if using third-party actions is not feasible for this project, we 
could try the potential solutions below as a starting point to improve CI 
performance.

h3. Potential Solutions
h4. Option 1: Increase Gradle parallel forks

Increase {{maxParallelForks}} above the current value of 4, for example:
{code:java}
-PmaxParallelForks=8  # or 10, 12, ...
{code}
 * Pros:
 ** No cost increase (same runner, higher parallelism)
 ** Many Kafka tests are likely I/O-bound (not yet confirmed with runner metrics), 
so parallelism beyond the CPU core count may be beneficial
 ** Easy to implement and test
 * Cons:
 ** Potential memory pressure (needs monitoring)
 ** Slightly increased risk of test interference
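
As a rough way to reason about the trade-off in the pros and cons above, here is a minimal Python sketch. It applies the generic thread-pool sizing heuristic (cores x (1 + wait/compute)) and a simple worst-case memory check. The function names, the wait/compute ratios, and the per-fork heap and overhead figures are all hypothetical placeholders, not values taken from the Kafka build. Real numbers would have to come from measurements on the runner.

```python
import math


def suggested_forks(cores: int, wait_to_compute_ratio: float) -> int:
    """Generic sizing heuristic: cores * (1 + wait/compute).

    A ratio of 0 models purely CPU-bound tests (one fork per core);
    a ratio of 1 models tests that wait on I/O as long as they compute.
    """
    if cores < 1 or wait_to_compute_ratio < 0:
        raise ValueError("cores must be >= 1 and ratio must be >= 0")
    return math.floor(cores * (1 + wait_to_compute_ratio))


def fits_in_memory(forks: int, heap_gb_per_fork: float,
                   overhead_gb_per_fork: float, runner_ram_gb: float,
                   reserved_gb: float = 2.0) -> bool:
    """Return True if the worst-case footprint of all forks fits in RAM.

    reserved_gb is an assumed allowance for the OS and the Gradle daemon.
    """
    worst_case_gb = forks * (heap_gb_per_fork + overhead_gb_per_fork)
    return worst_case_gb <= runner_ram_gb - reserved_gb


if __name__ == "__main__":
    # 4 cores and 16 GB match the runner described in this issue;
    # the 1.0 GB heap and 0.5 GB overhead per fork are assumptions.
    for ratio in (0.0, 1.0, 2.0):
        forks = suggested_forks(cores=4, wait_to_compute_ratio=ratio)
        ok = fits_in_memory(forks, heap_gb_per_fork=1.0,
                            overhead_gb_per_fork=0.5, runner_ram_gb=16)
        print(f"wait/compute={ratio}: forks={forks}, fits in 16 GB: {ok}")
```

Under these assumed figures, 8 forks would still fit in 16 GB while 12 would not, which is why the memory side needs to be measured rather than guessed before going much beyond 8.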

h4. Option 2: Use larger GitHub runners

 Switch from `ubuntu-latest` runners to [larger 
runners|https://docs.github.com/en/actions/reference/runners/github-hosted-runners#larger-runners].
 * Pros:
 ** More resources for stable parallel execution
 ** Can further increase {{maxParallelForks}} safely
 * Cons:
 ** Increased GitHub Actions billing costs
 ** Need to verify available quota for Apache projects

 

I'm willing to implement and test Option 1 (increasing 
{{maxParallelForks}}) if the community thinks this is a reasonable approach.

However, since external contributors require approval for each CI run, it might 
be more efficient for a maintainer with CI execution permissions to experiment 
with different parallelism values to find the optimal configuration.

Please let me know your thoughts on these options and the best way to proceed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
