[jira] [Commented] (GROOVY-10307) Groovy 4 runtime performance on average 2.4x slower than Groovy 3

ASF GitHub Bot (Jira) Wed, 04 Feb 2026 12:35:12 -0800


    [ 
https://issues.apache.org/jira/browse/GROOVY-10307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18056534#comment-18056534
 ]


ASF GitHub Bot commented on GROOVY-10307:
-----------------------------------------

jamesfredley opened a new pull request, #2377:
URL: https://github.com/apache/groovy/pull/2377

   ## Summary
   
   Based on https://github.com/apache/groovy/pull/2374, but applied to 
master (Groovy 6) instead of 
[GROOVY_4_0_X](https://github.com/apache/groovy/tree/GROOVY_4_0_X).
   
   This PR improves invokedynamic performance reported in GROOVY-10307. The 
optimization reduces the performance impact of metaclass changes on call sites 
by replacing the global SwitchPoint invalidation mechanism with targeted 
per-call-site cache invalidation.
   
   ## Problem
   
   When any metaclass changes in Groovy, the global `SwitchPoint` is 
invalidated, causing **all** invokedynamic call sites across the entire 
application to fall back and re-link. This creates significant overhead in 
applications that frequently modify metaclasses (e.g., Grails applications with 
dynamic finders, runtime mixins, etc.).
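
The effect can be reproduced with the JDK's `java.lang.invoke.SwitchPoint` alone. The following is a minimal sketch, not Groovy's actual linking code: the class `GlobalSwitchPointDemo` and its `fast`/`relink` methods are hypothetical stand-ins for a linked target and the re-link fallback. It shows that two unrelated call sites guarded by one shared `SwitchPoint` both drop to their fallback after a single invalidation.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

// Hypothetical demo class, not Groovy source: two call sites share one SwitchPoint.
final class GlobalSwitchPointDemo {
    static String fast(String site) { return "fast:" + site; }     // stands in for a linked target
    static String relink(String site) { return "relink:" + site; } // stands in for the fallback path

    /** Returns the results of both sites before and after one global invalidation. */
    static String[] run() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(String.class, String.class);
            MethodHandle fast = lookup.findStatic(GlobalSwitchPointDemo.class, "fast", type);
            MethodHandle relink = lookup.findStatic(GlobalSwitchPointDemo.class, "relink", type);

            // One SwitchPoint guards every call site, as in the global scheme.
            SwitchPoint global = new SwitchPoint();
            MethodHandle siteA = global.guardWithTest(fast, relink);
            MethodHandle siteB = global.guardWithTest(fast, relink);

            String beforeA = (String) siteA.invokeExact("A");
            String beforeB = (String) siteB.invokeExact("B");

            // A single invalidation (e.g. triggered by any metaclass change) hits both sites.
            SwitchPoint.invalidateAll(new SwitchPoint[] { global });

            String afterA = (String) siteA.invokeExact("A");
            String afterB = (String) siteB.invokeExact("B");
            return new String[] { beforeA, beforeB, afterA, afterB };
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", run()));
    }
}
```

In this sketch neither site had anything to do with the change that triggered the invalidation, yet both take the fallback path afterwards.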
   
   ## Solution
   
   This PR implements a more targeted invalidation strategy:
   
   1. **Disable global SwitchPoint guard by default** - The 
`SwitchPoint.guardWithTest()` wrapper is now optional and disabled by default. 
This prevents mass invalidation of all call sites when any metaclass changes.
   
   2. **Track all call sites via WeakReference set** - All `CacheableCallSite` 
instances are registered in a concurrent set using weak references, allowing 
targeted invalidation without preventing garbage collection.
   
   3. **Add `clearCache()` method to CacheableCallSite** - When a metaclass 
changes, we can now clear the LRU cache and reset the fallback count on 
specific call sites rather than invalidating everything.
   
   4. **Targeted invalidation on metaclass change** - 
`invalidateSwitchPoints()` now iterates through registered call sites, clearing 
caches and resetting targets as needed.
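
The registry idea in steps 2 to 4 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: `SketchCallSite` and `SketchCallSiteRegistry` are hypothetical classes, the LRU cache is modeled with an access-ordered `LinkedHashMap`, and only the names `ALL_CALL_SITES`, `registerCallSite` and `clearCache` are taken from the description above.

```java
import java.lang.ref.WeakReference;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for CacheableCallSite: only the cache and fallback count are modeled.
final class SketchCallSite {
    private final Map<String, Object> lruCache = new LinkedHashMap<>(16, 0.75f, true);
    private long fallbackCount;

    synchronized void remember(String receiverClass, Object handle) {
        lruCache.put(receiverClass, handle);
        fallbackCount++;
    }

    synchronized int cachedEntries() { return lruCache.size(); }

    synchronized long fallbackCount() { return fallbackCount; }

    // Clears the cache and resets the fallback count, as step 3 describes.
    synchronized void clearCache() {
        lruCache.clear();
        fallbackCount = 0;
    }
}

// Hypothetical registry: a concurrent set of weak references, so tracking never pins a call site.
final class SketchCallSiteRegistry {
    private static final Set<WeakReference<SketchCallSite>> ALL_CALL_SITES =
            ConcurrentHashMap.newKeySet();

    static void registerCallSite(SketchCallSite site) {
        ALL_CALL_SITES.add(new WeakReference<>(site));
    }

    /** Clears every live call site, prunes collected ones, and returns how many were cleared. */
    static int invalidateAll() {
        int cleared = 0;
        for (Iterator<WeakReference<SketchCallSite>> it = ALL_CALL_SITES.iterator(); it.hasNext();) {
            SketchCallSite site = it.next().get();
            if (site == null) {
                it.remove(); // referent was garbage collected
            } else {
                site.clearCache();
                cleared++;
            }
        }
        return cleared;
    }

    public static void main(String[] args) {
        SketchCallSite site = new SketchCallSite();
        registerCallSite(site);
        site.remember("java.lang.String", new Object());
        System.out.println("cleared=" + invalidateAll() + ", entries=" + site.cachedEntries());
    }
}
```

Resetting the call site target, which step 4 also mentions, is left out of the sketch because it depends on Groovy's internal default target handles.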
   
   ## Changes
   
   ### `CacheableCallSite.java`
   - Added `clearCache()` method to clear LRU cache and reset fallback count
   
   ### `IndyInterface.java`
   - Added `ALL_CALL_SITES` WeakReference set to track all call sites
   - Added `registerCallSite(CacheableCallSite)` method
   - Modified `invalidateSwitchPoints()` to clear caches on all registered call 
sites
   - Register call sites during bootstrap
   
   ### `Selector.java`
   - Added `INDY_SWITCHPOINT_GUARD` system property flag (default: `false`)
   - Made the SwitchPoint guard conditional based on the flag
   
   ### `JarJarTask.groovy` (unrelated build fix)
   - Changed `@InputFiles @Classpath` to `@Input` on `untouchedFiles` field
   - Fixes Windows build issue where glob patterns containing `*` were treated 
as literal file paths
   
   ## Configuration
   
   The SwitchPoint guard behavior can be controlled via system property:
   
   ```bash
   # Use new targeted invalidation (default)
   java -jar myapp.jar
   
   # Revert to old global SwitchPoint behavior
   java -Dgroovy.indy.switchpoint.guard=true -jar myapp.jar
   ```
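
For readers unfamiliar with this kind of switch, a boolean system property that defaults to `false` can be read as in the sketch below. The class `SwitchPointGuardFlag` is hypothetical, and how `Selector.java` actually reads and caches `INDY_SWITCHPOINT_GUARD` may differ; only the property name comes from the commands above.

```java
// Hypothetical helper, not Groovy source: reads the guard flag with a default of false.
final class SwitchPointGuardFlag {
    static final String PROPERTY = "groovy.indy.switchpoint.guard";

    /** True only when the property is set to "true" (case-insensitive); false when unset. */
    static boolean guardEnabled() {
        return Boolean.parseBoolean(System.getProperty(PROPERTY, "false"));
    }

    public static void main(String[] args) {
        System.out.println(PROPERTY + " enabled: " + guardEnabled());
    }
}
```

A real implementation would more likely read the value once into a `static final` constant, so the flag has to be set on the command line before the class is loaded.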
   
   ## Benchmark Results
   
   Tested using a dedicated benchmark suite measuring metaclass invalidation 
impact: https://github.com/jamesfredley/groovy-indy-performance
   
   ## Complete Benchmark Comparison (3-Run Averages)
   
   **Test Date:** February 4, 2026  
   **Test Machine:** Windows 11, 20 cores, 4GB max heap  
   **Java Version:** 17.0.18 (Amazon Corretto)
   
   ### Versions Tested
   
   | Version | Description |
   |---------|-------------|
   | **4.0.30** | Groovy 4.0.30 from Maven Central (baseline) |
   | **6-snapshot** | Groovy 6.0.0-SNAPSHOT clean master (no optimizations) |
   | **6-snapshot-opt** | Groovy 6.0.0-SNAPSHOT with this PR's optimizations |
   
   ### Optimizations Applied in 6-snapshot-opt
   
   1. **Disabled global SwitchPoint guard** - `INDY_SWITCHPOINT_GUARD` defaults 
to `false`
   2. **Call site registry** - Track all call sites via `WeakReference` set
   3. **Cache invalidation** - Clear individual call site caches on metaclass 
change
   4. **Target reset** - Reset call site targets to default on invalidation
   
   ---
   
   ## 🎯 KEY METRIC: Metaclass Invalidation Stress Test
   
   This test measures the performance impact when metaclass changes occur 
during execution. Each ratio is the run time with metaclass changes divided by 
the baseline run time without them.
   Lower ratio = better (less performance degradation from metaclass changes).
   
   | Metric | 4.0.30 | 6-snapshot | 6-snapshot-opt |
   |--------|--------|------------|----------------|
   | Run 1 | 72.66x | 83.16x | **67.33x** |
   | Run 2 | 103.63x | 77.54x | **76.90x** |
   | Run 3 | 107.92x | 103.71x | **71.26x** |
   | **Average** | 94.74x | 88.14x | **71.83x** |
   | Baseline (no changes) | **5.61 ms** | 5.84 ms | 6.02 ms |
   | With metaclass changes | 515.11 ms | 505.83 ms | **431.28 ms** |
   
   ### Key Finding
   
   **6-snapshot-opt reduces metaclass invalidation impact by:**
   - **19% vs 6-snapshot** (71.83x vs 88.14x)
   - **24% vs 4.0.30** (71.83x vs 94.74x)
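
The arithmetic behind these percentages can be checked with a few lines. The sketch below uses a hypothetical class `InvalidationImpact`, takes the per-run ratios and averages from the table above, and computes the relative reduction as `1 - improved / baseline`. That comes out at roughly 18.5% and 24.2%, consistent with the rounded figures quoted.

```java
// Hypothetical helper for re-deriving the reported percentages from the table values.
final class InvalidationImpact {
    static double mean(double... values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Relative reduction of 'improved' versus 'baseline', in percent. */
    static double reductionPercent(double baseline, double improved) {
        return (1.0 - improved / baseline) * 100.0;
    }

    public static void main(String[] args) {
        // Per-run ratios for 6-snapshot-opt, copied from the table above.
        double optAverage = mean(67.33, 76.90, 71.26);
        System.out.printf("6-snapshot-opt average ratio: %.2fx%n", optAverage);
        System.out.printf("vs 6-snapshot (88.14x): %.1f%% lower%n", reductionPercent(88.14, optAverage));
        System.out.printf("vs 4.0.30 (94.74x): %.1f%% lower%n", reductionPercent(94.74, optAverage));
    }
}
```

Note that the averages are means of the per-run ratios, not the ratio of the averaged timings, so dividing 431.28 ms by 6.02 ms gives a slightly different number.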
   
   ---
   
   ## Comprehensive Benchmark Suite (3-Run Averages)
   
   | Benchmark | 4.0.30 | 6-snapshot | 6-snapshot-opt |
   |-----------|--------|------------|----------------|
   | **Loop Benchmarks** | | | |
   | Loop: each + toString | **31.72 ms** | 49.23 ms | 50.37 ms |
   | Loop: collect | **52.12 ms** | 75.74 ms | 75.80 ms |
   | Loop: findAll | **113.37 ms** | 146.15 ms | 147.52 ms |
   | **Method Invocation** | | | |
   | Method: simple instance | **8.81 ms** | 27.53 ms | 27.74 ms |
   | Method: with params | **10.33 ms** | **27.49 ms** | 29.32 ms |
   | Method: static | **7.54 ms** | 26.49 ms | 26.56 ms |
   | Method: polymorphic | **1.86 s** | 2.03 s | **1.86 s** |
   | **Closures** | | | |
   | Closure: creation + call | **24.93 ms** | 34.87 ms | 34.27 ms |
   | Closure: reused | **20.59 ms** | 27.70 ms | 27.85 ms |
   | Closure: nested | 39.16 ms | **37.46 ms** | 38.27 ms |
   | Closure: curried | **159.16 ms** | 191.60 ms | 192.33 ms |
   | **Properties** | | | |
   | Property: read/write | **19.62 ms** | 71.62 ms | 73.83 ms |
   | **Collections** | | | |
   | Collection: each | **108.84 ms** | 127.27 ms | 128.85 ms |
   | Collection: collect | **119.86 ms** | 138.99 ms | **138.79 ms** |
   | Collection: inject | **133.79 ms** | 153.63 ms | 155.63 ms |
   | **GStrings** | | | |
   | GString: simple | **101.42 ms** | 118.48 ms | **118.47 ms** |
   | GString: multi-value | **114.82 ms** | 130.68 ms | **129.68 ms** |
   | **Call Site Performance** | | | |
   | Monomorphic call site | 124.68 ms | 117.98 ms | **117.21 ms** |
   | Polymorphic call site | 3.73 s | 3.79 s | **3.56 s** |
   
   ---
   
   ## Closure Benchmark Suite (3-Run Averages)
   
   | Benchmark | 4.0.30 | 6-snapshot | 6-snapshot-opt |
   |-----------|--------|------------|----------------|
   | Simple closure creation | 29.71 ms | **14.86 ms** | 16.20 ms |
   | Closure reuse | 19.25 ms | **10.08 ms** | 10.99 ms |
   | Multi-param closure | 38.58 ms | **19.25 ms** | 21.48 ms |
   | Closure with capture | 19.60 ms | **7.48 ms** | 7.66 ms |
   | Closure modify capture | 12.87 ms | 4.73 ms | **4.29 ms** |
   | Closure delegation | 27.94 ms | 27.97 ms | **25.87 ms** |
   | Nested closures | 58.24 ms | 26.16 ms | **25.84 ms** |
   | Curried closure | **294.13 ms** | 337.07 ms | 332.51 ms |
   | Closure composition | **59.20 ms** | 70.86 ms | 73.05 ms |
   | Closure spread | **2.05 s** | 2.66 s | 2.66 s |
   | Closure.call() | 22.32 ms | **8.56 ms** | 8.99 ms |
   | Closure trampoline | **53.48 ms** | 60.26 ms | 59.67 ms |
   
   ---
   
   ## Loop Benchmark Suite (3-Run Averages)
   
   | Benchmark | 4.0.30 | 6-snapshot | 6-snapshot-opt |
   |-----------|--------|------------|----------------|
   | Original: each + toString | **45.54 ms** | 48.94 ms | 49.51 ms |
   | Simple: each only | **39.93 ms** | 47.65 ms | 48.95 ms |
   | Closure call | 19.54 ms | **2.58 ms** | 2.74 ms |
   | Method call | **5.32 ms** | 6.02 ms | 5.33 ms |
   | Nested loops | **74.47 ms** | 79.96 ms | 79.68 ms |
   | Loop with collect | **88.51 ms** | 106.90 ms | 105.39 ms |
   | Loop with findAll | **202.83 ms** | 233.18 ms | 232.28 ms |
   
   ---
   
   ## Method Invocation Benchmark Suite (3-Run Averages)
   
   | Benchmark | 4.0.30 | 6-snapshot | 6-snapshot-opt |
   |-----------|--------|------------|----------------|
   | Simple instance method | 7.69 ms | 6.17 ms | **5.75 ms** |
   | Method with parameters | 7.89 ms | **7.85 ms** | 8.08 ms |
   | Method with object param | 10.89 ms | **10.61 ms** | 10.64 ms |
   | Static method | **3.20 ms** | 3.42 ms | 3.48 ms |
   | Static method with params | 7.87 ms | 8.16 ms | **7.67 ms** |
   | Interface method | **3.24 ms** | 3.81 ms | 3.69 ms |
   | Dynamic typed calls | **3.26 ms** | **3.26 ms** | 3.31 ms |
   | Property access | **21.50 ms** | N/A | N/A |
   | GString method | **192.53 ms** | N/A | N/A |
   
   ---
   
   ## Raw Data: Individual Run Results
   
   ### Metaclass Invalidation Ratios
   
   | Run | 4.0.30 | 6-snapshot | 6-snapshot-opt |
   |-----|--------|------------|----------------|
   | 1 | 72.66x | 83.16x | 67.33x |
   | 2 | 103.63x | 77.54x | 76.90x |
   | 3 | 107.92x | 103.71x | 71.26x |
   
   ### With Metaclass Changes (ms)
   
   | Run | 4.0.30 | 6-snapshot | 6-snapshot-opt |
   |-----|--------|------------|----------------|
   | 1 | 517.31 | 515.21 | 430.71 |
   | 2 | 508.25 | 504.47 | 429.30 |
   | 3 | 519.77 | 497.80 | 433.84 |
   
   ### Baseline (No Metaclass Changes) (ms)
   
   | Run | 4.0.30 | 6-snapshot | 6-snapshot-opt |
   |-----|--------|------------|----------------|
   | 1 | 7.12 | 6.20 | 6.40 |
   | 2 | 4.90 | 6.51 | 5.58 |
   | 3 | 4.82 | 4.80 | 6.09 |
   
   
   
   
   ## Related
   
   - JIRA: https://issues.apache.org/jira/browse/GROOVY-10307
   - Original PR: https://github.com/apache/groovy/pull/2374
   - Test project: https://github.com/jamesfredley/groovy-indy-performance
   - Since this PR is against Groovy 6, the Grails 7 test project will not run
   




> Groovy 4 runtime performance on average 2.4x slower than Groovy 3
> -----------------------------------------------------------------
>
>                 Key: GROOVY-10307
>                 URL: https://issues.apache.org/jira/browse/GROOVY-10307
>             Project: Groovy
>          Issue Type: Bug
>          Components: bytecode, performance
>    Affects Versions: 4.0.0-beta-1, 3.0.9
>         Environment: OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 
> (build 11.0.11+9)
> OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed mode)
> WIN10 (tests) / REL 8 (web application)
> IntelliJ 2021.2 
>            Reporter: mgroovy
>            Priority: Major
>         Attachments: groovy_3_0_9_gc.png, groovy_3_0_9_loop2.png, 
> groovy_3_0_9_loop4.png, groovy_3_0_9_mem.png, groovy_4_0_0_b1_loop2.png, 
> groovy_4_0_0_b1_loop4.png, groovy_4_0_0_b1_loop4_gc.png, 
> groovy_4_0_0_b1_loop4_mem.png, 
> groovysql_performance_groovy4_2_xx_yy_zzzz.groovy, loops.groovy, 
> profile3.txt, profile4-loops.txt, profile4.txt, profile4d.txt
>
>
> Groovy 4.0.0-beta-1 runtime performance in our framework is on average 2 to 3 
> times slower compared to using Groovy 3.0.9 (regular i.e. non-INDY)
> * Our complete framework and application code is completely written in 
> Groovy, spread over multiple IntelliJ modules
> ** mixed @CompileDynamic/@TypeChecked and @CompileStatic
> ** No Java classes left in project, i.e. no cross compilation occurs
> * We build using IntelliJ 2021.2 Groovy build process, then run / deploy the 
> compiled class files
> ** We do _not_ use a Groovy based DSL, nor do we execute Groovy scripts 
> during execution
> * Performance degradation when using Groovy 4.0.0-beta-1 instead of Groovy 
> 3.0.9 (non-INDY):
> ** The performance of the largest of our web applications has dropped 3x 
> (startup) / 2x (table refresh) respectively
> *** Stack: Tomcat/Vaadin/Ebean plus framework generated SQL
> ** Our test suite runs about 2.4 times as long as before (120 min when using 
> G4, compared to about 50 min with G3)
> *** JUnit 5 
> *** test suite also contains no scripts / dynamic code execution
> *** Individual test performance varies: A small number of tests runs faster, 
> but the majority is slower, with some extreme cases taking nearly 10x as long 
> to finish
> * Using Groovy 3.0.9 INDY displays nearly identical performance degradation, 
> so it seems that the use of invoke dynamic is somehow at fault



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
