[
https://issues.apache.org/jira/browse/GROOVY-10307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18056534#comment-18056534
]
ASF GitHub Bot commented on GROOVY-10307:
-----------------------------------------
jamesfredley opened a new pull request, #2377:
URL: https://github.com/apache/groovy/pull/2377
## Summary
Based on https://github.com/apache/groovy/pull/2374, but applied to
master (Groovy 6) instead of
[GROOVY_4_0_X](https://github.com/apache/groovy/tree/GROOVY_4_0_X).
This PR addresses the invokedynamic performance regression reported in GROOVY-10307. The
optimization reduces the performance impact of metaclass changes on call sites
by replacing the global SwitchPoint invalidation mechanism with targeted
per-call-site cache invalidation.
## Problem
When any metaclass changes in Groovy, the global `SwitchPoint` is
invalidated, causing **all** invokedynamic call sites across the entire
application to fall back and re-link. This creates significant overhead in
applications that frequently modify metaclasses (e.g., Grails applications with
dynamic finders, runtime mixins, etc.).
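The effect described above can be reproduced with the JDK's `java.lang.invoke.SwitchPoint` alone. The following is a minimal sketch, not Groovy's actual linking code: the two "call sites" are plain constant method handles (the names `siteA` and `siteB` are hypothetical) guarded by one shared switch point, so a single invalidation sends both to their fallback path.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.SwitchPoint;

// Illustrative only: models two unrelated call sites sharing one global SwitchPoint.
final class GlobalSwitchPointSketch {

    /** Returns what both guarded handles yield before and after one invalidation. */
    static String[] run() {
        try {
            SwitchPoint global = new SwitchPoint();

            // Each handle answers "cached" while the switch point is valid
            // and "relink" once it has been invalidated.
            MethodHandle siteA = global.guardWithTest(
                    MethodHandles.constant(String.class, "A:cached"),
                    MethodHandles.constant(String.class, "A:relink"));
            MethodHandle siteB = global.guardWithTest(
                    MethodHandles.constant(String.class, "B:cached"),
                    MethodHandles.constant(String.class, "B:relink"));

            String beforeA = (String) siteA.invoke();
            String beforeB = (String) siteB.invoke();

            // One invalidation, standing in for a single metaclass change.
            SwitchPoint.invalidateAll(new SwitchPoint[] {global});

            // Both handles now take the fallback branch, related or not.
            String afterA = (String) siteA.invoke();
            String afterB = (String) siteB.invoke();
            return new String[] {beforeA, beforeB, afterA, afterB};
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", run()));
    }
}
```

In a real runtime the fallback branch performs method selection and re-linking, which is where the cost accumulates when every call site is affected at once.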
## Solution
This PR implements a more targeted invalidation strategy:
1. **Disable global SwitchPoint guard by default** - The
`SwitchPoint.guardWithTest()` wrapper is now optional and disabled by default.
This prevents mass invalidation of all call sites when any metaclass changes.
2. **Track all call sites via WeakReference set** - All `CacheableCallSite`
instances are registered in a concurrent set using weak references, allowing
targeted invalidation without preventing garbage collection.
3. **Add `clearCache()` method to CacheableCallSite** - When a metaclass
changes, we can now clear the LRU cache and reset the fallback count on
specific call sites rather than invalidating everything.
4. **Targeted invalidation on metaclass change** -
`invalidateSwitchPoints()` now iterates through registered call sites, clearing
caches and resetting targets as needed.
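The four steps above can be sketched roughly as follows. This is an assumption-laden illustration rather than the PR's code: `SketchCallSite` is a hypothetical stand-in for `CacheableCallSite`, the cache capacity is invented, and only the names `ALL_CALL_SITES`, `registerCallSite`, and `clearCache` are taken from the description.

```java
import java.lang.ref.WeakReference;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a cacheable call site; not Groovy's CacheableCallSite.
final class SketchCallSite {
    private static final int CACHE_SIZE = 4; // assumed capacity, for illustration only

    // Access-ordered LinkedHashMap used as a small LRU cache.
    private final Map<String, Object> lruCache =
            new LinkedHashMap<String, Object>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                    return size() > CACHE_SIZE;
                }
            };
    private long fallbackCount;

    synchronized void cache(String receiverClass, Object handle) {
        lruCache.put(receiverClass, handle);
    }

    synchronized void recordFallback() {
        fallbackCount++;
    }

    // Step 3: drop cached entries and reset the fallback counter.
    synchronized void clearCache() {
        lruCache.clear();
        fallbackCount = 0;
    }

    synchronized int cachedEntries() {
        return lruCache.size();
    }

    synchronized long fallbackCount() {
        return fallbackCount;
    }
}

final class CallSiteRegistrySketch {
    // Step 2: weak references let unused call sites be garbage collected.
    private static final Set<WeakReference<SketchCallSite>> ALL_CALL_SITES =
            ConcurrentHashMap.newKeySet();

    static void registerCallSite(SketchCallSite site) {
        ALL_CALL_SITES.add(new WeakReference<>(site));
    }

    // Step 4: on a metaclass change, clear live call sites and prune collected ones.
    static void invalidateAll() {
        Iterator<WeakReference<SketchCallSite>> it = ALL_CALL_SITES.iterator();
        while (it.hasNext()) {
            SketchCallSite site = it.next().get();
            if (site == null) {
                it.remove();
            } else {
                site.clearCache();
            }
        }
    }
}
```

Resetting a real call site's target to its default handle is omitted here, since it depends on details of the bootstrap code that the description does not spell out.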
## Changes
### `CacheableCallSite.java`
- Added `clearCache()` method to clear LRU cache and reset fallback count
### `IndyInterface.java`
- Added `ALL_CALL_SITES` WeakReference set to track all call sites
- Added `registerCallSite(CacheableCallSite)` method
- Modified `invalidateSwitchPoints()` to clear caches on all registered call
sites
- Register call sites during bootstrap
### `Selector.java`
- Added `INDY_SWITCHPOINT_GUARD` system property flag (default: `false`)
- Made the SwitchPoint guard conditional based on the flag
### `JarJarTask.groovy` (unrelated build fix)
- Changed `@InputFiles @Classpath` to `@Input` on `untouchedFiles` field
- Fixes Windows build issue where glob patterns containing `*` were treated
as literal file paths
## Configuration
The SwitchPoint guard behavior can be controlled via system property:
```bash
# Use new targeted invalidation (default)
java -jar myapp.jar
# Revert to old global SwitchPoint behavior
java -Dgroovy.indy.switchpoint.guard=true -jar myapp.jar
```
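For illustration, a flag like this could be read and applied along the following lines. The class and method names are hypothetical and the actual `Selector.java` logic may differ; only the property name comes from this description.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.SwitchPoint;

// Hypothetical sketch of a system-property-controlled SwitchPoint guard.
final class SwitchPointGuardFlagSketch {
    static final String PROPERTY = "groovy.indy.switchpoint.guard";

    // Boolean.getBoolean returns false when the property is unset,
    // which matches the documented default of `false`.
    static boolean guardEnabled() {
        return Boolean.getBoolean(PROPERTY);
    }

    // Wraps the target in the SwitchPoint guard only when the flag is on;
    // otherwise the target is returned unchanged.
    static MethodHandle applyGuard(MethodHandle target, MethodHandle fallback,
                                   SwitchPoint switchPoint, boolean enabled) {
        return enabled ? switchPoint.guardWithTest(target, fallback) : target;
    }

    public static void main(String[] args) {
        System.out.println(PROPERTY + " enabled: " + guardEnabled());
    }
}
```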
## Benchmark Results
Tested using a dedicated benchmark suite measuring metaclass invalidation
impact: https://github.com/jamesfredley/groovy-indy-performance
## Complete Benchmark Comparison (3-Run Averages)
**Test Date:** February 4, 2026
**Test Machine:** Windows 11, 20 cores, 4GB max heap
**Java Version:** 17.0.18 (Amazon Corretto)
### Versions Tested
| Version | Description |
|---------|-------------|
| **4.0.30** | Groovy 4.0.30 from Maven Central (baseline) |
| **6-snapshot** | Groovy 6.0.0-SNAPSHOT clean master (no optimizations) |
| **6-snapshot-opt** | Groovy 6.0.0-SNAPSHOT with this PR's optimizations |
### Optimizations Applied in 6-snapshot-opt
1. **Disabled global SwitchPoint guard** - `INDY_SWITCHPOINT_GUARD` defaults
to `false`
2. **Call site registry** - Track all call sites via `WeakReference` set
3. **Cache invalidation** - Clear individual call site caches on metaclass
change
4. **Target reset** - Reset call site targets to default on invalidation
---
## 🎯 KEY METRIC: Metaclass Invalidation Stress Test
This test measures the performance impact when metaclass changes occur
during execution.
Lower ratio = better (less performance degradation from metaclass changes).
| Metric | 4.0.30 | 6-snapshot | 6-snapshot-opt |
|--------|--------|------------|----------------|
| Run 1 | 72.66x | 83.16x | **67.33x** |
| Run 2 | 103.63x | 77.54x | **76.90x** |
| Run 3 | 107.92x | 103.71x | **71.26x** |
| **Average** | 94.74x | 88.14x | **71.83x** |
| Baseline (no changes) | **5.61 ms** | 5.84 ms | 6.02 ms |
| With metaclass changes | 515.11 ms | 505.83 ms | **431.28 ms** |
### Key Finding
**6-snapshot-opt reduces metaclass invalidation impact by:**
- **19% vs 6-snapshot** (71.83x vs 88.14x)
- **24% vs 4.0.30** (71.83x vs 94.74x)
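The percentages follow from the average ratios as a relative reduction, `(reference - optimized) / reference`. A small sketch of that arithmetic, with a hypothetical class name:

```java
import java.util.Locale;

// Recomputes the relative reductions from the average ratios reported above.
final class ReductionSketch {

    static double reductionPercent(double reference, double optimized) {
        return (reference - optimized) / reference * 100.0;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "vs 6-snapshot: %.1f%%%n",
                reductionPercent(88.14, 71.83));
        System.out.printf(Locale.ROOT, "vs 4.0.30: %.1f%%%n",
                reductionPercent(94.74, 71.83));
    }
}
```

This gives roughly 18.5% and 24.2%, in line with the rounded 19% and 24% figures.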
---
## Comprehensive Benchmark Suite (3-Run Averages)
| Benchmark | 4.0.30 | 6-snapshot | 6-snapshot-opt |
|-----------|--------|------------|----------------|
| **Loop Benchmarks** | | | |
| Loop: each + toString | **31.72 ms** | 49.23 ms | 50.37 ms |
| Loop: collect | **52.12 ms** | 75.74 ms | 75.80 ms |
| Loop: findAll | **113.37 ms** | 146.15 ms | 147.52 ms |
| **Method Invocation** | | | |
| Method: simple instance | **8.81 ms** | 27.53 ms | 27.74 ms |
| Method: with params | **10.33 ms** | 27.49 ms | 29.32 ms |
| Method: static | **7.54 ms** | 26.49 ms | 26.56 ms |
| Method: polymorphic | **1.86 s** | 2.03 s | **1.86 s** |
| **Closures** | | | |
| Closure: creation + call | **24.93 ms** | 34.87 ms | 34.27 ms |
| Closure: reused | **20.59 ms** | 27.70 ms | 27.85 ms |
| Closure: nested | 39.16 ms | **37.46 ms** | 38.27 ms |
| Closure: curried | **159.16 ms** | 191.60 ms | 192.33 ms |
| **Properties** | | | |
| Property: read/write | **19.62 ms** | 71.62 ms | 73.83 ms |
| **Collections** | | | |
| Collection: each | **108.84 ms** | 127.27 ms | 128.85 ms |
| Collection: collect | **119.86 ms** | 138.99 ms | **138.79 ms** |
| Collection: inject | **133.79 ms** | 153.63 ms | 155.63 ms |
| **GStrings** | | | |
| GString: simple | **101.42 ms** | 118.48 ms | **118.47 ms** |
| GString: multi-value | **114.82 ms** | 130.68 ms | **129.68 ms** |
| **Call Site Performance** | | | |
| Monomorphic call site | 124.68 ms | 117.98 ms | **117.21 ms** |
| Polymorphic call site | 3.73 s | 3.79 s | **3.56 s** |
---
## Closure Benchmark Suite (3-Run Averages)
| Benchmark | 4.0.30 | 6-snapshot | 6-snapshot-opt |
|-----------|--------|------------|----------------|
| Simple closure creation | 29.71 ms | **14.86 ms** | 16.20 ms |
| Closure reuse | 19.25 ms | **10.08 ms** | 10.99 ms |
| Multi-param closure | 38.58 ms | **19.25 ms** | 21.48 ms |
| Closure with capture | 19.60 ms | **7.48 ms** | 7.66 ms |
| Closure modify capture | 12.87 ms | 4.73 ms | **4.29 ms** |
| Closure delegation | 27.94 ms | 27.97 ms | **25.87 ms** |
| Nested closures | 58.24 ms | 26.16 ms | **25.84 ms** |
| Curried closure | **294.13 ms** | 337.07 ms | 332.51 ms |
| Closure composition | **59.20 ms** | 70.86 ms | 73.05 ms |
| Closure spread | **2.05 s** | 2.66 s | 2.66 s |
| Closure.call() | 22.32 ms | **8.56 ms** | 8.99 ms |
| Closure trampoline | **53.48 ms** | 60.26 ms | 59.67 ms |
---
## Loop Benchmark Suite (3-Run Averages)
| Benchmark | 4.0.30 | 6-snapshot | 6-snapshot-opt |
|-----------|--------|------------|----------------|
| Original: each + toString | **45.54 ms** | 48.94 ms | 49.51 ms |
| Simple: each only | **39.93 ms** | 47.65 ms | 48.95 ms |
| Closure call | 19.54 ms | **2.58 ms** | 2.74 ms |
| Method call | **5.32 ms** | 6.02 ms | 5.33 ms |
| Nested loops | **74.47 ms** | 79.96 ms | 79.68 ms |
| Loop with collect | **88.51 ms** | 106.90 ms | 105.39 ms |
| Loop with findAll | **202.83 ms** | 233.18 ms | 232.28 ms |
---
## Method Invocation Benchmark Suite (3-Run Averages)
| Benchmark | 4.0.30 | 6-snapshot | 6-snapshot-opt |
|-----------|--------|------------|----------------|
| Simple instance method | 7.69 ms | 6.17 ms | **5.75 ms** |
| Method with parameters | 7.89 ms | **7.85 ms** | 8.08 ms |
| Method with object param | 10.89 ms | **10.61 ms** | 10.64 ms |
| Static method | **3.20 ms** | 3.42 ms | 3.48 ms |
| Static method with params | 7.87 ms | 8.16 ms | **7.67 ms** |
| Interface method | **3.24 ms** | 3.81 ms | 3.69 ms |
| Dynamic typed calls | **3.26 ms** | **3.26 ms** | 3.31 ms |
| Property access | **21.50 ms** | N/A | N/A |
| GString method | **192.53 ms** | N/A | N/A |
---
## Raw Data: Individual Run Results
### Metaclass Invalidation Ratios
| Run | 4.0.30 | 6-snapshot | 6-snapshot-opt |
|-----|--------|------------|----------------|
| 1 | 72.66x | 83.16x | 67.33x |
| 2 | 103.63x | 77.54x | 76.90x |
| 3 | 107.92x | 103.71x | 71.26x |
### With Metaclass Changes (ms)
| Run | 4.0.30 | 6-snapshot | 6-snapshot-opt |
|-----|--------|------------|----------------|
| 1 | 517.31 | 515.21 | 430.71 |
| 2 | 508.25 | 504.47 | 429.30 |
| 3 | 519.77 | 497.80 | 433.84 |
### Baseline (No Metaclass Changes) (ms)
| Run | 4.0.30 | 6-snapshot | 6-snapshot-opt |
|-----|--------|------------|----------------|
| 1 | 7.12 | 6.20 | 6.40 |
| 2 | 4.90 | 6.51 | 5.58 |
| 3 | 4.82 | 4.80 | 6.09 |
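The reported ratios appear to be each run's time with metaclass changes divided by that run's baseline time, with the three ratios then averaged. That reading is inferred from the numbers rather than stated in the PR. The sketch below (hypothetical class name) recomputes the 6-snapshot-opt column from the raw tables. Small differences in the second decimal are expected, presumably because the published timings are rounded.

```java
import java.util.Locale;

// Recomputes per-run invalidation ratios from the raw timing tables.
final class RatioCheckSketch {

    // ratio[i] = time with metaclass changes / baseline time, per run
    static double[] ratios(double[] withChangesMs, double[] baselineMs) {
        double[] out = new double[withChangesMs.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = withChangesMs[i] / baselineMs[i];
        }
        return out;
    }

    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // 6-snapshot-opt column, runs 1 to 3, copied from the tables above.
        double[] withChanges = {430.71, 429.30, 433.84};
        double[] baseline = {6.40, 5.58, 6.09};

        double[] r = ratios(withChanges, baseline);
        for (int i = 0; i < r.length; i++) {
            System.out.printf(Locale.ROOT, "run %d: %.2fx%n", i + 1, r[i]);
        }
        System.out.printf(Locale.ROOT, "average: %.2fx%n", mean(r));
    }
}
```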
## Related
- JIRA: https://issues.apache.org/jira/browse/GROOVY-10307
- Original PR: https://github.com/apache/groovy/pull/2374
- Test project: https://github.com/jamesfredley/groovy-indy-performance
- Since this PR is against Groovy 6, the Grails 7 test project will not run
> Groovy 4 runtime performance on average 2.4x slower than Groovy 3
> -----------------------------------------------------------------
>
> Key: GROOVY-10307
> URL: https://issues.apache.org/jira/browse/GROOVY-10307
> Project: Groovy
> Issue Type: Bug
> Components: bytecode, performance
> Affects Versions: 4.0.0-beta-1, 3.0.9
> Environment: OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9
> (build 11.0.11+9)
> OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed mode)
> WIN10 (tests) / REL 8 (web application)
> IntelliJ 2021.2
> Reporter: mgroovy
> Priority: Major
> Attachments: groovy_3_0_9_gc.png, groovy_3_0_9_loop2.png,
> groovy_3_0_9_loop4.png, groovy_3_0_9_mem.png, groovy_4_0_0_b1_loop2.png,
> groovy_4_0_0_b1_loop4.png, groovy_4_0_0_b1_loop4_gc.png,
> groovy_4_0_0_b1_loop4_mem.png,
> groovysql_performance_groovy4_2_xx_yy_zzzz.groovy, loops.groovy,
> profile3.txt, profile4-loops.txt, profile4.txt, profile4d.txt
>
>
> Groovy 4.0.0-beta-1 runtime performance in our framework is on average 2 to 3
> times slower compared to using Groovy 3.0.9 (regular i.e. non-INDY)
> * Our complete framework and application code is completely written in
> Groovy, spread over multiple IntelliJ modules
> ** mixed @CompileDynamic/@TypeChecked and @CompileStatic
> ** No Java classes left in project, i.e. no cross compilation occurs
> * We build using IntelliJ 2021.2 Groovy build process, then run / deploy the
> compiled class files
> ** We do _not_ use a Groovy based DSL, nor do we execute Groovy scripts
> during execution
> * Performance degradation when using Groovy 4.0.0-beta-1 instead of Groovy
> 3.0.9 (non-INDY):
> ** The performance of the largest of our web applications has dropped 3x
> (startup) / 2x (table refresh) respectively
> *** Stack: Tomcat/Vaadin/Ebean plus framework generated SQL
> ** Our test suite runs about 2.4 times as long as before (120 min when using
> G4, compared to about 50 min with G3)
> *** JUnit 5
> *** test suite also contains no scripts / dynamic code execution
> *** Individual test performance varies: A small number of tests runs faster,
> but the majority is slower, with some extreme cases taking nearly 10x as long
> to finish
> * Using Groovy 3.0.9 INDY displays nearly identical performance degradation,
> so it seems that the use of invoke dynamic is somehow at fault