[jira] [Created] (GROOVY-12034) Apply "fat-free lambda" patterns to Groovy — DGM `*With` family + `curryWith` helper

Paul King (Jira) Fri, 22 May 2026 23:34:06 -0700

Paul King created GROOVY-12034:
----------------------------------

             Summary: Apply "fat-free lambda" patterns to Groovy — DGM `*With` 
family + `curryWith` helper
                 Key: GROOVY-12034
                 URL: https://issues.apache.org/jira/browse/GROOVY-12034
             Project: Groovy
          Issue Type: Improvement
            Reporter: Paul King
            Assignee: Paul King



h2. Summary

Donald Raab's blog post "Fat-Free Lambdas in Java" 
([https://donraab.medium.com/fat-free-lambdas-in-java-bf228da0613b]) describes 
a pattern where stateless lambdas, combined with library APIs that take an 
extra parameter, produce zero per-call allocation. This ticket assesses 
applying that pattern to Groovy, backed by a JMH benchmark.

*Result:* the optimisation already works for {{@CompileStatic}} lambdas 
targeting a functional interface (GROOVY-11905), but Groovy's GDK does not 
expose an API surface that lets idiomatic call sites _use_ it. The dominant 
Groovy idiom for "don't capture" — {{Closure.rcurry}} — measures *worse on 
every axis* than just capturing. Adding a small {{*With}} family to 
{{DefaultGroovyMethods}}, plus a {{curryWith}} helper, would deliver the 
fat-free benefit to ordinary Groovy code with no breaking changes.

h2. Background — the technique

Three pillars stack to make the optimisation work in Java:

# *Compiler hoisting of stateless lambdas* — javac lowers non-capturing {{(x) 
-> ...}} to a static {{LambdaMetafactory}} call site whose linked {{CallSite}} 
returns the same singleton forever.
# *API that prefers passing over capturing* — Eclipse Collections' 
{{anySatisfyWith(Predicate2<T,P>, P)}} accepts a 2-arg functional interface 
plus the would-be-captured value, so the lambda body becomes stateless.
# *Method references* — {{String::startsWith}} is unbound; naturally 
non-capturing.

h2. Mapping to Groovy

h3. Pillar 1 — Compiler hoisting

*Already done for {{@CompileStatic}} lambdas targeting a functional interface 
(GROOVY-11905).*

* {{StaticTypesLambdaAnalyzer.isNonCapturing()}} at 
{{StaticTypesLambdaWriter.java:72-75}} detects non-capturing lambdas.
* Non-capturing lambdas get {{ACC_STATIC}} on the synthetic {{doCall}} 
({{StaticTypesLambdaWriter.java:339}}).
* Emitted as {{invokedynamic}} → {{LambdaMetafactory.metafactory}} with 
{{H_INVOKESTATIC}} ({{StaticTypesLambdaWriter.java:179-183, :429}}). The JDK 
returns the singleton.

*Cannot be applied to plain {{Closure}}* (and probably should not be):

* Every {{Closure}} carries {{owner}}, {{thisObject}}, {{delegate}}, 
{{resolveStrategy}}, {{directive}}, {{parameterTypes}}, 
{{maximumNumberOfParameters}}, {{bcw}} ({{Closure.java:260-274}}) — ~56-72 
bytes minimum, dwarfing a Java capturing lambda.
* {{Closure}} instances are externally mutable: callers may set {{.delegate}}, 
{{.resolveStrategy}}, call {{.curry(...)}}. Sharing a singleton would let one 
caller's mutation leak globally.
* In dynamic compilation, {{LambdaWriter}} delegates to {{ClosureWriter}} 
({{LambdaWriter.java:27-56}}), so even a non-capturing lambda becomes a fresh 
{{Closure}} subclass per evaluation.

h3. Pillar 2 — API surface (the gap)

Groovy's GDK has *no equivalent* of Eclipse Collections' {{anySatisfyWith}}. 
Confirmed by inspection of {{DefaultGroovyMethods.java}}:

* 
{{findAll}}/{{find}}/{{any}}/{{every}}/{{collect}}/{{inject}}/{{count}}/{{groupBy}}
 (lines 5484, 5376, 600, 5267, 2336, 8787, 3607, 7850) all take a single 
{{Closure}}.
* The only {{BiPredicate}}/{{BiFunction}} overloads in DGM are {{zipWithNext}} 
({{DGM.java:18515}}) and {{groupConsecutive}} ({{DGM.java:18696}}) — both pair 
_adjacent elements_, not element-plus-external-arg.

Groovy's existing "don't capture" tool is {{Closure.rcurry/curry/ncurry}} 
({{Closure.java:673, 706, 755}}). But every call allocates:

{code:java}
super(uncurriedClosure.clone());       // CurriedClosure.java:67 — clones the 
base Closure
this.curriedArguments = arguments;     // CurriedClosure.java:70 — fresh 
Object[]
{code}

So {{rcurry(value)}} costs a clone + wrapper + array, _heavier than just 
capturing_. (Confirmed by the benchmark below.)

h3. Pillar 3 — Method references

Already work in {{@CompileStatic}}. Same indy path, same singleton behaviour, 
no synthetic class needed (the method handle points straight at the target 
method).

h2. Benchmark

Sidecar code added at:

* 
{{subprojects/performance/src/jmh/groovy/org/apache/groovy/bench/FatFreeLambda.groovy}}
* 
{{subprojects/performance/src/jmh/groovy/org/apache/groovy/bench/FatFreeLambdaBench.java}}

Seven variants compared, all traversing the full list (prefix never matches) 
with identical per-element work:

||Tag||Variant||Per-call allocation shape||
|A|{{data.any \{ it.startsWith(prefix) \}}} — capturing closure literal|1 × 
{{Closure}} subclass + per-element wrap|
|B|{{data.any(pred.rcurry(prefix))}} — closure + rcurry|1 × {{Closure}} + 1 × 
{{CurriedClosure}} + per-element args|
|C|{{anyWith(data, (s,p) -> s.startsWith(p), prefix)}} — candidate API + 
lambda|0 (indy singleton via {{LambdaMetafactory}})|
|D|{{anyWith(data, String::startsWith, prefix)}} — candidate API + method ref|0 
(same; no synthetic class)|
|E|hand-written {{for}} loop|0 (baseline)|
|F|{{static final}} hoisted closure + rcurry|1 × {{CurriedClosure}} + 
per-element args (no per-call literal)|
|G|{{anyPred(data, curryWith(STARTS_WITH, prefix))}} — fat-free curry helper|1 
small capturing lambda per call|

Setup: JMH 1.37, JDK 17.0.18 Zulu, macOS, 3×2s warmup + 5×2s × 2 forks, GC 
profiler enabled, size sweep {{@Param({"100","1000","10000"})}}. Two passes: 
default ({{indy=true}}) and {{-Pindy=false}}. Each pass ~21 min wall.

Run command:
{noformat}
./gradlew :perf:jmh -PbenchInclude=FatFreeLambda
{noformat}

h2. Results

h3. Throughput (ops/µs, higher = better, {{indy=true}})

||Variant||size=100||size=1000||size=10000||
|C {{anyWith}} + lambda|9.926 ± 0.105|1.001 ± 0.004|0.096 ± 0.001|
|D {{anyWith}} + method ref|9.622 ± 0.114|0.998 ± 0.009|0.096 ± 0.001|
|E baseline {{for}} loop|8.215 ± 0.050|0.768 ± 0.007|0.072 ± 0.001|
|G {{curryWith}}|7.850 ± 0.237|0.927 ± 0.012|0.078 ± 0.001|
|A capture closure|0.584 ± 0.024|0.062 ± 0.002|0.007 ± 0.001|
|B rcurry|0.252 ± 0.010|0.025 ± 0.001|0.003 ± 0.001|
|F static-final + rcurry|0.248 ± 0.001|0.025 ± 0.002|0.003 ± 0.001|

h3. Allocation (bytes/op, lower = better, {{indy=true}})

||Variant||size=100||size=1000||size=10000||Per-element rate||
|C/D|≈ 10⁻⁴|≈ 10⁻³|0.031|*0*|
|E baseline|≈ 10⁻³|0.004|32 (GC-thread noise)|*0*|
|G {{curryWith}}|*128*|*128*|*128*|*0 — constant per call*|
|A capture closure|7,304|72,080|720,096|~72 B/elem + ~80 fixed|
|B rcurry|9,808|96,208|960,209|~96 B/elem + ~80 fixed|
|F shared closure + rcurry|9,764|96,176|960,153|matches B (60 B/call less)|

h3. Indy on vs off (size=1000)

||Variant||indy=on ops/µs||indy=off ops/µs||Δ||indy=on B/op||indy=off B/op||
|A capture|0.062|0.057|−8%|72,080|72,080|
|B rcurry|0.025|0.024|−4%|96,208|96,580|
|C {{anyWith}}+lambda|1.001|1.001|0%|0.003|0.003|
|D {{anyWith}}+methodRef|0.998|0.999|0%|0.003|0.003|
|E baseline|0.768|0.905|+18%|0.003|0.003|
|F shared+rcurry|0.025|0.025|0%|96,176|96,472|
|G curryWith|0.927|0.894|−4%|128|120|

h2. Findings

# *The fat-free claim holds in Groovy, with margin.* C and D allocate 
effectively zero bytes regardless of size. At size=10000, C/D move 0.03 B/op vs 
720 KB/op for A and 960 KB/op for B — *5 orders of magnitude apart*. The 
optimisation is real and usable today via {{@CompileStatic}} + functional 
interface targets; the GDK just doesn't currently offer the API to reach it.
# *The Groovy idiom "curry instead of capture" is empirically backwards.* 
Across every size and both indy modes, B ({{rcurry}}) is *~2.5× slower* than A 
(capturing closure) and allocates *~33% more bytes*. The cause is 
{{CurriedClosure.call}} concatenating {{curriedArguments}} with the incoming 
args into a fresh {{Object[]}} on every element. At size=10000 that's 10,000 
small arrays per call.
# *Hoisting the base closure to {{static final}} is essentially worthless.* F 
matches B to within ~60 bytes per call across all configurations. The per-call 
closure-literal allocation is noise; the per-_element_ {{CurriedClosure}} cost 
is everything. Advising users to "extract closure literals to constants" 
doesn't solve the rcurry problem.
# *{{curryWith}} is the right shape for the adaptation gap.* G allocates a flat 
*128 B per call* regardless of workload size. At size=10000 that's *~7,500× 
less allocation than {{rcurry}}* for the same logical operation. 
Throughput-wise G is within 10% of the hand-written {{for}} loop. This covers 
the case where users _can't_ avoid producing a unary {{Predicate}}/{{Function}} 
(handing off to {{Stream.filter}}, {{Collection.removeIf}}, etc.).
# *Indy mode is irrelevant to this story.* Allocations are bit-identical 
between {{-Pindy=true}} and {{-Pindy=false}}; throughputs move ≤8% for the 
closure variants and 0% for the lambda variants. *The proposal helps indy and 
non-indy users equally.*
# *C and D systematically beat E by 21-33%* across every size, {{indy=true}}. 
Reproducible across both runs. Most likely the {{BiPredicate.test → 
static-method-handle → s.startsWith(p)}} chain inlines into a slightly tighter 
loop body than the direct {{for (String s : data) s.startsWith(prefix)}}. Not 
load-bearing for the proposal, but worth noting as the strongest possible 
refutation of "Groovy lambdas can't match Java loops".

h2. Proposed actions (_proposed pending team review_)

h3. Primary: add {{*With}} overloads to {{DefaultGroovyMethods}}

Additive — no source or binary breakage. ~8–12 methods cover the high-traffic 
surface:

{code:java}
public static <T,P> List<T>     findAll(Iterable<T> self, BiPredicate<? super 
T, ? super P> test, P p);
public static <T,P> boolean     any   (Iterable<T> self, BiPredicate<? super T, 
? super P> test, P p);
public static <T,P> boolean     every (Iterable<T> self, BiPredicate<? super T, 
? super P> test, P p);
public static <T,P> Number      count (Iterable<T> self, BiPredicate<? super T, 
? super P> test, P p);
public static <T,P,R> List<R>   collect(Iterable<T> self, BiFunction<? super T, 
? super P, ? extends R> f, P p);
public static <T,P> void        each(Iterable<T> self, BiConsumer<? super T, ? 
super P> c, P p);
public static <T,P> T           find(Iterable<T> self, BiPredicate<? super T, ? 
super P> test, P p);
public static <T,P,K> Map<K,List<T>> groupBy(Iterable<T> self, BiFunction<? 
super T, ? super P, K> classifier, P p);
// + inject/reduce with seed; *With variants on Map<K,V>
{code}

Open naming question: {{With}}-suffixed overloads (mirrors Eclipse Collections) 
vs. unsuffixed overloads disambiguated by argument types. The latter is the 
more usual Groovy approach; the former is more obviously discoverable. 
_Proposed pending team review._

h3. Primary: add a {{curryWith}} helper

{code:groovy}
public static <T,P> Predicate<T>  curryWith(BiPredicate<? super T, ? super P> 
bi, P p);
public static <T,P,R> Function<T,R> curryWith(BiFunction<? super T, ? super P, 
? extends R> f, P p);
public static <T,P> Consumer<T>   curryWith(BiConsumer<? super T, ? super P> c, 
P p);
{code}

Right-curry semantics (the value lands in the second/last parameter slot, 
matching what {{*With}} consumers want). Should ship alongside {{*With}} — the 
{{rcurry}} → {{*With}} → {{curryWith}} ratio (~33,000× → 1× → ~7,500×) is the 
strongest argument for shipping the helper in the same release, so users have a 
fat-free path for every shape.

h3. Bonus ticket A: {{CurriedClosure}} per-element allocation

{{CurriedClosure.call}} concatenates {{curriedArguments}} with the incoming 
args into a fresh {{Object[]}} on every invocation 
({{CurriedClosure.java:81-99}}). The dominant cost in every closure+curry path 
in this benchmark. A no-allocation variant for the common case 
({{curriedArguments.length == 1 && args.length == 1}}) would save ~24 B/element 
on every rcurry — a lift for everyone already using rcurry in production code, 
without anyone needing to change source.

h3. Bonus ticket B: {{LambdaWriter}} {{Reference}} wrapping for 
effectively-final method parameters

LambdaWriter wraps effectively-final method parameters in 
{{groovy.lang.Reference}} — visible in G's 128 B/call constant (synthetic 
lambda constructor signature: {{(Object, Object, Reference, Reference)V}}). 
javac captures such parameters by value into the synthetic lambda's final 
fields with no wrapping; Groovy could do the same, eliminating the per-capture 
{{Reference}} allocations. Lower priority than bonus A because JIT 
escape-analysis often elides these wrappers in hot paths, but they show up in 
cold paths, megamorphic call sites, and interpreter-only execution.

h2. Caveats

* Benchmark uses 1000-element traversal with a cheap predicate 
({{startsWith}}); closure overhead is maximally exposed. With heavier 
per-element work the proportional gap narrows.
* Single dev machine; no CPU pinning. Throughputs are reproducible to ~10%, 
allocations to <1 byte.
* JEP 450 compact object headers would further widen the C/D advantage but 
require Java 25+; Groovy's minimum supported JDK is currently 17 (commit 
{{e9b9cf1ec5}} / GROOVY-12027).
* The proposed {{*With}} family pays off most under {{@CompileStatic}} + lambda 
syntax. Dynamic-Groovy callers using closures get no allocation win — but the 
closure overloads remain, so no regression.
* *Related:* GROOVY-11905 (non-capturing lambda optimisation — landed)

h2. Appendix — raw bench output

Two full-fidelity passes saved at:
* Run 1 ({{indy=true}}): {{/tmp/fatfree-bench/run1-indy-on.\{log,txt\}}}
* Run 2 ({{indy=false}}): {{/tmp/fatfree-bench/run2-indy-off.\{log,txt\}}}

Bench source: 
{{subprojects/performance/src/jmh/groovy/org/apache/groovy/bench/FatFreeLambda\{,Bench\}.\{groovy,java\}}}

To reproduce with GC profiler:
{noformat}
# Add `profilers = ['gc']` to the jmh { } block in
# build-logic/src/main/groovy/org.apache.groovy-performance.gradle
./gradlew :perf:jmh -PbenchInclude=FatFreeLambda                # indy=true
./gradlew :perf:jmh -PbenchInclude=FatFreeLambda -Pindy=false   # indy=false
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (GROOVY-12034) Apply "fat-free lambda" patterns to Groovy — DGM `*With` family + `curryWith` helper

Reply via email to