[
https://issues.apache.org/jira/browse/GROOVY-12034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18083066#comment-18083066
]
ASF GitHub Bot commented on GROOVY-12034:
-----------------------------------------
codecov-commenter commented on PR #2557:
URL: https://github.com/apache/groovy/pull/2557#issuecomment-4525437127
##
[Codecov](https://app.codecov.io/gh/apache/groovy/pull/2557?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 68.1935%. Comparing base
([`5318145`](https://app.codecov.io/gh/apache/groovy/commit/53181454e08716af1fb76927215775fb75084f56?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache))
to head
([`659429e`](https://app.codecov.io/gh/apache/groovy/commit/659429e1959cdf4f0af530d2c07bf4070a66e994?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)).
:warning: Report is 7 commits behind head on master.
<details><summary>Additional details and impacted files</summary>
[](https://app.codecov.io/gh/apache/groovy/pull/2557?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
```diff
@@ Coverage Diff @@
## master #2557 +/- ##
==================================================
+ Coverage 68.1844% 68.1935% +0.0091%
- Complexity 33086 33096 +10
==================================================
Files 1508 1508
Lines 126130 126147 +17
Branches 22878 22887 +9
==================================================
+ Hits 86001 86024 +23
+ Misses 32490 32486 -4
+ Partials 7639 7637 -2
```
[see 11 files with indirect coverage
changes](https://app.codecov.io/gh/apache/groovy/pull/2557/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
</details>
<details><summary> :rocket: New features to boost your workflow: </summary>
- :snowflake: [Test
Analytics](https://docs.codecov.com/docs/test-analytics): Detect flaky tests,
report on failures, and find test suite problems.
- :package: [JS Bundle
Analysis](https://docs.codecov.com/docs/javascript-bundle-analysis): Save
yourself from yourself by tracking and limiting bundle sizes in JS merges.
</details>
> Apply "fat-free lambda" patterns to Groovy — DGM `*With` family + `curryWith`
> helper
> ------------------------------------------------------------------------------------
>
> Key: GROOVY-12034
> URL: https://issues.apache.org/jira/browse/GROOVY-12034
> Project: Groovy
> Issue Type: Improvement
> Reporter: Paul King
> Assignee: Paul King
> Priority: Major
> Attachments: FatFreeLambda.groovy, FatFreeLambdaBench.java,
> run1-indy-on.log, run1-indy-on.txt, run2-indy-off.log, run2-indy-off.txt
>
>
> h2. Summary
> Donald Raab's blog post "Fat-Free Lambdas in Java"
> ([https://donraab.medium.com/fat-free-lambdas-in-java-bf228da0613b])
> describes a pattern where stateless lambdas, combined with library APIs that
> take an extra parameter, produce zero per-call allocation. This ticket
> assesses applying that pattern to Groovy, backed by a JMH benchmark.
> *Result:* the optimisation already works for {{@CompileStatic}} lambdas
> targeting a functional interface (GROOVY-11905), but Groovy's GDK does not
> expose an API surface that lets idiomatic call sites _use_ it. The dominant
> Groovy idiom for "don't capture" — {{Closure.rcurry}} — measures *worse on
> every axis* than just capturing. Adding a small {{*With}} family to
> {{DefaultGroovyMethods}}, plus a {{curryWith}} helper, would deliver the
> fat-free benefit to ordinary Groovy code with no breaking changes.
> h2. Background — the technique
> Three pillars stack to make the optimisation work in Java:
> # *Compiler hoisting of stateless lambdas* — javac lowers non-capturing
> {{\(x) -> ...}} to a static {{LambdaMetafactory}} call site whose linked
> {{CallSite}} returns the same singleton forever.
> # *API that prefers passing over capturing* — Eclipse Collections'
> {{anySatisfyWith(Predicate2<T,P>, P)}} accepts a 2-arg functional interface
> plus the would-be-captured value, so the lambda body becomes stateless.
> # *Method references* — {{String::startsWith}} is unbound; naturally
> non-capturing.
> h2. Mapping to Groovy
> h3. Pillar 1 — Compiler hoisting
> *Already done for {{@CompileStatic}} lambdas targeting a functional interface
> (GROOVY-11905).*
> * {{StaticTypesLambdaAnalyzer.isNonCapturing()}} at
> {{StaticTypesLambdaWriter.java:72-75}} detects non-capturing lambdas.
> * Non-capturing lambdas get {{ACC_STATIC}} on the synthetic {{doCall}}
> ({{StaticTypesLambdaWriter.java:339}}).
> * Emitted as {{invokedynamic}} → {{LambdaMetafactory.metafactory}} with
> {{H_INVOKESTATIC}} ({{StaticTypesLambdaWriter.java:179-183, :429}}). The JDK
> returns the singleton.
> *Cannot be applied to plain {{Closure}}* (and probably should not be):
> * Every {{Closure}} carries {{owner}}, {{thisObject}}, {{delegate}},
> {{resolveStrategy}}, {{directive}}, {{parameterTypes}},
> {{maximumNumberOfParameters}}, {{bcw}} ({{Closure.java:260-274}}) — ~56-72
> bytes minimum, dwarfing a Java capturing lambda.
> * {{Closure}} instances are externally mutable: callers may set
> {{.delegate}}, {{.resolveStrategy}}, call {{.curry(...)}}. Sharing a
> singleton would let one caller's mutation leak globally.
> * In dynamic compilation, {{LambdaWriter}} delegates to {{ClosureWriter}}
> ({{LambdaWriter.java:27-56}}), so even a non-capturing lambda becomes a fresh
> {{Closure}} subclass per evaluation.
> h3. Pillar 2 — API surface (the gap)
> Groovy's GDK has *no equivalent* of Eclipse Collections' {{anySatisfyWith}}.
> Confirmed by inspection of {{DefaultGroovyMethods.java}}:
> *
> {{findAll}}/{{find}}/{{any}}/{{every}}/{{collect}}/{{inject}}/{{count}}/{{groupBy}}
> (lines 5484, 5376, 600, 5267, 2336, 8787, 3607, 7850) all take a single
> {{Closure}}.
> * The only {{BiPredicate}}/{{BiFunction}} overloads in DGM are
> {{zipWithNext}} ({{DGM.java:18515}}) and {{groupConsecutive}}
> ({{DGM.java:18696}}) — both pair _adjacent elements_, not
> element-plus-external-arg.
> Groovy's existing "don't capture" tool is {{Closure.rcurry/curry/ncurry}}
> ({{Closure.java:673, 706, 755}}). But every call allocates:
> {code:java}
> super(uncurriedClosure.clone()); // CurriedClosure.java:67 — clones the
> base Closure
> this.curriedArguments = arguments; // CurriedClosure.java:70 — fresh
> Object[]
> {code}
> So {{rcurry(value)}} costs a clone + wrapper + array, _heavier than just
> capturing_. (Confirmed by the benchmark below.)
> h3. Pillar 3 — Method references
> Already work in {{@CompileStatic}}. Same indy path, same singleton behaviour,
> no synthetic class needed (the method handle points straight at the target
> method).
> h2. Benchmark
> Sidecar code added at:
> *
> {{subprojects/performance/src/jmh/groovy/org/apache/groovy/bench/FatFreeLambda.groovy}}
> *
> {{subprojects/performance/src/jmh/groovy/org/apache/groovy/bench/FatFreeLambdaBench.java}}
> Seven variants compared, all traversing the full list (prefix never matches)
> with identical per-element work:
> ||Tag||Variant||Per-call allocation shape||
> |A|{{data.any \{ it.startsWith(prefix) \}}} — capturing closure literal|1 ×
> {{Closure}} subclass + per-element wrap|
> |B|{{data.any(pred.rcurry(prefix))}} — closure + rcurry|1 × {{Closure}} + 1 ×
> {{CurriedClosure}} + per-element args|
> |C|{{anyWith(data, (s,p) -> s.startsWith(p), prefix)}} — candidate API +
> lambda|0 (indy singleton via {{LambdaMetafactory}})|
> |D|{{anyWith(data, String::startsWith, prefix)}} — candidate API + method
> ref|0 (same; no synthetic class)|
> |E|hand-written {{for}} loop|0 (baseline)|
> |F|{{static final}} hoisted closure + rcurry|1 × {{CurriedClosure}} +
> per-element args (no per-call literal)|
> |G|{{anyPred(data, curryWith(STARTS_WITH, prefix))}} — fat-free curry
> helper|1 small capturing lambda per call|
> Setup: JMH 1.37, JDK 17.0.18 Zulu, macOS, 3×2s warmup + 5×2s × 2 forks, GC
> profiler enabled, size sweep {{@Param({"100","1000","10000"})}}. Two passes:
> default ({{indy=true}}) and {{-Pindy=false}}. Each pass ~21 min wall.
> Run command:
> {noformat}
> ./gradlew :perf:jmh -PbenchInclude=FatFreeLambda
> {noformat}
> h2. Results
> h3. Throughput (ops/µs, higher = better, {{indy=true}})
> ||Variant||size=100||size=1000||size=10000||
> |C {{anyWith}} + lambda|9.926 ± 0.105|1.001 ± 0.004|0.096 ± 0.001|
> |D {{anyWith}} + method ref|9.622 ± 0.114|0.998 ± 0.009|0.096 ± 0.001|
> |E baseline {{for}} loop|8.215 ± 0.050|0.768 ± 0.007|0.072 ± 0.001|
> |G {{curryWith}}|7.850 ± 0.237|0.927 ± 0.012|0.078 ± 0.001|
> |A capture closure|0.584 ± 0.024|0.062 ± 0.002|0.007 ± 0.001|
> |B rcurry|0.252 ± 0.010|0.025 ± 0.001|0.003 ± 0.001|
> |F static-final + rcurry|0.248 ± 0.001|0.025 ± 0.002|0.003 ± 0.001|
> h3. Allocation (bytes/op, lower = better, {{indy=true}})
> ||Variant||size=100||size=1000||size=10000||Per-element rate||
> |C/D|≈ 10⁻⁴|≈ 10⁻³|0.031|*0*|
> |E baseline|≈ 10⁻³|0.004|32 (GC-thread noise)|*0*|
> |G {{curryWith}}|*128*|*128*|*128*|*0 — constant per call*|
> |A capture closure|7,304|72,080|720,096|~72 B/elem + ~80 fixed|
> |B rcurry|9,808|96,208|960,209|~96 B/elem + ~80 fixed|
> |F shared closure + rcurry|9,764|96,176|960,153|matches B (60 B/call less)|
> h3. Indy on vs off (size=1000)
> ||Variant||indy=on ops/µs||indy=off ops/µs||Δ||indy=on B/op||indy=off B/op||
> |A capture|0.062|0.057|−8%|72,080|72,080|
> |B rcurry|0.025|0.024|−4%|96,208|96,580|
> |C {{anyWith}}+lambda|1.001|1.001|0%|0.003|0.003|
> |D {{anyWith}}+methodRef|0.998|0.999|0%|0.003|0.003|
> |E baseline|0.768|0.905|+18%|0.003|0.003|
> |F shared+rcurry|0.025|0.025|0%|96,176|96,472|
> |G curryWith|0.927|0.894|−4%|128|120|
> h2. Findings
> # *The fat-free claim holds in Groovy, with margin.* C and D allocate
> effectively zero bytes regardless of size. At size=10000, C/D move 0.03 B/op
> vs 720 KB/op for A and 960 KB/op for B — *5 orders of magnitude apart*. The
> optimisation is real and usable today via {{@CompileStatic}} + functional
> interface targets; the GDK just doesn't currently offer the API to reach it.
> # *The Groovy idiom "curry instead of capture" is empirically backwards.*
> Across every size and both indy modes, B ({{rcurry}}) is *~2.5× slower* than
> A (capturing closure) and allocates *~33% more bytes*. The cause is
> {{CurriedClosure.call}} concatenating {{curriedArguments}} with the incoming
> args into a fresh {{Object[]}} on every element. At size=10000 that's 10,000
> small arrays per call.
> # *Hoisting the base closure to {{static final}} is essentially worthless.* F
> matches B to within ~60 bytes per call across all configurations. The
> per-call closure-literal allocation is noise; the per-_element_
> {{CurriedClosure}} cost is everything. Advising users to "extract closure
> literals to constants" doesn't solve the rcurry problem.
> # *{{curryWith}} is the right shape for the adaptation gap.* G allocates a
> flat *128 B per call* regardless of workload size. At size=10000 that's
> *~7,500× less allocation than {{rcurry}}* for the same logical operation.
> Throughput-wise G is within 10% of the hand-written {{for}} loop. This covers
> the case where users _can't_ avoid producing a unary
> {{Predicate}}/{{Function}} (handing off to {{Stream.filter}},
> {{Collection.removeIf}}, etc.).
> # *Indy mode is irrelevant to this story.* Allocations are bit-identical
> between {{-Pindy=true}} and {{-Pindy=false}}; throughputs move ≤8% for the
> closure variants and 0% for the lambda variants. *The proposal helps indy and
> non-indy users equally.*
> # *C and D systematically beat E by 21-33%* across every size, {{indy=true}}.
> Reproducible across both runs. Most likely the {{BiPredicate.test →
> static-method-handle → s.startsWith(p)}} chain inlines into a slightly
> tighter loop body than the direct {{for (String s : data)
> s.startsWith(prefix)}}. Not load-bearing for the proposal, but worth noting
> as the strongest possible refutation of "Groovy lambdas can't match Java
> loops".
> h2. Proposed actions (_proposed pending team review_)
> h3. Primary: add {{*With}} overloads to {{DefaultGroovyMethods}}
> Additive — no source or binary breakage. ~8–12 methods cover the high-traffic
> surface:
> {code:java}
> public static <T,P> List<T> findAll(Iterable<T> self, BiPredicate<? super
> T, ? super P> test, P p);
> public static <T,P> boolean any (Iterable<T> self, BiPredicate<? super
> T, ? super P> test, P p);
> public static <T,P> boolean every (Iterable<T> self, BiPredicate<? super
> T, ? super P> test, P p);
> public static <T,P> Number count (Iterable<T> self, BiPredicate<? super
> T, ? super P> test, P p);
> public static <T,P,R> List<R> collect(Iterable<T> self, BiFunction<? super
> T, ? super P, ? extends R> f, P p);
> public static <T,P> void each(Iterable<T> self, BiConsumer<? super T,
> ? super P> c, P p);
> public static <T,P> T find(Iterable<T> self, BiPredicate<? super T,
> ? super P> test, P p);
> public static <T,P,K> Map<K,List<T>> groupBy(Iterable<T> self, BiFunction<?
> super T, ? super P, K> classifier, P p);
> // + inject/reduce with seed; *With variants on Map<K,V>
> {code}
> Open naming question: {{With}}-suffixed overloads (mirrors Eclipse
> Collections) vs. unsuffixed overloads disambiguated by argument types. The
> latter is the more usual Groovy approach; the former is more obviously
> discoverable. _Proposed pending team review._
> h3. Primary: add a {{curryWith}} helper
> {code:groovy}
> public static <T,P> Predicate<T> curryWith(BiPredicate<? super T, ? super P>
> bi, P p);
> public static <T,P,R> Function<T,R> curryWith(BiFunction<? super T, ? super
> P, ? extends R> f, P p);
> public static <T,P> Consumer<T> curryWith(BiConsumer<? super T, ? super P>
> c, P p);
> {code}
> Right-curry semantics (the value lands in the second/last parameter slot,
> matching what {{*With}} consumers want). Should ship alongside {{*With}} —
> the {{rcurry}} → {{*With}} → {{curryWith}} ratio (~33,000× → 1× → ~7,500×) is
> the strongest argument for shipping the helper in the same release, so users
> have a fat-free path for every shape.
> h3. Bonus ticket A: {{CurriedClosure}} per-element allocation
> {{CurriedClosure.call}} concatenates {{curriedArguments}} with the incoming
> args into a fresh {{Object[]}} on every invocation
> ({{CurriedClosure.java:81-99}}). The dominant cost in every closure+curry
> path in this benchmark. A no-allocation variant for the common case
> ({{curriedArguments.length == 1 && args.length == 1}}) would save ~24
> B/element on every rcurry — a lift for everyone already using rcurry in
> production code, without anyone needing to change source.
> h3. Bonus ticket B: {{LambdaWriter}} {{Reference}} wrapping for
> effectively-final method parameters
> LambdaWriter wraps effectively-final method parameters in
> {{groovy.lang.Reference}} — visible in G's 128 B/call constant (synthetic
> lambda constructor signature: {{(Object, Object, Reference, Reference)V}}).
> javac captures such parameters by value into the synthetic lambda's final
> fields with no wrapping; Groovy could do the same, eliminating the
> per-capture {{Reference}} allocations. Lower priority than bonus A because
> JIT escape-analysis often elides these wrappers in hot paths, but they show
> up in cold paths, megamorphic call sites, and interpreter-only execution.
> h2. Caveats
> * Benchmark uses 1000-element traversal with a cheap predicate
> ({{startsWith}}); closure overhead is maximally exposed. With heavier
> per-element work the proportional gap narrows.
> * Single dev machine; no CPU pinning. Throughputs are reproducible to ~10%,
> allocations to <1 byte.
> * JEP 450 compact object headers would further widen the C/D advantage but
> require Java 25+; Groovy's minimum supported JDK is currently 17 (commit
> {{e9b9cf1ec5}} / GROOVY-12027).
> * The proposed {{*With}} family pays off most under {{@CompileStatic}} +
> lambda syntax. Dynamic-Groovy callers using closures get no allocation win —
> but the closure overloads remain, so no regression.
> * *Related:* GROOVY-11905 (non-capturing lambda optimisation — landed)
> h2. Appendix — raw bench output
> Two full-fidelity passes saved at:
> * Run 1 ({{indy=true}}): {{/tmp/fatfree-bench/run1-indy-on.\{log,txt\}}}
> * Run 2 ({{indy=false}}): {{/tmp/fatfree-bench/run2-indy-off.\{log,txt\}}}
> Bench source:
> {{subprojects/performance/src/jmh/groovy/org/apache/groovy/bench/FatFreeLambda\{,Bench\}.\{groovy,java\}}}
> To reproduce with GC profiler:
> {noformat}
> # Add `profilers = ['gc']` to the jmh { } block in
> # build-logic/src/main/groovy/org.apache.groovy-performance.gradle
> ./gradlew :perf:jmh -PbenchInclude=FatFreeLambda # indy=true
> ./gradlew :perf:jmh -PbenchInclude=FatFreeLambda -Pindy=false # indy=false
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)