[
https://issues.apache.org/jira/browse/IGNITE-28606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alex Abashev updated IGNITE-28606:
----------------------------------
Description:
## Summary
`KeyCacheObjectImpl.prepareMarshal(CacheObjectValueContext)` and
`CacheObjectImpl.prepareMarshal(...)` rely on an unsynchronized `if (valBytes
== null)` guard as their idempotency check. Under concurrent access — e.g. when
the same `CacheObject` instance is referenced from multiple transaction entries
that are marshalled from different threads — two callers can both observe
`valBytes == null`, both invoke `ctx.marshal(val)`, and race to publish their
result. The work is duplicated (CPU + allocations) and the final field value
depends on write ordering.
## Where observed
Surfaced during IGNITE-28520 Phase 2 work (migration of hand-written
`prepareMarshal` to auto-invocation of generated
`MessageSerializer.prepareMarshalCacheObjects`). An experimental "called at
most once per CO instance" assertion was added to `CacheObjectAdapter` to
validate the new path. Running `IgniteCacheQueryMultiThreadedSelfTest` with
`-ea` produced:
```
java.lang.AssertionError: prepareMarshal called more than once on
KeyCacheObjectImpl
at
org.apache.ignite.internal.processors.cache.CacheObjectAdapter.assertFirstPrepareMarshal(CacheObjectAdapter.java:53)
at
org.apache.ignite.internal.processors.cache.KeyCacheObjectImpl.prepareMarshal(KeyCacheObjectImpl.java:119)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxEntry.marshal(IgniteTxEntry.java:946)
at
org.apache.ignite.internal.processors.cache.GridCacheMessage.marshalTx(GridCacheMessage.java:380)
at
org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxPrepareRequest.prepareMarshal(GridDistributedTxPrepareRequest.java:379)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onSend(GridCacheIoManager.java:1152)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1223)
...
```
The assertion was rolled back because it was incompatible with today's
intentionally-racy design (see IGNITE-28520 Phase 2 audit, "Rejected idea:
per-CO marshalled-once assertion"). The underlying race remains and is the
subject of this ticket.
## Root cause
```java
// KeyCacheObjectImpl
@Override public void prepareMarshal(CacheObjectValueContext ctx) throws
IgniteCheckedException {
if (valBytes == null) // (1) unsynchronized read
valBytes = ctx.marshal(val); // (2) unsynchronized write
}
```
Two threads executing `(1)` concurrently both see `null`, both execute `(2)`.
`valBytes` is not `volatile`, so publication is also unspecified — another
thread reading the field after the race can observe `null` even after both
writers completed on other cores.
Same pattern in `CacheObjectImpl.prepareMarshal`.
`CacheObjectByteArrayImpl.prepareMarshal` is a no-op and is unaffected.
## Why it's a functional no-op today (but still a bug)
- `val` is effectively immutable once the `CacheObject` is constructed.
- `ctx.marshal(val)` is deterministic: equal inputs produce equal byte arrays.
- Final read by the NIO writer goes through `writeExternal` / `putValue` /
`valueBytesLength`, each of which calls `valueBytes(ctx)` that re-does the
idempotency check and will populate `valBytes` on the calling thread if both
racers somehow lost their writes.
So the race does not corrupt on-wire output in practice. What it does cost:
- **Wasted CPU**: duplicate `Marshaller.marshal` calls on contended keys (hot
paths during tx prepare when the same key appears in multiple `IgniteTxEntry`
instances).
- **Wasted allocation**: each losing racer produces a `byte[]` that's
immediately garbage.
- **Debugging friction**: any future audit or assertion that presumes "marshal
happens once per CO" cannot be added without first fixing this.
- **Specification drift**: the contract is effectively "idempotent under
concurrency by accident" — reliant on value immutability and marshaller
determinism rather than stated.
## Reproduction
- Run `IgniteCacheQueryMultiThreadedSelfTest` with `-ea` after temporarily
re-adding the assertion from the Phase 2 audit:
```java
// in KeyCacheObjectImpl.prepareMarshal
if (valBytes == null) {
assert assertFirstPrepareMarshal();
valBytes = ctx.marshal(val);
}
```
Plus the transient flag + helper on `CacheObjectAdapter`.
- The assertion fires from the `IgniteTxEntry.marshal → key.prepareMarshal`
call site when a key is shared across concurrent prepare flows.
## Proposed fix
Synchronize the check-and-set. Two options, in order of preference:
1. **Double-checked locking on a per-instance monitor**, keeping the hot no-op
branch lock-free:
```java
@Override public void prepareMarshal(CacheObjectValueContext ctx) throws
IgniteCheckedException {
if (valBytes != null)
return;
synchronized (this) {
if (valBytes == null)
valBytes = ctx.marshal(val);
}
}
```
Requires making `valBytes` `volatile` for safe DCL. Uncontended cost: one
volatile read. Contended: `synchronized` on the `CacheObject` itself, which is
already the per-key identity.
2. **Atomic publish via `VarHandle` / `Unsafe.compareAndSetObject`** on
`valBytes` — winner's bytes publish, loser's garbage. Avoids entering
`synchronized`, but requires a VarHandle field and the losing thread still does
the marshal work.
Option 1 eliminates both races and duplicate work; option 2 eliminates the race
but preserves duplicate work. Option 1 is recommended.
Mirror the same treatment in `CacheObjectImpl.prepareMarshal` (which calls
`valueBytesFromValue(ctx)` instead of `ctx.marshal(val)`).
## Scope
- `modules/core`:
- `KeyCacheObjectImpl.prepareMarshal`
- `CacheObjectImpl.prepareMarshal`
- `CacheObjectAdapter.valBytes` → `volatile byte[]`
- No wire-protocol change (field layout unchanged).
- No API change.
## Acceptance criteria
- Under `-ea`, the Phase 2 audit assertion re-added to
`KeyCacheObjectImpl.prepareMarshal` / `CacheObjectImpl.prepareMarshal` does not
fire in `IgniteCacheQueryMultiThreadedSelfTest` or other multi-threaded suites.
- No measurable throughput regression on `GridCachePutAllFailoverSelfTest` and
`GridCacheRebalancingSyncSelfTest` (these exercise the same CO marshal paths on
the hot path).
- Unit test: two threads race on a fresh `KeyCacheObjectImpl.prepareMarshal` —
`ctx.marshal(val)` is invoked exactly once (verified via a counting
`CacheObjectValueContext` mock).
was:
h3. Background
A new code-generated serialization mechanism (`MessageSerializer`) has been
introduced as a replacement for the legacy approach where serialization logic
was embedded directly into message classes (`writeTo`/`readFrom`). To ensure
the new approach does not introduce performance regressions, JMH benchmarks are
needed.
h3. Problem with Current Implementation
The current PR benchmarks only the final `writeTo`/`readFrom` step, which is
effectively identical code in both approaches. The actual behavioral difference
between legacy and new serialization occurs at earlier stages: message
instantiation, dispatcher lookup, and factory calls. Additionally, using a
single real production message (`CacheMetricsMessage`) is problematic because:
- It will evolve over time, making the legacy copy stale and the benchmark
incorrect
- It keeps legacy code in master permanently, blocking full removal of the old
mechanism
- A single message type allows JVM to inline `ctor.newInstance()`, hiding
real-world dispatch costs
- It only covers primitive fields, missing the parts of codegen that actually
changed (nested messages, collections)
h2. Proposed Approach
h3. Synthetic Test Messages
Create a set of dedicated synthetic messages that will never evolve, each
implemented twice — once with legacy inline serialization and once via the new
`MessageSerializer` framework. The messages should cover a range of field
complexity:
| Message | Contents | Purpose |
|---|---|---|
| `BenchSingleFieldMessage` | 1 primitive field | Minimal baseline |
| `BenchSimpleMessage` | ~5 primitives (int, long, boolean, UUID, String) |
Basic primitive case |
| `BenchLargeMessage` | 100 primitive fields of various types | Stress test for
primitive-heavy messages |
| `BenchNestedMessage` | Primitives + 2–3 nested synthetic messages | Covers
nested message serialization changes |
| `BenchCollectionMessage` | Primitives + List of nested synthetic messages |
Covers collection serialization changes |
All nested messages used in `BenchNestedMessage` and `BenchCollectionMessage`
must also be synthetic (no production message types).
Running all five message types within a single benchmark ensures JVM cannot
effectively inline the message constructor call, producing results
representative of real-world dispatch behavior.
h3. Benchmark Scenarios
**Write benchmark** (`benchWrite`):
1. During `@Setup`: generate random test data for all fields and store in
memory.
2. During benchmark iteration: instantiate a new message, populate it with the
pre-generated test data, serialize it into a reusable output buffer.
**Read benchmark** (`benchRead`):
1. During `@Setup`: generate random test data, serialize it into pre-filled
byte buffers — one buffer per message type, stored in memory.
2. During benchmark iteration: instantiate a new message, read the message type
discriminator from the buffer, deserialize the full message from the buffer.
Both scenarios are implemented for both the legacy and new approaches, yielding
four benchmark methods per message type.
h3. What This Does Not Cover
End-to-end communication layer performance (including network I/O, thread
handoff, etc.) is out of scope for JMH. JMH is a microbenchmark tool and should
remain focused on the serialization layer in isolation. Broader performance
impact should be validated on dedicated load testing stands.
> СacheObject.prepareMarshal is not thread-safe — concurrent callers duplicate
> marshalling work
> ---------------------------------------------------------------------------------------------
>
> Key: IGNITE-28606
> URL: https://issues.apache.org/jira/browse/IGNITE-28606
> Project: Ignite
> Issue Type: Task
> Reporter: Alex Abashev
> Assignee: Alex Abashev
> Priority: Minor
> Labels: IEP-132, ise
>
> ## Summary
> `KeyCacheObjectImpl.prepareMarshal(CacheObjectValueContext)` and
> `CacheObjectImpl.prepareMarshal(...)` rely on an unsynchronized `if (valBytes
> == null)` guard as their idempotency check. Under concurrent access — e.g.
> when the same `CacheObject` instance is referenced from multiple transaction
> entries that are marshalled from different threads — two callers can both
> observe `valBytes == null`, both invoke `ctx.marshal(val)`, and race to
> publish their result. The work is duplicated (CPU + allocations) and the
> final field value depends on write ordering.
> ## Where observed
> Surfaced during IGNITE-28520 Phase 2 work (migration of hand-written
> `prepareMarshal` to auto-invocation of generated
> `MessageSerializer.prepareMarshalCacheObjects`). An experimental "called at
> most once per CO instance" assertion was added to `CacheObjectAdapter` to
> validate the new path. Running `IgniteCacheQueryMultiThreadedSelfTest` with
> `-ea` produced:
> ```
> java.lang.AssertionError: prepareMarshal called more than once on
> KeyCacheObjectImpl
> at
> org.apache.ignite.internal.processors.cache.CacheObjectAdapter.assertFirstPrepareMarshal(CacheObjectAdapter.java:53)
> at
> org.apache.ignite.internal.processors.cache.KeyCacheObjectImpl.prepareMarshal(KeyCacheObjectImpl.java:119)
> at
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxEntry.marshal(IgniteTxEntry.java:946)
> at
> org.apache.ignite.internal.processors.cache.GridCacheMessage.marshalTx(GridCacheMessage.java:380)
> at
> org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxPrepareRequest.prepareMarshal(GridDistributedTxPrepareRequest.java:379)
> at
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onSend(GridCacheIoManager.java:1152)
> at
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1223)
> ...
> ```
> The assertion was rolled back because it was incompatible with today's
> intentionally-racy design (see IGNITE-28520 Phase 2 audit, "Rejected idea:
> per-CO marshalled-once assertion"). The underlying race remains and is the
> subject of this ticket.
> ## Root cause
> ```java
> // KeyCacheObjectImpl
> @Override public void prepareMarshal(CacheObjectValueContext ctx) throws
> IgniteCheckedException {
> if (valBytes == null) // (1) unsynchronized read
> valBytes = ctx.marshal(val); // (2) unsynchronized write
> }
> ```
> Two threads executing `(1)` concurrently both see `null`, both execute `(2)`.
> `valBytes` is not `volatile`, so publication is also unspecified — another
> thread reading the field after the race can observe `null` even after both
> writers completed on other cores.
> Same pattern in `CacheObjectImpl.prepareMarshal`.
> `CacheObjectByteArrayImpl.prepareMarshal` is a no-op and is unaffected.
> ## Why it's a functional no-op today (but still a bug)
> - `val` is effectively immutable once the `CacheObject` is constructed.
> - `ctx.marshal(val)` is deterministic: equal inputs produce equal byte arrays.
> - Final read by the NIO writer goes through `writeExternal` / `putValue` /
> `valueBytesLength`, each of which calls `valueBytes(ctx)` that re-does the
> idempotency check and will populate `valBytes` on the calling thread if both
> racers somehow lost their writes.
> So the race does not corrupt on-wire output in practice. What it does cost:
> - **Wasted CPU**: duplicate `Marshaller.marshal` calls on contended keys (hot
> paths during tx prepare when the same key appears in multiple `IgniteTxEntry`
> instances).
> - **Wasted allocation**: each losing racer produces a `byte[]` that's
> immediately garbage.
> - **Debugging friction**: any future audit or assertion that presumes
> "marshal happens once per CO" cannot be added without first fixing this.
> - **Specification drift**: the contract is effectively "idempotent under
> concurrency by accident" — reliant on value immutability and marshaller
> determinism rather than stated.
> ## Reproduction
> - Run `IgniteCacheQueryMultiThreadedSelfTest` with `-ea` after temporarily
> re-adding the assertion from the Phase 2 audit:
> ```java
> // in KeyCacheObjectImpl.prepareMarshal
> if (valBytes == null) {
> assert assertFirstPrepareMarshal();
> valBytes = ctx.marshal(val);
> }
> ```
> Plus the transient flag + helper on `CacheObjectAdapter`.
> - The assertion fires from the `IgniteTxEntry.marshal → key.prepareMarshal`
> call site when a key is shared across concurrent prepare flows.
> ## Proposed fix
> Synchronize the check-and-set. Two options, in order of preference:
> 1. **Double-checked locking on a per-instance monitor**, keeping the hot
> no-op branch lock-free:
> ```java
> @Override public void prepareMarshal(CacheObjectValueContext ctx) throws
> IgniteCheckedException {
> if (valBytes != null)
> return;
> synchronized (this) {
> if (valBytes == null)
> valBytes = ctx.marshal(val);
> }
> }
> ```
> Requires making `valBytes` `volatile` for safe DCL. Uncontended cost: one
> volatile read. Contended: `synchronized` on the `CacheObject` itself, which
> is already the per-key identity.
> 2. **Atomic publish via `VarHandle` / `Unsafe.compareAndSetObject`** on
> `valBytes` — winner's bytes publish, loser's garbage. Avoids entering
> `synchronized`, but requires a VarHandle field and the losing thread still
> does the marshal work.
> Option 1 eliminates both races and duplicate work; option 2 eliminates the
> race but preserves duplicate work. Option 1 is recommended.
> Mirror the same treatment in `CacheObjectImpl.prepareMarshal` (which calls
> `valueBytesFromValue(ctx)` instead of `ctx.marshal(val)`).
> ## Scope
> - `modules/core`:
> - `KeyCacheObjectImpl.prepareMarshal`
> - `CacheObjectImpl.prepareMarshal`
> - `CacheObjectAdapter.valBytes` → `volatile byte[]`
> - No wire-protocol change (field layout unchanged).
> - No API change.
> ## Acceptance criteria
> - Under `-ea`, the Phase 2 audit assertion re-added to
> `KeyCacheObjectImpl.prepareMarshal` / `CacheObjectImpl.prepareMarshal` does
> not fire in `IgniteCacheQueryMultiThreadedSelfTest` or other multi-threaded
> suites.
> - No measurable throughput regression on `GridCachePutAllFailoverSelfTest`
> and `GridCacheRebalancingSyncSelfTest` (these exercise the same CO marshal
> paths on the hot path).
> - Unit test: two threads race on a fresh `KeyCacheObjectImpl.prepareMarshal`
> — `ctx.marshal(val)` is invoked exactly once (verified via a counting
> `CacheObjectValueContext` mock).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)