[jira] [Updated] (IGNITE-28606) СacheObject.prepareMarshal is not thread-safe — concurrent callers duplicate marshalling work

Alex Abashev (Jira) Fri, 24 Apr 2026 08:53:01 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-28606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Alex Abashev updated IGNITE-28606:
----------------------------------
    Description: 
## Summary

`KeyCacheObjectImpl.prepareMarshal(CacheObjectValueContext)` and 
`CacheObjectImpl.prepareMarshal(...)` rely on an unsynchronized `if (valBytes 
== null)` guard as their idempotency check. Under concurrent access — e.g. when 
the same `CacheObject` instance is referenced from multiple transaction entries 
that are marshalled from different threads — two callers can both observe 
`valBytes == null`, both invoke `ctx.marshal(val)`, and race to publish their 
result. The work is duplicated (CPU + allocations) and the final field value 
depends on write ordering.

## Where observed

Surfaced during IGNITE-28520 Phase 2 work (migration of hand-written 
`prepareMarshal` to auto-invocation of generated 
`MessageSerializer.prepareMarshalCacheObjects`). An experimental "called at 
most once per CO instance" assertion was added to `CacheObjectAdapter` to 
validate the new path. Running `IgniteCacheQueryMultiThreadedSelfTest` with 
`-ea` produced:

```
java.lang.AssertionError: prepareMarshal called more than once on 
KeyCacheObjectImpl
    at 
org.apache.ignite.internal.processors.cache.CacheObjectAdapter.assertFirstPrepareMarshal(CacheObjectAdapter.java:53)
    at 
org.apache.ignite.internal.processors.cache.KeyCacheObjectImpl.prepareMarshal(KeyCacheObjectImpl.java:119)
    at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxEntry.marshal(IgniteTxEntry.java:946)
    at 
org.apache.ignite.internal.processors.cache.GridCacheMessage.marshalTx(GridCacheMessage.java:380)
    at 
org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxPrepareRequest.prepareMarshal(GridDistributedTxPrepareRequest.java:379)
    at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onSend(GridCacheIoManager.java:1152)
    at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1223)
    ...
```

The assertion was rolled back because it was incompatible with today's 
intentionally-racy design (see IGNITE-28520 Phase 2 audit, "Rejected idea: 
per-CO marshalled-once assertion"). The underlying race remains and is the 
subject of this ticket.

## Root cause

```java
// KeyCacheObjectImpl
@Override public void prepareMarshal(CacheObjectValueContext ctx) throws 
IgniteCheckedException {
    if (valBytes == null)                  // (1) unsynchronized read
        valBytes = ctx.marshal(val);       // (2) unsynchronized write
}
```

Two threads executing `(1)` concurrently both see `null`, both execute `(2)`. 
`valBytes` is not `volatile`, so publication is also unspecified — another 
thread reading the field after the race can observe `null` even after both 
writers completed on other cores.

Same pattern in `CacheObjectImpl.prepareMarshal`. 
`CacheObjectByteArrayImpl.prepareMarshal` is a no-op and is unaffected.

## Why it's a functional no-op today (but still a bug)

- `val` is effectively immutable once the `CacheObject` is constructed.
- `ctx.marshal(val)` is deterministic: equal inputs produce equal byte arrays.
- Final read by the NIO writer goes through `writeExternal` / `putValue` / 
`valueBytesLength`, each of which calls `valueBytes(ctx)` that re-does the 
idempotency check and will populate `valBytes` on the calling thread if both 
racers somehow lost their writes.

So the race does not corrupt on-wire output in practice. What it does cost:

- **Wasted CPU**: duplicate `Marshaller.marshal` calls on contended keys (hot 
paths during tx prepare when the same key appears in multiple `IgniteTxEntry` 
instances).
- **Wasted allocation**: each losing racer produces a `byte[]` that's 
immediately garbage.
- **Debugging friction**: any future audit or assertion that presumes "marshal 
happens once per CO" cannot be added without first fixing this.
- **Specification drift**: the contract is effectively "idempotent under 
concurrency by accident" — reliant on value immutability and marshaller 
determinism rather than stated.

## Reproduction

- Run `IgniteCacheQueryMultiThreadedSelfTest` with `-ea` after temporarily 
re-adding the assertion from the Phase 2 audit:
  ```java
  // in KeyCacheObjectImpl.prepareMarshal
  if (valBytes == null) {
      assert assertFirstPrepareMarshal();
      valBytes = ctx.marshal(val);
  }
  ```
  Plus the transient flag + helper on `CacheObjectAdapter`.
- The assertion fires from the `IgniteTxEntry.marshal → key.prepareMarshal` 
call site when a key is shared across concurrent prepare flows.

## Proposed fix

Synchronize the check-and-set. Two options, in order of preference:

1. **Double-checked locking on a per-instance monitor**, keeping the hot no-op 
branch lock-free:
   ```java
   @Override public void prepareMarshal(CacheObjectValueContext ctx) throws 
IgniteCheckedException {
       if (valBytes != null)
           return;
       synchronized (this) {
           if (valBytes == null)
               valBytes = ctx.marshal(val);
       }
   }
   ```
   Requires making `valBytes` `volatile` for safe DCL. Uncontended cost: one 
volatile read. Contended: `synchronized` on the `CacheObject` itself, which is 
already the per-key identity.

2. **Atomic publish via `VarHandle` / `Unsafe.compareAndSetObject`** on 
`valBytes` — winner's bytes publish, loser's garbage. Avoids entering 
`synchronized`, but requires a VarHandle field and the losing thread still does 
the marshal work.

Option 1 eliminates both races and duplicate work; option 2 eliminates the race 
but preserves duplicate work. Option 1 is recommended.

Mirror the same treatment in `CacheObjectImpl.prepareMarshal` (which calls 
`valueBytesFromValue(ctx)` instead of `ctx.marshal(val)`).

## Scope

- `modules/core`:
  - `KeyCacheObjectImpl.prepareMarshal`
  - `CacheObjectImpl.prepareMarshal`
  - `CacheObjectAdapter.valBytes` → `volatile byte[]`
- No wire-protocol change (field layout unchanged).
- No API change.

## Acceptance criteria

- Under `-ea`, the Phase 2 audit assertion re-added to 
`KeyCacheObjectImpl.prepareMarshal` / `CacheObjectImpl.prepareMarshal` does not 
fire in `IgniteCacheQueryMultiThreadedSelfTest` or other multi-threaded suites.
- No measurable throughput regression on `GridCachePutAllFailoverSelfTest` and 
`GridCacheRebalancingSyncSelfTest` (these exercise the same CO marshal paths on 
the hot path).
- Unit test: two threads race on a fresh `KeyCacheObjectImpl.prepareMarshal` — 
`ctx.marshal(val)` is invoked exactly once (verified via a counting 
`CacheObjectValueContext` mock).

  was:
h3. Background

A new code-generated serialization mechanism (`MessageSerializer`) has been 
introduced as a replacement for the legacy approach where serialization logic 
was embedded directly into message classes (`writeTo`/`readFrom`). To ensure 
the new approach does not introduce performance regressions, JMH benchmarks are 
needed.

h3. Problem with Current Implementation

The current PR benchmarks only the final `writeTo`/`readFrom` step, which is 
effectively identical code in both approaches. The actual behavioral difference 
between legacy and new serialization occurs at earlier stages: message 
instantiation, dispatcher lookup, and factory calls. Additionally, using a 
single real production message (`CacheMetricsMessage`) is problematic because:

- It will evolve over time, making the legacy copy stale and the benchmark 
incorrect
- It keeps legacy code in master permanently, blocking full removal of the old 
mechanism
- A single message type allows JVM to inline `ctor.newInstance()`, hiding 
real-world dispatch costs
- It only covers primitive fields, missing the parts of codegen that actually 
changed (nested messages, collections)

h2. Proposed Approach

h3. Synthetic Test Messages

Create a set of dedicated synthetic messages that will never evolve, each 
implemented twice — once with legacy inline serialization and once via the new 
`MessageSerializer` framework. The messages should cover a range of field 
complexity:

| Message | Contents | Purpose |
|---|---|---|
| `BenchSingleFieldMessage` | 1 primitive field | Minimal baseline |
| `BenchSimpleMessage` | ~5 primitives (int, long, boolean, UUID, String) | 
Basic primitive case |
| `BenchLargeMessage` | 100 primitive fields of various types | Stress test for 
primitive-heavy messages |
| `BenchNestedMessage` | Primitives + 2–3 nested synthetic messages | Covers 
nested message serialization changes |
| `BenchCollectionMessage` | Primitives + List of nested synthetic messages | 
Covers collection serialization changes |

All nested messages used in `BenchNestedMessage` and `BenchCollectionMessage` 
must also be synthetic (no production message types).

Running all five message types within a single benchmark ensures JVM cannot 
effectively inline the message constructor call, producing results 
representative of real-world dispatch behavior.

h3. Benchmark Scenarios

**Write benchmark** (`benchWrite`):

1. During `@Setup`: generate random test data for all fields and store in 
memory.
2. During benchmark iteration: instantiate a new message, populate it with the 
pre-generated test data, serialize it into a reusable output buffer.

**Read benchmark** (`benchRead`):

1. During `@Setup`: generate random test data, serialize it into pre-filled 
byte buffers — one buffer per message type, stored in memory.
2. During benchmark iteration: instantiate a new message, read the message type 
discriminator from the buffer, deserialize the full message from the buffer.

Both scenarios are implemented for both the legacy and new approaches, yielding 
four benchmark methods per message type.

h3. What This Does Not Cover

End-to-end communication layer performance (including network I/O, thread 
handoff, etc.) is out of scope for JMH. JMH is a microbenchmark tool and should 
remain focused on the serialization layer in isolation. Broader performance 
impact should be validated on dedicated load testing stands.


> СacheObject.prepareMarshal is not thread-safe — concurrent callers duplicate 
> marshalling work
> ---------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-28606
>                 URL: https://issues.apache.org/jira/browse/IGNITE-28606
>             Project: Ignite
>          Issue Type: Task
>            Reporter: Alex Abashev
>            Assignee: Alex Abashev
>            Priority: Minor
>              Labels: IEP-132, ise
>
> ## Summary
> `KeyCacheObjectImpl.prepareMarshal(CacheObjectValueContext)` and 
> `CacheObjectImpl.prepareMarshal(...)` rely on an unsynchronized `if (valBytes 
> == null)` guard as their idempotency check. Under concurrent access — e.g. 
> when the same `CacheObject` instance is referenced from multiple transaction 
> entries that are marshalled from different threads — two callers can both 
> observe `valBytes == null`, both invoke `ctx.marshal(val)`, and race to 
> publish their result. The work is duplicated (CPU + allocations) and the 
> final field value depends on write ordering.
> ## Where observed
> Surfaced during IGNITE-28520 Phase 2 work (migration of hand-written 
> `prepareMarshal` to auto-invocation of generated 
> `MessageSerializer.prepareMarshalCacheObjects`). An experimental "called at 
> most once per CO instance" assertion was added to `CacheObjectAdapter` to 
> validate the new path. Running `IgniteCacheQueryMultiThreadedSelfTest` with 
> `-ea` produced:
> ```
> java.lang.AssertionError: prepareMarshal called more than once on 
> KeyCacheObjectImpl
>     at 
> org.apache.ignite.internal.processors.cache.CacheObjectAdapter.assertFirstPrepareMarshal(CacheObjectAdapter.java:53)
>     at 
> org.apache.ignite.internal.processors.cache.KeyCacheObjectImpl.prepareMarshal(KeyCacheObjectImpl.java:119)
>     at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxEntry.marshal(IgniteTxEntry.java:946)
>     at 
> org.apache.ignite.internal.processors.cache.GridCacheMessage.marshalTx(GridCacheMessage.java:380)
>     at 
> org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxPrepareRequest.prepareMarshal(GridDistributedTxPrepareRequest.java:379)
>     at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onSend(GridCacheIoManager.java:1152)
>     at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1223)
>     ...
> ```
> The assertion was rolled back because it was incompatible with today's 
> intentionally-racy design (see IGNITE-28520 Phase 2 audit, "Rejected idea: 
> per-CO marshalled-once assertion"). The underlying race remains and is the 
> subject of this ticket.
> ## Root cause
> ```java
> // KeyCacheObjectImpl
> @Override public void prepareMarshal(CacheObjectValueContext ctx) throws 
> IgniteCheckedException {
>     if (valBytes == null)                  // (1) unsynchronized read
>         valBytes = ctx.marshal(val);       // (2) unsynchronized write
> }
> ```
> Two threads executing `(1)` concurrently both see `null`, both execute `(2)`. 
> `valBytes` is not `volatile`, so publication is also unspecified — another 
> thread reading the field after the race can observe `null` even after both 
> writers completed on other cores.
> Same pattern in `CacheObjectImpl.prepareMarshal`. 
> `CacheObjectByteArrayImpl.prepareMarshal` is a no-op and is unaffected.
> ## Why it's a functional no-op today (but still a bug)
> - `val` is effectively immutable once the `CacheObject` is constructed.
> - `ctx.marshal(val)` is deterministic: equal inputs produce equal byte arrays.
> - Final read by the NIO writer goes through `writeExternal` / `putValue` / 
> `valueBytesLength`, each of which calls `valueBytes(ctx)` that re-does the 
> idempotency check and will populate `valBytes` on the calling thread if both 
> racers somehow lost their writes.
> So the race does not corrupt on-wire output in practice. What it does cost:
> - **Wasted CPU**: duplicate `Marshaller.marshal` calls on contended keys (hot 
> paths during tx prepare when the same key appears in multiple `IgniteTxEntry` 
> instances).
> - **Wasted allocation**: each losing racer produces a `byte[]` that's 
> immediately garbage.
> - **Debugging friction**: any future audit or assertion that presumes 
> "marshal happens once per CO" cannot be added without first fixing this.
> - **Specification drift**: the contract is effectively "idempotent under 
> concurrency by accident" — reliant on value immutability and marshaller 
> determinism rather than stated.
> ## Reproduction
> - Run `IgniteCacheQueryMultiThreadedSelfTest` with `-ea` after temporarily 
> re-adding the assertion from the Phase 2 audit:
>   ```java
>   // in KeyCacheObjectImpl.prepareMarshal
>   if (valBytes == null) {
>       assert assertFirstPrepareMarshal();
>       valBytes = ctx.marshal(val);
>   }
>   ```
>   Plus the transient flag + helper on `CacheObjectAdapter`.
> - The assertion fires from the `IgniteTxEntry.marshal → key.prepareMarshal` 
> call site when a key is shared across concurrent prepare flows.
> ## Proposed fix
> Synchronize the check-and-set. Two options, in order of preference:
> 1. **Double-checked locking on a per-instance monitor**, keeping the hot 
> no-op branch lock-free:
>    ```java
>    @Override public void prepareMarshal(CacheObjectValueContext ctx) throws 
> IgniteCheckedException {
>        if (valBytes != null)
>            return;
>        synchronized (this) {
>            if (valBytes == null)
>                valBytes = ctx.marshal(val);
>        }
>    }
>    ```
>    Requires making `valBytes` `volatile` for safe DCL. Uncontended cost: one 
> volatile read. Contended: `synchronized` on the `CacheObject` itself, which 
> is already the per-key identity.
> 2. **Atomic publish via `VarHandle` / `Unsafe.compareAndSetObject`** on 
> `valBytes` — winner's bytes publish, loser's garbage. Avoids entering 
> `synchronized`, but requires a VarHandle field and the losing thread still 
> does the marshal work.
> Option 1 eliminates both races and duplicate work; option 2 eliminates the 
> race but preserves duplicate work. Option 1 is recommended.
> Mirror the same treatment in `CacheObjectImpl.prepareMarshal` (which calls 
> `valueBytesFromValue(ctx)` instead of `ctx.marshal(val)`).
> ## Scope
> - `modules/core`:
>   - `KeyCacheObjectImpl.prepareMarshal`
>   - `CacheObjectImpl.prepareMarshal`
>   - `CacheObjectAdapter.valBytes` → `volatile byte[]`
> - No wire-protocol change (field layout unchanged).
> - No API change.
> ## Acceptance criteria
> - Under `-ea`, the Phase 2 audit assertion re-added to 
> `KeyCacheObjectImpl.prepareMarshal` / `CacheObjectImpl.prepareMarshal` does 
> not fire in `IgniteCacheQueryMultiThreadedSelfTest` or other multi-threaded 
> suites.
> - No measurable throughput regression on `GridCachePutAllFailoverSelfTest` 
> and `GridCacheRebalancingSyncSelfTest` (these exercise the same CO marshal 
> paths on the hot path).
> - Unit test: two threads race on a fresh `KeyCacheObjectImpl.prepareMarshal` 
> — `ctx.marshal(val)` is invoked exactly once (verified via a counting 
> `CacheObjectValueContext` mock).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-28606) СacheObject.prepareMarshal is not thread-safe — concurrent callers duplicate marshalling work

Reply via email to