Alex Abashev created IGNITE-28606:
-------------------------------------
Summary: CacheObject.prepareMarshal is not thread-safe: concurrent
callers duplicate marshalling work
Key: IGNITE-28606
URL: https://issues.apache.org/jira/browse/IGNITE-28606
Project: Ignite
Issue Type: Task
Reporter: Alex Abashev
Assignee: Alex Abashev
h3. Background
A new code-generated serialization mechanism (`MessageSerializer`) has been
introduced as a replacement for the legacy approach where serialization logic
was embedded directly into message classes (`writeTo`/`readFrom`). To ensure
the new approach does not introduce performance regressions, JMH benchmarks are
needed.
h3. Problem with Current Implementation
The current PR benchmarks only the final `writeTo`/`readFrom` step, which is
effectively identical code in both approaches. The actual behavioral difference
between legacy and new serialization occurs at earlier stages: message
instantiation, dispatcher lookup, and factory calls. Additionally, using a
single real production message (`CacheMetricsMessage`) is problematic because:
- It will evolve over time, making the legacy copy stale and the benchmark
incorrect
- It keeps legacy code in master permanently, blocking full removal of the old
mechanism
- A single message type allows the JVM to inline `ctor.newInstance()`, hiding
real-world dispatch costs
- It only covers primitive fields, missing the parts of codegen that actually
changed (nested messages, collections)
h2. Proposed Approach
h3. Synthetic Test Messages
Create a set of dedicated synthetic messages that will never evolve, each
implemented twice — once with legacy inline serialization and once via the new
`MessageSerializer` framework. The messages should cover a range of field
complexity:
| Message | Contents | Purpose |
|---|---|---|
| `BenchSingleFieldMessage` | 1 primitive field | Minimal baseline |
| `BenchSimpleMessage` | ~5 primitives (int, long, boolean, UUID, String) | Basic primitive case |
| `BenchLargeMessage` | 100 primitive fields of various types | Stress test for primitive-heavy messages |
| `BenchNestedMessage` | Primitives + 2–3 nested synthetic messages | Covers nested message serialization changes |
| `BenchCollectionMessage` | Primitives + List of nested synthetic messages | Covers collection serialization changes |
All nested messages used in `BenchNestedMessage` and `BenchCollectionMessage`
must also be synthetic (no production message types).
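To illustrate what "implemented twice" could look like, here is a minimal sketch for `BenchSimpleMessage`. It is an assumption-laden stand-in, not the actual Ignite API: the `Legacy` prefix, the `BenchSimpleMessageSerializer` class, and the direct use of `ByteBuffer` are all hypothetical simplifications, and the serializer is hand-written here where the real one would be generated.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Legacy style (hypothetical shape): serialization logic lives inside the message. */
class LegacyBenchSimpleMessage {
    int intVal;
    long longVal;
    boolean boolVal;
    UUID uuidVal;
    String strVal;

    void writeTo(ByteBuffer buf) {
        buf.putInt(intVal);
        buf.putLong(longVal);
        buf.put((byte)(boolVal ? 1 : 0));
        buf.putLong(uuidVal.getMostSignificantBits());
        buf.putLong(uuidVal.getLeastSignificantBits());
        byte[] str = strVal.getBytes(StandardCharsets.UTF_8);
        buf.putInt(str.length); // Length prefix, then the UTF-8 bytes.
        buf.put(str);
    }

    void readFrom(ByteBuffer buf) {
        intVal = buf.getInt();
        longVal = buf.getLong();
        boolVal = buf.get() != 0;
        uuidVal = new UUID(buf.getLong(), buf.getLong()); // Most significant bits first.
        byte[] str = new byte[buf.getInt()];
        buf.get(str);
        strVal = new String(str, StandardCharsets.UTF_8);
    }
}

/** New style (hypothetical shape): the message is a plain field holder. */
class BenchSimpleMessage {
    int intVal;
    long longVal;
    boolean boolVal;
    UUID uuidVal;
    String strVal;
}

/** New style: a separate serializer class, hand-written here as a stand-in for generated code. */
class BenchSimpleMessageSerializer {
    void writeTo(BenchSimpleMessage msg, ByteBuffer buf) {
        buf.putInt(msg.intVal);
        buf.putLong(msg.longVal);
        buf.put((byte)(msg.boolVal ? 1 : 0));
        buf.putLong(msg.uuidVal.getMostSignificantBits());
        buf.putLong(msg.uuidVal.getLeastSignificantBits());
        byte[] str = msg.strVal.getBytes(StandardCharsets.UTF_8);
        buf.putInt(str.length);
        buf.put(str);
    }

    void readFrom(BenchSimpleMessage msg, ByteBuffer buf) {
        msg.intVal = buf.getInt();
        msg.longVal = buf.getLong();
        msg.boolVal = buf.get() != 0;
        msg.uuidVal = new UUID(buf.getLong(), buf.getLong());
        byte[] str = new byte[buf.getInt()];
        buf.get(str);
        msg.strVal = new String(str, StandardCharsets.UTF_8);
    }
}
```

In this sketch both variants produce the same byte layout, so any measured difference would come from how the message is created and dispatched rather than from the field encoding.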
Running all five message types within a single benchmark ensures the JVM cannot
effectively inline the message constructor call, producing results
representative of real-world dispatch behavior.
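A rough sketch of mixing all five types behind one call site is shown below. It assumes a hypothetical `BenchMessage` supertype, a `typeId()` discriminator, and a `Supplier`-based factory list; the real dispatch path may use `ctor.newInstance()` or a message factory instead. The message bodies are left empty on purpose, since only the dispatch shape matters here.

```java
import java.util.List;
import java.util.function.Supplier;

final class MixedTypeDispatchSketch {
    /** Hypothetical common supertype; typeId() stands in for the type discriminator. */
    interface BenchMessage {
        short typeId();
    }

    static final class BenchSingleFieldMessage implements BenchMessage {
        public short typeId() { return 0; }
    }

    static final class BenchSimpleMessage implements BenchMessage {
        public short typeId() { return 1; }
    }

    static final class BenchLargeMessage implements BenchMessage {
        public short typeId() { return 2; }
    }

    static final class BenchNestedMessage implements BenchMessage {
        public short typeId() { return 3; }
    }

    static final class BenchCollectionMessage implements BenchMessage {
        public short typeId() { return 4; }
    }

    /** Factory per discriminator; the list index equals typeId(). */
    static final List<Supplier<BenchMessage>> FACTORIES = List.of(
        BenchSingleFieldMessage::new,
        BenchSimpleMessage::new,
        BenchLargeMessage::new,
        BenchNestedMessage::new,
        BenchCollectionMessage::new
    );

    static BenchMessage create(int type) {
        // One call site sees five different factories and five receiver types.
        return FACTORIES.get(type).get();
    }

    /** One pass over every type, as a single benchmark invocation might do. */
    static int createAll() {
        int sum = 0;

        for (int type = 0; type < FACTORIES.size(); type++)
            sum += create(type).typeId(); // Use the result so the allocation is not trivially dead.

        return sum;
    }
}
```

The intent is that the `get()` and `typeId()` call sites observe five implementations each, which is more than HotSpot typically specializes for, so the JIT has less opportunity to collapse the dispatch the way it can with a single message type.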
h3. Benchmark Scenarios
**Write benchmark** (`benchWrite`):
1. During `@Setup`: generate random test data for all fields and store in
memory.
2. During benchmark iteration: instantiate a new message, populate it with the
pre-generated test data, serialize it into a reusable output buffer.
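The write scenario could be sketched roughly as follows. This is plain Java without the JMH dependency so that it stays self-contained: in a real benchmark class `setup()` would carry JMH's `@Setup` annotation and `benchWrite()` would carry `@Benchmark`. The one-field message, the fixed seed, and the buffer size are illustrative assumptions.

```java
import java.nio.ByteBuffer;
import java.util.Random;

final class WriteBenchSketch {
    /** Cut-down stand-in for BenchSingleFieldMessage with inline serialization. */
    static final class BenchSingleFieldMessage {
        long val;

        void writeTo(ByteBuffer buf) {
            buf.putLong(val);
        }
    }

    private long testVal;   // Pre-generated test data.
    private ByteBuffer out; // Reusable output buffer.

    /** Would be annotated with @Setup in a JMH state class. */
    void setup() {
        testVal = new Random(42).nextLong(); // Fixed seed keeps runs comparable.
        out = ByteBuffer.allocate(64);
    }

    /** Would be annotated with @Benchmark; returning the buffer keeps the work observable. */
    ByteBuffer benchWrite() {
        BenchSingleFieldMessage msg = new BenchSingleFieldMessage(); // New instance per iteration.

        msg.val = testVal;

        out.clear(); // Reset position and limit instead of allocating a new buffer.
        msg.writeTo(out);

        return out;
    }
}
```

Generating the data in setup keeps random number generation out of the measured path, while instantiating the message inside the measured method keeps allocation in it, as the steps above require.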
**Read benchmark** (`benchRead`):
1. During `@Setup`: generate random test data, serialize it into pre-filled
byte buffers — one buffer per message type, stored in memory.
2. During benchmark iteration: read the message type discriminator from the
buffer, instantiate a new message of that type, deserialize the full message
from the buffer.
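A comparable sketch for the read scenario is given below, again in plain Java with comments marking where JMH's `@Setup` and `@Benchmark` would go. It assumes a 2-byte discriminator at the start of each buffer, a hypothetical `BenchMessage` interface, and only two cut-down message types; the discriminator is read first here because it selects the factory.

```java
import java.nio.ByteBuffer;
import java.util.List;
import java.util.Random;
import java.util.function.Supplier;

final class ReadBenchSketch {
    /** Hypothetical common supertype with inline deserialization. */
    interface BenchMessage {
        void readFrom(ByteBuffer buf);
    }

    /** Cut-down stand-in: one long field. */
    static final class BenchSingleFieldMessage implements BenchMessage {
        long val;

        public void readFrom(ByteBuffer buf) {
            val = buf.getLong();
        }
    }

    /** Cut-down stand-in: only two of the ~5 fields. */
    static final class BenchSimpleMessage implements BenchMessage {
        int intVal;
        long longVal;

        public void readFrom(ByteBuffer buf) {
            intVal = buf.getInt();
            longVal = buf.getLong();
        }
    }

    /** Factory per discriminator; the list index is the discriminator value. */
    static final List<Supplier<BenchMessage>> FACTORIES =
        List.of(BenchSingleFieldMessage::new, BenchSimpleMessage::new);

    private ByteBuffer[] bufs; // One pre-filled buffer per message type.

    /** Would be annotated with @Setup in a JMH state class. */
    void setup() {
        Random rnd = new Random(42);

        bufs = new ByteBuffer[FACTORIES.size()];

        bufs[0] = ByteBuffer.allocate(64);
        bufs[0].putShort((short)0).putLong(rnd.nextLong()).flip();

        bufs[1] = ByteBuffer.allocate(64);
        bufs[1].putShort((short)1).putInt(rnd.nextInt()).putLong(rnd.nextLong()).flip();
    }

    /** Would be annotated with @Benchmark; the type index could come from a JMH @Param. */
    BenchMessage benchRead(int type) {
        ByteBuffer buf = bufs[type];

        buf.rewind(); // Re-read the same bytes on every invocation.

        short discriminator = buf.getShort();
        BenchMessage msg = FACTORIES.get(discriminator).get(); // Factory lookup plus instantiation.

        msg.readFrom(buf);

        return msg;
    }
}
```

Rewinding rather than re-serializing keeps the buffer contents fixed across invocations, so the measured path contains only discriminator lookup, instantiation, and deserialization.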
Both scenarios are implemented for both the legacy and new approaches, yielding
four benchmark methods per message type.
h3. What This Does Not Cover
End-to-end communication layer performance (including network I/O, thread
handoff, etc.) is out of scope for JMH. JMH is a microbenchmark tool and should
remain focused on the serialization layer in isolation. Broader performance
impact should be validated on dedicated load testing stands.