[
https://issues.apache.org/jira/browse/IGNITE-28520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alex Abashev updated IGNITE-28520:
----------------------------------
Description:
h2. Summary
Move generic {{CacheObject}} marshalling for outgoing cache messages from the
NIO communication worker to the user thread, so the IO loop is no longer
blocked on serialisation work that has nothing to do with the wire protocol
itself.
The send path now does:
# {{msg.prepareMarshal(cctx)}} — existing hand-written hook on
{{GridCacheMessage}} subclasses (legacy);
# {{ser.prepareMarshalCacheObjects(msg, cacheObjCtx)}} — new auto-invoke of the
generated serializer's traversal of {{@Order}}-annotated {{CacheObject}} /
{{KeyCacheObject}} / nested {{Message}} fields.
Both run on the user (sender) thread inside {{GridCacheIoManager.onSend}}
before the message is enqueued for NIO; the NIO worker only does the actual
byte writes.
h2. Scope of this ticket
* New {{default}} method {{MessageSerializer#prepareMarshalCacheObjects(M,
CacheObjectValueContext)}} (no-op default).
* {{MessageProcessor}} / {{MessageSerializerGenerator}} emit this method on
every concrete serializer with at least one traversable {{@Order}} field;
recursion into nested concrete {{Message}} types via per-class {{*_SER}} static
fields.
* {{GridCacheIoManager.onSend}} calls
{{prepareMarshalGeneratedCacheObjects(msg)}} — gated on {{msg instanceof
GridCacheIdMessage}}, takes {{cacheObjectContext}} from
{{((GridCacheIdMessage)msg).cacheId()}}.
* Phase 2 cleanup: the hand-written {{prepareMarshal(GridCacheSharedContext)}}
override is removed from {{GridCacheMessage}} subclasses where its body was a
pure {{@Order}}-CO traversal that the generator now performs:
** {{GridCacheTtlUpdateRequest}}, {{GridDhtForceKeysRequest}},
{{GridNearUnlockRequest}}, {{GridDhtUnlockRequest}},
{{GridDistributedLockRequest}}, {{GridDistributedLockResponse}},
{{GridDhtAtomicUpdateResponse}}, {{GridDhtAtomicNearResponse}},
{{GridNearGetRequest}}.
** Also removed: {{ret.prepareMarshal(cctx)}} call in
{{GridNearAtomicUpdateResponse.prepareMarshal}} (covered by nested-message
recursion in the generated serializer).
* {{MessageSerializerGenerator.isRecursableMessage}} narrowed:
nested-{{Message}} recursion is blocked only for {{GridCacheIdMessage}}
subclasses (top-level routable). Helper messages with {{@Order int cacheId}}
({{GridCacheReturn}}, {{IgniteTxKey}}, {{IgniteTxEntry}},
{{GridCacheEntryInfo}}, {{UpdateErrors}}) become recursable.
* Generator hygiene: emits {{private static final}} (modern modifier order);
{{/**}} class javadoc starts without trailing space; no trailing newline; empty
line between top-level {{if}}-blocks in {{prepareMarshalCacheObjects}} per
Ignite coding guidelines.
h2. Out of scope
* Mirror work for the receive direction ({{finishUnmarshalCacheObjects}}) —
split into a separate ticket (IGNITE-YYYYY).
* Extending the auto-invoke beyond {{GridCacheIdMessage}} subclasses or to
messages whose retVal carries its own {{cacheId}} — split into a separate
ticket (IGNITE-ZZZZZ); blocks removal of the {{GridCacheReturn.SER}} bridge and
the explicit {{retVal.prepareMarshal(cctx)}} calls in
{{GridDhtTxFinishResponse}} / {{GridNearTxPrepareResponse}}.
h2. Acceptance criteria
* {{GridCacheIoManager.onSend}} invokes
{{prepareMarshalGeneratedCacheObjects(msg)}} for {{GridCacheIdMessage}}
subclasses with non-{{null}} {{cacheObjectContext}}.
* {{MessageProcessorTest}} green; fixture serializers byte-identical with
generator output.
* Targeted suites green: {{GridCacheReturnValueTransferSelfTest}},
{{IgniteCacheEntryProcessorCallTest}},
{{GridCachePartitionedTxMultiThreadedSelfTest}},
{{GridCacheRebalancingSyncSelfTest}} (force-keys path),
{{GridCachePutAllFailoverSelfTest}} (atomic update / unlock path),
{{IgniteCacheQueryMultiThreadedSelfTest}} (query/eviction path).
* No regression in possible-blockers from the TC bot for the PR.
was:
*Background / Problem Statement:*
After moving the marshalling methods (prepareMarshal / finishUnmarshal) into
the NIO thread, two related issues emerged:
Performance degradation (IGNITE-28473). Marshalling of CustomObject/CacheObject
now happens in a single NIO worker, whereas previously it was done in parallel
across user threads.
Deadlock in Discovery. The marshaller broadcasts a class registration message
across the cluster and waits for acknowledgement from all nodes. If marshalling
happens on the Discovery thread, a deadlock occurs: the thread waits for a
response to a message it is supposed to process itself.
Root cause: the serializer invokes prepareMarshal / finishUnmarshal directly on
the sending thread (NIO / Discovery), whereas these methods must be executed on
a user thread.
*Proposed Solution (Phase 1):*
Implement two-phase marshalling for CacheObject fields:
Phase 1 — on the send call thread (user thread): Add methods to the generated
serializer that recursively traverse all @Order-annotated fields, locate
CacheObject fields (including nested ones and those inside collections), invoke
prepareMarshal, and store the result in a byte[].
Phase 2 — on the NIO sending thread: The serializer reads the pre-computed
byte[] and writes them to the socket. prepareMarshal is not called.
This phase covers only CacheObject fields generated by the code generator via
@Order. Manual code for MarshallableMessage fields (e.g.
GridJobExecuteResponse::marshallUserData) and encapsulation of byte[] fields
are deferred to the next ticket.
Out of scope (next ticket):
Handling MarshallableMessage fields that require manual code.
Hiding / encapsulating byte[] fields inside messages.
Acceptance Criteria:
prepareMarshal / finishUnmarshal for CacheObject fields are only invoked on a
user thread, never on NIO / Discovery threads.
The NIO worker only reads pre-computed bytes and writes them to the socket.
Recursive traversal of @Order-annotated fields correctly handles nested
CacheObject instances and collections.
The Discovery deadlock when sending messages with CustomObject is no longer
reproducible.
No performance degradation (confirmed by JMH benchmarks — IGNITE-28119).
Existing tests pass.
> Move prepareMarshal / finishUnmarshal out of NIO communication thread — Phase
> 1: CacheObjects
> ---------------------------------------------------------------------------------------------
>
> Key: IGNITE-28520
> URL: https://issues.apache.org/jira/browse/IGNITE-28520
> Project: Ignite
> Issue Type: Task
> Reporter: Alex Abashev
> Assignee: Alex Abashev
> Priority: Minor
> Labels: IEP-132, ise
> Fix For: 2.19
>
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> h2. Summary
> Move generic {{CacheObject}} marshalling for outgoing cache messages from the
> NIO communication worker to the user thread, so the IO loop is no longer
> blocked on serialisation work that has nothing to do with the wire protocol
> itself.
> The send path now does:
> # {{msg.prepareMarshal(cctx)}} — existing hand-written hook on
> {{GridCacheMessage}} subclasses (legacy);
> # {{ser.prepareMarshalCacheObjects(msg, cacheObjCtx)}} — new auto-invoke of
> the generated serializer's traversal of {{@Order}}-annotated {{CacheObject}}
> / {{KeyCacheObject}} / nested {{Message}} fields.
> Both run on the user (sender) thread inside {{GridCacheIoManager.onSend}}
> before the message is enqueued for NIO; the NIO worker only does the actual
> byte writes.
> h2. Scope of this ticket
> * New {{default}} method {{MessageSerializer#prepareMarshalCacheObjects(M,
> CacheObjectValueContext)}} (no-op default).
> * {{MessageProcessor}} / {{MessageSerializerGenerator}} emit this method on
> every concrete serializer with at least one traversable {{@Order}} field;
> recursion into nested concrete {{Message}} types via per-class {{*_SER}}
> static fields.
> * {{GridCacheIoManager.onSend}} calls
> {{prepareMarshalGeneratedCacheObjects(msg)}} — gated on {{msg instanceof
> GridCacheIdMessage}}, takes {{cacheObjectContext}} from
> {{((GridCacheIdMessage)msg).cacheId()}}.
> * Phase 2 cleanup: the hand-written
> {{prepareMarshal(GridCacheSharedContext)}} override is removed from
> {{GridCacheMessage}} subclasses where its body was a pure {{@Order}}-CO
> traversal that the generator now performs:
> ** {{GridCacheTtlUpdateRequest}}, {{GridDhtForceKeysRequest}},
> {{GridNearUnlockRequest}}, {{GridDhtUnlockRequest}},
> {{GridDistributedLockRequest}}, {{GridDistributedLockResponse}},
> {{GridDhtAtomicUpdateResponse}}, {{GridDhtAtomicNearResponse}},
> {{GridNearGetRequest}}.
> ** Also removed: {{ret.prepareMarshal(cctx)}} call in
> {{GridNearAtomicUpdateResponse.prepareMarshal}} (covered by nested-message
> recursion in the generated serializer).
> * {{MessageSerializerGenerator.isRecursableMessage}} narrowed:
> nested-{{Message}} recursion is blocked only for {{GridCacheIdMessage}}
> subclasses (top-level routable). Helper messages with {{@Order int cacheId}}
> ({{GridCacheReturn}}, {{IgniteTxKey}}, {{IgniteTxEntry}},
> {{GridCacheEntryInfo}}, {{UpdateErrors}}) become recursable.
> * Generator hygiene: emits {{private static final}} (modern modifier order);
> {{/**}} class javadoc starts without trailing space; no trailing newline;
> empty line between top-level {{if}}-blocks in {{prepareMarshalCacheObjects}}
> per Ignite coding guidelines.
> h2. Out of scope
> * Mirror work for the receive direction ({{finishUnmarshalCacheObjects}}) —
> split into a separate ticket (IGNITE-YYYYY).
> * Extending the auto-invoke beyond {{GridCacheIdMessage}} subclasses or to
> messages whose retVal carries its own {{cacheId}} — split into a separate
> ticket (IGNITE-ZZZZZ); blocks removal of the {{GridCacheReturn.SER}} bridge
> and the explicit {{retVal.prepareMarshal(cctx)}} calls in
> {{GridDhtTxFinishResponse}} / {{GridNearTxPrepareResponse}}.
> h2. Acceptance criteria
> * {{GridCacheIoManager.onSend}} invokes
> {{prepareMarshalGeneratedCacheObjects(msg)}} for {{GridCacheIdMessage}}
> subclasses with non-{{null}} {{cacheObjectContext}}.
> * {{MessageProcessorTest}} green; fixture serializers byte-identical with
> generator output.
> * Targeted suites green: {{GridCacheReturnValueTransferSelfTest}},
> {{IgniteCacheEntryProcessorCallTest}},
> {{GridCachePartitionedTxMultiThreadedSelfTest}},
> {{GridCacheRebalancingSyncSelfTest}} (force-keys path),
> {{GridCachePutAllFailoverSelfTest}} (atomic update / unlock path),
> {{IgniteCacheQueryMultiThreadedSelfTest}} (query/eviction path).
> * No regression in possible-blockers from the TC bot for the PR.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)