[
https://issues.apache.org/jira/browse/IGNITE-28520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alex Abashev updated IGNITE-28520:
----------------------------------
Description:
*Background / Problem Statement:*
After moving the marshalling methods (prepareMarshal / finishUnmarshal) into
the NIO thread, two related issues emerged:
* Performance degradation (IGNITE-28473). Marshalling of CustomObject/CacheObject
  now happens in a single NIO worker, whereas previously it was done in parallel
  across user threads.
* Deadlock in Discovery. The marshaller broadcasts a class registration message
  across the cluster and waits for acknowledgement from all nodes. If marshalling
  happens on the Discovery thread, a deadlock occurs: the thread waits for a
  response to a message it is supposed to process itself.
Root cause: the serializer invokes prepareMarshal / finishUnmarshal directly on
the sending thread (NIO / Discovery), whereas these methods must be executed on
a user thread.
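
The Discovery deadlock described above can be reproduced in miniature with a single-threaded executor. This is only an illustrative sketch, not Ignite code: SelfWaitDemo, waitsOnItself and the 200 ms timeout are hypothetical names and values. The single worker stands in for the Discovery thread, and the inner task stands in for the class registration acknowledgement. The timeout exists only so the demo terminates; without it the worker would block forever.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical demo (not Ignite code): a worker waits for work that only it can process. */
class SelfWaitDemo {
    /** Returns true if the inner wait timed out, i.e. the worker was blocked on itself. */
    static boolean waitsOnItself() throws Exception {
        // A single worker thread stands in for the Discovery thread.
        ExecutorService discovery = Executors.newSingleThreadExecutor();

        try {
            Future<Boolean> outer = discovery.submit(() -> {
                // "Marshalling" enqueues a message that only this same thread can process...
                Future<?> ack = discovery.submit(() -> { });

                try {
                    // ...and then blocks until it is processed. The timeout keeps the demo finite.
                    ack.get(200, TimeUnit.MILLISECONDS);

                    return false;
                }
                catch (TimeoutException e) {
                    return true;
                }
            });

            return outer.get();
        }
        finally {
            discovery.shutdownNow();
        }
    }
}
```

Calling SelfWaitDemo.waitsOnItself() returns true: the acknowledgement task cannot start while the only worker is waiting for it.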
*Proposed Solution (Phase 1):*
Implement two-phase marshalling for CacheObject fields:
* Phase 1, on the send call thread (user thread): Add methods to the generated
  serializer that recursively traverse all @Order-annotated fields, locate
  CacheObject fields (including nested ones and those inside collections), invoke
  prepareMarshal, and store the result in a byte[].
* Phase 2, on the NIO sending thread: The serializer reads the pre-computed
  byte[] and writes it to the socket. prepareMarshal is not called.
This ticket covers only CacheObject fields handled by the generated serializer
via @Order. Manual code for MarshallableMessage fields (e.g.
GridJobExecuteResponse::marshallUserData) and encapsulation of byte[] fields
are deferred to the next ticket.
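
A minimal sketch of the two-phase idea, written by hand rather than generated. Every type here (Payload, DemoMessage, DemoSerializer, TwoPhaseDemo) is a hypothetical stand-in and not an Ignite class: Payload plays the role of a CacheObject, and the three fields of DemoMessage play the role of @Order-annotated fields (a direct field, a collection, and a nested message). The real generated serializer may look quite different.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a CacheObject: a value plus the bytes produced in phase 1. */
class Payload {
    final Object val;
    byte[] bytes; // Filled in by prepareMarshal().

    Payload(Object val) { this.val = val; }

    /** Stand-in for the real, potentially expensive marshalling step. */
    void prepareMarshal() {
        bytes = String.valueOf(val).getBytes(StandardCharsets.UTF_8);
    }
}

/** Hypothetical message with a direct payload, a collection of payloads and a nested message. */
class DemoMessage {
    final Payload key;
    final List<Payload> vals;
    final DemoMessage nested;

    DemoMessage(Payload key, List<Payload> vals, DemoMessage nested) {
        this.key = key;
        this.vals = vals;
        this.nested = nested;
    }
}

/** Hand-written approximation of what a generated serializer could contain. */
class DemoSerializer {
    /** Phase 1: called on the user thread; marshals every payload reachable from the message. */
    static void prepare(DemoMessage msg) {
        if (msg == null)
            return;

        if (msg.key != null)
            msg.key.prepareMarshal();

        if (msg.vals != null)
            for (Payload p : msg.vals)
                p.prepareMarshal();

        prepare(msg.nested);
    }

    /** Phase 2: called on the sending thread; only copies pre-computed bytes. */
    static void write(DemoMessage msg, ByteArrayOutputStream out) {
        if (msg == null)
            return;

        if (msg.key != null)
            out.write(msg.key.bytes, 0, msg.key.bytes.length);

        if (msg.vals != null)
            for (Payload p : msg.vals)
                out.write(p.bytes, 0, p.bytes.length);

        write(msg.nested, out);
    }
}

class TwoPhaseDemo {
    static String run() throws InterruptedException {
        DemoMessage inner = new DemoMessage(new Payload("n"), null, null);
        DemoMessage msg = new DemoMessage(new Payload("k"),
            Arrays.asList(new Payload(1), new Payload(2)), inner);

        DemoSerializer.prepare(msg); // Phase 1 on the calling thread.

        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // Phase 2 on a separate thread that stands in for the NIO worker.
        Thread sender = new Thread(() -> DemoSerializer.write(msg, out), "demo-nio-sender");

        sender.start();
        sender.join();

        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

TwoPhaseDemo.run() returns "k12n": the sender thread never calls prepareMarshal and only copies bytes in field order. In this sketch, calling write without a prior prepare fails with a NullPointerException, which makes a skipped phase 1 visible immediately.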
*Out of scope (next ticket):*
* Handling MarshallableMessage fields that require manual code.
* Hiding / encapsulating byte[] fields inside messages.
*Acceptance Criteria:*
* prepareMarshal / finishUnmarshal for CacheObject fields are invoked only on a
  user thread, never on NIO / Discovery threads.
* The NIO worker only reads pre-computed bytes and writes them to the socket.
* Recursive traversal of @Order-annotated fields correctly handles nested
  CacheObject instances and collections.
* The Discovery deadlock when sending messages with CustomObject is no longer
  reproducible.
* No performance degradation (confirmed by JMH benchmarks, IGNITE-28119).
* Existing tests pass.
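
One possible way to check the first criterion in tests is a guard that fails when marshalling runs on a forbidden thread. The sketch below is an assumption, not part of the ticket: MarshalThreadGuard and the "nio-" / "disco-" name prefixes are hypothetical, and the real Ignite thread names would have to be substituted.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical test-time guard; class name and thread-name prefixes are assumptions. */
class MarshalThreadGuard {
    /** Assumed prefixes of threads that must never run prepareMarshal / finishUnmarshal. */
    private static final String[] FORBIDDEN_PREFIXES = {"nio-", "disco-"};

    /** Throws if the current thread's name starts with a forbidden prefix. */
    static void checkAllowed() {
        String name = Thread.currentThread().getName();

        for (String prefix : FORBIDDEN_PREFIXES)
            if (name.startsWith(prefix))
                throw new IllegalStateException("Marshalling attempted on thread: " + name);
    }

    /** Runs the check on a new thread with the given name and reports whether it was rejected. */
    static boolean rejectedOn(String threadName) throws InterruptedException {
        AtomicBoolean rejected = new AtomicBoolean();

        Thread t = new Thread(() -> {
            try {
                checkAllowed();
            }
            catch (IllegalStateException e) {
                rejected.set(true);
            }
        }, threadName);

        t.start();
        t.join();

        return rejected.get();
    }
}
```

With these assumed prefixes, rejectedOn("nio-1") is true and rejectedOn("user-1") is false. Calling checkAllowed() at the start of a marshalling method would turn a violation of the criterion into an immediate test failure.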
> Move prepareMarshal / finishUnmarshal out of NIO communication thread — Phase
> 1: CacheObjects
> ---------------------------------------------------------------------------------------------
>
> Key: IGNITE-28520
> URL: https://issues.apache.org/jira/browse/IGNITE-28520
> Project: Ignite
> Issue Type: Task
> Reporter: Alex Abashev
> Assignee: Alex Abashev
> Priority: Minor
> Labels: IEP-132, ise
> Fix For: 2.19