[ https://issues.apache.org/jira/browse/IGNITE-28255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Abashev updated IGNITE-28255:
----------------------------------
Description:
Summary:
MarshallerCacheJobRunNodeRestartTest.testJobRun fails intermittently with a
timeout due to a NotSerializableException in MarshallerMappingItem
Description:
The test MarshallerCacheJobRunNodeRestartTest.testJobRun hangs and times out
after 5 minutes (300 000 ms).
TC link:
https://ci2.ignite.apache.org/test/381112157178694638?currentProjectId=IgniteTests24Java8&branch=%3Cdefault%3E
Failure rate: 2 failures out of 68 runs (~3%), both on aitc-lin15, branch
refs/heads/master (builds #41053, #41049)
Root cause:
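During node restart, TcpDiscoverySpi fails to read a discovery message because
the sender could not serialize MarshallerMappingItem (the WriteAbortedException
below means the failure happened while the message was being written):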
[ERROR][tcp-disco-sock-reader-...][TestTcpDiscoverySpi] Failed to read message
org.apache.ignite.IgniteCheckedException: Failed to deserialize object with given class loader: IsolatedClassLoader{roleName='test'}
    at org.apache.ignite.marshaller.jdk.JdkMarshallerImpl.unmarshal0(JdkMarshallerImpl.java:130)
    ...
Caused by: java.io.WriteAbortedException: writing aborted; java.io.NotSerializableException: org.apache.ignite.internal.processors.marshaller.MarshallerMappingItem
Caused by: java.io.NotSerializableException: org.apache.ignite.internal.processors.marshaller.MarshallerMappingItem
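The WriteAbortedException wrapping a NotSerializableException is how plain JDK
serialization reports a writer-side failure to the reader. ObjectOutputStream
writes the exception into the stream before rethrowing it, and
ObjectInputStream turns it into a WriteAbortedException. The following is a
minimal sketch that reproduces this shape outside Ignite. MessageStub and
PlainItem are hypothetical stand-ins, not Ignite classes, and the item's fields
are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.WriteAbortedException;

public class WriteAbortedDemo {
    /** Hypothetical stand-in for a mapping item that does NOT implement Serializable. */
    static final class PlainItem {
        final byte platformId;
        final int typeId;
        final String clsName;

        PlainItem(byte platformId, int typeId, String clsName) {
            this.platformId = platformId;
            this.typeId = typeId;
            this.clsName = clsName;
        }
    }

    /** Hypothetical discovery message: Serializable itself, but its payload may not be. */
    static final class MessageStub implements Serializable {
        private static final long serialVersionUID = 0L;

        final Object item;

        MessageStub(Object item) {
            this.item = item;
        }
    }

    /** Sender side: on failure, ObjectOutputStream writes the exception into the stream, then rethrows it. */
    static byte[] write(Object msg) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(msg);
        } catch (NotSerializableException e) {
            System.out.println("writer: " + e);
        }
        return bytes.toByteArray();
    }

    /** Wraps item in a message, sends it through write(), and reports what the reader sees. */
    static String roundTrip(Object item) throws IOException, ClassNotFoundException {
        byte[] data = write(new MessageStub(item));
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return "ok";
        } catch (WriteAbortedException e) {
            // The reader only learns about the writer's failure here; getCause() is the original exception.
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("reader: " + roundTrip(new PlainItem((byte) 0, 42, "com.example.Foo")));
        System.out.println("reader: " + roundTrip("a serializable payload"));
    }
}
```

With the non-serializable item, the reader's message has the same
"writing aborted; java.io.NotSerializableException: ..." form as the Caused by
line in the log. A serializable payload reads back normally.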
MarshallerMappingItem is being sent as part of a TcpDiscovery message (likely a
MarshallerMappingRequest or MarshallerMappingResponse) but does not implement
Serializable. As a result, the restarting node cannot exchange marshaller
mappings with the rest of the cluster, causing the worker thread to hang
indefinitely waiting for the mapping to be resolved.
This ultimately causes GridTestUtils.runMultiThreaded() to block forever at
Thread.join(), triggering the 5-minute test timeout.
Stack trace (thread dump at timeout):
Thread [name="test-runner-#83435%cache.MarshallerCacheJobRunNodeRestartTest%", state=WAITING]
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1304)
    at o.a.i.testframework.GridTestUtils.runMultiThreaded(GridTestUtils.java:1124)
    at o.a.i.i.processors.cache.MarshallerCacheJobRunNodeRestartTest.testJobRun(MarshallerCacheJobRunNodeRestartTest.java:65)
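The chain in the thread dump can be reproduced in isolation. A worker that
waits for a result nobody delivers keeps any unbounded Thread.join() on it
waiting forever. This is a minimal sketch under stated assumptions. The
CompletableFuture is a hypothetical stand-in for the pending mapping
resolution, since the test's actual waiting mechanism is not shown in this
report. A timed join(long) is used so the demo terminates instead of hanging.

```java
import java.util.concurrent.CompletableFuture;

public class HangDemo {
    /**
     * Starts a worker that waits for a result nobody ever delivers, then joins it with a
     * bounded timeout and reports whether the worker is still blocked.
     */
    static boolean workerStillBlocked(long joinMillis) throws InterruptedException {
        // Hypothetical stand-in for a pending marshaller-mapping resolution that never completes.
        CompletableFuture<String> mapping = new CompletableFuture<>();

        // CompletableFuture.join() has no timeout, so this worker waits forever.
        Thread worker = new Thread(mapping::join, "worker");
        worker.setDaemon(true); // let the JVM exit even though the worker never finishes
        worker.start();

        // A plain worker.join() would block this thread forever, leaving it WAITING in
        // Thread.join() like the test runner in the dump; the timed join returns instead.
        worker.join(joinMillis);
        return worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker still blocked after 500 ms: " + workerStillBlocked(500));
    }
}
```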
Fix:
MarshallerMappingItem should implement java.io.Serializable (or be converted to
use Ignite's own serialization mechanism) so it can be properly
marshalled/unmarshalled during TcpDiscovery message exchange.
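The first option can be sketched with plain JDK serialization. This sketch
assumes the item carries a platform id, a type id and a class name, which is an
assumption about MarshallerMappingItem's fields. MappingItemStub is a
hypothetical stand-in, not the real class. The explicit serialVersionUID avoids
depending on the computed default, which can differ between builds. The second
option, Ignite's own message serialization, is not sketched here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Objects;

public class SerializableItemDemo {
    /**
     * Hypothetical stand-in for MarshallerMappingItem with the proposed fix applied.
     * The field set (platformId, typeId, clsName) is an assumption for illustration.
     */
    static final class MappingItemStub implements Serializable {
        /** Explicit UID so nodes built separately agree on the class version. */
        private static final long serialVersionUID = 0L;

        final byte platformId;
        final int typeId;
        final String clsName;

        MappingItemStub(byte platformId, int typeId, String clsName) {
            this.platformId = platformId;
            this.typeId = typeId;
            this.clsName = clsName;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o)
                return true;
            if (!(o instanceof MappingItemStub))
                return false;
            MappingItemStub other = (MappingItemStub) o;
            return platformId == other.platformId
                && typeId == other.typeId
                && Objects.equals(clsName, other.clsName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(platformId, typeId, clsName);
        }
    }

    /** Serializes obj with plain JDK serialization and reads it back. */
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        MappingItemStub item = new MappingItemStub((byte) 0, 42, "com.example.Foo");
        Object copy = roundTrip(item);
        System.out.println("round trip equal: " + item.equals(copy)); // prints "round trip equal: true"
    }
}
```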
Environment:
- Ignite version: 2.18.0-SNAPSHOT#20260317
- JVM: OpenJDK 17.0.8.1+1 Eclipse Adoptium
- OS: Linux 5.4.0-216-generic amd64
- Agent: aitc-lin15
> Fix java.io.NotSerializableException:
> org.apache.ignite.internal.processors.marshaller.MarshallerMappingItem
> ------------------------------------------------------------------------------------------------------------
>
> Key: IGNITE-28255
> URL: https://issues.apache.org/jira/browse/IGNITE-28255
> Project: Ignite
> Issue Type: Bug
> Reporter: Alex Abashev
> Assignee: Alex Abashev
> Priority: Major
> Labels: IEP-132
> Fix For: 2.19
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)