[ https://issues.apache.org/jira/browse/IGNITE-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roman Puchkovskiy reassigned IGNITE-17775:
------------------------------------------

    Assignee: Roman Puchkovskiy

> Invalid data in network buffers causes message deserialization errors and messages loss
> ---------------------------------------------------------------------------------------
>
>                 Key: IGNITE-17775
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17775
>             Project: Ignite
>          Issue Type: Bug
>          Components: networking
>            Reporter: Denis Chudov
>            Assignee: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>
> h3. TL;DR
> The behavior of the message serialization registry is inconsistent: when a
> factory is not found, it throws either an AssertionError or a
> NetworkConfigurationException. There should be only one. This will simplify
> debugging situations where someone forgot to register a factory in the
> registry, as is the case in the problem below. There is no actual bug in
> messaging, and the mentioned exception is impossible to get in normal
> circumstances.
> h3. Original description
> In some tests I observe deserialization errors for network messages and
> timeout exceptions while waiting for a response.
> In some cases the group type of the message is negative, which causes this
> error:
> {code:java}
> java.lang.AssertionError: message type must not be negative, messageType=-5376
>     at org.apache.ignite.network.MessageSerializationRegistryImpl.getFactory(MessageSerializationRegistryImpl.java:77)
>     at org.apache.ignite.network.MessageSerializationRegistryImpl.createDeserializer(MessageSerializationRegistryImpl.java:102)
>     at org.apache.ignite.internal.network.serialization.SerializationService.createDeserializer(SerializationService.java:68)
>     at org.apache.ignite.internal.network.serialization.PerSessionSerializationService.createMessageDeserializer(PerSessionSerializationService.java:109)
>     at org.apache.ignite.internal.network.netty.InboundDecoder.decode(InboundDecoder.java:89)
>     at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507)
>     at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446)
>     at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>     at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
>     at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:833)
> {code}
> When the group or message type is positive but does not exist, a
> NetworkConfigurationException should be thrown, but it does not appear in
> the logs. It nevertheless causes TimeoutExceptions because of message loss.
> This reproduces in
> [https://github.com/gridgain/apache-ignite-3/tree/ignite-17523-2] in
> ItTablesApiTest#testGetTableFromLaggedNode
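
The TL;DR asks for a single failure mode when no factory is found. The sketch below is one possible shape for that, not the actual MessageSerializationRegistryImpl: the class SketchRegistry, the exception UnknownMessageTypeException (a stand-in for NetworkConfigurationException), the use of short for both type IDs, and the Supplier-based factory are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of a registry with one failure mode; not Ignite code. */
final class SketchRegistry {
    /** Hypothetical stand-in for NetworkConfigurationException. */
    static final class UnknownMessageTypeException extends RuntimeException {
        UnknownMessageTypeException(String message) {
            super(message);
        }
    }

    /** Factories keyed by the combined (groupType, messageType) pair. */
    private final Map<Integer, Supplier<Object>> factories = new HashMap<>();

    /** Packs two non-negative 16-bit IDs into a single map key. */
    private static int key(short groupType, short messageType) {
        return (groupType << 16) | (messageType & 0xFFFF);
    }

    /** Rejects negative IDs with the same exception used for missing factories. */
    private static void validate(short groupType, short messageType) {
        if (groupType < 0 || messageType < 0) {
            throw new UnknownMessageTypeException(
                "Negative type: groupType=" + groupType + ", messageType=" + messageType);
        }
    }

    void register(short groupType, short messageType, Supplier<Object> factory) {
        validate(groupType, messageType);
        factories.put(key(groupType, messageType), factory);
    }

    /** Returns the factory, or throws the single exception type if none is usable. */
    Supplier<Object> getFactory(short groupType, short messageType) {
        validate(groupType, messageType);
        Supplier<Object> factory = factories.get(key(groupType, messageType));
        if (factory == null) {
            throw new UnknownMessageTypeException(
                "No factory registered: groupType=" + groupType + ", messageType=" + messageType);
        }
        return factory;
    }
}
```

With this shape, a negative type such as messageType=-5376 and an unregistered positive type both surface as the same exception, so a caller (or a log filter) only has one thing to look for when a factory was never registered.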