[ https://issues.apache.org/jira/browse/IGNITE-22280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Igor updated IGNITE-22280:
--------------------------
Description:

*Steps to reproduce:*
# Start cluster of 2 nodes on single host.
# Create 5 tables and insert 1000 rows into each.
# Kill 1 server.
# Start the killed server.
# Check logs for errors.

*Expected:*
No errors in logs.

*Actual:*
Errors in logs
{code:java}
2024-05-17 04:26:37:808 +0000 [ERROR][%ClusterFailover2NodesTest_cluster_0%common-scheduler-0][CriticalWorkerWatchdog] A critical thread is blocked for 688 ms that is more than the allowed 500 ms, it is "ClusterFailover2NodesTest_cluster_0-srv-worker-3" prio=10 Id=41 RUNNABLE
    at app//org.apache.ignite.internal.network.message.InvokeResponseDeserializer.getMessage(InvokeResponseDeserializer.java:25)
    at app//org.apache.ignite.internal.network.message.InvokeResponseDeserializer.getMessage(InvokeResponseDeserializer.java:11)
    at app//org.apache.ignite.internal.network.netty.InboundDecoder.decode(InboundDecoder.java:136)
    at app//io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
    at app//io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
    at app//io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at app//io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at app//io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at app//io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at app//io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at app//io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
    at app//io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
    at app//io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
    at app//io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
    at app//io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at app//io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at app//io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base@17.0.6/java.lang.Thread.run(Thread.java:833)
{code}

GC calls of node ClusterFailover2NodesTest_cluster_0 (LOG: [^ignite3db-0.log])
!image-2024-05-17-17-57-32-913.png!

GC calls of node ClusterFailover2NodesTest_cluster_1 (LOG: [^ignite3db-0.log])
!image-2024-05-17-17-58-12-428.png!

was:

*Steps to reproduce:*
# Start cluster of 2 nodes on single host.
# Insert create 5 tables and insert 1000 rows into each.
# Kill 1 server.
# Start the killed server.
# Check logs for errors.

*Expected:*
No errors in logs.

*Actual:*
Errors in logs
{code:java}
2024-05-17 04:26:37:808 +0000 [ERROR][%ClusterFailover2NodesTest_cluster_0%common-scheduler-0][CriticalWorkerWatchdog] A critical thread is blocked for 688 ms that is more than the allowed 500 ms, it is "ClusterFailover2NodesTest_cluster_0-srv-worker-3" prio=10 Id=41 RUNNABLE
    at app//org.apache.ignite.internal.network.message.InvokeResponseDeserializer.getMessage(InvokeResponseDeserializer.java:25)
    at app//org.apache.ignite.internal.network.message.InvokeResponseDeserializer.getMessage(InvokeResponseDeserializer.java:11)
    at app//org.apache.ignite.internal.network.netty.InboundDecoder.decode(InboundDecoder.java:136)
    at app//io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
    at app//io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
    at app//io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at app//io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at app//io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
    at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at app//io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at app//io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at app//io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
    at app//io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
    at app//io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
    at app//io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
    at app//io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at app//io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at app//io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base@17.0.6/java.lang.Thread.run(Thread.java:833)
{code}

GC calls of node ClusterFailover2NodesTest_cluster_0 (LOG: [^ignite3db-0.log])
!image-2024-05-17-17-57-32-913.png!

GC calls of node ClusterFailover2NodesTest_cluster_1 (LOG: [^ignite3db-0.log])
!image-2024-05-17-17-58-12-428.png!

> Error "A critical thread is blocked" on restart
> -----------------------------------------------
>
>                 Key: IGNITE-22280
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22280
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 3.0.0-beta2
>         Environment: 2 nodes (with arguments "-Xms4096m", "-Xmx4096m" ) on *1 host*
> cpuCount=10
> memorySizeMb=15360
>            Reporter: Igor
>            Priority: Major
>              Labels: ignite-3
>         Attachments: ignite3db-0-1.log, ignite3db-0.log, image-2024-05-17-17-57-18-759.png, image-2024-05-17-17-57-32-913.png, image-2024-05-17-17-58-12-428.png
>
>
> *Steps to reproduce:*
> # Start cluster of 2 nodes on single host.
> # Create 5 tables and insert 1000 rows into each.
> # Kill 1 server.
> # Start the killed server.
> # Check logs for errors.
>
> *Expected:*
> No errors in logs.
>
> *Actual:*
> Errors in logs
> {code:java}
> 2024-05-17 04:26:37:808 +0000 [ERROR][%ClusterFailover2NodesTest_cluster_0%common-scheduler-0][CriticalWorkerWatchdog] A critical thread is blocked for 688 ms that is more than the allowed 500 ms, it is "ClusterFailover2NodesTest_cluster_0-srv-worker-3" prio=10 Id=41 RUNNABLE
>     at app//org.apache.ignite.internal.network.message.InvokeResponseDeserializer.getMessage(InvokeResponseDeserializer.java:25)
>     at app//org.apache.ignite.internal.network.message.InvokeResponseDeserializer.getMessage(InvokeResponseDeserializer.java:11)
>     at app//org.apache.ignite.internal.network.netty.InboundDecoder.decode(InboundDecoder.java:136)
>     at app//io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
>     at app//io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
>     at app//io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
>     at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
>     at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
>     at app//io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
>     at app//io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>     at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
>     at app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
>     at app//io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>     at app//io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
>     at app//io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
>     at app//io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
>     at app//io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
>     at app//io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
>     at app//io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
>     at app//io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>     at app//io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base@17.0.6/java.lang.Thread.run(Thread.java:833)
> {code}
>
> GC calls of node ClusterFailover2NodesTest_cluster_0 (LOG: [^ignite3db-0.log])
> !image-2024-05-17-17-57-32-913.png!
>
> GC calls of node ClusterFailover2NodesTest_cluster_1 (LOG: [^ignite3db-0.log])
> !image-2024-05-17-17-58-12-428.png!

--
This message was sent by Atlassian Jira
(v8.20.10#820010)