TaskManagers Crashing
Hi,

I have 4 TaskManagers running on 4 servers. They all crash at the same time without any useful error logs. The only log entries I can see are some disconnections from Kafka, for both consumers and producers. Any ideas or help would be appreciated.

Some logs are below. I think server 4 crashes first, and that causes all the other TaskManagers to crash.

JobManager:

2023-08-18 15:16:46,528 INFO org.apache.kafka.clients.NetworkClient [] - [AdminClient clientId=47539-enumerator-admin-client] Node 2 disconnected.
2023-08-18 15:19:00,303 INFO org.apache.kafka.clients.NetworkClient [] - [AdminClient clientId=tf_25464-enumerator-admin-client] Node 4 disconnected.
2023-08-18 15:19:16,668 INFO org.apache.kafka.clients.NetworkClient [] - [AdminClient clientId=cpu_59942-enumerator-admin-client] Node 1 disconnected.
2023-08-18 15:19:16,764 INFO org.apache.kafka.clients.NetworkClient [] - [AdminClient clientId=cpu_55128-enumerator-admin-client] Node 3 disconnected.
2023-08-18 15:19:27,913 WARN akka.remote.transport.netty.NettyTransport [] - Remote connection to [/10.11.0.51:42778] failed with java.io.IOException: Connection reset by peer
2023-08-18 15:19:27,963 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@tef-prod-flink-04:38835] has failed, address is now gated for [50] ms. Reason: [Disassociated]
2023-08-18 15:19:27,967 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink-metrics@tef-prod-flink-04:46491] has failed, address is now gated for [50] ms. Reason: [Disassociated]
2023-08-18 15:19:29,225 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - RouterReplacementAlgorithm -> kafkaSink_sinkFaultyRouter_windowMode: Writer -> kafkaSink_sinkFaultyRouter_windowMode: Committer (3/4) (f6fd65e3fc049bd9021093d8f532bbaf_a47f4a3b960228021159de8de51dbb1f_2_0) switched from RUNNING to FAILED on injection-assia-3-pro-cloud-tef-gcp-europe-west1:39011-b24b1d @ injection-assia-3-pro-cloud-tef-gcp-europe-west1 (dataPort=35223).
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager 'tef-prod-flink-04/10.11.0.51:37505 [ tef-prod-flink-04:38835-e3ca4d ] '. This might indicate that the remote task manager was lost.
    at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.channelInactive(CreditBasedPartitionRequestClientHandler.java:134) ~[flink-dist-1.16.2.jar:1.16.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) ~[flink-dist-1.16.2.jar:1.16.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) ~[flink-dist-1.16.2.jar:1.16.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) ~[flink-dist-1.16.2.jar:1.16.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) ~[flink-dist-1.16.2.jar:1.16.2]
    at org.apache.flink.runtime.io.network.netty.NettyMessageClientDecoderDelegate.channelInactive(NettyMessageClientDecoderDelegate.java:94) ~[flink-dist-1.16.2.jar:1.16.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) ~[flink-dist-1.16.2.jar:1.16.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) ~[flink-dist-1.16.2.jar:1.16.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) ~[flink-dist-1.16.2.jar:1.16.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) ~[flink-dist-1.16.2.jar:1.16.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) ~[flink-dist-1.16.2.jar:1.16.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) ~[flink-dist-1.16.2.jar:1.16.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) ~[flink-dist-1.16.2.jar:1.16.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:831) ~[flink-dist-1.16.2.jar:1.16.2]
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.
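To check which host shows trouble first, the JobManager log can be ordered by timestamp and filtered per host. The following is only a minimal sketch, assuming the log layout matches the lines quoted above (`<timestamp> <LEVEL> <logger> [] - <message>`); the helper names `parse_line` and `first_warning_for` are made up for illustration and are not part of Flink.

```python
import re
from datetime import datetime

# Layout assumed from the log lines in this thread:
#   "<date> <time>,<millis> <LEVEL> <logger> [] - <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+\[\]\s+-\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, level, logger, message), or None for lines
    that do not match the layout (for example stack trace frames)."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return ts, match.group("level"), match.group("logger"), match.group("msg")


def first_warning_for(lines, needle):
    """Return the earliest WARN/ERROR entry whose message mentions
    `needle` (a host name or IP address), or None if there is none."""
    hits = [
        entry
        for entry in map(parse_line, lines)
        if entry and entry[1] in ("WARN", "ERROR") and needle in entry[3]
    ]
    return min(hits, key=lambda entry: entry[0]) if hits else None


# Three lines copied from the JobManager log in this thread.
sample = [
    "2023-08-18 15:19:16,764 INFO org.apache.kafka.clients.NetworkClient [] - "
    "[AdminClient clientId=cpu_55128-enumerator-admin-client] Node 3 disconnected.",
    "2023-08-18 15:19:27,963 WARN akka.remote.ReliableDeliverySupervisor [] - "
    "Association with remote system [akka.tcp://flink@tef-prod-flink-04:38835] "
    "has failed, address is now gated for [50] ms. Reason: [Disassociated]",
    "2023-08-18 15:19:27,913 WARN akka.remote.transport.netty.NettyTransport [] - "
    "Remote connection to [/10.11.0.51:42778] failed with "
    "java.io.IOException: Connection reset by peer",
]

print(first_warning_for(sample, "10.11.0.51"))
```

Running the same helper over the log of each TaskManager, once per host name or IP, gives a rough ordering of which connection dropped first. It only reorders what is already logged, so it cannot explain a process that died without writing anything.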
Re: TaskManagers Crashing
Hi,

Maybe you need to check what changed on the Kafka side at that time.

Best,
Ron

On Sun, Aug 20, 2023 at 08:51, Kenan Kılıçtepe wrote:
> The only log entries I can see are some disconnections from Kafka, for
> both consumers and producers.
Re: TaskManagers Crashing
Hi,

Nothing interesting on the Kafka side, just some partition delete/create logs. Also, I can't understand why all the TaskManagers stop at the same time without any error log.

Thanks,
Kenan

On Sun, Aug 20, 2023 at 10:49 AM liu ron wrote:
> Maybe you need to check what changed on the Kafka side at that time.
Re: TaskManagers Crashing
Hi,

It seems that the node `tef-prod-flink-04/10.11.0.51:37505 [ tef-prod-flink-04:38835-e3ca4d ]` exited unexpectedly. You can check whether there are any errors in the TM log or in K8S.

Best,
Shammon FY

On Sun, Aug 20, 2023 at 5:42 PM Kenan Kılıçtepe wrote:
> Nothing interesting on the Kafka side, just some partition delete/create
> logs. Also, I can't understand why all the TaskManagers stop at the same
> time without any error log.
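When checking the TM log or the host's system log for the reason a process exited, a small scan for a few well-known markers can save some scrolling. This is only a sketch: the patterns below are guesses at what such logs commonly contain, nothing in this thread confirms that any of them applies here, and the sample lines are invented purely to exercise the code.

```python
import re

# Hypothetical starting list; extend it with whatever your environment emits.
SUSPECT_PATTERNS = {
    "jvm_oom": re.compile(r"java\.lang\.OutOfMemoryError"),
    "kernel_oom_kill": re.compile(r"Out of memory: Kill(ed)? process"),
    "signal": re.compile(r"SIG(TERM|KILL)"),
}


def scan(lines):
    """Return (line_number, label, line) for every line that matches
    one of the suspect patterns, in the order the lines appear."""
    findings = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in SUSPECT_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label, line.rstrip("\n")))
    return findings


# Invented sample lines, not taken from any real log in this thread.
sample = [
    "an ordinary log line\n",
    "java.lang.OutOfMemoryError: Java heap space\n",
    "Out of memory: Killed process 1234 (java)\n",
]

for number, label, line in scan(sample):
    print(number, label, line)
```

Since `scan` accepts any iterable of lines, an open file object can be passed directly, for example `scan(open(path))` with `path` pointing at a TaskManager log or a saved kernel log. An empty result does not rule anything out; it only means none of the listed markers appeared.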
Re: TaskManagers Crashing
Were you ever able to find a workaround for this? I also have transient failures due to org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException.

On Saturday, August 19, 2023 at 5:50 PM, Kenan Kılıçtepe wrote (to user@flink.apache.org):
> I have 4 TaskManagers running on 4 servers. They all crash at the same
> time without any useful error logs.