Re: Errors running Flink 1.12 Application Mode on Alibaba Cloud managed Kubernetes
Thanks. A LoadBalancer service was indeed created, and that is where the errors come from.

On March 1, 2021, at 2:09 PM, Yang Wang wrote:
> The root cause is that Alibaba Cloud's LoadBalancer health check keeps sending RST packets to Flink's REST endpoint.
> There is a ticket tracking this issue [1], but it has not been fixed yet.
>
> As a short-term workaround, you can use the log4j configuration to set the log level of the
> org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint logger above WARN (for example ERROR),
> since these exceptions are logged at WARN.
>
> [1]. https://issues.apache.org/jira/browse/FLINK-18129
>
> Best,
> Yang
Re: Errors running Flink 1.12 Application Mode on Alibaba Cloud managed Kubernetes
The root cause is that Alibaba Cloud's LoadBalancer health check keeps sending RST packets to Flink's REST endpoint.
There is a ticket tracking this issue [1], but it has not been fixed yet.

As a short-term workaround, you can use the log4j configuration to set the log level of the
org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint logger above WARN (for example ERROR),
since these exceptions are logged at WARN.

[1]. https://issues.apache.org/jira/browse/FLINK-18129

Best,
Yang

On Mon, March 1, 2021, at 1:01 PM, 王 羽凡 wrote:
> Using Flink 1.12 in Application Mode on Alibaba Cloud managed Kubernetes (ACK),
> I see some errors at startup; the same errors do not appear on a self-built Kubernetes cluster.
> However, the taskmanager containers start normally and subsequent jobs run fine.
> How should these errors be handled? Is Flink incompatible with ACK clusters?
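
The log4j workaround suggested in this reply could look roughly like the fragment below. It is only a sketch, assuming the Log4j 2 properties format used by the conf/log4j-console.properties file shipped with Flink 1.12. The key segment `rest_endpoint` is an arbitrary label chosen for illustration, and ERROR is used because the exceptions in the posted log are emitted at WARN, so the threshold has to sit above that level to hide them.

```properties
# Hypothetical addition to conf/log4j-console.properties (Log4j 2 properties syntax).
# "rest_endpoint" is an arbitrary identifier; only the .name value must match the logger.
logger.rest_endpoint.name = org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint
# Only ERROR and above are logged, which hides the WARN-level "Connection reset by peer" traces.
logger.rest_endpoint.level = ERROR
```

As far as I know, the native Kubernetes integration ships the client's conf/log4j-console.properties to the pods through a ConfigMap, so the change should apply to a newly started cluster rather than to one that is already running. Note that this also hides any other WARN messages from the same logger.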
Errors running Flink 1.12 Application Mode on Alibaba Cloud managed Kubernetes
Using Flink 1.12 in Application Mode on Alibaba Cloud managed Kubernetes (ACK), I see some errors at startup; the same errors do not appear on a self-built Kubernetes cluster.
However, the taskmanager containers start normally and subsequent jobs run fine. How should these errors be handled? Is Flink incompatible with ACK clusters?

Launch command:
./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=demo \
    -Dkubernetes.container.image=xx.xx.xx/xx/xxx:2.0.12 \
    local:///opt/flink/usrlib/my-flink-job.jar

Log:
2021-03-01 04:52:06,518 INFO  org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - Job 6eb4027586e7137b20ecc8c3ce624417 is submitted.
2021-03-01 04:52:06,518 INFO  org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - Submitting Job with JobId=6eb4027586e7137b20ecc8c3ce624417.
2021-03-01 04:52:08,303 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Recovered 0 pods from previous attempts, current attempt id is 1.
2021-03-01 04:52:08,303 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Recovered 0 workers from previous attempt.
2021-03-01 04:52:08,306 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - ResourceManager akka.tcp://flink@demo.default:6123/user/rpc/resourcemanager_0 was granted leadership with fencing token
2021-03-01 04:52:08,310 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - Starting the SlotManager.
2021-03-01 04:52:08,596 WARN  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Unhandled exception
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_275]
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_275]
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_275]
    at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_275]
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) ~[?:1.8.0_275]
    at org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:253) ~[flink-dist_2.12-1.12.0.jar:1.12.0]
    at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1133) ~[flink-dist_2.12-1.12.0.jar:1.12.0]
    at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:350) ~[flink-dist_2.12-1.12.0.jar:1.12.0]
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148) [flink-dist_2.12-1.12.0.jar:1.12.0]
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) [flink-dist_2.12-1.12.0.jar:1.12.0]
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650) [flink-dist_2.12-1.12.0.jar:1.12.0]
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576) [flink-dist_2.12-1.12.0.jar:1.12.0]
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [flink-dist_2.12-1.12.0.jar:1.12.0]
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [flink-dist_2.12-1.12.0.jar:1.12.0]
    at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [flink-dist_2.12-1.12.0.jar:1.12.0]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
2021-03-01 04:52:08,596 WARN  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Unhandled exception
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_275]
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_275]
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_275]
    at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_275]
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) ~[?:1.8.0_275]
    at org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:253) ~[flink-dist_2.12-1.12.0.jar:1.12.0]
    at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1133) ~[flink-dist_2.12-1.12.0.jar:1.12.0]
    at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:350) ~[flink-dist_2.12-1.12.0.jar:1.12.0]
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148) [flink-dist_2.12-1.12.0.jar:1.12.0]
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) [flink-dist_2.12-1.12.0.jar:1.12.0]
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650) [flink-dist_2.12-1.12.0.jar:1.12.0]
    at