[ https://issues.apache.org/jira/browse/FLINK-18129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272832#comment-17272832 ]
Robert Metzger commented on FLINK-18129:
----------------------------------------

+1 to Till's proposal of only logging the exception type + message on INFO, and logging the full stack trace on debug. This will reduce the noise in the logs already by a big margin.
As far as I can tell, this handler is only used by the REST Endpoint, so cluster internal connectivity issues (akka rpc, taskmanager netty connections) will be logged with more details.

> Unhandled exception stack trace from DispatcherRestEndpoint when deploying Kubernetes session cluster
> -----------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-18129
>                 URL: https://issues.apache.org/jira/browse/FLINK-18129
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.11.0
>            Reporter: Till Rohrmann
>            Priority: Major
>             Fix For: 1.13.0
>
>
> When deploying a session cluster on Kubernetes via {{bin/kubernetes-session.sh}}, I see the following stack trace in the master logs:
> {code}
> 2020-06-04 01:17:52,068 WARN  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint [] - Unhandled exception
> java.io.IOException: Connection reset by peer
> 	at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_252]
> 	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_252]
> 	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_252]
> 	at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_252]
> 	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:377) ~[?:1.8.0_252]
> 	at org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:247) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
> 	at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1140) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
> 	at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:347) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
> 	at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148) [flink-dist_2.11-1.11.0.jar:1.11.0]
> 	at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:697) [flink-dist_2.11-1.11.0.jar:1.11.0]
> 	at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:632) [flink-dist_2.11-1.11.0.jar:1.11.0]
> 	at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:549) [flink-dist_2.11-1.11.0.jar:1.11.0]
> 	at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:511) [flink-dist_2.11-1.11.0.jar:1.11.0]
> 	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918) [flink-dist_2.11-1.11.0.jar:1.11.0]
> 	at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [flink-dist_2.11-1.11.0.jar:1.11.0]
> 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
> {code}
> I am not entirely sure whether this is a configuration problem or a K8s service which does some liveness checks? The consequence is that the JM logs are being cluttered with these stack traces.
> Most likely this is not caused by Flink but some K8s behavior. The question is whether we can do something about it if it occurs often.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
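
For illustration, the logging split endorsed in the comment (exception type + message on INFO, full stack trace on debug) could look roughly like the sketch below. This is not Flink's actual handler code. The class and method names are hypothetical, and it uses java.util.logging only to stay self-contained, with Level.FINE standing in for DEBUG; Flink itself logs through a different logging facade.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; not the real DispatcherRestEndpoint handler.
public class ExceptionLogSketch {
    private static final Logger LOG =
            Logger.getLogger(ExceptionLogSketch.class.getName());

    /** Builds the one-line summary: exception type plus message. */
    static String summarize(Throwable t) {
        return t.getClass().getName() + ": " + t.getMessage();
    }

    /** Hypothetical hook called when the handler sees an unhandled exception. */
    static void logUnhandled(Throwable t) {
        // INFO: a single line, no stack trace, to keep the logs quiet.
        LOG.info("Unhandled exception: " + summarize(t));
        // FINE (stand-in for DEBUG): the full stack trace, only emitted
        // when the logger is configured for that level.
        LOG.log(Level.FINE, "Unhandled exception (full stack trace)", t);
    }

    public static void main(String[] args) {
        logUnhandled(new IOException("Connection reset by peer"));
    }
}
```

With default logger settings, only the one-line INFO summary is emitted; the stack trace appears once the FINE level is enabled for this logger and its handlers.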