[ https://issues.apache.org/jira/browse/KUDU-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16314158#comment-16314158 ]
Andrew Wong edited comment on KUDU-2236 at 1/5/18 11:52 PM: ------------------------------------------------------------ So after looking a bit more, I think it was a regression _and maybe_ an incomplete bug fix in the first place. Taking out the regression and looping the test still spews some WARN-level exception messages stemming from ConnectToMasterErrCB.call() that seems to at least have been since a pretty old refactoring of ConnectToCluster (see 3102e3833bcf32a290b63e8d6cae917b62194ac6). Looping the test after taking out the WARN-level exception logging _and_ removing the regression from 986e8de63d8687421c476de07c4e889129062637, this passes 100/100. Looping leaving in the WARN-level logging led to ~50/100 failures. Looping leaving in the regression only and commenting out the WARN-level logging led to ~20/100 failures. A fix for the regression seems straightforward (conditional logging, maybe conditionally logging DEBUG level instead of INFO and updating so only INFO-level logs are parsed, in case the logging is still valuable for debugging). A fix for the WARN-level logging, though, I have to think about it a bit more (suggestions welcome :) ). was (Author: andrew.wong): After experimenting a bit more, it seems the culprit is exclusively the WARN-level exception messages stemming from ConnectToMasterErrCB.call(), which So after looking a bit more, I think it was a regression _and maybe_ an incomplete bug fix in the first place. Taking out the regression and looping the test still spews some WARN-level exception messages stemming from ConnectToMasterErrCB.call() that seems to at least have been since a pretty old refactoring of ConnectToCluster (see 3102e3833bcf32a290b63e8d6cae917b62194ac6). Looping the test after taking out the WARN-level exception logging _and_ removing the regression from 986e8de63d8687421c476de07c4e889129062637, this passes 100/100. Looping leaving in the WARN-level logging led to ~50/100 failures. Looping leaving in the regression only and commenting out the WARN-level logging led to ~20/100 failures. A fix for the regression seems straightforward (conditional logging, maybe conditionally logging DEBUG level instead of INFO and updating so only INFO-level logs are parsed, in case the logging is still valuable for debugging). A fix for the WARN-level logging, though, I have to think about it a bit more (suggestions welcome :) ). > org.apache.kudu.client.TestKuduClient flaky > ------------------------------------------- > > Key: KUDU-2236 > URL: https://issues.apache.org/jira/browse/KUDU-2236 > Project: Kudu > Issue Type: Bug > Components: test > Affects Versions: 1.6.0 > Reporter: Edward Fancher > Assignee: Andrew Wong > > Last seen in org.apache.kudu.client.TestKuduClient.testCloseShortlyAfterOpen > DEBUG - Could not login via JAAS. Using no credentials: Unable to obtain > Principal Name for authentication > DEBUG - SASL mechanism PLAIN chosen for peer 127.63.177.1 > DEBUG - SASL mechanism PLAIN chosen for peer 127.63.177.1 > DEBUG - SASL mechanism PLAIN chosen for peer 127.63.177.1 > DEBUG - Learned about tablet Kudu Master for table 'Kudu Master' with > partition [<start>, <end>) > DEBUG - Releasing all remaining resources > DEBUG - [peer master-127.63.177.1:64030] cleaning up while in state READY due > to: connection disconnected > INFO - W1206 07:14:39.727399 16334 connection.cc:511] server connection from > 127.63.177.1:43497 recv error: Network error: recv error: Connection reset by > peer (error 104) > DEBUG - [peer master-127.63.177.1:64034] cleaning up while in state > NEGOTIATING due to: connection disconnected > WARN - Error receiving response from 127.63.177.1:64030 > org.apache.kudu.client.RecoverableException: connection disconnected > at org.apache.kudu.client.Connection.channelDisconnected(Connection.java:244) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102) > at org.apache.kudu.client.Connection.handleUpstream(Connection.java:236) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelDisconnected(SimpleChannelUpstreamHandler.java:208) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:60) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.channelDisconnected(FrameDecoder.java:365) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) > at > org.jboss.netty.channel.Channels.fireChannelDisconnected(Channels.java:396) > at org.jboss.netty.channel.Channels$4.run(Channels.java:386) > at > org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) > at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > INFO - W1206 07:14:39.741600 17527 negotiation.cc:311] Failed RPC > negotiation. Trace: > INFO - 1206 07:14:39.695892 (+ 0us) reactor.cc:499] Submitting > negotiation task for server connection from 127.63.177.1:48551 > INFO - 1206 07:14:39.722215 (+ 26323us) server_negotiation.cc:173] Beginning > negotiation > INFO - 1206 07:14:39.722236 (+ 21us) server_negotiation.cc:361] Waiting > for connection header > DEBUG - [peer master-127.63.177.1:64032] cleaning up while in state READY due > to: connection disconnected > WARN - Error receiving response from 127.63.177.1:64034 > org.apache.kudu.client.RecoverableException: connection disconnected > at org.apache.kudu.client.Connection.channelDisconnected(Connection.java:244) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102) > at org.apache.kudu.client.Connection.handleUpstream(Connection.java:236) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelDisconnected(SimpleChannelUpstreamHandler.java:208) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelDisconnected(SimpleChannelUpstreamHandler.java:208) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:60) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.channelDisconnected(FrameDecoder.java:365) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) > at > org.jboss.netty.channel.Channels.fireChannelDisconnected(Channels.java:396) > at org.jboss.netty.channel.Channels$4.run(Channels.java:386) > at > org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) > at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > INFO - 1206 07:14:39.722274 (+ 38us) server_negotiation.cc:369] Connection > header received > INFO - 1206 07:14:39.722401 (+ 127us) server_negotiation.cc:325] Received > NEGOTIATE NegotiatePB request > INFO - 1206 07:14:39.722424 (+ 23us) server_negotiation.cc:410] Received > NEGOTIATE request from client > INFO - 1206 07:14:39.722534 (+ 110us) server_negotiation.cc:337] Sending > NEGOTIATE NegotiatePB response > INFO - 1206 07:14:39.722662 (+ 128us) server_negotiation.cc:194] Negotiated > authn=SASL > INFO - 1206 07:14:39.724372 (+ 1710us) server_negotiation.cc:325] Received > TLS_HANDSHAKE NegotiatePB request > INFO - 1206 07:14:39.724946 (+ 574us) server_negotiation.cc:337] Sending > TLS_HANDSHAKE NegotiatePB response > INFO - 1206 07:14:39.730617 (+ 5671us) server_negotiation.cc:325] Received > TLS_HANDSHAKE NegotiatePB request > INFO - 1206 07:14:39.736652 (+ 6035us) server_negotiation.cc:337] Sending > TLS_HANDSHAKE NegotiatePB response > INFO - 1206 07:14:39.737144 (+ 492us) negotiation.cc:302] Negotiation > complete: Network error: Server connection negotiation failed: server > connection from 127.63.177.1:48551: BlockingWrite error: write error: Broken > pipe (error 32) > INFO - Metrics: > {"server-negotiator.queue_time_us":26096,"thread_start_us":25983,"threads_started":1} -- This message was sent by Atlassian JIRA (v6.4.14#64029)