Li Chao created HBASE-28119:
-------------------------------

             Summary: LogRoller stuck in FanOutOneBlockAsyncDFSOutputHelper.createOutput waiting on the future forever
                 Key: HBASE-28119
                 URL: https://issues.apache.org/jira/browse/HBASE-28119
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.2.7
            Reporter: Li Chao
         Attachments: image-2023-09-29-17-23-04-560.png
We found this problem in our production cluster: LogRoller is stuck in FanOutOneBlockAsyncDFSOutputHelper.createOutput, waiting on the returned future forever.

!image-2023-09-29-17-23-04-560.png|width=566,height=191!

Checking the regionserver's log, the regionserver started SASL negotiation with two datanodes, but only one handshake completed; the other did nothing after connecting to its datanode.

{code:java}
518415 2023-04-17 14:17:25,434 INFO io.transwarp.guardian.client.cache.PeriodCacheUpdater: Fetch change version: 0
518416 2023-04-17 14:17:29,092 DEBUG org.apache.hadoop.hbase.ScheduledChore: RefreshCredentials execution time: 0 ms.
518417 2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: CompactionChecker execution time: 0 ms.
518418 2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: CompactionThroughputTuner execution time: 0 ms.
518419 2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: MemstoreFlusherChore execution time: 0 ms.
518420 2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: gy-dmz-swrzjzcc-gx-2-19,60020,1677341424491-HeapMemoryTunerChore execution time: 0 ms.
518421 2023-04-17 14:17:39,375 DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: WAL AsyncFSWAL gy-dmz-swrzjzcc-gx-2-19%2C60020%2C1677341424491:(num 1681711899342) roll requested
518422 2023-04-17 14:17:39,389 DEBUG org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper: SASL client doing general handshake for addr = 10.179.157.10/10.179.157.10, datanodeId = DatanodeInfoWithStorage[10.179.157.10:50010,DS-4815c34a-8d0c-42b9-b56c-529d2732d956,DISK]
518423 2023-04-17 14:17:39,391 DEBUG org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper: SASL client doing general handshake for addr = 10.179.157.29/10.179.157.29, datanodeId = DatanodeInfoWithStorage[10.179.157.29:50010,DS-509f84fe-2e88-403e-87b5-f4765e49094f,DISK]
518424 2023-04-17 14:17:39,392 DEBUG org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper: Verifying QOP, requested QOP = [auth], negotiated QOP = auth
518425 2023-04-17 14:17:39,743 DEBUG org.apache.hadoop.hbase.ScheduledChore: MemstoreFlusherChore execution time: 0 ms.
518426 2023-04-17 14:17:39,743 DEBUG org.apache.hadoop.hbase.ScheduledChore: CompactionChecker execution time: 0 ms.
518427 2023-04-17 14:17:49,977 DEBUG org.apache.hadoop.hbase.ScheduledChore: CompactionChecker execution time: 0 ms.
518428 2023-04-17 14:17:49,977 DEBUG org.apache.hadoop.hbase.ScheduledChore: MemstoreFlusherChore execution time: 0 ms.
518429 2023-04-17 14:17:55,492 INFO
{code}

FanOutOneBlockAsyncDFSOutputHelper.createOutput connects to each datanode and calls trySaslNegotiate. In SASL authentication mode, a SaslNegotiateHandler is used to handle the authentication. If the datanode is shut down mid-handshake, SaslNegotiateHandler.channelInactive never completes the promise, so the future stays stuck forever.
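The hang can be modeled with a small, self-contained sketch. This is illustrative only, not the actual HBase/Netty classes: a java.util.concurrent.CompletableFuture stands in for the Netty promise that completes createOutput's future, and the hypothetical flag failOnInactive switches between the buggy behavior (promise never completed, so waiters block forever) and the proposed fix (fail the promise when the channel goes inactive, so waiters are released at once).

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;

// Illustrative model of the SASL negotiation promise (names are hypothetical,
// not the real SaslNegotiateHandler): the promise is what createOutput's
// caller ultimately waits on.
class SaslNegotiationModel {
    final CompletableFuture<Void> promise = new CompletableFuture<>();
    private final boolean failOnInactive;

    SaslNegotiationModel(boolean failOnInactive) {
        this.failOnInactive = failOnInactive;
    }

    // Simulates channelInactive firing because the datanode dropped the
    // connection in the middle of the handshake.
    void channelInactive() {
        // (saslClient.dispose() happens here in the real handler.)
        if (failOnInactive) {
            // The proposed fix: fail the promise so anyone blocked on the
            // future gets an ExecutionException instead of hanging.
            promise.completeExceptionally(
                new IOException("datanode closed connection during SASL negotiation"));
        }
        // Buggy behavior: the promise is left incomplete, so a blocking
        // promise.get() never returns.
    }
}
```

With failOnInactive set to false the future is never done after the channel closes (mirroring the stuck LogRoller); with it set to true the future completes exceptionally immediately.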
{code:java}
  @Override
  public void handlerAdded(ChannelHandlerContext ctx) throws Exception {
    ctx.write(ctx.alloc().buffer(4).writeInt(SASL_TRANSFER_MAGIC_NUMBER));
    sendSaslMessage(ctx, new byte[0]);
    ctx.flush();
    step++;
  }

  @Override
  public void channelInactive(ChannelHandlerContext ctx) throws Exception {
    saslClient.dispose();
  }
{code}

So SaslNegotiateHandler.channelInactive should call promise.tryFailure to avoid leaving the future stuck forever.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)