Li Chao created HBASE-28119:
-------------------------------

             Summary: LogRoller stuck by 
FanOutOneBlockAsyncDFSOutputHelper.createOutput waiting on a future forever
                 Key: HBASE-28119
                 URL: https://issues.apache.org/jira/browse/HBASE-28119
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.2.7
            Reporter: Li Chao
         Attachments: image-2023-09-29-17-23-04-560.png

We found this problem in our production environment: the LogRoller is stuck 
because FanOutOneBlockAsyncDFSOutputHelper.createOutput waits on a future that 
never completes.

!image-2023-09-29-17-23-04-560.png|width=566,height=191!

Checking the regionserver's log, the regionserver starts SASL negotiation with 
two datanodes, but only one of them completes the check. The other does 
nothing after the connection is established.
{code:java}
2023-04-17 14:17:25,434 INFO io.transwarp.guardian.client.cache.PeriodCacheUpdater: Fetch change version: 0
2023-04-17 14:17:29,092 DEBUG org.apache.hadoop.hbase.ScheduledChore: RefreshCredentials execution time: 0 ms.
2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: CompactionChecker execution time: 0 ms.
2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: CompactionThroughputTuner execution time: 0 ms.
2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: MemstoreFlusherChore execution time: 0 ms.
2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: gy-dmz-swrzjzcc-gx-2-19,60020,1677341424491-HeapMemoryTunerChore execution time: 0 ms.
2023-04-17 14:17:39,375 DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: WAL AsyncFSWAL gy-dmz-swrzjzcc-gx-2-19%2C60020%2C1677341424491:(num 1681711899342) roll requested
2023-04-17 14:17:39,389 DEBUG org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper: SASL client doing general handshake for addr = 10.179.157.10/10.179.157.10, datanodeId = DatanodeInfoWithStorage[10.179.157.10:50010,DS-4815c34a-8d0c-42b9-b56c-529d2732d956,DISK]
2023-04-17 14:17:39,391 DEBUG org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper: SASL client doing general handshake for addr = 10.179.157.29/10.179.157.29, datanodeId = DatanodeInfoWithStorage[10.179.157.29:50010,DS-509f84fe-2e88-403e-87b5-f4765e49094f,DISK]
2023-04-17 14:17:39,392 DEBUG org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper: Verifying QOP, requested QOP = [auth], negotiated QOP = auth
2023-04-17 14:17:39,743 DEBUG org.apache.hadoop.hbase.ScheduledChore: MemstoreFlusherChore execution time: 0 ms.
2023-04-17 14:17:39,743 DEBUG org.apache.hadoop.hbase.ScheduledChore: CompactionChecker execution time: 0 ms.
2023-04-17 14:17:49,977 DEBUG org.apache.hadoop.hbase.ScheduledChore: CompactionChecker execution time: 0 ms.
2023-04-17 14:17:49,977 DEBUG org.apache.hadoop.hbase.ScheduledChore: MemstoreFlusherChore execution time: 0 ms.
2023-04-17 14:17:55,492 INFO {code}
FanOutOneBlockAsyncDFSOutputHelper.createOutput connects to each datanode and 
calls trySaslNegotiate. In SASL authentication mode, SaslNegotiateHandler 
handles the authentication. If the datanode is shut down during negotiation, 
SaslNegotiateHandler.channelInactive never completes the promise, so the 
future is stuck forever.
{code:java}
@Override
public void handlerAdded(ChannelHandlerContext ctx) throws Exception {
  ctx.write(ctx.alloc().buffer(4).writeInt(SASL_TRANSFER_MAGIC_NUMBER));
  sendSaslMessage(ctx, new byte[0]);
  ctx.flush();
  step++;
}

@Override
public void channelInactive(ChannelHandlerContext ctx) throws Exception {
  saslClient.dispose();
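  // note: the negotiation promise is never completed here, so the future that
  // createOutput waits on hangs when the datanode connection is lost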
} {code}
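To see why this leaves the roll request hanging, here is a minimal Netty 
sketch (illustrative only, not the HBase code): a promise that nobody ever 
completes keeps any caller that waits on it blocked forever.
{code:java}
import io.netty.channel.DefaultEventLoopGroup;
import io.netty.util.concurrent.Promise;

public class StuckPromiseDemo {
  public static void main(String[] args) {
    DefaultEventLoopGroup group = new DefaultEventLoopGroup(1);
    // Stands in for the per-datanode SASL negotiation promise.
    Promise<Void> saslPromise = group.next().newPromise();

    // Simulate the datanode dying mid-negotiation: the channel goes inactive,
    // but nobody calls saslPromise.trySuccess(...) or saslPromise.tryFailure(...).

    // The caller (here, the thread rolling the WAL) waits on the result and,
    // because the promise is never completed, never returns.
    saslPromise.awaitUninterruptibly();
    group.shutdownGracefully();
  }
}
{code}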
So SaslNegotiateHandler.channelInactive should call promise.tryFailure to 
avoid the future being stuck forever.
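A sketch of what that could look like (assuming the handler keeps the 
negotiation promise in a field named promise; the exact exception type and 
message are only examples):
{code:java}
@Override
public void channelInactive(ChannelHandlerContext ctx) throws Exception {
  saslClient.dispose();
  // Fail the promise so the future that createOutput waits on completes with
  // an error instead of hanging when the datanode connection is lost.
  promise.tryFailure(
    new IOException("Connection to datanode closed before SASL negotiation finished"));
}
{code}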

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
