chenwandong created HBASE-22641:
-----------------------------------

             Summary: When the Region Server switches the WAL log, the new WAL file created successfully but namenode returns message fails. Then the client retry, but namenode return 'file has an exception', the Region Server does not handle the exception, and abort itself.
                 Key: HBASE-22641
                 URL: https://issues.apache.org/jira/browse/HBASE-22641
             Project: HBase
          Issue Type: Bug
    Affects Versions: 1.3.4
            Reporter: chenwandong
         Attachments: image-2019-06-27-21-12-29-757.png

!image-2019-06-27-21-12-29-757.png!

Problem Description

1. When HBase's current WAL file reaches its 128 MB limit, the Region Server switches to a new WAL file and calls the HDFS client to create it.
2. The HDFS client sends a CREATE request to the HDFS NameNode over the RPC channel.
3. The NameNode validates the request, creates the file, and successfully records the new file's metadata.
4. At this point a brief network interruption on the NameNode side prevents its response from reaching the HDFS client.
5. Having received no response, the HDFS client waits for a while and then retries by sending the CREATE request again.
6. The NameNode detects that the file to be created already exists.
7. The NameNode returns a file-already-exists exception (an IOException) to the HDFS client.
8. HBase does not handle the returned exception, and the Region Server aborts.
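
The sequence described above can be reproduced with a small, self-contained simulation. This is only a sketch under stated assumptions: `FakeNameNode`, `ResponseLostException`, `FileExistsException`, and `WalRollSketch.createWalFile` are hypothetical names invented for illustration and are not HBase or HDFS classes. The `tolerateExistsOnRetry` flag shows one possible way a caller could treat "already exists" as success when it follows the caller's own unanswered CREATE. It is not the fix adopted for this issue.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the WAL roll sequence; not HBase or HDFS code.
class WalRollSketch {
    // Returns true if the WAL file is usable afterwards, false where the
    // Region Server would abort.
    static boolean createWalFile(FakeNameNode nn, String path,
                                 boolean tolerateExistsOnRetry) {
        boolean retried = false;
        while (true) {
            try {
                nn.create(path);            // step 2: send CREATE
                return true;
            } catch (ResponseLostException e) {
                if (retried) {
                    return false;           // give up after a single retry
                }
                retried = true;             // step 5: wait, then resend CREATE
            } catch (FileExistsException e) {
                // Step 8: without special handling this is fatal. Only tolerate
                // it when it follows our own unanswered CREATE.
                return tolerateExistsOnRetry && retried;
            } catch (IOException e) {
                return false;               // any other failure is not handled here
            }
        }
    }

    public static void main(String[] args) {
        String path = "/hbase/WALs/example.wal"; // illustrative path only
        System.out.println("unhandled: "
            + createWalFile(new FakeNameNode(true), path, false));
        System.out.println("tolerated: "
            + createWalFile(new FakeNameNode(true), path, true));
    }
}

// Raised when the NameNode recorded the file but its reply never arrived.
class ResponseLostException extends IOException {
    ResponseLostException(String path) { super("no response for " + path); }
}

// Raised when CREATE targets a path that is already recorded.
class FileExistsException extends IOException {
    FileExistsException(String path) { super("file already exists: " + path); }
}

// Toy stand-in for the NameNode's namespace.
class FakeNameNode {
    private final Set<String> files = new HashSet<>();
    private boolean dropNextResponse;

    FakeNameNode(boolean dropNextResponse) {
        this.dropNextResponse = dropNextResponse;
    }

    void create(String path) throws IOException {
        if (!files.add(path)) {
            throw new FileExistsException(path);    // steps 6-7
        }
        if (dropNextResponse) {
            dropNextResponse = false;
            throw new ResponseLostException(path);  // steps 3-4: recorded, reply lost
        }
    }
}
```

In this sketch the exception is tolerated only when it follows a lost response, so a file that already existed before the first CREATE is still reported as a failure. A real handler would also need to confirm that the existing file is the one produced by its own earlier attempt before reusing it.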