Rushabh Shah created HBASE-26435:
------------------------------------

             Summary: [branch-1] The log rolling request may be canceled immediately in LogRoller due to a race
                 Key: HBASE-26435
                 URL: https://issues.apache.org/jira/browse/HBASE-26435
             Project: HBase
          Issue Type: Sub-task
          Components: wal
    Affects Versions: 1.6.0
            Reporter: Rushabh Shah
             Fix For: 1.7.2
Saw this issue in our internal 1.6 branch. The WAL was rolled, but the new WAL file was not writable, and the following error was logged:

{noformat}
2021-11-03 19:20:19,503 WARN  [.168:60020.logRoller] hdfs.DFSClient - Error while syncing
java.io.IOException: Could not get block locations. Source file "/hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635967219389" - Aborting...
        at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1466)
        at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1251)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:670)
2021-11-03 19:20:19,507 WARN  [.168:60020.logRoller] wal.FSHLog - pre-sync failed but an optimization so keep going
java.io.IOException: Could not get block locations. Source file "/hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635967219389" - Aborting...
        at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1466)
        at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1251)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:670)
{noformat}

Since the new WAL file was not writable, appends to it started failing immediately after it was rolled:

{noformat}
2021-11-03 19:20:19,677 INFO  [.168:60020.logRoller] wal.FSHLog - Rolled WAL /hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635965392022 with entries=253234, filesize=425.67 MB; new WAL /hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635967219389
2021-11-03 19:20:19,690 WARN  [020.append-pool17-t1] wal.FSHLog - Append sequenceId=1962661783, requesting roll of WAL
java.io.IOException: Could not get block locations. Source file "/hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635967219389" - Aborting...
        at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1466)
        at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1251)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:670)
2021-11-03 19:20:19,690 INFO  [.168:60020.logRoller] wal.FSHLog - Archiving hdfs://prod-EMPTY-hbase2a/hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635960792837 to hdfs://prod-EMPTY-hbase2a/hbase/oldWALs/hbase2a-dnds1-232-ukb.ops.sfdc.net%2C60020%2C1635567166484.1635960792837
{noformat}

The LogRoller thread resets the rollLog flag only after the rollWriter call is complete. FSHLog#rollWriter does many things, such as replacing the writer and archiving old logs, so it can take a while. If an append thread fails to write to the new file and requests a roll while the LogRoller thread is still inside the previous rollWriter call, that request is lost: LogRoller resets rollLog to false when rollWriter returns, wiping out the flag the append thread just set.

Relevant code: https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java#L183-L203

We need to reset the rollLog flag before we start rolling the WAL. This is fixed in branch-2 and master via HBASE-22684, but we didn't fix it in branch-1. Also, branch-2 has the multi-WAL implementation, so that fix cannot be applied cleanly to branch-1.
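For illustration, here is a minimal sketch of the race and of the flag-reset reordering. This is a simplified stand-in, not the actual branch-1 LogRoller code; the class and the requestRoll/rollWriter bodies are illustrative only:

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified stand-in for LogRoller (illustrative, not the branch-1 class).
public class LogRollerSketch {
  private final AtomicBoolean rollLog = new AtomicBoolean(false);

  // Append threads call this when a sync/append on the WAL fails.
  public void requestRoll() {
    rollLog.set(true);
    synchronized (rollLog) {
      rollLog.notifyAll(); // wake the roller thread
    }
  }

  // Current branch-1 ordering: the flag is cleared only AFTER rollWriter()
  // returns, so a request raised while rollWriter() is running is discarded.
  void iterationBuggy() throws Exception {
    if (rollLog.get()) {
      rollWriter();        // replaces writer, archives old logs; can be slow
      rollLog.set(false);  // RACE: also wipes out any request made meanwhile
    }
  }

  // Reordered as in HBASE-22684 (branch-2/master): clear the flag BEFORE
  // rolling, so a request raised during rollWriter() survives to the next
  // loop iteration and triggers another roll.
  void iterationFixed() throws Exception {
    if (rollLog.compareAndSet(true, false)) {
      rollWriter();
    }
  }

  private void rollWriter() throws Exception {
    // Stand-in for FSHLog#rollWriter.
    Thread.sleep(100);
  }
}
{code}

With the reordered version, an append failure during rollWriter() leaves rollLog set, so the roller performs another roll on its next iteration instead of silently dropping the request.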