Nick Dimiduk created HBASE-12430:
------------------------------------

             Summary: Contention in lease recovery can delay log splitting unnecessarily
                 Key: HBASE-12430
                 URL: https://issues.apache.org/jira/browse/HBASE-12430
             Project: HBase
          Issue Type: Bug
          Components: regionserver, wal
            Reporter: Nick Dimiduk


I'm not deeply familiar with this area so please bear with me.

In a run of IntegrationTestMTTR with CM (ChaosMonkey), I'm seeing a case where 
RS recovery is in progress. One RS starts splitting one of the WAL files and 
writes some temp files to HDFS. CM then kills that RS. Other RS's now try to 
complete the same work but fail to write their temp files into this same 
location because none of them holds a lease on the output file. Log lines look 
like

{noformat}
2014-11-03 12:57:14,093 INFO  [RS_LOG_REPLAY_OPS-ip-172-31-4-166:60020-1] 
wal.HLogSplitter: Processed 99 edits across 12 regions; log 
file=hdfs://ip-172-31-4-163.ec2.internal:8020/apps/hbase/data/WALs/ip-172-31-4-162.ec2.internal,60020,1415017856808-splitting/ip-172-31-4-162.ec2.internal%2C60020%2C1415017856808.1415018131158
 is corrupted = false progress failed = true
2014-11-03 12:57:14,093 WARN  [RS_LOG_REPLAY_OPS-ip-172-31-4-166:60020-1] 
regionserver.SplitLogWorker: log splitting of 
WALs/ip-172-31-4-162.ec2.internal,60020,1415017856808-splitting/ip-172-31-4-162.ec2.internal%2C60020%2C1415017856808.1415018131158
 failed, returning error
org.apache.hadoop.io.MultipleIOException: 11 exceptions 
[org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
 No lease on 
/apps/hbase/data/data/default/IntegrationTestIngestWithTags/0c55ce7c53f996cd97f55385eee222c2/recovered.edits/0000000000000030557.temp
 (inode 28346): File does not exist. [Lease.  Holder: 
DFSClient_hb_rs_ip-172-31-4-166.ec2.internal,60020,1415019284535_-996811059_38, 
pendingcreates: 49]
{noformat}

Splitting does eventually complete but it takes almost 15 minutes.

I don't have a fix in mind. One thought is to recover edits into a 
worker-specific directory and then do an atomic rename to the "official" 
split destination, but this change cannot be rolled out across a rolling 
restart. I've also considered managing the recovery more explicitly, but I 
think the current behavior of multiple RS's competing for the work is there to 
facilitate speculative execution of splitting. Other ideas?
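
To make the worker-specific directory idea concrete, here is a rough sketch. It uses java.nio.file on a local filesystem as a stand-in, not HBase or HDFS code. The class name, method name, and the "recovered.edits.tmp" directory are all made up for illustration. The existence check before the move is a simplification too; a real version would need to rely on whatever guarantees the HDFS rename gives when the destination already exists.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only: local-filesystem analogue of the idea,
// not the HLogSplitter implementation.
final class WorkerScopedSplitSketch {

    /**
     * Writes recovered edits under a directory private to workerId, then
     * publishes the finished file into recovered.edits with a rename.
     * Returns true if this worker published the file, false if another
     * worker had already published it.
     */
    static boolean writeAndPublish(Path regionDir, String workerId,
                                   String fileName, byte[] edits) throws IOException {
        // Assumed layout: <region>/recovered.edits.tmp/<workerId>/<file>.temp
        Path workerDir = regionDir.resolve("recovered.edits.tmp").resolve(workerId);
        Files.createDirectories(workerDir);
        Path tmp = workerDir.resolve(fileName + ".temp");
        Files.write(tmp, edits); // no other worker writes to this path

        Path finalDir = regionDir.resolve("recovered.edits");
        Files.createDirectories(finalDir);
        Path dest = finalDir.resolve(fileName);

        // Simplification: check-then-move is not race-free by itself.
        if (Files.exists(dest)) {
            Files.delete(tmp); // a speculative duplicate already won
            return false;
        }
        Files.move(tmp, dest, StandardCopyOption.ATOMIC_MOVE);
        return true;
    }
}
```

With this layout, a worker that dies mid-split leaves files only under its own directory, so a second worker never touches a path whose lease is still held by the dead one.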



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
