[ https://issues.apache.org/jira/browse/HBASE-20157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yu Li resolved HBASE-20157.
---------------------------
    Resolution: Duplicate

> WAL file might get broken
> -------------------------
>
>                 Key: HBASE-20157
>                 URL: https://issues.apache.org/jira/browse/HBASE-20157
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 1.1.0
>            Reporter: Zephyr Guo
>            Assignee: Zephyr Guo
>            Priority: Major
>             Fix For: 2.0.0
>
>
> A WAL file can get corrupted because of HBASE-16824.
> When Writer.close() and Writer.sync() are called at the same time, an HDFS
> bug (HDFS-13243) is triggered. If this happens, the last block in the
> WAL gets broken (the NN marks it as CorruptBlock).
> I am reporting this scenario here to help others who come across
> the same problem. (HBASE-16824 has already been fixed, though.)
> {panel:title=RS log}
> 2018-02-05 07:58:54,212 INFO [regionserver/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com/10.0.0.218:16020.logRoller] hdfs.DFSClient: Could not complete /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683 retrying...
> 2018-02-05 07:59:00,612 INFO [regionserver/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com/10.0.0.218:16020.logRoller] hdfs.DFSClient: Could not complete /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683 retrying...
> {panel}
> {panel:title=NN log}
> 2018-02-05 07:58:48,011 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* fsync: /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683 for DFSClient_NONMAPREDUCE_1109936977_1
> 2018-02-05 07:58:48,011 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* blk_1080650145_6909339\{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-a4e579e7-4721-4c22-9b61-f1d00b33c45f:NORMAL:10.0.0.218:50010|RBW], ReplicaUC[[DISK]DS-5d3d7878-876d-4a5a-97bc-5535c4cf8d59:NORMAL:10.0.0.220:50010|RBW], ReplicaUC[[DISK]DS-ccc314b2-e2ad-4c1f-99a5-a39e3677a83b:NORMAL:10.0.0.221:50010|RBW]]} is not COMPLETE (ucState = COMMITTED, replication# = 0 < minimum = 2) in file /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683
> 2018-02-05 07:58:48,111 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1080650145 added as corrupt on 10.0.0.221:50010 by hb-j5e517al6xib80rkb-005.hbase.rds.aliyuncs.com/10.0.0.221 because block is COMMITTED and reported length 1957330 does not match length in block map 80594
> 2018-02-05 07:58:48,224 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1080650145 added as corrupt on 10.0.0.218:50010 by hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com/10.0.0.218 because block is COMMITTED and reported length 1957330 does not match length in block map 80594
> 2018-02-05 07:58:48,224 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1080650145 added as corrupt on 10.0.0.220:50010 by hb-j5e517al6xib80rkb-003.hbase.rds.aliyuncs.com/10.0.0.220 because block is COMMITTED and reported length 1957330 does not match length in block map 80594
> 2018-02-05 07:58:48,511 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* blk_1080650145_6909339\{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-a4e579e7-4721-4c22-9b61-f1d00b33c45f:NORMAL:10.0.0.218:50010|RBW], ReplicaUC[[DISK]DS-5d3d7878-876d-4a5a-97bc-5535c4cf8d59:NORMAL:10.0.0.220:50010|RBW], ReplicaUC[[DISK]DS-ccc314b2-e2ad-4c1f-99a5-a39e3677a83b:NORMAL:10.0.0.221:50010|RBW]]} is not COMPLETE (ucState = COMMITTED, replication# = 3 >= minimum = 2) in file /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683
> {panel}
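
For anyone checking whether their own cluster hit the same symptom, the corrupt-replica entries in the NN log above (reported length 1957330 versus 80594 in the block map) can be picked out with a small script. The following is only a rough sketch: the function name `find_length_mismatches` and the regular expression are illustrative assumptions written against the message text quoted in this issue, not part of any HBase or HDFS tooling, and the message wording may differ in other Hadoop versions.

```python
import re

# Matches the NameNode "added as corrupt" message as it is quoted in this issue.
# Assumption: other Hadoop versions may word this message differently.
CORRUPT_RE = re.compile(
    r"(?P<block>blk_\d+) added as corrupt on (?P<datanode>\S+) by \S+ "
    r"because block is COMMITTED and reported length (?P<reported>\d+) "
    r"does not match length in block map (?P<expected>\d+)"
)


def find_length_mismatches(log_text):
    """Return one dict per corrupt-replica length mismatch found in log_text."""
    # Mail clients wrap long entries and prefix each line with ">";
    # strip that prefix and rejoin the pieces before matching.
    pieces = [re.sub(r"^\s*>\s?", "", line).strip() for line in log_text.splitlines()]
    flat = " ".join(" ".join(pieces).split())
    return [
        {
            "block": m.group("block"),
            "datanode": m.group("datanode"),
            "reported": int(m.group("reported")),
            "expected": int(m.group("expected")),
        }
        for m in CORRUPT_RE.finditer(flat)
    ]


# One entry copied from the NN log in this issue, in its mail-wrapped form.
SAMPLE = """\
> 2018-02-05 07:58:48,111 INFO BlockStateChange: BLOCK
> NameSystem.addToCorruptReplicasMap: blk_1080650145 added as corrupt on
> 10.0.0.221:50010 by
> hb-j5e517al6xib80rkb-005.hbase.rds.aliyuncs.com/10.0.0.221 because block is
> COMMITTED and reported length 1957330 does not match length in block map 80594
"""

if __name__ == "__main__":
    for hit in find_length_mismatches(SAMPLE):
        print(hit["block"], hit["datanode"], hit["reported"], hit["expected"])
```

A replica length larger than the length recorded in the block map for a COMMITTED block, as in the entries above, is the pattern to look for; the script only reports such entries and makes no attempt to repair anything.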