[ https://issues.apache.org/jira/browse/HBASE-23205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012217#comment-17012217 ]
Hudson commented on HBASE-23205:
--------------------------------

Results for branch branch-1.4
[build #1142 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/1142/]: (x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/1142//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/1142//JDK7_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/1142//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Correctly update the position of WALs currently being replicated.
> -----------------------------------------------------------------
>
>                 Key: HBASE-23205
>                 URL: https://issues.apache.org/jira/browse/HBASE-23205
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.5.0, 1.4.10, 1.4.11
>            Reporter: Jeongdae Kim
>            Assignee: Jeongdae Kim
>            Priority: Major
>             Fix For: 1.5.1, 1.4.13
>
>
> While testing with 1.4.10, we observed that many old WALs were not removed
> from the archive or from their corresponding replication queues.
> The WALs that pile up are empty or have no entries to be replicated (their
> entries are not in the replication table_cfs).
>
> As described in HBASE-22784, if no entries to be replicated are appended to
> a WAL, the log position is never updated. As a consequence, none of these
> WALs are removed. This issue has existed since HBASE-15995.
>
> I think old WALs will no longer pile up with HBASE-22784, but a few things
> still need to be fixed, as described below.
> case 1) The log position could be updated incorrectly when the log rolls,
> because the lastWalPath of a batch might not point to the WAL currently
> being read.
> * For example, suppose the last entry added to a batch was read at position
> P1 in WAL W1, then the WAL rolled, the reader read to the end of the old WAL
> and continued reading entries from the new WAL W2, and then the batch
> reached its size limit. The current read position in W2 is P2. In this case
> the batch passed to the shipper carries walPath W1 and position P2, so the
> shipper will try to record position P2 for W1. This may result in data
> inconsistency during recovery, or in a failed update to ZooKeeper (the znode
> may no longer exist because of previous log position updates; I guess this
> is the same case as HBASE-23169?).
>
> case 2) The log position could be left un-updated, or updated to the wrong
> position, because of the pendingShipment flag introduced in HBASE-22784.
> * Setting pendingShipment to false does not guarantee that the shipper
> thread always updates the log position.
> If the reader sets the flag to true right after the shipper sets it to false
> during updateLogPosition(), the shipper won’t update the log position.
> Conversely, if the shipper sets the flag to false while the reader is
> reading filtered entries, the reader will update the log position to its
> current read position, which may lose data during recovery.
>
> case 3) Many log position updates can happen when most WAL entries are
> filtered out by TableCfWALEntryFilter.
> * I think it would be better to reduce the number of log position updates in
> that case, because
> ## ZooKeeper writes are more expensive than reads (writes involve
> synchronizing the state of all servers),
> ## even if the read position is not updated, it is harmless, because all
> entries will be filtered out again during recovery.
> * It would be enough to update the log position only when the WAL rolls in
> that case.
> (to clean up old WALs)
>
> -In addition, during this work I found a minor bug: the replication buffer
> size is updated incorrectly, because the total buffer size is decreased by
> the size of bulk-loaded files.-
> -I’d like to fix it, if that’s OK.-
> I removed the changes above and created a separate Jira issue: HBASE-23254



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
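
The path/position mismatch in case 1 of the quoted description can be illustrated with a small standalone sketch. Nothing here is HBase code or the actual patch: the classes `SplitBatch`, `PairedBatch`, and `WalPosition` are hypothetical, and the offsets 100 (P1) and 40 (P2) are made-up values. It only shows why tracking the WAL path and the read offset separately can produce "P2 for W1", and one possible way to avoid it by always updating both together.

```java
public class WalPositionSketch {
    /** Immutable (WAL, offset) pair; hypothetical, not an HBase class. */
    static final class WalPosition {
        final String walPath;
        final long offset;

        WalPosition(String walPath, long offset) {
            this.walPath = walPath;
            this.offset = offset;
        }

        @Override
        public String toString() {
            return walPath + "@" + offset;
        }
    }

    /** Mirrors case 1: path and offset are tracked by separate calls. */
    static final class SplitBatch {
        private String lastWalPath;
        private long lastWalPosition;

        // Path is recorded only when an entry is actually added to the batch.
        void addEntry(String walPath) { lastWalPath = walPath; }

        // Offset is overwritten from the reader's current position, whatever WAL it is in.
        void setPosition(long readerOffset) { lastWalPosition = readerOffset; }

        WalPosition checkpoint() { return new WalPosition(lastWalPath, lastWalPosition); }
    }

    /** Assumed alternative: the reader hands over path and offset as one unit. */
    static final class PairedBatch {
        private WalPosition end;

        void advance(String walPath, long offset) { end = new WalPosition(walPath, offset); }

        WalPosition checkpoint() { return end; }
    }

    static WalPosition splitScenario() {
        SplitBatch batch = new SplitBatch();
        batch.addEntry("W1");    // last kept entry comes from W1 ...
        batch.setPosition(100L); // ... at P1 = 100 (made-up value)
        // The WAL rolls; the reader continues in W2 and stops at P2 = 40 (made-up value).
        batch.setPosition(40L);
        return batch.checkpoint();
    }

    static WalPosition pairedScenario() {
        PairedBatch batch = new PairedBatch();
        batch.advance("W1", 100L);
        batch.advance("W2", 40L); // path and offset move together after the roll
        return batch.checkpoint();
    }

    public static void main(String[] args) {
        System.out.println(splitScenario());  // prints "W1@40": offset of W2 paired with W1
        System.out.println(pairedScenario()); // prints "W2@40"
    }
}
```

In the split variant the shipper would be asked to record an offset that belongs to W2 against W1, which is the inconsistency the description warns about. The paired variant is only one way to rule that out.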