[jira] [Commented] (HBASE-23205) Correctly update the position of WALs currently being replicated.

Hudson (Jira) Thu, 09 Jan 2020 12:47:21 -0800


    [ 
https://issues.apache.org/jira/browse/HBASE-23205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012217#comment-17012217
 ]


Hudson commented on HBASE-23205:
--------------------------------

Results for branch branch-1.4
        [build #1142 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/1142/]: 
(x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/1142//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/1142//JDK7_Nightly_Build_Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/1142//JDK8_Nightly_Build_Report_(Hadoop2)/]




(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Correctly update the position of WALs currently being replicated.
> -----------------------------------------------------------------
>
>                 Key: HBASE-23205
>                 URL: https://issues.apache.org/jira/browse/HBASE-23205
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.5.0, 1.4.10, 1.4.11
>            Reporter: Jeongdae Kim
>            Assignee: Jeongdae Kim
>            Priority: Major
>             Fix For: 1.5.1, 1.4.13
>
>
> We observed that many old WALs were not removed from the archive or from 
> their corresponding replication queues while testing with 1.4.10.
>  The accumulated old WALs are empty or contain no entries to be replicated 
> (their entries are not in the replication table_cfs).
>   
>  As described in HBASE-22784, if no entries to be replicated are appended to 
> WALs, the log position is never updated. As a consequence, none of those WALs 
> are removed. This issue has existed since HBASE-15995.
>   
>  I think old WALs will no longer accumulate with HBASE-22784, but a few 
> things still need to be fixed, as described below.
>   case 1) The log position could be updated incorrectly when the log rolls, 
> because the lastWalPath of a batch might not point to the WAL currently being 
> read.
>  * For example, suppose the last entry added to a batch was read at position 
> P1 in WAL W1, and then the WAL rolled. The reader reads to the end of the old 
> WAL, continues reading entries from the new WAL W2, and then reaches the 
> batch size. The current read position in W2 is P2. In this case, the batch 
> passed to the shipper carries walPath W1 and position P2, so the shipper will 
> try to record position P2 for W1. This may result in data inconsistency 
> during recovery, or in a failed update to ZooKeeper (the znode may no longer 
> exist because of previous log position updates; I guess this is the same case 
> as HBASE-23169?)
>  
>   case 2) The log position might not be updated, or might be updated to the 
> wrong position, because of the pendingShipment flag introduced in 
> HBASE-22784.
>  * The shipper thread is not guaranteed to always update the log position, 
> because of how it sets pendingShipment to false.
>  If the reader sets the flag to true right after the shipper sets it to false 
> during updateLogPosition(), the shipper won’t update the log position.
>  On the other hand, if the shipper sets the flag to false while the reader is 
> reading filtered entries, the reader will update the log position to its 
> current read position. This may lose data during recovery.
>  
>   case 3) Many log position updates can occur when most WAL entries are 
> filtered out by TableCfWALEntryFilter.
>  * I think it would be better to reduce the number of log position updates in 
> that case, because
>  ## ZooKeeper writes are more expensive than reads (since writes involve 
> synchronizing the state of all servers), and
>  ## even if the read position is not updated, it is harmless, because all 
> entries will be filtered out again during recovery.
>  * In that case, it would be enough to update the log position only when the 
> WAL rolls (to clean up old WALs).
>  
> -In addition, during this work I found a minor bug: the replication buffer 
> size is updated incorrectly, because the total buffer size is decreased by 
> the size of bulk loaded files.-
>  -I’d like to fix it, if that is OK.-
> I removed the changes above and filed a separate Jira issue: HBASE-23254
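
To illustrate case 1 in the description above: one way to avoid pairing W1 with P2 is to make the batch record the WAL path and the read position together, so that neither can be updated without the other. The following is only a minimal sketch of that idea, not the HBase implementation; the class WalBatchSketch and its methods are hypothetical names, and WAL paths are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a replication batch; not an HBase class. */
final class WalBatchSketch {
    private final List<String> entries = new ArrayList<>();
    private String lastWalPath;          // WAL that lastWalPosition refers to
    private long lastWalPosition = -1L;  // read position inside lastWalPath

    /** Records path and position as a pair, even when no entry is added. */
    void recordPosition(String walPath, long position) {
        this.lastWalPath = walPath;
        this.lastWalPosition = position;
    }

    /** Adds an entry and records where the reader stood after reading it. */
    void addEntry(String entry, String walPath, long positionAfterEntry) {
        entries.add(entry);
        recordPosition(walPath, positionAfterEntry);
    }

    int size() { return entries.size(); }
    String lastWalPath() { return lastWalPath; }
    long lastWalPosition() { return lastWalPosition; }
}
```

In this sketch, a reader that moves on to W2 after a roll calls recordPosition("W2", p2) even if the entries it reads there are filtered out, so a shipper that consumes the batch sees W2 together with P2 and never W1 together with P2.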

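The suggestion in case 3 (write the position only when something was shipped or when the WAL rolled) can be sketched as a small decision rule. This is an illustration under assumptions, not the patch itself: PositionUpdatePolicySketch is a hypothetical class, the ZooKeeper write is simulated by a counter, and a change of WAL path is treated as a roll.

```java
import java.util.Objects;

/** Hypothetical policy for limiting log position writes; not an HBase class. */
final class PositionUpdatePolicySketch {
    private String persistedWalPath;       // last WAL path written to the store
    private long persistedPosition = -1L;  // last position written to the store
    private int writes;                    // number of simulated store writes

    /**
     * Persists the position if entries were shipped or the WAL changed.
     * Returns true if a (simulated) write happened.
     */
    boolean maybePersist(String walPath, long position, int shippedEntries) {
        boolean walRolled = !Objects.equals(walPath, persistedWalPath);
        if (shippedEntries == 0 && !walRolled) {
            // Only filtered entries from the same WAL: skip the write, since
            // these entries would be filtered out again during recovery.
            return false;
        }
        persistedWalPath = walPath;
        persistedPosition = position;
        writes++;
        return true;
    }

    int writes() { return writes; }
    String persistedWalPath() { return persistedWalPath; }
    long persistedPosition() { return persistedPosition; }
}
```

With this rule, a long run of fully filtered batches from one WAL costs no writes, while the first batch from a new WAL still triggers a write so that the old WAL can be cleaned up.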


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
