[jira] [Comment Edited] (HBASE-26120) New replication gets stuck or data loss when multiwal groups more than 10

Andrew Kyle Purtell (Jira) Mon, 26 Jul 2021 12:50:05 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-26120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387597#comment-17387597
 ]


Andrew Kyle Purtell edited comment on HBASE-26120 at 7/26/21, 7:49 PM:
-----------------------------------------------------------------------

There is no PR or patch available. If that changes in a day or so, then fine. 
Otherwise, although this is a serious issue, I think it is fine to ship the 
fix in the next release if it cannot make the train for the current releases: 
other releases already have this issue and, while that is unfortunate, there 
are obviously alternate configurations that work. Please advise if I am 
mistaken and this is a new regression.


was (Author: apurtell):
There is no PR or patch available. If that changes in a day or so, then fine. 
Otherwise I think this is an issue that is already in previous releases and 
while serious can get onto the train for the next release if it cannot make the 
current releases.

> New replication gets stuck or data loss when multiwal groups more than 10
> -------------------------------------------------------------------------
>
>                 Key: HBASE-26120
>                 URL: https://issues.apache.org/jira/browse/HBASE-26120
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 1.7.1, 2.4.5
>            Reporter: Jasee Tao
>            Priority: Critical
>
> {code:java}
> void preLogRoll(Path newLog) throws IOException {
>   recordLog(newLog);
>   String logName = newLog.getName();
>   String logPrefix = DefaultWALProvider.getWALPrefixFromWALName(logName);
>   synchronized (latestPaths) {
>     Iterator<Path> iterator = latestPaths.iterator();
>     while (iterator.hasNext()) {
>       Path path = iterator.next();
>       if (path.getName().contains(logPrefix)) {
>         iterator.remove();
>         break;
>       }
>     }
>     this.latestPaths.add(newLog);
>   }
> }
> {code}
> ReplicationSourceManager uses _latestPaths_ to track the latest WAL of each 
> WAL group, and all of them are enqueued for replication when a new 
> replication peer is added.
> If we set hbase.wal.regiongrouping.numgroups > 10, say 11, the WAL group 
> names run from _regionserver.null0.timestamp_ to 
> _regionserver.null11.timestamp_. *_String.contains_* is used in _preLogRoll_ 
> to replace the old log of the same group, so when _regionserver.null1.ts_ 
> arrives, _regionserver.null11.ts_ may be replaced instead, and 
> *_latestPaths_ keeps growing with wrong logs*.
> Replication then gets partly stuck because _regionserver.null1.ts_ does not 
> exist on HDFS, and data may not be replicated to the slave because 
> _regionserver.null11.ts_ is not in the replication queue at startup.
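
The prefix collision described above can be reproduced outside HBase with a small sketch. Everything here is an assumption made for illustration: the class name, the `prefixOf` helper (a hypothetical stand-in for `getWALPrefixFromWALName` that just strips the trailing timestamp), and the simplified file names. The exact-prefix comparison is only one possible way to avoid the collision, not necessarily the fix that will be adopted for this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LatestPathsSketch {

    // Hypothetical stand-in for getWALPrefixFromWALName:
    // drops the trailing ".<timestamp>" from a simplified WAL name.
    static String prefixOf(String walName) {
        return walName.substring(0, walName.lastIndexOf('.'));
    }

    // Mirrors the quoted preLogRoll logic: substring match on the prefix.
    static void rollWithContains(List<String> latest, String newLog) {
        String prefix = prefixOf(newLog);
        Iterator<String> it = latest.iterator();
        while (it.hasNext()) {
            if (it.next().contains(prefix)) {
                it.remove();
                break;
            }
        }
        latest.add(newLog);
    }

    // Possible alternative: compare the full group prefix for equality.
    static void rollWithExactPrefix(List<String> latest, String newLog) {
        String prefix = prefixOf(newLog);
        Iterator<String> it = latest.iterator();
        while (it.hasNext()) {
            if (prefixOf(it.next()).equals(prefix)) {
                it.remove();
                break;
            }
        }
        latest.add(newLog);
    }

    public static void main(String[] args) {
        List<String> buggy = new ArrayList<>(
            Arrays.asList("regionserver.null11.100", "regionserver.null1.100"));
        // Group null1 rolls: "regionserver.null11.100" also contains
        // "regionserver.null1", so the wrong entry is removed.
        rollWithContains(buggy, "regionserver.null1.200");
        System.out.println(buggy); // [regionserver.null1.100, regionserver.null1.200]
        // Group null11 rolls: nothing matches, so the list grows.
        rollWithContains(buggy, "regionserver.null11.200");
        System.out.println(buggy.size()); // 3

        List<String> exact = new ArrayList<>(
            Arrays.asList("regionserver.null11.100", "regionserver.null1.100"));
        rollWithExactPrefix(exact, "regionserver.null1.200");
        System.out.println(exact); // [regionserver.null11.100, regionserver.null1.200]
    }
}
```

In the substring version, the list ends up holding a stale entry for group null1 and no entry for group null11 after the first roll, which matches the two symptoms in the report: a queued log that is missing on HDFS, and a group whose latest log is never enqueued.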



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
