[jira] [Resolved] (HBASE-28037) Replication stuck after switching to new WAL but the queue is empty

2023-09-27 Thread Xiaolin Ha (Jira)


[ https://issues.apache.org/jira/browse/HBASE-28037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaolin Ha resolved HBASE-28037.

Resolution: Fixed

Merged to branch-2.4 and branch-2.5. Thanks [~zhangduo] for reviewing.

> Replication stuck after switching to new WAL but the queue is empty
> ---
>
> Key: HBASE-28037
> URL: https://issues.apache.org/jira/browse/HBASE-28037
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 3.0.0-alpha-4, 2.5.5
> Reporter: Xiaolin Ha
> Assignee: Xiaolin Ha
> Priority: Blocker
> Fix For: 2.4.18, 2.5.6
>
>
> When replication WALs are consumed quickly and something goes wrong while
> creating a new WAL, the replication source reader may switch to the next WAL
> in the queue before that WAL has been created. Replication then gets stuck,
> because the reader can no longer consume the new WALs that arrive afterwards.
> Restarting the RegionServer on which replication is stuck makes replication
> recover.
>  
>  
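The window described above can be modelled with a small, single-threaded Java sketch. Everything in it is hypothetical: `WalSwitchRace`, `readerStep` and the WAL names are illustrative and are not HBase classes, and the rule "an empty queue means the reader is done" is an assumption standing in for the reader behaviour described here and in HBASE-28114.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical, single-threaded model of the window described in HBASE-28037.
 * None of these names are HBase classes; they only illustrate the ordering.
 */
public class WalSwitchRace {

  /** WAL names in the replication queue, in roll order. */
  private final Deque<String> queue = new ArrayDeque<>();
  /** WALs the reader has shipped so far. */
  private final List<String> shipped = new ArrayList<>();
  private boolean readerRunning = true;

  /** Reader step: ship the head WAL, or give up if the queue is empty (the problematic rule). */
  void readerStep() {
    if (!readerRunning) {
      return;
    }
    String wal = queue.poll();
    if (wal == null) {
      // An empty queue is treated as "nothing left to do", so the reader stops for good.
      readerRunning = false;
      return;
    }
    shipped.add(wal);
  }

  /** A log roll split into two phases: the old writer is closed, the new WAL is enqueued later. */
  public static List<String> simulate() {
    WalSwitchRace r = new WalSwitchRace();
    r.queue.add("wal.1");
    r.readerStep();       // a fast reader ships wal.1; the queue is now empty
    // Roll in progress: wal.1 is closed, but wal.2 has not been enqueued yet.
    r.readerStep();       // the reader sees an empty queue and stops
    r.queue.add("wal.2"); // the new WAL is finally enqueued
    r.readerStep();       // no effect: the reader is no longer running
    return r.shipped;
  }

  public static void main(String[] args) {
    System.out.println(simulate()); // prints [wal.1]; wal.2 is never shipped
  }
}
```

In this model only a fresh reader (the analogue of restarting the RegionServer) would pick up `wal.2`.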



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28047) Deadlock when opening mob files

2023-09-27 Thread Xiaolin Ha (Jira)


[ https://issues.apache.org/jira/browse/HBASE-28047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaolin Ha resolved HBASE-28047.

Fix Version/s: 2.6.0, 2.4.18, 2.5.6, 3.0.0-beta-1
Resolution: Fixed

Merged to master and branch-2+. Thanks [~zhangduo] for reviewing.

> Deadlock when opening mob files
> ---
>
> Key: HBASE-28047
> URL: https://issues.apache.org/jira/browse/HBASE-28047
> Project: HBase
> Issue Type: Bug
> Components: mob
> Affects Versions: 3.0.0-alpha-4, 2.5.5
> Reporter: Xiaolin Ha
> Assignee: Xiaolin Ha
> Priority: Major
> Fix For: 2.6.0, 2.4.18, 2.5.6, 3.0.0-beta-1
>
> Attachments: mobdeadlock.js
>
>
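> MobFileCache locks cached MOB files by the hash code of the MOB file name,
> but different file names can share the same hash code, and IdLock is not
> reentrant. So when opening a file that is not yet cached forces an already
> opened file to be evicted by LRU, and the two file names have the same hash
> code, the eviction tries to take the IdLock entry that the opening thread
> already holds, and the thread deadlocks.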
> [^mobdeadlock.js]
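A small Java model of the collision. `HashKeyedLockDemo` is hypothetical: it only mimics non-reentrant, id-keyed locking as described above and is not the real `IdLock` API. A timed `tryAcquire` stands in for a blocking acquire so the self-deadlock shows up as `false` instead of hanging. The file names are made up; `"mobAa"` and `"mobBB"` are chosen because Java's `String.hashCode()` maps them to the same value.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a non-reentrant per-id lock (not HBase's IdLock API). */
public class HashKeyedLockDemo {

  private final ConcurrentHashMap<Long, Semaphore> locks = new ConcurrentHashMap<>();

  /** Try to lock an id; a one-permit Semaphore is not reentrant, even for the same thread. */
  boolean tryLock(long id, long timeoutMs) {
    Semaphore s = locks.computeIfAbsent(id, k -> new Semaphore(1));
    try {
      return s.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  void unlock(long id) {
    locks.get(id).release();
  }

  /**
   * Open fileToOpen while evicting fileToEvict, locking both by name hash code.
   * Returns false when the eviction lock would have blocked forever.
   */
  public static boolean openWithEviction(String fileToOpen, String fileToEvict) {
    HashKeyedLockDemo lock = new HashKeyedLockDemo();
    long openId = fileToOpen.hashCode();
    long evictId = fileToEvict.hashCode();
    lock.tryLock(openId, 100);                         // held while the new file is opened
    boolean gotEvictLock = lock.tryLock(evictId, 100); // same id means waiting on ourselves
    if (gotEvictLock) {
      lock.unlock(evictId);
    }
    lock.unlock(openId);
    return gotEvictLock;
  }

  public static void main(String[] args) {
    System.out.println(openWithEviction("mobAa", "mobCC")); // true: different hash codes
    System.out.println(openWithEviction("mobAa", "mobBB")); // false: same hash code, self-deadlock
  }
}
```

In this model, keying the lock on the full file name, or using a reentrant lock, removes the collision; the description above does not say which approach the actual fix takes.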





[jira] [Created] (HBASE-28116) Move snapshot storage from filesystem to a separated HBase table

2023-09-27 Thread ruanhui (Jira)
ruanhui created HBASE-28116:
---

Summary: Move snapshot storage from filesystem to a separated HBase table
Key: HBASE-28116
URL: https://issues.apache.org/jira/browse/HBASE-28116
Project: HBase
Issue Type: New Feature
Components: snapshots
Reporter: ruanhui


Rename and list are very expensive operations on object storage, and snapshots
in HBase currently rely on both. When taking a snapshot, we first write the
snapshot description and data manifest files to a temporary directory, then
commit them with a rename. When listing all snapshots, we scan the snapshot
directory to find all completed snapshots.
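The rename-based commit and directory-scan listing can be sketched on a local filesystem with `java.nio.file`. This is a hypothetical model, not HBase's snapshot code: the class name, the `.tmp` directory, and the `description`/`manifest` file names are assumptions. It shows why a snapshot depends on exactly the two operations (a directory rename and a directory listing) that are expensive on object storage.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical local model of "write to a temp dir, commit by rename, list by scanning". */
public class RenameCommitSnapshots {

  private static final String TMP_DIR = ".tmp"; // assumed name of the in-progress area

  /** Write the description and manifest into a temporary directory, then publish it with one rename. */
  public static void takeSnapshot(Path root, String name) {
    try {
      Path tmp = root.resolve(TMP_DIR).resolve(name);
      Files.createDirectories(tmp);
      Files.write(tmp.resolve("description"), name.getBytes(StandardCharsets.UTF_8));
      Files.write(tmp.resolve("manifest"), new byte[0]);
      // Commit point: the snapshot becomes visible under root only once the rename succeeds.
      Files.move(tmp, root.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Find completed snapshots by listing the root directory and skipping the temporary area. */
  public static List<String> listSnapshots(Path root) {
    try (Stream<Path> entries = Files.list(root)) {
      return entries.filter(Files::isDirectory)
          .map(p -> p.getFileName().toString())
          .filter(n -> !n.equals(TMP_DIR))
          .sorted()
          .collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Take two snapshots in a fresh temporary directory and list them. */
  public static List<String> demo() {
    try {
      Path root = Files.createTempDirectory("snapshots");
      takeSnapshot(root, "snap1");
      takeSnapshot(root, "snap2");
      return listSnapshots(root);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [snap1, snap2]
  }
}
```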

So we could introduce a new snapshot storage that keeps snapshots in an HBase
table. This could bring a few benefits:
1. It makes HBase easier to deploy on object storage such as S3.
2. It makes snapshots faster and more lightweight. In the current
filesystem-based implementation, consolidating the snapshot manifest means
listing all region manifests with a thread pool, reading their contents, and
then deleting them. When the number of regions is large, this can take a long
time. By comparison, reading and writing an HBase table is more lightweight
than reading and writing HDFS files.
3. It is likely to reduce the number of small files on HDFS.





[jira] [Created] (HBASE-28115) Deal with the replication queue empty problem for sync replication

2023-09-27 Thread Duo Zhang (Jira)
Duo Zhang created HBASE-28115:
-

Summary: Deal with the replication queue empty problem for sync replication
Key: HBASE-28115
URL: https://issues.apache.org/jira/browse/HBASE-28115
Project: HBase
Issue Type: Bug
Components: Replication
Reporter: Duo Zhang
Fix For: 3.0.0-beta-1


For sync replication, when the peer state changes from ACTIVE to
DOWNGRADE_ACTIVE or STANDBY, the special replication group (which contains a
timestamp) can also be empty, and this is acceptable.

We should handle this case too.





[jira] [Created] (HBASE-28114) Replication log reader should not simply quit when queue is empty

2023-09-27 Thread Duo Zhang (Jira)
Duo Zhang created HBASE-28114:
-

Summary: Replication log reader should not simply quit when queue is empty
Key: HBASE-28114
URL: https://issues.apache.org/jira/browse/HBASE-28114
Project: HBase
Issue Type: Bug
Components: Replication
Reporter: Duo Zhang
Assignee: Duo Zhang
Fix For: 2.6.0, 3.0.0-beta-1


In HBASE-28037, [~Xiaolin Ha] found that there is a very small window in which
the queue of even a normal replication source can be empty.

This is because we only enqueue the new WAL file in postLogRoll, where the old
WAL writer has already been closed. If replication is fast enough, the reader
can reach the end of the queue before the new WAL file is enqueued.

The code for branch-2+ has been heavily refactored, so we opened a new issue to
fix this.
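The idea in the summary, that an empty queue should mean "wait for the next WAL" rather than "stop", can be sketched with a `BlockingQueue`. `WaitingWalReader`, the 50 ms poll interval, and the 200 ms delay standing in for the window between closing the old writer and enqueuing the new WAL in postLogRoll are all assumptions for illustration; this is not the code of the actual patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical reader loop in which an empty queue means "wait", not "done". */
public class WaitingWalReader {

  /**
   * Ship WALs until `expected` have been shipped. `expected` is a stand-in for a
   * real shutdown signal; a production loop would check a running flag instead.
   */
  public static List<String> readAll(BlockingQueue<String> queue, int expected)
      throws InterruptedException {
    List<String> shipped = new ArrayList<>();
    while (shipped.size() < expected) {
      String wal = queue.poll(50, TimeUnit.MILLISECONDS);
      if (wal == null) {
        continue; // queue empty: the next WAL may simply not be enqueued yet, so keep waiting
      }
      shipped.add(wal);
    }
    return shipped;
  }

  /** wal.2 is enqueued late, after the reader has already drained the queue. */
  public static List<String> demo() {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    queue.add("wal.1");
    Thread roller = new Thread(() -> {
      try {
        Thread.sleep(200);  // the window: old writer closed, new WAL not enqueued yet
        queue.add("wal.2"); // the late enqueue
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    roller.start();
    try {
      List<String> shipped = readAll(queue, 2);
      roller.join();
      return shipped;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [wal.1, wal.2]
  }
}
```

Compared with a reader that exits on the first empty poll, this loop survives the window and still ships `wal.2`.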


