The issue is about moving replication queue storage from zookeeper to an
hbase table. This is the last piece of persistent data on zookeeper, so
after this feature is merged we can finally say that all data on zookeeper
can be removed when restarting a cluster.

Let me paste the release note here:

> We introduced table based replication queue storage in this issue. The
> queue data will be stored in the hbase:replication table. This is the
> last piece of persistent data on zookeeper, so after this change we are
> OK to clean up all the data on zookeeper, as it is now all transient; a
> cluster restart can fix everything.
>
> The data structure has been changed a bit: we now only store an offset
> for each WAL group instead of storing all the WAL files for a WAL group.
> Please see the replication internals section in our ref guide for more
> details.
>
> There was a cyclic dependency issue: creating a new WAL writer used to
> require writing to the replication queue storage first, but with table
> based replication queue storage you first need a WAL writer when you want
> to update the table. To break the cycle, we no longer record a queue when
> creating a new WAL writer instance. The downside of this change is that
> the logic for claiming queues and for the WAL cleaner is much more
> complicated. See AssignReplicationQueuesProcedure and
> ReplicationLogCleaner for more details if you are interested.
>
> Notice that we will use a separate WAL provider for the hbase:replication
> table, so you will see a new WAL file for the region server which holds
> the hbase:replication table. If we did not do this, an update to the
> hbase:replication table would also generate WAL edits in a WAL file that
> replication needs to track, which would then lead to more updates to the
> hbase:replication table since the replication offset has advanced. In
> this way we would generate a lot of garbage in the WAL file even if
> nothing is written to the cluster, so a separate WAL provider which is
> not tracked by replication is necessary here.
>
> The data migration will be done automatically during a rolling upgrade.
> Migration via a full cluster restart is also supported, but please make
> sure you restart the master with the new code first. The replication
> peers will be disabled during the migration and no queue claiming will be
> scheduled at the same time, so you may see a lot of unfinished SCPs
> during the migration. Do not worry, this will not block normal failover
> and all regions will still be assigned. The replication peers will be
> enabled again after the migration is done; no manual operations are
> needed.
>
> The ReplicationSyncUp tool is also affected. The goal of this tool is to
> replicate data to a peer cluster while the source cluster is down, but if
> the replication queue data is stored in an hbase table, there is no way
> to get the newest data while the source cluster is down. So we choose to
> read the region directory directly, load all the replication queue data
> into memory, and then do the sync up work. Since we may miss the newest
> offsets we may replicate more data than strictly necessary, but this does
> not affect correctness.
>
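
To make the offset-per-WAL-group change in the note a bit more concrete,
here is a minimal sketch in plain Java. The type and field names are made
up for illustration and are not the actual HBase classes.

  import java.util.List;
  import java.util.Map;

  // Hypothetical types for illustration only, not the real HBase classes.
  public class QueueStorageSketch {

    // Old zookeeper based model: for each WAL group, every WAL file that
    // still had data to replicate was listed in the queue.
    Map<String, List<String>> oldWalsPerGroup;

    // New table based model: for each WAL group, only a single offset is
    // stored (the current WAL file plus a position in it); everything
    // before that offset is considered already replicated.
    static final class WalGroupOffset {
      String walFile;
      long position;
    }

    Map<String, WalGroupOffset> newOffsetPerGroup;
  }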
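
The cyclic dependency described in the note can be sketched as mutual
recursion. Again, the names are hypothetical and this is not the real
implementation; it only shows the shape of the cycle.

  // Hypothetical sketch of the cycle; not the actual implementation.
  public class CyclicDependencySketch {

    static class WalWriter {
      WalWriter(String walGroup) {
      }
    }

    WalWriter createWalWriter(String walGroup) {
      // Old behaviour: record the new WAL in the queue storage first...
      recordQueue(walGroup);
      return new WalWriter(walGroup);
    }

    void recordQueue(String walGroup) {
      // ...but with table based queue storage this is a put to the
      // hbase:replication table, which needs a WAL writer of its own and
      // would call createWalWriter() again, i.e. a cycle. That is why the
      // new code simply does not record a queue when a WAL writer is
      // created, at the cost of more complex queue claiming and log
      // cleaning logic.
      createWalWriter("hbase:replication");
    }
  }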
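
The reasoning behind the separate WAL provider can also be shown with a
small, purely illustrative simulation (hypothetical code, not HBase
internals): if offset updates for hbase:replication were written to a WAL
that replication itself tracks, every update would produce another edit to
replicate.

  // Hypothetical numbers game, not HBase code: with the queue table on a
  // replicated WAL, every offset update appends one more edit that must
  // be replicated, so the WAL never drains even with zero client writes.
  public class WalFeedbackSketch {
    public static void main(String[] args) {
      long pendingEdits = 1;  // one real client edit sits in a tracked WAL
      long offsetUpdates = 0;
      while (pendingEdits > 0 && offsetUpdates < 10) {
        pendingEdits--;       // replicate the edit...
        offsetUpdates++;      // ...advance the offset in hbase:replication...
        pendingEdits++;       // ...which itself appends a new tracked edit
      }
      // Without the cap the loop would never end; with a separate WAL
      // provider that replication does not track, the last step disappears.
      System.out.println("offset updates for one client edit: " + offsetUpdates);
    }
  }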
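
And for the ReplicationSyncUp part, a hedged sketch of how the tool is
usually launched; this assumes it is still a Hadoop Tool run through
ToolRunner as in earlier releases, and the launcher class below is made up.

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.replication.regionserver.ReplicationSyncUp;
  import org.apache.hadoop.util.ToolRunner;

  // Assumption: ReplicationSyncUp is still a Hadoop Tool. It is usually
  // run from the shell instead, e.g.
  //   hbase org.apache.hadoop.hbase.replication.regionserver.ReplicationSyncUp
  public class SyncUpLauncher {
    public static void main(String[] args) throws Exception {
      int ret = ToolRunner.run(HBaseConfiguration.create(),
        new ReplicationSyncUp(), args);
      System.exit(ret);
    }
  }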

The nightly job is here:

https://ci-hbase.apache.org/job/HBase%20Nightly/job/HBASE-27109%252Ftable_based_rqs/

The results are mostly fine; the failed UTs are unrelated and flaky. For
example, in build #73 the failed UT is TestAdmin1.testCompactionTimestamps,
which is not related to replication and only failed in the jdk11 build
while passing in the jdk8 build.

This is the PR against the master branch.

https://github.com/apache/hbase/pull/5202

The PR is big, as we have 16 commits on the feature branch.

The VOTE will be open for at least 72 hours.

[+1] Agree
[+0] Neutral
[-1] Disagree (please include actionable feedback)

Thanks.
