[ 
https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550230#comment-17550230
 ] 

Duo Zhang commented on HBASE-15867:
-----------------------------------

Oh, when trying to implement a prototype for ReplicationLogCleaner, I found 
that it is not as easy as expected.

The basic idea of the solution proposed above is to get the WAL group of a WAL 
file and check whether the file is before or after the replication offset for 
that group, to determine whether we can delete it. If there is no offset for 
the group, we keep the file.
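
A minimal sketch of that basic check, under stated assumptions: the class and 
method names are hypothetical, a WAL file name is assumed to look like 
"<group>.<timestamp>", and the offset for a group is reduced to the name of the 
WAL file currently being replicated. It is an illustration only, not the 
ReplicationLogCleaner code.

```java
import java.util.Map;

// Hypothetical sketch, not HBase code.
final class WalGroupCheck {

    // Assumption: a WAL file name is "<group>.<timestamp>".
    static String groupOf(String walName) {
        return walName.substring(0, walName.lastIndexOf('.'));
    }

    static long timestampOf(String walName) {
        return Long.parseLong(walName.substring(walName.lastIndexOf('.') + 1));
    }

    // offsets maps a WAL group to the WAL file currently being replicated
    // for that group (assumed shape of the stored replication offset).
    static boolean canDelete(String walName, Map<String, String> offsets) {
        String current = offsets.get(groupOf(walName));
        if (current == null) {
            return false; // no offset for the group: keep the file
        }
        // Only files strictly older than the file at the offset are deletable.
        return timestampOf(walName) < timestampOf(current);
    }
}
```

With an offset pointing at "rs1.200", the file "rs1.100" would be deletable, 
while "rs1.200" itself and any file of a group without an offset would be kept.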

There are basically two problems:

1. Every peer has its own queue, so it is not a simple 'no offset for the 
group' check. We need to know whether a queue is missing for any peer; if so, 
we should not delete the file.
2. For a recovered replication queue, we delete the queue once we finish 
replicating all the remaining WAL files. So for a dead region server, if there 
is no queue for a WAL file, it usually means we could delete the file, not that 
we 'should not delete it'.

Anyway, I think it is still possible to implement the cleaner logic, as we can 
know all the replication peers, and we can also know whether a region server is 
dead. But the timing will be more complicated, as we need to get information 
from different places, and a race between those reads may cause us to make a 
wrong decision on whether to delete a file.
I will think more about whether we could have a simpler solution.
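
To make the per-peer and dead-server cases concrete, here is a rough sketch of 
the decision. Everything in it is an assumption for illustration: the class and 
method names are hypothetical, WAL file names are assumed to end in 
".<timestamp>", and the peer list, per-peer offsets, and server liveness are 
passed in as one consistent snapshot, which is exactly the part that is hard to 
get in practice. It does not address the race mentioned above.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not HBase code.
final class ReplicationCleanerSketch {

    // Assumption: a WAL file name ends in ".<timestamp>".
    static long timestampOf(String walName) {
        return Long.parseLong(walName.substring(walName.lastIndexOf('.') + 1));
    }

    /**
     * @param walName       the WAL file being considered for deletion
     * @param peers         all known replication peer ids
     * @param offsetsByPeer peer id -> WAL file currently being replicated for
     *                      this file's group; a peer with no entry has no queue
     * @param serverDead    whether the region server that wrote the file is dead
     */
    static boolean canDelete(String walName, Set<String> peers,
                             Map<String, String> offsetsByPeer, boolean serverDead) {
        for (String peer : peers) {
            String current = offsetsByPeer.get(peer);
            if (current == null) {
                if (serverDead) {
                    // Recovered queue is removed once it has been fully
                    // replicated, so a missing queue does not block deletion.
                    continue;
                }
                return false; // live server: a missing queue means keep
            }
            if (timestampOf(walName) >= timestampOf(current)) {
                return false; // this peer has not moved past the file yet
            }
        }
        return true;
    }
}
```

The same missing queue leads to opposite answers depending on whether the 
server is dead, which is why the liveness check and the queue lookup would have 
to observe a consistent state.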

Thanks.

> Move HBase replication tracking from ZooKeeper to HBase
> -------------------------------------------------------
>
>                 Key: HBASE-15867
>                 URL: https://issues.apache.org/jira/browse/HBASE-15867
>             Project: HBase
>          Issue Type: New Feature
>          Components: Replication
>    Affects Versions: 2.1.0
>            Reporter: Joseph
>            Assignee: Zheng Hu
>            Priority: Major
>
> Move the WAL file and offset tracking out of ZooKeeper and into an HBase 
> table called hbase:replication. 
> The three largest changes will be the new classes ReplicationTableBase, 
> TableBasedReplicationQueues, and TableBasedReplicationQueuesClient. As of now 
> ReplicationPeers and HFileRef's tracking will not be implemented. Subtasks 
> have been filed for these two jobs.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
