[ https://issues.apache.org/jira/browse/HBASE-26913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628658#comment-17628658 ]
Duo Zhang commented on HBASE-26913:
-----------------------------------

Oh, [~vjasani] so you merged the PR with squash commits? I suppose we should merge them while keeping the commits of the sub tasks...

> Replication Observability Framework
> -----------------------------------
>
>                 Key: HBASE-26913
>                 URL: https://issues.apache.org/jira/browse/HBASE-26913
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Replication
>            Reporter: Rushabh Shah
>            Assignee: Rushabh Shah
>            Priority: Major
>             Fix For: 2.6.0, 3.0.0-alpha-4, 2.5.2
>
>
> In our production clusters, we have seen cases where data is present in the
> source cluster but not in the sink cluster, and 1 case where data is present
> in the sink cluster but not in the source cluster.
> We have internal tools with which we take an incremental backup every day on
> both source and sink clusters and compare the hash of the data in both
> backups. We have seen many cases where the hash doesn't match, which means
> the data is not consistent between source and sink for that given day. The
> Mean Time To Detect (MTTD) for these inconsistencies is at least 2 days, and
> detection requires a lot of manual debugging.
> We need a tool that reduces MTTD and requires less manual debugging.
> I have attached the design doc. Huge thanks to [~bharathv] for coming up
> with this design at my workplace.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)