[jira] [Comment Edited] (HBASE-21856) Consider Causal Replication Ordering
[ https://issues.apache.org/jira/browse/HBASE-21856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933196#comment-16933196 ] Andrew Purtell edited comment on HBASE-21856 at 9/19/19 9:23 AM: - The sender is going to be providing batches in sequence to each consistent target. These batches may arrive out of order but the window will be small by definition because RPCs are in flight when the ordering can be indeterminate. But if we are having issues as described if we introduce a back pressure signal or retransmit indication that stops the sender, or causes it to (exponentially) back off, or causes it to resend a specific batch, this could be explored - but only if actually needed was (Author: apurtell): That's correct. But if we are having issues as described if we introduce a back pressure signal that stops the sender, or causes it to (exponentially) back off, this will be fine. > Consider Causal Replication Ordering > > > Key: HBASE-21856 > URL: https://issues.apache.org/jira/browse/HBASE-21856 > Project: HBase > Issue Type: Brainstorming > Components: Replication >Reporter: Lars Hofhansl >Priority: Major > Labels: Replication > > We've had various efforts to improve the ordering guarantees for HBase > replication, most notably Serial Replication. > I think in many cases guaranteeing a Total Replication Order is not required, > but a simpler Causal Replication Order is sufficient. > Specifically we would guarantee causal ordering for a single Rowkey. Any > changes to a Row - Puts, Deletes, etc - would be replicated in the exact > order in which they occurred in the source system. > Unlike total ordering this can be accomplished with only local region server > control. > I don't have a full design in mind, let's discuss here. It should be > sufficient to to the following: > # RegionServers only adopt the replication queues from other RegionServers > for regions they (now) own. This requires log splitting for replication. > # RegionServers ship all edits for queues adopted from other servers before > any of their "own" edits are shipped. > It's probably a bit more involved, but should be much cheaper that the total > ordering provided by serial replication. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-21856) Consider Causal Replication Ordering
[ https://issues.apache.org/jira/browse/HBASE-21856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924560#comment-16924560 ] Bin Shi edited comment on HBASE-21856 at 9/6/19 7:52 PM: - [~apurtell], what you said about "source side ordering" makes sense to me. I want to know more about why the source needs to ship edits of a region to the same destination in the sink, as described by "Sink side ordering". Currently, the source ships the batches (a batch contains edits of multiple regions) with multiple Replication RPCs in parallel, and each RPC is a synchronous call (waiting for replication response before sending another batch). On the sink side, when a region server receives a Replication RPC, it splits the batch into smalls batches per region, calls table.batch() synchronously to apply the batch to the region, then sends replication response to the source. If my understanding is correct, what we really miss here is that the source should ship batches from the same region sequentially, instead of making sure these batches landing on the same destination. With the former, since we only have single source now, we can preserve the order for a region in the presence of cross cluster replication, no matter the batches from the same region lands on the same destination or on the different destinations. Without the former, even the batches from the same region landing on the same destination, we still can't preserve the order. Please let me know if I miss anything. was (Author: bin shi): [~apurtell], what you said about "source side ordering" makes sense to me. I want to know more about why the source needs to ship edits of a region to the same destination in the sink, as described by "Sink side ordering". Currently, the source ships the batches (a batch contains edits of multiple regions) with multiple Replication RPC in parallel, and for each RPC, it is a synchronous call (waiting for replication response before sending another batch). On the sink side, when a region server receives a Replication RPC, it splits the batch into smalls batches per region, calls table.batch() synchronously to apply the batch to the region, then sends replication response to the source. If my understanding is correct, what we really miss here is that the source should ship batches from the same region sequentially, instead of making sure these batches landing on the same destination. With the former, since we only have single source now, we can preserve the order for a region in the presence of cross cluster replication, no matter the batches from the same region lands on the same destination or on the different destinations. Without the former, even the batches from the same region landing on the same destination, we still can't preserve the order. Please let me know if I miss something. > Consider Causal Replication Ordering > > > Key: HBASE-21856 > URL: https://issues.apache.org/jira/browse/HBASE-21856 > Project: HBase > Issue Type: Brainstorming > Components: Replication >Reporter: Lars Hofhansl >Priority: Major > Labels: Replication > > We've had various efforts to improve the ordering guarantees for HBase > replication, most notably Serial Replication. > I think in many cases guaranteeing a Total Replication Order is not required, > but a simpler Causal Replication Order is sufficient. > Specifically we would guarantee causal ordering for a single Rowkey. Any > changes to a Row - Puts, Deletes, etc - would be replicated in the exact > order in which they occurred in the source system. > Unlike total ordering this can be accomplished with only local region server > control. > I don't have a full design in mind, let's discuss here. It should be > sufficient to to the following: > # RegionServers only adopt the replication queues from other RegionServers > for regions they (now) own. This requires log splitting for replication. > # RegionServers ship all edits for queues adopted from other servers before > any of their "own" edits are shipped. > It's probably a bit more involved, but should be much cheaper that the total > ordering provided by serial replication. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Comment Edited] (HBASE-21856) Consider Causal Replication Ordering
[ https://issues.apache.org/jira/browse/HBASE-21856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16788236#comment-16788236 ] Andrew Purtell edited comment on HBASE-21856 at 3/8/19 7:41 PM: There are two aspects of this: 1: Source side ordering. Sources need to ensure that all/any logs which contain the edits for a given region and row are processed in order. Today we can have the region open on one server generating edits and a recovery queue from a crashed regionserver somewhere else, leading to interleaving. [~lhofhansl]'s suggestions in the issue description are a solution for this. A strawman ideal solution: Upon failure of a replication source, the replication queues are split into per region queues, just like WALs are split into per region recovered edits files. Each split-replication-queue is assigned to the current replication source handling the region's edits. The source will drain the recovery queue before shipping newer edits. 2. Sink side ordering. All of the work above is useless if sources then send batches of work containing the edits for a given region and row to more than one destination over at the sink. Who can say in what order the sinks will process the replication RPC? They could be processed out or order or in parallel. The sources should build replication batch RPCs by region (the code today does this in part) and should chose a constant destination for a region's batched edits with a consistent hash. (Multiple pending batches for the same remote regionserver can be merged as an optimization.) Given a set of sink endpoints and an encoded region name, always choose the same sink. This guarantees at most one live endpoint in the sink will be processing a region's edits, and so can apply the edits in a total order. was (Author: apurtell): There are two aspects of this: 1: Source side ordering. Sources need to ensure that all/any logs which contain the edits for a given region and row are processed in order. Today we can have the region open on one server generating edits and a recovery queue from a crashed regionserver somewhere else, leading to interleaving. [~lhofhansl]'s suggestions in the issue description are a solution for this. A strawman ideal solution: Replication queues are split into per region queues, just like WALs are split into per region recovered edits files. The new owner of the primary region takes ownership of the split-replication-queue and drains the queue before shipping newer edits. 2. Sink side ordering. All of the work above is useless if sources then send batches of work containing the edits for a given region and row to more than one destination over at the sink. Who can say in what order the sinks will process the replication RPC? They could be processed out or order or in parallel. The sources should build replication batch RPCs by region (the code today does this in part) and should chose a constant destination for a region's batched edits with a consistent hash. (Multiple pending batches for the same remote regionserver can be merged as an optimization.) Given a set of sink endpoints and an encoded region name, always choose the same sink. This guarantees at most one live endpoint in the sink will be processing a region's edits, and so can apply the edits in a total order. > Consider Causal Replication Ordering > > > Key: HBASE-21856 > URL: https://issues.apache.org/jira/browse/HBASE-21856 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Priority: Major > > We've had various efforts to improve the ordering guarantees for HBase > replication, most notably Serial Replication. > I think in many cases guaranteeing a Total Replication Order is not required, > but a simpler Causal Replication Order is sufficient. > Specifically we would guarantee causal ordering for a single Rowkey. Any > changes to a Row - Puts, Deletes, etc - would be replicated in the exact > order in which they occurred in the source system. > Unlike total ordering this can be accomplished with only local region server > control. > I don't have a full design in mind, let's discuss here. It should be > sufficient to to the following: > # RegionServers only adopt the replication queues from other RegionServers > for regions they (now) own. This requires log splitting for replication. > # RegionServers ship all edits for queues adopted from other servers before > any of their "own" edits are shipped. > It's probably a bit more involved, but should be much cheaper that the total > ordering provided by serial replication. -- This message was sent by Atlassian JIRA (v7.6.3#76005)