We don't have to ship the edits one by one. We can use a configurable batch to control the impact on the network.
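As a rough illustration of the batching idea (plain Java, no HBase types; `EditBatcher`, `batch`, and the edit strings are invented names for this sketch, not anything in the codebase), the shipper would group pending WAL edits into fixed-size batches and issue one RPC per batch instead of one per edit:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: group pending WAL edits into configurable batches so the
 *  shipper issues one RPC per batch instead of one per edit. */
public class EditBatcher {

    /** Split {@code edits} into batches of at most {@code batchSize} edits. */
    public static List<List<String>> batch(List<String> edits, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < edits.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                edits.subList(i, Math.min(i + batchSize, edits.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> edits = List.of("e1", "e2", "e3", "e4", "e5");
        // batchSize = 2 -> three RPCs instead of five
        System.out.println(batch(edits, 2));  // [[e1, e2], [e3, e4], [e5]]
    }
}
```

A real shipper would also want a byte-size cap and a max-delay timer alongside the count-based `batchSize`, so a trickle of small edits still gets shipped promptly.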
On Tue, Dec 3, 2013 at 7:59 PM, Jimmy Xiang <[email protected]> wrote:

> A separate branch similar to that for snapshot is great. +1.
>
> For wal tailing, we can just skip those edits not for the shadow regions,
> right?
>
> To tail the wal, we need to wait till the wal block is available. There
> seems to be a hard latency. Is it better to have a pool of daemon threads
> to ship the corresponding wal edits directly? This way, we have better
> control over which edits to ship around. The shadow region will be much
> closer to the primary region. We don't need a queue for those edits not
> yet shipped; we can just use the memstore as the queue. Once the memstore
> is flushed, its content no longer needs to be shipped.
>
> On Tue, Dec 3, 2013 at 6:47 PM, Jonathan Hsieh <[email protected]> wrote:
>
>> On Tue, Dec 3, 2013 at 2:04 PM, Enis Söztutar <[email protected]> wrote:
>>
>>> On Tue, Dec 3, 2013 at 11:51 AM, Jonathan Hsieh <[email protected]> wrote:
>>>
>>>> On Tue, Dec 3, 2013 at 11:07 AM, Enis Söztutar <[email protected]> wrote:
>>>>
>>>>> Thanks Jon for bringing this to dev@.
>>>>>
>>>>> On Mon, Dec 2, 2013 at 10:01 PM, Jonathan Hsieh <[email protected]> wrote:
>>>>>
>>>>>> Fundamentally, I'd prefer focusing on making HBase "HBasier" instead
>>>>>> of tackling a feature that other systems architecturally can do
>>>>>> better (inconsistent reads). I consider consistent reads/writes to
>>>>>> be one of HBase's defining features. That said, I think read
>>>>>> replicas make sense and are a nice feature to have.
>>>>>
>>>>> Our design proposal has a specific use-case goal, and hopefully we
>>>>> can demonstrate the benefits of having this in HBase so that even
>>>>> more pieces can be built on top of it. Plus, I imagine this will be a
>>>>> widely used feature for read-only or bulk-loaded tables. We are not
>>>>> proposing to rework strong consistency semantics or make major
>>>>> architectural changes. I think having tables defined with a
>>>>> replication count, plus the proposed client API changes (the
>>>>> Consistency definition), plugs into the HBase model rather well.
>>>>
>>>> I do agree that without any recent-update mechanism, we are limiting
>>>> the usefulness of this feature to essentially *only* read-only or
>>>> bulk-load-only tables. Recency, if there were any edits/updates, would
>>>> be severely lagging (by default potentially an hour), especially in
>>>> cases where there are only a few edits to a primarily bulk-loaded
>>>> table. This limitation is not mentioned in the tradeoffs or
>>>> requirements (or a non-requirements section) and definitely should be
>>>> listed there.
>>>
>>> Obviously, the amount of lag you would observe depends on whether you
>>> are using "Region snapshots", "WAL tailing", or "Async wal
>>> replication". I think there are still use cases where you can live
>>> with >1-hour-old stale reads, so "Region snapshots" is not *just* for
>>> read-only tables. I'll add these to the tradeoffs section.
>>
>> Thanks for adding it there -- I really think it is a big headline caveat
>> on my expectation of "eventual consistency". Other systems out there
>> give you eventual consistency on the millisecond level in most cases,
>> while this initial implementation would mean being tens of minutes, or
>> at best a handful of minutes, behind (with the snapshot flush
>> mechanism)!
>>
>> There are a handful of other things in the phase-one part of the
>> implementation section that limit the usefulness of the feature to a
>> certain kind of constrained HBase user. I'll start another thread for
>> those.
>>
>>> We are proposing to implement "Region snapshots" first and "Async wal
>>> replication" second. As argued, I think wal tailing only makes sense
>>> with WALpr, so that work is left until after we have a WAL per region.
>>
>> This is our main disagreement -- I'm not convinced that wal tailing
>> only makes sense for the wal-per-region hlog implementation. Instead of
>> bouncing around hypotheticals, it sounds like I'll be doing more
>> experiments to prove it to myself and to convince you. :)
>>
>>>> With the current design it might be best to have a flag on the table
>>>> which marks it read-only or bulk-load-only, so that it only gets used
>>>> when the table is in that mode? (And maybe an "escape hatch" for
>>>> power users.)
>>>
>>> I think we have a read-only flag already. We might not have a
>>> bulk-load-only flag, though. It makes sense to add one if we want to
>>> allow bulk loads while preventing writes.
>>
>> Great.
>>
>>>> [snip]
>>>>
>>>>>> - I think the two goals are both worthy on their own, each with its
>>>>>> own optimal points. We should make sure in the design that we can
>>>>>> support both goals.
>>>>>
>>>>> I think our proposal is consistent with your doc, and we have
>>>>> considered secondary region promotion in the future section. It
>>>>> would be good if you can review and comment on whether you see any
>>>>> points missing.
>>>>
>>>> I definitely will. At the moment, I think the hybrid for the
>>>> wals/hlogs I suggested in the other thread seems to be an optimal
>>>> solution considering locality. Though feasible, it is obviously more
>>>> complex than just one approach alone.
>>>>
>>>>>> - I want to make sure the proposed design has a path for optimal
>>>>>> fast-consistent read-recovery.
>>>>>
>>>>> We think that it does, but it is a secondary goal for the initial
>>>>> work. I don't see any reason why secondary promotion cannot be built
>>>>> on top of this, once the branch is in a better state.
>>>>
>>>> Based on the detail in the design doc and this statement, it sounds
>>>> like you have a prototype branch already? Is this the case?
>>>
>>> Indeed. I think that is mentioned in the jira description. We have
>>> some parts of the changes for region, region server, HRI, and master.
>>> Client changes are on the way. I think we can post that in a github
>>> branch for now to share the code early and solicit early reviews.
>>
>> I think that would be great. Back when we did snapshots, we had active
>> development against a prototype and spent a bit of time breaking it
>> down into manageable, more polished pieces that got slightly lenient
>> reviews. This exercise really helped us with our interfaces. We
>> committed code to the dev branch, which limited merge pains and diffs
>> for modifications made by different contributors. In the end, when we
>> had something we were happy with on the dev branch, we merged with
>> trunk and fixed bugs/diffs that had cropped up in the meantime. I'd
>> suggest a similar process for this.
>>
>> --
>> // Jonathan Hsieh (shay)
>> // Software Engineer, Cloudera
>> // [email protected]
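For what it's worth, Jimmy's "use the memstore as the queue" idea above could be sketched roughly like this (plain Java; `ShadowRegionShipper` and its methods are invented names for illustration, and a real implementation would ship edits over RPC rather than just dropping them): unshipped edits live only in the memstore, a daemon shipper drains them in batches, and a flush simply discards the backlog because the flushed HFile now carries that content.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch of "the memstore is the queue": edits awaiting shipment to a
 *  shadow region are tracked only in the memstore; a flush discards the
 *  backlog because the flushed file carries that content instead. */
public class ShadowRegionShipper {
    private final Deque<String> memstore = new ArrayDeque<>();
    private int shippedCount = 0;

    /** A write lands in the memstore and is thereby queued for shipping. */
    public void append(String edit) {
        memstore.addLast(edit);
    }

    /** Daemon thread ships up to {@code max} queued edits; returns how many. */
    public int ship(int max) {
        int n = 0;
        while (n < max && !memstore.isEmpty()) {
            memstore.pollFirst();  // would be an RPC to the shadow region
            n++;
        }
        shippedCount += n;
        return n;
    }

    /** Flush: remaining content becomes an HFile, so nothing is left to ship. */
    public int flush() {
        int dropped = memstore.size();
        memstore.clear();
        return dropped;
    }

    public int shipped() {
        return shippedCount;
    }
}
```

The interesting path in the sketch is `flush()`: after a flush the shipper has nothing left to do, so no separate replication queue has to be maintained or persisted for the edits that were never shipped.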
