We don't have to ship the edits one by one. We can use a configurable batch to control the impact on the network.
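As a rough illustration of the batching idea (plain Java, no HBase types; `EditBatcher`, `batch`, and the edit strings are invented names for this sketch, not anything in the codebase), the shipper would group pending WAL edits into fixed-size batches and issue one RPC per batch instead of one per edit:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: group pending WAL edits into configurable batches so the
 *  shipper issues one RPC per batch instead of one per edit. */
public class EditBatcher {

    /** Split {@code edits} into batches of at most {@code batchSize} edits. */
    public static List<List<String>> batch(List<String> edits, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < edits.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                edits.subList(i, Math.min(i + batchSize, edits.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> edits = List.of("e1", "e2", "e3", "e4", "e5");
        // batchSize = 2 -> three RPCs instead of five
        System.out.println(batch(edits, 2));  // [[e1, e2], [e3, e4], [e5]]
    }
}
```

A real shipper would also want a byte-size cap and a max-delay timer alongside the count-based `batchSize`, so a trickle of small edits still gets shipped promptly.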
On Tue, Dec 3, 2013 at 7:59 PM, Jimmy Xiang <[email protected]> wrote:

> A separate branch similar to that for snapshot is great. +1.
>
> For wal tailing, we can just skip those edits not for the shadow regions,
> right?
>
> To tail the wal, we need to wait till the wal block is available. There
> seems to be a hard latency. Is it better to have a pool of daemon threads
> to ship the corresponding wal edits directly? This way, we have better
> control over which edits to ship around. The shadow region will be much
> closer to the primary region. We don't need a queue for those edits not
> yet shipped; we can just use the memstore as the queue. Once the memstore
> is flushed, its content no longer needs to be shipped.
>
> On Tue, Dec 3, 2013 at 6:47 PM, Jonathan Hsieh <[email protected]> wrote:
>
>> On Tue, Dec 3, 2013 at 2:04 PM, Enis Söztutar <[email protected]> wrote:
>>
>>> On Tue, Dec 3, 2013 at 11:51 AM, Jonathan Hsieh <[email protected]> wrote:
>>>
>>>> On Tue, Dec 3, 2013 at 11:07 AM, Enis Söztutar <[email protected]> wrote:
>>>>
>>>>> Thanks Jon for bringing this to dev@.
>>>>>
>>>>> On Mon, Dec 2, 2013 at 10:01 PM, Jonathan Hsieh <[email protected]> wrote:
>>>>>
>>>>>> Fundamentally, I'd prefer focusing on making HBase "HBasier" instead
>>>>>> of tackling a feature that other systems architecturally can do
>>>>>> better (inconsistent reads). I consider consistent reads/writes to
>>>>>> be one of HBase's defining features. That said, I think read
>>>>>> replicas make sense and are a nice feature to have.
>>>>>
>>>>> Our design proposal has a specific use-case goal, and hopefully we
>>>>> can demonstrate the benefits of having this in HBase so that even
>>>>> more pieces can be built on top of it. Plus, I imagine this will be a
>>>>> widely used feature for read-only or bulk-loaded tables. We are not
>>>>> proposing to rework strong consistency semantics or make major
>>>>> architectural changes. I think having tables defined with a
>>>>> replication count, plus the proposed client API changes (the
>>>>> Consistency definition), plugs into the HBase model rather well.
>>>>
>>>> I do agree that without any recent-update mechanism, we are limiting
>>>> the usefulness of this feature to essentially *only* read-only or
>>>> bulk-load-only tables. Recency, if there were any edits/updates, would
>>>> be severely lagging (by default potentially an hour), especially in
>>>> cases where there are only a few edits to a primarily bulk-loaded
>>>> table. This limitation is not mentioned in the tradeoffs or
>>>> requirements (or a non-requirements section) and definitely should be
>>>> listed there.
>>>
>>> Obviously, the amount of lag you would observe depends on whether you
>>> are using "Region snapshots", "WAL tailing", or "Async wal
>>> replication". I think there are still use cases where you can live
>>> with >1-hour-old stale reads, so "Region snapshots" is not *just* for
>>> read-only tables. I'll add these to the tradeoffs section.
>>
>> Thanks for adding it there -- I really think it is a big headline caveat
>> on my expectation of "eventual consistency". Other systems out there
>> give you eventual consistency on the millisecond level in most cases,
>> while this initial implementation would mean being tens of minutes, or
>> at best a handful of minutes, behind (with the snapshot flush
>> mechanism)!
>>
>> There are a handful of other things in the phase-one part of the
>> implementation section that limit the usefulness of the feature to a
>> certain kind of constrained HBase user. I'll start another thread for
>> those.
>>
>>> We are proposing to implement "Region snapshots" first and "Async wal
>>> replication" second. As argued, I think wal tailing only makes sense
>>> with WALpr, so that work is left until after we have a WAL per region.
>>
>> This is our main disagreement -- I'm not convinced that wal tailing
>> only makes sense for the wal-per-region hlog implementation. Instead of
>> bouncing around hypotheticals, it sounds like I'll be doing more
>> experiments to prove it to myself and to convince you. :)
>>
>>>> With the current design it might be best to have a flag on the table
>>>> which marks it read-only or bulk-load-only, so that it only gets used
>>>> when the table is in that mode? (And maybe an "escape hatch" for
>>>> power users.)
>>>
>>> I think we have a read-only flag already. We might not have a
>>> bulk-load-only flag, though. It makes sense to add one if we want to
>>> allow bulk loads while preventing writes.
>>
>> Great.
>>
>>>> [snip]
>>>>
>>>>>> - I think the two goals are both worthy on their own, each with its
>>>>>> own optimal points. We should make sure in the design that we can
>>>>>> support both goals.
>>>>>
>>>>> I think our proposal is consistent with your doc, and we have
>>>>> considered secondary region promotion in the future section. It
>>>>> would be good if you can review and comment on whether you see any
>>>>> points missing.
>>>>
>>>> I definitely will. At the moment, I think the hybrid for the
>>>> wals/hlogs I suggested in the other thread seems to be an optimal
>>>> solution considering locality. Though feasible, it is obviously more
>>>> complex than just one approach alone.
>>>>
>>>>>> - I want to make sure the proposed design has a path for optimal
>>>>>> fast-consistent read-recovery.
>>>>>
>>>>> We think that it does, but it is a secondary goal for the initial
>>>>> work. I don't see any reason why secondary promotion cannot be built
>>>>> on top of this, once the branch is in a better state.
>>>>
>>>> Based on the detail in the design doc and this statement, it sounds
>>>> like you have a prototype branch already? Is this the case?
>>>
>>> Indeed. I think that is mentioned in the jira description. We have
>>> some parts of the changes for region, region server, HRI, and master.
>>> Client changes are on the way. I think we can post that in a github
>>> branch for now to share the code early and solicit early reviews.
>>
>> I think that would be great. Back when we did snapshots, we had active
>> development against a prototype and spent a bit of time breaking it
>> down into manageable, more polished pieces that got slightly lenient
>> reviews. This exercise really helped us with our interfaces. We
>> committed code to the dev branch, which limited merge pains and diffs
>> for modifications made by different contributors. In the end, when we
>> had something we were happy with on the dev branch, we merged with
>> trunk and fixed bugs/diffs that had cropped up in the meantime. I'd
>> suggest a similar process for this.
>>
>> --
>> // Jonathan Hsieh (shay)
>> // Software Engineer, Cloudera
>> // [email protected]
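For what it's worth, Jimmy's "use the memstore as the queue" idea above could be sketched roughly like this (plain Java; `ShadowRegionShipper` and its methods are invented names for illustration, and a real implementation would ship edits over RPC rather than just dropping them): unshipped edits live only in the memstore, a daemon shipper drains them in batches, and a flush simply discards the backlog because the flushed HFile now carries that content.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch of "the memstore is the queue": edits awaiting shipment to a
 *  shadow region are tracked only in the memstore; a flush discards the
 *  backlog because the flushed file carries that content instead. */
public class ShadowRegionShipper {
    private final Deque<String> memstore = new ArrayDeque<>();
    private int shippedCount = 0;

    /** A write lands in the memstore and is thereby queued for shipping. */
    public void append(String edit) {
        memstore.addLast(edit);
    }

    /** Daemon thread ships up to {@code max} queued edits; returns how many. */
    public int ship(int max) {
        int n = 0;
        while (n < max && !memstore.isEmpty()) {
            memstore.pollFirst();  // would be an RPC to the shadow region
            n++;
        }
        shippedCount += n;
        return n;
    }

    /** Flush: remaining content becomes an HFile, so nothing is left to ship. */
    public int flush() {
        int dropped = memstore.size();
        memstore.clear();
        return dropped;
    }

    public int shipped() {
        return shippedCount;
    }
}
```

The interesting path in the sketch is `flush()`: after a flush the shipper has nothing left to do, so no separate replication queue has to be maintained or persisted for the edits that were never shipped.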
