We are also proposing to implement HBASE-7509 as a part of this major undertaking. HBASE-7509 will help HBase in general (even if you are not using HBASE-10070), and possibly some other HDFS clients as well. HBASE-10070 will give you similar benefits to HBASE-7509 if your use case needs them, but at the HBase layer, which will sit on top of HBASE-7509.
Enis

On Sat, Dec 7, 2013 at 5:39 AM, 谢良 <[email protected]> wrote:

> Regarding one advantage of this design (the ability to do low-latency reads, with
> <20ms 99.9th-percentile latencies for stale reads): I still prefer the HBASE-7509
> solution, since if you want to guarantee similarly high read performance in the
> shadow regions, you must let the shadow RS warm the related hot blocks up into its
> block cache. (Indeed, I have a similar worry to Vladimir's.) I have tried to think of
> how this design could beat HBASE-7509 at cutting the latency tail, but have found
> no answer yet.
>
> Enis, could you share your thoughts on it? Thanks.
>
> Thanks,
>
> ________________________________________
> From: Enis Söztutar [[email protected]]
> Sent: December 4, 2013, 6:18
> To: [email protected]
> Subject: Re: [Shadow Regions / Read Replicas ]
>
> On Tue, Dec 3, 2013 at 12:31 PM, Vladimir Rodionov <[email protected]> wrote:
>
> > The downside:
> >
> > - Double/triple memstore usage
> > - Increased block cache usage (effectively, the block cache will have 50%
> >   capacity, maybe less)
>
> These are covered in the tradeoff section of the design doc.
>
> > These downsides are pretty serious ones. This will result:
> >
> > 1. in decreased overall performance due to decreased effective block cache
> > size
>
> You can elect to not fill up the block cache for secondary reads. It will be a
> configuration option, and a tradeoff you may or may not want to pay. Details are
> in the doc.
>
> > 2. In more frequent memstore flushes - this will affect compaction and
> > write throughput.
>
> More frequent flushes are not needed unless you are using the region snapshots
> approach and want to bound the lag better. It is a tradeoff between expected lag
> and more write amplification.
>
> > I do not believe that HBase's 'large' MTTR prevents meeting a 99% SLA of
> > 10-20ms, unless your RSs go down 2-3 times a day for several minutes each
> > time. You have to analyze first why you are having such frequent failures,
> > then fix the root source of the problem. It is possible to reduce the
> > 'detection' phase of the MTTR process to a couple of seconds, either by using
> > an external beacon process (as I suggested already) or by rewriting some code
> > inside HBase and the NameNode to move all data out of the Java heap to
> > off-heap, reducing GC-induced timeouts from 30 sec to 1-2 sec max. It is
> > tough, but doable. The result: you will decrease MTTR by 50% at least w/o
> > sacrificing overall cluster performance.
> >
> > I think it is the large RS and NN heaps and frequent stop-the-world GC
> > activity that prevent meeting strict SLAs - not occasional server failures.
>
> MTTR and this work are orthogonal. In a distributed system, you cannot
> differentiate between a process not responding because it is down, because it is
> busy, because the network is down, or whatnot. Having a couple of seconds of
> detection time is unrealistic. You will end up in a very unstable state where you
> will be failing servers all over the place. An external beacon also cannot
> differentiate between the main process not responding because it is busy or
> because it is down. What happens when there is a temporary network partition?
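The stale-read and block-cache points above might look roughly like the following on the client side. This is a minimal, illustrative sketch only, assuming the Consistency enum on Get and the isStale() marker on Result described in the design doc; the exact names that eventually ship may differ.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Consistency;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class StaleReadSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("usertable"))) {
      Get get = new Get(Bytes.toBytes("row-42"));
      // Ask for a possibly-stale read that any replica may serve (assumed API).
      get.setConsistency(Consistency.TIMELINE);
      // Optionally skip block-cache population for this read, per the
      // cache-pollution tradeoff discussed above.
      get.setCacheBlocks(false);
      Result result = table.get(get);
      if (result.isStale()) {
        // Served by a secondary region; data may lag the primary.
      }
    }
  }
}
```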
> > On Tue, Dec 3, 2013 at 11:51 AM, Jonathan Hsieh <[email protected]> wrote:
> >
> > > To keep the discussion focused on the design goals, I'm going to start
> > > referring to Enis and Devaraj's eventually consistent read replicas as the
> > > *read replica* design, and the consistent fast-read-recovery mechanism based
> > > on shadowing/tailing the WALs as *shadow regions* or *shadow memstores*.
> > > Can we agree on nomenclature?
> > >
> > > On Tue, Dec 3, 2013 at 11:07 AM, Enis Söztutar <[email protected]> wrote:
> > >
> > > > Thanks Jon for bringing this to dev@.
> > > >
> > > > On Mon, Dec 2, 2013 at 10:01 PM, Jonathan Hsieh <[email protected]> wrote:
> > > >
> > > > > Fundamentally, I'd prefer focusing on making HBase "HBasier" instead of
> > > > > tackling a feature that other systems architecturally can do better
> > > > > (inconsistent reads). I consider consistent reads/writes to be one of
> > > > > HBase's defining features. That said, I think read replicas make sense
> > > > > and are a nice feature to have.
> > > >
> > > > Our design proposal has a specific use case goal, and hopefully we can
> > > > demonstrate the benefits of having this in HBase so that even more pieces
> > > > can be built on top of it. Plus I imagine this will be a widely used
> > > > feature for read-only tables or bulk-loaded tables. We are not proposing
> > > > to rework strong consistency semantics or make major architectural
> > > > changes. I think having tables defined with a replication count, and the
> > > > proposed client API changes (the Consistency definition), plug into the
> > > > HBase model rather well.
> > >
> > > I do think that without any recent-update mechanism, we are limiting the
> > > usefulness of this feature to essentially *only* read-only or bulk-load-only
> > > tables. Recency, if there were any edits/updates, would be severely lagging
> > > (by default, potentially an hour), especially in cases where there are only
> > > a few edits to a primarily bulk-loaded table. This limitation is not
> > > mentioned in the tradeoffs or requirements (or in a non-requirements
> > > section); it definitely should be listed there.
> > >
> > > With the current design it might be best to have a flag on the table which
> > > marks it read-only or bulk-load only, so that it only gets used by users
> > > when the table is in that mode? (And maybe an "escape hatch" for power
> > > users.)
> > >
> > > [snip]
> > >
> > > > > - I think the two goals are both worthy on their own, each with their
> > > > > own optimal points. We should make sure in the design that we can
> > > > > support both goals.
> > > >
> > > > I think our proposal is consistent with your doc, and we have considered
> > > > secondary region promotion in the future section. It would be good if you
> > > > can review and comment on whether you see any points missing.
> > >
> > > I definitely will. At the moment, I think the hybrid for the WALs/HLogs I
> > > suggested in the other thread seems to be an optimal solution considering
> > > locality, though it is obviously more complex than just one approach alone.
> > >
> > > > > - I want to make sure the proposed design has a path for optimal
> > > > > fast-consistent read-recovery.
> > > > We think that it does, but it is a secondary goal for the initial work. I
> > > > don't see any reason why secondary promotion cannot be built on top of
> > > > this, once the branch is in a better state.
> > >
> > > Based on the detail in the design doc and this statement, it sounds like you
> > > have a prototype branch already? Is this the case?
> > >
> > > --
> > > // Jonathan Hsieh (shay)
> > > // Software Engineer, Cloudera
> > > // [email protected]
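The per-table replication count and the read-only/bulk-load-only flag discussed in the thread could be expressed along these lines. This is a sketch under the proposal's assumptions, using HTableDescriptor-style method names for illustration; it is not the definitive admin API.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ReplicatedTableSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("bulkloaded"));
      desc.addFamily(new HColumnDescriptor("d"));
      // One primary plus two secondary (read-only) region replicas (assumed setting).
      desc.setRegionReplication(3);
      // Optional: mark the table read-only, matching Jon's suggestion of a flag
      // for the read-only / bulk-load-only use case, so secondaries never lag writes.
      desc.setReadOnly(true);
      admin.createTable(desc);
    }
  }
}
```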
