I did not know that we were reopening the log file for tailing. From
what Nicolas says in https://issues.apache.org/jira/browse/HDFS-3219,
it seems that the visible length is not updated for the open stream,
which is a shame. However, in the append design, the primary can send
the committed length (minimum of RAs) to the replicas, so that the
replicas can make that data visible to the client.

It would be good if we could implement this in HDFS.
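
For concreteness, a minimal sketch of that committed-length rule as I
read it -- illustrative Java only, not the actual HDFS internals:

  class CommittedLength {
    // Per the append design, the primary may only expose the minimum
    // length that every replica in the pipeline has acknowledged;
    // anything below that point is durable on all replicas, so it is
    // safe to make visible to a tailing reader. Assumes a non-empty
    // pipeline.
    static long of(long[] ackedLengthPerReplica) {
      long min = Long.MAX_VALUE;
      for (long acked : ackedLengthPerReplica) {
        min = Math.min(min, acked);
      }
      return min;
    }
  }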

About the minimum work: agreed that we should not merge this in unless
real benefits are demonstrated. That is why we proposed to do the
phase 1 work in a branch; at the end of that, we hope to have something
useful and working (but without wal tailing and async wal replication),
along with a more detailed plan for the remaining steps. We would love
to hear more feedback on how to test / stabilize the feature at the
merge discussion.

Enis



On Wed, Dec 4, 2013 at 2:47 PM, Stack <st...@duboce.net> wrote:

> A few comments after reading through this thread:
>
> + Thanks for moving the (good) discussion here out of the issue.
> + Testing WAL 'tailing'* would be a good input to have.  My sense is that a
> WALpr would make for about the same load on HDFS (and if so, let's just go
> there altogether).
> + I like the notion of doing the minimum work necessary first BUT, as has
> been said above, we can't add a 'feature' that works for only one exotic
> use case; it will just rot.  Any 'customer' of said addition likely does
> not want to be in a position where they are the only ones using it.
> + I like the list Vladimir makes above.  We need to work on his list too
> but it should be aside from this one.
>
> Thanks,
> St.Ack
>
> * HDFS does not support 'tailing'.  Rather, it is a heavyweight reopen of
> the file each time we run off the end of the data.  Doing this for
> replication, and then per region replica, would impose 'heavy' HDFS
> loading (to be measured).
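>
> (A hedged sketch of this reopen-per-poll pattern against the stock
> FileSystem API -- illustrative code only, not HBase's replication
> source:)
>
>   import java.io.IOException;
>   import org.apache.hadoop.fs.FSDataInputStream;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>
>   class NaiveTailer {
>     // Every poll pays a full open: a NameNode RPC for block locations
>     // plus a fresh DataNode connection. Do this for replication, then
>     // again per region replica, and the load multiplies.
>     static long pollOnce(FileSystem fs, Path wal, long offset, byte[] buf)
>         throws IOException {
>       FSDataInputStream in = fs.open(wal); // heavyweight reopen per poll
>       try {
>         in.seek(offset);                   // skip already-consumed data
>         int n;
>         while ((n = in.read(buf)) > 0) {
>           offset += n;                     // ship buf[0..n) downstream
>         }
>       } finally {
>         in.close();                        // next poll must reopen
>       }
>       return offset;
>     }
>   }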
>
>
> On Thu, Dec 5, 2013 at 6:00 AM, Enis Söztutar <enis....@gmail.com> wrote:
>
> > >
> > >
> > > Thanks for adding it there -- I really think it is a big headline
> > > caveat on my expectation of "eventual consistency".  Other systems
> > > out there give you eventual consistency at the millisecond level in
> > > most cases, while this initial implementation would have "eventual"
> > > mean tens of minutes (or even handfuls of minutes) behind (with the
> > > snapshot flush mechanism)!
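> > >
> > > (Rough arithmetic with illustrative numbers only: under a flush-based
> > > scheme a replica sees nothing newer than the last flush, so staleness
> > > is bounded below by the flush interval. A 128 MB memstore filling at
> > > 100 KB/s flushes roughly every 20 minutes; even at 1 MB/s it is still
> > > about 2 minutes behind.)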
> >
> >
> > > There are a handful of other things in the phase one part of the
> > > implementation section that limit the usefulness of the feature to a
> > > certain kind of constrained HBase user.  I'll start another thread for
> > > those.
> > >
> > >
> > Yes, hopefully we will not stop with only phase 1, and will continue on
> > to implement the more-latent async wal replication and/or wal tailing.
> > However, phase 1 will get us to the point of demonstrating that
> > replicated regions work, that the client-side execution is manageable,
> > and that there is real benefit for read-only or bulk-loaded tables,
> > plus some specific use cases for read/write tables.
> >
> >
> > >
> > > >
> > > > We are proposing to implement "Region snapshots" first and "Async
> > > > wal replication" second. As argued, I think wal-tailing only makes
> > > > sense with WALpr, so that work is left until after we have WAL per
> > > > region.
> > > >
> > > >
> > > This is our main disagreement -- I'm not convinced that wal tailing
> > > only makes sense for the wal-per-region hlog implementation.  Instead
> > > of bouncing around hypotheticals, it sounds like I'll be doing more
> > > experiments to prove it to myself and to convince you. :)
> > >
> >
> > That would be awesome! Region grouping or other related proposals for
> > efficient wal tailing deserve their own design doc(s).
> >
> >
> > >
> > > I think that would be great.  Back when we did snapshots, we had
> > > active development against a prototype and spent a bit of time
> > > breaking it down into manageable, more polished pieces that had
> > > slightly lenient reviews.  This exercise really helped us with our
> > > interfaces.  We committed code to the dev branch, which limited merge
> > > pains and the diff for modifications made by different contributors.
> > > In the end, when we had something we were happy with on the dev
> > > branch, we merged with trunk and fixed the bugs/diffs that had
> > > cropped up in the meantime.  I'd suggest a similar process for this.
> > >
> >
> > Agreed. We can make use of the previous best practices. Shame that we
> > still do not have a read-write git repo.
> >
> >
> > >
> > >
> > > --
> > > // Jonathan Hsieh (shay)
> > > // Software Engineer, Cloudera
> > > // j...@cloudera.com
> > >
> >
>
