On Tue, Dec 3, 2013 at 2:03 PM, Jonathan Hsieh <j...@cloudera.com> wrote:

> On Tue, Dec 3, 2013 at 11:42 AM, Enis Söztutar <enis....@gmail.com> wrote:
>
> > On Mon, Dec 2, 2013 at 10:20 PM, Jonathan Hsieh <j...@cloudera.com>
> wrote:
> >
> > > > Deveraj:
> > > > Jonathan Hsieh, WAL per region (WALpr) would give you the locality
> > > > (and hence HDFS short circuit) of reads if you were to couple it
> > > > with the favored nodes. The cost is of course more WAL files... In
> > > > the current situation (no WALpr) it would create quite some
> > > > cross-machine traffic, no?
> > >
> > > I think we all agree that WAL per region isn't efficient in today's
> > > spinning hard drive world, where we are limited to a relatively low
> > > budget of seeks (though it may be better in the future with SSDs).
> > >
> >
> > WALpr makes sense in a fully SSD world and if HDFS had journaling for
> > writes. I don't think anybody is working on this yet.
>
>
> What do you mean by journaling for writes?  Do you mean where sync
> operations update the length at the NN on every call?
>

I think the HDFS guys were using "super sync" to refer to that. I was
referring to a journaling file system
(http://en.wikipedia.org/wiki/Journaling_file_system), where writes to
multiple files are persisted to a journal disk so that you do not pay the
constant seeks for writing to a lot of files (the per-region WALs) in
parallel.
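To make the seek argument concrete, here is a rough, purely illustrative
Java sketch (class and method names are made up; this is not HBase or HDFS
code): edits for many regions are appended to a single sequentially
written journal file, so the commit path pays one sequential write instead
of one seek per region WAL, and the fan-out into per-region files happens
later, off the commit path.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: append edits for many regions to ONE sequentially
 * written journal file, then de-multiplex them into per-region WAL files
 * later, so the seeks are amortized and taken off the commit path.
 */
public class JournaledRegionWal {

  private final Path journal;                                // single sequential file
  private final List<String[]> pending = new ArrayList<>();  // [regionName, edit]

  public JournaledRegionWal(Path journal) {
    this.journal = journal;
  }

  /** An edit is durable once it hits the journal; no per-region seek yet. */
  public synchronized void append(String regionName, String edit) throws IOException {
    String record = regionName + "\t" + edit + "\n";
    Files.write(journal, record.getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    pending.add(new String[] { regionName, edit });
  }

  /** Later, fan the journaled edits out to one file per region. */
  public synchronized void checkpointToPerRegionFiles(Path dir) throws IOException {
    Map<String, StringBuilder> byRegion = new HashMap<>();
    for (String[] rec : pending) {
      byRegion.computeIfAbsent(rec[0], r -> new StringBuilder())
              .append(rec[1]).append('\n');
    }
    for (Map.Entry<String, StringBuilder> e : byRegion.entrySet()) {
      Files.write(dir.resolve(e.getKey() + ".wal"),
          e.getValue().toString().getBytes(StandardCharsets.UTF_8),
          StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
    pending.clear();
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("walpr-journal");
    JournaledRegionWal wal = new JournaledRegionWal(dir.resolve("journal.log"));
    wal.append("regionX", "put row1");
    wal.append("regionY", "put row2");
    wal.append("regionX", "delete row3");
    wal.checkpointToPerRegionFiles(dir);  // seeks happen here, off the commit path
  }
}

A real implementation would obviously need group commit, checksums, and
journal truncation after the checkpoint; the only point of the sketch is
that the per-region files drop off the latency path.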



>
>
> > Full SSD clusters are already in place (Pinterest, for example), so I
> > think having WALpr as a pluggable implementation makes sense. HBase
> > should work with both WAL-per-regionserver (or multi-WAL) and
> > WAL-per-region.
> >
> >
> > I agree here.
>
>
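As a minimal Java sketch of what "pluggable" could look like (the
interface and class names here are hypothetical, not the actual HBase WAL
API): the region server codes only against the interface, and the concrete
layout (one shared log per region server, sharded logs, or one log per
region) is picked by configuration per deployment.

import java.io.IOException;

/**
 * Hypothetical sketch of a pluggable WAL abstraction; the interface and
 * class names are made up and are not the actual HBase API.
 */
public class WalPluggabilitySketch {

  /** The region server would code only against this interface. */
  interface WriteAheadLog {
    void append(String regionName, byte[] edit) throws IOException;
    void sync() throws IOException;
    void close() throws IOException;
  }

  /** Today's layout: one shared hlog for the whole region server. */
  static class RegionServerWal implements WriteAheadLog {
    public void append(String regionName, byte[] edit) {
      // A real implementation would append to the single shared hlog.
      System.out.println("append to shared hlog for " + regionName);
    }
    public void sync() { /* a real implementation would hflush the shared hlog */ }
    public void close() { }
  }

  /** WALpr: one log per region, attractive on SSDs or journaled storage. */
  static class PerRegionWal implements WriteAheadLog {
    public void append(String regionName, byte[] edit) {
      // A real implementation would append to this region's own log file.
      System.out.println("append to per-region log of " + regionName);
    }
    public void sync() { /* a real implementation would hflush only the logs touched since the last sync */ }
    public void close() { }
  }

  /** The concrete layout is picked by configuration, not hard-coded. */
  static WriteAheadLog create(String provider) {
    return "perregion".equals(provider) ? new PerRegionWal() : new RegionServerWal();
  }

  public static void main(String[] args) throws IOException {
    WriteAheadLog wal = create("perregion");
    wal.append("regionX", "put row1".getBytes());
    wal.sync();
    wal.close();
  }
}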
> > >
> > > With this in mind, I am actually making the case that we would group
> > > all the regions from RS-A onto the same set of preferred region
> > > servers. This way we only need to have one or two other RSs tailing
> > > the RS.
> > >
> > > So for example, if regions X, Y, and Z were on RS-A and its hlog, the
> > > shadow region memstores for X, Y, and Z would be assigned to the same
> > > one or two other RSs. Ideally this would be where the HLog file
> > > replicas have locality (helped by favored nodes/block affinity).
> > > Doing this, we hold the number of readers on the active hlogs to a
> > > constant number and do not add any new cross-machine traffic (though
> > > tailing currently has costs on the NN).
> > >
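To make the grouping idea above concrete, a hypothetical placement sketch
in Java (none of these names exist in HBase, and the hlog replica lookup
is assumed to come from the favored nodes / block affinity machinery): all
shadow memstores for the regions of one primary land on the same one or
two servers, picked from the hosts holding that primary's hlog block
replicas, so only those servers ever tail that hlog.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not HBase code. */
public class ShadowPlacementSketch {

  static Map<String, List<String>> assignShadows(
      Map<String, List<String>> regionsByPrimary,   // primary RS -> its regions
      Map<String, List<String>> hlogReplicaHosts,   // primary RS -> hosts holding its hlog replicas
      int shadowsPerPrimary) {

    Map<String, List<String>> shadowHostsByRegion = new HashMap<>();
    for (Map.Entry<String, List<String>> e : regionsByPrimary.entrySet()) {
      String primary = e.getKey();
      // Pick the shadow servers once per primary, from the hosts that hold
      // replicas of that primary's hlog blocks (favored nodes).
      List<String> candidates = hlogReplicaHosts.getOrDefault(primary, List.of());
      List<String> shadows = new ArrayList<>(
          candidates.subList(0, Math.min(shadowsPerPrimary, candidates.size())));
      // Every region of this primary gets the SAME shadow servers, so only
      // those one or two servers need to tail this primary's hlog.
      for (String region : e.getValue()) {
        shadowHostsByRegion.put(region, shadows);
      }
    }
    return shadowHostsByRegion;
  }

  public static void main(String[] args) {
    Map<String, List<String>> regions = Map.of("RS-A", List.of("X", "Y", "Z"));
    Map<String, List<String>> hlogReplicas = Map.of("RS-A", List.of("RS-B", "RS-C", "RS-D"));
    // Prints {X=[RS-B, RS-C], Y=[RS-B, RS-C], Z=[RS-B, RS-C]} (map order may vary).
    System.out.println(assignShadows(regions, hlogReplicas, 2));
  }
}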
> > > One inefficiency we have is that if there is a single log per RS, we
> > > end up reading all the log entries for tables that may not have the
> > > shadow feature enabled. However, with HBase multi-WALs coming, one
> > > strategy is to shard WALs to a number on the order of the number of
> > > disks on a machine (12-24 these days). I think a WAL per namespace
> > > (this could also be used to have a WAL per table) would make sense.
> > > This way of sharding the hlog would reduce the amount of irrelevant
> > > log entries read in a log-tailing scheme. It would have the added
> > > benefit of reducing the log splitting work, reducing MTTR, and
> > > allowing for recovery priorities if the primaries and shadows also go
> > > down. (This is a generalization of the "separate out META into its own
> > > log" idea.)
> > >
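A small Java sketch of the WAL-per-namespace sharding (the names and the
path layout are made up, and the region name format "namespace:table,..."
is an assumption, not the multi-WAL API): a tailing shadow server would
then only read the shards for the namespaces it actually hosts shadows
for, and split work could be prioritized per shard.

import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: route each region's edits to a per-namespace WAL shard. */
public class NamespaceWalSharding {

  private final Map<String, String> walPathByNamespace = new HashMap<>();

  /** Region names are assumed to look like "namespace:table,startkey,...". */
  String walShardFor(String regionName) {
    String namespace = regionName.contains(":")
        ? regionName.substring(0, regionName.indexOf(':'))
        : "default";
    // The path layout here is purely illustrative.
    return walPathByNamespace.computeIfAbsent(
        namespace, ns -> "/hbase/WALs/rs-a/wal." + ns);
  }

  public static void main(String[] args) {
    NamespaceWalSharding sharding = new NamespaceWalSharding();
    System.out.println(sharding.walShardFor("hbase:meta,,1"));        // /hbase/WALs/rs-a/wal.hbase
    System.out.println(sharding.walShardFor("myns:orders,row-17,2")); // /hbase/WALs/rs-a/wal.myns
    System.out.println(sharding.walShardFor("usertable,row-42,3"));   // /hbase/WALs/rs-a/wal.default
  }
}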
> > > Jon.
> > >
> > > --
> > > // Jonathan Hsieh (shay)
> > > // Software Engineer, Cloudera
> > > // j...@cloudera.com
> > >
> >
>
>
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // j...@cloudera.com
>
