[DISCUSS] No regions on Master node in 2.0

2016-04-07 Thread Stack
I would like to start a discussion on whether Master should be carrying
regions or not. No hurry. I see this thread going on a while and what with
2.0 being a ways out yet, there is no need to rush to a decision.

First, some background.

Currently in the master branch, HMaster hosts 'system tables': e.g.
hbase:meta. HMaster is doing more than just gardening the cluster,
bootstrapping, and keeping everything up and serving healthy as it does in
branch-1; in the master branch, it is actually in the write path for the
most critical system regions.

Master is this way because HMaster and HRegionServer have so much in
common that the thinking was they should be just one binary, with HMaster
being like any other server and the HMaster function a minor appendage
runnable by any running HRegionServer.

I like this idea, but the unification work was just never finished. What
is in the master branch is a compromise. HMaster is not a RegionServer but
a sort-of RegionServer doing partial serving. So we have the HMaster role,
a new part-RegionServer-carrying-special-regions role, and then a full-on
HRegionServer role. We need to fix this messiness. We could revert to the
plain branch-1 roles, or carry the
HMaster-function-is-something-any-RegionServer-could-execute idea through
to completion.

More background from a time long past, with good comments by the likes of
our Francis Liu and the Mighty Matteo Bertozzi, is here [1], on unifying
master and meta-serving. Slightly related are old discussions on being
able to scale by splitting meta, with good comments by our Elliott Clark
[2].

Also for consideration, the landscape has since changed. [1] was written
before we had ProcedureV2 available to us, where we can record
intermediate transition states locally on the Master rather than remotely,
as intermediate updates to hbase:meta over RPC to another node.
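
To make that distinction concrete, here is a minimal, self-contained
sketch of the idea. The class and method names are illustrative only; this
is not the actual ProcedureV2 API. Intermediate assignment states are
journaled to a store local to the Master, and only the terminal,
client-visible state goes out over RPC to hbase:meta.

  // Hypothetical sketch, not the real ProcedureV2 classes: a region-assign
  // "procedure" persists each intermediate step to a Master-local store and
  // publishes only the terminal state over RPC to hbase:meta.
  import java.util.ArrayList;
  import java.util.List;

  public class AssignSketch {

    enum State { QUEUED, RS_OPENING, RS_OPENED, PUBLISHED }

    /** Stand-in for a Master-local write-ahead log of procedure states. */
    static class LocalStateStore {
      private final List<State> journal = new ArrayList<>();
      void record(State s) { journal.add(s); }     // a local disk write in reality
      List<State> journal() { return journal; }
    }

    /** Stand-in for the one remote write left: the final hbase:meta update. */
    interface MetaPublisher {
      void publish(String region, String server);  // RPC to the meta-hosting RS
    }

    static void assign(String region, String server,
                       LocalStateStore store, MetaPublisher meta) {
      store.record(State.QUEUED);     // intermediate states never leave the Master
      store.record(State.RS_OPENING);
      store.record(State.RS_OPENED);
      meta.publish(region, server);   // only the client-visible final state is remote
      store.record(State.PUBLISHED);
    }

    public static void main(String[] args) {
      LocalStateStore store = new LocalStateStore();
      assign("usertable,,1.abc", "rs1.example.com:16020", store,
          (r, s) -> System.out.println("meta <- " + r + " @ " + s));
      System.out.println("local journal: " + store.journal());
    }
  }

The point of the shape above is that a crashed Master can recover by
replaying its local journal rather than reconciling half-written remote
meta rows.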

Enough on the background.

Let me provoke discussion by making the statement that we should undo
HMaster carrying any regions ever; that the HMaster function is work
enough for a single dedicated server, and that it is important enough that
it cannot take a background role on a serving RegionServer (I could back
off this position given evidence that the HMaster role could be
backgrounded). Notions of a Master carrying system tables only are just
not on, given system tables will be too big for a single server,
especially once hbase:meta is split (so we can scale). This simple
distinction of HMaster and RegionServer roles is also what our users know
and have gotten used to, so there needs to be a good reason to change it
(we can still pursue the single binary that can take on the HMaster or
HRegionServer role, determined at runtime).

Thanks,
St.Ack

1.
https://docs.google.com/document/d/1xC-bCzAAKO59Xo3XN-Cl6p-5CM_4DMoR-WpnkmYZgpw/edit#heading=h.j5yqy7n04bkn
2.
https://docs.google.com/document/d/1eCuqf7i2dkWHL0PxcE1HE1nLRQ_tCyXI4JsOB6TAk60/edit#heading=h.80vcerzbkj93


Re: [DISCUSS] No regions on Master node in 2.0

2016-11-11 Thread Stack
(Reviving an old thread that needs resolving before 2.0.0. Does Master
carry regions in hbase-2.0.0 or not? A strong argument by one of our
biggest users is made below that master hosting hbase:meta can be more
robust when updates are local, and that we can up the throughput of meta
operations if hbase:meta is exclusively hosted by master.)

On Mon, Apr 25, 2016 at 12:35 PM, Gary Helmling  wrote:

> On Mon, Apr 25, 2016 at 11:20 AM Stack  wrote:
>
> > On Fri, Apr 8, 2016 at 1:42 AM, Elliott Clark  wrote:
> >
> > > # Without meta on master, we double assign and lose data.
> > >
> > > That is currently a fact that I have seen over and over on multiple
> > loaded
> > > clusters. Some abstract clean up of deployment vs losing data is a
> > > no-brainer for me. Master assignment, region split, region merge are
> all
> > > risky, and all places that HBase can lose data. Meta being hosted on
> the
> > > master makes communication easier and less flakey. Running ITBLL on a
> > loop
> > > that creates a new table every time, and without meta on master
> > everything
> > > will fail pretty reliably in ~2 days. With meta on master things pass
> > MUCH
> > > more.
> > >
> > >
>

The only answer to the above observation is a demonstration that ITBLL
with meta not on master is as robust as runs where master carries meta.
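
For reference, such a bake-off would be repeated invocations of the ITBLL
loop tool on both deploy shapes, along these lines (the class name is the
real ITBLL tool; the Loop arguments, iterations / mappers / nodes per
mapper / output dir / reducers, are from memory, so check the usage string
on your branch):

  $ hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList \
      Loop 10 8 1000000 /tmp/itbll 8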



> > The discussion is what to do in 2.0, with the assumption that master
> > state would be done up on procedure v2, making most of the transitions
> > now done over zk and hbase:meta instead local to the master, with only
> > the final state published to a remote meta (an RPC, but if we can't make
> > RPC work reliably in our distributed system, that's a bigger problem).
> >

>
> But making RPC work for assignment here is precisely the problem.  There's
> no reason master should have to contend with user requests to meta in order
> to be able to make updates.  And until clients can actually see the change,
> it doesn't really matter if the master state has been updated or not.
>
>
In hbase-2.0.0, there'll be a new regime: hbase:meta will have a single
writer, the master, and the master only. No more contention on writes. As
regards contention on reads, that is unavoidable.

In hbase-2.0.0, only the final publishing step, what we want clients to
see, will update hbase:meta. All other transitions will be internal.


> Sure, we could add more RPC priorities, even more handler pools and
> additional queues for master requests to meta vs. user requests to meta.
> Maybe with that plus adding in regionserver groups we actually start to
> have something that comes close to what we already have today with meta on
> master.  But why should we have to add all that complexity?  None of this
> is an issue if master updates to meta are local and don't have to go
> through RPC.
>
>
(Old args.) A single server carrying meta doesn't scale, etc.

The new observation is that there has been no work carrying home the
recasting of our deploy format such that the master is now inline with
reads/writes as the exclusive host of the hbase:meta region.


> > # Master hosting the system tables locates the system tables as close as
> > > possible to the machine that will be mutating the data.
> > >
> > > Data locality is something that we all work for. Short circuit local
> > reads,
> > > Caching blocks in jvm, etc. Bringing data closer to the interested
> party
> > > has a long history of making things faster and better. Master is in
> > charge
> > > of just about all mutations of all systems tables. It's in charge of
> > > changing meta, changing acls, creating new namespaces, etc. So put the
> > > memstore as close as possible to the system that's going to mutate
> meta.
> > >
> >
> >
> > Above is fine except for the bit where we need to be able to field reads.
> > Lets distribute the data to be read over the cluster rather than treat
> meta
> > reads with kid gloves hosted on a 'special' server; let these 'reads' be
> > like any other read the cluster takes (see next point)
> >
> >
> In my opinion, the real "special" part here is the master bit -- which I
> think we should be working to make less special and more just a normal bit
> of housekeeping spread across nodes -- not the regionserver role.  It only
> looks special right now because the evolution has stopped in the middle.  I
> really don't think enshrining master as a separate process is the right way
> forward for us.
>
>
I always liked this notion.

To be worked out is how Master and hbase:meta hosting would interplay
(would the RS that is designated Master also host hbase:meta? Would it
exclusively host hbase:meta, or would hbase:meta move with the Master
function? [Stuff we've talked about before]).



>
> >
> > > # If you want to make meta faster then moving it to other regionservers
> > > makes things worse.
> > >
> > > Meta can get pretty hot. Putting it with other regions that clients
> will
> > be
> > > trying to access makes everything worse. It means that meta is
> competing
> > > with user r

Re: [DISCUSS] No regions on Master node in 2.0

2016-11-11 Thread Enis Söztutar
Thanks Stack for reviving this.

How to move forward here? The Pv2 master is almost done. An ITBLL bakeoff
> of new Pv2 based assign vs a Master that exclusively hosts hbase:meta?
>
>
I think we have to evaluate whether the new pv2 master works with remote
meta updates and the fact that those updates can fail partially or succeed
without the client getting the reply, etc. Sorry, it has been some time
since I've looked at the design. Actually, what would be very good is to
have a design overview / write-up of the pv2 in its current / final form
so that we can evaluate. Last time I looked there was no detailed design
doc at all.


> St.Ack
>
>
>
> >
> > >
> >
>


Re: [DISCUSS] No regions on Master node in 2.0

2016-11-12 Thread Andrew Purtell
Thanks Stack and Enis. I concur, it's hard to say for those not intimate
with the new code.

In the absence of more information, intuition says master carries meta to
avoid a whole class of problems.

On Fri, Nov 11, 2016 at 3:27 PM, Enis Söztutar  wrote:

> Thanks Stack for reviving this.
>
> How to move forward here? The Pv2 master is almost done. An ITBLL bakeoff
> > of new Pv2 based assign vs a Master that exclusively hosts hbase:meta?
> >
> >
> I think we have to evaluate whether the new pv2 master works with remote
> meta
> updates and the fact that those updates can fail partially or succeed
> without the
> client getting the reply, etc. Sorry it has been some time I've looked at
> the design.
> Actually what would be very good is to have a design overview / write up of
> the pv2
> in its current / final form so that we can evaluate. Last time I've looked
> there was no
> detailed design doc at all.
>
>
> > St.Ack
> >
> >
> >
> > >
> > > >
> > >
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: [DISCUSS] No regions on Master node in 2.0

2016-11-15 Thread toffer
> In the absence of more information, intuition says master carries meta to
> avoid a whole class of problems.

Off-hand I think the class of problems we'll eliminate are problems that are
well understood and being constantly dealt with and hardened to this day
(i.e. puts to a region).

> I think we have to evaluate whether the new pv2 master works with remote
> meta updates and the fact that those updates can fail partially or succeed
> without the client getting the reply, etc.

I think failing meta updates need to be dealt with either way; AFAIK,
eventually procedure state will be stored in HDFS, which is also a
distributed system.

 

On Saturday, November 12, 2016 9:45 AM, Andrew Purtell 
 wrote:
 

 Thanks Stack and Enis. I concur, it's hard to say for those not intimate
with the new code.

In the absence of more information, intuition says master carries meta to
avoid a whole class of problems.

On Fri, Nov 11, 2016 at 3:27 PM, Enis Söztutar  wrote:

> Thanks Stack for reviving this.
>
> How to move forward here? The Pv2 master is almost done. An ITBLL bakeoff
> > of new Pv2 based assign vs a Master that exclusively hosts hbase:meta?
> >
> >
> I think we have to evaluate whether the new pv2 master works with remote
> meta
> updates and the fact that those updates can fail partially or succeed
> without the
> client getting the reply, etc. Sorry it has been some time I've looked at
> the design.
> Actually what would be very good is to have a design overview / write up of
> the pv2
> in its current / final form so that we can evaluate. Last time I've looked
> there was no
> detailed design doc at all.
>
>
> > St.Ack
> >
> >
> >
> > >
> > > >
> > >
> >
>



-- 
Best regards,

  - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

   

Re: [DISCUSS] No regions on Master node in 2.0

2016-11-16 Thread Yu Li
Very late to the party +1 (Smile)

We also discussed a standalone meta server offline here at Alibaba, since
we've observed crazily high QPS on meta caused by an online machine
learning workload, and in that discussion we also went over the pros and
cons of serving meta on HMaster. Since quite a few pros are already
mentioned in the thread, I'd like to mention one con here: currently we
can switch the active master (almost) freely w/o affecting online service,
so we can do hot-fixes on the master. But if we carry the meta region on
HMaster, the cost of switching master will increase a lot and the
hot-switch may not be possible any more. Not sure whether this is an
important thing for most users but still a point to share (Smile).

And maybe another point for discussion: if not placed on HMaster, should we
have a standalone meta server or at least provide such an option?

Thanks.

Best Regards,
Yu

On 16 November 2016 at 03:43,  wrote:

> > In the absence of more information, intuition says master carries meta
> to avoid a whole class of problems.
> Off-hand I think the class of problems we'll eliminate are problems that
> are well understood and being constantly dealt with and hardened to this
> day (ie puts to a region).
> > I think we have to evaluate whether the new pv2 master works with
> remote meta updates and the fact that those updates can fail partially or
> succeed without theI think failing meta updates need to be dealt with
> either way AFAIK eventually procedure state will be stored in HDFS which is
> also a distributed system.
>
>
>
> On Saturday, November 12, 2016 9:45 AM, Andrew Purtell <
> apurt...@apache.org> wrote:
>
>
>  Thanks Stack and Enis. I concur, it's hard to say for those not intimate
> with the new code.
>
> In the absence of more information, intuition says master carries meta to
> avoid a whole class of problems.
>
> On Fri, Nov 11, 2016 at 3:27 PM, Enis Söztutar  wrote:
>
> > Thanks Stack for reviving this.
> >
> > How to move forward here? The Pv2 master is almost done. An ITBLL bakeoff
> > > of new Pv2 based assign vs a Master that exclusively hosts hbase:meta?
> > >
> > >
> > I think we have to evaluate whether the new pv2 master works with remote
> > meta
> > updates and the fact that those updates can fail partially or succeed
> > without the
> > client getting the reply, etc. Sorry it has been some time I've looked at
> > the design.
> > Actually what would be very good is to have a design overview / write up
> of
> > the pv2
> > in its current / final form so that we can evaluate. Last time I've
> looked
> > there was no
> > detailed design doc at all.
> >
> >
> > > St.Ack
> > >
> > >
> > >
> > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Best regards,
>
>   - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
>
>


Re: [DISCUSS] No regions on Master node in 2.0

2016-11-16 Thread Mikhail Antonov
(Side note @Yu - FYI there have been a number of fixes/optimizations to
the way we cache and invalidate caches of region locations on the client
side in 1.3; see for example HBASE-15658, HBASE-15654.)

On the topic -

Hosting meta on a machine that doesn't serve user regions would help
ensure that updates to meta have a higher chance of succeeding. But if
that machine isn't the Master, then we'd introduce yet one more service
role to the deployment. And I'd say that the different machine
roles/service types required for an HBase deployment are something we
already have enough of.

I think this discussion is still at the same point as it was back then -
it looks like we're essentially comparing (A) an existing feature that
works and has practical benefits (as noted above on the thread) to (B) a
different way of doing things that's not finalized / released yet (please
correct me if I'm wrong)?

And assuming B is finalized, I'm not sure that it actually fully addresses
the problems that A addresses now. That makes me inclined to think that
removing option A before we know that the actual problems it solves now are
completely addressed by other means would put us in a bad state.

-Mikhail

On Wed, Nov 16, 2016 at 2:13 AM, Yu Li  wrote:

> Very late to the party +1 (Smile)
>
> We also offline discussed standalone meta server here in Alibaba since
> we've observed crazily high QPS on meta caused by online machine learning
> workload, and in the discussion we also mentioned pros. and cons. of
> serving meta on HMaster. Since quite some pros. already mentioned in the
> thread, I'd like to mention one cons. here: currently we could switch
> active master (almost) freely w/o affecting online service, so we could do
> some hot-fix on master. But if we carry meta region on HMaster, the cost of
> switching master will increase a lot and the hot-switch may not be possible
> any more. Not sure whether this is an important thing for most users but
> still a point to share (Smile).
>
> And maybe another point for discussion: if not placed on HMaster, should we
> have a standalone meta server or at least provide such an option?
>
> Thanks.
>
> Best Regards,
> Yu
>
> On 16 November 2016 at 03:43,  wrote:
>
> > > In the absence of more information, intuition says master carries meta
> > to avoid a whole class of problems.
> > Off-hand I think the class of problems we'll eliminate are problems that
> > are well understood and being constantly dealt with and hardened to this
> > day (ie puts to a region).
> > > I think we have to evaluate whether the new pv2 master works with
> > remote meta updates and the fact that those updates can fail partially or
> > succeed without theI think failing meta updates need to be dealt with
> > either way AFAIK eventually procedure state will be stored in HDFS which
> is
> > also a distributed system.
> >
> >
> >
> > On Saturday, November 12, 2016 9:45 AM, Andrew Purtell <
> > apurt...@apache.org> wrote:
> >
> >
> >  Thanks Stack and Enis. I concur, it's hard to say for those not intimate
> > with the new code.
> >
> > In the absence of more information, intuition says master carries meta to
> > avoid a whole class of problems.
> >
> > On Fri, Nov 11, 2016 at 3:27 PM, Enis Söztutar  wrote:
> >
> > > Thanks Stack for reviving this.
> > >
> > > How to move forward here? The Pv2 master is almost done. An ITBLL
> bakeoff
> > > > of new Pv2 based assign vs a Master that exclusively hosts
> hbase:meta?
> > > >
> > > >
> > > I think we have to evaluate whether the new pv2 master works with
> remote
> > > meta
> > > updates and the fact that those updates can fail partially or succeed
> > > without the
> > > client getting the reply, etc. Sorry it has been some time I've looked
> at
> > > the design.
> > > Actually what would be very good is to have a design overview / write
> up
> > of
> > > the pv2
> > > in its current / final form so that we can evaluate. Last time I've
> > looked
> > > there was no
> > > detailed design doc at all.
> > >
> > >
> > > > St.Ack
> > > >
> > > >
> > > >
> > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> >   - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
> >
> >
>



-- 
Thanks,
Michael Antonov


Re: [DISCUSS] No regions on Master node in 2.0

2016-11-16 Thread 宾莉金 or binlijin
Hosting meta on the machine that doesn't serve user regions would help to
ensure that updates to meta have higher chance to succeed. But if that
machine isn't Master, then we'd introduce yet one more service role to the
deployment. And I'd say that different machine roles/service types required
for HBase deployment is something we already have enough of.

I think we can just change the balancer and always move user regions from
the regionserver hosting the meta region to other regionservers. If at
some point there are user regions together with the meta region, that
doesn't matter; most of the time there will be only the meta region.
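
A rough sketch of what that balancer tweak might look like, with
illustrative types only (this is not the actual
org.apache.hadoop.hbase.master.LoadBalancer interface): given a proposed
region-to-server placement, push any non-system region off whichever
server currently hosts hbase:meta.

  // Hypothetical sketch of the balancer tweak described above.
  import java.util.HashMap;
  import java.util.Map;

  public class MetaIsolatingPlacement {

    static boolean isSystemRegion(String regionName) {
      return regionName.startsWith("hbase:");       // hbase:meta, hbase:acl, ...
    }

    /** Rewrites a placement so only system regions stay on the meta host. */
    static Map<String, String> isolateMeta(Map<String, String> placement,
                                           String metaHost, String spareHost) {
      Map<String, String> adjusted = new HashMap<>();
      for (Map.Entry<String, String> e : placement.entrySet()) {
        boolean onMetaHost = e.getValue().equals(metaHost);
        if (onMetaHost && !isSystemRegion(e.getKey())) {
          adjusted.put(e.getKey(), spareHost);       // move user region elsewhere
        } else {
          adjusted.put(e.getKey(), e.getValue());    // leave everything else alone
        }
      }
      return adjusted;
    }

    public static void main(String[] args) {
      Map<String, String> placement = new HashMap<>();
      placement.put("hbase:meta,,1", "rs1:16020");
      placement.put("usertable,aaa,1", "rs1:16020");
      placement.put("usertable,bbb,1", "rs2:16020");
      System.out.println(isolateMeta(placement, "rs1:16020", "rs3:16020"));
    }
  }

A real implementation would plug into the balancer's assignment/cost
functions rather than post-process a map, but the policy itself is about
this small.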

2016-11-16 19:06 GMT+08:00 Mikhail Antonov :

> (side note @Yu - FYI there has been a number of fixes/optimizations to the
> way we cache and invalidate caches of region locations on the client side
> in 1.3, see for example HBASE-15658, HBASE-15654).
>
> On the topic -
>
> Hosting meta on the machine that doesn't serve user regions would help to
> ensure that updates to meta have higher chance to succeed. But if that
> machine isn't Master, then we'd introduce yet one more service role to the
> deployment. And I'd say that different machine roles/service types required
> for HBase deployment is something we already have enough of.
>
> I think this discussion is still at the same point as it was back then - it
> looks like we're essentially comparing (A) an existing feature that works
> and has practical benefits (as noted above on the thread) to the (B)
> different way of doing things that's not finalized / released yet (please
> correct me if I'm wrong)?
>
> And assuming B is finalized, I'm not sure that it actually fully addresses
> the problems that A addresses now. That makes me inclined to think that
> removing option A before we know that the actual problems it solves now are
> completely addressed by other means would put us in a bad state.
>
> -Mikhail
>
> On Wed, Nov 16, 2016 at 2:13 AM, Yu Li  wrote:
>
> > Very late to the party +1 (Smile)
> >
> > We also offline discussed standalone meta server here in Alibaba since
> > we've observed crazily high QPS on meta caused by online machine learning
> > workload, and in the discussion we also mentioned pros. and cons. of
> > serving meta on HMaster. Since quite some pros. already mentioned in the
> > thread, I'd like to mention one cons. here: currently we could switch
> > active master (almost) freely w/o affecting online service, so we could
> do
> > some hot-fix on master. But if we carry meta region on HMaster, the cost
> of
> > switching master will increase a lot and the hot-switch may not be
> possible
> > any more. Not sure whether this is an important thing for most users but
> > still a point to share (Smile).
> >
> > And maybe another point for discussion: if not placed on HMaster, should
> we
> > have a standalone meta server or at least provide such an option?
> >
> > Thanks.
> >
> > Best Regards,
> > Yu
> >
> > On 16 November 2016 at 03:43,  wrote:
> >
> > > > In the absence of more information, intuition says master carries
> meta
> > > to avoid a whole class of problems.
> > > Off-hand I think the class of problems we'll eliminate are problems
> that
> > > are well understood and being constantly dealt with and hardened to
> this
> > > day (ie puts to a region).
> > > > I think we have to evaluate whether the new pv2 master works with
> > > remote meta updates and the fact that those updates can fail partially
> or
> > > succeed without theI think failing meta updates need to be dealt with
> > > either way AFAIK eventually procedure state will be stored in HDFS
> which
> > is
> > > also a distributed system.
> > >
> > >
> > >
> > > On Saturday, November 12, 2016 9:45 AM, Andrew Purtell <
> > > apurt...@apache.org> wrote:
> > >
> > >
> > >  Thanks Stack and Enis. I concur, it's hard to say for those not
> intimate
> > > with the new code.
> > >
> > > In the absence of more information, intuition says master carries meta
> to
> > > avoid a whole class of problems.
> > >
> > > On Fri, Nov 11, 2016 at 3:27 PM, Enis Söztutar 
> wrote:
> > >
> > > > Thanks Stack for reviving this.
> > > >
> > > > How to move forward here? The Pv2 master is almost done. An ITBLL
> > bakeoff
> > > > > of new Pv2 based assign vs a Master that exclusively hosts
> > hbase:meta?
> > > > >
> > > > >
> > > > I think we have to evaluate whether the new pv2 master works with
> > remote
> > > > meta
> > > > updates and the fact that those updates can fail partially or succeed
> > > > without the
> > > > client getting the reply, etc. Sorry it has been some time I've
> looked
> > at
> > > > the design.
> > > > Actually what would be very good is to have a design overview / write
> > up
> > > of
> > > > the pv2
> > > > in its current / final form so that we can evaluate. Last time I've
> > > looked
> > > > there was no
> > > > detailed design doc at all.
> > > >
> > > >
> > > > > St.Ack
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > >
> > > >

Re: [DISCUSS] No regions on Master node in 2.0

2016-11-16 Thread Jean-Marc Spaggiari
Hi all,

I'm not a committer, but I really love this discussion and I want to chime
in.

I see two very interesting ideas reading between the lines. The first one
is to not have any master, but have one RS making master decisions and
therefore not hosting any regions (or perhaps just the META). The balancer
might be able to take care of that. That would allow us to move the master
accountability to any other RS, which would shed its regions and pick up
the META at some point. The concern is data locality when the master moves
to another host: META will not be local any more until some compactions
occur, and the same goes for the regions moved. However, from a user
standpoint, that would make everything very easy to manage.
Pros:
- No more master or worker roles. All RegionServer can act as a master if
asked to (ZooKeeper elected)
- Usually worker nodes are bigger servers compared to the master. That
will allow a bigger machine to serve the META, so better performance.
- Allows switching the master role anywhere at any time, since META will
not necessarily have to follow right away.
Cons:
- One "master" (So a RegionServer) will run a DataNode underneath that will
not be used or used only for the META.
- People with small HBase clusters might not want to dedicate one beefy RS
to serve only the META table... Might be a waste based on their usage.
- Performance impact (network read) when switching the master until META is
compacted back locally.

Another idea I'm reading here (between the lines again) is to have one RS
hosting ONLY the META. We keep the master roles as we have today, but the
balancer takes care of assigning only the META to the RS hosting it, with
all other regions going to other servers. I like this approach a lot
because:
- It will be very easy to implement. We just have to update the balancer.
- It allows small users to disable this feature and still let the META RS
host other regions too.
- It allows big users to separate the META onto a different server to
improve performance.
One con:
- The META is still not on the master and every operation will have to go
over the network. But is that really an issue?


Overall, I like the idea of losing the master and all servers being able
to take on any of the roles. But I think the second approach might be
easier to implement and to understand.

My 2¢ opinion ;)

JMS


2016-11-16 6:56 GMT-05:00 宾莉金 or binlijin :

> Hosting meta on the machine that doesn't serve user regions would help to
> ensure that updates to meta have higher chance to succeed. But if that
> machine isn't Master, then we'd introduce yet one more service role to the
> deployment. And I'd say that different machine roles/service types required
> for HBase deployment is something we already have enough of.
>
> I think we can just change the balancer, and always move user regions from
> the regionserver which hosting meta region to other regionservers.  At some
> point there are user regions with meta region together, there is no matter,
> most of the time there is only meta region.
>
> 2016-11-16 19:06 GMT+08:00 Mikhail Antonov :
>
> > (side note @Yu - FYI there has been a number of fixes/optimizations to
> the
> > way we cache and invalidate caches of region locations on the client side
> > in 1.3, see for example HBASE-15658, HBASE-15654).
> >
> > On the topic -
> >
> > Hosting meta on the machine that doesn't serve user regions would help to
> > ensure that updates to meta have higher chance to succeed. But if that
> > machine isn't Master, then we'd introduce yet one more service role to
> the
> > deployment. And I'd say that different machine roles/service types
> required
> > for HBase deployment is something we already have enough of.
> >
> > I think this discussion is still at the same point as it was back then -
> it
> > looks like we're essentially comparing (A) an existing feature that works
> > and has practical benefits (as noted above on the thread) to the (B)
> > different way of doing things that's not finalized / released yet (please
> > correct me if I'm wrong)?
> >
> > And assuming B is finalized, I'm not sure that it actually fully
> addresses
> > the problems that A addresses now. That makes me inclined to think that
> > removing option A before we know that the actual problems it solves now
> are
> > completely addressed by other means would put us in a bad state.
> >
> > -Mikhail
> >
> > On Wed, Nov 16, 2016 at 2:13 AM, Yu Li  wrote:
> >
> > > Very late to the party +1 (Smile)
> > >
> > > We also offline discussed standalone meta server here in Alibaba since
> > > we've observed crazily high QPS on meta caused by online machine
> learning
> > > workload, and in the discussion we also mentioned pros. and cons. of
> > > serving meta on HMaster. Since quite some pros. already mentioned in
> the
> > > thread, I'd like to mention one cons. here: currently we could switch
> > > active master (almost) freely w/o affecting online service, so we could
> > do
> > > some hot-fix on ma

Re: [DISCUSS] No regions on Master node in 2.0

2016-11-16 Thread Ted Yu
Gary has a JIRA HBASE-16025 which would reduce the load on server hosting 
hbase:meta. 

FYI

> On Nov 16, 2016, at 2:13 AM, Yu Li  wrote:
> 
> Very late to the party +1 (Smile)
> 
> We also offline discussed standalone meta server here in Alibaba since
> we've observed crazily high QPS on meta caused by online machine learning
> workload, and in the discussion we also mentioned pros. and cons. of
> serving meta on HMaster. Since quite some pros. already mentioned in the
> thread, I'd like to mention one cons. here: currently we could switch
> active master (almost) freely w/o affecting online service, so we could do
> some hot-fix on master. But if we carry meta region on HMaster, the cost of
> switching master will increase a lot and the hot-switch may not be possible
> any more. Not sure whether this is an important thing for most users but
> still a point to share (Smile).
> 
> And maybe another point for discussion: if not placed on HMaster, should we
> have a standalone meta server or at least provide such an option?
> 
> Thanks.
> 
> Best Regards,
> Yu
> 
> On 16 November 2016 at 03:43,  wrote:
> 
>>> In the absence of more information, intuition says master carries meta
>> to avoid a whole class of problems.
>> Off-hand I think the class of problems we'll eliminate are problems that
>> are well understood and being constantly dealt with and hardened to this
>> day (ie puts to a region).
>>> I think we have to evaluate whether the new pv2 master works with
>> remote meta updates and the fact that those updates can fail partially or
>> succeed without theI think failing meta updates need to be dealt with
>> either way AFAIK eventually procedure state will be stored in HDFS which is
>> also a distributed system.
>> 
>> 
>> 
>>On Saturday, November 12, 2016 9:45 AM, Andrew Purtell <
>> apurt...@apache.org> wrote:
>> 
>> 
>> Thanks Stack and Enis. I concur, it's hard to say for those not intimate
>> with the new code.
>> 
>> In the absence of more information, intuition says master carries meta to
>> avoid a whole class of problems.
>> 
>>> On Fri, Nov 11, 2016 at 3:27 PM, Enis Söztutar  wrote:
>>> 
>>> Thanks Stack for reviving this.
>>> 
>>> How to move forward here? The Pv2 master is almost done. An ITBLL bakeoff
 of new Pv2 based assign vs a Master that exclusively hosts hbase:meta?
>>> I think we have to evaluate whether the new pv2 master works with remote
>>> meta
>>> updates and the fact that those updates can fail partially or succeed
>>> without the
>>> client getting the reply, etc. Sorry it has been some time I've looked at
>>> the design.
>>> Actually what would be very good is to have a design overview / write up
>> of
>>> the pv2
>>> in its current / final form so that we can evaluate. Last time I've
>> looked
>>> there was no
>>> detailed design doc at all.
>>> 
>>> 
 St.Ack
 
 
 
> 
>> 
>> 
>> 
>> --
>> Best regards,
>> 
>>  - Andy
>> 
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>> 
>> 
>> 


Re: [DISCUSS] No regions on Master node in 2.0

2016-11-16 Thread Bryan Beaudreault
I'd like to echo Yu Li. As an operator/user, it is very helpful to be able
to run the masters separately. This allows for hot fixes, but also
simplifies operations in times of crisis: if there's any real issue, we can
restart the masters at will without any fear of impact.

If the plan is to colocate the masters on the regionservers for simplicity,
I can understand that to make it easier for onboarding new users. But
please make it configurable, as those of us who have been doing it a while
would probably like to keep the separation.

Honestly, I'd love for other datastores to allow separation too, such as
Kafka, where we have annoyingly been hit by controller bugs a few times
but have little recourse since there is no separation of controller and
broker. So I'd rather not see HBase "regress" in this way.

For us we already have to run zookeeper separately, so we colocate our
HMasters on our zookeeper nodes. If the HMaster role was taken away, we'd
still need to run zookeeper so would have the same number of servers but
have lost the flexibility of running our masters separate from
regionservers.

On Wed, Nov 16, 2016 at 8:29 AM Ted Yu  wrote:

> Gary has a JIRA HBASE-16025 which would reduce the load on server hosting
> hbase:meta.
>
> FYI
>
> > On Nov 16, 2016, at 2:13 AM, Yu Li  wrote:
> >
> > Very late to the party +1 (Smile)
> >
> > We also offline discussed standalone meta server here in Alibaba since
> > we've observed crazily high QPS on meta caused by online machine learning
> > workload, and in the discussion we also mentioned pros. and cons. of
> > serving meta on HMaster. Since quite some pros. already mentioned in the
> > thread, I'd like to mention one cons. here: currently we could switch
> > active master (almost) freely w/o affecting online service, so we could
> do
> > some hot-fix on master. But if we carry meta region on HMaster, the cost
> of
> > switching master will increase a lot and the hot-switch may not be
> possible
> > any more. Not sure whether this is an important thing for most users but
> > still a point to share (Smile).
> >
> > And maybe another point for discussion: if not placed on HMaster, should
> we
> > have a standalone meta server or at least provide such an option?
> >
> > Thanks.
> >
> > Best Regards,
> > Yu
> >
> > On 16 November 2016 at 03:43,  wrote:
> >
> >>> In the absence of more information, intuition says master carries meta
> >> to avoid a whole class of problems.
> >> Off-hand I think the class of problems we'll eliminate are problems that
> >> are well understood and being constantly dealt with and hardened to this
> >> day (ie puts to a region).
> >>> I think we have to evaluate whether the new pv2 master works with
> >> remote meta updates and the fact that those updates can fail partially
> or
> >> succeed without theI think failing meta updates need to be dealt with
> >> either way AFAIK eventually procedure state will be stored in HDFS
> which is
> >> also a distributed system.
> >>
> >>
> >>
> >>On Saturday, November 12, 2016 9:45 AM, Andrew Purtell <
> >> apurt...@apache.org> wrote:
> >>
> >>
> >> Thanks Stack and Enis. I concur, it's hard to say for those not intimate
> >> with the new code.
> >>
> >> In the absence of more information, intuition says master carries meta
> to
> >> avoid a whole class of problems.
> >>
> >>> On Fri, Nov 11, 2016 at 3:27 PM, Enis Söztutar 
> wrote:
> >>>
> >>> Thanks Stack for reviving this.
> >>>
> >>> How to move forward here? The Pv2 master is almost done. An ITBLL
> bakeoff
>  of new Pv2 based assign vs a Master that exclusively hosts hbase:meta?
> >>> I think we have to evaluate whether the new pv2 master works with
> remote
> >>> meta
> >>> updates and the fact that those updates can fail partially or succeed
> >>> without the
> >>> client getting the reply, etc. Sorry it has been some time I've looked
> at
> >>> the design.
> >>> Actually what would be very good is to have a design overview / write
> up
> >> of
> >>> the pv2
> >>> in its current / final form so that we can evaluate. Last time I've
> >> looked
> >>> there was no
> >>> detailed design doc at all.
> >>>
> >>>
>  St.Ack
> 
> 
> 
> >
> >>
> >>
> >>
> >> --
> >> Best regards,
> >>
> >>  - Andy
> >>
> >> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >> (via Tom White)
> >>
> >>
> >>
>


Re: [DISCUSS] No regions on Master node in 2.0

2016-11-16 Thread Yu Li
@Mikhail:
Thank you sir for the reference to HBASE-15654, and good to know about the
efforts on optimizing the meta cache. But in our case there might be plenty
of new processes launched and accessing HBase at the same time, like at the
very beginning of some big YARN batch job, or during the failover of
streaming jobs after several retries, so the cache no longer exists and
meta will still experience big pressure. And as a side effect of
CallQueueTooBigException, there might be more retry requests issued to the
meta region, causing a vicious circle. I'll submit one or two JIRAs to try
relieving the pain, but for a thorough solution maybe a standalone meta
server with all handlers serving the meta region is better? And yes, I do
feel the pain of managing multiple roles, so this could be an optional but
not necessary choice?

Sorry guys if I disturbed the main topic here, but I do feel the "standalone
meta server" and "colocating meta region with HMaster" topics are related.
Let me know if any of you prefer me to open another thread for the
standalone meta server topic. Thanks.

Best Regards,
Yu

On 16 November 2016 at 22:56, Bryan Beaudreault 
wrote:

> I'd like to echo Yu Li. As an operator/user, it is very helpful to be able
> to run the masters separately. This allows for hot fixes, but also
> simplifies operations in times of crisis: if there's any real issue, we can
> restart the masters at will without any fear of impact.
>
> If the plan is to colocate the masters on the regionservers for simplicity,
> I can understand that to make it easier for onboarding new users. But
> please make it configurable, as those of us who have been doing it a while
> would probably like to keep the separation.
>
> Honestly, I'd love for other datastores to allow separation, such as Kafka
> which we have annoyingly been hit by controller bugs a few times but have
> little recourse since there is no separation of controller and broker. So
> I'd rather not see HBase "regress" in this way.
>
> For us we already have to run zookeeper separately, so we colocate our
> HMasters on our zookeeper nodes. If the HMaster role was taken away, we'd
> still need to run zookeeper so would have the same number of servers but
> have lost the flexibility of running our masters separate from
> regionservers.
>
> On Wed, Nov 16, 2016 at 8:29 AM Ted Yu  wrote:
>
> > Gary has a JIRA HBASE-16025 which would reduce the load on server hosting
> > hbase:meta.
> >
> > FYI
> >
> > > On Nov 16, 2016, at 2:13 AM, Yu Li  wrote:
> > >
> > > Very late to the party +1 (Smile)
> > >
> > > We also offline discussed standalone meta server here in Alibaba since
> > > we've observed crazily high QPS on meta caused by online machine
> learning
> > > workload, and in the discussion we also mentioned pros. and cons. of
> > > serving meta on HMaster. Since quite some pros. already mentioned in
> the
> > > thread, I'd like to mention one cons. here: currently we could switch
> > > active master (almost) freely w/o affecting online service, so we could
> > do
> > > some hot-fix on master. But if we carry meta region on HMaster, the
> cost
> > of
> > > switching master will increase a lot and the hot-switch may not be
> > possible
> > > any more. Not sure whether this is an important thing for most users
> but
> > > still a point to share (Smile).
> > >
> > > And maybe another point for discussion: if not placed on HMaster,
> should
> > we
> > > have a standalone meta server or at least provide such an option?
> > >
> > > Thanks.
> > >
> > > Best Regards,
> > > Yu
> > >
> > > On 16 November 2016 at 03:43,  wrote:
> > >
> > >>> In the absence of more information, intuition says master carries
> meta
> > >> to avoid a whole class of problems.
> > >> Off-hand I think the class of problems we'll eliminate are problems
> that
> > >> are well understood and being constantly dealt with and hardened to
> this
> > >> day (ie puts to a region).
> > >>> I think we have to evaluate whether the new pv2 master works with
> > >> remote meta updates and the fact that those updates can fail partially
> > or
> > >> succeed without theI think failing meta updates need to be dealt with
> > >> either way AFAIK eventually procedure state will be stored in HDFS
> > which is
> > >> also a distributed system.
> > >>
> > >>
> > >>
> > >>On Saturday, November 12, 2016 9:45 AM, Andrew Purtell <
> > >> apurt...@apache.org> wrote:
> > >>
> > >>
> > >> Thanks Stack and Enis. I concur, it's hard to say for those not
> intimate
> > >> with the new code.
> > >>
> > >> In the absence of more information, intuition says master carries meta
> > to
> > >> avoid a whole class of problems.
> > >>
> > >>> On Fri, Nov 11, 2016 at 3:27 PM, Enis Söztutar 
> > wrote:
> > >>>
> > >>> Thanks Stack for reviving this.
> > >>>
> > >>> How to move forward here? The Pv2 master is almost done. An ITBLL
> > bakeoff
> >  of new Pv2 based assign vs a Master that exclusively hosts
> hbase:meta?
> > >>> I think we have to evaluate whether 

Re: [DISCUSS] No regions on Master node in 2.0

2016-11-16 Thread Gary Helmling
Only answer to the above observation is demonstration that ITBLL with meta
not on master is as robust as runs that have master carrying meta.


Agree that this is a prerequisite.  Another useful measure might be the
delay before an assignment under load is visible to clients.



>
> But making RPC work for assignment here is precisely the problem.  There's
> no reason master should have to contend with user requests to meta in
order
> to be able to make updates.  And until clients can actually see the
change,
> it doesn't really matter if the master state has been updated or not.
>
>
In hbase-2.0.0, there'll be a new regime. hbase:meta writing will be single
writer by master only. No more contention on writes. Regards contention
reading, this is unavoidable.



In hbase-2.0.0, only the final publishing step, what we want clients to
see, will update hbase:meta. All other transitions will be internal.


There is still contention between readers and writers over available
handler threads.  With a local meta region, you don't have assignment
manager having to contend for handler threads in order to perform writes.
This is huge for reliability.

Without meta on master, it has not been hard to reproduce scenarios where
HBase _cannot_ start up from a cold roll with high client traffic. Region
assignments just cannot complete, because master has to compete with all
the clients attempting to read new region locations from meta and can't
get in the queue. With meta on master, this goes away completely.


> >
> >
> > Above is fine except for the bit where we need to be able to field
reads.
> > Lets distribute the data to be read over the cluster rather than treat
> meta
> > reads with kid gloves hosted on a 'special' server; let these 'reads' be
> > like any other read the cluster takes (see next point)
> >
> >
> In my opinion, the real "special" part here is the master bit -- which I
> think we should be working to make less special and more just a normal bit
> of housekeeping spread across nodes -- not the regionserver role.  It only
> looks special right now because the evolution has stopped in the middle.
I
> really don't think enshrining master as a separate process is the right
way
> forward for us.
>
>
I always liked this notion.

To be worked out is how Master and hbase:meta hosting would interplay (The
RS that is designated Master would also host hbase:meta? Would it be
exclusively hosting hbase;meta or hbase:meta would move with Master
function [Stuff we've talked about before]).


I think meta should be tied to the master function for the reasons
described above. It's key that the updates to meta be local.  I don't think
that has to mean that only meta regions are hosted by the regionserver
serving as master.  There could be other user regions hosted as well, given
the server has adequate headroom to handle the master functions.



> > >
> > Is this just because meta had a dedicated server?
> >
> >
> I'm sure that having dedicated resources for meta helps.  But I don't
think
> that's sufficient.  The key is that master writes to meta are local, and
do
> not have to contend with the user requests to meta.
>
> It seems premature to be discussing dropping a working implementation
which
> eliminates painful parts of distributed consensus, until we have a
complete
> working alternative to evaluate.  Until then, why are we looking at
> features that are in use and work well?
>
>
>
How to move forward here? The Pv2 master is almost done. An ITBLL bakeoff
of new Pv2 based assign vs a Master that exclusively hosts hbase:meta?


I think that's a necessary test for proving out the new AM implementation.
But remember that we are comparing a feature which is actively supporting
production workloads with a line of active development.  I think there
should also be additional testing around situations of high meta load and
end-to-end assignment latency.


Re: [DISCUSS] No regions on Master node in 2.0

2016-11-16 Thread Gary Helmling
On Tue, Nov 15, 2016 at 11:44 AM  wrote:

> > In the absence of more information, intuition says master carries meta
> to avoid a whole class of problems.
> Off-hand I think the class of problems we'll eliminate are problems that
> are well understood and being constantly dealt with and hardened to this
> day (ie puts to a region).
> > I think we have to evaluate whether the new pv2 master works with
> remote meta updates and the fact that those updates can fail partially or
> succeed without theI think failing meta updates need to be dealt with
> either way AFAIK eventually procedure state will be stored in HDFS which is
> also a distributed system.
>
>
I don't think these are really equivalent.  If you encounter a bad DN in
the write pipeline, you can construct a new write pipeline with a new set
of DNs.  If you encounter an error updating meta, which other copy of meta
do you try to write to?

Also, I don't really see Pv2 as being able to solve the whole reliability
problem here.  If it leads to more reliable region assignments that's
great.  But if a region is assigned, but clients can't see it because meta
can't be updated, I don't think that's a big improvement.  The region's
data is offline from the client perspective until it can actually see the
new region location.


Re: [DISCUSS] No regions on Master node in 2.0

2016-11-16 Thread Gary Helmling
>
> Gary has a JIRA HBASE-16025 which would reduce the load on server hosting
> hbase:meta.
>
>
Thanks for bringing this issue up, Ted.  It still needs to be fixed. Due to
the shift of table state into meta, the git master branch will currently
trigger calls to meta for every call retry.

This is exactly the kind of regression which we are at least shielded from
when master hosts meta.  Even though bugs/changes can trigger a much higher
client read load on meta, master can still complete writes locally.


Re: [DISCUSS] No regions on Master node in 2.0

2016-11-16 Thread Ted Yu
HBASE-16025 is currently marked Critical. 

I think it should be a blocker. 

Cheers

On Nov 16, 2016, at 11:10 AM, Gary Helmling  wrote:

>> 
>> Gary has a JIRA HBASE-16025 which would reduce the load on server hosting
>> hbase:meta.
> Thanks for bringing this issue up, Ted.  It still needs to be fixed. Due to
> the shift of table state into meta, the git master branch will currently
> trigger calls to meta for every call retry.
> 
> This is exactly the kind of regression which we are at least shielded from
> when master hosts meta.  Even though bugs/changes can trigger a much higher
> client read load on meta, master can still complete writes locally.


Re: [DISCUSS] No regions on Master node in 2.0

2016-11-16 Thread Gary Helmling
On Wed, Nov 16, 2016 at 11:36 AM Ted Yu  wrote:

> HBASE-16025 is currently marked Critical.
>
> I think it should be a blocker.
>
>
Done.


Re: [DISCUSS] No regions on Master node in 2.0

2016-11-16 Thread Stack
On Wed, Nov 16, 2016 at 10:57 AM, Gary Helmling  wrote:

> Only answer to the above observation is demonstration that ITBLL with meta
> not on master is as robust as runs that have master carrying meta.
>
>
> Agree that this is a prerequisite.  Another useful measure might be the
> delay before an assignment under load is visible to clients.
>

> There is still contention between readers and writers over available
> handler threads.  With a local meta region, you don't have assignment
> manager having to contend for handler threads in order to perform writes.
> This is huge for reliability.
>
Without meta on master, it has not been hard to reproduce scenarios where
> HBase _cannot_ start up from a cold roll with high client traffic.  Region
> assignments just can not complete because master has to compete with all
> the clients attempting to read new region locations from meta and can't get
> in the queue.  With meta on master, this goes away completely.
>
>
>
This latter is a scenario to defend against: all priority handlers are
occupied by clients trying to read hbase:meta, including the master, which
is trying to come up by first reading the current state of hbase:meta and
subsequently writing it. Seems easy enough to repro and to fix. Along w/
the above ITBLL equivalence proofs, let us take on this saturated
hbase:meta scenario as a proofing test.
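
For reference, the knobs in play for that repro are roughly the following
hbase-site.xml fragment (property names as I recall them from
hbase-default.xml, so verify against your version); the idea is to make
sure enough priority handlers are reserved that system/meta traffic is not
starved by ordinary client reads:

  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>60</value>
    <!-- general-purpose handlers that user reads/writes contend for -->
  </property>
  <property>
    <name>hbase.regionserver.metahandler.count</name>
    <value>30</value>
    <!-- priority handlers used for hbase:meta and other system requests -->
  </property>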



> > > Above is fine except for the bit where we need to be able to field
> reads.
> > > Lets distribute the data to be read over the cluster rather than treat
> > meta
> > > reads with kid gloves hosted on a 'special' server; let these 'reads'
> be
> > > like any other read the cluster takes (see next point)
> > >
> > >
> > In my opinion, the real "special" part here is the master bit -- which I
> > think we should be working to make less special and more just a normal
> bit
> > of housekeeping spread across nodes -- not the regionserver role.  It
> only
> > looks special right now because the evolution has stopped in the middle.
> I
> > really don't think enshrining master as a separate process is the right
> way
> > forward for us.
> >
> >
> I always liked this notion.
>
> To be worked out is how Master and hbase:meta hosting would interplay (The
> RS that is designated Master would also host hbase:meta? Would it be
> exclusively hosting hbase;meta or hbase:meta would move with Master
> function [Stuff we've talked about before]).
>
>
> I think meta should be tied to the master function for the reasons
> described above. It's key that the updates to meta be local.  I don't think
> that has to mean that only meta regions are hosted by the regionserver
> serving as master.  There could be other user regions hosted as well, given
> the server has adequate headroom to handle the master functions.
>
>
I don't like introducing a new node type, the meta-carrying-master. Our
cluster form changes (see the BryanB concern above).

Also, some fundamentals are badly broken if we are unable to reliably
maintain a table over RPC even with dedicated priority handlers and a
single writer.

(I'll not repeat other old args on why all our meta eggs on the one node
basket is the wrong direction IMO).

Do you folks run the meta-carrying-master form G?

One way to proceed would be to preserve the master carrying meta as an
option. It'd not be the default. I don't like options like this because my
guess is that the master-carrying-meta option would get no testing (other
than by folks like yourselves, G).

St.Ack





>
>
> > > >
> > > Is this just because meta had a dedicated server?
> > >
> > >
> > I'm sure that having dedicated resources for meta helps.  But I don't
> think
> > that's sufficient.  The key is that master writes to meta are local, and
> do
> > not have to contend with the user requests to meta.
> >
> > It seems premature to be discussing dropping a working implementation
> which
> > eliminates painful parts of distributed consensus, until we have a
> complete
> > working alternative to evaluate.  Until then, why are we looking at
> > features that are in use and work well?
> >
> >
> >
> How to move forward here? The Pv2 master is almost done. An ITBLL bakeoff
> of new Pv2 based assign vs a Master that exclusively hosts hbase:meta?
>
>
> I think that's a necessary test for proving out the new AM implementation.
> But remember that we are comparing a feature which is actively supporting
> production workloads with a line of active development.  I think there
> should also be additional testing around situations of high meta load and
> end-to-end assignment latency.
>


Re: [DISCUSS] No regions on Master node in 2.0

2016-11-16 Thread Stack
On Wed, Nov 16, 2016 at 10:44 PM, Stack  wrote:

> On Wed, Nov 16, 2016 at 10:57 AM, Gary Helmling 
> wrote:
>
>>
>> Do you folks run the meta-carrying-master form G?
>
> Pardon me. I missed a paragraph. I see you folks do deploy this form.
St.Ack





> St.Ack
>
>
>
>
>
>>
>>
>> > > >
>> > > Is this just because meta had a dedicated server?
>> > >
>> > >
>> > I'm sure that having dedicated resources for meta helps.  But I don't
>> think
>> > that's sufficient.  The key is that master writes to meta are local, and
>> do
>> > not have to contend with the user requests to meta.
>> >
>> > It seems premature to be discussing dropping a working implementation
>> which
>> > eliminates painful parts of distributed consensus, until we have a
>> complete
>> > working alternative to evaluate.  Until then, why are we looking at
>> > features that are in use and work well?
>> >
>> >
>> >
>> How to move forward here? The Pv2 master is almost done. An ITBLL bakeoff
>> of new Pv2 based assign vs a Master that exclusively hosts hbase:meta?
>>
>>
>> I think that's a necessary test for proving out the new AM implementation.
>> But remember that we are comparing a feature which is actively supporting
>> production workloads with a line of active development.  I think there
>> should also be additional testing around situations of high meta load and
>> end-to-end assignment latency.
>>
>
>


Re: [DISCUSS] No regions on Master node in 2.0

2016-11-18 Thread Francis Liu
Just some extra bits of information:

Another way to isolate user regions from meta is to create a regionserver
group (HBASE-6721) dedicated to the system tables. This is what we do at Y!.
If the load on meta gets too high (and it does), we split meta so the load
gets spread across more regionservers (HBASE-11165); this way availability
for any client is not affected. Though I agree with Stack that something is
really broken if high-priority rpcs cannot get through to meta.
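
For anyone wanting to try this shape, the shell side of HBASE-6721 looks
roughly like the following (command names as in releases that ship the
rsgroup feature; internal builds may differ, and the group and server
names here are just placeholders):

  hbase> add_rsgroup 'system'
  hbase> move_servers_rsgroup 'system', ['rs-system-1.example.com:16020']
  hbase> move_tables_rsgroup 'system', ['hbase:meta']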
Does single writer to meta refer to the zkless assignment feature? If so,
hasn't that feature been available since 0.98.6 (meta _not_ on master)? And
we've been running with it on all our clusters for quite some time now
(with some enhancements, i.e. split meta, etc.).

Cheers,
Francis

On Wednesday, November 16, 2016 10:47 PM, Stack  wrote:
 

 On Wed, Nov 16, 2016 at 10:44 PM, Stack  wrote:

> On Wed, Nov 16, 2016 at 10:57 AM, Gary Helmling 
> wrote:
>
>>
>> Do you folks run the meta-carrying-master form G?
>
> Pardon me. I missed a paragraph. I see you folks do deploy this form.
St.Ack





> St.Ack
>
>
>
>
>
>>
>>
>> > > >
>> > > Is this just because meta had a dedicated server?
>> > >
>> > >
>> > I'm sure that having dedicated resources for meta helps.  But I don't
>> think
>> > that's sufficient.  The key is that master writes to meta are local, and
>> do
>> > not have to contend with the user requests to meta.
>> >
>> > It seems premature to be discussing dropping a working implementation
>> which
>> > eliminates painful parts of distributed consensus, until we have a
>> complete
>> > working alternative to evaluate.  Until then, why are we looking at
>> > features that are in use and work well?
>> >
>> >
>> >
>> How to move forward here? The Pv2 master is almost done. An ITBLL bakeoff
>> of new Pv2 based assign vs a Master that exclusively hosts hbase:meta?
>>
>>
>> I think that's a necessary test for proving out the new AM implementation.
>> But remember that we are comparing a feature which is actively supporting
>> production workloads with a line of active development.  I think there
>> should also be additional testing around situations of high meta load and
>> end-to-end assignment latency.
>>
>
>


   

Re: [DISCUSS] No regions on Master node in 2.0

2017-06-03 Thread Stack
Back to this hanging thread full of goodness.

I'm here now as your RM for hbase2 trying to extract the minimum set of
deploy forms we need to support.

On a reread of the above, I am thinking the hbase2 default is to have the
same shape as hbase1, with the Master carrying NO regions (see operators Bryan
and Yu's petitions above). Making it so the Master carries system tables only is
a deploy that our brethren from East Palo Alto argue is the most robust
deploy type, so as RM I'll ensure this deploy form is possible (the bulk of
hbase2 testing, at least by myself, will use the default). Work to make it
so any server can carry regions and the Master function can be assumed by
any server has not seen follow-through, so I'm punting on this as an option.

If you think different regards hbase2, please speak up.

Thanks,
St.Ack




On Fri, Nov 18, 2016 at 11:37 AM, Francis Liu 
wrote:

> Just some extra bits of information:
>
> Another way to isolate user regions from meta is you can create a
> regionserver group (HBASE-6721) dedicated to the system tables. This is
> what we do at Y!. If the load on meta gets too high (and it does), we split
> meta so the load gets spread across more regionservers (HBASE-11165) this
> way availability for any client is not affected. Tho agreeing with Stack
> that something is really broken if high priority rpcs cannot get through to
> meta.
> Does single writer to meta refer to the zkless assignment feature? If
> isn't that feature has been available since 0.98.6 (meta _not_ on master)?
> and we've been running with it on all our clusters for quite sometime now
> (with some enhancements ie split meta etc).
> Cheers,Francis
>
> On Wednesday, November 16, 2016 10:47 PM, Stack 
> wrote:
>
>
>  On Wed, Nov 16, 2016 at 10:44 PM, Stack  wrote:
>
> > On Wed, Nov 16, 2016 at 10:57 AM, Gary Helmling 
> > wrote:
> >
> >>
> >> Do you folks run the meta-carrying-master form G?
> >
> > Pardon me. I missed a paragraph. I see you folks do deploy this form.
> St.Ack
>
>
>
>
>
> > St.Ack
> >
> >
> >
> >
> >
> >>
> >>
> >> > > >
> >> > > Is this just because meta had a dedicated server?
> >> > >
> >> > >
> >> > I'm sure that having dedicated resources for meta helps.  But I don't
> >> think
> >> > that's sufficient.  The key is that master writes to meta are local,
> and
> >> do
> >> > not have to contend with the user requests to meta.
> >> >
> >> > It seems premature to be discussing dropping a working implementation
> >> which
> >> > eliminates painful parts of distributed consensus, until we have a
> >> complete
> >> > working alternative to evaluate.  Until then, why are we looking at
> >> > features that are in use and work well?
> >> >
> >> >
> >> >
> >> How to move forward here? The Pv2 master is almost done. An ITBLL
> bakeoff
> >> of new Pv2 based assign vs a Master that exclusively hosts hbase:meta?
> >>
> >>
> >> I think that's a necessary test for proving out the new AM
> implementation.
> >> But remember that we are comparing a feature which is actively
> supporting
> >> production workloads with a line of active development.  I think there
> >> should also be additional testing around situations of high meta load
> and
> >> end-to-end assignment latency.
> >>
> >
> >
>
>
>
>


Re: [DISCUSS] No regions on Master node in 2.0

2017-06-06 Thread Elliott Clark
That doesn't solve the same problem. Dedicating a remote server for the
system tables still means that all the master-to-system-table mutations
and reads are done over rpc. That still means that the most important
operations are competing for rpc queue time.

On Fri, Nov 18, 2016 at 11:37 AM, Francis Liu 
wrote:

> Just some extra bits of information:
>
> Another way to isolate user regions from meta is you can create a
> regionserver group (HBASE-6721) dedicated to the system tables. This is
> what we do at Y!. If the load on meta gets too high (and it does), we split
> meta so the load gets spread across more regionservers (HBASE-11165) this
> way availability for any client is not affected. Tho agreeing with Stack
> that something is really broken if high priority rpcs cannot get through to
> meta.
> Does single writer to meta refer to the zkless assignment feature? If
> isn't that feature has been available since 0.98.6 (meta _not_ on master)?
> and we've been running with it on all our clusters for quite sometime now
> (with some enhancements ie split meta etc).
> Cheers,Francis
>
> On Wednesday, November 16, 2016 10:47 PM, Stack 
> wrote:
>
>
>  On Wed, Nov 16, 2016 at 10:44 PM, Stack  wrote:
>
> > On Wed, Nov 16, 2016 at 10:57 AM, Gary Helmling 
> > wrote:
> >
> >>
> >> Do you folks run the meta-carrying-master form G?
> >
> > Pardon me. I missed a paragraph. I see you folks do deploy this form.
> St.Ack
>
>
>
>
>
> > St.Ack
> >
> >
> >
> >
> >
> >>
> >>
> >> > > >
> >> > > Is this just because meta had a dedicated server?
> >> > >
> >> > >
> >> > I'm sure that having dedicated resources for meta helps.  But I don't
> >> think
> >> > that's sufficient.  The key is that master writes to meta are local,
> and
> >> do
> >> > not have to contend with the user requests to meta.
> >> >
> >> > It seems premature to be discussing dropping a working implementation
> >> which
> >> > eliminates painful parts of distributed consensus, until we have a
> >> complete
> >> > working alternative to evaluate.  Until then, why are we looking at
> >> > features that are in use and work well?
> >> >
> >> >
> >> >
> >> How to move forward here? The Pv2 master is almost done. An ITBLL
> bakeoff
> >> of new Pv2 based assign vs a Master that exclusively hosts hbase:meta?
> >>
> >>
> >> I think that's a necessary test for proving out the new AM
> implementation.
> >> But remember that we are comparing a feature which is actively
> supporting
> >> production workloads with a line of active development.  I think there
> >> should also be additional testing around situations of high meta load
> and
> >> end-to-end assignment latency.
> >>
> >
> >
>
>
>
>


Re: [DISCUSS] No regions on Master node in 2.0

2017-06-06 Thread stack
Pardon my imprecision.  Will ensure carrying system tables short circuits
RPC.

S

On Jun 6, 2017 9:17 AM, "Elliott Clark"  wrote:

That doesn't solve the same problem. Dedicating a remote server for the
system tables still means that all the master to system tables mutations
and reads are done over rpc. That still means that the most important
operations are competing for rpc queue time.

On Fri, Nov 18, 2016 at 11:37 AM, Francis Liu 
wrote:

> Just some extra bits of information:
>
> Another way to isolate user regions from meta is you can create a
> regionserver group (HBASE-6721) dedicated to the system tables. This is
> what we do at Y!. If the load on meta gets too high (and it does), we
split
> meta so the load gets spread across more regionservers (HBASE-11165) this
> way availability for any client is not affected. Tho agreeing with Stack
> that something is really broken if high priority rpcs cannot get through
to
> meta.
> Does single writer to meta refer to the zkless assignment feature? If
> isn't that feature has been available since 0.98.6 (meta _not_ on master)?
> and we've been running with it on all our clusters for quite sometime now
> (with some enhancements ie split meta etc).
> Cheers,Francis
>
> On Wednesday, November 16, 2016 10:47 PM, Stack 
> wrote:
>
>
>  On Wed, Nov 16, 2016 at 10:44 PM, Stack  wrote:
>
> > On Wed, Nov 16, 2016 at 10:57 AM, Gary Helmling 
> > wrote:
> >
> >>
> >> Do you folks run the meta-carrying-master form G?
> >
> > Pardon me. I missed a paragraph. I see you folks do deploy this form.
> St.Ack
>
>
>
>
>
> > St.Ack
> >
> >
> >
> >
> >
> >>
> >>
> >> > > >
> >> > > Is this just because meta had a dedicated server?
> >> > >
> >> > >
> >> > I'm sure that having dedicated resources for meta helps.  But I don't
> >> think
> >> > that's sufficient.  The key is that master writes to meta are local,
> and
> >> do
> >> > not have to contend with the user requests to meta.
> >> >
> >> > It seems premature to be discussing dropping a working implementation
> >> which
> >> > eliminates painful parts of distributed consensus, until we have a
> >> complete
> >> > working alternative to evaluate.  Until then, why are we looking at
> >> > features that are in use and work well?
> >> >
> >> >
> >> >
> >> How to move forward here? The Pv2 master is almost done. An ITBLL
> bakeoff
> >> of new Pv2 based assign vs a Master that exclusively hosts hbase:meta?
> >>
> >>
> >> I think that's a necessary test for proving out the new AM
> implementation.
> >> But remember that we are comparing a feature which is actively
> supporting
> >> production workloads with a line of active development.  I think there
> >> should also be additional testing around situations of high meta load
> and
> >> end-to-end assignment latency.
> >>
> >
> >
>
>
>
>


Re: [DISCUSS] No regions on Master node in 2.0

2017-06-06 Thread Francis Liu
> That doesn't solve the same problem.
Agreed. As mentioned, regionserver groups only provide user-system region isolation.

> That still means that the most important operations are competing for rpc queue time.
Given the previous setup, shouldn't meta access contention be addressed by higher-priority rpc access?
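
For reference, a small sketch of the knob that reserves dedicated higher-priority
handlers for meta and other system-table requests on a regionserver (the property
name hbase.regionserver.metahandler.count is recalled from the existing
configuration and the value shown is arbitrary; treat both as assumptions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MetaPriorityHandlersSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();

    // Handlers reserved for high-priority requests (hbase:meta and other
    // system-table traffic), kept separate from the general user-request pool.
    conf.setInt("hbase.regionserver.metahandler.count", 20);

    System.out.println(conf.getInt("hbase.regionserver.metahandler.count", -1));
  }
}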

On Tuesday, June 6, 2017 9:17 AM, Elliott Clark  wrote:
 

 That doesn't solve the same problem. Dedicating a remote server for the
system tables still means that all the master to system tables mutations
and reads are done over rpc. That still means that the most important
operations are competing for rpc queue time.

On Fri, Nov 18, 2016 at 11:37 AM, Francis Liu 
wrote:

> Just some extra bits of information:
>
> Another way to isolate user regions from meta is you can create a
> regionserver group (HBASE-6721) dedicated to the system tables. This is
> what we do at Y!. If the load on meta gets too high (and it does), we split
> meta so the load gets spread across more regionservers (HBASE-11165) this
> way availability for any client is not affected. Tho agreeing with Stack
> that something is really broken if high priority rpcs cannot get through to
> meta.
> Does single writer to meta refer to the zkless assignment feature? If
> isn't that feature has been available since 0.98.6 (meta _not_ on master)?
> and we've been running with it on all our clusters for quite sometime now
> (with some enhancements ie split meta etc).
> Cheers,Francis
>
>    On Wednesday, November 16, 2016 10:47 PM, Stack 
> wrote:
>
>
>  On Wed, Nov 16, 2016 at 10:44 PM, Stack  wrote:
>
> > On Wed, Nov 16, 2016 at 10:57 AM, Gary Helmling 
> > wrote:
> >
> >>
> >> Do you folks run the meta-carrying-master form G?
> >
> > Pardon me. I missed a paragraph. I see you folks do deploy this form.
> St.Ack
>
>
>
>
>
> > St.Ack
> >
> >
> >
> >
> >
> >>
> >>
> >> > > >
> >> > > Is this just because meta had a dedicated server?
> >> > >
> >> > >
> >> > I'm sure that having dedicated resources for meta helps.  But I don't
> >> think
> >> > that's sufficient.  The key is that master writes to meta are local,
> and
> >> do
> >> > not have to contend with the user requests to meta.
> >> >
> >> > It seems premature to be discussing dropping a working implementation
> >> which
> >> > eliminates painful parts of distributed consensus, until we have a
> >> complete
> >> > working alternative to evaluate.  Until then, why are we looking at
> >> > features that are in use and work well?
> >> >
> >> >
> >> >
> >> How to move forward here? The Pv2 master is almost done. An ITBLL
> bakeoff
> >> of new Pv2 based assign vs a Master that exclusively hosts hbase:meta?
> >>
> >>
> >> I think that's a necessary test for proving out the new AM
> implementation.
> >> But remember that we are comparing a feature which is actively
> supporting
> >> production workloads with a line of active development.  I think there
> >> should also be additional testing around situations of high meta load
> and
> >> end-to-end assignment latency.
> >>
> >
> >
>
>
>
>


   

Re: [DISCUSS] No regions on Master node in 2.0

2017-06-06 Thread Enis Söztutar
I still have to review the full AMv2 meta updates path to see whether there
may still be "split brain" due to the extra RPC to a remote server. But I
really like the notion of keeping the deployment topology of branch-1 by
default.

The fact is that 2.0 is already lagging, and minimizing the set of changes
to get a release out earlier is in the best interest of the community.

Enis

On Tue, Jun 6, 2017 at 10:38 AM, Francis Liu  wrote:

> > That doesn't solve the same problem.Agreed as mentioned regionserver
> groups only provides user-system region isolation.
> > That still means that the most important operations are competing for
> rpc queue time.Given the previous setup. For meta access contention this
> should be addressed by higher priority rpc access no?
>
> On Tuesday, June 6, 2017 9:17 AM, Elliott Clark 
> wrote:
>
>
>  That doesn't solve the same problem. Dedicating a remote server for the
> system tables still means that all the master to system tables mutations
> and reads are done over rpc. That still means that the most important
> operations are competing for rpc queue time.
>
> On Fri, Nov 18, 2016 at 11:37 AM, Francis Liu 
> wrote:
>
> > Just some extra bits of information:
> >
> > Another way to isolate user regions from meta is you can create a
> > regionserver group (HBASE-6721) dedicated to the system tables. This is
> > what we do at Y!. If the load on meta gets too high (and it does), we
> split
> > meta so the load gets spread across more regionservers (HBASE-11165) this
> > way availability for any client is not affected. Tho agreeing with Stack
> > that something is really broken if high priority rpcs cannot get through
> to
> > meta.
> > Does single writer to meta refer to the zkless assignment feature? If
> > isn't that feature has been available since 0.98.6 (meta _not_ on
> master)?
> > and we've been running with it on all our clusters for quite sometime now
> > (with some enhancements ie split meta etc).
> > Cheers,Francis
> >
> >On Wednesday, November 16, 2016 10:47 PM, Stack 
> > wrote:
> >
> >
> >  On Wed, Nov 16, 2016 at 10:44 PM, Stack  wrote:
> >
> > > On Wed, Nov 16, 2016 at 10:57 AM, Gary Helmling 
> > > wrote:
> > >
> > >>
> > >> Do you folks run the meta-carrying-master form G?
> > >
> > > Pardon me. I missed a paragraph. I see you folks do deploy this form.
> > St.Ack
> >
> >
> >
> >
> >
> > > St.Ack
> > >
> > >
> > >
> > >
> > >
> > >>
> > >>
> > >> > > >
> > >> > > Is this just because meta had a dedicated server?
> > >> > >
> > >> > >
> > >> > I'm sure that having dedicated resources for meta helps.  But I
> don't
> > >> think
> > >> > that's sufficient.  The key is that master writes to meta are local,
> > and
> > >> do
> > >> > not have to contend with the user requests to meta.
> > >> >
> > >> > It seems premature to be discussing dropping a working
> implementation
> > >> which
> > >> > eliminates painful parts of distributed consensus, until we have a
> > >> complete
> > >> > working alternative to evaluate.  Until then, why are we looking at
> > >> > features that are in use and work well?
> > >> >
> > >> >
> > >> >
> > >> How to move forward here? The Pv2 master is almost done. An ITBLL
> > bakeoff
> > >> of new Pv2 based assign vs a Master that exclusively hosts hbase:meta?
> > >>
> > >>
> > >> I think that's a necessary test for proving out the new AM
> > implementation.
> > >> But remember that we are comparing a feature which is actively
> > supporting
> > >> production workloads with a line of active development.  I think there
> > >> should also be additional testing around situations of high meta load
> > and
> > >> end-to-end assignment latency.
> > >>
> > >
> > >
> >
> >
> >
> >
>
>
>
>


Re: [DISCUSS] No regions on Master node in 2.0

2017-08-14 Thread Stack
(Trying to tie off this thread...)

HBASE-18511 changes the configuration hbase.balancer.tablesOnMaster from a
list of table names that the master can carry (with 'none' meaning no
tables on the master) to instead be a boolean as to whether the Master
carries tables/regions or not.

When true, the master acts like any other regionserver in the cluster,
hosting regions while also fulfilling the master role. If false, the master
carries no tables (false is the default for hbase-2.0.0).

Another boolean configuration,
hbase.balancer.tablesOnMaster.systemTablesOnly, when set to true, enables
hbase.balancer.tablesOnMaster and makes it so the Master hosts system
tables exclusively (the long-time deploy mode in master branch and branch-2
up until HBASE-18511 goes in).
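
For illustration, here is a minimal sketch of the two booleans just described,
set via the plain Hadoop Configuration API (the class name is made up for the
example; in practice these settings belong in hbase-site.xml):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class TablesOnMasterSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();

    // hbase-2.0.0 default after HBASE-18511: the Master carries no regions at all.
    conf.setBoolean("hbase.balancer.tablesOnMaster", false);

    // The old master-branch deploy shape instead: Master hosts system tables
    // exclusively. Setting this to true implies tablesOnMaster as well.
    // conf.setBoolean("hbase.balancer.tablesOnMaster.systemTablesOnly", true);

    System.out.println(conf.getBoolean("hbase.balancer.tablesOnMaster", false));
  }
}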

As part of HBASE-18511, I verified that RPCs are short-circuited if the
region is local to the master.

The change of hbase.balancer.tablesOnMaster from a String list to a boolean and
the addition of a simple boolean to enable system-tables-on-Master were done
to constrain what operators might ask for via this master configuration.
Stipulating which tables are bound to the Master server verges into
regionserver grouping territory, a more robust means of specifying table
and server combinations. Operators should use the latter if they want
layouts more exotic than those supplied by the provided booleans.

Thanks,
St.Ack


On Tue, Jun 6, 2017 at 11:20 AM, Enis Söztutar  wrote:

> I still have to review the full AMv2 meta updates path to see whether there
> may still be "split brain" due to the extra RPC to a remote server. But I
> really like the notion of keeping the deployment topology of branch-1 by
> default.
>
> The fact is that 2.0 is already lagging, and minimizing the set of changes
> to get a release out earlier is in the best interest of the community.
>
> Enis
>
> On Tue, Jun 6, 2017 at 10:38 AM, Francis Liu  wrote:
>
> > > That doesn't solve the same problem.Agreed as mentioned regionserver
> > groups only provides user-system region isolation.
> > > That still means that the most important operations are competing for
> > rpc queue time.Given the previous setup. For meta access contention this
> > should be addressed by higher priority rpc access no?
> >
> > On Tuesday, June 6, 2017 9:17 AM, Elliott Clark 
> > wrote:
> >
> >
> >  That doesn't solve the same problem. Dedicating a remote server for the
> > system tables still means that all the master to system tables mutations
> > and reads are done over rpc. That still means that the most important
> > operations are competing for rpc queue time.
> >
> > On Fri, Nov 18, 2016 at 11:37 AM, Francis Liu 
> > wrote:
> >
> > > Just some extra bits of information:
> > >
> > > Another way to isolate user regions from meta is you can create a
> > > regionserver group (HBASE-6721) dedicated to the system tables. This is
> > > what we do at Y!. If the load on meta gets too high (and it does), we
> > split
> > > meta so the load gets spread across more regionservers (HBASE-11165)
> this
> > > way availability for any client is not affected. Tho agreeing with
> Stack
> > > that something is really broken if high priority rpcs cannot get
> through
> > to
> > > meta.
> > > Does single writer to meta refer to the zkless assignment feature? If
> > > isn't that feature has been available since 0.98.6 (meta _not_ on
> > master)?
> > > and we've been running with it on all our clusters for quite sometime
> now
> > > (with some enhancements ie split meta etc).
> > > Cheers,Francis
> > >
> > >On Wednesday, November 16, 2016 10:47 PM, Stack 
> > > wrote:
> > >
> > >
> > >  On Wed, Nov 16, 2016 at 10:44 PM, Stack  wrote:
> > >
> > > > On Wed, Nov 16, 2016 at 10:57 AM, Gary Helmling  >
> > > > wrote:
> > > >
> > > >>
> > > >> Do you folks run the meta-carrying-master form G?
> > > >
> > > > Pardon me. I missed a paragraph. I see you folks do deploy this form.
> > > St.Ack
> > >
> > >
> > >
> > >
> > >
> > > > St.Ack
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >>
> > > >>
> > > >> > > >
> > > >> > > Is this just because meta had a dedicated server?
> > > >> > >
> > > >> > >
> > > >> > I'm sure that having dedicated resources for meta helps.  But I
> > don't
> > > >> think
> > > >> > that's sufficient.  The key is that master writes to meta are
> local,
> > > and
> > > >> do
> > > >> > not have to contend with the user requests to meta.
> > > >> >
> > > >> > It seems premature to be discussing dropping a working
> > implementation
> > > >> which
> > > >> > eliminates painful parts of distributed consensus, until we have a
> > > >> complete
> > > >> > working alternative to evaluate.  Until then, why are we looking
> at
> > > >> > features that are in use and work well?
> > > >> >
> > > >> >
> > > >> >
> > > >> How to move forward here? The Pv2 master is almost done. An ITBLL
> > > bakeoff
> > > >> of new Pv2 based assign vs a Master that exclusively hosts
> hbase:meta?
> > > >>
> > > >>
> > > >> I think that's a nec

Re: [DISCUSS] No regions on Master node in 2.0

2016-04-08 Thread Elliott Clark
# Without meta on master, we double assign and lose data.

That is currently a fact that I have seen over and over on multiple loaded
clusters. Weighing some abstract cleanup of deployment against losing data is a
no-brainer for me. Master assignment, region split, region merge are all
risky, and all places that HBase can lose data. Meta being hosted on the
master makes communication easier and less flakey. Running ITBLL in a loop
that creates a new table every time, without meta on master everything
will fail pretty reliably in ~2 days. With meta on master things pass MUCH
more often.

# Master hosting the system tables locates the system tables as close as
possible to the machine that will be mutating the data.

Data locality is something that we all work for. Short circuit local reads,
Caching blocks in jvm, etc. Bringing data closer to the interested party
has a long history of making things faster and better. Master is in charge
of just about all mutations of all system tables. It's in charge of
changing meta, changing acls, creating new namespaces, etc. So put the
memstore as close as possible to the system that's going to mutate meta.

# If you want to make meta faster then moving it to other regionservers
makes things worse.

Meta can get pretty hot. Putting it with other regions that clients will be
trying to access makes everything worse. It means that meta is competing
with user requests. Either meta gets served and other requests don't, causing
more requests to meta; or requests to user regions get served and other
clients get starved.
At FB we've seen read throughput to meta doubled or more by swapping it to
master. Writes to meta are also much faster since there's no rpc hop, no
queueing, no fighting with reads. So far it has been the single biggest
thing to make meta faster.


On Thu, Apr 7, 2016 at 10:11 PM, Stack  wrote:

> I would like to start a discussion on whether Master should be carrying
> regions or not. No hurry. I see this thread going on a while and what with
> 2.0 being a ways out yet, there is no need to rush to a decision.
>
> First, some background.
>
> Currently in the master branch, HMaster hosts 'system tables': e.g.
> hbase:meta. HMaster is doing more than just gardening the cluster,
> bootstrapping and keeping all up and serving healthy as in branch-1; in
> master branch, it is actually in the write path for the most critical
> system regions.
>
> Master is this way because HMaster and HRegionServer servers have so much
> in common, they should be just one binary, w/ HMaster as any other server
> with the HMaster function a minor appendage runnable by any running
> HRegionServer.
>
> I like this idea, but the unification work was just never finished. What is
> in master branch is a compromise. HMaster is not a RegionServer but a
> sort-of RegionServer doing part serving. So we have HMaster role, a new
> part-RegionServer-carrying-special-regions role and then a full-on
> HRegionServer role. We need to fix this messyness. We could revert to plain
> branch-1 roles or carrying the
> HMaster-function-is-something-any-RegionServer-could-execute through to
> completion.
>
> More background from a time long-past with good comments by the likes of
> our Francis Liu and Mighty Matteo Bertozzi are here [1], on unifying master
> and meta-serving. Slightly related are old discussions on being able to
> scale by splitting meta with good comments by our Elliott Clark [2].
>
> Also for consideration, the landscape has since changed. [1] was written
> before we had ProcedureV2 available to us where we could record
> intermediate transition states local to the Master rather than remote as
> intermediate updates to an hbase:meta over rpc running on another node.
>
> Enough on the background.
>
> Let me provoke discussion by making the statement that we should undo
> HMaster carrying any regions ever; that the HMaster function is work enough
> for a single dedicated server and that it important enough that it cannot
> take a background role on a serving RegionServer (I could go back from this
> position if evidence HMaster role could be backgrounded). Notions of a
> Master carrying system tables only are just not on given system tables will
> be too big for a single server especially when hbase:meta is split (so we
> can scale). This simple distinction of HMaster and RegionServer roles is
> also what our users know and have gotten used to so needs to be a good
> reason to change it (We can still pursue the single binary that can do
> HMaster or HRegionServer role determined at runtime).
>
> Thanks,
> St.Ack
>
> 1.
>
> https://docs.google.com/document/d/1xC-bCzAAKO59Xo3XN-Cl6p-5CM_4DMoR-WpnkmYZgpw/edit#heading=h.j5yqy7n04bkn
> 2.
>
> https://docs.google.com/document/d/1eCuqf7i2dkWHL0PxcE1HE1nLRQ_tCyXI4JsOB6TAk60/edit#heading=h.80vcerzbkj93
>


Re: [DISCUSS] No regions on Master node in 2.0

2016-04-08 Thread 张铎
Agree on the performance concerns. IMO we should not hurt the performance
of small (maybe normal?) clusters when scaling for huge clusters.
And I also agree that the current implementation which allows Master to
carry system regions is not good (sorry for the chinglish...). At least, it
makes the master startup really complicated.

So IMO, we should let the master process or master machine also carry
system regions, but in another way. Start another RS instance on the same
machine or in the same JVM? Or build new storage based on the procedure
store and convert it to a normal table when it is too large?

Thanks.

2016-04-08 16:42 GMT+08:00 Elliott Clark :

> # Without meta on master, we double assign and lose data.
>
> That is currently a fact that I have seen over and over on multiple loaded
> clusters. Some abstract clean up of deployment vs losing data is a
> no-brainer for me. Master assignment, region split, region merge are all
> risky, and all places that HBase can lose data. Meta being hosted on the
> master makes communication easier and less flakey. Running ITBLL on a loop
> that creates a new table every time, and without meta on master everything
> will fail pretty reliably in ~2 days. With meta on master things pass MUCH
> more.
>
> # Master hosting the system tables locates the system tables as close as
> possible to the machine that will be mutating the data.
>
> Data locality is something that we all work for. Short circuit local reads,
> Caching blocks in jvm, etc. Bringing data closer to the interested party
> has a long history of making things faster and better. Master is in charge
> of just about all mutations of all systems tables. It's in charge of
> changing meta, changing acls, creating new namespaces, etc. So put the
> memstore as close as possible to the system that's going to mutate meta.
>
> # If you want to make meta faster then moving it to other regionservers
> makes things worse.
>
> Meta can get pretty hot. Putting it with other regions that clients will be
> trying to access makes everything worse. It means that meta is competing
> with user requests. If meta gets served and other requests don't, causing
> more requests to meta; or requests to user regions get served and other
> clients get starved.
> At FB we've seen read throughput to meta doubled or more by swapping it to
> master. Writes to meta are also much faster since there's no rpc hop, no
> queueing, to fighting with reads. So far it has been the single biggest
> thing to make meta faster.
>
>
> On Thu, Apr 7, 2016 at 10:11 PM, Stack  wrote:
>
> > I would like to start a discussion on whether Master should be carrying
> > regions or not. No hurry. I see this thread going on a while and what
> with
> > 2.0 being a ways out yet, there is no need to rush to a decision.
> >
> > First, some background.
> >
> > Currently in the master branch, HMaster hosts 'system tables': e.g.
> > hbase:meta. HMaster is doing more than just gardening the cluster,
> > bootstrapping and keeping all up and serving healthy as in branch-1; in
> > master branch, it is actually in the write path for the most critical
> > system regions.
> >
> > Master is this way because HMaster and HRegionServer servers have so much
> > in common, they should be just one binary, w/ HMaster as any other server
> > with the HMaster function a minor appendage runnable by any running
> > HRegionServer.
> >
> > I like this idea, but the unification work was just never finished. What
> is
> > in master branch is a compromise. HMaster is not a RegionServer but a
> > sort-of RegionServer doing part serving. So we have HMaster role, a new
> > part-RegionServer-carrying-special-regions role and then a full-on
> > HRegionServer role. We need to fix this messyness. We could revert to
> plain
> > branch-1 roles or carrying the
> > HMaster-function-is-something-any-RegionServer-could-execute through to
> > completion.
> >
> > More background from a time long-past with good comments by the likes of
> > our Francis Liu and Mighty Matteo Bertozzi are here [1], on unifying
> master
> > and meta-serving. Slightly related are old discussions on being able to
> > scale by splitting meta with good comments by our Elliott Clark [2].
> >
> > Also for consideration, the landscape has since changed. [1] was written
> > before we had ProcedureV2 available to us where we could record
> > intermediate transition states local to the Master rather than remote as
> > intermediate updates to an hbase:meta over rpc running on another node.
> >
> > Enough on the background.
> >
> > Let me provoke discussion by making the statement that we should undo
> > HMaster carrying any regions ever; that the HMaster function is work
> enough
> > for a single dedicated server and that it important enough that it cannot
> > take a background role on a serving RegionServer (I could go back from
> this
> > position if evidence HMaster role could be backgrounded). Notions of a
> > Master carrying system

Re: [DISCUSS] No regions on Master node in 2.0

2016-04-08 Thread Matteo Bertozzi
# Without meta on master, we double assign and lose data.

I doubt meta on master solves this problem.
This has more to do with the fact that balancer, assignment, split, and merge
are disjoint operations that are not aware of each other.
Also, those operations in general consist of multiple steps, and if the master
crashes you may end up in an inconsistent state.

This is what proc-v2 should solve: since we are aware of each operation,
there is no chance of double assignment and the like, by design.

The master doesn't need the full meta to operate properly;
it just needs the "state" (at which point of the operation am I?),
which is the wal of proc-v2. Given that, we can split meta or host meta
remotely without any problem, since there is only one update to meta, to
record the location when the assignment is completed.
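
To make the distinction concrete, here is a small, self-contained sketch of that
idea (none of the types below are HBase internals; it only illustrates
intermediate state going to a local procedure wal while meta gets a single
region:location publish):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AssignmentStateSketch {
  enum Step { QUEUED, OPENING, OPENED }

  // Stand-in for the proc-v2 wal: an append-only, local log of intermediate states.
  private static final List<String> procWal = new ArrayList<>();

  // Stand-in for hbase:meta: region -> location, written exactly once per assign.
  private static final Map<String, String> meta = new HashMap<>();

  private static void recordState(String region, Step step) {
    procWal.add(region + "=" + step);  // local write, no rpc involved
  }

  private static void assign(String region, String server) {
    recordState(region, Step.QUEUED);
    recordState(region, Step.OPENING);
    recordState(region, Step.OPENED);
    meta.put(region, server);          // the single location publish clients see
  }

  public static void main(String[] args) {
    assign("t1,aaa,1460000000000.abcdef1234567890", "rs1.example.com:16020");
    System.out.println("proc-v2 wal entries: " + procWal);
    System.out.println("meta: " + meta);
  }
}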

Also, at the moment the master has a copy of the information in meta:
a map with the RegionInfo, state and locations. But we are still doing
a query on meta instead of using that local map directly.
If we move meta onto the master we can remove that extra copy, but that
will tie meta and master together, making it impossible to offload meta if
we need to.


In my opinion, with the new assignment you have all the main problems solved.
We can keep regions on master as we have now,
so you can configure it to get more performance (avoid the remote rpc),
but our design should allow meta to be split and to be hosted somewhere
else.

Matteo


On Fri, Apr 8, 2016 at 2:08 AM, 张铎  wrote:

> Agree on the performance concerns. IMO we should not hurt the performance
> of small(maybe normal?) clusters when scaling for huge clusters.
> And I also agree that the current implementation which allows Master to
> carry system regions is not good(sorry for the chinglish...). At least, it
> makes the master startup really complicated.
>
> So IMO, we should let the master process or master machine to also carry
> system regions, but in another way. Start another RS instance on the same
> machine or in the same JVM? Or build a new storage based on the procedure
> store and convert it to a normal table when it is too large?
>
> Thanks.
>
> 2016-04-08 16:42 GMT+08:00 Elliott Clark :
>
> > # Without meta on master, we double assign and lose data.
> >
> > That is currently a fact that I have seen over and over on multiple
> loaded
> > clusters. Some abstract clean up of deployment vs losing data is a
> > no-brainer for me. Master assignment, region split, region merge are all
> > risky, and all places that HBase can lose data. Meta being hosted on the
> > master makes communication easier and less flakey. Running ITBLL on a
> loop
> > that creates a new table every time, and without meta on master
> everything
> > will fail pretty reliably in ~2 days. With meta on master things pass
> MUCH
> > more.
> >
> > # Master hosting the system tables locates the system tables as close as
> > possible to the machine that will be mutating the data.
> >
> > Data locality is something that we all work for. Short circuit local
> reads,
> > Caching blocks in jvm, etc. Bringing data closer to the interested party
> > has a long history of making things faster and better. Master is in
> charge
> > of just about all mutations of all systems tables. It's in charge of
> > changing meta, changing acls, creating new namespaces, etc. So put the
> > memstore as close as possible to the system that's going to mutate meta.
> >
> > # If you want to make meta faster then moving it to other regionservers
> > makes things worse.
> >
> > Meta can get pretty hot. Putting it with other regions that clients will
> be
> > trying to access makes everything worse. It means that meta is competing
> > with user requests. If meta gets served and other requests don't, causing
> > more requests to meta; or requests to user regions get served and other
> > clients get starved.
> > At FB we've seen read throughput to meta doubled or more by swapping it
> to
> > master. Writes to meta are also much faster since there's no rpc hop, no
> > queueing, to fighting with reads. So far it has been the single biggest
> > thing to make meta faster.
> >
> >
> > On Thu, Apr 7, 2016 at 10:11 PM, Stack  wrote:
> >
> > > I would like to start a discussion on whether Master should be carrying
> > > regions or not. No hurry. I see this thread going on a while and what
> > with
> > > 2.0 being a ways out yet, there is no need to rush to a decision.
> > >
> > > First, some background.
> > >
> > > Currently in the master branch, HMaster hosts 'system tables': e.g.
> > > hbase:meta. HMaster is doing more than just gardening the cluster,
> > > bootstrapping and keeping all up and serving healthy as in branch-1; in
> > > master branch, it is actually in the write path for the most critical
> > > system regions.
> > >
> > > Master is this way because HMaster and HRegionServer servers have so
> much
> > > in common, they should be just one binary, w/ HMaster as any other
> server
> > > with the HMaster function a minor app

Re: [DISCUSS] No regions on Master node in 2.0

2016-04-08 Thread Jimmy Xiang
One thing I'd like to say is that it makes the master startup much
simpler and more reliable to put system tables on master.

Even if proc-v2 can solve the problem, it makes things complicated,
right? I prefer to be sure that meta is always available, in a
consistent state.

If we really need to split meta, we should have an option for most
users to have just one meta region, and keep it on master.


On Fri, Apr 8, 2016 at 8:03 AM, Matteo Bertozzi  wrote:
> # Without meta on master, we double assign and lose data.
>
> I doubt meta on master solve this problem.
> This has more to do on the fact that balancer, assignment, split, merge
> are disjoint operations that are not aware of each other.
> also those operation in general consist of multiple steps and if the master
> crashes you may end up in an inconsistent state.
>
> this is what proc-v2 should solve. since we are aware of each operation
> there is no chance of double assignment and similar by design.
>
> The master doesn't need the full meta to operate properly
> it just need the "state" (at which point of the operation am I).
> which is the wal of proc-v2. given that we can split meta or meta
> remote without any problem. since we only have 1 update to meta to
> update the location when the assignment is completed.
>
> also at the moment the master has a copy of the information in meta.
> a map with the RegionInfo, state and locations. but we are still doing
> a query on meta instead of using that local map directly.
> if we move meta on master we can remove that extra copy, but that
> will tight together meta and master making impossible to offload meta, if
> we need to.
>
>
> In my opinion with the new assignment you have all the main problem solved.
> we can keep regions on master as we have now,
> so you can configure it to get more performance (avoid the remote rpc).
> but our design should allow meta to be split and to be hosted somewhere
> else.
>
> Matteo
>
>
> On Fri, Apr 8, 2016 at 2:08 AM, 张铎  wrote:
>
>> Agree on the performance concerns. IMO we should not hurt the performance
>> of small(maybe normal?) clusters when scaling for huge clusters.
>> And I also agree that the current implementation which allows Master to
>> carry system regions is not good(sorry for the chinglish...). At least, it
>> makes the master startup really complicated.
>>
>> So IMO, we should let the master process or master machine to also carry
>> system regions, but in another way. Start another RS instance on the same
>> machine or in the same JVM? Or build a new storage based on the procedure
>> store and convert it to a normal table when it is too large?
>>
>> Thanks.
>>
>> 2016-04-08 16:42 GMT+08:00 Elliott Clark :
>>
>> > # Without meta on master, we double assign and lose data.
>> >
>> > That is currently a fact that I have seen over and over on multiple
>> loaded
>> > clusters. Some abstract clean up of deployment vs losing data is a
>> > no-brainer for me. Master assignment, region split, region merge are all
>> > risky, and all places that HBase can lose data. Meta being hosted on the
>> > master makes communication easier and less flakey. Running ITBLL on a
>> loop
>> > that creates a new table every time, and without meta on master
>> everything
>> > will fail pretty reliably in ~2 days. With meta on master things pass
>> MUCH
>> > more.
>> >
>> > # Master hosting the system tables locates the system tables as close as
>> > possible to the machine that will be mutating the data.
>> >
>> > Data locality is something that we all work for. Short circuit local
>> reads,
>> > Caching blocks in jvm, etc. Bringing data closer to the interested party
>> > has a long history of making things faster and better. Master is in
>> charge
>> > of just about all mutations of all systems tables. It's in charge of
>> > changing meta, changing acls, creating new namespaces, etc. So put the
>> > memstore as close as possible to the system that's going to mutate meta.
>> >
>> > # If you want to make meta faster then moving it to other regionservers
>> > makes things worse.
>> >
>> > Meta can get pretty hot. Putting it with other regions that clients will
>> be
>> > trying to access makes everything worse. It means that meta is competing
>> > with user requests. If meta gets served and other requests don't, causing
>> > more requests to meta; or requests to user regions get served and other
>> > clients get starved.
>> > At FB we've seen read throughput to meta doubled or more by swapping it
>> to
>> > master. Writes to meta are also much faster since there's no rpc hop, no
>> > queueing, to fighting with reads. So far it has been the single biggest
>> > thing to make meta faster.
>> >
>> >
>> > On Thu, Apr 7, 2016 at 10:11 PM, Stack  wrote:
>> >
>> > > I would like to start a discussion on whether Master should be carrying
>> > > regions or not. No hurry. I see this thread going on a while and what
>> > with
>> > > 2.0 being a ways out yet, there is no need to rush to a d

Re: [DISCUSS] No regions on Master node in 2.0

2016-04-08 Thread Matteo Bertozzi
I think proc-v2 makes things easier than having meta hard-coded on master:
we just read the wal and we get back to the state we were in previously.
In this case it doesn't make any difference whether meta is on master or remote,
or whether we have one meta region or a hundred.

If we hard-code meta, we need special logic to load it and from there
start the bootstrap of the other regions.
Then there is no way to switch to multiple metas if someone wants that,
unless we keep two code paths, one of which will be proc-v2.
So at that point we should just keep a single code path that does both.


On Fri, Apr 8, 2016 at 8:27 AM, Jimmy Xiang  wrote:

> One thing I'd like to say is that it makes the master startup much
> more simpler and realiable to put system tables on master.
>
> Even if proc-v2 can solve the problem, it makes things complicated,
> right? I prefer to be sure that meta is always available, in a
> consistent state.
>
> If we really need to split meta, we should have an option for most
> users to have just one meta region, and keep it on master.
>
>
> On Fri, Apr 8, 2016 at 8:03 AM, Matteo Bertozzi 
> wrote:
> > # Without meta on master, we double assign and lose data.
> >
> > I doubt meta on master solve this problem.
> > This has more to do on the fact that balancer, assignment, split, merge
> > are disjoint operations that are not aware of each other.
> > also those operation in general consist of multiple steps and if the
> master
> > crashes you may end up in an inconsistent state.
> >
> > this is what proc-v2 should solve. since we are aware of each operation
> > there is no chance of double assignment and similar by design.
> >
> > The master doesn't need the full meta to operate properly
> > it just need the "state" (at which point of the operation am I).
> > which is the wal of proc-v2. given that we can split meta or meta
> > remote without any problem. since we only have 1 update to meta to
> > update the location when the assignment is completed.
> >
> > also at the moment the master has a copy of the information in meta.
> > a map with the RegionInfo, state and locations. but we are still doing
> > a query on meta instead of using that local map directly.
> > if we move meta on master we can remove that extra copy, but that
> > will tight together meta and master making impossible to offload meta, if
> > we need to.
> >
> >
> > In my opinion with the new assignment you have all the main problem
> solved.
> > we can keep regions on master as we have now,
> > so you can configure it to get more performance (avoid the remote rpc).
> > but our design should allow meta to be split and to be hosted somewhere
> > else.
> >
> > Matteo
> >
> >
> > On Fri, Apr 8, 2016 at 2:08 AM, 张铎  wrote:
> >
> >> Agree on the performance concerns. IMO we should not hurt the
> performance
> >> of small(maybe normal?) clusters when scaling for huge clusters.
> >> And I also agree that the current implementation which allows Master to
> >> carry system regions is not good(sorry for the chinglish...). At least,
> it
> >> makes the master startup really complicated.
> >>
> >> So IMO, we should let the master process or master machine to also carry
> >> system regions, but in another way. Start another RS instance on the
> same
> >> machine or in the same JVM? Or build a new storage based on the
> procedure
> >> store and convert it to a normal table when it is too large?
> >>
> >> Thanks.
> >>
> >> 2016-04-08 16:42 GMT+08:00 Elliott Clark :
> >>
> >> > # Without meta on master, we double assign and lose data.
> >> >
> >> > That is currently a fact that I have seen over and over on multiple
> >> loaded
> >> > clusters. Some abstract clean up of deployment vs losing data is a
> >> > no-brainer for me. Master assignment, region split, region merge are
> all
> >> > risky, and all places that HBase can lose data. Meta being hosted on
> the
> >> > master makes communication easier and less flakey. Running ITBLL on a
> >> loop
> >> > that creates a new table every time, and without meta on master
> >> everything
> >> > will fail pretty reliably in ~2 days. With meta on master things pass
> >> MUCH
> >> > more.
> >> >
> >> > # Master hosting the system tables locates the system tables as close
> as
> >> > possible to the machine that will be mutating the data.
> >> >
> >> > Data locality is something that we all work for. Short circuit local
> >> reads,
> >> > Caching blocks in jvm, etc. Bringing data closer to the interested
> party
> >> > has a long history of making things faster and better. Master is in
> >> charge
> >> > of just about all mutations of all systems tables. It's in charge of
> >> > changing meta, changing acls, creating new namespaces, etc. So put the
> >> > memstore as close as possible to the system that's going to mutate
> meta.
> >> >
> >> > # If you want to make meta faster then moving it to other
> regionservers
> >> > makes things worse.
> >> >
> >> > Meta can get pretty hot. Putting it with other regions that clien

Re: [DISCUSS] No regions on Master node in 2.0

2016-04-08 Thread Elliott Clark
Proc v2 can't fix the fact that it's harder to get a write into meta when going over
rpc. Our try at qos doesn't fix it. As long as critical meta operations are
competing with user requests, meta will be unstable.

I am absolutely confident that meta on master makes hbase lose less data.
The itbll tests bear this out. The real-world experience bears this out.
On Apr 8, 2016 8:03 AM, "Matteo Bertozzi"  wrote:

> # Without meta on master, we double assign and lose data.
>
> I doubt meta on master solve this problem.
> This has more to do on the fact that balancer, assignment, split, merge
> are disjoint operations that are not aware of each other.
> also those operation in general consist of multiple steps and if the master
> crashes you may end up in an inconsistent state.
>
> this is what proc-v2 should solve. since we are aware of each operation
> there is no chance of double assignment and similar by design.
>
> The master doesn't need the full meta to operate properly
> it just need the "state" (at which point of the operation am I).
> which is the wal of proc-v2. given that we can split meta or meta
> remote without any problem. since we only have 1 update to meta to
> update the location when the assignment is completed.
>
> also at the moment the master has a copy of the information in meta.
> a map with the RegionInfo, state and locations. but we are still doing
> a query on meta instead of using that local map directly.
> if we move meta on master we can remove that extra copy, but that
> will tight together meta and master making impossible to offload meta, if
> we need to.
>
>
> In my opinion with the new assignment you have all the main problem solved.
> we can keep regions on master as we have now,
> so you can configure it to get more performance (avoid the remote rpc).
> but our design should allow meta to be split and to be hosted somewhere
> else.
>
> Matteo
>
>
> On Fri, Apr 8, 2016 at 2:08 AM, 张铎  wrote:
>
> > Agree on the performance concerns. IMO we should not hurt the performance
> > of small(maybe normal?) clusters when scaling for huge clusters.
> > And I also agree that the current implementation which allows Master to
> > carry system regions is not good(sorry for the chinglish...). At least,
> it
> > makes the master startup really complicated.
> >
> > So IMO, we should let the master process or master machine to also carry
> > system regions, but in another way. Start another RS instance on the same
> > machine or in the same JVM? Or build a new storage based on the procedure
> > store and convert it to a normal table when it is too large?
> >
> > Thanks.
> >
> > 2016-04-08 16:42 GMT+08:00 Elliott Clark :
> >
> > > # Without meta on master, we double assign and lose data.
> > >
> > > That is currently a fact that I have seen over and over on multiple
> > loaded
> > > clusters. Some abstract clean up of deployment vs losing data is a
> > > no-brainer for me. Master assignment, region split, region merge are
> all
> > > risky, and all places that HBase can lose data. Meta being hosted on
> the
> > > master makes communication easier and less flakey. Running ITBLL on a
> > loop
> > > that creates a new table every time, and without meta on master
> > everything
> > > will fail pretty reliably in ~2 days. With meta on master things pass
> > MUCH
> > > more.
> > >
> > > # Master hosting the system tables locates the system tables as close
> as
> > > possible to the machine that will be mutating the data.
> > >
> > > Data locality is something that we all work for. Short circuit local
> > reads,
> > > Caching blocks in jvm, etc. Bringing data closer to the interested
> party
> > > has a long history of making things faster and better. Master is in
> > charge
> > > of just about all mutations of all systems tables. It's in charge of
> > > changing meta, changing acls, creating new namespaces, etc. So put the
> > > memstore as close as possible to the system that's going to mutate
> meta.
> > >
> > > # If you want to make meta faster then moving it to other regionservers
> > > makes things worse.
> > >
> > > Meta can get pretty hot. Putting it with other regions that clients
> will
> > be
> > > trying to access makes everything worse. It means that meta is
> competing
> > > with user requests. If meta gets served and other requests don't,
> causing
> > > more requests to meta; or requests to user regions get served and other
> > > clients get starved.
> > > At FB we've seen read throughput to meta doubled or more by swapping it
> > to
> > > master. Writes to meta are also much faster since there's no rpc hop,
> no
> > > queueing, to fighting with reads. So far it has been the single biggest
> > > thing to make meta faster.
> > >
> > >
> > > On Thu, Apr 7, 2016 at 10:11 PM, Stack  wrote:
> > >
> > > > I would like to start a discussion on whether Master should be
> carrying
> > > > regions or not. No hurry. I see this thread going on a while and what
> > > with
> > > > 2.0 being a ways out yet, ther

Re: [DISCUSS] No regions on Master node in 2.0

2016-04-08 Thread Matteo Bertozzi
You are still thinking of meta as a state machine.
To simplify, meta should just be: region:location.

Not being able to access meta only means that we can't publish
the new location of the region to the client.
When meta becomes available, that location will be published.

What you are thinking of for meta on the master is "the state",
and with proc-v2 we have that state on the master.

Matteo


On Fri, Apr 8, 2016 at 8:46 AM, Elliott Clark 
wrote:

> Proc v2 can't fix that it's harder to get a write into meta when going over
> rpc. Our try at qos doesn't fix it. As long as critical meta operations are
> competing with user requests meta will be unstabla
>
> I am absolutely confident that meta on master makes hbase lose less data.
> The itbll tests bear this out. The real world experience bears this out.
> On Apr 8, 2016 8:03 AM, "Matteo Bertozzi"  wrote:
>
> > # Without meta on master, we double assign and lose data.
> >
> > I doubt meta on master solve this problem.
> > This has more to do on the fact that balancer, assignment, split, merge
> > are disjoint operations that are not aware of each other.
> > also those operation in general consist of multiple steps and if the
> master
> > crashes you may end up in an inconsistent state.
> >
> > this is what proc-v2 should solve. since we are aware of each operation
> > there is no chance of double assignment and similar by design.
> >
> > The master doesn't need the full meta to operate properly
> > it just need the "state" (at which point of the operation am I).
> > which is the wal of proc-v2. given that we can split meta or meta
> > remote without any problem. since we only have 1 update to meta to
> > update the location when the assignment is completed.
> >
> > also at the moment the master has a copy of the information in meta.
> > a map with the RegionInfo, state and locations. but we are still doing
> > a query on meta instead of using that local map directly.
> > if we move meta on master we can remove that extra copy, but that
> > will tight together meta and master making impossible to offload meta, if
> > we need to.
> >
> >
> > In my opinion with the new assignment you have all the main problem
> solved.
> > we can keep regions on master as we have now,
> > so you can configure it to get more performance (avoid the remote rpc).
> > but our design should allow meta to be split and to be hosted somewhere
> > else.
> >
> > Matteo
> >
> >
> > On Fri, Apr 8, 2016 at 2:08 AM, 张铎  wrote:
> >
> > > Agree on the performance concerns. IMO we should not hurt the
> performance
> > > of small(maybe normal?) clusters when scaling for huge clusters.
> > > And I also agree that the current implementation which allows Master to
> > > carry system regions is not good(sorry for the chinglish...). At least,
> > it
> > > makes the master startup really complicated.
> > >
> > > So IMO, we should let the master process or master machine to also
> carry
> > > system regions, but in another way. Start another RS instance on the
> same
> > > machine or in the same JVM? Or build a new storage based on the
> procedure
> > > store and convert it to a normal table when it is too large?
> > >
> > > Thanks.
> > >
> > > 2016-04-08 16:42 GMT+08:00 Elliott Clark :
> > >
> > > > # Without meta on master, we double assign and lose data.
> > > >
> > > > That is currently a fact that I have seen over and over on multiple
> > > loaded
> > > > clusters. Some abstract clean up of deployment vs losing data is a
> > > > no-brainer for me. Master assignment, region split, region merge are
> > all
> > > > risky, and all places that HBase can lose data. Meta being hosted on
> > the
> > > > master makes communication easier and less flakey. Running ITBLL on a
> > > loop
> > > > that creates a new table every time, and without meta on master
> > > everything
> > > > will fail pretty reliably in ~2 days. With meta on master things pass
> > > MUCH
> > > > more.
> > > >
> > > > # Master hosting the system tables locates the system tables as close
> > as
> > > > possible to the machine that will be mutating the data.
> > > >
> > > > Data locality is something that we all work for. Short circuit local
> > > reads,
> > > > Caching blocks in jvm, etc. Bringing data closer to the interested
> > party
> > > > has a long history of making things faster and better. Master is in
> > > charge
> > > > of just about all mutations of all systems tables. It's in charge of
> > > > changing meta, changing acls, creating new namespaces, etc. So put
> the
> > > > memstore as close as possible to the system that's going to mutate
> > meta.
> > > >
> > > > # If you want to make meta faster then moving it to other
> regionservers
> > > > makes things worse.
> > > >
> > > > Meta can get pretty hot. Putting it with other regions that clients
> > will
> > > be
> > > > trying to access makes everything worse. It means that meta is
> > competing
> > > > with user requests. If meta gets served and other requests d

Re: [DISCUSS] No regions on Master node in 2.0

2016-04-08 Thread Elliott Clark
On Fri, Apr 8, 2016 at 8:59 AM, Matteo Bertozzi 
wrote:

> You are still thinking at meta used as a state machine.
> to simplify, meta should just be: region:location
>
> not being able to access meta only means that we can't publish
> to the client the new location of the region.
> when meta will be available, that location will be published.
>
> what you are thinking about for meta on the master is "the state".
> and with proc-v2 we have that state on the master.
>

No, writing to meta is how we publish the state to clients. That
operation will always be more reliable if we don't have to go over rpc.
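
(For illustration, the "region:location" view Matteo describes is what the
client already sees: resolving a row is a lookup against hbase:meta. A minimal
sketch using the public client API; the table name and row key are placeholders
and a reachable cluster is assumed.)

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionLocation;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.RegionLocator;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MetaLookupSketch {
      public static void main(String[] args) throws Exception {
        try (Connection conn =
                 ConnectionFactory.createConnection(HBaseConfiguration.create());
             RegionLocator locator = conn.getRegionLocator(TableName.valueOf("t1"))) {
          // reload=true forces a round trip to hbase:meta instead of the client cache.
          HRegionLocation loc = locator.getRegionLocation(Bytes.toBytes("row-0"), true);
          // meta's answer: which region holds the row and which server hosts it.
          System.out.println(loc.getRegionInfo().getRegionNameAsString()
              + " -> " + loc.getServerName());
        }
      }
    }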


Re: [DISCUSS] No regions on Master node in 2.0

2016-04-08 Thread Elliott Clark
Let me put it this way: Removing meta fixes no issues seen from day to day
operations, but makes worse just about everything that has been an issue on
loaded clusters.

On Fri, Apr 8, 2016 at 9:05 AM, Elliott Clark  wrote:

>
> On Fri, Apr 8, 2016 at 8:59 AM, Matteo Bertozzi 
> wrote:
>
>> You are still thinking at meta used as a state machine.
>> to simplify, meta should just be: region:location
>>
>> not being able to access meta only means that we can't publish
>> to the client the new location of the region.
>> when meta will be available, that location will be published.
>>
>> what you are thinking about for meta on the master is "the state".
>> and with proc-v2 we have that state on the master.
>>
>
> No, writing to meta is how we publish the state to clients. That
> operation will always be more reliable if we don't have to go over rpc.
>


Re: [DISCUSS] No regions on Master node in 2.0

2016-04-08 Thread Andrew Purtell
> This simple distinction of HMaster and RegionServer roles is also what our 
> users know and have gotten used to so needs to be a good reason to change it 
> (We can still pursue the single binary that can do HMaster or HRegionServer 
> role determined at runtime).

I have always liked the idea of a single HBase daemon with dynamic role 
switching. Reducing the number of separate processes to manage will make life 
easier for operators by degree. We would need to move meta around with the 
master until we fix issues with remote meta; perhaps that gets tackled as part 
of the splittable meta work, but I don't know who would be doing that. However, 
most of the complexity is in HDFS (like NN, ZKFC, and QJM as separate daemons 
when they should be all-in-one IMHO), and our reliance on a ZK quorum adds 
another daemon type to contemplate if you're coming to HBase without other 
ZK-dependent services already in production. Collapsing the daemon roles at the 
HBase layer provides only limited complexity reduction in the big picture. 
Sadly. 

> On Apr 7, 2016, at 10:11 PM, Stack  wrote:
> 
> I would like to start a discussion on whether Master should be carrying
> regions or not. No hurry. I see this thread going on a while and what with
> 2.0 being a ways out yet, there is no need to rush to a decision.
> 
> First, some background.
> 
> Currently in the master branch, HMaster hosts 'system tables': e.g.
> hbase:meta. HMaster is doing more than just gardening the cluster,
> bootstrapping and keeping all up and serving healthy as in branch-1; in
> master branch, it is actually in the write path for the most critical
> system regions.
> 
> Master is this way because HMaster and HRegionServer servers have so much
> in common, they should be just one binary, w/ HMaster as any other server
> with the HMaster function a minor appendage runnable by any running
> HRegionServer.
> 
> I like this idea, but the unification work was just never finished. What is
> in master branch is a compromise. HMaster is not a RegionServer but a
> sort-of RegionServer doing part serving. So we have HMaster role, a new
> part-RegionServer-carrying-special-regions role and then a full-on
> HRegionServer role. We need to fix this messyness. We could revert to plain
> branch-1 roles or carrying the
> HMaster-function-is-something-any-RegionServer-could-execute through to
> completion.
> 
> More background from a time long-past with good comments by the likes of
> our Francis Liu and Mighty Matteo Bertozzi are here [1], on unifying master
> and meta-serving. Slightly related are old discussions on being able to
> scale by splitting meta with good comments by our Elliott Clark [2].
> 
> Also for consideration, the landscape has since changed. [1] was written
> before we had ProcedureV2 available to us where we could record
> intermediate transition states local to the Master rather than remote as
> intermediate updates to an hbase:meta over rpc running on another node.
> 
> Enough on the background.
> 
> Let me provoke discussion by making the statement that we should undo
> HMaster carrying any regions ever; that the HMaster function is work enough
> for a single dedicated server and that it important enough that it cannot
> take a background role on a serving RegionServer (I could go back from this
> position if evidence HMaster role could be backgrounded). Notions of a
> Master carrying system tables only are just not on given system tables will
> be too big for a single server especially when hbase:meta is split (so we
> can scale). This simple distinction of HMaster and RegionServer roles is
> also what our users know and have gotten used to so needs to be a good
> reason to change it (We can still pursue the single binary that can do
> HMaster or HRegionServer role determined at runtime).
> 
> Thanks,
> St.Ack
> 
> 1.
> https://docs.google.com/document/d/1xC-bCzAAKO59Xo3XN-Cl6p-5CM_4DMoR-WpnkmYZgpw/edit#heading=h.j5yqy7n04bkn
> 2.
> https://docs.google.com/document/d/1eCuqf7i2dkWHL0PxcE1HE1nLRQ_tCyXI4JsOB6TAk60/edit#heading=h.80vcerzbkj93


Re: [DISCUSS] No regions on Master node in 2.0

2016-04-12 Thread Gary Helmling
Sorry to be late to the party here.  I'll sprinkle my comments over the
thread where they make the most sense.


> Currently in the master branch, HMaster hosts 'system tables': e.g.
> hbase:meta. HMaster is doing more than just gardening the cluster,
> bootstrapping and keeping all up and serving healthy as in branch-1; in
> master branch, it is actually in the write path for the most critical
> system regions.
>
>
I think it's important to point out that this feature exists and is usable
in branch-1 as well, including in all 1.x releases.  It is just disabled by
default in branch-1 and enabled by default in master. So this is really a
comparison of an existing, shipping, feature that does work, and is being
used vs. ongoing development work in master.
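
(For reference, the toggle in question is the balancer property listing which
tables the active master may carry. The property name below is the one used by
the branch-1-era BaseLoadBalancer; treat it as an assumption and verify it
against the release you run. It is normally set in hbase-site.xml; the Java
form is only for illustration.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class TablesOnMasterSketch {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Let the active master carry hbase:meta (the behavior being discussed).
        conf.set("hbase.balancer.tablesOnMaster", "hbase:meta");
        // Leaving the property unset keeps all regions off the master in branch-1.
        System.out.println(conf.get("hbase.balancer.tablesOnMaster"));
      }
    }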


>
> Let me provoke discussion by making the statement that we should undo
> HMaster carrying any regions ever; that the HMaster function is work enough
> for a single dedicated server and that it important enough that it cannot
> take a background role on a serving RegionServer (I could go back from this
> position if evidence HMaster role could be backgrounded). Notions of a
> Master carrying system tables only are just not on given system tables will
> be too big for a single server especially when hbase:meta is split (so we
> can scale).


If we really think that normal master housekeeping functions are work
enough that we shouldn't combine with region serving, then why do we think
that those will _not_ have to be scaled by splitting the metadata space
across multiple servers when we encounter the meta-scaling issues that
require splitting meta?  If we really want
to scale, then it seems like we need to tackle scaling the region metadata
in general across multiple active masters, in which case meta-on-master is
not really an argument either way.


> This simple distinction of HMaster and RegionServer roles is
> also what our users know and have gotten used to so needs to be a good
> reason to change it (We can still pursue the single binary that can do
> HMaster or HRegionServer role determined at runtime).
>

The distinction in roles in HBase has long been used as a criticism of
HBase's operational complexity.  I think we would be doing our users a
service by simplifying this and making it a detail they do not need to
worry about. If we can truly make this transparent to users and improve
operability at the same time, I think that would be the best outcome.


Re: [DISCUSS] No regions on Master node in 2.0

2016-04-12 Thread Gary Helmling
>
> # Without meta on master, we double assign and lose data.
>
> I doubt meta on master solve this problem.
> This has more to do on the fact that balancer, assignment, split, merge
> are disjoint operations that are not aware of each other.
> also those operation in general consist of multiple steps and if the master
> crashes you may end up in an inconsistent state.
>
>
Meta-on-master does dramatically improve things.  For example, it makes it
possible to cold-start HBase under load, where a non-meta-serving master is
never able to successfully complete initialization.  This is the difference
between a cluster being able to come to a healthy state vs. one that is
never able to complete assignments, communicate those assignments to
clients and come to a steady state.


> this is what proc-v2 should solve. since we are aware of each operation
> there is no chance of double assignment and similar by design.
>
>
Again, I think it is difficult to compare an existing feature that is
working in production use vs. one that is actively being developed in
master.

Preventing double assignment sounds great.  What happens when the update of
meta to communicate this to clients fails?  So long as meta is served
elsewhere you still have distributed state.
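
(To make the failure mode concrete: "publishing" an assignment to clients
ultimately comes down to writing the region's new location into its hbase:meta
row. A rough sketch, assuming the conventional info:server /
info:serverstartcode layout; the real code path goes through MetaTableAccessor
and, in 2.0, the procedure framework, not a hand-rolled Put. When meta is
remote, this write is the RPC that can fail and must be retried before clients
see the new location.)

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PublishLocationSketch {
      // regionName is the hbase:meta row key for the region; hostAndPort/startcode
      // identify the RegionServer that just opened it. All values are placeholders.
      static void publish(Connection conn, byte[] regionName,
                          String hostAndPort, long startcode) throws IOException {
        Put put = new Put(regionName);
        put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("server"),
            Bytes.toBytes(hostAndPort));
        put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("serverstartcode"),
            Bytes.toBytes(startcode));
        try (Table meta = conn.getTable(TableName.META_TABLE_NAME)) {
          // If meta lives on another server this put is a remote RPC that competes
          // with client traffic and can time out; the caller has to retry it.
          meta.put(put);
        }
      }

      public static void main(String[] args) throws Exception {
        try (Connection conn =
                 ConnectionFactory.createConnection(HBaseConfiguration.create())) {
          publish(conn,
              Bytes.toBytes("t1,,1460000000000.0123456789abcdef0123456789abcdef."),
              "rs1.example.com:16020", 1460000000000L);
        }
      }
    }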

Until we have an alternative that is feature complete and has demonstrated
success and stability in production use, I don't see how we can even
propose removing a feature that is solving real problems.

I also think that this proposed direction will amplify our release problems
and get us further away from regular, incremental releases.  Master will
remain unreleasable indefinitely until proc v2 development is finished,
and even initial releases will have problems that need to be ironed out.
Ironing out issues in initial releases is not unexpected, but by removing the
existing solution we would be forcing a big-bang approach where everything
has to work before anyone can move over to 2.0, which increases pressure
for users to stay on 1.x releases, which increases pressure to backport
features and brings us closer to the Hadoop way.  I would much rather see
us working on incrementally improving what we have and proving out new
solutions piece by piece.


Re: [DISCUSS] No regions on Master node in 2.0

2016-04-19 Thread Francis Liu
Very late to the party
IMHO having the master do only gardening and not become part of the user 
access path is a good design and something we should stick to. It's good SoC 
(i.e. it keeps gardening tasks more isolated from user workload).
> we double assign and lose data.
Given that meta only has a single writer/manager (aka the master), this IMHO is 
more about having a clean state machine than about writing to a region 
remotely. We should be able to remain in a good state in the event of write 
failures. After all, even writes to the filesystem involve remote writes. 
> Running ITBLL on a loop that creates a new table every time, and without meta 
>on master everything will fail pretty reliably in ~2 days.
This is interesting. I'll give it a try. Just run the generator for 2 days? 
Create a new table every time? Do I drop the old one? 
> Short circuit local reads, Caching blocks in jvm, etc. Bringing data closer 
>to the interested party has a long history of making things faster and better.
AFAIK all the metadata that the master needs is already cached in memory during 
startup. It does not require meta to be on master.
> Master is in charge of just about all mutations of all systems tables.
Locality is not as useful here; writes still end up being remote by virtue of 
hdfs.
> At FB we've seen read throughput to meta doubled or more by swapping it to 
>master. Writes to meta are also much faster since there's no rpc hop, no 
>queueing, to fighting with reads. So far it has been the single biggest thing 
>to make meta faster.
This can be addressed with region server groups. :-) That's pretty much what 
you're doing here: having a special region server serve system tables, 
isolating them from user tables. The upside is you can have more than one 
"system regionserver" in this case. This is how we do things internally, so 
we've never experienced user region access interfering with meta. 
> For example, it makes it possible to cold-start HBase under load, where a 
>non-meta-serving master is never able to successfully complete initialization.
Is this problem here because meta is affected by user region workloads? If so, 
region server groups should help in this case as well.
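
(A sketch of the isolation Francis describes, using the rsgroup admin client
from the HBASE-6721 work. The class and method names are as that branch had
them and may differ in whatever release ships the feature; the
RSGroupAdminEndpoint master coprocessor and RSGroupBasedLoadBalancer also have
to be configured before any of this takes effect.)

    import java.util.Collections;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.net.Address;
    import org.apache.hadoop.hbase.rsgroup.RSGroupAdminClient;

    public class SystemGroupSketch {
      public static void main(String[] args) throws Exception {
        try (Connection conn =
                 ConnectionFactory.createConnection(HBaseConfiguration.create())) {
          RSGroupAdminClient rsGroupAdmin = new RSGroupAdminClient(conn);
          // A dedicated group for system tables; more than one server can join it.
          rsGroupAdmin.addRSGroup("system");
          rsGroupAdmin.moveServers(
              Collections.singleton(Address.fromParts("rs-system-1.example.com", 16020)),
              "system");
          // Pin hbase:meta to that group, away from user regions and user traffic.
          rsGroupAdmin.moveTables(Collections.singleton(TableName.META_TABLE_NAME),
              "system");
          rsGroupAdmin.balanceRSGroup("system");
        }
      }
    }
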
> If we really think that normal master housekeeping functions are work enough 
>that we shouldn't combine with region serving, then why do we think that those 
>will _not_ have to be scaled by splitting the metadata space across multiple 
>servers when we encounter meta-scaling issues that require splitting meta to 
>distribute it across multiple servers?  
Based on our tests, a single master (without meta) is fine handling a few 
million regions; the bottlenecks are elsewhere (i.e. updating meta). 

Re: [DISCUSS] No regions on Master node in 2.0

2016-04-19 Thread Elliott Clark
On Tue, Apr 19, 2016 at 1:52 PM, Francis Liu  wrote:

> Locality is not as useful here writes still end up being remote by virtue
> of hdfs.
>

Removing one hop is still useful. It's the same reason that for hdfs writes
the first copy is local.


Re: [DISCUSS] No regions on Master node in 2.0

2016-04-25 Thread Stack
On Fri, Apr 8, 2016 at 1:42 AM, Elliott Clark  wrote:

> # Without meta on master, we double assign and lose data.
>
> That is currently a fact that I have seen over and over on multiple loaded
> clusters. Some abstract clean up of deployment vs losing data is a
> no-brainer for me. Master assignment, region split, region merge are all
> risky, and all places that HBase can lose data. Meta being hosted on the
> master makes communication easier and less flakey. Running ITBLL on a loop
> that creates a new table every time, and without meta on master everything
> will fail pretty reliably in ~2 days. With meta on master things pass MUCH
> more.
>
>
The above is a problem of branch-1?

The discussion is what to do in 2.0, with the assumption that master state
would be done up on procedure v2, making most of the transitions that are now
done over zk and hbase:meta local to the master instead, with only the final
state published to a remote meta (an RPC, but if we can't make RPC work
reliably in our distributed system, that's a bigger problem).
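
(A toy sketch of that shape, with made-up names rather than HBase APIs:
intermediate transition states are persisted locally by the Master in the
procedure WAL, and only the terminal state results in the single remote write
to hbase:meta.)

    import java.io.IOException;

    // Toy illustration of the proc-v2 idea discussed above; not HBase code.
    public class AssignSketch {
      enum State { OFFLINE, OPENING, OPENED }

      // Stand-ins for the Master-local procedure WAL and the hbase:meta writer.
      interface ProcedureStore { void persist(State s) throws IOException; }
      interface MetaPublisher {
        void publishLocation(String region, String server) throws IOException;
      }

      private State state = State.OFFLINE;

      void run(ProcedureStore localStore, MetaPublisher meta,
               String region, String server) throws IOException {
        // Each intermediate step is recorded locally on the Master: no RPC, and a
        // Master crash can recover the in-flight assignment from its own WAL.
        localStore.persist(State.OPENING);
        state = State.OPENING;
        // ... ask the RegionServer to open the region, wait for its report ...
        localStore.persist(State.OPENED);
        state = State.OPENED;
        // Only the final state is published to hbase:meta, over RPC if meta is
        // remote. This is the single remote write the thread is arguing over.
        meta.publishLocation(region, server);
      }
    }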


> # Master hosting the system tables locates the system tables as close as
> possible to the machine that will be mutating the data.
>
> Data locality is something that we all work for. Short circuit local reads,
> Caching blocks in jvm, etc. Bringing data closer to the interested party
> has a long history of making things faster and better. Master is in charge
> of just about all mutations of all systems tables. It's in charge of
> changing meta, changing acls, creating new namespaces, etc. So put the
> memstore as close as possible to the system that's going to mutate meta.
>


Above is fine except for the bit where we need to be able to field reads.
Let's distribute the data to be read over the cluster rather than treat meta
reads with kid gloves, hosted on a 'special' server; let these 'reads' be
like any other read the cluster takes (see next point).
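
(In that world a read of hbase:meta is just an ordinary client read against
whichever RegionServer hosts the meta region(s). A minimal sketch using the
standard scan API against meta's info family, assuming a reachable cluster.)

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanMetaSketch {
      public static void main(String[] args) throws Exception {
        try (Connection conn =
                 ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table meta = conn.getTable(TableName.META_TABLE_NAME)) {
          // hbase:meta is read with the same Scan API as any user table; once it
          // can be hosted (and split) like any other region, the reads spread too.
          Scan scan = new Scan().addFamily(Bytes.toBytes("info"));
          try (ResultScanner scanner = meta.getScanner(scan)) {
            for (Result r : scanner) {
              System.out.println(Bytes.toStringBinary(r.getRow()));
            }
          }
        }
      }
    }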



> # If you want to make meta faster then moving it to other regionservers
> makes things worse.
>
> Meta can get pretty hot. Putting it with other regions that clients will be
> trying to access makes everything worse. It means that meta is competing
> with user requests. If meta gets served and other requests don't, causing
> more requests to meta; or requests to user regions get served and other
> clients get starved.
> At FB we've seen read throughput to meta doubled or more by swapping it to
> master. Writes to meta are also much faster since there's no rpc hop, no
> queueing, to fighting with reads. So far it has been the single biggest
> thing to make meta faster.
>
>
Is this just because meta had a dedicated server?

St.Ack


>
> On Thu, Apr 7, 2016 at 10:11 PM, Stack  wrote:
>
> > I would like to start a discussion on whether Master should be carrying
> > regions or not. No hurry. I see this thread going on a while and what
> with
> > 2.0 being a ways out yet, there is no need to rush to a decision.
> >
> > First, some background.
> >
> > Currently in the master branch, HMaster hosts 'system tables': e.g.
> > hbase:meta. HMaster is doing more than just gardening the cluster,
> > bootstrapping and keeping all up and serving healthy as in branch-1; in
> > master branch, it is actually in the write path for the most critical
> > system regions.
> >
> > Master is this way because HMaster and HRegionServer servers have so much
> > in common, they should be just one binary, w/ HMaster as any other server
> > with the HMaster function a minor appendage runnable by any running
> > HRegionServer.
> >
> > I like this idea, but the unification work was just never finished. What
> is
> > in master branch is a compromise. HMaster is not a RegionServer but a
> > sort-of RegionServer doing part serving. So we have HMaster role, a new
> > part-RegionServer-carrying-special-regions role and then a full-on
> > HRegionServer role. We need to fix this messyness. We could revert to
> plain
> > branch-1 roles or carrying the
> > HMaster-function-is-something-any-RegionServer-could-execute through to
> > completion.
> >
> > More background from a time long-past with good comments by the likes of
> > our Francis Liu and Mighty Matteo Bertozzi are here [1], on unifying
> master
> > and meta-serving. Slightly related are old discussions on being able to
> > scale by splitting meta with good comments by our Elliott Clark [2].
> >
> > Also for consideration, the landscape has since changed. [1] was written
> > before we had ProcedureV2 available to us where we could record
> > intermediate transition states local to the Master rather than remote as
> > intermediate updates to an hbase:meta over rpc running on another node.
> >
> > Enough on the background.
> >
> > Let me provoke discussion by making the statement that we should undo
> > HMaster carrying any regions ever; that the HMaster function is work
> enough
> > for a single dedicated server and that it important enough that it cannot
> > take a background role on a serving RegionServer (I could go back from
> > this position if evidence HMaster role could be backgrounded).

Re: [DISCUSS] No regions on Master node in 2.0

2016-04-25 Thread Stack
On Fri, Apr 8, 2016 at 8:43 AM, Matteo Bertozzi 
wrote:

> ...
> if we hard code meta, we need a special logic to load it and from there
> start the bootstrap of the other regions.
> then there is no way to switch to multiple metas if someone wants that,
> unless we keep two code path and one of that will be proc-v2.
> so at that point we should just keep a single code path that does both.
>
>
Yes. Let's not have two code paths if we can avoid it.
St.Ack


>
> On Fri, Apr 8, 2016 at 8:27 AM, Jimmy Xiang  wrote:
>
> > One thing I'd like to say is that it makes the master startup much
> > simpler and more reliable to put system tables on master.
> >
> > Even if proc-v2 can solve the problem, it makes things complicated,
> > right? I prefer to be sure that meta is always available, in a
> > consistent state.
> >
> > If we really need to split meta, we should have an option for most
> > users to have just one meta region, and keep it on master.
> >
> >
> > On Fri, Apr 8, 2016 at 8:03 AM, Matteo Bertozzi  >
> > wrote:
> > > # Without meta on master, we double assign and lose data.
> > >
> > > I doubt meta on master solve this problem.
> > > This has more to do on the fact that balancer, assignment, split, merge
> > > are disjoint operations that are not aware of each other.
> > > also those operation in general consist of multiple steps and if the
> > master
> > > crashes you may end up in an inconsistent state.
> > >
> > > this is what proc-v2 should solve. since we are aware of each operation
> > > there is no chance of double assignment and similar by design.
> > >
> > > The master doesn't need the full meta to operate properly
> > > it just need the "state" (at which point of the operation am I).
> > > which is the wal of proc-v2. given that we can split meta or meta
> > > remote without any problem. since we only have 1 update to meta to
> > > update the location when the assignment is completed.
> > >
> > > also at the moment the master has a copy of the information in meta.
> > > a map with the RegionInfo, state and locations. but we are still doing
> > > a query on meta instead of using that local map directly.
> > > if we move meta on master we can remove that extra copy, but that
> > > will tight together meta and master making impossible to offload meta,
> if
> > > we need to.
> > >
> > >
> > > In my opinion with the new assignment you have all the main problem
> > solved.
> > > we can keep regions on master as we have now,
> > > so you can configure it to get more performance (avoid the remote rpc).
> > > but our design should allow meta to be split and to be hosted somewhere
> > > else.
> > >
> > > Matteo
> > >
> > >
> > > On Fri, Apr 8, 2016 at 2:08 AM, 张铎  wrote:
> > >
> > >> Agree on the performance concerns. IMO we should not hurt the
> > performance
> > >> of small(maybe normal?) clusters when scaling for huge clusters.
> > >> And I also agree that the current implementation which allows Master
> to
> > >> carry system regions is not good(sorry for the chinglish...). At
> least,
> > it
> > >> makes the master startup really complicated.
> > >>
> > >> So IMO, we should let the master process or master machine to also
> carry
> > >> system regions, but in another way. Start another RS instance on the
> > same
> > >> machine or in the same JVM? Or build a new storage based on the
> > procedure
> > >> store and convert it to a normal table when it is too large?
> > >>
> > >> Thanks.
> > >>
> > >> 2016-04-08 16:42 GMT+08:00 Elliott Clark :
> > >>
> > >> > # Without meta on master, we double assign and lose data.
> > >> >
> > >> > That is currently a fact that I have seen over and over on multiple
> > >> loaded
> > >> > clusters. Some abstract clean up of deployment vs losing data is a
> > >> > no-brainer for me. Master assignment, region split, region merge are
> > all
> > >> > risky, and all places that HBase can lose data. Meta being hosted on
> > the
> > >> > master makes communication easier and less flakey. Running ITBLL on
> a
> > >> loop
> > >> > that creates a new table every time, and without meta on master
> > >> everything
> > >> > will fail pretty reliably in ~2 days. With meta on master things
> pass
> > >> MUCH
> > >> > more.
> > >> >
> > >> > # Master hosting the system tables locates the system tables as
> close
> > as
> > >> > possible to the machine that will be mutating the data.
> > >> >
> > >> > Data locality is something that we all work for. Short circuit local
> > >> reads,
> > >> > Caching blocks in jvm, etc. Bringing data closer to the interested
> > party
> > >> > has a long history of making things faster and better. Master is in
> > >> charge
> > >> > of just about all mutations of all systems tables. It's in charge of
> > >> > changing meta, changing acls, creating new namespaces, etc. So put
> the
> > >> > memstore as close as possible to the system that's going to mutate
> > meta.
> > >> >
> > >> > # If you want to make meta faster then moving it to other
> > regions

Re: [DISCUSS] No regions on Master node in 2.0

2016-04-25 Thread Stack
On Tue, Apr 12, 2016 at 11:22 AM, Gary Helmling  wrote:

> ...
>
> > Currently in the master branch, HMaster hosts 'system tables': e.g.
> > hbase:meta. HMaster is doing more than just gardening the cluster,
> > bootstrapping and keeping all up and serving healthy as in branch-1; in
> > master branch, it is actually in the write path for the most critical
> > system regions.
> >
> >
> I think it's important to point out that this feature exists and is usable
> in branch-1 as well, including in all 1.x releases.  It just disabled by
> default branch-1 and enabled by default in master. So this is really a
> comparison of an existing, shipping, feature that does work, and is being
> used vs. ongoing development work in master.
>
>
I did not realize this facility was being used in branch-1 or even that it
worked well enough to be deployed to production in branch-1.


>
> >
> > Let me provoke discussion by making the statement that we should undo
> > HMaster carrying any regions ever; that the HMaster function is work
> enough
> > for a single dedicated server and that it important enough that it cannot
> > take a background role on a serving RegionServer (I could go back from
> this
> > position if evidence HMaster role could be backgrounded). Notions of a
> > Master carrying system tables only are just not on given system tables
> will
> > be too big for a single server especially when hbase:meta is split (so we
> > can scale).
>
>
> If we really think that normal master housekeeping functions are work
> enough that we shouldn't combine with region serving, then why do we think
> that those will _not_ have to be scaled by splitting the metadata space
> across multiple servers when we encounter meta-scaling issues that require
> splitting meta to distribute it across multiple servers?


Master meta functions may one day grow such that they are more than one
server can manage. Chatting w/ folks who have run a system like hbase's
(smile) at scale: rather than split the master function, they took the time
to make the master more efficient so they didn't have to distribute its
duties.

We can split hbase:meta and distribute it around the cluster if an
hbase:meta region can be served like any other region in the system.




>   If we really want
> to scale, then it seems like we need to tackle scaling the region metadata
> in general across multiple active masters, in which case meta-on-master is
> not really an argument either way.
>
>
Distributing the metadata function amongst a cluster of masters has come
up before. But before we go there, a single master that does the metadata
function only, rather than the metadata function AND fielding all meta reads,
will be able to do more metadata ops if the hbase:meta reads are served
elsewhere. Rough experiments, with more to follow, show that this should get
us to our next scaling target: 1M regions on a cluster.


>
> > This simple distinction of HMaster and RegionServer roles is
> > also what our users know and have gotten used to so needs to be a good
> > reason to change it (We can still pursue the single binary that can do
> > HMaster or HRegionServer role determined at runtime).
> >
>
> The distinction in roles in HBase has long been used as a criticism of
> HBase's operational complexity.  I think we would be doing our users a
> service by simplifying this and making it a detail they do not need to
> worry about. If we can truly make this transparent to users and improve
> operability at the same time, I think that would be the best outcome.
>

I could go this route of the floating master after we'd done some work.
First let's figure out the current state of the hbase master branch, where we
have an inbetweenie: a master that is sort-of-a-regionserver carrying only
system tables and thereby getting in the way of our being able to scale the
metadata function.

St.Ack


Re: [DISCUSS] No regions on Master node in 2.0

2016-04-25 Thread Gary Helmling
On Mon, Apr 25, 2016 at 11:20 AM Stack  wrote:

> On Fri, Apr 8, 2016 at 1:42 AM, Elliott Clark  wrote:
>
> > # Without meta on master, we double assign and lose data.
> >
> > That is currently a fact that I have seen over and over on multiple
> loaded
> > clusters. Some abstract clean up of deployment vs losing data is a
> > no-brainer for me. Master assignment, region split, region merge are all
> > risky, and all places that HBase can lose data. Meta being hosted on the
> > master makes communication easier and less flakey. Running ITBLL on a
> loop
> > that creates a new table every time, and without meta on master
> everything
> > will fail pretty reliably in ~2 days. With meta on master things pass
> MUCH
> > more.
> >
> >
> The above is a problem of branch-1?
>
> The discussion is what to do in 2.0 with the assumption that master state
> would be done up on procedure v2 making most of the transitions now done
> over zk and hbase:meta instead local to the master with only the final
> state published to a remote meta (an RPC but if we can't make RPC work
> reliably in our distributed system, thats a bigger problem).
>
>
But making RPC work for assignment here is precisely the problem.  There's
no reason master should have to contend with user requests to meta in order
to be able to make updates.  And until clients can actually see the change,
it doesn't really matter if the master state has been updated or not.

Sure, we could add more RPC priorities, even more handler pools and
additional queues for master requests to meta vs. user requests to meta.
Maybe with that plus adding in regionserver groups we actually start to
have something that comes close to what we already have today with meta on
master.  But why should we have to add all that complexity?  None of this
is an issue if master updates to meta are local and don't have to go
through RPC.
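
(Some of that plumbing does already exist: the priority handler pool that
services hbase:meta and other high-priority requests is sized separately from
the general pool. The property names below are the commonly used ones and the
defaults vary by version, so treat them as assumptions; they normally live in
hbase-site.xml.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class RpcHandlerTuningSketch {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // General handler pool serving ordinary user reads and writes.
        conf.setInt("hbase.regionserver.handler.count", 60);
        // Separate priority pool serving hbase:meta and other high-priority
        // requests, so meta traffic does not queue behind user traffic.
        conf.setInt("hbase.regionserver.metahandler.count", 20);
        System.out.println(conf.getInt("hbase.regionserver.handler.count", -1)
            + " / " + conf.getInt("hbase.regionserver.metahandler.count", -1));
      }
    }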


>
> > # Master hosting the system tables locates the system tables as close as
> > possible to the machine that will be mutating the data.
> >
> > Data locality is something that we all work for. Short circuit local
> reads,
> > Caching blocks in jvm, etc. Bringing data closer to the interested party
> > has a long history of making things faster and better. Master is in
> charge
> > of just about all mutations of all systems tables. It's in charge of
> > changing meta, changing acls, creating new namespaces, etc. So put the
> > memstore as close as possible to the system that's going to mutate meta.
> >
>
>
> Above is fine except for the bit where we need to be able to field reads.
> Lets distribute the data to be read over the cluster rather than treat meta
> reads with kid gloves hosted on a 'special' server; let these 'reads' be
> like any other read the cluster takes (see next point)
>
>
In my opinion, the real "special" part here is the master bit -- which I
think we should be working to make less special and more just a normal bit
of housekeeping spread across nodes -- not the regionserver role.  It only
looks special right now because the evolution has stopped in the middle.  I
really don't think enshrining master as a separate process is the right way
forward for us.


>
> > # If you want to make meta faster then moving it to other regionservers
> > makes things worse.
> >
> > Meta can get pretty hot. Putting it with other regions that clients will
> be
> > trying to access makes everything worse. It means that meta is competing
> > with user requests. If meta gets served and other requests don't, causing
> > more requests to meta; or requests to user regions get served and other
> > clients get starved.
> > At FB we've seen read throughput to meta doubled or more by swapping it
> to
> > master. Writes to meta are also much faster since there's no rpc hop, no
> > queueing, to fighting with reads. So far it has been the single biggest
> > thing to make meta faster.
> >
> >
> Is this just because meta had a dedicated server?
>
>
I'm sure that having dedicated resources for meta helps.  But I don't think
that's sufficient.  The key is that master writes to meta are local, and do
not have to contend with the user requests to meta.

It seems premature to be discussing dropping a working implementation which
eliminates painful parts of distributed consensus, until we have a complete
working alternative to evaluate.  Until then, why are we looking at dropping
features that are in use and work well?



>