Very late to the party....
IMHO having the master do only gardening and not become part of the user
access path is a good design and something we should stick to. It's good
separation of concerns (i.e. it keeps gardening tasks isolated from the user
workload).
> we double assign and lose data.
Given that meta has only a single writer/manager (i.e. the master), IMHO this
is more about having a clean state machine than about writes to a region being
remote. We should be able to remain in a good state in the event of write
failures; after all, even writes to the filesystem involve remote writes.
> Running ITBLL on a loop that creates a new table every time, and without meta 
>on master everything will fail pretty reliably in ~2 days.
This is interesting, I'll give it a try. Just run the generator for 2 days,
creating a new table every time? Do I drop the old one?
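For my own notes, something along these lines is what I'd run (the Generator
arguments and the default table name are my guesses from memory, so I'll check
the usage output of the build I test against):

  # run the generator repeatedly; drop the table between runs so each
  # iteration has to create it again
  for i in $(seq 1 48); do
    hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList \
        Generator 5 2500000 /tmp/itbll-run-$i
    echo "disable 'IntegrationTestBigLinkedList'; drop 'IntegrationTestBigLinkedList'" \
        | hbase shell
  done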
> Short circuit local reads, Caching blocks in jvm, etc. Bringing data closer 
>to the interested party has a long history of making things faster and better.
AFAIK all the metadata the master needs is already cached in memory during
startup; it does not require meta to be on the master.
> Master is in charge of just about all mutations of all systems tables.
Locality is not as useful here; writes still end up being remote by virtue of
HDFS.
> At FB we've seen read throughput to meta doubled or more by swapping it to 
>master. Writes to meta are also much faster since there's no rpc hop, no 
>queueing, to fighting with reads. So far it has been the single biggest thing 
>to make meta faster.
This can be addressed with region server groups. :-) In this case that's
pretty much what you're doing: having a special region server serve the system
tables, isolating them from the user tables. The upside is that with groups
you can have more than one "system" region server. This is how we do things
internally, so we've never seen user region access interfere with meta.
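To make the setup concrete, here's a rough sketch using the rsgroup shell
commands from the HBASE-6721 patch (the group name and hostnames are made up,
and the exact command set may differ a bit depending on which version of the
patch you're running):

  # create a dedicated group and move a couple of region servers into it
  add_rsgroup 'system'
  move_servers_rsgroup 'system', ['rs1.example.com:16020', 'rs2.example.com:16020']
  # pin the system tables (including hbase:meta) to that group
  move_tables_rsgroup 'system', ['hbase:meta', 'hbase:namespace']

User tables stay in the default group, so a hot user region can't crowd out
the servers carrying meta.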
> For example, it makes it possible to cold-start HBase under load, where a 
>non-meta-serving master is never able to successfully complete initialization.
Is this problem because meta is affected by user region workloads? If so,
region server groups should help here as well.
> If we really think that normal master housekeeping functions are work enough 
>that we shouldn't combine with region serving, then why do we think that those 
>will _not_ have to be scaled by splitting the metadata space across multiple 
>servers when we encounter meta-scaling issues that require splitting meta to 
>distribute it across multiple servers?  
Based on our tests, a single master (without meta) is fine handling a few
million regions; the bottlenecks are elsewhere (i.e. updating meta).

On Tuesday, April 12, 2016 11:55 AM, Gary Helmling <ghelml...@gmail.com> wrote:

 >
> # Without meta on master, we double assign and lose data.
>
> I doubt meta on master solve this problem.
> This has more to do on the fact that balancer, assignment, split, merge
> are disjoint operations that are not aware of each other.
> also those operation in general consist of multiple steps and if the master
> crashes you may end up in an inconsistent state.
>
>
Meta-on-master does dramatically improve things.  For example, it makes it
possible to cold-start HBase under load, where a non-meta-serving master is
never able to successfully complete initialization.  This is the difference
between a cluster being able to come to a healthy state vs. one that is
never able to complete assignments, communicate those assignments to
clients and come to a steady state.


> this is what proc-v2 should solve. since we are aware of each operation
> there is no chance of double assignment and similar by design.
>
>
Again, I think it is difficult to compare an existing feature that is
working in production use vs. one that is actively being developed in
master.

Preventing double assignment sounds great.  What happens when the update of
meta to communicate this to clients fails?  So long as meta is served
elsewhere you still have distributed state.

Until we have an alternative that is feature complete and has demonstrated
success and stability in production use, I don't see how we can even
propose removing a feature that is solving real problems.

I also think that this proposed direction will amplify our release problems
and get us further away from regular, incremental releases.  Master will
remain unreleaseable indefinitely until proc v2 development is finished,
and even initial releases will have problems that need to be ironed out.
Ironing out issues in initial releases is not unexpected, but by removing
the existing solution we would be forcing a big-bang approach where everything
has to work before anyone can move over to 2.0, which increases pressure
for users to stay on 1.x releases, which increases pressure to backport
features and brings us closer to the Hadoop way.  I would much rather see
us working on incrementally improving what we have and proving out new
solutions piece by piece.