[jira] [Commented] (HBASE-11165) Scaling so cluster can host 1M regions and beyond (50M regions?)

stack (JIRA) Tue, 19 Aug 2014 13:19:50 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102764#comment-14102764
 ]


stack commented on HBASE-11165:
-------------------------------

bq. If split meta, then 1) Less write amplification (ie no large compactions) 
...

Good point. If we want to move to lots of small regions, it would be odd
if there was an "except for meta" clause.

bq. Better W throughput.

If the Master is the only writer, we'd need to ensure we are writing in
parallel (see Virag's recent patches).

bq. 2) More disks, more R/W throughput.

Yes.

bq. More heap to fit meta...

More heap to cache meta, yes.

bq. ...We need to do experiments for 1 rack and 2 rack failure...

Agreed that at a time of catastrophic partial failure, we'd need the better
R/W throughput a split meta can give.

Another plus is that we would treat meta like any other table. The negatives
are that we need our root back and that startup is more complicated (though at
least it all happens inside a single master in this case).

In 
https://docs.google.com/document/d/1xC-bCzAAKO59Xo3XN-Cl6p-5CM_4DMoR-WpnkmYZgpw/edit#
 I (and others) look at the options and argue for colocated meta and master
going forward. Let me freshen it with the arguments made here.

Colocating meta and master has nice properties. The in-memory image of the
cluster layout -- probably a severe subset of what is actually in meta --
would need to fit in a single server's RAM in either model.  When colocated,
operations are faster and less error-prone because less RPC is involved (we'd
still be subject to
http://writings.quilt.org/2014/05/12/distributed-systems-and-the-end-of-the-api/
 if persisting meta in hdfs, as Francis notes above).  A single machine hosting
a single meta would not be able to service a 50M-region startup with hundreds
of regionservers as well as a deploy with split meta could.  It could do it;
it'd just be slower.  Colocated meta and master implies a single meta forever,
served by one server only -- a meta region covering 50M regions would be an
anomaly in the cluster, being bigger than all the rest -- at least until we
have HBASE-10295 "Refactor the replication implementation to eliminate
permanent zk node" and/or HBASE-11467 "New impl of Registry interface not using
ZK + new RPCs on master protocol" (maybe a later phase of HBASE-10070, when
followers can run closer in to the leader state, would work here), or a new
master layout where we partition meta across multiple master servers.
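
To put rough numbers on the "fit in a single server's RAM" point, here is a
back-of-envelope sketch in Python. The per-region byte figure, the server
count, and the function names are assumptions made up for illustration, not
measurements from this issue or its attachments; substitute real numbers from
a cluster before drawing any conclusion.

```python
# Back-of-envelope sizing of a meta image. Every constant is an assumption.

def meta_image_bytes(regions: int, bytes_per_region: int) -> int:
    """Total bytes needed to hold one entry per region."""
    return regions * bytes_per_region


def per_server_bytes(total_bytes: int, servers: int) -> float:
    """Share each server carries if meta is split evenly across servers."""
    return total_bytes / servers


ASSUMED_BYTES_PER_REGION = 1024  # hypothetical; depends on what is kept per region
ASSUMED_META_SERVERS = 10        # hypothetical split-meta fan-out

for regions in (1_000_000, 50_000_000):
    total = meta_image_bytes(regions, ASSUMED_BYTES_PER_REGION)
    share = per_server_bytes(total, ASSUMED_META_SERVERS)
    print(f"{regions:,} regions: single meta {total / 2**30:.2f} GiB, "
          f"split {ASSUMED_META_SERVERS} ways {share / 2**30:.2f} GiB per server")
```

The single-meta figure is what one server must hold in the colocated model;
the per-server figure is the share each server would carry under an even split.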

A plus that split meta has over colocated master and meta is that, currently,
the master can be down for some period of time and the cluster keeps working:
there are no splits and no merges, and if a machine crashes while the master is
down, its data is offline until the master comes back (this needs more
exercise).  This is less the case when master and meta are colocated.

Please pile on, all, with thoughts. We need to put a stake in the ground soon
for the hbase 2.0 cluster topology.  Francis needs something in the 0.98
timeframe.  If what goes into 0.98 is different from what folks want for 2.0,
then as per Andy, let's split this issue.

Thoughts-for-the-day:

+ HBase is supposed to be able to scale
+ Single meta came about because way back, we were too lazy to fix issues that 
arose when meta was split (at the time, we didn't need to scale as much).

> Scaling so cluster can host 1M regions and beyond (50M regions?)
> ----------------------------------------------------------------
>
>                 Key: HBASE-11165
>                 URL: https://issues.apache.org/jira/browse/HBASE-11165
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: stack
>         Attachments: HBASE-11165.zip, Region Scalability test.pdf, 
> zk_less_assignment_comparison_2.pdf
>
>
> This discussion issue comes out of "Co-locate Meta And Master HBASE-10569" 
> and comments on the doc posted there.
> A user -- our Francis Liu -- needs to be able to scale a cluster to do 1M 
> regions maybe even 50M later.  This issue is about discussing how we will do 
> that (or if not 50M on a cluster, how otherwise we can attain same end).
> More detail to follow.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
