[jira] [Commented] (HBASE-13260) Bootstrap Tables for fun and profit

Matteo Bertozzi (JIRA) Sun, 26 Apr 2015 12:39:52 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-13260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513220#comment-14513220
 ]


Matteo Bertozzi commented on HBASE-13260:
-----------------------------------------

code looks good, and there are some nice cleanups in region/wal code.

but now that I'm testing it, I'm not sure if we should replace the proc-wal with this.
following what stack mentioned, there is too much overhead on the region path compared to the simple wal.
I was expecting the region to perform even better than the simple wal, because the simple wal never had any love: no optimization of any kind, and so on.
but the region looks at least an order of magnitude slower than the simple wal on writes.
keep in mind that with the simple wal it is easy to do optimizations on replay, which is where I expect to get more benefit from using it.
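A minimal sketch of one kind of replay optimization a simple WAL makes easy, assuming (hypothetically) that every entry carries the full serialized state of one procedure: replay only needs the last entry per procedure id, and a delete marker drops the procedure entirely. SimpleProcWal and ProcWalEntry are illustrative names, not the real proc-v2 WAL format, and the in-memory list stands in for the file a real WAL would append to and sync.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entry format: NOT the actual proc-v2 WAL layout.
// Each entry records the full serialized state of one procedure at one point in time.
final class ProcWalEntry {
  final long procId;
  final byte[] state; // null marks a deleted (finished) procedure

  ProcWalEntry(long procId, byte[] state) {
    this.procId = procId;
    this.state = state;
  }
}

final class SimpleProcWal {
  // Stand-in for the on-disk log; a real WAL would append to a file and sync.
  private final List<ProcWalEntry> log = new ArrayList<>();

  // Write path: a plain append, with no memstore, row-key encoding, or flush bookkeeping.
  void append(long procId, byte[] state) {
    log.add(new ProcWalEntry(procId, state));
  }

  void delete(long procId) {
    log.add(new ProcWalEntry(procId, null));
  }

  int size() {
    return log.size();
  }

  // Replay: since every entry carries the full state, later entries supersede
  // earlier ones, so the log collapses into one state per live procedure.
  Map<Long, byte[]> replay() {
    Map<Long, byte[]> live = new LinkedHashMap<>();
    for (ProcWalEntry e : log) {
      if (e.state == null) {
        live.remove(e.procId);
      } else {
        live.put(e.procId, e.state);
      }
    }
    return live;
  }
}
```

Under this assumption the replayed state depends only on the last entry per id, which is also what would let a real implementation skip or compact superseded entries.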

anyway, taking one step back and without looking too much into performance,
the question is: what benefit do we get from using the region instead of a simple wal?
in theory the only benefit is that we can query it with the table interface,
but at this point, with this patch, we could probably wrap the proc-v2 wal in an EmbeddedTable or similar.
the other benefit, of course, is that the region code is probably more tested than the proc-wal, and it already has features like compression/encryption.
the main disadvantage is that the write-side overhead compared to the wal looks too high, and doing optimizations on replay may not be that simple.
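To make the EmbeddedTable idea a bit more concrete, here is a hypothetical sketch of a table-style read API (a point get and a [start, stop) scan) layered over state rebuilt from a simple append-only log. EmbeddedTable and WalBackedTable are illustrative names, not an existing HBase interface, and the log is an in-memory list standing in for the WAL file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical table-style read API; not an HBase interface.
interface EmbeddedTable {
  String get(String row);

  List<String> scan(String startRow, String stopRow); // rows in [startRow, stopRow)
}

final class WalBackedTable implements EmbeddedTable {
  private final List<String[]> log = new ArrayList<>();               // stand-in for the WAL
  private final NavigableMap<String, String> state = new TreeMap<>(); // sorted view for queries

  void put(String row, String value) {
    log.add(new String[] {row, value}); // 1) append to the log (durability point in a real WAL)
    state.put(row, value);              // 2) apply to the in-memory sorted view
  }

  // Rebuild the in-memory view from the log, e.g. after a restart.
  void replay() {
    state.clear();
    for (String[] e : log) {
      state.put(e[0], e[1]);
    }
  }

  @Override
  public String get(String row) {
    return state.get(row);
  }

  @Override
  public List<String> scan(String startRow, String stopRow) {
    return new ArrayList<>(state.subMap(startRow, true, stopRow, false).values());
  }
}
```

In this sketch the query side only reads the replayed, sorted in-memory state, so it needs no memstore flushes or store files; whether that is enough for the real use cases is exactly the open question above.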

I was in favor of replacing the proc-wal with this before testing it, but now I'm no longer sure about it.
so, if you can come up with a list of benefits and disadvantages of using the region vs the proc-wal, it would help everyone decide.
(I'm still +1 on including this patch even without using it as a proc-wal replacement, because it looks useful anyway and it has some nice cleanups)

> Bootstrap Tables for fun and profit 
> ------------------------------------
>
>                 Key: HBASE-13260
>                 URL: https://issues.apache.org/jira/browse/HBASE-13260
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: hbase-13260_bench.patch, hbase-13260_prototype.patch
>
>
> Over at the ProcV2 discussions (HBASE-12439) and elsewhere I was mentioning an
> idea where we may want to use regular old regions to store/persist some data
> needed for the HBase master to operate.
> We regularly use system tables for storing system data. acl, meta, namespace, 
> quota are some examples. We also store the table state in meta now. Some data 
> is persisted in zk only (replication peers and replication state, etc). We 
> are moving away from zk as a permanent storage. As any self-respecting 
> database does, we should store almost all of our data in HBase itself. 
> However, we have an "availability" dependency between different kinds of 
> data. For example all system tables need meta to be assigned first. All 
> master operations need ns table to be assigned, etc. 
> For at least two types of data, (1) procedure v2 states and (2) RS groups in
> HBASE-6721, we cannot depend on meta being assigned, since "assignment" itself
> will depend on accessing this data. The solution in (1) is to implement a
> custom WAL format, plus custom lease recovery and WAL recovery. The solution in
> (2) is to have a table store this data, but also cache it in zk for
> bootstrapping initial assignments.
> For solving both of the above (and possible future use cases, if any), I
> propose we add a "bootstrap table" concept, which is:
>  - A set of predefined tables hosted in a separate dir in HDFS. 
>  - A table is only 1 region, not splittable 
>  - Not assigned through regular assignment 
>  - Hosted only on 1 server (typically master)
>  - Has a dedicated WAL. 
>  - A service does WAL recovery + fencing for these tables. 
> This has the benefit of using a region to keep the data, but frees us from
> re-implementing caching, and we can use the same WAL / Memstore / Recovery
> mechanisms that are battle-tested.
>  
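A hypothetical sketch of the proposed bootstrap table constraints and of the "WAL recovery + fencing" service. None of these classes exist in HBase. The fencing scheme (a monotonically increasing writer epoch checked on every append) is an assumption chosen for illustration, standing in for the lease recovery mentioned above, and the directory path in the test usage is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical descriptor capturing the proposed constraints; not an HBase class.
final class BootstrapTableSpec {
  final String name;
  final String hdfsDir;             // separate dir in HDFS, outside the regular table layout
  final boolean splittable = false; // always exactly one region
  final String walName;             // dedicated WAL, not shared with user regions

  BootstrapTableSpec(String name, String hdfsDir) {
    this.name = name;
    this.hdfsDir = hdfsDir;
    this.walName = name + ".wal";
  }
}

// Hypothetical dedicated WAL with epoch-based fencing (an illustrative assumption).
final class BootstrapWal {
  private long epoch = 0; // epoch of the only writer allowed to append
  private final List<String> edits = new ArrayList<>();

  // Called when a new host (e.g. a newly active master) takes over the table:
  // bumping the epoch fences off any writer still holding an older epoch.
  long fenceAndTakeOver() {
    return ++epoch;
  }

  void append(long writerEpoch, String edit) {
    if (writerEpoch != epoch) {
      throw new IllegalStateException("fenced: writer epoch " + writerEpoch + " != " + epoch);
    }
    edits.add(edit);
  }

  // Recovery: the new owner replays every edit that was accepted before it took over.
  List<String> replay() {
    return new ArrayList<>(edits);
  }
}
```

The point of the sketch is the ordering: the new owner fences first and only then replays, so no edit from a stale host can land after recovery has read the log.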



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
