[
https://issues.apache.org/jira/browse/HBASE-13260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506851#comment-14506851
]
Matteo Bertozzi commented on HBASE-13260:
-----------------------------------------
[~enis] I think you missed from my last comment that I'm pushing to get this
patch in, and I'd like to avoid wasting time comparing the performance of
something that was not optimized for writes and that we are going to throw away.
Nonetheless, I looked into the code and found a couple of interesting things to
fix (see HBASE-13529). Bugs and optimizations aside, one major hit was that the
procedure WAL used hsync by default, instead of the hflush used by the region
WAL. With the patch, the times now look much better (I didn't compare against
the region, only against running without the patch, which is far slower):
Wrote 1000000 procedures with 5 threads with hsync=false in 44.9360sec 44.936sec
Wrote 1000000 procedures with 10 threads with hsync=false in 27.9200sec 27.920sec
Wrote 1000000 procedures with 30 threads with hsync=false in 17.1160sec 17.116sec
Wrote 1000000 procedures with 50 threads with hsync=false in 14.7460sec 14.746sec
Wrote 10000 procedures with 10 threads with hsync=true in 1mins, 47.52sec 107.520sec
Wrote 10000 procedures with 30 threads with hsync=true in 41.8420sec 41.842sec
Wrote 10000 procedures with 50 threads with hsync=true in 26.4210sec 26.421sec
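To put the runs above on a common scale (the hsync=true runs wrote 100x fewer
procedures), here is a small Java sketch that converts each run into
procedures per second. The seconds values are copied from the output above;
the class and record names are only illustrative.

```java
import java.util.List;

public class ProcWalThroughput {
    // One row of the benchmark output: procedures written, writer threads,
    // sync mode, and elapsed wall time in seconds.
    record Run(int procedures, int threads, boolean hsync, double seconds) {
        double procsPerSec() {
            return procedures / seconds;
        }
    }

    static final List<Run> RUNS = List.of(
        new Run(1_000_000, 5, false, 44.936),
        new Run(1_000_000, 10, false, 27.920),
        new Run(1_000_000, 30, false, 17.116),
        new Run(1_000_000, 50, false, 14.746),
        new Run(10_000, 10, true, 107.520),
        new Run(10_000, 30, true, 41.842),
        new Run(10_000, 50, true, 26.421));

    public static void main(String[] args) {
        for (Run r : RUNS) {
            System.out.printf("hsync=%-5b threads=%2d %,10.0f procedures/sec%n",
                r.hsync(), r.threads(), r.procsPerSec());
        }
    }
}
```

At 10 threads this works out to roughly 35,800 procedures/sec with
hsync=false versus roughly 93 with hsync=true, which is consistent with hsync
being the major hit.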
Anyway, back to the real topic.
When I started on the WAL, there were a few obvious shortcuts:
* Procedures are short-lived, so we probably don't need compaction; a TTL
expiry should be enough.
* If no procedure is running, there is no need to replay on restart.
* If we keep a tracker, we can avoid loading completed/removed procedures and
skip serializing/deserializing them.
* We can start some procedures before reading all the WALs. The replay goes
from the newest WAL to the oldest, so if we find the first entry of a
procedure (the user submit) and the procedure was still executing, we know we
can start it without waiting for the rest.
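The tracker and early-start shortcuts could be sketched roughly as follows.
This is a minimal Java sketch under stated assumptions, not the ProcV2 WAL
code: the Entry/Type record layout is hypothetical, the tracker is modeled as
an in-memory set rather than whatever the real WAL persists, and the TTL and
skip-replay-when-idle shortcuts are not shown.

```java
import java.util.*;
import java.util.function.Consumer;

public class NewestFirstReplay {
    enum Type { SUBMIT, UPDATE, DELETE }

    // Hypothetical WAL record: procedure id, record type, serialized state.
    record Entry(long procId, Type type, String state) {}

    /**
     * Replays WAL files given newest first; entries inside each file are in append
     * order, so each file is walked backwards. A procedure is handed to onReady as
     * soon as its SUBMIT entry (its first-ever entry) is reached, carrying the newest
     * state seen for it. Returns procedures whose SUBMIT entry was never found.
     */
    static Collection<Entry> replay(List<List<Entry>> walsNewestFirst, Consumer<Entry> onReady) {
        Set<Long> resolved = new HashSet<>();       // tracker stand-in: deleted or already started
        Map<Long, Entry> latest = new HashMap<>();  // newest state of each pending procedure
        for (List<Entry> wal : walsNewestFirst) {
            for (int i = wal.size() - 1; i >= 0; i--) {
                Entry e = wal.get(i);
                if (resolved.contains(e.procId())) {
                    continue;                       // older history we no longer need
                }
                if (e.type() == Type.DELETE) {
                    resolved.add(e.procId());       // completed/removed: skip all its older entries
                    continue;
                }
                latest.putIfAbsent(e.procId(), e);  // walking backwards, first seen == newest
                if (e.type() == Type.SUBMIT) {
                    // Nothing older exists for this procedure, so it can start now,
                    // without waiting for the remaining (older) WALs.
                    onReady.accept(latest.remove(e.procId()));
                    resolved.add(e.procId());
                }
            }
        }
        return latest.values();
    }
}
```

With two WALs replayed newest first, a procedure submitted in the newest WAL
is handed to onReady before the older WAL is read at all, and a procedure
whose newest entry is a DELETE never has its older entries looked at.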
If possible, it would be nice to get these behaviors in, because they reduce
replay time. The last one, starting procedures before the replay completes,
would really help reduce the AM (AssignmentManager) time on replay; the others
are just shortcuts to avoid reading data we don't need.
Of course this is special logic just for the procedure case, but in theory we
could extend the base Region into a Procedure region and do these kinds of
tricks there. We don't need this now, and I don't want to block this JIRA on
it. I'm just pointing out what we could do to optimize this use case, and
asking whether we can do it by reusing what we have.
> Bootstrap Tables for fun and profit
> ------------------------------------
>
> Key: HBASE-13260
> URL: https://issues.apache.org/jira/browse/HBASE-13260
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.1.0
>
> Attachments: hbase-13260_bench.patch, hbase-13260_prototype.patch
>
>
> Over at the ProcV2 discussions (HBASE-12439) and elsewhere I was mentioning an
> idea where we may want to use regular old regions to store/persist some data
> needed for the HBase master to operate.
> We regularly use system tables for storing system data; acl, meta, namespace,
> and quota are some examples. We also store the table state in meta now. Some data
> is persisted in zk only (replication peers and replication state, etc). We
> are moving away from zk as a permanent storage. As any self-respecting
> database does, we should store almost all of our data in HBase itself.
> However, we have an "availability" dependency between different kinds of
> data. For example all system tables need meta to be assigned first. All
> master operations need ns table to be assigned, etc.
> For at least two types of data, (1) procedure v2 states and (2) RS groups in
> HBASE-6721, we cannot depend on meta being assigned, since "assignment" itself
> will depend on accessing this data. The solution in (1) is to implement a
> custom WAL format, with custom lease recovery and WAL recovery. The solution in
> (2) is to store this data in a table, but also cache it in zk for
> bootstrapping initial assignments.
> For solving both of the above (and possible future use cases, if any), I
> propose we add a "bootstrap table" concept, which is:
> - A set of predefined tables hosted in a separate dir in HDFS.
> - A table is only 1 region, not splittable
> - Not assigned through regular assignment
> - Hosted only on 1 server (typically master)
> - Has a dedicated WAL.
> - A service does WAL recovery + fencing for these tables.
> This has the benefit of using a region to keep the data, which frees us from
> re-implementing caching and lets us use the same WAL / Memstore / Recovery
> mechanisms that are battle-tested.
>
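One way to picture the proposal is a single-region store with its own WAL,
where opening the region on a (new) master first fences the previous owner and
then replays the WAL before accepting writes. The sketch below is hypothetical:
fencing is modeled with a monotonically increasing epoch instead of HDFS lease
recovery, the WAL and memstore are in-memory stand-ins, the "rsgroup:default"
key is made up, and none of the names are HBase APIs.

```java
import java.util.*;

public class BootstrapRegionSketch {
    // Hypothetical WAL edit: a single key/value put.
    record Edit(String key, String value) {}

    // In-memory stand-in for the dedicated WAL directory on HDFS.
    static final class Wal {
        final List<Edit> edits = new ArrayList<>();
        long ownerEpoch = 0; // epoch of the most recent owner to open this WAL
    }

    private final Wal wal;
    private final long epoch;
    private final NavigableMap<String, String> memstore = new TreeMap<>();

    private BootstrapRegionSketch(Wal wal, long epoch) {
        this.wal = wal;
        this.epoch = epoch;
    }

    // Open on a (new) master: fence the previous owner, then replay the WAL.
    static BootstrapRegionSketch open(Wal wal) {
        wal.ownerEpoch++; // any older owner is now fenced
        BootstrapRegionSketch region = new BootstrapRegionSketch(wal, wal.ownerEpoch);
        for (Edit e : wal.edits) {
            region.memstore.put(e.key(), e.value()); // WAL recovery
        }
        return region;
    }

    void put(String key, String value) {
        if (epoch != wal.ownerEpoch) {
            throw new IllegalStateException("fenced: epoch " + epoch + " < " + wal.ownerEpoch);
        }
        wal.edits.add(new Edit(key, value)); // log first ...
        memstore.put(key, value);            // ... then apply
    }

    Optional<String> get(String key) {
        return Optional.ofNullable(memstore.get(key));
    }
}
```

A real implementation would reuse the existing region, WAL, and recovery code
rather than these stand-ins; the sketch only shows the order of operations:
fence, replay, then accept writes.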
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)