[
https://issues.apache.org/jira/browse/HBASE-13260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507870#comment-14507870
]
Enis Soztutar commented on HBASE-13260:
---------------------------------------
bq. The queue is bounded and its size is set by count of handlers (queue
doesn't need to be larger than count of handlers). Is it possible that this
config is warped in this context?
Thanks, that makes sense. I was testing with 150 threads at some point, while
the queue was initialized with numHandlers = 5 * 3 = 15. The 5 comes from the
mini cluster test config.
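To illustrate the mismatch, here is a minimal sketch in plain Java, not the code from the patch. It assumes a queue bounded at 5 * 3 = 15 slots and 150 submitters, and counts how many submissions fit without waiting when nothing drains the queue. The class and method names are hypothetical, and the real queue may block instead of rejecting.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedQueueSketch {
    // Counts how many of `submitters` non-blocking offers fit into a queue
    // of the given capacity when no consumer is draining it.
    static int acceptedWithoutBlocking(int capacity, int submitters) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < submitters; i++) {
            // offer() returns false instead of blocking when the queue is full.
            if (queue.offer(i)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        int capacity = 5 * 3; // numHandlers as computed in the comment above
        System.out.println(acceptedWithoutBlocking(capacity, 150));
    }
}
```

With 150 threads against 15 slots, most submitters would have to wait on the queue, so the benchmark ends up measuring queue contention as much as the store itself.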
bq. bug and optimization aside one major hit is on hsync used by default
instead of hflush used by the region wal
It is a valid point. I think we should do the hsync anyway for regular users.
Since we are using hflush for meta already, procedures are no different from
that. In any case, I've added {{put.setDurability(Durability.FSYNC_WAL);}} to
the procedure store so that once the WAL supports hsync we will automatically
start using it.
bq. I looked into the code and found a couple of interesting things to fix see
HBASE-13529.
Looks good. Numbers much better.
bq. when I started with the wal there were a couple of obvious shortcut.
With procs, we have a custom in-memory representation + WAL. For doing all
those tricks, a custom memstore + custom flush and custom WAL replay is needed.
bq. procedure are short lived, we probably don't need compaction but a TTL
expire should be ok.
Actually, a custom flusher that skips deleted cells and their delete
tombstones should do the trick. The region will fill up its memstore, but the
flush will write very little data (only the non-deleted procedures).
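A rough sketch of that flush filter in plain Java follows. It is not HBase code: {{ProcCell}}, {{CellType}} and {{cellsToFlush}} are hypothetical names, versions and timestamps are ignored, and it assumes every cell of a deleted procedure is still in the memstore (otherwise the tombstone could not be dropped safely).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ProcedureFlushSketch {
    // Hypothetical cell model, much simpler than HBase's Cell.
    enum CellType { PUT, DELETE }

    static final class ProcCell {
        final long procId;
        final CellType type;
        final String state;

        ProcCell(long procId, CellType type, String state) {
            this.procId = procId;
            this.type = type;
            this.state = state;
        }
    }

    // Returns the cells a flush would write: puts whose procedure has no
    // delete tombstone in the memstore. Tombstones themselves are skipped.
    static List<ProcCell> cellsToFlush(List<ProcCell> memstore) {
        Set<Long> deleted = new HashSet<>();
        for (ProcCell cell : memstore) {
            if (cell.type == CellType.DELETE) {
                deleted.add(cell.procId);
            }
        }
        List<ProcCell> toFlush = new ArrayList<>();
        for (ProcCell cell : memstore) {
            if (cell.type == CellType.PUT && !deleted.contains(cell.procId)) {
                toFlush.add(cell);
            }
        }
        return toFlush;
    }
}
```

Since most procedures finish and get deleted before the memstore fills, the list returned here would usually be a small fraction of what the memstore holds.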
bq. if we don't have any procedure running, there is no need to replay on
restart.
This is already the case whether procs are running or not. On a clean shutdown,
WALs are moved to the archive after a flush, so the next master start will not
do any replay. If there are running procedures, the scan in load() will see
them and start executing them. None of the deleted procedures will be seen by
the scan.
bq. if we keep a tracker we can avoid loading completed/removed procedure and
avoid serialize/deserialize.
The procs which are deleted will not be deserialized since the scan in load()
will not see them.
bq. we are able to start some procedure before reading all the wals. The replay
is from the newest wal to the oldest. if we find the first entry (the user
submit) of the procedure and it was in execution we know that we can start it
without waiting on the rest.
WAL replay works by reading the oldest WAL first. Not sure whether we can
change that without major surgery.
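As a simplified picture of why the order matters, here is a sketch of oldest-first replay in plain Java. It is not the HBase replay code: {{WalEdit}} and {{replayOldestFirst}} are hypothetical names, and a null state stands in for a delete. Edits are applied in ascending sequence order, so a later edit simply overwrites or removes what an earlier one wrote.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WalReplaySketch {
    // Hypothetical WAL entry: a null state means the procedure was deleted.
    static final class WalEdit {
        final long seqId;
        final long procId;
        final String state;

        WalEdit(long seqId, long procId, String state) {
            this.seqId = seqId;
            this.procId = procId;
            this.state = state;
        }
    }

    // Applies edits from the lowest sequence id to the highest and returns
    // the surviving procedure states keyed by procedure id.
    static Map<Long, String> replayOldestFirst(List<WalEdit> edits) {
        List<WalEdit> sorted = new ArrayList<>(edits);
        sorted.sort(Comparator.comparingLong((WalEdit e) -> e.seqId));
        Map<Long, String> states = new HashMap<>();
        for (WalEdit edit : sorted) {
            if (edit.state == null) {
                states.remove(edit.procId);
            } else {
                states.put(edit.procId, edit.state);
            }
        }
        return states;
    }
}
```

A newest-first replay could not use this plain overwrite logic; it would have to remember which procedures it had already resolved, which is the kind of change I mean by major surgery.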
bq. We don't need this in now, and I don't want to block this jira for this
stuff. I'm just trying to point out what we can do to optimize this use case,
and see if we can do it this reusing what we have.
Ok great. Let's get this in 1.1 then. Do you mind doing a review?
> Bootstrap Tables for fun and profit
> ------------------------------------
>
> Key: HBASE-13260
> URL: https://issues.apache.org/jira/browse/HBASE-13260
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.1.0
>
> Attachments: hbase-13260_bench.patch, hbase-13260_prototype.patch
>
>
> Over at the ProcV2 discussions (HBASE-12439) and elsewhere I was mentioning an
> idea where we may want to use regular old regions to store/persist some data
> needed for HBase master to operate.
> We regularly use system tables for storing system data. acl, meta, namespace,
> quota are some examples. We also store the table state in meta now. Some data
> is persisted in zk only (replication peers and replication state, etc). We
> are moving away from zk as a permanent storage. As any self-respecting
> database does, we should store almost all of our data in HBase itself.
> However, we have an "availability" dependency between different kinds of
> data. For example all system tables need meta to be assigned first. All
> master operations need ns table to be assigned, etc.
> For at least two types of data, (1) procedure v2 states, (2) RS groups in
> HBASE-6721 we cannot depend on meta being assigned since "assignment" itself
> will depend on accessing this data. The solution in (1) is to implement a
> custom WAL format, and custom recover lease and WAL recovery. The solution in
> (2) is to have the table to store this data, but also cache it in zk for
> bootstrapping initial assignments.
> For solving both of the above (and possible future use cases if any), I
> propose we add a "bootstrap table" concept, which is:
> - A set of predefined tables hosted in a separate dir in HDFS.
> - A table is only 1 region, not splittable
> - Not assigned through regular assignment
> - Hosted only on 1 server (typically master)
> - Has a dedicated WAL.
> - A service does WAL recovery + fencing for these tables.
> This has the benefit of using a region to keep the data, frees us from
> re-implementing caching, and lets us use the same WAL / Memstore / Recovery
> mechanisms that are battle-tested.
>