[ 
https://issues.apache.org/jira/browse/HBASE-13260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507870#comment-14507870
 ] 

Enis Soztutar commented on HBASE-13260:
---------------------------------------

bq. The queue is bounded and its size is set by count of handlers (queue 
doesn't need to be larger than count of handlers). Is it possible that this 
config is warped in this context? 
Thanks, it makes sense. I was testing with 150 threads at some point, and the 
queue is initialized with numHandlers = 5 * 3 = 15. The 5 comes from the mini 
cluster test config. 
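
A minimal sketch of the sizing mismatch described above, using only the JDK. The class and method names are hypothetical and not taken from HBase, and the meaning of the factor 3 is not stated in this thread, so it appears only as a generic multiplier. With no consumer draining the queue, only 15 of 150 non-blocking offers can be accepted:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration; none of these names come from HBase.
class BoundedQueueSketch {
    // Capacity follows the arithmetic in the comment: handlers * multiplier.
    static int capacity(int numHandlers, int multiplier) {
        return numHandlers * multiplier;
    }

    // Offers one item per writer without blocking and counts how many
    // the bounded queue accepts while nothing is draining it.
    static int acceptedOffers(int queueCapacity, int writers) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(queueCapacity);
        int accepted = 0;
        for (int i = 0; i < writers; i++) {
            if (queue.offer(i)) {
                accepted++;
            }
        }
        return accepted;
    }
}
```

In a real store the extra writers would presumably wait rather than be rejected, but the ceiling on queued work is the same.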
bq. bug and optimization aside one major hit is on hsync used by default 
instead of hflush used by the region wal
It is a valid point. I think we should do the hsync anyway for regular users. 
Since we are using hflush for meta already, procedures are not different than 
that. In any case, I've added {{put.setDurability(Durability.FSYNC_WAL);}} to 
the procedure store so that once hsync support is available we will 
automatically start using it. 
bq. I looked into the code and found a couple of interesting things to fix see 
HBASE-13529. 
Looks good. Numbers much better. 

bq. when I started with the wal there were a couple of obvious shortcut.
With procs, we have a custom in-memory representation + WAL. For doing all 
those tricks, a custom memstore + custom flush and custom WAL replay is needed. 
 
bq. procedure are short lived, we probably don't need compaction but a TTL 
expire should be ok.
Actually, a custom flusher which does not flush a deleted cell and delete 
tombstone should do the trick. The region will fill up its memstore, but the 
flush will write very little data (only the non-deleted procedures). 
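
A rough sketch of the flusher idea above, in plain Java. Everything here is hypothetical (the `Cell` model, the `Type` enum, the `flush` method) and is not the HBase flusher API. It also simplifies by treating a delete as covering every cell of that procedure, regardless of ordering or timestamps:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model; not the HBase memstore or flusher API.
class FlushSketch {
    enum Type { PUT, DELETE }

    static final class Cell {
        final long procId;
        final Type type;
        final String state;

        Cell(long procId, Type type, String state) {
            this.procId = procId;
            this.type = type;
            this.state = state;
        }
    }

    // Returns the cells a flush would write: puts for procedures that have
    // no delete marker. Deleted cells and the markers themselves are dropped.
    static List<Cell> flush(List<Cell> memstore) {
        Set<Long> deleted = new HashSet<>();
        for (Cell c : memstore) {
            if (c.type == Type.DELETE) {
                deleted.add(c.procId);
            }
        }
        List<Cell> flushed = new ArrayList<>();
        for (Cell c : memstore) {
            if (c.type == Type.PUT && !deleted.contains(c.procId)) {
                flushed.add(c);
            }
        }
        return flushed;
    }
}
```

With mostly short-lived procedures, the flushed list stays small even when the memstore itself is full.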
bq. if we don't have any procedure running, there is no need to replay on 
restart.
This is already the case whether procs are running or not. On a clean shutdown, 
WALs are moved to archive after a flush. The next master start will not do any 
replay. If there are running procedures, the scan in load() will see these and 
start executing. None of the deleted procedures will be seen by the scan. 
bq. if we keep a tracker we can avoid loading completed/removed procedure and 
avoid serialize/deserialize.
The procs which are deleted will not be deserialized since the scan in load() 
will not see them. 
bq. we are able to start some procedure before reading all the wals. The replay 
is from the newest wal to the oldest. if we find the first entry (the user 
submit) of the procedure and it was in execution we know that we can start it 
without waiting on the rest.
WAL replay works by reading the oldest WAL first. Not sure whether we can 
change that without major surgery. 
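
To illustrate why the order matters, here is a simplified oldest-first replay in plain Java. The `WalFile` and `Edit` types and the sequence-id ordering are assumptions made for this sketch, not the HBase WAL classes. Later edits overwrite or remove earlier ones, so the final state is only known once the newest file has been applied:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of oldest-first replay; not the HBase WAL classes.
class ReplaySketch {
    static final class Edit {
        final long procId;
        final String state; // null marks a delete

        Edit(long procId, String state) {
            this.procId = procId;
            this.state = state;
        }
    }

    static final class WalFile {
        final long seqId; // lower id means older file (an assumption here)
        final List<Edit> edits;

        WalFile(long seqId, List<Edit> edits) {
            this.seqId = seqId;
            this.edits = edits;
        }
    }

    // Applies the files from oldest to newest and returns the live procedures.
    static Map<Long, String> replay(List<WalFile> wals) {
        List<WalFile> ordered = new ArrayList<>(wals);
        ordered.sort(Comparator.comparingLong(w -> w.seqId));
        Map<Long, String> live = new TreeMap<>();
        for (WalFile wal : ordered) {
            for (Edit e : wal.edits) {
                if (e.state == null) {
                    live.remove(e.procId);
                } else {
                    live.put(e.procId, e.state);
                }
            }
        }
        return live;
    }
}
```

A newest-first scheme would need some way to know that an entry is final before the older files are read, which this model does not provide.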
bq. We don't need this in now, and I don't want to block this jira for this 
stuff. I'm just trying to point out what we can do to optimize this use case, 
and see if we can do it this reusing what we have.
Ok great. Let's get this in 1.1 then. Do you mind doing a review? 
 




> Bootstrap Tables for fun and profit 
> ------------------------------------
>
>                 Key: HBASE-13260
>                 URL: https://issues.apache.org/jira/browse/HBASE-13260
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: hbase-13260_bench.patch, hbase-13260_prototype.patch
>
>
> Over at the ProcV2 discussions (HBASE-12439) and elsewhere I was mentioning an 
> idea where we may want to use regular old regions to store/persist some data 
> needed for HBase master to operate. 
> We regularly use system tables for storing system data. acl, meta, namespace, 
> quota are some examples. We also store the table state in meta now. Some data 
> is persisted in zk only (replication peers and replication state, etc). We 
> are moving away from zk as a permanent storage. As any self-respecting 
> database does, we should store almost all of our data in HBase itself. 
> However, we have an "availability" dependency between different kinds of 
> data. For example all system tables need meta to be assigned first. All 
> master operations need ns table to be assigned, etc. 
> For at least two types of data, (1) procedure v2 states, (2) RS groups in 
> HBASE-6721 we cannot depend on meta being assigned since "assignment" itself 
> will depend on accessing this data. The solution in (1) is to implement a 
> custom WAL format, and custom recover lease and WAL recovery. The solution in 
> (2) is to have the table to store this data, but also cache it in zk for 
> bootstrapping initial assignments. 
> For solving both of the above (and possible future use cases if any), I 
> propose we add a "bootstrap table" concept, which is: 
>  - A set of predefined tables hosted in a separate dir in HDFS. 
>  - A table is only 1 region, not splittable 
>  - Not assigned through regular assignment 
>  - Hosted only on 1 server (typically master)
>  - Has a dedicated WAL. 
>  - A service does WAL recovery + fencing for these tables. 
> This has the benefit of using a region to keep the data, but frees us from 
> re-implementing caching and we can use the same WAL / Memstore / Recovery 
> mechanisms that are battle-tested. 
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
