[
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864989#action_12864989
]
Alex Newman commented on HBASE-1364:
------------------------------------
People have asked about my design, so here is a quick status report and a
summary of what I am doing.
Design:
When the HMaster calls splitLog, under the hood it just enqueues a znode
created as PERSISTENT_SEQUENTIAL, which gives us a quick queue. It then blocks
until that queue is drained.
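
To make the queue-and-block behaviour concrete, here is a minimal in-memory
sketch. It is not the patch and does not use the ZooKeeper client: the class
SplitQueue, its method names, and the "log-" node prefix are hypothetical, and a
threading.Condition stands in for the watch the master would set on the queue
znode.

```python
import threading


class SplitQueue:
    """In-memory stand-in for a parent znode holding sequential children."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_seq = 0
        self._entries = {}  # node name -> log path

    def enqueue(self, log_path):
        # Mimics a sequential create: a monotonically increasing,
        # zero-padded counter is appended to the node name.
        with self._cond:
            name = "log-%010d" % self._next_seq
            self._next_seq += 1
            self._entries[name] = log_path
            return name

    def pending(self):
        # Sorted names give the queue order.
        with self._cond:
            return sorted(self._entries)

    def remove(self, name):
        # A worker deletes its entry once the split is finished.
        with self._cond:
            self._entries.pop(name, None)
            if not self._entries:
                self._cond.notify_all()

    def wait_until_drained(self, timeout=None):
        # The master's splitLog call would block here.
        with self._cond:
            return self._cond.wait_for(lambda: not self._entries, timeout)


queue = SplitQueue()
names = [queue.enqueue(p) for p in ("hlog.1", "hlog.2")]

# A single worker thread that "splits" every pending log and removes it.
worker = threading.Thread(
    target=lambda: [queue.remove(n) for n in queue.pending()]
)
worker.start()
drained = queue.wait_until_drained(timeout=5)
worker.join()
```

If no worker ever removes the entries, wait_until_drained only returns when the
timeout expires, which is the blocked-master situation described further down.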
RegionServers now have a LogSplitter thread, which is responsible for doing the
legwork of splitting a log. These threads watch the znode queue, so when an
update happens and they are not already doing work, they attempt to claim
responsibility for working on that log.
RegionServers take responsibility by writing an ephemeral node under a separate
znode, which acts as a lock, preventing other regionservers from attempting to
split that same log. Technically this could still happen if a regionserver
pauses in GC for so long that its ephemeral node expires, in which case we rely
on the fact that log splits are idempotent.
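
The claim step and the reliance on idempotence can be sketched as follows. This
is a single-threaded in-memory model under stated assumptions, not the patch:
LockTable, split_log, and the (region, sequence number) output key are
hypothetical names chosen for illustration, and try_claim stands in for an
ephemeral create that fails when the node already exists.

```python
class LockTable:
    """In-memory stand-in for the lock znode and its ephemeral children."""

    def __init__(self):
        self._owners = {}  # queue entry name -> owning regionserver

    def try_claim(self, entry, server):
        # Mimics an atomic create: it fails if the node already exists.
        if entry in self._owners:
            return False
        self._owners[entry] = server
        return True

    def expire_session(self, server):
        # Ephemeral nodes vanish when the owner's session expires,
        # for example after a very long GC pause.
        for entry in [e for e, s in self._owners.items() if s == server]:
            del self._owners[entry]

    def release(self, entry, server):
        # Only the current owner may drop the lock.
        if self._owners.get(entry) == server:
            del self._owners[entry]


def split_log(edits, output):
    # Idempotent split (assumed keying): output is keyed by
    # (region, sequence number), so replaying the same log
    # rewrites the same entries instead of duplicating them.
    for region, seq, value in edits:
        output[(region, seq)] = value


locks = LockTable()
first = locks.try_claim("log-0000000000", "rs1")   # rs1 takes the lock
second = locks.try_claim("log-0000000000", "rs2")  # rs2 is refused
locks.expire_session("rs1")                        # rs1 stalls, lock vanishes
third = locks.try_claim("log-0000000000", "rs2")   # rs2 now claims the same log

edits = [("regionA", 1, "x"), ("regionB", 2, "y")]
output = {}
split_log(edits, output)   # work done by rs1 before it stalled
once = dict(output)
split_log(edits, output)   # the same log split again by rs2
```

After both passes the output equals what a single pass produced, which is the
property that makes the double split harmless.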
The resultant per-region logs are named sn_oldlog.log, where sn is the first
sequence number in that log (maybe it should be the last). When a split is
done, I delete the queue entry and then the lock.
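
As an illustration of the naming scheme, the sketch below groups edits by region
and names each output after its first sequence number. The function name, the
(region, sequence number) tuple shape, and the unpadded decimal formatting of sn
are assumptions; taking the minimum as "first" assumes sequence numbers only
increase within a log.

```python
from collections import defaultdict


def per_region_log_names(edits):
    """Map each region to '<sn>_oldlog.log', where sn is its first sequence number."""
    by_region = defaultdict(list)
    for region, seq in edits:
        by_region[region].append(seq)
    # Assumption: the lowest sequence number is the first one written.
    return {
        region: "%d_oldlog.log" % min(seqs)
        for region, seqs in by_region.items()
    }


log_names = per_region_log_names([("r1", 7), ("r2", 3), ("r1", 9)])
```

Switching to the last sequence number, as floated above, would only mean
replacing min with max.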
The main remaining issue is that if the whole cluster goes down, the
regionserver threads are not started until they are associated with a master,
so the LogSplitter never runs and the master stays blocked waiting for the
queue to drain.
Status:
▪ I need to rejigger some stuff to handle an entire cluster
losing power (kill -9)
▪ More testing infrastructure is needed
▪ I am thinking about incorporating HBASE-2437
> [performance] Distributed splitting of regionserver commit logs
> ---------------------------------------------------------------
>
> Key: HBASE-1364
> URL: https://issues.apache.org/jira/browse/HBASE-1364
> Project: Hadoop HBase
> Issue Type: Improvement
> Reporter: stack
> Assignee: Alex Newman
> Priority: Critical
> Fix For: 0.21.0
>
>
> HBASE-1008 has some improvements to our log splitting on regionserver crash;
> but it needs to run even faster.
> (Below is from HBASE-1008)
> In bigtable paper, the split is distributed. If we're going to have 1000
> logs, we need to distribute or at least multithread the splitting.
> 1. As is, regions starting up expect to find one reconstruction log only.
> Need to make it so pick up a bunch of edit logs and it should be fine that
> logs are elsewhere in hdfs in an output directory written by all split
> participants whether multithreaded or a mapreduce-like distributed process
> (Lets write our distributed sort first as a MR so we learn whats involved;
> distributed sort, as much as possible should use MR framework pieces). On
> startup, regions go to this directory and pick up the files written by split
> participants deleting and clearing the dir when all have been read in. Making
> it so can take multiple logs for input, can also make the split process more
> robust rather than current tenuous process which loses all edits if it
> doesn't make it to the end without error.
> 2. Each column family rereads the reconstruction log to find its edits. Need
> to fix that. Split can sort the edits by column family so store only reads
> its edits.