[
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849086#action_12849086
]
Lars George commented on HBASE-1364:
------------------------------------
@stack: I made the same suggestion, to use Coprocessors, back when we discussed
this initially. Andrew said that it would not make sense because CPs can only
access local data, not random DFS files, because of security concerns. I gave up
on that point, knowing I had much less insight into how things work. But there
seems to be more interest now, apparently :)
The presorting of the log into pieces per region was also discussed here (or in
a related issue). One idea was to use ZooKeeper to publish the ranges to be
processed and have the RSs work on those in a distributed fashion until they
all report back that sorting is done. Then the regions can read their edits
locally. The other notion here (pretty sure it was JD's) was that an index file
kept with the log would tell us which regions are dirty, so that unedited
regions can go live right away. Starting to accept writes before the region is
up seems like a doable technique as well. Not sure about the implications (say,
what if they write with no WAL, or even with WAL for that matter I guess, and
you need to flush while the region is still in that "replay" state, etc.).
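
To make the two ideas above concrete, here is a minimal sketch in Python (not
HBase code). It assumes a hypothetical Edit record with a sequence id and a
region name; the real log format and any ZooKeeper coordination are left out.
It only shows presorting a log into per-region pieces and deriving a "dirty
region" index, so that regions with no edits could be opened without waiting.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class Edit(NamedTuple):
    """Hypothetical stand-in for one log entry; not HBase's real log format."""
    seq_id: int
    region: str
    family: str
    row: str
    value: str


def split_by_region(log: Iterable[Edit]) -> Dict[str, List[Edit]]:
    """Group edits per region, keeping each region's edits in sequence order."""
    pieces: Dict[str, List[Edit]] = defaultdict(list)
    for edit in log:
        pieces[edit.region].append(edit)
    for edits in pieces.values():
        edits.sort(key=lambda e: e.seq_id)
    return dict(pieces)


def dirty_regions(log: Iterable[Edit]) -> Set[str]:
    """The 'index' idea: the set of regions that have at least one edit."""
    return {edit.region for edit in log}


def regions_ready_immediately(all_regions: Iterable[str],
                              dirty: Set[str]) -> List[str]:
    """Regions with no pending edits could go live without any replay."""
    return [region for region in all_regions if region not in dirty]
```

In this sketch the index is recomputed from the log; the suggestion in the
thread is to maintain it alongside the log so that no scan is needed at
recovery time.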
I cannot recall the BT approach you refer to, though, i.e. that the regions
are already online during log splitting. It sounded like they do the
distributed split and then have the regions read those edits. I need to read
this again, it seems.
But seems like there is a lot of room for improvement there. ;)
> [performance] Distributed splitting of regionserver commit logs
> ---------------------------------------------------------------
>
> Key: HBASE-1364
> URL: https://issues.apache.org/jira/browse/HBASE-1364
> Project: Hadoop HBase
> Issue Type: Improvement
> Reporter: stack
> Priority: Critical
> Fix For: 0.21.0
>
>
> HBASE-1008 has some improvements to our log splitting on regionserver crash;
> but it needs to run even faster.
> (Below is from HBASE-1008)
> In the Bigtable paper, the split is distributed. If we're going to have 1000
> logs, we need to distribute or at least multithread the splitting.
> 1. As is, regions starting up expect to find one reconstruction log only.
> Need to make it so they pick up a bunch of edit logs, and it should be fine
> that the logs are elsewhere in HDFS, in an output directory written by all
> split participants, whether multithreaded or a mapreduce-like distributed
> process. (Let's write our distributed sort first as an MR job so we learn
> what's involved; the distributed sort should, as much as possible, use MR
> framework pieces.) On startup, regions go to this directory and pick up the
> files written by split participants, deleting and clearing the dir when all
> have been read in. Making it so regions can take multiple logs for input can
> also make the split process more robust than the current tenuous process,
> which loses all edits if it doesn't make it to the end without error.
> 2. Each column family rereads the reconstruction log to find its edits. Need
> to fix that. Split can sort the edits by column family so each store only
> reads its own edits.
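
The two points in the issue description can be sketched as follows. This is a
rough illustration in Python, not HBase code: it assumes each split participant
writes a file of edits for the region already sorted by sequence id, and it
models an edit as a hypothetical (seq_id, family, row, value) tuple. The region
merges all such files in sequence order, then buckets the result by column
family so that each store reads only its own edits.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical edit layout for illustration: (seq_id, family, row, value).
EditT = Tuple[int, str, str, str]


def merge_split_outputs(files: Iterable[List[EditT]]) -> List[EditT]:
    """Merge several per-participant files, each already sorted by seq_id."""
    return list(heapq.merge(*files, key=lambda edit: edit[0]))


def edits_by_family(edits: Iterable[EditT]) -> Dict[str, List[EditT]]:
    """Bucket edits by column family, preserving their incoming order."""
    buckets: Dict[str, List[EditT]] = defaultdict(list)
    for edit in edits:
        buckets[edit[1]].append(edit)
    return dict(buckets)
```

Because the merge keeps sequence order and the bucketing preserves it, each
store would see its edits in the order they were originally written.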