[ https://issues.apache.org/jira/browse/HBASE-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeffrey Zhong updated HBASE-7006:
---------------------------------
Attachment: hbase-7006-combined-v1.patch

Thanks [~saint....@gmail.com] and [~anoopsamjohn] for reviewing! I included the following changes in the v1 patch:
1) Support for recovering WAL edits of regions in a disabling/disabled table. (Theoretically there is no need to recover WAL edits of regions of a disabled table, but I kept it for backward compatibility.)
2) Review feedback from Ted and Stack.

From this point, I'll write more unit tests and start running integration tests. Below are answers to the latest feedback:

{quote} Why we have this isReplay in a Mutation {quote}
This is used inside HRegionServer#batchMutate for special handling of a replay mutation, for example, to skip the "readonly" check and coprocessors in the normal write path. I'll rename it to "logReplay" per your suggestion. The other option is to add an additional "logReplay" argument to every function in the write path, which isn't as clean as the current way, IMHO.

{quote} Does this define belong in this patch? + /** Conf key that specifies region assignment timeout value */ + public static final String REGION_ASSIGNMENT_TIME_OUT = "hbase.master.region.assignment.time.out"; {quote}
I think the name is confusing, so I changed it to "hbase.master.log.replay.wait.region.timeout". It's used by log replay to wait until a region is ready before we replay WAL edits against it.

{quote} If so, should it be updateMetaWALSplitTime? And given what this patch is about, should it be WALReplay? {quote}
Good point. Fixed.

{quote} Should we turn it on in trunk and off in 0.95? {quote}
A good suggestion to bake it in trunk a little bit.

{quote} Something wrong w/ license in WALEditsReplaySink {quote}
Fixed. Good catch!

{quote} Have you run the test with clients doing the writes to region soon after it is opened for write? {quote}
No, I haven't yet. The performance test I ran was against a cluster without load, so results are easy to compare.
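To illustrate the "logReplay" flag idea discussed above, here is a minimal, self-contained sketch. It is not the actual patch: the `Mutation`, `Region`, and attribute names here are simplified stand-ins modeled on HBase's attribute mechanism, showing how a flag carried on the mutation lets `batchMutate` bypass the read-only check for replayed WAL edits without threading an extra argument through the write path.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy mutation carrying optional attributes, like HBase's Mutation (names hypothetical). */
class Mutation {
    private final Map<String, byte[]> attributes = new HashMap<>();
    final String row;
    final String value;

    Mutation(String row, String value) { this.row = row; this.value = value; }

    void setAttribute(String name, byte[] v) { attributes.put(name, v); }
    byte[] getAttribute(String name) { return attributes.get(name); }

    /** True when this mutation is a WAL-replay edit, not a client write. */
    boolean isLogReplay() { return getAttribute("logReplay") != null; }
}

/** Toy region: read-only regions reject client writes but accept replay edits. */
class Region {
    private final Map<String, String> store = new HashMap<>();
    private final boolean readOnly;

    Region(boolean readOnly) { this.readOnly = readOnly; }

    /** Applies mutations; replay mutations bypass the read-only check. */
    int batchMutate(Mutation[] mutations) {
        int applied = 0;
        for (Mutation m : mutations) {
            if (readOnly && !m.isLogReplay()) {
                continue; // normal client writes are rejected on a read-only region
            }
            // a real server would similarly skip coprocessor hooks for replay edits
            store.put(m.row, m.value);
            applied++;
        }
        return applied;
    }

    String get(String row) { return store.get(row); }
}

public class ReplayFlagDemo {
    public static void main(String[] args) {
        Region region = new Region(true); // region not yet open for client writes

        Mutation normal = new Mutation("r1", "v1");
        Mutation replay = new Mutation("r2", "v2");
        replay.setAttribute("logReplay", new byte[]{1}); // mark as a WAL replay edit

        int applied = region.batchMutate(new Mutation[]{normal, replay});
        System.out.println(applied);          // 1: only the replay edit got through
        System.out.println(region.get("r2")); // v2
    }
}
```

The design point is that the flag travels with the data, so every layer of the existing write path can consult it without any signature changes.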
I'll conduct more performance tests when the feature is fully ready.

> [MTTR] Study distributed log splitting to see how we can make it faster
> -----------------------------------------------------------------------
>
> Key: HBASE-7006
> URL: https://issues.apache.org/jira/browse/HBASE-7006
> Project: HBase
> Issue Type: Bug
> Components: MTTR
> Reporter: stack
> Assignee: Jeffrey Zhong
> Priority: Critical
> Fix For: 0.95.1
>
> Attachments: hbase-7006-combined.patch, hbase-7006-combined-v1.patch, LogSplitting Comparison.pdf, ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006-v2.pdf
>
> Just saw an interesting issue where a cluster went down hard and 30 nodes had 1700 WALs to replay. Replay took almost an hour. It looks like it could run faster: much of the time is spent zk'ing and nn'ing.
> Putting it in 0.96 so it gets a look at least. Can always punt.