[ https://issues.apache.org/jira/browse/HBASE-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeffrey Zhong updated HBASE-7006:
---------------------------------
Attachment: hbase-7006-combined-v1.patch

Thanks [~saint....@gmail.com] and [~anoopsamjohn] for reviewing! I included the following changes in the v1 patch:
1) Support for recovering WAL edits of regions in a disabling/disabled table. (Theoretically there is no need to recover WAL edits of regions of a disabled table, but I kept it for backward compatibility.)
2) Review feedback from Ted and Stack.

From this point, I'll write more unit tests and start running integration tests. Below are answers to the latest feedback:

{quote} Why we have this isReplay in a Mutation {quote}
This is used inside HRegionServer#batchMutate for special handling of a replay mutation, for example, to skip the "readonly" check and coprocessors in the normal write path. I'll rename it to "logReplay" per your suggestion. The other option is to add an additional "logReplay" argument to every function in the write path, which isn't as clean as the current way, IMHO.

{quote} Does this define belong in this patch? + /** Conf key that specifies region assignment timeout value */ + public static final String REGION_ASSIGNMENT_TIME_OUT = "hbase.master.region.assignment.time.out"; {quote}
I think the name is confusing, so I changed it to "hbase.master.log.replay.wait.region.timeout". It's used by log replay to wait until a region is ready before we replay WAL edits against it.

{quote} If so, should it be updateMetaWALSplitTime? And given what this patch is about, should it be WALReplay? {quote}
Good point. Fixed.

{quote} Should we turn it on in trunk and off in 0.95? {quote}
A good suggestion to bake it in trunk a little bit.

{quote} Something wrong w/ license in WALEditsReplaySink {quote}
Fixed. Good catch!

{quote} Have you run the test with clients doing the writes to region soon after it is opened for write? {quote}
No, I haven't yet. The performance test I ran was against a cluster without load, so results are easy to compare.
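To illustrate the "logReplay" flag idea discussed above, here is a minimal, self-contained sketch. It is not the actual patch: the `Mutation`, `Region`, and attribute names here are simplified stand-ins modeled on HBase's attribute mechanism, showing how a flag carried on the mutation lets `batchMutate` bypass the read-only check for replayed WAL edits without threading an extra argument through the write path.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy mutation carrying optional attributes, like HBase's Mutation (names hypothetical). */
class Mutation {
    private final Map<String, byte[]> attributes = new HashMap<>();
    final String row;
    final String value;

    Mutation(String row, String value) { this.row = row; this.value = value; }

    void setAttribute(String name, byte[] v) { attributes.put(name, v); }
    byte[] getAttribute(String name) { return attributes.get(name); }

    /** True when this mutation is a WAL-replay edit, not a client write. */
    boolean isLogReplay() { return getAttribute("logReplay") != null; }
}

/** Toy region: read-only regions reject client writes but accept replay edits. */
class Region {
    private final Map<String, String> store = new HashMap<>();
    private final boolean readOnly;

    Region(boolean readOnly) { this.readOnly = readOnly; }

    /** Applies mutations; replay mutations bypass the read-only check. */
    int batchMutate(Mutation[] mutations) {
        int applied = 0;
        for (Mutation m : mutations) {
            if (readOnly && !m.isLogReplay()) {
                continue; // normal client writes are rejected on a read-only region
            }
            // a real server would similarly skip coprocessor hooks for replay edits
            store.put(m.row, m.value);
            applied++;
        }
        return applied;
    }

    String get(String row) { return store.get(row); }
}

public class ReplayFlagDemo {
    public static void main(String[] args) {
        Region region = new Region(true); // region not yet open for client writes

        Mutation normal = new Mutation("r1", "v1");
        Mutation replay = new Mutation("r2", "v2");
        replay.setAttribute("logReplay", new byte[]{1}); // mark as a WAL replay edit

        int applied = region.batchMutate(new Mutation[]{normal, replay});
        System.out.println(applied);          // 1: only the replay edit got through
        System.out.println(region.get("r2")); // v2
    }
}
```

The design point is that the flag travels with the data, so every layer of the existing write path can consult it without any signature changes.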
I'll conduct more performance tests when the feature is fully ready.

> [MTTR] Study distributed log splitting to see how we can make it faster
> -----------------------------------------------------------------------
>
> Key: HBASE-7006
> URL: https://issues.apache.org/jira/browse/HBASE-7006
> Project: HBase
> Issue Type: Bug
> Components: MTTR
> Reporter: stack
> Assignee: Jeffrey Zhong
> Priority: Critical
> Fix For: 0.95.1
>
> Attachments: hbase-7006-combined.patch, hbase-7006-combined-v1.patch, LogSplitting Comparison.pdf, ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006-v2.pdf
>
> Just saw an interesting issue where a cluster went down hard and 30 nodes had 1700 WALs to replay. Replay took almost an hour. It looks like it could run faster: much of the time is spent zk'ing and nn'ing.
> Putting it in 0.96 so it gets a look at least. Can always punt.