[ https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043193#comment-13043193 ]
mingjian commented on HBASE-1364:
---------------------------------
It's great work!
But I can't get the unit test TestDistributedLogSplitting to pass. I used 1364-v5.txt.
Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 159.419 sec <<< FAILURE!
testWorkerAbort(org.apache.hadoop.hbase.master.TestDistributedLogSplitting) Time elapsed: 60.897 sec <<< FAILURE!
java.lang.AssertionError: region server completed the split before aborting
    at org.junit.Assert.fail(Assert.java:91)
    at org.apache.hadoop.hbase.master.TestDistributedLogSplitting.testWorkerAbort(TestDistributedLogSplitting.java:289)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
> [performance] Distributed splitting of regionserver commit logs
> ---------------------------------------------------------------
>
> Key: HBASE-1364
> URL: https://issues.apache.org/jira/browse/HBASE-1364
> Project: HBase
> Issue Type: Improvement
> Components: coprocessors
> Reporter: stack
> Assignee: Prakash Khemani
> Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 1364-v5.txt, HBASE-1364.patch,
> org.apache.hadoop.hbase.master.TestDistributedLogSplitting-output.txt
>
> Time Spent: 8h
> Remaining Estimate: 0h
>
> HBASE-1008 has some improvements to our log splitting on regionserver crash,
> but it needs to run even faster.
> (Below is from HBASE-1008)
> In the Bigtable paper, the split is distributed. If we're going to have 1000
> logs, we need to distribute or at least multithread the splitting.
> 1. As is, regions starting up expect to find only one reconstruction log.
> We need to make them pick up a bunch of edit logs, and it should be fine for
> those logs to live elsewhere in HDFS, in an output directory written by all
> split participants, whether the split is multithreaded or a MapReduce-like
> distributed process (let's write our distributed sort first as an MR job so
> we learn what's involved; as much as possible, the distributed sort should
> use MR framework pieces). On startup, regions go to this directory and pick
> up the files written by the split participants, deleting them and clearing
> the directory once all have been read in. Making regions accept multiple logs
> as input can also make the split process more robust than the current
> tenuous one, which loses all edits if it doesn't make it to the end without
> error.
> 2. Each column family rereads the reconstruction log to find its edits. We
> need to fix that. The split can sort the edits by column family so that each
> store reads only its own edits.
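The two proposals in the quoted description (several split participants writing per-region output that a region picks up and clears on startup, and sorting edits by column family so each store reads only its own) can be illustrated with a minimal in-memory Java sketch. It is not the HBASE-1364 patch and uses no HBase API: `Edit`, `split` and `openRegion` are hypothetical names, a thread pool stands in for the multithreaded or distributed split, and a map stands in for the HDFS output directory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the idea described in HBASE-1364/HBASE-1008; not HBase code.
public class LogSplitSketch {

    // One edit read from a regionserver commit log (hypothetical shape).
    record Edit(String region, String family, String row, long seqId) {}

    // Splits several logs in parallel, one worker task per log. The returned
    // map (region -> column family -> edits) stands in for the output
    // directory in HDFS that all split participants write into.
    static Map<String, Map<String, List<Edit>>> split(List<List<Edit>> logs, int threads)
            throws Exception {
        Map<String, Map<String, List<Edit>>> out = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (List<Edit> log : logs) {
                tasks.add(pool.submit(() -> {
                    for (Edit e : log) {
                        // Bucket by region, then by family, so a store later
                        // reads only its own edits.
                        out.computeIfAbsent(e.region(), r -> new ConcurrentHashMap<>())
                           .computeIfAbsent(e.family(),
                                   f -> Collections.synchronizedList(new ArrayList<Edit>()))
                           .add(e);
                    }
                }));
            }
            for (Future<?> task : tasks) {
                task.get(); // rethrows (wrapped) if any worker failed
            }
        } finally {
            pool.shutdown();
        }
        // Workers append concurrently, so restore replay order by sequence id.
        for (Map<String, List<Edit>> families : out.values()) {
            for (List<Edit> edits : families.values()) {
                edits.sort(Comparator.comparingLong(Edit::seqId));
            }
        }
        return out;
    }

    // Region startup: take every recovered edit for this region; removing the
    // entry also clears it from the output directory once it has been read in.
    static Map<String, List<Edit>> openRegion(Map<String, Map<String, List<Edit>>> dir,
                                              String region) {
        Map<String, List<Edit>> recovered = dir.remove(region);
        return recovered == null ? Map.of() : recovered;
    }

    public static void main(String[] args) throws Exception {
        List<List<Edit>> logs = List.of(
                List.of(new Edit("r1", "cf1", "a", 1), new Edit("r2", "cf1", "b", 2)),
                List.of(new Edit("r1", "cf2", "c", 3), new Edit("r1", "cf1", "d", 4)));
        Map<String, Map<String, List<Edit>>> dir = split(logs, 2);
        Map<String, List<Edit>> r1 = openRegion(dir, "r1");
        // The cf1 store of r1 replays rows a and d, in seqId order.
        System.out.println(r1.get("cf1"));
        // Only r2's edits remain in the output directory.
        System.out.println(dir.keySet());
    }
}
```

Because workers append concurrently, each per-family list is re-sorted by sequence id before replay. The sketch also ignores what happens when a worker dies partway through a log, which is exactly the robustness concern raised in point 1.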
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira