[ https://issues.apache.org/jira/browse/HADOOP-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HADOOP-1662:
--------------------------
Attachment: splits-v3.patch
Version 3.
Improvements around recovery from catastrophic loss of the ROOT and
META regions (needed to make TestRegionServerAbort work reliably
after this splits patch is applied).
M src/contrib/hbase/src/test/org/apache/hadoop/hbase/MiniHBaseCluster.java
Added logging of abort, close and wait. Also, abort/close was
doing a remove, which left a subsequent wait with nothing to
wait on.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HLog.java
Debug logging around split and edits.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMaster.java
Added toString to each of the PendingOperation implementations.
In the ShutdownPendingOperation scan of meta data, removed the
check of startcode (if the server name is that of the dead
server, the region needs reassigning even if the start code is good).
Also, if the server name is null -- possible if we are missing
edits off the end of the log -- then the region should be reassigned
just in case it's from the dead server. Also, if reassigning,
clear the region from pendingRegions. The server may have died after
sending 'region is up' but before receipt is confirmed in the
meta scan. Added more detail to each log message. In OpenPendingOperation
we were trying to clear pendingRegion in the wrong place -- the code was
never executed (regions were always pending).
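
The reassignment rule described above for the ShutdownPendingOperation meta scan can be sketched roughly as follows. This is an illustration only, not the code in the patch: the class, the method names, and the use of plain strings for region and server names are all hypothetical, and only the pendingRegions name comes from the note above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; not the HMaster code from splits-v3.patch.
class ReassignSketch {
    // Stand-in for the master's pendingRegions set (assumed to hold region names).
    static final Set<String> pendingRegions = new HashSet<>();

    // A region needs reassigning if its meta row names the dead server,
    // or names no server at all (edits may be missing off the end of the log).
    // The start code is deliberately not consulted.
    static boolean needsReassignment(String serverName, String deadServerName) {
        if (serverName == null) {
            return true;
        }
        return serverName.equals(deadServerName);
    }

    // Processes one row of the meta scan. When the region is to be reassigned,
    // it is also cleared from pendingRegions so it does not stay pending forever.
    static boolean processRow(String regionName, String serverName, String deadServerName) {
        boolean reassign = needsReassignment(serverName, deadServerName);
        if (reassign) {
            pendingRegions.remove(regionName);
        }
        return reassign;
    }
}
```

In this sketch a row with a null server name is treated the same as a row naming the dead server, which matches the "just in case" reasoning above.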
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/util/Keying.java
(intToBytes, longToBytes, getBytes, bytesToString, bytesToLong): Added.
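
For readers unfamiliar with this kind of helper, here is a minimal sketch of what byte-conversion utilities with these names might look like. The signatures, the big-endian byte order, and the UTF-8 encoding are assumptions made for illustration. The methods actually added to Keying.java may differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; signatures and encodings are assumed, not taken from the patch.
class KeyingSketch {
    // Encodes an int as 4 bytes, big-endian (ByteBuffer's default order).
    static byte[] intToBytes(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Encodes a long as 8 bytes, big-endian.
    static byte[] longToBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Decodes the first 8 bytes of the array as a big-endian long.
    static long bytesToLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    // Encodes a string as UTF-8 bytes.
    static byte[] getBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes back into a string.
    static String bytesToString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

With these definitions, bytesToLong(longToBytes(x)) returns x for any long, and bytesToString(getBytes(s)) returns s for any well-formed string.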
> [hbase] Make region splits faster
> ---------------------------------
>
> Key: HADOOP-1662
> URL: https://issues.apache.org/jira/browse/HADOOP-1662
> Project: Hadoop
> Issue Type: Improvement
> Components: contrib/hbase
> Reporter: stack
> Assignee: stack
> Attachments: fastsplits.patch, mapfile_split.patch, splits-2.patch,
> splits-v3.patch
>
>
> HADOOP-1644 '[hbase] Compactions should take no longer than period between
> memcache flushes' is about making compactions run faster. This issue is
> about making splits faster. Currently splits are done by reading a map file
> as input and, per record, writing out two new mapfiles. This is currently too
> slow: ~30 seconds to split 120MB. Google hints in Bigtable that splitting is
> very fast because they let the split children feed off the split parent.
> Primitive testing has splitting mapfiles using raw streams running 3 to 4
> times faster than splitting on mapfile keys.