[ https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115052#comment-13115052 ]
Dave Revell commented on HBASE-4489:
------------------------------------

The bug in MD5StringSplit that I mentioned in my earlier comment is in the definition of the variable MAXMD5 in RegionSplitter.java.

> Better key splitting in RegionSplitter
> --------------------------------------
>
>                 Key: HBASE-4489
>                 URL: https://issues.apache.org/jira/browse/HBASE-4489
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.4
>            Reporter: Dave Revell
>            Assignee: Dave Revell
>         Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the
> command line or do a rolling split on an existing table. It supports
> pluggable split algorithms that implement the SplitAlgorithm interface. The
> only/default SplitAlgorithm is one that assumes keys fall in the range from
> ASCII string "00000000" to ASCII string "7FFFFFFF". This is not a sane
> default, and seems useless to most users. Users are likely to be surprised by
> the fact that all the region splits occur in the byte range of ASCII
> characters.
> A better default split algorithm would be one that evenly divides the space
> of all bytes, which is what this patch does. Making a table with five regions
> would split at \x33\x33..., \x66\x66..., \x99\x99..., \xCC\xCC..., and
> \xFF\xFF.
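
For readers following the issue description, here is a minimal sketch of what "evenly divides the space of all bytes" could look like. It is not the code from the attached patches: the class name UniformSplitSketch, its methods, and the fixed key length are hypothetical and chosen only for illustration. The sketch treats a key as an unsigned big-endian integer of keyLength bytes and returns only the interior boundaries between regions, so five regions give four boundaries; \xFF\xFF from the list above is assumed here to be the top of the key space rather than a boundary between two regions.

```java
import java.math.BigInteger;

// Illustrative only: not the HBase RegionSplitter implementation.
public class UniformSplitSketch {

    // Returns numRegions - 1 boundaries that divide the space of all
    // keyLength-byte keys into roughly equal ranges.
    public static byte[][] splitPoints(int numRegions, int keyLength) {
        if (numRegions < 2 || keyLength < 1) {
            throw new IllegalArgumentException("need numRegions >= 2 and keyLength >= 1");
        }
        // Largest key of this length, e.g. 0xFFFF for keyLength == 2.
        BigInteger max = BigInteger.ONE.shiftLeft(8 * keyLength).subtract(BigInteger.ONE);
        BigInteger n = BigInteger.valueOf(numRegions);
        byte[][] result = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            BigInteger point = max.multiply(BigInteger.valueOf(i)).divide(n);
            result[i - 1] = toFixedWidth(point, keyLength);
        }
        return result;
    }

    // BigInteger.toByteArray() may add a leading sign byte or return fewer
    // bytes than needed, so copy the low-order bytes into a fixed-width array.
    public static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    // Formats a key in the \xNN notation used in the issue description.
    public static String toHex(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Five regions over two-byte keys.
        for (byte[] point : splitPoints(5, 2)) {
            System.out.println(toHex(point));
        }
    }
}
```

With five regions and two-byte keys, the boundaries come out as \x33\x33, \x66\x66, \x99\x99 and \xCC\xCC, which matches the interior split points listed in the description. By contrast, boundaries computed over the ASCII strings "00000000" to "7FFFFFFF" contain only the bytes for the characters 0-9 and A-F, which is why the description says the existing default places every split in the ASCII range.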