[ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116639#comment-13116639
 ] 

Dave Revell commented on HBASE-4489:
------------------------------------

@Jonathan Hsieh, thanks for your thoughts.

When you say you agree with jgray: he actually wants to do two things, (1) stop 
using ASCII and (2) remove the 0x7F range bug. It sounds like you agree only 
with removing the 0x7F range bug, and not with avoiding ASCII in the default 
split algorithm. Is that right?

I agree in principle with your comment about preserving behavior between minor 
releases. If there were a valid use case for the existing code, I would agree 
that we should leave it. But given its current brokenness, we should fix it all 
the way instead of creating an intermediate slightly-broken state that falls 
short of a real fix. We're already breaking any existing use cases by virtue of 
fixing the range bug. We should not create another generation of broken use 
cases before making the real fix, IMO.

I agree that tests would be a good idea. I'll hopefully find some time for that 
soon.
                
> Better key splitting in RegionSplitter
> --------------------------------------
>
>                 Key: HBASE-4489
>                 URL: https://issues.apache.org/jira/browse/HBASE-4489
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.4
>            Reporter: Dave Revell
>            Assignee: Dave Revell
>         Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the 
> command line or do a rolling split on an existing table. It supports 
> pluggable split algorithms that implement the SplitAlgorithm interface. The 
> only/default SplitAlgorithm is one that assumes keys fall in the range from 
> ASCII string "00000000" to ASCII string "7FFFFFFF". This is not a sane 
> default, and seems useless to most users. Users are likely to be surprised by 
> the fact that all the region splits occur in the byte range of ASCII 
> characters.
> A better default split algorithm would be one that evenly divides the space 
> of all bytes, which is what this patch does. Making a table with five regions 
> would split at \x33\x33..., \x66\x66..., \x99\x99..., \xCC\xCC..., and 
> \xFF\xFF.
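
For illustration, here is a minimal sketch of the "evenly divide the space of all bytes" idea from the description above. It is not the code in the attached patches and does not use the SplitAlgorithm interface. The class name UniformByteSplit, its methods, and the fixed key width are hypothetical, chosen only to make the arithmetic concrete. It returns the interior boundaries only, on the assumption that \xFF\xFF in the description marks the top of the key space rather than a split point.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not the patch attached to HBASE-4489.
final class UniformByteSplit {

    // Returns the numRegions - 1 interior split points that evenly divide
    // the space of all keyWidth-byte keys, from \x00...\x00 to \xFF...\xFF.
    static List<byte[]> splitPoints(int numRegions, int keyWidth) {
        // Largest key of the given width, e.g. 0xFFFF for keyWidth = 2.
        BigInteger max = BigInteger.ONE.shiftLeft(8 * keyWidth).subtract(BigInteger.ONE);
        BigInteger n = BigInteger.valueOf(numRegions);
        List<byte[]> points = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            BigInteger p = max.multiply(BigInteger.valueOf(i)).divide(n);
            points.add(toFixedWidth(p, keyWidth));
        }
        return points;
    }

    // Converts a non-negative value to exactly `width` bytes, big-endian.
    // BigInteger.toByteArray() may add a leading sign byte or omit leading
    // zero bytes, so the low-order bytes are copied into a fresh array.
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[width];
        int n = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - n, out, width - n, n);
        return out;
    }

    // Renders bytes in the \xNN notation used in the issue description.
    static String toEscaped(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Five regions over two-byte keys: prints \x33\x33, \x66\x66,
        // \x99\x99 and \xCC\xCC, one per line.
        for (byte[] p : splitPoints(5, 2)) {
            System.out.println(toEscaped(p));
        }
    }
}
```

Unlike the "00000000" to "7FFFFFFF" hex-string range, none of these boundaries depends on keys being printable ASCII, so binary row keys would be spread across all regions.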

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
