[ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126270#comment-13126270
 ] 

Nicolas Spiegelberg commented on HBASE-4489:
--------------------------------------------

@Dave: 

I think the main disconnect here is how we envision accomplishing the goal of 
'better key splitting'.  I think your patch provides a different way to split.  
However, both algorithms only split the keyspace; neither provides an 
easy way to normalize your row into this keyspace.  We both assume you are 
instinctively normalizing to begin with.  I was suggesting adding a static 
normalization function to both algorithms, in addition to the comments you 
provided.
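
To make the idea concrete, here is a rough sketch of what such a static 
normalization function could look like for the ASCII keyspace 
("00000000" to "7FFFFFFF").  It is written in Python for brevity and is not 
code from either patch; the name normalize_row_key and the choice of MD5 as 
the hash are assumptions made only for illustration.

```python
import hashlib


def normalize_row_key(raw_key: bytes) -> str:
    """Map an arbitrary key to an 8-character ASCII hex string.

    Hypothetical helper: the result always falls between "00000000" and
    "7FFFFFFF", the range the ASCII split algorithm assumes.
    """
    digest = hashlib.md5(raw_key).digest()
    # Keep the first 32 bits and clear the top bit so the value fits in 31 bits.
    value = int.from_bytes(digest[:4], "big") & 0x7FFFFFFF
    # Zero-padded, uppercase hex keeps string order equal to numeric order.
    return f"{value:08X}"


if __name__ == "__main__":
    print(normalize_row_key(b"user12345"))
```

A caller would presumably prepend this string to the real row key so that 
writes spread evenly across the pre-split regions.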

I still think that keeping the algorithm as ASCII would be better for 
readability & usage.  
1. Basically, we're telling users that they need to learn the JRuby byte 
translation API to issue get commands from the shell if they follow the 
UniformSplit strategy.   I use the shell far more than I create new tables.  
2. ASCII hex costs 8 bits per nibble, but the 2x savings from raw bytes will 
go away with Delta Encoding, and naively using the default UINT128 sha1 would 
take up more space and be less compressible than an ASCII UINT32 anyway.
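
The size arithmetic behind point 2 can be checked with a small Python 
sketch.  The helper name prefix_sizes is hypothetical, and MD5 is used here 
only because it yields a 128-bit digest; neither comes from the patch.

```python
import hashlib


def prefix_sizes(raw_key: bytes) -> dict:
    """Return the byte length of three candidate key prefixes."""
    digest = hashlib.md5(raw_key).digest()           # raw 128-bit hash
    value = int.from_bytes(digest[:4], "big") & 0x7FFFFFFF
    ascii_uint32 = f"{value:08X}".encode("ascii")    # one byte per nibble
    binary_uint32 = value.to_bytes(4, "big")         # raw 32-bit prefix
    return {
        "ascii_uint32": len(ascii_uint32),
        "binary_uint32": len(binary_uint32),
        "binary_uint128": len(digest),
    }


if __name__ == "__main__":
    for name, size in prefix_sizes(b"user12345").items():
        print(name, size, "bytes")
```

The ASCII UINT32 prefix is 8 bytes, the binary UINT32 prefix is 4 bytes (the 
2x savings), and the full 128-bit hash is 16 bytes.  The ASCII form is also 
the only one of the three that can be typed into the shell without byte 
escapes.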
                
> Better key splitting in RegionSplitter
> --------------------------------------
>
>                 Key: HBASE-4489
>                 URL: https://issues.apache.org/jira/browse/HBASE-4489
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.4
>            Reporter: Dave Revell
>            Assignee: Dave Revell
>         Attachments: HBASE-4489-branch0.90-v1.patch, 
> HBASE-4489-branch0.90-v2.patch, HBASE-4489-branch0.90-v3.patch, 
> HBASE-4489-trunk-v1.patch, HBASE-4489-trunk-v2.patch, 
> HBASE-4489-trunk-v3.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the 
> command line or do a rolling split on an existing table. It supports 
> pluggable split algorithms that implement the SplitAlgorithm interface. The 
> only/default SplitAlgorithm is one that assumes keys fall in the range from 
> ASCII string "00000000" to ASCII string "7FFFFFFF". This is not a sane 
> default, and seems useless to most users. Users are likely to be surprised by 
> the fact that all the region splits occur in the byte range of ASCII 
> characters.
> A better default split algorithm would be one that evenly divides the space 
> of all bytes, which is what this patch does. Making a table with five regions 
> would split at \x33\x33..., \x66\x66..., \x99\x99..., \xCC\xCC..., and 
> \xFF\xFF.
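
The even division of the byte space described in the issue above can be 
sketched as follows.  This is a minimal Python illustration, not the 
patch's actual implementation; the function name uniform_boundaries and the 
fixed key width are assumptions.

```python
def uniform_boundaries(num_regions: int, width: int = 8) -> list:
    """Return the upper boundary of each region as a big-endian byte string.

    Hypothetical helper: treats keys as unsigned integers of `width` bytes
    and divides that range into `num_regions` equal parts.
    """
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    max_value = (1 << (8 * width)) - 1  # all bytes set to 0xFF
    return [
        (max_value * i // num_regions).to_bytes(width, "big")
        for i in range(1, num_regions + 1)
    ]


if __name__ == "__main__":
    for boundary in uniform_boundaries(5, width=2):
        print(boundary.hex().upper())
```

With five regions and a two-byte width this yields 3333, 6666, 9999, CCCC 
and FFFF, matching the values listed in the description.  The last value is 
the top of the keyspace rather than a split point between two regions.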


        
