[jira] Commented: (HBASE-2531) 32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes

dhruba borthakur (JIRA) Tue, 11 May 2010 11:38:18 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866265#action_12866265
 ]


dhruba borthakur commented on HBASE-2531:
-----------------------------------------

Please allow me to interject in this conversation.

It appears that the UUID proposal will work. However, it always leaves the 
possibility of data corruption open. In the rare case where you run two 
region servers on the same machine (if ever!), there is a chance of UUID 
collision, especially for a workload where region splitting occurs very 
frequently. The UUID approach also seems to imply that some sort of "migration 
of old format to new format" is required.

Wouldn't it be more elegant and easier to do the following? When a region 
server splits a region, it needs to create a new name. It can come up with a 
random number as it currently does and then try to create a znode in ZooKeeper 
with that name. If the znode already exists, the region server generates a new 
name and tries again. If the znode does not exist, the region server creates 
it before creating the new region with that name. Will that work? This would 
let us avoid any need for "migration" while preventing any possibility of name 
collisions.
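
To make the proposal concrete, here is a minimal sketch of the retry loop. It 
is not HBase code: the class and method names are hypothetical, and an 
in-memory concurrent set stands in for ZooKeeper so that the example runs on 
its own. A real implementation would presumably replace tryCreateNode with a 
znode create call and treat "node already exists" as the signal to retry.

```java
import java.util.Random;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntSupplier;

/** Hypothetical sketch: reserve a unique encoded region name before using it. */
final class RegionNameReservation {
  // Stand-in for ZooKeeper (an assumption for this sketch). Set.add() is
  // atomic here, much as creating a znode either succeeds or reports that
  // the node already exists.
  private final Set<Integer> reserved = ConcurrentHashMap.newKeySet();

  /** Returns true if the name was free and is now reserved, false if taken. */
  public boolean tryCreateNode(int encodedName) {
    return reserved.add(encodedName);
  }

  /**
   * Draws candidate names until one is reserved successfully.
   * Gives up after maxAttempts so a full or broken name space cannot spin forever.
   */
  public int reserveEncodedName(IntSupplier candidates, int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      int candidate = candidates.getAsInt();
      if (tryCreateNode(candidate)) {
        return candidate; // reserved: now safe to create the region
      }
      // Name already taken: loop and draw another candidate.
    }
    throw new IllegalStateException(
        "no free region name after " + maxAttempts + " attempts");
  }

  public static void main(String[] args) {
    RegionNameReservation names = new RegionNameReservation();
    Random random = new Random();
    // nextInt(bound) yields a non-negative candidate in [0, Integer.MAX_VALUE).
    int name = names.reserveEncodedName(
        () -> random.nextInt(Integer.MAX_VALUE), 10);
    System.out.println("reserved encoded name: " + name);
  }
}
```

The point of the design is that the uniqueness check and the reservation are 
a single atomic step, so two region servers that draw the same random number 
cannot both proceed with it.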

> 32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-2531
>                 URL: https://issues.apache.org/jira/browse/HBASE-2531
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.20.5, 0.21.0
>
>
> Kannan tripped over two regionnames that hashed the same:
> Here is code demo'ing that his two names hash the same:
> {code}
> package org;
> import org.apache.hadoop.hbase.util.Bytes;
> import org.apache.hadoop.hbase.util.JenkinsHash;
> public class Testing {
>   public static void main(final String [] args) {
>     System.out.println(encodeRegionName(Bytes.toBytes("test1,6838000000,1273541236167")));
>     System.out.println(encodeRegionName(Bytes.toBytes("test1,0520100000,1273541610201")));
>   }
>   /**
>    * @param regionName
>    * @return the encodedName
>    */
>   public static int encodeRegionName(final byte [] regionName) {
>     return Math.abs(JenkinsHash.getInstance().hash(regionName, regionName.length, 0));
>   }
> }
> {code}
> Need new encoding mechanism.  Will need to migrate old regions to new schema.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
