[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087751#comment-14087751
 ] 

Jonathan Hsieh commented on HBASE-11682:
----------------------------------------

Nice addition.  Personally, I don't really like the Sematext definition of 
salting; it conflates salting [1] with hashing [2], which are two separate 
things.

*salting* adds random data to the start of a rowkey.  This means that, 
depending on the 'salt factor' n, you could end up writing to n different row 
keys (and ideally n different regions).  When reading, you would generally want 
to read all n rows and coalesce the values.  This is helpful if you have 
individual hot keys.  It is often a bad smell because it is a trick used to try 
to mitigate having the date as the first part of a row key, but it does have 
valid use cases (e.g. rowkey is name and you have a handful of individual 
celebrities - obama, bieber, gaga - that need to have their load spread).  This 
preserves ordering (within each salt prefix) but multiplies the number of reads 
required relative to the number of writes.
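
A rough sketch of the salting idea in Python, using a plain dict as a stand-in 
for a table.  The names here (SALT_BUCKETS, salted_key, write, read_coalesced) 
and the "NN-" prefix format are made up for illustration; this is not HBase 
API, just the shape of the write and read paths.

```python
import random

SALT_BUCKETS = 4  # hypothetical 'salt factor' n


def salted_key(rowkey, rng):
    """Prepend a randomly chosen bucket prefix to the rowkey."""
    bucket = rng.randrange(SALT_BUCKETS)
    return f"{bucket:02d}-{rowkey}"


def all_salted_keys(rowkey):
    """Every physical key a reader has to fetch for one logical row."""
    return [f"{b:02d}-{rowkey}" for b in range(SALT_BUCKETS)]


def write(table, rowkey, value, rng):
    """Each write lands on one of n physical rows, picked at random."""
    table.setdefault(salted_key(rowkey, rng), []).append(value)


def read_coalesced(table, rowkey):
    """Read all n physical rows and merge their values."""
    values = []
    for key in all_salted_keys(rowkey):
        values.extend(table.get(key, []))
    return values
```

Note that one logical read turns into SALT_BUCKETS lookups, which is the read 
multiplication mentioned above.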

*hashing* applies a one-way function to the rowkey, so a particular row always 
gets the same 'random' value prepended.  The original row gets mapped to a 
single row.  This is good when you have clusters of related keys that in 
aggregate form a hotspot (e.g. rowkey is name and you have way too many joe's, 
john's, jon's, jonah's, jonathan's, and jonathon's all on the same region; 
using a hash would spread all the j names around).  This throws out the ability 
to effectively take advantage of the row ordering properties.
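
For contrast with salting, a minimal sketch of a hashed prefix in Python.  The 
bucket count, the "NN-" prefix format, and the choice of SHA-256 are 
assumptions for illustration only; any stable hash would do.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical number of prefix buckets


def hashed_key(rowkey):
    """Prepend a prefix derived from the rowkey itself.

    The same rowkey always maps to the same physical key, so a read is a
    single lookup, but neighbouring rowkeys no longer sort next to each other.
    """
    digest = hashlib.sha256(rowkey.encode("utf-8")).digest()
    bucket = digest[0] % NUM_BUCKETS
    return f"{bucket:02d}-{rowkey}"
```

Unlike the salted case, the reader can recompute the prefix from the rowkey, 
so there is no need to fan out across buckets.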

Another trick is to take numeric or fixed-length values and write them with 
the least significant digit (i.e. the one that changes the most) first 
(little-endian order).  This effectively randomizes row keys but also 
sacrifices row ordering properties.
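
A small sketch of the digit-reversal trick in Python, assuming non-negative 
integer ids padded to a fixed width (the function name and the default width 
are made up for the example):

```python
def reversed_key(value, width=10):
    """Zero-pad a non-negative integer to a fixed width, then reverse the
    digits so the fastest-changing digit comes first in the rowkey."""
    if value < 0:
        raise ValueError("sketch only handles non-negative values")
    return str(value).zfill(width)[::-1]
```

With width=6, the consecutive ids 1001, 1002 and 1003 become "100100", 
"200100" and "300100", so sequential writes start with different leading 
digits instead of piling up next to each other.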

[1] http://en.wikipedia.org/wiki/Salt_(cryptography)
[2] http://en.wikipedia.org/wiki/Hash_function

> Explain hotspotting
> -------------------
>
>                 Key: HBASE-11682
>                 URL: https://issues.apache.org/jira/browse/HBASE-11682
>             Project: HBase
>          Issue Type: Task
>          Components: documentation
>            Reporter: Misty Stanley-Jones
>            Assignee: Misty Stanley-Jones
>         Attachments: HBASE-11682-1.patch, HBASE-11682.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
