[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087751#comment-14087751 ]
Jonathan Hsieh commented on HBASE-11682:
----------------------------------------

Nice addition.

Personally, I don't really like the Sematext definition of salting: it conflates salting [1] with hashing [2], which are two separate things.

*salting* adds random data to the start of a rowkey. This means that, depending on the 'salt factor' n, a single logical row could end up being written to n different row keys (and ideally n different regions). When reading, you would generally want to read all n rows and coalesce the values. This is helpful if you have individual hot keys. It is often a bad smell, because it is a trick used to try to mitigate having the date as the first part of a row key, but it does have valid use cases. (Example: rowkey is name and you have a handful of individual celebrities - obama, bieber, gaga - that need to have their load spread.) This preserves ordering within each salt bucket but multiplies the number of reads required relative to the number of writes.

*hashing* applies a one-way function to the rowkey, so that a particular row always gets the same 'random' value prepended. Each original row therefore maps to a single row. This is good when you have clusters of related keys that in aggregate form a hotspot. (Example: rowkey is name and you have way too many joes, johns, jons, jonahs, jonathans, and jonathons all on the same region; using a hash would spread all the j names around.) This throws out the ability to take effective advantage of the row ordering properties.

Another trick is to take numeric or fixed-length values and write them with the least significant digit (i.e. the one that changes the most) first, in little-endian order. This effectively randomizes row key names but also sacrifices row ordering properties.

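To make the difference between the three tricks concrete, here is a minimal sketch in plain Python. It does not use the HBase client API. The function names, the bucket count of 4, the `-` separator, and the choice of SHA-256 for the hash prefix are all assumptions made for illustration, not anything prescribed by HBase.

```python
import hashlib
import random

SALT_BUCKETS = 4  # hypothetical 'salt factor' n


def salted_key(rowkey: str, rng: random.Random) -> str:
    """Salting: prepend a randomly chosen bucket number, so one logical
    row may be written under any of SALT_BUCKETS physical keys."""
    return f"{rng.randrange(SALT_BUCKETS)}-{rowkey}"


def salted_read_keys(rowkey: str) -> list[str]:
    """A read has to check every bucket and coalesce the results."""
    return [f"{bucket}-{rowkey}" for bucket in range(SALT_BUCKETS)]


def hashed_key(rowkey: str, prefix_len: int = 4) -> str:
    """Hashing: prepend a prefix derived from the key itself, so each
    logical row always maps to exactly one physical key."""
    digest = hashlib.sha256(rowkey.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{rowkey}"


def reversed_key(value: int, width: int = 10) -> str:
    """Reversal: zero-pad a non-negative number to a fixed width and
    write it least significant digit first."""
    return f"{value:0{width}d}"[::-1]


rng = random.Random()
print(salted_key("obama", rng))    # one of the four keys listed below
print(salted_read_keys("obama"))   # every key a read must fetch
print(hashed_key("jonathan"))      # same output on every call
print(reversed_key(1234567))       # 7654321000
```

Note the asymmetry: a salted write touches one key while a read touches all n, whereas the hashed and reversed keys cost a single lookup each but give up meaningful range scans over the original key order.
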
[1] http://en.wikipedia.org/wiki/Salt_(cryptography)
[2] http://en.wikipedia.org/wiki/Hash_function

> Explain hotspotting
> -------------------
>
> Key: HBASE-11682
> URL: https://issues.apache.org/jira/browse/HBASE-11682
> Project: HBase
> Issue Type: Task
> Components: documentation
> Reporter: Misty Stanley-Jones
> Assignee: Misty Stanley-Jones
> Attachments: HBASE-11682-1.patch, HBASE-11682.patch
>
>

--
This message was sent by Atlassian JIRA (v6.2#6252)