[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087207#comment-14087207
 ] 

Nick Dimiduk commented on HBASE-11682:
--------------------------------------

bq. HBase also attempts to store rows near each other in the same region, on 
the same region server.

This sentence doesn't help much. A region is a contiguous range of rows, 
sorted by row key, that is hosted as a unit by a single region server. Rows on 
either side of a region boundary are lexicographically near each other but 
belong to different regions, so there is no guarantee that they are hosted on 
the same region server.
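
To make the boundary point concrete, here is a minimal sketch (not HBase code). 
The split points, the server names, and the helpers region_for and 
hosting_server are all hypothetical, chosen only for illustration. Each region 
covers the half-open key range from its start key up to the next start key.

```python
from bisect import bisect_right

# Hypothetical region start keys; region i covers [start_i, start_{i+1}).
REGION_START_KEYS = [b"", b"g", b"n", b"t"]
# Hypothetical assignment of those regions to region servers.
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]


def region_for(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect_right(REGION_START_KEYS, row_key) - 1


def hosting_server(row_key: bytes) -> str:
    """Return the (hypothetical) server hosting the region for row_key."""
    return REGION_SERVERS[region_for(row_key)]


# b"fzzz" and b"g" sort next to each other, but fall on opposite sides of
# the split point b"g", so they land in different regions.
neighbours = (b"fzzz", b"g")
regions = [region_for(k) for k in neighbours]
servers = [hosting_server(k) for k in neighbours]
```

In this sketch the two neighbouring keys end up in different regions on 
different servers, which is why "near each other" is only true within a region.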

bq. However, poorly designed row keys can lead to 
<firstterm>hotspotting</firstterm>.

This is where schema/rowkey design and access patterns go hand in hand.

bq. Hotspotting occurs when nearly all the rows being written to HBase are 
written to the same region, because their row keys are contiguous or very 
similar.

I'd say "Hotspotting occurs when too much client traffic is directed at a 
single region. The traffic can be reads, writes, or both. It overwhelms the 
single machine responsible for hosting that region, causing performance 
degradation and potentially leading to region unavailability. It can also 
have adverse effects on other regions hosted by the same region server, 
because that host is unable to service the requested load."
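
A rough sketch of how that concentration shows up, assuming row keys that are 
zero-padded timestamps and a made-up set of split points (SPLITS, 
region_index and load_per_region are hypothetical names, not HBase APIs). The 
same count applies whether the requests are reads or writes.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical region start keys over a 10-digit, zero-padded key space.
SPLITS = [b"", b"2500000000", b"5000000000", b"7500000000"]


def region_index(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect_right(SPLITS, row_key) - 1


def load_per_region(row_keys) -> Counter:
    """Count how many requests (reads or writes) each region receives."""
    return Counter(region_index(k) for k in row_keys)


# 1000 requests against monotonically increasing, timestamp-like row keys.
sequential_keys = [b"%010d" % ts for ts in range(1407000000, 1407001000)]
load = load_per_region(sequential_keys)
```

With these assumed splits every request lands in region 0, so one region 
server takes the whole load while the other regions sit idle.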

bq. but in the bigger picture, data is being written to multiple regions across 
the cluster ...

Again, not limited to writes.

bq. One technique is to salt the row keys

Is the term "salt" explained?
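
For what it's worth, one possible reading of "salt" is a short prefix added 
to the row key so that adjacent keys sort into different parts of the key 
space. The sketch below is an illustration only, not what the patch 
describes: it assumes a prefix derived from a hash of the key (so a 
single-row read can recompute it), whereas a randomly assigned prefix is 
another variant. NUM_BUCKETS, salted_key and scan_prefixes are hypothetical 
names.

```python
import hashlib

NUM_BUCKETS = 4  # hypothetical number of salt buckets


def salted_key(row_key: bytes, buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix row_key with a two-digit bucket derived from its MD5 hash."""
    bucket = hashlib.md5(row_key).digest()[0] % buckets
    return b"%02d-" % bucket + row_key


def scan_prefixes(buckets: int = NUM_BUCKETS) -> list:
    """Prefixes a range scan has to visit, one per bucket."""
    return [b"%02d-" % b for b in range(buckets)]


# Sequential, timestamp-like keys no longer share a common leading prefix.
original = [b"%010d" % ts for ts in range(1407000000, 1407001000)]
salted = [salted_key(k) for k in original]
buckets_used = {s[:3] for s in salted}
```

The trade-off in this sketch is visible in scan_prefixes: a scan over a range 
of the original keys has to be issued once per bucket, while a get for a 
known key still touches a single row.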

bq. However, using totally random row keys would remove any benefit of HBase's 
row-sorting algorithm and cause very poor performance, as each get or scan 
would need to query all regions.

You're assuming a sequential access pattern here. Random rowkeys can be okay 
for random read access patterns, because the load is spread across the whole 
cluster. I've seen other issues around poor blockcache performance from 
completely random access patterns, but that's a slight tangent.

> Explain hotspotting
> -------------------
>
>                 Key: HBASE-11682
>                 URL: https://issues.apache.org/jira/browse/HBASE-11682
>             Project: HBase
>          Issue Type: Task
>          Components: documentation
>            Reporter: Misty Stanley-Jones
>            Assignee: Misty Stanley-Jones
>         Attachments: HBASE-11682.patch
>
>




