[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087207#comment-14087207 ]
Nick Dimiduk commented on HBASE-11682:
--------------------------------------

bq. HBase also attempts to store rows near each other in the same region, on the same region server.

This sentence doesn't help much. A region is a contiguous sequence of rows that are physically hosted as a unit. Rows on region boundaries are lexicographically near each other but are part of different regions, so there are no guarantees about them being hosted on the same region server.

bq. However, poorly designed row keys can lead to <firstterm>hotspotting</firstterm>.

This is where schema/rowkey design and access patterns go hand-in-hand.

bq. Hotspotting occurs when nearly all the rows being written to HBase are written to the same region, because their row keys are contiguous or very similar.

I'd say "Hotspotting occurs when too much client traffic is directed at a single region. This can be from reads, writes, or both. The traffic overwhelms the single machine responsible for hosting that region, causing performance degradation and potentially leading to region unavailability. This can also have adverse effects on other regions hosted by the same region server, as that host is unable to service the requested load."

bq. but in the bigger picture, data is being written to multiple regions across the cluster ...

Again, not limited to writes.

bq. One technique is to salt the row keys

Is the term "salt" explained?

bq. However, using totally random row keys would remove any benefit of HBase's row-sorting algorithm and cause very poor performance, as each get or scan would need to query all regions.

You're assuming a sequential access pattern here. Random rowkeys can be okay for random read access patterns, in that load is spread all over the cluster. I've seen other issues around poor blockcache performance from completely random access patterns, but that's a slight tangent.
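A toy model of the region-boundary point, in Python rather than HBase client code: a table is represented as a sorted list of region start keys, and a row belongs to the last region whose start key sorts at or before it. The start keys, server names, and the `region_for`/`server_for` helpers are hypothetical; the only HBase behaviour assumed is byte-wise lexicographic ordering of row keys, with each region covering one contiguous key range.

```python
import bisect

# Hypothetical table layout: each region is identified by its (inclusive)
# start key and ends where the next region starts. The first region starts
# at the empty key, i.e. at the very beginning of the table.
REGION_START_KEYS = [b"", b"row-100", b"row-200", b"row-300"]

# Hypothetical assignment of those four regions to region servers.
REGION_SERVERS = ["rs1", "rs2", "rs1", "rs3"]


def region_for(row_key: bytes, start_keys=REGION_START_KEYS) -> int:
    """Return the index of the region whose key range contains row_key.

    Keys compare as raw bytes (lexicographic order), so the owning region
    is the last one whose start key is <= row_key.
    """
    return bisect.bisect_right(start_keys, row_key) - 1


def server_for(row_key: bytes) -> str:
    """Return the (hypothetical) region server hosting row_key's region."""
    return REGION_SERVERS[region_for(row_key)]


if __name__ == "__main__":
    # b"row-099" and b"row-100" are lexicographically close, yet they fall
    # on opposite sides of a region boundary and land on different servers.
    for key in (b"row-099", b"row-100"):
        print(key.decode(), region_for(key), server_for(key))
```

Running it prints `row-099 0 rs1` and then `row-100 1 rs2`: the two keys sort close together but are served by different hosts, while regions 0 and 2 share `rs1` even though their key ranges are not adjacent.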
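A rough simulation of write hotspotting and of what salting means, under stated assumptions: salting is modelled as an application-side scheme that prefixes each row key with a bucket number computed from a hash of the key (here the first byte of SHA-256 modulo `NUM_BUCKETS`). The bucket count, split points, key format, and helper names are all hypothetical, chosen only to show how a monotonically increasing key concentrates new writes in one region.

```python
import bisect
import hashlib
from collections import Counter

NUM_BUCKETS = 4  # hypothetical salt-bucket count, one pre-split region each

# Hypothetical unsalted table split on timestamp-like keys that are already
# in the past; every newer timestamp sorts after the last split point.
TIMESTAMP_START_KEYS = [b"", b"1000000100", b"1000000200", b"1000000300"]

# Hypothetical salted table, pre-split so each salt prefix owns one region.
SALTED_START_KEYS = [b"", b"1|", b"2|", b"3|"]


def region_for(row_key: bytes, start_keys) -> int:
    """Index of the region (a contiguous key range) that holds row_key."""
    return bisect.bisect_right(start_keys, row_key) - 1


def salt(row_key: bytes, buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix row_key with a bucket number derived from a hash of the key.

    The prefix is deterministic, so a reader who knows the logical key can
    recompute the stored key.
    """
    bucket = hashlib.sha256(row_key).digest()[0] % buckets
    return b"%d|" % bucket + row_key


def writes_per_region(keys, start_keys) -> Counter:
    """Count how many of the given writes each region would receive."""
    return Counter(region_for(key, start_keys) for key in keys)


# 400 new writes keyed by a monotonically increasing timestamp.
new_keys = [b"%d" % ts for ts in range(1000000400, 1000000800)]

unsalted = writes_per_region(new_keys, TIMESTAMP_START_KEYS)
salted = writes_per_region([salt(k) for k in new_keys], SALTED_START_KEYS)

print("unsalted:", dict(unsalted))  # every write lands in the last region
print("salted:  ", dict(sorted(salted.items())))
```

With the unsalted layout all 400 writes go to region 3, the region holding the open-ended tail of the key space; with the salted layout they spread across the pre-split buckets. The same concentration happens to read traffic aimed at the newest keys.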
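A sketch of the read-side trade-off raised in the final point above, assuming the salt prefix is derived from the logical key (as in a hash-based scheme) rather than being truly random: a point get can recompute the stored key and touches one region, while a scan over a logical key range has to be issued once per bucket. `NUM_BUCKETS`, the `b"<n>|"` prefix format, and the helper names are hypothetical.

```python
import hashlib

NUM_BUCKETS = 4  # hypothetical salt-bucket count


def salt(row_key: bytes, buckets: int = NUM_BUCKETS) -> bytes:
    """Deterministically prefix row_key with a hash-derived bucket number."""
    bucket = hashlib.sha256(row_key).digest()[0] % buckets
    return b"%d|" % bucket + row_key


def get_targets(row_key: bytes) -> list:
    """Stored keys a point get must read: exactly one, because the salt
    can be recomputed from the logical key."""
    return [salt(row_key)]


def scan_ranges(start: bytes, stop: bytes, buckets: int = NUM_BUCKETS) -> list:
    """[start, stop) ranges a logical range scan must issue on a salted table.

    Salting scatters consecutive logical keys across buckets, so the scan
    is repeated once per bucket and the client merges the results.
    """
    return [
        (b"%d|" % bucket + start, b"%d|" % bucket + stop)
        for bucket in range(buckets)
    ]


if __name__ == "__main__":
    print(len(get_targets(b"user-42")))              # 1
    print(scan_ranges(b"user-10", b"user-20")[0])    # (b'0|user-10', b'0|user-20')
    print(len(scan_ranges(b"user-10", b"user-20")))  # 4
```

Truly random keys that cannot be recomputed from the logical key would additionally need a separate lookup path for gets, which this sketch does not model.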
> Explain hotspotting
> -------------------
>
>                 Key: HBASE-11682
>                 URL: https://issues.apache.org/jira/browse/HBASE-11682
>             Project: HBase
>          Issue Type: Task
>          Components: documentation
>            Reporter: Misty Stanley-Jones
>            Assignee: Misty Stanley-Jones
>         Attachments: HBASE-11682.patch
>
>

--
This message was sent by Atlassian JIRA
(v6.2#6252)