[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103331#comment-14103331 ] Hudson commented on HBASE-11682: FAILURE: Integrated in HBase-TRUNK #5413 (See [https://builds.apache.org/job/HBase-TRUNK/5413/]) HBASE-11682 Explain Hotspotting (Misty Stanley-Jones) (jmhsieh: rev ac2e1c33fd32a6b473ebbfdc32f5e631a69f2a6d) * src/main/docbkx/schema_design.xml > Explain hotspotting > --- > > Key: HBASE-11682 > URL: https://issues.apache.org/jira/browse/HBASE-11682 > Project: HBase > Issue Type: Task > Components: documentation >Reporter: Misty Stanley-Jones >Assignee: Misty Stanley-Jones > Fix For: 0.99.0, 2.0.0 > > Attachments: HBASE-11682-1.patch, HBASE-11682.patch, > HBASE-11682.patch, HBASE-11682.patch, HBASE-11682.patch, HBASE-11682.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103143#comment-14103143 ] Hudson commented on HBASE-11682: FAILURE: Integrated in HBase-1.0 #113 (See [https://builds.apache.org/job/HBase-1.0/113/]) HBASE-11682 Explain Hotspotting (Misty Stanley-Jones) (jmhsieh: rev f4f77b5756464d3e48f686af18e92560e2a4f76b) * src/main/docbkx/schema_design.xml > Explain hotspotting > --- > > Key: HBASE-11682 > URL: https://issues.apache.org/jira/browse/HBASE-11682 > Project: HBase > Issue Type: Task > Components: documentation >Reporter: Misty Stanley-Jones >Assignee: Misty Stanley-Jones > Fix For: 0.99.0, 2.0.0 > > Attachments: HBASE-11682-1.patch, HBASE-11682.patch, > HBASE-11682.patch, HBASE-11682.patch, HBASE-11682.patch, HBASE-11682.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103050#comment-14103050 ] Jonathan Hsieh commented on HBASE-11682: I noticed that the wrong row was emphasized in the latest patch (c-foo0003 should be emphasized instead of c-foo0004); otherwise it looked good. On the re-read I also noticed that the reversing trick should put the least significant digit first in the key (ex: 0001, 0002, 0003 would be "flipped" to 1000, 2000, and 3000 so that they end up on different servers). I'm going to fix both and commit.
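The reversing trick described in this comment can be sketched in a few lines. This is only an illustration under the assumption of fixed-width string keys; `reverse_key` is a hypothetical helper name, not part of HBase or of the patch.

```python
def reverse_key(key: str) -> str:
    """Reverse a fixed-width key so the fastest-changing digit comes first."""
    return key[::-1]


sequential = ["0001", "0002", "0003"]
flipped = [reverse_key(k) for k in sequential]
print(flipped)  # ['1000', '2000', '3000']

# Reversing again restores the original key, so a client can still
# compute the stored key for a point lookup.
assert [reverse_key(k) for k in flipped] == sequential
```

The flipped keys no longer sort in the original order, which is the trade-off the comment alludes to: load is spread, but range scans over the original sequence are lost.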
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102948#comment-14102948 ] Nick Dimiduk commented on HBASE-11682: -- [~misty] you are a saint of patience. +1 > Explain hotspotting > --- > > Key: HBASE-11682 > URL: https://issues.apache.org/jira/browse/HBASE-11682 > Project: HBase > Issue Type: Task > Components: documentation >Reporter: Misty Stanley-Jones >Assignee: Misty Stanley-Jones > Attachments: HBASE-11682-1.patch, HBASE-11682.patch, > HBASE-11682.patch, HBASE-11682.patch, HBASE-11682.patch, HBASE-11682.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102921#comment-14102921 ] Nick Dimiduk commented on HBASE-11682:

A little nit-picky, but... (now you know what Aman went through ;) )

{noformat}
+Suppose you have the following list of row keys, and your table is split in such a way
+ that all the rows starting with "foo" are in the same region.
{noformat}

I would say "... and your table is split such that there is one region for each letter of the alphabet -- prefix 'a' is one region, prefix 'b' is another. In this table, all rows starting with 'f' are in the same region." That is, be explicitly clear about the region split for the example.

{noformat}
+an http://phoenix.apache.org/salted.html";>article on Salted Tables
{noformat}

"an" should be "and"?
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102875#comment-14102875 ] Jonathan Hsieh commented on HBASE-11682: +1 to Nick's clarifications
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102867#comment-14102867 ] Nick Dimiduk commented on HBASE-11682:

Very well articulated example, I like it! [~jmhsieh] you're right in that I don't think of using random data for a prefix because the nondeterminism makes gets ineffective. It is, however, a valid approach.

{noformat}
+Suppose you have the following list of row keys:
{noformat}

This example assumes the table is split such that f* would be in a single region but a-, b-, c-, d- are in different regions. Be explicit about the region splits; include a sentence like "assume your table is split by letter, so the rowkey prefix {{a}} is on one region, {{b}} is on a second, {{c}} on a 3rd, &c." In that topology, all the foo rows would be in the same region, and the prefixed rows would be in different regions.

{noformat}
+Hashing
{noformat}

For this bit, you can add something like "using a deterministic hash allows the client to reconstruct the complete rowkey and use a get operation to retrieve that row as normal." The current text alludes to this, but maybe we can come out and say it explicitly. For reference, you could also link off to Phoenix's "Salted Tables" description: http://phoenix.apache.org/salted.html
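The point about deterministic hashing and gets can be made concrete with a small sketch. Everything here is an assumption for illustration: a plain dict stands in for the table, `hashed_key` and `NUM_BUCKETS` are hypothetical names, and SHA-256 is just one possible hash. It is not the scheme used by Phoenix or by the patch.

```python
import hashlib

NUM_BUCKETS = 4  # hypothetical bucket count, giving prefixes a-d


def hashed_key(row_key: str, buckets: int = NUM_BUCKETS) -> str:
    """Prepend a prefix derived deterministically from the row key itself."""
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    prefix = chr(ord("a") + digest[0] % buckets)
    return f"{prefix}-{row_key}"


# A dict stands in for the table; this is not the HBase client API.
table = {}
table[hashed_key("foo0003")] = {"cf:a": 3}

# The reader knows only the logical key, recomputes the same prefix,
# and retrieves the row with a single point lookup.
row = table[hashed_key("foo0003")]
print(row)  # {'cf:a': 3}
```

Because the prefix depends only on the key, the same logical row always maps to the same stored row, which is what keeps gets usable.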
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101735#comment-14101735 ] Jonathan Hsieh commented on HBASE-11682: +1. I like it. Before it gets committed, [~ndimiduk], do you agree with the latest version? Misty went into a little more detail, but I think it captures two different techniques for dealing with hotspotting and their distinct effects.
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101527#comment-14101527 ] Misty Stanley-Jones commented on HBASE-11682: - Thanks for the extra info, working on this now. > Explain hotspotting > --- > > Key: HBASE-11682 > URL: https://issues.apache.org/jira/browse/HBASE-11682 > Project: HBase > Issue Type: Task > Components: documentation >Reporter: Misty Stanley-Jones >Assignee: Misty Stanley-Jones > Attachments: HBASE-11682-1.patch, HBASE-11682.patch, HBASE-11682.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101308#comment-14101308 ] Jonathan Hsieh commented on HBASE-11682:

{code}
+ Salting in this sense has nothing to do with cryptography, but refers to adding random
+data to the start of a row key. In this case, salting refers to adding a prefix to the row
+key to cause it to sort differently than it otherwise would. Salting can be helpful if you
+have a few keys that come up over and over, along with other rows that don't fit those keys.
+In that case, the regions holding rows with the "hot" keys would be overloaded, compared to
+the other regions. Salting completely removes ordering, so is often a poorer choice than
+hashing. Using totally random row keys for data which is accessed sequentially would remove
+the benefit of HBase's row-sorting algorithm and cause very poor performance, as each get or
+scan would need to query all regions.
{code}

I don't think this salting example is correct about the ramifications. Both Nick and I agree that salting is putting some random value in front of the actual value. This means that instead of one sorted list of entries, we'd have n sorted lists of entries if the cardinality of the salt is n.

Example: naively, we have rowkeys like this:

foo0001
foo0002
foo0003
foo0004

If we use a 4-way salt (a, b, c, d), we could end up with data re-sorted like this:

a-foo0003
b-foo0001
c-foo0004
d-foo0002

Let's say we add some new values to row foo0003. It could get salted with a new salt, let's say 'c':

a-foo0003
b-foo0001
*c-foo0003*
c-foo0004
d-foo0002

To read, we could still get the rows back in the original order, but we'd need a reader starting from each salt in parallel. (We would also likely need to coalesce foo0003, combining the a-foo0003 and c-foo0003 rows back into one.)

The effect in this situation is that we could now be writing with 4x the throughput, since we would be writing to 4 different machines (assuming that a, b, c, and d are balanced onto different machines).

Nick's point of view (please correct me if I am wrong) is that you could "salt" the original row key with a one-way hash so that foo0003 would always get salted with 'a'. This would spread rowkeys that are lexicographically close (foo0001 and foo0002) to different machines, which could help reduce contention and increase overall throughput, but would never allow a single row to have 4x the throughput like the other approach.

{code}
+ Hashing refers to applying a random one-way function to the row key, such that a
+particular row always gets the same arbitrary value applied. This preserves the sort order
+so that scans are effective, but spreads out load across a region. One example where hashing
+is the right strategy would be if for some reason, a large proportion of rows started with
+the same letter. Normally, these would all be sorted into the same region. You can apply a
+hash to artificially differentiate them and spread them out.
{code}

Hashing actually totally trashes the sort order. In fact, the goal of hashing is to disperse entries that are near each other lexicographically as evenly as possible.
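The salted write path and the parallel read with coalescing described in this comment can be sketched as follows. This is a minimal in-memory illustration, not HBase client code: a dict stands in for the table, and `salted_put`, `read_in_order`, and the column names are hypothetical. It also assumes that the salted copies of a row hold different columns; resolving conflicting values for the same column (for example by timestamp) is left out.

```python
import heapq
import random

SALTS = ["a", "b", "c", "d"]  # 4-way salt, as in the example above


def salted_put(table, row_key, columns, rng):
    """Write columns under a randomly salted copy of row_key."""
    salt = rng.choice(SALTS)
    table.setdefault(f"{salt}-{row_key}", {}).update(columns)


def read_in_order(table):
    """Scan each salt's sorted range, merge the scans, and coalesce rows."""
    scans = []
    for salt in SALTS:
        prefix = f"{salt}-"
        rows = [
            (key[len(prefix):], cols)
            for key, cols in table.items()
            if key.startswith(prefix)
        ]
        scans.append(sorted(rows, key=lambda row: row[0]))
    merged = {}
    # heapq.merge walks the n sorted scans in step, like n parallel readers.
    for logical_key, cols in heapq.merge(*scans, key=lambda row: row[0]):
        merged.setdefault(logical_key, {}).update(cols)
    return merged


rng = random.Random(0)
table = {}
for i in range(1, 5):
    salted_put(table, f"foo{i:04d}", {"cf:a": i}, rng)
salted_put(table, "foo0003", {"cf:b": 30}, rng)  # may land under a new salt

print(list(read_in_order(table)))
# ['foo0001', 'foo0002', 'foo0003', 'foo0004']
```

Whichever salts are drawn, the merged result comes back in the original key order with the two foo0003 writes combined into one logical row, at the cost of one scan per salt.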
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088698#comment-14088698 ] Nick Dimiduk commented on HBASE-11682: -- bq. thanks for bearing with me on this I was going to say the same :) > Explain hotspotting > --- > > Key: HBASE-11682 > URL: https://issues.apache.org/jira/browse/HBASE-11682 > Project: HBase > Issue Type: Task > Components: documentation >Reporter: Misty Stanley-Jones >Assignee: Misty Stanley-Jones > Attachments: HBASE-11682-1.patch, HBASE-11682.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088590#comment-14088590 ] Jonathan Hsieh commented on HBASE-11682: sounds good to me. thanks for bearing with me on this. > Explain hotspotting > --- > > Key: HBASE-11682 > URL: https://issues.apache.org/jira/browse/HBASE-11682 > Project: HBase > Issue Type: Task > Components: documentation >Reporter: Misty Stanley-Jones >Assignee: Misty Stanley-Jones > Attachments: HBASE-11682-1.patch, HBASE-11682.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088575#comment-14088575 ] Nick Dimiduk commented on HBASE-11682: I don't think we need to define how the salt bytes might be calculated. It should be enough to define the concept as prepending some value with a defined cardinality to avoid overburdening a single machine. I have no problem with pointing off to an external resource as an example of how it might be done.
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088536#comment-14088536 ] Jonathan Hsieh commented on HBASE-11682: More precisely, for "salt"/"stripe" I've seen: prepend some fixed byte-length rand value, where the rand value is between [0, k * (# of regionservers) ] and k is some constant. I think that is what you mean, but I'm not sure. If we agree, we should have Misty define salt in the prose instead of pointing to an external definition.
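A fixed-byte-length random salt of the kind described in this comment might look like the sketch below. The constants and function names are hypothetical, and the range is taken as half-open, [0, k * servers), which is an assumption; the comment does not pin down whether the upper bound is included.

```python
import random
import struct

NUM_REGION_SERVERS = 5  # hypothetical cluster size
K = 2                   # hypothetical constant multiplier


def random_salt_prefix(rng, servers=NUM_REGION_SERVERS, k=K) -> bytes:
    """Return a fixed-width, 2-byte salt drawn from [0, k * servers)."""
    salt = rng.randrange(k * servers)
    # Big-endian packing keeps byte-wise sort order equal to numeric order.
    return struct.pack(">H", salt)


def salted_row_key(row_key: bytes, rng) -> bytes:
    """Prepend the random salt to the original row key."""
    return random_salt_prefix(rng) + row_key


rng = random.Random(1)
key = salted_row_key(b"foo0001", rng)
print(len(key), key[2:])  # 9 b'foo0001'
```

Keeping the salt at a fixed width means a reader can always strip the first two bytes to recover the original key, and can enumerate every possible salt when it needs to read all copies of a row.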
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088458#comment-14088458 ] Nick Dimiduk commented on HBASE-11682: From the conversations on user list I've seen, "salt" tends to mean "prepend with some fixed-byte-length value, usually a modulo of the number of regionservers" -- the same as your "striping". I've also seen lazy people prepend with the first N bytes of the hashed rowkey, hence my loose language in the previous comment.
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088454#comment-14088454 ] Jonathan Hsieh commented on HBASE-11682: bq. Even though it's not perfect, I think it best to stick with "salt". I'm not sure if we are on the same page here -- my main point is that "salt" != "hash". Also, "salt" != "Prepend a hash". If the term salt causes confusion, my suggestion is not to use it. The other mechanisms mentioned (striping/prepending a random value, and endian inversion) are valid strategies to avoid hotspotting as well.
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088435#comment-14088435 ] Nick Dimiduk commented on HBASE-11682: -- bq. From what phoenix uses and what I think you mean, "bucketing', and "binning" are equivalent to prepending a hash. Yes, that's consistent with my understanding as well. I'd prefer not to introduce yet another term to the conversation (or add confusion with striped compactions). Even though it's not perfect, I think it best to stick with "salt". > Explain hotspotting > --- > > Key: HBASE-11682 > URL: https://issues.apache.org/jira/browse/HBASE-11682 > Project: HBase > Issue Type: Task > Components: documentation >Reporter: Misty Stanley-Jones >Assignee: Misty Stanley-Jones > Attachments: HBASE-11682-1.patch, HBASE-11682.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087891#comment-14087891 ] Jonathan Hsieh commented on HBASE-11682:

bq. Please note that "salting" as we use the term in HBase is different from "salting" in the cryptographic sense. Our usage pattern is more accurately described as "bucketing" (I think this is the term Phoenix uses), or "binning" [0]. This horse has been beaten to death on the user and dev mailing lists, so I won't belabor the point.

I agree they are different, but the idea of prepending a random value (as opposed to a deterministic hash) is very similar to the crypto "salting". Instead of using "salting", how about we use the term "striping"? What I was referring to and described takes one logical row and stripes it across many rows so that write throughput can be increased. The penalty is that we lose some consistency and, if we want a "whole" answer, we have to read all the rows that the logical row was striped over. I've seen the pattern deployed at several customers.

From what Phoenix uses and what I think you mean, "bucketing" and "binning" are equivalent to prepending a hash.
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087839#comment-14087839 ] Nick Dimiduk commented on HBASE-11682: -- bq. However, using totally random row keys for data which is accessed sequentially would remove the benefit of HBase's row-sorting algorithm and cause very poor performance, as each get or scan would need to query all regions. Prefixing with random byte prevents any meaningful use of scans; gets become your only option. This approach is indistinguishable from hashing the rowkey. I like the rest of the updates, +1 Thanks a lot [~misty]! > Explain hotspotting > --- > > Key: HBASE-11682 > URL: https://issues.apache.org/jira/browse/HBASE-11682 > Project: HBase > Issue Type: Task > Components: documentation >Reporter: Misty Stanley-Jones >Assignee: Misty Stanley-Jones > Attachments: HBASE-11682-1.patch, HBASE-11682.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087833#comment-14087833 ] Nick Dimiduk commented on HBASE-11682: -- Please note that "salting" as we use the term in HBase is different from "salting" in the cryptographic sense. Our usage pattern is more accurately described as "bucketing" (I think this is the term Phoenix uses), or "binning" [0]. This horse has been beaten to death on the user and dev mailing lists, so I won't belabor the point. [0]: http://en.wikipedia.org/wiki/Data_binning > Explain hotspotting > --- > > Key: HBASE-11682 > URL: https://issues.apache.org/jira/browse/HBASE-11682 > Project: HBase > Issue Type: Task > Components: documentation >Reporter: Misty Stanley-Jones >Assignee: Misty Stanley-Jones > Attachments: HBASE-11682-1.patch, HBASE-11682.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087751#comment-14087751 ] Jonathan Hsieh commented on HBASE-11682:

Nice addition. Personally, I don't really like the Sematext definition of salting: it conflates salting [1] with hashing [2], which are two separate things.

*salting* adds random data to the start of a rowkey. This means that, depending on the 'salt factor', you could end up writing to n different row keys (and ideally n different regions). When reading, you would generally want to read all n rows and coalesce the values. This is helpful if you have individual hot keys. It is often a bad smell because it is a trick used to try to mitigate having the date as the first part of a row key, but it does have valid use cases (ex: rowkey is name and you have a handful of individual celebrities - obama, bieber, gaga - that need to have their load spread). This preserves ordering but multiplies the number of reads required wrt # of writes.

*hashing* applies a random one-way function to the rowkey such that a particular row will always get the same 'random' value prepended. The original row gets mapped to a single row. This is good when you have clusters of related keys that in aggregate form a hotspot (example: rowkey is name and you have way too many joes, johns, jons, jonahs, jonathans, and jonathons all on the same region -- using a hash would spread all the j names around). This throws out the ability to effectively take advantage of the row ordering properties.

Another trick is to take numeric or fixed-length values and write them in least-significant-digit-first order (little endian), so that the digit that changes the most comes first. This effectively randomizes row key names but also sacrifices row ordering properties.

[1] http://en.wikipedia.org/wiki/Salt_(cryptography)
[2] http://en.wikipedia.org/wiki/Hash_function
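The difference in read cost between the two techniques can be sketched by listing the stored keys a reader has to consult for one logical row. The helper names and the two-digit prefix format are hypothetical, chosen only for this illustration.

```python
import hashlib


def salted_candidates(row_key: str, salt_factor: int) -> list:
    """With a random salt, a logical row may live under any of n prefixes."""
    return [f"{salt:02d}-{row_key}" for salt in range(salt_factor)]


def hashed_candidate(row_key: str, buckets: int) -> str:
    """With a deterministic hash, a logical row maps to exactly one key."""
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    return f"{digest[0] % buckets:02d}-{row_key}"


# Reading one hot "celebrity" row written with a 4-way salt: 4 lookups.
print(salted_candidates("obama", 4))
# ['00-obama', '01-obama', '02-obama', '03-obama']

# Reading one row whose prefix is a hash of the key: a single lookup,
# and similar names are likely (not guaranteed) to land in different buckets.
print([hashed_candidate(name, 4) for name in ["joe", "john", "jon", "jonah"]])
```

A single hot row gains write throughput only from the salted form, since the hashed form always sends it to the same stored key.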
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087318#comment-14087318 ] Hadoop QA commented on HBASE-11682: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660043/HBASE-11682-1.patch against trunk revision . ATTACHMENT ID: 12660043 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + xlink:href="http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/"; + >http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/. + xlink:href="https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables"; + >https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables. {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestRegionReplicas org.apache.hadoop.hbase.master.TestRestartCluster org.apache.hadoop.hbase.client.TestReplicasClient org.apache.hadoop.hbase.master.TestMasterOperationsForRegionReplicas Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//console This message is automatically generated. 
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087245#comment-14087245 ] Hadoop QA commented on HBASE-11682: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660036/HBASE-11682.patch against trunk revision . ATTACHMENT ID: 12660036 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestRegionReplicas org.apache.hadoop.hbase.master.TestRestartCluster org.apache.hadoop.hbase.client.TestReplicasClient org.apache.hadoop.hbase.master.TestMasterOperationsForRegionReplicas {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): at org.apache.hadoop.hbase.client.TestHCM.testClusterStatus(TestHCM.java:250) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//console This message is automatically generated. > Explain hotspotting > --- > > Key: HBASE-11682 > URL: https://issues.apache.org/jira/browse/HBASE-11682 > Project: HBase > Issue Type: Task > Components: documentation >Reporter: Misty Stanley-Jones >Assignee: Misty Stanley-Jones > Attachments: HBASE-11682-1.patch, HBASE-11682.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087207#comment-14087207 ] Nick Dimiduk commented on HBASE-11682:

bq. HBase also attempts to store rows near each other in the same region, on the same region server.

This sentence doesn't help much. A region is a contiguous sequence of rows that are physically hosted as a unit. Rows on region boundaries are lexicographically near each other but are part of different regions, so there are no guarantees about them being hosted on the same region server.

bq. However, poorly designed row keys can lead to hotspotting.

This is where schema/rowkey design and access patterns go hand-in-hand.

bq. Hotspotting occurs when nearly all the rows being written to HBase are written to the same region, because their row keys are contiguous or very similar.

I'd say "Hotspotting occurs when too much client traffic is directed at a single region. This can be from reads, writes, or both. The traffic overwhelms the single machine responsible for hosting that region, causing performance degradation and potentially leading to region unavailability. This can also have adverse effects on other regions hosted by the same region server as that host is unable to service the requested load."

bq. but in the bigger picture, data is being written to multiple regions across the cluster ...

Again, not limited to writes.

bq. One technique is to salt the row keys

Is the term "salt" explained?

bq. However, using totally random row keys would remove any benefit of HBase's row-sorting algorithm and cause very poor performance, as each get or scan would need to query all regions.

You're assuming a sequential access pattern here. Random rowkeys can be okay for random read access patterns, in that load is spread all over the cluster. I've seen other issues around poor blockcache performance from completely random access patterns, but that's a slight tangent.
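The points in this comment about regions being contiguous key ranges, and about traffic piling onto one region, can be illustrated with a small lookup sketch. The split points and the `region_for` helper are hypothetical; this models only the idea of sorted key ranges, not how HBase actually locates regions.

```python
import bisect
from collections import Counter

# Hypothetical split points giving 4 regions: [..g), [g..n), [n..t), [t..]
SPLIT_POINTS = ["g", "n", "t"]


def region_for(row_key: str) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect.bisect_right(SPLIT_POINTS, row_key)


# Sequential keys all fall into the same contiguous range.
writes = [f"foo{i:04d}" for i in range(1, 1001)]
load = Counter(region_for(key) for key in writes)
print(load)  # Counter({0: 1000})

# Keys on either side of a boundary sort next to each other
# but belong to different regions.
print(region_for("fzzz"), region_for("g"))  # 0 1
```

All 1000 requests land on region 0 while the other three regions sit idle, which is the traffic imbalance the suggested definition of hotspotting describes; the same counting applies to reads as well as writes.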