[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103331#comment-14103331
 ] 

Hudson commented on HBASE-11682:


FAILURE: Integrated in HBase-TRUNK #5413 (See 
[https://builds.apache.org/job/HBase-TRUNK/5413/])
HBASE-11682 Explain Hotspotting (Misty Stanley-Jones) (jmhsieh: rev 
ac2e1c33fd32a6b473ebbfdc32f5e631a69f2a6d)
* src/main/docbkx/schema_design.xml


> Explain hotspotting
> ---
>
> Key: HBASE-11682
> URL: https://issues.apache.org/jira/browse/HBASE-11682
> Project: HBase
>  Issue Type: Task
>  Components: documentation
>Reporter: Misty Stanley-Jones
>Assignee: Misty Stanley-Jones
> Fix For: 0.99.0, 2.0.0
>
> Attachments: HBASE-11682-1.patch, HBASE-11682.patch, 
> HBASE-11682.patch, HBASE-11682.patch, HBASE-11682.patch, HBASE-11682.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103143#comment-14103143
 ] 

Hudson commented on HBASE-11682:


FAILURE: Integrated in HBase-1.0 #113 (See 
[https://builds.apache.org/job/HBase-1.0/113/])
HBASE-11682 Explain Hotspotting (Misty Stanley-Jones) (jmhsieh: rev 
f4f77b5756464d3e48f686af18e92560e2a4f76b)
* src/main/docbkx/schema_design.xml




[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-19 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103050#comment-14103050
 ] 

Jonathan Hsieh commented on HBASE-11682:


I noticed the wrong row got emphasized in the latest patch (c-foo0003 should be 
emphasized instead of c-foo0004); otherwise it looked good.

Also noticed on the re-read that the reversing trick should have the least 
significant digit as the first part of the key (ex: 0001, 0002, 0003 would 
be "flipped" to 1000, 2000, and 3000 to end up on different servers).

I'm going to fix both and commit.
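
To make the reversing trick concrete, here is a minimal Python sketch. It 
assumes fixed-width, zero-padded numeric keys; the name reverse_key is 
hypothetical and not part of HBase.

```python
def reverse_key(key: str) -> str:
    """Reverse a fixed-width key so its least significant digit comes first."""
    return key[::-1]


sequential = ["0001", "0002", "0003"]
flipped = [reverse_key(k) for k in sequential]
print(flipped)  # ['1000', '2000', '3000']
```

Because the leading character now changes with every new key, consecutive 
writes no longer share a long common prefix.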



[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-19 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102948#comment-14102948
 ] 

Nick Dimiduk commented on HBASE-11682:
--

[~misty] you are a saint of patience.

+1



[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-19 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102921#comment-14102921
 ] 

Nick Dimiduk commented on HBASE-11682:
--

A little nit-picky, but... (now you know what Aman went through ;) )

{noformat}
+Suppose you have the following list of row keys, and your table 
is split in such a way
+  that all the rows starting with "foo" are in the same region.
{noformat}

I would say "... and your table is split such that there is one region for each 
letter of the alphabet -- prefix 'a' is one region, prefix 'b' is another. In 
this table, all rows starting with 'f' are in the same region."

That is, be explicitly clear about the region split for the example.
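
A toy model of that split may help when checking the example. This is only a 
sketch: the one-region-per-letter layout is the assumption suggested above, 
and REGION_STARTS and region_for are hypothetical names, not HBase APIs.

```python
import bisect
import string

# Assumed split points: one region per lowercase letter, as in the example.
# Region i is taken to hold keys in [REGION_STARTS[i], REGION_STARTS[i + 1]).
REGION_STARTS = list(string.ascii_lowercase)


def region_for(row_key: str) -> str:
    """Return the start key of the (hypothetical) region holding row_key."""
    index = bisect.bisect_right(REGION_STARTS, row_key) - 1
    # Keys sorting before "a" fall into the first region in this toy model.
    return REGION_STARTS[max(index, 0)]


plain = ["foo0001", "foo0002", "foo0003", "foo0004"]
salted = ["a-foo0003", "b-foo0001", "c-foo0004", "d-foo0002"]

print({region_for(k) for k in plain})         # all four keys share region "f"
print(sorted(region_for(k) for k in salted))  # four different regions
```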

{noformat}
+an http://phoenix.apache.org/salted.html";>article on 
Salted Tables
{noformat}

"an" should be "and" ?



[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-19 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102875#comment-14102875
 ] 

Jonathan Hsieh commented on HBASE-11682:


+1 to Nick's clarifications



[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-19 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102867#comment-14102867
 ] 

Nick Dimiduk commented on HBASE-11682:
--

Very well articulated example, I like it! [~jmhsieh] you're right in that I 
don't think of using random data for a prefix because the nondeterminism makes 
gets ineffective. It is, however, a valid approach.

{noformat}
+Suppose you have the following list of row keys:
{noformat}

This example assumes the table is split such that f* would be in a 
single region but a-, b-, c-, d- are in different regions. Be explicit about 
the region splits; include a sentence like "assume your table is split by 
letter, so the rowkey prefix {{a}} is on one region, {{b}} is on a second, 
{{c}} on a third, etc." In that topology, all the foo rows would be in the 
same region, and the prefixed rows are in different regions.

{noformat}
+Hashing
{noformat}

For this bit, you can add something like "using a deterministic hash allows the 
client to reconstruct the complete rowkey and use a get operation to retrieve 
that row as normal." The current text alludes to this, but maybe we can come 
out and say it explicitly.
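
As a rough illustration of that point, the sketch below derives the prefix 
from the row key itself, so a reader can recompute it. Everything here is an 
assumption for illustration: the dict stands in for a table, NUM_BUCKETS is an 
arbitrary cardinality, MD5 is just one possible deterministic hash, and put/get 
are hypothetical helpers rather than the HBase client API.

```python
import hashlib

NUM_BUCKETS = 4  # assumed prefix cardinality

table = {}  # stand-in for a table: stored key -> value


def prefixed_key(row_key: str) -> str:
    """Prepend a bucket number derived deterministically from row_key."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = digest[0] % NUM_BUCKETS
    return f"{bucket}-{row_key}"


def put(row_key: str, value: str) -> None:
    table[prefixed_key(row_key)] = value


def get(row_key: str):
    # The same hash yields the same prefix, so a single lookup is enough.
    return table.get(prefixed_key(row_key))


put("foo0003", "some value")
print(get("foo0003"))  # some value
```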

For references, you could also link off to Phoenix's "Salted Tables" 
description http://phoenix.apache.org/salted.html



[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-18 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101735#comment-14101735
 ] 

Jonathan Hsieh commented on HBASE-11682:


+1.  I like it.

Before it gets committed, [~ndimiduk], do you agree with the latest version?  
Misty went into a little more detail but I think it captures two different 
techniques to deal with hotspotting and their distinct effects.



[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-18 Thread Misty Stanley-Jones (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101527#comment-14101527
 ] 

Misty Stanley-Jones commented on HBASE-11682:
-

Thanks for the extra info, working on this now.



[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-18 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101308#comment-14101308
 ] 

Jonathan Hsieh commented on HBASE-11682:


{code}
+  Salting in this sense has nothing to do with cryptography, but 
refers to adding random
+data to the start of a row key. In this case, salting refers to adding 
a prefix to the row
+key to cause it to sort differently than it otherwise would. Salting 
can be helpful if you
+have a few keys that come up over and over, along with other rows that 
don't fit those keys.
+In that case, the regions holding rows with the "hot" keys would be 
overloaded, compared to
+the other regions. Salting completely removes ordering, so is often a 
poorer choice than
+hashing. Using totally random row keys for data which is accessed 
sequentially would remove
+the benefit of HBase's row-sorting algorithm and cause very poor 
performance, as each get or
+scan would need to query all regions.
{code}

I don't think this salting example is correct about the ramifications.  Both 
Nick and I agree that salting is putting some random value in front of the 
actual value.  This means that instead of one sorted list of entries, we'd have 
n sorted lists of entries if the cardinality of the salt is n.

Example:  naively we have rowkeys like this:

foo0001
foo0002
foo0003
foo0004

If we use a 4-way salt (a,b,c,d), we could end up with data resorted like this:

a-foo0003
b-foo0001
c-foo0004
d-foo0002

Let's say we add some new values to row foo0003.  It could get salted with a new 
salt, let's say 'c'.

a-foo0003
b-foo0001
*c-foo0003*
c-foo0004
d-foo0002

To read, we could still get things back in the original order, but we'd have to 
have a reader starting from each salt in parallel to get the rows back in 
order (and likely need to do some coalescing of foo0003 to combine the 
a-foo0003 and c-foo0003 rows back into one).  The effect in this situation 
is that we could be writing with 4x the throughput, since we would be on 4 
different machines (assuming that a, b, c, d are balanced onto different 
machines).
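
The read path described above can be sketched in Python as follows. This is 
an illustration under stated assumptions, not how any HBase client works: the 
dict stands in for the table, each per-salt sorted list stands in for one 
parallel reader, and salted_put and scan_in_order are hypothetical names.

```python
import heapq
import random

SALTS = ["a", "b", "c", "d"]  # the 4-way salt from the example


def salted_put(table, row_key, value, rng=random):
    """Write value under a randomly salted copy of row_key."""
    salt = rng.choice(SALTS)
    table.setdefault(f"{salt}-{row_key}", []).append(value)


def scan_in_order(table):
    """Return (row_key, values) pairs in original key order."""
    streams = []
    for salt in SALTS:
        prefix = f"{salt}-"
        # One sorted stream per salt, standing in for one parallel reader.
        stream = sorted(
            ((key[len(prefix):], values) for key, values in table.items()
             if key.startswith(prefix)),
            key=lambda pair: pair[0],
        )
        streams.append(stream)
    coalesced = {}
    # Merge the per-salt streams, then combine rows that share a key.
    for row_key, values in heapq.merge(*streams, key=lambda pair: pair[0]):
        coalesced.setdefault(row_key, []).extend(values)
    return list(coalesced.items())


table = {
    "a-foo0003": ["old"],
    "b-foo0001": ["v1"],
    "c-foo0003": ["new"],
    "c-foo0004": ["v4"],
    "d-foo0002": ["v2"],
}
rows = scan_in_order(table)
print([row_key for row_key, _ in rows])
```

The cost shows up on the read side: every scan touches all n salts, and rows 
such as foo0003 have to be stitched back together.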

Nick's point of view (please correct me if I am wrong) says that you could 
"salt" the original row key with a one-way hash so that foo0003 would always 
get salted with 'a'.  This would spread rowkeys that are lexicographically 
close (foo0001 and foo0002) to different machines, which could help reduce 
contention and increase overall throughput, but would never allow a single 
row to have 4x the throughput like the other approach.

{code}
+  Hashing refers to applying a random one-way function to the row 
key, such that a
+particular row always gets the same arbitrary value applied. This 
preserves the sort order
+so that scans are effective, but spreads out load across a region. One 
example where hashing
+is the right strategy would be if for some reason, a large proportion 
of rows started with
+the same letter. Normally, these would all be sorted into the same 
region. You can apply a
+hash to artificially differentiate them and spread them out.
{code}

Hashing actually totally trashes the sort order -- in fact the goal of hashing 
is to disperse entries that are near each other lexicographically as evenly as 
possible.
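
A quick way to see this is to hash a run of sequential keys and sort by the 
result. The sketch below uses MD5 purely as an example of a deterministic 
hash; hashed_key is a hypothetical helper name.

```python
import hashlib


def hashed_key(row_key: str) -> str:
    """Return a hex digest to use as the stored key in place of row_key."""
    return hashlib.md5(row_key.encode("utf-8")).hexdigest()


keys = [f"foo{i:04d}" for i in range(1, 101)]  # foo0001 .. foo0100
by_hash = sorted(keys, key=hashed_key)

# Neighbours in the original order end up scattered in the hashed order.
print(by_hash[:5])
```

A scan over the stored keys therefore returns rows in digest order, which 
tells you nothing about the original key order.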



[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-06 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088698#comment-14088698
 ] 

Nick Dimiduk commented on HBASE-11682:
--

bq. thanks for bearing with me on this

I was going to say the same :)



[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-06 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088590#comment-14088590
 ] 

Jonathan Hsieh commented on HBASE-11682:


sounds good to me.  thanks for bearing with me on this.



[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-06 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088575#comment-14088575
 ] 

Nick Dimiduk commented on HBASE-11682:
--

I don't think we need to define how the salt bytes might be calculated. It 
should be enough to define the concept as prepending some value with a defined 
cardinality to avoid overburdening a single machine. I have no problem with 
pointing off to an external resource as an example of how it might be done.



[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-06 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088536#comment-14088536
 ] 

Jonathan Hsieh commented on HBASE-11682:


More precisely, for "salt"/"stripe" I've seen people prepend some fixed 
byte-length rand value, where the rand value is between [0, k * (# of 
regionservers) ] and k is some constant.  I think that is what you mean but I'm 
not sure.  If we agree, we should have Misty define salt in the prose instead of 
pointing to an external definition.
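
For reference, the pattern described above might look like the sketch below. 
The two-byte width, the half-open range [0, k * num_regionservers), and the 
names SALT_WIDTH and random_salted_key are all assumptions made for 
illustration.

```python
import random

SALT_WIDTH = 2  # bytes; an assumed fixed width for the salt prefix


def random_salted_key(row_key: bytes, num_regionservers: int, k: int = 1,
                      rng=random) -> bytes:
    """Prepend a fixed-width random salt drawn from k * num_regionservers values."""
    cardinality = k * num_regionservers
    if not 0 < cardinality <= 256 ** SALT_WIDTH:
        raise ValueError("salt cardinality does not fit in SALT_WIDTH bytes")
    salt = rng.randrange(cardinality)
    return salt.to_bytes(SALT_WIDTH, "big") + row_key


key = random_salted_key(b"foo0001", num_regionservers=5, k=2)
print(key)
```

The same logical row can land under any of the k * num_regionservers 
prefixes, which is what spreads the writes and also what forces readers to 
check every prefix.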






[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-06 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088458#comment-14088458
 ] 

Nick Dimiduk commented on HBASE-11682:
--

From the conversations on user list I've seen, "salt" tends to mean "prepend 
with some fixed-byte-length value, usually a modulo of the number of 
regionservers" -- the same as your "striping". I've also seen lazy people 
prepend with the first N bytes of the hashed rowkey, hence my loose language 
in the previous comment.
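
The two variants mentioned above could be sketched like this. What exactly 
gets reduced modulo the number of regionservers is not spelled out in this 
thread; the sketch assumes it is a hash of the row key. MD5 and the function 
names are illustrative choices, not anything HBase prescribes.

```python
import hashlib


def modulo_salt(row_key: bytes, num_regionservers: int) -> bytes:
    """Prepend one byte: hash(row_key) modulo the number of regionservers."""
    if not 0 < num_regionservers <= 256:
        raise ValueError("a one-byte salt supports at most 256 buckets")
    digest = hashlib.md5(row_key).digest()
    bucket = int.from_bytes(digest, "big") % num_regionservers
    return bytes([bucket]) + row_key


def hash_prefix_salt(row_key: bytes, n: int = 2) -> bytes:
    """Prepend the first n bytes of the hashed row key."""
    return hashlib.md5(row_key).digest()[:n] + row_key


print(modulo_salt(b"foo0001", 4))
print(hash_prefix_salt(b"foo0001"))
```

Both prefixes are deterministic, so either one can be recomputed at read 
time; they differ only in how many distinct prefixes they produce.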



[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-06 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088454#comment-14088454
 ] 

Jonathan Hsieh commented on HBASE-11682:


bq. Even though it's not perfect, I think it best to stick with "salt".

I'm not sure if we are on the same page here -- my main point is that "salt" != 
"hash".   Also, "salt" != "Prepend a hash".  If the term salt causes confusion, 
my suggestion is not to use it.

The other mechanisms mentioned (striping/prepending a rand value, and endian 
inversion) are valid strategies to avoid hotspotting as well.



[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-06 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088435#comment-14088435
 ] 

Nick Dimiduk commented on HBASE-11682:
--

bq. From what phoenix uses and what I think you mean, "bucketing" and 
"binning" are equivalent to prepending a hash.

Yes, that's consistent with my understanding as well. I'd prefer not to 
introduce yet another term to the conversation (or add confusion with striped 
compactions). Even though it's not perfect, I think it best to stick with 
"salt".



[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-06 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087891#comment-14087891
 ] 

Jonathan Hsieh commented on HBASE-11682:


bq. Please note that "salting" as we use the term in HBase is different from 
"salting" in the cryptographic sense. Our usage pattern is more accurately 
described as "bucketing" (I think this is the term Phoenix uses), or "binning" 
[0]. This horse has been beaten to death on the user and dev mailing lists, so 
I won't belabor the point.

I agree they are different but the idea of a random (as opposed to a 
deterministic hash) value prepended is very similar to the crypto "salting".  
Instead of using "salting" how about we use the term "striping"?  What I was 
referring to and described takes one logical row and stripes it across many 
rows so that write throughput can be increased.  The penalty is that we lose 
some consistency, and if we want a "whole" answer we have to read all the rows 
the logical row was striped over.  I've seen the pattern deployed at several 
customers.

From what phoenix uses and what I think you mean, "bucketing" and "binning" 
are equivalent to prepending a hash.



[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-06 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087839#comment-14087839
 ] 

Nick Dimiduk commented on HBASE-11682:
--

bq. However, using totally random row keys for data which is accessed 
sequentially would remove the benefit of HBase's row-sorting algorithm and 
cause very poor performance, as each get or scan would need to query all 
regions.

Prefixing with a random byte prevents any meaningful use of scans; gets become 
your only option. This approach is indistinguishable from hashing the rowkey.

I like the rest of the updates, +1

Thanks a lot [~misty]!



[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-06 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087833#comment-14087833
 ] 

Nick Dimiduk commented on HBASE-11682:
--

Please note that "salting" as we use the term in HBase is different from 
"salting" in the cryptographic sense. Our usage pattern is more accurately 
described as "bucketing" (I think this is the term Phoenix uses), or "binning" 
[0]. This horse has been beaten to death on the user and dev mailing lists, so 
I won't belabor the point.

[0]: http://en.wikipedia.org/wiki/Data_binning



[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-06 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087751#comment-14087751
 ] 

Jonathan Hsieh commented on HBASE-11682:


Nice addition.  Personally, I don't really like the sematext definition of 
salting; it conflates salting [1] with hashing [2], which are two separate 
things.

*salting* adds random data to the start of a rowkey. This means, depending on 
the 'salt factor', you could end up writing to n different row keys (and ideally 
n different regions).  When reading you would generally want to read all n rows 
and coalesce the values.  This is helpful if you have individual hot keys.  It 
is often a bad smell because it is a trick used to try to mitigate having the 
date as the first part of a row key, but it does have valid use cases (ex: rowkey 
is name and you have a handful of individual celebrities - obama, bieber, gaga 
- that need to have their load spread).  This preserves ordering but multiplies 
the number of reads required wrt # of writes.

*hashing* applies a random one-way function to the rowkey such that a 
particular row will always get the same 'random' value prepended.  The original 
row would get mapped to a single row.   This is good for when you have clusters 
of related keys that in aggregate form a hotspot (example: rowkey is name and 
you have way too many joe's, john's, jon's, jonah's, jonathan's, and jonathon's 
all on the same region -- using a hash would spread all the j names around).  
This throws out the ability to effectively take advantage of the row ordering 
properties.

Another trick is to take numeric or fixed length values and store them with the 
least significant digit (i.e. the one that changes the most) first (little 
endian).  This effectively randomizes row key names but also sacrifices row 
ordering properties.
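
A byte-level version of the little-endian trick can be sketched with Python's 
struct module. The 4-byte unsigned encoding and the helper names are 
assumptions chosen for the example.

```python
import struct


def big_endian_key(n: int) -> bytes:
    """Encode n with the most significant byte first (sorts numerically)."""
    return struct.pack(">I", n)


def little_endian_key(n: int) -> bytes:
    """Encode n with the least significant byte first."""
    return struct.pack("<I", n)


ids = [1, 2, 3, 256, 257]

big_order = [struct.unpack(">I", b)[0]
             for b in sorted(big_endian_key(i) for i in ids)]
little_order = [struct.unpack("<I", b)[0]
                for b in sorted(little_endian_key(i) for i in ids)]

print(big_order)     # [1, 2, 3, 256, 257]
print(little_order)  # [256, 1, 257, 2, 3]
```

With the fastest-changing byte in front, consecutive ids differ in their first 
byte, so they sort far apart instead of next to each other.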

[1] http://en.wikipedia.org/wiki/Salt_(cryptography)
[2] http://en.wikipedia.org/wiki/Hash_function



[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087318#comment-14087318
 ] 

Hadoop QA commented on HBASE-11682:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660043/HBASE-11682-1.patch
  against trunk revision .
  ATTACHMENT ID: 12660043

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+  
xlink:href="http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/";
+  
>http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/.
+  
xlink:href="https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables";
+  
>https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables.

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestRegionReplicas
  org.apache.hadoop.hbase.master.TestRestartCluster
  org.apache.hadoop.hbase.client.TestReplicasClient
  
org.apache.hadoop.hbase.master.TestMasterOperationsForRegionReplicas

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10312//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10312//console

This message is automatically generated.



[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087245#comment-14087245
 ] 

Hadoop QA commented on HBASE-11682:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660036/HBASE-11682.patch
  against trunk revision .
  ATTACHMENT ID: 12660036

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestRegionReplicas
  org.apache.hadoop.hbase.master.TestRestartCluster
  org.apache.hadoop.hbase.client.TestReplicasClient
  
org.apache.hadoop.hbase.master.TestMasterOperationsForRegionReplicas

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at org.apache.hadoop.hbase.client.TestHCM.testClusterStatus(TestHCM.java:250)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10309//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10309//console

This message is automatically generated.

> Explain hotspotting
> ---
>
> Key: HBASE-11682
> URL: https://issues.apache.org/jira/browse/HBASE-11682
> Project: HBase
>  Issue Type: Task
>  Components: documentation
>Reporter: Misty Stanley-Jones
>Assignee: Misty Stanley-Jones
> Attachments: HBASE-11682-1.patch, HBASE-11682.patch
>
>






[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-05 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087207#comment-14087207
 ] 

Nick Dimiduk commented on HBASE-11682:
--

bq. HBase also attempts to store rows near each other in the same region, on 
the same region server.

This sentence doesn't help much. A region is a contiguous sequence of rows that 
are physically hosted as a unit. Rows on region boundaries are 
lexicographically near each other but are part of different regions, so there 
are no guarantees about them being hosted on the same region server.
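
To illustrate the boundary point, here is a minimal sketch in plain Python 
(not HBase code). The split points in REGION_START_KEYS and the helper 
region_index are made up for the example; the only assumption carried over 
from HBase is that each region covers a sorted, half-open range of row keys.

```python
from bisect import bisect_right

# Hypothetical region start keys, sorted lexicographically. The first region
# starts at the empty key. These split points are illustrative only.
REGION_START_KEYS = [b"", b"g", b"n", b"t"]


def region_index(row_key: bytes) -> int:
    """Return the index of the region whose [start, next_start) range holds row_key."""
    return bisect_right(REGION_START_KEYS, row_key) - 1


# b"fzzz" and b"g" are neighbours in sort order but fall in different regions,
# so nothing says they are served by the same region server.
print(region_index(b"fzzz"), region_index(b"g"))  # prints: 0 1
```

Which server hosts region 0 and which hosts region 1 is a separate assignment 
decision, which is why "near each other" does not imply "same region server".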

bq. However, poorly designed row keys can lead to 
hotspotting.

This is where schema/rowkey design and access patterns go hand-in-hand.

bq. Hotspotting occurs when nearly all the rows being written to HBase are 
written to the same region, because their row keys are contiguous or very 
similar.

I'd say "Hotspotting occurs when too much client traffic is directed at a 
single region. This can be from reads, writes, or both. The traffic overwhelms 
the single machine responsible for hosting that region, causing performance 
degradation and potentially leading to region unavailability. This can also 
have adverse effects on other regions hosted by the same region server as that 
host is unable to service the requested load."
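
A rough sketch of how that concentration shows up, again in plain Python 
rather than HBase code. The split points, the timestamp-style keys, and the 
helper names are assumptions for illustration; the same counting applies to 
reads and writes alike.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical split points for a table keyed by a timestamp prefix.
REGION_START_KEYS = [b"", b"2014-03", b"2014-06", b"2014-09"]


def region_index(row_key: bytes) -> int:
    """Index of the region whose key range contains row_key."""
    return bisect_right(REGION_START_KEYS, row_key) - 1


def requests_per_region(row_keys) -> Counter:
    """Count how many requests (reads or writes) each region would receive."""
    return Counter(region_index(key) for key in row_keys)


# One request per hour for a single day, all with contiguous keys.
keys = [("2014-08-05T%02d:00" % hour).encode() for hour in range(24)]
load = requests_per_region(keys)
print(load)  # every request lands on region index 2
```

All 24 requests hit one region while the other three stay idle, which is the 
situation described above.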

bq. but in the bigger picture, data is being written to multiple regions across 
the cluster ...

Again, not limited to writes.

bq. One technique is to salt the row keys

Is the term "salt" explained?
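
For readers who have not met the term, one common reading of "salting" is to 
prepend a small bucket prefix to each row key. The sketch below is plain 
Python, not an HBase API: NUM_BUCKETS, salted_key, and scan_prefixes are 
hypothetical names, and deriving the bucket from a hash of the key (rather 
than choosing it at random) is an assumption made so that a point lookup can 
recompute the prefix.

```python
import hashlib

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for the example


def salted_key(row_key: bytes) -> bytes:
    """Prefix row_key with a one-byte bucket id derived from a hash of the key."""
    bucket = hashlib.sha256(row_key).digest()[0] % NUM_BUCKETS
    return bytes([bucket]) + row_key


def scan_prefixes() -> list:
    """Prefixes a range scan must cover, one per bucket."""
    return [bytes([bucket]) for bucket in range(NUM_BUCKETS)]


# Contiguous keys end up under different leading bytes, so they no longer sort
# next to each other.
keys = [("2014-08-05T%02d:00" % hour).encode() for hour in range(24)]
for key in keys[:3]:
    print(salted_key(key))
```

The cost is visible in scan_prefixes: a scan over a contiguous range of the 
original keys now needs one scan per bucket.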

bq. However, using totally random row keys would remove any benefit of HBase's 
row-sorting algorithm and cause very poor performance, as each get or scan 
would need to query all regions.

You're assuming a sequential access pattern here. Random rowkeys can be okay 
for random read access patterns, in that load is spread all over the cluster. 
I've seen other issues around poor blockcache performance from completely 
random access patterns, but that's a slight tangent.

> Explain hotspotting
> ---
>
> Key: HBASE-11682
> URL: https://issues.apache.org/jira/browse/HBASE-11682
> Project: HBase
>  Issue Type: Task
>  Components: documentation
>Reporter: Misty Stanley-Jones
>Assignee: Misty Stanley-Jones
> Attachments: HBASE-11682.patch
>
>



