[
https://issues.apache.org/jira/browse/PHOENIX-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921508#comment-13921508
]
James Taylor edited comment on PHOENIX-111 at 3/5/14 10:15 PM:
---------------------------------------------------------------
bq. For the 2nd case, you mean the actual start/stop keys for the regions will
be byte[] {0} and byte[]{1}?
No, not that one (I'll explain how we deal with that one below). The one I mean
is the byte[]{1} - byte[]{2} case (really every other case except the last
one). See what happens when you split these two: I think all but one range
would be for unprintable characters (assuming the next part of the key is a
VARCHAR or CHAR).
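To make that concrete, here's a minimal Python sketch of linear byte-range interpolation (a rough model of what HBase's Bytes.split() does, not its actual implementation; the exact split points are illustrative):

```python
def split_range(start: bytes, stop: bytes, num_splits: int) -> list[bytes]:
    """Evenly interpolate num_splits keys between start and stop, treating
    each key as a big-endian integer (a rough model of Bytes.split())."""
    width = max(len(start), len(stop)) + 1  # one extra byte of resolution
    lo = int.from_bytes(start.ljust(width, b"\x00"), "big")
    hi = int.from_bytes(stop.ljust(width, b"\x00"), "big")
    step = (hi - lo) // (num_splits + 1)
    return [(lo + step * i).to_bytes(width, "big")
            for i in range(1, num_splits + 1)]

# Splitting the salt-bucket region [0x01, 0x02) into 4 chunks:
points = split_range(b"\x01", b"\x02", 3)
# points are 0x0140, 0x0180, 0x01c0, so every VARCHAR value starting with
# 'A'..'z' (0x41..0x7a) falls into the single chunk [0x0140, 0x0180);
# the other chunks cover only unprintable byte values.
```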
For the first and last range, we have a scheduled scan that calculates the
first key and the last key by scanning the table (the last key uses some Scan
hack to do this).
For all these cases, you're right, you must guarantee that you're covering the
range completely. But this isn't that difficult, b/c the start key should be
the region boundary and the last key should be the next region boundary. It's
just that the first split point will be "farther" away (i.e. after the start
of a possible valid value), and likewise for the last split point.
The trickiest one is DATE, I think. At least with CHAR data, you can guess a
range (sure, there'll be skew), but for a DATE, if you use the full possible
range of dates, you'd go from 1902 to 2038. Most likely the date is +/- 1
year from the current date.
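As a back-of-the-envelope illustration (the "current" date here is hypothetical), the likely data range is a tiny slice of the representable range, so evenly splitting the full range leaves almost every chunk empty:

```python
from datetime import datetime

# Full representable range mentioned above (roughly the 32-bit epoch range).
full_range = datetime(2038, 1, 1) - datetime(1902, 1, 1)
# Data clustered +/- 1 year around a hypothetical "current" date.
data_range = datetime(2015, 1, 1) - datetime(2013, 1, 1)

# ~0.015: with even split points, ~98.5% of chunks would see no data.
fraction = data_range / full_range
```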
bq. Looking at the code, it handles keys of different lengths by tail-padding
the shorter key with 0's. So it should handle keys of different length
correctly.
That's not ideal, as we'd want to do the padding ourselves, with 0's where the
VARCHAR is (not necessarily at the end). For example, with a PK like INTEGER,
VARCHAR, INTEGER, we'd want to pad in the middle.
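A sketch of that middle-padding idea, with an invented helper (not Phoenix code); the fixed-width INTEGER components are assumed to be 4-byte big-endian:

```python
def pad_key(int1: bytes, varchar: bytes, int2: bytes,
            varchar_width: int) -> bytes:
    """Pad the variable-length middle component with 0x00 so that the
    trailing INTEGER stays aligned at a fixed offset."""
    assert len(varchar) <= varchar_width
    return int1 + varchar.ljust(varchar_width, b"\x00") + int2

# Tail-padding the whole key would leave the trailing INTEGER at a
# varying offset; padding the VARCHAR slot keeps both INTEGERs fixed.
key = pad_key(b"\x00\x00\x00\x01", b"ab", b"\x00\x00\x00\x02", 4)
```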
> Improve intra-region parallelization
> ------------------------------------
>
> Key: PHOENIX-111
> URL: https://issues.apache.org/jira/browse/PHOENIX-111
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Assignee: Lars Hofhansl
>
> The manner in which Phoenix parallelizes queries is explained in some detail
> in the Parallelization section here:
> http://phoenix-hbase.blogspot.com/2013/02/phoenix-knobs-dials.html
> It's actually not that important to understand all the details. In the case
> where we try to parallelize within a region, we rely on the HBase
> Bytes.split() method (in DefaultParallelIteratorRegionSplitter) to split,
> based on the start and end key of the region. We basically use that method to
> come up with the start row and stop row of scans that will all run in
> parallel across that region.
> The problem is, we haven't really tested this method, and I have my doubts
> about it, especially when two keys are of different length. The first thing
> that should be done is to write a few simple, independent tests using
> Bytes.split() directly to confirm whether or not there's a problem:
> 1. Write some simple tests to see if Bytes.split() works as expected. Does it
> work for two keys that are of different lengths? If not, we can likely take
> two keys and make them the same length through padding b/c we know the
> structure of the row key. The better we choose the split points to get even
> distribution, the better our parallelization will be.
> 2. One case that I know will be problematic is when a table is salted. In
> that case, we pre-split the table into N regions, where N is the
> SALT_BUCKETS=<N> value. The problem in this case is that the Bytes.split()
> points are going to be terrible, because it's not taking into account the
> possible values of the row key. For example, imagine you have a table like
> this:
> {code}
> CREATE TABLE foo(k VARCHAR PRIMARY KEY) SALT_BUCKETS=4
> {code}
> In this case, we'll pre-split the table and have the following region
> boundaries: 0-1, 1-2, 2-3, 3-4
> What will Bytes.split() produce for these region boundaries? It would chunk
> them up into even byte boundaries, which is not ideal, because the VARCHAR
> values would most likely be ASCII characters in the range 'A' to 'z'. We'd
> be much better off if we took the data types of the row key into account
> when calculating these split points.
> So the second thing to do is make some simple improvements to the start/stop
> key we pass Bytes.split() that take into account the data type of each column
> that makes up the primary key.
> For Phoenix 5.0, we'll collect stats and drive this off of those, but for
> now, there's likely a few simple things we could do to make a big improvement.
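A sketch of the padding idea from point 1 above (illustrative Python, not the actual Bytes.split() code): make two keys the same length before splitting by right-padding the shorter one with 0x00, since we know the row key structure:

```python
def equalize(start: bytes, stop: bytes) -> tuple[bytes, bytes]:
    """Right-pad the shorter key with 0x00 so both keys have the same
    length before split points are computed between them."""
    width = max(len(start), len(stop))
    return start.ljust(width, b"\x00"), stop.ljust(width, b"\x00")

# Keys of different lengths become directly comparable at one width.
padded = equalize(b"\x01", b"\x01\x80")
```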
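And a sketch of the type-aware improvement suggested in point 2, assuming the leading VARCHAR bytes fall in 'A'..'z' (both the range and the helper name are assumptions for illustration):

```python
def varchar_split_points(salt_byte: int, num_splits: int) -> list[bytes]:
    """Place split points inside the plausible value range of the first
    VARCHAR byte ('A'..'z') rather than over all 256 byte values."""
    lo, hi = ord("A"), ord("z") + 1
    step = (hi - lo) / (num_splits + 1)
    return [bytes([salt_byte, int(lo + step * i)])
            for i in range(1, num_splits + 1)]

# For salt bucket 0, the split points now land on printable characters,
# so each chunk covers a comparable slice of the likely data.
points = varchar_split_points(0, 3)
```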
--
This message was sent by Atlassian JIRA
(v6.2#6252)