[ 
https://issues.apache.org/jira/browse/PHOENIX-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921708#comment-13921708
 ] 

James Taylor edited comment on PHOENIX-111 at 3/6/14 12:26 AM:
---------------------------------------------------------------

Seems I'm confusing JIRA with my curly braces! :-)

+1 to your idea, but my vote for staging would be:

1) Fix the case for Bytes.split on byte[]s of length one. I didn't realize 
that was broken - fixing it will already help a lot for the salted case. For 
salted tables, we pre-split using this method: SchemaUtil.processSplits(), 
which calls SaltingUtil.getSalteByteSplitPoints(). Perhaps we can create better 
splits there, based on our metadata from the pkColumns?
2) Given two byte[] split points, get the padding right based on the data 
types (see the sketch after this list). We have the metadata in our 
PTable.getPKColumns(). We also have a class named RowKeySchema that lets you 
iterate through a byte[] that conforms to the schema (PTable.getRowKeySchema()).
3) Do the seekBefore trick - currently we only do that for the last region. We 
cache in StatsManagerImpl, but it's kind of a hack - a client-side-only cache. 
We plan to replace this with a server-side stats gatherer (we even have the 
structure in our PTable), but haven't gotten to it yet - next release. So I'm 
not sure we should go very far down this path yet. 
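
Here's a rough sketch of what I mean by 1) and 2) - not the actual 
SchemaUtil/SaltingUtil code path, just Bytes.split called directly on the 
one-byte salt boundaries versus boundaries padded out based on the PK metadata 
(exact behavior on the raw one-byte keys may vary by HBase version):

{code}
import org.apache.hadoop.hbase.util.Bytes;

public class SaltSplitSketch {
    public static void main(String[] args) {
        // Raw one-byte salt boundaries (bucket 0 to bucket 1): the numeric
        // range is only 1 wide, so Bytes.split has no room for intermediate
        // points - depending on the HBase version it may simply return null.
        byte[][] raw = Bytes.split(new byte[] { 0 }, new byte[] { 1 }, 3);
        System.out.println(raw == null ? "raw: null" : "raw: " + raw.length + " keys");

        // Boundaries padded out with filler bytes (which is what knowing the
        // PK column types would let us do): the range is now wide enough for
        // evenly spaced split points within the salt bucket.
        byte[] lower = new byte[] { 0, 0x00, 0x00, 0x00 };
        byte[] upper = new byte[] { 0, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF };
        for (byte[] split : Bytes.split(lower, upper, 3)) {
            System.out.println(Bytes.toStringBinary(split));
        }
    }
}
{code}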



> Improve intra-region parallelization
> ------------------------------------
>
>                 Key: PHOENIX-111
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-111
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: Lars Hofhansl
>
> The manner in which Phoenix parallelizes queries is explained in some detail 
> in the  Parallelization section here: 
> http://phoenix-hbase.blogspot.com/2013/02/phoenix-knobs-dials.html
> It's actually not that important to understand all the details. In the case 
> where we try to parallelize within a region, we rely on the HBase 
> Bytes.split() method (in DefaultParallelIteratorRegionSplitter) to split, 
> based on the start and end key of the region. We basically use that method to 
> come up with the start row and stop row of scans that will all run in 
> parallel across that region.
> The problem is, we haven't really tested this method, and I have my doubts 
> about it, especially when the two keys are of different lengths. The first 
> thing that should be done is to write a few simple, independent tests using 
> Bytes.split() directly to confirm whether or not there's a problem:
> 1. Write some simple tests to see if Bytes.split() works as expected (see the 
> sketch below). Does it work for two keys that are of different lengths? If 
> not, we can likely take two keys and make them the same length through 
> padding, because we know the structure of the row key. The better we choose 
> the split points to get an even distribution, the better our parallelization 
> will be.
> 2. One case that I know will be problematic is when a table is salted. In 
> that case, we pre-split the table into N regions, where N is the 
> SALT_BUCKETS=<N> value. The problem in this case is that the Bytes.split() 
> points are going to be terrible, because it's not taking into account the 
> possible values of the row key. For example, imagine you have a table like 
> this:
> {code}
> CREATE TABLE foo(k VARCHAR PRIMARY KEY) SALT_BUCKETS=4
> {code}
> In this case, we'll pre-split the table and have the following region 
> boundaries: 0-1, 1-2, 2-3, 3-4
> What will Bytes.split() produce for these region boundaries? It would chunk 
> them up into even byte boundaries, which is not ideal because the VARCHAR 
> values would most likely be ASCII characters in the range 'A' to 'z'. We'd be 
> much better off if we took the data types of the row key into account when we 
> calculate these split points.
> So the second thing to do is make some simple improvements to the start/stop 
> key we pass Bytes.split() that take into account the data type of each column 
> that makes up the primary key.
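> A minimal sketch of the kind of test described in 1, plus the data-type-aware 
> boundaries just described - assuming only HBase's Bytes.split(byte[], byte[], int) 
> and Bytes.toStringBinary(); the class name and key values are just illustrative:
> {code}
> import org.apache.hadoop.hbase.util.Bytes;
> 
> public class BytesSplitSanityCheck {
>     private static void show(String label, byte[] start, byte[] end, int numSplits) {
>         byte[][] splits = Bytes.split(start, end, numSplits);
>         System.out.println(label);
>         if (splits == null) {
>             System.out.println("  (null - could not produce " + numSplits + " splits)");
>             return;
>         }
>         for (byte[] key : splits) {
>             System.out.println("  " + Bytes.toStringBinary(key));
>         }
>     }
> 
>     public static void main(String[] args) {
>         // 1. Keys of different lengths: does Bytes.split pad sensibly?
>         show("different lengths", Bytes.toBytes("a"), Bytes.toBytes("azzz"), 3);
> 
>         // 2. Salted-table region boundary (salt byte 0 to salt byte 1): the
>         //    raw one-byte range gives Bytes.split nothing to work with.
>         show("salt bucket boundary", new byte[] { 0 }, new byte[] { 1 }, 3);
> 
>         // Data-type-aware boundaries for the same bucket, assuming the
>         // VARCHAR values fall roughly between 'A' and 'z'.
>         show("ASCII-aware boundary", new byte[] { 0, 'A' }, new byte[] { 0, 'z' }, 3);
>     }
> }
> {code}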
> For Phoenix 5.0, we'll collect stats and drive this off of those, but for 
> now, there's likely a few simple things we could do to make a big improvement.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
