[ 
https://issues.apache.org/jira/browse/KUDU-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574900#comment-17574900
 ] 

ASF subversion and git services commented on KUDU-2671:
-------------------------------------------------------

Commit 8df970f7a6520bb0dc0f9cc89ad7f62ab349e84d in kudu's branch 
refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=8df970f7a ]

KUDU-2671 fix updating partition end keys for unbounded ranges

This patch fixes an issue with updating the hash bucket components of
partitions' end keys in the case of unbounded ranges.  I also added new
test scenarios that exposed the issue; they fail without the fix.  In
addition, I updated a few other test scenarios to match the current logic
of updating the hash partition component of a partition's end key.

The previous implementation of updating the hash components tried to
avoid holes in the partition keyspace by carrying over the increment to
the next hash dimension index, but with the introduction of range-specific
hash schemas it's no longer possible to do so while keeping the resulting
key ranges disjoint.

For example, consider the following partitioning:
  [-inf, 0) x 3 buckets x 3 buckets; [0, +inf) x 2 buckets x 2 buckets

The original set of ranges looks like the following:
  [ (0, 0,  ""), (0, 0, "0") )
  [ (0, 1,  ""), (0, 1, "0") )
  [ (0, 2,  ""), (0, 2, "0") )
  [ (1, 0,  ""), (1, 0, "0") )
  [ (1, 1,  ""), (1, 1, "0") )
  [ (1, 2,  ""), (1, 2, "0") )
  [ (2, 0,  ""), (2, 0, "0") )
  [ (2, 1,  ""), (2, 1, "0") )
  [ (2, 2,  ""), (2, 2, "0") )

  [ (0, 0, "0"), (0, 0,  "") )
  [ (0, 1, "0"), (0, 1,  "") )
  [ (1, 0, "0"), (1, 0,  "") )
  [ (1, 1, "0"), (1, 1,  "") )
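
As a rough illustration, the listing above can be reproduced with a small
Python sketch. It models each key as a plain tuple of (hash bucket, hash
bucket, range key), with "" standing for an unbounded range component. The
function name and the tuple model are assumptions made for this example;
real Kudu partition keys are encoded byte strings.

```python
from itertools import product

def partition_key_ranges(lower, upper, buckets_per_dim):
    """List (start, end) key tuples for one range with its own hash schema.

    Keys are modeled as (hash bucket, ..., range key) tuples; "" stands for
    an unbounded range component, as in the listing above.
    """
    ranges = []
    # One partition per combination of hash buckets across all dimensions.
    for buckets in product(*(range(n) for n in buckets_per_dim)):
        ranges.append((buckets + (lower,), buckets + (upper,)))
    return ranges

neg = partition_key_ranges("", "0", (3, 3))  # [-inf, 0) x 3 buckets x 3 buckets
pos = partition_key_ranges("0", "", (2, 2))  # [0, +inf) x 2 buckets x 2 buckets

for start, end in neg + pos:
    print(f"[ {start}, {end} )")
```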

The previous implementation would transform the key range
[ (0, 1, "0"), (0, 1,  "") ) into [ (0, 1, "0"), (1, 0,  "") ), but
that would put the key range [ (0, 2,  ""), (0, 2, "0") ) inside the
transformed one. That would mess up the interval logic of the partition
pruning and the client's metacache.
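
The containment described above can be checked with a minimal sketch that
compares keys as lexicographically ordered tuples. The tuple model and the
contains() helper are assumptions for illustration only, not Kudu's actual
key encoding or API.

```python
def contains(outer, inner):
    """True if the half-open range `inner` lies entirely within `outer`."""
    (outer_start, outer_end), (inner_start, inner_end) = outer, inner
    return outer_start <= inner_start and inner_end <= outer_end

# Key range after the carry-over performed by the previous implementation.
transformed = ((0, 1, "0"), (1, 0, ""))
# A key range that belongs to the [-inf, 0) x 3 buckets x 3 buckets part.
other = ((0, 2, ""), (0, 2, "0"))

# Python compares tuples element by element, so (0, 2, ...) sorts between
# (0, 1, ...) and (1, 0, ...).
print(contains(transformed, other))
```

Under this model the check reports that the second range sits inside the
transformed one, so the two ranges are no longer disjoint.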

As it turns out, the continuous keyspace was just a nice thing to have
since the partition pruning and the client metacache work fine with
sparse keyspaces as well.
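
To illustrate why gaps are harmless for interval lookups in general, here
is a minimal sketch of a lookup over sorted, disjoint, half-open ranges.
It is not the code of the partition pruner or the client metacache; the
keys and the find_range() helper are hypothetical.

```python
from bisect import bisect_right

# Sorted, disjoint half-open ranges with a gap between (0, 2) and (1, 0).
ranges = [((0, 0), (0, 1)), ((0, 1), (0, 2)), ((1, 0), (1, 1))]
starts = [start for start, _ in ranges]

def find_range(key):
    """Return the range covering `key`, or None if `key` falls into a gap."""
    # Locate the last range whose start key is <= key.
    i = bisect_right(starts, key) - 1
    if i >= 0 and ranges[i][0] <= key < ranges[i][1]:
        return ranges[i]
    return None
```

A key inside a gap simply maps to no range, while the lookup stays
unambiguous as long as the ranges themselves do not overlap.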

Change-Id: I3775f914a3b7cdc294a26911663c2d848f4e491b
Reviewed-on: http://gerrit.cloudera.org:8080/18808
Tested-by: Kudu Jenkins
Reviewed-by: Mahesh Reddy <mre...@cloudera.com>
Reviewed-by: Abhishek Chennaka <achenn...@cloudera.com>
Reviewed-by: Attila Bukor <abu...@apache.org>


> Change hash number for range partitioning
> -----------------------------------------
>
>                 Key: KUDU-2671
>                 URL: https://issues.apache.org/jira/browse/KUDU-2671
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, java, master, server
>    Affects Versions: 1.8.0
>            Reporter: yangz
>            Assignee: Mahesh Reddy
>            Priority: Major
>              Labels: feature, roadmap-candidate, scalability
>         Attachments: 屏幕快照 2019-01-24 下午12.03.41.png
>
>
> For our use case, Kudu's schema design isn't flexible enough.
> We create our tables with day ranges such as dt='20181112', as in a Hive table.
> But our data size changes a lot from day to day: one day it may be 50G, and 
> another day 500G. This makes it hard to choose the hash schema. If the number 
> of hash buckets is too big, it is wasteful in most cases. If it is too small, 
> performance suffers on days with a large amount of data.
>  
> So we suggest a solution in which the hash number can be changed based on the 
> historical data of a table.
> For example:
>  # We create the schema with one estimated value.
>  # We collect the data size per day range.
>  # We create each new day range partition based on the collected day sizes.
> We have used this feature for half a year, and it works well. We hope this 
> feature will be useful for the community. The solution may not be complete 
> yet. Please help us make it better.



