[ 
https://issues.apache.org/jira/browse/KUDU-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534442#comment-17534442
 ] 

ASF subversion and git services commented on KUDU-2671:
-------------------------------------------------------

Commit 11db3f28b36d92ce1515bcaace51a3586838abcb in kudu's branch 
refs/heads/master from Abhishek Chennaka
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=11db3f28b ]

[master] KUDU-2671: Range specific hashing during table alter op.

This commit updates the proto file, master.proto, with the
changes needed to support custom hash partitioning when
adding a new range to an existing table.
AlterTableRequestPB gains a new field to accommodate this
change.
This patch is pushed with only the proto change in order to
unblock client-side work while the server-side work is in
progress.

Change-Id: Ifec5566ea8f3e49d00dcb6964b3d17c4be0504eb
Reviewed-on: http://gerrit.cloudera.org:8080/18485
Reviewed-by: Alexey Serbin <ale...@apache.org>
Tested-by: Kudu Jenkins


> Change hash number for range partitioning
> -----------------------------------------
>
>                 Key: KUDU-2671
>                 URL: https://issues.apache.org/jira/browse/KUDU-2671
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, java, master, server
>    Affects Versions: 1.8.0
>            Reporter: yangz
>            Assignee: Mahesh Reddy
>            Priority: Major
>              Labels: feature, roadmap-candidate, scalability
>         Attachments: 屏幕快照 2019-01-24 下午12.03.41.png
>
>
> For our use case, the Kudu schema design isn't flexible enough.
> We create our tables with one range per day, such as dt='20181112', as in a 
> Hive table. But our data size varies a lot from day to day: one day it may 
> be 50G, and another day 500G. This makes it hard to choose a hash schema. 
> If the number of hash buckets is too large, it is wasteful in most cases. 
> If it is too small, performance suffers when the amount of data is large.
>  
> So we propose a solution in which the hash number can be changed based on 
> the historical data of a table.
> For example:
>  # We create the schema with an estimated value.
>  # We collect the data size for each day range.
>  # We create each new day range partition based on the collected daily sizes.
> We have used this feature for half a year, and it works well. We hope this 
> feature will be useful for the community. The solution may not be complete 
> yet. Please help us make it better.
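
The three steps in the description above can be sketched roughly as follows.
This is only an illustration, not the reporter's implementation and not Kudu
client code. The function names, the 10 GiB per-bucket target, the bucket
limits, and the 7-day lookback are all assumptions made for the example.

```python
import math

GIB = 1024 ** 3

# Assumed tuning knobs; none of these values come from the issue itself.
TARGET_BYTES_PER_BUCKET = 10 * GIB  # desired amount of data per hash bucket
MIN_BUCKETS = 2                     # assumed lower bound for a hash schema
MAX_BUCKETS = 64                    # assumed upper bound to avoid too many tablets


def buckets_for_size(estimated_bytes, target=TARGET_BYTES_PER_BUCKET):
    """Map an estimated data size to a hash bucket count within the limits."""
    raw = math.ceil(estimated_bytes / target)
    return max(MIN_BUCKETS, min(MAX_BUCKETS, raw))


def estimate_next_day_bytes(history, lookback_days=7):
    """Step 2: use the largest size seen in the most recent days.

    `history` maps a day string such as '20181112' to the observed bytes.
    Returns None when no history has been collected yet.
    """
    if not history:
        return None
    recent_days = sorted(history)[-lookback_days:]
    return max(history[day] for day in recent_days)


def plan_new_range(day, history, default_buckets=4):
    """Steps 1 and 3: fall back to an estimate, else size from history."""
    estimate = estimate_next_day_bytes(history)
    if estimate is None:
        buckets = default_buckets
    else:
        buckets = buckets_for_size(estimate)
    return {"range": day, "num_hash_buckets": buckets}
```

With the sizes mentioned in the description, a 50G day and a 500G day end up
with different bucket counts under these assumed settings, which is the
flexibility the issue asks for when a new range is added to an existing table.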



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
