Peter Ebert created KUDU-2585:
---------------------------------

             Summary: Custom Partitioning Schemes
                 Key: KUDU-2585
                 URL: https://issues.apache.org/jira/browse/KUDU-2585
             Project: Kudu
          Issue Type: New Feature
            Reporter: Peter Ebert


In HBase or HDFS tables you can come up with complex key design or partitioning 
(respectively) and build that logic into your application.  It would be nice to 
have more flexibility with Kudu beyond the range and hash options currently 
provided.

One example where this would help, borrowed from the docs:
CREATE TABLE metrics (
    host STRING NOT NULL,
    metric STRING NOT NULL,
    time INT64 NOT NULL,
    value DOUBLE NOT NULL,
    PRIMARY KEY (host, metric, time)
);
 
Now let's say the hosts stored in Kudu belong to 2 Hadoop clusters, which I 
happen to indicate as part of the hostname: c1dn1.domain.com for cluster1 and 
c2dn1.domain.com for cluster2.  With hash partitioning on host and enough 
distinct host values, I might have to read all partitions to scan a single 
cluster, because its hosts will be distributed across every hash bucket.
 
If instead I could provide a UDF of some sort (or here even a simple substring 
of the first two characters), I could group cluster1 into one or a few 
partition values and skip reading any cluster2 tablets when I do a scan.
 
So instead of hash(host) it would be something like hash(substr(host, 1, 2)). 
Of course, you could get more complex with a UDF: for example, hash the 
remainder of the string and take it mod 10 to distribute the c1 rows across 
10 tablets, and so on.
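
To make the idea concrete, here is a rough Python sketch of what such a 
partition function might compute.  It is only an illustration and not a Kudu 
API: the names custom_partition and partitions_for_cluster are hypothetical, 
and CRC32 is used purely as a deterministic stand-in for whatever hash Kudu 
would actually apply.

```python
import zlib
from typing import List, Tuple

NUM_SUB_BUCKETS = 10  # tablets per cluster prefix, as in the "mod 10" example


def stable_hash(value: str) -> int:
    """Deterministic stand-in hash (CRC32); NOT the hash function Kudu uses."""
    return zlib.crc32(value.encode("utf-8"))


def custom_partition(host: str,
                     num_sub_buckets: int = NUM_SUB_BUCKETS) -> Tuple[str, int]:
    """Hypothetical UDF: map a host name to (cluster prefix, sub-bucket)."""
    prefix = host[:2]     # equivalent of substr(host, 1, 2), e.g. 'c1'
    remainder = host[2:]  # rest of the name spreads rows within the cluster
    return prefix, stable_hash(remainder) % num_sub_buckets


def partitions_for_cluster(prefix: str,
                           num_sub_buckets: int = NUM_SUB_BUCKETS
                           ) -> List[Tuple[str, int]]:
    """Partitions a scan restricted to one cluster prefix would have to read."""
    return [(prefix, bucket) for bucket in range(num_sub_buckets)]


# Example hosts following the naming convention described above.
hosts = [f"c{c}dn{n}.domain.com" for c in (1, 2) for n in range(1, 6)]
for h in hosts:
    print(h, custom_partition(h))
```

With a scheme like this, a scan for cluster1 would only need the partitions 
returned by partitions_for_cluster("c1"), and every c2 partition could be 
pruned, whereas with plain hash(host) the c1 hosts may land in any bucket.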



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
