Balazs Varga created BAHIR-237:
----------------------------------
Summary: Hash and range partitioning support in Flink-Kudu
connector SQL DDL
Key: BAHIR-237
URL: https://issues.apache.org/jira/browse/BAHIR-237
Project: Bahir
Issue Type: Improvement
Components: Flink Streaming Connectors
Reporter: Balazs Varga
The current version of the Flink-Kudu connector's SQL DDL supports only a
limited set of properties. With regard to partitioning, only the
'kudu.hash-columns' option is available, and it does not allow setting the
number of hash partitions. Range partitioning is currently not supported.
Since hash partitioning cannot be altered after the table is created, it
should be possible to set the number of hash buckets for each hash column in
the DDL. A simple way to achieve this is to use additional properties. Here
are some ways I can think of to specify it:
* 'kudu.hash-columns'='col1,col2', 'kudu.hash-buckets'='4,8'
* 'kudu.hash-partitioning'='col1,4;col2,8'
* 'kudu.hash-buckets.col1' = '4', 'kudu.hash-buckets.col2' = '8'
I'd appreciate your input regarding which approach would be the best.
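
To make the comparison more concrete, here is a rough sketch of the parsing each of the three formats would need. It is only an illustration: the class and method names (HashPartitionOptions, fromParallelLists, fromCombined, fromPerColumnKeys) are hypothetical and not part of the connector, and each method simply returns a column-to-bucket-count map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not existing connector code.
public class HashPartitionOptions {

    // Format 1: 'kudu.hash-columns'='col1,col2', 'kudu.hash-buckets'='4,8'
    public static Map<String, Integer> fromParallelLists(String columns, String buckets) {
        String[] cols = columns.split(",");
        String[] counts = buckets.split(",");
        // The two lists are only related by position, so their lengths must match.
        if (cols.length != counts.length) {
            throw new IllegalArgumentException(
                "Expected " + cols.length + " bucket counts but got " + counts.length);
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < cols.length; i++) {
            result.put(cols[i].trim(), Integer.parseInt(counts[i].trim()));
        }
        return result;
    }

    // Format 2: 'kudu.hash-partitioning'='col1,4;col2,8'
    public static Map<String, Integer> fromCombined(String value) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String entry : value.split(";")) {
            String[] parts = entry.split(",");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected 'column,buckets' but got: " + entry);
            }
            result.put(parts[0].trim(), Integer.parseInt(parts[1].trim()));
        }
        return result;
    }

    // Format 3: 'kudu.hash-buckets.col1'='4', 'kudu.hash-buckets.col2'='8'
    public static Map<String, Integer> fromPerColumnKeys(Map<String, String> properties) {
        String prefix = "kudu.hash-buckets.";
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : properties.entrySet()) {
            // Unrelated properties are ignored; the column name is the key suffix.
            if (e.getKey().startsWith(prefix)) {
                String column = e.getKey().substring(prefix.length());
                result.put(column, Integer.parseInt(e.getValue().trim()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fromParallelLists("col1,col2", "4,8"));
        System.out.println(fromCombined("col1,4;col2,8"));

        Map<String, String> props = new LinkedHashMap<>();
        props.put("kudu.hash-buckets.col1", "4");
        props.put("kudu.hash-buckets.col2", "8");
        props.put("some.other.option", "ignored"); // placeholder for unrelated properties
        System.out.println(fromPerColumnKeys(props));
    }
}
```

As the sketch suggests, the first format needs a length check between two separate properties, the second keeps each column next to its bucket count, and the third needs no splitting but takes its column order from however the properties are iterated.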
For range partitioning, I recommend adding a property to set the range
partitioning columns: 'kudu.range-columns'='col1,col2'
If this is correctly set for a table, the partitions themselves can be added
later using ALTER TABLE. Specifying the ranges in the DDL itself would require
a lot of complex parsing logic.
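
By contrast, handling just the column list stays small. The following is a minimal sketch, assuming the connector validates the range columns against the table's primary key columns (Kudu takes partition columns from the primary key). The class name RangeColumnsOption and its parse method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not existing connector code.
public class RangeColumnsOption {

    // Parses a value such as "col1,col2" from 'kudu.range-columns' and checks
    // that every listed column is one of the given primary key columns.
    public static List<String> parse(String value, List<String> primaryKeyColumns) {
        List<String> rangeColumns = new ArrayList<>();
        for (String raw : value.split(",")) {
            String column = raw.trim();
            if (column.isEmpty()) {
                throw new IllegalArgumentException("Empty column name in: " + value);
            }
            if (!primaryKeyColumns.contains(column)) {
                throw new IllegalArgumentException(
                    "Range column is not a primary key column: " + column);
            }
            rangeColumns.add(column);
        }
        return rangeColumns;
    }

    public static void main(String[] args) {
        List<String> primaryKey = Arrays.asList("col1", "col2", "col3");
        System.out.println(parse("col1,col2", primaryKey));
    }
}
```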
--
This message was sent by Atlassian Jira
(v8.3.4#803005)