[ https://issues.apache.org/jira/browse/CALCITE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747290#comment-17747290 ]
grandfisher edited comment on CALCITE-5871 at 7/26/23 6:07 AM:
---------------------------------------------------------------

OK, I have found that *RelCompositeTrait* can handle data that satisfies more than one distribution trait. However, it still confuses us for time-series distributed databases.

For example, in some databases such as Doris and ES, the data has both a Partition and a Distribution. Suppose there are two days of data, 2023-01-01 and 2023-01-02. Each day's data has five buckets, and each day's rows go into the corresponding bucket according to a certain hash key.

If such a table is queried, how should the data be considered distributed? The data as a whole satisfies {*}RANGE_DISTRIBUTION{*}, and the data within each *Partition* satisfies {*}HASH_DISTRIBUTION{*}. But we don't think this can be expressed with {*}RelCompositeTrait{*}.

> Data distributions need to be combined and represented.
> -------------------------------------------------------
>
>                 Key: CALCITE-5871
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5871
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: grandfisher
>            Priority: Major
>
> For a distributed partitioned database, the data may be partitioned by time
> and also hash-partitioned by the `region` field.
> If there is an aggregate on "(Day,Region)", it is hard to express the
> distribution of the AGG rel. (range(Day) hash(region))
> Another case is a hash shuffle join,
> `( L join R on L.a=R.c and L.b =R.d ) as T`. Here T satisfies two
> distributions, one is Hash(a,b) and the other is Hash(c,d); it is not
> Hash(a,b,c,d). But we must lose one of them, because a RelDistribution can
> hold only one distribution.
> We think this is common in time-series distributed databases.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
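
To make the two-level layout from the comment concrete, here is a minimal, self-contained Java sketch of how a row could be placed: first by day (the range partition), then by a hash of the key within that day (the bucket). Everything in it is hypothetical and for illustration only: `TwoLevelPlacement`, `partitionOf`, `bucketOf` and `locate` are not Calcite, Doris or ES APIs, and using `String.hashCode()` as the bucket hash is an assumption.

```java
import java.time.LocalDate;

/** Hypothetical sketch of range-then-hash placement; not a Calcite API. */
final class TwoLevelPlacement {
    private TwoLevelPlacement() {}

    /** Outer level: range partitioning, one partition per calendar day. */
    static String partitionOf(LocalDate day) {
        return "p" + day; // LocalDate prints in ISO form, e.g. p2023-01-01
    }

    /** Inner level: hash bucketing on the key inside a single partition. */
    static int bucketOf(String hashKey, int bucketsPerPartition) {
        // floorMod keeps the result in [0, bucketsPerPartition) even for
        // negative hash codes.
        return Math.floorMod(hashKey.hashCode(), bucketsPerPartition);
    }

    /** Full location of a row: partition first, then bucket within it. */
    static String locate(LocalDate day, String hashKey, int bucketsPerPartition) {
        return partitionOf(day) + "/b" + bucketOf(hashKey, bucketsPerPartition);
    }

    public static void main(String[] args) {
        int buckets = 5; // five buckets per day, as in the example above
        LocalDate day1 = LocalDate.of(2023, 1, 1);
        LocalDate day2 = LocalDate.of(2023, 1, 2);

        // Same hash key, different days: same bucket number, different partition.
        System.out.println(locate(day1, "east", buckets));
        System.out.println(locate(day2, "east", buckets));
        // Same day, different hash key: same partition, bucket chosen by hash.
        System.out.println(locate(day1, "west", buckets));
    }
}
```

In this sketch, rows with the same hash key but different days get the same bucket number yet live in different partitions, so the table as a whole is neither purely range-distributed nor purely hash-distributed. That nesting (hash inside range) is the relationship the comment says is hard to express as a flat set of alternative traits.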