[jira] [Commented] (CALCITE-5871) Data distributions need to be combined and represented.

LakeShen (Jira) Tue, 25 Jul 2023 18:58:09 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747228#comment-17747228
 ]


LakeShen commented on CALCITE-5871:
-----------------------------------

Hi [~grandfisher] ,can you explain the problem you described through an SQL 
example? I think it will help people understand the problem better

> Data distributions need to be combined and represented.
> -------------------------------------------------------
>
>                 Key: CALCITE-5871
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5871
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: grandfisher
>            Priority: Major
>
> For a distributed partition database, the data may be partitioned by time, 
> and also hash partitioned by the `region` field.
> If there is agg that  aggregate on "(Day,Region)", It's hard to show AGG rel 
> distribution.（range(Day) hash(region))
> And for another hash shuffle join case  `( L join R  on L.a=R.c and L.b =R.d  
> ) as T` , now  T has satisfy two distributions, one is Hash(a,b) and  another 
> is Hash(c,d),  it's not Hash(a,b,c,d). But we must lost one of them because 
> the Reldistribution can  only  has one distribution.
> We think this is common in time-series distributed databases



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5871) Data distributions need to be combined and represented.

Reply via email to