[ 
https://issues.apache.org/jira/browse/FLINK-25397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhuoYu Chen updated FLINK-25397:
--------------------------------
    Description: 
Support grouped (bucketed) execution: when two tables (orders, orders_item) 
are bucketed on the same field (orderid) with the same number of buckets, a 
join on orderid and the subsequent aggregation can be computed independently 
for each bucket, because rows with the same orderid fall into the bucket with 
the same id in both tables.
This has several advantages:
1. As soon as a bucket has been computed, the memory it occupied can be 
released, so memory consumption can be limited by controlling the number of 
buckets processed in parallel.
2. It greatly reduces shuffling.
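
To illustrate the idea, here is a minimal sketch in plain Python. It is not 
Flink code and does not reflect any planner or runtime API; the helper names 
(bucket_of, bucketize, grouped_execution), the bucket count, the modulo 
bucketing function, and the "amount" column are all assumptions made for the 
example. Both tables are bucketed with the same function and the same number 
of buckets, each pair of buckets is joined and aggregated on its own, and the 
bucket's rows are dropped before the next bucket is processed.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical value; both tables must use the same count


def bucket_of(order_id, num_buckets=NUM_BUCKETS):
    """Map an integer order id to a bucket id (same function for both tables)."""
    return order_id % num_buckets


def bucketize(rows, key, num_buckets=NUM_BUCKETS):
    """Split rows into num_buckets lists according to bucket_of(row[key])."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_of(row[key], num_buckets)].append(row)
    return buckets


def join_and_sum_bucket(orders, items):
    """Inner-join one bucket pair on orderid and sum 'amount' per order."""
    known_orders = {o["orderid"] for o in orders}
    totals = defaultdict(float)
    for item in items:
        if item["orderid"] in known_orders:
            totals[item["orderid"]] += item["amount"]
    return dict(totals)


def grouped_execution(orders, order_items, num_buckets=NUM_BUCKETS):
    """Process bucket pairs one at a time, releasing each pair when done."""
    order_buckets = bucketize(orders, "orderid", num_buckets)
    item_buckets = bucketize(order_items, "orderid", num_buckets)
    result = {}
    for b in range(num_buckets):
        # Bucket b of orders can only match bucket b of order_items,
        # so no rows need to be exchanged between buckets.
        result.update(join_and_sum_bucket(order_buckets[b], item_buckets[b]))
        # Drop the references so this bucket's memory can be reclaimed.
        order_buckets[b] = None
        item_buckets[b] = None
    return result
```

In this sketch the buckets are processed sequentially; processing k buckets 
at a time instead would bound peak memory to roughly the size of k buckets.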

  was:
Performing data bucketing execution: two tables (orders, orders_item), divided 
into buckets (bucketing) based on the same fields (orderid) and the same number 
of buckets. In join by order id, join and aggregation calculations can be 
performed independently, because the same order ids of both tables are divided 
into buckets with the same ids.
This has several advantages. 1. Whenever a bucket of data is computed, the 
memory occupied by this bucket can be released immediately, so memory 
consumption can be limited by controlling the number of buckets processed in 
parallel.
2. reduces a lot of shuffling


> support grouped_execution
> -------------------------
>
>                 Key: FLINK-25397
>                 URL: https://issues.apache.org/jira/browse/FLINK-25397
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Legacy Planner, Table SQL / Planner, Table 
> SQL / Runtime
>    Affects Versions: 1.15.0
>            Reporter: ZhuoYu Chen
>            Priority: Major
>
> Support grouped (bucketed) execution: when two tables (orders, orders_item) 
> are bucketed on the same field (orderid) with the same number of buckets, a 
> join on orderid and the subsequent aggregation can be computed independently 
> for each bucket, because rows with the same orderid fall into the bucket 
> with the same id in both tables.
> This has several advantages:
> 1. As soon as a bucket has been computed, the memory it occupied can be 
> released, so memory consumption can be limited by controlling the number of 
> buckets processed in parallel.
> 2. It greatly reduces shuffling.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
