[ https://issues.apache.org/jira/browse/FLINK-34442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeyhun Karimov updated FLINK-34442:
-----------------------------------
    Description: 
There are some use cases in which data sources are already pre-partitioned:

- A Kafka topic is already partitioned with respect to some key(s)
- Multiple Flink jobs materialize their outputs and subsequently read them back
as input

The main benefit of exploiting this pre-partitioning is that unnecessary
shuffling can be avoided. The DataStream API already has an experimental
feature that covers a subset of these cases [1], sketched below.
We should support this for Flink Table/SQL as well.

[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/experimental/
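
For context, the experimental DataStream feature referenced in [1] is
DataStreamUtils.reinterpretAsKeyedStream, which declares that a stream is
already partitioned exactly as keyBy with the given key selector would
partition it, so no network shuffle is inserted. A minimal, illustrative
sketch (the inline source and identity key selector are placeholders only):

{code:java}
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamUtils;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ReinterpretAsKeyedStreamSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder source; in a real job the data would come from a source that is
        // already partitioned exactly as Flink's keyBy would partition it
        // (w.r.t. key-group assignment), otherwise results are incorrect.
        DataStream<String> prePartitioned = env.fromElements("a", "b", "c");

        // Re-interpret the stream as keyed without inserting a shuffle.
        KeyedStream<String, String> keyed =
                DataStreamUtils.reinterpretAsKeyedStream(
                        prePartitioned,
                        new KeySelector<String, String>() {
                            @Override
                            public String getKey(String value) {
                                return value;
                            }
                        });

        keyed.print();
        env.execute("reinterpretAsKeyedStream sketch");
    }
}
{code}

The ask here is an analogous capability (and the corresponding planner
optimizations) for Table/SQL sources, so that the unnecessary exchange can be
avoided there as well.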

> Support optimizations for pre-partitioned [external] data sources
> -----------------------------------------------------------------
>
>                 Key: FLINK-34442
>                 URL: https://issues.apache.org/jira/browse/FLINK-34442
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / API, Table SQL / Planner
>    Affects Versions: 1.18.1
>            Reporter: Jeyhun Karimov
>            Priority: Major
>
> There are some use cases in which data sources are already pre-partitioned:
> - A Kafka topic is already partitioned with respect to some key(s)
> - Multiple Flink jobs materialize their outputs and subsequently read them
> back as input
> The main benefit of exploiting this pre-partitioning is that unnecessary
> shuffling can be avoided. The DataStream API already has an experimental
> feature that covers a subset of these cases [1].
> We should support this for Flink Table/SQL as well.
> [1] 
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/experimental/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
