[ 
https://issues.apache.org/jira/browse/FLINK-31240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin updated FLINK-31240:
-----------------------------
    Fix Version/s: 1.18.0

> Reduce the overhead of conversion between DataStream and Table
> --------------------------------------------------------------
>
>                 Key: FLINK-31240
>                 URL: https://issues.apache.org/jira/browse/FLINK-31240
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / API
>            Reporter: Jiang Xin
>            Assignee: Yunfeng Zhou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.18.0
>
>
> In some cases, users may need to convert the underlying DataStream to Table 
> and then convert it back to DataStream(e.g. some Flink ML libraries accept a 
> Table as input and convert it to DataStream for calculation.). This would 
> cause unnecessary overhead because of data conversion between the internal 
> data type and the external data type.
> We can reduce the overhead by checking if there are paired 
> `fromDataStream`/`toDataStream` function call without any transformation, if 
> so using the source datastream directly.
>  The performance of Flink ML's Bucketizer algorithm[1] is used to demonstrate 
> the impact of this optimization. The execution time is obtained by taking the 
> median execution time across 5 runs for each setup.
> Before optimization: 40746ms
> After optimization: 12972ms
> Thus this optimization reduces the total execution time of Flink ML's 
> Bucketizer algorithm to about 1/3.
> [1] 
> https://github.com/apache/flink-ml/blob/master/flink-ml-benchmark/src/main/resources/bucketizer-benchmark.json



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to