[jira] [Comment Edited] (FLINK-32780) Release Testing: Verify FLIP-324: Introduce Runtime Filter for Flink Batch Jobs

dalongliu (Jira) Mon, 28 Aug 2023 05:51:06 -0700


    [ https://issues.apache.org/jira/browse/FLINK-32780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759578#comment-17759578 ]


dalongliu edited comment on FLINK-32780 at 8/28/23 12:50 PM:
-------------------------------------------------------------

RuntimeFilter was validated against the TPC-DS benchmark with the
table.optimizer.runtime-filter.enabled option turned on. The following
validations were done separately:
1. Verified the query plans and confirmed that a RuntimeFilter was inserted
into the plans of many queries.
2. Over the whole TPC-DS benchmark, the gain from RuntimeFilter is 5%. The
queries with significant gains are q88, q93 and q95; the gain for the other
queries is limited.
3. q24 and q72 showed performance regressions, especially q72. Comparing the
plan of q72 with the plan produced without RuntimeFilter, we found a dependency
between the upstream and downstream operators that prevents the source nodes
from being executed in parallel, which causes the regression. I don't think we
should insert the RuntimeFilter operator for this pattern, because it does not
reduce the amount of data that the Join operator itself needs to process.

!image-2023-08-28-20-50-26-687.png!
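
The first two checks above can be sketched as follows. This is a minimal illustration, assuming the query plan is available as text (for example from an EXPLAIN statement) and that per-query runtimes were collected once with the option off and once with it on. The function names, the whole-word match on node names, and the definition of overall gain as the relative reduction of total runtime are all hypothetical; the comment does not state how the 5% figure was computed.

```python
import re

# Node names the test plan expects in the plan (assumed to appear verbatim in the plan text).
RUNTIME_FILTER_NODES = ("LocalRuntimeFilterBuilder", "GlobalRuntimeFilterBuilder", "RuntimeFilter")


def missing_runtime_filter_nodes(plan_text):
    """Return the expected node names that do not occur as whole words in plan_text."""
    # \b keeps "RuntimeFilter" from matching inside "LocalRuntimeFilterBuilder".
    return [name for name in RUNTIME_FILTER_NODES
            if not re.search(rf"\b{name}\b", plan_text)]


def summarize_gain(seconds_off, seconds_on, win_threshold=0.10):
    """Compare per-query runtimes without (off) and with (on) the runtime filter.

    Hypothetical definition: overall gain is the relative reduction of the summed
    runtime over the queries present in both runs.
    """
    queries = sorted(seconds_off.keys() & seconds_on.keys())
    total_off = sum(seconds_off[q] for q in queries)
    total_on = sum(seconds_on[q] for q in queries)
    overall_gain = (total_off - total_on) / total_off
    # Positive value = faster with the runtime filter, negative = regression.
    per_query = {q: (seconds_off[q] - seconds_on[q]) / seconds_off[q] for q in queries}
    significant_wins = [q for q in queries if per_query[q] >= win_threshold]
    regressions = [q for q in queries if per_query[q] < 0]
    return overall_gain, significant_wins, regressions
```

Treating any negative per-query gain as a regression mirrors how q24 and q72 are singled out above; a real comparison would need repeated runs to separate noise from genuine slowdowns.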



> Release Testing: Verify FLIP-324: Introduce Runtime Filter for Flink Batch Jobs
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-32780
>                 URL: https://issues.apache.org/jira/browse/FLINK-32780
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Tests
>    Affects Versions: 1.18.0
>            Reporter: Qingsheng Ren
>            Assignee: dalongliu
>            Priority: Major
>             Fix For: 1.18.0
>
>         Attachments: image-2023-08-28-20-50-26-687.png
>
>
> This issue aims to verify FLIP-324: 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-324%3A+Introduce+Runtime+Filter+for+Flink+Batch+Jobs
> The runtime filter can be enabled by setting
> table.optimizer.runtime-filter.enabled: true
> 1. Create two tables, one small (small amount of data) and one large (large
> amount of data), and run a join query on them (such as the example in the
> FLIP doc: SELECT * FROM fact, dim WHERE x = a AND z = 2). The Flink table
> planner should be able to obtain statistics for both tables (for example,
> Hive tables). The data volume of the small table should be less than
> "table.optimizer.runtime-filter.max-build-data-size", and the data volume of
> the large table should be larger than
> "table.optimizer.runtime-filter.min-probe-data-size".
> 2. Show the plan of the join query. The plan should include nodes such as
> LocalRuntimeFilterBuilder, GlobalRuntimeFilterBuilder and RuntimeFilter. We
> can also verify the plans of various variants of the above query.
> 3. Execute the above plan, and:
> * Check whether the data in the large table has been successfully filtered.
> * Verify the execution result: it should be the same as the result of
> executing the plan with the runtime filter disabled.
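
The mechanism this test plan exercises can be illustrated with a small in-memory sketch, assuming a Bloom filter as the filter structure. The build side (the small dim table after the z = 2 predicate) is split into partitions, each partition builds a local filter, the local filters are OR-merged into a global one, and the probe side (the large fact table) drops rows whose join key cannot match before the join runs. This is not Flink's implementation: the class and function names, the filter size, and the strict threshold comparisons (taken from the wording of step 1) are hypothetical. The final assertion performs the check from step 3.

```python
import hashlib
from functools import reduce


class BloomFilter:
    """Minimal Bloom filter: never rejects an inserted key, may accept others."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in an arbitrary-precision int

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

    def merge(self, other):
        # Bitwise OR of two filters with the same size and hash count.
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


def should_inject(build_bytes, probe_bytes, max_build_bytes, min_probe_bytes):
    # Hypothetical check following step 1: small build side, large probe side.
    return build_bytes < max_build_bytes and probe_bytes > min_probe_bytes


def hash_join(probe_rows, build_rows, probe_key, build_key):
    index = {}
    for row in build_rows:
        index.setdefault(row[build_key], []).append(row)
    return [(p, b) for p in probe_rows for b in index.get(p[probe_key], [])]


def join_with_runtime_filter(probe_rows, build_partitions, probe_key, build_key):
    # Stand-in for the local builders: one filter per build-side partition.
    local_filters = []
    for part in build_partitions:
        bf = BloomFilter()
        for row in part:
            bf.add(row[build_key])
        local_filters.append(bf)
    # Stand-in for the global builder: merge all local filters into one.
    global_filter = reduce(BloomFilter.merge, local_filters, BloomFilter())
    # Stand-in for the runtime filter: drop probe rows whose key cannot match.
    filtered = [r for r in probe_rows if global_filter.might_contain(r[probe_key])]
    build_rows = [row for part in build_partitions for row in part]
    return hash_join(filtered, build_rows, probe_key, build_key), len(filtered)


# Usage mirroring: SELECT * FROM fact, dim WHERE x = a AND z = 2
fact = [{"x": i % 100, "y": i} for i in range(1000)]
dim_all = [{"a": k, "z": k % 5} for k in range(100)]
dim = [row for row in dim_all if row["z"] == 2]
partitions = [dim[::2], dim[1::2]]

expected = hash_join(fact, dim, "x", "a")
actual, probe_rows_kept = join_with_runtime_filter(fact, partitions, "x", "a")
assert actual == expected  # same result as with the runtime filter disabled
print(len(fact), "->", probe_rows_kept, "probe rows after filtering")
```

Because a Bloom filter has no false negatives, filtering can only remove probe rows that would not have joined anyway, which is why the result must match the run with the filter disabled; false positives only reduce how much data is filtered, not correctness.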



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
