Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]
jeyhunkarimov commented on code in PR #24600: URL: https://github.com/apache/flink/pull/24600#discussion_r1601116454

## flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/utils/DynamicPartitionPruningUtils.java:

```diff
@@ -236,6 +238,9 @@ private void setTables(ContextResolvedTable catalogTable) {
             tables.add(catalogTable);
         } else {
             for (ContextResolvedTable thisTable : new ArrayList<>(tables)) {
+                if (tables.contains(catalogTable)) {
```

Review Comment:
Hi @mumuhhh, thanks for the ping and your suggestion! I think you are right. Now that I look in more detail, in addition to your suggestion, I think we can also remove the first `if` check in the method. I filed the patch: https://github.com/apache/flink/pull/24788

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]
mumuhhh commented on code in PR #24600: URL: https://github.com/apache/flink/pull/24600#discussion_r1600843684

## flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/utils/DynamicPartitionPruningUtils.java:

```diff
@@ -236,6 +238,9 @@ private void setTables(ContextResolvedTable catalogTable) {
             tables.add(catalogTable);
         } else {
             for (ContextResolvedTable thisTable : new ArrayList<>(tables)) {
+                if (tables.contains(catalogTable)) {
```

Review Comment:
> I think we can use a boolean flag to check here, then we don't need to call `contains` method every time, it is O(N) time complexity.
>
> ```
> boolean hasAdded = false;
> for (ContextResolvedTable thisTable : new ArrayList<>(tables)) {
>     if (hasAdded) {
>         break;
>     }
>     if (!thisTable.getIdentifier().equals(catalogTable.getIdentifier())) {
>         tables.add(catalogTable);
>         hasAdded = true;
>     }
> }
> ```

I think we should modify the traversal logic:

```
boolean hasAdded = false;
for (ContextResolvedTable thisTable : new ArrayList<>(tables)) {
    if (thisTable.getIdentifier().equals(catalogTable.getIdentifier())) {
        hasAdded = true;
        break;
    }
}
if (!hasAdded) {
    tables.add(catalogTable);
}
```
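The corrected traversal above can also be written more compactly. The following is a hedged sketch in plain Java, not Flink code: `String` identifiers stand in for `ContextResolvedTable.getIdentifier()`, and the class and method names are hypothetical. It shows the loop-with-break form next to an equivalent `Stream.noneMatch` form; both add an entry only when no existing entry has the same identifier.

```java
import java.util.ArrayList;
import java.util.List;

public class TraversalSketch {
    // Loop form, as proposed in the review comment above
    // (String ids stand in for ContextResolvedTable identifiers).
    static void addIfAbsentLoop(List<String> tables, String id) {
        boolean hasAdded = false;
        for (String thisId : tables) {
            if (thisId.equals(id)) {
                hasAdded = true;
                break;
            }
        }
        if (!hasAdded) {
            tables.add(id);
        }
    }

    // Equivalent compact form using java.util.stream.
    static void addIfAbsentStream(List<String> tables, String id) {
        if (tables.stream().noneMatch(t -> t.equals(id))) {
            tables.add(id);
        }
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>();
        List<String> b = new ArrayList<>();
        for (String id : new String[] {"t0", "t1", "t0", "t1"}) {
            addIfAbsentLoop(a, id);
            addIfAbsentStream(b, id);
        }
        // Both lists contain each identifier exactly once.
        System.out.println(a); // [t0, t1]
        System.out.println(b); // [t0, t1]
    }
}
```

Both variants are O(N) per call; the `HashSet` change discussed elsewhere in this thread is what brings the membership check down to amortized O(1), provided the element type has consistent `equals`/`hashCode`.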
Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]
mumuhhh commented on code in PR #24600: URL: https://github.com/apache/flink/pull/24600#discussion_r1600841072

## flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/utils/DynamicPartitionPruningUtils.java:

```diff
@@ -115,7 +117,7 @@ private static class DppDimSideChecker {
     private final RelNode relNode;
     private boolean hasFilter;
     private boolean hasPartitionedScan;
-    private final List<ContextResolvedTable> tables = new ArrayList<>();
+    private final Set<ContextResolvedTable> tables = new HashSet<>();
```

Review Comment:
Why don't we write the traversal comparison like this?

```
boolean hasAdded = false;
for (ContextResolvedTable thisTable : new ArrayList<>(tables)) {
    if (thisTable.getIdentifier().equals(catalogTable.getIdentifier())) {
        hasAdded = true;
        break;
    }
}
if (!hasAdded) {
    tables.add(catalogTable);
}
```
Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]
lsyldliu merged PR #24600: URL: https://github.com/apache/flink/pull/24600
Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]
lsyldliu commented on code in PR #24600: URL: https://github.com/apache/flink/pull/24600#discussion_r1581993829

## flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/utils/DynamicPartitionPruningUtils.java:

```diff
@@ -236,6 +238,9 @@ private void setTables(ContextResolvedTable catalogTable) {
             tables.add(catalogTable);
         } else {
             for (ContextResolvedTable thisTable : new ArrayList<>(tables)) {
+                if (tables.contains(catalogTable)) {
```

Review Comment:
I think we can use a boolean flag to check here, then we don't need to call the `contains` method every time, which is O(N) time complexity.

```
boolean hasAdded = false;
for (ContextResolvedTable thisTable : new ArrayList<>(tables)) {
    if (hasAdded) {
        break;
    }
    if (!thisTable.getIdentifier().equals(catalogTable.getIdentifier())) {
        tables.add(catalogTable);
        hasAdded = true;
    }
}
```
Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]
jeyhunkarimov commented on code in PR #24600: URL: https://github.com/apache/flink/pull/24600#discussion_r1581891157

## flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/optimize/program/DynamicPartitionPruningProgramTest.java:

```diff
@@ -81,6 +87,42 @@ void setup() {
                 + ")");
     }

+    @Test
+    void testLargeQueryPlanShouldNotOutOfMemory() {
+        // TABLE_OPTIMIZER_DYNAMIC_FILTERING_ENABLED is already enabled
+        List<String> strings = new ArrayList<>();
```

Review Comment:
Nice catch, I just copy-pasted the code from the Jira issue. Fixed.
Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]
jeyhunkarimov commented on PR #24600: URL: https://github.com/apache/flink/pull/24600#issuecomment-2081151799

Hi @lsyldliu, thanks for the review. I addressed your comments. Could you please take a look when you have time? Thanks!
Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]
lsyldliu commented on code in PR #24600: URL: https://github.com/apache/flink/pull/24600#discussion_r1579386652

## flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/utils/DynamicPartitionPruningUtils.java:

```diff
@@ -115,7 +117,7 @@ private static class DppDimSideChecker {
     private final RelNode relNode;
     private boolean hasFilter;
     private boolean hasPartitionedScan;
-    private final List<ContextResolvedTable> tables = new ArrayList<>();
+    private final Set<ContextResolvedTable> tables = new HashSet<>();
```

Review Comment:
I think we can optimize this for loop along the way to reduce the time complexity. If the `catalogTable` has already been added to the collection `tables`, we can just exit the loop without doing the subsequent comparison operations.
Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]
lsyldliu commented on code in PR #24600: URL: https://github.com/apache/flink/pull/24600#discussion_r1579356007

## flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/optimize/program/DynamicPartitionPruningProgramTest.java:

```diff
@@ -81,6 +87,42 @@ void setup() {
                 + ")");
     }

+    @Test
+    void testLargeQueryPlanShouldNotOutOfMemory() {
+        // TABLE_OPTIMIZER_DYNAMIC_FILTERING_ENABLED is already enabled
+        List<String> strings = new ArrayList<>();
+        for (int i = 0; i < 100; i++) {
+            util.tableEnv()
+                    .executeSql(
+                            "CREATE TABLE IF NOT EXISTS table"
+                                    + i
+                                    + "(att STRING,filename STRING) "
+                                    + "with("
+                                    + " 'connector' = 'values', "
+                                    + " 'runtime-source' = 'NewSource', "
+                                    + " 'bounded' = 'true'"
+                                    + ")");
+            strings.add("select att,filename from table" + i);
+        }
+
+        final String countName = "CNM";
+        Table allUnionTable = util.tableEnv().sqlQuery(String.join(" UNION ALL ", strings));
+        Table res =
```

Review Comment:
Can you write this test purely with SQL queries instead of the Table API?

## flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/optimize/program/DynamicPartitionPruningProgramTest.java:

```diff
+        // TABLE_OPTIMIZER_DYNAMIC_FILTERING_ENABLED is already enabled
+        List<String> strings = new ArrayList<>();
```

Review Comment:
strings -> subQueries?
Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]
jeyhunkarimov commented on code in PR #24600: URL: https://github.com/apache/flink/pull/24600#discussion_r1546924713

## flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/utils/DynamicPartitionPruningUtils.java:

```diff
@@ -115,7 +117,7 @@ private static class DppDimSideChecker {
     private final RelNode relNode;
     private boolean hasFilter;
     private boolean hasPartitionedScan;
-    private final List<ContextResolvedTable> tables = new ArrayList<>();
+    private final Set<ContextResolvedTable> tables = new HashSet<>();
```

Review Comment:
OOM happens because of

```
for (ContextResolvedTable thisTable : new ArrayList<>(tables)) {
    if (!thisTable.getIdentifier().equals(catalogTable.getIdentifier())) {
        tables.add(catalogTable);
    }
}
```

in the `setTables` method. That is, `tables.add` is called without checking whether `tables` already contains the `catalogTable`.
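The failure mode described above can be reproduced in isolation. Below is a hedged, simplified model in plain Java, not Flink code: `String` ids stand in for `ContextResolvedTable` identifiers, and `buggyAdd`/`fixedAdd` are hypothetical names. The buggy loop adds the same table once per existing non-matching entry, so with only two distinct tables the list grows multiplicatively with each call, which is what eventually exhausts memory on large query plans.

```java
import java.util.ArrayList;
import java.util.List;

public class DuplicateGrowthDemo {
    // Simplified model of the buggy setTables loop.
    static void buggyAdd(List<String> tables, String id) {
        if (tables.isEmpty()) {
            tables.add(id);
        } else {
            // Bug: adds id once per existing non-matching entry,
            // with no duplicate check.
            for (String thisId : new ArrayList<>(tables)) {
                if (!thisId.equals(id)) {
                    tables.add(id);
                }
            }
        }
    }

    // Fixed variant: add only when no existing entry matches.
    static void fixedAdd(List<String> tables, String id) {
        boolean hasAdded = false;
        for (String thisId : tables) {
            if (thisId.equals(id)) {
                hasAdded = true;
                break;
            }
        }
        if (!hasAdded) {
            tables.add(id);
        }
    }

    public static void main(String[] args) {
        List<String> buggy = new ArrayList<>();
        List<String> fixed = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            // Only 2 distinct table identifiers, alternating.
            buggyAdd(buggy, "table" + (i % 2));
            fixedAdd(fixed, "table" + (i % 2));
        }
        System.out.println(buggy.size()); // thousands of duplicates
        System.out.println(fixed.size()); // exactly 2
    }
}
```

With alternating ids the per-id counts grow like a Fibonacci sequence (each add appends one copy per entry of the other id), so even 20 calls produce thousands of duplicates, while the fixed variant keeps exactly one entry per distinct identifier.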
Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]
flinkbot commented on PR #24600: URL: https://github.com/apache/flink/pull/24600#issuecomment-2030689199

## CI report:

* 06e59ca12ef6650b79e82fb513c47e53d90f052e UNKNOWN

Bot commands: The @flinkbot bot supports the following commands:
- `@flinkbot run azure` re-run the last Azure build