Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]

2024-05-15 Thread via GitHub


jeyhunkarimov commented on code in PR #24600:
URL: https://github.com/apache/flink/pull/24600#discussion_r1601116454


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/utils/DynamicPartitionPruningUtils.java:
##
@@ -236,6 +238,9 @@ private void setTables(ContextResolvedTable catalogTable) {
 tables.add(catalogTable);
 } else {
 for (ContextResolvedTable thisTable : new ArrayList<>(tables)) {
+if (tables.contains(catalogTable)) {

Review Comment:
   Hi @mumuhhh, thanks for the ping and your suggestion! I think you are right.
   Now that I look in more detail, in addition to your suggestion I think we can
   also remove the first `if` check in the method. I filed the patch:
   https://github.com/apache/flink/pull/24788
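
   For illustration, a minimal sketch of what `setTables` could look like once
   that first `if` check is dropped (a sketch of the idea only, not the code in
   the follow-up patch):
   ```
   private void setTables(ContextResolvedTable catalogTable) {
       // On an empty collection the loop body never runs and the table is
       // simply added, so the empty-check special case is redundant.
       boolean hasAdded = false;
       for (ContextResolvedTable thisTable : tables) {
           if (thisTable.getIdentifier().equals(catalogTable.getIdentifier())) {
               hasAdded = true;
               break;
           }
       }
       if (!hasAdded) {
           tables.add(catalogTable);
       }
   }
   ```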



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]

2024-05-14 Thread via GitHub


mumuhhh commented on code in PR #24600:
URL: https://github.com/apache/flink/pull/24600#discussion_r1600843684


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/utils/DynamicPartitionPruningUtils.java:
##
@@ -236,6 +238,9 @@ private void setTables(ContextResolvedTable catalogTable) {
 tables.add(catalogTable);
 } else {
 for (ContextResolvedTable thisTable : new ArrayList<>(tables)) {
+if (tables.contains(catalogTable)) {

Review Comment:
   > I think we can use a boolean flag to check here; then we don't need to call
   > the `contains` method every time, which is O(N) time complexity.
   >
   > ```
   > boolean hasAdded = false;
   > for (ContextResolvedTable thisTable : new ArrayList<>(tables)) {
   >     if (hasAdded) {
   >         break;
   >     }
   >     if (!thisTable.getIdentifier().equals(catalogTable.getIdentifier())) {
   >         tables.add(catalogTable);
   >         hasAdded = true;
   >     }
   > }
   > ```

   I think we should modify the traversal logic:
   ```
   boolean hasAdded = false;
   for (ContextResolvedTable thisTable : new ArrayList<>(tables)) {
       if (thisTable.getIdentifier().equals(catalogTable.getIdentifier())) {
           hasAdded = true;
           break;
       }
   }
   if (!hasAdded) {
       tables.add(catalogTable);
   }
   ```
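
   Assuming `ContextResolvedTable`'s `equals`/`hashCode` are keyed on its
   identifier (an assumption worth verifying), the explicit scan would reduce to
   a plain membership test on the new `Set`:
   ```
   // O(1) on a HashSet, versus the O(N) identifier scan above
   if (!tables.contains(catalogTable)) {
       tables.add(catalogTable);
   }
   ```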



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]

2024-05-14 Thread via GitHub


mumuhhh commented on code in PR #24600:
URL: https://github.com/apache/flink/pull/24600#discussion_r1600841072


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/utils/DynamicPartitionPruningUtils.java:
##
@@ -115,7 +117,7 @@ private static class DppDimSideChecker {
 private final RelNode relNode;
 private boolean hasFilter;
 private boolean hasPartitionedScan;
-private final List<ContextResolvedTable> tables = new ArrayList<>();
+private final Set<ContextResolvedTable> tables = new HashSet<>();

Review Comment:
   Why don't we write the traversal comparison like this?
   ```
   boolean hasAdded = false;
   for (ContextResolvedTable thisTable : new ArrayList<>(tables)) {
       if (thisTable.getIdentifier().equals(catalogTable.getIdentifier())) {
           hasAdded = true;
           break;
       }
   }
   if (!hasAdded) {
       tables.add(catalogTable);
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]

2024-04-28 Thread via GitHub


lsyldliu merged PR #24600:
URL: https://github.com/apache/flink/pull/24600


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]

2024-04-27 Thread via GitHub


lsyldliu commented on code in PR #24600:
URL: https://github.com/apache/flink/pull/24600#discussion_r1581993829


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/utils/DynamicPartitionPruningUtils.java:
##
@@ -236,6 +238,9 @@ private void setTables(ContextResolvedTable catalogTable) {
 tables.add(catalogTable);
 } else {
 for (ContextResolvedTable thisTable : new ArrayList<>(tables)) {
+if (tables.contains(catalogTable)) {

Review Comment:
   I think we can use a boolean flag to check here; then we don't need to call
   the `contains` method every time, which is O(N) time complexity.

   ```
   boolean hasAdded = false;
   for (ContextResolvedTable thisTable : new ArrayList<>(tables)) {
       if (hasAdded) {
           break;
       }
       if (!thisTable.getIdentifier().equals(catalogTable.getIdentifier())) {
           tables.add(catalogTable);
           hasAdded = true;
       }
   }
   ```
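
   One subtlety with this sketch (a trace for illustration): it adds
   `catalogTable` as soon as any entry with a *different* identifier is seen,
   even if a matching entry comes later in the list, which is what the revised
   traversal in mumuhhh's reply addresses.
   ```
   // tables = [A, B], catalogTable = B (same identifier as the existing B)
   // iteration 1: thisTable = A, identifiers differ -> tables.add(B), hasAdded = true
   // iteration 2: hasAdded -> break, so the existing B is never compared
   // result: tables = [A, B, B] -- a duplicate slips in
   ```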



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]

2024-04-27 Thread via GitHub


jeyhunkarimov commented on code in PR #24600:
URL: https://github.com/apache/flink/pull/24600#discussion_r1581891157


##
flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/optimize/program/DynamicPartitionPruningProgramTest.java:
##
@@ -81,6 +87,42 @@ void setup() {
 + ")");
 }
 
+@Test
+void testLargeQueryPlanShouldNotOutOfMemory() {
+// TABLE_OPTIMIZER_DYNAMIC_FILTERING_ENABLED is already enabled
+List<String> strings = new ArrayList<>();

Review Comment:
   Nice catch, I just copy-pasted the code from the Jira issue. Fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]

2024-04-27 Thread via GitHub


jeyhunkarimov commented on PR #24600:
URL: https://github.com/apache/flink/pull/24600#issuecomment-2081151799

   Hi @lsyldliu, thanks for the review. I addressed your comments. Could you
   please take a look when you have time? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]

2024-04-25 Thread via GitHub


lsyldliu commented on code in PR #24600:
URL: https://github.com/apache/flink/pull/24600#discussion_r1579386652


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/utils/DynamicPartitionPruningUtils.java:
##
@@ -115,7 +117,7 @@ private static class DppDimSideChecker {
 private final RelNode relNode;
 private boolean hasFilter;
 private boolean hasPartitionedScan;
-private final List<ContextResolvedTable> tables = new ArrayList<>();
+private final Set<ContextResolvedTable> tables = new HashSet<>();

Review Comment:
   I think we can also optimize this for loop to reduce the time complexity: if
   the `catalogTable` has already been added to the collection `tables`, we can
   just exit the loop without doing the subsequent comparison operations.
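
   For reference, the hunk under review at the other comment adds exactly such a
   guard (`if (tables.contains(catalogTable))`) inside the loop. A sketch of the
   resulting shape, reconstructed since only the added line is visible in the
   hunk (whether the guard breaks or returns is an assumption here):
   ```
   for (ContextResolvedTable thisTable : new ArrayList<>(tables)) {
       if (tables.contains(catalogTable)) {
           break; // already added: skip the remaining comparisons
       }
       if (!thisTable.getIdentifier().equals(catalogTable.getIdentifier())) {
           tables.add(catalogTable);
       }
   }
   ```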



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]

2024-04-25 Thread via GitHub


lsyldliu commented on code in PR #24600:
URL: https://github.com/apache/flink/pull/24600#discussion_r1579356007


##
flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/optimize/program/DynamicPartitionPruningProgramTest.java:
##
@@ -81,6 +87,42 @@ void setup() {
 + ")");
 }
 
+@Test
+void testLargeQueryPlanShouldNotOutOfMemory() {
+// TABLE_OPTIMIZER_DYNAMIC_FILTERING_ENABLED is already enabled
+List<String> strings = new ArrayList<>();
+for (int i = 0; i < 100; i++) {
+util.tableEnv()
+.executeSql(
+"CREATE TABLE IF NOT EXISTS table"
++ i
++ "(att STRING,filename STRING) "
++ "with("
++ " 'connector' = 'values', "
++ " 'runtime-source' = 'NewSource', "
++ " 'bounded' = 'true'"
++ ")");
+strings.add("select att,filename from table" + i);
+}
+
+final String countName = "CNM";
+Table allUnionTable = util.tableEnv().sqlQuery(String.join(" UNION ALL ", strings));
+Table res =

Review Comment:
   Can you write this test purely with a SQL query instead of the Table API?
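
   Something along these lines, perhaps (a sketch only; `att`, `filename`, and
   `CNM` come from the snippet above, and `explainSql` is used here just to
   force planning without executing the job):
   ```
   String unionAll = String.join(" UNION ALL ", strings);
   util.tableEnv()
           .explainSql(
                   "SELECT att, COUNT(filename) AS CNM FROM ("
                           + unionAll
                           + ") AS t GROUP BY att");
   ```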



##
flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/optimize/program/DynamicPartitionPruningProgramTest.java:
##
@@ -81,6 +87,42 @@ void setup() {
 + ")");
 }
 
+@Test
+void testLargeQueryPlanShouldNotOutOfMemory() {
+// TABLE_OPTIMIZER_DYNAMIC_FILTERING_ENABLED is already enabled
+List<String> strings = new ArrayList<>();

Review Comment:
   strings -> subQueries?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]

2024-04-01 Thread via GitHub


jeyhunkarimov commented on code in PR #24600:
URL: https://github.com/apache/flink/pull/24600#discussion_r1546924713


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/utils/DynamicPartitionPruningUtils.java:
##
@@ -115,7 +117,7 @@ private static class DppDimSideChecker {
 private final RelNode relNode;
 private boolean hasFilter;
 private boolean hasPartitionedScan;
-private final List<ContextResolvedTable> tables = new ArrayList<>();
+private final Set<ContextResolvedTable> tables = new HashSet<>();

Review Comment:
   OOM happens because of
   ```
   for (ContextResolvedTable thisTable : new ArrayList<>(tables)) {
       if (!thisTable.getIdentifier().equals(catalogTable.getIdentifier())) {
           tables.add(catalogTable);
       }
   }
   ```

   in the `setTables` method. That is, `tables.add` is called without checking
   whether `tables` already contains the `catalogTable`, so every non-matching
   entry appends another copy and the list keeps growing.
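
   A standalone trace of the growth (a sketch with strings standing in for
   tables; drop it into any `main` with `java.util` imports):
   ```
   List<String> tables = new ArrayList<>();
   tables.add("t0");
   for (int i = 1; i <= 20; i++) {
       String newTable = "t" + i;
       // same shape as the loop above: one add per non-matching entry
       for (String thisTable : new ArrayList<>(tables)) {
           if (!thisTable.equals(newTable)) {
               tables.add(newTable);
           }
       }
       System.out.println("after table " + i + ": size = " + tables.size());
   }
   // prints 2, 4, 8, ...: the list doubles per distinct table, so the
   // 100-table query from the new test would need ~2^100 entries -- hence
   // the OutOfMemoryError
   ```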



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34379][table] Fix OutOfMemoryError with large queries [flink]

2024-04-01 Thread via GitHub


flinkbot commented on PR #24600:
URL: https://github.com/apache/flink/pull/24600#issuecomment-2030689199

   
   ## CI report:

   * 06e59ca12ef6650b79e82fb513c47e53d90f052e UNKNOWN

   Bot commands
   The @flinkbot bot supports the following commands:

   - `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org