Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
danny0405 merged PR #9904: URL: https://github.com/apache/hudi/pull/9904 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1790856867 ## CI report: * 62b7696970bac4382a9b6467721de915116fb3a5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20629) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1790838955 ## CI report: * 62b7696970bac4382a9b6467721de915116fb3a5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20629) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
zhuanshenbsj1 commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1790774657 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
zhuanshenbsj1 commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1790771816 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1790091327 ## CI report: * 62b7696970bac4382a9b6467721de915116fb3a5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20629) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1789972737 ## CI report: * 9aa65644532e1cbc3d7b13bd389dd0b5ea63b5e1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20553) * 62b7696970bac4382a9b6467721de915116fb3a5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20629) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1789967218 ## CI report: * 9aa65644532e1cbc3d7b13bd389dd0b5ea63b5e1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20553) * 62b7696970bac4382a9b6467721de915116fb3a5 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
zhuanshenbsj1 commented on code in PR #9904: URL: https://github.com/apache/hudi/pull/9904#discussion_r1379507296 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/OptionsResolver.java: ## @@ -272,6 +272,13 @@ public static boolean hasNoSpecificReadCommits(Configuration conf) { return !conf.contains(FlinkOptions.READ_START_COMMIT) && !conf.contains(FlinkOptions.READ_END_COMMIT); } + /** + * Returns whether the read commits limit is specified. + */ + public static boolean isSpecificReadCommitsLimit(Configuration conf) { +return conf.contains(FlinkOptions.READ_COMMITS_LIMIT); Review Comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
danny0405 commented on code in PR #9904: URL: https://github.com/apache/hudi/pull/9904#discussion_r1379478910 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/OptionsResolver.java: ## @@ -272,6 +272,13 @@ public static boolean hasNoSpecificReadCommits(Configuration conf) { return !conf.contains(FlinkOptions.READ_START_COMMIT) && !conf.contains(FlinkOptions.READ_END_COMMIT); } + /** + * Returns whether the read commits limit is specified. + */ + public static boolean isSpecificReadCommitsLimit(Configuration conf) { +return conf.contains(FlinkOptions.READ_COMMITS_LIMIT); Review Comment: `isSpecificReadCommitsLimit` -> `hasReadCommitsLimit` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1784725715 ## CI report: * 9aa65644532e1cbc3d7b13bd389dd0b5ea63b5e1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20553) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1784533184 ## CI report: * 14a3dad977f45059922be67061203355b05cee59 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20540) * 9aa65644532e1cbc3d7b13bd389dd0b5ea63b5e1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20553) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1784526614 ## CI report: * 14a3dad977f45059922be67061203355b05cee59 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20540) * 9aa65644532e1cbc3d7b13bd389dd0b5ea63b5e1 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
zhuanshenbsj1 commented on code in PR #9904: URL: https://github.com/apache/hudi/pull/9904#discussion_r1375680863 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/TestIncrementalInputSplits.java: ## @@ -191,6 +192,38 @@ void testInputSplitsWithPartitionPruner( assertEquals(expectedPartitions, partitions); } + @Test + void testInputSplitsWithSpeedLimit() throws Exception { +Configuration conf = TestConfigurations.getDefaultConf(basePath); +conf.set(FlinkOptions.READ_AS_STREAMING, true); +conf.set(FlinkOptions.READ_STREAMING_SKIP_CLUSTERING, true); +conf.set(FlinkOptions.READ_STREAMING_SKIP_COMPACT, true); +conf.set(FlinkOptions.READ_SPEED_LIMIT_ENABLED, true); +conf.set(FlinkOptions.READ_SPEED_LIMIT_COMMITS, 1); +// insert data +TestData.writeData(TestData.DATA_SET_INSERT, conf); +TestData.writeData(TestData.DATA_SET_INSERT, conf); +TestData.writeData(TestData.DATA_SET_INSERT, conf); +TestData.writeData(TestData.DATA_SET_INSERT, conf); + +HoodieTimeline commitsTimeline = metaClient.reloadActiveTimeline() +.filter(hoodieInstant -> hoodieInstant.getAction().equals(HoodieTimeline.COMMIT_ACTION)); +HoodieInstant firstInstant = commitsTimeline.firstInstant().get(); +HoodieInstant secondInstant = commitsTimeline.getInstants().get(1); +IncrementalInputSplits iis = IncrementalInputSplits.builder() +.conf(conf) +.path(new Path(basePath)) +.rowType(TestConfigurations.ROW_TYPE) +.partitionPruner(null) +.build(); +IncrementalInputSplits.Result result = iis.inputSplits(metaClient, firstInstant.getTimestamp(), firstInstant.getCompletionTime(), false); +result.getInputSplits().stream().forEach(split -> { + InstantRange range = split.getInstantRange().get(); Review Comment: Adjust to assert the commit range is 1; -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
zhuanshenbsj1 commented on code in PR #9904: URL: https://github.com/apache/hudi/pull/9904#discussion_r1373990668 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/IncrementalInputSplits.java: ## @@ -269,6 +269,9 @@ public Result inputSplits( Result hollowSplits = getHollowInputSplits(metaClient, metaClient.getHadoopConf(), issuedInstant, issuedOffset, commitTimeline, cdcEnabled); List instants = filterInstantsWithRange(commitTimeline, issuedInstant); +int instantLimit = this.conf.getInteger(FlinkOptions.READ_COMMITS_LIMIT,Integer.MAX_VALUE); Review Comment: Done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
danny0405 commented on code in PR #9904: URL: https://github.com/apache/hudi/pull/9904#discussion_r1375599334 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/TestIncrementalInputSplits.java: ## @@ -191,6 +192,38 @@ void testInputSplitsWithPartitionPruner( assertEquals(expectedPartitions, partitions); } + @Test + void testInputSplitsWithSpeedLimit() throws Exception { +Configuration conf = TestConfigurations.getDefaultConf(basePath); +conf.set(FlinkOptions.READ_AS_STREAMING, true); +conf.set(FlinkOptions.READ_STREAMING_SKIP_CLUSTERING, true); +conf.set(FlinkOptions.READ_STREAMING_SKIP_COMPACT, true); +conf.set(FlinkOptions.READ_SPEED_LIMIT_ENABLED, true); +conf.set(FlinkOptions.READ_SPEED_LIMIT_COMMITS, 1); +// insert data +TestData.writeData(TestData.DATA_SET_INSERT, conf); +TestData.writeData(TestData.DATA_SET_INSERT, conf); +TestData.writeData(TestData.DATA_SET_INSERT, conf); +TestData.writeData(TestData.DATA_SET_INSERT, conf); + +HoodieTimeline commitsTimeline = metaClient.reloadActiveTimeline() +.filter(hoodieInstant -> hoodieInstant.getAction().equals(HoodieTimeline.COMMIT_ACTION)); +HoodieInstant firstInstant = commitsTimeline.firstInstant().get(); +HoodieInstant secondInstant = commitsTimeline.getInstants().get(1); +IncrementalInputSplits iis = IncrementalInputSplits.builder() +.conf(conf) +.path(new Path(basePath)) +.rowType(TestConfigurations.ROW_TYPE) +.partitionPruner(null) +.build(); +IncrementalInputSplits.Result result = iis.inputSplits(metaClient, firstInstant.getTimestamp(), firstInstant.getCompletionTime(), false); +result.getInputSplits().stream().forEach(split -> { + InstantRange range = split.getInstantRange().get(); Review Comment: Can we just assert the commit number as 1. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
danny0405 commented on code in PR #9904: URL: https://github.com/apache/hudi/pull/9904#discussion_r1375598847 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java: ## @@ -343,12 +343,25 @@ private FlinkOptions() { .noDefaultValue() .withDescription("End commit instant for reading, the commit time format should be 'MMddHHmmss'"); + public static final ConfigOption READ_SPEED_LIMIT_ENABLED = ConfigOptions + .key("read.speed.limit.enabled") + .booleanType() + .defaultValue(false) + .withDescription("Enable stream read speed limit, avoiding the risk of oom caused by the streamReadMonitor " + + " loading too many metadata at once"); + + public static final ConfigOption READ_SPEED_LIMIT_COMMITS = ConfigOptions + .key("read.speed.limit.commits") + .intType() + .defaultValue(5) + .withDescription("The maximum number of commits allowed streamReadMonitor to read in each poll"); + Review Comment: `The maximum number of commits allowed streamReadMonitor to read in each poll` -> `The maximum number of commits allowed to read in each instant check, if it is streaming read, the avg read instants number per-second would be 'read.commits.limit'/'read.streaming.check-interval', by default no limit`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
danny0405 commented on code in PR #9904: URL: https://github.com/apache/hudi/pull/9904#discussion_r1375593559 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java: ## @@ -343,12 +343,25 @@ private FlinkOptions() { .noDefaultValue() .withDescription("End commit instant for reading, the commit time format should be 'MMddHHmmss'"); + public static final ConfigOption READ_SPEED_LIMIT_ENABLED = ConfigOptions + .key("read.speed.limit.enabled") + .booleanType() + .defaultValue(false) + .withDescription("Enable stream read speed limit, avoiding the risk of oom caused by the streamReadMonitor " + + " loading too many metadata at once"); + + public static final ConfigOption READ_SPEED_LIMIT_COMMITS = ConfigOptions + .key("read.speed.limit.commits") + .intType() + .defaultValue(5) Review Comment: A single option `read.commits.limit` without default should be enough, if the user does not configure it, that means there is no limit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
danny0405 commented on code in PR #9904: URL: https://github.com/apache/hudi/pull/9904#discussion_r1375593559 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java: ## @@ -343,12 +343,25 @@ private FlinkOptions() { .noDefaultValue() .withDescription("End commit instant for reading, the commit time format should be 'MMddHHmmss'"); + public static final ConfigOption READ_SPEED_LIMIT_ENABLED = ConfigOptions + .key("read.speed.limit.enabled") + .booleanType() + .defaultValue(false) + .withDescription("Enable stream read speed limit, avoiding the risk of oom caused by the streamReadMonitor " + + " loading too many metadata at once"); + + public static final ConfigOption READ_SPEED_LIMIT_COMMITS = ConfigOptions + .key("read.speed.limit.commits") + .intType() + .defaultValue(5) Review Comment: A single option `read.streaming.commits.limit` without default should be enough, if the user does not configure it, that means there is no limit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
zhuanshenbsj1 commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1783992838 cc @danny0405 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1783815478 ## CI report: * 14a3dad977f45059922be67061203355b05cee59 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20540) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1783786306 ## CI report: * 3adf463b328be138970054c578c13e503cd71c71 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20508) * 14a3dad977f45059922be67061203355b05cee59 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20540) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1783776978 ## CI report: * 3adf463b328be138970054c578c13e503cd71c71 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20508) * 14a3dad977f45059922be67061203355b05cee59 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
zhuanshenbsj1 commented on code in PR #9904: URL: https://github.com/apache/hudi/pull/9904#discussion_r1375227941 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/IncrementalInputSplits.java: ## @@ -269,6 +269,9 @@ public Result inputSplits( Result hollowSplits = getHollowInputSplits(metaClient, metaClient.getHadoopConf(), issuedInstant, issuedOffset, commitTimeline, cdcEnabled); List instants = filterInstantsWithRange(commitTimeline, issuedInstant); +int instantLimit = this.conf.getInteger(FlinkOptions.READ_COMMITS_LIMIT,Integer.MAX_VALUE); +instants = instants.subList(0, Math.min(instantLimit, instants.size())); + Review Comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
zhuanshenbsj1 commented on code in PR #9904: URL: https://github.com/apache/hudi/pull/9904#discussion_r1373990668 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/IncrementalInputSplits.java: ## @@ -269,6 +269,9 @@ public Result inputSplits( Result hollowSplits = getHollowInputSplits(metaClient, metaClient.getHadoopConf(), issuedInstant, issuedOffset, commitTimeline, cdcEnabled); List instants = filterInstantsWithRange(commitTimeline, issuedInstant); +int instantLimit = this.conf.getInteger(FlinkOptions.READ_COMMITS_LIMIT,Integer.MAX_VALUE); Review Comment: Done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
danny0405 commented on code in PR #9904: URL: https://github.com/apache/hudi/pull/9904#discussion_r1373983795 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/IncrementalInputSplits.java: ## @@ -269,6 +269,9 @@ public Result inputSplits( Result hollowSplits = getHollowInputSplits(metaClient, metaClient.getHadoopConf(), issuedInstant, issuedOffset, commitTimeline, cdcEnabled); List instants = filterInstantsWithRange(commitTimeline, issuedInstant); +int instantLimit = this.conf.getInteger(FlinkOptions.READ_COMMITS_LIMIT,Integer.MAX_VALUE); +instants = instants.subList(0, Math.min(instantLimit, instants.size())); + Review Comment: Can we optimize the code as: ```python if (read_limit_configured) { read_limit = get_read_limit if (read_limit < instants.size) { instants = instants.subList(0, read_limit) } } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1781471665 ## CI report: * 3adf463b328be138970054c578c13e503cd71c71 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20508) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1780795200 ## CI report: * afdd069ab345284754e397f130b0a8dc4b48a835 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20501) * 3adf463b328be138970054c578c13e503cd71c71 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20508) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1780779993 ## CI report: * f2900c75caf546827d8d7b04c6fbc09d1d411f90 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20481) * afdd069ab345284754e397f130b0a8dc4b48a835 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20501) * 3adf463b328be138970054c578c13e503cd71c71 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
danny0405 commented on code in PR #9904: URL: https://github.com/apache/hudi/pull/9904#discussion_r1372842856 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/IncrementalInputSplits.java: ## @@ -269,6 +269,9 @@ public Result inputSplits( Result hollowSplits = getHollowInputSplits(metaClient, metaClient.getHadoopConf(), issuedInstant, issuedOffset, commitTimeline, cdcEnabled); List instants = filterInstantsWithRange(commitTimeline, issuedInstant); +int instantLimit = this.conf.getInteger(FlinkOptions.READ_COMMITS_LIMIT,Integer.MAX_VALUE); Review Comment: ```suggestion int instantLimit = this.conf.getInteger(FlinkOptions.READ_COMMITS_LIMIT, Integer.MAX_VALUE); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1780523136 ## CI report: * f2900c75caf546827d8d7b04c6fbc09d1d411f90 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20481) * afdd069ab345284754e397f130b0a8dc4b48a835 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20501) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1780513244 ## CI report: * f2900c75caf546827d8d7b04c6fbc09d1d411f90 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20481) * afdd069ab345284754e397f130b0a8dc4b48a835 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
zhuanshenbsj1 commented on code in PR #9904: URL: https://github.com/apache/hudi/pull/9904#discussion_r1372638619 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/utils/TestData.java: ## @@ -503,6 +503,20 @@ public static String rowDataToString(List rows) { public static void writeData( List dataBuffer, Configuration conf) throws Exception { +writeData(dataBuffer, conf, 1); + } + + /** + * Write a list of row data with Hoodie format base on the given configuration. + * + * @param dataBuffer The data buffer to write + * @param conf The flink configuration + * @param ckpId The checkpoint id + * @throws Exception if error occurs + */ + public static void writeData( + List dataBuffer, Review Comment: Revert this change. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
danny0405 commented on code in PR #9904: URL: https://github.com/apache/hudi/pull/9904#discussion_r1372556576 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/utils/TestData.java: ## @@ -503,6 +503,20 @@ public static String rowDataToString(List rows) { public static void writeData( List dataBuffer, Configuration conf) throws Exception { +writeData(dataBuffer, conf, 1); + } + + /** + * Write a list of row data with Hoodie format base on the given configuration. + * + * @param dataBuffer The data buffer to write + * @param conf The flink configuration + * @param ckpId The checkpoint id + * @throws Exception if error occurs + */ + public static void writeData( + List dataBuffer, Review Comment: The checkpoint id does not affect the data write, there is no need to specify it explicitly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1779563544 ## CI report: * f2900c75caf546827d8d7b04c6fbc09d1d411f90 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20481) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1779010849 ## CI report: * 23af1b3753a523ffd717b7fb56a87501f3327adf Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20443) * f2900c75caf546827d8d7b04c6fbc09d1d411f90 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20481) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1778996533 ## CI report: * 23af1b3753a523ffd717b7fb56a87501f3327adf Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20443) * f2900c75caf546827d8d7b04c6fbc09d1d411f90 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
zhuanshenbsj1 commented on code in PR #9904: URL: https://github.com/apache/hudi/pull/9904#discussion_r1371517511 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/IncrementalInputSplits.java: ## @@ -269,6 +269,9 @@ public Result inputSplits( Result hollowSplits = getHollowInputSplits(metaClient, metaClient.getHadoopConf(), issuedInstant, issuedOffset, commitTimeline, cdcEnabled); List instants = filterInstantsWithRange(commitTimeline, issuedInstant); +int instantLimit = this.conf.getInteger(FlinkOptions.READ_COMMITS_LIMIT,Integer.MAX_VALUE); +instants = instants.subList(0, Math.min(instantLimit, instants.size())); + Review Comment: Unit test added -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
danny0405 commented on code in PR #9904: URL: https://github.com/apache/hudi/pull/9904#discussion_r137106 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/IncrementalInputSplits.java: ## @@ -269,6 +269,9 @@ public Result inputSplits( Result hollowSplits = getHollowInputSplits(metaClient, metaClient.getHadoopConf(), issuedInstant, issuedOffset, commitTimeline, cdcEnabled); List instants = filterInstantsWithRange(commitTimeline, issuedInstant); +int instantLimit = this.conf.getInteger(FlinkOptions.READ_COMMITS_LIMIT,Integer.MAX_VALUE); +instants = instants.subList(0, Math.min(instantLimit, instants.size())); + Review Comment: Can we add some tests for it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1775266079 ## CI report: * 23af1b3753a523ffd717b7fb56a87501f3327adf Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20443) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1774940196 ## CI report: * 23af1b3753a523ffd717b7fb56a87501f3327adf Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20443) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]
hudi-bot commented on PR #9904: URL: https://github.com/apache/hudi/pull/9904#issuecomment-1774925985 ## CI report: * 23af1b3753a523ffd717b7fb56a87501f3327adf UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [HUDI-6969] Add speed limit for stream read [hudi]
zhuanshenbsj1 opened a new pull request, #9904: URL: https://github.com/apache/hudi/pull/9904 ### Change Logs Currently, there is no speed limit for stream read, and regardless of the instantranges, they will be read at once. It is easy to cause GC of monitor operator. Add a configuration to limit the number of commits read per round in stream read mode. ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org