Re: [PR] [HUDI-6975] Optimize the code of DayBasedCompactionStrategy [hudi]
danny0405 merged PR #9911: URL: https://github.com/apache/hudi/pull/9911 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6975] Optimize the code of DayBasedCompactionStrategy [hudi]
danny0405 commented on code in PR #9911: URL: https://github.com/apache/hudi/pull/9911#discussion_r1372876171 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/strategy/DayBasedCompactionStrategy.java: ## @@ -63,21 +60,9 @@ public Comparator getComparator() { return comparator; } - @Override - public List orderAndFilter(HoodieWriteConfig writeConfig, - List operations, List pendingCompactionPlans) { -// Iterate through the operations and accept operations as long as we are within the configured target partitions -// limit -return operations.stream() - .collect(Collectors.groupingBy(HoodieCompactionOperation::getPartitionPath)).entrySet().stream() - .sorted(Map.Entry.comparingByKey(comparator)).limit(writeConfig.getTargetPartitionsPerDayBasedCompaction()) -.flatMap(e -> e.getValue().stream()).collect(Collectors.toList()); - } - @Override public List filterPartitionPaths(HoodieWriteConfig writeConfig, List allPartitionPaths) { -return allPartitionPaths.stream().map(partition -> partition.replace("/", "-")) -.sorted(Comparator.reverseOrder()).map(partitionPath -> partitionPath.replace("-", "/")) +return allPartitionPaths.stream().sorted(comparator) .collect(Collectors.toList()).subList(0, Math.min(allPartitionPaths.size(), Review Comment: Is the partition path relative or full, why it has code like this before: ```java partition.replace("/", "-") ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6975] Optimize the code of DayBasedCompactionStrategy [hudi]
ksmou commented on code in PR #9911: URL: https://github.com/apache/hudi/pull/9911#discussion_r1373049638 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/strategy/DayBasedCompactionStrategy.java: ## @@ -63,21 +60,9 @@ public Comparator getComparator() { return comparator; } - @Override - public List orderAndFilter(HoodieWriteConfig writeConfig, - List operations, List pendingCompactionPlans) { -// Iterate through the operations and accept operations as long as we are within the configured target partitions -// limit -return operations.stream() - .collect(Collectors.groupingBy(HoodieCompactionOperation::getPartitionPath)).entrySet().stream() - .sorted(Map.Entry.comparingByKey(comparator)).limit(writeConfig.getTargetPartitionsPerDayBasedCompaction()) -.flatMap(e -> e.getValue().stream()).collect(Collectors.toList()); - } - @Override public List filterPartitionPaths(HoodieWriteConfig writeConfig, List allPartitionPaths) { -return allPartitionPaths.stream().map(partition -> partition.replace("/", "-")) -.sorted(Comparator.reverseOrder()).map(partitionPath -> partitionPath.replace("-", "/")) +return allPartitionPaths.stream().sorted(comparator) .collect(Collectors.toList()).subList(0, Math.min(allPartitionPaths.size(), Review Comment: The partition path is relative path. In before code, `replace` is just used to sort the partitions by dictionary order. If we use the date format compator, no need to replace it. ``` allPartitionPaths.stream().map(partition -> partition.replace("/", "-")) .sorted(Comparator.reverseOrder()).map(partitionPath -> partitionPath.replace("-", "/")) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6975] Optimize the code of DayBasedCompactionStrategy [hudi]
danny0405 commented on code in PR #9911: URL: https://github.com/apache/hudi/pull/9911#discussion_r1372876171 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/strategy/DayBasedCompactionStrategy.java: ## @@ -63,21 +60,9 @@ public Comparator getComparator() { return comparator; } - @Override - public List orderAndFilter(HoodieWriteConfig writeConfig, - List operations, List pendingCompactionPlans) { -// Iterate through the operations and accept operations as long as we are within the configured target partitions -// limit -return operations.stream() - .collect(Collectors.groupingBy(HoodieCompactionOperation::getPartitionPath)).entrySet().stream() - .sorted(Map.Entry.comparingByKey(comparator)).limit(writeConfig.getTargetPartitionsPerDayBasedCompaction()) -.flatMap(e -> e.getValue().stream()).collect(Collectors.toList()); - } - @Override public List filterPartitionPaths(HoodieWriteConfig writeConfig, List allPartitionPaths) { -return allPartitionPaths.stream().map(partition -> partition.replace("/", "-")) -.sorted(Comparator.reverseOrder()).map(partitionPath -> partitionPath.replace("-", "/")) +return allPartitionPaths.stream().sorted(comparator) .collect(Collectors.toList()).subList(0, Math.min(allPartitionPaths.size(), Review Comment: Is the partitiona path relative or full, why it has code like this before: ```java partition.replace("/", "-") ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6975] Optimize the code of DayBasedCompactionStrategy [hudi]
ksmou commented on PR #9911: URL: https://github.com/apache/hudi/pull/9911#issuecomment-1780493399 > Retriggered the failed tests. Retriggered tests succeed. https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=20482&view=results -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6975] Optimize the code of DayBasedCompactionStrategy [hudi]
danny0405 commented on PR #9911: URL: https://github.com/apache/hudi/pull/9911#issuecomment-1780367765 Retriggered the failed tests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6975] Optimize the code of DayBasedCompactionStrategy [hudi]
ksmou commented on PR #9911: URL: https://github.com/apache/hudi/pull/9911#issuecomment-1780305248 > I see some test failures: > > ```java > testUpsertsCOWContinuousMode{HoodieRecordType}[1] Time elapsed: 396.414 s <<< ERROR! > ``` > > Not sure whether it is related. It's is not related. I test it local successful. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6975] Optimize the code of DayBasedCompactionStrategy [hudi]
danny0405 commented on PR #9911: URL: https://github.com/apache/hudi/pull/9911#issuecomment-1780282030 I see some test failures: ```java testUpsertsCOWContinuousMode{HoodieRecordType}[1] Time elapsed: 396.414 s <<< ERROR! ``` Not sure whether it is related. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6975] Optimize the code of DayBasedCompactionStrategy [hudi]
danny0405 commented on code in PR #9911: URL: https://github.com/apache/hudi/pull/9911#discussion_r1371363530 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/strategy/DayBasedCompactionStrategy.java: ## @@ -63,21 +60,9 @@ public Comparator getComparator() { return comparator; } - @Override - public List orderAndFilter(HoodieWriteConfig writeConfig, - List operations, List pendingCompactionPlans) { -// Iterate through the operations and accept operations as long as we are within the configured target partitions -// limit -return operations.stream() - .collect(Collectors.groupingBy(HoodieCompactionOperation::getPartitionPath)).entrySet().stream() - .sorted(Map.Entry.comparingByKey(comparator)).limit(writeConfig.getTargetPartitionsPerDayBasedCompaction()) -.flatMap(e -> e.getValue().stream()).collect(Collectors.toList()); - } - @Override public List filterPartitionPaths(HoodieWriteConfig writeConfig, List allPartitionPaths) { -return allPartitionPaths.stream().map(partition -> partition.replace("/", "-")) -.sorted(Comparator.reverseOrder()).map(partitionPath -> partitionPath.replace("-", "/")) +return allPartitionPaths.stream().sorted(comparator) .collect(Collectors.toList()).subList(0, Math.min(allPartitionPaths.size(), Review Comment: Okay, got it, caution that you have changed the comparator of the partitions, does that introduce any protential regressions ? Can we add some tests to conver it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6975] Optimize the code of DayBasedCompactionStrategy [hudi]
hudi-bot commented on PR #9911: URL: https://github.com/apache/hudi/pull/9911#issuecomment-1779433738 ## CI report: * 0b914fba158ab8f78e647275ed323f3b55d9225a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20482) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6975] Optimize the code of DayBasedCompactionStrategy [hudi]
hudi-bot commented on PR #9911: URL: https://github.com/apache/hudi/pull/9911#issuecomment-1779296940 ## CI report: * a8ade6a6b287f42853f7c6e95c5534656f826b9b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20455) * 0b914fba158ab8f78e647275ed323f3b55d9225a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20482) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6975] Optimize the code of DayBasedCompactionStrategy [hudi]
hudi-bot commented on PR #9911: URL: https://github.com/apache/hudi/pull/9911#issuecomment-1779208553 ## CI report: * a8ade6a6b287f42853f7c6e95c5534656f826b9b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20455) * 0b914fba158ab8f78e647275ed323f3b55d9225a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6975] Optimize the code of DayBasedCompactionStrategy [hudi]
ksmou commented on code in PR #9911: URL: https://github.com/apache/hudi/pull/9911#discussion_r1371702258 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/strategy/DayBasedCompactionStrategy.java: ## @@ -63,21 +60,9 @@ public Comparator getComparator() { return comparator; } - @Override - public List orderAndFilter(HoodieWriteConfig writeConfig, - List operations, List pendingCompactionPlans) { -// Iterate through the operations and accept operations as long as we are within the configured target partitions -// limit -return operations.stream() - .collect(Collectors.groupingBy(HoodieCompactionOperation::getPartitionPath)).entrySet().stream() - .sorted(Map.Entry.comparingByKey(comparator)).limit(writeConfig.getTargetPartitionsPerDayBasedCompaction()) -.flatMap(e -> e.getValue().stream()).collect(Collectors.toList()); - } - @Override public List filterPartitionPaths(HoodieWriteConfig writeConfig, List allPartitionPaths) { -return allPartitionPaths.stream().map(partition -> partition.replace("/", "-")) -.sorted(Comparator.reverseOrder()).map(partitionPath -> partitionPath.replace("-", "/")) +return allPartitionPaths.stream().sorted(comparator) .collect(Collectors.toList()).subList(0, Math.min(allPartitionPaths.size(), Review Comment: Unit tests updated -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6975] Optimize the code of DayBasedCompactionStrategy [hudi]
danny0405 commented on code in PR #9911: URL: https://github.com/apache/hudi/pull/9911#discussion_r1371363530 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/strategy/DayBasedCompactionStrategy.java: ## @@ -63,21 +60,9 @@ public Comparator getComparator() { return comparator; } - @Override - public List orderAndFilter(HoodieWriteConfig writeConfig, - List operations, List pendingCompactionPlans) { -// Iterate through the operations and accept operations as long as we are within the configured target partitions -// limit -return operations.stream() - .collect(Collectors.groupingBy(HoodieCompactionOperation::getPartitionPath)).entrySet().stream() - .sorted(Map.Entry.comparingByKey(comparator)).limit(writeConfig.getTargetPartitionsPerDayBasedCompaction()) -.flatMap(e -> e.getValue().stream()).collect(Collectors.toList()); - } - @Override public List filterPartitionPaths(HoodieWriteConfig writeConfig, List allPartitionPaths) { -return allPartitionPaths.stream().map(partition -> partition.replace("/", "-")) -.sorted(Comparator.reverseOrder()).map(partitionPath -> partitionPath.replace("-", "/")) +return allPartitionPaths.stream().sorted(comparator) .collect(Collectors.toList()).subList(0, Math.min(allPartitionPaths.size(), Review Comment: Okay, got it, caution that you have change the compactor of the partitions, does that introduce any protential regressions ? Can we add some tests to conver it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6975] Optimize the code of DayBasedCompactionStrategy [hudi]
ksmou commented on code in PR #9911: URL: https://github.com/apache/hudi/pull/9911#discussion_r1371222617 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/strategy/DayBasedCompactionStrategy.java: ## @@ -63,21 +60,9 @@ public Comparator getComparator() { return comparator; } - @Override - public List orderAndFilter(HoodieWriteConfig writeConfig, - List operations, List pendingCompactionPlans) { -// Iterate through the operations and accept operations as long as we are within the configured target partitions -// limit -return operations.stream() - .collect(Collectors.groupingBy(HoodieCompactionOperation::getPartitionPath)).entrySet().stream() - .sorted(Map.Entry.comparingByKey(comparator)).limit(writeConfig.getTargetPartitionsPerDayBasedCompaction()) -.flatMap(e -> e.getValue().stream()).collect(Collectors.toList()); - } - @Override public List filterPartitionPaths(HoodieWriteConfig writeConfig, List allPartitionPaths) { -return allPartitionPaths.stream().map(partition -> partition.replace("/", "-")) -.sorted(Comparator.reverseOrder()).map(partitionPath -> partitionPath.replace("-", "/")) +return allPartitionPaths.stream().sorted(comparator) .collect(Collectors.toList()).subList(0, Math.min(allPartitionPaths.size(), Review Comment: If the original size `allPartitionPaths.size()` is smaller than `writeConfig.getTargetPartitionsPerDayBasedCompaction()`, only `subList(writeConfig.getTargetPartitionsPerDayBasedCompaction())` will throw IndexOutOfBoundsException. I think we can use `limit` to replace `subList()`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6975] Optimize the code of DayBasedCompactionStrategy [hudi]
danny0405 commented on code in PR #9911: URL: https://github.com/apache/hudi/pull/9911#discussion_r1371139632 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/strategy/DayBasedCompactionStrategy.java: ## @@ -63,21 +60,9 @@ public Comparator getComparator() { return comparator; } - @Override - public List orderAndFilter(HoodieWriteConfig writeConfig, - List operations, List pendingCompactionPlans) { -// Iterate through the operations and accept operations as long as we are within the configured target partitions -// limit -return operations.stream() - .collect(Collectors.groupingBy(HoodieCompactionOperation::getPartitionPath)).entrySet().stream() - .sorted(Map.Entry.comparingByKey(comparator)).limit(writeConfig.getTargetPartitionsPerDayBasedCompaction()) -.flatMap(e -> e.getValue().stream()).collect(Collectors.toList()); - } - @Override public List filterPartitionPaths(HoodieWriteConfig writeConfig, List allPartitionPaths) { -return allPartitionPaths.stream().map(partition -> partition.replace("/", "-")) -.sorted(Comparator.reverseOrder()).map(partitionPath -> partitionPath.replace("-", "/")) +return allPartitionPaths.stream().sorted(comparator) .collect(Collectors.toList()).subList(0, Math.min(allPartitionPaths.size(), Review Comment: Why we subList its original size, I'm confused. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6975] Optimize the code of DayBasedCompactionStrategy [hudi]
ksmou commented on code in PR #9911: URL: https://github.com/apache/hudi/pull/9911#discussion_r1371082887 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/strategy/DayBasedCompactionStrategy.java: ## @@ -63,21 +60,9 @@ public Comparator getComparator() { return comparator; } - @Override - public List orderAndFilter(HoodieWriteConfig writeConfig, - List operations, List pendingCompactionPlans) { -// Iterate through the operations and accept operations as long as we are within the configured target partitions -// limit -return operations.stream() - .collect(Collectors.groupingBy(HoodieCompactionOperation::getPartitionPath)).entrySet().stream() - .sorted(Map.Entry.comparingByKey(comparator)).limit(writeConfig.getTargetPartitionsPerDayBasedCompaction()) -.flatMap(e -> e.getValue().stream()).collect(Collectors.toList()); - } - @Override public List filterPartitionPaths(HoodieWriteConfig writeConfig, List allPartitionPaths) { -return allPartitionPaths.stream().map(partition -> partition.replace("/", "-")) -.sorted(Comparator.reverseOrder()).map(partitionPath -> partitionPath.replace("-", "/")) +return allPartitionPaths.stream().sorted(comparator) .collect(Collectors.toList()).subList(0, Math.min(allPartitionPaths.size(), Review Comment: The test failures are not related to this change. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6975] Optimize the code of DayBasedCompactionStrategy [hudi]
danny0405 commented on code in PR #9911: URL: https://github.com/apache/hudi/pull/9911#discussion_r1371010223 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/strategy/DayBasedCompactionStrategy.java: ## @@ -63,21 +60,9 @@ public Comparator getComparator() { return comparator; } - @Override - public List orderAndFilter(HoodieWriteConfig writeConfig, - List operations, List pendingCompactionPlans) { -// Iterate through the operations and accept operations as long as we are within the configured target partitions -// limit -return operations.stream() - .collect(Collectors.groupingBy(HoodieCompactionOperation::getPartitionPath)).entrySet().stream() - .sorted(Map.Entry.comparingByKey(comparator)).limit(writeConfig.getTargetPartitionsPerDayBasedCompaction()) -.flatMap(e -> e.getValue().stream()).collect(Collectors.toList()); - } - @Override public List filterPartitionPaths(HoodieWriteConfig writeConfig, List allPartitionPaths) { -return allPartitionPaths.stream().map(partition -> partition.replace("/", "-")) -.sorted(Comparator.reverseOrder()).map(partitionPath -> partitionPath.replace("-", "/")) +return allPartitionPaths.stream().sorted(comparator) .collect(Collectors.toList()).subList(0, Math.min(allPartitionPaths.size(), Review Comment: Can you check the test failures. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org