[GitHub] [hudi] nsivabalan commented on a diff in pull request #6548: [HUDI-4749] Fixing full cleaning to leverage metadata table
nsivabalan commented on code in PR #6548: URL: https://github.com/apache/hudi/pull/6548#discussion_r964231352 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java: ## @@ -206,15 +206,7 @@ private List getPartitionPathsForIncrementalCleaning(HoodieCleanMetadata */ private List getPartitionPathsForFullCleaning() { // Go to brute force mode of scanning all partitions -try { - // Because the partition of BaseTableMetadata has been deleted, - // all partition information can only be obtained from FileSystemBackedTableMetadata. Review Comment: One option is. during DELETE_PARTITION, we will not delete it from metadata table. When cleaner comes around and detects there are not valid file groups, it will physically delete it and will relay it to metadata table. at this juncture, metadata table will remove the partition of interest. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a diff in pull request #6548: [HUDI-4749] Fixing full cleaning to leverage metadata table
nsivabalan commented on code in PR #6548: URL: https://github.com/apache/hudi/pull/6548#discussion_r964230632 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java: ## @@ -206,15 +206,7 @@ private List getPartitionPathsForIncrementalCleaning(HoodieCleanMetadata */ private List getPartitionPathsForFullCleaning() { // Go to brute force mode of scanning all partitions -try { - // Because the partition of BaseTableMetadata has been deleted, - // all partition information can only be obtained from FileSystemBackedTableMetadata. Review Comment: guess we can't go w/ this change. essentially what we are doing in cleaning is 1. fetch list of partitions to clean 2. Fetch list of files to clean 3. for those partitions where there is no valid file groups, we mark the partition to be deleted and physically delete the directory (lazy execution of DELETE_PARTITION) So, in step1, we have to go w/ FS based listing. if we revert to metadata table based listing, we may not get the deleted partition only. and so the physical directory will never get deleted. Need to think about an optimized way to go about this. for now, will punt on the patch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a diff in pull request #6548: [HUDI-4749] Fixing full cleaning to leverage metadata table
nsivabalan commented on code in PR #6548: URL: https://github.com/apache/hudi/pull/6548#discussion_r960987424 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java: ## @@ -206,15 +206,7 @@ private List getPartitionPathsForIncrementalCleaning(HoodieCleanMetadata */ private List getPartitionPathsForFullCleaning() { // Go to brute force mode of scanning all partitions -try { - // Because the partition of BaseTableMetadata has been deleted, - // all partition information can only be obtained from FileSystemBackedTableMetadata. Review Comment: yeah. when Sagar wrote that code, here is how DELETE_PARTITION was implemented. it wasn't well thought out design/impl. The way it was designed was, partition was synchronously deleted in data table and replace commit was relayed to MDT where in all file groups will be marked as invalid. but the partition in MDT remained. if you remember, I chimed in and rewrote the entire impl, where in we made DELETE_PARTITION lazy. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org