[GitHub] [hudi] nsivabalan commented on a diff in pull request #6548: [HUDI-4749] Fixing full cleaning to leverage metadata table

2022-09-06 Thread GitBox


nsivabalan commented on code in PR #6548:
URL: https://github.com/apache/hudi/pull/6548#discussion_r964231352


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##
@@ -206,15 +206,7 @@ private List 
getPartitionPathsForIncrementalCleaning(HoodieCleanMetadata
*/
   private List getPartitionPathsForFullCleaning() {
 // Go to brute force mode of scanning all partitions
-try {
-  // Because the partition of BaseTableMetadata has been deleted,
-  // all partition information can only be obtained from 
FileSystemBackedTableMetadata.

Review Comment:
   One option is.
   during DELETE_PARTITION, we will not delete it from metadata table. 
   When cleaner comes around and detects there are not valid file groups, it 
will physically delete it and will relay it to metadata table. at this 
juncture, metadata table will remove the partition of interest. 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a diff in pull request #6548: [HUDI-4749] Fixing full cleaning to leverage metadata table

2022-09-06 Thread GitBox


nsivabalan commented on code in PR #6548:
URL: https://github.com/apache/hudi/pull/6548#discussion_r964230632


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##
@@ -206,15 +206,7 @@ private List 
getPartitionPathsForIncrementalCleaning(HoodieCleanMetadata
*/
   private List getPartitionPathsForFullCleaning() {
 // Go to brute force mode of scanning all partitions
-try {
-  // Because the partition of BaseTableMetadata has been deleted,
-  // all partition information can only be obtained from 
FileSystemBackedTableMetadata.

Review Comment:
   guess we can't go w/ this change. essentially what we are doing in cleaning 
is
   
   1. fetch list of partitions to clean
   2. Fetch list of files to clean
   3. for those partitions where there is no valid file groups, we mark the 
partition to be deleted and physically delete the directory (lazy execution of 
DELETE_PARTITION)
   
   So, in step1, we have to go w/ FS based listing. if we revert to metadata 
table based listing, we may not get the deleted partition only. and so the 
physical directory will never get deleted. 
   
   Need to think about an optimized way to go about this. for now, will punt on 
the patch. 
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a diff in pull request #6548: [HUDI-4749] Fixing full cleaning to leverage metadata table

2022-09-01 Thread GitBox


nsivabalan commented on code in PR #6548:
URL: https://github.com/apache/hudi/pull/6548#discussion_r960987424


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##
@@ -206,15 +206,7 @@ private List 
getPartitionPathsForIncrementalCleaning(HoodieCleanMetadata
*/
   private List getPartitionPathsForFullCleaning() {
 // Go to brute force mode of scanning all partitions
-try {
-  // Because the partition of BaseTableMetadata has been deleted,
-  // all partition information can only be obtained from 
FileSystemBackedTableMetadata.

Review Comment:
   yeah. when Sagar wrote that code, here is how DELETE_PARTITION was 
implemented. 
   it wasn't well thought out design/impl. The way it was designed was, 
partition was synchronously deleted in data table and replace commit was 
relayed to MDT where in all file groups will be marked as invalid. but the 
partition in MDT remained. 
   
   if you remember, I chimed in and rewrote the entire impl, where in we made 
DELETE_PARTITION lazy. 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org