ayushtkn commented on code in PR #4897:
URL: https://github.com/apache/hive/pull/4897#discussion_r1403921601


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##########
@@ -850,12 +851,44 @@ public void executeOperation(org.apache.hadoop.hive.ql.metadata.Table hmsTable,
         IcebergTableUtil.performMetadataDelete(icebergTable, deleteMetadataSpec.getBranchName(),
             deleteMetadataSpec.getSarg());
         break;
+      case DELETE_ORPHAN_FILES:

Review Comment:
   Theoretically yes, it has to hold the Path strings in memory. I am not sure 
how much memory that can take, but look at Hive Replication: it launches DistCp 
per table, so the listing for each table happens in HS2 itself, and that doesn't 
choke. MR jobs are launched only for the copy, not for the listing.
   
   I did some benchmarking there: for ~3.8 million entries it took about 
250-300 MB, but that code stores some additional data as well, so here it 
should be less.
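
   As a rough cross-check of those figures, the sketch below turns them into a 
per-entry cost and projects it onto a larger listing. It is only arithmetic on 
the numbers quoted above, not part of the benchmark: it assumes "mb" means MiB 
and that ~3.8 million is a count of listed entries. The class name, method 
names, and the 10-million-file table are hypothetical.

```java
public class PathListingMemoryEstimate {
    // Figures quoted in the comment: ~3.8 million entries, 250-300 MB (read here as MiB).
    static final long BENCHMARK_ENTRIES = 3_800_000L;
    static final long MIB = 1024L * 1024L;

    /** Average bytes held per entry for a given total memory footprint. */
    static double bytesPerEntry(long totalBytes, long entries) {
        return (double) totalBytes / entries;
    }

    /** Projected footprint in MiB for a listing of the given size at a per-entry cost. */
    static double projectedMiB(long entries, double bytesPerEntry) {
        return entries * bytesPerEntry / MIB;
    }

    public static void main(String[] args) {
        double low = bytesPerEntry(250 * MIB, BENCHMARK_ENTRIES);
        double high = bytesPerEntry(300 * MIB, BENCHMARK_ENTRIES);
        System.out.printf("per entry: %.0f-%.0f bytes%n", low, high);

        // Hypothetical table with 10 million files, using the upper bound.
        System.out.printf("10M files: ~%.0f MiB%n", projectedMiB(10_000_000L, high));
    }
}
```

   Under those assumptions this works out to roughly 69-83 bytes per entry, so 
even a table with 10 million files would stay under about 800 MiB at the upper 
bound, and the orphan-file listing should need less than that.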
   
   Yep, but we can explore having a Tez job. I don't have a clear idea yet how 
we can get it done via that route, but I will create a follow-up, discuss it 
with folks, and figure out a way :-)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

