[ https://issues.apache.org/jira/browse/HIVE-25492?focusedWorklogId=752705&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752705 ]
ASF GitHub Bot logged work on HIVE-25492:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Apr/22 08:21
            Start Date: 05/Apr/22 08:21
    Worklog Time Spent: 10m
      Work Description: deniskuzZ commented on code in PR #3157:
URL: https://github.com/apache/hive/pull/3157#discussion_r842506372


##########
ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java:
##########
@@ -1480,6 +1482,57 @@ private static ValidTxnList getValidTxnList(Configuration conf) {
     return validTxnList;
   }
 
+
+  /**
+   * In case of the cleaner, we don't need to go into file level, it is enough to collect base/delta/deletedelta directories.
+   *
+   * @param fs the filesystem used for the directory lookup
+   * @param path the path of the table or partition needs to be cleaned
+   * @return The listed directory snapshot needs to be checked for cleaning
+   * @throws IOException on filesystem errors
+   */
+  public static Map<Path, HdfsDirSnapshot> getHdfsDirSnapshotsForCleaner(final FileSystem fs, final Path path)
+      throws IOException {
+    Map<Path, HdfsDirSnapshot> dirToSnapshots = new HashMap<>();
+    // depth first search
+    Deque<RemoteIterator<FileStatus>> stack = new ArrayDeque<>();
+    stack.push(fs.listStatusIterator(path));
+    while (!stack.isEmpty()) {
+      RemoteIterator<FileStatus> itr = stack.pop();
+      while (itr.hasNext()) {
+        FileStatus fStatus = itr.next();
+        Path fPath = fStatus.getPath();
+        if (acidHiddenFileFilter.accept(fPath) && acidTempDirFilter.accept(fPath)) {

Review Comment:
   we could use hiddenFileFilter as we don't need to include METADATA_FILE & ACID_FORMAT



Issue Time Tracking
-------------------

    Worklog Id:     (was: 752705)
    Time Spent: 2h  (was: 1h 50m)

> Major query-based compaction is skipped if partition is empty
> -------------------------------------------------------------
>
>                 Key: HIVE-25492
>                 URL: https://issues.apache.org/jira/browse/HIVE-25492
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Karen Coppage
>            Assignee: Antal Sinkovits
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently if the result of query-based compaction is an empty base, delta, or
> delete delta, the empty directory is deleted.
> This is because of minor compaction – if there are only deltas to compact,
> then no compacted delete delta should be created (only a compacted delta). In
> the same way, if there are only delete deltas to compact, then no compacted
> delta should be created (only a compacted delete delta).
> There is an issue with major compaction. If all the data in the partition has
> been deleted, then we should get an empty base directory after compaction.
> Instead, the empty base directory is deleted because it's empty and
> compaction claims to succeed but we end up with the same deltas/delete deltas
> we started with – basically compaction does not run.
> Where to start? MajorQueryCompactor#commitCompaction

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
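
The rule described in the issue above can be read as a small decision function: an empty result directory is dropped, except for the empty base written by a major compaction, which records that every row in the partition was deleted. The sketch below is only an illustration of that reading. The class, enum, and method names are hypothetical and do not exist in Hive, and this is not the change made in PR #3157.

```java
/** Hypothetical sketch of the keep/delete rule; none of these names exist in Hive. */
final class EmptyResultPolicySketch {

    /** The kind of compaction that produced the result directory. */
    enum CompactionType { MINOR, MAJOR }

    /** The kinds of directory a query-based compaction can write. */
    enum ResultDir { BASE, DELTA, DELETE_DELTA }

    private EmptyResultPolicySketch() {
    }

    /**
     * Decides whether a compaction result directory should be kept at commit time.
     *
     * @param type    the compaction type that produced the directory
     * @param dir     the kind of directory produced
     * @param isEmpty true if the directory contains no data files
     * @return true if the directory should be kept, false if it should be deleted
     */
    static boolean shouldKeep(CompactionType type, ResultDir dir, boolean isEmpty) {
        if (!isEmpty) {
            // Non-empty results are always kept.
            return true;
        }
        // An empty base from a major compaction means all data was deleted,
        // so it has to survive; every other empty result is dropped.
        return type == CompactionType.MAJOR && dir == ResultDir.BASE;
    }
}
```

With this rule, an empty delete delta from a minor compaction over deltas only is still removed, while a major compaction over a fully deleted partition leaves an empty base behind instead of the original deltas and delete deltas.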