[jira] [Work logged] (HIVE-25492) Major query-based compaction is skipped if partition is empty

ASF GitHub Bot (Jira) Tue, 05 Apr 2022 01:22:05 -0700


     [ https://issues.apache.org/jira/browse/HIVE-25492?focusedWorklogId=752705&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752705 ]


ASF GitHub Bot logged work on HIVE-25492:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Apr/22 08:21
            Start Date: 05/Apr/22 08:21
    Worklog Time Spent: 10m 
      Work Description: deniskuzZ commented on code in PR #3157:
URL: https://github.com/apache/hive/pull/3157#discussion_r842506372


##########
ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java:
##########
@@ -1480,6 +1482,57 @@ private static ValidTxnList getValidTxnList(Configuration conf) {
     return validTxnList;
   }
 
+
+  /**
+   * In case of the cleaner, we don't need to go into file level, it is enough to collect base/delta/deletedelta directories.
+   *
+   * @param fs the filesystem used for the directory lookup
+   * @param path the path of the table or partition needs to be cleaned
+   * @return The listed directory snapshot needs to be checked for cleaning
+   * @throws IOException on filesystem errors
+   */
+  public static Map<Path, HdfsDirSnapshot> getHdfsDirSnapshotsForCleaner(final FileSystem fs, final Path path)
+          throws IOException {
+    Map<Path, HdfsDirSnapshot> dirToSnapshots = new HashMap<>();
+    // depth first search
+    Deque<RemoteIterator<FileStatus>> stack = new ArrayDeque<>();
+    stack.push(fs.listStatusIterator(path));
+    while (!stack.isEmpty()) {
+      RemoteIterator<FileStatus> itr = stack.pop();
+      while (itr.hasNext()) {
+        FileStatus fStatus = itr.next();
+        Path fPath = fStatus.getPath();
+        if (acidHiddenFileFilter.accept(fPath) && acidTempDirFilter.accept(fPath)) {

Review Comment:
   We could use hiddenFileFilter here instead of acidHiddenFileFilter, since 
the cleaner doesn't need to include METADATA_FILE and ACID_FORMAT.
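
To illustrate the difference between the two filters, here is a minimal, self-contained sketch. It is not the code from AcidUtils: the filters are modelled as plain predicates on file names, and the two marker-file names are assumed values standing in for the METADATA_FILE and ACID_FORMAT constants. The class name HiddenFilterSketch is hypothetical.

```java
import java.util.function.Predicate;

class HiddenFilterSketch {
    // Assumed marker-file names; Hive takes these from its own constants.
    static final String METADATA_FILE = "_metadata_acid";
    static final String ACID_FORMAT = "_orc_acid_version";

    // Plain hidden-file rule: reject any name starting with '_' or '.'.
    static final Predicate<String> hiddenFileFilter =
        name -> !name.startsWith("_") && !name.startsWith(".");

    // ACID-aware rule: let the two marker files through, otherwise
    // behave exactly like the plain hidden-file rule.
    static final Predicate<String> acidHiddenFileFilter =
        name -> name.startsWith(METADATA_FILE)
            || name.startsWith(ACID_FORMAT)
            || hiddenFileFilter.test(name);

    public static void main(String[] args) {
        String[] names = {
            "base_0000005", "delta_0000001_0000001", ".hive-staging",
            METADATA_FILE, ACID_FORMAT
        };
        for (String name : names) {
            System.out.printf("%-24s hidden=%b acidHidden=%b%n",
                name, hiddenFileFilter.test(name), acidHiddenFileFilter.test(name));
        }
    }
}
```

Under these assumptions the two filters differ only on the marker files, which are regular files rather than base/delta/delete delta directories, so a directory-only listing would not miss anything by using the simpler filter.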





Issue Time Tracking
-------------------

    Worklog Id:     (was: 752705)
    Time Spent: 2h  (was: 1h 50m)

> Major query-based compaction is skipped if partition is empty
> -------------------------------------------------------------
>
>                 Key: HIVE-25492
>                 URL: https://issues.apache.org/jira/browse/HIVE-25492
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Karen Coppage
>            Assignee: Antal Sinkovits
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently if the result of query-based compaction is an empty base, delta, or 
> delete delta, the empty directory is deleted.
> This is because of minor compaction – if there are only deltas to compact, 
> then no compacted delete delta should be created (only a compacted delta). In 
> the same way, if there are only delete deltas to compact, then no compacted 
> delta should be created (only a compacted delete delta).
> There is an issue with major compaction. If all the data in the partition has 
> been deleted, then we should get an empty base directory after compaction. 
> Instead, the empty base directory is deleted because it's empty, and 
> compaction claims to succeed, but we end up with the same deltas/delete deltas 
> we started with – in effect, compaction does not run.
> Where to start? MajorQueryCompactor#commitCompaction
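
The rule described in the issue can be sketched as a small decision function. This is only an illustration of the intended behaviour, not the fix that went into MajorQueryCompactor; the class, enum, and method names below are hypothetical.

```java
class EmptyResultSketch {
    enum CompactionType { MINOR, MAJOR }

    /**
     * Hypothetical rule: decide whether an empty compaction result
     * directory should be deleted.
     */
    static boolean shouldDeleteEmptyResult(CompactionType type, boolean resultIsEmpty) {
        if (!resultIsEmpty) {
            return false; // non-empty results are always kept
        }
        // Minor compaction: an empty delta or delete delta carries no
        // information, so it is dropped.
        // Major compaction: an empty base records that all rows were deleted,
        // so it must survive for the old deltas to be superseded.
        return type == CompactionType.MINOR;
    }

    public static void main(String[] args) {
        for (CompactionType type : CompactionType.values()) {
            System.out.println(type + " with empty result, delete? "
                + shouldDeleteEmptyResult(type, true));
        }
    }
}
```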



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
