[ https://issues.apache.org/jira/browse/HIVE-26788?focusedWorklogId=830976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-830976 ]

ASF GitHub Bot logged work on HIVE-26788:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Dec/22 09:58
            Start Date: 05/Dec/22 09:58
    Worklog Time Spent: 10m 
      Work Description: SourabhBadhya commented on code in PR #3812:
URL: https://github.com/apache/hive/pull/3812#discussion_r1039378601


##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/StatsUpdater.java:
##########
@@ -73,6 +69,9 @@ public void gatherStats(CompactionInfo ci, HiveConf conf, String userName, Strin
                 sb.append(")");
             }
             sb.append(" compute statistics");
+            if (ci.isMinorCompaction()) {
+                sb.append(" noscan");

Review Comment:
   Minor compaction is not expected to compact many files, so in most scenarios
only the number of files changes after a minor compaction. Larger updates such
as major compaction, on the other hand, need to refresh all statistics (they
happen only once in a while) to keep the metadata up to date. The idea was
therefore to do a fast statistics update after a minor compaction and a
complete update after a major compaction.
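The diff appends ` noscan` to the generated statement only when `ci.isMinorCompaction()` is true. Below is a minimal sketch of that branching, assuming a hypothetical standalone helper `buildAnalyzeStatement` and class `AnalyzeStatementSketch`. It is not the actual `StatsUpdater.gatherStats` code (whose signature is truncated in the diff) and only illustrates the Hive `ANALYZE TABLE ... COMPUTE STATISTICS [NOSCAN]` shape.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class AnalyzeStatementSketch {

    // Hypothetical helper: builds an ANALYZE statement for a table or partition.
    // Minor compaction gets " noscan" (fast, file-metadata-only stats);
    // major compaction gets a full "compute statistics" run.
    // NOTE: real code would need to quote/escape identifiers and values; this sketch does not.
    public static String buildAnalyzeStatement(String db, String table,
                                               Map<String, String> partitionSpec,
                                               boolean isMinorCompaction) {
        StringBuilder sb = new StringBuilder("analyze table ")
                .append(db).append('.').append(table);
        if (partitionSpec != null && !partitionSpec.isEmpty()) {
            // Renders e.g. " partition(ds='2022-12-05', hr='01')"
            StringJoiner parts = new StringJoiner(", ", " partition(", ")");
            for (Map.Entry<String, String> e : partitionSpec.entrySet()) {
                parts.add(e.getKey() + "='" + e.getValue() + "'");
            }
            sb.append(parts);
        }
        sb.append(" compute statistics");
        if (isMinorCompaction) {
            sb.append(" noscan");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("ds", "2022-12-05");
        // analyze table default.t partition(ds='2022-12-05') compute statistics noscan
        System.out.println(buildAnalyzeStatement("default", "t", spec, true));
        // analyze table default.t partition(ds='2022-12-05') compute statistics
        System.out.println(buildAnalyzeStatement("default", "t", spec, false));
    }
}
```

The only difference between the two paths is the trailing `noscan` keyword, which keeps the minor-compaction path cheap while the major-compaction path still recomputes everything.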





Issue Time Tracking
-------------------

    Worklog Id:     (was: 830976)
    Time Spent: 1h  (was: 50m)

> Update stats of table/partition after minor compaction using noscan operation
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-26788
>                 URL: https://issues.apache.org/jira/browse/HIVE-26788
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sourabh Badhya
>            Assignee: Sourabh Badhya
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, statistics are not updated after minor compaction, because minor
> compaction changes only a few statistics (such as the number of files in the
> table/partition and the total size of the table/partition). It is better to
> use the NOSCAN operation for minor compaction, since NOSCAN updates statistics
> faster while still refreshing the relevant fields, such as the number of files
> and the total size of the table/partitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
