[GitHub] [hive] maheshk114 commented on a change in pull request #1834: HIVE-24515 : Analyze table job can be skipped when stats populated are already accurate.

GitBox Tue, 12 Jan 2021 21:14:01 -0800


maheshk114 commented on a change in pull request #1834:
URL: https://github.com/apache/hive/pull/1834#discussion_r556266918




##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java
##########
@@ -204,6 +206,54 @@ public int persistColumnStats(Hive db, Table tbl) throws HiveException, MetaExce
   public void setDpPartSpecs(Collection<Partition> dpPartSpecs) {
   }
 
+  public static boolean canSkipStatsGeneration(String dbName, String tblName, String partName,
+                                               long statsWriteId, String queryValidWriteIdList) {
+    if (queryValidWriteIdList != null) { // Can be null if its not an ACID table.
+      ValidWriteIdList validWriteIdList = new ValidReaderWriteIdList(queryValidWriteIdList);
+      // Just check if the write ID is valid. If it's valid (i.e. we are allowed to see it),
+      // that means it cannot possibly be a concurrent write. As stats optimization is enabled
+      // only in case auto gather is enabled. Thus the stats must be updated by a valid committed
+      // transaction and stats generation can be skipped.
+      if (validWriteIdList.isWriteIdValid(statsWriteId)) {
+        try {
+          IMetaStoreClient msc = Hive.get().getMSC();
+          TxnState state = msc.findStatStatusByWriteId(dbName, tblName, partName, statsWriteId);

Review comment:
       This is to make sure that the txn has not been cleaned up by the compactor.
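The skip decision in the hunk above rests on two checks: the write ID that produced the existing stats must be visible in the query's write-ID snapshot, and, reading the review comment, a metastore lookup must still find the transaction that wrote them. The following is a minimal, self-contained Java sketch of that logic, not the PR's implementation. WriteIdSnapshot, StatsTxnLookup and the TxnState values used here are hypothetical stand-ins for ValidReaderWriteIdList, IMetaStoreClient.findStatStatusByWriteId and Hive's TxnState. The high-watermark-plus-exceptions model, returning false for non-ACID tables, and requiring a COMMITTED state are all assumptions, because the excerpt ends before the method's return logic.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class StatsSkipSketch {

    /** Hypothetical stand-in for the transaction state returned by the metastore. */
    public enum TxnState { OPEN, COMMITTED, ABORTED }

    /**
     * Simplified reader snapshot of write IDs (assumed model): every write ID up to
     * the high watermark is visible except the listed open or aborted ones.
     */
    public static final class WriteIdSnapshot {
        private final long highWatermark;
        private final long[] exceptions; // kept sorted so binarySearch works

        public WriteIdSnapshot(long highWatermark, long... exceptions) {
            this.highWatermark = highWatermark;
            this.exceptions = exceptions.clone();
            Arrays.sort(this.exceptions);
        }

        /** True if a reader holding this snapshot is allowed to see writeId. */
        public boolean isWriteIdValid(long writeId) {
            if (writeId > highWatermark) {
                return false; // allocated after the snapshot was taken
            }
            return Arrays.binarySearch(exceptions, writeId) < 0;
        }
    }

    /** Hypothetical lookup of the writing txn's state; null if no record remains. */
    public interface StatsTxnLookup {
        TxnState find(long statsWriteId);
    }

    /**
     * Returns true only when the stats were written under a write ID visible in the
     * query's snapshot AND the writing txn is still on record as committed (assumption).
     */
    public static boolean canSkipStatsGeneration(WriteIdSnapshot snapshot, long statsWriteId,
                                                 StatsTxnLookup lookup) {
        if (snapshot == null) {
            return false; // non-ACID table: this sketch never skips
        }
        if (!snapshot.isWriteIdValid(statsWriteId)) {
            return false; // not visible: possibly a concurrent, open or aborted write
        }
        // Second check: the txn record must still exist (e.g. not cleaned up by
        // the compactor) and, in this sketch, must be committed.
        TxnState state = lookup.find(statsWriteId);
        return state == TxnState.COMMITTED;
    }

    public static void main(String[] args) {
        WriteIdSnapshot snapshot = new WriteIdSnapshot(10, 7, 9); // 7 and 9 are not visible
        Map<Long, TxnState> states = new HashMap<>();
        states.put(5L, TxnState.COMMITTED);
        StatsTxnLookup lookup = id -> states.get(id);

        System.out.println(canSkipStatsGeneration(snapshot, 5, lookup));  // true
        System.out.println(canSkipStatsGeneration(snapshot, 7, lookup));  // false: in exception list
        System.out.println(canSkipStatsGeneration(snapshot, 12, lookup)); // false: above watermark
        System.out.println(canSkipStatsGeneration(snapshot, 6, lookup));  // false: no txn record
    }
}
```

In this sketch a missing transaction record (write ID 6) blocks the skip, so the analyze job would run again rather than trust stats whose writer can no longer be confirmed.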




