Copilot commented on code in PR #6198:
URL: https://github.com/apache/hive/pull/6198#discussion_r2904565179


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##########
@@ -2760,6 +2763,165 @@ public List<Void> run(List<String> input) throws 
Exception {
     return true;
   }
 
+  /**
+      a helper function which will get the current COLUMN_STATS_ACCURATE 
parameter on table level
+      and update the COLUMN_STATS_ACCURATE parameter with the new value on 
table level using directSql
+   */
+  public long updateColumnStatsAccurateForTable(Table table, List<String> 
droppedCols) throws MetaException {
+    Map<String, String> params = table.getParameters();
+    // get the current COLUMN_STATS_ACCURATE
+    String currentValue = params.get(StatsSetupConst.COLUMN_STATS_ACCURATE);
+    if (currentValue == null) {
+      return 0;
+    }
+    try {
+      // if the dropping columns is empty, that means we delete all the columns
+      if (droppedCols == null || droppedCols.isEmpty()) {
+        StatsSetupConst.clearColumnStatsState(params);
+      } else {
+        StatsSetupConst.removeColumnStatsState(params, droppedCols);
+      }
+
+      String updatedValue = params.get(StatsSetupConst.COLUMN_STATS_ACCURATE);
+      // if the COL_STATS_ACCURATE has changed, then update it using directSql
+      if (currentValue.equals(updatedValue)) {
+        return 0;
+      }
+      return updateTableParam(table, StatsSetupConst.COLUMN_STATS_ACCURATE, 
currentValue, updatedValue);
+    } catch (Exception e) {
+      throw new MetaException("Failed to parse/update COLUMN_STATS_ACCURATE: " 
+ e.getMessage());
+    }
+  }
+
+
+
+  public boolean updateColumnStatsAccurateForPartitions(String catName, String 
dbName, Table table,
+                                                     List<String> partNames, 
List<String> colNames) throws MetaException {
+    if (partNames == null || partNames.isEmpty()) {
+      return true;
+    }
+
+    ObjectMapper mapper = new ObjectMapper();
+
+    // If colNames is empty, then all the column stats of all columns should 
be deleted fetch all table column names
+    List<String> effectiveColNames;
+    if (colNames == null || colNames.isEmpty()) {
+      if (table.getSd().getCols() == null) {
+        effectiveColNames = new ArrayList<>();
+      } else {
+        effectiveColNames = table.getSd().getCols().stream()
+                .map(f -> f.getName().toLowerCase())
+                .collect(Collectors.toList());
+      }
+    } else {
+      effectiveColNames = 
colNames.stream().map(String::toLowerCase).collect(Collectors.toList());
+    }
+
+    try {
+      Batchable.runBatched(batchSize, partNames, new Batchable<String, Void>() 
{
+        @Override
+        public List<Void> run(List<String> input) throws Exception {
+          // 1. Construct SQL filter for partition names
+          String sqlFilter = PARTITIONS + ".\"PART_NAME\" in (" + 
makeParams(input.size()) + ")";
+
+          // 2. Fetch PART_IDs of the partitions which are need to be changed
+          List<Long> partitionIds = getPartitionIdsViaSqlFilter(
+                  catName, dbName, table.getTableName(), sqlFilter, input, 
Collections.emptyList(), -1);
+
+          if (partitionIds.isEmpty()) return null;
+
+          // 3. Get current COLUMN_STATS_ACCURATE values
+          Map<Long, String> partStatsAccurateMap = 
getColumnStatsAccurateByPartitionIds(partitionIds);
+
+          // 4. Iterate each partition to update COLUMN_STATS_ACCURATE
+          for (Long partId : partitionIds) {
+            String currentValue = partStatsAccurateMap.get(partId);
+            if (currentValue == null) continue;
+
+            try {
+              Map<String, Object> statsMap = mapper.readValue(
+                      currentValue, new TypeReference<Map<String, Object>>() 
{});
+              Object columnStatsObj = statsMap.get("COLUMN_STATS");
+
+              boolean changed = false;
+              if (columnStatsObj instanceof Map) {
+                Map<String, String> columnStats = (Map<String, String>) 
columnStatsObj;
+                for (String col : effectiveColNames) {
+                  if (columnStats.remove(col) != null) {
+                    changed = true;
+                  }
+                }
+
+                if (columnStats.isEmpty()) {
+                  statsMap.remove("COLUMN_STATS");
+                  changed = true;
+                }
+              }
+
+              if (!statsMap.containsKey("COLUMN_STATS")) {
+                if (statsMap.remove("BASIC_STATS") != null) {
+                  changed = true;
+                }
+              }

Review Comment:
   `BASIC_STATS` is removed from the stats map whenever `COLUMN_STATS` is 
absent, regardless of whether the current operation removed `COLUMN_STATS` or 
it was already absent before. This can incorrectly strip `BASIC_STATS` from 
partitions where column stats for the specified columns were never present. The 
removal of `BASIC_STATS` should only occur if `COLUMN_STATS` was just emptied 
(i.e., was present before and became empty after the column removals), not 
unconditionally when `COLUMN_STATS` is absent.



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java:
##########
@@ -10138,6 +10170,18 @@ private boolean 
deleteTableColumnStatisticsViaJdo(String catName, String dbName,
       } else {
         throw new NoSuchObjectException("Column stats doesn't exist for db=" + 
dbName + " table="
             + tableName + " col=" + String.join(", ", colNames));

Review Comment:
   The `COLUMN_STATS_ACCURATE` update block (lines 10174–10185) is placed after 
the closing `}` of the `if/else` block but before `ret = commitTransaction()`. 
However, the indentation and structure suggests this code is actually 
unreachable when the `else` branch is taken (which throws), but it is also 
outside the `if` block that executes the successful deletion. More critically, 
if `colNames` is `null` on line 10172, `String.join(", ", colNames)` will throw 
a `NullPointerException`. Ensure that the `colNames` null case is handled 
before calling `String.join`.
   ```suggestion
           String colNamesStr = (colNames == null) ? "null" : String.join(", ", 
colNames);
           throw new NoSuchObjectException("Column stats doesn't exist for db=" 
+ dbName + " table="
               + tableName + " col=" + colNamesStr);
   ```



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##########
@@ -2760,6 +2763,165 @@ public List<Void> run(List<String> input) throws 
Exception {
     return true;
   }
 
+  /**
+      a helper function which will get the current COLUMN_STATS_ACCURATE 
parameter on table level
+      and update the COLUMN_STATS_ACCURATE parameter with the new value on 
table level using directSql
+   */
+  public long updateColumnStatsAccurateForTable(Table table, List<String> 
droppedCols) throws MetaException {
+    Map<String, String> params = table.getParameters();
+    // get the current COLUMN_STATS_ACCURATE
+    String currentValue = params.get(StatsSetupConst.COLUMN_STATS_ACCURATE);
+    if (currentValue == null) {
+      return 0;
+    }
+    try {
+      // if the dropping columns is empty, that means we delete all the columns
+      if (droppedCols == null || droppedCols.isEmpty()) {
+        StatsSetupConst.clearColumnStatsState(params);
+      } else {
+        StatsSetupConst.removeColumnStatsState(params, droppedCols);
+      }
+
+      String updatedValue = params.get(StatsSetupConst.COLUMN_STATS_ACCURATE);
+      // if the COL_STATS_ACCURATE has changed, then update it using directSql
+      if (currentValue.equals(updatedValue)) {
+        return 0;
+      }
+      return updateTableParam(table, StatsSetupConst.COLUMN_STATS_ACCURATE, 
currentValue, updatedValue);
+    } catch (Exception e) {
+      throw new MetaException("Failed to parse/update COLUMN_STATS_ACCURATE: " 
+ e.getMessage());
+    }
+  }
+
+
+
+  public boolean updateColumnStatsAccurateForPartitions(String catName, String 
dbName, Table table,
+                                                     List<String> partNames, 
List<String> colNames) throws MetaException {

Review Comment:
   The `updateColumnStatsAccurateForTable` method returns `long` (the number of 
rows updated) while `updateColumnStatsAccurateForPartitions` returns `boolean`. 
These two methods serve the same conceptual purpose (updating 
`COLUMN_STATS_ACCURATE` after a stats drop) but have inconsistent return types, 
which can confuse callers. Consider aligning them to use the same return type.



##########
itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStoreUpdateUsingEvents.java:
##########
@@ -829,7 +829,16 @@ private String getValidWriteIds(String dbName, String 
tblName) throws Throwable
   private void validateTablePara(String dbName, String tblName) throws 
Throwable {
     Table tblRead = rawStore.getTable(DEFAULT_CATALOG_NAME, dbName, tblName);
     Table tblRead1 = sharedCache.getTableFromCache(DEFAULT_CATALOG_NAME, 
dbName, tblName);
-    Assert.assertEquals(tblRead.getParameters(), tblRead1.getParameters());
+    // Prepare both the expected and actual table parameters
+    Map<String, String> expected = new HashMap<>(tblRead.getParameters());
+    Map<String, String> actual = new HashMap<>(tblRead1.getParameters());
+
+    // Remove the COLUMN_STATS_ACCURATE entry from both maps, because it is 
now completely removed
+    expected.remove("COLUMN_STATS_ACCURATE");
+    actual.remove("COLUMN_STATS_ACCURATE");
+
+    // Now assert equality without the COLUMN_STATS_ACCURATE key

Review Comment:
   Rather than removing `COLUMN_STATS_ACCURATE` from both maps to skip 
comparison, consider asserting the expected value of `COLUMN_STATS_ACCURATE` 
explicitly (e.g., that it is `null` or has the expected content after the stats 
drop). Silently ignoring this key weakens the test and could mask future 
regressions where the cached value diverges from the raw store value in 
unexpected ways.
   ```suggestion
   
       Map<String, String> expected = tblRead.getParameters();
       Map<String, String> actual = tblRead1.getParameters();
   
       // COLUMN_STATS_ACCURATE is expected to be completely removed after 
stats are dropped.
       Assert.assertFalse("Expected raw store table parameters to not contain 
COLUMN_STATS_ACCURATE",
           expected.containsKey(StatsSetupConst.COLUMN_STATS_ACCURATE));
       Assert.assertFalse("Expected cached table parameters to not contain 
COLUMN_STATS_ACCURATE",
           actual.containsKey(StatsSetupConst.COLUMN_STATS_ACCURATE));
   
       // Assert full equality of the parameter maps
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to