Copilot commented on code in PR #6198:
URL: https://github.com/apache/hive/pull/6198#discussion_r2904565179
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##########
@@ -2760,6 +2763,165 @@ public List<Void> run(List<String> input) throws
Exception {
return true;
}
+ /**
+ a helper function which will get the current COLUMN_STATS_ACCURATE
parameter on table level
+ and update the COLUMN_STATS_ACCURATE parameter with the new value on
table level using directSql
+ */
+ public long updateColumnStatsAccurateForTable(Table table, List<String>
droppedCols) throws MetaException {
+ Map<String, String> params = table.getParameters();
+ // get the current COLUMN_STATS_ACCURATE
+ String currentValue = params.get(StatsSetupConst.COLUMN_STATS_ACCURATE);
+ if (currentValue == null) {
+ return 0;
+ }
+ try {
+ // if the dropping columns is empty, that means we delete all the columns
+ if (droppedCols == null || droppedCols.isEmpty()) {
+ StatsSetupConst.clearColumnStatsState(params);
+ } else {
+ StatsSetupConst.removeColumnStatsState(params, droppedCols);
+ }
+
+ String updatedValue = params.get(StatsSetupConst.COLUMN_STATS_ACCURATE);
+ // if the COL_STATS_ACCURATE has changed, then update it using directSql
+ if (currentValue.equals(updatedValue)) {
+ return 0;
+ }
+ return updateTableParam(table, StatsSetupConst.COLUMN_STATS_ACCURATE,
currentValue, updatedValue);
+ } catch (Exception e) {
+ throw new MetaException("Failed to parse/update COLUMN_STATS_ACCURATE: "
+ e.getMessage());
+ }
+ }
+
+
+
+ public boolean updateColumnStatsAccurateForPartitions(String catName, String
dbName, Table table,
+ List<String> partNames,
List<String> colNames) throws MetaException {
+ if (partNames == null || partNames.isEmpty()) {
+ return true;
+ }
+
+ ObjectMapper mapper = new ObjectMapper();
+
+ // If colNames is empty, then all the column stats of all columns should
be deleted fetch all table column names
+ List<String> effectiveColNames;
+ if (colNames == null || colNames.isEmpty()) {
+ if (table.getSd().getCols() == null) {
+ effectiveColNames = new ArrayList<>();
+ } else {
+ effectiveColNames = table.getSd().getCols().stream()
+ .map(f -> f.getName().toLowerCase())
+ .collect(Collectors.toList());
+ }
+ } else {
+ effectiveColNames =
colNames.stream().map(String::toLowerCase).collect(Collectors.toList());
+ }
+
+ try {
+ Batchable.runBatched(batchSize, partNames, new Batchable<String, Void>()
{
+ @Override
+ public List<Void> run(List<String> input) throws Exception {
+ // 1. Construct SQL filter for partition names
+ String sqlFilter = PARTITIONS + ".\"PART_NAME\" in (" +
makeParams(input.size()) + ")";
+
+ // 2. Fetch PART_IDs of the partitions which are need to be changed
+ List<Long> partitionIds = getPartitionIdsViaSqlFilter(
+ catName, dbName, table.getTableName(), sqlFilter, input,
Collections.emptyList(), -1);
+
+ if (partitionIds.isEmpty()) return null;
+
+ // 3. Get current COLUMN_STATS_ACCURATE values
+ Map<Long, String> partStatsAccurateMap =
getColumnStatsAccurateByPartitionIds(partitionIds);
+
+ // 4. Iterate each partition to update COLUMN_STATS_ACCURATE
+ for (Long partId : partitionIds) {
+ String currentValue = partStatsAccurateMap.get(partId);
+ if (currentValue == null) continue;
+
+ try {
+ Map<String, Object> statsMap = mapper.readValue(
+ currentValue, new TypeReference<Map<String, Object>>()
{});
+ Object columnStatsObj = statsMap.get("COLUMN_STATS");
+
+ boolean changed = false;
+ if (columnStatsObj instanceof Map) {
+ Map<String, String> columnStats = (Map<String, String>)
columnStatsObj;
+ for (String col : effectiveColNames) {
+ if (columnStats.remove(col) != null) {
+ changed = true;
+ }
+ }
+
+ if (columnStats.isEmpty()) {
+ statsMap.remove("COLUMN_STATS");
+ changed = true;
+ }
+ }
+
+ if (!statsMap.containsKey("COLUMN_STATS")) {
+ if (statsMap.remove("BASIC_STATS") != null) {
+ changed = true;
+ }
+ }
Review Comment:
`BASIC_STATS` is removed from the stats map whenever `COLUMN_STATS` is
absent, regardless of whether the current operation removed `COLUMN_STATS` or
it was already absent before. This can incorrectly strip `BASIC_STATS` from
partitions where column stats for the specified columns were never present. The
removal of `BASIC_STATS` should only occur if `COLUMN_STATS` was just emptied
(i.e., was present before and became empty after the column removals), not
unconditionally when `COLUMN_STATS` is absent.
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java:
##########
@@ -10138,6 +10170,18 @@ private boolean
deleteTableColumnStatisticsViaJdo(String catName, String dbName,
} else {
throw new NoSuchObjectException("Column stats doesn't exist for db=" +
dbName + " table="
+ tableName + " col=" + String.join(", ", colNames));
Review Comment:
The `COLUMN_STATS_ACCURATE` update block (lines 10174–10185) is placed after
the closing `}` of the `if/else` block but before `ret = commitTransaction()`.
However, the indentation and structure suggests this code is actually
unreachable when the `else` branch is taken (which throws), but it is also
outside the `if` block that executes the successful deletion. More critically,
if `colNames` is `null` on line 10172, `String.join(", ", colNames)` will throw
a `NullPointerException`. Ensure that the `colNames` null case is handled
before calling `String.join`.
```suggestion
String colNamesStr = (colNames == null) ? "null" : String.join(", ",
colNames);
throw new NoSuchObjectException("Column stats doesn't exist for db="
+ dbName + " table="
+ tableName + " col=" + colNamesStr);
```
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##########
@@ -2760,6 +2763,165 @@ public List<Void> run(List<String> input) throws
Exception {
return true;
}
+ /**
+ a helper function which will get the current COLUMN_STATS_ACCURATE
parameter on table level
+ and update the COLUMN_STATS_ACCURATE parameter with the new value on
table level using directSql
+ */
+ public long updateColumnStatsAccurateForTable(Table table, List<String>
droppedCols) throws MetaException {
+ Map<String, String> params = table.getParameters();
+ // get the current COLUMN_STATS_ACCURATE
+ String currentValue = params.get(StatsSetupConst.COLUMN_STATS_ACCURATE);
+ if (currentValue == null) {
+ return 0;
+ }
+ try {
+ // if the dropping columns is empty, that means we delete all the columns
+ if (droppedCols == null || droppedCols.isEmpty()) {
+ StatsSetupConst.clearColumnStatsState(params);
+ } else {
+ StatsSetupConst.removeColumnStatsState(params, droppedCols);
+ }
+
+ String updatedValue = params.get(StatsSetupConst.COLUMN_STATS_ACCURATE);
+ // if the COL_STATS_ACCURATE has changed, then update it using directSql
+ if (currentValue.equals(updatedValue)) {
+ return 0;
+ }
+ return updateTableParam(table, StatsSetupConst.COLUMN_STATS_ACCURATE,
currentValue, updatedValue);
+ } catch (Exception e) {
+ throw new MetaException("Failed to parse/update COLUMN_STATS_ACCURATE: "
+ e.getMessage());
+ }
+ }
+
+
+
+ public boolean updateColumnStatsAccurateForPartitions(String catName, String
dbName, Table table,
+ List<String> partNames,
List<String> colNames) throws MetaException {
Review Comment:
The `updateColumnStatsAccurateForTable` method returns `long` (the number of
rows updated) while `updateColumnStatsAccurateForPartitions` returns `boolean`.
These two methods serve the same conceptual purpose (updating
`COLUMN_STATS_ACCURATE` after a stats drop) but have inconsistent return types,
which can confuse callers. Consider aligning them to use the same return type.
##########
itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStoreUpdateUsingEvents.java:
##########
@@ -829,7 +829,16 @@ private String getValidWriteIds(String dbName, String
tblName) throws Throwable
private void validateTablePara(String dbName, String tblName) throws
Throwable {
Table tblRead = rawStore.getTable(DEFAULT_CATALOG_NAME, dbName, tblName);
Table tblRead1 = sharedCache.getTableFromCache(DEFAULT_CATALOG_NAME,
dbName, tblName);
- Assert.assertEquals(tblRead.getParameters(), tblRead1.getParameters());
+ // Prepare both the expected and actual table parameters
+ Map<String, String> expected = new HashMap<>(tblRead.getParameters());
+ Map<String, String> actual = new HashMap<>(tblRead1.getParameters());
+
+ // Remove the COLUMN_STATS_ACCURATE entry from both maps, because it is
now completely removed
+ expected.remove("COLUMN_STATS_ACCURATE");
+ actual.remove("COLUMN_STATS_ACCURATE");
+
+ // Now assert equality without the COLUMN_STATS_ACCURATE key
Review Comment:
Rather than removing `COLUMN_STATS_ACCURATE` from both maps to skip
comparison, consider asserting the expected value of `COLUMN_STATS_ACCURATE`
explicitly (e.g., that it is `null` or has the expected content after the stats
drop). Silently ignoring this key weakens the test and could mask future
regressions where the cached value diverges from the raw store value in
unexpected ways.
```suggestion
Map<String, String> expected = tblRead.getParameters();
Map<String, String> actual = tblRead1.getParameters();
// COLUMN_STATS_ACCURATE is expected to be completely removed after
stats are dropped.
Assert.assertFalse("Expected raw store table parameters to not contain
COLUMN_STATS_ACCURATE",
expected.containsKey(StatsSetupConst.COLUMN_STATS_ACCURATE));
Assert.assertFalse("Expected cached table parameters to not contain
COLUMN_STATS_ACCURATE",
actual.containsKey(StatsSetupConst.COLUMN_STATS_ACCURATE));
// Assert full equality of the parameter maps
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]