Copilot commented on code in PR #6198:
URL: https://github.com/apache/hive/pull/6198#discussion_r2904525436
##########
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java:
##########
@@ -1961,6 +1971,12 @@ public void testColumnStatistics() throws Throwable {
Lists.newArrayList(partitions.get(0), partitions.get(1),
partitions.get(2)), Lists.newArrayList(colName), ENGINE);
assertEquals(1, stats2.size());
assertEquals(2, stats2.get(partitions.get(2)).size());
+ // test if all columns are deleted from parameter COLUMN_STATS_ACCURATE
+ Partition partition_0 = client.getPartition(dbName, tblName,
partitions.get(0));
+ Map<String, String> partitionParams = partition_0.getParameters();
+ String partition_column_stats_accurate =
partitionParams.get("COLUMN_STATS_ACCURATE");
+ assertTrue("parameter COLUMN_STATS_ACCURATE is not accurate in " +
partitions.get(0),partition_column_stats_accurate == null ||
+ (!table_column_stats_accurate.contains(colName[0]) &&
!table_column_stats_accurate.contains(colName[1])));
Review Comment:
The assertion on lines 1978–1979 checks `table_column_stats_accurate` (the
table-level variable from the outer scope) instead of
`partition_column_stats_accurate` (the partition-level variable defined on line
1977). This means the test is not actually verifying that the partition's
`COLUMN_STATS_ACCURATE` parameter was correctly updated — it's re-checking the
table-level value. Both conditions should reference
`partition_column_stats_accurate`.
```suggestion
assertTrue("parameter COLUMN_STATS_ACCURATE is not accurate in " +
partitions.get(0), partition_column_stats_accurate == null ||
(!partition_column_stats_accurate.contains(colName[0]) &&
!partition_column_stats_accurate.contains(colName[1])));
```
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java:
##########
@@ -10138,6 +10170,18 @@ private boolean
deleteTableColumnStatisticsViaJdo(String catName, String dbName,
} else {
throw new NoSuchObjectException("Column stats doesn't exist for db=" +
dbName + " table="
+ tableName + " col=" + String.join(", ", colNames));
+ }
+ // get the persistent object MTable
+ MTable mTable = getMTable(catName, dbName, tableName);
+ if (mTable != null) {
+ Map<String, String> tableParams = mTable.getParameters();
+ if (tableParams != null) {
+ if (colNames == null || colNames.isEmpty()) {
+ StatsSetupConst.clearColumnStatsState(tableParams);
+ } else {
+ StatsSetupConst.removeColumnStatsState(tableParams, colNames);
+ }
+ }
}
Review Comment:
This block is inside the `else` branch of the `if (result > 0)` / `else`
conditional that throws `NoSuchObjectException`. Because the closing brace of
the `else` (line 10173 in the diff) is placed before this block, the
`COLUMN_STATS_ACCURATE` update only executes when the deletion returned zero
rows (i.e., when the column stats did not exist), not on success. The block
should be inside the successful branch so that it only runs after stats are
actually deleted. Verify and correct the placement of the braces to ensure this
update runs after a successful deletion.
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##########
@@ -2760,6 +2763,165 @@ public List<Void> run(List<String> input) throws
Exception {
return true;
}
+ /**
+ a helper function which will get the current COLUMN_STATS_ACCURATE
parameter on table level
+ and update the COLUMN_STATS_ACCURATE parameter with the new value on
table level using directSql
+ */
+ public long updateColumnStatsAccurateForTable(Table table, List<String>
droppedCols) throws MetaException {
+ Map<String, String> params = table.getParameters();
+ // get the current COLUMN_STATS_ACCURATE
+ String currentValue = params.get(StatsSetupConst.COLUMN_STATS_ACCURATE);
+ if (currentValue == null) {
+ return 0;
+ }
+ try {
+ // if the dropping columns is empty, that means we delete all the columns
+ if (droppedCols == null || droppedCols.isEmpty()) {
+ StatsSetupConst.clearColumnStatsState(params);
+ } else {
+ StatsSetupConst.removeColumnStatsState(params, droppedCols);
+ }
+
+ String updatedValue = params.get(StatsSetupConst.COLUMN_STATS_ACCURATE);
+ // if the COL_STATS_ACCURATE has changed, then update it using directSql
+ if (currentValue.equals(updatedValue)) {
+ return 0;
+ }
+ return updateTableParam(table, StatsSetupConst.COLUMN_STATS_ACCURATE,
currentValue, updatedValue);
+ } catch (Exception e) {
+ throw new MetaException("Failed to parse/update COLUMN_STATS_ACCURATE: "
+ e.getMessage());
+ }
+ }
+
+
+
+ public boolean updateColumnStatsAccurateForPartitions(String catName, String
dbName, Table table,
+ List<String> partNames,
List<String> colNames) throws MetaException {
+ if (partNames == null || partNames.isEmpty()) {
+ return true;
+ }
+
+ ObjectMapper mapper = new ObjectMapper();
+
+ // If colNames is empty, then all the column stats of all columns should
be deleted fetch all table column names
+ List<String> effectiveColNames;
+ if (colNames == null || colNames.isEmpty()) {
+ if (table.getSd().getCols() == null) {
+ effectiveColNames = new ArrayList<>();
+ } else {
+ effectiveColNames = table.getSd().getCols().stream()
+ .map(f -> f.getName().toLowerCase())
+ .collect(Collectors.toList());
+ }
+ } else {
+ effectiveColNames =
colNames.stream().map(String::toLowerCase).collect(Collectors.toList());
+ }
+
+ try {
+ Batchable.runBatched(batchSize, partNames, new Batchable<String, Void>()
{
+ @Override
+ public List<Void> run(List<String> input) throws Exception {
+ // 1. Construct SQL filter for partition names
+ String sqlFilter = PARTITIONS + ".\"PART_NAME\" in (" +
makeParams(input.size()) + ")";
+
+ // 2. Fetch PART_IDs of the partitions which are need to be changed
+ List<Long> partitionIds = getPartitionIdsViaSqlFilter(
+ catName, dbName, table.getTableName(), sqlFilter, input,
Collections.emptyList(), -1);
+
+ if (partitionIds.isEmpty()) return null;
+
+ // 3. Get current COLUMN_STATS_ACCURATE values
+ Map<Long, String> partStatsAccurateMap =
getColumnStatsAccurateByPartitionIds(partitionIds);
+
+ // 4. Iterate each partition to update COLUMN_STATS_ACCURATE
+ for (Long partId : partitionIds) {
+ String currentValue = partStatsAccurateMap.get(partId);
+ if (currentValue == null) continue;
+
+ try {
+ Map<String, Object> statsMap = mapper.readValue(
+ currentValue, new TypeReference<Map<String, Object>>()
{});
+ Object columnStatsObj = statsMap.get("COLUMN_STATS");
+
+ boolean changed = false;
+ if (columnStatsObj instanceof Map) {
+ Map<String, String> columnStats = (Map<String, String>)
columnStatsObj;
+ for (String col : effectiveColNames) {
+ if (columnStats.remove(col) != null) {
+ changed = true;
+ }
+ }
+
+ if (columnStats.isEmpty()) {
+ statsMap.remove("COLUMN_STATS");
+ changed = true;
+ }
+ }
+
+ if (!statsMap.containsKey("COLUMN_STATS")) {
+ if (statsMap.remove("BASIC_STATS") != null) {
Review Comment:
The string literal `"BASIC_STATS"` is used as a magic string. The existing
`StatsSetupConst` class already defines constants for stats-related keys (e.g.,
`StatsSetupConst.BASIC_STATS`). This literal should be replaced with the
appropriate constant to be consistent with the rest of the codebase and avoid
potential mismatches.
```suggestion
if (statsMap.remove(StatsSetupConst.BASIC_STATS) != null) {
```
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##########
@@ -2760,6 +2763,165 @@ public List<Void> run(List<String> input) throws
Exception {
return true;
}
+ /**
+ a helper function which will get the current COLUMN_STATS_ACCURATE
parameter on table level
+ and update the COLUMN_STATS_ACCURATE parameter with the new value on
table level using directSql
Review Comment:
The Javadoc comment for `updateColumnStatsAccurateForTable` is written in
lowercase and does not follow standard Javadoc format. It also doesn't document
the parameters (`table`, `droppedCols`) or the return value. Consider using a
proper Javadoc block with `@param` and `@return` tags.
```suggestion
* Updates the {@code COLUMN_STATS_ACCURATE} parameter for the given table
using direct SQL.
*
* <p>The current value of {@code COLUMN_STATS_ACCURATE} is read from the
table parameters and
* adjusted to reflect the columns that have been dropped.</p>
*
* @param table the table whose {@code COLUMN_STATS_ACCURATE} parameter
should be updated
* @param droppedCols list of column names that have been dropped; if
{@code null} or empty,
* all column statistics are cleared
* @return the number of table parameter rows that were updated in the
metastore, or {@code 0}
* if no update was necessary
* @throws MetaException if the {@code COLUMN_STATS_ACCURATE} parameter
cannot be parsed or updated
```
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##########
@@ -2760,6 +2763,165 @@ public List<Void> run(List<String> input) throws
Exception {
return true;
}
+ /**
+ a helper function which will get the current COLUMN_STATS_ACCURATE
parameter on table level
+ and update the COLUMN_STATS_ACCURATE parameter with the new value on
table level using directSql
+ */
+ public long updateColumnStatsAccurateForTable(Table table, List<String>
droppedCols) throws MetaException {
+ Map<String, String> params = table.getParameters();
+ // get the current COLUMN_STATS_ACCURATE
+ String currentValue = params.get(StatsSetupConst.COLUMN_STATS_ACCURATE);
+ if (currentValue == null) {
+ return 0;
+ }
+ try {
+ // if the dropping columns is empty, that means we delete all the columns
+ if (droppedCols == null || droppedCols.isEmpty()) {
+ StatsSetupConst.clearColumnStatsState(params);
+ } else {
+ StatsSetupConst.removeColumnStatsState(params, droppedCols);
+ }
+
+ String updatedValue = params.get(StatsSetupConst.COLUMN_STATS_ACCURATE);
+ // if the COL_STATS_ACCURATE has changed, then update it using directSql
+ if (currentValue.equals(updatedValue)) {
+ return 0;
+ }
+ return updateTableParam(table, StatsSetupConst.COLUMN_STATS_ACCURATE,
currentValue, updatedValue);
+ } catch (Exception e) {
+ throw new MetaException("Failed to parse/update COLUMN_STATS_ACCURATE: "
+ e.getMessage());
+ }
+ }
+
+
+
+ public boolean updateColumnStatsAccurateForPartitions(String catName, String
dbName, Table table,
+ List<String> partNames,
List<String> colNames) throws MetaException {
+ if (partNames == null || partNames.isEmpty()) {
+ return true;
+ }
+
+ ObjectMapper mapper = new ObjectMapper();
+
+ // If colNames is empty, then all the column stats of all columns should
be deleted fetch all table column names
+ List<String> effectiveColNames;
+ if (colNames == null || colNames.isEmpty()) {
+ if (table.getSd().getCols() == null) {
+ effectiveColNames = new ArrayList<>();
+ } else {
+ effectiveColNames = table.getSd().getCols().stream()
+ .map(f -> f.getName().toLowerCase())
+ .collect(Collectors.toList());
+ }
+ } else {
+ effectiveColNames =
colNames.stream().map(String::toLowerCase).collect(Collectors.toList());
+ }
+
+ try {
+ Batchable.runBatched(batchSize, partNames, new Batchable<String, Void>()
{
+ @Override
+ public List<Void> run(List<String> input) throws Exception {
+ // 1. Construct SQL filter for partition names
+ String sqlFilter = PARTITIONS + ".\"PART_NAME\" in (" +
makeParams(input.size()) + ")";
+
+ // 2. Fetch PART_IDs of the partitions which are need to be changed
+ List<Long> partitionIds = getPartitionIdsViaSqlFilter(
+ catName, dbName, table.getTableName(), sqlFilter, input,
Collections.emptyList(), -1);
+
+ if (partitionIds.isEmpty()) return null;
+
+ // 3. Get current COLUMN_STATS_ACCURATE values
+ Map<Long, String> partStatsAccurateMap =
getColumnStatsAccurateByPartitionIds(partitionIds);
+
+ // 4. Iterate each partition to update COLUMN_STATS_ACCURATE
+ for (Long partId : partitionIds) {
+ String currentValue = partStatsAccurateMap.get(partId);
+ if (currentValue == null) continue;
+
+ try {
+ Map<String, Object> statsMap = mapper.readValue(
+ currentValue, new TypeReference<Map<String, Object>>()
{});
+ Object columnStatsObj = statsMap.get("COLUMN_STATS");
Review Comment:
Similar to `"BASIC_STATS"`, the string literal `"COLUMN_STATS"` should be
replaced with `StatsSetupConst.COLUMN_STATS` (or the equivalent constant) for
consistency with the rest of the codebase.
##########
itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStoreUpdateUsingEvents.java:
##########
@@ -829,7 +829,16 @@ private String getValidWriteIds(String dbName, String
tblName) throws Throwable
private void validateTablePara(String dbName, String tblName) throws
Throwable {
Table tblRead = rawStore.getTable(DEFAULT_CATALOG_NAME, dbName, tblName);
Table tblRead1 = sharedCache.getTableFromCache(DEFAULT_CATALOG_NAME,
dbName, tblName);
- Assert.assertEquals(tblRead.getParameters(), tblRead1.getParameters());
+ // Prepare both the expected and actual table parameters
+ Map<String, String> expected = new HashMap<>(tblRead.getParameters());
+ Map<String, String> actual = new HashMap<>(tblRead1.getParameters());
+
+ // Remove the COLUMN_STATS_ACCURATE entry from both maps, because it is
now completely removed
+ expected.remove("COLUMN_STATS_ACCURATE");
+ actual.remove("COLUMN_STATS_ACCURATE");
Review Comment:
The string literal `"COLUMN_STATS_ACCURATE"` should be replaced with
`StatsSetupConst.COLUMN_STATS_ACCURATE` to be consistent with the rest of the
codebase and avoid hard-coded strings.
```suggestion
expected.remove(StatsSetupConst.COLUMN_STATS_ACCURATE);
actual.remove(StatsSetupConst.COLUMN_STATS_ACCURATE);
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]