SourabhBadhya commented on code in PR #4397:
URL: https://github.com/apache/hive/pull/4397#discussion_r1223771475
##########
ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java:
##########
@@ -220,8 +220,10 @@ public int persistColumnStats(Hive db, Table tbl) throws
HiveException, MetaExce
start = System.currentTimeMillis();
if (tbl != null && tbl.isNonNative() &&
tbl.getStorageHandler().canSetColStatistics(tbl)) {
tbl.getStorageHandler().setColStatistics(tbl, colStats);
+ } else {
+ // Set table or partition column statistics in metastore.
+ db.setPartitionColumnStatistics(request);
}
- db.setPartitionColumnStatistics(request);
Review Comment:
@zhangbutao I agree with your point. However, storing stats in 2 places has
its pros & cons -
Pros -
1. We can fall back to the metastore by changing the config to
`hive.iceberg.stats.source=metastore` if we are not able to get stats
from Puffin files.
Cons -
1. Any change made to Puffin files by external clients is not visible to
the metastore.
2. Performance overhead of the extra metastore DB calls needed to store
column stats.
With the approach in this PR, users who are not able to get stats from Puffin
and want to use the metastore instead have to set
`hive.iceberg.stats.source=metastore` and then execute `ANALYZE TABLE <tableName>
COMPUTE STATISTICS FOR COLUMNS`. (This adds the overhead of one more
ANALYZE query.)
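To make the two options concrete, here is a minimal Java sketch of the write routing. It is not the Hive code: `StatsSink`, `RecordingSink`, `persistSingle` and `persistDual` are hypothetical names used only for illustration, and the boolean flag stands in for the `isNonNative()` / `canSetColStatistics(tbl)` check in the diff. `persistSingle` mirrors the if/else proposed in the patch, while `persistDual` mirrors the behavior before the patch, where the metastore call ran unconditionally.

```java
import java.util.ArrayList;
import java.util.List;

public class ColStatsWriteSketch {

    /** Hypothetical stand-in for a place where column stats can be stored. */
    interface StatsSink {
        void write(String table, String colStats);
    }

    /** In-memory sink that records each write so the routing can be inspected. */
    static final class RecordingSink implements StatsSink {
        final List<String> writes = new ArrayList<>();

        @Override
        public void write(String table, String colStats) {
            writes.add(table + ":" + colStats);
        }
    }

    /** Single-location write: the storage handler OR the metastore, never both. */
    static void persistSingle(boolean handlerCanSetColStats, StatsSink handlerSink,
                              StatsSink metastoreSink, String table, String colStats) {
        if (handlerCanSetColStats) {
            handlerSink.write(table, colStats);
        } else {
            metastoreSink.write(table, colStats);
        }
    }

    /** Dual write: the storage handler when it is able to, and always the metastore. */
    static void persistDual(boolean handlerCanSetColStats, StatsSink handlerSink,
                            StatsSink metastoreSink, String table, String colStats) {
        if (handlerCanSetColStats) {
            handlerSink.write(table, colStats);
        }
        metastoreSink.write(table, colStats);
    }

    public static void main(String[] args) {
        RecordingSink puffin = new RecordingSink();
        RecordingSink metastore = new RecordingSink();
        persistSingle(true, puffin, metastore, "t1", "ndv=10");
        System.out.println("single: puffin=" + puffin.writes.size()
                + ", metastore=" + metastore.writes.size());

        RecordingSink puffin2 = new RecordingSink();
        RecordingSink metastore2 = new RecordingSink();
        persistDual(true, puffin2, metastore2, "t1", "ndv=10");
        System.out.println("dual: puffin=" + puffin2.writes.size()
                + ", metastore=" + metastore2.writes.size());
    }
}
```

With the single-location routing the metastore holds nothing for such a table until the extra ANALYZE is run with the source set to metastore. The dual routing avoids that step at the cost of one more metastore call per write.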
I will leave it to the community to decide whether it is best to store stats in
2 places or whether storing them in a single place is sufficient. If the
community thinks that it is best to store them in 2 places, then I won't
proceed further. Otherwise, I will continue with the patch.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]