zhangbutao commented on code in PR #4871:
URL: https://github.com/apache/hive/pull/4871#discussion_r1389593772


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##########
@@ -593,6 +594,9 @@ private Path getColStatsPath(Table table) {
   }
 
   private Path getColStatsPath(Table table, long snapshotId) {
+    if (((BaseTable) table).operations() instanceof HadoopTableOperations) {
+      return new Path(table.location() + STATS + snapshotId);

Review Comment:
   Yes, this path is puffin stats file path. Hadoop tables and Hive-catalog 
tables are really no different, they both can use the stats to optimize query. 
After this PR, the puffin stats file for Hadoop tables is created correctly, 
and query can get stats from the puffin stats file. So i think we definitely 
need the stats file for Hadoop tables.
   
   What i want to say, should we keep the same path format for both Hadoop 
tables and hive-catalog tables? Then we don't need to judge this table type.
    e.g.
   ```
     private Path getColStatsPath(Table table, long snapshotId) {
   +    return new Path(table.location() + STATS + snapshotId);
   -    return new Path(table.location() + STATS + table.name() + snapshotId);
     } 
   ```
   But I'm not sure if there will be backward compatibility issues for 
hive-catalog tables.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to