ayushtkn commented on code in PR #5724:
URL: https://github.com/apache/hive/pull/5724#discussion_r2317030550
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java:
##########
@@ -223,18 +233,18 @@ static Snapshot getTableSnapshot(Table table, String
snapshotRef) {
return table.currentSnapshot();
}
- static Optional<Path> getColStatsPath(Table table) {
+ static Path getColStatsPath(Table table) {
return getColStatsPath(table, table.currentSnapshot().snapshotId());
}
- static Optional<Path> getColStatsPath(Table table, long snapshotId) {
+ static Path getColStatsPath(Table table, long snapshotId) {
Review Comment:
If we are changing the return type, can't we change it to String? I see we
are either checking for `null` or during `readColStats` reading `Path` and
then converting it to `String`. We can avoid the path creating in this method
itself
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##########
@@ -1864,19 +1861,6 @@ public void addResourcesForCreateTable(Map<String,
String> tblProps, HiveConf hi
}
}
- /**
- * Check the operation type of all snapshots which are newer than the
specified. The specified snapshot is excluded.
- * @param hmsTable table metadata stored in Hive Metastore
- * @param since the snapshot preceding the oldest snapshot which should be
checked.
- * The value null means all should be checked.
- * @return null if table is empty, true if all snapshots are {@link
SnapshotContext.WriteOperationType#APPEND}s,
- * false otherwise.
- *
- * @deprecated
- * <br>Use {@link HiveStorageHandler#getSnapshotContexts(
- * org.apache.hadoop.hive.ql.metadata.Table hmsTable, SnapshotContext since)}
- * and check {@link SnapshotContext.WriteOperationType#APPEND}.equals({@link
SnapshotContext#getOperation()}).
- */
Review Comment:
Why are you removing the javadoc here?
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##########
@@ -637,7 +644,7 @@ private boolean writeColStats(List<ColumnStatistics>
colStats, Table tbl) {
return true;
} catch (Exception e) {
- LOG.warn("Unable to invalidate or merge stats: {}", e.getMessage());
+ LOG.warn("Unable to invalidate or merge column stats: {}",
e.getMessage());
Review Comment:
if the stats file was generated, but we weren't able to update or write
column stat, shouldn't we delete the ``statsPath`` as well as a cleanup here?
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##########
@@ -601,20 +601,27 @@ private boolean writeColStats(List<ColumnStatistics>
colStats, Table tbl) {
long snapshotId = tbl.currentSnapshot().snapshotId();
long snapshotSequenceNumber = tbl.currentSnapshot().sequenceNumber();
- colStats.forEach(statsObj -> {
- byte[] serializeColStats = SerializationUtils.serialize(statsObj);
- puffinWriter.add(
- new Blob(
- ColumnStatisticsObj.class.getSimpleName(),
- ImmutableList.of(1),
- snapshotId,
- snapshotSequenceNumber,
- ByteBuffer.wrap(serializeColStats),
- PuffinCompressionCodec.NONE,
- ImmutableMap.of("partition",
- String.valueOf(statsObj.getStatsDesc().getPartName()))
- ));
+ colStats.forEach(stats -> {
+ boolean isTblLevel = stats.getStatsDesc().isIsTblLevel();
+
+ for (Serializable statsObj : isTblLevel ? stats.getStatsObj() :
Collections.singletonList(stats)) {
+ byte[] serializeColStats = SerializationUtils.serialize(statsObj);
+ puffinWriter.add(
+ new Blob(
+ ColumnStatisticsObj.class.getSimpleName(),
+ ImmutableList.of(isTblLevel ? tbl.spec().schema().findField(
+ ((ColumnStatisticsObj) statsObj).getColName()).fieldId() :
1),
Review Comment:
Is this under some lock. Curious about the case of schema evolution, like if
the column gets dropped in the iceberg table before we fetch the table in
``setColStatistics``
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]