szehon-ho commented on code in PR #4456:
URL: https://github.com/apache/iceberg/pull/4456#discussion_r843073580
##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -90,6 +92,7 @@
private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS =
"iceberg.hive.lock-check-max-wait-ms";
private static final String HIVE_ICEBERG_METADATA_REFRESH_MAX_RETRIES =
"iceberg.hive.metadata-refresh-max-retries";
private static final String HIVE_TABLE_LEVEL_LOCK_EVICT_MS =
"iceberg.hive.table-level-lock-evict-ms";
+ private static final long HIVE_TABLE_PROPERTY_VALUE_SIZE_MAX = 4000;
Review Comment:
Should we expose this as a configuration? Otherwise the hard-coded limit may not
match what works for all users (different databases and backends). I am not sure
what others think.
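
As a hedged sketch of what the comment suggests: the cap could be read from the
catalog configuration with the current constant as the fallback. The property name
`iceberg.hive.table-property-max-size` below is hypothetical (not an existing
Iceberg key), and plain `Map` stands in for the Hadoop `Configuration` so the
example is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

public class PropertySizeLimit {
  // Hypothetical property key; the real key name would be decided in the PR.
  static final String TABLE_PROPERTY_MAX_SIZE = "iceberg.hive.table-property-max-size";
  static final long TABLE_PROPERTY_MAX_SIZE_DEFAULT = 4000L;

  // Resolve the cap from configuration, falling back to the built-in default.
  static long maxPropertySize(Map<String, String> conf) {
    String value = conf.get(TABLE_PROPERTY_MAX_SIZE);
    return value != null ? Long.parseLong(value) : TABLE_PROPERTY_MAX_SIZE_DEFAULT;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println(maxPropertySize(conf));   // no override: default applies
    conf.put(TABLE_PROPERTY_MAX_SIZE, "65535");  // e.g. a backend with a wider column
    System.out.println(maxPropertySize(conf));
  }
}
```

This keeps today's behaviour for users who set nothing, while letting deployments
whose metastore backend allows longer values raise the limit.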
##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -402,9 +405,32 @@ private void setHmsTableParameters(String newMetadataLocation, Table tbl, TableM
       parameters.put(StatsSetupConst.TOTAL_SIZE, summary.get(SnapshotSummary.TOTAL_FILE_SIZE_PROP));
     }
+    setSnapshotStats(metadata, parameters);
+
     tbl.setParameters(parameters);
   }
+  private void setSnapshotStats(TableMetadata metadata, Map<String, String> parameters) {
+    Snapshot currentSnapshot = metadata.currentSnapshot();
+ Snapshot currentSnapshot = metadata.currentSnapshot();
Review Comment:
I think we need a way to clear these properties when they no longer apply to the
current snapshot (like the original setHmsTableParameters code does). For example,
if the summary of a new snapshot suddenly exceeds the length limit, the stale
summary from the previous snapshot would not be overwritten and would be wrong.
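
A minimal sketch of the remove-then-set pattern this comment asks for: clear the
snapshot-derived parameters unconditionally, then re-add only the values that are
valid for the current snapshot. Key names mirror the PR, but plain Strings and
Maps replace the Iceberg `Snapshot`/`TableMetadata` types, and the tiny size cap
is for demonstration only.

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotStats {
  static final String CURRENT_SNAPSHOT_ID = "current-snapshot-id";
  static final String CURRENT_SNAPSHOT_SUMMARY = "current-snapshot-summary";
  static final long MAX_VALUE_SIZE = 20; // demo-sized cap, not the real 4000

  static void setSnapshotStats(String snapshotId, String summaryJson,
                               Map<String, String> parameters) {
    // Clear first so values from a previous snapshot never survive by accident.
    parameters.remove(CURRENT_SNAPSHOT_ID);
    parameters.remove(CURRENT_SNAPSHOT_SUMMARY);
    if (snapshotId != null) {
      parameters.put(CURRENT_SNAPSHOT_ID, snapshotId);
      if (summaryJson.length() <= MAX_VALUE_SIZE) {
        parameters.put(CURRENT_SNAPSHOT_SUMMARY, summaryJson);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, String> parameters = new HashMap<>();
    setSnapshotStats("1", "{\"added\":\"2\"}", parameters);
    System.out.println(parameters.containsKey(CURRENT_SNAPSHOT_SUMMARY));
    // A new snapshot whose summary is over the cap: the old summary must go away.
    setSnapshotStats("2", "{\"added\":\"2\",\"deleted\":\"9\"}", parameters);
    System.out.println(parameters.containsKey(CURRENT_SNAPSHOT_SUMMARY));
  }
}
```

Without the two `remove` calls, the second call would leave snapshot 1's summary
in place, which is exactly the staleness bug the comment describes.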
##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -402,9 +405,32 @@ private void setHmsTableParameters(String newMetadataLocation, Table tbl, TableM
       parameters.put(StatsSetupConst.TOTAL_SIZE, summary.get(SnapshotSummary.TOTAL_FILE_SIZE_PROP));
     }
+    setSnapshotStats(metadata, parameters);
+
     tbl.setParameters(parameters);
   }
+  private void setSnapshotStats(TableMetadata metadata, Map<String, String> parameters) {
+    Snapshot currentSnapshot = metadata.currentSnapshot();
+    if (currentSnapshot != null) {
+      parameters.put(TableProperties.CURRENT_SNAPSHOT_ID, String.valueOf(currentSnapshot.snapshotId()));
+      parameters.put(TableProperties.CURRENT_SNAPSHOT_TIMESTAMP, String.valueOf(currentSnapshot.timestampMillis()));
+      try {
+        String summary = JsonUtil.mapper().writeValueAsString(currentSnapshot.summary());
+        if (summary.length() <= HIVE_TABLE_PROPERTY_VALUE_SIZE_MAX) {
+          parameters.put(TableProperties.CURRENT_SNAPSHOT_SUMMARY, summary);
Review Comment:
One other concern: there is some repeated information here, since
setHmsTableParameters already puts some (but not all) of the summary fields into
the parameters. Did we ever consider just inlining these fields from the summary,
as is done there?
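
To illustrate the alternative this comment raises: instead of storing the whole
summary as one JSON blob, copy selected summary fields into the HMS parameters
individually, as setHmsTableParameters already does for numFiles/totalSize. The
key strings below follow Iceberg's SnapshotSummary constants, but this standalone
sketch uses plain maps rather than the real Snapshot API.

```java
import java.util.HashMap;
import java.util.Map;

public class InlineSummary {
  // Summary keys as spelled in Iceberg's SnapshotSummary; the selection of
  // which keys to inline is an assumption for this example.
  static final String[] INLINED_KEYS = {"total-records", "total-files-size", "total-data-files"};

  static void inlineSummary(Map<String, String> summary, Map<String, String> parameters) {
    for (String key : INLINED_KEYS) {
      String value = summary.get(key);
      if (value != null) {
        parameters.put(key, value); // each value is a short number, so no size cap needed
      }
    }
  }

  public static void main(String[] args) {
    Map<String, String> summary = new HashMap<>();
    summary.put("total-records", "100");
    summary.put("total-files-size", "52400");
    Map<String, String> parameters = new HashMap<>();
    inlineSummary(summary, parameters);
    System.out.println(parameters.get("total-records"));
    System.out.println(parameters.get("total-files-size"));
  }
}
```

Inlining sidesteps the 4000-character problem entirely, since each individual
value is tiny, at the cost of not exposing every summary entry.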
##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -402,9 +405,32 @@ private void setHmsTableParameters(String newMetadataLocation, Table tbl, TableM
       parameters.put(StatsSetupConst.TOTAL_SIZE, summary.get(SnapshotSummary.TOTAL_FILE_SIZE_PROP));
     }
+    setSnapshotStats(metadata, parameters);
+
     tbl.setParameters(parameters);
   }
+  private void setSnapshotStats(TableMetadata metadata, Map<String, String> parameters) {
+    Snapshot currentSnapshot = metadata.currentSnapshot();
+    if (currentSnapshot != null) {
+      parameters.put(TableProperties.CURRENT_SNAPSHOT_ID, String.valueOf(currentSnapshot.snapshotId()));
+      parameters.put(TableProperties.CURRENT_SNAPSHOT_TIMESTAMP, String.valueOf(currentSnapshot.timestampMillis()));
+      try {
+        String summary = JsonUtil.mapper().writeValueAsString(currentSnapshot.summary());
+        if (summary.length() <= HIVE_TABLE_PROPERTY_VALUE_SIZE_MAX) {
+          parameters.put(TableProperties.CURRENT_SNAPSHOT_SUMMARY, summary);
+        } else {
+          LOG.warn("Not expose the current snapshot({}) summary in HMS since it exceeds {} characters",
Review Comment:
nit: "Not exposing".
##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java:
##########
@@ -468,4 +473,54 @@ public void testUUIDinTableProperties() throws Exception {
catalog.dropTable(tableIdentifier);
}
}
+
+ @Test
+ public void testSnapshotStatsTableProperties() throws Exception {
+ Schema schema = new Schema(
+ required(1, "id", Types.IntegerType.get(), "unique ID"),
+ required(2, "data", Types.StringType.get())
+ );
+ TableIdentifier tableIdentifier = TableIdentifier.of(DB_NAME, "tbl");
+ String location = temp.newFolder("tbl").toString();
+
+ try {
+ catalog.buildTable(tableIdentifier, schema)
+ .withLocation(location)
+ .create();
+
+ String tableName = tableIdentifier.name();
+      org.apache.hadoop.hive.metastore.api.Table hmsTable =
+          metastoreClient.getTable(tableIdentifier.namespace().level(0), tableName);
+
+ // check whether parameters are in expected state
+ Map<String, String> parameters = hmsTable.getParameters();
+ Assert.assertEquals("0", parameters.get(TableProperties.SNAPSHOT_COUNT));
Review Comment:
Can we put some error messages in these asserts, like in other tests?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]