oneonestar opened a new issue, #9723:
URL: https://github.com/apache/iceberg/issues/9723
### Query engine
Spark
### Question
In the `metadata_log_entries` table, `latest_snapshot_id`, `latest_schema_id`,
and `latest_sequence_number` return null in some cases. Those values also
become null for every existing entry after `CREATE OR REPLACE`.
The current implementation relies on the `snapshot-log` field in the metadata
file, and `snapshot-log` is reset by the `CREATE OR REPLACE` statement.
Is this intended behavior?
Could someone provide a precise definition of `latest_snapshot_id`,
`latest_schema_id`, and `latest_sequence_number`?
```
spark-sql> create table test.t1 (c1 integer);
spark-sql> alter table test.t1 add columns (c2 string);
spark-sql> SELECT * FROM test.t1.metadata_log_entries;
2024-01-22 18:51:41.331
hdfs://hadoop/metadata/00000-d1ad9769-3bf3-497d-a86e-20f71d2700f7.metadata.json
NULL NULL NULL
2024-01-22 18:51:41.593
hdfs://hadoop/metadata/00001-9eb3275c-f53d-4e24-bc23-fdcadae28d92.metadata.json
NULL NULL NULL
spark-sql> insert into test.t1 values (1, 'a');
spark-sql> SELECT * FROM test.t1.metadata_log_entries;
2024-01-22 18:51:41.331
hdfs://hadoop/metadata/00000-d1ad9769-3bf3-497d-a86e-20f71d2700f7.metadata.json
NULL NULL NULL
2024-01-22 18:51:41.593
hdfs://hadoop/metadata/00001-9eb3275c-f53d-4e24-bc23-fdcadae28d92.metadata.json
NULL NULL NULL
2024-01-22 18:51:47.538
hdfs://hadoop/metadata/00002-ec3f0aa6-7dae-4c45-90ad-13839eefbd54.metadata.json
1836642618692808023 1 0
spark-sql> delete from test.t1 where c1 = 1;
spark-sql> SELECT * FROM test.t1.metadata_log_entries;
2024-01-22 18:51:41.331
hdfs://hadoop/metadata/00000-d1ad9769-3bf3-497d-a86e-20f71d2700f7.metadata.json
NULL NULL NULL
2024-01-22 18:51:41.593
hdfs://hadoop/metadata/00001-9eb3275c-f53d-4e24-bc23-fdcadae28d92.metadata.json
NULL NULL NULL
2024-01-22 18:51:47.538
hdfs://hadoop/metadata/00002-ec3f0aa6-7dae-4c45-90ad-13839eefbd54.metadata.json
1836642618692808023 1 0
2024-01-22 18:51:52.6
hdfs://hadoop/metadata/00003-806771f6-e7f1-44f5-ac80-350f2c084505.metadata.json
8876099574020403871 1 0
spark-sql> CALL local.system.rewrite_data_files('test.t1');
spark-sql> SELECT * FROM test.t1.metadata_log_entries;
2024-01-22 18:51:41.331
hdfs://hadoop/metadata/00000-d1ad9769-3bf3-497d-a86e-20f71d2700f7.metadata.json
NULL NULL NULL
2024-01-22 18:51:41.593
hdfs://hadoop/metadata/00001-9eb3275c-f53d-4e24-bc23-fdcadae28d92.metadata.json
NULL NULL NULL
2024-01-22 18:51:47.538
hdfs://hadoop/metadata/00002-ec3f0aa6-7dae-4c45-90ad-13839eefbd54.metadata.json
1836642618692808023 1 0
2024-01-22 18:51:52.6
hdfs://hadoop/metadata/00003-806771f6-e7f1-44f5-ac80-350f2c084505.metadata.json
8876099574020403871 1 0
spark-sql> create or replace table test.t1 (c3 integer);
spark-sql> SELECT * FROM test.t1.metadata_log_entries;
2024-01-22 18:51:41.331
hdfs://hadoop/metadata/00000-d1ad9769-3bf3-497d-a86e-20f71d2700f7.metadata.json
NULL NULL NULL
2024-01-22 18:51:41.593
hdfs://hadoop/metadata/00001-9eb3275c-f53d-4e24-bc23-fdcadae28d92.metadata.json
NULL NULL NULL
2024-01-22 18:51:47.538
hdfs://hadoop/metadata/00002-ec3f0aa6-7dae-4c45-90ad-13839eefbd54.metadata.json
NULL NULL NULL
2024-01-22 18:51:52.6
hdfs://hadoop/metadata/00003-806771f6-e7f1-44f5-ac80-350f2c084505.metadata.json
NULL NULL NULL
2024-01-22 18:52:01.749
hdfs://hadoop/metadata/00004-d9c74f49-37c8-4463-8565-7b31660723e2.metadata.json
NULL NULL NULL
```
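To make the question concrete, here is a rough Python model of what I think is happening. It is only a sketch under my own assumptions, not the actual Iceberg code: I assume each metadata-log row is resolved to the last `snapshot-log` entry at or before its timestamp, and that `CREATE OR REPLACE` leaves `snapshot-log` empty. The helper name `latest_snapshot_as_of` is made up, and the timestamps are simplified to milliseconds past 18:51:00 from the output above.

```python
from bisect import bisect_right
from typing import Optional

def latest_snapshot_as_of(snapshot_log: list, timestamp_ms: int) -> Optional[int]:
    """Hypothetical helper: id of the last snapshot-log entry at or before
    timestamp_ms, or None when no such entry exists (shown as NULL in SQL)."""
    times = [entry["timestamp-ms"] for entry in snapshot_log]
    idx = bisect_right(times, timestamp_ms)
    return snapshot_log[idx - 1]["snapshot-id"] if idx else None

# Metadata-log timestamps for files 00000..00004 (ms past 18:51:00, simplified).
metadata_log_ms = [41331, 41593, 47538, 52600, 61749]

# Assumed snapshot-log before the replace: the INSERT and the DELETE.
snapshot_log_before = [
    {"timestamp-ms": 47538, "snapshot-id": 1836642618692808023},
    {"timestamp-ms": 52600, "snapshot-id": 8876099574020403871},
]
# Assumed snapshot-log after CREATE OR REPLACE: reset to empty.
snapshot_log_after = []

before = [latest_snapshot_as_of(snapshot_log_before, ts) for ts in metadata_log_ms[:4]]
after = [latest_snapshot_as_of(snapshot_log_after, ts) for ts in metadata_log_ms]

print(before)  # [None, None, 1836642618692808023, 8876099574020403871]
print(after)   # [None, None, None, None, None]
```

Under these assumptions the model reproduces both the NULLs for the first two metadata files and the all-NULL result after the replace, which is why I suspect the lookup depends entirely on `snapshot-log`.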