RussellSpitzer opened a new issue, #8856:
URL: https://github.com/apache/iceberg/issues/8856
### Feature Request / Improvement
Currently all versions of metadata tables have the exact same schema as
their not "all" versions. This is actually not very useful if you are
attempting to locate the state of a particular entry at a specific time because
the `snapshot_id` always just shows the file's original snapshot.
For example the entries table looks like
```
scala> spark.sql("SELECT * FROM db.timezoned.entries").show
warning: 1 deprecation (since 2.13.3); for details, enable `:setting
-deprecation` or `:replay -deprecation`
+------+-------------------+---------------+--------------------+--------------------+--------------------+
|status| snapshot_id|sequence_number|file_sequence_number|
data_file| readable_metrics|
+------+-------------------+---------------+--------------------+--------------------+--------------------+
| 1|6561920950175488866| 0| 0|{0,
/Users/russel...|{{51, 1, 0, null,...|
| 1|5535987506380389562| 0| 0|{0,
/Users/russel...|{{51, 1, 0, null,...|
| 1|2517256618694516958| 0| 0|{0,
/Users/russel...|{{51, 1, 0, null,...|
| 1|2750236691316126600| 0| 0|{0,
/Users/russel...|{{51, 1, 0, null,...|
| 1|7179885233531513409| 0| 0|{0,
/Users/russel...|{{51, 1, 0, null,...|
```
And `all_entries` looks like
```
scala> spark.sql("SELECT * FROM db.timezoned.all_entries").show
warning: 1 deprecation (since 2.13.3); for details, enable `:setting
-deprecation` or `:replay -deprecation`
+------+-------------------+---------------+--------------------+--------------------+--------------------+
|status| snapshot_id|sequence_number|file_sequence_number|
data_file| readable_metrics|
+------+-------------------+---------------+--------------------+--------------------+--------------------+
| 1|2517256618694516958| 0| 0|{0,
/Users/russel...|{{51, 1, 0, null,...|
| 1|6561920950175488866| 0| 0|{0,
/Users/russel...|{{51, 1, 0, null,...|
| 1|2750236691316126600| 0| 0|{0,
/Users/russel...|{{51, 1, 0, null,...|
| 1|7179885233531513409| 0| 0|{0,
/Users/russel...|{{51, 1, 0, null,...|
| 1|5535987506380389562| 0| 0|{0,
/Users/russel...|{{51, 1, 0, null,...|
| 0|6561920950175488866| 0| 0|{0,
/Users/russel...|{{51, 1, 0, null,...|
| 0|5535987506380389562| 0| 0|{0,
/Users/russel...|{{51, 1, 0, null,...|
| 0|2517256618694516958| 0| 0|{0,
/Users/russel...|{{51, 1, 0, null,...|
| 0|2750236691316126600| 0| 0|{0,
/Users/russel...|{{51, 1, 0, null,...|
| 0|7179885233531513409| 0| 0|{0,
/Users/russel...|{{51, 1, 0, null,...|
+------+-------------------+---------------+--------------------+--------------------+--------------------+
````
By looking at all_entries it is impossible for me to determine whether or
not all the manifests were rewritten at once, or if they were rewritten in
groups. Ideally we would see something like
```
+-------------------+-------------+------+-------------------+---------------+--------------------+--------------------+--------------------+
| as_of| time|status|
snapshot_id|sequence_number|file_sequence_number| data_file|
readable_metrics|
+-------------------+-------------+------+-------------------+---------------+--------------------+--------------------+--------------------+
|7179885233531513409|1697493267302| 1|7179885233531513409|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
|2750236691316126600|1697493268363| 1|2750236691316126600|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
|2750236691316126600|1697493268363| 1|7179885233531513409|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
|2517256618694516958|1697493269568| 1|2517256618694516958|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
|2517256618694516958|1697493269568| 1|2750236691316126600|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
|2517256618694516958|1697493269568| 1|7179885233531513409|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
|5535987506380389562|1697493270419| 1|5535987506380389562|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
|5535987506380389562|1697493270419| 1|2517256618694516958|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
|5535987506380389562|1697493270419| 1|2750236691316126600|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
|5535987506380389562|1697493270419| 1|7179885233531513409|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
|6561920950175488866|1697493271193| 1|6561920950175488866|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
|6561920950175488866|1697493271193| 1|5535987506380389562|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
|6561920950175488866|1697493271193| 1|2517256618694516958|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
|6561920950175488866|1697493271193| 1|2750236691316126600|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
|6561920950175488866|1697493271193| 1|7179885233531513409|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
|2315221993819944328|1697497673027| 1|6561920950175488866|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
|2315221993819944328|1697497673027| 1|5535987506380389562|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
|2315221993819944328|1697497673027| 1|2517256618694516958|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
|2315221993819944328|1697497673027| 1|2750236691316126600|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
|2315221993819944328|1697497673027| 1|7179885233531513409|
0| 0|{0, /Users/russel...|{{51, 1, 0, null,...|
+-------------------+-------------+------+-------------------+---------------+--------------------+--------------------+--------------------+
```
Apologies that my code is not getting status status correctly but basically
the idea being that we add on two columns (names are still up for debate)
"as_of_snapshot", "as_of_time"
These would allow us to be able to analyze the actual history in all_entires
and all_manifests
### Query engine
None
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]