This is an automated email from the ASF dual-hosted git repository.
fokko pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/main by this push:
new 5821efcdd5 Spec: Clarify missing fields when writing (#8672)
5821efcdd5 is described below
commit 5821efcdd521fa4d0f244500d3edb5e1c9e06311
Author: Fokko Driesprong <[email protected]>
AuthorDate: Fri Apr 26 08:50:30 2024 +0200
Spec: Clarify missing fields when writing (#8672)
* Spec: Clarify missing fields when writing
Jan raised a point on Slack about the semantic meaning of a field
that can be written:
https://apache-iceberg.slack.com/archives/C03LG1D563F/p1695834739711569
There are two options:
- The field is not part of the schema, and omitted from the file
- The field is part of the schema, but the value is not written (nullable)
My personal take on this is that we should use static schemas when
writing Avro files, so that all the fields that are either optional or
required are in the schema.
I'm well aware that this doesn't cause any issues if you dogfood
the Iceberg Avro reader, where you can add required fields, for example
the `134: content` field in the manifest.
However, I think we should try to stick to the concept of write strict,
read permissive where we try to encourage people to write all the fields
that are in the spec (even if the value itself is all null).
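A minimal sketch of the two options, assuming hypothetical field names (`file_path`, `note`) and plain Python dicts standing in for Avro record schemas rather than a real Avro library. The helper `missing_from_schema` is also hypothetical; it only reports which spec fields a file schema does not declare:

```python
# Hypothetical list of fields a spec table defines (illustrative names only).
SPEC_FIELDS = ["file_path", "note"]

# Option 1: the optional field is not part of the file schema at all.
schema_omitted = {
    "type": "record",
    "name": "example_entry",
    "fields": [
        {"name": "file_path", "type": "string"},
    ],
}

# Option 2: the optional field is declared as a nullable union,
# so it is in the schema even when every value is null.
schema_nullable = {
    "type": "record",
    "name": "example_entry",
    "fields": [
        {"name": "file_path", "type": "string"},
        {"name": "note", "type": ["null", "string"], "default": None},
    ],
}


def missing_from_schema(schema, spec_fields):
    """Return the spec fields that the file schema does not declare."""
    declared = {field["name"] for field in schema["fields"]}
    return [name for name in spec_fields if name not in declared]


print(missing_from_schema(schema_omitted, SPEC_FIELDS))
print(missing_from_schema(schema_nullable, SPEC_FIELDS))
```

Under "write strict", a writer would prefer the second schema, so that a reader never has to guess whether a field was unknown to the writer or simply null.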
* Add manifest-list explicitly
Co-authored-by: JFinis <[email protected]>
* Update wording
* Comments
* Retain formatting
* Thanks Steven
---------
Co-authored-by: JFinis <[email protected]>
---
format/spec.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/format/spec.md b/format/spec.md
index aa905e7032..b00c63256a 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -127,12 +127,12 @@ Tables do not require rename, except for tables that use atomic rename to implem
#### Writer requirements
-Some tables in this spec have columns that specify requirements for v1 and v2 tables. These requirements are intended for writers when adding metadata files to a table with the given version.
+Some tables in this spec have columns that specify requirements for v1 and v2 tables. These requirements are intended for writers when adding metadata files (including manifests files and manifest lists) to a table with the given version.
| Requirement | Write behavior |
|-------------|----------------|
| (blank) | The field should be omitted |
-| _optional_ | The field can be written |
+| _optional_ | The field can be written or omitted |
| _required_ | The field must be written |
Readers should be more permissive because v1 metadata files are allowed in v2 tables so that tables can be upgraded to v2 without rewriting the metadata tree. For manifest list and manifest files, this table shows the expected v2 read behavior: