stevenzwu commented on code in PR #15630:
URL: https://github.com/apache/iceberg/pull/15630#discussion_r3119345080

##########
format/spec.md:
##########
@@ -123,9 +128,22 @@ Tables do not require random-access writes. Once written, data and metadata file
 Tables do not require rename, except for tables that use atomic rename to implement the commit operation for new metadata files.

+### Relative Locations in Metadata
+
+Version 4 of the Iceberg spec adds support for relative locations in metadata, enabling tables to be relocated without rewriting metadata files. Key changes include:
+
+* Support for relative locations in all metadata tracked path fields, resolved against the table's base location
+* The table `location` field becomes optional, allowing the table location to be:
+    * Provided by an owning catalog
+    * Inferred from the metadata file location or storage layout
+    * Supplied directly where necessary

Review Comment:
   What does `supplied directly` mean? The `location` field from table metadata? If so, it might be clearer to spell it out explicitly.

##########
format/spec.md:
##########
@@ -123,9 +128,22 @@ Tables do not require random-access writes. Once written, data and metadata file
 Tables do not require rename, except for tables that use atomic rename to implement the commit operation for new metadata files.

+### Relative Locations in Metadata

Review Comment:
   Is `File Paths in Metadata` slightly better for a summary subsection? It also matches the later section, `Paths in Metadata`.

   We could also briefly mention that in V3 and prior, all file locations use absolute paths.

##########
format/spec.md:
##########
@@ -1637,6 +1686,30 @@ The binary single-value serialization can be used to store the lower and upper b
 ## Appendix E: Format version changes

+### Version 4
+
+Relative path support is added in v4.
+
+Reading v3 metadata for v4:
+
+* All location fields are treated as absolute paths
+* Any location field without a uri scheme prefix must prepend a scheme component consistent with v4

Review Comment:
   I thought absolute paths are required to start with a scheme component. Why do we need this for pre-V4 metadata files?

##########
format/spec.md:
##########
@@ -168,6 +188,35 @@ All columns must be written to data files even if they introduce redundancy with
 Writers are not allowed to commit files with a partition spec that contains a field with an unknown transform.

+### Paths in Metadata
+
+Path strings stored in Iceberg metadata files are classified as one of two types:
+
+* **Absolute path** -- A path string that includes a [URI scheme](https://datatracker.ietf.org/doc/html/rfc3986#section-3.1) (e.g., `s3:`, `gs:`, `hdfs:`, `file:`). Absolute paths are used as-is without modification.
+* **Relative path** -- A path string that does not include a URI scheme. Relative paths must be resolved against the table's base location before use.
+
+Prior to v4, all path fields must contain absolute paths. Starting with v4, path fields may contain either absolute or relative paths. Directory navigation symbols (`.` and `..`) and other file system conventions are not supported in relative paths.

Review Comment:
   > Directory navigation symbols (`.` and `..`) and other file system conventions are not supported in relative paths.

   Just for my knowledge, are they allowed in absolute paths?

##########
format/spec.md:
##########
@@ -168,6 +188,35 @@ All columns must be written to data files even if they introduce redundancy with
 Writers are not allowed to commit files with a partition spec that contains a field with an unknown transform.

+### Paths in Metadata
+
+Path strings stored in Iceberg metadata files are classified as one of two types:
+
+* **Absolute path** -- A path string that includes a [URI scheme](https://datatracker.ietf.org/doc/html/rfc3986#section-3.1) (e.g., `s3:`, `gs:`, `hdfs:`, `file:`). Absolute paths are used as-is without modification.
+* **Relative path** -- A path string that does not include a URI scheme. Relative paths must be resolved against the table's base location before use.
+
+Prior to v4, all path fields must contain absolute paths. Starting with v4, path fields may contain either absolute or relative paths. Directory navigation symbols (`.` and `..`) and other file system conventions are not supported in relative paths.
+
+#### Path Resolution
+
+Path resolution is the process of producing an absolute path from a relative path by combining it with the table's base location. If a path is absolute, it is used as-is. If a path is relative, it is concatenated with the table location to produce an absolute path:
+
+* If the path contains a URI scheme, it is absolute and is used without modification.
+* If the path does not contain a URI scheme, the resolved path is the table location followed by the relative path.
+
+Paths used as prefixes must not end in a path separator. The relative portion is appended to the prefix without introduction of any additional separator characters.

Review Comment:
   Is `Paths used as prefixes` just `Table base location`? Is the latter clearer?
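
For illustration only, the resolution rule quoted in the `Path Resolution` hunk could be sketched roughly as below. This is not code from the PR or from any Iceberg library. The names `is_absolute` and `resolve` are hypothetical, the scheme check is a simple regex based on the RFC 3986 scheme grammar, and the sketch assumes that a relative path carries its own leading separator, which is one reading of "appended to the prefix without introduction of any additional separator characters".

```python
import re

# Scheme grammar from RFC 3986 section 3.1, followed by the ":" delimiter.
# Assumption: a simple prefix match is enough to classify a path string.
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def is_absolute(path: str) -> bool:
    """Return True if the path string starts with a URI scheme."""
    return _SCHEME.match(path) is not None


def resolve(table_location: str, path: str) -> str:
    """Resolve a metadata path against the table's base location (hypothetical helper)."""
    if is_absolute(path):
        # Absolute paths are used as-is without modification.
        return path
    if table_location.endswith("/"):
        # The draft says a prefix must not end in a path separator.
        raise ValueError("table location must not end in a path separator")
    if any(segment in (".", "..") for segment in path.split("/")):
        # The draft says '.' and '..' are not supported in relative paths.
        raise ValueError("'.' and '..' are not supported in relative paths")
    # Plain concatenation: no separator is inserted between prefix and path.
    return table_location + path


print(resolve("s3://bucket/tbl", "/data/a.parquet"))       # s3://bucket/tbl/data/a.parquet
print(resolve("s3://bucket/tbl", "gs://other/a.parquet"))  # gs://other/a.parquet
```

Under this reading, a relative path without a leading separator (for example `data/a.parquet`) would resolve to `s3://bucket/tbldata/a.parquet`, which may be worth calling out explicitly in the spec text.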
