anoopj commented on code in PR #15630:
URL: https://github.com/apache/iceberg/pull/15630#discussion_r3157554714


##########
format/spec.md:
##########
@@ -168,6 +188,35 @@ All columns must be written to data files even if they introduce redundancy with
 
 Writers are not allowed to commit files with a partition spec that contains a field with an unknown transform.
 
+### Paths in Metadata
+
+Path strings stored in Iceberg metadata files are classified as one of two types:
+
+* **Absolute path** -- A path string that includes a [URI scheme](https://datatracker.ietf.org/doc/html/rfc3986#section-3.1) (e.g., `s3:`, `gs:`, `hdfs:`, `file:`). Absolute paths are used as-is without modification.

Review Comment:
   RFC 3986 defines an absolute URI as one that starts with a valid scheme
followed by `:`, where a scheme is a letter followed by any combination of
letters, digits, `+`, `-`, and `.`. Should implementations validate against the
full RFC grammar, or can we define something simpler? For instance: a path is
absolute if it contains `:` and no `/` appears before the first `:`. This is
sufficient for all real filesystem URIs and avoids more expensive scheme
validation.
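
To make the trade-off concrete, here is a minimal Python sketch of the two candidate checks. The function names `is_absolute_simple` and `is_absolute_rfc3986` are hypothetical and chosen only for illustration. Neither is taken from the Iceberg codebase or the spec text, and the regex covers only the scheme grammar from RFC 3986 section 3.1, not the full URI grammar.

```python
import re

# RFC 3986 section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# The trailing ":" is the delimiter that follows the scheme.
_RFC3986_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def is_absolute_simple(path: str) -> bool:
    """Simplified rule: the path contains ':' and no '/' precedes the first ':'."""
    colon = path.find(":")
    if colon < 0:
        return False
    return "/" not in path[:colon]


def is_absolute_rfc3986(path: str) -> bool:
    """Stricter rule: the path begins with a syntactically valid scheme and ':'."""
    return _RFC3986_SCHEME.match(path) is not None


if __name__ == "__main__":
    samples = [
        "s3://bucket/data/a.parquet",  # both rules: absolute
        "data/a.parquet",              # both rules: not absolute
        "data/ts=12:00/a.parquet",     # both rules: not absolute ('/' precedes ':')
        "ts=12:00/a.parquet",          # rules disagree ('=' is not a scheme character)
    ]
    for s in samples:
        print(s, is_absolute_simple(s), is_absolute_rfc3986(s))
```

The two rules agree on ordinary storage URIs and on relative paths whose first segment has no colon. They differ only when the text before the first `:` contains no `/` but is not a valid scheme, as in the last sample, so the choice depends on whether such relative paths need to be supported.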



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

