rdblue commented on a change in pull request #3556:
URL: https://github.com/apache/iceberg/pull/3556#discussion_r749710975



##########
File path: site/docs/spec.md
##########
@@ -212,6 +212,9 @@ Columns in Iceberg data files are selected by field id. The 
table schema's colum
 
 For example, a file may be written with schema `1: a int, 2: b string, 3: c 
double` and read using projection schema `3: measurement, 2: name, 4: a`. This 
must select file columns `c` (renamed to `measurement`), `b` (now called 
`name`), and a column of `null` values called `a`; in that order.
 
+Tables may also define a property `schema.name-mapping.default` with a JSON 
map of `columnName` -> `fieldId` which will be used if a data file was written 
without field ids. This `NameMapping` will **only** be used on files without 
field ids. Files imported or added to an Iceberg table from a system that does 
not generate field ids will fall back to using the table's name mapping to map 
columns to field ids.

Review comment:
       I'd change "This NameMapping will only" to "This NameMapping may only" 
because we're not describing behavior, we are setting requirements for behavior.
   
   This is a great start, but it should also specify the name mapping itself 
more formally. Something like:
   
   > A name mapping is a list of field mapping objects. Each field mapping has 
the following properties:
   > * `names`: A required list of 0 or more names for a field. Note that names 
may contain `.`
   > * `field-id`: An optional Iceberg field ID to be used for a field with one 
of the given names
   > * `fields`: An optional list of field mappings for child fields of 
structs, maps, and lists
   >
   > A field mapping may map multiple names to a single field ID to support 
cases where a name has been updated. For example, Avro field aliases should 
also be listed in names. Similarly, fields that exist only in the Iceberg 
schema may be in the field mapping with an empty list of names, and fields that 
exist in imported files but not in the Iceberg schema may omit `field-id`.
   >
   > Mappings for list types should contain a child mapping for the "element" 
field and mappings for map types should contain child mappings for "key" and 
"value" fields.
   >
   > Fields that are not mapped to IDs must be ignored.
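
To make the proposed wording concrete, here is a minimal sketch of what such a mapping and a lookup over it could look like. The column names, the `NAME_MAPPING_JSON` constant, and the `find_field_id` helper are hypothetical and only for illustration; this is not Iceberg's implementation. A column is addressed by a list of names (one per nesting level) rather than a dotted string, because names may themselves contain `.`.

```python
import json

# Hypothetical name mapping that follows the field mapping properties
# proposed above: `names` (required), `field-id` (optional), `fields` (optional).
NAME_MAPPING_JSON = """
[
  {"field-id": 1, "names": ["id", "record_id"]},
  {"field-id": 2, "names": ["data"]},
  {"field-id": 3, "names": ["location"], "fields": [
    {"field-id": 4, "names": ["latitude", "lat"]},
    {"field-id": 5, "names": ["longitude", "long"]}
  ]},
  {"field-id": 6, "names": ["tags"], "fields": [
    {"field-id": 7, "names": ["element"]}
  ]},
  {"field-id": 8, "names": []},
  {"names": ["legacy_col"]}
]
"""


def find_field_id(mappings, path):
    """Return the field ID mapped to a column path, or None if it is unmapped.

    `path` is a non-empty list of names, one per nesting level.
    """
    name, rest = path[0], path[1:]
    for mapping in mappings:
        if name in mapping.get("names", []):
            if not rest:
                # May be None: a mapping is allowed to omit `field-id`.
                return mapping.get("field-id")
            # Descend into the child mappings of a struct, list, or map.
            return find_field_id(mapping.get("fields", []), rest)
    return None


mappings = json.loads(NAME_MAPPING_JSON)
```

In this sketch `find_field_id(mappings, ["record_id"])` resolves an alias to field 1, `["location", "lat"]` resolves a nested alias to field 4, and `["tags", "element"]` resolves the list element to field 7. A lookup for `["legacy_col"]` returns `None` because that mapping omits `field-id`, so a reader following the last rule would ignore the column. Field 8 has an empty `names` list and can never be matched from a file.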




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


