rdblue commented on a change in pull request #3556: URL: https://github.com/apache/iceberg/pull/3556#discussion_r749710975
##########
File path: site/docs/spec.md
##########
@@ -212,6 +212,9 @@ Columns in Iceberg data files are selected by field id. The table schema's colum

 For example, a file may be written with schema `1: a int, 2: b string, 3: c double` and read using projection schema `3: measurement, 2: name, 4: a`. This must select file columns `c` (renamed to `measurement`), `b` (now called `name`), and a column of `null` values called `a`; in that order.

+Tables may also define a property `schema.name-mapping.default` with a JSON map of `columnName` -> `fieldId` which will be used if a data file was written without field ids. This `NameMapping` will **only** be used on files without field ids. Files imported or added to an Iceberg table from a system that does not generate field ids will fall back to using the table's name mapping to map columns to field ids.

Review comment:

I'd change "This NameMapping will only" to "This NameMapping may only" because we're not describing behavior, we are setting requirements for behavior.

This is a great start, but should also specify the name mapping itself more formally.

> A name mapping is a list of field mapping objects. Each field mapping has the following properties:
> * `names`: A required list of 0 or more names for a field. Note that names may contain `.`
> * `field-id`: An optional Iceberg field ID to be used for a field with one of the given names
> * `fields`: An optional list of field mappings for child fields of structs, maps, and lists
>
> A field mapping may map multiple names to a single field ID to support cases where a name has been updated. For example, Avro field aliases should also be listed in names. Similarly, fields that exist only in the Iceberg schema may be in the field mapping with an empty list of names, and fields that exist in imported files but not in the Iceberg schema may omit `field-id`.
>
> Mappings for list types should contain a child mapping for the "element" field and mappings for map types should contain child mappings for "key" and "value" fields.
>
> Fields that are not mapped to IDs must be ignored.
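
To make the proposed field mapping rules concrete, here is a minimal Python sketch of how a reader might turn a name mapping into a lookup from column path to field ID. It is an illustration under stated assumptions, not Iceberg's implementation: the mapping contents, the field IDs, and the helpers `index_by_path` and `assign_ids` are all hypothetical. Paths are kept as tuples rather than dotted strings because names may contain `.`.

```python
import json

# Hypothetical name mapping, shaped like the value that a table property such
# as `schema.name-mapping.default` might hold. All names and IDs are made up.
NAME_MAPPING_JSON = """
[
  {"field-id": 1, "names": ["id", "record_id"]},
  {"field-id": 2, "names": ["tags"], "fields": [
    {"field-id": 3, "names": ["element"]}
  ]},
  {"field-id": 4, "names": ["props"], "fields": [
    {"field-id": 5, "names": ["key"]},
    {"field-id": 6, "names": ["value"]}
  ]},
  {"field-id": 7, "names": []},
  {"names": ["legacy.col"]}
]
"""


def index_by_path(mappings, parent=()):
    """Return {path tuple: field id} for every mapped name, recursing into children."""
    index = {}
    for field in mappings:
        field_id = field.get("field-id")  # optional: may be absent
        for name in field["names"]:       # required: may be an empty list
            path = parent + (name,)
            if field_id is not None:
                index[path] = field_id
            # Child mappings cover struct fields, list "element", map "key"/"value".
            index.update(index_by_path(field.get("fields", []), path))
    return index


def assign_ids(file_columns, index):
    """Map file column paths to field IDs; columns without a mapped ID are ignored."""
    return {path: index[path] for path in file_columns if path in index}


index = index_by_path(json.loads(NAME_MAPPING_JSON))

# Column paths as they might appear in a file written without field IDs.
file_columns = [("record_id",), ("tags", "element"), ("legacy.col",), ("unknown",)]
resolved = assign_ids(file_columns, index)
```

In this sketch `record_id` resolves to the same ID as `id` (an alias), `legacy.col` is dropped because its mapping omits `field-id`, and `unknown` is dropped because it has no mapping at all. Field 7 has an empty `names` list, so no file column can resolve to it.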