JonasJ-ap commented on code in PR #6997:
URL: https://github.com/apache/iceberg/pull/6997#discussion_r1133139183
##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -356,14 +366,19 @@ def field(self, field: NestedField, field_result: pa.DataType) -> pa.Field:
name=field.name,
type=field_result,
nullable=field.optional,
- metadata={"doc": field.doc, "id": str(field.field_id)} if field.doc else {},
+ metadata={PYTHON_DOC.decode(): field.doc, PYTHON_FIELD_ID.decode(): str(field.field_id)}
+ if field.doc
+ else {PYTHON_FIELD_ID.decode(): str(field.field_id)},
Review Comment:
I made some changes to the metadata key names in the Iceberg-to-PyArrow
visitor: `doc` and `id` become `PYTHON:field_doc` and `PYTHON:field_id`. My
thought here is that this change makes each name consistent with its
source: `PARQUET` indicates that the field comes from the Parquet file, and
`PYTHON` indicates that the field is inferred from the pyiceberg table schema.
`_get_field_id_and_doc` will first look for metadata labelled `PARQUET`
and then for metadata labelled `PYTHON`. This order is
consistent with the future implementation of name mapping suggested by
@rdblue in https://github.com/apache/iceberg/pull/6997#discussion_r1125740765
@Fokko @rdblue What do you think about this change?
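
To make the lookup order concrete, here is a rough sketch of the idea rather than the actual code in this PR. It uses plain `bytes`-keyed dicts to stand in for PyArrow field metadata. The `PYTHON` key names follow the ones above; the `PARQUET:field_doc` key and the function name `get_field_id_and_doc` are assumptions made only for illustration.

```python
from typing import Dict, Optional, Tuple

# PyArrow field metadata maps bytes keys to bytes values.
PARQUET_FIELD_ID = b"PARQUET:field_id"
PARQUET_DOC = b"PARQUET:field_doc"  # hypothetical key, assumed for this sketch
PYTHON_FIELD_ID = b"PYTHON:field_id"
PYTHON_DOC = b"PYTHON:field_doc"

# Sources are consulted in this order: Parquet file first, then pyiceberg schema.
LOOKUP_ORDER = (
    (PARQUET_FIELD_ID, PARQUET_DOC),
    (PYTHON_FIELD_ID, PYTHON_DOC),
)


def get_field_id_and_doc(
    metadata: Optional[Dict[bytes, bytes]],
) -> Tuple[Optional[int], Optional[str]]:
    """Return (field_id, doc), preferring PARQUET keys over PYTHON keys."""
    metadata = metadata or {}
    field_id: Optional[int] = None
    doc: Optional[str] = None
    for id_key, doc_key in LOOKUP_ORDER:
        # Only fill a value that an earlier (higher-priority) source did not set.
        if field_id is None and id_key in metadata:
            field_id = int(metadata[id_key].decode())
        if doc is None and doc_key in metadata:
            doc = metadata[doc_key].decode()
    return field_id, doc
```

With this ordering, a field that carries both `PARQUET:field_id` and `PYTHON:field_id` resolves to the Parquet value, and a field with only `PYTHON` keys still resolves.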
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]