[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6997: Python: Infer Iceberg schema from the Parquet file

via GitHub Sat, 11 Mar 2023 11:46:50 -0800


JonasJ-ap commented on code in PR #6997:
URL: https://github.com/apache/iceberg/pull/6997#discussion_r1133139183



##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -356,14 +366,19 @@ def field(self, field: NestedField, field_result: pa.DataType) -> pa.Field:
             name=field.name,
             type=field_result,
             nullable=field.optional,
-            metadata={"doc": field.doc, "id": str(field.field_id)} if field.doc else {},
+            metadata={PYTHON_DOC.decode(): field.doc, PYTHON_FIELD_ID.decode(): str(field.field_id)}
+            if field.doc
+            else {PYTHON_FIELD_ID.decode(): str(field.field_id)},

Review Comment:
   I changed the metadata field names in the Iceberg-to-PyArrow visitor from `doc` and `id` to `PYTHON:field_doc` and `PYTHON:field_id`. With this change, each key name reflects its source: `PARQUET` indicates that the field comes from the Parquet file, and `PYTHON` indicates that the field is inferred from the pyiceberg table schema.
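
   A minimal, pure-Python sketch of what the new `metadata=` expression produces. The constant values are assumptions inferred from the key names above (the diff calls `.decode()` on them, so they are modeled as bytes), and `iceberg_field_metadata` is a hypothetical helper, not part of pyiceberg. In the visitor, the resulting dict is what gets passed to `pa.field(..., metadata=...)`. Unlike the old expression, which produced `{}` when a field had no doc, the new one always keeps the field id.

```python
from typing import Dict, Optional

# Assumed constant values, taken from the key names in the comment above.
# The diff calls .decode() on them, so they are modeled here as bytes.
PYTHON_FIELD_ID = b"PYTHON:field_id"
PYTHON_DOC = b"PYTHON:field_doc"


def iceberg_field_metadata(field_id: int, doc: Optional[str]) -> Dict[str, str]:
    """Hypothetical helper mirroring the new `metadata=` expression in the diff."""
    if doc:
        return {PYTHON_DOC.decode(): doc, PYTHON_FIELD_ID.decode(): str(field_id)}
    # The old code returned {} here; the new code still records the field id.
    return {PYTHON_FIELD_ID.decode(): str(field_id)}


print(iceberg_field_metadata(1, "customer id"))
# {'PYTHON:field_doc': 'customer id', 'PYTHON:field_id': '1'}
print(iceberg_field_metadata(2, None))
# {'PYTHON:field_id': '2'}
```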
   
   `_get_field_id_and_doc` first searches for fields labelled with `PARQUET` and then for fields labelled with `PYTHON`. This order is consistent with the future name-mapping implementation suggested by @rdblue in https://github.com/apache/iceberg/pull/6997#discussion_r1125740765
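
   A sketch of the lookup order described above, assuming keys follow a `<SOURCE>:<name>` pattern and that metadata arrives as the `bytes`-to-`bytes` dict that pyarrow returns from `pa.Field.metadata` (or `None`). `b"PARQUET:field_id"` is the key pyarrow uses for Parquet field IDs; `PARQUET:field_doc` is only a hypothetical extrapolation of the same pattern. `get_field_id_and_doc` is a hypothetical stand-in, not the PR's actual `_get_field_id_and_doc`.

```python
from typing import Dict, Optional, Tuple

# Source prefixes in lookup order: metadata stored in the Parquet file wins
# over metadata that pyiceberg attached from the table schema.
SOURCE_PREFIXES = (b"PARQUET", b"PYTHON")


def _lookup(metadata: Dict[bytes, bytes], name: bytes) -> Optional[str]:
    """Return the first value found for `<prefix>:<name>`, trying prefixes in order."""
    for prefix in SOURCE_PREFIXES:
        value = metadata.get(prefix + b":" + name)
        if value is not None:
            return value.decode()
    return None


def get_field_id_and_doc(
    metadata: Optional[Dict[bytes, bytes]],
) -> Tuple[Optional[int], Optional[str]]:
    """Hypothetical helper: resolve a field's id and doc from pyarrow field metadata."""
    if not metadata:
        return None, None
    field_id = _lookup(metadata, b"field_id")
    doc = _lookup(metadata, b"field_doc")
    return (int(field_id) if field_id is not None else None), doc


metadata = {
    b"PARQUET:field_id": b"7",
    b"PYTHON:field_id": b"3",
    b"PYTHON:field_doc": b"user id",
}
print(get_field_id_and_doc(metadata))  # (7, 'user id')
```

   Because `PARQUET` is checked first, an id written into the file takes precedence over one inferred from the table schema. A name-mapping source could later be slotted into `SOURCE_PREFIXES` at whatever priority it needs.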
   
   @Fokko @rdblue What do you think about this change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


