bitsondatadev commented on code in PR #117:
URL: https://github.com/apache/iceberg-python/pull/117#discussion_r1380591348
##########
tests/io/test_pyarrow.py:
##########
@@ -708,15 +709,17 @@ def _write_table_to_file(filepath: str, schema: pa.Schema, table: pa.Table) -> s
@pytest.fixture
def file_int(schema_int: Schema, tmpdir: str) -> str:
- pyarrow_schema = pa.schema(schema_to_pyarrow(schema_int), metadata={"iceberg.schema": schema_int.model_dump_json()})
+ pyarrow_schema = schema_to_pyarrow(schema_int, metadata={ICEBERG_SCHEMA: bytes(schema_int.model_dump_json(), 'utf-8')})
Review Comment:
Should this string be a [constant in a lib
somewhere](https://stackoverflow.com/a/44109455)? Or we could at least create
an encodings class that centralizes all the schema stuff (e.g. creates a
constant for `'utf-8'`, hides `ICEBERG_SCHEMA`, and exposes some cleaner methods
that hide the bytes conversion, etc.).
WDYT?
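A minimal sketch of what such a centralized helper could look like, assuming `ICEBERG_SCHEMA` is the bytes key `b"iceberg.schema"` (the diff only shows the name, not its value). The names `UTF8`, `encode_schema_metadata`, and `decode_schema_metadata` are hypothetical and not part of pyiceberg:

```python
from typing import Dict, Mapping, Optional

# Assumption: the metadata key is the bytes form of the old "iceberg.schema" string.
ICEBERG_SCHEMA = b"iceberg.schema"
# Single place that names the encoding, instead of repeating 'utf-8' at call sites.
UTF8 = "utf-8"


def encode_schema_metadata(schema_json: str) -> Dict[bytes, bytes]:
    """Build a bytes-to-bytes metadata dict from a schema's JSON string."""
    return {ICEBERG_SCHEMA: schema_json.encode(UTF8)}


def decode_schema_metadata(metadata: Mapping[bytes, bytes]) -> Optional[str]:
    """Return the schema JSON stored in the metadata, or None if the key is absent."""
    raw = metadata.get(ICEBERG_SCHEMA)
    return raw.decode(UTF8) if raw is not None else None
```

With something like this, the fixture could call `schema_to_pyarrow(schema_int, metadata=encode_schema_metadata(schema_int.model_dump_json()))` and never touch `bytes(...)` directly.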
##########
pyiceberg/io/pyarrow.py:
##########
@@ -435,13 +435,18 @@ def delete(self, location: Union[str, InputFile, OutputFile]) -> None:
raise # pragma: no cover - If some other kind of OSError, raise the raw error
-def schema_to_pyarrow(schema: Union[Schema, IcebergType]) -> pa.schema:
- return visit(schema, _ConvertToArrowSchema())
+def schema_to_pyarrow(schema: Union[Schema, IcebergType], metadata: Dict[bytes, bytes] = EMPTY_DICT) -> pa.schema:
+ return visit(schema, _ConvertToArrowSchema(metadata))
Review Comment:
What is the `visit()` behavior with an empty dict?
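
For context, the diff passes the metadata to the visitor's constructor rather than to `visit()` itself, so `visit()` would presumably only dispatch as before. A toy sketch of that pattern follows. `ToySchemaVisitor` and this `visit` are hypothetical stand-ins, not the pyiceberg implementations, and skipping empty metadata is an assumption about the desired behavior:

```python
from types import MappingProxyType
from typing import Any, Dict, List, Mapping, Tuple

# Stand-in for an immutable empty default, so the default is never mutated by accident.
EMPTY_DICT: Mapping[bytes, bytes] = MappingProxyType({})


class ToySchemaVisitor:
    """Hypothetical visitor that carries optional schema-level metadata."""

    def __init__(self, metadata: Mapping[bytes, bytes] = EMPTY_DICT) -> None:
        self._metadata = metadata

    def schema(self, fields: List[Tuple[str, str]]) -> Dict[str, Any]:
        result: Dict[str, Any] = {"fields": list(fields)}
        # An empty mapping is falsy, so no "metadata" entry is attached in that case.
        if self._metadata:
            result["metadata"] = dict(self._metadata)
        return result


def visit(fields: List[Tuple[str, str]], visitor: ToySchemaVisitor) -> Dict[str, Any]:
    """Toy dispatcher: it never looks at the metadata, only the visitor does."""
    return visitor.schema(fields)
```

Under these assumptions, calling `visit(fields, ToySchemaVisitor())` gives the same result as before the metadata parameter existed, which is the behavior a test for the empty-dict case could pin down.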
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]