Fokko commented on code in PR #6997:
URL: https://github.com/apache/iceberg/pull/6997#discussion_r1124329104
##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -476,6 +483,217 @@ def expression_to_pyarrow(expr: BooleanExpression) -> pc.Expression:
    return boolean_expression_visit(expr, _ConvertToArrowExpression())
+def pyarrow_to_schema(schema: pa.Schema) -> Schema:
Review Comment:
I think this is a great start, but I would suggest doing it the same way as
the Iceberg Schema visitor:
https://github.com/apache/iceberg/blob/d42d1e89c0616c203f7ad29f002811ddd440e14f/python/pyiceberg/schema.py#L710-L774
Here we use the `functools.singledispatch` decorator to automatically call the
right implementation based on the type of the first argument.
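```python
from typing import cast

import pyarrow as pa

# ArrowSchemaVisitor, T and visit_arrow are assumed to be defined elsewhere in the module.


def visit_arrow_list(obj: pa.DataType, visitor: ArrowSchemaVisitor[T]) -> T:
    # Manual type check: every handler has to verify its own input type
    if not pa.types.is_list(obj):
        raise TypeError(f"Expected list type, got {type(obj)}")
    obj = cast(pa.ListType, obj)
    visitor.before_list_element(obj.value_field)
    list_result = visit_arrow(obj.value_field.type, visitor)
    visitor.after_list_element(obj.value_field)
    return visitor.list(obj, list_result)
```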
Would become:
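```python
# visit_pyarrow is the generic function declared with @functools.singledispatch
@visit_pyarrow.register(pa.lib.ListType)
def visit_arrow_list(obj: pa.lib.ListType, visitor: ArrowSchemaVisitor[T]) -> T:
    # No type check needed: singledispatch only routes ListType instances here
    visitor.before_list_element(obj.value_field)
    # Recurse through the dispatcher so nested types are routed the same way
    list_result = visit_pyarrow(obj.value_field.type, visitor)
    visitor.after_list_element(obj.value_field)
    return visitor.list(obj, list_result)
```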
This way we don't have to do all the type checking ourselves; we push it down
to `singledispatch`. I did some benchmarking, and it is also faster than trying
to implement the dispatch ourselves (the dispatch is essentially a cached
dictionary lookup on the argument's class).
The catch-all for primitive types would then be the `DataType` registration,
since primitives such as `pa.int32()` are plain `DataType` instances:
```
>>> type(pa.int32())
<class 'pyarrow.lib.DataType'>
```
--
This is an automated message from the Apache Git Service.