anentropic opened a new issue, #1337:
URL: https://github.com/apache/iceberg-python/issues/1337
### Apache Iceberg version
0.8.0 (latest release)
### Please describe the bug 🐞
It looks like `transform` is intended to be an optional field (?):
```python
class SortField(IcebergBaseModel):
    """Sort order field.

    Args:
      source_id (int): Source column id from the table’s schema.
      transform (str): Transform that is used to produce values to be sorted
                       on from the source column.
                       This is the same transform as described in partition
                       transforms.
      direction (SortDirection): Sort direction, that can only be either asc
                                 or desc.
      null_order (NullOrder): Null order that describes the order of null
                              values when sorted. Can only be either
                              nulls-first or nulls-last.
    """

    def __init__(
        self,
        source_id: Optional[int] = None,
        transform: Optional[
            Union[Transform[Any, Any], Callable[[IcebergType], Transform[Any, Any]]]
        ] = None,
        direction: Optional[SortDirection] = None,
        null_order: Optional[NullOrder] = None,
        **data: Any,
    ):
        if source_id is not None:
            data["source-id"] = source_id
        if transform is not None:
            data["transform"] = transform
        if direction is not None:
            data["direction"] = direction
        if null_order is not None:
            data["null-order"] = null_order
        super().__init__(**data)
```
But if I omit `transform`, as in `SortField(source_id=field.field_id)`, or
pass `None` explicitly, as in
`SortField(source_id=field.field_id, transform=None)`, then I get a pydantic
validation error:
```
ValidationError: 1 validation error for SortField
transform
  Field required [type=missing, input_value={'source-id': 4, 'directi...: NullOrder.NULLS_FIRST}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
```
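
The `type=missing` part of the error follows from the `__init__` above: a `None` argument is never copied into `data`, so omitting `transform` and passing `transform=None` look identical to pydantic. A minimal sketch of just that filtering step, where `build_sort_field_data` is a hypothetical helper written for illustration and not part of pyiceberg:

```python
from typing import Any, Dict, Optional


def build_sort_field_data(
    source_id: Optional[int] = None,
    transform: Optional[Any] = None,
    **data: Any,
) -> Dict[str, Any]:
    """Mimic the None-filtering done in SortField.__init__ (hypothetical helper)."""
    if source_id is not None:
        data["source-id"] = source_id
    if transform is not None:
        # Only reached for non-None values, so None never becomes a key.
        data["transform"] = transform
    return data


omitted = build_sort_field_data(source_id=4)
explicit_none = build_sort_field_data(source_id=4, transform=None)

# Both calls hand the same dict to the model: no "transform" key at all.
assert omitted == explicit_none == {"source-id": 4}
```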
`SortField(source_id=field.field_id, transform=IdentityTransform())` works.

`SortField(source_id=field.field_id, transform=IDENTITY)` also works, but
type checkers reject it.

I think both problems stem from here:
```python
    transform: Annotated[  # type: ignore
        Transform,
        BeforeValidator(parse_transform),
        PlainSerializer(lambda c: str(c), return_type=str),  # pylint: disable=W0108
        WithJsonSchema({"type": "string"}, mode="serialization"),
    ] = Field()
```
The type annotation doesn't make the field `Optional`, and `Field()` gives it
no default, so the field is required.

`BeforeValidator(parse_transform)` uses `parse_transform` to turn the
`IDENTITY` string constant into `IdentityTransform()`, so the type you pass
doesn't match the annotation.

For the latter, there is a technique described at
https://docs.pydantic.dev/2.0/usage/types/custom/#handling-third-party-types
that would allow passing a string constant that is converted to an instance of
the annotated `Transform` type.
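
A rough sketch of that pydantic technique, to show the shape of the idea rather than a proposed patch. Everything here is a stand-in: `Transform`, `IdentityTransform` and `parse_transform` are simplified placeholders for the pyiceberg versions, and `_TransformAnnotation` and `SortFieldSketch` are hypothetical names.

```python
from typing import Annotated, Any

from pydantic import BaseModel, GetCoreSchemaHandler
from pydantic_core import core_schema


class Transform:
    """Simplified stand-in for pyiceberg's Transform (assumption)."""


class IdentityTransform(Transform):
    def __str__(self) -> str:
        return "identity"


def parse_transform(value: str) -> Transform:
    """Toy parser that only understands "identity" (assumption)."""
    if value.lower() == "identity":
        return IdentityTransform()
    raise ValueError(f"Unknown transform: {value}")


class _TransformAnnotation:
    """Hypothetical marker telling pydantic how to handle Transform."""

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source_type: Any, handler: GetCoreSchemaHandler
    ) -> core_schema.CoreSchema:
        # Strings are validated as str first, then converted to a Transform.
        from_str = core_schema.chain_schema(
            [
                core_schema.str_schema(),
                core_schema.no_info_plain_validator_function(parse_transform),
            ]
        )
        return core_schema.json_or_python_schema(
            json_schema=from_str,
            # From Python, accept a ready-made Transform or a string.
            python_schema=core_schema.union_schema(
                [core_schema.is_instance_schema(Transform), from_str]
            ),
            # Always serialize back to the string form.
            serialization=core_schema.plain_serializer_function_ser_schema(str),
        )


class SortFieldSketch(BaseModel):
    transform: Annotated[Transform, _TransformAnnotation]


from_string = SortFieldSketch(transform="identity")
from_instance = SortFieldSketch(transform=IdentityTransform())
```

This only covers runtime validation and serialization. Whether a type checker accepts a string also depends on the explicit `__init__` signature shown earlier, which would need to list `str` as well.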