mattmartin14 commented on PR #1534:
URL: https://github.com/apache/iceberg-python/pull/1534#issuecomment-2628065002
@Fokko - I added some additional smoke tests to cover cases where the
primary key is a string or a date. The filter list code you wrote works
fine for ints and strings, but for dates I get a type error like this:
```bash
TypeError: Invalid literal value: datetime.date(2021, 1, 1)
```
For reference, here is the function to help jog your memory. Do you know how
we can update it to handle cases where a date is one of the join columns?
```python
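from pyarrow import Table as pyarrow_table  # alias assumed from the annotation below
from pyiceberg.expressions import And, BooleanExpression, EqualTo, In, Or


def get_filter_list(df: pyarrow_table, join_cols: list) -> BooleanExpression:
    # Distinct combinations of the join-column values.
    unique_keys = df.select(join_cols).group_by(join_cols).aggregate([])

    pred = None

    if len(join_cols) == 1:
        # Single key column: one IN predicate over its distinct values.
        pred = In(join_cols[0], unique_keys[0].to_pylist())
    else:
        # Composite key: OR together one AND-of-equalities per distinct row.
        pred = Or(*[
            And(*[
                EqualTo(col, row[col])
                for col in join_cols
            ])
            for row in unique_keys.to_pylist()
        ])

    return pred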
```
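One possible workaround, sketched here only as an idea: convert `datetime.date` values to ISO-8601 strings before they are passed to `In` / `EqualTo`. This assumes that pyiceberg coerces a string literal such as `"2021-01-01"` to the column's date type when the expression is bound to the schema; I have not confirmed that against this branch. The helper names `to_filter_value` and `normalize_rows` are hypothetical and are not part of pyiceberg.

```python
from datetime import date, datetime
from typing import Any


def to_filter_value(value: Any) -> Any:
    """Hypothetical helper: return a value that is safe to use as a filter literal."""
    # datetime is a subclass of date, so this branch covers both;
    # isoformat() yields "YYYY-MM-DD" for dates and
    # "YYYY-MM-DDTHH:MM:SS" for datetimes.
    if isinstance(value, date):
        return value.isoformat()
    # Ints, strings, etc. already work and pass through unchanged.
    return value


def normalize_rows(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper: apply to_filter_value to every cell of to_pylist() output."""
    return [{col: to_filter_value(val) for col, val in row.items()} for row in rows]
```

In `get_filter_list`, the single-column branch could then map `to_filter_value` over `unique_keys[0].to_pylist()`, and the multi-column branch could iterate over `normalize_rows(unique_keys.to_pylist())`.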
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]