mlondschien opened a new issue, #45161:
URL: https://github.com/apache/arrow/issues/45161
### Describe the bug, including details regarding any error messages, version, and platform.
Upon trying to reproduce a different error that occurs when using filters
(predicates) which select none of the data, I came across this error:
```python
import numpy as np
import polars as pl
import pyarrow.dataset as ds
from pyarrow.parquet import ParquetDataset

n = 1_000_000
rng = np.random.default_rng(seed=42)
data = pl.DataFrame(
    {
        "a": rng.uniform(low=0, high=2, size=n),
        "b": rng.choice(["a", "b"], n),
        "c": rng.normal(size=n),
    }
)
data.write_parquet("data.parquet", row_group_size=500_000)

df = pl.from_arrow(
    ParquetDataset(
        ["data.parquet"],
        filters=~ds.field("c").is_null() & ds.field("a") >= 3,
    ).read(columns=["b"])
)
print(df)
```
This yields
```
Traceback (most recent call last):
  File "test_arrow.py", line 24, in <module>
    ).read(columns=["b"])
    ^^^^^^^^^^^^^^^^^^^
  File "/cluster/home/lmalte/nobackup/micromamba/envs/psutil/lib/python3.11/site-packages/pyarrow/parquet/core.py", line 1485, in read
    table = self._dataset.to_table(
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/_dataset.pyx", line 553, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 399, in pyarrow._dataset.Dataset.scanner
  File "pyarrow/_dataset.pyx", line 3557, in pyarrow._dataset.Scanner.from_dataset
  File "pyarrow/_dataset.pyx", line 3475, in pyarrow._dataset.Scanner._make_scan_options
  File "pyarrow/_dataset.pyx", line 3409, in pyarrow._dataset._populate_builder
  File "pyarrow/_compute.pyx", line 2724, in pyarrow._compute._bind
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Function 'and_kleene' has no kernel matching input types (bool, double)
```
This is using
```
polars        1.14.0  py311hcc3b33b_1      conda-forge
pyarrow       18.1.0  py311h38be061_0      conda-forge
pyarrow-core  18.1.0  py311h4854187_0_cpu  conda-forge
```
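A possible explanation for the `(bool, double)` signature, offered as an assumption rather than a confirmed diagnosis: in Python, `&` binds more tightly than `>=`, so the filter above would be grouped as `(~ds.field("c").is_null() & ds.field("a")) >= 3`. That would ask `and_kleene` to combine a boolean with the double column `a`. The sketch below uses only the standard-library `ast` module to show how the two spellings are parsed; `field` is a stand-in name for `ds.field`, and the helper functions `outermost` and `and_operands` are hypothetical names introduced here for illustration.

```python
import ast

# `field` stands in for `ds.field`; the strings are only parsed, never executed.
unparenthesized = '~field("c").is_null() & field("a") >= 3'
parenthesized = '~field("c").is_null() & (field("a") >= 3)'


def outermost(source: str) -> str:
    """Name the AST node type of the outermost operation in `source`."""
    return type(ast.parse(source, mode="eval").body).__name__


def and_operands(source: str) -> tuple[str, str]:
    """Return the source text of the two operands of `&` in `source`."""
    root = ast.parse(source, mode="eval").body
    # Without parentheses, the `&` sits on the left-hand side of the comparison.
    bit_and = root.left if isinstance(root, ast.Compare) else root
    return ast.unparse(bit_and.left), ast.unparse(bit_and.right)


# Unparenthesized: the comparison is outermost, and `&` sees the bare column.
print(outermost(unparenthesized), and_operands(unparenthesized))
# Parenthesized: `&` is outermost, and its right operand is the comparison.
print(outermost(parenthesized), and_operands(parenthesized))
```

If that reading is right, wrapping the comparison in parentheses, as in `filters=~ds.field("c").is_null() & (ds.field("a") >= 3)`, would hand `and_kleene` two boolean operands.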
### Component(s)
Python