[ https://issues.apache.org/jira/browse/ARROW-11538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281102#comment-17281102 ]
Josh commented on ARROW-11538:
------------------------------

Thanks, will use casting to avoid the issue.

> [Python] Segfault reading Parquet dataset with Timestamp filter
> ---------------------------------------------------------------
>
>                 Key: ARROW-11538
>                 URL: https://issues.apache.org/jira/browse/ARROW-11538
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 3.0.0
>         Environment: platform: Linux 64bit
> conda env:
> conda create -n pya python=3.8 pyarrow=3.0.0 pandas=1.2.1 pytest -c conda-forge
>            Reporter: Josh
>            Priority: Minor
>             Fix For: 4.0.0, 3.0.1
>
>
> The first two tests pass but the third gives: Fatal Python error: Segmentation fault
> All three pass with pyarrow=2.0.0
> {code:python}
> import pandas
> import pyarrow as pa
> import pyarrow.dataset as ds
> import pyarrow.parquet as pq
> import pytest
>
>
> @pytest.fixture
> def data_path(tmp_path):
>     path = tmp_path / "data.parquet"
>     df = pandas.DataFrame(
>         [
>             ["A", pandas.Timestamp("2020-11-04")],
>         ],
>         columns=["name", "date"],
>     )
>     table = pa.Table.from_pandas(df)
>     pq.write_table(table, path, version="2.0")
>     return df, path
>
>
> @pytest.mark.parametrize(
>     "filter",
>     [
>         None,
>         ds.field("date") == "2020-11-04",
>         ds.field("date") == pandas.Timestamp("2020-11-04"),
>     ],
> )
> def test_dataset_filter(filter, data_path):
>     data, path = data_path
>     dataset = ds.dataset(path, format="parquet")
>     assert data.equals(dataset.to_table(filter=filter).to_pandas())
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)