[ https://issues.apache.org/jira/browse/ARROW-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche updated ARROW-5572:
-----------------------------------------
    Labels: dataset-parquet-read parquet  (was: parquet)

> [Python] raise error message when passing invalid filter in parquet reading
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-5572
>                 URL: https://issues.apache.org/jira/browse/ARROW-5572
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.13.0
>            Reporter: Joris Van den Bossche
>            Priority: Minor
>              Labels: dataset-parquet-read, parquet
>
> From
> https://stackoverflow.com/questions/56522977/using-predicates-to-filter-rows-from-pyarrow-parquet-parquetdataset
> For example, when specifying a column in the filter which is a normal column
> and not a key in your partitioned folder hierarchy, the filter gets silently
> ignored. It would be nice to get an error message for this.
> Reproducible example:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> df = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [0, 1, 0, 1], 'c': [1, 2, 3, 4]})
> table = pa.Table.from_pandas(df)
> pq.write_to_dataset(table, 'test_parquet_row_filters', partition_cols=['a'])
> # filter on 'a' (partition column) -> works
> pq.read_table('test_parquet_row_filters', filters=[('a', '=', 1)]).to_pandas()
> # filter on normal column (in future could do row group filtering) ->
> # silently does nothing
> pq.read_table('test_parquet_row_filters', filters=[('b', '=', 1)]).to_pandas()
> {code}
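
For illustration only, here is a minimal sketch of the kind of check the issue asks for: reject a filter that names a column which is not a partition key, rather than ignoring it silently. The helper name validate_partition_filters and the wording of its error message are hypothetical and are not part of pyarrow. The sketch also assumes that filters is a flat list of (column, op, value) tuples and that the partition key names are already known to the caller.

```python
def validate_partition_filters(filters, partition_names):
    """Raise ValueError if a filter names a column that is not a partition key.

    Hypothetical helper for illustration; not part of pyarrow.
    Assumes `filters` is a flat list of (column, op, value) tuples.
    """
    # Collect the filter columns that do not match any partition key.
    unknown = sorted({col for col, _op, _val in filters} - set(partition_names))
    if unknown:
        raise ValueError(
            "filters reference non-partition column(s) {}; "
            "valid partition keys are {}".format(unknown, sorted(partition_names))
        )
    return filters


# Mirrors the reproducible example, where the dataset is partitioned on 'a' only.
validate_partition_filters([('a', '=', 1)], ['a'])      # accepted

try:
    validate_partition_filters([('b', '=', 1)], ['a'])  # rejected
except ValueError as exc:
    error_message = str(exc)
```

With a check along these lines, the second read_table call in the example would fail loudly instead of returning the unfiltered data.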