[ https://issues.apache.org/jira/browse/ARROW-9345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Arrow JIRA Bot reassigned ARROW-9345:
--------------------------------------------

    Assignee: Ben Kietzman  (was: Apache Arrow JIRA Bot)

> [C++][Dataset] Expression with dictionary type should work with operand of value type
> --------------------------------------------------------------------------------------
>
>                 Key: ARROW-9345
>                 URL: https://issues.apache.org/jira/browse/ARROW-9345
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Assignee: Ben Kietzman
>            Priority: Major
>              Labels: dataset, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Related to ARROW-8647, see comment at
> https://github.com/apache/arrow/pull/7536#issuecomment-653124260
> When using dictionary type for the partition fields, this now creates
> partition expressions that also use a dictionary type. This means that doing
> something like {{dataset.to_table(filter=ds.field("part") == "A")}} to filter
> on the partition field with a plain string expression doesn't work, limiting
> the usability of this option (and even with the new Python scalar stuff, it
> would not be easy to construct the correct expression):
> {code}
> In [9]: part = ds.HivePartitioning.discover(max_partition_dictionary_size=2)
> In [10]: dataset = ds.dataset("test_partitioned_filter/", format="parquet",
>     partitioning=part)
> In [11]: fragment = list(dataset.get_fragments())[0]
> In [12]: fragment.partition_expression
> Out[12]:
> <pyarrow.dataset.Expression (part == [
>   "A",
>   "B"
> ][0]:dictionary<values=string, indices=int32, ordered=0>)>
> In [13]: dataset.to_table(filter=ds.field("part") == "A")
> ...
> ArrowNotImplementedError: cast from string
> {code}
> It might be an option to have the {{partition_expression}} use the dictionary
> _value type_ instead of the dictionary type?
> Or alternatively, as [~fsaintjacques] proposed, ensure that any comparison
> involving the dict type also works with the "effective" logical type (the
> value type of the dict).
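
To make the second proposal concrete, here is a minimal pure-Python sketch of what "compare against the value type" could mean for a dictionary-encoded column. It is not Arrow's implementation and does not use pyarrow: `DictEncoded` and `equals_value` are hypothetical names invented for illustration, and nulls are ignored. The idea is to resolve the plain operand against the dictionary once and then compare indices, so no cast from string to the dictionary type is needed.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class DictEncoded:
    """Toy stand-in for a dictionary-encoded column (hypothetical, not pyarrow)."""

    dictionary: Sequence[str]  # the distinct values, e.g. ["A", "B"]
    indices: Sequence[int]     # one index into `dictionary` per row

    def decode(self) -> List[str]:
        """Materialize the column in its value type."""
        return [self.dictionary[i] for i in self.indices]


def equals_value(column: DictEncoded, value: str) -> List[bool]:
    """Compare a dictionary-encoded column with a plain value of its value type."""
    # Resolve the operand against the dictionary once. A set is used so that
    # a dictionary containing duplicate entries is still handled correctly.
    matching = {i for i, v in enumerate(column.dictionary) if v == value}
    # Rows match when their index points at a matching dictionary entry.
    # If the value is absent from the dictionary, `matching` is empty and
    # every row compares as False.
    return [i in matching for i in column.indices]


part = DictEncoded(dictionary=["A", "B"], indices=[0, 1, 0])
mask = equals_value(part, "A")
```

The result is the same mask one would get by decoding the column to strings first and comparing row by row, which is the behaviour a filter such as `ds.field("part") == "A"` would be expected to have under this proposal.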