Re: Compute expression using pc.call_function not working as expected

2022-04-21 Thread Partha Dutta
I'll need to look at that. Right now I am recursively flattening the struct and then using an Expression to filter. Any way to push this lower in the stack would be more performant. On Thu, Apr 21, 2022, 6:02 PM David Li wrote: > Coincidentally there was a StackOverflow question about this

Re: Compute expression using pc.call_function not working as expected

2022-04-21 Thread David Li
Coincidentally there was a StackOverflow question about this recently too with some answers outlining approaches for 7.0 and 8.0: https://stackoverflow.com/questions/71945507/how-can-i-filter-or-select-sub-fields-of-structtype-columns-in-pyarrow On Thu, Apr 21, 2022, at 17:46, Weston Pace

Re: Compute expression using pc.call_function not working as expected

2022-04-21 Thread Weston Pace
Awesome. I've created ARROW-16275[1] to track this. Also, I discovered that, starting with 8.0.0, we have support for expressing nested references in Python, so you can write:

    dataset.to_table(filter=ds.field("values", "one") < 200)

[1] https://issues.apache.org/jira/browse/ARROW-16275 On

Re: Compute expression using pc.call_function not working as expected

2022-04-20 Thread Weston Pace
No and no. This filter will not be used for predicate pushdown now or in 8.0.0. It could possibly come after 8.0.0. If Parquet stores statistics for each column of a struct array (I don't know offhand if it does) then we should create a JIRA to expose this. On Wed, Apr 20, 2022, 11:01 AM Partha

Re: Compute expression using pc.call_function not working as expected

2022-04-20 Thread Partha Dutta
That works! Thanks. Do you know offhand if this filter would be used in a predicate pushdown for a Parquet dataset? Or might it be coming in version 8.0.0? On Wed, Apr 20, 2022 at 3:49 PM Weston Pace wrote: > The second argument to `call_function` should be a list (the args to > the

Re: Compute expression using pc.call_function not working as expected

2022-04-20 Thread Weston Pace
The second argument to `call_function` should be a list (the args to the function). Since `arr3` is iterable it is interpreting it as a list of args and trying to treat each row as an argument to your call (this is the reason it thinks you have 3 arguments). This should work:

Compute expression using pc.call_function not working as expected

2022-04-20 Thread Partha Dutta
I'm trying to use the compute function struct_field to create an expression for dataset filtering, but I'm running into an error. This is the code snippet:

    arr1 = pa.array([100, 200, 300])
    arr2 = pa.array([400, 500, 600])
    arr3 = pa.StructArray.from_arrays([arr1, arr2], ["one", "two"])
    e =