I've fixed this issue and opened a PR: https://github.com/apache/arrow/pull/8461
Could someone review it? I suggest Philip Moritz, who contributed the original Cython integration layer, as a good candidate. Since I cannot assign reviewers myself, I thought it best to ask on the mailing list.
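
To make the request concrete, here is a plain-Python sketch of the behavior asked for in the thread below: evaluate the filter condition to get the selected row indices, then compute `a + b` only for those rows. It deliberately does not use the `pyarrow.gandiva` API. The helper names `select_rows` and `project_selected` are made up for illustration, and the sketch only emulates the semantics, not how Gandiva implements them.

```python
# Plain-Python emulation of "filter, then project only the selected rows".
# NOT the Gandiva API: select_rows and project_selected are hypothetical helpers.

# Same data as in the original question.
a = [1.0, 31.0, 46.0, 3.0, 57.0, 44.0, 22.0]
b = [5.0, 45.0, 36.0, 73.0, 83.0, 23.0, 76.0]


def select_rows(a, b):
    """Return indices of rows where (a < 50 and a > b) or b < 11.

    This plays the role of the selection vector produced by the filter.
    """
    return [
        i
        for i, (x, y) in enumerate(zip(a, b))
        if (x < 50.0 and x > y) or y < 11.0
    ]


def project_selected(a, b, selection):
    """Evaluate the expression a + b only for the row indices in `selection`."""
    return [a[i] + b[i] for i in selection]


selection = select_rows(a, b)
c = project_selected(a, b, selection)
print(selection)
print(c)
```

With the data from the thread, rows 0, 2 and 5 pass the condition, so the projected column `c` holds 6.0, 82.0 and 67.0.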

On Tue, Oct 6, 2020 at 4:37 PM Kirill Lykov <[email protected]> wrote:

> I've created: https://issues.apache.org/jira/browse/ARROW-10197
> I put priority "Trivial" -- not sure if it is correct.
>
> On Tue, Oct 6, 2020 at 3:41 PM Wes McKinney <[email protected]> wrote:
>
>> This looks like something to improve in the Python bindings. Would you
>> like to open a JIRA issue about it?
>>
>> On Tue, Oct 6, 2020 at 4:26 AM Kirill Lykov <[email protected]> wrote:
>> >
>> > Hi,
>> >
>> > I'm trying to write code in Python which executes an expression on
>> > filtered data. So I create a filter and later a projector for some
>> > expression, but I don't get how to combine those two in Python:
>> >
>> > ```python
>> > import pyarrow as pa
>> > import pyarrow.gandiva as gandiva
>> >
>> > table = pa.Table.from_arrays([pa.array([1., 31., 46., 3., 57., 44., 22.]),
>> >                               pa.array([5., 45., 36., 73., 83., 23., 76.])],
>> >                              ['a', 'b'])
>> >
>> > builder = gandiva.TreeExprBuilder()
>> > node_a = builder.make_field(table.schema.field("a"))
>> > node_b = builder.make_field(table.schema.field("b"))
>> > fifty = builder.make_literal(50.0, pa.float64())
>> > eleven = builder.make_literal(11.0, pa.float64())
>> >
>> > cond_1 = builder.make_function("less_than", [node_a, fifty], pa.bool_())
>> > cond_2 = builder.make_function("greater_than", [node_a, node_b], pa.bool_())
>> > cond_3 = builder.make_function("less_than", [node_b, eleven], pa.bool_())
>> > cond = builder.make_or([builder.make_and([cond_1, cond_2]), cond_3])
>> > condition = builder.make_condition(cond)
>> >
>> > filter = gandiva.make_filter(table.schema, condition)
>> > # filterResult has type SelectionVector
>> > filterResult = filter.evaluate(table.to_batches()[0], pa.default_memory_pool())
>> > print(filterResult)
>> >
>> > sum = builder.make_function("add", [node_a, node_b], pa.float64())
>> > field_result = pa.field("c", pa.float64())
>> > expr = builder.make_expression(sum, field_result)
>> > projector = gandiva.make_projector(
>> >     table.schema, [expr], pa.default_memory_pool())
>> >
>> > ### Here there is a problem that I don't know how to use filterResult with projector
>> > r, = projector.evaluate(table.to_batches()[0], filterResult)
>> > ```
>> >
>> > In C++, I see that it is possible to pass SelectionVector as second
>> > argument to projector::Evaluate:
>> > https://github.com/apache/arrow/blob/c5fa23ea0e15abe47b35524fa6a79c7b8c160fa0/cpp/src/gandiva/tests/filter_project_test.cc#L270
>> >
>> > Meanwhile, it looks like it is impossible in `gandiva.pyx`:
>> > https://github.com/apache/arrow/blob/a4eb08d54ee0d4c0d0202fa0a2dfa8af7aad7a05/python/pyarrow/gandiva.pyx#L154
>> >
>> > --
>> > Best regards,
>> > Kirill Lykov
>> >
>
> --
> Best regards,
> Kirill Lykov
>

--
Best regards,
Kirill Lykov
