[GitHub] [arrow-datafusion] waynexia commented on pull request #792: Implement basic common subexpression eliminate optimization

GitBox Wed, 15 Sep 2021 09:38:37 -0700


waynexia commented on pull request #792:
URL: https://github.com/apache/arrow-datafusion/pull/792#issuecomment-920182797

Hi @houqp, here are some updates.

The reproducer using the Rust DataFrame API is
[here](https://github.com/apache/arrow-datafusion/pull/792/commits/64330bdbe794b91789cdf45cd2127fe5b418a1a7#diff-a119e5d1231fc6f2551e39bf9427ed1499e18905054632e441895684e372c7afR2162).
IMO the problem we are facing is that the `Filter` plan doesn't have its own
schema, and it isn't required to come right after the plan it is going to
filter either (a bit weird to me... I haven't inspected how other systems
behave). Taking the reproducer as an example, the optimizer uses the schema of
the `Filter`'s input plan to look up the expr's type and gets nothing. The
`Filter` is actually applied to the table scan plan, and that's where the
schema has to come from. Later, another optimizer, `FilterPushDown`, moves the
`Filter` plan to right after the table scan plan.

I changed how the schema is obtained in
https://github.com/apache/arrow-datafusion/pull/792/commits/64330bdbe794b91789cdf45cd2127fe5b418a1a7.
It now collects all the schemas under the `Filter` plan and merges them into
one for lookup. This passes the tests (locally), but I'm wondering whether
there are other approaches to achieve this. One I have in mind is to place
this optimizer after `FilterPushDown`, which may solve the problem since the
`Filter` would already be in the "right" place by then.
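
To make the schema-merging idea concrete, here is a minimal, self-contained sketch. It does not use DataFusion's actual types or the code from the commit: `Plan`, `Schema`, `DataType`, `merged_schema_below`, and the two lookup helpers are hypothetical stand-ins, and the `Filter` -> `Projection` -> `TableScan` plan shape is an assumption for illustration, not the real reproducer.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for illustration only; these are NOT DataFusion's types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

/// A schema is modelled as column name -> type.
type Schema = BTreeMap<String, DataType>;

#[derive(Debug)]
enum Plan {
    TableScan { schema: Schema },
    Projection { schema: Schema, input: Box<Plan> },
    /// Carries no schema of its own, like the `Filter` discussed above.
    Filter { input: Box<Plan> },
}

impl Plan {
    /// Schema exposed by a node; `Filter` forwards its input's schema.
    fn schema(&self) -> &Schema {
        match self {
            Plan::TableScan { schema } | Plan::Projection { schema, .. } => schema,
            Plan::Filter { input } => input.schema(),
        }
    }

    /// The single child of a node, if any.
    fn input(&self) -> Option<&Plan> {
        match self {
            Plan::TableScan { .. } => None,
            Plan::Projection { input, .. } | Plan::Filter { input } => Some(&**input),
        }
    }
}

/// Walk every node below `plan` and merge their schemas into one.
/// On a name clash, the node closest to `plan` wins (an arbitrary choice here).
fn merged_schema_below(plan: &Plan) -> Schema {
    let mut merged = Schema::new();
    let mut current = plan.input();
    while let Some(node) = current {
        for (name, ty) in node.schema() {
            merged.entry(name.clone()).or_insert_with(|| ty.clone());
        }
        current = node.input();
    }
    merged
}

/// Look the column up in the immediate input's schema only.
fn lookup_in_input(filter: &Plan, column: &str) -> Option<DataType> {
    filter.input()?.schema().get(column).cloned()
}

/// Look the column up in the merged schema of everything below the filter.
fn lookup_in_merged(filter: &Plan, column: &str) -> Option<DataType> {
    merged_schema_below(filter).get(column).cloned()
}

/// Build a schema from (name, type) pairs.
fn schema_of(columns: &[(&str, DataType)]) -> Schema {
    columns.iter().map(|(n, t)| (n.to_string(), t.clone())).collect()
}

/// Filter -> Projection(a) -> TableScan(a, b): column `b` is only visible at the scan.
fn example_plan() -> Plan {
    let scan = Plan::TableScan {
        schema: schema_of(&[("a", DataType::Utf8), ("b", DataType::Int64)]),
    };
    let projection = Plan::Projection {
        schema: schema_of(&[("a", DataType::Utf8)]),
        input: Box::new(scan),
    };
    Plan::Filter { input: Box::new(projection) }
}
```

With `example_plan()`, `lookup_in_input(&plan, "b")` finds nothing because the projection only exposes `a`, while `lookup_in_merged(&plan, "b")` resolves the type from the scan. The trade-off in this sketch is that merging needs a rule for name clashes, which running the optimizer after `FilterPushDown` would avoid.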

