waynexia commented on pull request #792:
URL: https://github.com/apache/arrow-datafusion/pull/792#issuecomment-920182797


   Hi @houqp, here are some updates.
   
   The reproducer using the Rust DataFrame API is
[here](https://github.com/apache/arrow-datafusion/pull/792/commits/64330bdbe794b91789cdf45cd2127fe5b418a1a7#diff-a119e5d1231fc6f2551e39bf9427ed1499e18905054632e441895684e372c7afR2162).
IMO the problem we are facing is that the `Filter` plan doesn't have its own
schema. It also isn't required to come directly after the plan it filters
(a bit weird to me, though I haven't checked how other systems handle this),
and that is what causes this problem. Taking the reproducer as an example, the
optimizer uses the input plan's schema to look up the expression's type and
gets nothing. The `Filter` is actually applied to the table scan plan, and
that's where the schema should come from. Later, another optimizer,
`FilterPushDown`, moves the `Filter` plan to right after the table scan plan.
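To make the failure mode concrete, here is a minimal Rust sketch over a hypothetical, heavily simplified plan model. The `Plan`, `Schema`, `DataType`, `lookup`, `filter_column_type` and `example_plan` names are illustrative assumptions, not DataFusion's API. The plan is contrived so that the column referenced by the `Filter` exists in the `TableScan` schema but not in the schema of the `Filter`'s immediate input, so resolving the type against that input alone yields nothing.

```rust
// Hypothetical, simplified plan model for illustration only; these types are
// NOT DataFusion's `LogicalPlan`, `DFSchema`, or Arrow's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

// A schema is modelled as an ordered list of (column name, type) pairs.
type Schema = Vec<(String, DataType)>;

#[allow(dead_code)]
enum Plan {
    TableScan { schema: Schema },
    Projection { schema: Schema, input: Box<Plan> },
    // Like the Filter described above, this node has no schema of its own.
    Filter { column: String, input: Box<Plan> },
}

impl Plan {
    // Output schema of a node; a Filter just forwards its input's schema.
    fn schema(&self) -> &Schema {
        match self {
            Plan::TableScan { schema } | Plan::Projection { schema, .. } => schema,
            Plan::Filter { input, .. } => input.schema(),
        }
    }
}

// Find the type of a column by name in a single schema.
fn lookup(schema: &Schema, name: &str) -> Option<DataType> {
    schema.iter().find(|(n, _)| n == name).map(|(_, t)| *t)
}

// Resolve the filtered column's type against the immediate input only,
// i.e. the lookup strategy that comes back empty in the reproducer.
fn filter_column_type(plan: &Plan) -> Option<DataType> {
    match plan {
        Plan::Filter { column, input } => lookup(input.schema(), column),
        _ => None,
    }
}

fn example_plan() -> Plan {
    let scan = Plan::TableScan {
        schema: vec![("a".to_string(), DataType::Int64), ("b".to_string(), DataType::Utf8)],
    };
    // Contrived: the intermediate node only exposes `a`, but the Filter refers to `b`.
    let proj = Plan::Projection {
        schema: vec![("a".to_string(), DataType::Int64)],
        input: Box::new(scan),
    };
    Plan::Filter { column: "b".to_string(), input: Box::new(proj) }
}

fn main() {
    let plan = example_plan();
    // `b` exists in the TableScan, but not in the Filter's immediate input.
    println!("{:?}", filter_column_type(&plan)); // prints "None"
}
```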
   
   I changed how the schema is obtained in
https://github.com/apache/arrow-datafusion/pull/792/commits/64330bdbe794b91789cdf45cd2127fe5b418a1a7.
It now collects all the schemas under the `Filter` plan and merges them into
one for type lookups. This passes the tests (locally), but I'm wondering
whether there are other approaches. One I have in mind is to run this
optimizer after `FilterPushDown`, which might solve the problem since the
`Filter` would then be in the "right" place.
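The merging approach can be sketched over a hypothetical, simplified plan model: walk every node under the `Filter`, collect their fields into one schema, and resolve the column type against that. The `Plan`, `Schema`, `DataType`, `merge_schemas`, `add_fields` and `lookup` names are assumptions for illustration, not the code in the linked commit or DataFusion's API.

```rust
// Hypothetical, simplified plan model; NOT DataFusion's types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

type Schema = Vec<(String, DataType)>;

enum Plan {
    TableScan { schema: Schema },
    Projection { schema: Schema, input: Box<Plan> },
    // The Filter node carries no schema of its own.
    Filter { column: String, input: Box<Plan> },
}

// Append fields from `schema` whose names are not already in `merged`.
fn add_fields(schema: &Schema, merged: &mut Schema) {
    for (name, ty) in schema {
        if !merged.iter().any(|(n, _)| n == name) {
            merged.push((name.clone(), *ty));
        }
    }
}

// Visit `plan` and every node below it, top-down, so a name defined closer to
// the top wins over the same name defined deeper in the tree.
fn merge_schemas(plan: &Plan, merged: &mut Schema) {
    match plan {
        Plan::TableScan { schema } => add_fields(schema, merged),
        Plan::Projection { schema, input } => {
            add_fields(schema, merged);
            merge_schemas(input, merged);
        }
        // A Filter contributes no fields; just descend into its input.
        Plan::Filter { input, .. } => merge_schemas(input, merged),
    }
}

fn lookup(schema: &Schema, name: &str) -> Option<DataType> {
    schema.iter().find(|(n, _)| n == name).map(|(_, t)| *t)
}

// Resolve the filtered column's type against all schemas under the Filter.
fn filter_column_type(plan: &Plan) -> Option<DataType> {
    match plan {
        Plan::Filter { column, input } => {
            let mut merged = Schema::new();
            merge_schemas(input, &mut merged);
            lookup(&merged, column)
        }
        _ => None,
    }
}

fn example_plan() -> Plan {
    let scan = Plan::TableScan {
        schema: vec![("a".to_string(), DataType::Int64), ("b".to_string(), DataType::Utf8)],
    };
    // Contrived: the Filter's immediate input does not expose `b`.
    let proj = Plan::Projection {
        schema: vec![("a".to_string(), DataType::Int64)],
        input: Box::new(scan),
    };
    Plan::Filter { column: "b".to_string(), input: Box::new(proj) }
}

fn main() {
    let plan = example_plan();
    println!("{:?}", filter_column_type(&plan)); // prints "Some(Utf8)"
}
```

Deduplicating by name and letting nodes nearer the `Filter` win is only one possible policy for name clashes; it is an assumption of this sketch, not necessarily what the commit does.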

