Dear Calcite devs,
First of all, I really appreciate having a mature framework like Calcite.
Please continue your great work on this project!
My use case is feeding Calcite (v1.35.0) an SQL query and optimizing it
by providing metadata and a selected set of planner rules. I initialize
the Volcano planner and convert the logical plan derived from the SQL
into a physical plan (using the bindable convention).
After the optimization, I convert the physical plan back to SQL, hoping
that it executes faster on a PostgreSQL server than the original query
does.
There are some aspects I don't understand regarding both the cost
calculation and cost propagation of (Rel) Subsets in the tree-based plan
representation generated by RuleMatchVisualizer.
AFAIK Subsets don't have any costs of their own [1], so I'm really
confused why the (cumulative) `cpu` is higher in the subset than in its
child elements (BindableJoin and BindableFilter), see [2]. In addition,
the cost metric `rows` is smaller(!) than the values provided by the
children.
What I expect is that the Subset has exactly the same `rows`, `cpu` (and
`io`) as the selected (purple) child element.
In this subtree [3], the cost propagation works as expected.
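To make that expectation concrete, here is a minimal sketch in plain Java. It is a toy model, not Calcite code: the classes `Cost` and `Member`, the ordering (rows, then cpu, then io) and all numbers are assumptions made up for illustration. It only shows the behaviour I would expect, namely that the subset reports the cumulative cost of its cheapest member unchanged.

```java
import java.util.Comparator;
import java.util.List;

public class SubsetCostSketch {
    /** Hypothetical cumulative cost triple; NOT Calcite's RelOptCost. */
    static final class Cost {
        final double rows, cpu, io;
        Cost(double rows, double cpu, double io) {
            this.rows = rows; this.cpu = cpu; this.io = io;
        }
    }

    /** Hypothetical subset member: operator name plus cumulative cost. */
    static final class Member {
        final String name;
        final Cost cumulative;
        Member(String name, Cost cumulative) {
            this.name = name; this.cumulative = cumulative;
        }
    }

    // Assumed ordering for this sketch only: rows, then cpu, then io.
    static final Comparator<Cost> ORDER =
        Comparator.<Cost>comparingDouble(c -> c.rows)
            .thenComparingDouble(c -> c.cpu)
            .thenComparingDouble(c -> c.io);

    /** Returns the cheapest member; the subset would report its cost as-is. */
    static Member best(List<Member> members) {
        return members.stream()
            .min(Comparator.comparing((Member m) -> m.cumulative, ORDER))
            .get();
    }

    public static void main(String[] args) {
        // Made-up numbers, not taken from the screenshots.
        List<Member> subset = List.of(
            new Member("BindableJoin", new Cost(100.0, 500.0, 0.0)),
            new Member("BindableFilter", new Cost(50.0, 800.0, 0.0)));
        Member chosen = best(subset);
        System.out.println(chosen.name + ": rows=" + chosen.cumulative.rows
            + " cpu=" + chosen.cumulative.cpu + " io=" + chosen.cumulative.io);
    }
}
```

In this model the subset can never show a `cpu` higher or a `rows` lower than the member it selected, which is exactly what differs from what I see in [2].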
Besides that, I noticed that Calcite costs seem to have an upper bound
(9.223372036854775807E18): once a (physical operator) element in a
subtree reaches this value, costs in that subtree don't get any higher.
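The bound itself is numerically equal to Java's `Long.MAX_VALUE` (2^63 - 1). The small sketch below checks that, and also shows that Java clamps an out-of-range `double` to `Long.MAX_VALUE` when it is cast to `long`. Whether Calcite or the visualizer actually performs such a conversion is purely an assumption on my side; I have not verified it in the source. The class and method names are made up for this example.

```java
import java.math.BigDecimal;

public class CostCapSketch {
    // The bound as it appears in the visualizer output.
    static final BigDecimal OBSERVED =
        new BigDecimal("9.223372036854775807E18");

    /** True if the observed bound is exactly Long.MAX_VALUE (2^63 - 1). */
    static boolean isLongMax() {
        return OBSERVED.compareTo(BigDecimal.valueOf(Long.MAX_VALUE)) == 0;
    }

    /** A double-to-long cast clamps values that are too large; it does not wrap. */
    static long clamp(double cost) {
        return (long) cost;
    }

    public static void main(String[] args) {
        System.out.println(isLongMax());
        System.out.println(clamp(1e30) == Long.MAX_VALUE);
        System.out.println(clamp(Double.POSITIVE_INFINITY) == Long.MAX_VALUE);
    }
}
```

If something like this happens along the way, every cost above that limit would be displayed as the same value, which would match the saturation I observe.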
I know it's hard to tell what Calcite actually does from screenshots
alone. Please let me know if I should provide more, e.g., my code, to
give better insight.
Thank you in advance for your reply!
[1]: https://github.com/apache/calcite/blob/c4042a34ef054b89cec1c47fefcbc8689bad55be/core/src/main/java/org/apache/calcite/plan/volcano/RelSubset.java#L254
[2]: https://ibb.co/7jtXKH3
[3]: https://ibb.co/5BZZyLz
Best regards,
Tony