Hi, I've spent some time stabilizing the costs (see https://github.com/apache/calcite/pull/1702/commits), and it looks like we might want to have some notion of a "cost unit".
For instance, we want to express that sorting a table with 2 int columns is cheaper than sorting a table with 22 int columns. Unfortunately, VolcanoCost is compared by the rows field only, so for now we fold the number of fields into cost#rows by adding something like "cost += fieldCount * 0.1" :( Of course, you might say that the cost model is pluggable; however, I would like to make the default implementation sane, or at least good enough to serve as an inspiration.

What do you think about adding a way to convert a Cost to a double? For instance, we could add measurer.measure(cost) that would weigh the cost, or we could add a method like `double cost#toSeconds()`. I guess, if we pick a unit (e.g. microseconds), we could even micro-benchmark different join implementations and use the measured values for the cost of extra columns and so on. I fully understand that the cost does not have to be precise; however, it is sad to have to guesstimate the multiplier for an extra field in a projection.

Just to recap:
1) I started with making tests parallel <-- this was the main goal
2) Then I ran into EnumerableJoinTest#testSortMergeJoinWithEquiCondition, which was using static variables
3) As I fixed EnumerableMergeJoinRule, it turned out the optimizer started to use merge join all over the place
4) That was caused by inappropriate costing of Sort, which I fixed
5) Then I updated the cost functions of EnumerableHashJoin and EnumerableNestedLoopJoin, and it was not enough because ReflectiveSchema was not providing proper statistics
6) Then I updated ReflectiveSchema and Calc to propagate uniqueness and row-count metadata

All of the above seems to be more or less stable (the plans improved!), and the tests that still fail, for now, are in MaterializationTest. The problem with those tests is that the cost differences between NestedLoopJoin and HashJoin plans are tiny. For instance, testJoinMaterialization8:

EnumerableProject(empid=[$2]): rowcount = 6.6, cumulative cost = {105.19999999999999 rows, 82.6 cpu, 0.0 io}, id = 780
  EnumerableHashJoin(condition=[=($1, $4)], joinType=[inner]): rowcount = 6.6, cumulative cost = {98.6 rows, 76.0 cpu, 0.0 io}, id = 779
    EnumerableProject(name=[$0], name0=[CAST($0):VARCHAR]): rowcount = 22.0, cumulative cost = {44.0 rows, 67.0 cpu, 0.0 io}, id = 777
      EnumerableTableScan(table=[[hr, m0]]): rowcount = 22.0, cumulative cost = {22.0 rows, 23.0 cpu, 0.0 io}, id = 152
    EnumerableProject(empid=[$0], name=[$1], name0=[CAST($1):VARCHAR]): rowcount = 2.0, cumulative cost = {4.0 rows, 9.0 cpu, 0.0 io}, id = 778
      EnumerableTableScan(table=[[hr, dependents]]): rowcount = 2.0, cumulative cost = {2.0 rows, 3.0 cpu, 0.0 io}, id = 125

vs

EnumerableProject(empid=[$0]): rowcount = 6.6, cumulative cost = {81.19999999999999 rows, 55.6 cpu, 0.0 io}, id = 778
  EnumerableNestedLoopJoin(condition=[=(CAST($1):VARCHAR, CAST($2):VARCHAR)], joinType=[inner]): rowcount = 6.6, cumulative cost = {74.6 rows, 49.0 cpu, 0.0 io}, id = 777
    EnumerableTableScan(table=[[hr, dependents]]): rowcount = 2.0, cumulative cost = {2.0 rows, 3.0 cpu, 0.0 io}, id = 125
    EnumerableTableScan(table=[[hr, m0]]): rowcount = 22.0, cumulative cost = {22.0 rows, 23.0 cpu, 0.0 io}, id = 152

The second plan looks cheaper to the optimizer; however, the key difference comes from the three Projects in the first plan (they account for 6.6 + 22 + 2 = 30.6 rows of cost). If I increase the hr.dependents table to 3 rows, the hash-based plan becomes cheaper.
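Just to make the "measurer" idea above concrete, here is a rough sketch of what I mean. The class name, the method name, the choice of microseconds and all the weights are made up for illustration; ideally the weights would come from micro-benchmarks of the Enumerable operators rather than from guessing:

import org.apache.calcite.plan.RelOptCost;

/**
 * Sketch: collapse a RelOptCost into a single scalar with an explicit unit
 * (microseconds here), so costs of different shapes become comparable.
 */
public class CostMeasurer {
  private final double microsPerRow; // cost of producing one output row
  private final double microsPerCpu; // cost of one "cpu" unit (per-field work, comparisons, ...)
  private final double microsPerIo;  // cost of one "io" unit

  public CostMeasurer(double microsPerRow, double microsPerCpu, double microsPerIo) {
    this.microsPerRow = microsPerRow;
    this.microsPerCpu = microsPerCpu;
    this.microsPerIo = microsPerIo;
  }

  /** Converts the three cost components into estimated microseconds. */
  public double toMicros(RelOptCost cost) {
    return cost.getRows() * microsPerRow
        + cost.getCpu() * microsPerCpu
        + cost.getIo() * microsPerIo;
  }
}

With something like that, comparing plans could boil down to comparing scalars in a known unit, and the price of an extra field in a Project would be a measured number instead of a 0.1 pulled out of thin air.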
Coming back to the plans: to me, both look acceptable; however, it is sad to analyze/debug such differences without being able to tell whether they represent a plan degradation or are acceptable.

Vladimir