Hi Zack, Looks like it is a regression. Are you able to provide a reproducible test case? You can log a JIRA along with the test case, so people can do the root cause analysis.
Thanks, Haisheng Yuan On 2022/03/12 01:26:53 "Gramana, Zachary (GE Digital)" wrote: > Hello all! > > I would appreciate some help/guidance with troubleshooting a big performance > regression we saw on our UNION queries after updating our application from > 1.24 to 1.29. Many of the tables in the queries are wide (> 50 columns) so we > use a custom push-down Project class. These custom Projects are being > eliminated in 1.29 UNION queries--though not when replacing UNION with UNION > ALL. > > The examples below illustrate minimal queries that reproduce the issue our > test database. > > When compared to the UNION query times using 1.24, the prepared UNION > statements take 2-2.5 times longer in 1.29. The prepare time is around 2 > times longer as well. > > In the examples below, the UNION and UNION ALL queries generate the same > result set in our test database. All Calcite tables have default collations > ("ISO-8859-1$en_US"). Several are rather wide, with many > 50 columns. > > In 1.29 for a simple query, both UNION and UNION ALL produce the same query > plan, with the exception of EnumerableUnion's `all` parameter: > > 1.29 > EXPLAIN PLAN FOR > SELECT Id FROM t1 > UNION ALL > SELECT Id FROM t2 > ----------------------------------------------------------------------------------------------------- > EnumerableUnion(all=[true]) > MyEnumerableConverter > MyProject(Id=[$0]) > MyTableScan(table=[[MySchema, t1]]) > MyEnumerableConverter > MyProject(Id=[$0]) > MyTableScan(table=[[MySchema, t2]]) > > 1.29 > EXPLAIN PLAN FOR > SELECT Id FROM t1 > UNION > SELECT Id FROM t2 > ----------------------------------------------------------------------------------------------------- > EnumerableUnion(all=[false]) > MyEnumerableConverter > MyProject(Id=[$0]) > MyTableScan(table=[[MySchema, t1]]) > MyEnumerableConverter > MyProject(Id=[$0]) > MyTableScan(table=[[MySchema, t2]]) > > The next examples add a JOIN to one of the tables in order to make this a > slightly more complicated query and trigger the issue. Unlike in 1.24, the > plans in 1.29 for UNION and UNION ALL now vary significantly: > > 1.24 (like the above queries, the UNION and UNION ALL plans differ only by > the `all` parameter, so the UNION ALL plan for 1.24 is omitted) > EXPLAIN PLAN FOR > SELECT Id FROM t1 > UNION > SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id) > ----------------------------------------------------------------------------------------------------- > EnumerableUnion(all=[false]) > MyEnumerableConverter > MyProject(Id=[$0]) > MyTableScan(table=[[MySchema, t1]]) > EnumerableCalc(expr#0..1=[{inputs}], Id=[$t1]) > EnumerableMergeJoin(condition=[=($0, $1)], joinType=[inner]) > EnumerableSort(sort0=[$0], dir0=[ASC]) > EnumerableCalc(expr#0..262=[{inputs}], expr#263=[CAST($t178):BIGINT > NOT NULL], t3_Id0=[$t263]) > MyEnumerableConverter > MyTableScan(table=[[MySchema, t2]]) > EnumerableSort(sort0=[$0], dir0=[ASC]) > MyEnumerableConverter > MyProject(Id=[$0]) > MyTableScan(table=[[MySchema, t3]]) > > 1.29 (note this UNION ALL plan is nearly identical plan to the above `UNION` > plan in 1.24) > EXPLAIN PLAN FOR > SELECT Id FROM t1 > UNION ALL > SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id) > ----------------------------------------------------------------------------------------------------- > EnumerableUnion(all=[true]) > MyEnumerableConverter > MyProject(Id=[$0]) > MyTableScan(table=[[MySchema, t1]]) > EnumerableCalc(expr#0..1=[{inputs}], Id=[$t1]) > EnumerableMergeJoin(condition=[=($0, $1)], joinType=[inner]) > EnumerableSort(sort0=[$0], dir0=[ASC]) > EnumerableCalc(expr#0..262=[{inputs}], expr#263=[CAST($t178):BIGINT > NOT NULL], t3_Id0=[$t263]) > MyEnumerableConverter > MyTableScan(table=[[MySchema, t2]]) > EnumerableSort(sort0=[$0], dir0=[ASC]) > MyEnumerableConverter > MyProject(Id=[$0]) > MyTableScan(table=[[MySchema, t3]]) > > 1.29 (note the loss of the `MyProject` node above t3's MyTableScan, and now > the inputs for the post-MergeJoin EnumerableCalc node are very wide compared > to before) > EXPLAIN PLAN FOR > SELECT Id FROM t1 > UNION > SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id) > ----------------------------------------------------------------------------------------------------- > EnumerableUnion(all=[false]) > MyEnumerableConverter > MyProject(Id=[$0]) > MyTableScan(table=[[MySchema, t1]]) > EnumerableCalc(expr#0..402=[{inputs}], Id=[$t264]) > EnumerableMergeJoin(condition=[=($263, $264)], joinType=[inner]) > EnumerableSort(sort0=[$263], dir0=[ASC]) > EnumerableCalc(expr#0..262=[{inputs}], expr#263=[CAST($t178):BIGINT > NOT NULL], proj#0..263=[{exprs}]) > MyEnumerableConverter > MyTableScan(table=[[MySchema, t2]]) > EnumerableSort(sort0=[$0], dir0=[ASC]) > MyEnumerableConverter > MyTableScan(table=[[MySchema, t3]]) > > I've tried removing various rules and even restricting them, but the custom > Projects are still removed in the EnumerableMergeJoin despite setting the > self-cost to zero. > > Am I missing something? Any suggestions on where to look next? > > Thanks for any assistance/tips. > > Best, > Zack > >