It looks like Calcite stopped considering field names in RelNode equality as of Calcite 2.22 (which we use in Beam v2.34.0+). This can result in a planner state where two nodes that only differ by field name are considered equivalent.
I have a fix for Beam in https://github.com/apache/beam/pull/25290 and I'll send an email to the Calcite dev list with more details. Andrew On Fri, Jan 27, 2023 at 11:33 AM Andrew Pilloud <apill...@google.com> wrote: > Also this is at very least a Beam bug. You can file a Beam issue if you > want, otherwise I will when I get back. > > Andrew > > On Fri, Jan 27, 2023 at 11:27 AM Andrew Pilloud <apill...@google.com> > wrote: > >> Hi Talat, >> >> I did get your test case running and added some logging to >> RexProgramBuilder.mergePrograms. There is only one merge that occurs during >> the test and it has an output type of RecordType(JavaType(int) ID, >> JavaType(class java.lang.String) V). This does seem like the correct output >> name but it doesn't match the final output name, so something is still >> different than the Beam test case. I also modified mergePrograms to >> purposely corrupt the output names, that did not cause the test to fail or >> trip the 'assert mergedProg.getOutputRowType() == >> topProgram.getOutputRowType();' in mergePrograms. I could not find any >> Calcite unit tests for RexProgramBuilder.mergePrograms or >> CoreRules.CALC_MERGE rule so I think it is still probable that the problem >> is in this area. >> >> One minor issue I encountered. It took me a while to get your test case >> running, it doesn't appear there are any calcite gradle rules to run >> CoreQuidemTest and constructing the classpath manually was tedious. Did I >> miss something? >> >> I'm still working on this but I'm out today and Monday, it will probably >> be Wednesday before I make any more progress. >> >> Andrew >> >> On Fri, Jan 27, 2023 at 10:40 AM Talat Uyarer < >> tuya...@paloaltonetworks.com> wrote: >> >>> Hi Andrew, >>> >>> Yes This aligned also with my debugging. In My Kenn's reply you can see >>> a sql test which I wrote in Calcite. Somehow Calcite does not have this >>> issue with the 1.28 version. >>> >>> !use post >>> !set outputformat mysql >>> >>> #Test aliases with with clause >>> WITH tempTable(id, v) AS (select "hr"."emps"."empid" as id, >>> "hr"."emps"."name" as v from "hr"."emps") >>> SELECT tempTable.id as id, tempTable.v as "value" FROM tempTable WHERE >>> tempTable.v <> '11' ; >>> +-----+-----------+ >>> | ID | value | >>> +-----+-----------+ >>> | 100 | Bill | >>> | 110 | Theodore | >>> | 150 | Sebastian | >>> | 200 | Eric | >>> +-----+-----------+ >>> (4 rows) >>> >>> !ok >>> >>> >>> On Wed, Jan 25, 2023 at 6:08 PM Andrew Pilloud <apill...@google.com> >>> wrote: >>> >>>> Yes, that worked. >>>> >>>> The issue does not occur if I disable all of the following planner >>>> rules: CoreRules.FILTER_CALC_MERGE, CoreRules.PROJECT_CALC_MERGE, >>>> LogicalCalcMergeRule.INSTANCE (which wraps CoreRules.CALC_MERGE), >>>> and BeamCalcMergeRule.INSTANCE (which wraps CoreRules.CALC_MERGE). >>>> >>>> All the rules share a common call to RexProgramBuilder.mergePrograms, >>>> so I suspect the problem lies there. I spent some time looking but wasn't >>>> able to find it by code inspection, it looks like this code path is doing >>>> the right thing with names. I'll spend some time tomorrow trying to >>>> reproduce this on pure Calcite. >>>> >>>> Andrew >>>> >>>> >>>> On Tue, Jan 24, 2023 at 8:24 PM Talat Uyarer < >>>> tuya...@paloaltonetworks.com> wrote: >>>> >>>>> Hi Andrew, >>>>> >>>>> Thanks for writing a test for this use case. Without Where clause it >>>>> works as expected on our test cases also too. Please add where clause on >>>>> second select. With the below query it does not return column names. I >>>>> tested on my local also. >>>>> >>>>> WITH tempTable (id, v) AS (SELECT f_int as id, f_string as v FROM >>>>> PCOLLECTION) SELECT id AS fout_int, v AS fout_string FROM tempTable WHERE >>>>> id > 1 >>>>> >>>>> Thanks >>>>> >>>>> On Tue, Jan 24, 2023 at 5:28 PM Andrew Pilloud <apill...@google.com> >>>>> wrote: >>>>> >>>>>> +dev@beam.apache.org <dev@beam.apache.org> >>>>>> >>>>>> I tried reproducing this but was not successful, the output schema >>>>>> was as expected. I added the following to BeamSqlMultipleSchemasTest.java >>>>>> at head. (I did discover >>>>>> that PAssert.that(result).containsInAnyOrder(output) doesn't validate >>>>>> column names however.) >>>>>> >>>>>> @Test >>>>>> public void testSelectAs() { >>>>>> PCollection<Row> input = pipeline.apply(create(row(1, "strstr"))); >>>>>> >>>>>> PCollection<Row> result = >>>>>> input.apply(SqlTransform.query("WITH tempTable (id, v) AS >>>>>> (SELECT f_int as id, f_string as v FROM PCOLLECTION) SELECT id AS >>>>>> fout_int, >>>>>> v AS fout_string FROM tempTable")); >>>>>> >>>>>> Schema output_schema = >>>>>> >>>>>> Schema.builder().addInt32Field("fout_int").addStringField("fout_string").build(); >>>>>> assertThat(result.getSchema(), equalTo(output_schema)); >>>>>> >>>>>> Row output = Row.withSchema(output_schema).addValues(1, >>>>>> "strstr").build(); >>>>>> PAssert.that(result).containsInAnyOrder(output); >>>>>> pipeline.run(); >>>>>> } >>>>>> >>>>>> On Tue, Jan 24, 2023 at 8:13 AM Talat Uyarer < >>>>>> tuya...@paloaltonetworks.com> wrote: >>>>>> >>>>>>> Hi Kenn, >>>>>>> >>>>>>> Thank you for replying back to my email. >>>>>>> >>>>>>> I was under the same impression about Calcite. But I wrote a test on >>>>>>> Calcite 1.28 too. It is working without issue that I see on BEAM >>>>>>> >>>>>>> Here is my test case. If you want you can also run on Calcite. >>>>>>> Please put under core/src/test/resources/sql as text file. and Run >>>>>>> CoreQuidemTest >>>>>>> class. >>>>>>> >>>>>>> !use post >>>>>>> !set outputformat mysql >>>>>>> >>>>>>> #Test aliases with with clause >>>>>>> WITH tempTable(id, v) AS (select "hr"."emps"."empid" as id, >>>>>>> "hr"."emps"."name" as v from "hr"."emps") >>>>>>> SELECT tempTable.id as id, tempTable.v as "value" FROM tempTable WHERE >>>>>>> tempTable.v <> '11' ; >>>>>>> +-----+-----------+ >>>>>>> | ID | value | >>>>>>> +-----+-----------+ >>>>>>> | 100 | Bill | >>>>>>> | 110 | Theodore | >>>>>>> | 150 | Sebastian | >>>>>>> | 200 | Eric | >>>>>>> +-----+-----------+ >>>>>>> (4 rows) >>>>>>> >>>>>>> !ok >>>>>>> >>>>>>> >>>>>>> On Mon, Jan 23, 2023 at 10:16 AM Kenneth Knowles <k...@apache.org> >>>>>>> wrote: >>>>>>> >>>>>>>> Looking at the code that turns a logical CalcRel into a BeamCalcRel >>>>>>>> I do not see any obvious cause for this: >>>>>>>> https://github.com/apache/beam/blob/b3aa2e89489898f8c760294ba4dba2310ac53e70/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCalcRule.java#L69 >>>>>>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_beam_blob_b3aa2e89489898f8c760294ba4dba2310ac53e70_sdks_java_extensions_sql_src_main_java_org_apache_beam_sdk_extensions_sql_impl_rule_BeamCalcRule.java-23L69&d=DwMFaQ&c=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo&r=BkW1L6EF7ergAVYDXCo-3Vwkpy6qjsWAz7_GD7pAR8g&m=KXc2qSceL6qFbFnQ_2qUOHr9mKuc6zYY8rJTNZC8p_wTcNs4M6mHQoCuoc4JfeaA&s=KjzplEf29oFB6uivvdjixpQiArWtfV-1SXpALL-ugEM&e=> >>>>>>>> >>>>>>>> I don't like to guess that upstream libraries have the bug, but in >>>>>>>> this case I wonder if the alias is lost in the Calcite optimizer rule >>>>>>>> for >>>>>>>> merging the projects and filters into a Calc. >>>>>>>> >>>>>>>> Kenn >>>>>>>> >>>>>>>> On Mon, Jan 23, 2023 at 10:13 AM Kenneth Knowles <k...@apache.org> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> I am not sure I understand the question, but I do see an issue. >>>>>>>>> >>>>>>>>> Context: "CalcRel" is an optimized relational operation that is >>>>>>>>> somewhat like ParDo, with a small snippet of a single-assignment DSL >>>>>>>>> embedded in it. Calcite will choose to merge all the projects and >>>>>>>>> filters >>>>>>>>> into the node, and then generates Java bytecode to directly execute >>>>>>>>> the DSL. >>>>>>>>> >>>>>>>>> Problem: it looks like the CalcRel has output columns with aliases >>>>>>>>> "id" and "v" where it should have output columns with aliases "id" and >>>>>>>>> "value". >>>>>>>>> >>>>>>>>> Kenn >>>>>>>>> >>>>>>>>> On Thu, Jan 19, 2023 at 6:01 PM Ahmet Altay <al...@google.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Adding: @Andrew Pilloud <apill...@google.com> @Kenneth Knowles >>>>>>>>>> <k...@google.com> >>>>>>>>>> >>>>>>>>>> On Thu, Jan 12, 2023 at 12:31 PM Talat Uyarer via user < >>>>>>>>>> u...@beam.apache.org> wrote: >>>>>>>>>> >>>>>>>>>>> Hi All, >>>>>>>>>>> >>>>>>>>>>> I am using Beam 2.43 with Calcite SQL with Java. >>>>>>>>>>> >>>>>>>>>>> I have a query with a WITH clause and some aliasing. Looks like >>>>>>>>>>> Beam Query optimizer after optimizing my query, it drops Select >>>>>>>>>>> statement's >>>>>>>>>>> aliases. Can you help me to identify where the problem is ? >>>>>>>>>>> >>>>>>>>>>> This is my query >>>>>>>>>>> INFO: SQL: >>>>>>>>>>> WITH `tempTable` (`id`, `v`) AS (SELECT >>>>>>>>>>> `PCOLLECTION`.`f_nestedRow`.`f_nestedInt` AS `id`, >>>>>>>>>>> `PCOLLECTION`.`f_nestedRow`.`f_nestedString` AS `v` >>>>>>>>>>> FROM `beam`.`PCOLLECTION` AS `PCOLLECTION`) (SELECT >>>>>>>>>>> `tempTable`.`id` AS `id`, `tempTable`.`v` AS `value` >>>>>>>>>>> FROM `tempTable` AS `tempTable` >>>>>>>>>>> WHERE `tempTable`.`v` <> '11') >>>>>>>>>>> >>>>>>>>>>> This is Calcite Plan look at LogicalProject(id=[$0], value=[$1]) >>>>>>>>>>> in SQL plan. >>>>>>>>>>> >>>>>>>>>>> Jan 12, 2023 12:19:08 PM >>>>>>>>>>> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner >>>>>>>>>>> convertToBeamRel >>>>>>>>>>> INFO: SQLPlan> >>>>>>>>>>> LogicalProject(id=[$0], value=[$1]) >>>>>>>>>>> LogicalFilter(condition=[<>($1, '11')]) >>>>>>>>>>> LogicalProject(id=[$1.f_nestedInt], v=[$1.f_nestedString]) >>>>>>>>>>> BeamIOSourceRel(table=[[beam, PCOLLECTION]]) >>>>>>>>>>> >>>>>>>>>>> But Beam Plan does not have a LogicalProject(id=[$0], >>>>>>>>>>> value=[$1]) or similar. >>>>>>>>>>> >>>>>>>>>>> Jan 12, 2023 12:19:08 PM >>>>>>>>>>> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner >>>>>>>>>>> convertToBeamRel >>>>>>>>>>> INFO: BEAMPlan> >>>>>>>>>>> BeamCalcRel(expr#0..1=[{inputs}], expr#2=[$t1.f_nestedInt], >>>>>>>>>>> expr#3=[$t1.f_nestedString], expr#4=['11':VARCHAR], expr#5=[<>($t3, >>>>>>>>>>> $t4)], >>>>>>>>>>> id=[$t2], v=[$t3], $condition=[$t5]) >>>>>>>>>>> BeamIOSourceRel(table=[[beam, PCOLLECTION]]) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> >>>>>>>>>>