[jira] [Updated] (CALCITE-6090) Jdbc adapter may create wrong sql for joins

Ulrich Kramer (Jira) Fri, 03 Nov 2023 03:12:04 -0700


     [ 
https://issues.apache.org/jira/browse/CALCITE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ulrich Kramer updated CALCITE-6090:
-----------------------------------
    Description: 
Adding the following unit test to {{JdbcAdapterTest}} leads to an error
{code:java}
  @Test void testAmbiguousColumn() {
    CalciteAssert.model(JdbcTest.FOODMART_SCOTT_MODEL)
        .query("select\n" +
            "                  \"store_id\" \"latest_id\",\n" +
            "                  max(\"store_type\") \"latest_store_type\"\n" +
            "                from\n" +
            "                  ( SELECT \"store_id\",\"store_type\" FROM 
\"foodmart\".\"store\") \n" +
            "                group by\n" +
            "                  \"store_id\"")
        .runs()
        .enable(CalciteAssert.DB == CalciteAssert.DatabaseInstance.HSQLDB)
        .explainContains("JdbcToEnumerableConverter\n  JdbcProject" )
        .planHasSql("SELECT MAX(\"region_id\") AS \"latest_region_id\", 
\"store_id\" AS \"latest_id\"\nFROM \"foodmart\".\"store\"\nGROUP BY 
\"store_id\"");
  }
{code}
The projection for the column {{latest_id}} is missing. The problem is located 
[here|https://github.com/apache/calcite/blob/590ec85f0fcff7173c288c350c3f60e640976a34/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2094-L2095].
 If the aggregation doesn't add, remove or twist columns, 
{{RexUtil.isIdentity}} returns true. Additionally the {{inputRowType}} already 
contains the names of the output columns. Therefore 
{{fieldNameList.equals(inputRowType.getFieldNames())}} also returns true.

Normally, this doesn't cause issues. But if this statement is used within a 
join like this
{code:sql}
SELECT
  DISTINCT "region_id",
  "store_id",
  "dummy"
FROM
  (
    SELECT
      "region_id",
      "store_id",
      "dummy"
    FROM
      (
        select
          B."store_city" "store_city",
          B."store_id" "store_id",
          B."region_id" "region_id",
          cast(null as integer) "dummy"
        from
          (
            select
              A."store_city" "store_city",
              A."region_id" "region_id",
              A."store_id" "store_id"
            from
              (
                select
                  max("region_id") "latest_region_id",
                  "store_id" "latest_id"
                from
                  ( SELECT "region_id", "store_id" FROM "foodmart"."store") "C"
                group by
                  "store_id"
              ) "D"
              left outer join "foodmart"."store" A on (
                A."store_id" = "D"."latest_id"
                and A."region_id" = "D"."latest_region_id"
              )
          ) B
      )
    WHERE
      "store_city" IS NOT NULL
  )
{code}
an invalid SQL will be generated because the left and the right side of the 
join has a column "store_id". This will be fixed in 
{{SqlValidatorUtil::addFields}} by appending a unique number. But in the end 
this will result in a statement, which contains a {{GROUP BY ... store_id0}} 
and will never be successful.

  was:
Adding the following unit test to {{JdbcAdapterTest}} leads to an error
{code:java}
  @Test void testAmbiguousColumn() {
    CalciteAssert.model(JdbcTest.FOODMART_SCOTT_MODEL)
        .query("select\n" +
            "                  \"store_id\" \"latest_id\",\n" +
            "                  max(\"store_type\") \"latest_store_type\"\n" +
            "                from\n" +
            "                  ( SELECT \"store_id\",\"store_type\" FROM 
\"foodmart\".\"store\") \n" +
            "                group by\n" +
            "                  \"store_id\"")
        .runs()
        .enable(CalciteAssert.DB == CalciteAssert.DatabaseInstance.HSQLDB)
        .explainContains("JdbcToEnumerableConverter\n  JdbcProject" )
        .planHasSql("SELECT MAX(\"region_id\") AS \"latest_region_id\", 
\"store_id\" AS \"latest_id\"\nFROM \"foodmart\".\"store\"\nGROUP BY 
\"store_id\"");
  }
{code}

The projection for the column {{latest_id}} is missing. The problem is related 
[here|https://github.com/apache/calcite/blob/590ec85f0fcff7173c288c350c3f60e640976a34/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2094-L2095].
 If the aggregation doesn't add, remove or twist columns, 
{{RexUtil.isIdentity}} returns true. Additionally the {{inputRowType}} already 
contains the names of the output columns. Therefore 
{{fieldNameList.equals(inputRowType.getFieldNames())}} also returns true.

Normally, this doesn't cause issues. But if this statement is used within a 
join like this

{code:sql}
SELECT
  DISTINCT "region_id",
  "store_id",
  "dummy"
FROM
  (
    SELECT
      "region_id",
      "store_id",
      "dummy"
    FROM
      (
        select
          B."store_city" "store_city",
          B."store_id" "store_id",
          B."region_id" "region_id",
          cast(null as integer) "dummy"
        from
          (
            select
              A."store_city" "store_city",
              A."region_id" "region_id",
              A."store_id" "store_id"
            from
              (
                select
                  max("region_id") "latest_region_id",
                  "store_id" "latest_id"
                from
                  ( SELECT "region_id", "store_id" FROM "foodmart"."store") "C"
                group by
                  "store_id"
              ) "D"
              left outer join "foodmart"."store" A on (
                A."store_id" = "D"."latest_id"
                and A."region_id" = "D"."latest_region_id"
              )
          ) B
      )
    WHERE
      "store_city" IS NOT NULL
  )
{code}

an invalid SQL will be generated because the left and the right side of the 
join has a column "store_id". This will be fixed in 
{{SqlValidatorUtil::addFields}} by appending a unique number. But in the end 
this will result in a statement, which contains a {{GROUP BY ... store_id0}} 
and will never be successful.



> Jdbc adapter may create wrong sql for joins
> -------------------------------------------
>
>                 Key: CALCITE-6090
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6090
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.35.0
>            Reporter: Ulrich Kramer
>            Priority: Major
>
> Adding the following unit test to {{JdbcAdapterTest}} leads to an error
> {code:java}
>   @Test void testAmbiguousColumn() {
>     CalciteAssert.model(JdbcTest.FOODMART_SCOTT_MODEL)
>         .query("select\n" +
>             "                  \"store_id\" \"latest_id\",\n" +
>             "                  max(\"store_type\") \"latest_store_type\"\n" +
>             "                from\n" +
>             "                  ( SELECT \"store_id\",\"store_type\" FROM 
> \"foodmart\".\"store\") \n" +
>             "                group by\n" +
>             "                  \"store_id\"")
>         .runs()
>         .enable(CalciteAssert.DB == CalciteAssert.DatabaseInstance.HSQLDB)
>         .explainContains("JdbcToEnumerableConverter\n  JdbcProject" )
>         .planHasSql("SELECT MAX(\"region_id\") AS \"latest_region_id\", 
> \"store_id\" AS \"latest_id\"\nFROM \"foodmart\".\"store\"\nGROUP BY 
> \"store_id\"");
>   }
> {code}
> The projection for the column {{latest_id}} is missing. The problem is 
> located 
> [here|https://github.com/apache/calcite/blob/590ec85f0fcff7173c288c350c3f60e640976a34/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2094-L2095].
>  If the aggregation doesn't add, remove or twist columns, 
> {{RexUtil.isIdentity}} returns true. Additionally the {{inputRowType}} 
> already contains the names of the output columns. Therefore 
> {{fieldNameList.equals(inputRowType.getFieldNames())}} also returns true.
> Normally, this doesn't cause issues. But if this statement is used within a 
> join like this
> {code:sql}
> SELECT
>   DISTINCT "region_id",
>   "store_id",
>   "dummy"
> FROM
>   (
>     SELECT
>       "region_id",
>       "store_id",
>       "dummy"
>     FROM
>       (
>         select
>           B."store_city" "store_city",
>           B."store_id" "store_id",
>           B."region_id" "region_id",
>           cast(null as integer) "dummy"
>         from
>           (
>             select
>               A."store_city" "store_city",
>               A."region_id" "region_id",
>               A."store_id" "store_id"
>             from
>               (
>                 select
>                   max("region_id") "latest_region_id",
>                   "store_id" "latest_id"
>                 from
>                   ( SELECT "region_id", "store_id" FROM "foodmart"."store") 
> "C"
>                 group by
>                   "store_id"
>               ) "D"
>               left outer join "foodmart"."store" A on (
>                 A."store_id" = "D"."latest_id"
>                 and A."region_id" = "D"."latest_region_id"
>               )
>           ) B
>       )
>     WHERE
>       "store_city" IS NOT NULL
>   )
> {code}
> an invalid SQL will be generated because the left and the right side of the 
> join has a column "store_id". This will be fixed in 
> {{SqlValidatorUtil::addFields}} by appending a unique number. But in the end 
> this will result in a statement, which contains a {{GROUP BY ... store_id0}} 
> and will never be successful.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-6090) Jdbc adapter may create wrong sql for joins

Reply via email to