[jira] [Commented] (CALCITE-6361) Uncollect.deriveUncollectRowType crashes if the input data is not a collection

2024-04-10 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836014#comment-17836014
 ] 

Julian Hyde commented on CALCITE-6361:
--

By the way, please don't say 'crashes'. Say what error is thrown.

> Uncollect.deriveUncollectRowType crashes if the input data is not a collection
> --
>
> Key: CALCITE-6361
> URL: https://issues.apache.org/jira/browse/CALCITE-6361
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.37.0
>Reporter: Mihai Budiu
>Priority: Minor
>
> This happens because the type checker calls getComponentType() without 
> checking first that the field type has components. It should report an error 
> in such a case.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6361) Uncollect.deriveUncollectRowType crashes if the input data is not a collection

2024-04-10 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836013#comment-17836013
 ] 

Julian Hyde commented on CALCITE-6361:
--

Is there a SQL query that reproduces this problem? It's sometimes reasonable 
that methods such as deriveUncollectRowType() make certain assumptions -- the 
alternative is an overly defensive programming style.

An alternative formulation of the same question: Do you consider this to be a 
user error or an internal error?

> Uncollect.deriveUncollectRowType crashes if the input data is not a collection
> --
>
> Key: CALCITE-6361
> URL: https://issues.apache.org/jira/browse/CALCITE-6361
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.37.0
>Reporter: Mihai Budiu
>Priority: Minor
>
> This happens because the type checker calls getComponentType() without 
> checking first that the field type has components. It should report an error 
> in such a case.





[jira] [Created] (CALCITE-6361) Uncollect.deriveUncollectRowType crashes if the input data is not a collection

2024-04-10 Thread Mihai Budiu (Jira)
Mihai Budiu created CALCITE-6361:


 Summary: Uncollect.deriveUncollectRowType crashes if the input 
data is not a collection
 Key: CALCITE-6361
 URL: https://issues.apache.org/jira/browse/CALCITE-6361
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.37.0
Reporter: Mihai Budiu


This happens because the type checker calls getComponentType() without checking 
first that the field type has components. It should report an error in such a 
case.
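
A minimal sketch of the kind of guard the description asks for. It uses a hypothetical stand-in type (`SketchType`) rather than Calcite's real `RelDataType`, and it assumes that `getComponentType()` returns null for a type that has no components; the class names and the error message are illustrative only.

```java
/** Hypothetical stand-in for a SQL type; not Calcite's RelDataType. */
final class SketchType {
  final String name;
  final SketchType component; // null when the type is not a collection

  SketchType(String name, SketchType component) {
    this.name = name;
    this.component = component;
  }

  /** Assumed contract: returns null for types that have no components. */
  SketchType getComponentType() {
    return component;
  }
}

final class UncollectSketch {
  /** Returns the element type, or reports a descriptive error for non-collections. */
  static SketchType elementTypeOf(String fieldName, SketchType fieldType) {
    SketchType component = fieldType.getComponentType();
    if (component == null) {
      // Check before use, instead of dereferencing a null component type later.
      throw new IllegalArgumentException(
          "Field '" + fieldName + "' has type " + fieldType.name
              + ", which is not a collection and cannot be unnested");
    }
    return component;
  }
}
```

Whether the real fix should raise a validation error or an internal error depends on the answer to the user-error versus internal-error question raised in the comments.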





[jira] [Commented] (CALCITE-6358) Support all PostgreSQL 14 date/time patterns

2024-04-10 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835918#comment-17835918
 ] 

Julian Hyde commented on CALCITE-6358:
--

[~njordan], To be clear, which functions are you proposing to fix? TO_CHAR? 
TO_TIMESTAMP? CAST(... FORMAT ...)?

> Support all PostgreSQL 14 date/time patterns
> 
>
> Key: CALCITE-6358
> URL: https://issues.apache.org/jira/browse/CALCITE-6358
> Project: Calcite
>  Issue Type: Sub-task
>Reporter: Norman Jordan
>Priority: Minor
>
> Many of the date/time format patterns supported by PostgreSQL 14 are not 
> supported in Calcite.
>  * HH
>  * US
>  * 
>  * S
>  * AM
>  * A.M.
>  * am
>  * a.m.
>  * PM
>  * P.M.
>  * pm
>  * p.m.
>  * Y,YYY
>  * YYY
>  * Y
>  * IYYY
>  * IYY
>  * IY
>  * I
>  * BC
>  * B.C.
>  * bc
>  * b.c.
>  * AD
>  * A.D.
>  * ad
>  * a.d.
>  * MONTH
>  * month
>  * MON
>  * mon
>  * DAY
>  * day
>  * Dy
>  * dy
>  * IDDD
>  * ID
>  * TZH
>  * TZM
>  * OF
> There are also template pattern modifiers that need to be supported.
>  * FM (prefix)
>  * TH (suffix)
>  * th (suffix)
>  * FX (prefix)
>  * TM (prefix)
> Some format patterns in Calcite behave differently from PostgreSQL 14.
>  * FF1
>  * FF2
>  * FF4
>  * FF5
>  * FF6
> Also verify that the other existing format strings produce results that match 
> PostgreSQL 14.





[jira] [Commented] (CALCITE-6357) Calcite enforces select arguments count to be same as row schema fields which causes aliases to be ignored

2024-04-10 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835917#comment-17835917
 ] 

Julian Hyde commented on CALCITE-6357:
--

[~brachi_packter], It might be useful to reduce this to a test case in 
{{RelBuilderTest}}. Can you do that?

> Calcite enforces select arguments count to be same as row schema fields which 
> causes aliases to be ignored
> --
>
> Key: CALCITE-6357
> URL: https://issues.apache.org/jira/browse/CALCITE-6357
> Project: Calcite
>  Issue Type: Bug
>Reporter: Brachi Packter
>Priority: Major
>
> Calcite RelBuilder.ProjectNamed checks if row size in the select is identical 
> to schema fields, if no, it creates a project with fields as they appear in 
> the select , meaning if they have aliases, they are returning with their 
> aliases.
> Here, it checks if they are identical:
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
> using RexUtil.isIdentity method:
> ```
>   public static boolean isIdentity(List<? extends RexNode> exps,
>       RelDataType inputRowType) {
>     return inputRowType.getFieldCount() == exps.size()
>         && containIdentity(exps, inputRowType, Litmus.IGNORE);
>   }
> ```
> This is the problematic part `inputRowType.getFieldCount() == exps.size()`
> If they are identical, and return with their aliases, it is ignored in the 
> "rename" method later on
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125
> and alias is skipped
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137
> This doesn't impact calcite queries, but in Apache Beam they are doing some 
> optimization on top of it, 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
> which causes aliases to be ignored, and data is returning suddenly without 
> correct column field.
> I believe the isIdentity check can cause more issues if not fixed; we need 
> to understand why it is enforced. Isn't it valid to have a different number of 
> fields in the select from what we have in the schema?
> In our case we have a one big row and we run on it different queries, each 
> with different fields in the select.
> Beam issue 
> https://github.com/apache/beam/issues/30498 
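
To make the condition quoted above concrete, here is a simplified model of the identity check. It is only a sketch: projected expressions are modelled as plain field indexes, whereas the real `RexUtil.isIdentity` works on `RexNode` and `RelDataType` and also compares types. The class name `IdentitySketch` is hypothetical.

```java
import java.util.List;

final class IdentitySketch {
  /**
   * Simplified model: each projected expression is the index of the input
   * field it references. The projection counts as an identity only if it is
   * as wide as the input and the i-th expression references field i.
   * Column aliases play no part in the decision.
   */
  static boolean isIdentity(List<Integer> projectedFieldIndexes, int inputFieldCount) {
    if (projectedFieldIndexes.size() != inputFieldCount) {
      return false;
    }
    for (int i = 0; i < projectedFieldIndexes.size(); i++) {
      if (projectedFieldIndexes.get(i) != i) {
        return false;
      }
    }
    return true;
  }
}
```

In this model, selecting two columns from a hundred-column input is never an identity, while selecting every column in its original order is, whatever aliases were given. The second case is the one the report describes as losing its aliases.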





[jira] [Resolved] (CALCITE-6360) Add .asf.yaml to calcite-avatica-go repository

2024-04-10 Thread Francis Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francis Chuang resolved CALCITE-6360.
-
Resolution: Fixed

> Add .asf.yaml to calcite-avatica-go repository
> --
>
> Key: CALCITE-6360
> URL: https://issues.apache.org/jira/browse/CALCITE-6360
> Project: Calcite
>  Issue Type: Task
>  Components: avatica-go
>Reporter: Francis Chuang
>Assignee: Francis Chuang
>Priority: Major
> Fix For: avatica-go-5.4.0
>
>






[jira] [Created] (CALCITE-6360) Add .asf.yaml to calcite-avatica-go repository

2024-04-10 Thread Francis Chuang (Jira)
Francis Chuang created CALCITE-6360:
---

 Summary: Add .asf.yaml to calcite-avatica-go repository
 Key: CALCITE-6360
 URL: https://issues.apache.org/jira/browse/CALCITE-6360
 Project: Calcite
  Issue Type: Task
  Components: avatica-go
Reporter: Francis Chuang
Assignee: Francis Chuang
 Fix For: avatica-go-5.4.0








[jira] [Commented] (CALCITE-6358) Support all PostgreSQL 14 date/time patterns

2024-04-10 Thread Norman Jordan (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835912#comment-17835912
 ] 

Norman Jordan commented on CALCITE-6358:


https://www.postgresql.org/docs/14/functions-formatting.html

> Support all PostgreSQL 14 date/time patterns
> 
>
> Key: CALCITE-6358
> URL: https://issues.apache.org/jira/browse/CALCITE-6358
> Project: Calcite
>  Issue Type: Sub-task
>Reporter: Norman Jordan
>Priority: Minor
>
> Many of the date/time format patterns supported by PostgreSQL 14 are not 
> supported in Calcite.
>  * HH
>  * US
>  * 
>  * S
>  * AM
>  * A.M.
>  * am
>  * a.m.
>  * PM
>  * P.M.
>  * pm
>  * p.m.
>  * Y,YYY
>  * YYY
>  * Y
>  * IYYY
>  * IYY
>  * IY
>  * I
>  * BC
>  * B.C.
>  * bc
>  * b.c.
>  * AD
>  * A.D.
>  * ad
>  * a.d.
>  * MONTH
>  * month
>  * MON
>  * mon
>  * DAY
>  * day
>  * Dy
>  * dy
>  * IDDD
>  * ID
>  * TZH
>  * TZM
>  * OF
> There are also template pattern modifiers that need to be supported.
>  * FM (prefix)
>  * TH (suffix)
>  * th (suffix)
>  * FX (prefix)
>  * TM (prefix)
> Some format patterns in Calcite behave differently from PostgreSQL 14.
>  * FF1
>  * FF2
>  * FF4
>  * FF5
>  * FF6
> Also verify that the other existing format strings produce results that match 
> PostgreSQL 14.





[jira] [Resolved] (CALCITE-6359) Update GitHub Actions workflows to use docker compose v2

2024-04-10 Thread Francis Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francis Chuang resolved CALCITE-6359.
-
Resolution: Fixed

> Update GitHub Actions workflows to use docker compose v2
> 
>
> Key: CALCITE-6359
> URL: https://issues.apache.org/jira/browse/CALCITE-6359
> Project: Calcite
>  Issue Type: Task
>  Components: avatica, avatica-go, core
>Reporter: Francis Chuang
>Assignee: Francis Chuang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.37.0, avatica-go-5.4.0, avatica-1.26.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CALCITE-6359) Update GitHub Actions workflows to use docker compose v2

2024-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CALCITE-6359:

Labels: pull-request-available  (was: )

> Update GitHub Actions workflows to use docker compose v2
> 
>
> Key: CALCITE-6359
> URL: https://issues.apache.org/jira/browse/CALCITE-6359
> Project: Calcite
>  Issue Type: Task
>  Components: avatica, avatica-go, core
>Reporter: Francis Chuang
>Assignee: Francis Chuang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.37.0, avatica-go-5.4.0, avatica-1.26.0
>
>






[jira] [Created] (CALCITE-6359) Update GitHub Actions workflows to use docker compose v2

2024-04-10 Thread Francis Chuang (Jira)
Francis Chuang created CALCITE-6359:
---

 Summary: Update GitHub Actions workflows to use docker compose v2
 Key: CALCITE-6359
 URL: https://issues.apache.org/jira/browse/CALCITE-6359
 Project: Calcite
  Issue Type: Task
  Components: avatica, avatica-go, core
Reporter: Francis Chuang
Assignee: Francis Chuang
 Fix For: 1.37.0, avatica-go-5.4.0, avatica-1.26.0








[jira] [Resolved] (CALCITE-6327) getValidatedNodeTypeIfKnown should never throw

2024-04-10 Thread James Duong (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Duong resolved CALCITE-6327.
--
Resolution: Fixed

This was fixed in CALCITE-6015

> getValidatedNodeTypeIfKnown should never throw
> --
>
> Key: CALCITE-6327
> URL: https://issues.apache.org/jira/browse/CALCITE-6327
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.36.0
>Reporter: Claude Brisson
>Assignee: Mihai Budiu
>Priority: Major
>
> During validation, when a SqlNode has been rewritten (for instance when a 
> COALESCE call has been rewritten as a CASE call) but does not yet have a 
> RelDataType, the method SqlValidatorImpl.getValidatedNodeTypeIfKnown() throws 
> an exception because it relies on 
> SqlValidatorImpl.getValidatedNodeType(originalExpr), not on 
> SqlValidatorImpl.getValidatedNodeTypeIfKnown(originalExpr).
>  
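
The failure mode described above can be sketched with a miniature validator. Everything here is a hypothetical stand-in: nodes and types are plain strings, and the method names only echo those of `SqlValidatorImpl`. The point is that the lenient lookup must delegate to the lenient variant for the original expression, not to the strict one.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical miniature validator; not Calcite's SqlValidatorImpl. */
final class ValidatorSketch {
  private final Map<String, String> nodeToType = new HashMap<>();
  private final Map<String, String> rewrittenToOriginal = new HashMap<>();

  void setType(String node, String type) {
    nodeToType.put(node, type);
  }

  void recordRewrite(String rewritten, String original) {
    rewrittenToOriginal.put(rewritten, original);
  }

  /** Strict lookup: throws when no type has been derived yet. */
  String getValidatedNodeType(String node) {
    String type = getValidatedNodeTypeIfKnown(node);
    if (type == null) {
      throw new IllegalStateException("No type derived yet for " + node);
    }
    return type;
  }

  /** Lenient lookup: returns null rather than throwing. */
  String getValidatedNodeTypeIfKnown(String node) {
    String type = nodeToType.get(node);
    if (type != null) {
      return type;
    }
    String original = rewrittenToOriginal.get(node);
    // Delegate to the lenient variant; the strict one would throw here
    // whenever the original expression has no type yet.
    return original == null ? null : getValidatedNodeTypeIfKnown(original);
  }
}
```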





[jira] [Commented] (CALCITE-6357) Calcite enforces select arguments count to be same as row schema fields which causes aliases to be ignored

2024-04-10 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835893#comment-17835893
 ] 

Julian Hyde commented on CALCITE-6357:
--

*Caveat*: I haven't read every word you've written above; I only scanned the 
Beam case. I'm correcting what seem to be mistaken assumptions, in the hope 
that it will allow you to diagnose your problem faster. I hope that I am not 
dissuading other Calcite community members who may have more time from jumping 
in to help.

{quote}I presume you would agree that names of output columns is as much part 
of data integrity as the values{quote}

No, I would not. Calcite does not commit to preserving column names, only their 
types and ordering. It recognizes duplicate relational expressions (via 
memoization), forms equivalence sets of relational expressions, and after 
optimization will return one of the relational expressions in that subset.

{quote}isn't it a valid case to have a row with wider schema from what you 
actually need to select?{quote}

Sure, you can write "select x, y from aTableWithAHundredColumns". That's a 
Project (with two expressions) on a Scan (returning 100 columns). My point is 
that the Project knows that its input has 100 columns.

> Calcite enforces select arguments count to be same as row schema fields which 
> causes aliases to be ignored
> --
>
> Key: CALCITE-6357
> URL: https://issues.apache.org/jira/browse/CALCITE-6357
> Project: Calcite
>  Issue Type: Bug
>Reporter: Brachi Packter
>Priority: Major
>
> Calcite RelBuilder.ProjectNamed checks if row size in the select is identical 
> to schema fields, if no, it creates a project with fields as they appear in 
> the select , meaning if they have aliases, they are returning with their 
> aliases.
> Here, it checks if they are identical:
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
> using RexUtil.isIdentity method:
> ```
>   public static boolean isIdentity(List<? extends RexNode> exps,
>       RelDataType inputRowType) {
>     return inputRowType.getFieldCount() == exps.size()
>         && containIdentity(exps, inputRowType, Litmus.IGNORE);
>   }
> ```
> This is the problematic part `inputRowType.getFieldCount() == exps.size()`
> If they are identical, and return with their aliases, it is ignored in the 
> "rename" method later on
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125
> and alias is skipped
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137
> This doesn't impact calcite queries, but in Apache Beam they are doing some 
> optimization on top of it, 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
> which causes aliases to be ignored, and data is returning suddenly without 
> correct column field.
> I believe the isIdentity check can cause more issues if not fixed; we need 
> to understand why it is enforced. Isn't it valid to have a different number of 
> fields in the select from what we have in the schema?
> In our case we have a one big row and we run on it different queries, each 
> with different fields in the select.
> Beam issue 
> https://github.com/apache/beam/issues/30498 





[jira] [Updated] (CALCITE-6358) Support all PostgreSQL 14 date/time patterns

2024-04-10 Thread Norman Jordan (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Jordan updated CALCITE-6358:
---
Description: 
Many of the date/time format patterns supported by PostgreSQL 14 are not 
supported in Calcite.
 * HH
 * US
 * 
 * S
 * AM
 * A.M.
 * am
 * a.m.
 * PM
 * P.M.
 * pm
 * p.m.
 * Y,YYY
 * YYY
 * Y
 * IYYY
 * IYY
 * IY
 * I
 * BC
 * B.C.
 * bc
 * b.c.
 * AD
 * A.D.
 * ad
 * a.d.
 * MONTH
 * month
 * MON
 * mon
 * DAY
 * day
 * Dy
 * dy
 * IDDD
 * ID
 * TZH
 * TZM
 * OF

There are also template pattern modifiers that need to be supported.
 * FM (prefix)
 * TH (suffix)
 * th (suffix)
 * FX (prefix)
 * TM (prefix)

Some format patterns in Calcite behave differently from PostgreSQL 14.
 * FF1
 * FF2
 * FF4
 * FF5
 * FF6

Also verify that the other existing format strings produce results that match 
PostgreSQL 14.

  was:
Many of the date/time format patterns supported by PostgreSQL 14 are not 
supported in Calcite.
 * HH
 * US
 * 
 * S
 * AM
 * A.M.
 * am
 * a.m.
 * PM
 * P.M.
 * pm
 * p.m.
 * Y,YYY
 * YYY
 * Y
 * IYYY
 * IYY
 * IY
 * I
 * BC
 * B.C.
 * bc
 * b.c.
 * AD
 * A.D.
 * ad
 * a.d.
 * MONTH
 * month
 * MON
 * mon
 * DAY
 * day
 * Dy
 * dy
 * IDDD
 * ID
 * TZH
 * TZM
 * OF

Some format patterns in Calcite behave differently from PostgreSQL 14.
 * FF1
 * FF2
 * FF4
 * FF5
 * FF6

Also verify that the other existing format strings produce results that match 
PostgreSQL 14.


> Support all PostgreSQL 14 date/time patterns
> 
>
> Key: CALCITE-6358
> URL: https://issues.apache.org/jira/browse/CALCITE-6358
> Project: Calcite
>  Issue Type: Sub-task
>Reporter: Norman Jordan
>Priority: Minor
>
> Many of the date/time format patterns supported by PostgreSQL 14 are not 
> supported in Calcite.
>  * HH
>  * US
>  * 
>  * S
>  * AM
>  * A.M.
>  * am
>  * a.m.
>  * PM
>  * P.M.
>  * pm
>  * p.m.
>  * Y,YYY
>  * YYY
>  * Y
>  * IYYY
>  * IYY
>  * IY
>  * I
>  * BC
>  * B.C.
>  * bc
>  * b.c.
>  * AD
>  * A.D.
>  * ad
>  * a.d.
>  * MONTH
>  * month
>  * MON
>  * mon
>  * DAY
>  * day
>  * Dy
>  * dy
>  * IDDD
>  * ID
>  * TZH
>  * TZM
>  * OF
> There are also template pattern modifiers that need to be supported.
>  * FM (prefix)
>  * TH (suffix)
>  * th (suffix)
>  * FX (prefix)
>  * TM (prefix)
> Some format patterns in Calcite behave differently from PostgreSQL 14.
>  * FF1
>  * FF2
>  * FF4
>  * FF5
>  * FF6
> Also verify that the other existing format strings produce results that match 
> PostgreSQL 14.





[jira] [Commented] (CALCITE-6357) Calcite enforces select arguments count to be same as row schema fields which causes aliases to be ignored

2024-04-10 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835868#comment-17835868
 ] 

Kenneth Knowles commented on CALCITE-6357:
--

[~julianhyde] - are we "holding it wrong"? Can you help us understand how?

I presume you would agree that names of output columns is as much part of data 
integrity as the values. So in BeamAggregateProjectMergeRule are we somehow 
making Calcite think this is not the output column?

It does seem that our rule is the delta that causes the problem. Calcite 
without that rule seems to do the right thing. The rule does one thing: runs 
the normal AggregateProjectMergeRule only when the underlying connector does 
*not* support pushdown.

In the queries that hit this bug, we know that the connector does *not* support 
pushdown, so the original behavior should be maintained.

> Calcite enforces select arguments count to be same as row schema fields which 
> causes aliases to be ignored
> --
>
> Key: CALCITE-6357
> URL: https://issues.apache.org/jira/browse/CALCITE-6357
> Project: Calcite
>  Issue Type: Bug
>Reporter: Brachi Packter
>Priority: Major
>
> Calcite RelBuilder.ProjectNamed checks if row size in the select is identical 
> to schema fields, if no, it creates a project with fields as they appear in 
> the select , meaning if they have aliases, they are returning with their 
> aliases.
> Here, it checks if they are identical:
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
> using RexUtil.isIdentity method:
> ```
>   public static boolean isIdentity(List<? extends RexNode> exps,
>       RelDataType inputRowType) {
>     return inputRowType.getFieldCount() == exps.size()
>         && containIdentity(exps, inputRowType, Litmus.IGNORE);
>   }
> ```
> This is the problematic part `inputRowType.getFieldCount() == exps.size()`
> If they are identical, and return with their aliases, it is ignored in the 
> "rename" method later on
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125
> and alias is skipped
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137
> This doesn't impact calcite queries, but in Apache Beam they are doing some 
> optimization on top of it, 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
> which causes aliases to be ignored, and data is returning suddenly without 
> correct column field.
> I believe the isIdentity check can cause more issues if not fixed; we need 
> to understand why it is enforced. Isn't it valid to have a different number of 
> fields in the select from what we have in the schema?
> In our case we have a one big row and we run on it different queries, each 
> with different fields in the select.
> Beam issue 
> https://github.com/apache/beam/issues/30498 





[jira] [Resolved] (CALCITE-6355) RelToSqlConverter[ORDER BY] generates an incorrect order by when NULLS LAST is used in non-projected field

2024-04-10 Thread Mihai Budiu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihai Budiu resolved CALCITE-6355.
--
Fix Version/s: 1.37.0
   Resolution: Fixed

Fixed in 
https://github.com/apache/calcite/commit/4ce1e1651e8fea305fb27f743473926f9feeec23
Thank you, [~brunocvcunha]

> RelToSqlConverter[ORDER BY] generates an incorrect order by when NULLS LAST 
> is used in non-projected field
> --
>
> Key: CALCITE-6355
> URL: https://issues.apache.org/jira/browse/CALCITE-6355
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.36.0
>Reporter: Bruno Volpato
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.37.0
>
>
>  
> We are using RelToSqlConverter, and seeing issues with it generating invalid 
> queries when using _DESC NULLS LAST,_ specifically.
>  
> For example, this test query:
>  
> {code:java}
> select "product_id"
> from "product"
> where "net_weight" is not null
> group by "product_id"
> order by MAX("net_weight") desc {code}
> Gets resolved correctly, with a subquery, to:
>  
> {code:java}
> SELECT "product_id"
> FROM (SELECT "product_id", MAX("net_weight") AS "EXPR$1"
> FROM "foodmart"."product"
> WHERE "net_weight" IS NOT NULL
> GROUP BY "product_id"
> ORDER BY 2 DESC) AS "t3" {code}
>  
>  
> However, if I specify `desc nulls last`:
>  
> {code:java}
> select "product_id"
> from "product"
> where "net_weight" is not null
> group by "product_id"
> order by MAX("net_weight") desc nulls last {code}
> It creates an invalid query (order by 2, but only one field was projected):
>  
>  
> {code:java}
> SELECT "product_id"
> FROM "foodmart"."product"
> WHERE "net_weight" IS NOT NULL
> GROUP BY "product_id"
> ORDER BY 2 DESC NULLS LAST {code}
>  
>  
>  
> Trying to troubleshoot it, it appears that without the `NULLS LAST`, we have 
> the following instance:
>  
> {code:java}
> SqlBasicCall -> SqlNumericLiteral {code}
>  
>  
> But when including it, it gets wrapped in another call:
>  
> {code:java}
> SqlBasicCall -> SqlBasicCall -> SqlNumericLiteral {code}
>  
>  
> So the [hasSortByOrdinal 
> method|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L1938C21-L1958]
>  ends up returning {_}false{_}, which causes `needNewSubQuery` to incorrectly 
> report _false_ too.
>  
> It appears that the best way to deal with this is by using a recursion to 
> find numeric literals - but let me know if there are better ideas. 
>  
> I plan to take a stab at this since I got enough context.
>  
>  
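
The recursive approach proposed above might look roughly like the sketch below. The node classes are hypothetical, minimal stand-ins for `SqlNumericLiteral`, `SqlIdentifier` and `SqlBasicCall`, and the operator names are illustrative strings, not Calcite's `SqlKind` values. The sketch unwraps only ordering wrappers, on the assumption that an expression such as `x + 1` should not be mistaken for an ordinal.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-ins for SqlNode subclasses; not Calcite's API. */
abstract class SortNode {}

final class SortLiteral extends SortNode {
  final int value;

  SortLiteral(int value) {
    this.value = value;
  }
}

final class SortIdentifier extends SortNode {
  final String name;

  SortIdentifier(String name) {
    this.name = name;
  }
}

final class SortCall extends SortNode {
  final String operator;
  final List<SortNode> operands;

  SortCall(String operator, SortNode... operands) {
    this.operator = operator;
    this.operands = Arrays.asList(operands);
  }
}

final class SortOrdinalSketch {
  // Calls that only decorate a sort key and may wrap one another.
  private static final List<String> ORDER_WRAPPERS =
      Arrays.asList("DESC", "NULLS FIRST", "NULLS LAST");

  /** True if the sort key is a numeric literal, possibly under nested ordering wrappers. */
  static boolean isOrdinal(SortNode node) {
    if (node instanceof SortLiteral) {
      return true;
    }
    if (node instanceof SortCall) {
      SortCall call = (SortCall) node;
      return ORDER_WRAPPERS.contains(call.operator)
          && !call.operands.isEmpty()
          && isOrdinal(call.operands.get(0));
    }
    return false;
  }
}
```

With this shape, both `DESC(2)` and `NULLS LAST(DESC(2))` are recognized as ordinals, which corresponds to the two call structures shown in the report.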





[jira] [Comment Edited] (CALCITE-6357) Calcite enforces select arguments count to be same as row schema fields which causes aliases to be ignored

2024-04-10 Thread Brachi Packter (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835858#comment-17835858
 ] 

Brachi Packter edited comment on CALCITE-6357 at 4/10/24 7:09 PM:
--

> If the number of fields does not match, that's probably a problem on your 
> end. RelBuilder almost always requires number of fields to match.

why? isn't it a valid case to have a row with wider schema from what you 
actually need to select? (e.g group by queries, select one dimension from the 
row and make some count/sum on it)

> At RelBuilder#2125 it seems that force is false. For the behavior you want, 
> force would need to be true
can't see where I can pass force, only here 
[https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063]
but it looks like it should be false in order to be renamed later on (and is 
identical should return true)


was (Author: brachi_packter):
> If the number of fields does not match, that's probably a problem on your 
> end. RelBuilder almost always requires number of fields to match.

why? isn't it a valid case to have a row with wider schema from you actually 
need to select? (e.g group by queries, select one dimension from the row and 
make some count/sum on it)

> At RelBuilder#2125 it seems that force is false. For the behavior you want, 
> force would need to be true
can't see where I can pass force, only here 
https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
but it looks like it should be false in order to be renamed later on (and is 
identical should return true)

> Calcite enforces select arguments count to be same as row schema fields which 
> causes aliases to be ignored
> --
>
> Key: CALCITE-6357
> URL: https://issues.apache.org/jira/browse/CALCITE-6357
> Project: Calcite
>  Issue Type: Bug
>Reporter: Brachi Packter
>Priority: Major
>
> Calcite RelBuilder.ProjectNamed checks if row size in the select is identical 
> to schema fields, if no, it creates a project with fields as they appear in 
> the select , meaning if they have aliases, they are returning with their 
> aliases.
> Here, it checks if they are identical:
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
> using RexUtil.isIdentity method:
> ```
>   public static boolean isIdentity(List<? extends RexNode> exps,
>       RelDataType inputRowType) {
>     return inputRowType.getFieldCount() == exps.size()
>         && containIdentity(exps, inputRowType, Litmus.IGNORE);
>   }
> ```
> This is the problematic part `inputRowType.getFieldCount() == exps.size()`
> If they are identical, and return with their aliases, it is ignored in the 
> "rename" method later on
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125
> and alias is skipped
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137
> This doesn't impact calcite queries, but in Apache Beam they are doing some 
> optimization on top of it, 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
> which causes aliases to be ignored, and data is returning suddenly without 
> correct column field.
> I believe the isIdentity check can cause more issues if not fixed; we need 
> to understand why it is enforced. Isn't it valid to have a different number of 
> fields in the select from what we have in the schema?
> In our case we have a one big row and we run on it different queries, each 
> with different fields in the select.
> Beam issue 
> https://github.com/apache/beam/issues/30498 





[jira] [Comment Edited] (CALCITE-6357) Calcite enforces select arguments count to be same as row schema fields which causes aliases to be ignored

2024-04-10 Thread Brachi Packter (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835858#comment-17835858
 ] 

Brachi Packter edited comment on CALCITE-6357 at 4/10/24 7:09 PM:
--

> If the number of fields does not match, that's probably a problem on your 
> end. RelBuilder almost always requires number of fields to match.

why? isn't it a valid case to have a row with wider schema from you actually 
need to select? (e.g group by queries, select one dimension from the row and 
make some count/sum on it)

> At RelBuilder#2125 it seems that force is false. For the behavior you want, 
> force would need to be true
can't see where I can pass force, only here 
https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
but it looks like it should be false in order to be renamed later on (and is 
identical should return true)


was (Author: brachi_packter):
> If the number of fields does not match, that's probably a problem on your 
> end. RelBuilder almost always requires number of fields to match.

why? isn't it a valid case to have a row with wider schema from you actually 
need to select? (e.g group by queries, select one dimension from the row and 
make some count/sum on it)

> Calcite enforces select arguments count to be same as row schema fields which 
> causes aliases to be ignored
> --
>
> Key: CALCITE-6357
> URL: https://issues.apache.org/jira/browse/CALCITE-6357
> Project: Calcite
>  Issue Type: Bug
>Reporter: Brachi Packter
>Priority: Major
>
> Calcite's RelBuilder.projectNamed checks whether the number of expressions in 
> the select is identical to the number of schema fields. If it is not, it 
> creates a project with the fields as they appear in the select, meaning that 
> if they have aliases, they are returned with their aliases.
> Here, it checks whether they are identical:
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
> using the RexUtil.isIdentity method:
> ```
>   public static boolean isIdentity(List<? extends RexNode> exps,
>       RelDataType inputRowType) {
>     return inputRowType.getFieldCount() == exps.size()
>         && containIdentity(exps, inputRowType, Litmus.IGNORE);
>   }
> ```
> This is the problematic part: `inputRowType.getFieldCount() == exps.size()`
> If they are identical and the fields carry aliases, the aliases are ignored 
> in the "rename" method later on
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125
> and the alias is skipped
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137
> This doesn't impact Calcite queries, but Apache Beam applies an optimization 
> on top of it,
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
> which causes aliases to be ignored, and data is suddenly returned without the 
> correct column names.
> I believe the isIdentity check can cause more issues if not fixed. We need 
> to understand why it is enforced. Isn't it valid for the select to have a 
> different number of fields from what we have in the schema?
> In our case we have one big row and we run different queries on it, each 
> with different fields in the select.
> Beam issue:
> https://github.com/apache/beam/issues/30498 





[jira] [Commented] (CALCITE-6357) Calcite enforces select arguments count to be same as row schema fields which causes aliases to be ignored

2024-04-10 Thread Brachi Packter (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835858#comment-17835858
 ] 

Brachi Packter commented on CALCITE-6357:
-

> If the number of fields does not match, that's probably a problem on your 
> end. RelBuilder almost always requires number of fields to match.

Why? Isn't it a valid case to have a row with a wider schema than what you 
actually need to select? (E.g. GROUP BY queries: select one dimension from the 
row and compute a count/sum on it.)

> Calcite enforces select arguments count to be same as row schema fields which 
> causes aliases to be ignored
> --
>
> Key: CALCITE-6357
> URL: https://issues.apache.org/jira/browse/CALCITE-6357
> Project: Calcite
>  Issue Type: Bug
>Reporter: Brachi Packter
>Priority: Major
>
> Calcite's RelBuilder.projectNamed checks whether the number of expressions in 
> the select is identical to the number of schema fields. If it is not, it 
> creates a project with the fields as they appear in the select, meaning that 
> if they have aliases, they are returned with their aliases.
> Here, it checks whether they are identical:
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
> using the RexUtil.isIdentity method:
> ```
>   public static boolean isIdentity(List<? extends RexNode> exps,
>       RelDataType inputRowType) {
>     return inputRowType.getFieldCount() == exps.size()
>         && containIdentity(exps, inputRowType, Litmus.IGNORE);
>   }
> ```
> This is the problematic part: `inputRowType.getFieldCount() == exps.size()`
> If they are identical and the fields carry aliases, the aliases are ignored 
> in the "rename" method later on
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125
> and the alias is skipped
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137
> This doesn't impact Calcite queries, but Apache Beam applies an optimization 
> on top of it,
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
> which causes aliases to be ignored, and data is suddenly returned without the 
> correct column names.
> I believe the isIdentity check can cause more issues if not fixed. We need 
> to understand why it is enforced. Isn't it valid for the select to have a 
> different number of fields from what we have in the schema?
> In our case we have one big row and we run different queries on it, each 
> with different fields in the select.
> Beam issue:
> https://github.com/apache/beam/issues/30498 





[jira] [Comment Edited] (CALCITE-5743) Query gives incorrect result when COUNT appears in the correlated subquery select list

2024-04-10 Thread James Starr (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835836#comment-17835836
 ] 

James Starr edited comment on CALCITE-5743 at 4/10/24 5:54 PM:
---

I think you have to join the value generation query above the aggregate and 
then have defaulting functions:

Replacement for a correlated value agg:
{code}
...
  PROJECT # pass through or null replacing
    JOIN # on correlate values
      {VALUE GENERATOR}
      PROJECT # identity + non-null constant to key off of, to trigger replacing empty columns (maybe this is not needed)
        AGG group={correlateVariables} aggCalls={existing calls}
          ...
{code}

The next question: is there a way to detect whether an aggregate function is a 
scalar function or not, i.e. when given an empty set, does it emit a row?
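
To make the lost-row problem concrete, here is a minimal, self-contained Java sketch that evaluates the query from the issue description over its sample data in two ways. It is not Calcite code: the class and method names are hypothetical, and the "defaulting" step is only one reading of the plan shape suggested above, namely that a correlation value with no matching group gets COUNT's empty-input value, 0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CountDecorrelationSketch {
  // t1(a, b) and t2(a), copied from the issue description.
  static final int[][] T1 = {{3, 6}, {10, 1}, {8, 0}};
  static final int[] T2 = {3, 3, 10};

  // AGG group={a} COUNT(*): emits no row for keys that never occur in t2.
  static Map<Integer, Long> countByKey() {
    Map<Integer, Long> counts = new HashMap<>();
    for (int a : T2) {
      counts.merge(a, 1L, Long::sum);
    }
    return counts;
  }

  // Inner join of t1 with the grouped aggregate, as in the decorrelated plan
  // shown in the issue: keys without a group are dropped.
  static List<Integer> innerJoinPlan() {
    Map<Integer, Long> counts = countByKey();
    List<Integer> result = new ArrayList<>();
    for (int[] row : T1) {
      Long c = counts.get(row[0]);
      if (c != null && c.longValue() == row[1]) {
        result.add(row[0]);
      }
    }
    return result;
  }

  // Keeps every correlation value and defaults a missing COUNT to 0
  // (assumption: this is what the "defaulting function" would do for COUNT).
  static List<Integer> defaultingPlan() {
    Map<Integer, Long> counts = countByKey();
    List<Integer> result = new ArrayList<>();
    for (int[] row : T1) {
      long c = counts.getOrDefault(row[0], 0L);
      if (c == row[1]) {
        result.add(row[0]);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(innerJoinPlan());   // [10]
    System.out.println(defaultingPlan());  // [10, 8]
  }
}
```

The second method only works because COUNT has a well-defined value on an empty input; for other aggregate functions the default would have to come from the function itself, which is the open question raised above.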


was (Author: jamesstarr):
I think you have to join the value generation query above the aggregate and 
then have defaulting functions:

Replacement for a correlated value agg:
{code}
...
  PROJECT # pass through or null replacing
    JOIN # on correlate values
      {VALUE GENERATOR}
      PROJECT # identity + non-null constant to key off of, to trigger replacing empty columns (maybe this is not needed)
        AGG group={correlateVariables} aggCalls={existing calls}
          ...
{code}

The next question: is there a way to detect whether an aggregate function is a 
scalar function or not, i.e. when given an empty set, does it emit a row?

> Query gives incorrect result when COUNT appears in the correlated subquery 
> select list
> --
>
> Key: CALCITE-5743
> URL: https://issues.apache.org/jira/browse/CALCITE-5743
> Project: Calcite
>  Issue Type: Bug
>Reporter: libopeng
>Priority: Major
>
> {code:java}
> SELECT a 
> FROM t1 t1 
> WHERE b IN (SELECT COUNT (*) FROM t2 WHERE t1.a=t2.a); {code}
> {code:java}
>   t1            |  t2
> +----+---+      |  +----+
> | a  | b |      |  | a  |
> +----+---+      |  +----+
> | 3  | 6 |      |  | 3  |
> | 10 | 1 |      |  | 3  |
> | 8  | 0 |      |  | 10 |
> {code}
> correct result
> {code:java}
> +----+
> | a  |
> +----+
> | 10 |
> | 8  |{code}
> after decorrelate
> {code:java}
> LogicalProject(A=[$0])
>   LogicalJoin(condition=[AND(=($0, $3), =($1, $2))], joinType=[inner])
>     LogicalTableScan(table=[[t1]])
>     LogicalFilter(condition=[=($0, $0)])
>       LogicalProject(EXPR$0=[$1], a=[$0])
>         LogicalAggregate(group=[{0}], EXPR$0=[COUNT()])
>           LogicalProject(a=[$0])
>             LogicalFilter(condition=[=($0, $0)])
>               LogicalTableScan(table=[[t2]]) {code}
> error result
> {code:java}
> +----+
> | a  |
> +----+
> | 10 | {code}
> Data with count=0 will be lost
> This issue was discovered in [this 
> issue|https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-5568]
>  
>  





[jira] [Comment Edited] (CALCITE-5743) Query gives incorrect result when COUNT appears in the correlated subquery select list

2024-04-10 Thread James Starr (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835836#comment-17835836
 ] 

James Starr edited comment on CALCITE-5743 at 4/10/24 5:51 PM:
---

I think you have to join the value generation query above the aggregate and 
then have defaulting functions:

Replacement for a correlated value agg:
{code}
...
  PROJECT # pass through or null replacing
    JOIN # on correlate values
      {VALUE GENERATOR}
      PROJECT # identity + non-null constant to key off of, to trigger replacing empty columns (maybe this is not needed)
        AGG group={correlateVariables} aggCalls={existing calls}
          ...
{code}

The next question: is there a way to detect whether an aggregate function is a 
scalar function or not, i.e. when given an empty set, does it emit a row?


was (Author: jamesstarr):
I think you have to join the value generation query above the aggregate and 
then have defaulting functions:

{code}
PROJECT # pass through or null replacing
  JOIN # on correlate values
    {VALUE GENERATOR}
    PROJECT # identity + non-null constant to key off of, to trigger replacing empty columns (maybe this is not needed)
      AGG group={correlateVariables} aggCalls={existing calls}
        ...
{code}

The next question: is there a way to detect whether an aggregate function is a 
scalar function or not, i.e. when given an empty set, does it emit a row?

> Query gives incorrect result when COUNT appears in the correlated subquery 
> select list
> --
>
> Key: CALCITE-5743
> URL: https://issues.apache.org/jira/browse/CALCITE-5743
> Project: Calcite
>  Issue Type: Bug
>Reporter: libopeng
>Priority: Major
>
> {code:java}
> SELECT a 
> FROM t1 t1 
> WHERE b IN (SELECT COUNT (*) FROM t2 WHERE t1.a=t2.a); {code}
> {code:java}
>   t1            |  t2
> +----+---+      |  +----+
> | a  | b |      |  | a  |
> +----+---+      |  +----+
> | 3  | 6 |      |  | 3  |
> | 10 | 1 |      |  | 3  |
> | 8  | 0 |      |  | 10 |
> {code}
> correct result
> {code:java}
> +----+
> | a  |
> +----+
> | 10 |
> | 8  |{code}
> after decorrelate
> {code:java}
> LogicalProject(A=[$0])
>   LogicalJoin(condition=[AND(=($0, $3), =($1, $2))], joinType=[inner])
>     LogicalTableScan(table=[[t1]])
>     LogicalFilter(condition=[=($0, $0)])
>       LogicalProject(EXPR$0=[$1], a=[$0])
>         LogicalAggregate(group=[{0}], EXPR$0=[COUNT()])
>           LogicalProject(a=[$0])
>             LogicalFilter(condition=[=($0, $0)])
>               LogicalTableScan(table=[[t2]]) {code}
> error result
> {code:java}
> +----+
> | a  |
> +----+
> | 10 | {code}
> Data with count=0 will be lost
> This issue was discovered in [this 
> issue|https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-5568]
>  
>  





[jira] [Commented] (CALCITE-5743) Query gives incorrect result when COUNT appears in the correlated subquery select list

2024-04-10 Thread James Starr (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835836#comment-17835836
 ] 

James Starr commented on CALCITE-5743:
--

I think you have to join the value generation query above the aggregate and 
then have defaulting functions:

{code}
PROJECT # pass through or null replacing
  JOIN # on correlate values
    {VALUE GENERATOR}
    PROJECT # identity + non-null constant to key off of, to trigger replacing empty columns (maybe this is not needed)
      AGG group={correlateVariables} aggCalls={existing calls}
        ...
{code}

The next question: is there a way to detect whether an aggregate function is a 
scalar function or not, i.e. when given an empty set, does it emit a row?

> Query gives incorrect result when COUNT appears in the correlated subquery 
> select list
> --
>
> Key: CALCITE-5743
> URL: https://issues.apache.org/jira/browse/CALCITE-5743
> Project: Calcite
>  Issue Type: Bug
>Reporter: libopeng
>Priority: Major
>
> {code:java}
> SELECT a 
> FROM t1 t1 
> WHERE b IN (SELECT COUNT (*) FROM t2 WHERE t1.a=t2.a); {code}
> {code:java}
>   t1            |  t2
> +----+---+      |  +----+
> | a  | b |      |  | a  |
> +----+---+      |  +----+
> | 3  | 6 |      |  | 3  |
> | 10 | 1 |      |  | 3  |
> | 8  | 0 |      |  | 10 |
> {code}
> correct result
> {code:java}
> +----+
> | a  |
> +----+
> | 10 |
> | 8  |{code}
> after decorrelate
> {code:java}
> LogicalProject(A=[$0])
>   LogicalJoin(condition=[AND(=($0, $3), =($1, $2))], joinType=[inner])
>     LogicalTableScan(table=[[t1]])
>     LogicalFilter(condition=[=($0, $0)])
>       LogicalProject(EXPR$0=[$1], a=[$0])
>         LogicalAggregate(group=[{0}], EXPR$0=[COUNT()])
>           LogicalProject(a=[$0])
>             LogicalFilter(condition=[=($0, $0)])
>               LogicalTableScan(table=[[t2]]) {code}
> error result
> {code:java}
> +----+
> | a  |
> +----+
> | 10 | {code}
> Data with count=0 will be lost
> This issue was discovered in [this 
> issue|https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-5568]
>  
>  





[jira] [Created] (CALCITE-6358) Support all PostgreSQL 14 date/time patterns

2024-04-10 Thread Norman Jordan (Jira)
Norman Jordan created CALCITE-6358:
--

 Summary: Support all PostgreSQL 14 date/time patterns
 Key: CALCITE-6358
 URL: https://issues.apache.org/jira/browse/CALCITE-6358
 Project: Calcite
  Issue Type: Sub-task
Reporter: Norman Jordan


Many of the date/time format patterns supported by PostgreSQL 14 are not 
supported in Calcite.
 * HH
 * US
 * SSSS
 * SSSSS
 * AM
 * A.M.
 * am
 * a.m.
 * PM
 * P.M.
 * pm
 * p.m.
 * Y,YYY
 * YYY
 * Y
 * IYYY
 * IYY
 * IY
 * I
 * BC
 * B.C.
 * bc
 * b.c.
 * AD
 * A.D.
 * ad
 * a.d.
 * MONTH
 * month
 * MON
 * mon
 * DAY
 * day
 * Dy
 * dy
 * IDDD
 * ID
 * TZH
 * TZM
 * OF

Some format patterns in Calcite behave differently from PostgreSQL 14.
 * FF1
 * FF2
 * FF4
 * FF5
 * FF6

Also verify that the other existing format strings produce results that match 
PostgreSQL 14.
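
As a point of reference for the FF1 to FF6 patterns, PostgreSQL documents FFn as the fractional second written with n digits (FF1 is tenths of a second, FF3 milliseconds, FF6 microseconds). The following is a minimal Java sketch of that reading, not Calcite's implementation; the class and method names are hypothetical, and truncating (rather than rounding) the extra digits is an assumption of the sketch.

```java
import java.time.LocalTime;
import java.util.Locale;

final class FractionalSecondsSketch {
  // Returns the first `digits` (1..6) digits of the fractional second.
  // Extra precision is truncated, not rounded (assumption of this sketch).
  static String ff(LocalTime time, int digits) {
    if (digits < 1 || digits > 6) {
      throw new IllegalArgumentException("digits must be between 1 and 6");
    }
    // Zero-pad the nanosecond field to 9 digits, then keep the leading ones.
    String nanos = String.format(Locale.ROOT, "%09d", time.getNano());
    return nanos.substring(0, digits);
  }

  public static void main(String[] args) {
    LocalTime t = LocalTime.of(12, 34, 56, 123_456_789);
    System.out.println(ff(t, 1)); // 1
    System.out.println(ff(t, 4)); // 1234
    System.out.println(ff(t, 6)); // 123456
  }
}
```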





[jira] [Commented] (CALCITE-6357) Calcite enforces select arguments count to be same as row schema fields which causes aliases to be ignored

2024-04-10 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835833#comment-17835833
 ] 

Julian Hyde commented on CALCITE-6357:
--

If the number of fields does not match, that's probably a problem on your end. 
{{RelBuilder}} almost always requires the number of fields to match.

Regarding column aliases: Calcite generally doesn't promise to preserve 
aliases, but in some cases you can force a rename. At 
[RelBuilder#2125|https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125]
 it seems that {{force}} is false. For the behavior you want, {{force}} would 
need to be true.
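
The interaction between the identity check and {{force}} can be illustrated with a small, self-contained Java model. This is a deliberately simplified sketch, not RelBuilder's code: the class and method names are hypothetical, expressions are reduced to the index of the input field they read, and the real method has additional cases that the sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

final class ProjectNamedSketch {
  // Simplified stand-in for RexUtil.isIdentity: refs.get(i) is the index of
  // the input field that output expression i reads.
  static boolean isIdentity(List<Integer> refs, int inputFieldCount) {
    if (refs.size() != inputFieldCount) {
      return false;
    }
    for (int i = 0; i < refs.size(); i++) {
      if (refs.get(i) != i) {
        return false;
      }
    }
    return true;
  }

  // Simplified model of the decision discussed above: an identity projection
  // is skipped (the input's field names survive) unless force is true.
  static List<String> projectNamed(List<String> inputNames, List<Integer> refs,
      List<String> aliases, boolean force) {
    if (!force && isIdentity(refs, inputNames.size())) {
      return inputNames;
    }
    return new ArrayList<>(aliases);
  }

  public static void main(String[] args) {
    List<String> input = List.of("id", "name");
    // Same field count, same order: the aliases are lost unless forced.
    System.out.println(projectNamed(input, List.of(0, 1), List.of("x", "y"), false)); // [id, name]
    System.out.println(projectNamed(input, List.of(0, 1), List.of("x", "y"), true));  // [x, y]
    // Narrower select: not an identity, so the aliases are kept.
    System.out.println(projectNamed(input, List.of(1), List.of("y"), false));         // [y]
  }
}
```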

> Calcite enforces select arguments count to be same as row schema fields which 
> causes aliases to be ignored
> --
>
> Key: CALCITE-6357
> URL: https://issues.apache.org/jira/browse/CALCITE-6357
> Project: Calcite
>  Issue Type: Bug
>Reporter: Brachi Packter
>Priority: Major
>
> Calcite's RelBuilder.projectNamed checks whether the number of expressions in 
> the select is identical to the number of schema fields. If it is not, it 
> creates a project with the fields as they appear in the select, meaning that 
> if they have aliases, they are returned with their aliases.
> Here, it checks whether they are identical:
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
> using the RexUtil.isIdentity method:
> ```
>   public static boolean isIdentity(List<? extends RexNode> exps,
>       RelDataType inputRowType) {
>     return inputRowType.getFieldCount() == exps.size()
>         && containIdentity(exps, inputRowType, Litmus.IGNORE);
>   }
> ```
> This is the problematic part: `inputRowType.getFieldCount() == exps.size()`
> If they are identical and the fields carry aliases, the aliases are ignored 
> in the "rename" method later on
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125
> and the alias is skipped
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137
> This doesn't impact Calcite queries, but Apache Beam applies an optimization 
> on top of it,
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
> which causes aliases to be ignored, and data is suddenly returned without the 
> correct column names.
> I believe the isIdentity check can cause more issues if not fixed. We need 
> to understand why it is enforced. Isn't it valid for the select to have a 
> different number of fields from what we have in the schema?
> In our case we have one big row and we run different queries on it, each 
> with different fields in the select.
> Beam issue:
> https://github.com/apache/beam/issues/30498 





[jira] [Resolved] (CALCITE-6286) Optimizing ARRAY_REPEAT expression causes an assertion failure

2024-04-10 Thread Mihai Budiu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihai Budiu resolved CALCITE-6286.
--
Fix Version/s: 1.37.0
   Resolution: Fixed

> Optimizing ARRAY_REPEAT expression causes an assertion failure
> --
>
> Key: CALCITE-6286
> URL: https://issues.apache.org/jira/browse/CALCITE-6286
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.36.0
>Reporter: Mihai Budiu
>Priority: Minor
> Fix For: 1.37.0
>
>
> The following RelOptRulesTest causes an assertion failure:
> {code:java}
>   @Test void testArrayRepeat() {
>     sql("SELECT array_repeat(123, null)")
>         .withFactory(
>             t -> t.withOperatorTable(
>                 opTab ->
>                     SqlLibraryOperatorTableFactory.INSTANCE.getOperatorTable(
>                         SqlLibrary.STANDARD, SqlLibrary.SPARK)))
>         .withRule(CoreRules.PROJECT_REDUCE_EXPRESSIONS).checkUnchanged();
>   }
> {code}
> The assertion failure is:
> {code}
> java.lang.AssertionError: Cannot add expression of different type to set:
> set type is RecordType(INTEGER NOT NULL ARRAY NOT NULL EXPR$0) NOT NULL
> expression type is RecordType(INTEGER NOT NULL ARRAY EXPR$0) NOT NULL
> set is rel#4:LogicalProject.(input=HepRelVertex#3,exprs=[ARRAY_REPEAT(123, null:DECIMAL(19, 9))])
> expression is LogicalProject(EXPR$0=[null:INTEGER NOT NULL ARRAY])
>   LogicalValues(tuples=[[{ 0 }]])
> Type mismatch:
> rowtype of original rel: RecordType(INTEGER NOT NULL ARRAY NOT NULL EXPR$0) NOT NULL
> rowtype of new rel: RecordType(INTEGER NOT NULL ARRAY EXPR$0) NOT NULL
> Difference:
> EXPR$0: INTEGER NOT NULL ARRAY NOT NULL -> INTEGER NOT NULL ARRAY
>   at org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:419)
>   at org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:60)
>   at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:273)
>   at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:288)
>   at org.apache.calcite.rel.rules.ReduceExpressionsRule$ProjectReduceExpressionsRule.onMatch(ReduceExpressionsRule.java:317)
>   at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:337)
> {code}
> This suggests that the type inferred for ARRAY_REPEAT is incorrect.





[jira] [Resolved] (CALCITE-6210) Cast to VARBINARY causes an assertion failure

2024-04-10 Thread Mihai Budiu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihai Budiu resolved CALCITE-6210.
--
Resolution: Fixed

> Cast to VARBINARY causes an assertion failure
> -
>
> Key: CALCITE-6210
> URL: https://issues.apache.org/jira/browse/CALCITE-6210
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.36.0
>Reporter: Mihai Budiu
>Assignee: Mihai Budiu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.37.0
>
>
> This test in SqlOperatorTest:
> {code:java}
>  SqlOperatorFixture f = fixture();
>  f.checkScalar("CAST('00' AS VARBINARY)", "00", "VARBINARY NOT NULL");
> {code}
> Causes the following assertion failure:
> {code}
> java.lang.AssertionError: value 00 does not match type class org.apache.calcite.avatica.util.ByteString
>   at org.apache.calcite.linq4j.tree.ConstantExpression.<init>(ConstantExpression.java:51)
>   at org.apache.calcite.linq4j.tree.Expressions.constant(Expressions.java:585)
>   at org.apache.calcite.linq4j.tree.OptimizeShuttle.visit(OptimizeShuttle.java:305)
>   at org.apache.calcite.linq4j.tree.UnaryExpression.accept(UnaryExpression.java:39)
>   at org.apache.calcite.linq4j.tree.TernaryExpression.accept(TernaryExpression.java:47)
>   at org.apache.calcite.linq4j.tree.DeclarationStatement.accept(DeclarationStatement.java:45)
>   at org.apache.calcite.linq4j.tree.DeclarationStatement.accept(DeclarationStatement.java:27)
>   at org.apache.calcite.linq4j.tree.BlockBuilder.optimize(BlockBuilder.java:426)
>   at org.apache.calcite.linq4j.tree.BlockBuilder.toBlock(BlockBuilder.java:340)
>   at org.apache.calcite.rex.RexExecutorImpl.compile(RexExecutorImpl.java:102)
>   at org.apache.calcite.rex.RexExecutorImpl.compile(RexExecutorImpl.java:68)
>   at org.apache.calcite.rex.RexExecutorImpl.reduce(RexExecutorImpl.java:133)
>   at org.apache.calcite.rex.RexSimplify.simplifyCast(RexSimplify.java:2272)
>   at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:292)
>   at org.apache.calcite.rex.RexSimplify.simplifyUnknownAs(RexSimplify.java:250)
>   at org.apache.calcite.rex.RexSimplify.simplifyPreservingType(RexSimplify.java:189)
>   at org.apache.calcite.rex.RexSimplify.simplifyPreservingType(RexSimplify.java:184)
> {code}





[jira] [Commented] (CALCITE-6210) Cast to VARBINARY causes an assertion failure

2024-04-10 Thread Mihai Budiu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835830#comment-17835830
 ] 

Mihai Budiu commented on CALCITE-6210:
--

Fixed in 
https://github.com/apache/calcite/commit/aba64f0b217093b500629fe07a0befdc68293fbc

> Cast to VARBINARY causes an assertion failure
> -
>
> Key: CALCITE-6210
> URL: https://issues.apache.org/jira/browse/CALCITE-6210
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.36.0
>Reporter: Mihai Budiu
>Assignee: Mihai Budiu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.37.0
>
>
> This test in SqlOperatorTest:
> {code:java}
>  SqlOperatorFixture f = fixture();
>  f.checkScalar("CAST('00' AS VARBINARY)", "00", "VARBINARY NOT NULL");
> {code}
> Causes the following assertion failure:
> {code}
> java.lang.AssertionError: value 00 does not match type class org.apache.calcite.avatica.util.ByteString
>   at org.apache.calcite.linq4j.tree.ConstantExpression.<init>(ConstantExpression.java:51)
>   at org.apache.calcite.linq4j.tree.Expressions.constant(Expressions.java:585)
>   at org.apache.calcite.linq4j.tree.OptimizeShuttle.visit(OptimizeShuttle.java:305)
>   at org.apache.calcite.linq4j.tree.UnaryExpression.accept(UnaryExpression.java:39)
>   at org.apache.calcite.linq4j.tree.TernaryExpression.accept(TernaryExpression.java:47)
>   at org.apache.calcite.linq4j.tree.DeclarationStatement.accept(DeclarationStatement.java:45)
>   at org.apache.calcite.linq4j.tree.DeclarationStatement.accept(DeclarationStatement.java:27)
>   at org.apache.calcite.linq4j.tree.BlockBuilder.optimize(BlockBuilder.java:426)
>   at org.apache.calcite.linq4j.tree.BlockBuilder.toBlock(BlockBuilder.java:340)
>   at org.apache.calcite.rex.RexExecutorImpl.compile(RexExecutorImpl.java:102)
>   at org.apache.calcite.rex.RexExecutorImpl.compile(RexExecutorImpl.java:68)
>   at org.apache.calcite.rex.RexExecutorImpl.reduce(RexExecutorImpl.java:133)
>   at org.apache.calcite.rex.RexSimplify.simplifyCast(RexSimplify.java:2272)
>   at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:292)
>   at org.apache.calcite.rex.RexSimplify.simplifyUnknownAs(RexSimplify.java:250)
>   at org.apache.calcite.rex.RexSimplify.simplifyPreservingType(RexSimplify.java:189)
>   at org.apache.calcite.rex.RexSimplify.simplifyPreservingType(RexSimplify.java:184)
> {code}





[jira] [Updated] (CALCITE-6357) Calcite enforces select arguments count to be same as row schema fields which causes aliases to be ignored

2024-04-10 Thread Brachi Packter (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brachi Packter updated CALCITE-6357:

Description: 
Calcite's RelBuilder.projectNamed checks whether the number of expressions in 
the select is identical to the number of schema fields. If it is not, it 
creates a project with the fields as they appear in the select, meaning that 
if they have aliases, they are returned with their aliases.

Here, it checks whether they are identical:

https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063

using the RexUtil.isIdentity method:

```
  public static boolean isIdentity(List<? extends RexNode> exps,
      RelDataType inputRowType) {
    return inputRowType.getFieldCount() == exps.size()
        && containIdentity(exps, inputRowType, Litmus.IGNORE);
  }
```
This is the problematic part: `inputRowType.getFieldCount() == exps.size()`

If they are identical and the fields carry aliases, the aliases are ignored 
in the "rename" method later on
https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125

and the alias is skipped

https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137

This doesn't impact Calcite queries, but Apache Beam applies an optimization 
on top of it,
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
which causes aliases to be ignored, and data is suddenly returned without the 
correct column names.

I believe the isIdentity check can cause more issues if not fixed. We need to 
understand why it is enforced. Isn't it valid for the select to have a 
different number of fields from what we have in the schema?

In our case we have one big row and we run different queries on it, each with 
different fields in the select.

Beam issue:
https://github.com/apache/beam/issues/30498 

  was:
Calcite RelBuilder.ProjectNamed cehcks if row size in the select is identical 
to schema fields, if no, it creates a project with fields as they appear in the 
select , meaning if they have aliases, they are returning with their aliases.

Here it checks if they are identical:

https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063

using RexUtil.isIdentity method:

```
  public static boolean isIdentity(List<? extends RexNode> exps,
      RelDataType inputRowType) {
    return inputRowType.getFieldCount() == exps.size()
        && containIdentity(exps, inputRowType, Litmus.IGNORE);
  }
```
This is the problematic part `inputRowType.getFieldCount() == exps.size()`

And then it is ignored in the "rename" method later on
https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125

and alias is skipped

https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137

This doesn't impact calcite queries, but in Apache Beam they are doing some 
optimization on top of it, 
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
which cause aliases to be ignored, and data is returning suddenly without 
correct column field.

I believe the isIdentity check can causes more issues if not fixed, we need to 
understand why is it enforced? isn't it valid to have different size of fields 
in select from what we have in the schema?

In our case we have a one big row and we run on it different queries, each with 
different fields in the select.

Beam issue 
https://github.com/apache/beam/issues/30498 


> Calcite enforces select arguments count to be same as row schema fields which 
> causes aliases to be ignored
> --
>
> Key: CALCITE-6357
> URL: https://issues.apache.org/jira/browse/CALCITE-6357
> Project: Calcite
>  Issue Type: Bug
>Reporter: Brachi Packter
>Priority: Major
>
> Calcite's RelBuilder.projectNamed checks whether the number of expressions in 
> the select is identical to the number of schema fields. If it is not, it 
> creates a project with the fields as they appear in the select, meaning that 
> if they have aliases, they are returned with their aliases.
> Here, it checks whether they are identical:
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
> using RexUtil.isIdentity method:
> ```
>  public static boolean isIdentity(List exps,
>   RelDataType 

[jira] [Commented] (CALCITE-5390) RelDecorrelator throws NullPointerException

2024-04-10 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835683#comment-17835683
 ] 

Ruben Q L commented on CALCITE-5390:


Recently I stumbled upon this bug, in my case it's a LogicalFilter with 2 
RexSubquery using the same $cor0 which triggers the NPE when calling the 
Decorrelator. It's a far more complicated query that the ones already mentioned 
previously (so far I haven't been able to reproduce my issue with a 
minimalistic test), but I think the principle is the same.

I'm not sure trimming plays a role here (at least on my case I get the error 
without trimming the plan). I pretty much agree with [~libenchao]'s analysis 
here, it seems strange that the SubQueryRemoveRule generates a plan with two 
LogicalCorrelate using the same corr variable (it's exactly the same on my 
case). The plan looks odd, but... if I am not mistaken, if we skip the 
decorrelation, that odd plan seems to run fine (at least on my case, where we 
run the decorrelator within a try-catch, given that it's been traditionally a 
problematic module), and it case of exception, we simply go with the original 
(non-decorrelated) plan, and everything is fine. Actually, it seems that this 
"bizarre plan structure" of having 2 LogicalCorrelate with the same variable is 
not that uncommon: if we look inside {{RelOptRulesTest.xml}} we can see around 
10 coincidences of said pattern.

Moreover, it'd seem that this pattern is a necessary condition for the 
Decorrelator NPE, but not sufficient (according to my tests, some of these 
plans can be correctly decorrelated; it probably depends on which fields are 
accessed on each "instance" of the corr variable, as [~julianhyde] mentioned 
here).

To sum up, I agree with [~libenchao] that one potential solution could be 
modifying SubQueryRemoveRule to introduce different correlating variables 
instead of one in these cases (and that will probably make the Decorralator 
happy). But, given that the current plan generated by SubQueryRemoveRule seems 
"odd by correct" (it gets executed correctly if we skip the Decorrelator and 
its potential NPE); I wonder if perhaps our efforts should rather focus on 
fixing the Decorrelator and make it work with this type of plans. I'm not sure 
which approach would be simpler, but I just wanted to mention it.

PS: I haven't looked in detail, but I feel that on the Decorrelator side, the 
problem (or at least one of them) might originate because its {{CorelMap}} 
contains just a Map for {{}}, and it would seem that 
in a case like this, a MultiMap would be required? (two different 
LogicalCorrelate values having the same CorrelationId key?)
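
To make the Map versus MultiMap point concrete, here is a minimal sketch using plain Java collections. It does not reproduce how {{CorelMap}} is actually implemented; the {{Correlate}} class and both index methods are hypothetical stand-ins, and only the values ($cor0 with required columns 2 and 9) are taken from the plan in the issue description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "two correlates sharing one correlation id".
final class CorrelateIndexSketch {
  // Stand-in for a LogicalCorrelate: its correlation id and required column.
  static final class Correlate {
    final String corId;
    final int requiredColumn;

    Correlate(String corId, int requiredColumn) {
      this.corId = corId;
      this.requiredColumn = requiredColumn;
    }
  }

  // Single-valued map: a later correlate with the same id replaces the earlier one.
  static Map<String, Correlate> indexSingle(List<Correlate> correlates) {
    Map<String, Correlate> byId = new HashMap<>();
    for (Correlate c : correlates) {
      byId.put(c.corId, c);
    }
    return byId;
  }

  // Multimap-style index: every correlate registered under the id is kept.
  static Map<String, List<Correlate>> indexMulti(List<Correlate> correlates) {
    Map<String, List<Correlate>> byId = new HashMap<>();
    for (Correlate c : correlates) {
      byId.computeIfAbsent(c.corId, k -> new ArrayList<>()).add(c);
    }
    return byId;
  }

  public static void main(String[] args) {
    List<Correlate> plan =
        Arrays.asList(new Correlate("$cor0", 2), new Correlate("$cor0", 9));
    System.out.println("single map entries for $cor0: " + indexSingle(plan).size());
    System.out.println("multimap entries for $cor0: " + indexMulti(plan).get("$cor0").size());
  }
}
```

With the single-valued map, a lookup for $cor0 can only ever reach one of the two correlates, which would be consistent with a missing entry for the other one, although I have not confirmed that this is what happens in the Decorrelator.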

> RelDecorrelator throws NullPointerException
> ---
>
> Key: CALCITE-5390
> URL: https://issues.apache.org/jira/browse/CALCITE-5390
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Dmitry Sysolyatin
>Priority: Major
>
> The current query throws NullPointerException
> {code:java}
> SELECT
>   (SELECT 1 FROM emp d WHERE d.job = a.job LIMIT 1) AS t1,
>   (SELECT a.job = 'PRESIDENT' FROM emp s LIMIT 1) as t2
> FROM emp a;
> {code}
> Test case - 
> [https://github.com/apache/calcite/commit/46fe9bc456f2d34cf7dccd29829c9e85abe69d5f]
> Logical plan before it fails:
> {code:java}
> LogicalProject(T1=[$8], T2=[$9])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], $f0=[$8], $f09=[$9])
>     LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], $f0=[$8], $f00=[$10])
>       LogicalCorrelate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{9}])
>         LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], 
> HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], $f0=[$8], $f9=[=($2, 
> 'PRESIDENT')])
>           LogicalCorrelate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{2}])
>             LogicalTableScan(table=[[scott, EMP]])
>             LogicalAggregate(group=[{}], agg#0=[SINGLE_VALUE($0)])
>               LogicalSort(fetch=[1])
>                 LogicalProject(EXPR$0=[1])
>                   LogicalFilter(condition=[=($2, $cor0.JOB)])
>                     LogicalTableScan(table=[[scott, EMP]])
>         LogicalAggregate(group=[{}], agg#0=[SINGLE_VALUE($0)])
>           LogicalSort(fetch=[1])
>             LogicalProject(EXPR$0=[$cor0.$f9])
>               LogicalTableScan(table=[[scott, EMP]]) {code}
> Stack trace:
> {code:java}
>  Caused by: java.lang.NullPointerException
>   at java.util.Objects.requireNonNull(Objects.java:203)
>   at 
> org.apache.calcite.sql2rel.RelDecorrelator.createValueGenerator(RelDecorrelator.java:833)
>   at 
> 

[jira] [Created] (CALCITE-6357) Calcite enforces select argument count to be same as row schema fields which causes aliases to be ignored

2024-04-10 Thread Brachi Packter (Jira)
Brachi Packter created CALCITE-6357:
---

 Summary: Calcite enforces select argument count to be same as row 
schema fields which causes aliases to be ignored
 Key: CALCITE-6357
 URL: https://issues.apache.org/jira/browse/CALCITE-6357
 Project: Calcite
  Issue Type: Bug
Reporter: Brachi Packter


Calcite's RelBuilder.projectNamed checks whether the number of expressions in 
the select is identical to the number of schema fields; if not, it creates a 
project with the fields as they appear in the select, meaning that if they 
have aliases, they are returned with their aliases.

Here it checks if they are identical:

https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063

using RexUtil.isIdentity method:

```
  public static boolean isIdentity(List<? extends RexNode> exps,
      RelDataType inputRowType) {
    return inputRowType.getFieldCount() == exps.size()
        && containIdentity(exps, inputRowType, Litmus.IGNORE);
  }
```
This is the problematic part: `inputRowType.getFieldCount() == exps.size()`

And then it is ignored in the "rename" method later on
https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125

and the alias is skipped

https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137
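
A simplified model of the check may help show what it does and does not look at. This is a sketch under assumptions, not Calcite's code: each select item is reduced to the index of the input field it references, whereas the real containIdentity works on RexNode expressions and the input row type. The class name and the integer encoding are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, index-only model of an "identity projection" check.
final class IdentityCheckSketch {
  private IdentityCheckSketch() {}

  // exps.get(i) is the input field index referenced by the i-th select item.
  static boolean isIdentity(List<Integer> exps, int inputFieldCount) {
    if (inputFieldCount != exps.size()) {
      return false; // the count comparison discussed above
    }
    for (int i = 0; i < exps.size(); i++) {
      if (exps.get(i) != i) {
        return false; // item i must reference input field i
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // All three fields, in order: identity, whatever aliases the select uses.
    System.out.println(isIdentity(Arrays.asList(0, 1, 2), 3));
    // A subset of the fields: not an identity.
    System.out.println(isIdentity(Arrays.asList(0, 2), 3));
    // Same count but reordered: not an identity.
    System.out.println(isIdentity(Arrays.asList(1, 0), 2));
  }
}
```

Note that aliases are not an input to this model at all, so a select that lists every field in order is treated as an identity regardless of how the items are named.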

This doesn't impact Calcite queries, but Apache Beam performs some 
optimization on top of it, 
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
which causes aliases to be ignored, so data is suddenly returned without the 
correct column field names.

I believe the isIdentity check can cause more issues if not fixed. We need to 
understand why it is enforced: isn't it valid for the select to have a 
different number of fields from what we have in the schema?

In our case we have one big row and we run different queries on it, each with 
different fields in the select.

Beam issue 
https://github.com/apache/beam/issues/30498 





[jira] [Updated] (CALCITE-6357) Calcite enforces select arguments count to be same as row schema fields which causes aliases to be ignored

2024-04-10 Thread Brachi Packter (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brachi Packter updated CALCITE-6357:

Summary: Calcite enforces select arguments count to be same as row schema 
fields which causes aliases to be ignored  (was: Calcite enforces select 
argument count to be same as row schema fields which causes aliases to be 
ignored)

> Calcite enforces select arguments count to be same as row schema fields which 
> causes aliases to be ignored
> --
>
> Key: CALCITE-6357
> URL: https://issues.apache.org/jira/browse/CALCITE-6357
> Project: Calcite
>  Issue Type: Bug
>Reporter: Brachi Packter
>Priority: Major
>
> Calcite's RelBuilder.projectNamed checks whether the number of expressions in 
> the select is identical to the number of schema fields; if not, it creates a 
> project with the fields as they appear in the select, meaning that if they 
> have aliases, they are returned with their aliases.
> Here it checks if they are identical:
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
> using RexUtil.isIdentity method:
> ```
>   public static boolean isIdentity(List<? extends RexNode> exps,
>       RelDataType inputRowType) {
>     return inputRowType.getFieldCount() == exps.size()
>         && containIdentity(exps, inputRowType, Litmus.IGNORE);
>   }
> ```
> This is the problematic part: `inputRowType.getFieldCount() == exps.size()`
> And then it is ignored in the "rename" method later on
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125
> and the alias is skipped
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137
> This doesn't impact Calcite queries, but Apache Beam performs some 
> optimization on top of it, 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
> which causes aliases to be ignored, so data is suddenly returned without the 
> correct column field names.
> I believe the isIdentity check can cause more issues if not fixed. We need to 
> understand why it is enforced: isn't it valid for the select to have a 
> different number of fields from what we have in the schema?
> In our case we have one big row and we run different queries on it, each 
> with different fields in the select.
> Beam issue 
> https://github.com/apache/beam/issues/30498 


