[jira] [Comment Edited] (CALCITE-6020) SqlToRelConverter should not replace windowed SUM with equivalent expression using SUM0

2024-08-22 Thread Mihai Budiu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876123#comment-17876123
 ] 

Mihai Budiu edited comment on CALCITE-6020 at 8/22/24 10:16 PM:


If indeed your change is correct, then it uncovers a bug which I traced to this 
piece of code in RexBuilder.makeOver:

{code:java}
  makeCall(SqlStdOperatorTable.CASE,
  makeCall(SqlStdOperatorTable.GREATER_THAN,
  new RexOver(bigintType, SqlStdOperatorTable.COUNT, exprs,
  window, distinct, ignoreNulls),
  makeLiteral(BigDecimal.ZERO, bigintType,
  SqlTypeName.DECIMAL)),
  ensureType(type, // SUM0 is non-nullable, thus need a cast
  new RexOver(typeFactory.createTypeWithNullability(type, 
false),
  operator, exprs, window, distinct, ignoreNulls),
  false),
  makeNullLiteral(type));
{code}

I don't think the type should be made non-nullable here.


was (Author: JIRAUSER295926):
If indeed your change is correct, then it uncovers a bug which I traced to this 
piece of code in RexBuilder.makeOver:

{code:java}
  makeCall(SqlStdOperatorTable.CASE,
  makeCall(SqlStdOperatorTable.GREATER_THAN,
  new RexOver(bigintType, SqlStdOperatorTable.COUNT, exprs,
  window, distinct, ignoreNulls),
  makeLiteral(BigDecimal.ZERO, bigintType,
  SqlTypeName.DECIMAL)),
  ensureType(type, // SUM0 is non-nullable, thus need a cast
  new RexOver(typeFactory.createTypeWithNullability(type, 
false),
  operator, exprs, window, distinct, ignoreNulls),
  false),
  makeNullLiteral(type));
{code}

I don't think the type should be made nullable here.

> SqlToRelConverter should not replace windowed SUM with equivalent expression 
> using SUM0
> ---
>
> Key: CALCITE-6020
> URL: https://issues.apache.org/jira/browse/CALCITE-6020
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Norman Jordan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.38.0
>
>
> {{SqlToRelConverter}} replaces {{SUM}} with {{SUM0}} around 
> [here|https://github.com/apache/calcite/blob/e1991e08a225ef08c2402ab35c310d88fff3c222/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L5885]
> This might have been needed at some point in the past - but I think it will 
> be better to leave it as {{SUM}} - as in case there is no {{SUM0}} in the 
> system that will be replaced with a {{COALESCE(SUM(...) , 0 )}} to provide it 
> - as see 
> [here|https://github.com/apache/calcite/blob/e1991e08a225ef08c2402ab35c310d88fff3c222/core/src/test/java/org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java#L1288]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-6020) SqlToRelConverter should not replace windowed SUM with equivalent expression using SUM0

2024-08-22 Thread Mihai Budiu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876118#comment-17876118
 ] 

Mihai Budiu edited comment on CALCITE-6020 at 8/22/24 10:06 PM:


I have already merged your PR in main.
I haven't filed an issue yet with our compiler, since I am still investigating 
the exact cause.
The symptom is that building with Calcite using the commit just before your 
change everything is fine, but with Calcite including your change the compiled 
program fails.

The program that reproduces the issue is the following:
{code:sql}
CREATE TABLE transactions (
cc_num BIGINT NOT NULL,
amt FLOAT64,
unix_time INTEGER NOT NULL
);

CREATE VIEW features as
SELECT
AVG(amt) OVER(
PARTITION BY cc_num
ORDER BY unix_time
RANGE BETWEEN 2592000 PRECEDING AND 1 PRECEDING) AS
avg_spend_pm,
COUNT(*) OVER(
PARTITION BY cc_num
ORDER BY unix_time
RANGE BETWEEN 86400  PRECEDING AND 1 PRECEDING ) AS
trans_freq_24
FROM transactions AS t1;
{code}

What happens using your change is that Calcite asserts that a specific result 
is not null, but it turns out to be null at runtime.

The working logical plan is:

{code}
LogicalProject(AVG_SPEND_PM=[/(CASE(>($3, 0), $4, null:DOUBLE), $3)], 
TRANS_FREQ_24=[$5]), id = 45
  LogicalWindow(window#0=[window(partition {0} order by [2] range between 
$3 PRECEDING and $4 PRECEDING aggs [COUNT($1), $SUM0($1)])], 
window#1=[window(partition {0} order by [2] range between $5 PRECEDING and $4 
PRECEDING aggs [COUNT()])]), id = 43
LogicalTableScan(table=[[schema, TRANSACTIONS]]), id = 1
{code}
The failing logical plan only differs in the use of SUM instead of SUM0:

{code}
LogicalProject(AVG_SPEND_PM=[/(CASE(>($3, 0), $4, null:DOUBLE), $3)], 
TRANS_FREQ_24=[$5]), id = 45
  LogicalWindow(window#0=[window(partition {0} order by [2] range between 
$3 PRECEDING and $4 PRECEDING aggs [COUNT($1), SUM($1)])], 
window#1=[window(partition {0} order by [2] range between $5 PRECEDING and $4 
PRECEDING aggs [COUNT()])]), id = 43
LogicalTableScan(table=[[schema, TRANSACTIONS]]), id = 1
{code}

But what you cannot see in the plans is the type of the row produced by the 
logical window. The failing plan says that field $4 of the LogicalWindow is not 
nullable, whereas it should nullable - in fact, the final result should be null 
for a any 1-row.

I guess SUM0 never produces a NULL result, and that's the difference.

Maybe your change actually uncovers a different bug in the window 
implementation.


was (Author: JIRAUSER295926):
I have already merged your PR in main.
I haven't filed an issue yet with our compiler, since I am still investigating 
the exact cause.
The symptom is that building with Calcite using the commit just before your 
change everything is fine, but with Calcite including your change the compiled 
program fails.

The program that reproduces the issue is the following:
{code:sql}
CREATE TABLE transactions (
cc_num BIGINT NOT NULL,
amt FLOAT64,
unix_time INTEGER NOT NULL
);

CREATE VIEW features as
SELECT
AVG(amt) OVER(
PARTITION BY cc_num
ORDER BY unix_time
RANGE BETWEEN 2592000 PRECEDING AND 1 PRECEDING) AS
avg_spend_pm,
COUNT(*) OVER(
PARTITION BY cc_num
ORDER BY unix_time
RANGE BETWEEN 86400  PRECEDING AND 1 PRECEDING ) AS
trans_freq_24
FROM transactions AS t1;
{code}

What happens using your change is that Calcite asserts that a specific result 
is not null, but it turns out to be null at runtime.

The working logical plan is:

LogicalProject(AVG_SPEND_PM=[/(CASE(>($3, 0), $4, null:DOUBLE), $3)], 
TRANS_FREQ_24=[$5]), id = 45
  LogicalWindow(window#0=[window(partition {0} order by [2] range between 
$3 PRECEDING and $4 PRECEDING aggs [COUNT($1), $SUM0($1)])], 
window#1=[window(partition {0} order by [2] range between $5 PRECEDING and $4 
PRECEDING aggs [COUNT()])]), id = 43
LogicalTableScan(table=[[schema, TRANSACTIONS]]), id = 1

The failing logical plan only differs in the use of SUM instead of SUM0:

LogicalProject(AVG_SPEND_PM=[/(CASE(>($3, 0), $4, null:DOUBLE), $3)], 
TRANS_FREQ_24=[$5]), id = 45
  LogicalWindow(window#0=[window(partition {0} order by [2] range between 
$3 PRECEDING and $4 PRECEDING aggs [COUNT($1), SUM($1)])], 
window#1=[window(

[jira] [Comment Edited] (CALCITE-6020) SqlToRelConverter should not replace windowed SUM with equivalent expression using SUM0

2023-09-21 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767768#comment-17767768
 ] 

Julian Hyde edited comment on CALCITE-6020 at 9/22/23 1:12 AM:
---

{{SUM0}} stores less information than {{SUM}}. Therefore there are situations 
where {{SUM0}} is easier to compute and still provides the needed data. Summary 
tables are one example.

If you need {{SUM}} you can compute it from {{SUM0}} and {{COUNT}}: {{SUM\(x) 
=== CASE COUNT\(x) WHEN 0 THEN NULL ELSE SUM0\(x)}}

I can't make the claim that SUM0 is definitely needed anywhere. However I do 
claim that it is a better primitive than SUM, and we should be embracing the 
best primitives we can find.


was (Author: julianhyde):
{{SUM0}} stores less information than {{SUM}}. Therefore there are situations 
where {{SUM0}} is easier to compute and still provides the needed data. Summary 
tables are one example.

If you need {{SUM}} you can compute it from {{SUM0}} and {{COUNT}}: {{SUM(x) 
=== CASE COUNT(x) WHEN 0 THEN NULL ELSE SUM0(x)}}

I can't make the claim that SUM0 is definitely needed anywhere. However I do 
claim that it is a better primitive than SUM, and we should be embracing the 
best primitives we can find.

> SqlToRelConverter should not replace windowed SUM with equivalent expression 
> using SUM0
> ---
>
> Key: CALCITE-6020
> URL: https://issues.apache.org/jira/browse/CALCITE-6020
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>
> {{SqlToRelConverter}} replaces {{SUM}} with {{SUM0}} around 
> [here|https://github.com/apache/calcite/blob/e1991e08a225ef08c2402ab35c310d88fff3c222/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L5885]
> This might have been needed at some point in the past - but I think it will 
> be better to leave it as {{SUM}} - as in case there is no {{SUM0}} in the 
> system that will be replaced with a {{COALESCE(SUM(...) , 0 )}} to provide it 
> - as see 
> [here|https://github.com/apache/calcite/blob/e1991e08a225ef08c2402ab35c310d88fff3c222/core/src/test/java/org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java#L1288]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)