[jira] [Commented] (CALCITE-6214) Remove DISTINCT in aggregate function if field is unique

2024-02-11 Thread Jiajun Xie (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816494#comment-17816494
 ] 

Jiajun Xie commented on CALCITE-6214:
-

Fixed in 
[ec0dc3c|https://github.com/apache/calcite/commit/ec0dc3c886ef06020294dc80971c29ec3b90fa44]

> Remove DISTINCT in aggregate function if field is unique
> 
>
> Key: CALCITE-6214
> URL: https://issues.apache.org/jira/browse/CALCITE-6214
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Jiajun Xie
>Assignee: Jiajun Xie
>Priority: Minor
>  Labels: pull-request-available
>
> For the sql
> {code:java}
> select count(distinct x) cnt
> from(
>select distinct sal x from emp
> ) t  {code}
> The distinct that in count can be removed.
> {code:java}
> LogicalAggregate(group=[{}], CNT=[COUNT($0)])
>   LogicalAggregate(group=[{0}])
> LogicalProject(X=[$5])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> But `CoreRules#AGGREGATE_REMOVE` not support it, so there are two DISTINCT.
> {code:java}
> LogicalAggregate(group=[{}], CNT=[COUNT(DISTINCT $0)])
>   LogicalAggregate(group=[{0}])
> LogicalProject(X=[$5])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6214) Remove DISTINCT in aggregate function if field is unique

2024-02-03 Thread Jiajun Xie (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813884#comment-17813884
 ] 

Jiajun Xie commented on CALCITE-6214:
-

Sorry, I misunderstood.

The distinct can be removed if we implement `RelOptTable#isKey` for combined 
columns.

 

> Remove DISTINCT in aggregate function if field is unique
> 
>
> Key: CALCITE-6214
> URL: https://issues.apache.org/jira/browse/CALCITE-6214
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Jiajun Xie
>Assignee: Jiajun Xie
>Priority: Minor
>  Labels: pull-request-available
>
> For the sql
> {code:java}
> select count(distinct x) cnt
> from(
>select distinct sal x from emp
> ) t  {code}
> The distinct that in count can be removed.
> {code:java}
> LogicalAggregate(group=[{}], CNT=[COUNT($0)])
>   LogicalAggregate(group=[{0}])
> LogicalProject(X=[$5])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> But `CoreRules#AGGREGATE_REMOVE` not support it, so there are two DISTINCT.
> {code:java}
> LogicalAggregate(group=[{}], CNT=[COUNT(DISTINCT $0)])
>   LogicalAggregate(group=[{0}])
> LogicalProject(X=[$5])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6214) Remove DISTINCT in aggregate function if field is unique

2024-02-02 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813752#comment-17813752
 ] 

Julian Hyde commented on CALCITE-6214:
--

That’s not what I said. In my example the combined columns (deptno, ename) are 
unique but no single column is unique.

> Remove DISTINCT in aggregate function if field is unique
> 
>
> Key: CALCITE-6214
> URL: https://issues.apache.org/jira/browse/CALCITE-6214
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Jiajun Xie
>Assignee: Jiajun Xie
>Priority: Minor
>  Labels: pull-request-available
>
> For the sql
> {code:java}
> select count(distinct x) cnt
> from(
>select distinct sal x from emp
> ) t  {code}
> The distinct that in count can be removed.
> {code:java}
> LogicalAggregate(group=[{}], CNT=[COUNT($0)])
>   LogicalAggregate(group=[{0}])
> LogicalProject(X=[$5])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> But `CoreRules#AGGREGATE_REMOVE` not support it, so there are two DISTINCT.
> {code:java}
> LogicalAggregate(group=[{}], CNT=[COUNT(DISTINCT $0)])
>   LogicalAggregate(group=[{0}])
> LogicalProject(X=[$5])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6214) Remove DISTINCT in aggregate function if field is unique

2024-02-02 Thread Jiajun Xie (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813676#comment-17813676
 ] 

Jiajun Xie commented on CALCITE-6214:
-

Yes, empno is unique key, we can remove distinct for it.

I added a commit to test them.

> Remove DISTINCT in aggregate function if field is unique
> 
>
> Key: CALCITE-6214
> URL: https://issues.apache.org/jira/browse/CALCITE-6214
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Jiajun Xie
>Assignee: Jiajun Xie
>Priority: Minor
>  Labels: pull-request-available
>
> For the sql
> {code:java}
> select count(distinct x) cnt
> from(
>select distinct sal x from emp
> ) t  {code}
> The distinct that in count can be removed.
> {code:java}
> LogicalAggregate(group=[{}], CNT=[COUNT($0)])
>   LogicalAggregate(group=[{0}])
> LogicalProject(X=[$5])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> But `CoreRules#AGGREGATE_REMOVE` not support it, so there are two DISTINCT.
> {code:java}
> LogicalAggregate(group=[{}], CNT=[COUNT(DISTINCT $0)])
>   LogicalAggregate(group=[{0}])
> LogicalProject(X=[$5])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6214) Remove DISTINCT in aggregate function if field is unique

2024-01-29 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812038#comment-17812038
 ] 

Julian Hyde commented on CALCITE-6214:
--

Does this change take into account GROUP BY keys? For example, if we know that 
(ename, deptno) is unique (i.e. no employees in the same department have the 
same name) then we know we can remove the {{DISTINCT}} from the following query:
{code}
SELECT deptno, COUNT(DISTINCT ename)
FROM emp
GROUP BY deptno
{code}

We have to be careful if there are grouping sets; we cannot remove {{DISTINCT}} 
in the following similar query:
{code}
SELECT deptno, COUNT(DISTINCT ename)
FROM emp
GROUP BY ROLLUP(deptno)
{code}


> Remove DISTINCT in aggregate function if field is unique
> 
>
> Key: CALCITE-6214
> URL: https://issues.apache.org/jira/browse/CALCITE-6214
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Jiajun Xie
>Assignee: Jiajun Xie
>Priority: Minor
>  Labels: pull-request-available
>
> For the sql
> {code:java}
> select count(distinct x) cnt
> from(
>select distinct sal x from emp
> ) t  {code}
> The distinct that in count can be removed.
> {code:java}
> LogicalAggregate(group=[{}], CNT=[COUNT($0)])
>   LogicalAggregate(group=[{0}])
> LogicalProject(X=[$5])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> But `CoreRules#AGGREGATE_REMOVE` not support it, so there are two DISTINCT.
> {code:java}
> LogicalAggregate(group=[{}], CNT=[COUNT(DISTINCT $0)])
>   LogicalAggregate(group=[{0}])
> LogicalProject(X=[$5])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6214) Remove DISTINCT in aggregate function if field is unique

2024-01-28 Thread Jiajun Xie (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811647#comment-17811647
 ] 

Jiajun Xie commented on CALCITE-6214:
-

I removed `AggregateRemoveDistinctRule` and added `removeRedundantDistinct` to  
`RelBuilder`.

Some users maybe not like  optimized logical plan by `RelBuilder`.

So the default value is false for `redundantDistinct`.

> Remove DISTINCT in aggregate function if field is unique
> 
>
> Key: CALCITE-6214
> URL: https://issues.apache.org/jira/browse/CALCITE-6214
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Jiajun Xie
>Assignee: Jiajun Xie
>Priority: Minor
>  Labels: pull-request-available
>
> For the sql
> {code:java}
> select count(distinct x) cnt
> from(
>select distinct sal x from emp
> ) t  {code}
> The distinct that in count can be removed.
> {code:java}
> LogicalAggregate(group=[{}], CNT=[COUNT($0)])
>   LogicalAggregate(group=[{0}])
> LogicalProject(X=[$5])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> But `CoreRules#AGGREGATE_REMOVE` not support it, so there are two DISTINCT.
> {code:java}
> LogicalAggregate(group=[{}], CNT=[COUNT(DISTINCT $0)])
>   LogicalAggregate(group=[{0}])
> LogicalProject(X=[$5])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6214) Remove DISTINCT in aggregate function if field is unique

2024-01-22 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809646#comment-17809646
 ] 

Julian Hyde commented on CALCITE-6214:
--

I'm not sure what you mean by 'merge aggregates'. If you try to create your 
first plan
{noformat}
LogicalAggregate(group=[{0}], SDS=[SUM(DISTINCT $1)], SS=[SUM($1)])
  LogicalAggregate(group=[{0, 1}]) -- We can optimize it.
LogicalProject(DEPTNO=[$7], SAL=[$5])
  LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{noformat}
using {{RelBuilder}}, it should remove the {{DISTINCT}} keyword and share the 
aggregate functions, so you get the following, which I believe is optimal:
{noformat}
LogicalProject(DEPTNO=[$0], SDS=[$1], SS=[$1])
  LogicalAggregate(group=[{0}], SS=[SUM($1)])
LogicalAggregate(group=[{0, 1}])
  LogicalProject(DEPTNO=[$7], SAL=[$5])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{noformat}


> Remove DISTINCT in aggregate function if field is unique
> 
>
> Key: CALCITE-6214
> URL: https://issues.apache.org/jira/browse/CALCITE-6214
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Jiajun Xie
>Assignee: Jiajun Xie
>Priority: Minor
>  Labels: pull-request-available
>
> For the sql
> {code:java}
> select count(distinct x) cnt
> from(
>select distinct sal x from emp
> ) t  {code}
> The distinct that in count can be removed.
> {code:java}
> LogicalAggregate(group=[{}], CNT=[COUNT($0)])
>   LogicalAggregate(group=[{0}])
> LogicalProject(X=[$5])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> But `CoreRules#AGGREGATE_REMOVE` not support it, so there are two DISTINCT.
> {code:java}
> LogicalAggregate(group=[{}], CNT=[COUNT(DISTINCT $0)])
>   LogicalAggregate(group=[{0}])
> LogicalProject(X=[$5])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6214) Remove DISTINCT in aggregate function if field is unique

2024-01-21 Thread Jiajun Xie (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809214#comment-17809214
 ] 

Jiajun Xie commented on CALCITE-6214:
-

1. Changed `COUNT` to `aggregate function`.
2. RelBuilder not support merge two aggregate that have different groupset. So 
there are two aggregate in plan.
{code:java}
LogicalAggregate(group=[{0}], SDS=[SUM(DISTINCT $1)], SS=[SUM($1)])
  LogicalAggregate(group=[{0, 1}]) -- We can optimize it.
    LogicalProject(DEPTNO=[$7], SAL=[$5])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{code}
RelBuilder yield better plan for the sql that have no aggregate function.
{code:java}
SELECT deptno, sal ss
FROM (
   SELECT DISTINCT deptno, sal
   FROM emp)
GROUP BY deptno, sal 


LogicalProject(DEPTNO=[$0], SS=[$1])
  LogicalProject(DEPTNO=[$1], SAL=[$0])
    LogicalAggregate(group=[{5, 7}]) -- Only one aggregate
      LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
 I need some time to improve it, maybe I will add one commit that via 
RelBuilder on the weekend.

3. Removed backticks.

> Remove DISTINCT in aggregate function if field is unique
> 
>
> Key: CALCITE-6214
> URL: https://issues.apache.org/jira/browse/CALCITE-6214
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Jiajun Xie
>Assignee: Jiajun Xie
>Priority: Minor
>  Labels: pull-request-available
>
> For the sql
> {code:java}
> select count(distinct x) cnt
> from(
>select distinct sal x from emp
> ) t  {code}
> The distinct that in count can be removed.
> {code:java}
> LogicalAggregate(group=[{}], CNT=[COUNT($0)])
>   LogicalAggregate(group=[{0}])
> LogicalProject(X=[$5])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> But `CoreRules#AGGREGATE_REMOVE` not support it, so there are two DISTINCT.
> {code:java}
> LogicalAggregate(group=[{}], CNT=[COUNT(DISTINCT $0)])
>   LogicalAggregate(group=[{0}])
> LogicalProject(X=[$5])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)