[ 
https://issues.apache.org/jira/browse/CALCITE-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lai Zhou updated CALCITE-2907:
------------------------------
    Description: 
In my usecase:

an Aggregate which contains distinct call was converted improperly to an error  
relational algebra.

 
{code:java}
SELECT user_id,
                       order_id,
                       product_id,
                       count(DISTINCT secured_libs.u51decrypt(phone)) AS 
contact_count,
                       count(DISTINCT (CASE
                                           WHEN is_cell_phone=0 THEN 
secured_libs.u51decrypt(phone)
                                       END)) AS fixedphone_count,
                       count(DISTINCT (CASE
                                           WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                       END)) AS telehone_count,
                       count(DISTINCT substr((CASE
                                                  WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                              END),1,3)) AS seg1uv,
              count(DISTINCT substr((CASE
                                         WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                     END),4,4)) AS seg2uv,
              count(DISTINCT substr((CASE
                                         WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                     END),8,4)) AS seg3uv,
              stddev_pop(substr((CASE
                                     WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                 END),1,3)) AS seg1stddev,
              stddev_pop(substr((CASE
                                     WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                 END),4,4)) AS seg2stddev,
              stddev_pop(substr((CASE
                                     WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                 END),8,4)) AS seg3stddev,
              entropy(substr((CASE
                                        WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                    END),1,3)) AS seg1entropy,
              entropy(substr((CASE
                                        WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                    END),4,4)) AS seg2entropy,
              entropy(substr((CASE
                                        WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                    END),8,4)) AS seg3entropy

                FROM dw_risk__mygravitation_v_snap_contacts_contacts
                GROUP BY user_id,
                         order_id,
                         product_id
{code}
 

 

After digging into the code,I found at the line 444 of the 
AggregateExpandDistinctAggregatesRule.java :

 
{code:java}
int x = groupCount;
final List<AggregateCall> newCalls = new ArrayList<>();
for (AggregateCall aggCall : aggregate.getAggCallList()) {
  final int newFilterArg;
  final List<Integer> newArgList;
  final SqlAggFunction aggregation;
  if (!aggCall.isDistinct()) {
    aggregation = SqlStdOperatorTable.MIN;
    newArgList = ImmutableIntList.of(x++);
    newFilterArg = filters.get(aggregate.getGroupSet());
  } else {
{code}
the undistinct aggregate call `stddev_pop` and `entropy` was converted to a 

 SqlStdOperatorTable.MIN,(actually all undistinct aggregate call here will be 
converted to a SqlStdOperatorTable.MIN ).

I don't understand how it works.

I guess someone made a faulty assumption here. [~julianhyde] ,can someone help 
me ASAP ?

It’s very important for my business.

 

 

  was:
In my usecase:

an Aggregate which contains distinct call was converted improperly to an error  
relational algebra.

 
{code:java}
SELECT user_id,
                       order_id,
                       product_id,
                       count(DISTINCT secured_libs.u51decrypt(phone)) AS 
contact_count,
                       count(DISTINCT (CASE
                                           WHEN is_cell_phone=0 THEN 
secured_libs.u51decrypt(phone)
                                       END)) AS fixedphone_count,
                       count(DISTINCT (CASE
                                           WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                       END)) AS telehone_count,
                       count(DISTINCT substr((CASE
                                                  WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                              END),1,3)) AS seg1uv,
              count(DISTINCT substr((CASE
                                         WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                     END),4,4)) AS seg2uv,
              count(DISTINCT substr((CASE
                                         WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                     END),8,4)) AS seg3uv,
              stddev_pop(substr((CASE
                                     WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                 END),1,3)) AS seg1stddev,
              stddev_pop(substr((CASE
                                     WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                 END),4,4)) AS seg2stddev,
              stddev_pop(substr((CASE
                                     WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                 END),8,4)) AS seg3stddev,
              entropy(substr((CASE
                                        WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                    END),1,3)) AS seg1entropy,
              entropy(substr((CASE
                                        WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                    END),4,4)) AS seg2entropy,
              entropy(substr((CASE
                                        WHEN is_cell_phone=1 THEN 
secured_libs.u51decrypt(phone)
                                    END),8,4)) AS seg3entropy

                FROM dw_risk__mygravitation_v_snap_contacts_contacts
                GROUP BY user_id,
                         order_id,
                         product_id
{code}
 

 

After digging into the code,I found at the line 444 of the 
AggregateExpandDistinctAggregatesRule.java :

 
{code:java}
int x = groupCount;
final List<AggregateCall> newCalls = new ArrayList<>();
for (AggregateCall aggCall : aggregate.getAggCallList()) {
  final int newFilterArg;
  final List<Integer> newArgList;
  final SqlAggFunction aggregation;
  if (!aggCall.isDistinct()) {
    aggregation = SqlStdOperatorTable.MIN;
    newArgList = ImmutableIntList.of(x++);
    newFilterArg = filters.get(aggregate.getGroupSet());
  } else {
{code}
the undistinct aggregate call `stddev_pop` and `entropy` was converted to a 

 SqlStdOperatorTable.MIN,(actually all undistinct aggregate call here will be 
converted to a SqlStdOperatorTable.MIN ).

I don't understand how it works.

I guess someone make a faulty assumption here. [~julianhyde] ,can someone help 
me ASAP ?

It’s very important for my business.

 

 


> AggregateExpandDistinctAggregatesRule produces a wrong relational algebra
> -------------------------------------------------------------------------
>
>                 Key: CALCITE-2907
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2907
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.18.0
>            Reporter: Lai Zhou
>            Priority: Major
>
> In my usecase:
> an Aggregate which contains distinct call was converted improperly to an 
> error  relational algebra.
>  
> {code:java}
> SELECT user_id,
>                        order_id,
>                        product_id,
>                        count(DISTINCT secured_libs.u51decrypt(phone)) AS 
> contact_count,
>                        count(DISTINCT (CASE
>                                            WHEN is_cell_phone=0 THEN 
> secured_libs.u51decrypt(phone)
>                                        END)) AS fixedphone_count,
>                        count(DISTINCT (CASE
>                                            WHEN is_cell_phone=1 THEN 
> secured_libs.u51decrypt(phone)
>                                        END)) AS telehone_count,
>                        count(DISTINCT substr((CASE
>                                                   WHEN is_cell_phone=1 THEN 
> secured_libs.u51decrypt(phone)
>                                               END),1,3)) AS seg1uv,
>               count(DISTINCT substr((CASE
>                                          WHEN is_cell_phone=1 THEN 
> secured_libs.u51decrypt(phone)
>                                      END),4,4)) AS seg2uv,
>               count(DISTINCT substr((CASE
>                                          WHEN is_cell_phone=1 THEN 
> secured_libs.u51decrypt(phone)
>                                      END),8,4)) AS seg3uv,
>               stddev_pop(substr((CASE
>                                      WHEN is_cell_phone=1 THEN 
> secured_libs.u51decrypt(phone)
>                                  END),1,3)) AS seg1stddev,
>               stddev_pop(substr((CASE
>                                      WHEN is_cell_phone=1 THEN 
> secured_libs.u51decrypt(phone)
>                                  END),4,4)) AS seg2stddev,
>               stddev_pop(substr((CASE
>                                      WHEN is_cell_phone=1 THEN 
> secured_libs.u51decrypt(phone)
>                                  END),8,4)) AS seg3stddev,
>               entropy(substr((CASE
>                                         WHEN is_cell_phone=1 THEN 
> secured_libs.u51decrypt(phone)
>                                     END),1,3)) AS seg1entropy,
>               entropy(substr((CASE
>                                         WHEN is_cell_phone=1 THEN 
> secured_libs.u51decrypt(phone)
>                                     END),4,4)) AS seg2entropy,
>               entropy(substr((CASE
>                                         WHEN is_cell_phone=1 THEN 
> secured_libs.u51decrypt(phone)
>                                     END),8,4)) AS seg3entropy
>                 FROM dw_risk__mygravitation_v_snap_contacts_contacts
>                 GROUP BY user_id,
>                          order_id,
>                          product_id
> {code}
>  
>  
> After digging into the code,I found at the line 444 of the 
> AggregateExpandDistinctAggregatesRule.java :
>  
> {code:java}
> int x = groupCount;
> final List<AggregateCall> newCalls = new ArrayList<>();
> for (AggregateCall aggCall : aggregate.getAggCallList()) {
>   final int newFilterArg;
>   final List<Integer> newArgList;
>   final SqlAggFunction aggregation;
>   if (!aggCall.isDistinct()) {
>     aggregation = SqlStdOperatorTable.MIN;
>     newArgList = ImmutableIntList.of(x++);
>     newFilterArg = filters.get(aggregate.getGroupSet());
>   } else {
> {code}
> the undistinct aggregate call `stddev_pop` and `entropy` was converted to a 
>  SqlStdOperatorTable.MIN,(actually all undistinct aggregate call here will be 
> converted to a SqlStdOperatorTable.MIN ).
> I don't understand how it works.
> I guess someone made a faulty assumption here. [~julianhyde] ,can someone 
> help me ASAP ?
> It’s very important for my business.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to