[ https://issues.apache.org/jira/browse/CALCITE-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lai Zhou updated CALCITE-2907: ------------------------------ Description: In my usecase: an Aggregate which contains distinct call was converted improperly to an error relational algebra. {code:java} SELECT user_id, order_id, product_id, count(DISTINCT secured_libs.u51decrypt(phone)) AS contact_count, count(DISTINCT (CASE WHEN is_cell_phone=0 THEN secured_libs.u51decrypt(phone) END)) AS fixedphone_count, count(DISTINCT (CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END)) AS telehone_count, count(DISTINCT substr((CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END),1,3)) AS seg1uv, count(DISTINCT substr((CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END),4,4)) AS seg2uv, count(DISTINCT substr((CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END),8,4)) AS seg3uv, stddev_pop(substr((CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END),1,3)) AS seg1stddev, stddev_pop(substr((CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END),4,4)) AS seg2stddev, stddev_pop(substr((CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END),8,4)) AS seg3stddev, entropy(substr((CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END),1,3)) AS seg1entropy, entropy(substr((CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END),4,4)) AS seg2entropy, entropy(substr((CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END),8,4)) AS seg3entropy FROM dw_risk__mygravitation_v_snap_contacts_contacts GROUP BY user_id, order_id, product_id {code} After digging into the code,I found at the line 444 of the AggregateExpandDistinctAggregatesRule.java : {code:java} int x = groupCount; final List<AggregateCall> newCalls = new ArrayList<>(); for (AggregateCall aggCall : aggregate.getAggCallList()) { final int newFilterArg; final List<Integer> newArgList; final SqlAggFunction aggregation; if (!aggCall.isDistinct()) { aggregation = SqlStdOperatorTable.MIN; newArgList = ImmutableIntList.of(x++); newFilterArg = filters.get(aggregate.getGroupSet()); } else { {code} the undistinct aggregate call `stddev_pop` and `entropy` was converted to a SqlStdOperatorTable.MIN,(actually all undistinct aggregate call here will be converted to a SqlStdOperatorTable.MIN ). I don't understand how it works. I guess someone made a faulty assumption here. [~julianhyde] ,can someone help me ASAP ? It’s very important for my business. was: In my usecase: an Aggregate which contains distinct call was converted improperly to an error relational algebra. {code:java} SELECT user_id, order_id, product_id, count(DISTINCT secured_libs.u51decrypt(phone)) AS contact_count, count(DISTINCT (CASE WHEN is_cell_phone=0 THEN secured_libs.u51decrypt(phone) END)) AS fixedphone_count, count(DISTINCT (CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END)) AS telehone_count, count(DISTINCT substr((CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END),1,3)) AS seg1uv, count(DISTINCT substr((CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END),4,4)) AS seg2uv, count(DISTINCT substr((CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END),8,4)) AS seg3uv, stddev_pop(substr((CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END),1,3)) AS seg1stddev, stddev_pop(substr((CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END),4,4)) AS seg2stddev, stddev_pop(substr((CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END),8,4)) AS seg3stddev, entropy(substr((CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END),1,3)) AS seg1entropy, entropy(substr((CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END),4,4)) AS seg2entropy, entropy(substr((CASE WHEN is_cell_phone=1 THEN secured_libs.u51decrypt(phone) END),8,4)) AS seg3entropy FROM dw_risk__mygravitation_v_snap_contacts_contacts GROUP BY user_id, order_id, product_id {code} After digging into the code,I found at the line 444 of the AggregateExpandDistinctAggregatesRule.java : {code:java} int x = groupCount; final List<AggregateCall> newCalls = new ArrayList<>(); for (AggregateCall aggCall : aggregate.getAggCallList()) { final int newFilterArg; final List<Integer> newArgList; final SqlAggFunction aggregation; if (!aggCall.isDistinct()) { aggregation = SqlStdOperatorTable.MIN; newArgList = ImmutableIntList.of(x++); newFilterArg = filters.get(aggregate.getGroupSet()); } else { {code} the undistinct aggregate call `stddev_pop` and `entropy` was converted to a SqlStdOperatorTable.MIN,(actually all undistinct aggregate call here will be converted to a SqlStdOperatorTable.MIN ). I don't understand how it works. I guess someone make a faulty assumption here. [~julianhyde] ,can someone help me ASAP ? It’s very important for my business. > AggregateExpandDistinctAggregatesRule produces a wrong relational algebra > ------------------------------------------------------------------------- > > Key: CALCITE-2907 > URL: https://issues.apache.org/jira/browse/CALCITE-2907 > Project: Calcite > Issue Type: Bug > Components: core > Affects Versions: 1.18.0 > Reporter: Lai Zhou > Priority: Major > > In my usecase: > an Aggregate which contains distinct call was converted improperly to an > error relational algebra. > > {code:java} > SELECT user_id, > order_id, > product_id, > count(DISTINCT secured_libs.u51decrypt(phone)) AS > contact_count, > count(DISTINCT (CASE > WHEN is_cell_phone=0 THEN > secured_libs.u51decrypt(phone) > END)) AS fixedphone_count, > count(DISTINCT (CASE > WHEN is_cell_phone=1 THEN > secured_libs.u51decrypt(phone) > END)) AS telehone_count, > count(DISTINCT substr((CASE > WHEN is_cell_phone=1 THEN > secured_libs.u51decrypt(phone) > END),1,3)) AS seg1uv, > count(DISTINCT substr((CASE > WHEN is_cell_phone=1 THEN > secured_libs.u51decrypt(phone) > END),4,4)) AS seg2uv, > count(DISTINCT substr((CASE > WHEN is_cell_phone=1 THEN > secured_libs.u51decrypt(phone) > END),8,4)) AS seg3uv, > stddev_pop(substr((CASE > WHEN is_cell_phone=1 THEN > secured_libs.u51decrypt(phone) > END),1,3)) AS seg1stddev, > stddev_pop(substr((CASE > WHEN is_cell_phone=1 THEN > secured_libs.u51decrypt(phone) > END),4,4)) AS seg2stddev, > stddev_pop(substr((CASE > WHEN is_cell_phone=1 THEN > secured_libs.u51decrypt(phone) > END),8,4)) AS seg3stddev, > entropy(substr((CASE > WHEN is_cell_phone=1 THEN > secured_libs.u51decrypt(phone) > END),1,3)) AS seg1entropy, > entropy(substr((CASE > WHEN is_cell_phone=1 THEN > secured_libs.u51decrypt(phone) > END),4,4)) AS seg2entropy, > entropy(substr((CASE > WHEN is_cell_phone=1 THEN > secured_libs.u51decrypt(phone) > END),8,4)) AS seg3entropy > FROM dw_risk__mygravitation_v_snap_contacts_contacts > GROUP BY user_id, > order_id, > product_id > {code} > > > After digging into the code,I found at the line 444 of the > AggregateExpandDistinctAggregatesRule.java : > > {code:java} > int x = groupCount; > final List<AggregateCall> newCalls = new ArrayList<>(); > for (AggregateCall aggCall : aggregate.getAggCallList()) { > final int newFilterArg; > final List<Integer> newArgList; > final SqlAggFunction aggregation; > if (!aggCall.isDistinct()) { > aggregation = SqlStdOperatorTable.MIN; > newArgList = ImmutableIntList.of(x++); > newFilterArg = filters.get(aggregate.getGroupSet()); > } else { > {code} > the undistinct aggregate call `stddev_pop` and `entropy` was converted to a > SqlStdOperatorTable.MIN,(actually all undistinct aggregate call here will be > converted to a SqlStdOperatorTable.MIN ). > I don't understand how it works. > I guess someone made a faulty assumption here. [~julianhyde] ,can someone > help me ASAP ? > It’s very important for my business. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)