[jira] [Commented] (FLINK-19004) Fail to call Hive percentile function together with distinct aggregate call

luoyuxia (Jira) Mon, 05 Sep 2022 02:14:37 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-19004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600285#comment-17600285
 ]


luoyuxia commented on FLINK-19004:
----------------------------------

[~Runking]  Sorry about that. I mistakenly assumed this would rarely happen, so 
I haven't pushed for it to be merged. I think the PR 
[https://github.com/apache/flink/pull/18997] is in good shape. If you need the 
fix urgently, you can apply this patch and build Flink yourself.

The test failure is just a plan assertion failure: the plan has changed because 
we now use `first_value` instead of `min`.
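
To illustrate why swapping `min` for `first_value` avoids the "Min aggregate 
function does not support type: ARRAY" error, here is a minimal standalone 
sketch. It is not Flink planner code: the class and method names are 
hypothetical, and the nulls simply stand in for rows that the outer aggregate 
filters out. It only shows that picking the single surviving value needs no 
ordering, whereas a min-style pick requires a comparable type.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Flink.
final class FirstValueVsMinSketch {

    // MIN-style pick: needs an ordering, so the element type must be Comparable.
    public static <T extends Comparable<T>> T minNonNull(List<T> values) {
        T best = null;
        for (T v : values) {
            if (v != null && (best == null || v.compareTo(best) < 0)) {
                best = v;
            }
        }
        return best;
    }

    // FIRST_VALUE-style pick: needs no ordering, so it works for any type,
    // including arrays such as the result of percentile(y, array(0.5, 0.99)).
    public static <T> T firstNonNull(List<T> values) {
        for (T v : values) {
            if (v != null) {
                return v;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // One group: only one row carries the already-computed percentile array
        // (assumed simplification of the FILTER in the outer aggregate).
        List<double[]> group = Arrays.asList(null, new double[] {3.0, 7.5}, null);
        System.out.println(Arrays.toString(firstNonNull(group))); // prints [3.0, 7.5]
        // minNonNull(group) would not compile: double[] is not Comparable.
    }
}
```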

But note that it may cause a performance regression, since the query will use 
sort aggregation instead of hash aggregation once this patch is applied.

However, once [https://github.com/apache/flink/pull/20130] is finished, the 
performance regression will be fixed.

 

 

> Fail to call Hive percentile function together with distinct aggregate call
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-19004
>                 URL: https://issues.apache.org/jira/browse/FLINK-19004
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hive, Table SQL / Planner
>            Reporter: Rui Li
>            Assignee: luoyuxia
>            Priority: Minor
>              Labels: auto-deprioritized-major, pull-request-available, 
> stale-assigned
>
> The following test case would fail:
> {code}
>       @Test
>       public void test() throws Exception {
>               TableEnvironment tableEnv = getTableEnvWithHiveCatalog();
>               tableEnv.unloadModule("core");
>               tableEnv.loadModule("hive", new HiveModule());
>               tableEnv.loadModule("core", CoreModule.INSTANCE);
>               tableEnv.executeSql("create table src(x int,y int)");
>               tableEnv.executeSql("select count(distinct y),`percentile`(y,`array`(0.5,0.99)) from src group by x").collect();
>       }
> {code}
> The error is:
> {noformat}
> org.apache.flink.table.api.TableException: Cannot generate a valid execution 
> plan for the given query: 
> FlinkLogicalLegacySink(name=[collect], fields=[EXPR$0, EXPR$1])
> +- FlinkLogicalCalc(select=[EXPR$0, EXPR$1])
>    +- FlinkLogicalAggregate(group=[{0}], EXPR$0=[COUNT($1) FILTER $3], 
> EXPR$1=[MIN($2) FILTER $4])
>       +- FlinkLogicalCalc(select=[x, y, EXPR$1, =(CASE(=($e, 0:BIGINT), 
> 0:BIGINT, 1:BIGINT), 0) AS $g_0, =(CASE(=($e, 0:BIGINT), 0:BIGINT, 1:BIGINT), 
> 1) AS $g_1])
>          +- FlinkLogicalAggregate(group=[{0, 1, 3}], EXPR$1=[percentile($4, 
> $2)])
>             +- FlinkLogicalExpand(projects=[x, y, $f2, $e, y_0])
>                +- FlinkLogicalCalc(select=[x, y, array(0.5:DECIMAL(2, 1), 
> 0.99:DECIMAL(3, 2)) AS $f2])
>                   +- FlinkLogicalLegacyTableSourceScan(table=[[test-catalog, 
> default, src, source: [HiveTableSource(x, y) TablePath: default.src, 
> PartitionPruned: false, PartitionNums: null]]], fields=[x, y])
> Min aggregate function does not support type: ''ARRAY''.
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
