[ 
https://issues.apache.org/jira/browse/FLINK-28120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-28120:
-----------------------------------
    Labels: pull-request-available  (was: )

> Meet assert error: BatchPhysicalExchange.BATCH_PHYSICAL has lower cost then  
> best cost  of subset :RelSubset#15.BATCH_PHYSICAL.hash[0, 1]true.[]]
> -------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-28120
>                 URL: https://issues.apache.org/jira/browse/FLINK-28120
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Planner
>            Reporter: luoyuxia
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.16.0
>
>         Attachments: 截屏2022-06-18 上午11.48.46.png
>
>
> When I run the following SQL with the Hive dialect,
>  
> {code:sql}
> create table src(key string, value string);
> SELECT key, value FROM
> (
>   SELECT key, value FROM src
>   UNION ALL
>   SELECT key, key as value FROM ( 
>     SELECT distinct key FROM (
>       SELECT key, value FROM (
>         SELECT key, value FROM src
>         UNION ALL
>         SELECT key, value FROM src
>       )t1 
>     group by key, value)t2
>   )t3
> )t4
> group by key, value {code}
>  
>  
> it throws the following exception:
>  
> {code:java}
> Caused by: java.lang.AssertionError: rel 
> [rel#1507:BatchPhysicalExchange.BATCH_PHYSICAL.hash[0, 
> 1]true.[](input=RelSubset#999,distribution=hash[key, value])] has lower cost 
> {8.657154570189462E8 rows, 2.9568623376365746E10 cpu, 7.2E9 io, 
> 3.394292742113678E9 network, 4.944093593596532E9 memory} than best cost 
> {8.657154570189462E8 rows, 2.9568623376365746E10 cpu, 7.2E9 io, 
> 3.3942927421136775E9 network, 4.944093593596532E9 memory} of subset 
> [rel#1103:RelSubset#15.BATCH_PHYSICAL.hash[0, 1]true.[]] {code}
> I then checked the code where the exception is thrown and found this check:
>  
> {code:java}
> if (relCost.isLt(subset.bestCost)) {
>   return litmus.fail("rel [{}] has lower cost {} than "
>           + "best cost {} of subset [{}]",
>           rel, relCost, subset.bestCost, subset);
> } {code}
> The exception is thrown because relCost is considered less than the best
> cost. But relCost is actually greater than the best cost, as shown below:
> !截屏2022-06-18 上午11.48.46.png|width=391,height=268!
>  
> It seems the cost comparison logic in Flink is broken.
> I then looked at the method #isLt in FlinkCost, which depends on #isLe and
> #equals. However, #isLe uses normalizeCost while #equals does not, which
> causes the inconsistency.
> In this case, the normalized costs of relCost and bestCost are the same.
> Although the network costs differ, the difference disappears once they are
> folded into the normalized cost, which looks like precision loss in double
> arithmetic.
> So #isLe returns true, but #equals compares io, network, and memory
> separately and returns false. As a result, #isLt = #isLe(other) &&
> !#equals(other) evaluates to true, which triggers the exception.
> To fix it, I think we should change the logic of #equals to make it
> consistent with the comparison used in #isLe.
>  
>  
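The inconsistency described above can be reproduced in isolation. The following is a minimal sketch, not Flink's actual FlinkCost: the class and method names (CostComparisonSketch, Cost, fieldWiseEquals, normalizedEquals) are hypothetical, the unit weights in normalizeCost and the exact shape of isLe are assumptions, and the input values are constructed for the demonstration rather than taken from the stack trace. It only illustrates how an isLt built from a normalized isLe and a field-wise equals can report "less than" for a cost that is actually larger, and how making equals use the same normalized comparison removes the contradiction.

```java
// Minimal stand-in for the comparison described in the issue. NOT Flink's FlinkCost:
// the unit weights in normalizeCost and the shape of isLe are assumptions.
final class CostComparisonSketch {

    static final class Cost {
        final double rows, cpu, io, network, memory;

        Cost(double rows, double cpu, double io, double network, double memory) {
            this.rows = rows;
            this.cpu = cpu;
            this.io = io;
            this.network = network;
            this.memory = memory;
        }

        // Assumed normalization: fold memory, network and io into one double.
        double normalizeCost() {
            return memory + network + io;
        }

        // Assumed ordering: cpu first, then normalized cost, then row count.
        boolean isLe(Cost o) {
            double mine = normalizeCost();
            double theirs = o.normalizeCost();
            return cpu < o.cpu
                    || (cpu == o.cpu && mine < theirs)
                    || (cpu == o.cpu && mine == theirs && rows <= o.rows);
        }

        // Equality as described in the issue: io, network and memory compared separately.
        boolean fieldWiseEquals(Cost o) {
            return rows == o.rows && cpu == o.cpu && io == o.io
                    && network == o.network && memory == o.memory;
        }

        // Equality on the same quantities that isLe compares.
        boolean normalizedEquals(Cost o) {
            return rows == o.rows && cpu == o.cpu && normalizeCost() == o.normalizeCost();
        }

        // isLt built from a normalized isLe and a field-wise equals (the reported behavior).
        boolean isLtFieldWise(Cost o) {
            return isLe(o) && !fieldWiseEquals(o);
        }

        // isLt built from a normalized isLe and a normalized equals (the proposed direction).
        boolean isLtNormalized(Cost o) {
            return isLe(o) && !normalizedEquals(o);
        }
    }

    public static void main(String[] args) {
        // Constructed values: io is large enough that a one-ulp difference in
        // network vanishes when the three fields are summed.
        Cost best = new Cost(100.0, 10.0, 1e17, 1.0, 0.0);
        Cost rel = new Cost(100.0, 10.0, 1e17, Math.nextUp(1.0), 0.0);

        System.out.println(rel.network > best.network);                  // true
        System.out.println(rel.normalizeCost() == best.normalizeCost()); // true
        System.out.println(rel.isLtFieldWise(best));                     // true
        System.out.println(rel.isLtNormalized(best));                    // false
    }
}
```

With the field-wise equals, rel is reported as strictly lower than best even though its network cost is larger, which is the situation the assertion trips on. With the normalized equals, the two costs are treated as equal, so isLt is false and the check would pass.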



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
