[ https://issues.apache.org/jira/browse/FLINK-28120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated FLINK-28120:
-----------------------------------
    Labels: pull-request-available  (was: )

> Meet assert error: BatchPhysicalExchange.BATCH_PHYSICAL has lower cost then best cost of subset :RelSubset#15.BATCH_PHYSICAL.hash[0, 1]true.[]]
> -------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-28120
>                 URL: https://issues.apache.org/jira/browse/FLINK-28120
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Planner
>            Reporter: luoyuxia
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.16.0
>
>         Attachments: 截屏2022-06-18 上午11.48.46.png
>
>
> When I run the following SQL with the Hive dialect:
>
> {code:sql}
> create table src(key string, value string);
> SELECT key, value FROM
> (
>   SELECT key, value FROM src
>   UNION ALL
>   SELECT key, key as value FROM (
>     SELECT distinct key FROM (
>       SELECT key, value FROM (
>         SELECT key, value FROM src
>         UNION ALL
>         SELECT key, value FROM src
>       )t1
>       group by key, value)t2
>   )t3
> )t4
> group by key, value
> {code}
>
> it throws the following exception:
>
> {code:java}
> Caused by: java.lang.AssertionError: rel [rel#1507:BatchPhysicalExchange.BATCH_PHYSICAL.hash[0, 1]true.[](input=RelSubset#999,distribution=hash[key, value])] has lower cost {8.657154570189462E8 rows, 2.9568623376365746E10 cpu, 7.2E9 io, 3.394292742113678E9 network, 4.944093593596532E9 memory} than best cost {8.657154570189462E8 rows, 2.9568623376365746E10 cpu, 7.2E9 io, 3.3942927421136775E9 network, 4.944093593596532E9 memory} of subset [rel#1103:RelSubset#15.BATCH_PHYSICAL.hash[0, 1]true.[]]
> {code}
>
> I then checked the Flink code where it is thrown and found this check:
>
> {code:java}
> if (relCost.isLt(subset.bestCost)) {
>     return litmus.fail("rel [{}] has lower cost {} than "
>             + "best cost {} of subset [{}]",
>         rel, relCost, subset.bestCost, subset);
> }
> {code}
>
> It seems relCost is less than bestCost, so the exception is thrown.
> But relCost is actually greater than the best cost (its network cost is larger), as shown below:
> !截屏2022-06-18 上午11.48.46.png|width=391,height=268!
>
> So the cost comparison logic in Flink appears to be broken.
> Looking at FlinkCost#isLt, it depends on #isLe and #equals. However, #isLe compares normalizeCost while #equals does not, which makes the two inconsistent.
> In this case, relCost and bestCost have the same normalizeCost: although their network costs differ, the difference is lost when the components are combined into a single normalizeCost, which looks like double-precision loss.
> So #isLe returns true, while #equals compares io, network and memory separately and returns false. Then #isLt = #isLe(other) && !#equals(other) evaluates to true, which triggers the exception.
> To fix it, I think we should change #equals so that it is consistent with the comparison used in #isLe.
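A minimal Java sketch of the inconsistency described in the report, under stated assumptions: the `CostDemo.Cost` class, its method names, and the unit weights in `normalize()` are hypothetical simplifications, not Flink's actual `FlinkCost` (which applies its own weighting to io, network and memory). The numbers are illustrative, chosen only to force the rounding: adjacent doubles near 1e17 are 16 apart, so adding a network cost of 1.0 or 1.5 leaves the normalized value unchanged.

```java
// Hypothetical, simplified five-component cost. NOT Flink's FlinkCost:
// the unit weights in normalize() are an assumption for illustration.
final class CostDemo {

    static final class Cost {
        final double rows, cpu, io, network, memory;

        Cost(double rows, double cpu, double io, double network, double memory) {
            this.rows = rows;
            this.cpu = cpu;
            this.io = io;
            this.network = network;
            this.memory = memory;
        }

        // Collapses all components into one double; small components can be
        // rounded away when a large one (here cpu) dominates the sum.
        double normalize() {
            return cpu + io + network + memory;
        }

        // Ordering is decided on the normalized value only.
        boolean isLe(Cost other) {
            return normalize() <= other.normalize();
        }

        // Problematic pattern: equality compares the raw components.
        boolean equalsComponentwise(Cost other) {
            return rows == other.rows && cpu == other.cpu && io == other.io
                && network == other.network && memory == other.memory;
        }

        boolean isLtBuggy(Cost other) {
            return isLe(other) && !equalsComponentwise(other);
        }

        // Proposed fix: equality uses the same normalized value as isLe.
        boolean equalsNormalized(Cost other) {
            return normalize() == other.normalize();
        }

        boolean isLtFixed(Cost other) {
            return isLe(other) && !equalsNormalized(other);
        }
    }

    public static void main(String[] args) {
        // The ulp of 1e17 is 16, so +1.0 and +1.5 both round back to 1e17.
        Cost a = new Cost(1.0, 1e17, 0.0, 1.0, 0.0);
        Cost b = new Cost(1.0, 1e17, 0.0, 1.5, 0.0);
        System.out.println("normalize equal: " + (a.normalize() == b.normalize()));
        System.out.println("buggy a<b: " + a.isLtBuggy(b) + ", b<a: " + b.isLtBuggy(a));
        System.out.println("fixed a<b: " + a.isLtFixed(b) + ", b<a: " + b.isLtFixed(a));
    }
}
```

With component-wise equality, each cost counts as strictly lower than the other, the kind of contradiction that trips the planner's assertion. Deriving equality from the same normalized value, as the report proposes, makes `isLt` false in both directions. The trade-off is that costs with different components but equal normalized values are treated as equal, which is what keeps `equals` consistent with `isLe`.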