[ https://issues.apache.org/jira/browse/CALCITE-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570729#comment-15570729 ]
Jiarong Wei commented on CALCITE-1428:
--------------------------------------

Hi Julian,

I'm trying to implement LIMIT pushing. I printed logs and noticed that, during {{findBestExp}} of {{VolcanoPlanner}}, the {{DruidSort}} rule was applied, but the plan it produced was not chosen as the cheapest one in the end. I think the cause may be related to the cost model. Could you give some hints or information about this, or about how to implement this pushing?

Thanks!

> Inefficient execution plan of SELECT and LIMIT for Druid
> --------------------------------------------------------
>
>                 Key: CALCITE-1428
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1428
>             Project: Calcite
>          Issue Type: Bug
>          Components: core, druid
>            Reporter: Jiarong Wei
>            Assignee: Julian Hyde
>
> For SQL queries like:
> 1. {{SELECT * FROM <table> LIMIT <row_count>}}
> 2. {{SELECT <all_columns_specified_explicitly> FROM <table> LIMIT <row_count>}}
> {{DruidSortRule}} in the Druid adapter does take effect and {{LIMIT}} is pushed
> into {{DruidQuery}}. However, the corresponding execution plan isn't chosen as
> the best one. Thus Calcite will retrieve all data from Druid and purge all
> unnecessary columns.
> Below are three SQL queries and their corresponding execution plans for
> dataset {{wikiticker}} in the Druid quickstart:
> 1. {{SELECT "cityName" FROM "wikiticker" LIMIT 5}}
> {code}
> rel#27:EnumerableInterpreter.ENUMERABLE.[](input=rel#26:Subset#2.BINDABLE.[])
>   rel#85:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z],projects=[$3],fetch=5)
> {code}
> 2. {{SELECT * FROM "wikiticker" LIMIT 5}}
> {code}
> rel#52:EnumerableLimit.ENUMERABLE.[](input=rel#36:Subset#0.ENUMERABLE.[],fetch=5)
>   rel#79:EnumerableInterpreter.ENUMERABLE.[](input=rel#4:Subset#0.BINDABLE.[])
>     rel#1:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z])
> {code}
> 3. {{SELECT "__time", "added", "channel", "cityName", "comment",
> "commentLength", "count", "countryIsoCode", "countryName", "deleted",
> "delta", "deltaBucket", "diffUrl", "flags", "isAnonymous", "isMinor",
> "isNew", "isRobot", "isUnpatrolled", "metroCode", "namespace", "page",
> "regionIsoCode", "regionName", "user", "user_unique" FROM "wikiticker" LIMIT 5}}
> {code}
> rel#42:EnumerableLimit.ENUMERABLE.[](input=rel#41:Subset#1.ENUMERABLE.[],fetch=5)
>   rel#113:EnumerableInterpreter.ENUMERABLE.[](input=rel#34:Subset#1.BINDABLE.[])
>     rel#52:BindableProject.BINDABLE.[](input=rel#4:Subset#0.BINDABLE.[],__time=$0,added=$1,channel=$2,cityName=$3,comment=$4,commentLength=$5,count=$6,countryIsoCode=$7,countryName=$8,deleted=$9,delta=$10,deltaBucket=$11,diffUrl=$12,flags=$13,isAnonymous=$14,isMinor=$15,isNew=$16,isRobot=$17,isUnpatrolled=$18,metroCode=$19,namespace=$20,page=$21,regionIsoCode=$22,regionName=$23,user=USER,user_unique=$25)
>       rel#1:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z])
> {code}
> Notice that 2 and 3 should have {{LIMIT}} pushed to {{DruidQuery}} like 1
> (and should not have {{EnumerableLimit}}).
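
To make the cost-model hypothesis from the comment concrete, here is a minimal toy sketch in Python. It is not Calcite code and does not reproduce Calcite's actual cost formulas: the classes `Scan` and `Limit`, the functions `rows` and `cost`, and the row count `TABLE_ROWS` are all hypothetical names invented for illustration. It only shows how a cost function that ignores a pushed-down `fetch` leaves the pushed plan barely cheaper than the unpushed one, while a fetch-aware cost function separates the two clearly.

```python
from dataclasses import dataclass
from typing import Optional, Union

TABLE_ROWS = 100_000  # hypothetical table size, chosen only for illustration


@dataclass(frozen=True)
class Scan:
    """Toy stand-in for a source query; fetch is a LIMIT pushed into it."""
    fetch: Optional[int] = None


@dataclass(frozen=True)
class Limit:
    """Toy stand-in for a limit operator running above the source query."""
    input: Scan
    fetch: int


Node = Union[Scan, Limit]


def rows(node: Node, fetch_aware: bool) -> int:
    """Estimated number of rows the node emits."""
    if isinstance(node, Scan):
        if fetch_aware and node.fetch is not None:
            return min(TABLE_ROWS, node.fetch)
        # A fetch-unaware estimate treats every scan as a full scan.
        return TABLE_ROWS
    return min(rows(node.input, fetch_aware), node.fetch)


def cost(node: Node, fetch_aware: bool) -> int:
    """Toy cost: each node is charged for the rows it emits, plus its input."""
    if isinstance(node, Scan):
        return rows(node, fetch_aware)
    return cost(node.input, fetch_aware) + rows(node, fetch_aware)


# Plan shape of queries 2 and 3: limit applied outside the source query.
not_pushed = Limit(Scan(), fetch=5)
# Plan shape of query 1: limit pushed into the source query.
pushed = Scan(fetch=5)

for aware in (False, True):
    gap = cost(not_pushed, aware) - cost(pushed, aware)
    print(f"fetch_aware={aware}: not_pushed={cost(not_pushed, aware)}, "
          f"pushed={cost(pushed, aware)}, gap={gap}")
```

In this toy model the fetch-unaware estimate makes the pushed plan cheaper only by the small cost of the limit operator itself, a margin that any other cost term could outweigh. Whether something like this is what actually happens in {{VolcanoPlanner}} for this issue is an assumption, not something the logs above confirm.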