Jiarong Wei created CALCITE-1428:
------------------------------------

             Summary: Inefficient execution plan of SELECT and LIMIT for Druid
                 Key: CALCITE-1428
                 URL: https://issues.apache.org/jira/browse/CALCITE-1428
             Project: Calcite
          Issue Type: Bug
          Components: core, druid
            Reporter: Jiarong Wei
            Assignee: Julian Hyde

For queries such as:

1. {{SELECT * FROM <table> LIMIT <row_count>}}
2. {{SELECT <all_columns_specified_explicitly> FROM <table> LIMIT <row_count>}}

{{DruidSortRule}} in the Druid adapter does fire, and {{LIMIT}} is pushed into {{DruidQuery}}. However, the resulting plan is not chosen as the best one, so Calcite retrieves all the data from Druid and applies the limit itself, discarding the rest.

Below are three queries and their execution plans, run against the {{wikiticker}} dataset from the Druid quickstart:

1. {{SELECT "cityName" FROM "wikiticker" LIMIT 5}}

{code}
rel#27:EnumerableInterpreter.ENUMERABLE.[](input=rel#26:Subset#2.BINDABLE.[])
  rel#85:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z],projects=[$3],fetch=5)
{code}

2. {{SELECT * FROM "wikiticker" LIMIT 5}}

{code}
rel#52:EnumerableLimit.ENUMERABLE.[](input=rel#36:Subset#0.ENUMERABLE.[],fetch=5)
  rel#79:EnumerableInterpreter.ENUMERABLE.[](input=rel#4:Subset#0.BINDABLE.[])
    rel#1:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z])
{code}

3. {{SELECT "__time", "added", "channel", "cityName", "comment", "commentLength", "count", "countryIsoCode", "countryName", "deleted", "delta", "deltaBucket", "diffUrl", "flags", "isAnonymous", "isMinor", "isNew", "isRobot", "isUnpatrolled", "metroCode", "namespace", "page", "regionIsoCode", "regionName", "user", "user_unique" FROM "wikiticker" LIMIT 5}}

{code}
rel#42:EnumerableLimit.ENUMERABLE.[](input=rel#41:Subset#1.ENUMERABLE.[],fetch=5)
  rel#113:EnumerableInterpreter.ENUMERABLE.[](input=rel#34:Subset#1.BINDABLE.[])
    rel#52:BindableProject.BINDABLE.[](input=rel#4:Subset#0.BINDABLE.[],__time=$0,added=$1,channel=$2,cityName=$3,comment=$4,commentLength=$5,count=$6,countryIsoCode=$7,countryName=$8,deleted=$9,delta=$10,deltaBucket=$11,diffUrl=$12,flags=$13,isAnonymous=$14,isMinor=$15,isNew=$16,isRobot=$17,isUnpatrolled=$18,metroCode=$19,namespace=$20,page=$21,regionIsoCode=$22,regionName=$23,user=USER,user_unique=$25)
      rel#1:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z])
{code}

Notice that 2 and 3 should have {{LIMIT}} pushed into {{DruidQuery}}, like 1, and should not have {{EnumerableLimit}}.
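To make the expected outcome checkable, here is a minimal sketch of a helper that inspects a plan dump like the ones in this report and says whether the limit ended up inside {{DruidQuery}}. It is plain string matching and not part of Calcite: the class name {{LimitPushdownCheck}} and the method {{limitPushedToDruid}} are hypothetical, and the regular expression assumes the plan text looks like the output shown here ({{fetch=<n>}} in the {{DruidQuery}} attribute list, on the same line as the node).

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not a Calcite API): text-level check of a plan dump. */
public class LimitPushdownCheck {

    // Heuristic: a DruidQuery node whose attribute list, on the same line,
    // contains a fetch=<n> entry.
    private static final Pattern DRUID_FETCH =
        Pattern.compile("DruidQuery\\.[^(]*\\([^\\n]*\\bfetch=\\d+");

    /**
     * Returns true if the plan text shows fetch=<n> inside DruidQuery
     * and contains no EnumerableLimit node.
     */
    public static boolean limitPushedToDruid(String plan) {
        boolean druidHasFetch = DRUID_FETCH.matcher(plan).find();
        boolean hasEnumerableLimit = plan.contains("EnumerableLimit");
        return druidHasFetch && !hasEnumerableLimit;
    }

    public static void main(String[] args) {
        // Shortened copies of plans 1 and 2 from this report.
        String pushed =
            "rel#27:EnumerableInterpreter.ENUMERABLE.[](input=rel#26:Subset#2.BINDABLE.[])\n"
            + "  rel#85:DruidQuery.BINDABLE.[](table=[default, wikiticker],projects=[$3],fetch=5)";
        String notPushed =
            "rel#52:EnumerableLimit.ENUMERABLE.[](input=rel#36:Subset#0.ENUMERABLE.[],fetch=5)\n"
            + "  rel#79:EnumerableInterpreter.ENUMERABLE.[](input=rel#4:Subset#0.BINDABLE.[])\n"
            + "    rel#1:DruidQuery.BINDABLE.[](table=[default, wikiticker])";

        System.out.println("plan 1 pushed: " + limitPushedToDruid(pushed));
        System.out.println("plan 2 pushed: " + limitPushedToDruid(notPushed));
    }
}
```

A check like this could back a regression test once the planner picks the pushed-down plan for queries 2 and 3. Matching on plan text is brittle, so treat it as an illustration of the expected result, not as the way the fix should be verified.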