By “write a rule” I mean write a class that extends RelOptRule. An example is CassandraRules.CassandraFilterRule.
ProjectableFilterableTable was “only” designed for the case that occurs 80% of the time but requires 20% of the functionality. Rules run in a richer environment so have more power and flexibility. Julian > On Oct 25, 2017, at 5:25 AM, Alexey Roytman <[email protected]> wrote: > > Thank you, Luis Fernando and Julian... > Here are the tests (extracted from my junit code): > --- > @Test > public void test_sst_06_part01() { > String sql = "select VARIABLE_COST, AREA, BRAND, CHANNEL, COUNTRY, > DISCOUNT, FIXED_COST," + > " LOB, \"MONTH\", PRICE_LEVEL, PRODUCTS, QUARTER, REGION, > REVENUE, TYPE," + > " UNITS, YEARS from XSchema.t6p4"; > String tbl = > "VARIABLE_COST\tAREA\tBRAND\tCHANNEL\tCOUNTRY\tDISCOUNT\tFIXED_COST\tLOB\tMONTH\tPRICE_LEVEL\tPRODUCTS\tQUARTER\tREGION\tREVENUE\tTYPE\tUNITS\tYEARS\n" > + > "18.44\tArea1\tBrand1\tStr\tCountry1\t0.0\t21.49\tLob1\tDec\t1.0\tProd1\tQtr > 4\tEMEA\t13.32\tType1\t14.0\tCY2014\n" + > "7.34\tArea1\tBrand2\tStr\tCountry1\t16.24\t68.33\tLob2\tNov\t2.0\tProd2\tQtr > 4\tEMEA\t81.67\tType2\t12.0\tCY2014\n" + > "49.06\tArea1\tBrand1\tStr\tCountry1\t41.93\t16.23\tLob1\tMar\t1.0\tProd1\tQtr > 1\tEMEA\t16.76\tType1\t21.0\tCY2016\n" + > "61.07\tArea1\tFunPod\tStr\tCountry1\t23.94\t10.31\tLob3\tMar\t1.0\tProd3\tQtr > 1\tEMEA\t61.24\tType3\t9.0\tCY2014\n" + > "4.11\tArea1\tBrand1\tCat\tCountry1\t70.97\t15.84\tLob1\tJan\t2.0\tProd4\tQtr > 1\tEMEA\t52.49\tType4\t12.0\tCY2015\n" + > "19.16\tArea1\tBrand2\tCat\tCountry1\t6.32\t36.81\tLob2\tFeb\t4.0\tProd5\tQtr > 1\tEMEA\t23.18\tType5\t10.0\tCY2014\n" + > "71.93\tArea1\tFunPod\tOnl\tCountry1\t0.0\t15.62\tLob3\tOct\t1.0\tProd6\tQtr > 4\tEMEA\t5.61\tType6\t2.0\tCY2014\n" + > "23.19\tArea1\tFunPod\tOnl\tCountry1\t13.85\t19.13\tLob3\tOct\t1.0\tProd6\tQtr > 4\tEMEA\t9.73\tType6\t7.0\tCY2015\n"; > String exp = "PLAN\n" + > "EnumerableInterpreter: rowcount = 100.0, cumulative cost = > {50.0 rows, 50.0 cpu, 0.0 io}, id = 30\n" + > " BindableTableScan(table=[[XSCHEMA, T6P4]], projects=[[0, > 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]]): rowcount = 100.0, > cumulative cost = {1.0 rows, 1.01 cpu, 0.0 io}, id = 14\n" + > "\n"; > Utils.testeq(conn, explain(sql), explained(tbl, exp)); > } > > @Test > public void test_sst_06_part02() { > String sql = "select T1000002.\"MONTH\" as c1, T1000002.\"QUARTER\" > as c2, T1000002.\"REVENUE\" as c3 from (\n" + > " select VARIABLE_COST, AREA, BRAND, CHANNEL, COUNTRY, > DISCOUNT, FIXED_COST," + > " LOB, \"MONTH\", PRICE_LEVEL, PRODUCTS, QUARTER, REGION, > REVENUE, TYPE," + > " UNITS, YEARS from XSchema.t6p4) T1000002\n" + > " where (T1000002.\"QUARTER\" = 'Qtr 4')"; > String tbl = "C1\tC2\tC3\n" + > "Dec\tQtr 4\t13.32\n" + > "Nov\tQtr 4\t81.67\n" + > "Oct\tQtr 4\t5.61\n" + > "Oct\tQtr 4\t9.73\n"; > String exp = "PLAN\n" + > "EnumerableInterpreter: rowcount = 100.0, cumulative cost = > {50.0 rows, 50.0 cpu, 0.0 io}, id = 97\n" + > " BindableTableScan(table=[[XSCHEMA, T6P4]], > filters=[[=($11, 'Qtr 4')]], projects=[[8, 11, 13]]): rowcount = 100.0, > cumulative cost = {1.0 rows, 1.01 cpu, 0.0 io}, id = 83\n" + > "\n"; > Utils.testeq(conn, explain(sql), explained(tbl, exp)); > } > > @Test > public void test_sst_06_part03() { > String sql = "select D102.c1 as c1, D102.c2 as c2, sum(D102.c3) as c3 > from (\n" + > " select T1000002.\"MONTH\" as c1, T1000002.\"QUARTER\" as > c2, T1000002.\"REVENUE\" as c3 from (\n" + > " select VARIABLE_COST, AREA, BRAND, CHANNEL, COUNTRY, > DISCOUNT, FIXED_COST," + > " LOB, \"MONTH\", PRICE_LEVEL, PRODUCTS, QUARTER, REGION, > REVENUE, TYPE," + > " UNITS, YEARS from XSchema.t6p4) T1000002\n" + > " where (T1000002.\"QUARTER\" = 'Qtr 4')\n" + > " ) D102\n" + > " group by D102.c1, D102.c2\n"; > String tbl = "C1\tC2\tC3\n" + > "Oct\tQtr 4\t15.34\n" + > "Dec\tQtr 4\t13.32\n" + > "Nov\tQtr 4\t81.67\n"; // there is no ORDER BY, so order > differs from plsql's > String exp = "PLAN\n" + > "EnumerableAggregate(group=[{8, 11}], C3=[$SUM0($13)]): > rowcount = 10.0, cumulative cost = {61.25 rows, 50.0 cpu, 0.0 io}, id = > 186\n" + > " EnumerableInterpreter: rowcount = 100.0, cumulative cost = > {50.0 rows, 50.0 cpu, 0.0 io}, id = 184\n" + > // TODO: project shall be present here > " BindableTableScan(table=[[XSCHEMA, T6P4]], > filters=[[=($11, 'Qtr 4')]]): rowcount = 100.0, cumulative cost = {1.0 rows, > 1.01 cpu, 0.0 io}, id = 148\n" + > "\n"; > Utils.testeq(conn, explain(sql), explained(tbl, exp)); > } > > @Test > public void test_sst_06_part04() { > String sql = "select D1000001.c1 as c1, sum(D1000001.c3) as c2 from > (\n" + > " select D102.c1 as c1, D102.c2 as c2, sum(D102.c3) as c3 > from (\n" + > " select T1000002.\"MONTH\" as c1, T1000002.\"QUARTER\" as > c2, T1000002.\"REVENUE\" as c3 from (\n" + > " select VARIABLE_COST, AREA, BRAND, CHANNEL, COUNTRY, > DISCOUNT, FIXED_COST," + > " LOB, \"MONTH\", PRICE_LEVEL, PRODUCTS, QUARTER, REGION, > REVENUE, TYPE," + > " UNITS, YEARS from XSchema.t6p4) T1000002\n" + > " where (T1000002.\"QUARTER\" = 'Qtr 4')\n" + > " ) D102\n" + > " group by D102.c1, D102.c2\n" + > " ) D1000001\n" + > " group by D1000001.c1\n"; > String tbl = "C1\tC2\n" + > "Oct\t15.34\n" + > "Dec\t13.32\n" + > "Nov\t81.67\n"; // there is no ORDER BY, so order differs > from plsqls's > String exp = "PLAN\n" + > "EnumerableAggregate(group=[{0}], C2=[$SUM0($2)]): rowcount = > 1.0, cumulative cost = {62.375 rows, 50.0 cpu, 0.0 io}, id = 307\n" + > " EnumerableAggregate(group=[{8, 11}], C3=[$SUM0($13)]): > rowcount = 10.0, cumulative cost = {61.25 rows, 50.0 cpu, 0.0 io}, id = > 305\n" + > " EnumerableInterpreter: rowcount = 100.0, cumulative cost > = {50.0 rows, 50.0 cpu, 0.0 io}, id = 303\n" + > // TODO: project shall be present here > " BindableTableScan(table=[[XSCHEMA, T6P4]], > filters=[[=($11, 'Qtr 4')]]): rowcount = 100.0, cumulative cost = {1.0 rows, > 1.01 cpu, 0.0 io}, id = 259\n" + > "\n"; > Utils.testeq(conn, explain(sql), explained(tbl, exp)); > } > --- > > Please notice that each next testcase uses previous one's SQL as a subselect. > Please notice that in part01 and part02 we have projects!=null at the lowest > level (BindableTableScan), but on part03 and part04 we don't have, thus need > to calculate and return all columns. > > I understand that ProjectableFilterableTable is a simple interface, but I'd > expect it not to query all tables when not needed... > > (When I tried to use example's > org.apache.calcite.adapter.csv.CsvSchemaFactory with flavor=TRANSLATABLE, > also at the lowest level I did not get both projects and filters...) > > (And yes, the https://issues.apache.org/jira/browse/CALCITE-1876 > <https://issues.apache.org/jira/browse/CALCITE-1876> looks to be applicable > for my case...) > > Julian, when you say "write a rule", what do you mean? Is it more than using > TranslatableTable? > > - Alexey. > > On 10/23/2017 06:55 PM, Julian Hyde wrote: >> I agree with what Luis said. >> >> Also, remember that ProjectableFilterableTable is intended to be a simple >> solution for the simple, common cases. If you need something more complex >> you may need to write a rule. >> >> Julian >> >>> On Oct 23, 2017, at 4:49 AM, Luis Fernando Kauer >>> <[email protected]> wrote: >>> >>> Hi, >>> Can you give us some examples of the queries you tested? Include the the >>> query plan Calcite generated. (Use EXPLAIN PLAN FOR you query) >>> Currently, aggregates with no column reference, like count(*), generates a >>> plan that scans all projects when using ProjectableFilterableTable. I'm not >>> sure it there is a Jira for that already. >>> The other option is to use TranslatableTable, but be aware that you'll have >>> to implement rules to push the necessary projects to the table scan, unlike >>> when using ProjectableFilterableTable which has many rules built in Calcite >>> for that. >>> See also:[CALCITE-1876] Create a rule to push the projections used in >>> aggregate functions - ASF JIRA >>> >>> Druid adapter calls Druid to execute the table scan. In CSV Adapter it is >>> executed by Calcite using Enumerator. >>> Always check the query plan first and create rules to optimize it the way >>> you want. >>> Enable tracing if you want to check which rules were applied. >>> >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__calcite.apache.org_docs_howto.html-23tracing&d=DwIFAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=pDLIINlr5rZagLGo3VNW6xofw209iCAO8VpXxjH1KgY&m=-3fG7DlmYthyhzeTqOIfKXQzVwtgKia_9hvEy7_8wGs&s=Vg9O49VuA22sgdF-3WilTppWSc-_yf2bvfcRgrnpqoY&e= >>> >>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__calcite.apache.org_docs_howto.html-23tracing&d=DwIFAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=pDLIINlr5rZagLGo3VNW6xofw209iCAO8VpXxjH1KgY&m=-3fG7DlmYthyhzeTqOIfKXQzVwtgKia_9hvEy7_8wGs&s=Vg9O49VuA22sgdF-3WilTppWSc-_yf2bvfcRgrnpqoY&e=>
