Re: ProjectableFilterableTable.scan() and expensive columns

Julian Hyde Wed, 25 Oct 2017 13:03:17 -0700

By “write a rule” I mean write a class that extends RelOptRule. An example is 
CassandraRules.CassandraFilterRule.


ProjectableFilterableTable was “only” designed for the case that occurs 80% of 
the time but requires 20% of the functionality. Rules run in a richer 
environment so have more power and flexibility. 

Julian


> On Oct 25, 2017, at 5:25 AM, Alexey Roytman <[email protected]> wrote:
> 
> Thank you, Luis Fernando and Julian...
> Here are the tests (extracted from my junit code):
> ---
>     @Test
>     public void test_sst_06_part01() {
>         String sql = "select VARIABLE_COST, AREA, BRAND, CHANNEL, COUNTRY, 
> DISCOUNT, FIXED_COST," +
>                 " LOB, \"MONTH\", PRICE_LEVEL, PRODUCTS, QUARTER, REGION, 
> REVENUE, TYPE," +
>                 " UNITS, YEARS from XSchema.t6p4";
>         String tbl = 
> "VARIABLE_COST\tAREA\tBRAND\tCHANNEL\tCOUNTRY\tDISCOUNT\tFIXED_COST\tLOB\tMONTH\tPRICE_LEVEL\tPRODUCTS\tQUARTER\tREGION\tREVENUE\tTYPE\tUNITS\tYEARS\n"
>  +
> "18.44\tArea1\tBrand1\tStr\tCountry1\t0.0\t21.49\tLob1\tDec\t1.0\tProd1\tQtr 
> 4\tEMEA\t13.32\tType1\t14.0\tCY2014\n" +
> "7.34\tArea1\tBrand2\tStr\tCountry1\t16.24\t68.33\tLob2\tNov\t2.0\tProd2\tQtr 
> 4\tEMEA\t81.67\tType2\t12.0\tCY2014\n" +
> "49.06\tArea1\tBrand1\tStr\tCountry1\t41.93\t16.23\tLob1\tMar\t1.0\tProd1\tQtr
>  1\tEMEA\t16.76\tType1\t21.0\tCY2016\n" +
> "61.07\tArea1\tFunPod\tStr\tCountry1\t23.94\t10.31\tLob3\tMar\t1.0\tProd3\tQtr
>  1\tEMEA\t61.24\tType3\t9.0\tCY2014\n" +
> "4.11\tArea1\tBrand1\tCat\tCountry1\t70.97\t15.84\tLob1\tJan\t2.0\tProd4\tQtr 
> 1\tEMEA\t52.49\tType4\t12.0\tCY2015\n" +
> "19.16\tArea1\tBrand2\tCat\tCountry1\t6.32\t36.81\tLob2\tFeb\t4.0\tProd5\tQtr 
> 1\tEMEA\t23.18\tType5\t10.0\tCY2014\n" +
> "71.93\tArea1\tFunPod\tOnl\tCountry1\t0.0\t15.62\tLob3\tOct\t1.0\tProd6\tQtr 
> 4\tEMEA\t5.61\tType6\t2.0\tCY2014\n" +
> "23.19\tArea1\tFunPod\tOnl\tCountry1\t13.85\t19.13\tLob3\tOct\t1.0\tProd6\tQtr
>  4\tEMEA\t9.73\tType6\t7.0\tCY2015\n";
>         String exp = "PLAN\n" +
>                 "EnumerableInterpreter: rowcount = 100.0, cumulative cost = 
> {50.0 rows, 50.0 cpu, 0.0 io}, id = 30\n" +
>                 "  BindableTableScan(table=[[XSCHEMA, T6P4]], projects=[[0, 
> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]]): rowcount = 100.0, 
> cumulative cost = {1.0 rows, 1.01 cpu, 0.0 io}, id = 14\n" +
>                 "\n";
>         Utils.testeq(conn, explain(sql), explained(tbl, exp));
>     }
> 
>     @Test
>     public void test_sst_06_part02() {
>         String sql = "select T1000002.\"MONTH\" as c1, T1000002.\"QUARTER\" 
> as c2, T1000002.\"REVENUE\" as c3 from (\n" +
>                 " select VARIABLE_COST, AREA, BRAND, CHANNEL, COUNTRY, 
> DISCOUNT, FIXED_COST," +
>                 " LOB, \"MONTH\", PRICE_LEVEL, PRODUCTS, QUARTER, REGION, 
> REVENUE, TYPE," +
>                 " UNITS, YEARS from XSchema.t6p4) T1000002\n" +
>                 " where (T1000002.\"QUARTER\" = 'Qtr 4')";
>         String tbl = "C1\tC2\tC3\n" +
>                 "Dec\tQtr 4\t13.32\n" +
>                 "Nov\tQtr 4\t81.67\n" +
>                 "Oct\tQtr 4\t5.61\n" +
>                 "Oct\tQtr 4\t9.73\n";
>         String exp = "PLAN\n" +
>                 "EnumerableInterpreter: rowcount = 100.0, cumulative cost = 
> {50.0 rows, 50.0 cpu, 0.0 io}, id = 97\n" +
>                 "  BindableTableScan(table=[[XSCHEMA, T6P4]], 
> filters=[[=($11, 'Qtr 4')]], projects=[[8, 11, 13]]): rowcount = 100.0, 
> cumulative cost = {1.0 rows, 1.01 cpu, 0.0 io}, id = 83\n" +
>                 "\n";
>         Utils.testeq(conn, explain(sql), explained(tbl, exp));
>     }
> 
>     @Test
>     public void test_sst_06_part03() {
>         String sql = "select D102.c1 as c1, D102.c2 as c2, sum(D102.c3) as c3 
> from (\n" +
>                 " select T1000002.\"MONTH\" as c1, T1000002.\"QUARTER\" as 
> c2, T1000002.\"REVENUE\" as c3 from (\n" +
>                 " select VARIABLE_COST, AREA, BRAND, CHANNEL, COUNTRY, 
> DISCOUNT, FIXED_COST," +
>                 " LOB, \"MONTH\", PRICE_LEVEL, PRODUCTS, QUARTER, REGION, 
> REVENUE, TYPE," +
>                 " UNITS, YEARS from XSchema.t6p4) T1000002\n" +
>                 " where (T1000002.\"QUARTER\" = 'Qtr 4')\n" +
>                 " ) D102\n" +
>                 " group by D102.c1, D102.c2\n";
>         String tbl = "C1\tC2\tC3\n" +
>                 "Oct\tQtr 4\t15.34\n" +
>                 "Dec\tQtr 4\t13.32\n" +
>                 "Nov\tQtr 4\t81.67\n"; // there is no ORDER BY, so order 
> differs from plsql's
>         String exp = "PLAN\n" +
>                 "EnumerableAggregate(group=[{8, 11}], C3=[$SUM0($13)]): 
> rowcount = 10.0, cumulative cost = {61.25 rows, 50.0 cpu, 0.0 io}, id = 
> 186\n" +
>                 "  EnumerableInterpreter: rowcount = 100.0, cumulative cost = 
> {50.0 rows, 50.0 cpu, 0.0 io}, id = 184\n" +
>                 // TODO: project shall be present here
>                 "    BindableTableScan(table=[[XSCHEMA, T6P4]], 
> filters=[[=($11, 'Qtr 4')]]): rowcount = 100.0, cumulative cost = {1.0 rows, 
> 1.01 cpu, 0.0 io}, id = 148\n" +
>                 "\n";
>         Utils.testeq(conn, explain(sql), explained(tbl, exp));
>     }
> 
>     @Test
>     public void test_sst_06_part04() {
>         String sql = "select D1000001.c1 as c1, sum(D1000001.c3) as c2 from 
> (\n" +
>                 " select D102.c1 as c1, D102.c2 as c2, sum(D102.c3) as c3 
> from (\n" +
>                 " select T1000002.\"MONTH\" as c1, T1000002.\"QUARTER\" as 
> c2, T1000002.\"REVENUE\" as c3 from (\n" +
>                 " select VARIABLE_COST, AREA, BRAND, CHANNEL, COUNTRY, 
> DISCOUNT, FIXED_COST," +
>                 " LOB, \"MONTH\", PRICE_LEVEL, PRODUCTS, QUARTER, REGION, 
> REVENUE, TYPE," +
>                 " UNITS, YEARS from XSchema.t6p4) T1000002\n" +
>                 " where (T1000002.\"QUARTER\" = 'Qtr 4')\n" +
>                 " ) D102\n" +
>                 " group by D102.c1, D102.c2\n" +
>                 " ) D1000001\n" +
>                 " group by D1000001.c1\n";
>         String tbl = "C1\tC2\n" +
>                 "Oct\t15.34\n" +
>                 "Dec\t13.32\n" +
>                 "Nov\t81.67\n"; // there is no ORDER BY, so order differs 
> from plsqls's
>         String exp = "PLAN\n" +
>                 "EnumerableAggregate(group=[{0}], C2=[$SUM0($2)]): rowcount = 
> 1.0, cumulative cost = {62.375 rows, 50.0 cpu, 0.0 io}, id = 307\n" +
>                 "  EnumerableAggregate(group=[{8, 11}], C3=[$SUM0($13)]): 
> rowcount = 10.0, cumulative cost = {61.25 rows, 50.0 cpu, 0.0 io}, id = 
> 305\n" +
>                 "    EnumerableInterpreter: rowcount = 100.0, cumulative cost 
> = {50.0 rows, 50.0 cpu, 0.0 io}, id = 303\n" +
>                 // TODO: project shall be present here
>                 "      BindableTableScan(table=[[XSCHEMA, T6P4]], 
> filters=[[=($11, 'Qtr 4')]]): rowcount = 100.0, cumulative cost = {1.0 rows, 
> 1.01 cpu, 0.0 io}, id = 259\n" +
>                 "\n";
>         Utils.testeq(conn, explain(sql), explained(tbl, exp));
>     }
> ---
> 
> Please notice that each next testcase uses previous one's SQL as a subselect.
> Please notice that in part01 and part02 we have projects!=null at the lowest 
> level (BindableTableScan), but on part03 and part04 we don't have, thus need 
> to calculate and return all columns.
> 
> I understand that ProjectableFilterableTable is a simple interface, but I'd 
> expect it not to query all tables when not needed...
> 
> (When I tried to use example's 
> org.apache.calcite.adapter.csv.CsvSchemaFactory with flavor=TRANSLATABLE, 
> also at the lowest level I did not get both projects and filters...)
> 
> (And yes, the https://issues.apache.org/jira/browse/CALCITE-1876 
> <https://issues.apache.org/jira/browse/CALCITE-1876> looks to be applicable 
> for my case...)
> 
> Julian, when you say "write a rule", what do you mean? Is it more than using 
> TranslatableTable?
> 
> - Alexey.
> 
> On 10/23/2017 06:55 PM, Julian Hyde wrote:
>> I agree with what Luis said.
>> 
>> Also, remember that ProjectableFilterableTable is intended to be a simple 
>> solution for the simple, common cases. If you need something more complex 
>> you may need to write a rule.
>> 
>> Julian
>> 
>>> On Oct 23, 2017, at 4:49 AM, Luis Fernando Kauer 
>>> <[email protected]> wrote:
>>> 
>>> Hi,
>>> Can you give us some examples of the queries you tested? Include the the 
>>> query plan Calcite generated.  (Use EXPLAIN PLAN FOR you query)
>>> Currently, aggregates with no column reference, like count(*), generates a 
>>> plan that scans all projects when using ProjectableFilterableTable. I'm not 
>>> sure it there is a Jira for that already.
>>> The other option is to use TranslatableTable, but be aware that you'll have 
>>> to implement rules to push the necessary projects to the table scan, unlike 
>>> when using ProjectableFilterableTable which has many rules built in Calcite 
>>> for that.
>>> See also:[CALCITE-1876] Create a rule to push the projections used in 
>>> aggregate functions - ASF JIRA
>>> 
>>> Druid adapter calls Druid to execute the table scan. In CSV Adapter it is 
>>> executed by Calcite using Enumerator.
>>> Always check the query plan first and create rules to optimize it the way 
>>> you want.
>>> Enable tracing if you want to check which rules were applied.
>>> 
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__calcite.apache.org_docs_howto.html-23tracing&d=DwIFAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=pDLIINlr5rZagLGo3VNW6xofw209iCAO8VpXxjH1KgY&m=-3fG7DlmYthyhzeTqOIfKXQzVwtgKia_9hvEy7_8wGs&s=Vg9O49VuA22sgdF-3WilTppWSc-_yf2bvfcRgrnpqoY&e=
>>>  
>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__calcite.apache.org_docs_howto.html-23tracing&d=DwIFAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=pDLIINlr5rZagLGo3VNW6xofw209iCAO8VpXxjH1KgY&m=-3fG7DlmYthyhzeTqOIfKXQzVwtgKia_9hvEy7_8wGs&s=Vg9O49VuA22sgdF-3WilTppWSc-_yf2bvfcRgrnpqoY&e=>

Re: ProjectableFilterableTable.scan() and expensive columns

Reply via email to