Re: ProjectableFilterableTable.scan() and expensive columns

Alexey Roytman Wed, 25 Oct 2017 05:26:42 -0700

Thank you, Luis Fernando and Julian...
Here are the tests (extracted from my junit code):
---
    @Test
    public void test_sst_06_part01() {

String sql = "select VARIABLE_COST, AREA, BRAND, CHANNEL,COUNTRY, DISCOUNT, FIXED_COST," + " LOB, \"MONTH\", PRICE_LEVEL, PRODUCTS, QUARTER,REGION, REVENUE, TYPE," +

                " UNITS, YEARS from XSchema.t6p4";

String tbl ="VARIABLE_COST\tAREA\tBRAND\tCHANNEL\tCOUNTRY\tDISCOUNT\tFIXED_COST\tLOB\tMONTH\tPRICE_LEVEL\tPRODUCTS\tQUARTER\tREGION\tREVENUE\tTYPE\tUNITS\tYEARS\n"+"18.44\tArea1\tBrand1\tStr\tCountry1\t0.0\t21.49\tLob1\tDec\t1.0\tProd1\tQtr4\tEMEA\t13.32\tType1\t14.0\tCY2014\n" +"7.34\tArea1\tBrand2\tStr\tCountry1\t16.24\t68.33\tLob2\tNov\t2.0\tProd2\tQtr4\tEMEA\t81.67\tType2\t12.0\tCY2014\n" +"49.06\tArea1\tBrand1\tStr\tCountry1\t41.93\t16.23\tLob1\tMar\t1.0\tProd1\tQtr1\tEMEA\t16.76\tType1\t21.0\tCY2016\n" +"61.07\tArea1\tFunPod\tStr\tCountry1\t23.94\t10.31\tLob3\tMar\t1.0\tProd3\tQtr1\tEMEA\t61.24\tType3\t9.0\tCY2014\n" +"4.11\tArea1\tBrand1\tCat\tCountry1\t70.97\t15.84\tLob1\tJan\t2.0\tProd4\tQtr1\tEMEA\t52.49\tType4\t12.0\tCY2015\n" +"19.16\tArea1\tBrand2\tCat\tCountry1\t6.32\t36.81\tLob2\tFeb\t4.0\tProd5\tQtr1\tEMEA\t23.18\tType5\t10.0\tCY2014\n" +"71.93\tArea1\tFunPod\tOnl\tCountry1\t0.0\t15.62\tLob3\tOct\t1.0\tProd6\tQtr4\tEMEA\t5.61\tType6\t2.0\tCY2014\n" +"23.19\tArea1\tFunPod\tOnl\tCountry1\t13.85\t19.13\tLob3\tOct\t1.0\tProd6\tQtr4\tEMEA\t9.73\tType6\t7.0\tCY2015\n";

        String exp = "PLAN\n" +

"EnumerableInterpreter: rowcount = 100.0, cumulativecost = {50.0 rows, 50.0 cpu, 0.0 io}, id = 30\n" + " BindableTableScan(table=[[XSCHEMA, T6P4]],projects=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]]):rowcount = 100.0, cumulative cost = {1.0 rows, 1.01 cpu, 0.0 io}, id =14\n" +

                "\n";
        Utils.testeq(conn, explain(sql), explained(tbl, exp));
    }


    @Test
    public void test_sst_06_part02() {

String sql = "select T1000002.\"MONTH\" as c1,T1000002.\"QUARTER\" as c2, T1000002.\"REVENUE\" as c3 from (\n" + " select VARIABLE_COST, AREA, BRAND, CHANNEL, COUNTRY,DISCOUNT, FIXED_COST," + " LOB, \"MONTH\", PRICE_LEVEL, PRODUCTS, QUARTER,REGION, REVENUE, TYPE," +

                " UNITS, YEARS from XSchema.t6p4) T1000002\n" +
                " where (T1000002.\"QUARTER\" = 'Qtr 4')";
        String tbl = "C1\tC2\tC3\n" +
                "Dec\tQtr 4\t13.32\n" +
                "Nov\tQtr 4\t81.67\n" +
                "Oct\tQtr 4\t5.61\n" +
                "Oct\tQtr 4\t9.73\n";
        String exp = "PLAN\n" +

"EnumerableInterpreter: rowcount = 100.0, cumulativecost = {50.0 rows, 50.0 cpu, 0.0 io}, id = 97\n" + " BindableTableScan(table=[[XSCHEMA, T6P4]],filters=[[=($11, 'Qtr 4')]], projects=[[8, 11, 13]]): rowcount = 100.0,cumulative cost = {1.0 rows, 1.01 cpu, 0.0 io}, id = 83\n" +

                "\n";
        Utils.testeq(conn, explain(sql), explained(tbl, exp));
    }

    @Test
    public void test_sst_06_part03() {

String sql = "select D102.c1 as c1, D102.c2 as c2, sum(D102.c3)as c3 from (\n" + " select T1000002.\"MONTH\" as c1, T1000002.\"QUARTER\"as c2, T1000002.\"REVENUE\" as c3 from (\n" + " select VARIABLE_COST, AREA, BRAND, CHANNEL, COUNTRY,DISCOUNT, FIXED_COST," + " LOB, \"MONTH\", PRICE_LEVEL, PRODUCTS, QUARTER,REGION, REVENUE, TYPE," +

                " UNITS, YEARS from XSchema.t6p4) T1000002\n" +
                " where (T1000002.\"QUARTER\" = 'Qtr 4')\n" +
                " ) D102\n" +
                " group by D102.c1, D102.c2\n";
        String tbl = "C1\tC2\tC3\n" +
                "Oct\tQtr 4\t15.34\n" +
                "Dec\tQtr 4\t13.32\n" +

"Nov\tQtr 4\t81.67\n"; // there is no ORDER BY, soorder differs from plsql's

        String exp = "PLAN\n" +

"EnumerableAggregate(group=[{8, 11}], C3=[$SUM0($13)]):rowcount = 10.0, cumulative cost = {61.25 rows, 50.0 cpu, 0.0 io}, id =186\n" + " EnumerableInterpreter: rowcount = 100.0, cumulativecost = {50.0 rows, 50.0 cpu, 0.0 io}, id = 184\n" +

                // TODO: project shall be present here

" BindableTableScan(table=[[XSCHEMA, T6P4]],filters=[[=($11, 'Qtr 4')]]): rowcount = 100.0, cumulative cost = {1.0rows, 1.01 cpu, 0.0 io}, id = 148\n" +

                "\n";
        Utils.testeq(conn, explain(sql), explained(tbl, exp));
    }

    @Test
    public void test_sst_06_part04() {

String sql = "select D1000001.c1 as c1, sum(D1000001.c3) as c2from (\n" + " select D102.c1 as c1, D102.c2 as c2, sum(D102.c3) asc3 from (\n" + " select T1000002.\"MONTH\" as c1, T1000002.\"QUARTER\"as c2, T1000002.\"REVENUE\" as c3 from (\n" + " select VARIABLE_COST, AREA, BRAND, CHANNEL, COUNTRY,DISCOUNT, FIXED_COST," + " LOB, \"MONTH\", PRICE_LEVEL, PRODUCTS, QUARTER,REGION, REVENUE, TYPE," +

                " UNITS, YEARS from XSchema.t6p4) T1000002\n" +
                " where (T1000002.\"QUARTER\" = 'Qtr 4')\n" +
                " ) D102\n" +
                " group by D102.c1, D102.c2\n" +
                " ) D1000001\n" +
                " group by D1000001.c1\n";
        String tbl = "C1\tC2\n" +
                "Oct\t15.34\n" +
                "Dec\t13.32\n" +

"Nov\t81.67\n"; // there is no ORDER BY, so orderdiffers from plsqls's

        String exp = "PLAN\n" +

"EnumerableAggregate(group=[{0}], C2=[$SUM0($2)]):rowcount = 1.0, cumulative cost = {62.375 rows, 50.0 cpu, 0.0 io}, id =307\n" + " EnumerableAggregate(group=[{8, 11}],C3=[$SUM0($13)]): rowcount = 10.0, cumulative cost = {61.25 rows, 50.0cpu, 0.0 io}, id = 305\n" + " EnumerableInterpreter: rowcount = 100.0,cumulative cost = {50.0 rows, 50.0 cpu, 0.0 io}, id = 303\n" +

                // TODO: project shall be present here

" BindableTableScan(table=[[XSCHEMA, T6P4]],filters=[[=($11, 'Qtr 4')]]): rowcount = 100.0, cumulative cost = {1.0rows, 1.01 cpu, 0.0 io}, id = 259\n" +

                "\n";
        Utils.testeq(conn, explain(sql), explained(tbl, exp));
    }
---

Please notice that each next testcase uses previous one's SQL as asubselect.Please notice that in part01 and part02 we have projects!=null at thelowest level (BindableTableScan), but on part03 and part04 we don'thave, thus need to calculate and return all columns.

I understand that ProjectableFilterableTable is a simple interface, butI'd expect it not to query all tables when not needed...

(When I tried to use example'sorg.apache.calcite.adapter.csv.CsvSchemaFactory withflavor=TRANSLATABLE, also at the lowest level I did not get bothprojects and filters...)

(And yes, the https://issues.apache.org/jira/browse/CALCITE-1876 looksto be applicable for my case...)

Julian, when you say "write a rule", what do you mean? Is it more thanusing TranslatableTable?


- Alexey.

On 10/23/2017 06:55 PM, Julian Hyde wrote:

I agree with what Luis said.

Also, remember that ProjectableFilterableTable is intended to be a simple 
solution for the simple, common cases. If you need something more complex you 
may need to write a rule.

Julian

On Oct 23, 2017, at 4:49 AM, Luis Fernando Kauer <[email protected]> 
wrote:

Hi,
Can you give us some examples of the queries you tested? Include the the query 
plan Calcite generated.  (Use EXPLAIN PLAN FOR you query)
Currently, aggregates with no column reference, like count(*), generates a plan 
that scans all projects when using ProjectableFilterableTable. I'm not sure it 
there is a Jira for that already.
The other option is to use TranslatableTable, but be aware that you'll have to 
implement rules to push the necessary projects to the table scan, unlike when 
using ProjectableFilterableTable which has many rules built in Calcite for that.
See also:[CALCITE-1876] Create a rule to push the projections used in aggregate 
functions - ASF JIRA

Druid adapter calls Druid to execute the table scan. In CSV Adapter it is 
executed by Calcite using Enumerator.
Always check the query plan first and create rules to optimize it the way you 
want.
Enable tracing if you want to check which rules were applied.

https://urldefense.proofpoint.com/v2/url?u=https-3A__calcite.apache.org_docs_howto.html-23tracing&d=DwIFAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=pDLIINlr5rZagLGo3VNW6xofw209iCAO8VpXxjH1KgY&m=-3fG7DlmYthyhzeTqOIfKXQzVwtgKia_9hvEy7_8wGs&s=Vg9O49VuA22sgdF-3WilTppWSc-_yf2bvfcRgrnpqoY&e=

Re: ProjectableFilterableTable.scan() and expensive columns

Reply via email to