Thank you, Luis Fernando and Julian...
Here are the tests (extracted from my junit code):
---
@Test
public void test_sst_06_part01() {
String sql = "select VARIABLE_COST, AREA, BRAND, CHANNEL,
COUNTRY, DISCOUNT, FIXED_COST," +
" LOB, \"MONTH\", PRICE_LEVEL, PRODUCTS, QUARTER,
REGION, REVENUE, TYPE," +
" UNITS, YEARS from XSchema.t6p4";
String tbl =
"VARIABLE_COST\tAREA\tBRAND\tCHANNEL\tCOUNTRY\tDISCOUNT\tFIXED_COST\tLOB\tMONTH\tPRICE_LEVEL\tPRODUCTS\tQUARTER\tREGION\tREVENUE\tTYPE\tUNITS\tYEARS\n"
+
"18.44\tArea1\tBrand1\tStr\tCountry1\t0.0\t21.49\tLob1\tDec\t1.0\tProd1\tQtr
4\tEMEA\t13.32\tType1\t14.0\tCY2014\n" +
"7.34\tArea1\tBrand2\tStr\tCountry1\t16.24\t68.33\tLob2\tNov\t2.0\tProd2\tQtr
4\tEMEA\t81.67\tType2\t12.0\tCY2014\n" +
"49.06\tArea1\tBrand1\tStr\tCountry1\t41.93\t16.23\tLob1\tMar\t1.0\tProd1\tQtr
1\tEMEA\t16.76\tType1\t21.0\tCY2016\n" +
"61.07\tArea1\tFunPod\tStr\tCountry1\t23.94\t10.31\tLob3\tMar\t1.0\tProd3\tQtr
1\tEMEA\t61.24\tType3\t9.0\tCY2014\n" +
"4.11\tArea1\tBrand1\tCat\tCountry1\t70.97\t15.84\tLob1\tJan\t2.0\tProd4\tQtr
1\tEMEA\t52.49\tType4\t12.0\tCY2015\n" +
"19.16\tArea1\tBrand2\tCat\tCountry1\t6.32\t36.81\tLob2\tFeb\t4.0\tProd5\tQtr
1\tEMEA\t23.18\tType5\t10.0\tCY2014\n" +
"71.93\tArea1\tFunPod\tOnl\tCountry1\t0.0\t15.62\tLob3\tOct\t1.0\tProd6\tQtr
4\tEMEA\t5.61\tType6\t2.0\tCY2014\n" +
"23.19\tArea1\tFunPod\tOnl\tCountry1\t13.85\t19.13\tLob3\tOct\t1.0\tProd6\tQtr
4\tEMEA\t9.73\tType6\t7.0\tCY2015\n";
String exp = "PLAN\n" +
"EnumerableInterpreter: rowcount = 100.0, cumulative
cost = {50.0 rows, 50.0 cpu, 0.0 io}, id = 30\n" +
" BindableTableScan(table=[[XSCHEMA, T6P4]],
projects=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]]):
rowcount = 100.0, cumulative cost = {1.0 rows, 1.01 cpu, 0.0 io}, id =
14\n" +
"\n";
Utils.testeq(conn, explain(sql), explained(tbl, exp));
}
@Test
public void test_sst_06_part02() {
String sql = "select T1000002.\"MONTH\" as c1,
T1000002.\"QUARTER\" as c2, T1000002.\"REVENUE\" as c3 from (\n" +
" select VARIABLE_COST, AREA, BRAND, CHANNEL, COUNTRY,
DISCOUNT, FIXED_COST," +
" LOB, \"MONTH\", PRICE_LEVEL, PRODUCTS, QUARTER,
REGION, REVENUE, TYPE," +
" UNITS, YEARS from XSchema.t6p4) T1000002\n" +
" where (T1000002.\"QUARTER\" = 'Qtr 4')";
String tbl = "C1\tC2\tC3\n" +
"Dec\tQtr 4\t13.32\n" +
"Nov\tQtr 4\t81.67\n" +
"Oct\tQtr 4\t5.61\n" +
"Oct\tQtr 4\t9.73\n";
String exp = "PLAN\n" +
"EnumerableInterpreter: rowcount = 100.0, cumulative
cost = {50.0 rows, 50.0 cpu, 0.0 io}, id = 97\n" +
" BindableTableScan(table=[[XSCHEMA, T6P4]],
filters=[[=($11, 'Qtr 4')]], projects=[[8, 11, 13]]): rowcount = 100.0,
cumulative cost = {1.0 rows, 1.01 cpu, 0.0 io}, id = 83\n" +
"\n";
Utils.testeq(conn, explain(sql), explained(tbl, exp));
}
@Test
public void test_sst_06_part03() {
String sql = "select D102.c1 as c1, D102.c2 as c2, sum(D102.c3)
as c3 from (\n" +
" select T1000002.\"MONTH\" as c1, T1000002.\"QUARTER\"
as c2, T1000002.\"REVENUE\" as c3 from (\n" +
" select VARIABLE_COST, AREA, BRAND, CHANNEL, COUNTRY,
DISCOUNT, FIXED_COST," +
" LOB, \"MONTH\", PRICE_LEVEL, PRODUCTS, QUARTER,
REGION, REVENUE, TYPE," +
" UNITS, YEARS from XSchema.t6p4) T1000002\n" +
" where (T1000002.\"QUARTER\" = 'Qtr 4')\n" +
" ) D102\n" +
" group by D102.c1, D102.c2\n";
String tbl = "C1\tC2\tC3\n" +
"Oct\tQtr 4\t15.34\n" +
"Dec\tQtr 4\t13.32\n" +
"Nov\tQtr 4\t81.67\n"; // there is no ORDER BY, so
order differs from plsql's
String exp = "PLAN\n" +
"EnumerableAggregate(group=[{8, 11}], C3=[$SUM0($13)]):
rowcount = 10.0, cumulative cost = {61.25 rows, 50.0 cpu, 0.0 io}, id =
186\n" +
" EnumerableInterpreter: rowcount = 100.0, cumulative
cost = {50.0 rows, 50.0 cpu, 0.0 io}, id = 184\n" +
// TODO: project shall be present here
" BindableTableScan(table=[[XSCHEMA, T6P4]],
filters=[[=($11, 'Qtr 4')]]): rowcount = 100.0, cumulative cost = {1.0
rows, 1.01 cpu, 0.0 io}, id = 148\n" +
"\n";
Utils.testeq(conn, explain(sql), explained(tbl, exp));
}
@Test
public void test_sst_06_part04() {
String sql = "select D1000001.c1 as c1, sum(D1000001.c3) as c2
from (\n" +
" select D102.c1 as c1, D102.c2 as c2, sum(D102.c3) as
c3 from (\n" +
" select T1000002.\"MONTH\" as c1, T1000002.\"QUARTER\"
as c2, T1000002.\"REVENUE\" as c3 from (\n" +
" select VARIABLE_COST, AREA, BRAND, CHANNEL, COUNTRY,
DISCOUNT, FIXED_COST," +
" LOB, \"MONTH\", PRICE_LEVEL, PRODUCTS, QUARTER,
REGION, REVENUE, TYPE," +
" UNITS, YEARS from XSchema.t6p4) T1000002\n" +
" where (T1000002.\"QUARTER\" = 'Qtr 4')\n" +
" ) D102\n" +
" group by D102.c1, D102.c2\n" +
" ) D1000001\n" +
" group by D1000001.c1\n";
String tbl = "C1\tC2\n" +
"Oct\t15.34\n" +
"Dec\t13.32\n" +
"Nov\t81.67\n"; // there is no ORDER BY, so order
differs from plsqls's
String exp = "PLAN\n" +
"EnumerableAggregate(group=[{0}], C2=[$SUM0($2)]):
rowcount = 1.0, cumulative cost = {62.375 rows, 50.0 cpu, 0.0 io}, id =
307\n" +
" EnumerableAggregate(group=[{8, 11}],
C3=[$SUM0($13)]): rowcount = 10.0, cumulative cost = {61.25 rows, 50.0
cpu, 0.0 io}, id = 305\n" +
" EnumerableInterpreter: rowcount = 100.0,
cumulative cost = {50.0 rows, 50.0 cpu, 0.0 io}, id = 303\n" +
// TODO: project shall be present here
" BindableTableScan(table=[[XSCHEMA, T6P4]],
filters=[[=($11, 'Qtr 4')]]): rowcount = 100.0, cumulative cost = {1.0
rows, 1.01 cpu, 0.0 io}, id = 259\n" +
"\n";
Utils.testeq(conn, explain(sql), explained(tbl, exp));
}
---
Please notice that each next testcase uses previous one's SQL as a
subselect.
Please notice that in part01 and part02 we have projects!=null at the
lowest level (BindableTableScan), but on part03 and part04 we don't
have, thus need to calculate and return all columns.
I understand that ProjectableFilterableTable is a simple interface, but
I'd expect it not to query all tables when not needed...
(When I tried to use example's
org.apache.calcite.adapter.csv.CsvSchemaFactory with
flavor=TRANSLATABLE, also at the lowest level I did not get both
projects and filters...)
(And yes, the https://issues.apache.org/jira/browse/CALCITE-1876 looks
to be applicable for my case...)
Julian, when you say "write a rule", what do you mean? Is it more than
using TranslatableTable?
- Alexey.
On 10/23/2017 06:55 PM, Julian Hyde wrote:
I agree with what Luis said.
Also, remember that ProjectableFilterableTable is intended to be a simple
solution for the simple, common cases. If you need something more complex you
may need to write a rule.
Julian
On Oct 23, 2017, at 4:49 AM, Luis Fernando Kauer <[email protected]>
wrote:
Hi,
Can you give us some examples of the queries you tested? Include the the query
plan Calcite generated. (Use EXPLAIN PLAN FOR you query)
Currently, aggregates with no column reference, like count(*), generates a plan
that scans all projects when using ProjectableFilterableTable. I'm not sure it
there is a Jira for that already.
The other option is to use TranslatableTable, but be aware that you'll have to
implement rules to push the necessary projects to the table scan, unlike when
using ProjectableFilterableTable which has many rules built in Calcite for that.
See also:[CALCITE-1876] Create a rule to push the projections used in aggregate
functions - ASF JIRA
Druid adapter calls Druid to execute the table scan. In CSV Adapter it is
executed by Calcite using Enumerator.
Always check the query plan first and create rules to optimize it the way you
want.
Enable tracing if you want to check which rules were applied.
https://urldefense.proofpoint.com/v2/url?u=https-3A__calcite.apache.org_docs_howto.html-23tracing&d=DwIFAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=pDLIINlr5rZagLGo3VNW6xofw209iCAO8VpXxjH1KgY&m=-3fG7DlmYthyhzeTqOIfKXQzVwtgKia_9hvEy7_8wGs&s=Vg9O49VuA22sgdF-3WilTppWSc-_yf2bvfcRgrnpqoY&e=