zhangjing created FLINK-5394:
Summary: the estimateRowCount method of DataSetCalc didn't work
Key: FLINK-5394
URL: https://issues.apache.org/jira/browse/FLINK-5394
Project: Flink
Issue Type: Bug
Components: Table API & SQL
Reporter: zhangjing
Assignee: zhangjing
The estimateRowCount method of DataSetCalc currently has no effect.
If I run the following code,
`
Table table = tableEnv
    .fromDataSet(data, "a, b, c")
    .groupBy("a")
    .select("a, a.avg, b.sum, c.count")
    .where("a == 1");
`
the cost of every node in the optimized node tree is:
`
DataSetAggregate(groupBy=[a], select=[a, AVG(a) AS TMP_0, SUM(b) AS TMP_1, COUNT(c) AS TMP_2]): rowcount = 1000.0, cumulative cost = {3000.0 rows, 5000.0 cpu, 28000.0 io}, id = 28
  DataSetCalc(select=[a, b, c], where=[=(a, 1)]): rowcount = 1000.0, cumulative cost = {2000.0 rows, 2000.0 cpu, 0.0 io}, id = 27
    DataSetScan(table=[[_DataSetTable_0]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 1000.0 cpu, 0.0 io}, id = 20
`
We expect the input rowcount of DataSetAggregate to be less than 1000. However,
the actual input rowcount is still 1000 because the estimateRowCount method of
DataSetCalc is never called.
There are two reasons for this:
1. When DataSetAggregate calls RelMetadataQuery.getRowCount(DataSetCalc) to
estimate its input rowcount, the call is dispatched to RelMdRowCount.
2. DataSetCalc is a subclass of SingleRel, so that call matches
getRowCount(SingleRel rel, RelMetadataQuery mq), which never uses
DataSetCalc.estimateRowCount.
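
The dispatch behaviour described in these two points can be illustrated with a small standalone sketch. All classes below are simplified, hypothetical stand-ins written for this illustration only (they are not Calcite's or Flink's real classes), the selectivity of 0.5 is an arbitrary assumption, and the instanceof check only imitates the "most specific matching type wins" lookup of the metadata handler:

```java
public class RowCountDispatchDemo {
    // Hypothetical stand-in for a planner node; not Calcite's real RelNode.
    interface RelNode {
        double estimateRowCount();
    }

    // Stand-in for a node with exactly one input.
    static abstract class SingleRel implements RelNode {
        final RelNode input;
        SingleRel(RelNode input) { this.input = input; }
    }

    // Leaf node with a fixed row count, like the DataSetScan in the plan above.
    static class Scan implements RelNode {
        @Override public double estimateRowCount() { return 1000.0; }
    }

    // Stand-in for DataSetCalc: its own estimate applies a filter selectivity.
    static class DataSetCalc extends SingleRel {
        final double selectivity; // assumed value, for illustration only
        DataSetCalc(RelNode input, double selectivity) {
            super(input);
            this.selectivity = selectivity;
        }
        @Override public double estimateRowCount() {
            return input.estimateRowCount() * selectivity;
        }
    }

    // Simplified imitation of the metadata handler: a SingleRel is matched
    // first and simply forwards to its input, so the node's own
    // estimateRowCount() is never consulted.
    static double getRowCount(RelNode rel) {
        if (rel instanceof SingleRel) {
            return getRowCount(((SingleRel) rel).input);
        }
        return rel.estimateRowCount();
    }

    public static void main(String[] args) {
        RelNode calc = new DataSetCalc(new Scan(), 0.5);
        System.out.println("via metadata handler: " + getRowCount(calc));
        System.out.println("via node override:    " + calc.estimateRowCount());
    }
}
```

In this sketch the handler reports the unfiltered row count of the scan, while the node's own override would have reported a smaller value, which mirrors the rowcount = 1000.0 shown for DataSetCalc in the plan.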
I plan to resolve this problem by adding a FlinkRelMdRowCount that contains
getRowCount methods specific to Flink RelNodes.
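
A rough sketch of that idea, under stated assumptions: the handler interface, the MetadataQuery class and the two handler constants below are hypothetical and only model a chain in which a Flink-specific handler is consulted before the default one. How FlinkRelMdRowCount would actually be registered with the planner is not shown here, and the selectivity of 0.5 is again arbitrary.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

public class FlinkRowCountSketch {
    // Hypothetical stand-ins; not the real Calcite/Flink types.
    interface RelNode {
        double estimateRowCount(MetadataQuery mq);
    }

    interface RowCountHandler {
        // Returns empty when this handler does not cover the node type.
        OptionalDouble getRowCount(RelNode rel, MetadataQuery mq);
    }

    // Asks the handlers in order; falls back to the node's own estimate.
    static class MetadataQuery {
        private final List<RowCountHandler> handlers;
        MetadataQuery(RowCountHandler... handlers) {
            this.handlers = Arrays.asList(handlers);
        }
        double getRowCount(RelNode rel) {
            for (RowCountHandler h : handlers) {
                OptionalDouble r = h.getRowCount(rel, this);
                if (r.isPresent()) {
                    return r.getAsDouble();
                }
            }
            return rel.estimateRowCount(this);
        }
    }

    static abstract class SingleRel implements RelNode {
        final RelNode input;
        SingleRel(RelNode input) { this.input = input; }
    }

    static class Scan implements RelNode {
        @Override public double estimateRowCount(MetadataQuery mq) { return 1000.0; }
    }

    static class DataSetCalc extends SingleRel {
        final double selectivity; // assumed value, for illustration only
        DataSetCalc(RelNode input, double selectivity) {
            super(input);
            this.selectivity = selectivity;
        }
        @Override public double estimateRowCount(MetadataQuery mq) {
            return mq.getRowCount(input) * selectivity;
        }
    }

    // Imitates the default behaviour: any SingleRel forwards to its input.
    static final RowCountHandler DEFAULT_ROW_COUNT = (rel, mq) ->
        rel instanceof SingleRel
            ? OptionalDouble.of(mq.getRowCount(((SingleRel) rel).input))
            : OptionalDouble.empty();

    // Imitates the proposed Flink-specific handler: use the node's own estimate.
    static final RowCountHandler FLINK_ROW_COUNT = (rel, mq) ->
        rel instanceof DataSetCalc
            ? OptionalDouble.of(rel.estimateRowCount(mq))
            : OptionalDouble.empty();

    public static void main(String[] args) {
        RelNode calc = new DataSetCalc(new Scan(), 0.5);
        MetadataQuery before = new MetadataQuery(DEFAULT_ROW_COUNT);
        MetadataQuery after = new MetadataQuery(FLINK_ROW_COUNT, DEFAULT_ROW_COUNT);
        System.out.println("default handler only: " + before.getRowCount(calc));
        System.out.println("Flink handler first:  " + after.getRowCount(calc));
    }
}
```

Placing the specific handler ahead of the default one keeps the default behaviour for all other node types and changes the estimate only for the nodes that define their own.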