[
https://issues.apache.org/jira/browse/FLINK-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Timo Walther resolved FLINK-5394.
---------------------------------
Resolution: Fixed
Fix Version/s: 1.2.0
Fixed in 1.3.0: b1913a4af5ff5794629e66e2f4b0d26054d3fc29
Fixed in 1.2.0: 02b0e65034cb74944aebff33d44470483a5f26e7
> the estimateRowCount method of DataSetCalc didn't work
> ------------------------------------------------------
>
> Key: FLINK-5394
> URL: https://issues.apache.org/jira/browse/FLINK-5394
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
> Reporter: zhangjing
> Assignee: zhangjing
> Fix For: 1.2.0
>
>
> The estimateRowCount method of DataSetCalc currently does not work.
> If I run the following code,
> {code}
> Table table = tableEnv
>     .fromDataSet(data, "a, b, c")
>     .groupBy("a")
>     .select("a, a.avg, b.sum, c.count")
>     .where("a == 1");
> {code}
> the cost of every node in the optimized node tree is:
> {code}
> DataSetAggregate(groupBy=[a], select=[a, AVG(a) AS TMP_0, SUM(b) AS TMP_1,
> COUNT(c) AS TMP_2]): rowcount = 1000.0, cumulative cost = {3000.0 rows,
> 5000.0 cpu, 28000.0 io}
> DataSetCalc(select=[a, b, c], where=[=(a, 1)]): rowcount = 1000.0,
> cumulative cost = {2000.0 rows, 2000.0 cpu, 0.0 io}
> DataSetScan(table=[[_DataSetTable_0]]): rowcount = 1000.0, cumulative
> cost = {1000.0 rows, 1000.0 cpu, 0.0 io}
> {code}
> We expect the input rowcount of DataSetAggregate to be less than 1000, but the
> actual input rowcount is still 1000 because the estimateRowCount method
> of DataSetCalc is never used.
> There are two reasons for this:
> 1. No custom metadata provider is registered yet, so when DataSetAggregate calls
> RelMetadataQuery.getRowCount(DataSetCalc) to estimate its input rowcount,
> the call is dispatched to the default RelMdRowCount.
> 2. DataSetCalc is a subclass of SingleRel, so that call
> matches getRowCount(SingleRel rel, RelMetadataQuery mq), which never uses
> DataSetCalc.estimateRowCount.
> The same problem affects all Flink RelNodes that are subclasses of
> SingleRel.
> I plan to resolve this problem by adding a FlinkRelMdRowCount that provides
> getRowCount implementations specific to Flink RelNodes.
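
The dispatch problem described in the issue can be illustrated with a small, self-contained Java sketch. It does not use Calcite or Flink: Node, SingleInput, Scan, FilteringCalc and Provider are hypothetical stand-ins for RelNode, SingleRel, DataSetScan, DataSetCalc and the metadata provider, and the 0.5 selectivity is an arbitrary illustrative value. The sketch only assumes that the provider picks the handler registered for the most specific matching class, which is the behavior the issue describes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class RowCountDispatchSketch {

    // Hypothetical stand-in for RelNode.
    static abstract class Node {
        abstract double estimateRowCount();
    }

    // Hypothetical stand-in for DataSetScan: a source with 1000 rows.
    static class Scan extends Node {
        double estimateRowCount() { return 1000.0; }
    }

    // Hypothetical stand-in for SingleRel: a node with exactly one input.
    static abstract class SingleInput extends Node {
        final Node input;
        SingleInput(Node input) { this.input = input; }
    }

    // Hypothetical stand-in for DataSetCalc: its own estimate applies a
    // filter selectivity (0.5 is an illustrative value, not Flink's).
    static class FilteringCalc extends SingleInput {
        FilteringCalc(Node input) { super(input); }
        double estimateRowCount() { return input.estimateRowCount() * 0.5; }
    }

    // Simplified metadata provider: walks up the class hierarchy and uses
    // the handler registered for the most specific matching class.
    static class Provider {
        private final Map<Class<?>, Function<Node, Double>> handlers = new HashMap<>();

        <T extends Node> Provider register(Class<T> type, Function<T, Double> handler) {
            handlers.put(type, n -> handler.apply(type.cast(n)));
            return this;
        }

        double getRowCount(Node node) {
            for (Class<?> c = node.getClass(); c != null; c = c.getSuperclass()) {
                Function<Node, Double> h = handlers.get(c);
                if (h != null) {
                    return h.apply(node);
                }
            }
            throw new IllegalStateException("no handler for " + node.getClass());
        }
    }

    // Models the default handlers: the generic one calls estimateRowCount,
    // but the single-input one simply forwards the input's row count.
    static Provider defaultProvider() {
        Provider p = new Provider();
        p.register(Node.class, Node::estimateRowCount);
        p.register(SingleInput.class, s -> p.getRowCount(s.input));
        return p;
    }

    // Models the proposed fix: a more specific handler for the calc node
    // that delegates to the node's own estimate.
    static Provider flinkStyleProvider() {
        Provider p = defaultProvider();
        p.register(FilteringCalc.class, FilteringCalc::estimateRowCount);
        return p;
    }

    public static void main(String[] args) {
        Node plan = new FilteringCalc(new Scan());
        System.out.println(defaultProvider().getRowCount(plan));    // 1000.0
        System.out.println(flinkStyleProvider().getRowCount(plan)); // 500.0
    }
}
```

With only the default handlers, the calc node reports the row count of its input (1000.0), because the single-input handler wins over the generic one. Once a calc-specific handler is registered, in the spirit of the proposed FlinkRelMdRowCount, the node's own estimate (500.0 in this sketch) is used.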