[jira] [Commented] (FLINK-5226) Eagerly project unused attributes

2016-12-14 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15748469#comment-15748469
 ] 

Fabian Hueske commented on FLINK-5226:
--

[~tonycox], costs make sense for streaming operators as well. Otherwise the 
optimizer cannot compare plans (all have the same cost) and will not push 
filters or projections close to the source of a stream.
For cases with a single stream, we can work with a static input cardinality. If 
we have multiple streams, the input cardinality of each stream should somehow 
reflect its rate of incoming events (events / second).
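
A minimal sketch of the idea above, assuming the event rate of a stream is used in place of a row count. The names `StreamStats`, `filterFirst`, and `filterLast` are hypothetical and are not part of Flink or Calcite; the point is only that a rate-based cost lets the optimizer prefer the plan that filters close to the source.

```scala
// Hypothetical sketch: use the event rate of a stream as its "cardinality".
// None of these names come from Flink or Calcite.
object StreamCostSketch {
  // Assumed statistic: expected number of events per second on a stream.
  final case class StreamStats(name: String, eventsPerSecond: Double)

  // Rate of events that survive a filter with the given selectivity (0.0 to 1.0).
  def outputRate(stats: StreamStats, selectivity: Double): Double =
    stats.eventsPerSecond * selectivity

  // Filter at the source: the filter sees every event,
  // the downstream operator only sees the surviving events.
  def filterFirst(stats: StreamStats, selectivity: Double): Double =
    stats.eventsPerSecond + outputRate(stats, selectivity)

  // Filter after the downstream operator: both operators see every event.
  def filterLast(stats: StreamStats): Double =
    stats.eventsPerSecond + stats.eventsPerSecond
}
```

With a constant cost for every operator, both plans would look identical; with a rate-based cost, `filterFirst` is cheaper whenever the selectivity is below 1.0.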


> Eagerly project unused attributes
> -
>
> Key: FLINK-5226
> URL: https://issues.apache.org/jira/browse/FLINK-5226
> Project: Flink
> Issue Type: Improvement
> Components: Table API & SQL
> Affects Versions: 1.2.0
> Reporter: Fabian Hueske
> Assignee: Fabian Hueske
> Fix For: 1.2.0
>
>
> The optimizer currently does not eagerly remove unused attributes. 
> For example, given a table {{tab5}} with five attributes {{a, b, c, d, e}}, 
> the following query
> {code}
> SELECT x.a, y.b FROM tab5 AS x, tab5 AS y WHERE x.a = y.a
> {code}
> would result in the non-optimized plan
> {code}
> LogicalProject(a=[$0], b=[$6])
>   LogicalFilter(condition=[=($0, $5)])
>     LogicalJoin(condition=[true], joinType=[inner])
>       LogicalTableScan(table=[[tab5]])
>       LogicalTableScan(table=[[tab5]])
> {code}
> and the optimized plan:
> {code}
> DataSetCalc(select=[a, b0 AS b])
>   DataSetJoin(where=[=(a, a0)], join=[a, b, c, d, e, a0, b0, c0, d0, e0], joinType=[InnerJoin])
>     DataSetScan(table=[[_DataSetTable_0]])
>     DataSetScan(table=[[_DataSetTable_0]])
> {code}
> This plan is inefficient because it joins all ten attributes of both tables 
> instead of eagerly projecting out all unused fields ({{x.b, x.c, x.d, x.e, 
> y.c, y.d, y.e}}).
> Since this is one of the most common optimizations, I would assume that 
> Calcite provides some rules to extract eager projections. If this is the 
> case, the issue can be solved by adding such rules to {{FlinkRuleSets}}.
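
To make the example above concrete, here is a small sketch that derives which fields each join input actually needs from the field references in the non-optimized plan. The object and method names are hypothetical; this is not how Calcite implements its rules, only the bookkeeping behind the list of unused fields.

```scala
// Hypothetical sketch: which input fields does the example query need?
object UsedFieldsSketch {
  // Split field indexes over the join output into (left, right) input-local indexes.
  // leftWidth is the number of fields of the left join input.
  def split(refs: Set[Int], leftWidth: Int): (Set[Int], Set[Int]) = {
    val (left, right) = refs.partition(_ < leftWidth)
    (left, right.map(_ - leftWidth))
  }

  val projectRefs: Set[Int] = Set(0, 6) // LogicalProject(a=[$0], b=[$6])
  val filterRefs: Set[Int] = Set(0, 5)  // LogicalFilter(condition=[=($0, $5)])

  // tab5 has five attributes, so indexes 0-4 belong to x and 5-9 belong to y.
  val (leftUsed, rightUsed) = split(projectRefs ++ filterRefs, leftWidth = 5)
}
```

Input x only needs field 0 ({{a}}) and input y only needs fields 0 and 1 ({{a, b}}), which leaves exactly the seven unused fields listed in the description.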



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5226) Eagerly project unused attributes

2016-12-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15748414#comment-15748414
 ] 

ASF GitHub Bot commented on FLINK-5226:
---

Github user tonycox commented on the issue:

https://github.com/apache/flink/pull/2926
  
@fhueske does it make sense to create a cost optimization for streams as well?




[jira] [Commented] (FLINK-5226) Eagerly project unused attributes

2016-12-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15733310#comment-15733310
 ] 

ASF GitHub Bot commented on FLINK-5226:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2926




[jira] [Commented] (FLINK-5226) Eagerly project unused attributes

2016-12-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732843#comment-15732843
 ] 

ASF GitHub Bot commented on FLINK-5226:
---

Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/2926
  
Merging




[jira] [Commented] (FLINK-5226) Eagerly project unused attributes

2016-12-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731552#comment-15731552
 ] 

ASF GitHub Bot commented on FLINK-5226:
---

Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/2926
  
Thanks for the reviews @KurtYoung and @beyond1920.

I will merge this PR later today.




[jira] [Commented] (FLINK-5226) Eagerly project unused attributes

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731472#comment-15731472
 ] 

ASF GitHub Bot commented on FLINK-5226:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/2926#discussion_r91460589
  
--- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/plan/nodes/dataset/DataSetCalc.scala ---
@@ -73,7 +75,11 @@ class DataSetCalc(
 
 val child = this.getInput
 val rowCnt = metadata.getRowCount(child)
-val exprCnt = calcProgram.getExprCount
+
+// compute number of non-field access expressions (computations, conditions, etc.)
+//   we only want to account for computations, not for simple projections.
+val exprCnt = calcProgram.getExprList.asScala.toList.count(!_.isInstanceOf[RexInputRef])
--- End diff --

Very good point, thanks!




[jira] [Commented] (FLINK-5226) Eagerly project unused attributes

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731343#comment-15731343
 ] 

ASF GitHub Bot commented on FLINK-5226:
---

Github user beyond1920 commented on a diff in the pull request:

https://github.com/apache/flink/pull/2926#discussion_r91455098
  
--- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/plan/nodes/dataset/DataSetCalc.scala ---
@@ -73,7 +75,11 @@ class DataSetCalc(
 
 val child = this.getInput
 val rowCnt = metadata.getRowCount(child)
-val exprCnt = calcProgram.getExprCount
+
+// compute number of non-field access expressions (computations, conditions, etc.)
+//   we only want to account for computations, not for simple projections.
+val exprCnt = calcProgram.getExprList.asScala.toList.count(!_.isInstanceOf[RexInputRef])
--- End diff --

Maybe we could exclude RexLiteral as well.
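
A sketch of what this suggestion amounts to, using a toy expression model instead of Calcite's classes. `InputRef`, `Literal`, and `Call` are hypothetical stand-ins for `RexInputRef`, `RexLiteral`, and `RexCall`; the real change would pattern match on the Calcite types inside `DataSetCalc`.

```scala
// Hypothetical sketch with a toy expression model; not the Calcite API.
object ExprCountSketch {
  sealed trait Expr
  final case class InputRef(index: Int) extends Expr // stands in for RexInputRef
  final case class Literal(value: Any) extends Expr  // stands in for RexLiteral
  final case class Call(op: String, operands: List[Expr]) extends Expr

  // Count only expressions that require actual computation:
  // plain field accesses and constants are treated as free.
  def computationCount(exprs: List[Expr]): Int =
    exprs.count {
      case _: InputRef | _: Literal => false
      case _                        => true
    }
}
```

With this count, a Calc node that only forwards fields or emits constants gets an expression count of zero, so a pure projection stays cheap in the cost function.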




[jira] [Commented] (FLINK-5226) Eagerly project unused attributes

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730719#comment-15730719
 ] 

ASF GitHub Bot commented on FLINK-5226:
---

Github user KurtYoung commented on the issue:

https://github.com/apache/flink/pull/2926
  
Cool, LGTM




[jira] [Commented] (FLINK-5226) Eagerly project unused attributes

2016-12-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15715683#comment-15715683
 ] 

ASF GitHub Bot commented on FLINK-5226:
---

GitHub user fhueske opened a pull request:

https://github.com/apache/flink/pull/2926

[FLINK-5226] [table] Use correct DataSetCostFactory and improve DataSetCalc costs.

- Improved DataSetCalc costs make projections cheap and help to push them down.
- Adapted existing tests that check optimized plans

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fhueske/flink tableEagerProject

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2926.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2926


commit 11142952016f3777eb3305aead0f83e9271fe736
Author: Fabian Hueske 
Date:   2016-12-02T14:28:16Z

[FLINK-5226] [table] Use correct DataSetCostFactory and improve DataSetCalc costs.

- Improved DataSetCalc costs make projections cheap and help to push them down.






[jira] [Commented] (FLINK-5226) Eagerly project unused attributes

2016-12-02 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15715341#comment-15715341
 ] 

Fabian Hueske commented on FLINK-5226:
--

I found the issue. The planner was not initialized with the correct cost 
factory.
After fixing that and making some modifications to the {{DataSetCalc}} cost 
function, projections are correctly pushed down.

I will open a PR once the tests have passed.



[jira] [Commented] (FLINK-5226) Eagerly project unused attributes

2016-12-02 Thread zhangjing (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15714962#comment-15714962
 ] 

zhangjing commented on FLINK-5226:
--

Hi Fabian, FlinkRuleSets already contains ProjectJoinTransposeRule, which 
pushes projections into the inputs of a join. In the case above, 
ProjectJoinTransposeRule is in fact applied, but VolcanoPlanner does not choose 
the resulting plan as the cheapest one because of the cost model. When we 
worked on the projection pushdown optimization, we changed computeSelfCost of 
BatchScan to take the column count into consideration and found that the plan 
is then chosen as the best one, see 
PushProjectIntoBatchTableSourceScanITCase.testJoinOnScanSql in 
https://github.com/fhueske/flink/commit/a6a40e9b6dee4ab178f1e497c66dbc7e577b67e6.
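
The effect described above can be illustrated with a toy cost function. This is only a sketch under the assumption that cost is proportional to rows times columns; it is not the actual computeSelfCost implementation, the function names are hypothetical, and the cost of the projection operators themselves is ignored.

```scala
// Hypothetical sketch: a column-aware cost distinguishes the two join plans,
// a column-blind cost does not.
object ColumnAwareCostSketch {
  // Toy cost: number of rows times number of columns an operator has to handle.
  def cost(rows: Double, columns: Int): Double = rows * columns

  // Join input cost without pushdown: both sides carry all five columns of tab5.
  def joinWithoutPushdown(rows: Double): Double = cost(rows, 5) + cost(rows, 5)

  // Join input cost with pushdown: x keeps only a, y keeps a and b.
  def joinWithPushdown(rows: Double): Double = cost(rows, 1) + cost(rows, 2)

  // A cost that ignores the column count is the same for both plans.
  def columnBlind(rows: Double): Double = rows + rows
}
```

Because the column-blind cost is identical for both alternatives, the planner has no reason to prefer the plan produced by the transpose rule; once the column count enters the cost, the pushed-down plan becomes the cheaper one.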
