[ https://issues.apache.org/jira/browse/FLINK-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15772984#comment-15772984 ]
ASF GitHub Bot commented on FLINK-3850:
---------------------------------------

GitHub user NickolayVasilishin opened a pull request:

    https://github.com/apache/flink/pull/3040

    [FLINK-3850] Add forward field annotations to DataSet

    Add forward field annotations to DataSet operators generated by the Table API

    - Added field forwarding to most `DataSetRel` implementations.
    - The string of forwarded fields is now allowed to be empty in `SemanticPropUtil.java`.
    - The wrapper for type-based indices was moved to the object `FieldForwardingUtils`.
    - In most cases, forwarding is done only for the conversion.

    `BatchScan`: forwarding at conversion
    `DataSetAggregate`: forwarding at conversion
    `DataSetCalc`: forwarding based on operands that `RexCalls` leave unmodified
    `DataSetCorrelate`: forwarding based on operands that `RexCalls` leave unmodified
    `DataSetIntersect`: forwarding at conversion
    `DataSetJoin`: forwarding based on fields which are not keys
    `DataSetMinus`: forwarding at conversion
    `DataSetSingleRowJoin`: all fields of the multi-row dataset are forwarded; the single row is used via broadcast
    `DataSetSort`: all fields forwarded + conversion

    I hope I've understood the meaning of forward fields correctly: fields that are not used for computations. So I assumed that these fields are not used in `RexCalls` or as `join keys`. I also forwarded fields in type conversions.

    The most complex part was determining the correct input and output field names.
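For readers unfamiliar with the annotation format, here is a minimal sketch of how a forwarded-fields string could be derived for a projection-like operator such as `DataSetCalc`. It is not the code from this pull request. The class `ForwardedFieldsSketch`, the method `forwardedFields`, and the `COMPUTED` marker are hypothetical names, and the sketch assumes tuple-style field names (`f0`, `f1`, ...). It relies only on the DataSet API's documented syntax, in which a forwarded field is one copied unmodified to the output, written `fN` when it keeps its position and `fN->fM` when it moves, with entries separated by semicolons.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink or of this pull request.
final class ForwardedFieldsSketch {

    /** Marks an output position that holds a computed value, not a copied field. */
    static final int COMPUTED = -1;

    /**
     * Builds a forwarded-fields string in the "fN" / "fN->fM" syntax.
     *
     * outputSources[i] is the index of the input field that is copied
     * unmodified into output position i, or COMPUTED if output position i
     * is the result of an expression.
     */
    static String forwardedFields(int[] outputSources) {
        List<String> parts = new ArrayList<>();
        for (int out = 0; out < outputSources.length; out++) {
            int in = outputSources[out];
            if (in == COMPUTED) {
                continue; // computed values are never forwarded
            }
            // Same position: "f0"; moved to another position: "f1->f2".
            parts.add(in == out ? "f" + in : "f" + in + "->f" + out);
        }
        return String.join(";", parts);
    }

    public static void main(String[] args) {
        // Output 0 copies input 0, output 1 is computed, output 2 copies input 1.
        System.out.println(forwardedFields(new int[] {0, COMPUTED, 1}));
    }
}
```

When every output position is computed, the helper returns an empty string, which is the case the change to `SemanticPropUtil.java` is described as allowing.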
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NickolayVasilishin/flink FLINK-3850

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3040.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3040

----
commit 25cc1f022eb399bade37ef7b0fd0b87a9e509d67
Author: nikolay_vasilishin <nikolay_vasilis...@epam.com>
Date:   2016-12-23T10:50:46Z

    [FLINK-3850] Add forward field annotations to DataSet operators generated by the Table API

    - Added field forwarding at most of DataSetRel implementations.
    - String with forwarded fields allowed to be empty at SemanticPropUtil.java
    - Wrapper for indices based on types moved to object class FieldForwardingUtils
    - In most cases forwarding done only for conversion

    BatchScan: forwarding at conversion
    DataSetAggregate: forwarding at conversion
    DataSetCalc: forwarding based on unmodified at RexCalls operands
    DataSetCorrelate: forwarding based on unmodified at RexCalls operands
    DataSetIntersect: forwarding at conversion
    DataSetJoin: forwarding based on fields which are not keys
    DataSetMinus: forwarding at conversion
    DataSetSingleRowJoin: forwarded all fields from multi row dataset, single row used via broadcast
    DataSetSort: all fields forwarded + conversion

    Conflicts:
        flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/BatchScan.scala
        flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetAggregate.scala
        flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetCalc.scala
        flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetCorrelate.scala
        flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetIntersect.scala
        flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetJoin.scala
        flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetMinus.scala
        flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetSingleRowJoin.scala
        flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetSort.scala

----

> Add forward field annotations to DataSet operators generated by the Table API
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-3850
>                 URL: https://issues.apache.org/jira/browse/FLINK-3850
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>            Reporter: Fabian Hueske
>            Assignee: Nikolay Vasilishin
>
> The DataSet API features semantic annotations [1] to hint the optimizer which
> input fields an operator copies. This information is valuable for the
> optimizer because it can infer that certain physical properties such as
> partitioning or sorting are not destroyed by user functions and thus generate
> more efficient execution plans.
> The Table API is built on top of the DataSet API and generates DataSet
> programs and code for user-defined functions. Hence, it knows exactly which
> fields are modified and which not. We should use this information to
> automatically generate forward field annotations and attach them to the
> operators. This can help to significantly improve the performance of certain
> jobs.
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/batch/index.html#semantic-annotations

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)