[ https://issues.apache.org/jira/browse/BEAM-4663?focusedWorklogId=155114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155114 ]

ASF GitHub Bot logged work on BEAM-4663:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Oct/18 20:50
            Start Date: 16/Oct/18 20:50
    Worklog Time Spent: 10m
      Work Description: apilloud commented on issue #6656: [BEAM-4663] [SQL] CBO cost calculation
URL: https://github.com/apache/beam/pull/6656#issuecomment-430395733

   Overriding Calcite's cost functions in Beam SQL isn't going to buy us much until we implement `getStatistic` in BeamCalciteTable instead of using [UNKNOWN](https://github.com/apache/calcite/blob/d59b639d27da704f00eff616324a2c04aa06f84c/core/src/main/java/org/apache/calcite/schema/Statistics.java#L37). Calcite heavily weights RowCount and [it is the only attribute considered](https://github.com/apache/calcite/blob/d59b639d27da704f00eff616324a2c04aa06f84c/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoCost.java#L98) in the initial sort.

   This also drops important internal information in the cost model. The builtin [Aggregate](https://github.com/apache/calcite/blob/d59b639d27da704f00eff616324a2c04aa06f84c/core/src/main/java/org/apache/calcite/rel/core/Aggregate.java#L317) prefers the `$SUM0` operator via the cost model. The builtin [Join](https://github.com/apache/calcite/blob/d59b639d27da704f00eff616324a2c04aa06f84c/core/src/main/java/org/apache/calcite/rel/core/Join.java#L196) takes into account the join condition via the row count estimate. If we are going to do this, we need to extend the builtin cost model rather than overriding it to preserve this.

   I'm also not convinced that the internal model's assumption that dIo = 0 is wrong. (That appears to be the primary difference here.) Outside of Aggregate operators that assumption is effectively true in Dataflow.

   This is an area where we should have tests showing that our model produces better plans than the default.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 155114)
    Time Spent: 1h 50m  (was: 1h 40m)

> Implement Cost calculations for Cost-Based Optimization (CBO)
> --------------------------------------------------------------
>
>                 Key: BEAM-4663
>                 URL: https://issues.apache.org/jira/browse/BEAM-4663
>             Project: Beam
>          Issue Type: Sub-task
>          Components: dsl-sql
>            Reporter: Kai Jiang
>            Assignee: Kai Jiang
>            Priority: Major
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> To support CBO, we should implement methods in each Beam*Rel.java.
> computeSelfCost(...) as our first step.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
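
Editorial illustration of the review comment above: a minimal toy sketch, not Calcite or Beam code. It assumes a cost comparison that looks only at row count, as the comment describes, and a placeholder fallback row count for tables without statistics. The names `Cost`, `scan_cost`, and `ASSUMED_DEFAULT_ROWS` are hypothetical and exist only for this example. It shows why custom CPU or I/O terms cannot change plan choice while row counts are unknown, and what "extending rather than overriding" a base estimate could look like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    """Toy stand-in for a planner cost; not Calcite's VolcanoCost class."""
    rows: float
    cpu: float
    io: float

    def is_le(self, other: "Cost") -> bool:
        # Models the comparison described in the comment: row count only.
        return self.rows <= other.rows

    def plus(self, other: "Cost") -> "Cost":
        # Component-wise sum, so nothing in either operand is discarded.
        return Cost(self.rows + other.rows, self.cpu + other.cpu, self.io + other.io)


# Hypothetical fallback used when a table reports no statistics (an assumption).
ASSUMED_DEFAULT_ROWS = 100.0


def scan_cost(row_count=None, io_per_row=0.0) -> Cost:
    """Cost of scanning a table; falls back to the assumed default row count."""
    rows = ASSUMED_DEFAULT_ROWS if row_count is None else row_count
    return Cost(rows=rows, cpu=rows, io=rows * io_per_row)


# Without statistics, a scan with a large custom I/O cost ties with a cheap one,
# because the comparison never looks at cpu or io.
cheap = scan_cost()
expensive = scan_cost(io_per_row=10.0)
tie = cheap.is_le(expensive) and expensive.is_le(cheap)

# With real row counts, the comparison can tell the two tables apart.
small = scan_cost(row_count=10)
large = scan_cost(row_count=1_000_000)
distinguishes = small.is_le(large) and not large.is_le(small)

# Extending rather than overriding: add a custom I/O term on top of the base
# estimate, so whatever the base cost already encodes is preserved.
base = scan_cost(row_count=10)
extended = base.plus(Cost(rows=0.0, cpu=0.0, io=5.0))

print(tie, distinguishes, extended)
```

In this toy model the custom I/O term only becomes meaningful once the row counts differ, which mirrors the comment's ordering of priorities: supply statistics first, then refine the cost function by adding to the builtin estimate.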