[ 
https://issues.apache.org/jira/browse/BEAM-4663?focusedWorklogId=155114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155114
 ]

ASF GitHub Bot logged work on BEAM-4663:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Oct/18 20:50
            Start Date: 16/Oct/18 20:50
    Worklog Time Spent: 10m 
      Work Description: apilloud commented on issue #6656: [BEAM-4663] [SQL] 
CBO cost calculation
URL: https://github.com/apache/beam/pull/6656#issuecomment-430395733
 
 
   Overriding Calcite's cost functions in Beam SQL isn't going to buy us much 
until we implement `getStatistic` in BeamCalciteTable instead of using 
[UNKNOWN](https://github.com/apache/calcite/blob/d59b639d27da704f00eff616324a2c04aa06f84c/core/src/main/java/org/apache/calcite/schema/Statistics.java#L37).
 Calcite heavily weights RowCount and [it is the only attribute 
considered](https://github.com/apache/calcite/blob/d59b639d27da704f00eff616324a2c04aa06f84c/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoCost.java#L98)
 in the initial sort.
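
   To make the point above concrete, here is a minimal, self-contained sketch. It does not use Calcite: `ToyCost`, `indistinguishable`, and `ASSUMED_UNKNOWN_ROWS` are hypothetical names invented for illustration, and the placeholder row count is an assumption. It only models a cost comparison that looks at row count alone, which is how the linked code is described.

```java
/** Toy illustration only; these are not Calcite classes. */
class RowCountOnlyDemo {

    /** Hypothetical stand-in for a planner cost with three components. */
    static final class ToyCost {
        final double rowCount;
        final double cpu;
        final double io;

        ToyCost(double rowCount, double cpu, double io) {
            this.rowCount = rowCount;
            this.cpu = cpu;
            this.io = io;
        }

        /** Compares on rowCount alone; cpu and io are ignored. */
        boolean isLe(ToyCost other) {
            return this == other || this.rowCount <= other.rowCount;
        }
    }

    /** Assumed placeholder row count for a table that reports no statistics. */
    static final double ASSUMED_UNKNOWN_ROWS = 100.0;

    /** True when neither cost is strictly cheaper than the other. */
    static boolean indistinguishable(ToyCost a, ToyCost b) {
        return a.isLe(b) && b.isLe(a);
    }

    public static void main(String[] args) {
        // Same placeholder row count, very different cpu/io estimates.
        ToyCost lightIo = new ToyCost(ASSUMED_UNKNOWN_ROWS, 10.0, 0.0);
        ToyCost heavyIo = new ToyCost(ASSUMED_UNKNOWN_ROWS, 10.0, 5000.0);
        System.out.println("tie without statistics: "
            + indistinguishable(lightIo, heavyIo));

        // Once a real row count is available, the comparison becomes decisive.
        ToyCost withStats = new ToyCost(20.0, 10.0, 5000.0);
        System.out.println("decisive with statistics: "
            + (withStats.isLe(lightIo) && !lightIo.isLe(withStats)));
    }
}
```

   Under these assumptions, tuning the cpu or io terms changes nothing while every table reports the same placeholder row count; supplying real row counts is what lets the comparison separate the plans.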
   
   Overriding the cost functions also drops important information that is 
built into Calcite's cost model. The built-in 
[Aggregate](https://github.com/apache/calcite/blob/d59b639d27da704f00eff616324a2c04aa06f84c/core/src/main/java/org/apache/calcite/rel/core/Aggregate.java#L317)
 prefers the `$SUM0` operator via the cost model. The built-in 
[Join](https://github.com/apache/calcite/blob/d59b639d27da704f00eff616324a2c04aa06f84c/core/src/main/java/org/apache/calcite/rel/core/Join.java#L196)
 takes into account the join condition via the row count estimate. If we are 
going to do this, we need to extend the built-in cost model rather than 
override it, so that this information is preserved.
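
   The difference between extending and overriding can be sketched with a toy model. This is not Calcite or Beam code: `ToyAggregate`, `OverridingAggregate`, and `ExtendingAggregate` are hypothetical classes, and the weights are illustrative. The base class gives `SUM` a slightly higher cost than `$SUM0`; the sketch shows that a subclass which replaces the cost loses that preference, while one that adds to the inherited cost keeps it.

```java
import java.util.Arrays;
import java.util.List;

/** Toy illustration only; these are not Calcite or Beam classes. */
class ExtendCostDemo {

    /** Hypothetical base node whose cost slightly penalizes SUM over $SUM0. */
    static class ToyAggregate {
        final double rowCount;
        final List<String> aggCalls;

        ToyAggregate(double rowCount, List<String> aggCalls) {
            this.rowCount = rowCount;
            this.aggCalls = aggCalls;
        }

        double selfCost() {
            // More aggregate calls cost a bit more (illustrative weights).
            double multiplier = 1.0 + aggCalls.size() * 0.125;
            for (String call : aggCalls) {
                if (call.equals("SUM")) {
                    multiplier += 0.0125; // small penalty so $SUM0 is preferred
                }
            }
            return rowCount * multiplier;
        }
    }

    /** Replaces the inherited cost entirely, discarding the SUM penalty. */
    static class OverridingAggregate extends ToyAggregate {
        OverridingAggregate(double rowCount, List<String> aggCalls) {
            super(rowCount, aggCalls);
        }

        @Override
        double selfCost() {
            return rowCount * 2.0;
        }
    }

    /** Keeps the inherited cost and adds an extra term on top of it. */
    static class ExtendingAggregate extends ToyAggregate {
        ExtendingAggregate(double rowCount, List<String> aggCalls) {
            super(rowCount, aggCalls);
        }

        @Override
        double selfCost() {
            return super.selfCost() + rowCount;
        }
    }

    public static void main(String[] args) {
        List<String> sum = Arrays.asList("SUM");
        List<String> sum0 = Arrays.asList("$SUM0");

        boolean overrideKeepsPreference =
            new OverridingAggregate(100.0, sum).selfCost()
                > new OverridingAggregate(100.0, sum0).selfCost();
        boolean extendKeepsPreference =
            new ExtendingAggregate(100.0, sum).selfCost()
                > new ExtendingAggregate(100.0, sum0).selfCost();

        System.out.println("override keeps $SUM0 preference: " + overrideKeepsPreference);
        System.out.println("extend keeps $SUM0 preference: " + extendKeepsPreference);
    }
}
```

   In this toy setup the overriding subclass assigns `SUM` and `$SUM0` identical costs, whereas the extending subclass still ranks `$SUM0` as cheaper.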
   
   I'm also not convinced that the internal model's assumption that dIo = 0 is 
wrong. (That appears to be the primary difference here.) Outside of Aggregate 
operators, that assumption is effectively true in Dataflow. This is an area 
where we should have tests showing that our model produces better plans than 
the default.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 155114)
    Time Spent: 1h 50m  (was: 1h 40m)

> Implement Cost calculations for Cost-Based Optimization (CBO) 
> --------------------------------------------------------------
>
>                 Key: BEAM-4663
>                 URL: https://issues.apache.org/jira/browse/BEAM-4663
>             Project: Beam
>          Issue Type: Sub-task
>          Components: dsl-sql
>            Reporter: Kai Jiang
>            Assignee: Kai Jiang
>            Priority: Major
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> To support CBO, we should implement methods in each Beam*Rel.java.  
> computeSelfCost(...) as our first step.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
