jcamachor commented on a change in pull request #1439:
URL: https://github.com/apache/hive/pull/1439#discussion_r482301409
##########
File path:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java
##########
@@ -89,22 +89,23 @@ public RelOptCost getAggregateCost(HiveAggregate aggregate)
{
} else {
final RelMetadataQuery mq = aggregate.getCluster().getMetadataQuery();
// 1. Sum of input cardinalities
- final Double rCount = mq.getRowCount(aggregate.getInput());
- if (rCount == null) {
+ final Double inputRowCount = mq.getRowCount(aggregate.getInput());
+ final Double rowCount = mq.getRowCount(aggregate);
+ if (inputRowCount == null || rowCount == null) {
return null;
}
// 2. CPU cost = sorting cost
- final double cpuCost = algoUtils.computeSortCPUCost(rCount);
+ final double cpuCost = algoUtils.computeSortCPUCost(rowCount) +
inputRowCount * algoUtils.getCpuUnitCost();
Review comment:
I think the problem is that we are also trying to encapsulate the algorithm
selection here: the fact that we group in each node before sorting the data
(I think this is also reflected, to some extent, in the `isLe` discussion
above). However, the current model does not represent that precisely, since
the output row count is supposed to be the output of the final step of the
aggregation.
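To make the two formulas in the diff concrete, here is a rough standalone
sketch, not the Hive code. `AggregateCostSketch`, `CPU_UNIT_COST` and the
`n * log(n)` sort formula are assumptions for illustration; they only stand in
for `algoUtils.computeSortCPUCost` and `algoUtils.getCpuUnitCost()`.

```java
final class AggregateCostSketch {
    // Hypothetical cost of one CPU operation; not Hive's configured value.
    static final double CPU_UNIT_COST = 1.0;

    // Assumed n*log(n) sort cost, standing in for algoUtils.computeSortCPUCost.
    static double sortCpuCost(double rows) {
        return rows <= 1 ? 0.0 : rows * Math.log(rows) * CPU_UNIT_COST;
    }

    // Formula before the patch: sort every input row.
    static double sortInputCost(double inputRows) {
        return sortCpuCost(inputRows);
    }

    // Formula after the patch: one pass over the input (grouping in each
    // node), plus sorting the rows the aggregate is estimated to emit.
    static double groupThenSortCost(double inputRows, double outputRows) {
        return sortCpuCost(outputRows) + inputRows * CPU_UNIT_COST;
    }

    public static void main(String[] args) {
        double inputRows = 1_000_000;
        double outputRows = 1_000;
        System.out.println("sort input:      " + sortInputCost(inputRows));
        System.out.println("group then sort: "
            + groupThenSortCost(inputRows, outputRows));
    }
}
```

Under these assumptions the second formula charges the sort to the final
output row count while charging the per-node grouping to the input row count,
which is the mix of algorithm steps described above.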
With respect to reads, there is also the IO part of the cost; I am trying to
understand whether some of the cost you are describing is actually IO.
There is some more info about the original formulas that were used to
compute this here:
https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive
Can we split this into two patches and have the changes to the cost model on
their own? That should also help us discuss them in more detail.