You probably would want (100, 100, 100) to come out as less than (99, 10000000, 10000000), but a signature tie-break cannot deliver that: when two costs are incomparable, it picks a winner that has nothing to do with which plan is cheaper. So I don’t think a tie-break is a good idea. It is deterministic, but it is arbitrary and frequently wrong.
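
To make that concrete, here is the kind of conversion I had in mind in my earlier mail: collapse a cost to a single weighted double, so that any two costs are comparable. This is only a sketch (toScalar and the weights are invented for illustration, not anything in Calcite today):

  /**
   * Sketch: collapses a (rowCount, cpu, io) cost to one double so that
   * the ordering is total. The weights are placeholders and would need
   * tuning; a memory component could be weighted in the same way.
   */
  static double toScalar(double rowCount, double cpu, double io) {
    final double rowWeight = 1.0;
    final double cpuWeight = 1.0;
    final double ioWeight = 4.0; // pretend one io costs about 4 cpu units
    return rowWeight * rowCount + cpuWeight * cpu + ioWeight * io;
  }

With any positive weights, toScalar(100, 100, 100) comes out far below toScalar(99, 10000000, 10000000), which is the answer we want; and since every pair of costs is ordered, the planner’s choice is deterministic rather than arbitrary.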
Julian

> On Jun 14, 2017, at 12:03 PM, JD Zheng <[email protected]> wrote:
>
> Sure. I’ll log them.
>
> As to making the cost model totally ordered, can we add some kind of
> signature that uniquely identifies a RelNode to break the tie? So the
> order would be <cost(rowCount, cpuCost, ioCost), RelNode.signature>.
>
> -jiandan
>
>
>> On Jun 14, 2017, at 11:41 AM, Julian Hyde <[email protected]> wrote:
>>
>> That does look like a bug. Can you log it, please?
>>
>> The reason for the “if (false)” is that we found that costs don’t work
>> well if they form only a partial order, not a total order. If you have
>> two RelNodes R1 and R2 in an equivalence set, and they have costs C1
>> and C2, and neither C1 <= C2 nor C2 <= C1 is true, which one is the
>> Volcano planner to pick? It will tend to pick the one that it saw
>> first, and that is bad news because it is arbitrary and
>> non-deterministic.
>>
>> So, we should probably find a way to convert a RelOptCost to a
>> totally-ordered value, such as by applying weights to cpu, io and
>> memory cost and returning a double. (Anyone have a better idea?) Can
>> you log a bug for that also?
>>
>> Julian
>>
>>
>>> On Jun 13, 2017, at 10:47 AM, JD Zheng <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Our team currently uses the Calcite Druid adapter to query data in
>>> Druid. We found that in some cases, when the limit is over 10 and the
>>> data set has lots of dimensions, the limit is not pushed down to
>>> Druid.
>>>
>>> We looked further at the cost calculations of the different plans,
>>> and found that the following code in Sort.java looks suspicious:
>>>
>>>   @Override public RelOptCost computeSelfCost(RelOptPlanner planner,
>>>       RelMetadataQuery mq) {
>>>     // Higher cost if rows are wider discourages pushing a project
>>>     // through a sort.
>>>     double rowCount = mq.getRowCount(this);
>>>     double bytesPerRow = getRowType().getFieldCount() * 4;
>>>     return planner.getCostFactory().makeCost(
>>>         Util.nLogN(rowCount) * bytesPerRow, rowCount, 0);
>>>   }
>>>
>>> And the definition of makeCost is:
>>>
>>>   public interface RelOptCostFactory {
>>>     /**
>>>      * Creates a cost object.
>>>      */
>>>     RelOptCost makeCost(double rowCount, double cpu, double io);
>>>   }
>>>
>>> So, the first parameter should be the rowCount, and the second the
>>> cpu cost. It seems that the caller is feeding the parameters in the
>>> wrong order.
>>>
>>> Once we swap these two parameters, it works out fine: the limit is
>>> pushed down to the Druid query.
>>>
>>> Are we doing the right thing by swapping the parameters? Is this a
>>> bug, or is there a reason we feed the parameters this way?
>>>
>>> By the way, we found some dead code in VolcanoCost.java:
>>>
>>> <PastedGraphic-1.tiff>
>>>
>>> Does this mean that we don’t need to bother feeding in the cpu cost
>>> and io cost, and that these costs should somehow be modeled in the
>>> row count?
>>>
>>> Thanks,
>>>
>>> -Jiandan
>>
>
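
P.S. To be concrete about the Sort fix: given the signature makeCost(rowCount, cpu, io), the swap Jiandan describes would presumably leave Sort.computeSelfCost returning

  return planner.getCostFactory().makeCost(
      rowCount, Util.nLogN(rowCount) * bytesPerRow, 0);

so that the n * log(n) sort work is charged to cpu rather than inflating the row count, which is why the limit then gets pushed down to Druid.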
