You probably would want (100, 100, 100) to come out as less than (99, 10000000, 10000000), but a signature tie-break cannot deliver that: when two costs are incomparable, it picks a winner that has nothing to do with which plan is cheaper. So I don’t think a tie-break is a good idea. It is deterministic, but it is arbitrary and frequently wrong.
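
To make that concrete, here is the kind of conversion I had in mind in my earlier mail: collapse a cost to a single weighted double, so that any two costs are comparable. This is only a sketch (toScalar and the weights are invented for illustration, not anything in Calcite today):

  /**
   * Sketch: collapses a (rowCount, cpu, io) cost to one double so that
   * the ordering is total. The weights are placeholders and would need
   * tuning; a memory component could be weighted in the same way.
   */
  static double toScalar(double rowCount, double cpu, double io) {
    final double rowWeight = 1.0;
    final double cpuWeight = 1.0;
    final double ioWeight = 4.0; // pretend one io costs about 4 cpu units
    return rowWeight * rowCount + cpuWeight * cpu + ioWeight * io;
  }

With any positive weights, toScalar(100, 100, 100) comes out far below toScalar(99, 10000000, 10000000), which is the answer we want; and since every pair of costs is ordered, the planner’s choice is deterministic rather than arbitrary.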
Julian

> On Jun 14, 2017, at 12:03 PM, JD Zheng <[email protected]> wrote:
>
> Sure. I’ll log them.
>
> As to making the cost model totally ordered, can we add some kind of
> signature that uniquely identifies a RelNode to break the tie? So the
> order would be <cost(rowCount, cpuCost, ioCost), RelNode.signature>.
>
> -jiandan
>
>
>> On Jun 14, 2017, at 11:41 AM, Julian Hyde <[email protected]> wrote:
>>
>> That does look like a bug. Can you log it, please?
>>
>> The reason for the “if (false)” is that we found that costs don’t work
>> well if they form only a partial order, not a total order. If you have
>> two RelNodes R1 and R2 in an equivalence set, and they have costs C1
>> and C2, and neither C1 <= C2 nor C2 <= C1 is true, which one is the
>> Volcano planner to pick? It will tend to pick the one that it saw
>> first, and that is bad news because it is arbitrary and
>> non-deterministic.
>>
>> So, we should probably find a way to convert a RelOptCost to a
>> totally-ordered value, such as by applying weights to cpu, io and
>> memory cost and returning a double. (Anyone have a better idea?) Can
>> you log a bug for that also?
>>
>> Julian
>>
>>
>>> On Jun 13, 2017, at 10:47 AM, JD Zheng <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Our team currently uses the Calcite Druid adapter to query data in
>>> Druid. We found that in some cases, when the limit is over 10 and the
>>> data set has lots of dimensions, the limit is not pushed down to
>>> Druid.
>>>
>>> We looked further at the cost calculations of the different plans,
>>> and found that the following code in Sort.java looks suspicious:
>>>
>>>   @Override public RelOptCost computeSelfCost(RelOptPlanner planner,
>>>       RelMetadataQuery mq) {
>>>     // Higher cost if rows are wider discourages pushing a project
>>>     // through a sort.
>>>     double rowCount = mq.getRowCount(this);
>>>     double bytesPerRow = getRowType().getFieldCount() * 4;
>>>     return planner.getCostFactory().makeCost(
>>>         Util.nLogN(rowCount) * bytesPerRow, rowCount, 0);
>>>   }
>>>
>>> And the definition of makeCost is:
>>>
>>>   public interface RelOptCostFactory {
>>>     /**
>>>      * Creates a cost object.
>>>      */
>>>     RelOptCost makeCost(double rowCount, double cpu, double io);
>>>   }
>>>
>>> So, the first parameter should be the rowCount, and the second the
>>> cpu cost. It seems that the caller is feeding the parameters in the
>>> wrong order.
>>>
>>> Once we swap these two parameters, it works out fine: the limit is
>>> pushed down to the Druid query.
>>>
>>> Are we doing the right thing by swapping the parameters? Is this a
>>> bug, or is there a reason we feed the parameters this way?
>>>
>>> By the way, we found some dead code in VolcanoCost.java:
>>>
>>> <PastedGraphic-1.tiff>
>>>
>>> Does this mean that we don’t need to bother feeding in the cpu cost
>>> and io cost, and that these costs should somehow be modeled in the
>>> row count?
>>>
>>> Thanks,
>>>
>>> -Jiandan
>>
>
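
P.S. To be concrete about the Sort fix: given the signature makeCost(rowCount, cpu, io), the swap Jiandan describes would presumably leave Sort.computeSelfCost returning

  return planner.getCostFactory().makeCost(
      rowCount, Util.nLogN(rowCount) * bytesPerRow, 0);

so that the n * log(n) sort work is charged to cpu rather than inflating the row count, which is why the limit then gets pushed down to Druid.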
