Sure, I’ll log them. As for making the cost model totally ordered, can we add some kind of signature that uniquely identifies a RelNode to break ties? The order would then be <cost(rowCount, cpuCost, ioCost), RelNode.signature>.
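Very roughly, I’m picturing a comparator along the following lines. This is only a sketch: Candidate and its signature field are hypothetical stand-ins, not existing Calcite classes, and the signature could be, say, a digest of the RelNode’s string form.

    import java.util.Comparator;

    /** Hypothetical holder for a RelNode's cost components and signature. */
    class Candidate {
      final double rowCount;
      final double cpuCost;
      final double ioCost;
      final String signature;

      Candidate(double rowCount, double cpuCost, double ioCost,
          String signature) {
        this.rowCount = rowCount;
        this.cpuCost = cpuCost;
        this.ioCost = ioCost;
        this.signature = signature;
      }
    }

    /**
     * Compares candidates by their cost components, falling back to a
     * deterministic signature to break ties, so that the planner's choice
     * never depends on the order in which it happened to see RelNodes.
     */
    class CandidateComparator implements Comparator<Candidate> {
      @Override public int compare(Candidate a, Candidate b) {
        int c = Double.compare(a.rowCount, b.rowCount);
        if (c != 0) {
          return c;
        }
        c = Double.compare(a.cpuCost, b.cpuCost);
        if (c != 0) {
          return c;
        }
        c = Double.compare(a.ioCost, b.ioCost);
        if (c != 0) {
          return c;
        }
        return a.signature.compareTo(b.signature);
      }
    }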
-jiandan

> On Jun 14, 2017, at 11:41 AM, Julian Hyde <[email protected]> wrote:
>
> That does look like a bug. Can you log it, please?
>
> The reason for the “if (false)” is that we found that costs don’t work well
> if they form only a partial order, not a total order. If you have two
> RelNodes R1 and R2 in an equivalence set, and they have costs C1 and C2, and
> neither C1 <= C2 nor C2 <= C1 is true, which should the Volcano planner
> pick? It will tend to pick the one that it saw first, and that is bad news
> because it is arbitrary and non-deterministic.
>
> So, we should probably find a way to convert a RelOptCost to a
> totally-ordered value, such as by applying weights to the cpu, io and memory
> costs and returning a double. (Anyone have a better idea?) Can you log a bug
> for that also?
>
> Julian
>
>
>> On Jun 13, 2017, at 10:47 AM, JD Zheng <[email protected]> wrote:
>>
>> Hi,
>>
>> Our team currently uses the Calcite Druid adapter to query data in Druid.
>> We found that in some cases, when the limit is over 10 and the data set has
>> many dimensions, the limit is not pushed down to Druid.
>>
>> We looked further at the cost calculation of the different plans, and found
>> that the following code in Sort.java looks suspicious:
>>
>>   @Override public RelOptCost computeSelfCost(RelOptPlanner planner,
>>       RelMetadataQuery mq) {
>>     // Higher cost if rows are wider discourages pushing a project through
>>     // a sort.
>>     double rowCount = mq.getRowCount(this);
>>     double bytesPerRow = getRowType().getFieldCount() * 4;
>>     return planner.getCostFactory().makeCost(
>>         Util.nLogN(rowCount) * bytesPerRow, rowCount, 0);
>>   }
>>
>> And the definition of makeCost is:
>>
>>   public interface RelOptCostFactory {
>>     /**
>>      * Creates a cost object.
>>      */
>>     RelOptCost makeCost(double rowCount, double cpu, double io);
>>   }
>>
>> So, the first parameter should be rowCount and the second should be cpu.
>> It seems that the caller is passing the arguments in the wrong order.
>>
>> Once we switch these two arguments, it works out fine: the limit is pushed
>> down to the Druid query.
>>
>> Are we doing the right thing by switching the arguments? Is this a bug, or
>> is there a reason the parameters are passed this way?
>>
>> By the way, we found some dead code in VolcanoCost.java:
>> [attachment: PastedGraphic-1.tiff]
>>
>> Does it mean that we don’t need to bother feeding in the cpu cost and io
>> cost, and that these costs should somehow be modeled in the row count?
>>
>> Thanks,
>>
>> -Jiandan
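
For the record, the switch described in the quoted message amounts to the following corrected version of Sort.computeSelfCost, so that the arguments line up with makeCost(rowCount, cpu, io). This is just the local fix we tried, not a committed patch:

    @Override public RelOptCost computeSelfCost(RelOptPlanner planner,
        RelMetadataQuery mq) {
      // Higher cost if rows are wider discourages pushing a project through
      // a sort.
      double rowCount = mq.getRowCount(this);
      double bytesPerRow = getRowType().getFieldCount() * 4;
      // rowCount first, then the n*log(n) sort work as cpu, then io.
      return planner.getCostFactory().makeCost(
          rowCount, Util.nLogN(rowCount) * bytesPerRow, 0);
    }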

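And just to make Julian’s weighted-sum idea concrete, it might look something like the sketch below. The weights are illustrative assumptions only; choosing them (and whether memory should be a fourth component) is exactly what the bug would need to settle.

    /**
     * Sketch of collapsing a cost's components into a single double via
     * fixed weights, so that any two costs become comparable. The weights
     * are illustrative assumptions, not tuned or agreed values.
     */
    final class WeightedCost {
      private static final double ROW_WEIGHT = 0.5;
      private static final double CPU_WEIGHT = 1.0;
      private static final double IO_WEIGHT = 4.0; // assume one io ~ 4 cpu

      private WeightedCost() {}

      /** Returns a totally ordered scalar for a (rowCount, cpu, io) cost. */
      static double toScalar(double rowCount, double cpu, double io) {
        return ROW_WEIGHT * rowCount + CPU_WEIGHT * cpu + IO_WEIGHT * io;
      }
    }

Comparing two costs would then reduce to Double.compare on the two scalars, which is a total order, so the planner’s pick no longer depends on visit order.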