While trying to complete the Phase 4 work for DERBY-805 I've come across some behavior in the optimizer's treatment of cost estimates that I don't fully understand. If anyone out there can offer answers/feedback for any of the following three questions, I'd appreciate it...

--------------------------------
1) Multiplication by outer rows.
--------------------------------

There are four classes--ProjectRestrictNode, UnionNode, JoinNode, and FromTable--that implement the "optimizeIt()" method and, in that method, call "getJoinStrategy().estimateCost(...)". One of the arguments to optimizeIt() is an outerCost value, which holds the estimated cost of the FromTables to the left of the current one in this join order. So if we have a FROM list that is [ FT1, FT2, FT3 ] and the current FromTable (the one we're optimizing) is FT2, then outerCost will hold the estimated cost for FT1. For the sake of this question, assume FT1, FT2, and FT3 are instances of one of the above-mentioned classes.

outerCost is then used in two ways by these FTs: a) it's passed down to the child nodes, where it's factored into the cost estimates of those children, and b) it's passed into a call to estimate the cost of the current FT itself. As an example, take UnionNode. The a) code is like this:

        leftResultSet = optimizeSource(
                                optimizer,
                                leftResultSet,
                                getLeftOptPredicateList(),
                                outerCost);  // <==

        rightResultSet = optimizeSource(
                                optimizer,
                                rightResultSet,
                                getRightOptPredicateList(),
                                outerCost);  // <==

The b) code is then:

        /*
        ** Get the cost of this result set in the context of the whole plan.
        */
        getCurrentAccessPath().
                getJoinStrategy().
                        estimateCost(
                                this,
                                predList,
                                (ConglomerateDescriptor) null,
                                outerCost,   // <==
                                optimizer,
                                costEstimate
                                );

In both cases, the only part of outerCost that's actually used is the estimated row count. In cases where we're costing a nested loop join (either for the current FT or for any of its children), we multiply the cost of the current FT (or its children) by the number of outer rows, since we will in effect be incurring the cost of the inner FT once for each row of the outer FT.

All of that said, my question is this: why do we need to pass outerCost down to the children, where it will be multiplied by their costs, AND at the same time pass it into the "getJoinStrategy().estimateCost()" method, where it will be multiplied (again) by the cost of the current FT? It seems to me like the correct thing to do here is to _not_ pass outerCost to the children and to instead just let the current FT do the multiplication if necessary.

As a trivialized example, assume our FT2 is a UnionNode over two subqueries, each of which just has a single table:

FT2 = (select * from T1) UNION (select * from T2)

And assume FT1 is an outer table with row count of 100. Then in the part a) code above, we'll pass "100" down to the left child, where it will eventually make it down to T1 and we'll multiply the cost of T1 by 100. Likewise we'll pass "100" down to the right child and multiply the cost of T2 by 100:

T1 estimated cost: 100 * T1_cost
T2 estimated cost: 100 * T2_cost

Then we'll set the cost of the UnionNode to be the sum of the estimated costs of its children:

FT2 estimated cost: (100 * T1_cost) + (100 * T2_cost)

That done, if the current join strategy for FT2 is NestedLoop, the call in part b) above will pass "100" to NestedLoopJoinStrategy.estimateCost(), where it will be multiplied by the cost of FT2:

FT2 estimated cost: 100 * ((100 * T1_cost) + (100 * T2_cost))

In other words, we've ended up multiplying the cost of FT2 by the outer row count *TWICE*.
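
To make the arithmetic concrete, here's a small standalone sketch (plain Java, not Derby code; the T1_cost and T2_cost numbers are made up) that walks through the same computation:

// Illustrative sketch only: shows how the outer row count gets applied
// twice when FT2 is a UNION and the join strategy is NestedLoop.
public class DoubleMultiplySketch {
    public static void main(String[] args) {
        double outerRows = 100;   // row count of FT1
        double t1Cost = 50;       // made-up base cost of scanning T1
        double t2Cost = 80;       // made-up base cost of scanning T2

        // Part a): outerCost is pushed down to the UNION's children, so
        // each child's cost already includes the outer row count.
        double unionCost = (outerRows * t1Cost) + (outerRows * t2Cost);

        // Part b): NestedLoopJoinStrategy.estimateCost() then multiplies
        // the UNION's cost by the outer row count again.
        double finalCost = outerRows * unionCost;

        System.out.println("multiplied once:  " + unionCost);  // 13,000
        System.out.println("multiplied twice: " + finalCost);  // 1,300,000
    }
}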

I guess this could be intentional in order to make sure we favor hash joins (which won't do the second multiplication--see below) over nested loop joins, but it'd be great if someone could confirm this one way or the other. As it is, it seems a bit odd to me.

----------------------------------
2) HashJoinStrategy.estimateCost()
----------------------------------

As mentioned at the end of question 1, if the join strategy that we're currently costing is a HashJoin and we make a call to HashJoinStrategy.estimateCost() (see above for an example), we get here:

/** @see JoinStrategy#estimateCost */
public void estimateCost(Optimizable innerTable,
                         OptimizablePredicateList predList,
                         ConglomerateDescriptor cd,
                         CostEstimate outerCost,
                         Optimizer optimizer,
                         CostEstimate costEstimate) {
        /*
        ** The cost of a hash join is the cost of building the hash table.
        ** There is no extra cost per outer row.
        */
}

I.e. we don't do anything. But at the same time, if we're costing a FromBaseTable and considering a HashJoin, we _do_ actually use the outer row count. See FromBaseTable.estimateCost():

        /*
        ** Let the join strategy decide whether the cost of the base
        ** scan is a single scan, or a scan per outer row.
        ** NOTE: In this case we only want to multiply against the
        ** total row count, not the singleScanRowCount.
        */
        if (currentJoinStrategy.multiplyBaseCostByOuterRows())
        {
                newCost *= outerCost.rowCount();
        }
        rowCount *= outerCost.rowCount();

For a HashJoin the call to "multiplyBaseCostByOuterRows()" will return false. Notice, though, that while we will not multiply the current cost (newCost) by the outer row count for a hash join, we _will_ multiply the current _rowCount_ by the outer row count.

So my question is this: why is it that, for a FromBaseTable, we multiply the current rowCount by the outer row count for a Hash join (as per FromBaseTable.estimateCost()), but for other (non-base-table) FromTables (such as UnionNode) we skip this multiplication (as per HashJoinStrategy.estimateCost())? Is this an intentional difference, and if so, does anyone know why?
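
Here is a small standalone sketch (again, plain Java with made-up numbers, not Derby code) of the asymmetry I'm describing:

// Illustrative sketch: contrasts how the outer row count is applied when
// the inner side of a hash join is a base table versus a UNION.
public class HashJoinCostSketch {
    public static void main(String[] args) {
        double outerRows = 100;
        double innerCost = 500;   // made-up cost of one scan of the inner FT
        double innerRows = 1000;  // made-up row count of the inner FT

        // FromBaseTable.estimateCost() with a hash join:
        // multiplyBaseCostByOuterRows() returns false, so the cost is left
        // alone, but the row count is still multiplied by the outer rows.
        double baseTableCost = innerCost;             // NOT multiplied
        double baseTableRows = innerRows * outerRows; // multiplied

        // HashJoinStrategy.estimateCost() for a non-base-table Optimizable
        // (e.g. a UnionNode) is a no-op, so neither value changes.
        double unionCost = innerCost;
        double unionRows = innerRows;

        System.out.println("base table: cost=" + baseTableCost + " rows=" + baseTableRows);
        System.out.println("union:      cost=" + unionCost + " rows=" + unionRows);
    }
}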

----------------------------------
3) Comparing plan costs.
----------------------------------

In OptimizerImpl.java there is the following code for comparing "current" cost with "best cost" for this round of optimization:

        /*
        ** Is the cost of this join order lower than the best one we've
        ** found so far?
        **
        ** NOTE: If the user has specified a join order, it will be the
        ** only join order the optimizer considers, so it is OK to use
        ** costing to decide that it is the "best" join order.
        */
        if ((! foundABestPlan) || currentCost.compare(bestCost) < 0)
        {
                rememberBestCost(currentCost, Optimizer.NORMAL_PLAN);

                // Since we just remembered all of the best plans,
                // no need to reload them when pulling Optimizables
                // from this join order.
                reloadBestPlan = false;
        }

The call to "currentCost.compare()" only compares the estimatedCost fields; it does not look at the other two fields of a CostEstimate--namely, rowCount and singleScanRowCount.

In running some tests against 10.1, though, I noticed that it is possible for currentCost and bestCost to have the same estimatedCost values but different row counts. Since the above check does not look at row counts, we can end up in situations where the following two queries can result in different query plans:

select * from V1, V2 where V1.j = V2.b;
select * from V2, V1 where V1.j = V2.b;

I.e. the queries are identical with the exception of the order of V1 and V2. In my particular example V1 and V2 are both unions of two tables, where V2 has 100,000 rows and V1 has less than ten. When I ran these queries against 10.1 with NO optimizer timeout--meaning that the optimizer had time to examine all possible query plans--the result was surprising: the first query chose join order [ V2, V1 ] and took 4.5 to 5 seconds to complete; the second query chose join order [ V1, V2 ] and took 30-33 seconds to complete. Wow.

I made a small change to the CostEstimateImpl.compare() method to consider rowCount as a secondary field of comparison (if the estimatedCosts are the same) and singleScanRowCount as a third field of comparison (if the rowCounts are the same). With that change the behavior was then what I expected: namely, the optimizer chose the same join order (the faster one) regardless of which query I ran.
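
For reference, here's a standalone sketch of the kind of tie-breaking comparison I mean (an illustration of the idea, not the actual patch to CostEstimateImpl.compare(); the class and field names here are simplified):

// Compare estimatedCost first; fall back to rowCount, then
// singleScanRowCount, only when the earlier fields are equal.
public class CostCompareSketch {
    static class Cost {
        double estimatedCost, rowCount, singleScanRowCount;
        Cost(double estimatedCost, double rowCount, double singleScanRowCount) {
            this.estimatedCost = estimatedCost;
            this.rowCount = rowCount;
            this.singleScanRowCount = singleScanRowCount;
        }
        double compare(Cost other) {
            if (estimatedCost != other.estimatedCost)
                return estimatedCost - other.estimatedCost;
            if (rowCount != other.rowCount)
                return rowCount - other.rowCount;
            return singleScanRowCount - other.singleScanRowCount;
        }
    }

    public static void main(String[] args) {
        Cost current = new Cost(1000.0, 10.0, 10.0);
        Cost best    = new Cost(1000.0, 100000.0, 100000.0);
        // Same estimatedCost, but "current" has the lower row count,
        // so it now compares as the better (lower) plan.
        System.out.println(current.compare(best) < 0);  // prints true
    }
}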

So my question is this: if two cost estimates have the same estimatedCost but different rowCounts (or even singleScanRowCounts), is it reasonable to suppose that the estimate with the lower rowCount is actually better? In the above case the query with the lower row count was the one that ran more quickly--is it reasonable to believe that this will generally be the case?

---

As always, thanks to anyone with the time and inclination to read this email and provide feedback...

Army
