Optimizer cost estimates for subqueries are way (way) too high.
---------------------------------------------------------------

                 Key: DERBY-1905
                 URL: http://issues.apache.org/jira/browse/DERBY-1905
             Project: Derby
          Issue Type: Bug
          Components: Performance, SQL
    Affects Versions: 10.1.3.2, 10.2.1.5, 10.3.0.0
            Reporter: A B


If I run the attached repro (pulled from DERBY-1866) with 
derby.optimizer.noTimeout=true (meaning the optimizer should take all the time 
it needs to try out all possible plans within the restrictions of the optimizer 
overrides), the estimated cost shown in derby.log (if I log the query plan) is 
over 600k and the estimated row count is over 520k.

If, as the code in OptimizerImpl seems to expect, the unit for a cost estimate 
is milliseconds, then the optimize here is guessing that the query will take 
over 10 minutes to execute and will return a half-million rows.  But in truth 
the *combined* time for compilation AND execution is just a couple of seconds, 
and only 15 rows are returned.

That suggests to me a rather serious problem with the optimizer cost estimates 
for subqueries.

Among other things this can have a major impact on the optimizer's timeout 
mechanism for very deeply-nested queries.  The optimizer will continue to try 
out different combinations of indexes/joinOrders/joinStrategies until it either 
exhausts them all or until the number of milliseconds spent optimizing is 
greater than the "best cost" estimate so far. In the case of the repro for this 
issue, the optimizer quickly exhausts all of the options and so it finishes in 
a fair amount of time.

But in larger queries where there are far more combinations to try (see esp. 
queries attached to DERBY-1205, DERBY-1777), these severly inflated cost 
estimates get very large very quickly (sometimes even reaching infinity--see 
DERBY-1259, DERBY-1260) and so the optimizer just keeps on optimizing and never 
times out.  The result is that for queries like those in DERBY-1777, the 
optimizer can spend literally hours optimizing a query which then executes in a 
matter of seconds.

I'm still investigating this, but preliminary examination suggests that part of 
the problem is in some way related to the treatment of "outer costs" by the 
optimizer--and in particular, it looks like the optimizer is perhaps too 
aggressive in multiplying cost estimates by row counts pulled from "outer 
costs".  That's just my first guess at the problem, though; there could of 
course be far more to it than that...

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to