[ 
http://issues.apache.org/jira/browse/DERBY-2130?page=comments#action_12455802 ] 
            
A B commented on DERBY-2130:
----------------------------

Thanks for the query plans, Bryan.  I took a quick look and nothing stood out, 
but I'll have to look more closely before I can say anything for sure.  Thank 
you for taking the time to pull out the "noise"--I think that'll be helpful.

On a different note, I added some System.outs to various places in the 
Optimizer to see what cost estimates the optimizer is calculating for different 
complete join orders in the top-level OptimizerImpl.  I then ran the repro 
query against the latest 10.1 branch and the latest 10.3 trunk.  The 
abbreviated output from what I saw is shown below.

Some things to note:

  0. Lots of "Infinity" and "NaN" cost estimates, which are probably the result 
of DERBY-1905 and DERBY-1260, respectively. This pretty much renders all but a 
handful of the cost estimates useless.

  1. In 10.1.3 we try to jump to a target join order and we SUCCESSFULLY do so. 
 In 10.3 we also try to jump to the same target join order, BUT for some reason 
the jump does NOT take effect.  So it looks like Bryan's question about whether 
or not permutation jumping works correctly was indeed well-founded.  This is, I 
think, the first thing to investigate.

  2. Because the 10.3 "jump" does not occur we spend a lot of time iterating 
through different permutations until we get to the "best join order" found by 
10.1. Once we get there the estimated cost for 10.3 (on a 10.1 database) is 
finite but is much higher than the cost estimate of the same join order in 10.1 
(namely, "247" in 10.1 vs "1736685" in 10.3).  So this is another point of 
interest.  This could again be the result of DERBY-1260, but it's hard to say 
at this point.

  3. This next observation is weird: when running 10.3 on a 10.1 database the 
query finishes after about 10,000 iterations (roughly 5 minutes on my laptop).  
But when running 10.3 on a *10.3* database the query runs until it has 
exhausted all possible permutations (over 40,000).  The difference is in the 
calculated cost estimate for the "2 0 1 3 4 5 6 7" join order.  When running 
against a 10.1 database 10.3 estimates a cost of 1736685, followed up by a cost 
of 198589 for the next complete order, which leads leads to eventual timeout.  
But when running against a 10.3 database, the cost estimate is *Infinity* and 
thus the timeout value sits at 5E62, which we never reach.  I have absolutely 
no idea why this would be the case.  Any one out there have any theories?   
Bryan, can you perhaps try this out to see if you observe the same behavior?  
You can connect to a 10.1 database by setting the 
"derby.database.allowPreReleaseUpgrade" property to "true".  Note that you 
should only use that on a _test_ 10.1 database, though; I don't think it's 
intended for production db's.

Generally speaking lines in the output are of the form:

-=> estimatedCost: j o i n O r d e r

--------------------------
10.1.3 on 10.1.3 database:
--------------------------

ij version 10.1
-=> 5.607246651648704E62: 0 1 2 3 4 5 6 7
-==> Just set bestCost to: 5.607246651648704E62
-=-> JUMPING to: 1 4 2 6 7 0 3 5
-=> Infinity: 1 4 2 6 7 0 3 5
-=> NaN: 1 4 2 6 7 0 5 3
-=> NaN: 1 4 2 6 7 3 0 5
-=> NaN: 1 4 2 6 7 3 5 0
-=> NaN: 1 4 2 6 7 5 0 3
-=> NaN: 1 4 2 6 7 5 3 0
-=> NaN: 1 4 2 7 0 3 5 6
-=> NaN: 1 4 2 7 0 3 6 5
-=> NaN: 1 4 2 7 0 5 3 6
-=> NaN: 1 4 2 7 0 5 6 3
-=> NaN: 1 4 2 7 0 6 3 5
-=> NaN: 1 4 2 7 0 6 5 3

<snip ~2,600 permutations with cost estimates of "NaN">

-=> NaN: 1 7 6 5 4 0 2 3
-=> NaN: 1 7 6 5 4 0 3 2
-=> NaN: 1 7 6 5 4 2 0 3
-=> NaN: 1 7 6 5 4 2 3 0
-=> NaN: 1 7 6 5 4 3 0 2
-=> NaN: 1 7 6 5 4 3 2 0
-=> 247.4647753453739: 2 0 1 3 4 5 6 7
-==> Just set bestCost to: 247.4647753453739
-=-> TIMED OUT: exceeded 247.4647753453739 ms.
-=> 247.4647753453739: 2 0 1 3 4 5 6 7
ELAPSED TIME = 82961 milliseconds

------------------------
10.3 on *10.1.3* database:
------------------------

(on the *same* database that was created by 10.1)

ij version 10.3
-=> 5.607246651648704E62: 0 1 2 3 4 5 6 7
-==> Just set bestCost to: 5.607246651648704E62
-=-> JUMPING to: 1 4 2 6 7 0 3 5
-=> 1.8253980349686949E106: 0 1 2 3 4 5 6 7
-=> 3.021440681134236E107: 0 1 2 3 4 5 7 6
-=> 8.374147544754745E119: 0 1 2 3 4 6 5 7
-=> 8.374147544754745E119: 0 1 2 3 4 6 7 5
-=> 5.993308284393933E109: 0 1 2 3 4 7 5 6
-=> 2.736734332816085E108: 0 1 2 3 4 7 6 5
-=> 3.973494182787193E115: 0 1 2 3 5 4 6 7
-=> 6.577018684224087E116: 0 1 2 3 5 4 7 6
-=> 3.973494182787193E115: 0 1 2 3 5 6 4 7
-=> 6.577018684224087E116: 0 1 2 3 5 7 4 6
-=> 8.374147544754745E119: 0 1 2 3 6 4 5 7
-=> 8.374147544754745E119: 0 1 2 3 6 4 7 5
-=> 1.0852895218002162E119: 0 1 2 3 6 5 4 7
-=> 8.374147544754745E119: 0 1 2 3 6 7 4 5
-=> 5.993308284393933E109: 0 1 2 3 7 4 5 6
-=> 2.736734332816085E108: 0 1 2 3 7 4 6 5
-=> 1.2945545894290903E112: 0 1 2 3 7 5 4 6
-=> 2.736734332816085E108: 0 1 2 3 7 6 4 5
-=> NaN: 0 1 2 4 5 3 6 7
-=> NaN: 0 1 2 4 5 3 7 6
-=> NaN: 0 1 2 4 5 6 3 7
-=> NaN: 0 1 2 4 5 6 7 3
-=> NaN: 0 1 2 4 5 7 3 6
-=> NaN: 0 1 2 4 5 7 6 3

<snip ~10,000 permutations with cost estimates of "NaN">

-=> NaN: 1 7 6 5 4 0 2 3
-=> NaN: 1 7 6 5 4 0 3 2
-=> NaN: 1 7 6 5 4 2 0 3
-=> NaN: 1 7 6 5 4 2 3 0
-=> NaN: 1 7 6 5 4 3 0 2
-=> NaN: 1 7 6 5 4 3 2 0
-=> 1736685.050617207: 2 0 1 3 4 5 6 7
-==> Just set bestCost to: 1736685.050617207
-=> 198589.53739726284: 2 0 1 3 4 5 7 6
-==> Just set bestCost to: 198589.53739726284
-=-> TIMED OUT: exceeded 198589.53739726284 ms.
-=> 198589.53739726284: 2 0 1 3 4 5 7 6
ELAPSED TIME = 336370 milliseconds

------------------------
10.3 on 10.3 database:
------------------------

ij version 10.3
-=> 5.607246651648704E62: 0 1 2 3 4 5 6 7
-==> Just set bestCost to: 5.607246651648704E62
-=-> JUMPING to: 1 4 2 6 7 0 3 5
-=> 1.8253980349686949E106: 0 1 2 3 4 5 6 7
-=> 3.021440681134236E107: 0 1 2 3 4 5 7 6
-=> 8.374147544754745E119: 0 1 2 3 4 6 5 7
-=> 8.374147544754745E119: 0 1 2 3 4 6 7 5
-=> 5.993308284393933E109: 0 1 2 3 4 7 5 6
-=> 2.736734332816085E108: 0 1 2 3 4 7 6 5
-=> 3.973494182787193E115: 0 1 2 3 5 4 6 7
-=> 6.577018684224087E116: 0 1 2 3 5 4 7 6
-=> 3.973494182787193E115: 0 1 2 3 5 6 4 7
-=> 6.577018684224087E116: 0 1 2 3 5 7 4 6
-=> 8.374147544754745E119: 0 1 2 3 6 4 5 7
-=> 8.374147544754745E119: 0 1 2 3 6 4 7 5
-=> 1.0852895218002162E119: 0 1 2 3 6 5 4 7
-=> 8.374147544754745E119: 0 1 2 3 6 7 4 5
-=> 5.993308284393933E109: 0 1 2 3 7 4 5 6
-=> 2.736734332816085E108: 0 1 2 3 7 4 6 5
-=> 1.2945545894290903E112: 0 1 2 3 7 5 4 6
-=> 2.736734332816085E108: 0 1 2 3 7 6 4 5
-=> NaN: 0 1 2 4 5 3 6 7
-=> NaN: 0 1 2 4 5 3 7 6
-=> NaN: 0 1 2 4 5 6 3 7
-=> NaN: 0 1 2 4 5 6 7 3
-=> NaN: 0 1 2 4 5 7 3 6

<snip ~10,000 permutations with cost estimates of "NaN">

-=> NaN: 1 7 6 5 4 2 0 3
-=> NaN: 1 7 6 5 4 2 3 0
-=> NaN: 1 7 6 5 4 3 0 2
-=> NaN: 1 7 6 5 4 3 2 0
-=> Infinity: 2 0 1 3 4 5 6 7
-=> NaN: 2 0 1 3 4 5 7 6
-=> NaN: 2 0 1 3 4 6 5 7
-=> NaN: 2 0 1 3 4 6 7 5
-=> NaN: 2 0 1 3 4 7 5 6
-=> NaN: 2 0 1 3 4 7 6 5
-=> NaN: 2 0 1 3 5 4 6 7

<snip ~30,000 permutations, most with cost estimates of "NaN", a few with 
estimates of Infinity>

-=> NaN: 7 6 5 4 3 1 0 2
-=> NaN: 7 6 5 4 3 1 2 0
-=> NaN: 7 6 5 4 3 2 0 1
-=> NaN: 7 6 5 4 3 2 1 0
-=-> ENDED because ran out of permutations.
-=> 5.607246651648704E62: 0 1 2 3 4 5 6 7
ELAPSED TIME = 1145530 milliseconds

I'm still trying to make sense of this data, but perhaps this info will be 
helpful in tracking down the issue(s) here...

> Optimizer performance slowdown from 10.1 to 10.2
> ------------------------------------------------
>
>                 Key: DERBY-2130
>                 URL: http://issues.apache.org/jira/browse/DERBY-2130
>             Project: Derby
>          Issue Type: Bug
>          Components: Performance, SQL
>    Affects Versions: 10.2.1.6, 10.3.0.0, 10.1.3.1
>            Reporter: Bryan Pendleton
>         Attachments: plan10_1_2_1.txt, plan10_2.txt, plans.diff, repro.sql
>
>
> Attached is 'repro.sql', an IJ script which demonstrates what I
> believe to be a serious performance issue in the Optimizer.
> I have run this script in a number of configurations:
>  - 10.1.2.1: the script runs successfully. The 'prepare' statement
>    takes about 90 seconds, on a fairly powerful Windows machine
>  - 10.1.3.1: the script produces a NPE. I believe this is DERBY-1777
>  - 10.2.1.8/trunk: the script runs successfully. The 'prepare' statement
>    often takes about 220 seconds, on the same Windows machine
>    Intermittently, on 10.2 and on the trunk, the prepare statement takes
>    15+ minutes. I cannot reliably reproduce this; I run the same script
>    several times in a row and I cannot predict whether it will take 220
>    seconds or whether it will take 15+ minutes.
> I am quite motivated to work on this problem, as this is blocking me from
> using Derby for a project that I'm quite keen on, but I need some
> suggestions and ideas about how to attack it. From my perspective
> there are 3 primary topics:
> 1) Why did optimizer performance for this query degrade so significantly
> from 10.1.2.1 to 10.2? The optimizer seems to be at least 2.5 times slower,
> for this particular query at least, in 10.2. Sometimes it is 10x slower.
> 2) What is the source of the non-determinism? Why does the optimizer
> often take 4 minutes to optimize this query on the trunk, but sometimes
> take 15+ minutes? I don't believe that I'm changing anything from
> run to run.
> 3) Can we improve the optimizer performance even beyond what it was
> for 10.1.2? I realize that this is an ugly query, but I was hoping to
> see an optimization time of 5-10 seconds, not 90 seconds (and certainly
> not 220 seconds).
> I have attempted to start answering some of these questions, with
> limited success. Here is some of what I think I've discovered so far:
>  - the optimizer changes in 10.2 seem to have given the optimizer many
>    more choices of possible query plans to consider. I think this means
>    that, if the optimizer does not time out, it will spend substantially
>    more time optimizing because there are more choices to evaluate. Does
>    this by itself mean that the optimizer will take 2.5 times longer in
>    10.2 than it did in 10.1?
>  - something about this query seems to make the costing mechanism go
>    haywire, and produce extreme costs. While stepping through the
>    optimization of this query in the debugger I have seen it compute
>    costs like 1e63 and 1e200. This might be very closely related to
>    DERBY-1905, although I don't think I'm doing any subqueries here.
>    But maybe I'm misunderstanding the term "subquery" in DERBY-1905.
>    At any rate, due to the enormous estimated costs, timeout does not
>    occur.
>  - the WHERE clause in this query is converted during compilation to 
>    an equivalent IN clause, I believe, which then causes me to run into
>    a number of the problems described in DERBY-47 and DERBY-713.
>    Specifically, rather than constructing a plan which involves 4
>    index probes for the 4 WHERE clause values, the optimizer decides
>    that an index scan must be performed and that it will have to process
>    the entire index (because the query uses parameter markers, not
>    literal values). So perhaps solving DERBY-47 would help me
>  - the optimizer in fact comes up with a "decent" query plan quite quickly.
>    I have experimented with placing a hard limit into the optimizer
>    timeout code, so that I can force optimization to stop after an
>    arbitrary fixed period of time. Then I have been able to set that
>    value to as low as 1 second, and the optimizer has produced plans
>    that then execute in a few milliseconds. Of course, I have only tried
>    this with a trivial amount of data in my database, so it's possible
>    that the plan produced by the optimizer after just a second of
>    optimizing is in fact poor, and I'm just not noticing it because my
>    data sizes are so small.
> At this point, what would be really helpful to me would be some suggestions
> about some general approaches or techniques to try to start breaking down
> and analyzing this problem.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira


Reply via email to