Hiya Rick,

   Thanks for you helpful reply!  I spent some time over the weekend
snooping around (mostly) InListOperatorNode and OptimizerImpl, but 
never found out _where_ the table scan was chosen over an index scan
for finding the IN matches...I really wanted to find it before begging
for someone to point out the obvious :-)  Specifically, I got all 
tangled up around the dynamically generated classes which implement
the chosen query plan.  Also, I didn't ever find any QueryPlan-like 
class, but I guess this what the dynamic class(es) implement.


  What did I want to do once I got there?  Pseudo-code says it best:

    // ISTR that the bound IN values are ordered and unique.  If not,
    // this should be done...

    if( column is indexed ){
        if( index is unique ){
            // this should be a guaranteed win.  We know we'll only get
            // one match per IN value
            do index scan
        }
        else if( table size > SOME_MAGIC_SIZE ){
            // SOME_MAGIC_SIZE would represent the size of (the) table
            // at which using the index would be reasonably known to
            // be more expensive than scanning the table.
            do index scan
        }
        else {
            do table scan
        }
    }
    else {
        do table scan
    }


  Is that a workable approach (excepting the SOME_MAGIC_SIZE bit)?

  I'll probably have some time later this week to immerse myself in locating 
the magic location to apply this logic.  A _little_ pointer would be 
helpful: maybe a brief overview of how a query plan is determined -- I
suspect his is somewhere around where trulyTheBestTableAccess (or whatever
it's really called :-)) is defined.  I'd rather have to bang my head 
against this wall a bit, to get a better understanding of things, instead
of being pointed immediately to the solution -- my own version of smelling
the roses, I guess ;-)

Cheers!
  Brent


[2006-06-19 11:30] Rick Hillegas said:
| Hi Brent,
| 
| Sounds like you're off to a good start. From the initial bug report, it 
| looks like there's a good idea about which heuristic is being 
| mis-applied here. Once you've studied the optimizer papers, I recommend 
| that you post some high-level candidate solutions. Try to avoid 
| optimizer jargon and concentrate on simple descriptions:
| 
| o What query plan would you rather see?
| o What heuristic would the optimizer apply that would lead it to your 
| preferred plan?
| o How would the optimizer decide to apply the new heuristic rather than 
| the old one?
| 
| I think you'll get some good feedback if your post contains the phrase 
| "Attention, optimizer experts." I think that the optimizer enthusiasts 
| on the list will give you good feedback:
| 
| o Maybe they can think of a better query plan or heuristic.
| o Maybe they can see some awkward corner cases in your  heuristic.
| o They can advise you on whether your heuristic will short-circuit other 
| optimizer choices.
| o They can advise you on whether your heuristic will cause an explosion 
| in the time that the optimizer takes.
| 
| Thanks for wanting to scratch this itch!
| 
| Regards,
| -Rick
| 
| Brent Verner wrote:
| 
| >Hi,
| >
| >    I've recently found need for an embedded java db and only Derby seems
| >even close to handling the task, however the broken query planning for IN
| >clauses makes it unusable :-(.  I've decided to eschew an embedded db
| >in favor of PG for now, but I'd really like to be able to use Derby in the
| >near(ish) future for deployment.
| >
| >    I'd like to try to fix the query planning around the IN clause.  I'm
| >reviewing the internals papers right now, but I'd appreciate any input
| >that might point me in the right direction :-).
| >
| >cheers!
| > Brent
| >
| > 
| >

Reply via email to