Hiya Rick, Thanks for you helpful reply! I spent some time over the weekend snooping around (mostly) InListOperatorNode and OptimizerImpl, but never found out _where_ the table scan was chosen over an index scan for finding the IN matches...I really wanted to find it before begging for someone to point out the obvious :-) Specifically, I got all tangled up around the dynamically generated classes which implement the chosen query plan. Also, I didn't ever find any QueryPlan-like class, but I guess this what the dynamic class(es) implement.
What did I want to do once I got there? Pseudo-code says it best: // ISTR that the bound IN values are ordered and unique. If not, // this should be done... if( column is indexed ){ if( index is unique ){ // this should be a guaranteed win. We know we'll only get // one match per IN value do index scan } else if( table size > SOME_MAGIC_SIZE ){ // SOME_MAGIC_SIZE would represent the size of (the) table // at which using the index would be reasonably known to // be more expensive than scanning the table. do index scan } else { do table scan } } else { do table scan } Is that a workable approach (excepting the SOME_MAGIC_SIZE bit)? I'll probably have some time later this week to immerse myself in locating the magic location to apply this logic. A _little_ pointer would be helpful: maybe a brief overview of how a query plan is determined -- I suspect his is somewhere around where trulyTheBestTableAccess (or whatever it's really called :-)) is defined. I'd rather have to bang my head against this wall a bit, to get a better understanding of things, instead of being pointed immediately to the solution -- my own version of smelling the roses, I guess ;-) Cheers! Brent [2006-06-19 11:30] Rick Hillegas said: | Hi Brent, | | Sounds like you're off to a good start. From the initial bug report, it | looks like there's a good idea about which heuristic is being | mis-applied here. Once you've studied the optimizer papers, I recommend | that you post some high-level candidate solutions. Try to avoid | optimizer jargon and concentrate on simple descriptions: | | o What query plan would you rather see? | o What heuristic would the optimizer apply that would lead it to your | preferred plan? | o How would the optimizer decide to apply the new heuristic rather than | the old one? | | I think you'll get some good feedback if your post contains the phrase | "Attention, optimizer experts." I think that the optimizer enthusiasts | on the list will give you good feedback: | | o Maybe they can think of a better query plan or heuristic. | o Maybe they can see some awkward corner cases in your heuristic. | o They can advise you on whether your heuristic will short-circuit other | optimizer choices. | o They can advise you on whether your heuristic will cause an explosion | in the time that the optimizer takes. | | Thanks for wanting to scratch this itch! | | Regards, | -Rick | | Brent Verner wrote: | | >Hi, | > | > I've recently found need for an embedded java db and only Derby seems | >even close to handling the task, however the broken query planning for IN | >clauses makes it unusable :-(. I've decided to eschew an embedded db | >in favor of PG for now, but I'd really like to be able to use Derby in the | >near(ish) future for deployment. | > | > I'd like to try to fix the query planning around the IN clause. I'm | >reviewing the internals papers right now, but I'd appreciate any input | >that might point me in the right direction :-). | > | >cheers! | > Brent | > | > | >