On Sun, September 3, 2006 21:52, Gregory Stark wrote:
> I read that but apparently I misunderstood it since it would not have been
> doable the way I understood it. I thought you wanted the predictor bits to
> correspond to particular plans. If a plan was "wrong" then you marked it as
> a bad guess. I don't think that can be made to work though for the reasons
> I stated then.
Oh, sorry--I guess I haven't been too systematic about it.

In the algorithm's current incarnation, the confidence counters don't mark anything as a bad idea, as such. Instead, whenever a new invocation has a set of parameters that are (1) correctly and (2) confidently predicted by the counters, the predictor decides that it's probably a good idea to plan for calls with that particular set of parameters built in as constants. Assuming we didn't already have a plan for the particular combination of values at hand, the algorithm generates a new one, optimized on the assumption that those predicted parameters are constants.

We keep a small cache of recently used plans, possibly including the original plan where all parameters are truly variable. Every plan also remembers a list of its predicted parameter values, so on any subsequent call we can check whether a particular cached plan actually applies to the call. If it doesn't (because of a mismatch between the incoming parameters and the plan's assumed pseudo-constants), we just pick another plan. If multiple cached plans can be applied to a given call, we prefer the one that optimizes away the most parameters. Age is used as a tiebreaker, on the assumption that more recent planning information is likely to be more accurate.

The tricky part is deciding when to generate a new, more specialized plan when we already have a matching one that may not be optimal. Without this step we'd never get beyond that first, completely generic plan--it applies to every call. The way I've approached it is this: when the predictor's current state correctly and confidently predicts more of the invocation's parameter values than any of the cached plans did, it's time to generate a new plan.

So let's say your statement has a parameter x that's always the same value, say x==0; another parameter y that's a randomly alternating Boolean value; and another one z that varies randomly between lots of values.
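To make the selection step concrete, here's a rough Python sketch of how a call picks among cached plans, as described above. All the names are invented for illustration; none of this is actual patch code:

```python
# Illustrative sketch only: each cached plan records which parameters it
# assumed to be pseudo-constants, and with what values.  A plan applies
# to a call iff every one of its pseudo-constants matches the incoming
# argument for that parameter.
from dataclasses import dataclass

@dataclass
class CachedPlan:
    pseudo_constants: dict   # parameter index -> assumed constant value
    last_used: int = 0       # logical timestamp, for recency tiebreaks

def applies(plan, args):
    return all(args[i] == v for i, v in plan.pseudo_constants.items())

def choose_plan(cache, args):
    # Prefer the matching plan that optimizes away the most parameters;
    # break ties by recency, on the assumption that newer planning
    # information is more accurate.
    candidates = [p for p in cache if applies(p, args)]
    if not candidates:
        return None
    return max(candidates,
               key=lambda p: (len(p.pseudo_constants), p.last_used))

generic = CachedPlan({}, last_used=1)      # matches any call
x_zero = CachedPlan({0: 0}, last_used=2)   # assumes parameter x == 0
cache = [generic, x_zero]

assert choose_plan(cache, (0, True, "abc")) is x_zero   # specialized wins
assert choose_plan(cache, (7, True, "abc")) is generic  # falls back
```

The generic plan's empty pseudo-constant list is what makes it match every call, which is exactly why the specialization trigger described above is needed.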
What order they're in doesn't matter, and they needn't be the only parameters. You're probably going to see four plans generated for this example:

1. Either on the first call or during definition, you get the generic plan. This is the same plan you'd get in the existing backend, with placeholders for variables x, y, and z.

2. Pretty soon, the algorithm is going to detect that x is always zero. It will generate a new plan, substituting the constant value 0 for x, and hopefully getting better optimization because of it.

3. Sooner or later (probably fairly soon) you'll see a run of consecutive calls where y happens to be "true." A new plan is generated with the assumption that y==true. The new plan will also still assume that x==0.

4. The same is going to happen for y==false. Yet another specialized plan is generated. If we keep up to 3 plans per statement, say, then this new plan overflows the cache. The least recently used plan is flushed to make room for the new one--in this case the generic one, because we haven't seen any cases where x!=0 recently.

More complex scenarios will also happen, of course, such as "if y==true then x will usually be 0, but otherwise x will be highly variable," or "if y==true then x is pseudo-constant and z is highly variable, but if y==false then it's the other way around," or "if y==false then z is usually the empty string," and so on. The predictor as I've simulated it is "too dumb to be intimidated" by the complexity. It should work reasonably well for all of those scenarios, assuming of course that its cache is large enough to remember the most frequent patterns.

Right now I'm using the plans' time since last use as the only eviction criterion when the cache overflows. There may be a better policy; the one Achilles heel of LRU is the "big loop," where every cache entry is used once, then evicted shortly before it is needed again.
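For what it's worth, the per-parameter predictor behind steps 2-4 can be pictured as a classic saturating confidence counter plus a last-seen value, straight out of branch prediction. A sketch, again with invented names, and with counter width and threshold chosen as examples rather than necessarily what my simulation uses:

```python
# Illustrative two-bit saturating confidence counter per parameter,
# borrowed from branch prediction.  The width (0..3) and threshold (2)
# here are examples, not necessarily the simulated values.
class ParamPredictor:
    MAX_CONF = 3     # counter saturates at 3 (two bits)
    THRESHOLD = 2    # "confident" once the counter has reached this

    def __init__(self):
        self.last_value = None
        self.confidence = 0

    def observe(self, value):
        """Feed one call's argument.  Returns True iff the predictor was
        already confident AND correctly predicted this value."""
        if value == self.last_value:
            was_confident = self.confidence >= self.THRESHOLD
            self.confidence = min(self.confidence + 1, self.MAX_CONF)
            return was_confident
        # Mispredict: lose confidence; only replace the remembered value
        # once confidence has drained away completely.
        self.confidence = max(self.confidence - 1, 0)
        if self.confidence == 0:
            self.last_value = value
        return False

# x is always 0: the predictor locks on after a few calls...
px = ParamPredictor()
assert [px.observe(0) for _ in range(5)] == [False, False, False, True, True]

# ...while a strictly alternating y never builds confidence, so no plan
# ever gets y baked in as a pseudo-constant from this pattern alone.
py = ParamPredictor()
assert not any(py.observe(n % 2 == 0) for n in range(10))
```

In the x==0 case, the first True from observe() is the moment a new plan with x baked in as a constant would be generated.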
(If the loop is so big that entries are flushed long before they're needed again, well, then it's just a big job and you stop blaming LRU. :-)

> But if you have something working clearly that's not what you're doing. So
> what are you doing? Storing up a list of arguments seen for each parameter
> when executed and use the predictor bits to determine if any of those
> arguments are constants? Storing up a list of selectivity estimates?

The former. I'm keeping a single predictor with a single "more or less last-seen value" per parameter, plus a checklist of pseudo-constants for every cached plan. It's pretty simple, really, with no cost functions or spanning trees or other intelligent logic--and certainly nothing original. Which is what makes me cautiously optimistic: it's not so hard to come up with good or original ideas, but ones that are good *and* original are exceedingly rare. :)

This particular algorithm is based entirely on standard processor-architecture tricks. One (to me) surprising lesson I learned in that field was that simple, dynamic schemes with obvious flaws are often more effective than what the combination of a smart programmer, an expert user, and an aggressive compiler can come up with beforehand. Another was that performance breakthroughs often come from applying these standard tricks to problems you'd think they'd already been applied to.

Jeroen

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly