Hi Tatsuo,

> BTW, I was thinking about cases where same DEFINE variable appears
> twice or more in PATTERN for a same row. For example PATTERN (A|A).
> But in this case it would be optimized out to (A). So we don't need
> to worry about A appearing twice. So our cost model is correct in
> this case.

Right, and I think the cost model holds even a bit more broadly than the (A|A) case. (A|A) does collapse to (A), but a variable can still appear more than once in patterns that do not collapse, e.g. PATTERN (A B | A C), or A A, or A+. In those cases the model is still correct, for a slightly different reason than the (A|A) rewrite: the executor evaluates each DEFINE predicate once per row, not once per PATTERN occurrence.

For each row it evaluates every DEFINE once and keeps the boolean results in a varMatched array. When the same A appears at several positions in the pattern -- for example the A in each branch of (A B | A C), which are distinct states -- each position looks up varMatched[A], so the same entry is read more than once; but each read reuses the already-computed value rather than re-evaluating the predicate. So repetition in PATTERN never multiplies DEFINE evaluations, and charging once per DEFINE variable in the cost model matches what the executor actually does.

The related question runs the other way: today we evaluate every DEFINE for a row eagerly, not just the ones that row actually needs. For example, in PATTERN (A B C D) a single match walks the sequence one variable per row, so each row only needs to test the single variable its state expects, yet we still evaluate A, B, C, and D at every row. That is what the short-circuit / lazy DEFINE evaluation Jian raised on 2026-05-26 would address, using that very (A B C D) example (evaluate a predicate only the first time a state tests it). If we ever adopt it, the cost model's premise (every DEFINE once per row) would change with it, so the two are tied together.

There's also a soundness angle that argues for keeping it separate. DEFINE already forbids volatile functions and sequence operations (nextval), so the obvious non-deterministic cases are out. The wrinkle lazy evaluation adds is that a predicate would then be evaluated zero or one times per row, skipped whenever no state reaches it, rather than always exactly once.
Whether that is safe for a predicate carrying some state-affecting behavior the volatility ban does not exclude is something I haven't worked through, so it isn't a call I'd want to make lightly under the current review.

As we discussed, that one is best left as a separate series after the initial commit, and since it was Jian's idea I'd be glad to see him drive it. For now I'd keep it out of the in-flight review so the commit stays small.

Best regards,
Henson
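
P.S. To make the once-per-row point concrete, here is a toy model in Python. It is not the executor code: count_define_evals, the predicates, and the flat list of pattern positions are all made up for illustration, and only the varMatched idea is taken from the description above. It also reads every position at every row, which overstates the lookups a real match would do, but it is enough to show that DEFINE evaluations do not grow with the number of PATTERN occurrences.

```python
from typing import Callable, Dict, List, Tuple

Predicate = Callable[[int], bool]


def count_define_evals(
    defines: Dict[str, Predicate],
    positions: List[str],
    rows: List[int],
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Toy model (hypothetical): eager, once-per-row DEFINE evaluation.

    Returns (evaluations per variable, cached lookups per variable).
    """
    evals = {name: 0 for name in defines}
    lookups = {name: 0 for name in defines}

    for row in rows:
        # Evaluate every DEFINE predicate exactly once for this row and
        # cache the boolean result, mirroring the varMatched array.
        var_matched: Dict[str, bool] = {}
        for name, predicate in defines.items():
            var_matched[name] = predicate(row)
            evals[name] += 1

        # Each pattern position only reads the cached value; a variable
        # that occurs at two positions is read twice, evaluated once.
        for name in positions:
            _ = var_matched[name]
            lookups[name] += 1

    return evals, lookups


if __name__ == "__main__":
    # Assumed stand-ins for DEFINE A, B, C over a toy integer column.
    defines = {
        "A": lambda v: v > 0,
        "B": lambda v: v % 2 == 0,
        "C": lambda v: v % 2 != 0,
    }
    # PATTERN (A B | A C): A occurs at two distinct positions.
    positions = ["A", "B", "A", "C"]
    evals, lookups = count_define_evals(defines, positions, [3, 4, 5])
    print("evaluations:", evals)
    print("lookups:    ", lookups)
```

With three rows, A is looked up twice per row but evaluated once per row, the same as B and C, which is why charging once per DEFINE variable stays accurate here.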
