Re: Row pattern recognition

Henson Choi Sat, 04 Jul 2026 02:00:09 -0700

Hi Tatsuo,

> BTW, I was thinking about cases where same DEFINE variable appears
> twice or more in PATTERN for a same row. For example PATTERN (A|A).
> But in this case it would be optimized out to (A). So we don't need
> to worry about A appearing twice. So our cost model is correct in
> this case.


Right, and I think the cost model holds even a bit more broadly than
the (A|A) case.  (A|A) does collapse to (A), but a variable can still
appear more than once in patterns that do not collapse, e.g. PATTERN
(A B | A C), or A A, or A+.

In those cases the model is still correct, for a slightly different
reason than the (A|A) rewrite: the executor evaluates each DEFINE
predicate once per row, not once per PATTERN occurrence.  For each
row it evaluates every DEFINE once and keeps the boolean results in
a varMatched array.

When the same A appears at several positions in the pattern -- for
example the A in each branch of (A B | A C), which are distinct
states -- each position looks up varMatched[A], so the same entry is
read more than once.  Each read reuses the already-computed value
rather than re-evaluating the predicate.  So repetition in PATTERN
never multiplies DEFINE evaluations, and charging once per DEFINE
variable in the cost model matches what the executor actually does.
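
As a rough illustration of that counting, here is a small Python
sketch.  It is not the executor code: DEFINES, eval_row, POSITIONS and
the counters are made-up names, and looping over every pattern
position on every row is a simplification (a real matcher would only
consult the positions its active states expect).  It only shows that
reads of the per-row table can outnumber predicate evaluations.

```python
from collections import Counter

# Hypothetical DEFINE predicates, keyed by variable name.
DEFINES = {
    "A": lambda row: row["price"] > 100,
    "B": lambda row: row["price"] < 50,
    "C": lambda row: row["price"] == 75,
}

eval_count = Counter()  # how many times each predicate was actually run
reads = Counter()       # how many times each table entry was looked up


def eval_row(row):
    """Evaluate every DEFINE once for this row and return the results."""
    var_matched = {}
    for name, pred in DEFINES.items():
        eval_count[name] += 1
        var_matched[name] = pred(row)
    return var_matched


# Pattern positions for (A B | A C): A sits at two distinct positions.
POSITIONS = ["A", "B", "A", "C"]

rows = [{"price": 120}, {"price": 40}, {"price": 75}]
for row in rows:
    var_matched = eval_row(row)
    for var in POSITIONS:
        reads[var] += 1
        _ = var_matched[var]  # a lookup only, never a re-evaluation

print(dict(eval_count))  # {'A': 3, 'B': 3, 'C': 3}
print(dict(reads))       # {'A': 6, 'B': 3, 'C': 3}
```

A is read twice per row but evaluated once per row, which is the
quantity the cost model charges for.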

The related issue runs the other way: today we evaluate every DEFINE
for a row eagerly, not just the ones that row actually needs.  For
example, in PATTERN (A B C D) a single match walks the sequence one
variable per row, so each row only needs to test the single variable
its state expects, yet we still evaluate A, B, C, and D at every row.

That is the short-circuit / lazy DEFINE evaluation Jian raised on
2026-05-26 using that very (A B C D) example (evaluate a predicate
only the first time a state tests it).  If we ever adopt it, the
cost model's premise -- every DEFINE once per row -- would change
with it, so the two are tied together.
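
To put rough numbers on the difference, here is a toy Python sketch
comparing eager and lazy evaluation for PATTERN (A B C D).  The
function count_evaluations and its arguments are hypothetical, and the
model is deliberately simplified: a single match attempt starting at
the first row, no alternation or quantifiers, and no restart on
failure.  It is not how the patch is structured, only a way to see how
the "every DEFINE once per row" premise would shift.

```python
def count_evaluations(rows, pattern, defines, lazy):
    """Return (predicate evaluations, whether the whole pattern matched)."""
    evaluations = 0
    state = 0  # index of the pattern variable the single match expects next

    for row in rows:
        cache = {}  # per-row results, so a predicate runs at most once per row

        def test(var):
            nonlocal evaluations
            if var not in cache:
                evaluations += 1
                cache[var] = defines[var](row)
            return cache[var]

        if not lazy:
            # Eager: evaluate every DEFINE for the row up front.
            for var in defines:
                test(var)

        # The match only needs the one variable its state expects.
        if state < len(pattern) and test(pattern[state]):
            state += 1

    return evaluations, state == len(pattern)


defines = {
    "A": lambda v: v == 1,
    "B": lambda v: v == 2,
    "C": lambda v: v == 3,
    "D": lambda v: v == 4,
}
pattern = ["A", "B", "C", "D"]
rows = [1, 2, 3, 4]

print(count_evaluations(rows, pattern, defines, lazy=False))  # (16, True)
print(count_evaluations(rows, pattern, defines, lazy=True))   # (4, True)
```

In this toy case eager evaluation runs 4 predicates on each of 4 rows,
while lazy evaluation runs one per row, so a cost model that charges
every DEFINE once per row would overestimate under the lazy scheme.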

There's also a soundness angle that argues for keeping it separate.
DEFINE already forbids volatile functions and sequence operations
(nextval), so the obvious non-deterministic cases are out.  The
wrinkle lazy evaluation adds is that a predicate would then be
evaluated zero or one times per row -- skipped whenever no state
reaches it -- rather than always.  Whether that is safe for a
predicate carrying some state-affecting behavior the volatility ban
does not exclude is something I haven't worked through, so it isn't a
call I'd want to make lightly during the current review.

As we discussed, that one is best left as a separate series after
the initial commit, and since it was Jian's idea I'd be glad to see
him drive it.  For now I'd keep it out of the in-flight review so the
commit stays small.

Best regards,
Henson
