Re: Row pattern recognition

Henson Choi Fri, 06 Mar 2026 15:19:00 -0800

Hi, Tatsuo

> Does "a zero-length match" mean "an empty match"?


Yes, they refer to the same thing.  "Zero-length match" is the more
common term in general regex implementations (PCRE2, Perl, Python,
Java, etc.[1]), but the RPR standard (ISO/IEC 19075-5, Section 4.12.2)
uses "empty match" exclusively.

[1] https://www.regular-expressions.info/zerolength.html
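
For illustration, here is a minimal Python sketch of what a zero-length
match looks like in an ordinary regex engine.  The pattern and input are
made up for this example, and it shows general regex behavior only; how
RPR treats an empty match is defined by the standard, not by this
snippet.

```python
import re

# "a*" may match zero characters, so the match succeeds at position 0
# even though the input contains no "a" at all.
m = re.match(r"a*", "bbb")

# A zero-length (empty) match: it succeeded, but start == end.
zero_length = m is not None and m.start() == m.end()

# For contrast, "a+" needs at least one character and fails here.
no_match = re.match(r"a+", "bbb") is None
```

The point is that "matched nothing" (a+ above) and "matched the empty
string" (a* above) are different outcomes, which is why the term
matters.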


> BTW, currently we place all nfa_* functions at the bottom of
> nodeWindowAgg.c.  However nodeWindowAgg.c in master branch places "API
> exposed to window functions" at the bottom of the file. Do you think
> we should follow the way?


Yes, we should follow master's convention.  I see three options:

  (a) Reorder within nodeWindowAgg.c: move the nfa_* functions up and
      keep the "API exposed to window functions" section at the bottom,
      matching master's layout.

  (b) Separate file under src/backend/executor/, keeping it close to
      nodeWindowAgg.c while making the boundary explicit.

  (c) A dedicated src/backend/rpr/ directory modeled on
      src/backend/regex/, giving the NFA engine its own namespace.
      This could also be an opportunity to consolidate the existing
      src/backend/optimizer/plan/rpr.c into the same directory.

For now (a) is the safest change.  Longer term, (b) or (c) would make
more sense -- especially when we extend to MATCH_RECOGNIZE (R010),
where the NFA engine will need to be shared across both code paths.
Either way, the NFA engine can be exposed via a header so that R010
can share it without further restructuring.

Since the NFA algorithm is not familiar territory for most DBMS
developers, it would also be worth preserving the detailed algorithm
description posted earlier in this thread -- either as structured
comments or as a dedicated README alongside the code.

What do you think?  Should we start with (a) now and revisit the
broader restructuring approaches -- (b) or (c) -- later, or would you
prefer to discuss them first?  Either (b) or (c) would also resolve
the file layout convention issue naturally, since new files would
follow proper conventions from the start.


One more thing: there are no ECPG example programs or regression tests
for RPR yet.  I'd like to propose adding them.  Shall I draft an
initial set, or would you prefer to coordinate with the ECPG
maintainers first?


Best regards,
Henson
