Hi jian,

Thanks for the refactor.  I went through it change by change -- the
per-file notes are at the bottom.

A note up front, so the notes don't read as dismissive: at this stage I'd
rather prioritize validated code over better code.  The branch has a lot of
coverage and Valgrind behind it, and every change -- even a clean,
equivalent one -- moves a line out from under that.  So where I say "drop",
it means "not now", not "not good"; most of these would be welcome as a
follow-up.  One piece, though, does need a fix before any of it lands.

That piece is the new guard in computeAbsorbability(): it carries a
behavior change that no test covers.

> Move isAbsorbable initialization from finalizeRPRPattern() into
> makeRPRPattern(), and guard the computeAbsorbabilityRecursive() call in
> computeAbsorbability() so it is only entered when the first element is
> an ALT or has an unbounded quantifier.

First, what "absorbable" means here: the first unbounded repetition of a
fixed-length body reachable from the start of the pattern.  This rule is
worth writing down -- in README.rpr and in the comment on
computeAbsorbability() (or wherever fits best) -- so it is stated once,
explicitly (patch attached: nocfbot-absorb-definition-doc).

By that definition the leading A+ in (A+ B){2,3} is absorbable, but the
patch drops it.

The offending check is in computeAbsorbability():

    if (RPRElemIsAlt(&pattern->elements[0]) ||
        pattern->elements[0].max == RPR_QUANTITY_INF)
        computeAbsorbabilityRecursive(pattern, 0, &hasAbsorbable);

It looks only at elements[0], so a bounded outer group's BEGIN (finite max)
fails the check and the recursion that would descend to A+ never runs.

I'm attaching a small test for this (nocfbot-absorb-bounded-outer-test): it
passes on the branch and fails with v50-0001 applied, pinning both the a+"
marker and the absorbed-context count.

For the rest, change by change -- "ok" = take as-is, "drop" = leave out:

src/backend/optimizer/plan/rpr.c

The branch is already validated under coverage and Valgrind, so I'd rather
not re-open verified logic for equivalent refactors -- glad to take those
as a follow-up.  In file order:

  ok    palloc_array / palloc0_array in makeRPRPattern()
      result->varNames = palloc_array(char *, numVars);
      result->elements = palloc0_array(RPRPatternElement, numElements);

  ok    move the isAbsorbable = false init into makeRPRPattern()

  drop  rename beginElem -> elem in fillRPRPatternGroup()
        only beginElem was renamed; with endElem kept, the begin/end pair is
        no longer symmetric (elem / endElem)

  drop  rename isFixedLengthChildren -> isFixedQuantifier -- keep the old
name
        "fixed-length" is the term everywhere else (README.rpr, the nearby
        comments, even this function's own body), and it drops the
"children"
        that flags a whole-subtree check

  drop  the jump-1 traversal rewrite in that function -- equivalent
refactor,
        defer to a follow-up

  drop  rename to start_with_unbounded_quantifier -- keep isUnboundedStart
    Lone snake_case among camelCase neighbors -- including
isFixedQuantifier,
    renamed in this same patch -- and it reads as a predicate though it
sets flags.

  drop  start_with_*'s new signature + if/else restructure -- equivalent
        refactor; keep the original isUnboundedStart body

  drop  rename branchFirst -> branchElement -- cosmetic churn, no behavior
change

  drop  the new computeAbsorbability() guard -- the regression above
      if (RPRElemIsAlt(&pattern->elements[0]) ||
          pattern->elements[0].max == RPR_QUANTITY_INF)

src/include/nodes/plannodes.h

  ok    drop the redundant /* T_RPRPattern */ on the type field
      NodeTag type;

  drop  reworded "see isUnboundedStart" block comment -- keep the original
        (it still matches now that isUnboundedStart stays)

  ok    drop the inline isAbsorbable comment -- the block comment above
covers it
      bool isAbsorbable;

src/backend/executor/README.rpr

  ok    spell out the skip mode -- a clear doc fix on its own
      (1) SKIP PAST LAST ROW (not SKIP TO NEXT ROW)

  drop  isUnboundedStart -> start_with_unbounded_quantifier -- follows the
rename

Tatsuo, what's your read on all this -- in particular, hold the equivalent
refactors until after commit, or take them now?

Best regards,
Henson
diff --git a/src/test/regress/expected/rpr_explain.out 
b/src/test/regress/expected/rpr_explain.out
index d8343c9accc..11c51d6f764 100644
--- a/src/test/regress/expected/rpr_explain.out
+++ b/src/test/regress/expected/rpr_explain.out
@@ -906,6 +906,50 @@ WINDOW w AS (
    ->  Function Scan on generate_series s (actual rows=50.00 loops=1)
 (10 rows)
 
+-- Context absorption with a bounded outer group whose body starts with A+
+-- The outer count {2,3} is bounded, but A+ is still the first unbounded 
portion
+-- of the pattern, so it remains the absorption comparison point (a+ keeps the 
"
+-- marker and contexts are absorbed).  A bounded outer group must not, by 
itself,
+-- disable absorption of the leading unbounded child.
+CREATE VIEW rpr_ev_ctx_absorb_bounded_outer AS
+SELECT count(*) OVER w
+FROM generate_series(1, 50) AS s(v)
+WINDOW w AS (
+    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+    AFTER MATCH SKIP PAST LAST ROW
+    PATTERN ((A+ B){2,3})
+    DEFINE A AS v % 5 <> 0, B AS v % 5 = 0
+);
+SELECT line FROM 
unnest(string_to_array(pg_get_viewdef('rpr_ev_ctx_absorb_bounded_outer'), 
E'\n')) AS line WHERE line ~ 'PATTERN';
+           line           
+--------------------------
+   PATTERN ((a+ b){2,3}) 
+(1 row)
+
+SELECT rpr_explain_filter('
+EXPLAIN (ANALYZE, BUFFERS OFF, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT count(*) OVER w
+FROM generate_series(1, 50) AS s(v)
+WINDOW w AS (
+    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+    AFTER MATCH SKIP PAST LAST ROW
+    PATTERN ((A+ B){2,3})
+    DEFINE A AS v % 5 <> 0, B AS v % 5 = 0
+);');
+                          rpr_explain_filter                          
+----------------------------------------------------------------------
+ WindowAgg (actual rows=50.00 loops=1)
+   Window: w AS (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
+   Pattern: (a+" b){2,3}
+   Nav Mark Lookback: 0
+   Storage: Memory  Maximum Storage: NkB
+   NFA States: 6 peak, 118 total, 0 merged
+   NFA Contexts: 3 peak, 51 total, 4 pruned
+   NFA: 3 matched (len 15/15/15.0), 1 mismatched (len 5/5/5.0)
+   NFA: 30 absorbed (len 1/1/1.0), 12 skipped (len 1/5/3.0)
+   ->  Function Scan on generate_series s (actual rows=50.00 loops=1)
+(10 rows)
+
 -- Contexts skipped by SKIP PAST LAST ROW
 CREATE VIEW rpr_ev_ctx_skip AS
 SELECT count(*) OVER w
diff --git a/src/test/regress/sql/rpr_explain.sql 
b/src/test/regress/sql/rpr_explain.sql
index 3cd94af6bb8..5a9c83e7886 100644
--- a/src/test/regress/sql/rpr_explain.sql
+++ b/src/test/regress/sql/rpr_explain.sql
@@ -570,6 +570,32 @@ WINDOW w AS (
     DEFINE A AS v % 5 <> 0, B AS v % 5 = 0
 );');
 
+-- Context absorption with a bounded outer group whose body starts with A+
+-- The outer count {2,3} is bounded, but A+ is still the first unbounded 
portion
+-- of the pattern, so it remains the absorption comparison point (a+ keeps the 
"
+-- marker and contexts are absorbed).  A bounded outer group must not, by 
itself,
+-- disable absorption of the leading unbounded child.
+CREATE VIEW rpr_ev_ctx_absorb_bounded_outer AS
+SELECT count(*) OVER w
+FROM generate_series(1, 50) AS s(v)
+WINDOW w AS (
+    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+    AFTER MATCH SKIP PAST LAST ROW
+    PATTERN ((A+ B){2,3})
+    DEFINE A AS v % 5 <> 0, B AS v % 5 = 0
+);
+SELECT line FROM 
unnest(string_to_array(pg_get_viewdef('rpr_ev_ctx_absorb_bounded_outer'), 
E'\n')) AS line WHERE line ~ 'PATTERN';
+SELECT rpr_explain_filter('
+EXPLAIN (ANALYZE, BUFFERS OFF, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT count(*) OVER w
+FROM generate_series(1, 50) AS s(v)
+WINDOW w AS (
+    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+    AFTER MATCH SKIP PAST LAST ROW
+    PATTERN ((A+ B){2,3})
+    DEFINE A AS v % 5 <> 0, B AS v % 5 = 0
+);');
+
 -- Contexts skipped by SKIP PAST LAST ROW
 CREATE VIEW rpr_ev_ctx_skip AS
 SELECT count(*) OVER w
diff --git a/src/backend/executor/README.rpr b/src/backend/executor/README.rpr
index 9275e265d4b..f5e90848458 100644
--- a/src/backend/executor/README.rpr
+++ b/src/backend/executor/README.rpr
@@ -407,6 +407,10 @@ IV-5. Absorbability Analysis (RPR_ELEM_ABSORBABLE)
 Context absorption is an optimization technique that reduces O(n^2) to O(n).
 (Runtime behavior is described in Chapter VIII.)
 
+Definition: a pattern is absorbable when the first unbounded repetition of a
+fixed-length body reachable from the start of the pattern exists; that
+repetition is the absorption comparison point.
+
 This phase determines whether the pattern has a structure suitable for the
 absorption optimization and sets flags on the relevant elements:
 
diff --git a/src/backend/optimizer/plan/rpr.c b/src/backend/optimizer/plan/rpr.c
index fac3d7bcf38..f2801ffcba3 100644
--- a/src/backend/optimizer/plan/rpr.c
+++ b/src/backend/optimizer/plan/rpr.c
@@ -1742,6 +1742,10 @@ computeAbsorbabilityRecursive(RPRPattern *pattern, 
RPRElemIdx startIdx,
  * newer contexts that cannot produce longer matches than older contexts.
  * This achieves O(n^2) -> O(n) performance improvement.
  *
+ * Definition: a pattern is absorbable when the first unbounded repetition of
+ * a fixed-length body reachable from the start of the pattern exists; that
+ * repetition is the absorption comparison point.
+ *
  * Only greedy unbounded quantifiers at pattern start can be absorbable.
  * Reluctant quantifiers are excluded because they don't maintain monotonic
  * decrease property required for safe absorption.

Reply via email to