Hi Ishii-san, Thank you for raising the frame boundary concern. You're right that different contexts have different frame boundaries.
2026년 1월 15일 (목) PM 10:28, Henson Choi <[email protected]>님이 작성: > Hi Ishii-san > > We need to check the *frame" end, not the partition end. >> >> I think your patch relies on !window_gettupleslot() to check whether >> the row exists. >> >> if (!window_gettupleslot(winobj, pos, slot)) >> return false; /* No row exists */ >> >> But the function only checks the row existence in the current partition: >> >> * Fetch the pos'th tuple of the current partition into the slot, >> >> Thus it is possible that window_gettupleslot() returns true but the >> row is not in the current frame in case that the partition is divided >> into some frames. You need to check the row existence in a frame. For >> this purpose you can use row_is_in_frame(). >> > Your suggestion was to use row_is_in_frame() to check frame boundaries for each row. However, I took a simpler approach: just disable context absorption entirely when the frame is limited. The root cause is that context absorption assumes all contexts see the same rows. With limited frames, each context starting at a different row has a different visible range, so we cannot absorb one context into another. As I mentioned before, I think we should use absorption conservatively for now - only in cases where it's clearly safe (e.g., A* at the beginning of the pattern). We can extend it later through more in-depth research. The fix: if (winstate->rpSkipTo == ST_PAST_LAST_ROW && !hasLimitedFrame) nfa_absorb_contexts(winstate, targetCtx, currentPos); I added a test case to verify this: -- Pattern: A+ B, Data: A(0) A(1) A(2) B(3), Frame: CURRENT ROW AND 2 FOLLOWING -- Row 0: frame [0,2], cannot see B at row 3 -> no match -- Row 1: frame [1,3], can see A A B -> should match rows 1-3 WITH frame_absorb_test AS ( SELECT * FROM (VALUES (0, 'A'), (1, 'A'), (2, 'A'), (3, 'B') ) AS t(id, flag) ) SELECT id, flag, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end FROM frame_absorb_test WINDOW w AS ( ORDER BY id ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING AFTER MATCH SKIP PAST LAST ROW PATTERN (A+ B) DEFINE A AS flag = 'A', B AS flag = 'B' ); Before fix (absorption incorrectly absorbs row 1's context into row 0's): id | flag | match_start | match_end ----+------+-------------+----------- 0 | A | | 1 | A | | 2 | A | | 3 | B | | After fix (row 1 correctly matches): id | flag | match_start | match_end ----+------+-------------+----------- 0 | A | | 1 | A | 1 | 3 2 | A | | 3 | B | | Note that for SKIP TO NEXT ROW, absorption is already disabled, so your test case with SKIP TO NEXT ROW should work correctly. I'm attaching three related commits: 1. Row pattern recognition: Improve NFA state memory management This commit refactors NFA state management for better readability and maintainability. 2. Row pattern recognition: Disable context absorption for limited frame This commit further restricts absorption: even with SKIP PAST LAST ROW, absorption is disabled when the frame is limited. With limited frames, different contexts have different frame boundaries, making absorption unsafe. > >> I ran following query and got the result with v38 (which includes your >> NFA patches). >> >> WITH data AS ( >> SELECT * FROM (VALUES >> ('A', 1), ('A', 2), >> ('B', 3), ('B', 4) >> ) AS t(gid, id)) >> SELECT gid, id, array_agg(id) OVER w >> FROM data >> WINDOW w AS ( >> PARTITION BY gid >> ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING >> AFTER MATCH SKIP TO NEXT ROW >> PATTERN (A+) >> DEFINE A AS id < 10 >> ); >> gid | id | array_agg >> -----+----+----------- >> A | 1 | {1,2} >> A | 2 | >> B | 3 | {3,4} >> B | 4 | >> (4 rows) >> >> I think the second and 4th rows are expected to return some data in >> array_agg colum. In fact v37 patch returns following results for the >> same query: >> >> gid | id | array_agg >> -----+----+----------- >> A | 1 | {1,2} >> A | 2 | {2} >> B | 3 | {3,4} >> B | 4 | {4} >> (4 rows) >> > > While investigating the issue you mentioned, I found that the problem > is caused by the context absorption logic, not the frame boundary check. > > The absorption logic removes a newer context when > older->matchEndRow >= newer->matchEndRow: > > Example with pattern A+: > - Context at row 1: matchEndRow = 2 (match {1,2}) > - Context at row 2: matchEndRow = 2 (match {2}) > → Newer context absorbed, row 2's result lost > > With SKIP TO NEXT ROW, each row should produce its own independent match. > The absorption was incorrectly removing these valid overlapping matches. > > I modified the absorption logic to only apply to SKIP PAST LAST ROW mode, > where it provides performance benefits without affecting correctness. > > For SKIP TO NEXT ROW, absorption is now disabled, allowing each row > to produce independent overlapping matches as expected: > > gid | id | array_agg > -----+----+----------- > A | 1 | {1,2} > A | 2 | {2} > B | 3 | {3,4} > B | 4 | {4} > (4 rows) > > I'm attaching 3 patches: > > 1. Test cases from Jacob Champion's branch > - Added 355 lines of test cases and 51 lines of executor changes > > 2. Refactored update_reduced_frame for readability and performance > - Extracted large code blocks into separate functions > - Early return for already-processed positions > - Integrated nfa_unlink_context into nfa_context_free > - Used oldest-first context ordering for early termination > > 3. Enable context absorption only for SKIP PAST LAST ROW > - Fixes overlapping match issue for SKIP TO NEXT ROW > - In my 5000-row test, absorption showed significant performance gain > for SKIP PAST LAST ROW > - Added your test case to regression tests > > Since I implemented it in the order of parser/planner/nfa/executor, > the executor flow became somewhat confusing. > I plan to continue improving it by reviewing and refactoring > from the top-level functions downward. > > During refactoring, I plan to simplify pattern optimization and context > absorption logic, keeping only the parts that are clearly correct, even > if this means some performance loss in the initial version. We can add > optimizations back later once the core logic is stable and well-tested. > I'm now reviewing the remaining parts: pattern optimization in planner, absorption function, and step function. Once the review is complete, I'll send the rest of the patches. Best regards, Henson
From 17d96417d053dced68f3fb1168bbbf77ec5486a5 Mon Sep 17 00:00:00 2001 From: Henson Choi <[email protected]> Date: Thu, 15 Jan 2026 21:05:04 +0900 Subject: [PATCH] Row pattern recognition: Refactor update_reduced_frame for readability and performance --- src/backend/executor/nodeWindowAgg.c | 299 ++++++++++++--------------- 1 file changed, 132 insertions(+), 167 deletions(-) diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c index d56e2e67170..3fc73db4e59 100644 --- a/src/backend/executor/nodeWindowAgg.c +++ b/src/backend/executor/nodeWindowAgg.c @@ -263,15 +263,17 @@ static void nfa_state_free_list(WindowAggState *winstate, RPRNFAState *list); static RPRNFAState *nfa_state_clone(WindowAggState *winstate, int16 elemIdx, int16 altPriority, int16 *counts, RPRNFAState *list); -static bool nfa_evaluate_row(WindowObject winobj, int64 pos, bool *varMatched, bool *anyMatch); +static bool nfa_evaluate_row(WindowObject winobj, int64 pos, bool *varMatched); static RPRNFAContext *nfa_context_alloc(WindowAggState *winstate); static void nfa_unlink_context(WindowAggState *winstate, RPRNFAContext *ctx); static void nfa_context_free(WindowAggState *winstate, RPRNFAContext *ctx); -static void nfa_start_context(WindowAggState *winstate, int64 startPos); +static RPRNFAContext *nfa_start_context(WindowAggState *winstate, int64 startPos); +static void nfa_step(WindowAggState *winstate, RPRNFAContext *ctx, + bool *varMatched, int64 pos); +static void nfa_process_context(WindowAggState *winstate, RPRNFAContext *ctx, + int64 currentPos, bool hasLimitedFrame, int64 frameOffset); static void nfa_step_single(WindowAggState *winstate, RPRNFAContext *ctx, RPRNFAState *state, bool *varMatched, int64 currentPos); -static void nfa_finalize_boundary(WindowAggState *winstate, RPRNFAContext *ctx, - int64 matchEndPos); static RPRNFAContext *nfa_find_context_for_pos(WindowAggState *winstate, int64 pos); static void nfa_remove_contexts_up_to(WindowAggState *winstate, int64 endPos); static void nfa_absorb_contexts(WindowAggState *winstate, RPRNFAContext *excludeCtx, int64 currentPos); @@ -4713,8 +4715,6 @@ update_reduced_frame(WindowObject winobj, int64 pos) { WindowAggState *winstate = winobj->winstate; RPRNFAContext *targetCtx; - RPRNFAContext *firstCtx; - int64 matchLength = 0; int64 currentPos; int64 startPos; int frameOptions = winstate->frameOptions; @@ -4733,21 +4733,14 @@ update_reduced_frame(WindowObject winobj, int64 pos) /* * Case 1: pos is before any existing context's start position. * This means the position was already processed and determined unmatched. - * Note: contexts are added at head with increasing positions, so we need - * to find the minimum matchStartRow (the oldest context). + * Head is the oldest context (lowest matchStartRow) since contexts are + * added at tail with increasing positions. */ + if (winstate->nfaContext != NULL && + pos < winstate->nfaContext->matchStartRow) { - int64 minStartRow = INT64_MAX; - for (firstCtx = winstate->nfaContext; firstCtx != NULL; firstCtx = firstCtx->next) - { - if (firstCtx->matchStartRow < minStartRow) - minStartRow = firstCtx->matchStartRow; - } - if (minStartRow != INT64_MAX && pos < minStartRow) - { - register_reduced_frame_map(winstate, pos, RF_UNMATCHED); - return; - } + register_reduced_frame_map(winstate, pos, RF_UNMATCHED); + return; } /* @@ -4756,14 +4749,30 @@ update_reduced_frame(WindowObject winobj, int64 pos) targetCtx = nfa_find_context_for_pos(winstate, pos); if (targetCtx == NULL) { - /* No context exists - create one and start fresh */ - nfa_start_context(winstate, pos); - targetCtx = winstate->nfaContext; + /* + * No context exists. If pos is already processed, it means this row + * was already determined to be unmatched or skipped - no need to + * reprocess. + */ + if (pos <= winstate->nfaLastProcessedRow) + { + register_reduced_frame_map(winstate, pos, RF_UNMATCHED); + return; + } + /* Not yet processed - create new context and start fresh */ + targetCtx = nfa_start_context(winstate, pos); + } + else if (targetCtx->states == NULL) + { + /* Context already completed - skip to result registration */ + goto register_result; } /* * Determine where to start processing. - * If we've already evaluated rows beyond pos, continue from there. + * Usually nfaLastProcessedRow+1 >= pos since contexts are created at + * currentPos+1 during processing. However, pos can exceed this when + * rows are skipped (e.g., unmatched rows don't update nfaLastProcessedRow). */ startPos = Max(pos, winstate->nfaLastProcessedRow + 1); @@ -4774,11 +4783,10 @@ update_reduced_frame(WindowObject winobj, int64 pos) for (currentPos = startPos; targetCtx->states != NULL; currentPos++) { bool rowExists; - bool anyMatch; RPRNFAContext *ctx; /* Evaluate variables for this row - done only once, shared by all contexts */ - rowExists = nfa_evaluate_row(winobj, currentPos, winstate->nfaVarMatched, &anyMatch); + rowExists = nfa_evaluate_row(winobj, currentPos, winstate->nfaVarMatched); /* No more rows in partition? Finalize all contexts */ if (!rowExists) @@ -4786,7 +4794,7 @@ update_reduced_frame(WindowObject winobj, int64 pos) for (ctx = winstate->nfaContext; ctx != NULL; ctx = ctx->next) { if (ctx->states != NULL) - nfa_finalize_boundary(winstate, ctx, currentPos - 1); + nfa_step(winstate, ctx, NULL, currentPos - 1); } /* Absorb completed contexts at partition boundary */ nfa_absorb_contexts(winstate, NULL, currentPos - 1); @@ -4796,69 +4804,15 @@ update_reduced_frame(WindowObject winobj, int64 pos) /* Update last processed row */ winstate->nfaLastProcessedRow = currentPos; - /* - * Process each active context with this row's evaluation results. - * Each context has its own frame boundary based on matchStartRow. - */ + /* Process each active context with this row's evaluation results */ for (ctx = winstate->nfaContext; ctx != NULL; ctx = ctx->next) - { - int64 ctxFrameEnd; - - /* Skip already-completed contexts */ - if (ctx->states == NULL) - continue; - - /* - * Calculate per-context frame end. - * For "ROWS BETWEEN CURRENT ROW AND N FOLLOWING", each context's - * frame end is matchStartRow + offset + 1 (exclusive). - */ - if (hasLimitedFrame) - { - ctxFrameEnd = ctx->matchStartRow + frameOffset + 1; - if (currentPos >= ctxFrameEnd) - { - nfa_finalize_boundary(winstate, ctx, ctxFrameEnd - 1); - continue; - } - } - - /* First row of this context must match at least one variable */ - if (currentPos == ctx->matchStartRow && !anyMatch) - { - /* Clear states to mark as unmatched */ - nfa_state_free_list(winstate, ctx->states); - ctx->states = NULL; - continue; - } - - /* Skip if this row is before context's start */ - if (currentPos < ctx->matchStartRow) - continue; - - /* Process states for this context */ - { - RPRNFAState *states = ctx->states; - RPRNFAState *state; - RPRNFAState *nextState; - - ctx->states = NULL; - - for (state = states; state != NULL; state = nextState) - { - nextState = state->next; - state->next = NULL; - nfa_step_single(winstate, ctx, state, winstate->nfaVarMatched, currentPos); - } - } - } + nfa_process_context(winstate, ctx, currentPos, hasLimitedFrame, frameOffset); /* * Create a new context for the next potential start position. * This enables overlapping match detection for SKIP TO NEXT ROW. */ - if (anyMatch) - nfa_start_context(winstate, currentPos + 1); + nfa_start_context(winstate, currentPos + 1); /* * Absorb redundant contexts. @@ -4873,23 +4827,20 @@ update_reduced_frame(WindowObject winobj, int64 pos) break; } - /* - * Get match result from target context. - */ - if (targetCtx->matchEndRow >= pos) - matchLength = targetCtx->matchEndRow - pos + 1; +register_result: + Assert(pos == targetCtx->matchStartRow); /* * Register reduced frame map based on match result. */ - if (matchLength <= 0) + if (targetCtx->matchEndRow < targetCtx->matchStartRow) { - register_reduced_frame_map(winstate, pos, RF_UNMATCHED); + register_reduced_frame_map(winstate, targetCtx->matchStartRow, RF_UNMATCHED); } else { - register_reduced_frame_map(winstate, pos, RF_FRAME_HEAD); - for (int64 i = pos + 1; i < pos + matchLength; i++) + register_reduced_frame_map(winstate, targetCtx->matchStartRow, RF_FRAME_HEAD); + for (int64 i = targetCtx->matchStartRow + 1; i <= targetCtx->matchEndRow; i++) { register_reduced_frame_map(winstate, i, RF_SKIPPED); } @@ -4898,26 +4849,16 @@ update_reduced_frame(WindowObject winobj, int64 pos) /* * Cleanup contexts based on SKIP mode. */ - if (matchLength > 0 && winstate->rpSkipTo == ST_PAST_LAST_ROW) + if (targetCtx->matchEndRow >= targetCtx->matchStartRow && + winstate->rpSkipTo == ST_PAST_LAST_ROW) { /* Remove all contexts with start <= matchEnd */ - nfa_remove_contexts_up_to(winstate, pos + matchLength - 1); + nfa_remove_contexts_up_to(winstate, targetCtx->matchEndRow); } else { /* SKIP TO NEXT ROW or no match: just remove the target context */ - RPRNFAContext *ctx = winstate->nfaContext; - - while (ctx != NULL) - { - if (ctx == targetCtx) - { - nfa_unlink_context(winstate, ctx); - nfa_context_free(winstate, ctx); - break; - } - ctx = ctx->next; - } + nfa_context_free(winstate, targetCtx); } } @@ -5152,22 +5093,19 @@ nfa_free_matched_state(WindowAggState *winstate, RPRNFAState *state) * * Evaluate all DEFINE variables for current row. * Returns true if the row exists, false if out of partition. - * If row exists, fills varMatched array and sets *anyMatch if any variable matched. + * If row exists, fills varMatched array. * varMatched[i] = true if variable i matched at current row. */ static bool -nfa_evaluate_row(WindowObject winobj, int64 pos, bool *varMatched, bool *anyMatchOut) +nfa_evaluate_row(WindowObject winobj, int64 pos, bool *varMatched) { WindowAggState *winstate = winobj->winstate; ExprContext *econtext = winstate->ss.ps.ps_ExprContext; int numDefineVars = list_length(winstate->defineVariableList); ListCell *lc; int varIdx = 0; - bool anyMatch = false; TupleTableSlot *slot; - *anyMatchOut = false; - /* * Set up slots for current, previous, and next rows. * We don't call get_slots() here to avoid recursion through @@ -5208,22 +5146,13 @@ nfa_evaluate_row(WindowObject winobj, int64 pos, bool *varMatched, bool *anyMatc /* Evaluate DEFINE expression */ result = ExecEvalExpr(exprState, econtext, &isnull); - if (!isnull && DatumGetBool(result)) - { - varMatched[varIdx] = true; - anyMatch = true; - } - else - { - varMatched[varIdx] = false; - } + varMatched[varIdx] = (!isnull && DatumGetBool(result)); varIdx++; if (varIdx >= numDefineVars) break; } - *anyMatchOut = anyMatch; return true; /* Row exists */ } @@ -5286,12 +5215,15 @@ nfa_unlink_context(WindowAggState *winstate, RPRNFAContext *ctx) /* * nfa_context_free * - * Return a context to free list. Also frees any states in the context. - * Note: Caller must unlink context from active list first using nfa_unlink_context. + * Unlink context from active list and return it to free list. + * Also frees any states in the context. */ static void nfa_context_free(WindowAggState *winstate, RPRNFAContext *ctx) { + /* Unlink from active list first */ + nfa_unlink_context(winstate, ctx); + if (ctx->states != NULL) nfa_state_free_list(winstate, ctx->states); if (ctx->matchedState != NULL) @@ -5307,9 +5239,9 @@ nfa_context_free(WindowAggState *winstate, RPRNFAContext *ctx) * nfa_start_context * * Start a new match context at given position. - * Adds context to winstate->nfaContext list. + * Adds context to winstate->nfaContext list and returns the new context. */ -static void +static RPRNFAContext * nfa_start_context(WindowAggState *winstate, int64 startPos) { RPRNFAContext *ctx; @@ -5318,14 +5250,16 @@ nfa_start_context(WindowAggState *winstate, int64 startPos) ctx->matchStartRow = startPos; ctx->states = nfa_state_alloc(winstate); /* initial state at elem 0 */ - /* Add to head of active context list (doubly-linked) */ - ctx->next = winstate->nfaContext; - ctx->prev = NULL; - if (winstate->nfaContext != NULL) - winstate->nfaContext->prev = ctx; + /* Add to tail of active context list (doubly-linked, oldest-first) */ + ctx->prev = winstate->nfaContextTail; + ctx->next = NULL; + if (winstate->nfaContextTail != NULL) + winstate->nfaContextTail->next = ctx; else - winstate->nfaContextTail = ctx; /* first context becomes tail */ - winstate->nfaContext = ctx; + winstate->nfaContext = ctx; /* first context becomes head */ + winstate->nfaContextTail = ctx; + + return ctx; } /* @@ -5339,10 +5273,16 @@ nfa_find_context_for_pos(WindowAggState *winstate, int64 pos) { RPRNFAContext *ctx; + /* + * List is sorted by matchStartRow ascending (oldest/smallest at head). + * Stop early if we pass the target position. + */ for (ctx = winstate->nfaContext; ctx != NULL; ctx = ctx->next) { if (ctx->matchStartRow == pos) return ctx; + if (ctx->matchStartRow > pos) + break; /* won't find it, list is sorted */ } return NULL; } @@ -5359,17 +5299,13 @@ nfa_remove_contexts_up_to(WindowAggState *winstate, int64 endPos) RPRNFAContext *ctx; RPRNFAContext *next; - ctx = winstate->nfaContext; - while (ctx != NULL) + /* Contexts are sorted by matchStartRow ascending, so we can stop early */ + for (ctx = winstate->nfaContext; ctx != NULL; ctx = next) { next = ctx->next; - if (ctx->matchStartRow <= endPos) - { - /* Remove this context */ - nfa_unlink_context(winstate, ctx); - nfa_context_free(winstate, ctx); - } - ctx = next; + if (ctx->matchStartRow > endPos) + break; + nfa_context_free(winstate, ctx); } } @@ -5384,8 +5320,8 @@ nfa_remove_contexts_up_to(WindowAggState *winstate, int64 endPos) * - For unbounded quantifiers (max=INT32_MAX): older.counts >= newer.counts * - For bounded quantifiers: older.counts == newer.counts * - * Note: List is newest-first, so we check if later nodes (older contexts) - * can absorb earlier nodes (newer contexts). + * Note: List is oldest-first (head=oldest, tail=newest), so we start from + * tail (newest) and check if older contexts (via prev) can absorb it. */ static void nfa_absorb_contexts(WindowAggState *winstate, RPRNFAContext *excludeCtx, int64 currentPos) @@ -5421,14 +5357,14 @@ nfa_absorb_contexts(WindowAggState *winstate, RPRNFAContext *excludeCtx, int64 c * for all depths d * 3. For bounded elements: counts must be exactly equal at all depths * - * List is newest-first (head = highest matchStartRow). - * We iterate from head (newest) and check if older contexts can absorb it. + * List is oldest-first (head = lowest matchStartRow, tail = highest). + * We iterate from tail (newest) via prev and check if older contexts can absorb it. */ - ctx = winstate->nfaContext; + ctx = winstate->nfaContextTail; while (ctx != NULL) { - RPRNFAContext *nextCtx = ctx->next; + RPRNFAContext *nextCtx = ctx->prev; /* traverse toward older (head) */ RPRNFAContext *older; bool absorbed = false; @@ -5461,7 +5397,7 @@ nfa_absorb_contexts(WindowAggState *winstate, RPRNFAContext *excludeCtx, int64 c } /* Check if any older completed context can absorb this one */ - for (older = ctx->next; older != NULL && !absorbed; older = older->next) + for (older = ctx->prev; older != NULL && !absorbed; older = older->prev) { /* Must have started earlier */ if (older->matchStartRow >= ctx->matchStartRow) @@ -5478,7 +5414,6 @@ nfa_absorb_contexts(WindowAggState *winstate, RPRNFAContext *excludeCtx, int64 c if (older->matchEndRow >= ctx->matchEndRow) { /* Absorb: remove ctx (newer) */ - nfa_unlink_context(winstate, ctx); nfa_context_free(winstate, ctx); absorbed = true; } @@ -5518,9 +5453,9 @@ nfa_absorb_contexts(WindowAggState *winstate, RPRNFAContext *excludeCtx, int64 c /* * Check if any older context can absorb this one. - * Older contexts have lower matchStartRow. + * Older contexts have lower matchStartRow (toward head via prev). */ - for (older = ctx->next; older != NULL && !absorbed; older = older->next) + for (older = ctx->prev; older != NULL && !absorbed; older = older->prev) { RPRNFAState *laterState; RPRNFAState *earlierState; @@ -5616,7 +5551,6 @@ nfa_absorb_contexts(WindowAggState *winstate, RPRNFAContext *excludeCtx, int64 c if (canAbsorb) { /* Absorb: remove ctx (newer) */ - nfa_unlink_context(winstate, ctx); nfa_context_free(winstate, ctx); absorbed = true; } @@ -5627,32 +5561,57 @@ nfa_absorb_contexts(WindowAggState *winstate, RPRNFAContext *excludeCtx, int64 c } /* - * nfa_finalize_boundary + * nfa_step * - * Finalize NFA states at partition/frame boundary. - * Sets all varMatched to false and processes remaining states. + * Process all states in context through NFA for one row. + * Calls nfa_step_single for each state. */ static void -nfa_finalize_boundary(WindowAggState *winstate, RPRNFAContext *ctx, int64 matchEndPos) +nfa_step(WindowAggState *winstate, RPRNFAContext *ctx, bool *varMatched, int64 pos) { RPRNFAState *states = ctx->states; RPRNFAState *state; RPRNFAState *nextState; - int numVars = list_length(winstate->defineVariableList); ctx->states = NULL; - for (int i = 0; i < numVars; i++) - winstate->nfaVarMatched[i] = false; - for (state = states; state != NULL; state = nextState) { nextState = state->next; state->next = NULL; - nfa_step_single(winstate, ctx, state, winstate->nfaVarMatched, matchEndPos); + nfa_step_single(winstate, ctx, state, varMatched, pos); } } +/* + * nfa_process_context + * + * Process one context for the current row. + * Handles frame boundary check and NFA step. + */ +static void +nfa_process_context(WindowAggState *winstate, RPRNFAContext *ctx, + int64 currentPos, bool hasLimitedFrame, int64 frameOffset) +{ + /* Skip already-completed contexts */ + if (ctx->states == NULL) + return; + + /* Check frame boundary */ + if (hasLimitedFrame) + { + int64 ctxFrameEnd = ctx->matchStartRow + frameOffset + 1; + if (currentPos >= ctxFrameEnd) + { + nfa_step(winstate, ctx, NULL, ctxFrameEnd - 1); + return; + } + } + + /* Process states for this context */ + nfa_step(winstate, ctx, winstate->nfaVarMatched, currentPos); +} + /* * nfa_step_single * @@ -5688,15 +5647,21 @@ nfa_step_single(WindowAggState *winstate, RPRNFAContext *ctx, { /* * Variable: check if it matches current row. - * With DEFINE-first ordering, varId < numDefines has DEFINE expr, + * If varMatched is NULL (boundary), all variables are forced to false. + * Otherwise, varId < numDefines uses DEFINE expr, * varId >= numDefines defaults to TRUE. */ - int numDefines = list_length(winstate->defineVariableList); - - if (elem->varId >= numDefines) - matched = true; /* Not defined in DEFINE, defaults to TRUE */ + if (varMatched == NULL) + matched = false; else - matched = varMatched[elem->varId]; + { + int numDefines = list_length(winstate->defineVariableList); + + if (elem->varId >= numDefines) + matched = true; /* Not defined in DEFINE, defaults to TRUE */ + else + matched = varMatched[elem->varId]; + } count = state->counts[depth]; -- 2.50.1 (Apple Git-155)
From 25116ddcc61c3d71690b7fc65d60cb06809631bc Mon Sep 17 00:00:00 2001 From: Henson Choi <[email protected]> Date: Thu, 15 Jan 2026 12:52:36 +0900 Subject: [PATCH] Row pattern recognition: Test cases from Jacob Champion's branch --- src/backend/executor/nodeWindowAgg.c | 51 ++- src/backend/optimizer/plan/rpr.c | 4 +- src/test/regress/expected/rpr.out | 474 ++++++++++++++++++++++++++- src/test/regress/sql/rpr.sql | 355 +++++++++++++++++++- 4 files changed, 855 insertions(+), 29 deletions(-) diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c index cf43c1f6127..d56e2e67170 100644 --- a/src/backend/executor/nodeWindowAgg.c +++ b/src/backend/executor/nodeWindowAgg.c @@ -4788,6 +4788,8 @@ update_reduced_frame(WindowObject winobj, int64 pos) if (ctx->states != NULL) nfa_finalize_boundary(winstate, ctx, currentPos - 1); } + /* Absorb completed contexts at partition boundary */ + nfa_absorb_contexts(winstate, NULL, currentPos - 1); break; } @@ -5093,8 +5095,9 @@ nfa_state_clone(WindowAggState *winstate, int16 elemIdx, int16 altPriority, /* * nfa_add_matched_state * - * Record a matched state following SQL standard lexical order preference. - * Priority: lower altPriority wins (lexical order), then longer match. + * Record a matched state following SQL standard semantics. + * For greedy quantifiers, longer match wins. For alternation at the same + * match length, lexical order (lower altPriority) wins. */ static void nfa_add_matched_state(WindowAggState *winstate, RPRNFAContext *ctx, @@ -5109,13 +5112,13 @@ nfa_add_matched_state(WindowAggState *winstate, RPRNFAContext *ctx, } else if (state->altPriority < ctx->matchedState->altPriority) { - /* New state has better lexical order priority */ + /* Better lexical order always wins (SQL standard preference) */ shouldUpdate = true; } else if (state->altPriority == ctx->matchedState->altPriority && matchEndRow > ctx->matchEndRow) { - /* Same priority, but longer match */ + /* Same lexical order, longer match wins (greedy) */ shouldUpdate = true; } @@ -5864,14 +5867,42 @@ nfa_step_single(WindowAggState *winstate, RPRNFAContext *ctx, { /* * Between min and max - can exit or continue. - * Exit state continues processing (pending). - * Loop state waits for next row (ctx->states). + * Exit state handling depends on what comes next: + * - If next is FIN: process on same row (match ends here) + * - If next is VAR/ALT: wait for next row (needs new input) + * Loop state always waits for next row. */ - RPRNFAState *exitState = nfa_state_clone(winstate, elem->next, - state->altPriority, - state->counts, pending); + RPRPatternElement *nextElem = &elements[elem->next]; + RPRElemIdx exitAltPriority; + RPRNFAState *exitState; + + /* + * For greedy extension: if context already has a match recorded, + * preserve its altPriority. This ensures that iterations of a + * quantified group don't compete on lexical order - the first + * iteration's choice sets the preference, and subsequent + * iterations can extend it if they produce a longer match. + */ + exitAltPriority = state->altPriority; + if (ctx->matchedState != NULL) + exitAltPriority = ctx->matchedState->altPriority; + + exitState = nfa_state_clone(winstate, elem->next, + exitAltPriority, + state->counts, NULL); exitState->counts[depth] = 0; - pending = exitState; + + if (RPRElemIsFin(nextElem)) + { + /* Match ends at current row */ + exitState->next = pending; + pending = exitState; + } + else + { + /* Next element (VAR/ALT) needs new row input */ + nfa_add_state_unique(winstate, ctx, exitState); + } state->counts[depth] = count; for (int d = depth + 1; d < pattern->maxDepth; d++) diff --git a/src/backend/optimizer/plan/rpr.c b/src/backend/optimizer/plan/rpr.c index 0eed6b865ae..4385e49d6fb 100644 --- a/src/backend/optimizer/plan/rpr.c +++ b/src/backend/optimizer/plan/rpr.c @@ -995,8 +995,8 @@ computeAbsorbability(RPRPattern *pattern) } } - /* Must have exactly one unbounded element/group */ - if (numUnbounded != 1) + /* Must have at least one unbounded element for absorption */ + if (numUnbounded < 1) return; /* diff --git a/src/test/regress/expected/rpr.out b/src/test/regress/expected/rpr.out index cea986d6704..652b1eb811d 100644 --- a/src/test/regress/expected/rpr.out +++ b/src/test/regress/expected/rpr.out @@ -884,23 +884,23 @@ SELECT company, tdate, price, first_value(tdate) OVER w, last_value(tdate) OVER ----------+------------+-------+-------------+------------ company1 | 07-01-2023 | 100 | | company1 | 07-02-2023 | 200 | 07-02-2023 | 07-05-2023 - company1 | 07-03-2023 | 150 | 07-03-2023 | 07-05-2023 - company1 | 07-04-2023 | 140 | 07-04-2023 | 07-05-2023 + company1 | 07-03-2023 | 150 | | + company1 | 07-04-2023 | 140 | | company1 | 07-05-2023 | 150 | | company1 | 07-06-2023 | 90 | | company1 | 07-07-2023 | 110 | 07-07-2023 | 07-10-2023 - company1 | 07-08-2023 | 130 | 07-08-2023 | 07-10-2023 - company1 | 07-09-2023 | 120 | 07-09-2023 | 07-10-2023 + company1 | 07-08-2023 | 130 | | + company1 | 07-09-2023 | 120 | | company1 | 07-10-2023 | 130 | | company2 | 07-01-2023 | 50 | | company2 | 07-02-2023 | 2000 | 07-02-2023 | 07-05-2023 - company2 | 07-03-2023 | 1500 | 07-03-2023 | 07-05-2023 - company2 | 07-04-2023 | 1400 | 07-04-2023 | 07-05-2023 + company2 | 07-03-2023 | 1500 | | + company2 | 07-04-2023 | 1400 | | company2 | 07-05-2023 | 1500 | | company2 | 07-06-2023 | 60 | | company2 | 07-07-2023 | 1100 | 07-07-2023 | 07-10-2023 - company2 | 07-08-2023 | 1300 | 07-08-2023 | 07-10-2023 - company2 | 07-09-2023 | 1200 | 07-09-2023 | 07-10-2023 + company2 | 07-08-2023 | 1300 | | + company2 | 07-09-2023 | 1200 | | company2 | 07-10-2023 | 1300 | | (20 rows) @@ -2701,7 +2701,7 @@ WINDOW w AS ( -- Pattern optimized: (A B) (A B)+ -> (A B){2,} -- Potential matches: 1-6 (3 reps), 3-6 (2 reps), 5-6 needs 2 reps (fail) -- Row 1 (1-6) absorbs Row 3 (3-6) - same endpoint, top-level unbounded GROUP --- Test absorption 6: Multiple unbounded - NOT absorbable +-- Test absorption 6: Multiple unbounded - first element unbounded enables absorption WITH test_multi_unbounded AS ( SELECT * FROM (VALUES (1, ARRAY['A']), @@ -2725,12 +2725,460 @@ WINDOW w AS ( id | flags | match_start | match_end ----+-------+-------------+----------- 1 | {A} | 1 | 4 - 2 | {A} | 2 | 4 + 2 | {A} | | 3 | {B} | | 4 | {B} | | 5 | {X} | | (5 rows) --- Pattern A+ B+ has TWO unbounded elements - NOT absorbable --- Greedy A+ consumes rows 1-2, then B+ matches 3-4 --- Row 1: 1-4, Row 2: 2-4 (different A+ counts, not absorbed) +-- Row 1: 1-4, Row 2: absorbed (same endpoint 4) +-- ============================================ +-- Jacob's RPR Patterns (from jacob branch) +-- ============================================ +-- Test: A? (optional, greedy) +WITH jacob_optional AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_optional +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A?) + DEFINE A AS 'A' = ANY(flags) +); + id | flags | match_start | match_end +----+-------+-------------+----------- + 1 | {A} | 1 | 1 + 2 | {X} | | +(2 rows) + +-- Expected: 1-1 (matches A) +-- Test: A{2} (exact count) +WITH jacob_exact AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['A']), + (3, ARRAY['A']), + (4, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_exact +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A{2}) + DEFINE A AS 'A' = ANY(flags) +); + id | flags | match_start | match_end +----+-------+-------------+----------- + 1 | {A} | 1 | 2 + 2 | {A} | | + 3 | {A} | | + 4 | {X} | | +(4 rows) + +-- Expected: 1-2 +-- Test: A{1,3} (bounded range, greedy) +WITH jacob_bounded AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['A']), + (3, ARRAY['A']), + (4, ARRAY['A']), + (5, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_bounded +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A{1,3}) + DEFINE A AS 'A' = ANY(flags) +); + id | flags | match_start | match_end +----+-------+-------------+----------- + 1 | {A} | 1 | 3 + 2 | {A} | | + 3 | {A} | | + 4 | {A} | 4 | 4 + 5 | {X} | | +(5 rows) + +-- Expected: 1-3 (greedy takes max), then 4-4 +-- Test: A | B (simple alternation) +WITH jacob_simple_alt AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['B']), + (3, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_simple_alt +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A | B) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags) +); + id | flags | match_start | match_end +----+-------+-------------+----------- + 1 | {A} | 1 | 1 + 2 | {B} | 2 | 2 + 3 | {X} | | +(3 rows) + +-- Expected: 1-1 (A), 2-2 (B) +-- Test: A | B | C (three-way alternation) +WITH jacob_three_alt AS ( + SELECT * FROM (VALUES + (1, ARRAY['B']), + (2, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_three_alt +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A | B | C) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags), + C AS 'C' = ANY(flags) +); + id | flags | match_start | match_end +----+-------+-------------+----------- + 1 | {B} | 1 | 1 + 2 | {X} | | +(2 rows) + +-- Expected: 1-1 (B) +-- Test: A B C (concatenation) +WITH jacob_concat AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['B']), + (3, ARRAY['C']), + (4, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_concat +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A B C) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags), + C AS 'C' = ANY(flags) +); + id | flags | match_start | match_end +----+-------+-------------+----------- + 1 | {A} | 1 | 3 + 2 | {B} | | + 3 | {C} | | + 4 | {X} | | +(4 rows) + +-- Expected: 1-3 +-- Test: A B? C (optional middle) +WITH jacob_optional_mid AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['C']), + (3, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_optional_mid +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A B? C) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags), + C AS 'C' = ANY(flags) +); + id | flags | match_start | match_end +----+-------+-------------+----------- + 1 | {A} | 1 | 2 + 2 | {C} | | + 3 | {X} | | +(3 rows) + +-- Expected: 1-2 (A C, B skipped) +-- Test: (A B){2} (nested group with quantifier) +WITH jacob_nested_group AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['B']), + (3, ARRAY['A']), + (4, ARRAY['B']), + (5, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_nested_group +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN ((A B){2}) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags) +); + id | flags | match_start | match_end +----+-------+-------------+----------- + 1 | {A} | 1 | 4 + 2 | {B} | | + 3 | {A} | | + 4 | {B} | | + 5 | {X} | | +(5 rows) + +-- Expected: 1-4 (A B A B) +-- Test: (A){3} (quantifier on grouped single element) +WITH jacob_group_quant AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['A']), + (3, ARRAY['A']), + (4, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_group_quant +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN ((A){3}) + DEFINE A AS 'A' = ANY(flags) +); + id | flags | match_start | match_end +----+-------+-------------+----------- + 1 | {A} | 1 | 3 + 2 | {A} | | + 3 | {A} | | + 4 | {X} | | +(4 rows) + +-- Expected: 1-3 +-- Test: A B C | A B C D E (lexical order - first alt wins) +WITH jacob_lex_first AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['B']), + (3, ARRAY['C']), + (4, ARRAY['D']), + (5, ARRAY['E']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_lex_first +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A B C | A B C D E) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags), + C AS 'C' = ANY(flags), + D AS 'D' = ANY(flags), + E AS 'E' = ANY(flags) +); + id | flags | match_start | match_end +----+-------+-------------+----------- + 1 | {A} | 1 | 3 + 2 | {B} | | + 3 | {C} | | + 4 | {D} | | + 5 | {E} | | +(5 rows) + +-- Expected: 1-3 (A B C wins by lexical order) +-- Test: A B C D E | A B C (lexical order - longer first wins) +WITH jacob_lex_long AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['B']), + (3, ARRAY['C']), + (4, ARRAY['D']), + (5, ARRAY['E']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_lex_long +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A B C D E | A B C) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags), + C AS 'C' = ANY(flags), + D AS 'D' = ANY(flags), + E AS 'E' = ANY(flags) +); + id | flags | match_start | match_end +----+-------+-------------+----------- + 1 | {A} | 1 | 5 + 2 | {B} | | + 3 | {C} | | + 4 | {D} | | + 5 | {E} | | +(5 rows) + +-- Expected: 1-5 (A B C D E wins by lexical order) +-- ============================================ +-- Alternation with quantifiers (BUG cases from Jacob's tests) +-- ============================================ +-- Test: (A | B)+ C - alternation inside quantified group followed by C +-- BUG: Currently returns NULL instead of matching 1-4 +WITH jacob_alt_quant AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['B']), + (3, ARRAY['A']), + (4, ARRAY['C']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_alt_quant +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN ((A | B)+ C) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags), + C AS 'C' = ANY(flags) +); + id | flags | match_start | match_end +----+-------+-------------+----------- + 1 | {A} | 1 | 4 + 2 | {B} | | + 3 | {A} | | + 4 | {C} | | +(4 rows) + +-- Expected: 1-4 (A B A C) +-- Test: ((A | B) C)+ - alternation inside group with outer quantifier +-- Currently returns two separate matches (1-2, 3-4) instead of single 1-4 +WITH jacob_alt_group AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['C']), + (3, ARRAY['B']), + (4, ARRAY['C']), + (5, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_alt_group +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (((A | B) C)+) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags), + C AS 'C' = ANY(flags) +); + id | flags | match_start | match_end +----+-------+-------------+----------- + 1 | {A} | 1 | 4 + 2 | {C} | | + 3 | {B} | | + 4 | {C} | | + 5 | {X} | | +(5 rows) + +-- Expected: 1-4 (A C B C) +-- ============================================ +-- RELUCTANT quantifiers (parser only - executor TODO) +-- ============================================ +-- Test: A+? B (reluctant) - CURRENTLY ACTS LIKE A+ (GREEDY) +-- Parser accepts the syntax but executor doesn't differentiate +WITH jacob_reluctant AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['A']), + (3, ARRAY['A']), + (4, ARRAY['B']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_reluctant +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A+? B) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags) +); + id | flags | match_start | match_end +----+-------+-------------+----------- + 1 | {A} | 1 | 4 + 2 | {A} | | + 3 | {A} | | + 4 | {B} | | +(4 rows) + +-- Current: 1-4 (A A A B) - greedy behavior +-- Expected when RELUCTANT implemented: 3-4 (A B) - shortest A before B +-- Test: A{1,3}? B (reluctant bounded) +WITH jacob_reluctant_bounded AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['A']), + (3, ARRAY['A']), + (4, ARRAY['B']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_reluctant_bounded +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A{1,3}? B) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags) +); + id | flags | match_start | match_end +----+-------+-------------+----------- + 1 | {A} | 1 | 4 + 2 | {A} | | + 3 | {A} | | + 4 | {B} | | +(4 rows) + +-- Current: 1-4 (greedy, takes max A before B) +-- Expected when RELUCTANT implemented: 3-4 (takes min A before B) diff --git a/src/test/regress/sql/rpr.sql b/src/test/regress/sql/rpr.sql index fa8aa50fcb9..19146f7f03d 100644 --- a/src/test/regress/sql/rpr.sql +++ b/src/test/regress/sql/rpr.sql @@ -1368,7 +1368,7 @@ WINDOW w AS ( -- Potential matches: 1-6 (3 reps), 3-6 (2 reps), 5-6 needs 2 reps (fail) -- Row 1 (1-6) absorbs Row 3 (3-6) - same endpoint, top-level unbounded GROUP --- Test absorption 6: Multiple unbounded - NOT absorbable +-- Test absorption 6: Multiple unbounded - first element unbounded enables absorption WITH test_multi_unbounded AS ( SELECT * FROM (VALUES (1, ARRAY['A']), @@ -1389,6 +1389,353 @@ WINDOW w AS ( A AS 'A' = ANY(flags), B AS 'B' = ANY(flags) ); --- Pattern A+ B+ has TWO unbounded elements - NOT absorbable --- Greedy A+ consumes rows 1-2, then B+ matches 3-4 --- Row 1: 1-4, Row 2: 2-4 (different A+ counts, not absorbed) +-- Row 1: 1-4, Row 2: absorbed (same endpoint 4) + +-- ============================================ +-- Jacob's RPR Patterns (from jacob branch) +-- ============================================ + +-- Test: A? (optional, greedy) +WITH jacob_optional AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_optional +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A?) + DEFINE A AS 'A' = ANY(flags) +); +-- Expected: 1-1 (matches A) + +-- Test: A{2} (exact count) +WITH jacob_exact AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['A']), + (3, ARRAY['A']), + (4, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_exact +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A{2}) + DEFINE A AS 'A' = ANY(flags) +); +-- Expected: 1-2 + +-- Test: A{1,3} (bounded range, greedy) +WITH jacob_bounded AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['A']), + (3, ARRAY['A']), + (4, ARRAY['A']), + (5, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_bounded +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A{1,3}) + DEFINE A AS 'A' = ANY(flags) +); +-- Expected: 1-3 (greedy takes max), then 4-4 + +-- Test: A | B (simple alternation) +WITH jacob_simple_alt AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['B']), + (3, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_simple_alt +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A | B) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags) +); +-- Expected: 1-1 (A), 2-2 (B) + +-- Test: A | B | C (three-way alternation) +WITH jacob_three_alt AS ( + SELECT * FROM (VALUES + (1, ARRAY['B']), + (2, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_three_alt +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A | B | C) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags), + C AS 'C' = ANY(flags) +); +-- Expected: 1-1 (B) + +-- Test: A B C (concatenation) +WITH jacob_concat AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['B']), + (3, ARRAY['C']), + (4, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_concat +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A B C) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags), + C AS 'C' = ANY(flags) +); +-- Expected: 1-3 + +-- Test: A B? C (optional middle) +WITH jacob_optional_mid AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['C']), + (3, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_optional_mid +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A B? C) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags), + C AS 'C' = ANY(flags) +); +-- Expected: 1-2 (A C, B skipped) + +-- Test: (A B){2} (nested group with quantifier) +WITH jacob_nested_group AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['B']), + (3, ARRAY['A']), + (4, ARRAY['B']), + (5, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_nested_group +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN ((A B){2}) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags) +); +-- Expected: 1-4 (A B A B) + +-- Test: (A){3} (quantifier on grouped single element) +WITH jacob_group_quant AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['A']), + (3, ARRAY['A']), + (4, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_group_quant +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN ((A){3}) + DEFINE A AS 'A' = ANY(flags) +); +-- Expected: 1-3 + +-- Test: A B C | A B C D E (lexical order - first alt wins) +WITH jacob_lex_first AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['B']), + (3, ARRAY['C']), + (4, ARRAY['D']), + (5, ARRAY['E']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_lex_first +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A B C | A B C D E) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags), + C AS 'C' = ANY(flags), + D AS 'D' = ANY(flags), + E AS 'E' = ANY(flags) +); +-- Expected: 1-3 (A B C wins by lexical order) + +-- Test: A B C D E | A B C (lexical order - longer first wins) +WITH jacob_lex_long AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['B']), + (3, ARRAY['C']), + (4, ARRAY['D']), + (5, ARRAY['E']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_lex_long +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A B C D E | A B C) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags), + C AS 'C' = ANY(flags), + D AS 'D' = ANY(flags), + E AS 'E' = ANY(flags) +); +-- Expected: 1-5 (A B C D E wins by lexical order) + +-- ============================================ +-- Alternation with quantifiers (BUG cases from Jacob's tests) +-- ============================================ + +-- Test: (A | B)+ C - alternation inside quantified group followed by C +-- BUG: Currently returns NULL instead of matching 1-4 +WITH jacob_alt_quant AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['B']), + (3, ARRAY['A']), + (4, ARRAY['C']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_alt_quant +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN ((A | B)+ C) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags), + C AS 'C' = ANY(flags) +); +-- Expected: 1-4 (A B A C) + +-- Test: ((A | B) C)+ - alternation inside group with outer quantifier +-- Currently returns two separate matches (1-2, 3-4) instead of single 1-4 +WITH jacob_alt_group AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['C']), + (3, ARRAY['B']), + (4, ARRAY['C']), + (5, ARRAY['X']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_alt_group +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (((A | B) C)+) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags), + C AS 'C' = ANY(flags) +); +-- Expected: 1-4 (A C B C) + +-- ============================================ +-- RELUCTANT quantifiers (parser only - executor TODO) +-- ============================================ + +-- Test: A+? B (reluctant) - CURRENTLY ACTS LIKE A+ (GREEDY) +-- Parser accepts the syntax but executor doesn't differentiate +WITH jacob_reluctant AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['A']), + (3, ARRAY['A']), + (4, ARRAY['B']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_reluctant +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A+? B) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags) +); +-- Current: 1-4 (A A A B) - greedy behavior +-- Expected when RELUCTANT implemented: 3-4 (A B) - shortest A before B + +-- Test: A{1,3}? B (reluctant bounded) +WITH jacob_reluctant_bounded AS ( + SELECT * FROM (VALUES + (1, ARRAY['A']), + (2, ARRAY['A']), + (3, ARRAY['A']), + (4, ARRAY['B']) + ) AS t(id, flags) +) +SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w AS match_end +FROM jacob_reluctant_bounded +WINDOW w AS ( + ORDER BY id + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (A{1,3}? B) + DEFINE + A AS 'A' = ANY(flags), + B AS 'B' = ANY(flags) +); +-- Current: 1-4 (greedy, takes max A before B) +-- Expected when RELUCTANT implemented: 3-4 (takes min A before B) \ No newline at end of file -- 2.50.1 (Apple Git-155)
From 0ef3fa76fa1aa340a56eb5841183d61bc8129f3e Mon Sep 17 00:00:00 2001 From: Henson Choi <[email protected]> Date: Thu, 15 Jan 2026 22:19:07 +0900 Subject: [PATCH] Row pattern recognition: Enable context absorption only for SKIP PAST LAST ROW --- src/backend/executor/nodeWindowAgg.c | 10 +++-- src/test/regress/expected/rpr.out | 63 +++++++++++++++++++--------- src/test/regress/sql/rpr.sql | 18 ++++++++ 3 files changed, 68 insertions(+), 23 deletions(-) diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c index 3fc73db4e59..4af9998aacf 100644 --- a/src/backend/executor/nodeWindowAgg.c +++ b/src/backend/executor/nodeWindowAgg.c @@ -4796,8 +4796,9 @@ update_reduced_frame(WindowObject winobj, int64 pos) if (ctx->states != NULL) nfa_step(winstate, ctx, NULL, currentPos - 1); } - /* Absorb completed contexts at partition boundary */ - nfa_absorb_contexts(winstate, NULL, currentPos - 1); + /* Absorb completed contexts at partition boundary (SKIP PAST LAST ROW only) */ + if (winstate->rpSkipTo == ST_PAST_LAST_ROW) + nfa_absorb_contexts(winstate, NULL, currentPos - 1); break; } @@ -4815,12 +4816,13 @@ update_reduced_frame(WindowObject winobj, int64 pos) nfa_start_context(winstate, currentPos + 1); /* - * Absorb redundant contexts. + * Absorb redundant contexts (SKIP PAST LAST ROW only). * At the same elementIndex, if newer context's count <= older context's count, * the newer context can be absorbed (for unbounded quantifiers). * Note: Never absorb targetCtx - it's the context we're trying to complete. */ - nfa_absorb_contexts(winstate, targetCtx, currentPos); + if (winstate->rpSkipTo == ST_PAST_LAST_ROW) + nfa_absorb_contexts(winstate, targetCtx, currentPos); /* Check if target context is now complete */ if (targetCtx->states == NULL) diff --git a/src/test/regress/expected/rpr.out b/src/test/regress/expected/rpr.out index 652b1eb811d..f22bba23bff 100644 --- a/src/test/regress/expected/rpr.out +++ b/src/test/regress/expected/rpr.out @@ -884,26 +884,51 @@ SELECT company, tdate, price, first_value(tdate) OVER w, last_value(tdate) OVER ----------+------------+-------+-------------+------------ company1 | 07-01-2023 | 100 | | company1 | 07-02-2023 | 200 | 07-02-2023 | 07-05-2023 - company1 | 07-03-2023 | 150 | | - company1 | 07-04-2023 | 140 | | + company1 | 07-03-2023 | 150 | 07-03-2023 | 07-05-2023 + company1 | 07-04-2023 | 140 | 07-04-2023 | 07-05-2023 company1 | 07-05-2023 | 150 | | company1 | 07-06-2023 | 90 | | company1 | 07-07-2023 | 110 | 07-07-2023 | 07-10-2023 - company1 | 07-08-2023 | 130 | | - company1 | 07-09-2023 | 120 | | + company1 | 07-08-2023 | 130 | 07-08-2023 | 07-10-2023 + company1 | 07-09-2023 | 120 | 07-09-2023 | 07-10-2023 company1 | 07-10-2023 | 130 | | company2 | 07-01-2023 | 50 | | company2 | 07-02-2023 | 2000 | 07-02-2023 | 07-05-2023 - company2 | 07-03-2023 | 1500 | | - company2 | 07-04-2023 | 1400 | | + company2 | 07-03-2023 | 1500 | 07-03-2023 | 07-05-2023 + company2 | 07-04-2023 | 1400 | 07-04-2023 | 07-05-2023 company2 | 07-05-2023 | 1500 | | company2 | 07-06-2023 | 60 | | company2 | 07-07-2023 | 1100 | 07-07-2023 | 07-10-2023 - company2 | 07-08-2023 | 1300 | | - company2 | 07-09-2023 | 1200 | | + company2 | 07-08-2023 | 1300 | 07-08-2023 | 07-10-2023 + company2 | 07-09-2023 | 1200 | 07-09-2023 | 07-10-2023 company2 | 07-10-2023 | 1300 | | (20 rows) +-- SKIP TO NEXT ROW with limited frame (Ishii-san's test case) +-- Each row should produce its own match within its frame +WITH data AS ( + SELECT * FROM (VALUES + ('A', 1), ('A', 2), + ('B', 3), ('B', 4) + ) AS t(gid, id) +) +SELECT gid, id, array_agg(id) OVER w +FROM data +WINDOW w AS ( + PARTITION BY gid + ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING + AFTER MATCH SKIP TO NEXT ROW + PATTERN (A+) + DEFINE A AS id < 10 +); + gid | id | array_agg +-----+----+----------- + A | 1 | {1,2} + A | 2 | {2} + B | 3 | {3,4} + B | 4 | {4} +(4 rows) + -- ROWS BETWEEN CURRENT ROW AND offset FOLLOWING SELECT company, tdate, price, first_value(tdate) OVER w, last_value(tdate) OVER w, count(*) OVER w @@ -2340,8 +2365,8 @@ WINDOW w AS ( 5 | {B} | | 6 | {C} | | 7 | {B} | 7 | 10 - 8 | {B} | | - 9 | {B} | | + 8 | {B} | 8 | 10 + 9 | {B} | 9 | 10 10 | {D} | | (10 rows) @@ -2556,9 +2581,9 @@ WINDOW w AS ( id | flags | match_start | match_end ----+-------+-------------+----------- 1 | {A} | 1 | 4 - 2 | {A} | | - 3 | {A} | | - 4 | {A} | | + 2 | {A} | 2 | 4 + 3 | {A} | 3 | 4 + 4 | {A} | 4 | 4 5 | {B} | | (5 rows) @@ -2589,8 +2614,8 @@ WINDOW w AS ( id | flags | match_start | match_end ----+-------+-------------+----------- 1 | {A} | 1 | 4 - 2 | {A} | | - 3 | {A} | | + 2 | {A} | 2 | 4 + 3 | {A} | 3 | 4 4 | {B} | | 5 | {X} | | (5 rows) @@ -2623,8 +2648,8 @@ WINDOW w AS ( id | flags | match_start | match_end ----+-------+-------------+----------- 1 | {B} | 1 | 4 - 2 | {B} | | - 3 | {B} | | + 2 | {B} | 2 | 4 + 3 | {B} | 3 | 4 4 | {D} | | 5 | {X} | | (5 rows) @@ -2691,7 +2716,7 @@ WINDOW w AS ( ----+-------+-------------+----------- 1 | {A} | 1 | 6 2 | {B} | | - 3 | {A} | | + 3 | {A} | 3 | 6 4 | {B} | | 5 | {A} | | 6 | {B} | | @@ -2725,7 +2750,7 @@ WINDOW w AS ( id | flags | match_start | match_end ----+-------+-------------+----------- 1 | {A} | 1 | 4 - 2 | {A} | | + 2 | {A} | 2 | 4 3 | {B} | | 4 | {B} | | 5 | {X} | | diff --git a/src/test/regress/sql/rpr.sql b/src/test/regress/sql/rpr.sql index 19146f7f03d..93bf1698fd9 100644 --- a/src/test/regress/sql/rpr.sql +++ b/src/test/regress/sql/rpr.sql @@ -355,6 +355,24 @@ SELECT company, tdate, price, first_value(tdate) OVER w, last_value(tdate) OVER B AS price > 100 ); +-- SKIP TO NEXT ROW with limited frame (Ishii-san's test case) +-- Each row should produce its own match within its frame +WITH data AS ( + SELECT * FROM (VALUES + ('A', 1), ('A', 2), + ('B', 3), ('B', 4) + ) AS t(gid, id) +) +SELECT gid, id, array_agg(id) OVER w +FROM data +WINDOW w AS ( + PARTITION BY gid + ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING + AFTER MATCH SKIP TO NEXT ROW + PATTERN (A+) + DEFINE A AS id < 10 +); + -- ROWS BETWEEN CURRENT ROW AND offset FOLLOWING SELECT company, tdate, price, first_value(tdate) OVER w, last_value(tdate) OVER w, count(*) OVER w -- 2.50.1 (Apple Git-155)
0004-NFA-state-memory-memory-management.patch
Description: Binary data
0005-Disable-context-absorption-for-limited-frame.patch
Description: Binary data
