Hi Ishii-san,

During code review of nodeWindowAgg.c, I discovered that
update_reduced_frame()
can receive the same pos value repeatedly for RANGE/GROUPS frames.


----------------------------------------------------------------------
PATTERN DISCOVERED
----------------------------------------------------------------------

For RANGE/GROUPS frames with peer rows (rows having the same ORDER BY
value):

  Data: id=1,2,3 all have val=10 (peer rows)
        id=4,5 both have val=20 (peer rows)

  RANGE frame processing:
  - id=1 processing: pos=0 (frameheadpos points to first peer)
  - id=2 processing: pos=0 (same frameheadpos)
  - id=3 processing: pos=0 (same frameheadpos)
  - id=4 processing: pos=3
  - id=5 processing: pos=3

ROWS frames always have sequential pos values (0,1,2,3,4...), but
RANGE/GROUPS
can have repeated pos values (0,0,0,3,3...).


----------------------------------------------------------------------
CURRENT IMPLEMENTATION ISSUE
----------------------------------------------------------------------

The current implementation appears to assume sequential processing only,
which
is valid for ROWS frames. Two specific issues found:

1. In update_reduced_frame():
   - The pos <= nfaLastProcessedRow check assumes "already processed pos
doesn't
     need reprocessing", but RANGE/GROUPS require processing the same pos
     multiple times.

2. In nfa_process_row():
   - Frame boundary calculation: ctxFrameEnd = matchStartRow + frameOffset
+ 1
   - This adds frameOffset directly to row position, which works for ROWS
but
     may be incorrect for RANGE (value-based) and GROUPS (group-based)
frames.


----------------------------------------------------------------------
TEST COVERAGE LIMITATION
----------------------------------------------------------------------

The existing RANGE/GROUPS test cases in rpr_base.sql only cover simple
scenarios
that don't reach the frame end, which is why this issue wasn't detected.

Most tests target ROWS frames, with only a small number testing RANGE/GROUPS
frames.


----------------------------------------------------------------------
PROPOSAL
----------------------------------------------------------------------

Rather than directly modifying these functions, we should consider an
alternative approach for non-UNBOUNDED frames:
- Strictly respect the frame head position
- Minimize lower-level optimizations (absorption/pruning)
- Avoid creating subsequent contexts for limited reuse benefit
- For bounded frames, the benefit of context reuse across rows may be
limited
  compared to UNBOUNDED frames where contexts can be reused extensively
- This would simplify RANGE/GROUPS handling at the cost of some optimization

To implement this properly, we need to broaden our understanding of the
execution structure through code review of Window clauses and Aggregate
functions.

Therefore, for this round of nodeWindowAgg.c review:
1. Perform code simplification/refactoring for ROWS frames only
2. Add FIXME comments to RANGE/GROUPS test cases
3. Defer RANGE/GROUPS handling to a future phase

There may be other issues beyond RANGE/GROUPS.
After this code review round is complete, I would like to regroup and
discuss
the direction forward.

Best regards,
Henson

Reply via email to