Hi SungJun,
Thanks for the cross-validation report.
> This appears to be due to its NFA implementation combined with Context
> Absorption, which discards redundant matching contexts when they reach
> the same NFA state but start later in the input.
>
> Example:
>
> Ctx1 start=0 length=2
> Ctx2 start=1 length=1
>
> Ctx1 subsumes Ctx2, so Ctx2 is discarded.
>
> This keeps the number of active contexts small and prevents quadratic
> growth.
>
Your description is accurate. Under the monotonicity principle, an
earlier-started context subsumes every match a later-started context
could produce once both reach the same NFA state, so the later one
can be safely discarded.
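To make the mechanism concrete, here is a minimal sketch in Python. It assumes a simplified runtime where a matching context is just a (start, state) pair; the names (`Context`, `absorb`) are illustrative, not the engine's actual API.

```python
from collections import namedtuple

# A matching context: where the candidate match started, and which NFA
# state it has currently reached. (Simplified model for illustration.)
Context = namedtuple("Context", ["start", "state"])

def absorb(contexts):
    """Keep only the earliest-started context per NFA state.

    Under the monotonicity assumption, an earlier-started context that
    reaches the same state subsumes every match a later-started context
    could still produce, so the later one is discarded.
    """
    best = {}
    for ctx in contexts:
        kept = best.get(ctx.state)
        if kept is None or ctx.start < kept.start:
            best[ctx.state] = ctx
    return list(best.values())

# The example above: Ctx1 (start=0) and Ctx2 (start=1) reach state "S".
survivors = absorb([Context(0, "S"), Context(1, "S")])
print(survivors)  # [Context(start=0, state='S')] -- Ctx2 is absorbed
```

Since at most one context survives per NFA state, the number of active contexts is bounded by the (fixed) number of states rather than by the input length.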
However, Context Absorption is not universally applicable. It
requires careful analysis of several factors, and we continue to
refine the conditions under which absorption can be safely applied.
The current scope of absorption is determined by three factors:
1. Pattern structure:
Currently supported:
- Simple unbounded quantifiers: A+, (A B)+
- Leading unbounded quantifier in group: (A+ B)+
- First iteration of nested unbounded alternation (e.g., ((A+)|(B+))+)
Planned improvements:
- Patterns with a prefix before the repeating group (e.g., A (B+) C)
- Fixed-length iteration groups (e.g., (A B{2})+ where each
iteration consumes exactly 3 rows)
2. Frame specification:
- AFTER MATCH SKIP TO NEXT ROW is not eligible (only
SKIP PAST LAST ROW)
- Frame end must be UNBOUNDED FOLLOWING
3. DEFINE clause evaluation:
- Absorption relies on the monotonicity assumption that DEFINE
conditions produce the same result regardless of starting
position. Currently DEFINE uses simple boolean conditions
where this holds. However, future additions such as
CLASSIFIER() or running aggregates (e.g., RUNNING SUM())
may introduce context-dependent evaluation that breaks
this invariance, requiring absorption to be disabled for
those cases.
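The third point is easy to see with a toy example (my own illustration, not engine code): with a hypothetical context-dependent condition like RUNNING SUM(price) < 100, the same row can satisfy the condition for one start position and fail it for another, so the earlier-started context no longer subsumes the later-started one.

```python
# Toy data and a toy DEFINE-like predicate (names and values are
# assumptions for illustration only).
prices = [60, 50, 30]

def running_sum_ok(start, end, limit=100):
    # Does RUNNING SUM(price) stay under `limit` at row `end`,
    # for a candidate match that began at row `start`?
    return sum(prices[start:end + 1]) < limit

# Both contexts are at row 2; Ctx1 started at row 0, Ctx2 at row 1.
print(running_sum_ok(0, 2))  # False: 60 + 50 + 30 = 140
print(running_sum_ok(1, 2))  # True:  50 + 30 = 80
# Discarding Ctx2 here would lose a valid match, which is why
# absorption must be disabled for context-dependent conditions.
```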
That said, many practically useful patterns fall within these
conditions. Here are examples from our regression tests using
synthetic stock data:
V-shape recovery (decline then rise):
PATTERN (DECLINE{4,} RISE{4,})
DEFINE
DECLINE AS price < PREV(price),
RISE AS price > PREV(price)
W-bottom (decline, bounce, re-decline, recovery):
PATTERN (DECLINE{3,} BOUNCE{3,} DIP{3,} RECOVER{3,})
DEFINE
DECLINE AS price < PREV(price),
BOUNCE AS price > PREV(price),
DIP AS price < PREV(price),
RECOVER AS price > PREV(price)
Consolidation then breakout:
PATTERN (FLAT{5,} BREAKOUT)
DEFINE
FLAT AS price BETWEEN PREV(price) * 0.98 AND PREV(price) * 1.02,
BREAKOUT AS price > PREV(price) * 1.05
Price-volume divergence (bearish signal):
PATTERN (INIT DIVERGE{3,})
DEFINE
DIVERGE AS price > PREV(price) AND volume < PREV(volume)
Volume climax reversal:
PATTERN (RALLY{3,} CLIMAX SELLOFF{2,})
DEFINE
RALLY AS price > PREV(price),
CLIMAX AS volume > PREV(volume) * 1.5,
SELLOFF AS price < PREV(price)
All of these use unbounded quantifiers with position-invariant
DEFINE conditions -- exactly the cases where Context Absorption
guarantees O(n) scaling. So despite the constraints, it remains
a powerful optimization for the workloads where RPR is most
commonly applied.
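For a rough sense of the scaling difference, here is a back-of-the-envelope simulation (my own sketch, not the engine) of active-context growth for a pattern like RISE+ where every row can start a new context that reaches the same NFA state:

```python
def simulate(n, use_absorption=True):
    """Count total per-row context-advancing work over n rows."""
    total = 0
    active = []  # start positions of live contexts, all in one NFA state
    for row in range(n):
        active.append(row)           # each row may start a new context
        if use_absorption:
            active = [min(active)]   # keep only the earliest start
        total += len(active)         # work done advancing contexts
    return total

n = 1000
print(simulate(n, use_absorption=True))   # 1000   -> O(n)
print(simulate(n, use_absorption=False))  # 500500 -> O(n^2)
```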
Best regards,
Henson