Hi SungJun,
Thanks for the cross-validation report.
> This appears to be due to its NFA implementation combined with Context
> Absorption, which discards redundant matching contexts when they reach
> the same NFA state but start later in the input.
>
> Example:
>
> Ctx1 start=0 length=2
> Ctx2 start=1 length=1
>
> Ctx1 subsumes Ctx2, so Ctx2 is discarded.
>
> This keeps the number of active contexts small and prevents quadratic
> growth.
>
Your description is accurate. Under the monotonicity principle, an
earlier-started context subsumes every match a later-started context
could produce once both reach the same NFA state, so the later one
can be safely discarded.
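To make the mechanism concrete, here is a minimal sketch in Python. It assumes a simplified runtime where a matching context is just a (start, state) pair; the names (`Context`, `absorb`) are illustrative, not the engine's actual API.

```python
from collections import namedtuple

# A matching context: where the candidate match started, and which NFA
# state it has currently reached. (Simplified model for illustration.)
Context = namedtuple("Context", ["start", "state"])

def absorb(contexts):
    """Keep only the earliest-started context per NFA state.

    Under the monotonicity assumption, an earlier-started context that
    reaches the same state subsumes every match a later-started context
    could still produce, so the later one is discarded.
    """
    best = {}
    for ctx in contexts:
        kept = best.get(ctx.state)
        if kept is None or ctx.start < kept.start:
            best[ctx.state] = ctx
    return list(best.values())

# The example above: Ctx1 (start=0) and Ctx2 (start=1) reach state "S".
survivors = absorb([Context(0, "S"), Context(1, "S")])
print(survivors)  # [Context(start=0, state='S')] -- Ctx2 is absorbed
```

Since at most one context survives per NFA state, the number of active contexts is bounded by the (fixed) number of states rather than by the input length.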
However, Context Absorption is not universally applicable. It
requires careful analysis of several factors, and we continue to
refine the conditions under which absorption can be safely applied.
The current scope of absorption is determined by three factors:
1. Pattern structure:
Currently supported:
- Simple unbounded quantifiers: A+, (A B)+
- Leading unbounded quantifier in group: (A+ B)+
- First iteration of nested unbounded alternation (e.g., ((A+)|(B+))+)
Planned improvements:
- Patterns with a prefix before the repeating group (e.g., A (B+) C)
- Fixed-length iteration groups (e.g., (A B{2})+ where each
iteration consumes exactly 3 rows)
2. Frame specification:
- AFTER MATCH SKIP TO NEXT ROW is not eligible (only
SKIP PAST LAST ROW)
- Frame end must be UNBOUNDED FOLLOWING
3. DEFINE clause evaluation:
- Absorption relies on the monotonicity assumption that DEFINE
conditions produce the same result regardless of starting
position. Currently DEFINE uses simple boolean conditions
where this holds. However, future additions such as
CLASSIFIER() or running aggregates (e.g., RUNNING SUM())
may introduce context-dependent evaluation that breaks
this invariance, requiring absorption to be disabled for
those cases.
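The third point is easy to see with a toy example (my own illustration, not engine code): with a hypothetical context-dependent condition like RUNNING SUM(price) < 100, the same row can satisfy the condition for one start position and fail it for another, so the earlier-started context no longer subsumes the later-started one.

```python
# Toy data and a toy DEFINE-like predicate (names and values are
# assumptions for illustration only).
prices = [60, 50, 30]

def running_sum_ok(start, end, limit=100):
    # Does RUNNING SUM(price) stay under `limit` at row `end`,
    # for a candidate match that began at row `start`?
    return sum(prices[start:end + 1]) < limit

# Both contexts are at row 2; Ctx1 started at row 0, Ctx2 at row 1.
print(running_sum_ok(0, 2))  # False: 60 + 50 + 30 = 140
print(running_sum_ok(1, 2))  # True:  50 + 30 = 80
# Discarding Ctx2 here would lose a valid match, which is why
# absorption must be disabled for context-dependent conditions.
```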
That said, many practically useful patterns fall within these
conditions. Here are examples from our regression tests using
synthetic stock data:
V-shape recovery (decline then rise):
PATTERN (DECLINE{4,} RISE{4,})
DEFINE
DECLINE AS price < PREV(price),
RISE AS price > PREV(price)
W-bottom (decline, bounce, re-decline, recovery):
PATTERN (DECLINE{3,} BOUNCE{3,} DIP{3,} RECOVER{3,})
DEFINE
DECLINE AS price < PREV(price),
BOUNCE AS price > PREV(price),
DIP AS price < PREV(price),
RECOVER AS price > PREV(price)
Consolidation then breakout:
PATTERN (FLAT{5,} BREAKOUT)
DEFINE
FLAT AS price BETWEEN PREV(price) * 0.98 AND PREV(price) * 1.02,
BREAKOUT AS price > PREV(price) * 1.05
Price-volume divergence (bearish signal):
PATTERN (INIT DIVERGE{3,})
DEFINE
DIVERGE AS price > PREV(price) AND volume < PREV(volume)
Volume climax reversal:
PATTERN (RALLY{3,} CLIMAX SELLOFF{2,})
DEFINE
RALLY AS price > PREV(price),
CLIMAX AS volume > PREV(volume) * 1.5,
SELLOFF AS price < PREV(price)
All of these use unbounded quantifiers with position-invariant
DEFINE conditions -- exactly the cases where Context Absorption
guarantees O(n) scaling. So despite the constraints, it remains
a powerful optimization for the workloads where RPR is most
commonly applied.
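For a rough sense of the scaling difference, here is a back-of-the-envelope simulation (my own sketch, not the engine) of active-context growth for a pattern like RISE+ where every row can start a new context that reaches the same NFA state:

```python
def simulate(n, use_absorption=True):
    """Count total per-row context-advancing work over n rows."""
    total = 0
    active = []  # start positions of live contexts, all in one NFA state
    for row in range(n):
        active.append(row)           # each row may start a new context
        if use_absorption:
            active = [min(active)]   # keep only the earliest start
        total += len(active)         # work done advancing contexts
    return total

n = 1000
print(simulate(n, use_absorption=True))   # 1000   -> O(n)
print(simulate(n, use_absorption=False))  # 500500 -> O(n^2)
```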
Best regards,
Henson