Hi Tatsuo,

Here are two incremental patches on top of v43 + our previous two.


nocfbot-0003: Fix ALT lexical ordering via DFS early termination

The altPriority FIXME turned out to have a simple solution.  The key
insight is that the NFA state list is already in lexical order from
DFS traversal, so when a state reaches FIN, all remaining states in
the list have worse lexical priority and can be pruned immediately.

This makes the altPriority field unnecessary -- DFS traversal order
itself guarantees correct lexical ordering.  The patch removes
altPriority from RPRNFAState entirely and adds early termination
in nfa_advance(): when a new FIN is reached, the remaining states
are freed and processing stops.

This also implements the state pruning optimization I mentioned
earlier -- it falls out naturally from the same mechanism.

Changes:
- Remove altPriority field from RPRNFAState and all call sites
- Add early termination in nfa_advance() on new FIN arrival
- Simplify nfa_add_matched_state() to unconditional replacement
- Fix outer END count increment for quantified VARs in
  nfa_advance_var()
- Remove FIXME 1 test cases, add test_alt_lexical_order test
- Add EXPLAIN ANALYZE test verifying early termination statistics


nocfbot-0004: Implement reluctant quantifiers

With the DFS early termination infrastructure from nocfbot-0003,
reluctant quantifiers become straightforward: reverse the DFS
traversal order so that shorter matches are explored first, then
apply the same early termination to stop at the shortest match.

Greedy explores enter/loop before skip/exit; reluctant reverses
this to skip/exit before enter/loop.  When the preferred (shorter)
path reaches FIN, the longer path is pruned immediately.

Changes:
- Remove parser error rejecting reluctant quantifier syntax
- Extend tryUnwrapGroup() to propagate reluctant flag when
  unwrapping single-child groups: (A)+? -> A+?
- Add reluctant branching in nfa_advance_begin(),
  nfa_advance_end(), and nfa_advance_var() with per-branch
  early termination
- Add tests in rpr_base.sql, rpr_nfa.sql, and rpr.sql covering
  basic reluctant semantics, quantifier boundaries, interaction
  with greedy quantifiers, and nested/alternation combinations


> I found reluctant quantifiers are useful but also I don't want to make
> patch sets far bigger. How do you estimate the difficulty and the size
> of the code for reluctant quantifiers?

You asked earlier how I estimated the difficulty of reluctant
quantifiers.  It turned out the answer was subtraction, not
addition -- removing altPriority and simplifying the match logic
first made reluctant quantifiers almost trivial to add on top.

Next I plan to work on the remaining FIXME (cycle prevention for
patterns like (A*)*), A{0} implementation, and test reorganization.


Best regards,
Henson

>

Attachment: nocfbot-0003-DFS-early-termination.patch
Description: Binary data

Attachment: nocfbot-0004-Implement-reluctant-quantifiers.patch
Description: Binary data

Reply via email to