Hi Henson,

> Hi Tatsuo,
> 
> Here are two incremental patches on top of v43 + our previous two.

I have tried v43 + those patches and it passed the regression test.

> nocfbot-0003: Fix ALT lexical ordering via DFS early termination
> 
> The altPriority FIXME turned out to have a simple solution.  The key
> insight is that the NFA state list is already in lexical order from
> DFS traversal, so when a state reaches FIN, all remaining states in
> the list have worse lexical priority and can be pruned immediately.
> 
> This makes the altPriority field unnecessary -- DFS traversal order
> itself guarantees correct lexical ordering.  The patch removes
> altPriority from RPRNFAState entirely and adds early termination
> in nfa_advance(): when a new FIN is reached, the remaining states
> are freed and processing stops.
> 
> This also implements the state pruning optimization I mentioned
> earlier -- it falls out naturally from the same mechanism.
> 
> Changes:
> - Remove altPriority field from RPRNFAState and all call sites
> - Add early termination in nfa_advance() on new FIN arrival
> - Simplify nfa_add_matched_state() to unconditional replacement
> - Fix outer END count increment for quantified VARs in
>   nfa_advance_var()
> - Remove FIXME 1 test cases, add test_alt_lexical_order test
> - Add EXPLAIN ANALYZE test verifying early termination statistics

Sounds good.

> nocfbot-0004: Implement reluctant quantifiers
> 
> With the DFS early termination infrastructure from nocfbot-0003,
> reluctant quantifiers become straightforward: reverse the DFS
> traversal order so that shorter matches are explored first, then
> apply the same early termination to stop at the shortest match.
> 
> Greedy explores enter/loop before skip/exit; reluctant reverses
> this to skip/exit before enter/loop.  When the preferred (shorter)
> path reaches FIN, the longer path is pruned immediately.
> 
> Changes:
> - Remove parser error rejecting reluctant quantifier syntax
> - Extend tryUnwrapGroup() to propagate reluctant flag when
>   unwrapping single-child groups: (A)+? -> A+?
> - Add reluctant branching in nfa_advance_begin(),
>   nfa_advance_end(), and nfa_advance_var() with per-branch
>   early termination
> - Add tests in rpr_base.sql, rpr_nfa.sql, and rpr.sql covering
>   basic reluctant semantics, quantifier boundaries, interaction
>   with greedy quantifiers, and nested/alternation combinations

This is really good news!

>> I found reluctant quantifiers are useful but also I don't want to make
>> patch sets far bigger. How do you estimate the difficulty and the size
>> of the code for reluctant quantifiers?
> 
> You asked earlier how I estimated the difficulty of reluctant
> quantifiers.  It turned out the answer was subtraction, not
> addition -- removing altPriority and simplifying the match logic
> first made reluctant quantifiers almost trivial to add on top.

Glad to hear that you found simpler solution to implement reluchtant
quantifiers.

> Next I plan to work on the remaining FIXME (cycle prevention for
> patterns like (A*)*), A{0} implementation, and test reorganization.

Ok. Looking forward to the next patches.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp


Reply via email to