> On 30 Sep 2024, at 11:26, Austin Group Bug Tracker via austin-group-l at The 
> Open Group <[email protected]> wrote:

>> Geoff, have you considered my torture testing case? How do you see your
> wording cover that case?
> 
> - regex:
> [0-9](((((([0-9][a-z]+[0-9])+?)+)+?)+)+?)+[0-9]
> 
> - string to be matched:
> 12abc34def56ghi78jkl90mnop12qrst34uvw56xyz78

I will briefly analyze the examples given:
[0-9](((((([0-9][a-z]*[0-9])*?)*)*?)*)*?)*[0-9]
[0-9](((((([0-9][a-z]+[0-9])+?)+)+?)+)+?)+[0-9]
on the string
12abc34def56ghi78jkl90mnop12qrst34uvw56xyz78

The second example, with the ‘+’, gets 258 states, and computing all the 
matches runs at least into the tens of millions. —I did not finish this 
computation. So it seems a good stress test.

The first example curiously gets the same NFA as
[0-9]([0-9][a-z]*[0-9])*[0-9]

In my program, the NFA of the latter is:
(ε ↦ {0}, {4}, {
  0: {[0-9] ↦ {4, 1}}
  1: {[0-9] ↦ {3, 2}}
  2: {[a-z] ↦ {3, 2}}
  3: {[0-9] ↦ {4, 1}}
  4: {[0-9] ↦ {∅}}},
)
Here, {0} is the set of start states, and {4} the set of states with a final 
state ∅.

The set of matches of the string:
12abc34def56ghi78jkl90mnop12qrst34uvw56xyz78
is:
@2: (12).
@7: (12abc34).
@12: (12abc34def56).
@17: (12abc34def56ghi78).
@22: (12abc34def56ghi78jkl90).
@28: (12abc34def56ghi78jkl90mnop12).
@34: (12abc34def56ghi78jkl90mnop12qrst34).
@39: (12abc34def56ghi78jkl90mnop12qrst34uvw56).
@44: (12abc34def56ghi78jkl90mnop12qrst34uvw56xyz78).
Here, @n is the position of the string where the match ends.

Then the DFA states are simply sets of NFA states. One computes a DFA state for 
a character by transitioning forward, including nullable transitions, if 
possible, until the character has been matched ones.

The partial DFA for the match above is:
{
{3, 2}: w ↦ {2, 3}, t ↦ {2, 3}, s ↦ {2, 3}, r ↦ {2, 3}, v ↦ {2, 3}, 1 ↦ {1, 4}, 
p ↦ {2, 3}, o ↦ {2, 3}, n ↦ {2, 3}, 9 ↦ {1, 4}, k ↦ {2, 3}, i ↦ {2, 3}, h ↦ {2, 
3}, l ↦ {2, 3}, z ↦ {2, 3}, c ↦ {2, 3}, 5 ↦ {1, 4}, f ↦ {2, 3}, 7 ↦ {1, 4}, e ↦ 
{2, 3}, 3 ↦ {1, 4}, y ↦ {2, 3}, b ↦ {2, 3};
{∅, 3, 2}∅: q ↦ {2, 3}, x ↦ {2, 3}, m ↦ {2, 3}, u ↦ {2, 3}, j ↦ {2, 3}, g ↦ {2, 
3}, d ↦ {2, 3}, a ↦ {2, 3};
{4, 1}: 0 ↦ {2, 3, ∅}∅, 8 ↦ {2, 3, ∅}∅, 6 ↦ {2, 3, ∅}∅, 4 ↦ {2, 3, ∅}∅, 2 ↦ {2, 
3, ∅}∅;
{0}: 1 ↦ {1, 4}};
)
Here, as the NFA has a single start state 0, the initial DFA state becomes {0}.

For the match string first character ‘1’, it can transition to states 1 or 4, 
giving the DFA transition
{0}: ‘1’ ↦ {1, 4}
as written last above, where characters are without quotes.

And so on, for the other DFA states.

It is also possible to mark up sub-NFAs with action numbers. Write action 
numbers as subscripts, say:
(([0-9]₁(([0-9]₂([a-z]₃*)₄[0-9]₅)₆*)₇[0-9]₈)₉
Parenthesis are here used only for grouping, with no other semantic meaning.

Then the NFA, with these action numbers indicated using bra-ket notation 
⟨k|…|k⟩, becomes:
(ε ↦ {<9|<1|0}, {4}, {
  0: {[0-9] ↦ {|1><7|<6|<2|1, |1><7||7><8|4}}
  1: {[0-9] ↦ {|2><4|<3|2, |2><4||4><5|3}}
  2: {[a-z] ↦ {|3><3|2, |3>|4><5|3}}
  3: {[0-9] ↦ {|5>|6><6|<2|1, |5>|6>|7><8|4}}
  4: {[0-9] ↦ {|8>|9>∅}}},
)
Here, {<9|<1|0} means that the start state 0 also begins the actions 9 and 1: 
<9|… and <1|…, and so on.

The full set of matches above, but now with action numbers, is given below, 
with quotes ‘…’ around the match string characters for readability.

For @2, ⟨9|…|9⟩ says that the characters ‘1’ and ‘2’ in between are matched, 
that is all, as expected. There is also ⟨7||7⟩ indicating that the part 
(([0-9][a-z]*[0-9])*)₇ was matched with no characters.

@2: (⟨9|⟨1|‘1’|1⟩⟨7||7⟩⟨8|‘2’|8⟩|9⟩).
@7: 
(⟨9|⟨1|‘1’|1⟩⟨7|⟨6|⟨2|‘2’|2⟩⟨4|⟨3|‘a’|3⟩⟨3|‘b’|3⟩⟨3|‘c’|3⟩|4⟩⟨5|‘3’|5⟩|6⟩|7⟩⟨8|‘4’|8⟩|9⟩).
@12: 
(⟨9|⟨1|‘1’|1⟩⟨7|⟨6|⟨2|‘2’|2⟩⟨4|⟨3|‘a’|3⟩⟨3|‘b’|3⟩⟨3|‘c’|3⟩|4⟩⟨5|‘3’|5⟩|6⟩⟨6|⟨2|‘4’|2⟩⟨4|⟨3|‘d’|3⟩⟨3|‘e’|3⟩⟨3|‘f’|3⟩|4⟩⟨5|‘5’|5⟩|6⟩|7⟩⟨8|‘6’|8⟩|9⟩).
@17: 
(⟨9|⟨1|‘1’|1⟩⟨7|⟨6|⟨2|‘2’|2⟩⟨4|⟨3|‘a’|3⟩⟨3|‘b’|3⟩⟨3|‘c’|3⟩|4⟩⟨5|‘3’|5⟩|6⟩⟨6|⟨2|‘4’|2⟩⟨4|⟨3|‘d’|3⟩⟨3|‘e’|3⟩⟨3|‘f’|3⟩|4⟩⟨5|‘5’|5⟩|6⟩⟨6|⟨2|‘6’|2⟩⟨4|⟨3|‘g’|3⟩⟨3|‘h’|3⟩⟨3|‘i’|3⟩|4⟩⟨5|‘7’|5⟩|6⟩|7⟩⟨8|‘8’|8⟩|9⟩).
@22: 
(⟨9|⟨1|‘1’|1⟩⟨7|⟨6|⟨2|‘2’|2⟩⟨4|⟨3|‘a’|3⟩⟨3|‘b’|3⟩⟨3|‘c’|3⟩|4⟩⟨5|‘3’|5⟩|6⟩⟨6|⟨2|‘4’|2⟩⟨4|⟨3|‘d’|3⟩⟨3|‘e’|3⟩⟨3|‘f’|3⟩|4⟩⟨5|‘5’|5⟩|6⟩⟨6|⟨2|‘6’|2⟩⟨4|⟨3|‘g’|3⟩⟨3|‘h’|3⟩⟨3|‘i’|3⟩|4⟩⟨5|‘7’|5⟩|6⟩⟨6|⟨2|‘8’|2⟩⟨4|⟨3|‘j’|3⟩⟨3|‘k’|3⟩⟨3|‘l’|3⟩|4⟩⟨5|‘9’|5⟩|6⟩|7⟩⟨8|‘0’|8⟩|9⟩).
@28: 
(⟨9|⟨1|‘1’|1⟩⟨7|⟨6|⟨2|‘2’|2⟩⟨4|⟨3|‘a’|3⟩⟨3|‘b’|3⟩⟨3|‘c’|3⟩|4⟩⟨5|‘3’|5⟩|6⟩⟨6|⟨2|‘4’|2⟩⟨4|⟨3|‘d’|3⟩⟨3|‘e’|3⟩⟨3|‘f’|3⟩|4⟩⟨5|‘5’|5⟩|6⟩⟨6|⟨2|‘6’|2⟩⟨4|⟨3|‘g’|3⟩⟨3|‘h’|3⟩⟨3|‘i’|3⟩|4⟩⟨5|‘7’|5⟩|6⟩⟨6|⟨2|‘8’|2⟩⟨4|⟨3|‘j’|3⟩⟨3|‘k’|3⟩⟨3|‘l’|3⟩|4⟩⟨5|‘9’|5⟩|6⟩⟨6|⟨2|‘0’|2⟩⟨4|⟨3|‘m’|3⟩⟨3|‘n’|3⟩⟨3|‘o’|3⟩⟨3|‘p’|3⟩|4⟩⟨5|‘1’|5⟩|6⟩|7⟩⟨8|‘2’|8⟩|9⟩).
@34: 
(⟨9|⟨1|‘1’|1⟩⟨7|⟨6|⟨2|‘2’|2⟩⟨4|⟨3|‘a’|3⟩⟨3|‘b’|3⟩⟨3|‘c’|3⟩|4⟩⟨5|‘3’|5⟩|6⟩⟨6|⟨2|‘4’|2⟩⟨4|⟨3|‘d’|3⟩⟨3|‘e’|3⟩⟨3|‘f’|3⟩|4⟩⟨5|‘5’|5⟩|6⟩⟨6|⟨2|‘6’|2⟩⟨4|⟨3|‘g’|3⟩⟨3|‘h’|3⟩⟨3|‘i’|3⟩|4⟩⟨5|‘7’|5⟩|6⟩⟨6|⟨2|‘8’|2⟩⟨4|⟨3|‘j’|3⟩⟨3|‘k’|3⟩⟨3|‘l’|3⟩|4⟩⟨5|‘9’|5⟩|6⟩⟨6|⟨2|‘0’|2⟩⟨4|⟨3|‘m’|3⟩⟨3|‘n’|3⟩⟨3|‘o’|3⟩⟨3|‘p’|3⟩|4⟩⟨5|‘1’|5⟩|6⟩⟨6|⟨2|‘2’|2⟩⟨4|⟨3|‘q’|3⟩⟨3|‘r’|3⟩⟨3|‘s’|3⟩⟨3|‘t’|3⟩|4⟩⟨5|‘3’|5⟩|6⟩|7⟩⟨8|‘4’|8⟩|9⟩).
@39: 
(⟨9|⟨1|‘1’|1⟩⟨7|⟨6|⟨2|‘2’|2⟩⟨4|⟨3|‘a’|3⟩⟨3|‘b’|3⟩⟨3|‘c’|3⟩|4⟩⟨5|‘3’|5⟩|6⟩⟨6|⟨2|‘4’|2⟩⟨4|⟨3|‘d’|3⟩⟨3|‘e’|3⟩⟨3|‘f’|3⟩|4⟩⟨5|‘5’|5⟩|6⟩⟨6|⟨2|‘6’|2⟩⟨4|⟨3|‘g’|3⟩⟨3|‘h’|3⟩⟨3|‘i’|3⟩|4⟩⟨5|‘7’|5⟩|6⟩⟨6|⟨2|‘8’|2⟩⟨4|⟨3|‘j’|3⟩⟨3|‘k’|3⟩⟨3|‘l’|3⟩|4⟩⟨5|‘9’|5⟩|6⟩⟨6|⟨2|‘0’|2⟩⟨4|⟨3|‘m’|3⟩⟨3|‘n’|3⟩⟨3|‘o’|3⟩⟨3|‘p’|3⟩|4⟩⟨5|‘1’|5⟩|6⟩⟨6|⟨2|‘2’|2⟩⟨4|⟨3|‘q’|3⟩⟨3|‘r’|3⟩⟨3|‘s’|3⟩⟨3|‘t’|3⟩|4⟩⟨5|‘3’|5⟩|6⟩⟨6|⟨2|‘4’|2⟩⟨4|⟨3|‘u’|3⟩⟨3|‘v’|3⟩⟨3|‘w’|3⟩|4⟩⟨5|‘5’|5⟩|6⟩|7⟩⟨8|‘6’|8⟩|9⟩).
@44: 
(⟨9|⟨1|‘1’|1⟩⟨7|⟨6|⟨2|‘2’|2⟩⟨4|⟨3|‘a’|3⟩⟨3|‘b’|3⟩⟨3|‘c’|3⟩|4⟩⟨5|‘3’|5⟩|6⟩⟨6|⟨2|‘4’|2⟩⟨4|⟨3|‘d’|3⟩⟨3|‘e’|3⟩⟨3|‘f’|3⟩|4⟩⟨5|‘5’|5⟩|6⟩⟨6|⟨2|‘6’|2⟩⟨4|⟨3|‘g’|3⟩⟨3|‘h’|3⟩⟨3|‘i’|3⟩|4⟩⟨5|‘7’|5⟩|6⟩⟨6|⟨2|‘8’|2⟩⟨4|⟨3|‘j’|3⟩⟨3|‘k’|3⟩⟨3|‘l’|3⟩|4⟩⟨5|‘9’|5⟩|6⟩⟨6|⟨2|‘0’|2⟩⟨4|⟨3|‘m’|3⟩⟨3|‘n’|3⟩⟨3|‘o’|3⟩⟨3|‘p’|3⟩|4⟩⟨5|‘1’|5⟩|6⟩⟨6|⟨2|‘2’|2⟩⟨4|⟨3|‘q’|3⟩⟨3|‘r’|3⟩⟨3|‘s’|3⟩⟨3|‘t’|3⟩|4⟩⟨5|‘3’|5⟩|6⟩⟨6|⟨2|‘4’|2⟩⟨4|⟨3|‘u’|3⟩⟨3|‘v’|3⟩⟨3|‘w’|3⟩|4⟩⟨5|‘5’|5⟩|6⟩⟨6|⟨2|‘6’|2⟩⟨4|⟨3|‘x’|3⟩⟨3|‘y’|3⟩⟨3|‘z’|3⟩|4⟩⟨5|‘7’|5⟩|6⟩|7⟩⟨8|‘8’|8⟩|9⟩).
}



  • [1003.1(2024... Austin Group Bug Tracker via austin-group-l at The Open Group
  • [1003.1(2024... Austin Group Bug Tracker via austin-group-l at The Open Group
  • [1003.1(2024... Austin Group Bug Tracker via austin-group-l at The Open Group
  • [1003.1(2024... Austin Group Bug Tracker via austin-group-l at The Open Group
  • [1003.1(2024... Austin Group Bug Tracker via austin-group-l at The Open Group
  • [1003.1(2024... Austin Group Bug Tracker via austin-group-l at The Open Group
  • [1003.1(2024... Austin Group Bug Tracker via austin-group-l at The Open Group
  • [1003.1(2024... Austin Group Bug Tracker via austin-group-l at The Open Group
  • [1003.1(2024... Austin Group Bug Tracker via austin-group-l at The Open Group
  • [1003.1(2024... Austin Group Bug Tracker via austin-group-l at The Open Group
    • Re: [10... Hans Åberg via austin-group-l at The Open Group
  • [1003.1(2024... Austin Group Bug Tracker via austin-group-l at The Open Group
  • [1003.1(2024... Austin Group Bug Tracker via austin-group-l at The Open Group
    • Re: [10... Hans Åberg via austin-group-l at The Open Group
  • [1003.1(2024... Austin Group Bug Tracker via austin-group-l at The Open Group
  • [1003.1(2024... Austin Group Bug Tracker via austin-group-l at The Open Group
  • [1003.1(2024... Austin Group Bug Tracker via austin-group-l at The Open Group
  • [1003.1(2024... Austin Group Bug Tracker via austin-group-l at The Open Group
  • [1003.1(2024... Austin Group Bug Tracker via austin-group-l at The Open Group
  • [1003.1(2024... Austin Group Bug Tracker via austin-group-l at The Open Group
  • [1003.1(2024... Austin Group Bug Tracker via austin-group-l at The Open Group

Reply via email to