On 2018-10-28 21:04, Karsten Hilbert wrote:
On Sun, Oct 28, 2018 at 09:43:27PM +0100, Karsten Hilbert wrote:

Let me try to explain the expression I am actually after
(assuming .compile with re.VERBOSE):

rx_works = r'''
        \$<             # start of match is literal '$<' anywhere inside string
        [^<:]+?::       # followed by at least one "character", except '<' or ':', until the next '::' (this is the placeholder "name")
        .*?::           # followed by any number of any "character", until the next '::' (this is the placeholder "options")
        \d*?            # followed by any number of digits (the max length of placeholder output)
        >\$             # followed by '>$'
        |               # -- OR (in *either* order) --
        \$<             # start of match is literal '$<' anywhere inside string
        [^<:]+?::       # followed by at least one "character", except '<' or ':', until the next '::' (this is the placeholder "name")
        .*?::           # followed by any number of any "character", until the next '::' (this is the placeholder "options")
                        # now the difference:
        \d+-\d+         # followed by one-or-many digits, a '-', and one-or-many digits (this is the *range* from within placeholder output)
        >\$             # followed by '>$'
'''
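
For illustration, here is a minimal sketch of how an expression of this shape behaves when compiled with re.VERBOSE. The names RX_PLACEHOLDER and find_placeholder and the sample strings are made up for this example and are not from the original code.

```python
import re

# Both alternatives from the description above, written compactly.
# re.VERBOSE ignores the whitespace and the '#' comments in the pattern.
RX_PLACEHOLDER = re.compile(r'''
    \$< [^<:]+? :: .*? :: \d*?     >\$   # name::options::length
    |
    \$< [^<:]+? :: .*? :: \d+-\d+  >\$   # name::options::from-until
''', re.VERBOSE)


def find_placeholder(text):
    """Return the first placeholder found in text, or None."""
    match = RX_PLACEHOLDER.search(text)
    return match.group(0) if match else None


# Hypothetical sample strings, one placeholder (or none) per string.
samples = [
    '$<name::options::5>$',        # length
    '$<name::options::2-10>$',     # from-until
    '$<name::options::>$',         # empty range (allowed by \d*?)
    '$<name::options>$',           # only one '::', so no match
    'text $<lastname::::20>$ more',
]
results = [find_placeholder(s) for s in samples]
```

Each sample is searched on its own here, so the sketch says nothing yet about lines that hold several placeholders.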

Another try:

- lines can contain several placeholders

- placeholders start and end with '$'

- placeholders are parsed in three passes

- the pass in which a placeholder is parsed is denoted by the number of '<' and 
'>' next to the '$':

        $<...>$ / $<<...>>$ / $<<<...>>>$

- placeholders for different parsing passes must be nestable:

        $<<<...$<...>$...>>>$
        ....
        (lower=earlier parsing passes will be inside)

- the internal structure is "name::options::range"

        $<name::options::range>$

- name will *not* contain '$' '<' '>' ':'

- range can be either a length or a "from-until"

- a length will be a positive integer (no bounds checking)

- "from-until" is: a positive integer, a '-', and a positive integer (no sanity 
checking)

- options needs to be able to contain nearly anything, except '::'
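
One way to read the rules above is as a small pattern factory, one expression per parsing pass. This is only a sketch under assumptions: the function name placeholder_regex and the group names are invented here, the range is assumed to be optional, and "options" is taken to end at the first '::'.

```python
import re


def placeholder_regex(pass_no):
    """Build a regex for placeholders of one parsing pass (hypothetical helper).

    pass_no is the number of '<' and '>' next to the '$' (1, 2 or 3).
    """
    opener = re.escape('$' + '<' * pass_no)
    closer = re.escape('>' * pass_no + '$')
    return re.compile(
        opener
        + r'(?P<name>[^$<>:]+)::'           # name: no '$', '<', '>' or ':'
        + r'(?P<options>(?:(?!::).)*?)::'   # options: anything up to the next '::'
        + r'(?P<range>\d+-\d+|\d*)'         # from-until, a length, or empty (assumption)
        + closer
    )


rx_pass1 = placeholder_regex(1)
rx_pass2 = placeholder_regex(2)

# Made-up nested example: a pass-1 placeholder inside a pass-2 placeholder.
nested = '$<<outer::$<inner::x::3>$::10>>$'
inner_match = rx_pass1.search(nested)

# Once the inner placeholder has been replaced, the outer one becomes matchable.
outer_match = rx_pass2.search('$<<outer::Bob::2-10>>$')
```

Because the name may not begin with '<', the pass-1 expression does not match at the start of a '$<<' placeholder. In this sketch the outer placeholder only matches after the inner one has been filled in, because its options would otherwise contain '::'.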


Is that sufficiently defined and helpful to design the regular expression?

How can they be nested inside one another?
Is the string scanned, placeholders filled in for that level, and then the string scanned again for the next level? (That would mean that the fill value itself will be scanned in the next pass.)

You could try matching the top level, for each match then match the next level, and for each of those matches then match for the final level.

Trying to do it all in one regex is usually a bad idea. Keep it simple! (Do you even need to use a regex?)
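
As a rough sketch of the "scan, fill in, scan again" variant asked about above, one simple regex per pass can be applied in a loop. Everything named here is hypothetical: fill_placeholders, demo_resolve and the sample values are invented for illustration, and the range handling only covers the "max length" case.

```python
import re


def fill_placeholders(line, resolve, passes=3):
    """Fill placeholders pass by pass, rescanning the line each time."""
    for pass_no in range(1, passes + 1):
        rx = re.compile(
            re.escape('$' + '<' * pass_no)
            + r'([^$<>:]+)::((?:(?!::).)*?)::(\d+-\d+|\d*)'
            + re.escape('>' * pass_no + '$')
        )
        # resolve() gets the pass number plus name, options and range.
        line = rx.sub(lambda m: resolve(pass_no, *m.groups()), line)
    return line


def demo_resolve(pass_no, name, options, rng):
    """Made-up resolver: look up a value and append the options text."""
    values = {'first': 'Ada', 'greeting': 'Hello'}  # invented sample data
    text = (values[name] + ' ' + options).strip()
    # Only a plain length is applied here; from-until is left untouched.
    return text[:int(rng)] if rng.isdigit() else text


filled = fill_placeholders('A: $<<greeting::$<first::::2>$::>>$!', demo_resolve)
```

With this approach the value filled in during one pass is itself scanned in the next pass, which may or may not be what is wanted.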
--
https://mail.python.org/mailman/listinfo/python-list
