On 23 June 2012 06:22, Roland Mainz <roland.ma...@nrubsig.org> wrote:
> On Sat, Jun 23, 2012 at 5:55 AM, Glenn Fowler <g...@research.att.com> wrote:
>> On Sat, 23 Jun 2012 03:40:15 +0200 Roland Mainz wrote:
>>> On Sat, Jun 23, 2012 at 2:34 AM, Roland Mainz <roland.ma...@nrubsig.org> wrote:
>>> > Here's another issue with regex. We wrote the following script to
>>> > parse XML fragments:
>>> > -- snip --
>>> > typeset -r xmltext='<h1 ><div> a text </div>More [TEXT].<!-- a comment
>>> > (<disabled>) --></h1>'
>>> >
>>> > #
>>> > # parse the XML data
>>> > #
>>> > typeset dummy
>>> >
>>> > dummy="${xmltext//~(Ex)(?:
>>> >        (<!--.+-->)+?|  # xml comments
>>> >        (<[[:alnum:]_-:]+
>>> >                (?: # attributes
>>> >                        [:space:]+
>>
>>> Grumpf... this should be [[:space:]]+ ...
>>> >                        (?:[[:alnum:]_-:]+=[^[:space:]\"\']+)|  #x='foo=bar huz=123'
>>> >                        (?:[[:alnum:]_-:]+=\"[^\"]*\")|         #x='foo="ba=r o" huz=123'
>>> >                        (?:[[:alnum:]_-:]+=\'[^\"]*\')|         #x="foo='ba=r o' huz=123"
>>> >                        (?:[[:alnum:]_-:]+)                     #x="foox huz=123"
>>> >                )*
>>> >                [:space:]*
>>
>>> Grumpf... this should be [[:space:]]* ...
>>
>>> Erm... David/Glenn... can we get a ~(<modifier>) flag which enables...
>>> 1. ... strict pattern interpretation
>>> 2. ... forces (controlled by ~(<modifier>) ... unless there's something
>>> else which already enabled that elsewhere (like a global shell
>>> option)) ksh93 to print runtime error messages if a pattern fails to
>>> compile
>>> ... please ?
>>
>> well again I think this is regex giving users enough rope to do whatever
>
> Erm... well in this case it gave enough rope for this:
> -- snip --
>    print " _________     \n";
>    print "|         |    \n";
>    print "|         0    \n";
>    print "|        /|\\  \n";
>    print "|        / \\  \n";
>    print "|              \n";
>    print "|              \n";
> -- snip --
>
>> [:space:] is a valid RE
>> in this example it happens to be a typo
>> in another context where, e.g., [...] classes are constructed by code,
>> it's possible for duplicates to appear in the class
>>
>> regex should not be expected to complain about syntactically correct patterns
>>
>> if the pattern did have a syntax error regcomp() would report it to the caller
>> so reporting an error or not is not a regex issue
>> so it doesn't make sense to add something to ~(...) to check syntax
>> because regcomp() already does it by default (modulo the ast REG_LENIENT
>> flag, which is settable in ~(...))
>
> Uhm... which letter for "flags" in ~(<modifier><flags>) controls
> |REG_LENIENT| ? Looking at
> http://www2.research.att.com/~gsf/testregex/testregex.c it seems to be
> 'x' ... but isn't this already used for the free-spacing mode ?

I think this is ~(Ep). p is on by default, so you want ~(E-p). Glenn,
can you confirm this?
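
To illustrate Glenn's distinction between a typo that is still a valid RE
and a pattern that really fails to compile, here is a minimal sketch. It
assumes bash and its [[ string =~ regex ]] operator (which goes through the
C library's regcomp()), not ksh93's ~(E...) patterns, so it says nothing
about REG_LENIENT or the ~(...) flags. The names typo, intended, broken and
matches are made up for the example.

```shell
#!/bin/bash
# Sketch only: assumes bash, whose [[ str =~ re ]] uses POSIX EREs via regcomp().

typo='^[:space:]+$'        # bracket expression listing ':', 's', 'p', 'a', 'c', 'e'
intended='^[[:space:]]+$'  # the [:space:] class inside a bracket expression
broken='a(b'               # unbalanced parenthesis: a real ERE syntax error

# matches RE STRING: print "yes" if STRING matches RE, "no" otherwise
matches() {
    if [[ $2 =~ $1 ]]; then echo yes; else echo no; fi
}

typo_on_word=$(matches "$typo" "space")          # the word "space" uses only listed letters
typo_on_blank=$(matches "$typo" "   ")           # blanks are not in the listed set
intended_on_blank=$(matches "$intended" "   ")   # blanks are whitespace

# bash documents an exit status of 2 when the regex itself is syntactically incorrect
if [[ x =~ $broken ]]; then broken_status=0; else broken_status=$?; fi

echo "typo pattern vs 'space':      $typo_on_word"
echo "typo pattern vs blanks:       $typo_on_blank"
echo "intended pattern vs blanks:   $intended_on_blank"
echo "status for broken pattern:    $broken_status"
```

The first pattern compiles without complaint and simply matches the wrong
thing, which is the case regcomp() cannot flag; only the last pattern is
reported as an error to the caller.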

Lionel

_______________________________________________
ast-developers mailing list
ast-developers@research.att.com
https://mailman.research.att.com/mailman/listinfo/ast-developers
