On Sat, Jun 23, 2012 at 5:55 AM, Glenn Fowler <[email protected]> wrote:
> On Sat, 23 Jun 2012 03:40:15 +0200 Roland Mainz wrote:
>> On Sat, Jun 23, 2012 at 2:34 AM, Roland Mainz <[email protected]> wrote:
>> > Here's another issue with regex. We wrote the following script to
>> > parse XML fragments:
>> > -- snip --
>> > typeset -r xmltext='<h1 ><div> a text </div>More [TEXT].<!-- a comment (<disabled>) --></h1>'
>> >
>> > #
>> > # parse the XML data
>> > #
>> > typeset dummy
>> >
>> > dummy="${xmltext//~(Ex)(?:
>> > (<!--.+-->)+?| # xml comments
>> > (<[[:alnum:]_-:]+
>> > (?: # attributes
>> > [:space:]+
>
>> Grumpf... this should be [[:space:]]+ ...
>> > (?:[[:alnum:]_-:]+=[^[:space:]\"\']+)| #x='foo=bar huz=123'
>> > (?:[[:alnum:]_-:]+=\"[^\"]*\")| #x='foo="ba=r o" huz=123'
>> > (?:[[:alnum:]_-:]+=\'[^\"]*\')| #x="foo='ba=r o' huz=123"
>> > (?:[[:alnum:]_-:]+) #x="foox huz=123"
>> > )*
>> > [:space:]*
>
>> Grumpf... this should be [[:space:]]* ...
>
>> Erm... David/Glenn... can we get a ~(<modifier>) flag which enables...
>> 1. ... strict pattern interpretation
>> 2. ... forces (controlled by ~(<modifier>) ... unless there's something
>> else which already enabled that elsewhere (like a global shell
>> option)) ksh93 to print runtime error messages if a pattern fails to
>> compile
>> ... please ?
>
> well again I think this is regex giving users enough rope to do whatever
Erm... well in this case it gave enough rope for this:
-- snip --
print " _________ \n";
print "| | \n";
print "| 0 \n";
print "| /|\\ \n";
print "| / \\ \n";
print "| \n";
print "| \n";
-- snip --
> [:space:] is a valid RE
> in this example it happens to be a typo
> in another context where, e.g., [...] classes are constructed by code,
> it's possible for duplicates to appear in the class
>
> regex should not be expected to complain about syntactically correct patterns
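
To illustrate the point above: a bare [:space:] is an ordinary bracket expression that matches any one of the characters ':', 's', 'p', 'a', 'c', 'e', so it compiles without complaint. The sketch below assumes a shell with the [[ ... =~ ... ]] operator (bash here; ksh93 has the operator too but goes through the AST regex library). The helper name re_match is made up for this example.

```shell
# Hypothetical helper: succeed if string $2 matches extended regex $1.
re_match() {
    [[ $2 =~ $1 ]]
}

typo='^[:space:]+$'       # bracket expression: one of ':' 's' 'p' 'a' 'c' 'e'
fixed='^[[:space:]]+$'    # character class: whitespace characters

re_match "$typo"  'space:' && echo "typo pattern matches 'space:'"
re_match "$typo"  '   '    || echo "typo pattern does not match blanks"
re_match "$fixed" '   '    && echo "fixed pattern matches blanks"
```

Both patterns are syntactically valid; they just describe different sets of characters, which is why no error is raised for the typo.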
>
> if the pattern did have a syntax error regcomp() would report it to the caller
> so reporting an error or not is not a regex issue
> so it doesn't make sense to add something to ~(...) to check syntax
> because regcomp() already does it by default (modulo the ast REG_LENIENT
> flag, which is settable in ~(...))
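
A rough way to see the difference between "valid but not what was meant" and "does not compile" from a script, assuming bash, where [[ ... =~ ... ]] documents an exit status of 2 for a syntactically incorrect regular expression. How ksh93 surfaces the same failure is not shown here. The helper name re_status is made up for this example.

```shell
# Hypothetical helper: print the status of matching $2 against regex $1.
# In bash: 0 = match, 1 = no match, 2 = the regex did not compile.
re_status() {
    rc=0
    [[ $2 =~ $1 ]] 2>/dev/null || rc=$?
    echo "$rc"
}

re_status '[:space:]+' 'abc'    # compiles; 'a' and 'c' are in the set
re_status '(unclosed'  'abc'    # unbalanced parenthesis, rejected by regcomp()
```

The first call reports an ordinary match result, because the pattern is well-formed; only the second reports a compile failure to the caller.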
Uhm... which letter for "flags" in ~(<modifier><flags>) controls
|REG_LENIENT| ? Looking at
http://www2.research.att.com/~gsf/testregex/testregex.c it seems to be
'x' ... but isn't this already used for the free-spacing mode ?
----
Bye,
Roland
--
__ . . __
(o.\ \/ /.o) [email protected]
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)