On 23 June 2012 06:22, Roland Mainz <roland.ma...@nrubsig.org> wrote:
> On Sat, Jun 23, 2012 at 5:55 AM, Glenn Fowler <g...@research.att.com> wrote:
>> On Sat, 23 Jun 2012 03:40:15 +0200 Roland Mainz wrote:
>>> On Sat, Jun 23, 2012 at 2:34 AM, Roland Mainz <roland.ma...@nrubsig.org>
>>> wrote:
>>> > Here's another issue with regex. We wrote the following script to
>>> > parse XML fragments:
>>> > -- snip --
>>> > typeset -r xmltext='<h1 ><div> a text </div>More [TEXT].<!-- a comment
>>> > (<disabled>) --></h1>'
>>> >
>>> > #
>>> > # parse the XML data
>>> > #
>>> > typeset dummy
>>> >
>>> > dummy="${xmltext//~(Ex)(?:
>>> > (<!--.+-->)+?| # xml comments
>>> > (<[[:alnum:]_-:]+
>>> > (?: # attributes
>>> > [:space:]+
>>
>>> Grumpf... this should be [[:space:]]+ ...
>>> > (?:[[:alnum:]_-:]+=[^[:space:]\"\']+)|
>>> > #x='foo=bar huz=123'
>>> > (?:[[:alnum:]_-:]+=\"[^\"]*\")|
>>> > #x='foo="ba=r o" huz=123'
>>> > (?:[[:alnum:]_-:]+=\'[^\"]*\')|
>>> > #x="foo='ba=r o' huz=123"
>>> > (?:[[:alnum:]_-:]+) #x="foox
>>> > huz=123"
>>> > )*
>>> > [:space:]*
>>
>>> Grumpf... this should be [[:space:]]* ...
>>
>>> Erm... David/Glenn... can we get a ~(<modifer>) flag which enables...
>>> 1. ... strict pattern interpretation
>>> 2. ... forces (controlled by ~(<modifer>) ... unless there's something
>>> else which already enabled that elsewhere (like a global shell
>>> option)) ksh93 to print runtime error messages if a pattern fails to
>>> compile
>>> ... please ?
>>
>> well again I think this is regex giving users enough rope to do whatever
>
> Erm... well in this case it gave enough rope for this:
> -- snip --
> print " _________ \n";
> print "| | \n";
> print "| 0 \n";
> print "| /|\\ \n";
> print "| / \\ \n";
> print "| \n";
> print "| \n";
> -- snip --
>
>> [:space:] is a valid RE
>> in this example it happens to be a typo
>> in another context where, e.g., [...] classes are constructed by code,
>> its possible for duplicates to appear in the class
>>
>> regex should not be expected to complain about syntactically correct patterns
>>
>> if the pattern did have a syntax error regcomp() would report it to the
>> caller
>> so reporting an error or not is not a regex issue
>> so it doesn't make sense to add something to ~(...) to check syntax
>> because regcomp() already does it by default (modulo the ast REG_LENIENT
>> flag, which
>> is settable in ~(...))
>
> Uhm... which letter for "flags" in ~(<modifer><flags>) controls
> |REG_LENIENT| ? Looking at
> http://www2.research.att.com/~gsf/testregex/testregex.c it seems to be
> 'x' ... but isn't this ready used for the free-spacing mode ?
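
For anyone following the [:space:] point quoted above: without the outer
brackets, [:space:] is an ordinary bracket expression listing the characters
':', 's', 'p', 'a', 'c' and 'e' (the second ':' is just a duplicate), which is
why it compiles without complaint. A minimal sketch of the difference, using a
plain POSIX shell "case" pattern rather than the ~(E...) regex mode from the
script; "matches" is a hypothetical helper made up for this illustration, and
the sketch assumes a shell with POSIX bracket-expression handling:

```shell
# Hypothetical helper (not from the thread): print "yes" if string $2
# matches the shell pattern $1, "no" otherwise.
matches() {
    case $2 in
        $1) echo yes ;;   # unquoted $1 is interpreted as a pattern
        *)  echo no ;;
    esac
}

matches '[:space:]' 's'      # yes: 's' is one of the listed characters
matches '[:space:]' ' '      # no: a real blank is not in the list
matches '[[:space:]]' ' '    # yes: the class needs the outer brackets
matches '[[:space:]]' 's'    # no: 's' is not whitespace
```

The same reading applies to the [:space:]+ and [:space:]* lines in the quoted
script: they match runs of those letters and colons, not runs of whitespace.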

I think this is ~(Ep), p is on by default and you want ~(E-p).
Glenn, can you confirm this?

Lionel
_______________________________________________
ast-developers mailing list
ast-developers@research.att.com
https://mailman.research.att.com/mailman/listinfo/ast-developers