I do not see a mistake in the regular expression itself. Either it's a hard-to-spot quoting issue or an outright bug in ksh93 or the libast regex code.
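To illustrate what a hard-to-spot quoting issue can look like, here is a minimal sketch in plain POSIX shell. It does not exercise the ksh93 pattern code at all; it only shows that inside a double-quoted word the sequences \" and \' are not treated alike. The function name show_quoting is made up for this example, and whether the same rule is applied to the pattern inside a double-quoted ${var//pattern/} expansion is an assumption that would still need checking against ksh93 itself.

```shell
# Hypothetical helper, for illustration only.
# Inside double quotes, a backslash is special only before $, `, ", \ and
# newline. So \" collapses to a bare double quote, while \' keeps its
# backslash and yields two characters.
show_quoting() {
    printf '%s\n' "dq=[\"] sq=[\']"
}

show_quoting
# prints: dq=["] sq=[\']
```

If the regex text goes through this kind of processing before it reaches the regex compiler, the \" branch and the \' branch of the pattern would arrive in different forms, which would fit the symptom that only one quoting style fails.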
Glenn, what do you think?

Olga

On Sat, Jun 23, 2012 at 9:08 AM, Lionel Cons <[email protected]> wrote:
> On 23 June 2012 06:40, Roland Mainz <[email protected]> wrote:
>> On Sat, Jun 23, 2012 at 6:22 AM, Roland Mainz <[email protected]>
>> wrote:
>>> On Sat, Jun 23, 2012 at 5:55 AM, Glenn Fowler <[email protected]> wrote:
>>>> On Sat, 23 Jun 2012 03:40:15 +0200 Roland Mainz wrote:
>>>>> On Sat, Jun 23, 2012 at 2:34 AM, Roland Mainz <[email protected]>
>>>>> wrote:
>>>>> > Here's another issue with regex. We wrote the following script to
>>>>> > parse XML fragments:
>>>>> > -- snip --
>>>>> > typeset -r xmltext='<h1 ><div> a text </div>More [TEXT].<!-- a comment
>>>>> > (<disabled>) --></h1>'
>>>>> >
>>>>> > #
>>>>> > # parse the XML data
>>>>> > #
>>>>> > typeset dummy
>>>>> >
>>>>> > dummy="${xmltext//~(Ex)(?:
>>>>> > (<!--.+-->)+?| # xml comments
>>>>> > (<[[:alnum:]_-:]+
>>>>> > (?: # attributes
>>>>> > [:space:]+
>>>>
>>>>> Grumpf... this should be [[:space:]]+ ...
>>>>> > (?:[[:alnum:]_-:]+=[^[:space:]\"\']+)|
>>>>> > #x='foo=bar huz=123'
>>>>> > (?:[[:alnum:]_-:]+=\"[^\"]*\")|
>>>>> > #x='foo="ba=r o" huz=123'
>>>>> > (?:[[:alnum:]_-:]+=\'[^\"]*\')|
>>>>> > #x="foo='ba=r o' huz=123"
>>>>> > (?:[[:alnum:]_-:]+) #x="foox
>>>>> > huz=123"
>>>>> > )*
>>>>> > [:space:]*
>>>>
>>>>> Grumpf... this should be [[:space:]]* ...
>>>>
>>>>> Erm... David/Glenn... can we get a ~(<modifer>) flag which enables...
>>>>> 1. ... strict pattern interpretation
>>>>> 2. ... forces (controlled by ~(<modifer>) ... unless there's something
>>>>> else which already enabled that elsewhere (like a global shell
>>>>> option)) ksh93 to print runtime error messages if a pattern fails to
>>>>> compile
>>>>> ... please ?
>>>>
>>>> well again I think this is regex giving users enough rope to do whatever
>>
>> BTW: Can you look at the script in
>> http://opensolaris.pastebin.ca/2164009 and tell me why it works when
>> the variable "working" is set to "true" and fails when the variable is
>> set to "false" ? The difference is the style of XML attribute value
>> quoting used in the embedded test string, e.g. ...
>> -- snip --
>> if ${working} ; then
>> typeset -r xmltext=$'<h1 style=\'foo\' h=\'bar\'><div> a text
>> </div>More [TEXT].<!-- a comment (<disabled>) --></h1>'
>> else
>> typeset -r xmltext=$'<h1 style=\'foo\' h="bar"><div> a text
>> </div>More [TEXT].<!-- a comment (<disabled>) --></h1>'
>> fi
>> -- snip --
>>
>> I don't see why name='value' should be matched differently than
>> name="value" by this pattern in the script:
>> -- snip --
>> dummy="${xmltext//~(Ex)(?:
>> (<!--.+-->)+?| # xml comments
>> (<[[:alnum:]_-:]+
>> (?: # attributes
>> [[:space:]]+
>> (?:[[:alnum:]_-:]+=[^[:space:]\"]+?)| #x='foo=bar
>> huz=123'
>> (?:[[:alnum:]_-:]+=\"[^\"]*?\")| #x='foo="ba=r
>> o" huz=123'
>> (?:[[:alnum:]_-:]+=\'[^\']*?\')| #x="foo='ba=r
>> o' huz=123"
>> (?:[[:alnum:]_-:]+) #x="foox
>> huz=123"
>> )*
>> [[:space:]]*
>> \/? # start tags which are end tags, too (like <foo\/>)
>> >)+?| # xml start tags
>> (<\/[[:alnum:]_-:]+>)+?| # xml end tags
>> ([^><]+) # xml text
>> )/D}"
>> -- snip --
>>
>> Do you see anything suspicious ?
>
> The suspicious part is that your ${s//../..} <expression> is wrapped
> in double-quotes and your matching fails when it tries to match
> double-quotes. I suspect there's something which causes the shell to
> misparse the regex near the double-quotes.
>
> Lionel
>
> _______________________________________________
> ast-developers mailing list
> [email protected]
> https://mailman.research.att.com/mailman/listinfo/ast-developers

--
      ,   _                                    _   ,
     { \/`o;====-    Olga Kryzhanovska   -====;o`\/ }
.----'-/`-/     [email protected]   \-`\-'----.
 `'-..-| /       http://twitter.com/fleyta     \ |-..-'`
      /\/\     Solaris/BSD//C/C++ programmer   /\/\
      `--`                                      `--`
