On Sat, 23 Jun 2012 22:38:36 +0200 Olga Kryzhanovska wrote:
> I do not see a mistake in the regular expression itself. Either it's a
> hard-to-spot quoting issue or a genuine bug in ksh93 or libast regex.

> Glenn, what do you think?

to rule out any possible ksh quoting conflict,
express the pattern and subject string as a testregex test;
that will make it easy to rule regex itself in or out
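
A rough way to take ksh quoting out of the picture, sketched here with grep -E standing in for a testregex run. This is an assumption-laden stand-in: grep is a different engine from libast regex, so it only checks the pattern logic, not libast itself. The pattern is a simplified ERE rewrite of the one quoted below: no (?:...) groups, no non-greedy operators, and the name class is written [[:alnum:]_:-] with the hyphen last so that it is a literal. The variable and function names are made up for illustration.

```shell
#!/bin/sh
# Subject strings: identical except for the quoting of the h attribute.
single="<h1 style='foo' h='bar'><div> a text </div>More [TEXT].<!-- a comment (<disabled>) --></h1>"
double="<h1 style='foo' h=\"bar\"><div> a text </div>More [TEXT].<!-- a comment (<disabled>) --></h1>"

# Simplified ERE pieces (hypothetical names, not from the original script).
name='[[:alnum:]_:-]+'                              # hyphen last, so it is a literal
value="(\"[^\"]*\"|'[^']*'|[^[:space:]\"'>]+)"      # "..." or '...' or a bare value
attr="[[:space:]]+${name}(=${value})?"
pattern="<!--.+-->|<${name}(${attr})*[[:space:]]*/?>|</${name}>|[^><]+"

# Print how many tokens grep -E -o splits the subject into.
count_tokens() {
    printf '%s\n' "$1" | grep -E -o -e "$pattern" | wc -l | tr -d ' '
}

echo "single-quoted attribute: $(count_tokens "$single") tokens"
echo "double-quoted attribute: $(count_tokens "$double") tokens"
```

If both lines report the same count, the pattern logic treats the two quoting styles alike, which points back at shell quoting or at the regex library rather than at the expression.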

> Olga

> On Sat, Jun 23, 2012 at 9:08 AM, Lionel Cons <[email protected]> wrote:
> > On 23 June 2012 06:40, Roland Mainz <[email protected]> wrote:
> >> On Sat, Jun 23, 2012 at 6:22 AM, Roland Mainz <[email protected]> wrote:
> >>> On Sat, Jun 23, 2012 at 5:55 AM, Glenn Fowler <[email protected]> wrote:
> >>>> On Sat, 23 Jun 2012 03:40:15 +0200 Roland Mainz wrote:
> >>>>> On Sat, Jun 23, 2012 at 2:34 AM, Roland Mainz <[email protected]> wrote:
> >>>>> > Here's another issue with regex. We wrote the following script to
> >>>>> > parse XML fragments:
> >>>>> > -- snip --
> >>>>> > typeset -r xmltext='<h1 ><div> a text </div>More [TEXT].<!-- a comment (<disabled>) --></h1>'
> >>>>> >
> >>>>> > #
> >>>>> > # parse the XML data
> >>>>> > #
> >>>>> > typeset dummy
> >>>>> >
> >>>>> > dummy="${xmltext//~(Ex)(?:
> >>>>> >        (<!--.+-->)+?|  # xml comments
> >>>>> >        (<[[:alnum:]_-:]+
> >>>>> >                (?: # attributes
> >>>>> >                        [:space:]+
> >>>>
> >>>>> Grumpf... this should be [[:space:]]+ ...
> >>>>> >                        (?:[[:alnum:]_-:]+=[^[:space:]\"\']+)|  #x='foo=bar huz=123'
> >>>>> >                        (?:[[:alnum:]_-:]+=\"[^\"]*\")|         #x='foo="ba=r o" huz=123'
> >>>>> >                        (?:[[:alnum:]_-:]+=\'[^\"]*\')|         #x="foo='ba=r o' huz=123"
> >>>>> >                        (?:[[:alnum:]_-:]+)                     #x="foox huz=123"
> >>>>> >                )*
> >>>>> >                [:space:]*
> >>>>
> >>>>> Grumpf... this should be [[:space:]]* ...
> >>>>
> >>>>> Erm... David/Glenn... can we get a ~(<modifier>) flag which enables...
> >>>>> 1. ... strict pattern interpretation
> >>>>> 2. ... forces (controlled by ~(<modifier>) ... unless there's something
> >>>>> else which already enabled that elsewhere (like a global shell
> >>>>> option)) ksh93 to print runtime error messages if a pattern fails to
> >>>>> compile
> >>>>> ... please ?
> >>>>
> >>>> well again I think this is regex giving users enough rope to do whatever
> >>
> >> BTW: Can you look at the script in
> >> http://opensolaris.pastebin.ca/2164009 and tell me why it works when
> >> the variable "working" is set to "true" and fails when the variable is
> >> set to "false" ? The difference is the style of XML attribute value
> >> quoting used in the embedded test string, e.g. ...
> >> -- snip --
> >> if ${working} ; then
> >>        typeset -r xmltext=$'<h1 style=\'foo\' h=\'bar\'><div> a text </div>More [TEXT].<!-- a comment (<disabled>) --></h1>'
> >> else
> >>        typeset -r xmltext=$'<h1 style=\'foo\' h="bar"><div> a text </div>More [TEXT].<!-- a comment (<disabled>) --></h1>'
> >> fi
> >> -- snip --
> >>
> >> I don't see why name='value' should be matched differently than
> >> name="value" by this pattern in the script:
> >> -- snip --
> >> dummy="${xmltext//~(Ex)(?:
> >>        (<!--.+-->)+?|  # xml comments
> >>        (<[[:alnum:]_-:]+
> >>                (?: # attributes
> >>                        [[:space:]]+
> >>                        (?:[[:alnum:]_-:]+=[^[:space:]\"]+?)|   #x='foo=bar huz=123'
> >>                        (?:[[:alnum:]_-:]+=\"[^\"]*?\")|        #x='foo="ba=r o" huz=123'
> >>                        (?:[[:alnum:]_-:]+=\'[^\']*?\')|        #x="foo='ba=r o' huz=123"
> >>                        (?:[[:alnum:]_-:]+)                     #x="foox huz=123"
> >>                )*
> >>                [[:space:]]*
> >>                \/?     # start tags which are end tags, too (like <foo\/>)
> >>        >)+?|                           # xml start tags
> >>        (<\/[[:alnum:]_-:]+>)+?|        # xml end tags
> >>        ([^><]+)                        # xml text
> >>        )/D}"
> >> -- snip --
> >>
> >> Do you see anything suspicious ?
> >
> > The suspicious part is that your ${s//../..} <expression> is wrapped
> > in double-quotes and your matching fails when it tries to match
> > double-quotes. I suspect there's something which causes the shell to
> > misparse the regex near the double-quotes.
> >
> > Lionel
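
One thing that can be checked in isolation is how a plain double-quoted string treats the two quote characters, since the shell does not handle them symmetrically there. A minimal sketch in POSIX sh follows. The variable names dq and sq are made up for illustration, and whether the same rule applies inside a double-quoted ${xmltext//.../...} expansion in ksh93 is an assumption that this does not settle.

```shell
#!/bin/sh
# Inside double quotes a backslash stays special only before $ ` " \ and newline.
dq="\""     # backslash is consumed: one character, "
sq="\'"     # backslash is kept: two characters, \ and '

printf 'dq=<%s> length=%s\n' "$dq" "${#dq}"
printf 'sq=<%s> length=%s\n' "$sq" "${#sq}"
```

If the pattern text that reaches regex carries a stray backslash in one alternative but not in the other, that would be one way for name='value' and name="value" to end up matched differently.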
> >
> > _______________________________________________
> > ast-developers mailing list
> > [email protected]
> > https://mailman.research.att.com/mailman/listinfo/ast-developers

> -- 
>       ,   _                                    _   ,
>      { \/`o;====-    Olga Kryzhanovska   -====;o`\/ }
> .----'-/`-/     [email protected]   \-`\-'----.
>  `'-..-| /       http://twitter.com/fleyta     \ |-..-'`
>       /\/\     Solaris/BSD//C/C++ programmer   /\/\
>       `--`                                      `--`

