On Wed, Jul 11, 2012 at 10:23 AM, Lionel Cons <lionelcons1...@googlemail.com> wrote:
> On 1 July 2012 22:56, Lionel Cons <lionelcons1...@googlemail.com> wrote:
>> On 27 June 2012 19:24, Glenn Fowler <g...@research.att.com> wrote:
>>>
>>> On Wed, 27 Jun 2012 18:15:06 +0200 Roland Mainz wrote:
>>>> On Wed, Jun 27, 2012 at 6:04 PM, Glenn Fowler <g...@research.att.com> wrote:
>>>> > On Wed, 27 Jun 2012 17:43:06 +0200 Roland Mainz wrote:
>>>> >> How can I quote '-' in a ~(Ex)-style pattern [...] that it exactly
>>>> >> matches a '-' letter ?
>>>> >> I've tried the following pattern but the result is wrong (it should
>>>> >> match "hello-world" and "foo-bar"):
>>>> >> -- snip --
>>>> >> $ ~/bin/ksh -c 's="hello-world foo-bar" ;
>>>> >> dummy="${s//~(Ex)([_\-[:alnum:]]+)/D}" ; print -v .sh.match'
>>>> >> (
>>>> >> (
>>>> >> hello
>>>> >> world
>>>> >> foo
>>>> >> bar
>>>> >> )
>>>> >> (
>>>> >> hello
>>>> >> world
>>>> >> foo
>>>> >> bar
>>>> >> )
>>>> >> )
>>>> >> -- snip --
>>>> >> I tried to quote the '\' with a 2nd '\' without success (i.e. we get
>>>> >> the same wrong output/matches)
>>>> >> -- snip --
>>>> >> $ ~/bin/ksh -c 's="hello-world foo-bar" ;
>>>> >> dummy="${s//~(Ex)([_\\-[:alnum:]]+)/D}" ; print -v .sh.match'
>>>> >> ...
>>>> >> -- snip --
>>>> >
>>>> >> Looking via dbx/gdb at the strings passed to the regex engine it looks
>>>> >> like ksh93 is either passing no '\' to |_ast_regcomp()| (in the case
>>>> >> of "~(Ex)([_\-[:alnum:]]+)") or it passes two '\' to |_ast_regcomp()|
>>>> >> (in the case of "~(Ex)([_\\-[:alnum:]]+)") ... it looks like a bug in
>>>> >> the ksh93 quoting mechanism for ~(E) patterns... ;-(
>>>> >
>>>> >> The only working workaround I found is to use \x<hex> to avoid having
>>>> >> to use \ to quote the '-' (the output below is IMO the expected one
>>>> >> for "${s//~(Ex)([_\-[:alnum:]]+)/D}"):
>>>> >> -- snip --
>>>> >> $ ~/bin/ksh -c 's="hello-world foo-bar" ;
>>>> >> dummy="${s//~(Ex)([_\x2d[:alnum:]]+)/D}" ; print -v .sh.match'
>>>> >> (
>>>> >> (
>>>> >> hello-world
>>>> >> foo-bar
>>>> >> )
>>>> >> (
>>>> >> hello-world
>>>> >> foo-bar
>>>> >> )
>>>> >> )
>>>> >> -- snip --
>>>> >
>>>> > it's regex syntax and doesn't need a quote
>>>> > at http://pubs.opengroup.org/onlinepubs/9699919799/ section 9.3.5 item 7
>>>> > from that it looks like
>>>> > * if you want literal ']' use one of
>>>> > []...]
>>>> > [^]...]
>>>
>>>> I know...
>>>
>>>> > * if you want literal '-' place it last
>>>> > [...-]
>>>
>>>> ... I didn't know that... ;-/
>>>> Thanks... :-)
>>>
>>>> ... but could you still check why ksh93 "swallows" the single '\' but
>>>> passes two '\' as "\\" to |_ast_regcomp()|, please ? Is this intended
>>>> or somehow a bug or side effect ?
>>>
>>> it's a side effect of the conflict between ksh and regex quoting
>>> if a side has to win it will be ksh in that context
>>> dgk can give more detail on how tricky that part is because
>>> ksh can't be expected to know all of the intricacies of each ~(...) RE syntax
>>> at some point when an RE gets complex enough it will have to be placed in a var
>>> then referencing it as $the_re is guaranteed to get sh and RE quoting right
>>> (or at least pass whatever quoting is present down to regex)
>>
>> I don't think this is going to be useful. Either ksh can be expected
>> to know all of the egrep syntax or knows nothing and passes the
>> pattern through unscathed after user has provided sufficient \ escapes
>> to prevent clashes with ksh syntax.
>> The current situation of "guessing" which side - ksh or ere - will win
>> is NOT acceptable.
>>
>> Try to see it from the point of a POSIX standardisation committee or a
>> code generator which will generate ksh93 code. The POSIX committee
>> won't accept a fuzzy situation as it is right now and a code generator
>> can't be expected to do a trial&error procedure like it is required
>> right now until a pattern fits the needs of ksh's guesswork.
>>
>> if the situation can't be improved then I'd suggest to remove the
>> whole ~(E) feature. While I see the very usefulness the current
>> implementation is completely unacceptable.
>
> So what will be done here? If nothing can be done I'll post a patch to
> wrap ~(E) support in SHOPT_EXPERIMENTAL_PATTERN_MATCHING so we can
> disable this on production machines.

Let's say I'm not happy with the number of issues with ~(....) either, but
please do not take such drastic steps. I think the problem is:
1. Lack of clear rules for how quoting in ~(....), especially ~(E), works
2. Lack of clear documentation of said rules
3. Lack of diagnostics, e.g. a script developer has no way (and I consider
   neither gdb nor dbx options here) to check what the
   grep/egrep/xgrep/pcre pattern looks like when it is passed to regcomp()

Irek
_______________________________________________
ast-developers mailing list
ast-developers@research.att.com
https://mailman.research.att.com/mailman/listinfo/ast-developers
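
For reference, here is a minimal sketch of the bracket-expression rule quoted in the thread (a literal '-' placed last needs no backslash), combined with the suggestion to keep the RE in a variable. It deliberately uses plain sh and grep -E instead of ksh93's ~(E) patterns, so it says nothing about how ksh itself quotes; the variable names (re_last, re_without, count_with, count_without) are illustrative only.

```shell
#!/bin/sh
# Input string from the thread, split into one word per line so that
# grep -x can test each word as a whole.
s="hello-world foo-bar"
words=$(printf '%s\n' "$s" | tr ' ' '\n')

# Single quotes: the shell does no backslash processing here, so the
# regex tool receives exactly these characters.
re_last='[_[:alnum:]-]+'      # '-' last in the bracket expression is literal
re_without='[_[:alnum:]]+'    # same class without '-', for contrast

# Poor man's diagnostic: print the pattern exactly as it will be passed on.
printf 'pattern passed to grep: %s\n' "$re_last"

# -E selects ERE syntax, -x requires the whole line to match,
# -c prints the number of matching lines.
count_with=$(printf '%s\n' "$words" | grep -E -x -c "$re_last" || true)
count_without=$(printf '%s\n' "$words" | grep -E -x -c "$re_without" || true)

printf 'words matched with "-" in the class: %s\n' "$count_with"
printf 'words matched without "-" in the class: %s\n' "$count_without"
```

With the hyphen in the class both words match as a whole; without it neither does. Printing the variable before use is a small stand-in for the missing diagnostics mentioned in point 3: it shows what the shell hands over, although not what ksh93 finally passes to regcomp().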