I had focused on the other part of this thread and forgotten about this part
you are correct that [^...] triggers single byte mode
the [...] parser has an optimization for single byte mode, implemented
by a 256 byte table, which is preferable over the multibyte mode,
which is implemented by callout functions
the optimizer should have selected multibyte mode for a negated set
([^...]) in a multibyte locale but didn't -- it's fixed now
thanks
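
for anyone following the thread, here is a rough python sketch of why
the wrong mode produces the garbage bytes. it is an illustration only,
not the ast regex code: the function names are made up, and the excluded
set (just < and >) is an assumption chosen to keep the example small.

```python
# Illustrative sketch only; none of these names come from the ast sources.

def make_byte_table(excluded):
    """Build a 256-entry table for a negated set: True means the byte matches."""
    table = [True] * 256
    for b in excluded:
        table[b] = False
    return table

def match_single_byte(data, table):
    """Single byte mode: test one byte at a time with a table lookup.
    Fast, but a multibyte character is split into its individual bytes."""
    return [bytes([b]) for b in data if table[b]]

def match_multibyte(text, excluded):
    """Multibyte mode: test one decoded character at a time.
    Slower (a per-character check), but characters stay whole."""
    return [ch for ch in text if ch not in excluded]

s = "bye \u20ac."                      # the euro sign is 3 bytes in UTF-8
table = make_byte_table(b"<>")
single = match_single_byte(s.encode("utf-8"), table)
multi = match_multibyte(s, "<>")

print(single)   # the euro sign shows up as three separate high-bit bytes
print(multi)    # the euro sign shows up as one character
```

a negated set matches every byte that is not listed, so in single byte
mode each byte of a UTF-8 sequence matches on its own, which is where
the invalid single-byte values in .sh.match come from.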

On Thu, 21 Jun 2012 02:26:49 +0200 Roland Mainz wrote:
> Here is another issue related to using ([^[><]]+)+? in an egrep pattern.

> Running the following example with ast-ksh.2012-06-12 in the
> en_US.UTF-8 locale on Solaris 11/AMD64 prints single-byte values with
> the 7th bit set (i.e. illegal in UTF-8; and if you look closer, the
> final "." of the input string goes missing, too):
> -- snip --
> $ ksh -c $'s="bye bye \u[20ac]." ;
> dummy="${s//~(E)(?:([^[><]]+)+?)/dummy}" ; print -v .sh.match'
> (
>         (
>                 b
>                 y
>                 e
>                 ' '
>                 b
>                 y
>                 e
>                 ' '
>                 ??GARBAGE??
>                 ??GARBAGE??
>                 ??GARBAGE??
>         )
>         (
>                 b
>                 y
>                 e
>                 ' '
>                 b
>                 y
>                 e
>                 ' '
>                 ??GARBAGE??
>                 ??GARBAGE??
>                 ??GARBAGE??
>         )
> )
> -- snip --

> I've replaced the invalid byte sequences with the text "??GARBAGE??"
> here since not all email applications will display them.

> ----

> Bye,
> Roland

> P.S.: Technically these are two bugs: 1. ([^[><]]+)+? triggers
> single-byte interpretation and 2. that print -v .sh.match doesn't put
> the single-byte values into something like $'\xFF' ...

> -- 
>   __ .  . __
>  (o.\ \/ /.o) roland.ma...@nrubsig.org
>   \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
>   /O /==\ O\  TEL +49 641 3992797
>  (;O/ \/ \O;)

_______________________________________________
ast-developers mailing list
ast-developers@research.att.com
https://mailman.research.att.com/mailman/listinfo/ast-developers
