Hi!

----

Here is another issue related to using ([^[><]]+)+? in an egrep pattern.
Running the following example with ast-ksh.2012-06-12 in the en_US.UTF-8
locale on Solaris 11/AMD64 prints single-byte values with the high bit
(bit 7) set, i.e. byte sequences that are illegal in UTF-8. If you look
closer, the final "." of the input string goes missing, too:
-- snip --
$ ksh -c $'s="bye bye \u[20ac]." ; dummy="${s//~(E)(?:([^[><]]+)+?)/dummy}" ; print -v .sh.match'
(
        (
                b
                y
                e
                ' '
                b
                y
                e
                ' '
                ??GARBAGE??
                ??GARBAGE??
                ??GARBAGE??
        )
        (
                b
                y
                e
                ' '
                b
                y
                e
                ' '
                ??GARBAGE??
                ??GARBAGE??
                ??GARBAGE??
        )
)
-- snip --

I've replaced the invalid byte sequences with the text "??GARBAGE??"
here, since not all email applications will display them correctly.

----

Bye,
Roland

P.S.: Technically these are two bugs:
1. ([^[><]]+)+? triggers single-byte interpretation, and
2. print -v .sh.match doesn't put the single-byte values into
   something like $'\xFF' ...

-- 
  __ .  . __
 (o.\ \/ /.o) roland.ma...@nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
_______________________________________________
ast-developers mailing list
ast-developers@research.att.com
https://mailman.research.att.com/mailman/listinfo/ast-developers
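
The observation above, that the printed values are illegal in UTF-8, can be checked without relying on how a terminal or mail client renders them. The following is only a minimal sketch in plain POSIX sh, not part of ksh or of the original report: the helper name is_valid_utf8 is hypothetical, and it assumes an iconv that exits non-zero on malformed input. The byte values E2 82 AC are the standard UTF-8 encoding of U+20AC.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if stdin is well-formed UTF-8.
# Assumption: iconv exits non-zero when it meets an invalid sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# The intact input string: "bye bye " + U+20AC (bytes E2 82 AC) + "."
if printf 'bye bye \342\202\254.' | is_valid_utf8; then
    echo "intact string: valid UTF-8"
fi

# The same three bytes split into separate single-byte values,
# which is what a single-byte interpretation of U+20AC would yield.
if ! printf '\342 \202 \254' | is_valid_utf8; then
    echo "split bytes: not valid UTF-8"
fi

# Dump the raw bytes in hex so they can be quoted safely in an email.
printf '\342 \202 \254' | od -An -tx1
```

In practice the output of the ksh command from the report would be piped into is_valid_utf8 or od in place of the printf calls shown here.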