On Thu, 14 Dec 2006 02:55:38 +0100 Roland Mainz wrote:
> Glenn Fowler wrote:
> >
> > some users have reported problems with ksh93 vs UTF-8 locales
> > in particular garbage characters being emitted in ksh vi or emacs edit mode
> The last report was about Linux/x86 in interactive mode (non-interactive
> processing of multibyte characters works, too) - right now
> (ast-ksh.20061207) it is working perfectly on Solaris/SPARC and
> Solaris/x86 which means this is a platform-specific hiccup while the
> multibyte code itself is Ok (even with ja_JP.PCK&&zh_CN.GB18030).

> > to date the reports have been anecdotal
> > my (and dgk's) efforts to reproduce the problems have failed
> > possibly due to improper xterm/os/env setup on my part

> What exactly do you miss as development environment ? Just keyboards or
> a machine which has the locales installed or what do you need ?

I don't know enough to know
(I know how to program to the standard interfaces, and know that the
debug locale I devised works, but do not know how to set up and verify
real world multi-byte/multiwidth characters)

step by step instructions with nothing left out would be appreciated
be kind to someone using LANG=C exclusively for > 20 years
but send the strace info (below) first
easier for me to debug bytes than yum and/or linux rpm version dependency hell

> > I'd like to either capture ksh in the act or debunk the anecdotes
> > so that the integration sources can be finalized for this round

> Agreed.

> > if anyone has seen this behavior with LC_ALL=en_US.UTF-8 and
> > a true UTF-8 terminal could you please trace/strace/truss
> > ksh reading the utf-8 chars in and then printing them out
> > e.g., capture the ksh93 process read/write syscalls (with
> > expanded buffer/size data) for a command like
> >
> >         echo ascii<utf-8-char-sequence>ascii
> >
> > my guess is that the reported bad behavior is caused by invalid
> > UTF-8 character input (garbage in) and not a bug with ksh' multibyte
> > char processing -- provide a trace to prove me wrong

> I'll try to post that later today... AFAIK the problem is a bug in ksh93
> because bash works OK in the same setup while ksh fails to handle the
> '????'
> characters in my case (on Linux/x86, Solaris is Ok).

just because bash looks like it works doesn't necessarily mean it is working
it could be that ksh93 is interpreting 8-bit ascii as utf-8 to produce garbage
and bash is interpreting 8-bit ascii verbatim
that's why I'd like a trace of the read/write call data to verify exactly
what bytes are coming from the tty into ksh (and bash too, why not)
and then going back to the tty from ksh (bash)
so strace ksh and bash for the s tzet case so we can see what's going on

thanks

-- Glenn Fowler -- AT&T Research, Florham Park NJ --
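
Once a trace exists, the "garbage in" question above comes down to whether the bytes read from the tty are well-formed UTF-8 or a single 8-bit character. The following is a minimal sketch of that check, not part of ksh93 or of the original thread. The helper names `parse_trace_string` and `classify_bytes` are hypothetical, and the input is assumed to be the quoted buffer from a read/write line of the trace with non-ASCII bytes shown as octal escapes such as `\303\237`. The sample bytes are illustrative, not taken from a real trace.

```python
def parse_trace_string(escaped: str) -> bytes:
    """Turn an escaped trace buffer such as 'ascii\\303\\237ascii' into raw bytes.

    Assumption: non-ASCII bytes appear as backslash-octal escapes and the
    string contains only ASCII characters otherwise.
    """
    # unicode_escape maps each \ooo escape to the code point of the same
    # value; latin-1 then maps code points 0-255 back to single bytes.
    return escaped.encode("latin-1").decode("unicode_escape").encode("latin-1")


def classify_bytes(data: bytes) -> str:
    """Report whether data is plain ASCII, well-formed UTF-8, or neither."""
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # e.g. a lone 8-bit byte that a UTF-8 locale cannot interpret
        return "invalid-utf8"
    return "utf-8"


if __name__ == "__main__":
    # U+00DF encoded as UTF-8 is the two bytes 0xC3 0x9F.
    utf8_input = parse_trace_string(r"ascii\303\237ascii")
    # The same character as a single 8-bit byte (ISO 8859-1) is 0xDF.
    eight_bit_input = parse_trace_string(r"ascii\337ascii")

    print(classify_bytes(utf8_input))
    print(classify_bytes(eight_bit_input))
```

If the bytes ksh reads classify as invalid UTF-8, the terminal is probably sending 8-bit characters and the fault lies in the setup. If they are well-formed UTF-8 on the way in and garbage on the way out, that points at the shell.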
