I said earlier…

> Use U+4DC0 HEXAGRAM FOR THE CREATIVE HEAVEN (䷀) then ☺

I *do* have a follow-up question for that now.

The utf8bug-1 test fails because its output is interpreted as UTF-8,
but the UTF-8 string it is supposed to match was treated as
“extended ASCII” and therefore converted…

So, the situation as it is right now is:

print -n '0\u4DC0' outputs the following octets:
- on an ASCII system : 30 E4 B7 80
- on an EBCDIC system: F0 E4 B7 80

That is, “0” is output in the native codepage, and the Unicode
value is output as real UTF-8 octets.
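For reference, these octets can be checked off-system. A minimal
Python sketch (an illustration only; cp037 stands in for an EBCDIC
codepage here, since Python ships no cp1047 codec, and ‘0’ is 0xF0
in both pages):

```python
# UTF-8 encoding of U+4DC0 HEXAGRAM FOR THE CREATIVE HEAVEN
payload = '\u4DC0'.encode('utf-8')
print(payload.hex(' '))             # e4 b7 80 — the real UTF-8 octets

# "0" in the native codepage:
print('0'.encode('ascii').hex())    # 30 on an ASCII system
print('0'.encode('cp037').hex())    # f0 on an EBCDIC system
```

So the full output of print -n '0\u4DC0' is 30 E4 B7 80 on ASCII
and F0 E4 B7 80 on EBCDIC, as listed above.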

Now you say UTF-8 is not really used on z/OS or EBCDIC systems
in general, so I was considering the following heresy:
- output: F0 43 B3 20

That is, before actually outputting the UTF-8 octets, convert them
to EBCDIC as if they were “extended ASCII”.

Converting F0 43 B3 20 from EBCDIC(1047) to “extended ASCII”
yields 30 E4 B7 80 by the way, see above. (Typos in the manual
conversion notwithstanding.)
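The proposed conversion can be sketched as a round trip in Python.
This is an assumption-laden illustration: cp037 stands in for
EBCDIC 1047 (Python has no cp1047 codec; the two pages differ in a
few positions, which may shift some bytes relative to 1047), and
latin-1 plays the rôle of “extended ASCII”:

```python
# The "heresy": emit UTF-8, then convert the output stream to EBCDIC
# as if the UTF-8 octets were "extended ASCII" (Latin-1).
# cp037 stands in for EBCDIC 1047 (no cp1047 codec in the stdlib).
utf8_out = '0\u4DC0'.encode('utf-8')          # 30 E4 B7 80
ebcdic_out = utf8_out.decode('latin-1').encode('cp037')
print(ebcdic_out.hex(' '))                    # what would go to the terminal

# The conversion is lossless: undoing it recovers the UTF-8 octets,
# so the original Unicode payload survives intact.
roundtrip = ebcdic_out.decode('cp037').encode('latin-1')
assert roundtrip == utf8_out
print(roundtrip.decode('utf-8'))              # back to '0' + U+4DC0
```

The round trip is what makes the automatic conversions consistent:
any tool that converts the EBCDIC stream back to “extended ASCII”
sees valid UTF-8 again.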

This would make all those conversions (which happen automatically)
more consistent. If it doesn’t diminish the usefulness of mksh on
EBCDIC systems, I’d say go for it.

Comments?

Thanks,
//mirabilos
-- 
(gnutls can also be used, but if you are compiling lynx for your own use,
there is no reason to consider using that package)
        -- Thomas E. Dickey on the Lynx mailing list, about OpenSSL