On Wed, 2017 Apr 26 14:25+0000, Thorsten Glaser wrote:
> 
> Thanks, that looks promising… for cp1047 anyway.
>
> >> Can you run this in both codepages, and possibly their Euro
> >> equivalents?
> >
> >I'm afraid I'm not able to switch the codepage. Some searching
> >indicates that this can be done in a shell with e.g.
> […]
> 
> No problem.

Good news!

I played with this some more, and found what was missing: a call to
setlocale().

So I added <locale.h>, and experimentally, this line...

    /* very much NOT Latin-1 compatible */
    setlocale(LC_ALL, "Ru_RU.IBM-1025");

...and the result was

 00 01 02 03 9C 09 86 7F 97 8D 8E 0B 0C 0D 0E 0F
 10 11 12 13 9D 0A 08 87 18 19 92 8F 1C 1D 1E 1F
 80 81 82 83 84 85 17 1B 88 89 8A 8B 8C 05 06 07
 90 91 16 93 94 95 96 04 98 99 9A 9B 14 15 9E 1A
 20 A0 A1 A2 A3 A4 A5 A6 A8 A9 5B 2E 3C 28 2B 21
 26 AA AB AC AE AF B0 B1 B2 B3 5D 24 2A 29 3B 5E
 2D 2F B4 B5 B6 B7 B8 B9 BA BB 7C 2C 25 5F 3E 3F
 BC BD BE AD BF C0 C1 C2 C3 60 3A 23 40 27 3D 22
 C4 61 62 63 64 65 66 67 68 69 C5 C6 C7 C8 C9 CA
 CB 6A 6B 6C 6D 6E 6F 70 71 72 CC CD CE CF D0 D1
 D2 7E 73 74 75 76 77 78 79 7A D3 D4 D5 D6 D7 D8
 D9 DA DB DC DD DE DF E0 E1 E2 E3 E4 E5 E6 E7 E8
 7B 41 42 43 44 45 46 47 48 49 E9 EA EB EC ED EE
 7D 4A 4B 4C 4D 4E 4F 50 51 52 EF F0 F1 F2 F3 F4
 5C A7 53 54 55 56 57 58 59 5A F5 F6 F7 F8 F9 FA
 30 31 32 33 34 35 36 37 38 39 FB FC FD FE FF 9F
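
For anyone who wants to reproduce a table like this, here is a minimal sketch of the idea. It is not the actual test program. It assumes that z/OS declares __etoa_l() in <unistd.h> as int __etoa_l(char *, int), returning -1 on failure. The helper names (fill_identity, convert_table, dump_table, make_table) are made up for illustration. Off z/OS the conversion step is a no-op, so the output is just 00..FF in order.

```c
#include <locale.h>
#include <stdio.h>
#ifdef __MVS__
#include <unistd.h>   /* assumed to declare __etoa_l() on z/OS */
#endif

/* Fill buf with the byte values 0x00..0xFF in order. */
static void fill_identity(unsigned char buf[256])
{
    int i;

    for (i = 0; i < 256; i++)
        buf[i] = (unsigned char)i;
}

/*
 * Convert buf in place according to the current locale.
 * Returns 0 on success, -1 on failure. On anything other than
 * z/OS this is a no-op, so the sketch still compiles and runs.
 */
static int convert_table(unsigned char buf[256])
{
#ifdef __MVS__
    /* Assumption: __etoa_l() returns -1 on error. */
    return __etoa_l((char *)buf, 256) < 0 ? -1 : 0;
#else
    (void)buf;
    return 0;
#endif
}

/* Print 16 rows of 16 hex bytes, in the layout used in this mail. */
static void dump_table(const unsigned char buf[256], FILE *out)
{
    int i;

    for (i = 0; i < 256; i++)
        fprintf(out, " %02X%s", (unsigned)buf[i],
                (i % 16 == 15) ? "\n" : "");
}

/* Select the locale, then build the conversion table in buf. */
static int make_table(const char *locale_name, unsigned char buf[256])
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;   /* locale not available on this system */
    fill_identity(buf);
    return convert_table(buf);
}
```

Typical use would be make_table("Ru_RU.IBM-1025", buf) followed by dump_table(buf, stdout), checking the return value of the former in case the locale is not installed.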

According to this page...

    https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.2.0/com.ibm.zos.v2r2.cbcpx01/locnamc.htm

...the input should have been converted to ISO 8859-5.

So it seems like maybe the IBM docs are a bit flexible in what they mean
when they say "ISO 8859-1" :-]

I think what they really meant to say is "ASCII-compatible encoding." If
you look at the chcp(1) man page, for example...

    https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.2.0/com.ibm.zos.v2r2.bpxa500/ip1.htm

...it talks about an "ASCII code page" in a sense distinct from (7-bit)
ASCII itself.

Incidentally, here is the result for "En_US.IBM-037":

 00 01 02 03 9C 09 86 7F 97 8D 8E 0B 0C 0D 0E 0F
 10 11 12 13 9D 0A 08 87 18 19 92 8F 1C 1D 1E 1F
 80 81 82 83 84 85 17 1B 88 89 8A 8B 8C 05 06 07
 90 91 16 93 94 95 96 04 98 99 9A 9B 14 15 9E 1A
 20 A0 E2 E4 E0 E1 E3 E5 E7 F1 A2 2E 3C 28 2B 7C
 26 E9 EA EB E8 ED EE EF EC DF 21 24 2A 29 3B AC
 2D 2F C2 C4 C0 C1 C3 C5 C7 D1 A6 2C 25 5F 3E 3F
 F8 C9 CA CB C8 CD CE CF CC 60 3A 23 40 27 3D 22
 D8 61 62 63 64 65 66 67 68 69 AB BB F0 FD FE B1
 B0 6A 6B 6C 6D 6E 6F 70 71 72 AA BA E6 B8 C6 A4
 B5 7E 73 74 75 76 77 78 79 7A A1 BF D0 DD DE AE
 5E A3 A5 B7 A9 A7 B6 BC BD BE 5B 5D AF A8 B4 D7
 7B 41 42 43 44 45 46 47 48 49 AD F4 F6 F2 F3 F5
 7D 4A 4B 4C 4D 4E 4F 50 51 52 B9 FB FC F9 FA FF
 5C F7 53 54 55 56 57 58 59 5A B2 D4 D6 D2 D3 D5
 30 31 32 33 34 35 36 37 38 39 B3 DB DC D9 DA 9F

Do you still want the other tables?

> >You don't have enough confidence in etoa_l() to generate the table at
> >build time?
> 
> I didn’t have this initially (curious about the newline setting and
> the handling of control characters in general) but I think I can work
> with it now.
> 
> There’s one thing though… what about codepages that do NOT completely
> map to latin1?

I discussed this with a colleague who is a long-time mainframer. One
thing to note is that not just any EBCDIC codepage can be used in a
POSIX environment, because if you can't encode e.g. square brackets,
then basic things like shell scripts will break.

These odd encodings should still be usable in a 3270 terminal session,
the traditional mainframe UI. The POSIX environment, however, is a
special case with stricter requirements.
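
The square-bracket point can be checked mechanically against a conversion table like the ones above. The following is only a sketch: count_unencodable() is a made-up helper, and it assumes it is compiled on an ASCII host, so that the character literals in the "needed" string carry the same values as the ASCII/Latin-1 side of the table.

```c
#include <stddef.h>
#include <string.h>

/*
 * table[] maps each EBCDIC byte to its ASCII/Latin-1 value.
 * Count the characters in `needed` that no EBCDIC byte maps to,
 * i.e. that the codepage cannot encode. If `missing` is non-NULL,
 * the offending characters are copied there (NUL-terminated,
 * truncated to fit in `cap` bytes).
 *
 * Assumption: `needed` is in ASCII, as on an ASCII build host.
 */
static size_t count_unencodable(const unsigned char table[256],
                                const char *needed,
                                char *missing, size_t cap)
{
    unsigned char seen[256];
    size_t i, count = 0, w = 0;

    /* Mark every output value that appears somewhere in the table. */
    memset(seen, 0, sizeof seen);
    for (i = 0; i < 256; i++)
        seen[table[i]] = 1;

    /* Any needed character that was never marked is unencodable. */
    for (i = 0; needed[i] != '\0'; i++) {
        if (!seen[(unsigned char)needed[i]]) {
            count++;
            if (missing != NULL && w + 1 < cap)
                missing[w++] = needed[i];
        }
    }
    if (missing != NULL && cap > 0)
        missing[w] = '\0';
    return count;
}
```

Both tables in this mail contain 5B and 5D somewhere, so a check for "[]" would report nothing missing for either of them.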

> When does it error out, too?

It's in the doc. Both failure modes (non-SBCS locale, out-of-memory
condition) should be extremely rare, to the point that they don't really
need to be handled gracefully.
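
In code, "not handled gracefully" could be as simple as bailing out on any error. A hedged sketch follows: the conv_fn signature is assumed to match __etoa_l() (int return, -1 on failure), and both convert_or_die() and identity_conv() are made-up names. The latter is only a stand-in so the sketch can be tried off z/OS.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed to match z/OS __etoa_l(): returns -1 on failure. */
typedef int (*conv_fn)(char *buf, int len);

/* Stand-in converter for non-z/OS hosts: leaves buf untouched. */
static int identity_conv(char *buf, int len)
{
    (void)buf;
    return len;
}

/*
 * Run the conversion and exit on failure. Since both failure
 * modes are expected to be extremely rare, no recovery is tried.
 */
static int convert_or_die(conv_fn conv, char *buf, int len)
{
    int n = conv(buf, len);

    if (n < 0) {
        perror("EBCDIC-to-ASCII conversion failed");
        exit(EXIT_FAILURE);
    }
    return n;
}
```

On z/OS one would pass __etoa_l itself instead of identity_conv, provided the assumed signature holds.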


--Daniel


-- 
Daniel Richard G. || sk...@iskunk.org
My ASCII-art .sig got a bad case of Times New Roman.
