Re: UTF16 to EBCDIC

Phil Smith III Sat, 08 Feb 2020 13:32:55 -0800

Gil wrote:

>How does it handle characters absent from IBM-037?

I expect it will throw an error. This is, as you (Gil) know, one of the 
problems with OP’s query: you can’t stuff thousands of pounds of potatoes into 
a 256-pound sack. (OK, characters, not potatoes.) For those who don’t 
understand, consider these two characters:

зέ

The first one is Cyrillic, the second Greek. In EBCDIC (assuming the most 
common code pages, 1025 for Cyrillic and 875 for Greek), both are x’B2’. So 
while a Unicode string encoded with any of the UTF family can contain both 
characters, you can’t have “an EBCDIC string” that does so without some 
metadata that says “Byte 1 is 1025 and byte 2 is 875”, which is (fairly) 
unlikely.

Thus you can take з and tell ICONV or equivalent to convert FROM 1025 TO 
UTF-whatever, or take έ and tell ICONV or equivalent to convert FROM 875 TO 
UTF-whatever. You’ll get two different UTF-encoded values for two different 
U+nnnn values, as you’d expect. And you can convert back, as long as every 
character in your string exists in the target EBCDIC code page, which has room 
for at most 256 of them.
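
For anyone who wants to poke at this, here is a minimal Python sketch of the same behavior. Assumptions: Python's built-in codecs stand in for ICONV; Python ships no codec for code page 1025, so IBM-037 (`cp037`) plays the part of the "other" code page, and the Greek EBCDIC page is the one Python calls `cp875`. The function name `ebcdic_roundtrip` is made up for illustration.

```python
def ebcdic_roundtrip(text, codepage):
    """Encode text into one single-byte EBCDIC code page and decode it back."""
    raw = text.encode(codepage)  # raises UnicodeEncodeError if a character is absent
    return raw, raw.decode(codepage)


greek = "\u03ad"  # GREEK SMALL LETTER EPSILON WITH TONOS

# The character fits in the Greek code page: one byte out, same character back.
raw, back = ebcdic_roundtrip(greek, "cp875")
assert back == greek and len(raw) == 1

# The very same byte means something else under a different code page.
as_037 = raw.decode("cp037")
assert as_037 != greek

# Going the other way fails: IBM-037 has no slot for the Greek letter.
try:
    ebcdic_roundtrip(greek, "cp037")
    fits_037 = True
except UnicodeEncodeError:
    fits_037 = False

print(raw.hex(), repr(as_037), fits_037)
```

The byte alone carries no record of which code page produced it, which is why the metadata has to travel separately.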

This stuff gets pretty complex, with terms like “code point”, “character”, 
“glyph”, and “grapheme” often used interchangeably even though each means 
something subtly different. So tread carefully.
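
A small Python sketch of why the counts disagree, using only the standard `unicodedata` module. The variable names are mine, and the emoji is just a convenient example of a character beyond U+FFFF.

```python
import unicodedata

composed = "\u03ad"                                  # one code point
decomposed = unicodedata.normalize("NFD", composed)  # epsilon + combining tonos

assert len(composed) == 1
assert len(decomposed) == 2  # two code points, still one character on screen
assert unicodedata.normalize("NFC", decomposed) == composed

# Code units are yet another count: UTF-16 needs two 16-bit units
# (a surrogate pair) for anything beyond U+FFFF.
emoji = "\U0001F600"
utf16_units = len(emoji.encode("utf-16-be")) // 2

print(len(composed), len(decomposed), utf16_units)
```

So "how long is this string" depends on whether you count code units, code points, or what the reader sees.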

>I wonder what was the motivation to require preallocated data set
>names rather than the more flexible alternative of DDNAMEs?

I believe it’s using ICONV on the USS side under the covers, which takes a data 
set name.

…phsiii

