On Thu, 26 Oct 2000, Peter Prymmer wrote:

> According to Nick's translated doc the first character on the third line
> of the .enc file is the one to be displayed if the Encode module cannot
> figure out what to do with a given character.  In iso8859-1.enc we
> see:
> 
> # Encoding file: iso8859-1, single-byte
> S
> 003F 0 1
> 00
> 
> which maps to '?'.  In the last rendition of my proposal for cp1047.enc
> I had left that as is, whereas to be compatible with iso8859-1.enc I ought
> to have written:
> 
> # Encoding file: cp1047, single-byte
> S
> 006F 0 1
> 00
> 
> with similar headers for cp37.enc and posix-bc.enc.

I don't think so. As I understand it, the first number is the hexadecimal
Unicode code point of the replacement character, not a byte in the
encoding the file describes. So you want 003F (QUESTION MARK) in
cp1047.enc as well; that 3F also happens to be the iso-8859-1 byte for a
question mark, while EBCDIC uses a different one, is beside the point.
BICBW.
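
To illustrate, here is a rough Python sketch, assuming my reading of the
header line is right. The helper replacement_char is made up for this
mail and is not part of Encode, and I use Python's cp037 codec as a
stand-in for the EBCDIC code pages, since it also puts '?' at 6F.

```python
def replacement_char(header_line: str) -> str:
    """Read the first hex field of the header line as a Unicode code point.

    Assumption: the first field names the replacement character in
    Unicode, as described above. The other fields are ignored here.
    """
    first_field = header_line.split()[0]
    return chr(int(first_field, 16))


# The same Unicode value gives a question mark whatever the encoding is.
print(replacement_char("003F 0 1"))

# Read as Unicode, 006F is LATIN SMALL LETTER O, not a question mark.
print(replacement_char("006F 0 1"))

# 6F is only the *encoded* byte for '?' in EBCDIC (cp037 used as example).
print("?".encode("cp037").hex())
```

So writing 006F in the header would, on this reading, ask for "o" as the
replacement character.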

> Although I am quite hard pressed to find an example of a double byte
> character encoding that does make use of 0xFFFF, I do think that there
> could be a problem with the syllogism: "Unicode(tm) guarantees that 0xFFFF
> is not a character. All encodings can be mapped to Unicode(tm).  
> Therefore all coded character sets must reserve 0xFFFF as a
> non-character."  Unless it is the case that Encode .enc files are to be
> used solely as a to/from Unicode set as an intermediary coding.

I disagree. Unicode guarantees that 0xFFFF is not a Unicode character; it
may well be a character in some double byte encoding. This only means that
when you map this encoding to Unicode, no character should end up as
0xFFFF. The coded character set may well have 0xFFFF as a character as
long as it maps that character to something other than 0xFFFF or 0xFFFE
when converting to Unicode.
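
A minimal Python sketch of that constraint, using two invented mapping
tables (encoded value to Unicode code point). The tables and the helper
noncharacter_targets are hypothetical and only show that the restriction
applies to the Unicode side of the mapping, not to the encoded side.

```python
# The two values that must never appear on the Unicode side.
NONCHARACTERS = {0xFFFE, 0xFFFF}


def noncharacter_targets(table: dict[int, int]) -> list[int]:
    """Return the encoded values whose Unicode target is 0xFFFE or 0xFFFF."""
    return sorted(src for src, uni in table.items() if uni in NONCHARACTERS)


# Invented double-byte table that uses 0xFFFF as an ordinary character
# and maps it to a real Unicode character: no problem.
fine_table = {0xFFFF: 0x4E00, 0x8140: 0x3000}

# Same encoded values, but 0xFFFF is mapped to U+FFFF: this is the
# only case that conflicts with Unicode's guarantee.
broken_table = {0xFFFF: 0xFFFF, 0x8140: 0x3000}

print(noncharacter_targets(fine_table))
print(noncharacter_targets(broken_table))
```

The first table passes even though the encoding itself uses 0xFFFF; only
the second one is flagged.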

Cheers,
Philip
-- 
Philip Newton <[EMAIL PROTECTED]>
