Philip> On Thu, 26 Oct 2000, Mark Leisher wrote:
>> Following the first page will be all the other pages, each in the same
>> format as the first: one number identifying the page followed by 256
>> double-byte Unicode (UCS-2) characters. If a character in the encoding
>> maps to the Unicode character 0000, it means that the character doesn't
>> actually exist. If all characters on a page would map to 0000, that
>> page can be omitted.
Philip> This would mean that there is no good Unicode character to map
Philip> ASCII 0x00 to. The obvious character is U+0000 "<control> = NULL",
Philip> but that's reserved here. So if I'm translating a string
Philip> containing NULs, those characters will be treated as
Philip> "not-a-character"?
There are font encodings that have a glyph at position 0 which maps to some
non-zero Unicode value. But yes, using 0x0000 to mean not-a-character means
that no coded character set can have a legitimate mapping to 0x0000.
Basically it just keeps non-characters out of the output Unicode strings by
null-terminating them at the first unknown character. When that is the first
character in the string, you become very puzzled that nothing seems to be
happening.
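
To make the failure mode concrete, here is a minimal Python sketch of a
converter that stops at the first unmapped character. It is only an
illustration under assumptions: the names make_page and convert and the toy
three-character page are hypothetical, and the page is built in memory rather
than read from the file format described above.

```python
# Hypothetical in-memory form of one mapping page: 256 UCS-2 code units,
# where 0x0000 marks "no such character" (as described in the quoted text).
NOT_A_CHARACTER = 0x0000


def make_page(mappings):
    """Build a 256-entry page; unlisted positions stay unmapped (0x0000)."""
    page = [NOT_A_CHARACTER] * 256
    for position, code_unit in mappings.items():
        page[position] = code_unit
    return page


def convert(data, page):
    """Map each input byte through the page, stopping at the first
    unmapped byte, the way a null-terminated output buffer would read."""
    out = []
    for byte in data:
        code_unit = page[byte]
        if code_unit == NOT_A_CHARACTER:
            break  # everything after this point is lost
        out.append(chr(code_unit))
    return "".join(out)


# Assumed toy encoding: only 'A', 'B' and 'C' are mapped.
page = make_page({0x41: 0x0041, 0x42: 0x0042, 0x43: 0x0043})

print(repr(convert(b"AB\x00C", page)))  # 'AB'
print(repr(convert(b"\x00ABC", page)))  # ''
```

The second call is the puzzling case: the input starts with an unmapped
byte, so the output is empty even though the rest of the string is valid.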
-----------------------------------------------------------------------------
Mark Leisher
Computing Research Lab            Cinema, radio, television, magazines are a
New Mexico State University       school of inattention: people look without
Box 30001, Dept. 3CRL             seeing, listen without hearing.
Las Cruces, NM 88003                  -- Robert Bresson