Hi,
I am asking the ASN.1 community to clarify the precise definition of TeletexString (T61String).

The public opinion on the TeletexString (T61String) ASN.1 type is that it is mostly obsolete and should not be used in new ASN.1 specifications. However, this does not preclude one from seeking a precise definition of the encoding, if only for historic purposes.

Here is what I know so far. The bottom of this email contains some questions. I kindly ask people responsible for developing ASN.1 compilers and actual ASN.1-based protocols to comment on this and present their own view on this topic.

I should repeat myself here: I am perfectly aware that a sizable volume of software in the world treats TeletexString (T61String) as a simple 8-bit string, mostly in Windows Latin 1 (a superset of iso-8859-1) encoding. However, this particular quest is for a proper, precise and standards-based definition of TeletexString.

Here is what I have:

1. The TeletexString (T61String) has its roots in the T.61 encoding, but is no longer defined as being T.61 based. In addition to that, the T.61 standard has been withdrawn by ITU-T:
   http://www.itu.int/rec/T-REC-T.61

2. The ASN.1 standard (X.680) specifies TeletexString (T61String) as a combination of the character sets specified by the registration numbers listed in the ISO International Register of Coded Character Sets to be used with Escape Sequences (ISO-2375):
   6, 87, 102, 103, 106, 107, 126, 144, 150, 153, 156, 164, 165, 168, plus the SPACE and DELETE characters.
   In addition to that, X.680 Table 6 NOTE 2 allows using register entries 6 and 156 instead of 102 and 103.

3. The ISO Register itself is available at
   http://www.itscj.ipsj.or.jp/ISO-IR/

4. The following are excerpts from the appropriate documents found in the ISO Register.

Reg.#6 is ASCII.
Escapes into:
    G0: ESC 2/8 4/2 ("(B")
    G1: ESC 2/9 4/2 (")B")
The range is [0x21 .. 0x7e]. Conversion into Unicode is simple, because there is a one-to-one correspondence.
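As an aside, the column/row notation used in the "Escapes into" lines (for example 2/8 4/2) maps onto byte values as column * 16 + row. Below is a minimal Python sketch of that conversion, together with the trivial one-to-one decoding of the Reg.#6 graphic range. The function names colrow_to_bytes and reg6_to_unicode are my own, invented for illustration; they do not come from any standard or library.

```python
ESC = 0x1B  # the ESCAPE control character


def colrow_to_bytes(notation: str) -> bytes:
    """Convert column/row notation such as 'ESC 2/8 4/2' into raw bytes.

    Hypothetical helper: each 'c/r' token becomes the byte c * 16 + r,
    and the literal token 'ESC' becomes 0x1b.
    """
    out = bytearray()
    for token in notation.split():
        if token == "ESC":
            out.append(ESC)
        else:
            col, row = token.split("/")
            out.append(int(col) * 16 + int(row))
    return bytes(out)


def reg6_to_unicode(data: bytes) -> str:
    """Decode bytes drawn from the Reg.#6 graphic range [0x21 .. 0x7e].

    Hypothetical helper: the mapping is one-to-one with Unicode, so the
    only work is rejecting bytes outside the graphic range.
    """
    for b in data:
        if not 0x21 <= b <= 0x7E:
            raise ValueError(f"byte 0x{b:02x} is outside the Reg.#6 range")
    return data.decode("ascii")


if __name__ == "__main__":
    # The G0 designation of Reg.#6 is ESC followed by "(B".
    print(colrow_to_bytes("ESC 2/8 4/2"))
    print(reg6_to_unicode(b"Teletex"))
```

SPACE (0x20) and DELETE (0x7f) are deliberately rejected by reg6_to_unicode, since they sit outside the range quoted above and are listed separately in the X.680 definition.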

Reg.#87 is a "Japanese Graphic Character Set for Information Interchange".
It is a multiple-byte set of 6877 characters. The character set is JIS X 0208-1983 (originally JIS C 6226-1983).
Escapes into:
    G0: ESC 2/4 4/2 ("$B")
    G1: ESC 2/4 2/9 4/2 ("$)B")
    G2: ESC 2/4 2/10 4/2 ("$*B")
    G3: ESC 2/4 2/11 4/2 ("$+B")

Reg.#102 is the "Teletex Primary Set of Graphic Characters".
Escapes into:
    G0: ESC 2/8 7/5 ("(u")
    G1: ESC 2/9 7/5 (")u")
    G2: ESC 2/10 7/5 ("*u")
    G3: ESC 2/11 7/5 ("+u")
It is almost identical to ASCII, except that the ASCII position for '$' (DOLLAR SIGN) is filled with '¤' (CURRENCY SIGN), which is U+00A4. Also, the ASCII positions for '`', '\', '^', '{', '}', '~' are marked as "should not be used".

Reg.#103 is a supplementary set of characters used with #102.
Escapes into:
    G0: ESC 2/8 7/6 ("(v")
    G1: ESC 2/9 7/6 (")v")
    G2: ESC 2/10 7/6 ("*v")
    G3: ESC 2/11 7/6 ("+v")
Some characters in that character set are combining characters, which can only be used, restrictively, with certain basic Latin letters. It can be thought of as a subset of #156, with the exception of 4/12, which is UNDERLINE in #103 and absent in #156.

Reg.#106 is a primary set of control functions, used with #107.
Escapes into:
    C0: ESC 2/1 4/5 ("!E")
This set is so short I can list it here:
    0x08 BS   BACKSPACE             -- same as Unicode
    0x0a LF   LINE FEED             -- same as Unicode
    0x0c FF   FORM FEED             -- same as Unicode
    0x0d CR   CARRIAGE RETURN       -- same as Unicode
    0x0e LS1  LOCKING SHIFT ONE
    0x0f LS0  LOCKING SHIFT ZERO
    0x19 SS2  SINGLE SHIFT TWO
    0x1a SUB  SUBSTITUTE CHARACTER
    0x1b ESC  ESCAPE                -- same as Unicode
    0x1d SS3  SINGLE SHIFT THREE
The LS1 and LS0 are two magical functions which, respectively, invoke the currently designated G1 or G0 set into positions 2/1 to 7/14.
The SS2 and SS3, respectively, invoke one character of the currently designated set G2 and G3.
The SUB is wholly equivalent to U+001a (SUBSTITUTE).

Reg.#107 is a supplementary set of control functions, used with #106.
Escapes into:
    C1: ESC 2/2 4/8 ('"H')
This set contains three special control codes:
    0x8b PLD  PARTIAL LINE DOWN            -- similar to <SUB>
    0x8c PLU  PARTIAL LINE UP              -- similar to <SUP>
    0x9b CSI  CONTROL SEQUENCE INTRODUCER
PLD, PLU: these cannot be adequately represented by Unicode.
CSI: since TeletexString has a fixed meaning in ASN.1, the appearance of this code is allowed in the TeletexString, yet the semantics of its appearance are not specified. Hence, it is probably an error if CSI is present in the stream.

Reg.#126 is a "Right-hand Part of the Latin/Greek Alphabet".
Comprises 90 characters, including accented letters.
Escapes into:
    G1: ESC 2/13 4/6 ("-F")
    G2: ESC 2/14 4/6 (".F")
    G3: ESC 2/15 4/6 ("/F")
Note: This Registration is a subset of ISO-IR 227.

Reg.#144 is a "Cyrillic part of the Latin/Cyrillic Alphabet".
Comprises 95 characters.
Escapes into:
    G1: ESC 2/13 4/12 ("-L")
    G2: ESC 2/14 4/12 (".L")
    G3: ESC 2/15 4/12 ("/L")

Reg.#150 is a "Greek Primary Set of Graphic Characters".
Comprises 94 characters.
Escapes into:
    G0: ESC 2/8 2/1 4/0 ("(!@")
    G1: ESC 2/9 2/1 4/0 (")!@")
    G2: ESC 2/10 2/1 4/0 ("*!@")
    G3: ESC 2/11 2/1 4/0 ("+!@")

Reg.#153 is a "Basic Cyrillic Character Set for 8-bit codes".
Comprises 68 characters.
Escapes into:
    G1: ESC 2/13 4/15 ("-O")
    G2: ESC 2/14 4/15 (".O")
    G3: ESC 2/15 4/15 ("/O")

Reg.#156 is a "Supplementary Set of ISO/IEC 6937:1992" for use with #6.
Comprises 87 characters.
Escapes into:
    G1: ESC 2/13 5/2 ("-R")
    G2: ESC 2/14 5/2 (".R")
    G3: ESC 2/15 5/2 ("/R")

Reg.#164 is a "Hebrew Supplementary Set of Graphic Characters".
Comprises 27 characters.
Escapes into:
    G1: ESC 2/13 5/3 ("-S")
    G2: ESC 2/14 5/3 (".S")
    G3: ESC 2/15 5/3 ("/S")

Reg.#165 is a set of "Codes of the Chinese graphic character set".
It is a multiple-byte set of 8446 characters.
Escapes into:
    G0: ESC 2/4 2/8 4/5 ("$(E")
    G1: ESC 2/4 2/9 4/5 ("$)E")
    G2: ESC 2/4 2/10 4/5 ("$*E")
    G3: ESC 2/4 2/11 4/5 ("$+E")

Reg.#168 is a "Japanese Graphic Character Set for Information Interchange".
A multiple-byte set of 6879 characters, updated from #87.
Escapes into:
    G0: ESC 2/6 4/0 ESC 2/4 4/2 ("&@" "$B")
    G1: ESC 2/6 4/0 ESC 2/4 2/9 4/2 ("&@" "$)B")
    G2: ESC 2/6 4/0 ESC 2/4 2/10 4/2 ("&@" "$*B")
    G3: ESC 2/6 4/0 ESC 2/4 2/11 4/2 ("&@" "$+B")

5. Questions

5.1 The Reg.#107 contains
    0x8b PLD  PARTIAL LINE DOWN            -- similar to <SUB>
    0x8c PLU  PARTIAL LINE UP              -- similar to <SUP>
    0x9b CSI  CONTROL SEQUENCE INTRODUCER
however, since TeletexString (T61String) is not defined as a reference to ISO-2022, does it mean that CSI is not defined and should not appear?

5.2 The Reg.#106 defines the locking shift functions, LS1, LS0 etc. My understanding is that these functions must do what they are supposed to do, that is, invoke G1/G0 into GL. Is that right?

5.3 The main question. What is the default state of GL and GR at the very beginning of the string?

According to X.208 (I believe; I don't have it at hand), the default state for GL and GR is Reg.#102 and Reg.#103. However, this was just a reflection of the T.61 roots of the TeletexString type. The more modern T.61 (T.51/T.50) has subsequently explicitly defined the IRV through a) an alphabet identical to Reg.#6 and b) an escape sequence identical to that of Reg.#6. I assume this fact has been reflected in X.680 Table 6 Note 2.

Since #102 and #6 practically differ only in the DOLLAR SIGN position, we can mentally merge #6 with #102, and #103 with #156, ignoring the undefined code points in either of them. However, the choice of the starting encoding may affect the use of the dollar sign ($) versus the currency sign (¤). Despite #6 and #102 being equal according to X.680, there has got to be something that is more equal.

If we assume #6 is "more equal", then there is no controversy: both at the beginning of the string and after an explicit switch to #6 or #102, we always know whether 2/4 is the dollar or the currency sign. However, if we assume #102 is "more equal", then we have a problem: the beginning of the string is defined as #6 in T.50/T.51, so it probably cannot be #102 at the beginning of the string, according to the latest state of standardization.

5.4 Can we assume CL has #106 and CR has #107 at the beginning of the string?

--
Lev Walkin
[EMAIL PROTECTED]
_______________________________________________
Asn1 mailing list
Asn1@asn1.org
http://lists.asn1.org/mailman/listinfo/asn1