Re: [OT?] QBCS
Doug,

In most industry usages, MBCS refers to variable-width encodings, not
fixed-width ones.

tex

Doug Ewell wrote:
> Paradoxically (at least to me), the term "multi-byte character set"
> refers to a fixed-width encoding, such as UCS-2. The official name of
> ISO/IEC 10646 is "Universal Multiple-Octet Coded Character Set."

--
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
RE: [OT?] QBCS
Doug Ewell wrote:
> [...]
> (BTW, pet peeve: The word "acronym" should only be used to mean a
> pronounceable WORD ("nym") formed from the initials of other words.
> Classic examples are "scuba" and "radar." If you can figure
> out how to pronounce "qbcs," more power to you, but to me it's just
> an abbreviation.)

Right, sorry. (I can pronounce ['kubks], although I wouldn't do it in
front of my managers and customers. :-)

Actually, I don't like this "QBCS" term and I'd rather avoid saying it
myself. But I wanted to be sure what other people mean when they use it.

> [...]
> > So what it really means must be "quadra-byte character
> > encoding", and both GB 18030 and UTF-32 should fit
> > into that category.
>
> GB 18030, yes, because its code units vary from one to four bytes in
> length. UTF-32, no, because its code units are uniformly 32 bits.

But UTF-8 fits the definition.

_ Marco
Re: [OT?] QBCS
Lars Marius Garshol quoted Marco Cimarosti:

> | It seems that the IT world has a new acronym: "QBCS". I understand
> | that it stands for "quadra-byte character set", and I heard it used
> | to refer to GB 18030.
> |
> | My question is: is it just a fancy synonym for GB 18030 or can it
> | also refer to Unicode or other encodings?

The original term "DBCS," or "double-byte character set," refers to a
variable-width encoding where each character requires either one or two
bytes. East Asian legacy character encodings fall into this category.

By extension, then, a "QBCS" would be a variable-width character
encoding where the code units can be anywhere from one to four bytes
long -- an apt description of GB 18030.

Paradoxically (at least to me), the term "multi-byte character set"
refers to a fixed-width encoding, such as UCS-2. The official name of
ISO/IEC 10646 is "Universal Multiple-Octet Coded Character Set."

(BTW, pet peeve: The word "acronym" should only be used to mean a
pronounceable WORD ("nym") formed from the initials of other words.
Classic examples are "scuba" and "radar." If you can figure out how to
pronounce "qbcs," more power to you, but to me it's just an
abbreviation.)

> This must be an oxymoron, in the sense that character sets don't
> really have a byte width, being completely abstract assignments of
> abstract characters to abstract numbers.

This is technically true, but the terms SBCS and DBCS are so entrenched
in the industry that it doesn't seem useful to try to deprecate them
now.

> So what it really means must be "quadra-byte character encoding", and
> both GB 18030 and UTF-32 should fit into that category.

GB 18030, yes, because its code units vary from one to four bytes in
length. UTF-32, no, because its code units are uniformly 32 bits.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
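The variable-width versus fixed-width distinction drawn in this message can be checked with a minimal sketch, assuming Python 3 and its built-in "gb18030", "utf-8" and "utf-32-be" codecs. The helper names byte_widths and is_fixed_width and the three sample characters are illustrative choices, not anything from the thread.

```python
# Illustrative sample: an ASCII letter, a common CJK ideograph, and a
# supplementary-plane character (all three are assumptions for this sketch).
SAMPLES = ["A", "\u4e2d", "\U0001F600"]


def byte_widths(encoding, chars=SAMPLES):
    """Return the number of bytes each sample character needs in `encoding`."""
    return [len(ch.encode(encoding)) for ch in chars]


def is_fixed_width(encoding, chars=SAMPLES):
    """True if every sample character encodes to the same number of bytes."""
    return len(set(byte_widths(encoding, chars))) == 1


if __name__ == "__main__":
    # "utf-32-be" is used instead of "utf-32" so that no byte order mark
    # is added to each encoded character.
    for enc in ("gb18030", "utf-8", "utf-32-be"):
        kind = "fixed" if is_fixed_width(enc) else "variable"
        print(f"{enc:10} widths={byte_widths(enc)} ({kind} width)")
```

On this sample, GB 18030 and UTF-8 both come out as variable width with a four-byte maximum, while UTF-32 uses four bytes throughout. A check like this only shows what holds for the characters tested; it is not a proof about the whole encoding.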
Re: [OT?] QBCS
* Marco Cimarosti
|
| It seems that the IT world has a new acronym: "QBCS". I understand
| that it stands for "quadra-byte character set", and I heard it used
| to refer to GB 18030.
|
| My question is: is it just a fancy synonym for GB 18030 or can it
| also refer to Unicode or other encodings?

This must be an oxymoron, in the sense that character sets don't
really have a byte width, being completely abstract assignments of
abstract characters to abstract numbers.

So what it really means must be "quadra-byte character encoding", and
both GB 18030 and UTF-32 should fit into that category.

--
Lars Marius Garshol, Ontopian    http://www.ontopia.net
GSM: +47 98 21 55 50             http://www.garshol.priv.no
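The point that a character set assigns abstract numbers, and that a byte width only appears once an encoding is chosen, can be illustrated with a short sketch, assuming Python 3 and its built-in codecs. The helper name encoded_hex and the choice of U+4E2D as the sample character are illustrative assumptions.

```python
# One abstract character with one abstract number (its code point).
ch = "\u4e2d"
code_point = ord(ch)


def encoded_hex(char, encoding):
    """Return the bytes of `char` in `encoding` as a lowercase hex string."""
    return char.encode(encoding).hex()


if __name__ == "__main__":
    print(f"code point: U+{code_point:04X}")
    # The same code point yields a different byte sequence, and a
    # different byte count, under each encoding.
    for enc in ("utf-8", "utf-16-be", "utf-32-be", "gb18030"):
        hex_bytes = encoded_hex(ch, enc)
        print(f"{enc:10} {hex_bytes} ({len(hex_bytes) // 2} bytes)")
```

The code point stays the same in every case. Only the serialized form changes, which is why a byte width describes the encoding and not the character set.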