Hi Duraivel,
Your question is incomplete. There are several Unicode encodings to
choose from and the "number of bytes" question is influenced by your choice of
encoding, as well as by the data you choose.
For example, UTF-8 is a multibyte encoding of Unicode in which each
character is 1, 2, 3, or 4 bytes long, depending on the character. The
majority of characters written in Simplified Chinese will be three bytes long in
this encoding.
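If it helps, here is a minimal Python sketch of how you might check this yourself. The helper name utf8_length and the sample characters are my own choices for illustration, not anything from your data.

```python
# Sketch: count the bytes one character occupies when encoded as UTF-8.
# utf8_length is a hypothetical helper name chosen for this example.

def utf8_length(ch: str) -> int:
    """Return the number of bytes needed to encode ch in UTF-8."""
    return len(ch.encode("utf-8"))

# Illustrative samples: ASCII, Latin-1 range, a common Chinese
# character (U+4E2D), and a supplementary-plane ideograph (U+20000).
samples = ["A", "\u00e9", "\u4e2d", "\U00020000"]

for ch in samples:
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s) in UTF-8")
```

Running the same loop over a sample of your real text would show how the lengths are actually distributed.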
UTF-16 encodes characters using two bytes per character for the vast
majority of characters in most sets of data. Some Chinese characters are encoded
on higher (or "supplementary") planes of Unicode and require two
two-byte code units (a "surrogate pair"), four bytes in total, in UTF-16. These
characters are generally considered to be quite rare in "average" data, and it is
unlikely that your data will contain more than a few of them in any
event.
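As a rough sketch of the UTF-16 case, the Python below counts bytes per character and counts how many characters in a string would need a surrogate pair. The function names and the sample string are assumptions made for the example. I use the "utf-16-be" codec so that no byte order mark is added to the count.

```python
# Sketch: UTF-16 byte counts and surrogate-pair detection.
# utf16_length and count_supplementary are hypothetical helper names.

def utf16_length(ch: str) -> int:
    """Return the number of bytes needed to encode ch in UTF-16 (no BOM)."""
    return len(ch.encode("utf-16-be"))

def count_supplementary(text: str) -> int:
    """Count characters above U+FFFF, which need a surrogate pair in UTF-16."""
    return sum(1 for ch in text if ord(ch) > 0xFFFF)

# Illustrative sample: two common characters (U+4E2D, U+6587) followed by
# one supplementary-plane ideograph (U+20000).
sample = "\u4e2d\u6587\U00020000"

for ch in sample:
    print(f"U+{ord(ch):04X}: {utf16_length(ch)} byte(s) in UTF-16")

print("characters needing a surrogate pair:", count_supplementary(sample))
```

If count_supplementary returns zero for a representative sample of your data, two bytes per character is a fair estimate for that data in UTF-16.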
Probably, though, you are not starting your question in the right place.
Why do you care about the number of bytes in a character? The reasons you give
will determine whether a specific encoding is more (or less) suited for use than
another encoding (or even character set, such as a legacy, non-Unicode,
character set/encoding). For example, if you are trying to determine whether
Unicode is more (or less) efficient than a legacy solution, then I think you'll
find that the performance issues are somewhere other than the average byte count
per character. If you are worried about storage (disk, database, etc.), then the
specifics of your situation will determine what the "right answer" may be for
you.
Best Regards,
Addison
Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International
Internationalization is an architecture.
It is not a feature.
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Duraivel
Sent: June 27, 2004 23:38
To: [EMAIL PROTECTED]
Subject: number of bytes for simplified chinese
hi,
I would like to know the number of bytes
required for the Simplified Chinese language. Can we represent all the characters
of Simplified Chinese in Unicode using just two bytes?
regards
duraivel