You're just confused, which wasn't helped by me typing faster than I
was thinking in my descriptions of the bit usage of UTF-8, etc.
(sorry, I'm @ work so was only paying 50% attention to what I was
typing).  Better not to rely on me in this case, but to read this lot:
http://en.wikipedia.org/wiki/Unicode_Transformation_Format

Anyway, to answer your question:

If you use "bytes" and specify varchar2(2000), you'll get 2000 bytes of
storage space for each row in that column.  That's obvious.

If you use "chars" and specify varchar2(2000), you'll get 2000x
[whatever the size of the char could be, for the given encoding you're
using for that column].  If it's a 4-byte character encoding, the
varchar2(2000) will be 8000 BYTES long (4x2000).
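
For example, the two declarations look something like this (just a
sketch in Oracle DDL; the table and column names are made up for
illustration):

    -- hypothetical table, names are illustrative only
    CREATE TABLE semantics_demo (
      col_bytes VARCHAR2(2000 BYTE),  -- capacity counted in bytes
      col_chars VARCHAR2(2000 CHAR)   -- capacity counted in characters
    );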

So if you use chars and say "2000, pls", you can definitely fit 2000
characters in there.  If you specify 2000 bytes... you'll be able to
fit 2000 bytes' worth of characters, and since a UTF-8 character can
be 1-4 bytes each, that will quite possibly be fewer than 2000 chars.
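
You can see the gap concretely in an AL32UTF8 database: Oracle's
LENGTH counts characters and LENGTHB counts bytes, so something like
this (again, just a sketch) shows the difference:

    -- 'café' is 4 characters but 5 bytes in UTF-8 (the é takes 2)
    SELECT LENGTH('café')  AS char_count,   -- 4
           LENGTHB('café') AS byte_count    -- 5
    FROM dual;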

Basically, "byte semantics" comes from back in the day when we only
ever thought about English-language characters, which fitted neatly
into 7 bits, so one byte = one char worked fine.  Now that we deal
with all sorts of different alphabets, the space needed to store a
character varies, so using a byte as the unit for a character doesn't
make sense any more, hence the character-based "semantics".

I'd be more inclined to use "characters" than "bytes".

-- 
Adam
