Re: [HACKERS] UTF8 or Unicode

Karel Zak Tue, 15 Feb 2005 01:25:56 -0800

On Mon, 2005-02-14 at 22:05 -0500, Bruce Momjian wrote:
> Abhijit Menon-Sen wrote:
> > At 2005-02-14 21:14:54 -0500, [email protected] wrote:
> > >
> > > Should our multi-byte encoding be referred to as UTF8 or Unicode?
> > 
> > The *encoding* should certainly be referred to as UTF-8. Unicode is a
> > character set, not an encoding; Unicode characters may be encoded with
> > UTF-8, among other things.
> > 
> > (One might think of a charset as being a set of integers representing
> > characters, and an encoding as specifying how those integers may be
> > converted to bytes.)
> > 
> > > I know UTF8 is a type of unicode but do we need to rename anything
> > > from Unicode to UTF8?
> > 
> > I don't know. I'll go through the documentation to see if I can find
> > anything that needs changing.
> 
> I looked at encoding.sgml and that mentions Unicode, and then UTF8 as an
> acronym. I am wondering if we need to make UTF8 first and Unicode
> second.  Does initdb accept UTF8 as an encoding?


In PG: unicode = utf8 = utf-8

Our internal routines in src/backend/utils/mb/encnames.c accept all
synonyms. The "official" internal PG name for UTF-8 is "UNICODE" :-(

The reason UTF8 = UNICODE is historical: "UNICODE" was there first.
It's the same as "WIN" for WIN1251 (in the sources it's marked as a
"_dirty_ alias")...

I think initdb uses pg_char_to_encoding() from
src/backend/utils/mb/encnames.c, so it should accept all aliases.
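To illustrate the alias idea, here is a rough sketch in Python. It is not
the actual C code from encnames.c. The table contents, the normalization
rule (lowercase, drop anything that is not a letter or digit) and the
names ALIASES, normalize() and char_to_encoding() are assumptions made
for this example only.

```python
import re

# Hypothetical alias table: every spelling maps to one canonical name.
# "UNICODE" is used as the canonical name here only because the message
# above says that is the official internal PG name for UTF-8.
ALIASES = {
    "unicode": "UNICODE",
    "utf8": "UNICODE",
}


def normalize(name):
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def char_to_encoding(name):
    """Return the canonical encoding name, or None if the alias is unknown."""
    return ALIASES.get(normalize(name))
```

With a lookup like this, "UTF8", "utf-8" and "Unicode" all resolve to
the same entry, and an unknown name returns None.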

        Karel

-- 
Karel Zak <[EMAIL PROTECTED]>


