> Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > If you regard the unicode code point as simply a number, why not
> > regard the multibyte characters as a number too?
>
> Because there's a standard specifying the Unicode code points *as
> numbers*. The mapping from those numbers to UTF8 strings (and other
> representations) is well-defined by the standard.
>
> > Also I'm wondering you what we should do with different
> > backend/frontend encoding combo.
>
> Nothing. chr() has always worked with reference to the database
> encoding, and we should keep it that way.

Where is it documented?

> BTW, it strikes me that there is another hole that we need to plug in
> this area, and that's the convert() function. Being able to create
> a value of type text that is not in the database encoding is simply
> broken. Perhaps we could make it work on bytea instead (providing
> a cast from text to bytea but not vice versa), or maybe we should just
> forbid the whole thing if the database encoding isn't SQL_ASCII.

Please don't do that. It would break a useful use case of convert().

Suppose a user has a database encoded in UTF-8, with English, French,
Chinese and Japanese data in its tables. To sort a table in the order
appropriate to its language, he would do something like this:

SELECT * FROM japanese_table ORDER BY convert(japanese_text using utf8_to_euc_jp);

Without convert(), the rows come back in what is effectively random
order. This is because Kanji characters are in random order in UTF-8,
while they are reasonably ordered in EUC_JP.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
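
To illustrate the ordering point above, here is a minimal sketch in Python rather than SQL. It assumes the sort compares the raw bytes of each string, as a byte-wise collation would. The sample characters and the helper name sort_by_encoding are made up for the illustration and are not part of PostgreSQL.

```python
# Hypothetical sample data: five common Kanji, listed here in Unicode order.
words = ["一", "亜", "右", "悪", "愛"]


def sort_by_encoding(items, encoding):
    """Sort strings by the raw bytes of their representation in `encoding`.

    This mimics a byte-wise comparison of stored text; it is an assumption
    about the collation in use, not a description of any real backend.
    """
    return sorted(items, key=lambda s: s.encode(encoding))


# UTF-8 preserves Unicode code point order, which is unrelated to reading.
utf8_order = sort_by_encoding(words, "utf-8")

# EUC_JP follows the JIS X 0208 layout, where these Kanji are grouped
# by reading (a, ai, aku, ichi, u).
eucjp_order = sort_by_encoding(words, "euc_jp")

print("UTF-8 byte order: ", utf8_order)
print("EUC_JP byte order:", eucjp_order)
```

The two lists contain the same characters in different orders. Only the EUC_JP one follows the readings, which is the effect the ORDER BY convert(...) query relies on.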