[HACKERS] Re: [PATCHES] encoding names
> > > BTW, what's wrong with encoding? I don't think, for example EUC-JP or utf-8, are character set names.
> >
> > Hmm, SQL talks of character sets, it has a CHARACTER_SETS view and such. It's slightly incorrect, I agree. Maybe we should not touch getdatabaseencoding() right now, given that the names we currently use are apparently almost correct anyway and considering the pain it creates to alter them, and instead implement the information schema views in the future?
>
> I thought schema stuffs would be introduced in 7.2 but apparently it would not happen...

I thought I could do it but ran out of time.

--
Bruce Momjian | http://candle.pha.pa.us
[HACKERS] Re: [PATCHES] encoding names
Tatsuo Ishii writes:

> > Maybe we should not touch getdatabaseencoding() right now, given that the names we currently use are apparently almost correct anyway and considering the pain it creates to alter them, and instead implement the information schema views in the future?
>
> I thought schema stuffs would be introduced in 7.2 but apparently it would not happen...

True, but right now we'd have to do rather elaborate changes just to switch a couple of names to more correct versions. Accepting them as input is good, but maybe we should hold back on the output part a bit until we can do it correctly.

--
Peter Eisentraut
http://funkturm.homeip.net/~peter
[HACKERS] Re: [PATCHES] encoding names
Tatsuo Ishii writes:

> > But getdbencoding isn't semantically different from the old getdatabaseencoding. encoding isn't the right term anyway, methinks, it should be character set. So maybe database_character_set()? (No get please.)
>
> I'm not a native English speaker, so please feel free to choose more appropriate name. BTW, what's wrong with encoding? I don't think, for example EUC-JP or utf-8, are character set names.

Hmm, SQL talks of character sets, it has a CHARACTER_SETS view and such. It's slightly incorrect, I agree. Maybe we should not touch getdatabaseencoding() right now, given that the names we currently use are apparently almost correct anyway and considering the pain it creates to alter them, and instead implement the information schema views in the future?

--
Peter Eisentraut
http://funkturm.homeip.net/~peter
[HACKERS] Re: [PATCHES] encoding names
> > BTW, what's wrong with encoding? I don't think, for example EUC-JP or utf-8, are character set names.
>
> Hmm, SQL talks of character sets, it has a CHARACTER_SETS view and such. It's slightly incorrect, I agree. Maybe we should not touch getdatabaseencoding() right now, given that the names we currently use are apparently almost correct anyway and considering the pain it creates to alter them, and instead implement the information schema views in the future?

I thought schema stuffs would be introduced in 7.2 but apparently it would not happen...

--
Tatsuo Ishii
[HACKERS] Re: [PATCHES] encoding names
On Sun, Aug 19, 2001 at 11:02:57AM +0900, Tatsuo Ishii wrote:

> 4) Encoding official names are inconsistent. Here are my suggested changes (referring to http://www.iana.org/assignments/character-sets, according to Peter's suggestion):
>
>     ALT -> IBM866
>     KOI8 -> KOI8_R
>     UNICODE -> UTF_8 (Peter's suggestion)

Right. But we will still need aliases UNICODE, ALT, KOI8 for back compatibility.

Thanks, I will try to fix all of them.

Karel

--
Karel Zak
http://home.zf.jcu.cz/~zakkr/
C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
Re: [HACKERS] Re: [PATCHES] encoding names
> > ALT -> IBM866
>
> Just a quick comment: ALT is not necessarily IBM866. It can be any US-ASCII or 26-character-alphabet Latin set, for example IBM819 or ISO8859-1. It is actually quite different from IBM866 in its true meaning, and they shouldn't be aliased together. ALT is used, for example, when none of KOI8-R, Windows-1251, or IBM866 is available to a Russian-speaking person to read or write text, messages and such: we use simple English letters to write words in Russian so that the pronunciation stays roughly the same. It's something like russian_latin (as an equivalent to greek_latin in the http://www.iana.org/assignments/character-sets spec), and writing this way is a bit reminiscent of Polish or Serbian-Latin.

Ok. Let's leave ALT as it is.

--
Tatsuo Ishii
[HACKERS] Re: [PATCHES] encoding names
> > 4) Encoding official names are inconsistent. Here are my suggested changes (referring to http://www.iana.org/assignments/character-sets, according to Peter's suggestion):
> >
> >     ALT -> IBM866
> >     KOI8 -> KOI8_R
> >     UNICODE -> UTF_8 (Peter's suggestion)
>
> Right. But we will still need aliases UNICODE, ALT, KOI8 for back compatibility.

Sure.

> Thanks, I will try to fix all of them.

Thanks! But it seems we will leave ALT as it is (Serguei's suggestion).

--
Tatsuo Ishii
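One way to keep the old names working as input while reporting only the new official names is a small compatibility table. The following is a minimal sketch, assuming the renames discussed above (KOI8 to KOI8_R, UNICODE to UTF_8, ALT left alone). The names enc_compat, compat_tbl and official_enc_name are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pair: a name accepted as input and the name reported back. */
typedef struct
{
    const char *input;
    const char *official;
} enc_compat;

/* Old names stay valid as input; only the official name is ever returned. */
static const enc_compat compat_tbl[] = {
    { "KOI8",    "KOI8_R" },   /* old name kept as an alias */
    { "KOI8_R",  "KOI8_R" },
    { "UNICODE", "UTF_8"  },   /* old name kept as an alias */
    { "UTF_8",   "UTF_8"  },
    { "ALT",     "ALT"    },   /* left unchanged, as agreed in the thread */
};

/* Return the official name for an accepted input name, or NULL if unknown. */
const char *
official_enc_name(const char *input)
{
    size_t i;

    assert(input != NULL);
    for (i = 0; i < sizeof compat_tbl / sizeof compat_tbl[0]; i++)
    {
        if (strcmp(input, compat_tbl[i].input) == 0)
            return compat_tbl[i].official;
    }
    return NULL;
}
```

In this sketch official_enc_name("UNICODE") yields "UTF_8", while "IBM866" is not in the table and yields NULL. The comparison is exact and case-sensitive, which a real lookup would probably relax.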
[HACKERS] Re: [PATCHES] encoding names
> Hi,
>
> attached is a patch with:
>
> - new encoding names stuff with better performance (binary search instead of for(), and it prevents some needless searching)
> - it is possible to use synonyms for encodings (for example ISO-8859-1, Latin1, l1)
> - implemented is Peter's idea about encoding names clearing (chars other than [A-Za-z0-9] are irrelevant -- 'ISO-8859-1' is the same as 'iso8859_1' or iso-8-8-5-9-1 :-)
> - share routines for this between FE and BE (never again define encoding names separately in FE and BE)
> - add prefix PG_ to encoding identifier macros; something like 'ALT' is pretty dirty in source code, rather use PG_ALT.
>
> (Note: the patch adds new file mb/encname.c and removes mb/common.c)
>
> Karel

Thanks for the patches, but...

1) There is a compiler error if --enable-unicode-conversion is not enabled.

2) The patches break createdb. createdb should raise an error if a client-only encoding such as SJIS is specified.

3) I don't like the following ugliness. Why not change all of the SQL_ASCII occurrences in the sources?

    /*
     * A lot of PG stuff use 'SQL_ASCII' without prefix (dirty...)
     */
    #define SQL_ASCII PG_SQL_ASCII

4) Encoding official names are inconsistent. Here are my suggested changes (referring to http://www.iana.org/assignments/character-sets, according to Peter's suggestion):

    ALT -> IBM866
    KOI8 -> KOI8_R
    UNICODE -> UTF_8 (Peter's suggestion)

Also, I'm wondering why windows-1251, not windows_1251? Or ISO_8859_1, not ISO-8859-1? There seems to be confusion about the usage of _ and -.
    pg_enc2name pg_enc2name_tbl[] =
    {
        { "SQL_ASCII",     PG_SQL_ASCII },
        { "EUC_JP",        PG_EUC_JP },
        { "EUC_CN",        PG_EUC_CN },
        { "EUC_KR",        PG_EUC_KR },
        { "EUC_TW",        PG_EUC_TW },
        { "UNICODE",       PG_UNICODE },
        { "MULE_INTERNAL", PG_MULE_INTERNAL },
        { "ISO_8859_1",    PG_LATIN1 },
        { "ISO_8859_2",    PG_LATIN2 },
        { "ISO_8859_3",    PG_LATIN3 },
        { "ISO_8859_4",    PG_LATIN4 },
        { "ISO_8859_5",    PG_LATIN5 },
        { "KOI8",          PG_KOI8 },
        { "window-1251",   PG_WIN1251 },
        { "ALT",           PG_ALT },
        { "Shift_JIS",     PG_SJIS },
        { "Big5",          PG_BIG5 },
        { "window-1250",   PG_WIN1251 }
    };
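The name cleaning and binary search that the patch description mentions could look roughly like the sketch below. This is not the code from the patch: the names enc_alias, alias_tbl, clean_enc_name and enc_name_to_id, the small set of encodings, the 64-byte key buffer and the -1 return value are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoding ids; the PG_ prefix follows the patch description. */
enum { PG_SQL_ASCII, PG_EUC_JP, PG_UNICODE, PG_LATIN1, PG_KOI8 };

typedef struct
{
    const char *key;        /* cleaned name: lowercase, [a-z0-9] only */
    int         encoding;
} enc_alias;

/* Must stay sorted by key in strcmp() order, since bsearch() relies on it. */
static const enc_alias alias_tbl[] = {
    { "eucjp",    PG_EUC_JP },
    { "iso88591", PG_LATIN1 },
    { "koi8",     PG_KOI8 },
    { "koi8r",    PG_KOI8 },
    { "l1",       PG_LATIN1 },
    { "latin1",   PG_LATIN1 },
    { "sqlascii", PG_SQL_ASCII },
    { "unicode",  PG_UNICODE },
    { "utf8",     PG_UNICODE },
};

/* Copy src to dst, dropping everything outside [A-Za-z0-9] and lowercasing. */
static void
clean_enc_name(const char *src, char *dst, size_t dstlen)
{
    size_t n = 0;

    assert(dstlen > 0);
    for (; *src != '\0' && n + 1 < dstlen; src++)
    {
        char c = *src;

        if (c >= 'A' && c <= 'Z')
            dst[n++] = (char) (c - 'A' + 'a');
        else if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
            dst[n++] = c;
    }
    dst[n] = '\0';
}

/* bsearch() comparator: first argument is the key, second a table element. */
static int
cmp_alias(const void *key, const void *elem)
{
    return strcmp((const char *) key, ((const enc_alias *) elem)->key);
}

/* Return the encoding id for any spelling of a known name, or -1. */
int
enc_name_to_id(const char *name)
{
    char             key[64];
    const enc_alias *hit;

    clean_enc_name(name, key, sizeof key);
    hit = bsearch(key, alias_tbl,
                  sizeof alias_tbl / sizeof alias_tbl[0],
                  sizeof alias_tbl[0], cmp_alias);
    return hit ? hit->encoding : -1;
}
```

With this table, enc_name_to_id("ISO-8859-1"), enc_name_to_id("iso8859_1") and enc_name_to_id("l1") all resolve to PG_LATIN1, and an unknown name gives -1. Storing the keys already cleaned means each lookup cleans only its input once, at the cost of having to keep the table sorted by hand.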
Re: [HACKERS] Re: [PATCHES] encoding names
----- Original Message -----
From: Tatsuo Ishii
Sent: Saturday, August 18, 2001 10:02 PM

> ALT -> IBM866

Just a quick comment: ALT is not necessarily IBM866. It can be any US-ASCII or 26-character-alphabet Latin set, for example IBM819 or ISO8859-1. It is actually quite different from IBM866 in its true meaning, and they shouldn't be aliased together. ALT is used, for example, when none of KOI8-R, Windows-1251, or IBM866 is available to a Russian-speaking person to read or write text, messages and such: we use simple English letters to write words in Russian so that the pronunciation stays roughly the same. It's something like russian_latin (as an equivalent to greek_latin in the http://www.iana.org/assignments/character-sets spec), and writing this way is a bit reminiscent of Polish or Serbian-Latin.

Serguei