[HACKERS] Re: [PATCHES] encoding names

2001-08-24 Thread Bruce Momjian

   BTW, what's wrong with encoding? I don't think EUC-JP or utf-8,
   for example, are character set names.
  
  Hmm, SQL talks of character sets, it has a CHARACTER_SETS view and such.
  It's slightly incorrect, I agree.
  
  Maybe we should not touch getdatabaseencoding() right now, given that the
  names we currently use are apparently almost correct anyway and
  considering the pain it creates to alter them, and instead implement the
  information schema views in the future?
 
 I thought schema support would be introduced in 7.2, but apparently
 that will not happen...

I thought I could do it but ran out of time.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



[HACKERS] Re: [PATCHES] encoding names

2001-08-24 Thread Peter Eisentraut

Tatsuo Ishii writes:

  Maybe we should not touch getdatabaseencoding() right now, given that the
  names we currently use are apparently almost correct anyway and
  considering the pain it creates to alter them, and instead implement the
  information schema views in the future?

 I thought schema support would be introduced in 7.2, but apparently
 that will not happen...

True, but right now we'd have to do rather elaborate changes just to
switch a couple of names to more correct versions.  Accepting them as
input is good, but maybe we should hold back on the output part a bit
until we can do it correctly.

-- 
Peter Eisentraut   [EMAIL PROTECTED]   http://funkturm.homeip.net/~peter





[HACKERS] Re: [PATCHES] encoding names

2001-08-23 Thread Peter Eisentraut

Tatsuo Ishii writes:

  But getdbencoding isn't semantically different from the old
  getdatabaseencoding.  encoding isn't the right term anyway, methinks, it
  should be character set.  So maybe database_character_set()?  (No get
  please.)

 I'm not a native English speaker, so please feel free to choose more
 appropriate name.

 BTW, what's wrong with encoding? I don't think EUC-JP or utf-8,
 for example, are character set names.

Hmm, SQL talks of character sets, it has a CHARACTER_SETS view and such.
It's slightly incorrect, I agree.

Maybe we should not touch getdatabaseencoding() right now, given that the
names we currently use are apparently almost correct anyway and
considering the pain it creates to alter them, and instead implement the
information schema views in the future?

-- 
Peter Eisentraut   [EMAIL PROTECTED]   http://funkturm.homeip.net/~peter





[HACKERS] Re: [PATCHES] encoding names

2001-08-23 Thread Tatsuo Ishii

  BTW, what's wrong with encoding? I don't think EUC-JP or utf-8,
  for example, are character set names.
 
 Hmm, SQL talks of character sets, it has a CHARACTER_SETS view and such.
 It's slightly incorrect, I agree.
 
 Maybe we should not touch getdatabaseencoding() right now, given that the
 names we currently use are apparently almost correct anyway and
 considering the pain it creates to alter them, and instead implement the
 information schema views in the future?

I thought schema support would be introduced in 7.2, but apparently
that will not happen...
--
Tatsuo Ishii




[HACKERS] Re: [PATCHES] encoding names

2001-08-19 Thread Karel Zak

On Sun, Aug 19, 2001 at 11:02:57AM +0900, Tatsuo Ishii wrote:

 4) Encoding official names are inconsistent. Here are my suggested
    changes (referring to http://www.iana.org/assignments/character-sets,
    according to Peter's suggestion):
 
 ALT -> IBM866
 KOI8 -> KOI8_R
 UNICODE -> UTF_8 (Peter's suggestion)
 

 Right.

 But we will still need the aliases UNICODE, ALT, KOI8 for backward compatibility.

 Thanks, I will try to fix them all.
Karel

-- 
 Karel Zak  [EMAIL PROTECTED]
 http://home.zf.jcu.cz/~zakkr/
 
 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz




Re: [HACKERS] Re: [PATCHES] encoding names

2001-08-19 Thread Tatsuo Ishii

  ALT -> IBM866
 
 Just a quick comment: ALT is not necessarily IBM866.
 It can be any US-ASCII or 26-character-alphabet Latin set, for example
 IBM819 or ISO8859-1. It is actually quite different from IBM866 in its
 true meaning, and they shouldn't be aliased together. ALT is used, for example,
 when none of KOI8-R, Windows-1251, or IBM866 is available to a Russian-speaking
 person to read or write text: we use simple English letters
 to write words in Russian so that the pronunciation stays roughly the same. It's
 something like russian_latin (as an equivalent to greek_latin in the
 http://www.iana.org/assignments/character-sets spec), and writing this
 way is a bit reminiscent of Polish or Serbian-Latin.

Ok. Let's leave ALT as it is.
--
Tatsuo Ishii




[HACKERS] Re: [PATCHES] encoding names

2001-08-19 Thread Tatsuo Ishii

  4) Encoding official names are inconsistent. Here are my suggested
     changes (referring to http://www.iana.org/assignments/character-sets,
     according to Peter's suggestion):
  
  ALT -> IBM866
  KOI8 -> KOI8_R
  UNICODE -> UTF_8 (Peter's suggestion)
  
 
  Right.
 
  But we will still need the aliases UNICODE, ALT, KOI8 for backward compatibility.

Sure.

  Thanks, I will try to fix them all.

Thanks! But we seem to leave ALT as it is (Serguei's suggestion).
--
Tatsuo Ishii
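
The outcome of this exchange (official names for KOI8 and UNICODE, old names kept as aliases, ALT left untouched) could be sketched roughly as below. This is only an illustration, not code from the patch: the table, the helper ascii_ieq() and the function official_encoding_name() are hypothetical names made up for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical rename table: old PostgreSQL name -> proposed official name. */
typedef struct
{
    const char *old_name;
    const char *official_name;
} enc_rename;

static const enc_rename rename_tbl[] = {
    {"KOI8", "KOI8_R"},
    {"UNICODE", "UTF_8"}
    /* ALT deliberately has no entry: the thread decided to leave it as it is. */
};

/* Case-insensitive comparison for plain ASCII names; returns 1 if equal. */
static int
ascii_ieq(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0')
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;            /* equal only if both strings ended together */
}

/*
 * Map an old name to its official replacement.  Names without an entry
 * (including ALT) are returned unchanged, so old names keep working.
 */
const char *
official_encoding_name(const char *name)
{
    size_t      i;

    assert(name != NULL);
    for (i = 0; i < sizeof(rename_tbl) / sizeof(rename_tbl[0]); i++)
    {
        if (ascii_ieq(name, rename_tbl[i].old_name))
            return rename_tbl[i].official_name;
    }
    return name;
}
```

With this table, official_encoding_name("UNICODE") yields "UTF_8", while "ALT" and any name not listed are passed through as given.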





[HACKERS] Re: [PATCHES] encoding names

2001-08-18 Thread Tatsuo Ishii

  Hi,
 
  attached is a patch with:
 
 - new encoding names stuff with better performance (binary search
   instead of a for() loop, and some needless searching is avoided)
 
 - synonyms can be used for encodings (for example ISO-8859-1,
   Latin1, l1)
 
 - Peter's idea about cleaning encoding names is implemented
   (characters other than [A-Za-z0-9] are irrelevant -- 'ISO-8859-1' is
   the same as 'iso8859_1' or iso-8-8-5-9-1 :-)
 
 - routines for this are shared between FE and BE (encoding names are
   no longer defined separately in FE and BE)
 
 - the prefix PG_ is added to the encoding identifier macros; something
   like 'ALT' is pretty dirty in source code, so PG_ALT is used instead.
 
  (Note: the patch adds the new file mb/encname.c and removes mb/common.c)
 
   Karel
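
The name cleaning and binary search described in the list above might look something like the following. This is a minimal sketch under assumptions, not the contents of mb/encname.c: the enum values, the alias table and the function names clean_encoding_name() and lookup_encoding() are hypothetical, and the table only covers a handful of names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoding ids; the PG_ prefix follows the patch description. */
typedef enum
{
    PG_SQL_ASCII,
    PG_EUC_JP,
    PG_LATIN1,
    PG_KOI8,
    PG_UNICODE
} pg_enc_id;

typedef struct
{
    const char *name;           /* cleaned key: lower case, alphanumerics only */
    pg_enc_id   encoding;
} enc_alias;

/* Must stay sorted by name in strcmp() order, or bsearch() will miss entries. */
static const enc_alias alias_tbl[] = {
    {"eucjp", PG_EUC_JP},
    {"iso88591", PG_LATIN1},
    {"koi8", PG_KOI8},
    {"koi8r", PG_KOI8},
    {"l1", PG_LATIN1},
    {"latin1", PG_LATIN1},
    {"sqlascii", PG_SQL_ASCII},
    {"unicode", PG_UNICODE},
    {"utf8", PG_UNICODE}
};

/* Copy name into buf, keeping only [A-Za-z0-9] and folding to lower case. */
static void
clean_encoding_name(const char *name, char *buf, size_t buflen)
{
    size_t      n = 0;

    assert(name != NULL && buf != NULL && buflen > 0);
    for (; *name != '\0' && n + 1 < buflen; name++)
    {
        unsigned char c = (unsigned char) *name;

        if (isalnum(c))
            buf[n++] = (char) tolower(c);
    }
    buf[n] = '\0';
}

/* bsearch() callback: compare the cleaned key against one table entry. */
static int
cmp_alias(const void *key, const void *elem)
{
    return strcmp((const char *) key, ((const enc_alias *) elem)->name);
}

/* Returns the encoding id for any spelling of a known name, or -1. */
int
lookup_encoding(const char *name)
{
    char        key[64];
    const enc_alias *hit;

    clean_encoding_name(name, key, sizeof(key));
    hit = bsearch(key, alias_tbl,
                  sizeof(alias_tbl) / sizeof(alias_tbl[0]),
                  sizeof(alias_tbl[0]), cmp_alias);
    return hit != NULL ? (int) hit->encoding : -1;
}
```

Because punctuation and case are dropped before the search, 'ISO-8859-1', 'iso8859_1' and 'iso-8-8-5-9-1' all reduce to the same key, and synonyms such as Latin1 and l1 are just additional rows pointing at the same id.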

Thanks for the patches, but...

1) There is a compiler error if --enable-unicode-conversion is not
   enabled

2) The patches break createdb. createdb should raise an error if
   a client-only encoding such as SJIS is specified.

3) I don't like the following ugliness. Why not change all
   occurrences of SQL_ASCII in the sources?

   /*
    * A lot of PG stuff use 'SQL_ASCII' without prefix (dirty...)
    */
   #define SQL_ASCII  PG_SQL_ASCII

4) Encoding official names are inconsistent. Here are my suggested
   changes (referring to http://www.iana.org/assignments/character-sets,
   according to Peter's suggestion):

ALT -> IBM866
KOI8 -> KOI8_R
UNICODE -> UTF_8 (Peter's suggestion)

Also, I'm wondering why windows-1251, not windows_1251? Or
ISO_8859_1, not ISO-8859-1? There seems to be some confusion about
the usage of _ and -.

pg_enc2name pg_enc2name_tbl[] =
{
	{ "SQL_ASCII",     PG_SQL_ASCII },
	{ "EUC_JP",        PG_EUC_JP },
	{ "EUC_CN",        PG_EUC_CN },
	{ "EUC_KR",        PG_EUC_KR },
	{ "EUC_TW",        PG_EUC_TW },
	{ "UNICODE",       PG_UNICODE },
	{ "MULE_INTERNAL", PG_MULE_INTERNAL },
	{ "ISO_8859_1",    PG_LATIN1 },
	{ "ISO_8859_2",    PG_LATIN2 },
	{ "ISO_8859_3",    PG_LATIN3 },
	{ "ISO_8859_4",    PG_LATIN4 },
	{ "ISO_8859_5",    PG_LATIN5 },
	{ "KOI8",          PG_KOI8 },
	{ "window-1251",   PG_WIN1251 },
	{ "ALT",           PG_ALT },
	{ "Shift_JIS",     PG_SJIS },
	{ "Big5",          PG_BIG5 },
	{ "window-1250",   PG_WIN1251 }
};
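
Regarding point 2 above, the check that createdb needs could be sketched as follows. This is an assumption-laden illustration rather than the real backend code: the enum is a made-up subset, the function names encoding_is_server_safe() and check_createdb_encoding() are hypothetical, and treating BIG5 as client-only alongside SJIS is an assumption that goes beyond what this message states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of encoding ids, using the PG_ prefix from the patch. */
typedef enum
{
    PG_SQL_ASCII,
    PG_EUC_JP,
    PG_UNICODE,
    PG_LATIN1,
    PG_SJIS,
    PG_BIG5
} pg_enc_id;

/* Returns 1 if enc can be a database (backend) encoding, 0 if client-only. */
int
encoding_is_server_safe(pg_enc_id enc)
{
    switch (enc)
    {
        case PG_SJIS:
        case PG_BIG5:           /* assumption: client-only as well */
            return 0;
        default:
            return 1;
    }
}

/*
 * Sketch of the validation step: returns 0 if the database may be created,
 * or -1 with *errmsg pointing at a static message if the encoding is
 * client-only.
 */
int
check_createdb_encoding(pg_enc_id enc, const char **errmsg)
{
    assert(errmsg != NULL);
    if (!encoding_is_server_safe(enc))
    {
        *errmsg = "encoding is client-only and cannot be used for a database";
        return -1;
    }
    *errmsg = NULL;
    return 0;
}
```

Keeping the client-only decision in one small function means that the name lookup can accept SJIS for client use while createdb still rejects it.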





Re: [HACKERS] Re: [PATCHES] encoding names

2001-08-18 Thread Serguei Mokhov

- Original Message - 
From: Tatsuo Ishii [EMAIL PROTECTED]
Sent: Saturday, August 18, 2001 10:02 PM


 ALT -> IBM866

Just a quick comment: ALT is not necessarily IBM866.
It can be any US-ASCII or 26-character-alphabet Latin set, for example
IBM819 or ISO8859-1. It is actually quite different from IBM866 in its
true meaning, and they shouldn't be aliased together. ALT is used, for example,
when none of KOI8-R, Windows-1251, or IBM866 is available to a Russian-speaking
person to read or write text: we use simple English letters
to write words in Russian so that the pronunciation stays roughly the same. It's
something like russian_latin (as an equivalent to greek_latin in the
http://www.iana.org/assignments/character-sets spec), and writing this
way is a bit reminiscent of Polish or Serbian-Latin.

Serguei


