Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-04-26 Thread Bruce Momjian
Hiroshi Inoue wrote:
 Bruce Momjian wrote:
  Takahiro Itagaki wrote:
  Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp wrote:
 
  Revised patch attached. Please test it.
  I applied this version of the patch.
  Please check whether the bug is fixed and watch for any buildfarm failures.
  
  Great.  I have merged my C comments into the code with the attached
  patch so we remember why the code is set up as it is.
  
  One thing I am confused about is that, for Win32, our numeric/monetary
  handling sets lc_ctype to match numeric/monetary, while our time code in
  the same file uses that method _and_ uses wcsftime() to return the value
  in wide characters.  So, why do we do both for time?  Is there any value
  to that?
 
 Unfortunately wcsftime() is a halfway convenience function that uses the
 ANSI versions of the underlying functions internally.
 AFAICS the only way to remove the dependency on LC_CTYPE is to call
   GetLocaleInfoW() directly.

Thanks.  I have documented this fact in a C comment;  patch attached.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com
Index: src/backend/utils/adt/pg_locale.c
===
RCS file: /cvsroot/pgsql/src/backend/utils/adt/pg_locale.c,v
retrieving revision 1.55
diff -c -c -r1.55 pg_locale.c
*** src/backend/utils/adt/pg_locale.c	24 Apr 2010 22:54:56 -	1.55
--- src/backend/utils/adt/pg_locale.c	26 Apr 2010 13:30:03 -
***
*** 627,633 
  		save_lc_time = pstrdup(save_lc_time);
  
  #ifdef WIN32
! 	/* See the WIN32 comment near the top of PGLC_localeconv() */
  	/* save user's value of ctype locale */
  	save_lc_ctype = setlocale(LC_CTYPE, NULL);
  	if (save_lc_ctype)
--- 627,641 
  		save_lc_time = pstrdup(save_lc_time);
  
  #ifdef WIN32
! 	/*
! 	 * On WIN32, there is no way to get locale-specific time values in a
! 	 * specified locale, like we do for monetary/numeric.  We can only get
! 	 * CP_ACP (see strftime_win32) or UTF16.  Therefore, we get UTF16 and
! 	 * convert it to the database locale.  However, wcsftime() internally
! 	 * uses LC_CTYPE, so we set it here.  See the WIN32 comment near the
! 	 * top of PGLC_localeconv().
! 	 */
! 
  	/* save user's value of ctype locale */
  	save_lc_ctype = setlocale(LC_CTYPE, NULL);
  	if (save_lc_ctype)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-04-25 Thread Hiroshi Inoue

Bruce Momjian wrote:

Takahiro Itagaki wrote:

Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp wrote:


Revised patch attached. Please test it.

I applied this version of the patch.
Please check whether the bug is fixed and watch for any buildfarm failures.


Great.  I have merged my C comments into the code with the attached
patch so we remember why the code is set up as it is.

One thing I am confused about is that, for Win32, our numeric/monetary
handling sets lc_ctype to match numeric/monetary, while our time code in
the same file uses that method _and_ uses wcsftime() to return the value
in wide characters.  So, why do we do both for time?  Is there any value
to that?


Unfortunately wcsftime() is a halfway convenience function that uses the
ANSI versions of the underlying functions internally.
AFAICS the only way to remove the dependency on LC_CTYPE is to call
 GetLocaleInfoW() directly.
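
For reference, a minimal sketch of the wcsftime() call shape under discussion. The helper name wide_month_name() is hypothetical and does not come from pg_locale.c. The sketch deliberately does not touch LC_TIME or LC_CTYPE, so it returns the month name for whatever locale is already active (the "C" locale in a fresh process). Per the point above, on Windows the result still depends on LC_CTYPE even though the output is in wide characters.

```c
#include <stddef.h>
#include <time.h>
#include <wchar.h>

/*
 * Hypothetical helper: write the full month name for month index 0..11
 * into buf as wide characters.  Returns the number of wide characters
 * written (excluding the terminator), or 0 on failure.
 */
static size_t
wide_month_name(int month, wchar_t *buf, size_t buflen)
{
    struct tm tm = {0};

    tm.tm_year = 110;           /* 2010; only tm_mon matters for %B */
    tm.tm_mon = month;
    tm.tm_mday = 1;

    /* %B asks for the locale's full month name */
    return wcsftime(buf, buflen, L"%B", &tm);
}
```

A caller would still have to convert the wide-character result to the database encoding afterwards.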

regards,
Hiroshi Inoue





Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-04-24 Thread Bruce Momjian
Takahiro Itagaki wrote:
 
 Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp wrote:
 
  Revised patch attached. Please test it.
 
 I applied this version of the patch.
 Please check whether the bug is fixed and watch for any buildfarm failures.

Great.  I have merged my C comments into the code with the attached
patch so we remember why the code is set up as it is.

One thing I am confused about is that, for Win32, our numeric/monetary
handling sets lc_ctype to match numeric/monetary, while our time code in
the same file uses that method _and_ uses wcsftime() to return the value
in wide characters.  So, why do we do both for time?  Is there any value
to that?

Seems we should do the same for both numeric/monetary and time.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com
Index: src/backend/utils/adt/pg_locale.c
===
RCS file: /cvsroot/pgsql/src/backend/utils/adt/pg_locale.c,v
retrieving revision 1.54
diff -c -c -r1.54 pg_locale.c
*** src/backend/utils/adt/pg_locale.c	22 Apr 2010 01:55:52 -	1.54
--- src/backend/utils/adt/pg_locale.c	24 Apr 2010 22:43:53 -
***
*** 41,46 
--- 41,50 
   * DOES NOT WORK RELIABLY: on some platforms the second setlocale() call
   * will change the memory save is pointing at.	To do this sort of thing
   * safely, you *must* pstrdup what setlocale returns the first time.
+  *
+  * FYI, The Open Group locale standard is defined here:
+  *
+  *  http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html
   *--
   */
  
***
*** 424,430 
  	char	   *grouping;
  	char	   *thousands_sep;
  	int			encoding;
- 
  #ifdef WIN32
  	char	   *save_lc_ctype;
  #endif
--- 428,433 
***
*** 435,459 
  
  	free_struct_lconv(CurrentLocaleConv);
  
! 	/* Set user's values of monetary and numeric locales */
  	save_lc_monetary = setlocale(LC_MONETARY, NULL);
  	if (save_lc_monetary)
  		save_lc_monetary = pstrdup(save_lc_monetary);
  	save_lc_numeric = setlocale(LC_NUMERIC, NULL);
  	if (save_lc_numeric)
  		save_lc_numeric = pstrdup(save_lc_numeric);
  
  #ifdef WIN32
! 	/* set user's value of ctype locale */
  	save_lc_ctype = setlocale(LC_CTYPE, NULL);
  	if (save_lc_ctype)
  		save_lc_ctype = pstrdup(save_lc_ctype);
- #endif
  
! 	/* Get formatting information for numeric */
! #ifdef WIN32
  	setlocale(LC_CTYPE, locale_numeric);
  #endif
  	setlocale(LC_NUMERIC, locale_numeric);
  	extlconv = localeconv();
  	encoding = pg_get_encoding_from_locale(locale_numeric);
--- 438,485 
  
  	free_struct_lconv(CurrentLocaleConv);
  
! 	/* Save user's values of monetary and numeric locales */
  	save_lc_monetary = setlocale(LC_MONETARY, NULL);
  	if (save_lc_monetary)
  		save_lc_monetary = pstrdup(save_lc_monetary);
+ 
  	save_lc_numeric = setlocale(LC_NUMERIC, NULL);
  	if (save_lc_numeric)
  		save_lc_numeric = pstrdup(save_lc_numeric);
  
  #ifdef WIN32
! 	/*
! 	*  Ideally, monetary and numeric local symbols could be returned in
! 	*  any server encoding.  Unfortunately, the WIN32 API does not allow
! 	*  setlocale() to return values in a codepage/CTYPE that uses more
! 	*  than two bytes per character, like UTF-8:
! 	*
! 	*  http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
! 	*
! 	*  Evidently, LC_CTYPE allows us to control the encoding used
! 	*  for strings returned by localeconv().  The Open Group
! 	*  standard, mentioned at the top of this C file, doesn't
! 	*  explicitly state this.
! 	*
! 	*  Therefore, we set LC_CTYPE to match LC_NUMERIC or LC_MONETARY
! 	*  (which cannot be UTF8), call localeconv(), and then convert from
! 	*  the numeric/monetary LC_CTYPE to the server encoding.  One
! 	*  example use of this is for the Euro symbol.
! 	*
! 	*  Perhaps someday we will use GetLocaleInfoW() which returns values
! 	*  in UTF16 and convert from that.
! 	*/
! 
! 	/* save user's value of ctype locale */
  	save_lc_ctype = setlocale(LC_CTYPE, NULL);
  	if (save_lc_ctype)
  		save_lc_ctype = pstrdup(save_lc_ctype);
  
! 	/* use numeric to set the ctype */
  	setlocale(LC_CTYPE, locale_numeric);
  #endif
+ 
+ 	/* Get formatting information for numeric */
  	setlocale(LC_NUMERIC, locale_numeric);
  	extlconv = localeconv();
  	encoding = pg_get_encoding_from_locale(locale_numeric);
***
*** 462,471 
  	thousands_sep = db_encoding_strdup(encoding, extlconv->thousands_sep);
  	grouping = strdup(extlconv->grouping);
  
- 	/* Get formatting information for monetary */
  #ifdef WIN32
  	setlocale(LC_CTYPE, locale_monetary);
  #endif
  	setlocale(LC_MONETARY, locale_monetary);
  	extlconv = localeconv();
  	encoding = pg_get_encoding_from_locale(locale_monetary);
--- 488,499 
  	thousands_sep = db_encoding_strdup(encoding, extlconv->thousands_sep);
  	grouping = strdup(extlconv->grouping);
  
  #ifdef WIN32
+ 	/* use monetary to set the ctype */
  	

Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-04-21 Thread Takahiro Itagaki

Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp wrote:

 Revised patch attached. Please test it.

I applied this version of the patch.
Please check whether the bug is fixed and watch for any buildfarm failures.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center





Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-04-20 Thread Takahiro Itagaki

Magnus Hagander mag...@hagander.net wrote:

  1. setlocale(LC_CTYPE, lc_monetary)
  2. setlocale(LC_MONETARY, lc_monetary)
  3. lc = localeconv()
  4. pg_do_encoding_conversion(lc->xxx,
   FROM pg_get_encoding_from_locale(lc_monetary),
   TO GetDatabaseEncoding())
  5. Revert LC_CTYPE and LC_MONETARY.

A patch attached for the above straightforwardly. Does this work?
Note that #ifdef WIN32 parts in the patch are harmless on other platforms
even if they are enabled.

 Let's work off what we have now to start with at least. Bruce, can you
 comment on that thing about the extra parameter? And UTF8?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center



pg_locale_20100420.patch
Description: Binary data



Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-04-20 Thread Bruce Momjian
Magnus Hagander wrote:
  One new idea would be to set LC_CTYPE to UTF16/widechars unconditionally
  on Win32 and then just convert that always to the server encoding with
  win32_wchar_to_db_encoding(), instead of using the encoding from
  LC_MONETARY to set LC_CTYPE and having to do double-conversion.
 
 So, hugely late, reviving this thread.
 
 Ideally, we should definitely consider doing that. Internally, Windows
 will do it in UTF16 anyway. So we're basically doing
 UTF16->db->UTF16->UTF8->db or something like that with this patch.
 
 But I'm unsure how that would work. We're talking about the output of
 localeconv(), right? I don't see a version of localeconv() that does
 wide chars anywhere. (You can't just set LC_CTYPE and use the regular
 function - Windows has a separate set of functions for dealing with
 UTF16).

I thought there was an LC_CTYPE for UTF16 that we could use without a
wide version of that function.  If not, forget that idea.

 Looking at the patch, you're passing item to db_encoding_strdup()
 but it doesn't seem to be used anywhere. Leftover from previous
 experiments, or forgot to use it? Perhaps you intended for it to be in
 the error messages?

It originally was in the error message but can be removed.  I have now
removed 'item' from my version of the patch.

 Also, won't this need special-casing for UTF8? Per comment in
 mbutils.c, wcstombs() doesn't work for UTF8 encodings - you need to
 use MultiByteToWideChar().

Well, we don't support UTF8 for any of the non-encoding locales, e.g.
monetary, numeric, so I never considered that we would support it.  If
we did support it, we would have to _pick_ a locale that is <= 2 bytes
per character and use that, and then convert to UTF8, but what locale
would we pick?  They could use a LC_CTYPE that is <= 2 bytes and a
numeric that is UTF8, but I never suspected we would want to support
that, and we would need some logic to detect that case.

 I also note that we have char2wchar() already - we should perhaps just
 call that? Or will that use the wrong locale?

I see char2wchar() calling GetDatabaseEncoding() right away, which does
use the cached value for the server encoding, so I don't think it will
work.  We can't use our existing routines to convert _from_ the current
encoding to wide characters (because our numeric encoding might not
match the server encoding).  However, we can use existing code that
converts from wide to the server encoding, perhaps replacing
win32_wchar_to_db_encoding().

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com



Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-04-20 Thread Bruce Momjian
Takahiro Itagaki wrote:
 
 Magnus Hagander mag...@hagander.net wrote:
 
   1. setlocale(LC_CTYPE, lc_monetary)
   2. setlocale(LC_MONETARY, lc_monetary)
   3. lc = localeconv()
   4. pg_do_encoding_conversion(lc->xxx,
FROM pg_get_encoding_from_locale(lc_monetary),
TO GetDatabaseEncoding())
   5. Revert LC_CTYPE and LC_MONETARY.
 
 A patch attached for the above straightforwardly. Does this work?
 Note that #ifdef WIN32 parts in the patch are harmless on other platforms
 even if they are enabled.

I like this patch.  Instead of having special code to convert from the
_current_ locale, you pass the encoding name to our routines.  This does
mean we are bound by supporting only the encodings PG supports, not the
full range of Win32 encodings, but that seems fine.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com



Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-04-20 Thread Bruce Momjian
Magnus Hagander wrote:
  Another idea is to use GetLocaleInfoW() [1], i.e. the native Win32 locale
  functions, instead of the libc ones. It returns locale strings in wide
  chars, so we can safely convert them as UTF16->UTF8->db. But it requires
  an additional branch in our locale code only for Windows.
 
 If we can go UTF16->db directly, it might be a good idea. If we're
 going via UTF8 anyway, I doubt it's going to be worth it.
 
 Let's work off what we have now to start with at least. Bruce, can you
 comment on that thing about the extra parameter? And UTF8?

I do like the idea of using UTF16 directly because that would eliminate
our need to even set LC_CTYPE for Win32 in this routine.  That would
also eliminate any need to refer to the encoding for numeric/monetary,
so we could get rid of the odd case where their encoding is UTF8 but
their numeric/monetary locale settings have to use a non-UTF8 encoding. 
For example, the original bug report has these locale settings:

http://archives.postgresql.org/pgsql-general/2009-04/msg00829.php

psql (PostgreSQL) 8.3.7

server_version 8.3.7
server_encoding UTF8
client_encoding win1252
lc_numeric Finnish, Finland
lc_monetary Finnish, Finland

but really needed to use Finnish_Finland.1252:

http://archives.postgresql.org/pgsql-general/2009-04/msg00859.php

However, I noticed that both lc_collate and lc_ctype are set to
Finnish_Finland.1252 by the installer. Should I have just run initdb
with --locale fi_FI.UTF8 at the very start? The to_char('L') works
fine with a database with win1252 encoding.

Of course, that still does not work with our current CVS code if the
database encoding is UTF8, which is what we are trying to fix now.

I am not even sure how users set these things properly but I assume the
installer does all that magic.  And, of course, if someone manually runs
initdb on Windows, they can easily set things wrong.

Magnus, if I remember correctly, all our non-UTF8 to UTF8 conversion
already has to pass through UTF16 as an intermediary case, so going to
UTF16 directly seems fine.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com



Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-04-20 Thread Hiroshi Inoue

Takahiro Itagaki wrote:

Magnus Hagander mag...@hagander.net wrote:


1. setlocale(LC_CTYPE, lc_monetary)
2. setlocale(LC_MONETARY, lc_monetary)
3. lc = localeconv()
4. pg_do_encoding_conversion(lc->xxx,
 FROM pg_get_encoding_from_locale(lc_monetary),
 TO GetDatabaseEncoding())
5. Revert LC_CTYPE and LC_MONETARY.


A patch attached for the above straightforwardly. Does this work?


I have 2 questions about this patch.

1. How does it work when LC_MONETARY and LC_NUMERIC are different?
2. Is calling db_encoding_strdup() for lconv->grouping appropriate?

regards,
Hiroshi Inoue


Note that #ifdef WIN32 parts in the patch are harmless on other platforms
even if they are enabled.


Let's work off what we have now to start with at least. Bruce, can you
comment on that thing about the extra parameter? And UTF8?


Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-04-20 Thread Takahiro Itagaki

Hiroshi Inoue in...@tpf.co.jp wrote:

 1. How does it work when LC_MONETARY and LC_NUMERIC are different?

I think it is rarely used, but possible. Fixed.

 2. Is calling db_encoding_strdup() for lconv->grouping appropriate?

Ah, we didn't need it. Removed.

Revised patch attached. Please test it.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center



pg_locale_20100421.patch
Description: Binary data



Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-04-19 Thread Magnus Hagander
On Mon, Apr 19, 2010 at 03:59, Takahiro Itagaki
itagaki.takah...@oss.ntt.co.jp wrote:

 Magnus Hagander mag...@hagander.net wrote:

 But I'm unsure how that would work. We're talking about the output of
 localeconv(), right? I don't see a version of localeconv() that does
 wide chars anywhere. (You can't just set LC_CTYPE and use the regular
 function - Windows has a separate set of functions for dealing with
 UTF16).

 Yeah, msvcrt doesn't have wlocaleconv :-( . Since localeconv() returns
 characters in the encoding specified in LC_CTYPE, we need to handle the
 issue with code something like:

    1. setlocale(LC_CTYPE, lc_monetary)
    2. setlocale(LC_MONETARY, lc_monetary)
    3. lc = localeconv()
    4. pg_do_encoding_conversion(lc->xxx,
          FROM pg_get_encoding_from_locale(lc_monetary),
          TO GetDatabaseEncoding())
    5. Revert LC_CTYPE and LC_MONETARY.


 Another idea is to use GetLocaleInfoW() [1], i.e. the native Win32 locale
 functions, instead of the libc ones. It returns locale strings in wide
 chars, so we can safely convert them as UTF16->UTF8->db. But it requires
 an additional branch in our locale code only for Windows.

If we can go UTF16->db directly, it might be a good idea. If we're
going via UTF8 anyway, I doubt it's going to be worth it.

Let's work off what we have now to start with at least. Bruce, can you
comment on that thing about the extra parameter? And UTF8?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-04-18 Thread Takahiro Itagaki

Magnus Hagander mag...@hagander.net wrote:

 But I'm unsure how that would work. We're talking about the output of
 localeconv(), right? I don't see a version of localeconv() that does
 wide chars anywhere. (You can't just set LC_CTYPE and use the regular
 function - Windows has a separate set of functions for dealing with
 UTF16).

Yeah, msvcrt doesn't have wlocaleconv :-( . Since localeconv() returns
characters in the encoding specified in LC_CTYPE, we need to handle the
issue with code something like:

1. setlocale(LC_CTYPE, lc_monetary)
2. setlocale(LC_MONETARY, lc_monetary)
3. lc = localeconv()
4. pg_do_encoding_conversion(lc->xxx,
  FROM pg_get_encoding_from_locale(lc_monetary),
  TO GetDatabaseEncoding())
5. Revert LC_CTYPE and LC_MONETARY.


Another idea is to use GetLocaleInfoW() [1], i.e. the native Win32 locale
functions, instead of the libc ones. It returns locale strings in wide
chars, so we can safely convert them as UTF16->UTF8->db. But it requires
an additional branch in our locale code only for Windows.

[1] http://msdn.microsoft.com/en-us/library/dd318101
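
The UTF16-to-UTF8 leg of that route can be sketched portably. This is an illustration only: utf16_to_utf8() is a hypothetical name, and on Windows one would more likely call WideCharToMultiByte() with CP_UTF8 rather than hand-roll the conversion. Fetching the wide string with GetLocaleInfoW() is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert n UTF-16 code units to UTF-8.
 * Returns the number of bytes written (no terminator is added), or
 * (size_t) -1 on an unpaired surrogate or insufficient output space.
 */
static size_t
utf16_to_utf8(const uint16_t *in, size_t n, unsigned char *out, size_t outcap)
{
    size_t      o = 0;

    for (size_t i = 0; i < n; i++)
    {
        uint32_t    cp = in[i];
        size_t      need;

        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            /* high surrogate: must be followed by a low surrogate */
            if (i + 1 >= n || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return (size_t) -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            i++;
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
            return (size_t) -1; /* low surrogate without a high one */

        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (outcap - o < need)
            return (size_t) -1;

        if (need == 1)
            out[o++] = (unsigned char) cp;
        else if (need == 2)
        {
            out[o++] = (unsigned char) (0xC0 | (cp >> 6));
            out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
        else if (need == 3)
        {
            out[o++] = (unsigned char) (0xE0 | (cp >> 12));
            out[o++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
        else
        {
            out[o++] = (unsigned char) (0xF0 | (cp >> 18));
            out[o++] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

For the Euro sign (U+20AC), the case that started this thread, the single UTF-16 code unit 0x20AC becomes the three bytes E2 82 AC.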

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center





Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-04-16 Thread Magnus Hagander
On Mon, Mar 22, 2010 at 9:14 PM, Bruce Momjian br...@momjian.us wrote:
 Takahiro Itagaki wrote:

 Bruce Momjian br...@momjian.us wrote:

  Takahiro Itagaki wrote:
   Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
   db_encoding_strdup() with the function. Like this:
 
  OK, I don't have any Win32 people testing this patch so if we want this
  fixed for 9.0 someone is going to have to test my patch to see that it
  works.  Can you make the adjustments suggested above to my patch and
  test it to see that it works so we can apply it for 9.0?

 Here is a full patch that can be applied cleanly to HEAD.
 Can anyone test it on Windows?

 I'm not sure why the temporary change of lc_ctype was required in the
 original patch. That code is not included in my patch, but please
 let me know if it is still needed.

 Sorry for the delay in replying to you.

 I considered your idea of using the existing Postgres encoding
 conversion routines to do the conversion of localeconv() strings, but
 found two problems.

 First, GetPlatformEncoding() caches its result, so it assumes the
 LC_CTYPE never changes for the server, while fixing this issue actually
 requires us to change LC_CTYPE.  We could avoid the caching but that
 then involves complex table lookups, etc, which seems overly complex:

 +       /* convert the string to the database encoding */
 +       pstr = (char *) pg_do_encoding_conversion(
 +                           (unsigned char *) str, strlen(str),
 +                           GetPlatformEncoding(), GetDatabaseEncoding());

 Second, having our backend routines do the conversion seems wrong
 because it is possible for someone to set LC_MONETARY to an encoding
 that our database does not understand, e.g. UTF16, but one that WIN32
 can convert to a valid encoding.

 The reason we are doing all this is because of this updated comment in
 my patch:

        ftp://momjian.us/pub/postgresql/mypatches/pg_locale

 +    *  Ideally, monetary and numeric local symbols could be returned in
 +    *  any server encoding.  Unfortunately, the WIN32 API does not allow
 +    *  setlocale() to return values in a codepage/CTYPE that uses more
 +    *  than two bytes per character, like UTF-8:
 +    *
 +    *      http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
 +    *
 +    *  Evidently, LC_CTYPE allows us to control the encoding used
 +    *  for strings returned by localeconv().  The Open Group
 +    *  standard, mentioned at the top of this C file, doesn't
 +    *  explicitly state this.
 +    *
 +    *  Therefore, we set LC_CTYPE to match LC_NUMERIC and
 +    *  LC_MONETARY, call localeconv(), and use mbstowcs() to
 +    *  convert the locale-aware string, e.g. Euro symbol (which
 +    *  is not in UTF-8), to the server encoding.

 One new idea would be to set LC_CTYPE to UTF16/widechars unconditionally
 on Win32 and then just convert that always to the server encoding with
 win32_wchar_to_db_encoding(), instead of using the encoding from
 LC_MONETARY to set LC_CTYPE and having to do double-conversion.

So, hugely late, reviving this thread.

Ideally, we should definitely consider doing that. Internally, Windows
will do it in UTF16 anyway. So we're basically doing
UTF16->db->UTF16->UTF8->db or something like that with this patch.

But I'm unsure how that would work. We're talking about the output of
localeconv(), right? I don't see a version of localeconv() that does
wide chars anywhere. (You can't just set LC_CTYPE and use the regular
function - Windows has a separate set of functions for dealing with
UTF16).

Looking at the patch, you're passing item to db_encoding_strdup()
but it doesn't seem to be used anywhere. Leftover from previous
experiments, or forgot to use it? Perhaps you intended for it to be in
the error messages?

Also, won't this need special-casing for UTF8? Per comment in
mbutils.c, wcstombs() doesn't work for UTF8 encodings - you need to
use MultiByteToWideChar().

I also note that we have char2wchar() already - we should perhaps just
call that? Or will that use the wrong locale?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-03-22 Thread Bruce Momjian
Takahiro Itagaki wrote:
 
 Bruce Momjian br...@momjian.us wrote:
 
  Takahiro Itagaki wrote:
   Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
   db_encoding_strdup() with the function. Like this:
  
  OK, I don't have any Win32 people testing this patch so if we want this
  fixed for 9.0 someone is going to have to test my patch to see that it
  works.  Can you make the adjustments suggested above to my patch and
  test it to see that it works so we can apply it for 9.0?
 
 Here is a full patch that can be applied cleanly to HEAD.
 Can anyone test it on Windows?
 
 I'm not sure why the temporary change of lc_ctype was required in the
 original patch. That code is not included in my patch, but please
 let me know if it is still needed.

Sorry for the delay in replying to you.

I considered your idea of using the existing Postgres encoding
conversion routines to do the conversion of localeconv() strings, but
found two problems.

First, GetPlatformEncoding() caches its result, so it assumes the
LC_CTYPE never changes for the server, while fixing this issue actually
requires us to change LC_CTYPE.  We could avoid the caching but that
then involves complex table lookups, etc, which seems overly complex:

+       /* convert the string to the database encoding */
+       pstr = (char *) pg_do_encoding_conversion(
+                           (unsigned char *) str, strlen(str),
+                           GetPlatformEncoding(), GetDatabaseEncoding());

Second, having our backend routines do the conversion seems wrong
because it is possible for someone to set LC_MONETARY to an encoding
that our database does not understand, e.g. UTF16, but one that WIN32
can convert to a valid encoding.

The reason we are doing all this is because of this updated comment in
my patch:

ftp://momjian.us/pub/postgresql/mypatches/pg_locale

+    *  Ideally, monetary and numeric local symbols could be returned in
+    *  any server encoding.  Unfortunately, the WIN32 API does not allow
+    *  setlocale() to return values in a codepage/CTYPE that uses more
+    *  than two bytes per character, like UTF-8:
+    *
+    *      http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
+    *
+    *  Evidently, LC_CTYPE allows us to control the encoding used
+    *  for strings returned by localeconv().  The Open Group
+    *  standard, mentioned at the top of this C file, doesn't
+    *  explicitly state this.
+    *
+    *  Therefore, we set LC_CTYPE to match LC_NUMERIC and
+    *  LC_MONETARY, call localeconv(), and use mbstowcs() to
+    *  convert the locale-aware string, e.g. Euro symbol (which
+    *  is not in UTF-8), to the server encoding.

One new idea would be to set LC_CTYPE to UTF16/widechars unconditionally
on Win32 and then just convert that always to the server encoding with
win32_wchar_to_db_encoding(), instead of using the encoding from
LC_MONETARY to set LC_CTYPE and having to do double-conversion.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do



Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-03-17 Thread Takahiro Itagaki

Bruce Momjian br...@momjian.us wrote:

 Takahiro Itagaki wrote:
  Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
  db_encoding_strdup() with the function. Like this:
 
 OK, I don't have any Win32 people testing this patch so if we want this
 fixed for 9.0 someone is going to have to test my patch to see that it
 works.  Can you make the adjustments suggested above to my patch and
 test it to see that it works so we can apply it for 9.0?

Here is a full patch that can be applied cleanly to HEAD.
Can anyone test it on Windows?

I'm not sure why the temporary change of lc_ctype was required in the
original patch. That code is not included in my patch, but please
let me know if it is still needed.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center



pg_locale_20100318.patch
Description: Binary data



Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-03-12 Thread Bruce Momjian
Takahiro Itagaki wrote:
 
 Bruce Momjian br...@momjian.us wrote:
 
  OK, I have created a new function, win32_wchar_to_db_encoding(), to
  share the conversion from wide characters to the database encoding.
  New patch attached.
 
 Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
 db_encoding_strdup() with the function. Like this:
 
 static char *
 db_encoding_strdup(const char *str)
 {
     char   *pstr;
     char   *mstr;
 
     /* convert the string to the database encoding */
     pstr = (char *) pg_do_encoding_conversion(
                         (unsigned char *) str, strlen(str),
                         GetPlatformEncoding(), GetDatabaseEncoding());
     mstr = strdup(pstr);
     if (pstr != str)
         pfree(pstr);
 
     return mstr;
 }
 
 I believe the code is harmless on all platforms and we can use it
 instead of strdup() without any #ifdef WIN32 guards.

OK, I don't have any Win32 people testing this patch so if we want this
fixed for 9.0 someone is going to have to test my patch to see that it
works.  Can you make the adjustments suggested above to my patch and
test it to see that it works so we can apply it for 9.0?

 BTW, I found we'd better add ANSI_X3.4-1968 as an alias for
 PG_SQL_ASCII. My Fedora 12 returns that name when --no-locale is used.

OK.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do



Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-03-11 Thread Takahiro Itagaki

Bruce Momjian br...@momjian.us wrote:

 OK, I have created a new function, win32_wchar_to_db_encoding(), to
 share the conversion from wide characters to the database encoding.
 New patch attached.

Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
db_encoding_strdup() with the function. Like this:

static char *
db_encoding_strdup(const char *str)
{
    char   *pstr;
    char   *mstr;

    /* convert the string to the database encoding */
    pstr = (char *) pg_do_encoding_conversion(
                        (unsigned char *) str,
                        strlen(str),
                        GetPlatformEncoding(),
                        GetDatabaseEncoding());
    mstr = strdup(pstr);
    if (pstr != str)
        pfree(pstr);

    return mstr;
}

I believe the code is harmless on all platforms, and we can use it
instead of strdup() without any #ifdef WIN32 guards.
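
The ownership rule in the function above (free the converted string only when it is not the input pointer) can be tried outside the backend with a rough sketch like the following. Everything here is an assumption for illustration: convert_encoding() is a hypothetical stand-in for pg_do_encoding_conversion() that only knows LATIN1 to UTF8, the ENC_* ids are made up, and malloc()/free() replace palloc()/pfree().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoding ids; PostgreSQL has its own encoding enum. */
enum { ENC_LATIN1, ENC_UTF8 };

/*
 * Stand-in for pg_do_encoding_conversion(): returns src itself when no
 * conversion is needed, otherwise a newly malloc'd string (NULL on
 * failure).  Only LATIN1 -> UTF8 is implemented in this sketch.
 */
static char *
convert_encoding(const char *src, int src_enc, int dst_enc)
{
    size_t  len = strlen(src);
    size_t  i;
    char   *out;
    char   *p;

    if (src_enc == dst_enc)
        return (char *) src;
    if (src_enc != ENC_LATIN1 || dst_enc != ENC_UTF8)
        return NULL;

    /* each LATIN1 byte becomes at most two UTF-8 bytes */
    out = malloc(len * 2 + 1);
    if (out == NULL)
        return NULL;
    p = out;
    for (i = 0; i < len; i++)
    {
        unsigned char c = (unsigned char) src[i];

        if (c < 0x80)
            *p++ = (char) c;
        else
        {
            *p++ = (char) (0xC0 | (c >> 6));
            *p++ = (char) (0x80 | (c & 0x3F));
        }
    }
    *p = '\0';
    return out;
}

/* Return a malloc'd copy of str in dst_enc, or NULL on failure. */
char *
encoding_strdup(const char *str, int src_enc, int dst_enc)
{
    char   *conv;
    char   *copy;

    assert(str != NULL);

    conv = convert_encoding(str, src_enc, dst_enc);
    if (conv == NULL)
        return NULL;

    copy = malloc(strlen(conv) + 1);
    if (copy != NULL)
        strcpy(copy, conv);

    /* free the intermediate only if the converter allocated one */
    if (conv != str)
        free(conv);

    return copy;
}
```

The caller always gets a string it owns and must free(), whether or not a conversion actually happened, which is what lets the same call replace strdup() on every platform.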


BTW, I found we'd better add ANSI_X3.4-1968 as an alias for
PG_SQL_ASCII. My Fedora 12 returns that name when --no-locale is used.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center





Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-03-02 Thread Hiroshi Inoue

Bruce Momjian wrote:

Hiroshi Inoue wrote:

Bruce Momjian wrote:

Bruce Momjian wrote:

Hiroshi Inoue wrote:

Bruce Momjian wrote:

Hiroshi Inoue wrote:

Bruce Momjian wrote:

Where are we on this issue?

Oops, I forgot it completely.
I have a slightly improved version and will post it tonight.

Ah, very good.  Thanks.

Attached is an improved version.

I spent many hours on this patch and am attaching an updated version.
I have restructured the code and added many comments, but this is the
main one:

*  Ideally, the server encoding and locale settings would
*  always match.  Unfortunately, WIN32 does not support UTF-8
*  values for setlocale(), even though PostgreSQL runs fine with
*  a UTF-8 encoding on Windows:
*
*  http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
*
*  Therefore, we must set LC_CTYPE to match LC_NUMERIC and
*  LC_MONETARY, call localeconv(), and use mbstowcs() to
*  convert the locale-aware string, e.g. Euro symbol, which
*  is not in UTF-8 to the server encoding.

I need someone with WIN32 experience to review and test this patch.

I don't understand why cache_locale_time() works on Windows.  It sets
the LC_CTYPE but does not do any encoding conversion.

Doesn't strftime_win32 do the conversion?


Oh, I now see strftime is redefined as a macro in that C file.  Thanks.


Do month and
day-of-week names not work either, or do they work and the encoding
conversion for numeric/money, e.g. Euro, is not necessary?

db_strdup does the conversion.


Should we pull the encoding conversion into a separate function and have
strftime_win32() and db_strdup() both call it?


We may be able to pull the conversion WideChars => UTF8 =>
a PG encoding into a function.

BTW both PGLC_localeconv() and cache_locale_time() save the current
LC_CTYPE first and restore it just before returning.
I'm not sure that's OK when an error occurs in the middle of these functions.
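
To make the concern concrete, here is a rough standalone sketch of restoring LC_CTYPE even when the work in the middle bails out with a non-local exit. It is not backend code: setjmp()/longjmp() stand in for PostgreSQL's error mechanism, and run_with_ctype(), raise_error(), failing_work() and ok_work() are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <locale.h>
#include <setjmp.h>
#include <string.h>

/* Stand-in for the backend's error stack (assumption for this sketch). */
static jmp_buf error_jump;

/* Stand-in for elog(ERROR, ...): never returns to its caller. */
void
raise_error(void)
{
    longjmp(error_jump, 1);
}

typedef void (*locale_work_fn) (void);

/*
 * Run work() with LC_CTYPE set to ctype and restore the previous
 * LC_CTYPE afterwards, even if work() raises an error.
 * Returns 0 on success, -1 on failure.
 */
int
run_with_ctype(const char *ctype, locale_work_fn work)
{
    char        saved[256];
    const char *cur;
    int         status = 0;

    assert(ctype != NULL && work != NULL);

    /* setlocale(cat, NULL) points at static storage, so copy it */
    cur = setlocale(LC_CTYPE, NULL);
    if (cur == NULL || strlen(cur) >= sizeof(saved))
        return -1;
    strcpy(saved, cur);

    if (setlocale(LC_CTYPE, ctype) == NULL)
        return -1;

    if (setjmp(error_jump) == 0)
        work();
    else
        status = -1;        /* arrived here via raise_error() */

    /* reached on both the normal and the error path */
    setlocale(LC_CTYPE, saved);
    return status;
}

/* Example work functions for trying the wrapper out. */
void
failing_work(void)
{
    raise_error();
}

void
ok_work(void)
{
}
```

Without the catch point, the longjmp would skip the final setlocale() and leave the process running under the temporary LC_CTYPE, which is the situation being questioned here.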

regards,
Hiroshi Inoue



Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-03-02 Thread Bruce Momjian
Hiroshi Inoue wrote:
  I need someone with WIN32 experience to review and test this patch.
  I don't understand why cache_locale_time() works on Windows.  It sets
  the LC_CTYPE but does not do any encoding conversion.
  Doesn't strftime_win32 do the conversion?
  
  Oh, I now see strftime is redefined as a macro in that C file.  Thanks.
  
  Do month and
  day-of-week names not work either, or do they work and the encoding
  conversion for numeric/money, e.g. Euro, is not necessary?
  db_strdup does the conversion.
  
  Should we pull the encoding conversion into a separate function and have
  strftime_win32() and db_strdup() both call it?
 
 We may be able to pull the conversion WideChars => UTF8 =>
 a PG encoding into a function.

OK, I have created a new function, win32_wchar_to_db_encoding(), to
share the conversion from wide characters to the database encoding.
New patch attached.

 BTW both PGLC_localeconv() and cache_locale_time() save the current
 LC_CTYPE first and restore it just before returning.
 I'm not sure that's OK when an error occurs in the middle of these functions.

Yea, I added a comment questioning if that is a problem.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
Index: src/backend/utils/adt/pg_locale.c
===
RCS file: /cvsroot/pgsql/src/backend/utils/adt/pg_locale.c,v
retrieving revision 1.53
diff -c -c -r1.53 pg_locale.c
*** src/backend/utils/adt/pg_locale.c	27 Feb 2010 20:20:44 -	1.53
--- src/backend/utils/adt/pg_locale.c	2 Mar 2010 18:11:41 -
***
*** 4,10 
   *
   * Portions Copyright (c) 2002-2010, PostgreSQL Global Development Group
   *
!  * $PostgreSQL: pgsql/src/backend/utils/adt/pg_locale.c,v 1.53 2010/02/27 20:20:44 momjian Exp $
   *
   *---
   */
--- 4,10 
   *
   * Portions Copyright (c) 2002-2010, PostgreSQL Global Development Group
   *
!  * $PostgreSQL: pgsql/src/backend/utils/adt/pg_locale.c,v 1.51 2010/01/02 16:57:54 momjian Exp $
   *
   *---
   */
***
*** 96,101 
--- 96,109 
  static char *IsoLocaleName(const char *);		/* MSVC specific */
  #endif
  
+ #ifdef WIN32
+ static size_t win32_wchar_to_db_encoding(const wchar_t *wbuf,
+ 			const size_t wchars, char *dst, size_t dstlen);
+ static char *db_encoding_strdup(const char *item, const char *str);
+ static size_t strftime_win32(char *dst, size_t dstlen, const wchar_t *format,
+ 			 const struct tm *tm);
+ #endif
+ 
  
  /*
   * pg_perm_setlocale
***
*** 387,392 
--- 395,488 
  }
  
  
+ #ifdef	WIN32
+ /*
+  *	Convert wide character string (UTF16 on Win32) to UTF8, and then
+  *	optionally to the db encoding.
+  */
+ static size_t win32_wchar_to_db_encoding(const wchar_t *wbuf,
+ 			const size_t wchars, char *dst, size_t dstlen)
+ {
+ 	int	db_encoding = GetDatabaseEncoding();
+ 	int	utf8len;
+ 
+ 	/* Convert wide string (UTF16) to UTF8 */
+ 	utf8len = WideCharToMultiByte(CP_UTF8, 0, wbuf, wchars, dst, dstlen, NULL, NULL);
+ 	if (utf8len == 0)
+ 		/* Does this leave LC_CTYPE set incorrectly? */
+ 		elog(ERROR,
+ 			"could not convert string %04x to UTF-8: error %lu", wbuf[0], GetLastError());
+ 	pfree(wbuf);
+ 
+ 	dst[utf8len] = '\0';
+ 	if (db_encoding != PG_UTF8)
+ 	{
+ 		PG_TRY();
+ 		{
+ 			char *convstr = pg_do_encoding_conversion(dst, utf8len, PG_UTF8, db_encoding);
+ 			if (dst != convstr)
+ 			{
+ 				strlcpy(dst, convstr, dstlen);
+ 				pfree(convstr);
+ 			}
+ 		}
+ 		PG_CATCH();
+ 		{
+ 			FlushErrorState();
+ 			dst[0] = '\0';
+ 		}
+ 		PG_END_TRY();
+ 	}
+ 
+ 	return pg_mbstrlen(dst);
+ }
+ 
+ /*
+  *	This converts the LC_CTYPE-encoded string returned from the
+  *	locale routines to the database encoding.
+  */
+ static char *db_encoding_strdup(const char *item, const char *str)
+ {
+ 	int	db_encoding = GetDatabaseEncoding();
+ 	size_t	wchars, ilen, wclen, dstlen;
+ 	int	bytes_per_char;
+ 	wchar_t	*wbuf;
+ 	char	*dst;
+ 
+ 	if (!str[0])
+ 		return strdup(str);
+ 
+ 	/* allocate wide character string */
+ 	ilen = strlen(str) + 1;
+ 	wclen = ilen * sizeof(wchar_t);
+ 	wbuf = (wchar_t *) palloc(wclen);
+ 
+ 	/* Convert multi-byte string using current LC_CTYPE to a wide-character string */
+ 	wchars = mbstowcs(wbuf, str, ilen);
+ 	if (wchars == (size_t) -1)
+ 		elog(ERROR,
+ 			"could not convert string to wide characters: error %lu", GetLastError());
+ 
+ 	/* allocate target string */
+ 	bytes_per_char = pg_encoding_max_length(PG_UTF8);
+ 	if (pg_encoding_max_length(db_encoding) > bytes_per_char)
+ 		bytes_per_char = pg_encoding_max_length(db_encoding);
+ 	dstlen = wchars * bytes_per_char + 1;
+ 	if ((dst = malloc(dstlen)) == NULL)
+ 		elog(ERROR, could not allocate 

Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-03-01 Thread Hiroshi Inoue

Bruce Momjian wrote:

Bruce Momjian wrote:

Hiroshi Inoue wrote:

Bruce Momjian wrote:

Hiroshi Inoue wrote:

Bruce Momjian wrote:

Where are we on this issue?

Oops, I forgot it completely.
I have a slightly improved version and will post it tonight.

Ah, very good.  Thanks.

Attached is an improved version.

I spent many hours on this patch and am attaching an updated version.
I have restructured the code and added many comments, but this is the
main one:

*  Ideally, the server encoding and locale settings would
*  always match.  Unfortunately, WIN32 does not support UTF-8
*  values for setlocale(), even though PostgreSQL runs fine with
*  a UTF-8 encoding on Windows:
*
*  http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
*
*  Therefore, we must set LC_CTYPE to match LC_NUMERIC and
*  LC_MONETARY, call localeconv(), and use mbstowcs() to
*  convert the locale-aware string, e.g. Euro symbol, which
*  is not in UTF-8 to the server encoding.

I need someone with WIN32 experience to review and test this patch.


I don't understand why cache_locale_time() works on Windows.  It sets
the LC_CTYPE but does not do any encoding conversion.


Doesn't strftime_win32 do the conversion?


Do month and
day-of-week names not work either, or do they work and the encoding
conversion for numeric/money, e.g. Euro, is not necessary?


db_strdup does the conversion.

regards,
Hiroshi Inoue



Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-03-01 Thread Bruce Momjian
Hiroshi Inoue wrote:
 Bruce Momjian wrote:
  Bruce Momjian wrote:
  Hiroshi Inoue wrote:
  Bruce Momjian wrote:
  Hiroshi Inoue wrote:
  Bruce Momjian wrote:
  Where are we on this issue?
  Oops, I forgot it completely.
  I have a slightly improved version and will post it tonight.
  Ah, very good.  Thanks.
  Attached is an improved version.
  I spent many hours on this patch and am attaching an updated version.
  I have restructured the code and added many comments, but this is the
  main one:
 
 *  Ideally, the server encoding and locale settings would
 *  always match.  Unfortunately, WIN32 does not support UTF-8
 *  values for setlocale(), even though PostgreSQL runs fine with
 *  a UTF-8 encoding on Windows:
 *
 *  http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
 *
 *  Therefore, we must set LC_CTYPE to match LC_NUMERIC and
 *  LC_MONETARY, call localeconv(), and use mbstowcs() to
 *  convert the locale-aware string, e.g. Euro symbol, which
 *  is not in UTF-8 to the server encoding.
 
  I need someone with WIN32 experience to review and test this patch.
  
  I don't understand why cache_locale_time() works on Windows.  It sets
  the LC_CTYPE but does not do any encoding conversion.
 
 Doesn't strftime_win32 do the conversion?

Oh, I now see strftime is redefined as a macro in that C file.  Thanks.

  Do month and
  day-of-week names not work either, or do they work and the encoding
  conversion for numeric/money, e.g. Euro, is not necessary?
 
 db_strdup does the conversion.

Should we pull the encoding conversion into a separate function and have
strftime_win32() and db_strdup() both call it?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do



Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-02-28 Thread Bruce Momjian
Bruce Momjian wrote:
 Hiroshi Inoue wrote:
  Bruce Momjian wrote:
   Hiroshi Inoue wrote:
   Bruce Momjian wrote:
   Where are we on this issue?
   Oops, I forgot it completely.
   I have a slightly improved version and will post it tonight.
   
   Ah, very good.  Thanks.
  
  Attached is an improved version.
 
 I spent many hours on this patch and am attaching an updated version.
 I have restructured the code and added many comments, but this is the
 main one:
 
   *  Ideally, the server encoding and locale settings would
   *  always match.  Unfortunately, WIN32 does not support UTF-8
   *  values for setlocale(), even though PostgreSQL runs fine with
   *  a UTF-8 encoding on Windows:
   *
   *  http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
   *
   *  Therefore, we must set LC_CTYPE to match LC_NUMERIC and
   *  LC_MONETARY, call localeconv(), and use mbstowcs() to
   *  convert the locale-aware string, e.g. Euro symbol, which
   *  is not in UTF-8 to the server encoding.
 
 I need someone with WIN32 experience to review and test this patch.

I don't understand why cache_locale_time() works on Windows.  It sets
the LC_CTYPE but does not do any encoding conversion.  Do month and
day-of-week names not work either, or do they work and the encoding
conversion for numeric/money, e.g. Euro, is not necessary?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do



Re: [HACKERS] [GENERAL] trouble with to_char('L')

2010-02-27 Thread Bruce Momjian
Hiroshi Inoue wrote:
 Bruce Momjian wrote:
  Hiroshi Inoue wrote:
  Bruce Momjian wrote:
  Where are we on this issue?
  Oops, I forgot it completely.
  I have a slightly improved version and will post it tonight.
  
  Ah, very good.  Thanks.
 
 Attached is an improved version.

I spent many hours on this patch and am attaching an updated version.
I have restructured the code and added many comments, but this is the
main one:

*  Ideally, the server encoding and locale settings would
*  always match.  Unfortunately, WIN32 does not support UTF-8
*  values for setlocale(), even though PostgreSQL runs fine with
*  a UTF-8 encoding on Windows:
*
*  http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
*
*  Therefore, we must set LC_CTYPE to match LC_NUMERIC and
*  LC_MONETARY, call localeconv(), and use mbstowcs() to
*  convert the locale-aware string, e.g. Euro symbol, which
*  is not in UTF-8 to the server encoding.
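
For readers without the patch at hand, the save/set/restore dance that the comment describes looks roughly like this in portable C. This is only a sketch under assumptions, not the patch itself: currency_symbol_for() and copy_string() are hypothetical helpers, malloc() replaces the backend allocators, and the later conversion of the returned bytes to the server encoding is left out.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: malloc'd copy of s, or NULL on failure. */
static char *
copy_string(const char *s)
{
    size_t  len = strlen(s) + 1;
    char   *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/*
 * Return a malloc'd copy of the currency symbol for locale_name, or
 * NULL on failure.  LC_CTYPE is switched together with LC_MONETARY so
 * that the bytes localeconv() returns are in a known encoding; both
 * settings are restored before returning.
 */
char *
currency_symbol_for(const char *locale_name)
{
    const char *cur_ctype;
    const char *cur_monetary;
    char       *save_ctype;
    char       *save_monetary;
    char       *result = NULL;

    assert(locale_name != NULL);

    /* setlocale(cat, NULL) points at static storage, so copy it first */
    cur_ctype = setlocale(LC_CTYPE, NULL);
    if (cur_ctype == NULL)
        return NULL;
    save_ctype = copy_string(cur_ctype);

    cur_monetary = setlocale(LC_MONETARY, NULL);
    save_monetary = cur_monetary ? copy_string(cur_monetary) : NULL;

    if (save_ctype != NULL && save_monetary != NULL)
    {
        if (setlocale(LC_MONETARY, locale_name) != NULL &&
            setlocale(LC_CTYPE, locale_name) != NULL)
        {
            /* copy now: restoring the locale may overwrite the struct */
            result = copy_string(localeconv()->currency_symbol);
        }
        setlocale(LC_CTYPE, save_ctype);
        setlocale(LC_MONETARY, save_monetary);
    }

    free(save_ctype);
    free(save_monetary);
    return result;
}
```

In the actual patch the copied string would still be in the LC_CTYPE encoding at this point; the mbstowcs() step mentioned in the comment is what takes it from there to wide characters and on to the server encoding.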

I need someone with WIN32 experience to review and test this patch.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
Index: src/backend/utils/adt/pg_locale.c
===
RCS file: /cvsroot/pgsql/src/backend/utils/adt/pg_locale.c,v
retrieving revision 1.53
diff -c -c -r1.53 pg_locale.c
*** src/backend/utils/adt/pg_locale.c	27 Feb 2010 20:20:44 -	1.53
--- src/backend/utils/adt/pg_locale.c	28 Feb 2010 03:59:14 -
***
*** 4,10 
   *
   * Portions Copyright (c) 2002-2010, PostgreSQL Global Development Group
   *
!  * $PostgreSQL: pgsql/src/backend/utils/adt/pg_locale.c,v 1.53 2010/02/27 20:20:44 momjian Exp $
   *
   *---
   */
--- 4,10 
   *
   * Portions Copyright (c) 2002-2010, PostgreSQL Global Development Group
   *
!  * $PostgreSQL: pgsql/src/backend/utils/adt/pg_locale.c,v 1.51 2010/01/02 16:57:54 momjian Exp $
   *
   *---
   */
***
*** 386,391 
--- 386,459 
  		free(s->positive_sign);
  }
  
+ #ifdef	WIN32
+ /*
+  *	This converts the LC_CTYPE-encoded string returned from the
+  *	locale routines to the database encoding.
+  */
+ static char *db_encoding_strdup(const char *item, const char *str)
+ {
+ 	int	db_encoding = GetDatabaseEncoding();
+ 	size_t	wchars, ilen, wclen, dstlen;
+ 	int	utflen, bytes_per_char;
+ 	wchar_t	*wbuf;
+ 	char	*dst;
+ 
+ 	if (!str[0])
+ 		return strdup(str);
+ 	ilen = strlen(str) + 1;
+ 	wclen = ilen * sizeof(wchar_t);
+ 	wbuf = (wchar_t *) palloc(wclen);
+ 
+ 	/* Convert multi-byte string using current LC_CTYPE to a wide-character string */
+ 	wchars = mbstowcs(wbuf, str, ilen);
+ 	if (wchars == (size_t) -1)
+ 		elog(ERROR,
+ 			"could not convert string to wide characters: error %lu", GetLastError());
+ 
+ 	/* allocate target string */
+ 	bytes_per_char = pg_encoding_max_length(PG_UTF8);
+ 	if (pg_encoding_max_length(db_encoding) > bytes_per_char)
+ 		bytes_per_char = pg_encoding_max_length(db_encoding);
+ 	dstlen = wchars * bytes_per_char + 1;
+ 	if ((dst = malloc(dstlen)) == NULL)
+ 		elog(ERROR, "could not allocate a destination buffer");
+ 
+ 	/* Convert wide string to UTF8 */  
+ 	utflen = WideCharToMultiByte(CP_UTF8, 0, wbuf, wchars, dst, dstlen, NULL, NULL);
+ 	if (utflen == 0)
+ 		elog(ERROR,
+ 			"could not convert string %04x to UTF-8: error %lu", wbuf[0], GetLastError());
+ 	pfree(wbuf);
+ 
+ 	dst[utflen] = '\0';
+ 	if (db_encoding != PG_UTF8)
+ 	{
+ 		PG_TRY();
+ 		{
+ 			char *convstr = pg_do_encoding_conversion(dst, utflen, PG_UTF8, db_encoding);
+ 			if (dst != convstr)
+ 			{
+ 				strlcpy(dst, convstr, dstlen);
+ 				pfree(convstr);
+ 			}
+ 		}
+ 		PG_CATCH();
+ 		{
+ 			FlushErrorState();
+ 			dst[0] = '\0';
+ 		}
+ 		PG_END_TRY();
+ 	}
+ 
+ 	return dst;
+ }
+ #else
+ static char *db_encoding_strdup(const char *item, const char *str)
+ {
+ 	return strdup(str);
+ }
+ #endif /* WIN32 */
  
  /*
   * Return the POSIX lconv struct (contains number/money formatting
***
*** 398,403 
--- 466,475 
  	struct lconv *extlconv;
  	char	   *save_lc_monetary;
  	char	   *save_lc_numeric;
+ #ifdef	WIN32
+ 	char	   *save_lc_ctype = NULL;
+ 	bool		lc_ctype_was_null = false;
+ #endif
  
  	/* Did we do it already? */
  	if (CurrentLocaleConvValid)
***
*** 413,442 
  	if (save_lc_numeric)
  		save_lc_numeric = pstrdup(save_lc_numeric);
  
  	setlocale(LC_MONETARY, locale_monetary);
  	setlocale(LC_NUMERIC, locale_numeric);
! 
! 	/* Get formatting information */
  	extlconv = localeconv();
  
  	/*
! 	 * Must copy all values since restoring internal settings may overwrite
  	 * localeconv()'s results.
  	 */
  	CurrentLocaleConv = *extlconv;
! 	

Re: [HACKERS] [GENERAL] trouble with to_char('L')

2009-06-03 Thread Tom Lane
Hiroshi Inoue in...@tpf.co.jp writes:
 Tom Lane wrote:
 * This seems to be assuming that the user has set LC_MONETARY and
 LC_NUMERIC the same.  What if they're different?

 Strictly speaking they should be handled individually.

I thought about this some more, and I wonder why you did it like this at
all.  The patch claimed to be copying the LC_TIME code, but the LC_TIME
code isn't trying to temporarily change any locale settings.  What we
are doing in that code is assuming that the system will give us back
the localized strings in the encoding identified by CP_ACP; so all we
have to do is convert CP_ACP to wide chars and then to UTF8.  Can't we
use a similar approach for the output of localeconv?
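
The two-step conversion described above (locale bytes to wide characters, then wide characters to UTF8) can be sketched in portable C as follows. Treat it as an illustration under assumptions: the Windows code would use the Win32 conversion calls rather than this hand-written encoder, wide_to_utf8() and locale_string_to_utf8() are hypothetical names, and the encoder assumes each wchar_t holds one complete code point, so UTF16 surrogate pairs are rejected rather than combined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Encode n wide characters as UTF-8 into dst (dstlen bytes, including
 * room for the terminating NUL).  Returns the number of bytes written,
 * not counting the NUL, or (size_t) -1 if dst is too small or a code
 * point is invalid.
 */
size_t
wide_to_utf8(const wchar_t *src, size_t n, char *dst, size_t dstlen)
{
    size_t  out = 0;
    size_t  i;

    assert(src != NULL && dst != NULL);
    if (dstlen == 0)
        return (size_t) -1;

    for (i = 0; i < n; i++)
    {
        unsigned long cp = (unsigned long) src[i];
        size_t        need;

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t) -1;
        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (out + need >= dstlen)       /* keep one byte for the NUL */
            return (size_t) -1;

        if (need == 1)
            dst[out++] = (char) cp;
        else if (need == 2)
        {
            dst[out++] = (char) (0xC0 | (cp >> 6));
            dst[out++] = (char) (0x80 | (cp & 0x3F));
        }
        else if (need == 3)
        {
            dst[out++] = (char) (0xE0 | (cp >> 12));
            dst[out++] = (char) (0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char) (0x80 | (cp & 0x3F));
        }
        else
        {
            dst[out++] = (char) (0xF0 | (cp >> 18));
            dst[out++] = (char) (0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (char) (0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char) (0x80 | (cp & 0x3F));
        }
    }
    dst[out] = '\0';
    return out;
}

/*
 * Convert a string in the current LC_CTYPE encoding to UTF-8 by way of
 * wide characters.  Same return convention as wide_to_utf8().
 */
size_t
locale_string_to_utf8(const char *str, char *dst, size_t dstlen)
{
    size_t      len;
    size_t      nwide;
    size_t      result;
    wchar_t    *wbuf;

    assert(str != NULL);

    /* one wide character per input byte is always enough */
    len = strlen(str) + 1;
    wbuf = malloc(len * sizeof(wchar_t));
    if (wbuf == NULL)
        return (size_t) -1;

    nwide = mbstowcs(wbuf, str, len);
    if (nwide == (size_t) -1)
        result = (size_t) -1;
    else
        result = wide_to_utf8(wbuf, nwide, dst, dstlen);

    free(wbuf);
    return result;
}
```

Note that mbstowcs() interprets the input according to the current LC_CTYPE, so this only gives the right answer when LC_CTYPE matches the encoding of the strings being converted.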

regards, tom lane



Re: [HACKERS] [GENERAL] trouble with to_char('L')

2009-06-03 Thread Hiroshi Inoue
Tom Lane wrote:
 Hiroshi Inoue in...@tpf.co.jp writes:
 Tom Lane wrote:
 * This seems to be assuming that the user has set LC_MONETARY and
 LC_NUMERIC the same.  What if they're different?
 
 Strictly speaking they should be handled individually.
 
 I thought about this some more, and I wonder why you did it like this at
 all.  The patch claimed to be copying the LC_TIME code, but the LC_TIME
 code isn't trying to temporarily change any locale settings. 

LC_TIME and LC_CTYPE (on Windows) settings are changed temporarily
in cache_locale_time() in pg_locale.c.

 What we
 are doing in that code is assuming that the system will give us back
 the localized strings in the encoding identified by CP_ACP; 

AFAIK it's not right. LC_TIME, LC_MONETARY or LC_NUMERIC related output
is encoded using LC_CTYPE setting.

 so all we
 have to do is convert CP_ACP to wide chars and then to UTF8.  Can't we
 use a similar approach for the output of localeconv?

What the LC_TIME code and my patch intend is to set LC_CTYPE to an
appropriate value so that related output is converted correctly.
If we could set LC_CTYPE to xxx_xxx.65001 (UTF8), we could eliminate
two steps, but it causes an error on Windows.

regards,
Hiroshi Inoue





Re: [HACKERS] [GENERAL] trouble with to_char('L')

2009-06-01 Thread Hiroshi Inoue
Tom Lane wrote:
 Hiroshi Inoue in...@tpf.co.jp writes:
 Tom Lane wrote:
 I think what this suggests is that there probably needs to be some
 encoding conversion logic near the places we examine localeconv()
 output.
 
 Attached is a patch to the current CVS.
 It uses an approach similar to what the LC_TIME code does.
 
 I'm not really in a position to test/commit this, since I don't have a
 Windows machine.  However, since no one else is stepping up to deal with
 it, here's a quick review:

Thanks for the review.
I've forgotten the patch because Japanese doesn't have trouble with
this issue (the currency symbol is ascii \). If this is really
expected to be fixed, I would update the patch according to your
suggestion.

 * This seems to be assuming that the user has set LC_MONETARY and
 LC_NUMERIC the same.  What if they're different?

Strictly speaking they should be handled individually.

 * What if the selected locale corresponds to Unicode (ie UTF16)
 encoding?

As far as I tested, setlocale(LC_MONETARY, xxx.65001) causes an error.

 * #define'ing strdup() to do something rather different from strdup
 seems pretty horrid from the standpoint of code readability and
 maintainability, especially with nary a comment explaining it.

Maybe using a function instead of strdup() which calls dbstr_win32()
in case of Windows would be better.
BTW grouping and money_grouping seem to be out of encoding conversion.
Are they guaranteed to be null terminated?

 * Code will dump core on malloc failure.

I can take care of it.

 * Since this code is surely not performance critical, I wouldn't bother
 with trying to optimize it; hence drop the special case for all-ASCII.

I can take care of it.
 
 * Surely we already have a symbol somewhere that can be used in
 place of this:
 #define MAX_BYTES_PER_CHARACTER 4

I can't find it.
max(pg_encoding_max_length(encoding), pg_encoding_max_length(PG_UTF8))
may be better.
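
A rough sketch of that sizing rule, for illustration only: conversion_buffer_size() is a hypothetical helper, and the two length arguments are plain ints standing in for the results of pg_encoding_max_length() for UTF8 and for the database encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Worst-case buffer size in bytes, including the terminating NUL, for
 * nchars characters that are first held as UTF-8 and then possibly
 * converted in place to an encoding needing up to db_max_len bytes per
 * character.  Returns 0 if the size would overflow size_t.
 */
size_t
conversion_buffer_size(size_t nchars, int utf8_max_len, int db_max_len)
{
    size_t  per_char;

    assert(utf8_max_len > 0 && db_max_len > 0);

    /* the buffer must fit whichever representation is wider */
    per_char = (size_t) (utf8_max_len > db_max_len ? utf8_max_len : db_max_len);

    if (nchars > (SIZE_MAX - 1) / per_char)
        return 0;
    return nchars * per_char + 1;
}
```

Taking the larger of the two lengths matters because the same buffer holds the intermediate UTF8 form and the final database-encoding form.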

regards,
Hiroshi Inoue





Re: [HACKERS] [GENERAL] trouble with to_char('L')

2009-05-29 Thread Tom Lane
Hiroshi Inoue in...@tpf.co.jp writes:
 Tom Lane wrote:
 I think what this suggests is that there probably needs to be some
 encoding conversion logic near the places we examine localeconv()
 output.

 Attached is a patch to the current CVS.
 It uses an approach similar to what the LC_TIME code does.

I'm not really in a position to test/commit this, since I don't have a
Windows machine.  However, since no one else is stepping up to deal with
it, here's a quick review:

* This seems to be assuming that the user has set LC_MONETARY and
LC_NUMERIC the same.  What if they're different?

* What if the selected locale corresponds to Unicode (ie UTF16)
encoding?

* #define'ing strdup() to do something rather different from strdup
seems pretty horrid from the standpoint of code readability and
maintainability, especially with nary a comment explaining it.

* Code will dump core on malloc failure.

* Since this code is surely not performance critical, I wouldn't bother
with trying to optimize it; hence drop the special case for all-ASCII.

* Surely we already have a symbol somewhere that can be used in
place of this:
 #define MAX_BYTES_PER_CHARACTER 4


regards, tom lane
