Re: [HACKERS] More message encoding woes

Heikki Linnakangas Tue, 07 Apr 2009 03:10:14 -0700

Peter Eisentraut wrote:

On Tuesday 07 April 2009 11:21:25 Heikki Linnakangas wrote:

Using the name for the latin1 encoding in the currently Windows-only
mapping table, "LATIN1", you get no translation because that name is not
recognized by the system. Using the other name "ISO-8859-1", it works.
"LATIN1" is not listed in the output of locale -m either.

You are looking in the wrong place. What we need is for iconv to recognize
the encoding name used by PostgreSQL. iconv --list is the primary hint for
that.


The locale names provided by the operating system are arbitrary and unrelated.
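
For illustration only (not part of the thread or the patch): one way to
probe whether iconv recognizes a given encoding name is to try opening a
conversion descriptor for it. This is a minimal sketch, assuming a POSIX
iconv that supports "UTF-8" as a target. The helper name
iconv_knows_encoding is hypothetical, and a failure can also mean that the
particular conversion pair is unsupported rather than that the name is
unknown.

```c
#include <assert.h>
#include <iconv.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true if iconv_open() accepts "name" as a
 * source encoding when converting to UTF-8.
 */
static bool
iconv_knows_encoding(const char *name)
{
	iconv_t		cd = iconv_open("UTF-8", name);

	if (cd == (iconv_t) -1)
		return false;		/* name (or this conversion pair) rejected */

	iconv_close(cd);
	return true;
}

/* Usage sketch: UTF-8 to UTF-8 is assumed to be accepted. */
static void
iconv_probe_self_check(void)
{
	assert(iconv_knows_encoding("UTF-8"));
}
```

Whether a name such as "LATIN1" or "ISO-8859-1" is accepted depends on the
iconv implementation, which is why iconv --list is the place to look.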


Oh, ok. I guess we can do the simple fix you proposed then.

Patch attached. Instead of checking for LC_CTYPE == C, I'm checking
"pg_get_encoding_from_locale(NULL) == encoding", which is closer to
what we actually want. The downside is that
pg_get_encoding_from_locale(NULL) isn't exactly free, but the upside is
that we don't need to keep this in sync with the rules we have in CREATE
DATABASE that enforce that locale matches encoding.
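
The decision the check makes can be sketched outside the backend as
follows. This is only an illustration, not the patch itself: the demo_*
names, the enum values and the table contents are hypothetical stand-ins
for the PostgreSQL encoding IDs and codeset_map_array, and the locale's
encoding is passed in as a parameter instead of being obtained from
pg_get_encoding_from_locale(NULL).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for PostgreSQL encoding IDs. */
enum demo_encoding
{
	DEMO_LATIN1,
	DEMO_UTF8,
	DEMO_EUC_JP,
	DEMO_UNMAPPED				/* deliberately missing from the table */
};

struct demo_codeset_map
{
	int			encoding;
	const char *codeset;
};

/* Illustrative table; the names here are assumptions, not the real map. */
static const struct demo_codeset_map demo_map[] = {
	{DEMO_LATIN1, "ISO-8859-1"},
	{DEMO_UTF8, "UTF-8"},
	{DEMO_EUC_JP, "EUC-JP"}
};

/*
 * Returns the codeset name that would be passed to
 * bind_textdomain_codeset(), or NULL when nothing should be bound:
 * either the locale's encoding already matches the database encoding,
 * or the table has no entry for the database encoding.
 */
static const char *
demo_codeset_to_bind(int locale_encoding, int db_encoding)
{
	size_t		i;

	if (locale_encoding == db_encoding)
		return NULL;			/* gettext's default is already right */

	for (i = 0; i < sizeof(demo_map) / sizeof(demo_map[0]); i++)
	{
		if (demo_map[i].encoding == db_encoding)
			return demo_map[i].codeset;
	}
	return NULL;
}

/* Usage sketch covering the three outcomes. */
static void
demo_self_check(void)
{
	assert(demo_codeset_to_bind(DEMO_UTF8, DEMO_UTF8) == NULL);
	assert(strcmp(demo_codeset_to_bind(DEMO_LATIN1, DEMO_UTF8), "UTF-8") == 0);
	assert(demo_codeset_to_bind(DEMO_UTF8, DEMO_UNMAPPED) == NULL);
}
```

The early return is what keeps an unrecognized codeset name from being
passed to gettext in the common case where locale and database encoding
already agree.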

This doesn't include the cleanup to make the mapping table easier to
maintain that Magnus was going to have a look at before I started this
thread.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

*** a/src/backend/utils/mb/mbutils.c
--- b/src/backend/utils/mb/mbutils.c
***************
*** 890,896 **** cliplen(const char *str, int len, int limit)
  	return l;
  }
  
! #if defined(ENABLE_NLS) && defined(WIN32)
  static const struct codeset_map {
  	int	encoding;
  	const char *codeset;
--- 890,896 ----
  	return l;
  }
  
! #if defined(ENABLE_NLS)
  static const struct codeset_map {
  	int	encoding;
  	const char *codeset;
***************
*** 929,935 **** static const struct codeset_map {
  	{PG_EUC_TW, "EUC-TW"},
  	{PG_EUC_JIS_2004, "EUC-JP"}
  };
! #endif /* WIN32 */
  
  void
  SetDatabaseEncoding(int encoding)
--- 929,935 ----
  	{PG_EUC_TW, "EUC-TW"},
  	{PG_EUC_JIS_2004, "EUC-JP"}
  };
! #endif /* ENABLE_NLS */
  
  void
  SetDatabaseEncoding(int encoding)
***************
*** 946,960 **** SetDatabaseEncoding(int encoding)
  }
  
  /*
!  * On Windows, we need to explicitly bind gettext to the correct
!  * encoding, because gettext() tends to get confused.
   */
  void
  pg_bind_textdomain_codeset(const char *domainname, int encoding)
  {
! #if defined(ENABLE_NLS) && defined(WIN32)
  	int     i;
  
  	for (i = 0; i < lengthof(codeset_map_array); i++)
  	{
  		if (codeset_map_array[i].encoding == encoding)
--- 946,975 ----
  }
  
  /*
!  * Bind gettext to the correct encoding.
   */
  void
  pg_bind_textdomain_codeset(const char *domainname, int encoding)
  {
! #if defined(ENABLE_NLS)
  	int     i;
  
+ 	/*
+ 	 * gettext() uses the encoding specified by LC_CTYPE by default,
+ 	 * so if that matches the database encoding, we don't need to do
+ 	 * anything. This is not for performance, but because if
+ 	 * bind_textdomain_codeset() doesn't recognize the codeset name we
+ 	 * pass it, it will fall back to English and we don't want that to 
+ 	 * happen unnecessarily.
+ 	 *
+ 	 * On Windows, though, gettext() tends to get confused so we always
+ 	 * bind it.
+ 	 */
+ #ifndef WIN32
+ 	if (pg_get_encoding_from_locale(NULL) == encoding)
+ 		return;
+ #endif
+ 
  	for (i = 0; i < lengthof(codeset_map_array); i++)
  	{
  		if (codeset_map_array[i].encoding == encoding)

