Re: [HACKERS] pg_upgrade: make the locale comparison more tolerating

Pavel Raiskup Fri, 24 Jan 2014 08:21:40 -0800

Rushabh, I am really sorry that I have to re-create the patch, and
thanks a lot for looking at it!


Looking at the patch once again, I see that there were at least two
problems.  First, I also used the equivalent_locale function on the
encoding values.  That should not cause bugs (it should end up in a
strncasecmp call anyway), but it was not pretty.

The second problem was the assumption that locale specifier "A" is not
longer than locale specifier "B".  A comparison like 'en_US.utf8' with
'en_US_.utf8' would succeed.  This mistake probably cannot cause a
real-world bug, but it is not nice anyway.
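
To illustrate the second problem, here is a minimal standalone sketch.  It
assumes that the earlier version compared only as many characters as the
part of "A" before the dot; that detail is not shown in this mail.  The
names ident_len, prefix_only_match and length_checked_match are
hypothetical, and POSIX strncasecmp stands in for pg_strncasecmp.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>	/* POSIX strncasecmp, stand-in for pg_strncasecmp */

/* Length of the locale identifier, i.e. everything before the last '.'. */
static size_t
ident_len(const char *loc)
{
	const char *dot = strrchr(loc, '.');

	return dot ? (size_t) (dot - loc) : strlen(loc);
}

/* Flawed (assumed earlier behaviour): only the length of "a" is considered. */
static bool
prefix_only_match(const char *a, const char *b)
{
	return strncasecmp(a, b, ident_len(a)) == 0;
}

/* Fixed: the identifiers must have equal length before they are compared. */
static bool
length_checked_match(const char *a, const char *b)
{
	size_t		len = ident_len(a);

	if (len != ident_len(b))
		return false;
	return strncasecmp(a, b, len) == 0;
}
```

With these definitions, prefix_only_match("en_US.utf8", "en_US_.utf8")
returns true, while length_checked_match rejects the same pair.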

I have therefore cleaned up the patch once more; the new version is attached.
Pavel

From 35b9f600b592db24bb0e25d168bc5955087d65df Mon Sep 17 00:00:00 2001
From: Pavel Raiskup <prais...@redhat.com>
Date: Sat, 21 Dec 2013 01:27:01 +0100
Subject: [PATCH] pg_upgrade: make the locale comparison more tolerating

Locale strings such as 'cs_CZ.utf8' and 'cs_CZ.UTF-8' should be
treated as equivalent.  Because they were not, a major server
upgrade could fail (when the server machine uses a slightly
different encoding name than the not-yet-upgraded data stack).
The workaround was to change the system locale to match the
previous locale string.

With this commit the comparison is done in two phases.  First the
encoding part of the locale string (if any) is compared, and then
the rest of the string.  Before the encoding part is compared, it
is decoded into a precisely defined code from 'enum pg_enc'.  This
should make the comparison more robust even for completely
different spellings of an encoding (e.g. 'latin2' and
'iso 8859-2').

References:
3356208.rhzgij6...@nb.usersys.redhat.com
20121002155857.ge30...@momjian.us
---
 contrib/pg_upgrade/check.c | 58 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index a706708..2adefb2
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
***************
*** 9,14 ****
--- 9,15 ----
  
  #include "postgres_fe.h"
  
+ #include "mb/pg_wchar.h"
  #include "pg_upgrade.h"
  
  
*************** set_locale_and_encoding(ClusterInfo *clu
*** 393,398 ****
--- 394,450 ----
  	PQfinish(conn);
  }
  
+ /*
+  * equivalent_encoding()
+  *
+  * Best effort encoding comparison.  Return true only if both encodings
+  * are valid server encoding names and they are equivalent.
+  */
+ static bool
+ equivalent_encoding(const char *chara, const char *charb)
+ {
+ 	int			enca = pg_valid_server_encoding(chara);
+ 	int			encb = pg_valid_server_encoding(charb);
+ 
+ 	if (enca < 0 || encb < 0)
+ 		return false;
+ 
+ 	return (enca == encb);
+ }
+ 
+ /*
+  * equivalent_locale()
+  *
+  * Best effort locale comparison.  Return false if we are not 100% sure the
+  * locale is equivalent.
+  */
+ static bool
+ equivalent_locale(const char *loca, const char *locb)
+ {
+ 	int			lencmp;
+ 	const char *chara = strrchr(loca, '.');
+ 	const char *charb = strrchr(locb, '.');
+ 
+ 	if (!chara || !charb)
+ 		/* at least one locale string has no encoding part */
+ 		return (pg_strcasecmp(loca, locb) == 0);
+ 	chara++;
+ 	charb++;
+ 
+ 	if (!equivalent_encoding(chara, charb))
+ 		return false;
+ 
+ 	/*
+ 	 * We know the encoding part is equivalent.  So now compare only the
+ 	 * locale identifier (e.g. en_US part of en_US.utf8).
+ 	 */
+ 
+ 	lencmp = chara - loca;
+ 	if (lencmp != charb - locb)
+ 		return false;
+ 
+ 	return (pg_strncasecmp(loca, locb, lencmp) == 0);
+ }
  
  /*
   * check_locale_and_encoding()
*************** check_locale_and_encoding(ControlData *o
*** 409,421 ****
  	 * They also often use inconsistent hyphenation, which we cannot fix, e.g.
  	 * UTF-8 vs. UTF8, so at least we display the mismatching values.
  	 */
! 	if (pg_strcasecmp(oldctrl->lc_collate, newctrl->lc_collate) != 0)
  		pg_fatal("lc_collate cluster values do not match:  old \"%s\", new \"%s\"\n",
  			   oldctrl->lc_collate, newctrl->lc_collate);
! 	if (pg_strcasecmp(oldctrl->lc_ctype, newctrl->lc_ctype) != 0)
  		pg_fatal("lc_ctype cluster values do not match:  old \"%s\", new \"%s\"\n",
  			   oldctrl->lc_ctype, newctrl->lc_ctype);
! 	if (pg_strcasecmp(oldctrl->encoding, newctrl->encoding) != 0)
  		pg_fatal("encoding cluster values do not match:  old \"%s\", new \"%s\"\n",
  			   oldctrl->encoding, newctrl->encoding);
  }
--- 461,473 ----
  	 * They also often use inconsistent hyphenation, which we cannot fix, e.g.
  	 * UTF-8 vs. UTF8, so at least we display the mismatching values.
  	 */
! 	if (!equivalent_locale(oldctrl->lc_collate, newctrl->lc_collate))
  		pg_fatal("lc_collate cluster values do not match:  old \"%s\", new \"%s\"\n",
  			   oldctrl->lc_collate, newctrl->lc_collate);
! 	if (!equivalent_locale(oldctrl->lc_ctype, newctrl->lc_ctype))
  		pg_fatal("lc_ctype cluster values do not match:  old \"%s\", new \"%s\"\n",
  			   oldctrl->lc_ctype, newctrl->lc_ctype);
! 	if (!equivalent_encoding(oldctrl->encoding, newctrl->encoding))
  		pg_fatal("encoding cluster values do not match:  old \"%s\", new \"%s\"\n",
  			   oldctrl->encoding, newctrl->encoding);
  }
-- 
1.8.5.3
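
The two-phase comparison described in the commit message can be tried
outside the PostgreSQL tree with a sketch like the one below.  It is not
the patch itself: toy_equivalent_encoding is a hypothetical stand-in for
the pg_valid_server_encoding() lookup that only ignores case and
punctuation, so it would accept 'utf8' vs. 'UTF-8' but knows nothing about
aliases such as 'latin2' vs. 'iso 8859-2'.  POSIX strcasecmp and
strncasecmp stand in for pg_strcasecmp and pg_strncasecmp.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>	/* POSIX strcasecmp / strncasecmp */

/*
 * Hypothetical stand-in for the pg_valid_server_encoding() lookup: compare
 * two encoding names, ignoring case and non-alphanumeric characters.
 * Unlike the real lookup, it knows no aliases and accepts unknown names.
 */
static bool
toy_equivalent_encoding(const char *a, const char *b)
{
	for (;;)
	{
		/* skip punctuation such as '-' or ' ' on both sides */
		while (*a && !isalnum((unsigned char) *a))
			a++;
		while (*b && !isalnum((unsigned char) *b))
			b++;
		if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
			return false;
		if (*a == '\0')
			return true;	/* both strings ended together */
		a++;
		b++;
	}
}

/* Two-phase comparison: encoding part first, then the locale identifier. */
static bool
toy_equivalent_locale(const char *loca, const char *locb)
{
	const char *enca = strrchr(loca, '.');
	const char *encb = strrchr(locb, '.');
	size_t		len;

	/* at least one side has no encoding part: compare the whole strings */
	if (!enca || !encb)
		return strcasecmp(loca, locb) == 0;

	/* phase 1: the encoding part after the dot */
	if (!toy_equivalent_encoding(enca + 1, encb + 1))
		return false;

	/* phase 2: identifiers must match in both length and content */
	len = (size_t) (enca - loca);
	if (len != (size_t) (encb - locb))
		return false;
	return strncasecmp(loca, locb, len) == 0;
}
```

In this sketch 'cs_CZ.utf8' and 'cs_CZ.UTF-8' compare as equivalent, while
'en_US.utf8' and 'en_US_.utf8' do not.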
