Re: [HACKERS] using Core Foundation locale functions

2014-12-02 Thread Peter Geoghegan
On Fri, Nov 28, 2014 at 8:43 AM, Peter Eisentraut <pete...@gmx.net> wrote:
> At the moment, this is probably just an experiment that shows where
> refactoring and better abstractions might be suitable if we want to
> support multiple locale libraries.  If we want to pursue ICU, I think
> this could be a useful third option.

FWIW, I think that the richer API that ICU provides for string
transformations could be handy in optimizing sorting using abbreviated
keys. For example, ICU will happily only produce parts of sort keys
(the equivalent of strxfrm() blobs) if that is all that is required
[1].

I think that ICU also allows clients to parse individual primary
weights in a principled way (primary weights tend to be isomorphic to
the Unicode code points in the original string). I think that this
will enable order-preserving compression of the type anticipated by
the Unicode collation algorithm [2]. That could be useful for certain
languages, like Russian, where the primary weight level usually
contains multi-byte code points with glibc's strxfrm() (this is
generally not true of languages that use the Latin alphabet, or of
East Asian languages).

Note that there is already naturally a form of what you might call
compression with strxfrm() [3]. This is very useful for abbreviated
keys.
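
A minimal sketch of the abbreviated-key idea, assuming plain POSIX strxfrm() rather than ICU: the first 8 bytes of the transformed blob are packed into an integer, and strcoll() only runs when two abbreviated keys tie. The helper names abbrev_key() and abbrev_cmp() are hypothetical and do not come from the PostgreSQL sources. Note that strxfrm() has to transform the whole string even though only a prefix is kept, which is the cost that partial sort keys would avoid.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: pack the first 8 bytes of the strxfrm() blob for
 * `s` into an integer, most significant byte first, so that unsigned
 * integer comparison agrees with a bytewise comparison of the prefix.
 * Blobs shorter than 8 bytes are padded with zero bytes.
 */
uint64_t
abbrev_key(const char *s)
{
    size_t      need = strxfrm(NULL, s, 0);    /* blob length, without NUL */
    char       *blob = malloc(need + 1);
    uint64_t    key = 0;

    assert(blob != NULL);
    strxfrm(blob, s, need + 1);

    for (size_t i = 0; i < sizeof(key); i++)
    {
        unsigned char b = (i < need) ? (unsigned char) blob[i] : 0;

        key = (key << 8) | b;
    }

    free(blob);
    return key;
}

/*
 * Compare two strings by abbreviated key first; fall back to the full
 * strcoll() comparison only when the abbreviated keys are equal.
 */
int
abbrev_cmp(const char *a, const char *b)
{
    uint64_t    ka = abbrev_key(a);
    uint64_t    kb = abbrev_key(b);

    if (ka != kb)
        return (ka < kb) ? -1 : 1;
    return strcoll(a, b);
}
```

In a real sort the key would be computed once per tuple and cached, not recomputed on every comparison as this sketch does.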

[1] http://userguide.icu-project.org/collation/architecture
[2] http://www.unicode.org/reports/tr10/#Run-length_Compression
[3] http://www.postgresql.org/message-id/cam3swztywe5j69tapvzf2cm7mhskke3uhhnk9gluqckkwqo...@mail.gmail.com
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] using Core Foundation locale functions

2014-12-02 Thread Noah Misch
On Fri, Nov 28, 2014 at 11:43:28AM -0500, Peter Eisentraut wrote:
> In light of the recent discussions about using ICU on OS X, I looked
> into the Core Foundation locale functions (Core Foundation = traditional
> Mac API in OS X, as opposed to the Unix/POSIX APIs).
>
> Attached is a proof of concept patch that just about works for the
> sorting aspects.  (The ctype aspects aren't there yet and will crash,
> but they could be done similarly.)  It passes an appropriately adjusted
> collate.linux.utf8 test, meaning that it does produce language-aware
> sort orders that are equivalent to what glibc produces.
>
> At the moment, this is probably just an experiment that shows where
> refactoring and better abstractions might be suitable if we want to
> support multiple locale libraries.  If we want to pursue ICU, I think
> this could be a useful third option.

Does this make the backend multi-threaded?




Re: [HACKERS] using Core Foundation locale functions

2014-12-02 Thread Craig Ringer
On 12/02/2014 12:52 AM, David E. Wheeler wrote:
> Gotta say, I’m thrilled to see movement on this front, and especially pleased
> to see how consensus seems to be building around an abstracted interface to
> keep options open. This platform-specific example really highlights the need
> for it (I had no idea that there was separate and more up-to-date collation
> support in Core Foundation than in the UNIX layer of OS X).

It'd also potentially let us make use of Windows' native locale APIs,
which AFAIK receive considerably more love on that platform than their
POSIX back-country cousins.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] using Core Foundation locale functions

2014-12-02 Thread Peter Geoghegan
On Tue, Dec 2, 2014 at 10:07 PM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> On 12/02/2014 12:52 AM, David E. Wheeler wrote:
>> Gotta say, I’m thrilled to see movement on this front, and especially
>> pleased to see how consensus seems to be building around an abstracted
>> interface to keep options open. This platform-specific example really
>> highlights the need for it (I had no idea that there was separate and more
>> up-to-date collation support in Core Foundation than in the UNIX layer of
>> OS X).
>
> It'd also potentially let us make use of Windows' native locale APIs,
> which AFAIK receive considerably more love on that platform than their
> POSIX back-country cousins.

Not to mention the fact that a MultiByteToWideChar() call could be
saved, and sortsupport for text would just work on Windows.

-- 
Peter Geoghegan




Re: [HACKERS] using Core Foundation locale functions

2014-12-01 Thread David E. Wheeler
On Nov 28, 2014, at 8:43 AM, Peter Eisentraut <pete...@gmx.net> wrote:

> At the moment, this is probably just an experiment that shows where
> refactoring and better abstractions might be suitable if we want to
> support multiple locale libraries.  If we want to pursue ICU, I think
> this could be a useful third option.

Gotta say, I’m thrilled to see movement on this front, and especially pleased 
to see how consensus seems to be building around an abstracted interface to 
keep options open. This platform-specific example really highlights the need 
for it (I had no idea that there was separate and more up-to-date collation 
support in Core Foundation than in the UNIX layer of OS X).

Really looking forward to seeing where we end up.

Best,

David
 





[HACKERS] using Core Foundation locale functions

2014-11-28 Thread Peter Eisentraut
In light of the recent discussions about using ICU on OS X, I looked
into the Core Foundation locale functions (Core Foundation = traditional
Mac API in OS X, as opposed to the Unix/POSIX APIs).

Attached is a proof of concept patch that just about works for the
sorting aspects.  (The ctype aspects aren't there yet and will crash,
but they could be done similarly.)  It passes an appropriately adjusted
collate.linux.utf8 test, meaning that it does produce language-aware
sort orders that are equivalent to what glibc produces.

At the moment, this is probably just an experiment that shows where
refactoring and better abstractions might be suitable if we want to
support multiple locale libraries.  If we want to pursue ICU, I think
this could be a useful third option.
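
One way to picture the abstraction being discussed is a small table of comparison callbacks keyed by library name. This is only a sketch under stated assumptions: the locale_provider struct and find_provider() are hypothetical names and are not taken from the attached patch. The Core Foundation branch is compiled only on OS X with USE_LOCALE_CF defined (the macro the patch introduces), and it uses CFStringCompare() with kCFCompareLocalized to compare in the user's current locale.

```c
#include <assert.h>
#include <string.h>

#if defined(__APPLE__) && defined(USE_LOCALE_CF)
#include <CoreFoundation/CoreFoundation.h>
#endif

/* Hypothetical provider interface: one comparison callback per library. */
typedef struct locale_provider
{
    const char *name;
    int         (*compare) (const char *a, const char *b);
} locale_provider;

/* POSIX backend: defer to strcoll() and the current LC_COLLATE setting. */
static int
posix_compare(const char *a, const char *b)
{
    return strcoll(a, b);
}

#if defined(__APPLE__) && defined(USE_LOCALE_CF)
/* Core Foundation backend: compare two UTF-8 strings as CFStrings. */
static int
cf_compare(const char *a, const char *b)
{
    CFStringRef sa = CFStringCreateWithCString(kCFAllocatorDefault, a,
                                               kCFStringEncodingUTF8);
    CFStringRef sb = CFStringCreateWithCString(kCFAllocatorDefault, b,
                                               kCFStringEncodingUTF8);
    CFComparisonResult r;

    assert(sa != NULL && sb != NULL);
    r = CFStringCompare(sa, sb, kCFCompareLocalized);
    CFRelease(sa);
    CFRelease(sb);
    return (int) r;             /* -1, 0 or 1 */
}
#endif

static const locale_provider providers[] = {
    {"posix", posix_compare},
#if defined(__APPLE__) && defined(USE_LOCALE_CF)
    {"cf", cf_compare},
#endif
};

/* Look up a provider by name; returns NULL if it was not compiled in. */
const locale_provider *
find_provider(const char *name)
{
    for (size_t i = 0; i < sizeof(providers) / sizeof(providers[0]); i++)
    {
        if (strcmp(providers[i].name, name) == 0)
            return &providers[i];
    }
    return NULL;
}
```

The patch itself selects the library at configure time through --with-locale; a runtime table like this one is just a convenient way to show where a third backend would plug in.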
diff --git a/configure b/configure
index 7594401..371cbe0 100755
--- a/configure
+++ b/configure
@@ -708,6 +708,8 @@ with_libxml
 XML2_CONFIG
 UUID_EXTRA_OBJS
 with_uuid
+LOCALE_EXTRA_LIBS
+with_locale
 with_selinux
 with_openssl
 krb_srvtab
@@ -831,6 +833,7 @@ with_openssl
 with_selinux
 with_readline
 with_libedit_preferred
+with_locale
 with_uuid
 with_ossp_uuid
 with_libxml
@@ -1520,6 +1523,7 @@ Optional Packages:
   --without-readline      do not use GNU Readline nor BSD Libedit for editing
   --with-libedit-preferred
                           prefer BSD Libedit over GNU Readline
+  --with-locale=LIB       use locale library LIB (posix,cf)
   --with-uuid=LIB         build contrib/uuid-ossp using LIB (bsd,e2fs,ossp)
   --with-ossp-uuid        obsolete spelling of --with-uuid=ossp
   --with-libxml           build with XML support
@@ -5677,6 +5681,51 @@ fi
 
 
 #
+# collation library
+#
+
+
+
+# Check whether --with-locale was given.
+if test "${with_locale+set}" = set; then :
+  withval=$with_locale;
+  case $withval in
+    yes)
+      as_fn_error $? "argument required for --with-locale option" "$LINENO" 5
+      ;;
+    no)
+      as_fn_error $? "argument required for --with-locale option" "$LINENO" 5
+      ;;
+    *)
+
+      ;;
+  esac
+
+else
+  with_locale=posix
+fi
+
+
+case $with_locale in
+  posix)
+
+$as_echo "#define USE_LOCALE_POSIX 1" >>confdefs.h
+
+    ;;
+  cf)
+
+$as_echo "#define USE_LOCALE_CF 1" >>confdefs.h
+
+    LOCALE_EXTRA_LIBS='-framework Foundation'
+    ;;
+  *)
+    as_fn_error $? "--with-locale must specify one of posix or cf" "$LINENO" 5
+esac
+
+
+
+
+#
 # UUID library
 #
 # There are at least three UUID libraries in common use: the FreeBSD/NetBSD
diff --git a/configure.in b/configure.in
index 0dc3f18..16b97a1 100644
--- a/configure.in
+++ b/configure.in
@@ -706,6 +706,25 @@ PGAC_ARG_BOOL(with, libedit-preferred, no,
 
 
 #
+# collation library
+#
+PGAC_ARG_REQ(with, locale, [LIB], [use locale library LIB (posix,cf)], [], [with_locale=posix])
+case $with_locale in
+  posix)
+    AC_DEFINE([USE_LOCALE_POSIX], 1, [Define to 1 to use POSIX locale functions.])
+    ;;
+  cf)
+    AC_DEFINE([USE_LOCALE_CF], 1, [Define to 1 to use Core Foundation locale functions.])
+    LOCALE_EXTRA_LIBS='-framework CoreFoundation'
+    ;;
+  *)
+    AC_MSG_ERROR([--with-locale must specify one of posix or cf])
+esac
+AC_SUBST(with_locale)
+AC_SUBST(LOCALE_EXTRA_LIBS)
+
+
+#
 # UUID library
 #
 # There are at least three UUID libraries in common use: the FreeBSD/NetBSD
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 63ff50b..fa5a60e 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -166,6 +166,7 @@ with_openssl	= @with_openssl@
 with_selinux	= @with_selinux@
 with_libxml	= @with_libxml@
 with_libxslt	= @with_libxslt@
+with_locale	= @with_locale@
 with_system_tzdata = @with_system_tzdata@
 with_uuid	= @with_uuid@
 with_zlib	= @with_zlib@
@@ -241,6 +242,7 @@ DLLWRAP = @DLLWRAP@
 LIBS = @LIBS@
 LDAP_LIBS_FE = @LDAP_LIBS_FE@
 LDAP_LIBS_BE = @LDAP_LIBS_BE@
+LOCALE_EXTRA_LIBS = @LOCALE_EXTRA_LIBS@
 UUID_LIBS = @UUID_LIBS@
 UUID_EXTRA_OBJS = @UUID_EXTRA_OBJS@
 LD = @LD@
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 870a022..f793e76 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -54,7 +54,7 @@ ifneq ($(PORTNAME), win32)
 ifneq ($(PORTNAME), aix)
 
 postgres: $(OBJS)
-	$(CC) $(CFLAGS) $(LDFLAGS) $(LDFLAGS_EX) $(export_dynamic) $(call expand_subsys,$^) $(LIBS) -o $@
+	$(CC) $(CFLAGS) $(LDFLAGS) $(LDFLAGS_EX) $(export_dynamic) $(call expand_subsys,$^) $(LIBS) $(LOCALE_EXTRA_LIBS) -o $@
 
 endif
 endif
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 94bb5a4..7f441b4 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -63,6 +63,10 @@
 #include "utils/pg_locale.h"
 #include "utils/syscache.h"
 
+#ifdef USE_LOCALE_CF
+#include "cf_locale.h"
+#endif
+
 #ifdef WIN32
 /*
  * This Windows file defines StrNCpy. We don't need it here, so we undefine
@@ -1023,7 +1027,6 @@ lc_ctype_is_c(Oid collation)
 
 
 /* simple subroutine for reporting errors from newlocale() */
-#ifdef HAVE_LOCALE_T
 static void
 report_newlocale_failure(const