Re: [HACKERS] Per-column collation, proof of concept

2010-08-18 Thread Peter Eisentraut
On tis, 2010-08-17 at 01:16 -0500, Jaime Casanova wrote:
  creating collations ...FATAL:  invalid byte sequence for encoding
  UTF8: 0xe56c09
  CONTEXT:  COPY tmp_pg_collation, line 86
  STATEMENT:  COPY tmp_pg_collation FROM
  E'/usr/local/pgsql/9.1/share/locales.txt';
  
 
  Hmm, what is in that file on that line?
 
 
 
 bokmål  ISO-8859-1

Hey, that borders on genius: Use a non-ASCII letter in the name of a
locale whose purpose it is to configure how non-ASCII letters are
interpreted. :-/

Interestingly, I don't see this on a Debian system.  Good thing to know
that this needs separate testing on different Linux variants.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, proof of concept

2010-08-18 Thread Jaime Casanova
On Wed, Aug 18, 2010 at 11:29 AM, Peter Eisentraut pete...@gmx.net wrote:
 On tis, 2010-08-17 at 01:16 -0500, Jaime Casanova wrote:
  creating collations ...FATAL:  invalid byte sequence for encoding
  UTF8: 0xe56c09
  CONTEXT:  COPY tmp_pg_collation, line 86
  STATEMENT:  COPY tmp_pg_collation FROM
  E'/usr/local/pgsql/9.1/share/locales.txt';
  
 
  Hmm, what is in that file on that line?
 
 

 bokmål  ISO-8859-1

 Hey, that borders on genius: Use a non-ASCII letter in the name of a
 locale whose purpose it is to configure how non-ASCII letters are
 interpreted. :-/

 Interestingly, I don't see this on a Debian system.  Good thing to know
 that this needs separate testing on different Linux variants.



Yeah! and when installing centos 5 i don't have a chance to choose
what locales i want, it just installs all of them

-- 
Jaime Casanova         www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, proof of concept

2010-08-17 Thread Jaime Casanova
On Mon, Aug 16, 2010 at 10:56 PM, Peter Eisentraut pete...@gmx.net wrote:
 On lör, 2010-08-14 at 02:05 -0500, Jaime Casanova wrote:

 BTW, why the double quotes?

 Because the name contains upper case letters?


why everything seems so obvious once someone else state it? :)

 sorry to state the obvious but this doesn't work on windows, does it?

 Probably not, but hopefully there is some similar API that could be used
 on Windows.


good luck with that! ;)
seriously, maybe this helps
http://msdn.microsoft.com/en-us/library/system.windows.forms.inputlanguage.installedinputlanguages.aspx
but probably you will need to write the code yourself... at least i
don't think there is something like locale -a

 and for some reason it also didn't work on a centos 5 (this error
 ocurred when initdb'ing)
 
 loading system objects' descriptions ... ok
 creating collations ...FATAL:  invalid byte sequence for encoding
 UTF8: 0xe56c09
 CONTEXT:  COPY tmp_pg_collation, line 86
 STATEMENT:  COPY tmp_pg_collation FROM
 E'/usr/local/pgsql/9.1/share/locales.txt';
 

 Hmm, what is in that file on that line?



bokmål  ISO-8859-1

(i'm attaching the locales.txt just in case)

-- 
Jaime Casanova         www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL
aa_DJ   ISO-8859-1
aa_DJ.iso88591  ISO-8859-1
aa_DJ.utf8  UTF-8
aa_ER   UTF-8
aa...@saaho UTF-8
aa_ER.utf8  UTF-8
aa_er.u...@saahoUTF-8
aa_ET   UTF-8
aa_ET.utf8  UTF-8
af_ZA   ISO-8859-1
af_ZA.iso88591  ISO-8859-1
af_ZA.utf8  UTF-8
am_ET   UTF-8
am_ET.utf8  UTF-8
an_ES   ISO-8859-15
an_ES.iso885915 ISO-8859-15
an_ES.utf8  UTF-8
ar_AE   ISO-8859-6
ar_AE.iso88596  ISO-8859-6
ar_AE.utf8  UTF-8
ar_BH   ISO-8859-6
ar_BH.iso88596  ISO-8859-6
ar_BH.utf8  UTF-8
ar_DZ   ISO-8859-6
ar_DZ.iso88596  ISO-8859-6
ar_DZ.utf8  UTF-8
ar_EG   ISO-8859-6
ar_EG.iso88596  ISO-8859-6
ar_EG.utf8  UTF-8
ar_IN   UTF-8
ar_IN.utf8  UTF-8
ar_IQ   ISO-8859-6
ar_IQ.iso88596  ISO-8859-6
ar_IQ.utf8  UTF-8
ar_JO   ISO-8859-6
ar_JO.iso88596  ISO-8859-6
ar_JO.utf8  UTF-8
ar_KW   ISO-8859-6
ar_KW.iso88596  ISO-8859-6
ar_KW.utf8  UTF-8
ar_LB   ISO-8859-6
ar_LB.iso88596  ISO-8859-6
ar_LB.utf8  UTF-8
ar_LY   ISO-8859-6
ar_LY.iso88596  ISO-8859-6
ar_LY.utf8  UTF-8
ar_MA   ISO-8859-6
ar_MA.iso88596  ISO-8859-6
ar_MA.utf8  UTF-8
ar_OM   ISO-8859-6
ar_OM.iso88596  ISO-8859-6
ar_OM.utf8  UTF-8
ar_QA   ISO-8859-6
ar_QA.iso88596  ISO-8859-6
ar_QA.utf8  UTF-8
ar_SA   ISO-8859-6
ar_SA.iso88596  ISO-8859-6
ar_SA.utf8  UTF-8
ar_SD   ISO-8859-6
ar_SD.iso88596  ISO-8859-6
ar_SD.utf8  UTF-8
ar_SY   ISO-8859-6
ar_SY.iso88596  ISO-8859-6
ar_SY.utf8  UTF-8
ar_TN   ISO-8859-6
ar_TN.iso88596  ISO-8859-6
ar_TN.utf8  UTF-8
ar_YE   ISO-8859-6
ar_YE.iso88596  ISO-8859-6
ar_YE.utf8  UTF-8
as_IN.utf8  UTF-8
az_AZ.utf8  UTF-8
be_BY   CP1251
be_BY.cp1251CP1251
be...@latin UTF-8
be_BY.utf8  UTF-8
be_by.u...@latinUTF-8
bg_BG   CP1251
bg_BG.cp1251CP1251
bg_BG.utf8  UTF-8
bn_BD   UTF-8
bn_BD.utf8  UTF-8
bn_IN   UTF-8
bn_IN.utf8  UTF-8
bokmal  ISO-8859-1
bokmål  ISO-8859-1
br_FR   ISO-8859-1
br...@euro  ISO-8859-15
br_FR.iso88591  ISO-8859-1
br_fr.iso885...@euroISO-8859-15
br_FR.utf8  UTF-8
bs_BA   ISO-8859-2
bs_BA.iso88592  ISO-8859-2
bs_BA.utf8  UTF-8
byn_ER  UTF-8
byn_ER.utf8 UTF-8
C   ANSI_X3.4-1968
ca_AD   ISO-8859-15
ca_AD.iso885915 ISO-8859-15
ca_AD.utf8  UTF-8
ca_ES   ISO-8859-1
ca...@euro  ISO-8859-15
ca_ES.iso88591  ISO-8859-1
ca_es.iso885...@euroISO-8859-15
ca_ES.utf8  UTF-8
ca_FR   ISO-8859-15
ca_FR.iso885915 ISO-8859-15
ca_FR.utf8  UTF-8
ca_IT   ISO-8859-15
ca_IT.iso885915 ISO-8859-15
ca_IT.utf8  UTF-8
catalan ISO-8859-1
croatianISO-8859-2
csb_PL  UTF-8
csb_PL.utf8 UTF-8
cs_CZ   ISO-8859-2
cs_CZ.iso88592  ISO-8859-2
cs_CZ.utf8  UTF-8
cy_GB   ISO-8859-14
cy_GB.iso885914 ISO-8859-14
cy_GB.utf8  UTF-8
czech   ISO-8859-2
da_DK   ISO-8859-1
da_DK.iso88591  ISO-8859-1
da_DK.iso885915 ISO-8859-15
da_DK.utf8  UTF-8
danish  ISO-8859-1
dansk   ISO-8859-1
de_AT   ISO-8859-1
de...@euro  ISO-8859-15
de_AT.iso88591  ISO-8859-1
de_at.iso885...@euroISO-8859-15
de_AT.utf8  UTF-8
de_BE   ISO-8859-1
de...@euro  ISO-8859-15
de_BE.iso88591  ISO-8859-1
de_be.iso885...@euroISO-8859-15
de_BE.utf8  UTF-8
de_CH   ISO-8859-1
de_CH.iso88591  ISO-8859-1
de_CH.utf8  UTF-8
de_DE   ISO-8859-1
de...@euro  ISO-8859-15
de_DE.iso88591  ISO-8859-1
de_de.iso885...@euroISO-8859-15
de_DE.utf8  UTF-8
de_LU   ISO-8859-1
de...@euro  ISO-8859-15
de_LU.iso88591  ISO-8859-1
de_lu.iso885...@euroISO-8859-15
de_LU.utf8  UTF-8
deutsch ISO-8859-1
dutch   ISO-8859-1
dz_BT   UTF-8
dz_BT.utf8  UTF-8
eesti   ISO-8859-1
el_CY   ISO-8859-7
el_CY.iso88597  ISO-8859-7
el_CY.utf8  UTF-8
el_GR   ISO-8859-7
el_GR.iso88597  ISO-8859-7
el_GR.utf8  UTF-8
en_AU   ISO-8859-1
en_AU.iso88591  ISO-8859-1
en_AU.utf8  

Re: [HACKERS] Per-column collation, proof of concept

2010-08-16 Thread Peter Eisentraut
On lör, 2010-08-14 at 02:05 -0500, Jaime Casanova wrote:
 btw, the patch no longer apply cleanly but most are just hunks the
 worst it's in src/backend/catalog/namespace.c because
 FindConversionByName() is now called get_conversion_oid()... so maybe
 this function should be named get_collation_oid(), i guess

OK, that will need to be adjusted.

 well at least pg_collation should be a shared catalog, no?
 and i think we shouldn't be thinking in this without think first how
 to integrate this with at least per-database configuration

Good point.  But one might also want to create private collations, so
a collation in a schema would also be useful.  Tricky.

 also, it doesn't recognize C collate although it is in the locales.txt
 
 test3=# create database test4 with template=template0 encoding 'utf-8'
 lc_collate='C';
 CREATE DATABASE
 test3=# create table t3 (col1 text collate C );
 ERROR:  collation C does not exist
 

I've fixed this in the meantime.  Your version of the patch doesn't
support the C locale yet.

 BTW, why the double quotes?

Because the name contains upper case letters?

 sorry to state the obvious but this doesn't work on windows, does it?

Probably not, but hopefully there is some similar API that could be used
on Windows.

 and for some reason it also didn't work on a centos 5 (this error
 ocurred when initdb'ing)
 
 loading system objects' descriptions ... ok
 creating collations ...FATAL:  invalid byte sequence for encoding
 UTF8: 0xe56c09
 CONTEXT:  COPY tmp_pg_collation, line 86
 STATEMENT:  COPY tmp_pg_collation FROM
 E'/usr/local/pgsql/9.1/share/locales.txt';
 

Hmm, what is in that file on that line?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, proof of concept

2010-08-14 Thread Jaime Casanova
Hi,

sorry for the delay...
btw, the patch no longer apply cleanly but most are just hunks the
worst it's in src/backend/catalog/namespace.c because
FindConversionByName() is now called get_conversion_oid()... so maybe
this function should be named get_collation_oid(), i guess

On Tue, Aug 3, 2010 at 11:32 AM, Peter Eisentraut pete...@gmx.net wrote:
 On mån, 2010-08-02 at 01:43 -0500, Jaime Casanova wrote:
 nowadays, CREATE DATABASE has a lc_collate clause. is the new collate
 clause similar as the lc_collate?
 i mean, is lc_collate what we will use as a default?

 Yes, if you do not specify anything per column, the database default is
 used.

 How to integrate the per-database or per-cluster configuration with the
 new system is something to figure out in the future.


well at least pg_collation should be a shared catalog, no?
and i think we shouldn't be thinking in this without think first how
to integrate this with at least per-database configuration

 if yes, then probably we need to use pg_collation there too because
 lc_collate and the new collate clause use different collation names.
 
 postgres=# create database test with lc_collate 'en_US.UTF-8';
 CREATE DATABASE
 test=# create table t1 (col1 text collate en_US.UTF-8);
 ERROR:  collation en_US.UTF-8 does not exist
 test=# create table t1 (col1 text collate en_US.utf8);
 CREATE TABLE
 

 This is something that libc does for you.  The locale as listed by
 locale -a is called en_US.utf8, but apparently libc takes
 en_US.UTF-8 as well.


ok, but at least this is confusing

also, it doesn't recognize C collate although it is in the locales.txt

test3=# create database test4 with template=template0 encoding 'utf-8'
lc_collate='C';
CREATE DATABASE
test3=# create table t3 (col1 text collate C );
ERROR:  collation C does not exist


BTW, why the double quotes?

 also i got errors from regression tests when MULTIBYTE=UTF8
 (attached). it seems i was trying to create locales that weren't
 defined on locales.txt (from were was fed that file?). i added a line
 to that file (for es_EC.utf8) then i create a table with a column
 using that collate and execute select * from t2 where col1  'n'; 
 and i got this error: ERROR:  could not create locale es_EC.utf8
 (of course, that last part was me messing the things up, but it show
 we shouldn't be using a file locales.txt, i think)

 It might be that you don't have those locales installed in your system.
 locales.txt is created by using locale -a.  Check what that gives you.


sorry to state the obvious but this doesn't work on windows, does it?
and for some reason it also didn't work on a centos 5 (this error
ocurred when initdb'ing)

loading system objects' descriptions ... ok
creating collations ...FATAL:  invalid byte sequence for encoding
UTF8: 0xe56c09
CONTEXT:  COPY tmp_pg_collation, line 86
STATEMENT:  COPY tmp_pg_collation FROM
E'/usr/local/pgsql/9.1/share/locales.txt';


-- 
Jaime Casanova         www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, proof of concept

2010-08-03 Thread Peter Eisentraut
On mån, 2010-08-02 at 01:43 -0500, Jaime Casanova wrote:
 nowadays, CREATE DATABASE has a lc_collate clause. is the new collate
 clause similar as the lc_collate?
 i mean, is lc_collate what we will use as a default?

Yes, if you do not specify anything per column, the database default is
used.

How to integrate the per-database or per-cluster configuration with the
new system is something to figure out in the future.

 if yes, then probably we need to use pg_collation there too because
 lc_collate and the new collate clause use different collation names.
 
 postgres=# create database test with lc_collate 'en_US.UTF-8';
 CREATE DATABASE
 test=# create table t1 (col1 text collate en_US.UTF-8);
 ERROR:  collation en_US.UTF-8 does not exist
 test=# create table t1 (col1 text collate en_US.utf8);
 CREATE TABLE
 

This is something that libc does for you.  The locale as listed by
locale -a is called en_US.utf8, but apparently libc takes
en_US.UTF-8 as well.

 also i got errors from regression tests when MULTIBYTE=UTF8
 (attached). it seems i was trying to create locales that weren't
 defined on locales.txt (from were was fed that file?). i added a line
 to that file (for es_EC.utf8) then i create a table with a column
 using that collate and execute select * from t2 where col1  'n'; 
 and i got this error: ERROR:  could not create locale es_EC.utf8
 (of course, that last part was me messing the things up, but it show
 we shouldn't be using a file locales.txt, i think)

It might be that you don't have those locales installed in your system.
locales.txt is created by using locale -a.  Check what that gives you.

 i can attach a collate to a domain but i can't see where are we
 storing that info (actually it says it's not collatable):

Domain support is not done yet.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, proof of concept

2010-08-02 Thread Jaime Casanova
On Tue, Jul 13, 2010 at 1:25 PM, Peter Eisentraut pete...@gmx.net wrote:
 Here is a proof of concept for per-column collation support.


Hi,

i was looking at this.

nowadays, CREATE DATABASE has a lc_collate clause. is the new collate
clause similar as the lc_collate?
i mean, is lc_collate what we will use as a default?

if yes, then probably we need to use pg_collation there too because
lc_collate and the new collate clause use different collation names.

postgres=# create database test with lc_collate 'en_US.UTF-8';
CREATE DATABASE
test=# create table t1 (col1 text collate en_US.UTF-8);
ERROR:  collation en_US.UTF-8 does not exist
test=# create table t1 (col1 text collate en_US.utf8);
CREATE TABLE


also i got errors from regression tests when MULTIBYTE=UTF8
(attached). it seems i was trying to create locales that weren't
defined on locales.txt (from were was fed that file?). i added a line
to that file (for es_EC.utf8) then i create a table with a column
using that collate and execute select * from t2 where col1  'n'; 
and i got this error: ERROR:  could not create locale es_EC.utf8
(of course, that last part was me messing the things up, but it show
we shouldn't be using a file locales.txt, i think)

i can attach a collate to a domain but i can't see where are we
storing that info (actually it says it's not collatable):

-- 
Jaime Casanova         www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL


regression.diffs
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, proof of concept

2010-07-15 Thread Peter Eisentraut
On tor, 2010-07-15 at 05:57 +0200, Pavel Stehule wrote:
 :( maybe we have to enhance a locales - or do some work in this way.
 In Czech's IS is relative often operation some like
 
 name = 'Stěhule' COLLATION cs_CZ_cs_ai -- compare case insensitive
 accent insensitive
 
 PostgreSQL is last db, that doesn't integreated support for it

Well, the comparison function varstr_cmp() contains this comment:

/*
 * In some locales strcoll() can claim that nonidentical strings are
 * equal.  Believing that would be bad news for a number of reasons,
 * so we follow Perl's lead and sort equal strings according to
 * strcmp().
 */

This might not be strictly necessary, seeing that citext obviously
doesn't work that way, but resolving this is really an orthogonal issue.
If you fix that and you have a locale that does what you want, my patch
will help you get your example working.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, proof of concept

2010-07-15 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 Well, the comparison function varstr_cmp() contains this comment:

 /*
  * In some locales strcoll() can claim that nonidentical strings are
  * equal.  Believing that would be bad news for a number of reasons,
  * so we follow Perl's lead and sort equal strings according to
  * strcmp().
  */

 This might not be strictly necessary, seeing that citext obviously
 doesn't work that way, but resolving this is really an orthogonal issue.

The problem with not doing that is it breaks hashing --- hash joins and
hash aggregation being the real pain points.

citext works around this in a rather klugy fashion by decreeing that two
strings are equal iff their str_tolower() conversions are bitwise equal.
So it can hash the str_tolower() representation.  But that's kinda slow
and it fails in the general case anyhow, I think.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, proof of concept

2010-07-15 Thread Greg Stark
On Thu, Jul 15, 2010 at 4:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The problem with not doing that is it breaks hashing --- hash joins and
 hash aggregation being the real pain points.

 citext works around this in a rather klugy fashion by decreeing that two
 strings are equal iff their str_tolower() conversions are bitwise equal.
 So it can hash the str_tolower() representation.  But that's kinda slow
 and it fails in the general case anyhow, I think.

I think the general equivalent would be to call strxfrm and hash the
result of that.



-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, proof of concept

2010-07-14 Thread Kevin Grittner
Peter Eisentraut pete...@gmx.net wrote:
 
 Here is a proof of concept for per-column collation support.
 
Did you want a WIP review of that patch?  (CF closing to new
submissions soon)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, proof of concept

2010-07-14 Thread Pavel Stehule
Hello

I have only one question - If I understand well you can use collate
just for sort. What is your plan for range search operation? Sort is
interesting and I am sure important for multilangual applications, for
me - more important is case sensitive, case insensitive, accent
sensitive, insensitive filtering - do you have a plan for it?

Regards

Pavel Stehule

2010/7/13 Peter Eisentraut pete...@gmx.net:
 Here is a proof of concept for per-column collation support.

 Here is how it works: When creating a table, an optional COLLATE clause
 can specify a collation name, which is stored (by OID) in pg_attribute.
 This becomes part of the type information and is propagated through the
 expression parse analysis, like typmod.  When an operator or function
 call is parsed (transformed), the collations of the arguments are
 unified, using some rules (like type analysis, but different in detail).
 The collations of the function/operator arguments come either from Var
 nodes which in turn got them from pg_attribute, or from other
 function and operator calls, or you can override them with explicit
 COLLATE clauses (not yet implemented, but will work a bit like
 RelabelType).  At the end, each function or operator call gets one
 collation to use.


what about DISTINCT clause, maybe GROUP BY clause ?

regards

Pavel

 The function call itself can then look up the collation using the
 fcinfo-flinfo-fn_expr field.  (Works for operator calls, but doesn't
 work for sort operations, needs more thought.)

 A collation is in this implementation defined as an lc_collate string
 and an lc_ctype string.  The implementation of functions interested in
 that information, such as comparison operators, or upper and lower
 functions, will take the collation OID that is passed in, look up the
 locale string, and use the xlocale.h interface (newlocale(),
 strcoll_l()) to compute the result.

 (Note that the xlocale stuff is only 10 or so lines in this patch.  It
 should be feasible to allow other appropriate locale libraries to be
 used.)

 Loose ends:

 - Support function calls (currently only operator calls) (easy)

 - Implementation of sort clauses

 - Indexing support/integration

 - Domain support (should be straightforward)

 - Make all expression node types deal with collation information
  appropriately

 - Explicit COLLATE clause on expressions

 - Caching and not leaking memory of locale lookups

 - I have typcollatable to mark which types can accept collation
  information, but perhaps there should also be proicareaboutcollation
  to skip collation resolution when none of the functions in the
  expression tree care.

 You can start by reading the collate.sql regression test file to see
 what it can do.  Btw., regression tests only work with make check
 MULTIBYTE=UTF8.  And it (probably) only works with glibc for now.



 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, proof of concept

2010-07-14 Thread Peter Eisentraut
On ons, 2010-07-14 at 19:35 +0200, Pavel Stehule wrote:
 I have only one question - If I understand well you can use collate
 just for sort. What is your plan for range search operation?

My patch does range searches.  Sorting uses the same operators, so both
will be supported.  (Sorting is not yet implemented, as I had written.)

 Sort is
 interesting and I am sure important for multilangual applications, for
 me - more important is case sensitive, case insensitive, accent
 sensitive, insensitive filtering - do you have a plan for it?

You may be able to do some of these by using appropriate locale
definitions.  I'd need some examples to be able to tell for sure.

 what about DISTINCT clause, maybe GROUP BY clause ?

DISTINCT and GROUP BY work with equality, which is not affected by
locales (at least under the current rules).



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, proof of concept

2010-07-14 Thread Pavel Stehule
2010/7/14 Peter Eisentraut pete...@gmx.net:
 On ons, 2010-07-14 at 19:35 +0200, Pavel Stehule wrote:
 I have only one question - If I understand well you can use collate
 just for sort. What is your plan for range search operation?

 My patch does range searches.  Sorting uses the same operators, so both
 will be supported.  (Sorting is not yet implemented, as I had written.)

 Sort is
 interesting and I am sure important for multilangual applications, for
 me - more important is case sensitive, case insensitive, accent
 sensitive, insensitive filtering - do you have a plan for it?

 You may be able to do some of these by using appropriate locale
 definitions.  I'd need some examples to be able to tell for sure.

 what about DISTINCT clause, maybe GROUP BY clause ?

 DISTINCT and GROUP BY work with equality, which is not affected by
 locales (at least under the current rules).


:( maybe we have to enhance a locales - or do some work in this way.
In Czech's IS is relative often operation some like

name = 'Stěhule' COLLATION cs_CZ_cs_ai -- compare case insensitive
accent insensitive

PostgreSQL is last db, that doesn't integreated support for it

Regards

Pavel




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers