Re: pgsql: Allow tailoring of ICU locales with custom rules

2023-09-04 Thread Amit Kapila
On Tue, Aug 22, 2023 at 10:55 PM Jeff Davis wrote:
>
> On Mon, 2023-08-14 at 10:34 +0200, Peter Eisentraut wrote:
> > I have investigated this.  My assessment is that how PostgreSQL
> > interfaces with ICU is correct.  Whether what ICU does is correct
> > might
> > be debatable.  I have filed a bug with ICU about this:
> > https://unicode-org.atlassian.net/browse/ICU-22456 , but there is no
> > response yet.
>
> Is everything other than the language and region simply discarded when
> a rules string is present, or are some attributes preserved, or is
> there some other nuance?
>
> > You can work around this by including the desired attributes in the
> > rules string, for example
> >
> >  create collation c3 (provider=icu,
> >    locale='und-u-ka-shifted-ks-level1',
> >    rules='[alternate shifted][strength 1]',
> >    deterministic=false);
> >
> > So I don't think there is anything we need to do here for PostgreSQL
> > 16.
>
> Is there some way we can warn a user that some attributes will be
> discarded, or improve the documentation? Letting the user figure this
> out for themselves doesn't seem right.
>
> Are you sure we want to allow rules for the database default collation
> in 16, or should we start with just allowing them in CREATE COLLATION
> and then expand to the database default collation later? I'm still a
> bit concerned about users getting too fancy with daticurules, and
> ending up not being able to connect to their database anymore.
>

There is still an Open Item corresponding to this. Does anyone else
want to weigh in?

-- 
With Regards,
Amit Kapila.




Re: pgsql: Allow tailoring of ICU locales with custom rules

2023-08-22 Thread Jeff Davis
On Mon, 2023-08-14 at 10:34 +0200, Peter Eisentraut wrote:
> I have investigated this.  My assessment is that how PostgreSQL 
> interfaces with ICU is correct.  Whether what ICU does is correct
> might 
> be debatable.  I have filed a bug with ICU about this: 
> https://unicode-org.atlassian.net/browse/ICU-22456 , but there is no 
> response yet.

Is everything other than the language and region simply discarded when
a rules string is present, or are some attributes preserved, or is
there some other nuance?

> You can work around this by including the desired attributes in the 
> rules string, for example
> 
>  create collation c3 (provider=icu,
>    locale='und-u-ka-shifted-ks-level1',
>    rules='[alternate shifted][strength 1]',
>    deterministic=false);
> 
> So I don't think there is anything we need to do here for PostgreSQL
> 16.

Is there some way we can warn a user that some attributes will be
discarded, or improve the documentation? Letting the user figure this
out for themselves doesn't seem right.

Are you sure we want to allow rules for the database default collation
in 16, or should we start with just allowing them in CREATE COLLATION
and then expand to the database default collation later? I'm still a
bit concerned about users getting too fancy with daticurules, and
ending up not being able to connect to their database anymore.
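
To make the concern concrete, here is a minimal sketch of trying a rules
string in a throwaway collation and a scratch database before using it
anywhere people depend on. It assumes PostgreSQL 16 built with ICU and a
UTF8 server encoding; the names rules_probe and rules_scratch and the rule
'&a < g' are hypothetical, chosen only for illustration.

```sql
-- Try the rules string in a collation first, so that any problem with
-- the rules shows up without touching a database default.
create collation rules_probe (provider=icu,
  locale='und',
  rules='&a < g');

-- Inspect how the rule changes the ordering.
select x
  from (values ('a'), ('b'), ('g')) as t(x)
  order by x collate rules_probe;

-- Only then use the same rules as a database default, in a scratch
-- database created from template0 (cannot run inside a transaction block).
create database rules_scratch
  template template0
  locale_provider icu
  icu_locale 'und'
  icu_rules '&a < g';

-- The stored rules are visible in pg_database.daticurules (PostgreSQL 16).
select datname, daticulocale, daticurules
  from pg_database
  where datname = 'rules_scratch';
```

Connecting to the scratch database afterwards is the step that would show
whether a given rules string leaves the database usable.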

Regards,
Jeff Davis





Re: pgsql: Allow tailoring of ICU locales with custom rules

2023-08-14 Thread Peter Eisentraut

On 24.07.23 04:46, Amit Kapila wrote:
> On Fri, Mar 10, 2023 at 3:24 PM Peter Eisentraut wrote:
> > On 08.03.23 21:57, Jeff Davis wrote:
> > > * It appears rules IS NULL behaves differently from rules=''. Is that
> > > desired? For instance:
> > >   create collation c1(provider=icu,
> > >     locale='und-u-ka-shifted-ks-level1',
> > >     deterministic=false);
> > >   create collation c2(provider=icu,
> > >     locale='und-u-ka-shifted-ks-level1',
> > >     rules='',
> > >     deterministic=false);
> > >   select 'a b' collate c1 = 'ab' collate c1; -- true
> > >   select 'a b' collate c2 = 'ab' collate c2; -- false
> >
> > I'm puzzled by this.  The general behavior is, extract the rules of the
> > original locale, append the custom rules, use that.  If the custom rules
> > are the empty string, that should match using the original rules
> > untouched.  Needs further investigation.
> >
> > > * Can you document the interaction between locale keywords
> > > ("@colStrength=primary") and a rule like '[strength 2]'?
> >
> > I'll look into that.
>
> This thread is listed on PostgreSQL 16 Open Items list. This is a
> gentle reminder to see if there is a plan to move forward with respect
> to open points.


I have investigated this.  My assessment is that how PostgreSQL 
interfaces with ICU is correct.  Whether what ICU does is correct might 
be debatable.  I have filed a bug with ICU about this: 
https://unicode-org.atlassian.net/browse/ICU-22456 , but there is no 
response yet.


You can work around this by including the desired attributes in the 
rules string, for example


create collation c3 (provider=icu,
  locale='und-u-ka-shifted-ks-level1',
  rules='[alternate shifted][strength 1]',
  deterministic=false);

So I don't think there is anything we need to do here for PostgreSQL 16.
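
As a quick check of the workaround, a sketch along these lines compares the
three variants side by side. It assumes PostgreSQL 16 built with ICU; the
collation names with the _chk suffix are hypothetical, picked so they do not
clash with c1 to c3 from this thread.

```sql
-- Locale attributes only, no rules.
create collation c1_chk (provider=icu,
  locale='und-u-ka-shifted-ks-level1',
  deterministic=false);

-- Same locale, empty rules string.
create collation c2_chk (provider=icu,
  locale='und-u-ka-shifted-ks-level1',
  rules='',
  deterministic=false);

-- Same locale, attributes repeated inside the rules string.
create collation c3_chk (provider=icu,
  locale='und-u-ka-shifted-ks-level1',
  rules='[alternate shifted][strength 1]',
  deterministic=false);

-- One row with the comparison result for each variant.
select 'a b' collate c1_chk = 'ab' collate c1_chk as no_rules,
       'a b' collate c2_chk = 'ab' collate c2_chk as empty_rules,
       'a b' collate c3_chk = 'ab' collate c3_chk as attrs_in_rules;
```

If the workaround behaves as described, attrs_in_rules should agree with
no_rules, while empty_rules shows the discrepancy reported earlier.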





Re: pgsql: Allow tailoring of ICU locales with custom rules

2023-07-23 Thread Amit Kapila
On Fri, Mar 10, 2023 at 3:24 PM Peter Eisentraut wrote:
>
> On 08.03.23 21:57, Jeff Davis wrote:
>
> > * It appears rules IS NULL behaves differently from rules=''. Is that
> > desired? For instance:
> >    create collation c1(provider=icu,
> >      locale='und-u-ka-shifted-ks-level1',
> >      deterministic=false);
> >    create collation c2(provider=icu,
> >      locale='und-u-ka-shifted-ks-level1',
> >      rules='',
> >      deterministic=false);
> >    select 'a b' collate c1 = 'ab' collate c1; -- true
> >    select 'a b' collate c2 = 'ab' collate c2; -- false
>
> I'm puzzled by this.  The general behavior is, extract the rules of the
> original locale, append the custom rules, use that.  If the custom rules
> are the empty string, that should match using the original rules
> untouched.  Needs further investigation.
>
> > * Can you document the interaction between locale keywords
> > ("@colStrength=primary") and a rule like '[strength 2]'?
>
> I'll look into that.
>

This thread is listed on PostgreSQL 16 Open Items list. This is a
gentle reminder to see if there is a plan to move forward with respect
to open points.

-- 
With Regards,
Amit Kapila.




Re: pgsql: Allow tailoring of ICU locales with custom rules

2023-03-10 Thread Peter Eisentraut

On 08.03.23 21:57, Jeff Davis wrote:
> On Wed, 2023-03-08 at 16:03 +, Peter Eisentraut wrote:
> > Allow tailoring of ICU locales with custom rules
>
> Late review:
>
> * Should throw error when provider != icu and rules != NULL

I have fixed that.

> * Explain what the example means. By itself, users might get confused
> wondering why someone would want to do that.
>
> * Also consider a more practical example?

I have added a more practical example with explanation.

> * It appears rules IS NULL behaves differently from rules=''. Is that
> desired? For instance:
>    create collation c1(provider=icu,
>      locale='und-u-ka-shifted-ks-level1',
>      deterministic=false);
>    create collation c2(provider=icu,
>      locale='und-u-ka-shifted-ks-level1',
>      rules='',
>      deterministic=false);
>    select 'a b' collate c1 = 'ab' collate c1; -- true
>    select 'a b' collate c2 = 'ab' collate c2; -- false


I'm puzzled by this.  The general behavior is, extract the rules of the 
original locale, append the custom rules, use that.  If the custom rules 
are the empty string, that should match using the original rules 
untouched.  Needs further investigation.
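
One way to confirm that the two definitions differ only in the stored rules
is to look at the catalog. A sketch, assuming PostgreSQL 16 (where
pg_collation has the colliculocale and collicurules columns) and that the c1
and c2 collations from the report exist in the current database:

```sql
-- c1 was created without rules, c2 with rules=''; the last column makes
-- the NULL versus empty-string distinction explicit.
select collname,
       colliculocale,
       collicurules,
       collicurules is null as rules_is_null
  from pg_collation
  where collname in ('c1', 'c2');
```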



> * Can you document the interaction between locale keywords
> ("@colStrength=primary") and a rule like '[strength 2]'?


I'll look into that.