On 18.01.24 23:03, Jeff Davis wrote:
On Thu, 2024-01-18 at 13:53 +0100, Peter Eisentraut wrote:
I think that would be a terrible direction to take, because it would
regress the default sort order from "correct" to "useless".

I don't agree that the current default is "correct". There are a lot of
ways it can be wrong:

   * the environment variables at initdb time don't reflect what the
users of the database actually want
   * there are so many different users using so many different
applications connected to the database that no one "correct" sort order
exists
   * libc has some implementation quirks
   * the version of Unicode that libc is based on is not what you expect
   * the version of libc is not what you expect

These are arguments why the current defaults are not universally perfect, but I'd argue that they are still most often the right thing as the default.

Aside from the overall message this sends about how PostgreSQL cares
about locales and Unicode and such.

Unicode is primarily about the semantics of characters and their
relationships. The patches I propose here do a great job of that.

Collation (relationships between *strings*) is a part of Unicode, but
not the whole thing or even the main thing.

I don't get this argument. Of course, people care about sorting and sort order. Whether you consider this part of Unicode or adjacent to it, people still want it.

Maybe you don't intend for this to be the default provider?

I am not proposing that this provider be the initdb-time default.

ok

But then who would really use it? I mean, sure, some people would, but
how would you even explain, in practice, the particular niche of users
or use cases?

It's for users who want to respect Unicode and support text from
international sources in their database, but are not experts on the
subject and don't know precisely what they want or understand the
consequences. If and when such users do notice a problem with the sort
order, they'd handle it at that time (perhaps with a COLLATE clause, or
sorting in the application).
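
To make the "sorting in the application" option concrete, here is a
minimal sketch in Python. It is only an illustration, not part of the
proposed patches: the function names and the sample word list are
hypothetical, and the accent-stripping key is a crude stand-in for a
real collation such as the one ICU provides. It contrasts plain code
point order with a simple application-side sort key.

```python
import unicodedata


def code_point_sort(words):
    """Sort by raw Unicode code point, i.e. plain string comparison."""
    return sorted(words)


def app_side_sort(words):
    """Sort with a simple application-side key (an assumption, not UCA).

    The key decomposes each string (NFD), drops combining marks, and
    casefolds the result, so accents and case do not affect the order.
    """
    def key(s):
        decomposed = unicodedata.normalize("NFD", s)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        return stripped.casefold()

    return sorted(words, key=key)


# Hypothetical sample data; "\u00e9clair" is "eclair" with an acute accent.
words = ["zebra", "Apple", "\u00e9clair", "banana"]

print(code_point_sort(words))  # accented word sorts after "zebra"
print(app_side_sort(words))    # accented word sorts between "banana" and "zebra"
```

In code point order the accented word lands after "zebra" because
U+00E9 is numerically larger than "z". The application-side key places
it with the other words starting with "e", which is closer to what most
readers expect, but it does not implement locale tailoring.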

Vision:

* ICU offers COLLATE UNICODE, locale tailoring, case-insensitive
matching, and customization with rules. It's the solution for
everything from "slightly more advanced" to "very advanced".

I am astonished by this. In your world, do users not want their text data sorted? Do they not care what the sort order is? You consider UCA sort order an "advanced" feature?


