On Mon, 2024-01-22 at 19:49 +0100, Peter Eisentraut wrote:
> I don't get this argument. Of course, people care about sorting and
> sort order. Whether you consider this part of Unicode or adjacent to
> it, people still want it.

You said that my proposal sends a message that we somehow don't care
about Unicode, and I strongly disagree. The built-in provider I'm
proposing does implement Unicode semantics.

Surely a database that offers UCS_BASIC (a SQL spec feature) isn't
sending a message that it doesn't care about Unicode, and neither is
my proposal.

> > * ICU offers COLLATE UNICODE, locale tailoring, case-insensitive
> > matching, and customization with rules. It's the solution for
> > everything from "slightly more advanced" to "very advanced".
>
> I am astonished by this. In your world, do users not want their text
> data sorted? Do they not care what the sort order is?

I obviously care about Unicode and collation. I've put a lot of effort
recently into contributions in this area, and I wouldn't have done
that if I thought users didn't care. You've made much greater
contributions and I thank you for that.

The logical conclusion of your line of argument would be that libc's
"C.UTF-8" locale and UCS_BASIC simply should not exist. But they do
exist, and for good reason.

One of those good reasons is that only *human* users care about the
human-friendliness of sort order. If Postgres is just feeding the
results to another system -- or an application layer that re-sorts the
data anyway -- then stability, performance, and interoperability
matter more than human-friendliness. (Though Unicode character
semantics are still useful even when the data is not going directly to
a human.)

> You consider UCA
> sort order an "advanced" feature?

I said "slightly more advanced" compared with "basic". "Advanced" can
be taken in either a positive way ("more useful") or a negative way
("complex"). I'm sorry for the misunderstanding, but my point was
this:

* The builtin provider is for people who are fine with code point
  order and no tailoring, but want Unicode character semantics,
  collation stability, and performance.
* ICU is the right solution for anyone who wants human-friendly
  collation or tailoring, and is willing to put up with some collation
  stability risk and lower collation performance.

Both have their place and the user is free to mix and match as needed,
thanks to the COLLATE clause for columns and queries.

Regards,
	Jeff Davis
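
To make the contrast between code point order and human-friendly order
concrete, here is a minimal Python sketch. It is only an illustration,
not part of the proposal: Python's default string comparison happens to
be by code point, and rough_human_key() is a hypothetical stand-in that
strips accents and folds case. It is not the UCA and not what ICU
actually does.

```python
import unicodedata

# "\u00e9clair" is "eclair" with a precomposed e-acute (U+00E9).
words = ["zebra", "Apple", "\u00e9clair", "apple", "Zulu", "eclair"]


def code_point_order(items):
    """Sort by raw Unicode code points (Python's default for str)."""
    return sorted(items)


def rough_human_key(s):
    """Hypothetical, simplified key: drop combining marks, fold case.

    This only approximates a human-friendly order for this small
    example; a real implementation would use ICU and the UCA.
    """
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # The original string is a tie-breaker so the result is deterministic.
    return (base.casefold(), s)


def rough_human_order(items):
    """Sort using the simplified human-friendly key above."""
    return sorted(items, key=rough_human_key)


if __name__ == "__main__":
    print(code_point_order(words))
    print(rough_human_order(words))
```

Code point order puts all uppercase ASCII before lowercase and the
accented word last, which is stable and cheap but not what a person
browsing a list expects. The second ordering groups "Apple" with
"apple" and the accented word with "eclair", at the cost of depending
on extra rules that can change between versions.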