On Tue, Mar 10, 2026 at 3:04 PM Jeff Davis <[email protected]> wrote:
> If their environment's LC_CTYPE is UTF8-based, they already get UTF-8.
> If it isn't, we can either:
>
> (a) Fall back to LC_CTYPE=C, which is the only UTF8-compatible locale
> available everywhere. C is actually not a terrible fallback: it doesn't
> actually affect many things, because I have moved almost everything to
> use the database default locale.
>
> (b) Warn or error unless they explicitly specify the encoding with -E.
> But the former is likely to be ignored and the latter is not what I'd
> call "gentle".
>
> Which of these do you think is the right approach?

I'm a little confused as to how this relates to what you were asking
before. I thought you were proposing to pick UTF-8 rather than
SQL_ASCII when LC_CTYPE=C, but that's not on this list of options. To
be honest, I'd probably be ready to support making the default
encoding UTF8 regardless of the environment, and you have to use -E if
you want anything else. I think there are still people using other
encodings, but I believe it to be a small minority at this point.

> There's narrower question about what we do with LC_CTYPE=C. Currently
> we use SQL_ASCII encoding, which doesn't seem like a great default, and
> we could change that to default to UTF8. And another question about
> whether we change the meaning of --no-locale.

I think SQL_ASCII is a terrible default. Nobody actually wants that
unless they're trying to get out of a sticky situation. Making it
opt-in must be right. I do not know what the question about
--no-locale is.

> We sweat over single-digit performance regressions in fairly specific
> cases all the time, but here we're 3X slower for index builds:
>
> https://www.depesz.com/2024/06/11/how-much-speed-youre-leaving-at-the-table-if-you-use-default-locale/
>
> and 2-5X slower for Sort:
>
> https://www.postgresql.org/message-id/[email protected]
>
> and others don't seem very concerned, so I feel like I'm missing
> something.

<insert shrug emoji here>

At the end of the day, we're all just guessing. My experience working
for EDB is that we have a number of customers who care about sort
order quite a lot, and we've had to sweat blood to make them happy.
And, on a personal level, I have a hard time understanding why anyone
would be OK with a sort order that puts Álvaro after Zebra instead of
between Alvaro and Beatriz, because that seems extremely frustrating.
However, these are just personal biases.
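
To make the ordering complaint concrete, here is a minimal Python
sketch, not PostgreSQL code. It compares a byte-wise sort (roughly what
collate "C" does with UTF-8 text) against a hypothetical
accent-insensitive key built with the standard unicodedata module. The
helper names are made up for illustration, and the second key is only a
crude stand-in for a real linguistic collation such as the ones glibc
or ICU provide.

```python
import unicodedata

# "\u00c1lvaro" is "Álvaro" with a precomposed A-acute, written as an
# escape so the example does not depend on how this file is normalized.
names = ["Zebra", "\u00c1lvaro", "Beatriz", "Alvaro"]


def byte_order_key(s):
    # Compare raw UTF-8 bytes, approximating a "C" collation.
    return s.encode("utf-8")


def accent_insensitive_key(s):
    # Hypothetical approximation of a linguistic collation: the primary
    # key drops combining accents and case; the original string breaks ties.
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), s)


c_sorted = sorted(names, key=byte_order_key)
linguistic_sorted = sorted(names, key=accent_insensitive_key)

print(c_sorted)           # ['Alvaro', 'Beatriz', 'Zebra', 'Álvaro']
print(linguistic_sorted)  # ['Alvaro', 'Álvaro', 'Beatriz', 'Zebra']
```

The byte-wise sort places the accented name last because its first
UTF-8 byte is larger than any ASCII letter, which is the behavior
described above.
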
I'm much more likely to hear from the customers who care a lot about
the details of how something works than I am to hear from the
customers who are perfectly happy to take the defaults, because people
who are happy don't contact support at all and people who are unhappy
about relatively normal things get handled by support; I get the weird
cases. And everybody is going to have different experiences.

Presumably, your experience is that the indexing and sorting
performance is a big concern for the users you support, and that's why
you favor prioritizing that part of the experience. That's perfectly
legitimate, but it's different from my experience. My experience is
that when I tell people they can use collate "C" to speed up sorting,
they tell me that's a stupid workaround that doesn't give them the
answers that they want, which obviously colors my viewpoint on this
question in the same way that your experiences color yours.

--
Robert Haas
EDB: http://www.enterprisedb.com
