On Wed, Feb 8, 2023 at 2:59 AM Jeff Davis <pg...@j-davis.com> wrote:
> We do check that the value is accepted by ICU, but ICU seems to accept
> anything and use some fallback logic. Bogus strings will typically end
> up as the "root" locale (spelled "root" or "").

I've noticed this, and I think it's really frustrating. There's barely any documentation of what strings you're allowed to specify, and the documentation that does exist is extremely difficult to understand. Normally, you could work around that problem to some degree by making a guess at what you're supposed to be doing and then seeing whether the program accepts it, but here that doesn't work either. It just accepts anything you give it and then you have to try to figure out whether the behavior is what you wanted.

But there's also no real documentation of what the behavior of any collation is, so you're apparently just supposed to magically know what collations exist and how they behave and then you can test whether the string you put in gave you the behavior you wanted.

Adding validation and canonicalization wouldn't cure the documentation problems, but it would be a big help. You still wouldn't know what string you were supposed to be passing to ICU, but if you did pass it a string, you'd find out what it thought that string meant. I think that would be a huge step forward.

Unfortunately, I have no idea whether your specific ideas about how to make that happen are any good or not. But I hope they are, because the current situation is pessimal.

-- 
Robert Haas
EDB: http://www.enterprisedb.com