On Tue, 2024-04-16 at 11:58 -0700, Andres Freund wrote:
> Hm, that seems annoying, even for update-unicode :/. But I guess it
> won't be very common to have such failures?

Things don't change a lot between Unicode versions (and are subject to
the stability policy), but the tests are exhaustive, so even a single
character's property being changed will cause a failure when compared
against an older version of ICU. The case mapping test succeeds back to
ICU 64 (based on Unicode 12.1), but the category/properties test
succeeds only back to ICU 72 (based on Unicode 15.0).

I agree this is annoying, and I briefly documented it in
src/common/unicode/README. It means whoever updates Unicode for a
Postgres version should probably know how to build ICU from source and
point the Postgres build process at it. Maybe I should add more details
in the README to make that easier for others.

But it's also a really good test. The ICU parsing, interpretation of
data files, and lookup code are entirely independent of ours.
Therefore, if the results agree for all code points, we have a high
degree of confidence that the results are correct. That level of
confidence seems worth a bit of annoyance.

This kind of test is possible because the category/property and case
mapping functions accept a single code point, and there are only
0x110000 code points (U+0000 through U+10FFFF).

> > That's not to say that the C code shouldn't be tested, of course.
> > Maybe we can just do some spot checks for the functions that are
> > reachable via SQL and get rid of the functions that aren't yet
> > reachable (and re-add them when they are)?
>
> Yes, I think that'd be a good start. I don't think we necessarily
> need exhaustive coverage, just a bit more coverage than we have.

OK, I'll submit a test module or something.

Regards,
	Jeff Davis
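
A minimal sketch of the exhaustive-comparison idea described above: walk
every code point from U+0000 through U+10FFFF and check that two
independently written implementations of a single-code-point function
agree. This is not the Postgres test, which is C code compared against
ICU. Python's unicodedata module stands in for the independent reference
here, and the function names (is_surrogate_by_range,
is_surrogate_by_category, exhaustive_mismatches) are hypothetical, chosen
only for illustration.

```python
import unicodedata

MAX_CODE_POINT = 0x10FFFF  # highest valid Unicode code point


def is_surrogate_by_range(cp: int) -> bool:
    """'Our' implementation (hypothetical): a hand-written range check."""
    return 0xD800 <= cp <= 0xDFFF


def is_surrogate_by_category(cp: int) -> bool:
    """Independent reference: ask the Unicode database for the category."""
    return unicodedata.category(chr(cp)) == "Cs"


def exhaustive_mismatches(ours, reference):
    """Return every code point on which the two implementations disagree."""
    return [cp for cp in range(MAX_CODE_POINT + 1)
            if ours(cp) != reference(cp)]


if __name__ == "__main__":
    bad = exhaustive_mismatches(is_surrogate_by_range,
                                is_surrogate_by_category)
    print(f"{len(bad)} mismatching code points")
```

Because the input space is small, a single wrong entry cannot hide: an
off-by-one in the range check, for example, would be reported as exactly
one mismatching code point. The same property is what makes the test
sensitive to the Unicode version of the reference library.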