Hi,

On 2024-04-15 18:23:21 -0700, Jeff Davis wrote:
> On Mon, 2024-04-15 at 17:05 -0700, Andres Freund wrote:
> > Can't we test this as part of the normal testsuite?
>
> One thing that complicates things a bit is that the test compares the
> results against ICU, so a mismatch in Unicode version between ICU and
> Postgres can cause test failures. The test ignores unassigned code
> points, so normally it just results in less-exhaustive test coverage.
> But sometimes things really do change, and that would cause a failure.

Hm, that seems annoying, even for update-unicode :/. But I guess it won't
be very common to have such failures?

> Stepping back a moment, my top worry is really not to test those C
> functions, but to test the perl code that parses the text files and
> generates those arrays. Imagine a future Unicode version does something
> that the perl scripts didn't anticipate, and they fail to add array
> entries for half the code points, or something like that. By testing
> the arrays generated from freshly-parsed files exhaustively against
> ICU, we have a good defense against that. That situation really only
> comes up when updating Unicode.

That's a good point.

> That's not to say that the C code shouldn't be tested, of course. Maybe
> we can just do some spot checks for the functions that are reachable
> via SQL and get rid of the functions that aren't yet reachable (and
> re-add them when they are)?

Yes, I think that'd be a good start. I don't think we necessarily need
exhaustive coverage, just a bit more coverage than we have.
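For instance, a few spot checks along these lines might already be enough
(just a sketch - this assumes the builtin pg_c_utf8 collation and
unicode_assigned() end up being the SQL-reachable entry points):

    -- exercise the generated case-mapping arrays
    SELECT lower('ΑΓΩ' COLLATE pg_c_utf8);      -- expect 'αγω'
    SELECT upper('déjà vu' COLLATE pg_c_utf8);  -- expect 'DÉJÀ VU'
    -- exercise the category tables; U+0378 is currently unassigned
    SELECT unicode_assigned(E'abc\u0378');      -- expect false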
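And the exhaustive update-unicode comparison you describe could stay
roughly as simple as the sketch below. pg_unicode_category() is a stand-in
for whatever function the generated arrays actually expose, and for
brevity I'm pretending it returns ICU-compatible category values; a real
test would have to map between the Postgres and ICU category enums:

    #include <stdint.h>
    #include <stdio.h>
    #include <unicode/uchar.h>

    /* stand-in for the function built from the generated arrays */
    extern int pg_unicode_category(uint32_t cp);

    int
    main(void)
    {
        for (UChar32 cp = 0; cp <= 0x10FFFF; cp++)
        {
            /* skip code points unassigned in ICU's Unicode version */
            if (u_charType(cp) == U_UNASSIGNED)
                continue;

            if (pg_unicode_category(cp) != u_charType(cp))
            {
                fprintf(stderr, "category mismatch at U+%04X\n",
                        (unsigned) cp);
                return 1;
            }
        }
        return 0;
    }

(Compiles against ICU with, e.g., "pkg-config --cflags --libs icu-uc".)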
> > I don't at all like that the tests depend on downloading new unicode
> > data. What if there was an update but I just want to test the current
> > state?
>
> I was mostly following the precedent for normalization. Should we
> change that, also?

Yea, I think we should. But I think it's less urgent if we end up testing
more of the code without those test binaries.

I don't immediately know what dependencies would be best, tbh.

Greetings,

Andres Freund