It is possible to adjust the encoding, collate, and ctypes in postgres, but I wouldn’t recommend it because it can only be done as part of `CREATE DATABASE` I believe. I’d lean towards either option 2 or 1 myself.
Jonathan G From: ocket 8888 <[email protected]> Date: Monday, June 21, 2021 at 1:35 PM To: [email protected] <[email protected]> Subject: [EXTERNAL] Locale-dependent sorting in Traffic Ops API responses Currently, as some of you may have noticed, the integration tests for Traffic Ops and its Go API client are failing on master. This is (partially) because of a test that checks that the response to a GET request to /deliveryservices at version 4.0 is sorted by XMLID if no orderby query string parameter is provided. The way it does this is by making the request, then sorting the returned XMLIDs, then comparing the actual response to the sorted list. Here's the problem: the ordering is done in an SQL "ORDER BY" clause, which - besides being a different runtime - runs on ostensibly an entirely separate system from the client/TO integration tests. If those two systems aren't using the same locale, they won't sort the same set of strings in the same way. Specifically, right now our tests use a C locale, which sorts "-" *after* "1", but the postgresql service uses en_US.utf8, which sorts "-" *before* "1". This is the tests passing (that one, specifically, some vault things are still failing) after add a `COLLATE "C"` statement to the query's "ORDER BY" clause: https://urldefense.com/v3/__https://github.com/apache/trafficcontrol/pull/5957/checks?check_run_id=2878328870__;!!CQl3mcHX2A!VJxN96QQD6DYc7XFBWGmHSYO1xccKwfhHeUcSMzRrBnWEo4c5PMalFP4-K-k5vIXkyW8$<https://urldefense.com/v3/__https:/github.com/apache/trafficcontrol/pull/5957/checks?check_run_id=2878328870__;!!CQl3mcHX2A!VJxN96QQD6DYc7XFBWGmHSYO1xccKwfhHeUcSMzRrBnWEo4c5PMalFP4-K-k5vIXkyW8$> This is the tests failing without that extra statement: https://urldefense.com/v3/__https://github.com/apache/trafficcontrol/runs/2853688690__;!!CQl3mcHX2A!VJxN96QQD6DYc7XFBWGmHSYO1xccKwfhHeUcSMzRrBnWEo4c5PMalFP4-K-k5jcFib_j$<https://urldefense.com/v3/__https:/github.com/apache/trafficcontrol/runs/2853688690__;!!CQl3mcHX2A!VJxN96QQD6DYc7XFBWGmHSYO1xccKwfhHeUcSMzRrBnWEo4c5PMalFP4-K-k5jcFib_j$> Now: why is this a mailing list matter? Because, all of our laptops are using en_US.utf8, or somebody would've said something way earlier. And every attempt I've made (in this PR: https://urldefense.com/v3/__https://github.com/apache/trafficcontrol/pull/5957__;!!CQl3mcHX2A!VJxN96QQD6DYc7XFBWGmHSYO1xccKwfhHeUcSMzRrBnWEo4c5PMalFP4-K-k5lkogHoF$<https://urldefense.com/v3/__https:/github.com/apache/trafficcontrol/pull/5957__;!!CQl3mcHX2A!VJxN96QQD6DYc7XFBWGmHSYO1xccKwfhHeUcSMzRrBnWEo4c5PMalFP4-K-k5lkogHoF$> ) to change the locale in which the tests are running has failed. So adding that locale statement will cause the tests to start failing without extra configuration on everyone's local development machines and in every existing CI environment besides GHA, and we apparently can't force the GHA tests to comply with our existing environments (I think - if I've missed something somebody please let me know). To solve this, I can think of three options: 1. Get rid of the test. Default ordering is undocumented, so nobody should actually be relying on it. 2. Change the test; the original purpose behind default ordering was to make requests deterministic so that with the same data in TO you would always get the same response. The test can't be absolutely sure that if it checks N times that if it did it N+1 times the consistency wouldn't be broken, but we can ensure that under testing conditions that making, say, 3 identical, subsequent requests with identical data in TO yields identical responses. It's a weaker test, but it could be said it's better than nothing. 3. Enforce a locale. If the actual order is important to us, then we must enforce a locale, because that defines the order. en_US.UTF-8 seems a logical choice, so that the only thing that needs to change is the test. It will mean, materially, writing our own sorting function that uses en_US.UTF-8 rules to compare bytes in a string - regardless of the execution environment's locale - and adding some documentation that notes that our supported environments are restricted to a specific locale. Personally, I like option 2. I think our tests ought to reflect the intended behavior, not implementation details. I think the likelihood of that test passing when underlying functionality can be made acceptably low, and I think that in general the locale of a system is not something a single piece of software running on it should dictate. I don't feel good about telling the community/world "this software only runs on machines configured for United States-flavor English".
