Thanks for bringing this up! I just looked at the uses of isspace() in that file. It looks like it is the usual thing: it is allowing leading or trailing whitespace when parsing values, or for this "needs quoting" logic on output. The fix would be the same: this *should* be using scanner_isspace(). This has the same disadvantage: it would change Postgres's results for some inputs that contain these non-ASCII "space" characters.
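To make the suggested fix a bit more concrete, here is a minimal sketch of what a locale-independent whitespace test might look like. The names ascii_only_isspace() and contains_ascii_space() are hypothetical and only for illustration; this is not the actual scanner_isspace() source, and the real "needs quoting" logic in rowtypes.c looks at more than just whitespace. The point is only that a check like this never consults the locale, so bytes at or above 0x80 are never reported as spaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for an ASCII-only whitespace test, in the spirit of
 * scanner_isspace().  It never consults the locale, so a byte >= 0x80 (for
 * example 0x85 or 0xa0 from the middle of a UTF-8 sequence) is never
 * reported as whitespace.
 */
static bool
ascii_only_isspace(char ch)
{
    return ch == ' ' || ch == '\t' || ch == '\n' ||
           ch == '\r' || ch == '\v' || ch == '\f';
}

/*
 * Hypothetical simplification of a "needs quoting" check: returns true if
 * any byte of the NUL-terminated string is ASCII whitespace.
 */
static bool
contains_ascii_space(const char *s)
{
    assert(s != NULL);

    for (const char *p = s; *p != '\0'; p++)
    {
        if (ascii_only_isspace(*p))
            return true;
    }
    return false;
}
```

With a check like this, neither 'keyą' (which ends in the bytes C4 85) nor '점' (EC A0 90) is treated as containing whitespace, regardless of platform or locale, while a value such as 'a b' still is.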
Here is a quick demonstration of this issue, showing that the quoting behavior is different between these two. Mac OS X with the "default" locale includes quotes because ą includes 0x85 in its UTF-8 encoding:

postgres=# SELECT ROW('keyą');
   row
----------
 ("keyą")
(1 row)

On Mac OS X with the LANG=C environment variable set, it does not include quotes:

postgres=# SELECT ROW('keyą');
  row
--------
 (keyą)
(1 row)

On Mon, Oct 9, 2023 at 11:18 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> FTR I ran into a benign case of the phenomenon in this thread when
> dealing with row types. In rowtypes.c, we double-quote stuff
> containing spaces, but we detect them by passing individual bytes of
> UTF-8 sequences to isspace(). Like macOS, Windows thinks that 0xa0 is
> a space when you do that, so for example the Korean character '점'
> (code point C810, UTF-8 sequence EC A0 90) gets quotes on Windows but
> not on Linux. That confused a migration/diff tool while comparing
> Windows and Linux database servers using that representation. Not a
> big deal, I guess no one ever promised that the format was stable
> across platforms, and I don't immediately see a way for anything more
> serious to go wrong (though I may lack imagination). It does seem a
> bit weird to be using locale-aware tokenising for a machine-readable
> format, and then making sure its behaviour is undefined by feeding it
> chopped up bytes.
>