On 2021-Mar-10, Gavan Schneider wrote:

> On 10 Mar 2021, at 16:24, Alvaro Herrera wrote:
>
> > That space (0xe280af) is U+202F, which appears to be used for French and
> > Mongolian languages (exclusively?). It is quite possible that in the
> > future some other language will end up using some different whitespace
> > character, possibly breaking any code you write today -- the use of
> > U+202F appears to be quite recent.
>
> Drifting off topic a little. That's a proper code point for things that will
> benefit from the whitespace but should still stay together.
> Also it’s not that new, added in 1999: https://codepoints.net/U+202F

I probably got misled on this whole thing by these change proposals:
https://www.unicode.org/L2/L2019/19116-clarify-nnbsp.pdf
https://www.unicode.org/L2/L2020/20008-core-text.pdf

Apparently prior to this, they (?) had been using/recommending THIN
SPACE U+2009 as separator, which is not non-breaking.

Anyway, it reinforces my point that it's not impossible that some other
locale definition could use U+2009 when printing numbers, or even some
other kind of spacing entity in non-Latin languages etc. So I think
that for truly robust handling you should separate the thing you use for
display from the thing you use to talk to the database.

> And the thin space is part of the international standard for breaking up
> large numbers (from 1948), specifically no dots or commas should be used in
> this role. The dot or comma is only to be used for the decimal point!

Interesting -- I didn't know this.

--
Álvaro Herrera       Valdivia, Chile
"This is a foot just waiting to be shot" (Andrew Dunstan)
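
To illustrate the point above about keeping the display form separate from
what goes to the database, here is a minimal Python sketch. It is not taken
from this thread: the function names parse_display_number and
format_display_number are hypothetical, and it assumes that the grouping
character is always some Unicode space separator (general category Zs, which
includes U+0020, U+00A0, U+2009 and U+202F) and that the decimal separator is
known in advance.

```python
import unicodedata
from decimal import Decimal


def parse_display_number(text: str, decimal_sep: str = ",") -> Decimal:
    """Convert a display string to a Decimal (hypothetical helper).

    Assumption: digit groups are separated only by characters in Unicode
    general category Zs, e.g. U+2009 THIN SPACE or U+202F NARROW NO-BREAK SPACE.
    """
    # Drop every space separator, whichever one the locale happened to use.
    compact = "".join(ch for ch in text if unicodedata.category(ch) != "Zs")
    # Decimal expects '.' as the decimal point.
    if decimal_sep != ".":
        compact = compact.replace(decimal_sep, ".")
    return Decimal(compact)


def format_display_number(
    value: Decimal, group_sep: str = "\u202f", decimal_sep: str = ","
) -> str:
    """Render a Decimal for display only (hypothetical helper)."""
    # ',f' gives fixed-point text with ',' grouping and '.' as decimal point.
    plain = format(value, ",f")
    # translate() swaps both characters in a single pass.
    return plain.translate({ord(","): group_sep, ord("."): decimal_sep})


# Both the narrow no-break space and the thin space parse to the same value.
amount = parse_display_number("1\u202f234\u202f567,89")
same_amount = parse_display_number("1\u2009234\u2009567,89")
shown = format_display_number(amount)
```

The idea is that only the Decimal value would be handed to the database (for
example as a bound query parameter), while the formatted string exists purely
for display, so a change in the locale's separator would not affect what is
stored.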