On Fri, 2021-04-23 at 14:44 +0900, Kyotaro Horiguchi wrote:
> > The two examples I know of offhand are in German (eszett "ß" downcases to
> > "ss") and Turkish (dotted "İ" downcases to "i", likewise dotless "I"
> 
> According to Wikipedia, "ss" is equivalent to "ß" and their upper case
> letters are "SS" and "ẞ" respectively. (I didn't even know of the
> existence of "ẞ". AFAIK no word begins with eszett, but it seems
> that "ẞ" can appear when a word is spelled only in capital
> letters.)

This "capital sharp s" is a recent invention that has never gained much
traction.  I notice that on my Fedora 32 system with glibc 2.31 and de_DE.utf8,

SELECT lower(E'\u1E9E') = E'\u00DF', upper(E'\u00DF') = E'\u1E9E';

 ?column? │ ?column? 
══════════╪══════════
 t        │ f
(1 row)

which to me as a German speaker makes no sense.

But Tom's example was the wrong way around: "ß" is a lower case letter,
and the traditional upper case translation is "SS".
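
For comparison, the locale-independent default Unicode case mappings can be
checked outside the database. The following is only a minimal Python sketch;
it says nothing about what glibc or PostgreSQL do with de_DE.utf8, and the
helper name case_report is made up for illustration.

```python
def case_report(ch: str) -> dict:
    """Return the default Unicode upper/lower mappings of ch as code points."""
    return {
        "upper": [f"U+{ord(c):04X}" for c in ch.upper()],
        "lower": [f"U+{ord(c):04X}" for c in ch.lower()],
    }


sharp_s = "\u00df"          # LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S

# Upper-casing the small sharp s yields two code points ("SS"),
# while lower-casing the capital sharp s yields the small sharp s.
print(case_report(sharp_s))
print(case_report(capital_sharp_s))
```

In these default mappings the traditional "SS" is the upper case of the small
sharp s, and the capital sharp s is only reachable in the other direction.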

But the Turkish example is correct:

> > downcases to "ı"; one of each of those pairs is an ASCII letter, the
> > other is not).  Depending on which encoding is in use, these
> 
> Upper dotless "I" and lower dotted "i" are in ASCII (or English
> alphabet?).  That's interesting.

Yes.  In languages other than Turkish, "i" is the lower case version of "I",
and both are ASCII.  Only Turkish has an "ı" (U+0131) and an "İ" (U+0130).
That causes annoyance for Turks who create a table named KADIN and find
that PostgreSQL folds the unquoted name to "kadin" rather than "kadın".
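
The difference between the two lower-casing rules can be illustrated outside
the database. This is only a Python sketch of the mappings themselves, not of
PostgreSQL's identifier handling; turkish_lower is a hypothetical helper
written for this example.

```python
# Turkish-specific exceptions: dotless capital I lowers to dotless ı (U+0131),
# dotted capital İ (U+0130) lowers to plain i.
TURKISH_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})


def turkish_lower(text: str) -> str:
    """Lower-case text using the Turkish rules for I and İ (hypothetical helper)."""
    return text.translate(TURKISH_LOWER).lower()


name = "KADIN"
print(name.lower())         # locale-independent mapping: I becomes i
print(turkish_lower(name))  # Turkish mapping: I becomes ı
```

The first result contains only ASCII letters, the second keeps the dotless
"ı" that a Turkish speaker would expect.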

Yours,
Laurenz Albe


