Turkish 'I's can't D either

Ali Cehreli Mon, 24 Aug 2009 21:25:23 -0700

You may be aware of the problems related to the consistency of the two separate 
letter 'I's in the Turkish alphabet (and the alphabets that are based on the 
Turkish alphabet).


Lowercase and uppercase versions of the two are consistent in whether they have 
a dot or not:

  http://en.wikipedia.org/wiki/Turkish_I

The Turkish alphabet is close to the western alphabets, but not close enough, 
which puts it in a strange position. (Strangely, the same applies 
geographically, politically, socially, etc. as well... ;))

Computer systems *almost* work for Turkish, but not for those two letters.

I love the fact that D allows Unicode letters in the source code and that it 
natively supports Unicode. I cannot stress enough how important this is. That 
is the single biggest reason why I decided to finally write a programming 
tutorial. Thank you to all who proposed and implemented those features!

Back to the Turquois 'I's... What is a programmer to do who is writing programs 
that deal with Turkish letters?

a) Accept that Phobos too has this age-old behavior that is a result of 
premature optimization (i.e. this code in tolower: c + (cast(char)'a' - 'A'))

b) Accept that the problem is unsolvable because the letter I has two 
minuscules, and the letter i has two majuscules anyway, and that the intent is 
not always clear

c) Accept the Turkish alphabet as being pathological (merely for being in the 
minority!), and use a Turkish version of Phobos or some other library

d) Solve the problem with locale support
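
To make options (a) and (c) concrete, here is a minimal sketch. It assumes 
that only ASCII plus the two Turkish 'I' pairs need handling; the names 
asciiToLower, turkishToLower and turkishToUpper are hypothetical and are not 
part of Phobos. The first function mirrors the fixed-offset arithmetic quoted 
in (a); the other two special-case the 'I's before falling back to it.

```d
import std.stdio;

// Hypothetical helper: the fixed-offset trick from option (a), ASCII only.
dchar asciiToLower(dchar c)
{
    return (c >= 'A' && c <= 'Z') ? cast(dchar)(c + ('a' - 'A')) : c;
}

// Hypothetical Turkish-aware lowercase: handle the two 'I's first.
dchar turkishToLower(dchar c)
{
    switch (c)
    {
        case 'I':      return '\u0131'; // I -> dotless i
        case '\u0130': return 'i';      // I with dot -> i
        default:       return asciiToLower(c);
    }
}

// Hypothetical Turkish-aware uppercase: the reverse mapping.
dchar turkishToUpper(dchar c)
{
    switch (c)
    {
        case 'i':      return '\u0130'; // i -> I with dot
        case '\u0131': return 'I';      // dotless i -> I
        default:
            return (c >= 'a' && c <= 'z') ? cast(dchar)(c - ('a' - 'A')) : c;
    }
}

void main()
{
    writeln(asciiToLower('I'));   // the ASCII rule yields 'i'
    writeln(turkishToLower('I')); // the Turkish rule yields dotless i
    writeln(turkishToUpper('i')); // the Turkish rule yields I with dot
}
```

This leaves the other non-ASCII Turkish letters untouched, and it hard-codes 
the language instead of consulting a locale, so it illustrates the problem 
rather than answering option (d).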

Is option d possible with today's systems? Whose responsibility is this anyway? 
OS? Language? Program? Something else?

Since alphanumerical ordering is also of interest, I think this has something 
to do with locales.
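
As a rough illustration of the ordering issue, the sketch below ranks 
lowercase letters by their position in the Turkish alphabet instead of by code 
point. The names turkishLower, rank and turkishLess are hypothetical; a real 
solution would presumably come from locale-aware collation rather than a 
hand-written table.

```d
import std.algorithm : sort;
import std.stdio : writeln;

// Lowercase Turkish alphabet in dictionary order (dotless i comes before i).
immutable dstring turkishLower = "abcçdefgğhıijklmnoöprsştuüvyz"d;

// Hypothetical helper: position of c in the alphabet above.
// Characters not in the table (including uppercase) sort after all letters.
size_t rank(dchar c)
{
    foreach (i, letter; turkishLower)
        if (letter == c) return i;
    return turkishLower.length;
}

// Hypothetical comparison: lexicographic order by alphabet rank.
bool turkishLess(dstring x, dstring y)
{
    immutable n = x.length < y.length ? x.length : y.length;
    foreach (i; 0 .. n)
    {
        immutable rx = rank(x[i]);
        immutable ry = rank(y[i]);
        if (rx != ry) return rx < ry;
    }
    return x.length < y.length; // a prefix sorts before the longer word
}

void main()
{
    dstring[] words = ["iz"d, "ılık"d, "hız"d, "jet"d];
    sort!turkishLess(words);
    writeln(words); // "ılık" now sorts before "iz", between "hız" and "jet"
}
```

A plain code-point sort would place the word starting with dotless i after 
"jet", because U+0131 is greater than every ASCII letter.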

Is there a way for a program to work with Turkish letters and ensure that the 
following program produces the expected output of 'dotless i', 'I with dot', 
and 0?

import std.stdio;
import std.string;
import std.c.locale;
import std.uni;

void main()
{
    const char * result = setlocale(LC_ALL, "tr_TR.UTF-8");
    assert(result);

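    // Expected under a Turkish locale: dotless i
    writeln(toUniLower('I'));
    // Expected under a Turkish locale: I with dot
    writeln(toUniUpper('i'));
    // Expected: 0, i.e. a case-insensitive search matches 'I' with dotless i
    writeln(indexOf("I",
                    '\u0131',               // dotless i
                    (CaseSensitive).no));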
}

This is a practical question. I really want to be able to work with Turkish... 
:)

Thank you,
Ali
