On 2009-08-25 00:23:25 -0400, Ali Cehreli <acehr...@yahoo.com> said:
You may be aware of the case-mapping problems caused by the two separate
letter 'I's in the Turkish alphabet (and in the alphabets that are based
on it).
The lowercase and uppercase forms of each letter agree on whether they
have a dot (dotless ı pairs with I, dotted i pairs with İ):
http://en.wikipedia.org/wiki/Turkish_I
The Turkish alphabet is so close to the western alphabets, yet not close
enough, that it ends up in a strange position. (Strangely, the same
applies geographically, politically, socially, etc. as well... ;))
Computer systems *almost* work for Turkish, but not for those two letters.
I love the fact that D allows Unicode letters in the source code and
that it natively supports Unicode. I cannot stress enough how important
this is. That is the single biggest reason why I decided to finally
write a programming tutorial. Thank you to all who proposed and
implemented those features!
Back to the Turquois 'I's... What is a programmer to do when writing
programs that deal with Turkish letters?
a) Accept that Phobos, too, has this age-old behavior that is the result
of premature optimization (i.e. this code in tolower:
c + (cast(char)'a' - 'A'))
b) Accept that the problem is unsolvable anyway, because the letter I has
two minuscules and the letter i has two majuscules, so the intent is not
always clear
c) Accept the Turkish alphabet as pathological (merely for being in the
minority!) and use a Turkish version of Phobos or some other library
d) Solve the problem with locale support
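To make option a) concrete, here is a minimal D sketch of the ASCII-offset shortcut (asciiToLower is a hypothetical name, not the actual Phobos function). It maps 'I' to 'i' unconditionally and passes 'İ' (U+0130) through unchanged, so it cannot be right for Turkish whatever the locale says.

```d
import std.stdio : writeln;

// Hypothetical helper illustrating the ASCII-offset shortcut: shift 'A'..'Z'
// by the distance between 'a' and 'A'. It knows nothing about locales.
dchar asciiToLower(dchar c)
{
    if (c >= 'A' && c <= 'Z')
        return cast(dchar)(c + ('a' - 'A'));
    return c; // everything else, including 'İ' (U+0130), passes through
}

void main()
{
    writeln(asciiToLower('I'));      // i  (Turkish expects ı)
    writeln(asciiToLower('\u0130')); // İ  (unchanged; Turkish expects i)
}
```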
Is option d possible with today's systems? Whose responsibility is this
anyway? OS? Language? Program? Something else?
Since alphanumerical ordering is also of interest, I think this has
something to do with locales.
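For ordering, a locale mostly amounts to a collation table. As a rough sketch, assuming plain letter-by-letter comparison of lowercase text (real collation, such as the Unicode Collation Algorithm, also deals with case, accents and multiple strength levels), Turkish dictionary order can be driven by the alphabet itself. The names turkishOrder, rank and turkishCompare are hypothetical.

```d
import std.algorithm : sort;
import std.stdio : writeln;
import std.string : indexOf;

// Hypothetical table: the 29 lowercase Turkish letters in dictionary order.
// Note ı before i, and ç, ğ, ö, ş, ü right after their base letters.
immutable dstring turkishOrder = "abcçdefgğhıijklmnoöprsştuüvyz"d;

// Position in the Turkish alphabet; any other character ranks past the
// alphabet, ordered by code point.
size_t rank(dchar c)
{
    immutable i = turkishOrder.indexOf(c);
    return i >= 0 ? cast(size_t) i : turkishOrder.length + c;
}

// Letter-by-letter comparison; a proper prefix sorts first.
int turkishCompare(dstring a, dstring b)
{
    immutable n = a.length < b.length ? a.length : b.length;
    foreach (i; 0 .. n)
    {
        immutable ra = rank(a[i]), rb = rank(b[i]);
        if (ra != rb)
            return ra < rb ? -1 : 1;
    }
    if (a.length == b.length)
        return 0;
    return a.length < b.length ? -1 : 1;
}

void main()
{
    dstring[] words = ["iğne"d, "ılık"d, "çay"d, "cam"d, "dal"d];
    sort!((x, y) => turkishCompare(x, y) < 0)(words);
    // Order: cam, çay, dal, ılık, iğne
    // (plain code point order would give cam, dal, iğne, çay, ılık)
    writeln(words);
}
```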
Is there a way for a program to work with Turkish letters and ensure
that the following program produces the expected output of 'dotless i',
'I with dot', and 0?
import std.stdio;
import std.string;
import std.c.locale;
import std.uni;

void main()
{
    const char * result = setlocale(LC_ALL, "tr_TR.UTF-8");
    assert(result);

    writeln(toUniLower('I'));
    writeln(toUniUpper('i'));
    writeln(indexOf("I",
                    '\u0131', // dotless i
                    CaseSensitive.no));
}
This is a practical question. I really want to be able to work with
Turkish... :)
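Without locale support, one workaround in the spirit of option c) is to special-case the four letters and defer everything else to std.uni. This sketch assumes the program already knows it is handling Turkish text; trToLower, trToUpper and trIndexOfIgnoreCase are hypothetical helpers, and it uses the current std.uni.toLower/toUpper in place of toUniLower/toUniUpper.

```d
import std.stdio : writeln;
import std.uni : toLower, toUpper;

// Hypothetical Turkish-aware lowercase: the two I letters are handled
// explicitly; everything else uses Phobos' locale-independent mapping.
dchar trToLower(dchar c)
{
    switch (c)
    {
        case 'I':      return '\u0131'; // I -> ı (dotless)
        case '\u0130': return 'i';      // İ -> i
        default:       return toLower(c);
    }
}

// Hypothetical Turkish-aware uppercase, the mirror image of trToLower.
dchar trToUpper(dchar c)
{
    switch (c)
    {
        case 'i':      return '\u0130'; // i -> İ (dotted)
        case '\u0131': return 'I';      // ı -> I
        default:       return toUpper(c);
    }
}

// Index (in code points) of the first match of `needle` in `haystack`,
// ignoring case by Turkish rules; -1 if there is none.
ptrdiff_t trIndexOfIgnoreCase(dstring haystack, dchar needle)
{
    immutable target = trToLower(needle);
    foreach (i, c; haystack)
        if (trToLower(c) == target)
            return cast(ptrdiff_t) i;
    return -1;
}

void main()
{
    writeln(trToLower('I'));                      // ı
    writeln(trToUpper('i'));                      // İ
    writeln(trIndexOfIgnoreCase("I"d, '\u0131')); // 0
}
```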
Perhaps this could be of some inspiration. In Cocoa you can pass a
locale argument to many string methods (unfortunately, not
lowercaseString or uppercaseString) to get the desired result. For
instance, the "rangeOfString:options:range:locale:" method can search
for substrings case-insensitively, and its documentation specifically
discusses the Turkish “ı” character under the locale parameter.
http://developer.apple.com/documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html#//apple_ref/occ/instm/NSString/rangeOfString:options:range:locale:
It's also interesting to see that when you search for ß in a web page
using Safari, it also matches every instance of SS (whatever your
locale). ß is a German character that becomes SS in uppercase.
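Phobos can make the ß comparison without any locale, through full case folding. std.uni.icmp is documented as following full case folding, while sicmp uses simple one-to-one folding, so only the former equates "ß" with "ss". This sketch shows comparison rather than Safari-style search; that Safari relies on a similar folding step is an assumption.

```d
import std.stdio : writeln;
import std.uni : icmp, sicmp;

void main()
{
    // icmp: full case folding, ß folds to "ss", so these compare equal.
    writeln(icmp("Rußland", "Russland") == 0);  // true
    // sicmp: simple (one-to-one) folding, which cannot expand ß.
    writeln(sicmp("Rußland", "Russland") == 0); // false
}
```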
- - -
What I'd like to see is a base class representing a locale. Then you
can instantiate the locale you want (from a config file, by coding it
directly, through bindings to system APIs, or a mix of all these) and
use it. Something like:
import std.stdio;

class Locale
{
immutable:
    string lowercase(string s);
    string uppercase(string s);
    int compare(string a, string b);
    // number & date formatting, etc.
}

immutable(Locale) systemLocale();            // get default system locale
immutable(Locale) locale(string localeName); // get best matching locale

void main()
{
    auto turkish = locale("tr-TR");
    writeln(turkish.lowercase("I")); // writes "ı"
    writeln(turkish.uppercase("i")); // writes "İ"

    auto english = locale("en-US");
    writeln(english.lowercase("I")); // writes "i"
    writeln(english.uppercase("i")); // writes "I"

    writeln(systemLocale.lowercase("I")); // depends on user settings
    writeln(systemLocale.uppercase("i")); // depends on user settings
}
This way you can work with many locales at once, and there's no
reliance on global state (unlike setlocale).
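A minimal runnable sketch of this design, under a few assumptions: only lowercase and uppercase are implemented, the methods are const rather than immutable to keep construction simple, and the hypothetical locale factory recognizes just "tr"/"tr-TR" and otherwise falls back to Phobos' default mapping. TurkishLocale, DefaultLocale, lower and upper are hypothetical names.

```d
import std.stdio : writeln;
import std.uni : toLower, toUpper;

// Hypothetical base class: per-character hooks are virtual; the string
// methods decode UTF-8, map each code point and re-encode the result.
abstract class Locale
{
    dchar lower(dchar c) const { return toLower(c); } // default Unicode mapping
    dchar upper(dchar c) const { return toUpper(c); }

    final string lowercase(string s) const
    {
        string result;
        foreach (dchar c; s) // iterating with dchar decodes UTF-8
            result ~= lower(c); // appending a dchar re-encodes it as UTF-8
        return result;
    }

    final string uppercase(string s) const
    {
        string result;
        foreach (dchar c; s)
            result ~= upper(c);
        return result;
    }
}

// Hypothetical Turkish locale: ı/I and i/İ are separate letters.
final class TurkishLocale : Locale
{
    override dchar lower(dchar c) const
    {
        if (c == 'I') return '\u0131'; // I -> ı
        if (c == '\u0130') return 'i'; // İ -> i
        return super.lower(c);
    }

    override dchar upper(dchar c) const
    {
        if (c == 'i') return '\u0130'; // i -> İ
        if (c == '\u0131') return 'I'; // ı -> I
        return super.upper(c);
    }
}

// Hypothetical default locale: Phobos' locale-independent mapping as is.
final class DefaultLocale : Locale {}

// Hypothetical lookup by name; anything other than Turkish gets the default.
const(Locale) locale(string name)
{
    if (name == "tr" || name == "tr-TR")
        return new TurkishLocale;
    return new DefaultLocale;
}

void main()
{
    auto turkish = locale("tr-TR");
    writeln(turkish.lowercase("I")); // ı
    writeln(turkish.uppercase("i")); // İ

    auto english = locale("en-US");
    writeln(english.lowercase("I")); // i
    writeln(english.uppercase("i")); // I
}
```

Putting the per-character hooks in the base class keeps each locale subclass down to its exceptions; a real implementation would also need full (one-to-many) mappings such as ß to SS.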
--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/