On 03-Oct-12 21:10, Ali Çehreli wrote:
On 10/03/2012 03:56 AM, Minas wrote:
Currently, toUpper() (and probably toLower()) does not handle Greek
characters correctly. I fixed toUpper() by adding a separate function for
Greek characters:
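// called if (c >= 0x387 && c <= 0x3CE)
dchar toUpperGreek(dchar c)
{
    if (c >= 'α' && c <= 'ω')
    {
        // α..ω sit exactly 32 code points above Α..Ω;
        // the final sigma 'ς' has no capital of its own and maps to 'Σ'
        if (c == 'ς')
            c = 'Σ';
        else
            c -= 32;
    }
    else
    {
        // accented letters do not follow the fixed offset
        // (the table is rebuilt on every call, which is fine for a sketch)
        dchar[dchar] map;
        map['ά'] = 'Ά';
        map['έ'] = 'Έ';
        map['ή'] = 'Ή';
        map['ί'] = 'Ί';
        map['ϊ'] = 'Ϊ';
        map['ΐ'] = 'Ϊ';
        map['ό'] = 'Ό';
        map['ύ'] = 'Ύ';
        map['ϋ'] = 'Ϋ';
        map['ΰ'] = 'Ϋ';
        map['ώ'] = 'Ώ';
        // leave anything that is not in the table unchanged;
        // a plain map[c] throws RangeError on a missing key,
        // e.g. for the capitals that also fall in 0x387..0x3CE
        if (auto p = c in map)
            c = *p;
    }
    return c;
}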
Then, in toUpper()
{
....
if (c >= 0x387 && c <= 0x3CE)
c = toUpperGreek(c);
///
}
Do you think it should stay like that, or should I copy-paste it into the
body of toUpper()?
I'm going to fix toLower() as well and make a pull request.
I don't want to detract from the usefulness of these functions but
toupper and tolower have been two of the strangest functions in
computing history. It is amazing that they are still accepted, because
they are useful in very limited situations and those situations are
becoming rarer as more and more systems support Unicode.
Glad you showed up!
The one, and by far the most useful, case is case-insensitive matching.
That being said, this doesn't and shouldn't involve toLower/toUpper (and
on the whole string) anywhere. Not only is it multi-pass instead of
single-pass, it is also wrong, like a lot of other ASCII-minded carry-overs.
Other than this, and being used as some intermediate sanitized form, I
don't think it has much use.
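A minimal sketch of such a single-pass comparison in D, assuming a Phobos release whose std.uni provides icmp (full case folding) and sicmp (simple case folding). The helper name sameIgnoringCase is made up for illustration:

```d
import std.uni : icmp, sicmp;

/// Hypothetical helper, not part of Phobos: true if a and b are equal
/// under full case folding. icmp compares the two strings directly and
/// never builds an upper- or lower-cased copy of either one.
bool sameIgnoringCase(string a, string b)
{
    return icmp(a, b) == 0;
}

unittest
{
    assert(sameIgnoringCase("ΑΛΦΑ", "αλφα"));
    assert(!sameIgnoringCase("alpha", "beta"));

    // full folding treats ß and "ss" as equal, simple folding does not
    assert(icmp("Straße", "STRASSE") == 0);
    assert(sicmp("Straße", "STRASSE") != 0);
}
```

The ß case is the kind of thing a per-character toLower/toUpper round trip cannot get right, since one code point has to match two.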
Two quick examples:
1) How should this string be capitalized in a scientific article?
"Anti-obesity effects of α-lipoic acid"
There are a lot of lousy conversions. The basic toLower is defined in the
standard, try it here:
http://unicode.org/cldr/utility/transform.jsp?a=Upper&b=Anti-obesity+effects+of+%CE%B1-lipoic+acid
I don't think the α in there should be upper-cased.
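For illustration, a small sketch of what a context-free upper-casing does to that title. It assumes a Phobos version whose std.uni.toUpper already covers Greek, which is exactly what the 2012 version discussed here did not do:

```d
import std.uni : toUpper;

unittest
{
    string title = "Anti-obesity effects of α-lipoic acid";

    // Every cased letter is mapped, so the symbol α turns into the
    // Greek capital Alpha (U+0391), which looks like a Latin 'A'.
    assert(toUpper(title) == "ANTI-OBESITY EFFECTS OF \u0391-LIPOIC ACID");
}
```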
Depends on why you are doing it in the first place :) Capitalizing a
scientific article strikes me as kind of strange as well.
2) How should this name be capitalized in a list of names?
"Ali"
Again what's the goal of capitalization here?
Simplifying matching afterwards? - Then it doesn't matter as long as
its lousiness is acceptable (rarely so) and it stays within the system,
i.e. doesn't leak away.
It completely depends on the writing system of that string itself, not
even the current locale. (There are two uppercases that I know of, which
can be considered as correct: "ALI" and "ALİ".)
One word: tailoring. Basically any software made in Turkey has to do ALİ :)
Only half-joking.
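A rough sketch of what such tailoring could look like in D. Both toUpperTurkish and upperTurkish are hypothetical names, not Phobos functions, and the sketch only covers the dotted/dotless i pair before falling back to the untailored std.uni.toUpper:

```d
import std.uni : toUpper;

/// Hypothetical helper: upper-cases one code point with the Turkish
/// rule that 'i' pairs with 'İ' (U+0130) and 'ı' (U+0131) pairs with 'I'.
dchar toUpperTurkish(dchar c)
{
    switch (c)
    {
        case 'i':      return '\u0130'; // i -> İ
        case '\u0131': return 'I';      // ı -> I
        default:       return toUpper(c);
    }
}

/// Hypothetical helper: applies toUpperTurkish to a whole string.
string upperTurkish(string s)
{
    string result;
    foreach (dchar c; s)   // decode the UTF-8 input code point by code point
        result ~= toUpperTurkish(c);
    return result;
}

unittest
{
    assert(toUpper("Ali") == "ALI");            // untailored mapping
    assert(upperTurkish("Ali") == "AL\u0130");  // "ALİ" with the tailoring
}
```

Which of the two functions to call is a decision about the text's language, so it has to come from the caller; nothing in the string "Ali" itself says which one is right.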
I agree that your toUpper() and toLower() will be useful in many
contexts but will necessarily do the wrong thing in others.
Ali
--
Dmitry Olshansky