Follow-up Comment #10, bug #15377 (project freeciv):

Actually I came to the exact same conclusion, Ulrik: we should simply rewrite the character functions to work only on ASCII characters (range 0 to 127) and leave all others alone. Using the system-provided functions (such as tolower), which work in the current locale, makes no sense and never will make sense. This means some UTF-8 characters won't get properly converted, which will cause minor bugs (for instance, it won't do proper case comparison on non-ASCII letters in player/ruler names), but it will remove a much larger set of bugs (for instance, it may currently do wrong case comparison on those same non-ASCII letters).
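To make the idea concrete, here is a rough sketch of what such ASCII-only functions could look like. The names (ascii_isalpha, ascii_isspace, ascii_tolower, ascii_strcasecmp) are made up for illustration and are not existing freeciv functions; the point is only that bytes above 127 pass through untouched, whatever the locale is.

```c
#include <stdbool.h>

/* Hypothetical helpers, not part of freeciv: they only ever look at
 * the ASCII range (0 to 127) and never consult the current locale. */

static bool ascii_isalpha(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

static bool ascii_isspace(unsigned char c)
{
  /* The six whitespace characters of the "C" locale. */
  return c == ' ' || c == '\t' || c == '\n'
         || c == '\v' || c == '\f' || c == '\r';
}

static unsigned char ascii_tolower(unsigned char c)
{
  /* Bytes >= 128 (e.g. pieces of a UTF-8 sequence) are returned as-is. */
  if (c >= 'A' && c <= 'Z') {
    return (unsigned char) (c + ('a' - 'A'));
  }
  return c;
}

/* Case-insensitive comparison that folds ASCII letters only.
 * Returns 0 when equal, negative or positive otherwise. */
static int ascii_strcasecmp(const char *a, const char *b)
{
  const unsigned char *pa = (const unsigned char *) a;
  const unsigned char *pb = (const unsigned char *) b;

  while (*pa != '\0' && ascii_tolower(*pa) == ascii_tolower(*pb)) {
    pa++;
    pb++;
  }
  return (int) ascii_tolower(*pa) - (int) ascii_tolower(*pb);
}
```

With this, "Ulrik" and "ULRIK" compare equal, while an upper-case and a lower-case accented letter encoded in UTF-8 compare as different. That is the minor bug mentioned above, but no byte of a multi-byte sequence is ever corrupted.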
As for making or using a UTF-8 variant of these, it's a bit more complicated than that. You can't have isspace(), tolower() or isalpha() functions for UTF-8 that go byte-by-byte, because a non-ASCII character spans several bytes. This means the same job you were doing via byte iteration over the string has to be done entirely differently, in every case. Additionally, in some places these functions are given UTF-8, in some ASCII, and in some they may be given Latin-1 or a different character set, so it would require careful auditing of all the callers, of which there are a lot. If we go that route I'd rather rewrite all of the freeciv core to use UCS-2 or UCS-4 (fixed-width Unicode) strings and impose that on these functions, and then we get type-checking out of it as well. This wouldn't be that hard, but then we'd have to convert all data files (in UTF-8) and all GUI strings (also in UTF-8, for gtk2) on both input and output, everywhere, which is a lot of lines of change. It's probably overkill.

Reply to this item at: <http://gna.org/bugs/?15377>