On Sunday, 9 March 2014 at 08:32:09 UTC, monarch_dodra wrote:
On Saturday, 8 March 2014 at 20:05:36 UTC, Andrei Alexandrescu wrote:
The current approach is a cut above treating strings as arrays of
bytes for some languages, and still utterly broken for others. If I'm
operating on a right to left language like Hebrew, what would I expect
the result to be from something like countUntil?
The entire string processing paraphernalia is left to right. I
figure RTL languages are under-supported, but
s.retro.countUntil comes to mind.
Andrei
I'm pretty sure that all string operations are actually "front
to back". If I recall correctly, even languages that "read"
right to left are stored in a front-to-back manner, e.g.
string[0] would be the right-most character. It is only a
question of "display", and changes nothing in the code. As for
"countUntil", it would still work perfectly fine, as an RTL
reader would expect the counting to start at the "beginning", i.e.
the "right" side.
I'm pretty confident RTL is 100% supported. The only issue is
the "front"/"left" ambiguity, and the only one I know of is the
oddly named "stripLeft" function, which actually does a
"stripFront" anyway.
So I wouldn't worry about RTL.
Yeah, I think the display direction of RTL text comes from the
characters themselves (plus optional directional marks such as
U+200F RIGHT-TO-LEFT MARK), not from the storage order. It was just
something I mentioned because some operations might be confusing to
the programmer.
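For what it's worth, here is a minimal sketch of the "front to back"
point, assuming Phobos' auto-decoding of narrow strings. The helpers
fromFront and fromBack are hypothetical names made up for this post,
and the Hebrew word is spelled with escapes so the source file's own
display order does not get in the way:

```d
import std.algorithm.searching : countUntil;
import std.range : front, retro;

/// Hypothetical helper: number of code points before `c`, counted from the front.
ptrdiff_t fromFront(string s, dchar c)
{
    return s.countUntil(c);
}

/// Hypothetical helper: number of code points before `c`, counted from the back.
ptrdiff_t fromBack(string s, dchar c)
{
    return s.retro.countUntil(c);
}

unittest
{
    // "shalom" in Hebrew, in logical order: shin, lamed, vav, final mem.
    string s = "\u05E9\u05DC\u05D5\u05DD";

    // The front of the string is the letter a reader sees at the right edge.
    assert(s.front == '\u05E9');

    // Lamed is the second letter read, so one code point precedes it.
    assert(fromFront(s, '\u05DC') == 1);

    // Walking from the back: final mem, vav, then lamed.
    assert(fromBack(s, '\u05DC') == 2);
}
```

So the counting starts where the reader starts, and s.retro only
matters if you really want to count from the end of the text.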
But as mentioned, it is languages like the Indic ones, which have
complex graphemes, or languages with accented characters,
e.g. most European ones, that can have problems, such as
canFind("cassé", 'e').
True. I still question why anyone would want to do
character-based operations on Unicode strings. I guess substring
searches could end up with the same problem in some cases if they
are not implemented specifically for Unicode, but those cases
should be far less common.
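To make the "cassé" case concrete, here is a rough sketch, again
assuming auto-decoding and std.uni. Whether a given "cassé" literal
is precomposed or decomposed depends on how it was typed, so both
spellings are written with escapes here. canFindNormalized is a
hypothetical helper of mine, not something in Phobos:

```d
import std.algorithm.searching : canFind;
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC;

/// Hypothetical helper: substring search after normalizing both sides to NFC.
bool canFindNormalized(string haystack, string needle)
{
    return normalize!NFC(haystack).canFind(normalize!NFC(needle));
}

unittest
{
    string composed = "cass\u00E9";     // precomposed e-acute, one code point
    string decomposed = "casse\u0301";  // 'e' followed by a combining acute accent

    // Code-point search gives different answers for the "same" text.
    assert(!composed.canFind('e'));
    assert(decomposed.canFind('e'));

    // A plain substring search misses the precomposed needle in the decomposed text.
    assert(!decomposed.canFind("\u00E9"));
    // Normalizing both sides first makes the two spellings agree.
    assert(canFindNormalized(decomposed, "\u00E9"));

    // Code points differ, graphemes do not.
    assert(composed.walkLength == 5);
    assert(decomposed.walkLength == 6);
    assert(decomposed.byGrapheme.walkLength == 5);
}
```

Normalization only narrows the problem: sequences with no precomposed
form stay as base plus combining marks even in NFC, so the general
case still needs grapheme-aware comparison.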