On Sunday, 9 March 2014 at 08:32:09 UTC, monarch_dodra wrote:
On Saturday, 8 March 2014 at 20:05:36 UTC, Andrei Alexandrescu wrote:
The current approach is a cut above treating strings as arrays of bytes for some languages, and still utterly broken for others. If I'm operating on a right-to-left language like Hebrew, what would I expect the result to be from something like countUntil?

The entire string processing paraphernalia is left to right. I figure RTL languages are under-supported, but s.retro.countUntil comes to mind.

Andrei

I'm pretty sure that all string operations are actually "front to back". If I recall correctly, even languages that "read" right to left are stored front to back: e.g. string[0] would be the right-most character. It is only a question of display and changes nothing in the code. As for countUntil, it would still work perfectly fine, as an RTL reader would expect the counting to start at the "beginning", i.e. the right side.
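A small sketch of the logical-order point, written in Python rather than D for illustration; the `count_until` helper is hypothetical and only mirrors countUntil's "index, or -1 if absent" result. Hebrew text is stored in reading order, so index 0 holds the first letter a reader meets, which is the right-most glyph once rendered.

```python
# Hebrew "shalom" stored in logical (reading) order:
# shin, lamed, vav, final mem.
word = "\u05e9\u05dc\u05d5\u05dd"

# Index 0 is the first letter read, i.e. the right-most glyph on screen.
first = word[0]

def count_until(s, target):
    """Hypothetical helper: number of code points before the first `target`,
    or -1 if it never occurs (mirroring D's countUntil)."""
    for i, ch in enumerate(s):
        if ch == target:
            return i
    return -1

print(count_until(word, "\u05d5"))  # 2: shin and lamed come before vav
```

No reversal is needed: counting from the front of the storage is counting from the right of the display.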

I'm pretty confident RTL is 100% supported. The only issue is the "front"/"left" ambiguity, and the only instance I know of is the oddly named "stripLeft" function, which actually does a "stripFront" anyway.

So I wouldn't worry about RTL.

Yeah, I think RTL strings are preceded by a code point that indicates RTL display. It was just something I mentioned because some operations might be confusing to the programmer.


But as mentioned, it is Indian languages, which have complex graphemes, or languages with accented characters, e.g. most European ones, that can have problems, such as canFind("cassé", 'e') when the é is stored as an 'e' followed by a combining accent.
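A Python sketch of the canFind("cassé", 'e') problem, assuming the string may arrive in either Unicode normal form. A code-point search only goes wrong on the decomposed (NFD) form, where é is 'e' plus U+0301. The `graphemes` and `can_find_grapheme` helpers are hypothetical and only approximate grapheme clusters (base character plus trailing combining marks); in D, std.uni's byGrapheme is the facility meant for proper segmentation.

```python
import unicodedata

# "cassé" in two normal forms: NFC stores é as U+00E9,
# NFD stores it as "e" followed by U+0301 COMBINING ACUTE ACCENT.
composed = unicodedata.normalize("NFC", "cass\u00e9")
decomposed = unicodedata.normalize("NFD", "cass\u00e9")

def graphemes(s):
    """Rough grapheme split: each base code point plus any combining marks after it.
    Approximation only: full UAX #29 segmentation covers many more cases
    (Hangul jamo, emoji sequences, Indic vowel signs with combining class 0, ...)."""
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the mark to the preceding base character
        else:
            clusters.append(ch)
    return clusters

def can_find_grapheme(s, g):
    """Hypothetical helper: search by (approximate) grapheme instead of code point."""
    return g in graphemes(s)

# Code-point search, i.e. what a naive canFind does:
print("e" in composed)    # False
print("e" in decomposed)  # True: matches the base letter of the decomposed é

# Grapheme search rejects the partial match:
print(can_find_grapheme(decomposed, "e"))        # False
print(can_find_grapheme(decomposed, "e\u0301"))  # True
```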

True. I still question why anyone would want to do character-based operations on Unicode strings. I guess substring searches could hit the same problem in some cases, for the same reason, if they are not implemented specifically for Unicode, but those cases should be far less common.
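The substring case can be sketched the same way in Python, assuming decomposed (NFD) input: "cafe" is found inside "café" because the match stops just before the combining accent. The `find_on_boundary` helper is hypothetical and uses a deliberately crude boundary test (the code points at the match edges must not be combining marks), not full grapheme segmentation.

```python
import unicodedata

def find_on_boundary(haystack, needle):
    """Return the index of the first match of needle that neither starts nor ends
    inside a cluster, or -1. Approximation: a "cluster boundary" here only means
    the code point at the edge is not a combining mark."""
    if not needle:
        return 0
    start = haystack.find(needle)
    while start != -1:
        end = start + len(needle)
        starts_clean = not unicodedata.combining(haystack[start])
        ends_clean = end == len(haystack) or not unicodedata.combining(haystack[end])
        if starts_clean and ends_clean:
            return start
        start = haystack.find(needle, start + 1)  # try the next raw match
    return -1

text = unicodedata.normalize("NFD", "caf\u00e9 au lait")
print("cafe" in text)                  # True: raw code-point match
print(find_on_boundary(text, "cafe"))  # -1: the match ends right before the accent
print(find_on_boundary(text, "au"))    # 6
```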
