On Fri, Dec 29, 2017 at 10:35:53AM +0000, Andrei via Digitalmars-d-learn wrote:
> On Thursday, 28 December 2017 at 18:45:39 UTC, H. S. Teoh wrote:
> > On Thu, Dec 28, 2017 at 05:56:32PM +0000, Andrei via Digitalmars-d-learn
> > wrote:
> > ...
> > The string / wstring / dstring types in D are intended to be Unicode
> > strings. If you need to use other encodings, you really should be
> > using ubyte[] or const(ubyte)[] or immutable(ubyte)[], instead of
> > string.
>
> Thank you Teoh for advise and good example! I was looking towards
> writing something like that if no decision exists. Still this way of
> deliberate translations seems to be not the best. It supposes explicit
> workaround for every ahchoo in Russian and steady converting ubyte[]
> to string and back around. No formatting gems, no simple and elegant
> I/O statements or string/char comparisons. This may be endurable if
> you write an application where Russian is only one of rare options,
> and what if your whole environment is totally Russian?

You mean if your environment uses a non-UTF encoding? If your
environment uses UTF, there is no problem. I have code with strings in
Russian (and other languages) embedded, and it's no problem because
everything is in Unicode, all input and all output.

But I understand that in Windows you may not have this luxury, so you
have to deal with codepages and what-not. Converting back and forth is
not a big problem, and it actually also solves the problem of string
comparisons, because std.uni provides utilities for comparing strings,
etc. But those only work for Unicode, so you have to convert to
Unicode internally anyway.

Also, for static strings, it's not hard to make the codepage mapping
functions CTFE-able, so you can actually write string literals in a
codepage and have the compiler automatically convert them to UTF-8.

The other approach, if you don't like the idea of converting codepages
all the time, is to explicitly work in ubyte[] for all strings. Or,
preferably, create your own string type with a ubyte[] representation
underneath, implement your own comparison functions, etc., and then
use this type for all strings. Better yet, contribute this to
code.dlang.org so that others who have the same problem can reuse your
code instead of needing to write their own.

[...]
> p.s. I’ve found that I may set “Consolas” font for a console window
> and then you can output properly localized UTF8 strings without any
> special code in D script managing code pages. Still this does not
> decide localized input problem: any localized input throws an
> exception “std.utf.UTFException... Invalid UTF-8 sequence”.

Is the exception thrown in readln() or in writeln()? If it's in
writeln(), it shouldn't be a big deal: you just have to pass the data
returned by readln() to fromKOI8 (or the equivalent for whatever other
codepage you're using) before writing it out.

If the problem is in readln(), then you probably need to read the
input in binary (i.e., as ubyte[]) and convert it manually.
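
A minimal sketch of that second case, assuming the input is KOI8-R.
The name fromKOI8 is the one used above, but the body here is only an
illustration, not a complete codec: bytes 0x80 .. 0xBF (box-drawing
and other symbols, apart from the two "yo" letters) are simply mapped
to U+FFFD. The function is pure and uses nothing that blocks CTFE, so
the same code can also convert codepage data at compile time.

```d
import std.stdio : stdin, writeln;

// Cyrillic letters in KOI8-R order: lower case occupies bytes
// 0xC0 .. 0xDF, upper case occupies 0xE0 .. 0xFF in the same order.
immutable dstring koi8Lower = "юабцдефгхийклмнопярстужвьызшэщчъ"d;
immutable dstring koi8Upper = "ЮАБЦДЕФГХИЙКЛМНОПЯРСТУЖВЬЫЗШЭЩЧЪ"d;

/// Illustrative helper (an assumption, not a library function):
/// decodes KOI8-R bytes into a UTF-8 string.
string fromKOI8(const(ubyte)[] data) pure
{
    string result;
    foreach (b; data)
    {
        dchar c;
        if (b < 0x80)       c = cast(dchar) b;          // plain ASCII
        else if (b == 0xA3) c = 'ё';
        else if (b == 0xB3) c = 'Ё';
        else if (b >= 0xE0) c = koi8Upper[b - 0xE0];
        else if (b >= 0xC0) c = koi8Lower[b - 0xC0];
        else                c = '\uFFFD';               // simplification
        result ~= c;        // appending a dchar encodes it as UTF-8
    }
    return result;
}

// "Привет" encoded in KOI8-R, converted by the compiler via CTFE.
immutable ubyte[] sample = [0xF0, 0xD2, 0xC9, 0xD7, 0xC5, 0xD4];
static assert(fromKOI8(sample) == "Привет");

void main()
{
    // Read stdin as raw bytes so that no UTF-8 validation happens.
    ubyte[] raw;
    foreach (chunk; stdin.byChunk(4096))
        raw ~= chunk;       // byChunk reuses its buffer, so copy

    // Convert once at the boundary; everything after this is Unicode.
    writeln(fromKOI8(raw));
}
```

This reads all of the input before converting it; for interactive use
you would split the raw bytes on '\n' and convert line by line.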

Unfortunately, there's no other way around this if you're forced to
use codepages. The ideal situation is if you can just use Unicode
throughout your environment. But of course, sometimes you have no
choice.


T

-- 
Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be algorithms.