Re: how to localize console and GUI apps in Windows

2018-01-04 Thread Andrei via Digitalmars-d-learn

On Friday, 29 December 2017 at 18:13:04 UTC, H. S. Teoh wrote:
If the problem is in readln(), then you probably need to read 
the input in binary (i.e., as ubyte[]) and convert it manually.


Could you kindly explain how I can read console input into binary 
ubyte[]?
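
One possible answer, as a minimal sketch rather than a definitive recipe: elsewhere in this thread the exception is reported to surface only in strip() or writeln(), so the raw bytes can apparently be taken from readln() with std.string.representation and decoded by hand. The sketch assumes the console delivers CP866. fromCP866 is a hypothetical helper, not a Phobos function, and it covers only ASCII and the Cyrillic letters; every other byte becomes U+FFFD.

```d
import std.stdio : readln, writeln;
import std.string : representation;

/// Hypothetical helper: decodes ASCII plus the Cyrillic letters of CP866.
/// Bytes outside that subset (box drawing etc.) become U+FFFD.
string fromCP866(const(ubyte)[] bytes)
{
    string result;
    foreach (b; bytes)
    {
        dchar ch;
        if (b < 0x80)
            ch = b;                                // ASCII
        else if (b < 0xB0)
            ch = cast(dchar)(0x410 + (b - 0x80));  // А..Я, а..п
        else if (b >= 0xE0 && b < 0xF0)
            ch = cast(dchar)(0x440 + (b - 0xE0));  // р..я
        else if (b == 0xF0)
            ch = 0x401;                            // Ё
        else if (b == 0xF1)
            ch = 0x451;                            // ё
        else
            ch = 0xFFFD;                           // not handled in this sketch
        result ~= ch;  // appending a dchar encodes it as UTF-8
    }
    return result;
}

void main()
{
    // Take the line as bytes; no UTF-8 decoding happens at this point.
    immutable(ubyte)[] raw = readln().representation;
    string line = fromCP866(raw);
    writeln(line.length, " UTF-8 code units read");
}
```

For other code pages the same shape applies with a different mapping, or std.windows.charset can do the conversion through the Windows API.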






Re: how to localize console and GUI apps in Windows

2018-01-04 Thread Andrei via Digitalmars-d-learn

On Friday, 29 December 2017 at 18:13:04 UTC, H. S. Teoh wrote:
On Fri, Dec 29, 2017 at 10:35:53AM +0000, Andrei via 
Digitalmars-d-learn wrote:
This may be endurable if you write an application where 
Russian is just one rarely used option, but what if your whole 
environment is totally Russian?


You mean if your environment uses a non-UTF encoding?  If your 
environment uses UTF, there is no problem.  I have code with 
strings in Russian (and other languages) embedded, and it's no 
problem because everything is in Unicode, all input and all 
output.


No, I mean the difficulty of writing a program for a non-ASCII 
locale. Learning any programming language since C starts with 
a "hello world" program, which every non-English programmer 
naturally tries to translate into their native language - and gets 
an unreadable mess on the screen. Thousands try, hundreds look for a 
solution, dozens find it, and a few continue with the new 
language. That's not because these programmers cannot read 
English textbooks; they can. It's because they want to write 
non-English programs for non-English people, and that's 
essential. And there are many programming languages (or rather 
their runtimes) which do not suffer from such a deficiency.


That's the reason Unicode has been adopted all over the programming 
world, the D language included. But what good is it to me if I can 
write a UTF-8 string with text in my native language in a D program 
and still get the same unreadable mess on the screen?


Yes, a new language in development can lack support for some 
features, but this forum thread shows that a simple and handy 
solution exists - yet nobody cares to bring it to the first pages 
of every textbook for beginners, at least as a footnote. Thus 
thousands of potential fans of the new language are lost from the 
start.


But I understand that in Windows you may not have this luxury. 
So you have to deal with codepages and what-not.


Converting back and forth is not a big problem, and it actually 
also solves the problem of string comparisons, because std.uni 
provides utilities for collating strings, etc. But it only 
works for Unicode, so you have to convert to Unicode internally 
anyway.  Also, for static strings, it's not hard to make the 
codepage mapping functions CTFE-able, so you can actually write 
string literals in a codepage and have the compiler 
automatically convert it to UTF-8.


The other approach, if you don't like the idea of converting 
codepages all the time, is to explicitly work in ubyte[] for 
all strings. Or, preferably, create your own string type with 
ubyte[] representation underneath, and implement your own 
comparison functions, etc., then use this type for all strings. 
Better yet, contribute this to code.dlang.org so that others 
who have the same problem can reuse your code instead of 
needing to write their own.


I'd definitely try this if I decide to use the D language for my 
purposes (which is not settled yet). But to decide I need some 
experience, and for now that has stalled at reading the user's input 
(as an exercise I intend to translate my recent, rather 
complex interactive C# program into D).


Still this does not solve the localized input problem: any 
localized input throws an exception “std.utf.UTFException...  
Invalid UTF-8 sequence”.


Is the exception thrown in readln() or in writeln()? If it's in
writeln(), it shouldn't be a big deal, you just have to pass 
the data returned by readln() to fromKOI8 (or whatever other 
codepage you're using).


If the problem is in readln(), then you probably need to read 
the input in binary (i.e., as ubyte[]) and convert it manually. 
Unfortunately, there's no other way around this if you're 
forced to use codepages. The ideal situation is if you can just 
use Unicode throughout your environment. But of course, 
sometimes you have no choice.


It depends.

If I skip the console code page initialization, I see in the 
debugger that the runtime reads the user's input as CP866 (MS-DOS) 
Cyrillic and then throws the "Invalid UTF-8 sequence" exception 
when trying to handle it as a UTF-8 string (in particular in strip() 
or writeln()). This situation seems quite manageable with the 
code page conversions you've mentioned above. I tried the first 
library function I found (in std.windows.charset) and got a rather 
fanciful but working statement:


response = fromMBSz((readln()~"\0").ptr, 1).strip();

which assigns the correct Latin/Cyrillic contents to the response 
variable (code page 1 here is CP_OEMCP, the system's OEM code page).


And if I initialize the console with a SetConsoleCP(65001) statement, 
things get worse, as I've said above. The readln() statement then 
returns an empty string and something gets broken inside the 
runtime, because any further readln() statements do not wait for 
user input and return empty strings immediately.







Re: how to localize console and GUI apps in Windows

2018-01-03 Thread Andrei via Digitalmars-d-learn

On Wednesday, 3 January 2018 at 09:11:32 UTC, thedeemon wrote:
The Windows API contains two sets of functions: those whose names 
end with A (meaning ANSI) and those whose names end with W 
(wide characters, meaning Unicode). The sample uses TextOutA, 
a function that expects an 8-bit encoding.


Gosh, I should have known this :)) Thanks for the pointer! TextOutW() 
works fine with wstring texts in this example, and no more changes 
are needed.


That's just enough for this example. Thank you!

Yet my particular interest is console interaction. With the 
help of this forum I've learned the console settings needed to write 
Cyrillic to the console properly and simply using UTF-8 encoding.


One thing that remains is to read and process the user's input.

For now, in the example I've cited above, the response=readln(); 
statement returns an empty string in a console set to the UTF-8 code 
page if the user's input contains any Cyrillic letters. The 
program's behavior then differs depending on the compiler (or more 
likely on the runtime library): the one compiled with ldc 
continues to read and returns empty lines instead of the 
user's input, while the one compiled with dmd returns empty 
lines without waiting for the user's input and without actually 
reading anything (i.e. it falls into an infinite loop, busily 
printing an empty response hundreds of times a second).


That's only for localized input. With ASCII input the same program 
works fine.


Maybe there are some more settings I must learn in order to set up 
the console to properly read non-ASCII input?
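
One route that sidesteps the UTF-8 input code page altogether is to read UTF-16 from the console through the Windows API and convert afterwards. What follows is a minimal sketch, assuming stdin is a real console (ReadConsoleW fails on redirected input) and that a line fits into the buffer. readConsoleLine is a hypothetical helper, not a Phobos function.

```d
import std.string : strip;
import std.utf : toUTF8;

version (Windows)
{
    import core.sys.windows.windows;

    /// Hypothetical helper: reads one line from the console as UTF-16
    /// and returns it as a UTF-8 D string (line ending included).
    string readConsoleLine()
    {
        wchar[1024] buf;  // longer input would arrive in several reads
        DWORD got;
        HANDLE h = GetStdHandle(STD_INPUT_HANDLE);
        if (!ReadConsoleW(h, buf.ptr, cast(DWORD) buf.length, &got, null))
            throw new Exception("ReadConsoleW failed (is stdin redirected?)");
        return buf[0 .. got].toUTF8;
    }
}

void main()
{
    version (Windows)
    {
        string response = readConsoleLine().strip();
        // response is valid UTF-8 here, so a comparison such as
        // response == "quit" behaves as expected.
    }
}
```

With this approach the console input code page is not touched, so SetConsoleCP(65001) is not needed for reading.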




Re: how to localize console and GUI apps in Windows

2018-01-03 Thread Martin Krejcirik via Digitalmars-d-learn

On Friday, 29 December 2017 at 11:14:39 UTC, zabruk70 wrote:


AFAIK, Windows GUI has no ANSI/OEM problem.
You can use Unicode.


Be advised that there are some problems with console UTF-8 
input/output in Windows. The most usable is the new Windows 10 
console window, but I recommend using the Windows API (WriteConsole) 
instead. It works correctly regardless of the code page setting, OS 
version and C library.
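
A minimal sketch of that recommendation, assuming the output handle is a real console (WriteConsoleW fails when output is redirected to a file or pipe, so a fallback would be needed there). writeConsole is a hypothetical helper name.

```d
import std.utf : toUTF16;

version (Windows)
{
    import core.sys.windows.windows;

    /// Hypothetical helper: writes a UTF-8 D string to the console
    /// through the wide-character API, independent of the code page.
    void writeConsole(string s)
    {
        wstring w = s.toUTF16;  // UTF-8 to UTF-16
        DWORD written;
        HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
        if (!WriteConsoleW(h, w.ptr, cast(DWORD) w.length, &written, null))
            throw new Exception("WriteConsoleW failed (is stdout redirected?)");
    }
}

void main()
{
    version (Windows)
        writeConsole("hello world, привет всем!\n");
}
```

The console font still has to contain the glyphs being printed; the API call only removes the code page from the picture.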




Re: how to localize console and GUI apps in Windows

2018-01-03 Thread thedeemon via Digitalmars-d-learn

On Wednesday, 3 January 2018 at 09:11:32 UTC, thedeemon wrote:
you need to use TextOutW, which accepts 16-bit Unicode, so just 
convert your UTF-8 D strings to 16-bit Unicode wstrings; there 
are appropriate conversion functions in Phobos.


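Some details:
import std.utf : toUTF16;
...
string s = "привет";
wstring ws = s.toUTF16;  // UTF-8 to UTF-16
// hdc, x and y come from the surrounding paint code
TextOutW(hdc, x, y, ws.ptr, cast(int) ws.length);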


Re: how to localize console and GUI apps in Windows

2018-01-03 Thread thedeemon via Digitalmars-d-learn

On Wednesday, 3 January 2018 at 06:42:42 UTC, Andrei wrote:

AFAIK, Windows GUI has no ANSI/OEM problem.
You can use Unicode.


Partly, yes. Just for a test I tried to "russify" the example 
Windows GUI program that comes with the D installation package 
(samples\d\winsamp.d). Window captions, button captions and 
message box texts written in UTF-8 all show fine. But the direct 
text output functions CreateFont()/TextOut() render all 
Cyrillic from UTF-8 strings as garbage.


The Windows API contains two sets of functions: those whose names end 
with A (meaning ANSI) and those whose names end with W (wide 
characters, meaning Unicode). The sample uses TextOutA, a 
function that expects an 8-bit encoding. The proper way is to use 
TextOutW, which accepts 16-bit Unicode, so just convert your UTF-8 
D strings to 16-bit Unicode wstrings; there are appropriate 
conversion functions in Phobos.
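
To make the difference concrete, here is a small sketch of what the conversion changes. It uses std.utf.toUTF16 and std.utf.toUTF16z from Phobos; the sample word is just an illustration.

```d
import std.utf : toUTF16, toUTF16z;

void main()
{
    string s = "привет";    // UTF-8: two bytes per Cyrillic letter
    wstring w = s.toUTF16;  // UTF-16: one code unit per Cyrillic letter

    assert(s.length == 12);
    assert(w.length == 6);

    // toUTF16z additionally appends the terminating zero expected by
    // W functions that take a zero-terminated LPCWSTR.
    const(wchar)* p = s.toUTF16z;
    assert(p[0] == 'п');
    assert(p[6] == 0);
}
```

An A function handed the 12 UTF-8 bytes interprets each byte as one character of the ANSI code page, which is where the garbage comes from.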




Re: how to localize console and GUI apps in Windows

2018-01-02 Thread Andrei via Digitalmars-d-learn

On Friday, 29 December 2017 at 11:14:39 UTC, zabruk70 wrote:

On Friday, 29 December 2017 at 10:35:53 UTC, Andrei wrote:
Though it is not suitable for a GUI Windows 
application.


AFAIK, Windows GUI has no ANSI/OEM problem.
You can use Unicode.


Partly, yes. Just for a test I tried to "russify" the example 
Windows GUI program that comes with the D installation package 
(samples\d\winsamp.d). Window captions, button captions and message 
box texts written in UTF-8 all show fine. But the direct text output 
functions CreateFont()/TextOut() render all Cyrillic from UTF-8 
strings as garbage.



For the Windows ANSI/OEM problem you can also use
https://dlang.org/phobos/std_windows_charset.html


Thank you very much, toMBSz() does the required translation for the 
TextOut() function, with some workarounds.






Re: how to localize console and GUI apps in Windows

2017-12-29 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Dec 29, 2017 at 10:35:53AM +0000, Andrei via Digitalmars-d-learn wrote:
> On Thursday, 28 December 2017 at 18:45:39 UTC, H. S. Teoh wrote:
> > On Thu, Dec 28, 2017 at 05:56:32PM +0000, Andrei via Digitalmars-d-learn
> > wrote:
> > ...
> > The string / wstring / dstring types in D are intended to be Unicode
> > strings.  If you need to use other encodings, you really should be
> > using ubyte[] or const(ubyte)[] or immutable(ubyte)[], instead of
> > string.
> 
> Thank you, Teoh, for the advice and the good example! I was looking
> towards writing something like that if no solution exists. Still, this
> way of deliberate translation does not seem to be the best. It implies
> an explicit workaround for every sneeze in Russian and constant
> conversion of ubyte[] to string and back. No formatting gems, no simple
> and elegant I/O statements or string/char comparisons. This may be
> endurable if you write an application where Russian is just one rarely
> used option, but what if your whole environment is totally Russian?

You mean if your environment uses a non-UTF encoding?  If your
environment uses UTF, there is no problem.  I have code with strings in
Russian (and other languages) embedded, and it's no problem because
everything is in Unicode, all input and all output.

But I understand that in Windows you may not have this luxury. So you
have to deal with codepages and what-not.

Converting back and forth is not a big problem, and it actually also
solves the problem of string comparisons, because std.uni provides
utilities for collating strings, etc. But it only works for Unicode, so
you have to convert to Unicode internally anyway.  Also, for static
strings, it's not hard to make the codepage mapping functions CTFE-able,
so you can actually write string literals in a codepage and have the
compiler automatically convert it to UTF-8.
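
As an illustration of the CTFE idea, here is a minimal sketch for Windows-1251. fromCP1251 is a hypothetical helper that covers only ASCII and the Cyrillic letters, and the byte values of the sample word are written out by hand.

```d
/// Hypothetical helper, usable at compile time: decodes ASCII plus the
/// Cyrillic letters of Windows-1251 into UTF-8.
string fromCP1251(const(ubyte)[] bytes) pure
{
    string result;
    foreach (b; bytes)
    {
        uint cp;
        if (b < 0x80)
        {
            result ~= cast(char) b;   // ASCII stays a single byte
            continue;
        }
        else if (b >= 0xC0)
            cp = 0x410 + (b - 0xC0);  // А..я are contiguous in both encodings
        else if (b == 0xA8)
            cp = 0x401;               // Ё
        else if (b == 0xB8)
            cp = 0x451;               // ё
        else
            throw new Exception("byte not covered by this sketch");

        // Code points below U+0800 take exactly two bytes in UTF-8.
        result ~= cast(char)(0xC0 | (cp >> 6));
        result ~= cast(char)(0x80 | (cp & 0x3F));
    }
    return result;
}

// "привет" as Windows-1251 bytes, converted while compiling.
enum greeting = fromCP1251([0xEF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2]);
static assert(greeting == "привет");

void main()
{
    import std.stdio : writeln;
    writeln(greeting.length);  // the UTF-8 form is 12 bytes long
}
```

Because the function is evaluated for an enum initializer, a byte the table does not cover turns into a compile error instead of a runtime exception.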

The other approach, if you don't like the idea of converting codepages
all the time, is to explicitly work in ubyte[] for all strings. Or,
preferably, create your own string type with ubyte[] representation
underneath, and implement your own comparison functions, etc., then use
this type for all strings. Better yet, contribute this to code.dlang.org
so that others who have the same problem can reuse your code instead of
needing to write their own.

[...]
> p.s. I’ve found that I can set the “Consolas” font for a console
> window and then output properly localized UTF-8 strings without any
> special code in the D script for managing code pages. Still, this
> does not solve the localized input problem: any localized input throws
> an exception “std.utf.UTFException...  Invalid UTF-8 sequence”.

Is the exception thrown in readln() or in writeln()? If it's in
writeln(), it shouldn't be a big deal, you just have to pass the data
returned by readln() to fromKOI8 (or whatever other codepage you're
using).

If the problem is in readln(), then you probably need to read the input
in binary (i.e., as ubyte[]) and convert it manually. Unfortunately,
there's no other way around this if you're forced to use codepages. The
ideal situation is if you can just use Unicode throughout your
environment. But of course, sometimes you have no choice.


T

-- 
Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be 
algorithms.


Re: how to localize console and GUI apps in Windows

2017-12-29 Thread zabruk70 via Digitalmars-d-learn

On Friday, 29 December 2017 at 10:35:53 UTC, Andrei wrote:

Though it is not suitable for a GUI Windows application.


AFAIK, Windows GUI has no ANSI/OEM problem.
You can use Unicode.

For the Windows ANSI/OEM problem you can also use
https://dlang.org/phobos/std_windows_charset.html



Re: how to localize console and GUI apps in Windows

2017-12-29 Thread Andrei via Digitalmars-d-learn

On Thursday, 28 December 2017 at 18:45:39 UTC, H. S. Teoh wrote:
On Thu, Dec 28, 2017 at 05:56:32PM +0000, Andrei via 
Digitalmars-d-learn wrote:

...
The string / wstring / dstring types in D are intended to be 
Unicode strings.  If you need to use other encodings, you 
really should be using ubyte[] or const(ubyte)[] or 
immutable(ubyte)[], instead of string.


Thank you, Teoh, for the advice and the good example! I was looking 
towards writing something like that if no solution exists. Still, 
this way of deliberate translation does not seem to be the best. It 
implies an explicit workaround for every sneeze in Russian and 
constant conversion of ubyte[] to string and back. No formatting 
gems, no simple and elegant I/O statements or string/char 
comparisons. This may be endurable if you write an application where 
Russian is just one rarely used option, but what if your whole 
environment is totally Russian?


Or some other non-ASCII locale... Many other cultures have the same 
mix of DOS/Windows/Unix code pages. The decision to use only 
Unicode for strings in the D language seems excellent precisely 
because of this, but the implementation turns out to be deceptive. 
Folks in such countries won’t appreciate a language which is elegant 
only for English-language communication.


This problem is common to most programming languages and 
runtimes I know of. The only system which has solved the whole 
problem is .NET, I think.


The way proposed by zabruk70 below seems more appropriate, though 
more specific too - I feel it suits only console 
applications. Alas, this type of application proved to be buggy 
too.


On Thursday, 28 December 2017 at 22:49:30 UTC, zabruk70 wrote:

you can just set console CP to UTF-8:

https://github.com/CyberShadow/ae/blob/master/sys/console.d


Yes! This seems to be what is required, thank you very much! Though 
it is not suitable for a GUI Windows application.


Still, some testing showed that this approach fixes only console 
output. The simple read/write/compare script listed below works very 
well until the user enters something in Russian. It then prints an 
**empty** response and falls into an infinite loop, printing the 
prompt and then immediately an empty response without actually 
reading anything.


But I think this is a subject for the “Issues” part of this forum.

p.s. I’ve found that I can set the “Consolas” font for a console 
window and then output properly localized UTF-8 strings 
without any special code in the D script for managing code pages. 
Still, this does not solve the localized input problem: any localized 
input throws an exception “std.utf.UTFException... Invalid UTF-8 
sequence”.


The script:

import core.sys.windows.windows;
import std.stdio;
import std.string;

int main(string[] args)
{
    const UTF8CP = 65001;
    UINT oldCP, oldOutputCP;
    oldCP = GetConsoleCP();
    oldOutputCP = GetConsoleOutputCP();

    SetConsoleCP(UTF8CP);
    SetConsoleOutputCP(UTF8CP);

    writeln("hello world, привет всем!");

    bool quit = false;
    string response;
    while (!quit)
    {
        write("respond with something: ");
        response = readln().strip();
        writefln("your response is \"%s\"", response);
        if (response == "quit")
        {
            writeln("goodbye then!");
            quit = true;
        }
    }

    SetConsoleCP(oldCP);
    SetConsoleOutputCP(oldOutputCP);

    return 0;
}



Re: how to localize console and GUI apps in Windows

2017-12-28 Thread zabruk70 via Digitalmars-d-learn

you can just set console CP to UTF-8:

https://github.com/CyberShadow/ae/blob/master/sys/console.d



Re: how to localize console and GUI apps in Windows

2017-12-28 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Dec 28, 2017 at 05:56:32PM +0000, Andrei via Digitalmars-d-learn wrote:
> There is one everlasting problem with writing Cyrillic programs in
> Windows: Microsoft successively invented two very different code pages
> for Russia and other Cyrillic-alphabet countries: first MS-DOS 866
> (and the like), then Windows-1251. Nowadays MS Windows uses the first
> code page for console programs and the second for GUI applications,
> and there are always many workarounds needed to get proper translation
> between them. Mostly a programmer has to either write program sources
> in one code page for console and another for GUI, or use .NET, which
> basically uses UTF-8 in sources and translates seamlessly depending on
> the back end.
> 
> In the D language, which uses only UTF-8 for string encoding, I can
> write program texts in neither the MS866 code page nor Windows-1251 -
> both cases end in a compiler error like "Invalid trailing code unit"
> or "Outside Unicode code space". And writing Cyrillic strings in UTF-8
> format is fatal for both console and GUI Windows targets.
> 
> My question is: are there any standard means in the D libraries to
> translate Cyrillic or any other localized UTF-8 strings for console
> and GUI output? If so, where can I get more information and a good
> example? Google did not help.
[...]

The string / wstring / dstring types in D are intended to be Unicode
strings.  If you need to use other encodings, you really should be using
ubyte[] or const(ubyte)[] or immutable(ubyte)[], instead of string.
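
A small sketch of why this matters, assuming CP866 input as discussed in this thread: the cast to string compiles, but the result is not UTF-8, and std.utf.validate rejects it with the same UTFException reported for the localized input.

```d
import std.exception : assertNotThrown, assertThrown;
import std.utf : UTFException, validate;

void main()
{
    // "Привет" in CP866: perfectly good text, but not a UTF-8 sequence.
    immutable(ubyte)[] cp866 = [0x8F, 0xE0, 0xA8, 0xA2, 0xA5, 0xE2];

    // The cast compiles, yet the result breaks the promise that a
    // string holds UTF-8, so validation rejects it.
    string pretend = cast(string) cp866;
    assertThrown!UTFException(validate(pretend));

    // The same word written as a UTF-8 literal passes.
    assertNotThrown!UTFException(validate("Привет"));
}
```

Keeping such data typed as ubyte[] until it has been converted makes the compiler flag the places where the conversion is still missing.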

One approach is to use UTF-8 in your code, and only translate to one of
the code pages when you need to produce output.  I wrote a small module
for translating to/from KOI8-R when dealing with Russian text; you might
find it helpful:

---
/**
 * Module to convert between UTF and KOI8-R
 */
module koi8r;

import std.string;
import std.range;

static immutable ubyte[0x450 - 0x410] utf2koi8r = [
    225, 226, 247, 231, 228, 229, 246, 250, // АБВГДЕЖЗ
    233, 234, 235, 236, 237, 238, 239, 240, // ИЙКЛМНОП
    242, 243, 244, 245, 230, 232, 227, 254, // РСТУФХЦЧ
    251, 253, 255, 249, 248, 252, 224, 241, // ШЩЪЫЬЭЮЯ
    193, 194, 215, 199, 196, 197, 214, 218, // абвгдежз
    201, 202, 203, 204, 205, 206, 207, 208, // ийклмноп
    210, 211, 212, 213, 198, 200, 195, 222, // рстуфхцч
    219, 221, 223, 217, 216, 220, 192, 209  // шщъыьэюя
];

/**
 * Translates a range of UTF characters into KOI8-R characters.
 * Returns: Range of KOI8-R characters (as ubyte).
 */
auto toKOI8r(R)(R range)
    if (isInputRange!R && is(ElementType!R : dchar))
{
    static struct Result
    {
        R _range;

        @property bool empty() { return _range.empty; }

        @property ubyte front()
        {
            dchar ch = _range.front;

            // ASCII
            if (ch < 128)
                return cast(ubyte)ch;

            // Primary alphabetic range
            if (ch >= 0x410 && ch < 0x450)
                return utf2koi8r[ch - 0x410];

            // Special case: Ё and ё are outside the usual range.
            if (ch == 0x401) return 179;
            if (ch == 0x451) return 163;

            throw new Exception(
                "Encoding error: unable to convert '%c' to KOI8-R".format(ch));
        }

        void popFront() { _range.popFront(); }

        static if (isForwardRange!R)
        {
            @property Result save()
            {
                Result copy;
                copy._range = _range.save;
                return copy;
            }
        }
    }
    return Result(range);
}

unittest
{
    import std.string;
    import std.algorithm : equal;

    assert("юабцдефгхийклмнопярстужвьызшэщчъ".toKOI8r.equal(iota(192, 224)));
    assert("ЮАБЦДЕФГХИЙКЛМНОПЯРСТУЖВЬЫЗШЭЩЧЪ".toKOI8r.equal(iota(224, 256)));
}

unittest
{
    auto r = "abc абв".toKOI8r;
    static assert(isForwardRange!(typeof(r)));
    import std.algorithm.comparison : equal;
    assert(r.equal(['a', 'b', 'c', ' ', 193, 194, 215]));
}

static dchar[0x100 - 0xC0] koi8r2utf = [
    'ю', 'а', 'б', 'ц', 'д', 'е', 'ф', 'г', // 192-199
    'х', 'и', 'й', 'к', 'л', 'м', 'н', 'о', // 200-207
    'п', 'я', 'р', 'с', 'т', 'у', 'ж', 'в', // 208-215
    'ь', 'ы', 'з', 'ш', 'э', 'щ', 'ч', 'ъ', // 216-223
    'Ю', 'А', 'Б', 'Ц', 'Д', 'Е', 'Ф', 'Г', // 224-231
    'Х', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О', // 232-239
    'П', 'Я', 'Р', 'С', 'Т', 'У', 'Ж', 'В', // 240-247
    'Ь', 'Ы', 'З', 'Ш', 'Э', 'Щ', 'Ч', 'Ъ'  // 248-255
];

/**
 * Translates a range of KOI8-R characters to UTF.
 * Returns: Range of UTF characters (as dchar).
 */
auto fromKOI8r(R)(R range)
    if (isInputRange!R && is(ElementType!R : ubyte))
{
    static struct Result
    {
        R _range;
        @property bool empty() { return _range.empty; }
        @property dchar front()
        {
            ubyte b = _range.front;
            if (b < 128) return b;
            if (b >= 192)
                return koi8r2utf[b - 

how to localize console and GUI apps in Windows

2017-12-28 Thread Andrei via Digitalmars-d-learn
There is one everlasting problem with writing Cyrillic programs in 
Windows: Microsoft successively invented two very different code 
pages for Russia and other Cyrillic-alphabet countries: first 
MS-DOS 866 (and the like), then Windows-1251. Nowadays MS Windows 
uses the first code page for console programs and the second for GUI 
applications, and there are always many workarounds needed to get 
proper translation between them. Mostly a programmer has to either 
write program sources in one code page for console and another for 
GUI, or use .NET, which basically uses UTF-8 in sources and 
translates seamlessly depending on the back end.


In the D language, which uses only UTF-8 for string encoding, I can 
write program texts in neither the MS866 code page nor Windows-1251 - 
both cases end in a compiler error like "Invalid trailing code 
unit" or "Outside Unicode code space". And writing Cyrillic 
strings in UTF-8 format is fatal for both console and GUI Windows 
targets.


My question is: are there any standard means in the D libraries to 
translate Cyrillic or any other localized UTF-8 strings for console 
and GUI output? If so, where can I get more information and a good 
example? Google did not help.


Thanks.