Re: Reading unicode string with readf ("%s")

anonymous via Digitalmars-d-learn Tue, 04 Nov 2014 05:06:03 -0800

On Monday, 3 November 2014 at 19:37:20 UTC, Ivan Kazmenko wrote:

Hi!


The following code does not correctly handle Unicode strings.
-----
import std.stdio;
void main () {
        string s;
        readf ("%s", &s);
        write (s);
}
-----

Example input ("Test." in Cyrillic):
-----
Тест.
-----
(hex: D0 A2 D0 B5 D1 81 D1 82 2E 0D 0A)

Example output:
-----
Ð¢ÐµÑÑ.
-----
(hex: C3 90 C2 A2 C3 90 C2 B5 C3 91 C2 81 C3 91 C2 82 2E 0D 0A)

Here, each input byte is handled separately and re-encoded as if it were a code point of its own: D0 -> C3 90, A2 -> C2 A2, etc.


On the bright side, reading the file with readln works properly.

Is this an expected shortcoming of "%s"-reading a string?

No.

Could it be made to work somehow?


Yes. std.stdio.LockingTextReader is to blame:
-----
void main()
{
     import std.stdio;
     auto ltr = LockingTextReader(std.stdio.stdin);
     write(ltr);
}
-----
$ echo Тест | rdmd test.d
Ð¢ÐµÑÑ

LockingTextReader has a dchar front, but it doesn't do any decoding: each UTF-8 code unit (byte) is returned as a dchar of its own. The dchar front is really a char front.
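
To illustrate the effect without involving stdin, here is a minimal sketch that mimics what a non-decoding dchar front leads to: every input byte is widened to a dchar and then encoded to UTF-8 again. The helper name reencodeBytewise is made up for this example and is not part of Phobos; only std.utf.encode and std.stdio.writefln are real library calls. This is not the LockingTextReader implementation, just a reproduction of the symptom.

```d
import std.stdio : writefln;
import std.utf : encode;

// Hypothetical helper (not in Phobos): treats every byte as a separate
// code point and encodes it to UTF-8 again, the way output looks when a
// range hands out raw code units as dchars.
ubyte[] reencodeBytewise(const(ubyte)[] input)
{
    ubyte[] output;
    foreach (b; input)
    {
        char[4] buf;
        // Widen the raw byte to a dchar without any UTF-8 decoding.
        immutable len = encode(buf, cast(dchar) b);
        output ~= cast(const(ubyte)[]) buf[0 .. len];
    }
    return output;
}

void main()
{
    // UTF-8 bytes of the Cyrillic test string from the report, without CR LF.
    immutable ubyte[] input =
        [0xD0, 0xA2, 0xD0, 0xB5, 0xD1, 0x81, 0xD1, 0x82, 0x2E];

    // Prints: C3 90 C2 A2 C3 90 C2 B5 C3 91 C2 81 C3 91 C2 82 2E
    writefln("%(%02X %)", reencodeBytewise(input));
}
```

The printed bytes match the hex dump of the broken output in the original report (apart from the trailing 0D 0A), which is consistent with the code units never being decoded before they are written back out.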

Is it worth a bug report?


Yes.

Ivan Kazmenko.
