Re: Strange behavior in console with UTF-8
On Monday, 28 March 2016 at 18:28:33 UTC, Steven Schveighoffer wrote:
> I can reproduce your issue on Windows. It works on Mac OS X.
> [...]
> Please file here: https://issues.dlang.org under the Phobos component.
>
> -Steve

OK, I'm going to file it with your data. Thanks.

JV
Re: Strange behavior in console with UTF-8
On 3/27/16 12:04 PM, Jonathan Villa wrote:
> The following chars I've tested: á, é, í, ó, ú, ñ, à, è, ì, ò, ù.
> Just one of those as input is enough to reproduce the behaviour.

I can reproduce your issue on Windows. It works on Mac OS X.

I see different behavior on 32-bit (DMC stdlib) vs. 64-bit (MSVC stdlib). On both, the line is not read properly (I get a length of 0). On 32-bit, the program exits immediately, indicating it cannot read any more data. On 64-bit, the program continues to allow input.

I don't think this is normal behavior, and it should be filed as a bug. I'm not normally a Windows developer, but I would guess this is an issue with the Windows flavors of readln.

Please file here: https://issues.dlang.org under the Phobos component.

-Steve
Re: Strange behavior in console with UTF-8
On Saturday, 26 March 2016 at 16:34:34 UTC, Steven Schveighoffer wrote:
> At this point, I think knowing exactly what input you are sending would be helpful. Can you attach a file which has the input that causes the error? Or just paste the input into your post.
>
> -Steve

I've tested on Debian 4.2 x64 using the char type, and it behaves correctly without any problems. Clearly this bug must be something related to the Windows console.

Here's the behaviour in Windows 10 x64: http://prntscr.com/akskt1
And here it is in Debian x64 4.2: http://prntscr.com/akskjw

JV
Re: Strange behavior in console with UTF-8
On Saturday, 26 March 2016 at 16:34:34 UTC, Steven Schveighoffer wrote:
> At this point, I think knowing exactly what input you are sending would be helpful. Can you attach a file which has the input that causes the error? Or just paste the input into your post.
>
> -Steve

The following chars I've tested: á, é, í, ó, ú, ñ, à, è, ì, ò, ù.
Just one of those as input is enough to reproduce the behaviour.
Re: Strange behavior in console with UTF-8
On 3/25/16 6:47 PM, Jonathan Villa wrote:
> On Friday, 25 March 2016 at 13:58:44 UTC, Steven Schveighoffer wrote:
>> [...]
>
> It's the same thing Ali suggested (if I understand it right), and the behaviour is the same. Sending a single UTF-8 char is enough to reproduce the mess, regardless of the char type you use.

At this point, I think knowing exactly what input you are sending would be helpful. Can you attach a file which has the input that causes the error? Or just paste the input into your post.

-Steve
Re: Strange behavior in console with UTF-8
On Friday, 25 March 2016 at 13:58:44 UTC, Steven Schveighoffer wrote:
> [...]
> What you can do is read into a char buffer, and then re-encode using std.conv.to to get wchar strings if you need that.
>
> -Steve

It's the same thing Ali suggested (if I understand it right), and the behaviour is the same. Sending a single UTF-8 char is enough to reproduce the mess, regardless of the char type you use.

JV
Re: Strange behavior in console with UTF-8
On 3/24/16 8:54 PM, Jonathan Villa wrote:
> [...]
D's File i/o uses C's FILE * i/o system. At least on Windows, this has literally zero support for wchar (you can set stream width, and the library just ignores it).

What is likely happening is that it is putting the char code units into the wchar buffer directly, which is not what you want. I am not certain of this cause, but I would steer clear of any i/o that is not char-based.

What you can do is read into a char buffer, and then re-encode using std.conv.to to get wchar strings if you need that.

-Steve
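A minimal sketch of the "read as char, then re-encode" suggestion, for illustration only. The helper name `toWide` and the loop structure are assumptions and do not come from the thread. It shows the technique itself and is not a fix for the Windows console problem, which the rest of the thread reports even with char buffers.

```d
import std.conv : to;
import std.stdio : readln, writefln;
import std.string : chomp;

// Hypothetical helper (not from the thread): re-encode UTF-8 text as UTF-16.
// std.conv.to transcodes between D string types; invalid UTF-8 input is
// rejected with an exception rather than copied through.
wstring toWide(const(char)[] utf8)
{
    return utf8.to!wstring;
}

void main()
{
    char[] buf; // char-based buffer, so the C runtime only ever sees bytes

    // readln fills buf (line terminator included) and returns the number of
    // code units read; 0 means end of input.
    while (readln(buf) > 0)
    {
        const(char)[] line = chomp(buf);
        if (line == "exit")
            break;

        // Convert only after the read has completed.
        wstring wide = toWide(line);
        writefln("%s UTF-8 code units -> %s UTF-16 code units",
                 line.length, wide.length);
    }
}
```

Keeping the i/o char-based and converting afterwards means the wchar data is produced by D's own transcoding instead of by the C stream. For a character such as ñ the two lengths differ (two UTF-8 code units, one UTF-16 code unit), which is a quick way to check that the re-encoding step actually ran.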
Re: Strange behavior in console with UTF-8
On Friday, 25 March 2016 at 01:03:06 UTC, Ali Çehreli wrote:
> Try char:
>
>     char[] readerBuffer;
>
> Ali

I also tried with dchar; there's no change.
Re: Strange behavior in console with UTF-8
On Friday, 25 March 2016 at 01:03:06 UTC, Ali Çehreli wrote:
> Try char:
>
>     char[] readerBuffer;
>
> flush() has no effect on input streams.
>
> Ali

Thanks for the quick reply. Unfortunately it behaves exactly as it did before with wchar.
Re: Strange behavior in console with UTF-8
On 03/24/2016 05:54 PM, Jonathan Villa wrote:

> I'm using wchar instead of char
> in the hope of having fewer problems in the future.

Try char:

    char[] readerBuffer;

> Also I tried stdin.flush()

flush() has no effect on input streams.

Ali
Strange behavior in console with UTF-8
I prefer to post this here because it could be that I'm doing something wrong.

I'm using std.stdio -> readln() to read whatever I'm typing in the console. BUT, if the line contains any non-ASCII (UTF-8 encoded) characters, the data obtained is EMPTY, and the reads that follow misbehave (see the comments in the code):

    module runnable;

    import std.stdio;
    import std.string : chomp;
    import std.experimental.logger;

    void doSomethingElse(wchar[] data)
    {
        writeln("hello!");
    }

    int main(string[] args)
    {
        /* Some fix I found to fix UTF-8 related problems, I'm using Windows 10 */
        version(Windows)
        {
            import core.sys.windows.windows;
            if (SetConsoleCP(65001) == 0)
                throw new Exception("failure");
            if (SetConsoleOutputCP(65001) == 0)
                throw new Exception("failure");
        }

        FileLogger fl = new FileLogger("log.log");
        wchar[] readerBuffer;

        readln(readerBuffer);
        readerBuffer = chomp(readerBuffer);

        /* <- if the string read contains at least one UTF-8 char
           this prints 0, else it prints its length */
        fl.info(readerBuffer.length);

        if (readerBuffer != "exit"w)
            doSomethingElse(readerBuffer);

        /* Also, none of the following code runs as expected: the program
           doesn't wait for you, it executes readln() even without
           pressing/sending a key */
        readln(readerBuffer);
        fl.info(readerBuffer.length);
        readln(readerBuffer);
        fl.info(readerBuffer.length);
        readln(readerBuffer);
        fl.info(readerBuffer.length);
        readln(readerBuffer);
        fl.info(readerBuffer.length);
        readln(readerBuffer);
        fl.info(readerBuffer.length);

        return 0;
    }

The real code is bigger, but this describes the bug. Also, if it needs to print UTF-8 there's no problem.

My main problem is that the line is going to be sent through a TCP socket and I want to make it work with UTF-8. I'm using wchar instead of char in the hope of having fewer problems in the future.

If you comment out the Windows fix code, the program crashes: http://prntscr.com/ajmy14

I also tried stdin.flush() right after the first readln(), but nothing seems to fix it.

Am I doing something wrong?
Many thanks.