Re: Strange behavior in console with UTF-8

2016-03-28 Thread Jonathan Villa via Digitalmars-d-learn
On Monday, 28 March 2016 at 18:28:33 UTC, Steven Schveighoffer 
wrote:

On 3/27/16 12:04 PM, Jonathan Villa wrote:

I can reproduce your issue on windows.

It works on Mac OS X.

I see different behavior on 32-bit (DMC stdlib) vs. 64-bit 
(MSVC stdlib). On both, the line is not read properly (I get a 
length of 0). On 32-bit, the program exits immediately, 
indicating it cannot read any more data.


On 64-bit, the program continues to allow input.

I don't think this is normal behavior, and should be filed as a 
bug. I'm not a Windows developer normally, but I would guess 
this is an issue with the Windows flavors of readln.


Please file here: https://issues.dlang.org under the Phobos 
component.


-Steve


Ok, I'm gonna register it with your data. Thanks.

JV.


Re: Strange behavior in console with UTF-8

2016-03-28 Thread Steven Schveighoffer via Digitalmars-d-learn

On 3/27/16 12:04 PM, Jonathan Villa wrote:

On Saturday, 26 March 2016 at 16:34:34 UTC, Steven Schveighoffer wrote:

On 3/25/16 6:47 PM, Jonathan Villa wrote:

On Friday, 25 March 2016 at 13:58:44 UTC, Steven Schveighoffer wrote:

[...]


OK, the following inputs I've tested: á, é, í, ó, ú, ñ, à, è, ì, ò, ù.
Just one input is enough to reproduce the behaviour.

JV


It's the same thing Ali suggested (if I got it right), and the behaviour is
the same.

Sending a single UTF-8 char is enough to reproduce the mess, regardless of
the char type you use.



At this point, I think knowing exactly what input you are sending
would be helpful. Can you attach a file which has the input that
causes the error? Or just paste the input into your post.



The following chars I've tested: á, é, í, ó, ú, ñ, à, è, ì, ò, ù.
Just one of those as input is enough to reproduce the behaviour.


I can reproduce your issue on windows.

It works on Mac OS X.

I see different behavior on 32-bit (DMC stdlib) vs. 64-bit (MSVC 
stdlib). On both, the line is not read properly (I get a length of 0). 
On 32-bit, the program exits immediately, indicating it cannot read any 
more data.


On 64-bit, the program continues to allow input.

I don't think this is normal behavior, and should be filed as a bug. I'm 
not a Windows developer normally, but I would guess this is an issue 
with the Windows flavors of readln.


Please file here: https://issues.dlang.org under the Phobos component.

-Steve


Re: Strange behavior in console with UTF-8

2016-03-27 Thread Jonathan Villa via Digitalmars-d-learn
On Saturday, 26 March 2016 at 16:34:34 UTC, Steven Schveighoffer 
wrote:

On 3/25/16 6:47 PM, Jonathan Villa wrote:
At this point, I think knowing exactly what input you are 
sending would be helpful. Can you attach a file which has the 
input that causes the error? Or just paste the input into your 
post.


-Steve


I've tested on Debian 4.2 x64 using CHAR type, and it behaves 
correctly without any problems.
Clearly this bug must be something related to the Windows 
console.


Here's the behaviour in Windows 10 x64:
http://prntscr.com/akskt1

And here's in Debian x64 4.2:
http://prntscr.com/akskjw

JV


Re: Strange behavior in console with UTF-8

2016-03-27 Thread Jonathan Villa via Digitalmars-d-learn
On Saturday, 26 March 2016 at 16:34:34 UTC, Steven Schveighoffer 
wrote:

On 3/25/16 6:47 PM, Jonathan Villa wrote:
On Friday, 25 March 2016 at 13:58:44 UTC, Steven Schveighoffer 
wrote:

[...]


OK, the following inputs I've tested: á, é, í, ó, ú, ñ, à, è, ì, ò, ù.
Just one input is enough to reproduce the behaviour.

JV


It's the same thing Ali suggested (if I got it right), and the behaviour is
the same.

Sending a single UTF-8 char is enough to reproduce the mess, regardless of
the char type you use.



At this point, I think knowing exactly what input you are 
sending would be helpful. Can you attach a file which has the 
input that causes the error? Or just paste the input into your 
post.


-Steve


The following chars I've tested: á, é, í, ó, ú, ñ, à, è, ì, ò, ù.
Just one of those as input is enough to reproduce the behaviour.


Re: Strange behavior in console with UTF-8

2016-03-26 Thread Steven Schveighoffer via Digitalmars-d-learn

On 3/25/16 6:47 PM, Jonathan Villa wrote:

On Friday, 25 March 2016 at 13:58:44 UTC, Steven Schveighoffer wrote:

On 3/24/16 8:54 PM, Jonathan Villa wrote:

[...]


D's File i/o uses C's FILE * i/o system. At least on Windows, this has
literally zero support for wchar (you can set stream width, and the
library just ignores it).

What is likely happening is that it is putting the char code units
into wchar buffer directly, which is not what you want.

I am not certain of this cause, but I would steer clear of any i/o
that is not char-based. What you can do is read into a char buffer,
and then re-encode using std.conv.to to get wchar strings if you need
that.



It's the same thing Ali suggested (if I got it right), and the behaviour is
the same.

Sending a single UTF-8 char is enough to reproduce the mess, regardless of
the char type you use.



At this point, I think knowing exactly what input you are sending would 
be helpful. Can you attach a file which has the input that causes the 
error? Or just paste the input into your post.


-Steve


Re: Strange behavior in console with UTF-8

2016-03-25 Thread Jonathan Villa via Digitalmars-d-learn
On Friday, 25 March 2016 at 13:58:44 UTC, Steven Schveighoffer 
wrote:

On 3/24/16 8:54 PM, Jonathan Villa wrote:

[...]


D's File i/o uses C's FILE * i/o system. At least on Windows, 
this has literally zero support for wchar (you can set stream 
width, and the library just ignores it).


What is likely happening is that it is putting the char code 
units into wchar buffer directly, which is not what you want.


I am not certain of this cause, but I would steer clear of any 
i/o that is not char-based. What you can do is read into a char 
buffer, and then re-encode using std.conv.to to get wchar 
strings if you need that.


-Steve


It's the same thing Ali suggested (if I got it right), and the behaviour 
is the same.

Sending a single UTF-8 char is enough to reproduce the mess, 
regardless of the char type you use.


JV


Re: Strange behavior in console with UTF-8

2016-03-25 Thread Steven Schveighoffer via Digitalmars-d-learn

On 3/24/16 8:54 PM, Jonathan Villa wrote:

I prefer to post this here because it could be that I'm doing
something wrong.

I'm using std.stdio -> readln() to read whatever I'm typing in the console.
BUT, if the line contains some UTF-8 characters, the data obtained is
EMPTY and



module runnable;

import std.stdio;
import std.string : chomp;
import std.experimental.logger;

void doSomethingElse(wchar[] data)
{
    writeln("hello!");
}

int main(string[] args)
{
    /* Some fix I found to fix UTF-8 related problems, I'm using
       Windows 10 */
    version(Windows)
    {
        import core.sys.windows.windows;
        if (SetConsoleCP(65001) == 0)
            throw new Exception("failure");
        if (SetConsoleOutputCP(65001) == 0)
            throw new Exception("failure");
    }
    FileLogger fl = new FileLogger("log.log");
    wchar[] readerBuffer;

    readln(readerBuffer);
    readerBuffer = chomp(readerBuffer);

    /* If the string that was read contains at least one UTF-8 char
       this prints 0, else it prints its length */
    fl.info(readerBuffer.length);

    if (readerBuffer != "exit"w)
        doSomethingElse(readerBuffer);

    /* Also, all the following code doesn't run as expected: the
       program doesn't wait for you, it executes readln() even
       without pressing/sending a key */
    readln(readerBuffer);
    fl.info(readerBuffer.length);
    readln(readerBuffer);
    fl.info(readerBuffer.length);
    readln(readerBuffer);
    fl.info(readerBuffer.length);
    readln(readerBuffer);
    fl.info(readerBuffer.length);
    readln(readerBuffer);
    fl.info(readerBuffer.length);

    return 0;
}

The real code is bigger but this describes the bug. Also, if it needs to
print UTF-8 there's no problem.

My main problem is that the line is going to be sent through a TCP socket
and I want to make it work with UTF-8. I'm using WCHAR instead of CHAR
in the hope of having fewer problems in the future.

If you comment out the Windows fix code, the program crashes:
http://prntscr.com/ajmy14

Also I tried stdin.flush() right after the first readln() but nothing
seems to fix it.

Am I doing something wrong?
Many thanks.


D's File i/o uses C's FILE * i/o system. At least on Windows, this has 
literally zero support for wchar (you can set stream width, and the 
library just ignores it).


What is likely happening is that it is putting the char code units into 
wchar buffer directly, which is not what you want.


I am not certain of this cause, but I would steer clear of any i/o that 
is not char-based. What you can do is read into a char buffer, and then 
re-encode using std.conv.to to get wchar strings if you need that.


-Steve
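
Steve's suggestion above (read into a char buffer, then re-encode with std.conv.to) can be sketched roughly as follows. This is only a minimal sketch, not code from the thread: the helper name toWide and the loop are made up for illustration, and it does not claim to work around the Windows console issue discussed in this thread. It only shows the char-based read followed by conversion to a wchar string.

```d
import std.conv : to;
import std.stdio;
import std.string : chomp;

/// Hypothetical helper: strips the line terminator from a UTF-8 line
/// and re-encodes the result as a UTF-16 (wchar) string.
wstring toWide(const(char)[] utf8Line)
{
    return chomp(utf8Line).to!wstring;
}

void main()
{
    // Read UTF-8 code units only; no wchar buffer is handed to readln.
    char[] line;
    while (readln(line) > 0)
    {
        wstring wide = toWide(line);
        if (wide == "exit"w)
            break;
        writeln(wide.length, " UTF-16 code units read");
    }
}
```

The conversion happens after the read, so the C stdio layer only ever sees char data.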


Re: Strange behavior in console with UTF-8

2016-03-24 Thread Jonathan Villa via Digitalmars-d-learn

On Friday, 25 March 2016 at 01:03:06 UTC, Ali Çehreli wrote:

Try char:

char[] readerBuffer;

Ali


Also tried with dchar ... there's no changes.


Re: Strange behavior in console with UTF-8

2016-03-24 Thread Jonathan Villa via Digitalmars-d-learn

On Friday, 25 March 2016 at 01:03:06 UTC, Ali Çehreli wrote:

On 03/24/2016 05:54 PM, Jonathan Villa wrote:

Try char:

char[] readerBuffer;




flush() has no effect on input streams.

Ali


Thanks for the quick reply.
Unfortunately it behaves exactly as before with wchar.


Re: Strange behavior in console with UTF-8

2016-03-24 Thread Ali Çehreli via Digitalmars-d-learn

On 03/24/2016 05:54 PM, Jonathan Villa wrote:

> I'm using WCHAR instead of CHAR
> in the hope of having fewer problems in the future.

Try char:

char[] readerBuffer;

> Also I tried stdin.flush()

flush() has no effect on input streams.

Ali



Strange behavior in console with UTF-8

2016-03-24 Thread Jonathan Villa via Digitalmars-d-learn
I prefer to post this here because it could be that I'm doing 
something wrong.


I'm using std.stdio -> readln() to read whatever I'm typing in 
the console.
BUT, if the line contains some UTF-8 characters, the data 
obtained is EMPTY and




module runnable;

import std.stdio;
import std.string : chomp;
import std.experimental.logger;

void doSomethingElse(wchar[] data)
{
    writeln("hello!");
}

int main(string[] args)
{
    /* Some fix I found to fix UTF-8 related problems, I'm using
       Windows 10 */
    version(Windows)
    {
        import core.sys.windows.windows;
        if (SetConsoleCP(65001) == 0)
            throw new Exception("failure");
        if (SetConsoleOutputCP(65001) == 0)
            throw new Exception("failure");
    }
    FileLogger fl = new FileLogger("log.log");
    wchar[] readerBuffer;

    readln(readerBuffer);
    readerBuffer = chomp(readerBuffer);

    /* If the string that was read contains at least one UTF-8 char
       this prints 0, else it prints its length */
    fl.info(readerBuffer.length);

    if (readerBuffer != "exit"w)
        doSomethingElse(readerBuffer);

    /* Also, all the following code doesn't run as expected: the
       program doesn't wait for you, it executes readln() even
       without pressing/sending a key */
    readln(readerBuffer);
    fl.info(readerBuffer.length);
    readln(readerBuffer);
    fl.info(readerBuffer.length);
    readln(readerBuffer);
    fl.info(readerBuffer.length);
    readln(readerBuffer);
    fl.info(readerBuffer.length);
    readln(readerBuffer);
    fl.info(readerBuffer.length);

    return 0;
}

The real code is bigger but this describes the bug. Also, if it 
needs to print UTF-8 there's no problem.


My main problem is that the line is going to be sent through a TCP 
socket and I want to make it work with UTF-8. I'm using WCHAR 
instead of CHAR in the hope of having fewer problems in the future.


If you comment out the Windows fix code, the program crashes:
http://prntscr.com/ajmy14

Also I tried stdin.flush() right after the first readln() but 
nothing seems to fix it.


Am I doing something wrong?
Many thanks.
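
On the TCP point in the post above: in D, char and string hold UTF-8 code units while wchar and wstring hold UTF-16 code units, so a char-based line is already in the encoding the socket is meant to carry. The following is a minimal sketch under that assumption; the helper name toPayload is made up for illustration, and the actual socket send is left out.

```d
import std.stdio;
import std.string : representation;

/// Hypothetical helper: exposes the UTF-8 bytes of a string without copying.
immutable(ubyte)[] toPayload(string line)
{
    return line.representation;
}

void main()
{
    string narrow = "ñ";   // stored as two UTF-8 code units (0xC3 0xB1)
    wstring wide = "ñ"w;   // stored as one UTF-16 code unit

    // These are the bytes that would be handed to a socket send call.
    immutable(ubyte)[] payload = toPayload(narrow);

    writefln("UTF-8 bytes: %(%02X %)", payload);
    writeln("UTF-16 code units: ", wide.length);
}
```

With a wchar buffer, the data would have to be converted back to UTF-8 before sending, so staying with char avoids a round trip.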