On Wednesday, 2 August 2017 at 17:37:09 UTC, Steven Schveighoffer wrote:

> What is expected? What I see on the screen when I run my code is:
>
> [Ü]

Upper case?

> What I see when I run your "working" code is:
>
> [?]

Your terminal is incapable of rendering the Latin-1 encoding. The program prints one byte of value 0xfc. You may pipe the output into hexdump -C:

00000000  5b fc 5d 0a                                       |[ü].|
00000004
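
The same four bytes can be reproduced without the D program. This is only a minimal sketch, assuming a POSIX shell with hexdump and iconv installed; emit_bytes is a hypothetical stand-in for the program's output, not part of the original code.

```shell
# Hypothetical stand-in for the program: writes '[', the byte 0xfc, ']' and a newline.
# \374 is the octal spelling of 0xfc; octal escapes are the portable form for printf.
emit_bytes() {
    printf '[\374]\n'
}

# Show the raw bytes (5b fc 5d 0a), independent of what the terminal can render.
emit_bytes | hexdump -C

# On a UTF-8 terminal, reinterpret the bytes as Latin-1 to see the intended character.
emit_bytes | iconv -f ISO-8859-1 -t UTF-8
```

The iconv step only changes how the bytes are presented on a UTF-8 terminal; the data the program wrote is still the single octet 0xfc.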

> You are missing the point that your input string is invalid.

It's perfectly okay to put any value an octet can take into an octet. I did not claim that the data in the string memory is syntactically valid UTF-8. Read the comment in line 9 of my post of 15:02:22.

> std.algorithm is not validating the entire string,

True, and it should not. So this is what I want.

> and so it doesn't throw an error like string.stripLeft does.

That is the point. You wrote

| I wouldn't expect good performance from this, as there is auto-decoding all
| over the place.

I erroneously thought that using byCodeUnit disables the whole UTF-8 processing and enforces operation on (u)bytes. But this is not the case, at least not for stripLeft and probably other string functions.

> writeln doesn't do any decoding of individual strings. It avoids the problem and just copies your bad data directly.

That is what I expected.

