Re: Using lazy code to process large files

kdevel via Digitalmars-d-learn Wed, 02 Aug 2017 11:31:42 -0700

On Wednesday, 2 August 2017 at 17:37:09 UTC, Steven Schveighoffer wrote:

What is expected? What I see on the screen when I run my code is:
[Ü]


Upper case?

What I see when I run your "working" code is:

[?]

Your terminal is incapable of rendering the Latin-1 encoding. The program prints one byte of value 0xfc. You may pipe the output into hexdump -C:


00000000  5b fc 5d 0a                                       |[ü].|
00000004
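
The byte-level check above can be reproduced without the D program. The following is only a sketch: emit_bytes is a hypothetical helper that stands in for the program's output, and od is used as a POSIX alternative to hexdump -C, so the layout of the dump differs from the one shown above.

```shell
# Hypothetical stand-in for the D program's output:
# the four bytes '[', 0xfc, ']', '\n'.
# \374 is the octal escape for 0xfc, which POSIX printf understands.
emit_bytes() {
    printf '[\374]\n'
}

# Dump the raw bytes as hex, one byte per column, without offsets.
# Piping into "hexdump -C" instead gives the layout quoted in this post.
emit_bytes | od -An -tx1
```

Looking at the bytes this way sidesteps the terminal's encoding entirely: a lone 0xfc is "ü" in Latin-1 but not a valid UTF-8 sequence, which is why a UTF-8 terminal shows a replacement character.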

You are missing the point that your input string is invalid.

It's perfectly okay to put any value an octet can take into an octet. I did not claim that the data in the string memory is syntactically valid UTF-8. Read the comment in line 9 of my post of 15:02:22.

std.algorithm is not validating the entire string,


True and it should not. So this is what I want.

and so it doesn't throw an error like string.stripLeft does.


That is the point. You wrote

| I wouldn't expect good performance from this, as there is auto-decoding all
| over the place.

I erroneously thought that using byCodeUnit disables the whole UTF-8 processing and enforces operation on (u)bytes. But this is not the case, at least not for stripLeft and probably other string functions.

writeln doesn't do any decoding of individual strings. It avoids the problem and just copies your bad data directly.


That is what I expected.
