Re: Using lazy code to process large files

Steven Schveighoffer via Digitalmars-d-learn Wed, 02 Aug 2017 08:56:26 -0700

On 8/2/17 11:02 AM, kdevel wrote:

On Wednesday, 2 August 2017 at 13:45:01 UTC, Steven Schveighoffer wrote:

As Daniel said, using byCodeUnit will help.

stripLeft seems to autodecode even when fed with CodeUnits. How do I prevent this?


       1 void main ()
       2 {
       3    import std.stdio;
       4    import std.string;
       5    import std.conv;
       6    import std.utf;
       7    import std.algorithm;
       8
       9    string [] src = [ " \xfc" ]; // blank + latin-1 encoded uumlaut
      10    auto result = src
      11       .map!(a => a.byCodeUnit)
      12       .map!(a => a.stripLeft);
      13    result.writeln;
      14 }

Crashes with a C++-like dump.

First, as a tip, please either post a link to a paste site or leave out the line numbers. It's much easier to copy-paste your code into an editor if it doesn't have line numbers.

What has happened is that you injected a non-encoded code point. In UTF-8, any code point above 0x7f must be encoded into a string of several code units. See the table on this page: https://en.wikipedia.org/wiki/%C3%9C
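To make the encoding concrete, here is a minimal sketch using std.utf. The helper name utf8Units is made up for illustration; encode and validate are the Phobos functions. It shows that U+00FC needs two code units in UTF-8, and that a lone 0xfc byte, as in the snippet above, is rejected as invalid UTF-8.

```d
import std.exception : assertThrown;
import std.stdio : writefln;
import std.utf : encode, validate, UTFException;

// Hypothetical helper (not part of Phobos): returns the UTF-8 code units
// that std.utf.encode produces for a single code point.
ubyte[] utf8Units(dchar c)
{
    char[4] buf;
    immutable n = encode(buf, c); // number of code units written
    return cast(ubyte[]) buf[0 .. n].dup;
}

void main()
{
    // U+00FC (u-umlaut) is above 0x7f, so it takes two code units.
    ubyte[] expected = [0xc3, 0xbc];
    assert(utf8Units('\u00fc') == expected);
    writefln("%(%02x %)", utf8Units('\u00fc'));

    // A bare 0xfc byte is Latin-1, not UTF-8, so validation throws.
    assertThrown!UTFException(validate(" \xfc"));
}
```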

If we use the correct code unit sequence (0xc3 0x9c), then it works: https://run.dlang.io/is/4umQoo
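For readers who cannot open the link, here is a sketch of what the corrected program presumably looks like. The linked paste is not reproduced here, so this is an assumption: it is the snippet from above with only the string literal changed. Note that 0xc3 0x9c is the UTF-8 encoding of U+00DC (Ü, the letter on the Wikipedia page mentioned earlier); the lowercase ü that 0xfc stands for in Latin-1 would be 0xc3 0xbc.

```d
void main()
{
    import std.stdio;
    import std.string;
    import std.utf;
    import std.algorithm;

    // blank + UTF-8 encoded U+00DC (two code units instead of one Latin-1 byte)
    string[] src = [ " \xc3\x9c" ];

    auto result = src
        .map!(a => a.byCodeUnit)   // iterate code units, no autodecoding
        .map!(a => a.stripLeft);   // drop the leading blank
    result.writeln;
}
```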


-Steve
