On 8/2/17 11:02 AM, kdevel wrote:
> On Wednesday, 2 August 2017 at 13:45:01 UTC, Steven Schveighoffer wrote:
>> As Daniel said, using byCodeUnit will help.
>
> stripLeft seems to autodecode even when fed with CodeUnits. How do I prevent this?
>
>        1 void main ()
>        2 {
>        3    import std.stdio;
>        4    import std.string;
>        5    import std.conv;
>        6    import std.utf;
>        7    import std.algorithm;
>        8
>        9    string [] src = [ " \xfc" ]; // blank + latin-1 encoded u umlaut
>       10    auto result = src
>       11       .map!(a => a.byCodeUnit)
>       12       .map!(a => a.stripLeft);
>       13    result.writeln;
>       14 }
>
> Crashes with a C++-like dump.


First, as a tip, please post either a link to a paste site, or don't put the line numbers. It's much easier to copy-paste your code into an editor if you don't have the line numbers.

What has happened is that you injected a code point that is not UTF-8 encoded: \xfc is the Latin-1 byte for ü, not a UTF-8 sequence. In UTF-8, any code point above 0x7f must be encoded as a sequence of two to four code units. See the table on this page (it shows the capital letter Ü): https://en.wikipedia.org/wiki/%C3%9C
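To make the encoding point concrete, here is a small D sketch using std.utf (an illustration, not code from the thread). It checks that U+00FC (ü) encodes to the two code units 0xC3 0xBC, that a `\u` escape in a string literal is already encoded that way by the compiler, and that the lone Latin-1 byte 0xFC is rejected by `validate`:

```d
import std.exception : assertThrown;
import std.utf : encode, validate, UTFException;

void main()
{
    // U+00FC (lowercase ü) is above 0x7F, so UTF-8 needs more than one code unit.
    immutable dchar uUmlaut = '\u00FC';
    char[4] buf;
    immutable len = encode(buf, uUmlaut);
    assert(len == 2);
    assert(cast(ubyte) buf[0] == 0xC3 && cast(ubyte) buf[1] == 0xBC);

    // A \u escape in a string literal is stored as UTF-8 code units.
    assert(" \u00FC" == " \xC3\xBC");

    // A bare Latin-1 byte 0xFC is not a valid UTF-8 sequence.
    assertThrown!UTFException(validate(" \xFC"));
}
```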

If we use a correctly encoded code unit sequence (0xc3 0x9c, the capital Ü from that table; lowercase ü would be 0xc3 0xbc), then it works: https://run.dlang.io/is/4umQoo
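For reference, a sketch of the original program with the string fixed, assuming lowercase ü (0xC3 0xBC) rather than the Ü used in the linked run. With valid UTF-8 input, stripLeft removes the leading blank even if it still decodes internally (presumably so it can recognize non-ASCII whitespace as defined by std.uni.isWhite):

```d
import std.algorithm : equal, map;
import std.stdio : writeln;
import std.string : stripLeft;
import std.utf : byCodeUnit;

void main()
{
    // A blank followed by ü correctly encoded as the UTF-8 code units 0xC3 0xBC.
    string[] src = [" \xC3\xBC"];
    auto result = src
        .map!(a => a.byCodeUnit)
        .map!(a => a.stripLeft);

    // The blank is gone; the two code units of ü remain untouched.
    assert(result.front.equal("\xC3\xBC".byCodeUnit));
    writeln(result);
}
```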

-Steve
