On Tuesday, 13 July 2021 at 05:37:49 UTC, ag0aep6g wrote:
On 13.07.21 03:03, someone wrote:
On Monday, 12 July 2021 at 23:28:29 UTC, ag0aep6g wrote:
[...]
I'm not sure where we stand with `in`

You mean *we* as in D developers?

Yes. Let me rephrase and elaborate: I'm not sure what the current status of `in` is. It used to mean `const scope`. But DIP1000 changes the effects of `scope` and there was some discussion about its relation to `in`.

Checking the spec, it says that `in` simply means `const` unless you use `-preview=in`. The preview switch makes it `const scope` again, but that's not all: it also allows the compiler to pass `in` parameters by reference.

https://dlang.org/spec/function.html#in-params
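A minimal sketch of the two spellings being discussed; the helper names are hypothetical. Without `-preview=in` both declarations should behave the same, since `in` is then plain `const`. With the switch, the second one becomes `const scope`, and the compiler may choose to pass the argument by reference.

```d
import std.algorithm.searching : count;
import std.stdio : writeln;

// Hypothetical helper: read-only input declared `const` (the interim choice).
size_t countBlanksConst(const string text)
{
    return text.count(' ');
}

// Same helper declared `in`: just `const` by default; `const scope`
// (and possibly passed by reference) when compiled with -preview=in.
size_t countBlanksIn(in string text)
{
    return text.count(' ');
}

void main()
{
    writeln(countBlanksConst("a b c"), " ", countBlanksIn("a b c")); // 2 2
}
```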

ACK. So for the time being I'll revert all my input parameters to `const` (unless they're `ref` or `out`, of course), and when the whole `in` DIP matter is resolved (one way or the other) I'll switch them back (or not) accordingly. Parameters declared `in` read more naturally than `const` (and pair nicely with `out`), but right now what I need to get right is function, not form.

For a UDT like mine I think it makes a lot of sense, because when I think of a string and want to chop/count/whatever on it, my mind works one-based, not zero-based. Say I need the "b" in "abc": `mid("abc", 2, 1)` comes a lot more easily to me than `mid("abc", 1, 1)`. Besides, I am *not* returning a range or a slice referencing a range or whatever; I am returning a whole newly constructed string. If I were returning a range I would follow common sense, since of course I don't know what will be done with it afterwards.
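A minimal sketch of a one-based, grapheme-aware `mid` like the one described, built only from Phobos' `byGrapheme`, `drop`, `take` and `byCodePoint`; the helper's name and signature are hypothetical. Note that the one-based convention lives in exactly one place, the `start - 1` conversion.

```d
import std.array : array;
import std.conv : to;
import std.range : drop, take;
import std.stdio : writeln;
import std.uni : byCodePoint, byGrapheme;

// Hypothetical one-based, grapheme-aware substring: mid(s, start, count).
string mid(string s, size_t start, size_t count = 1)
{
    assert(start >= 1, "positions are one-based");
    return s.byGrapheme
        .drop(start - 1)   // one-based position -> zero-based offset
        .take(count)       // at most `count` grapheme clusters
        .byCodePoint       // flatten the clusters back into code points
        .array             // dchar[]
        .to!string;        // re-encode as UTF-8
}

void main()
{
    writeln(mid("abc", 2, 1)); // b
    writeln(mid("日本語", 3)); // 語
}
```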

I think you're setting yourself up for off-by-one bugs by going against the grain like that. Your functions are one-based. The rest of the D world, including the standard library, is zero-based. You're bound to forget to account for the difference.
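To make the off-by-one risk concrete, here is a hedged sketch of the zero-based, half-open convention that D slices and Phobos ranges use; the helper `clusterSlice` is hypothetical. Passing the one-based position 2 where a zero-based index is expected silently selects the following cluster.

```d
import std.array : array;
import std.conv : to;
import std.stdio : writeln;
import std.uni : byCodePoint, byGrapheme, Grapheme;

// Hypothetical helper: zero-based, half-open [begin, end) slice over grapheme clusters.
string clusterSlice(string s, size_t begin, size_t end)
{
    Grapheme[] clusters = s.byGrapheme.array;
    return clusters[begin .. end].byCodePoint.array.to!string;
}

void main()
{
    writeln(clusterSlice("abc", 1, 2)); // b: the second cluster has zero-based index 1
    writeln(clusterSlice("abc", 2, 3)); // c: using the one-based position 2 is off by one
}
```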

And I think you have a good point. I'll reconsider.

But it's your code, and you can do whatever you want, of course. Just looked like it might be a mistake.

All in all, the whole module was updated accordingly and it seems to be working as expected (further testing needed). In the meantime, I learned a lot by following the advice given by you, Ali, and others in this forum:

```d
/// implementation-bugs [-] using foreach (with this structure) always misses the last grapheme‐cluster … possible phobos bug # 20483 @ unittest's last line

/// implementation‐tasks [+] reconsider making this whole UDT zero‐based as suggested by ag0aep6g (who has a good point)
/// implementation‐tasks [+] reconsider excessive cast usage as suggested by Ali: bypassing compiler checks could be potentially harmful … cast and integer promotion @ http://ddili.org/ders/d.en/cast.html
/// implementation‐tasks [-] for the time being input parameters are declared const instead of in; eventually they'll be back to in once the related DIP is settled for good; but definitely not scope const

/// implementation‐tasks‐possible [-] pad[L|R]
/// implementation‐tasks‐possible [-] replicate/repeat
/// implementation‐tasks‐possible [-] replace(string, string)
/// implementation‐tasks‐possible [-] translate(string, string) … same‐size strings matching one‐to‐one

/// usage: array slicing can be used for usual things like left() right() substr() etc … mainly when grapheme‐clusters are not expected at all
/// usage: array slicing needs a zero‐based first range argument and a one‐based second one (or one‐past‐the‐end), which is somewhat counter‐intuitive

module fw.types.UniCode;

import std.algorithm : map, joiner;
import std.array : array;
import std.conv : to;
import std.range : walkLength, take, tail, drop, dropBack; /// repeat, padLeft, padRight
import std.stdio;
import std.uni : Grapheme, byGrapheme;

/// within this file: gudtUGC



shared static this() { } /// executed only‐once per‐app
static this() { } /// executed only‐once per‐thread
static ~this() { } /// executed only‐once per‐thread
shared static ~this() { } /// executed only‐once per‐app



alias stringUGC = Grapheme;
alias stringUGC08 = gudtUGC!(stringUTF08);
alias stringUGC16 = gudtUGC!(stringUTF16);
alias stringUGC32 = gudtUGC!(stringUTF32);
alias stringUTF08 = string;  /// same as immutable(char )[];
alias stringUTF16 = wstring; /// same as immutable(wchar)[];
alias stringUTF32 = dstring; /// same as immutable(dchar)[];

/// mixin templateUGC!(stringUTF08, r"gudtUGC08"d);
/// mixin templateUGC!(stringUTF16, r"gudtUGC16"d);
/// mixin templateUGC!(stringUTF32, r"gudtUGC32"d);
/// template templateUGC (typeStringUTF, alias lstrStructureID) { /// if these were possible there would be no need for stringUGC## aliases in main()

public struct gudtUGC(typeStringUTF) { /// UniCode grapheme‐cluster‐aware string manipulation (implemented for one‐based operations)

   /// provides: public property size_t count

   /// provides: public size_t decode(typeStringUTF strSequence)
   /// provides: public typeStringUTF encode()

   /// provides: public gudtUGC!(typeStringUTF) take(size_t intStart, size_t intCount = 1)
   /// provides: public gudtUGC!(typeStringUTF) takeL(size_t intCount)
   /// provides: public gudtUGC!(typeStringUTF) takeR(size_t intCount)
   /// provides: public gudtUGC!(typeStringUTF) chopL(size_t intCount)
   /// provides: public gudtUGC!(typeStringUTF) chopR(size_t intCount)
   /// provides: public gudtUGC!(typeStringUTF) padL(size_t intCount, typeStringUTF strPadding = r" ")
   /// provides: public gudtUGC!(typeStringUTF) padR(size_t intCount, typeStringUTF strPadding = r" ")

   /// provides: public typeStringUTF takeasUTF(size_t intStart, size_t intCount = 1)
   /// provides: public typeStringUTF takeLasUTF(size_t intCount)
   /// provides: public typeStringUTF takeRasUTF(size_t intCount)
   /// provides: public typeStringUTF chopLasUTF(size_t intCount)
   /// provides: public typeStringUTF chopRasUTF(size_t intCount)
   /// provides: public typeStringUTF padLasUTF(size_t intCount, typeStringUTF strPadding = r" ") … pending
   /// provides: public typeStringUTF padRasUTF(size_t intCount, typeStringUTF strPadding = r" ") … pending

   /// usage; eg: stringUGC32("äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese"d).take(35, 3).take(1,2).take(1,1).encode(); /// 日
   /// usage; eg: stringUGC32("äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese"d).take(35).encode(); /// 日
   /// usage; eg: stringUGC32("äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese"d).takeasUTF(35); /// 日

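   void popFront() { ++pintSequenceCurrent; }
   /// one‐based cursor: empty only once it has moved past the last grapheme‐cluster (stopping at == count would skip the last one)
   bool empty() { return pintSequenceCurrent < cast(size_t) 1 || pintSequenceCurrent > pintSequenceCount; }
   typeStringUTF front() { return takeasUTF(pintSequenceCurrent); }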

   private stringUGC[] pugcSequence;
   private size_t pintSequenceCount = cast(size_t) 0;
   private size_t pintSequenceCurrent = cast(size_t) 0;

   @property public size_t count() { return pintSequenceCount; }

   this(
      const typeStringUTF lstrSequence
      ) {

      /// (1) given UTF‐encoded sequence

      decode(lstrSequence);

   }

   @safe public size_t decode( /// UniCode (UTF‐encoded → grapheme‐cluster) sequence
      const typeStringUTF lstrSequence
      ) {

      /// (1) given UTF‐encoded sequence

      size_t lintSequenceCount = cast(size_t) 0;

      if (lstrSequence is null) {

         pugcSequence = null;
         pintSequenceCount = cast(size_t) 0;
         pintSequenceCurrent = cast(size_t) 0;

      } else {

         pugcSequence = lstrSequence.byGrapheme.array;
         pintSequenceCount = pugcSequence.walkLength;
         pintSequenceCurrent = cast(size_t) 1;

         lintSequenceCount = pintSequenceCount;

      }

      return lintSequenceCount;

   }

   @safe public typeStringUTF encode() { /// UniCode (grapheme‐cluster → UTF‐encoded) sequence

      typeStringUTF lstrSequence = null;

      if (pintSequenceCount >= cast(size_t) 1) {

         lstrSequence = pugcSequence
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public gudtUGC!(typeStringUTF) take( /// UniCode (grapheme‐cluster → grapheme‐cluster) sequence
      const size_t lintStart,
      const size_t lintCount = cast(size_t) 1
      ) {

      /// (1) given start position >= 1
      /// (2) given count >= 1

      gudtUGC!(typeStringUTF) lugcSequence;

      if (lintStart >= cast(size_t) 1 && lintCount >= cast(size_t) 1) {

         /// eg#1: take(1,3) → range#1=start-1=1-1=0 and range#2=range#1+count=0+3=3 → 0..3
         /// eg#1: take(6,3) → range#1=start-1=6-1=5 and range#2=range#1+count=5+3=8 → 5..8

         /// eg#2: take(01,1) → range#1=start-1=01-1=00 and range#2=range#1+count=00+1=01 → 00..01
         /// eg#2: take(50,1) → range#1=start-1=50-1=49 and range#2=range#1+count=49+1=50 → 49..50

         size_t lintRange1 = lintStart - cast(size_t) 1;
         size_t lintRange2 = lintRange1 + lintCount;

         if (lintRange2 <= pintSequenceCount) {

            lugcSequence = gudtUGC!(typeStringUTF)(pugcSequence[lintRange1..lintRange2]
               .map!((ref g) => g[])
               .joiner
               .to!(typeStringUTF)
               );

         }

      }

      return lugcSequence;

   }

   @safe public gudtUGC!(typeStringUTF) takeL( /// UniCode (grapheme‐cluster → grapheme‐cluster) sequence
      const size_t lintCount
      ) {

      /// (1) given count >= 1

      gudtUGC!(typeStringUTF) lugcSequence;

      if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) {

         lugcSequence = gudtUGC!(typeStringUTF)(pugcSequence
            .take(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            );

      }

      return lugcSequence;

   }

   @safe public gudtUGC!(typeStringUTF) takeR( /// UniCode (grapheme‐cluster → grapheme‐cluster) sequence
      const size_t lintCount
      ) {

      /// (1) given count >= 1

      gudtUGC!(typeStringUTF) lugcSequence;

      if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) {

         lugcSequence = gudtUGC!(typeStringUTF)(pugcSequence
            .tail(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            );

      }

      return lugcSequence;

   }

   @safe public gudtUGC!(typeStringUTF) chopL( /// UniCode (grapheme‐cluster → grapheme‐cluster) sequence
      const size_t lintCount
      ) {

      /// (1) given count >= 1

      gudtUGC!(typeStringUTF) lugcSequence;

      if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) {

         lugcSequence = gudtUGC!(typeStringUTF)(pugcSequence
            .drop(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            );

      }

      return lugcSequence;

   }

   @safe public gudtUGC!(typeStringUTF) chopR( /// UniCode (grapheme‐cluster → grapheme‐cluster) sequence
      const size_t lintCount
      ) {

      /// (1) given count >= 1

      gudtUGC!(typeStringUTF) lugcSequence;

      if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) {

         lugcSequence = gudtUGC!(typeStringUTF)(pugcSequence
            .dropBack(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            );

      }

      return lugcSequence;

   }

   @safe public typeStringUTF takeasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence
      const size_t lintStart,
      const size_t lintCount = cast(size_t) 1
      ) {

      /// (1) given start position >= 1
      /// (2) given count >= 1

      typeStringUTF lstrSequence = null;

      if (lintStart >= cast(size_t) 1 && lintCount >= cast(size_t) 1) {

         /// eg#1: takeasUTF(1,3) → range#1=start-1=1-1=0 and range#2=range#1+count=0+3=3 → 0..3
         /// eg#1: takeasUTF(6,3) → range#1=start-1=6-1=5 and range#2=range#1+count=5+3=8 → 5..8

         /// eg#2: takeasUTF(01,1) → range#1=start-1=01-1=00 and range#2=range#1+count=00+1=01 → 00..01
         /// eg#2: takeasUTF(50,1) → range#1=start-1=50-1=49 and range#2=range#1+count=49+1=50 → 49..50

         size_t lintRange1 = lintStart - cast(size_t) 1;
         size_t lintRange2 = lintRange1 + lintCount;

         if (lintRange2 <= pintSequenceCount) {

            lstrSequence = pugcSequence[lintRange1..lintRange2]
               .map!((ref g) => g[])
               .joiner
               .to!(typeStringUTF)
               ;

         }

      }

      return lstrSequence;

   }

   @safe public typeStringUTF takeLasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence
      const size_t lintCount
      ) {

      /// (1) given count >= 1

      typeStringUTF lstrSequence = null;

      if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) {

         lstrSequence = pugcSequence
            .take(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public typeStringUTF takeRasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence
      const size_t lintCount
      ) {

      /// (1) given count >= 1

      typeStringUTF lstrSequence = null;

      if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) {

         lstrSequence = pugcSequence
            .tail(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public typeStringUTF chopLasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence
      const size_t lintCount
      ) {

      /// (1) given count >= 1

      typeStringUTF lstrSequence = null;

      if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) {

         lstrSequence = pugcSequence
            .drop(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public typeStringUTF chopRasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence
      const size_t lintCount
      ) {

      /// (1) given count >= 1

      typeStringUTF lstrSequence = null;

      if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) {

         lstrSequence = pugcSequence
            .dropBack(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public typeStringUTF padLasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence
      const size_t lintCount,
      const typeStringUTF lstrPadding = cast(typeStringUTF) r" "
      ) {

      /// (1) given count >= 1
      /// [2] given padding (default is a single blank space)

      typeStringUTF lstrSequence = null;

      if (lintCount >= cast(size_t) 1 && lintCount > pintSequenceCount) {

         lstrSequence = null; /// pending

      }

      return lstrSequence;

   }

   @safe public typeStringUTF padRasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence
      const size_t lintCount,
      const typeStringUTF lstrPadding = cast(typeStringUTF) r" "
      ) {

      /// (1) given count >= 1
      /// [2] given padding (default is a single blank space)

      typeStringUTF lstrSequence = null;

      if (lintCount >= cast(size_t) 1 && lintCount > pintSequenceCount) {

         lstrSequence = null; /// pending

      }

      return lstrSequence;

   }

}

unittest {

   version (useUTF08) {
      stringUTF08 lstrSequence1 = r"12345678901234567890123456789012345678901234567890"c;
      stringUTF08 lstrSequence2 = r"1234567890АВГДЕЗИЙКЛABCDEFGHIJabcdefghijQRSTUVWXYZ"c;
      stringUTF08 lstrSequence3 = "äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎"c;
   }

   version (useUTF16) {
      stringUTF16 lstrSequence1 = r"12345678901234567890123456789012345678901234567890"w;
      stringUTF16 lstrSequence2 = r"1234567890АВГДЕЗИЙКЛABCDEFGHIJabcdefghijQRSTUVWXYZ"w;
      stringUTF16 lstrSequence3 = "äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎"w;
   }

   version (useUTF32) {
      stringUTF32 lstrSequence1 = r"12345678901234567890123456789012345678901234567890"d;
      stringUTF32 lstrSequence2 = r"1234567890АВГДЕЗИЙКЛABCDEFGHIJabcdefghijQRSTUVWXYZ"d;
      stringUTF32 lstrSequence3 = "äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎"d;
   }

   size_t lintSequence1sizeUTF = lstrSequence1.length;
   size_t lintSequence2sizeUTF = lstrSequence2.length;
   size_t lintSequence3sizeUTF = lstrSequence3.length;

   size_t lintSequence1sizeUGA = lstrSequence1.walkLength;
   size_t lintSequence2sizeUGA = lstrSequence2.walkLength;
   size_t lintSequence3sizeUGA = lstrSequence3.walkLength;

   size_t lintSequence1sizeUGC = lstrSequence1.byGrapheme.walkLength;
   size_t lintSequence2sizeUGC = lstrSequence2.byGrapheme.walkLength;
   size_t lintSequence3sizeUGC = lstrSequence3.byGrapheme.walkLength;

   assert(lintSequence1sizeUGC == cast(size_t) 50);
   assert(lintSequence2sizeUGC == cast(size_t) 50);
   assert(lintSequence3sizeUGC == cast(size_t) 50);

   assert(lintSequence1sizeUGA == cast(size_t) 50);
   assert(lintSequence2sizeUGA == cast(size_t) 50);
   assert(lintSequence3sizeUGA == cast(size_t) 52);

   version (useUTF08) {
   assert(lintSequence1sizeUTF == cast(size_t) 50);
   assert(lintSequence2sizeUTF == cast(size_t) 60);
   assert(lintSequence3sizeUTF == cast(size_t) 91);
   }

   version (useUTF16) {
   assert(lintSequence1sizeUTF == cast(size_t) 50);
   assert(lintSequence2sizeUTF == cast(size_t) 50);
   assert(lintSequence3sizeUTF == cast(size_t) 57);
   }

   version (useUTF32) {
   assert(lintSequence1sizeUTF == cast(size_t) 50);
   assert(lintSequence2sizeUTF == cast(size_t) 50);
   assert(lintSequence3sizeUTF == cast(size_t) 52);
   }

   /// the following should be the same regardless of the encoding being used and is the whole point of this UDT being made:

   version (useUTF08) { alias stringUTF = stringUTF08; stringUGC08 lugcSequence3 = stringUGC08(lstrSequence3); }
   version (useUTF16) { alias stringUTF = stringUTF16; stringUGC16 lugcSequence3 = stringUGC16(lstrSequence3); }
   version (useUTF32) { alias stringUTF = stringUTF32; stringUGC32 lugcSequence3 = stringUGC32(lstrSequence3); }

   assert(lugcSequence3.encode() == lstrSequence3);

   assert(lugcSequence3.take(35, 3).take(1,2).take(1,1).encode() == cast(stringUTF) r"日");

   assert(lugcSequence3.take(21).encode() == cast(stringUTF) r"р");
   assert(lugcSequence3.take(27).encode() == cast(stringUTF) r"й");
   assert(lugcSequence3.take(35).encode() == cast(stringUTF) r"日");
   assert(lugcSequence3.take(37).encode() == cast(stringUTF) r"語");
   assert(lugcSequence3.take(21, 7).encode() == cast(stringUTF) r"русский");
   assert(lugcSequence3.take(35, 3).encode() == cast(stringUTF) r"日本語");

   assert(lugcSequence3.takeasUTF(21) == cast(stringUTF) r"р");
   assert(lugcSequence3.takeasUTF(27) == cast(stringUTF) r"й");
   assert(lugcSequence3.takeasUTF(35) == cast(stringUTF) r"日");
   assert(lugcSequence3.takeasUTF(37) == cast(stringUTF) r"語");
   assert(lugcSequence3.takeasUTF(21, 7) == cast(stringUTF) r"русский");
   assert(lugcSequence3.takeasUTF(35, 3) == cast(stringUTF) r"日本語");

   assert(lugcSequence3.takeL(1).encode() == cast(stringUTF) r"ä");
   assert(lugcSequence3.takeR(1).encode() == cast(stringUTF) r"😎");
   assert(lugcSequence3.takeL(7).encode() == cast(stringUTF) r"äëåčñœß");
   assert(lugcSequence3.takeR(16).encode() == cast(stringUTF) r"日本語 = japanese 😎");

   assert(lugcSequence3.takeLasUTF(1) == cast(stringUTF) r"ä");
   assert(lugcSequence3.takeRasUTF(1) == cast(stringUTF) r"😎");
   assert(lugcSequence3.takeLasUTF(7) == cast(stringUTF) r"äëåčñœß");
   assert(lugcSequence3.takeRasUTF(16) == cast(stringUTF) r"日本語 = japanese 😎");

   assert(lugcSequence3.chopL(10).encode() == cast(stringUTF) r"russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎");
   assert(lugcSequence3.chopR(21).encode() == cast(stringUTF) r"äëåčñœß … russian = русский 🇷🇺");

   assert(lugcSequence3.chopLasUTF(10) == cast(stringUTF) r"russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎");
   assert(lugcSequence3.chopRasUTF(21) == cast(stringUTF) r"äëåčñœß … russian = русский 🇷🇺");

   version (useUTF08) { stringUTF08 lstrSequence3reencoded; }
   version (useUTF16) { stringUTF16 lstrSequence3reencoded; }
   version (useUTF32) { stringUTF32 lstrSequence3reencoded; }

   for (
      size_t lintSequenceUGC = cast(size_t) 1;
      lintSequenceUGC <= lintSequence3sizeUGC;
      ++lintSequenceUGC
      ) {

      lstrSequence3reencoded ~= lugcSequence3.takeasUTF(lintSequenceUGC);

   }

   assert(lstrSequence3reencoded == lstrSequence3);

   lstrSequence3reencoded = null;

   version (useUTF08) { foreach (stringUTF08 lstrSequence3UGC; lugcSequence3) { lstrSequence3reencoded ~= lstrSequence3UGC; } }
   version (useUTF16) { foreach (stringUTF16 lstrSequence3UGC; lugcSequence3) { lstrSequence3reencoded ~= lstrSequence3UGC; } }
   version (useUTF32) { foreach (stringUTF32 lstrSequence3UGC; lugcSequence3) { lstrSequence3reencoded ~= lstrSequence3UGC; } }

   //assert(lstrSequence3reencoded == lstrSequence3); /// ooops … always missing last grapheme‐cluster: possible bug # 20483

}
```
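With a one-based cursor, the `empty()` test decides whether the last element is ever produced: if the cursor starts at 1 and iteration stops as soon as it equals the count, the final grapheme cluster is never yielded, which matches the foreach symptom described in the code comments. A minimal standalone sketch of such a range (the struct name `OneBasedClusters` is hypothetical; it only mirrors the cursor scheme of `gudtUGC`):

```d
import std.array : array;
import std.conv : to;
import std.stdio : writeln;
import std.uni : byGrapheme, Grapheme;

// Hypothetical minimal one-based range over grapheme clusters: the cursor
// starts at 1 and front() reads the cluster at one-based position `current`.
struct OneBasedClusters
{
    private Grapheme[] clusters;
    private size_t current = 1;

    this(string s) { clusters = s.byGrapheme.array; }

    // Empty only after the cursor has moved past the last one-based position;
    // testing `current == clusters.length` here would skip the final cluster.
    bool empty() const { return current > clusters.length; }

    // Convert the one-based cursor to a zero-based array index.
    string front() { return clusters[current - 1][].array.to!string; }

    void popFront() { ++current; }
}

void main()
{
    foreach (cluster; OneBasedClusters("añ日😎"))
        writeln(cluster);
}
```

Because `foreach` iterates over a copy of a struct range, the original value is left unconsumed and can be iterated again.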
