Re: Major performance problem with std.array.front()

Nick Sabalausky Mon, 10 Mar 2014 00:11:34 -0700

On 3/10/2014 12:23 AM, Walter Bright wrote:

On 3/9/2014 9:19 PM, Nick Sabalausky wrote:

On 3/9/2014 6:31 PM, Walter Bright wrote:

On 3/9/2014 6:08 AM, "Marc Schütz" <schue...@gmx.net> wrote:

Also, `byCodeUnit` and `byCodePoint` would probably be better names
than `raw` and `decode`, to match the already existing `byGrapheme`
in std.uni.


I'd vastly prefer 'byChar', 'byWchar', 'byDchar' for each of string,
wstring, dstring, and InputRange!char, etc.


'byCodePoint' and 'byDchar' are the same. However, 'byCodeUnit' is
completely different from anything else:

string  str;
wstring wstr;
dstring dstr;

(str|wstr|dstr).byChar  // Always range of char
(str|wstr|dstr).byWchar // Always range of wchar
(str|wstr|dstr).byDchar // Always range of dchar

str.representation  // Range of ubyte
wstr.representation // Range of ushort
dstr.representation // Range of uint

str.byCodeUnit  // Range of char
wstr.byCodeUnit // Range of wchar
dstr.byCodeUnit // Range of dchar


I don't see much point to the latter 3.


Do you mean:

1. You don't see the point to iterating by code unit?

2. You don't see the point to 'byCodeUnit' if we have 'representation'?

3. You don't see the point to 'byCodeUnit' if we have 'byChar/byWchar/byDchar'?

4. You don't see the point to having 'byCodeUnit' work on UTF-32 dstrings?

Responses:

1. Iterating by code unit: Useful for tweaking performance any time decoding is unnecessary. For example, parsing a grammar where the bulk of the keywords and operators are ASCII. (Occasional uses of Unicode, like Unicode whitespace, can of course be handled easily enough by the lexer FSM.)

2. 'byCodeUnit' if we have 'representation': This one I have trouble answering, since I'm still unclear on the purpose of 'representation' (I wasn't even aware of it until a few days ago). I've been assuming there's some specific use-case I've overlooked where it's useful to iterate by code unit *while* treating the code units as if they weren't UTF-8/16/32 at all. But since 'representation' is called *on* a string/wstring/dstring, they should already be UTF-8/16/32 anyway, not some other encoding that would necessitate using integer types. Or maybe it's just for working around problems with the auto-verification being too eager (I've run into those)? I admit I don't quite get 'representation'.

3. 'byCodeUnit' if we have 'byChar/byWchar/byDchar': To avoid a "static if" chain every time you want to use code units inside generic code. Also, so in non-generic code you can change your data type without updating instances of 'by*char'.

4. Having 'byCodeUnit' work on UTF-32 dstrings: So generic code working on code units doesn't have to special-case UTF-32.
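
To make points 3 and 4 concrete, here is a minimal sketch of generic code that walks code units without a "static if" chain. It assumes a 'byCodeUnit' that accepts string, wstring and dstring alike, as proposed above; the one that present-day Phobos ships in std.utf behaves this way. The helper name 'countAsciiSpaces' is made up purely for illustration.

```d
import std.utf : byCodeUnit;

// Hypothetical helper (not part of Phobos): counts ASCII space code units.
// The same body works for UTF-8, UTF-16 and UTF-32 input, and no decoding
// happens because the loop runs over code units rather than the string.
size_t countAsciiSpaces(S)(S s)
{
    size_t n = 0;
    foreach (cu; s.byCodeUnit)
    {
        // cu is a char, wchar or dchar depending on S
        if (cu == ' ')
            ++n;
    }
    return n;
}

void main()
{
    assert(countAsciiSpaces("a b c")  == 2); // string:  UTF-8 code units
    assert(countAsciiSpaces("a b c"w) == 2); // wstring: UTF-16 code units
    assert(countAsciiSpaces("é b c"d) == 2); // dstring: UTF-32 code units
}
```

With only 'byChar/byWchar/byDchar' available, the template would need a "static if" on the element type to pick the right one, or it would transcode when the caller only wanted the native code units.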
