On Mon, 17 May 2010 14:00:41 -0400, Simen kjaeraas <simen.kja...@gmail.com> wrote:

Andrei Alexandrescu <seewebsiteforem...@erdani.org> wrote:

I have two unrelated suggestions about unjoin.

First, you may want to follow the model set by splitter() instead of split() when defining unjoin(). This is because split() allocates memory whereas splitter splits lazily so it doesn't need to. If you do want split(), just call array(splitter()).
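To make the lazy/eager distinction concrete, here is a minimal sketch. It assumes the present-day Phobos names std.algorithm.splitter and std.array.array, which may differ in detail from the 2010 library under discussion; the sample string is made up for illustration.

```d
import std.algorithm : equal, splitter;
import std.array : array;

void main()
{
    string str = "a; b; c"; // illustrative input, not from the thread

    // Lazy: splitter returns a range that yields slices of str on demand.
    auto pieces = splitter(str, "; ");
    assert(equal(pieces, ["a", "b", "c"]));

    // Eager: array() walks the range and stores the pieces in a new string[].
    string[] stored = array(splitter(str, "; "));
    assert(stored == ["a", "b", "c"]);
}
```

The only allocation here is the one array() performs to hold the result; the lazy range itself just keeps slices into str.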

Second, there is an ambiguity between splitting using a string as separator and splitting using a set of characters as separator. This could be solved by simply using different names:

string str = ...;
foreach (s; splitByOneOf(str, "; ")) { ... }
foreach (s; splitter(str, "; ")) { ... }

The first loop splits by either of the two characters, whereas the second splits by the exact string "; ".
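The difference between the two meanings shows up as soon as the separators are adjacent. The sketch below is only an illustration: splitByOneOf does not exist, so the "one of" case is emulated with the predicate form of std.algorithm.splitter from present-day Phobos, and the input string is an assumption.

```d
import std.algorithm : equal, splitter;

void main()
{
    string str = "a; b"; // illustrative input

    // "One of" semantics: ';' and ' ' each end a piece, so the two
    // adjacent separators produce an empty piece between them.
    auto byEither = splitter!(c => c == ';' || c == ' ')(str);
    assert(equal(byEither, ["a", "", "b"]));

    // Exact-string semantics: only the two-character sequence "; " separates.
    auto byString = splitter(str, "; ");
    assert(equal(byString, ["a", "b"]));
}
```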

An idea I am toying with is to factor things out into the data types. After all, if I'm splitting by "one of" an element in a set of elements, that should be reflected in the set's type. For example:

foreach (s; splitter(str, either(';', ' '))) { ... }
foreach (s; splitter(str, "; ")) { ... }

or, using a more general notion of a set:

foreach (s; splitter(str, set(';', ' '))) { ... }

D could use a set type, and this is a very nice way to specify these
different parameters.

votes = -~votes;

Comparing splitByOneOf(str, "; ") to splitter(str, set(';', ' ')), I see one major difference here -- "; " is a literal, set(';', ' ') is not.

I would expect that 'set' as a generic set type would implement its guts as some sort of tree/hash, which means a lot of overhead for a simple argument. With the literal version, the notation is in the function, not the type. While it seems rather neat, the overhead should be considered.

A compromise:

foreach(x; splitter(str, either("; ")))

Which can be implemented without heap activity.
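A rough sketch of how that might look, to show that no heap is needed. Everything here is hypothetical: Either, either(), EitherSplitter and splitEither are names made up for illustration, not Phobos API. It also assumes single-byte (ASCII) separators and compares code units only. The @nogc attributes let the compiler check that the wrapper and the range do not allocate.

```d
import std.algorithm : equal;

// Hypothetical wrapper: tags a string as "a set of separator characters".
// It only stores the slice it is given, so building one allocates nothing.
struct Either
{
    string chars;
}

Either either(string chars) @nogc nothrow
{
    return Either(chars);
}

// Hypothetical lazy range yielding slices of the input between separators.
struct EitherSplitter
{
    private string rest;         // part of the input not yet consumed
    private string seps;         // separator characters (assumed ASCII)
    private string head;         // current piece
    private bool pending = true; // a piece still has to be cut from rest
    private bool finished;

    this(string text, string seps) @nogc nothrow
    {
        this.rest = text;
        this.seps = seps;
        advance();
    }

    @property bool empty() const @nogc nothrow { return finished; }
    @property string front() const @nogc nothrow { return head; }
    void popFront() @nogc nothrow { advance(); }

    private bool isSep(char c) const @nogc nothrow
    {
        foreach (char s; seps)
            if (s == c) return true;
        return false;
    }

    private void advance() @nogc nothrow
    {
        if (!pending) { finished = true; return; }
        foreach (i, char c; rest)
        {
            if (isSep(c))
            {
                head = rest[0 .. i];
                rest = rest[i + 1 .. $];
                return;
            }
        }
        // No separator left: the remainder is the last piece.
        head = rest;
        rest = null;
        pending = false;
    }
}

auto splitEither(string str, Either sep) @nogc nothrow
{
    return EitherSplitter(str, sep.chars);
}

void main()
{
    assert(equal(splitEither("a;b c", either("; ")), ["a", "b", "c"]));
    assert(equal(splitEither("a; b", either("; ")), ["a", "", "b"]));
}
```

The wrapper type carries the "one of" meaning in the argument, as Andrei suggested, while the separators themselves stay a plain string literal.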

-Steve
