Re: A better way to write this function? (style question)

Brad Anderson Mon, 30 Dec 2013 14:36:26 -0800

On Monday, 30 December 2013 at 21:40:58 UTC, Thomas Gann wrote:

I've written a Markov bot in D, and I have a function whose job it is to take an input string, convert all newline characters to spaces and all uppercase letters to lowercase, and then return an array of words that are generated by splitting the string up by whitespace. Here is the function in question:
string[] split_sentence(string input)
{
    string line;

    foreach(c; input)
    {
        if(c == '\n' || c == '\r')
            line ~= ' ';

        else
            line ~= c.toLower();
    }

    return line.splitter(' ').filter!(a => a.length).array;
}
Obviously, one issue is that because the string is immutable, I can't modify it directly, and so I actually build an entirely new string. I would have just made a mutable duplicate of the input and modified that, but then I would get errors when returning, because the function expects string[] and not char[][]. Is there a more elegant way to do what I'm doing?


Not a huge improvement and it could be a lot faster, no doubt:

    string[] split_sentence(string input)
    {
        import std.regex;
        auto split_re = ctRegex!(r"[\n\r ]");
        return input.split(split_re)
                    .map!toLower
                    .filter!(a => !a.empty)
                    .array();
    }

You could return the result of filter instead of an array to make it lazy and avoid an allocation if the caller is able to use it in that form. toLower had some really slow and incorrect behavior but I think that was fixed in 2.064. It still might be ASCII only though. I believe std.uni has a toLower that is correct for all of Unicode.
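
For what it's worth, here is a rough sketch of what the lazy variant could look like. The name lazySplitSentence is made up for illustration, and I have swapped the regex for std.algorithm's predicate-based splitter, which is a different technique from the ctRegex version above. It assumes std.uni.toLower is the Unicode-aware overload you want, and that the caller can work with a range instead of a string[].

```d
import std.algorithm : filter, map, splitter;
import std.array : array;
import std.uni : toLower;

// Hypothetical name, not from the original thread.
// Returns a lazy range of lowercased words; no array is built here.
auto lazySplitSentence(string input)
{
    return input
        // Split on spaces, newlines, and carriage returns.
        .splitter!(c => c == ' ' || c == '\n' || c == '\r')
        // Drop the empty pieces left by consecutive separators.
        .filter!(w => w.length != 0)
        // Lowercase each word when the consumer asks for it.
        .map!(w => w.toLower);
}

unittest
{
    // The caller decides whether to materialize the range into an array.
    string[] words = lazySplitSentence("Hello World\r\nFoo  BAR").array;
    assert(words == ["hello", "world", "foo", "bar"]);
}
```

A caller that only iterates once (say, feeding words into the Markov table) can use the returned range directly in a foreach and skip the .array call entirely.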
