Re: A better way to write this function? (style question)

John Colvin Mon, 30 Dec 2013 14:56:00 -0800

On Monday, 30 December 2013 at 22:38:43 UTC, Brad Anderson wrote:

On Monday, 30 December 2013 at 22:30:02 UTC, John Colvin wrote:
On Monday, 30 December 2013 at 22:17:21 UTC, John Colvin wrote:
On Monday, 30 December 2013 at 21:40:58 UTC, Thomas Gann wrote:
I've written a Markov bot in D, and I have a function whose job it is to take an input string, convert all newline characters to spaces and all uppercase letters to lowercase, and then return an array of words that are generated by splitting the string up by whitespace. Here is the function in question:
string[] split_sentence(string input)
{
 string line;

 foreach(c; input)
 {
     if(c == '\n' || c == '\r')
         line ~= ' ';

     else
         line ~= c.toLower();
 }

 return line.splitter(' ').filter!(a => a.length).array;
}
Obviously, one issue is that because the string is immutable, I can't modify it directly, and so I actually build an entirely new string in place. I would have just made a mutable duplicate of the input and modified that, but then I would get errors returning, because it expects string[] and not char[][]. Is there a more elegant way to do what I'm doing?
A few points:
by declaring a new string and appending to it you are risking a lot of allocations. Either use std.array.appender or allocate the array with the correct size at the beginning.
using .array on the end of the UFCS chain is yet another allocation. It can be avoided using std.algorithm.copy to copy the result back into 'line'
In my opinion the whole API would be better as range-based:

auto splitSentence(R)(R input)
  if(isInputRange!R)
{
  return input
         .map!(c => (c == "\n"[0] || c == "\r"[0]) ? ' ' : c.toLower)
         .splitter!(' ')
         .filter!(a => !(a.empty));
}
sorry, ignore that attempt, it's woefully broken...
Re: weird literal syntax, you didn't happen to be using DPaste to test and have trouble with character literals, did you? Because I did and thought I was going insane until I realized DPaste was broken.

oohhh so that was what that was. Anyway, here's what I have so far:


import std.range : isInputRange;

auto splitSentence(R)(R input)
    if(isInputRange!R)
{
    import std.algorithm : map, splitter, filter;
    import std.uni : toLower;
    import std.range : ElementType;

    dchar preProc(ElementType!R c)
    {
        return (c == '\n' || c == '\r') ? ' ' : c.toLower;
    }

    return input
        .map!preProc
        .splitter!(c => c == ' ')
        .filter!(a => !(a.empty));
}

I have to have the function instead of a lambda in order to get the return type correct. I guess I could cast or use std.conv.to instead.
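
For reference, the cast alternative mentioned above might look roughly like the following. This is only a sketch: it assumes that casting the space literal to dchar is enough to make both branches of the ternary agree on a type, and it has not been checked against the compiler release used in this thread.

```d
import std.algorithm : filter, map, splitter;
import std.range : isInputRange;
import std.uni : toLower;

auto splitSentence(R)(R input)
    if (isInputRange!R)
{
    return input
        // Casting ' ' to dchar makes both ternary branches dchar,
        // so the lambda's return type is inferred as dchar.
        .map!(c => (c == '\n' || c == '\r') ? cast(dchar) ' ' : c.toLower)
        // Split lazily wherever the predicate matches a space.
        .splitter!(c => c == ' ')
        // Drop the empty pieces left behind by consecutive spaces.
        .filter!(a => !a.empty);
}
```

The result is still lazy, and each word is itself a range of dchar rather than a string, so a caller that needs string[] has to materialize the pieces separately.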

Anyway, the big problem I've hit is that AFAICT std.algorithm makes a complete mess of Unicode and I can't find a byCodeUnit range anywhere in order to make it correct.
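
One likely source of the friction is that the Phobos range primitives present a narrow string as a range of decoded dchar code points rather than of code units. The sketch below illustrates the difference. It assumes a newer Phobos release that ships std.utf.byCodeUnit, and the helper names codePoints and codeUnits are made up for the illustration.

```d
import std.range : ElementType, walkLength;
import std.utf : byCodeUnit; // assumption: a Phobos release that provides byCodeUnit

// Counts code points: range primitives decode a narrow string to dchar.
size_t codePoints(string s)
{
    static assert(is(ElementType!string == dchar));
    return s.walkLength;
}

// Counts UTF-8 code units: byCodeUnit iterates the string without decoding.
size_t codeUnits(string s)
{
    return s.byCodeUnit.length;
}
```

For a string such as "h\u00E9llo" the two helpers disagree, because the accented letter occupies two UTF-8 code units but is a single code point.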
