On 3/2/13 2:32 PM, Steven Schveighoffer wrote:
This is not a personal attack, please don't take it that way.

Not at all. I'm just confused about this exchange. You mention some correct facts ("MySplitter finished in 6 seconds instead of 10"), then some mistaken suppositions ("a string-specific version could be written"), and derive conclusions from both ("Maybe there is a bad constraint somewhere"). Hard to deal with the mix.

My
anecdotal tests with hand writing a custom splitter range to handle the
OP's program gave me a 40% savings. Whether it's find or not, I'm not
sure, but there definitely is room for improvement.

I think I understand where the confusion comes from. If you're referring to MySplitter, that's not comparable. It uses this at the core:

for(; i + 1 < source.length; i++)
{
    if(source[i] == '\r' && source[i + 1] == '\n')
    {
        found = true;
        break;
    }
    ...
}

This is not close to the code that would work with arrays. We could specialize things for small arrays, but that hasn't been done yet. My point is it's not comparable with the classic brute force subsequence search.

What is comparable is shown below. indexOf2 defines the straight array-in-array brute force search. With dmd on my machine it runs in some 7 seconds. So does indexOf (this was my point). Then indexOf3 uses a simple trick that the Java implementation does: cache the first character. That reduces the running time to 5 seconds. Then indexOf4 optimizes the loop searching that first character, bringing run time to about 4 seconds. On that machine the Java implementation runs in 3.5 seconds.


Andrei

import std.stdio, std.string, std.datetime;

long indexOf2(string haystack, string needle)
{
    if (!needle.length) return 0;
    if (needle.length > haystack.length) return -1;
    auto limit = haystack.length - needle.length;
 bigloop:
    for (size_t i; i <= limit; ++i)
    {
        foreach (q; 0 .. needle.length)
        {
            if (haystack[i + q] != needle[q]) continue bigloop;
        }
        return i;
    }
    return -1;
}

long indexOf3(string haystack, string needle)
{
    if (!needle.length) return 0;
    if (needle.length > haystack.length) return -1;
    immutable c = needle[0];
    auto limit = haystack.length - needle.length;
    assert(limit < uint.max, "not handled");
 bigloop:
    for (uint i; i <= limit; ++i)
    {
        if (haystack.ptr[i] != c) continue;
        foreach (q; 1 .. needle.length)
        {
            if (haystack[i + q] != needle[q]) continue bigloop;
        }
        return i;
    }
    return -1;
}

long indexOf4(string haystack, string needle)
{
    if (!needle.length) return 0;
    if (needle.length > haystack.length) return -1;
    immutable c = needle[0];
    auto limit = haystack.length - needle.length;
    size_t i = 0;
 bigloop:
    for (;;)
    {
        while (i <= limit && haystack[i++] != c) {}
        if (i-- > limit) break;
        foreach (q; 1 .. needle.length)
        {
            if (haystack[i + q] != needle[q]) continue bigloop;
        }
        return i;
    }
    return -1;
}

unittest
{
    assert(indexOf2("hello, world", ",") == 5);
    assert(indexOf2("hello, world", "a") == -1);
    assert(indexOf3("hello, world", ",") == 5);
    assert(indexOf3("hello, world", "a") == -1);
    assert(indexOf4("hello, world", ",") == 5);
    assert(indexOf4("hello, world", "a") == -1);
}

void main(string[] args)
{
auto message = "REGISTER sip:example.com SIP/2.0\r\nContent-Length: 0\r\nContact: <sip:12345@10.1.3.114:59788;transport=tls>;expires=4294967295;events=\"message-summary\";q=0.9\r\nTo: <sip:12...@comm.example.com>\r\nUser-Agent: (\"VENDOR=MyCompany\" \"My User Agent\")\r\nMax-Forwards: 70\r\nCSeq: 1 REGISTER\r\nVia: SIP/2.0/TLS 10.1.3.114:59788;branch=z9hG4bK2910497772630690\r\nCall-ID: 2910497622026445\r\nFrom: <sip:12...@comm.example.com>;tag=2910497618150713\r\n\r\n";
  auto t1 = Clock.currTime();
  for (int i=0; i<10000000; i++)
  {
    long index1 = 0;
    while (true)
    {
      auto index2 = indexOf(message[index1 .. $], "\r\n");
      if (index2 == -1)
        break;
      auto notused = message[index1 .. index1 + index2];
      if (args.length == 42) writeln(notused);
      index1 += index2 + 2;
    }
  }
  writeln(Clock.currTime()-t1);
}





Reply via email to