On Thursday, July 05, 2012 00:04:03 Andrei Alexandrescu wrote: > On 7/4/12 10:11 PM, Jonathan M Davis wrote: > > On Wednesday, July 04, 2012 21:33:31 Andrei Alexandrescu wrote: > >> Great. Could you please post some code so we play with it? Thanks. > > > > Okay. You can use this: > [snip] > > Thanks. I made the following change to popFront: > > @trusted void popFront(A)(ref A a) > if (isNarrowString!A && isMutable!A && !isStaticArray!A) > { > assert(a.length, "Attempting to popFront() past the end of an array > of " > ~ typeof(a[0]).stringof); > immutable c = a[0]; > if (c < 0x80) > { > a = a.ptr[1 .. a.length]; > } > else > { > import core.bitop; > immutable msbs = 7 - bsr(~c); > if ((msbs >= 2) & (msbs <= 6)) > { > a = a[msbs .. $]; > } > else > { > //throw new UTFException("Invalid UTF-8 sequence", 0); > } > } > } > > For some reason, uncommenting the throwing code makes the function > significantly slower. That seems to be an issue with the compiler > because putting the throw in a function seems to restore speed. > > With the above, I get on a Mac: > > ascii 126.61%: old [682 ms, 479 μs, and 3 hnsecs], new [864 ms, 102 μs, > and 1 hnsec] > uni 86.76%: old [1 sec, 888 ms, 17 μs, and 8 hnsecs], new [1 sec, 638 > ms, 76 μs, and 3 hnsecs] > > So the ascii string handling became actually 27% faster whereas the uni > string handling is 13% slower. > > It might be argued that checking for validity is not the metier of > popFront; only if you do try to use stuff (e.g. by calling front) should > one see exceptions. If popFront sees incorrect characters, it should > just skip them one at a time. Following that argument, the > implementation may be: > > @trusted void popFront(A)(ref A a) > if (isNarrowString!A && isMutable!A && !isStaticArray!A) > { > assert(a.length, "Attempting to popFront() past the end of an array > of " > ~ typeof(a[0]).stringof); > immutable c = a[0]; > if (c < 0x80) > { > a = a.ptr[1 .. a.length]; > } > else > { > import core.bitop; > auto msbs = 7 - bsr(~c); > if ((msbs < 2) | (msbs > 6)) > { > msbs = 1; > } > a = a[msbs .. $]; > } > } > > With this code I get: > > ascii 115.39%: old [744 ms, 103 μs, and 6 hnsecs], new [858 ms, 628 μs, > and 4 hnsecs] > uni 96.78%: old [1 sec, 877 ms, and 461 μs], new [1 sec, 817 ms, 14 > μs, and 3 hnsecs]
Hmm. With the version without bounds checking compared against consumeFront, I'm getting downright bizarre results. _Without_ optimizations, I'm seeing around 85% for both ASCII and unicode, but with full optimizations, it's actually at about 140% for ASCII and 102% for unicode (with -release and - release -O being around 90% and 80% respectively). So, this code seems _very_ sensitive to optimizations and whatever's going on with the CPU in terms of caching and context switching. My StringCache proposal is slightly faster than consumeFront, but the situation is essentially the same. It's around 80 - 90% of front and you're improved popFront without -inline, but with -inline, it's at about 101 - 140%. So, this is just weird. It looks like the improved popFront makes it so that it's very uncertain as to whether it or consumeFront/StringCache will be faster. consumeFront and StringCache both use decode rather than popFront, so improvements to decode might make them win, but the front/popFront combo would get the same improvements (that, and looking at decode, it looks pretty streamlined already). I guess that the various solutions are all close enough that it becomes completely a question of the optimizer as to which is going to be faster, much as conceptually, consumeFront and StringCache should win (at least for unicode strings). That being the case, we _definitely_ don't want to go with consumeFront (it's far too invasive), and not even the relatively uninvasive StringCache seems like a good idea. It might improve the code, and it might make it worse. I wonder how the results look on gdc when using the improved popFront, given how surprising they were with consumeFront. In any case, I guess that this shows that what we can get with popFront is so close to what consumeFront or StringCache would do that we might as well not bother with them, which is a _big_ surpise to me. It does pay to benchmark code though. - Jonathan M Davis