On Wed, 05 Jun 2013 13:36:06 -0400, Andrei Alexandrescu <seewebsiteforem...@erdani.org> wrote:
> I think it's worth discussing the interface much more than the approach
> to modularization.
>
> Walter's algo traffics in InputRange!ubyte and offers an
> InputRange!ubyte. That makes sense in some situations, but often
> trafficking one ubyte at a time may be not only inefficient but also the
> wrong granularity. Consider:
>
> auto data = File("input.txt").byChunk().compress();

I realized that yesterday. Now that my circular buffer implementation
appears to be thread safe enough *cough* to run some tests, I noticed
that when I produce 64 KiB input blocks and mark them as read
byte-by-byte, I get massive overhead.

> That won't work because byChunk deals in ubyte[], not ubyte. How do we
> fix this while keeping everybody efficient?

My first attempt will keep a byte-wise interface but use larger chunks
inside the circular buffer, so that some of the checks (most notably:
after the consumer marks a byte of the buffer as read, can the producer
continue?) can be delayed until e.g. 4 KiB, or whatever is available,
have been processed.

> I talked to Walter and during this work he figured a lot of things about
> how ranges work and how they generate code. Turns out that the range
> equivalent of a tight loop is slightly less efficient with dmd because a
> range must keep its state together, which is harder to enregister than a
> bunch of automatic variables.

Then again, for performance-critical applications dmd has long lost
touch with GCC and LLVM, so while this is of great interest to its
author and comes up sometimes in "D is slow" threads, there are good
alternatives. (I think LLVM is a real enabler in the current explosion
of programming languages.)

> Right now we have joiner(), which given several ranges of T, offers a
> range of T. Developing along that idea, we should have two opposite
> functions: itemize() and collect().
>
> itemize() takes a range of ranges of T and offers a range of T. For
> example, given a range of T[], offers a range of T.
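As an aside, if I'm not mistaken Phobos already covers much of this:
joiner() is itemize() in all but name, and std.range.chunks comes close
to collect(), except that it returns slices of the source range rather
than freshly allocated arrays. A quick sketch of the two directions
(names and chunk size are just for illustration):

```d
import std.algorithm : equal, joiner;
import std.range : chunks;

void main()
{
    auto blocks = [[1, 2], [3, 4], [5]];

    // "itemize" direction: flatten a range of ranges into a range of
    // elements. joiner already provides exactly this, lazily.
    auto items = blocks.joiner;
    assert(items.equal([1, 2, 3, 4, 5]));

    // "collect" direction: regroup a range of elements into fixed-size
    // sub-ranges. chunks is close, though it yields views into the
    // source rather than new arrays.
    auto regrouped = [1, 2, 3, 4, 5].chunks(2);
    assert(regrouped.equal!equal([[1, 2], [3, 4], [5]]));
}
```

So a dedicated collect() would mostly have to pin down the allocation
and buffering behavior, not invent new machinery.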
> collect() takes a range of T and offers a range of T[]. The number of
> items in each chunk can be a parameter.
>
> With that in tow, we can set things up such that compress() and expand()
> traffic in ranges of ranges of ubyte (or simply ranges of ubyte[]),
> which ensures work at maximum speed. Then the matter of adapting to and
> fro ranges of ubyte is a simple matter of chaining a call to itemize()
> or collect().
>
> Destroy.

That's neat from a usability point of view. Does this mean collect()
will always return a range of built-in arrays, since it uses arrays as
the internal buffer?

> I salute this code. It is concise, well engineered, well written, just
> as general as it needs, uses the right features in the right places, and
> does real work. A perfect example to follow. The only thing I'd add is
> this convenience function:
>
> void putBits(bool[] bits...) {
>     foreach (b; bits) putBit(b);
> }
>
> That reduces a bunch of code, and leaves room for future optimizations
> inside putBits.
>
>
> Andrei

Putting single bits into the buffer is important functionality, but I'm
not convinced by this solution. Real code has the bits in integer
variables and could use:

    buffer.putBits!3(0b111);
- or -
    buffer.putBits(3, 0b111);

That is always more efficient than a variadic number of function
parameters, let alone a bool[]. In many cases the bits can be ORed onto
the current write position directly.

One other feature worth adding might be to reverse the bit order before
putting bits into, or after pulling them out of, the buffer. That comes
up once in gzip code I've seen.

-- Marco