Folks: I failed to make something sufficiently clear in my original message about StringChain. The use case that I am talking about is not simply that you need to accumulate a sequence of incoming chunks of data, concatenate them together, and then process the entire result. If that is all you need to do then indeed you can accumulate the incoming strings in a list and then run ''.join(thelist) when you have received all of them and are ready to process them.
But the use case that I am talking about is where you need to accumulate new incoming strings into your buffer while alternately processing leading prefixes of the buffer. The typical example is that sequences of bytes are arriving on your TCP socket and you are processing the stream incrementally. You need to process the leading prefixes of the stream as soon as you can (e.g. as soon as a line followed by a newline has accumulated, or as soon as another complete element in your data format has accumulated, etc.); you can't simply wait until the bytes stop coming and then concatenate the entire collection together at once. This is exactly the current situation in foolscap [1] which is causing a performance problem in Tahoe-LAFS [2]. To illustrate what I mean I cooked up some benchmarks showing the task of "accumulate a bunch of things then consume them in a single gulp" versus the task of "alternate between accumulating and consuming" (with accumulation going faster than consumption until the input stream runs dry). I implemented a few data structures for benchmarking: StringChain, the naive "accumulatorstr += newstr" approach (named "Stringy" in the benchmarks), one based on cStringIO (named "StringIOy"), one based on accumulating the new strings into a list and then doing ''.join(accumulatorlist) whenever you need to consume a leading prefix (called "SimplerStringChain"). Below are the abbreviated results of the benchmark. You can easily run this benchmark yourself by following the README.txt in the StringChain source package [3]. These results show that for the load imposed by this benchmark StringChain is the only one of the four that scales and that it is also nice and fast. The left-hand column is the total number of bytes that were run through it. The results are shown in nanoseconds per byte in exponential notation. ("e+01" means times 10, "e+02" means times 100, and "e+03" means times 1000, etc.) Each result is the best of 10 tries, or of 5 tries for the tasks which were taking too long to run it 10 times. Regards, Zooko [1] http://foolscap.lothar.com/trac/ticket/149 [2] http://allmydata.org/pipermail/tahoe-dev/2010-March/004181.html [3] http://tahoe-lafs.org/trac/stringchain impl: StringChain task: _accumulate_then_one_gulp 10000 best: 2.694e+00 50000 best: 2.742e+00 100000 best: 2.310e+00 500000 best: 2.040e+00 1000000 best: 1.988e+00 5000000 best: 2.193e+00 task: _alternate_str 10000 best: 6.509e+00 50000 best: 4.559e+00 100000 best: 4.308e+00 500000 best: 4.070e+00 1000000 best: 3.991e+00 5000000 best: 4.000e+00 impl: SimplerStringChain task: _accumulate_then_one_gulp 10000 best: 1.407e+00 50000 best: 2.317e+00 100000 best: 2.012e+00 500000 best: 1.902e+00 1000000 best: 1.897e+00 5000000 best: 2.104e+00 task: _alternate_str 10000 best: 4.888e+00 50000 best: 5.198e+00 100000 best: 1.750e+01 500000 best: 6.233e+01 1000000 best: 1.134e+02 5000000 best: 7.599e+02 impl: StringIOy task: _accumulate_then_one_gulp 10000 best: 4.196e+00 50000 best: 5.522e+00 100000 best: 4.499e+00 500000 best: 3.756e+00 1000000 best: 4.176e+00 5000000 best: 5.414e+00 task: _alternate_str 10000 best: 5.484e+00 50000 best: 7.863e+00 100000 best: 2.126e+01 500000 best: 6.972e+01 1000000 best: 1.219e+02 5000000 best: 9.463e+02 task: _accumulate_then_one_gulp 10000 best: 1.502e+00 50000 best: 1.420e+01 100000 best: 2.245e+01 500000 best: 8.577e+01 1000000 best: 2.295e+02 5000000 best: 1.326e+03 task: _alternate_str 10000 best: 3.290e+00 50000 best: 4.220e+00 100000 best: 1.665e+01 500000 best: 6.281e+01 1000000 best: 1.127e+02 5000000 best: 7.626e+02 -- http://mail.python.org/mailman/listinfo/python-list