On Mon, Mar 22, 2010 at 2:07 AM, Steven D'Aprano <ste...@remove.this.cybersource.com.au> wrote: > > Perhaps you should have said that it was a wrapper around deque giving > richer functionality, rather than giving the impression that it was a > brand new data structure invented by you. People are naturally going to > be more skeptical about a newly invented data structure than one based on > a reliable, component like deque.
The fact that StringChain uses deque to hold the queue of strings isn't that important. I just benchmarked it by swapping in the deque for a list and using the list costs about one third of a nanosecond per byte at the scales that the benchmark exercises (namely 10,000,000 bytes in about 10,000 strings). A third of a nanosecond per byte is about 4% of the runtime. I also implemented and benchmarked a simpler deque-based scheme which puts all the actual bytes from the strings into a deque with self.d.extend(newstr). As you would expect, this shows good asymptotic performance but the constants are relatively bad so that at all of the actual loads measured it was three orders of magnitude worse than StringChain and than String-Chain-with-a-list-instead-of-a-deque. Moral: the constants matter! Those benchmarks are appended. You can run the benchmarks yourself per the README.txt. But anyway, I take your point and I updated the StringChain README to explain that it is a pretty simple data structure that holds a list (actually a deque) of strings and isn't anything too clever or novel. By the way, we could further micro-optimize this kind of thing if ''.join() would accept either strings or buffers instead of requiring strings: >>> ''.join([buffer("abc"), "def"]) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: sequence item 0: expected string, buffer found Then whenever StringChain needs to slice a string into leading and trailing parts, it could construct a buffer() viewing each part instead of making a copy of each part. > it. Maybe you should consider linking to it on PyPI and seeing if there > is any interest from others? http://pypi.python.org/pypi/stringchain Regards, Zooko impl: StringChain task: _accumulate_then_one_gulp 10000 best: 5.698e+00, 3th-best: 7.486e+00, mean: 7.758e+00, 100000 best: 4.640e+00, 3th-best: 4.690e+00, mean: 7.674e+00, 1000000 best: 3.789e+00, 3th-best: 3.806e+00, mean: 3.995e+00, 10000000 best: 4.095e+00, 1th-best: 4.095e+00, mean: 4.095e+00, task: _alternate_str 10000 best: 1.378e+01, 3th-best: 1.390e+01, mean: 1.500e+01, 100000 best: 9.198e+00, 3th-best: 9.248e+00, mean: 9.385e+00, 1000000 best: 8.715e+00, 3th-best: 8.755e+00, mean: 8.808e+00, 10000000 best: 8.738e+00, 1th-best: 8.738e+00, mean: 8.738e+00, impl: StringChainWithList task: _accumulate_then_one_gulp 10000 best: 3.600e+00, 3th-best: 3.695e+00, mean: 4.129e+00, 100000 best: 4.070e+00, 3th-best: 4.079e+00, mean: 4.162e+00, 1000000 best: 3.662e+00, 3th-best: 3.688e+00, mean: 3.721e+00, 10000000 best: 3.902e+00, 1th-best: 3.902e+00, mean: 3.902e+00, 1th-worst: 3.902e+00, worst: 3.902e+00 (of 1) task: _alternate_str 10000 best: 1.369e+01, 3th-best: 1.380e+01, mean: 1.442e+01, 100000 best: 9.251e+00, 3th-best: 9.289e+00, mean: 9.416e+00, 1000000 best: 8.809e+00, 3th-best: 8.876e+00, mean: 8.943e+00, 10000000 best: 9.095e+00, 1th-best: 9.095e+00, mean: 9.095e+00, impl: Dequey task: _accumulate_then_one_gulp 10000 best: 2.772e+02, 3th-best: 2.785e+02, mean: 2.911e+02, 100000 best: 2.314e+02, 3th-best: 2.334e+02, mean: 2.422e+02, 1000000 best: 2.282e+02, 3th-best: 2.288e+02, mean: 2.370e+02, 10000000 best: 2.587e+02, 1th-best: 2.587e+02, mean: 2.587e+02, task: _alternate_str 10000 best: 1.576e+03, 3th-best: 1.580e+03, mean: 1.608e+03, 100000 best: 1.301e+03, 3th-best: 1.303e+03, mean: 1.306e+03, 1000000 best: 1.275e+03, 3th-best: 1.276e+03, mean: 1.278e+03, 10000000 best: 1.280e+03, 1th-best: 1.280e+03, mean: 1.280e+03, -- http://mail.python.org/mailman/listinfo/python-list