Wow, thanks for the quick response. The performance is *much, much* better with the suggested list-join. CPython still beats PyPy, but only by a narrow margin:
pypy1.6:       1m33.142s
CPython 2.7.1: 1m12.092s

Thanks for the advice-- I had forgotten about string immutability and its associated costs. And keep up the good work on pypy! I look forward to the day I can replace CPython with pypy in more interesting scientific workflows </end plug for scipy integration>

A bit OT: The recent release of ipython added some powerful multiprocessing features using ZeroMQ. I've only glanced at pypy's extensive threading optimizations (e.g., greenlets). Does pypy jit across thread/process boundaries?

--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine

On Thu, Aug 18, 2011 at 4:01 PM, Justin Peel <pee...@gmail.com> wrote:
> Yes, I just looked at it. For cases like this where there is
> effectively only one reference to the string being appended to, it
> just resizes the string in-place and copies in the string being
> appended which gives it O(N) performance. It is a hack that is
> available only because of the reference counting that CPython employs
> for memory management.
>
> For reference, the hack is in Python/ceval.c in the string_concatenate
> function.
>
> On Thu, Aug 18, 2011 at 4:50 PM, Aaron DeVore <aaron.dev...@gmail.com>
> wrote:
> > Python 2.4 introduced a change that helps improve performance of
> > string concatenation, according to its release notes. I don't know
> > anything beyond that.
> >
> > -Aaron DeVore
> >
> > On Thu, Aug 18, 2011 at 3:31 PM, Justin Peel <pee...@gmail.com> wrote:
> >> Yes, Vincent's way is the better way to go. To elaborate more on the
> >> problem, string appending is O(N^2) while appending to a list and then
> >> joining is an O(N) operation. Why CPython is faster than Pypy at doing
> >> the less efficient way is something that I'm not fully sure about, but
> >> I believe that it might have to do with the differing memory
> >> management strategies.
> >>
> >> On Thu, Aug 18, 2011 at 4:24 PM, Vincent Legoll
> >> <vincent.leg...@gmail.com> wrote:
> >>> Hello,
> >>>
> >>> Try this:
> >>>
> >>> import sys
> >>>
> >>> fasta_file = sys.argv[1] # should be *.fa
> >>> print 'loading dna from', fasta_file
> >>> chroms = {}
> >>> dna = []
> >>> for l in open(fasta_file):
> >>>     if l.startswith('>'): # new chromosome
> >>>         if len(dna) > 0:
> >>>             chroms[chrom] = ''.join(dna)
> >>>         chrom = l.strip().replace('>', '')
> >>>         dna = []
> >>>     else:
> >>>         dna.append(l.rstrip())
> >>> if len(dna) > 0:
> >>>     chroms[chrom] = ''.join(dna)
> >>>
> >>> --
> >>> Vincent Legoll
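
For anyone following the thread, the difference between the two approaches discussed above (repeated string appending versus appending to a list and joining once) can be tried with a rough sketch like the one below. The function names and the sample data are illustrative assumptions, not code from the thread, and the timings will vary by interpreter: as noted in the thread, CPython can often resize the string in place, so the gap may look small there.

```python
import timeit


def build_by_concat(pieces):
    """Append each piece to a string; without an in-place resize
    optimization this can copy the whole buffer on every step."""
    result = ''
    for piece in pieces:
        result += piece
    return result


def build_by_join(pieces):
    """Collect the pieces in a list and join them once at the end."""
    parts = []
    for piece in pieces:
        parts.append(piece)
    return ''.join(parts)


if __name__ == '__main__':
    # Hypothetical stand-in for the sequence lines of a FASTA file.
    pieces = ['ACGT' * 15] * 20000
    assert build_by_concat(pieces) == build_by_join(pieces)
    for func in (build_by_concat, build_by_join):
        seconds = timeit.timeit(lambda: func(pieces), number=5)
        print('%s: %.3fs' % (func.__name__, seconds))
```

Running the same script under both interpreters gives a like-for-like comparison of the two strategies on a given machine.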
_______________________________________________ pypy-dev mailing list pypy-dev@python.org http://mail.python.org/mailman/listinfo/pypy-dev