Wow, thanks for the quick response. The performance is *much, much* better with the suggested list-join. CPython still beats PyPy, but only by a narrow margin:
pypy1.6:       1m33.142s
CPython 2.7.1: 1m12.092s

Thanks for the advice-- I had forgotten about string immutability and its associated costs.

And keep up the good work on pypy! I look forward to the day I can replace CPython with pypy in more interesting scientific workflows </end plug for scipy integration>

A bit OT: The recent release of ipython added some powerful multiprocessing features using ZeroMQ. I've only glanced at pypy's extensive threading optimizations (e.g., greenlets). Does pypy jit across thread/process boundaries?

--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine

On Thu, Aug 18, 2011 at 4:01 PM, Justin Peel <[email protected]> wrote:
> Yes, I just looked at it. For cases like this where there is
> effectively only one reference to the string being appended to, it
> just resizes the string in-place and copies in the string being
> appended which gives it O(N) performance. It is a hack that is
> available only because of the reference counting that CPython employs
> for memory management.
>
> For reference, the hack is in Python/ceval.c in the string_concatenate
> function.
>
> On Thu, Aug 18, 2011 at 4:50 PM, Aaron DeVore <[email protected]>
> wrote:
> > Python 2.4 introduced a change that helps improve performance of
> > string concatenation, according to its release notes. I don't know
> > anything beyond that.
> >
> > -Aaron DeVore
> >
> > On Thu, Aug 18, 2011 at 3:31 PM, Justin Peel <[email protected]> wrote:
> >> Yes, Vincent's way is the better way to go. To elaborate more on the
> >> problem, string appending is O(N^2) while appending to a list and then
> >> joining is an O(N) operation. Why CPython is faster than Pypy at doing
> >> the less efficient way is something that I'm not fully sure about, but
> >> I believe that it might have to do with the differing memory
> >> management strategies.
> >>
> >> On Thu, Aug 18, 2011 at 4:24 PM, Vincent Legoll
> >> <[email protected]> wrote:
> >>> Hello,
> >>>
> >>> Try this:
> >>>
> >>> import sys
> >>>
> >>> fasta_file = sys.argv[1] # should be *.fa
> >>> print 'loading dna from', fasta_file
> >>> chroms = {}
> >>> dna = []
> >>> for l in open(fasta_file):
> >>>     if l.startswith('>'): # new chromosome
> >>>         if len(dna) > 0:
> >>>             chroms[chrom] = ''.join(dna)
> >>>         chrom = l.strip().replace('>', '')
> >>>         dna = []
> >>>     else:
> >>>         dna.append(l.rstrip())
> >>> if len(dna) > 0:
> >>>     chroms[chrom] = ''.join(dna)
> >>>
> >>> --
> >>> Vincent Legoll
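
To make the complexity argument quoted above concrete, here is a minimal sketch comparing the two approaches. The function names (concat_append, concat_join) and the synthetic input are made up for illustration and are not from the original script. Timings will vary by interpreter and version, and CPython's in-place resize can make the += loop look far better than its worst case.

```python
import timeit


def concat_append(lines):
    # Repeated += on an immutable str. In the worst case every step
    # copies the whole string built so far, giving O(N^2) total work.
    s = ''
    for l in lines:
        s += l
    return s


def concat_join(lines):
    # Collect the pieces in a list and join once at the end: O(N).
    parts = []
    for l in lines:
        parts.append(l)
    return ''.join(parts)


if __name__ == '__main__':
    # Hypothetical stand-in for FASTA sequence lines (60 bases each).
    lines = ['ACGT' * 15] * 5000

    # Both approaches must build the same string.
    assert concat_append(lines) == concat_join(lines)

    for fn in (concat_append, concat_join):
        t = timeit.timeit(lambda: fn(lines), number=3)
        print('%s: %.3fs' % (fn.__name__, t))
```

Running the same file under both interpreters (for example `python sketch.py` and `pypy sketch.py`, where sketch.py is whatever name you save it under) should show whether the gap between the two functions differs between them.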
_______________________________________________ pypy-dev mailing list [email protected] http://mail.python.org/mailman/listinfo/pypy-dev
