Wow, thanks for the quick response.  The performance is *much, much* better
with the suggested list-join.  CPython still beats PyPy, but only by a
narrow margin:

PyPy 1.6:            1m33.142s
CPython 2.7.1:       1m12.092s
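
For anyone reading along, here is a minimal sketch of the list-join loader
from the quoted message below, wrapped as a function. The name load_fasta
and the choice to take any iterable of lines are my own assumptions for
illustration, not part of the original script. Unlike the original, it
also records a header that has no sequence lines under it.

```python
def load_fasta(lines):
    """Return {chromosome name: sequence} from an iterable of FASTA lines.

    Hypothetical helper: collects sequence pieces in a list and joins
    them once per chromosome instead of appending to a string.
    """
    chroms = {}
    chrom = None   # name of the chromosome currently being read
    parts = []     # sequence pieces for that chromosome
    for line in lines:
        if line.startswith('>'):           # header line: new chromosome
            if chrom is not None:
                chroms[chrom] = ''.join(parts)
            chrom = line[1:].strip()
            parts = []
        elif chrom is not None:            # ignore lines before any header
            parts.append(line.strip())
    if chrom is not None:                  # flush the last chromosome
        chroms[chrom] = ''.join(parts)
    return chroms
```

Passing an open file object works because iterating over a file yields its
lines, so the same function can be exercised on a small in-memory list too.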

Thanks for the advice. I had forgotten about string immutability and its
associated costs.  And keep up the good work on PyPy!  I look forward to the
day I can replace CPython with PyPy in more interesting scientific workflows.
</end plug for scipy integration>
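
To put rough numbers on the immutability cost, here is a small model that
counts characters copied rather than measuring time. It assumes every
append builds a brand-new string, which is the worst case; it deliberately
ignores the in-place resize that CPython can apply, and any allocator
overhead. The function names are made up for this sketch.

```python
def copies_with_concat(chunk_sizes):
    """Characters written if each append allocates a whole new string."""
    total = 0
    length = 0
    for size in chunk_sizes:
        length += size
        total += length   # old contents plus the new chunk are copied
    return total


def copies_with_join(chunk_sizes):
    """Characters written by a single join over the collected pieces."""
    return sum(chunk_sizes)


sizes = [50] * 1000                        # 1000 lines of 50 characters
concat_cost = copies_with_concat(sizes)    # grows quadratically
join_cost = copies_with_join(sizes)        # grows linearly
```

Under this model, 1000 lines of 50 characters cost 25,025,000 copied
characters with naive appending versus 50,000 with a single join.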

A bit OT:  The recent release of IPython added some powerful multiprocessing
features using ZeroMQ.  I've only glanced at PyPy's extensive threading
optimizations (e.g., greenlets).  Does PyPy's JIT work across thread/process
boundaries?
--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine


On Thu, Aug 18, 2011 at 4:01 PM, Justin Peel <pee...@gmail.com> wrote:

> Yes, I just looked at it. For cases like this where there is
> effectively only one reference to the string being appended to, it
> just resizes the string in-place and copies in the string being
> appended which gives it O(N) performance. It is a hack that is
> available only because of the reference counting that CPython employs
> for memory management.
>
> For reference, the hack is in Python/ceval.c in the string_concatenate
> function.
>
> On Thu, Aug 18, 2011 at 4:50 PM, Aaron DeVore <aaron.dev...@gmail.com>
> wrote:
> > Python 2.4 introduced a change that helps improve performance of
> > string concatenation, according to its release notes. I don't know
> > anything beyond that.
> >
> > -Aaron DeVore
> >
> > On Thu, Aug 18, 2011 at 3:31 PM, Justin Peel <pee...@gmail.com> wrote:
> >> Yes, Vincent's way is the better way to go. To elaborate more on the
> >> problem, repeated string appending is O(N^2) overall, because each
> >> append copies the whole string built so far, while appending to a list
> >> and then joining is an O(N) operation. Why CPython is faster than PyPy
> >> at doing it the less efficient way is something that I'm not fully sure
> >> about, but I believe that it might have to do with the differing memory
> >> management strategies.
> >>
> >> On Thu, Aug 18, 2011 at 4:24 PM, Vincent Legoll
> >> <vincent.leg...@gmail.com> wrote:
> >>> Hello,
> >>>
> >>> Try this:
> >>>
> >>> import sys
> >>>
> >>> fasta_file = sys.argv[1]  # should be *.fa
> >>> print 'loading dna from', fasta_file
> >>> chroms = {}
> >>> dna = []
> >>> for l in open(fasta_file):
> >>>     if l.startswith('>'):  # new chromosome
> >>>         if len(dna) > 0:
> >>>             # store the previous chromosome as one joined string
> >>>             chroms[chrom] = ''.join(dna)
> >>>         chrom = l.strip().replace('>', '')
> >>>         dna = []
> >>>     else:
> >>>         dna.append(l.rstrip())  # collect pieces, join once later
> >>> if len(dna) > 0:
> >>>     chroms[chrom] = ''.join(dna)  # flush the last chromosome
> >>>
> >>> --
> >>> Vincent Legoll
> >>> _______________________________________________
> >>> pypy-dev mailing list
> >>> pypy-dev@python.org
> >>> http://mail.python.org/mailman/listinfo/pypy-dev
> >>>