igor:
> The fundamental difference is that in C++, I create a single object (a
> line buffer) that's reused for each input line, and column values are
> extracted straight from that buffer without creating new string
> objects. In Python, new objects must be created and destroyed by the
> million, which must incur serious memory management overhead.
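To make the pattern concrete, here is a minimal sketch of the kind of Python loop being described (the file name and column index are hypothetical):

def sum_third_column(filename):
    # Hypothetical illustration of the per-line Python pattern being
    # discussed: every iteration allocates new objects.
    total = 0.0
    for line in open(filename):    # a new str object for each input line
        fields = line.split()      # a new list plus a new str per column
        total += float(fields[2])  # the extracted column is yet another object
    return total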
Python indeed creates many objects (as I think Tim once said, "it allocates memory at a ferocious rate"), but its memory management is quite efficient. You can also use the Psyco JIT (currently a thousand times more useful than PyPy, even though it is sadly no longer developed), which in some situations avoids data copying (for example, with slices). Python is designed for string processing, and in my experience string-processing Psyco programs can be faster than comparable, not-optimized-to-death C++/D programs (you can see this with hand-written code, or with ShedSkin-generated code, which is often slower than Psyco at string processing). But in every language I know, you need to know the language to gain performance, and Python isn't C++, so different kinds of tricks are needed.

The following advice is useful too:

DouhetSukd:
> Bottom line: Python built-in data objects, such as dictionaries and sets,
> are very much optimized. Relying on them, rather than writing a lot of ifs
> and doing weird data structure manipulations in Python itself, is a good
> approach to try. Try to build those objects outside of your main
> processing loops.

Bye,
bearophile
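P.S. A minimal sketch of that advice (the stop-word set, the data, and the function name are my own invention); the Psyco call is optional and only applies to the Python 2 interpreters Psyco supports:

# Sketch of DouhetSukd's advice: the lookup set is built once, outside
# the hot loop, and a fast C-level membership test replaces a pile of
# Python-level ifs. The stop words and input data are hypothetical.
STOPWORDS = set(["the", "a", "an", "of", "and"])

def count_content_words(lines):
    counts = {}
    for line in lines:
        for word in line.split():
            if word not in STOPWORDS:
                counts[word] = counts.get(word, 0) + 1
    return counts

# Optional: Psyco (Python 2 only) can compile the hot function in place.
try:
    import psyco
    psyco.bind(count_content_words)
except ImportError:
    pass

Building STOPWORDS once means each lookup is a single hashed set operation done in C, instead of repeated Python-level comparisons inside the loop.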