Re: [sqlalchemy] performance vs. psycopg2
On Fri, Dec 16, 2011 at 15:58, Jon Nelson <jnel...@jamponi.net> wrote:

> Revised to use:
>
>     for row in rows:
>         dict(row)  # throw away result
>         count += 1

I wonder how this could even work... iterating over the row yields individual values, not tuples?! I wonder what kind of column types you are using. Could you post your code for both of your tests (with and without SA)?

> SQLAlchemy: 115,000 to 120,000 rows/s (vs. psycopg2 @ 480K - 580K, or
> psycopg2 COPY @ 620K). I suspect the issue is that I'm only selecting
> one column, so the per-row overhead is exaggerated.

That is certainly a factor, but even then your numbers seem strange (at least to me).

--
Gaëtan de Menten

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
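Gaëtan's puzzlement has an answer in how `dict()` treats its argument: when the object exposes a `keys()` method, `dict()` uses mapping semantics (`keys()` plus `__getitem__`) rather than iterating pairs, which is why `dict(row)` can work even though iterating the row yields bare values. A self-contained sketch with a hypothetical `FakeRow` (not SQLAlchemy's actual RowProxy):

```python
class FakeRow:
    """Hypothetical stand-in for a mapping-like result row (not
    SQLAlchemy's RowProxy): iteration yields bare column values, but
    keys()/__getitem__ let dict() treat it as a mapping."""

    def __init__(self, keys, values):
        self._keys = list(keys)
        self._values = list(values)

    def __iter__(self):
        return iter(self._values)  # iterating yields values, not pairs

    def keys(self):
        return self._keys          # dict() looks for this method

    def __getitem__(self, key):
        return self._values[self._keys.index(key)]


row = FakeRow(["colA"], [42])
assert list(row) == [42]             # bare values, as Gaëtan expected
assert dict(row) == {"colA": 42}     # mapping semantics, so dict(row) works
```

This is the documented behavior of the `dict` constructor: an argument with a `keys()` method is copied as a mapping, anything else is consumed as an iterable of key/value pairs.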
Re: [sqlalchemy] performance vs. psycopg2
On Thu, Dec 15, 2011 at 19:52, Jon Nelson <jnel...@jamponi.net> wrote:
> On Thu, Dec 15, 2011 at 12:01 PM, Michael Bayer <mike...@zzzcomputing.com> wrote:
>> On Dec 15, 2011, at 12:51 PM, Jon Nelson wrote:
>>> Up front: I'm not using the ORM at all, and I'm using SQLAlchemy 0.7.4
>>> with psycopg2 2.4.3 on PostgreSQL 8.4.10 on Linux x86_64.
>>>
>>> I did some performance testing. Selecting 75 million rows (a
>>> straight-up SELECT colA FROM tableA) from a 5 GB table yielded some
>>> interesting results. psycopg2 averaged between 485,000 and 585,000
>>> rows per second. Using COPY (via psycopg2) the average was right
>>> around 585,000. sqlalchemy averaged between 160,000 and 190,000 rows
>>> per second. That's a pretty big difference.

Weird. IIRC, SA was much closer to raw psycopg2 than that (without using COPY), in the range of SA adding a 50% overhead, not a 200% overhead.

>>> I briefly looked into what the cause could be, but I didn't see
>>> anything jump out at me (except RowProxy, maybe). Thoughts?
>>
>> Performance tests like this are fraught with complicating details (such
>> as: did you fully fetch each column in each row in both cases? Did you
>> have equivalent unicode and numeric conversions in place in both
>> tests?). In this case psycopg2 is written in pure C and SQLAlchemy's
>> result proxy is only partially C (did you use the C extensions?). You'd
>> use the Python profiling module to get a clear picture of where the
>> difference in effort lies. But using any kind of abstraction layer,
>> especially one written in Python, will always add latency versus a pure
>> C program.
>
> I pretty much did this:
>
>     for row in rows:
>         count += 1

That test is probably flawed, as you don't fetch any actual values. You should try to access individual elements (either by iterating over the row, or indexing it one way or another; the speed difference can vary quite a bit depending on that). You might get even worse results with a proper test, though ;-).

--
Gaëtan de Menten
Re: [sqlalchemy] performance vs. psycopg2
On Fri, Dec 16, 2011 at 3:30 AM, Gaëtan de Menten <gdemen...@gmail.com> wrote:
> On Thu, Dec 15, 2011 at 19:52, Jon Nelson <jnel...@jamponi.net> wrote:
>> I pretty much did this:
>>
>>     for row in rows:
>>         count += 1
>
> [...]
>
> That test is probably flawed, as you don't fetch actual values. You
> should try to access individual elements (either by iterating over the
> row, or indexing it one way or another; the speed difference can vary
> quite a bit depending on that). You might get even worse results with a
> proper test, though ;-).

Revised to use:

    for row in rows:
        dict(row)  # throw away result
        count += 1

SQLAlchemy: 115,000 to 120,000 rows/s (vs. psycopg2 @ 480K - 580K, or psycopg2 COPY @ 620K). I suspect the issue is that I'm only selecting one column, so the per-row overhead is exaggerated.

Thanks for the responses.

--
Strange things are afoot at the Circle K.
Jon
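The cost the revised loop adds per row can be illustrated without a database. This is a hypothetical, self-contained sketch (synthetic tuples stand in for cursor rows, and `dict(zip(keys, row))` stands in for `dict(row)` on a mapping-like row), not the thread's actual test code:

```python
import time

# Synthetic stand-in for a DB result: 1,000,000 single-column rows.
N = 1_000_000
rows = [(i,) for i in range(N)]
keys = ("colA",)

def count_only(rows):
    # The original loop: touches each row object but never its values.
    count = 0
    for row in rows:
        count += 1
    return count

def count_with_dict(rows):
    # The revised loop: builds a throwaway dict per row, roughly what
    # dict(row) does for a mapping-like result row.
    count = 0
    for row in rows:
        dict(zip(keys, row))  # throw away result
        count += 1
    return count

t0 = time.perf_counter()
c1 = count_only(rows)
t1 = time.perf_counter()
c2 = count_with_dict(rows)
t2 = time.perf_counter()

print(f"count-only:   {N / (t1 - t0):,.0f} rows/s")
print(f"dict-per-row: {N / (t2 - t1):,.0f} rows/s")
```

With a single narrow column, the fixed per-row Python work (one dict allocation per row) dominates the total, which is consistent with Jon's observation that selecting only one column exaggerates the per-row overhead.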
Re: [sqlalchemy] performance vs. psycopg2
On Dec 15, 2011, at 12:51 PM, Jon Nelson wrote:
> Up front: I'm not using the ORM at all, and I'm using SQLAlchemy 0.7.4
> with psycopg2 2.4.3 on PostgreSQL 8.4.10 on Linux x86_64.
>
> I did some performance testing. Selecting 75 million rows (a straight-up
> SELECT colA FROM tableA) from a 5 GB table yielded some interesting
> results. psycopg2 averaged between 485,000 and 585,000 rows per second.
> Using COPY (via psycopg2) the average was right around 585,000.
> sqlalchemy averaged between 160,000 and 190,000 rows per second. That's
> a pretty big difference.
>
> I briefly looked into what the cause could be, but I didn't see anything
> jump out at me (except RowProxy, maybe). Thoughts?

Performance tests like this are fraught with complicating details (such as: did you fully fetch each column in each row in both cases? Did you have equivalent unicode and numeric conversions in place in both tests?). In this case psycopg2 is written in pure C and SQLAlchemy's result proxy is only partially C (did you use the C extensions?). You'd use the Python profiling module to get a clear picture of where the difference in effort lies. But using any kind of abstraction layer, especially one written in Python, will always add latency versus a pure C program.

> PS - is the OrderedDict implementation in 2.6+ faster or slower than the
> one SA ships with?

I haven't clocked it, but a source inspection indicates Python's would be much slower, as it's going for much more correct and comprehensive behavior using a linked list. Here's our __iter__() (self._list is a native Python list):

    def __iter__(self):
        return iter(self._list)

Here's theirs:

    def __iter__(self):
        'od.__iter__() <==> iter(od)'
        # Traverse the linked list in order.
        NEXT, KEY = 1, 2
        root = self.__root
        curr = root[NEXT]
        while curr is not root:
            yield curr[KEY]
            curr = curr[NEXT]

Thoughts?
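The two iteration strategies can be compared outside either codebase. A minimal sketch, with illustrative class names (neither class is SQLAlchemy's or CPython's actual code; only the __iter__ bodies mirror the two approaches):

```python
from timeit import timeit

class ListOrderedDict(dict):
    # Sketch of the SA-style scheme: keys mirrored in a plain list.
    # (Only __setitem__ is overridden; a real class would cover deletes,
    # update(), etc.)
    def __init__(self):
        super().__init__()
        self._list = []

    def __setitem__(self, key, value):
        if key not in self:
            self._list.append(key)
        super().__setitem__(key, value)

    def __iter__(self):
        return iter(self._list)          # delegates to a C-level iterator

class LinkedOrderedDict(dict):
    # Sketch of the CPython pure-Python OrderedDict scheme: a circular
    # doubly linked list of [PREV, NEXT, KEY] cells.
    def __init__(self):
        super().__init__()
        self._root = root = []
        root[:] = [root, root, None]
        self._map = {}

    def __setitem__(self, key, value):
        if key not in self:
            root = self._root
            last = root[0]
            last[1] = root[0] = self._map[key] = [last, root, key]
        super().__setitem__(key, value)

    def __iter__(self):
        # Traverse the linked list in insertion order.
        PREV, NEXT, KEY = 0, 1, 2
        root = self._root
        curr = root[NEXT]
        while curr is not root:
            yield curr[KEY]
            curr = curr[NEXT]

a, b = ListOrderedDict(), LinkedOrderedDict()
for k in range(1000):
    a[k] = b[k] = None

assert list(a) == list(b)                # same iteration order
print("list-backed:", timeit(lambda: list(a), number=1000))
print("linked-list:", timeit(lambda: list(b), number=1000))
```

The list-backed version hands iteration to a C-level list iterator in one call, while the linked-list version pays for a Python-level generator plus per-node indexing on every key, which matches Michael's reading of the source.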
Re: [sqlalchemy] performance vs. psycopg2
On Dec 15, 2011, at 1:01 PM, Michael Bayer wrote:
> haven't clocked it but a source inspection indicates Python's would be
> much slower, as it's going for much more correct and comprehensive
> behavior using a linked list. [...] Thoughts?

I apologize for the snark here; it's just that I really can't overstate how obsessed we are with performance, and have been for many years. If you read my blog and the CHANGES file, and look at all of our unit tests that specifically run the profiler and assert that call counts don't grow, you'd see that the focus and effort that's gone into making this as fast as possible on the CPython interpreter is relentless. There is nothing we haven't looked at, again and again. I looked at OrderedDict the day it came out, and I can assure you that if it had shaved just three ms off our usual operations, it would have been in the core on that day.

Overall, CPython is just pretty slow. That's what PyPy hopes to solve; I'd look there if you need a giant speed boost. We support it fully, and it's also in our continuous integration environment.
Re: [sqlalchemy] performance vs. psycopg2
On Thu, Dec 15, 2011 at 12:01 PM, Michael Bayer <mike...@zzzcomputing.com> wrote:
> Performance tests like this are fraught with complicating details (such
> as: did you fully fetch each column in each row in both cases? Did you
> have equivalent unicode and numeric conversions in place in both
> tests?). In this case psycopg2 is written in pure C and SQLAlchemy's
> result proxy only partially (did you use the C extensions?). [...]

I pretty much did this:

    for row in rows:
        count += 1

I was using the C extensions. Thanks for the reply!

--
Strange things are afoot at the Circle K.
Jon
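Michael's suggestion to use "the Python profiling module" can be applied directly to a fetch loop. A hedged sketch with a synthetic result set (a real run would wrap the SQLAlchemy result iteration instead of the stand-in fetch_loop below):

```python
import cProfile
import io
import pstats

# Hypothetical stand-in for a cursor result: 100,000 single-column rows.
rows = [(i,) for i in range(100_000)]

def fetch_loop(rows):
    count = 0
    for row in rows:
        dict(zip(("colA",), row))  # touch the value, as in the revised test
        count += 1
    return count

# Profile the loop and print the ten most expensive call sites.
pr = cProfile.Profile()
pr.enable()
fetch_loop(rows)
pr.disable()

buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(10)
print(buf.getvalue())
```

Sorting by cumulative time makes per-row costs (such as the dict construction here, or RowProxy machinery in a real SQLAlchemy run) stand out immediately, which is exactly the "clear picture of the difference in effort" Michael describes.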