Re: [sqlalchemy] performance vs. psycopg2

2011-12-19 Thread Gaëtan de Menten
On Fri, Dec 16, 2011 at 15:58, Jon Nelson jnel...@jamponi.net wrote:

 Revised to use:

 for row in rows:
  dict(row) # throw away result
  count += 1

I wonder how this could even work... iterating over the row yields
individual values, not tuples?!

I wonder what kind of column types you are using. Could you post your
code for both your tests (with and without SA)?

 SQLAlchemy: 115,000 to 120,000 rows/s (vs. psycopg2 @ 480K - 580K, or
 psycopg2 COPY @ 620K).

 I suspect the issue is that I'm only selecting one column, so the
 per-row overhead is exaggerated.

That is certainly a factor but even then, your numbers seem strange
(at least to me).

-- 
Gaëtan de Menten

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



Re: [sqlalchemy] performance vs. psycopg2

2011-12-16 Thread Gaëtan de Menten
On Thu, Dec 15, 2011 at 19:52, Jon Nelson jnel...@jamponi.net wrote:
 On Thu, Dec 15, 2011 at 12:01 PM, Michael Bayer
 mike...@zzzcomputing.com wrote:

 On Dec 15, 2011, at 12:51 PM, Jon Nelson wrote:

 Up front, I'm not using the ORM at all, and I'm using SQLAlchemy 0.7.4
 with psycopg2 2.4.3 on PostgreSQL 8.4.10 on Linux x86_64.

 I did some performance testing. Selecting 75 million rows (a straight
 up SELECT colA from tableA) from a 5GB table yielded some interesting
 results.
 psycopg2 averaged between 485,000 and 585,000 rows per second.
 Using COPY (via psycopg2) the average was right around 585,000.
 sqlalchemy averaged between 160,000 and 190,000 rows per second.
 That's a pretty big difference.

Weird, IIRC, SA was much closer than raw psycopg2 (without using
COPY), in the range of SA adding a 50% overhead, not a 200% overhead.

 I briefly looked into what the cause could be, but I didn't see
 anything jump out at me (except RowProxy, maybe).
 Thoughts?

 Performance tests like this are fraught with complicating details (such as, 
 did you fully fetch each column in each row in both cases?  Did you have 
 equivalent unicode and numeric conversions in place in both tests ? ).   In 
 this case psycopg2 is written in pure C and SQLAlchemy's result proxy only 
 partially (did you use the C extensions ?).    You'd use the Python 
 profiling module to get a clear picture for what difference there is in 
 effort.   But using any kind of abstraction layer, especially one written in 
 Python, will always add latency versus a pure C program.

 I pretty much did this:
 for row in rows:
  count += 1

That test is probably flawed, as you don't fetch actual values. You
should try to access individual elements (either by iterating over the
row, or indexing it one way or another -- the speed difference can
vary quite a bit depending on that). You might get even worse results
with a proper test though ;-).

-- 
Gaëtan de Menten

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



Re: [sqlalchemy] performance vs. psycopg2

2011-12-16 Thread Jon Nelson
On Fri, Dec 16, 2011 at 3:30 AM, Gaëtan de Menten gdemen...@gmail.com wrote:
 On Thu, Dec 15, 2011 at 19:52, Jon Nelson jnel...@jamponi.net wrote:
 On Thu, Dec 15, 2011 at 12:01 PM, Michael Bayer
 mike...@zzzcomputing.com wrote:

 On Dec 15, 2011, at 12:51 PM, Jon Nelson wrote:

 Up front, I'm not using the ORM at all, and I'm using SQLAlchemy 0.7.4
 with psycopg2 2.4.3 on PostgreSQL 8.4.10 on Linux x86_64.

 I did some performance testing. Selecting 75 million rows (a straight
 up SELECT colA from tableA) from a 5GB table yielded some interesting
 results.
 psycopg2 averaged between 485,000 and 585,000 rows per second.
 Using COPY (via psycopg2) the average was right around 585,000.
 sqlalchemy averaged between 160,000 and 190,000 rows per second.
 That's a pretty big difference.

 Weird, IIRC, SA was much closer than raw psycopg2 (without using
 COPY), in the range of SA adding a 50% overhead, not a 200% overhead.

 I briefly looked into what the cause could be, but I didn't see
 anything jump out at me (except RowProxy, maybe).
 Thoughts?

 Performance tests like this are fraught with complicating details (such as, 
 did you fully fetch each column in each row in both cases?  Did you have 
 equivalent unicode and numeric conversions in place in both tests ? ).   In 
 this case psycopg2 is written in pure C and SQLAlchemy's result proxy only 
 partially (did you use the C extensions ?).    You'd use the Python 
 profiling module to get a clear picture for what difference there is in 
 effort.   But using any kind of abstraction layer, especially one written 
 in Python, will always add latency versus a pure C program.

 I pretty much did this:
 for row in rows:
  count += 1

 That test is probably flawed, as you don't fetch actual values. You
 should try to access individual elements (either by iterating over the
 row, or indexing it one way or another -- the speed difference can
 vary quite a bit depending on that). You might get even worse results
 with a proper test though ;-).

Revised to use:

for row in rows:
  dict(row) # throw away result
  count += 1

SQLAlchemy: 115,000 to 120,000 rows/s (vs. psycopg2 @ 480K - 580K, or
psycopg2 COPY @ 620K).

I suspect the issue is that I'm only selecting one column, so the
per-row overhead is exaggerated.

Thanks for the responses.

-- 
Strange things are afoot at the Circle K.
Jon

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



Re: [sqlalchemy] performance vs. psycopg2

2011-12-15 Thread Michael Bayer

On Dec 15, 2011, at 12:51 PM, Jon Nelson wrote:

 Up front, I'm not using the ORM at all, and I'm using SQLAlchemy 0.7.4
 with psycopg2 2.4.3 on PostgreSQL 8.4.10 on Linux x86_64.
 
 I did some performance testing. Selecting 75 million rows (a straight
 up SELECT colA from tableA) from a 5GB table yielded some interesting
 results.
 psycopg2 averaged between 485,000 and 585,000 rows per second.
 Using COPY (via psycopg2) the average was right around 585,000.
 sqlalchemy averaged between 160,000 and 190,000 rows per second.
 That's a pretty big difference.
 
 I briefly looked into what the cause could be, but I didn't see
 anything jump out at me (except RowProxy, maybe).
 Thoughts?
 

Performance tests like this are fraught with complicating details (such as, did 
you fully fetch each column in each row in both cases?  Did you have equivalent 
unicode and numeric conversions in place in both tests ? ).   In this case 
psycopg2 is written in pure C and SQLAlchemy's result proxy only partially (did 
you use the C extensions ?).You'd use the Python profiling module to get a 
clear picture for what difference there is in effort.   But using any kind of 
abstraction layer, especially one written in Python, will always add latency 
versus a pure C program.


 PS - is the OrderedDict implementation in 2.6+ faster or slower than
 the one SA ships with?

haven't clocked it but a source inspection indicates Python's would be much 
slower, as it's going for much more correct and comprehensive behavior using 
a linked list.

Here's our __iter__() (self._list is a native Python list):

def __iter__(self):
return iter(self._list)


Here's theirs:

def __iter__(self):
'od.__iter__() == iter(od)'
# Traverse the linked list in order.
NEXT, KEY = 1, 2
root = self.__root
curr = root[NEXT]
while curr is not root:
yield curr[KEY]
curr = curr[NEXT]

Thoughts?







-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



Re: [sqlalchemy] performance vs. psycopg2

2011-12-15 Thread Michael Bayer

On Dec 15, 2011, at 1:01 PM, Michael Bayer wrote:

 
 
 haven't clocked it but a source inspection indicates Python's would be much 
 slower, as it's going for much more correct and comprehensive behavior 
 using a linked list.
 
 Here's our __iter__() (self._list is a native Python list):
 
def __iter__(self):
return iter(self._list)
 
 
 Here's theirs:
 
def __iter__(self):
'od.__iter__() == iter(od)'
# Traverse the linked list in order.
NEXT, KEY = 1, 2
root = self.__root
curr = root[NEXT]
while curr is not root:
yield curr[KEY]
curr = curr[NEXT]
 
 Thoughts?


I apologize for the snark here, it's just that I really can't overstate how 
obsessed we are with performance, for many years.If you read my blog, the 
CHANGES, look at all of our unit tests that specifically run the profiler and 
assert callcounts don't grow, you'd see that the amount of focus and effort 
that's gone into making the CPython interpreter here as fast as possible is 
relentless.   There is nothing we haven't looked at, again and again.   I 
looked at OrderedDict the day it came out, and I can assure you if it shaved 
just three ms off our usual operations, it would have been in the core on that 
day.   Overall, CPython is just pretty slow.  That's what Pypy hopes to solve - 
I'd look there if you need a giant speed boost, we support it fully and it's 
also in our continuous integration environment.


-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



Re: [sqlalchemy] performance vs. psycopg2

2011-12-15 Thread Jon Nelson
On Thu, Dec 15, 2011 at 12:01 PM, Michael Bayer
mike...@zzzcomputing.com wrote:

 On Dec 15, 2011, at 12:51 PM, Jon Nelson wrote:

 Up front, I'm not using the ORM at all, and I'm using SQLAlchemy 0.7.4
 with psycopg2 2.4.3 on PostgreSQL 8.4.10 on Linux x86_64.

 I did some performance testing. Selecting 75 million rows (a straight
 up SELECT colA from tableA) from a 5GB table yielded some interesting
 results.
 psycopg2 averaged between 485,000 and 585,000 rows per second.
 Using COPY (via psycopg2) the average was right around 585,000.
 sqlalchemy averaged between 160,000 and 190,000 rows per second.
 That's a pretty big difference.

 I briefly looked into what the cause could be, but I didn't see
 anything jump out at me (except RowProxy, maybe).
 Thoughts?


 Performance tests like this are fraught with complicating details (such as, 
 did you fully fetch each column in each row in both cases?  Did you have 
 equivalent unicode and numeric conversions in place in both tests ? ).   In 
 this case psycopg2 is written in pure C and SQLAlchemy's result proxy only 
 partially (did you use the C extensions ?).    You'd use the Python profiling 
 module to get a clear picture for what difference there is in effort.   But 
 using any kind of abstraction layer, especially one written in Python, will 
 always add latency versus a pure C program.

I pretty much did this:
for row in rows:
  count += 1

I was using the C extensions.

Thanks for the reply!

-- 
Strange things are afoot at the Circle K.
Jon

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.