[sqlalchemy] Re: [Sqlalchemy-users] named cursor

2007-01-10 Thread Michael Bayer

Server-side cursors (i.e. named cursors) are currently the default for all
postgres functionality in the trunk, and you can turn them off by passing
client_side_cursors=True to create_engine().  However, one user reported
that table reflection breaks, which I cannot reproduce.  So can some folks
please try out the trunk with postgres and see if we can go with
server-side cursors by default?  I've run the unit tests many times with
both settings and I don't perceive any performance hit from using
server-side cursors.  I'd like this to be in the next release, but if
there's some issue with using server-side cursors in all cases, then I
have to break open the code and make it a lot more complex to detect the
conditions where server-side cursors are appropriate.
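A minimal sketch of the opt-out described above; create_engine() and the
client_side_cursors flag are as stated in the message, while the URL and the
helper function are illustrative placeholders, not part of SQLAlchemy:

```python
# Sketch only: the client_side_cursors flag is as described above; the
# engine_kwargs helper and the URL are placeholders for illustration.
def engine_kwargs(server_side=True):
    """Build keyword arguments for create_engine() on the postgres dialect."""
    kwargs = {}
    if not server_side:
        # Opt out of the trunk's new default of server-side (named) cursors.
        kwargs['client_side_cursors'] = True
    return kwargs

# In real code this would be:
#   from sqlalchemy import create_engine
#   engine = create_engine('postgres://user:pass@localhost/db',
#                          **engine_kwargs(server_side=False))
```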


--~--~-~--~~~---~--~~
 You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: [Sqlalchemy-users] named cursor

2007-01-10 Thread Daniel Miller

Hi Mike,

I've been lurking here in the shadows for a while... this particular topic looks 
very interesting to me, and I'd love to upgrade to HEAD to try it out. However, 
I'm wondering what the risk factor is in doing that. Have there been any other 
major (potentially destabilizing) changes lately? I'm using r2168 right now, 
and it's been very stable. I've upgraded to HEAD quite a few times over the 
past year, and I've always been impressed with (1) the overall stability of SA 
and (2) the speed with which bugs are fixed when they're found. I'm just asking 
for a general idea of how stable you feel the trunk is right now. Don't worry, 
I'll take responsibility for my actions and will in no way hold you responsible 
for anything that may occur if I decide to upgrade right now; I'm just looking 
for a general indicator. Thanks.

~ Daniel


Michael Bayer wrote:
 [...]




[sqlalchemy] Re: [Sqlalchemy-users] named cursor

2007-01-10 Thread Michael Bayer

It should be pretty stable; we haven't had much dramatic going on.  I
moved some things around a bit in r2174 with regards to eager loading,
but I think those changes have ironed out pretty well.

On Jan 10, 2007, at 10:39 PM, Daniel Miller wrote:


 [...]





[sqlalchemy] Re: [Sqlalchemy-users] named cursor

2007-01-09 Thread Michael Bayer


On Jan 9, 2007, at 5:24 AM, Michele Petrazzo wrote:
 Because, as I wrote on the psycopg2 ml (lists.initd.org/pipermail/psycopg/2007-January/005250.html),
 without a server-side cursor the memory usage on the client side is
 proportional to the number of records returned by the query (in my
 example, 400k records, 60MB of RAM). I think this is because the
 driver fetches all the data from the server and makes it available
 through the fetchXXX methods.

Really?  I'm not sure about psycopg2 specifically, but it's been my
experience with virtually all database client libraries that as you
fetch rows and discard them, you only pull as much as you fetch into
memory; i.e. if you had a result set of 4000 rows and just fetched 5
rows, only 5 rows get pulled over the wire.  That's definitely how the
lower-level APIs work, and it would be pretty bad if psycopg2 didn't
maintain that behavior.
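The fetch-as-you-go behavior described here can be sketched with plain Python
generators, using an in-memory stand-in for the server; nothing below is
database-specific and all names are illustrative:

```python
# A pure-Python sketch of the two fetch strategies: the row source is a
# lazy generator standing in for rows held on the database server.
from itertools import islice

def server_rows(n):
    """Rows 'on the server': produced lazily, one at a time."""
    for i in range(n):
        yield (i, 'row-%d' % i)

def fetch_all(n):
    """Client-side style: materialize the whole result set at once."""
    return list(server_rows(n))

def fetch_first(n, k):
    """Server-side style: pull only the rows actually consumed."""
    return list(islice(server_rows(n), k))

everything = fetch_all(4000)       # all 4000 rows held in memory
first_five = fetch_first(4000, 5)  # only 5 rows ever pulled
```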







[sqlalchemy] Re: [Sqlalchemy-users] named cursor

2007-01-09 Thread Michele Petrazzo

Michael Bayer wrote:
 
 [...]
 
 Really? I'm not sure about psycopg2 specifically, 

Here, I'm sure! This assumes that I'm on Linux and have the ps utility
available locally. The db is on the same machine where I'm running the
tests. PG is 8.1.5 and psycopg2 is 2.0.4.

This is my test:

-code-
import time
import subprocess as SP
import psycopg2

NAMED_CURSOR = 1

def get_mem():
   # Wait for the memory to stabilize
   time.sleep(0.5)

   # Get the memory usage with ps.
   output = SP.Popen(["ps", "-eo", "rss,args"],
                     stdout=SP.PIPE).communicate()[0]
   for line in output.split('\n'):
     if not line: continue
     mem, pr_name = line.split(None, 1)
     # Show only my python process
     if not "python " + __file__ in pr_name: continue
     return mem

def create_cur():
   if NAMED_CURSOR:
     return con.cursor("test")   # named (server-side) cursor
   else:
     return con.cursor()         # regular (client-side) cursor

def exec_q(q):
   cur = create_cur()
   print "empty cursor", get_mem()
   cur.execute(q)
   print "cursor after execute", get_mem()
   print "result --", cur.fetchone()
   print "after fetchone", get_mem()
   cur.close()
   print "cursor closed", get_mem()

print "python and modules", get_mem()

con = psycopg2.connect('user=test password=test dbname=book')
print "connection", get_mem()

exec_q("SELECT COUNT(id) FROM book")

exec_q("SELECT * FROM book")

/-code-

## with NAMED_CURSOR = 1

michele:~/tmp$ time python test_sql_mem.py
python and modules  3928
connection  4048
empty cursor  4068
cursor after execute  4080
result -- (406500L,)
after fetchone  4088
cursor closed  4092
empty cursor  4092
cursor after execute  4092
result -- (6087, 'title', 15, '18', 'Sonzogno', 508126)
after fetchone  4092
cursor closed  4092

real    0m6.107s
user    0m0.080s
sys     0m0.076s
michele:~/tmp$

## with NAMED_CURSOR = 0

michele:~/tmp$ time python test_sql_mem.py
python and modules  3932
connection  4048
empty cursor  4068
cursor after execute  4080
result -- (406500L,)
after fetchone  4092
cursor closed  4092
empty cursor  4092
cursor after execute  58440
result -- (6087, 'title', 15, '18', 'Sonzogno', 508126)
after fetchone  58452
cursor closed  58452

real    0m11.499s
user    0m0.544s
sys     0m0.176s
michele:~/tmp$

 but it's been my
 experience with virtually all database client libraries that as you
 fetch rows and discard them, you only pull as much as you fetch into
 memory; i.e. if you had a result set of 4000 rows and just fetched 5
 rows, only 5 rows get pulled over the wire.  That's definitely how the
 lower-level APIs work 


I'm with you, but my code shows other results.

 and it would be pretty bad if psycopg2 didn't maintain that behavior.

I'll send another email to the psycopg2 ml with these results, and I'll
reply to you in the near future.

Michele




[sqlalchemy] Re: [Sqlalchemy-users] named cursor

2007-01-09 Thread Michael Bayer

OK, well, this is all quite disturbing!

I would want to examine this more closely.  But let's assume this is
really how it is, since that's how it looks (eesh).

So, you don't really want named cursors as such; you just want psycopg2
to use cursors in its underlying operations so that it operates more
efficiently... and the magical incantation you send to psycopg2 in
order for it to use cursors is conn.cursor("x") as opposed to
conn.cursor() (which to me is, like, totally arbitrary... and at this
point I am getting pretty annoyed with how utterly undocumented
psycopg2 is... but then again so is MySQLdb).  So this is total
artifact-land, and it has no implication for SA to grow some new
fine-grained cursor-based API; here we are just looking for an internal
hack workaround.

My first instinct is to say, OK, then we just use the call
conn.cursor("x") to get cursors when using the postgres dialect.
However, I'm told that this probably won't work for CRUD (i.e. insert,
update, etc.) operations, so we would have to keep that in mind (SA
does scan incoming SQL for INSERT/UPDATE etc., so that's not an issue).
Also, is conn.cursor("x") more efficient in all cases?  Or is the
overhead of creating a cursor significant, and only worthwhile when
you know you are going to fetch only a subset of rows from an
enormous result set?
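The overhead question above can be made concrete with a toy cost model; every
constant and function below is an invented illustration, not a measurement of
psycopg2 or postgres:

```python
# Toy model: a named (server-side) cursor pays a round trip per fetch
# batch, while a client-side cursor pays one round trip plus the cost of
# shipping every row up front.  All constants are invented for illustration.
def client_side_cost(total_rows, per_trip=1.0, per_row=0.01):
    return per_trip + total_rows * per_row

def server_side_cost(rows_needed, batch_size=100, per_trip=1.0, per_row=0.01):
    trips = max(1, -(-rows_needed // batch_size))  # ceiling division
    return trips * per_trip + rows_needed * per_row

# Fetching 5 rows of a 400,000-row result set: the named cursor wins easily.
small = (server_side_cost(5), client_side_cost(400000))
# Fetching all 400,000 rows: the per-batch round trips add up.
large = (server_side_cost(400000), client_side_cost(400000))
```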

So my second instinct is to perhaps go with the approach of using
conn.cursor("x") for all SELECT statements when using the postgres
dialect, *if* you tell the dialect you want that.  I'm thinking of a
flag use_named_cursors=True when you create_engine().  I'd really
rather not pollute the connect()/Connection/execute() API with this
concept if at all possible... I was initially thinking of doing
something like conn.with_cursor(mycursor).execute(), but that's
pretty explicit for something that should really be an internal
optimization.
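A purely hypothetical sketch of what the proposed flag might do inside the
dialect (use_named_cursors does not exist at the time of writing, and the
fake connection class is only there to exercise the logic without a database):

```python
# Hypothetical: route SELECTs to a named cursor when the flag is set.
def choose_cursor(conn, statement, use_named_cursors=False, name='c1'):
    is_select = statement.lstrip().upper().startswith('SELECT')
    if use_named_cursors and is_select:
        return conn.cursor(name)   # server-side (named) cursor
    return conn.cursor()           # plain client-side cursor

class FakeConn(object):
    """Stand-in for a psycopg2 connection, for demonstration only."""
    def cursor(self, name=None):
        return 'named:%s' % name if name else 'client'

select_cur = choose_cursor(FakeConn(), "SELECT * FROM book",
                           use_named_cursors=True)
insert_cur = choose_cursor(FakeConn(), "INSERT INTO book VALUES (1)",
                           use_named_cursors=True)
```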

Those are my thoughts so far.  Just FYI, if you want to hack your SA to
use named cursors in all cases, it's lines 282 and 322 of
lib/sqlalchemy/engine/base.py.


On Jan 9, 2007, at 11:40 AM, Michele Petrazzo wrote:


 [...]