[sqlalchemy] Re: psycopg2 default transaction inefficiency
Yeah, I think I explained badly. What I was trying to show with the sqlalchemy vs postgres query logs is that extra BEGIN that psycopg2 is sending before the SELECT that sqlalchemy executes. The BEGIN is implicitly added by psycopg2 because it's in its default transaction isolation state of read comitted. Which I didn't know was the DBAPI default -- so strike what I said about changing any default behavior. But It would be nice to be able to have total control over the transactional state of the connection, so that when I know that I'm just doing a select or two I don't have to have the overhead of a BEGIN that I know is useless, but as things stand I can't do that, because do_begin isn't implemented for any of the dialects. I guess that's because sqlalchemy is depending on the underlying DBAPI implementation to handle sending BEGIN statements before the first query that is outside of a transaction? I noticed a TODO about implementing true autocommit behavior, which is what I need -- a way to tell the dbapi module *not* to start those implicit transactions. If the engine/connection could be put into true autocommit mode, and issue connection.begin() from do_begin when in that mode, then I could do everything I want to do and I don't think anything would break, since the orm calls do_begin before each flush. I know how to change the isolation level in the sqlite and postgres dialects -- would you be interested in a patch or work on a branch where I added an isolation_level property to the engine, implemented it for those two, and for those two dialects had do_begin issue a connection.begin() when the isolation_level was set to autocommit? I could probably tackle mysql as well, but I have no access to mssql or oracle, so I can't do those. Hopefully I'm making some more sense this time... JP On Feb 12, 12:42 am, Michael Bayer [EMAIL PROTECTED] wrote: im confused. the django thread seems to be saying that it is setting psycopg2 *into* autocommit mode, which causes the problem; the default setting of autocommit for DBAPI (and maintained in SA) is False. When i wrote frameworks years ago I always thought autocommit mode was the way to go but i since learned that issues like this arise so SA *never* uses autocommit mode on the connection; instead it implements its own autocommitting behavior in a manner similar to hibernate - it looks at the statement being executed, determines if its a CRUD/ CREATE/DROP expression, and then explicitly calls COMMIT if no SA transaction is already in progress. the log you have above doesnt prove that anything unusual is going on, since you are illustrating a transactional operation, then a SELECT, then another transactional operation, then another SELECT. SA does an explicit COMMIT for the CREATE statements since they are required to complete the table creation operation. this test confirms that psycopg2 defaults to autocommit as false and doesnt do anything with transaction modes after the connection is opened: import psycopg2 as psycopg conn = psycopg.connect(user='scott', password='tiger', host='127.0.0.1', database='test') for x in range(0, 5): curs = conn.cursor() curs.execute(SELECT 1) curs.close() log output: LOG: statement: SET DATESTYLE TO 'ISO' LOG: statement: SHOW client_encoding LOG: statement: SHOW default_transaction_isolation LOG: statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED LOG: statement: SELECT 1 LOG: statement: SELECT 1 LOG: statement: SELECT 1 LOG: statement: SELECT 1 LOG: statement: SELECT 1 LOG: statement: ABORT one connection, five new cursors, only one set of setup corresponding to the connection. On Feb 9, 3:52 pm, JP [EMAIL PROTECTED] wrote: I noticed this thread on django-devs: http://groups.google.com/group/django-developers/browse_frm/thread/52... Which notes that psycopg2 by default starts transactions behind the scenes, even for select statements. If you happen to be running a web app where each hit starts a new cursor, and you only do a few selects, this adds a *lot* of overhead relative to autocommit. I wondered if sqlalchemy suffered from the same problem with psycopg2, and it does. Take a look at what sqlalchemy thinks it was doing for a short example, vs what the postgres query log contains. The code: import sqlalchemy as sa meta = sa.DynamicMetaData() users = sa.Table('users', meta, sa.Column('id', sa.Integer, primary_key=True, autoincrement=True), sa.Column('name', sa.String(50))) meta.connect('postgres://snip', debug=True) meta.create_all() users.select().execute() sqlalchemy query log: select relname from pg_class where lower(relname) = %(name)s CREATE TABLE users ( id SERIAL NOT NULL, name VARCHAR(50), PRIMARY KEY (id) ) COMMIT SELECT users.id, users.name FROM users postgres query log: LOG: statement: SET DATESTYLE TO 'ISO' LOG:
[sqlalchemy] Re: psycopg2 default transaction inefficiency
On Feb 12, 10:30 am, JP [EMAIL PROTECTED] wrote: But It would be nice to be able to have total control over the transactional state of the connection, so that when I know that I'm just doing a select or two I don't have to have the overhead of a BEGIN that I know is useless, but as things stand I can't do that, because do_begin isn't implemented for any of the dialects. I guess that's because sqlalchemy is depending on the underlying DBAPI implementation to handle sending BEGIN statements before the first query that is outside of a transaction? DBAPI, including psycopg2, has no begin() method; there is only rollback() and commit(). like the example illustrates, there is no BEGIN being issued for every SELECT statement when using psycopg2 in non-autocommit mode, which applies to SQLAlchemy as well. therefore there is no performance bottleneck; this is a django issue only. I noticed a TODO about implementing true autocommit behavior, which is what I need -- a way to tell the dbapi module *not* to start those implicit transactions. it doesnt. run the example - no implicit transaction beyond the initial connection (which is required, since it must implement rollback() and commit()). SA has been doing its own autocommit on top of non-autocommitting connections since the 0.2 series at least. running in non-auto-commit mode is standard practice for databases like oracle. theres always a transaction in place. but its not started for every select statement. only at connection time and after each rollback or commit. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~---
[sqlalchemy] Re: psycopg2 default transaction inefficiency
like the example illustrates, there is no BEGIN being issued for every SELECT statement when using psycopg2 in non-autocommit mode, which applies to SQLAlchemy as well. therefore there is no performance bottleneck; this is a django issue only. I guess we're reading the example differently. I see this BEGIN: LOG: statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED LOG: statement: SELECT users.id, users.name FROM users in the postgres log, but as far as I can tell sqlalchemy is not sending it; it's being implicitly inserted by psycopg2 because the isolation level of the connection is read-committed and it sees a statement outside of a transaction. If the connection were in the autocommit isolation level, doing the same thing in sqlalchemy would not result in that BEGIN, only the SELECT would be issued. I noticed a TODO about implementing true autocommit behavior, which is what I need -- a way to tell the dbapi module *not* to start those implicit transactions. it doesnt. run the example - no implicit transaction beyond the initial connection (which is required, since it must implement rollback() and commit()). Either I'm very confused or I'm having lots of trouble getting my point across. In my experience, it is absolutely the case that the dbapi drivers are starting implicit transactions when they aren't in autocommit mode -- otherwise I don't see how any transactions would work correctly, since do_begin is a no-op everywhere -- so if the dbapi modules aren't inserting the BEGIN to start transactions, I don't see where it could be coming from. What am I missing? If I'm wrong about what's going on here, how does sqlalchemy start transactions when it needs to, and where is that BEGIN before the select in my example coming from? I have an example that I think illustrates what I'm talking about pretty well, so I'm going to file a ticket and attach the example and the postgres query log I see when I run it. Hopefully that will clear things up one way or another. JP --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~---
[sqlalchemy] Re: psycopg2 default transaction inefficiency
im confused. the django thread seems to be saying that it is setting psycopg2 *into* autocommit mode, which causes the problem; the default setting of autocommit for DBAPI (and maintained in SA) is False. When i wrote frameworks years ago I always thought autocommit mode was the way to go but i since learned that issues like this arise so SA *never* uses autocommit mode on the connection; instead it implements its own autocommitting behavior in a manner similar to hibernate - it looks at the statement being executed, determines if its a CRUD/ CREATE/DROP expression, and then explicitly calls COMMIT if no SA transaction is already in progress. the log you have above doesnt prove that anything unusual is going on, since you are illustrating a transactional operation, then a SELECT, then another transactional operation, then another SELECT. SA does an explicit COMMIT for the CREATE statements since they are required to complete the table creation operation. this test confirms that psycopg2 defaults to autocommit as false and doesnt do anything with transaction modes after the connection is opened: import psycopg2 as psycopg conn = psycopg.connect(user='scott', password='tiger', host='127.0.0.1', database='test') for x in range(0, 5): curs = conn.cursor() curs.execute(SELECT 1) curs.close() log output: LOG: statement: SET DATESTYLE TO 'ISO' LOG: statement: SHOW client_encoding LOG: statement: SHOW default_transaction_isolation LOG: statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED LOG: statement: SELECT 1 LOG: statement: SELECT 1 LOG: statement: SELECT 1 LOG: statement: SELECT 1 LOG: statement: SELECT 1 LOG: statement: ABORT one connection, five new cursors, only one set of setup corresponding to the connection. On Feb 9, 3:52 pm, JP [EMAIL PROTECTED] wrote: I noticed this thread on django-devs: http://groups.google.com/group/django-developers/browse_frm/thread/52... Which notes that psycopg2 by default starts transactions behind the scenes, even for select statements. If you happen to be running a web app where each hit starts a new cursor, and you only do a few selects, this adds a *lot* of overhead relative to autocommit. I wondered if sqlalchemy suffered from the same problem with psycopg2, and it does. Take a look at what sqlalchemy thinks it was doing for a short example, vs what the postgres query log contains. The code: import sqlalchemy as sa meta = sa.DynamicMetaData() users = sa.Table('users', meta, sa.Column('id', sa.Integer, primary_key=True, autoincrement=True), sa.Column('name', sa.String(50))) meta.connect('postgres://snip', debug=True) meta.create_all() users.select().execute() sqlalchemy query log: select relname from pg_class where lower(relname) = %(name)s CREATE TABLE users ( id SERIAL NOT NULL, name VARCHAR(50), PRIMARY KEY (id) ) COMMIT SELECT users.id, users.name FROM users postgres query log: LOG: statement: SET DATESTYLE TO 'ISO' LOG: statement: SHOW client_encoding LOG: statement: SHOW default_transaction_isolation LOG: statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED LOG: statement: select relname from pg_class where lower(relname) = 'users' LOG: statement: CREATE TABLE users ( id SERIAL NOT NULL, name VARCHAR(50), PRIMARY KEY (id) ) LOG: statement: END LOG: statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED LOG: statement: SELECT users.id, users.name FROM users I think it would be better for sqlalchemy to set psycopg2 to autocommit by default, and implement do_begin in the dialect so that transactions are only started when desired, rather than implicitly on the first statment seen by a cursor when there's no current transaction, as seems to be the case now. Mike, would you be interested in a patch that implemented that behavior? Does anyone disagree that that would be better? JP --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~---