On 07/11/2012 12:33 PM, Devananda van der Veen wrote: > Hi all, > > I've been taking a look at the way Nova uses its MySQL database. Having > done MySQL performance audits for years as a consultant, a few things > jumped out right away at me. First is the way that SQLAlchemy is > wrapping nearly every query in an unnecessary "ping check" and rollback, > eg.: > > select 1; > select ... from ... where ...; > rollback; > select 1; > update ... where ...; > commit; > rollback; > > > You can find a complete sample here: http://paste.openstack.org/show/18731/ > > I think I understand the reason for both the "select 1" and the > "rollback" statements. However, in the interest of performance for > large-scale deployments, I feel pretty strongly that they need to go away. > > As I see it, there are three factors here. > > (I) Most of the code in db/sqlalchemy/api.py abstracts a "unit of work" > to a very low level, generally a single database read or write, and does > not currently support transactions spanning multiple consistent writes. > This is why "select 1" and "rollback" appear around almost every query > -- most functions in api.py is checking out a session object, doing one > or two queries, and then releasing the session. This actually creates a > much larger issue about transactional atomicity for larger operations, > such as the open bug about network creation here, and is probably better > for another discussion. > https://bugs.launchpad.net/nova/+bug/755138. > > (II) Connections are tested by the MySQLPingListener function, using > "SELECT 1" statements, every time a connection / session is checked out > of the pool. Since connections are usually OK, this adds overhead > unnecessarily. It would be more efficient to handle the errors when > connections aren't OK. I've opened a bug with a description of one > possible way to fix this issue. There are probably other viable > solutions as well. > https://bugs..launchpad.net/nova/+bug/1007027 > <https://bugs.launchpad.net/nova/+bug/1007027> > > My understanding is that this was implemented to prevent "Database has > gone away" errors from occurring every time that the database is > restarted, or when connections are closed by the database after being > idle for long periods. In my opinion, a better solution to these > problems is to: > * wrap queries in retry logic, which will catch disconnect errors and > attempt to reconnect to the database. I have a patch for this, if folks > are interested. > * set Nova's sql_idle_timeout to less than MySQL's wait_timeout, and set > them both to a reasonably short interval (eg, 1 minute), rather than the > default (which I think is 8 hours). > > (III) Transaction state is reset when connections are returned to the > pool, even if no transaction was in progress, or the > transaction-in-progress already committed. This is completely wasteful, > and easy to disable. > https://bugs.launchpad.net/nova/+bug/1007038 > > Caveat here is that this reset-on-return functionality *is* useful when > code doesn't properly clean up its own transaction state. When I turned > it off, it exposed a few bugs, which I haven't tracked down yet. > Lowering the sql_idle_timeout will provide the same "solution" as the > current reset-on-return behavior in that it will prevent long-running > idle transactions that tie up resources unnecessarily. > > > In summary, I'd like to see Nova stop spamming the database with "SELECT > 1" and "ROLLBACK", and think this should be pretty easy to do. Testing > these two changes in my devstack seems to work fine, but I'd like to > know what others think before I propose a changeset with this kind of > potential impact.
I agree and welcome work in that direction. Should we be looking at patches to sqlalchemy to increase its support for dealing appropriately with mysql-gone-away errors? Monty _______________________________________________ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp