Yeah, that assert issue has bitten us once or twice, and I know Ryan squawked about it at some point. Do we have any point where enforcement will occur (BVT or some other tests)?
On Mon, Mar 31, 2014 at 4:04 PM, Alex Huang <[email protected]> wrote: > Hi All, > > I was alerted to this problem recently and it's something that affects > developers so I want to bring it up. It is a design principle in CloudStack > that we do not make agent calls within database transactions. The reason is > because when you make a call to an external system, there's no guarantee on > how long the call takes or even whether the call returns. When a call takes > a long time, several bad things can happen: > - The MySQL DB Connection held opened due to the DB transaction goes > into idle. Eventually, a timeout in MySQL hits and the connection gets > severed and the transaction is rolled back. By default, this timeout is 45 > seconds but can be changed via a parameter in my.cnf. So it's problem that > the agent call completes just fine but the DB transaction rolls back and > changes are undone. > - The rows locked in that transaction before the remote agent call > could be holding up other foreign key checks into the table. MySQL runs > foreign key checks in transactions to make sure the data modification and the > checks are done atomically. Therefore, these checks must wait for other > transactions to complete. Hence, an agent call that takes sometime can > severely slow down the system, particularly under scale. > > We have two solutions to this: > - Drive agent interactions with states. There are many examples of > this in VM, Volume, etc. > - When the above cannot be done, acquire a lock in the lock table via > a DAO method call. Locks do not maintain DB transactions and therefore will > not run into this problem. However, you are responsible for releasing locks. > It used to be that if you forget to release the locks, the @DB annotation > automatically releases locks once it went out of the scope and asserts to > alert the developer. However, the @DB annotation has been removed in the > Spring work so I'm not sure if it's still done. > > This is a tough problem to solve because > 1. It usually works just fine during functional testing. During > scale testing, this problem surfaces and often in unexpected places due to > the foreign key check problem. > 2. For developers, it is difficult for them to know if a method that > they're calling within a transaction ends up in an agent call. > > There is an assert in AgentManager to ensure that there are no db > transactions before making a agent call. Apparently, since the conversion to > Maven, no one actually runs with assert on any more. Due to that, this > design principle has been lost in CloudStack and we're finding more and more > calls being made in DB transactions. To counter that, I decided to add a > global parameter that turns the assert to an actual exception. It is advised > that all developers set this global parameter, > check.txn.before.sending.agent.commands, during their own testing to make > sure it doesn't call agent calls in transactions. > > --Alex > >
