Yeah, that assert issue has bitten us once or twice, and I know Ryan
squawked about it at some point.  Do we have any point where
enforcement will occur (BVT or some other tests)?

On Mon, Mar 31, 2014 at 4:04 PM, Alex Huang <alex.hu...@citrix.com> wrote:
> Hi All,
>
> I was alerted to this problem recently and it's something that affects 
> developers so I want to bring it up.  It is a design principle in CloudStack 
> that we do not make agent calls within database transactions.  The reason is 
> because when you make a call to an external system, there's no guarantee on 
> how long the call takes or even whether the call returns.  When a call takes 
> a long time, several bad things can happen:
>         - The MySQL DB Connection held opened due to the DB transaction goes 
> into idle. Eventually, a timeout in MySQL hits and the connection gets 
> severed and the transaction is rolled back.  By default, this timeout is 45 
> seconds but can be changed via a parameter in my.cnf.  So it's problem that 
> the agent call completes just fine but the DB transaction rolls back and 
> changes are undone.
>         - The rows locked in that transaction before the remote agent call 
> could be holding up other foreign key checks into the table.  MySQL runs 
> foreign key checks in transactions to make sure the data modification and the 
> checks are done atomically.  Therefore, these checks must wait for other 
> transactions to complete.  Hence, an agent call that takes sometime can 
> severely slow down the system, particularly under scale.
>
> We have two solutions to this:
>         - Drive agent interactions with states.  There are many examples of 
> this in VM, Volume, etc.
>         - When the above cannot be done, acquire a lock in the lock table via 
> a DAO method call.  Locks do not maintain DB transactions and therefore will 
> not run into this problem.  However, you are responsible for releasing locks. 
>  It used to be that if you forget to release the locks, the @DB annotation 
> automatically releases locks once it went out of the scope and asserts to 
> alert the developer.  However, the @DB annotation has been removed in the 
> Spring work so I'm not sure if it's still done.
>
> This is a tough problem to solve because
>         1. It usually works just fine during functional testing.  During 
> scale testing, this problem surfaces and often in unexpected places due to 
> the foreign key check problem.
>         2. For developers, it is difficult for them to know if a method that 
> they're calling within a transaction ends up in an agent call.
>
> There is an assert in AgentManager to ensure that there are no db 
> transactions before making a agent call.  Apparently, since the conversion to 
> Maven, no one actually runs with assert on any more.  Due to that, this 
> design principle has been lost in CloudStack and we're finding more and more 
> calls being made in DB transactions.   To counter that, I decided to add a 
> global parameter that turns the assert to an actual exception.  It is advised 
> that all developers set this global parameter, 
> check.txn.before.sending.agent.commands, during their own testing to make 
> sure it doesn't call agent calls in transactions.
>
> --Alex
>
>

Reply via email to