I have a question related to deadlock handling as well. Why is the DBDeadlock exception not caught generically for all API/RPC requests?
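A minimal sketch of what catching it generically could look like, assuming each API/RPC handler opens and commits its own transaction. The decorator re-runs the whole handler on a deadlock, up to a configurable limit, and re-raises once the limit is exhausted. `retry_on_deadlock`, `max_retries`, `interval` and `create_instance` are hypothetical names. `DBDeadlock` is a local stand-in for `oslo_db.exception.DBDeadlock` so the snippet runs on its own.

```python
import functools
import time


class DBDeadlock(Exception):
    """Hypothetical stand-in for oslo_db.exception.DBDeadlock."""


def retry_on_deadlock(max_retries=3, interval=0.0):
    """Hypothetical decorator: re-run the whole request handler on a deadlock.

    max_retries is the number of extra attempts after the first one (this
    would be the configurable limit); interval is a fixed pause in seconds.
    """
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    # Assumption: the handler starts and commits its own
                    # transaction, so every attempt re-issues all of its
                    # DB work from the beginning.
                    return handler(*args, **kwargs)
                except DBDeadlock:
                    if attempt >= max_retries:
                        raise  # give up; the API layer would answer with a 503
                    attempt += 1
                    time.sleep(interval)
        return wrapper
    return decorator


# Usage: a hypothetical handler that deadlocks twice, then succeeds.
calls = {"n": 0}


@retry_on_deadlock(max_retries=3)
def create_instance(name):
    calls["n"] += 1
    if calls["n"] < 3:
        raise DBDeadlock("Deadlock found when trying to get lock")
    return {"name": name, "attempts": calls["n"]}


print(create_instance("vm1"))  # {'name': 'vm1', 'attempts': 3}
```

Since Galera reports a certification conflict to the client as a deadlock, a wrapper like this would retry those conflicts as well as genuine InnoDB deadlocks.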
The MySQL recommendation regarding deadlocks [1]: "Normally, you must write your applications so that they are always prepared to re-issue a transaction if it gets rolled back because of a deadlock."

Currently the services handle DBDeadlock in only a few places. We have logstash hits for other places too, even without Galera.

Instead of returning a 503 to the end user, the request could be retried "silently". The user would be able to repeat the request themselves anyway, so an automatic retry should not cause any unexpected new problems. The retry limit could be configurable, and the retry has to start from a point before anything is sent to the DB on behalf of the transaction or request, so that the whole unit of work is re-issued.

Treating every request handler as a potential deadlock source seems much easier than deciding case by case.

[1] http://dev.mysql.com/doc/refman/5.0/en/innodb-deadlocks.html

----- Original Message -----
> From: "Matthew Booth" <mbo...@redhat.com>
> To: openstack-dev@lists.openstack.org
> Sent: Thursday, February 5, 2015 10:36:55 AM
> Subject: Re: [openstack-dev] [all][oslo.db][nova] TL; DR Things everybody
> should know about Galera
>
> On 04/02/15 17:05, Sahid Orentino Ferdjaoui wrote:
> >> * Commit will fail if there is a replication conflict
> >>
> >> foo is a table with a single field, which is its primary key.
> >>
> >> A: start transaction;
> >> B: start transaction;
> >> A: insert into foo values(1);
> >> B: insert into foo values(1); <-- 'regular' DB would block here, and
> >>                                   report an error on A's commit
> >> A: commit; <-- success
> >> B: commit; <-- KABOOM
> >>
> >> Confusingly, Galera will report a 'deadlock' to node B, despite this not
> >> being a deadlock by any definition I'm familiar with.
> >
> > Yes! And if I can add more information, and I hope I am not making a
> > mistake, I think it's a known issue which comes from MySQL; that is why
> > we have a decorator to do a retry and so handle this case here:
> >
> > http://git.openstack.org/cgit/openstack/nova/tree/nova/db/sqlalchemy/api.py#n177
>
> Right, and that remains a significant source of confusion and
> obfuscation in the db api. Our db code is littered with races and
> potential actual deadlocks, but only some functions are decorated. Are
> they decorated because of real deadlocks, or because of Galera lock
> contention? The solutions to those 2 problems are very different! Also,
> hunting deadlocks is hard enough work. Adding the possibility that they
> might not even be there is just evil.
>
> Incidentally, we're currently looking to replace this stuff with some
> new code in oslo.db, which is why I'm looking at it.
>
> Matt
> --
> Matthew Booth
> Red Hat Engineering, Virtualisation Team
>
> Phone: +442070094448 (UK)
> GPG ID: D33C3490
> GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev