> On a side note, I'm rather ignorant on python frameworks for distributed
> coordination... concoord? Is ZooKeeper something that should be ruled out
> because of language restrictions?
I have not used ZooKeeper, so there might be reasons to use it where we have
to write Java code, but as long as ZooKeeper is added as a "package"
dependency (apt-get/yum install zookeeper), with a Python front-end* the
language should not matter.

* (like kazoo? I will need to investigate)

Regards,
Mandeep

On Mon, May 19, 2014 at 3:26 PM, Salvatore Orlando <sorla...@nicira.com> wrote:

> Some comments inline.
>
> Salvatore
>
> On 19 May 2014 20:32, sridhar basam <sridhar.ba...@gmail.com> wrote:
>
>> On Mon, May 19, 2014 at 1:30 PM, Jay Pipes <jaypi...@gmail.com> wrote:
>>
>>> Stackers,
>>>
>>> On Friday in Atlanta, I had the pleasure of moderating the database
>>> session at the Ops Meetup track. We had lots of good discussions and
>>> heard important feedback from operators on DB topics.
>>>
>>> For the record, I would not bring this point up so publicly unless I
>>> believed it was a serious problem affecting a large segment of users.
>>> When doing an informal survey of the users/operators at the start of
>>> the session, out of approximately 200 people in the room, only a single
>>> person was using PostgreSQL, about a dozen were using standard MySQL
>>> master/slave replication, and the rest were using MySQL Galera
>>> clustering. So, this is a real issue for a large segment of the
>>> operators -- or at least the ones at the session. :)
>>
>> We are one of those operators that use Galera to replicate our MySQL
>> databases. We used to see deadlock issues when we had multiple MySQL
>> writers in our cluster. As a workaround, we run our haproxy
>> configuration in active-standby for our MySQL VIP.
>>
>> I seem to recall that a lot of the deadlocks happened through Neutron.
>> When we go through our Icehouse testing, we will redo our multi-master
>> MySQL setup and provide feedback on the issues we see.
>
> The SELECT ... FOR UPDATE issue is going to be a non-trivial one for
> Neutron as well. Some components, like IPAM, rely on it heavily.
> However, Neutron is a lot more susceptible to deadlock problems than
> Nova because it does not currently implement a retry mechanism. This is
> something which should be added during the Juno release cycle regardless
> of all the other enhancements currently being planned, such as
> task-oriented operations.
>
>> thanks,
>> Sridhar
>>
>>> Peter Boros, from Percona, was able to provide some insight on MySQL
>>> Galera topics, and one issue came up that is likely the cause of a lot
>>> of heartache for operators who use MySQL Galera (or Percona XtraDB
>>> Cluster).
>>>
>>> We were discussing whether people had seen deadlock issues [1] when
>>> using MySQL Galera in their deployment, and were brainstorming on why
>>> deadlocks might be seen. I had suggested that perhaps Nova's use of
>>> autoincrementing primary keys may have been the cause. Peter pretty
>>> quickly dispatched that notion, saying that Galera automatically
>>> handles autoincrementing keys by managing the auto_increment_increment
>>> and auto_increment_offset config options.
>>>
>>> I think at that point I mentioned that there were a number of places
>>> that use the SELECT ... FOR UPDATE construct in Nova (in SQLAlchemy,
>>> it's the with_lockmode('update') modification of the query object).
>>> Peter promptly said that was a problem. MySQL Galera does not support
>>> SELECT ... FOR UPDATE, since it has no concept of cross-node locking of
>>> records, and results are non-deterministic.
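As a point of reference, here is a minimal sketch of what such a query looks
like in SQLAlchemy of this era. The Quota model, session setup and connection
string are invented for illustration and are not Nova or Neutron code; newer
SQLAlchemy releases spell the same thing with_for_update().

    # Minimal illustration of SELECT ... FOR UPDATE issued through
    # SQLAlchemy's with_lockmode('update'); the model and connection
    # string are placeholders, not Nova/Neutron code.
    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class Quota(Base):
        __tablename__ = 'quotas'
        id = Column(Integer, primary_key=True)
        project_id = Column(String(36))
        resource = Column(String(255))
        hard_limit = Column(Integer)

    engine = create_engine('mysql://user:secret@localhost/example')
    session = sessionmaker(bind=engine)()

    # with_lockmode('update') appends FOR UPDATE to the SELECT, taking an
    # InnoDB row lock for the rest of the transaction on the node that ran
    # it. Galera has no cluster-wide row locks, so a writer on another node
    # is not blocked; the conflict only surfaces at certification/commit.
    quota = (session.query(Quota)
             .filter_by(project_id='some-project', resource='instances')
             .with_lockmode('update')
             .one())
    quota.hard_limit += 10
    session.commit()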
>>> So... what to do?
>>>
>>> For starters, some information on the use of with_lockmode() in Nova
>>> and Neutron...
>>>
>>> Within Nova, there are actually only a few places where
>>> with_lockmode('update') is used. Unfortunately, those uses are in the
>>> quota code, which tends to wrap largish blocks of the Nova compute
>>> execution code.
>>>
>>> Within Neutron, however, the use of with_lockmode('update') is all over
>>> the place. There are 44 separate uses of it in 11 different files.
>
> I will report on this in a separate thread, so that we can have an
> assessment of where locking statements are used and why.
>
>>> We have a number of options:
>
> I think option 0 should be to rework/redesign the code, where possible,
> to avoid DB-level locking altogether.
>
>>> 1) Stop using MySQL Galera for databases of projects that contain
>>> with_lockmode('update')
>
> This looks hideous, but I am afraid it is what all people wishing to
> deploy Icehouse should consider doing.
>
>>> 2) Put a big old warning in the docs somewhere about the problem of
>>> potential deadlocks or odd behaviour with Galera in these projects
>>>
>>> 3) For Nova and Neutron, remove the use of with_lockmode('update') and
>>> instead use a coarse-grained file lock or a distributed lock manager
>>> for those areas where we need deterministic reads or quiescence.
>
> We had an attempt at implementing a sort of distributed lock for Neutron:
> https://review.openstack.org/#/c/34695/
> Beyond the implementation reservations on this patch, one thing that
> should be noted, probably needless to say, is that distributed
> coordination should never be taken lightly. Only once all the non-locking
> solutions have been ruled out should distributed coordination among
> processes be considered. In that case I think it might be better to use
> some off-the-shelf software rather than working out some home-grown
> solution (I surely do not see space for a new project here).
> On a side note, I'm rather ignorant on python frameworks for distributed
> coordination... concoord? Is ZooKeeper something that should be ruled out
> because of language restrictions?
>
>>> 4) For the Nova db quota driver, refactor the driver to either use a
>>> non-locking method for reservation and quota queries, or move the
>>> driver out into its own project (or use something like Climate and make
>>> sure that Climate uses a non-blocking algorithm for those queries...)
>>>
>>> Thoughts?
>>>
>>> -jay
>>>
>>> [1] http://lists.openstack.org/pipermail/openstack/2014-May/007202.html
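To make the kazoo idea a bit more concrete, here is a minimal sketch of taking
a cluster-wide lock from Python through kazoo, a ZooKeeper client library. The
host address, lock path and identifier are placeholders, and this only
illustrates the kind of off-the-shelf coordination primitive being discussed,
not how Nova or Neutron would actually use it.

    # Minimal sketch of a ZooKeeper-backed lock via kazoo; host, lock path
    # and identifier are placeholders, not anything from Nova/Neutron.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts='127.0.0.1:2181')
    zk.start()

    # kazoo's Lock recipe creates ephemeral sequential znodes under the
    # given path; the contender with the lowest sequence number holds the
    # lock, and it is released automatically if the holder's session dies.
    lock = zk.Lock('/locks/example-critical-section', identifier='worker-1')

    with lock:  # blocks until the lock is acquired cluster-wide
        # ... the critical section that would otherwise rely on
        # SELECT ... FOR UPDATE goes here ...
        pass

    zk.stop()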
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev