On 10/9/18 1:23 PM, Jay Pipes wrote:
On 10/09/2018 06:34 AM, Florian Engelmann wrote:
On 10/9/18 11:41 AM, Jay Pipes wrote:
On 10/09/2018 04:34 AM, Christian Berendt wrote:


On 8. Oct 2018, at 19:48, Jay Pipes <jaypi...@gmail.com> wrote:

Why not send all read and all write traffic to a single haproxy endpoint and just have haproxy spread all traffic across each Galera node?

Galera, after all, is multi-master synchronous replication... so it shouldn't matter which node in the Galera cluster you send traffic to.

Probably because of MySQL deadlocks in Galera:

—snip—
Galera cluster has known limitations, one of them is that it uses cluster-wide optimistic locking. This may cause some transactions to rollback. With an increasing number of writeable masters, the transaction rollback rate may increase, especially if there is write contention on the same dataset. It is of course possible to retry the transaction and perhaps it will COMMIT in the retries, but this will add to the transaction latency. However, some designs are deadlock prone, e.g sequence tables.
—snap—

Source: https://severalnines.com/resources/tutorials/mysql-load-balancing-haproxy-tutorial
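The retry-the-transaction remedy the tutorial mentions can be sketched like this; `DeadlockError` is a hypothetical stand-in for whatever your DB driver raises on a Galera certification failure / MySQL deadlock (error 1213), not a real library class:

```python
import random
import time

class DeadlockError(Exception):
    """Hypothetical stand-in for the driver's deadlock error (e.g. MySQL 1213)."""

def retry_on_deadlock(fn, retries=3, base_delay=0.05):
    """Re-run a transactional callable if Galera rolls it back.

    Galera's cluster-wide optimistic locking can abort a COMMIT under
    write contention; retrying the whole transaction usually succeeds,
    at the cost of added latency (exactly as the tutorial notes).
    """
    for attempt in range(retries):
        try:
            return fn()
        except DeadlockError:
            if attempt == retries - 1:
                raise
            # Jittered exponential backoff so retries don't collide again.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# Demo: a "transaction" that deadlocks once, then commits.
calls = {"n": 0}
def txn():
    calls["n"] += 1
    if calls["n"] == 1:
        raise DeadlockError()
    return "committed"

print(retry_on_deadlock(txn))  # committed
```

Note this only masks the rollback rate; under heavy write contention on the same rows (the multi-writer case being discussed) the retries themselves add latency.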

Have you seen the above in production?

Yes of course. Just depends on the application and how high the workload gets.

Please read about deadlocks and Nova in the following report by Intel:

http://galeracluster.com/wp-content/uploads/2017/06/performance_analysis_and_tuning_in_china_mobiles_openstack_production_cloud_2.pdf


I have read the above. It's a synthetic workload analysis, which is why I asked if you'd seen this in production.

For the record, we addressed much of the contention/races mentioned in the above around scheduler resource consumption in the Ocata and Pike releases of Nova.

I'm aware that the report above identifies the quota handling code in Nova as the primary culprit of the deadlock issues but again, it's a synthetic workload that is designed to find breaking points. It doesn't represent a realistic production workload.

You can read about the deadlock issue in depth on my blog here:

http://www.joinfu.com/2015/01/understanding-reservations-concurrency-locking-in-nova/

That explains where the source of the problem comes from (it's the use of SELECT FOR UPDATE, which has been removed from Nova's quota-handling code in the Rocky release).
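The replacement pattern is a compare-and-swap style UPDATE: instead of holding a row lock across a read-modify-write, the UPDATE is made conditional on the value read still being current, and the caller retries if it lost the race. A minimal sketch with sqlite3 (the table and column names are hypothetical, not Nova's actual schema):

```python
import sqlite3

# Hypothetical quota table to illustrate the pattern.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quota_usages (resource TEXT PRIMARY KEY, in_use INTEGER)")
conn.execute("INSERT INTO quota_usages VALUES ('instances', 3)")
conn.commit()

def consume(conn, resource, delta, limit, retries=3):
    """Bump usage without SELECT ... FOR UPDATE: no lock is held across the read."""
    for _ in range(retries):
        (in_use,) = conn.execute(
            "SELECT in_use FROM quota_usages WHERE resource = ?", (resource,)
        ).fetchone()
        if in_use + delta > limit:
            raise RuntimeError("quota exceeded")
        cur = conn.execute(
            "UPDATE quota_usages SET in_use = in_use + ? "
            "WHERE resource = ? AND in_use = ?",  # matches 0 rows if someone raced us
            (delta, resource, in_use),
        )
        if cur.rowcount == 1:   # our snapshot was still current: we won
            conn.commit()
            return in_use + delta
        conn.rollback()         # lost the race: re-read and retry
    raise RuntimeError("too much contention")

print(consume(conn, "instances", 1, limit=10))  # 4
```

The point for Galera is that no pessimistic row lock is taken, so there is nothing for the cluster's optimistic certification to deadlock against; a lost race is an ordinary retry, not a rollback of a lock-holding transaction.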

Thank you very much for the link. Great article! It took me a while to read it and understand everything, but it helped a lot!


If only Nova is affected, could we also create an additional HAProxy listener that uses all Galera nodes with round-robin for all the other services?

I fail to see the point of using Galera with a single writer. At that point, why bother with Galera at all? Just use a single database node with a single slave for backup purposes.

From my point of view, Galera is easy to manage and great for HA. Handling a manual failover in production with MySQL master/slave was never fun... That said, writing to a single node and leaving the other nodes unused (not even for reads, as kolla-ansible does) is not the best solution, and Galera is slower than a standalone MySQL. Using ProxySQL would let us add caching and read/write splitting to speed up database queries while keeping the HA and manageability benefits.
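For reference, the single-writer-plus-round-robin-reads split being discussed could look roughly like this in haproxy.cfg; the hostnames, IPs, and check user are hypothetical, and the `backup` keyword is what pins writes to one node until it fails:

```
# Writes: one active node; the others are hot standbys (failover only).
listen galera-write
    bind 10.0.0.10:3306
    mode tcp
    option mysql-check user haproxy_check
    server galera1 10.0.0.11:3306 check
    server galera2 10.0.0.12:3306 check backup
    server galera3 10.0.0.13:3306 check backup

# Reads: round-robin across every Galera node.
listen galera-read
    bind 10.0.0.10:3307
    mode tcp
    balance roundrobin
    option mysql-check user haproxy_check
    server galera1 10.0.0.11:3306 check
    server galera2 10.0.0.12:3306 check
    server galera3 10.0.0.13:3306 check
```

The catch is that applications must be pointed at the right port themselves; ProxySQL's appeal is doing that split per-query instead.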



Anyway, ProxySQL would be a great extension.

I don't disagree that ProxySQL is a good extension. However, it adds yet another service to the mesh that needs to be deployed, configured, and maintained.

True. I guess we will start with an external MySQL installation to collect some experience.


__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
