Hi Roman.

That's interesting, although it's hard to believe (there is no slave lag in Galera
multi-master). I can only suggest that we create another Jepsen test to verify
exactly the scenario you describe, as well as other OpenStack-specific patterns.






Regards,

Bogdan.





From: Roman Podoliaka
Sent: Friday, April 29, 2016 21:04
To: OpenStack Development Mailing List (not for usage questions)
Cc: [email protected]





Hi Bogdan,

Thank you for sharing this! I'll need to familiarize myself with this
Jepsen thing, but overall it looks interesting.

As it turns out, we already run Galera in multi-writer mode in Fuel
unintentionally. When the active MySQL node goes down, HAProxy starts
opening connections to a backup. When the active node comes back up,
HAProxy opens new connections to the original MySQL node again, but
OpenStack services may still hold open connections to the backup in
their connection pools. So now you may have connections to multiple
MySQL nodes at the same time, which is exactly what you wanted to
avoid by using active/backup in the HAProxy configuration.
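
The failover/failback sequence above can be sketched as a toy model. This is
only an illustration, not HAProxy or oslo.db code: the Proxy and Pool classes
and the node names mysql-1 and mysql-2 are hypothetical, and a "connection" is
represented by nothing more than the name of the node it points to.

```python
from dataclasses import dataclass, field


@dataclass
class Proxy:
    """Toy active/backup proxy: new connections go to the first healthy node."""
    nodes: list                                  # ordered: active first, then backups
    down: set = field(default_factory=set)

    def connect(self) -> str:
        for node in self.nodes:
            if node not in self.down:
                return node                      # the "connection" is just the node name
        raise RuntimeError("no healthy backend")


@dataclass
class Pool:
    """Toy client-side pool that keeps connections open until they break."""
    proxy: Proxy
    conns: list = field(default_factory=list)

    def open(self) -> str:
        conn = self.proxy.connect()
        self.conns.append(conn)
        return conn

    def drop_connections_to(self, node: str) -> None:
        # Connections to a failed node break; all others stay pooled.
        self.conns = [c for c in self.conns if c != node]


def failover_and_failback() -> set:
    proxy = Proxy(nodes=["mysql-1", "mysql-2"])  # hypothetical node names
    pool = Pool(proxy)
    pool.open()                                  # routed to mysql-1 (active)
    proxy.down.add("mysql-1")                    # the active node goes down
    pool.drop_connections_to("mysql-1")
    pool.open()                                  # routed to mysql-2 (backup)
    proxy.down.discard("mysql-1")                # the active node comes back
    pool.open()                                  # routed to mysql-1 again
    return set(pool.conns)                       # nodes the pool is connected to
```

After the sequence, the pool holds connections to both nodes, because nothing
forces the pooled connection to the backup to be closed once the active node
returns.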

^ this actually leads to an interesting issue [1]: the DB state
committed on one node is not immediately available on another one.
Replication lag can be controlled via session variables [2], but that
does not always help. E.g. in [1] Nova first goes to Neutron to create
a new floating IP, gets a 201 (and Neutron actually *commits* the DB
transaction), and then makes another REST API request to get a list of
floating IPs by address. The latter can be served by another
neutron-server, connected to another Galera node that does not have
the latest state applied yet due to 'slave lag', so the list
may come back empty. Unfortunately, 'wsrep_sync_wait' can't help
here, as these are two different REST API requests, potentially served by
two different neutron-server instances.
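
The race between the two requests can be illustrated with a small toy model.
It is a sketch under simplifying assumptions, not Galera's replication
protocol: the Cluster and Node classes are hypothetical, a writeset is reduced
to a single row value, and "applying" simply means copying rows from a shared
log.

```python
class Cluster:
    """Toy model: one shared, ordered log of committed writesets."""

    def __init__(self):
        self.log = []


class Node:
    """Toy node that applies the shared log lazily."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.applied = 0              # number of log entries applied locally
        self.rows = set()

    def catch_up(self):
        # Apply every committed writeset this node has not applied yet.
        for row in self.cluster.log[self.applied:]:
            self.rows.add(row)
        self.applied = len(self.cluster.log)

    def commit(self, row):
        # The committing node sees its own write immediately.
        self.cluster.log.append(row)
        self.catch_up()

    def read(self, sync_wait=False):
        if sync_wait:
            self.catch_up()           # stand-in for waiting on pending writesets
        return set(self.rows)


cluster = Cluster()
node_a, node_b = Node(cluster), Node(cluster)

node_a.commit("10.0.0.5")                    # first request: create the floating IP
stale = node_b.read()                        # second request lands on the other node
fresh = node_b.read(sync_wait=True)          # same read, after waiting for the apply
```

Here `stale` is empty while `fresh` contains the new row: the write is
committed cluster-wide, but the second node only shows it once it has applied
the pending writeset.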

Basically, you'd need to *always* wait for the latest state to be
applied before executing any queries, which Galera is trying to avoid
for performance reasons.
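
As a minimal sketch of what "waiting before a query" looks like on a single
session, the helper below turns on wsrep_sync_wait for one read and then turns
it off again. The function name, the RecordingCursor stand-in and the example
query are hypothetical; the sketch assumes a DB-API style cursor and that the
session otherwise runs with the default wsrep_sync_wait = 0.

```python
def read_with_sync_wait(cursor, query, params=()):
    """Run one read with the causality check enabled for this session only."""
    cursor.execute("SET SESSION wsrep_sync_wait = 1")      # 1 = check before reads
    try:
        cursor.execute(query, params)
        return cursor.fetchall()
    finally:
        # Assumes the session default is 0 (no causality checks).
        cursor.execute("SET SESSION wsrep_sync_wait = 0")


class RecordingCursor:
    """Stand-in for a DB-API cursor so the sketch runs without a database."""

    def __init__(self):
        self.statements = []

    def execute(self, sql, params=()):
        self.statements.append(sql)

    def fetchall(self):
        return []


cursor = RecordingCursor()
# Table and column names are made up for the example.
rows = read_with_sync_wait(
    cursor, "SELECT id FROM floatingips WHERE address = %s", ("10.0.0.5",)
)
```

Every read wrapped this way pays for the wait, which is the performance cost
mentioned above; applying it to all queries means paying that cost everywhere.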

Thanks,
Roman

[1] https://bugs.launchpad.net/fuel/+bug/1529937
[2] http://galeracluster.com/2015/06/achieving-read-after-write-semantics-with-galera/

On Fri, Apr 22, 2016 at 10:42 AM, Bogdan Dobrelya
<[email protected]> wrote:
> [crossposting to [email protected]]
>
> Hello.
> I wrote this paper [0] to demonstrate how we can leverage the
> Jepsen framework in the QA/CI/CD pipeline for OpenStack projects like Oslo
> (DB) or Trove, Tooz DLM, and perhaps any integration projects which
> rely on distributed systems. Although not all tests are finished yet,
> the results are already visible, so I'd better share early for review,
> discussion and comments.
>
> I have done similar tests for the RabbitMQ OCF RA clusterers as well,
> although I have yet to write a report.
>
> PS. I'm sorry for so many tags I placed in the topic header; should I have
> used just "all" :) ? Have a nice weekend and take care!
>
> [0] https://goo.gl/VHyIIE
>
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
>
>
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: [email protected]?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev