Sure, Daniel

PR #7895 is currently in draft, as we still need to run more tests. However, the intention is to let users configure the DB connection URI directly through the `db.properties` file. These are the tests we have done so far with ACS without the changes from this PR:

Using the current version in a setup with MariaDB and Galera, with a cluster size of 3 and the following configuration in the db.properties file:
```
# High Availability And Cluster Properties
db.ha.enabled=true
db.ha.loadBalanceStrategy=com.cloud.utils.db.StaticStrategy
# cloud stack Database
db.cloud.replicas=192.168.201.161,192.168.201.162
db.cloud.autoReconnect=false
db.cloud.failOverReadOnly=false
db.cloud.reconnectAtTxEnd=false
db.cloud.autoReconnectForPools=true
db.cloud.secondsBeforeRetrySource=1800
db.cloud.queriesBeforeRetrySource=5000
db.cloud.initialTimeout=3600
```
When the MariaDB service stops on the main node, ACS switches to one of the other two nodes. However, if the host itself is shut down, the switch never occurs.

Then, we also ran tests using the changes proposed in the PR, by configuring `db.cloud.uri`:

```
db.cloud.uri=jdbc:mariadb:sequential://192.168.201.160:3306,192.168.201.161:3306,192.168.201.162:3306/cloud?autoReconnect=true&prepStmtCacheSize=517&cachePrepStmts=true&sessionVariables=sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION'&serverTimezone=UTC

# These properties are ignored when setting the URI manually, so no need to set them.

# High Availability And Cluster Properties
# db.ha.enabled=true
# db.ha.loadBalanceStrategy=com.cloud.utils.db.StaticStrategy
# cloud stack Database
# db.cloud.replicas=192.168.201.161,192.168.201.162
# db.cloud.autoReconnect=false
# db.cloud.failOverReadOnly=false
# db.cloud.reconnectAtTxEnd=false
# db.cloud.autoReconnectForPools=true
# db.cloud.secondsBeforeRetrySource=1800
# db.cloud.queriesBeforeRetrySource=5000
# db.cloud.initialTimeout=3600
```

I was able to configure and use the sequential failover mode. With this setup, ACS is able to switch to the other DBs both when the MariaDB service stops on the main node and when the host itself is shut down.
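
For reference, the `db.cloud.uri` value used in this test has a regular shape: the `jdbc:mariadb:sequential://` prefix, a comma-separated list of `host:port` pairs, the database name, and the query parameters. A minimal sketch of how such a value could be assembled is below. `build_sequential_uri` is a hypothetical helper written only for illustration; it is not part of ACS or of the PR, and it does not escape parameter values.

```python
from typing import Mapping, Sequence


def build_sequential_uri(
    hosts: Sequence[str],
    database: str,
    params: Mapping[str, str],
    port: int = 3306,
) -> str:
    """Compose a jdbc:mariadb:sequential URI for db.cloud.uri.

    Hypothetical helper for illustration only; values are joined as-is,
    without any URL escaping.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    # Hosts keep the order given here, which is the order used for failover.
    host_list = ",".join(f"{host}:{port}" for host in hosts)
    uri = f"jdbc:mariadb:sequential://{host_list}/{database}"
    query = "&".join(f"{key}={value}" for key, value in params.items())
    return f"{uri}?{query}" if query else uri


uri = build_sequential_uri(
    ["192.168.201.160", "192.168.201.161", "192.168.201.162"],
    "cloud",
    {"autoReconnect": "true", "serverTimezone": "UTC"},
)
print(f"db.cloud.uri={uri}")
```

The order of the host list matters, since it is the order in which the sequential mode tries the nodes.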

There are two differences between defining the URI manually (as proposed in PR #7895) and the one generated by ACS. The first is the `jdbc:mariadb` prefix, which selects the MariaDB driver for the connection with the DBMS and enables the use of MariaDB URL configurations; this driver is being introduced into ACS with PR #7895. The second is the use of the `sequential` [1] failover mode, which tries to connect to the hosts in the order in which they were declared in the connection URL. The first available host is used for all queries, and if that host is shut down, the driver tries to reconnect to the next one on the list. As this mode only connects to a single DB at a time, the problems referenced by Rohit are avoided, while the failover mechanism is still in place.
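
The host selection described above can be sketched roughly as follows. This only illustrates the behavior as described in this message; it is not the actual implementation in the MariaDB driver. `pick_host` and the `is_reachable` callback are hypothetical names introduced for the example.

```python
from typing import Callable, Optional, Sequence


def pick_host(
    hosts: Sequence[str],
    is_reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return the first reachable host in declaration order, or None.

    Illustrative sketch of sequential failover: only one host is used at
    a time, and later hosts are tried only when earlier ones are down.
    """
    for host in hosts:
        if is_reachable(host):
            return host
    return None


hosts = [
    "192.168.201.160:3306",
    "192.168.201.161:3306",
    "192.168.201.162:3306",
]
down = set()  # hosts that are simulated as shut down


def reachable(host: str) -> bool:
    return host not in down


primary = pick_host(hosts, reachable)         # first host in the list
down.add(hosts[0])                            # simulate the main node going down
after_failover = pick_host(hosts, reachable)  # next host in the list
print(primary, after_failover)
```

Because a single host serves all queries at any given time, this matches the "one active, others backup" layout rather than spreading the load across the cluster.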

Best regards,
João Jandre

[1] - https://mariadb.com/kb/en/about-mariadb-connector-j/

On 22/08/2023 16:03, Daniel Salvador wrote:
Hello Lucian and all,

I am -1 on removing the whole DB HA feature from CloudStack.

As we discussed in July [1], the current properties we have on
"db.properties" regarding DB HA are hardcoded and only address some MySQL
properties, which are not fully compatible with the properties for
configuring DB HA on MariaDB. It indeed has some problems; however, I think
we should keep the functionality and improve it, to enrich CloudStack and
avoid using other layers to accomplish the goals. It is good to have a
workaround, though.

João Jandre and I are already working on a solution to make the DB
parameters more flexible, in order to allow one to configure DB HA properly when using
MariaDB (and also do several other configurations). João, could you point
to the PR that addresses the changes and share the configurations and tests
we have done so far?

Best regards,
Daniel Salvador (gutoveronezi)

[1] - https://lists.apache.org/thread/j0mmwy9dfr9k2kbnnjxcr2m7y8zwd34c

On Tue, Aug 22, 2023 at 12:42 PM Nux <n...@li.nux.ro> wrote:

New adopters may not go ahead with it in production because they won't
get it working, unless they fix a lot of code; that would be a nice pull
request. :)


On 2023-08-22 16:25, K B Shiv Kumar wrote:
Well, if it is broken and it is not prominently mentioned anywhere, new
adopters may go ahead with that on production. So I guess it is best to
remove it or at least mention that it is not production grade.

Thanks
Shiv

On 22-Aug-2023, at 20:12, Nux <n...@li.nux.ro> wrote:

But what do you think of the removal of DB HA code?

When using Galera you need to query against a single node, don't
spread the load among all 3, as this will break certain locking
functionality in Cloudstack and lead to problems.

In a Haproxy configuration you should be keeping just one active, eg:
        server galera1 10.0.3.2:3306 check
        server galera2 10.0.3.3:3306 check backup
        server galera3 10.0.3.4:3306 check backup

Regards

On 2023-08-22 15:36, K B Shiv Kumar wrote:
We faced some issues when running Galera. We went back to master slave.
Anyone using Galera in production for a long time?
Regards,
Shiv
On 22-Aug-2023, at 19:34, Nux <n...@li.nux.ro> wrote:
Happy to contribute a doc on how to achieve HA if we decide to
remove this.
Thanks
On 2023-08-22 15:01, Rohit Yadav wrote:
+1 it's a broken feature that at least doesn't work with MySQL 8.x,
I'm not sure if it worked with prior versions of MySQL. However, we
need to document some sort of suggested MySQL HA setup in our docs.
Regards.
________________________________
From: Nux <n...@li.nux.ro>
Sent: Tuesday, August 22, 2023 18:54
To: us...@cloudstack.apache.org <us...@cloudstack.apache.org>; Dev
<dev@cloudstack.apache.org>
Subject: [Consultation] Remove DB HA feature (db.ha.enabled)
Hello everyone,
A few weeks ago I asked you if you use or managed to use the DB HA
Cloudstack feature (db.ha.enabled)[1], and after reading some of the
replies and doing intensive testing myself I have found out that the
feature is indeed non-functional; it's broken.
In my testing I discovered DB HA can easily be done outside of
Cloudstack by employing load balancers and other techniques.
Personally I have achieved that by using Haproxy in front of a Galera
cluster, but also introduced Keepalived (vrrp) in my setup to "balance"
multiple Haproxies, which also worked well.
Since the feature is basically broken, it will not be trivial to fix,
and there are better ways of doing HA, I propose to remove it
altogether.
Thoughts? Anyone against it?
Cheers
[1] - https://docs.cloudstack.apache.org/en/latest/adminguide/reliability.html#database-high-availability
