Hello,
I'm trying to troubleshoot a replication performance issue between two
versions of MariaDB. I know that we are using two EOL versions, but I want to
rule out all issues before prioritizing a migration to a newer version.
I have built a new dedicated replication server (B) to replace the current
dedicated replica (A). A and B run the same OS and version on near-identical
hardware in a RAID 10 configuration. The main difference is that B has newer
disks that slightly outperform A; I benchmarked both A and B to validate this.
Both A and B are configured identically (other than server-id).
Replica A (and primary) is running MariaDB 10.3.31. Replica B is running
10.3.39.
A few other things to note:
* B is not replicating from the primary but from A -- at least until we sunset A.
* A is handling moderate read-only traffic.
* B is just running as a replica, with no other responsibilities.
I have external monitoring to alert when either hits 500+
Seconds_Behind_Master. Since standing B up, I am seeing it fall behind a few
times a day. The interesting thing is that A is not falling behind the
primary.
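For context, the external check is roughly equivalent to polling each replica with

SHOW SLAVE STATUS\G

and alerting when Seconds_Behind_Master crosses the 500-second threshold.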
Upon further examination, a scheduled job that executes hundreds of similar
delete operations against the same table (which has approx. 3 million records)
is where B largely gets hung up. Something like this:
DELETE FROM table WHERE x = 800000 AND y = 40000
The table has an index on just x.
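The table and column names are placeholders; based on the 4-byte key parts in
the plans below, a simplified sketch of the relevant structure would be
something like:

CREATE TABLE `table` (
  x INT NOT NULL,
  y INT NOT NULL,
  -- a number of other columns omitted here
  KEY table_index (x)
) ENGINE=InnoDB;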
Running ANALYZE FORMAT=JSON SELECT * FROM table WHERE x = 800000 AND y = 40000
on both replicas shows a striking difference in total time spent:
REPLICA B:
{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 490.13,
    "table": {
      "table_name": "table",
      "access_type": "ref",
      "possible_keys": ["table_index"],
      "key": "table_index",
      "key_length": "4",
      "used_key_parts": ["x"],
      "ref": ["const"],
      "r_loops": 1,
      "rows": 795555,
      "r_rows": 31,
      "r_total_time_ms": 490.08,
      "filtered": 100,
      "r_filtered": 100,
      "index_condition": "table.y = '40000'"
    }
  }
}
vs:
REPLICA A:
{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 106.37,
    "table": {
      "table_name": "table",
      "access_type": "ref",
      "possible_keys": ["table_index"],
      "key": "table_index",
      "key_length": "4",
      "used_key_parts": ["x"],
      "ref": ["const"],
      "r_loops": 1,
      "rows": 1146482,
      "r_rows": 31,
      "r_total_time_ms": 106.34,
      "filtered": 100,
      "r_filtered": 100,
      "index_condition": "table.y = '40000'"
    }
  }
}
That is a significant difference performance-wise. From a hardware point of
view, B should have a slight edge over A, but the contrary is true by a large
margin.
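One detail that stands out in the plans above is that the optimizer's
estimated row count for table_index differs between the two servers (795,555
on B vs. 1,146,482 on A), which makes me suspect the InnoDB index statistics
have drifted apart. I plan to compare them with something along these lines
(assuming persistent statistics, the default):

SELECT database_name, table_name, index_name, stat_name, stat_value
FROM mysql.innodb_index_stats
WHERE table_name = 'table'
ORDER BY index_name, stat_name;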
As a short-term "fix", I added a secondary index:

CREATE INDEX table_index2 ON table (x, y);

This significantly speeds up the operation and, for now, seems to deal with
much of the replication lag experienced on B:
REPLICA B (with table_index2):
{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 0.253,
    "table": {
      "table_name": "table",
      "access_type": "ref",
      "possible_keys": [
        "table_index",
        "table_index2"
      ],
      "key": "table_index2",
      "key_length": "8",
      "used_key_parts": ["x", "y"],
      "ref": ["const", "const"],
      "r_loops": 1,
      "rows": 31,
      "r_rows": 31,
      "r_total_time_ms": 0.2037,
      "filtered": 100,
      "r_filtered": 100
    }
  }
}
This seems reasonable as a general optimization, but provides little comfort in
understanding the root cause of my performance issue with B. I'm hesitant to
retire A in favor of B until I can get to the bottom of this.
The question I have for the community is: what would you recommend I do to get
further perspective on this issue and determine a course of action?
Thanks everyone,
Mike