On Sun, 14 Jan 2024, 04:39 Michael Caplan via discuss, <
[email protected]> wrote:
> Hello,
>
> I'm trying to troubleshoot a replication performance issue between two
> versions of MariaDB. I know that we are using two EOL versions, but I want
> to rule out all other issues before prioritizing a migration to a new
> version.
>
> I have built a new dedicated replication server (B) to replace the current
> dedicated replica (A). A and B are both running identical OSes and OS
> versions, and near-identical hardware in a RAID 10 configuration. The main
> difference is that B has newer disks that slightly outperform A's; I
> benchmarked both A and B to validate this. Both A and B are configured
> identically (other than server-id).
>
> Replica A (and primary) is running MariaDB 10.3.31. Replica B is
> running 10.3.39.
>
> A few other things to note:
>
> - B is not feeding off of the primary, but off A -- that is, until we
> sunset A.
> - A is handling moderate read-only traffic.
> - B is just running as a replica with no other responsibilities.
>
>
> I have external monitoring to alert when either hits 500+
> Seconds_Behind_Master. Since standing B up, I am seeing it fall behind a
> few times a day. The interesting thing is that A is not falling behind
> the primary.
>
> Upon further examination, a scheduled job that executes hundreds of
> similar delete operations against the same table (which has approx. 3
> million records) is where B is largely getting hung up. Something like
> this:
>
> DELETE FROM table WHERE x = 800000 AND y = 40000
>
>
> The table has an index on just x.
>
> Running ANALYZE FORMAT=JSON SELECT * FROM table WHERE x = 800000 AND y =
> 40000 on each replica shows a striking difference in total time spent:
>
> REPLICA B:
> {
> "query_block": {
> "select_id": 1,
> "r_loops": 1,
> "r_total_time_ms": 490.13,
> "table": {
> "table_name": "table",
> "access_type": "ref",
> "possible_keys": ["table_index"],
> "key": "table_index",
> "key_length": "4",
> "used_key_parts": ["x"],
> "ref": ["const"],
> "r_loops": 1,
> "rows": 795555,
> "r_rows": 31,
> "r_total_time_ms": 490.08,
> "filtered": 100,
> "r_filtered": 100,
> "index_condition": "table.y = '40000'"
> }
> }
> }
>
> vs:
>
> REPLICA A:
> {
> "query_block": {
> "select_id": 1,
> "r_loops": 1,
> "r_total_time_ms": 106.37,
> "table": {
> "table_name": "table",
> "access_type": "ref",
> "possible_keys": ["table_index"],
> "key": "table_index",
> "key_length": "4",
> "used_key_parts": ["x"],
> "ref": ["const"],
> "r_loops": 1,
> "rows": 1146482,
> "r_rows": 31,
> "r_total_time_ms": 106.34,
> "filtered": 100,
> "r_filtered": 100,
> "index_condition": "table.y = '40000'"
> }
> }
> }
>
>
> That is a *significant* difference performance-wise. From a hardware
> point of view, B should have a slight edge over A, but the contrary is
> true by a large margin.
>
>
> As a short-term "fix", I added a secondary index: create index
> table_index2 on table(x, y); This significantly speeds up the operation
> and, for now, seems to deal with much of the replication lag experienced
> on B:
>
> SERVER B:
> {
> "query_block": {
> "select_id": 1,
> "r_loops": 1,
> "r_total_time_ms": 0.253,
> "table": {
> "table_name": "table",
> "access_type": "ref",
> "possible_keys": [
> "table_index",
> "table_index2"
> ],
> "key": "table_index2,
> "key_length": "8",
> "used_key_parts": ["x", "y"],
> "ref": ["const", "const"],
> "r_loops": 1,
> "rows": 31,
> "r_rows": 31,
> "r_total_time_ms": 0.2037,
> "filtered": 100,
> "r_filtered": 100
> }
> }
> }
>
>
> This seems reasonable as a *general optimization*, but provides little
> comfort in understanding the root cause of my performance issue with B.
> I'm hesitant to retire A in favor of B until I can get to the bottom of
> this.
>
> The question I have for the community is: *what would you recommend that
> I do to get further perspective on the issue to determine a course of
> action?*
>
Given the explain uses the same index on both, a cold buffer pool could be
one possible explanation.
Did you make sure the configuration is the same on both servers,
particularly the innodb_buffer_pool% and innodb_log% settings?
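A quick way to compare, assuming you can run SQL on both servers (run the
same statements on A and B and diff the output):

SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool%';
SHOW GLOBAL VARIABLES LIKE 'innodb_log%';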
Otherwise:
- Use binlog_format=ROW on all servers.
- Make sure all your tables have primary keys defined (a query for finding
offenders is sketched below).
- It sounds like you should have that secondary index anyway, on all the
servers.
- If a lot of rows match, use pt-archiver instead: it deletes matching rows
in small batches by primary key, which minimizes lag (example invocation
below).
- Increase slave_parallel_threads, but not too high (see below).
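To find tables without a primary key, a query along these lines should
work; it only uses standard information_schema tables:

SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
  ON c.table_schema = t.table_schema
  AND c.table_name = t.table_name
  AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL
  AND t.table_schema NOT IN ('mysql', 'information_schema',
                             'performance_schema', 'sys');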
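For pt-archiver, a sketch of what the invocation could look like; the host
and database names are placeholders, and the batch size is just an example:

pt-archiver --source h=primary_host,D=mydb,t=table \
  --where "x = 800000 AND y = 40000" \
  --purge --limit 1000 --commit-each

--purge deletes rows instead of copying them elsewhere, --limit 1000 works
in batches of 1000 rows, and --commit-each commits after each batch so the
replicas are never handed one huge transaction.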
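On MariaDB 10.3, slave_parallel_threads is dynamic, but the slave SQL
thread must be stopped to change it; for example (4 is just a placeholder
value, tune to your workload):

STOP SLAVE;
SET GLOBAL slave_parallel_threads = 4;
START SLAVE;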