On Sun, 14 Jan 2024, 04:39 Michael Caplan via discuss, <
[email protected]> wrote:

> Hello,
>
> I'm trying to troubleshoot a performance a replication issue between two
> version of MariaDB.  I know that we are using two EOL versions, but I want
> to rule out all issues before prioritizing migrating to a new version.
>
> I have built a new dedicated replication server (B) to replace the current
> dedicated replica (A).  A and B are both running identical OSes and
> versions and near identical hardware in a RAID 10 configuration.  The main
> difference is that B has newer disks that slightly outperform A.  I
> benchmarked both A and B to validate this.  Both A and B are configured
> identically (other then server-id).
>
> Replica A (and primary)  is running MariaDB 10.3.31.  Replica B is
> running  10.3.39.
>
> A few other things to note:
>
>    - B is not feeding off of the primary, but A -- that is until we
>    sunset A.
>    - A is handing moderate read only traffic
>    - B is just running as a replica with no other responsibilities
>
>
> I have external monitoring to alert when either hits 500+
> Seconds_Behind_Master.  Since standing B up, I am seeing it fall behind a
> few times a day.   The interesting thing is that A is not falling behind
> from the primary.
>
> Upon further examination, a scheduled job that executes hundreds of
> similar delete operations off the same table (that has approx. 3 million
> records) is where B is largely getting hung up.  Something like this:
>
> DELETE FROM table WHERE x = 800000 AND y = 40000
>
>
> The table has an index on just x.
>
> Running *ANALYZE FORMAT=JSON SELECT * FROM table WHERE x = 800000 AND y =
> 40000*  from B show a striking difference in total time spent:
>
> REPLICA B:
> {
>   "query_block": {
>     "select_id": 1,
>     "r_loops": 1,
> *    "r_total_time_ms": 490.13,*
>     "table": {
>       "table_name": "table",
>       "access_type": "ref",
>       "possible_keys": ["table_index"],
>       "key": "table_index",
>       "key_length": "4",
>       "used_key_parts": ["x"],
>       "ref": ["const"],
>       "r_loops": 1,
>       "rows": 795555,
>       "r_rows": 31,
>       "r_total_time_ms": 490.08,
>       "filtered": 100,
>       "r_filtered": 100,
>       "index_condition": "table.y = '40000'"
>     }
>   }
> }
>
> vs:
>
> REPLICA A:
> {
>   "query_block": {
>     "select_id": 1,
>     "r_loops": 1,
> *    "r_total_time_ms": 106.37,*
>     "table": {
>       "table_name": "table",
>       "access_type": "ref",
>       "possible_keys": ["table_index"],
>       "key": "table_index",
>       "key_length": "4",
>       "used_key_parts": ["x"],
>       "ref": ["const"],
>       "r_loops": 1,
>       "rows": 1146482,
>       "r_rows": 31,
>       "r_total_time_ms": 106.34,
>       "filtered": 100,
>       "r_filtered": 100,
>       "index_condition": "table.y = '40000'"
>     }
>   }
> }
>
>
> That is a *significant* difference performance wise.  From a hardware
> point of view, B should have a slight edge over A.  But the contrary is
> true by a large margin.
>
>
> As a short term "fix", I added a secondary index: *create index
> table_index2 on table(x, y);   *This significantly speeds up this
> operation, and, for now, seems to deal with much of the experienced
> replication lab in B:
>
> SERVER B:
> {
>   "query_block": {
>     "select_id": 1,
>     "r_loops": 1,
> *    "r_total_time_ms": 0.253,*
>     "table": {
>       "table_name": "table",
>       "access_type": "ref",
>       "possible_keys": [
>         "table_index",
>         "table_index2"
>       ],
>       "key": "table_index2,
>       "key_length": "8",
>       "used_key_parts": ["x", "y"],
>       "ref": ["const", "const"],
>       "r_loops": 1,
>       "rows": 31,
>       "r_rows": 31,
>       "r_total_time_ms": 0.2037,
>       "filtered": 100,
>       "r_filtered": 100
>     }
>   }
> }
>
>
> This seems reasonable as a *general optimization*, but provides little
> comfort in understanding the root cause of my performance issue with B.
> I'm hesitant to retire A in favor of B until I can get to the bottom of
> this.
>
> The question I have for the community is: *what would you recommend that
> I do to get further perspective on the issue to determine a course of
> action?*
>

Give the explain uses the same index on both, cold buffer pool could be one
possible explanation.

Did you make sure configuration is the same on both servers, particularly
innodb_buffer_pool% and innodb_log% settings?

Otherwise:

- Use binlog_format=ROW on all servers

- Make sure all your tables have primary keys defined

- It sounds like you should have that secondary index anyway, on all the
servers

- if a lot of rows match, use pt-archiver instead, which will delete
matching rows in small batches by primary keys, which will minimize lag.

- increase slave_parallel_threads (but not too high)

>
_______________________________________________
discuss mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Reply via email to