Hi,
We're observing the following crash using galera and mariadb versions as
per subject. Usually only one of the three nodes will go down, but last
night two went down within minutes of each other resulting in a fairly
nasty outage. Given the frequency of these crashes (about once a
month) I strongly suspect we're doing something wrong that causes us,
but not others, to run into this. Any advice is appreciated.
Stack trace as per logs:
241007 1:01:25 [ERROR] mysqld got signal 11 ;
Sorry, we probably made a mistake, and this is a bug.
Your assistance in bug reporting will enable us to fix this for the next
release.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.6.17-MariaDB-log source revision:
15c75ad083a55e198ae78324f22970694b72f22b
key_buffer_size=536870912
read_buffer_size=1048576
max_used_connections=522
max_threads=10002
thread_count=83
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads =
10498887186 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7f675c000c68
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f7f78056da8 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x32)[0x5632cb5f2212]
/usr/sbin/mysqld(handle_fatal_signal+0x2b3)[0x5632cb133db3]
/lib64/libc.so.6(+0x3c760)[0x7f7f7b653760]
/usr/sbin/mysqld(+0x8ee692)[0x5632cb44b692]
/usr/sbin/mysqld(+0x8ef5cd)[0x5632cb44c5cd]
/usr/sbin/mysqld(+0x8acfa5)[0x5632cb409fa5]
/usr/sbin/mysqld(+0x24e630)[0x5632cadab630]
/usr/sbin/mysqld(+0x24e6b5)[0x5632cadab6b5]
/usr/sbin/mysqld(+0x949418)[0x5632cb4a6418]
/usr/sbin/mysqld(+0x960b34)[0x5632cb4bdb34]
/usr/sbin/mysqld(+0x8b9e7c)[0x5632cb416e7c]
/usr/sbin/mysqld(_ZN7handler10ha_rnd_posEPhS0_+0x232)[0x5632cb13b022]
/usr/sbin/mysqld(_ZN14Rows_log_event8find_rowEP14rpl_group_info+0x3e4)[0x5632cb2624c4]
/usr/sbin/mysqld(_ZN21Delete_rows_log_event11do_exec_rowEP14rpl_group_info+0x142)[0x5632cb2629b2]
/usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x35f)[0x5632cb255c9f]
/usr/sbin/mysqld(_Z18wsrep_apply_eventsP3THDP14Relay_log_infoPKvm+0x1fd)[0x5632cb3def8d]
/usr/sbin/mysqld(+0x868ff0)[0x5632cb3c5ff0]
/usr/sbin/mysqld(_ZN21Wsrep_applier_service15apply_write_setERKN5wsrep7ws_metaERKNS0_12const_bufferERNS0_14mutable_bufferE+0xb5)[0x5632cb3c6ba5]
/usr/sbin/mysqld(+0xb0c921)[0x5632cb669921]
/usr/sbin/mysqld(+0xb1dfd6)[0x5632cb67afd6]
/usr/lib64/galera/libgalera_smm.so(+0x60724)[0x7f7f7b260724]
/usr/lib64/galera/libgalera_smm.so(+0x6fa67)[0x7f7f7b26fa67]
/usr/lib64/galera/libgalera_smm.so(+0x74d65)[0x7f7f7b274d65]
/usr/lib64/galera/libgalera_smm.so(+0x9fdc3)[0x7f7f7b29fdc3]
/usr/lib64/galera/libgalera_smm.so(+0xa098e)[0x7f7f7b2a098e]
/usr/lib64/galera/libgalera_smm.so(+0x75200)[0x7f7f7b275200]
/usr/lib64/galera/libgalera_smm.so(+0x4f35f)[0x7f7f7b24f35f]
/usr/sbin/mysqld(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0x12)[0x5632cb67b652]
/usr/sbin/mysqld(+0x8842d1)[0x5632cb3e12d1]
/usr/sbin/mysqld(_Z15start_wsrep_THDPv+0x276)[0x5632cb3d0d16]
/usr/sbin/mysqld(+0x80e941)[0x5632cb36b941]
/lib64/libc.so.6(+0x8ad22)[0x7f7f7b6a1d22]
/lib64/libc.so.6(+0x10698c)[0x7f7f7b71d98c]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f7efdea05db): delete from `jobs` where `id` = 808933436
Connection ID (thread ID): 2
Status: NOT_KILLED
Optimizer switch:
index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=off,cset_narrowing=off
The manual page at
https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/
contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 1031460 1031460 processes
Max open files 91983 91983 files
Max locked memory 8388608 8388608 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 1031460 1031460 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
Core pattern: core
Kernel version: Linux version 6.4.12-uls (root@sysrescue) (gcc (Gentoo
13.2.1_p20230826 p7) 13.2.1 20230826, GNU ld (Gentoo 2.41 p2) 2.41.0) #2
SMP PREEMPT_DYNAMIC Thu Jan 4 20:10:49 SAST 2024
I'm unable to locate the referenced core file; otherwise I'd already
have pulled it into gdb.
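(Looking at the logged resource limits again, the soft "Max core file
size" is 0, which would explain why no core was written. If that's the
case, a sketch of what we'd need to change before the next crash; the
systemd unit name is an assumption since I don't know how everyone runs
mysqld:

```shell
# The logged soft core limit is 0, so the kernel wrote no core file.
# For a manually started mysqld, raise the limit in the launching shell:
ulimit -c unlimited

# For a systemd-managed service, a drop-in override is needed instead
# (unit name assumed; adjust to the actual service):
#   /etc/systemd/system/mariadb.service.d/core.conf
#     [Service]
#     LimitCORE=infinity

# Core pattern is "core", so the file should land in the working
# directory /var/lib/mysql as "core" (or core.<pid>, depending on
# kernel.core_uses_pid).
```

)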
Any advice or pointers appreciated. I do notice there are galera
updates and a number of mariadb upgrades available. For galera the
changes look like logging improvements, so probably not the cause of
the crash; for mariadb, in 10.6.18 these items stand out to me:
Server crashes in JOIN_CACHE::write_record_data upon EXPLAIN with
subqueries and constant tables (MDEV-21102
<https://jira.mariadb.org/browse/MDEV-21102>) - but the stack trace
doesn't match.
Server crash in Rows_log_event::update_sequence upon replaying binary
log (MDEV-31779 <https://jira.mariadb.org/browse/MDEV-31779>) - this
looks feasible.
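To check that conjecture against the trace itself, I assume the
anonymous "+0x..." frames could be symbolized against the binaries,
provided the debuginfo matching this exact build is installed; roughly:

```shell
# Offsets are the +0x... values from frames like
# /usr/sbin/mysqld(+0x8ee692). Requires MariaDB debug symbols for
# source revision 15c75ad083a55e198ae78324f22970694b72f22b.
addr2line -f -C -e /usr/sbin/mysqld 0x8ee692 0x8ef5cd 0x8acfa5

# Same approach for the galera library frames:
addr2line -f -C -e /usr/lib64/galera/libgalera_smm.so 0x60724 0x6fa67
```
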
If anyone can confirm my conjecture, that would be great. We'd want to
test 10.11 or even 11.4 in a non-production environment before making a
big jump to either of those versions in production.
Kind regards,
Jaco
_______________________________________________
discuss mailing list -- [email protected]
To unsubscribe send an email to [email protected]