Bug#1030510: Info received (mariadb: FTBFS on s390x: timeout)
For the record: after the upload of MariaDB 1:10.11.2-3, the MTR test suite failed to start after the build (it simply timed out), potentially because the server binary was crashing or defective. The third attempt passed.

1: https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=s390x&ver=1%3A10.11.2-3&stamp=1682066999&raw=0
E: Build killed with signal TERM after 150 minutes of inactivity

2: https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=s390x&ver=1%3A10.11.2-3&stamp=1682102108&raw=0
E: Build killed with signal TERM after 150 minutes of inactivity

3: https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=s390x&ver=1%3A10.11.2-3&stamp=1682123620&raw=0
Completed: All 1028 tests were successful.

All builds had:
sbuild (Debian sbuild) 0.81.2+deb11u1 (31 August 2022) on zani.debian.org
Kernel: Linux 5.10.0-21-s390x #1 SMP Debian 5.10.162-1 (2023-01-21) s390x (s390x)
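To compare retries like the three above without reading each log by hand, the two outcome lines can be matched mechanically. This is only a minimal sketch: the function name `classify_build_log` is hypothetical, and it assumes the log text has already been downloaded and contains one of the two messages exactly as quoted in this mail.

```python
import re

# Both patterns are copied from the buildd log lines quoted in this thread.
TIMEOUT_RE = re.compile(
    r"Build killed with signal TERM after (\d+) minutes of inactivity"
)
PASSED_RE = re.compile(r"Completed: All (\d+) tests were successful")


def classify_build_log(text):
    """Return (outcome, number) for a build log.

    outcome is "timeout" (number = idle minutes), "passed"
    (number = test count), or "unknown" (number = None).
    """
    match = TIMEOUT_RE.search(text)
    if match:
        return ("timeout", int(match.group(1)))
    match = PASSED_RE.search(text)
    if match:
        return ("passed", int(match.group(1)))
    return ("unknown", None)
```

A log that matches neither line is reported as "unknown" rather than guessed at, since a build can also fail in ways not seen in this thread.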
Bug#1030510: Info received (Bug#1030510: Info received (Bug#1030510: Info received (mariadb: FTBFS on s390x: timeout)))
btw, I never said it was the same bug. Given that this was a hang and MDEV-30728 was a corrupted page read, it's likely to be different.
Bug#1030510: Info received (Bug#1030510: Info received (Bug#1030510: Info received (mariadb: FTBFS on s390x: timeout)))
Thanks for the update Dipak!

FYI Paul: According to upstream devs, the ppc64el failures we saw in autopkgtests were related to the same kernel bug: https://jira.mariadb.org/browse/MDEV-30728

On Fri, 24 Feb 2023 at 11:10, Dipak Zope1 wrote:
>
> The issue is fixed in 5.10 upstream stable kernel branch.
>
> And it is currently pending for the next bullseye upload against #1031753:
> linux-image-5.10.0-21-s390x: user space process hangs on s390
Bug#1030510: Info received (Bug#1030510: Info received (Bug#1030510: Info received (mariadb: FTBFS on s390x: timeout)))
The issue is fixed in the 5.10 upstream stable kernel branch.

The fix is currently pending for the next bullseye upload against #1031753: linux-image-5.10.0-21-s390x: user space process hangs on s390

We found an issue in Debian kernel 5.10.0-21-s390x which causes similar problems, and we are working on the fix. In the meantime, I would request that all machines running 5.10.0-21-s390x be downgraded to 5.10.0-20. I will update this bug once the next kernel version with the fix is available to upgrade.

Thanks,
-Dipak

From: Otto Kekäläinen
Date: Sunday, 12 February 2023 at 4:09 AM
To: Andrew Hutchings, Paul Gevers
Cc: 1030...@bugs.debian.org <1030...@bugs.debian.org>, Daniel Black, Tuukka Pasanen, Faustin Lammler, debian-s...@lists.debian.org
Subject: [EXTERNAL] Re: Bug#1030510: Info received (Bug#1030510: Info received (Bug#1030510: Info received (mariadb: FTBFS on s390x: timeout)))

For the record, the failing Debian machines run:
Linux ci-worker-s390x-01 5.10.0-21-s390x #1 SMP Debian 5.10.162-1 (2023-01-21) s390x GNU/Linux

The passing Launchpad builders run:
Linux bos02-s390x-013 5.4.0-137-generic #154-Ubuntu SMP Thu Jan 5 17:03:11 UTC 2023 s390x
Bug#1030510: Info received (Bug#1030510: Info received (Bug#1030510: Info received (mariadb: FTBFS on s390x: timeout)))
We found an issue in Debian kernel 5.10.0-21-s390x which causes similar problems, and we are working on the fix. In the meantime, I would request that all machines running 5.10.0-21-s390x be downgraded to 5.10.0-20. I will update this bug once the next kernel version with the fix is available to upgrade.

Thanks,
-Dipak

From: Otto Kekäläinen
Date: Sunday, 12 February 2023 at 4:09 AM
To: Andrew Hutchings, Paul Gevers
Cc: 1030...@bugs.debian.org <1030...@bugs.debian.org>, Daniel Black, Tuukka Pasanen, Faustin Lammler, debian-s...@lists.debian.org
Subject: [EXTERNAL] Re: Bug#1030510: Info received (Bug#1030510: Info received (Bug#1030510: Info received (mariadb: FTBFS on s390x: timeout)))

For the record, the failing Debian machines run:
Linux ci-worker-s390x-01 5.10.0-21-s390x #1 SMP Debian 5.10.162-1 (2023-01-21) s390x GNU/Linux

The passing Launchpad builders run:
Linux bos02-s390x-013 5.4.0-137-generic #154-Ubuntu SMP Thu Jan 5 17:03:11 UTC 2023 s390x
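For anyone who wants to check a batch of hosts against the downgrade request above, a small sketch follows. It assumes the `uname -a` output has been collected already and that the kernel release is the third whitespace-separated field, as in the two lines quoted in this mail. The names `kernel_release` and `needs_downgrade` are illustrative only, and the single affected release is the one named in this thread.

```python
# Kernel release named in this thread as causing hangs on s390x.
AFFECTED_RELEASE = "5.10.0-21-s390x"


def kernel_release(uname_line):
    """Return the kernel release (third field) from a `uname -a` line."""
    fields = uname_line.split()
    if len(fields) < 3:
        raise ValueError("not a full `uname -a` line: %r" % uname_line)
    return fields[2]


def needs_downgrade(uname_line):
    """True if the host runs the kernel release reported as affected."""
    return kernel_release(uname_line) == AFFECTED_RELEASE
```

With the two lines above, the Debian CI worker is flagged and the Launchpad builder is not.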
Bug#1030510: Info received (Bug#1030510: Info received (Bug#1030510: Info received (mariadb: FTBFS on s390x: timeout)))
For the record, the failing Debian machines run:
Linux ci-worker-s390x-01 5.10.0-21-s390x #1 SMP Debian 5.10.162-1 (2023-01-21) s390x GNU/Linux

The passing Launchpad builders run:
Linux bos02-s390x-013 5.4.0-137-generic #154-Ubuntu SMP Thu Jan 5 17:03:11 UTC 2023 s390x
Bug#1030510: Info received (Bug#1030510: Info received (Bug#1030510: Info received (mariadb: FTBFS on s390x: timeout)))
Hi,

On 11-02-2023 22:41, Andrew Hutchings wrote:
> On 11/02/2023 20:17, Paul Gevers wrote:
>> Well, the buildd's don't run bookworm, but they run stable (which currently is bullseye).
>
> Sorry, I was told this was a Bookworm blocker when this was emailed to me, not a Bullseye one.

That is true. But the Debian build infrastructure runs stable, and the build happens in a schroot (if I recall correctly).

> If the intention is to release 10.11 for Bookworm, surely that is what should be tested? Given that the entire underlying base kernel and OS will be different?

I see your point, but the kernel is nearly always different anyway between the moment a build artifact is built and the moment you use it. That's how binary distributions work. Obviously, ideally we should have the same kernel during testing, but at this moment the infrastructure uses lxc (giving you the same OS, but a different kernel) and the host runs stable.

> Or is the intention to update Bullseye to have 10.11?

No.

Paul
Bug#1030510: Info received (Bug#1030510: Info received (Bug#1030510: Info received (mariadb: FTBFS on s390x: timeout)))
Hi Paul,

On 11/02/2023 20:17, Paul Gevers wrote:
> Well, the buildd's don't run bookworm, but they run stable (which currently is bullseye).

Sorry, I was told this was a Bookworm blocker when this was emailed to me, not a Bullseye one.

If the intention is to release 10.11 for Bookworm, surely that is what should be tested? Given that the entire underlying base kernel and OS will be different?

Or is the intention to update Bullseye to have 10.11?

Kind Regards
--
Andrew (LinuxJedi) Hutchings
Chief Contributions Officer
MariaDB Foundation
Bug#1030510: Info received (Bug#1030510: Info received (Bug#1030510: Info received (mariadb: FTBFS on s390x: timeout)))
Hi,

On Wed, 8 Feb 2023 11:42:10 + Andrew Hutchings wrote:
> Are we 100% sure that Bookworm's kernel on S390x is good?

Well, the buildd's don't run bookworm, but they run stable (which currently is bullseye).

I am seeing something weird on ci.debian.net too, which also runs stable. I upgraded the systems several days ago, which pulled in a new security kernel. Since then I'm seeing that mariadb fails to install on s390x in tests because the install of mariadb-server-10.6 (1:10.6.11-2 testing) or mariadb-server (1:10.11.1-4 unstable) times out.

root@ci-worker-s390x-01:~# uname -a
Linux ci-worker-s390x-01 5.10.0-21-s390x #1 SMP Debian 5.10.162-1 (2023-01-21) s390x GNU/Linux

See the recent tmpfails e.g. here:
https://ci.debian.net/packages/d/dbconfig-common/testing/s390x/
https://ci.debian.net/packages/b/bacula/testing/s390x/
https://ci.debian.net/packages/d/django-reversion/testing/s390x/

Paul
Bug#1030510: Info received (Bug#1030510: Info received (Bug#1030510: Info received (mariadb: FTBFS on s390x: timeout)))
Some of the crashes in the signal handler are the newly filed "MDEV-30613 output_core_info crashes in my_read()", which has a probable cause. That doesn't help with the original crash reason, however.

A single-thread backtrace isn't sufficient for errors like:
"InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch"

Setting MTR_PRINT_CORE=detailed in the environment will print backtraces for multiple threads and is more likely to identify deadlocks between processes (or to show whether the builder is just very slow at this point).

Disabling the performance schema with -DPLUGIN_PERFSCHEMA=NO on s390x might be a way to get some breathing space, leaving this optional feature to be resolved later, as the munmap crashes seem to be firmly in this feature's shutdown.
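As a rough illustration of how the two suggestions above could be applied from a build wrapper, here is a minimal sketch. The helper names `mtr_environment` and `extra_cmake_flags` are hypothetical; only the MTR_PRINT_CORE=detailed setting and the -DPLUGIN_PERFSCHEMA=NO flag come from this mail, and how the Debian packaging would actually pass them on is not shown in this thread.

```python
import os


def mtr_environment(base=None):
    """Copy an environment and request detailed core output from MTR."""
    env = dict(os.environ if base is None else base)
    env["MTR_PRINT_CORE"] = "detailed"
    return env


def extra_cmake_flags(arch):
    """Extra CMake flags for an architecture.

    Only s390x gets the performance schema disabled, as a temporary
    workaround; all other architectures are left unchanged.
    """
    if arch == "s390x":
        return ["-DPLUGIN_PERFSCHEMA=NO"]
    return []
```

Keeping the flag conditional on the architecture means the feature stays enabled everywhere the test suite already passes.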
Bug#1030510: Info received (Bug#1030510: Info received (Bug#1030510: Info received (mariadb: FTBFS on s390x: timeout)))
Hi Otto,

We are trying to figure this out but are going in completely blind, as we don't have Bookworm on our S390x at the moment, and every other OS we test against is passing. This also doesn't look like anything 10.11-specific; some of this is code that hasn't been touched in a long time. I'm assuming other MariaDB versions are failing for you in similar ways on Bookworm?

Some of these failures are definitely this bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1020831

Are we 100% sure that Bookworm's kernel on S390x is good?

Kind Regards
Andrew

On 07/02/2023 16:36, Otto Kekäläinen wrote:
> Control: severity -1 normal
> Control: tags -1 help
>
> The s390x build is still failing after 5 retries at https://buildd.debian.org/status/package.php?p=mariadb. The issue seems to be with Debian buildd, as the Launchpad s390x build passed just fine without the need to retry anything:
> https://launchpad.net/~mysql-ubuntu/+archive/ubuntu/mariadb-10.11/+builds?build_text=&build_state=all
>
> It seems to crash on different tests every time. I could disable the entire test suite, but that feels like a bad idea.
>
> I need help - if the s390x build does not pass, the 10.11 cannot enter Debian testing (Bookworm).
>
> (this email is for https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1030510)

--
Andrew (LinuxJedi) Hutchings
Chief Contributions Officer
MariaDB Foundation
Bug#1030510: Info received (Bug#1030510: Info received (Bug#1030510: Info received (mariadb: FTBFS on s390x: timeout)))
Control: severity -1 serious

On 2023-02-07 08:36:11 -0800, Otto Kekäläinen wrote:
> Control: severity -1 normal
> Control: tags -1 help
>
> The s390x build is still failing after 5 retries at
> https://buildd.debian.org/status/package.php?p=mariadb. The issue
> seems to be with Debian buildd, as the Launchpad s390x build passed
> just fine without the need to retry anything:
> https://launchpad.net/~mysql-ubuntu/+archive/ubuntu/mariadb-10.11/+builds?build_text=&build_state=all
>
> It seems to crash on different tests every time. I could disable the
> entire test suite, but that feels like a bad idea.
>
> I need help - if the s390x build does not pass, the 10.11 cannot enter
> Debian testing (Bookworm).

Indeed:

excuses:
Migration status for mariadb (- to 1:10.11.1-3): BLOCKED: Rejected/violates migration policy/introduces a regression
Issues preventing migration:
∙ ∙ Updating mariadb would introduce bugs in testing: #1029136, #1030604
∙ ∙ autopkgtest for libreoffice/blocked-on-ci-infra: armel: Ignored failure, i386: Ignored failure, ppc64el: Ignored failure
∙ ∙ autopkgtest for mariadb/1:10.11.1-3: amd64: Pass, arm64: Pass, armel: Pass, armhf: Pass, i386: Pass, ppc64el: Pass
∙ ∙ autopkgtest for mariadb-10.6/1:10.6.11-2: amd64: Regression ♻ (reference ♻), arm64: Regression ♻ (reference ♻), armel: Pass, armhf: Pass, i386: Pass, ppc64el: Not a regression
∙ ∙ missing build on s390x

And that's why the severity of this issue is serious.

Cheers
--
Sebastian Ramacher
Bug#1030510: Info received (Bug#1030510: Info received (Bug#1030510: Info received (mariadb: FTBFS on s390x: timeout)))
Control: severity -1 normal
Control: tags -1 help

The s390x build is still failing after 5 retries at https://buildd.debian.org/status/package.php?p=mariadb. The issue seems to be with Debian buildd, as the Launchpad s390x build passed just fine without the need to retry anything:
https://launchpad.net/~mysql-ubuntu/+archive/ubuntu/mariadb-10.11/+builds?build_text=&build_state=all

It seems to crash on different tests every time. I could disable the entire test suite, but that feels like a bad idea.

I need help - if the s390x build does not pass, the 10.11 cannot enter Debian testing (Bookworm).

(this email is for https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1030510)
Bug#1030510: Info received (Bug#1030510: Info received (mariadb: FTBFS on s390x: timeout))
And again the same phenomenon in https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=s390x&ver=1%3A10.11.1-3&stamp=1675697634&raw=0.

Copy-pasting more context to track whether main.xml is the preceding test at other times as well:

main.group_by_innodb 'innodb' w2 [ pass ] 11
main.group_min_max_innodb 'innodb' w2 [ pass ] 21
main.host_cache_size_functionality 'innodb' w2 [ fail ] Found warnings/errors in server log file!
Test ended at 2023-02-06 13:37:33
line
Attempting backtrace. You can use the following information to find out
^ Found warnings in /<>/builddir/mysql-test/var/2/log/mysqld.1.err
ok
- found 'core' (0/5)
Core generated by '/<>/builddir/sql/mariadbd'
Output from gdb follows. The first stack trace is from the failing thread.
The following stack traces are from all threads (so the failing one is duplicated).
--
warning: Can't open file anon_inode:[io_uring] which was expanded to anon_inode:[io_uring] during file-backed mapping note processing
warning: Can't open file anon_inode:[io_uring] which was expanded to anon_inode:[io_uring] during file-backed mapping note processing
[New LWP 4037916]
[New LWP 4037923]
[New LWP 4037925]
[New LWP 4037924]
[New LWP 4037934]
[New LWP 4038289]
[New LWP 4038342]
[New LWP 4038372]
[New LWP 4037931]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
Core was generated by `/<>/builddir/sql/mariadbd --defaults-group-su'.
Program terminated with signal SIGABRT, Aborted.
#0 0x03ffab01861a in ?? () from /lib/s390x-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x3ffabeea820 (LWP 4037916))]
#0 0x03ffab01861a in ?? () from /lib/s390x-linux-gnu/libc.so.6
#1 0x02aa242907f4 in handle_fatal_signal (sig=) at ./sql/signal_handler.cc:355
#2
#3 0x03ffab0130d2 in ?? () from /lib/s390x-linux-gnu/libc.so.6
#4 0x03ffab015c32 in pthread_cond_wait () from /lib/s390x-linux-gnu/libc.so.6
#5 0x02aa2468bb52 in buf_dblwr_t::flush_buffered_writes (this=this@entry=0x2aa25aeaa80 , size=size@entry=64) at ./storage/innobase/buf/buf0dblwr.cc:576
#6 0x02aa2468c082 in buf_dblwr_t::flush_buffered_writes (this=0x2aa25aeaa80 ) at ./storage/innobase/buf/buf0dblwr.cc:719
#7 0x02aa24693334 in buf_flush_list (lsn=4396608790832, max_n=24) at ./storage/innobase/buf/buf0flu.cc:1508
#8 buf_flush_list (max_n=24, lsn=4396608790832) at ./storage/innobase/buf/buf0flu.cc:1478
#9 0x02aa23ef6d52 in buf_flush_buffer_pool () at ./storage/innobase/buf/buf0flu.cc:2546
#10 0x02aa23ee959c in logs_empty_and_mark_files_at_shutdown () at ./storage/innobase/log/log0log.cc:1163
#11 0x02aa24636174 in innodb_shutdown () at ./storage/innobase/srv/srv0start.cc:1949
#12 0x02aa245447a2 in innobase_end () at ./storage/innobase/handler/ha_innodb.cc:4284
#13 innobase_end () at ./storage/innobase/handler/ha_innodb.cc:4271
#14 0x02aa24293c9e in ha_finalize_handlerton (plugin=0x2aa26381790) at ./sql/handler.cc:596
#15 0x02aa24056e06 in plugin_deinitialize (plugin=0x2aa26381790, ref_check=ref_check@entry=true) at ./sql/sql_plugin.cc:1273
#16 0x02aa24059bd0 in reap_plugins () at ./sql/sql_plugin.cc:1344
#17 0x02aa2405a6b0 in plugin_shutdown () at ./sql/sql_plugin.cc:2052
#18 0x02aa23f3ecec in clean_up (print_message=) at ./sql/mysqld.cc:2000
#19 0x02aa23f4a4d8 in clean_up (print_message=true) at ./sql/mysqld.cc:1972
#20 mysqld_main (argc=, argv=) at ./sql/mysqld.cc:6024
#21 0x03ffaafab84a in ?? () from /lib/s390x-linux-gnu/libc.so.6
#22 0x03ffaafab932 in __libc_start_main () from /lib/s390x-linux-gnu/libc.so.6
#23 0x02aa23f3d378 in _start ()

main.xml w2 [ pass ] 19
worker[1] Test still running: main.innodb_ext_key
worker[1] Test still running: main.innodb_ext_key
worker[1] Test still running: main.innodb_ext_key
worker[1] Test still running: main.innodb_ext_key
worker[1] Test still running: main.innodb_ext_key
worker[1] Trying to dump core for [mysqltest - pid: 4034646, winpid: 4034646]
worker[1] Trying to dump core for [mysqld.1 - pid: 4034628, winpid: 4034628]
main.innodb_ext_key 'innodb,off,unoptimized' w1 [ fail ] timeout after 7200 seconds
Test ended at 2023-02-06 15:33:16
Test case timeout after 7200 seconds
== /<>/builddir/mysql-test/var/1/tmp/analyze-timeout-mysqld.1.err ==
mysqltest: Could not open connection 'default' after 500 attempts: 2002 Can't connect to local server through socket '/<>/builddir/mysql-test/var/tm' (111)
- found 'core' (1/5)
Core generated by '/<>/builddir/sql/mariadbd'
Output from gdb follows. The first stack trace is from the failing thread.
The following stack traces are from all threads (so the failing one is duplicated).
--
warning: Can't open file anon_inode:[io_uring] which was expanded to anon_inode:[io_uring] during file-backed mapping note processing
warning: Can't open file anon_inode:[io_uring] whic
Bug#1030510: Info received (mariadb: FTBFS on s390x: timeout)
Control: retitle -1 mariadb: FTBFS on s390x: crash on munmap(), free(), aligned_free()

For the record, the latest build https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=s390x&ver=1%3A10.11.1-3&stamp=1675662468&raw=0 shows other test failures again, but the stack traces seem to have munmap(), free(), aligned_free() etc. in common:

main.bootstrap_innodb 'innodb' w2 [ fail ] Found warnings/errors in server log file!
Test ended at 2023-02-06 05:41:47
line
Attempting backtrace. You can use the following information to find out
^ Found warnings in /<>/builddir/mysql-test/var/2/log/mysqld.1.err
ok
- found 'core' (0/5)
Core generated by '/<>/builddir/sql/mariadbd'
Output from gdb follows. The first stack trace is from the failing thread.
The following stack traces are from all threads (so the failing one is duplicated).
--
[New LWP 2264728]
[New LWP 2264825]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
Core was generated by `/<>/builddir/sql/mariadbd --defaults-group-su'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x03ffb4448992 in kill () from /lib/s390x-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x3ffb536a820 (LWP 2264728))]
#0 0x03ffb4448992 in kill () from /lib/s390x-linux-gnu/libc.so.6
#1 0x02aa06f107c4 in handle_fatal_signal (sig=) at ./sql/signal_handler.cc:367
#2
#3 0x02aa073fc26a in my_read (Filedes=, Buffer=0xd , Count=4096, MyFlags=) at ./mysys/my_read.c:63
#4 0x02aa06f10266 in output_core_info () at ./sql/signal_handler.cc:90
#5 0x02aa06f10792 in handle_fatal_signal (sig=) at ./sql/signal_handler.cc:351
#6
#7 0x03ffb450e632 in munmap () from /lib/s390x-linux-gnu/libc.so.6
#8 0x03ffb44a7790 in free () from /lib/s390x-linux-gnu/libc.so.6
#9 0x02aa07140022 in aligned_free (ptr=) at ./include/aligned.h:37
#10 pfs_free (ptr=, size=0, klass=0x2aa086c6900 ) at ./storage/perfschema/pfs_global.cc:83
#11 pfs_free (ptr=, size=0, klass=0x2aa086c6900 ) at ./storage/perfschema/pfs_global.cc:78
#12 pfs_free_array (klass=0x2aa086c6900 , n=n@entry=256, size=size@entry=32, ptr=) at ./storage/perfschema/pfs_global.cc:134
#13 0x02aa07135e82 in PFS_thread_allocator::free_array (this=, array=array@entry=0x2aa08f4fd30) at ./storage/perfschema/pfs_buffer_container.cc:659
#14 0x02aa071425da in PFS_buffer_scalable_container::cleanup (this=) at ./storage/perfschema/pfs_buffer_container.h:506
#15 PFS_buffer_scalable_container::cleanup (this=) at ./storage/perfschema/pfs_buffer_container.h:491
#16 cleanup_instruments () at ./storage/perfschema/pfs_instr.cc:233
#17 0x02aa0715000c in cleanup_performance_schema () at ./storage/perfschema/pfs_server.cc:296
#18 0x02aa071504f0 in shutdown_performance_schema () at ./storage/perfschema/pfs_server.cc:326
#19 0x02aa06bbf912 in mysqld_exit (exit_code=exit_code@entry=0) at ./sql/mysqld.cc:1943
#20 0x02aa06bca4fe in mysqld_main (argc=, argv=) at ./sql/mysqld.cc:6040
#21 0x03ffb442b84a in ?? () from /lib/s390x-linux-gnu/libc.so.6
#22 0x03ffb442b932 in __libc_start_main () from /lib/s390x-linux-gnu/libc.so.6
#23 0x02aa06bbd378 in _start ()

main.host_cache_size_functionality 'innodb' w2 [ fail ] Found warnings/errors in server log file!
Test ended at 2023-02-06 05:44:47
line
Attempting backtrace. You can use the following information to find out
^ Found warnings in /<>/builddir/mysql-test/var/2/log/mysqld.1.err
ok
- found 'core' (2/5)
Core generated by '/<>/builddir/sql/mariadbd'
Output from gdb follows. The first stack trace is from the failing thread.
The following stack traces are from all threads (so the failing one is duplicated).
--
[New LWP 2267523]
[New LWP 2268734]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
Core was generated by `/<>/builddir/sql/mariadbd --defaults-group-su'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x03ffa7e48992 in kill () from /lib/s390x-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x3ffa8d6a820 (LWP 2267523))]
#0 0x03ffa7e48992 in kill () from /lib/s390x-linux-gnu/libc.so.6
#1 0x02aa0f2907c4 in handle_fatal_signal (sig=) at ./sql/signal_handler.cc:367
#2
#3 0x02aa0f77c26a in my_read (Filedes=, Buffer=0xd , Count=4096, MyFlags=) at ./mysys/my_read.c:63
#4 0x02aa0f290266 in output_core_info () at ./sql/signal_handler.cc:90
#5 0x02aa0f290792 in handle_fatal_signal (sig=) at ./sql/signal_handler.cc:351
#6
#7 0x03ffa7f0e632 in munmap () from /lib/s390x-linux-gnu/libc.so.6
#8 0x03ffa7ea7790 in free () from /lib/s390x-linux-gnu/libc.so.6
#9 0x02aa0f4c0022 in aligned_free (ptr=) at ./include/aligned.h:37
#10 pfs_free (ptr=, size=2841600, klass=0x2aa10a46700 ) at ./storage/perfschema/pfs_global.cc:83
#11 pfs_free (ptr=, size=2841600, klass=0x2aa10a4670
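To check more systematically which frames the crashes have in common, the function names can be pulled out of each gdb backtrace and intersected. The following is a minimal sketch under stated assumptions: each trace is available as text with one frame per line in the "#N [0xADDR in] function (" shape seen above, and frames without a resolvable name ("??" or an empty signal-handler frame) are simply skipped. The names `frame_functions` and `common_functions` are hypothetical.

```python
import re

# Matches frames such as
#   "#7 0x03ffb450e632 in munmap () from ..."
#   "#10 pfs_free (ptr=, size=0, ...) at ..."
# Unresolved frames ("?? ()") and bare "#2" lines do not match.
FRAME_RE = re.compile(
    r"^#\d+[ \t]+(?:0x[0-9a-fA-F]+ in )?([A-Za-z_][\w:]*) \(",
    re.MULTILINE,
)


def frame_functions(trace):
    """Return the function names found in one backtrace, in frame order."""
    return FRAME_RE.findall(trace)


def common_functions(traces):
    """Return the set of function names that appear in every backtrace."""
    name_sets = [set(frame_functions(trace)) for trace in traces]
    if not name_sets:
        return set()
    return set.intersection(*name_sets)
```

A shared set of frames only shows that the crashes end in the same code path; it does not by itself identify the root cause.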