Bug#1032104: marked as done (linux: ppc64el iouring corrupted read)
Your message dated Sun, 14 Jan 2024 19:48:12 + with message-id and subject line Bug#1032104: fixed in linux 5.10.205-1 has caused the Debian Bug report #1032104, regarding linux: ppc64el iouring corrupted read to be marked as done.

This means that you claim that the problem has been dealt with. If this is not the case it is now your responsibility to reopen the Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this message is talking about, this may indicate a serious mail system misconfiguration somewhere. Please contact ow...@bugs.debian.org immediately.)

-- 1032104: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1032104
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems

--- Begin Message ---
Source: linux
Version: 5.10.0-21-powerpc64le
Severity: grave
Justification: causes non-serious data loss
X-Debbugs-Cc: dan...@mariadb.org

Dear Maintainer,

From https://jira.mariadb.org/browse/MDEV-30728

A number of MariaDB's mtr tests depend on correct kernel operation. As observed in these tests, there is a ~1/5 chance that the encryption.innodb_encryption test will read zeros in the latter part of the 16k pages that InnoDB uses by default.

This affects MariaDB 10.6+ packages where there is a liburing in the distribution.

This has been observed in the CI of Debian (https://ci.debian.net/packages/m/mariadb/testing/ppc64el/) and in upstream's https://buildbot.mariadb.org/#/builders/318. The one ppc64le worker that has the Debian 5.10.0-21 kernel, the same as the Debian CI, has the prefix ppc64le-db-bbw1-*.

Test faults occur on all MariaDB 10.6+ builds in containers on this kernel. There are no faults on non-ppc64le or RHEL7/8 based ppc64le kernels.

To reproduce:

  apt-get install mariadb-test
  cd /usr/share/mysql/mysql-test
  ./mtr --mysqld=--innodb-flush-method=fsync --mysqld=--innodb-use-native-aio=1 --vardir=/var/lib/mysql --force encryption.innodb_encryption,innodb,undo0 --repeat=12

A test will frequently fail.

  2023-02-28 1:41:01 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=282]. You may have to recover from a backup.

(the page number isn't predictable)

The complete mtr error log of the mariadb server is $PWD/var/log/mysqld.1.err

I tested on tmpfs.

This is a different fault from bug #1020831 as:
* there is no iouring error, just a bunch of zeros where data was expected.
* this is ppc64le only.

Note, more serious faults exist on overlayfs (MDEV-28751) and remote filesystems, so sticking to local xfs, ext4, btrfs is recommended.

-- System Information:
Debian Release: bullseye
APT prefers jammy-updates
APT policy: (500, 'jammy-updates'), (500, 'jammy-security'), (500, 'jammy'), (100, 'jammy-backports')
Architecture: ppc64el (ppc64le)
Kernel: Linux 5.10.0-21-powerpc64le (SMP w/128 CPU threads)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect
--- End Message ---

--- Begin Message ---
Source: linux
Source-Version: 5.10.205-1
Done: Salvatore Bonaccorso

We believe that the bug you reported is fixed in the latest version of linux, which is due to be installed in the Debian FTP archive. A summary of the changes between this version and the previous one is attached.

Thank you for reporting the bug, which will now be closed. If you have further comments please address them to 1032...@bugs.debian.org, and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software pp.
Salvatore Bonaccorso (supplier of updated linux package) (This message was generated automatically at their request; if you believe that there is a problem with it please contact the archive administrators by mailing ftpmas...@ftp-master.debian.org) -BEGIN PGP SIGNED MESSAGE- Hash: SHA512 Format: 1.8 Date: Sat, 30 Dec 2023 10:41:34 +0100 Source: linux Architecture: source Version: 5.10.205-1 Distribution: bullseye-security Urgency: high Maintainer: Debian Kernel Team Changed-By: Salvatore Bonaccorso Closes: 1032104 1035587 1052304 Changes: linux (5.10.205-1) bullseye-security; urgency=high . * New upstream stable update: https://www.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.10.198 - NFS: Use the correct commit info in nfs_join_page_group() - NFS/pNFS: Report EINVAL errors from connect() to the server - SUNRPC: Mark the cred for revalidation if the server rejects it - tra
Bug#1032104: marked as done (linux: ppc64el iouring corrupted read)
Your message dated Sat, 09 Dec 2023 17:56:32 + with message-id and subject line Bug#1032104: fixed in linux 6.1.66-1 has caused the Debian Bug report #1032104, regarding linux: ppc64el iouring corrupted read to be marked as done.

This means that you claim that the problem has been dealt with. If this is not the case it is now your responsibility to reopen the Bug report if necessary, and/or fix the problem forthwith.

-- 1032104: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1032104
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems

--- Begin Message ---
Source: linux
Source-Version: 6.1.66-1
Done: Salvatore Bonaccorso

We believe that the bug you reported is fixed in the latest version of linux, which is due to be installed in the Debian FTP archive. A summary of the changes between this version and the previous one is attached.

Thank you for reporting the bug, which will now be closed. If you have further comments please address them to 1032...@bugs.debian.org, and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software pp.
Salvatore Bonaccorso (supplier of updated linux package) (This message was generated automatically at their request; if you believe that there is a problem with it please contact the archive administrators by mailing ftpmas...@ftp-master.debian.org) -BEGIN PGP SIGNED MESSAGE- Hash: SHA512 Format: 1.8 Date: Sat, 09 Dec 2023 16:48:39 +0100 Source: linux Architecture: source Version: 6.1.66-1 Distribution: bookworm Urgency: medium Maintainer: Debian Kernel Team Changed-By: Salvatore Bonaccorso Closes: 1032104 1057790 1057843 Changes: linux (6.1.66-1) bookworm; urgency=medium . * New upstream stable update: https://www.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.65 - afs: Fix afs_server_list to be cleaned up with RCU - afs: Make error on cell lookup failure consistent with OpenAFS - [arm64,armhf] drm/panel: simple: Fix Innolux G101ICE-L01 bus flags - [arm64,armhf] drm/panel: simple: Fix
Bug#1032104: Fixed in 4.19.301, 5.10.203, 6.1.66
The fix has also landed in 5.10.203 and 6.1.66, so I will add a closer for this bug when we rebase to those versions. The fix will therefore be in the next upload that rebases to at least those versions (it was too late for the next bookworm point release).
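For readers who want to check a given kernel against the versions named in the subject line above, here is a minimal Python sketch. The helper name `has_fix` is made up for illustration, and the table only covers the three stable series this message mentions; any other series is reported as unknown rather than guessed. The input is expected to be the upstream or Debian package version (for example the `5.10.162-1` shown in the autopkgtest logs), not the ABI name such as `5.10.0-21-powerpc64le`.

```python
import re

# Upstream stable releases that carry the fix, taken from the subject line
# "Fixed in 4.19.301, 5.10.203, 6.1.66". Other series are not covered by
# this thread, so the helper returns None (unknown) for them.
FIXED_IN = {(4, 19): 301, (5, 10): 203, (6, 1): 66}


def has_fix(upstream_version):
    """Return True/False for a known series, None when the series is unknown."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", upstream_version)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    fixed_patch = FIXED_IN.get((major, minor))
    if fixed_patch is None:
        return None
    return patch >= fixed_patch
```

Returning `None` for unlisted series is a deliberate choice: this message does not say which releases of other series carry the fix.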
Bug#1032104: marked as done (linux: ppc64el iouring corrupted read)
Your message dated Sun, 03 Dec 2023 20:48:02 + with message-id and subject line Bug#1032104: fixed in linux 6.6.4-1~exp1 has caused the Debian Bug report #1032104, regarding linux: ppc64el iouring corrupted read to be marked as done.

This means that you claim that the problem has been dealt with. If this is not the case it is now your responsibility to reopen the Bug report if necessary, and/or fix the problem forthwith.

-- 1032104: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1032104
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems

--- Begin Message ---
Source: linux
Source-Version: 6.6.4-1~exp1
Done: Bastian Blank

We believe that the bug you reported is fixed in the latest version of linux, which is due to be installed in the Debian FTP archive. A summary of the changes between this version and the previous one is attached.

Thank you for reporting the bug, which will now be closed. If you have further comments please address them to 1032...@bugs.debian.org, and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software pp.
Bastian Blank (supplier of updated linux package) (This message was generated automatically at their request; if you believe that there is a problem with it please contact the archive administrators by mailing ftpmas...@ftp-master.debian.org) -BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Format: 1.8 Date: Sun, 03 Dec 2023 20:57:56 +0100 Source: linux Architecture: source Version: 6.6.4-1~exp1 Distribution: experimental Urgency: medium Maintainer: Debian Kernel Team Changed-By: Bastian Blank Closes: 1032104 1037938 Changes: linux (6.6.4-1~exp1) experimental; urgency=medium . * New upstream stable update: https://www.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.6.4 - nvmet: nul-terminate the NQNs passed in the connect command (CVE-2023-6121) . [ Bastian Blank ] * Fix build dependency on rsync. * Fix build dependency on kernel-wedge. * udeb: Make i2c-hid modules optional. . [ Timothy
Bug#1032104:
Root cause found, merge request here: https://salsa.debian.org/kernel-team/linux/-/merge_requests/917
Processed: bug 1032104 is forwarded to https://lore.kernel.org/regressions/19221908.47168775.1699937769845.javamail.zim...@raptorengineeringinc.com/T/#u https://lore.kernel.org/regressions/480932026.4
Processing commands for cont...@bugs.debian.org: > forwarded 1032104 > https://lore.kernel.org/regressions/19221908.47168775.1699937769845.javamail.zim...@raptorengineeringinc.com/T/#u > > https://lore.kernel.org/regressions/480932026.45576726.1699374859845.javamail.zim...@raptorengineeringinc.com/ > https://lore.kernel.org/all/2b015a34-220e-674e-7301-2cf17ef45...@kernel.dk/ Bug #1032104 [src:linux] linux: ppc64el iouring corrupted read Changed Bug forwarded-to-address to 'https://lore.kernel.org/regressions/19221908.47168775.1699937769845.javamail.zim...@raptorengineeringinc.com/T/#u https://lore.kernel.org/regressions/480932026.45576726.1699374859845.javamail.zim...@raptorengineeringinc.com/ https://lore.kernel.org/all/2b015a34-220e-674e-7301-2cf17ef45...@kernel.dk/' from '! https://lore.kernel.org/regressions/19221908.47168775.1699937769845.javamail.zim...@raptorengineeringinc.com/T/#u'. > thanks Stopping processing here. Please contact me if you need assistance. -- 1032104: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1032104 Debian Bug Tracking System Contact ow...@bugs.debian.org with problems
Bug#1032104: Status update
I have traced this bug to a missing memory barrier in the powerpc IPI handling code. io_uring uses task_work_add() to schedule I/O worker creation, which in turn issues an IPI, and when precise timing conditions are met the inconsistent state between the two CPU cores can lead to corruption of userspace data in RAM. I have sent a patch upstream, and created a merge request for Debian here: https://salsa.debian.org/kernel-team/linux/-/merge_requests/907
Bug#1032104: Status update
I have traced this to a regression in the Linux kernel. The issue appears to be a type of data race that is more likely to occur on ppc64el, but it probably affects other architectures as well. The issue remains in the latest Git version of the Linux kernel, and I am working with both upstream and our internal resources to try to isolate the root cause and generate a fix.

In the interim, disabling the io_uring subsystem will allow mariadb to function normally. Given the nature of the kernel bug, I would recommend disabling io_uring entirely in the kernel configuration for affected systems, as other applications may also be impacted by the data corruption.
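To see whether a given kernel build still has io_uring compiled in, one can look for the upstream Kconfig symbol `CONFIG_IO_URING` in the kernel configuration. The Python sketch below is only an illustration: the function names are made up, and the `/boot/config-<release>` location is an assumption about where the distribution installs the config file.

```python
import platform


def io_uring_enabled(config_text):
    """Return True if the kernel config text sets CONFIG_IO_URING=y."""
    for line in config_text.splitlines():
        # A disabled option appears as "# CONFIG_IO_URING is not set",
        # which does not match the exact assignment checked here.
        if line.strip() == "CONFIG_IO_URING=y":
            return True
    return False


def running_kernel_config_path():
    """Assumed location of the config file for the running kernel."""
    return f"/boot/config-{platform.release()}"
```

A caller would read the file at `running_kernel_config_path()` and pass its contents to `io_uring_enabled()`; the check says nothing about whether the fix itself is present.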
Bug#1032104: Still present in Bookworm
We've started hitting this on a busy server after upgrading to Bookworm: 2023-09-07 17:00:31 0 [Warning] You need to use --log-bin to make --expire-logs-days or --binlog-expire-logs-seconds work. 2023-09-07 17:00:31 0 [Note] Server socket created on IP: '127.0.0.1'. 2023-09-07 17:00:31 0 [Note] /usr/sbin/mariadbd: ready for connections. Version: '10.11.3-MariaDB-1' socket: '/run/mysqld/mysqld.sock' port: 3306 Debian 12 2023-09-07 17:00:31 0 [Note] InnoDB: Buffer pool(s) load completed at 230907 17:00:31 2023-09-07 20:35:06 8630 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './database/data_table.ibd' page [page id: space=393, page number=1534]. You may have to recover from a backup. 2023-09-07 20:35:06 8630 [Note] InnoDB: Page dump (8192 bytes): 2023-09-07 20:35:06 8630 [Note] InnoDB: 86c518ee05fe05fd05ff00067ee5a53145bf 2023-09-07 20:35:06 8630 [Note] InnoDB: 0189000f3eff803d3e030002003a003b 2023-09-07 20:35:06 8630 [Note] InnoDB: 045a6881 2023-09-07 20:35:06 8630 [Note] InnoDB: 12d4ae6314accb64e0a8630400b55a4b8f6557796d5eb10dee577503 2023-09-07 20:35:06 8630 [Note] InnoDB: c64ee090070c90e9fd7e2014256e140c31c2b84d193051d88fefebbe72b9caaa 2023-09-07 20:35:06 8630 [Note] InnoDB: aa36b4985494699461a42852fe41a48ca248f91b5186496a1009f10320bc42d6 2023-09-07 20:35:06 8630 [Note] InnoDB: bef79c7b6fd53d7546a73df0b5caf5d53d6b7dafb5f63ed7ba3bdddbd7ceaeb5 2023-09-07 20:35:06 8630 [Note] InnoDB: 7fbefdb3d5e7b58ff2e2804eeedd3fa674ba789fee9547e9f8f4e4deebc7f4ee 2023-09-07 20:35:06 8630 [Note] InnoDB: e2f1bb2fef53393d3a7ef96be5e8f0e5d716f9381d3fb9f7069da6c5c1f26727 2023-09-07 20:35:06 8630 [Note] InnoDB: f71eec7ff5de57d2f13bf71e3c3a7aefbdc5e1c3ee95f4b013f28b27ef3f54c6 2023-09-07 20:35:06 8630 [Note] InnoDB: a44aac65a0646a544255978c7554b3d05652ff28ff3d12da3fdd5efff9cceaf3 2023-09-07 20:35:06 8630 [Note] InnoDB: f9bf6a9f7ff9853ffef2f667ff3bd7b2d4e43c9134ae70a69088490b1b6d0a2a 2023-09-07 20:35:06 8630 [Note] InnoDB: 
7852f8bd97ae751feb1e0c1cfc7c6e0ebe4e3fa483e3270d8074ce09a77dd245 2023-09-07 20:35:06 8630 [Note] InnoDB: 7b321c284b67524ec12525580ed8b742c631dfbc3f85d9daa0133952d6280ab5 2023-09-07 20:35:06 8630 [Note] InnoDB: b028926a9456a6500d8b15e6dbdd7707ccfffbd4f27e1f7fa0c1705e0ac79c95 2023-09-07 20:35:06 8630 [Note] InnoDB: cb1c44c93269ca8695676d4b60dec9fa10388efff6fff4b8bfb4fd39e0a72cbc 2023-09-07 20:35:06 8630 [Note] InnoDB: 2c26daacaab2c20a5d3c809baa6355a29415feadbaffc5dcf8df4a070774daa5 2023-09-07 20:35:06 8630 [Note] InnoDB: c3da80d40684a565112db9a26a55896ba254bda7ec8a14b6e4818191d0710eae 2023-09-07 20:35:06 8630 [Note] InnoDB: ffd31407316a8a3a3b67854825e69482cd5278958b032f3d077bddf7060e7e39 2023-09-07 20:35:06 8630 [Note] InnoDB: 3707f7d3c9a3ee0de2c7402254c39102f08aaa751445db2475cd35e8685cd241 2023-09-07 20:35:06 8630 [Note] InnoDB: ca5a060a7623c719f8d4c15417184e5c925146061b0c33eace07e3654ac58283 2023-09-07 20:35:06 8630 [Note] InnoDB: 9e81e7bb3707067e357f172c96c5bc4c3fb28c311738666b63b5a1c860220aa0 2023-09-07 20:35:06 8630 [Note] InnoDB: ea628c8f9b06d8c48ca3defbd054de0ba98c660bd12a7c93641f3102285b6f63 2023-09-07 20:35:06 8630 [Note] InnoDB: 492a8715ea9bdd5b03ea5fcf8e3a1d1086f6c93bb46ce0200b5b4c7b2f7d724e 2023-09-07 20:35:06 8630 [Note] InnoDB: 3b5652b28dca5a5b83e5b0067e316c1cfbcdff9cca78100163d44aad35a66921 2023-09-07 20:35:06 8630 [Note] InnoDB: 937c44fb57a525c7487685fd56f79d01fb6fe6c6fe95633a39e9904384371482 2023-09-07 20:35:06 8630 [Note] InnoDB: 6b4e967c102a455d6d515a4b9b4375687965fd00fe72dc38fadb7f3f95791fdb 2023-09-07 20:35:06 8630 [Note] InnoDB: 2a11ca05cc561d42162260b21699bc9414d3d9e54df77f73a37f351d3ea4e325 2023-09-07 20:35:06 8630 [Note] InnoDB: 8050b35656b1e06c6370b93a8995a46b72a8f694f3007c2b641cf38dbda98cd7 2023-09-07 20:35:06 8630 [Note] InnoDB: a48553598548c280670cba5c94773561b4b0952bcccf76af0f987ffb34aa7db9 2023-09-07 20:35:06 8630 [Note] InnoDB: da83ab46d59443abbb1c83748c1fa4429cbc4a7abbcc2726da07a6d092d1de60 2023-09-07 20:35:06 8630 [Note] InnoDB: 
60586da9266b52a02a5bad83578c939dbdb63f44ce57df8b9372f4f8f0b47bb3 2023-09-07 20:35:06 8630 [Note] InnoDB: 89b2e5682e3a20d55201371592229a9a527602722e665a36ddaac47743c739d8 2023-09-07 20:35:06 8630 [Note] InnoDB: fbc7290ed890434547634df499547186a3935240d529cb7dc637da66ff037373 2023-09-07 20:35:06 8630 [Note] InnoDB: d06fa7878b93533a6e304cb04a64af183c44e858283cfcc0d79065117993fa9d 2023-09-07 20:35:06 8630 [Note] InnoDB: c071fc2ffc5d8f7bb4cb33a5040e72c56c0f6d7114c8298af83236d5bb7ebedf 2023-09-07 20:35:06 8630 [Note] InnoDB: e886dcef7f706efc5f5df069f7e034adbad689e8bc434f1379a36cc88692a4a2 2023-09-07 20:35:06 8630 [Note] InnoDB: 30dcbca8d10de02f468d237ff1afa7328f0de25451b10a7210b34d356b2f8268 2023-09-07 20:35:06 8630 [Note] InnoDB: d97769691c2e56ff879e6ef52f2d092bacf0a86a4c55845ca3774a90909804d9 2023-09-07 20:35:06 8630 [Note] InnoDB: 5be9cd78f54fb899bbff35957dd02c3
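The original report describes the symptom as zeros in the latter part of a page where data was expected. As a rough aid for inspecting a copy of a data file, the Python sketch below flags pages that end in a long run of zero bytes. It is an assumption-laden illustration, not an InnoDB tool: the function names and the 4096-byte threshold are made up, the default page size is the 16k default mentioned in the report (the page dumped above is 8192 bytes, so `page_size` would need adjusting), and a zero tail is only a hint, not proof of this bug.

```python
PAGE_SIZE = 16384  # InnoDB default page size mentioned in the original report


def trailing_zero_bytes(page):
    """Count how many bytes at the end of the page are zero."""
    return len(page) - len(page.rstrip(b"\x00"))


def suspicious_pages(data, page_size=PAGE_SIZE, min_zero_tail=4096):
    """Yield (page_number, zero_tail_length) for pages ending in a long zero run."""
    for offset in range(0, len(data) - page_size + 1, page_size):
        page = data[offset:offset + page_size]
        tail = trailing_zero_bytes(page)
        # Pages that are zero from start to end are skipped: they may simply
        # be unallocated rather than partially read.
        if min_zero_tail <= tail < page_size:
            yield offset // page_size, tail
```

Run it on the bytes of a file copy, for example `list(suspicious_pages(open(path, "rb").read()))`, where `path` is whatever file is under suspicion.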
Bug#1032104: linux: ppc64el iouring corrupted read
Hello! This is not fixed. I sampled failing autopkgtests for MariaDB at https://ci.debian.net/packages/m/mariadb/testing/ppc64el/ between May 7th and 22nd. They still have crashes that include error message 'Database page corruption on disk'. Both failing and passing ones were running kernel: Linux 6.1.0-9-powerpc64le #1 SMP Debian 6.1.27-1 (2023-05-08) Most passing ones were on ci-worker-ppc64el-03, but also on -02. The failing ones were on workers -01 and -02. Since -02 had both failing and passing it indicates that this is not a hardware issue. The overall symptoms indicate that this is a software issue that started on Feb 6th (kernel: Linux 5.10.0-21-powerpc64le) and it happens sporadically, not on every run, and continues to happen. - Otto
Bug#1032104: linux: ppc64el iouring corrupted read
Hi Otto,

On Sun, Apr 09, 2023 at 03:30:35PM -0700, Otto Kekäläinen wrote:
> > > > Paul Gevers asked if the issues are gone as well with 6.1.12-1
> > > > (or later 6.1.y series versions, which will land in bookworm). That
> > > > would be valuable information to know as well to exclude we do not
> > > > have the issue as well in bookworm.
> > >
> > > Were you able to verify this?
>
> Yes and new kernel did not fix it.
>
> I reviewed now all ppc64el autopkgtest runs of src:mariadb at
> https://ci.debian.net/packages/m/mariadb/testing/ppc64el/
>
> This is still happening on latest kernel and latest src:mariadb in
> bookworm. The failing test varies, but they all have in common that
> they error on 'Database page corruption on disk'.
>
> autopkgtest [20:11:55]: starting date and time: 2023-04-08 20:11:55+
> autopkgtest [20:12:17]: testbed running kernel: Linux
> 6.1.0-7-powerpc64le #1 SMP Debian 6.1.20-1 (2023-03-19)
> autopkgtest [20:12:39]: testing package mariadb version 1:10.11.2-1
> Completed: Failed 6/1021 tests, 99.41% were successful.
> Failing test(s): main.innodb_ext_key main.statistics_upgrade_not_done
>
> Attached summary of downloading all recent logs and running:
> $ zgrep -e 'starting date' -e 'running kernel' -e 'testing package
> mariadb version' -e 'Completed: ' -e 'Failing test(s)' *.gz | tee
> mariadb-autopkgtest-ppc64el-summary.txt

Are those issues still present with recent kernels? There have again been enough io_uring changes to make it worth re-checking against those.

Regards,
Salvatore
Bug#1032104: linux: ppc64el iouring corrupted read
> > > Paul Gevers asked if the issues are gone as well with 6.1.12-1 > > > (or later 6.1.y series versions, which will land in bookworm). That > > > would be valuable information to know as well to exclude we do not > > > have the issue as well in bookworm. > > > > Were you able to verify this? Yes and new kernel did not fix it. I reviewed now all ppc64el autopkgtest runs of src:mariadb at https://ci.debian.net/packages/m/mariadb/testing/ppc64el/ This is still happening on latest kernel and latest src:mariadb in bookworm. The failing test varies, but they all have in common that they error on 'Database page corruption on disk'. autopkgtest [20:11:55]: starting date and time: 2023-04-08 20:11:55+ autopkgtest [20:12:17]: testbed running kernel: Linux 6.1.0-7-powerpc64le #1 SMP Debian 6.1.20-1 (2023-03-19) autopkgtest [20:12:39]: testing package mariadb version 1:10.11.2-1 Completed: Failed 6/1021 tests, 99.41% were successful. Failing test(s): main.innodb_ext_key main.statistics_upgrade_not_done Attached summary of downloading all recent logs and running: $ zgrep -e 'starting date' -e 'running kernel' -e 'testing package mariadb version' -e 'Completed: ' -e 'Failing test(s)' *.gz | tee mariadb-autopkgtest-ppc64el-summary.txt 30542346.log.gz:autopkgtest [16:38:18]: starting date and time: 2023-01-20 16:38:18+ 30542346.log.gz:autopkgtest [16:39:14]: testbed running kernel: Linux 5.10.0-20-powerpc64le #1 SMP Debian 5.10.158-2 (2022-12-13) 30542346.log.gz:autopkgtest [16:39:30]: testing package mariadb version 1:10.11.1-1 30542346.log.gz:Completed: All 1016 tests were successful. 31013059.log.gz:autopkgtest [23:16:23]: starting date and time: 2023-02-03 23:16:23+ 31013059.log.gz:autopkgtest [23:16:53]: testbed running kernel: Linux 5.10.0-20-powerpc64le #1 SMP Debian 5.10.158-2 (2022-12-13) 31013059.log.gz:autopkgtest [23:17:06]: testing package mariadb version 1:10.11.1-2 31013059.log.gz:Completed: All 1016 tests were successful. 
31114152.log.gz:autopkgtest [10:00:31]: starting date and time: 2023-02-06 10:00:31+ 31114152.log.gz:autopkgtest [10:00:57]: testbed running kernel: Linux 5.10.0-21-powerpc64le #1 SMP Debian 5.10.162-1 (2023-01-21) 31114152.log.gz:autopkgtest [10:01:09]: testing package mariadb version 1:10.11.1-3 31114152.log.gz:Completed: Failed 2/1016 tests, 99.80% were successful. 31114152.log.gz:Failing test(s): main.xa_prepared_binlog_off main.update_use_source 31138628.log.gz:autopkgtest [06:52:36]: starting date and time: 2023-02-07 06:52:36+ 31138628.log.gz:autopkgtest [06:53:04]: testbed running kernel: Linux 5.10.0-21-powerpc64le #1 SMP Debian 5.10.162-1 (2023-01-21) 31138628.log.gz:autopkgtest [06:53:17]: testing package mariadb version 1:10.11.1-3 31138628.log.gz:Completed: All 1016 tests were successful. 31204767.log.gz:autopkgtest [12:32:51]: starting date and time: 2023-02-10 12:32:51+ 31204767.log.gz:autopkgtest [12:33:23]: testbed running kernel: Linux 5.10.0-21-powerpc64le #1 SMP Debian 5.10.162-1 (2023-01-21) 31204767.log.gz:autopkgtest [12:33:46]: testing package mariadb version 1:10.11.1-4 31204767.log.gz:Completed: Failed 2/1016 tests, 99.80% were successful. 31204767.log.gz:Failing test(s): main.innodb_ext_key main.order_by_innodb 31253808.log.gz:autopkgtest [19:05:34]: starting date and time: 2023-02-11 19:05:34+ 31253808.log.gz:autopkgtest [19:06:15]: testbed running kernel: Linux 5.10.0-21-powerpc64le #1 SMP Debian 5.10.162-1 (2023-01-21) 31253808.log.gz:autopkgtest [19:06:25]: testing package mariadb version 1:10.11.1-4 31253808.log.gz:Completed: All 1016 tests were successful. 31452860.log.gz:autopkgtest [09:50:34]: starting date and time: 2023-02-17 09:50:34+ 31452860.log.gz:autopkgtest [09:51:00]: testbed running kernel: Linux 5.10.0-21-powerpc64le #1 SMP Debian 5.10.162-1 (2023-01-21) 31452860.log.gz:autopkgtest [09:51:21]: testing package mariadb version 1:10.11.1-5 31452860.log.gz:Completed: Failed 6/1020 tests, 99.41% were successful. 
31452860.log.gz:Failing test(s): main.ctype_utf8mb4_innodb main.index_merge_innodb 31480673.log.gz:autopkgtest [01:00:30]: starting date and time: 2023-02-18 01:00:30+ 31480673.log.gz:autopkgtest [01:01:00]: testbed running kernel: Linux 5.10.0-21-powerpc64le #1 SMP Debian 5.10.162-1 (2023-01-21) 31480673.log.gz:autopkgtest [01:01:17]: testing package mariadb version 1:10.11.2-1 31480673.log.gz:Completed: Failed 6/1021 tests, 99.41% were successful. 31480673.log.gz:Failing test(s): main.xa_prepared_binlog_off main.range_mrr_icp 31509348.log.gz:autopkgtest [05:09:32]: starting date and time: 2023-02-19 05:09:32+ 31509348.log.gz:autopkgtest [05:10:50]: testbed running kernel: Linux 5.10.0-21-powerpc64le #1 SMP Debian 5.10.162-1 (2023-01-21) 31509348.log.gz:autopkgtest [05:11:06]: testing package mariadb version 1:10.11.2-1 31509348.log.gz:Completed: Failed 3/1019 tests, 99.71% were successful. 31509348.log.gz:Failing test(s): main.ctype_utf8mb4_innodb 323410
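The summary above is easier to reason about once it is grouped by kernel. The following Python sketch parses `zgrep` output of the shape shown in this message (one `file:line` entry per line) and counts failing runs per kernel. The function names are made up for illustration, and the regular expressions assume exactly the autopkgtest wording quoted above.

```python
import re
from collections import defaultdict

KERNEL_RE = re.compile(r"testbed running kernel: Linux (\S+)")
FAILED_RE = re.compile(r"Completed: Failed (\d+)/(\d+) tests")
PASSED_RE = re.compile(r"Completed: All (\d+) tests were successful")


def summarize(lines):
    """Map each log file name to its kernel and its failed/total test counts."""
    runs = defaultdict(dict)
    for line in lines:
        # zgrep prefixes every match with "<file name>:".
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        kernel = KERNEL_RE.search(rest)
        failed = FAILED_RE.search(rest)
        passed = PASSED_RE.search(rest)
        if kernel:
            runs[name]["kernel"] = kernel.group(1)
        elif failed:
            runs[name]["failed"] = int(failed.group(1))
            runs[name]["total"] = int(failed.group(2))
        elif passed:
            runs[name]["failed"] = 0
            runs[name]["total"] = int(passed.group(1))
    return dict(runs)


def failing_runs_by_kernel(runs):
    """Return {kernel: (runs_with_failures, total_runs)}."""
    tally = {}
    for run in runs.values():
        if "kernel" not in run or "failed" not in run:
            continue  # incomplete log entry, e.g. a truncated summary
        failing, total = tally.get(run["kernel"], (0, 0))
        tally[run["kernel"]] = (failing + (1 if run["failed"] else 0), total + 1)
    return tally
```

Feeding it the lines of `mariadb-autopkgtest-ppc64el-summary.txt` would show at a glance which kernel ABI versions have failing runs; it does not distinguish page-corruption failures from other test failures.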
Bug#1032104: linux: ppc64el iouring corrupted read
Hi Otto,

On 09-04-2023 03:54, Otto Kekäläinen wrote:
> > > Paul Gevers asked if the issues are gone as well with 6.1.12-1
> > > (or later 6.1.y series versions, which will land in bookworm). That
> > > would be valuable information to know as well to exclude we do not
> > > have the issue as well in bookworm.
> >
> > Were you able to verify this?
>
> No, not yet. I have not done new uploads to experimental after the one
> I mentioned and linked above from March 18th.

I don't understand this point, so I wonder if you understood my question. Maybe you did, but in my view no new uploads are needed to answer the bookworm question.

> The builds for unstable are passing because I forced the tests to run
> with regular fsync instead of native I/O in
> https://salsa.debian.org/mariadb-team/mariadb-server/-/commit/fc1358087b39ac6520420c7bbae2e536bc86748d.
> I will test this again later but right now I don't want to do any
> extra uploads as the package is pending unblock and inclusion in
> Bullseye (Bug#1033811) and I don't want one single minor issue to
> jeopardize getting fixes for multiple major issues forward.

My point was that I upgraded the ppc64el hosts where ci.debian.net runs the autopkgtests (so *not* the Debian build infrastructure). Since that upgrade, all tests on ci.debian.net *in every suite* have been using the bookworm (6.1.y) kernel. E.g. in unstable MariaDB 1:10.11.2-1 (so before the "Prevent mariadb-test-run from using native I/O on ppc64el and s390x due to Linux kernel bug" change) passed on 2023-03-26 10:39 but failed on the same day at 14:40.

Are any of the failures on ppc64el before 1:10.11.2-2 and after 2023-03-09 from the same kernel issue we're discussing here (and thus the kernel still needs fixing in bookworm)? Or are all the failures in that time-span from something else, and thus can we conclude that the kernel *probably* (no proof of course) got fixed between the version of the kernel in bullseye and the version in bookworm?

Paul
Bug#1032104: linux: ppc64el iouring corrupted read
> > On Sat, Mar 18, 2023 at 11:19:29PM -0700, Otto Kekäläinen wrote:
> > > Any updates on this one?
> > >
> > > I am still seeing the main.index_merge_innodb failure in
> > > https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=ppc64el&ver=1%3A10.11.2-2%7Eexp1&stamp=1678728871&raw=0
> > > and rebuild
> > > https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=ppc64el&ver=1%3A10.11.2-2%7Eexp1&stamp=1679174850&raw=0.
> > >
> > > Logs show: Kernel: Linux 5.10.0-21-powerpc64le #1 SMP Debian
> > > 5.10.162-1 (2023-01-21) ppc64el (ppc64le)
> >
> > Remember that with the 5.10.162 upstream version the io_uring code was
> > rebased to the 5.15-stable one. So it is likely, and it matches the
> > version ranges, that the regression was introduced with these
> > particular changes. Ideally someone with access to the given
> > architecture can verify that the issue is gone with the current
> > 5.10.175 upstream (where there were several followup fixes, in
> > particular e.g. a similar one for s390x), and if not, report the
> > problem to upstream.
> >
> > Paul Gevers asked if the issues are gone as well with 6.1.12-1
> > (or later 6.1.y series versions, which will land in bookworm). That
> > would be valuable information to know as well to exclude we do not
> > have the issue as well in bookworm.
>
> Were you able to verify this?

No, not yet. I have not done new uploads to experimental after the one I mentioned and linked above from March 18th.

The builds for unstable are passing because I forced the tests to run with regular fsync instead of native I/O in https://salsa.debian.org/mariadb-team/mariadb-server/-/commit/fc1358087b39ac6520420c7bbae2e536bc86748d. I will test this again later, but right now I don't want to do any extra uploads as the package is pending unblock and inclusion in Bullseye (Bug#1033811) and I don't want one single minor issue to jeopardize getting fixes for multiple major issues forward.
Processed: Re: Bug#1032104: linux: ppc64el iouring corrupted read
Processing control commands: > tags -1 + moreinfo Bug #1032104 [src:linux] linux: ppc64el iouring corrupted read Added tag(s) moreinfo. -- 1032104: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1032104 Debian Bug Tracking System Contact ow...@bugs.debian.org with problems
Bug#1032104: linux: ppc64el iouring corrupted read
Control: tags -1 + moreinfo

Hi

On Sun, Mar 19, 2023 at 05:02:19PM +0100, Salvatore Bonaccorso wrote:
> Hi,
>
> On Sat, Mar 18, 2023 at 11:19:29PM -0700, Otto Kekäläinen wrote:
> > Any updates on this one?
> >
> > I am still seeing the main.index_merge_innodb failure in
> > https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=ppc64el&ver=1%3A10.11.2-2%7Eexp1&stamp=1678728871&raw=0
> > and rebuild
> > https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=ppc64el&ver=1%3A10.11.2-2%7Eexp1&stamp=1679174850&raw=0.
> >
> > Logs show: Kernel: Linux 5.10.0-21-powerpc64le #1 SMP Debian
> > 5.10.162-1 (2023-01-21) ppc64el (ppc64le)
>
> Remember that with the 5.10.162 upstream version the io_uring code was
> rebased to the 5.15-stable one. So it is likely, and it matches the
> version ranges, that the regression was introduced with these
> particular changes. Ideally someone with access to the given
> architecture can verify that the issue is gone with the current
> 5.10.175 upstream (where there were several followup fixes, in
> particular e.g. a similar one for s390x), and if not, reports the
> problem to upstream.
>
> Paul Gevers asked if the issues are gone as well with 6.1.12-1
> (or later 6.1.y series versions, which will land in bookworm). That
> would be valuable information to know as well to exclude we do not
> have the issue as well in bookworm.

Were you able to verify this?

Regards,
Salvatore
Bug#1032104: linux: ppc64el iouring corrupted read
Hi,

On Sat, Mar 18, 2023 at 11:19:29PM -0700, Otto Kekäläinen wrote:
> Any updates on this one?
>
> I am still seeing the main.index_merge_innodb failure in
> https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=ppc64el&ver=1%3A10.11.2-2%7Eexp1&stamp=1678728871&raw=0
> and rebuild
> https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=ppc64el&ver=1%3A10.11.2-2%7Eexp1&stamp=1679174850&raw=0.
>
> Logs show: Kernel: Linux 5.10.0-21-powerpc64le #1 SMP Debian
> 5.10.162-1 (2023-01-21) ppc64el (ppc64le)

Remember that with the 5.10.162 upstream version the io_uring code was rebased to the 5.15-stable one. So it is likely, and it matches the version ranges, that the regression was introduced with these particular changes. Ideally someone with access to the given architecture can verify that the issue is gone with the current 5.10.175 upstream (where there were several followup fixes, in particular e.g. a similar one for s390x), and if not, reports the problem to upstream.

Paul Gevers asked if the issues are gone as well with 6.1.12-1 (or later 6.1.y series versions, which will land in bookworm). That would be valuable information to know as well to exclude we do not have the issue as well in bookworm.

Regards,
Salvatore
Bug#1032104: linux: ppc64el iouring corrupted read
Any updates on this one? I am still seeing the main.index_merge_innodb failure in https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=ppc64el&ver=1%3A10.11.2-2%7Eexp1&stamp=1678728871&raw=0 and rebuild https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=ppc64el&ver=1%3A10.11.2-2%7Eexp1&stamp=1679174850&raw=0. Logs show: Kernel: Linux 5.10.0-21-powerpc64le #1 SMP Debian 5.10.162-1 (2023-01-21) ppc64el (ppc64le)
Bug#1032104: linux: ppc64el iouring corrupted read
On Mon, 6 Mar 2023 13:25:36 +1100 Daniel Black wrote:
> Since reverting to linux-image-5.10.0-20 we've been free of the same errors.

On ci.debian.net I upgraded all ppc64el hosts to bookworm on 2023-03-09.

debian@ci-worker-ppc64el-04:~$ uname -a
Linux ci-worker-ppc64el-04 6.1.0-5-powerpc64le #1 SMP Debian 6.1.12-1 (2023-02-15) ppc64le GNU/Linux

Can you check if the errors are still the same? (Yes, there are still intermittent failures.)

Paul
Bug#1032104: linux: ppc64el iouring corrupted read
Since reverting to linux-image-5.10.0-20 we've been free of the same errors.
Bug#1032104: linux: ppc64el iouring corrupted read
On Tue, Feb 28, 2023 at 5:24 PM Diederik de Haas wrote:
>
> On Tuesday, 28 February 2023 04:13:18 CET Daniel Black wrote:
> > Source: linux
> > Version: 5.10.0-21-powerpc64le
> > Severity: grave
> > Justification: causes non-serious data loss
> > X-Debbugs-Cc: dan...@mariadb.org
> >
> > From https://jira.mariadb.org/browse/MDEV-30728
> >
> > MariaDB's mtr tests on a number of specific tests depend on the correct
> > kernel operation.
> >
> > As observed in these tests, there is a ~1/5 chance the
> > encryption.innodb_encryption test will read zeros on the later part of
> > the 16k pages that InnoDB uses by default.
> >
> > This affects MariaDB-10.6+ packages where there is a liburing in the
> > distribution.
> >
> > I tested on tmpfs. This is a different fault from bug #1020831 as:
> > * there is no iouring error, just a bunch of zeros where data was
> > expected.
> > * this is ppc64le only.
>
> What was the last kernel where this problem did NOT occur?

2022-12-19 03:55:34 install linux-image-5.10.0-20-powerpc64le:ppc64el 5.10.158-2

no similar errors between ^ and ..

2023-01-24 03:19:59 install linux-image-5.10.0-21-powerpc64le:ppc64el 5.10.162-1

(no other linux image installs in between these two)

First failure found ~ Feb 4 2023. Unsure when kernel rebooted to this kernel, but it does appear to be the last revision.

https://buildbot.mariadb.org/#/builders/318/builds/10008

log example https://ci.mariadb.org/32263/logs/ppc64le-debian-11/mysqld.1.err.7 (search for CURRENT_TEST: encryption.innodb_encryption) - contains hex dump of page

> It's probably needed to pinpoint the (upstream) commit that caused this error/
> issue and the best start is normally finding the closest range with Debian
> kernel releases where it did not and did occur.
>
> > -- System Information:
> > Debian Release: bullseye
> > APT prefers jammy-updates
> > APT policy: (500, 'jammy-updates'), (500, 'jammy-security'), (500,
> > 'jammy'), (100, 'jammy-backports')
> > Architecture: ppc64el (ppc64le)
> >
> > Kernel: Linux 5.10.0-21-powerpc64le (SMP w/128 CPU threads)
> > Init: unable to detect
>
> Why is there no 'bullseye' in APT policy's output?
> Mixing distributions (aka FrankenDebian) isn't recommended, but seeing no
> bullseye in there is odd, especially since the kernel version very much does
> look like Debian.

Apologies for the FrankenDebian look. This was a jammy container and jammy report bug with bullseye edited (badly) in the system info.
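
The install timestamps quoted above have the shape of dpkg log entries. A minimal sketch for pulling the same information on another host follows. It assumes the log lives at /var/log/dpkg.log (the usual Debian location, not stated in the thread) and that the last field of each "install" entry is the new version. The function name list_kernel_installs is hypothetical.

```shell
#!/bin/sh
# Sketch: list kernel image installs recorded by dpkg, in log order.
# Assumption: /var/log/dpkg.log is the default log path on the host.
# list_kernel_installs is a hypothetical helper name.
list_kernel_installs() {
    log="${1:-/var/log/dpkg.log}"
    # Fields: date, time, action, package:arch, ..., new version (last field).
    awk '$3 == "install" && $4 ~ /^linux-image-/ { print $1, $2, $4, $NF }' "$log"
}
```

Calling `list_kernel_installs` with no argument reads the default log; passing a path allows checking a rotated or copied log instead. It only shows when a kernel package was installed, not when the machine was rebooted into it.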
Bug#1032104: linux: ppc64el iouring corrupted read
On Tuesday, 28 February 2023 04:13:18 CET Daniel Black wrote:
> Source: linux
> Version: 5.10.0-21-powerpc64le
> Severity: grave
> Justification: causes non-serious data loss
> X-Debbugs-Cc: dan...@mariadb.org
>
> From https://jira.mariadb.org/browse/MDEV-30728
>
> MariaDB's mtr tests on a number of specific tests depend on the correct
> kernel operation.
>
> As observed in these tests, there is a ~1/5 chance the
> encryption.innodb_encryption test will read zeros on the later part of
> the 16k pages that InnoDB uses by default.
>
> This affects MariaDB-10.6+ packages where there is a liburing in the
> distribution.
>
> I tested on tmpfs. This is a different fault from bug #1020831 as:
> * there is no iouring error, just a bunch of zeros where data was
> expected.
> * this is ppc64le only.

What was the last kernel where this problem did NOT occur?

It's probably needed to pinpoint the (upstream) commit that caused this error/issue and the best start is normally finding the closest range with Debian kernel releases where it did not and did occur.

> -- System Information:
> Debian Release: bullseye
> APT prefers jammy-updates
> APT policy: (500, 'jammy-updates'), (500, 'jammy-security'), (500,
> 'jammy'), (100, 'jammy-backports')
> Architecture: ppc64el (ppc64le)
>
> Kernel: Linux 5.10.0-21-powerpc64le (SMP w/128 CPU threads)
> Init: unable to detect

Why is there no 'bullseye' in APT policy's output?
Mixing distributions (aka FrankenDebian) isn't recommended, but seeing no bullseye in there is odd, especially since the kernel version very much does look like Debian.
Bug#1032104: linux: ppc64el iouring corrupted read
Source: linux
Version: 5.10.0-21-powerpc64le
Severity: grave
Justification: causes non-serious data loss
X-Debbugs-Cc: dan...@mariadb.org

Dear Maintainer,

*** Reporter, please consider answering these questions, where appropriate ***

* What led up to the situation?
* What exactly did you do (or not do) that was effective (or ineffective)?
* What was the outcome of this action?
* What outcome did you expect instead?

*** End of the template - remove these template lines ***

From https://jira.mariadb.org/browse/MDEV-30728

MariaDB's mtr tests on a number of specific tests depend on the correct kernel operation.

As observed in these tests, there is a ~1/5 chance the encryption.innodb_encryption test will read zeros on the later part of the 16k pages that InnoDB uses by default.

This affects MariaDB-10.6+ packages where there is a liburing in the distribution.

This has been observed in the CI of Debian (https://ci.debian.net/packages/m/mariadb/testing/ppc64el/) and upstream's https://buildbot.mariadb.org/#/builders/318. The one ppc64le worker that has the Debian 5.10.0-21 kernel, the same as the Debian CI, has the prefix ppc64le-db-bbw1-*.

Test faults occur on all MariaDB 10.6+ builds in containers on this kernel. There are no faults on non-ppc64le or RHEL7/8 based ppc64le kernels.

To reproduce:

    apt-get install mariadb-test
    cd /usr/share/mysql/mysql-test
    ./mtr --mysqld=--innodb-flush-method=fsync --mysqld=--innodb-use-native-aio=1 --vardir=/var/lib/mysql --force encryption.innodb_encryption,innodb,undo0 --repeat=12

A test will frequently fail.

    2023-02-28 1:41:01 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=282]. You may have to recover from a backup.

(the page number isn't predictable)

The complete mtr error log of mariadb server is $PWD/var/log/mysqld.1.err

I tested on tmpfs. This is a different fault from bug #1020831 as:
* there is no iouring error, just a bunch of zeros where data was expected.
* this is ppc64le only.

Note, more serious faults exist on overlayfs (MDEV-28751) and remote filesystems so sticking to local xfs, ext4, btrfs is recommended.

-- System Information:
Debian Release: bullseye
APT prefers jammy-updates
APT policy: (500, 'jammy-updates'), (500, 'jammy-security'), (500, 'jammy'), (100, 'jammy-backports')
Architecture: ppc64el (ppc64le)

Kernel: Linux 5.10.0-21-powerpc64le (SMP w/128 CPU threads)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect
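
Because the failure is intermittent (roughly one run in five, per the report), it can help to count how often the corruption message shows up in the server error log after a repeated mtr run. The following is only a sketch. It assumes the log path given in the report ($PWD/var/log/mysqld.1.err) and matches the exact error text quoted there. The function name count_corruption_reports is hypothetical.

```shell
#!/bin/sh
# Sketch: count InnoDB page-corruption reports in an mtr server error log.
# Assumption: the default path is the one named in the report; pass another
# path as the first argument to inspect a different log.
# count_corruption_reports is a hypothetical helper name.
count_corruption_reports() {
    log="${1:-$PWD/var/log/mysqld.1.err}"
    # grep -c prints the number of matching lines and exits non-zero when
    # that number is 0, so "|| true" keeps the helper usable under "set -e".
    grep -cF 'InnoDB: Database page corruption on disk or a failed read of file' "$log" || true
}
```

A count of 0 after a --repeat=12 run does not prove the kernel is fixed, since the fault is probabilistic; it is only a quick way to compare runs on different kernel versions.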