Hi,

On 23/03/2023 at 19:54, Jens Axboe wrote:
> Hi,
>
> I got a report sent to me from mariadb, in which 5.10.158 works fine and
> 5.10.162 is broken. And in fact, current 6.3-rc also fails the test
> case. Beware that this email is long, as I'm trying to include
> everything that may be relevant...

Which variant of powerpc? 32 or 64 bits? Book3S or BookE?

Christophe

>
> The test case in question is pretty simple. On debian testing, do:
>
> $ sudo apt-get install mariadb-test
> $ cd /usr/share/mysql/mysql-test
> $ ./mtr --mysqld=--innodb-flush-method=fsync \
>       --mysqld=--innodb-use-native-aio=1 --vardir=/dev/shm/mysql --force \
>       encryption.innodb_encryption,innodb,undo0 --repeat=200
>
> and if it fails, you'll see something like:
>
> encryption.innodb_encryption 'innodb,undo0' [ 6 pass ] 3120
> encryption.innodb_encryption 'innodb,undo0' [ 7 pass ] 3123
> encryption.innodb_encryption 'innodb,undo0' [ 8 pass ] 3042
> encryption.innodb_encryption 'innodb,undo0' [ 9 fail ]
> Test ended at 2023-03-23 16:55:17
>
> CURRENT_TEST: encryption.innodb_encryption
> mysqltest: At line 11: query 'SET @start_global_value =
> @@global.innodb_encryption_threads' failed: ER_UNKNOWN_SYSTEM_VARIABLE
> (1193): Unknown system variable 'innodb_encryption_threads'
>
> The result from queries just before the failure was:
> SET @start_global_value = @@global.innodb_encryption_threads;
>
> - saving '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/' to
> '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/'
> ***Warnings generated in error logs during shutdown after running tests:
> encryption.innodb_encryption
>
> 2023-03-23 16:55:17 0 [Warning] Plugin 'example_key_management' is of
> maturity level experimental while the server is stable
> 2023-03-23 16:55:17 0 [ERROR] InnoDB: Database page corruption on disk or a
> failed read of file './ibdata1' page [page id: space=0, page number=221]. You
> may have to recover from a backup.
>
> where data read was not as expected.
>
> Now, there are a number of io_uring changes between .158 and .162, as it
> includes the backport that brought 5.10-stable into line with what
> 5.15-stable includes. I'll spare you all the digging I did to vet those
> changes, but the key thing is that it STILL happens on 6.3-git on
> powerpc.
>
> After ruling out many things, one key difference between 158 and 162 is
> that the former offloaded requests that could not be done nonblocking to
> a kthread, and 162 and newer offloads to an IO thread. An IO thread is
> just a normal thread created from the application submitting IO, the
> only difference is that it never exits to userspace. An IO thread has
> the same mm/files/you-name-it from the original task. It really is the
> same as a userspace thread created by the application. The switch to IO
> threads was done exactly because of that, rather than rely on a fragile
> scheme of having the kthread worker assume all sorts of identity from
> the original task, with surprises if things were missed. This is what
> caused most of the io_uring security issues in the past.
>
> The IO that mariadb does in this test is pretty simple - a bunch of
> largish buffered writes with IORING_OP_WRITEV, and some smallish (16K)
> buffered reads with IORING_OP_READV.
>
> Today I finally gave up and ran a basic experiment, which simply
> offloads the writes to a kthread. Since powerpc has an interesting
> memory coherency model, my suspicion was that the work involved with
> switching MMs for the kthread could just be the main difference here.
> The patch is really dumb and simple - rather than queue the write to an
> IO thread, it just offloads it to a kthread that then does
> kthread_use_mm(), performs the write with the same write handler, then
> kthread_unuse_mm(). AND THIS WORKS! Usually the above mtr test would
> fail in 2..20 loops, I've now done 200 and 500 loops and it's fine.
>
> Which then leads me to the question, what about the IO thread offload
> makes this fail on powerpc (and no other arch I've tested on, including
> x86/x86-64/aarch64/hppa64)? The offload should be equivalent to having a
> thread in userspace in the application, and having that thread just
> perform the writes.
> Is there some magic involved with the kthread mm
> use/unuse that makes this sufficiently consistent on powerpc? I've tried
> any mix of isync()/mb and making the flush_dcache_page() unconditionally
> done in the filemap read/write helpers, and it still falls flat on its
> face with the offload to an IO thread.
>
> I must clearly be missing something here, which is why I'm emailing the
> powerpc Gods for help :-)
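
In case it helps with the variant question, here is a minimal sketch of how that information could be gathered on the test machine. It is only an illustration: the /boot/config-$(uname -r) path is an assumption about a Debian-style install, and the CONFIG_ symbols listed are the ones I would expect to tell Book3S from BookE. The cpu line in /proc/cpuinfo only names the core, so the family still has to be read off from it by hand.

```shell
#!/bin/sh
# Sketch: collect hints about which powerpc variant a machine is running.
# The paths below are assumptions about a typical Debian-style install.

# Kernel machine type, typically ppc (32-bit), ppc64 or ppc64le on powerpc.
arch=$(uname -m)
echo "machine: $arch"

# Word size of userspace, which can differ from the kernel's on ppc64.
echo "userspace bits: $(getconf LONG_BIT)"

# First line starting with "cpu" in /proc/cpuinfo; on powerpc it names the core.
grep -m1 -i '^cpu' /proc/cpuinfo 2>/dev/null \
    || echo "no cpu line found in /proc/cpuinfo"

# Book3S vs BookE is visible in the kernel config, if the distro ships it.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    grep -E '^CONFIG_(PPC_BOOK3S_32|PPC_BOOK3S_64|PPC_BOOK3E_64|BOOKE)=' "$cfg" \
        || echo "no Book3S/BookE symbols set in $cfg"
else
    echo "kernel config not readable at $cfg"
fi
```

The output of this, pasted into a reply, should be enough to say whether the failing machine is 32 or 64 bits and which family it belongs to.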