On 3/26/23 10:22 PM, Nicholas Piggin wrote:
> On Sat Mar 25, 2023 at 11:20 AM AEST, Jens Axboe wrote:
>> On 3/24/23 7:15 PM, Jens Axboe wrote:
>>>> Are there any CONFIG options I'd need to trip this?
>>>
>>> I don't think you need any special CONFIG options. I'll attach my config
>>> here, and I know the default distro one hits it too. But perhaps the
>>> mariadb version is not new enough? I think you need 10.6 or above, as
>>> that will use io_uring by default. What version are you running?
>>
>> And here's the .config and the patch for using queue_work().
>
> So if you *don't* apply this patch, the work gets queued up with an IO
> thread? In io-wq.c? Does that worker end up just doing an io_write()
> same as this one?
Right, without this patch, it gets added to the io-wq work pool. If a
worker thread is available to run it, it will; if one is not, then one
is created. Either can happen. That worker then does the exact same
io_write() again.

> Can the queueing cause the creation of an IO thread (if one does not
> exist, or all blocked?)

Yep.

Since writing this email, I've gone through a lot of different tests.
Here's a rough listing of what I found:

- Like with the hack patch, if I just limit the number of io-wq workers
  to 1, it seems to pass. At least longer than before; it completes 1000
  iterations.

- If I pin each IO worker to a single CPU, it also passes.

- If I liberally sprinkle smp_mb() on the io-wq side, the test still
  fails. I added one before queueing the work item and one after, plus
  one before the io-wq worker grabs a work item and one after. Full
  hammer approach, in other words. This still fails. Puzzling...

For the "pin each IO worker to a single CPU" case, I added some basic
code to try to ensure that a work item queued on CPU X would be
processed by a worker on CPU X, and to a large degree this does happen.
But since the work list is a normal list, it's quite possible that some
other worker finishes its work on CPU Y just in time to grab the one
from CPU X. I checked, and this does happen in the test case, yet it
still passes. This may be because I got a bit lucky, but that seems
suspect given thousands of passes of the test case.

Another theory is that it's perhaps related to an io-wq worker being
rescheduled on a different CPU. Though again, I'm puzzled as to why the
smp_mb() sprinkling didn't fix that, then. I'm going to try running the
test case with JUST the io-wq worker pinning, not caring about where the
work is processed, to see if that changes anything.

> I'm wondering what the practical differences are between this patch and
> upstream.
>
> kthread_use_mm() should be basically the same as context switching to an
> IO thread.
> There is maybe a difference in that kthread_switch_mm() has
> a 'sync' instruction *after* the MMU is switched to the new thread from
> the membarrier code, but a regular context switch might not. The MMU
> switch does have an isync() after it though, so loads *should* be
> prohibited from moving ahead of that.
>
> Something like this adds a sync roughly where kthread_use_mm() has one.
> It's a pretty unlikely shot in the dark though. I'm more inclined to
> think the work submission to the IO thread might have a problem.

Didn't seem to change anything; it fails pretty quickly:

[...]
encryption.innodb_encryption 'innodb,undo0' [ 38 pass ]  3083
encryption.innodb_encryption 'innodb,undo0' [ 39 pass ]  3135
encryption.innodb_encryption 'innodb,undo0' [ 40 fail ]
        Test ended at 2023-03-27 12:20:46

CURRENT_TEST: encryption.innodb_encryption
mysqltest: At line 11: query 'SET @start_global_value = @@global.innodb_encryption_threads' failed: ER_UNKNOWN_SYSTEM_VARIABLE (1193): Unknown system variable 'innodb_encryption_threads'

The result from queries just before the failure was:
SET @start_global_value = @@global.innodb_encryption_threads;

 - saving '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/' to '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/'

***Warnings generated in error logs during shutdown after running tests: encryption.innodb_encryption

2023-03-27 12:20:45 0 [Warning] Plugin 'example_key_management' is of maturity level experimental while the server is gamma
2023-03-27 12:20:45 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=214]. You may have to recover from a backup.
2023-03-27 12:20:45 0 [ERROR] InnoDB: File './ibdata1' is corrupted
2023-03-27 12:20:45 0 [ERROR] InnoDB: Plugin initialization aborted with error Page read from tablespace is corrupted.
2023-03-27 12:20:45 0 [ERROR] Plugin 'InnoDB' init function returned error.
2023-03-27 12:20:45 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

-- 
Jens Axboe