Re: Memory coherency issue with IO thread offloading?
On 3/24/23 6:42?PM, Michael Ellerman wrote: > Jens Axboe writes: >> Hi, > > Hi Jens, > > Thanks for the report. > >> I got a report sent to me from mariadb, in where 5.10.158 works fine and >> 5.10.162 is broken. And in fact, current 6.3-rc also fails the test >> case. Beware that this email is long, as I'm trying to include >> everything that may be relevant... >> >> The test case in question is pretty simple. On debian testing, do: >> >> $ sudo apt-get install mariadb-test >> $ cd /usr/share/mysql/mysql-test >> $ ./mtr --mysqld=--innodb-flush-method=fsync >> --mysqld=--innodb-use-native-aio=1 --vardir=/dev/shm/mysql --force >> encryption.innodb_encryption,innodb,undo0 --repeat=200 > > I mostly use Fedora, the package name is the same but the mtr binary > ends up in /usr/share/mysql. > >> and if it fails, you'll see something like: >> >> encryption.innodb_encryption 'innodb,undo0' [ 6 pass ] 3120 >> encryption.innodb_encryption 'innodb,undo0' [ 7 pass ] 3123 >> encryption.innodb_encryption 'innodb,undo0' [ 8 pass ] 3042 >> encryption.innodb_encryption 'innodb,undo0' [ 9 fail ] >> Test ended at 2023-03-23 16:55:17 > > I haven't been able to get this to fail yet. I've done several runs with > --repeat=500 and haven't seen any errors yet. > > Are there any CONFIG options I'd need to trip this? I don't think you need any special CONFIG options. I'll attach my config here, and I know the default distro one hits it too. But perhaps the mariadb version is not new enough? I think you need 10.6 or above, as will use io_uring by default. What version are you running? > ... >> Today I finally gave up and ran a basic experiment, which simply >> offloads the writes to a kthread. Since powerpc has an interesting >> memory coherency model, my suspicion was that the work involved with >> switching MMs for the kthread could just be the main difference here. >> The patch is really dumb and simple - rather than queue the write to an >> IO thread, it just offloads it to a kthread that then does >> kthread_use_mm(), perform write with the same write handler, >> kthread_unuse_mm(). AND THIS WORKS! Usually the above mtr test would >> fail in 2..20 loops, I've now done 200 and 500 loops and it's fine. > > Can you share the patch that does that? It would help me track down > where exactly in the io_uring code you're talking about. Shoot yes, I actually meant to attach it but then forgot. Below! >> Which then leads me to the question, what about the IO thread offload >> makes this fail on powerpc (and no other arch I've tested on, including >> x86/x86-64/aarch64/hppa64)? The offload should be equivalent to having a >> thread in userspace in the application, and having that thread just >> perform the writes. Is there some magic involved with the kthread mm >> use/unuse that makes this sufficiently consistent on powerpc? I've tried >> any mix of isync()/mb and making the flush_dcache_page() unconditionally >> done in the filemap read/write helpers, and it still falls flat on its >> face with the offload to an IO thread. > > My first guess would be that there's some missing barriers between the > thread that queues the IO and the IO worker thread. That was my guess too, and I consulted Paul McKenney as well on that. And he had some ideas of course, in terms of ordering of the CQ ring. But tried it all out, and it still failed in the same way... > I think you're using schedule_work() for that though, which should be a > full barrier. Could it be on the completion side? queue_work() for the patch, before that it's io-wq which is an internal IO thread worker pool. The latter just needs a spin_lock() around queueing the work, and then a wake of the task. Typing this out, maybe this is where a barrier is now missing? If the IO thread is already running rather than sleeping? > I can't think of any magic in kthread_use_mm() other than extra > barriers. In particular kthread_unuse_mm() has an > smp_mb__after_spinlock() which is a full memory barrier on powerpc but > is a nop on some other architectures, x86 at least. Yeah, I did poke at kthread_use_mm() and the related powerpc bits, but didn't immediately find anything that seemed promising in this regard. >> I must clearly be missing something here, which is why I'm emailing the >> powerpc Gods for help :-) > > Unfortunately the true God of powerpc memory ordering has left us and > ascended into the Metaverse ;) ;-) -- Jens Axboe
Re: Memory coherency issue with IO thread offloading?
Jens Axboe writes: > Hi, Hi Jens, Thanks for the report. > I got a report sent to me from mariadb, in where 5.10.158 works fine and > 5.10.162 is broken. And in fact, current 6.3-rc also fails the test > case. Beware that this email is long, as I'm trying to include > everything that may be relevant... > > The test case in question is pretty simple. On debian testing, do: > > $ sudo apt-get install mariadb-test > $ cd /usr/share/mysql/mysql-test > $ ./mtr --mysqld=--innodb-flush-method=fsync > --mysqld=--innodb-use-native-aio=1 --vardir=/dev/shm/mysql --force > encryption.innodb_encryption,innodb,undo0 --repeat=200 I mostly use Fedora, the package name is the same but the mtr binary ends up in /usr/share/mysql. > and if it fails, you'll see something like: > > encryption.innodb_encryption 'innodb,undo0' [ 6 pass ] 3120 > encryption.innodb_encryption 'innodb,undo0' [ 7 pass ] 3123 > encryption.innodb_encryption 'innodb,undo0' [ 8 pass ] 3042 > encryption.innodb_encryption 'innodb,undo0' [ 9 fail ] > Test ended at 2023-03-23 16:55:17 I haven't been able to get this to fail yet. I've done several runs with --repeat=500 and haven't seen any errors yet. Are there any CONFIG options I'd need to trip this? ... > Today I finally gave up and ran a basic experiment, which simply > offloads the writes to a kthread. Since powerpc has an interesting > memory coherency model, my suspicion was that the work involved with > switching MMs for the kthread could just be the main difference here. > The patch is really dumb and simple - rather than queue the write to an > IO thread, it just offloads it to a kthread that then does > kthread_use_mm(), perform write with the same write handler, > kthread_unuse_mm(). AND THIS WORKS! Usually the above mtr test would > fail in 2..20 loops, I've now done 200 and 500 loops and it's fine. Can you share the patch that does that? It would help me track down where exactly in the io_uring code you're talking about. > Which then leads me to the question, what about the IO thread offload > makes this fail on powerpc (and no other arch I've tested on, including > x86/x86-64/aarch64/hppa64)? The offload should be equivalent to having a > thread in userspace in the application, and having that thread just > perform the writes. Is there some magic involved with the kthread mm > use/unuse that makes this sufficiently consistent on powerpc? I've tried > any mix of isync()/mb and making the flush_dcache_page() unconditionally > done in the filemap read/write helpers, and it still falls flat on its > face with the offload to an IO thread. My first guess would be that there's some missing barriers between the thread that queues the IO and the IO worker thread. I think you're using schedule_work() for that though, which should be a full barrier. Could it be on the completion side? I can't think of any magic in kthread_use_mm() other than extra barriers. In particular kthread_unuse_mm() has an smp_mb__after_spinlock() which is a full memory barrier on powerpc but is a nop on some other architectures, x86 at least. > I must clearly be missing something here, which is why I'm emailing the > powerpc Gods for help :-) Unfortunately the true God of powerpc memory ordering has left us and ascended into the Metaverse ;) cheers
Re: Memory coherency issue with IO thread offloading?
On 3/24/23 6:15 PM, Michael Ellerman wrote: > Jens Axboe writes: >> On 3/24/23 1:27?AM, Christophe Leroy wrote: >>> Le 23/03/2023 ? 19:54, Jens Axboe a ?crit : I got a report sent to me from mariadb, in where 5.10.158 works fine and 5.10.162 is broken. And in fact, current 6.3-rc also fails the test case. Beware that this email is long, as I'm trying to include everything that may be relevant... >>> >>> Which variant of powerpc ? 32 or 64 bits ? Book3S or BookE ? >> >> I knew I'd forget something important... It's power9: >> >> processor: 0 >> cpu : POWER9 (architected), altivec supported >> clock: 2200.00MHz >> revision : 2.2 (pvr 004e 1202) > > Believe it or not there's still more variables in play, Power9 has two > different MMUs, and Linux can run on two different hypervisors as well > as on bare metal P9 :) Just my luck :-) > > Can you paste the last ~10 lines of /proc/cpuinfo, with the "machine", > "firmware" and "MMU" lines, that should tell us everything we need to > know. timebase: 51200 platform: pSeries model : IBM pSeries (emulated by qemu) machine : CHRP IBM pSeries (emulated by qemu) MMU : Radix I know it reproduces bare metal as well, but no details on what that box is. I just know I was able to reproduce it in this vm that I was able to get my hands on. -- Jens Axboe
Re: Memory coherency issue with IO thread offloading?
Jens Axboe writes: > On 3/24/23 1:27?AM, Christophe Leroy wrote: >> Le 23/03/2023 ? 19:54, Jens Axboe a ?crit : >>> I got a report sent to me from mariadb, in where 5.10.158 works fine and >>> 5.10.162 is broken. And in fact, current 6.3-rc also fails the test >>> case. Beware that this email is long, as I'm trying to include >>> everything that may be relevant... >> >> Which variant of powerpc ? 32 or 64 bits ? Book3S or BookE ? > > I knew I'd forget something important... It's power9: > > processor : 0 > cpu : POWER9 (architected), altivec supported > clock : 2200.00MHz > revision : 2.2 (pvr 004e 1202) Believe it or not there's still more variables in play, Power9 has two different MMUs, and Linux can run on two different hypervisors as well as on bare metal P9 :) Can you paste the last ~10 lines of /proc/cpuinfo, with the "machine", "firmware" and "MMU" lines, that should tell us everything we need to know. cheers
[PATCH] powerpc/pseries: Add spaces around / operator
This is follow up change after 14b5d59a261b ("powerpc/pseries: Fix formatting to make code look more beautiful") to conform to kernel coding style. Signed-off-by: Petr Vaněk --- arch/powerpc/platforms/pseries/iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index c74b71d4733d..b0cb2fa39cf8 100644 --- a/arch/powerpc/platforms/pseries/iommu.c +++ b/arch/powerpc/platforms/pseries/iommu.c @@ -474,7 +474,7 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn, * Set up the page with TCE data, looping through and setting * the values. */ - limit = min_t(long, num_tce, 4096/TCE_ENTRY_SIZE); + limit = min_t(long, num_tce, 4096 / TCE_ENTRY_SIZE); dma_offset = next + be64_to_cpu(maprange->dma_base); for (l = 0; l < limit; l++) { -- 2.39.2
Re: [linux-next:master] BUILD REGRESSION 7c4a254d78f89546d0e74a40617ef24c6151c8d1
Hi, On 23/03/2023 23:34, kernel test robot wrote: > tree/branch: > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master > branch HEAD: 7c4a254d78f89546d0e74a40617ef24c6151c8d1 Add linux-next > specific files for 20230323 > > Error/Warning reports: > > https://lore.kernel.org/oe-kbuild-all/202303161521.jbgbafjj-...@intel.com > https://lore.kernel.org/oe-kbuild-all/202303231302.iy6qifxa-...@intel.com > https://lore.kernel.org/oe-kbuild-all/202303232154.axoxawhg-...@intel.com > > Error/Warning: (recently discovered and may have been fixed) > > drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_validation.c:351:13: > warning: variable 'bw_needed' set but not used [-Wunused-but-set-variable] > drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_validation.c:352:25: > warning: variable 'link' set but not used [-Wunused-but-set-variable] > drivers/net/wireless/legacy/ray_cs.c:628:17: warning: 'strncpy' specified > bound 32 equals destination size [-Wstringop-truncation] > gpio.c:(.init.text+0xec): undefined reference to `of_mm_gpiochip_add_data' > include/linux/mmzone.h:1749:2: error: #error Allocator MAX_ORDER exceeds > SECTION_SIZE > > Unverified Error/Warning (likely false positive, please contact us if > interested): ... > sound/soc/sof/ipc4-pcm.c:391 sof_ipc4_pcm_dai_link_fixup_rate() error: > uninitialized symbol 'be_rate'. > sound/soc/sof/ipc4-topology.c:1132 ipc4_copier_set_capture_fmt() error: > uninitialized symbol 'sample_valid_bits'. These are false positives, the copier which is used in these functions always have input and output side, the be_rate will be initialized: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/sound/soc/sof/ipc4-pcm.c#n485 on the first iteration. the sample_valid_bits will be initialized: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/sound/soc/sof/ipc4-topology.c#n1310 on the first iteration. -- -- Péter
Re: [PATCH v2 00/14] arch,mm: cleanup Kconfig entries for ARCH_FORCE_MAX_ORDER
On Fri, Mar 24, 2023 at 10:30:07AM -0400, Zi Yan wrote: > On 24 Mar 2023, at 1:22, Mike Rapoport wrote: > > > From: "Mike Rapoport (IBM)" > > > > Hi, > > > > Several architectures have ARCH_FORCE_MAX_ORDER in their Kconfig and > > they all have wrong and misleading prompt and help text for this option. > > > > Besides, some define insane limits for possible values of > > ARCH_FORCE_MAX_ORDER, some carefully define ranges only for a subset of > > possible configurations, some make this option configurable by users for no > > good reason. > > > > This set updates the prompt and help text everywhere and does its best to > > update actual definitions of ranges where applicable. > > > > kbuild generated a bunch of false positives because it assigns -1 to > > ARCH_FORCE_MAX_ORDER, hopefully this will be fixed soon. > > > > v2: > > * arm64: show prompt for ARCH_FORCE_MAX_ORDER only if EXPERT (Catalin) > > * Add Acked- and Reviewed-by tags (thanks Geert, Kirill and Max) > > > > v1: https://lore.kernel.org/all/20230323092156.2545741-1-r...@kernel.org > > > > Mike Rapoport (IBM) (14): > > arm: reword ARCH_FORCE_MAX_ORDER prompt and help text > > arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDER > > arm64: reword ARCH_FORCE_MAX_ORDER prompt and help text > > csky: drop ARCH_FORCE_MAX_ORDER > > ia64: don't allow users to override ARCH_FORCE_MAX_ORDER > > m68k: reword ARCH_FORCE_MAX_ORDER prompt and help text > > nios2: reword ARCH_FORCE_MAX_ORDER prompt and help text > > nios2: drop ranges for definition of ARCH_FORCE_MAX_ORDER > > powerpc: reword ARCH_FORCE_MAX_ORDER prompt and help text > > powerpc: drop ranges for definition of ARCH_FORCE_MAX_ORDER > > sh: reword ARCH_FORCE_MAX_ORDER prompt and help text > > sh: drop ranges for definition of ARCH_FORCE_MAX_ORDER > > sparc: reword ARCH_FORCE_MAX_ORDER prompt and help text > > xtensa: reword ARCH_FORCE_MAX_ORDER prompt and help text > > > > arch/arm/Kconfig | 16 +--- > > arch/arm64/Kconfig| 27 --- > > arch/csky/Kconfig | 4 > > arch/ia64/Kconfig | 3 +-- > > arch/m68k/Kconfig.cpu | 16 +--- > > arch/nios2/Kconfig| 17 + > > arch/powerpc/Kconfig | 22 +- > > arch/sh/mm/Kconfig| 19 +-- > > arch/sparc/Kconfig| 16 +--- > > arch/xtensa/Kconfig | 16 +--- > > 10 files changed, 76 insertions(+), 80 deletions(-) > > > > > > base-commit: 51551d71edbc998fd8c8afa7312db3d270f5998e > > LGTM, thanks. Reviewed-by: Zi Yan Thanks! And thanks for spotting the mistakes in arm64 and sh patches. > -- > Best Regards, > Yan, Zi -- Sincerely yours, Mike.
Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks
On Fri, Mar 24, 2023 at 04:14:22PM +, Mark Rutland wrote: > On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote: > > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland wrote: > > > > > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote: > > > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types > > > > warning. > > > > > > Can you give an example of where we are passing an incompatible pointer? > > > > An example is patch 10/10 from the series, which will fail without > > this fix when fallback code is used. We have: > > > > - } while (local_cmpxchg(>head, offset, head) != offset); > > + } while (!local_try_cmpxchg(>head, , head)); > > > > where rb->head is defined as: > > > > typedef struct { > >atomic_long_t a; > > } local_t; > > > > while offset is defined as 'unsigned long'. > > Ok, but that's because we're doing the wrong thing to start with. > > Since local_t is defined in terms of atomic_long_t, we should define the > generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). We'll still > have a mismatch between 'long *' and 'unsigned long *', but then we can fix > that in the callsite: > > while (!local_try_cmpxchg(>head, &(long *)offset, head)) Sorry, that should be: while (!local_try_cmpxchg(>head, (long *), head)) The fundamenalthing I'm trying to say is that the atomic/atomic64/atomic_long/local/local64 APIs should be type-safe, and for their try_cmpxchg() implementations, the type signature should be: ${atomictype}_try_cmpxchg(${atomictype} *ptr, ${inttype} *old, ${inttype} new) Thanks, Mark.
Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks
On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote: > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland wrote: > > > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote: > > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types > > > warning. > > > > Can you give an example of where we are passing an incompatible pointer? > > An example is patch 10/10 from the series, which will fail without > this fix when fallback code is used. We have: > > - } while (local_cmpxchg(>head, offset, head) != offset); > + } while (!local_try_cmpxchg(>head, , head)); > > where rb->head is defined as: > > typedef struct { >atomic_long_t a; > } local_t; > > while offset is defined as 'unsigned long'. Ok, but that's because we're doing the wrong thing to start with. Since local_t is defined in terms of atomic_long_t, we should define the generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). We'll still have a mismatch between 'long *' and 'unsigned long *', but then we can fix that in the callsite: while (!local_try_cmpxchg(>head, &(long *)offset, head)) ... which then won't silently mask issues elsewhere, and will be consistent with all the other atomic APIs. Thanks, Mark. > > The assignment in existing try_cmpxchg template: > > typeof(*(_ptr)) *___op = (_oldp) > > will trigger an initialization from an incompatible pointer type error. > > Please note that x86 avoids this issue by a cast in its > target-dependent definition: > > #define __raw_try_cmpxchg(_ptr, _pold, _new, size, lock)\ > ({ \ >bool success; \ >__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \ >__typeof__(*(_ptr)) __old = *_old; \ >__typeof__(*(_ptr)) __new = (_new); \ > > so, the warning/error will trigger only in the fallback code. > > > That sounds indicative of a bug in the caller, but maybe I'm missing some > > reason this is necessary due to some indirection. > > > > > Fixes: 29f006fdefe6 ("asm-generic/atomic: Add try_cmpxchg() fallbacks") > > > > I'm not sure that this needs a fixes tag. Does anything go wrong today, or > > only > > later in this series? > > The patch at [1] triggered a build error in posix_acl.c/__get.acl due > to the same problem. The compilation for x86 target was OK, because > x86 defines target-specific arch_try_cmpxchg, but the compilation > broke for targets that revert to generic support. Please note that > this specific problem was recently fixed in a different way [2], but > the issue with the fallback remains. > > [1] https://lore.kernel.org/lkml/20220714173819.13312-1-ubiz...@gmail.com/ > [2] https://lore.kernel.org/lkml/20221201160103.76012-1-ubiz...@gmail.com/ > > Uros.
Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks
On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland wrote: > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote: > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning. > > Can you give an example of where we are passing an incompatible pointer? An example is patch 10/10 from the series, which will fail without this fix when fallback code is used. We have: - } while (local_cmpxchg(>head, offset, head) != offset); + } while (!local_try_cmpxchg(>head, , head)); where rb->head is defined as: typedef struct { atomic_long_t a; } local_t; while offset is defined as 'unsigned long'. The assignment in existing try_cmpxchg template: typeof(*(_ptr)) *___op = (_oldp) will trigger an initialization from an incompatible pointer type error. Please note that x86 avoids this issue by a cast in its target-dependent definition: #define __raw_try_cmpxchg(_ptr, _pold, _new, size, lock)\ ({ \ bool success; \ __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \ __typeof__(*(_ptr)) __old = *_old; \ __typeof__(*(_ptr)) __new = (_new); \ so, the warning/error will trigger only in the fallback code. > That sounds indicative of a bug in the caller, but maybe I'm missing some > reason this is necessary due to some indirection. > > > Fixes: 29f006fdefe6 ("asm-generic/atomic: Add try_cmpxchg() fallbacks") > > I'm not sure that this needs a fixes tag. Does anything go wrong today, or > only > later in this series? The patch at [1] triggered a build error in posix_acl.c/__get.acl due to the same problem. The compilation for x86 target was OK, because x86 defines target-specific arch_try_cmpxchg, but the compilation broke for targets that revert to generic support. Please note that this specific problem was recently fixed in a different way [2], but the issue with the fallback remains. [1] https://lore.kernel.org/lkml/20220714173819.13312-1-ubiz...@gmail.com/ [2] https://lore.kernel.org/lkml/20221201160103.76012-1-ubiz...@gmail.com/ Uros.
Re: [PATCH v2 00/14] arch,mm: cleanup Kconfig entries for ARCH_FORCE_MAX_ORDER
On 24 Mar 2023, at 1:22, Mike Rapoport wrote: > From: "Mike Rapoport (IBM)" > > Hi, > > Several architectures have ARCH_FORCE_MAX_ORDER in their Kconfig and > they all have wrong and misleading prompt and help text for this option. > > Besides, some define insane limits for possible values of > ARCH_FORCE_MAX_ORDER, some carefully define ranges only for a subset of > possible configurations, some make this option configurable by users for no > good reason. > > This set updates the prompt and help text everywhere and does its best to > update actual definitions of ranges where applicable. > > kbuild generated a bunch of false positives because it assigns -1 to > ARCH_FORCE_MAX_ORDER, hopefully this will be fixed soon. > > v2: > * arm64: show prompt for ARCH_FORCE_MAX_ORDER only if EXPERT (Catalin) > * Add Acked- and Reviewed-by tags (thanks Geert, Kirill and Max) > > v1: https://lore.kernel.org/all/20230323092156.2545741-1-r...@kernel.org > > Mike Rapoport (IBM) (14): > arm: reword ARCH_FORCE_MAX_ORDER prompt and help text > arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDER > arm64: reword ARCH_FORCE_MAX_ORDER prompt and help text > csky: drop ARCH_FORCE_MAX_ORDER > ia64: don't allow users to override ARCH_FORCE_MAX_ORDER > m68k: reword ARCH_FORCE_MAX_ORDER prompt and help text > nios2: reword ARCH_FORCE_MAX_ORDER prompt and help text > nios2: drop ranges for definition of ARCH_FORCE_MAX_ORDER > powerpc: reword ARCH_FORCE_MAX_ORDER prompt and help text > powerpc: drop ranges for definition of ARCH_FORCE_MAX_ORDER > sh: reword ARCH_FORCE_MAX_ORDER prompt and help text > sh: drop ranges for definition of ARCH_FORCE_MAX_ORDER > sparc: reword ARCH_FORCE_MAX_ORDER prompt and help text > xtensa: reword ARCH_FORCE_MAX_ORDER prompt and help text > > arch/arm/Kconfig | 16 +--- > arch/arm64/Kconfig| 27 --- > arch/csky/Kconfig | 4 > arch/ia64/Kconfig | 3 +-- > arch/m68k/Kconfig.cpu | 16 +--- > arch/nios2/Kconfig| 17 + > arch/powerpc/Kconfig | 22 +- > arch/sh/mm/Kconfig| 19 +-- > arch/sparc/Kconfig| 16 +--- > arch/xtensa/Kconfig | 16 +--- > 10 files changed, 76 insertions(+), 80 deletions(-) > > > base-commit: 51551d71edbc998fd8c8afa7312db3d270f5998e LGTM, thanks. Reviewed-by: Zi Yan -- Best Regards, Yan, Zi signature.asc Description: OpenPGP digital signature
Re: [PATCH v2 11/14] sh: reword ARCH_FORCE_MAX_ORDER prompt and help text
On 24 Mar 2023, at 1:22, Mike Rapoport wrote: > From: "Mike Rapoport (IBM)" > > The prompt and help text of ARCH_FORCE_MAX_ORDER are not even close to > describe this configuration option. > > Update both to actually describe what this option does. > > Signed-off-by: Mike Rapoport (IBM) > Acked-by: Kirill A. Shutemov > --- > arch/sh/mm/Kconfig | 17 + > 1 file changed, 9 insertions(+), 8 deletions(-) > > diff --git a/arch/sh/mm/Kconfig b/arch/sh/mm/Kconfig > index 40271090bd7d..fb15ba1052ba 100644 > --- a/arch/sh/mm/Kconfig > +++ b/arch/sh/mm/Kconfig > @@ -19,8 +19,7 @@ config PAGE_OFFSET > default "0x" > > config ARCH_FORCE_MAX_ORDER > - int "Maximum zone order" > - range 8 63 if PAGE_SIZE_16KB This range is dropped in current patch, but the other ranges are dropped in the next patch. But feel free to ignore this since ultimately the ranges are all dropped. > + int "Order of maximal physically contiguous allocations" > default "8" if PAGE_SIZE_16KB > range 6 63 if PAGE_SIZE_64KB > default "6" if PAGE_SIZE_64KB > @@ -28,16 +27,18 @@ config ARCH_FORCE_MAX_ORDER > default "13" if !MMU > default "10" > help > - The kernel memory allocator divides physically contiguous memory > - blocks into "zones", where each zone is a power of two number of > - pages. This option selects the largest power of two that the kernel > - keeps in the memory allocator. If you need to allocate very large > - blocks of physically contiguous memory, then you may need to > - increase this value. > + The kernel page allocator limits the size of maximal physically > + contiguous allocations. The limit is called MAX_ORDER and it > + defines the maximal power of two of number of pages that can be > + allocated as a single contiguous block. This option allows > + overriding the default setting when ability to allocate very > + large blocks of physically contiguous memory is required. > > The page size is not necessarily 4KB. Keep this in mind when > choosing a value for this option. > > + Don't change if unsure. > + > config MEMORY_START > hex "Physical memory start address" > default "0x0800" > -- > 2.35.1 -- Best Regards, Yan, Zi signature.asc Description: OpenPGP digital signature
Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks
On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote: > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning. Can you give an example of where we are passing an incompatible pointer? That sounds indicative of a bug in the caller, but maybe I'm missing some reason this is necessary due to some indirection. > Fixes: 29f006fdefe6 ("asm-generic/atomic: Add try_cmpxchg() fallbacks") I'm not sure that this needs a fixes tag. Does anything go wrong today, or only later in this series? Thanks, Mark. > Cc: Will Deacon > Cc: Peter Zijlstra > Cc: Boqun Feng > Cc: Mark Rutland > Signed-off-by: Uros Bizjak > --- > include/linux/atomic/atomic-arch-fallback.h | 18 +- > scripts/atomic/gen-atomic-fallback.sh | 2 +- > 2 files changed, 10 insertions(+), 10 deletions(-) > > diff --git a/include/linux/atomic/atomic-arch-fallback.h > b/include/linux/atomic/atomic-arch-fallback.h > index 77bc5522e61c..19debd501ee7 100644 > --- a/include/linux/atomic/atomic-arch-fallback.h > +++ b/include/linux/atomic/atomic-arch-fallback.h > @@ -87,7 +87,7 @@ > #ifndef arch_try_cmpxchg > #define arch_try_cmpxchg(_ptr, _oldp, _new) \ > ({ \ > - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \ > + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \ > ___r = arch_cmpxchg((_ptr), ___o, (_new)); \ > if (unlikely(___r != ___o)) \ > *___op = ___r; \ > @@ -98,7 +98,7 @@ > #ifndef arch_try_cmpxchg_acquire > #define arch_try_cmpxchg_acquire(_ptr, _oldp, _new) \ > ({ \ > - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \ > + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \ > ___r = arch_cmpxchg_acquire((_ptr), ___o, (_new)); \ > if (unlikely(___r != ___o)) \ > *___op = ___r; \ > @@ -109,7 +109,7 @@ > #ifndef arch_try_cmpxchg_release > #define arch_try_cmpxchg_release(_ptr, _oldp, _new) \ > ({ \ > - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \ > + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \ > ___r = arch_cmpxchg_release((_ptr), ___o, (_new)); \ > if (unlikely(___r != ___o)) \ > *___op = ___r; \ > @@ -120,7 +120,7 @@ > #ifndef arch_try_cmpxchg_relaxed > #define arch_try_cmpxchg_relaxed(_ptr, _oldp, _new) \ > ({ \ > - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \ > + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \ > ___r = arch_cmpxchg_relaxed((_ptr), ___o, (_new)); \ > if (unlikely(___r != ___o)) \ > *___op = ___r; \ > @@ -157,7 +157,7 @@ > #ifndef arch_try_cmpxchg64 > #define arch_try_cmpxchg64(_ptr, _oldp, _new) \ > ({ \ > - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \ > + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \ > ___r = arch_cmpxchg64((_ptr), ___o, (_new)); \ > if (unlikely(___r != ___o)) \ > *___op = ___r; \ > @@ -168,7 +168,7 @@ > #ifndef arch_try_cmpxchg64_acquire > #define arch_try_cmpxchg64_acquire(_ptr, _oldp, _new) \ > ({ \ > - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \ > + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \ > ___r = arch_cmpxchg64_acquire((_ptr), ___o, (_new)); \ > if (unlikely(___r != ___o)) \ > *___op = ___r; \ > @@ -179,7 +179,7 @@ > #ifndef arch_try_cmpxchg64_release > #define arch_try_cmpxchg64_release(_ptr, _oldp, _new) \ > ({ \ > - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \ > + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \ > ___r = arch_cmpxchg64_release((_ptr), ___o, (_new)); \ > if (unlikely(___r != ___o)) \ > *___op = ___r; \ > @@ -190,7 +190,7 @@ > #ifndef arch_try_cmpxchg64_relaxed > #define arch_try_cmpxchg64_relaxed(_ptr, _oldp, _new) \ > ({ \ > - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \ > + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \ > ___r = arch_cmpxchg64_relaxed((_ptr), ___o, (_new)); \ > if (unlikely(___r != ___o)) \ > *___op = ___r; \ > @@ -2456,4 +2456,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v) > #endif > > #endif /* _LINUX_ATOMIC_FALLBACK_H */ > -// b5e87bdd5ede61470c29f7a7e4de781af3770f09 > +// 1b4d4c82ae653389cd1538d5b07170267d9b3837 > diff --git a/scripts/atomic/gen-atomic-fallback.sh > b/scripts/atomic/gen-atomic-fallback.sh > index 3a07695e3c89..39f447161108 100755 > --- a/scripts/atomic/gen-atomic-fallback.sh > +++ b/scripts/atomic/gen-atomic-fallback.sh > @@ -171,7 +171,7 @@ cat < #ifndef arch_try_${cmpxchg}${order} > #define arch_try_${cmpxchg}${order}(_ptr, _oldp, _new) \\ > ({ \\ > - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \\ > + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \\ > ___r =
Re: [PATCH v2 03/14] arm64: reword ARCH_FORCE_MAX_ORDER prompt and help text
On 24 Mar 2023, at 1:22, Mike Rapoport wrote: > From: "Mike Rapoport (IBM)" > > The prompt and help text of ARCH_FORCE_MAX_ORDER are not even close to > describe this configuration option. > > Update both to actually describe what this option does. > > Signed-off-by: Mike Rapoport (IBM) > Acked-by: Kirill A. Shutemov > --- > arch/arm64/Kconfig | 24 > 1 file changed, 12 insertions(+), 12 deletions(-) > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index 7324032af859..cdaf52ac4018 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -1487,24 +1487,24 @@ config XEN > # 16K | 27 | 14 | 13| 11 >| > # 64K | 29 | 16 | 13| 13 >| > config ARCH_FORCE_MAX_ORDER > - int "Maximum zone order" if EXPERT && (ARM64_4K_PAGES || > ARM64_16K_PAGES) > + int "Order of maximal physically contiguous allocations" if > ARM64_4K_PAGES || ARM64_16K_PAGES It seems that "if EXPERT" was dropped by accident here. > default "13" if ARM64_64K_PAGES > default "11" if ARM64_16K_PAGES > default "10" > help > - The kernel memory allocator divides physically contiguous memory > - blocks into "zones", where each zone is a power of two number of > - pages. This option selects the largest power of two that the kernel > - keeps in the memory allocator. If you need to allocate very large > - blocks of physically contiguous memory, then you may need to > - increase this value. > + The kernel page allocator limits the size of maximal physically > + contiguous allocations. The limit is called MAX_ORDER and it > + defines the maximal power of two of number of pages that can be > + allocated as a single contiguous block. This option allows > + overriding the default setting when ability to allocate very > + large blocks of physically contiguous memory is required. > > - We make sure that we can allocate up to a HugePage size for each > configuration. > - Hence we have : > - MAX_ORDER = PMD_SHIFT - PAGE_SHIFT => PAGE_SHIFT - 3 > + The maximal size of allocation cannot exceed the size of the > + section, so the value of MAX_ORDER should satisfy > > - However for 4K, we choose a higher default value, 10 as opposed to 9, > giving us > - 4M allocations matching the default size used by generic code. > + MAX_ORDER + PAGE_SHIFT <= SECTION_SIZE_BITS > + > + Don't change if unsure. > > config UNMAP_KERNEL_AT_EL0 > bool "Unmap kernel when running in userspace (aka \"KAISER\")" if EXPERT > -- > 2.35.1 -- Best Regards, Yan, Zi signature.asc Description: OpenPGP digital signature
Re: [PATCH v12 13/13] arm64: dts: ls1088ardb: Add serdes descriptions
On Tue, Mar 21, 2023 at 04:13:12PM -0400, Sean Anderson wrote: > This adds serdes support to the LS1088ARDB. I have tested the QSGMII > ports as well as the two 10G ports. The SFP slot is now fully supported, > instead of being modeled as a fixed-link. > > Linux hangs around when the serdes is initialized if the si5341 is > enabled with the in-tree driver, so I have modeled it as a two fixed > clocks instead. There are a few registers in the QIXIS FPGA which > control the SFP GPIOs; I have modeled them as discrete GPIO controllers > for now. I never saw the AQR105 interrupt fire; not sure what was going > on, but I have removed it to force polling. So you didn't see the interrupt fire even without these patches? I just tested this on a LS1088ARDB and it works. root@localhost:~# cat /proc/interrupts | grep extirq 99: 5 ls-extirq 2 Level 0x08b97000:00 root@localhost:~# ip link set dev endpmac2 up root@localhost:~# cat /proc/interrupts | grep extirq 99: 6 ls-extirq 2 Level 0x08b97000:00 root@localhost:~# ip link set dev endpmac2 down root@localhost:~# cat /proc/interrupts | grep extirq 99: 7 ls-extirq 2 Level 0x08b97000:00 Please don't just remove things. Ioana
Re: Memory coherency issue with IO thread offloading?
On 3/24/23 1:27?AM, Christophe Leroy wrote: > Hi, > > Le 23/03/2023 ? 19:54, Jens Axboe a ?crit : >> Hi, >> >> I got a report sent to me from mariadb, in where 5.10.158 works fine and >> 5.10.162 is broken. And in fact, current 6.3-rc also fails the test >> case. Beware that this email is long, as I'm trying to include >> everything that may be relevant... > > Which variant of powerpc ? 32 or 64 bits ? Book3S or BookE ? I knew I'd forget something important... It's power9: processor : 0 cpu : POWER9 (architected), altivec supported clock : 2200.00MHz revision: 2.2 (pvr 004e 1202) -- Jens Axboe
Re: [PATCH v8 1/3] riscv: Introduce CONFIG_RELOCATABLE
Hi Nick, On 3/22/23 19:25, Nick Desaulniers wrote: On Fri, Feb 24, 2023 at 7:58 AM Björn Töpel wrote: Alexandre Ghiti writes: +cc linux-kbuild, llvm, Nathan, Nick On 2/15/23 15:36, Alexandre Ghiti wrote: From: Alexandre Ghiti I tried a lot of things, but I struggle to understand, does anyone have any idea? FYI, the same problem happens with LLVM. Off the top of my head, no idea. (Maybe as a follow up to this series, I wonder if pursuing ARCH_HAS_RELR for ARCH=riscv is worthwhile?) IIUC, the goal for using RELR is to reduce the size of a kernel image: right now, this is not my priority, but I'll add that to my todo list because that may be useful to distros. Don't ask me *why*, but adding --emit-relocs to your linker flags solves "the NULL .rela.dyn" both for GCC and LLVM. The downside is that you end up with a bunch of .rela cruft in your vmlinux. There was a patch just this week to use $(OBJCOPY) to strip these from vmlinux (for x86). Looks like x86 uses --emit-relocs for KASLR: https://lore.kernel.org/lkml/20230320121006.4863-1-petr.pa...@suse.com/ That's nice, that would be an interesting intermediate step until we find the issue here as I believe it is important to have the relocations in the init section to save memory. Thanks for your answer Nick, really appreciated, Alex
Re: [PATCH] crypto: p10-aes-gcm - remove duplicate include header
On Tue, Mar 14, 2023 at 04:31:51PM +0800, ye.xingc...@zte.com.cn wrote: > From: Ye Xingchen > > crypto/algapi.h is included more than once. > > Signed-off-by: Ye Xingchen > --- > arch/powerpc/crypto/aes-gcm-p10-glue.c | 1 - > 1 file changed, 1 deletion(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH v7 6/6] PCI: Make use of pci_resource_n()
On Fri, Mar 24, 2023 at 10:08:39AM +0100, Philippe Mathieu-Daudé wrote: > On 23/3/23 18:36, Andy Shevchenko wrote: > > Replace open-coded implementations of pci_resource_n() in pci.h. ... > > #define pci_resource_n(dev, bar) (&(dev)->resource[(bar)]) > > -#define pci_resource_start(dev, bar) ((dev)->resource[(bar)].start) > > -#define pci_resource_end(dev, bar) ((dev)->resource[(bar)].end) > > -#define pci_resource_flags(dev, bar) ((dev)->resource[(bar)].flags) > > -#define pci_resource_len(dev,bar) \ > > - ((pci_resource_end((dev), (bar)) == 0) ? 0 :\ > > - \ > > -(pci_resource_end((dev), (bar)) - \ > > - pci_resource_start((dev), (bar)) + 1)) > > +#define pci_resource_start(dev, bar) (pci_resource_n(dev, > > bar)->start) > > +#define pci_resource_end(dev, bar) (pci_resource_n(dev, bar)->end) > > +#define pci_resource_flags(dev, bar) (pci_resource_n(dev, > > bar)->flags) > > +#define pci_resource_len(dev,bar) \ > > + (pci_resource_end((dev), (bar)) ? \ > > +resource_size(pci_resource_n((dev), (bar))) : 0) > > Seems (to me) more logical to have this patch as "PCI: Introduce > pci_resource_n()" ordered before your patch #2 "PCI: Introduce > pci_dev_for_each_resource()". Either way works for me. Bjorn, what do you like? > Here as #6 or as #2: > Reviewed-by: Philippe Mathieu-Daudé Thank you! -- With Best Regards, Andy Shevchenko
Re: [PATCH v7 4/6] EISA: Convert to use less arguments in pci_bus_for_each_resource()
On Fri, Mar 24, 2023 at 10:02:15AM +0100, Philippe Mathieu-Daudé wrote: > On 23/3/23 18:36, Andy Shevchenko wrote: > > The pci_bus_for_each_resource() can hide the iterator loop since > > it may be not used otherwise. With this, we may drop that iterator > > variable definition. > > > > Signed-off-by: Andy Shevchenko > > Reviewed-by: Krzysztof Wilczyński > > --- > > drivers/eisa/pci_eisa.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/eisa/pci_eisa.c b/drivers/eisa/pci_eisa.c > > Since this is *PCI* EISA, could be squashed into previous patch. I believe it would be better to have them separated. But if maintainers want to squash, I can do that. > Reviewed-by: Philippe Mathieu-Daudé Thank you! -- With Best Regards, Andy Shevchenko
Re: [PATCH v3 0/4] Use dma_default_coherent for devicetree default coherency
> 2023年3月23日 21:39,Christoph Hellwig 写道: > > On Thu, Mar 23, 2023 at 09:07:31PM +, Jiaxun Yang wrote: >> >> >>> 2023年3月23日 07:29,Christoph Hellwig 写道: >>> >>> The series looks fine to me. How should we merge it? >> >> Perhaps go through dma-mapping tree? > > Is patch a 6.3 candidate or should all of it go into 6.4? Please leave it for 6.4, as corresponding MIPS arch part will be a part of 6.4. Thanks Jiaxun
Re: [PATCH v7 6/6] PCI: Make use of pci_resource_n()
On 23/3/23 18:36, Andy Shevchenko wrote: Replace open-coded implementations of pci_resource_n() in pci.h. Signed-off-by: Andy Shevchenko --- include/linux/pci.h | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/include/linux/pci.h b/include/linux/pci.h index 70a4684d5f26..9539cf63fe5e 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -2006,14 +2006,12 @@ int pci_iobar_pfn(struct pci_dev *pdev, int bar, struct vm_area_struct *vma); * for accessing popular PCI BAR info */ #define pci_resource_n(dev, bar) (&(dev)->resource[(bar)]) -#define pci_resource_start(dev, bar) ((dev)->resource[(bar)].start) -#define pci_resource_end(dev, bar) ((dev)->resource[(bar)].end) -#define pci_resource_flags(dev, bar) ((dev)->resource[(bar)].flags) -#define pci_resource_len(dev,bar) \ - ((pci_resource_end((dev), (bar)) == 0) ? 0 :\ - \ -(pci_resource_end((dev), (bar)) - \ - pci_resource_start((dev), (bar)) + 1)) +#define pci_resource_start(dev, bar) (pci_resource_n(dev, bar)->start) +#define pci_resource_end(dev, bar) (pci_resource_n(dev, bar)->end) +#define pci_resource_flags(dev, bar) (pci_resource_n(dev, bar)->flags) +#define pci_resource_len(dev,bar) \ + (pci_resource_end((dev), (bar)) ? \ +resource_size(pci_resource_n((dev), (bar))) : 0) Seems (to me) more logical to have this patch as "PCI: Introduce pci_resource_n()" ordered before your patch #2 "PCI: Introduce pci_dev_for_each_resource()". Here as #6 or as #2: Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH v7 4/6] EISA: Convert to use less arguments in pci_bus_for_each_resource()
On 23/3/23 18:36, Andy Shevchenko wrote: The pci_bus_for_each_resource() can hide the iterator loop since it may be not used otherwise. With this, we may drop that iterator variable definition. Signed-off-by: Andy Shevchenko Reviewed-by: Krzysztof Wilczyński --- drivers/eisa/pci_eisa.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/eisa/pci_eisa.c b/drivers/eisa/pci_eisa.c Since this is *PCI* EISA, could be squashed into previous patch. Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH v7 3/6] PCI: Allow pci_bus_for_each_resource() to take less arguments
On 23/3/23 18:36, Andy Shevchenko wrote: Refactor pci_bus_for_each_resource() in the same way as it's done in pci_dev_for_each_resource() case. This will allow to hide iterator inside the loop, where it's not used otherwise. No functional changes intended. Signed-off-by: Andy Shevchenko Reviewed-by: Krzysztof Wilczyński --- drivers/pci/bus.c | 7 +++ drivers/pci/hotplug/shpchp_sysfs.c | 8 drivers/pci/pci.c | 3 +-- drivers/pci/probe.c| 2 +- drivers/pci/setup-bus.c| 10 -- include/linux/pci.h| 17 + 6 files changed, 26 insertions(+), 21 deletions(-) Nice. Reviewed-by: Philippe Mathieu-Daudé
Re: Memory coherency issue with IO thread offloading?
Hi, Le 23/03/2023 à 19:54, Jens Axboe a écrit : > Hi, > > I got a report sent to me from mariadb, in where 5.10.158 works fine and > 5.10.162 is broken. And in fact, current 6.3-rc also fails the test > case. Beware that this email is long, as I'm trying to include > everything that may be relevant... Which variant of powerpc ? 32 or 64 bits ? Book3S or BookE ? Christophe > > The test case in question is pretty simple. On debian testing, do: > > $ sudo apt-get install mariadb-test > $ cd /usr/share/mysql/mysql-test > $ ./mtr --mysqld=--innodb-flush-method=fsync > --mysqld=--innodb-use-native-aio=1 --vardir=/dev/shm/mysql --force > encryption.innodb_encryption,innodb,undo0 --repeat=200 > > and if it fails, you'll see something like: > > encryption.innodb_encryption 'innodb,undo0' [ 6 pass ] 3120 > encryption.innodb_encryption 'innodb,undo0' [ 7 pass ] 3123 > encryption.innodb_encryption 'innodb,undo0' [ 8 pass ] 3042 > encryption.innodb_encryption 'innodb,undo0' [ 9 fail ] > Test ended at 2023-03-23 16:55:17 > > CURRENT_TEST: encryption.innodb_encryption > mysqltest: At line 11: query 'SET @start_global_value = > @@global.innodb_encryption_threads' failed: ER_UNKNOWN_SYSTEM_VARIABLE > (1193): Unknown system variable 'innodb_encryption_threads' > > The result from queries just before the failure was: > SET @start_global_value = @@global.innodb_encryption_threads; > > - saving '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/' to > '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/' > ***Warnings generated in error logs during shutdown after running tests: > encryption.innodb_encryption > > 2023-03-23 16:55:17 0 [Warning] Plugin 'example_key_management' is of > maturity level experimental while the server is stable > 2023-03-23 16:55:17 0 [ERROR] InnoDB: Database page corruption on disk or a > failed read of file './ibdata1' page [page id: space=0, page number=221]. You > may have to recover from a backup. > > where data read was not as expected. > > Now, there are a number of io_uring changes between .158 and .162, as it > includes the backport that brought 5.10-stable into line with what > 5.15-stable includes. I'll spare you all the digging I did to vet those > changes, but the key thing is that it STILL happens on 6.3-git on > powerpc. > > After ruling out many things, one key difference between 158 and 162 is > that the former offloaded requests that could not be done nonblocking to > a kthread, and 162 and newer offloads to an IO thread. An IO thread is > just a normal thread created from the application submitting IO, the > only difference is that it never exits to userspace. An IO thread has > the same mm/files/you-name-it from the original task. It really is the > same as a userspace thread created by the application The switch to IO > threads was done exactly because of that, rather than rely on a fragile > scheme of having the kthread worker assume all sorts of identify from > the original task. surprises if things were missed. This is what caused > most of the io_uring security issues in the past. > > The IO that mariadb does in this test is pretty simple - a bunch of > largish buffered writes with IORING_OP_WRITEV, and some smallish (16K) > buffered reads with IORING_OP_READV. > > Today I finally gave up and ran a basic experiment, which simply > offloads the writes to a kthread. Since powerpc has an interesting > memory coherency model, my suspicion was that the work involved with > switching MMs for the kthread could just be the main difference here. > The patch is really dumb and simple - rather than queue the write to an > IO thread, it just offloads it to a kthread that then does > kthread_use_mm(), perform write with the same write handler, > kthread_unuse_mm(). AND THIS WORKS! Usually the above mtr test would > fail in 2..20 loops, I've now done 200 and 500 loops and it's fine. > > Which then leads me to the question, what about the IO thread offload > makes this fail on powerpc (and no other arch I've tested on, including > x86/x86-64/aarch64/hppa64)? The offload should be equivalent to having a > thread in userspace in the application, and having that thread just > perform the writes. Is there some magic involved with the kthread mm > use/unuse that makes this sufficiently consistent on powerpc? I've tried > any mix of isync()/mb and making the flush_dcache_page() unconditionally > done in the filemap read/write helpers, and it still falls flat on its > face with the offload to an IO thread. > > I must clearly be missing something here, which is why I'm emailing the > powerpc Gods for help :-) >