Re: Memory coherency issue with IO thread offloading?

2023-03-24 Thread Jens Axboe
On 3/24/23 6:42?PM, Michael Ellerman wrote:
> Jens Axboe  writes:
>> Hi,
> 
> Hi Jens,
> 
> Thanks for the report.
> 
>> I got a report sent to me from mariadb, in where 5.10.158 works fine and
>> 5.10.162 is broken. And in fact, current 6.3-rc also fails the test
>> case. Beware that this email is long, as I'm trying to include
>> everything that may be relevant...
>>
>> The test case in question is pretty simple. On debian testing, do:
>>
>> $ sudo apt-get install mariadb-test
>> $ cd /usr/share/mysql/mysql-test
>> $ ./mtr --mysqld=--innodb-flush-method=fsync 
>> --mysqld=--innodb-use-native-aio=1 --vardir=/dev/shm/mysql  --force 
>> encryption.innodb_encryption,innodb,undo0 --repeat=200
> 
> I mostly use Fedora, the package name is the same but the mtr binary
> ends up in /usr/share/mysql.
> 
>> and if it fails, you'll see something like:
>>
>> encryption.innodb_encryption 'innodb,undo0' [ 6 pass ]   3120
>> encryption.innodb_encryption 'innodb,undo0' [ 7 pass ]   3123
>> encryption.innodb_encryption 'innodb,undo0' [ 8 pass ]   3042
>> encryption.innodb_encryption 'innodb,undo0' [ 9 fail ]
>> Test ended at 2023-03-23 16:55:17
> 
> I haven't been able to get this to fail yet. I've done several runs with
> --repeat=500 and haven't seen any errors yet.
> 
> Are there any CONFIG options I'd need to trip this?

I don't think you need any special CONFIG options. I'll attach my config
here, and I know the default distro one hits it too. But perhaps the
mariadb version is not new enough? I think you need 10.6 or above, as
will use io_uring by default. What version are you running?

> ...
>> Today I finally gave up and ran a basic experiment, which simply
>> offloads the writes to a kthread. Since powerpc has an interesting
>> memory coherency model, my suspicion was that the work involved with
>> switching MMs for the kthread could just be the main difference here.
>> The patch is really dumb and simple - rather than queue the write to an
>> IO thread, it just offloads it to a kthread that then does
>> kthread_use_mm(), perform write with the same write handler,
>> kthread_unuse_mm(). AND THIS WORKS! Usually the above mtr test would
>> fail in 2..20 loops, I've now done 200 and 500 loops and it's fine.
> 
> Can you share the patch that does that? It would help me track down
> where exactly in the io_uring code you're talking about.

Shoot yes, I actually meant to attach it but then forgot. Below!

>> Which then leads me to the question, what about the IO thread offload
>> makes this fail on powerpc (and no other arch I've tested on, including
>> x86/x86-64/aarch64/hppa64)? The offload should be equivalent to having a
>> thread in userspace in the application, and having that thread just
>> perform the writes. Is there some magic involved with the kthread mm
>> use/unuse that makes this sufficiently consistent on powerpc? I've tried
>> any mix of isync()/mb and making the flush_dcache_page() unconditionally
>> done in the filemap read/write helpers, and it still falls flat on its
>> face with the offload to an IO thread.
> 
> My first guess would be that there's some missing barriers between the
> thread that queues the IO and the IO worker thread. 

That was my guess too, and I consulted Paul McKenney as well on that.
And he had some ideas of course, in terms of ordering of the CQ ring.
But tried it all out, and it still failed in the same way...

> I think you're using schedule_work() for that though, which should be a
> full barrier. Could it be on the completion side?

queue_work() for the patch, before that it's io-wq which is an internal
IO thread worker pool. The latter just needs a spin_lock() around
queueing the work, and then a wake of the task. Typing this out, maybe
this is where a barrier is now missing? If the IO thread is already
running rather than sleeping?

> I can't think of any magic in kthread_use_mm() other than extra
> barriers. In particular kthread_unuse_mm() has an
> smp_mb__after_spinlock() which is a full memory barrier on powerpc but
> is a nop on some other architectures, x86 at least.

Yeah, I did poke at kthread_use_mm() and the related powerpc bits, but
didn't immediately find anything that seemed promising in this regard.

>> I must clearly be missing something here, which is why I'm emailing the
>> powerpc Gods for help :-)
> 
> Unfortunately the true God of powerpc memory ordering has left us and
> ascended into the Metaverse ;)

;-)

-- 
Jens Axboe



Re: Memory coherency issue with IO thread offloading?

2023-03-24 Thread Michael Ellerman
Jens Axboe  writes:
> Hi,

Hi Jens,

Thanks for the report.

> I got a report sent to me from mariadb, in where 5.10.158 works fine and
> 5.10.162 is broken. And in fact, current 6.3-rc also fails the test
> case. Beware that this email is long, as I'm trying to include
> everything that may be relevant...
>
> The test case in question is pretty simple. On debian testing, do:
>
> $ sudo apt-get install mariadb-test
> $ cd /usr/share/mysql/mysql-test
> $ ./mtr --mysqld=--innodb-flush-method=fsync 
> --mysqld=--innodb-use-native-aio=1 --vardir=/dev/shm/mysql  --force 
> encryption.innodb_encryption,innodb,undo0 --repeat=200

I mostly use Fedora, the package name is the same but the mtr binary
ends up in /usr/share/mysql.

> and if it fails, you'll see something like:
>
> encryption.innodb_encryption 'innodb,undo0' [ 6 pass ]   3120
> encryption.innodb_encryption 'innodb,undo0' [ 7 pass ]   3123
> encryption.innodb_encryption 'innodb,undo0' [ 8 pass ]   3042
> encryption.innodb_encryption 'innodb,undo0' [ 9 fail ]
> Test ended at 2023-03-23 16:55:17

I haven't been able to get this to fail yet. I've done several runs with
--repeat=500 and haven't seen any errors yet.

Are there any CONFIG options I'd need to trip this?

...
> Today I finally gave up and ran a basic experiment, which simply
> offloads the writes to a kthread. Since powerpc has an interesting
> memory coherency model, my suspicion was that the work involved with
> switching MMs for the kthread could just be the main difference here.
> The patch is really dumb and simple - rather than queue the write to an
> IO thread, it just offloads it to a kthread that then does
> kthread_use_mm(), perform write with the same write handler,
> kthread_unuse_mm(). AND THIS WORKS! Usually the above mtr test would
> fail in 2..20 loops, I've now done 200 and 500 loops and it's fine.

Can you share the patch that does that? It would help me track down
where exactly in the io_uring code you're talking about.

> Which then leads me to the question, what about the IO thread offload
> makes this fail on powerpc (and no other arch I've tested on, including
> x86/x86-64/aarch64/hppa64)? The offload should be equivalent to having a
> thread in userspace in the application, and having that thread just
> perform the writes. Is there some magic involved with the kthread mm
> use/unuse that makes this sufficiently consistent on powerpc? I've tried
> any mix of isync()/mb and making the flush_dcache_page() unconditionally
> done in the filemap read/write helpers, and it still falls flat on its
> face with the offload to an IO thread.

My first guess would be that there's some missing barriers between the
thread that queues the IO and the IO worker thread. 

I think you're using schedule_work() for that though, which should be a
full barrier. Could it be on the completion side?

I can't think of any magic in kthread_use_mm() other than extra
barriers. In particular kthread_unuse_mm() has an
smp_mb__after_spinlock() which is a full memory barrier on powerpc but
is a nop on some other architectures, x86 at least.

> I must clearly be missing something here, which is why I'm emailing the
> powerpc Gods for help :-)

Unfortunately the true God of powerpc memory ordering has left us and
ascended into the Metaverse ;)

cheers


Re: Memory coherency issue with IO thread offloading?

2023-03-24 Thread Jens Axboe
On 3/24/23 6:15 PM, Michael Ellerman wrote:
> Jens Axboe  writes:
>> On 3/24/23 1:27?AM, Christophe Leroy wrote:
>>> Le 23/03/2023 ? 19:54, Jens Axboe a ?crit :
 I got a report sent to me from mariadb, in where 5.10.158 works fine and
 5.10.162 is broken. And in fact, current 6.3-rc also fails the test
 case. Beware that this email is long, as I'm trying to include
 everything that may be relevant...
>>>
>>> Which variant of powerpc ? 32 or 64 bits ? Book3S or BookE ?
>>
>> I knew I'd forget something important... It's power9:
>>
>> processor: 0
>> cpu  : POWER9 (architected), altivec supported
>> clock: 2200.00MHz
>> revision : 2.2 (pvr 004e 1202)
> 
> Believe it or not there's still more variables in play, Power9 has two
> different MMUs, and Linux can run on two different hypervisors as well
> as on bare metal P9 :)

Just my luck :-)

> 
> Can you paste the last ~10 lines of /proc/cpuinfo, with the "machine",
> "firmware" and "MMU" lines, that should tell us everything we need to
> know.

timebase: 51200
platform: pSeries
model   : IBM pSeries (emulated by qemu)
machine : CHRP IBM pSeries (emulated by qemu)
MMU : Radix

I know it reproduces bare metal as well, but no details on what
that box is. I just know I was able to reproduce it in this vm that
I was able to get my hands on.

-- 
Jens Axboe




Re: Memory coherency issue with IO thread offloading?

2023-03-24 Thread Michael Ellerman
Jens Axboe  writes:
> On 3/24/23 1:27?AM, Christophe Leroy wrote:
>> Le 23/03/2023 ? 19:54, Jens Axboe a ?crit :
>>> I got a report sent to me from mariadb, in where 5.10.158 works fine and
>>> 5.10.162 is broken. And in fact, current 6.3-rc also fails the test
>>> case. Beware that this email is long, as I'm trying to include
>>> everything that may be relevant...
>> 
>> Which variant of powerpc ? 32 or 64 bits ? Book3S or BookE ?
>
> I knew I'd forget something important... It's power9:
>
> processor : 0
> cpu   : POWER9 (architected), altivec supported
> clock : 2200.00MHz
> revision  : 2.2 (pvr 004e 1202)

Believe it or not there's still more variables in play, Power9 has two
different MMUs, and Linux can run on two different hypervisors as well
as on bare metal P9 :)

Can you paste the last ~10 lines of /proc/cpuinfo, with the "machine",
"firmware" and "MMU" lines, that should tell us everything we need to
know.

cheers


[PATCH] powerpc/pseries: Add spaces around / operator

2023-03-24 Thread Petr Vaněk
This is follow up change after 14b5d59a261b ("powerpc/pseries: Fix
formatting to make code look more beautiful") to conform to kernel
coding style.

Signed-off-by: Petr Vaněk 
---
 arch/powerpc/platforms/pseries/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index c74b71d4733d..b0cb2fa39cf8 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -474,7 +474,7 @@ static int tce_setrange_multi_pSeriesLP(unsigned long 
start_pfn,
 * Set up the page with TCE data, looping through and setting
 * the values.
 */
-   limit = min_t(long, num_tce, 4096/TCE_ENTRY_SIZE);
+   limit = min_t(long, num_tce, 4096 / TCE_ENTRY_SIZE);
dma_offset = next + be64_to_cpu(maprange->dma_base);
 
for (l = 0; l < limit; l++) {
-- 
2.39.2



Re: [linux-next:master] BUILD REGRESSION 7c4a254d78f89546d0e74a40617ef24c6151c8d1

2023-03-24 Thread Péter Ujfalusi
Hi,

On 23/03/2023 23:34, kernel test robot wrote:
> tree/branch: 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> branch HEAD: 7c4a254d78f89546d0e74a40617ef24c6151c8d1  Add linux-next 
> specific files for 20230323
> 
> Error/Warning reports:
> 
> https://lore.kernel.org/oe-kbuild-all/202303161521.jbgbafjj-...@intel.com
> https://lore.kernel.org/oe-kbuild-all/202303231302.iy6qifxa-...@intel.com
> https://lore.kernel.org/oe-kbuild-all/202303232154.axoxawhg-...@intel.com
> 
> Error/Warning: (recently discovered and may have been fixed)
> 
> drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_validation.c:351:13: 
> warning: variable 'bw_needed' set but not used [-Wunused-but-set-variable]
> drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_validation.c:352:25: 
> warning: variable 'link' set but not used [-Wunused-but-set-variable]
> drivers/net/wireless/legacy/ray_cs.c:628:17: warning: 'strncpy' specified 
> bound 32 equals destination size [-Wstringop-truncation]
> gpio.c:(.init.text+0xec): undefined reference to `of_mm_gpiochip_add_data'
> include/linux/mmzone.h:1749:2: error: #error Allocator MAX_ORDER exceeds 
> SECTION_SIZE
> 
> Unverified Error/Warning (likely false positive, please contact us if 
> interested):

...

> sound/soc/sof/ipc4-pcm.c:391 sof_ipc4_pcm_dai_link_fixup_rate() error: 
> uninitialized symbol 'be_rate'.
> sound/soc/sof/ipc4-topology.c:1132 ipc4_copier_set_capture_fmt() error: 
> uninitialized symbol 'sample_valid_bits'.

These are false positives, the copier which is used in these functions
always have input and output side, the be_rate will be initialized:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/sound/soc/sof/ipc4-pcm.c#n485
on the first iteration.

the sample_valid_bits will be initialized:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/sound/soc/sof/ipc4-topology.c#n1310
on the first iteration.

-- 
-- 
Péter


Re: [PATCH v2 00/14] arch,mm: cleanup Kconfig entries for ARCH_FORCE_MAX_ORDER

2023-03-24 Thread Mike Rapoport
On Fri, Mar 24, 2023 at 10:30:07AM -0400, Zi Yan wrote:
> On 24 Mar 2023, at 1:22, Mike Rapoport wrote:
> 
> > From: "Mike Rapoport (IBM)" 
> >
> > Hi,
> >
> > Several architectures have ARCH_FORCE_MAX_ORDER in their Kconfig and
> > they all have wrong and misleading prompt and help text for this option.
> >
> > Besides, some define insane limits for possible values of
> > ARCH_FORCE_MAX_ORDER, some carefully define ranges only for a subset of
> > possible configurations, some make this option configurable by users for no
> > good reason.
> >
> > This set updates the prompt and help text everywhere and does its best to
> > update actual definitions of ranges where applicable.
> >
> > kbuild generated a bunch of false positives because it assigns -1 to
> > ARCH_FORCE_MAX_ORDER, hopefully this will be fixed soon.
> >
> > v2:
> > * arm64: show prompt for ARCH_FORCE_MAX_ORDER only if EXPERT (Catalin)
> > * Add Acked- and Reviewed-by tags (thanks Geert, Kirill and Max)
> >
> > v1: https://lore.kernel.org/all/20230323092156.2545741-1-r...@kernel.org
> >
> > Mike Rapoport (IBM) (14):
> >   arm: reword ARCH_FORCE_MAX_ORDER prompt and help text
> >   arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDER
> >   arm64: reword ARCH_FORCE_MAX_ORDER prompt and help text
> >   csky: drop ARCH_FORCE_MAX_ORDER
> >   ia64: don't allow users to override ARCH_FORCE_MAX_ORDER
> >   m68k: reword ARCH_FORCE_MAX_ORDER prompt and help text
> >   nios2: reword ARCH_FORCE_MAX_ORDER prompt and help text
> >   nios2: drop ranges for definition of ARCH_FORCE_MAX_ORDER
> >   powerpc: reword ARCH_FORCE_MAX_ORDER prompt and help text
> >   powerpc: drop ranges for definition of ARCH_FORCE_MAX_ORDER
> >   sh: reword ARCH_FORCE_MAX_ORDER prompt and help text
> >   sh: drop ranges for definition of ARCH_FORCE_MAX_ORDER
> >   sparc: reword ARCH_FORCE_MAX_ORDER prompt and help text
> >   xtensa: reword ARCH_FORCE_MAX_ORDER prompt and help text
> >
> >  arch/arm/Kconfig  | 16 +---
> >  arch/arm64/Kconfig| 27 ---
> >  arch/csky/Kconfig |  4 
> >  arch/ia64/Kconfig |  3 +--
> >  arch/m68k/Kconfig.cpu | 16 +---
> >  arch/nios2/Kconfig| 17 +
> >  arch/powerpc/Kconfig  | 22 +-
> >  arch/sh/mm/Kconfig| 19 +--
> >  arch/sparc/Kconfig| 16 +---
> >  arch/xtensa/Kconfig   | 16 +---
> >  10 files changed, 76 insertions(+), 80 deletions(-)
> >
> >
> > base-commit: 51551d71edbc998fd8c8afa7312db3d270f5998e
> 
> LGTM, thanks. Reviewed-by: Zi Yan 

Thanks!

And thanks for spotting the mistakes in arm64 and sh patches.
 
> --
> Best Regards,
> Yan, Zi

-- 
Sincerely yours,
Mike.


Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

2023-03-24 Thread Mark Rutland
On Fri, Mar 24, 2023 at 04:14:22PM +, Mark Rutland wrote:
> On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland  wrote:
> > >
> > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types 
> > > > warning.
> > >
> > > Can you give an example of where we are passing an incompatible pointer?
> > 
> > An example is patch 10/10 from the series, which will fail without
> > this fix when fallback code is used. We have:
> > 
> > -   } while (local_cmpxchg(>head, offset, head) != offset);
> > +   } while (!local_try_cmpxchg(>head, , head));
> > 
> > where rb->head is defined as:
> > 
> > typedef struct {
> >atomic_long_t a;
> > } local_t;
> > 
> > while offset is defined as 'unsigned long'.
> 
> Ok, but that's because we're doing the wrong thing to start with.
> 
> Since local_t is defined in terms of atomic_long_t, we should define the
> generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). We'll still
> have a mismatch between 'long *' and 'unsigned long *', but then we can fix
> that in the callsite:
> 
>   while (!local_try_cmpxchg(>head, &(long *)offset, head))

Sorry, that should be:

while (!local_try_cmpxchg(>head, (long *), head))

The fundamenalthing I'm trying to say is that the
atomic/atomic64/atomic_long/local/local64 APIs should be type-safe, and for
their try_cmpxchg() implementations, the type signature should be:

${atomictype}_try_cmpxchg(${atomictype} *ptr, ${inttype} *old, 
${inttype} new)

Thanks,
Mark.


Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

2023-03-24 Thread Mark Rutland
On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland  wrote:
> >
> > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types 
> > > warning.
> >
> > Can you give an example of where we are passing an incompatible pointer?
> 
> An example is patch 10/10 from the series, which will fail without
> this fix when fallback code is used. We have:
> 
> -   } while (local_cmpxchg(>head, offset, head) != offset);
> +   } while (!local_try_cmpxchg(>head, , head));
> 
> where rb->head is defined as:
> 
> typedef struct {
>atomic_long_t a;
> } local_t;
> 
> while offset is defined as 'unsigned long'.

Ok, but that's because we're doing the wrong thing to start with.

Since local_t is defined in terms of atomic_long_t, we should define the
generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). We'll still
have a mismatch between 'long *' and 'unsigned long *', but then we can fix
that in the callsite:

while (!local_try_cmpxchg(>head, &(long *)offset, head))

... which then won't silently mask issues elsewhere, and will be consistent
with all the other atomic APIs.

Thanks,
Mark.

> 
> The assignment in existing try_cmpxchg template:
> 
> typeof(*(_ptr)) *___op = (_oldp)
> 
> will trigger an initialization from an incompatible pointer type error.
> 
> Please note that x86 avoids this issue by a cast in its
> target-dependent definition:
> 
> #define __raw_try_cmpxchg(_ptr, _pold, _new, size, lock)\
> ({  \
>bool success;   \
>__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);  \
>__typeof__(*(_ptr)) __old = *_old;  \
>__typeof__(*(_ptr)) __new = (_new); \
> 
> so, the warning/error will trigger only in the fallback code.
> 
> > That sounds indicative of a bug in the caller, but maybe I'm missing some
> > reason this is necessary due to some indirection.
> >
> > > Fixes: 29f006fdefe6 ("asm-generic/atomic: Add try_cmpxchg() fallbacks")
> >
> > I'm not sure that this needs a fixes tag. Does anything go wrong today, or 
> > only
> > later in this series?
> 
> The patch at [1] triggered a build error in posix_acl.c/__get.acl due
> to the same problem. The compilation for x86 target was OK, because
> x86 defines target-specific arch_try_cmpxchg, but the compilation
> broke for targets that revert to generic support. Please note that
> this specific problem was recently fixed in a different way [2], but
> the issue with the fallback remains.
> 
> [1] https://lore.kernel.org/lkml/20220714173819.13312-1-ubiz...@gmail.com/
> [2] https://lore.kernel.org/lkml/20221201160103.76012-1-ubiz...@gmail.com/
> 
> Uros.


Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

2023-03-24 Thread Uros Bizjak
On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland  wrote:
>
> On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.
>
> Can you give an example of where we are passing an incompatible pointer?

An example is patch 10/10 from the series, which will fail without
this fix when fallback code is used. We have:

-   } while (local_cmpxchg(>head, offset, head) != offset);
+   } while (!local_try_cmpxchg(>head, , head));

where rb->head is defined as:

typedef struct {
   atomic_long_t a;
} local_t;

while offset is defined as 'unsigned long'.

The assignment in existing try_cmpxchg template:

typeof(*(_ptr)) *___op = (_oldp)

will trigger an initialization from an incompatible pointer type error.

Please note that x86 avoids this issue by a cast in its
target-dependent definition:

#define __raw_try_cmpxchg(_ptr, _pold, _new, size, lock)\
({  \
   bool success;   \
   __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);  \
   __typeof__(*(_ptr)) __old = *_old;  \
   __typeof__(*(_ptr)) __new = (_new); \

so, the warning/error will trigger only in the fallback code.

> That sounds indicative of a bug in the caller, but maybe I'm missing some
> reason this is necessary due to some indirection.
>
> > Fixes: 29f006fdefe6 ("asm-generic/atomic: Add try_cmpxchg() fallbacks")
>
> I'm not sure that this needs a fixes tag. Does anything go wrong today, or 
> only
> later in this series?

The patch at [1] triggered a build error in posix_acl.c/__get.acl due
to the same problem. The compilation for x86 target was OK, because
x86 defines target-specific arch_try_cmpxchg, but the compilation
broke for targets that revert to generic support. Please note that
this specific problem was recently fixed in a different way [2], but
the issue with the fallback remains.

[1] https://lore.kernel.org/lkml/20220714173819.13312-1-ubiz...@gmail.com/
[2] https://lore.kernel.org/lkml/20221201160103.76012-1-ubiz...@gmail.com/

Uros.


Re: [PATCH v2 00/14] arch,mm: cleanup Kconfig entries for ARCH_FORCE_MAX_ORDER

2023-03-24 Thread Zi Yan
On 24 Mar 2023, at 1:22, Mike Rapoport wrote:

> From: "Mike Rapoport (IBM)" 
>
> Hi,
>
> Several architectures have ARCH_FORCE_MAX_ORDER in their Kconfig and
> they all have wrong and misleading prompt and help text for this option.
>
> Besides, some define insane limits for possible values of
> ARCH_FORCE_MAX_ORDER, some carefully define ranges only for a subset of
> possible configurations, some make this option configurable by users for no
> good reason.
>
> This set updates the prompt and help text everywhere and does its best to
> update actual definitions of ranges where applicable.
>
> kbuild generated a bunch of false positives because it assigns -1 to
> ARCH_FORCE_MAX_ORDER, hopefully this will be fixed soon.
>
> v2:
> * arm64: show prompt for ARCH_FORCE_MAX_ORDER only if EXPERT (Catalin)
> * Add Acked- and Reviewed-by tags (thanks Geert, Kirill and Max)
>
> v1: https://lore.kernel.org/all/20230323092156.2545741-1-r...@kernel.org
>
> Mike Rapoport (IBM) (14):
>   arm: reword ARCH_FORCE_MAX_ORDER prompt and help text
>   arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDER
>   arm64: reword ARCH_FORCE_MAX_ORDER prompt and help text
>   csky: drop ARCH_FORCE_MAX_ORDER
>   ia64: don't allow users to override ARCH_FORCE_MAX_ORDER
>   m68k: reword ARCH_FORCE_MAX_ORDER prompt and help text
>   nios2: reword ARCH_FORCE_MAX_ORDER prompt and help text
>   nios2: drop ranges for definition of ARCH_FORCE_MAX_ORDER
>   powerpc: reword ARCH_FORCE_MAX_ORDER prompt and help text
>   powerpc: drop ranges for definition of ARCH_FORCE_MAX_ORDER
>   sh: reword ARCH_FORCE_MAX_ORDER prompt and help text
>   sh: drop ranges for definition of ARCH_FORCE_MAX_ORDER
>   sparc: reword ARCH_FORCE_MAX_ORDER prompt and help text
>   xtensa: reword ARCH_FORCE_MAX_ORDER prompt and help text
>
>  arch/arm/Kconfig  | 16 +---
>  arch/arm64/Kconfig| 27 ---
>  arch/csky/Kconfig |  4 
>  arch/ia64/Kconfig |  3 +--
>  arch/m68k/Kconfig.cpu | 16 +---
>  arch/nios2/Kconfig| 17 +
>  arch/powerpc/Kconfig  | 22 +-
>  arch/sh/mm/Kconfig| 19 +--
>  arch/sparc/Kconfig| 16 +---
>  arch/xtensa/Kconfig   | 16 +---
>  10 files changed, 76 insertions(+), 80 deletions(-)
>
>
> base-commit: 51551d71edbc998fd8c8afa7312db3d270f5998e

LGTM, thanks. Reviewed-by: Zi Yan 

--
Best Regards,
Yan, Zi


signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 11/14] sh: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-24 Thread Zi Yan
On 24 Mar 2023, at 1:22, Mike Rapoport wrote:

> From: "Mike Rapoport (IBM)" 
>
> The prompt and help text of ARCH_FORCE_MAX_ORDER are not even close to
> describe this configuration option.
>
> Update both to actually describe what this option does.
>
> Signed-off-by: Mike Rapoport (IBM) 
> Acked-by: Kirill A. Shutemov 
> ---
>  arch/sh/mm/Kconfig | 17 +
>  1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/arch/sh/mm/Kconfig b/arch/sh/mm/Kconfig
> index 40271090bd7d..fb15ba1052ba 100644
> --- a/arch/sh/mm/Kconfig
> +++ b/arch/sh/mm/Kconfig
> @@ -19,8 +19,7 @@ config PAGE_OFFSET
>   default "0x"
>
>  config ARCH_FORCE_MAX_ORDER
> - int "Maximum zone order"
> - range 8 63 if PAGE_SIZE_16KB

This range is dropped in current patch, but the other ranges are dropped in
the next patch. But feel free to ignore this since ultimately the ranges
are all dropped.

> + int "Order of maximal physically contiguous allocations"
>   default "8" if PAGE_SIZE_16KB
>   range 6 63 if PAGE_SIZE_64KB
>   default "6" if PAGE_SIZE_64KB
> @@ -28,16 +27,18 @@ config ARCH_FORCE_MAX_ORDER
>   default "13" if !MMU
>   default "10"
>   help
> -   The kernel memory allocator divides physically contiguous memory
> -   blocks into "zones", where each zone is a power of two number of
> -   pages.  This option selects the largest power of two that the kernel
> -   keeps in the memory allocator.  If you need to allocate very large
> -   blocks of physically contiguous memory, then you may need to
> -   increase this value.
> +   The kernel page allocator limits the size of maximal physically
> +   contiguous allocations. The limit is called MAX_ORDER and it
> +   defines the maximal power of two of number of pages that can be
> +   allocated as a single contiguous block. This option allows
> +   overriding the default setting when ability to allocate very
> +   large blocks of physically contiguous memory is required.
>
> The page size is not necessarily 4KB. Keep this in mind when
> choosing a value for this option.
>
> +   Don't change if unsure.
> +
>  config MEMORY_START
>   hex "Physical memory start address"
>   default "0x0800"
> -- 
> 2.35.1


--
Best Regards,
Yan, Zi


signature.asc
Description: OpenPGP digital signature


Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

2023-03-24 Thread Mark Rutland
On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.

Can you give an example of where we are passing an incompatible pointer?

That sounds indicative of a bug in the caller, but maybe I'm missing some
reason this is necessary due to some indirection.

> Fixes: 29f006fdefe6 ("asm-generic/atomic: Add try_cmpxchg() fallbacks")

I'm not sure that this needs a fixes tag. Does anything go wrong today, or only
later in this series?

Thanks,
Mark.

> Cc: Will Deacon 
> Cc: Peter Zijlstra 
> Cc: Boqun Feng 
> Cc: Mark Rutland 
> Signed-off-by: Uros Bizjak 
> ---
>  include/linux/atomic/atomic-arch-fallback.h | 18 +-
>  scripts/atomic/gen-atomic-fallback.sh   |  2 +-
>  2 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/atomic/atomic-arch-fallback.h 
> b/include/linux/atomic/atomic-arch-fallback.h
> index 77bc5522e61c..19debd501ee7 100644
> --- a/include/linux/atomic/atomic-arch-fallback.h
> +++ b/include/linux/atomic/atomic-arch-fallback.h
> @@ -87,7 +87,7 @@
>  #ifndef arch_try_cmpxchg
>  #define arch_try_cmpxchg(_ptr, _oldp, _new) \
>  ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
>   ___r = arch_cmpxchg((_ptr), ___o, (_new)); \
>   if (unlikely(___r != ___o)) \
>   *___op = ___r; \
> @@ -98,7 +98,7 @@
>  #ifndef arch_try_cmpxchg_acquire
>  #define arch_try_cmpxchg_acquire(_ptr, _oldp, _new) \
>  ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
>   ___r = arch_cmpxchg_acquire((_ptr), ___o, (_new)); \
>   if (unlikely(___r != ___o)) \
>   *___op = ___r; \
> @@ -109,7 +109,7 @@
>  #ifndef arch_try_cmpxchg_release
>  #define arch_try_cmpxchg_release(_ptr, _oldp, _new) \
>  ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
>   ___r = arch_cmpxchg_release((_ptr), ___o, (_new)); \
>   if (unlikely(___r != ___o)) \
>   *___op = ___r; \
> @@ -120,7 +120,7 @@
>  #ifndef arch_try_cmpxchg_relaxed
>  #define arch_try_cmpxchg_relaxed(_ptr, _oldp, _new) \
>  ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
>   ___r = arch_cmpxchg_relaxed((_ptr), ___o, (_new)); \
>   if (unlikely(___r != ___o)) \
>   *___op = ___r; \
> @@ -157,7 +157,7 @@
>  #ifndef arch_try_cmpxchg64
>  #define arch_try_cmpxchg64(_ptr, _oldp, _new) \
>  ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
>   ___r = arch_cmpxchg64((_ptr), ___o, (_new)); \
>   if (unlikely(___r != ___o)) \
>   *___op = ___r; \
> @@ -168,7 +168,7 @@
>  #ifndef arch_try_cmpxchg64_acquire
>  #define arch_try_cmpxchg64_acquire(_ptr, _oldp, _new) \
>  ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
>   ___r = arch_cmpxchg64_acquire((_ptr), ___o, (_new)); \
>   if (unlikely(___r != ___o)) \
>   *___op = ___r; \
> @@ -179,7 +179,7 @@
>  #ifndef arch_try_cmpxchg64_release
>  #define arch_try_cmpxchg64_release(_ptr, _oldp, _new) \
>  ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
>   ___r = arch_cmpxchg64_release((_ptr), ___o, (_new)); \
>   if (unlikely(___r != ___o)) \
>   *___op = ___r; \
> @@ -190,7 +190,7 @@
>  #ifndef arch_try_cmpxchg64_relaxed
>  #define arch_try_cmpxchg64_relaxed(_ptr, _oldp, _new) \
>  ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
>   ___r = arch_cmpxchg64_relaxed((_ptr), ___o, (_new)); \
>   if (unlikely(___r != ___o)) \
>   *___op = ___r; \
> @@ -2456,4 +2456,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
>  #endif
>  
>  #endif /* _LINUX_ATOMIC_FALLBACK_H */
> -// b5e87bdd5ede61470c29f7a7e4de781af3770f09
> +// 1b4d4c82ae653389cd1538d5b07170267d9b3837
> diff --git a/scripts/atomic/gen-atomic-fallback.sh 
> b/scripts/atomic/gen-atomic-fallback.sh
> index 3a07695e3c89..39f447161108 100755
> --- a/scripts/atomic/gen-atomic-fallback.sh
> +++ b/scripts/atomic/gen-atomic-fallback.sh
> @@ -171,7 +171,7 @@ cat <  #ifndef arch_try_${cmpxchg}${order}
>  #define arch_try_${cmpxchg}${order}(_ptr, _oldp, _new) \\
>  ({ \\
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \\
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \\
>   ___r = 

Re: [PATCH v2 03/14] arm64: reword ARCH_FORCE_MAX_ORDER prompt and help text

2023-03-24 Thread Zi Yan
On 24 Mar 2023, at 1:22, Mike Rapoport wrote:

> From: "Mike Rapoport (IBM)" 
>
> The prompt and help text of ARCH_FORCE_MAX_ORDER are not even close to
> describe this configuration option.
>
> Update both to actually describe what this option does.
>
> Signed-off-by: Mike Rapoport (IBM) 
> Acked-by: Kirill A. Shutemov 
> ---
>  arch/arm64/Kconfig | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 7324032af859..cdaf52ac4018 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1487,24 +1487,24 @@ config XEN
>  # 16K |   27  |  14  |   13| 11  
>|
>  # 64K |   29  |  16  |   13| 13  
>|
>  config ARCH_FORCE_MAX_ORDER
> - int "Maximum zone order" if EXPERT && (ARM64_4K_PAGES || 
> ARM64_16K_PAGES)
> + int "Order of maximal physically contiguous allocations" if 
> ARM64_4K_PAGES || ARM64_16K_PAGES

It seems that "if EXPERT" was dropped by accident here.

>   default "13" if ARM64_64K_PAGES
>   default "11" if ARM64_16K_PAGES
>   default "10"
>   help
> -   The kernel memory allocator divides physically contiguous memory
> -   blocks into "zones", where each zone is a power of two number of
> -   pages.  This option selects the largest power of two that the kernel
> -   keeps in the memory allocator.  If you need to allocate very large
> -   blocks of physically contiguous memory, then you may need to
> -   increase this value.
> +   The kernel page allocator limits the size of maximal physically
> +   contiguous allocations. The limit is called MAX_ORDER and it
> +   defines the maximal power of two of number of pages that can be
> +   allocated as a single contiguous block. This option allows
> +   overriding the default setting when ability to allocate very
> +   large blocks of physically contiguous memory is required.
>
> -   We make sure that we can allocate up to a HugePage size for each 
> configuration.
> -   Hence we have :
> - MAX_ORDER = PMD_SHIFT - PAGE_SHIFT  => PAGE_SHIFT - 3
> +   The maximal size of allocation cannot exceed the size of the
> +   section, so the value of MAX_ORDER should satisfy
>
> -   However for 4K, we choose a higher default value, 10 as opposed to 9, 
> giving us
> -   4M allocations matching the default size used by generic code.
> + MAX_ORDER + PAGE_SHIFT <= SECTION_SIZE_BITS
> +
> +   Don't change if unsure.
>
>  config UNMAP_KERNEL_AT_EL0
>   bool "Unmap kernel when running in userspace (aka \"KAISER\")" if EXPERT
> -- 
> 2.35.1


--
Best Regards,
Yan, Zi


signature.asc
Description: OpenPGP digital signature


Re: [PATCH v12 13/13] arm64: dts: ls1088ardb: Add serdes descriptions

2023-03-24 Thread Ioana Ciornei
On Tue, Mar 21, 2023 at 04:13:12PM -0400, Sean Anderson wrote:
> This adds serdes support to the LS1088ARDB. I have tested the QSGMII
> ports as well as the two 10G ports. The SFP slot is now fully supported,
> instead of being modeled as a fixed-link.
> 
> Linux hangs around when the serdes is initialized if the si5341 is
> enabled with the in-tree driver, so I have modeled it as a two fixed
> clocks instead. There are a few registers in the QIXIS FPGA which
> control the SFP GPIOs; I have modeled them as discrete GPIO controllers
> for now. I never saw the AQR105 interrupt fire; not sure what was going
> on, but I have removed it to force polling.

So you didn't see the interrupt fire even without these patches?

I just tested this on a LS1088ARDB and it works.

root@localhost:~# cat /proc/interrupts | grep extirq
 99:  5  ls-extirq   2 Level 0x08b97000:00
root@localhost:~# ip link set dev endpmac2 up
root@localhost:~# cat /proc/interrupts | grep extirq
 99:  6  ls-extirq   2 Level 0x08b97000:00
root@localhost:~# ip link set dev endpmac2 down
root@localhost:~# cat /proc/interrupts | grep extirq
 99:  7  ls-extirq   2 Level 0x08b97000:00

Please don't just remove things.

Ioana



Re: Memory coherency issue with IO thread offloading?

2023-03-24 Thread Jens Axboe
On 3/24/23 1:27?AM, Christophe Leroy wrote:
> Hi,
> 
> Le 23/03/2023 ? 19:54, Jens Axboe a ?crit :
>> Hi,
>>
>> I got a report sent to me from mariadb, in where 5.10.158 works fine and
>> 5.10.162 is broken. And in fact, current 6.3-rc also fails the test
>> case. Beware that this email is long, as I'm trying to include
>> everything that may be relevant...
> 
> Which variant of powerpc ? 32 or 64 bits ? Book3S or BookE ?

I knew I'd forget something important... It's power9:

processor   : 0
cpu : POWER9 (architected), altivec supported
clock   : 2200.00MHz
revision: 2.2 (pvr 004e 1202)

-- 
Jens Axboe



Re: [PATCH v8 1/3] riscv: Introduce CONFIG_RELOCATABLE

2023-03-24 Thread Alexandre Ghiti

Hi Nick,

On 3/22/23 19:25, Nick Desaulniers wrote:

On Fri, Feb 24, 2023 at 7:58 AM Björn Töpel  wrote:

Alexandre Ghiti  writes:


+cc linux-kbuild, llvm, Nathan, Nick

On 2/15/23 15:36, Alexandre Ghiti wrote:

From: Alexandre Ghiti 


I tried a lot of things, but I struggle to understand, does anyone have
any idea? FYI, the same problem happens with LLVM.

Off the top of my head, no idea.

(Maybe as a follow up to this series, I wonder if pursuing
ARCH_HAS_RELR for ARCH=riscv is worthwhile?)



IIUC, the goal for using RELR is to reduce the size of a kernel image: 
right now, this is not my priority, but I'll add that to my todo list 
because that may be useful to distros.






Don't ask me *why*, but adding --emit-relocs to your linker flags solves
"the NULL .rela.dyn" both for GCC and LLVM.

The downside is that you end up with a bunch of .rela cruft in your
vmlinux.

There was a patch just this week to use $(OBJCOPY) to strip these from
vmlinux (for x86). Looks like x86 uses --emit-relocs for KASLR:
https://lore.kernel.org/lkml/20230320121006.4863-1-petr.pa...@suse.com/



That's nice, that would be an interesting intermediate step until we 
find the issue here as I believe it is important to have the relocations 
in the init section to save memory.


Thanks for your answer Nick, really appreciated,

Alex




Re: [PATCH] crypto: p10-aes-gcm - remove duplicate include header

2023-03-24 Thread Herbert Xu
On Tue, Mar 14, 2023 at 04:31:51PM +0800, ye.xingc...@zte.com.cn wrote:
> From: Ye Xingchen 
> 
> crypto/algapi.h is included more than once.
> 
> Signed-off-by: Ye Xingchen 
> ---
>  arch/powerpc/crypto/aes-gcm-p10-glue.c | 1 -
>  1 file changed, 1 deletion(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v7 6/6] PCI: Make use of pci_resource_n()

2023-03-24 Thread Andy Shevchenko
On Fri, Mar 24, 2023 at 10:08:39AM +0100, Philippe Mathieu-Daudé wrote:
> On 23/3/23 18:36, Andy Shevchenko wrote:
> > Replace open-coded implementations of pci_resource_n() in pci.h.

...

> >   #define pci_resource_n(dev, bar)  (&(dev)->resource[(bar)])
> > -#define pci_resource_start(dev, bar)   ((dev)->resource[(bar)].start)
> > -#define pci_resource_end(dev, bar) ((dev)->resource[(bar)].end)
> > -#define pci_resource_flags(dev, bar)   ((dev)->resource[(bar)].flags)
> > -#define pci_resource_len(dev,bar) \
> > -   ((pci_resource_end((dev), (bar)) == 0) ? 0 :\
> > -   \
> > -(pci_resource_end((dev), (bar)) -  \
> > - pci_resource_start((dev), (bar)) + 1))
> > +#define pci_resource_start(dev, bar)   (pci_resource_n(dev, 
> > bar)->start)
> > +#define pci_resource_end(dev, bar) (pci_resource_n(dev, bar)->end)
> > +#define pci_resource_flags(dev, bar)   (pci_resource_n(dev, 
> > bar)->flags)
> > +#define pci_resource_len(dev,bar)  \
> > +   (pci_resource_end((dev), (bar)) ?   \
> > +resource_size(pci_resource_n((dev), (bar))) : 0)
> 
> Seems (to me) more logical to have this patch as "PCI: Introduce
> pci_resource_n()" ordered before your patch #2 "PCI: Introduce
> pci_dev_for_each_resource()".

Either way works for me. Bjorn, what do you like?

> Here as #6 or as #2:
> Reviewed-by: Philippe Mathieu-Daudé 

Thank you!

-- 
With Best Regards,
Andy Shevchenko




Re: [PATCH v7 4/6] EISA: Convert to use less arguments in pci_bus_for_each_resource()

2023-03-24 Thread Andy Shevchenko
On Fri, Mar 24, 2023 at 10:02:15AM +0100, Philippe Mathieu-Daudé wrote:
> On 23/3/23 18:36, Andy Shevchenko wrote:
> > The pci_bus_for_each_resource() can hide the iterator loop since
> > it may be not used otherwise. With this, we may drop that iterator
> > variable definition.
> > 
> > Signed-off-by: Andy Shevchenko 
> > Reviewed-by: Krzysztof Wilczyński 
> > ---
> >   drivers/eisa/pci_eisa.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/eisa/pci_eisa.c b/drivers/eisa/pci_eisa.c
> 
> Since this is *PCI* EISA, could be squashed into previous patch.

I believe it would be better to have them separated.
But if maintainers want to squash, I can do that.

> Reviewed-by: Philippe Mathieu-Daudé 

Thank you!


-- 
With Best Regards,
Andy Shevchenko




Re: [PATCH v3 0/4] Use dma_default_coherent for devicetree default coherency

2023-03-24 Thread Jiaxun Yang



> 2023年3月23日 21:39,Christoph Hellwig  写道:
> 
> On Thu, Mar 23, 2023 at 09:07:31PM +, Jiaxun Yang wrote:
>> 
>> 
>>> 2023年3月23日 07:29,Christoph Hellwig  写道:
>>> 
>>> The series looks fine to me.  How should we merge it?
>> 
>> Perhaps go through dma-mapping tree?
> 
> Is patch a 6.3 candidate or should all of it go into 6.4?

Please leave it for 6.4, as corresponding MIPS arch part will be a part of 6.4.

Thanks
Jiaxun

Re: [PATCH v7 6/6] PCI: Make use of pci_resource_n()

2023-03-24 Thread Philippe Mathieu-Daudé

On 23/3/23 18:36, Andy Shevchenko wrote:

Replace open-coded implementations of pci_resource_n() in pci.h.

Signed-off-by: Andy Shevchenko 
---
  include/linux/pci.h | 14 ++
  1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 70a4684d5f26..9539cf63fe5e 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2006,14 +2006,12 @@ int pci_iobar_pfn(struct pci_dev *pdev, int bar, struct 
vm_area_struct *vma);
   * for accessing popular PCI BAR info
   */
  #define pci_resource_n(dev, bar)  (&(dev)->resource[(bar)])
-#define pci_resource_start(dev, bar)   ((dev)->resource[(bar)].start)
-#define pci_resource_end(dev, bar) ((dev)->resource[(bar)].end)
-#define pci_resource_flags(dev, bar)   ((dev)->resource[(bar)].flags)
-#define pci_resource_len(dev,bar) \
-   ((pci_resource_end((dev), (bar)) == 0) ? 0 :\
-   \
-(pci_resource_end((dev), (bar)) -  \
- pci_resource_start((dev), (bar)) + 1))
+#define pci_resource_start(dev, bar)   (pci_resource_n(dev, bar)->start)
+#define pci_resource_end(dev, bar) (pci_resource_n(dev, bar)->end)
+#define pci_resource_flags(dev, bar)   (pci_resource_n(dev, bar)->flags)
+#define pci_resource_len(dev,bar)  \
+   (pci_resource_end((dev), (bar)) ?   \
+resource_size(pci_resource_n((dev), (bar))) : 0)


Seems (to me) more logical to have this patch as "PCI: Introduce 
pci_resource_n()" ordered before your patch #2 "PCI: Introduce 
pci_dev_for_each_resource()".


Here as #6 or as #2:
Reviewed-by: Philippe Mathieu-Daudé 


Re: [PATCH v7 4/6] EISA: Convert to use less arguments in pci_bus_for_each_resource()

2023-03-24 Thread Philippe Mathieu-Daudé

On 23/3/23 18:36, Andy Shevchenko wrote:

The pci_bus_for_each_resource() can hide the iterator loop since
it may be not used otherwise. With this, we may drop that iterator
variable definition.

Signed-off-by: Andy Shevchenko 
Reviewed-by: Krzysztof Wilczyński 
---
  drivers/eisa/pci_eisa.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/eisa/pci_eisa.c b/drivers/eisa/pci_eisa.c


Since this is *PCI* EISA, could be squashed into previous patch.

Reviewed-by: Philippe Mathieu-Daudé 



Re: [PATCH v7 3/6] PCI: Allow pci_bus_for_each_resource() to take less arguments

2023-03-24 Thread Philippe Mathieu-Daudé

On 23/3/23 18:36, Andy Shevchenko wrote:

Refactor pci_bus_for_each_resource() in the same way as it's done in
pci_dev_for_each_resource() case. This will allow to hide iterator
inside the loop, where it's not used otherwise.

No functional changes intended.

Signed-off-by: Andy Shevchenko 
Reviewed-by: Krzysztof Wilczyński 
---
  drivers/pci/bus.c  |  7 +++
  drivers/pci/hotplug/shpchp_sysfs.c |  8 
  drivers/pci/pci.c  |  3 +--
  drivers/pci/probe.c|  2 +-
  drivers/pci/setup-bus.c| 10 --
  include/linux/pci.h| 17 +
  6 files changed, 26 insertions(+), 21 deletions(-)


Nice.

Reviewed-by: Philippe Mathieu-Daudé 



Re: Memory coherency issue with IO thread offloading?

2023-03-24 Thread Christophe Leroy
Hi,

Le 23/03/2023 à 19:54, Jens Axboe a écrit :
> Hi,
> 
> I got a report sent to me from mariadb, in where 5.10.158 works fine and
> 5.10.162 is broken. And in fact, current 6.3-rc also fails the test
> case. Beware that this email is long, as I'm trying to include
> everything that may be relevant...

Which variant of powerpc ? 32 or 64 bits ? Book3S or BookE ?

Christophe


> 
> The test case in question is pretty simple. On debian testing, do:
> 
> $ sudo apt-get install mariadb-test
> $ cd /usr/share/mysql/mysql-test
> $ ./mtr --mysqld=--innodb-flush-method=fsync 
> --mysqld=--innodb-use-native-aio=1 --vardir=/dev/shm/mysql  --force 
> encryption.innodb_encryption,innodb,undo0 --repeat=200
> 
> and if it fails, you'll see something like:
> 
> encryption.innodb_encryption 'innodb,undo0' [ 6 pass ]   3120
> encryption.innodb_encryption 'innodb,undo0' [ 7 pass ]   3123
> encryption.innodb_encryption 'innodb,undo0' [ 8 pass ]   3042
> encryption.innodb_encryption 'innodb,undo0' [ 9 fail ]
>  Test ended at 2023-03-23 16:55:17
> 
> CURRENT_TEST: encryption.innodb_encryption
> mysqltest: At line 11: query 'SET @start_global_value = 
> @@global.innodb_encryption_threads' failed: ER_UNKNOWN_SYSTEM_VARIABLE 
> (1193): Unknown system variable 'innodb_encryption_threads'
> 
> The result from queries just before the failure was:
> SET @start_global_value = @@global.innodb_encryption_threads;
> 
>   - saving '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/' to 
> '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/'
> ***Warnings generated in error logs during shutdown after running tests: 
> encryption.innodb_encryption
> 
> 2023-03-23 16:55:17 0 [Warning] Plugin 'example_key_management' is of 
> maturity level experimental while the server is stable
> 2023-03-23 16:55:17 0 [ERROR] InnoDB: Database page corruption on disk or a 
> failed read of file './ibdata1' page [page id: space=0, page number=221]. You 
> may have to recover from a backup.
> 
> where data read was not as expected.
> 
> Now, there are a number of io_uring changes between .158 and .162, as it
> includes the backport that brought 5.10-stable into line with what
> 5.15-stable includes. I'll spare you all the digging I did to vet those
> changes, but the key thing is that it STILL happens on 6.3-git on
> powerpc.
> 
> After ruling out many things, one key difference between 158 and 162 is
> that the former offloaded requests that could not be done nonblocking to
> a kthread, and 162 and newer offloads to an IO thread. An IO thread is
> just a normal thread created from the application submitting IO, the
> only difference is that it never exits to userspace. An IO thread has
> the same mm/files/you-name-it from the original task. It really is the
> same as a userspace thread created by the application The switch to IO
> threads was done exactly because of that, rather than rely on a fragile
> scheme of having the kthread worker assume all sorts of identify from
> the original task. surprises if things were missed. This is what caused
> most of the io_uring security issues in the past.
> 
> The IO that mariadb does in this test is pretty simple - a bunch of
> largish buffered writes with IORING_OP_WRITEV, and some smallish (16K)
> buffered reads with IORING_OP_READV.
> 
> Today I finally gave up and ran a basic experiment, which simply
> offloads the writes to a kthread. Since powerpc has an interesting
> memory coherency model, my suspicion was that the work involved with
> switching MMs for the kthread could just be the main difference here.
> The patch is really dumb and simple - rather than queue the write to an
> IO thread, it just offloads it to a kthread that then does
> kthread_use_mm(), perform write with the same write handler,
> kthread_unuse_mm(). AND THIS WORKS! Usually the above mtr test would
> fail in 2..20 loops, I've now done 200 and 500 loops and it's fine.
> 
> Which then leads me to the question, what about the IO thread offload
> makes this fail on powerpc (and no other arch I've tested on, including
> x86/x86-64/aarch64/hppa64)? The offload should be equivalent to having a
> thread in userspace in the application, and having that thread just
> perform the writes. Is there some magic involved with the kthread mm
> use/unuse that makes this sufficiently consistent on powerpc? I've tried
> any mix of isync()/mb and making the flush_dcache_page() unconditionally
> done in the filemap read/write helpers, and it still falls flat on its
> face with the offload to an IO thread.
> 
> I must clearly be missing something here, which is why I'm emailing the
> powerpc Gods for help :-)
>