date:20220224

Re: [PATCH v6 0/4] Add perf interface to expose nvdimm

2022-02-24 Thread Aneesh Kumar K V

On Fri, 2022-02-25 at 12:08 +0530, kajoljain wrote:
> 
> 
> On 2/25/22 11:25, Nageswara Sastry wrote:
> > 
> > 
> > On 17/02/22 10:03 pm, Kajol Jain wrote:
> > > 

> > > 
> > > Changelog
> > 
> > Tested these patches with the automated tests at
> > avocado-misc-tests/perf/perf_nmem.py
> > URL:
> > https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/perf/perf_nmem.py
> > 
> > 
> > 1. On the system where the target node id and the online node id
> > differ, 'cpumask' has no value and those tests fail.
> > 
> > Example:
> > Log from dmesg
> > ...
> > papr_scm ibm,persistent-memory:ibm,pmemory@4413: Region
> > registered
> > with target node 1 and online node 0
> > ...
> 
> Hi Nageswara Sastry,
>    Thanks for testing the patch set. Yes, you are right: when the
> target node id and the online node id differ, which can happen when
> the target node is not online, we can hit this issue. Thanks for
> pointing it out.
> 
> Function dev_to_node will return the node id for a given nvdimm
> device, which can be offline in some scenarios. In that case we
> should use the numa node id returned by the numa_map_to_online_node
> function. If the given node is offline, this function looks up the
> closest online node and returns that node id.
> 
> Can you try the change below and see if you still get this issue?
> Please let me know.
> 
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c
> b/arch/powerpc/platforms/pseries/papr_scm.c
> index bdf2620db461..4dd513d7c029 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -536,7 +536,7 @@ static void papr_scm_pmu_register(struct
> papr_scm_priv *p)
>     PERF_PMU_CAP_NO_EXCLUDE;
> 
>     /*updating the cpumask variable */
> -   nodeid = dev_to_node(&p->pdev->dev);
> +   nodeid = numa_map_to_online_node(dev_to_node(&p->pdev->dev));
>     nd_pmu->arch_cpumask = *cpumask_of_node(nodeid);
> 
> > 

Can you use p->region->numa_node? 

-aneesh

Re: [PATCH v6 0/4] Add perf interface to expose nvdimm

2022-02-24 Thread kajoljain




On 2/25/22 11:25, Nageswara Sastry wrote:
> 
> 
> On 17/02/22 10:03 pm, Kajol Jain wrote:
>> Patchset adds performance stats reporting support for nvdimm.
>> Added interface includes support for pmu register/unregister
>> functions. A structure is added called nvdimm_pmu to be used for
>> adding arch/platform specific data such as cpumask, nvdimm device
>> pointer and pmu event functions like event_init/add/read/del.
>> User could use the standard perf tool to access perf events
>> exposed via pmu.
>>
>> Interface also defines supported event list, config fields for the
>> event attributes and their corresponding bit values which are exported
>> via sysfs. Patch 3 exposes IBM pseries platform nmem* device
>> performance stats using this interface.
>>
>> Result from power9 pseries lpar with 2 nvdimm device:
>>
>> Ex: List all event by perf list
>>
>> command:# perf list nmem
>>
>>    nmem0/cache_rh_cnt/    [Kernel PMU event]
>>    nmem0/cache_wh_cnt/    [Kernel PMU event]
>>    nmem0/cri_res_util/    [Kernel PMU event]
>>    nmem0/ctl_res_cnt/ [Kernel PMU event]
>>    nmem0/ctl_res_tm/  [Kernel PMU event]
>>    nmem0/fast_w_cnt/  [Kernel PMU event]
>>    nmem0/host_l_cnt/  [Kernel PMU event]
>>    nmem0/host_l_dur/  [Kernel PMU event]
>>    nmem0/host_s_cnt/  [Kernel PMU event]
>>    nmem0/host_s_dur/  [Kernel PMU event]
>>    nmem0/med_r_cnt/   [Kernel PMU event]
>>    nmem0/med_r_dur/   [Kernel PMU event]
>>    nmem0/med_w_cnt/   [Kernel PMU event]
>>    nmem0/med_w_dur/   [Kernel PMU event]
>>    nmem0/mem_life/    [Kernel PMU event]
>>    nmem0/poweron_secs/    [Kernel PMU event]
>>    ...
>>    nmem1/mem_life/    [Kernel PMU event]
>>    nmem1/poweron_secs/    [Kernel PMU event]
>>
>> Patch1:
>>  Introduces the nvdimm_pmu structure
>> Patch2:
>>  Adds common interface to add arch/platform specific data
>>  includes nvdimm device pointer, pmu data along with
>>  pmu event functions. It also defines supported event list
>>  and adds attribute groups for format, events and cpumask.
>>  It also adds code for cpu hotplug support.
>> Patch3:
>>  Add code in arch/powerpc/platforms/pseries/papr_scm.c to expose
>>  nmem* pmu. It fills in the nvdimm_pmu structure with pmu name,
>>  capabilities, cpumask and event functions and then registers
>>  the pmu by adding callbacks to register_nvdimm_pmu.
>> Patch4:
>>  Sysfs documentation patch
>>
>> Changelog
> 
> Tested these patches with the automated tests at
> avocado-misc-tests/perf/perf_nmem.py
> URL:
> https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/perf/perf_nmem.py
> 
> 
> 1. On the system where the target node id and the online node id
> differ, 'cpumask' has no value and those tests fail.
> 
> Example:
> Log from dmesg
> ...
> papr_scm ibm,persistent-memory:ibm,pmemory@4413: Region registered
> with target node 1 and online node 0
> ...

Hi Nageswara Sastry,
   Thanks for testing the patch set. Yes, you are right: when the target
node id and the online node id differ, which can happen when the target
node is not online, we can hit this issue. Thanks for pointing it out.

Function dev_to_node will return the node id for a given nvdimm device,
which can be offline in some scenarios. In that case we should use the
numa node id returned by the numa_map_to_online_node function. If the
given node is offline, this function looks up the closest online node
and returns that node id.
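
To illustrate the fallback described above, here is a minimal user-space
sketch. It is not the kernel implementation: MAX_NODES, NO_NODE, the
node_is_online table and the node_dist table are all made up for this
example, chosen to mirror the dmesg log where target node 1 is offline
and node 0 is online.

```c
#include <assert.h>
#include <limits.h>

#define MAX_NODES 4      /* hypothetical system size */
#define NO_NODE   (-1)   /* stands in for NUMA_NO_NODE */

/* Hypothetical topology: only node 0 is online. */
static const int node_is_online[MAX_NODES] = { 1, 0, 0, 0 };

/* Hypothetical symmetric distance table between nodes. */
static const int node_dist[MAX_NODES][MAX_NODES] = {
    { 10, 20, 40, 40 },
    { 20, 10, 40, 40 },
    { 40, 40, 10, 20 },
    { 40, 40, 20, 10 },
};

/* Return node itself if it is online, else the closest online node. */
int map_to_online_node(int node)
{
    int best = node;
    int best_dist = INT_MAX;

    if (node == NO_NODE)
        return node;
    assert(node >= 0 && node < MAX_NODES);
    if (node_is_online[node])
        return node;

    /* Scan online nodes and keep the one with the smallest distance. */
    for (int n = 0; n < MAX_NODES; n++) {
        if (node_is_online[n] && node_dist[node][n] < best_dist) {
            best_dist = node_dist[node][n];
            best = n;
        }
    }
    return best;
}
```

With this table, an offline target node 1 maps to online node 0, so a
cpumask derived from the returned node would not be empty.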

Can you try the change below and see if you still get this issue?
Please let me know.

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c
b/arch/powerpc/platforms/pseries/papr_scm.c
index bdf2620db461..4dd513d7c029 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -536,7 +536,7 @@ static void papr_scm_pmu_register(struct papr_scm_priv *p)
 				PERF_PMU_CAP_NO_EXCLUDE;
 
 	/*updating the cpumask variable */
-	nodeid = dev_to_node(&p->pdev->dev);
+	nodeid = numa_map_to_online_node(dev_to_node(&p->pdev->dev));
 	nd_pmu->arch_cpumask = *cpumask_of_node(nodeid);

Thanks,
Kajol Jain

> 
> tests log:
>  (1/9) perf_nmem.py:perfNMEM.test_pmu_register_dmesg: PASS (1.13 s)
>  (2/9) perf_nmem.py:perfNMEM.test_sysfs: PASS (1.10 s)
>  (3/9) perf_nmem.py:perfNMEM.test_pmu_count: PASS (1.07 s)
>  (4/9)

Re: [PATCH v6 0/4] Add perf interface to expose nvdimm

2022-02-24 Thread Nageswara Sastry





On 17/02/22 10:03 pm, Kajol Jain wrote:

Patchset adds performance stats reporting support for nvdimm.
Added interface includes support for pmu register/unregister
functions. A structure is added called nvdimm_pmu to be used for
adding arch/platform specific data such as cpumask, nvdimm device
pointer and pmu event functions like event_init/add/read/del.
User could use the standard perf tool to access perf events
exposed via pmu.

Interface also defines supported event list, config fields for the
event attributes and their corresponding bit values which are exported
via sysfs. Patch 3 exposes IBM pseries platform nmem* device
performance stats using this interface.

Result from power9 pseries lpar with 2 nvdimm devices:

Ex: List all events with perf list

command:# perf list nmem

   nmem0/cache_rh_cnt/     [Kernel PMU event]
   nmem0/cache_wh_cnt/     [Kernel PMU event]
   nmem0/cri_res_util/     [Kernel PMU event]
   nmem0/ctl_res_cnt/      [Kernel PMU event]
   nmem0/ctl_res_tm/       [Kernel PMU event]
   nmem0/fast_w_cnt/       [Kernel PMU event]
   nmem0/host_l_cnt/       [Kernel PMU event]
   nmem0/host_l_dur/       [Kernel PMU event]
   nmem0/host_s_cnt/       [Kernel PMU event]
   nmem0/host_s_dur/       [Kernel PMU event]
   nmem0/med_r_cnt/        [Kernel PMU event]
   nmem0/med_r_dur/        [Kernel PMU event]
   nmem0/med_w_cnt/        [Kernel PMU event]
   nmem0/med_w_dur/        [Kernel PMU event]
   nmem0/mem_life/         [Kernel PMU event]
   nmem0/poweron_secs/     [Kernel PMU event]
   ...
   nmem1/mem_life/         [Kernel PMU event]
   nmem1/poweron_secs/     [Kernel PMU event]

Patch1:
 Introduces the nvdimm_pmu structure
Patch2:
 Adds common interface to add arch/platform specific data
 includes nvdimm device pointer, pmu data along with
 pmu event functions. It also defines supported event list
 and adds attribute groups for format, events and cpumask.
 It also adds code for cpu hotplug support.
Patch3:
 Add code in arch/powerpc/platforms/pseries/papr_scm.c to expose
 nmem* pmu. It fills in the nvdimm_pmu structure with pmu name,
 capabilities, cpumask and event functions and then registers
 the pmu by adding callbacks to register_nvdimm_pmu.
Patch4:
 Sysfs documentation patch

Changelog


Tested these patches with the automated tests at 
avocado-misc-tests/perf/perf_nmem.py

URL:
https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/perf/perf_nmem.py

1. On the system where the target node id and the online node id
differ, 'cpumask' has no value and those tests fail.


Example:
Log from dmesg
...
papr_scm ibm,persistent-memory:ibm,pmemory@4413: Region registered 
with target node 1 and online node 0

...

tests log:
 (1/9) perf_nmem.py:perfNMEM.test_pmu_register_dmesg: PASS (1.13 s)
 (2/9) perf_nmem.py:perfNMEM.test_sysfs: PASS (1.10 s)
 (3/9) perf_nmem.py:perfNMEM.test_pmu_count: PASS (1.07 s)
 (4/9) perf_nmem.py:perfNMEM.test_all_events: PASS (18.14 s)
 (5/9) perf_nmem.py:perfNMEM.test_all_group_events: PASS (2.18 s)
 (6/9) perf_nmem.py:perfNMEM.test_mixed_events: CANCEL: With single PMU mixed events test is not possible. (1.10 s)
 (7/9) perf_nmem.py:perfNMEM.test_pmu_cpumask: ERROR: invalid literal for int() with base 10: '' (1.10 s)
 (8/9) perf_nmem.py:perfNMEM.test_cpumask: ERROR: invalid literal for int() with base 10: '' (1.10 s)
 (9/9) perf_nmem.py:perfNMEM.test_cpumask_cpu_off: ERROR: invalid literal for int() with base 10: '' (1.07 s)


2. On the system where the target node id and the online node id are
the same, 'cpumask' has a value and those tests pass.


tests log:
 (1/9) perf_nmem.py:perfNMEM.test_pmu_register_dmesg: PASS (1.16 s)
 (2/9) perf_nmem.py:perfNMEM.test_sysfs: PASS (1.10 s)
 (3/9) perf_nmem.py:perfNMEM.test_pmu_count: PASS (1.12 s)
 (4/9) perf_nmem.py:perfNMEM.test_all_events: PASS (18.10 s)
 (5/9) perf_nmem.py:perfNMEM.test_all_group_events: PASS (2.23 s)
 (6/9) perf_nmem.py:perfNMEM.test_mixed_events: CANCEL: With single PMU mixed events test is not possible. (1.13 s)
 (7/9) perf_nmem.py:perfNMEM.test_pmu_cpumask: PASS (1.08 s)
 (8/9) perf_nmem.py:perfNMEM.test_cpumask: PASS (1.09 s)
 (9/9) perf_nmem.py:perfNMEM.test_cpumask_cpu_off: PASS (1.62 s)


---
Resend v5 -> v6
- No logic change, just a rebase to latest upstream and
   tested the patchset.

- Link to the patchset Resend v5: https://lkml.org/lkml/2021/11/15/3979

v5 -> Resend v5
- Resend the patchset

- Link

Re: [PATCH v2] usercopy: Check valid lifetime via stack depth

2022-02-24 Thread Kees Cook

On Thu, Feb 24, 2022 at 08:58:20AM +, David Laight wrote:
> From: Kees Cook
> > Sent: 24 February 2022 06:04
> > 
> > Under CONFIG_HARDENED_USERCOPY=y, when exact stack frame boundary checking
> > is not available (i.e. everything except x86 with FRAME_POINTER), check
> > a stack object as being at least "current depth valid", in the sense
> > that any object within the stack region but not between start-of-stack
> > and current_stack_pointer should be considered unavailable (i.e. its
> > lifetime is from a call no longer present on the stack).
> > 
> ...
> > diff --git a/mm/usercopy.c b/mm/usercopy.c
> > index d0d268135d96..5d28725af95f 100644
> > --- a/mm/usercopy.c
> > +++ b/mm/usercopy.c
> > @@ -22,6 +22,30 @@
> >  #include 
> >  #include "slab.h"
> > 
> > +/*
> > + * Only called if obj is within stack/stackend bounds. Determine if within
> > + * current stack depth.
> > + */
> > +static inline int check_stack_object_depth(const void *obj,
> > +  unsigned long len)
> > +{
> > +#ifdef CONFIG_ARCH_HAS_CURRENT_STACK_POINTER
> > +#ifndef CONFIG_STACK_GROWSUP
> 
> Pointless negation
> 
> > +   const void * const high = stackend;
> > +   const void * const low = (void *)current_stack_pointer;
> > +#else
> > +   const void * const high = (void *)current_stack_pointer;
> > +   const void * const low = stack;
> > +#endif
> > +
> > +   /* Reject: object not within current stack depth. */
> > +   if (obj < low || high < obj + len)
> > +   return BAD_STACK;
> > +
> > +#endif
> > +   return GOOD_STACK;
> > +}
> 
> If the comment at the top of the function is correct then
> only a single test for the correct end of the buffer against
> the current stack pointer is needed.
> Something like:
> #ifdef CONFIG_STACK_GROWSUP
> 	if ((void *)current_stack_pointer < obj + len)
> 		return BAD_STACK;
> #else
> 	if (obj < (void *)current_stack_pointer)
> 		return BAD_STACK;
> #endif
> 	return GOOD_STACK;

Oh, yeah, excellent point. I suspect the compiler would probably
optimize it all away, but yes, this is, in fact, easier to read, and
short enough I should probably just not bother with a separate function.
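
For reference, a user-space sketch of that single-comparison form. This
only models the idea: the enum, the function name and the grows_up flag
are made up for the example, addresses are plain integers, and it
assumes (as the function comment in the patch does) that the object is
already known to lie within the stack region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result codes, named after the ones in the patch. */
enum stack_result { GOOD_STACK, BAD_STACK };

/*
 * obj/len describe an object already known to be inside the stack
 * region; sp is the current stack pointer.
 */
enum stack_result check_stack_depth(uintptr_t obj, uintptr_t len,
                                    uintptr_t sp, bool grows_up)
{
    assert(len <= UINTPTR_MAX - obj);   /* obj + len must not wrap */

    if (grows_up) {
        /* Live data sits below sp, so the object must end by sp. */
        if (sp < obj + len)
            return BAD_STACK;
    } else {
        /* Live data sits at or above sp, so the object must start there. */
        if (obj < sp)
            return BAD_STACK;
    }
    return GOOD_STACK;
}
```

As David notes, whether the boundary case is accepted depends on
exactly where the stack pointer points, so the comparisons here are
illustrative only.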

Thanks!

-Kees

> 
> Although it may depend on exactly where the stack pointer
> points to - especially for GROWSUP.
> 
>   David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 
> 1PT, UK
> Registration No: 1397386 (Wales)
> 

-- 
Kees Cook

Re: [PATCH v2 18/18] uaccess: drop remaining CONFIG_SET_FS users

2022-02-24 Thread Dinh Nguyen





On 2/16/22 07:13, Arnd Bergmann wrote:

From: Arnd Bergmann 

There are no remaining callers of set_fs(), so CONFIG_SET_FS
can be removed globally, along with the thread_info field and
any references to it.

This turns access_ok() into a cheaper check against TASK_SIZE_MAX.

With CONFIG_SET_FS gone, drop all remaining references to
set_fs()/get_fs(), mm_segment_t and uaccess_kernel().

Signed-off-by: Arnd Bergmann 
---
  arch/Kconfig                              |  3 -
  arch/alpha/Kconfig                        |  1 -
  arch/alpha/include/asm/processor.h        |  4 --
  arch/alpha/include/asm/thread_info.h      |  2 -
  arch/alpha/include/asm/uaccess.h          | 19 --
  arch/arc/Kconfig                          |  1 -
  arch/arc/include/asm/segment.h            | 20 ---
  arch/arc/include/asm/thread_info.h        |  3 -
  arch/arc/include/asm/uaccess.h            |  1 -
  arch/arm/lib/uaccess_with_memcpy.c        | 10
  arch/csky/Kconfig                         |  1 -
  arch/csky/include/asm/processor.h         |  2 -
  arch/csky/include/asm/segment.h           | 10
  arch/csky/include/asm/thread_info.h       |  2 -
  arch/csky/include/asm/uaccess.h           |  3 -
  arch/csky/kernel/asm-offsets.c            |  1 -
  arch/h8300/Kconfig                        |  1 -
  arch/h8300/include/asm/processor.h        |  1 -
  arch/h8300/include/asm/segment.h          | 40 -
  arch/h8300/include/asm/thread_info.h      |  3 -
  arch/h8300/kernel/entry.S                 |  1 -
  arch/h8300/kernel/head_ram.S              |  1 -
  arch/h8300/mm/init.c                      |  6 --
  arch/h8300/mm/memory.c                    |  1 -
  arch/hexagon/Kconfig                      |  1 -
  arch/hexagon/include/asm/thread_info.h    |  6 --
  arch/hexagon/kernel/process.c             |  1 -
  arch/microblaze/Kconfig                   |  1 -
  arch/microblaze/include/asm/thread_info.h |  6 --
  arch/microblaze/include/asm/uaccess.h     | 24
  arch/microblaze/kernel/asm-offsets.c      |  1 -
  arch/microblaze/kernel/process.c          |  1 -
  arch/nds32/Kconfig                        |  1 -
  arch/nds32/include/asm/thread_info.h      |  4 --
  arch/nds32/include/asm/uaccess.h          | 15 +
  arch/nds32/kernel/process.c               |  5 +-
  arch/nds32/mm/alignment.c                 |  3 -
  arch/nios2/Kconfig                        |  1 -
  arch/nios2/include/asm/thread_info.h      |  9 ---
  arch/nios2/include/asm/uaccess.h          | 12


For NIOS2:

Acked-by: Dinh Nguyen

Re: [PATCH v2 13/18] uaccess: generalize access_ok()

2022-02-24 Thread Dinh Nguyen





On 2/16/22 07:13, Arnd Bergmann wrote:

From: Arnd Bergmann 

There are many different ways that access_ok() is defined across
architectures, but in the end, they all just compare against the
user_addr_max() value or they accept anything.

Provide one definition that works for most architectures, checking
against TASK_SIZE_MAX for user processes or skipping the check inside
of uaccess_kernel() sections.

For architectures without CONFIG_SET_FS, this should be the fastest
check, as it comes down to a single comparison of a pointer against a
compile-time constant, while the architecture specific versions tend to
do something more complex for historic reasons or get something wrong.

Type checking for __user annotations is handled inconsistently across
architectures, but this is easily simplified as well by using an inline
function that takes a 'const void __user *' argument. A handful of
callers need an extra __user annotation for this.

Some architectures had a trick to use 33-bit or 65-bit arithmetic on the
addresses to calculate the overflow, however this simpler version uses
fewer registers, which means it can produce better object code in the
end despite needing a second (statically predicted) branch.

Reviewed-by: Christoph Hellwig 
Acked-by: Mark Rutland  [arm64, asm-generic]
Signed-off-by: Arnd Bergmann 
---
  arch/Kconfig                          |  7
  arch/alpha/include/asm/uaccess.h      | 34 +++
  arch/arc/include/asm/uaccess.h        | 29 -
  arch/arm/include/asm/uaccess.h        | 20 +
  arch/arm64/include/asm/uaccess.h      | 11 ++---
  arch/csky/include/asm/uaccess.h       |  8
  arch/hexagon/include/asm/uaccess.h    | 25
  arch/ia64/include/asm/uaccess.h       |  5 +--
  arch/m68k/Kconfig.cpu                 |  1 +
  arch/m68k/include/asm/uaccess.h       | 19 +
  arch/microblaze/include/asm/uaccess.h |  8 +---
  arch/mips/include/asm/uaccess.h       | 29 +
  arch/nds32/include/asm/uaccess.h      |  7 +---
  arch/nios2/include/asm/uaccess.h      | 11 +


Acked-by: Dinh Nguyen
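
As a rough illustration of the single-limit check described in the
commit message above, here is a user-space sketch. It is not the kernel
code: example_access_ok and EXAMPLE_TASK_SIZE_MAX are hypothetical
names, and the limit value is arbitrary since the real TASK_SIZE_MAX is
architecture specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit; the real TASK_SIZE_MAX differs per architecture. */
#define EXAMPLE_TASK_SIZE_MAX UINT64_C(0x80000000)

/* Return true if [addr, addr + size) lies entirely below the limit. */
bool example_access_ok(uint64_t addr, uint64_t size)
{
    const uint64_t limit = EXAMPLE_TASK_SIZE_MAX;

    /*
     * Checking size first guarantees that limit - size cannot wrap, so
     * the range check needs no wider (33-bit or 65-bit) arithmetic.
     */
    return size <= limit && addr <= limit - size;
}
```

The two comparisons correspond to the "second branch" the commit
message mentions; neither can overflow, whatever addr and size are.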

Re: [PATCH v2 12/18] uaccess: fix type mismatch warnings from access_ok()

2022-02-24 Thread Dinh Nguyen





On 2/16/22 07:13, Arnd Bergmann wrote:

From: Arnd Bergmann 

On some architectures, access_ok() does not do any argument type
checking, so replacing the definition with a generic one causes
a few warnings for harmless issues that were never caught before.

Fix the ones that I found either through my own test builds or
that were reported by the 0-day bot.

Reported-by: kernel test robot 
Signed-off-by: Arnd Bergmann 
---
  arch/arc/kernel/process.c           |  2 +-
  arch/arm/kernel/swp_emulate.c       |  2 +-
  arch/arm/kernel/traps.c             |  2 +-
  arch/csky/kernel/signal.c           |  2 +-
  arch/mips/sibyte/common/sb_tbprof.c |  6 +++---
  arch/nios2/kernel/signal.c          | 20 +++-


Acked-by: Dinh Nguyen

Re: [PATCH v2 02/18] uaccess: fix nios2 and microblaze get_user_8()

2022-02-24 Thread Dinh Nguyen





On 2/16/22 07:13, Arnd Bergmann wrote:

From: Arnd Bergmann 

These two architectures implement 8-byte get_user() through
a memcpy() into a four-byte variable, which won't fit.

Use a temporary 64-bit variable instead here, and use a double
cast the way that risc-v and openrisc do to avoid compile-time
warnings.
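
A small user-space sketch of the idea, for illustration only. The
function name is made up and plain memcpy() stands in for the kernel's
user-copy helper: the point is just that an 8-byte copy needs an 8-byte
temporary, whereas a four-byte destination (such as unsigned long on a
32-bit target) would be overrun.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy 8 bytes from src into a 64-bit temporary and return it. The
 * caller can then convert the value to its own type with a cast.
 */
uint64_t example_load_u64(const void *src)
{
    uint64_t tmp = 0;   /* wide enough for the whole 8-byte copy */

    assert(src != NULL);
    memcpy(&tmp, src, sizeof(tmp));
    return tmp;
}
```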

Fixes: 6a090e97972d ("arch/microblaze: support get_user() of size 8 bytes")
Fixes: 5ccc6af5e88e ("nios2: Memory management")
Signed-off-by: Arnd Bergmann 
---
  arch/microblaze/include/asm/uaccess.h | 18 +-
  arch/nios2/include/asm/uaccess.h  | 26 --
  2 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/arch/microblaze/include/asm/uaccess.h 
b/arch/microblaze/include/asm/uaccess.h
index 5b6e0e7788f4..3fe96979d2c6 100644
--- a/arch/microblaze/include/asm/uaccess.h
+++ b/arch/microblaze/include/asm/uaccess.h
@@ -130,27 +130,27 @@ extern long __user_bad(void);
 
 #define __get_user(x, ptr)						\
 ({									\
-	unsigned long __gu_val = 0;					\
 	long __gu_err;							\
 	switch (sizeof(*(ptr))) {					\
 	case 1:								\
-		__get_user_asm("lbu", (ptr), __gu_val, __gu_err);	\
+		__get_user_asm("lbu", (ptr), x, __gu_err);		\
 		break;							\
 	case 2:								\
-		__get_user_asm("lhu", (ptr), __gu_val, __gu_err);	\
+		__get_user_asm("lhu", (ptr), x, __gu_err);		\
 		break;							\
 	case 4:								\
-		__get_user_asm("lw", (ptr), __gu_val, __gu_err);	\
+		__get_user_asm("lw", (ptr), x, __gu_err);		\
 		break;							\
-	case 8:								\
-		__gu_err = __copy_from_user(&__gu_val, ptr, 8);		\
-		if (__gu_err)						\
-			__gu_err = -EFAULT;				\
+	case 8: {							\
+		__u64 __x = 0;						\
+		__gu_err = raw_copy_from_user(&__x, ptr, 8) ?		\
+							-EFAULT : 0;	\
+		(x) = (typeof(x))(typeof((x) - (x)))__x;		\
 		break;							\
+	}								\
 	default:							\
 		/* __gu_val = 0; __gu_err = -EINVAL;*/ __gu_err = __user_bad();\
 	}								\
-	x = (__force __typeof__(*(ptr))) __gu_val;			\
 	__gu_err;							\
 })
  
diff --git a/arch/nios2/include/asm/uaccess.h b/arch/nios2/include/asm/uaccess.h
index ba9340e96fd4..ca9285a915ef 100644
--- a/arch/nios2/include/asm/uaccess.h
+++ b/arch/nios2/include/asm/uaccess.h
@@ -88,6 +88,7 @@ extern __must_check long strnlen_user(const char __user *s, long n);
 /* Optimized macros */
 #define __get_user_asm(val, insn, addr, err)				\
 {									\
+	unsigned long __gu_val;						\
 	__asm__ __volatile__(						\
 	"	movi	%0, %3\n"					\
 	"1:   " insn " %1, 0(%2)\n"					\
@@ -96,14 +97,20 @@ extern __must_check long strnlen_user(const char __user *s, long n);
 	"	.section __ex_table,\"a\"\n"				\
 	"	.word 1b, 2b\n"						\
 	"	.previous"						\
-	: "=&r" (err), "=r" (val)					\
+	: "=&r" (err), "=r" (__gu_val)					\
 	: "r" (addr), "i" (-EFAULT));					\
+	val = (__force __typeof__(*(addr)))__gu_val;			\
 }
 
-#define __get_user_unknown(val, size, ptr, err) do {			\
+extern void __get_user_unknown(void);
+
+#define __get_user_8(val, ptr, err) do {				\
+	u64 __val = 0;							\
 	err = 0;

Re: [PATCH v3 1/2] selftest/vm: Add util.h and move helper functions there

2022-02-24 Thread Andrew Morton

On Thu, 17 Feb 2022 14:05:36 +0530 "Aneesh Kumar K.V" wrote:

> Avoid code duplication by adding util.h. No functional change
> in this patch.

Sorry, but changes in linux-next have messed this patch up a bit more
than I'm prepared to fix.  Could you please redo these two against
linux-next after it has settled down a little?  Next week would be
good, thanks.

Re: [PATCH V6 17/20] riscv: compat: vdso: Add setup additional pages implementation

2022-02-24 Thread Guo Ren

On Fri, Feb 25, 2022 at 1:57 AM Palmer Dabbelt  wrote:
>
> On Thu, 24 Feb 2022 00:54:07 PST (-0800), guo...@kernel.org wrote:
> > From: Guo Ren 
> >
> > Reconstruct __setup_additional_pages() by appending vdso info
> > pointer argument to meet compat_vdso_info requirement. And change
> > vm_special_mapping *dm, *cm initialization into static.
> >
> > Signed-off-by: Guo Ren 
> > Signed-off-by: Guo Ren 
> > Reviewed-by: Palmer Dabbelt 
> > Cc: Arnd Bergmann 
> > ---
> >  arch/riscv/include/asm/elf.h |   5 ++
> >  arch/riscv/include/asm/mmu.h |   1 +
> >  arch/riscv/kernel/vdso.c | 103 +++
> >  3 files changed, 74 insertions(+), 35 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
> > index 3a4293dc7229..d87d3bcc758d 100644
> > --- a/arch/riscv/include/asm/elf.h
> > +++ b/arch/riscv/include/asm/elf.h
> > @@ -134,5 +134,10 @@ do {if ((ex).e_ident[EI_CLASS] == ELFCLASS32)  
> >   \
> >  typedef compat_ulong_t    compat_elf_greg_t;
> >  typedef compat_elf_greg_t compat_elf_gregset_t[ELF_NGREG];
> >
> > +extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
> > +   int uses_interp);
> > +#define compat_arch_setup_additional_pages \
> > + compat_arch_setup_additional_pages
> > +
> >  #endif /* CONFIG_COMPAT */
> >  #endif /* _ASM_RISCV_ELF_H */
> > diff --git a/arch/riscv/include/asm/mmu.h b/arch/riscv/include/asm/mmu.h
> > index 0099dc116168..cedcf8ea3c76 100644
> > --- a/arch/riscv/include/asm/mmu.h
> > +++ b/arch/riscv/include/asm/mmu.h
> > @@ -16,6 +16,7 @@ typedef struct {
> >   atomic_long_t id;
> >  #endif
> >   void *vdso;
> > + void *vdso_info;
> >  #ifdef CONFIG_SMP
> >   /* A local icache flush is needed before user execution can resume. */
> >   cpumask_t icache_stale_mask;
> > diff --git a/arch/riscv/kernel/vdso.c b/arch/riscv/kernel/vdso.c
> > index a9436a65161a..f864811aa011 100644
> > --- a/arch/riscv/kernel/vdso.c
> > +++ b/arch/riscv/kernel/vdso.c
> > @@ -23,6 +23,9 @@ struct vdso_data {
> >  #endif
> >
> >  extern char vdso_start[], vdso_end[];
> > +#ifdef CONFIG_COMPAT
> > +extern char compat_vdso_start[], compat_vdso_end[];
> > +#endif
> >
> >  enum vvar_pages {
> >   VVAR_DATA_PAGE_OFFSET,
> > @@ -30,6 +33,11 @@ enum vvar_pages {
> >   VVAR_NR_PAGES,
> >  };
> >
> > +enum rv_vdso_map {
> > + RV_VDSO_MAP_VVAR,
> > + RV_VDSO_MAP_VDSO,
> > +};
> > +
> >  #define VVAR_SIZE  (VVAR_NR_PAGES << PAGE_SHIFT)
> >
> >  /*
> > @@ -52,12 +60,6 @@ struct __vdso_info {
> >   struct vm_special_mapping *cm;
> >  };
> >
> > -static struct __vdso_info vdso_info __ro_after_init = {
> > - .name = "vdso",
> > - .vdso_code_start = vdso_start,
> > - .vdso_code_end = vdso_end,
> > -};
> > -
> >  static int vdso_mremap(const struct vm_special_mapping *sm,
> >  struct vm_area_struct *new_vma)
> >  {
> > @@ -66,37 +68,33 @@ static int vdso_mremap(const struct vm_special_mapping 
> > *sm,
> >   return 0;
> >  }
> >
> > -static int __init __vdso_init(void)
> > +static void __init __vdso_init(struct __vdso_info *vdso_info)
> >  {
> >   unsigned int i;
> >   struct page **vdso_pagelist;
> >   unsigned long pfn;
> >
> > - if (memcmp(vdso_info.vdso_code_start, "\177ELF", 4)) {
> > - pr_err("vDSO is not a valid ELF object!\n");
> > - return -EINVAL;
> > - }
> > + if (memcmp(vdso_info->vdso_code_start, "\177ELF", 4))
> > + panic("vDSO is not a valid ELF object!\n");
> >
> > - vdso_info.vdso_pages = (
> > - vdso_info.vdso_code_end -
> > - vdso_info.vdso_code_start) >>
> > + vdso_info->vdso_pages = (
> > + vdso_info->vdso_code_end -
> > + vdso_info->vdso_code_start) >>
> >   PAGE_SHIFT;
> >
> > - vdso_pagelist = kcalloc(vdso_info.vdso_pages,
> > + vdso_pagelist = kcalloc(vdso_info->vdso_pages,
> >   sizeof(struct page *),
> >   GFP_KERNEL);
> >   if (vdso_pagelist == NULL)
> > - return -ENOMEM;
> > + panic("vDSO kcalloc failed!\n");
> >
> >   /* Grab the vDSO code pages. */
> > - pfn = sym_to_pfn(vdso_info.vdso_code_start);
> > + pfn = sym_to_pfn(vdso_info->vdso_code_start);
> >
> > - for (i = 0; i < vdso_info.vdso_pages; i++)
> > + for (i = 0; i < vdso_info->vdso_pages; i++)
> >   vdso_pagelist[i] = pfn_to_page(pfn + i);
> >
> > - vdso_info.cm->pages = vdso_pagelist;
> > -
> > - return 0;
> > + vdso_info->cm->pages = vdso_pagelist;
> >  }
> >
> >  #ifdef CONFIG_TIME_NS
> > @@ -116,13 +114,14 @@ int vdso_join_timens(struct task_struct *task, struct 
> > time_namespace *ns)
> >  {
> >   struct mm_struct *mm = task->mm;
> >   struct vm_area_struct *vma;
> > + struct

Re: [PATCH v2 02/13] tracing: Fix selftest config check for function graph start up test

2022-02-24 Thread Michael Ellerman

Steven Rostedt  writes:
> On Thu, 24 Feb 2022 13:43:02 +
> Christophe Leroy  wrote:
>
>> Hi Michael,
>> 
>> Le 20/12/2021 à 17:38, Christophe Leroy a écrit :
>> > CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS is required to test
>> > direct tramp.
>> > 
>> > Signed-off-by: Christophe Leroy   
>> 
>> You didn't apply this patch when you merged the series. Without it I get 
>> the following :
>
> Maybe they wanted my acked-by.

Yeah, I didn't want to take it via my tree without an ack. I meant to
reply to the patch saying that but ...

> But I'm working on a series to send to Linus. I can pick this patch up, as
> it touches just my code.

Thanks.

cheers

[PATCH V2] platforms/83xx: Use of_device_get_match_data()

2022-02-24 Thread cgel . zte

From: Minghao Chi (CGEL ZTE) 

Use of_device_get_match_data() to simplify the code.
v1->v2:
Add a NULL check on the return value of of_device_get_match_data()

Reported-by: Zeal Robot 
Signed-off-by: Minghao Chi (CGEL ZTE) 
---
 arch/powerpc/platforms/83xx/suspend.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/83xx/suspend.c 
b/arch/powerpc/platforms/83xx/suspend.c
index bb147d34d4a6..6d47a5b81485 100644
--- a/arch/powerpc/platforms/83xx/suspend.c
+++ b/arch/powerpc/platforms/83xx/suspend.c
@@ -322,18 +322,15 @@ static const struct platform_suspend_ops 
mpc83xx_suspend_ops = {
 static const struct of_device_id pmc_match[];
 static int pmc_probe(struct platform_device *ofdev)
 {
-   const struct of_device_id *match;
struct device_node *np = ofdev->dev.of_node;
struct resource res;
const struct pmc_type *type;
int ret = 0;
 
-	match = of_match_device(pmc_match, &ofdev->dev);
-	if (!match)
+	type = of_device_get_match_data(&ofdev->dev);
+	if (!type)
return -EINVAL;
 
-   type = match->data;
-
if (!of_device_is_available(np))
return -ENODEV;
 
-- 
2.25.1

Re: [PATCH 2/3] powerpc: fix build errors

2022-02-24 Thread Nicholas Piggin

Excerpts from Segher Boessenkool's message of February 25, 2022 3:12 am:
> On Thu, Feb 24, 2022 at 03:05:28PM +1000, Nicholas Piggin wrote:
>> + * gcc 10 started to emit a .machine directive at the beginning of generated
>> + * .s files, which overrides assembler -Wa,-m options passed down.
>> + * Unclear if this behaviour will be reverted.
> 
> It will not be reverted.  If you need a certain .machine for some asm
> code, you should write just that!

It should be reverted because it breaks old binutils which did not have
the workaround patch for this broken gcc behaviour. And it is just
unnecessary because -m option can already be used to do the same thing.

Not that I expect gcc to revert it.

> 
>> +#ifdef CONFIG_CC_IS_GCC
>> +#if (GCC_VERSION >= 10)
>> +#if (CONFIG_AS_VERSION == 23800)
>> +asm(".machine any");
>> +#endif
>> +#endif
>> +#endif
>> +#endif /* __ASSEMBLY__ */
> 
> Abusing toplevel asm like this is broken and you *will* end up with
> unhappiness all around.

It actually unbreaks things and reduces my unhappiness. It's only done 
for broken compiler versions and only where as does not have the 
workaround for the breakage.

Thanks,
Nick

Re: [PATCH 2/3] powerpc: fix build errors

2022-02-24 Thread Nicholas Piggin

Excerpts from Segher Boessenkool's message of February 25, 2022 3:29 am:
> On Thu, Feb 24, 2022 at 09:13:25PM +1000, Nicholas Piggin wrote:
>> Excerpts from Arnd Bergmann's message of February 24, 2022 8:20 pm:
>> > Again, there should be a minimum number of those .machine directives
>> > in inline asm as well, which tends to work out fine as long as the
>> > entire kernel is built with the correct -march= option for the minimum
>> > supported CPU, and stays away from inline asm that requires a higher
>> > CPU level.
>> 
>> There's really no advantage to them, and they're ugly and annoying
>> and if we applied the concept consistently for all asm they would grow 
>> to a very large number.
> 
> The advantage is that you get machine code that *works*.  There are
> quite a few mnemonics that translate to different instructions with
> different machine options!  We like to get the intended instructions
> instead of something that depends on what assembler options the user
> has passed behind our backs.
> 
>> The idea they'll give you good static checking just doesn't really
>> pan out.
> 
> That never was a goal of this at all.
> 
> -many was very problematical for GCC itself.  We no longer use it.

You have the wrong context. We're not talking about -many vs .machine
here.

Thanks,
Nick

Re: False positive kmemleak report for dtb properties names on powerpc

2022-02-24 Thread Ariel Marcovitch


Ping :)

On 18/02/2022 21:45, Ariel Marcovitch wrote:

Hello!

I was running a powerpc 32bit kernel (built using the
qemu_ppc_mpc8544ds_defconfig buildroot config, with
DEBUGFS+KMEMLEAK+HIGHMEM enabled in the kernel config) on qemu and
invoked the kmemleak scan (twice; for some reason the first time wasn't
enough).

(Actually the problem will probably reproduce on every ppc kernel with
HIGHMEM enabled, but I only checked this config)

I got 97 leak reports, all similar to the following:

```

unreferenced object 0xc1803840 (size 16):
  comm "swapper", pid 1, jiffies 4294892303 (age 39.320s)
  hex dump (first 16 bytes):
    64 65 76 69 63 65 5f 74 79 70 65 00 00 00 00 00 device_type.
  backtrace:
    [<(ptrval)>] kstrdup+0x40/0x98
    [<(ptrval)>] __of_add_property_sysfs+0xa4/0x10c
    [<(ptrval)>] __of_attach_node_sysfs+0xc0/0x110
    [<(ptrval)>] of_core_init+0xa8/0x15c
    [<(ptrval)>] driver_init+0x24/0x3c
    [<(ptrval)>] kernel_init_freeable+0xb8/0x23c
    [<(ptrval)>] kernel_init+0x24/0x14c
    [<(ptrval)>] ret_from_kernel_thread+0x5c/0x64
```

The objects in the reports are the names of the sysfs files created for
the dtb nodes and properties.

These are definitely not leaked, as they are even visible to the user as
the sysfs file names.


These strings (for dtb properties, in the case of the shown report, but
the case with dtb nodes is very similar) are created in
__of_add_property_sysfs() and the pointer to them is stored in
pp->attr.attr.name (so, actually stored in the memory pointed to by pp).


pp is one of the dtb property objects which are allocated in 
early_init_dt_alloc_memory_arch() in of/fdt.c using memblock_alloc. 
This happens very early, in setup_arch()->unflatten_device_tree().


memblock_alloc lets kmemleak know about the allocated memory using 
kmemleak_alloc_phys (in mm/memblock.c:memblock_alloc_range_nid()).


The problem is with the following code (mm/kmemleak.c):

```c
void __ref kmemleak_alloc_phys(phys_addr_t phys, size_t size, int min_count,
			       gfp_t gfp)
{
	if (!IS_ENABLED(CONFIG_HIGHMEM) || PHYS_PFN(phys) < max_low_pfn)
		kmemleak_alloc(__va(phys), size, min_count, gfp);
}
```

When CONFIG_HIGHMEM is enabled, the pfn of the allocated memory is 
checked against max_low_pfn, to make sure it is not in the HIGHMEM zone.


However, when called through unflatten_device_tree(), max_low_pfn is
not yet initialized on powerpc.


max_low_pfn is initialized (when NUMA is disabled) in 
arch/powerpc/mm/mem.c:mem_topology_setup() which is called only after 
unflatten_device_tree() is called in the same function (setup_arch()).


Because max_low_pfn is a global, it is 0 before initialization, so as
far as kmemleak_alloc_phys() is concerned, all memory is HIGHMEM (: and
the allocated memory is not tracked by kmemleak. As a result, references
to objects allocated later with kmalloc() are ignored, and those objects
are reported as leaked.
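
To illustrate the ordering problem, here is a minimal user-space sketch
of the HIGHMEM check. It is not kernel code: PAGE_SHIFT, the max_low_pfn
stand-in and the would_track() helper are assumptions made up for this
example; only the comparison itself is taken from kmemleak_alloc_phys().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size of 4 KiB, for illustration only. */
#define PAGE_SHIFT 12
#define PHYS_PFN(phys) ((unsigned long)((phys) >> PAGE_SHIFT))

/*
 * Stand-in for the kernel global. Like any global without an
 * initializer it starts out as 0 until arch setup code assigns it.
 */
static unsigned long max_low_pfn;

/*
 * Hypothetical helper: the condition kmemleak_alloc_phys() applies
 * when CONFIG_HIGHMEM is enabled. Returns true if the block would be
 * handed to kmemleak_alloc().
 */
static bool would_track(uint64_t phys)
{
	return PHYS_PFN(phys) < max_low_pfn;
}
```

While max_low_pfn is still 0, would_track() returns false for every
address, so nothing allocated that early is tracked. Once max_low_pfn is
set, addresses below the limit pass the check.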


I actually tried to find out whether this happens on other arches as
well, and it seems like arm64 also has this problem when dtb is used
instead of acpi, although I haven't had the chance to confirm this.


I don't suppose I can just shuffle the calls in setup_arch() around, 
so I wanted to hear your opinions first


Thanks!

[PATCH] powerpc/mm/numa: skip NUMA_NO_NODE onlining in parse_numa_properties()

2022-02-24 Thread Daniel Henrique Barboza

Executing node_set_online() when nid = NUMA_NO_NODE results in
undefined behavior. node_set_online() calls node_set_state(), which
calls __node_set(), which calls set_bit(), and since NUMA_NO_NODE is -1
we end up doing a negative shift operation inside
arch/powerpc/include/asm/bitops.h. This potential UB was detected by
running a kernel with CONFIG_UBSAN.

The behavior was introduced by commit 10f78fd0dabb ("powerpc/numa: Fix a
regression on memoryless node 0"), where the check for nid > 0 was
removed to fix a problem that was happening with nid = 0, but the result
is that now we're trying to online NUMA_NO_NODE nids as well.

Checking for nid >= 0 will allow node 0 to be onlined while avoiding
this UB with NUMA_NO_NODE.

Reported-by: Ping Fang 
Cc: Diego Domingos 
Cc: Aneesh Kumar K.V 
Cc: Srikar Dronamraju 
Fixes: 10f78fd0dabb ("powerpc/numa: Fix a regression on memoryless node 0")
Signed-off-by: Daniel Henrique Barboza 
---
 arch/powerpc/mm/numa.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 9d5f710d2c20..b9b7fefbb64b 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -956,7 +956,9 @@ static int __init parse_numa_properties(void)
of_node_put(cpu);
}
 
-   node_set_online(nid);
+   /* node_set_online() is an UB if 'nid' is negative */
+   if (likely(nid >= 0))
+   node_set_online(nid);
}
 
get_n_mem_cells(_mem_addr_cells, _mem_size_cells);
-- 
2.35.1
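
The guard added by the patch above can be sketched in user space as
follows. This is only an illustration under stated assumptions:
node_online_bits, try_set_node_online(), node_is_online() and the
MAX_NUMNODES value are hypothetical stand-ins, not the kernel's nodemask
API, and the upper-bound check is an extra that the patch itself does
not add.

```c
#include <limits.h>
#include <stdbool.h>

#define NUMA_NO_NODE	(-1)
#define MAX_NUMNODES	256	/* assumed value for the example */
#define BITS_PER_LONG	((int)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical stand-in for the online node bitmap. */
static unsigned long node_online_bits[(MAX_NUMNODES + BITS_PER_LONG - 1) /
				      BITS_PER_LONG];

/*
 * Set the bit for 'nid' only when it is a valid index. Shifting by a
 * negative amount is undefined behavior in C, so NUMA_NO_NODE (-1)
 * must be rejected before the shift is evaluated.
 */
static bool try_set_node_online(int nid)
{
	if (nid < 0 || nid >= MAX_NUMNODES)
		return false;

	node_online_bits[nid / BITS_PER_LONG] |= 1UL << (nid % BITS_PER_LONG);
	return true;
}

/* Report whether the bit for 'nid' is set; invalid ids are never online. */
static bool node_is_online(int nid)
{
	if (nid < 0 || nid >= MAX_NUMNODES)
		return false;

	return node_online_bits[nid / BITS_PER_LONG] &
	       (1UL << (nid % BITS_PER_LONG));
}
```

As in the patch, node 0 is still accepted, while NUMA_NO_NODE never
reaches the shift.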

Re: [PATCH 00/16] Remove usage of the deprecated "pci-dma-compat.h" API

2022-02-24 Thread Christophe JAILLET




On 24/02/2022 at 08:07, Arnd Bergmann wrote:

On Thu, Feb 24, 2022 at 7:25 AM Christoph Hellwig  wrote:

On Wed, Feb 23, 2022 at 09:26:56PM +0100, Christophe JAILLET wrote:

Patch 01, 04, 05, 06, 08, 09 have not reached -next yet.
They all still apply cleanly.

04 has been picked up for inclusion in the media subsystem for 5.18.
The other ones all have 1 or more Reviewed-by:/Acked-by: tags.

Patch 16 must be resubmitted to add "#include " in
order not to break builds.

So how about this:  I'll pick up 1, 5, 6, 8 and 9 for the dma-mapping
tree.  After -rc1, when presumably all other patches have reached
mainline, you resubmit the one with the added include and we finish this
off?

Sounds good to me as well.

Arnd


This is fine for me.
When all patches have reached -next, I'll re-submit the fixed 16th patch.


Thanks for your assistance in ending this long story :)

CJ

Re: [PATCH V6 17/20] riscv: compat: vdso: Add setup additional pages implementation

2022-02-24 Thread Palmer Dabbelt


On Thu, 24 Feb 2022 00:54:07 PST (-0800), guo...@kernel.org wrote:

From: Guo Ren 

Reconstruct __setup_additional_pages() by appending vdso info
pointer argument to meet compat_vdso_info requirement. And change
vm_special_mapping *dm, *cm initialization into static.

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Reviewed-by: Palmer Dabbelt 
Cc: Arnd Bergmann 
---
 arch/riscv/include/asm/elf.h |   5 ++
 arch/riscv/include/asm/mmu.h |   1 +
 arch/riscv/kernel/vdso.c | 103 +++
 3 files changed, 74 insertions(+), 35 deletions(-)

diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
index 3a4293dc7229..d87d3bcc758d 100644
--- a/arch/riscv/include/asm/elf.h
+++ b/arch/riscv/include/asm/elf.h
@@ -134,5 +134,10 @@ do {if ((ex).e_ident[EI_CLASS] == ELFCLASS32)  \
 typedef compat_ulong_t compat_elf_greg_t;
 typedef compat_elf_greg_t  compat_elf_gregset_t[ELF_NGREG];

+extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
+ int uses_interp);
+#define compat_arch_setup_additional_pages \
+   compat_arch_setup_additional_pages
+
 #endif /* CONFIG_COMPAT */
 #endif /* _ASM_RISCV_ELF_H */
diff --git a/arch/riscv/include/asm/mmu.h b/arch/riscv/include/asm/mmu.h
index 0099dc116168..cedcf8ea3c76 100644
--- a/arch/riscv/include/asm/mmu.h
+++ b/arch/riscv/include/asm/mmu.h
@@ -16,6 +16,7 @@ typedef struct {
atomic_long_t id;
 #endif
void *vdso;
+   void *vdso_info;
 #ifdef CONFIG_SMP
/* A local icache flush is needed before user execution can resume. */
cpumask_t icache_stale_mask;
diff --git a/arch/riscv/kernel/vdso.c b/arch/riscv/kernel/vdso.c
index a9436a65161a..f864811aa011 100644
--- a/arch/riscv/kernel/vdso.c
+++ b/arch/riscv/kernel/vdso.c
@@ -23,6 +23,9 @@ struct vdso_data {
 #endif

 extern char vdso_start[], vdso_end[];
+#ifdef CONFIG_COMPAT
+extern char compat_vdso_start[], compat_vdso_end[];
+#endif

 enum vvar_pages {
VVAR_DATA_PAGE_OFFSET,
@@ -30,6 +33,11 @@ enum vvar_pages {
VVAR_NR_PAGES,
 };

+enum rv_vdso_map {
+   RV_VDSO_MAP_VVAR,
+   RV_VDSO_MAP_VDSO,
+};
+
 #define VVAR_SIZE  (VVAR_NR_PAGES << PAGE_SHIFT)

 /*
@@ -52,12 +60,6 @@ struct __vdso_info {
struct vm_special_mapping *cm;
 };

-static struct __vdso_info vdso_info __ro_after_init = {
-   .name = "vdso",
-   .vdso_code_start = vdso_start,
-   .vdso_code_end = vdso_end,
-};
-
 static int vdso_mremap(const struct vm_special_mapping *sm,
   struct vm_area_struct *new_vma)
 {
@@ -66,37 +68,33 @@ static int vdso_mremap(const struct vm_special_mapping *sm,
return 0;
 }

-static int __init __vdso_init(void)
+static void __init __vdso_init(struct __vdso_info *vdso_info)
 {
unsigned int i;
struct page **vdso_pagelist;
unsigned long pfn;

-   if (memcmp(vdso_info.vdso_code_start, "\177ELF", 4)) {
-   pr_err("vDSO is not a valid ELF object!\n");
-   return -EINVAL;
-   }
+   if (memcmp(vdso_info->vdso_code_start, "\177ELF", 4))
+   panic("vDSO is not a valid ELF object!\n");

-   vdso_info.vdso_pages = (
-   vdso_info.vdso_code_end -
-   vdso_info.vdso_code_start) >>
+   vdso_info->vdso_pages = (
+   vdso_info->vdso_code_end -
+   vdso_info->vdso_code_start) >>
PAGE_SHIFT;

-   vdso_pagelist = kcalloc(vdso_info.vdso_pages,
+   vdso_pagelist = kcalloc(vdso_info->vdso_pages,
sizeof(struct page *),
GFP_KERNEL);
if (vdso_pagelist == NULL)
-   return -ENOMEM;
+   panic("vDSO kcalloc failed!\n");

/* Grab the vDSO code pages. */
-   pfn = sym_to_pfn(vdso_info.vdso_code_start);
+   pfn = sym_to_pfn(vdso_info->vdso_code_start);

-   for (i = 0; i < vdso_info.vdso_pages; i++)
+   for (i = 0; i < vdso_info->vdso_pages; i++)
vdso_pagelist[i] = pfn_to_page(pfn + i);

-   vdso_info.cm->pages = vdso_pagelist;
-
-   return 0;
+   vdso_info->cm->pages = vdso_pagelist;
 }

 #ifdef CONFIG_TIME_NS
@@ -116,13 +114,14 @@ int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
 {
struct mm_struct *mm = task->mm;
struct vm_area_struct *vma;
+   struct __vdso_info *vdso_info = mm->context.vdso_info;

mmap_read_lock(mm);

for (vma = mm->mmap; vma; vma = vma->vm_next) {
unsigned long size = vma->vm_end - vma->vm_start;

-   if (vma_is_special_mapping(vma, vdso_info.dm))
+   if (vma_is_special_mapping(vma, vdso_info->dm))
zap_page_range(vma, vma->vm_start, size);
}

@@ -187,11 +186,6 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
return

Re: [PATCH 2/3] powerpc: fix build errors

2022-02-24 Thread Segher Boessenkool

On Thu, Feb 24, 2022 at 11:39:16PM +1100, Michael Ellerman wrote:
> > /* Calculate the parity of the value */
> > -   asm ("popcntd %0,%1" : "=r" (parity) : "r" (val));
> > +   asm (".machine \"push\"\n"
> > +".machine \"power7\"\n"
> > +"popcntd %0,%1\n"
> > +".machine \"pop\"\n"
> > +: "=r" (parity) : "r" (val));
> 
> This was actually present in an older CPU, but it doesn't really matter,
> this is fine.

popcntd was new on p7 (popcntb is the older one :-) )  And it does not
matter indeed.


Segher
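
For readers following the thread, the .machine push/pop pattern
discussed here can be sketched as a standalone function. This is a
hedged illustration, not the code from rng.c: parity64() is a
hypothetical name, and the non-powerpc branch uses the GCC/Clang builtin
__builtin_popcountll() purely so the sketch also builds elsewhere.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if 'val' has an odd number of set
 * bits, 0 otherwise.
 */
static inline unsigned long parity64(uint64_t val)
{
	unsigned long count;

#if defined(__powerpc64__)
	/*
	 * Select the machine locally so the assembler accepts popcntd
	 * regardless of the options it was invoked with, then restore
	 * the previous setting.
	 */
	__asm__(".machine push\n\t"
		".machine power7\n\t"
		"popcntd %0,%1\n\t"
		".machine pop"
		: "=r" (count) : "r" (val));
#else
	/* Portable fallback so the example builds on other targets. */
	count = (unsigned long)__builtin_popcountll(val);
#endif
	return count & 1;
}
```

The directives only change what the assembler accepts for the enclosed
instruction; whether the CPU the code runs on actually implements it
still has to be guaranteed by the caller.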

Re: [PATCH 2/3] powerpc: fix build errors

2022-02-24 Thread Segher Boessenkool

On Thu, Feb 24, 2022 at 09:13:25PM +1000, Nicholas Piggin wrote:
> Excerpts from Arnd Bergmann's message of February 24, 2022 8:20 pm:
> > Again, there should be a minimum number of those .machine directives
> > in inline asm as well, which tends to work out fine as long as the
> > entire kernel is built with the correct -march= option for the minimum
> > supported CPU, and stays away from inline asm that requires a higher
> > CPU level.
> 
> There's really no advantage to them, and they're ugly and annoying
> and if we applied the concept consistently for all asm they would grow 
> to a very large number.

The advantage is that you get machine code that *works*.  There are
quite a few mnemonics that translate to different instructions with
different machine options!  We like to get the intended instructions
instead of something that depends on what assembler options the user
has passed behind our backs.

> The idea they'll give you good static checking just doesn't really
> pan out.

That never was a goal of this at all.

-many was very problematical for GCC itself.  We no longer use it.

Segher

Re: [PATCH 2/3] powerpc: fix build errors

2022-02-24 Thread Segher Boessenkool

On Thu, Feb 24, 2022 at 03:05:28PM +1000, Nicholas Piggin wrote:
> + * gcc 10 started to emit a .machine directive at the beginning of generated
> + * .s files, which overrides assembler -Wa,-m options passed down.
> + * Unclear if this behaviour will be reverted.

It will not be reverted.  If you need a certain .machine for some asm
code, you should write just that!

> +#ifdef CONFIG_CC_IS_GCC
> +#if (GCC_VERSION >= 10)
> +#if (CONFIG_AS_VERSION == 23800)
> +asm(".machine any");
> +#endif
> +#endif
> +#endif
> +#endif /* __ASSEMBLY__ */

Abusing toplevel asm like this is broken and you *will* end up with
unhappiness all around.

Segher

Re: cleanup swiotlb initialization

2022-02-24 Thread Boris Ostrovsky




On 2/24/22 11:39 AM, Christoph Hellwig wrote:

On Thu, Feb 24, 2022 at 11:18:33AM -0500, Boris Ostrovsky wrote:

On 2/24/22 10:58 AM, Christoph Hellwig wrote:

Thanks.

This looks really strange as early_amd_iommu_init should not interact much
with the changes.  I'll see if I can find an AMD system to test on.


Just to be clear: this crashes only as dom0. Boots fine as baremetal.

Ah.  I can guess what this might be.  On Xen the hypervisor controls the
IOMMU and we should never end up initializing it in Linux, right?



Right, we shouldn't be in that code path.


-boris

Re: cleanup swiotlb initialization

2022-02-24 Thread Christoph Hellwig

On Thu, Feb 24, 2022 at 11:18:33AM -0500, Boris Ostrovsky wrote:
>
> On 2/24/22 10:58 AM, Christoph Hellwig wrote:
>> Thanks.
>>
>> This looks really strange as early_amd_iommu_init should not interact much
>> with the changes.  I'll see if I can find an AMD system to test on.
>
>
> Just to be clear: this crashes only as dom0. Boots fine as baremetal.

Ah.  I can guess what this might be.  On Xen the hypervisor controls the
IOMMU and we should never end up initializing it in Linux, right?

[PATCHv2 2/3] powerpc: fix build errors

2022-02-24 Thread Anders Roxell

Building tinyconfig with gcc (Debian 11.2.0-16) and assembler (Debian
2.37.90.20220207) the following build error shows up:

 {standard input}: Assembler messages:
 {standard input}:1190: Error: unrecognized opcode: `stbcix'
 {standard input}:1433: Error: unrecognized opcode: `lwzcix'
 {standard input}:1453: Error: unrecognized opcode: `stbcix'
 {standard input}:1460: Error: unrecognized opcode: `stwcix'
 {standard input}:1596: Error: unrecognized opcode: `stbcix'
 ...

Rework to add assembler directives [1] around the instructions. Going
through them one by one shows that the changes should be safe.  For
example, __get_user_atomic_128_aligned() is only called in
p9_hmi_special_emu(), which according to its name is specific to power9,
and __raw_rm_read*() are only called in code that is powernv or
book3s_hv specific.

[1] 
https://sourceware.org/binutils/docs/as/PowerPC_002dPseudo.html#PowerPC_002dPseudo

Cc: 
Co-developed-by: Arnd Bergmann 
Signed-off-by: Arnd Bergmann 
Reviewed-by: Segher Boessenkool 
Signed-off-by: Anders Roxell 
---
 arch/powerpc/include/asm/io.h| 40 ++--
 arch/powerpc/include/asm/uaccess.h   |  3 +++
 arch/powerpc/platforms/powernv/rng.c |  6 -
 3 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index beba4979bff9..fee979d3a1aa 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -359,25 +359,37 @@ static inline void __raw_writeq_be(unsigned long v, volatile void __iomem *addr)
  */
 static inline void __raw_rm_writeb(u8 val, volatile void __iomem *paddr)
 {
-   __asm__ __volatile__("stbcix %0,0,%1"
+   __asm__ __volatile__(".machine push;   \
+ .machine power6; \
+ stbcix %0,0,%1;  \
+ .machine pop;"
: : "r" (val), "r" (paddr) : "memory");
 }
 
 static inline void __raw_rm_writew(u16 val, volatile void __iomem *paddr)
 {
-   __asm__ __volatile__("sthcix %0,0,%1"
+   __asm__ __volatile__(".machine push;   \
+ .machine power6; \
+ sthcix %0,0,%1;  \
+ .machine pop;"
: : "r" (val), "r" (paddr) : "memory");
 }
 
 static inline void __raw_rm_writel(u32 val, volatile void __iomem *paddr)
 {
-   __asm__ __volatile__("stwcix %0,0,%1"
+   __asm__ __volatile__(".machine push;   \
+ .machine power6; \
+ stwcix %0,0,%1;  \
+ .machine pop;"
: : "r" (val), "r" (paddr) : "memory");
 }
 
 static inline void __raw_rm_writeq(u64 val, volatile void __iomem *paddr)
 {
-   __asm__ __volatile__("stdcix %0,0,%1"
+   __asm__ __volatile__(".machine push;   \
+ .machine power6; \
+ stdcix %0,0,%1;  \
+ .machine pop;"
: : "r" (val), "r" (paddr) : "memory");
 }
 
@@ -389,7 +401,10 @@ static inline void __raw_rm_writeq_be(u64 val, volatile void __iomem *paddr)
 static inline u8 __raw_rm_readb(volatile void __iomem *paddr)
 {
u8 ret;
-   __asm__ __volatile__("lbzcix %0,0, %1"
+   __asm__ __volatile__(".machine push;   \
+ .machine power6; \
+ lbzcix %0,0, %1; \
+ .machine pop;"
 : "=r" (ret) : "r" (paddr) : "memory");
return ret;
 }
@@ -397,7 +412,10 @@ static inline u8 __raw_rm_readb(volatile void __iomem *paddr)
 static inline u16 __raw_rm_readw(volatile void __iomem *paddr)
 {
u16 ret;
-   __asm__ __volatile__("lhzcix %0,0, %1"
+   __asm__ __volatile__(".machine push;   \
+ .machine power6; \
+ lhzcix %0,0, %1; \
+ .machine pop;"
 : "=r" (ret) : "r" (paddr) : "memory");
return ret;
 }
@@ -405,7 +423,10 @@ static inline u16 __raw_rm_readw(volatile void __iomem *paddr)
 static inline u32 __raw_rm_readl(volatile void __iomem *paddr)
 {
u32 ret;
-   __asm__ __volatile__("lwzcix %0,0, %1"
+   __asm__ __volatile__(".machine push;   \
+ .machine power6; \
+ lwzcix %0,0, %1; \
+ .machine pop;"
 : "=r" (ret) : "r" (paddr) : "memory");
return ret;
 }
@@ -413,7 +434,10 @@ static inline u32 __raw_rm_readl(volatile void __iomem *paddr)
 static inline u64 __raw_rm_readq(volatile void __iomem *paddr)
 {
u64 ret;
-   __asm__ __volatile__("ldcix %0,0, %1"
+   __asm__ __volatile__(".machine push;   \
+ .machine power6; \
+ ldcix %0,0, %1;  \
+ .machine pop;"

[PATCHv2 1/3] powerpc: lib: sstep: fix 'sthcx' instruction

2022-02-24 Thread Anders Roxell

It looks like there was a copy-paste mistake when the instruction
'stbcx' was added twice; one of them was probably meant to be 'sthcx'.
Change 'stbcx' to 'sthcx'.

Cc:  # v4.13+
Fixes: 350779a29f11 ("powerpc: Handle most loads and stores in instruction emulation code")
Reported-by: Arnd Bergmann 
Signed-off-by: Anders Roxell 
---
 arch/powerpc/lib/sstep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index bd3734d5be89..d2d29243fa6d 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -3389,7 +3389,7 @@ int emulate_loadstore(struct pt_regs *regs, struct instruction_op *op)
__put_user_asmx(op->val, ea, err, "stbcx.", cr);
break;
case 2:
-   __put_user_asmx(op->val, ea, err, "stbcx.", cr);
+   __put_user_asmx(op->val, ea, err, "sthcx.", cr);
break;
 #endif
case 4:
-- 
2.34.1
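
The size-to-instruction mapping that the fix above restores can be
sketched as a small lookup. This is an illustration only:
store_conditional_mnemonic() is a hypothetical helper and is not part of
sstep.c, which passes the mnemonic strings to __put_user_asmx() inside
its own switch statement.

```c
#include <stddef.h>

/*
 * Hypothetical helper: map an access size in bytes to the PowerPC
 * store-conditional mnemonic for that size. Returns NULL for sizes
 * this sketch does not cover.
 */
static const char *store_conditional_mnemonic(int size)
{
	switch (size) {
	case 1:
		return "stbcx.";	/* byte */
	case 2:
		return "sthcx.";	/* halfword; the case fixed by the patch */
	case 4:
		return "stwcx.";	/* word */
	case 8:
		return "stdcx.";	/* doubleword */
	default:
		return NULL;
	}
}
```

Keeping one distinct mnemonic per size makes a duplicated entry, such as
the one the patch corrects, easy to spot.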

[PATCHv2 3/3] powerpc: lib: sstep: fix build errors

2022-02-24 Thread Anders Roxell

Building tinyconfig with gcc (Debian 11.2.0-16) and assembler (Debian
2.37.90.20220207) the following build error shows up:

{standard input}: Assembler messages:
{standard input}:10576: Error: unrecognized opcode: `stbcx.'
{standard input}:10680: Error: unrecognized opcode: `lharx'
{standard input}:10694: Error: unrecognized opcode: `lbarx'

Rework to add assembler directives [1] around the instruction.  The
problem with this might be that we can trick a power6 into
single-stepping through an stbcx. for instance, and it will execute that
in kernel mode.

[1] 
https://sourceware.org/binutils/docs/as/PowerPC_002dPseudo.html#PowerPC_002dPseudo

Cc: 
Co-developed-by: Arnd Bergmann 
Signed-off-by: Arnd Bergmann 
Reviewed-by: Segher Boessenkool 
Signed-off-by: Anders Roxell 
---
 arch/powerpc/lib/sstep.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index d2d29243fa6d..f3ed80513a90 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1097,7 +1097,10 @@ NOKPROBE_SYMBOL(emulate_dcbz);
 
 #define __put_user_asmx(x, addr, err, op, cr)  \
__asm__ __volatile__(   \
+   ".machine push\n"   \
+   ".machine power8\n" \
"1: " op " %2,0,%3\n"   \
+   ".machine pop\n"\
"   mfcr%1\n"   \
"2:\n"  \
".section .fixup,\"ax\"\n"  \
@@ -1110,7 +1113,10 @@ NOKPROBE_SYMBOL(emulate_dcbz);
 
 #define __get_user_asmx(x, addr, err, op)  \
__asm__ __volatile__(   \
+   ".machine push\n"   \
+   ".machine power8\n" \
"1: "op" %1,0,%2\n" \
+   ".machine pop\n"\
"2:\n"  \
".section .fixup,\"ax\"\n"  \
"3: li  %0,%3\n"\
-- 
2.34.1

Re: cleanup swiotlb initialization

2022-02-24 Thread Boris Ostrovsky




On 2/24/22 10:58 AM, Christoph Hellwig wrote:

Thanks.

This looks really strange as early_amd_iommu_init should not interact much
with the changes.  I'll see if I can find an AMD system to test on.



Just to be clear: this crashes only as dom0. Boots fine as baremetal.


-boris




On Wed, Feb 23, 2022 at 07:57:49PM -0500, Boris Ostrovsky wrote:

[   37.377313] BUG: unable to handle page fault for address: c90042880018
[   37.378219] #PF: supervisor read access in kernel mode
[   37.378219] #PF: error_code(0x) - not-present page
[   37.378219] PGD 7c2f2ee067 P4D 7c2f2ee067 PUD 7bf019b067 PMD 105a30067 PTE 0
[   37.378219] Oops:  [#1] PREEMPT SMP NOPTI
[   37.378219] CPU: 14 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc5swiotlb #9
[   37.378219] Hardware name: Oracle Corporation ORACLE SERVER 
E1-2c/ASY,Generic,SM,E1-2c, BIOS 49004900 12/23/2020
[   37.378219] RIP: e030:init_iommu_one+0x248/0x2f0
[   37.378219] Code: 48 89 43 68 48 85 c0 74 c4 be 00 20 00 00 48 89 df e8 ea ee ff 
ff 48 89 43 78 48 85 c0 74 ae c6 83 98 00 00 00 00 48 8b 43 38 <48> 8b 40 18 a8 
01 74 07 83 8b a8 04 00 00 01 f6 83 a8 04 00 00 01
[   37.378219] RSP: e02b:c9004044bd18 EFLAGS: 00010286
[   37.378219] RAX: c9004288 RBX: 888107260800 RCX: 
[   37.378219] RDX: 8000 RSI: ea00041cab80 RDI: 
[   37.378219] RBP: c9004044bd38 R08: 0901 R09: ea00041cab00
[   37.378219] R10: 0002 R11:  R12: c90040435008
[   37.378219] R13: 0008 R14: efa0 R15: 
[   37.378219] FS:  () GS:88fef418() 
knlGS:
[   37.378219] CS:  e030 DS:  ES:  CR0: 80050033
[   37.378219] CR2: c90042880018 CR3: 0260a000 CR4: 00050660
[   37.378219] Call Trace:
[   37.378219]  
[   37.378219]  early_amd_iommu_init+0x3c5/0x72d
[   37.378219]  ? iommu_setup+0x284/0x284
[   37.378219]  state_next+0x158/0x68f
[   37.378219]  ? iommu_setup+0x284/0x284
[   37.378219]  iommu_go_to_state+0x28/0x2d
[   37.378219]  amd_iommu_init+0x15/0x4b
[   37.378219]  ? iommu_setup+0x284/0x284
[   37.378219]  pci_iommu_init+0x12/0x37
[   37.378219]  do_one_initcall+0x48/0x210
[   37.378219]  kernel_init_freeable+0x229/0x28c
[   37.378219]  ? rest_init+0xe0/0xe0
[   37.963966]  kernel_init+0x1a/0x130
[   37.979415]  ret_from_fork+0x22/0x30
[   37.991436]  
[   37.999465] Modules linked in:
[   38.007413] CR2: c90042880018
[   38.019416] ---[ end trace  ]---
[   38.023418] RIP: e030:init_iommu_one+0x248/0x2f0
[   38.023418] Code: 48 89 43 68 48 85 c0 74 c4 be 00 20 00 00 48 89 df e8 ea ee ff 
ff 48 89 43 78 48 85 c0 74 ae c6 83 98 00 00 00 00 48 8b 43 38 <48> 8b 40 18 a8 
01 74 07 83 8b a8 04 00 00 01 f6 83 a8 04 00 00 01
[   38.023418] RSP: e02b:c9004044bd18 EFLAGS: 00010286
[   38.023418] RAX: c9004288 RBX: 888107260800 RCX: 
[   38.155413] RDX: 8000 RSI: ea00041cab80 RDI: 
[   38.175965] Freeing initrd memory: 62640K
[   38.155413] RBP: c9004044bd38 R08: 0901 R09: ea00041cab00
[   38.155413] R10: 0002 R11:  R12: c90040435008
[   38.155413] R13: 0008 R14: efa0 R15: 
[   38.155413] FS:  () GS:88fef418() 
knlGS:
[   38.287414] CS:  e030 DS:  ES:  CR0: 80050033
[   38.309557] CR2: c90042880018 CR3: 0260a000 CR4: 00050660
[   38.332403] Kernel panic - not syncing: Fatal exception
[   38.351414] Rebooting in 20 seconds..



-boris

---end quoted text---

Re: [PATCH 2/3] powerpc: fix build errors

2022-02-24 Thread Anders Roxell

On Thu, 24 Feb 2022 at 13:39, Michael Ellerman  wrote:
>
> Hi Anders,

Hi Michael,

>
> Thanks for these, just a few comments below ...

I will resolve the comments below and resend a v2 shortly.

Cheers,
Anders

>
> Anders Roxell  writes:
> > Building tinyconfig with gcc (Debian 11.2.0-16) and assembler (Debian
> > 2.37.90.20220207) the following build error shows up:
> >
> >  {standard input}: Assembler messages:
> >  {standard input}:1190: Error: unrecognized opcode: `stbcix'
> >  {standard input}:1433: Error: unrecognized opcode: `lwzcix'
> >  {standard input}:1453: Error: unrecognized opcode: `stbcix'
> >  {standard input}:1460: Error: unrecognized opcode: `stwcix'
> >  {standard input}:1596: Error: unrecognized opcode: `stbcix'
> >  ...
> >
> > Rework to add assembler directives [1] around the instruction. Going
> > through them one by one shows that the changes should be safe.  Like
> > __get_user_atomic_128_aligned() is only called in p9_hmi_special_emu(),
> > which according to the name is specific to power9.  And __raw_rm_read*()
> > are only called in things that are powernv or book3s_hv specific.
> >
> > [1] 
> > https://sourceware.org/binutils/docs/as/PowerPC_002dPseudo.html#PowerPC_002dPseudo
> >
> > Cc: 
> > Co-developed-by: Arnd Bergmann 
> > Signed-off-by: Arnd Bergmann 
> > Signed-off-by: Anders Roxell 
> > ---
> >  arch/powerpc/include/asm/io.h| 46 +++-
> >  arch/powerpc/include/asm/uaccess.h   |  3 ++
> >  arch/powerpc/platforms/powernv/rng.c |  6 +++-
> >  3 files changed, 46 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
> > index beba4979bff9..5ff6dec489f8 100644
> > --- a/arch/powerpc/include/asm/io.h
> > +++ b/arch/powerpc/include/asm/io.h
> > @@ -359,25 +359,37 @@ static inline void __raw_writeq_be(unsigned long v, 
> > volatile void __iomem *addr)
> >   */
> >  static inline void __raw_rm_writeb(u8 val, volatile void __iomem *paddr)
> >  {
> > - __asm__ __volatile__("stbcix %0,0,%1"
> > + __asm__ __volatile__(".machine \"push\"\n"
> > +  ".machine \"power6\"\n"
> > +  "stbcix %0,0,%1\n"
> > +  ".machine \"pop\"\n"
> >   : : "r" (val), "r" (paddr) : "memory");
>
> As Segher said it'd be cleaner without the embedded quotes.
>
> > @@ -441,7 +465,10 @@ static inline unsigned int name(unsigned int port) 
> >   \
> >   unsigned int x; \
> >   __asm__ __volatile__(   \
> >   "sync\n"\
> > + ".machine \"push\"\n"   \
> > + ".machine \"power6\"\n" \
> >   "0:"op "%0,0,%1\n"  \
> > + ".machine \"pop\"\n"\
> >   "1: twi 0,%0,0\n"   \
> >   "2: isync\n"\
> >   "3: nop\n"  \
> > @@ -465,7 +492,10 @@ static inline void name(unsigned int val, unsigned int 
> > port) \
> >  {\
> >   __asm__ __volatile__(   \
> >   "sync\n"\
> > + ".machine \"push\"\n"   \
> > + ".machine \"power6\"\n" \
> >   "0:" op " %0,0,%1\n"\
> > + ".machine \"pop\"\n"\
> >   "1: sync\n" \
> >   "2:\n"  \
> >   EX_TABLE(0b, 2b)\
>
> It's not visible from the diff, but the above two are __do_in_asm and
> __do_out_asm and are inside an ifdef CONFIG_PPC32.
>
> AFAICS they're only used for:
>
> __do_in_asm(_rec_inb, "lbzx")
> __do_in_asm(_rec_inw, "lhbrx")
> __do_in_asm(_rec_inl, "lwbrx")
> __do_out_asm(_rec_outb, "stbx")
> __do_out_asm(_rec_outw, "sthbrx")
> __do_out_asm(_rec_outl, "stwbrx")
>
> Which are all old instructions, so I don't think we need the machine
> power6 for those two macros?
>
> > diff --git a/arch/powerpc/platforms/powernv/rng.c 
> > b/arch/powerpc/platforms/powernv/rng.c
> > index b4386714494a..5bf30ef6d928 100644
> > --- a/arch/powerpc/platforms/powernv/rng.c
> > +++ b/arch/powerpc/platforms/powernv/rng.c
> > @@ -43,7 +43,11 @@ static unsigned long rng_whiten(struct powernv_rng *rng, 
> > unsigned long val)
> >   unsigned long parity;
> >
> >   /* Calculate the parity of the value */
> > - asm ("popcntd %0,%1" : "=r" (parity) : "r" (val));
> > + asm (".machine \"push\"\n"
> > +  ".machine \"power7\"\n"
> > +  "popcntd %0,%1\n"
> > +  ".machine \"pop\"\n"
> > +  : "=r" (parity) : "r" (val));
>
> This was actually present in an older CPU, but it doesn't really matter,
> this

Re: cleanup swiotlb initialization

2022-02-24 Thread Christoph Hellwig

Thanks.

This looks really strange as early_amd_iommu_init should not interact much
with the changes.  I'll see if I can find an AMD system to test on.

On Wed, Feb 23, 2022 at 07:57:49PM -0500, Boris Ostrovsky wrote:
> [   37.377313] BUG: unable to handle page fault for address: c90042880018
> [   37.378219] #PF: supervisor read access in kernel mode
> [   37.378219] #PF: error_code(0x) - not-present page
> [   37.378219] PGD 7c2f2ee067 P4D 7c2f2ee067 PUD 7bf019b067 PMD 105a30067 PTE > 0
> [   37.378219] Oops:  [#1] PREEMPT SMP NOPTI
> [   37.378219] CPU: 14 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc5swiotlb #9
> [   37.378219] Hardware name: Oracle Corporation ORACLE SERVER 
> E1-2c/ASY,Generic,SM,E1-2c, BIOS 49004900 12/23/2020
> [   37.378219] RIP: e030:init_iommu_one+0x248/0x2f0
> [   37.378219] Code: 48 89 43 68 48 85 c0 74 c4 be 00 20 00 00 48 89 df e8 ea 
> ee ff ff 48 89 43 78 48 85 c0 74 ae c6 83 98 00 00 00 00 48 8b 43 38 <48> 8b 
> 40 18 a8 01 74 07 83 8b a8 04 00 00 01 f6 83 a8 04 00 00 01
> [   37.378219] RSP: e02b:c9004044bd18 EFLAGS: 00010286
> [   37.378219] RAX: c9004288 RBX: 888107260800 RCX: 
> 
> [   37.378219] RDX: 8000 RSI: ea00041cab80 RDI: 
> 
> [   37.378219] RBP: c9004044bd38 R08: 0901 R09: 
> ea00041cab00
> [   37.378219] R10: 0002 R11:  R12: 
> c90040435008
> [   37.378219] R13: 0008 R14: efa0 R15: 
> 
> [   37.378219] FS:  () GS:88fef418() 
> knlGS:
> [   37.378219] CS:  e030 DS:  ES:  CR0: 80050033
> [   37.378219] CR2: c90042880018 CR3: 0260a000 CR4: 
> 00050660
> [   37.378219] Call Trace:
> [   37.378219]  
> [   37.378219]  early_amd_iommu_init+0x3c5/0x72d
> [   37.378219]  ? iommu_setup+0x284/0x284
> [   37.378219]  state_next+0x158/0x68f
> [   37.378219]  ? iommu_setup+0x284/0x284
> [   37.378219]  iommu_go_to_state+0x28/0x2d
> [   37.378219]  amd_iommu_init+0x15/0x4b
> [   37.378219]  ? iommu_setup+0x284/0x284
> [   37.378219]  pci_iommu_init+0x12/0x37
> [   37.378219]  do_one_initcall+0x48/0x210
> [   37.378219]  kernel_init_freeable+0x229/0x28c
> [   37.378219]  ? rest_init+0xe0/0xe0
> [   37.963966]  kernel_init+0x1a/0x130
> [   37.979415]  ret_from_fork+0x22/0x30
> [   37.991436]  
> [   37.999465] Modules linked in:
> [   38.007413] CR2: c90042880018
> [   38.019416] ---[ end trace  ]---
> [   38.023418] RIP: e030:init_iommu_one+0x248/0x2f0
> [   38.023418] Code: 48 89 43 68 48 85 c0 74 c4 be 00 20 00 00 48 89 df e8 ea 
> ee ff ff 48 89 43 78 48 85 c0 74 ae c6 83 98 00 00 00 00 48 8b 43 38 <48> 8b 
> 40 18 a8 01 74 07 83 8b a8 04 00 00 01 f6 83 a8 04 00 00 01
> [   38.023418] RSP: e02b:c9004044bd18 EFLAGS: 00010286
> [   38.023418] RAX: c9004288 RBX: 888107260800 RCX: 
> 
> [   38.155413] RDX: 8000 RSI: ea00041cab80 RDI: 
> 
> [   38.175965] Freeing initrd memory: 62640K
> [   38.155413] RBP: c9004044bd38 R08: 0901 R09: 
> ea00041cab00
> [   38.155413] R10: 0002 R11:  R12: 
> c90040435008
> [   38.155413] R13: 0008 R14: efa0 R15: 
> 
> [   38.155413] FS:  () GS:88fef418() 
> knlGS:
> [   38.287414] CS:  e030 DS:  ES:  CR0: 80050033
> [   38.309557] CR2: c90042880018 CR3: 0260a000 CR4: 
> 00050660
> [   38.332403] Kernel panic - not syncing: Fatal exception
> [   38.351414] Rebooting in 20 seconds..
>
>
>
> -boris
---end quoted text---

Re: [PATCH v2 02/13] tracing: Fix selftest config check for function graph start up test

2022-02-24 Thread Steven Rostedt

On Thu, 24 Feb 2022 15:13:12 +
Christophe Leroy  wrote:

> > But I'm working on a series to send to Linus. I can pick this patch up, as
> > it touches just my code.
> >   
> 
> That would be great, thanks.

It's in my queue and running through my tests, which take 7 to 13 hours to
complete (depending on the changes).

-- Steve

Re: [PATCH v7 05/14] sizes.h: Add SZ_1T macro

2022-02-24 Thread Lorenzo Pieralisi

On Fri, 21 Jan 2022 08:42:21 +, Christophe Leroy wrote:
> Today drivers/pci/controller/pci-xgene.c defines SZ_1T
> 
> Move it into linux/sizes.h so that it can be re-used elsewhere.
> 
> 

Applied to pci/misc, thanks!

[05/14] sizes.h: Add SZ_1T macro
https://git.kernel.org/lpieralisi/pci/c/0cc62aed37

Thanks,
Lorenzo

Re: [PATCH v2 02/13] tracing: Fix selftest config check for function graph start up test

2022-02-24 Thread Christophe Leroy



On 24/02/2022 at 15:53, Steven Rostedt wrote:
> On Thu, 24 Feb 2022 13:43:02 +
> Christophe Leroy  wrote:
> 
>> Hi Michael,
>>
>> On 20/12/2021 at 17:38, Christophe Leroy wrote:
>>> CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS is required to test
>>> direct tramp.
>>>
>>> Signed-off-by: Christophe Leroy 
>>
>> You didn't apply this patch when you merged the series. Without it I get
>> the following :
> 
> Maybe they wanted my acked-by.
> 
> But I'm working on a series to send to Linus. I can pick this patch up, as
> it touches just my code.
> 

That would be great, thanks.

Christophe

Re: [PATCH v2 02/13] tracing: Fix selftest config check for function graph start up test

2022-02-24 Thread Steven Rostedt

On Thu, 24 Feb 2022 13:43:02 +
Christophe Leroy  wrote:

> Hi Michael,
> 
> On 20/12/2021 at 17:38, Christophe Leroy wrote:
> > CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS is required to test
> > direct tramp.
> > 
> > Signed-off-by: Christophe Leroy   
> 
> You didn't apply this patch when you merged the series. Without it I get 
> the following :

Maybe they wanted my acked-by.

But I'm working on a series to send to Linus. I can pick this patch up, as
it touches just my code.

-- Steve

Re: [PATCH v2 02/13] tracing: Fix selftest config check for function graph start up test

2022-02-24 Thread Christophe Leroy

Hi Michael,

On 20/12/2021 at 17:38, Christophe Leroy wrote:
> CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS is required to test
> direct tramp.
> 
> Signed-off-by: Christophe Leroy 

You didn't apply this patch when you merged the series. Without it I get 
the following :

[6.191287] Testing ftrace recursion: PASSED
[6.473308] Testing ftrace recursion safe: PASSED
[6.755759] Testing ftrace regs: PASSED
[7.037994] Testing tracer nop: PASSED
[7.042256] Testing tracer function_graph: FAILED!
[   12.216112] [ cut here ]
[   12.220436] WARNING: CPU: 0 PID: 1 at kernel/trace/trace.c:1953 
run_tracer_selftest+0x138/0x1b4
[   12.229045] CPU: 0 PID: 1 Comm: swapper Not tainted 
5.17.0-rc2-s3k-dev-02096-g28b040bd2357 #1030
[   12.237735] NIP:  c00d01b4 LR: c00d01b4 CTR: c03d37fc
[   12.242724] REGS: c902bd90 TRAP: 0700   Not tainted 
(5.17.0-rc2-s3k-dev-02096-g28b040bd2357)
[   12.251157] MSR:  00029032   CR: 28000242  XER: 
[   12.257870]
[   12.257870] GPR00: c00d01b4 c902be50 c214 0007 c108d224 
0001 c11ed2e8 c108d340
[   12.257870] GPR08: 3fffbfff  c129beac 3fffc000 22000244 
 c0004b78 
[   12.257870] GPR16:      
  c1039020
[   12.257870] GPR24: c12d c1000144 c1223c48 c12b53c4 c12b55dc 
c1293118 fdf4 c1223c38
[   12.293843] NIP [c00d01b4] run_tracer_selftest+0x138/0x1b4
[   12.299265] LR [c00d01b4] run_tracer_selftest+0x138/0x1b4
[   12.304603] Call Trace:
[   12.307012] [c902be50] [c00d01b4] run_tracer_selftest+0x138/0x1b4 
(unreliable)
[   12.314155] [c902be70] [c100cf44] register_tracer+0x14c/0x218
[   12.319835] [c902be90] [c10011a0] do_one_initcall+0x8c/0x17c
[   12.325430] [c902bef0] [c10014c0] kernel_init_freeable+0x1a8/0x254
[   12.331540] [c902bf20] [c0004ba8] kernel_init+0x30/0x150
[   12.336789] [c902bf30] [c001222c] ret_from_kernel_thread+0x5c/0x64
[   12.342902] Instruction dump:
[   12.345828] 4bf9a135 813d0030 7fc4f378 7d2903a6 7fa3eb78 4e800421 
7c7e1b79 939f0f60
[   12.353657] 41820014 3c60c08a 3863644c 4bf9a109 <0fe0> 387f00b0 
4bff76bd 893d0052
[   12.361659] ---[ end trace  ]---


With the patch I get:

[6.191286] Testing ftrace recursion: PASSED
[6.473307] Testing ftrace recursion safe: PASSED
[6.755758] Testing ftrace regs: PASSED
[7.037993] Testing tracer nop: PASSED
[7.042255] Testing tracer function_graph: PASSED

Is this patch going to be merged via another tree ?

Thanks
Christophe


> ---
>   kernel/trace/trace_selftest.c | 6 ++
>   1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
> index afd937a46496..abcadbe933bb 100644
> --- a/kernel/trace/trace_selftest.c
> +++ b/kernel/trace/trace_selftest.c
> @@ -784,9 +784,7 @@ static struct fgraph_ops fgraph_ops __initdata  = {
>   .retfunc= &trace_graph_return,
>   };
>   
> -#if defined(CONFIG_DYNAMIC_FTRACE) && \
> -defined(CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS)
> -#define TEST_DIRECT_TRAMP
> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>   noinline __noclone static void trace_direct_tramp(void) { }
>   #endif
>   
> @@ -849,7 +847,7 @@ trace_selftest_startup_function_graph(struct tracer 
> *trace,
>   goto out;
>   }
>   
> -#ifdef TEST_DIRECT_TRAMP
> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>   tracing_reset_online_cpus(&tr->array_buffer);
>   set_graph_array(tr);
>

Re: [PATCH] powerpc/32: Clear volatile regs on syscall exit

2022-02-24 Thread Segher Boessenkool

Hi!

On Thu, Feb 24, 2022 at 09:29:55AM +0100, Gabriel Paubert wrote:
> On Wed, Feb 23, 2022 at 05:27:39PM -0600, Segher Boessenkool wrote:
> > On Wed, Feb 23, 2022 at 09:48:09PM +0100, Gabriel Paubert wrote:
> > > On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
> > > > +   /* Zero volatile regs that may contain sensitive kernel data */
> > > > +   li  r0,0
> > > > +   li  r4,0
> > > > +   li  r5,0
> > > > +   li  r6,0
> > > > +   li  r7,0
> > > > +   li  r8,0
> > > > +   li  r9,0
> > > > +   li  r10,0
> > > > +   li  r11,0
> > > > +   li  r12,0
> > > > +   mtctr   r0
> > > > +   mtxer   r0
> > > 
> > > Here, I'm almost sure that on some processors, it would be better to
> > > > separate mtctr from mtxer. mtxer is typically very expensive (pipeline
> > > flush) but I don't know what's the best ordering for the average core.
> > 
> > mtxer is cheaper than mtctr on many cores :-)
> 
> We're speaking of 32 bit here I believe;

32-bit userland, yes.  Which runs fine on non-ancient cores, too.

> on my (admittedly old) paper
> copy of PowerPC 604 user's manual, I read in a footnote:
> 
> "The mtspr (XER) instruction causes instructions to be flushed when it
> executes." 

And the 604 has a trivial depth pipeline anyway.

> I know there are probably very few 604 left in the field, but in this
> case mtspr(xer) looks very much like a superset of isync.

It hasn't been like that for decades.  On the 750 mtxer was execution
synchronised only already, for example.

> I also just had a look at the documentation of a more widespread core:
> 
> https://www.nxp.com/docs/en/reference-manual/MPC7450UM.pdf
> 
> and mtspr(xer) is marked as execution and refetch serialized, actually
> it is the only instruction to have both.

This looks like a late addition (it messes up the table, for example,
being put after "mtspr (other)").  It also is different from 7400 and
750 and everything else.  A late bugfix?  Curious :-)

> Maybe there is a subtle difference between "refetch serialization" and
> "pipeline flush", but in this case please educate me.

There is a subtle difference, but it goes the other way: refetch
serialisation doesn't stop fetch / flush everything after it, only when
the instruction completes it rejects everything after it.  So it can
waste a bit more :-)

> Besides that the back to back mtctr/mtspr(xer) may limit instruction
> decoding and issuing bandwidth.

It doesn't limit decode or dispatch (not issue fwiw) bandwidth on any
core I have ever heard of.

> I'd rather move one of them up by a few
> lines since they can only go to one of the execution units on some
> (or even most?) cores. This was my main point initially.

I think it is much more beneficial to *not* do these insns than to
shift them back and forth a cycle.

Segher

Re: [PATCH 2/3] powerpc: fix build errors

2022-02-24 Thread Michael Ellerman

Hi Anders,

Thanks for these, just a few comments below ...

Anders Roxell  writes:
> Building tinyconfig with gcc (Debian 11.2.0-16) and assembler (Debian
> 2.37.90.20220207) the following build error shows up:
>
>  {standard input}: Assembler messages:
>  {standard input}:1190: Error: unrecognized opcode: `stbcix'
>  {standard input}:1433: Error: unrecognized opcode: `lwzcix'
>  {standard input}:1453: Error: unrecognized opcode: `stbcix'
>  {standard input}:1460: Error: unrecognized opcode: `stwcix'
>  {standard input}:1596: Error: unrecognized opcode: `stbcix'
>  ...
>
> Rework to add assembler directives [1] around the instruction. Going
> through them one by one shows that the changes should be safe.  Like
> __get_user_atomic_128_aligned() is only called in p9_hmi_special_emu(),
> which according to the name is specific to power9.  And __raw_rm_read*()
> are only called in things that are powernv or book3s_hv specific.
>
> [1] 
> https://sourceware.org/binutils/docs/as/PowerPC_002dPseudo.html#PowerPC_002dPseudo
>
> Cc: 
> Co-developed-by: Arnd Bergmann 
> Signed-off-by: Arnd Bergmann 
> Signed-off-by: Anders Roxell 
> ---
>  arch/powerpc/include/asm/io.h| 46 +++-
>  arch/powerpc/include/asm/uaccess.h   |  3 ++
>  arch/powerpc/platforms/powernv/rng.c |  6 +++-
>  3 files changed, 46 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
> index beba4979bff9..5ff6dec489f8 100644
> --- a/arch/powerpc/include/asm/io.h
> +++ b/arch/powerpc/include/asm/io.h
> @@ -359,25 +359,37 @@ static inline void __raw_writeq_be(unsigned long v, 
> volatile void __iomem *addr)
>   */
>  static inline void __raw_rm_writeb(u8 val, volatile void __iomem *paddr)
>  {
> - __asm__ __volatile__("stbcix %0,0,%1"
> + __asm__ __volatile__(".machine \"push\"\n"
> +  ".machine \"power6\"\n"
> +  "stbcix %0,0,%1\n"
> +  ".machine \"pop\"\n"
>   : : "r" (val), "r" (paddr) : "memory");

As Segher said it'd be cleaner without the embedded quotes.

> @@ -441,7 +465,10 @@ static inline unsigned int name(unsigned int port)   
> \
>   unsigned int x; \
>   __asm__ __volatile__(   \
>   "sync\n"\
> + ".machine \"push\"\n"   \
> + ".machine \"power6\"\n" \
>   "0:"op "%0,0,%1\n"  \
> + ".machine \"pop\"\n"\
>   "1: twi 0,%0,0\n"   \
>   "2: isync\n"\
>   "3: nop\n"  \
> @@ -465,7 +492,10 @@ static inline void name(unsigned int val, unsigned int 
> port) \
>  {\
>   __asm__ __volatile__(   \
>   "sync\n"\
> + ".machine \"push\"\n"   \
> + ".machine \"power6\"\n" \
>   "0:" op " %0,0,%1\n"\
> + ".machine \"pop\"\n"\
>   "1: sync\n" \
>   "2:\n"  \
>   EX_TABLE(0b, 2b)\

It's not visible from the diff, but the above two are __do_in_asm and
__do_out_asm and are inside an ifdef CONFIG_PPC32.

AFAICS they're only used for:

__do_in_asm(_rec_inb, "lbzx")
__do_in_asm(_rec_inw, "lhbrx")
__do_in_asm(_rec_inl, "lwbrx")
__do_out_asm(_rec_outb, "stbx")
__do_out_asm(_rec_outw, "sthbrx")
__do_out_asm(_rec_outl, "stwbrx")

Which are all old instructions, so I don't think we need the machine
power6 for those two macros?

> diff --git a/arch/powerpc/platforms/powernv/rng.c 
> b/arch/powerpc/platforms/powernv/rng.c
> index b4386714494a..5bf30ef6d928 100644
> --- a/arch/powerpc/platforms/powernv/rng.c
> +++ b/arch/powerpc/platforms/powernv/rng.c
> @@ -43,7 +43,11 @@ static unsigned long rng_whiten(struct powernv_rng *rng, 
> unsigned long val)
>   unsigned long parity;
>  
>   /* Calculate the parity of the value */
> - asm ("popcntd %0,%1" : "=r" (parity) : "r" (val));
> + asm (".machine \"push\"\n"
> +  ".machine \"power7\"\n"
> +  "popcntd %0,%1\n"
> +  ".machine \"pop\"\n"
> +  : "=r" (parity) : "r" (val));

This was actually present in an older CPU, but it doesn't really matter,
this is fine.
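
For illustration, a sketch of how the rng.c hunk might look without the embedded quotes, assuming the assembler accepts unquoted .machine arguments as suggested above. parity_of() is a hypothetical helper name, not something from the patch, and the non-powerpc branch is there only so the sketch builds on other hosts.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: returns the parity bit
 * (population count modulo 2) of val. On 64-bit powerpc the popcntd
 * instruction is wrapped in .machine push/pop written without embedded
 * quotes; elsewhere a compiler builtin stands in for it.
 */
static inline unsigned long parity_of(uint64_t val)
{
	unsigned long count;

#if defined(__powerpc64__)
	__asm__(".machine push\n"
		".machine power7\n"
		"popcntd %0,%1\n"
		".machine pop\n"
		: "=r" (count) : "r" (val));
#else
	count = (unsigned long)__builtin_popcountll(val);
#endif
	return count & 1;
}
```

The push/pop pair keeps the raised CPU level local to the one instruction, so the rest of the translation unit is still checked against whatever -m option the build passes.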

cheers

Re: [PATCH 2/3] powerpc: fix build errors

2022-02-24 Thread Nicholas Piggin

Excerpts from Arnd Bergmann's message of February 24, 2022 8:20 pm:
> On Thu, Feb 24, 2022 at 11:11 AM Nicholas Piggin  wrote:
>> Excerpts from Arnd Bergmann's message of February 24, 2022 6:55 pm:
>> > On Thu, Feb 24, 2022 at 6:05 AM Nicholas Piggin  wrote:
>> > We had the same thing on Arm a few years ago when binutils
>> > started enforcing this more strictly, and it does catch actual
>> > bugs. I think annotating individual inline asm statements is
>> > the best choice here, as that documents what the intention is.
>>
>> A few cases where there are differences in privileged instructions
>> (that won't be compiler generated), that will be done anyway.
>>
>> For new instructions added to the ISA though? I think it's ugly and
>> unnecessary. There is no ambiguity about the intention when you see
>> a lharx instruction is there?
>>
>> It would delineate instructions that can't be used on all processors
>> but I don't see  much advantage there, it's not an exhaustive check
>> because we have other restrictions on instructions in the kernel
>> environment. And why would inline asm be special but not the rest
>> of the asm? Would you propose to put these .machine directives
>> everywhere in thousands of lines of asm code in the kernel? I
>> don't know that it's an improvement. And inline asm is a small
>> fraction of instructions.
> 
> Most of the code is fine, as we tend to only build .S files that
> are for the given target CPU,

That's not true on powerpc at least. grep FTR_SECTION.

Not all of them are different ISA, but it's more than just the
CPU_FTR_ARCH ones which only started about POWER7.

> the explicit .machine directives are
> only needed when you have a file that mixes instructions for
> incompatible machines, using a runtime detection.

Right. There are .S files in that category. And a lot of
it for inline and .S we probably skirt entirely due to using raw 
instruction encoding because of old toolchains (which gets no error 
checking at all) which we really should tidy up and trim.

> 
>> Right that should be caught if you just pass -m architecture
>> to the assembler that does not include the mtpmr. 32-bit is a lot more
>> complicated than 64s like this though, so it's possible in some cases
>> you will want more checking and -m + some .machine directives
>> will work better.
>>
>> Once you add the .machine directive to your inline asm though, you lose
>> *all* such static checking for the instruction. So it's really not a
>> panacea and has its own downsides.
> 
> Again, there should be a minimum number of those .machine directives
> in inline asm as well, which tends to work out fine as long as the
> entire kernel is built with the correct -march= option for the minimum
> supported CPU, and stays away from inline asm that requires a higher
> CPU level.

There's really no advantage to them, and they're ugly and annoying
and if we applied the concept consistently for all asm they would grow 
to a very large number.

The idea they'll give you good static checking just doesn't really
pan out.

Thanks,
Nick

Re: [PATCH 05/11] swiotlb: pass a gfp_mask argument to swiotlb_init_late

2022-02-24 Thread Anshuman Khandual




On 2/22/22 9:05 PM, Christoph Hellwig wrote:
> Let the caller choose a zone to allocate from.

This is being used later via xen_swiotlb_gfp() on arm platform.

> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/x86/pci/sta2x11-fixup.c | 2 +-
>  include/linux/swiotlb.h  | 2 +-
>  kernel/dma/swiotlb.c | 4 ++--
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/pci/sta2x11-fixup.c b/arch/x86/pci/sta2x11-fixup.c
> index e0c039a75b2db..c7e6faf59a861 100644
> --- a/arch/x86/pci/sta2x11-fixup.c
> +++ b/arch/x86/pci/sta2x11-fixup.c
> @@ -57,7 +57,7 @@ static void sta2x11_new_instance(struct pci_dev *pdev)
>   int size = STA2X11_SWIOTLB_SIZE;
>   /* First instance: register your own swiotlb area */
>   dev_info(>dev, "Using SWIOTLB (size %i)\n", size);
> - if (swiotlb_init_late(size))
> + if (swiotlb_init_late(size, GFP_DMA))
>   dev_emerg(>dev, "init swiotlb failed\n");
>   }
>   list_add(>list, _instance_list);
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index b48b26bfa0edb..1befd6b2ccf5e 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -40,7 +40,7 @@ extern void swiotlb_init(int verbose);
>  int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
>  unsigned long swiotlb_size_or_default(void);
>  extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
> -int swiotlb_init_late(size_t size);
> +int swiotlb_init_late(size_t size, gfp_t gfp_mask);
>  extern void __init swiotlb_update_mem_attributes(void);
>  
>  phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 5f64b02fbb732..a653fcf1fe6c2 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -290,7 +290,7 @@ swiotlb_init(int verbose)
>   * initialize the swiotlb later using the slab allocator if needed.
>   * This should be just like above, but with some error catching.
>   */
> -int swiotlb_init_late(size_t size)
> +int swiotlb_init_late(size_t size, gfp_t gfp_mask)
>  {
>   unsigned long nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
>   unsigned long bytes;
> @@ -309,7 +309,7 @@ int swiotlb_init_late(size_t size)
>   bytes = nslabs << IO_TLB_SHIFT;
>  
>   while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
> - vstart = (void *)__get_free_pages(GFP_DMA | __GFP_NOWARN,
> + vstart = (void *)__get_free_pages(gfp_mask | __GFP_NOWARN,
> order);
>   if (vstart)
>   break;
> 

Reviewed-by: Anshuman Khandual
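
To illustrate the shape of the change: the allocation flags become the caller's decision, and the shrink-and-retry loop only forwards them. The following is a userspace sketch under stated assumptions; fake_get_free_pages(), the FAKE_GFP_* values, alloc_late() and the order limit are all invented for illustration and are not kernel APIs.

```c
#include <stddef.h>
#include <stdlib.h>

#define FAKE_PAGE_SIZE   4096UL
#define FAKE_GFP_DMA     0x01u  /* invented stand-in for a zone selector */
#define FAKE_GFP_NOWARN  0x02u  /* invented stand-in for __GFP_NOWARN */

/* Pretend allocator: refuses anything larger than 2^max_order pages. */
static unsigned int max_order = 3;
static unsigned int last_flags;

static void *fake_get_free_pages(unsigned int flags, unsigned int order)
{
	last_flags = flags;
	if (order > max_order)
		return NULL;
	return malloc(FAKE_PAGE_SIZE << order);
}

/*
 * Try the requested order first and halve the size on each failure,
 * always passing the caller's flags through. Returns the buffer, or
 * NULL once min_order has failed too; *got_order is written only on
 * success.
 */
static void *alloc_late(unsigned int gfp_mask, unsigned int order,
			unsigned int min_order, unsigned int *got_order)
{
	void *vstart;

	for (;;) {
		vstart = fake_get_free_pages(gfp_mask | FAKE_GFP_NOWARN, order);
		if (vstart || order <= min_order)
			break;
		order--;
	}
	if (vstart)
		*got_order = order;
	return vstart;
}
```

A caller that needs a low zone would pass FAKE_GFP_DMA here, while one without that constraint could pass 0 and let the allocator pick.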

Re: [PATCH 04/11] swiotlb: rename swiotlb_late_init_with_default_size

2022-02-24 Thread Anshuman Khandual




On 2/22/22 9:05 PM, Christoph Hellwig wrote:
> swiotlb_late_init_with_default_size is an overly verbose name that
> doesn't even catch what the function is doing, given that the size is
> not just a default but the actual requested size.
> 
> Rename it to swiotlb_init_late.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/x86/pci/sta2x11-fixup.c | 2 +-
>  include/linux/swiotlb.h  | 2 +-
>  kernel/dma/swiotlb.c | 6 ++
>  3 files changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/pci/sta2x11-fixup.c b/arch/x86/pci/sta2x11-fixup.c
> index 101081ad64b6d..e0c039a75b2db 100644
> --- a/arch/x86/pci/sta2x11-fixup.c
> +++ b/arch/x86/pci/sta2x11-fixup.c
> @@ -57,7 +57,7 @@ static void sta2x11_new_instance(struct pci_dev *pdev)
>   int size = STA2X11_SWIOTLB_SIZE;
>   /* First instance: register your own swiotlb area */
>   dev_info(>dev, "Using SWIOTLB (size %i)\n", size);
> - if (swiotlb_late_init_with_default_size(size))
> + if (swiotlb_init_late(size))
>   dev_emerg(>dev, "init swiotlb failed\n");
>   }
>   list_add(>list, _instance_list);
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 9fb3a568f0c51..b48b26bfa0edb 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -40,7 +40,7 @@ extern void swiotlb_init(int verbose);
>  int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
>  unsigned long swiotlb_size_or_default(void);
>  extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
> -extern int swiotlb_late_init_with_default_size(size_t default_size);
> +int swiotlb_init_late(size_t size);
>  extern void __init swiotlb_update_mem_attributes(void);
>  
>  phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 519e363097190..5f64b02fbb732 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -290,11 +290,9 @@ swiotlb_init(int verbose)
>   * initialize the swiotlb later using the slab allocator if needed.
>   * This should be just like above, but with some error catching.
>   */
> -int
> -swiotlb_late_init_with_default_size(size_t default_size)
> +int swiotlb_init_late(size_t size)
>  {
> - unsigned long nslabs =
> - ALIGN(default_size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
> + unsigned long nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
>   unsigned long bytes;
>   unsigned char *vstart = NULL;
>   unsigned int order;
> 

Reviewed-by: Anshuman Khandual
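
As a worked example of why "default" was a misnomer, the size passed in directly determines the slab count. The sketch below is plain userspace C; the values 11 for IO_TLB_SHIFT and 128 for IO_TLB_SEGSIZE are what I believe include/linux/swiotlb.h uses and should be treated as assumptions, and ALIGN_UP()/size_to_nslabs() are hypothetical names.

```c
#include <stddef.h>

#define IO_TLB_SHIFT   11   /* assumed: 2 KiB per slab */
#define IO_TLB_SEGSIZE 128  /* assumed: slabs per segment */

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((unsigned long)(a) - 1))

/* Number of slabs needed for a request of size bytes. */
static unsigned long size_to_nslabs(size_t size)
{
	return ALIGN_UP((unsigned long)(size >> IO_TLB_SHIFT), IO_TLB_SEGSIZE);
}
```

With these values a 64 MiB request maps to 32768 slabs, and anything between 2 KiB and 256 KiB is rounded up to one full segment of 128 slabs.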

Re: [PATCH 03/11] swiotlb: simplify swiotlb_max_segment

2022-02-24 Thread Anshuman Khandual




On 2/22/22 9:05 PM, Christoph Hellwig wrote:
> Remove the bogus Xen override that was usually larger than the actual
> size and just calculate the value on demand.  Note that
> swiotlb_max_segment still doesn't make sense as an interface and should
> eventually be removed.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/xen/swiotlb-xen.c |  2 --
>  include/linux/swiotlb.h   |  1 -
>  kernel/dma/swiotlb.c  | 20 +++-
>  3 files changed, 3 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 47aebd98f52f5..485cd06ed39e7 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -202,7 +202,6 @@ int xen_swiotlb_init(void)
>   rc = swiotlb_late_init_with_tbl(start, nslabs);
>   if (rc)
>   return rc;
> - swiotlb_set_max_segment(PAGE_SIZE);
>   return 0;
>  error:
>   if (nslabs > 1024 && repeat--) {
> @@ -254,7 +253,6 @@ void __init xen_swiotlb_init_early(void)
>  
>   if (swiotlb_init_with_tbl(start, nslabs, true))
>   panic("Cannot allocate SWIOTLB buffer");
> - swiotlb_set_max_segment(PAGE_SIZE);
>  }
>  #endif /* CONFIG_X86 */
>  
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index f6c3638255d54..9fb3a568f0c51 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -164,7 +164,6 @@ static inline void swiotlb_adjust_size(unsigned long size)
>  #endif /* CONFIG_SWIOTLB */
>  
>  extern void swiotlb_print_info(void);
> -extern void swiotlb_set_max_segment(unsigned int);
>  
>  #ifdef CONFIG_DMA_RESTRICTED_POOL
>  struct page *swiotlb_alloc(struct device *dev, size_t size);
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 36fbf1181d285..519e363097190 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -75,12 +75,6 @@ struct io_tlb_mem io_tlb_default_mem;
>  
>  phys_addr_t swiotlb_unencrypted_base;
>  
> -/*
> - * Max segment that we can provide which (if pages are contingous) will
> - * not be bounced (unless SWIOTLB_FORCE is set).
> - */
> -static unsigned int max_segment;
> -
>  static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
>  
>  static int __init
> @@ -104,18 +98,12 @@ early_param("swiotlb", setup_io_tlb_npages);
>  
>  unsigned int swiotlb_max_segment(void)
>  {
> - return io_tlb_default_mem.nslabs ? max_segment : 0;
> + if (!io_tlb_default_mem.nslabs)
> + return 0;
> + return rounddown(io_tlb_default_mem.nslabs << IO_TLB_SHIFT, PAGE_SIZE);
>  }
>  EXPORT_SYMBOL_GPL(swiotlb_max_segment);
>  
> -void swiotlb_set_max_segment(unsigned int val)
> -{
> - if (swiotlb_force == SWIOTLB_FORCE)
> - max_segment = 1;
> - else
> - max_segment = rounddown(val, PAGE_SIZE);
> -}
> -
>  unsigned long swiotlb_size_or_default(void)
>  {
>   return default_nslabs << IO_TLB_SHIFT;
> @@ -267,7 +255,6 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long 
> nslabs, int verbose)
>  
>   if (verbose)
>   swiotlb_print_info();
> - swiotlb_set_max_segment(mem->nslabs << IO_TLB_SHIFT);
>   return 0;
>  }
>  
> @@ -368,7 +355,6 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long 
> nslabs)
>   swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
>  
>   swiotlb_print_info();
> - swiotlb_set_max_segment(mem->nslabs << IO_TLB_SHIFT);
>   return 0;
>  }
>  

Reviewed-by: Anshuman Khandual
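
The on-demand calculation can be tried out in userspace with a sketch like the one below. It assumes 4 KiB pages and IO_TLB_SHIFT of 11; rounddown_ul() and max_segment() are hypothetical stand-ins for the kernel's rounddown() and swiotlb_max_segment().

```c
#define IO_TLB_SHIFT     11      /* assumed: 2 KiB per slab */
#define SKETCH_PAGE_SIZE 4096UL  /* assumed: 4 KiB pages */

/* Round x down to a multiple of y. */
static unsigned long rounddown_ul(unsigned long x, unsigned long y)
{
	return x - (x % y);
}

/* Largest segment derivable from the slab count, or 0 without a pool. */
static unsigned int max_segment(unsigned long nslabs)
{
	if (!nslabs)
		return 0;
	return (unsigned int)rounddown_ul(nslabs << IO_TLB_SHIFT,
					  SKETCH_PAGE_SIZE);
}
```

Under these assumptions a pool of 32768 slabs gives 64 MiB, while a single 2 KiB slab rounds down to 0 because it does not fill a page.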

Re: [PATCH V6 11/20] riscv: compat: syscall: Add compat_sys_call_table implementation

2022-02-24 Thread Guo Ren

On Thu, Feb 24, 2022 at 5:38 PM Arnd Bergmann  wrote:
>
> On Thu, Feb 24, 2022 at 9:54 AM  wrote:
> >
> > From: Guo Ren 
> >
> > Implement compat sys_call_table and some system call functions:
> > truncate64, ftruncate64, fallocate, pread64, pwrite64,
> > sync_file_range, readahead, fadvise64_64 which need argument
> > translation.
> >
> > Signed-off-by: Guo Ren 
> > Signed-off-by: Guo Ren 
> > Cc: Arnd Bergmann 
> > Cc: Palmer Dabbelt 
>
> Here, I was hoping you'd convert some of the other architectures to use
> the same code, but the changes you did do look correct.
>
> Please at least add the missing bit for big-endian architectures here:
>
> +#if !defined(compat_arg_u64) && !defined(CONFIG_CPU_BIG_ENDIAN)
> +#define compat_arg_u64(name)   u32  name##_lo, u32  name##_hi
> +#define compat_arg_u64_dual(name)  u32, name##_lo, u32, name##_hi
> +#define compat_arg_u64_glue(name)  (((u64)name##_hi << 32) | \
> +((u64)name##_lo & 0xffffffffUL))
> +#endif
>
> with the lo/hi words swapped. With that change:
Got it, I would change it in next version of patch.
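
For reference, a userspace sketch of what the two orderings do. glue_u64() follows the shape of the quoted compat_arg_u64_glue(), with the 32-bit mask on the low word being my assumption; from_pair_le() and from_pair_be() are hypothetical helpers that model which of the two argument slots carries which half.

```c
#include <stdint.h>

/* Combine two 32-bit halves into one 64-bit value. */
static inline uint64_t glue_u64(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | ((uint64_t)lo & 0xffffffffULL);
}

/*
 * A 32-bit caller passes a 64-bit argument in two consecutive slots.
 * Little-endian ABIs typically put the low word first and big-endian
 * ABIs the high word, so the parameter order has to be swapped.
 */
static inline uint64_t from_pair_le(uint32_t first, uint32_t second)
{
	return glue_u64(first, second);	/* first = lo, second = hi */
}

static inline uint64_t from_pair_be(uint32_t first, uint32_t second)
{
	return glue_u64(second, first);	/* first = hi, second = lo */
}
```

Only the declaration order of the two halves differs between the variants; the glue expression itself stays the same.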

>
> Reviewed-by: Arnd Bergmann 



-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

Re: [PATCH 02/11] swiotlb: make swiotlb_exit a no-op if SWIOTLB_FORCE is set

2022-02-24 Thread Anshuman Khandual




On 2/22/22 9:05 PM, Christoph Hellwig wrote:
> If force bouncing is enabled we can't release the bufffers.

typo

> 
> Signed-off-by: Christoph Hellwig 
> ---
>  kernel/dma/swiotlb.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index f1e7ea160b433..36fbf1181d285 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -378,6 +378,9 @@ void __init swiotlb_exit(void)
>   unsigned long tbl_vaddr;
>   size_t tbl_size, slots_size;
>  
> + if (swiotlb_force == SWIOTLB_FORCE)
> + return;
> +
>   if (!mem->nslabs)
>   return;
>  
> 

Reviewed-by: Anshuman Khandual

Re: [PATCH 01/11] dma-direct: use is_swiotlb_active in dma_direct_map_page

2022-02-24 Thread Anshuman Khandual




On 2/22/22 9:05 PM, Christoph Hellwig wrote:
> Use the more specific is_swiotlb_active check instead of checking the
> global swiotlb_force variable.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  kernel/dma/direct.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
> index 4632b0f4f72eb..4dc16e08c7e1a 100644
> --- a/kernel/dma/direct.h
> +++ b/kernel/dma/direct.h
> @@ -91,7 +91,7 @@ static inline dma_addr_t dma_direct_map_page(struct device 
> *dev,
>   return swiotlb_map(dev, phys, size, dir, attrs);
>  
>   if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
> - if (swiotlb_force != SWIOTLB_NO_FORCE)
> + if (is_swiotlb_active(dev))
>   return swiotlb_map(dev, phys, size, dir, attrs);
>  
>   dev_WARN_ONCE(dev, 1,
> 

Reviewed-by: Anshuman Khandual

Re: [PATCH 2/3] powerpc: fix build errors

2022-02-24 Thread Arnd Bergmann

On Thu, Feb 24, 2022 at 11:11 AM Nicholas Piggin  wrote:
> Excerpts from Arnd Bergmann's message of February 24, 2022 6:55 pm:
> > On Thu, Feb 24, 2022 at 6:05 AM Nicholas Piggin  wrote:
> > We had the same thing on Arm a few years ago when binutils
> > started enforcing this more strictly, and it does catch actual
> > bugs. I think annotating individual inline asm statements is
> > the best choice here, as that documents what the intention is.
>
> A few cases where there are differences in privileged instructions
> (that won't be compiler generated), that will be done anyway.
>
> For new instructions added to the ISA though? I think it's ugly and
> unnecessary. There is no ambiguity about the intention when you see
> a lharx instruction is there?
>
> It would delineate instructions that can't be used on all processors
> but I don't see  much advantage there, it's not an exhaustive check
> because we have other restrictions on instructions in the kernel
> environment. And why would inline asm be special but not the rest
> of the asm? Would you propose to put these .machine directives
> everywhere in thousands of lines of asm code in the kernel? I
> don't know that it's an improvement. And inline asm is a small
> fraction of instructions.

Most of the code is fine, as we tend to only build .S files that
are for the given target CPU, the explicit .machine directives are
only needed when you have a file that mixes instructions for
incompatible machines, using a runtime detection.

> Right that should be caught if you just pass -m architecture
> to the assembler that does not include the mtpmr. 32-bit is a lot more
> complicated than 64s like this though, so it's possible in some cases
> you will want more checking and -m + some .machine directives
> will work better.
>
> Once you add the .machine directive to your inline asm though, you lose
> *all* such static checking for the instruction. So it's really not a
> panacea and has its own downsides.

Again, there should be a minimum number of those .machine directives
in inline asm as well, which tends to work out fine as long as the
entire kernel is built with the correct -march= option for the minimum
supported CPU, and stays away from inline asm that requires a higher
CPU level.

  Arnd

Re: [PATCH 07/11] x86: remove the IOMMU table infrastructure

2022-02-24 Thread Anshuman Khandual



On 2/22/22 9:05 PM, Christoph Hellwig wrote:
> The IOMMU table tries to separate the different IOMMUs into different
> backends, but actually requires various cross calls.
> 
> Rewrite the code to do the generic swiotlb/swiotlb-xen setup directly
> in pci-dma.c and then just call into the IOMMU drivers.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/ia64/include/asm/iommu_table.h|   7 --
>  arch/x86/include/asm/dma-mapping.h |   1 -
>  arch/x86/include/asm/gart.h|   5 +-
>  arch/x86/include/asm/iommu.h   |   6 ++
>  arch/x86/include/asm/iommu_table.h | 102 --
>  arch/x86/include/asm/swiotlb.h |  30 ---
>  arch/x86/include/asm/xen/swiotlb-xen.h |   2 -
>  arch/x86/kernel/Makefile   |   2 -
>  arch/x86/kernel/amd_gart_64.c  |   5 +-
>  arch/x86/kernel/aperture_64.c  |  14 ++--
>  arch/x86/kernel/pci-dma.c  | 112 -
>  arch/x86/kernel/pci-iommu_table.c  |  77 -
>  arch/x86/kernel/pci-swiotlb.c  |  77 -
>  arch/x86/kernel/tboot.c|   1 -
>  arch/x86/kernel/vmlinux.lds.S  |  12 ---
>  arch/x86/xen/Makefile  |   2 -
>  arch/x86/xen/pci-swiotlb-xen.c |  96 -
>  drivers/iommu/amd/init.c   |   6 --
>  drivers/iommu/amd/iommu.c  |   5 +-
>  drivers/iommu/intel/dmar.c |   6 +-
>  include/linux/dmar.h   |   6 +-
>  21 files changed, 115 insertions(+), 459 deletions(-)
>  delete mode 100644 arch/ia64/include/asm/iommu_table.h
>  delete mode 100644 arch/x86/include/asm/iommu_table.h
>  delete mode 100644 arch/x86/include/asm/swiotlb.h
>  delete mode 100644 arch/x86/kernel/pci-iommu_table.c
>  delete mode 100644 arch/x86/kernel/pci-swiotlb.c
>  delete mode 100644 arch/x86/xen/pci-swiotlb-xen.c

checkpatch.pl has some warnings here.

WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#44: 
deleted file mode 100644

WARNING: Prefer [subsystem eg: netdev]_info([subsystem]dev, ... then 
dev_info(dev, ... then pr_info(...  to printk(KERN_INFO ...
#496: FILE: arch/x86/kernel/pci-dma.c:171:
+   printk(KERN_INFO "PCI-DMA: "

WARNING: quoted string split across lines
#497: FILE: arch/x86/kernel/pci-dma.c:172:
+   printk(KERN_INFO "PCI-DMA: "
+  "Using software bounce buffering for IO (SWIOTLB)\n");

ERROR: trailing whitespace
#881: FILE: drivers/iommu/amd/iommu.c:1837:
+^Iif (iommu_default_passthrough() || sme_me_mask) $

total: 1 errors, 3 warnings, 389 lines checked

Re: [PATCH 10/11] swiotlb: merge swiotlb-xen initialization into swiotlb

2022-02-24 Thread Anshuman Khandual

On 2/22/22 9:05 PM, Christoph Hellwig wrote:
> Allow passing a remap argument to the swiotlb initialization functions
> to handle the Xen/x86 remap case.  ARM/ARM64 never did any remapping
> from xen_swiotlb_fixup, so we don't even need that quirk.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/arm/xen/mm.c   |  23 +++---
>  arch/x86/include/asm/xen/page.h |   5 --
>  arch/x86/kernel/pci-dma.c   |  27 ---
>  arch/x86/pci/sta2x11-fixup.c|   2 +-
>  drivers/xen/swiotlb-xen.c   | 128 +---
>  include/linux/swiotlb.h |   7 +-
>  include/xen/arm/page.h  |   1 -
>  include/xen/swiotlb-xen.h   |   8 +-
>  kernel/dma/swiotlb.c| 120 +++---
>  9 files changed, 102 insertions(+), 219 deletions(-)

checkpatch.pl has some warnings here.

ERROR: trailing whitespace
#151: FILE: arch/x86/kernel/pci-dma.c:217:
+ $

WARNING: please, no spaces at the start of a line
#151: FILE: arch/x86/kernel/pci-dma.c:217:
+ $

total: 1 errors, 1 warnings, 470 lines checked

Re: [PATCH V6 16/20] riscv: compat: vdso: Add rv32 VDSO base code implementation

2022-02-24 Thread Arnd Bergmann

On Thu, Feb 24, 2022 at 9:54 AM  wrote:
>
> From: Guo Ren 
>
> There is no vgettimeofday support in rv32, which makes it simple to
> generate rv32 vdso code that only needs the riscv64 compiler. Other
> architectures need to change the compiler or -m (machine parameter) to
> support vdso32 compiling. If rv32 supports vgettimeofday (which
> requires C compilation) in future, we would add CROSS_COMPILE to support
> that, which places more requirements on the compiler environment.

I think it's just a bug that rv32 doesn't have the vdso version of the
time syscalls. Fixing that is of course independent of the compat support,
but I think you need that anyway, and it would be better to start
out by building the compat vdso with the correct
architecture level.

At least this should be a lot easier than on arch/arm64 because you
can assume that an rv64 compiler is able to also build rv32 output.

Arnd

Re: [PATCH 2/3] powerpc: fix build errors

2022-02-24 Thread Nicholas Piggin

Excerpts from Arnd Bergmann's message of February 24, 2022 6:55 pm:
> On Thu, Feb 24, 2022 at 6:05 AM Nicholas Piggin  wrote:
>> Excerpts from Nicholas Piggin's message of February 24, 2022 12:54 pm:
>> >
>> > Not sure on the outlook for GCC fix. Either way unfortunately we have
>> > toolchains in the wild now that will explode, so we might have to take
>> > your patches for the time being.
>>
>> Perhaps not... Here's a hack that seems to work around the problem.
>>
>> The issue of removing -many from the kernel and replacing it with
>> appropriate architecture versions is an orthogonal one (that we
>> should do). Either way this hack should be able to allow us to do
>> that as well, on these problem toolchains.
>>
>> But for now it just uses -many as the trivial regression fix to get
>> back to previous behaviour.
> 
> I don't think the previous behavior is what you want to be honest.

-many isn't good but that's what we're using and that is still
what we're using upstream on any other toolchain that doesn't
have these issues. Including the next binutils version that will
ignore the initial .machine directive for 64s.

Neither of these approaches solves that. At least for 64s that
is passing -Wa,-many down already. (Although Anders' series
gets almost there).

So this is the minimal fix that brings the toolchains into line
with others, behaves how it previously did and fixes immediate
build regressions. Removing -many is somewhat independent of that.

> We had the same thing on Arm a few years ago when binutils
> started enforcing this more strictly, and it does catch actual
> bugs. I think annotating individual inline asm statements is
> the best choice here, as that documents what the intention is.

A few cases where there are differences in privileged instructions
(that won't be compiler generated), that will be done anyway.

For new instructions added to the ISA though? I think it's ugly and
unnecessary. There is no ambiguity about the intention when you see
a lharx instruction, is there?

It would delineate instructions that can't be used on all processors
but I don't see much advantage there, it's not an exhaustive check
because we have other restrictions on instructions in the kernel
environment. And why would inline asm be special but not the rest
of the asm? Would you propose to put these .machine directives
everywhere in thousands of lines of asm code in the kernel? I
don't know that it's an improvement. And inline asm is a small
fraction of instructions.

> 
> There is one more bug in this series that I looked at with Anders, but
> he did not send a patch for that so far:
> 
> static void dummy_perf(struct pt_regs *regs)
> {
> #if defined(CONFIG_FSL_EMB_PERFMON)
> mtpmr(PMRN_PMGC0, mfpmr(PMRN_PMGC0) & ~PMGC0_PMIE);
> #elif defined(CONFIG_PPC64) || defined(CONFIG_PPC_BOOK3S_32)
> if (cur_cpu_spec->pmc_type == PPC_PMC_IBM)
> mtspr(SPRN_MMCR0, mfspr(SPRN_MMCR0) & 
> ~(MMCR0_PMXE|MMCR0_PMAO));
> #else
> mtspr(SPRN_MMCR0, mfspr(SPRN_MMCR0) & ~MMCR0_PMXE);
> #endif
> }
> 
> Here, the assembler correctly flags the mtpmr/mfpmr as an invalid
> instruction for a combined 6xx kernel: As far as I can tell, these are
> only available on e300 but not the others, and instead of the compile-time
> check for CONFIG_FSL_EMB_PERFMON, there needs to be some
> runtime check to use the first method on 83xx but the #elif one on
> the other 6xx machines.

Right that should be caught if you just pass -m architecture
to the assembler that does not include the mtpmr. 32-bit is a lot more
complicated than 64s like this though, so it's possible in some cases
you will want more checking and -m + some .machine directives
will work better.

Once you add the .machine directive to your inline asm though, you lose
*all* such static checking for the instruction. So it's really not a
panacea and has its own downsides.

Thanks,
Nick

Re: [PATCH V6 11/20] riscv: compat: syscall: Add compat_sys_call_table implementation

2022-02-24 Thread Arnd Bergmann

On Thu, Feb 24, 2022 at 9:54 AM  wrote:
>
> From: Guo Ren 
>
> Implement compat sys_call_table and some system call functions:
> truncate64, ftruncate64, fallocate, pread64, pwrite64,
> sync_file_range, readahead, fadvise64_64 which need argument
> translation.
>
> Signed-off-by: Guo Ren 
> Signed-off-by: Guo Ren 
> Cc: Arnd Bergmann 
> Cc: Palmer Dabbelt 

Here, I was hoping you'd convert some of the other architectures to use
the same code, but the changes you did do look correct.

Please at least add the missing bit for big-endian architectures here:

+#if !defined(compat_arg_u64) && !defined(CONFIG_CPU_BIG_ENDIAN)
+#define compat_arg_u64(name)   u32  name##_lo, u32  name##_hi
+#define compat_arg_u64_dual(name)  u32, name##_lo, u32, name##_hi
+#define compat_arg_u64_glue(name)  (((u64)name##_hi << 32) | \
+((u64)name##_lo & 0xffffffffUL))
+#endif

with the lo/hi words swapped. With that change:

Reviewed-by: Arnd Bergmann
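
For readers following the lo/hi discussion, here is a minimal userspace sketch in plain C, not the kernel macro itself. The names `glue_u64` and `glue_from_regs` are hypothetical, and the sketch assumes that a compat syscall receives a 64-bit value in two consecutive 32-bit argument slots, low word first on little-endian and high word first on big-endian.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sanity check: two 32-bit halves fill exactly one 64-bit value. */
static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
	      "two 32-bit halves make one 64-bit value");

/* Combine a low and a high 32-bit word into one 64-bit value. */
static uint64_t glue_u64(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/*
 * first/second are the two consecutive 32-bit argument slots in the
 * order the syscall receives them (assumption of this sketch):
 * little-endian passes lo then hi, big-endian passes hi then lo.
 */
static uint64_t glue_from_regs(uint32_t first, uint32_t second,
			       bool big_endian)
{
	return big_endian ? glue_u64(second, first)
			  : glue_u64(first, second);
}
```

The only difference between the two byte orders is which slot is treated as the low word, which is why the big-endian variant of the macros can reuse the same glue expression with the parameter names swapped.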

Re: [PATCH v4 0/3] KVM: PPC: Book3S PR: Fixes for AIL and SCV

2022-02-24 Thread Christian Borntraeger





On 23.02.22 at 12:47, Nicholas Piggin wrote:
> Excerpts from Christian Borntraeger's message of February 23, 2022 7:14 pm:
>> On 22.02.22 at 15:11, Paolo Bonzini wrote:
>>> On 2/22/22 07:47, Nicholas Piggin wrote:
>>>> Patch 3 requires a KVM_CAP_PPC number allocated. QEMU maintainers are
>>>> happy with it (link in changelog) just waiting on KVM upstreaming. Do
>>>> you have objections to the series going to ppc/kvm tree first, or
>>>> another option is you could take patch 3 alone first (it's relatively
>>>> independent of the other 2) and ppc/kvm gets it from you?
>>>
>>> Hi Nick,
>>>
>>> I have pushed a topic branch kvm-cap-ppc-210 to kvm.git with just the
>>> definition and documentation of the capability.  ppc/kvm can apply your
>>> patch based on it (and drop the relevant parts of patch 3).  I'll send
>>> it to Linus this week.
>>
>> We have to be careful with the 210 cap that was merged from the s390 tree.
>
> Ah thanks, I didn't notice it.
>
> Using 211 is no problem for me, merge will have a conflict now though.
> We could avoid it by just sending my patch in a second batch instead of
> doing the topic branch this time (I still like the idea of a topic
> branch for caps for future).

Paolo,

the power people have not used your branch yet. So you could, as an
alternative, also create a kvm-cap-ppc-211 branch for 5.17 and leave the
s390 cap at 210. But it would be good to do something now so that we have
final numbers for the caps. Either create a kvm-cap-ppc-211 branch, or merge
the kvm-cap-ppc-210 branch into next and fix up the s390 cap to become 211.

[PATCH kernel v3] powerpc/64: Add UADDR64 relocation support

2022-02-24 Thread Alexey Kardashevskiy

When ld detects unaligned relocations, it emits R_PPC64_UADDR64
relocations instead of R_PPC64_RELATIVE. Currently R_PPC64_UADDR64 are
detected by arch/powerpc/tools/relocs_check.sh and expected not to work.
Below is a simple chunk to trigger this behaviour (this disables
optimization for the demonstration purposes only, this also happens with
-O1/-O2 when CONFIG_PRINTK_INDEX=y, for example):

\#pragma GCC push_options
\#pragma GCC optimize ("O0")
struct entry {
const char *file;
int line;
} __attribute__((packed));
static const struct entry e1 = { .file = __FILE__, .line = __LINE__ };
static const struct entry e2 = { .file = __FILE__, .line = __LINE__ };
...
prom_printf("e1=%s %lx %lx\n", e1.file, (unsigned long) e1.file, mfmsr());
prom_printf("e2=%s %lx\n", e2.file, (unsigned long) e2.file);
\#pragma GCC pop_options


This adds support for UADDR64 for 64bit. This reuses __dynamic_symtab
from the 32bit which supports more relocation types already.

Because RELACOUNT includes only R_PPC64_RELATIVE, this replaces it with
RELASZ which is the size of all relocation records.
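
As an illustration only, the relocation pass described above can be sketched in userspace C. This is not the assembly in reloc_64.S: the struct and function names (`rela64`, `sym64`, `apply_relocs`) are made up for the sketch, `r_offset` is treated as a plain index into a byte buffer, and `delta` stands for the difference between the run-time and link-time addresses. The entry count is derived as RELASZ / RELAENT, and values are stored with memcpy() because UADDR64 targets may be unaligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define R_PPC64_RELATIVE 22
#define R_PPC64_UADDR64  43

/* ELF64 RELA record and symbol layouts; type names are local to this sketch. */
struct rela64 {
	uint64_t r_offset;
	uint64_t r_info;	/* symbol index in the high 32 bits, type in the low 32 */
	int64_t  r_addend;
};

struct sym64 {
	uint32_t st_name;
	uint8_t  st_info;
	uint8_t  st_other;
	uint16_t st_shndx;
	uint64_t st_value;
	uint64_t st_size;
};

/* Apply RELATIVE and UADDR64 records to image; returns how many were applied. */
static size_t apply_relocs(uint8_t *image, size_t image_len,
			   const struct rela64 *rela,
			   uint64_t relasz, uint64_t relaent,
			   const struct sym64 *symtab, uint64_t delta)
{
	size_t applied = 0;
	uint64_t count, i;

	assert(relaent == sizeof(*rela));
	assert(image_len >= sizeof(uint64_t));
	count = relasz / relaent;	/* RELASZ / RELAENT */

	for (i = 0; i < count; i++) {
		uint32_t type = (uint32_t)(rela[i].r_info & 0xffffffffu);
		uint32_t symidx = (uint32_t)(rela[i].r_info >> 32);
		uint64_t value;

		if (type == R_PPC64_RELATIVE)
			value = (uint64_t)rela[i].r_addend + delta;
		else if (type == R_PPC64_UADDR64)
			value = symtab[symidx].st_value +
				(uint64_t)rela[i].r_addend + delta;
		else
			continue;	/* other types are ignored here */

		if (rela[i].r_offset > image_len - sizeof(value))
			continue;	/* out of bounds for this buffer */

		/* memcpy tolerates an unaligned destination. */
		memcpy(image + rela[i].r_offset, &value, sizeof(value));
		applied++;
	}
	return applied;
}
```

Counting entries from RELASZ rather than RELACOUNT matters because RELACOUNT only covers the R_PPC64_RELATIVE records, so a loop bounded by it would never reach the UADDR64 ones.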

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v3:
* named some labels

v2:
* replaced RELACOUNT with RELASZ/RELAENT
* removed FIXME

---

Tested via qemu gdb stub (the kernel is loaded at 0x40).

Disasm:

c1a804d0 :
c1a804d0:   b0 04 a8 01 .long 0x1a804b0
c1a804d0: R_PPC64_RELATIVE  
*ABS*-0x3e57fb50
c1a804d4:   00 00 00 c0 lfs f0,0(0)
c1a804d8:   fa 08 00 00 .long 0x8fa

c1a804dc :
...
c1a804dc: R_PPC64_UADDR64   .rodata+0x4b0

Before relocation:
>>> p *(unsigned long *) 0x1e804d0
$1 = 0xc1a804b0
>>> p *(unsigned long *) 0x1e804dc
$2 = 0x0

After relocation in __boot_from_prom:
>>> p *(unsigned long *) 0x1e804d0
$1 = 0x1e804b0
>>> p *(unsigned long *) 0x1e804dc
$2 = 0x1e804b0

After relocation in __after_prom_start:
>>> p *(unsigned long *) 0x1e804d0
$1 = 0xc1a804b0
>>> p *(unsigned long *) 0x1e804dc
$2 = 0xc1a804b0
>>>
---
 arch/powerpc/kernel/reloc_64.S | 67 +-
 arch/powerpc/kernel/vmlinux.lds.S  |  2 -
 arch/powerpc/tools/relocs_check.sh |  7 +---
 3 files changed, 48 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/kernel/reloc_64.S b/arch/powerpc/kernel/reloc_64.S
index 02d4719bf43a..4a8eccbaebb4 100644
--- a/arch/powerpc/kernel/reloc_64.S
+++ b/arch/powerpc/kernel/reloc_64.S
@@ -8,8 +8,10 @@
 #include 
 
 RELA = 7
-RELACOUNT = 0x6ff9
+RELASZ = 8
+RELAENT = 9
 R_PPC64_RELATIVE = 22
+R_PPC64_UADDR64 = 43
 
 /*
  * r3 = desired final address of kernel
@@ -25,29 +27,38 @@ _GLOBAL(relocate)
add r9,r9,r12   /* r9 has runtime addr of .rela.dyn section */
ld  r10,(p_st - 0b)(r12)
add r10,r10,r12 /* r10 has runtime addr of _stext */
+   ld  r13,(p_sym - 0b)(r12)
+   add r13,r13,r12 /* r13 has runtime addr of .dynsym */
 
/*
-* Scan the dynamic section for the RELA and RELACOUNT entries.
+* Scan the dynamic section for the RELA, RELASZ and RELAENT entries.
 */
li  r7,0
li  r8,0
-1: ld  r6,0(r11)   /* get tag */
+.Ltags:
+   ld  r6,0(r11)   /* get tag */
cmpdi   r6,0
-   beq 4f  /* end of list */
+   beq .Lend_of_list   /* end of list */
cmpdi   r6,RELA
bne 2f
ld  r7,8(r11)   /* get RELA pointer in r7 */
-   b   3f
-2: addis   r6,r6,(-RELACOUNT)@ha
-   cmpdi   r6,RELACOUNT@l
+   b   4f
+2: cmpdi   r6,RELASZ
bne 3f
-   ld  r8,8(r11)   /* get RELACOUNT value in r8 */
-3: addir11,r11,16
-   b   1b
-4: cmpdi   r7,0/* check we have both RELA and RELACOUNT */
+   ld  r8,8(r11)   /* get RELASZ value in r8 */
+   b   4f
+3: cmpdi   r6,RELAENT
+   bne 4f
+   ld  r12,8(r11)  /* get RELAENT value in r12 */
+4: addir11,r11,16
+   b   .Ltags
+.Lend_of_list:
+   cmpdi   r7,0/* check we have RELA, RELASZ, RELAENT */
cmpdi   cr1,r8,0
-   beq 6f
-   beq cr1,6f
+   beq .Lout
+   beq cr1,.Lout
+   cmpdi   r12,0
+   beq .Lout
 
/*
 * Work out linktime address of _stext and hence the
@@ -62,23 +73,39 @@ _GLOBAL(relocate)
 
/*
 * Run through the list of relocations and process the
-* R_PPC64_RELATIVE ones.
+* R_PPC64_RELATIVE and R_PPC64_UADDR64 ones.
 */
+   divdr8,r8,r12   /* RELASZ / RELAENT */
mtctr   r8
-5: ld  r0,8(9) /* ELF64_R_TYPE(reloc->r_info) */
+.Lrelocations:
+   lwa r0,8(r9)/* ELF64_R_TYPE(reloc->r_info) */
cmpdi   r0,R_PPC64_RELATIVE
-   bne 6f
+

Re: [PATCH v3 3/4] powerpc/pseries/vas: Add VAS migration handler

2022-02-24 Thread Haren Myneni

On Wed, 2022-02-23 at 20:03 +1000, Nicholas Piggin wrote:
> Excerpts from Haren Myneni's message of February 20, 2022 6:06 am:
> > Since the VAS windows belong to the VAS hardware resource, the
> > hypervisor expects the partition to close them on source partition
> > and reopen them after the partition migrated on the destination
> > machine.
> > 
> > This handler is called before pseries_suspend() to close these
> > windows and again invoked after migration. All active windows
> > for both default and QoS types will be closed and mark them
> > in-active and reopened after migration with this handler.
> > During the migration, the user space receives paste instruction
> > failure if it issues copy/paste on these in-active windows.
> > 
> > Signed-off-by: Haren Myneni 
> > ---
> >  arch/powerpc/platforms/pseries/mobility.c |  5 ++
> >  arch/powerpc/platforms/pseries/vas.c  | 86
> > +++
> >  arch/powerpc/platforms/pseries/vas.h  |  6 ++
> >  3 files changed, 97 insertions(+)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/mobility.c
> > b/arch/powerpc/platforms/pseries/mobility.c
> > index 85033f392c78..70004243e25e 100644
> > --- a/arch/powerpc/platforms/pseries/mobility.c
> > +++ b/arch/powerpc/platforms/pseries/mobility.c
> > @@ -26,6 +26,7 @@
> >  #include 
> >  #include 
> >  #include "pseries.h"
> > +#include "vas.h"   /* vas_migration_handler() */
> >  #include "../../kernel/cacheinfo.h"
> >  
> >  static struct kobject *mobility_kobj;
> > @@ -669,12 +670,16 @@ static int pseries_migrate_partition(u64
> > handle)
> > if (ret)
> > return ret;
> >  
> > +   vas_migration_handler(VAS_SUSPEND);
> 
> Not sure if there is much point having a "handler" like this that
> only
> takes two operations. vas_migration_begin()/vas_migration_end() is
> better isn't it?

The actual suspend/resume framework will be added later. So I am using
VAS_SUSPEND/VAS_RESUME for now; they will be removed once the permanent
fix is in place.

> 
> Other question is why can't the suspend handler return error and
> handle
> it here?

We can, but then it has to call pseries_cancel_migration() if the VAS
suspend handler returns failure. We should expect this failure only from
the H_DEALLOCATE_VAS_WINDOW and H_QUERY_VAS_CAPABILITIES HCALLs, which
should not happen generally.

> 
> > +
> > ret = pseries_suspend(handle);
> > if (ret == 0)
> > post_mobility_fixup();
> > else
> > pseries_cancel_migration(handle, ret);
> >  
> > +   vas_migration_handler(VAS_RESUME);
> > +
> > return ret;
> >  }
> >  
> > diff --git a/arch/powerpc/platforms/pseries/vas.c
> > b/arch/powerpc/platforms/pseries/vas.c
> > index fbcf311da0ec..df22827969db 100644
> > --- a/arch/powerpc/platforms/pseries/vas.c
> > +++ b/arch/powerpc/platforms/pseries/vas.c
> > @@ -869,6 +869,92 @@ static struct notifier_block pseries_vas_nb =
> > {
> > .notifier_call = pseries_vas_notifier,
> >  };
> >  
> > +/*
> > + * For LPM, all windows have to be closed on the source partition
> > + * before migration and reopen them on the destination partition
> > + * after migration. So closing windows during suspend and
> > + * reopen them during resume.
> > + */
> > +int vas_migration_handler(int action)
> > +{
> > +   struct vas_cop_feat_caps *caps;
> > +   int old_nr_creds, new_nr_creds = 0;
> > +   struct vas_caps *vcaps;
> > +   int i, rc = 0;
> > +
> > +   /*
> > +* NX-GZIP is not enabled. Nothing to do for migration.
> > +*/
> > +   if (!copypaste_feat)
> > +   return rc;
> > +
> > +   mutex_lock(_pseries_mutex);
> > +
> > +   for (i = 0; i < VAS_MAX_FEAT_TYPE; i++) {
> > +   vcaps = [i];
> > +   caps = >caps;
> > +   old_nr_creds = atomic_read(>nr_total_credits);
> > +
> > +   rc = h_query_vas_capabilities(H_QUERY_VAS_CAPABILITIES,
> > + vcaps->feat,
> > + (u64)virt_to_phys(_cop
> > _caps));
> > +   if (!rc) {
> > +   new_nr_creds =
> > be16_to_cpu(hv_cop_caps.target_lpar_creds);
> > +   /*
> > +* Should not happen. But incase print
> > messages, close
> > +* all windows in the list during suspend and
> > reopen
> > +* windows based on new lpar_creds on the
> > destination
> > +* system.
> > +*/
> > +   if (old_nr_creds != new_nr_creds) {
> > +   pr_err("state(%d): lpar creds: %d HV
> > lpar creds: %d\n",
> > +   action, old_nr_creds,
> > new_nr_creds);
> > +   pr_err("Used creds: %d, Active creds:
> > %d\n",
> > +   atomic_read(
> > >nr_used_credits),
> > +   vcaps->nr_open_windows - vcaps-
> > >nr_close_wins);
> 
> Error messages should have some vague use to the administrator

Re: [PATCH V6 19/20] riscv: compat: ptrace: Add compat_arch_ptrace implement

2022-02-24 Thread Arnd Bergmann

On Thu, Feb 24, 2022 at 9:54 AM  wrote:
> From: Guo Ren 

>
> Signed-off-by: Guo Ren 
> Signed-off-by: Guo Ren 
> Reviewed-by: Palmer Dabbelt 

Reviewed-by: Arnd Bergmann

Re: [PATCH] powerpc/32: Clear volatile regs on syscall exit

2022-02-24 Thread Segher Boessenkool

On Wed, Feb 23, 2022 at 09:48:09PM +0100, Gabriel Paubert wrote:
> On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
> > +   /* Zero volatile regs that may contain sensitive kernel data */
> > +   li  r0,0
> > +   li  r4,0
> > +   li  r5,0
> > +   li  r6,0
> > +   li  r7,0
> > +   li  r8,0
> > +   li  r9,0
> > +   li  r10,0
> > +   li  r11,0
> > +   li  r12,0
> > +   mtctr   r0
> > +   mtxer   r0
> 
> Here, I'm almost sure that on some processors, it would be better to
> separate mtctr from mtxer. mtxer is typically very expensive (pipeline
> flush) but I don't know what's the best ordering for the average core.

mtxer is cheaper than mtctr on many cores :-)

On p9 mtxer is cracked into two latency 3 ops (which run in parallel).
While mtctr has latency 5.

On p8 mtxer was horrible indeed (but nothing near as bad as a pipeline
flush).


Segher

Re: [PATCH 00/16] Remove usage of the deprecated "pci-dma-compat.h" API

2022-02-24 Thread Christoph Hellwig

On Wed, Feb 23, 2022 at 09:26:56PM +0100, Christophe JAILLET wrote:
> Patch 01, 04, 05, 06, 08, 09 have not reached -next yet.
> They all still apply cleanly.
> 
> 04 has been picked up for inclusion in the media subsystem for 5.18.
> The other ones all have 1 or more Reviewed-by:/Acked-by: tags.
> 
> Patch 16 must be resubmitted to add "#include " in
> order not to break builds.

So how about this:  I'll pick up 1, 5, 6, 8 and 9 for the dma-mapping
tree.  After -rc1, when presumably all other patches have reached
mainline, you resubmit the one with the added include and we finish this
off?

Thanks a lot for all your work already!

RE: [PATCH v2] usercopy: Check valid lifetime via stack depth

2022-02-24 Thread David Laight

From: Kees Cook
> Sent: 24 February 2022 06:04
> 
> Under CONFIG_HARDENED_USERCOPY=y, when exact stack frame boundary checking
> is not available (i.e. everything except x86 with FRAME_POINTER), check
> a stack object as being at least "current depth valid", in the sense
> that any object within the stack region but not between start-of-stack
> and current_stack_pointer should be considered unavailable (i.e. its
> lifetime is from a call no longer present on the stack).
> 
...
> diff --git a/mm/usercopy.c b/mm/usercopy.c
> index d0d268135d96..5d28725af95f 100644
> --- a/mm/usercopy.c
> +++ b/mm/usercopy.c
> @@ -22,6 +22,30 @@
>  #include 
>  #include "slab.h"
> 
> +/*
> + * Only called if obj is within stack/stackend bounds. Determine if within
> + * current stack depth.
> + */
> +static inline int check_stack_object_depth(const void *obj,
> +unsigned long len)
> +{
> +#ifdef CONFIG_ARCH_HAS_CURRENT_STACK_POINTER
> +#ifndef CONFIG_STACK_GROWSUP

Pointless negation

> + const void * const high = stackend;
> + const void * const low = (void *)current_stack_pointer;
> +#else
> + const void * const high = (void *)current_stack_pointer;
> + const void * const low = stack;
> +#endif
> +
> + /* Reject: object not within current stack depth. */
> + if (obj < low || high < obj + len)
> + return BAD_STACK;
> +
> +#endif
> + return GOOD_STACK;
> +}

If the comment at the top of the function is correct then
only a single test for the correct end of the buffer against
the current stack pointer is needed.
Something like:
#ifdef CONFIG_STACK_GROWSUP
if ((void *)current_stack_pointer < obj + len)
return BAD_STACK;
#else
if (obj < (void *)current_stack_pointer)
return BAD_STACK;
#endif
return GOOD_STACK;

Although it may depend on exactly where the stack pointer
points to - especially for GROWSUP.

David
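
To make the single-comparison idea above concrete, here is a small userspace sketch in C. It is not the kernel function: addresses are plain integers, the stack pointer and growth direction are passed in as parameters instead of coming from current_stack_pointer and CONFIG_STACK_GROWSUP, and the BAD_STACK/GOOD_STACK values are placeholders. It also assumes the stack pointer marks the exact boundary of the live region, which is the caveat raised above for GROWSUP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder return codes for this sketch. */
enum { BAD_STACK = -1, GOOD_STACK = 1 };

/*
 * Precondition (per the comment in the patch): [obj, obj + len) already
 * lies within the stack region, so only the boundary on the side of the
 * current stack pointer still needs to be tested.
 */
static int check_stack_object_depth(uintptr_t obj, size_t len,
				    uintptr_t sp, bool grows_up)
{
	assert(len <= UINTPTR_MAX - obj);	/* obj + len must not wrap */

	if (grows_up) {
		/* Live frames sit below sp: the object must end at or below it. */
		if (sp < obj + len)
			return BAD_STACK;
	} else {
		/* Live frames sit at or above sp: the object must start there. */
		if (obj < sp)
			return BAD_STACK;
	}
	return GOOD_STACK;
}
```

With a downward-growing stack and sp at 0x1000, an object at 0x2000 is accepted and one at 0x0ff0 is rejected, since the latter belongs to a frame that has already returned.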


[PATCH V6 20/20] riscv: compat: Add COMPAT Kbuild skeletal support

2022-02-24 Thread guoren

From: Guo Ren 

Adds initial skeletal COMPAT Kbuild (Running 32bit U-mode on
64bit S-mode) support.
 - Setup kconfig & dummy functions for compiling.
 - Implement compat_start_thread by the way.

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Reviewed-by: Arnd Bergmann 
Cc: Palmer Dabbelt 
---
 arch/riscv/Kconfig | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 5adcbd9b5e88..6f11df8c189f 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -73,6 +73,7 @@ config RISCV
select HAVE_ARCH_KGDB if !XIP_KERNEL
select HAVE_ARCH_KGDB_QXFER_PKT
select HAVE_ARCH_MMAP_RND_BITS if MMU
+   select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE if 64BIT && MMU
@@ -123,12 +124,18 @@ config ARCH_MMAP_RND_BITS_MIN
default 18 if 64BIT
default 8
 
+config ARCH_MMAP_RND_COMPAT_BITS_MIN
+   default 8
+
 # max bits determined by the following formula:
 #  VA_BITS - PAGE_SHIFT - 3
 config ARCH_MMAP_RND_BITS_MAX
default 24 if 64BIT # SV39 based
default 17
 
+config ARCH_MMAP_RND_COMPAT_BITS_MAX
+   default 17
+
 # set if we run in machine mode, cleared if we run in supervisor mode
 config RISCV_M_MODE
bool
@@ -406,6 +413,18 @@ config CRASH_DUMP
 
  For more details see Documentation/admin-guide/kdump/kdump.rst
 
+config COMPAT
+   bool "Kernel support for 32-bit U-mode"
+   default 64BIT
+   depends on 64BIT && MMU
+   help
+ This option enables support for a 32-bit U-mode running under a 64-bit
+ kernel at S-mode. riscv32-specific components such as system calls,
+ the user helper functions (vdso), signal rt_frame functions and the
+ ptrace interface are handled appropriately by the kernel.
+
+ If you want to execute 32-bit userspace applications, say Y.
+
 endmenu
 
 menu "Boot options"
-- 
2.25.1
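
The maximum values in the Kconfig hunk above follow from the formula in its comment, VA_BITS - PAGE_SHIFT - 3. A minimal C sketch of that arithmetic is below; the helper name `mmap_rnd_bits_max` is hypothetical, and the inputs used in the usage note (4 KiB pages, 39 virtual address bits for SV39, 32 for a 32-bit address space) are assumptions of the sketch.

```c
#include <assert.h>

/* Upper bound for mmap randomization bits: VA_BITS - PAGE_SHIFT - 3. */
static int mmap_rnd_bits_max(int va_bits, int page_shift)
{
	assert(va_bits > page_shift + 3);	/* result must stay positive */
	return va_bits - page_shift - 3;
}
```

With PAGE_SHIFT = 12, `mmap_rnd_bits_max(39, 12)` gives 24 and `mmap_rnd_bits_max(32, 12)` gives 17, matching the defaults for ARCH_MMAP_RND_BITS_MAX and ARCH_MMAP_RND_COMPAT_BITS_MAX.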

[PATCH V6 19/20] riscv: compat: ptrace: Add compat_arch_ptrace implement

2022-02-24 Thread guoren

From: Guo Ren 

Now, you can use native gdb on riscv64 for rv32 app debugging.

$ uname -a
Linux buildroot 5.16.0-rc4-00036-gbef6b82fdf23-dirty #53 SMP Mon Dec 20 
23:06:53 CST 2021 riscv64 GNU/Linux
$ cat /proc/cpuinfo
processor   : 0
hart: 0
isa : rv64imafdcsuh
mmu : sv48

$ file /bin/busybox
/bin/busybox: setuid ELF 32-bit LSB shared object, UCB RISC-V, version 1 
(SYSV), dynamically linked, interpreter /lib/ld-linux-riscv32-ilp32d.so.1, for 
GNU/Linux 5.15.0, stripped
$ file /usr/bin/gdb
/usr/bin/gdb: ELF 32-bit LSB shared object, UCB RISC-V, version 1 (GNU/Linux), 
dynamically linked, interpreter /lib/ld-linux-riscv32-ilp32d.so.1, for 
GNU/Linux 5.15.0, stripped
$ /usr/bin/gdb /bin/busybox
GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
...
Reading symbols from /bin/busybox...
(No debugging symbols found in /bin/busybox)
(gdb) b main
Breakpoint 1 at 0x8ddc
(gdb) r
Starting program: /bin/busybox
Failed to read a valid object file image from memory.

Breakpoint 1, 0x555a8ddc in main ()
(gdb) i r
ra             0x77df0b74   0x77df0b74
sp             0x7fdd3d10   0x7fdd3d10
gp             0x5567e800   0x5567e800
tp             0x77f64280   0x77f64280
t0             0x0          0
t1             0x555a6fac   1431990188
t2             0x77dd8db4   2011008436
fp             0x7fdd3e34   0x7fdd3e34
s1             0x7fdd3e34   2145205812
a0             0xffffffff   -1
a1             0x2000       8192
a2             0x7fdd3e3c   2145205820
a3             0x0          0
a4             0x7fdd3d30   2145205552
a5             0x555a8dc0   1431997888
a6             0x77f2c170   2012397936
a7             0x6a7c7a2f   1786542639
s2             0x0          0
s3             0x0          0
s4             0x555a8dc0   1431997888
s5             0x77f8a3a8   2012783528
s6             0x7fdd3e3c   2145205820
s7             0x5567cecc   1432866508
--Type <RET> for more, q to quit, c to continue without paging--
s8             0x1          1
s9             0x0          0
s10            0x55634448   1432568904
s11            0x0          0
t3             0x77df0bb8   2011106232
t4             0x42fc       17148
t5             0x0          0
t6             0x40         64
pc             0x555a8ddc   0x555a8ddc
(gdb) si
0x555a78f0 in mallopt@plt ()
(gdb) c
Continuing.
BusyBox v1.34.1 (2021-12-19 22:39:48 CST) multi-call binary.
BusyBox is copyrighted by many authors between 1998-2015.
Licensed under GPLv2. See source distribution for detailed
copyright notices.

Usage: busybox [function [arguments]...]
   or: busybox --list[-full]
...
[Inferior 1 (process 107) exited normally]
(gdb) q

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Reviewed-by: Palmer Dabbelt 
Cc: Arnd Bergmann 
---
 arch/riscv/kernel/ptrace.c | 87 +++---
 1 file changed, 82 insertions(+), 5 deletions(-)

diff --git a/arch/riscv/kernel/ptrace.c b/arch/riscv/kernel/ptrace.c
index a89243730153..bb387593a121 100644
--- a/arch/riscv/kernel/ptrace.c
+++ b/arch/riscv/kernel/ptrace.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -111,11 +112,6 @@ static const struct user_regset_view 
riscv_user_native_view = {
.n = ARRAY_SIZE(riscv_user_regset),
 };
 
-const struct user_regset_view *task_user_regset_view(struct task_struct *task)
-{
-   return &riscv_user_native_view;
-}
-
 struct pt_regs_offset {
const char *name;
int offset;
@@ -273,3 +269,84 @@ __visible void do_syscall_trace_exit(struct pt_regs *regs)
trace_sys_exit(regs, regs_return_value(regs));
 #endif
 }
+
+#ifdef CONFIG_COMPAT
+static int compat_riscv_gpr_get(struct task_struct *target,
+   const struct user_regset *regset,
+   struct membuf to)
+{
+   struct compat_user_regs_struct cregs;
+
+   regs_to_cregs(&cregs, task_pt_regs(target));
+
+   return membuf_write(&to, &cregs,
+   sizeof(struct compat_user_regs_struct));
+}
+
+static int compat_riscv_gpr_set(struct task_struct *target,
+   const struct user_regset *regset,
+   unsigned int pos, unsigned int count,
+   const void *kbuf, const void __user *ubuf)
+{
+   int ret;
+   struct compat_user_regs_struct cregs;
+
+   ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &cregs, 0, -1);
+
+   cregs_to_regs(, task_pt_regs(target));
+
+   return ret;
+}
+
+static const struct user_regset compat_riscv_user_regset[] = {
+   [REGSET_X] = {
+   .core_note_type = NT_PRSTATUS,
+   .n = ELF_NGREG,
+   .size = sizeof(compat_elf_greg_t),
+   .align = sizeof(compat_elf_greg_t),
+   .regset_get = compat_riscv_gpr_get,
+   .set =

[PATCH V6 18/20] riscv: compat: signal: Add rt_frame implementation

2022-02-24 Thread guoren

From: Guo Ren 

Implement compat_setup_rt_frame for sigcontext save & restore. The
main flow is the same as the native signal code, but the rv32 pt_regs
size differs from rv64's, so we need to convert between them.
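
The conversion mentioned above can be pictured with a small userspace C sketch. The patch calls helpers named regs_to_cregs() and cregs_to_regs(), whose bodies are not part of this excerpt, so everything below is an assumption for illustration: the `regs64`/`regs32` structs, the `_sketch` function names, the register count, and the choice of truncation in one direction and zero-extension in the other.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS 4	/* arbitrary small count for the sketch */

struct regs64 { uint64_t r[NREGS]; };	/* stand-in for the rv64 register file */
struct regs32 { uint32_t r[NREGS]; };	/* stand-in for the rv32 view of it */

/* The two layouts differ in size, so a plain struct copy is not possible. */
static_assert(sizeof(struct regs64) == 2 * sizeof(struct regs32),
	      "64-bit register file is twice the size of the 32-bit one");

/* 64-bit to 32-bit: keep the low 32 bits of every register. */
static void regs_to_cregs_sketch(struct regs32 *c, const struct regs64 *r)
{
	for (int i = 0; i < NREGS; i++)
		c->r[i] = (uint32_t)r->r[i];
}

/* 32-bit to 64-bit: zero-extend every register. */
static void cregs_to_regs_sketch(struct regs64 *r, const struct regs32 *c)
{
	for (int i = 0; i < NREGS; i++)
		r->r[i] = (uint64_t)c->r[i];
}
```

A value that fits in 32 bits survives the round trip unchanged, which is what a 32-bit task needs when its context is saved to and restored from a compat signal frame.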

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Reviewed-by: Palmer Dabbelt 
Cc: Arnd Bergmann 
---
 arch/riscv/kernel/Makefile|   1 +
 arch/riscv/kernel/compat_signal.c | 243 ++
 arch/riscv/kernel/signal.c|  13 +-
 3 files changed, 256 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/kernel/compat_signal.c

diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 88e79f481c21..a46f9807c59e 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -67,4 +67,5 @@ obj-$(CONFIG_JUMP_LABEL)  += jump_label.o
 
 obj-$(CONFIG_EFI)  += efi.o
 obj-$(CONFIG_COMPAT)   += compat_syscall_table.o
+obj-$(CONFIG_COMPAT)   += compat_signal.o
 obj-$(CONFIG_COMPAT)   += compat_vdso/
diff --git a/arch/riscv/kernel/compat_signal.c 
b/arch/riscv/kernel/compat_signal.c
new file mode 100644
index 000000000000..7041742ded08
--- /dev/null
+++ b/arch/riscv/kernel/compat_signal.c
@@ -0,0 +1,243 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#define COMPAT_DEBUG_SIG 0
+
+struct compat_sigcontext {
+   struct compat_user_regs_struct sc_regs;
+   union __riscv_fp_state sc_fpregs;
+};
+
+struct compat_ucontext {
+   compat_ulong_t  uc_flags;
+   struct compat_ucontext  *uc_link;
+   compat_stack_t  uc_stack;
+   sigset_tuc_sigmask;
+   /* There's some padding here to allow sigset_t to be expanded in the
+* future.  Though this is unlikely, other architectures put uc_sigmask
+* at the end of this structure and explicitly state it can be
+* expanded, so we didn't want to box ourselves in here. */
+   __u8  __unused[1024 / 8 - sizeof(sigset_t)];
+   /* We can't put uc_sigmask at the end of this structure because we need
+* to be able to expand sigcontext in the future.  For example, the
+* vector ISA extension will almost certainly add ISA state.  We want
+* to ensure all user-visible ISA state can be saved and restored via a
+* ucontext, so we're putting this at the end in order to allow for
+* infinite extensibility.  Since we know this will be extended and we
+* assume sigset_t won't be extended an extreme amount, we're
+* prioritizing this. */
+   struct compat_sigcontext uc_mcontext;
+};
+
+struct compat_rt_sigframe {
+   struct compat_siginfo info;
+   struct compat_ucontext uc;
+};
+
+#ifdef CONFIG_FPU
+static long compat_restore_fp_state(struct pt_regs *regs,
+   union __riscv_fp_state __user *sc_fpregs)
+{
+   long err;
+   struct __riscv_d_ext_state __user *state = &sc_fpregs->d;
+   size_t i;
+
+   err = __copy_from_user(&current->thread.fstate, state, sizeof(*state));
+   if (unlikely(err))
+   return err;
+
+   fstate_restore(current, regs);
+
+   /* We support no other extension state at this time. */
+   for (i = 0; i < ARRAY_SIZE(sc_fpregs->q.reserved); i++) {
+   u32 value;
+
+   err = __get_user(value, &sc_fpregs->q.reserved[i]);
+   if (unlikely(err))
+   break;
+   if (value != 0)
+   return -EINVAL;
+   }
+
+   return err;
+}
+
+static long compat_save_fp_state(struct pt_regs *regs,
+ union __riscv_fp_state __user *sc_fpregs)
+{
+   long err;
+   struct __riscv_d_ext_state __user *state = &sc_fpregs->d;
+   size_t i;
+
+   fstate_save(current, regs);
+   err = __copy_to_user(state, &current->thread.fstate, sizeof(*state));
+   if (unlikely(err))
+   return err;
+
+   /* We support no other extension state at this time. */
+   for (i = 0; i < ARRAY_SIZE(sc_fpregs->q.reserved); i++) {
+   err = __put_user(0, &sc_fpregs->q.reserved[i]);
+   if (unlikely(err))
+   break;
+   }
+
+   return err;
+}
+#else
+#define compat_save_fp_state(task, regs) (0)
+#define compat_restore_fp_state(task, regs) (0)
+#endif
+
+static long compat_restore_sigcontext(struct pt_regs *regs,
+   struct compat_sigcontext __user *sc)
+{
+   long err;
+   struct compat_user_regs_struct cregs;
+
+   /* sc_regs is structured the same as the start of pt_regs */
+   err = __copy_from_user(&cregs, &sc->sc_regs, sizeof(sc->sc_regs));
+
+   cregs_to_regs(&cregs, regs);
+
+   /* Restore the floating-point state. */
+   if (has_fpu())
+   err |= compat_restore_fp_state(regs, &sc->sc_fpregs);
+   return err;
+}
+
+COMPAT_SYSCALL_DEFINE0(rt_sigreturn)
+{
+   struct pt_regs *regs =

[PATCH V6 17/20] riscv: compat: vdso: Add setup additional pages implementation

2022-02-24 Thread guoren

From: Guo Ren 

Reconstruct __setup_additional_pages() by appending vdso info
pointer argument to meet compat_vdso_info requirement. And change
vm_special_mapping *dm, *cm initialization into static.

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Reviewed-by: Palmer Dabbelt 
Cc: Arnd Bergmann 
---
 arch/riscv/include/asm/elf.h |   5 ++
 arch/riscv/include/asm/mmu.h |   1 +
 arch/riscv/kernel/vdso.c | 103 +++
 3 files changed, 74 insertions(+), 35 deletions(-)

diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
index 3a4293dc7229..d87d3bcc758d 100644
--- a/arch/riscv/include/asm/elf.h
+++ b/arch/riscv/include/asm/elf.h
@@ -134,5 +134,10 @@ do {if ((ex).e_ident[EI_CLASS] == ELFCLASS32)  
\
 typedef compat_ulong_t compat_elf_greg_t;
 typedef compat_elf_greg_t  compat_elf_gregset_t[ELF_NGREG];
 
+extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
+ int uses_interp);
+#define compat_arch_setup_additional_pages \
+   compat_arch_setup_additional_pages
+
 #endif /* CONFIG_COMPAT */
 #endif /* _ASM_RISCV_ELF_H */
diff --git a/arch/riscv/include/asm/mmu.h b/arch/riscv/include/asm/mmu.h
index 0099dc116168..cedcf8ea3c76 100644
--- a/arch/riscv/include/asm/mmu.h
+++ b/arch/riscv/include/asm/mmu.h
@@ -16,6 +16,7 @@ typedef struct {
atomic_long_t id;
 #endif
void *vdso;
+   void *vdso_info;
 #ifdef CONFIG_SMP
/* A local icache flush is needed before user execution can resume. */
cpumask_t icache_stale_mask;
diff --git a/arch/riscv/kernel/vdso.c b/arch/riscv/kernel/vdso.c
index a9436a65161a..f864811aa011 100644
--- a/arch/riscv/kernel/vdso.c
+++ b/arch/riscv/kernel/vdso.c
@@ -23,6 +23,9 @@ struct vdso_data {
 #endif
 
 extern char vdso_start[], vdso_end[];
+#ifdef CONFIG_COMPAT
+extern char compat_vdso_start[], compat_vdso_end[];
+#endif
 
 enum vvar_pages {
VVAR_DATA_PAGE_OFFSET,
@@ -30,6 +33,11 @@ enum vvar_pages {
VVAR_NR_PAGES,
 };
 
+enum rv_vdso_map {
+   RV_VDSO_MAP_VVAR,
+   RV_VDSO_MAP_VDSO,
+};
+
 #define VVAR_SIZE  (VVAR_NR_PAGES << PAGE_SHIFT)
 
 /*
@@ -52,12 +60,6 @@ struct __vdso_info {
struct vm_special_mapping *cm;
 };
 
-static struct __vdso_info vdso_info __ro_after_init = {
-   .name = "vdso",
-   .vdso_code_start = vdso_start,
-   .vdso_code_end = vdso_end,
-};
-
 static int vdso_mremap(const struct vm_special_mapping *sm,
   struct vm_area_struct *new_vma)
 {
@@ -66,37 +68,33 @@ static int vdso_mremap(const struct vm_special_mapping *sm,
return 0;
 }
 
-static int __init __vdso_init(void)
+static void __init __vdso_init(struct __vdso_info *vdso_info)
 {
unsigned int i;
struct page **vdso_pagelist;
unsigned long pfn;
 
-   if (memcmp(vdso_info.vdso_code_start, "\177ELF", 4)) {
-   pr_err("vDSO is not a valid ELF object!\n");
-   return -EINVAL;
-   }
+   if (memcmp(vdso_info->vdso_code_start, "\177ELF", 4))
+   panic("vDSO is not a valid ELF object!\n");
 
-   vdso_info.vdso_pages = (
-   vdso_info.vdso_code_end -
-   vdso_info.vdso_code_start) >>
+   vdso_info->vdso_pages = (
+   vdso_info->vdso_code_end -
+   vdso_info->vdso_code_start) >>
PAGE_SHIFT;
 
-   vdso_pagelist = kcalloc(vdso_info.vdso_pages,
+   vdso_pagelist = kcalloc(vdso_info->vdso_pages,
sizeof(struct page *),
GFP_KERNEL);
if (vdso_pagelist == NULL)
-   return -ENOMEM;
+   panic("vDSO kcalloc failed!\n");
 
/* Grab the vDSO code pages. */
-   pfn = sym_to_pfn(vdso_info.vdso_code_start);
+   pfn = sym_to_pfn(vdso_info->vdso_code_start);
 
-   for (i = 0; i < vdso_info.vdso_pages; i++)
+   for (i = 0; i < vdso_info->vdso_pages; i++)
vdso_pagelist[i] = pfn_to_page(pfn + i);
 
-   vdso_info.cm->pages = vdso_pagelist;
-
-   return 0;
+   vdso_info->cm->pages = vdso_pagelist;
 }
 
 #ifdef CONFIG_TIME_NS
@@ -116,13 +114,14 @@ int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
 {
struct mm_struct *mm = task->mm;
struct vm_area_struct *vma;
+   struct __vdso_info *vdso_info = mm->context.vdso_info;
 
mmap_read_lock(mm);
 
for (vma = mm->mmap; vma; vma = vma->vm_next) {
unsigned long size = vma->vm_end - vma->vm_start;
 
-   if (vma_is_special_mapping(vma, vdso_info.dm))
+   if (vma_is_special_mapping(vma, vdso_info->dm))
zap_page_range(vma, vma->vm_start, size);
}
 
@@ -187,11 +186,6 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
return vmf_insert_pfn(vma, vmf->address, pfn);
 }
 
-enum

[PATCH V6 16/20] riscv: compat: vdso: Add rv32 VDSO base code implementation

2022-02-24 Thread guoren

From: Guo Ren 

rv32 has no vgettimeofday support, so the rv32 vdso contains no C
code and can be generated with the riscv64 compiler alone. Other
architectures need a different compiler or an -m (machine) option to
build vdso32. If rv32 gains vgettimeofday (which would require
compiling C) in the future, we would add CROSS_COMPILE support, which
places more requirements on the compiler environment.

linux-rv64/arch/riscv/kernel/compat_vdso/compat_vdso.so.dbg:
file format elf64-littleriscv

Disassembly of section .text:

0800 <__vdso_rt_sigreturn>:
 800:   08b00893    li      a7,139
 804:   00000073    ecall
 808:   0000        unimp
...

080c <__vdso_getcpu>:
 80c:   0a800893    li      a7,168
 810:   00000073    ecall
 814:   8082        ret
...

0818 <__vdso_flush_icache>:
 818:   10300893    li      a7,259
 81c:   00000073    ecall
 820:   8082        ret

linux-rv32/arch/riscv/kernel/vdso/vdso.so.dbg:
file format elf32-littleriscv

Disassembly of section .text:

0800 <__vdso_rt_sigreturn>:
 800:   08b00893    li      a7,139
 804:   00000073    ecall
 808:   0000        unimp
...

080c <__vdso_getcpu>:
 80c:   0a800893    li      a7,168
 810:   00000073    ecall
 814:   8082        ret
...

0818 <__vdso_flush_icache>:
 818:   10300893    li      a7,259
 81c:   00000073    ecall
 820:   8082        ret

Finally, reuse all *.S files from vdso in compat_vdso, which keeps
the implementation clear and readable.
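
As an illustration only (not part of the patch), the identical "li a7,N"
encodings in the two listings above can be checked by hand. The sketch
below decodes a RISC-V I-type word; decode_itype() and struct itype are
hypothetical names, and it assumes "li" with a small constant assembles
to ADDI rd, x0, imm.

```c
#include <stdint.h>

/* Fields of a RISC-V I-type instruction word. */
struct itype {
	uint32_t opcode;  /* bits  6:0  */
	uint32_t rd;      /* bits 11:7  */
	uint32_t funct3;  /* bits 14:12 */
	uint32_t rs1;     /* bits 19:15 */
	int32_t  imm;     /* bits 31:20, sign-extended */
};

/* Hypothetical helper: split a 32-bit instruction into I-type fields. */
static struct itype decode_itype(uint32_t insn)
{
	struct itype f;

	f.opcode = insn & 0x7f;
	f.rd     = (insn >> 7) & 0x1f;
	f.funct3 = (insn >> 12) & 0x7;
	f.rs1    = (insn >> 15) & 0x1f;
	f.imm    = (int32_t)(insn >> 20);
	if (f.imm & 0x800)      /* sign-extend the 12-bit immediate */
		f.imm -= 0x1000;
	return f;
}
```

Decoding 0x08b00893, 0x0a800893 and 0x10300893 this way gives rd = 17
(a7), rs1 = 0 and immediates 139, 168 and 259, matching the syscall
numbers loaded in the listings.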

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Cc: Arnd Bergmann 
Cc: Palmer Dabbelt 
---
 arch/riscv/Makefile   |  5 ++
 arch/riscv/include/asm/vdso.h |  9 +++
 arch/riscv/kernel/Makefile|  1 +
 arch/riscv/kernel/compat_vdso/.gitignore  |  2 +
 arch/riscv/kernel/compat_vdso/Makefile| 68 +++
 arch/riscv/kernel/compat_vdso/compat_vdso.S   |  8 +++
 .../kernel/compat_vdso/compat_vdso.lds.S  |  3 +
 arch/riscv/kernel/compat_vdso/flush_icache.S  |  3 +
 .../compat_vdso/gen_compat_vdso_offsets.sh|  5 ++
 arch/riscv/kernel/compat_vdso/getcpu.S|  3 +
 arch/riscv/kernel/compat_vdso/note.S  |  3 +
 arch/riscv/kernel/compat_vdso/rt_sigreturn.S  |  3 +
 arch/riscv/kernel/vdso/vdso.S |  6 +-
 13 files changed, 118 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/kernel/compat_vdso/.gitignore
 create mode 100644 arch/riscv/kernel/compat_vdso/Makefile
 create mode 100644 arch/riscv/kernel/compat_vdso/compat_vdso.S
 create mode 100644 arch/riscv/kernel/compat_vdso/compat_vdso.lds.S
 create mode 100644 arch/riscv/kernel/compat_vdso/flush_icache.S
 create mode 100755 arch/riscv/kernel/compat_vdso/gen_compat_vdso_offsets.sh
 create mode 100644 arch/riscv/kernel/compat_vdso/getcpu.S
 create mode 100644 arch/riscv/kernel/compat_vdso/note.S
 create mode 100644 arch/riscv/kernel/compat_vdso/rt_sigreturn.S

diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index c6ca1b9cbf71..6a494029b8bd 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -112,12 +112,17 @@ libs-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
 PHONY += vdso_install
 vdso_install:
$(Q)$(MAKE) $(build)=arch/riscv/kernel/vdso $@
+   $(if $(CONFIG_COMPAT),$(Q)$(MAKE) \
+   $(build)=arch/riscv/kernel/compat_vdso $@)
 
 ifeq ($(KBUILD_EXTMOD),)
 ifeq ($(CONFIG_MMU),y)
 prepare: vdso_prepare
 vdso_prepare: prepare0
$(Q)$(MAKE) $(build)=arch/riscv/kernel/vdso include/generated/vdso-offsets.h
+   $(if $(CONFIG_COMPAT),$(Q)$(MAKE) \
+   $(build)=arch/riscv/kernel/compat_vdso include/generated/compat_vdso-offsets.h)
+
 endif
 endif
 
diff --git a/arch/riscv/include/asm/vdso.h b/arch/riscv/include/asm/vdso.h
index bc6f75f3a199..af981426fe0f 100644
--- a/arch/riscv/include/asm/vdso.h
+++ b/arch/riscv/include/asm/vdso.h
@@ -21,6 +21,15 @@
 
 #define VDSO_SYMBOL(base, name)    \
(void __user *)((unsigned long)(base) + __vdso_##name##_offset)
+
+#ifdef CONFIG_COMPAT
+#include 
+
+#define COMPAT_VDSO_SYMBOL(base, name) \
+   (void __user *)((unsigned long)(base) + compat__vdso_##name##_offset)
+
+#endif /* CONFIG_COMPAT */
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* CONFIG_MMU */
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 954dc7043ad2..88e79f481c21 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -67,3 +67,4 @@ obj-$(CONFIG_JUMP_LABEL)  += jump_label.o
 
 obj-$(CONFIG_EFI)  += efi.o
 obj-$(CONFIG_COMPAT)   += compat_syscall_table.o
+obj-$(CONFIG_COMPAT)   += compat_vdso/
diff --git

[PATCH V6 15/20] riscv: compat: Add hw capability check for elf

2022-02-24 Thread guoren

From: Guo Ren 

Detect hardware COMPAT (32-bit U-mode) capability on rv64. If the
hardware does not support COMPAT mode, compat_elf_check_arch()
returns false when called from compat_binfmt_elf.c.
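
For illustration only (not part of the patch), the detection amounts to
a write-then-read-back probe of the UXL field. The sketch below models
it in user space; struct fake_hart and the fake_csr_*() helpers are
hypothetical stand-ins for the real CSR accessors, and the SR_UXL values
assume the field sits in sstatus bits 33:32.

```c
#include <stdbool.h>
#include <stdint.h>

#define SR_UXL     UINT64_C(0x300000000)  /* assumed: UXL field, bits 33:32 */
#define SR_UXL_32  UINT64_C(0x100000000)
#define SR_UXL_64  UINT64_C(0x200000000)

/* Hypothetical software model of one hart's status register. */
struct fake_hart {
	uint64_t status;
	bool uxl_writable;  /* false models a core with hard-wired UXL */
};

static uint64_t fake_csr_read(const struct fake_hart *h)
{
	return h->status;
}

static void fake_csr_write(struct fake_hart *h, uint64_t val)
{
	/* A hard-wired field keeps its old value whatever is written. */
	if (!h->uxl_writable)
		val = (val & ~SR_UXL) | (h->status & SR_UXL);
	h->status = val;
}

/* Try to select UXL=32, check whether it stuck, then restore. */
static bool probe_compat_mode(struct fake_hart *h)
{
	uint64_t saved = fake_csr_read(h);
	bool supported;

	fake_csr_write(h, (saved & ~SR_UXL) | SR_UXL_32);
	supported = (fake_csr_read(h) & SR_UXL) == SR_UXL_32;
	fake_csr_write(h, saved);
	return supported;
}
```

The probe leaves the register as it found it, so it is safe to run once
at boot regardless of the outcome.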

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Cc: Arnd Bergmann 
Cc: Christoph Hellwig 
---
 arch/riscv/include/asm/elf.h |  3 ++-
 arch/riscv/kernel/process.c  | 26 ++
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
index aee40040917b..3a4293dc7229 100644
--- a/arch/riscv/include/asm/elf.h
+++ b/arch/riscv/include/asm/elf.h
@@ -40,7 +40,8 @@
  * elf64_hdr e_machine's offset are different. The checker is
  * a little bit simple compare to other architectures.
  */
-#define compat_elf_check_arch(x) ((x)->e_machine == EM_RISCV)
+extern bool compat_elf_check_arch(Elf32_Ehdr *hdr);
+#define compat_elf_check_arch  compat_elf_check_arch
 
 #define CORE_DUMP_USE_REGSET
 #define ELF_EXEC_PAGESIZE  (PAGE_SIZE)
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 54787ca9806a..7bbe4dd95e85 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -83,6 +83,32 @@ void show_regs(struct pt_regs *regs)
dump_backtrace(regs, NULL, KERN_DEFAULT);
 }
 
+#ifdef CONFIG_COMPAT
+static bool compat_mode_supported __read_mostly;
+
+bool compat_elf_check_arch(Elf32_Ehdr *hdr)
+{
+   return compat_mode_supported && hdr->e_machine == EM_RISCV;
+}
+
+static int __init compat_mode_detect(void)
+{
+   unsigned long tmp = csr_read(CSR_STATUS);
+
+   csr_write(CSR_STATUS, (tmp & ~SR_UXL) | SR_UXL_32);
+   compat_mode_supported =
+   (csr_read(CSR_STATUS) & SR_UXL) == SR_UXL_32;
+
+   csr_write(CSR_STATUS, tmp);
+
+   pr_info("riscv: ELF compat mode %s\n",
+   compat_mode_supported ? "supported" : "failed");
+
+   return 0;
+}
+early_initcall(compat_mode_detect);
+#endif
+
 void start_thread(struct pt_regs *regs, unsigned long pc,
unsigned long sp)
 {
-- 
2.25.1

[PATCH V6 14/20] riscv: compat: Add elf.h implementation

2022-02-24 Thread guoren

From: Guo Ren 

Implement the types and macros needed for compat ELF. See the code
comments for details.

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Reviewed-by: Arnd Bergmann 
---
 arch/riscv/include/asm/elf.h | 46 +++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
index f53c40026c7a..aee40040917b 100644
--- a/arch/riscv/include/asm/elf.h
+++ b/arch/riscv/include/asm/elf.h
@@ -8,6 +8,8 @@
 #ifndef _ASM_RISCV_ELF_H
 #define _ASM_RISCV_ELF_H
 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -18,11 +20,13 @@
  */
 #define ELF_ARCH   EM_RISCV
 
+#ifndef ELF_CLASS
 #ifdef CONFIG_64BIT
 #define ELF_CLASS  ELFCLASS64
 #else
 #define ELF_CLASS  ELFCLASS32
 #endif
+#endif
 
 #define ELF_DATA   ELFDATA2LSB
 
@@ -31,6 +35,13 @@
  */
 #define elf_check_arch(x) ((x)->e_machine == EM_RISCV)
 
+/*
+ * Use the same code with elf_check_arch, because elf32_hdr &
+ * elf64_hdr e_machine's offset are different. The checker is
+ * a little bit simple compare to other architectures.
+ */
+#define compat_elf_check_arch(x) ((x)->e_machine == EM_RISCV)
+
 #define CORE_DUMP_USE_REGSET
 #define ELF_EXEC_PAGESIZE  (PAGE_SIZE)
 
@@ -43,8 +54,14 @@
 #define ELF_ET_DYN_BASE((TASK_SIZE / 3) * 2)
 
 #ifdef CONFIG_64BIT
+#ifdef CONFIG_COMPAT
+#define STACK_RND_MASK (test_thread_flag(TIF_32BIT) ? \
+0x7ff >> (PAGE_SHIFT - 12) : \
+0x3 >> (PAGE_SHIFT - 12))
+#else
 #define STACK_RND_MASK (0x3 >> (PAGE_SHIFT - 12))
 #endif
+#endif
 /*
  * This yields a mask that user programs can use to figure out what
  * instruction set this CPU supports.  This could be done in user space,
@@ -60,11 +77,19 @@ extern unsigned long elf_hwcap;
  */
 #define ELF_PLATFORM   (NULL)
 
+#define COMPAT_ELF_PLATFORM(NULL)
+
 #ifdef CONFIG_MMU
 #define ARCH_DLINFO\
 do {   \
+   /*  \
+* Note that we add ulong after elf_addr_t because  \
+* casting current->mm->context.vdso triggers a cast\
+* warning of cast from pointer to integer for  \
+* COMPAT ELFCLASS32.   \
+*/ \
NEW_AUX_ENT(AT_SYSINFO_EHDR,\
-   (elf_addr_t)current->mm->context.vdso); \
+   (elf_addr_t)(ulong)current->mm->context.vdso);  \
NEW_AUX_ENT(AT_L1I_CACHESIZE,   \
get_cache_size(1, CACHE_TYPE_INST));\
NEW_AUX_ENT(AT_L1I_CACHEGEOMETRY,   \
@@ -90,4 +115,23 @@ do {    \
*(struct user_regs_struct *)regs;   \
 } while (0);
 
+#ifdef CONFIG_COMPAT
+
+#define SET_PERSONALITY(ex)\
+do {if ((ex).e_ident[EI_CLASS] == ELFCLASS32)  \
+   set_thread_flag(TIF_32BIT); \
+   else\
+   clear_thread_flag(TIF_32BIT);   \
+   if (personality(current->personality) != PER_LINUX32)   \
+   set_personality(PER_LINUX | \
+   (current->personality & (~PER_MASK)));  \
+} while (0)
+
+#define COMPAT_ELF_ET_DYN_BASE ((TASK_SIZE_32 / 3) * 2)
+
+/* rv32 registers */
+typedef compat_ulong_t compat_elf_greg_t;
+typedef compat_elf_greg_t  compat_elf_gregset_t[ELF_NGREG];
+
+#endif /* CONFIG_COMPAT */
 #endif /* _ASM_RISCV_ELF_H */
-- 
2.25.1

[PATCH V6 13/20] riscv: compat: process: Add UXL_32 support in start_thread

2022-02-24 Thread guoren

From: Guo Ren 

If the current task is in COMPAT mode, set SR_UXL_32 in status before
returning to userspace. We need CONFIG_COMPAT to prevent compile
errors with the rv32 defconfig.

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Cc: Arnd Bergmann 
Cc: Palmer Dabbelt 
---
 arch/riscv/kernel/process.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 03ac3aa611f5..54787ca9806a 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -97,6 +97,11 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
}
regs->epc = pc;
regs->sp = sp;
+
+   if (is_compat_task())
+   regs->status = (regs->status & ~SR_UXL) | SR_UXL_32;
+   else
+   regs->status = (regs->status & ~SR_UXL) | SR_UXL_64;
 }
 
 void flush_thread(void)
-- 
2.25.1

Re: [PATCH v2 07/18] nios2: drop access_ok() check from __put_user()

2022-02-24 Thread Dinh Nguyen





On 2/16/22 07:13, Arnd Bergmann wrote:

From: Arnd Bergmann 

Unlike other architectures, the nios2 version of __put_user() has an
extra check for access_ok(), preventing it from being used to implement
__put_kernel_nofault().

Split up put_user() along the same lines as __get_user()/get_user()
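
For illustration only (not part of the patch), the split can be modelled
in user space: an unchecked store that trusts the address, and a checked
wrapper that validates the range first. All model_* names and the tiny
fake address space below are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fake address space: offsets below MODEL_USER_END count as "user". */
static unsigned char model_mem[256];
#define MODEL_USER_END ((size_t)128)

static int model_access_ok(size_t addr, size_t size)
{
	return size <= MODEL_USER_END && addr <= MODEL_USER_END - size;
}

/* Unchecked store, in the role of __put_user(): no range check. */
static int model_put_user_nocheck(uint32_t val, size_t addr)
{
	if (addr > sizeof(model_mem) - sizeof(val))
		return -EFAULT;  /* stands in for a faulting store */
	memcpy(&model_mem[addr], &val, sizeof(val));
	return 0;
}

/* Checked store, in the role of put_user(): check, then common path. */
static int model_put_user(uint32_t val, size_t addr)
{
	return model_access_ok(addr, sizeof(val)) ?
		model_put_user_nocheck(val, addr) : -EFAULT;
}
```

With this shape the unchecked variant can also serve callers that write
to non-user addresses, which is what a nofault kernel store needs.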

Signed-off-by: Arnd Bergmann 
---
  arch/nios2/include/asm/uaccess.h | 56 +++-
  1 file changed, 33 insertions(+), 23 deletions(-)

diff --git a/arch/nios2/include/asm/uaccess.h b/arch/nios2/include/asm/uaccess.h
index ca9285a915ef..a5cbe07cf0da 100644
--- a/arch/nios2/include/asm/uaccess.h
+++ b/arch/nios2/include/asm/uaccess.h
@@ -167,34 +167,44 @@ do {  \
: "r" (val), "r" (ptr), "i" (-EFAULT));   \
  }
  
-#define put_user(x, ptr)		\
+#define __put_user_common(__pu_val, __pu_ptr)  \
  ({\
long __pu_err = -EFAULT;\
-   __typeof__(*(ptr)) __user *__pu_ptr = (ptr);\
-   __typeof__(*(ptr)) __pu_val = (__typeof(*ptr))(x);  \
-   if (access_ok(__pu_ptr, sizeof(*__pu_ptr))) {   \
-   switch (sizeof(*__pu_ptr)) {\
-   case 1: \
-   __put_user_asm(__pu_val, "stb", __pu_ptr, __pu_err); \
-   break;  \
-   case 2: \
-   __put_user_asm(__pu_val, "sth", __pu_ptr, __pu_err); \
-   break;  \
-   case 4: \
-   __put_user_asm(__pu_val, "stw", __pu_ptr, __pu_err); \
-   break;  \
-   default:\
-   /* XXX: This looks wrong... */  \
-   __pu_err = 0;   \
-   if (copy_to_user(__pu_ptr, &(__pu_val), \
-   sizeof(*__pu_ptr))) \
-   __pu_err = -EFAULT; \
-   break;  \
-   }   \
+   switch (sizeof(*__pu_ptr)) {\
+   case 1: \
+   __put_user_asm(__pu_val, "stb", __pu_ptr, __pu_err);  \
+   break;  \
+   case 2: \
+   __put_user_asm(__pu_val, "sth", __pu_ptr, __pu_err);  \
+   break;  \
+   case 4: \
+   __put_user_asm(__pu_val, "stw", __pu_ptr, __pu_err);  \
+   break;  \
+   default:\
+   /* XXX: This looks wrong... */  \
+   __pu_err = 0;   \
+   if (__copy_to_user(__pu_ptr, &(__pu_val),   \
+   sizeof(*__pu_ptr))) \
+   __pu_err = -EFAULT; \
+   break;  \
}   \
__pu_err;   \
  })
  
-#define __put_user(x, ptr) put_user(x, ptr)
+#define __put_user(x, ptr) \
+({ \
+   __auto_type __pu_ptr = (ptr);   \
+   typeof(*__pu_ptr) __pu_val = (typeof(*__pu_ptr))(x);\
+   __put_user_common(__pu_val, __pu_ptr);  \
+})
+
+#define put_user(x, ptr)   \
+({ \
+   __auto_type __pu_ptr = (ptr);   \
+   typeof(*__pu_ptr) __pu_val = (typeof(*__pu_ptr))(x);\
+   access_ok(__pu_ptr, sizeof(*__pu_ptr)) ?\
+   __put_user_common(__pu_val, __pu_ptr) : \
+   -EFAULT;\

[PATCH V6 12/20] riscv: compat: syscall: Add entry.S implementation

2022-02-24 Thread guoren

From: Guo Ren 

Implement the compat_sys_call_table[] entry path in asm. See the
riscv-privileged spec, 4.1.1 Supervisor Status Register (sstatus):

 BIT[33:32] = UXL[1:0]:
 - 1: 32
 - 2: 64
 - 3: 128
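
For illustration only (not part of the patch), the table selection the
asm performs can be written out in C. status_uxl() and
pick_syscall_table() are hypothetical names; the constants assume the
UXL field occupies bits 33:32 as listed above.

```c
#include <stdint.h>

#define SR_UXL_SHIFT 32
#define SR_UXL       (UINT64_C(0x3) << SR_UXL_SHIFT)  /* field mask */
#define SR_UXL_32    (UINT64_C(0x1) << SR_UXL_SHIFT)  /* U-mode XLEN = 32 */

enum table_kind { NATIVE_TABLE, COMPAT_TABLE };

/* Shift the field down and mask it, like the srli/andi pair in the asm. */
static unsigned int status_uxl(uint64_t status)
{
	return (unsigned int)((status >> SR_UXL_SHIFT) &
			      (SR_UXL >> SR_UXL_SHIFT));
}

/* UXL == 1 selects the compat table; anything else uses the native one. */
static enum table_kind pick_syscall_table(uint64_t status)
{
	return status_uxl(status) == (SR_UXL_32 >> SR_UXL_SHIFT) ?
		COMPAT_TABLE : NATIVE_TABLE;
}
```

Other status bits do not affect the choice, since only the two UXL bits
survive the mask.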

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Reviewed-by: Palmer Dabbelt 
Cc: Arnd Bergmann 
---
 arch/riscv/include/asm/csr.h |  7 +++
 arch/riscv/kernel/entry.S| 18 --
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index ae711692eec9..eed96fa62d66 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -36,6 +36,13 @@
 #define SR_SD  _AC(0x8000, UL) /* FS/XS dirty */
 #endif
 
+#ifdef CONFIG_COMPAT
+#define SR_UXL _AC(0x300000000, UL) /* XLEN mask for U-mode */
+#define SR_UXL_32  _AC(0x100000000, UL) /* XLEN = 32 for U-mode */
+#define SR_UXL_64  _AC(0x200000000, UL) /* XLEN = 64 for U-mode */
+#define SR_UXL_SHIFT   32
+#endif
+
 /* SATP flags */
 #ifndef CONFIG_64BIT
 #define SATP_PPN   _AC(0x003F, UL)
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index ed29e9c8f660..1951743f09b3 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -207,13 +207,27 @@ check_syscall_nr:
 * Syscall number held in a7.
 * If syscall number is above allowed value, redirect to ni_syscall.
 */
-   bgeu a7, t0, 1f
+   bgeu a7, t0, 3f
+#ifdef CONFIG_COMPAT
+   REG_L s0, PT_STATUS(sp)
+   srli s0, s0, SR_UXL_SHIFT
+   andi s0, s0, (SR_UXL >> SR_UXL_SHIFT)
+   li t0, (SR_UXL_32 >> SR_UXL_SHIFT)
+   sub t0, s0, t0
+   bnez t0, 1f
+
+   /* Call compat_syscall */
+   la s0, compat_sys_call_table
+   j 2f
+1:
+#endif
/* Call syscall */
la s0, sys_call_table
+2:
slli t0, a7, RISCV_LGPTR
add s0, s0, t0
REG_L s0, 0(s0)
-1:
+3:
jalr s0
 
 ret_from_syscall:
-- 
2.25.1

[PATCH V6 11/20] riscv: compat: syscall: Add compat_sys_call_table implementation

2022-02-24 Thread guoren

From: Guo Ren 

Implement compat sys_call_table and some system call functions:
truncate64, ftruncate64, fallocate, pread64, pwrite64,
sync_file_range, readahead, fadvise64_64 which need argument
translation.
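
For illustration only (not part of the patch), the argument translation
these calls need is mostly reassembling a 64-bit value that a 32-bit
task passed as two 32-bit halves. glue_u64() and split_u64() are
hypothetical helpers, and the sketch assumes the low half arrives first.

```c
#include <stdint.h>

/* Rebuild a 64-bit argument from its low and high 32-bit halves. */
static uint64_t glue_u64(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Inverse direction, as a 32-bit caller would split the value. */
static void split_u64(uint64_t val, uint32_t *lo, uint32_t *hi)
{
	*lo = (uint32_t)val;
	*hi = (uint32_t)(val >> 32);
}
```

Casting hi to 64 bits before shifting matters: shifting a 32-bit value
by 32 would be undefined in C.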

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Cc: Arnd Bergmann 
Cc: Palmer Dabbelt 
---
 arch/riscv/include/asm/syscall.h |  1 +
 arch/riscv/include/asm/unistd.h  | 11 +++
 arch/riscv/include/uapi/asm/unistd.h |  2 +-
 arch/riscv/kernel/Makefile   |  1 +
 arch/riscv/kernel/compat_syscall_table.c | 19 
 arch/riscv/kernel/sys_riscv.c|  6 ++--
 fs/open.c| 24 +++
 fs/read_write.c  | 16 ++
 fs/sync.c|  9 ++
 include/asm-generic/compat.h |  7 +
 include/linux/compat.h   | 37 
 mm/fadvise.c | 11 +++
 mm/readahead.c   |  7 +
 13 files changed, 148 insertions(+), 3 deletions(-)
 create mode 100644 arch/riscv/kernel/compat_syscall_table.c

diff --git a/arch/riscv/include/asm/syscall.h b/arch/riscv/include/asm/syscall.h
index 7ac6a0e275f2..384a63b86420 100644
--- a/arch/riscv/include/asm/syscall.h
+++ b/arch/riscv/include/asm/syscall.h
@@ -16,6 +16,7 @@
 
 /* The array of function pointers for syscalls. */
 extern void * const sys_call_table[];
+extern void * const compat_sys_call_table[];
 
 /*
  * Only the low 32 bits of orig_r0 are meaningful, so we return int.
diff --git a/arch/riscv/include/asm/unistd.h b/arch/riscv/include/asm/unistd.h
index 6c316093a1e5..5ddac412b578 100644
--- a/arch/riscv/include/asm/unistd.h
+++ b/arch/riscv/include/asm/unistd.h
@@ -11,6 +11,17 @@
 #define __ARCH_WANT_SYS_CLONE
 #define __ARCH_WANT_MEMFD_SECRET
 
+#ifdef CONFIG_COMPAT
+#define __ARCH_WANT_COMPAT_TRUNCATE64
+#define __ARCH_WANT_COMPAT_FTRUNCATE64
+#define __ARCH_WANT_COMPAT_FALLOCATE
+#define __ARCH_WANT_COMPAT_PREAD64
+#define __ARCH_WANT_COMPAT_PWRITE64
+#define __ARCH_WANT_COMPAT_SYNC_FILE_RANGE
+#define __ARCH_WANT_COMPAT_READAHEAD
+#define __ARCH_WANT_COMPAT_FADVISE64_64
+#endif
+
 #include 
 
 #define NR_syscalls (__NR_syscalls)
diff --git a/arch/riscv/include/uapi/asm/unistd.h b/arch/riscv/include/uapi/asm/unistd.h
index 8062996c2dfd..c9e50eed14aa 100644
--- a/arch/riscv/include/uapi/asm/unistd.h
+++ b/arch/riscv/include/uapi/asm/unistd.h
@@ -15,7 +15,7 @@
  * along with this program.  If not, see .
  */
 
-#ifdef __LP64__
+#if defined(__LP64__) && !defined(__SYSCALL_COMPAT)
 #define __ARCH_WANT_NEW_STAT
 #define __ARCH_WANT_SET_GET_RLIMIT
 #endif /* __LP64__ */
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 612556faa527..954dc7043ad2 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -66,3 +66,4 @@ obj-$(CONFIG_CRASH_DUMP)  += crash_dump.o
 obj-$(CONFIG_JUMP_LABEL)   += jump_label.o
 
 obj-$(CONFIG_EFI)  += efi.o
+obj-$(CONFIG_COMPAT)   += compat_syscall_table.o
diff --git a/arch/riscv/kernel/compat_syscall_table.c b/arch/riscv/kernel/compat_syscall_table.c
new file mode 100644
index ..651f2b009c28
--- /dev/null
+++ b/arch/riscv/kernel/compat_syscall_table.c
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define __SYSCALL_COMPAT
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#undef __SYSCALL
+#define __SYSCALL(nr, call)  [nr] = (call),
+
+asmlinkage long compat_sys_rt_sigreturn(void);
+
+void * const compat_sys_call_table[__NR_syscalls] = {
+   [0 ... __NR_syscalls - 1] = sys_ni_syscall,
+#include 
+};
diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index 12f8a7fce78b..9c0194f176fc 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -33,7 +33,9 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
 {
return riscv_sys_mmap(addr, len, prot, flags, fd, offset, 0);
 }
-#else
+#endif
+
+#if defined(CONFIG_32BIT) || defined(CONFIG_COMPAT)
 SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags,
unsigned long, fd, off_t, offset)
@@ -44,7 +46,7 @@ SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
 */
return riscv_sys_mmap(addr, len, prot, flags, fd, offset, 12);
 }
-#endif /* !CONFIG_64BIT */
+#endif
 
 /*
  * Allows the instruction cache to be flushed from userspace.  Despite RISC-V
diff --git a/fs/open.c b/fs/open.c
index 9ff2f621b760..b25613f7c0a7 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -224,6 +224,21 @@ SYSCALL_DEFINE2(ftruncate64, unsigned int, fd, loff_t, length)
 }
 #endif /* BITS_PER_LONG == 32 */
 
+#if defined(CONFIG_COMPAT) && defined(__ARCH_WANT_COMPAT_TRUNCATE64)
+COMPAT_SYSCALL_DEFINE3(truncate64, const char __user *, pathname,
+  compat_arg_u64_dual(length))
+{
+

[PATCH V6 10/20] riscv: compat: Re-implement TASK_SIZE for COMPAT_32BIT

2022-02-24 Thread guoren

From: Guo Ren 

Change TASK_SIZE from a constant into an expression that checks the
TIF_32BIT flag at run time. Following arm64, implement
DEFAULT_MAP_WINDOW_64 for efi-stub.

Limit 32-bit compat processes to the 0-2GB virtual address range
(which is enough for real scenarios). This avoids address
sign-extension problems when a 32-bit address enters 64-bit code and
eases software design.

The standard 32-bit TASK_SIZE is 0x9dc00000 (FIXADDR_START), which
gives a native 32-bit application 476MB more virtual address space
than a compat 32-bit one.
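
For illustration only (not part of the patch), the numbers above can be
checked with a small sketch. All MODEL_* names are hypothetical; it
assumes 4 KiB pages and that the native rv32 FIXADDR_START quoted above
is 0x9dc00000, the value consistent with the 476MB figure.

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE       UINT64_C(4096)        /* assumed page size */
#define MODEL_COMPAT_LIMIT    UINT64_C(0x80000000)  /* 2 GiB */
#define MODEL_TASK_SIZE_32    (MODEL_COMPAT_LIMIT - MODEL_PAGE_SIZE)
#define MODEL_RV32_TASK_SIZE  UINT64_C(0x9dc00000)  /* assumed FIXADDR_START */
#define MODEL_MIB             UINT64_C(0x100000)

/* Run-time selection: the limit depends on the task, not the build. */
static uint64_t model_task_size(int is_32bit_task, uint64_t task_size_64)
{
	return is_32bit_task ? MODEL_TASK_SIZE_32 : task_size_64;
}

/* Extra address space a native rv32 task has beyond the 2 GiB mark. */
static uint64_t native_rv32_extra_mib(void)
{
	return (MODEL_RV32_TASK_SIZE - MODEL_COMPAT_LIMIT) / MODEL_MIB;
}
```

The difference is 0x1dc00000 bytes, that is 0x1dc = 476 MiB, measured
against the 2 GiB boundary rather than against TASK_SIZE_32 itself.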

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Reviewed-by: Arnd Bergmann 
---
 arch/riscv/include/asm/pgtable.h | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 7e949f25c933..f0d125ea3ceb 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -704,8 +704,17 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
  * 63–48 all equal to bit 47, or else a page-fault exception will occur."
  */
 #ifdef CONFIG_64BIT
-#define TASK_SIZE  (PGDIR_SIZE * PTRS_PER_PGD / 2)
-#define TASK_SIZE_MIN  (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2)
+#define TASK_SIZE_64   (PGDIR_SIZE * PTRS_PER_PGD / 2)
+#define TASK_SIZE_MIN  (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2)
+
+#ifdef CONFIG_COMPAT
+#define TASK_SIZE_32   (_AC(0x80000000, UL) - PAGE_SIZE)
+#define TASK_SIZE  (test_thread_flag(TIF_32BIT) ? \
+TASK_SIZE_32 : TASK_SIZE_64)
+#else
+#define TASK_SIZE  TASK_SIZE_64
+#endif
+
 #else
 #define TASK_SIZE  FIXADDR_START
 #define TASK_SIZE_MIN  TASK_SIZE
-- 
2.25.1

[PATCH V6 09/20] riscv: compat: Add basic compat data type implementation

2022-02-24 Thread guoren

From: Guo Ren 

Implement riscv asm/compat.h for struct compat_xxx,
is_compat_task, compat_user_regset, and regset conversion.

The rv64 compat.h inherits most of its structs
from the generic one.

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Cc: Arnd Bergmann 
Cc: Palmer Dabbelt 
---
 arch/riscv/include/asm/compat.h  | 129 +++
 arch/riscv/include/asm/thread_info.h |   1 +
 2 files changed, 130 insertions(+)
 create mode 100644 arch/riscv/include/asm/compat.h

diff --git a/arch/riscv/include/asm/compat.h b/arch/riscv/include/asm/compat.h
new file mode 100644
index ..2ac955b51148
--- /dev/null
+++ b/arch/riscv/include/asm/compat.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_COMPAT_H
+#define __ASM_COMPAT_H
+
+#define COMPAT_UTS_MACHINE "riscv\0\0"
+
+/*
+ * Architecture specific compatibility types
+ */
+#include 
+#include 
+#include 
+#include 
+
+static inline int is_compat_task(void)
+{
+   return test_thread_flag(TIF_32BIT);
+}
+
+struct compat_user_regs_struct {
+   compat_ulong_t pc;
+   compat_ulong_t ra;
+   compat_ulong_t sp;
+   compat_ulong_t gp;
+   compat_ulong_t tp;
+   compat_ulong_t t0;
+   compat_ulong_t t1;
+   compat_ulong_t t2;
+   compat_ulong_t s0;
+   compat_ulong_t s1;
+   compat_ulong_t a0;
+   compat_ulong_t a1;
+   compat_ulong_t a2;
+   compat_ulong_t a3;
+   compat_ulong_t a4;
+   compat_ulong_t a5;
+   compat_ulong_t a6;
+   compat_ulong_t a7;
+   compat_ulong_t s2;
+   compat_ulong_t s3;
+   compat_ulong_t s4;
+   compat_ulong_t s5;
+   compat_ulong_t s6;
+   compat_ulong_t s7;
+   compat_ulong_t s8;
+   compat_ulong_t s9;
+   compat_ulong_t s10;
+   compat_ulong_t s11;
+   compat_ulong_t t3;
+   compat_ulong_t t4;
+   compat_ulong_t t5;
+   compat_ulong_t t6;
+};
+
+static inline void regs_to_cregs(struct compat_user_regs_struct *cregs,
+struct pt_regs *regs)
+{
+   cregs->pc   = (compat_ulong_t) regs->epc;
+   cregs->ra   = (compat_ulong_t) regs->ra;
+   cregs->sp   = (compat_ulong_t) regs->sp;
+   cregs->gp   = (compat_ulong_t) regs->gp;
+   cregs->tp   = (compat_ulong_t) regs->tp;
+   cregs->t0   = (compat_ulong_t) regs->t0;
+   cregs->t1   = (compat_ulong_t) regs->t1;
+   cregs->t2   = (compat_ulong_t) regs->t2;
+   cregs->s0   = (compat_ulong_t) regs->s0;
+   cregs->s1   = (compat_ulong_t) regs->s1;
+   cregs->a0   = (compat_ulong_t) regs->a0;
+   cregs->a1   = (compat_ulong_t) regs->a1;
+   cregs->a2   = (compat_ulong_t) regs->a2;
+   cregs->a3   = (compat_ulong_t) regs->a3;
+   cregs->a4   = (compat_ulong_t) regs->a4;
+   cregs->a5   = (compat_ulong_t) regs->a5;
+   cregs->a6   = (compat_ulong_t) regs->a6;
+   cregs->a7   = (compat_ulong_t) regs->a7;
+   cregs->s2   = (compat_ulong_t) regs->s2;
+   cregs->s3   = (compat_ulong_t) regs->s3;
+   cregs->s4   = (compat_ulong_t) regs->s4;
+   cregs->s5   = (compat_ulong_t) regs->s5;
+   cregs->s6   = (compat_ulong_t) regs->s6;
+   cregs->s7   = (compat_ulong_t) regs->s7;
+   cregs->s8   = (compat_ulong_t) regs->s8;
+   cregs->s9   = (compat_ulong_t) regs->s9;
+   cregs->s10  = (compat_ulong_t) regs->s10;
+   cregs->s11  = (compat_ulong_t) regs->s11;
+   cregs->t3   = (compat_ulong_t) regs->t3;
+   cregs->t4   = (compat_ulong_t) regs->t4;
+   cregs->t5   = (compat_ulong_t) regs->t5;
+   cregs->t6   = (compat_ulong_t) regs->t6;
+};
+
+static inline void cregs_to_regs(struct compat_user_regs_struct *cregs,
+struct pt_regs *regs)
+{
+   regs->epc   = (unsigned long) cregs->pc;
+   regs->ra= (unsigned long) cregs->ra;
+   regs->sp= (unsigned long) cregs->sp;
+   regs->gp= (unsigned long) cregs->gp;
+   regs->tp= (unsigned long) cregs->tp;
+   regs->t0= (unsigned long) cregs->t0;
+   regs->t1= (unsigned long) cregs->t1;
+   regs->t2= (unsigned long) cregs->t2;
+   regs->s0= (unsigned long) cregs->s0;
+   regs->s1= (unsigned long) cregs->s1;
+   regs->a0= (unsigned long) cregs->a0;
+   regs->a1= (unsigned long) cregs->a1;
+   regs->a2= (unsigned long) cregs->a2;
+   regs->a3= (unsigned long) cregs->a3;
+   regs->a4= (unsigned long) cregs->a4;
+   regs->a5= (unsigned long) cregs->a5;
+   regs->a6= (unsigned long) cregs->a6;
+   regs->a7= (unsigned long) cregs->a7;
+   regs->s2= (unsigned long) cregs->s2;
+   regs->s3= (unsigned long) cregs->s3;
+   regs->s4=

[PATCH V6 08/20] riscv: Fixup difference with defconfig

2022-02-24 Thread guoren

From: Guo Ren 

Let's follow the spirit of the original patch:

The only difference between rv32_defconfig and defconfig is that
rv32_defconfig has CONFIG_ARCH_RV32I=y.

This is helpful for comparing rv64-compat-rv32 vs. rv32-linux.

Fixes: 1b937e8faa87ccfb ("RISC-V: Add separate defconfig for 32bit systems")
Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Reviewed-by: Arnd Bergmann 
Cc: Palmer Dabbelt 
---
 arch/riscv/Makefile   |   4 +
 arch/riscv/configs/rv32_defconfig | 135 --
 2 files changed, 4 insertions(+), 135 deletions(-)
 delete mode 100644 arch/riscv/configs/rv32_defconfig

diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 7d81102cffd4..c6ca1b9cbf71 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -154,3 +154,7 @@ PHONY += rv64_randconfig
 rv64_randconfig:
$(Q)$(MAKE) KCONFIG_ALLCONFIG=$(srctree)/arch/riscv/configs/64-bit.config \
-f $(srctree)/Makefile randconfig
+
+PHONY += rv32_defconfig
+rv32_defconfig:
+   $(Q)$(MAKE) -f $(srctree)/Makefile defconfig 32-bit.config
diff --git a/arch/riscv/configs/rv32_defconfig b/arch/riscv/configs/rv32_defconfig
deleted file mode 100644
index 8b56a7f1eb06..
--- a/arch/riscv/configs/rv32_defconfig
+++ /dev/null
@@ -1,135 +0,0 @@
-CONFIG_SYSVIPC=y
-CONFIG_POSIX_MQUEUE=y
-CONFIG_NO_HZ_IDLE=y
-CONFIG_HIGH_RES_TIMERS=y
-CONFIG_BPF_SYSCALL=y
-CONFIG_IKCONFIG=y
-CONFIG_IKCONFIG_PROC=y
-CONFIG_CGROUPS=y
-CONFIG_CGROUP_SCHED=y
-CONFIG_CFS_BANDWIDTH=y
-CONFIG_CGROUP_BPF=y
-CONFIG_NAMESPACES=y
-CONFIG_USER_NS=y
-CONFIG_CHECKPOINT_RESTORE=y
-CONFIG_BLK_DEV_INITRD=y
-CONFIG_EXPERT=y
-# CONFIG_SYSFS_SYSCALL is not set
-CONFIG_SOC_SIFIVE=y
-CONFIG_SOC_VIRT=y
-CONFIG_ARCH_RV32I=y
-CONFIG_SMP=y
-CONFIG_HOTPLUG_CPU=y
-CONFIG_VIRTUALIZATION=y
-CONFIG_KVM=m
-CONFIG_JUMP_LABEL=y
-CONFIG_MODULES=y
-CONFIG_MODULE_UNLOAD=y
-CONFIG_NET=y
-CONFIG_PACKET=y
-CONFIG_UNIX=y
-CONFIG_INET=y
-CONFIG_IP_MULTICAST=y
-CONFIG_IP_ADVANCED_ROUTER=y
-CONFIG_IP_PNP=y
-CONFIG_IP_PNP_DHCP=y
-CONFIG_IP_PNP_BOOTP=y
-CONFIG_IP_PNP_RARP=y
-CONFIG_NETLINK_DIAG=y
-CONFIG_NET_9P=y
-CONFIG_NET_9P_VIRTIO=y
-CONFIG_PCI=y
-CONFIG_PCIEPORTBUS=y
-CONFIG_PCI_HOST_GENERIC=y
-CONFIG_PCIE_XILINX=y
-CONFIG_DEVTMPFS=y
-CONFIG_DEVTMPFS_MOUNT=y
-CONFIG_BLK_DEV_LOOP=y
-CONFIG_VIRTIO_BLK=y
-CONFIG_BLK_DEV_SD=y
-CONFIG_BLK_DEV_SR=y
-CONFIG_SCSI_VIRTIO=y
-CONFIG_ATA=y
-CONFIG_SATA_AHCI=y
-CONFIG_SATA_AHCI_PLATFORM=y
-CONFIG_NETDEVICES=y
-CONFIG_VIRTIO_NET=y
-CONFIG_MACB=y
-CONFIG_E1000E=y
-CONFIG_R8169=y
-CONFIG_MICROSEMI_PHY=y
-CONFIG_INPUT_MOUSEDEV=y
-CONFIG_SERIAL_8250=y
-CONFIG_SERIAL_8250_CONSOLE=y
-CONFIG_SERIAL_OF_PLATFORM=y
-CONFIG_SERIAL_EARLYCON_RISCV_SBI=y
-CONFIG_HVC_RISCV_SBI=y
-CONFIG_VIRTIO_CONSOLE=y
-CONFIG_HW_RANDOM=y
-CONFIG_HW_RANDOM_VIRTIO=y
-CONFIG_SPI=y
-CONFIG_SPI_SIFIVE=y
-# CONFIG_PTP_1588_CLOCK is not set
-CONFIG_DRM=y
-CONFIG_DRM_RADEON=y
-CONFIG_DRM_VIRTIO_GPU=y
-CONFIG_FB=y
-CONFIG_FRAMEBUFFER_CONSOLE=y
-CONFIG_USB=y
-CONFIG_USB_XHCI_HCD=y
-CONFIG_USB_XHCI_PLATFORM=y
-CONFIG_USB_EHCI_HCD=y
-CONFIG_USB_EHCI_HCD_PLATFORM=y
-CONFIG_USB_OHCI_HCD=y
-CONFIG_USB_OHCI_HCD_PLATFORM=y
-CONFIG_USB_STORAGE=y
-CONFIG_USB_UAS=y
-CONFIG_MMC=y
-CONFIG_MMC_SPI=y
-CONFIG_RTC_CLASS=y
-CONFIG_VIRTIO_PCI=y
-CONFIG_VIRTIO_BALLOON=y
-CONFIG_VIRTIO_INPUT=y
-CONFIG_VIRTIO_MMIO=y
-CONFIG_RPMSG_CHAR=y
-CONFIG_RPMSG_VIRTIO=y
-CONFIG_EXT4_FS=y
-CONFIG_EXT4_FS_POSIX_ACL=y
-CONFIG_AUTOFS4_FS=y
-CONFIG_MSDOS_FS=y
-CONFIG_VFAT_FS=y
-CONFIG_TMPFS=y
-CONFIG_TMPFS_POSIX_ACL=y
-CONFIG_NFS_FS=y
-CONFIG_NFS_V4=y
-CONFIG_NFS_V4_1=y
-CONFIG_NFS_V4_2=y
-CONFIG_ROOT_NFS=y
-CONFIG_9P_FS=y
-CONFIG_CRYPTO_USER_API_HASH=y
-CONFIG_CRYPTO_DEV_VIRTIO=y
-CONFIG_PRINTK_TIME=y
-CONFIG_DEBUG_FS=y
-CONFIG_DEBUG_PAGEALLOC=y
-CONFIG_SCHED_STACK_END_CHECK=y
-CONFIG_DEBUG_VM=y
-CONFIG_DEBUG_VM_PGFLAGS=y
-CONFIG_DEBUG_MEMORY_INIT=y
-CONFIG_DEBUG_PER_CPU_MAPS=y
-CONFIG_SOFTLOCKUP_DETECTOR=y
-CONFIG_WQ_WATCHDOG=y
-CONFIG_DEBUG_TIMEKEEPING=y
-CONFIG_DEBUG_RT_MUTEXES=y
-CONFIG_DEBUG_SPINLOCK=y
-CONFIG_DEBUG_MUTEXES=y
-CONFIG_DEBUG_RWSEMS=y
-CONFIG_DEBUG_ATOMIC_SLEEP=y
-CONFIG_STACKTRACE=y
-CONFIG_DEBUG_LIST=y
-CONFIG_DEBUG_PLIST=y
-CONFIG_DEBUG_SG=y
-# CONFIG_RCU_TRACE is not set
-CONFIG_RCU_EQS_DEBUG=y
-# CONFIG_FTRACE is not set
-# CONFIG_RUNTIME_TESTING_MENU is not set
-CONFIG_MEMTEST=y
-- 
2.25.1

Re: [PATCH 2/3] powerpc: fix build errors

2022-02-24 Thread Arnd Bergmann

On Thu, Feb 24, 2022 at 6:05 AM Nicholas Piggin wrote:
> Excerpts from Nicholas Piggin's message of February 24, 2022 12:54 pm:
> >
> > Not sure on the outlook for GCC fix. Either way unfortunately we have
> > toolchains in the wild now that will explode, so we might have to take
> > your patches for the time being.
>
> Perhaps not... Here's a hack that seems to work around the problem.
>
> The issue of removing -many from the kernel and replacing it with
> appropriate architecture versions is an orthogonal one (that we
> should do). Either way this hack should be able to allow us to do
> that as well, on these problem toolchains.
>
> But for now it just uses -many as the trivial regression fix to get
> back to previous behaviour.

I don't think the previous behavior is what you want to be honest.

We had the same thing on Arm a few years ago when binutils
started enforcing this more strictly, and it does catch actual
bugs. I think annotating individual inline asm statements is
the best choice here, as that documents what the intention is.

There is one more bug in this series that I looked at with Anders, but
he did not send a patch for that so far:

static void dummy_perf(struct pt_regs *regs)
{
#if defined(CONFIG_FSL_EMB_PERFMON)
mtpmr(PMRN_PMGC0, mfpmr(PMRN_PMGC0) & ~PMGC0_PMIE);
#elif defined(CONFIG_PPC64) || defined(CONFIG_PPC_BOOK3S_32)
if (cur_cpu_spec->pmc_type == PPC_PMC_IBM)
mtspr(SPRN_MMCR0, mfspr(SPRN_MMCR0) & ~(MMCR0_PMXE|MMCR0_PMAO));
#else
mtspr(SPRN_MMCR0, mfspr(SPRN_MMCR0) & ~MMCR0_PMXE);
#endif
}

Here, the assembler correctly flags the mtpmr/mfpmr as an invalid
instruction for a combined 6xx kernel: As far as I can tell, these are
only available on e300 but not the others, and instead of the compile-time
check for CONFIG_FSL_EMB_PERFMON, there needs to be some
runtime check to use the first method on 83xx but the #elif one on
the other 6xx machines.

   Arnd

[PATCH V6 07/20] syscalls: compat: Fix the missing part for __SYSCALL_COMPAT

2022-02-24 Thread guoren

From: Guo Ren 

Make the uapi asm-generic unistd.h usable for an architecture's COMPAT
mode. RISC-V is the first user of __SYSCALL_COMPAT.

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Reviewed-by: Arnd Bergmann 
Reviewed-by: Christoph Hellwig 
---
 include/uapi/asm-generic/unistd.h   | 4 ++--
 tools/include/uapi/asm-generic/unistd.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 1c48b0ae3ba3..45fa180cc56a 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -383,7 +383,7 @@ __SYSCALL(__NR_syslog, sys_syslog)
 
 /* kernel/ptrace.c */
 #define __NR_ptrace 117
-__SYSCALL(__NR_ptrace, sys_ptrace)
+__SC_COMP(__NR_ptrace, sys_ptrace, compat_sys_ptrace)
 
 /* kernel/sched/core.c */
 #define __NR_sched_setparam 118
@@ -779,7 +779,7 @@ __SYSCALL(__NR_rseq, sys_rseq)
 #define __NR_kexec_file_load 294
 __SYSCALL(__NR_kexec_file_load, sys_kexec_file_load)
 /* 295 through 402 are unassigned to sync up with generic numbers, don't use */
-#if __BITS_PER_LONG == 32
+#if defined(__SYSCALL_COMPAT) || __BITS_PER_LONG == 32
 #define __NR_clock_gettime64 403
 __SYSCALL(__NR_clock_gettime64, sys_clock_gettime)
 #define __NR_clock_settime64 404
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index 1c48b0ae3ba3..45fa180cc56a 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -383,7 +383,7 @@ __SYSCALL(__NR_syslog, sys_syslog)
 
 /* kernel/ptrace.c */
 #define __NR_ptrace 117
-__SYSCALL(__NR_ptrace, sys_ptrace)
+__SC_COMP(__NR_ptrace, sys_ptrace, compat_sys_ptrace)
 
 /* kernel/sched/core.c */
 #define __NR_sched_setparam 118
@@ -779,7 +779,7 @@ __SYSCALL(__NR_rseq, sys_rseq)
 #define __NR_kexec_file_load 294
 __SYSCALL(__NR_kexec_file_load, sys_kexec_file_load)
 /* 295 through 402 are unassigned to sync up with generic numbers, don't use */
-#if __BITS_PER_LONG == 32
+#if defined(__SYSCALL_COMPAT) || __BITS_PER_LONG == 32
 #define __NR_clock_gettime64 403
 __SYSCALL(__NR_clock_gettime64, sys_clock_gettime)
 #define __NR_clock_settime64 404
-- 
2.25.1

[PATCH V6 06/20] asm-generic: compat: Cleanup duplicate definitions

2022-02-24 Thread guoren

From: Guo Ren 

Seven 64-bit architectures support Linux COMPAT mode to run 32-bit
applications, and many of their definitions are duplicated:
 - COMPAT_USER_HZ
 - COMPAT_RLIM_INFINITY
 - COMPAT_OFF_T_MAX
 - __compat_uid_t, __compat_gid_t
 - compat_dev_t
 - compat_ipc_pid_t
 - struct compat_flock
 - struct compat_flock64
 - struct compat_statfs
 - struct compat_ipc64_perm, compat_semid64_ds,
  compat_msqid64_ds, compat_shmid64_ds

Clean up the duplicate definitions and merge them into asm-generic.

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Reviewed-by: Arnd Bergmann 
Reviewed-by: Christoph Hellwig 
Cc: Palmer Dabbelt 
---
 arch/arm64/include/asm/compat.h   |  71 ++--
 arch/mips/include/asm/compat.h    |  18 ++---
 arch/parisc/include/asm/compat.h  |  29 ++--
 arch/powerpc/include/asm/compat.h |  30 ++---
 arch/s390/include/asm/compat.h    |  79 --
 arch/sparc/include/asm/compat.h   |  39 ---
 arch/x86/include/asm/compat.h     |  80 --
 include/asm-generic/compat.h      | 106 ++
 8 files changed, 168 insertions(+), 284 deletions(-)

diff --git a/arch/arm64/include/asm/compat.h b/arch/arm64/include/asm/compat.h
index e0faec1984a1..46317319738a 100644
--- a/arch/arm64/include/asm/compat.h
+++ b/arch/arm64/include/asm/compat.h
@@ -8,6 +8,13 @@
 #define compat_mode_t compat_mode_t
 typedef u16 compat_mode_t;
 
+#define __compat_uid_t __compat_uid_t
+typedef u16 __compat_uid_t;
+typedef u16 __compat_gid_t;
+
+#define compat_ipc_pid_t compat_ipc_pid_t
+typedef u16 compat_ipc_pid_t;
+
 #include 
 
 #ifdef CONFIG_COMPAT
@@ -19,21 +26,15 @@ typedef u16 compat_mode_t;
 #include 
 #include 
 
-#define COMPAT_USER_HZ 100
 #ifdef __AARCH64EB__
 #define COMPAT_UTS_MACHINE "armv8b\0\0"
 #else
 #define COMPAT_UTS_MACHINE "armv8l\0\0"
 #endif
 
-typedef u16 __compat_uid_t;
-typedef u16 __compat_gid_t;
 typedef u16 __compat_uid16_t;
 typedef u16 __compat_gid16_t;
-typedef u32 compat_dev_t;
 typedef s32 compat_nlink_t;
-typedef u16 compat_ipc_pid_t;
-typedef __kernel_fsid_t compat_fsid_t;
 
 struct compat_stat {
 #ifdef __AARCH64EB__
@@ -87,64 +88,6 @@ struct compat_statfs {
 #define compat_user_stack_pointer() (user_stack_pointer(task_pt_regs(current)))
 #define COMPAT_MINSIGSTKSZ 2048
 
-struct compat_ipc64_perm {
-   compat_key_t key;
-   __compat_uid32_t uid;
-   __compat_gid32_t gid;
-   __compat_uid32_t cuid;
-   __compat_gid32_t cgid;
-   unsigned short mode;
-   unsigned short __pad1;
-   unsigned short seq;
-   unsigned short __pad2;
-   compat_ulong_t unused1;
-   compat_ulong_t unused2;
-};
-
-struct compat_semid64_ds {
-   struct compat_ipc64_perm sem_perm;
-   compat_ulong_t sem_otime;
-   compat_ulong_t sem_otime_high;
-   compat_ulong_t sem_ctime;
-   compat_ulong_t sem_ctime_high;
-   compat_ulong_t sem_nsems;
-   compat_ulong_t __unused3;
-   compat_ulong_t __unused4;
-};
-
-struct compat_msqid64_ds {
-   struct compat_ipc64_perm msg_perm;
-   compat_ulong_t msg_stime;
-   compat_ulong_t msg_stime_high;
-   compat_ulong_t msg_rtime;
-   compat_ulong_t msg_rtime_high;
-   compat_ulong_t msg_ctime;
-   compat_ulong_t msg_ctime_high;
-   compat_ulong_t msg_cbytes;
-   compat_ulong_t msg_qnum;
-   compat_ulong_t msg_qbytes;
-   compat_pid_t   msg_lspid;
-   compat_pid_t   msg_lrpid;
-   compat_ulong_t __unused4;
-   compat_ulong_t __unused5;
-};
-
-struct compat_shmid64_ds {
-   struct compat_ipc64_perm shm_perm;
-   compat_size_t  shm_segsz;
-   compat_ulong_t shm_atime;
-   compat_ulong_t shm_atime_high;
-   compat_ulong_t shm_dtime;
-   compat_ulong_t shm_dtime_high;
-   compat_ulong_t shm_ctime;
-   compat_ulong_t shm_ctime_high;
-   compat_pid_t   shm_cpid;
-   compat_pid_t   shm_lpid;
-   compat_ulong_t shm_nattch;
-   compat_ulong_t __unused4;
-   compat_ulong_t __unused5;
-};
-
 static inline int is_compat_task(void)
 {
return test_thread_flag(TIF_32BIT);
diff --git a/arch/mips/include/asm/compat.h b/arch/mips/include/asm/compat.h
index 6d6e5a451f4d..ec01dc000a41 100644
--- a/arch/mips/include/asm/compat.h
+++ b/arch/mips/include/asm/compat.h
@@ -9,28 +9,28 @@
 #include 
 #include 
 
+#define __compat_uid_t __compat_uid_t
 typedef s32 __compat_uid_t;
 typedef s32 __compat_gid_t;
+
 typedef __compat_uid_t __compat_uid32_t;
 typedef __compat_gid_t __compat_gid32_t;
 #define __compat_uid32_t __compat_uid32_t
-#define __compat_gid32_t __compat_gid32_t
+
+#define compat_statfs  compat_statfs
+#define compat_ipc64_perm  compat_ipc64_perm
 
 #define _COMPAT_NSIG   128 /* Don't ask !$@#% ...  */
 #define

Re: [PATCH 2/3] powerpc: fix build errors

2022-02-24 Thread Nicholas Piggin

Excerpts from Anders Roxell's message of February 23, 2022 11:58 pm:
> Building tinyconfig with gcc (Debian 11.2.0-16) and assembler (Debian
> 2.37.90.20220207) the following build error shows up:
> 
>  {standard input}: Assembler messages:
>  {standard input}:1190: Error: unrecognized opcode: `stbcix'
>  {standard input}:1433: Error: unrecognized opcode: `lwzcix'
>  {standard input}:1453: Error: unrecognized opcode: `stbcix'
>  {standard input}:1460: Error: unrecognized opcode: `stwcix'
>  {standard input}:1596: Error: unrecognized opcode: `stbcix'
>  ...
> 
> Rework to add assembler directives [1] around the instruction. Going
> through them one by one shows that the changes should be safe.  Like
> __get_user_atomic_128_aligned() is only called in p9_hmi_special_emu(),
> which according to the name is specific to power9.  And __raw_rm_read*()
> are only called in things that are powernv or book3s_hv specific.
> 
> [1] 
> https://sourceware.org/binutils/docs/as/PowerPC_002dPseudo.html#PowerPC_002dPseudo

Thanks for doing this. There is a recent patch committed to binutils to work
around this compiler bug.

https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=cebc89b9328

Not sure on the outlook for GCC fix. Either way unfortunately we have 
toolchains in the wild now that will explode, so we might have to take 
your patches for the time being.

Thanks,
Nick

[PATCH V6 05/20] fs: stat: compat: Add __ARCH_WANT_COMPAT_STAT

2022-02-24 Thread guoren

From: Guo Ren 

RISC-V doesn't need compat_stat, so use __ARCH_WANT_COMPAT_STAT
to exclude the unnecessary syscall functions.

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Reviewed-by: Arnd Bergmann 
Reviewed-by: Christoph Hellwig 
Cc: Palmer Dabbelt 
---
 arch/arm64/include/asm/unistd.h   | 1 +
 arch/mips/include/asm/unistd.h    | 2 ++
 arch/parisc/include/asm/unistd.h  | 1 +
 arch/powerpc/include/asm/unistd.h | 1 +
 arch/s390/include/asm/unistd.h    | 1 +
 arch/sparc/include/asm/unistd.h   | 1 +
 arch/x86/include/asm/unistd.h     | 1 +
 fs/stat.c                         | 2 +-
 8 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 4e65da3445c7..037feba03a51 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -3,6 +3,7 @@
  * Copyright (C) 2012 ARM Ltd.
  */
 #ifdef CONFIG_COMPAT
+#define __ARCH_WANT_COMPAT_STAT
 #define __ARCH_WANT_COMPAT_STAT64
 #define __ARCH_WANT_SYS_GETHOSTNAME
 #define __ARCH_WANT_SYS_PAUSE
diff --git a/arch/mips/include/asm/unistd.h b/arch/mips/include/asm/unistd.h
index c2196b1b6604..25a5253db7f4 100644
--- a/arch/mips/include/asm/unistd.h
+++ b/arch/mips/include/asm/unistd.h
@@ -50,6 +50,8 @@
 # ifdef CONFIG_32BIT
 #  define __ARCH_WANT_STAT64
 #  define __ARCH_WANT_SYS_TIME32
+# else
+#  define __ARCH_WANT_COMPAT_STAT
 # endif
 # ifdef CONFIG_MIPS32_O32
 #  define __ARCH_WANT_SYS_TIME32
diff --git a/arch/parisc/include/asm/unistd.h b/arch/parisc/include/asm/unistd.h
index cd438e4150f6..14e0668184cb 100644
--- a/arch/parisc/include/asm/unistd.h
+++ b/arch/parisc/include/asm/unistd.h
@@ -168,6 +168,7 @@ type name(type1 arg1, type2 arg2, type3 arg3, type4 arg4, type5 arg5)   \
 #define __ARCH_WANT_SYS_CLONE
 #define __ARCH_WANT_SYS_CLONE3
 #define __ARCH_WANT_COMPAT_SYS_SENDFILE
+#define __ARCH_WANT_COMPAT_STAT
 
 #ifdef CONFIG_64BIT
 #define __ARCH_WANT_SYS_TIME
diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index 5eb462af6766..b1129b4ef57d 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -44,6 +44,7 @@
 #define __ARCH_WANT_SYS_TIME
 #define __ARCH_WANT_SYS_UTIME
 #define __ARCH_WANT_SYS_NEWFSTATAT
+#define __ARCH_WANT_COMPAT_STAT
 #define __ARCH_WANT_COMPAT_SYS_SENDFILE
 #endif
 #define __ARCH_WANT_SYS_FORK
diff --git a/arch/s390/include/asm/unistd.h b/arch/s390/include/asm/unistd.h
index 9e9f75ef046a..4260bc5ce7f8 100644
--- a/arch/s390/include/asm/unistd.h
+++ b/arch/s390/include/asm/unistd.h
@@ -28,6 +28,7 @@
 #define __ARCH_WANT_SYS_SIGPENDING
 #define __ARCH_WANT_SYS_SIGPROCMASK
 # ifdef CONFIG_COMPAT
+#   define __ARCH_WANT_COMPAT_STAT
 #   define __ARCH_WANT_SYS_TIME32
 #   define __ARCH_WANT_SYS_UTIME32
 # endif
diff --git a/arch/sparc/include/asm/unistd.h b/arch/sparc/include/asm/unistd.h
index 1e66278ba4a5..d6bc76706a7a 100644
--- a/arch/sparc/include/asm/unistd.h
+++ b/arch/sparc/include/asm/unistd.h
@@ -46,6 +46,7 @@
 #define __ARCH_WANT_SYS_TIME
 #define __ARCH_WANT_SYS_UTIME
 #define __ARCH_WANT_COMPAT_SYS_SENDFILE
+#define __ARCH_WANT_COMPAT_STAT
 #endif
 
 #ifdef __32bit_syscall_numbers__
diff --git a/arch/x86/include/asm/unistd.h b/arch/x86/include/asm/unistd.h
index 80e9d5206a71..761173ccc33c 100644
--- a/arch/x86/include/asm/unistd.h
+++ b/arch/x86/include/asm/unistd.h
@@ -22,6 +22,7 @@
 #  include 
 #  define __ARCH_WANT_SYS_TIME
 #  define __ARCH_WANT_SYS_UTIME
+#  define __ARCH_WANT_COMPAT_STAT
 #  define __ARCH_WANT_COMPAT_SYS_PREADV64
 #  define __ARCH_WANT_COMPAT_SYS_PWRITEV64
 #  define __ARCH_WANT_COMPAT_SYS_PREADV64V2
diff --git a/fs/stat.c b/fs/stat.c
index 28d2020ba1f4..ffdeb9065d53 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -639,7 +639,7 @@ SYSCALL_DEFINE5(statx,
return do_statx(dfd, filename, flags, mask, buffer);
 }
 
-#ifdef CONFIG_COMPAT
+#if defined(CONFIG_COMPAT) && defined(__ARCH_WANT_COMPAT_STAT)
 static int cp_compat_stat(struct kstat *stat, struct compat_stat __user *ubuf)
 {
struct compat_stat tmp;
-- 
2.25.1

[PATCH V6 04/20] kconfig: Add SYSVIPC_COMPAT for all architectures

2022-02-24 Thread guoren

From: Guo Ren 

The existing per-arch definitions are pretty much historic cruft.
Move SYSVIPC_COMPAT into init/Kconfig.

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Acked-by: Arnd Bergmann 
Reviewed-by: Christoph Hellwig 
Cc: Palmer Dabbelt 
---
 arch/arm64/Kconfig   | 4 ----
 arch/mips/Kconfig    | 5 -----
 arch/parisc/Kconfig  | 4 ----
 arch/powerpc/Kconfig | 5 -----
 arch/s390/Kconfig    | 3 ---
 arch/sparc/Kconfig   | 5 -----
 arch/x86/Kconfig     | 4 ----
 init/Kconfig         | 4 ++++
 8 files changed, 4 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 09b885cc4db5..51fdb6e9c522 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2108,10 +2108,6 @@ config DMI
 
 endmenu
 
-config SYSVIPC_COMPAT
-   def_bool y
-   depends on COMPAT && SYSVIPC
-
 menu "Power management options"
 
 source "kernel/power/Kconfig"
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 058446f01487..91a17ad380c9 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -3170,16 +3170,12 @@ config MIPS32_COMPAT
 config COMPAT
bool
 
-config SYSVIPC_COMPAT
-   bool
-
 config MIPS32_O32
bool "Kernel support for o32 binaries"
depends on 64BIT
select ARCH_WANT_OLD_COMPAT_IPC
select COMPAT
select MIPS32_COMPAT
-   select SYSVIPC_COMPAT if SYSVIPC
help
  Select this option if you want to run o32 binaries.  These are pure
  32-bit binaries as used by the 32-bit Linux/MIPS port.  Most of
@@ -3193,7 +3189,6 @@ config MIPS32_N32
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
select COMPAT
select MIPS32_COMPAT
-   select SYSVIPC_COMPAT if SYSVIPC
help
  Select this option if you want to run n32 binaries.  These are
  64-bit binaries using 32-bit quantities for addressing and certain
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 43c1c880def6..bc56759d44a2 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -345,10 +345,6 @@ config COMPAT
def_bool y
depends on 64BIT
 
-config SYSVIPC_COMPAT
-   def_bool y
-   depends on COMPAT && SYSVIPC
-
 config AUDIT_ARCH
def_bool y
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index b779603978e1..5a32b7f21af2 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -291,11 +291,6 @@ config COMPAT
select ARCH_WANT_OLD_COMPAT_IPC
select COMPAT_OLD_SIGACTION
 
-config SYSVIPC_COMPAT
-   bool
-   depends on COMPAT && SYSVIPC
-   default y
-
 config SCHED_OMIT_FRAME_POINTER
bool
default y
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index be9f39fd06df..80f69cafbb87 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -459,9 +459,6 @@ config COMPAT
  (and some other stuff like libraries and such) is needed for
  executing 31 bit applications.  It is safe to say "Y".
 
-config SYSVIPC_COMPAT
-   def_bool y if COMPAT && SYSVIPC
-
 config SMP
def_bool y
 
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 1cab1b284f1a..15d5725bd623 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -488,9 +488,4 @@ config COMPAT
select ARCH_WANT_OLD_COMPAT_IPC
select COMPAT_OLD_SIGACTION
 
-config SYSVIPC_COMPAT
-   bool
-   depends on COMPAT && SYSVIPC
-   default y
-
 source "drivers/sbus/char/Kconfig"
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9f5bd41bf660..7d0487189f6e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2860,10 +2860,6 @@ config COMPAT
 if COMPAT
 config COMPAT_FOR_U64_ALIGNMENT
def_bool y
-
-config SYSVIPC_COMPAT
-   def_bool y
-   depends on SYSVIPC
 endif
 
 endmenu
diff --git a/init/Kconfig b/init/Kconfig
index e9119bf54b1f..589ccec56571 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -386,6 +386,10 @@ config SYSVIPC_SYSCTL
depends on SYSCTL
default y
 
+config SYSVIPC_COMPAT
+   def_bool y
+   depends on COMPAT && SYSVIPC
+
 config POSIX_MQUEUE
bool "POSIX Message Queues"
depends on NET
-- 
2.25.1

[PATCH V6 03/20] compat: consolidate the compat_flock{, 64} definition

2022-02-24 Thread guoren

From: Christoph Hellwig 

Provide a single common definition for the compat_flock and
compat_flock64 structures using the same tricks as for the native
variants.  Another extra define is added for the packing required on
x86.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Guo Ren 
Reviewed-by: Arnd Bergmann 
---
 arch/arm64/include/asm/compat.h   | 16 
 arch/mips/include/asm/compat.h    | 19 ++-
 arch/parisc/include/asm/compat.h  | 16
 arch/powerpc/include/asm/compat.h | 16
 arch/s390/include/asm/compat.h    | 16
 arch/sparc/include/asm/compat.h   | 18 +-
 arch/x86/include/asm/compat.h     | 20 +++-
 include/linux/compat.h            | 31 +++
 8 files changed, 37 insertions(+), 115 deletions(-)

diff --git a/arch/arm64/include/asm/compat.h b/arch/arm64/include/asm/compat.h
index 276328765408..e0faec1984a1 100644
--- a/arch/arm64/include/asm/compat.h
+++ b/arch/arm64/include/asm/compat.h
@@ -65,22 +65,6 @@ struct compat_stat {
compat_ulong_t  __unused4[2];
 };
 
-struct compat_flock {
-   short   l_type;
-   short   l_whence;
-   compat_off_t l_start;
-   compat_off_t l_len;
-   compat_pid_t l_pid;
-};
-
-struct compat_flock64 {
-   short   l_type;
-   short   l_whence;
-   compat_loff_t   l_start;
-   compat_loff_t   l_len;
-   compat_pid_t l_pid;
-};
-
 struct compat_statfs {
int f_type;
int f_bsize;
diff --git a/arch/mips/include/asm/compat.h b/arch/mips/include/asm/compat.h
index 6a350c1f70d7..6d6e5a451f4d 100644
--- a/arch/mips/include/asm/compat.h
+++ b/arch/mips/include/asm/compat.h
@@ -55,23 +55,8 @@ struct compat_stat {
s32 st_pad4[14];
 };
 
-struct compat_flock {
-   short   l_type;
-   short   l_whence;
-   compat_off_t l_start;
-   compat_off_t l_len;
-   s32 l_sysid;
-   compat_pid_t l_pid;
-   s32 pad[4];
-};
-
-struct compat_flock64 {
-   short   l_type;
-   short   l_whence;
-   compat_loff_t   l_start;
-   compat_loff_t   l_len;
-   compat_pid_t l_pid;
-};
+#define __ARCH_COMPAT_FLOCK_EXTRA_SYSID s32 l_sysid;
+#define __ARCH_COMPAT_FLOCK_PAD s32 pad[4];
 
 struct compat_statfs {
int f_type;
diff --git a/arch/parisc/include/asm/compat.h b/arch/parisc/include/asm/compat.h
index c04f5a637c39..a1e4534d8050 100644
--- a/arch/parisc/include/asm/compat.h
+++ b/arch/parisc/include/asm/compat.h
@@ -53,22 +53,6 @@ struct compat_stat {
u32 st_spare4[3];
 };
 
-struct compat_flock {
-   short   l_type;
-   short   l_whence;
-   compat_off_t l_start;
-   compat_off_t l_len;
-   compat_pid_t l_pid;
-};
-
-struct compat_flock64 {
-   short   l_type;
-   short   l_whence;
-   compat_loff_t   l_start;
-   compat_loff_t   l_len;
-   compat_pid_t l_pid;
-};
-
 struct compat_statfs {
s32 f_type;
s32 f_bsize;
diff --git a/arch/powerpc/include/asm/compat.h b/arch/powerpc/include/asm/compat.h
index 83d8f70779cb..5ef3c7c83c34 100644
--- a/arch/powerpc/include/asm/compat.h
+++ b/arch/powerpc/include/asm/compat.h
@@ -44,22 +44,6 @@ struct compat_stat {
u32 __unused4[2];
 };
 
-struct compat_flock {
-   short   l_type;
-   short   l_whence;
-   compat_off_t l_start;
-   compat_off_t l_len;
-   compat_pid_t l_pid;
-};
-
-struct compat_flock64 {
-   short   l_type;
-   short   l_whence;
-   compat_loff_t   l_start;
-   compat_loff_t   l_len;
-   compat_pid_t l_pid;
-};
-
 struct compat_statfs {
int f_type;
int f_bsize;
diff --git a/arch/s390/include/asm/compat.h b/arch/s390/include/asm/compat.h
index 0f14b3188b1b..07f04d37068b 100644
--- a/arch/s390/include/asm/compat.h
+++ b/arch/s390/include/asm/compat.h
@@ -102,22 +102,6 @@ struct compat_stat {
u32 __unused5;
 };
 
-struct compat_flock {
-   short   l_type;
-   short   l_whence;
-   compat_off_t l_start;
-   compat_off_t l_len;
-   compat_pid_t l_pid;
-};
-
-struct compat_flock64 {
-   short   l_type;
-   short   l_whence;
-   compat_loff_t   l_start;
-   compat_loff_t   l_len;
-   compat_pid_t l_pid;
-};
-
 struct compat_statfs {
u32 f_type;
u32 f_bsize;
diff --git a/arch/sparc/include/asm/compat.h b/arch/sparc/include/asm/compat.h
index 108078751bb5..d78fb44942e0 100644
---

[PATCH V6 02/20] uapi: always define F_GETLK64/F_SETLK64/F_SETLKW64 in fcntl.h

2022-02-24 Thread guoren

From: Christoph Hellwig 

The F_GETLK64/F_SETLK64/F_SETLKW64 fcntl opcodes are only implemented
for the 32-bit syscall APIs, but are also needed for compat handling
on 64-bit kernels.

Consolidate them in fcntl.h instead of defining the internal compat
definitions in compat.h, which is rather error-prone (e.g. parisc
currently gets the values wrong).

Note that before this change they were never visible to userspace due
to the fact that CONFIG_64BIT is only set for kernel builds.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Guo Ren 
Reviewed-by: Arnd Bergmann 
---
 arch/arm64/include/asm/compat.h        | 4 ----
 arch/mips/include/asm/compat.h         | 4 ----
 arch/mips/include/uapi/asm/fcntl.h     | 4 ++--
 arch/powerpc/include/asm/compat.h      | 4 ----
 arch/s390/include/asm/compat.h         | 4 ----
 arch/sparc/include/asm/compat.h        | 4 ----
 arch/x86/include/asm/compat.h          | 4 ----
 include/uapi/asm-generic/fcntl.h       | 4 ++--
 tools/include/uapi/asm-generic/fcntl.h | 2 --
 9 files changed, 4 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/include/asm/compat.h b/arch/arm64/include/asm/compat.h
index eaa6ca062d89..276328765408 100644
--- a/arch/arm64/include/asm/compat.h
+++ b/arch/arm64/include/asm/compat.h
@@ -73,10 +73,6 @@ struct compat_flock {
 compat_pid_t l_pid;
 };
 
-#define F_GETLK64  12  /*  using 'struct flock64' */
-#define F_SETLK64  13
-#define F_SETLKW64 14
-
 struct compat_flock64 {
short   l_type;
short   l_whence;
diff --git a/arch/mips/include/asm/compat.h b/arch/mips/include/asm/compat.h
index bbb3bc5a42fd..6a350c1f70d7 100644
--- a/arch/mips/include/asm/compat.h
+++ b/arch/mips/include/asm/compat.h
@@ -65,10 +65,6 @@ struct compat_flock {
s32 pad[4];
 };
 
-#define F_GETLK64  33
-#define F_SETLK64  34
-#define F_SETLKW64 35
-
 struct compat_flock64 {
short   l_type;
short   l_whence;
diff --git a/arch/mips/include/uapi/asm/fcntl.h b/arch/mips/include/uapi/asm/fcntl.h
index 9e44ac810db9..0369a38e3d4f 100644
--- a/arch/mips/include/uapi/asm/fcntl.h
+++ b/arch/mips/include/uapi/asm/fcntl.h
@@ -44,11 +44,11 @@
 #define F_SETOWN   24  /*  for sockets. */
 #define F_GETOWN   23  /*  for sockets. */
 
-#ifndef __mips64
+#if __BITS_PER_LONG == 32 || defined(__KERNEL__)
 #define F_GETLK64  33  /*  using 'struct flock64' */
 #define F_SETLK64  34
 #define F_SETLKW64 35
-#endif
+#endif /* __BITS_PER_LONG == 32 || defined(__KERNEL__) */
 
 #if _MIPS_SIM != _MIPS_SIM_ABI64
 #define __ARCH_FLOCK_EXTRA_SYSID   long l_sysid;
diff --git a/arch/powerpc/include/asm/compat.h b/arch/powerpc/include/asm/compat.h
index 7afc96fb6524..83d8f70779cb 100644
--- a/arch/powerpc/include/asm/compat.h
+++ b/arch/powerpc/include/asm/compat.h
@@ -52,10 +52,6 @@ struct compat_flock {
 compat_pid_t l_pid;
 };
 
-#define F_GETLK64  12  /*  using 'struct flock64' */
-#define F_SETLK64  13
-#define F_SETLKW64 14
-
 struct compat_flock64 {
short   l_type;
short   l_whence;
diff --git a/arch/s390/include/asm/compat.h b/arch/s390/include/asm/compat.h
index cdc7ae72529d..0f14b3188b1b 100644
--- a/arch/s390/include/asm/compat.h
+++ b/arch/s390/include/asm/compat.h
@@ -110,10 +110,6 @@ struct compat_flock {
 compat_pid_t l_pid;
 };
 
-#define F_GETLK64   12
-#define F_SETLK64   13
-#define F_SETLKW64  14
-
 struct compat_flock64 {
short   l_type;
short   l_whence;
diff --git a/arch/sparc/include/asm/compat.h b/arch/sparc/include/asm/compat.h
index bd949fcf9d63..108078751bb5 100644
--- a/arch/sparc/include/asm/compat.h
+++ b/arch/sparc/include/asm/compat.h
@@ -84,10 +84,6 @@ struct compat_flock {
short   __unused;
 };
 
-#define F_GETLK64  12
-#define F_SETLK64  13
-#define F_SETLKW64 14
-
 struct compat_flock64 {
short   l_type;
short   l_whence;
diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index 7516e4199b3c..8d19a212f4f2 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -58,10 +58,6 @@ struct compat_flock {
 compat_pid_t l_pid;
 };
 
-#define F_GETLK64  12  /*  using 'struct flock64' */
-#define F_SETLK64  13
-#define F_SETLKW64 14
-
 /*
  * IA32 uses 4 byte alignment for 64 bit quantities,
  * so we need to pack this structure.
diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
index 77aa9f2ff98d..f13d37b60775 100644
--- a/include/uapi/asm-generic/fcntl.h
+++ b/include/uapi/asm-generic/fcntl.h
@@ -116,13 +116,13 @@
 #define F_GETSIG   11  /* for sockets. */
 #endif
 
-#ifndef CONFIG_64BIT
+#if __BITS_PER_LONG == 32 || defined(__KERNEL__)
 #ifndef F_GETLK64
 #define F_GETLK64  12  /*  using 'struct flock64' */
 #define F_SETLK64

[PATCH V6 01/20] uapi: simplify __ARCH_FLOCK{,64}_PAD a little

2022-02-24 Thread guoren

From: Christoph Hellwig 

Don't bother to define the symbols empty, just don't use them.
That makes the intent a little more clear.

Remove the unused HAVE_ARCH_STRUCT_FLOCK64 define and merge the
32-bit mips struct flock into the generic one.

Add a new __ARCH_FLOCK_EXTRA_SYSID macro following the style of
__ARCH_FLOCK_PAD to avoid having a separate definition just for
one architecture.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Guo Ren 
Reviewed-by: Arnd Bergmann 
---
 arch/mips/include/uapi/asm/fcntl.h | 26 +++---
 include/uapi/asm-generic/fcntl.h   | 19 +++
 tools/include/uapi/asm-generic/fcntl.h | 19 +++
 3 files changed, 17 insertions(+), 47 deletions(-)

diff --git a/arch/mips/include/uapi/asm/fcntl.h b/arch/mips/include/uapi/asm/fcntl.h
index 42e13dead543..9e44ac810db9 100644
--- a/arch/mips/include/uapi/asm/fcntl.h
+++ b/arch/mips/include/uapi/asm/fcntl.h
@@ -50,30 +50,10 @@
 #define F_SETLKW64 35
 #endif
 
-/*
- * The flavours of struct flock.  "struct flock" is the ABI compliant
- * variant.  Finally struct flock64 is the LFS variant of struct flock.  As
- * a historic accident and inconsistence with the ABI definition it doesn't
- * contain all the same fields as struct flock.
- */
-
 #if _MIPS_SIM != _MIPS_SIM_ABI64
-
-#include 
-
-struct flock {
-   short   l_type;
-   short   l_whence;
-   __kernel_off_t  l_start;
-   __kernel_off_t  l_len;
-   long    l_sysid;
-   __kernel_pid_t l_pid;
-   long    pad[4];
-};
-
-#define HAVE_ARCH_STRUCT_FLOCK
-
-#endif /* _MIPS_SIM == _MIPS_SIM_ABI32 */
+#define __ARCH_FLOCK_EXTRA_SYSID   long l_sysid;
+#define __ARCH_FLOCK_PAD   long pad[4];
+#endif
 
 #include 
 
diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
index ecd0f5bdfc1d..77aa9f2ff98d 100644
--- a/include/uapi/asm-generic/fcntl.h
+++ b/include/uapi/asm-generic/fcntl.h
@@ -192,25 +192,19 @@ struct f_owner_ex {
 
 #define F_LINUX_SPECIFIC_BASE  1024
 
-#ifndef HAVE_ARCH_STRUCT_FLOCK
-#ifndef __ARCH_FLOCK_PAD
-#define __ARCH_FLOCK_PAD
-#endif
-
 struct flock {
short   l_type;
short   l_whence;
__kernel_off_t  l_start;
__kernel_off_t  l_len;
__kernel_pid_t  l_pid;
-   __ARCH_FLOCK_PAD
-};
+#ifdef __ARCH_FLOCK_EXTRA_SYSID
+   __ARCH_FLOCK_EXTRA_SYSID
 #endif
-
-#ifndef HAVE_ARCH_STRUCT_FLOCK64
-#ifndef __ARCH_FLOCK64_PAD
-#define __ARCH_FLOCK64_PAD
+#ifdef __ARCH_FLOCK_PAD
+   __ARCH_FLOCK_PAD
 #endif
+};
 
 struct flock64 {
short  l_type;
@@ -218,8 +212,9 @@ struct flock64 {
__kernel_loff_t l_start;
__kernel_loff_t l_len;
__kernel_pid_t  l_pid;
+#ifdef __ARCH_FLOCK64_PAD
__ARCH_FLOCK64_PAD
-};
 #endif
+};
 
 #endif /* _ASM_GENERIC_FCNTL_H */
diff --git a/tools/include/uapi/asm-generic/fcntl.h b/tools/include/uapi/asm-generic/fcntl.h
index ac190958c981..99bc9b15ce2b 100644
--- a/tools/include/uapi/asm-generic/fcntl.h
+++ b/tools/include/uapi/asm-generic/fcntl.h
@@ -187,25 +187,19 @@ struct f_owner_ex {
 
 #define F_LINUX_SPECIFIC_BASE  1024
 
-#ifndef HAVE_ARCH_STRUCT_FLOCK
-#ifndef __ARCH_FLOCK_PAD
-#define __ARCH_FLOCK_PAD
-#endif
-
 struct flock {
short   l_type;
short   l_whence;
__kernel_off_t  l_start;
__kernel_off_t  l_len;
__kernel_pid_t  l_pid;
-   __ARCH_FLOCK_PAD
-};
+#ifdef __ARCH_FLOCK_EXTRA_SYSID
+   __ARCH_FLOCK_EXTRA_SYSID
 #endif
-
-#ifndef HAVE_ARCH_STRUCT_FLOCK64
-#ifndef __ARCH_FLOCK64_PAD
-#define __ARCH_FLOCK64_PAD
+#ifdef __ARCH_FLOCK_PAD
+   __ARCH_FLOCK_PAD
 #endif
+};
 
 struct flock64 {
short  l_type;
@@ -213,8 +207,9 @@ struct flock64 {
__kernel_loff_t l_start;
__kernel_loff_t l_len;
__kernel_pid_t  l_pid;
+#ifdef __ARCH_FLOCK64_PAD
__ARCH_FLOCK64_PAD
-};
 #endif
+};
 
 #endif /* _ASM_GENERIC_FCNTL_H */
-- 
2.25.1

[PATCH V6 00/20] riscv: compat: Add COMPAT mode support for rv64

2022-02-24 Thread guoren

From: Guo Ren 

Currently, most 64-bit architectures (x86, parisc, powerpc, arm64,
s390, mips, sparc) support COMPAT mode, but they all carry historical
baggage and can't use the standard Linux unistd.h. RISC-V would be the
first standard __SYSCALL_COMPAT user of
include/uapi/asm-generic/unistd.h.

The patchset is based on v5.17-rc5. You can compare rv64-compat
vs. rv32-native in QEMU with the following steps:

 - Prepare rv32 rootfs & fw_jump.bin by buildroot.org
   $ git clone git://git.busybox.net/buildroot
   $ cd buildroot
   $ make qemu_riscv32_virt_defconfig O=qemu_riscv32_virt_defconfig
   $ make -C qemu_riscv32_virt_defconfig
   $ make qemu_riscv64_virt_defconfig O=qemu_riscv64_virt_defconfig
   $ make -C qemu_riscv64_virt_defconfig
   (Got fw_jump.bin & rootfs.ext2 in qemu_riscvXX_virt_defconfig/images)

 - Prepare Linux rv32 & rv64 Image
   $ git clone g...@github.com:c-sky/csky-linux.git -b riscv_compat_v6 linux
   $ cd linux
   $ echo "CONFIG_STRICT_KERNEL_RWX=n" >> arch/riscv/configs/defconfig
   $ echo "CONFIG_STRICT_MODULE_RWX=n" >> arch/riscv/configs/defconfig
   $ make ARCH=riscv CROSS_COMPILE=riscv32-buildroot-linux-gnu- O=../build-rv32/ rv32_defconfig
   $ make ARCH=riscv CROSS_COMPILE=riscv32-buildroot-linux-gnu- O=../build-rv32/ Image
   $ make ARCH=riscv CROSS_COMPILE=riscv64-buildroot-linux-gnu- O=../build-rv64/ defconfig
   $ make ARCH=riscv CROSS_COMPILE=riscv64-buildroot-linux-gnu- O=../build-rv64/ Image

 - Prepare Qemu: (rv32 compat was made by LIU Zhiwei )
   $ git clone g...@github.com:alistair23/qemu.git -b riscv-to-apply.for-upstream qemu
   $ cd qemu
   $ ./configure --target-list="riscv64-softmmu riscv32-softmmu"
   $ make

Now let's compare the memory footprint of rv64-compat and rv32-native,
using almost the same defconfig, rootfs and opensbi in one qemu.

 - Run rv64 with rv32 rootfs in compat mode:
   $ ./build/qemu-system-riscv64 -cpu rv64 -M virt -m 64m -nographic \
       -bios qemu_riscv64_virt_defconfig/images/fw_jump.bin \
       -kernel build-rv64/Image \
       -drive file=qemu_riscv32_virt_defconfig/images/rootfs.ext2,format=raw,id=hd0 \
       -device virtio-blk-device,drive=hd0 \
       -append "rootwait root=/dev/vda ro console=ttyS0 earlycon=sbi" \
       -netdev user,id=net0 -device virtio-net-device,netdev=net0

QEMU emulator version 6.2.50 (v6.2.0-29-g196d7182c8)
OpenSBI v0.9
[0.00] Linux version 5.16.0-rc6-00017-g750f87086bdd-dirty (guoren@guoren-Z87-HD3) (riscv64-unknown-linux-gnu-gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.37) #96 SMP Tue Dec 28 21:01:55 CST 2021
[0.00] OF: fdt: Ignoring memory range 0x8000 - 0x8020
[0.00] Machine model: riscv-virtio,qemu
[0.00] earlycon: sbi0 at I/O port 0x0 (options '')
[0.00] printk: bootconsole [sbi0] enabled
[0.00] efi: UEFI not found.
[0.00] Zone ranges:
[0.00]   DMA32[mem 0x8020-0x83ff]
[0.00]   Normal   empty
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x8020-0x83ff]
[0.00] Initmem setup node 0 [mem 0x8020-0x83ff]
[0.00] SBI specification v0.2 detected
[0.00] SBI implementation ID=0x1 Version=0x9
[0.00] SBI TIME extension detected
[0.00] SBI IPI extension detected
[0.00] SBI RFENCE extension detected
[0.00] SBI v0.2 HSM extension detected
[0.00] riscv: ISA extensions acdfhimsu
[0.00] riscv: ELF capabilities acdfim
[0.00] percpu: Embedded 17 pages/cpu s30696 r8192 d30744 u69632
[0.00] Built 1 zonelists, mobility grouping on.  Total pages: 15655
[0.00] Kernel command line: rootwait root=/dev/vda ro console=ttyS0 earlycon=sbi
[0.00] Dentry cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[0.00] Inode-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
[0.00] mem auto-init: stack:off, heap alloc:off, heap free:off
[0.00] Virtual kernel memory layout:
[0.00]   fixmap : 0xffcefee0 - 0xffceff00   (2048 kB)
[0.00]   pci io : 0xffceff00 - 0xffcf   (  16 MB)
[0.00]  vmemmap : 0xffcf - 0xffcf   (4095 MB)
[0.00]  vmalloc : 0xffd0 - 0xffdf   (65535 MB)
[0.00]   lowmem : 0xffe0 - 0xffe003e0   (  62 MB)
[0.00]   kernel : 0x8000 - 0x   (2047 MB)
[0.00] Memory: 52788K/63488K available (6184K kernel code, 888K rwdata, 1917K rodata, 294K init, 297K bss, 10700K reserved, 0K cma-reserved)
[0.00] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[0.00] rcu: Hierarchical RCU implementation.
[0.00] rcu: RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=1.
[0.00] rcu: RCU debug extended QS entry/exit.
[0.00]  Tracing variant of Tasks RCU

Re: [PATCH v3 2/4] powerpc/pseries/vas: Modify reconfig open/close functions for migration

2022-02-24 Thread Haren Myneni

On Wed, 2022-02-23 at 19:54 +1000, Nicholas Piggin wrote:
> Excerpts from Haren Myneni's message of February 20, 2022 6:05 am:
> > VAS is a hardware engine stays on the chip. So when the partition
> > migrates, all VAS windows on the source system have to be closed
> > and reopen them on the destination after migration.
> > 
> > This patch make changes to the current reconfig_open/close_windows
> > functions to support migration:
> > - Set VAS_WIN_MIGRATE_CLOSE to the window status when closes and
> >   reopen windows with the same status during resume.
> > - Continue to close all windows even if deallocate HCALL failed
> >   (should not happen) since no way to stop migration with the
> >   current LPM implementation.
> 
> Hmm.  pseries_migrate_partition *can* fail?

Yes, it can fail. If pseries_suspend() fails, all VAS windows will be
reopened again without migration. vas_migration_handler(VAS_RESUME) is
called whether pseries_suspend() returns 0 or not.

> 
> > - If the DLPAR CPU event happens while migration is in progress,
> >   set VAS_WIN_NO_CRED_CLOSE to the window status. Close window
> >   happens with the first event (migration or DLPAR) and Reopen
> >   window happens only with the last event (migration or DLPAR).
> 
> Can DLPAR happen while migration is in progress? Couldn't
> this cause your source and destination credits to go out of
> whack?

It should not. If a DLPAR event happens while migration is in
progress, the windows are closed in the hypervisor (and marked inactive
with the migration status bit in the OS) for migration. For the DLPAR
event, the DLPAR_CLOSED status bit is set on the necessary windows.
After the migration, we open in the hypervisor, and set active in the
OS, only those windows that have just the migration status. The
remaining windows are opened only after the subsequent DLPAR core add
event.

Regarding the target credits on the destination, we get the new
capabilities after migration and use the new value for reopen.

Ex: Used the following test case -
- Configured 2 dedicated cores (40 credits) and executed the test case,
which opened 35 credits / windows
- Removed 1 core, which leaves 20 credits available. So closed 15
windows and set them with DLPAR closed status
- Migration start: Closed the remaining 20 windows and set migration
status on all (35) windows
- After migration, opened the windows that have only migration status
(20 windows), and also cleared the migration status for the remaining
15 windows
- Added a core, which gives the system 20 more credits. So opened the
remaining 15 windows, which have only DLPAR closed status.

> 
> Why do you need two close window types, what if you finish
> LPM and just open as many as possible regardless how they
> are closed?

Two different status bits are added to support the DLPAR and LPM closed
status. As I mentioned above, a window becomes active only after both
bits are cleared.
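
To make the interaction of the two bits concrete, here is a small
userspace sketch (not the kernel code) that replays the test case
described above. The two flag values are the ones quoted from the
patch; win_status[], mark_closed(), clear_reason(), active_windows()
and replay_test_case() are made-up names for illustration, and which
15 windows get the DLPAR closed status is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values as quoted from the vas.h hunk of the patch. */
#define VAS_WIN_NO_CRED_CLOSE	0x0001
#define VAS_WIN_MIGRATE_CLOSE	0x0002

#define NR_WINDOWS	35	/* windows opened by the test case */

/* Hypothetical stand-in for the per-window status word. */
static unsigned int win_status[NR_WINDOWS];

/* Record one more reason why windows [first, first + count) are closed. */
static void mark_closed(size_t first, size_t count, unsigned int why)
{
	assert(first + count <= NR_WINDOWS);
	for (size_t i = first; i < first + count; i++)
		win_status[i] |= why;
}

/* Drop one close reason; a window may still be closed for the other one. */
static void clear_reason(size_t first, size_t count, unsigned int why)
{
	assert(first + count <= NR_WINDOWS);
	for (size_t i = first; i < first + count; i++)
		win_status[i] &= ~why;
}

/* A window counts as active only when no close bit is left set. */
static size_t active_windows(void)
{
	size_t n = 0;

	for (size_t i = 0; i < NR_WINDOWS; i++)
		if (win_status[i] == 0)
			n++;
	return n;
}

/* Replay the test case and record the active count after each step. */
static void replay_test_case(size_t active[4])
{
	for (size_t i = 0; i < NR_WINDOWS; i++)
		win_status[i] = 0;

	/* Core removed: 15 windows lose their credit. */
	mark_closed(20, 15, VAS_WIN_NO_CRED_CLOSE);
	active[0] = active_windows();

	/* Migration starts: every window gets the migration bit. */
	mark_closed(0, NR_WINDOWS, VAS_WIN_MIGRATE_CLOSE);
	active[1] = active_windows();

	/* Migration done: the migration bit is cleared everywhere. */
	clear_reason(0, NR_WINDOWS, VAS_WIN_MIGRATE_CLOSE);
	active[2] = active_windows();

	/* Core added: the 15 DLPAR-closed windows get their credit back. */
	clear_reason(20, 15, VAS_WIN_NO_CRED_CLOSE);
	active[3] = active_windows();
}
```

Calling replay_test_case() with a four-element array gives the active
window count after each step; with a single "closed" flag the 15
DLPAR-closed windows could not be told apart from the other 20 when
migration finishes.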

Thanks
Haren

> 
> Thanks,
> Nick
> 
> > Signed-off-by: Haren Myneni 
> > ---
> >  arch/powerpc/include/asm/vas.h   |  2 +
> >  arch/powerpc/platforms/pseries/vas.c | 88 ++
> > --
> >  2 files changed, 73 insertions(+), 17 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/vas.h
> > b/arch/powerpc/include/asm/vas.h
> > index 6baf7b9ffed4..83afcb6c194b 100644
> > --- a/arch/powerpc/include/asm/vas.h
> > +++ b/arch/powerpc/include/asm/vas.h
> > @@ -36,6 +36,8 @@
> > /* vas mmap() */
> >  /* Window is closed in the hypervisor due to lost credit */
> >  #define VAS_WIN_NO_CRED_CLOSE  0x0001
> > +/* Window is closed due to migration */
> > +#define VAS_WIN_MIGRATE_CLOSE  0x0002
> >  
> >  /*
> >   * Get/Set bit fields
> > diff --git a/arch/powerpc/platforms/pseries/vas.c
> > b/arch/powerpc/platforms/pseries/vas.c
> > index 3bb219f54806..fbcf311da0ec 100644
> > --- a/arch/powerpc/platforms/pseries/vas.c
> > +++ b/arch/powerpc/platforms/pseries/vas.c
> > @@ -457,11 +457,12 @@ static int vas_deallocate_window(struct
> > vas_window *vwin)
> > mutex_lock(&vas_pseries_mutex);
> > /*
> >  * VAS window is already closed in the hypervisor when
> > -* lost the credit. So just remove the entry from
> > -* the list, remove task references and free vas_window
> > +* lost the credit or with migration. So just remove the entry
> > +* from the list, remove task references and free vas_window
> >  * struct.
> >  */
> > -   if (win->vas_win.status & VAS_WIN_NO_CRED_CLOSE) {
> > +   if (!(win->vas_win.status & VAS_WIN_NO_CRED_CLOSE) &&
> > +   !(win->vas_win.status & VAS_WIN_MIGRATE_CLOSE)) {
> > rc = deallocate_free_window(win);
> > if (rc) {
> > mutex_unlock(&vas_pseries_mutex);
> > @@ -578,12 +579,14 @@ static int __init get_vas_capabilities(u8
> > feat, enum vas_cop_feat_type type,
> >   * by setting the remapping to new paste address if the window is
> >   * active.
> >   */
> > -static int reconfig_open_windows(struct vas_caps *vcaps, int
> >

[PATCH v2] usercopy: Check valid lifetime via stack depth

2022-02-24 Thread Kees Cook

Under CONFIG_HARDENED_USERCOPY=y, when exact stack frame boundary checking
is not available (i.e. everything except x86 with FRAME_POINTER), check
a stack object as being at least "current depth valid", in the sense
that any object within the stack region but not between start-of-stack
and current_stack_pointer should be considered unavailable (i.e. its
lifetime is from a call no longer present on the stack).

Introduce ARCH_HAS_CURRENT_STACK_POINTER to track which architectures
have actually implemented the common global register alias.

Additionally report usercopy bounds checking failures with an offset
from current_stack_pointer, which may assist with diagnosing failures.

The LKDTM USERCOPY_STACK_FRAME_TO and USERCOPY_STACK_FRAME_FROM tests
(once slightly adjusted in a separate patch) will pass again with
this fixed.
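
As a rough illustration of the "current depth valid" idea (a userspace
sketch under stated assumptions, not the code in this patch): for a
downward-growing stack, live frames sit between the current stack
pointer and the high end of the stack, so an object below the stack
pointer belongs to a call that has already returned. The names
stack_depth_ok(), DEPTH_OK and DEPTH_BAD are hypothetical, and the
addresses are passed in as plain integers instead of being read from
current_stack_pointer.

```c
#include <assert.h>
#include <stdint.h>

enum depth_result { DEPTH_OK = 0, DEPTH_BAD = -1 };

/*
 * Hypothetical helper for a downward-growing stack:
 *   [sp, stack_high)  holds the frames of calls still in progress,
 *   below sp          holds leftovers of calls that already returned.
 * The object [obj, obj + len) is accepted only if it lies entirely
 * inside the live part.
 */
static enum depth_result stack_depth_ok(uintptr_t obj, uintptr_t len,
					uintptr_t sp, uintptr_t stack_high)
{
	assert(sp <= stack_high);

	/* Starts below the stack pointer or above the stack: reject. */
	if (obj < sp || obj > stack_high)
		return DEPTH_BAD;

	/* Runs past the high end; compared without computing obj + len. */
	if (len > stack_high - obj)
		return DEPTH_BAD;

	return DEPTH_OK;
}
```

With sp = 0x1800 and stack_high = 0x2000, an object at 0x1900 of length
0x100 is accepted, while one at 0x1400 (below the stack pointer) is
rejected. An upward-growing stack would need the comparison mirrored.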

Cc: Matthew Wilcox (Oracle) 
Cc: Josh Poimboeuf 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Reported-by: Muhammad Usama Anjum 
Signed-off-by: Kees Cook 
---
v1: https://lore.kernel.org/all/20220216201449.2087956-1-keesc...@chromium.org/
v2: adjust for only some archs having current_stack_pointer
---
 arch/arm/Kconfig |  1 +
 arch/arm64/Kconfig   |  1 +
 arch/powerpc/Kconfig |  1 +
 arch/s390/Kconfig|  1 +
 arch/sh/Kconfig  |  1 +
 arch/x86/Kconfig |  1 +
 mm/usercopy.c| 41 ++---
 7 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 4c97cb40eebb..a7a09eef1852 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -5,6 +5,7 @@ config ARM
select ARCH_32BIT_OFF_T
select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE if HAVE_KRETPROBES && 
FRAME_POINTER && !ARM_UNWIND
select ARCH_HAS_BINFMT_FLAT
+   select ARCH_HAS_CURRENT_STACK_POINTER
select ARCH_HAS_DEBUG_VIRTUAL if MMU
select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE
select ARCH_HAS_ELF_RANDOMIZE
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index f2b5a4abef21..b8ab790555c8 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -18,6 +18,7 @@ config ARM64
select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
select ARCH_HAS_CACHE_LINE_SIZE
+   select ARCH_HAS_CURRENT_STACK_POINTER
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DMA_PREP_COHERENT
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index b779603978e1..7e7387bd7d53 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -108,6 +108,7 @@ config PPC
select ARCH_ENABLE_MEMORY_HOTPLUG
select ARCH_ENABLE_MEMORY_HOTREMOVE
select ARCH_HAS_COPY_MC if PPC64
+   select ARCH_HAS_CURRENT_STACK_POINTER
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DEBUG_WXif STRICT_KERNEL_RWX
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index be9f39fd06df..4845ab549dd1 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -60,6 +60,7 @@ config S390
select ARCH_ENABLE_MEMORY_HOTPLUG if SPARSEMEM
select ARCH_ENABLE_MEMORY_HOTREMOVE
select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
+   select ARCH_HAS_CURRENT_STACK_POINTER
select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DEBUG_WX
select ARCH_HAS_DEVMEM_IS_ALLOWED
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 2474a04ceac4..1c2b53bf3093 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -7,6 +7,7 @@ config SUPERH
select ARCH_HAVE_CUSTOM_GPIO_H
select ARCH_HAVE_NMI_SAFE_CMPXCHG if (GUSA_RB || CPU_SH4A)
select ARCH_HAS_BINFMT_FLAT if !MMU
+   select ARCH_HAS_CURRENT_STACK_POINTER
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_PTE_SPECIAL
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9f5bd41bf660..90494fba3620 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -69,6 +69,7 @@ config X86
select ARCH_ENABLE_THP_MIGRATION if X86_64 && TRANSPARENT_HUGEPAGE
select ARCH_HAS_ACPI_TABLE_UPGRADE  if ACPI
select ARCH_HAS_CACHE_LINE_SIZE
+   select ARCH_HAS_CURRENT_STACK_POINTER
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEBUG_VM_PGTABLEif !X86_PAE
select ARCH_HAS_DEVMEM_IS_ALLOWED
diff --git a/mm/usercopy.c b/mm/usercopy.c
index d0d268135d96..5d28725af95f 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -22,6 +22,30 @@
 #include 
 #include "slab.h"
 
+/*
+ * Only called if obj is within stack/stackend bounds. Determine if within
+ * current stack depth.
+ */
+static inline int check_stack_object_depth(const void *obj,
+  unsigned long len)
+{
+#ifdef CONFIG_ARCH_HAS_CURRENT_STACK_POINTER
+#ifndef CONFIG_STACK_GROWSUP
+   const void * const

Re: [PATCH v2 18/18] uaccess: drop maining CONFIG_SET_FS users

2022-02-24 Thread Stafford Horne

On Wed, Feb 16, 2022 at 02:13:32PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> There are no remaining callers of set_fs(), so CONFIG_SET_FS
> can be removed globally, along with the thread_info field and
> any references to it.
> 
> This turns access_ok() into a cheaper check against TASK_SIZE_MAX.
> 
> With CONFIG_SET_FS gone, so drop all remaining references to
> set_fs()/get_fs(), mm_segment_t and uaccess_kernel().
> 
> Signed-off-by: Arnd Bergmann 
> ---
...
>  arch/openrisc/Kconfig |  1 -
>  arch/openrisc/include/asm/thread_info.h   |  7 ---
>  arch/openrisc/include/asm/uaccess.h   | 23 
...
>  fs/exec.c |  6 --
>  include/asm-generic/access_ok.h   | 10 +---
>  include/asm-generic/uaccess.h | 25 +---
>  include/linux/syscalls.h  |  4 --
>  include/linux/uaccess.h   | 33 ---
>  include/rdma/ib.h |  2 +-
>  kernel/events/callchain.c |  4 --
>  kernel/events/core.c  |  3 -
>  kernel/exit.c | 14 -
>  kernel/kthread.c  |  5 --
>  kernel/stacktrace.c   |  3 -
>  kernel/trace/bpf_trace.c  |  4 --
>  mm/maccess.c  | 11 
>  mm/memory.c   |  8 ---
>  net/bpfilter/bpfilter_kern.c  |  2 +-
>  72 files changed, 10 insertions(+), 522 deletions(-)
>  delete mode 100644 arch/arc/include/asm/segment.h
>  delete mode 100644 arch/csky/include/asm/segment.h
>  delete mode 100644 arch/h8300/include/asm/segment.h
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index fa5db36bda67..99349547afed 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -24,9 +24,6 @@ config KEXEC_ELF
>  config HAVE_IMA_KEXEC
>   bool
>  
> -config SET_FS
> - bool
> -
>  config HOTPLUG_SMT
>   bool
>  
...
> diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig
> index f724b3f1aeed..0d68adf6e02b 100644
> --- a/arch/openrisc/Kconfig
> +++ b/arch/openrisc/Kconfig
> @@ -36,7 +36,6 @@ config OPENRISC
>   select ARCH_WANT_FRAME_POINTERS
>   select GENERIC_IRQ_MULTI_HANDLER
>   select MMU_GATHER_NO_RANGE if MMU
> - select SET_FS
>   select TRACE_IRQFLAGS_SUPPORT
>  
>  config CPU_BIG_ENDIAN
> diff --git a/arch/openrisc/include/asm/thread_info.h 
> b/arch/openrisc/include/asm/thread_info.h
> index 659834ab87fa..4af3049c34c2 100644
> --- a/arch/openrisc/include/asm/thread_info.h
> +++ b/arch/openrisc/include/asm/thread_info.h
> @@ -40,18 +40,12 @@
>   */
>  #ifndef __ASSEMBLY__
>  
> -typedef unsigned long mm_segment_t;
> -
>  struct thread_info {
>   struct task_struct  *task;  /* main task structure */
>   unsigned long   flags;  /* low level flags */
>   __u32   cpu;/* current CPU */
>   __s32   preempt_count; /* 0 => preemptable, <0 => BUG */
>  
> - mm_segment_taddr_limit; /* thread address space:
> -0-0x7FFF for user-thead
> -0-0x for kernel-thread
> -  */
>   __u8supervisor_stack[0];
>  
>   /* saved context data */
> @@ -71,7 +65,6 @@ struct thread_info {
>   .flags  = 0,\
>   .cpu= 0,\
>   .preempt_count  = INIT_PREEMPT_COUNT,   \
> - .addr_limit = KERNEL_DS,\
>   .ksp= 0,\
>  }
>  
> diff --git a/arch/openrisc/include/asm/uaccess.h 
> b/arch/openrisc/include/asm/uaccess.h
> index 8f049ec99b3e..d6500a374e18 100644
> --- a/arch/openrisc/include/asm/uaccess.h
> +++ b/arch/openrisc/include/asm/uaccess.h
> @@ -22,29 +22,6 @@
>  #include 
>  #include 
>  #include 
> -
> -/*
> - * The fs value determines whether argument validity checking should be
> - * performed or not.  If get_fs() == USER_DS, checking is performed, with
> - * get_fs() == KERNEL_DS, checking is bypassed.
> - *
> - * For historical reasons, these macros are grossly misnamed.
> - */
> -
> -/* addr_limit is the maximum accessible address for the task. we misuse
> - * the KERNEL_DS and USER_DS values to both assign and compare the
> - * addr_limit values through the equally misnamed get/set_fs macros.
> - * (see above)
> - */
> -
> -#define KERNEL_DS(~0UL)
> -
> -#define USER_DS  (TASK_SIZE)
> -#define get_fs() (current_thread_info()->addr_limit)
> -#define set_fs(x)(current_thread_info()->addr_limit = (x))
> -
> -#define uaccess_kernel() (get_fs() == KERNEL_DS)
> -
>  #include 
>  
>  /*
...
> diff --git a/fs/exec.c b/fs/exec.c
> index 79f2c9483302..bc68a0c089ac 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1303,12 +1303,6 @@ int

Re: [PATCH V5 17/21] riscv: compat: vdso: Add setup additional pages implementation

2022-02-24 Thread Guo Ren

On Wed, Feb 23, 2022 at 9:42 AM Palmer Dabbelt  wrote:
>
> On Tue, 01 Feb 2022 07:05:41 PST (-0800), guo...@kernel.org wrote:
> > From: Guo Ren 
> >
> > Reconstruct __setup_additional_pages() by appending vdso info
> > pointer argument to meet compat_vdso_info requirement. And change
> > vm_special_mapping *dm, *cm initialization into static.
> >
> > Signed-off-by: Guo Ren 
> > Signed-off-by: Guo Ren 
> > Cc: Arnd Bergmann 
> > Cc: Palmer Dabbelt 
> > ---
> >  arch/riscv/include/asm/elf.h |   5 ++
> >  arch/riscv/include/asm/mmu.h |   1 +
> >  arch/riscv/kernel/vdso.c | 104 +--
> >  3 files changed, 81 insertions(+), 29 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
> > index 3a4293dc7229..d87d3bcc758d 100644
> > --- a/arch/riscv/include/asm/elf.h
> > +++ b/arch/riscv/include/asm/elf.h
> > @@ -134,5 +134,10 @@ do {if ((ex).e_ident[EI_CLASS] == ELFCLASS32)  
> >   \
> >  typedef compat_ulong_t   compat_elf_greg_t;
> >  typedef compat_elf_greg_tcompat_elf_gregset_t[ELF_NGREG];
> >
> > +extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
> > +   int uses_interp);
> > +#define compat_arch_setup_additional_pages \
> > + compat_arch_setup_additional_pages
> > +
> >  #endif /* CONFIG_COMPAT */
> >  #endif /* _ASM_RISCV_ELF_H */
> > diff --git a/arch/riscv/include/asm/mmu.h b/arch/riscv/include/asm/mmu.h
> > index 0099dc116168..cedcf8ea3c76 100644
> > --- a/arch/riscv/include/asm/mmu.h
> > +++ b/arch/riscv/include/asm/mmu.h
> > @@ -16,6 +16,7 @@ typedef struct {
> >   atomic_long_t id;
> >  #endif
> >   void *vdso;
> > + void *vdso_info;
> >  #ifdef CONFIG_SMP
> >   /* A local icache flush is needed before user execution can resume. */
> >   cpumask_t icache_stale_mask;
> > diff --git a/arch/riscv/kernel/vdso.c b/arch/riscv/kernel/vdso.c
> > index a9436a65161a..deca69524799 100644
> > --- a/arch/riscv/kernel/vdso.c
> > +++ b/arch/riscv/kernel/vdso.c
> > @@ -23,6 +23,9 @@ struct vdso_data {
> >  #endif
> >
> >  extern char vdso_start[], vdso_end[];
> > +#ifdef CONFIG_COMPAT
> > +extern char compat_vdso_start[], compat_vdso_end[];
> > +#endif
> >
> >  enum vvar_pages {
> >   VVAR_DATA_PAGE_OFFSET,
> > @@ -30,6 +33,11 @@ enum vvar_pages {
> >   VVAR_NR_PAGES,
> >  };
> >
> > +enum rv_vdso_map {
> > + RV_VDSO_MAP_VVAR,
> > + RV_VDSO_MAP_VDSO,
> > +};
> > +
> >  #define VVAR_SIZE  (VVAR_NR_PAGES << PAGE_SHIFT)
> >
> >  /*
> > @@ -52,12 +60,6 @@ struct __vdso_info {
> >   struct vm_special_mapping *cm;
> >  };
> >
> > -static struct __vdso_info vdso_info __ro_after_init = {
> > - .name = "vdso",
> > - .vdso_code_start = vdso_start,
> > - .vdso_code_end = vdso_end,
> > -};
> > -
> >  static int vdso_mremap(const struct vm_special_mapping *sm,
> >  struct vm_area_struct *new_vma)
> >  {
> > @@ -66,35 +68,35 @@ static int vdso_mremap(const struct vm_special_mapping 
> > *sm,
> >   return 0;
> >  }
> >
> > -static int __init __vdso_init(void)
> > +static int __init __vdso_init(struct __vdso_info *vdso_info)
> >  {
> >   unsigned int i;
> >   struct page **vdso_pagelist;
> >   unsigned long pfn;
> >
> > - if (memcmp(vdso_info.vdso_code_start, "\177ELF", 4)) {
> > + if (memcmp(vdso_info->vdso_code_start, "\177ELF", 4)) {
> >   pr_err("vDSO is not a valid ELF object!\n");
> >   return -EINVAL;
> >   }
> >
> > - vdso_info.vdso_pages = (
> > - vdso_info.vdso_code_end -
> > - vdso_info.vdso_code_start) >>
> > + vdso_info->vdso_pages = (
> > + vdso_info->vdso_code_end -
> > + vdso_info->vdso_code_start) >>
> >   PAGE_SHIFT;
> >
> > - vdso_pagelist = kcalloc(vdso_info.vdso_pages,
> > + vdso_pagelist = kcalloc(vdso_info->vdso_pages,
> >   sizeof(struct page *),
> >   GFP_KERNEL);
> >   if (vdso_pagelist == NULL)
> >   return -ENOMEM;
> >
> >   /* Grab the vDSO code pages. */
> > - pfn = sym_to_pfn(vdso_info.vdso_code_start);
> > + pfn = sym_to_pfn(vdso_info->vdso_code_start);
> >
> > - for (i = 0; i < vdso_info.vdso_pages; i++)
> > + for (i = 0; i < vdso_info->vdso_pages; i++)
> >   vdso_pagelist[i] = pfn_to_page(pfn + i);
> >
> > - vdso_info.cm->pages = vdso_pagelist;
> > + vdso_info->cm->pages = vdso_pagelist;
> >
> >   return 0;
> >  }
> > @@ -116,13 +118,14 @@ int vdso_join_timens(struct task_struct *task, struct 
> > time_namespace *ns)
> >  {
> >   struct mm_struct *mm = task->mm;
> >   struct vm_area_struct *vma;
> > + struct __vdso_info *vdso_info = mm->context.vdso_info;
>
> IIUC this is the only use for context.vdso_info?  If that's the case,
> can we just switch between VDSO targets

Re: [PATCH 1/3] powerpc: lib: sstep: fix 'sthcx' instruction

2022-02-24 Thread Nicholas Piggin

Excerpts from Anders Roxell's message of February 23, 2022 11:58 pm:
> Looks like there been a copy paste mistake when added the instruction
> 'stbcx' twice and one was probably meant to be 'sthcx'.
> Changing to 'sthcx' from 'stbcx'.
> 
> Cc:  # v4.13+
> Fixes: 350779a29f11 ("powerpc: Handle most loads and stores in instruction 
> emulation code")
> Reported-by: Arnd Bergmann 
> Signed-off-by: Anders Roxell 

Good catch.

Reviewed-by: Nicholas Piggin 
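
For readers skimming the thread, the intended pairing of access size
and store-conditional instruction can be sketched as below. This is
only an illustration: stcx_mnemonic() and stcx_selftest() are
hypothetical helpers that do not exist in sstep.c, which passes the
mnemonic string directly to __put_user_asmx() as shown in the quoted
hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not in sstep.c): pick the store-conditional
 * mnemonic for a given access size in bytes.
 */
static const char *stcx_mnemonic(size_t size)
{
	switch (size) {
	case 1:
		return "stbcx.";	/* byte */
	case 2:
		return "sthcx.";	/* halfword: the case the patch fixes */
	case 4:
		return "stwcx.";	/* word */
	case 8:
		return "stdcx.";	/* doubleword */
	default:
		return NULL;		/* size not covered by this sketch */
	}
}

/* Each size must map to its own instruction. */
static void stcx_selftest(void)
{
	assert(strcmp(stcx_mnemonic(1), stcx_mnemonic(2)) != 0);
	assert(strcmp(stcx_mnemonic(2), "sthcx.") == 0);
}
```

Before the fix, the size 2 case reused the byte instruction, which is
the kind of mistake the first assertion in stcx_selftest() would flag.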

> ---
>  arch/powerpc/lib/sstep.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index bd3734d5be89..d2d29243fa6d 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -3389,7 +3389,7 @@ int emulate_loadstore(struct pt_regs *regs, struct 
> instruction_op *op)
>   __put_user_asmx(op->val, ea, err, "stbcx.", cr);
>   break;
>   case 2:
> - __put_user_asmx(op->val, ea, err, "stbcx.", cr);
> + __put_user_asmx(op->val, ea, err, "sthcx.", cr);
>   break;
>  #endif
>   case 4:
> -- 
> 2.34.1
> 
>

Re: [PATCH v2 13/18] uaccess: generalize access_ok()

2022-02-24 Thread Arnd Bergmann

On Thu, Feb 24, 2022 at 9:29 AM Stafford Horne  wrote:

> > -
> > -#define access_ok(addr, size)  
> >   \
> > -({   \
> > - __chk_user_ptr(addr);   \
> > - __range_ok((unsigned long)(addr), (size));  \
> > -})
> > +#include 
>
> I was going to ask why we are missing __chk_user_ptr in the generic version.
> But this is basically now a no-op so I think its OK.

Correct, the type checking is implied by making __access_ok() an inline
function that takes a __user pointer.
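
A minimal userspace sketch of that point, assuming a made-up limit in
place of TASK_SIZE_MAX: because the check is an inline function with a
pointer parameter, the compiler itself diagnoses a caller that passes a
plain integer, which a macro casting its argument to unsigned long
would silently accept. demo_access_ok(), demo_access_ok_selftest() and
DEMO_TASK_SIZE_MAX are hypothetical names; the __user annotation
(checked by sparse in the kernel) is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit standing in for TASK_SIZE_MAX. */
#define DEMO_TASK_SIZE_MAX	((uintptr_t)0x80000000u)

/*
 * The pointer-typed parameter is what provides the type checking:
 * demo_access_ok(0x1000, 16) draws a compiler diagnostic, while a
 * macro that casts its argument would not.
 */
static inline bool demo_access_ok(const void *ptr, uintptr_t size)
{
	uintptr_t limit = DEMO_TASK_SIZE_MAX;
	uintptr_t addr = (uintptr_t)ptr;

	/* Written so that addr + size is never computed and cannot wrap. */
	return size <= limit && addr <= limit - size;
}

static void demo_access_ok_selftest(void)
{
	/* Well inside the range. */
	assert(demo_access_ok((const void *)(uintptr_t)0x1000, 0x100));
	/* Ends past the limit. */
	assert(!demo_access_ok((const void *)(uintptr_t)0x7fffff00u, 0x200));
	/* Size alone exceeds the limit. */
	assert(!demo_access_ok((const void *)(uintptr_t)1, UINTPTR_MAX));
}
```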

> Acked-by: Stafford Horne  [openrisc, asm-generic]

Thanks!

   Arnd

Re: [PATCH] powerpc/32: Clear volatile regs on syscall exit

2022-02-24 Thread Christophe Leroy



Le 23/02/2022 à 20:34, Kees Cook a écrit :
> On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
>> Commit a82adfd5c7cb ("hardening: Introduce CONFIG_ZERO_CALL_USED_REGS")
>> added zeroing of used registers at function exit.
>>
>> At the time being, PPC64 clears volatile registers on syscall exit but
>> PPC32 doesn't do it for performance reason.
>>
>> Add that clearing in PPC32 syscall exit as well, but only when
>> CONFIG_ZERO_CALL_USED_REGS is selected.
>>
>> On an 8xx, the null_syscall selftest gives:
>> - Without CONFIG_ZERO_CALL_USED_REGS : 288 cycles
>> - With CONFIG_ZERO_CALL_USED_REGS: 305 cycles
>> - With CONFIG_ZERO_CALL_USED_REGS + this patch   : 319 cycles
>>
>> Note that (independent of this patch), with pmac32_defconfig,
>> vmlinux size is as follows with/without CONFIG_ZERO_CALL_USED_REGS:
>>
>>     text    data    bss      dec    hex filename
>>  9578869 2525210 194400 12298479 bba8ef vmlinux.without
>> 10318045 2525210 194400 13037655 c6f057 vmlinux.with
>>
>> That is a 7.7% increase on text size, 6.0% on overall size.
>>
>> Signed-off-by: Christophe Leroy 
>> ---
>>   arch/powerpc/kernel/entry_32.S | 15 +++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
>> index 7748c278d13c..199f23092c02 100644
>> --- a/arch/powerpc/kernel/entry_32.S
>> +++ b/arch/powerpc/kernel/entry_32.S
>> @@ -151,6 +151,21 @@ syscall_exit_finish:
>>  bne 3f
>>  mtcr    r5
>>   
>> +#ifdef CONFIG_ZERO_CALL_USED_REGS
>> +/* Zero volatile regs that may contain sensitive kernel data */
>> +li  r0,0
>> +li  r4,0
>> +li  r5,0
>> +li  r6,0
>> +li  r7,0
>> +li  r8,0
>> +li  r9,0
>> +li  r10,0
>> +li  r11,0
>> +li  r12,0
>> +mtctr   r0
>> +mtxer   r0
>> +#endif
> 
> I think this should probably be unconditional -- if this is actually
> leaking kernel pointers (or data) that's pretty bad. :|
> 
> If you really want to leave it build-time selectable, maybe add a new
> config that gets "select"ed by CONFIG_ZERO_CALL_USED_REGS?

You mean a CONFIG that is selected by CONFIG_ZERO_CALL_USED_REGS and may
also be selected by the user when CONFIG_ZERO_CALL_USED_REGS is not
selected?

At exit:
- the content of r4 is loaded into LR
- the content of r5 is loaded into CR
- the content of r7 is where we branch to after switching back to user mode
- the content of r8 is loaded into MSR. Although MSR can't be read by
the user, there is nothing secret in it.
- XER contains arithmetic flags, nothing really sensitive.

So that leaves r0, r6, r9 to r12 and ctr.

Maybe a compromise could be to clear only those when
CONFIG_ZERO_CALL_USED_REGS is not selected?

> 
> (And you may want to consider wiping all "unused" registers at syscall
> entry as well.)

How "unused" ?

At syscall entry we have the syscall NR in r0 and the syscall args in
r3 to r8.
The handler uses r9, r10, r11 and r12 prior to re-enabling the MMU and
taking any conditional branch.
r1 and r2 are also set and used early (r1 is the stack ptr, r2 is the
ptr to the current task struct) and restored from the stack at the end.
r13-r31 are callee saved/restored.
Christophe

Re: [PATCH v6 0/4] Add perf interface to expose nvdimm

2022-02-24 Thread kajoljain

On 2/24/22 02:47, Dan Williams wrote:
> On Wed, Feb 23, 2022 at 11:07 AM Dan Williams wrote:
>>
>> On Fri, Feb 18, 2022 at 10:06 AM Dan Williams wrote:
>>>
>>> On Thu, Feb 17, 2022 at 8:34 AM Kajol Jain wrote:

 Patchset adds performance stats reporting support for nvdimm.
 Added interface includes support for pmu register/unregister
 functions. A structure is added called nvdimm_pmu to be used for
 adding arch/platform specific data such as cpumask, nvdimm device
 pointer and pmu event functions like event_init/add/read/del.
 User could use the standard perf tool to access perf events
 exposed via pmu.

 Interface also defines supported event list, config fields for the
 event attributes and their corresponding bit values which are exported
 via sysfs. Patch 3 exposes IBM pseries platform nmem* device
 performance stats using this interface.

 Result from power9 pseries lpar with 2 nvdimm device:

 Ex: List all event by perf list

 command:# perf list nmem

   nmem0/cache_rh_cnt/     [Kernel PMU event]
   nmem0/cache_wh_cnt/     [Kernel PMU event]
   nmem0/cri_res_util/     [Kernel PMU event]
   nmem0/ctl_res_cnt/      [Kernel PMU event]
   nmem0/ctl_res_tm/       [Kernel PMU event]
   nmem0/fast_w_cnt/       [Kernel PMU event]
   nmem0/host_l_cnt/       [Kernel PMU event]
   nmem0/host_l_dur/       [Kernel PMU event]
   nmem0/host_s_cnt/       [Kernel PMU event]
   nmem0/host_s_dur/       [Kernel PMU event]
   nmem0/med_r_cnt/        [Kernel PMU event]
   nmem0/med_r_dur/        [Kernel PMU event]
   nmem0/med_w_cnt/        [Kernel PMU event]
   nmem0/med_w_dur/        [Kernel PMU event]
   nmem0/mem_life/         [Kernel PMU event]
   nmem0/poweron_secs/     [Kernel PMU event]
   ...
   nmem1/mem_life/         [Kernel PMU event]
   nmem1/poweron_secs/     [Kernel PMU event]

 Patch1:
 Introduces the nvdimm_pmu structure
 Patch2:
 Adds common interface to add arch/platform specific data
 includes nvdimm device pointer, pmu data along with
 pmu event functions. It also defines supported event list
 and adds attribute groups for format, events and cpumask.
 It also adds code for cpu hotplug support.
 Patch3:
 Add code in arch/powerpc/platform/pseries/papr_scm.c to expose
 nmem* pmu. It fills in the nvdimm_pmu structure with pmu name,
 capabilities, cpumask and event functions and then registers
 the pmu by adding callbacks to register_nvdimm_pmu.
 Patch4:
 Sysfs documentation patch

 Changelog
 ---
 Resend v5 -> v6
 - No logic change, just a rebase to latest upstream and
   tested the patchset.

 - Link to the patchset Resend v5: https://lkml.org/lkml/2021/11/15/3979

 v5 -> Resend v5
 - Resend the patchset

 - Link to the patchset v5: https://lkml.org/lkml/2021/9/28/643

 v4 -> v5:
 - Remove multiple variables defined in nvdimm_pmu structure include
   name and pmu functions(event_init/add/del/read) as they are just
   used to copy them again in pmu variable. Now we are directly doing
   this step in arch specific code as suggested by Dan Williams.

 - Remove attribute group field from nvdimm pmu structure and
   defined these attribute groups in common interface which
   includes format, event list along with cpumask as suggested by
   Dan Williams.
   Since we added static definitions for the attribute groups needed in
   the common interface, remove the corresponding code from papr.

 - Add nvdimm pmu event list with event codes in the common interface.

 - Remove Acked-by/Reviewed-by/Tested-by tags as code is refactored
   to handle review comments from Dan.
>>>
>>> I don't think review comments should invalidate the Acked-by tags in
>>> this case. Nothing fundamentally changed in the approach, and I would
>>> like to have the perf ack before taking this through the nvdimm tree.
>>>
>>> Otherwise this looks good to me.
>>>
>>> Peter, might you have a chance to re-Ack this series, or any concerns
>>> about me retrieving those Acks from the previous postings?
>>
>> Reached Peter offline and he refreshed his Acked-by.
> 
> There's still time for the tags from:
> 
> "Madhavan Srinivasan"
> "Nageswara R

Re: [PATCH] powerpc/module_64: fix array_size.cocci warning

2022-02-24 Thread Russell Currey

On Wed, 2022-02-23 at 15:54 +0800, Guo Zhengkui wrote:
> Fix following coccicheck warning:
> ./arch/powerpc/kernel/module_64.c:432:40-41: WARNING: Use ARRAY_SIZE.
> 
> ARRAY_SIZE(arr) is a macro provided by the kernel. It makes sure that
> arr is an array, so it's safer than sizeof(arr) / sizeof(arr[0]) and
> more standard.
> 
> Signed-off-by: Guo Zhengkui 

Reviewed-by: Russell Currey
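
For context, the "makes sure that arr is an array" part can be sketched
in userspace as follows. This is an approximation that relies on GNU C
extensions (__typeof__ and __builtin_types_compatible_p, available in
GCC and Clang), not the kernel's actual definition; DEMO_MUST_BE_ARRAY,
DEMO_ARRAY_SIZE, demo_table and the helper functions are hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Evaluates to 0 when 'a' is a real array. When 'a' is a pointer, its
 * type matches the type of &a[0], the char array below gets a negative
 * size, and compilation fails.
 */
#define DEMO_MUST_BE_ARRAY(a) \
	(sizeof(char[1 - 2 * !!__builtin_types_compatible_p( \
		__typeof__(a), __typeof__(&(a)[0]))]) - 1)

/* Element count of an array, refusing to compile for pointers. */
#define DEMO_ARRAY_SIZE(a) \
	(sizeof(a) / sizeof((a)[0]) + DEMO_MUST_BE_ARRAY(a))

static const int demo_table[] = { 3, 1, 4, 1, 5 };

static size_t demo_table_len(void)
{
	return DEMO_ARRAY_SIZE(demo_table);
}

static void demo_array_size_selftest(void)
{
	char buf[12];

	assert(DEMO_ARRAY_SIZE(buf) == 12);
	assert(demo_table_len() == 5);
}
```

By contrast, plain sizeof(p) / sizeof(p[0]) with a pointer p compiles
without complaint and yields a meaningless value, which is the hazard
the coccicheck warning is about.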

Re: [PATCH] powerpc/32: Clear volatile regs on syscall exit

2022-02-24 Thread Gabriel Paubert

On Wed, Feb 23, 2022 at 05:27:39PM -0600, Segher Boessenkool wrote:
> On Wed, Feb 23, 2022 at 09:48:09PM +0100, Gabriel Paubert wrote:
> > On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
> > > + /* Zero volatile regs that may contain sensitive kernel data */
> > > + li  r0,0
> > > + li  r4,0
> > > + li  r5,0
> > > + li  r6,0
> > > + li  r7,0
> > > + li  r8,0
> > > + li  r9,0
> > > + li  r10,0
> > > + li  r11,0
> > > + li  r12,0
> > > + mtctr   r0
> > > + mtxer   r0
> > 
> > Here, I'm almost sure that on some processors, it would be better to
> > separate mtctr from mtxer. mtxer is typically very expensive (pipeline
> > flush) but I don't know what's the best ordering for the average core.
> 
> mtxer is cheaper than mtctr on many cores :-)

We're speaking of 32 bit here I believe; on my (admittedly old) paper
copy of PowerPC 604 user's manual, I read in a footnote:

"The mtspr (XER) instruction causes instructions to be flushed when it
executes." 

Also a paragraph about "PostDispatch Serialization Mode" which reads:
"All instructions following the postdispatch serialization instruction
are flushed, refetched, and reexecuted."

Then it goes on to list the affected instructions, which starts with:
mtspr(xer), mcrxr, isync, ...

I know there are probably very few 604 left in the field, but in this
case mtspr(xer) looks very much like a superset of isync.

I also just had a look at the documentation of a more widespread core:

https://www.nxp.com/docs/en/reference-manual/MPC7450UM.pdf

and mtspr(xer) is marked as execution and refetch serialized, actually
it is the only instruction to have both.

Maybe there is a subtle difference between "refetch serialization" and
"pipeline flush", but in this case please educate me.

Besides that, the back-to-back mtctr/mtspr(xer) may limit instruction
decoding and issuing bandwidth.  I'd rather move one of them up by a few
lines since they can only go to one of the execution units on some
(or even most?) cores. This was my main point initially.

Gabriel

> 
> On p9 mtxer is cracked into two latency 3 ops (which run in parallel).
> While mtctr has latency 5.
> 
> On p8 mtxer was horrible indeed (but nothing near as bad as a pipeline
> flush).
> 
> 
> Segher

Re: [PATCH v2 13/18] uaccess: generalize access_ok()

2022-02-24 Thread Stafford Horne

On Wed, Feb 16, 2022 at 02:13:27PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> There are many different ways that access_ok() is defined across
> architectures, but in the end, they all just compare against the
> user_addr_max() value or they accept anything.
> 
> Provide one definition that works for most architectures, checking
> against TASK_SIZE_MAX for user processes or skipping the check inside
> of uaccess_kernel() sections.
> 
> For architectures without CONFIG_SET_FS(), this should be the fastest
> check, as it comes down to a single comparison of a pointer against a
> compile-time constant, while the architecture specific versions tend to
> do something more complex for historic reasons or get something wrong.
> 
> Type checking for __user annotations is handled inconsistently across
> architectures, but this is easily simplified as well by using an inline
> function that takes a 'const void __user *' argument. A handful of
> callers need an extra __user annotation for this.
> 
> Some architectures had tricks to use 33-bit or 65-bit arithmetic on the
> addresses to calculate the overflow, however this simpler version uses
> fewer registers, which means it can produce better object code in the
> end despite needing a second (statically predicted) branch.
> 
> Reviewed-by: Christoph Hellwig 
> Acked-by: Mark Rutland  [arm64, asm-generic]
> Signed-off-by: Arnd Bergmann 
> ---
...
>  arch/openrisc/include/asm/uaccess.h   | 19 +
...
>  include/asm-generic/access_ok.h   | 59 +++
>  include/asm-generic/uaccess.h | 21 +-
>  include/linux/uaccess.h   |  7 
>  32 files changed, 109 insertions(+), 366 deletions(-)
> 
...
> diff --git a/arch/openrisc/include/asm/uaccess.h b/arch/openrisc/include/asm/uaccess.h
> index 120f5005461b..8f049ec99b3e 100644
> --- a/arch/openrisc/include/asm/uaccess.h
> +++ b/arch/openrisc/include/asm/uaccess.h
> @@ -45,21 +45,7 @@
>  
>  #define uaccess_kernel() (get_fs() == KERNEL_DS)
>  
> -/* Ensure that the range from addr to addr+size is all within the process'
> - * address space
> - */
> -static inline int __range_ok(unsigned long addr, unsigned long size)
> -{
> - const mm_segment_t fs = get_fs();
> -
> - return size <= fs && addr <= (fs - size);
> -}
> -
> -#define access_ok(addr, size)					\
> -({   \
> - __chk_user_ptr(addr);   \
> - __range_ok((unsigned long)(addr), (size));  \
> -})
> +#include 

I was going to ask why we are missing __chk_user_ptr in the generic version.
But this is basically now a no-op so I think it's OK.

>  /*
>   * These are the main single-value transfer routines.  They automatically
> @@ -268,9 +254,6 @@ clear_user(void __user *addr, unsigned long size)
>   return size;
>  }
>  
> -#define user_addr_max() \
> - (uaccess_kernel() ? ~0UL : TASK_SIZE)
> -
>  extern long strncpy_from_user(char *dest, const char __user *src, long count);
>  
>  extern __must_check long strnlen_user(const char __user *str, long n);

...
> diff --git a/include/asm-generic/access_ok.h b/include/asm-generic/access_ok.h
> new file mode 100644
> index ..1aad8964d2ed
> --- /dev/null
> +++ b/include/asm-generic/access_ok.h
> @@ -0,0 +1,59 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ASM_GENERIC_ACCESS_OK_H__
> +#define __ASM_GENERIC_ACCESS_OK_H__
> +
> +/*
> + * Checking whether a pointer is valid for user space access.
> + * These definitions work on most architectures, but overrides can
> + * be used where necessary.
> + */
> +
> +/*
> + * architectures with compat tasks have a variable TASK_SIZE and should
> + * override this to a constant.
> + */
> +#ifndef TASK_SIZE_MAX
> +#define TASK_SIZE_MAX	TASK_SIZE
> +#endif
> +
> +#ifndef uaccess_kernel
> +#ifdef CONFIG_SET_FS
> +#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
> +#else
> +#define uaccess_kernel() (0)
> +#endif
> +#endif
> +
> +#ifndef user_addr_max
> +#define user_addr_max()	(uaccess_kernel() ? ~0UL : TASK_SIZE_MAX)
> +#endif
> +
> +#ifndef __access_ok
> +/*
> + * 'size' is a compile-time constant for most callers, so optimize for
> + * this case to turn the check into a single comparison against a constant
> + * limit and catch all possible overflows.
> + * On architectures with separate user address space (m68k, s390, parisc,
> + * sparc64) or those without an MMU, this should always return true.
> + *
> + * This version was originally contributed by Jonas Bonn for the
> + * OpenRISC architecture, and was found to be the most efficient
> + * for constant 'size' and 'limit' values.
> + */
> +static inline int __access_ok(const void __user *ptr, unsigned long size)
> +{
> + unsigned long limit = user_addr_max();
> +

Re: [PATCH v2 07/18] nios2: drop access_ok() check from __put_user()

2022-02-24 Thread Arnd Bergmann

On Thu, Feb 24, 2022 at 12:30 AM Dinh Nguyen  wrote:
> On 2/16/22 07:13, Arnd Bergmann wrote:
> > From: Arnd Bergmann 
> >
> > Unlike other architectures, the nios2 version of __put_user() has an
> > extra check for access_ok(), preventing it from being used to implement
> > __put_kernel_nofault().
> >
> > Split up put_user() along the same lines as __get_user()/get_user()
> >
> > Signed-off-by: Arnd Bergmann 
>
> Acked-by: Dinh Nguyen 

Thanks! Could you also have a look at patch 2 (uaccess: fix nios2 and
microblaze get_user_8)? That one is actually more critical, and should
be backported to stable kernels.

   Arnd

Re: [PATCH] powerpc/32: Clear volatile regs on syscall exit

2022-02-24 Thread Christophe Leroy



On 23/02/2022 at 21:48, Gabriel Paubert wrote:
> On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
>> Commit a82adfd5c7cb ("hardening: Introduce CONFIG_ZERO_CALL_USED_REGS")
>> added zeroing of used registers at function exit.
>>
>> For the time being, PPC64 clears volatile registers on syscall exit but
>> PPC32 doesn't do it for performance reasons.
>>
>> Add that clearing in PPC32 syscall exit as well, but only when
>> CONFIG_ZERO_CALL_USED_REGS is selected.
>>
>> On an 8xx, the null_syscall selftest gives:
>> - Without CONFIG_ZERO_CALL_USED_REGS : 288 cycles
>> - With CONFIG_ZERO_CALL_USED_REGS: 305 cycles
>> - With CONFIG_ZERO_CALL_USED_REGS + this patch   : 319 cycles
>>
>> Note that (independent of this patch), with pmac32_defconfig,
>> vmlinux size is as follows with/without CONFIG_ZERO_CALL_USED_REGS:
>>
>>     text     data     bss       dec     hex  filename
>>  9578869  2525210  194400  12298479  bba8ef  vmlinux.without
>> 10318045  2525210  194400  13037655  c6f057  vmlinux.with
>>
>> That is a 7.7% increase on text size, 6.0% on overall size.
>>
>> Signed-off-by: Christophe Leroy 
>> ---
>>   arch/powerpc/kernel/entry_32.S | 15 +++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
>> index 7748c278d13c..199f23092c02 100644
>> --- a/arch/powerpc/kernel/entry_32.S
>> +++ b/arch/powerpc/kernel/entry_32.S
>> @@ -151,6 +151,21 @@ syscall_exit_finish:
>>  bne 3f
>>  mtcr    r5
>>   
>> +#ifdef CONFIG_ZERO_CALL_USED_REGS
>> +/* Zero volatile regs that may contain sensitive kernel data */
>> +li  r0,0
>> +li  r4,0
>> +li  r5,0
>> +li  r6,0
>> +li  r7,0
>> +li  r8,0
>> +li  r9,0
>> +li  r10,0
>> +li  r11,0
>> +li  r12,0
>> +mtctr   r0
>> +mtxer   r0
> 
> Here, I'm almost sure that on some processors, it would be better to
> separate mtctr from mtxer. mtxer is typically very expensive (pipeline
> flush) but I don't know what's the best ordering for the average core.

In the 8xx, CTR and LR are handled by the BPU like any other register
(latency 1, blockage 1).
AFAIU, XER is serialize + 1.

> 
> And what about lr? Should it also be cleared?

LR is restored from stack.

Christophe
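
As a side note, the size figures quoted from the commit message can be cross-checked with a few lines of C. This is only an arithmetic sketch; pct_increase is a hypothetical helper name, and the numbers are copied from the vmlinux.without and vmlinux.with rows above.

```c
/*
 * Percentage growth from 'before' to 'after'.
 * Hypothetical helper, used only to re-derive the quoted figures.
 */
static double pct_increase(unsigned long before, unsigned long after)
{
	return 100.0 * ((double)after - (double)before) / (double)before;
}

/* Section sizes quoted in the commit message (pmac32_defconfig). */
static const unsigned long text_without = 9578869UL;
static const unsigned long text_with = 10318045UL;
static const unsigned long total_without = 12298479UL;
static const unsigned long total_with = 13037655UL;
```

Calling pct_increase(text_without, text_with) gives roughly 7.7 and pct_increase(total_without, total_with) roughly 6.0, which matches the percentages stated in the commit message.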

Re: cleanup swiotlb initialization

2022-02-24 Thread Boris Ostrovsky




On 2/22/22 10:35 AM, Christoph Hellwig wrote:

Hi all,

this series tries to clean up the swiotlb initialization, including
that of swiotlb-xen.  To get there it also removes the x86 iommu table
infrastructure that massively obfuscates the initialization path.

Git tree:

 git://git.infradead.org/users/hch/misc.git swiotlb-init-cleanup



I haven't had a chance to look at this yet but this crashes as dom0:


[   37.377313] BUG: unable to handle page fault for address: c90042880018
[   37.378219] #PF: supervisor read access in kernel mode
[   37.378219] #PF: error_code(0x) - not-present page
[   37.378219] PGD 7c2f2ee067 P4D 7c2f2ee067 PUD 7bf019b067 PMD 105a30067 PTE 0
[   37.378219] Oops:  [#1] PREEMPT SMP NOPTI
[   37.378219] CPU: 14 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc5swiotlb #9
[   37.378219] Hardware name: Oracle Corporation ORACLE SERVER E1-2c/ASY,Generic,SM,E1-2c, BIOS 49004900 12/23/2020
[   37.378219] RIP: e030:init_iommu_one+0x248/0x2f0
[   37.378219] Code: 48 89 43 68 48 85 c0 74 c4 be 00 20 00 00 48 89 df e8 ea ee ff ff 48 89 43 78 48 85 c0 74 ae c6 83 98 00 00 00 00 48 8b 43 38 <48> 8b 40 18 a8 01 74 07 83 8b a8 04 00 00 01 f6 83 a8 04 00 00 01
[   37.378219] RSP: e02b:c9004044bd18 EFLAGS: 00010286
[   37.378219] RAX: c9004288 RBX: 888107260800 RCX: 
[   37.378219] RDX: 8000 RSI: ea00041cab80 RDI: 
[   37.378219] RBP: c9004044bd38 R08: 0901 R09: ea00041cab00
[   37.378219] R10: 0002 R11:  R12: c90040435008
[   37.378219] R13: 0008 R14: efa0 R15: 
[   37.378219] FS:  () GS:88fef418() knlGS:
[   37.378219] CS:  e030 DS:  ES:  CR0: 80050033
[   37.378219] CR2: c90042880018 CR3: 0260a000 CR4: 00050660
[   37.378219] Call Trace:
[   37.378219]  
[   37.378219]  early_amd_iommu_init+0x3c5/0x72d
[   37.378219]  ? iommu_setup+0x284/0x284
[   37.378219]  state_next+0x158/0x68f
[   37.378219]  ? iommu_setup+0x284/0x284
[   37.378219]  iommu_go_to_state+0x28/0x2d
[   37.378219]  amd_iommu_init+0x15/0x4b
[   37.378219]  ? iommu_setup+0x284/0x284
[   37.378219]  pci_iommu_init+0x12/0x37
[   37.378219]  do_one_initcall+0x48/0x210
[   37.378219]  kernel_init_freeable+0x229/0x28c
[   37.378219]  ? rest_init+0xe0/0xe0
[   37.963966]  kernel_init+0x1a/0x130
[   37.979415]  ret_from_fork+0x22/0x30
[   37.991436]  
[   37.999465] Modules linked in:
[   38.007413] CR2: c90042880018
[   38.019416] ---[ end trace  ]---
[   38.023418] RIP: e030:init_iommu_one+0x248/0x2f0
[   38.023418] Code: 48 89 43 68 48 85 c0 74 c4 be 00 20 00 00 48 89 df e8 ea ee ff ff 48 89 43 78 48 85 c0 74 ae c6 83 98 00 00 00 00 48 8b 43 38 <48> 8b 40 18 a8 01 74 07 83 8b a8 04 00 00 01 f6 83 a8 04 00 00 01
[   38.023418] RSP: e02b:c9004044bd18 EFLAGS: 00010286
[   38.023418] RAX: c9004288 RBX: 888107260800 RCX: 
[   38.155413] RDX: 8000 RSI: ea00041cab80 RDI: 
[   38.175965] Freeing initrd memory: 62640K
[   38.155413] RBP: c9004044bd38 R08: 0901 R09: ea00041cab00
[   38.155413] R10: 0002 R11:  R12: c90040435008
[   38.155413] R13: 0008 R14: efa0 R15: 
[   38.155413] FS:  () GS:88fef418() knlGS:
[   38.287414] CS:  e030 DS:  ES:  CR0: 80050033
[   38.309557] CR2: c90042880018 CR3: 0260a000 CR4: 00050660
[   38.332403] Kernel panic - not syncing: Fatal exception
[   38.351414] Rebooting in 20 seconds..



-boris

Re: [PATCH 00/16] Remove usage of the deprecated "pci-dma-compat.h" API

2022-02-24 Thread Arnd Bergmann

On Thu, Feb 24, 2022 at 7:25 AM Christoph Hellwig  wrote:
>
> On Wed, Feb 23, 2022 at 09:26:56PM +0100, Christophe JAILLET wrote:
> > Patch 01, 04, 05, 06, 08, 09 have not reached -next yet.
> > They all still apply cleanly.
> >
> > 04 has been picked up for inclusion in the media subsystem for 5.18.
> > The other ones all have 1 or more Reviewed-by:/Acked-by: tags.
> >
> > Patch 16 must be resubmitted to add "#include " in
> > order not to break builds.
>
> So how about this:  I'll pick up 1, 5, 6, 8 and 9 for the dma-mapping
> tree.  After -rc1, when presumably all other patches have reached
> mainline, you resubmit the one with the added include and we finish this
> off?

Sounds good to me as well.

   Arnd

Re: [PATCH 2/3] powerpc: fix build errors

2022-02-24 Thread Nicholas Piggin

Excerpts from Nicholas Piggin's message of February 24, 2022 12:54 pm:
> Excerpts from Anders Roxell's message of February 23, 2022 11:58 pm:
>> Building tinyconfig with gcc (Debian 11.2.0-16) and assembler (Debian
>> 2.37.90.20220207) the following build error shows up:
>> 
>>  {standard input}: Assembler messages:
>>  {standard input}:1190: Error: unrecognized opcode: `stbcix'
>>  {standard input}:1433: Error: unrecognized opcode: `lwzcix'
>>  {standard input}:1453: Error: unrecognized opcode: `stbcix'
>>  {standard input}:1460: Error: unrecognized opcode: `stwcix'
>>  {standard input}:1596: Error: unrecognized opcode: `stbcix'
>>  ...
>> 
>> Rework to add assembler directives [1] around the instruction. Going
>> through them one by one shows that the changes should be safe.  Like
>> __get_user_atomic_128_aligned() is only called in p9_hmi_special_emu(),
>> which according to the name is specific to power9.  And __raw_rm_read*()
>> are only called in things that are powernv or book3s_hv specific.
>> 
>> [1] https://sourceware.org/binutils/docs/as/PowerPC_002dPseudo.html#PowerPC_002dPseudo
> 
> Thanks for doing this. There is a recent patch committed to binutils to work
> around this compiler bug.
> 
> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=cebc89b9328
> 
> Not sure on the outlook for GCC fix. Either way unfortunately we have 
> toolchains in the wild now that will explode, so we might have to take 
> your patches for the time being.

Perhaps not... Here's a hack that seems to work around the problem.

The issue of removing -many from the kernel and replacing it with
appropriate architecture versions is an orthogonal one (that we
should do). Either way this hack should be able to allow us to do
that as well, on these problem toolchains.

But for now it just uses -many as the trivial regression fix to get
back to previous behaviour.

Thanks,
Nick

---
 arch/powerpc/include/asm/asm-compat.h | 28 +++
 1 file changed, 28 insertions(+)

diff --git a/arch/powerpc/include/asm/asm-compat.h b/arch/powerpc/include/asm/asm-compat.h
index 2b736d9fbb1b..f9ac4a36f026 100644
--- a/arch/powerpc/include/asm/asm-compat.h
+++ b/arch/powerpc/include/asm/asm-compat.h
@@ -5,6 +5,34 @@
 #include 
 #include 
 
+#ifndef __ASSEMBLY__
+/*
+ * gcc 10 started to emit a .machine directive at the beginning of generated
+ * .s files, which overrides assembler -Wa,-m options passed down.
+ * Unclear if this behaviour will be reverted.
+ *
+ * gas 2.38 commit b25f942e18d6 made .machine directive more strict, commit
+ * cebc89b9328ea weakens it to take into account the gcc directive and allow
+ * assembler -m options to work.
+ *
+ * A combination of both results in an older machine -mcpu= code generation
+ * preventing newer mnemonics in inline asm from being recognised, because it
+ * overrides our -Wa,-many option.
+ *
+ * Emitting a .machine any directive by hand allows us to hack our way around
+ * this.
+ *
+ * XXX: verify versions and combinations.
+ */
+#ifdef CONFIG_CC_IS_GCC
+#if (GCC_VERSION >= 100000)
+#if (CONFIG_AS_VERSION == 23800)
+asm(".machine any");
+#endif
+#endif
+#endif
+#endif /* __ASSEMBLY__ */
+
 #ifdef __powerpc64__
 
 /* operations for longs and pointers */
-- 
2.23.0
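
A note on the version checks in the hack above: as far as I can tell, the kernel encodes GCC_VERSION as __GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__, and CONFIG_AS_VERSION follows the same major/minor/patch scheme for binutils. The sketch below only illustrates that encoding; encode_version is a hypothetical helper, not a kernel function.

```c
/*
 * Encode a major.minor.patch version the way the kernel's GCC_VERSION
 * (and, by assumption, CONFIG_AS_VERSION) is laid out.
 * Hypothetical helper for illustration only.
 */
static unsigned long encode_version(unsigned int major, unsigned int minor,
				    unsigned int patch)
{
	return 10000UL * major + 100UL * minor + patch;
}
```

Under this encoding binutils 2.38 maps to 23800, which matches the CONFIG_AS_VERSION test in the patch, while gcc 10.0.0 maps to 100000 and the Debian gcc 11.2.0 from the original report maps to 110200.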

Re: [RFC PATCH 0/5] Avoid kdump service reload on CPU hotplug events

2022-02-24 Thread Sourabh Jain


Hello Baoquan,


> Hi,
>
> On 02/21/22 at 02:16pm, Sourabh Jain wrote:
>> On hotplug event (CPU/memory) the CPU information prepared for the kdump kernel
>> becomes stale unless it is prepared again. To keep the CPU information
>> up-to-date a kdump service reload is triggered via the udev rule.
>>
>> The above approach has two downsides:
>>
>> 1) The udev rules are prone to races if hotplug events are frequent. The time
>> taken to settle all the requested kdump service reloads is significant
>> when multiple CPU/memory hotplugs are performed at the same time. This creates
>> a window where a kernel crash might not lead to successful dump collection.
>>
>> 2) Unnecessary CPU cycles are consumed to reload all the kdump components
>> including initrd, vmlinux, FDT, etc., whereas only one component, the FDT,
>> needs to be updated.
>
> I roughly went through this series, though I haven't read the code
> carefully. It seems the issue and the approach are similar to what the
> patchset below is doing. Have you noticed the patchset below from an Oracle
> engineer? And is there stuff the ppc code can be rebased on and reused?
>
> [PATCH v4 00/10] crash: Kernel handling of CPU and memory hot un/plug
> https://lore.kernel.org/all/20220209195706.51522-1-eric.devol...@oracle.com/T/#u


Thanks for the suggestion. I have seen earlier versions of this patch series,
but since it did not have support for the kexec_load system call we tried
implementing something from scratch.

Since Eric has added support for kexec_load and has a generic handler for CPU
and memory hotplug, let me see if I can rebase my PowerPC changes on top of
his patches. The major difference across the distro is that on PowerPC we need
to update the FDT instead of elfcorehdr on a hotplug event.

Thanks,
Sourabh Jain
