Re: [powerpc] memcpy warning drivers/scsi/scsi_transport_fc.c:581 (next-20220921)
On 22-Sep-2022, at 2:13 AM, Kees Cook wrote:
> On Wed, Sep 21, 2022 at 09:21:52PM +0530, Sachin Sant wrote:
>> While booting recent linux-next kernel on a Power server following
>> warning is seen:
>>
>> [7.432178] memcpy: detected field-spanning write (size 8) of single
>> field "&event->event_data" at drivers/scsi/scsi_transport_fc.c:581 (size 4)
>
> [...]
>
> Given that the only user of struct fc_nl_event is fc_host_post_fc_event()
> in drivers/scsi/scsi_transport_fc.c, it looks safe to say that changing
> the struct to use a flexible array is the thing to do in the kernel, but
> we can't actually change the size or layout because it's a UAPI header.
>
> Are you able to test this patch:

Thank you for the detailed analysis. With this patch applied I don't see
the warning.

Thanks
- Sachin
Re: [PATCH v3 4/4] arm64: support batched/deferred tlb shootdown during page reclamation
On 9/21/22 12:47, Nadav Amit wrote:
> On Sep 20, 2022, at 11:53 PM, Anshuman Khandual wrote:
>> On 8/22/22 13:51, Yicong Yang wrote:
>>> +static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
>>> +					struct mm_struct *mm,
>>> +					unsigned long uaddr)
>>> +{
>>> +	__flush_tlb_page_nosync(mm, uaddr);
>>> +}
>>> +
>>> +static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
>>> +{
>>> +	dsb(ish);
>>> +}
>>
>> Just wondering if arch_tlbbatch_add_mm() could also detect contiguous mapping
>> TLB invalidation requests on a given mm and try to generate a range based TLB
>> invalidation such as flush_tlb_range().
>>
>> struct arch_tlbflush_unmap_batch via task->tlb_ubc->arch can track contiguous
>> ranges while being queued up via arch_tlbbatch_add_mm(), and any range formed
>> can later be flushed in a subsequent arch_tlbbatch_flush() ?
>>
>> OR
>>
>> It might not be worth the effort and complexity, in comparison to the
>> performance improvement a TLB range flush brings in ?
>
> So here are my 2 cents, based on my experience with Intel-x86. It is likely
> different on arm64, but perhaps it can provide you some insight into what
> parameters you should measure and consider.
>
> In general there is a tradeoff between full TLB flushes and entry-specific
> ones. Flushing specific entries takes more time than flushing the entire
> TLB, but saves TLB refills.

Right.

> Dave Hansen made some calculations in the past and came up with 33 as a
> magic cutoff number, i.e., if you need to flush more than 33 entries, just
> flush the entire TLB. I am not sure that this exact number is very
> meaningful, since one might argue that it should've taken PTI into account
> (which might require twice as many TLB invalidations).

Okay.

> Anyhow, back to arch_tlbbatch_add_mm(). It may be possible to track ranges,
> but the question is whether you would actually succeed in forming contiguous
> ranges that are eventually (on x86) smaller than the full TLB flush cutoff
> (=33). Questionable (perhaps better with MGLRU?).

This proposal here for arm64 does not cause a full TLB flush ever. It creates
individual TLB flushes all the time. Hence the choice here is not between a
full TLB flush and possible range flushes. The choice is actually between
individual TLB flushes and range/full TLB flushes.

> Then, you should remember that tracking should be very efficient, since even
> a few cache misses might have greater cost than what you save by
> selective-flushing. Finally, on x86 you would need to invoke the smp/IPI
> layer multiple times to send different cores the relevant range they need to
> flush.

Agreed, these reasons make it much more difficult to gain any more performance.

> IOW: It is somewhat complicated to implement efficiently. On x86, and
> probably other IPI-based TLB shootdown systems, it does not have a clear
> performance benefit (IMHO).

Agreed, thanks for such a detailed explanation, appreciate it.
Re: [PATCH] block: move from strlcpy with unused retval to strscpy
On Thu, 18 Aug 2022 22:59:57 +0200, Wolfram Sang wrote: > Follow the advice of the below link and prefer 'strscpy' in this > subsystem. Conversion is 1:1 because the return value is not used. > Generated by a coccinelle script. > > Applied, thanks! [1/1] block: move from strlcpy with unused retval to strscpy commit: e55e1b4831563e5766f88fa821f5bfac0d6c298c Best regards, -- Jens Axboe
Re: [PATCH v2 07/44] cpuidle,psci: Push RCU-idle into driver
Reviewed-by: Guo Ren

On Mon, Sep 19, 2022 at 6:17 PM Peter Zijlstra wrote:
>
> Doing RCU-idle outside the driver, only to then temporarily enable it
> again, at least twice, before going idle is daft.
>
> Signed-off-by: Peter Zijlstra (Intel)
> ---
>  drivers/cpuidle/cpuidle-psci.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> --- a/drivers/cpuidle/cpuidle-psci.c
> +++ b/drivers/cpuidle/cpuidle-psci.c
> @@ -69,12 +69,12 @@ static int __psci_enter_domain_idle_stat
>  		return -1;
>
>  	/* Do runtime PM to manage a hierarchical CPU toplogy. */
> -	ct_irq_enter_irqson();
>  	if (s2idle)
>  		dev_pm_genpd_suspend(pd_dev);
>  	else
>  		pm_runtime_put_sync_suspend(pd_dev);
> -	ct_irq_exit_irqson();
> +
> +	ct_idle_enter();
>
>  	state = psci_get_domain_state();
>  	if (!state)
> @@ -82,12 +82,12 @@ static int __psci_enter_domain_idle_stat
>
>  	ret = psci_cpu_suspend_enter(state) ? -1 : idx;
>
> -	ct_irq_enter_irqson();
> +	ct_idle_exit();
> +
>  	if (s2idle)
>  		dev_pm_genpd_resume(pd_dev);
>  	else
>  		pm_runtime_get_sync(pd_dev);
> -	ct_irq_exit_irqson();
>
>  	cpu_pm_exit();
>
> @@ -240,6 +240,7 @@ static int psci_dt_cpu_init_topology(str
>  	 * of a shared state for the domain, assumes the domain states are all
>  	 * deeper states.
>  	 */
> +	drv->states[state_count - 1].flags |= CPUIDLE_FLAG_RCU_IDLE;
>  	drv->states[state_count - 1].enter = psci_enter_domain_idle_state;
>  	drv->states[state_count - 1].enter_s2idle =
>  		psci_enter_s2idle_domain_idle_state;
>  	psci_cpuidle_use_cpuhp = true;

--
Best Regards
 Guo Ren
Re: [PATCH v2 07/44] cpuidle,psci: Push RCU-idle into driver
On Mon, Sep 19, 2022 at 11:59:46AM +0200, Peter Zijlstra wrote: > Doing RCU-idle outside the driver, only to then temporarily enable it > again, at least twice, before going idle is daft. > > Signed-off-by: Peter Zijlstra (Intel) Tried it on Pixel 6 running psci_idle, looks good with no apparent issues. Tested-by: Kajetan Puchalski
Re: [powerpc] Kernel crash with THP tests (next-20220920)
On 09/21/22 12:00, Sachin Sant wrote: > While running transparent huge page tests [1] against 6.0.0-rc6-next-20220920 > following crash is seen on IBM Power server. Thanks Sachin, Naoya reported this, with my analysis here: https://lore.kernel.org/linux-mm/YyqCS6+OXAgoqI8T@monkey/ An updated version of the patch was posted here, https://lore.kernel.org/linux-mm/20220921202702.106069-1-mike.krav...@oracle.com/ Sorry about that, -- Mike Kravetz > > Kernel attempted to read user page (34) - exploit attempt? (uid: 0) > BUG: Kernel NULL pointer dereference on read at 0x0034 > Faulting instruction address: 0xc04d2744 > Oops: Kernel access of bad area, sig: 11 [#1] > LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries > Modules linked in: dm_mod(E) bonding(E) rfkill(E) tls(E) sunrpc(E) nd_pmem(E) > nd_btt(E) dax_pmem(E) papr_scm(E) libnvdimm(E) pseries_rng(E) vmx_crypto(E) > ext4(E) mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) crc64_rocksoft(E) crc64(E) > sg(E) ibmvscsi(E) scsi_transport_srp(E) ibmveth(E) fuse(E) > CPU: 37 PID: 2219255 Comm: sysctl Tainted: GE > 6.0.0-rc6-next-20220920 #1 > NIP: c04d2744 LR: c04d2734 CTR: > REGS: c012801bf660 TRAP: 0300 Tainted: GE > (6.0.0-rc6-next-20220920) > MSR: 80009033 CR: 24048222 XER: 2004 > CFAR: c04b0eac DAR: 0034 DSISR: 4000 IRQMASK: 0 > GPR00: c04d2734 c012801bf900 c2a92300 > GPR04: c2ac8ac0 c1209340 0005 c01286714b80 > GPR08: 0034 > GPR12: 28048242 c0167fff6b00 > GPR16: > GPR20: c012801bfae8 0001 0100 0001 > GPR24: c012801bfae8 c2ac8ac0 0002 0005 > GPR28: 0001 00346cca > NIP [c04d2744] alloc_buddy_huge_page+0xd4/0x240 > LR [c04d2734] alloc_buddy_huge_page+0xc4/0x240 > Call Trace: > [c012801bf900] [c04d2734] alloc_buddy_huge_page+0xc4/0x240 > (unreliable) > [c012801bf9b0] [c04d46a4] > alloc_fresh_huge_page.part.72+0x214/0x2a0 > [c012801bfa40] [c04d7f88] alloc_pool_huge_page+0x118/0x190 > [c012801bfa90] [c04d84dc] __nr_hugepages_store_common+0x4dc/0x610 > [c012801bfb70] [c04d88bc] > hugetlb_sysctl_handler_common+0x13c/0x180 > 
[c012801bfc10] [c06380e0] proc_sys_call_handler+0x210/0x350 > [c012801bfc90] [c0551c00] vfs_write+0x2e0/0x460 > [c012801bfd50] [c0551f5c] ksys_write+0x7c/0x140 > [c012801bfda0] [c0033f58] system_call_exception+0x188/0x3f0 > [c012801bfe10] [c000c53c] system_call_common+0xec/0x270 > --- interrupt: c00 at 0x7fffa9520c34 > NIP: 7fffa9520c34 LR: 0001024754bc CTR: > REGS: c012801bfe80 TRAP: 0c00 Tainted: GE > (6.0.0-rc6-next-20220920) > MSR: 8280f033 CR: 28002202 > XER: > IRQMASK: 0 > GPR00: 0004 7fffccd76cd0 7fffa9607300 0003 > GPR04: 000138da6970 0006 fff6 > GPR08: 000138da6970 > GPR12: 7fffa9a40940 > GPR16: > GPR20: > GPR24: 0001 0010 0006 000138da8aa0 > GPR28: 7fffa95fc2c8 000138da8aa0 0006 000138da6930 > NIP [7fffa9520c34] 0x7fffa9520c34 > LR [0001024754bc] 0x1024754bc > --- interrupt: c00 > Instruction dump: > 3b42 3ba1 3b80 7f26cb78 7fc5f378 7f64db78 7fe3fb78 4bfde5b9 > 6000 7c691b78 39030034 7c0004ac <7d404028> 7c0ae800 40c20010 7f80412d > ---[ end trace ]--- > > Kernel panic - not syncing: Fatal exception > > Bisect points to following patch: > commit f2f3c25dea3acfb17aecb7273541e7266dfc8842 > hugetlb: freeze allocated pages before creating hugetlb pages > > Reverting the patch allows the test to run successfully. > > Thanks > - Sachin > > [1] > https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/transparent_hugepages_defrag.py
Re: [powerpc] memcpy warning drivers/scsi/scsi_transport_fc.c:581 (next-20220921)
On Wed, Sep 21, 2022 at 09:21:52PM +0530, Sachin Sant wrote:
> While booting recent linux-next kernel on a Power server following
> warning is seen:
>
> [6.427054] lpfc 0022:01:00.0: 0:6468 Set host date / time: Status x10:
> [6.471457] lpfc 0022:01:00.0: 0:6448 Dual Dump is enabled
> [7.432161] ------------[ cut here ]------------
> [7.432178] memcpy: detected field-spanning write (size 8) of single field
> "&event->event_data" at drivers/scsi/scsi_transport_fc.c:581 (size 4)

Interesting!

The memcpy() is this one:

	memcpy(&event->event_data, data_buf, data_len);

The struct member, "event_data", is defined as u32:

	...
	 * Note: if Vendor Unique message, event_data will be start of
	 *	 vendor unique payload, and the length of the payload is
	 *	 per event_datalen
	...
	struct fc_nl_event {
		struct scsi_nl_hdr snlh;	/* must be 1st element ! */
		__u64 seconds;
		__u64 vendor_id;
		__u16 host_no;
		__u16 event_datalen;
		__u32 event_num;
		__u32 event_code;
		__u32 event_data;
	} __attribute__((aligned(sizeof(__u64))));

The warning says memcpy is trying to write 8 bytes into the 4 byte
member, so the compiler is seeing it "correctly", but I think this is
partially a false positive. It looks like there is also a small bug in
the allocation size calculation and therefore a small leak of kernel
heap memory contents. My notes:

	void
	fc_host_post_fc_event(struct Scsi_Host *shost, u32 event_number,
			enum fc_host_event_code event_code,
			u32 data_len, char *data_buf, u64 vendor_id)
	{
	...
		struct fc_nl_event *event;
	...
		if (!data_buf || data_len < 4)
			data_len = 0;

This wants a data_buf and a data_len >= 4, so it does look like it's
expected to be variable sized. There does appear to be an alignment
and padding expectation, though:

	/* macro to round up message lengths to 8byte boundary */
	#define FC_NL_MSGALIGN(len)	(((len) + 7) & ~7)

	...
	len = FC_NL_MSGALIGN(sizeof(*event) + data_len);

But this is immediately suspicious: sizeof(*event) _includes_ event_data,
so the alignment is going to be bumped up incorrectly. Note that
struct fc_nl_event is 8 * 5 == 40 bytes, which allows for 4 bytes in
event_data. But setting data_len to 4 (i.e. no "overflow") means we're
asking for 44 bytes, which is aligned to 48.

So, in all cases, there is uninitialized memory being sent...

	skb = nlmsg_new(len, GFP_KERNEL);
	...
	nlh = nlmsg_put(skb, 0, 0, SCSI_TRANSPORT_MSG, len, 0);
	...
	event = nlmsg_data(nlh);
	...
	event->event_datalen = data_len;	/* bytes */

Comments in the struct say this is counting from start of event_data.

	...
	if (data_len)
		memcpy(&event->event_data, data_buf, data_len);

And here is the reported memcpy().

The callers of fc_host_post_fc_event() are:

	fc_host_post_fc_event(shost, event_number, event_code,
			(u32)sizeof(u32), (char *)&event_data, 0);

Fixed-size of 4 bytes: no "overflow".

	fc_host_post_fc_event(shost, event_number, FCH_EVT_VENDOR_UNIQUE,
			data_len, data_buf, vendor_id);

	fc_host_post_fc_event(shost, fc_get_event_number(),
			FCH_EVT_LINK_FPIN, fpin_len, fpin_buf, 0);

These two appear to be of arbitrary length, but I didn't look more
deeply.

Given that the only user of struct fc_nl_event is fc_host_post_fc_event()
in drivers/scsi/scsi_transport_fc.c, it looks safe to say that changing
the struct to use a flexible array is the thing to do in the kernel, but
we can't actually change the size or layout because it's a UAPI header.
Are you able to test this patch:

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index a2524106206d..0d798f11dc34 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -543,7 +543,7 @@ fc_host_post_fc_event(struct Scsi_Host *shost, u32 event_number,
 	struct nlmsghdr	*nlh;
 	struct fc_nl_event *event;
 	const char *name;
-	u32 len;
+	size_t len, padding;
 	int err;
 
 	if (!data_buf || data_len < 4)
@@ -554,7 +554,7 @@ fc_host_post_fc_event(struct Scsi_Host *shost, u32 event_number,
 		goto send_fail;
 	}
 
-	len = FC_NL_MSGALIGN(sizeof(*event) + data_len);
+	len = FC_NL_MSGALIGN(sizeof(*event) - sizeof(event->event_data) + data_len);
 
 	skb = nlmsg_new(len, GFP_KERNEL);
 	if (!skb) {
@@ -578,7 +578,9 @@ fc_host_post_fc_event(struct Scsi_Host *shost, u32 event_number,
 	event->event_num = event_number;
 	event->event_code = event_code;
 	if (data_len)
-		memcpy(&event->event_data, data_buf, data_len);
+		memcpy(event->event_data_flex, data_buf, data_len);
+
Re: [PATCH] Documentation: spufs: correct a duplicate word typo
Randy Dunlap writes: > Fix a typo of "or" which should be "of". > > Signed-off-by: Randy Dunlap > Cc: Jeremy Kerr > Cc: Arnd Bergmann > Cc: linuxppc-dev@lists.ozlabs.org > Cc: Jonathan Corbet > --- > Documentation/filesystems/spufs/spufs.rst |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > --- a/Documentation/filesystems/spufs/spufs.rst > +++ b/Documentation/filesystems/spufs/spufs.rst > @@ -227,7 +227,7 @@ Files >from the data buffer, updating the value of the specified > signal >notification register. The signal notification register > will >either be replaced with the input data or will be updated to > the > - bitwise OR or the old value and the input data, depending on > the > + bitwise OR of the old value and the input data, depending on > the >contents of the signal1_type, or signal2_type > respectively, >file. Applied, thanks. jon
Re: [PATCH 2/2] powerpc/64s: update cpu selection options
On Wed, Sep 21, 2022 at 11:41:03AM +1000, Nicholas Piggin wrote: > Update the 64s GENERIC_CPU option. POWER4 support has been dropped, so > make that clear in the option name. The POWER5_CPU option is dropped > because it's uncommon, and GENERIC_CPU covers it. > > -mtune= before power8 is dropped because the minimum gcc version > supports power8, and tuning is made consistent between big and little > endian. > > A 970 option is added for PowerPC 970 / G5 because they still have a > user base, and -mtune=power8 does not generate good code for the 970. > > This also updates the ISA versions document to add Power4/Power4+ > because I didn't realise Power4+ used 2.01. > > Signed-off-by: Nicholas Piggin Reviewed-by: Segher Boessenkool Cheers, Segher
Re: [PATCH 1/2] powerpc/64s: Fix GENERIC_CPU build flags for PPC970 / G5
On Wed, Sep 21, 2022 at 11:41:02AM +1000, Nicholas Piggin wrote: > Big-endian GENERIC_CPU supports 970, but builds with -mcpu=power5. > POWER5 is ISA v2.02 whereas 970 is v2.01 plus Altivec. 2.02 added > the popcntb instruction which a compiler might use. > > Use -mcpu=power4. > > Fixes: 471d7ff8b51b ("powerpc/64s: Remove POWER4 support") > Signed-off-by: Nicholas Piggin Reviewed-by: Segher Boessenkool Thank you! Maybe superfluous, but some more context: GCC's -mcpu=power4 means POWER4+, ISA 2.01, according to our documentation. There is no difference with ISA 2.00 (what plain POWER4 implements) for anything GCC does. Segher
[PATCH 2/2] tools/perf/tests: Fix build id test check for PE file
Perf test "build id cache operations" fails for PE executables. Logs below
are from a powerpc system; the same is observed on x86 as well.

<<>>
Adding 5a0fd882b53084224ba47b624c55a469 ./tests/shell/../pe-file.exe: Ok
build id: 5a0fd882b53084224ba47b624c55a469
link: /tmp/perf.debug.w0V/.build-id/5a/0fd882b53084224ba47b624c55a469
file: /tmp/perf.debug.w0V/.build-id/5a/../../root/athira/linux/tools/perf/tests/pe-file.exe/5a0fd882b53084224ba47b624c55a469/elf
failed: file /tmp/perf.debug.w0V/.build-id/5a/../../root/athira/linux/tools/perf/tests/pe-file.exe/5a0fd882b53084224ba47b624c55a469/elf does not exist
test child finished with -1
end
build id cache operations: FAILED!
<<>>

The test tries to do:

<<>>
mkdir /tmp/perf.debug.TeY1
perf --buildid-dir /tmp/perf.debug.TeY1 buildid-cache -v -a ./tests/shell/../pe-file.exe
<<>>

The option "--buildid-dir" sets the build id cache directory as
/tmp/perf.debug.TeY1. The option given to buildid-cache, ie
"-a ./tests/shell/../pe-file.exe", is to add the pe-file.exe to the cache.
The testcase sets buildid-dir and adds the file pe-file.exe to the build id
cache. To check that the command ran successfully, the "check" function
looks for the presence of the file in the buildid cache directory. But the
check here expects the added file to be executable. Snippet below:

<<>>
if [ ! -x $file ]; then
	echo "failed: file ${file} does not exist"
	exit 1
fi
<<>>

The buildid test is done for a sha1 binary, an md5 binary and also for a
PE file. The first two binaries are created at runtime by compiling with
the "--build-id" option, and hence the check for the sha1/md5 tests should
use [ ! -x ].

But in the case of the PE file, the permission of the input file is
rw-r--r--, and hence the file added to the build id cache has the same
permission.

Original file:
ls tests/pe-file.exe | xargs stat --printf "%n %A \n"
tests/pe-file.exe -rw-r--r--

buildid cache file:
ls /tmp/perf.debug.w0V/.build-id/5a/../../root/athira/linux/tools/perf/tests/pe-file.exe/5a0fd882b53084224ba47b624c55a469/elf | xargs stat --printf "%n %A \n"
/tmp/perf.debug.w0V/.build-id/5a/../../root/athira/linux/tools/perf/tests/pe-file.exe/5a0fd882b53084224ba47b624c55a469/elf -rw-r--r--

Fix the test to match the permission of the original file in the case of a
PE file, ie if the "tests/pe-file.exe" file does not have exec permission,
just check for the existence of the buildid file using [ ! -e ].

Signed-off-by: Athira Rajeev
---
 tools/perf/tests/shell/buildid.sh | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/shell/buildid.sh b/tools/perf/tests/shell/buildid.sh
index 3512c4423d48..75f5117c3417 100755
--- a/tools/perf/tests/shell/buildid.sh
+++ b/tools/perf/tests/shell/buildid.sh
@@ -77,7 +77,20 @@ check()
 	file=${build_id_dir}/.build-id/$(echo ${id}|cut -c 1-2)/`readlink ${link}`/elf
 	echo "file: ${file}"
 
-	if [ ! -x $file ]; then
+	# Check for file permission of original file
+	# in case of pe-file.exe file
+	echo $1 | grep ".exe"
+	if [ $? -eq 0 ]; then
+		if [ -x $1 -a ! -x $file ]; then
+			echo "failed: file ${file} executable does not exist"
+			exit 1
+		fi
+
+		if [ ! -x $file -a ! -e $file ]; then
+			echo "failed: file ${file} does not exist"
+			exit 1
+		fi
+	elif [ ! -x $file ]; then
 		echo "failed: file ${file} does not exist"
 		exit 1
 	fi
-- 
2.17.1
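For readers less familiar with test(1): the fix above hinges on the difference between `-x` (executable by the caller) and `-e` (exists at all). A minimal illustration, independent of the perf test itself:

```shell
#!/bin/sh
# -x is true only if the file is executable by the caller; -e only checks
# existence. A cache entry copied from a rw-r--r-- pe-file.exe therefore
# fails -x even though it exists, which is the false failure described above.
t=$(mktemp)
chmod 644 "$t"		# rw-r--r--, like pe-file.exe

[ -e "$t" ] && echo "exists: yes"
[ -x "$t" ] || echo "executable: no"

chmod 755 "$t"
[ -x "$t" ] && echo "now executable: yes"
rm -f "$t"
```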
[PATCH 1/2] tools/perf/tests: Fix string substitutions in build id test
The perf test named "build id cache operations" skips with the below error
on some distros:

<<>>
78: build id cache operations : test child forked, pid 01
WARNING: wine not found. PE binaries will not be run.
test binaries: /tmp/perf.ex.SHA1.PKz /tmp/perf.ex.MD5.Gt3 ./tests/shell/../pe-file.exe
DEBUGINFOD_URLS=
Adding 4abd406f041feb4f10ecde3fc30fd0639e1a91cb /tmp/perf.ex.SHA1.PKz: Ok
build id: 4abd406f041feb4f10ecde3fc30fd0639e1a91cb
./tests/shell/buildid.sh: 69: ./tests/shell/buildid.sh: Bad substitution
test child finished with -2
build id cache operations: Skip
<<>>

The test script "tests/shell/buildid.sh" uses some string substitutions
which are supported in bash, but not in "sh" or other shells. The error
above on line number 69 that reports "Bad substitution" is:

<<>>
link=${build_id_dir}/.build-id/${id:0:2}/${id:2}
<<>>

Here the way of getting the first two characters from id, ie ${id:0:2},
and similarly expressions like ${id:2}, are not recognised in "sh". So the
line errors out and, instead of hitting a failure, the test gets skipped
as shown in the logs. So the syntax issue causes the test not to be
executed in such cases. Similarly the usage "${@: -1}" [to pick the last
argument passed to a function] in "test_record" doesn't work in all
distros.

Fix this by using an alternative way with the "cut" command to pick "n"
characters from the string. Also fix the usage of "${@: -1}" to work in
all cases.

Another usage in "test_record" is:

<<>>
${perf} record --buildid-all -o ${data} $@ &> ${log}
<<>>

This causes the perf record to start in the background, and results in the
data file not being created by the time the "check" function is invoked.
The below log shows the perf record result getting displayed after the
call to the "check" function.

<<>>
running: perf record /tmp/perf.ex.SHA1.EAU
build id: 4abd406f041feb4f10ecde3fc30fd0639e1a91cb
link: /tmp/perf.debug.mLT/.build-id/4a/bd406f041feb4f10ecde3fc30fd0639e1a91cb
failed: link /tmp/perf.debug.mLT/.build-id/4a/bd406f041feb4f10ecde3fc30fd0639e1a91cb does not exist
test child finished with -1
build id cache operations: FAILED!
root@machine:~/athira/linux/tools/perf# Couldn't synthesize bpf events.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.010 MB /tmp/perf.data.bFF ]
<<>>

Fix this by redirecting output instead of using "&", which starts the
command in the background.

Signed-off-by: Athira Rajeev
---
 tools/perf/tests/shell/buildid.sh | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/tools/perf/tests/shell/buildid.sh b/tools/perf/tests/shell/buildid.sh
index f05670d1e39e..3512c4423d48 100755
--- a/tools/perf/tests/shell/buildid.sh
+++ b/tools/perf/tests/shell/buildid.sh
@@ -66,7 +66,7 @@ check()
 	esac
 	echo "build id: ${id}"
 
-	link=${build_id_dir}/.build-id/${id:0:2}/${id:2}
+	link=${build_id_dir}/.build-id/$(echo ${id}|cut -c 1-2)/$(echo ${id}|cut -c 3-)
 	echo "link: ${link}"
 
 	if [ ! -h $link ]; then
@@ -74,7 +74,7 @@ check()
 		exit 1
 	fi
 
-	file=${build_id_dir}/.build-id/${id:0:2}/`readlink ${link}`/elf
+	file=${build_id_dir}/.build-id/$(echo ${id}|cut -c 1-2)/`readlink ${link}`/elf
 	echo "file: ${file}"
 
 	if [ ! -x $file ]; then
@@ -117,20 +117,22 @@ test_record()
 {
 	data=$(mktemp /tmp/perf.data.XXX)
 	build_id_dir=$(mktemp -d /tmp/perf.debug.XXX)
-	log=$(mktemp /tmp/perf.log.XXX)
+	log_out=$(mktemp /tmp/perf.log.out.XXX)
+	log_err=$(mktemp /tmp/perf.log.err.XXX)
 	perf="perf --buildid-dir ${build_id_dir}"
 
+	eval last=\${$#}
 	echo "running: perf record $@"
-	${perf} record --buildid-all -o ${data} $@ &> ${log}
+	${perf} record --buildid-all -o ${data} $@ 1>${log_out} 2>${log_err}
 	if [ $? -ne 0 ]; then
 		echo "failed: record $@"
-		echo "see log: ${log}"
+		echo "see log: ${log_err}"
 		exit 1
 	fi
 
-	check ${@: -1}
+	check $last
 
-	rm -f ${log}
+	rm -f ${log_out} ${log_err}
 	rm -rf ${build_id_dir}
 	rm -rf ${data}
 }
-- 
2.17.1
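The portability issues fixed above are easy to reproduce outside the test. A small sketch of the bash-only expansions and their POSIX replacements; the `last_arg` helper here is just for illustration:

```shell
#!/bin/sh
# Bash-only expansions vs. their POSIX replacements used in the patch.
id=4abd406f041feb4f10ecde3fc30fd0639e1a91cb

# ${id:0:2} and ${id:2} are bash-isms; cut -c works in any POSIX sh.
first_two=$(echo ${id} | cut -c 1-2)
rest=$(echo ${id} | cut -c 3-)
echo "${first_two}/${rest}"

# "${@: -1}" (last positional argument) is also a bash-ism;
# eval on $# is the portable equivalent used by the fix.
last_arg() {
	eval last=\${$#}
	echo "${last}"
}
last_arg /tmp/perf.data one two		# prints "two"
```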
Re: [RFC PATCH 3/7] powerpc/64: provide a helper macro to load r2 with the kernel TOC
Le 19/09/2022 à 16:01, Nicholas Piggin a écrit : > A later change stops the kernel using r2 and loads it with a poison > value. Provide a PACATOC loading abstraction which can hide this > detail. > > XXX: 64e, KVM, ftrace not entirely done > > Signed-off-by: Nicholas Piggin > --- > arch/powerpc/include/asm/ppc_asm.h | 3 +++ > arch/powerpc/kernel/exceptions-64e.S | 4 ++-- > arch/powerpc/kernel/exceptions-64s.S | 6 +++--- > arch/powerpc/kernel/head_64.S | 4 ++-- > arch/powerpc/kernel/interrupt_64.S | 12 ++-- > arch/powerpc/kernel/optprobes_head.S | 2 +- > arch/powerpc/kernel/trace/ftrace_mprofile.S| 4 ++-- > arch/powerpc/platforms/powernv/opal-wrappers.S | 2 +- > 8 files changed, 20 insertions(+), 17 deletions(-) > > diff --git a/arch/powerpc/include/asm/ppc_asm.h > b/arch/powerpc/include/asm/ppc_asm.h > index 520c4c9caf7f..c0848303151c 100644 > --- a/arch/powerpc/include/asm/ppc_asm.h > +++ b/arch/powerpc/include/asm/ppc_asm.h > @@ -374,6 +374,9 @@ GLUE(.,name): > > #ifdef __powerpc64__ > > +#define LOAD_PACA_TOC() \ > + ld r2,PACATOC(r13) > + Wouldn't it be better as a GAS macro ?
Re: [RFC PATCH 2/7] powerpc/64: abstract asm global variable declaration and access
Le 19/09/2022 à 16:01, Nicholas Piggin a écrit : > Use asm helpers to access global variables and to define them in asm. > Stop using got addressing and use the more common @toc offsets. 32-bit > already does this so that should be unchanged. > > Signed-off-by: Nicholas Piggin > --- > diff --git a/arch/powerpc/boot/ppc_asm.h b/arch/powerpc/boot/ppc_asm.h > index 192b97523b05..ea290bf78fb2 100644 > --- a/arch/powerpc/boot/ppc_asm.h > +++ b/arch/powerpc/boot/ppc_asm.h > @@ -84,4 +84,8 @@ > #define MFTBU(dest) mfspr dest, SPRN_TBRU > #endif > > +#define LOAD_REG_ADDR(reg,name) \ > + addis reg,r2,name@toc@ha; \ > + addireg,reg,name@toc@l > + > #endif /* _PPC64_PPC_ASM_H */ Wouldn't it be better as a GAS macro ?
Re: [RFC PATCH 1/7] powerpc: use 16-bit immediate for STACK_FRAME_REGS_MARKER
Le 19/09/2022 à 16:01, Nicholas Piggin a écrit :
> Using a 16-bit constant for this marker allows it to be loaded with
> a single 'li' instruction. On 64-bit this avoids a TOC entry and a
> TOC load that depends on the r2 value that has just been loaded from
> the PACA.
>
> XXX: this probably should be 64-bit change and use 2 instruction
> sequence that 32-bit uses, to avoid false positives.

Yes would probably be safer ? It is only one instruction more, would
likely be unnoticeable.

Why value 0xba51 ? Why not 0xdead like PPC64 ?

Christophe

>
> Signed-off-by: Nicholas Piggin
> ---
>  arch/powerpc/include/asm/ptrace.h    | 6 +++---
>  arch/powerpc/kernel/entry_32.S       | 9 -
>  arch/powerpc/kernel/exceptions-64e.S | 8 +---
>  arch/powerpc/kernel/exceptions-64s.S | 2 +-
>  arch/powerpc/kernel/head_32.h        | 3 +--
>  arch/powerpc/kernel/head_64.S        | 7 ---
>  arch/powerpc/kernel/head_booke.h     | 3 +--
>  arch/powerpc/kernel/interrupt_64.S   | 6 +++---
>  8 files changed, 14 insertions(+), 30 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
> index a03403695cd4..f47066f7878e 100644
> --- a/arch/powerpc/include/asm/ptrace.h
> +++ b/arch/powerpc/include/asm/ptrace.h
> @@ -115,10 +115,10 @@ struct pt_regs
>
>  #define STACK_FRAME_OVERHEAD	112	/* size of minimum stack frame */
>  #define STACK_FRAME_LR_SAVE	2	/* Location of LR in stack frame */
> -#define STACK_FRAME_REGS_MARKER	ASM_CONST(0x7265677368657265)
> +#define STACK_FRAME_REGS_MARKER	ASM_CONST(0xdead)
>  #define STACK_INT_FRAME_SIZE	(sizeof(struct pt_regs) + \
>  				 STACK_FRAME_OVERHEAD + KERNEL_REDZONE_SIZE)
> -#define STACK_FRAME_MARKER	12
> +#define STACK_FRAME_MARKER	1	/* Reuse CR+reserved word */
>
>  #ifdef CONFIG_PPC64_ELF_ABI_V2
>  #define STACK_FRAME_MIN_SIZE	32
> @@ -136,7 +136,7 @@ struct pt_regs
>  #define KERNEL_REDZONE_SIZE	0
>  #define STACK_FRAME_OVERHEAD	16	/* size of minimum stack frame */
>  #define STACK_FRAME_LR_SAVE	1	/* Location of LR in stack frame */
> -#define STACK_FRAME_REGS_MARKER	ASM_CONST(0x72656773)
> +#define STACK_FRAME_REGS_MARKER	ASM_CONST(0xba51)
>  #define STACK_INT_FRAME_SIZE	(sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD)
>  #define STACK_FRAME_MARKER	2
>  #define STACK_FRAME_MIN_SIZE	STACK_FRAME_OVERHEAD
>
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index 1d599df6f169..c221e764cefd 100644
> --- a/arch/powerpc/kernel/entry_32.S
> +++ b/arch/powerpc/kernel/entry_32.S
> @@ -108,9 +108,8 @@ transfer_to_syscall:
>  #ifdef CONFIG_BOOKE_OR_40x
>  	rlwinm	r9,r9,0,14,12		/* clear MSR_WE (necessary?) */
>  #endif
> -	lis	r12,STACK_FRAME_REGS_MARKER@ha	/* exception frame marker */
> +	li	r12,STACK_FRAME_REGS_MARKER	/* exception frame marker */
>  	SAVE_GPR(2, r1)
> -	addi	r12,r12,STACK_FRAME_REGS_MARKER@l
>  	stw	r9,_MSR(r1)
>  	li	r2, INTERRUPT_SYSCALL
>  	stw	r12,8(r1)
> @@ -265,7 +264,7 @@ fast_exception_return:
>  	mtcr	r10
>  	lwz	r10,_LINK(r11)
>  	mtlr	r10
> -	/* Clear the exception_marker on the stack to avoid confusing stacktrace */
> +	/* Clear the STACK_FRAME_REGS_MARKER on the stack to avoid confusing stacktrace */
>  	li	r10, 0
>  	stw	r10, 8(r11)
>  	REST_GPR(10, r11)
> @@ -322,7 +321,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
>  	li	r0,0
>
>  	/*
> -	 * Leaving a stale exception_marker on the stack can confuse
> +	 * Leaving a stale STACK_FRAME_REGS_MARKER on the stack can confuse
>  	 * the reliable stack unwinder later on. Clear it.
>  	 */
>  	stw	r0,8(r1)
> @@ -374,7 +373,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
>  	mtspr	SPRN_XER,r5
>
>  	/*
> -	 * Leaving a stale exception_marker on the stack can confuse
> +	 * Leaving a stale STACK_FRAME_REGS_MARKER on the stack can confuse
>  	 * the reliable stack unwinder later on. Clear it.
>  	 */
>  	stw	r0,8(r1)
>
> diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
> index 67dc4e3179a0..08b7d6bd4da6 100644
> --- a/arch/powerpc/kernel/exceptions-64e.S
> +++ b/arch/powerpc/kernel/exceptions-64e.S
> @@ -389,7 +389,7 @@ exc_##n##_common:						\
>  	ld	r9,excf+EX_R1(r13);	/* load orig r1 back from PACA */ \
>  	lwz	r10,excf+EX_CR(r13);	/* load orig CR back from PACA */ \
>  	lbz	r11,PACAIRQSOFTMASK(r13); /* get current IRQ softe */	\
> -	ld	r12,exception_marker@toc(r2);				\
> +	li	r12,STACK_FRAME_REGS_MARKER;				\
> 	li
Re: [PATCH v2 10/10] drm/ofdrm: Support color management
Hi Thomas, On Wed, Sep 21, 2022 at 2:55 PM Thomas Zimmermann wrote: > Am 05.08.22 um 02:19 schrieb Benjamin Herrenschmidt: > > On Wed, 2022-07-20 at 16:27 +0200, Thomas Zimmermann wrote: > >> +#if !defined(CONFIG_PPC) > >> +static inline void out_8(void __iomem *addr, int val) > >> +{ } > >> +static inline void out_le32(void __iomem *addr, int val) > >> +{ } > >> +static inline unsigned int in_le32(const void __iomem *addr) > >> +{ > >> + return 0; > >> +} > >> +#endif > > > > These guys could just be replaced with readb/writel/readl respectively > > (beware of the argument swap). > > I only added them for COMPILE_TEST. There appears to be no portable > interface that implements out_le32() and in_le32()? iowrite32() and ioread32()? Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH 1/2] powerpc/64s: Fix GENERIC_CPU build flags for PPC970 / G5
Le 21/09/2022 à 03:41, Nicholas Piggin a écrit : > Big-endian GENERIC_CPU supports 970, but builds with -mcpu=power5. > POWER5 is ISA v2.02 whereas 970 is v2.01 plus Altivec. 2.02 added > the popcntb instruction which a compiler might use. > > Use -mcpu=power4. > > Fixes: 471d7ff8b51b ("powerpc/64s: Remove POWER4 support") > Signed-off-by: Nicholas Piggin > --- > arch/powerpc/Makefile | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile > index 02742facf895..140a5e6471fe 100644 > --- a/arch/powerpc/Makefile > +++ b/arch/powerpc/Makefile > @@ -152,7 +152,7 @@ CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=power8 > CFLAGS-$(CONFIG_GENERIC_CPU) += $(call > cc-option,-mtune=power9,-mtune=power8) > else > CFLAGS-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=power7,$(call > cc-option,-mtune=power5)) > -CFLAGS-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mcpu=power5,-mcpu=power4) > +CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=power4 > endif > else ifdef CONFIG_PPC_BOOK3E_64 > CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=powerpc64 That else ifdef CONFIG_PPC_BOOK3E_64 looks odd. I might have forgotten to drop something. Since commit d6b551b8f90c ("powerpc/64e: Fix build failure with GCC 12 (unrecognized opcode: `wrteei')") it is not possible anymore to select CONFIG_GENERIC_CPU if not book3s64. Christophe
Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"
"Nicholas Piggin" writes: > On Mon Sep 19, 2022 at 11:51 PM AEST, Nathan Lynch wrote: >> > I wonder - would it be worth making the panic path use a separate >> > "emergency" rtas_args buffer as well? If a CPU is actually "stuck" in >> > RTAS at panic time, then leaving rtas.args untouched might make the >> > ibm,int-off, ibm,set-xive, ibm,os-term, and any other RTAS calls we >> > incur on the panic path more likely to succeed. > > Yeah I think that's probably not a bad idea. Not sure if you've got the > bandwidth to take on doing the patch but be my guest if you do :) > Otherwise we can file it in github issues. Not sure I'll be able to work it soon. I filed https://github.com/linuxppc/issues/issues/435
[powerpc] memcpy warning drivers/scsi/scsi_transport_fc.c:581 (next-20220921)
While booting a recent linux-next kernel on a Power server, the following
warning is seen:

[6.427054] lpfc 0022:01:00.0: 0:6468 Set host date / time: Status x10:
[6.471457] lpfc 0022:01:00.0: 0:6448 Dual Dump is enabled
[7.432161] [ cut here ]
[7.432178] memcpy: detected field-spanning write (size 8) of single field "&event->event_data" at drivers/scsi/scsi_transport_fc.c:581 (size 4)
[7.432201] WARNING: CPU: 0 PID: 16 at drivers/scsi/scsi_transport_fc.c:581 fc_host_post_fc_event+0x214/0x300 [scsi_transport_fc]
[7.432228] Modules linked in: sr_mod(E) cdrom(E) sd_mod(E) sg(E) lpfc(E+) nvmet_fc(E) ibmvscsi(E) nvmet(E) scsi_transport_srp(E) ibmveth(E) nvme_fc(E) nvme(E) nvme_fabrics(E) nvme_core(E) t10_pi(E) scsi_transport_fc(E) crc64_rocksoft(E) crc64(E) tg3(E) ipmi_devintf(E) ipmi_msghandler(E) fuse(E)
[7.432263] CPU: 0 PID: 16 Comm: kworker/0:1 Tainted: G E 6.0.0-rc6-next-20220921 #38
[7.432270] Workqueue: events work_for_cpu_fn
[7.432277] NIP: c00801366a2c LR: c00801366a28 CTR: 007088ec
[7.432282] REGS: c380b6d0 TRAP: 0700 Tainted: G E (6.0.0-rc6-next-20220921)
[7.432288] MSR: 8282b033 CR: 48002824 XER: 0005
[7.432304] CFAR: c01555b4 IRQMASK: 0
GPR00: c00801366a28 c380b970 c00801388300 0084
GPR04: 7fff c380b730 c380b728 0027
GPR08: c00db7007f98 0001 0027 c2947378
GPR12: 2000 c2dc c018e3d8 c3045740
GPR16:
GPR20: 0030 010010df c380ba90
GPR24: 0001 c30ea000 c2da2a08
GPR28: 0040 c00073f52400 0008 c000940b9834
[7.432365] NIP [c00801366a2c] fc_host_post_fc_event+0x214/0x300 [scsi_transport_fc]
[7.432374] LR [c00801366a28] fc_host_post_fc_event+0x210/0x300 [scsi_transport_fc]
[7.432383] Call Trace:
[7.432385] [c380b970] [c00801366a28] fc_host_post_fc_event+0x210/0x300 [scsi_transport_fc] (unreliable)
[7.432396] [c380ba30] [c00801c23028] lpfc_post_init_setup+0xc0/0x1f0 [lpfc]
[7.432429] [c380bab0] [c00801c24e00] lpfc_pci_probe_one_s4.isra.59+0x428/0xa10 [lpfc]
[7.432455] [c380bb40] [c00801c255a4] lpfc_pci_probe_one+0x1bc/0xb70 [lpfc]
[7.432480] [c380bbe0] [c07fdc7c] local_pci_probe+0x6c/0x110
[7.432489] [c380bc60] [c017bdf8] work_for_cpu_fn+0x38/0x60
[7.432494] [c380bc90] [c01812d4] process_one_work+0x2b4/0x5b0
[7.432501] [c380bd30] [c0181820] worker_thread+0x250/0x600
[7.432508] [c380bdc0] [c018e4f4] kthread+0x124/0x130
[7.432514] [c380be10] [c000cdf4] ret_from_kernel_thread+0x5c/0x64
[7.432521] Instruction dump:
[7.432524] 2f89 409eff5c 3ca2 e8a58170 3c62 e8638178 3921 38c4
[7.432535] 7fc4f378 992a 4800414d e8410018 <0fe0> 7fa3eb78 3881 480044d1
[7.432546] ---[ end trace ]---
[7.471075] lpfc 0022:01:00.0: 0:3176 Port Name 0 Physical Link is functional
[7.471405] lpfc 0022:01:00.1: enabling device (0144 -> 0146)

The warning was added by the following patch:

commit 54d9469bc515dc5fcbc20eecbe19cea868b70d68
    fortify: Add run-time WARN for cross-field memcpy()

Should this be fixed in the driver, or is this a false warning?

Thanks
- Sachin
Re: [RFC PATCH 5/7] powerpc/64s: update generic cpu option name and compiler flags
Hi! On Wed, Sep 21, 2022 at 11:01:18AM +1000, Nicholas Piggin wrote: > On Wed Sep 21, 2022 at 8:16 AM AEST, Segher Boessenkool wrote: > > On Tue, Sep 20, 2022 at 12:01:47AM +1000, Nicholas Piggin wrote: > > > Update the 64s GENERIC_CPU option. POWER4 support has been dropped, so > > > make that clear in the option name. > > > > AFAIR the minimum now is POWER4+ (ISA 2.01), not POWER5 (ISA 2.02). > > It's POWER5 now, because of commit 471d7ff8b5 ("powerpc/64s: Remove > POWER4 support"), which is misguided about POWER4+ and also introduced > the -mcpu=power5 bug on 970 builds :\ ISA 2.01 added just a few things (LPES[0], HDEC, some PM things, but crucially also anything that sets MSR[PR] also sets MSR[EE] since then). > Not sure it's worth adding POWER4+ support back but if someone has a > POWER4+ or adds it to QEMU TCG, I will do the patch. 970 is 2.01 -- pretending it is 2.02 is a ticking time bomb: the popcntb insn will be generated for popcount and parity intrinsics, which can be generated by generic code! > > > -mtune= before power8 is dropped because the minimum gcc version > > > supports power8, and tuning is made consistent between big and little > > > endian. > > > > Tuning for p8 on e.g. 970 gives quite bad results. No idea if anyone > > cares, but this is a serious regression if so. > > It's for "generic" kernel so we set low minimum but higher tune, > assuming that people would usually have newer, so it was already > doing -mtune=power7. > > We could make a specific 970/G5 entry though, since those still > have users. If that uses -mcpu=power4 (which means ISA 2.01 btw!) all is fine already? (Or -mcpu=970, same thing really, it just allows VMX as well). Thanks for taking care of this Nick, much appreciated! Segher
[PATCH V2 2/3] tools/perf/tests: Fix branch stack sampling test to include sanity check for branch filter
commit b55878c90ab9 ("perf test: Add test for branch stack sampling")
added a test for branch stack sampling. There is a sanity check in the
beginning to skip the test if the hardware doesn't support branch stack
sampling. Snippet:

<<>>
# skip the test if the hardware doesn't support branch stack sampling
perf record -b -o- -B true > /dev/null 2>&1 || exit 2
<<>>

But the testcase also uses branch sample types: save_type, any. If any
platform doesn't support the branch filters used in the test, the
testcase will fail. In powerpc, multiple branch filters are currently
not supported, and hence this test fails on powerpc. Fix the sanity
check to look at the support for the branch filters used in this test
before proceeding with the test.

Fixes: b55878c90ab9 ("perf test: Add test for branch stack sampling")
Reported-by: Disha Goel
Signed-off-by: Athira Rajeev
Reviewed-by: Kajol Jain
---
 tools/perf/tests/shell/test_brstack.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/shell/test_brstack.sh b/tools/perf/tests/shell/test_brstack.sh
index c644f94a6500..ec801cffae6b 100755
--- a/tools/perf/tests/shell/test_brstack.sh
+++ b/tools/perf/tests/shell/test_brstack.sh
@@ -12,7 +12,8 @@ if ! [ -x "$(command -v cc)" ]; then
 fi

 # skip the test if the hardware doesn't support branch stack sampling
-perf record -b -o- -B true > /dev/null 2>&1 || exit 2
+# and if the architecture doesn't support filter types: any,save_type,u
+perf record -b -o- -B --branch-filter any,save_type,u true > /dev/null 2>&1 || exit 2

 TMPDIR=$(mktemp -d /tmp/__perf_test.program.X)
-- 
2.31.1
[PATCH V2 3/3] tools/testing/selftests/powerpc: Update the bhrb filter sampling test to test for multiple branch filters
For PERF_SAMPLE_BRANCH_STACK sample type, different branch_sample_type,
i.e. branch filters, are supported. The testcase "bhrb_filter_map_test"
tests the valid and invalid filter maps on different powerpc platforms.
Update this testcase to include a scenario covering multiple branch
filters at the same time. Since powerpc doesn't support multiple filters
at the same time, expect failure during perf_event_open.

Reported-by: Disha Goel
Signed-off-by: Athira Rajeev
Reviewed-by: Kajol Jain
---
Changelog:
Fixed compile error for undefined branch sample type by using
PERF_SAMPLE_BRANCH_ANY_CALL instead of PERF_SAMPLE_BRANCH_TYPE_SAVE

 .../powerpc/pmu/sampling_tests/bhrb_filter_map_test.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/tools/testing/selftests/powerpc/pmu/sampling_tests/bhrb_filter_map_test.c b/tools/testing/selftests/powerpc/pmu/sampling_tests/bhrb_filter_map_test.c
index 8182647c63c8..3f43c315c666 100644
--- a/tools/testing/selftests/powerpc/pmu/sampling_tests/bhrb_filter_map_test.c
+++ b/tools/testing/selftests/powerpc/pmu/sampling_tests/bhrb_filter_map_test.c
@@ -96,6 +96,15 @@ static int bhrb_filter_map_test(void)
 		}
 	}

+	/*
+	 * Combine filter maps which include a valid branch filter and an
+	 * invalid branch filter. Example: any (PERF_SAMPLE_BRANCH_ANY) and
+	 * any_call (PERF_SAMPLE_BRANCH_ANY_CALL).
+	 * The perf_event_open should fail in this case.
+	 */
+	event.attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY | PERF_SAMPLE_BRANCH_ANY_CALL;
+	FAIL_IF(!event_open(&event));
+
 	return 0;
 }
-- 
2.31.1
[PATCH V2 1/3] powerpc/perf: Fix branch_filter support for multiple filters in powerpc
For PERF_SAMPLE_BRANCH_STACK sample type, different branch_sample_type,
i.e. branch filters, are supported. The branch filters are requested via
the event attribute "branch_sample_type". Multiple branch filters can be
passed in the event attribute. Example:

	perf record -b -o- -B --branch-filter any,ind_call true

powerpc does not support having multiple branch filters at the same
time. In powerpc, branch filters for branch stack sampling are set via
MMCRA IFM bits [32:33]. But currently, when requesting multiple filter
types, the "perf record" command does not report any error. Example:

	perf record -b -o- -B --branch-filter any,save_type true
	perf record -b -o- -B --branch-filter any,ind_call true

The "bhrb_filter_map" function in the PMU driver code does the validity
check for supported branch filters, but this check is done for a single
filter. Hence "perf record" will proceed here without reporting any
error.

Fix power_pmu_event_init to return EOPNOTSUPP when multiple branch
filters are requested in the event attr. After the fix:

	perf record --branch-filter any,ind_call -- ls
	Error:
	cycles: PMU Hardware doesn't support sampling/overflow-interrupts.
	Try 'perf stat'

Reported-by: Disha Goel
Signed-off-by: Athira Rajeev
Reviewed-by: Madhavan Srinivasan
Tested-by: Disha Goel
Reviewed-by: Kajol Jain
---
Changelog:
Addressed review comment from Michael Ellerman to do irq restore before
returning EOPNOTSUPP

 arch/powerpc/perf/core-book3s.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 13919eb96931..a939ac3b3a84 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2131,6 +2131,23 @@ static int power_pmu_event_init(struct perf_event *event)
 	if (has_branch_stack(event)) {
 		u64 bhrb_filter = -1;

+		/*
+		 * powerpc does not support having multiple branch filters
+		 * at the same time. Branch filters are set via MMCRA
+		 * IFM[32:33] bits for power8 and above. Return EOPNOTSUPP
+		 * when multiple branch filters are requested in the event
+		 * attr.
+		 *
+		 * When opening event via perf_event_open, branch_sample_type
+		 * gets adjusted in perf_copy_attr function. Kernel will
+		 * automatically adjust the branch_sample_type based on the
+		 * event modifier settings to include PERF_SAMPLE_BRANCH_PLM_ALL.
+		 * Hence drop the check for PERF_SAMPLE_BRANCH_PLM_ALL.
+		 */
+		if (hweight64(event->attr.branch_sample_type & ~PERF_SAMPLE_BRANCH_PLM_ALL) > 1) {
+			local_irq_restore(irq_flags);
+			return -EOPNOTSUPP;
+		}
+
 		if (ppmu->bhrb_filter_map)
 			bhrb_filter = ppmu->bhrb_filter_map(
 					event->attr.branch_sample_type);
-- 
2.31.1
Re: [PATCH v1 2/3] powerpc/prom_init: drop PROM_BUG()
On 21.09.22 15:02, Michael Ellerman wrote:
> David Hildenbrand writes:
>> Unused, let's drop it.
>>
>> Signed-off-by: David Hildenbrand
>> ---
>>  arch/powerpc/kernel/prom_init.c | 6 --
>>  1 file changed, 6 deletions(-)
>
> Thanks. I'll take this one via the powerpc tree, and the others can go
> via wherever?

Makes sense; I'll drop this patch in case I have to resend, assuming
it's in your tree. Thanks!

-- 
Thanks,

David / dhildenb
Re: [PATCH v1 2/3] powerpc/prom_init: drop PROM_BUG()
David Hildenbrand writes: > Unused, let's drop it. > > Signed-off-by: David Hildenbrand > --- > arch/powerpc/kernel/prom_init.c | 6 -- > 1 file changed, 6 deletions(-) Thanks. I'll take this one via the powerpc tree, and the others can go via wherever? cheers > diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c > index a6669c40c1db..d464ba412084 100644 > --- a/arch/powerpc/kernel/prom_init.c > +++ b/arch/powerpc/kernel/prom_init.c > @@ -96,12 +96,6 @@ static int of_workarounds __prombss; > #define OF_WA_CLAIM 1 /* do phys/virt claim separately, then map */ > #define OF_WA_LONGTRAIL 2 /* work around longtrail bugs */ > > -#define PROM_BUG() do { \ > -prom_printf("kernel BUG at %s line 0x%x!\n", \ > - __FILE__, __LINE__);\ > - __builtin_trap(); \ > -} while (0) > - > #ifdef DEBUG_PROM > #define prom_debug(x...) prom_printf(x) > #else > -- > 2.37.3
Re: [PATCH] powerpc: Save AMR/IAMR when switching tasks
Christophe Leroy writes:
> Le 19/09/2022 à 14:37, Michael Ellerman a écrit :
>> Christophe Leroy writes:
>>> Le 16/09/2022 à 07:05, Samuel Holland a écrit :
>>>> With CONFIG_PREEMPT=y (involuntary preemption enabled), it is
>>>> possible to switch away from a task inside copy_{from,to}_user.
>>>> This left the CPU with userspace access enabled until after the
>>>> next IRQ or privilege level switch, when AMR/IAMR got reset to
>>>> AMR_KU[AE]P_BLOCKED. Then, when switching back to the original
>>>> task, the userspace access would fault:
>>>
>>> This is not supposed to happen. You never switch away from a task
>>> magically. Task switch will always happen in an interrupt, that
>>> means copy_{from,to}_user() get interrupted.
>>
>> Unfortunately this isn't true when CONFIG_PREEMPT=y.
>
> Argh, yes, I wrote the above with the assumption that we properly
> follow the main principles that no complex function is to be used
> while KUAP is open ... Which is apparently not true here. x86 would
> have detected it with objtool, but we don't have it yet in powerpc.

Yes and yes :/

>> We can switch away without an interrupt via:
>>   __copy_tofrom_user()
>>   -> __copy_tofrom_user_power7()
>>      -> exit_vmx_usercopy()
>>         -> preempt_enable()
>>            -> __preempt_schedule()
>>               -> preempt_schedule()
>>                  -> preempt_schedule_common()
>>                     -> __schedule()
>
> Should we use preempt_enable_no_resched() to avoid that ?

Good point :)

...

>> Still I think it might be our best option for an easy fix.
>
> Wouldn't it be even easier and less abusive to use
> preempt_enable_no_resched() ? Or is there definitively a good reason
> to resched after a VMX copy while we don't with regular copies ?

I don't think it's important to reschedule there. I guess it means
another task that wants to preempt will have to wait a little longer,
but I doubt it's measurable.

One reason to do the KUAP lock is it also protects us from running
disable_kernel_altivec() with KUAP unlocked. That in turn calls into
msr_check_and_clear() and __msr_check_and_clear(), none of which is a
problem per-se, but we'd need to make that all notrace to be 100% safe.

At the moment we're running that all with KUAP unlocked anyway, so
using preempt_enable_no_resched() would fix the actual bug and is much
nicer than my patch, so we should probably do that. We can look at
making disable_kernel_altivec() etc. notrace as a follow-up.

cheers
Re: [PATCH v2 10/10] drm/ofdrm: Support color management
Hi

Am 05.08.22 um 02:19 schrieb Benjamin Herrenschmidt:
> On Wed, 2022-07-20 at 16:27 +0200, Thomas Zimmermann wrote:
>> +#if !defined(CONFIG_PPC)
>> +static inline void out_8(void __iomem *addr, int val)
>> +{ }
>> +static inline void out_le32(void __iomem *addr, int val)
>> +{ }
>> +static inline unsigned int in_le32(const void __iomem *addr)
>> +{
>> +	return 0;
>> +}
>> +#endif
>
> These guys could just be replaced with readb/writel/readl respectively
> (beware of the argument swap).

I only added them for COMPILE_TEST. There appears to be no portable
interface that implements out_le32() and in_le32()?

Best regards
Thomas

> Cheers,
> Ben.

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev
Re: [PATCH v2 09/10] drm/ofdrm: Add per-model device function
Hi

Am 05.08.22 um 02:22 schrieb Benjamin Herrenschmidt:
> On Tue, 2022-07-26 at 16:40 +0200, Michal Suchánek wrote:
>> Hello,
>>
>> On Tue, Jul 26, 2022 at 03:38:37PM +0200, Javier Martinez Canillas wrote:
>>> On 7/20/22 16:27, Thomas Zimmermann wrote:
>>>> Add a per-model device-function structure in preparation of adding
>>>> color-management support. Detection of the individual models has
>>>> been taken from fbdev's offb.
>>>>
>>>> Signed-off-by: Thomas Zimmermann
>>>> ---
>>>
>>> Reviewed-by: Javier Martinez Canillas
>>>
>>> [...]
>>>
>>>> +static bool is_avivo(__be32 vendor, __be32 device)
>>>> +{
>>>> +	/* This will match most R5xx */
>>>> +	return (vendor == 0x1002) &&
>>>> +	       ((device >= 0x7100 && device < 0x7800) || (device >= 0x9400));
>>>> +}
>>>
>>> Maybe add some constant macros to not have these magic numbers ?
>>
>> This is based on the existing fbdev implementation's magic numbers:
>>
>> drivers/video/fbdev/offb.c:	((*did >= 0x7100 && *did < 0x7800) ||
>>
>> Of course, it would be great if somebody knowledgeable could clarify
>> those.
>
> I don't think anybody remembers :-) Vendor 0x1002 is PCI_VENDOR_ID_ATI,

I do :)

> but the rest is basically ranges of PCI IDs for which we don't have
> symbolic constants.

Should we add them to the official list in pci_ids.h? I cannot find
0x7800. The others are R520 and R600.

Best regards
Thomas

> Cheers,
> Ben.

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev
Re: [PATCH v2 08/10] drm/ofdrm: Add CRTC state
Hi

Am 26.07.22 um 15:36 schrieb Javier Martinez Canillas:
> On 7/20/22 16:27, Thomas Zimmermann wrote:
>> Add a dedicated CRTC state to ofdrm to later store information for
>> palette updates.
>>
>> Signed-off-by: Thomas Zimmermann
>> ---
>>  drivers/gpu/drm/tiny/ofdrm.c | 62 ++--
>
> Reviewed-by: Javier Martinez Canillas
>
> [...]
>
>> +static void ofdrm_crtc_reset(struct drm_crtc *crtc)
>> +{
>> +	struct ofdrm_crtc_state *ofdrm_crtc_state;
>> +
>> +	if (crtc->state) {
>> +		ofdrm_crtc_state_destroy(to_ofdrm_crtc_state(crtc->state));
>> +		crtc->state = NULL; /* must be set to NULL here */
>> +	}
>> +
>> +	ofdrm_crtc_state = kzalloc(sizeof(*ofdrm_crtc_state), GFP_KERNEL);
>> +	if (!ofdrm_crtc_state)
>> +		return;
>> +	__drm_atomic_helper_crtc_reset(crtc, &ofdrm_crtc_state->base);
>> +}
>> +
>
> IMO this function is hard to read, I would instead write it as following:
>
> static void ofdrm_crtc_reset(struct drm_crtc *crtc)
> {
> 	struct ofdrm_crtc_state *ofdrm_crtc_state =
> 		kzalloc(sizeof(*ofdrm_crtc_state), GFP_KERNEL);
>
> 	if (!ofdrm_crtc_state)
> 		return;
>
> 	if (crtc->state) {
> 		ofdrm_crtc_state_destroy(to_ofdrm_crtc_state(crtc->state));
> 		crtc->state = NULL; /* must be set to NULL here */
> 	}
>
> 	__drm_atomic_helper_crtc_reset(crtc, &ofdrm_crtc_state->base);
> }
>
> Also with that form I think that the crtc->state = NULL could just be
> dropped ?

I once had to add this line to a driver to make the DRM helpers work.
But I cannot find any longer why. Maybe it's been resolved meanwhile.

Best regards
Thomas

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev
Re: [PATCH v2 07/10] drm/ofdrm: Add ofdrm for Open Firmware framebuffers
Hi

Am 26.07.22 um 15:17 schrieb Javier Martinez Canillas:
> Hello Thomas,
>
> On 7/20/22 16:27, Thomas Zimmermann wrote:
>> Open Firmware provides basic display output via the 'display' node.
>> DT platform code already provides a device that represents the node's
>> framebuffer. Add a DRM driver for the device. The display mode and
>> color format is pre-initialized by the system's firmware. Runtime
>> modesetting via DRM is not possible. The display is useful during
>> early boot stages or as error fallback.
>
> I'm not familiar with OF display but the driver looks good to me.
>
> Reviewed-by: Javier Martinez Canillas
>
> I just have a few questions below.
>
> [...]
>
>> +static int ofdrm_primary_plane_helper_atomic_check(struct drm_plane *plane,
>> +						   struct drm_atomic_state *new_state)
>> +{
>> +	struct drm_plane_state *new_plane_state = drm_atomic_get_new_plane_state(new_state, plane);
>> +	struct drm_crtc_state *new_crtc_state;
>> +	int ret;
>> +
>> +	if (!new_plane_state->fb)
>> +		return 0;
>> +
>> +	new_crtc_state = drm_atomic_get_new_crtc_state(new_state, new_plane_state->crtc);
>> +
>> +	ret = drm_atomic_helper_check_plane_state(new_plane_state, new_crtc_state,
>> +						  DRM_PLANE_HELPER_NO_SCALING,
>> +						  DRM_PLANE_HELPER_NO_SCALING,
>> +						  false, false);
>> +	if (ret)
>> +		return ret;
>> +
>> +	return 0;
>> +}
>
> This seems to be exactly the same check than used in the simpledrm
> driver. Maybe could be moved to the fwfb helper library too ?

I've meanwhile dropped fwfb helpers. Color management requires specific
code here, so there's no much to share anyway.

> [...]
>
>> +static void ofdrm_crtc_helper_atomic_disable(struct drm_crtc *crtc,
>> +					     struct drm_atomic_state *old_state)
>> +{
>> +	/*
>> +	 * Always enabled; disabling clears the screen in the
>> +	 * primary plane's atomic_disable function.
>> +	 */
>> +}
>> +
>
> Same comment than for simpledrm, are these no-op helpers really needed ?

In simpledrm, I've meanwhile removed them. I'll do so here as well.

> [...]
>
>> +static const struct of_device_id ofdrm_of_match_display[] = {
>> +	{ .compatible = "display", },
>> +	{ },
>> +};
>> +MODULE_DEVICE_TABLE(of, ofdrm_of_match_display);
>> +
>
> I don't see a binding for this in Documentation/devicetree/bindings/display.
> Do we need one or it's that only required for FDT and not Open Firmware DT ?

No idea. The device is being created in drivers/of/platform.c. If offb
didn't need these bindings, ofdrm probably won't need them either.

Best regards
Thomas

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev
Re: [PATCH v3] hugetlb: simplify hugetlb handling in follow_page_mask
On 9/19/2022 10:13 AM, Mike Kravetz wrote:
> During discussions of this series [1], it was suggested that hugetlb
> handling code in follow_page_mask could be simplified. At the beginning
> of follow_page_mask, there currently is a call to follow_huge_addr which
> 'may' handle hugetlb pages. ia64 is the only architecture which provides
> a follow_huge_addr routine that does not return error.
>
> Instead, at each level of the page table a check is made for a hugetlb
> entry. If a hugetlb entry is found, a call to a routine associated with
> that entry is made.
>
> Currently, there are two checks for hugetlb entries at each page table
> level. The first check is of the form:
>
> 	if (p?d_huge())
> 		page = follow_huge_p?d();
>
> the second check is of the form:
>
> 	if (is_hugepd())
> 		page = follow_huge_pd().
>
> We can replace these checks, as well as the special handling routines
> such as follow_huge_p?d() and follow_huge_pd() with a single routine to
> handle hugetlb vmas.
>
> A new routine hugetlb_follow_page_mask is called for hugetlb vmas at the
> beginning of follow_page_mask. hugetlb_follow_page_mask will use the
> existing routine huge_pte_offset to walk page tables looking for hugetlb
> entries. huge_pte_offset can be overwritten by architectures, and
> already handles special cases such as hugepd entries.
>
> [1] https://lore.kernel.org/linux-mm/cover.1661240170.git.baolin.w...@linux.alibaba.com/
>
> Suggested-by: David Hildenbrand
> Signed-off-by: Mike Kravetz

LGTM, and works well on my machine. So feel free to add:

Reviewed-by: Baolin Wang
Tested-by: Baolin Wang
Re: [PATCH v3 00/16] objtool: Enable and implement --mcount option on powerpc
Hi Josh, On 14/09/22 05:45, Josh Poimboeuf wrote: On Tue, Sep 13, 2022 at 04:13:52PM +0200, Peter Zijlstra wrote: On Mon, Sep 12, 2022 at 01:50:04PM +0530, Sathvika Vasireddy wrote: Christophe Leroy (4): objtool: Fix SEGFAULT objtool: Use target file endianness instead of a compiled constant objtool: Use target file class size instead of a compiled constant Sathvika Vasireddy (12): objtool: Add --mnop as an option to --mcount objtool: Read special sections with alts only when specific options are selected objtool: Use macros to define arch specific reloc types objtool: Add arch specific function arch_ftrace_match() objtool/powerpc: Enable objtool to be built on ppc objtool/powerpc: Add --mcount specific implementation tools/objtool/arch/powerpc/Build | 2 + tools/objtool/arch/powerpc/decode.c | 101 ++ .../arch/powerpc/include/arch/cfi_regs.h | 11 ++ tools/objtool/arch/powerpc/include/arch/elf.h | 10 ++ .../arch/powerpc/include/arch/special.h | 21 tools/objtool/arch/powerpc/special.c | 19 tools/objtool/arch/x86/decode.c | 5 + tools/objtool/arch/x86/include/arch/elf.h | 2 + .../arch/x86/include/arch/endianness.h| 9 -- tools/objtool/builtin-check.c | 14 +++ tools/objtool/check.c | 53 - tools/objtool/elf.c | 8 +- tools/objtool/include/objtool/arch.h | 2 + tools/objtool/include/objtool/builtin.h | 1 + tools/objtool/include/objtool/elf.h | 8 ++ tools/objtool/include/objtool/endianness.h| 32 +++--- tools/objtool/orc_dump.c | 11 +- tools/objtool/orc_gen.c | 4 +- tools/objtool/special.c | 3 +- This seems to painlessly merge with the objtool changes I have in queue.git/call-depth-tracking. After that all I need is the below little patch to make it to build ppc44x_defconfig + CONFIG_DYNAMIC_FTRACE=y. So I think these patches can go through the powerpc tree if Michael wants. Josh you okay with that, or should we do something complicated? I'm all for avoiding complicated, but let me try to give it a proper review first. Did you get a chance to review this patch set? - Sathvika
Re: [PATCH v4 1/2] mm/tlbbatch: Introduce arch_tlbbatch_should_defer()
On Wed, Sep 21, 2022 at 8:45 PM Yicong Yang wrote: > > From: Anshuman Khandual > > The entire scheme of deferred TLB flush in reclaim path rests on the > fact that the cost to refill TLB entries is less than flushing out > individual entries by sending IPI to remote CPUs. But architecture > can have different ways to evaluate that. Hence apart from checking > TTU_BATCH_FLUSH in the TTU flags, rest of the decision should be > architecture specific. > > Signed-off-by: Anshuman Khandual > [https://lore.kernel.org/linuxppc-dev/20171101101735.2318-2-khand...@linux.vnet.ibm.com/] > Signed-off-by: Yicong Yang > [Rebase and fix incorrect return value type] > Reviewed-by: Kefeng Wang > Reviewed-by: Anshuman Khandual > --- Reviewed-by: Barry Song > arch/x86/include/asm/tlbflush.h | 12 > mm/rmap.c | 9 + > 2 files changed, 13 insertions(+), 8 deletions(-) > > diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h > index cda3118f3b27..8a497d902c16 100644 > --- a/arch/x86/include/asm/tlbflush.h > +++ b/arch/x86/include/asm/tlbflush.h > @@ -240,6 +240,18 @@ static inline void flush_tlb_page(struct vm_area_struct > *vma, unsigned long a) > flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, PAGE_SHIFT, false); > } > > +static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm) > +{ > + bool should_defer = false; > + > + /* If remote CPUs need to be flushed then defer batch the flush */ > + if (cpumask_any_but(mm_cpumask(mm), get_cpu()) < nr_cpu_ids) > + should_defer = true; > + put_cpu(); > + > + return should_defer; > +} > + > static inline u64 inc_mm_tlb_gen(struct mm_struct *mm) > { > /* > diff --git a/mm/rmap.c b/mm/rmap.c > index 93d5a6f793d2..cd8cf5cb0b01 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -690,17 +690,10 @@ static void set_tlb_ubc_flush_pending(struct mm_struct > *mm, bool writable) > */ > static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags) > { > - bool should_defer = false; > - > if (!(flags & 
TTU_BATCH_FLUSH)) > return false; > > - /* If remote CPUs need to be flushed then defer batch the flush */ > - if (cpumask_any_but(mm_cpumask(mm), get_cpu()) < nr_cpu_ids) > - should_defer = true; > - put_cpu(); > - > - return should_defer; > + return arch_tlbbatch_should_defer(mm); > } > > /* > -- > 2.24.0 >
[PATCH v4 0/2] mm: arm64: bring up BATCHED_UNMAP_TLB_FLUSH
From: Yicong Yang Though ARM64 has the hardware to do tlb shootdown, the hardware broadcasting is not free. A simplest micro benchmark shows even on snapdragon 888 with only 8 cores, the overhead for ptep_clear_flush is huge even for paging out one page mapped by only one process: 5.36% a.out[kernel.kallsyms] [k] ptep_clear_flush While pages are mapped by multiple processes or HW has more CPUs, the cost should become even higher due to the bad scalability of tlb shootdown. The same benchmark can result in 16.99% CPU consumption on ARM64 server with around 100 cores according to Yicong's test on patch 4/4. This patchset leverages the existing BATCHED_UNMAP_TLB_FLUSH by 1. only send tlbi instructions in the first stage - arch_tlbbatch_add_mm() 2. wait for the completion of tlbi by dsb while doing tlbbatch sync in arch_tlbbatch_flush() Testing on snapdragon shows the overhead of ptep_clear_flush is removed by the patchset. The micro benchmark becomes 5% faster even for one page mapped by single process on snapdragon 888. -v4: 1. Add tags from Kefeng and Anshuman, Thanks. 2. Limit the TLB batch/defer on systems with >4 CPUs, per Anshuman 3. Merge previous Patch 1,2-3 into one, per Anshuman Link: https://lore.kernel.org/linux-mm/20220822082120.8347-1-yangyic...@huawei.com/ -v3: 1. Declare arch's tlbbatch defer support by arch_tlbbatch_should_defer() instead of ARCH_HAS_MM_CPUMASK, per Barry and Kefeng 2. Add Tested-by from Xin Hao Link: https://lore.kernel.org/linux-mm/20220711034615.482895-1-21cn...@gmail.com/ -v2: 1. Collected Yicong's test result on kunpeng920 ARM64 server; 2. Removed the redundant vma parameter in arch_tlbbatch_add_mm() according to the comments of Peter Zijlstra and Dave Hansen 3. Added ARCH_HAS_MM_CPUMASK rather than checking if mm_cpumask is empty according to the comments of Nadav Amit Thanks, Peter, Dave and Nadav for your testing or reviewing , and comments. 
-v1: https://lore.kernel.org/lkml/20220707125242.425242-1-21cn...@gmail.com/

Anshuman Khandual (1):
  mm/tlbbatch: Introduce arch_tlbbatch_should_defer()

Barry Song (1):
  arm64: support batched/deferred tlb shootdown during page reclamation

 .../features/vm/TLB/arch-support.txt |  2 +-
 arch/arm64/Kconfig                   |  1 +
 arch/arm64/include/asm/tlbbatch.h    | 12 ++
 arch/arm64/include/asm/tlbflush.h    | 37 ++-
 arch/x86/include/asm/tlbflush.h      | 15 +++-
 mm/rmap.c                            | 19 --
 6 files changed, 70 insertions(+), 16 deletions(-)
 create mode 100644 arch/arm64/include/asm/tlbbatch.h

-- 
2.24.0
[PATCH v4 1/2] mm/tlbbatch: Introduce arch_tlbbatch_should_defer()
From: Anshuman Khandual

The entire scheme of deferred TLB flush in reclaim path rests on the
fact that the cost to refill TLB entries is less than flushing out
individual entries by sending IPI to remote CPUs. But architectures can
have different ways to evaluate that. Hence apart from checking
TTU_BATCH_FLUSH in the TTU flags, the rest of the decision should be
architecture specific.

Signed-off-by: Anshuman Khandual
[https://lore.kernel.org/linuxppc-dev/20171101101735.2318-2-khand...@linux.vnet.ibm.com/]
Signed-off-by: Yicong Yang
[Rebase and fix incorrect return value type]
Reviewed-by: Kefeng Wang
Reviewed-by: Anshuman Khandual
---
 arch/x86/include/asm/tlbflush.h | 12 ++++++++++++
 mm/rmap.c                       |  9 +--------
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index cda3118f3b27..8a497d902c16 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -240,6 +240,18 @@ static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a)
 	flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, PAGE_SHIFT, false);
 }
 
+static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
+{
+	bool should_defer = false;
+
+	/* If remote CPUs need to be flushed then defer batch the flush */
+	if (cpumask_any_but(mm_cpumask(mm), get_cpu()) < nr_cpu_ids)
+		should_defer = true;
+	put_cpu();
+
+	return should_defer;
+}
+
 static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
 {
 	/*
diff --git a/mm/rmap.c b/mm/rmap.c
index 93d5a6f793d2..cd8cf5cb0b01 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -690,17 +690,10 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, bool writable)
  */
 static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags)
 {
-	bool should_defer = false;
-
 	if (!(flags & TTU_BATCH_FLUSH))
 		return false;
 
-	/* If remote CPUs need to be flushed then defer batch the flush */
-	if (cpumask_any_but(mm_cpumask(mm), get_cpu()) < nr_cpu_ids)
-		should_defer = true;
-	put_cpu();
-
-	return should_defer;
+	return arch_tlbbatch_should_defer(mm);
}
 
 /*
-- 
2.24.0
[PATCH v4 2/2] arm64: support batched/deferred tlb shootdown during page reclamation
From: Barry Song

on x86, batched and deferred tlb shootdown has led to 90% performance
increase on tlb shootdown. on arm64, HW can do tlb shootdown without
software IPI. But sync tlbi is still quite expensive.

Even running a simple program which requires swapout can prove this is
true,

 #include <sys/mman.h>
 #include <string.h>
 #include <stdlib.h>
 #include <unistd.h>

 int main()
 {
 #define SIZE (1 * 1024 * 1024)
         volatile unsigned char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);

         memset(p, 0x88, SIZE);

         for (int k = 0; k < 1; k++) {
                 /* swap in */
                 for (int i = 0; i < SIZE; i += 4096) {
                         (void)p[i];
                 }

                 /* swap out */
                 madvise(p, SIZE, MADV_PAGEOUT);
         }
 }

Perf result on snapdragon 888 with 8 cores by using zRAM as the swap
block device.

 ~ # perf record taskset -c 4 ./a.out
 [ perf record: Woken up 10 times to write data ]
 [ perf record: Captured and wrote 2.297 MB perf.data (60084 samples) ]
 ~ # perf report
 # To display the perf.data header info, please use --header/--header-only options.
 #
 # Total Lost Samples: 0
 #
 # Samples: 60K of event 'cycles'
 # Event count (approx.): 35706225414
 #
 # Overhead  Command  Shared Object      Symbol
 # ........  .......  .................  ......
     21.07%  a.out    [kernel.kallsyms]  [k] _raw_spin_unlock_irq
      8.23%  a.out    [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
      6.67%  a.out    [kernel.kallsyms]  [k] filemap_map_pages
      6.16%  a.out    [kernel.kallsyms]  [k] __zram_bvec_write
      5.36%  a.out    [kernel.kallsyms]  [k] ptep_clear_flush
      3.71%  a.out    [kernel.kallsyms]  [k] _raw_spin_lock
      3.49%  a.out    [kernel.kallsyms]  [k] memset64
      1.63%  a.out    [kernel.kallsyms]  [k] clear_page
      1.42%  a.out    [kernel.kallsyms]  [k] _raw_spin_unlock
      1.26%  a.out    [kernel.kallsyms]  [k] mod_zone_state.llvm.8525150236079521930
      1.23%  a.out    [kernel.kallsyms]  [k] xas_load
      1.15%  a.out    [kernel.kallsyms]  [k] zram_slot_lock

ptep_clear_flush() takes 5.36% CPU in the micro-benchmark swapping
in/out a page mapped by only one process.
If the page is mapped by multiple processes, typically more than 100 on
a phone, the overhead would be much higher as we have to run tlb flush
100 times for one single page. Plus, tlb flush overhead will increase
with the number of CPU cores due to the bad scalability of tlb shootdown
in HW, so those ARM64 servers should expect much higher overhead.

Further perf annotate shows 95% cpu time of ptep_clear_flush is actually
used by the final dsb() to wait for the completion of tlb flush. This
provides us a very good chance to leverage the existing batched tlb in
the kernel. The minimum modification is that we only send async tlbi in
the first stage and we send dsb while we have to sync in the second
stage.

With the above simple micro benchmark, collapsed time to finish the
program decreases around 5%.

Typical collapsed time w/o patch:
 ~ # time taskset -c 4 ./a.out
 0.21user 14.34system 0:14.69elapsed

w/ patch:
 ~ # time taskset -c 4 ./a.out
 0.22user 13.45system 0:13.80elapsed

Also, Yicong Yang added the following observation.

Tested with the benchmark in the commit on a Kunpeng920 arm64 server,
and observed an improvement of around 12.5% with command
`time ./swap_bench`.

        w/o             w/
 real   0m13.460s       0m11.771s
 user   0m0.248s        0m0.279s
 sys    0m12.039s       0m11.458s

Originally a 16.99% overhead of ptep_clear_flush() was noticed, which
has been eliminated by this patch:

[root@localhost yang]# perf record -- ./swap_bench && perf report
[...]
 16.99%  swap_bench  [kernel.kallsyms]  [k] ptep_clear_flush

Cc: Jonathan Corbet
Cc: Nadav Amit
Cc: Mel Gorman
Tested-by: Yicong Yang
Tested-by: Xin Hao
Signed-off-by: Barry Song
Signed-off-by: Yicong Yang
Reviewed-by: Kefeng Wang
---
 .../features/vm/TLB/arch-support.txt |  2 +-
 arch/arm64/Kconfig                   |  1 +
 arch/arm64/include/asm/tlbbatch.h    | 12 ++
 arch/arm64/include/asm/tlbflush.h    | 37 ++-
 arch/x86/include/asm/tlbflush.h      |  3 +-
 mm/rmap.c                            | 10 +++--
 6 files changed, 57 insertions(+), 8 deletions(-)
 create mode 100644 arch/arm64/include/asm/tlbbatch.h

diff --git a/Documentation/features/vm/TLB/arch-support.txt b/Documentation/features/vm/TLB/arch-support.txt
index 039e4e91ada3..2caf815d7c6c 100644
--- a/Documentation/features/vm/TLB/arch-support.txt
+++ b/Documentation/features/vm/TLB/arch-support.txt
@@ -9,7 +9,7 @@
 |
Re: [PATCH v3 4/4] arm64: support batched/deferred tlb shootdown during page reclamation
On Sep 20, 2022, at 11:53 PM, Anshuman Khandual wrote:

> ⚠ External Email
>
> On 8/22/22 13:51, Yicong Yang wrote:
>> +static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
>> +					struct mm_struct *mm,
>> +					unsigned long uaddr)
>> +{
>> +	__flush_tlb_page_nosync(mm, uaddr);
>> +}
>> +
>> +static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
>> +{
>> +	dsb(ish);
>> +}
>
> Just wondering if arch_tlbbatch_add_mm() could also detect continuous mapping
> TLB invalidation requests on a given mm and try to generate a range based TLB
> invalidation such as flush_tlb_range().
>
> struct arch_tlbflush_unmap_batch via task->tlb_ubc->arch can track continuous
> ranges while being queued up via arch_tlbbatch_add_mm(), any range formed can
> later be flushed in subsequent arch_tlbbatch_flush() ?
>
> OR
>
> It might not be worth the effort and complexity, in comparison to performance
> improvement, TLB range flush brings in ?

So here are my 2 cents, based on my experience with Intel-x86. It is
likely different on arm64, but perhaps it can provide you some insight
into what parameters you should measure and consider.

In general there is a tradeoff between full TLB flushes and
entry-specific ones. Flushing specific entries takes more time than
flushing the entire TLB, but saves TLB refills.

Dave Hansen made some calculations in the past and came up with 33 as a
magic cutoff number, i.e., if you need to flush more than 33 entries,
just flush the entire TLB. I am not sure that this exact number is very
meaningful, since one might argue that it should've taken PTI into
account (which might require twice as many TLB invalidations).

Anyhow, back to arch_tlbbatch_add_mm(). It may be possible to track
ranges, but the question is whether you would actually succeed in
forming continuous ranges that are eventually (on x86) smaller than the
full TLB flush cutoff (=33). Questionable (perhaps better with MGLRU?).
Then, you should remember that the tracking should be very efficient,
since even a few cache misses might have a greater cost than what you
save by selective flushing.

Finally, on x86 you would need to invoke the smp/IPI layer multiple
times to send different cores the relevant range they need to flush.

IOW: It is somewhat complicated to implement efficiently, and on x86,
and probably other IPI-based TLB shootdown systems, it does not have a
clear performance benefit (IMHO).
Re: [PATCH v3 4/4] arm64: support batched/deferred tlb shootdown during page reclamation
On Wed, Sep 21, 2022 at 6:53 PM Anshuman Khandual wrote:
>
> On 8/22/22 13:51, Yicong Yang wrote:
> > +static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
> > +					struct mm_struct *mm,
> > +					unsigned long uaddr)
> > +{
> > +	__flush_tlb_page_nosync(mm, uaddr);
> > +}
> > +
> > +static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
> > +{
> > +	dsb(ish);
> > +}
>
> Just wondering if arch_tlbbatch_add_mm() could also detect continuous mapping
> TLB invalidation requests on a given mm and try to generate a range based TLB
> invalidation such as flush_tlb_range().
>
> struct arch_tlbflush_unmap_batch via task->tlb_ubc->arch can track continuous
> ranges while being queued up via arch_tlbbatch_add_mm(), any range formed can
> later be flushed in subsequent arch_tlbbatch_flush() ?
>
> OR
>
> It might not be worth the effort and complexity, in comparison to performance
> improvement, TLB range flush brings in ?

Probably it is not worth the complexity, as perf annotate shows:
"Further perf annotate shows 95% cpu time of ptep_clear_flush is
actually used by the final dsb() to wait for the completion of tlb
flush." So any further optimization before dsb(ish) might bring some
improvement, but it seems minor.

Thanks
Barry
Re: [PATCH v2 4/4] powerpc/64s: Enable KFENCE on book3s64
On 21/09/2022 04:02, Nicholas Miehlbradt wrote:
> KFENCE support was added for ppc32 in commit 90cbac0e995d
> ("powerpc: Enable KFENCE for PPC32").
> Enable KFENCE on ppc64 architecture with hash and radix MMUs.
> It uses the same mechanism as debug pagealloc to
> protect/unprotect pages. All KFENCE kunit tests pass on both
> MMUs.
>
> KFENCE memory is initially allocated using memblock but is
> later marked as SLAB allocated. This necessitates the change
> to __pud_free to ensure that the KFENCE pages are freed
> appropriately.
>
> Based on previous work by Christophe Leroy and Jordan Niethe.
>
> Signed-off-by: Nicholas Miehlbradt
> ---
> v2: Refactor
> ---
>  arch/powerpc/Kconfig                         |  2 +-
>  arch/powerpc/include/asm/book3s/64/pgalloc.h |  6 --
>  arch/powerpc/include/asm/book3s/64/pgtable.h |  2 +-
>  arch/powerpc/include/asm/kfence.h            | 15 +++
>  arch/powerpc/mm/book3s64/hash_utils.c        | 10 +-
>  arch/powerpc/mm/book3s64/radix_pgtable.c     |  8 +---
>  6 files changed, 31 insertions(+), 12 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kfence.h b/arch/powerpc/include/asm/kfence.h
> index a9846b68c6b9..cff60983e88d 100644
> --- a/arch/powerpc/include/asm/kfence.h
> +++ b/arch/powerpc/include/asm/kfence.h
> @@ -11,11 +11,25 @@
>  #include
>  #include
>
> +#if defined(CONFIG_PPC64) && !defined(CONFIG_PPC64_ELF_ABI_V2)

CONFIG_PPC64 && !CONFIG_PPC64_ELF_ABI_V2 is the same as
CONFIG_PPC64_ELF_ABI_V1

> +#define ARCH_FUNC_PREFIX "."
> +#endif
> +

Christophe
[PATCH v6 25/25] powerpc/64e: Clear gprs on interrupt routine entry on Book3E
Zero GPRS r14-r31 on entry into the kernel for interrupt sources to limit influence of user-space values in potential speculation gadgets. Prior to this commit, all other GPRS are reassigned during the common prologue to interrupt handlers and so need not be zeroised explicitly. This may be done safely, without loss of register state prior to the interrupt, as the common prologue saves the initial values of non-volatiles, which are unconditionally restored in interrupt_64.S. Mitigation defaults to enabled by INTERRUPT_SANITIZE_REGISTERS. Signed-off-by: Rohan McLure --- V4: New patch. V5: Depend on Kconfig option. Remove ZEROIZE_NVGPRS on bad kernel stack handler. V6: Change name of zeroize macro to be consistent with Book3S. --- arch/powerpc/kernel/exceptions-64e.S | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index f1d714acc945..d55c0a368dcb 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -366,6 +366,11 @@ ret_from_mc_except: std r14,PACA_EXMC+EX_R14(r13); \ std r15,PACA_EXMC+EX_R15(r13) +#ifdef CONFIG_INTERRUPT_SANITIZE_REGISTERS +#define SANITIZE_ZEROIZE_NVGPRS() ZEROIZE_NVGPRS() +#else +#define SANITIZE_ZEROIZE_NVGPRS() +#endif /* Core exception code for all exceptions except TLB misses. */ #define EXCEPTION_COMMON_LVL(n, scratch, excf) \ @@ -402,7 +407,8 @@ exc_##n##_common: \ std r12,STACK_FRAME_OVERHEAD-16(r1); /* mark the frame */ \ std r3,_TRAP(r1); /* set trap number */ \ std r0,RESULT(r1); /* clear regs->result */\ - SAVE_NVGPRS(r1); + SAVE_NVGPRS(r1);\ + SANITIZE_ZEROIZE_NVGPRS(); /* minimise speculation influence */ #define EXCEPTION_COMMON(n) \ EXCEPTION_COMMON_LVL(n, SPRN_SPRG_GEN_SCRATCH, PACA_EXGEN) -- 2.34.1
[PATCH v6 24/25] powerpc/64s: Clear gprs on interrupt routine entry in Book3S
Zero GPRs r0, r2-r11, r14-r31, on entry into the kernel for all other
interrupt sources to limit influence of user-space values in potential
speculation gadgets. The remaining GPRs are overwritten by entry macros
to interrupt handlers, irrespective of whether or not a given handler
consumes these register values.

Prior to this commit, r14-r31 are restored on a per-interrupt basis at
exit, but now they are always restored. Remove explicit REST_NVGPRS
invocations, as non-volatiles must now always be restored. 32-bit
systems do not clear user registers on interrupt, and continue to
depend on the return value of interrupt_exit_user_prepare to determine
whether or not to restore non-volatiles.

The mmap_bench benchmark in selftests should rapidly invoke pagefaults.
A ~0.8% performance regression is seen with this mitigation, but this
indicates the worst-case performance due to heavier-weight interrupt
handlers. This mitigation is disabled by default, but enabled with
CONFIG_INTERRUPT_SANITIZE_REGISTERS.

Signed-off-by: Rohan McLure
---
V2: Add benchmark data
V3: Use ZEROIZE_GPR{,S} macro renames, clarify interrupt_exit_user_prepare
    changes in summary.
V5: Configurable now with INTERRUPT_SANITIZE_REGISTERS. Zero r12
    (containing MSR) from common macro on per-interrupt basis with IOPTION.
V6: Replace ifdefs with invocations of conditionally defined macros.
--- arch/powerpc/kernel/exceptions-64s.S | 47 +- arch/powerpc/kernel/interrupt_64.S | 15 2 files changed, 53 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index a3b51441b039..b3f5ef1c712f 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -21,6 +21,19 @@ #include #include +/* + * macros for handling user register sanitisation + */ +#ifdef CONFIG_INTERRUPT_SANITIZE_REGISTERS +#define SANITIZE_ZEROIZE_NVGPRS() ZEROIZE_NVGPRS() +#define SANITIZE_RESTORE_NVGPRS() REST_NVGPRS(r1) +#define HANDLER_RESTORE_NVGPRS() +#else +#define SANITIZE_ZEROIZE_NVGPRS() +#define SANITIZE_RESTORE_NVGPRS() +#define HANDLER_RESTORE_NVGPRS() REST_NVGPRS(r1) +#endif /* CONFIG_INTERRUPT_SANITIZE_REGISTERS */ + /* * Following are fixed section helper macros. * @@ -111,6 +124,7 @@ name: #define ISTACK .L_ISTACK_\name\() /* Set regular kernel stack */ #define __ISTACK(name) .L_ISTACK_ ## name #define IKUAP .L_IKUAP_\name\() /* Do KUAP lock */ +#define IMSR_R12 .L_IMSR_R12_\name\()/* Assumes MSR saved to r12 */ #define INT_DEFINE_BEGIN(n)\ .macro int_define_ ## n name @@ -176,6 +190,9 @@ do_define_int n .ifndef IKUAP IKUAP=1 .endif + .ifndef IMSR_R12 + IMSR_R12=0 + .endif .endm /* @@ -502,6 +519,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real, text) std r10,0(r1) /* make stack chain pointer */ std r0,GPR0(r1) /* save r0 in stackframe*/ std r10,GPR1(r1)/* save r1 in stackframe*/ + ZEROIZE_GPR(0) /* Mark our [H]SRRs valid for return */ li r10,1 @@ -544,8 +562,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) std r9,GPR11(r1) std r10,GPR12(r1) std r11,GPR13(r1) + .if !IMSR_R12 + ZEROIZE_GPRS(9, 12) + .else + ZEROIZE_GPRS(9, 11) + .endif SAVE_NVGPRS(r1) + SANITIZE_ZEROIZE_NVGPRS() .if IDAR .if IISIDE @@ -577,8 +601,8 @@ BEGIN_FTR_SECTION END_FTR_SECTION_IFSET(CPU_FTR_CFAR) ld r10,IAREA+EX_CTR(r13) std r10,_CTR(r1) - std r2,GPR2(r1) /* save r2 in stackframe*/ - SAVE_GPRS(3, 8, r1) /* save r3 
- r8 in stackframe */ + SAVE_GPRS(2, 8, r1) /* save r2 - r8 in stackframe */ + ZEROIZE_GPRS(2, 8) mflrr9 /* Get LR, later save to stack */ ld r2,PACATOC(r13) /* get kernel TOC into r2 */ std r9,_LINK(r1) @@ -696,6 +720,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR) mtlrr9 ld r9,_CCR(r1) mtcrr9 + SANITIZE_RESTORE_NVGPRS() REST_GPRS(2, 13, r1) REST_GPR(0, r1) /* restore original r1. */ @@ -1372,7 +1397,7 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) * do_break() may have changed the NV GPRS while handling a breakpoint. * If so, we need to restore them with their updated values. */ - REST_NVGPRS(r1) + HANDLER_RESTORE_NVGPRS() b interrupt_return_srr @@ -1598,7 +1623,7 @@ EXC_COMMON_BEGIN(alignment_common) GEN_COMMON alignment addir3,r1,STACK_FRAME_OVERHEAD bl alignment_exception - REST_NVGPRS(r1) /* instruction emulation may change GPRs */ +
[PATCH v6 23/25] powerpc/64: Add INTERRUPT_SANITIZE_REGISTERS Kconfig
Add Kconfig option for enabling clearing of registers on arrival in an interrupt handler. This reduces the speculation influence of registers on kernel internals. The option will be consumed by 64-bit systems that feature speculation and wish to implement this mitigation. This patch only introduces the Kconfig option, no actual mitigations. The primary overhead of this mitigation lies in an increased number of registers that must be saved and restored by interrupt handlers on Book3S systems. Enable by default on Book3E systems, which prior to this patch eagerly save and restore register state, meaning that the mitigation when implemented will have minimal overhead. Signed-off-by: Rohan McLure Acked-by: Nicholas Piggin --- V5: New patch --- arch/powerpc/Kconfig | 9 + 1 file changed, 9 insertions(+) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index ef6c83e79c9b..a643ebd83349 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -528,6 +528,15 @@ config HOTPLUG_CPU Say N if you are unsure. +config INTERRUPT_SANITIZE_REGISTERS + bool "Clear gprs on interrupt arrival" + depends on PPC64 && ARCH_HAS_SYSCALL_WRAPPER + default PPC_BOOK3E_64 + help + Reduce the influence of user register state on interrupt handlers and + syscalls through clearing user state from registers before handling + the exception. + config PPC_QUEUED_SPINLOCKS bool "Queued spinlocks" if EXPERT depends on SMP -- 2.34.1
[PATCH v6 22/25] powerpc/64s: Clear user GPRs in syscall interrupt entry
Clear user state in gprs (assign to zero) to reduce the influence of user registers on speculation within kernel syscall handlers. Clears occur at the very beginning of the sc and scv 0 interrupt handlers, with restores occurring following the execution of the syscall handler. Signed-off-by: Rohan McLure --- V2: Update summary V3: Remove erroneous summary paragraph on syscall_exit_prepare V4: Use ZEROIZE instead of NULLIFY. Clear r0 also. V5: Move to end of patch series. V6: Include clears which were previously in the syscall wrapper patch. Move comment on r3-r8 register save to when we alter the calling convention for system_call_exception. --- arch/powerpc/kernel/interrupt_64.S | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S index a5dd78bdbe6d..40147558e1a6 100644 --- a/arch/powerpc/kernel/interrupt_64.S +++ b/arch/powerpc/kernel/interrupt_64.S @@ -106,6 +106,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) * but this is the best we can do. */ + /* +* Zero user registers to prevent influencing speculative execution +* state of kernel code. +*/ + ZEROIZE_GPR(0) + ZEROIZE_GPRS(5, 12) + ZEROIZE_NVGPRS() bl system_call_exception .Lsyscall_vectored_\name\()_exit: @@ -134,6 +141,7 @@ BEGIN_FTR_SECTION HMT_MEDIUM_LOW END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) + REST_NVGPRS(r1) cmpdi r3,0 bne .Lsyscall_vectored_\name\()_restore_regs @@ -285,6 +293,13 @@ END_BTB_FLUSH_SECTION wrteei 1 #endif + /* +* Zero user registers to prevent influencing speculative execution +* state of kernel code. +*/ + ZEROIZE_GPR(0) + ZEROIZE_GPRS(5, 12) + ZEROIZE_NVGPRS() bl system_call_exception .Lsyscall_exit: @@ -325,6 +340,7 @@ BEGIN_FTR_SECTION stdcx. 
r0,0,r1 /* to clear the reservation */ END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS) + REST_NVGPRS(r1) cmpdi r3,0 bne .Lsyscall_restore_regs /* Zero volatile regs that may contain sensitive kernel data */ @@ -352,7 +368,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) .Lsyscall_restore_regs: ld r3,_CTR(r1) ld r4,_XER(r1) - REST_NVGPRS(r1) mtctr r3 mtspr SPRN_XER,r4 REST_GPR(0, r1) -- 2.34.1
[PATCH v6 21/25] powerpc: Provide syscall wrapper
Implement syscall wrapper as per s390, x86, arm64. When enabled cause handlers to accept parameters from a stack frame rather than from user scratch register state. This allows for user registers to be safely cleared in order to reduce caller influence on speculation within syscall routine. The wrapper is a macro that emits syscall handler symbols that call into the target handler, obtaining its parameters from a struct pt_regs on the stack. As registers are already saved to the stack prior to calling system_call_exception, it appears that this function is executed more efficiently with the new stack-pointer convention than with parameters passed by registers, avoiding the allocation of a stack frame for this method. On a 32-bit system, we see >20% performance increases on the null_syscall microbenchmark, and on a Power 8 the performance gains amortise the cost of clearing and restoring registers which is implemented at the end of this series, seeing final result of ~5.6% performance improvement on null_syscall. Syscalls are wrapped in this fashion on all platforms except for the Cell processor as this commit does not provide SPU support. This can be quickly fixed in a successive patch, but requires spu_sys_callback to allocate a pt_regs structure to satisfy the wrapped calling convention. Co-developed-by: Andrew Donnellan Signed-off-by: Andrew Donnellan Signed-off-by: Rohan McLure --- V2: Generate prototypes for symbols produced by the wrapper. V3: Rebased to remove conflict with 1547db7d1f44 ("powerpc: Move system_call_exception() to syscall.c"). Also remove copy from gpr3 save slot on stackframe to orig_r3's slot. Fix whitespace with preprocessor defines in system_call_exception. V5: Move systbl.c syscall wrapper support to this patch. Swap calling convention for system_call_exception to be (, r0) V6: Change calling convention for system_call_exception in another patch. 
--- arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/syscall.h | 4 + arch/powerpc/include/asm/syscall_wrapper.h | 84 arch/powerpc/include/asm/syscalls.h| 30 ++- arch/powerpc/kernel/syscall.c | 7 +- arch/powerpc/kernel/systbl.c | 7 ++ arch/powerpc/kernel/vdso.c | 2 + 7 files changed, 132 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 4c466acdc70d..ef6c83e79c9b 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -137,6 +137,7 @@ config PPC select ARCH_HAS_STRICT_KERNEL_RWX if (PPC_BOOK3S || PPC_8xx || 40x) && !HIBERNATION select ARCH_HAS_STRICT_KERNEL_RWX if FSL_BOOKE && !HIBERNATION && !RANDOMIZE_BASE select ARCH_HAS_STRICT_MODULE_RWX if ARCH_HAS_STRICT_KERNEL_RWX + select ARCH_HAS_SYSCALL_WRAPPER if !SPU_BASE select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST select ARCH_HAS_UACCESS_FLUSHCACHE select ARCH_HAS_UBSAN_SANITIZE_ALL diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h index d2a8dfd5de33..3dd36c5e334a 100644 --- a/arch/powerpc/include/asm/syscall.h +++ b/arch/powerpc/include/asm/syscall.h @@ -14,8 +14,12 @@ #include #include +#ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER +typedef long (*syscall_fn)(const struct pt_regs *); +#else typedef long (*syscall_fn)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long); +#endif /* ftrace syscalls requires exporting the sys_call_table */ extern const syscall_fn sys_call_table[]; diff --git a/arch/powerpc/include/asm/syscall_wrapper.h b/arch/powerpc/include/asm/syscall_wrapper.h new file mode 100644 index ..91bcfa40f740 --- /dev/null +++ b/arch/powerpc/include/asm/syscall_wrapper.h @@ -0,0 +1,84 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * syscall_wrapper.h - powerpc specific wrappers to syscall definitions + * + * Based on arch/{x86,arm64}/include/asm/syscall_wrapper.h + */ + +#ifndef __ASM_SYSCALL_WRAPPER_H +#define __ASM_SYSCALL_WRAPPER_H + +struct pt_regs; + +#define 
SC_POWERPC_REGS_TO_ARGS(x, ...)\ + __MAP(x,__SC_ARGS \ + ,,regs->gpr[3],,regs->gpr[4],,regs->gpr[5]\ + ,,regs->gpr[6],,regs->gpr[7],,regs->gpr[8]) + +#ifdef CONFIG_COMPAT + +#define COMPAT_SYSCALL_DEFINEx(x, name, ...) \ + long __powerpc_compat_sys##name(const struct pt_regs *regs); \ + ALLOW_ERROR_INJECTION(__powerpc_compat_sys##name, ERRNO); \ + static long __se_compat_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ + static inline long __do_compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \ + long __powerpc_compat_sys##name(const struct pt_regs *regs) \ +
[PATCH v6 20/25] powerpc: Change system_call_exception calling convention
Change system_call_exception arguments to pass a pointer to a stack
frame containing caller state, as well as the original r0, which
determines the number of the syscall. This has been observed to yield
improved performance compared to passing them by registers,
circumventing the need to allocate a stack frame.

Signed-off-by: Rohan McLure
---
V6: Split off from syscall wrapper patch.
---
 arch/powerpc/include/asm/interrupt.h |  3 +--
 arch/powerpc/kernel/entry_32.S       |  6 +++---
 arch/powerpc/kernel/interrupt_64.S   | 20 ++--
 arch/powerpc/kernel/syscall.c        | 10 +-
 4 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index 8069dbc4b8d1..48eec9cd1429 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -665,8 +665,7 @@ static inline void interrupt_cond_local_irq_enable(struct pt_regs *regs)
 	local_irq_enable();
 }
 
-long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8,
-			   unsigned long r0, struct pt_regs *regs);
+long system_call_exception(struct pt_regs *regs, unsigned long r0);
 notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs, long scv);
 notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs);
 notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs);
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index e4b694cebc44..96782aa72083 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -122,9 +122,9 @@ transfer_to_syscall:
 	SAVE_NVGPRS(r1)
 	kuep_lock
 
-	/* Calling convention has r9 = orig r0, r10 = regs */
-	addi	r10,r1,STACK_FRAME_OVERHEAD
-	mr	r9,r0
+	/* Calling convention has r3 = regs, r4 = orig r0 */
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	mr	r4,r0
 	bl	system_call_exception
 
 ret_from_syscall:
diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
index 7d92a7a54727..a5dd78bdbe6d 100644
---
a/arch/powerpc/kernel/interrupt_64.S +++ b/arch/powerpc/kernel/interrupt_64.S @@ -70,7 +70,7 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name) ld r2,PACATOC(r13) mfcrr12 li r11,0 - /* Can we avoid saving r3-r8 in common case? */ + /* Save syscall parameters in r3-r8 */ SAVE_GPRS(3, 8, r1) /* Zero r9-r12, this should only be required when restoring all GPRs */ std r11,GPR9(r1) @@ -87,9 +87,11 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name) std r11,_TRAP(r1) std r12,_CCR(r1) std r3,ORIG_GPR3(r1) - addir10,r1,STACK_FRAME_OVERHEAD + /* Calling convention has r3 = regs, r4 = orig r0 */ + addir3,r1,STACK_FRAME_OVERHEAD + mr r4,r0 ld r11,exception_marker@toc(r2) - std r11,-16(r10)/* "regshere" marker */ + std r11,-16(r3) /* "regshere" marker */ BEGIN_FTR_SECTION HMT_MEDIUM @@ -104,8 +106,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) * but this is the best we can do. */ - /* Calling convention has r9 = orig r0, r10 = regs */ - mr r9,r0 bl system_call_exception .Lsyscall_vectored_\name\()_exit: @@ -237,7 +237,7 @@ END_BTB_FLUSH_SECTION ld r2,PACATOC(r13) mfcrr12 li r11,0 - /* Can we avoid saving r3-r8 in common case? 
*/ + /* Save syscall parameters in r3-r8 */ SAVE_GPRS(3, 8, r1) /* Zero r9-r12, this should only be required when restoring all GPRs */ std r11,GPR9(r1) @@ -260,9 +260,11 @@ END_BTB_FLUSH_SECTION std r11,_TRAP(r1) std r12,_CCR(r1) std r3,ORIG_GPR3(r1) - addir10,r1,STACK_FRAME_OVERHEAD + /* Calling convention has r3 = regs, r4 = orig r0 */ + addir3,r1,STACK_FRAME_OVERHEAD + mr r4,r0 ld r11,exception_marker@toc(r2) - std r11,-16(r10)/* "regshere" marker */ + std r11,-16(r3) /* "regshere" marker */ #ifdef CONFIG_PPC_BOOK3S li r11,1 @@ -283,8 +285,6 @@ END_BTB_FLUSH_SECTION wrteei 1 #endif - /* Calling convention has r9 = orig r0, r10 = regs */ - mr r9,r0 bl system_call_exception .Lsyscall_exit: diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c index 15af0ed019a7..0e9ba3efee94 100644 --- a/arch/powerpc/kernel/syscall.c +++ b/arch/powerpc/kernel/syscall.c @@ -13,9 +13,7 @@ /* Has to run notrace because it is entered not completely "reconciled" */ -notrace long system_call_exception(long r3, long r4, long r5, - long r6, long r7, long r8, -
[PATCH v6 19/25] powerpc: Remove high-order word clearing on compat syscall entry
Remove explicit clearing of the high-order word of user parameters when
handling compatibility syscalls in system_call_exception. The
COMPAT_SYSCALL_DEFINEx macros handle this clearing through an explicit
cast to the signature type of the target handler.

Signed-off-by: Rohan McLure
Reported-by: Nicholas Piggin
---
V6: New patch
---
 arch/powerpc/kernel/syscall.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
index 9875486f6168..15af0ed019a7 100644
--- a/arch/powerpc/kernel/syscall.c
+++ b/arch/powerpc/kernel/syscall.c
@@ -157,14 +157,6 @@ notrace long system_call_exception(long r3, long r4, long r5,
 
 	if (unlikely(is_compat_task())) {
 		f = (void *)compat_sys_call_table[r0];
-
-		r3 &= 0xffffffffULL;
-		r4 &= 0xffffffffULL;
-		r5 &= 0xffffffffULL;
-		r6 &= 0xffffffffULL;
-		r7 &= 0xffffffffULL;
-		r8 &= 0xffffffffULL;
-
 	} else {
 		f = (void *)sys_call_table[r0];
 	}
-- 
2.34.1
[PATCH v6 18/25] powerpc: Use common syscall handler type
Cause syscall handlers to be typed as follows when called indirectly
throughout the kernel. This is to allow for better type checking.

typedef long (*syscall_fn)(unsigned long, unsigned long, unsigned long,
			   unsigned long, unsigned long, unsigned long);

Since both 32- and 64-bit ABIs allow for at least the first six
machine-word length parameters to a function to be passed by registers,
even handlers which admit fewer than six parameters may be viewed as
having the above type.

Coercing syscalls to syscall_fn requires a cast to void* to avoid
-Wcast-function-type.

Fix up comparisons in VDSO to avoid pointer-integer comparison.
Introduce explicit cast on systems with SPUs.

Signed-off-by: Rohan McLure
Reviewed-by: Nicholas Piggin
---
V2: New patch.
V3: Remove unnecessary cast from const syscall_fn to syscall_fn
V5: Update patch description.
V6: Remove syscall_fn typedef in syscall.c. Comment void* cast.
---
 arch/powerpc/include/asm/syscall.h          |  7 +--
 arch/powerpc/include/asm/syscalls.h         |  1 +
 arch/powerpc/kernel/syscall.c               |  2 --
 arch/powerpc/kernel/systbl.c                | 11 ---
 arch/powerpc/kernel/vdso.c                  |  4 ++--
 arch/powerpc/platforms/cell/spu_callbacks.c |  6 +++---
 6 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
index 25fc8ad9a27a..d2a8dfd5de33 100644
--- a/arch/powerpc/include/asm/syscall.h
+++ b/arch/powerpc/include/asm/syscall.h
@@ -14,9 +14,12 @@
 #include
 #include

+typedef long (*syscall_fn)(unsigned long, unsigned long, unsigned long,
+			   unsigned long, unsigned long, unsigned long);
+
 /* ftrace syscalls requires exporting the sys_call_table */
-extern const unsigned long sys_call_table[];
-extern const unsigned long compat_sys_call_table[];
+extern const syscall_fn sys_call_table[];
+extern const syscall_fn compat_sys_call_table[];

 static inline int syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
 {
diff --git a/arch/powerpc/include/asm/syscalls.h b/arch/powerpc/include/asm/syscalls.h
index 5d106acf7906..cc87168d6ecb 100644
--- a/arch/powerpc/include/asm/syscalls.h
+++ b/arch/powerpc/include/asm/syscalls.h
@@ -8,6 +8,7 @@
 #include
 #include
+#include
 #ifdef CONFIG_PPC64
 #include
 #endif
diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
index 64102a64fd84..9875486f6168 100644
--- a/arch/powerpc/kernel/syscall.c
+++ b/arch/powerpc/kernel/syscall.c
@@ -12,8 +12,6 @@
 #include

-typedef long (*syscall_fn)(long, long, long, long, long, long);
-
 /* Has to run notrace because it is entered not completely "reconciled" */
 notrace long system_call_exception(long r3, long r4, long r5,
diff --git a/arch/powerpc/kernel/systbl.c b/arch/powerpc/kernel/systbl.c
index ce52bd2ec292..d64dfafc2e5e 100644
--- a/arch/powerpc/kernel/systbl.c
+++ b/arch/powerpc/kernel/systbl.c
@@ -16,9 +16,14 @@
 #include

 #define __SYSCALL_WITH_COMPAT(nr, entry, compat) __SYSCALL(nr, entry)
-#define __SYSCALL(nr, entry) [nr] = (unsigned long) &entry,
-const unsigned long sys_call_table[] = {
+/*
+ * Coerce syscall handlers with arbitrary parameters to common type
+ * requires cast to void* to avoid -Wcast-function-type.
+ */
+#define __SYSCALL(nr, entry) [nr] = (void *) entry,
+
+const syscall_fn sys_call_table[] = {
 #ifdef CONFIG_PPC64
 #include
 #else
@@ -29,7 +34,7 @@ const unsigned long sys_call_table[] = {
 #ifdef CONFIG_COMPAT
 #undef __SYSCALL_WITH_COMPAT
 #define __SYSCALL_WITH_COMPAT(nr, native, compat) __SYSCALL(nr, compat)
-const unsigned long compat_sys_call_table[] = {
+const syscall_fn compat_sys_call_table[] = {
 #include
 };
 #endif /* CONFIG_COMPAT */
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index bf9574ec26ce..fcca06d200d3 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -304,10 +304,10 @@ static void __init vdso_setup_syscall_map(void)
 	unsigned int i;

 	for (i = 0; i < NR_syscalls; i++) {
-		if (sys_call_table[i] != (unsigned long)&sys_ni_syscall)
+		if (sys_call_table[i] != (void *)&sys_ni_syscall)
 			vdso_data->syscall_map[i >> 5] |= 0x80000000UL >> (i & 0x1f);
 		if (IS_ENABLED(CONFIG_COMPAT) &&
-		    compat_sys_call_table[i] != (unsigned long)&sys_ni_syscall)
+		    compat_sys_call_table[i] != (void *)&sys_ni_syscall)
 			vdso_data->compat_syscall_map[i >> 5] |= 0x80000000UL >> (i & 0x1f);
 	}
 }
diff --git a/arch/powerpc/platforms/cell/spu_callbacks.c b/arch/powerpc/platforms/cell/spu_callbacks.c
index fe0d8797a00a..e780c14c5733 100644
--- a/arch/powerpc/platforms/cell/spu_callbacks.c
+++ b/arch/powerpc/platforms/cell/spu_callbacks.c
@@ -34,15 +34,15 @@
 * mbind, mq_open, ipc, ...
 */

-static
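To illustrate why handlers declaring fewer than six parameters can live in a table of six-argument `syscall_fn` pointers, here is a hedged user-space sketch. It relies on the same calling-convention property the commit message describes (extra argument registers are simply never read by the callee) and mirrors the `(void *)` hop used to silence -Wcast-function-type. Strictly speaking this is ABI-dependent behaviour, not portable C, and all `demo_*` names are illustrative.

```c
#include <assert.h>

/* The common handler type introduced by the patch. */
typedef long (*syscall_fn)(unsigned long, unsigned long, unsigned long,
			   unsigned long, unsigned long, unsigned long);

/* A two-parameter "handler"; the four extra argument slots passed
 * through syscall_fn are ignored. */
static long demo_sys_add2(unsigned long a, unsigned long b)
{
	return (long)(a + b);
}

/* Table entry coerced via (void *), as in the rewritten systbl.c. */
static const syscall_fn demo_call_table[] = {
	[0] = (void *)demo_sys_add2,
};

long demo_dispatch(unsigned long nr, unsigned long a, unsigned long b)
{
	/* Zeroes fill the unused argument registers. */
	return demo_call_table[nr](a, b, 0, 0, 0, 0);
}
```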
[PATCH v6 17/25] powerpc: Enable compile-time check for syscall handlers
The table of syscall handlers and registered compatibility syscall
handlers has in the past been produced using assembly, with function
references resolved at link time. This moves link-time errors to
compile-time, by rewriting systbl.S in C, and including the
linux/syscalls.h, linux/compat.h and asm/syscalls.h headers for
prototypes.

Reported-by: Arnd Bergmann
Signed-off-by: Rohan McLure
Reviewed-by: Nicholas Piggin
---
V2: New patch.
V5: For this patch only, represent handler function pointers as
    unsigned long. Remove reference to syscall wrappers. Use
    asm/syscalls.h which implies asm/syscall.h
---
 arch/powerpc/kernel/{systbl.S => systbl.c} | 28
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kernel/systbl.S b/arch/powerpc/kernel/systbl.c
similarity index 61%
rename from arch/powerpc/kernel/systbl.S
rename to arch/powerpc/kernel/systbl.c
index 6c1db3b6de2d..ce52bd2ec292 100644
--- a/arch/powerpc/kernel/systbl.S
+++ b/arch/powerpc/kernel/systbl.c
@@ -10,32 +10,26 @@
 * PPC64 updates by Dave Engebretsen (engeb...@us.ibm.com)
 */

-#include
+#include
+#include
+#include
+#include

-.section .rodata,"a"
+#define __SYSCALL_WITH_COMPAT(nr, entry, compat) __SYSCALL(nr, entry)
+#define __SYSCALL(nr, entry) [nr] = (unsigned long) &entry,

-#ifdef CONFIG_PPC64
-	.p2align	3
-#define __SYSCALL(nr, entry)	.8byte entry
-#else
-	.p2align	2
-#define __SYSCALL(nr, entry)	.long entry
-#endif
-
-#define __SYSCALL_WITH_COMPAT(nr, native, compat)	__SYSCALL(nr, native)
-.globl sys_call_table
-sys_call_table:
+const unsigned long sys_call_table[] = {
 #ifdef CONFIG_PPC64
 #include
 #else
 #include
 #endif
+};

 #ifdef CONFIG_COMPAT
 #undef __SYSCALL_WITH_COMPAT
 #define __SYSCALL_WITH_COMPAT(nr, native, compat)	__SYSCALL(nr, compat)
-.globl compat_sys_call_table
-compat_sys_call_table:
-#define compat_sys_sigsuspend	sys_sigsuspend
+const unsigned long compat_sys_call_table[] = {
 #include
-#endif
+};
+#endif /* CONFIG_COMPAT */
-- 
2.34.1
[PATCH v6 16/25] powerpc: Include all arch-specific syscall prototypes
Forward declare all syscall handler prototypes where a generic prototype is not provided in either linux/syscalls.h or linux/compat.h in asm/syscalls.h. This is required for compile-time type-checking for syscall handlers, which is implemented later in this series. 32-bit compatibility syscall handlers are expressed in terms of types in ppc32.h. Expose this header globally. Signed-off-by: Rohan McLure Acked-by: Nicholas Piggin --- V2: Explicitly include prototypes. V3: Remove extraneous #include and ppc_fallocate prototype. Rename header. V5: Clean. Elaborate comment on long long munging. Remove prototype hiding conditional on SYSCALL_WRAPPER. --- arch/powerpc/include/asm/syscalls.h | 97 ++ .../ppc32.h => include/asm/syscalls_32.h}| 0 arch/powerpc/kernel/signal_32.c | 2 +- arch/powerpc/perf/callchain_32.c | 2 +- 4 files changed, 77 insertions(+), 24 deletions(-) diff --git a/arch/powerpc/include/asm/syscalls.h b/arch/powerpc/include/asm/syscalls.h index 525d2aa0c8ca..5d106acf7906 100644 --- a/arch/powerpc/include/asm/syscalls.h +++ b/arch/powerpc/include/asm/syscalls.h @@ -8,6 +8,14 @@ #include #include +#ifdef CONFIG_PPC64 +#include +#endif +#include +#include + +struct rtas_args; + /* * long long munging: * The 32 bit ABI passes long longs in an odd even register pair. 
@@ -20,44 +28,89 @@ #define merge_64(high, low) ((u64)high << 32) | low #endif -struct rtas_args; +long sys_ni_syscall(void); + +/* + * PowerPC architecture-specific syscalls + */ + +long sys_rtas(struct rtas_args __user *uargs); + +#ifdef CONFIG_PPC64 +long sys_ppc64_personality(unsigned long personality); +#ifdef CONFIG_COMPAT +long compat_sys_ppc64_personality(unsigned long personality); +#endif /* CONFIG_COMPAT */ +#endif /* CONFIG_PPC64 */ +long sys_swapcontext(struct ucontext __user *old_ctx, +struct ucontext __user *new_ctx, long ctx_size); long sys_mmap(unsigned long addr, size_t len, unsigned long prot, unsigned long flags, unsigned long fd, off_t offset); long sys_mmap2(unsigned long addr, size_t len, unsigned long prot, unsigned long flags, unsigned long fd, unsigned long pgoff); -long sys_ppc64_personality(unsigned long personality); -long sys_rtas(struct rtas_args __user *uargs); -long sys_ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low, - u32 len_high, u32 len_low); +long sys_switch_endian(void); -#ifdef CONFIG_COMPAT -unsigned long compat_sys_mmap2(unsigned long addr, size_t len, - unsigned long prot, unsigned long flags, - unsigned long fd, unsigned long pgoff); - -compat_ssize_t compat_sys_pread64(unsigned int fd, char __user *ubuf, compat_size_t count, - u32 reg6, u32 pos1, u32 pos2); +#ifdef CONFIG_PPC32 +long sys_sigreturn(void); +long sys_debug_setcontext(struct ucontext __user *ctx, int ndbg, + struct sig_dbg_op __user *dbg); +#endif -compat_ssize_t compat_sys_pwrite64(unsigned int fd, const char __user *ubuf, compat_size_t count, - u32 reg6, u32 pos1, u32 pos2); +long sys_rt_sigreturn(void); -compat_ssize_t compat_sys_readahead(int fd, u32 r4, u32 offset1, u32 offset2, u32 count); +long sys_subpage_prot(unsigned long addr, + unsigned long len, u32 __user *map); -int compat_sys_truncate64(const char __user *path, u32 reg4, - unsigned long len1, unsigned long len2); +#ifdef CONFIG_COMPAT +long compat_sys_swapcontext(struct 
ucontext32 __user *old_ctx, + struct ucontext32 __user *new_ctx, + int ctx_size); +long compat_sys_old_getrlimit(unsigned int resource, + struct compat_rlimit __user *rlim); +long compat_sys_sigreturn(void); +long compat_sys_rt_sigreturn(void); +#endif /* CONFIG_COMPAT */ -int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long len1, - unsigned long len2); +/* + * Architecture specific signatures required by long long munging: + * The 32 bit ABI passes long longs in an odd even register pair. + * The following signatures provide a machine long parameter for + * each register that will be supplied. The implementation is + * responsible for combining parameter pairs. + */ +#ifdef CONFIG_COMPAT +long compat_sys_mmap2(unsigned long addr, size_t len, + unsigned long prot, unsigned long flags, + unsigned long fd, unsigned long pgoff); +long compat_sys_ppc_pread64(unsigned int fd, + char __user *ubuf, compat_size_t count, + u32 reg6, u32 pos1, u32 pos2); +long compat_sys_ppc_pwrite64(unsigned int fd, +const char __user *ubuf, compat_size_t count, +
[PATCH v6 15/25] powerpc: Adopt SYSCALL_DEFINE for arch-specific syscall handlers
Arch-specific implementations of syscall handlers are currently used
over generic implementations for the following reasons:

1. Semantics unique to powerpc
2. Compatibility syscalls require 'argument padding' to comply with
   64-bit argument convention in ELF32 abi.
3. Parameter types or order is different in other architectures.

These syscall handlers have been defined prior to this patch series
without invoking the SYSCALL_DEFINE or COMPAT_SYSCALL_DEFINE macros with
custom input and output types. We remove every such direct definition in
favour of the aforementioned macros.

Also update syscalls.tbl in order to refer to the symbol names generated
by each of these macros.

Since ppc64_personality can be called by both 64-bit and 32-bit binaries
through compatibility, we must generate both compat_sys_ and sys_
symbols for this handler.

As an aside: a number of architectures including arm and powerpc agree
on an alternative argument order and numbering for most of these
arch-specific handlers. A future patch series may allow for asm/unistd.h
to signal through its defines that a generic implementation of these
syscall handlers with the correct calling convention be emitted, through
the __ARCH_WANT_COMPAT_SYS_... convention.

Signed-off-by: Rohan McLure
Reviewed-by: Nicholas Piggin
---
V2: All syscall handlers wrapped by this macro.
V3: Move creation of do_ppc64_personality helper to prior patch.
V4: Fix parenthesis alignment. Don't emit sys_*** symbols.
V5: Use 'aside' in the asm-generic rant in commit message.
--- arch/powerpc/include/asm/syscalls.h | 10 ++--- arch/powerpc/kernel/sys_ppc32.c | 38 +++--- arch/powerpc/kernel/syscalls.c | 17 ++-- arch/powerpc/kernel/syscalls/syscall.tbl | 22 +- .../arch/powerpc/entry/syscalls/syscall.tbl | 22 +- 5 files changed, 64 insertions(+), 45 deletions(-) diff --git a/arch/powerpc/include/asm/syscalls.h b/arch/powerpc/include/asm/syscalls.h index 20cbd29b1228..525d2aa0c8ca 100644 --- a/arch/powerpc/include/asm/syscalls.h +++ b/arch/powerpc/include/asm/syscalls.h @@ -28,10 +28,10 @@ long sys_mmap(unsigned long addr, size_t len, long sys_mmap2(unsigned long addr, size_t len, unsigned long prot, unsigned long flags, unsigned long fd, unsigned long pgoff); -long ppc64_personality(unsigned long personality); +long sys_ppc64_personality(unsigned long personality); long sys_rtas(struct rtas_args __user *uargs); -long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low, - u32 len_high, u32 len_low); +long sys_ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low, + u32 len_high, u32 len_low); #ifdef CONFIG_COMPAT unsigned long compat_sys_mmap2(unsigned long addr, size_t len, @@ -52,8 +52,8 @@ int compat_sys_truncate64(const char __user *path, u32 reg4, int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long len1, unsigned long len2); -long ppc32_fadvise64(int fd, u32 unused, u32 offset1, u32 offset2, -size_t len, int advice); +long compat_sys_ppc32_fadvise64(int fd, u32 unused, u32 offset1, u32 offset2, + size_t len, int advice); long compat_sys_sync_file_range2(int fd, unsigned int flags, unsigned int offset1, unsigned int offset2, diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c index 776ae7565fc5..dcc3c9fd4cfd 100644 --- a/arch/powerpc/kernel/sys_ppc32.c +++ b/arch/powerpc/kernel/sys_ppc32.c @@ -47,45 +47,55 @@ #include #include -compat_ssize_t compat_sys_pread64(unsigned int fd, char __user *ubuf, compat_size_t count, -u32 reg6, u32 pos1, u32 pos2) 
+COMPAT_SYSCALL_DEFINE6(ppc_pread64, + unsigned int, fd, + char __user *, ubuf, compat_size_t, count, + u32, reg6, u32, pos1, u32, pos2) { return ksys_pread64(fd, ubuf, count, merge_64(pos1, pos2)); } -compat_ssize_t compat_sys_pwrite64(unsigned int fd, const char __user *ubuf, compat_size_t count, - u32 reg6, u32 pos1, u32 pos2) +COMPAT_SYSCALL_DEFINE6(ppc_pwrite64, + unsigned int, fd, + const char __user *, ubuf, compat_size_t, count, + u32, reg6, u32, pos1, u32, pos2) { return ksys_pwrite64(fd, ubuf, count, merge_64(pos1, pos2)); } -compat_ssize_t compat_sys_readahead(int fd, u32 r4, u32 offset1, u32 offset2, u32 count) +COMPAT_SYSCALL_DEFINE5(ppc_readahead, + int, fd, u32, r4, + u32, offset1, u32, offset2, u32, count) { return ksys_readahead(fd, merge_64(offset1, offset2), count); } -int compat_sys_truncate64(const char __user * path, u32 reg4, - unsigned long len1, unsigned
[PATCH v6 14/25] powerpc: Provide do_ppc64_personality helper
Avoid duplication in a future patch that will define the
ppc64_personality syscall handler in terms of the SYSCALL_DEFINE and
COMPAT_SYSCALL_DEFINE macros, by extracting the common body of
ppc64_personality into a helper function.

Signed-off-by: Rohan McLure
Reviewed-by: Nicholas Piggin
---
V3: New commit.
V5: Remove 'inline'.
---
 arch/powerpc/kernel/syscalls.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index 9830957498b0..135a0b9108d5 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -75,7 +75,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, size_t, len,
 }

 #ifdef CONFIG_PPC64
-long ppc64_personality(unsigned long personality)
+static long do_ppc64_personality(unsigned long personality)
 {
 	long ret;

@@ -87,6 +87,10 @@ long ppc64_personality(unsigned long personality)
 	ret = (ret & ~PER_MASK) | PER_LINUX;
 	return ret;
 }
+long ppc64_personality(unsigned long personality)
+{
+	return do_ppc64_personality(personality);
+}
 #endif

 long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
-- 
2.34.1
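The personality translation that this helper centralises can be sketched outside the kernel. The constants below mirror include/uapi/linux/personality.h; the `demo_*` function names are illustrative, not kernel symbols.

```c
#include <assert.h>

/* Constants mirroring include/uapi/linux/personality.h. */
#define DEMO_PER_MASK    0x00ffUL
#define DEMO_PER_LINUX   0x0000UL
#define DEMO_PER_LINUX32 0x0008UL

/* Inbound munge: a PER_LINUX32 process asking for PER_LINUX is steered
 * back to PER_LINUX32, as in the helper's first conditional. */
unsigned long demo_munge_in(unsigned long current_pers, unsigned long requested)
{
	if ((current_pers & DEMO_PER_MASK) == DEMO_PER_LINUX32 &&
	    (requested & DEMO_PER_MASK) == DEMO_PER_LINUX)
		requested = (requested & ~DEMO_PER_MASK) | DEMO_PER_LINUX32;
	return requested;
}

/* Outbound munge: PER_LINUX32 results are reported as PER_LINUX. */
unsigned long demo_munge_out(unsigned long ret)
{
	if ((ret & DEMO_PER_MASK) == DEMO_PER_LINUX32)
		ret = (ret & ~DEMO_PER_MASK) | DEMO_PER_LINUX;
	return ret;
}
```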
[PATCH v6 13/25] powerpc: Remove direct call to mmap2 syscall handlers
Syscall handlers should not be invoked internally by their symbol names,
as these symbols are defined by the architecture-defined SYSCALL_DEFINE
macro. Move the compatibility syscall definition for mmap2 to
syscalls.c, so that all mmap implementations can share a helper
function.

Remove 'inline' on static mmap helper.

Signed-off-by: Rohan McLure
Reviewed-by: Nicholas Piggin
---
V2: Move mmap2 compat implementation to asm/kernel/syscalls.c.
V4: Move to be applied before syscall wrapper introduced.
V5: Remove 'inline' in helper.
---
 arch/powerpc/kernel/sys_ppc32.c |  9 -
 arch/powerpc/kernel/syscalls.c  | 17 ++---
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c
index d961634976d8..776ae7565fc5 100644
--- a/arch/powerpc/kernel/sys_ppc32.c
+++ b/arch/powerpc/kernel/sys_ppc32.c
@@ -25,7 +25,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -48,14 +47,6 @@
 #include
 #include

-unsigned long compat_sys_mmap2(unsigned long addr, size_t len,
-			unsigned long prot, unsigned long flags,
-			unsigned long fd, unsigned long pgoff)
-{
-	/* This should remain 12 even if PAGE_SIZE changes */
-	return sys_mmap(addr, len, prot, flags, fd, pgoff << 12);
-}
-
 compat_ssize_t compat_sys_pread64(unsigned int fd, char __user *ubuf, compat_size_t count,
 				  u32 reg6, u32 pos1, u32 pos2)
 {
diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index a04c97faa21a..9830957498b0 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -36,9 +36,9 @@
 #include
 #include

-static inline long do_mmap2(unsigned long addr, size_t len,
-			unsigned long prot, unsigned long flags,
-			unsigned long fd, unsigned long off, int shift)
+static long do_mmap2(unsigned long addr, size_t len,
+		     unsigned long prot, unsigned long flags,
+		     unsigned long fd, unsigned long off, int shift)
 {
 	if (!arch_validate_prot(prot, addr))
 		return -EINVAL;
@@ -56,6 +56,17 @@ SYSCALL_DEFINE6(mmap2,
 		unsigned long, addr, size_t, len,
 	return do_mmap2(addr, len, prot, flags, fd, pgoff, PAGE_SHIFT-12);
 }

+#ifdef CONFIG_COMPAT
+COMPAT_SYSCALL_DEFINE6(mmap2,
+		       unsigned long, addr, size_t, len,
+		       unsigned long, prot, unsigned long, flags,
+		       unsigned long, fd, unsigned long, pgoff)
+{
+	/* This should remain 12 even if PAGE_SIZE changes */
+	return do_mmap2(addr, len, prot, flags, fd, pgoff << 12, PAGE_SHIFT-12);
+}
+#endif
+
 SYSCALL_DEFINE6(mmap, unsigned long, addr, size_t, len,
 		unsigned long, prot, unsigned long, flags,
 		unsigned long, fd, off_t, offset)
-- 
2.34.1
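The unit conversion that lets both mmap entry points share one helper can be sketched as follows. This is an illustrative model of the `do_mmap2`-style shift, under the assumption of a 64 KiB page (PAGE_SHIFT = 16); `demo_pgoff` is a hypothetical name, and `-1` stands in for `-EINVAL`.

```c
#include <assert.h>

/* do_mmap2-style normalisation: "off" arrives either in bytes
 * (mmap, shift = PAGE_SHIFT) or in 4 KiB units (mmap2,
 * shift = PAGE_SHIFT - 12, "12" because mmap2 units stay 4 KiB even if
 * PAGE_SIZE changes). Either way, "off >> shift" yields a
 * PAGE_SIZE-granular page offset; misaligned offsets are rejected. */
long demo_pgoff(unsigned long off, int shift)
{
	if (off & ((1UL << shift) - 1))
		return -1;	/* stand-in for -EINVAL */
	return (long)(off >> shift);
}
```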
[PATCH v6 12/25] powerpc: Remove direct call to personality syscall handler
Syscall handlers should not be invoked internally by their symbol names,
as these symbols are defined by the architecture-defined SYSCALL_DEFINE
macro. Fortunately, in the case of ppc64_personality, its call to
sys_personality can be replaced with an invocation to the equivalent
ksys_personality inline helper in .

Signed-off-by: Rohan McLure
Reviewed-by: Nicholas Piggin
---
V2: Use inline helper to deduplicate bodies in compat/regular
    implementations.
V4: Move to be applied before syscall wrapper.
---
 arch/powerpc/kernel/syscalls.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index 34e1ae88e15b..a04c97faa21a 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -71,7 +71,7 @@ long ppc64_personality(unsigned long personality)
 	if (personality(current->personality) == PER_LINUX32 &&
 	    personality(personality) == PER_LINUX)
 		personality = (personality & ~PER_MASK) | PER_LINUX32;
-	ret = sys_personality(personality);
+	ret = ksys_personality(personality);
 	if (personality(ret) == PER_LINUX32)
 		ret = (ret & ~PER_MASK) | PER_LINUX;
 	return ret;
-- 
2.34.1
[PATCH v6 11/25] powerpc/32: Remove powerpc select specialisation
Syscall #82 has been implemented for 32-bit platforms in a unique way on
powerpc systems. This hack will in effect guess whether the caller is
expecting new select semantics or old select semantics. It does so via a
guess, based off the first parameter. In new select, this parameter
represents the length of a user-memory array of file descriptors, and in
old select this is a pointer to an arguments structure. The heuristic
simply interprets sufficiently large values of its first parameter as
being a call to old select.

The following is a discussion on how this syscall should be handled.

Link: https://lore.kernel.org/lkml/13737de5-0eb7-e881-9af0-163b0d29a...@csgroup.eu/

As discussed in this thread, the existence of such a hack suggests that
whatever powerpc binaries may predate glibc most likely would have made
use of the old select semantics. x86 and arm64 both implement this
syscall with old select semantics.

Remove the powerpc implementation, and update syscall.tbl to emit
references to sys_old_select and compat_sys_old_select for 32-bit
binaries, in keeping with how other architectures support syscall #82.

Signed-off-by: Rohan McLure
Reviewed-by: Nicholas Piggin
---
V2: Remove arch-specific select handler
V3: Remove ppc_old_select prototype in . Move to earlier in patch series
V5: Use compat_sys_old_select on 64-bit systems.
--- arch/powerpc/include/asm/syscalls.h | 2 -- arch/powerpc/kernel/syscalls.c| 17 - arch/powerpc/kernel/syscalls/syscall.tbl | 2 +- .../arch/powerpc/entry/syscalls/syscall.tbl | 2 +- 4 files changed, 2 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/include/asm/syscalls.h b/arch/powerpc/include/asm/syscalls.h index 960b3871db72..20cbd29b1228 100644 --- a/arch/powerpc/include/asm/syscalls.h +++ b/arch/powerpc/include/asm/syscalls.h @@ -30,8 +30,6 @@ long sys_mmap2(unsigned long addr, size_t len, unsigned long fd, unsigned long pgoff); long ppc64_personality(unsigned long personality); long sys_rtas(struct rtas_args __user *uargs); -int ppc_select(int n, fd_set __user *inp, fd_set __user *outp, - fd_set __user *exp, struct __kernel_old_timeval __user *tvp); long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low, u32 len_high, u32 len_low); diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c index abc3fbb3c490..34e1ae88e15b 100644 --- a/arch/powerpc/kernel/syscalls.c +++ b/arch/powerpc/kernel/syscalls.c @@ -63,23 +63,6 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, size_t, len, return do_mmap2(addr, len, prot, flags, fd, offset, PAGE_SHIFT); } -#ifdef CONFIG_PPC32 -/* - * Due to some executables calling the wrong select we sometimes - * get wrong args. This determines how the args are being passed - * (a single ptr to them all args passed) then calls - * sys_select() with the appropriate args. 
-- Cort - */ -int -ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp, struct __kernel_old_timeval __user *tvp) -{ - if ((unsigned long)n >= 4096) - return sys_old_select((void __user *)n); - - return sys_select(n, inp, outp, exp, tvp); -} -#endif - #ifdef CONFIG_PPC64 long ppc64_personality(unsigned long personality) { diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 2600b4237292..64f27cbbdd2c 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -110,7 +110,7 @@ 79 common settimeofdaysys_settimeofday compat_sys_settimeofday 80 common getgroups sys_getgroups 81 common setgroups sys_setgroups -82 32 select ppc_select sys_ni_syscall +82 32 select sys_old_select compat_sys_old_select 82 64 select sys_ni_syscall 82 spu select sys_ni_syscall 83 common symlink sys_symlink diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl index 2600b4237292..64f27cbbdd2c 100644 --- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl +++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl @@ -110,7 +110,7 @@ 79 common settimeofdaysys_settimeofday compat_sys_settimeofday 80 common getgroups sys_getgroups 81 common setgroups sys_setgroups -82 32 select ppc_select sys_ni_syscall +82 32 select sys_old_select compat_sys_old_select 82 64 select
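The removed heuristic is small enough to model directly. This sketch restates the guess the commit message describes; `demo_is_old_select` is an illustrative name, and the reasoning assumes (as the original comment implies) that user pointers are always at least 4096 because page zero is unmapped.

```c
#include <assert.h>

/* The removed ppc_select heuristic in miniature: old select passes a
 * user pointer as the first argument, which is >= 4096 on these
 * systems, while a plausible fd-set size ("n" in new select) is far
 * smaller. */
int demo_is_old_select(unsigned long first_arg)
{
	return first_arg >= 4096;
}
```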
[PATCH v6 10/25] powerpc: Use generic fallocate compatibility syscall
The powerpc fallocate compat syscall handler is identical to the generic implementation provided by commit 59c10c52f573f ("riscv: compat: syscall: Add compat_sys_call_table implementation"), and as such can be removed in favour of the generic implementation. A future patch series will replace more architecture-defined syscall handlers with generic implementations, dependent on introducing generic implementations that are compatible with powerpc and arm's parameter reorderings. Reported-by: Arnd Bergmann Signed-off-by: Rohan McLure Reviewed-by: Arnd Bergmann --- V2: Remove arch-specific fallocate handler. V3: Remove generic fallocate prototype. Move to beginning of. V5: Remove implementation as well which I somehow failed to do. Replace local BE compat_arg_u64 with generic. --- arch/powerpc/include/asm/syscalls.h | 2 -- arch/powerpc/include/asm/unistd.h | 1 + arch/powerpc/kernel/sys_ppc32.c | 7 --- 3 files changed, 1 insertion(+), 9 deletions(-) diff --git a/arch/powerpc/include/asm/syscalls.h b/arch/powerpc/include/asm/syscalls.h index 16b668515d15..960b3871db72 100644 --- a/arch/powerpc/include/asm/syscalls.h +++ b/arch/powerpc/include/asm/syscalls.h @@ -51,8 +51,6 @@ compat_ssize_t compat_sys_readahead(int fd, u32 r4, u32 offset1, u32 offset2, u3 int compat_sys_truncate64(const char __user *path, u32 reg4, unsigned long len1, unsigned long len2); -long compat_sys_fallocate(int fd, int mode, u32 offset1, u32 offset2, u32 len1, u32 len2); - int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long len1, unsigned long len2); diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h index b1129b4ef57d..659a996c75aa 100644 --- a/arch/powerpc/include/asm/unistd.h +++ b/arch/powerpc/include/asm/unistd.h @@ -45,6 +45,7 @@ #define __ARCH_WANT_SYS_UTIME #define __ARCH_WANT_SYS_NEWFSTATAT #define __ARCH_WANT_COMPAT_STAT +#define __ARCH_WANT_COMPAT_FALLOCATE #define __ARCH_WANT_COMPAT_SYS_SENDFILE #endif #define __ARCH_WANT_SYS_FORK diff 
--git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c index ba363328da2b..d961634976d8 100644 --- a/arch/powerpc/kernel/sys_ppc32.c +++ b/arch/powerpc/kernel/sys_ppc32.c @@ -79,13 +79,6 @@ int compat_sys_truncate64(const char __user * path, u32 reg4, return ksys_truncate(path, merge_64(len1, len2)); } -long compat_sys_fallocate(int fd, int mode, u32 offset1, u32 offset2, -u32 len1, u32 len2) -{ - return ksys_fallocate(fd, mode, merge_64(offset1, offset2), -merge_64(len1, len2)); -} - int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long len1, unsigned long len2) { -- 2.34.1
[PATCH v6 09/25] asm-generic: compat: Support BE for long long args in 32-bit ABIs
32-bit ABIs support passing 64-bit integers by registers via argument
translation. Commit 59c10c52f573 ("riscv: compat: syscall: Add
compat_sys_call_table implementation") implements the compat_arg_u64
macro for efficiently defining little endian compatibility syscalls.
Architectures supporting big endianness may benefit from reciprocal
argument translation, but are welcome also to implement their own.

Signed-off-by: Rohan McLure
Reviewed-by: Nicholas Piggin
Reviewed-by: Arnd Bergmann
---
V5: New patch.
---
 include/asm-generic/compat.h | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/compat.h b/include/asm-generic/compat.h
index d06308a2a7a8..aeb257ad3d1a 100644
--- a/include/asm-generic/compat.h
+++ b/include/asm-generic/compat.h
@@ -14,12 +14,17 @@
 #define COMPAT_OFF_T_MAX	0x7fffffff
 #endif

-#if !defined(compat_arg_u64) && !defined(CONFIG_CPU_BIG_ENDIAN)
+#ifndef compat_arg_u64
+#ifdef CONFIG_CPU_BIG_ENDIAN
 #define compat_arg_u64(name)		u32 name##_lo, u32 name##_hi
 #define compat_arg_u64_dual(name)	u32, name##_lo, u32, name##_hi
+#else
+#define compat_arg_u64(name)		u32 name##_hi, u32 name##_lo
+#define compat_arg_u64_dual(name)	u32, name##_hi, u32, name##_lo
+#endif
 #define compat_arg_u64_glue(name)	(((u64)name##_lo & 0xffffffffUL) | \
 					 ((u64)name##_hi << 32))
-#endif
+#endif /* compat_arg_u64 */

 /* These types are common across all compat ABIs */
 typedef u32 compat_size_t;
-- 
2.34.1
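The glue step these macros share can be checked in isolation. This is a user-space sketch of the `compat_arg_u64_glue` expression, not the kernel macro itself; `demo_arg_u64_glue` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* The glue expression in miniature: however the ABI ordered the two
 * 32-bit halves across the register pair, once they are bound to the
 * _lo/_hi names the same expression reassembles the 64-bit value. */
uint64_t demo_arg_u64_glue(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)lo & 0xffffffffULL) | ((uint64_t)hi << 32);
}
```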
[PATCH v6 08/25] powerpc: Fix fallocate and fadvise64_64 compat parameter combination
As reported[1] by Arnd, the arch-specific fadvise64_64 and fallocate compatibility handlers assume parameters are passed with 32-bit big-endian ABI. This affects the assignment of odd-even parameter pairs to the high or low words of a 64-bit syscall parameter. Fix fadvise64_64 fallocate compat handlers to correctly swap upper/lower 32 bits conditioned on endianness. A future patch will replace the arch-specific compat fallocate with an asm-generic implementation. This patch is intended for ease of back-port. [1]: https://lore.kernel.org/all/be29926f-226e-48dc-871a-e29a54e80...@www.fastmail.com/ Fixes: 57f48b4b74e7 ("powerpc/compat_sys: swap hi/lo parts of 64-bit syscall args in LE mode") Reported-by: Arnd Bergmann Signed-off-by: Rohan McLure Reviewed-by: Arnd Bergmann Reviewed-by: Nicholas Piggin --- V5: New patch. --- arch/powerpc/include/asm/syscalls.h | 12 arch/powerpc/kernel/sys_ppc32.c | 14 +- arch/powerpc/kernel/syscalls.c | 4 ++-- 3 files changed, 15 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/include/asm/syscalls.h b/arch/powerpc/include/asm/syscalls.h index 21c2faaa2957..16b668515d15 100644 --- a/arch/powerpc/include/asm/syscalls.h +++ b/arch/powerpc/include/asm/syscalls.h @@ -8,6 +8,18 @@ #include #include +/* + * long long munging: + * The 32 bit ABI passes long longs in an odd even register pair. + * High and low parts are swapped depending on endian mode, + * so define a macro (similar to mips linux32) to handle that. 
+ */ +#ifdef __LITTLE_ENDIAN__ +#define merge_64(low, high) ((u64)high << 32) | low +#else +#define merge_64(high, low) ((u64)high << 32) | low +#endif + struct rtas_args; long sys_mmap(unsigned long addr, size_t len, diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c index f4edcc9489fb..ba363328da2b 100644 --- a/arch/powerpc/kernel/sys_ppc32.c +++ b/arch/powerpc/kernel/sys_ppc32.c @@ -56,18 +56,6 @@ unsigned long compat_sys_mmap2(unsigned long addr, size_t len, return sys_mmap(addr, len, prot, flags, fd, pgoff << 12); } -/* - * long long munging: - * The 32 bit ABI passes long longs in an odd even register pair. - * High and low parts are swapped depending on endian mode, - * so define a macro (similar to mips linux32) to handle that. - */ -#ifdef __LITTLE_ENDIAN__ -#define merge_64(low, high) ((u64)high << 32) | low -#else -#define merge_64(high, low) ((u64)high << 32) | low -#endif - compat_ssize_t compat_sys_pread64(unsigned int fd, char __user *ubuf, compat_size_t count, u32 reg6, u32 pos1, u32 pos2) { @@ -94,7 +82,7 @@ int compat_sys_truncate64(const char __user * path, u32 reg4, long compat_sys_fallocate(int fd, int mode, u32 offset1, u32 offset2, u32 len1, u32 len2) { - return ksys_fallocate(fd, mode, ((loff_t)offset1 << 32) | offset2, + return ksys_fallocate(fd, mode, merge_64(offset1, offset2), merge_64(len1, len2)); } diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c index fc999140bc27..abc3fbb3c490 100644 --- a/arch/powerpc/kernel/syscalls.c +++ b/arch/powerpc/kernel/syscalls.c @@ -98,8 +98,8 @@ long ppc64_personality(unsigned long personality) long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low, u32 len_high, u32 len_low) { - return ksys_fadvise64_64(fd, (u64)offset_high << 32 | offset_low, -(u64)len_high << 32 | len_low, advice); + return ksys_fadvise64_64(fd, merge_64(offset_high, offset_low), +merge_64(len_high, len_low), advice); } SYSCALL_DEFINE0(switch_endian) -- 2.34.1
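The endian-conditional `merge_64` above can be exercised in user space. This sketch models both expansions of the macro as ordinary functions (illustrative `demo_*` names): the parameter order differs per endianness, but both reconstruct the same 64-bit value from a register pair delivered in ABI order.

```c
#include <assert.h>
#include <stdint.h>

/* merge_64 in miniature: the 32-bit ABI delivers a 64-bit argument in
 * an odd/even register pair, high word in the first register on
 * big-endian and low word first on little-endian. Swapping the
 * parameter names per endianness, as the macro does, makes both orders
 * yield the same value. */
uint64_t demo_merge_64_be(uint32_t high, uint32_t low)	/* BE pair order */
{
	return ((uint64_t)high << 32) | low;
}

uint64_t demo_merge_64_le(uint32_t low, uint32_t high)	/* LE pair order */
{
	return ((uint64_t)high << 32) | low;
}
```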
[PATCH v6 07/25] powerpc/64s: Fix comment on interrupt handler prologue
Interrupt handlers on 64s systems will often need to save register state
from the interrupted process to make space for loading special purpose
registers or for internal state.

Fix a comment documenting a common code path macro in the beginning of
interrupt handlers where r10 is saved to the PACA to afford space for
the value of the CFAR. The comment is currently written as if r10-r12
are saved to the PACA, but in fact only r10 is saved, with r11-r12 saved
much later. The distance in code between these saves has grown over the
many revisions of this macro. Fix this by signalling with a comment
where r11-r12 are saved to the PACA.

Signed-off-by: Rohan McLure
Reported-by: Nicholas Piggin
Reviewed-by: Nicholas Piggin
---
V2: Given its own commit
V3: Annotate r11-r12 save locations with comment.
---
 arch/powerpc/kernel/exceptions-64s.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 3d0dc133a9ae..a3b51441b039 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -281,7 +281,7 @@ BEGIN_FTR_SECTION
 	mfspr	r9,SPRN_PPR
 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	HMT_MEDIUM
-	std	r10,IAREA+EX_R10(r13)		/* save r10 - r12 */
+	std	r10,IAREA+EX_R10(r13)		/* save r10 */
 	.if ICFAR
 BEGIN_FTR_SECTION
 	mfspr	r10,SPRN_CFAR
@@ -321,7 +321,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 	mfctr	r10
 	std	r10,IAREA+EX_CTR(r13)
 	mfcr	r9
-	std	r11,IAREA+EX_R11(r13)
+	std	r11,IAREA+EX_R11(r13)		/* save r11 - r12 */
 	std	r12,IAREA+EX_R12(r13)
 	/*
--
2.34.1
[PATCH v6 06/25] powerpc/64e: Clarify register saves and clears with {SAVE,ZEROIZE}_GPRS
The common interrupt handler prologue macro and the bad_stack trampolines
include consecutive sequences of register saves, and some register
clears. Neaten such instances by expanding use of the SAVE_GPRS macro
and employing the ZEROIZE_GPR macro when appropriate. Also simplify an
invocation of SAVE_GPRS targeting all non-volatile registers to
SAVE_NVGPRS.

Signed-off-by: Rohan McLure
Reported-by: Nicholas Piggin
Reviewed-by: Nicholas Piggin
---
V4: New commit.
V6: Split REST_GPRS(0, 1, r1) to remove dependence on macro
implementation ordering the r0 restore before r1 restore.
---
 arch/powerpc/kernel/exceptions-64e.S | 28 +++++++++++-----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index 67dc4e3179a0..f1d714acc945 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -216,17 +216,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
 	mtlr	r10
 	mtcr	r11
-	ld	r10,GPR10(r1)
-	ld	r11,GPR11(r1)
-	ld	r12,GPR12(r1)
+	REST_GPRS(10, 12, r1)
 	mtspr	\scratch,r0
 	std	r10,\paca_ex+EX_R10(r13);
 	std	r11,\paca_ex+EX_R11(r13);
 	ld	r10,_NIP(r1)
 	ld	r11,_MSR(r1)
-	ld	r0,GPR0(r1)
-	ld	r1,GPR1(r1)
+	REST_GPR(0, r1)
+	REST_GPR(1, r1)
 	mtspr	\srr0,r10
 	mtspr	\srr1,r11
 	ld	r10,\paca_ex+EX_R10(r13)
@@ -372,16 +370,15 @@ ret_from_mc_except:
 /* Core exception code for all exceptions except TLB misses.
 */
 #define EXCEPTION_COMMON_LVL(n, scratch, excf)			\
exc_##n##_common:						\
-	std	r0,GPR0(r1);	/* save r0 in stackframe */	\
-	std	r2,GPR2(r1);	/* save r2 in stackframe */	\
-	SAVE_GPRS(3, 9, r1);	/* save r3 - r9 in stackframe */\
+	SAVE_GPR(0, r1);	/* save r0 in stackframe */	\
+	SAVE_GPRS(2, 9, r1);	/* save r2 - r9 in stackframe */\
 	std	r10,_NIP(r1);	/* save SRR0 to stackframe */	\
 	std	r11,_MSR(r1);	/* save SRR1 to stackframe */	\
 	beq	2f;		/* if from kernel mode */	\
2:	ld	r3,excf+EX_R10(r13);	/* get back r10 */	\
 	ld	r4,excf+EX_R11(r13);	/* get back r11 */	\
 	mfspr	r5,scratch;	/* get back r13 */		\
-	std	r12,GPR12(r1);	/* save r12 in stackframe */	\
+	SAVE_GPR(12, r1);	/* save r12 in stackframe */	\
 	ld	r2,PACATOC(r13);	/* get kernel TOC into r2 */	\
 	mflr	r6;		/* save LR in stackframe */	\
 	mfctr	r7;		/* save CTR in stackframe */	\
@@ -390,7 +387,7 @@ exc_##n##_common:					\
 	lwz	r10,excf+EX_CR(r13);	/* load orig CR back from PACA */	\
 	lbz	r11,PACAIRQSOFTMASK(r13); /* get current IRQ softe */	\
 	ld	r12,exception_marker@toc(r2);			\
-	li	r0,0;						\
+	ZEROIZE_GPR(0);						\
 	std	r3,GPR10(r1);	/* save r10 to stackframe */	\
 	std	r4,GPR11(r1);	/* save r11 to stackframe */	\
 	std	r5,GPR13(r1);	/* save it to stackframe */	\
@@ -1056,15 +1053,14 @@ bad_stack_book3e:
 	mfspr	r11,SPRN_ESR
 	std	r10,_DEAR(r1)
 	std	r11,_ESR(r1)
-	std	r0,GPR0(r1);	/* save r0 in stackframe */	\
-	std	r2,GPR2(r1);	/* save r2 in stackframe */	\
-	SAVE_GPRS(3, 9, r1);	/* save r3 - r9 in stackframe */\
+	SAVE_GPR(0, r1);	/* save r0 in stackframe */	\
+	SAVE_GPRS(2, 9, r1);	/* save r2 - r9 in stackframe */\
 	ld	r3,PACA_EXGEN+EX_R10(r13);	/* get back r10 */	\
 	ld	r4,PACA_EXGEN+EX_R11(r13);	/* get back r11 */	\
 	mfspr	r5,SPRN_SPRG_GEN_SCRATCH;	/* get back r13 XXX can be wrong */	\
 	std	r3,GPR10(r1);	/* save r10 to stackframe */	\
 	std	r4,GPR11(r1);	/* save r11 to stackframe */	\
-	std	r12,GPR12(r1);	/* save r12 in stackframe */	\
+	SAVE_GPR(12, r1);	/* save r12 in stackframe */	\
 	std	r5,GPR13(r1);	/* save it to stackframe */	\
 	mflr	r10
 	mfctr	r11
@@ -1072,12 +1068,12 @@ bad_stack_book3e:
 	std
[PATCH v6 05/25] powerpc/32: Clarify interrupt restores with REST_GPR macro in entry_32.S
Restoring the register state of the interrupted thread involves issuing
a large number of predictable loads to the kernel stack frame. Issue the
REST_GPR{,S} macros to clearly signal when this is happening, and bunch
together restores at the end of the interrupt handler where the saved
value is not consumed earlier in the handler code.

Signed-off-by: Rohan McLure
Reported-by: Christophe Leroy
Reviewed-by: Nicholas Piggin
---
V3: New patch.
V4: Minimise restores in the unrecoverable window between restoring
SRR0/1 and return from interrupt.
---
 arch/powerpc/kernel/entry_32.S | 33 +++++++++++++--------------------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 44dfce9a60c5..e4b694cebc44 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -68,7 +68,7 @@ prepare_transfer_to_handler:
 	lwz	r9,_MSR(r11)		/* if sleeping, clear MSR.EE */
 	rlwinm	r9,r9,0,~MSR_EE
 	lwz	r12,_LINK(r11)		/* and return to address in LR */
-	lwz	r2, GPR2(r11)
+	REST_GPR(2, r11)
 	b	fast_exception_return
_ASM_NOKPROBE_SYMBOL(prepare_transfer_to_handler)
 #endif /* CONFIG_PPC_BOOK3S_32 || CONFIG_E500 */
@@ -144,7 +144,7 @@ ret_from_syscall:
 	lwz	r7,_NIP(r1)
 	lwz	r8,_MSR(r1)
 	cmpwi	r3,0
-	lwz	r3,GPR3(r1)
+	REST_GPR(3, r1)
 syscall_exit_finish:
 	mtspr	SPRN_SRR0,r7
 	mtspr	SPRN_SRR1,r8
@@ -152,8 +152,8 @@ syscall_exit_finish:
 	bne	3f
 	mtcr	r5

-1:	lwz	r2,GPR2(r1)
-	lwz	r1,GPR1(r1)
+1:	REST_GPR(2, r1)
+	REST_GPR(1, r1)
 	rfi
 #ifdef CONFIG_40x
 	b	.	/* Prevent prefetch past rfi */
@@ -165,10 +165,8 @@ syscall_exit_finish:
 	REST_NVGPRS(r1)
 	mtctr	r4
 	mtxer	r5
-	lwz	r0,GPR0(r1)
-	lwz	r3,GPR3(r1)
-	REST_GPRS(4, 11, r1)
-	lwz	r12,GPR12(r1)
+	REST_GPR(0, r1)
+	REST_GPRS(3, 12, r1)
 	b	1b

 #ifdef CONFIG_44x
@@ -260,9 +258,8 @@ fast_exception_return:
 	beq	3f	/* if not, we've got problems */
 #endif

-2:	REST_GPRS(3, 6, r11)
-	lwz	r10,_CCR(r11)
-	REST_GPRS(1, 2, r11)
+2:	lwz	r10,_CCR(r11)
+	REST_GPRS(1, 6, r11)
 	mtcr	r10
 	lwz	r10,_LINK(r11)
 	mtlr	r10
@@ -277,7 +274,7 @@ fast_exception_return:
 	mtspr	SPRN_SRR0,r12
 	REST_GPR(9, r11)
 	REST_GPR(12, r11)
-	lwz	r11,GPR11(r11)
+	REST_GPR(11, r11)
 	rfi
 #ifdef CONFIG_40x
 	b	.	/* Prevent prefetch past rfi */
@@ -454,9 +451,8 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return)
 	lwz	r3,_MSR(r1);					\
 	andi.	r3,r3,MSR_PR;					\
 	bne	interrupt_return;				\
-	lwz	r0,GPR0(r1);					\
-	lwz	r2,GPR2(r1);					\
-	REST_GPRS(3, 8, r1);					\
+	REST_GPR(0, r1);					\
+	REST_GPRS(2, 8, r1);					\
 	lwz	r10,_XER(r1);					\
 	lwz	r11,_CTR(r1);					\
 	mtspr	SPRN_XER,r10;					\
@@ -475,11 +471,8 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return)
 	lwz	r12,_MSR(r1);					\
 	mtspr	exc_lvl_srr0,r11;				\
 	mtspr	exc_lvl_srr1,r12;				\
-	lwz	r9,GPR9(r1);					\
-	lwz	r12,GPR12(r1);					\
-	lwz	r10,GPR10(r1);					\
-	lwz	r11,GPR11(r1);					\
-	lwz	r1,GPR1(r1);					\
+	REST_GPRS(9, 12, r1);					\
+	REST_GPR(1, r1);					\
 	exc_lvl_rfi;						\
 	b	.;	/* prevent prefetch past exc_lvl_rfi */
--
2.34.1
[PATCH v6 04/25] powerpc/64s: Use {ZEROIZE,SAVE,REST}_GPRS macros in sc, scv 0 handlers
Use the convenience macros for saving/clearing/restoring gprs in keeping
with syscall calling conventions. The plural variants of these macros
can store a range of registers for concision.

This works well when the user gpr value we are hoping to save is still
live. In the syscall interrupt handlers, user register state is
sometimes juggled between registers. Hold off from issuing the SAVE_GPR
macro for applicable neighbouring lines to highlight the delicate
register save logic.

Signed-off-by: Rohan McLure
Reviewed-by: Nicholas Piggin
---
V2: Update summary
V3: Update summary regarding exclusions for the SAVE_GPR macro.
Acknowledge new name for ZEROIZE_GPR{,S} macros.
V5: Move to beginning of series
---
 arch/powerpc/kernel/interrupt_64.S | 43 +++++++++----------------------------------
 1 file changed, 9 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
index 71d2d9497283..7d92a7a54727 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -71,12 +71,7 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
 	mfcr	r12
 	li	r11,0
 	/* Can we avoid saving r3-r8 in common case? */
-	std	r3,GPR3(r1)
-	std	r4,GPR4(r1)
-	std	r5,GPR5(r1)
-	std	r6,GPR6(r1)
-	std	r7,GPR7(r1)
-	std	r8,GPR8(r1)
+	SAVE_GPRS(3, 8, r1)
 	/* Zero r9-r12, this should only be required when restoring all GPRs */
 	std	r11,GPR9(r1)
 	std	r11,GPR10(r1)
@@ -149,17 +144,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	/* Could zero these as per ABI, but we may consider a stricter ABI
 	 * which preserves these if libc implementations can benefit, so
 	 * restore them for now until further measurement is done. */
-	ld	r0,GPR0(r1)
-	ld	r4,GPR4(r1)
-	ld	r5,GPR5(r1)
-	ld	r6,GPR6(r1)
-	ld	r7,GPR7(r1)
-	ld	r8,GPR8(r1)
+	REST_GPR(0, r1)
+	REST_GPRS(4, 8, r1)
 	/* Zero volatile regs that may contain sensitive kernel data */
-	li	r9,0
-	li	r10,0
-	li	r11,0
-	li	r12,0
+	ZEROIZE_GPRS(9, 12)
 	mtspr	SPRN_XER,r0
 	/*
@@ -182,7 +170,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	ld	r5,_XER(r1)

 	REST_NVGPRS(r1)
-	ld	r0,GPR0(r1)
+	REST_GPR(0, r1)
 	mtcr	r2
 	mtctr	r3
 	mtlr	r4
@@ -250,12 +238,7 @@ END_BTB_FLUSH_SECTION
 	mfcr	r12
 	li	r11,0
 	/* Can we avoid saving r3-r8 in common case? */
-	std	r3,GPR3(r1)
-	std	r4,GPR4(r1)
-	std	r5,GPR5(r1)
-	std	r6,GPR6(r1)
-	std	r7,GPR7(r1)
-	std	r8,GPR8(r1)
+	SAVE_GPRS(3, 8, r1)
 	/* Zero r9-r12, this should only be required when restoring all GPRs */
 	std	r11,GPR9(r1)
 	std	r11,GPR10(r1)
@@ -345,16 +328,8 @@ END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
 	cmpdi	r3,0
 	bne	.Lsyscall_restore_regs
 	/* Zero volatile regs that may contain sensitive kernel data */
-	li	r0,0
-	li	r4,0
-	li	r5,0
-	li	r6,0
-	li	r7,0
-	li	r8,0
-	li	r9,0
-	li	r10,0
-	li	r11,0
-	li	r12,0
+	ZEROIZE_GPR(0)
+	ZEROIZE_GPRS(4, 12)
 	mtctr	r0
 	mtspr	SPRN_XER,r0
 .Lsyscall_restore_regs_cont:
@@ -380,7 +355,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	REST_NVGPRS(r1)
 	mtctr	r3
 	mtspr	SPRN_XER,r4
-	ld	r0,GPR0(r1)
+	REST_GPR(0, r1)
 	REST_GPRS(4, 12, r1)
 	b	.Lsyscall_restore_regs_cont
 .Lsyscall_rst_end:
--
2.34.1
[PATCH v6 03/25] powerpc: Add ZEROIZE_GPRS macros for register clears
Provide register zeroing macros, following the same convention as
existing register stack save/restore macros, to be used in a later
change to concisely zero a sequence of consecutive gprs.

The resulting macros are called ZEROIZE_GPRS and ZEROIZE_NVGPRS, in
keeping with the naming of the accompanying restore and save macros, and
usage of "zeroize" to describe this operation elsewhere in the kernel.

Signed-off-by: Rohan McLure
Reviewed-by: Nicholas Piggin
---
V2: Change 'ZERO' usage in naming to 'NULLIFY', a more obvious verb
V3: Change 'NULLIFY' usage in naming to 'ZEROIZE', which has precedent
in the kernel and explicitly specifies that we are zeroing.
V4: Update commit message to use zeroize.
V5: The reason for the patch is to add zeroize macros. Move that to
first paragraph in patch description.
---
 arch/powerpc/include/asm/ppc_asm.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index 83c02f5a7f2a..b95689ada59c 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -33,6 +33,20 @@
 	.endr
 .endm

+/*
+ * This expands to a sequence of register clears for regs start to end
+ * inclusive, of the form:
+ *
+ *   li rN, 0
+ */
+.macro ZEROIZE_REGS start, end
+	.Lreg=\start
+	.rept (\end - \start + 1)
+	li	.Lreg, 0
+	.Lreg=.Lreg+1
+	.endr
+.endm
+
 /*
  * Macros for storing registers into and loading registers from
  * exception frames.
@@ -49,6 +63,14 @@
 #define	REST_NVGPRS(base)		REST_GPRS(13, 31, base)
 #endif

+#define	ZEROIZE_GPRS(start, end)	ZEROIZE_REGS start, end
+#ifdef __powerpc64__
+#define	ZEROIZE_NVGPRS()		ZEROIZE_GPRS(14, 31)
+#else
+#define	ZEROIZE_NVGPRS()		ZEROIZE_GPRS(13, 31)
+#endif
+#define	ZEROIZE_GPR(n)			ZEROIZE_GPRS(n, n)
+
 #define SAVE_GPR(n, base)		SAVE_GPRS(n, n, base)
 #define REST_GPR(n, base)		REST_GPRS(n, n, base)
--
2.34.1
[PATCH v6 02/25] powerpc: Save caller r3 prior to system_call_exception
This reverts commit 8875f47b7681 ("powerpc/syscall: Save r3 in
regs->orig_r3").

Save the caller's original r3 state to the kernel stackframe before
entering system_call_exception. This allows for user registers to be
cleared by the time system_call_exception is entered, reducing the
influence of user registers on speculation within the kernel.

Prior to this commit, orig_r3 was saved at the beginning of
system_call_exception. Instead, save orig_r3 while the user value is
still live in r3.

Also replicate this early save in 32-bit. A similar save was removed in
commit 6f76a01173cc ("powerpc/syscall: implement system call entry/exit
logic in C for PPC32") when 32-bit adopted system_call_exception. Revert
its removal of orig_r3 saves.

Signed-off-by: Rohan McLure
Reviewed-by: Nicholas Piggin
---
V3: New commit.
V5: New commit message, as we do more than just revert 8875f47b7681.
---
 arch/powerpc/kernel/entry_32.S     | 1 +
 arch/powerpc/kernel/interrupt_64.S | 2 ++
 arch/powerpc/kernel/syscall.c      | 1 -
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 1d599df6f169..44dfce9a60c5 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -101,6 +101,7 @@ __kuep_unlock:
 	.globl	transfer_to_syscall
 transfer_to_syscall:
+	stw	r3, ORIG_GPR3(r1)
 	stw	r11, GPR1(r1)
 	stw	r11, 0(r1)
 	mflr	r12
diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
index ce25b28cf418..71d2d9497283 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -91,6 +91,7 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
 	li	r11,\trapnr
 	std	r11,_TRAP(r1)
 	std	r12,_CCR(r1)
+	std	r3,ORIG_GPR3(r1)
 	addi	r10,r1,STACK_FRAME_OVERHEAD
 	ld	r11,exception_marker@toc(r2)
 	std	r11,-16(r10)		/* "regshere" marker */
@@ -275,6 +276,7 @@ END_BTB_FLUSH_SECTION
 	std	r10,_LINK(r1)
 	std	r11,_TRAP(r1)
 	std	r12,_CCR(r1)
+	std	r3,ORIG_GPR3(r1)
 	addi	r10,r1,STACK_FRAME_OVERHEAD
 	ld	r11,exception_marker@toc(r2)
 	std	r11,-16(r10)		/* "regshere" marker */
diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
index 81ace9e8b72b..64102a64fd84 100644
--- a/arch/powerpc/kernel/syscall.c
+++ b/arch/powerpc/kernel/syscall.c
@@ -25,7 +25,6 @@ notrace long system_call_exception(long r3, long r4, long r5,
 	kuap_lock();
 	add_random_kstack_offset();
-	regs->orig_gpr3 = r3;

 	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
 		BUG_ON(irq_soft_mask_return() != IRQS_ALL_DISABLED);
--
2.34.1
Re: [PATCH v6 2/8] dt-bindings: phy: Add Lynx 10G phy binding
On Tue, 20 Sep 2022 16:23:50 -0400, Sean Anderson wrote: > This adds a binding for the SerDes module found on QorIQ processors. > Each phy is a subnode of the top-level device, possibly supporting > multiple lanes and protocols. This "thick" #phy-cells is used due to > allow for better organization of parameters. Note that the particular > parameters necessary to select a protocol-controller/lane combination > vary across different SoCs, and even within different SerDes on the same > SoC. > > The driver is designed to be able to completely reconfigure lanes at > runtime. Generally, the phy consumer can select the appropriate > protocol using set_mode. > > There are two PLLs, each of which can be used as the master clock for > each lane. Each PLL has its own reference. For the moment they are > required, because it simplifies the driver implementation. Absent > reference clocks can be modeled by a fixed-clock with a rate of 0. > > Signed-off-by: Sean Anderson > Reviewed-by: Rob Herring > --- > > Changes in v6: > - fsl,type -> phy-type > > Changes in v4: > - Use subnodes to describe lane configuration, instead of describing > PCCRs. This is the same style used by phy-cadence-sierra et al. > > Changes in v3: > - Manually expand yaml references > - Add mode configuration to device tree > > Changes in v2: > - Rename to fsl,lynx-10g.yaml > - Refer to the device in the documentation, rather than the binding > - Move compatible first > - Document phy cells in the description > - Allow a value of 1 for phy-cells. This allows for compatibility with > the similar (but according to Ioana Ciornei different enough) lynx-28g > binding. > - Remove minItems > - Use list for clock-names > - Fix example binding having too many cells in regs > - Add #clock-cells. This will allow using assigned-clocks* to configure > the PLLs. 
> - Document the structure of the compatible strings > > .../devicetree/bindings/phy/fsl,lynx-10g.yaml | 236 ++ > 1 file changed, 236 insertions(+) > create mode 100644 Documentation/devicetree/bindings/phy/fsl,lynx-10g.yaml > My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check' on your patch (DT_CHECKER_FLAGS is new in v5.13): yamllint warnings/errors: dtschema/dtc warnings/errors: Error: Documentation/devicetree/bindings/phy/fsl,lynx-10g.example.dts:51.27-28 syntax error FATAL ERROR: Unable to parse input tree make[1]: *** [scripts/Makefile.lib:384: Documentation/devicetree/bindings/phy/fsl,lynx-10g.example.dtb] Error 1 make[1]: *** Waiting for unfinished jobs make: *** [Makefile:1420: dt_binding_check] Error 2 doc reference errors (make refcheckdocs): See https://patchwork.ozlabs.org/patch/ This check can fail if there are any dependencies. The base for a patch series is generally the most recent rc1. If you already ran 'make dt_binding_check' and didn't see the above error(s), then make sure 'yamllint' is installed and dt-schema is up to date: pip3 install dtschema --upgrade Please check and re-submit.
[PATCH v6 01/25] powerpc: Remove asmlinkage from syscall handler definitions
The asmlinkage macro has no special meaning in powerpc, and prior to
this patch is used sporadically on some syscall handler definitions. On
architectures that do not define asmlinkage, it resolves to extern "C"
for C++ compilers and a nop otherwise. The current invocations of
asmlinkage provide far from complete support for C++ toolchains, and so
the macro serves no purpose in powerpc.

Remove all invocations of asmlinkage in arch/powerpc. These incidentally
only occur in syscall definitions and prototypes.

Signed-off-by: Rohan McLure
Reviewed-by: Nicholas Piggin
Reviewed-by: Andrew Donnellan
---
V3: new patch
---
 arch/powerpc/include/asm/syscalls.h | 16 ++++++++--------
 arch/powerpc/kernel/sys_ppc32.c     |  8 ++++----
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/syscalls.h b/arch/powerpc/include/asm/syscalls.h
index a2b13e55254f..21c2faaa2957 100644
--- a/arch/powerpc/include/asm/syscalls.h
+++ b/arch/powerpc/include/asm/syscalls.h
@@ -10,14 +10,14 @@

 struct rtas_args;

-asmlinkage long sys_mmap(unsigned long addr, size_t len,
-		unsigned long prot, unsigned long flags,
-		unsigned long fd, off_t offset);
-asmlinkage long sys_mmap2(unsigned long addr, size_t len,
-		unsigned long prot, unsigned long flags,
-		unsigned long fd, unsigned long pgoff);
-asmlinkage long ppc64_personality(unsigned long personality);
-asmlinkage long sys_rtas(struct rtas_args __user *uargs);
+long sys_mmap(unsigned long addr, size_t len,
+	      unsigned long prot, unsigned long flags,
+	      unsigned long fd, off_t offset);
+long sys_mmap2(unsigned long addr, size_t len,
+	       unsigned long prot, unsigned long flags,
+	       unsigned long fd, unsigned long pgoff);
+long ppc64_personality(unsigned long personality);
+long sys_rtas(struct rtas_args __user *uargs);
 int ppc_select(int n, fd_set __user *inp, fd_set __user *outp,
 	       fd_set __user *exp, struct __kernel_old_timeval __user *tvp);
 long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c
index 16ff0399a257..f4edcc9489fb 100644
--- a/arch/powerpc/kernel/sys_ppc32.c
+++ b/arch/powerpc/kernel/sys_ppc32.c
@@ -85,20 +85,20 @@ compat_ssize_t compat_sys_readahead(int fd, u32 r4, u32 offset1, u32 offset2, u3
 	return ksys_readahead(fd, merge_64(offset1, offset2), count);
 }

-asmlinkage int compat_sys_truncate64(const char __user * path, u32 reg4,
+int compat_sys_truncate64(const char __user * path, u32 reg4,
 		unsigned long len1, unsigned long len2)
 {
 	return ksys_truncate(path, merge_64(len1, len2));
 }

-asmlinkage long compat_sys_fallocate(int fd, int mode, u32 offset1, u32 offset2,
+long compat_sys_fallocate(int fd, int mode, u32 offset1, u32 offset2,
 		u32 len1, u32 len2)
 {
 	return ksys_fallocate(fd, mode, ((loff_t)offset1 << 32) | offset2,
 		merge_64(len1, len2));
 }

-asmlinkage int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long len1,
+int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long len1,
 		unsigned long len2)
 {
 	return ksys_ftruncate(fd, merge_64(len1, len2));
@@ -111,7 +111,7 @@ long ppc32_fadvise64(int fd, u32 unused, u32 offset1, u32 offset2,
 		advice);
 }

-asmlinkage long compat_sys_sync_file_range2(int fd, unsigned int flags,
+long compat_sys_sync_file_range2(int fd, unsigned int flags,
 		unsigned offset1, unsigned offset2,
 		unsigned nbytes1, unsigned nbytes2)
 {
--
2.34.1
[PATCH v6 00/25] powerpc: Syscall wrapper and register clearing
V5 available here:
Link: https://lore.kernel.org/all/20220916053300.786330-2-rmcl...@linux.ibm.com/T/

Implement a syscall wrapper, causing arguments to handlers to be passed
via a struct pt_regs on the stack. The syscall wrapper is implemented
for all platforms other than the Cell processor, from which SPUs expect
the ability to directly call syscall handler symbols with the regular
in-register calling convention.

Adopting syscall wrappers requires redefinition of architecture-specific
syscalls and compatibility syscalls to use the SYSCALL_DEFINE and
COMPAT_SYSCALL_DEFINE macros, as well as removal of direct references to
the emitted syscall-handler symbols from within the kernel. This work
led to the following modernisations of powerpc's syscall handlers:

 - Replace syscall 82 semantics with sys_old_select and remove the
   ppc_select handler, which features a direct call to both
   sys_old_select and sys_select.
 - Use a generic fallocate compatibility syscall.

Replace the asm implementation of the syscall table with a C
implementation for more compile-time checks.

Many compatibility syscalls are candidates to be removed in favour of
generically defined handlers, but exhibit different parameter orderings
and numberings due to 32-bit ABI support for 64-bit parameters. The
parameter reorderings are however consistent with arm. A future patch
series will serve to modernise syscalls by providing generic
implementations featuring these reorderings.

The design of this syscall wrapper is very similar to the s390, x86 and
arm64 implementations. See also commit 4378a7d4be30 ("arm64: implement
syscall wrappers").

The motivation for this change is that it allows for the clearing of
register state when entering the kernel through interrupt handlers on
64-bit servers. This serves to reduce the influence of values in
registers carried over from the interrupted process, e.g. syscall
parameters from user space, or user state at the site of a pagefault.

All values in registers are saved and zeroized at the entry to an
interrupt handler and restored afterward. While this may sound like a
heavy-weight mitigation, many gprs are already saved and restored on
handling of an interrupt, and the mmap_bench benchmark on a Power9
guest, repeatedly invoking the pagefault handler, suggests at most a
~0.8% regression in performance. Realistic workloads are not constantly
producing interrupts, and so this does not indicate a realistic
slowdown.

Using wrapped syscalls yields a performance improvement of ~5.6% on the
null_syscall benchmark on pseries guests, by removing the need for
system_call_exception to allocate its own stack frame. This amortises
the additional costs of saving and restoring non-volatile registers
(register clearing is cheap on superscalar platforms), and so the final
mitigation actually yields a net performance improvement of ~0.6% on the
null_syscall benchmark.

The clearing of general purpose registers on interrupts other than
syscalls is enabled by default only on Book3E 64-bit systems (where the
mitigation is inexpensive), but is available to other 64-bit systems via
the INTERRUPT_SANITIZE_REGISTERS Kconfig option. This mitigation is
optional, as the speculation influence of interrupts is likely less than
that of syscalls.

Patch Changelog:

 - Clears and restores of NVGPRS that depend on either SANITIZE or
   !SANITIZE have convenience macros.
 - Split syscall wrapper patch with change to system_call_exception
   calling convention.
 - In 64e, exchange misleading REST_GPRS(0, 1, r1) to clearly restore r0
   first and avoid clobbering the stack pointer.
 - Remove extraneous clears of the high-order words of syscall
   parameters, which is now redundant thanks to use of
   COMPAT_SYSCALL_DEFINE everywhere.
Rohan McLure (25): powerpc: Remove asmlinkage from syscall handler definitions powerpc: Save caller r3 prior to system_call_exception powerpc: Add ZEROIZE_GPRS macros for register clears powerpc/64s: Use {ZEROIZE,SAVE,REST}_GPRS macros in sc, scv 0 handlers powerpc/32: Clarify interrupt restores with REST_GPR macro in entry_32.S powerpc/64e: Clarify register saves and clears with {SAVE,ZEROIZE}_GPRS powerpc/64s: Fix comment on interrupt handler prologue powerpc: Fix fallocate and fadvise64_64 compat parameter combination asm-generic: compat: Support BE for long long args in 32-bit ABIs powerpc: Use generic fallocate compatibility syscall powerpc/32: Remove powerpc select specialisation powerpc: Remove direct call to personality syscall handler powerpc: Remove direct call to mmap2 syscall handlers powerpc: Provide do_ppc64_personality helper powerpc: Adopt SYSCALL_DEFINE for arch-specific syscall handlers powerpc: Include all arch-specific syscall prototypes powerpc: Enable compile-time check for syscall handlers powerpc: Use common syscall handler type powerpc: Remove high-order word clearing on compat syscall entry powerpc: Change system_call_exception calling convention
Re: [PATCH v3 4/4] arm64: support batched/deferred tlb shootdown during page reclamation
On 8/22/22 13:51, Yicong Yang wrote:
> +static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
> +					struct mm_struct *mm,
> +					unsigned long uaddr)
> +{
> +	__flush_tlb_page_nosync(mm, uaddr);
> +}
> +
> +static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
> +{
> +	dsb(ish);
> +}

Just wondering if arch_tlbbatch_add_mm() could also detect contiguous
TLB invalidation requests on a given mm and try to generate a
range-based TLB invalidation such as flush_tlb_range().

struct arch_tlbflush_unmap_batch, via task->tlb_ubc->arch, could track
contiguous ranges while requests are being queued up via
arch_tlbbatch_add_mm(); any range formed could later be flushed in the
subsequent arch_tlbbatch_flush()?

Or might it not be worth the effort and complexity, compared with the
performance improvement a TLB range flush brings?
[powerpc] Kernel crash with THP tests (next-20220920)
While running transparent huge page tests [1] against 6.0.0-rc6-next-20220920 following crash is seen on IBM Power server. Kernel attempted to read user page (34) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x0034 Faulting instruction address: 0xc04d2744 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: dm_mod(E) bonding(E) rfkill(E) tls(E) sunrpc(E) nd_pmem(E) nd_btt(E) dax_pmem(E) papr_scm(E) libnvdimm(E) pseries_rng(E) vmx_crypto(E) ext4(E) mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) crc64_rocksoft(E) crc64(E) sg(E) ibmvscsi(E) scsi_transport_srp(E) ibmveth(E) fuse(E) CPU: 37 PID: 2219255 Comm: sysctl Tainted: GE 6.0.0-rc6-next-20220920 #1 NIP: c04d2744 LR: c04d2734 CTR: REGS: c012801bf660 TRAP: 0300 Tainted: GE (6.0.0-rc6-next-20220920) MSR: 80009033 CR: 24048222 XER: 2004 CFAR: c04b0eac DAR: 0034 DSISR: 4000 IRQMASK: 0 GPR00: c04d2734 c012801bf900 c2a92300 GPR04: c2ac8ac0 c1209340 0005 c01286714b80 GPR08: 0034 GPR12: 28048242 c0167fff6b00 GPR16: GPR20: c012801bfae8 0001 0100 0001 GPR24: c012801bfae8 c2ac8ac0 0002 0005 GPR28: 0001 00346cca NIP [c04d2744] alloc_buddy_huge_page+0xd4/0x240 LR [c04d2734] alloc_buddy_huge_page+0xc4/0x240 Call Trace: [c012801bf900] [c04d2734] alloc_buddy_huge_page+0xc4/0x240 (unreliable) [c012801bf9b0] [c04d46a4] alloc_fresh_huge_page.part.72+0x214/0x2a0 [c012801bfa40] [c04d7f88] alloc_pool_huge_page+0x118/0x190 [c012801bfa90] [c04d84dc] __nr_hugepages_store_common+0x4dc/0x610 [c012801bfb70] [c04d88bc] hugetlb_sysctl_handler_common+0x13c/0x180 [c012801bfc10] [c06380e0] proc_sys_call_handler+0x210/0x350 [c012801bfc90] [c0551c00] vfs_write+0x2e0/0x460 [c012801bfd50] [c0551f5c] ksys_write+0x7c/0x140 [c012801bfda0] [c0033f58] system_call_exception+0x188/0x3f0 [c012801bfe10] [c000c53c] system_call_common+0xec/0x270 --- interrupt: c00 at 0x7fffa9520c34 NIP: 7fffa9520c34 LR: 0001024754bc CTR: REGS: c012801bfe80 TRAP: 0c00 Tainted: GE 
(6.0.0-rc6-next-20220920) MSR: 8280f033 CR: 28002202 XER: IRQMASK: 0 GPR00: 0004 7fffccd76cd0 7fffa9607300 0003 GPR04: 000138da6970 0006 fff6 GPR08: 000138da6970 GPR12: 7fffa9a40940 GPR16: GPR20: GPR24: 0001 0010 0006 000138da8aa0 GPR28: 7fffa95fc2c8 000138da8aa0 0006 000138da6930 NIP [7fffa9520c34] 0x7fffa9520c34 LR [0001024754bc] 0x1024754bc --- interrupt: c00 Instruction dump: 3b42 3ba1 3b80 7f26cb78 7fc5f378 7f64db78 7fe3fb78 4bfde5b9 6000 7c691b78 39030034 7c0004ac <7d404028> 7c0ae800 40c20010 7f80412d ---[ end trace ]--- Kernel panic - not syncing: Fatal exception Bisect points to following patch: commit f2f3c25dea3acfb17aecb7273541e7266dfc8842 hugetlb: freeze allocated pages before creating hugetlb pages Reverting the patch allows the test to run successfully. Thanks - Sachin [1] https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/transparent_hugepages_defrag.py