Re: [PATCH 2/4] ibmvfc: fix invalid sub-CRQ handles after hard reset

2021-02-16 Thread Brian King
Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



Re: [PATCH 3/4] ibmvfc: treat H_CLOSED as success during sub-CRQ registration

2021-02-16 Thread Brian King
Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



Re: [PATCH 4/4] ibmvfc: store return code of H_FREE_SUB_CRQ during cleanup

2021-02-16 Thread Brian King
On 2/11/21 12:57 PM, Tyrel Datwyler wrote:
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index ba6fcf9cbc57..23b803ac4a13 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -5670,7 +5670,7 @@ static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost,
>  
>  irq_failed:
>  	do {
> -		plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, scrq->cookie);
> +		rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, scrq->cookie);
>  	} while (rc == H_BUSY || H_IS_LONG_BUSY(rc));

Other places in the driver where we get a busy return code back, we have an
msleep(100). Should we be doing that here as well?
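Something along these lines, perhaps (a sketch only; 100 ms is just what the
driver's other retry loops use):

    irq_failed:
    	do {
    		rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, scrq->cookie);
    		/* back off before retrying on busy, as the other loops do */
    		if (rc == H_BUSY || H_IS_LONG_BUSY(rc))
    			msleep(100);
    	} while (rc == H_BUSY || H_IS_LONG_BUSY(rc));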

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



Re: [PATCH 1/4] add generic builtin command line

2021-02-16 Thread Christophe Leroy

Daniel Gimpelevich wrote:


On Thu, 2019-03-21 at 15:15 -0700, Andrew Morton wrote:

On Thu, 21 Mar 2019 08:13:08 -0700 Daniel Walker  wrote:
> On Wed, Mar 20, 2019 at 08:14:33PM -0700, Andrew Morton wrote:
> > The patches (or some version of them) are already in linux-next,
> > which messes me up.  I'll disable them for now.
>
> Those are from my tree, but I removed them when you picked up the series.
> The next linux-next should not have them.

Yup, thanks, all looks good now.


This patchset is currently neither in mainline nor in -next. May I ask
what happened to it? Thanks.


As far as I remember, there has been a lot of discussion around this series.

As of today, it doesn't apply cleanly anymore and would require rebasing.

I'd also suggest finding good arguments to convince us that this
series has real added value, not just "Cisco uses it in its kernels
so it is good".


I proposed an alternative at
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1554195798.git.christophe.le...@c-s.fr/
but never got any feedback, so I gave up.


If you submit a new series, don't forget to copy the linuxppc-dev and
linux-arch lists.


Christophe




Re: [PATCH 1/4] add generic builtin command line

2021-02-16 Thread Daniel Walker
On Mon, Feb 15, 2021 at 11:32:01AM -0800, Daniel Gimpelevich wrote:
> On Thu, 2019-03-21 at 15:15 -0700, Andrew Morton wrote:
> > On Thu, 21 Mar 2019 08:13:08 -0700 Daniel Walker  wrote:
> > > On Wed, Mar 20, 2019 at 08:14:33PM -0700, Andrew Morton wrote:
> > > > The patches (or some version of them) are already in linux-next,
> > > > which messes me up.  I'll disable them for now.
> > >  
> > > Those are from my tree, but I removed them when you picked up the series.
> > > The next linux-next should not have them.
> > 
> > Yup, thanks, all looks good now.
> 
> This patchset is currently neither in mainline nor in -next. May I ask
> what happened to it? Thanks.
> 

It was dropped silently by Andrew at some point. I wasn't watching -next closely
to know when. I have no idea why he dropped it.

We still use this series extensively at Cisco, and have extended it beyond the
current series.

We can re-submit.

Daniel


Re: [PATCH 4/4] ibmvfc: store return code of H_FREE_SUB_CRQ during cleanup

2021-02-16 Thread Tyrel Datwyler
On 2/16/21 6:58 AM, Brian King wrote:
> On 2/11/21 12:57 PM, Tyrel Datwyler wrote:
>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>> index ba6fcf9cbc57..23b803ac4a13 100644
>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>> @@ -5670,7 +5670,7 @@ static int ibmvfc_register_scsi_channel(struct 
>> ibmvfc_host *vhost,
>>  
>>  irq_failed:
>>  do {
>> -plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, 
>> scrq->cookie);
>> +rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, 
>> scrq->cookie);
>>  } while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
> 
> Other places in the driver where we get a busy return code back, we have an
> msleep(100). Should we be doing that here as well?

Indeed, and actually even better would be to use rtas_busy_delay(), which
performs the sleep with the correct ms delay and marks itself with the
might_sleep() macro.
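Roughly (a sketch, assuming rtas_busy_delay()'s usual contract of sleeping
for the delay the status encodes and returning nonzero while a retry is
still needed):

    irq_failed:
    	do {
    		rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, scrq->cookie);
    		/* rtas_busy_delay() sleeps the right amount for busy/extended-delay
    		 * statuses, so the backoff and the loop condition collapse into
    		 * one call */
    	} while (rtas_busy_delay(rc));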

-Tyrel

> 
> Thanks,
> 
> Brian
> 



Re: [PATCH 1/4] add generic builtin command line

2021-02-16 Thread Daniel Gimpelevich
On Tue, 2021-02-16 at 18:42 +0100, Christophe Leroy wrote:
> I'd also suggest finding good arguments to convince us that this
> series has real added value, not just "Cisco uses it in its kernels
> so it is good".

Well, IIRC, this series was endorsed by the device tree maintainers as
the preferred alternative to this:

https://lore.kernel.org/linux-devicetree/1565020400-25679-1-git-send-email-dan...@gimpelevich.san-francisco.ca.us/T/#u

The now-defunct patchwork.linux-mips.org link in that thread pointed to:

https://lore.kernel.org/linux-mips/1510796793.16864.25.camel@chimera/T/#u

When running modern kernels from ancient vendor bootloaders, it is
sometimes necessary to pick and choose bits and pieces of the info they
pass without taking it verbatim.
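The builtin command line gives you exactly that, roughly like so (a sketch of
the general idea only; CONFIG_CMDLINE_PREPEND and CONFIG_CMDLINE_APPEND are
hypothetical names, not the series' actual symbols):

    /* Combine what the bootloader passed with compile-time builtin pieces,
     * so bogus bootloader arguments can be overridden rather than taken
     * verbatim. */
    static void build_cmdline(char *dst, const char *bootloader_args, size_t len)
    {
    	strlcpy(dst, CONFIG_CMDLINE_PREPEND " ", len);	/* hypothetical */
    	strlcat(dst, bootloader_args, len);
    	strlcat(dst, " " CONFIG_CMDLINE_APPEND, len);	/* hypothetical */
    }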



Re: [PATCH kernel 1/2] powerpc/iommu: Allocate it_map by vmalloc

2021-02-16 Thread David Gibson
On Tue, Feb 16, 2021 at 02:33:06PM +1100, Alexey Kardashevskiy wrote:
> The IOMMU table uses the it_map bitmap to keep track of allocated DMA
> pages. This has always been a contiguous array allocated at either
> the boot time or when a passed through device is returned to the host OS.
> The it_map memory is allocated by alloc_pages() which allocates
> contiguous physical memory.
> 
> Such allocation method occasionally creates a problem when there is
> no big chunk of memory available (no free memory or too fragmented).
> On powernv/ioda2 the default DMA window requires 16MB for it_map.
> 
> This replaces alloc_pages_node() with vzalloc_node(), which allocates a
> contiguous block, but in virtual memory. This should reduce the chances of
> failure, but should not cause other behavioral changes, as it_map is only
> used by the kernel's DMA hooks/API when the MMU is on.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/kernel/iommu.c | 15 +++
>  1 file changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index c00214a4355c..8eb6eb0afa97 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -719,7 +719,6 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>  {
>   unsigned long sz;
>   static int welcomed = 0;
> - struct page *page;
>   unsigned int i;
>   struct iommu_pool *p;
>  
> @@ -728,11 +727,9 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>   /* number of bytes needed for the bitmap */
>   sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
>  
> - page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz));
> - if (!page)
> + tbl->it_map = vzalloc_node(sz, nid);
> + if (!tbl->it_map)
>   panic("iommu_init_table: Can't allocate %ld bytes\n", sz);
> - tbl->it_map = page_address(page);
> - memset(tbl->it_map, 0, sz);
>  
>   iommu_table_reserve_pages(tbl, res_start, res_end);
>  
> @@ -774,8 +771,6 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>  
>  static void iommu_table_free(struct kref *kref)
>  {
> - unsigned long bitmap_sz;
> - unsigned int order;
>   struct iommu_table *tbl;
>  
>   tbl = container_of(kref, struct iommu_table, it_kref);
> @@ -796,12 +791,8 @@ static void iommu_table_free(struct kref *kref)
>   if (!bitmap_empty(tbl->it_map, tbl->it_size))
>   pr_warn("%s: Unexpected TCEs\n", __func__);
>  
> - /* calculate bitmap size in bytes */
> - bitmap_sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
> -
>   /* free bitmap */
> - order = get_order(bitmap_sz);
> - free_pages((unsigned long) tbl->it_map, order);
> + vfree(tbl->it_map);
>  
>   /* free table */
>   kfree(tbl);

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v4 2/3] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE

2021-02-16 Thread David Gibson
On Mon, Feb 15, 2021 at 12:05:41PM +0530, Bharata B Rao wrote:
> Implement H_RPT_INVALIDATE hcall and add KVM capability
> KVM_CAP_PPC_RPT_INVALIDATE to indicate the support for the same.
> 
> This hcall does two types of TLB invalidations:
> 
> 1. Process-scoped invalidations for guests with LPCR[GTSE]=0.
>This is currently not used in KVM as GTSE is not usually
>disabled in KVM.
> 2. Partition-scoped invalidations that an L1 hypervisor does on
>behalf of an L2 guest. This replaces the uses of the existing
>hcall H_TLB_INVALIDATE.
> 
> In order to handle process scoped invalidations of L2, we
> intercept the nested exit handling code in L0 only to handle
> H_TLB_INVALIDATE hcall.
> 
> Signed-off-by: Bharata B Rao 
> ---
>  Documentation/virt/kvm/api.rst | 17 +
>  arch/powerpc/include/asm/kvm_book3s.h  |  3 +
>  arch/powerpc/include/asm/mmu_context.h | 11 +++
>  arch/powerpc/kvm/book3s_hv.c   | 91 
>  arch/powerpc/kvm/book3s_hv_nested.c| 96 ++
>  arch/powerpc/kvm/powerpc.c |  3 +
>  arch/powerpc/mm/book3s64/radix_tlb.c   | 25 +++
>  include/uapi/linux/kvm.h   |  1 +
>  8 files changed, 247 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 99ceb978c8b0..416c36aa35d4 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6038,6 +6038,23 @@ KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit notifications which user space
>  can then handle to implement model specific MSR handling and/or user notifications
>  to inform a user that an MSR was not handled.
>  
> +7.22 KVM_CAP_PPC_RPT_INVALIDATE
> +-------------------------------
> +
> +:Capability: KVM_CAP_PPC_RPT_INVALIDATE
> +:Architectures: ppc
> +:Type: vm
> +
> +This capability indicates that the kernel is capable of handling the
> +H_RPT_INVALIDATE hcall.
> +
> +In order to enable the use of H_RPT_INVALIDATE in the guest,
> +user space might have to advertise it for the guest. For example,
> +an IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is
> +present in the "ibm,hypertas-functions" device-tree property.
> +
> +This capability is always enabled.

I guess that means it's always enabled when it's available - I'm
pretty sure it won't be enabled on POWER8 or on PR KVM.
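(For completeness: userspace would probe it like any other capability via
KVM_CHECK_EXTENSION before advertising "hcall-rpt-invalidate"; a minimal
sketch, assuming the capability number from this patch is in the installed
headers:)

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Returns nonzero once this kernel handles H_RPT_INVALIDATE; only then
     * should userspace expose "hcall-rpt-invalidate" in the guest's
     * ibm,hypertas-functions device-tree property. */
    static int have_rpt_invalidate(int kvm_fd)
    {
    	return ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_RPT_INVALIDATE) > 0;
    }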

> +
>  8. Other capabilities.
>  ======================
>  
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> index d32ec9ae73bd..0f1c5fa6e8ce 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -298,6 +298,9 @@ void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1);
>  void kvmhv_release_all_nested(struct kvm *kvm);
>  long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
>  long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu);
> +long kvmhv_h_rpti_nested(struct kvm_vcpu *vcpu, unsigned long lpid,
> +  unsigned long type, unsigned long pg_sizes,
> +  unsigned long start, unsigned long end);
>  int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu,
> u64 time_limit, unsigned long lpcr);
>  void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr);
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index d5821834dba9..fbf3b5b45fe9 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -124,8 +124,19 @@ static inline bool need_extra_context(struct mm_struct *mm, unsigned long ea)
>  
>  #if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE) && defined(CONFIG_PPC_RADIX_MMU)
>  extern void radix_kvm_prefetch_workaround(struct mm_struct *mm);
> +void do_h_rpt_invalidate(unsigned long pid, unsigned long lpid,
> +  unsigned long type, unsigned long page_size,
> +  unsigned long psize, unsigned long start,
> +  unsigned long end);
>  #else
>  static inline void radix_kvm_prefetch_workaround(struct mm_struct *mm) { }
> +static inline void do_h_rpt_invalidate(unsigned long pid,
> +unsigned long lpid,
> +unsigned long type,
> +unsigned long page_size,
> +unsigned long psize,
> +unsigned long start,
> +unsigned long end) { }
>  #endif
>  
>  extern void switch_cop(struct mm_struct *next);
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 6f612d240392..802cb77c39cc 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -904,6 +904,64 @@ static int kvmppc_get_yield_count(struct kvm_vcpu *vcpu)
>   return yield_count;
>  }
>  
> +static void do_h_rpt_invalidate_prs(unsigned long pid,

Re: [PATCH kernel 2/2] powerpc/iommu: Do not immediately panic when failed IOMMU table allocation

2021-02-16 Thread David Gibson
On Tue, Feb 16, 2021 at 02:33:07PM +1100, Alexey Kardashevskiy wrote:
> Most platforms allocate IOMMU table structures (specifically it_map)
> at the boot time and when this fails - it is a valid reason for panic().
> 
> However the powernv platform allocates it_map after a device is returned
> to the host OS after being passed through and this happens long after
> the host OS booted. It is quite possible to trigger the it_map allocation
> panic() and kill the host even though it is not necessary - the host OS
> can still use the DMA bypass mode (requires a tiny fraction of it_map's
> memory) and even if that fails, the host OS is runnable as it was without
> the device for which allocating it_map causes the panic.
> 
> Instead of immediately crashing in a powernv/ioda2 system, this prints
> an error and continues. All other platforms still call panic().
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/kernel/iommu.c   |  6 --
>  arch/powerpc/platforms/cell/iommu.c   |  3 ++-
>  arch/powerpc/platforms/pasemi/iommu.c |  4 +++-
>  arch/powerpc/platforms/powernv/pci-ioda.c | 15 ---
>  arch/powerpc/platforms/pseries/iommu.c| 10 +++---
>  arch/powerpc/sysdev/dart_iommu.c  |  3 ++-
>  6 files changed, 26 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index 8eb6eb0afa97..c1a5c366a664 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -728,8 +728,10 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>   sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
>  
>   tbl->it_map = vzalloc_node(sz, nid);
> - if (!tbl->it_map)
> - panic("iommu_init_table: Can't allocate %ld bytes\n", sz);
> + if (!tbl->it_map) {
> + pr_err("%s: Can't allocate %ld bytes\n", __func__, sz);
> + return NULL;
> + }
>  
>   iommu_table_reserve_pages(tbl, res_start, res_end);
>  
> diff --git a/arch/powerpc/platforms/cell/iommu.c b/arch/powerpc/platforms/cell/iommu.c
> index 2124831cf57c..fa08699aedeb 100644
> --- a/arch/powerpc/platforms/cell/iommu.c
> +++ b/arch/powerpc/platforms/cell/iommu.c
> @@ -486,7 +486,8 @@ cell_iommu_setup_window(struct cbe_iommu *iommu, struct device_node *np,
>   window->table.it_size = size >> window->table.it_page_shift;
>   window->table.it_ops = &cell_iommu_ops;
>  
> - iommu_init_table(&window->table, iommu->nid, 0, 0);
> + if (!iommu_init_table(&window->table, iommu->nid, 0, 0))
> + panic("Failed to initialize iommu table");
>  
>   pr_debug("\tioid  %d\n", window->ioid);
>   pr_debug("\tblocksize %ld\n", window->table.it_blocksize);
> diff --git a/arch/powerpc/platforms/pasemi/iommu.c b/arch/powerpc/platforms/pasemi/iommu.c
> index b500a6e47e6b..5be7242fbd86 100644
> --- a/arch/powerpc/platforms/pasemi/iommu.c
> +++ b/arch/powerpc/platforms/pasemi/iommu.c
> @@ -146,7 +146,9 @@ static void iommu_table_iobmap_setup(void)
>*/
>   iommu_table_iobmap.it_blocksize = 4;
>   iommu_table_iobmap.it_ops = &iommu_table_iobmap_ops;
> - iommu_init_table(&iommu_table_iobmap, 0, 0, 0);
> + if (!iommu_init_table(&iommu_table_iobmap, 0, 0, 0))
> + panic("Failed to initialize iommu table");
> +
>   pr_debug(" <- %s\n", __func__);
>  }
>  
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index f0f901683a2f..66c3c3337334 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1762,7 +1762,8 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>   tbl->it_ops = &pnv_ioda1_iommu_ops;
>   pe->table_group.tce32_start = tbl->it_offset << tbl->it_page_shift;
>   pe->table_group.tce32_size = tbl->it_size << tbl->it_page_shift;
> - iommu_init_table(tbl, phb->hose->node, 0, 0);
> + if (!iommu_init_table(tbl, phb->hose->node, 0, 0))
> + panic("Failed to initialize iommu table");
>  
>   pe->dma_setup_done = true;
>   return;
> @@ -1930,16 +1931,16 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
>   res_start = pe->phb->ioda.m32_pci_base >> tbl->it_page_shift;
>   res_end = min(window_size, SZ_4G) >> tbl->it_page_shift;
>   }
> - iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end);
>  
> - rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> + if (iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end))
> + rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> + else
> + rc = -ENOMEM;
>   if (rc) {
> - pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
> - rc);
> + pe_err(pe, "Failed to configure 32-bit TCE table, err %ld

Re: [PATCH v4 1/3] powerpc/book3s64/radix/tlb: tlbie primitives for process-scoped invalidations from guests

2021-02-16 Thread David Gibson
On Mon, Feb 15, 2021 at 12:05:40PM +0530, Bharata B Rao wrote:
> The H_RPT_INVALIDATE hcall needs to perform process-scoped tlbie
> invalidations of L1 and nested guests from L0. This needs the RS register
> of the TLBIE instruction to contain both the PID and the LPID. Introduce
> primitives that execute the tlbie instruction with both PID and LPID set,
> in preparation for the H_RPT_INVALIDATE hcall.
> 
> While we are here, move RIC_FLUSH definitions to header file
> and introduce helper rpti_pgsize_to_psize() that will be needed
> by the upcoming hcall.
> 
> Signed-off-by: Bharata B Rao 
> ---
>  .../include/asm/book3s/64/tlbflush-radix.h|  18 +++
>  arch/powerpc/mm/book3s64/radix_tlb.c  | 122 +-
>  2 files changed, 136 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
> index 94439e0cefc9..aace7e9b2397 100644
> --- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
> @@ -4,6 +4,10 @@
>  
>  #include 
>  
> +#define RIC_FLUSH_TLB 0
> +#define RIC_FLUSH_PWC 1
> +#define RIC_FLUSH_ALL 2
> +
>  struct vm_area_struct;
>  struct mm_struct;
>  struct mmu_gather;
> @@ -21,6 +25,20 @@ static inline u64 psize_to_rpti_pgsize(unsigned long psize)
>   return H_RPTI_PAGE_ALL;
>  }
>  
> +static inline int rpti_pgsize_to_psize(unsigned long page_size)
> +{
> + if (page_size == H_RPTI_PAGE_4K)
> + return MMU_PAGE_4K;
> + if (page_size == H_RPTI_PAGE_64K)
> + return MMU_PAGE_64K;
> + if (page_size == H_RPTI_PAGE_2M)
> + return MMU_PAGE_2M;
> + if (page_size == H_RPTI_PAGE_1G)
> + return MMU_PAGE_1G;
> + else
> + return MMU_PAGE_64K; /* Default */
> +}

Would it make sense to put the H_RPTI_PAGE_ tags into the
mmu_psize_defs table and scan that here, rather than open coding the
conversion?
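Something like the below, say (a sketch only; the h_rpt_pgsize field is
hypothetical and would need adding to struct mmu_psize_def and populating
in mmu_psize_defs[]):

    static inline int rpti_pgsize_to_psize(unsigned long page_size)
    {
    	int psize;

    	/* scan the table rather than open coding each H_RPTI_PAGE_* case */
    	for (psize = 0; psize < MMU_PAGE_COUNT; psize++)
    		if (mmu_psize_defs[psize].h_rpt_pgsize == page_size)
    			return psize;
    	return MMU_PAGE_64K;	/* default, as in the original */
    }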

> +
>  static inline int mmu_get_ap(int psize)
>  {
>   return mmu_psize_defs[psize].ap;
> diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
> index fb66d154b26c..097402435303 100644
> --- a/arch/powerpc/mm/book3s64/radix_tlb.c
> +++ b/arch/powerpc/mm/book3s64/radix_tlb.c
> @@ -18,10 +18,6 @@
>  #include 
>  #include 
>  
> -#define RIC_FLUSH_TLB 0
> -#define RIC_FLUSH_PWC 1
> -#define RIC_FLUSH_ALL 2
> -
>  /*
>   * tlbiel instruction for radix, set invalidation
>   * i.e., r=1 and is=01 or is=10 or is=11
> @@ -128,6 +124,21 @@ static __always_inline void __tlbie_pid(unsigned long pid, unsigned long ric)
>   trace_tlbie(0, 0, rb, rs, ric, prs, r);
>  }
>  
> +static __always_inline void __tlbie_pid_lpid(unsigned long pid,
> +  unsigned long lpid,
> +  unsigned long ric)
> +{
> + unsigned long rb, rs, prs, r;
> +
> + rb = PPC_BIT(53); /* IS = 1 */
> + rs = (pid << PPC_BITLSHIFT(31)) | (lpid & ~(PPC_BITMASK(0, 31)));
> + prs = 1; /* process scoped */
> + r = 1;   /* radix format */
> +
> + asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
> +  : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : 
> "memory");
> + trace_tlbie(0, 0, rb, rs, ric, prs, r);
> +}
>  static __always_inline void __tlbie_lpid(unsigned long lpid, unsigned long ric)
>  {
>   unsigned long rb,rs,prs,r;
> @@ -188,6 +199,23 @@ static __always_inline void __tlbie_va(unsigned long va, unsigned long pid,
>   trace_tlbie(0, 0, rb, rs, ric, prs, r);
>  }
>  
> +static __always_inline void __tlbie_va_lpid(unsigned long va, unsigned long pid,
> + unsigned long lpid,
> + unsigned long ap, unsigned long ric)
> +{
> + unsigned long rb, rs, prs, r;
> +
> + rb = va & ~(PPC_BITMASK(52, 63));
> + rb |= ap << PPC_BITLSHIFT(58);
> + rs = (pid << PPC_BITLSHIFT(31)) | (lpid & ~(PPC_BITMASK(0, 31)));
> + prs = 1; /* process scoped */
> + r = 1;   /* radix format */
> +
> + asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
> +  : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : 
> "memory");
> + trace_tlbie(0, 0, rb, rs, ric, prs, r);
> +}
> +
>  static __always_inline void __tlbie_lpid_va(unsigned long va, unsigned long lpid,
>   unsigned long ap, unsigned long ric)
>  {
> @@ -233,6 +261,22 @@ static inline void fixup_tlbie_va_range(unsigned long va, unsigned long pid,
>   }
>  }
>  
> +static inline void fixup_tlbie_va_range_lpid(unsigned long va,
> +  unsigned long pid,
> +  unsigned long lpid,
> +  unsigned long ap)
> +{
> + if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) {
> + asm volatile("ptesync" : : : "memory");
> + __tlbie_pid_lp

Re: [PATCH 2/2] powerpc/uaccess: Move might_fault() into user_access_begin()

2021-02-16 Thread Michael Ellerman
Christophe Leroy  writes:
> On 08/02/2021 at 14:57, Michael Ellerman wrote:
>> We have a might_fault() check in __unsafe_put_user_goto(), but that is
>> dangerous as it potentially calls lots of code while user access is
>> enabled.
>> 
>> It also triggers the check Alexey added to the irq restore path to
>> catch cases like that:
>> 
>>WARNING: CPU: 30 PID: 1 at arch/powerpc/include/asm/book3s/64/kup.h:324 arch_local_irq_restore+0x160/0x190
>>NIP arch_local_irq_restore+0x160/0x190
>>LR  lock_is_held_type+0x140/0x200
>>Call Trace:
>>  0xc0007f392ff8 (unreliable)
>>  ___might_sleep+0x180/0x320
>>  __might_fault+0x50/0xe0
>>  filldir64+0x2d0/0x5d0
>>  call_filldir+0xc8/0x180
>>  ext4_readdir+0x948/0xb40
>>  iterate_dir+0x1ec/0x240
>>  sys_getdents64+0x80/0x290
>>  system_call_exception+0x160/0x280
>>  system_call_common+0xf0/0x27c
>> 
>> So remove the might_fault() check from unsafe_put_user().
>> 
>> Any call to unsafe_put_user() must be inside a region that's had user
>> access enabled with user_access_begin(), so move the might_fault() in
>> there. That also allows us to drop the is_kernel_addr() test, because
>> there should be no code using user_access_begin() in order to access a
>> kernel address.
>
> x86 and mips only have might_fault() on get_user() and put_user(),
> not on __get_user(), __put_user(), or the unsafe
> alternatives.

Yeah, that's their choice, or perhaps it's by accident.

arm64 on the other hand has might_fault() in all variants.

A __get_user() can fault just as much as a get_user(), so there's no
reason the check should be omitted from __get_user(), other than perhaps
some historical argument about __get_user() being the "fast" case.

> We have might_fault() in __get_user_nocheck(), which is used by
> __get_user() and __get_user_allowed(), i.e. by unsafe_get_user().
>
> Shouldn't those be dropped as well?

That was handled by Alexey's patch, which I ended up merging with this
one:

  https://git.kernel.org/torvalds/c/7d506ca97b665b95e698a53697dad99fae813c1a


ie. we still have might_fault() in __get_user_nocheck(), but it's
guarded by a check of do_allow, so we won't call it for
__get_user_allowed().

So I think the code (in my next branch) is correct, we don't have any
might_fault() calls in unsafe regions.
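i.e. the only might_fault() a caller hits is the one in user_access_begin(),
so the pattern looks roughly like this (a sketch of the calling convention,
not the literal uaccess.h code):

    	if (!user_access_begin(uaddr, sizeof(*uaddr)))	/* might_fault() happens here */
    		return -EFAULT;
    	unsafe_put_user(val, uaddr, Efault);	/* no checks while user access is open */
    	user_access_end();
    	return 0;
    Efault:
    	user_access_end();
    	return -EFAULT;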

But I'd still be happier if we could simplify our uaccess.h more; it's a
bit of a rat's nest. We could start by making __get/put_user() ==
get/put_user() the same way arm64 did.

cheers