Re: [PATCH] KVM: PPC: Book3S: Assign boolean values to a bool variable

2020-12-14 Thread Paul Mackerras
On Sat, Nov 07, 2020 at 02:26:22PM +0800, xiakaixu1...@gmail.com wrote:
> From: Kaixu Xia 
> 
> Fix the following coccinelle warnings:
> 
> ./arch/powerpc/kvm/book3s_xics.c:476:3-15: WARNING: Assignment of 0/1 to bool 
> variable
> ./arch/powerpc/kvm/book3s_xics.c:504:3-15: WARNING: Assignment of 0/1 to bool 
> variable
> 
> Reported-by: Tosk Robot 
> Signed-off-by: Kaixu Xia 

Acked-by: Paul Mackerras 


Re: [PATCH v2 1/1] powerpc/kvm: Fix mask size for emulated msgsndp

2020-12-14 Thread Paul Mackerras
On Tue, Dec 08, 2020 at 06:57:08PM -0300, Leonardo Bras wrote:
> According to ISAv3.1 and ISAv3.0b, the msgsndp instruction splits RB into:
> msgtype <- (RB) 32:36
> payload <- (RB) 37:63
> t   <- (RB) 57:63
> 
> The current way of getting 'msgtype' and 't' is missing their MSB:
> msgtype: ((arg >> 27) & 0xf) : Gets (RB) 33:36, missing bit 32
> t:   (arg &= 0x3f)   : Gets (RB) 58:63, missing bit 57
> 
> Fix this by applying the correct mask.
> 
> Signed-off-by: Leonardo Bras 

Acked-by: Paul Mackerras 


Re: [PATCH v2 4/4] KVM: PPC: Introduce new capability for 2nd DAWR

2020-12-08 Thread Paul Mackerras
On Tue, Nov 24, 2020 at 04:29:53PM +0530, Ravi Bangoria wrote:
> Introduce KVM_CAP_PPC_DAWR1 which can be used by Qemu to query whether
> kvm supports 2nd DAWR or not.

This should be described in Documentation/virt/kvm/api.rst.

Strictly speaking, it should be a capability which is disabled by
default, so the guest can only do the H_SET_MODE to set DAWR[X]1 if it
has been explicitly permitted to do so by userspace (QEMU).  This is
because we want as little as possible of the VM configuration to come
from the host capabilities rather than from what userspace configures.

So what we really need here is for this to be a capability which can
be queried by userspace to find out if it is possible, and then
enabled by userspace if it wants.  See how KVM_CAP_PPC_NESTED_HV is
handled for example.

Paul.


Re: [PATCH -next v2] KVM: PPC: Book3S HV: XIVE: Convert to DEFINE_SHOW_ATTRIBUTE

2020-09-22 Thread Paul Mackerras
On Sat, Sep 19, 2020 at 09:29:25AM +0800, Qinglang Miao wrote:
> Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
> 
> Signed-off-by: Qinglang Miao 

Thanks, applied.

Paul.


Re: [PATCH -next] powerpc/kvm/books: Fix symbol undeclared warnings

2020-09-22 Thread Paul Mackerras
On Mon, Sep 21, 2020 at 11:22:11AM +, Wang Wensheng wrote:
> Build the kernel with `C=2`:
> arch/powerpc/kvm/book3s_hv_nested.c:572:25: warning: symbol
> 'kvmhv_alloc_nested' was not declared. Should it be static?
> arch/powerpc/kvm/book3s_64_mmu_radix.c:350:6: warning: symbol
> 'kvmppc_radix_set_pte_at' was not declared. Should it be static?
> arch/powerpc/kvm/book3s_hv.c:3568:5: warning: symbol
> 'kvmhv_p9_guest_entry' was not declared. Should it be static?
> arch/powerpc/kvm/book3s_hv_rm_xics.c:767:15: warning: symbol 'eoi_rc'
> was not declared. Should it be static?
> arch/powerpc/kvm/book3s_64_vio_hv.c:240:13: warning: symbol
> 'iommu_tce_kill_rm' was not declared. Should it be static?
> arch/powerpc/kvm/book3s_64_vio.c:492:6: warning: symbol
> 'kvmppc_tce_iommu_do_map' was not declared. Should it be static?
> arch/powerpc/kvm/book3s_pr.c:572:6: warning: symbol 'kvmppc_set_pvr_pr'
> was not declared. Should it be static?
> 
> Those symbols are used only in the files that define them so make them
> static to fix the warnings.
> 
> Signed-off-by: Wang Wensheng 

Thanks, applied.

Paul.


Re: [PATCH] KVM: PPC: Book3S: Remove redundant initialization of variable ret

2020-09-22 Thread Paul Mackerras
On Sat, Sep 19, 2020 at 03:12:30PM +0800, Jing Xiangfeng wrote:
> The variable ret is initialized to '-ENOMEM', but that value is never
> used. So remove the redundant initialization.
> 
> Signed-off-by: Jing Xiangfeng 

Thanks, applied.

Paul.


Re: [PATCH V2] Doc: admin-guide: Add entry for kvm_cma_resv_ratio kernel param

2020-09-21 Thread Paul Mackerras
On Mon, Sep 21, 2020 at 02:32:20PM +0530, sathn...@linux.vnet.ibm.com wrote:
> From: Satheesh Rajendran 
> 
> Add document entry for kvm_cma_resv_ratio kernel param which
> is used to alter the KVM contiguous memory allocation percentage
> for hash pagetable allocation used by hash mode PowerPC KVM guests.
> 
> Cc: linux-kernel@vger.kernel.org
> Cc: kvm-...@vger.kernel.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Jonathan Corbet 
> Reviewed-by: Randy Dunlap 
> Signed-off-by: Satheesh Rajendran 
> ---
> 
> V2: 
> Addressed review comments from Randy.
> 
> V1: https://lkml.org/lkml/2020/9/16/72
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index a1068742a6df..932ed45740c9 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2258,6 +2258,14 @@
>   [KVM,ARM] Allow use of GICv4 for direct injection of
>   LPIs.
>  
> + kvm_cma_resv_ratio=n [PPC]
> + Reserves given percentage from system memory area for
> + contiguous memory allocation for KVM hash pagetable
> + allocation.
> + By default it reserves 5% of total system memory.

I am concerned that using the term "reserve" here could give the
impression that this memory is then not available for any other use.
It is in fact available for other uses as long as they are movable
allocations.  So this memory is available for uses such as process
anonymous memory and page cache, just not for things like kmalloc.

I'm not sure what would be a better term than "reserve", though.
Perhaps we need to add a sentence something like "The reserved memory
is available for use as process memory and page cache when it is not
being used by KVM."

Paul.


Re: [PATCH -next] powerpc: Convert to DEFINE_SHOW_ATTRIBUTE

2020-09-01 Thread Paul Mackerras
On Thu, Jul 16, 2020 at 05:07:12PM +0800, Qinglang Miao wrote:
> From: Chen Huang 
> 
> Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
> 
> Signed-off-by: Chen Huang 

For the arch/powerpc/kvm part:

Acked-by: Paul Mackerras 

I expect Michael Ellerman will take the patch through his tree.

Paul.


Re: [PATCH 0/7] powerpc/watchpoint: 2nd DAWR kvm enablement + selftests

2020-09-01 Thread Paul Mackerras
On Thu, Jul 23, 2020 at 03:50:51PM +0530, Ravi Bangoria wrote:
> Patch #1, #2 and #3 enables p10 2nd DAWR feature for Book3S kvm guest. DAWR
> is a hypervisor resource and thus H_SET_MODE hcall is used to set/unset it.
> A new case H_SET_MODE_RESOURCE_SET_DAWR1 is introduced in H_SET_MODE hcall
> for setting/unsetting 2nd DAWR. Also, new capability KVM_CAP_PPC_DAWR1 has
> been added to query 2nd DAWR support via kvm ioctl.
> 
> This feature also needs to be enabled in Qemu to really use it. I'll reply
> link to qemu patches once I post them in qemu-devel mailing list.
> 
> Patch #4, #5, #6 and #7 adds selftests to test 2nd DAWR.

If/when you resubmit these patches, please split the KVM patches into
a separate series, since the KVM patches would go via my tree whereas
I expect the selftests/powerpc patches would go through Michael
Ellerman's tree.

Paul.


Re: [PATCH 2/7] powerpc/watchpoint/kvm: Add infrastructure to support 2nd DAWR

2020-09-01 Thread Paul Mackerras
On Thu, Jul 23, 2020 at 03:50:53PM +0530, Ravi Bangoria wrote:
> kvm code assumes single DAWR everywhere. Add code to support 2nd DAWR.
> DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/
> unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR.

Is this the same interface as will be defined in PAPR and available
under PowerVM, or is it a new/different interface for KVM?

> Also, kvm will support 2nd DAWR only if CPU_FTR_DAWR1 is set.

In general QEMU wants to be able to control all aspects of the virtual
machine presented to the guest, meaning that just because a host has a
particular hardware capability does not mean we should automatically
present that capability to the guest.

In this case, QEMU will want a way to control whether the guest sees
the availability of the second DAWR/X registers or not, i.e. whether a
H_SET_MODE to set DAWR[X]1 will succeed or fail.

Paul.


Re: [PATCH 1/7] powerpc/watchpoint/kvm: Rename current DAWR macros and variables

2020-09-01 Thread Paul Mackerras
On Thu, Jul 23, 2020 at 03:50:52PM +0530, Ravi Bangoria wrote:
> Power10 is introducing second DAWR. Use real register names (with
> suffix 0) from ISA for current macros and variables used by kvm.

Most of this looks fine, but I think we should not change the existing
names in arch/powerpc/include/uapi/asm/kvm.h (and therefore also
Documentation/virt/kvm/api.rst).

> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 426f94582b7a..4dc18fe6a2bf 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -2219,8 +2219,8 @@ registers, find a list below:
>PPC KVM_REG_PPC_BESCR   64
>PPC KVM_REG_PPC_TAR 64
>PPC KVM_REG_PPC_DPDES   64
> -  PPC KVM_REG_PPC_DAWR64
> -  PPC KVM_REG_PPC_DAWRX   64
> +  PPC KVM_REG_PPC_DAWR0   64
> +  PPC KVM_REG_PPC_DAWRX0  64
>PPC KVM_REG_PPC_CIABR   64
>PPC KVM_REG_PPC_IC  64
>PPC KVM_REG_PPC_VTB 64
...
> diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
> b/arch/powerpc/include/uapi/asm/kvm.h
> index 264e266a85bf..38d61b73f5ed 100644
> --- a/arch/powerpc/include/uapi/asm/kvm.h
> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> @@ -608,8 +608,8 @@ struct kvm_ppc_cpu_char {
>  #define KVM_REG_PPC_BESCR(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xa7)
>  #define KVM_REG_PPC_TAR  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xa8)
>  #define KVM_REG_PPC_DPDES(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xa9)
> -#define KVM_REG_PPC_DAWR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xaa)
> -#define KVM_REG_PPC_DAWRX(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xab)
> +#define KVM_REG_PPC_DAWR0(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xaa)
> +#define KVM_REG_PPC_DAWRX0   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xab)
>  #define KVM_REG_PPC_CIABR(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xac)
>  #define KVM_REG_PPC_IC   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xad)
>  #define KVM_REG_PPC_VTB  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xae)

The existing names are an API, and if you change them you will break
compilation of existing userspace programs.  I don't see that adding
the '0' on the end is so important that we need to break userspace.

Paul.


Re: [PATCH v6 3/5] KVM: PPC: clean up redundant kvm_run parameters in assembly

2020-07-23 Thread Paul Mackerras
On Tue, Jun 23, 2020 at 09:14:16PM +0800, Tianjia Zhang wrote:
> In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu'
> structure. For historical reasons, many kvm-related function parameters
> retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This
> patch does a unified cleanup of these remaining redundant parameters.

Thanks, patch applied to my kvm-ppc-next branch, with fixes.

Paul.


Re: [PATCH 2/2] KVM: PPC: Book3S HV: rework secure mem slot dropping

2020-07-20 Thread Paul Mackerras
On Wed, Jul 08, 2020 at 02:16:36PM +0200, Laurent Dufour wrote:
> Le 08/07/2020 à 13:25, Bharata B Rao a écrit :
> > On Fri, Jul 03, 2020 at 05:59:14PM +0200, Laurent Dufour wrote:
> > > When a secure memslot is dropped, all the pages backed in the secure 
> > > device
> > > (aka really backed by secure memory by the Ultravisor) should be paged out
> > > to a normal page. Previously, this was achieved by triggering the page
> > > fault mechanism, which calls kvmppc_svm_page_out() on each page.
> > > 
> > > This can't work when hot unplugging a memory slot because the memory slot
> > > is flagged as invalid and gfn_to_pfn() is then not trying to access the
> > > page, so the page fault mechanism is not triggered.
> > > 
> > > Since the final goal is to make a call to kvmppc_svm_page_out() it seems
> > > simpler to call it directly instead of triggering such a mechanism. 
> > > This
> > > way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a
> > > memslot.
> > 
> > Yes, this appears much simpler.
> 
> Thanks Bharata for reviewing this.
> 
> > 
> > > 
> > > Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock,
> > > the call to __kvmppc_svm_page_out() is made.
> > > As __kvmppc_svm_page_out needs the vma pointer to migrate the pages, the
> > > VMA is fetched in a lazy way, to not trigger find_vma() all the time. In
> > > addition, the mmap_sem is held in read mode during that time, not in write
> > > mode since the virtual memory layout is not impacted, and
> > > kvm->arch.uvmem_lock prevents concurrent operation on the secure device.
> > > 
> > > Cc: Ram Pai 
> > > Cc: Bharata B Rao 
> > > Cc: Paul Mackerras 
> > > Signed-off-by: Laurent Dufour 
> > > ---
> > >   arch/powerpc/kvm/book3s_hv_uvmem.c | 54 --
> > >   1 file changed, 37 insertions(+), 17 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> > > b/arch/powerpc/kvm/book3s_hv_uvmem.c
> > > index 852cc9ae6a0b..479ddf16d18c 100644
> > > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> > > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> > > @@ -533,35 +533,55 @@ static inline int kvmppc_svm_page_out(struct 
> > > vm_area_struct *vma,
> > >* fault on them, do fault time migration to replace the device PTEs in
> > >* QEMU page table with normal PTEs from newly allocated pages.
> > >*/
> > > -void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
> > > +void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot,
> > >struct kvm *kvm, bool skip_page_out)
> > >   {
> > >   int i;
> > >   struct kvmppc_uvmem_page_pvt *pvt;
> > > - unsigned long pfn, uvmem_pfn;
> > > - unsigned long gfn = free->base_gfn;
> > > + struct page *uvmem_page;
> > > + struct vm_area_struct *vma = NULL;
> > > + unsigned long uvmem_pfn, gfn;
> > > + unsigned long addr, end;
> > > +
> > > + down_read(>mm->mmap_sem);
> > 
> > You should be using mmap_read_lock(kvm->mm) with recent kernels.
> 
> Absolutely, shame on me, I reviewed Michel's series about that!
> 
> Paul, Michael, could you fix that when pulling this patch or should I sent a
> whole new series?

Given that Ram has reworked his series, I think it would be best if
you rebase on top of his new series and re-send your series.

Thanks,
Paul.


Re: [PATCH 1/1] KVM/PPC: Fix typo on H_DISABLE_AND_GET hcall

2020-07-08 Thread Paul Mackerras
On Mon, Jul 06, 2020 at 09:48:12PM -0300, Leonardo Bras wrote:
> On PAPR+ the hcall() on 0x1B0 is called H_DISABLE_AND_GET, but got
> defined as H_DISABLE_AND_GETC instead.
> 
> This define was introduced with a typo in commit 
> ("[PATCH] powerpc: Extends HCALL interface for InfiniBand usage"), and was
> later used without having the typo noticed.
> 
> Signed-off-by: Leonardo Bras 

Acked-by: Paul Mackerras 

Since this hypercall is not implemented in KVM nor used by KVM guests,
I'll leave this one for Michael to pick up.

Paul.


Re: [PATCH v4 3/7] KVM: PPC: Remove redundant kvm_run from vcpu_arch

2020-05-26 Thread Paul Mackerras
On Mon, Apr 27, 2020 at 12:35:10PM +0800, Tianjia Zhang wrote:
> The 'kvm_run' field already exists in the 'vcpu' structure, which
> is the same structure as the 'kvm_run' in the 'vcpu_arch' and
> should be deleted.
> 
> Signed-off-by: Tianjia Zhang 

Thanks, patches 3 and 4 of this series applied to my kvm-ppc-next branch.

Paul.


Re: [PATCH -next] KVM: PPC: Book3S HV: remove redundant NULL check

2020-05-26 Thread Paul Mackerras
On Wed, Apr 01, 2020 at 09:09:03PM +0800, Chen Zhou wrote:
> Free function kfree() already does NULL check, so the additional
> check is unnecessary, just remove it.
> 
> Signed-off-by: Chen Zhou 

Thanks, applied to my kvm-ppc-next branch.

Paul.


Re: [PATCH] KVM: PPC: Book3S HV: read ibm,secure-memory nodes

2020-05-26 Thread Paul Mackerras
On Thu, Apr 16, 2020 at 06:27:15PM +0200, Laurent Dufour wrote:
> The newly introduced ibm,secure-memory nodes supersede the
> ibm,uv-firmware node's secure-memory-ranges property.
> 
> Firmware will no longer expose the secure-memory-ranges property, so
> read the new one first and, if it is not found, fall back to the older one.
> 
> Signed-off-by: Laurent Dufour 

Thanks, applied to my kvm-ppc-next branch.

Paul.


Re: [PATCH] powerpc/kvm/radix: ignore kmemleak false positives

2020-05-26 Thread Paul Mackerras
On Wed, May 13, 2020 at 09:39:15AM -0400, Qian Cai wrote:
> kvmppc_pmd_alloc() and kvmppc_pte_alloc() allocate some memory but then
> pud_populate() and pmd_populate() will use __pa() to reference the newly
> allocated memory.
> 
> Since kmemleak is unable to track the physical memory resulting in false
> positives, silence those by using kmemleak_ignore().
> 
> unreferenced object 0xc000201c382a1000 (size 4096):
>  comm "qemu-kvm", pid 124828, jiffies 4295733767 (age 341.250s)
>  hex dump (first 32 bytes):
>c0 00 20 09 f4 60 03 87 c0 00 20 10 72 a0 03 87  .. ..` .r...
>c0 00 20 0e 13 a0 03 87 c0 00 20 1b dc c0 03 87  .. ... .
>  backtrace:
>[<4cc2790f>] kvmppc_create_pte+0x838/0xd20 [kvm_hv]
>kvmppc_pmd_alloc at arch/powerpc/kvm/book3s_64_mmu_radix.c:366
>(inlined by) kvmppc_create_pte at 
> arch/powerpc/kvm/book3s_64_mmu_radix.c:590
>[] kvmppc_book3s_instantiate_page+0x2e0/0x8c0 [kvm_hv]
>[] kvmppc_book3s_radix_page_fault+0x1b4/0x2b0 [kvm_hv]
>[<86dddc0e>] kvmppc_book3s_hv_page_fault+0x214/0x12a0 [kvm_hv]
>[<5ae9ccc2>] kvmppc_vcpu_run_hv+0xc5c/0x15f0 [kvm_hv]
>[] kvmppc_vcpu_run+0x34/0x48 [kvm]
>[] kvm_arch_vcpu_ioctl_run+0x314/0x420 [kvm]
>[<2543dd54>] kvm_vcpu_ioctl+0x33c/0x950 [kvm]
>[<48155cd6>] ksys_ioctl+0xd8/0x130
>[<41ffeaa7>] sys_ioctl+0x28/0x40
>[<4afc4310>] system_call_exception+0x114/0x1e0
>[] system_call_common+0xf0/0x278
> unreferenced object 0xc0002001f0c03900 (size 256):
>  comm "qemu-kvm", pid 124830, jiffies 4295735235 (age 326.570s)
>  hex dump (first 32 bytes):
>c0 00 20 10 fa a0 03 87 c0 00 20 10 fa a1 03 87  .. ... .
>c0 00 20 10 fa a2 03 87 c0 00 20 10 fa a3 03 87  .. ... .
>  backtrace:
>[<23f675b8>] kvmppc_create_pte+0x854/0xd20 [kvm_hv]
>kvmppc_pte_alloc at arch/powerpc/kvm/book3s_64_mmu_radix.c:356
>(inlined by) kvmppc_create_pte at 
> arch/powerpc/kvm/book3s_64_mmu_radix.c:593
>[] kvmppc_book3s_instantiate_page+0x2e0/0x8c0 [kvm_hv]
>[] kvmppc_book3s_radix_page_fault+0x1b4/0x2b0 [kvm_hv]
>[<86dddc0e>] kvmppc_book3s_hv_page_fault+0x214/0x12a0 [kvm_hv]
>[<5ae9ccc2>] kvmppc_vcpu_run_hv+0xc5c/0x15f0 [kvm_hv]
>[] kvmppc_vcpu_run+0x34/0x48 [kvm]
>[] kvm_arch_vcpu_ioctl_run+0x314/0x420 [kvm]
>[<2543dd54>] kvm_vcpu_ioctl+0x33c/0x950 [kvm]
>[<48155cd6>] ksys_ioctl+0xd8/0x130
>[<41ffeaa7>] sys_ioctl+0x28/0x40
>[<4afc4310>] system_call_exception+0x114/0x1e0
>[] system_call_common+0xf0/0x278
> 
> Signed-off-by: Qian Cai 

Thanks, applied to my kvm-ppc-next branch.

Paul.


Re: [PATCH v2] KVM: PPC: Book3S HV: relax check on H_SVM_INIT_ABORT

2020-05-26 Thread Paul Mackerras
On Wed, May 20, 2020 at 07:43:08PM +0200, Laurent Dufour wrote:
> The commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_*
> Hcalls") added checks of secure bit of SRR1 to filter out the Hcall
> reserved to the Ultravisor.
> 
> However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the
> context of the VM calling UV_ESM. This allows the Hypervisor to return to
> the guest without going through the Ultravisor. Thus the Secure bit of SRR1
> is not set in that particular case.
> 
> In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be
> filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is
> not set in that case.
> 
> Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls")
> Signed-off-by: Laurent Dufour 

Thanks, applied to my kvm-ppc-next branch.  I expanded the comment in
the code a little.

Paul.


Re: [PATCH] powerpc/kvm/book3s64/vio: fix some RCU-list locks

2020-05-26 Thread Paul Mackerras
On Sun, May 10, 2020 at 01:18:34AM -0400, Qian Cai wrote:
> It is unsafe to traverse kvm->arch.spapr_tce_tables and
> stt->iommu_tables without the RCU read lock held. Also, add
> cond_resched_rcu() in places with the RCU read lock held that could take
> a while to finish.
> 
>  arch/powerpc/kvm/book3s_64_vio.c:76 RCU-list traversed in non-reader 
> section!!
> 
>  other info that might help us debug this:
> 
>  rcu_scheduler_active = 2, debug_locks = 1
>  no locks held by qemu-kvm/4265.
> 
>  stack backtrace:
>  CPU: 96 PID: 4265 Comm: qemu-kvm Not tainted 5.7.0-rc4-next-20200508+ #2
>  Call Trace:
>  [c000201a8690f720] [c0715948] dump_stack+0xfc/0x174 (unreliable)
>  [c000201a8690f770] [c01d9470] lockdep_rcu_suspicious+0x140/0x164
>  [c000201a8690f7f0] [c00810b9fb48] 
> kvm_spapr_tce_release_iommu_group+0x1f0/0x220 [kvm]
>  [c000201a8690f870] [c00810b8462c] 
> kvm_spapr_tce_release_vfio_group+0x54/0xb0 [kvm]
>  [c000201a8690f8a0] [c00810b84710] kvm_vfio_destroy+0x88/0x140 [kvm]
>  [c000201a8690f8f0] [c00810b7d488] kvm_put_kvm+0x370/0x600 [kvm]
>  [c000201a8690f990] [c00810b7e3c0] kvm_vm_release+0x38/0x60 [kvm]
>  [c000201a8690f9c0] [c05223f4] __fput+0x124/0x330
>  [c000201a8690fa20] [c0151cd8] task_work_run+0xb8/0x130
>  [c000201a8690fa70] [c01197e8] do_exit+0x4e8/0xfa0
>  [c000201a8690fb70] [c011a374] do_group_exit+0x64/0xd0
>  [c000201a8690fbb0] [c0132c90] get_signal+0x1f0/0x1200
>  [c000201a8690fcc0] [c0020690] do_notify_resume+0x130/0x3c0
>  [c000201a8690fda0] [c0038d64] syscall_exit_prepare+0x1a4/0x280
>  [c000201a8690fe20] [c000c8f8] system_call_common+0xf8/0x278
> 
>  
>  arch/powerpc/kvm/book3s_64_vio.c:368 RCU-list traversed in non-reader 
> section!!
> 
>  other info that might help us debug this:
> 
>  rcu_scheduler_active = 2, debug_locks = 1
>  2 locks held by qemu-kvm/4264:
>   #0: c000201ae2d000d8 (>mutex){+.+.}-{3:3}, at: 
> kvm_vcpu_ioctl+0xdc/0x950 [kvm]
>   #1: c000200c9ed0c468 (>srcu){}-{0:0}, at: 
> kvmppc_h_put_tce+0x88/0x340 [kvm]
> 
>  
>  arch/powerpc/kvm/book3s_64_vio.c:108 RCU-list traversed in non-reader 
> section!!
> 
>  other info that might help us debug this:
> 
>  rcu_scheduler_active = 2, debug_locks = 1
>  1 lock held by qemu-kvm/4257:
>   #0: c000200b1b363a40 (>lock){+.+.}-{3:3}, at: 
> kvm_vfio_set_attr+0x598/0x6c0 [kvm]
> 
>  
>  arch/powerpc/kvm/book3s_64_vio.c:146 RCU-list traversed in non-reader 
> section!!
> 
>  other info that might help us debug this:
> 
>  rcu_scheduler_active = 2, debug_locks = 1
>  1 lock held by qemu-kvm/4257:
>   #0: c000200b1b363a40 (>lock){+.+.}-{3:3}, at: 
> kvm_vfio_set_attr+0x598/0x6c0 [kvm]
> 
> Signed-off-by: Qian Cai 

Thanks, applied to my kvm-ppc-next branch, with the cond_resched_rcu()
in kvmppc_tce_validate removed.

Paul.


Re: [PATCH] powerpc/kvm/book3s64/vio: fix some RCU-list locks

2020-05-26 Thread Paul Mackerras
On Sun, May 10, 2020 at 01:18:34AM -0400, Qian Cai wrote:
> It is unsafe to traverse kvm->arch.spapr_tce_tables and
> stt->iommu_tables without the RCU read lock held. Also, add
> cond_resched_rcu() in places with the RCU read lock held that could take
> a while to finish.

This mostly looks fine.  The cond_resched_rcu() in kvmppc_tce_validate
doesn't seem necessary (the list would rarely have more than a few
dozen entries) and could be a performance problem given that TCE
validation is a hot-path.

Are you OK with me modifying the patch to take out that
cond_resched_rcu(), or is there some reason why it's essential that it
be there?

Paul.


Re: [linux-next PATCH] mm/gup.c: Convert to use get_user_{page|pages}_fast_only()

2020-05-26 Thread Paul Mackerras
On Mon, May 25, 2020 at 02:23:32PM +0530, Souptick Joarder wrote:
> API __get_user_pages_fast() renamed to get_user_pages_fast_only()
> to align with pin_user_pages_fast_only().
> 
> As part of this we will get rid of write parameter.
> Instead caller will pass FOLL_WRITE to get_user_pages_fast_only().
> This will not change any existing functionality of the API.
> 
> All the callers are changed to pass FOLL_WRITE.
> 
> Also introduce get_user_page_fast_only(), and use it in a few
> places that hard-code nr_pages to 1.
> 
> Updated the documentation of the API.
> 
> Signed-off-by: Souptick Joarder 

The arch/powerpc/kvm bits look reasonable.

Reviewed-by: Paul Mackerras 


Re: [PATCH v4 4/7] KVM: PPC: clean up redundant 'kvm_run' parameters

2020-05-25 Thread Paul Mackerras
On Mon, Apr 27, 2020 at 12:35:11PM +0800, Tianjia Zhang wrote:
> In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu'
> structure. For historical reasons, many kvm-related function parameters
> retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This
> patch does a unified cleanup of these remaining redundant parameters.
> 
> Signed-off-by: Tianjia Zhang 

This looks OK, though possibly a little larger than it needs to be
because of variable name changes (kvm_run -> run) that aren't strictly
necessary.

Reviewed-by: Paul Mackerras 


Re: [PATCH v4 5/7] KVM: PPC: clean up redundant kvm_run parameters in assembly

2020-05-25 Thread Paul Mackerras
On Mon, Apr 27, 2020 at 12:35:12PM +0800, Tianjia Zhang wrote:
> In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu'
> structure. For historical reasons, many kvm-related function parameters
> retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This
> patch does a unified cleanup of these remaining redundant parameters.

Some of these changes don't look completely correct to me, see below.
If you're expecting these patches to go through my tree, I can fix up
the patch and commit it (with you as author), noting the changes I
made in the commit message.  Do you want me to do that?

> diff --git a/arch/powerpc/kvm/book3s_interrupts.S 
> b/arch/powerpc/kvm/book3s_interrupts.S
> index f7ad99d972ce..0eff749d8027 100644
> --- a/arch/powerpc/kvm/book3s_interrupts.S
> +++ b/arch/powerpc/kvm/book3s_interrupts.S
> @@ -55,8 +55,7 @@
>   
> /
>  
>  /* Registers:
> - *  r3: kvm_run pointer
> - *  r4: vcpu pointer
> + *  r3: vcpu pointer
>   */
>  _GLOBAL(__kvmppc_vcpu_run)
>  
> @@ -68,8 +67,8 @@ kvm_start_entry:
>   /* Save host state to the stack */
>   PPC_STLU r1, -SWITCH_FRAME_SIZE(r1)
>  
> - /* Save r3 (kvm_run) and r4 (vcpu) */
> - SAVE_2GPRS(3, r1)
> + /* Save r3 (vcpu) */
> + SAVE_GPR(3, r1)
>  
>   /* Save non-volatile registers (r14 - r31) */
>   SAVE_NVGPRS(r1)
> @@ -82,11 +81,11 @@ kvm_start_entry:
>   PPC_STL r0, _LINK(r1)
>  
>   /* Load non-volatile guest state from the vcpu */
> - VCPU_LOAD_NVGPRS(r4)
> + VCPU_LOAD_NVGPRS(r3)
>  
>  kvm_start_lightweight:
>   /* Copy registers into shadow vcpu so we can access them in real mode */
> - mr  r3, r4
> + mr  r4, r3

This mr doesn't seem necessary.

>   bl  FUNC(kvmppc_copy_to_svcpu)
>   nop
>   REST_GPR(4, r1)

This should be loading r4 from GPR3(r1), not GPR4(r1) - which is what
REST_GPR(4, r1) will do.

Then, in the file but not in the patch context, there is this line:

PPC_LL  r3, GPR4(r1)/* vcpu pointer */

where once again GPR4 needs to be GPR3.

> @@ -191,10 +190,10 @@ after_sprg3_load:
>   PPC_STL r31, VCPU_GPR(R31)(r7)
>  
>   /* Pass the exit number as 3rd argument to kvmppc_handle_exit */

The comment should be modified to say "2nd" instead of "3rd",
otherwise it is confusing.

The rest of the patch looks OK.

Paul.


Re: [PATCH v4 3/7] KVM: PPC: Remove redundant kvm_run from vcpu_arch

2020-05-25 Thread Paul Mackerras
On Mon, Apr 27, 2020 at 12:35:10PM +0800, Tianjia Zhang wrote:
> The 'kvm_run' field already exists in the 'vcpu' structure, which
> is the same structure as the 'kvm_run' in the 'vcpu_arch' and
> should be deleted.
> 
> Signed-off-by: Tianjia Zhang 

This looks fine.

I assume each architecture sub-maintainer is taking the relevant
patches from this series via their tree - is that right?

Reviewed-by: Paul Mackerras 


Re: [PATCH] powerpc/kvm: Mark expected switch fall-through

2019-08-23 Thread Paul Mackerras
On Tue, Jul 30, 2019 at 04:46:37PM +0200, Paul Menzel wrote:
> Date: Tue, 30 Jul 2019 10:53:10 +0200
> 
> Fix the error below triggered by `-Wimplicit-fallthrough`, by tagging
> it as an expected fall-through.
> 
> arch/powerpc/kvm/book3s_32_mmu.c: In function 
> ‘kvmppc_mmu_book3s_32_xlate_pte’:
> arch/powerpc/kvm/book3s_32_mmu.c:241:21: error: this statement may fall 
> through [-Werror=implicit-fallthrough=]
>   pte->may_write = true;
>   ~~~^~
> arch/powerpc/kvm/book3s_32_mmu.c:242:5: note: here
>  case 3:
>  ^~~~
> 

Thanks, applied to my kvm-ppc-next branch.

Paul.


Re: [PATCH 1/2] powerpc: rewrite LOAD_REG_IMMEDIATE() as an intelligent macro

2019-08-13 Thread Paul Mackerras
On Tue, Aug 13, 2019 at 09:59:35AM +, Christophe Leroy wrote:

[snip]

> +.macro __LOAD_REG_IMMEDIATE r, x
> + .if \x & ~0xffffffff != 0
> + __LOAD_REG_IMMEDIATE_32 \r, (\x) >> 32
> + rldicr  \r, \r, 32, 31
> + .if (\x) & 0xffff0000 != 0
> + oris \r, \r, (\x)@__AS_ATHIGH
> + .endif
> + .if (\x) & 0xffff != 0
> + ori \r, \r, (\x)@l
> + .endif
> + .else
> + __LOAD_REG_IMMEDIATE_32 \r, \x
> + .endif
> +.endm

Doesn't this force all negative constants, even small ones, to use
the long sequence?  For example, __LOAD_REG_IMMEDIATE r3, -1 will
generate (as far as I can see):

li  r3, -1
rldicr  r3, r3, 32, 31
oris    r3, r3, 0xffff
ori     r3, r3, 0xffff

which seems suboptimal.

Paul.


Re: [PATCH v3 14/16] powerpc/32: implement fast entry for syscalls on BOOKE

2019-05-23 Thread Paul Mackerras
On Tue, Apr 30, 2019 at 12:39:03PM +, Christophe Leroy wrote:
> This patch implements a fast entry for syscalls.
> 
> Syscalls don't have to preserve non-volatile registers except LR.
> 
> This patch then implements a fast entry for syscalls, where
> volatile registers get clobbered.
> 
> As this entry is dedicated to syscalls, it always sets MSR_EE
> and warns in case MSR_EE was previously off.
> 
> It also assumes that the call is always from user; system calls
> are unexpected from the kernel.

This is now upstream as commit 1a4b739bbb4f.  On the e500mc test
config that I use, I'm getting this build failure:

arch/powerpc/kernel/head_fsl_booke.o: In function `SystemCall':
arch/powerpc/kernel/head_fsl_booke.S:416: undefined reference to 
`kvmppc_handler_BOOKE_INTERRUPT_SYSCALL_SPRN_SRR1'
Makefile:1052: recipe for target 'vmlinux' failed

> +.macro SYSCALL_ENTRY trapno intno
> + mfspr   r10, SPRN_SPRG_THREAD
> +#ifdef CONFIG_KVM_BOOKE_HV
> +BEGIN_FTR_SECTION
> + mtspr   SPRN_SPRG_WSCRATCH0, r10
> + stw r11, THREAD_NORMSAVE(0)(r10)
> + stw r13, THREAD_NORMSAVE(2)(r10)
> + mfcrr13 /* save CR in r13 for now  */
> + mfspr   r11, SPRN_SRR1
> + mtocrf  0x80, r11   /* check MSR[GS] without clobbering reg */
> + bf  3, 1975f
> + b   kvmppc_handler_BOOKE_INTERRUPT_\intno\()_SPRN_SRR1

It seems to me that the "_SPRN_SRR1" on the end of this line
isn't meant to be there...  However, it still fails to link with that
removed.

Paul.


Re: [RFC PATCH 02/12] powerpc: Add support for adding an ESM blob to the zImage wrapper

2019-05-21 Thread Paul Mackerras
On Tue, May 21, 2019 at 07:13:26AM +0200, Christoph Hellwig wrote:
> On Tue, May 21, 2019 at 01:49:02AM -0300, Thiago Jung Bauermann wrote:
> > From: Benjamin Herrenschmidt 
> > 
> > For secure VMs, the signing tool will create a ticket called the "ESM blob"
> > for the Enter Secure Mode ultravisor call with the signatures of the kernel
> > and initrd among other things.
> > 
> > This adds support to the wrapper script for adding that blob via the "-e"
> > option to the zImage.pseries.
> > 
> > It also adds code to the zImage wrapper itself to retrieve and if necessary
> > relocate the blob, and pass its address to Linux via the device-tree, to be
> > later consumed by prom_init.
> 
> Where does the "BLOB" come from?  How is it licensed and how can we
> satisfy the GPL with it?

The blob is data, not code, and it will be created by a tool that will
be open source.  My understanding is that most of it will be encrypted
with a session key that is encrypted with the secret key of the
ultravisor.  Ram Pai's KVM Forum talk last year explained how this
works.

Paul.


Re: [PATCH] KVM: PPC: Book3S: Replace kmalloc_node+memset with kzalloc_node

2019-02-22 Thread Paul Mackerras
On Mon, Jan 07, 2019 at 08:15:52PM +0800, wangbo wrote:
> Replace kmalloc_node and memset with kzalloc_node
> 
> Signed-off-by: wangbo 

Thanks, applied to my kvm-ppc-next tree.

Paul.


Re: [PATCH] KVM: PPC: Book3S HV: Change to use DEFINE_SHOW_ATTRIBUTE macro

2018-12-17 Thread Paul Mackerras
On Mon, Nov 05, 2018 at 09:47:17AM -0500, Yangtao Li wrote:
> Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
> 
> Signed-off-by: Yangtao Li 

Thanks, patch applied to my kvm-ppc-next branch.

Paul.


Re: [Resend PATCH V5 7/10] KVM: Make kvm_set_spte_hva() return int

2018-12-11 Thread Paul Mackerras
On Thu, Dec 06, 2018 at 09:21:10PM +0800, lantianyu1...@gmail.com wrote:
> From: Lan Tianyu 
> 
> The patch is to make kvm_set_spte_hva() return int and caller can
> check return value to determine flush tlb or not.

It would be helpful if the patch description told the reader which
return value(s) mean that the caller should flush the tlb.  I would
guess that non-zero means to do the flush, but you should make that
explicit.

> Signed-off-by: Lan Tianyu 

For the powerpc bits:

Acked-by: Paul Mackerras 


Re: [PATCH v2] ppp: remove the PPPIOCDETACH ioctl

2018-05-23 Thread Paul Mackerras
On Wed, May 23, 2018 at 02:37:38PM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebigg...@google.com>
> 
> The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file
> before f_count has reached 0, which is fundamentally a bad idea.  It
> does check 'f_count < 2', which excludes concurrent operations on the
> file since they would only be possible with a shared fd table, in which
> case each fdget() would take a file reference.  However, it fails to
> account for the fact that even with 'f_count == 1' the file can still be
> linked into epoll instances.  As reported by syzbot, this can trivially
> be used to cause a use-after-free.
> 
> Yet, the only known user of PPPIOCDETACH is pppd versions older than
> ppp-2.4.2, which was released almost 15 years ago (November 2003).
> Also, PPPIOCDETACH apparently stopped working reliably at around the
> same time, when the f_count check was added to the kernel, e.g. see
> https://lkml.org/lkml/2002/12/31/83.  Also, the current 'f_count < 2'
> check makes PPPIOCDETACH only work in single-threaded applications; it
> always fails if called from a multithreaded application.
> 
> All pppd versions released in the last 15 years just close() the file
> descriptor instead.
> 
> Therefore, instead of hacking around this bug by exporting epoll
> internals to modules, and probably missing other related bugs, just
> remove the PPPIOCDETACH ioctl and see if anyone actually notices.  Leave
> a stub in place that prints a one-time warning and returns EINVAL.
> 
> Reported-by: syzbot+16363c99d4134717c...@syzkaller.appspotmail.com
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Eric Biggers <ebigg...@google.com>

Acked-by: Paul Mackerras <pau...@ozlabs.org>


Re: [PATCH] Revert "powerpc/64: Fix checksum folding in csum_add()"

2018-05-16 Thread Paul Mackerras
On Tue, Apr 10, 2018 at 08:34:37AM +0200, Christophe Leroy wrote:
> This reverts commit 6ad966d7303b70165228dba1ee8da1a05c10eefe.
> 
> That commit was pointless, because csum_add() sums two 32 bits
> values, so the sum is 0x1fffffffe at the maximum.
> And then when adding upper part (1) and lower part (0xfffffffe),
> the result is 0xffffffff which doesn't carry.
> Any lower value will not carry either.
> 
> And behind the fact that this commit is useless, it also kills the
> whole purpose of having an arch specific inline csum_add()
> because the resulting code gets even worse than what is obtained
> with the generic implementation of csum_add()
> 
> 0240 <.csum_add>:
>  240: 38 00 ff ff li  r0,-1
>  244: 7c 84 1a 14 add r4,r4,r3
>  248: 78 00 00 20 clrldi  r0,r0,32
>  24c: 78 89 00 22 rldicl  r9,r4,32,32
>  250: 7c 80 00 38 and r0,r4,r0
>  254: 7c 09 02 14 add r0,r9,r0
>  258: 78 09 00 22 rldicl  r9,r0,32,32
>  25c: 7c 00 4a 14 add r0,r0,r9
>  260: 78 03 00 20 clrldi  r3,r0,32
>  264: 4e 80 00 20 blr
> 
> In comparison, the generic implementation of csum_add() gives:
> 
> 0290 <.csum_add>:
>  290: 7c 63 22 14 add r3,r3,r4
>  294: 7f 83 20 40 cmplw   cr7,r3,r4
>  298: 7c 10 10 26 mfocrf  r0,1
>  29c: 54 00 ef fe rlwinm  r0,r0,29,31,31
>  2a0: 7c 60 1a 14 add r3,r0,r3
>  2a4: 78 63 00 20 clrldi  r3,r3,32
>  2a8: 4e 80 00 20 blr
> 
> And the reverted implementation for PPC64 gives:
> 
> 0240 <.csum_add>:
>  240: 7c 84 1a 14 add r4,r4,r3
>  244: 78 80 00 22 rldicl  r0,r4,32,32
>  248: 7c 80 22 14 add r4,r0,r4
>  24c: 78 83 00 20 clrldi  r3,r4,32
>  250: 4e 80 00 20 blr
> 
> Fixes: 6ad966d7303b7 ("powerpc/64: Fix checksum folding in csum_add()")
> Signed-off-by: Christophe Leroy <christophe.le...@c-s.fr>

Seems I was right first time... :)

Acked-by: Paul Mackerras <pau...@ozlabs.org>


Re: [PATCH v2] powerpc: kvm: Change return type to vm_fault_t

2018-05-16 Thread Paul Mackerras
On Wed, May 16, 2018 at 10:11:11AM +0530, Souptick Joarder wrote:
> On Thu, May 10, 2018 at 11:57 PM, Souptick Joarder  
> wrote:
> > Use new return type vm_fault_t for fault handler
> > in struct vm_operations_struct. For now, this is
> > just documenting that the function returns a
> > VM_FAULT value rather than an errno.  Once all
> > instances are converted, vm_fault_t will become
> > a distinct type.
> >
> > commit 1c8f422059ae ("mm: change return type to
> > vm_fault_t")
> >
> > Signed-off-by: Souptick Joarder 
> > ---
> > v2: Updated the change log
> >
> >  arch/powerpc/kvm/book3s_64_vio.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/kvm/book3s_64_vio.c 
> > b/arch/powerpc/kvm/book3s_64_vio.c
> > index 4dffa61..346ac0d 100644
> > --- a/arch/powerpc/kvm/book3s_64_vio.c
> > +++ b/arch/powerpc/kvm/book3s_64_vio.c
> > @@ -237,7 +237,7 @@ static void release_spapr_tce_table(struct rcu_head 
> > *head)
> > kfree(stt);
> >  }
> >
> > -static int kvm_spapr_tce_fault(struct vm_fault *vmf)
> > +static vm_fault_t kvm_spapr_tce_fault(struct vm_fault *vmf)
> >  {
> > struct kvmppc_spapr_tce_table *stt = 
> > vmf->vma->vm_file->private_data;
> > struct page *page;
> > --
> > 1.9.1
> >
> 
> If no comment, we would like to get this patch in queue
> for 4.18.

It looks fine - I'll queue it up.

Paul.


Re: [PATCH] powerpc/misc: get rid of add_reloc_offset()

2018-04-17 Thread Paul Mackerras
On Tue, Apr 17, 2018 at 09:56:24AM +0200, Christophe Leroy wrote:
> add_reloc_offset() is almost redundant with reloc_offset()
> 
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/setup.h   |  3 +--
>  arch/powerpc/kernel/misc.S | 16 
>  arch/powerpc/kernel/prom_init_check.sh |  2 +-
>  3 files changed, 2 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/setup.h 
> b/arch/powerpc/include/asm/setup.h
> index 27fa52ed6d00..115e0896ffa7 100644
> --- a/arch/powerpc/include/asm/setup.h
> +++ b/arch/powerpc/include/asm/setup.h
> @@ -17,10 +17,9 @@ extern void note_scsi_host(struct device_node *, void *);
>  
>  /* Used in very early kernel initialization. */
>  extern unsigned long reloc_offset(void);
> -extern unsigned long add_reloc_offset(unsigned long);
>  extern void reloc_got2(unsigned long);
>  
> -#define PTRRELOC(x)  ((typeof(x)) add_reloc_offset((unsigned long)(x)))
> +#define PTRRELOC(x)  ((typeof(x)) ((unsigned long)(x) + reloc_offset()))

NAK.  This is how it used to be, and we changed it in order to prevent
gcc from making incorrect assumptions.  If you use the form with the
explicit addition, and x is the address of an array, gcc will assume
that the result is within the bounds of the array (apparently the C
standard says it can do that) and potentially generate incorrect
code.  I recall that we had an actual case where gcc was generating
incorrect code, though I don't recall the details, as this was some
time before 2002.

Paul.


Re: [PATCH] KVM: PPC: Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN

2018-03-04 Thread Paul Mackerras
On Fri, Mar 02, 2018 at 11:51:56AM +0100, Laurent Vivier wrote:
> Since commit 8b24e69fc47e ("KVM: PPC: Book3S HV: Close race with testing
> for signals on guest entry"), if CONFIG_VIRT_CPU_ACCOUNTING_GEN is set, the
> guest time is not accounted to guest time and user time, but instead to
> system time.
> 
> This is because guest_enter()/guest_exit() are called while interrupts
> are disabled and the tick counter cannot be updated between them.
> 
> To fix that, move guest_exit() after local_irq_enable(), and as
> guest_enter() is called with IRQ disabled, calls guest_enter_irqoff()
> instead.
> 
> Fixes: 8b24e69fc47e
> ("KVM: PPC: Book3S HV: Close race with testing for signals on guest entry")
> Signed-off-by: Laurent Vivier 

Thanks, applied to my kvm-ppc-fixes branch.

Paul.


Re: [PATCH 02/20] KVM: PPC: Book3S PR: Fix broken select due to misspelling

2018-02-04 Thread Paul Mackerras
On Mon, Feb 05, 2018 at 05:58:59AM +0100, Ulf Magnusson wrote:
> On Mon, Feb 5, 2018 at 5:48 AM, Paul Mackerras <pau...@ozlabs.org> wrote:
> > On Mon, Feb 05, 2018 at 02:21:14AM +0100, Ulf Magnusson wrote:
> >> Commit 76d837a4c0f9 ("KVM: PPC: Book3S PR: Don't include SPAPR TCE code
> >> on non-pseries platforms") added a reference to the globally undefined
> >> symbol PPC_SERIES. Looking at the rest of the commit, PPC_PSERIES was
> >> probably intended.
> >>
> >> Change PPC_SERIES to PPC_PSERIES.
> >>
> >> Discovered with the
> >> https://github.com/ulfalizer/Kconfiglib/blob/master/examples/list_undefined.py
> >> script.
> >>
> >> Signed-off-by: Ulf Magnusson <ulfali...@gmail.com>
> >
> > Acked-by: Paul Mackerras <pau...@ozlabs.org>
> >
> > Which tree is this series going into?
> >
> > Paul.
> 
> I didn't have a particular one in mind. I imagined people would pick
> up individual patches into the trees that make the most sense. It was
> easiest to do the undefined symbol removal as a kernel-wide batch job.

OK, I'll take this one then.

Paul.


Re: [PATCH 02/20] KVM: PPC: Book3S PR: Fix broken select due to misspelling

2018-02-04 Thread Paul Mackerras
On Mon, Feb 05, 2018 at 02:21:14AM +0100, Ulf Magnusson wrote:
> Commit 76d837a4c0f9 ("KVM: PPC: Book3S PR: Don't include SPAPR TCE code
> on non-pseries platforms") added a reference to the globally undefined
> symbol PPC_SERIES. Looking at the rest of the commit, PPC_PSERIES was
> probably intended.
> 
> Change PPC_SERIES to PPC_PSERIES.
> 
> Discovered with the
> https://github.com/ulfalizer/Kconfiglib/blob/master/examples/list_undefined.py
> script.
> 
> Signed-off-by: Ulf Magnusson <ulfali...@gmail.com>

Acked-by: Paul Mackerras <pau...@ozlabs.org>

Which tree is this series going into?

Paul.


Re: [PATCH] KVM: PPC: Use seq_puts() in kvmppc_exit_timing_show()

2018-01-11 Thread Paul Mackerras
On Sun, Jan 07, 2018 at 10:18:08AM +0100, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Sun, 7 Jan 2018 10:07:36 +0100
> 
> A headline should be quickly put into a sequence. Thus use the
> function "seq_puts" instead of "seq_printf" for this purpose.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 

Thanks, applied to my kvm-ppc-next branch.

Paul.


Re: [PATCH] KVM: PPC: Book3S HV: Remove vcpu->arch.dec usage

2018-01-11 Thread Paul Mackerras
On Tue, Dec 19, 2017 at 03:56:24PM +0100, Alexander Graf wrote:
> On Book3S in HV mode, we don't use the vcpu->arch.dec field at all.
> Instead, all logic is built around vcpu->arch.dec_expires.
> 
> So let's remove the one remaining piece of code that was setting it.
> 
> Signed-off-by: Alexander Graf 

Thanks, applied to my kvm-ppc-next branch.

> Looking through the DEC logic, I fail to see any code that allows
> save or restore of DEC. Do we maybe miss out on that register for
> (live) migration?

Yes, it looks like we do.  I'm amazed no-one has noticed before.  I'll
fix it.

Paul.


Re: [PATCH] KVM: PPC: Book3S HV: Always flush TLB in kvmppc_alloc_reset_hpt()

2018-01-11 Thread Paul Mackerras
On Wed, Jan 10, 2018 at 05:04:39PM +1100, David Gibson wrote:
> The KVM_PPC_ALLOCATE_HTAB ioctl(), implemented by kvmppc_alloc_reset_hpt()
> is supposed to completely clear and reset a guest's Hashed Page Table (HPT)
> allocating or re-allocating it if necessary.
> 
> In the case where an HPT of the right size already exists and it just
> zeroes it, it forces a TLB flush on all guest CPUs, to remove any stale TLB
> entries loaded from the old HPT.
> 
> However, that situation can arise when the HPT is resizing as well - or
> even when switching from an RPT to HPT - so those cases need a TLB flush as
> well.
> 
> So, move the TLB flush to trigger in all cases except for errors.
> 
> Signed-off-by: David Gibson 
> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> Paul, this is based on Paolo's KVM tree, but it should apply without
> modification to pretty much any vaguely current tree.  It's a pretty
> nasty bug - the case we've found hitting it in the wild is a bit
> esoteric, but it could in theory affect other situations as well.
> 
> Please apply ASAP, and should probably be queued for the stable
> branches as well.

Thanks, applied to my kvm-ppc-fixes branch, and I added
cc: sta...@vger.kernel.org.

Paul.


Re: [PATCH 03/11] powerpc/64s: Simple RFI macro conversions

2018-01-08 Thread Paul Mackerras
On Mon, Jan 08, 2018 at 06:09:51PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 09, 2018 at 03:54:45AM +1100, Michael Ellerman wrote:
> > diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S 
> > b/arch/powerpc/kvm/book3s_rmhandlers.S
> > index 42a4b237df5f..34a5adeff084 100644
> > --- a/arch/powerpc/kvm/book3s_rmhandlers.S
> > +++ b/arch/powerpc/kvm/book3s_rmhandlers.S
> > @@ -46,6 +46,9 @@
> >  
> >  #define FUNC(name) name
> >  
> > +#define RFI_TO_KERNEL  RFI
> > +#define RFI_TO_GUEST   RFI
> > +
> >  .macro INTERRUPT_TRAMPOLINE intno
> >  
> >  .global kvmppc_trampoline_\intno
> 
> Leftovers? The previous patch seems to define all that in common
> headers, why redefine here again?

Not leftovers - this is for the sake of 32-bit compiles.  There is
code in this file and in book3s_segment.S which gets used both for
32-bit and 64-bit kernels, and this is supplying a definition on
32-bit platforms.  Without this, 32-bit builds that have PR KVM
configured will fail.

Paul.



Re: [PATCH] powerpc/xive: store server for masked interrupt in kvmppc_xive_set_xive()

2017-12-04 Thread Paul Mackerras
On Fri, Nov 24, 2017 at 07:38:13AM +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2017-11-23 at 10:06 +0100, Laurent Vivier wrote:
> > This is needed to map kvmppc_xive_set_xive() behavior
> > to kvmppc_xics_set_xive().
> > 
> > As we store the server, kvmppc_xive_get_xive() can return
> > the good value and we can also allow kvmppc_xive_int_on().
> > 
> > Signed-off-by: Laurent Vivier 
> > ---
> >  arch/powerpc/kvm/book3s_xive.c | 20 
> >  1 file changed, 8 insertions(+), 12 deletions(-)
> > 
> > diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> > index bf457843e032..2781b8733038 100644
> > --- a/arch/powerpc/kvm/book3s_xive.c
> > +++ b/arch/powerpc/kvm/book3s_xive.c
> > @@ -584,10 +584,14 @@ int kvmppc_xive_set_xive(struct kvm *kvm, u32 irq, 
> > u32 server,
> >  *   we could initialize interrupts with valid default
> >  */
> >  
> > -   if (new_act_prio != MASKED &&
> > -   (state->act_server != server ||
> > -state->act_priority != new_act_prio))
> > -   rc = xive_target_interrupt(kvm, state, server, new_act_prio);
> > +   if (state->act_server != server ||
> > +   state->act_priority != new_act_prio) {
> > +   if (new_act_prio != MASKED)
> > +   rc = xive_target_interrupt(kvm, state, server,
> > +  new_act_prio);
> > +   if (!rc)
> > +   state->act_server = server;
> > +   }
> 
> That leads to another problem with this code. My current implementation
> is such that if a target queue is full, it will pick another target.
> But here, we still update act_server to the passed-in server and
> not the actual target...

So does that amount to a NAK?

> > /*
> >  * Perform the final unmasking of the interrupt source
> > @@ -646,14 +650,6 @@ int kvmppc_xive_int_on(struct kvm *kvm, u32 irq)
> >  
> > pr_devel("int_on(irq=0x%x)\n", irq);
> >  
> > -   /*
> > -* Check if interrupt was not targetted
> > -*/
> > -   if (state->act_priority == MASKED) {
> > -   pr_devel("int_on on untargetted interrupt\n");
> > -   return -EINVAL;
> > -   }
> > -
> 
> So my thinking here was that act_priority was never going to be MASKED
> except if the interrupt had never been targetted anywhere at machine
> startup time. Thus if act_priority is masked, the act_server field
> cannot be trusted.
> 
> > /* If saved_priority is 0xff, do nothing */
> > if (state->saved_priority == MASKED)
> > return 0;

How do you think this should be fixed?

Laurent, are you reworking the patch at the moment?

Paul.


Re: [PATCH v2 0/2] KVM: fixes for the kernel-hardening tree

2017-10-26 Thread Paul Mackerras
On Thu, Oct 26, 2017 at 03:45:45PM +0200, Paolo Bonzini wrote:
> Four KVM ioctls (KVM_GET/SET_CPUID2 on x86, KVM_GET/SET_ONE_REG on
> ARM and s390) directly access the kvm_vcpu_arch struct.  Therefore, the
> new usercopy hardening work in linux-next, which forbids copies from and
> to slab objects unless they are from kmalloc or explicitly whitelisted,
> breaks KVM on those architectures.
> 
> The kvm_vcpu_arch struct is embedded in the kvm_vcpu struct and the
> corresponding slab cache is allocated by architecture-independent code.
> It is enough, for simplicity, to whitelist the whole sub-struct and
> only touch one place of the KVM code.  Later, any further restrictions
> can be applied in the KVM tree.

I checked arch/powerpc/kvm, and all the copy_to/from_user calls are
accessing the stack or memory allocated with kzalloc or kvzalloc, so
if I understand correctly, we should be OK there.

Paul.


Re: [PATCH] bitmap: Fix optimization of bitmap_set/clear for big-endian machines

2017-10-25 Thread Paul Mackerras
On Wed, Oct 25, 2017 at 07:39:48AM +, Matthew Wilcox wrote:
> Hang on, don't tell me you found this by inspection.  Are you not running the 
> bitmap testcase, enabled by CONFIG_TEST_BITMAP?  Either that should be 
> producing an error, or there's a missing test case, or your inspection is 
> wrong ...

I did find it by inspection.  I was looking for a version of the
bitmap_* API that does little-endian style bitmaps on all systems, and
the inline bitmap_set() does that in the case where it calls memset,
but not in the case where it calls __bitmap_set.

I'll fire up a big-endian system tomorrow when I get to work to run
the test case.  (PPC64 is almost entirely little-endian these days as
far as the IBM POWER systems are concerned.)

In any case, it's pretty clearly wrong as it is.  On a big-endian
64-bit system, bitmap_set(p, 56, 16) should set bytes 0 and 15 to
0xff, and there's no way a single memset can do that.

Paul.

(and yes, I stuffed up the address for lkml)


Re: [PATCH] KVM: PPC: Book3S HV: Delete an error message for a failed memory allocation in kvmppc_allocate_hpt()

2017-10-18 Thread Paul Mackerras
On Thu, Oct 05, 2017 at 01:23:30PM +0200, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Thu, 5 Oct 2017 13:16:51 +0200
> 
> Omit an extra message for a memory allocation failure in this function.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 

Thanks, applied to my kvm-ppc-next branch.

Paul.


Re: [PATCH 1/10] KVM: PPC: Book3S HV: Use ARRAY_SIZE macro

2017-10-18 Thread Paul Mackerras
On Sun, Sep 03, 2017 at 02:19:31PM +0200, Thomas Meyer wrote:
> Use ARRAY_SIZE macro, rather than explicitly coding some variant of it
> yourself.
> Found with: find -type f -name "*.c" -o -name "*.h" | xargs perl -p -i -e
> 's/\bsizeof\s*\(\s*(\w+)\s*\)\s*\/\s*sizeof\s*\(\s*\1\s*\[\s*0\s*\]\s*\)
> /ARRAY_SIZE(\1)/g' and manual check/verification.
> 
> Signed-off-by: Thomas Meyer 

Thanks, applied to my kvm-ppc-next branch.

Paul.


Re: [PATCH 6/7] KVM: PPC: Cocci spatch "vma_pages"

2017-10-18 Thread Paul Mackerras
On Thu, Sep 21, 2017 at 12:29:36AM +0200, Thomas Meyer wrote:
> Use vma_pages function on vma object instead of explicit computation.
> Found by coccinelle spatch "api/vma_pages.cocci"
> 
> Signed-off-by: Thomas Meyer 

Thanks, applied to my kvm-ppc-next branch, with the headline "KVM:
PPC: BookE: Use vma_pages function".

Paul.


Re: Re: [PATCH] fix memory leak on kvm_vm_ioctl_create_spapr_tce

2017-08-28 Thread Paul Mackerras
On Mon, Aug 28, 2017 at 06:28:08AM +0100, Al Viro wrote:
> On Mon, Aug 28, 2017 at 02:38:37PM +1000, Paul Mackerras wrote:
> > On Sun, Aug 27, 2017 at 10:02:20PM +0100, Al Viro wrote:
> > > On Wed, Aug 23, 2017 at 04:06:24PM +1000, Paul Mackerras wrote:
> > > 
> > > > It seems to me that it would be better to do the anon_inode_getfd()
> > > > call before the kvm_get_kvm() call, and go to the fail label if it
> > > > fails.
> > > 
> > > And what happens if another thread does close() on the (guessed) fd?
> > 
> > Chaos ensues, but mostly because we don't have proper mutual exclusion
> > on the modifications to the list.  I'll add a mutex_lock/unlock to
> > kvm_spapr_tce_release() and move the anon_inode_getfd() call inside
> > the mutex.
> > 
> > It looks like the other possible uses of the fd (mmap, and passing it
> > as a parameter to the KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE ioctl on a KVM
> > device fd) are safe.
> 
> Frankly, it's a lot saner to have "no failure points past anon_inode_getfd()"
> policy...

Right.  In my latest patch, there are no failure points past
anon_inode_getfd().

Paul.


Re: Re: [PATCH] fix memory leak on kvm_vm_ioctl_create_spapr_tce

2017-08-27 Thread Paul Mackerras
On Sun, Aug 27, 2017 at 10:02:20PM +0100, Al Viro wrote:
> On Wed, Aug 23, 2017 at 04:06:24PM +1000, Paul Mackerras wrote:
> 
> > It seems to me that it would be better to do the anon_inode_getfd()
> > call before the kvm_get_kvm() call, and go to the fail label if it
> > fails.
> 
> And what happens if another thread does close() on the (guessed) fd?

Chaos ensues, but mostly because we don't have proper mutual exclusion
on the modifications to the list.  I'll add a mutex_lock/unlock to
kvm_spapr_tce_release() and move the anon_inode_getfd() call inside
the mutex.

It looks like the other possible uses of the fd (mmap, and passing it
as a parameter to the KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE ioctl on a KVM
device fd) are safe.

Thanks,
Paul.


Re: Re: [PATCH] fix memory leak on kvm_vm_ioctl_create_spapr_tce

2017-08-23 Thread Paul Mackerras
On Wed, Aug 23, 2017 at 01:43:08AM +, Nixiaoming wrote:
> >On 22.08.2017 17:15, David Hildenbrand wrote:
> >> On 22.08.2017 16:28, nixiaoming wrote:
> >>> miss kfree(stt) when anon_inode_getfd return fail so add check 
> >>> anon_inode_getfd return val, and kfree stt
> >>>
> >>> Signed-off-by: nixiaoming 
> >>> ---
> >>>  arch/powerpc/kvm/book3s_64_vio.c | 5 -
> >>>  1 file changed, 4 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c 
> >>> b/arch/powerpc/kvm/book3s_64_vio.c
> >>> index a160c14..a0b4459 100644
> >>> --- a/arch/powerpc/kvm/book3s_64_vio.c
> >>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> >>> @@ -341,8 +341,11 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm 
> >>> *kvm,
> >>>  
> >>>   mutex_unlock(&kvm->lock);
> >>>  
> >>> - return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
> >>> + ret = anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
> >>>   stt, O_RDWR | O_CLOEXEC);
> >>> + if (ret < 0)
> >>> + goto fail;
> >>> + return ret;
> >>>  
> >>>  fail:
> >>>   if (stt) {
> >>>
> >> 
> >> 
> >> stt has already been added to kvm->arch.spapr_tce_tables, so freeing 
> >> it is evil IMHO. I don't know that code, so I don't know if there is 
> >> some other place that will make sure that everything in
> >> kvm->arch.spapr_tce_tables will properly get freed, even when no 
> >> kvm->release
> >> function has been called (kvm_spapr_tce_release).
> >> 
> >
> >If it is really not freed, than also kvm_put_kvm(stt->kvm) is missing.
> >
> >-- 
> >
> >Thanks,
> >
> >David
> >
> 
> if (!stt) return -ENOMEM;
> kvm_get_kvm(kvm);
> if anon_inode_getfd return -ENOMEM
> The user can not determine whether kvm_get_kvm has been called
> so need add kvm_put_kvm when anon_inode_getfd fail
> 
> stt has already been added to kvm->arch.spapr_tce_tables,
> but if anon_inode_getfd fail, stt is unused val, 
> so call list_del_rcu, and  free as quickly as possible
> 
> new patch:
> 
> ---
>  arch/powerpc/kvm/book3s_64_vio.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c 
> b/arch/powerpc/kvm/book3s_64_vio.c
> index a160c14..e2228f1 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -341,8 +341,16 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
> 
> mutex_unlock(&kvm->lock);
> 
> -   return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
> +   ret = anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
> stt, O_RDWR | O_CLOEXEC);
> +   if (ret < 0) {
> +   mutex_lock(&kvm->lock);
> +   list_del_rcu(&stt->list);
> +   mutex_unlock(&kvm->lock);
> +   kvm_put_kvm(kvm);
> +   goto fail;
> +   }
> +   return ret;

It seems to me that it would be better to do the anon_inode_getfd()
call before the kvm_get_kvm() call, and go to the fail label if it
fails.

Paul.


Re: [PATCH 2/4] time: Add warning about imminent deprecation of CONFIG_GENERIC_TIME_VSYSCALL_OLD

2017-06-21 Thread Paul Mackerras
On Tue, Jun 20, 2017 at 10:21:31PM -0700, John Stultz wrote:
> CONFIG_GENERIC_TIME_VSYSCALL_OLD was introduced five years ago
> to allow a transition from the old vsyscall implementations to
> the new method (which simplified internal accounting and made
> timekeeping more precise).
> 
> However, PPC and IA64 have yet to make the transition, despite
> in some cases me sending test patches to try to help it along.

Did you see my patch converting PPC to the new method?  Did you have
any comments on it?

Regards,
Paul.


[PATCH] powerpc: Convert VDSO update function to use new update_vsyscall interface

2017-05-27 Thread Paul Mackerras
This converts the powerpc VDSO time update function to use the new
interface introduced in commit 576094b7f0aa ("time: Introduce new
GENERIC_TIME_VSYSCALL", 2012-09-11).  Where the old interface gave
us the time as of the last update in seconds and whole nanoseconds,
with the new interface we get the nanoseconds part effectively in
a binary fixed-point format with tk->tkr_mono.shift bits to the
right of the binary point.

With the old interface, the fractional nanoseconds got truncated,
meaning that the value returned by the VDSO clock_gettime function
would have about 1ns of jitter in it compared to the value computed
by the generic timekeeping code in the kernel.

The powerpc VDSO time functions (clock_gettime and gettimeofday)
already work in units of 2^-32 seconds, or 0.23283 ns, because that
makes it simple to split the result into seconds and fractional
seconds, and represent the fractional seconds in either microseconds
or nanoseconds.  This is good enough accuracy for now, so this patch
avoids changing how the VDSO works or the interface in the VDSO data
page.

This patch converts the powerpc update_vsyscall_old to be called
update_vsyscall and use the new interface.  We convert the fractional
second to units of 2^-32 seconds without truncating to whole nanoseconds.
(There is still a conversion to whole nanoseconds for any legacy users
of the vdso_data/systemcfg stamp_xtime field.)

In addition, this improves the accuracy of the computation of tb_to_xs
for those systems with high-frequency timebase clocks (>= 268.5 MHz)
by doing the right shift in two parts, one before the multiplication and
one after, rather than doing the right shift before the multiplication.
(We can't do all of the right shift after the multiplication unless we
use 128-bit arithmetic.)

Signed-off-by: Paul Mackerras <pau...@ozlabs.org>
---
 arch/powerpc/Kconfig   |  2 +-
 arch/powerpc/kernel/time.c | 68 +++---
 2 files changed, 53 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index f7c8f99..8fe2dc7 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -163,7 +163,7 @@ config PPC
select GENERIC_SMP_IDLE_THREAD
select GENERIC_STRNCPY_FROM_USER
select GENERIC_STRNLEN_USER
-   select GENERIC_TIME_VSYSCALL_OLD
+   select GENERIC_TIME_VSYSCALL
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KGDB
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 5d13f06..51a6bed 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -833,28 +833,66 @@ static u64 timebase_read(struct clocksource *cs)
return (u64)get_tb();
 }
 
-void update_vsyscall_old(struct timespec *wall_time, struct timespec *wtm,
-struct clocksource *clock, u32 mult, u64 cycle_last)
+
+void update_vsyscall(struct timekeeper *tk)
 {
+   struct timespec xt;
+   struct clocksource *clock = tk->tkr_mono.clock;
+   u32 mult = tk->tkr_mono.mult;
+   u32 shift = tk->tkr_mono.shift;
+   u64 cycle_last = tk->tkr_mono.cycle_last;
u64 new_tb_to_xs, new_stamp_xsec;
-   u32 frac_sec;
+   u64 frac_sec;
 
	if (clock != &clocksource_timebase)
return;
 
+   xt.tv_sec = tk->xtime_sec;
+   xt.tv_nsec = (long)(tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift);
+
/* Make userspace gettimeofday spin until we're done. */
++vdso_data->tb_update_count;
smp_mb();
 
-   /* 19342813113834067 ~= 2^(20+64) / 1e9 */
-   new_tb_to_xs = (u64) mult * (19342813113834067ULL >> clock->shift);
-   new_stamp_xsec = (u64) wall_time->tv_nsec * XSEC_PER_SEC;
	do_div(new_stamp_xsec, 1000000000);
-   new_stamp_xsec += (u64) wall_time->tv_sec * XSEC_PER_SEC;
+   /*
+* This computes ((2^20 / 1e9) * mult) >> shift as a
+* 0.64 fixed-point fraction.
+* The computation in the else clause below won't overflow
+* (as long as the timebase frequency is >= 1.049 MHz)
+* but loses precision because we lose the low bits of the constant
+* in the shift.  Note that 19342813113834067 ~= 2^(20+64) / 1e9.
+* For a shift of 24 the error is about 0.5e-9, or about 0.5ns
+* over a second.  (Shift values are usually 22, 23 or 24.)
+* For high frequency clocks such as the 512MHz timebase clock
+* on POWER[6789], the mult value is small (e.g. 32768000)
+* and so we can shift the constant by 16 initially
+* (295147905179 ~= 2^(20+64-16) / 1e9) and then do the
+* remaining shifts after the multiplication, which gives a
+* more accurate result (e.g. with mult = 32768000, shift = 24,
+* the error is only about 1.2e-12, or 0.7ns over 10 minutes).
+*/
+   if (mult <= 62500000 && clock->shift >= 16)

Re: [RFC][PATCH] time: Add warning about imminent deprecation of CONFIG_GENERIC_TIME_VSYSCALL_OLD

2017-05-25 Thread Paul Mackerras
On Mon, May 22, 2017 at 12:06:04PM -0700, John Stultz wrote:
> 
> Basically long ago, timekeeping was handled (roughly) like:
> 
> clock_gettime():
> now = tk->clock->read()
> offset_ns = ((now - tk->cycle_last) * tk->clock->mult) >> 
> tk->clock->shift;
> return timespec_add_ns(tk->xtime, offset_ns);
> 
> But since for error handling use sub-ns precision, combined with that
> for update performance, we accumulate in fixed intervals, there are
> situations where in the update, we could accumulate half of a
> nanosecond into the base tk->xtime value and leaving half of a
> nanosecond in the offset.   This caused the split nanosecond to be
> truncated out by the math, causing 1ns discontinuities.
> 
> So to address this, we came up with sort of a hack, which when we
> accumulate rounds up that partial nanosecond, and adds the amount we
> rounded up to the error (which will cause the freq correction code to
> slow the clock down slightly). This is the code that is now done in
> the old_vsyscall_fixup() logic.
> 
> Unfortunately this fix (which generates up to a nanosecond of error
> per tick) then made the freq correction code do more work and made it
> more difficult to have a stable clock.
> 
> So we went for a more proper fix, which was to properly handle the
> sub-nanosecond portion of the timekeeping throughout the logic, doing
> the truncation last.
> 
> clock_gettime():
> now = tk->clock->read()
> ret.tv_sec = tk->xtime_sec;
> offset_sns = (now - tk->cycle_last) * tk->clock->mult;
> ret.tv_nsec = (offset_sns + tk->tkr_mono.xtime_nsec) >> tk->clock->shift;
> return ret;
> 
> So in the above, we now use the tk->tkr_mono.xtime_nsec (which despite
> its unfortunate name, stores the accumulated shifted nanoseconds), and
> add it to the (current_cycle_delta * clock->mult), then we do the
> shift last to preserve as much precision as we can.
> 
> Unfortunately we need all the reader code to do the same, which wasn't
> easy to transition in some cases. So we provided the
> CONFIG_GENERIC_TIME_VSYSCALL_OLD option to preserve the old round-up
> behavior while arch maintainers could make the transition.

The VDSO code on PPC computes the offset in units of 2^-32 seconds,
not nanoseconds, because that makes it easy to handle the split of the
offset into whole seconds and fractional seconds (which is handled in
the generic code by the slightly icky __iter_div_u64_rem function),
and also means that we can use PPC's instruction that computes
(a * b) >> 32 to convert the fractional part to either nanoseconds or
microseconds without doing a division.

I could pretty easily change the computations done at update_vsyscall
time to convert the tk->tkr_mono.xtime_nsec value to units of 2^-32
seconds for use by the VDSO.  That would mean we would no longer need
CONFIG_GENERIC_TIME_VSYSCALL_OLD, and would give us values returned by
the VDSO gettimeofday() and clock_gettime() that should be within
about 1/4 ns of what the generic code in the kernel would give (on
average, I mean, given that the results have at best nanosecond
resolution).  Since that corresponds to about 1 CPU clock cycle, it
seems like it should be good enough.

Alternatively I could make the VDSO computations use a smaller unit
(maybe 2^-36 or 2^-40 seconds), or else rewrite them to use exactly
the same algorithm as the generic code - which would be a bigger
change, and would mean having to do an iterative division.

So, do you think the 1/4 ns resolution is good enough for the VDSO
computations?

Paul.


Re: [PATCH] KVM: Eliminate unused variable warning on uniprocessor configs

2017-05-12 Thread Paul Mackerras
On Thu, May 11, 2017 at 09:44:06AM +0200, Paolo Bonzini wrote:
> 
> 
> On 11/05/2017 05:40, Paul Mackerras wrote:
> > When the
> > kernel is compiled with CONFIG_SMP=n, smp_call_function_many() turns
> > into a macro which doesn't use the 'wait' argument, leading to a
> > warning about the variable 'wait' being unused:
> > 
> > /home/paulus/kernel/kvm/arch/powerpc/kvm/../../../virt/kvm/kvm_main.c: In 
> > function ‘kvm_make_all_cpus_request’:
> > /home/paulus/kernel/kvm/arch/powerpc/kvm/../../../virt/kvm/kvm_main.c:195:7:
> >  error: unused variable ‘wait’ [-Werror=unused-variable]
> >   bool wait = req & KVM_REQUEST_WAIT;
> 
> Maybe the macro should not be a macro:

Sounds fair, and with a couple of semicolons removed, it works and
fixes the problem.  Do you want to send it to the appropriate
maintainer(s) or will I?

> diff --git a/include/linux/smp.h b/include/linux/smp.h
> index 8e0cb7a0f836..899c72f0933f 100644
> --- a/include/linux/smp.h
> +++ b/include/linux/smp.h
> @@ -128,17 +128,18 @@ static inline void smp_send_stop(void) { }
>   *   These macros fold the SMP functionality into a single CPU system
>   */
>  #define raw_smp_processor_id()   0
> -static inline int up_smp_call_function(smp_call_func_t func, void *info)
> +static inline void smp_call_function_many(const struct cpumask *mask,
> +   smp_call_func_t func, void *info,
> +   bool wait);

No semicolon on the end of this line...

>  {
> - return 0;
> +}
> +
> +static inline void smp_call_function(smp_call_func_t func, void *info, bool 
> wait);

or this one.

Paul.


Re: [PATCH] KVM: Prevent double-free on HPT resize commit path

2017-02-28 Thread Paul Mackerras
On Tue, Feb 28, 2017 at 11:56:55AM +1100, David Gibson wrote:
> On Wed, Feb 15, 2017 at 02:40:04PM +1100, David Gibson wrote:
> > resize_hpt_release(), called once the HPT resize of a KVM guest is
> > completed (successfully or unsuccessfully) free()s the state structure for
> > the resize.  It is currently not safe to call with a NULL pointer.
> > 
> > However, one of the error paths in kvm_vm_ioctl_resize_hpt_commit() can
> > invoke it with a NULL pointer.  This will occur if userspace improperly
> > invokes KVM_PPC_RESIZE_HPT_COMMIT without previously calling
> > KVM_PPC_RESIZE_HPT_PREPARE, or if it calls COMMIT twice without an
> > intervening PREPARE.
> > 
> > To fix this potential crash bug - and maybe others like it, make it safe
> > (and a no-op) to call resize_hpt_release() with a NULL resize pointer.
> > 
> > Found by Dan Carpenter with a static checker.
> > 
> > Reported-by: Dan Carpenter 
> > Signed-off-by: David Gibson 
> 
> Ping,
> 
> Paul have you taken this one?

Yes, thanks, it's in Linus' tree now.

Paul.


Re: [PATCHv2 00/11] KVM implementation of PAPR HPT resizing extension

2017-01-31 Thread Paul Mackerras
On Wed, Feb 01, 2017 at 03:56:36PM +1100, David Gibson wrote:
> On Wed, Feb 01, 2017 at 09:18:13AM +1100, Paul Mackerras wrote:
> > On Tue, Dec 20, 2016 at 04:48:56PM +1100, David Gibson wrote:
> > > Here is the KVM implementation for the proposed PAPR extension which
> > > allows the runtime resizing of a PAPR guest's Hashed Page Table (HPT).
> > > 
> > > Using this requires a guest kernel with support for the extension.
> > > Patches for guest side support in Linux were posted earlier:
> > >   
> > > https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-December/152164.html
> > > 
> > > It also requires userspace (i.e. qemu) to intercept the HPT resizing
> > > hypercalls and invoke the KVM ioctl()s to implement them.  This is
> done instead of having KVM directly intercept the hypercalls, so that
> > > userspace can, if useful, impose additional restrictions on resizes:
> > > for example it could refuse them entirely if policy for the VM
> > > precludes resizing, or it could limit the size of HPT the guest can
> > > request to meet resource limits.
> > > 
> > > Patches to implement the userspace part of HPT resizing are proposed
> > > for qemu-2.9, and can be found at:
> > >   https://github.com/dgibson/qemu/tree/hpt-resize
> > > 
> > > I'm posting these now, in the hopes that both these and the
> > > corresponding guest side patches can be staged and merged for the 4.11
> > > window.
> > 
> > Thanks, series applied to my kvm-ppc-next branch.
> 
> Great.  Where can I grab that tree?

git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc

> > Next time, please cc kvm-...@vger.kernel.org so it shows up in
> > patchwork and I don't miss it.
> 
> Sorry, I hadn't realized that list existed.  Noted for future
> reference.

Thanks,
Paul.


Re: [PATCHv2 00/11] KVM implementation of PAPR HPT resizing extension

2017-01-31 Thread Paul Mackerras
On Tue, Dec 20, 2016 at 04:48:56PM +1100, David Gibson wrote:
> Here is the KVM implementation for the proposed PAPR extension which
> allows the runtime resizing of a PAPR guest's Hashed Page Table (HPT).
> 
> Using this requires a guest kernel with support for the extension.
> Patches for guest side support in Linux were posted earlier:
>   https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-December/152164.html
> 
> It also requires userspace (i.e. qemu) to intercept the HPT resizing
> hypercalls and invoke the KVM ioctl()s to implement them.  This is
> done instead of having KVM directly intercept the hypercalls, so that
> userspace can, if useful, impose additional restrictions on resizes:
> for example it could refuse them entirely if policy for the VM
> precludes resizing, or it could limit the size of HPT the guest can
> request to meet resource limits.
> 
> Patches to implement the userspace part of HPT resizing are proposed
> for qemu-2.9, and can be found at:
>   https://github.com/dgibson/qemu/tree/hpt-resize
> 
> I'm posting these now, in the hopes that both these and the
> corresponding guest side patches can be staged and merged for the 4.11
> window.

Thanks, series applied to my kvm-ppc-next branch.

Next time, please cc kvm-...@vger.kernel.org so it shows up in
patchwork and I don't miss it.

Paul.


Re: [PATCH] powerpc: Fix LPCR_VRMASD definition

2016-12-08 Thread Paul Mackerras
On Thu, Dec 08, 2016 at 11:29:30AM +0800, Jia He wrote:
> Fixes: a4b349540a ("powerpc/mm: Cleanup LPCR defines")
> Signed-off-by: Jia He <hejia...@gmail.com>

Acked-by: Paul Mackerras <pau...@ozlabs.org>


Re: [PATCH 00/10] vtime: Delay cputime accounting to tick

2016-12-05 Thread Paul Mackerras
On Tue, Dec 06, 2016 at 03:32:13AM +0100, Frederic Weisbecker wrote:
> This follows up Martin Schwidefsky's patch, which proposes delaying
> cputime accounting to the tick in order to minimize the calls to
> account_system_time() and the like, as these functions can carry quite some
> overhead:
> 
>   http://lkml.kernel.org/r/2016112728.13a0a3db@mschwide
> 
> The set includes Martin's patch, rebased on top of tip:sched/core and
> latest s390 changes, and extends it to the other implementations of
> CONFIG_VIRT_CPU_ACCOUNTING_NATIVE (powerpc and ia64) along with a few
> core changes to adapt the whole.
> 
> Only built-tested though as I don't have access to any of these archs.

The patches look reasonable at a quick look.  I assume that to test
them, we would want to run a guest in an overcommitted system, so as
to get some steal time.  Do you have any more specific suggestions as
to what to run as a test?  Just run some benchmark and see if the
user/system/irq times look reasonable?  Or do you have something more
quantitative?

Paul.


Re: [PATCH 0/2] Preliminary cleanups for HPT resizing

2016-11-23 Thread Paul Mackerras
On Wed, Nov 23, 2016 at 04:14:05PM +1100, David Gibson wrote:
> Hi Paul,
> 
> I'm still chasing down the confusion about the CAS bit before I can send
> the real HPT resizing patches.  However, in the meantime, here are some
> preliminary cleanups.
> 
> These cleanups stand on their own, although I wrote them in the
> context of writing the HPT resizing code, and are prerequisites for
> those patches.
> 
> David Gibson (2):
>   kvm: Move KVM_PPC_PVINFO_FLAGS_EV_IDLE definition next to its
> structure
>   powerpc/kvm: Corectly report KVM_CAP_PPC_ALLOC_HTAB
> 
>  arch/powerpc/kvm/powerpc.c | 5 -
>  include/uapi/linux/kvm.h   | 5 +++--
>  2 files changed, 7 insertions(+), 3 deletions(-)

Thanks, series applied to my kvm-ppc-next branch.

Paul.


Re: [PATCH v2] powernv: Handle wakeup from idle due to SRESET

2016-11-22 Thread Paul Mackerras
On Tue, Nov 22, 2016 at 11:06:32PM +0530, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" 
> 
> The existing code doesn't handle the case when a CPU which was in a
> hardware-idle state (nap,sleep,winkle on POWER8 and various stop
> states on POWER9) gets woken up due to a System Reset interrupt.
> 
> This patch checks if the CPU was woken up due to System Reset, in
> which case, after restoring the required hardware state, it jumps to
> the system reset handler.
> 
> Signed-off-by: Gautham R. Shenoy 
> ---
> v1 -> v2: Set r9,r11,r12 to CR,SRR0,SRR1 values
> before jumping to system_reset_common as expected by
> EXCEPTION_PROLOG_COMMON
> 
>  arch/powerpc/kernel/idle_book3s.S | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/idle_book3s.S 
> b/arch/powerpc/kernel/idle_book3s.S
> index 72dac0b..06afe0e 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -353,6 +353,22 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66); 
> \
>   ld  r3,ORIG_GPR3(r1);   /* Restore original r3 */   \
>  20:  nop;
>  
> +#define CHECK_SRESET_INTERRUPT  \
> +BEGIN_FTR_SECTION_NESTED(67);
> \
> + mfspr   r0,SPRN_SRR1;   \
> + rlwinm  r0,r0,45-31,0xf; /* Extract wake reason field (P8,9) */ \
> + cmpwi   r0,0x4;   /* System Reset ? */  \
> + bne 21f;\
> + ld  r1,PACAR1(r13); \
> + ld  r9,_CCR(r1);\
> + ld  r11,_NIP(r1);   \
> + mfspr   r12, SPRN_SRR1; \
> + b   system_reset_common ;   \
> + b   .;  /* We shouldn't return here */  \
> +FTR_SECTION_ELSE_NESTED(67); \
> + nop;\
> +ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 67); \

What's the point of the else section?

Paul.


Re: [PATCH 2/4] cputime/powerpc: remove cputime_to_scaled()

2016-11-02 Thread Paul Mackerras
On Mon, Oct 31, 2016 at 01:36:27PM +0100, Stanislaw Gruszka wrote:
> Currently cputime_to_scaled() just returns its argument on
> all implementations, so we don't need to call this function.
> 
> Signed-off-by: Stanislaw Gruszka <sgrus...@redhat.com>

Looks OK to me.

Reviewed-by: Paul Mackerras <pau...@ozlabs.org>


Re: [PATCH 1/4] cputime/powerpc: remove cputime_last_delta global variable

2016-11-02 Thread Paul Mackerras
On Mon, Oct 31, 2016 at 01:36:26PM +0100, Stanislaw Gruszka wrote:
> Since commit:
> 
> cf9efce0ce313 ("powerpc: Account time using timebase rather than PURR")
> 
> cputime_last_delta is never initialized to any value other than 0, hence
> it's not used except for a zero check, and cputime_to_scaled() just
> returns its argument.
> 
> Signed-off-by: Stanislaw Gruszka <sgrus...@redhat.com>

Yes, I should have removed them in that commit.  We don't want to do
any scaling in the places where cputime_to_scaled() is used.

Acked-by: Paul Mackerras <pau...@ozlabs.org>


Re: [PATCH v2 2/3] cpuidle:powernv: Add helper function to populate powernv idle states.

2016-11-01 Thread Paul Mackerras
On Tue, Nov 01, 2016 at 07:32:58PM +1100, Oliver O'Halloran wrote:
> On Thu, Oct 27, 2016 at 7:35 PM, Gautham R. Shenoy
>  wrote:
> > From: "Gautham R. Shenoy" 
> >
> > In the current code for powernv_add_idle_states, there is a lot of code
> > duplication while initializing an idle state in powernv_states table.
> >
> > Add an inline helper function to populate the powernv_states[] table for
> > a given idle state. Invoke this for populating the "Nap", "Fastsleep"
> > and the stop states in powernv_add_idle_states.
> >
> > Signed-off-by: Gautham R. Shenoy 
> > ---
> >  drivers/cpuidle/cpuidle-powernv.c | 82 
> > +++
> >  include/linux/cpuidle.h   |  1 +
> >  2 files changed, 49 insertions(+), 34 deletions(-)
> >
> > diff --git a/drivers/cpuidle/cpuidle-powernv.c 
> > b/drivers/cpuidle/cpuidle-powernv.c
> > index 7fe442c..11b22b9 100644
> > --- a/drivers/cpuidle/cpuidle-powernv.c
> > +++ b/drivers/cpuidle/cpuidle-powernv.c
> > @@ -167,6 +167,28 @@ static int powernv_cpuidle_driver_init(void)
> > return 0;
> >  }
> >
> > +static inline void add_powernv_state(int index, const char *name,
> > +unsigned int flags,
> > +int (*idle_fn)(struct cpuidle_device *,
> > +   struct cpuidle_driver *,
> > +   int),
> > +unsigned int target_residency,
> > +unsigned int exit_latency,
> > +u64 psscr_val)
> > +{
> > +   strncpy(powernv_states[index].name, name, CPUIDLE_NAME_LEN);
> > +   strncpy(powernv_states[index].desc, name, CPUIDLE_NAME_LEN);
> 
> If the length of the supplied name equals CPUIDLE_NAME_LEN then strncpy() won't
> terminate the string. The least annoying fix is to memset() the whole
> structure to zero and set n to CPUIDLE_NAME_LEN - 1.

Or he could use strlcpy() instead of strncpy().

Paul.

