Re: [PATCH v19 0/3] vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper

2024-02-22 Thread Alex Williamson
n [3] and [4], KVM marks the region with
> MemAttr[2:0]=0b101 in S2.
> 
> If the device memory properties are not present, the driver registers the
> vfio-pci-core function pointers. Since there are no ACPI memory properties
> generated for the VM, the variant driver inside the VM will only use
> the vfio-pci-core ops and hence try to map the BARs as non cached. This
> is not a problem as the CPUs have FWB enabled which blocks the VM
> mapping's ability to override the cacheability set by the host mapping.
> 
> This goes along with a qemu series [6] to provide the necessary
> implementation of the Grace Hopper Superchip firmware specification so
> that the guest operating system can see the correct ACPI modeling for
> the coherent GPU device. Verified with the CUDA workload in the VM.
> 
> [1] https://www.nvidia.com/en-in/technologies/multi-instance-gpu/
> [2] section D8.5.5 of https://developer.arm.com/documentation/ddi0487/latest/
> [3] https://lore.kernel.org/all/20240211174705.31992-1-ank...@nvidia.com/
> [4] https://lore.kernel.org/all/20230907181459.18145-2-ank...@nvidia.com/
> [5] https://github.com/NVIDIA/open-gpu-kernel-modules
> [6] https://lore.kernel.org/all/20231203060245.31593-1-ank...@nvidia.com/
> 
> Applied over v6.8-rc5.
> 
> Signed-off-by: Aniket Agashe 
> Signed-off-by: Ankit Agrawal 
> ---
[snip]
> Ankit Agrawal (3):
>   vfio/pci: rename and export do_io_rw()
>   vfio/pci: rename and export range_intersect_range
>   vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper
> 
>  MAINTAINERS   |  16 +-
>  drivers/vfio/pci/Kconfig  |   2 +
>  drivers/vfio/pci/Makefile |   2 +
>  drivers/vfio/pci/nvgrace-gpu/Kconfig  |  10 +
>  drivers/vfio/pci/nvgrace-gpu/Makefile |   3 +
>  drivers/vfio/pci/nvgrace-gpu/main.c   | 879 ++
>  drivers/vfio/pci/vfio_pci_config.c|  42 ++
>  drivers/vfio/pci/vfio_pci_rdwr.c  |  16 +-
>  drivers/vfio/pci/virtio/main.c|  72 +--
>  include/linux/vfio_pci_core.h |  10 +-
>  10 files changed, 993 insertions(+), 59 deletions(-)
>  create mode 100644 drivers/vfio/pci/nvgrace-gpu/Kconfig
>  create mode 100644 drivers/vfio/pci/nvgrace-gpu/Makefile
>  create mode 100644 drivers/vfio/pci/nvgrace-gpu/main.c
> 

Applied to vfio next branch for v6.9.  Thanks,

Alex




Re: [PATCH v17 3/3] vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper

2024-02-09 Thread Alex Williamson
On Fri, 9 Feb 2024 13:19:03 -0400
Jason Gunthorpe  wrote:

> On Fri, Feb 09, 2024 at 08:55:31AM -0700, Alex Williamson wrote:
> > I think Kevin's point is also relative to this latter scenario, in the
> > L1 instance of the nvgrace-gpu driver the mmap of the usemem BAR is
> > cachable, but in the L2 instance of the driver where we only use the
> > vfio-pci-core ops nothing maintains that cachable mapping.  Is that a
> > problem?  An uncached mapping on top of a cachable mapping is often
> > prone to problems.
> 
> On these CPUs the ARM architecture won't permit it, the L0 level
> blocks uncachable using FWB and page table attributes. The VM, no
> matter what it does, cannot make the cachable memory uncachable.

Great, thanks,

Alex




Re: [PATCH v17 3/3] vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper

2024-02-09 Thread Alex Williamson
On Fri, 9 Feb 2024 09:20:22 +
Ankit Agrawal  wrote:

> Thanks Kevin for the review. Comments inline.
> 
> >>
> >> Note that the usemem memory is added by the VM Nvidia device driver [5]
> >> to the VM kernel as memblocks. Hence make the usable memory size
> >> memblock aligned.  
> >
> > Is memblock size defined in spec or purely a guest implementation choice?  
> 
> The MEMBLOCK value is hardwired: it is a constant ABI value between the GPU
> FW and the VFIO driver.
> 
> >>
> >> If the bare metal properties are not present, the driver registers the
> >> vfio-pci-core function pointers.  
> >
> > so if qemu doesn't generate such property the variant driver running
> > inside guest will always go to use core functions and guest vfio userspace
> > will observe both resmem and usemem bars. But then there is nothing
> > in field to prohibit mapping resmem bar as cacheable.
> >
> > should this driver check the presence of either ACPI property or
> > resmem/usemem bars to enable variant function pointers?  
> 
> Maybe I am missing something here; but if the ACPI property is absent,
> the real physical BARs present on the device will be exposed by the
> vfio-pci-core functions to the VM. So I think if the variant driver is run
> within the VM, it should not see the fake usemem and resmem BARs.

There are two possibilities here, either we're assigning the pure
physical device from a host that does not have the ACPI properties or
we're performing a nested assignment.  In the former case we're simply
passing along the unmodified physical BARs.  In the latter case we're
actually passing through the fake BARs, the virtualization of the
device has already happened in the level 1 assignment.

I think Kevin's point is also relative to this latter scenario, in the
L1 instance of the nvgrace-gpu driver the mmap of the usemem BAR is
cachable, but in the L2 instance of the driver where we only use the
vfio-pci-core ops nothing maintains that cachable mapping.  Is that a
problem?  An uncached mapping on top of a cachable mapping is often
prone to problems.  Thanks,

Alex




Re: [PATCH v17 3/3] vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper

2024-02-08 Thread Alex Williamson
On Thu, 8 Feb 2024 07:21:40 +
"Tian, Kevin"  wrote:

> > From: Ankit Agrawal 
> > Sent: Thursday, February 8, 2024 3:13 PM  
> > >> > +    * Determine how many bytes to be actually read from the device memory.
> > >> > +    * Read request beyond the actual device memory size is filled with ~0,
> > >> > +    * while those beyond the actual reported size is skipped.
> > >> > +    */
> > >> > +   if (offset >= memregion->memlength)
> > >> > +   mem_count = 0;  
> > >>
> > >> If mem_count == 0, going through nvgrace_gpu_map_and_read() is not
> > >> necessary.  
> > >
> > > > Harmless, other than the possibly unnecessary call through to
> > > > nvgrace_gpu_map_device_mem().  Maybe both nvgrace_gpu_map_and_read()
> > > > and nvgrace_gpu_map_and_write() could conditionally return 0 as their
> > > > first operation when !mem_count.  Thanks,
> > >
> > >Alex  
> > 
> > IMO, this seems like adding too much code to reduce the call length for a
> > very specific case. If there isn't any strong opinion on this, I'm
> > planning to leave this code as it is.  
> 
> There is a slight difference: if mem_count == 0, the result should always
> succeed whether or not nvgrace_gpu_map_device_mem() succeeds. Of course,
> if it fails that is already a big problem, and probably nobody cares about
> the subtle difference when reading a nonexistent range.
> 
> But with regard to readability, this is still clearer:
> 
> if (mem_count)
>   nvgrace_gpu_map_and_read();
> 

The below has better flow imo vs conditionalizing the call to
map_and_read/write and subsequent error handling, but I don't think
either adds too much code.  Thanks,

Alex

--- a/drivers/vfio/pci/nvgrace-gpu/main.c
+++ b/drivers/vfio/pci/nvgrace-gpu/main.c
@@ -429,6 +429,9 @@ nvgrace_gpu_map_and_read(struct nvgrace_gpu_vfio_pci_core_device *nvdev,
 	u64 offset = *ppos & VFIO_PCI_OFFSET_MASK;
 	int ret;
 
+	if (!mem_count)
+		return 0;
+
 	/*
 	 * Handle read on the BAR regions. Map to the target device memory
 	 * physical address and copy to the request read buffer.
@@ -547,6 +550,9 @@ nvgrace_gpu_map_and_write(struct nvgrace_gpu_vfio_pci_core_device *nvdev,
 	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
 	int ret;
 
+	if (!mem_count)
+		return 0;
+
 	ret = nvgrace_gpu_map_device_mem(index, nvdev);
 	if (ret)
 		return ret;
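
For illustration only, the read semantics discussed in this thread can be modelled in plain userspace C. This is a sketch under stated assumptions, not the driver code: struct demo_region, demo_copy_from_region() and demo_read_mem() are hypothetical names, and memcpy()/memset() stand in for the driver's mapping and copy-to-user steps. The early return on a zero mem_count mirrors the suggestion in the diff above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the driver's per-region bookkeeping. */
struct demo_region {
	const unsigned char *mem;	/* backing "device" memory */
	size_t memlength;		/* actual device memory size */
	size_t bar_size;		/* reported (power-of-2) size */
};

/* Copy from the backing memory; a zero-length request succeeds trivially. */
static int demo_copy_from_region(const struct demo_region *r,
				 unsigned char *buf, size_t mem_count,
				 size_t offset)
{
	if (!mem_count)
		return 0;

	memcpy(buf, r->mem + offset, mem_count);
	return 0;
}

/*
 * Reads starting beyond the reported size fail with -EINVAL, reads
 * extending beyond it are truncated, and the part of a read that lies
 * beyond the actual memory size is filled with 0xFF.
 */
static ssize_t demo_read_mem(const struct demo_region *r, unsigned char *buf,
			     size_t count, size_t offset)
{
	size_t mem_count;
	int ret;

	if (offset >= r->bar_size)
		return -EINVAL;

	/* Clip the request to the reported BAR size. */
	if (count > r->bar_size - offset)
		count = r->bar_size - offset;

	/* Number of bytes that are really backed by device memory. */
	if (offset >= r->memlength)
		mem_count = 0;
	else if (count < r->memlength - offset)
		mem_count = count;
	else
		mem_count = r->memlength - offset;

	ret = demo_copy_from_region(r, buf, mem_count, offset);
	if (ret)
		return ret;

	/* Synthesize ~0 for the unbacked tail of the request. */
	memset(buf + mem_count, 0xFF, count - mem_count);
	return (ssize_t)count;
}
```

With a 6-byte region reported as 8 bytes, a full read returns 8 with the last two bytes set to 0xFF, while a read that starts at offset 8 is rejected.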




Re: [PATCH v17 3/3] vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper

2024-02-07 Thread Alex Williamson
h
> MemAttr[2:0]=0b101 in S2.
> 
> If the bare metal properties are not present, the driver registers the
> vfio-pci-core function pointers.
> 
> This goes along with a qemu series [6] to provide the necessary
> implementation of the Grace Hopper Superchip firmware specification so
> that the guest operating system can see the correct ACPI modeling for
> the coherent GPU device. Verified with the CUDA workload in the VM.
> 
> [1] https://www.nvidia.com/en-in/technologies/multi-instance-gpu/
> [2] section D8.5.5 of https://developer.arm.com/documentation/ddi0487/latest/
> [3] https://lore.kernel.org/all/20231205033015.10044-1-ank...@nvidia.com/
> [4] https://lore.kernel.org/all/20230907181459.18145-2-ank...@nvidia.com/
> [5] https://github.com/NVIDIA/open-gpu-kernel-modules
> [6] https://lore.kernel.org/all/20231203060245.31593-1-ank...@nvidia.com/
> 
> Signed-off-by: Aniket Agashe 
> Signed-off-by: Ankit Agrawal 
> ---
>  MAINTAINERS   |   6 +
>  drivers/vfio/pci/Kconfig  |   2 +
>  drivers/vfio/pci/Makefile |   2 +
>  drivers/vfio/pci/nvgrace-gpu/Kconfig  |  10 +
>  drivers/vfio/pci/nvgrace-gpu/Makefile |   3 +
>  drivers/vfio/pci/nvgrace-gpu/main.c   | 856 ++
>  6 files changed, 879 insertions(+)
>  create mode 100644 drivers/vfio/pci/nvgrace-gpu/Kconfig
>  create mode 100644 drivers/vfio/pci/nvgrace-gpu/Makefile
>  create mode 100644 drivers/vfio/pci/nvgrace-gpu/main.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 8999497011a2..529ec8966f58 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -23103,6 +23103,12 @@ L:   k...@vger.kernel.org
>  S:   Maintained
>  F:   drivers/vfio/platform/
>  
> +VFIO NVIDIA GRACE GPU DRIVER
> +M:   Ankit Agrawal 
> +L:   k...@vger.kernel.org
> +S:   Supported
> +F:   drivers/vfio/pci/nvgrace-gpu/
> +


Entries should be alphabetical.  This will end up colliding with [1] so
I'll plan to fix it either way.

Otherwise just a couple optional comments from me below.  I see Zhi also
has a few good comments.  I'd suggest soliciting a review from the other
variant driver reviewers for this version and maybe we can make v18 the
final version.  Thanks,

Alex

[1] https://lore.kernel.org/all/20240205235427.2103714-1-alex.william...@redhat.com/


>  VGA_SWITCHEROO
>  R:   Lukas Wunner 
>  S:   Maintained
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 18c397df566d..15821a2d77d2 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -67,4 +67,6 @@ source "drivers/vfio/pci/pds/Kconfig"
>  
>  source "drivers/vfio/pci/virtio/Kconfig"
>  
> +source "drivers/vfio/pci/nvgrace-gpu/Kconfig"
> +
>  endmenu
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index 046139a4eca5..ce7a61f1d912 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -15,3 +15,5 @@ obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>  obj-$(CONFIG_PDS_VFIO_PCI) += pds/
>  
>  obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> +
> +obj-$(CONFIG_NVGRACE_GPU_VFIO_PCI) += nvgrace-gpu/
> diff --git a/drivers/vfio/pci/nvgrace-gpu/Kconfig b/drivers/vfio/pci/nvgrace-gpu/Kconfig
> new file mode 100644
> index ..936e88d8d41d
> --- /dev/null
> +++ b/drivers/vfio/pci/nvgrace-gpu/Kconfig
> @@ -0,0 +1,10 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config NVGRACE_GPU_VFIO_PCI
> + tristate "VFIO support for the GPU in the NVIDIA Grace Hopper Superchip"
> + depends on ARM64 || (COMPILE_TEST && 64BIT)
> + select VFIO_PCI_CORE
> + help
> +   VFIO support for the GPU in the NVIDIA Grace Hopper Superchip is
> +   required to assign the GPU device using KVM/qemu/etc.
> +
> +   If you don't know what to do here, say N.
> diff --git a/drivers/vfio/pci/nvgrace-gpu/Makefile b/drivers/vfio/pci/nvgrace-gpu/Makefile
> new file mode 100644
> index ..3ca8c187897a
> --- /dev/null
> +++ b/drivers/vfio/pci/nvgrace-gpu/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_NVGRACE_GPU_VFIO_PCI) += nvgrace-gpu-vfio-pci.o
> +nvgrace-gpu-vfio-pci-y := main.o
> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> new file mode 100644
> index ..6279af2bc6b8
> --- /dev/null
> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
> @@ -0,0 +1,856 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include 
> +
> +/*
> + * The device memory usable to the workloads running in the VM is cached
> + *

Re: [PATCH v17 3/3] vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper

2024-02-07 Thread Alex Williamson
   ret = -ENOMEM;
> > +   } else if (index == RESMEM_REGION_INDEX && !memregion->ioaddr) {
> > +   memregion->ioaddr = ioremap_wc(memregion->memphys,
> > +  memregion->memlength);
> > +   if (!memregion->ioaddr)
> > +   ret = -ENOMEM;
> > +   }
> > +   mutex_unlock(&nvdev->remap_lock);
> > +
> > +   return ret;
> > +}
> > +
> > +/*
> > + * Read the data from the device memory (mapped either through ioremap
> > + * or memremap) into the user buffer.
> > + */
> > +static int
> > +nvgrace_gpu_map_and_read(struct nvgrace_gpu_vfio_pci_core_device *nvdev,
> > +                        char __user *buf, size_t mem_count, loff_t *ppos)
> > +{
> > +   unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> > +   u64 offset = *ppos & VFIO_PCI_OFFSET_MASK;
> > +   int ret;
> > +
> > +   /*
> > +    * Handle read on the BAR regions. Map to the target device memory
> > +    * physical address and copy to the request read buffer.
> > +    */
> > +   ret = nvgrace_gpu_map_device_mem(index, nvdev);
> > +   if (ret)
> > +   return ret;
> > +  
> 
> Wouldn't it be better to do the map in the open path? 

AIUI the device would typically be used exclusively through the mmap of
these ranges, these mappings are only for pread/pwrite type accesses,
so I think it makes sense to map them on demand.
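
As a rough illustration of the map-on-demand approach, the sketch below creates a mapping the first time a read/write-style access needs it and reuses it afterwards, under a lock. It is a userspace model with assumed names: struct demo_dev and demo_map_device_mem() are hypothetical, calloc() stands in for memremap()/ioremap_wc(), and a pthread mutex stands in for the driver's remap_lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical device state; not the driver's real structure. */
struct demo_dev {
	pthread_mutex_t remap_lock;	/* serializes mapping creation */
	void *memaddr;			/* NULL until the first access */
	size_t memlength;
	unsigned int map_calls;		/* counts real mapping attempts */
};

/* Create the mapping on first use; later calls find it already present. */
static int demo_map_device_mem(struct demo_dev *dev)
{
	int ret = 0;

	pthread_mutex_lock(&dev->remap_lock);
	if (!dev->memaddr) {
		/* calloc() stands in for the kernel's remap call here. */
		dev->memaddr = calloc(1, dev->memlength);
		dev->map_calls++;
		if (!dev->memaddr)
			ret = -ENOMEM;
	}
	pthread_mutex_unlock(&dev->remap_lock);

	return ret;
}
```

A device that is only ever accessed through mmap never pays for this mapping, which is the trade-off described above.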

> > +   if (index == USEMEM_REGION_INDEX) {
> > +       if (copy_to_user(buf,
> > +                        (u8 *)nvdev->usemem.memaddr + offset,
> > +                        mem_count))
> > +           ret = -EFAULT;
> > +   } else {
> > +       /*
> > +        * The hardware ensures that the system does not crash when
> > +        * the device memory is accessed with the memory enable
> > +        * turned off. It synthesizes ~0 on such read. So there is
> > +        * no need to check or support the disablement/enablement of
> > +        * BAR through PCI_COMMAND config space register. Pass
> > +        * test_mem flag as false.
> > +        */
> > +       ret = vfio_pci_core_do_io_rw(&nvdev->core_device, false,
> > +                                    nvdev->resmem.ioaddr,
> > +                                    buf, offset, mem_count,
> > +                                    0, 0, false);
> > +   }
> > +
> > +   return ret;
> > +}
> > +
> > +/*
> > + * Read count bytes from the device memory at an offset. The actual device
> > + * memory size (available) may not be a power-of-2. So the driver fakes
> > + * the size to a power-of-2 (reported) when exposing to a user space driver.
> > + *
> > + * Reads extending beyond the reported size are truncated; reads starting
> > + * beyond the reported size generate -EINVAL; reads extending beyond the
> > + * actual device size is filled with ~0.
> > + */
> > +static ssize_t
> > +nvgrace_gpu_read_mem(struct nvgrace_gpu_vfio_pci_core_device *nvdev,
> > +char __user *buf, size_t count, loff_t *ppos)
> > +{
> > +   u64 offset = *ppos & VFIO_PCI_OFFSET_MASK;
> > +   unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> > +   struct mem_region *memregion;
> > +   size_t mem_count, i;
> > +   u8 val = 0xFF;
> > +   int ret;
> > +
> > +   memregion = nvgrace_gpu_memregion(index, nvdev);
> > +   if (!memregion)
> > +   return -EINVAL;
> > +
> > +   if (offset >= memregion->bar_size)
> > +   return -EINVAL;
> > +
> > +   /* Clip short the read request beyond reported BAR size */
> > +   count = min(count, memregion->bar_size - (size_t)offset);
> > +
> > +   /*
> > +    * Determine how many bytes to be actually read from the device memory.
> > +    * Read request beyond the actual device memory size is filled with ~0,
> > +    * while those beyond the actual reported size is skipped.
> > +    */
> > +   if (offset >= memregion->memlength)
> > +   mem_count = 0;  
> 
> If mem_count == 0, going through nvgrace_gpu_map_and_read() is not
> necessary.

Harmless, other than the possibly unnecessary call through to
nvgrace_gpu_map_device_mem().  Maybe both nvgrace_gpu_map_and_read()
and nvgrace_gpu_map_and_write() could conditionally return 0 as their
first operation 

Re: [PATCH] vfio: fix virtio-pci dependency

2024-01-10 Thread Alex Williamson
On Tue,  9 Jan 2024 08:57:19 +0100
Arnd Bergmann  wrote:

> From: Arnd Bergmann 
> 
> The new vfio-virtio driver already has a dependency on VIRTIO_PCI_ADMIN_LEGACY,
> but that is a bool symbol and allows vfio-virtio to be built-in even if
> virtio-pci itself is a loadable module. This leads to a link failure:
> 
> aarch64-linux-ld: drivers/vfio/pci/virtio/main.o: in function `virtiovf_pci_probe':
> main.c:(.text+0xec): undefined reference to `virtio_pci_admin_has_legacy_io'
> aarch64-linux-ld: drivers/vfio/pci/virtio/main.o: in function `virtiovf_pci_init_device':
> main.c:(.text+0x260): undefined reference to `virtio_pci_admin_legacy_io_notify_info'
> aarch64-linux-ld: drivers/vfio/pci/virtio/main.o: in function `virtiovf_pci_bar0_rw':
> main.c:(.text+0x6ec): undefined reference to `virtio_pci_admin_legacy_common_io_read'
> aarch64-linux-ld: main.c:(.text+0x6f4): undefined reference to `virtio_pci_admin_legacy_device_io_read'
> aarch64-linux-ld: main.c:(.text+0x7f0): undefined reference to `virtio_pci_admin_legacy_common_io_write'
> aarch64-linux-ld: main.c:(.text+0x7f8): undefined reference to `virtio_pci_admin_legacy_device_io_write'
> 
> Add another explicit dependency on the tristate symbol.
> 
> Fixes: eb61eca0e8c3 ("vfio/virtio: Introduce a vfio driver over virtio devices")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/vfio/pci/virtio/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> index fc3a0be9d8d4..bd80eca4a196 100644
> --- a/drivers/vfio/pci/virtio/Kconfig
> +++ b/drivers/vfio/pci/virtio/Kconfig
> @@ -1,7 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  config VIRTIO_VFIO_PCI
>  tristate "VFIO support for VIRTIO NET PCI devices"
> -depends on VIRTIO_PCI_ADMIN_LEGACY
> +depends on VIRTIO_PCI && VIRTIO_PCI_ADMIN_LEGACY
>  select VFIO_PCI_CORE
>  help
>This provides support for exposing VIRTIO NET VF devices which 
> support

Applied to vfio next branch for v6.8.  Thanks!

Alex
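
To make the bool-versus-tristate point concrete, the sketch below models how Kconfig evaluates dependencies: n < m < y, "&&" takes the minimum of its operands, and a symbol cannot be set higher than the value of its "depends on" expression. The enum and helper names (tri_and, tri_limit) are assumptions made for this illustration and are not kernel or Kconfig APIs.

```c
#include <assert.h>

/* Kconfig tristate values ordered as n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* "A && B" in a dependency expression evaluates to the lower value. */
static enum tristate tri_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

/* A symbol cannot be set higher than its "depends on" expression. */
static enum tristate tri_limit(enum tristate wanted, enum tristate deps)
{
	return wanted < deps ? wanted : deps;
}
```

With VIRTIO_PCI=m and the bool VIRTIO_PCI_ADMIN_LEGACY=y, depending on the bool alone leaves the limit at y, so the vfio driver can be built in while virtio-pci is a module. Adding VIRTIO_PCI to the expression lowers the limit to m.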




Re: [PATCH 35/40] drm/amd/amdgpu/amdgpu_cs: Repair some function naming disparity

2021-04-20 Thread Alex Deucher
Applied.  Thanks!

Alex

On Fri, Apr 16, 2021 at 11:54 AM Christian König
 wrote:
>
> Am 16.04.21 um 16:37 schrieb Lee Jones:
> > Fixes the following W=1 kernel build warning(s):
> >
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:685: warning: expecting prototype for cs_parser_fini(). Prototype was for amdgpu_cs_parser_fini() instead
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1502: warning: expecting prototype for amdgpu_cs_wait_all_fence(). Prototype was for amdgpu_cs_wait_all_fences() instead
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1656: warning: expecting prototype for amdgpu_cs_find_bo_va(). Prototype was for amdgpu_cs_find_mapping() instead
> >
> > Cc: Alex Deucher 
> > Cc: "Christian König" 
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Cc: Sumit Semwal 
> > Cc: Jerome Glisse 
> > Cc: amd-...@lists.freedesktop.org
> > Cc: dri-de...@lists.freedesktop.org
> > Cc: linux-me...@vger.kernel.org
> > Cc: linaro-mm-...@lists.linaro.org
> > Signed-off-by: Lee Jones 
>
> Reviewed-by: Christian König 
>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > index b5c7669980458..90136f9dedd65 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > @@ -672,7 +672,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
> >   }
> >
> >   /**
> > - * cs_parser_fini() - clean parser states
> > + * amdgpu_cs_parser_fini() - clean parser states
> >* @parser: parser structure holding parsing context.
> >* @error:  error number
> >* @backoff:indicator to backoff the reservation
> > @@ -1488,7 +1488,7 @@ int amdgpu_cs_fence_to_handle_ioctl(struct drm_device *dev, void *data,
> >   }
> >
> >   /**
> > - * amdgpu_cs_wait_all_fence - wait on all fences to signal
> > + * amdgpu_cs_wait_all_fences - wait on all fences to signal
> >*
> >* @adev: amdgpu device
> >* @filp: file private
> > @@ -1639,7 +1639,7 @@ int amdgpu_cs_wait_fences_ioctl(struct drm_device *dev, void *data,
> >   }
> >
> >   /**
> > - * amdgpu_cs_find_bo_va - find bo_va for VM address
> > + * amdgpu_cs_find_mapping - find bo_va for VM address
> >*
> >* @parser: command submission parser context
> >* @addr: VM address
>
> ___
> amd-gfx mailing list
> amd-...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
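
For reference, the W=1 warnings addressed by this series come from kernel-doc comments whose name does not match the function that follows, or that leave a parameter undescribed. Below is a minimal sketch of a well-formed kernel-doc header. The struct demo_ring type and the demo_ring_init() function are hypothetical and exist only to show the comment layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type used only for this illustration. */
struct demo_ring {
	int hw_prio;
	const int *sched_score;
};

/**
 * demo_ring_init() - initialize a ring descriptor
 * @ring: ring to initialize
 * @hw_prio: ring priority
 * @sched_score: optional score shared with other schedulers, may be NULL
 *
 * The name on the first line matches the function below, and every
 * parameter has its own @name line, so kernel-doc has nothing to warn
 * about.
 *
 * Return: 0 on success, -1 if @ring is NULL.
 */
static int demo_ring_init(struct demo_ring *ring, int hw_prio,
			  const int *sched_score)
{
	if (!ring)
		return -1;

	ring->hw_prio = hw_prio;
	ring->sched_score = sched_score;
	return 0;
}
```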


Re: [PATCH 33/40] drm/amd/amdgpu/amdgpu_ring: Provide description for 'sched_score'

2021-04-20 Thread Alex Deucher
Applied.  Thanks!

Alex

On Fri, Apr 16, 2021 at 11:54 AM Christian König
 wrote:
>
> Am 16.04.21 um 16:37 schrieb Lee Jones:
> > Fixes the following W=1 kernel build warning(s):
> >
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:169: warning: Function parameter or member 'sched_score' not described in 'amdgpu_ring_init'
> >
> > Cc: Alex Deucher 
> > Cc: "Christian König" 
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Cc: Sumit Semwal 
> > Cc: amd-...@lists.freedesktop.org
> > Cc: dri-de...@lists.freedesktop.org
> > Cc: linux-me...@vger.kernel.org
> > Cc: linaro-mm-...@lists.linaro.org
> > Signed-off-by: Lee Jones 
>
> Reviewed-by: Christian König 
>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > index 688624ebe4211..7b634a1517f9c 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > @@ -158,6 +158,7 @@ void amdgpu_ring_undo(struct amdgpu_ring *ring)
> >* @irq_src: interrupt source to use for this ring
> >* @irq_type: interrupt type to use for this ring
> >* @hw_prio: ring priority (NORMAL/HIGH)
> > + * @sched_score: optional score atomic shared with other schedulers
> >*
> >* Initialize the driver information for the selected ring (all asics).
> >* Returns 0 on success, error on failure.


Re: [PATCH 32/40] drm/amd/amdgpu/amdgpu_ttm: Fix incorrectly documented function 'amdgpu_ttm_copy_mem_to_mem()'

2021-04-20 Thread Alex Deucher
Applied.  Thanks!

Alex

On Fri, Apr 16, 2021 at 11:53 AM Christian König
 wrote:
>
> Am 16.04.21 um 16:37 schrieb Lee Jones:
> > Fixes the following W=1 kernel build warning(s):
> >
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:311: warning: expecting prototype for amdgpu_copy_ttm_mem_to_mem(). Prototype was for amdgpu_ttm_copy_mem_to_mem() instead
> >
> > Cc: Alex Deucher 
> > Cc: "Christian König" 
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Cc: Sumit Semwal 
> > Cc: Jerome Glisse 
> > Cc: amd-...@lists.freedesktop.org
> > Cc: dri-de...@lists.freedesktop.org
> > Cc: linux-me...@vger.kernel.org
> > Cc: linaro-mm-...@lists.linaro.org
> > Signed-off-by: Lee Jones 
>
> Reviewed-by: Christian König 
>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > index 3bef0432cac2f..859314c0d6a39 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > @@ -288,7 +288,7 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo,
> >   }
> >
> >   /**
> > - * amdgpu_copy_ttm_mem_to_mem - Helper function for copy
> > + * amdgpu_ttm_copy_mem_to_mem - Helper function for copy
> >* @adev: amdgpu device
> >* @src: buffer/address where to read from
> >* @dst: buffer/address where to write to
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 31/40] drm/amd/amdgpu/amdgpu_gart: Correct a couple of function names in the docs

2021-04-20 Thread Alex Deucher
Applied.  Thanks!

Alex

On Fri, Apr 16, 2021 at 11:53 AM Christian König
 wrote:
>
> Am 16.04.21 um 16:37 schrieb Lee Jones:
> > Fixes the following W=1 kernel build warning(s):
> >
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:73: warning: expecting prototype for amdgpu_dummy_page_init(). Prototype was for amdgpu_gart_dummy_page_init() instead
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:96: warning: expecting prototype for amdgpu_dummy_page_fini(). Prototype was for amdgpu_gart_dummy_page_fini() instead
> >
> > Cc: Alex Deucher 
> > Cc: "Christian König" 
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Cc: Nirmoy Das 
> > Cc: amd-...@lists.freedesktop.org
> > Cc: dri-de...@lists.freedesktop.org
> > Signed-off-by: Lee Jones 
>
> Reviewed-by: Christian König 
>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> > index c5a9a4fb10d2b..5562b5c90c032 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> > @@ -60,7 +60,7 @@
> >*/
> >
> >   /**
> > - * amdgpu_dummy_page_init - init dummy page used by the driver
> > + * amdgpu_gart_dummy_page_init - init dummy page used by the driver
> >*
> >* @adev: amdgpu_device pointer
> >*
> > @@ -86,7 +86,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device *adev)
> >   }
> >
> >   /**
> > - * amdgpu_dummy_page_fini - free dummy page used by the driver
> > + * amdgpu_gart_dummy_page_fini - free dummy page used by the driver
> >*
> >* @adev: amdgpu_device pointer
> >*


Re: [PATCH 29/40] drm/amd/amdgpu/amdgpu_fence: Provide description for 'sched_score'

2021-04-20 Thread Alex Deucher
Applied.  Thanks!

Alex

On Fri, Apr 16, 2021 at 11:52 AM Christian König
 wrote:
>
> Am 16.04.21 um 16:37 schrieb Lee Jones:
> > Fixes the following W=1 kernel build warning(s):
> >
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:444: warning: Function parameter or member 'sched_score' not described in 'amdgpu_fence_driver_init_ring'
> >
> > Cc: Alex Deucher 
> > Cc: "Christian König" 
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Cc: Sumit Semwal 
> > Cc: Jerome Glisse 
> > Cc: amd-...@lists.freedesktop.org
> > Cc: dri-de...@lists.freedesktop.org
> > Cc: linux-me...@vger.kernel.org
> > Cc: linaro-mm-...@lists.linaro.org
> > Signed-off-by: Lee Jones 
>
> Reviewed-by: Christian König 
>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > index 47ea468596184..30772608eac6c 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > @@ -434,6 +434,7 @@ int amdgpu_fence_driver_start_ring(struct amdgpu_ring *ring,
> >*
> >* @ring: ring to init the fence driver on
> >* @num_hw_submission: number of entries on the hardware queue
> > + * @sched_score: optional score atomic shared with other schedulers
> >*
> >* Init the fence driver for the requested ring (all asics).
> >* Helper function for amdgpu_fence_driver_init().


Re: [PATCH 25/40] drm/radeon/radeon_device: Provide function name in kernel-doc header

2021-04-20 Thread Alex Deucher
Applied.  Thanks!

Alex

On Fri, Apr 16, 2021 at 11:51 AM Christian König
 wrote:
>
> Am 16.04.21 um 16:37 schrieb Lee Jones:
> > Fixes the following W=1 kernel build warning(s):
> >
> >   drivers/gpu/drm/radeon/radeon_device.c:1101: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
> >
> > Cc: Alex Deucher 
> > Cc: "Christian König" 
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Cc: amd-...@lists.freedesktop.org
> > Cc: dri-de...@lists.freedesktop.org
> > Signed-off-by: Lee Jones 
>
> Reviewed-by: Christian König 
>
> > ---
> >   drivers/gpu/drm/radeon/radeon_device.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> > index cc445c4cba2e3..46eea01950cb1 100644
> > --- a/drivers/gpu/drm/radeon/radeon_device.c
> > +++ b/drivers/gpu/drm/radeon/radeon_device.c
> > @@ -1098,7 +1098,8 @@ static bool radeon_check_pot_argument(int arg)
> >   }
> >
> >   /**
> > - * Determine a sensible default GART size according to ASIC family.
> > + * radeon_gart_size_auto - Determine a sensible default GART size
> > + * according to ASIC family.
> >*
> >* @family: ASIC family name
> >*/


Re: [PATCH 26/40] drm/amd/amdgpu/amdgpu_device: Remove unused variable 'r'

2021-04-20 Thread Alex Deucher
Applied.  Thanks!

Alex

On Fri, Apr 16, 2021 at 10:38 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: In function ‘amdgpu_device_suspend’:
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3733:6: warning: variable ‘r’ set but not used [-Wunused-but-set-variable]
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Sumit Semwal 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-de...@lists.freedesktop.org
> Cc: linux-me...@vger.kernel.org
> Cc: linaro-mm-...@lists.linaro.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index b4ad1c055c702..eef54b265ffdd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3730,7 +3730,6 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
>  int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
>  {
> struct amdgpu_device *adev = drm_to_adev(dev);
> -   int r;
>
> if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
> return 0;
> @@ -3745,7 +3744,7 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
>
> amdgpu_ras_suspend(adev);
>
> -   r = amdgpu_device_ip_suspend_phase1(adev);
> +   amdgpu_device_ip_suspend_phase1(adev);
>
> if (!adev->in_s0ix)
> amdgpu_amdkfd_suspend(adev, adev->in_runpm);
> @@ -3755,7 +3754,7 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
>
> amdgpu_fence_driver_suspend(adev);
>
> -   r = amdgpu_device_ip_suspend_phase2(adev);
> +   amdgpu_device_ip_suspend_phase2(adev);
> /* evict remaining vram memory
>  * This second call to evict vram is to evict the gart page table
>  * using the CPU.
> --
> 2.27.0


Re: [RFC PATCH 2/3] vfio/hisilicon: register the driver to vfio

2021-04-20 Thread Alex Williamson
On Tue, 20 Apr 2021 09:59:57 -0300
Jason Gunthorpe  wrote:

> On Tue, Apr 20, 2021 at 08:50:12PM +0800, liulongfang wrote:
> > On 2021/4/19 20:33, Jason Gunthorpe wrote:  
> > > On Mon, Apr 19, 2021 at 08:24:40PM +0800, liulongfang wrote:
> > >   
> > >>> I'm also confused how this works securely at all, as a general rule a
> > >>> VFIO PCI driver cannot access the MMIO memory of the function it is
> > >>> planning to assign to the guest. There is a lot of danger that the
> > >>> guest could access that MMIO space one way or another.  
> > >>
> > >> VF's MMIO memory is divided into two parts, one is the guest part,
> > >> and the other is the live migration part. They do not affect each other,
> > >> so there is no security problem.  
> > > 
> > > AFAIK there are several scenarios where a guest can access this MMIO
> > > memory using DMA even if it is not mapped into the guest for CPU
> > > access.
> > >   
> > The hardware divides VF's MMIO memory into two parts. The live migration
> > driver in the host uses the live migration part, and the device driver in
> > the guest uses the guest part. They obtain the address of VF's MMIO memory
> > > in their respective drivers. Although the two parts are contiguous on
> > > the hardware device, the function of each driver means that neither
> > > operates on the other part of the memory, and the device hardware also
> > > responds independently to the operation commands of the two parts.  
> 
> It doesn't matter, the memory is still under the same PCI BDF and VFIO
> supports scenarios where devices in the same IOMMU group are not
> isolated from each other.
> 
> This is why the granule of isolation is a PCI BDF - VFIO directly
> blocks kernel drivers from attaching to PCI BDFs that are not
> completely isolated from VFIO BDF.
> 
> Bypassing this prevention and attaching a kernel driver directly to
> the same BDF being exposed to the guest breaks that isolation model.
> 
> > So, I still don't understand what security risk you are talking about,
> > or what you think the security design should look like.
> > Can you elaborate on it?  
> 
> Each security domain must have its own PCI BDF.
> 
> The migration control registers must be on a different VF from the VF
> being plugged into a guest and the two VFs have to be in different
> IOMMU groups to ensure they are isolated from each other.

I think that's a solution, I don't know if it's the only solution.
AIUI, the issue here is that we have a device specific kernel driver
extending vfio-pci with migration support for this device by using an
MMIO region of the same device.  This is susceptible to DMA
manipulation by the user device.   Whether that's a security issue or
not depends on how the user can break the device.  If the scope is
limited to breaking their own device, they can do that any number of
ways and it's not very interesting.  If the user can manipulate device
state in order to trigger an exploit of the host-side kernel driver,
that's obviously more of a problem.

The other side of this is that if migration support can be implemented
entirely within the VF using this portion of the device MMIO space, why
do we need the host kernel to support this rather than implementing it
in userspace?  For example, QEMU could know about this device,
manipulate the BAR size to expose only the operational portion of MMIO
to the VM and use the remainder to support migration itself.  I'm
afraid that just like mdev, the vfio migration uAPI is going to be used
as an excuse to create kernel drivers simply to be able to make use of
that uAPI.  I haven't looked at this driver to know if it has some
other reason to exist beyond what could be done through vfio-pci and
userspace migration support.  Thanks,

Alex



Re: [PATCH 23/40] drm/ttm/ttm_bo: Fix incorrectly documented function 'ttm_bo_cleanup_refs'

2021-04-20 Thread Alex Deucher
On Fri, Apr 16, 2021 at 11:32 AM Christian König
 wrote:
>
> Am 16.04.21 um 16:37 schrieb Lee Jones:
> > Fixes the following W=1 kernel build warning(s):
> >
> >   drivers/gpu/drm/ttm/ttm_bo.c:293: warning: expecting prototype for function ttm_bo_cleanup_refs(). Prototype was for ttm_bo_cleanup_refs() instead
> >
> > Cc: Christian Koenig 
> > Cc: Huang Rui 
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Cc: Sumit Semwal 
> > Cc: dri-de...@lists.freedesktop.org
> > Cc: linux-me...@vger.kernel.org
> > Cc: linaro-mm-...@lists.linaro.org
> > Signed-off-by: Lee Jones 
>
> Reviewed-by: Christian König 

Can you push the ttm and sched fixes to drm-misc?

Alex


>
> > ---
> >   drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > index cfd0b92923973..defec9487e1de 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > @@ -274,7 +274,7 @@ static void ttm_bo_flush_all_fences(struct 
> > ttm_buffer_object *bo)
> >   }
> >
> >   /**
> > - * function ttm_bo_cleanup_refs
> > + * ttm_bo_cleanup_refs
> >* If bo idle, remove from lru lists, and unref.
> >* If not idle, block if possible.
> >*
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v5 1/3] riscv: Move kernel mapping outside of linear mapping

2021-04-18 Thread Alex Ghiti

Hi Palmer,

Le 4/15/21 à 2:00 PM, Alex Ghiti a écrit :

Le 4/15/21 à 12:54 AM, Alex Ghiti a écrit :

Le 4/15/21 à 12:20 AM, Palmer Dabbelt a écrit :

On Sun, 11 Apr 2021 09:41:44 PDT (-0700), a...@ghiti.fr wrote:

This is a preparatory patch for relocatable kernel and sv48 support.

The kernel used to be linked at PAGE_OFFSET address therefore we could use
the linear mapping for the kernel mapping. But the relocated kernel base
address will be different from PAGE_OFFSET and since in the linear mapping,
two different virtual addresses cannot point to the same physical address,
the kernel mapping needs to lie outside the linear mapping so that we don't
have to copy it at the same physical offset.

The kernel mapping is moved to the last 2GB of the address space, BPF
is now always after the kernel and modules use the 2GB memory range right
before the kernel, so BPF and modules regions do not overlap. KASLR
implementation will simply have to move the kernel in the last 2GB range
and just take care of leaving enough space for BPF.

In addition, by moving the kernel to the end of the address space, both
sv39 and sv48 kernels will be exactly the same without needing to be
relocated at runtime.

Suggested-by: Arnd Bergmann 
Signed-off-by: Alexandre Ghiti 
---
 arch/riscv/boot/loader.lds.S    |  3 +-
 arch/riscv/include/asm/page.h  | 17 +-
 arch/riscv/include/asm/pgtable.h    | 37 
 arch/riscv/include/asm/set_memory.h |  1 +
 arch/riscv/kernel/head.S    |  3 +-
 arch/riscv/kernel/module.c  |  6 +-
 arch/riscv/kernel/setup.c   |  5 ++
 arch/riscv/kernel/vmlinux.lds.S | 3 +-
 arch/riscv/mm/fault.c   | 13 +
 arch/riscv/mm/init.c    | 87 ++---
 arch/riscv/mm/kasan_init.c  |  9 +++
 arch/riscv/mm/physaddr.c    |  2 +-
 12 files changed, 146 insertions(+), 40 deletions(-)

diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
index 47a5003c2e28..62d94696a19c 100644
--- a/arch/riscv/boot/loader.lds.S
+++ b/arch/riscv/boot/loader.lds.S
@@ -1,13 +1,14 @@
 /* SPDX-License-Identifier: GPL-2.0 */

 #include 
+#include 

 OUTPUT_ARCH(riscv)
 ENTRY(_start)

 SECTIONS
 {
-    . = PAGE_OFFSET;
+    . = KERNEL_LINK_ADDR;

 .payload : {
 *(.payload)
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index adc9d26f3d75..22cfb2be60dc 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -90,15 +90,28 @@ typedef struct page *pgtable_t;

 #ifdef CONFIG_MMU
 extern unsigned long va_pa_offset;
+extern unsigned long va_kernel_pa_offset;
 extern unsigned long pfn_base;
 #define ARCH_PFN_OFFSET   (pfn_base)
 #else
 #define va_pa_offset    0
+#define va_kernel_pa_offset    0
 #define ARCH_PFN_OFFSET   (PAGE_OFFSET >> PAGE_SHIFT)
 #endif /* CONFIG_MMU */

-#define __pa_to_va_nodebug(x)    ((void *)((unsigned long) (x) + va_pa_offset))
-#define __va_to_pa_nodebug(x)    ((unsigned long)(x) - va_pa_offset)
+extern unsigned long kernel_virt_addr;
+
+#define linear_mapping_pa_to_va(x)    ((void *)((unsigned long)(x) + va_pa_offset))
+#define kernel_mapping_pa_to_va(x)    ((void *)((unsigned long)(x) + va_kernel_pa_offset))
+#define __pa_to_va_nodebug(x)    linear_mapping_pa_to_va(x)
+
+#define linear_mapping_va_to_pa(x)    ((unsigned long)(x) - va_pa_offset)
+#define kernel_mapping_va_to_pa(x)    ((unsigned long)(x) - va_kernel_pa_offset)
+#define __va_to_pa_nodebug(x)    ({    \
+    unsigned long _x = x;    \
+    (_x < kernel_virt_addr) ?    \
+    linear_mapping_va_to_pa(_x) : kernel_mapping_va_to_pa(_x);    \
+    })

 #ifdef CONFIG_DEBUG_VIRTUAL
 extern phys_addr_t __virt_to_phys(unsigned long x);
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index ebf817c1bdf4..80e63a93e903 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -11,23 +11,30 @@

 #include 

-#ifndef __ASSEMBLY__
-
-/* Page Upper Directory not used in RISC-V */
-#include 
-#include 
-#include 
-#include 
+#ifndef CONFIG_MMU
+#define KERNEL_LINK_ADDR    PAGE_OFFSET
+#else

-#ifdef CONFIG_MMU
+#define ADDRESS_SPACE_END    (UL(-1))
+/*
+ * Leave 2GB for kernel and BPF at the end of the address space
+ */
+#define KERNEL_LINK_ADDR    (ADDRESS_SPACE_END - SZ_2G + 1)

 #define VMALLOC_SIZE (KERN_VIRT_SIZE >>1)
 #define VMALLOC_END  (PAGE_OFFSET - 1)
 #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)

+/* KASLR should leave at least 128MB for BPF after the kernel */
 #define BPF_JIT_REGION_SIZE    (SZ_128M)
-#define BPF_JIT_REGION_START    (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
-#define BPF_JIT_REGION_END    (VMALLOC_END)
+#define BPF_JIT_REGION_START    PFN_ALIGN((unsigned long)&_end)
+#define BPF_JIT_REGION_END    (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)
+
+/* Modules

Re: [PATCH] riscv: Protect kernel linear mapping only if CONFIG_STRICT_KERNEL_RWX is set

2021-04-17 Thread Alex Ghiti

Le 4/16/21 à 12:33 PM, Palmer Dabbelt a écrit :

On Fri, 16 Apr 2021 03:47:19 PDT (-0700), a...@ghiti.fr wrote:

Hi Anup,

Le 4/16/21 à 6:41 AM, Anup Patel a écrit :

On Thu, Apr 15, 2021 at 4:34 PM Alexandre Ghiti  wrote:


If CONFIG_STRICT_KERNEL_RWX is not set, we cannot set different permissions
to the kernel data and text sections, so make sure it is defined before
trying to protect the kernel linear mapping.

Signed-off-by: Alexandre Ghiti 


Maybe you should add "Fixes:" tag in commit tag ?


Yes you're right I should have done that. Maybe Palmer will squash it as
it just entered for-next?


Ya, I'll do it.  My testing box was just tied up last night for the rc8 
PR, so I threw this on for-next to get the buildbots to take a look. 
It's a bit too late to take something for this week, as I try to be 
pretty conservative this late in the cycle.  There's another kprobes fix 
on the list so if we end up with an rc8 I might send this along with 
that, otherwise this'll just go onto for-next before the linear map 
changes that exercise the bug.


You're more than welcome to just dig up the fixes tag and reply, my 
scripts pull all tags from replies (just like Reviewed-by).  Otherwise 
I'll do it myself, most people don't really post Fixes tags that 
accurately so I go through it for pretty much everything anyway.


Here it is:

Fixes: 4b67f48da707 ("riscv: Move kernel mapping outside of linear mapping")

Thanks,



Thanks for sorting this out so quickly!





Otherwise it looks good.

Reviewed-by: Anup Patel 


Thank you!

Alex



Regards,
Anup


---
  arch/riscv/kernel/setup.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 626003bb5fca..ab394d173cd4 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -264,12 +264,12 @@ void __init setup_arch(char **cmdline_p)

 sbi_init();

-   if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))
+   if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX)) {
 protect_kernel_text_data();
-
-#if defined(CONFIG_64BIT) && defined(CONFIG_MMU)
-   protect_kernel_linear_mapping_text_rodata();
+#ifdef CONFIG_64BIT
+   protect_kernel_linear_mapping_text_rodata();
  #endif
+   }

  #ifdef CONFIG_SWIOTLB
 swiotlb_init(1);
--
2.20.1



___
linux-riscv mailing list
linux-ri...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv





Re: [PATCH v4 1/3] riscv: Move kernel mapping outside of linear mapping

2021-04-17 Thread Alex Ghiti

Hi Guenter,

Le 4/16/21 à 2:51 PM, Guenter Roeck a écrit :

On Fri, Apr 09, 2021 at 02:14:58AM -0400, Alexandre Ghiti wrote:

This is a preparatory patch for relocatable kernel and sv48 support.

The kernel used to be linked at PAGE_OFFSET address therefore we could use
the linear mapping for the kernel mapping. But the relocated kernel base
address will be different from PAGE_OFFSET and since in the linear mapping,
two different virtual addresses cannot point to the same physical address,
the kernel mapping needs to lie outside the linear mapping so that we don't
have to copy it at the same physical offset.

The kernel mapping is moved to the last 2GB of the address space, BPF
is now always after the kernel and modules use the 2GB memory range right
before the kernel, so BPF and modules regions do not overlap. KASLR
implementation will simply have to move the kernel in the last 2GB range
and just take care of leaving enough space for BPF.

In addition, by moving the kernel to the end of the address space, both
sv39 and sv48 kernels will be exactly the same without needing to be
relocated at runtime.

Suggested-by: Arnd Bergmann 
Signed-off-by: Alexandre Ghiti 


In next-20210416, when booting a riscv32 image in qemu, this patch results in:

[0.00] Linux version 5.12.0-rc7-next-20210416 (groeck@desktop) (riscv32-linux-gcc (GCC) 10.3.0, GNU ld (GNU Binutils) 2.36.1) #1 SMP Fri Apr 16 10:38:09 PDT 2021
[0.00] OF: fdt: Ignoring memory block 0x8000 - 0xa000
[0.00] Machine model: riscv-virtio,qemu
[0.00] earlycon: uart8250 at MMIO 0x1000 (options '115200')
[0.00] printk: bootconsole [uart8250] enabled
[0.00] efi: UEFI not found.
[0.00] Kernel panic - not syncing: init_resources: Failed to allocate 160 bytes
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 5.12.0-rc7-next-20210416 #1
[0.00] Hardware name: riscv-virtio,qemu (DT)
[0.00] Call Trace:
[0.00] [<80005292>] walk_stackframe+0x0/0xce
[0.00] [<809f4db8>] dump_backtrace+0x38/0x46
[0.00] [<809f4dd4>] show_stack+0xe/0x16
[0.00] [<809ff1d0>] dump_stack+0x92/0xc6
[0.00] [<809f4fee>] panic+0x10a/0x2d8
[0.00] [<80c02b24>] setup_arch+0x2a0/0x4ea
[0.00] [<80c006b0>] start_kernel+0x90/0x628
[0.00] ---[ end Kernel panic - not syncing: init_resources: Failed to allocate 160 bytes ]---

Reverting it fixes the problem. I understand that the version in -next is
different to this version of the patch, but I also tried v4 and it still
crashes with the same error message.



I completely neglected 32b kernel in this series, I fixed that here:

Thank you for testing and reporting,

Alex


Guenter

___
linux-riscv mailing list
linux-ri...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
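
The address layout described in the commit message above can be illustrated with a small user-space sketch for a 64-bit address space. It is not the kernel code: PAGE_OFFSET_VA, RAM_BASE and LOAD_PA are hypothetical values chosen only for illustration, the two offsets are derived from them under that assumption, and KERNEL_LINK_ADDR stands in for kernel_virt_addr (that is, no relocation). Only the KERNEL_LINK_ADDR formula and the "below the kernel mapping means linear mapping" test are taken from the quoted patch.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2G             (UINT64_C(1) << 31)
#define ADDRESS_SPACE_END UINT64_MAX
/* Same formula as the patch: first byte of the last 2GB of the address space. */
#define KERNEL_LINK_ADDR  (ADDRESS_SPACE_END - SZ_2G + 1)

/* Hypothetical values, for illustration only. */
#define PAGE_OFFSET_VA UINT64_C(0xffffffe000000000) /* start of the linear mapping */
#define RAM_BASE       UINT64_C(0x80000000)         /* first physical address of RAM */
#define LOAD_PA        UINT64_C(0x80200000)         /* where the kernel image was loaded */

/* Assumed offsets: one per mapping, as in the patch's two helper macros. */
#define VA_PA_OFFSET        (PAGE_OFFSET_VA - RAM_BASE)
#define VA_KERNEL_PA_OFFSET (KERNEL_LINK_ADDR - LOAD_PA)

/*
 * Mirrors the shape of __va_to_pa_nodebug() in the patch: addresses below
 * the kernel mapping are translated with the linear-mapping offset, and
 * addresses inside it with the kernel-mapping offset.
 */
static uint64_t va_to_pa(uint64_t va)
{
    assert(va >= PAGE_OFFSET_VA); /* outside both mappings: not handled here */
    if (va < KERNEL_LINK_ADDR)
        return va - VA_PA_OFFSET;
    return va - VA_KERNEL_PA_OFFSET;
}
```

With these values, KERNEL_LINK_ADDR evaluates to 0xffffffff80000000, and both the kernel-mapping address KERNEL_LINK_ADDR and its linear-mapping alias PAGE_OFFSET_VA + (LOAD_PA - RAM_BASE) translate to LOAD_PA. Two virtual addresses can reach the same physical page only because they sit in different mappings with different offsets.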



Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-16 Thread Alex Williamson
On Fri, 16 Apr 2021 06:12:58 -0700
Jacob Pan  wrote:

> Hi Jason,
> 
> On Thu, 15 Apr 2021 20:07:32 -0300, Jason Gunthorpe  wrote:
> 
> > On Thu, Apr 15, 2021 at 03:11:19PM +0200, Auger Eric wrote:  
> > > Hi Jason,
> > > 
> > > On 4/1/21 6:03 PM, Jason Gunthorpe wrote:
> > > > On Thu, Apr 01, 2021 at 02:08:17PM +, Liu, Yi L wrote:
> > > > 
> > > >> DMA page faults are delivered to root-complex via page request
> > > >> message and it is per-device according to PCIe spec. Page request
> > > >> handling flow is:
> > > >>
> > > >> 1) iommu driver receives a page request from device
> > > >> 2) iommu driver parses the page request message. Get the RID,PASID,
> > > >> faulted page and requested permissions etc.
> > > >> 3) iommu driver triggers fault handler registered by device driver
> > > >> with iommu_report_device_fault()
> > > > 
> > > > This seems confused.
> > > > 
> > > > The PASID should define how to handle the page fault, not the driver.
> > > >
> > > 
> > > In my series I don't use PASID at all. I am just enabling nested stage
> > > and the guest uses a single context. I don't allocate any user PASID at
> > > any point.
> > > 
> > > When there is a fault at physical level (a stage 1 fault that concerns
> > > the guest), this latter needs to be reported and injected into the
> > > guest. The vfio pci driver registers a fault handler to the iommu layer
> > > > and in that fault handler it fills a circ buffer and triggers an eventfd
> > > > that is listened to by the VFIO-PCI QEMU device. This latter retrieves
> > > > the fault from the mmapped circ buffer, it knows which vIOMMU it is
> > > > attached to, and passes the fault to the vIOMMU.
> > > > Then the vIOMMU triggers an IRQ in the guest.
> > > 
> > > We are reusing the existing concepts from VFIO, region, IRQ to do that.
> > > 
> > > For that use case, would you also use /dev/ioasid?
> > 
> > /dev/ioasid could do all the things you described vfio-pci as doing,
> > it can even do them the same way you just described.
> > 
> > Stated another way, do you plan to duplicate all of this code someday
> > for vfio-cxl? What about for vfio-platform? ARM SMMU can be hooked to
> > platform devices, right?
> > 
> > I feel what you guys are struggling with is some choice in the iommu
> > kernel APIs that cause the events to be delivered to the pci_device
> > owner, not the PASID owner.
> > 
> > That feels solvable.
> >   
> Perhaps more of a philosophical question for you and Alex. There is no
> doubt that the direction you guided for /dev/ioasid is a much cleaner one,
> especially after VDPA emerged as another IOMMU backed framework.

I think this statement answers all your remaining questions ;)

> The question is what do we do with the nested translation features that have
> been targeting the existing VFIO-IOMMU for the last three years? That
> predates VDPA. Shall we put a stop marker *after* nested support and say no
> more extensions for VFIO-IOMMU, new features must be built on this new
> interface?
>
> If we were to close a checkout line for some unforeseen reasons, should we
> honor the customers already in line for a long time?
> 
> This is not a tactic or excuse for not working on the new /dev/ioasid
> interface. In fact, I believe we can benefit from the lessons learned while
> completing the existing. This will give confidence to the new
> interface. Thoughts?

I understand a big part of Jason's argument is that we shouldn't be in
the habit of creating duplicate interfaces, we should create one
well-designed interface to share among multiple subsystems.  As new users
have emerged, our solution needs to change to a common one rather than
a VFIO specific one.  The IOMMU uAPI provides an abstraction, but at
the wrong level, requiring userspace interfaces for each subsystem.

Luckily the IOMMU uAPI is not really exposed as an actual uAPI, but
that changes if we proceed to enable the interfaces to tunnel it
through VFIO.

The logical answer would therefore be that we don't make that
commitment to the IOMMU uAPI if we believe now that it's fundamentally
flawed.

Ideally this new /dev/ioasid interface, and making use of it as a VFIO
IOMMU backend, should replace type1.  Type1 will live on until that
interface gets to parity, at which point we may deprecate type1, but it
wouldn't make sense to continue to expand type1 in the same direction
as we intend /dev/ioasid to take over in the meantime, especially if it
means maintaining an otherwise dead uAPI.  Thanks,

Alex
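
For readers unfamiliar with the fault-reporting path quoted earlier in this message (a handler fills a circular buffer, then signals user space, which drains it), here is a minimal user-space sketch of a single-producer, single-consumer ring. Everything in it is hypothetical: struct fault_record, struct fault_ring, ring_push() and ring_pop() are illustrative names, not the layout or API of the series under discussion, and the eventfd notification is reduced to a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8u /* power of two, so an index wraps with a simple mask */

struct fault_record {          /* hypothetical record layout */
    uint64_t addr;             /* faulting address */
    uint32_t flags;            /* requested permissions, etc. */
};

struct fault_ring {
    _Atomic uint32_t head;     /* count of records written by the producer */
    _Atomic uint32_t tail;     /* count of records consumed by the reader */
    struct fault_record slots[RING_SLOTS];
};

/* Producer side (the fault handler in the description). False if full. */
static bool ring_push(struct fault_ring *r, struct fault_record rec)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SLOTS)
        return false;
    r->slots[head & (RING_SLOTS - 1)] = rec;
    /* Publish the record; a real implementation would now signal an eventfd. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side (user space in the description). False if empty. */
static bool ring_pop(struct fault_ring *r, struct fault_record *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;
    *out = r->slots[tail & (RING_SLOTS - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

The head and tail counters are free-running and only masked when indexing, so "full" and "empty" are distinguishable without sacrificing a slot. Nothing in the structure depends on which subsystem owns the device, which is the property the thread argues a common interface should exploit.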



[PATCH net-next 2/2] net: ipa: optionally define firmware name via DT

2021-04-16 Thread Alex Elder
IPA initialization includes loading some firmware.  This step is
done either by the modem or by the AP under Trust Zone.  If the
AP loads firmware, the name of the firmware file is currently
hard-coded ("ipa_fws.mdt").

Add the ability to specify the relative path of the firmware file to
use in a property in the Device Tree IPA node.  If the property is
not found (or if any other error occurs attempting to get it), fall
back to using a default relative path.

Use the "old" fixed name as the default.  Rename the symbol that
represents this default to emphasize its purpose.

Signed-off-by: Alex Elder 
---
 drivers/net/ipa/ipa_main.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c
index aad915e2ce523..9915603ed10ba 100644
--- a/drivers/net/ipa/ipa_main.c
+++ b/drivers/net/ipa/ipa_main.c
@@ -67,7 +67,7 @@
  */
 
 /* The name of the GSI firmware file relative to /lib/firmware */
-#define IPA_FWS_PATH   "ipa_fws.mdt"
+#define IPA_FW_PATH_DEFAULT    "ipa_fws.mdt"
 #define IPA_PAS_ID 15
 
 /* Shift of 19.2 MHz timestamp to achieve lower resolution timestamps */
@@ -517,6 +517,7 @@ static int ipa_firmware_load(struct device *dev)
struct device_node *node;
struct resource res;
phys_addr_t phys;
+   const char *path;
ssize_t size;
void *virt;
int ret;
@@ -534,9 +535,17 @@ static int ipa_firmware_load(struct device *dev)
return ret;
}
 
-   ret = request_firmware(&fw, IPA_FWS_PATH, dev);
+   /* Use name from DTB if specified; use default for *any* error */
+   ret = of_property_read_string(dev->of_node, "firmware-name", &path);
if (ret) {
-   dev_err(dev, "error %d requesting \"%s\"\n", ret, IPA_FWS_PATH);
+   dev_dbg(dev, "error %d getting \"firmware-name\" resource\n",
+   ret);
+   path = IPA_FW_PATH_DEFAULT;
+   }
+
+   ret = request_firmware(&fw, path, dev);
+   if (ret) {
+   dev_err(dev, "error %d requesting \"%s\"\n", ret, path);
return ret;
}
 
@@ -549,13 +558,11 @@ static int ipa_firmware_load(struct device *dev)
goto out_release_firmware;
}
 
-   ret = qcom_mdt_load(dev, fw, IPA_FWS_PATH, IPA_PAS_ID,
-   virt, phys, size, NULL);
+   ret = qcom_mdt_load(dev, fw, path, IPA_PAS_ID, virt, phys, size, NULL);
if (ret)
-   dev_err(dev, "error %d loading \"%s\"\n", ret, IPA_FWS_PATH);
+   dev_err(dev, "error %d loading \"%s\"\n", ret, path);
else if ((ret = qcom_scm_pas_auth_and_reset(IPA_PAS_ID)))
-   dev_err(dev, "error %d authenticating \"%s\"\n", ret,
-   IPA_FWS_PATH);
+   dev_err(dev, "error %d authenticating \"%s\"\n", ret, path);
 
memunmap(virt);
 out_release_firmware:
-- 
2.27.0
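
The "use the name from the DTB if specified, use the default for any error" logic in the patch above can be tried outside the kernel with a small sketch. The property table and read_string_prop() below are hypothetical stand-ins for a device tree node and its string lookup; they only mimic the convention of returning 0 on success and a negative errno on failure, and are not the kernel's OF API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define FW_PATH_DEFAULT "ipa_fws.mdt"

/* Hypothetical stand-in for a device tree node: a flat list of string properties. */
struct prop {
    const char *name;
    const char *value; /* NULL models a property that has no string value */
};

/* Hypothetical lookup: 0 on success, negative errno on any failure. */
static int read_string_prop(const struct prop *props, size_t n,
                            const char *name, const char **out)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(props[i].name, name) != 0)
            continue;
        if (!props[i].value)
            return -EINVAL;  /* present, but not usable as a string */
        *out = props[i].value;
        return 0;
    }
    return -ENOENT;          /* property not present */
}

/* Same policy as the patch: every error, whatever its cause, selects the default. */
static const char *firmware_path(const struct prop *props, size_t n)
{
    const char *path;

    if (read_string_prop(props, n, "firmware-name", &path))
        path = FW_PATH_DEFAULT;
    return path;
}
```

Treating all errors alike keeps existing device trees working unchanged: a node without the property behaves exactly as it did before the patch, and a malformed property degrades to the old fixed name instead of failing the probe.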



[PATCH net-next 1/2] dt-bindings: net: qcom,ipa: add firmware-name property

2021-04-16 Thread Alex Elder
Add a new optional firmware-name property to the IPA DT node.  It
is used only if the modem is not doing early initialization (i.e.,
if the modem-init property is not present).  Its value is the name
of the firmware file to use; if it's not specified, a default name
("ipa_fws.mdt") is used.

Signed-off-by: Alex Elder 
---
 .../devicetree/bindings/net/qcom,ipa.yaml | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/qcom,ipa.yaml 
b/Documentation/devicetree/bindings/net/qcom,ipa.yaml
index da5212e693e91..7443490d4cc6d 100644
--- a/Documentation/devicetree/bindings/net/qcom,ipa.yaml
+++ b/Documentation/devicetree/bindings/net/qcom,ipa.yaml
@@ -125,6 +125,14 @@ properties:
   the firmware passed to Trust Zone for authentication.  Required
   when Trust Zone (not the modem) performs early initialization.
 
+  firmware-name:
+$ref: /schemas/types.yaml#/definitions/string
+description:
+  If present, name (or relative path) of the file within the
+  firmware search path containing the firmware image used when
+  initializing IPA hardware.  Optional, and only used when
+  Trust Zone performs early initialization.
+
 required:
   - compatible
   - iommus
@@ -134,12 +142,23 @@ required:
   - interconnects
   - qcom,smem-states
 
+# Either modem-init is present, or memory-region must be present.
 oneOf:
   - required:
   - modem-init
   - required:
   - memory-region
 
+# If memory-region is present, firmware-name may optionally be present.
+# But if modem-init is present, firmware-name must not be present.
+if:
+  required:
+- modem-init
+then:
+  not:
+required:
+  - firmware-name
+
 additionalProperties: false
 
 examples:
-- 
2.27.0



[PATCH net-next 0/2] net: ipa: allow different firmware names

2021-04-16 Thread Alex Elder
Add the ability to define a "firmware-name" property in the IPA DT
node, specifying an alternate name to use for the firmware file.
Used only if the AP (Trust Zone) does early IPA initialization.

        -Alex

Alex Elder (2):
  dt-bindings: net: qcom,ipa: add firmware-name property
  net: ipa: optionally define firmware name via DT

 .../devicetree/bindings/net/qcom,ipa.yaml | 19 +++
 drivers/net/ipa/ipa_main.c| 23 ---
 2 files changed, 34 insertions(+), 8 deletions(-)

-- 
2.27.0



Re: [PATCH] drivers: ipa: Fix missing IRQF_ONESHOT as only threaded handler

2021-04-16 Thread Alex Elder

On 4/15/21 10:40 PM, zhuguangqin...@gmail.com wrote:

From: Guangqing Zhu 


This is not required here.  -Alex

https://lore.kernel.org/netdev/d57e0a43-4d87-93cf-471c-c8185ea85...@ieee.org/


Coccinelle noticed:
drivers/net/ipa/ipa_smp2p.c:186:7-27: ERROR: Threaded IRQ with no primary
handler requested without IRQF_ONESHOT

Signed-off-by: Guangqing Zhu 
---
  drivers/net/ipa/ipa_smp2p.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ipa/ipa_smp2p.c b/drivers/net/ipa/ipa_smp2p.c
index a5f7a79a1923..74e04427a711 100644
--- a/drivers/net/ipa/ipa_smp2p.c
+++ b/drivers/net/ipa/ipa_smp2p.c
@@ -183,7 +183,8 @@ static int ipa_smp2p_irq_init(struct ipa_smp2p *smp2p, const char *name,
}
irq = ret;
  
-	ret = request_threaded_irq(irq, NULL, handler, 0, name, smp2p);

+   ret = request_threaded_irq(irq, NULL, handler, IRQF_ONESHOT,
+  name, smp2p);
if (ret) {
dev_err(dev, "error %d requesting \"%s\" IRQ\n", ret, name);
return ret;





Re: [PATCH] riscv: Protect kernel linear mapping only if CONFIG_STRICT_KERNEL_RWX is set

2021-04-16 Thread Alex Ghiti

Hi Anup,

Le 4/16/21 à 6:41 AM, Anup Patel a écrit :

On Thu, Apr 15, 2021 at 4:34 PM Alexandre Ghiti  wrote:


If CONFIG_STRICT_KERNEL_RWX is not set, we cannot set different permissions
to the kernel data and text sections, so make sure it is defined before
trying to protect the kernel linear mapping.

Signed-off-by: Alexandre Ghiti 


Maybe you should add "Fixes:" tag in commit tag ?


Yes you're right I should have done that. Maybe Palmer will squash it as 
it just entered for-next?




Otherwise it looks good.

Reviewed-by: Anup Patel 


Thank you!

Alex



Regards,
Anup


---
  arch/riscv/kernel/setup.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 626003bb5fca..ab394d173cd4 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -264,12 +264,12 @@ void __init setup_arch(char **cmdline_p)

 sbi_init();

-   if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))
+   if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX)) {
 protect_kernel_text_data();
-
-#if defined(CONFIG_64BIT) && defined(CONFIG_MMU)
-   protect_kernel_linear_mapping_text_rodata();
+#ifdef CONFIG_64BIT
+   protect_kernel_linear_mapping_text_rodata();
  #endif
+   }

  #ifdef CONFIG_SWIOTLB
 swiotlb_init(1);
--
2.20.1



___
linux-riscv mailing list
linux-ri...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv



Re: linux-next: build warning after merge of the amdgpu tree

2021-04-15 Thread Alex Deucher
On Fri, Apr 16, 2021 at 1:47 AM Liang, Prike  wrote:
>
> [AMD Public Use]
>
> > From: Stephen Rothwell 
> > Sent: Friday, April 16, 2021 12:09 PM
> > To: Liang, Prike 
> > Cc: Alex Deucher ; S-k, Shyam-sundar  > sundar@amd.com>; Linux Kernel Mailing List  > ker...@vger.kernel.org>; Linux Next Mailing List 
> > 
> > Subject: Re: linux-next: build warning after merge of the amdgpu tree
> >
> > Hi,
> >
> > On Fri, 16 Apr 2021 03:12:12 + "Liang, Prike" 
> > wrote:
> > >
> > > Hi, Rothwell
> >
> > (Stephen, actually :-))
> >
> > > This fix hasn't been finalized yet; it is still being discussed and
> > > updated on the NVMe mailing list.
> > > I will update the patch once it has been refined.
> >
> > In which case, this patch should not be in linux-next (or any branch that is
> > included by linux-next).
> >
> How about reverting the patch temporarily? Once the solution is finalized,
> the final version of the patch will land.

I'll drop it for now.  I just have it in there temporarily while it
makes its way upstream because a lot of people use this branch and it
fixes an important bug.

Alex


Re: [Regression] amdgpu driver broken on AMD HD7770 GHz edition.

2021-04-15 Thread Alex Deucher
On Fri, Apr 16, 2021 at 12:48 AM David Niklas  wrote:
>
> Hey,
>
> I forgot to give you a bug tracker in case you want one.
> Here: https://bugzilla.kernel.org/show_bug.cgi?id=212691

I've followed up on the bug report.  Please take a look there.

Alex

>
> Thanks,
> David
> ___
> amd-gfx mailing list
> amd-...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [External] : Re: [PATCH v14 4/6] locking/qspinlock: Introduce starvation avoidance into CNA

2021-04-15 Thread Alex Kogan



> On Apr 13, 2021, at 8:03 AM, Peter Zijlstra  wrote:
> 
> On Thu, Apr 01, 2021 at 11:31:54AM -0400, Alex Kogan wrote:
> 
>> @@ -49,13 +55,33 @@ struct cna_node {
>>  u16 real_numa_node;
>>  u32 encoded_tail;   /* self */
>>  u32 partial_order;  /* enum val */
>> +s32 start_time;
>> };
> 
>> +/*
>> + * Controls the threshold time in ms (default = 10) for intra-node lock
>> + * hand-offs before the NUMA-aware variant of spinlock is forced to be
>> + * passed to a thread on another NUMA node. The default setting can be
>> + * changed with the "numa_spinlock_threshold" boot option.
>> + */
>> +#define MSECS_TO_JIFFIES(m) \
>> +(((m) + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ))
>> +static int intra_node_handoff_threshold __ro_after_init = MSECS_TO_JIFFIES(10);
>> +
>> +static inline bool intra_node_threshold_reached(struct cna_node *cn)
>> +{
>> +s32 current_time = (s32)jiffies;
>> +s32 threshold = cn->start_time + intra_node_handoff_threshold;
>> +
>> +return current_time - threshold > 0;
>> +}
> 
> None of this makes any sense:
> 
> - why do you track time elapsed as a signed entity?
> - why are you using jiffies; that's terrible granularity.
Good points. I will address that (see below). I will just mention that 
those suggestions came from senior folks on this mailing list,
and it seemed prudent to take their counsel. 

> 
> As Andi already said, 10ms is silly large. You've just inflated the
> lock-acquire time for every contended lock to stupid land just because
> NUMA.
I just ran a few quick tests — local_clock() (a wrapper around sched_clock()) 
works well, so I will switch to using that.

I also took a few numbers with different thresholds. Looks like we can drop 
the threshold to 1ms with a minor penalty to performance. However, 
pushing the threshold to 100us has a more significant cost. Here are
the numbers for reference:

will-it-scale/lock2_threads:
threshold:                10ms    1ms     100us
speedup at 142 threads:   2.184   1.974   1.1418

will-it-scale/open1_threads:
threshold:                10ms    1ms     100us
speedup at 142 threads:   2.146   1.974   1.291

Would you be more comfortable with setting the default at 1ms?

> And this also brings me to the whole premise of this series; *why* are
> we optimizing this? What locks are so contended that this actually helps
> and shouldn't you be spending your time breaking those locks? That would
> improve throughput more than this ever can.

I think for the same reason the kernel switched from ticket locks to queue locks
several years back. There always will be applications with contended locks. 
Sometimes the workarounds are easy, but many times they are not, like with 
legacy applications or when the workload is skewed (e.g., every client tries to
update the metadata of the same file protected by the same lock). The results
show that for those cases we leave > 2x performance on the table. Those are not
only our numbers — LKP reports show similar or even better results, 
on a wide range of benchmarks, e.g.:
https://lists.01.org/hyperkitty/list/l...@lists.01.org/thread/HGVOCYDEE5KTLYPTAFBD2RXDQOCDPFUJ/
https://lists.01.org/hyperkitty/list/l...@lists.01.org/thread/OUPS7MZ3GJA2XYWM52GMU7H7EI25IT37/
https://lists.01.org/hyperkitty/list/l...@lists.01.org/thread/DNMEQPXJRQY2IKHZ3ERGRY6TUPWDTFUN/

Regards,
— Alex



Re: [RFC PATCH 0/3] vfio/hisilicon: add acc live migration driver

2021-04-15 Thread Alex Williamson
[Cc+ NVIDIA folks both from migration and vfio-pci-core discussion]

On Tue, 13 Apr 2021 11:36:20 +0800
Longfang Liu  wrote:

> The live migration solution relies on the vfio_device_migration_info protocol.
> The structure vfio_device_migration_info is placed at the 0th offset of
> the VFIO_REGION_SUBTYPE_MIGRATION region to get and set VFIO device related
> migration information. Field accesses from this structure are only supported
> at their native width and alignment. Otherwise, the result is undefined and
> vendor drivers should return an error.
> 
> (1).The driver framework is based on vfio_pci_register_dev_region() of vfio-pci,
> and then a new live migration region is added, and the live migration is
> realized through the ops of this region.
> 
> (2).In order to ensure the compatibility of the devices before and after the
> migration, the device compatibility information check will be performed in
> the Pre-copy stage. If the check fails, an error will be returned and the
> source VM will exit the migration function.
> 
> (3).After the compatibility check is passed, it will enter the Stop-and-copy
> stage. At this time, all the live migration data will be copied, and then
> saved to the VF device of the destination, and then the VF device of the
> destination will be started and the VM of the source will be exited.
> 
> Longfang Liu (3):
>   vfio/hisilicon: add acc live migration driver
>   vfio/hisilicon: register the driver to vfio
>   vfio/hisilicom: add debugfs for driver
> 
>  drivers/vfio/pci/Kconfig  |8 +
>  drivers/vfio/pci/Makefile |1 +
>  drivers/vfio/pci/hisilicon/acc_vf_migration.c | 1337 +
>  drivers/vfio/pci/hisilicon/acc_vf_migration.h |  170 
>  drivers/vfio/pci/vfio_pci.c   |   11 +
>  drivers/vfio/pci/vfio_pci_private.h   |9 +
>  6 files changed, 1536 insertions(+)
>  create mode 100644 drivers/vfio/pci/hisilicon/acc_vf_migration.c
>  create mode 100644 drivers/vfio/pci/hisilicon/acc_vf_migration.h
> 



Re: [PATCH 3/3] vfio/iommu_type1: Add support for manual dirty log clear

2021-04-15 Thread Alex Williamson
On Tue, 13 Apr 2021 17:14:45 +0800
Keqian Zhu  wrote:

> From: Kunkun Jiang 
> 
> In the past, we clear dirty log immediately after sync dirty
> log to userspace. This may cause redundant dirty handling if
> userspace handles dirty log iteratively:
> 
> After vfio clears dirty log, new dirty log starts to generate.
> These new dirty log will be reported to userspace even if they
> are generated before userspace handles the same dirty page.
> 
> That's to say, we should minimize the time gap of dirty log
> clearing and dirty log handling. We can give userspace the
> interface to clear dirty log.

IIUC, a user would be expected to clear the bitmap before copying the
dirty pages, therefore you're trying to reduce that time gap between
clearing and copy, but it cannot be fully eliminated and importantly,
if the user clears after copying, they've introduced a race.  Correct?
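
To make the ordering concern concrete, here is a toy single-page model (plain userspace C, not the VFIO API; every name in it is made up for illustration). A device write that lands between the two steps is either re-reported, when the dirty bit is cleared first, or lost, when it is cleared after the copy.

```c
#include <stdbool.h>

struct toy_page {
    int  src;    /* page contents on the source */
    int  dst;    /* contents last copied to the destination */
    bool dirty;  /* dirty bit tracked for the page */
};

/* Simulated device DMA write: changes the page and marks it dirty. */
static void toy_device_write(struct toy_page *p, int value)
{
    p->src = value;
    p->dirty = true;
}

/* Clear the dirty bit first, then copy; a write races in between. */
static void toy_clear_then_copy(struct toy_page *p, int racing_value)
{
    p->dirty = false;
    toy_device_write(p, racing_value);  /* lands after the clear */
    p->dst = p->src;
}

/* Copy first, then clear the dirty bit; a write races in between. */
static void toy_copy_then_clear(struct toy_page *p, int racing_value)
{
    p->dst = p->src;
    toy_device_write(p, racing_value);  /* lands after the copy */
    p->dirty = false;
}

/* True if the destination is stale and nothing will trigger a recopy. */
static bool toy_update_lost(const struct toy_page *p)
{
    return p->dst != p->src && !p->dirty;
}
```

In the clear-then-copy case the page is still marked dirty afterwards, so the worst outcome is a redundant recopy. In the copy-then-clear case the destination holds the old contents and the dirty bit is gone.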

What results do you have to show that this is a worthwhile optimization?

I really don't like the semantics that testing for an IOMMU capability
enables it.  It needs to be an explicitly controllable feature, which
suggests to me that it might be a flag used in combination with _GET or
a separate _GET_NOCLEAR operation.  Thanks,

Alex


> Co-developed-by: Keqian Zhu 
> Signed-off-by: Kunkun Jiang 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 100 ++--
>  include/uapi/linux/vfio.h   |  28 -
>  2 files changed, 123 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 77950e47f56f..d9c4a27b3c4e 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -78,6 +78,7 @@ struct vfio_iommu {
>   bool    v2;
>   bool    nesting;
>   bool    dirty_page_tracking;
> + bool    dirty_log_manual_clear;
>   bool    pinned_page_dirty_scope;
>   bool    container_open;
>  };
> @@ -1242,6 +1243,78 @@ static int vfio_iommu_dirty_log_sync(struct vfio_iommu *iommu,
>   return ret;
>  }
>  
> +static int vfio_iova_dirty_log_clear(u64 __user *bitmap,
> +  struct vfio_iommu *iommu,
> +  dma_addr_t iova, size_t size,
> +  size_t pgsize)
> +{
> + struct vfio_dma *dma;
> + struct rb_node *n;
> + dma_addr_t start_iova, end_iova, riova;
> + unsigned long pgshift = __ffs(pgsize);
> + unsigned long bitmap_size;
> + unsigned long *bitmap_buffer = NULL;
> + bool clear_valid;
> + int rs, re, start, end, dma_offset;
> + int ret = 0;
> +
> + bitmap_size = DIRTY_BITMAP_BYTES(size >> pgshift);
> + bitmap_buffer = kvmalloc(bitmap_size, GFP_KERNEL);
> + if (!bitmap_buffer) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + if (copy_from_user(bitmap_buffer, bitmap, bitmap_size)) {
> + ret = -EFAULT;
> + goto out;
> + }
> +
> + for (n = rb_first(&iommu->dma_list); n; n = rb_next(n)) {
> + dma = rb_entry(n, struct vfio_dma, node);
> + if (!dma->iommu_mapped)
> + continue;
> + if ((dma->iova + dma->size - 1) < iova)
> + continue;
> + if (dma->iova > iova + size - 1)
> + break;
> +
> + start_iova = max(iova, dma->iova);
> + end_iova = min(iova + size, dma->iova + dma->size);
> +
> + /* Similar logic as the tail of vfio_iova_dirty_bitmap */
> +
> + clear_valid = false;
> + start = (start_iova - iova) >> pgshift;
> + end = (end_iova - iova) >> pgshift;
> + bitmap_for_each_set_region(bitmap_buffer, rs, re, start, end) {
> + clear_valid = true;
> + riova = iova + (rs << pgshift);
> + dma_offset = (riova - dma->iova) >> pgshift;
> + bitmap_clear(dma->bitmap, dma_offset, re - rs);
> + }
> +
> + if (clear_valid)
> + vfio_dma_populate_bitmap(dma, pgsize);
> +
> + if (clear_valid && !iommu->pinned_page_dirty_scope &&
> + dma->iommu_mapped && !iommu->num_non_hwdbm_groups) {
> + ret = vfio_iommu_dirty_log_clear(iommu, start_iova,
> + end_iova - start_iova,  bitmap_buffer,
> + iova, pgshift);
> + if (ret

Re: linux-next: manual merge of the vfio tree with the drm tree

2021-04-15 Thread Alex Williamson
On Thu, 15 Apr 2021 10:08:55 -0300
Jason Gunthorpe  wrote:

> On Thu, Apr 15, 2021 at 04:47:34PM +1000, Stephen Rothwell wrote:
> > Hi all,
> > 
> > Today's linux-next merge of the vfio tree got a conflict in:
> > 
> >   drivers/gpu/drm/i915/gvt/gvt.c
> > 
> > between commit:
> > 
> >   9ff06c385300 ("drm/i915/gvt: Remove references to struct drm_device.pdev")
> > 
> > from the drm tree and commit:
> > 
> >   383987fd15ba ("vfio/gvt: Use mdev_get_type_group_id()")
> > 
> > from the vfio tree.
> > 
> > I fixed it up (I used the latter version) and can carry the fix as
> > necessary.  
> 
> Yes that is right, thank you

Yep, thanks!

Alex



Re: QCA6174 pcie wifi: Add pci quirks

2021-04-15 Thread Alex Williamson
[cc +Pali]

On Thu, 15 Apr 2021 20:02:23 +0200
Ingmar Klein  wrote:

> First thanks to you both, Alex and Bjorn!
> I am in no way an expert on this topic, so I have to fully rely on your
> feedback, concerning this issue.
> 
> If you should have any other solution approach, in form of patch-set, I
> would be glad to test it out. Just let me know, what you think might
> make sense.
> I will wait for your further feedback on the issue. In the meantime I
> have my current workaround via quirk entry.
> 
> By the way, my layman's question:
> Do you think, that the following topic might also apply for the QCA6174?
> https://www.spinics.net/lists/linux-pci/msg106395.html
> Or in other words, should a similar approach be tried for the QCA6174
> and if yes, would it bring any benefit at all?
> I hope you can excuse me, in case the questions should not make too much
> sense.

If you run lspci -vvv on your device, what do LnkCap and LnkSta report
under the express capability?  I wonder if your device even supports
speeds above Gen1; mine does not.
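
For anyone decoding those fields by hand: the low four bits of both the Link Capabilities and Link Status registers encode a link speed, where in practice 1 means 2.5 GT/s (Gen1), 2 means 5 GT/s (Gen2), and so on. A minimal sketch of that decoding follows. The mask values should match PCI_EXP_LNKCAP_SLS and PCI_EXP_LNKSTA_CLS in the kernel's pci_regs.h as far as I recall; the helper names are made up, and the raw register values would have to be read from config space.

```c
#include <stdbool.h>
#include <stdint.h>

#define LNKCAP_MAX_SPEED_MASK 0x0000000fu  /* Link Capabilities, bits 3:0 */
#define LNKSTA_CUR_SPEED_MASK 0x000fu      /* Link Status, bits 3:0 */

/* PCIe generation encoded in the max-speed field (1 = Gen1, 2.5 GT/s). */
static unsigned int lnkcap_max_gen(uint32_t lnkcap)
{
    return lnkcap & LNKCAP_MAX_SPEED_MASK;
}

/* PCIe generation the link is currently running at. */
static unsigned int lnksta_cur_gen(uint16_t lnksta)
{
    return lnksta & LNKSTA_CUR_SPEED_MASK;
}

/* True if the device advertises anything faster than Gen1. */
static bool supports_above_gen1(uint32_t lnkcap)
{
    return lnkcap_max_gen(lnkcap) > 1;
}
```

A device whose max-speed field reads 1 can never retrain to a higher speed, so a fix aimed at retraining would not apply to it.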

I would not expect that patch to be relevant to you based on your
report.  I understand it to resolve an issue during link retraining to a
higher speed on boot, not during a bus reset.  Pali can correct if I'm
wrong.  Thanks,

Alex

> Am 15.04.2021 um 04:36 schrieb Alex Williamson:
> > On Wed, 14 Apr 2021 16:03:50 -0500
> > Bjorn Helgaas  wrote:
> >  
> >> [+cc Alex]
> >>
> >> On Fri, Apr 09, 2021 at 11:26:33AM +0200, Ingmar Klein wrote:  
> >>> Edit: Retry, as I did not consider, that my mail-client would make this
> >>> partly HTML.
> >>>
> >>> Dear maintainers,
> >>> I recently encountered an issue on my Proxmox server system, that
> >>> includes a Qualcomm QCA6174 m.2 PCIe wifi module.
> >>> https://deviwiki.com/wiki/AIRETOS_AFX-QCA6174-NX
> >>>
> >>> On system boot and subsequent virtual machine start (with passed-through
> >>> QCA6174), the VM would just freeze/hang, at the point where the ath10k
> >>> driver loads.
> >>> Quick search in the proxmox related topics, brought me to the following
> >>> discussion, which suggested a PCI quirk entry for the QCA6174 in the kernel:
> >>> https://forum.proxmox.com/threads/pcie-passthrough-freezes-proxmox.27513/
> >>>
> >>> I then went ahead, got the Proxmox kernel source (v5.4.106) and applied
> >>> the attached patch.
> >>> Effect was as hoped, that the VM hangs are now gone. System boots and
> >>> runs as intended.
> >>>
> >>> Judging by the existing quirk entries for Atheros, I would think, that
> >>> my proposed "fix" could be included in the vanilla kernel.
> >>> As far as I saw, there is no entry yet, even in the latest kernel sources.
> >> This would need a signed-off-by; see
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v5.11#n361
> >>
> >> This is an old issue, and likely we'll end up just applying this as
> >> yet another quirk.  But looking at c3e59ee4e766 ("PCI: Mark Atheros
> >> AR93xx to avoid bus reset"), where it started, it seems to be
> >> connected to 425c1b223dac ("PCI: Add Virtual Channel to save/restore
> >> support").
> >>
> >> I'd like to dig into that a bit more to see if there are any clues.
> >> AFAIK Linux itself still doesn't use VC at all, and 425c1b223dac added
> >> a fair bit of code.  I wonder if we're restoring something out of
> >> order or making some simple mistake in the way to restore VC config.  
> > I don't really have any faith in that bisect report in commit
> > c3e59ee4e766.  To double check I dug out the card from that commit,
> > installed an old Fedora release so I could build kernel v3.13,
> > pre-dating 425c1b223dac and tested triggering a bus reset both via
> > setpci and by masking PM reset so that sysfs can trigger the bus reset
> > path with the kernel save/restore code.  Both result in the system
> > hanging when the device is accessed either restoring from the kernel
> > bus reset or reading from the device after the setpci reset.  Thanks,
> >
> > Alex
> >  
> 



Re: [PATCH v5 1/3] riscv: Move kernel mapping outside of linear mapping

2021-04-15 Thread Alex Ghiti

On 4/15/21 at 12:54 AM, Alex Ghiti wrote:

On 4/15/21 at 12:20 AM, Palmer Dabbelt wrote:

On Sun, 11 Apr 2021 09:41:44 PDT (-0700), a...@ghiti.fr wrote:

This is a preparatory patch for relocatable kernel and sv48 support.

The kernel used to be linked at PAGE_OFFSET address therefore we could use
the linear mapping for the kernel mapping. But the relocated kernel base
address will be different from PAGE_OFFSET and since in the linear mapping,
two different virtual addresses cannot point to the same physical address,
the kernel mapping needs to lie outside the linear mapping so that we don't
have to copy it at the same physical offset.

The kernel mapping is moved to the last 2GB of the address space, BPF
is now always after the kernel and modules use the 2GB memory range right
before the kernel, so BPF and modules regions do not overlap. KASLR
implementation will simply have to move the kernel in the last 2GB range
and just take care of leaving enough space for BPF.

In addition, by moving the kernel to the end of the address space, both
sv39 and sv48 kernels will be exactly the same without needing to be
relocated at runtime.
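
As a worked example of the layout described above (a standalone sketch, not part of the patch): on a 64-bit system, KERNEL_LINK_ADDR as defined in the pgtable.h hunk works out to 0xffffffff80000000, and a virtual address is translated with one of two offsets depending on which side of the kernel base it falls. The PAGE_OFFSET, RAM base and kernel load address below are made-up values chosen only for illustration.

```c
#include <stdint.h>

#define SZ_2G             UINT64_C(0x80000000)
#define ADDRESS_SPACE_END UINT64_MAX
/* Leave 2GB for kernel and BPF at the end of the address space. */
#define KERNEL_LINK_ADDR  (ADDRESS_SPACE_END - SZ_2G + 1)

/* Hypothetical example values, not taken from the patch. */
#define EX_PAGE_OFFSET    UINT64_C(0xffffffe000000000)
#define EX_RAM_BASE       UINT64_C(0x80000000)
#define EX_KERNEL_LOAD_PA UINT64_C(0x80200000)

struct va_layout {
    uint64_t kernel_virt_addr;    /* start of the kernel mapping */
    uint64_t va_pa_offset;        /* linear mapping: va - pa */
    uint64_t va_kernel_pa_offset; /* kernel mapping: va - pa */
};

static struct va_layout example_layout(void)
{
    struct va_layout l = {
        .kernel_virt_addr    = KERNEL_LINK_ADDR,
        .va_pa_offset        = EX_PAGE_OFFSET - EX_RAM_BASE,
        .va_kernel_pa_offset = KERNEL_LINK_ADDR - EX_KERNEL_LOAD_PA,
    };
    return l;
}

/* Same selection as the __va_to_pa_nodebug() macro in the page.h hunk. */
static uint64_t va_to_pa(const struct va_layout *l, uint64_t va)
{
    return va < l->kernel_virt_addr ? va - l->va_pa_offset
                                    : va - l->va_kernel_pa_offset;
}

/* Linear-mapping translation, as in linear_mapping_pa_to_va(). */
static uint64_t linear_pa_to_va(const struct va_layout *l, uint64_t pa)
{
    return pa + l->va_pa_offset;
}
```

In this model the kernel image is reachable through two virtual addresses, its kernel-mapping address and its linear-mapping alias, and both translate back to the same physical address.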

Suggested-by: Arnd Bergmann 
Signed-off-by: Alexandre Ghiti 
---
 arch/riscv/boot/loader.lds.S    |  3 +-
 arch/riscv/include/asm/page.h   | 17 +-
 arch/riscv/include/asm/pgtable.h    | 37 
 arch/riscv/include/asm/set_memory.h |  1 +
 arch/riscv/kernel/head.S    |  3 +-
 arch/riscv/kernel/module.c  |  6 +-
 arch/riscv/kernel/setup.c   |  5 ++
 arch/riscv/kernel/vmlinux.lds.S |  3 +-
 arch/riscv/mm/fault.c   | 13 +
 arch/riscv/mm/init.c    | 87 ++---
 arch/riscv/mm/kasan_init.c  |  9 +++
 arch/riscv/mm/physaddr.c    |  2 +-
 12 files changed, 146 insertions(+), 40 deletions(-)

diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
index 47a5003c2e28..62d94696a19c 100644
--- a/arch/riscv/boot/loader.lds.S
+++ b/arch/riscv/boot/loader.lds.S
@@ -1,13 +1,14 @@
 /* SPDX-License-Identifier: GPL-2.0 */

 #include 
+#include 

 OUTPUT_ARCH(riscv)
 ENTRY(_start)

 SECTIONS
 {
-    . = PAGE_OFFSET;
+    . = KERNEL_LINK_ADDR;

 .payload : {
 *(.payload)
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index adc9d26f3d75..22cfb2be60dc 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -90,15 +90,28 @@ typedef struct page *pgtable_t;

 #ifdef CONFIG_MMU
 extern unsigned long va_pa_offset;
+extern unsigned long va_kernel_pa_offset;
 extern unsigned long pfn_base;
 #define ARCH_PFN_OFFSET    (pfn_base)
 #else
 #define va_pa_offset    0
+#define va_kernel_pa_offset    0
 #define ARCH_PFN_OFFSET    (PAGE_OFFSET >> PAGE_SHIFT)
 #endif /* CONFIG_MMU */

-#define __pa_to_va_nodebug(x)    ((void *)((unsigned long) (x) + va_pa_offset))
-#define __va_to_pa_nodebug(x)    ((unsigned long)(x) - va_pa_offset)
+extern unsigned long kernel_virt_addr;
+
+#define linear_mapping_pa_to_va(x)    ((void *)((unsigned long)(x) + va_pa_offset))
+#define kernel_mapping_pa_to_va(x)    ((void *)((unsigned long)(x) + va_kernel_pa_offset))
+#define __pa_to_va_nodebug(x)    linear_mapping_pa_to_va(x)
+
+#define linear_mapping_va_to_pa(x)    ((unsigned long)(x) - va_pa_offset)
+#define kernel_mapping_va_to_pa(x)    ((unsigned long)(x) - va_kernel_pa_offset)
+#define __va_to_pa_nodebug(x)    ({    \
+    unsigned long _x = x;    \
+    (_x < kernel_virt_addr) ?    \
+    linear_mapping_va_to_pa(_x) : kernel_mapping_va_to_pa(_x);    \
+    })

 #ifdef CONFIG_DEBUG_VIRTUAL
 extern phys_addr_t __virt_to_phys(unsigned long x);
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index ebf817c1bdf4..80e63a93e903 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -11,23 +11,30 @@

 #include 

-#ifndef __ASSEMBLY__
-
-/* Page Upper Directory not used in RISC-V */
-#include 
-#include 
-#include 
-#include 
+#ifndef CONFIG_MMU
+#define KERNEL_LINK_ADDR    PAGE_OFFSET
+#else

-#ifdef CONFIG_MMU
+#define ADDRESS_SPACE_END    (UL(-1))
+/*
+ * Leave 2GB for kernel and BPF at the end of the address space
+ */
+#define KERNEL_LINK_ADDR    (ADDRESS_SPACE_END - SZ_2G + 1)

 #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
 #define VMALLOC_END  (PAGE_OFFSET - 1)
 #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)

+/* KASLR should leave at least 128MB for BPF after the kernel */
 #define BPF_JIT_REGION_SIZE    (SZ_128M)
-#define BPF_JIT_REGION_START    (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
-#define BPF_JIT_REGION_END    (VMALLOC_END)
+#define BPF_JIT_REGION_START    PFN_ALIGN((unsigned long)&_end)
+#define BPF_JIT_REGION_END    (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)
+
+/* Modules always live before the kernel */
+#ifdef CONFIG_64BIT
+#d

Re: [PATCH] drm/radeon/si: Fix inconsistent indenting

2021-04-15 Thread Alex Deucher
Applied.  Thanks!

Alex

On Thu, Apr 15, 2021 at 5:30 AM Yang Li  wrote:
>
> Kernel test robot throws below warning ->
>
> smatch warnings:
> drivers/gpu/drm/radeon/si.c:4514 si_vm_packet3_cp_dma_check() warn:
> inconsistent indenting
>
> Fixed the inconsistent indenting.
>
> Reported-by: Abaci Robot 
> Signed-off-by: Yang Li 
> ---
>  drivers/gpu/drm/radeon/si.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/radeon/si.c b/drivers/gpu/drm/radeon/si.c
> index 88731b79..d0e94b1 100644
> --- a/drivers/gpu/drm/radeon/si.c
> +++ b/drivers/gpu/drm/radeon/si.c
> @@ -4511,7 +4511,7 @@ static int si_vm_packet3_cp_dma_check(u32 *ib, u32 idx)
> } else {
> for (i = 0; i < (command & 0x1f); i++) {
> reg = start_reg + (4 * i);
> -   if (!si_vm_reg_valid(reg)) {
> +   if (!si_vm_reg_valid(reg)) {
> DRM_ERROR("CP DMA Bad DST register\n");
> return -EINVAL;
> }
> --
> 1.8.3.1
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v5 1/3] riscv: Move kernel mapping outside of linear mapping

2021-04-14 Thread Alex Ghiti
ly_va = (void *)DTB_EARLY_BASE_VA + (dtb_pa & (PGDIR_SIZE - 1));

 #else /* CONFIG_BUILTIN_DTB */
-    dtb_early_va = __va(dtb_pa);
+    dtb_early_va = kernel_mapping_pa_to_va(dtb_pa);
 #endif /* CONFIG_BUILTIN_DTB */
 #endif
 dtb_early_pa = dtb_pa;
@@ -492,6 +522,22 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 #endif
 }

+#ifdef CONFIG_64BIT
+void protect_kernel_linear_mapping_text_rodata(void)
+{
+    unsigned long text_start = (unsigned long)lm_alias(_start);
+    unsigned long init_text_start = (unsigned long)lm_alias(__init_text_begin);
+    unsigned long rodata_start = (unsigned long)lm_alias(__start_rodata);
+    unsigned long data_start = (unsigned long)lm_alias(_data);
+
+    set_memory_ro(text_start, (init_text_start - text_start) >> PAGE_SHIFT);
+    set_memory_nx(text_start, (init_text_start - text_start) >> PAGE_SHIFT);
+
+    set_memory_ro(rodata_start, (data_start - rodata_start) >> PAGE_SHIFT);
+    set_memory_nx(rodata_start, (data_start - rodata_start) >> PAGE_SHIFT);
+}
+#endif
+
 static void __init setup_vm_final(void)
 {
 uintptr_t va, map_size;
@@ -513,7 +559,7 @@ static void __init setup_vm_final(void)
    __pa_symbol(fixmap_pgd_next),
    PGDIR_SIZE, PAGE_TABLE);

-    /* Map all memory banks */
+    /* Map all memory banks in the linear mapping */
     for_each_mem_range(i, &start, &end) {
 if (start >= end)
 break;
@@ -525,10 +571,13 @@ static void __init setup_vm_final(void)
 for (pa = start; pa < end; pa += map_size) {
 va = (uintptr_t)__va(pa);
 create_pgd_mapping(swapper_pg_dir, va, pa,
-   map_size, PAGE_KERNEL_EXEC);
+   map_size, PAGE_KERNEL);
 }
 }

+    /* Map the kernel */
+    create_kernel_page_table(swapper_pg_dir, PMD_SIZE);
+
 /* Clear fixmap PTE and PMD mappings */
 clear_fixmap(FIX_PTE);
 clear_fixmap(FIX_PMD);
diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index 2c39f0386673..28f4d52cf17e 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -171,6 +171,10 @@ void __init kasan_init(void)
 phys_addr_t _start, _end;
 u64 i;

+    /*
+ * Populate all kernel virtual address space with kasan_early_shadow_page
+ * except for the linear mapping and the modules/kernel/BPF mapping.
+ */
 kasan_populate_early_shadow((void *)KASAN_SHADOW_START,
 (void *)kasan_mem_to_shadow((void *)
 VMEMMAP_END));
@@ -183,6 +187,7 @@ void __init kasan_init(void)
 (void *)kasan_mem_to_shadow((void *)VMALLOC_START),
 (void *)kasan_mem_to_shadow((void *)VMALLOC_END));

+    /* Populate the linear mapping */
 for_each_mem_range(i, &_start, &_end) {
 void *start = (void *)__va(_start);
 void *end = (void *)__va(_end);
@@ -193,6 +198,10 @@ void __init kasan_init(void)
     kasan_populate(kasan_mem_to_shadow(start), kasan_mem_to_shadow(end));
 };

+    /* Populate kernel, BPF, modules mapping */
+    kasan_populate(kasan_mem_to_shadow((const void *)MODULES_VADDR),
+   kasan_mem_to_shadow((const void *)BPF_JIT_REGION_END));
+
 for (i = 0; i < PTRS_PER_PTE; i++)
     set_pte(&kasan_early_shadow_pte[i],
 mk_pte(virt_to_page(kasan_early_shadow_page),
diff --git a/arch/riscv/mm/physaddr.c b/arch/riscv/mm/physaddr.c
index e8e4dcd39fed..35703d5ef5fd 100644
--- a/arch/riscv/mm/physaddr.c
+++ b/arch/riscv/mm/physaddr.c
@@ -23,7 +23,7 @@ EXPORT_SYMBOL(__virt_to_phys);

 phys_addr_t __phys_addr_symbol(unsigned long x)
 {
-    unsigned long kernel_start = (unsigned long)PAGE_OFFSET;
+    unsigned long kernel_start = (unsigned long)kernel_virt_addr;
 unsigned long kernel_end = (unsigned long)_end;

 /*


This is breaking boot for me with CONFIG_STRICT_KERNEL_RWX=n.  I'm not 
even really convinced that's a useful config to support, but it's 
currently optional and I'd prefer to avoid breaking it if possible.


I can't quite figure out what's going on here and I'm pretty much tired 
out for tonight.  LMK if you don't have time to look at it and I'll try 
to give it another shot.


I'm taking a look at that.

Thanks,

Alex


Re: QCA6174 pcie wifi: Add pci quirks

2021-04-14 Thread Alex Williamson
On Wed, 14 Apr 2021 16:03:50 -0500
Bjorn Helgaas  wrote:

> [+cc Alex]
> 
> On Fri, Apr 09, 2021 at 11:26:33AM +0200, Ingmar Klein wrote:
> > Edit: Retry, as I did not consider, that my mail-client would make this
> > partly HTML.
> > 
> > Dear maintainers,
> > I recently encountered an issue on my Proxmox server system, that
> > includes a Qualcomm QCA6174 m.2 PCIe wifi module.
> > https://deviwiki.com/wiki/AIRETOS_AFX-QCA6174-NX
> > 
> > On system boot and subsequent virtual machine start (with passed-through
> > QCA6174), the VM would just freeze/hang, at the point where the ath10k
> > driver loads.
> > Quick search in the proxmox related topics, brought me to the following
> > discussion, which suggested a PCI quirk entry for the QCA6174 in the kernel:
> > https://forum.proxmox.com/threads/pcie-passthrough-freezes-proxmox.27513/
> > 
> > I then went ahead, got the Proxmox kernel source (v5.4.106) and applied
> > the attached patch.
> > Effect was as hoped, that the VM hangs are now gone. System boots and
> > runs as intended.
> > 
> > Judging by the existing quirk entries for Atheros, I would think, that
> > my proposed "fix" could be included in the vanilla kernel.
> > As far as I saw, there is no entry yet, even in the latest kernel sources.  
> 
> This would need a signed-off-by; see
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v5.11#n361
> 
> This is an old issue, and likely we'll end up just applying this as
> yet another quirk.  But looking at c3e59ee4e766 ("PCI: Mark Atheros
> AR93xx to avoid bus reset"), where it started, it seems to be
> connected to 425c1b223dac ("PCI: Add Virtual Channel to save/restore
> support").
> 
> I'd like to dig into that a bit more to see if there are any clues.
> AFAIK Linux itself still doesn't use VC at all, and 425c1b223dac added
> a fair bit of code.  I wonder if we're restoring something out of
> order or making some simple mistake in the way to restore VC config.

I don't really have any faith in that bisect report in commit
c3e59ee4e766.  To double check I dug out the card from that commit,
installed an old Fedora release so I could build kernel v3.13,
pre-dating 425c1b223dac and tested triggering a bus reset both via
setpci and by masking PM reset so that sysfs can trigger the bus reset
path with the kernel save/restore code.  Both result in the system
hanging when the device is accessed either restoring from the kernel
bus reset or reading from the device after the setpci reset.  Thanks,

Alex



Re: [External] : Re: [PATCH v14 6/6] locking/qspinlock: Introduce the shuffle reduction optimization into CNA

2021-04-14 Thread Alex Kogan
Hi, Andreas.

Thanks for the great questions.

> On Apr 14, 2021, at 3:47 AM, Andreas Herrmann  wrote:
> 
> On Thu, Apr 01, 2021 at 11:31:56AM -0400, Alex Kogan wrote:
>> This performance optimization chooses probabilistically to avoid moving
>> threads from the main queue into the secondary one when the secondary queue
>> is empty.
>> 
>> It is helpful when the lock is only lightly contended. In particular, it
>> makes CNA less eager to create a secondary queue, but does not introduce
>> any extra delays for threads waiting in that queue once it is created.
>> 
>> Signed-off-by: Alex Kogan 
>> Reviewed-by: Steve Sistare 
>> Reviewed-by: Waiman Long 
>> ---
>> kernel/locking/qspinlock_cna.h | 39 ++
>> 1 file changed, 39 insertions(+)
>> 
>> diff --git a/kernel/locking/qspinlock_cna.h b/kernel/locking/qspinlock_cna.h
>> index 29c3abbd3d94..983c6a47a221 100644
>> --- a/kernel/locking/qspinlock_cna.h
>> +++ b/kernel/locking/qspinlock_cna.h
>> @@ -5,6 +5,7 @@
>> 
>> #include 
>> #include 
>> +#include 
>> 
>> /*
>>  * Implement a NUMA-aware version of MCS (aka CNA, or compact NUMA-aware lock).
>> @@ -86,6 +87,34 @@ static inline bool intra_node_threshold_reached(struct cna_node *cn)
>>  return current_time - threshold > 0;
>> }
>> 
>> +/*
>> + * Controls the probability for enabling the ordering of the main queue
>> + * when the secondary queue is empty. The chosen value reduces the amount
>> + * of unnecessary shuffling of threads between the two waiting queues
>> + * when the contention is low, while responding fast enough and enabling
>> + * the shuffling when the contention is high.
>> + */
>> +#define SHUFFLE_REDUCTION_PROB_ARG  (7)
> 
> Out of curiosity:
> 
> Have you used other values and done measurements what's an efficient
> value for SHUFFLE_REDUCTION_PROB_ARG?
Yes, we did try other values. Small variations do not change the results much,
but if you bump the value significantly (e.g., 20), you end up with a lock that
hardly does any shuffling and thus performs on-par with the (MCS-based)
baseline.

> Maybe I miscalculated it, but if I understand it correctly this value
> implies that the probability is 0.9921875 that the function below returns
> true.
Your analysis is correct. Intuitively, we tried to keep the probability around 1-2%,
so if we do decide to shuffle when we don’t really want to (i.e., when the
contention is low), the overall overhead of such wrong decisions would be small.
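
To put numbers on this, here is a standalone userspace sketch of the quoted probably() helper. It assumes next_pseudo_random32() is the linear congruential step s * 1664525 + 1013904223 (that is my recollection of the helper in include/linux/random.h; treat it as an assumption), and it replaces the per-CPU seed with a caller-supplied one. With num_bits = 7 the helper returns false with probability 1/128 = 0.0078125, which matches the 0.9921875 figure above for returning true.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for the kernel's next_pseudo_random32(). */
static uint32_t next_pseudo_random32(uint32_t seed)
{
    return seed * UINT32_C(1664525) + UINT32_C(1013904223);
}

/*
 * Return false with probability 1 / 2^num_bits (num_bits in 0..31).
 * The seed is passed in explicitly instead of being per-CPU state.
 */
static bool probably(uint32_t *seed, unsigned int num_bits)
{
    *seed = next_pseudo_random32(*seed);

    return (*seed & ((UINT32_C(1) << num_bits) - 1)) != 0;
}

/* Count how often probably() returns false over a number of draws. */
static unsigned int count_false(uint32_t seed, unsigned int num_bits,
                                unsigned int draws)
{
    unsigned int misses = 0;

    for (unsigned int i = 0; i < draws; i++)
        if (!probably(&seed, num_bits))
            misses++;

    return misses;
}
```

With this particular generator the low seven bits cycle with period 128, so probably(7) returns false exactly once in any 128 consecutive draws from one seed; the rate is 1/128 by construction rather than only on average.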

> 
> My question is probably answered by following statement from
> referenced paper:
> 
> "In our experiments with the shuffle reduction optimization enabled,
> we set THRESHOLD2 to 0xff." (page with figure 5)
Yeah, just to avoid any confusion — we used a different mechanism to draw
pseudo-random numbers in the paper, so that 0xff number is not directly 
comparable to the range of possible values for SHUFFLE_REDUCTION_PROB_ARG,
but the idea was exactly the same.

> 
>> +
>> +/* Per-CPU pseudo-random number seed */
>> +static DEFINE_PER_CPU(u32, seed);
>> +
>> +/*
>> + * Return false with probability 1 / 2^@num_bits.
>> + * Intuitively, the larger @num_bits the less likely false is to be returned.
>> + * @num_bits must be a number between 0 and 31.
>> + */
>> +static bool probably(unsigned int num_bits)
>> +{
>> +u32 s;
>> +
>> +s = this_cpu_read(seed);
>> +s = next_pseudo_random32(s);
>> +this_cpu_write(seed, s);
>> +
>> +return s & ((1 << num_bits) - 1);
>> +}
>> +
>> static void __init cna_init_nodes_per_cpu(unsigned int cpu)
>> {
>>  struct mcs_spinlock *base = per_cpu_ptr(&qnodes[0].mcs, cpu);
>> @@ -293,6 +322,16 @@ static __always_inline u32 cna_wait_head_or_lock(struct qspinlock *lock,
>> {
>>  struct cna_node *cn = (struct cna_node *)node;
>> 
>> +if (node->locked <= 1 && probably(SHUFFLE_REDUCTION_PROB_ARG)) {
> 
> Again if I understand it correctly with SHUFFLE_REDUCTION_PROB_ARG==7
> it's roughly 1 out of 100 cases where probably() returns false.
> 
> Why/when is this beneficial?
> 
> I assume it has to do with following statement in the referenced
> paper:
> 
> "The superior performance over MCS at 4 threads is the result of the
> shuffling that does take place once in a while, organizing threads’
> arrivals to the lock in a way that reduces the inter-socket lock
> migration without the need to continuously modify the main queue. ..."
> (page with figure 9; the pap

[GIT PULL] VFIO fix for v5.12-rc8/final

2021-04-14 Thread Alex Williamson
Hi Linus,

Sorry for the late request.

The following changes since commit d434405aaab7d0ebc516b68a8fc4100922d7f5ef:

  Linux 5.12-rc7 (2021-04-11 15:16:13 -0700)

are available in the Git repository at:

  git://github.com/awilliam/linux-vfio.git tags/vfio-v5.12-rc8

for you to fetch changes up to 909290786ea335366e21d7f1ed5812b90f2f0a92:

  vfio/pci: Add missing range check in vfio_pci_mmap (2021-04-13 08:29:16 -0600)


VFIO fix for v5.12-rc8/final

 - Verify mmap region within range (Christian A. Ehrhardt)


Christian A. Ehrhardt (1):
  vfio/pci: Add missing range check in vfio_pci_mmap

 drivers/vfio/pci/vfio_pci.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)



Re: [PATCH] staging: greybus: Match parentheses alignment

2021-04-14 Thread Alex Elder

On 4/14/21 9:29 AM, Joe Perches wrote:

On Wed, 2021-04-14 at 08:17 -0500, Alex Elder wrote:

Perhaps (like the -W options for GCC) there
could be a way to specify in a Makefile which checkpatch
messages are reported/not reported?  I don't claim that's
a good suggestion, but if I could optionally indicate
somewhere that "two consecutive blank lines is OK for
Greybus" (one example that comes to mind) I might do so.


checkpatch already has --ignore= and --types=
for the various classes of messages it emits.

see: $ ./scripts/checkpatch.pl --list-types --verbose

Dwaipayan Ray (cc'd) is supposedly working on expanding
the verbose descriptions of each type.



That's awesome, I wasn't aware of that.

Any suggestions on a standardized way to say "in this
subtree, please provide these arguments to checkpatch.pl"?

I can probably stick it in a README file or something,
but is there an existing best practice?

Thanks.

    -Alex


Re: [PATCH v5] docs/zh_CN: add translations in zh_CN/dev-tools/gcov

2021-04-14 Thread Alex Shi
Reviewed-by: Alex Shi 

On 2021/4/14 9:21 PM, Wu XiangCheng wrote:
> From: Bernard Zhao 
> 
> Add new zh translations
> * zh_CN/dev-tools/gcov.rst
> * zh_CN/dev-tools/index.rst
> and link them to zh_CN/index.rst
> 
> Signed-off-by: Bernard Zhao 
> Reviewed-by: Wu XiangCheng 
> Signed-off-by: Wu XiangCheng 
> ---
> base: linux-next
> commit 269dd42f4776 ("docs/zh_CN: add riscv to zh_CN index")
> 
> Changes since V4:
> * modified some words following Alex Shi's advice
> 
> Changes since V3:
> * update to newest linux-next
> * fix ``
> * fix tags
> * fix list indent
> 
> Changes since V2:
> * fix some inaccurate translation
> 
> Changes since V1:
> * add index.rst in dev-tools and link to zh_CN/index.rst
> * fix some inaccurate translation
> 
>  .../translations/zh_CN/dev-tools/gcov.rst | 265 ++
>  .../translations/zh_CN/dev-tools/index.rst|  35 +++
>  Documentation/translations/zh_CN/index.rst|   1 +
>  3 files changed, 301 insertions(+)
>  create mode 100644 Documentation/translations/zh_CN/dev-tools/gcov.rst
>  create mode 100644 Documentation/translations/zh_CN/dev-tools/index.rst
> 
> diff --git a/Documentation/translations/zh_CN/dev-tools/gcov.rst b/Documentation/translations/zh_CN/dev-tools/gcov.rst
> new file mode 100644
> index ..7515b488bc4e
> --- /dev/null
> +++ b/Documentation/translations/zh_CN/dev-tools/gcov.rst
> @@ -0,0 +1,265 @@
> +.. include:: ../disclaimer-zh_CN.rst
> +
> +:Original: Documentation/dev-tools/gcov.rst
> +:Translator: 赵军奎 Bernard Zhao 
> +
> +在Linux内核里使用gcov做代码覆盖率检查
> +=
> +
> +gcov是linux中已经集成的一个分析模块,该模块在内核中对GCC的代码覆盖率统
> +计提供了支持。
> +linux内核运行时的代码覆盖率数据会以gcov兼容的格式存储在debug-fs中,可
> +以通过gcov的 ``-o`` 选项(如下示例)获得指定文件的代码运行覆盖率统计数据
> +(需要跳转到内核编译路径下并且要有root权限)::
> +
> +# cd /tmp/linux-out
> +# gcov -o /sys/kernel/debug/gcov/tmp/linux-out/kernel spinlock.c
> +
> +这将在当前目录中创建带有执行计数注释的源代码文件。
> +在获得这些统计文件后,可以使用图形化的 gcov_ 前端工具(比如 lcov_ ),来实现
> +自动化处理linux内核的覆盖率运行数据,同时生成易于阅读的HTML格式文件。
> +
> +可能的用途:
> +
> +* 调试(用来判断每一行的代码是否已经运行过)
> +* 测试改进(如何修改测试代码,尽可能地覆盖到没有运行过的代码)
> +* 内核配置优化(对于某一个选项配置,如果关联的代码从来没有运行过,是
> +  否还需要这个配置)
> +
> +.. _gcov: https://gcc.gnu.org/onlinedocs/gcc/Gcov.html
> +.. _lcov: http://ltp.sourceforge.net/coverage/lcov.php
> +
> +
> +准备
> +
> +
> +内核打开如下配置::
> +
> +CONFIG_DEBUG_FS=y
> +CONFIG_GCOV_KERNEL=y
> +
> +获取整个内核的覆盖率数据,还需要打开::
> +
> +CONFIG_GCOV_PROFILE_ALL=y
> +
> +需要注意的是,整个内核开启覆盖率统计会造成内核镜像文件尺寸的增大,
> +同时内核运行的也会变慢一些。
> +另外,并不是所有的架构都支持整个内核开启覆盖率统计。
> +
> +代码运行覆盖率数据只在debugfs挂载完成后才可以访问::
> +
> +mount -t debugfs none /sys/kernel/debug
> +
> +
> +定制化
> +--
> +
> +如果要单独针对某一个路径或者文件进行代码覆盖率统计,可以在内核相应路
> +径的Makefile中增加如下的配置:
> +
> +- 单独统计单个文件(例如main.o)::
> +
> +GCOV_PROFILE_main.o := y
> +
> +- 单独统计某一个路径::
> +
> +GCOV_PROFILE := y
> +
> +如果要在整个内核的覆盖率统计(开启CONFIG_GCOV_PROFILE_ALL)中单独排除
> +某一个文件或者路径,可以使用如下的方法::
> +
> +GCOV_PROFILE_main.o := n
> +
> +和::
> +
> +GCOV_PROFILE := n
> +
> +此机制仅支持链接到内核镜像或编译为内核模块的文件。
> +
> +
> +相关文件
> +
> +
> +gcov功能需要在debugfs中创建如下文件:
> +
> +``/sys/kernel/debug/gcov``
> +gcov相关功能的根路径
> +
> +``/sys/kernel/debug/gcov/reset``
> +全局复位文件:向该文件写入数据后会将所有的gcov统计数据清0
> +
> +``/sys/kernel/debug/gcov/path/to/compile/dir/file.gcda``
> +gcov工具可以识别的覆盖率统计数据文件,向该文件写入数据后
> +   会将本文件的gcov统计数据清0
> +
> +``/sys/kernel/debug/gcov/path/to/compile/dir/file.gcno``
> +gcov工具需要的软连接文件(指向编译时生成的信息统计文件),这个文件是
> +在gcc编译时如果配置了选项 ``-ftest-coverage`` 时生成的。
> +
> +
> +针对模块的统计
> +--
> +
> +内核中的模块会动态的加载和卸载,模块卸载时对应的数据会被清除掉。
> +gcov提供了一种机制,通过保留相关数据的副本来收集这部分卸载模块的覆盖率数据。
> +模块卸载后这些备份数据在debugfs中会继续存在。
> +一旦这个模块重新加载,模块关联的运行统计会被初始化成debugfs中备份的数据。
> +
> +可以通过对内核参数gcov_persist的修改来停用gcov对模块的备份机制::
> +
> +gcov_persist = 0
> +
> +在运行时,用户还可以通过写入模块的数据文件或者写入gcov复位文件来丢弃已卸
> +载模块的数据。
> +
> +
> +编译机和测试机分离
> +--
> +
> +gcov的内核分析架构支持内核的编译和运行是在同一台机器上,也可以编译和运
> +行是在不同的机器上。
> +如果内核编译和运行是不同的机器,那么需要额外的准备工作,这取决于gcov工具
> +是在哪里使用的:
> +
> +.. _gcov-test_zh:
> +
> +a) 若gcov运行在测试机上
> +
> +测试机上面gcov工具的版本必须要跟内核编译机器使用的gcc版本相兼容,
> +同时下面的文件要从编译机拷贝到测试机上:
> +
> +从源代码中:
> +  - 所有的C文件和头文件
> +
> +从编译目录中:
> +  - 所有的C文件和头文件
> +  - 所有的.gcda文件和.gcno文件
> +  - 所有目录的链接
> +
> +特别需要注意,测试机器上面的目录结构跟编译机器上面的目录机构必须
> +完全一致。
> +如果文件是软链接,需要替换成真正的目录文件(这是由make的当前工作
> +目录变量CURDIR引起的)。
> +
> +.. _gcov-build_zh:
> +
> +b) 若gcov运行在编译机上
> 

Re: [PATCH] staging: greybus: Match parentheses alignment

2021-04-14 Thread Alex Elder

On 4/6/21 12:21 PM, Joe Perches wrote:

On Tue, 2021-04-06 at 15:27 +0200, Greg KH wrote:

On Tue, Apr 06, 2021 at 06:42:59PM +0600, Zhansaya Bagdauletkyzy wrote:

Match next line with open parentheses by adding tabs/spaces
to conform with Linux kernel coding style.
Reported by checkpatch.

[]

diff --git a/drivers/staging/greybus/camera.c b/drivers/staging/greybus/camera.c

[]

@@ -378,8 +378,8 @@ struct ap_csi_config_request {
  #define GB_CAMERA_CSI_CLK_FREQ_MARGIN 15000U
  


  static int gb_camera_setup_data_connection(struct gb_camera *gcam,
-   struct gb_camera_configure_streams_response *resp,
-   struct gb_camera_csi_params *csi_params)
+  struct gb_camera_configure_streams_response *resp,
+  struct gb_camera_csi_params *csi_params)


And now you violate another coding style requirement, which means
someone will send another patch to fix that up and around and around it
goes...


None of the coding style document is an actual requirement, Greg.
It's all rules of thumb.  Useful rules, but not hard and fast, right?


I agree with this, but this ambiguity causes some problems.

Greybus is a go-to place for just-starting developers to
work with some reasonably good "real" code.  But someone
just starting has no way of judging whether the warnings
issued by checkpatch are real or not.  Even experienced
developers will lack the insight to judge this if they
are modifying a less-familiar part of the kernel.

The result--for Greybus certainly--is a fairly regular
stream of patches that suggest making trivial changes based
on checkpatch recommendations.  And unfortunately each
one is destined to be rejected by the maintainers.  This
is no good for anybody.

Can you think of a way to try to further characterize
how "serious" a warning message is?  I recognize that
even if (for example) you had something like 1-10 severity
scale, the scale might not be uniform across the whole
kernel tree.  Perhaps (like the -W options for GCC) there
could be a way to specify in a Makefile which checkpatch
messages are reported/not reported?  I don't claim that's
a good suggestion, but if I could optionally indicate
somewhere that "two consecutive blank lines is OK for
Greybus" (one example that comes to mind) I might do so.


To me, the biggest issue with this code isn't whether or not the
code is aligned at open parentheses or stays within 80 columns,
but is the use of 30+ character length identifiers.


I agree with you on this one...  I've worked with code
like that and it's very difficult to make it readable.
I've made a mental note to go look at this and see if
I can make it better.  I can't say when I'll get to it
but I think it's a good suggestion.

    -Alex


Using identifiers of that length makes using 80 column, or even
100 column length lines infeasible.

Perhaps seeing if include/linux/greybus/greybus_protocols.h
could be updated to use shorter length identifiers might be useful.

The median length identifier there is ~25 chars long and the
maximum length identifier is ~50 chars.







Re: [PATCH v4] docs/zh_CN: add translations in zh_CN/dev-tools/gcov

2021-04-14 Thread Alex Shi



On 2021/4/14 7:24 PM, Wu XiangCheng wrote:
> From: Bernard Zhao 
> 
> Add new zh translations
> * zh_CN/dev-tools/gcov.rst
> * zh_CN/dev-tools/index.rst
> and link them to zh_CN/index.rst
> 
> Signed-off-by: Bernard Zhao 
> Reviewed-by: Wu Xiangcheng 
> Signed-off-by: Wu XiangCheng 
> ---
> base: linux-next
> commit 269dd42f4776 ("docs/zh_CN: add riscv to zh_CN index")
> 
> Changes since V3:
> * update to newest linux-next
> * fix ``
> * fix tags
> * fix list indent
> 
> Changes since V2:
> * fix some inaccurate translation
> 
> Changes since V1:
> * add index.rst in dev-tools and link to to zh_CN/index.rst
> * fix some inaccurate translation
> 
>  .../translations/zh_CN/dev-tools/gcov.rst | 265 ++
>  .../translations/zh_CN/dev-tools/index.rst|  35 +++
>  Documentation/translations/zh_CN/index.rst|   1 +
>  3 files changed, 301 insertions(+)
>  create mode 100644 Documentation/translations/zh_CN/dev-tools/gcov.rst
>  create mode 100644 Documentation/translations/zh_CN/dev-tools/index.rst
> 
> diff --git a/Documentation/translations/zh_CN/dev-tools/gcov.rst b/Documentation/translations/zh_CN/dev-tools/gcov.rst
> new file mode 100644
> index ..7515b488bc4e
> --- /dev/null
> +++ b/Documentation/translations/zh_CN/dev-tools/gcov.rst
> @@ -0,0 +1,265 @@
> +.. include:: ../disclaimer-zh_CN.rst
> +
> +:Original: Documentation/dev-tools/gcov.rst
> +:Translator: 赵军奎 Bernard Zhao 
> +
> +在Linux内核里使用gcov做代码覆盖率检查
> +=
> +
> +gcov是linux中已经集成的一个分析模块,该模块在内核中对GCC的代码覆盖率统
> +计提供了支持。
> +linux内核运行时的代码覆盖率数据会以gcov兼容的格式存储在debug-fs中,可
> +以通过gcov的 ``-o`` 选项(如下示例)获得指定文件的代码运行覆盖率统计数据
> +(需要跳转到内核编译路径下并且要有root权限)::
> +
> +# cd /tmp/linux-out
> +# gcov -o /sys/kernel/debug/gcov/tmp/linux-out/kernel spinlock.c
> +
> +这将在当前目录中创建带有执行计数注释的源代码文件。
> +在获得这些统计文件后,可以使用图形化的 gcov_ 前端工具(比如 lcov_ ),来实现
> +自动化处理linux内核的覆盖率运行数据,同时生成易于阅读的HTML格式文件。
> +
> +可能的用途:
> +
> +* 调试(用来判断每一行的代码是否已经运行过)
> +* 测试改进(如何修改测试代码,尽可能地覆盖到没有运行过的代码)
> +* 内核配置优化(对于某一个选项配置,如果关联的代码从来没有运行过,是
> +  否还需要这个配置)
> +
> +.. _gcov: https://gcc.gnu.org/onlinedocs/gcc/Gcov.html
> +.. _lcov: http://ltp.sourceforge.net/coverage/lcov.php
> +
> +
> +准备
> +
> +
> +内核打开如下配置::
> +
> +CONFIG_DEBUG_FS=y
> +CONFIG_GCOV_KERNEL=y
> +
> +获取整个内核的覆盖率数据,还需要打开::
> +
> +CONFIG_GCOV_PROFILE_ALL=y
> +
> +需要注意的是,整个内核开启覆盖率统计会造成内核镜像文件尺寸的增大,
> +同时内核运行的也会变慢一些。
> +另外,并不是所有的架构都支持整个内核开启覆盖率统计。
> +
> +代码运行覆盖率数据只在debugfs挂载完成后才可以访问::
> +
> +mount -t debugfs none /sys/kernel/debug
> +
> +
> +客制化

Usually this is rendered as '定制化'.

> +--
> +
> +如果要单独针对某一个路径或者文件进行代码覆盖率统计,可以在内核相应路
> +径的Makefile中增加如下的配置:
> +
> +- 单独统计单个文件(例如main.o)::
> +
> +GCOV_PROFILE_main.o := y
> +
> +- 单独统计某一个路径::
> +
> +GCOV_PROFILE := y
> +
> +如果要在整个内核的覆盖率统计(开启CONFIG_GCOV_PROFILE_ALL)中单独排除
> +某一个文件或者路径,可以使用如下的方法::
> +
> +GCOV_PROFILE_main.o := n
> +
> +和::
> +
> +GCOV_PROFILE := n
> +
> +此机制仅支持链接到内核镜像或编译为内核模块的文件。
> +
> +
> +相关文件
> +
> +
> +gcov功能需要在debugfs中创建如下文件:
> +
> +``/sys/kernel/debug/gcov``
> +gcov相关功能的根路径
> +
> +``/sys/kernel/debug/gcov/reset``
> +全局复位文件:向该文件写入数据后会将所有的gcov统计数据清0
> +
> +``/sys/kernel/debug/gcov/path/to/compile/dir/file.gcda``
> +gcov工具可以识别的覆盖率统计数据文件,向该文件写入数据后
> +   会将本文件的gcov统计数据清0
> +
> +``/sys/kernel/debug/gcov/path/to/compile/dir/file.gcno``
> +gcov工具需要的软连接文件(指向编译时生成的信息统计文件),这个文件是
> +在gcc编译时如果配置了选项 ``-ftest-coverage`` 时生成的。
> +
> +
> +针对模块的统计
> +--
> +
> +内核中的模块会动态的加载和卸载,模块卸载时对应的数据会被清除掉。
> +gcov提供了一种机制,通过保留相关数据的副本来收集这部分卸载模块的覆盖率数据。
> +模块卸载后这些备份数据在debugfs中会继续存在。
> +一旦这个模块重新加载,模块关联的运行统计会被初始化成debugfs中备份的数据。
> +
> +可以通过对内核参数gcov_persist的修改来停用gcov对模块的备份机制::
> +
> +gcov_persist = 0
> +
> +在运行时,用户还可以通过写入模块的数据文件或者写入gcov复位文件来丢弃已卸
> +载模块的数据。
> +
> +
> +分离的编译和运行设备

Perhaps '编译和运行机分离'?

"Machine" means computer here. Translating it as 计算机 or 机器 may
be better than 设备?

The rest looks fine to me.

Thanks
Alex

> +
> +
> +gcov的内核分析架构支持内核的编译和分析是在同一台设备上,也可以编译和运
> +行是在不同的设备上。
> +如果内核编译和运行是不同的设备,那么需要额外的准备工作,这取决于gcov工具
> +是在哪里使用的:
> +
> +.. _gcov-test_zh:
> +
> +a) 若gcov运行在测试设备上
> +
> +测试设备上面gcov工具的版本必须要跟设备内核编译使用的gcc版本相兼容,
> +同时下面的文件要从编译设备拷贝到测试设备上:
> +
> +从源代码中:
> +  - 所有的C文件和头文件
> +
> +从编译目录中:
> +  - 所有的C文件和头文件
> +  - 所有的.gcda文件和.gcno文件
> +  - 所有目录的链接
> +
> +特别需要注意,测试机器上面的目录结构跟编译机器上面的目录机构必须
> +完全一致。
> +如果文件是软链接,需要替换成真正的目录文件(这是由make的当前工作
> +目录变量CURDIR引起的)。
> +
> +.. _gcov-bu

Re: [PATCH] implement flush_cache_vmap for RISC-V

2021-04-14 Thread Alex Ghiti

Hi,

On 4/12/21 at 3:08 AM, Jisheng Zhang wrote:

Hi Jiuyang,

On Mon, 12 Apr 2021 00:05:30 + Jiuyang Liu  wrote:




This patch implements flush_cache_vmap for RISC-V, since it modifies PTE.
Without this patch, SFENCE.VMA won't be added to related codes, which
might introduce a bug in the out-of-order micro-architecture
implementations.

Signed-off-by: Jiuyang Liu 
Reviewed-by: Alexandre Ghiti 
Reviewed-by: Palmer Dabbelt 


IIRC, Palmer hasn't given this Reviewed-by tag.


---


Could you plz add version and changes? IIRC, this is the v3.


  arch/riscv/include/asm/cacheflush.h | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
index 23ff70350992..3fd528badc35 100644
--- a/arch/riscv/include/asm/cacheflush.h
+++ b/arch/riscv/include/asm/cacheflush.h
@@ -30,6 +30,12 @@ static inline void flush_dcache_page(struct page *page)
  #define flush_icache_user_page(vma, pg, addr, len) \
 flush_icache_mm(vma->vm_mm, 0)

+/*
+ * flush_cache_vmap is invoked after map_kernel_range() has installed the page
+ * table entries, which modifies PTE, SFENCE.VMA should be inserted.


Just my humble opinion, flush_cache_vmap() may not be necessary. vmalloc_fault
can take care of this, and finally sfence.vma is inserted in related path.




I believe Palmer and Jisheng are right: my initial proposal to implement
flush_cache_vmap was wrong.


But then, Jiuyang should not have noticed any problem here, so what's 
wrong? @Jiuyang: Does implementing flush_cache_vmap fix your issue?


And regarding flush_cache_vunmap, from Jisheng's call stack, it also seems
unnecessary.


@Jiuyang: Can you tell us more about what you noticed?



Regards


+ */
+#define flush_cache_vmap(start, end) flush_tlb_all()
+
  #ifndef CONFIG_SMP

  #define flush_icache_all() local_flush_icache_all()
--
2.31.1





Re: [External] : Re: [PATCH v14 4/6] locking/qspinlock: Introduce starvation avoidance into CNA

2021-04-13 Thread Alex Kogan



> On Apr 13, 2021, at 5:22 PM, Andi Kleen  wrote:
> 
>>> ms granularity seems very coarse grained for this. Surely
>>> at some point of spinning you can afford a ktime_get? But ok.
>> We are reading time when we are at the head of the (main) queue, but
>> don’t have the lock yet. Not sure about the latency of ktime_get(), but
>> anything reasonably fast but not necessarily precise should work.
> 
> Actually cpu_clock / sched_clock (see my other email). These should
> be fast without corner cases and also monotonic.
I see, thanks.

> 
>> 
>>> Could you turn that into a moduleparm which can be changed at runtime?
>>> Would be strange to have to reboot just to play with this parameter
>> Yes, good suggestion, thanks.
>> 
>>> This would also make the code a lot shorter I guess.
>> So you don’t think we need the command-line parameter, just the module_param?
> 
> module_params can be changed at the command line too, so yes.
Got it, thanks again.

— Alex



Re: [External] : Re: [PATCH v14 3/6] locking/qspinlock: Introduce CNA into the slow path of qspinlock

2021-04-13 Thread Alex Kogan
Peter, thanks for all the comments and suggestions!

> On Apr 13, 2021, at 7:30 AM, Peter Zijlstra  wrote:
> 
> On Thu, Apr 01, 2021 at 11:31:53AM -0400, Alex Kogan wrote:
> 
>> +/*
>> + * cna_splice_tail -- splice the next node from the primary queue onto
>> + * the secondary queue.
>> + */
>> +static void cna_splice_next(struct mcs_spinlock *node,
>> +struct mcs_spinlock *next,
>> +struct mcs_spinlock *nnext)
> 
> You forgot to update the comment when you changed the name on this
> thing.
Good catch, thanks.

> 
>> +/*
>> + * cna_order_queue - check whether the next waiter in the main queue is on
>> + * the same NUMA node as the lock holder; if not, and it has a waiter behind
>> + * it in the main queue, move the former onto the secondary queue.
>> + */
>> +static void cna_order_queue(struct mcs_spinlock *node)
>> +{
>> +struct mcs_spinlock *next = READ_ONCE(node->next);
>> +struct cna_node *cn = (struct cna_node *)node;
>> +int numa_node, next_numa_node;
>> +
>> +if (!next) {
>> +cn->partial_order = LOCAL_WAITER_NOT_FOUND;
>> +return;
>> +}
>> +
>> +numa_node = cn->numa_node;
>> +next_numa_node = ((struct cna_node *)next)->numa_node;
>> +
>> +if (next_numa_node != numa_node) {
>> +struct mcs_spinlock *nnext = READ_ONCE(next->next);
>> +
>> +if (nnext) {
>> +cna_splice_next(node, next, nnext);
>> +next = nnext;
>> +}
>> +/*
>> + * Inherit NUMA node id of primary queue, to maintain the
>> + * preference even if the next waiter is on a different node.
>> + */
>> +((struct cna_node *)next)->numa_node = numa_node;
>> +}
>> +}
> 
> So the obvious change since last time I looked at this is that it now
> only looks 1 entry ahead. Which makes sense I suppose.
This is in response to the critique that the worst-case time complexity of
cna_order_queue() was O(n). With this change, the complexity is constant.

> 
> I'm not really a fan of the 'partial_order' name combined with that
> silly enum { LOCAL_WAITER_FOUND, LOCAL_WAITER_NOT_FOUND }. That's just
> really bad naming all around. The enum is about having a waiter while
> the variable is about partial order, that doesn't match at all.
Fair enough.

> If you rename the variable to 'has_waiter' and simply use 0,1 values,
> things would be ever so more readable. But I don't think that makes
> sense, see below.
> 
> I'm also not sure about that whole numa_node thing, why would you
> over-write the numa node, why at this point ?
With this move-one-by-one approach, I want to keep the NUMA-node 
preference of the lock holder even if the next-next waiter is on a different
NUMA-node. Otherwise, we will end up switching preference often and
the entire scheme would not perform well. In particular, we might easily
end up with threads from the preferred node waiting in the secondary queue.
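
For readers following along, the one-entry look-ahead discussed above can
be sketched in user space roughly as follows. This is only an illustration
under stated assumptions: struct waiter, struct queues and
order_queue_once() are made-up names, not the patch's code, and all of the
atomics and memory ordering that the kernel needs are left out.

```c
#include <stddef.h>

/* Hypothetical user-space stand-in for a queue node (not the kernel type). */
struct waiter {
	struct waiter *next;
	int numa_node;
};

/* Hypothetical secondary queue holding deferred remote waiters. */
struct queues {
	struct waiter *sec_head;
	struct waiter *sec_tail;
};

/*
 * Look exactly one entry ahead of @node. If that waiter is on another
 * NUMA node and has a successor, move it onto the secondary queue.
 * Returns 1 if a waiter was moved, 0 otherwise. Constant work per call.
 */
static int order_queue_once(struct waiter *node, struct queues *q)
{
	struct waiter *next = node->next;
	struct waiter *nnext;

	if (!next || next->numa_node == node->numa_node)
		return 0;

	nnext = next->next;
	if (!nnext) {
		/* Sole waiter: keep it, but let it inherit our preference. */
		next->numa_node = node->numa_node;
		return 0;
	}

	/* Unlink @next from the primary queue ... */
	node->next = nnext;
	next->next = NULL;

	/* ... and append it to the tail of the secondary queue. */
	if (q->sec_tail)
		q->sec_tail->next = next;
	else
		q->sec_head = next;
	q->sec_tail = next;

	/* The new successor inherits the lock holder's preferred node. */
	nnext->numa_node = node->numa_node;
	return 1;
}
```

Because the new successor inherits the holder's node id, a second call
leaves it in place, which is how this sketch keeps the preference stable
instead of switching nodes on every hand-off.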

> 
>> +
>> +/* Abuse the pv_wait_head_or_lock() hook to get some work done */
>> +static __always_inline u32 cna_wait_head_or_lock(struct qspinlock *lock,
>> + struct mcs_spinlock *node)
>> +{
>> +/*
>> + * Try and put the time otherwise spent spin waiting on
>> + * _Q_LOCKED_PENDING_MASK to use by sorting our lists.
>> + */
>> +cna_order_queue(node);
>> +
>> +return 0; /* we lied; we didn't wait, go do so now */
> 
> So here we inspect one entry ahead and then quit. I can't remember, but
> did we try something like:
> 
>   /*
>* Try and put the time otherwise spent spin waiting on
>* _Q_LOCKED_PENDING_MASK to use by sorting our lists.
>* Move one entry at a go until either the list is fully
>* sorted or we ran out of spin condition.
>*/
>   while (READ_ONCE(lock->val) & _Q_LOCKED_PENDING_MASK &&
>  node->partial_order)
>   cna_order_queue(node);
> 
>   return 0;
> 
> This will keep moving @next to the remote list until such a time that
> we're forced to continue or @next is local.
We have not tried that. This is actually an interesting idea, with its pros
and cons. That is, we are likely to filter out “non-preferred” waiters into
the secondary queue faster, but also we are more likely to run into a
situation where the lock becomes available at the time we are running the
cna_order_queue() logic, thus prolonging the handover time.

W

Re: [PATCH] efifb: Check efifb_pci_dev before using it

2021-04-13 Thread Alex Deucher
On Tue, Apr 13, 2021 at 2:37 PM Daniel Vetter  wrote:
>
> On Tue, Apr 13, 2021 at 8:02 PM Alex Deucher  wrote:
> >
> > On Tue, Apr 13, 2021 at 1:05 PM Kai-Heng Feng
> >  wrote:
> > >
> > > On some platforms like Hyper-V and RPi4 with UEFI firmware, efifb is not
> > > a PCI device.
> > >
> > > So make sure efifb_pci_dev is found before using it.
> > >
> > > Fixes: a6c0fd3d5a8b ("efifb: Ensure graphics device for efifb stays at 
> > > PCI D0")
> > > BugLink: https://bugs.launchpad.net/bugs/1922403
> > > Signed-off-by: Kai-Heng Feng 
> >
> > Reviewed-by: Alex Deucher 
>
> fbdev is in drm-misc, so maybe you can push this one too?

Yes, pushed.  Thanks!

Alex

> -Daniel
>
> >
> > > ---
> > >  drivers/video/fbdev/efifb.c | 6 --
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/video/fbdev/efifb.c b/drivers/video/fbdev/efifb.c
> > > index f58a545b3bf3..8ea8f079cde2 100644
> > > --- a/drivers/video/fbdev/efifb.c
> > > +++ b/drivers/video/fbdev/efifb.c
> > > @@ -575,7 +575,8 @@ static int efifb_probe(struct platform_device *dev)
> > > goto err_fb_dealoc;
> > > }
> > > fb_info(info, "%s frame buffer device\n", info->fix.id);
> > > -   pm_runtime_get_sync(&efifb_pci_dev->dev);
> > > +   if (efifb_pci_dev)
> > > +   pm_runtime_get_sync(&efifb_pci_dev->dev);
> > > return 0;
> > >
> > >  err_fb_dealoc:
> > > @@ -602,7 +603,8 @@ static int efifb_remove(struct platform_device *pdev)
> > > unregister_framebuffer(info);
> > > sysfs_remove_groups(&pdev->dev.kobj, efifb_groups);
> > > framebuffer_release(info);
> > > -   pm_runtime_put(&efifb_pci_dev->dev);
> > > +   if (efifb_pci_dev)
> > > +   pm_runtime_put(&efifb_pci_dev->dev);
> > >
> > > return 0;
> > >  }
> > > --
> > > 2.30.2
> > >
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [External] : Re: [PATCH v14 4/6] locking/qspinlock: Introduce starvation avoidance into CNA

2021-04-13 Thread Alex Kogan
Hi, Andi.

Thanks for your comments!

> On Apr 13, 2021, at 2:03 AM, Andi Kleen  wrote:
> 
> Alex Kogan  writes:
>> 
>> +numa_spinlock_threshold=[NUMA, PV_OPS]
>> +Set the time threshold in milliseconds for the
>> +number of intra-node lock hand-offs before the
>> +NUMA-aware spinlock is forced to be passed to
>> +a thread on another NUMA node.  Valid values
>> +are in the [1..100] range. Smaller values result
>> +in a more fair, but less performant spinlock,
>> +and vice versa. The default value is 10.
> 
> ms granularity seems very coarse grained for this. Surely
> at some point of spinning you can afford a ktime_get? But ok.
We are reading time when we are at the head of the (main) queue, but
don’t have the lock yet. Not sure about the latency of ktime_get(), but
anything reasonably fast but not necessarily precise should work.
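
To make the time check concrete, here is a small user-space sketch of how
a millisecond threshold could be compared against a cheap monotonic
timestamp. It is an assumption-laden illustration, not the patch's code:
threshold_ms, set_threshold_ms() and intra_node_threshold_reached() are
hypothetical names, and timestamps are assumed to be in nanoseconds.

```c
#include <stdbool.h>
#include <stdint.h>

#define THRESHOLD_MS_MIN	1
#define THRESHOLD_MS_MAX	100
#define THRESHOLD_MS_DEFAULT	10
#define NSEC_PER_MSEC		1000000ULL

/* Hypothetical tunable; in the kernel this would be a parameter. */
static unsigned int threshold_ms = THRESHOLD_MS_DEFAULT;

/* Accept only values in the documented [1..100] range. */
static bool set_threshold_ms(unsigned int val)
{
	if (val < THRESHOLD_MS_MIN || val > THRESHOLD_MS_MAX)
		return false;
	threshold_ms = val;
	return true;
}

/*
 * @start_ns: when the current NUMA node started being preferred
 * @now_ns:   a cheap monotonic timestamp taken at the head of the queue
 * Returns true once the lock should be passed to another node.
 */
static bool intra_node_threshold_reached(uint64_t start_ns, uint64_t now_ns)
{
	return now_ns - start_ns >= (uint64_t)threshold_ms * NSEC_PER_MSEC;
}
```

The comparison only needs a clock that is monotonic and fast; precision
well below a millisecond is not required for this check.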

> Could you turn that into a moduleparm which can be changed at runtime?
> Would be strange to have to reboot just to play with this parameter
Yes, good suggestion, thanks.

> This would also make the code a lot shorter I guess.
So you don’t think we need the command-line parameter, just the module_param?

Regards,
— Alex



Re: [PATCH] vfio/iommu_type1: Remove unused pinned_page_dirty_scope in vfio_iommu

2021-04-13 Thread Alex Williamson
On Mon, 12 Apr 2021 10:44:15 +0800
Keqian Zhu  wrote:

> pinned_page_dirty_scope is optimized out by commit 010321565a7d
> ("vfio/iommu_type1: Mantain a counter for non_pinned_groups"),
> but appears again due to some issues during merging branches.
> We can safely remove it here.
> 
> Signed-off-by: Keqian Zhu 
> ---
> 
> However, I'm not clear about the root problem. Is there a bug in git?

Strange, clearly I broke something in merge commit 76adb20f924f, but
it's not evident to me how that line reappeared.  Thanks for spotting
it, I'll queue this for v5.13.  Thanks,

Alex

> ---
>  drivers/vfio/vfio_iommu_type1.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 45cbfd4879a5..4d1f10a33d74 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -77,7 +77,6 @@ struct vfio_iommu {
>   boolv2;
>   boolnesting;
>   booldirty_page_tracking;
> - boolpinned_page_dirty_scope;
>   boolcontainer_open;
>  };
>  



Re: [PATCH] efifb: Check efifb_pci_dev before using it

2021-04-13 Thread Alex Deucher
On Tue, Apr 13, 2021 at 1:05 PM Kai-Heng Feng
 wrote:
>
> On some platforms like Hyper-V and RPi4 with UEFI firmware, efifb is not
> a PCI device.
>
> So make sure efifb_pci_dev is found before using it.
>
> Fixes: a6c0fd3d5a8b ("efifb: Ensure graphics device for efifb stays at PCI 
> D0")
> BugLink: https://bugs.launchpad.net/bugs/1922403
> Signed-off-by: Kai-Heng Feng 

Reviewed-by: Alex Deucher 
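
For anyone reading the archive, the pattern in the patch below can be
shown with a tiny user-space sketch. Everything here is a hypothetical
stand-in (fb_pci_dev, runtime_get(), runtime_put(), fb_probe(),
fb_remove()), not the efifb or runtime PM code itself; it only illustrates
guarding an optional device pointer symmetrically on probe and remove.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types involved. */
struct device { int usage_count; };
struct pci_dev { struct device dev; };

/* Stays NULL on platforms where the framebuffer is not a PCI device. */
static struct pci_dev *fb_pci_dev;

/* Stand-ins for taking and dropping a runtime PM reference. */
static void runtime_get(struct device *dev) { dev->usage_count++; }
static void runtime_put(struct device *dev) { dev->usage_count--; }

static int fb_probe(void)
{
	/* Only touch the PCI device if one was actually found. */
	if (fb_pci_dev)
		runtime_get(&fb_pci_dev->dev);
	return 0;
}

static int fb_remove(void)
{
	/* Mirror the probe-side check so get/put stay balanced. */
	if (fb_pci_dev)
		runtime_put(&fb_pci_dev->dev);
	return 0;
}
```

Keeping the same check on both paths means the reference count stays
balanced whether or not a PCI device is present.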

> ---
>  drivers/video/fbdev/efifb.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/video/fbdev/efifb.c b/drivers/video/fbdev/efifb.c
> index f58a545b3bf3..8ea8f079cde2 100644
> --- a/drivers/video/fbdev/efifb.c
> +++ b/drivers/video/fbdev/efifb.c
> @@ -575,7 +575,8 @@ static int efifb_probe(struct platform_device *dev)
> goto err_fb_dealoc;
> }
> fb_info(info, "%s frame buffer device\n", info->fix.id);
> -   pm_runtime_get_sync(&efifb_pci_dev->dev);
> +   if (efifb_pci_dev)
> +   pm_runtime_get_sync(&efifb_pci_dev->dev);
> return 0;
>
>  err_fb_dealoc:
> @@ -602,7 +603,8 @@ static int efifb_remove(struct platform_device *pdev)
> unregister_framebuffer(info);
> sysfs_remove_groups(&pdev->dev.kobj, efifb_groups);
> framebuffer_release(info);
> -   pm_runtime_put(&efifb_pci_dev->dev);
> +   if (efifb_pci_dev)
> +   pm_runtime_put(&efifb_pci_dev->dev);
>
> return 0;
>  }
> --
> 2.30.2
>


[PATCH net-next 2/2] arm64: dts: qcom: sm8350-mtp: enable IPA

2021-04-13 Thread Alex Elder
Enable IPA for the SM8350 MTP.

Signed-off-by: Alex Elder 
---
 arch/arm64/boot/dts/qcom/sm8350-mtp.dts | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sm8350-mtp.dts b/arch/arm64/boot/dts/qcom/sm8350-mtp.dts
index 6ca638b4e3213..93740444dd1ea 100644
--- a/arch/arm64/boot/dts/qcom/sm8350-mtp.dts
+++ b/arch/arm64/boot/dts/qcom/sm8350-mtp.dts
@@ -364,3 +364,9 @@ _2_qmpphy {
vdda-phy-supply = <_l6b_1p2>;
vdda-pll-supply = <_l5b_0p88>;
 };
+
+&ipa {
+   status = "okay";
+
+   memory-region = <_ipa_fw_mem>;
+};
-- 
2.27.0



[PATCH net-next 1/2] arm64: dts: qcom: sm8350: add IPA information

2021-04-13 Thread Alex Elder
Add IPA-related nodes and definitions to "sm8350.dtsi", which uses
IPA v4.9.

Signed-off-by: Alex Elder 
---
 arch/arm64/boot/dts/qcom/sm8350.dtsi | 51 
 1 file changed, 51 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sm8350.dtsi b/arch/arm64/boot/dts/qcom/sm8350.dtsi
index ed0b51bc03ea7..2fc23f3d2c75c 100644
--- a/arch/arm64/boot/dts/qcom/sm8350.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8350.dtsi
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 / {
interrupt-parent = <>;
@@ -391,6 +392,17 @@ smp2p_modem_in: slave-kernel {
interrupt-controller;
#interrupt-cells = <2>;
};
+
+   ipa_smp2p_out: ipa-ap-to-modem {
+   qcom,entry-name = "ipa";
+   #qcom,smem-state-cells = <1>;
+   };
+
+   ipa_smp2p_in: ipa-modem-to-ap {
+   qcom,entry-name = "ipa";
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   };
};
 
smp2p-slpi {
@@ -629,6 +641,45 @@ compute_noc: interconnect@a0c{
qcom,bcm-voters = <_bcm_voter>;
};
 
+   ipa: ipa@1e4 {
+   compatible = "qcom,sm8350-ipa";
+
+   iommus = <_smmu 0x5c0 0x0>,
+<_smmu 0x5c2 0x0>;
+   reg = <0 0x1e4 0 0x8000>,
+ <0 0x1e5 0 0x4b20>,
+ <0 0x1e04000 0 0x23000>;
+   reg-names = "ipa-reg",
+   "ipa-shared",
+   "gsi";
+
+   interrupts-extended = < GIC_SPI 655 IRQ_TYPE_EDGE_RISING>,
+ < GIC_SPI 432 IRQ_TYPE_LEVEL_HIGH>,
+ <&ipa_smp2p_in 0 IRQ_TYPE_EDGE_RISING>,
+ <&ipa_smp2p_in 1 IRQ_TYPE_EDGE_RISING>;
+   interrupt-names = "ipa",
+ "gsi",
+ "ipa-clock-query",
+ "ipa-setup-ready";
+
+   clocks = < RPMH_IPA_CLK>;
+   clock-names = "core";
+
+   interconnects = <_noc MASTER_IPA _noc 
SLAVE_LLCC>,
+   <_virt MASTER_LLCC _virt 
SLAVE_EBI1>,
+   <_noc MASTER_APPSS_PROC _noc 
SLAVE_IPA_CFG>;
+   interconnect-names = "ipa_to_llcc",
+"llcc_to_ebi1",
+"appss_to_ipa";
+
+   qcom,smem-states = <&ipa_smp2p_out 0>,
+  <&ipa_smp2p_out 1>;
+   qcom,smem-state-names = "ipa-clock-enabled-valid",
+   "ipa-clock-enabled";
+
+   status = "disabled";
+   };
+
tcsr_mutex: hwlock@1f4 {
compatible = "qcom,tcsr-mutex";
reg = <0x0 0x01f4 0x0 0x4>;
-- 
2.27.0



[PATCH net-next 0/2] arm64: dts: qcom: enable SM8350

2021-04-13 Thread Alex Elder
Add IPA-related information to "sm8350.dtsi", and enable IPA for the
SM8350 MTP platform.

        -Alex

Alex Elder (2):
  arm64: dts: qcom: sm8350: add IPA information
  arm64: dts: qcom: sm8350-mtp: enable IPA

 arch/arm64/boot/dts/qcom/sm8350-mtp.dts |  6 +++
 arch/arm64/boot/dts/qcom/sm8350.dtsi| 51 +
 2 files changed, 57 insertions(+)

-- 
2.27.0



[PATCH net-next 2/2] net: ipa: add IPA v4.9 configuration data

2021-04-13 Thread Alex Elder
Add support for the SM8350 SoC, which includes IPA version 4.9.

Signed-off-by: Alex Elder 
---
 drivers/net/ipa/Makefile|   3 +-
 drivers/net/ipa/ipa_data-v4.9.c | 430 
 drivers/net/ipa/ipa_data.h  |   1 +
 drivers/net/ipa/ipa_main.c  |   4 +
 4 files changed, 437 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ipa/ipa_data-v4.9.c

diff --git a/drivers/net/ipa/Makefile b/drivers/net/ipa/Makefile
index 8c0ac87903549..1efe1a88104b3 100644
--- a/drivers/net/ipa/Makefile
+++ b/drivers/net/ipa/Makefile
@@ -10,4 +10,5 @@ ipa-y :=  ipa_main.o ipa_clock.o ipa_reg.o ipa_mem.o \
ipa_resource.o ipa_qmi.o ipa_qmi_msg.o
 
 ipa-y  +=  ipa_data-v3.5.1.o ipa_data-v4.2.o \
-   ipa_data-v4.5.o ipa_data-v4.11.o
+   ipa_data-v4.5.o ipa_data-v4.9.o \
+   ipa_data-v4.11.o
diff --git a/drivers/net/ipa/ipa_data-v4.9.c b/drivers/net/ipa/ipa_data-v4.9.c
new file mode 100644
index 0..e41be790f45e5
--- /dev/null
+++ b/drivers/net/ipa/ipa_data-v4.9.c
@@ -0,0 +1,430 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* Copyright (C) 2021 Linaro Ltd. */
+
+#include 
+
+#include "gsi.h"
+#include "ipa_data.h"
+#include "ipa_endpoint.h"
+#include "ipa_mem.h"
+
+/** enum ipa_resource_type - IPA resource types for an SoC having IPA v4.9 */
+enum ipa_resource_type {
+   /* Source resource types; first must have value 0 */
+   IPA_RESOURCE_TYPE_SRC_PKT_CONTEXTS  = 0,
+   IPA_RESOURCE_TYPE_SRC_DESCRIPTOR_LISTS,
+   IPA_RESOURCE_TYPE_SRC_DESCRIPTOR_BUFF,
+   IPA_RESOURCE_TYPE_SRC_HPS_DMARS,
+   IPA_RESOURCE_TYPE_SRC_ACK_ENTRIES,
+
+   /* Destination resource types; first must have value 0 */
+   IPA_RESOURCE_TYPE_DST_DATA_SECTORS  = 0,
+   IPA_RESOURCE_TYPE_DST_DPS_DMARS,
+};
+
+/* Resource groups used for an SoC having IPA v4.9 */
+enum ipa_rsrc_group_id {
+   /* Source resource group identifiers */
+   IPA_RSRC_GROUP_SRC_UL_DL= 0,
+   IPA_RSRC_GROUP_SRC_DMA,
+   IPA_RSRC_GROUP_SRC_UC_RX_Q,
+   IPA_RSRC_GROUP_SRC_COUNT,   /* Last in set; not a source group */
+
+   /* Destination resource group identifiers */
+   IPA_RSRC_GROUP_DST_UL_DL_DPL= 0,
+   IPA_RSRC_GROUP_DST_DMA,
+   IPA_RSRC_GROUP_DST_UC,
+   IPA_RSRC_GROUP_DST_DRB_IP,
+   IPA_RSRC_GROUP_DST_COUNT,   /* Last; not a destination group */
+};
+
+/* QSB configuration data for an SoC having IPA v4.9 */
+static const struct ipa_qsb_data ipa_qsb_data[] = {
+   [IPA_QSB_MASTER_DDR] = {
+   .max_writes = 8,
+   .max_reads  = 0,/* no limit (hardware max) */
+   .max_reads_beats= 120,
+   },
+};
+
+/* Endpoint configuration data for an SoC having IPA v4.9 */
+static const struct ipa_gsi_endpoint_data ipa_gsi_endpoint_data[] = {
+   [IPA_ENDPOINT_AP_COMMAND_TX] = {
+   .ee_id  = GSI_EE_AP,
+   .channel_id = 6,
+   .endpoint_id= 7,
+   .toward_ipa = true,
+   .channel = {
+   .tre_count  = 256,
+   .event_count= 256,
+   .tlv_count  = 20,
+   },
+   .endpoint = {
+   .config = {
+   .resource_group = IPA_RSRC_GROUP_SRC_UL_DL,
+   .dma_mode   = true,
+   .dma_endpoint   = IPA_ENDPOINT_AP_LAN_RX,
+   .tx = {
+   .seq_type = IPA_SEQ_DMA,
+   },
+   },
+   },
+   },
+   [IPA_ENDPOINT_AP_LAN_RX] = {
+   .ee_id  = GSI_EE_AP,
+   .channel_id = 7,
+   .endpoint_id= 11,
+   .toward_ipa = false,
+   .channel = {
+   .tre_count  = 256,
+   .event_count= 256,
+   .tlv_count  = 9,
+   },
+   .endpoint = {
+   .config = {
+   .resource_group = IPA_RSRC_GROUP_DST_UL_DL_DPL,
+   .aggregation= true,
+   .status_enable  = true,
+   .rx = {
+   .pad_align  = ilog2(sizeof(u32)),
+   },
+   },
+   },
+   },
+   [IPA_ENDPOINT_AP_MODEM_TX] = {
+   .ee_id  = GSI_EE_AP,
+   .channel_id = 2,
+   .endpoint_id= 2,
+   .toward_

[PATCH net-next 1/2] dt-bindings: net: qcom,ipa: add support for SM8350

2021-04-13 Thread Alex Elder
Add support for "qcom,sm8350-ipa", which uses IPA v4.9.

Use "enum" rather than "oneOf/const ..." to specify compatible
strings, as suggested by Rob Herring.

Signed-off-by: Alex Elder 
---
 Documentation/devicetree/bindings/net/qcom,ipa.yaml | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/qcom,ipa.yaml b/Documentation/devicetree/bindings/net/qcom,ipa.yaml
index 2645a02cf19bf..da5212e693e91 100644
--- a/Documentation/devicetree/bindings/net/qcom,ipa.yaml
+++ b/Documentation/devicetree/bindings/net/qcom,ipa.yaml
@@ -43,11 +43,12 @@ description:
 
 properties:
   compatible:
-oneOf:
-  - const: "qcom,sc7180-ipa"
-  - const: "qcom,sc7280-ipa"
-  - const: "qcom,sdm845-ipa"
-  - const: "qcom,sdx55-ipa"
+enum:
+  - qcom,sc7180-ipa
+  - qcom,sc7280-ipa
+  - qcom,sdm845-ipa
+  - qcom,sdx55-ipa
+  - qcom,sm8350-ipa
 
   reg:
 items:
-- 
2.27.0



[PATCH net-next 0/2] net: ipa: add support for the SM8350 SoC

2021-04-13 Thread Alex Elder
This small series adds IPA driver support for the Qualcomm SM8350
SoC, which implements IPA v4.9.

The first patch updates the DT binding, and depends on a previous
patch that has already been accepted into net-next.

The second just defines the IPA v4.9 configuration data file.

(Device Tree files to support this SoC will be sent separately and
will go through the Qualcomm tree.)

-Alex

Alex Elder (2):
  dt-bindings: net: qcom,ipa: add support for SM8350
  net: ipa: add IPA v4.9 configuration data

 .../devicetree/bindings/net/qcom,ipa.yaml |  11 +-
 drivers/net/ipa/Makefile  |   3 +-
 drivers/net/ipa/ipa_data-v4.9.c   | 430 ++
 drivers/net/ipa/ipa_data.h|   1 +
 drivers/net/ipa/ipa_main.c|   4 +
 5 files changed, 443 insertions(+), 6 deletions(-)
 create mode 100644 drivers/net/ipa/ipa_data-v4.9.c

-- 
2.27.0


Re: [PATCH v8] RISC-V: enable XIP

2021-04-13 Thread Alex Ghiti

On 4/13/21 at 2:35 AM, Alexandre Ghiti wrote:

From: Vitaly Wool 

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage. The physical flash address used
to link the kernel object files and for storing it has to be known
at compile time and is represented by a Kconfig option.

XIP on RISC-V will for the time being only work on MMU-enabled
kernels.

Signed-off-by: Alexandre Ghiti  [ Rebase on top of "Move
kernel mapping outside the linear mapping" ]
Signed-off-by: Vitaly Wool 
---


I forgot the changes history:

Changes in v2:
- dedicated macro for XIP address fixup when MMU is not enabled yet
  o both for 32-bit and 64-bit RISC-V
- SP is explicitly set to a safe place in RAM before __copy_data call
- removed redundant alignment requirements in vmlinux-xip.lds.S
- changed long -> uintptr_t typecast in __XIP_FIXUP macro.
Changes in v3:
- rebased against latest for-next
- XIP address fixup macro now takes an argument
- SMP related fixes
Changes in v4:
- rebased against the current for-next
- less #ifdef's in C/ASM code
- dedicated XIP_FIXUP_OFFSET assembler macro in head.S
- C-specific definitions moved into #ifndef __ASSEMBLY__
- Fixed multi-core boot
Changes in v5:
- fixed build error for non-XIP kernels
Changes in v6:
- XIP_PHYS_RAM_BASE config option renamed to PHYS_RAM_BASE
- added PHYS_RAM_BASE_FIXED config flag to allow usage of
  PHYS_RAM_BASE in non-XIP configurations if needed
- XIP_FIXUP macro rewritten with a temporary variable to avoid side
  effects
- fixed crash for non-XIP kernels that don't use built-in DTB
Changes in v7:
- Fix pfn_base that required FIXUP
- Fix copy_data which lacked + 1 in size to copy
- Fix pfn_valid for FLATMEM
- Rebased on top of "Move kernel mapping outside the linear mapping":
  this is the biggest change and affected mm/init.c,
  kernel/vmlinux-xip.lds.S and include/asm/pgtable.h: XIP kernel is now
  mapped like 'normal' kernel at the end of the address space.
Changes in v8:
- XIP_KERNEL now depends on SPARSEMEM
- FLATMEM related: pfn_valid and pfn_base removal
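
As an aside on the v6 note about rewriting the fixup macro with a
temporary variable: the sketch below shows why that matters.  It is not
the macro from the patch.  The addresses and all DEMO_*/demo_* names are
made up for illustration, and the ({ ... }) form relies on the GNU C
statement-expression extension.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical addresses, chosen only for this illustration. */
#define DEMO_XIP_FLASH_BASE	((uintptr_t)0x20000000)
#define DEMO_XIP_FLASH_SIZE	((uintptr_t)0x01000000)
#define DEMO_PHYS_RAM_BASE	((uintptr_t)0x80000000)

static_assert(DEMO_XIP_FLASH_BASE + DEMO_XIP_FLASH_SIZE <= DEMO_PHYS_RAM_BASE,
	      "demo flash window must sit below demo RAM");

/* Naive form: the argument is evaluated up to three times. */
#define DEMO_FIXUP_NAIVE(addr)						\
	(((uintptr_t)(addr) >= DEMO_XIP_FLASH_BASE &&			\
	  (uintptr_t)(addr) < DEMO_XIP_FLASH_BASE + DEMO_XIP_FLASH_SIZE) ? \
	 (uintptr_t)(addr) - DEMO_XIP_FLASH_BASE + DEMO_PHYS_RAM_BASE :	\
	 (uintptr_t)(addr))

/* Temporary-variable form: the argument is evaluated exactly once. */
#define DEMO_FIXUP(addr) ({						\
	uintptr_t demo_a = (uintptr_t)(addr);				\
	(demo_a >= DEMO_XIP_FLASH_BASE &&				\
	 demo_a < DEMO_XIP_FLASH_BASE + DEMO_XIP_FLASH_SIZE) ?		\
		demo_a - DEMO_XIP_FLASH_BASE + DEMO_PHYS_RAM_BASE :	\
		demo_a;							\
})

/* Counts calls so that repeated evaluation becomes visible. */
static unsigned int demo_calls;

static uintptr_t demo_next_addr(void)
{
	demo_calls++;
	return DEMO_XIP_FLASH_BASE + 0x100;
}
```

Passing demo_next_addr() to DEMO_FIXUP() bumps demo_calls once, while
DEMO_FIXUP_NAIVE() bumps it three times for an address inside the flash
window.  Addresses outside the window are returned unchanged by both.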


  arch/riscv/Kconfig  |  55 +++-
  arch/riscv/Makefile |   8 +-
  arch/riscv/boot/Makefile|  13 +++
  arch/riscv/include/asm/page.h   |  21 +
  arch/riscv/include/asm/pgtable.h|  25 +-
  arch/riscv/kernel/head.S|  46 +-
  arch/riscv/kernel/head.h|   3 +
  arch/riscv/kernel/setup.c   |  10 ++-
  arch/riscv/kernel/vmlinux-xip.lds.S | 133 
  arch/riscv/kernel/vmlinux.lds.S |   6 ++
  arch/riscv/mm/init.c| 115 ++--
  11 files changed, 418 insertions(+), 17 deletions(-)
  create mode 100644 arch/riscv/kernel/vmlinux-xip.lds.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8ea60a0a19ae..7c7efdd67a10 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -28,7 +28,7 @@ config RISCV
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_SET_DIRECT_MAP
select ARCH_HAS_SET_MEMORY
-   select ARCH_HAS_STRICT_KERNEL_RWX if MMU
+   select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
@@ -441,7 +441,7 @@ config EFI_STUB
  
  config EFI

bool "UEFI runtime support"
-   depends on OF
+   depends on OF && !XIP_KERNEL
select LIBFDT
select UCS2_STRING
select EFI_PARAMS_FROM_FDT
@@ -465,11 +465,60 @@ config STACKPROTECTOR_PER_TASK
def_bool y
depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_TLS
  
+config PHYS_RAM_BASE_FIXED

+   bool "Explicitly specified physical RAM address"
+   default n
+
+config PHYS_RAM_BASE
+   hex "Platform Physical RAM address"
+   depends on PHYS_RAM_BASE_FIXED
+   default "0x8000"
+   help
+ This is the physical address of RAM in the system. It has to be
+ explicitly specified to run early relocations of read-write data
+ from flash to RAM.
+
+config XIP_KERNEL
+   bool "Kernel Execute-In-Place from ROM"
+   depends on MMU && SPARSEMEM
+   select PHYS_RAM_BASE_FIXED
+   help
+ Execute-In-Place allows the kernel to run from non-volatile storage
+ directly addressable by the CPU, such as NOR flash. This saves RAM
+ space since the text section of the kernel is not loaded from flash
+ to RAM.  Read-write sections, such as the data section and stack,
+ are still copied to 

Re: [PATCH v2 2/2] drivers: gpio: add virtio-gpio guest driver

2021-04-13 Thread Alex Bennée


"Enrico Weigelt, metux IT consult"  writes:

> On 04.12.20 04:35, Jason Wang wrote:
>
> Hi,
>
>> Is the plan to keep this doc synced with the one in the virtio
>> specification?
>
> Yes, of course. I'm still in progress of doing the bureaucratic stuff w/
> virtio-tc folks (ID registration, ...) - yet have to see whether they
> wanna add it to their spec documents ...
>
> BTW: if you feel something's not good w/ the current spec, please raise
> your voice now.
>
>> I think it's better to use u8 or uint8_t here. Git grep told me the
>> former is more popular under Documentation/.
>
> thx, I'll fix that
>
>>> +- for version field currently only value 1 supported.
>>> +- the line names block holds a stream of zero-terminated strings,
>>> +  holding the individual line names.
>> 
>> I'm not sure but does this mean we don't have a fixed length of config
>> space? Need to check whether it can bring any trouble to
>> migration(compatibility).
>
> Yes, it depends on how many gpio lines are present and how much space
> their names take up.
>
> A fixed size would either put unpleasant limits on the max number of
> lines or waste a lot of space when only a few lines are present.
>
> Note that virtio-gpio is also meant for small embedded workloads running
> under some hypervisor.
>
>>> +- unspecified fields are reserved for future use and should be zero.
>>> +
>>> +
>>> +Virtqueues and messages:
>>> +
>>> +
>>> +- Queue #0: transmission from host to guest
>>> +- Queue #1: transmission from guest to host
>> 
>> 
>> Virtio has become more popular in areas without virtualization. So I
>> think it's better to use "device/driver" instead of "host/guest" here.
>
> Good point. But I'd prefer "cpu" instead of "driver" in that case.

I think you are going to tie yourself up in knots if you don't move this
to the OASIS spec. The reason being the VirtIO spec has definitions for
what a "Device" and a "Driver" is that are clear and unambiguous. The
upstream spec should be considered the canonical source of truth for any
implementation (Linux or otherwise).

By all means have the distilled documentation for the driver in the
kernel source tree but trying to upstream an implementation before
starting the definition in the standard is a little back to front IMHO*.

* that's not to say these things can't be done in parallel as the spec
  is reviewed and worked on and the kinks worked out but you want the
  final order of upstreaming to start with the spec.

-- 
Alex Bennée
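
For readers following the config-space discussion quoted above: a
"stream of zero-terminated strings" is variable-length, so a consumer
has to walk it.  Below is a minimal sketch of such a walk in plain C.
The function name and layout assumptions are hypothetical; this is not
taken from the proposed driver or from the VirtIO specification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Walk a block of back-to-back NUL-terminated strings and return the
 * string at position "index", or NULL if there is no such string or
 * the block ends before a terminator is found.
 */
static const char *demo_line_name(const char *block, size_t block_len,
				  size_t index)
{
	size_t off = 0;

	assert(block != NULL || block_len == 0);

	while (off < block_len) {
		const char *name = block + off;
		const char *end = memchr(name, '\0', block_len - off);

		if (!end)
			return NULL;	/* trailing string is unterminated */
		if (index == 0)
			return name;
		index--;
		off += (size_t)(end - name) + 1;
	}

	return NULL;
}
```

An empty name (two terminators in a row) is returned as "", which is
one way to represent an unnamed line.  Bounding every search by
block_len keeps a malformed block from being read past its end.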


Re: Regression: gvt: vgpu 1: MI_LOAD_REGISTER_MEM handler error

2021-04-12 Thread Alex Williamson
On Mon, 12 Apr 2021 10:32:14 -0600
Alex Williamson  wrote:

> Running a Windows guest on a i915-GVTg_V4_2 from an HD 5500 IGD on
> v5.12-rc6 results in host logs:
> 
> gvt: vgpu 1: lrm access to register (20c0)
> gvt: vgpu 1: MI_LOAD_REGISTER_MEM handler error
> gvt: vgpu 1: cmd parser error
> 0x0 
> 0x29 
> 
> gvt: vgpu 1: scan wa ctx error
> gvt: vgpu 1: failed to submit desc 0
> gvt: vgpu 1: fail submit workload on ring rcs0
> gvt: vgpu 1: fail to emulate MMIO write 2230 len 4
> 
> The guest goes into a boot loop triggering this error before reaching
> the desktop and rebooting.  Guest using Intel driver 20.19.15.5171
> dated 11/4/2020 (from driver file 15.40.5171).
> 
> This VM works well with the same guest and userspace software stack on
> Fedora's kernel 5.11.11-200.fc33.x86_64.  Thanks,

Bisected to:

commit f18d417a57438498e0de481d3a0bc900c2b0e057
Author: Yan Zhao 
Date:   Wed Dec 23 11:45:08 2020 +0800

drm/i915/gvt: filter cmds "srm" and "lrm" in cmd_handler

do not allow "srm" and "lrm" except for GEN8_L3SQCREG4 and 0x21f0.

Cc: Colin Xu 
Cc: Kevin Tian 
Signed-off-by: Yan Zhao 
Signed-off-by: Zhenyu Wang 
Link: 
http://patchwork.freedesktop.org/patch/msgid/20201223034508.17031-1-yan.y.z...@intel.com
Reviewed-by: Zhenyu Wang 



Regression: gvt: vgpu 1: MI_LOAD_REGISTER_MEM handler error

2021-04-12 Thread Alex Williamson


Running a Windows guest on a i915-GVTg_V4_2 from an HD 5500 IGD on
v5.12-rc6 results in host logs:

gvt: vgpu 1: lrm access to register (20c0)
gvt: vgpu 1: MI_LOAD_REGISTER_MEM handler error
gvt: vgpu 1: cmd parser error
0x0 
0x29 

gvt: vgpu 1: scan wa ctx error
gvt: vgpu 1: failed to submit desc 0
gvt: vgpu 1: fail submit workload on ring rcs0
gvt: vgpu 1: fail to emulate MMIO write 2230 len 4

The guest goes into a boot loop triggering this error before reaching
the desktop and rebooting.  Guest using Intel driver 20.19.15.5171
dated 11/4/2020 (from driver file 15.40.5171).

This VM works well with the same guest and userspace software stack on
Fedora's kernel 5.11.11-200.fc33.x86_64.  Thanks,

Alex



Re: [PATCH 1/2] vfio/pci: remove vfio_pci_nvlink2

2021-04-12 Thread Alex Williamson
On Mon, 12 Apr 2021 19:41:41 +1000
Michael Ellerman  wrote:

> Alex Williamson  writes:
> > On Fri, 26 Mar 2021 07:13:10 +0100
> > Christoph Hellwig  wrote:
> >  
> >> This driver never had any open userspace (which for VFIO would include
> >> VM kernel drivers) that use it, and thus should never have been added
> >> by our normal userspace ABI rules.
> >> 
> >> Signed-off-by: Christoph Hellwig 
> >> Acked-by: Greg Kroah-Hartman 
> >> ---
> >>  drivers/vfio/pci/Kconfig|   6 -
> >>  drivers/vfio/pci/Makefile   |   1 -
> >>  drivers/vfio/pci/vfio_pci.c |  18 -
> >>  drivers/vfio/pci/vfio_pci_nvlink2.c | 490 
> >>  drivers/vfio/pci/vfio_pci_private.h |  14 -
> >>  include/uapi/linux/vfio.h   |  38 +--
> >>  6 files changed, 4 insertions(+), 563 deletions(-)
> >>  delete mode 100644 drivers/vfio/pci/vfio_pci_nvlink2.c  
> >
> > Hearing no objections, applied to vfio next branch for v5.13.  Thanks,  
> 
> Looks like you only took patch 1?
> 
> I can't take patch 2 on its own, that would break the build.
> 
> Do you want to take both patches? There's currently no conflicts against
> my tree. It's possible one could appear before the v5.13 merge window,
> though it would probably just be something minor.
> 
> Or I could apply both patches to my tree, which means patch 1 would
> appear as two commits in the git history, but that's not a big deal.

I've already got a conflict in my next branch with patch 1, so it's
best to go through my tree.  Seems like a shared branch would be
easiest to allow you to merge and manage potential conflicts against
patch 2, I've pushed a branch here:

https://github.com/awilliam/linux-vfio.git v5.13/vfio/nvlink

Thanks,
Alex



Re: [PATCH net-next 4/7] net: ipa: ipa_stop() does not return an error

2021-04-12 Thread Alex Elder

On 4/12/21 2:26 AM, Leon Romanovsky wrote:
> On Sun, Apr 11, 2021 at 08:42:15AM -0500, Alex Elder wrote:
>> On 4/11/21 8:28 AM, Leon Romanovsky wrote:
>>>> I think *not* checking an available return value is questionable
>>>> practice.  I'd really rather have a build option for a
>>>> "__need_not_check" tag and have "must_check" be the default.
>>> __need_not_check == void ???
>>
>> I'm not sure I understand your statement here, but...
>
> We are talking about the same thing. My point was that __need_not_check
> is actually void. The API author was supposed to declare that by
> declaring that function doesn't return anything.

No, we are not.

Functions like strcpy() return a value, but that value is almost
never checked.  The returned value isn't an error, so there is
no real need to check that return value.  This is the kind of
thing I'm talking about that might be tagged __need_not_check.

A function that returns a value for no reason should be void,
I agree with that.

In the ipa_stop() case, the value *must* be returned because
it serves as an ->ndo_stop() function and has to adhere to
that function prototype.  The point of the current patch
was to simplify the code (defined privately in the current
source file), given knowledge that it never returns an error.
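
A minimal sketch of that situation, with hypothetical demo_* names
rather than the real net_device_ops or IPA code: the callback has to
return int because the ops table says so, yet it cannot fail, so a
private caller has nothing to handle.

```c
#include <assert.h>
#include <stddef.h>

struct demo_netdev;

/* Hypothetical ops table; the "stop" slot is required to return int. */
struct demo_netdev_ops {
	int (*stop)(struct demo_netdev *dev);
};

struct demo_netdev {
	const struct demo_netdev_ops *ops;
	int running;
};

/* Matches the prototype the ops table requires, but never fails. */
static int demo_stop(struct demo_netdev *dev)
{
	assert(dev != NULL);
	dev->running = 0;
	return 0;
}

static const struct demo_netdev_ops demo_ops = {
	.stop = demo_stop,
};

/* Private caller that relies on demo_stop() always returning 0. */
static void demo_modem_stop(struct demo_netdev *dev)
{
	if (dev)
		demo_stop(dev);	/* no error path needed here */
}
```

Callers going through dev->ops->stop() still see an int and may check
it; only the caller in the same source file takes the shortcut.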

The compiler could ensure all calls to functions that return
a value actually check the return value.  And because I think
that's the best practice, I'd like to be able to run such a
check in my code.  But there are always exceptions, and that
would be the purpose of a __need_not_check tag.

I don't think this is worthy of any more discussion.

    -Alex


Re: [PATCH v7] RISC-V: enable XIP

2021-04-11 Thread Alex Ghiti

On 4/9/21 at 10:42 AM, Vitaly Wool wrote:

On Fri, Apr 9, 2021 at 3:59 PM Mike Rapoport  wrote:


On Fri, Apr 09, 2021 at 02:46:17PM +0200, David Hildenbrand wrote:

Also, will that memory properly be exposed in the resource tree as
System RAM (e.g., /proc/iomem) ? Otherwise some things (/proc/kcore)
won't work as expected - the kernel won't be included in a dump.

Do we really need an XIP kernel to be included in kdump?
And doesn't it sound weird to expose flash as System RAM in /proc/iomem? ;-)


See my other mail, maybe we actually want something different.




I have just checked and it does not appear in /proc/iomem.

Ok your conclusion would be to have struct page, I'm going to implement this
version then using memblock as you described.


I'm not sure this is required. With XIP kernel text never gets into RAM, so
it does not seem to require struct page.

XIP by definition has some limitations relatively to "normal" operation,
so lack of kdump could be one of them.


I agree.



I might be wrong, but IMHO, artificially creating a memory map for part of
flash would cause more problems in the long run.


Can you elaborate?


Nothing particular, just a gut feeling. Usually, when you force something
it comes out the wrong way later.


It's possible still that MTD_XIP is implemented allowing to write to
the flash used for XIP. While flash is being written, memory map
doesn't make sense at all. I can't come up with a real life example
when it can actually lead to problems but it is indeed weird when
System RAM suddenly becomes unreadable. I really don't think exposing
it in /proc/iomem is a good idea.


BTW, how does XIP account the kernel text on other architectures that
implement it?


Interesting point, I thought XIP would be something new on RISC-V (well, at
least to me :) ). If that concept exists already, we better mimic what
existing implementations do.


I had quick glance at ARM, it seems that kernel text does not have memory
map and does not show up in System RAM.


Exactly, and I believe ARM64 won't do that too when it gets its own
XIP support (which is underway).




memmap does not seem necessary and ARM/ARM64 do not use it.

But if someone tries to get a struct page from a physical address that
lies in flash, as mentioned by David, that could lead to silent
corruption if something exists at the address where the struct page
should be. And it is hard to know which features in the kernel depend
on that.


Regarding SPARSEMEM, the vmemmap lies in its own region so that's 
unlikely to happen, so we will catch those invalid accesses (and that's 
what I observed on riscv).


But for FLATMEM, memmap is in the linear mapping, then that could very 
likely happen silently.


Could a simple solution be to force SPARSEMEM for those XIP kernels ? 
Then wrong things could happen, but we would see those and avoid 
spending hours to debug :)


I will at least send a v8 to remove the pfn_valid modifications for 
FLATMEM that now returns true to pfn in flash.


Thanks,




Best regards,
Vitaly

___
linux-riscv mailing list
linux-ri...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv



Re: [PATCH net-next 4/7] net: ipa: ipa_stop() does not return an error

2021-04-11 Thread Alex Elder
On 4/11/21 8:28 AM, Leon Romanovsky wrote:
>> I think *not* checking an available return value is questionable
>> practice.  I'd really rather have a build option for a
>> "__need_not_check" tag and have "must_check" be the default.
> __need_not_check == void ???

I'm not sure I understand your statement here, but...

My point is, I'd rather have things like printk() and
strscpy() be marked with (an imaginary) __need_not_check,
than the way things are, with only certain functions being
marked __must_check.

In my view, if a function returns a value, all callers
of that function ought to be checking it.  If the return
value is not necessary it should be a void function if
possible.

I don't expect the world to change, but I just think the
default should be "must check" rather than "check optional".

-Alex
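
To make the "must check" versus "check optional" distinction concrete,
here is a small userspace sketch.  The kernel's __must_check maps to
the compiler's warn_unused_result attribute; demo_must_check and the
two functions below are hypothetical stand-ins, and "__need_not_check"
remains imaginary.  Whether a (void) cast silences the warning differs
between compilers, so the sketch does not rely on it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's __must_check annotation. */
#define demo_must_check __attribute__((warn_unused_result))

/* Returns an error code, so callers are expected to look at it. */
static demo_must_check int demo_parse_digit(char c, int *out)
{
	assert(out != NULL);

	if (c < '0' || c > '9')
		return -1;
	*out = c - '0';
	return 0;
}

/* Returns a value that is not an error, like strcpy(); rarely used. */
static char *demo_copy(char *dst, const char *src)
{
	return strcpy(dst, src);
}
```

Calling demo_parse_digit() as a bare statement draws a compiler
warning, while ignoring the result of demo_copy() does not.  That is
today's opt-in arrangement; the proposal above would invert the
default.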


Re: [PATCH net-next 4/7] net: ipa: ipa_stop() does not return an error

2021-04-11 Thread Alex Elder
On 4/11/21 1:34 AM, Leon Romanovsky wrote:
> On Fri, Apr 09, 2021 at 01:07:19PM -0500, Alex Elder wrote:
>> In ipa_modem_stop(), if the modem netdev pointer is non-null we call
>> ipa_stop().  We check for an error and if one is returned we handle
>> it.  But ipa_stop() never returns an error, so this extra handling
>> is unnecessary.  Simplify the code in ipa_modem_stop() based on the
>> knowledge no error handling is needed at this spot.
>>
>> Signed-off-by: Alex Elder 
>> ---
>>  drivers/net/ipa/ipa_modem.c | 18 --
>>  1 file changed, 4 insertions(+), 14 deletions(-)
> 
> <...>
> 
>> +/* Stop the queue and disable the endpoints if it's open */
>>  if (netdev) {
>> -/* Stop the queue and disable the endpoints if it's open */
>> -ret = ipa_stop(netdev);
>> -if (ret)
>> -goto out_set_state;
>> -
>> +(void)ipa_stop(netdev);
> 
> This void cast is not needed here, and in the more general case it can
> sometimes even be seen as a mistake, for example if the function is
> declared as __must_check.

I accept your point but I feel like it's sort of a 50/50 thing.

I think *not* checking an available return value is questionable
practice.  I'd really rather have a build option for a
"__need_not_check" tag and have "must_check" be the default.

The void cast here says "I know this returns a result, but I am
intentionally not checking it."  If it had been __must_check I
would certainly have checked it.  

That being said, I don't really care that much, so I'll plan
to post version 2, which will drop this cast (I'll probably
add a comment though).

Thanks.

-Alex

> 
> Thanks
> 



[PATCH net-next 4/4] net: ipa: add IPA v4.11 configuration data

2021-04-09 Thread Alex Elder
Add support for the SC7280 SoC, which includes IPA version 4.11.

Signed-off-by: Alex Elder 
---
 drivers/net/ipa/Makefile |   2 +-
 drivers/net/ipa/ipa_data-v4.11.c | 382 +++
 drivers/net/ipa/ipa_data.h   |   1 +
 drivers/net/ipa/ipa_main.c   |   4 +
 4 files changed, 388 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ipa/ipa_data-v4.11.c

diff --git a/drivers/net/ipa/Makefile b/drivers/net/ipa/Makefile
index ccc4924881ac4..8c0ac87903549 100644
--- a/drivers/net/ipa/Makefile
+++ b/drivers/net/ipa/Makefile
@@ -10,4 +10,4 @@ ipa-y :=  ipa_main.o ipa_clock.o ipa_reg.o ipa_mem.o \
ipa_resource.o ipa_qmi.o ipa_qmi_msg.o
 
 ipa-y  +=  ipa_data-v3.5.1.o ipa_data-v4.2.o \
-   ipa_data-v4.5.o
+   ipa_data-v4.5.o ipa_data-v4.11.o
diff --git a/drivers/net/ipa/ipa_data-v4.11.c b/drivers/net/ipa/ipa_data-v4.11.c
new file mode 100644
index 0..05806ceae8b54
--- /dev/null
+++ b/drivers/net/ipa/ipa_data-v4.11.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* Copyright (C) 2021 Linaro Ltd. */
+
+#include 
+
+#include "gsi.h"
+#include "ipa_data.h"
+#include "ipa_endpoint.h"
+#include "ipa_mem.h"
+
+/** enum ipa_resource_type - IPA resource types for an SoC having IPA v4.11 */
+enum ipa_resource_type {
+   /* Source resource types; first must have value 0 */
+   IPA_RESOURCE_TYPE_SRC_PKT_CONTEXTS  = 0,
+   IPA_RESOURCE_TYPE_SRC_DESCRIPTOR_LISTS,
+   IPA_RESOURCE_TYPE_SRC_DESCRIPTOR_BUFF,
+   IPA_RESOURCE_TYPE_SRC_HPS_DMARS,
+   IPA_RESOURCE_TYPE_SRC_ACK_ENTRIES,
+
+   /* Destination resource types; first must have value 0 */
+   IPA_RESOURCE_TYPE_DST_DATA_SECTORS  = 0,
+   IPA_RESOURCE_TYPE_DST_DPS_DMARS,
+};
+
+/* Resource groups used for an SoC having IPA v4.11 */
+enum ipa_rsrc_group_id {
+   /* Source resource group identifiers */
+   IPA_RSRC_GROUP_SRC_UL_DL= 0,
+   IPA_RSRC_GROUP_SRC_UC_RX_Q,
+   IPA_RSRC_GROUP_SRC_UNUSED_2,
+   IPA_RSRC_GROUP_SRC_COUNT,   /* Last in set; not a source group */
+
+   /* Destination resource group identifiers */
+   IPA_RSRC_GROUP_DST_UL_DL_DPL= 0,
+   IPA_RSRC_GROUP_DST_UNUSED_1,
+   IPA_RSRC_GROUP_DST_DRB_IP,
+   IPA_RSRC_GROUP_DST_COUNT,   /* Last; not a destination group */
+};
+
+/* QSB configuration data for an SoC having IPA v4.11 */
+static const struct ipa_qsb_data ipa_qsb_data[] = {
+   [IPA_QSB_MASTER_DDR] = {
+   .max_writes = 12,
+   .max_reads  = 13,
+   .max_reads_beats= 120,
+   },
+};
+
+/* Endpoint configuration data for an SoC having IPA v4.11 */
+static const struct ipa_gsi_endpoint_data ipa_gsi_endpoint_data[] = {
+   [IPA_ENDPOINT_AP_COMMAND_TX] = {
+   .ee_id  = GSI_EE_AP,
+   .channel_id = 5,
+   .endpoint_id= 7,
+   .toward_ipa = true,
+   .channel = {
+   .tre_count  = 256,
+   .event_count= 256,
+   .tlv_count  = 20,
+   },
+   .endpoint = {
+   .config = {
+   .resource_group = IPA_RSRC_GROUP_SRC_UL_DL,
+   .dma_mode   = true,
+   .dma_endpoint   = IPA_ENDPOINT_AP_LAN_RX,
+   .tx = {
+   .seq_type = IPA_SEQ_DMA,
+   },
+   },
+   },
+   },
+   [IPA_ENDPOINT_AP_LAN_RX] = {
+   .ee_id  = GSI_EE_AP,
+   .channel_id = 14,
+   .endpoint_id= 9,
+   .toward_ipa = false,
+   .channel = {
+   .tre_count  = 256,
+   .event_count= 256,
+   .tlv_count  = 9,
+   },
+   .endpoint = {
+   .config = {
+   .resource_group = IPA_RSRC_GROUP_DST_UL_DL_DPL,
+   .aggregation= true,
+   .status_enable  = true,
+   .rx = {
+   .pad_align  = ilog2(sizeof(u32)),
+   },
+   },
+   },
+   },
+   [IPA_ENDPOINT_AP_MODEM_TX] = {
+   .ee_id  = GSI_EE_AP,
+   .channel_id = 2,
+   .endpoint_id= 2,
+   .toward_ipa = true,
+   .channel = {
+   .tre_count  = 512,
+

[PATCH net-next 2/4] net: ipa: disable checksum offload for IPA v4.5+

2021-04-09 Thread Alex Elder
Checksum offload for IPA v4.5+ is implemented differently, using
"inline" offload (which uses a common header format for both upload
and download offload).

The IPA hardware must be programmed to enable MAP checksum offload,
but the RMNet driver is responsible for interpreting checksum
metadata supplied with messages.

Currently, the RMNet driver does not support inline checksum offload.
This support is imminent, but until it is available, do not allow
newer versions of IPA to specify checksum offload for endpoints.

Signed-off-by: Alex Elder 
---
 drivers/net/ipa/ipa_endpoint.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/net/ipa/ipa_endpoint.c b/drivers/net/ipa/ipa_endpoint.c
index dd24179383c1c..5d8b8c68438a5 100644
--- a/drivers/net/ipa/ipa_endpoint.c
+++ b/drivers/net/ipa/ipa_endpoint.c
@@ -88,6 +88,11 @@ static bool ipa_endpoint_data_valid_one(struct ipa *ipa, u32 count,
if (ipa_gsi_endpoint_data_empty(data))
return true;
 
+   /* IPA v4.5+ uses checksum offload, not yet supported by RMNet */
+   if (ipa->version >= IPA_VERSION_4_5)
+   if (data->endpoint.config.checksum)
+   return false;
+
if (!data->toward_ipa) {
if (data->endpoint.filter_support) {
dev_err(dev, "filtering not supported for "
@@ -230,6 +235,17 @@ static bool ipa_endpoint_data_valid(struct ipa *ipa, u32 count,
 static bool ipa_endpoint_data_valid(struct ipa *ipa, u32 count,
const struct ipa_gsi_endpoint_data *data)
 {
+   const struct ipa_gsi_endpoint_data *dp = data;
+   enum ipa_endpoint_name name;
+
+   if (ipa->version < IPA_VERSION_4_5)
+   return true;
+
+   /* IPA v4.5+ uses checksum offload, not yet supported by RMNet */
+   for (name = 0; name < count; name++, dp++)
+   if (dp->endpoint.config.checksum)
+   return false;
+
return true;
 }
 
-- 
2.27.0



[PATCH net-next 3/4] net: ipa: add IPA v4.5 configuration data

2021-04-09 Thread Alex Elder
Add support for the SDX55 SoC, which includes IPA version 4.5.

Starting with IPA v4.5, a few of the memory regions have a different
number of "canary" values; update comments where the region
identifiers are defined to accurately reflect that.

I'll note three differences in SDX55 versus the other two existing
platforms (SDM845 and SC7180):
  - SDX55 uses a 32-bit Linux kernel
  - SDX55 has four interconnects rather than three
  - SDX55 uses IPA v4.5, which uses inline checksum offload

Signed-off-by: Alex Elder 
---
 drivers/net/ipa/Makefile|   3 +-
 drivers/net/ipa/ipa_data-v4.5.c | 437 
 drivers/net/ipa/ipa_data.h  |   1 +
 drivers/net/ipa/ipa_main.c  |   4 +
 drivers/net/ipa/ipa_mem.h   |   6 +-
 5 files changed, 447 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/ipa/ipa_data-v4.5.c

diff --git a/drivers/net/ipa/Makefile b/drivers/net/ipa/Makefile
index 6abd1db9fe330..ccc4924881ac4 100644
--- a/drivers/net/ipa/Makefile
+++ b/drivers/net/ipa/Makefile
@@ -9,4 +9,5 @@ ipa-y   :=  ipa_main.o ipa_clock.o ipa_reg.o ipa_mem.o \
ipa_endpoint.o ipa_cmd.o ipa_modem.o \
ipa_resource.o ipa_qmi.o ipa_qmi_msg.o
 
-ipa-y  +=  ipa_data-v3.5.1.o ipa_data-v4.2.o
+ipa-y  +=  ipa_data-v3.5.1.o ipa_data-v4.2.o \
+   ipa_data-v4.5.o
diff --git a/drivers/net/ipa/ipa_data-v4.5.c b/drivers/net/ipa/ipa_data-v4.5.c
new file mode 100644
index 0..5f67a3a909ee0
--- /dev/null
+++ b/drivers/net/ipa/ipa_data-v4.5.c
@@ -0,0 +1,437 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* Copyright (C) 2021 Linaro Ltd. */
+
+#include 
+
+#include "gsi.h"
+#include "ipa_data.h"
+#include "ipa_endpoint.h"
+#include "ipa_mem.h"
+
+/** enum ipa_resource_type - IPA resource types for an SoC having IPA v4.5 */
+enum ipa_resource_type {
+   /* Source resource types; first must have value 0 */
+   IPA_RESOURCE_TYPE_SRC_PKT_CONTEXTS  = 0,
+   IPA_RESOURCE_TYPE_SRC_DESCRIPTOR_LISTS,
+   IPA_RESOURCE_TYPE_SRC_DESCRIPTOR_BUFF,
+   IPA_RESOURCE_TYPE_SRC_HPS_DMARS,
+   IPA_RESOURCE_TYPE_SRC_ACK_ENTRIES,
+
+   /* Destination resource types; first must have value 0 */
+   IPA_RESOURCE_TYPE_DST_DATA_SECTORS  = 0,
+   IPA_RESOURCE_TYPE_DST_DPS_DMARS,
+};
+
+/* Resource groups used for an SoC having IPA v4.5 */
+enum ipa_rsrc_group_id {
+   /* Source resource group identifiers */
+   IPA_RSRC_GROUP_SRC_UNUSED_0 = 0,
+   IPA_RSRC_GROUP_SRC_UL_DL,
+   IPA_RSRC_GROUP_SRC_UNUSED_2,
+   IPA_RSRC_GROUP_SRC_UNUSED_3,
+   IPA_RSRC_GROUP_SRC_UC_RX_Q,
+   IPA_RSRC_GROUP_SRC_COUNT,   /* Last in set; not a source group */
+
+   /* Destination resource group identifiers */
+   IPA_RSRC_GROUP_DST_UNUSED_0 = 0,
+   IPA_RSRC_GROUP_DST_UL_DL_DPL,
+   IPA_RSRC_GROUP_DST_UNUSED_2,
+   IPA_RSRC_GROUP_DST_UNUSED_3,
+   IPA_RSRC_GROUP_DST_UC,
+   IPA_RSRC_GROUP_DST_COUNT,   /* Last; not a destination group */
+};
+
+/* QSB configuration data for an SoC having IPA v4.5 */
+static const struct ipa_qsb_data ipa_qsb_data[] = {
+   [IPA_QSB_MASTER_DDR] = {
+   .max_writes = 8,
+   .max_reads  = 0,/* no limit (hardware max) */
+   .max_reads_beats= 120,
+   },
+   [IPA_QSB_MASTER_PCIE] = {
+   .max_writes = 8,
+   .max_reads  = 12,
+   /* no outstanding read byte (beat) limit */
+   },
+};
+
+/* Endpoint configuration data for an SoC having IPA v4.5 */
+static const struct ipa_gsi_endpoint_data ipa_gsi_endpoint_data[] = {
+   [IPA_ENDPOINT_AP_COMMAND_TX] = {
+   .ee_id  = GSI_EE_AP,
+   .channel_id = 9,
+   .endpoint_id= 7,
+   .toward_ipa = true,
+   .channel = {
+   .tre_count  = 256,
+   .event_count= 256,
+   .tlv_count  = 20,
+   },
+   .endpoint = {
+   .config = {
+   .resource_group = IPA_RSRC_GROUP_SRC_UL_DL,
+   .dma_mode   = true,
+   .dma_endpoint   = IPA_ENDPOINT_AP_LAN_RX,
+   .tx = {
+   .seq_type = IPA_SEQ_DMA,
+   },
+   },
+   },
+   },
+   [IPA_ENDPOINT_AP_LAN_RX] = {
+   .ee_id  = GSI_EE_AP,
+   .channel_id = 10,
+   .endpoint_id= 16,
+   .toward_ipa = false,
+   .channel = {
+   .tre_coun

[PATCH net-next 1/4] dt-bindings: net: qcom,ipa: add some compatible strings

2021-04-09 Thread Alex Elder
Add existing supported platform "qcom,sc7180-ipa" to the set of IPA
compatible strings.  Also add newly-supported "qcom,sdx55-ipa",
"qcom,sc7280-ipa".

Signed-off-by: Alex Elder 
---
 Documentation/devicetree/bindings/net/qcom,ipa.yaml | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/qcom,ipa.yaml b/Documentation/devicetree/bindings/net/qcom,ipa.yaml
index 8f86084bf12e9..2645a02cf19bf 100644
--- a/Documentation/devicetree/bindings/net/qcom,ipa.yaml
+++ b/Documentation/devicetree/bindings/net/qcom,ipa.yaml
@@ -43,7 +43,11 @@ description:
 
 properties:
   compatible:
-const: "qcom,sdm845-ipa"
+oneOf:
+  - const: "qcom,sc7180-ipa"
+  - const: "qcom,sc7280-ipa"
+  - const: "qcom,sdm845-ipa"
+  - const: "qcom,sdx55-ipa"
 
   reg:
 items:
-- 
2.27.0



[PATCH net-next 0/4] net: ipa: support two more platforms

2021-04-09 Thread Alex Elder
This series adds IPA support for two more Qualcomm SoCs.

The first patch updates the DT binding to add compatible strings.

The second temporarily disables checksum offload support for IPA
version 4.5 and above.  Changes are required to the RMNet driver
to support the "inline" checksum offload used for IPA v4.5+, and
once those are present this capability will be enabled for IPA.

The third and fourth patches add configuration data for IPA versions
4.5 (used for the SDX55 SoC) and 4.11 (used for the SC7280 SoC).

        -Alex

Alex Elder (4):
  dt-bindings: net: qcom,ipa: add some compatible strings
  net: ipa: disable checksum offload for IPA v4.5+
  net: ipa: add IPA v4.5 configuration data
  net: ipa: add IPA v4.11 configuration data

 .../devicetree/bindings/net/qcom,ipa.yaml |   6 +-
 drivers/net/ipa/Makefile  |   3 +-
 drivers/net/ipa/ipa_data-v4.11.c  | 382 +++
 drivers/net/ipa/ipa_data-v4.5.c   | 437 ++
 drivers/net/ipa/ipa_data.h|   2 +
 drivers/net/ipa/ipa_endpoint.c|  16 +
 drivers/net/ipa/ipa_main.c|   8 +
 drivers/net/ipa/ipa_mem.h |   6 +-
 8 files changed, 855 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ipa/ipa_data-v4.11.c
 create mode 100644 drivers/net/ipa/ipa_data-v4.5.c

-- 
2.27.0



[PATCH net-next 5/7] net: ipa: get rid of empty IPA functions

2021-04-09 Thread Alex Elder
There are placeholder functions in the IPA code that do nothing.
For the most part these are inverse functions, for example, once the
routing or filter tables are set up there is no need to perform any
matching teardown activity at shutdown, or in the case of an error.

These can be safely removed, resulting in some code simplification.
Add comments in these spots making it explicit that there is no
inverse.

Signed-off-by: Alex Elder 
---
 drivers/net/ipa/ipa_main.c | 29 +
 drivers/net/ipa/ipa_mem.c  |  9 +++--
 drivers/net/ipa/ipa_mem.h  |  5 ++---
 drivers/net/ipa/ipa_resource.c |  8 +---
 drivers/net/ipa/ipa_resource.h |  8 ++--
 drivers/net/ipa/ipa_table.c| 26 +++---
 drivers/net/ipa/ipa_table.h| 16 
 7 files changed, 24 insertions(+), 77 deletions(-)

diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c
index a970d10e650ef..bfed151f5d6dc 100644
--- a/drivers/net/ipa/ipa_main.c
+++ b/drivers/net/ipa/ipa_main.c
@@ -147,13 +147,13 @@ int ipa_setup(struct ipa *ipa)
if (ret)
goto err_endpoint_teardown;
 
-   ret = ipa_mem_setup(ipa);
+   ret = ipa_mem_setup(ipa);   /* No matching teardown required */
if (ret)
goto err_command_disable;
 
-   ret = ipa_table_setup(ipa);
+   ret = ipa_table_setup(ipa); /* No matching teardown required */
if (ret)
-   goto err_mem_teardown;
+   goto err_command_disable;
 
/* Enable the exception handling endpoint, and tell the hardware
 * to use it by default.
@@ -161,7 +161,7 @@ int ipa_setup(struct ipa *ipa)
exception_endpoint = ipa->name_map[IPA_ENDPOINT_AP_LAN_RX];
ret = ipa_endpoint_enable_one(exception_endpoint);
if (ret)
-   goto err_table_teardown;
+   goto err_command_disable;
 
ipa_endpoint_default_route_set(ipa, exception_endpoint->endpoint_id);
 
@@ -179,10 +179,6 @@ int ipa_setup(struct ipa *ipa)
 err_default_route_clear:
ipa_endpoint_default_route_clear(ipa);
ipa_endpoint_disable_one(exception_endpoint);
-err_table_teardown:
-   ipa_table_teardown(ipa);
-err_mem_teardown:
-   ipa_mem_teardown(ipa);
 err_command_disable:
ipa_endpoint_disable_one(command_endpoint);
 err_endpoint_teardown:
@@ -211,8 +207,6 @@ static void ipa_teardown(struct ipa *ipa)
ipa_endpoint_default_route_clear(ipa);
exception_endpoint = ipa->name_map[IPA_ENDPOINT_AP_LAN_RX];
ipa_endpoint_disable_one(exception_endpoint);
-   ipa_table_teardown(ipa);
-   ipa_mem_teardown(ipa);
command_endpoint = ipa->name_map[IPA_ENDPOINT_AP_COMMAND_TX];
ipa_endpoint_disable_one(command_endpoint);
ipa_endpoint_teardown(ipa);
@@ -480,23 +474,20 @@ static int ipa_config(struct ipa *ipa, const struct 
ipa_data *data)
if (ret)
goto err_endpoint_deconfig;
 
-   ipa_table_config(ipa);
+   ipa_table_config(ipa);  /* No deconfig required */
 
-   /* Assign resource limitation to each group */
+   /* Assign resource limitation to each group; no deconfig required */
ret = ipa_resource_config(ipa, data->resource_data);
if (ret)
-   goto err_table_deconfig;
+   goto err_mem_deconfig;
 
ret = ipa_modem_config(ipa);
if (ret)
-   goto err_resource_deconfig;
+   goto err_mem_deconfig;
 
return 0;
 
-err_resource_deconfig:
-   ipa_resource_deconfig(ipa);
-err_table_deconfig:
-   ipa_table_deconfig(ipa);
+err_mem_deconfig:
ipa_mem_deconfig(ipa);
 err_endpoint_deconfig:
ipa_endpoint_deconfig(ipa);
@@ -514,8 +505,6 @@ static int ipa_config(struct ipa *ipa, const struct 
ipa_data *data)
 static void ipa_deconfig(struct ipa *ipa)
 {
ipa_modem_deconfig(ipa);
-   ipa_resource_deconfig(ipa);
-   ipa_table_deconfig(ipa);
ipa_mem_deconfig(ipa);
ipa_endpoint_deconfig(ipa);
ipa_hardware_deconfig(ipa);
diff --git a/drivers/net/ipa/ipa_mem.c b/drivers/net/ipa/ipa_mem.c
index 32907dde5dc6a..c5c3b1b7e67d5 100644
--- a/drivers/net/ipa/ipa_mem.c
+++ b/drivers/net/ipa/ipa_mem.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
 /* Copyright (c) 2012-2018, The Linux Foundation. All rights reserved.
- * Copyright (C) 2019-2020 Linaro Ltd.
+ * Copyright (C) 2019-2021 Linaro Ltd.
  */
 
 #include 
@@ -53,6 +53,8 @@ ipa_mem_zero_region_add(struct gsi_trans *trans, const struct 
ipa_mem *mem)
  * The AP informs the modem where its portions of memory are located
  * in a QMI exchange that occurs at modem startup.
  *
+ * There is no need for a matching ipa_mem_teardown() function.
+ *
  * Return: 0 if successful, or a negative error code
  */
 int ipa_mem_setup(struct ipa *ipa)
@@ -97,11 +99,6 @@ int ipa_mem_setup(struct ipa *ipa)
return 0;
 }
 
-void i

[PATCH net-next 7/7] net: ipa: three small fixes

2021-04-09 Thread Alex Elder
Some time ago changes were made to stop referring to clearing the
hardware pipeline as a "tag process."  Fix a comment to use the
newer terminology.

Get rid of a pointless double-negation of the Boolean toward_ipa
flag in ipa_endpoint_config().

Make ipa_endpoint_exit_one() private; it's only referenced inside
"ipa_endpoint.c".

Signed-off-by: Alex Elder 
---
 drivers/net/ipa/ipa_endpoint.c | 6 +++---
 drivers/net/ipa/ipa_endpoint.h | 2 --
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ipa/ipa_endpoint.c b/drivers/net/ipa/ipa_endpoint.c
index dd24179383c1c..72751843b2e48 100644
--- a/drivers/net/ipa/ipa_endpoint.c
+++ b/drivers/net/ipa/ipa_endpoint.c
@@ -397,7 +397,7 @@ int ipa_endpoint_modem_exception_reset_all(struct ipa *ipa)
/* We need one command per modem TX endpoint.  We can get an upper
 * bound on that by assuming all initialized endpoints are modem->IPA.
 * That won't happen, and we could be more precise, but this is fine
-* for now.  We need to end the transaction with a "tag process."
+* for now.  End the transaction with commands to clear the pipeline.
 */
count = hweight32(initialized) + ipa_cmd_pipeline_clear_count();
trans = ipa_cmd_trans_alloc(ipa, count);
@@ -1755,7 +1755,7 @@ int ipa_endpoint_config(struct ipa *ipa)
 
/* Make sure it's pointing in the right direction */
endpoint = &ipa->endpoint[endpoint_id];
-   if ((endpoint_id < rx_base) != !!endpoint->toward_ipa) {
+   if ((endpoint_id < rx_base) != endpoint->toward_ipa) {
dev_err(dev, "endpoint id %u wrong direction\n",
endpoint_id);
ret = -EINVAL;
@@ -1791,7 +1791,7 @@ static void ipa_endpoint_init_one(struct ipa *ipa, enum 
ipa_endpoint_name name,
ipa->initialized |= BIT(endpoint->endpoint_id);
 }
 
-void ipa_endpoint_exit_one(struct ipa_endpoint *endpoint)
+static void ipa_endpoint_exit_one(struct ipa_endpoint *endpoint)
 {
endpoint->ipa->initialized &= ~BIT(endpoint->endpoint_id);
 
diff --git a/drivers/net/ipa/ipa_endpoint.h b/drivers/net/ipa/ipa_endpoint.h
index f034a9e6ef215..0a859d10312dc 100644
--- a/drivers/net/ipa/ipa_endpoint.h
+++ b/drivers/net/ipa/ipa_endpoint.h
@@ -87,8 +87,6 @@ int ipa_endpoint_modem_exception_reset_all(struct ipa *ipa);
 
 int ipa_endpoint_skb_tx(struct ipa_endpoint *endpoint, struct sk_buff *skb);
 
-void ipa_endpoint_exit_one(struct ipa_endpoint *endpoint);
-
 int ipa_endpoint_enable_one(struct ipa_endpoint *endpoint);
 void ipa_endpoint_disable_one(struct ipa_endpoint *endpoint);
 
-- 
2.27.0
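
Note on the double-negation removal in the patch above: toward_ipa is
described as a Boolean flag, so its value is already 0 or 1 and "!!"
adds nothing when it is compared with another Boolean-valued
expression.  A minimal sketch, assuming a simplified stand-in
structure (struct demo_endpoint and demo_direction_mismatch() are
hypothetical names, not part of the driver):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's endpoint structure */
struct demo_endpoint {
	bool toward_ipa;	/* true for endpoints that send toward IPA */
};

/* Endpoint IDs below rx_base are expected to point toward IPA */
static inline bool demo_direction_mismatch(uint32_t endpoint_id,
					   uint32_t rx_base,
					   const struct demo_endpoint *endpoint)
{
	/* Both operands are already 0 or 1, so no "!!" is needed */
	return (endpoint_id < rx_base) != endpoint->toward_ipa;
}
```

The parentheses around the "<" comparison are kept here, matching the
choice described in the cover letter for this series.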



[PATCH net-next 6/7] net: ipa: get rid of empty GSI functions

2021-04-09 Thread Alex Elder
There are placeholder functions in the GSI code that do nothing.
Remove these, knowing we can add something back in their place if
they're really needed someday.

Some of these are inverse functions (such as teardown to match setup).
Explicitly comment that there is no inverse in these cases.

Signed-off-by: Alex Elder 
---
 drivers/net/ipa/gsi.c | 54 +--
 1 file changed, 6 insertions(+), 48 deletions(-)

diff --git a/drivers/net/ipa/gsi.c b/drivers/net/ipa/gsi.c
index 1c835b3e1a437..9f06663cef263 100644
--- a/drivers/net/ipa/gsi.c
+++ b/drivers/net/ipa/gsi.c
@@ -198,7 +198,7 @@ static void gsi_irq_type_disable(struct gsi *gsi, enum 
gsi_irq_type_id type_id)
gsi_irq_type_update(gsi, gsi->type_enabled_bitmap & ~BIT(type_id));
 }
 
-/* Turn off all GSI interrupts initially */
+/* Turn off all GSI interrupts initially; there is no gsi_irq_teardown() */
 static void gsi_irq_setup(struct gsi *gsi)
 {
/* Disable all interrupt types */
@@ -217,12 +217,6 @@ static void gsi_irq_setup(struct gsi *gsi)
iowrite32(0, gsi->virt + GSI_CNTXT_GSI_IRQ_EN_OFFSET);
 }
 
-/* Turn off all GSI interrupts when we're all done */
-static void gsi_irq_teardown(struct gsi *gsi)
-{
-   /* Nothing to do */
-}
-
 /* Event ring commands are performed one at a time.  Their completion
  * is signaled by the event ring control GSI interrupt type, which is
  * only enabled when we issue an event ring command.  Only the event
@@ -786,7 +780,7 @@ static void gsi_channel_trans_quiesce(struct gsi_channel 
*channel)
}
 }
 
-/* Program a channel for use */
+/* Program a channel for use; there is no gsi_channel_deprogram() */
 static void gsi_channel_program(struct gsi_channel *channel, bool doorbell)
 {
size_t size = channel->tre_ring.count * GSI_RING_ELEMENT_SIZE;
@@ -874,11 +868,6 @@ static void gsi_channel_program(struct gsi_channel 
*channel, bool doorbell)
/* All done! */
 }
 
-static void gsi_channel_deprogram(struct gsi_channel *channel)
-{
-   /* Nothing to do */
-}
-
 static int __gsi_channel_start(struct gsi_channel *channel, bool start)
 {
struct gsi *gsi = channel->gsi;
@@ -1623,18 +1612,6 @@ static u32 gsi_event_bitmap_init(u32 evt_ring_max)
return event_bitmap;
 }
 
-/* Setup function for event rings */
-static void gsi_evt_ring_setup(struct gsi *gsi)
-{
-   /* Nothing to do */
-}
-
-/* Inverse of gsi_evt_ring_setup() */
-static void gsi_evt_ring_teardown(struct gsi *gsi)
-{
-   /* Nothing to do */
-}
-
 /* Setup function for a single channel */
 static int gsi_channel_setup_one(struct gsi *gsi, u32 channel_id)
 {
@@ -1684,7 +1661,6 @@ static void gsi_channel_teardown_one(struct gsi *gsi, u32 
channel_id)
 
netif_napi_del(&channel->napi);
 
-   gsi_channel_deprogram(channel);
gsi_channel_de_alloc_command(gsi, channel_id);
gsi_evt_ring_reset_command(gsi, evt_ring_id);
gsi_evt_ring_de_alloc_command(gsi, evt_ring_id);
@@ -1759,7 +1735,6 @@ static int gsi_channel_setup(struct gsi *gsi)
u32 mask;
int ret;
 
-   gsi_evt_ring_setup(gsi);
gsi_irq_enable(gsi);
 
mutex_lock(&gsi->mutex);
@@ -1819,7 +1794,6 @@ static int gsi_channel_setup(struct gsi *gsi)
mutex_unlock(&gsi->mutex);
 
gsi_irq_disable(gsi);
-   gsi_evt_ring_teardown(gsi);
 
return ret;
 }
@@ -1848,7 +1822,6 @@ static void gsi_channel_teardown(struct gsi *gsi)
mutex_unlock(&gsi->mutex);
 
gsi_irq_disable(gsi);
-   gsi_evt_ring_teardown(gsi);
 }
 
 /* Setup function for GSI.  GSI firmware must be loaded and initialized */
@@ -1856,7 +1829,6 @@ int gsi_setup(struct gsi *gsi)
 {
struct device *dev = gsi->dev;
u32 val;
-   int ret;
 
/* Here is where we first touch the GSI hardware */
val = ioread32(gsi->virt + GSI_GSI_STATUS_OFFSET);
@@ -1865,7 +1837,7 @@ int gsi_setup(struct gsi *gsi)
return -EIO;
}
 
-   gsi_irq_setup(gsi);
+   gsi_irq_setup(gsi); /* No matching teardown required */
 
val = ioread32(gsi->virt + GSI_GSI_HW_PARAM_2_OFFSET);
 
@@ -1899,18 +1871,13 @@ int gsi_setup(struct gsi *gsi)
/* Writing 1 indicates IRQ interrupts; 0 would be MSI */
iowrite32(1, gsi->virt + GSI_CNTXT_INTSET_OFFSET);
 
-   ret = gsi_channel_setup(gsi);
-   if (ret)
-   gsi_irq_teardown(gsi);
-
-   return ret;
+   return gsi_channel_setup(gsi);
 }
 
 /* Inverse of gsi_setup() */
 void gsi_teardown(struct gsi *gsi)
 {
gsi_channel_teardown(gsi);
-   gsi_irq_teardown(gsi);
 }
 
 /* Initialize a channel's event ring */
@@ -1952,7 +1919,7 @@ static void gsi_channel_evt_ring_exit(struct gsi_channel 
*channel)
gsi_evt_ring_id_free(gsi, evt_ring_id);
 }
 
-/* Init function for event rings */
+/* Init function for event rings; there is no gsi_evt_ring_exit() */
 static void gsi_evt_ring_init(struct 

[PATCH net-next 4/7] net: ipa: ipa_stop() does not return an error

2021-04-09 Thread Alex Elder
In ipa_modem_stop(), if the modem netdev pointer is non-null we call
ipa_stop().  We check for an error and if one is returned we handle
it.  But ipa_stop() never returns an error, so this extra handling
is unnecessary.  Simplify the code in ipa_modem_stop() based on the
knowledge no error handling is needed at this spot.

Signed-off-by: Alex Elder 
---
 drivers/net/ipa/ipa_modem.c | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ipa/ipa_modem.c b/drivers/net/ipa/ipa_modem.c
index 8a6ccebde2889..af9aedbde717a 100644
--- a/drivers/net/ipa/ipa_modem.c
+++ b/drivers/net/ipa/ipa_modem.c
@@ -240,7 +240,6 @@ int ipa_modem_stop(struct ipa *ipa)
 {
struct net_device *netdev = ipa->modem_netdev;
enum ipa_modem_state state;
-   int ret;
 
/* Only attempt to stop the modem if it's running */
state = atomic_cmpxchg(&ipa->modem_state, IPA_MODEM_STATE_RUNNING,
@@ -257,29 +256,20 @@ int ipa_modem_stop(struct ipa *ipa)
/* Prevent the modem from triggering a call to ipa_setup() */
ipa_smp2p_disable(ipa);
 
+   /* Stop the queue and disable the endpoints if it's open */
if (netdev) {
-   /* Stop the queue and disable the endpoints if it's open */
-   ret = ipa_stop(netdev);
-   if (ret)
-   goto out_set_state;
-
+   (void)ipa_stop(netdev);
ipa->name_map[IPA_ENDPOINT_AP_MODEM_RX]->netdev = NULL;
ipa->name_map[IPA_ENDPOINT_AP_MODEM_TX]->netdev = NULL;
ipa->modem_netdev = NULL;
unregister_netdev(netdev);
free_netdev(netdev);
-   } else {
-   ret = 0;
}
 
-out_set_state:
-   if (ret)
-   atomic_set(&ipa->modem_state, IPA_MODEM_STATE_RUNNING);
-   else
-   atomic_set(&ipa->modem_state, IPA_MODEM_STATE_STOPPED);
+   atomic_set(&ipa->modem_state, IPA_MODEM_STATE_STOPPED);
smp_mb__after_atomic();
 
-   return ret;
+   return 0;
 }
 
 /* Treat a "clean" modem stop the same as a crash */
-- 
2.27.0



[PATCH net-next 2/7] net: ipa: update sequence type for modem TX endpoint

2021-04-09 Thread Alex Elder
On IPA v3.5.1, the sequencer type for the modem TX endpoint does not
define the replication portion in the same way the downstream code
does.  This difference doesn't affect the behavior of the upstream
code, but I'd prefer the two code bases use the same configuration
value here.

Signed-off-by: Alex Elder 
---
 drivers/net/ipa/ipa_data-v3.5.1.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ipa/ipa_data-v3.5.1.c 
b/drivers/net/ipa/ipa_data-v3.5.1.c
index 57703e95a3f9c..ead1a82f32f5c 100644
--- a/drivers/net/ipa/ipa_data-v3.5.1.c
+++ b/drivers/net/ipa/ipa_data-v3.5.1.c
@@ -116,6 +116,7 @@ static const struct ipa_gsi_endpoint_data 
ipa_gsi_endpoint_data[] = {
.status_enable  = true,
.tx = {
.seq_type = IPA_SEQ_2_PASS_SKIP_LAST_UC,
+   .seq_rep_type = IPA_SEQ_REP_DMA_PARSER,
.status_endpoint =
IPA_ENDPOINT_MODEM_AP_RX,
},
-- 
2.27.0



[PATCH net-next 3/7] net: ipa: only set endpoint netdev pointer when in use

2021-04-09 Thread Alex Elder
In ipa_modem_start(), we set endpoint netdev pointers before the
network device is registered.  If registration fails, we don't undo
those assignments.  Instead, wait to assign the netdev pointer until
after registration succeeds.

Set these endpoint netdev pointers to NULL in ipa_modem_stop()
before unregistering the network device.

Signed-off-by: Alex Elder 
---
 drivers/net/ipa/ipa_modem.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ipa/ipa_modem.c b/drivers/net/ipa/ipa_modem.c
index 9b08eb8239846..8a6ccebde2889 100644
--- a/drivers/net/ipa/ipa_modem.c
+++ b/drivers/net/ipa/ipa_modem.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
 /* Copyright (c) 2014-2018, The Linux Foundation. All rights reserved.
- * Copyright (C) 2018-2020 Linaro Ltd.
+ * Copyright (C) 2018-2021 Linaro Ltd.
  */
 
 #include 
@@ -213,18 +213,18 @@ int ipa_modem_start(struct ipa *ipa)
goto out_set_state;
}
 
-   ipa->name_map[IPA_ENDPOINT_AP_MODEM_TX]->netdev = netdev;
-   ipa->name_map[IPA_ENDPOINT_AP_MODEM_RX]->netdev = netdev;
-
SET_NETDEV_DEV(netdev, &ipa->pdev->dev);
priv = netdev_priv(netdev);
priv->ipa = ipa;
 
ret = register_netdev(netdev);
-   if (ret)
-   free_netdev(netdev);
-   else
+   if (!ret) {
ipa->modem_netdev = netdev;
+   ipa->name_map[IPA_ENDPOINT_AP_MODEM_TX]->netdev = netdev;
+   ipa->name_map[IPA_ENDPOINT_AP_MODEM_RX]->netdev = netdev;
+   } else {
+   free_netdev(netdev);
+   }
 
 out_set_state:
if (ret)
@@ -263,6 +263,8 @@ int ipa_modem_stop(struct ipa *ipa)
if (ret)
goto out_set_state;
 
+   ipa->name_map[IPA_ENDPOINT_AP_MODEM_RX]->netdev = NULL;
+   ipa->name_map[IPA_ENDPOINT_AP_MODEM_TX]->netdev = NULL;
ipa->modem_netdev = NULL;
unregister_netdev(netdev);
free_netdev(netdev);
-- 
2.27.0
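
Note on the ordering change in the patch above: publishing the netdev
pointers only after registration succeeds means the failure path has
nothing to undo.  A minimal sketch of that pattern, using mock types
and a mock registration call (every demo_* name is hypothetical and
stands in for the real kernel structures and register_netdev()):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the real kernel types */
struct demo_netdev {
	bool registered;
};

struct demo_endpoint {
	struct demo_netdev *netdev;
};

struct demo_ipa {
	struct demo_netdev *modem_netdev;
	struct demo_endpoint tx;
	struct demo_endpoint rx;
};

/* Mock registration that fails on request */
static inline int demo_register(struct demo_netdev *netdev, bool fail)
{
	if (fail)
		return -1;
	netdev->registered = true;
	return 0;
}

static inline int demo_modem_start(struct demo_ipa *ipa,
				   struct demo_netdev *netdev, bool fail)
{
	int ret = demo_register(netdev, fail);

	if (!ret) {
		/* Publish the pointers only once registration succeeded */
		ipa->modem_netdev = netdev;
		ipa->tx.netdev = netdev;
		ipa->rx.netdev = netdev;
	}

	/* On failure nothing was assigned, so there is nothing to undo */
	return ret;
}
```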



[PATCH net-next 1/7] net: ipa: relax pool entry size requirement

2021-04-09 Thread Alex Elder
I no longer know why a validation check ensured the size of an entry
passed to gsi_trans_pool_init() was restricted to be a multiple of 8.
For 32-bit builds, this condition doesn't always hold, and for DMA
pools, the size is rounded up to a power of 2 anyway.

Remove this restriction.

Signed-off-by: Alex Elder 
---
 drivers/net/ipa/gsi_trans.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ipa/gsi_trans.c b/drivers/net/ipa/gsi_trans.c
index 70c2b585f98d6..8c795a6a85986 100644
--- a/drivers/net/ipa/gsi_trans.c
+++ b/drivers/net/ipa/gsi_trans.c
@@ -91,7 +91,7 @@ int gsi_trans_pool_init(struct gsi_trans_pool *pool, size_t 
size, u32 count,
void *virt;
 
 #ifdef IPA_VALIDATE
-   if (!size || size % 8)
+   if (!size)
return -EINVAL;
if (count < max_alloc)
return -EINVAL;
@@ -141,7 +141,7 @@ int gsi_trans_pool_init_dma(struct device *dev, struct 
gsi_trans_pool *pool,
void *virt;
 
 #ifdef IPA_VALIDATE
-   if (!size || size % 8)
+   if (!size)
return -EINVAL;
if (count < max_alloc)
return -EINVAL;
-- 
2.27.0
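
Note on the relaxed check in the patch above: after the change only a
zero size is rejected, and the commit message points out that DMA
pools round the size up to a power of 2 anyway.  A minimal sketch of
both ideas, assuming only the two checks visible in the hunks
(demo_pool_check() and demo_roundup_pow2() are hypothetical helpers,
not functions from gsi_trans.c):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the relaxed validation */
static inline int demo_pool_check(size_t size, uint32_t count,
				  uint32_t max_alloc)
{
	if (!size)		/* a multiple of 8 is no longer required */
		return -EINVAL;
	if (count < max_alloc)
		return -EINVAL;

	return 0;
}

/* Round a size up to the next power of 2 */
static inline size_t demo_roundup_pow2(size_t size)
{
	size_t result = 1;

	while (result < size)
		result <<= 1;

	return result;
}
```

With this sketch a 12-byte entry, which the old "size % 8" test would
have rejected, passes the check and would occupy 16 bytes once rounded
up.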



[PATCH net-next 0/7] net: ipa: a few small fixes

2021-04-09 Thread Alex Elder
This series implements some minor bug fixes or improvements.

The first patch removes an apparently unnecessary restriction, which
results in an error on a 32-bit ARM build.

The second makes a definition used for SDM845 match what is used in
the downstream code.

The third just ensures two netdev pointers are only non-null when
valid.

The fourth simplifies a little code, knowing that a called function
never returns an error.

The fifth and sixth just remove some empty/placeholder functions.

And the last patch fixes a comment, makes a function private, and
removes an unnecessary double-negation of a Boolean variable.  This
patch produces a warning from checkpatch, indicating that a pair of
parentheses is unnecessary.  I agree with that advice, but it
conflicts with a suggestion from the compiler.  I left the "problem"
in place to avoid the compiler warning.

        -Alex


Alex Elder (7):
  net: ipa: relax pool entry size requirement
  net: ipa: update sequence type for modem TX endpoint
  net: ipa: only set endpoint netdev pointer when in use
  net: ipa: ipa_stop() does not return an error
  net: ipa: get rid of empty IPA functions
  net: ipa: get rid of empty GSI functions
  net: ipa: three small fixes

 drivers/net/ipa/gsi.c | 54 ---
 drivers/net/ipa/gsi_trans.c   |  4 +--
 drivers/net/ipa/ipa_data-v3.5.1.c |  1 +
 drivers/net/ipa/ipa_endpoint.c|  6 ++--
 drivers/net/ipa/ipa_endpoint.h|  2 --
 drivers/net/ipa/ipa_main.c| 29 ++---
 drivers/net/ipa/ipa_mem.c |  9 ++
 drivers/net/ipa/ipa_mem.h |  5 ++-
 drivers/net/ipa/ipa_modem.c   | 34 ---
 drivers/net/ipa/ipa_resource.c|  8 +
 drivers/net/ipa/ipa_resource.h|  8 ++---
 drivers/net/ipa/ipa_table.c   | 26 ++-
 drivers/net/ipa/ipa_table.h   | 16 +++--
 13 files changed, 49 insertions(+), 153 deletions(-)

-- 
2.27.0
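
Note on the fifth and sixth patches in this series: once a setup step
has no inverse, its error label disappears and later failures jump to
the label of the nearest earlier step that still needs undoing.  A
minimal sketch of that goto-unwind shape, using made-up steps (all
demo_* names are hypothetical and do not correspond to driver
functions):

```c
#include <stdbool.h>

/* Hypothetical state used only to observe which steps ran */
struct demo_state {
	bool endpoint_up;
	bool tables_written;
};

static inline int demo_endpoint_enable(struct demo_state *s)
{
	s->endpoint_up = true;
	return 0;
}

static inline void demo_endpoint_disable(struct demo_state *s)
{
	s->endpoint_up = false;
}

/* Writes tables; nothing has to be undone afterward */
static inline int demo_table_setup(struct demo_state *s, bool fail)
{
	if (fail)
		return -1;
	s->tables_written = true;
	return 0;
}

static inline int demo_last_step(bool fail)
{
	return fail ? -1 : 0;
}

static inline int demo_setup(struct demo_state *s, bool table_fail,
			     bool last_fail)
{
	int ret;

	ret = demo_endpoint_enable(s);
	if (ret)
		return ret;

	ret = demo_table_setup(s, table_fail);	/* No matching teardown */
	if (ret)
		goto err_endpoint_disable;

	ret = demo_last_step(last_fail);
	if (ret)
		goto err_endpoint_disable;	/* tables need no undo */

	return 0;

err_endpoint_disable:
	demo_endpoint_disable(s);

	return ret;
}
```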



[PATCH] ARM: dts: qcom: sdx55: add IPA information

2021-04-09 Thread Alex Elder
Add IPA-related nodes and definitions to "sdx55.dtsi".  The SMP2P
nodes (ipa_smp2p_out and ipa_smp2p_in) are already present.

Signed-off-by: Alex Elder 
---
Note: This depends on this series posted by Mani Sadhasivam:
  
https://lore.kernel.org/linux-arm-msm/20210408170457.91409-1-manivannan.sadhasi...@linaro.org

 arch/arm/boot/dts/qcom-sdx55.dtsi | 41 +++
 1 file changed, 41 insertions(+)

diff --git a/arch/arm/boot/dts/qcom-sdx55.dtsi 
b/arch/arm/boot/dts/qcom-sdx55.dtsi
index e4180bbc46555..0dc515dc5750d 100644
--- a/arch/arm/boot/dts/qcom-sdx55.dtsi
+++ b/arch/arm/boot/dts/qcom-sdx55.dtsi
@@ -215,6 +215,47 @@ qpic_nand: nand@1b3 {
status = "disabled";
};
 
+   ipa: ipa@1e4 {
+   compatible = "qcom,sdx55-ipa";
+
+   iommus = <_smmu 0x5e0 0x0>,
+<_smmu 0x5e2 0x0>;
+   reg = <0x1e4 0x7000>,
+ <0x1e5 0x4b20>,
+ <0x1e04000 0x2c000>;
+   reg-names = "ipa-reg",
+   "ipa-shared",
+   "gsi";
+
+   interrupts-extended = < GIC_SPI 241 
IRQ_TYPE_EDGE_RISING>,
+ < GIC_SPI 47 
IRQ_TYPE_LEVEL_HIGH>,
+ <_smp2p_in 0 
IRQ_TYPE_EDGE_RISING>,
+ <_smp2p_in 1 
IRQ_TYPE_EDGE_RISING>;
+   interrupt-names = "ipa",
+ "gsi",
+ "ipa-clock-query",
+ "ipa-setup-ready";
+
+   clocks = < RPMH_IPA_CLK>;
+   clock-names = "core";
+
+   interconnects = <_noc MASTER_IPA _noc 
SLAVE_SNOC_MEM_NOC_GC>,
+   <_noc MASTER_SNOC_GC_MEM_NOC 
_virt SLAVE_EBI_CH0>,
+   <_noc MASTER_IPA _noc 
SLAVE_OCIMEM>,
+   <_noc MASTER_AMPSS_M0 _noc 
SLAVE_IPA_CFG>;
+   interconnect-names = "memory-a",
+"memory-b",
+"imem",
+"config";
+
+   qcom,smem-states = <_smp2p_out 0>,
+  <_smp2p_out 1>;
+   qcom,smem-state-names = "ipa-clock-enabled-valid",
+   "ipa-clock-enabled";
+
+   status = "disabled";
+   };
+
tcsr_mutex: hwlock@1f4 {
compatible = "qcom,tcsr-mutex";
reg = <0x01f4 0x4>;
-- 
2.27.0



[PATCH] ARM: configs: qcom_defconfig: enable IPA and RMNET

2021-04-09 Thread Alex Elder
The SDX55 is a 32-bit ARM device that includes IPA v4.5.  Add
CONFIG_QCOM_IPA=m and CONFIG_RMNET=m to "qcom_defconfig".

Signed-off-by: Alex Elder 
---
 arch/arm/configs/qcom_defconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/configs/qcom_defconfig b/arch/arm/configs/qcom_defconfig
index 3f36887e83330..b30f57399477d 100644
--- a/arch/arm/configs/qcom_defconfig
+++ b/arch/arm/configs/qcom_defconfig
@@ -85,7 +85,9 @@ CONFIG_NETDEVICES=y
 CONFIG_DUMMY=y
 CONFIG_ATL1C=y
 CONFIG_KS8851=y
+CONFIG_RMNET=m
 CONFIG_SMSC911X=y
+CONFIG_QCOM_IPA=m
 CONFIG_MDIO_BITBANG=y
 CONFIG_MDIO_GPIO=y
 CONFIG_SLIP=y
-- 
2.27.0



Re: [PATCH v1 01/14] vfio: Create vfio_fs_type with inode per device

2021-04-09 Thread Alex Williamson
On Fri, 9 Apr 2021 04:54:23 +
"Zengtao (B)"  wrote:

> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: March 9, 2021 5:47
> > To: alex.william...@redhat.com
> > Cc: coh...@redhat.com; k...@vger.kernel.org;
> > linux-kernel@vger.kernel.org; j...@nvidia.com; pet...@redhat.com
> > Subject: [PATCH v1 01/14] vfio: Create vfio_fs_type with inode per device
> > 
> > By linking all the device fds we provide to userspace to an address space
> > through a new pseudo fs, we can use tools like
> > unmap_mapping_range() to zap all vmas associated with a device.
> > 
> > Suggested-by: Jason Gunthorpe 
> > Signed-off-by: Alex Williamson 
> > ---
> >  drivers/vfio/vfio.c |   54
> > +++
> >  1 file changed, 54 insertions(+)
> > 
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index
> > 38779e6fd80c..abdf8d52a911 100644
> > --- a/drivers/vfio/vfio.c
> > +++ b/drivers/vfio/vfio.c
> > @@ -32,11 +32,18 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +#include   
> Minor: keep the headers in alphabetical order.

They started out that way, but various tree-wide changes ignoring that,
and likely oversights on my part as well, have left us with numerous
breaks in that rule already.

> > 
> >  #define DRIVER_VERSION "0.3"
> >  #define DRIVER_AUTHOR  "Alex Williamson "
> >  #define DRIVER_DESC"VFIO - User Level meta-driver"
> > 
> > +#define VFIO_MAGIC 0x5646494f /* "VFIO" */  
> Move to include/uapi/linux/magic.h ? 

Hmm, yeah, I suppose it probably should go there.  Thanks.
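
For reference, the magic value discussed above is simply the ASCII
codes of "VFIO" packed into 32 bits, first character in the most
significant byte.  A small sketch to show the correspondence
(demo_magic() is a hypothetical helper, not something in the patch):

```c
#include <stdint.h>

/* Pack four ASCII characters, first one in the most significant byte */
static inline uint32_t demo_magic(char a, char b, char c, char d)
{
	return ((uint32_t)(unsigned char)a << 24) |
	       ((uint32_t)(unsigned char)b << 16) |
	       ((uint32_t)(unsigned char)c << 8) |
	       (uint32_t)(unsigned char)d;
}
```

Calling demo_magic('V', 'F', 'I', 'O') yields 0x5646494f, since 'V' is
0x56, 'F' is 0x46, 'I' is 0x49 and 'O' is 0x4f.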

FWIW, I'm still working on a next version of this series, currently
struggling how to handle an arbitrary number of vmas per user DMA
mapping.  Thanks,

Alex



[PATCH] clk: qcom: rpmh: add support for SDX55 rpmh IPA clock

2021-04-09 Thread Alex Elder
The IPA core clock is required for SDX55.  Define it.

Signed-off-by: Alex Elder 
---
 drivers/clk/qcom/clk-rpmh.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/clk/qcom/clk-rpmh.c b/drivers/clk/qcom/clk-rpmh.c
index c623ce9004063..552d1cbfea4c0 100644
--- a/drivers/clk/qcom/clk-rpmh.c
+++ b/drivers/clk/qcom/clk-rpmh.c
@@ -380,6 +380,7 @@ static const struct clk_rpmh_desc clk_rpmh_sdm845 = {
 DEFINE_CLK_RPMH_VRM(sdx55, rf_clk1, rf_clk1_ao, "rfclkd1", 1);
 DEFINE_CLK_RPMH_VRM(sdx55, rf_clk2, rf_clk2_ao, "rfclkd2", 1);
 DEFINE_CLK_RPMH_BCM(sdx55, qpic_clk, "QP0");
+DEFINE_CLK_RPMH_BCM(sdx55, ipa, "IP0");
 
 static struct clk_hw *sdx55_rpmh_clocks[] = {
[RPMH_CXO_CLK]  = _bi_tcxo.hw,
@@ -389,6 +390,7 @@ static struct clk_hw *sdx55_rpmh_clocks[] = {
[RPMH_RF_CLK2]  = _rf_clk2.hw,
[RPMH_RF_CLK2_A]= _rf_clk2_ao.hw,
[RPMH_QPIC_CLK] = _qpic_clk.hw,
+   [RPMH_IPA_CLK]  = _ipa.hw,
 };
 
 static const struct clk_rpmh_desc clk_rpmh_sdx55 = {
-- 
2.27.0



Re: [PATCH v7] RISC-V: enable XIP

2021-04-09 Thread Alex Ghiti

On 4/9/21 at 8:07 AM, David Hildenbrand wrote:

On 09.04.21 13:39, Alex Ghiti wrote:

Hi David,

On 4/9/21 at 4:23 AM, David Hildenbrand wrote:

On 09.04.21 09:14, Alex Ghiti wrote:

On 4/9/21 at 2:51 AM, Alexandre Ghiti wrote:

From: Vitaly Wool 

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage. The physical flash address used
to link the kernel object files and for storing it has to be known
at compile time and is represented by a Kconfig option.

XIP on RISC-V will for the time being only work on MMU-enabled
kernels.


I added linux-mm and linux-arch to get feedback because I noticed that
DEBUG_VM_PGTABLE fails for SPARSEMEM (it works for FLATMEM but I think
it does not do what is expected): the fact that we don't have any struct
page to back the text and rodata in flash is the problem, but to what
extent?


Just wondering, why can't we create a memmap for that memory -- or is it
even desirable to not do that explicitly? There might be some nasty side
effects when not having a memmap for text and rodata.



Do you have examples of such effects ? Any feature that will not work
without that ?



At least if it's not part of /proc/iomem in any way (maybe "System RAM" 
is not what we want without a memmap, TBD), kexec-tools won't be able to 
handle it properly e.g., for kdump. But not sure if that is really 
relevant in your setup.


Regarding other features, anything that does a pfn_valid(), 
pfn_to_page() or pfn_to_online_page() would behave differently now -- 
assuming the kernel doesn't fall into a section with other System RAM 
(whereby we would still allocate the memmap for the whole section).


I guess you might stumble over some surprises in some code paths, but 
nothing really comes to mind. Not sure if your zeropage is part of the 
kernel image on RISC-V (I remember that we sometimes need a memmap 
there, but I might be wrong)?



It is in the kernel image and is located in bss which will be in RAM and 
then be backed by a memmap.





I assume you still somehow create the direct mapping for the kernel, 
right? So it's really some memory region with a direct mapping but 
without a memmap (and right now, without a resource), correct?





No I don't create any direct mapping for the text and the rodata.



[...]



Also, will that memory properly be exposed in the resource tree as
System RAM (e.g., /proc/iomem) ? Otherwise some things (/proc/kcore)
won't work as expected - the kernel won't be included in a dump.



I have just checked and it does not appear in /proc/iomem.

Ok your conclusion would be to have struct page, I'm going to implement
this version then using memblock as you described.


Let's first evaluate what the harm could be. You could (and should?) 
create the kernel resource manually - IIRC, that's independent of the 
memmap/memblock thing.


@Mike, what's your take on not having a memmap for kernel text and ro data?



Re: [PATCH v7] RISC-V: enable XIP

2021-04-09 Thread Alex Ghiti

Hi David,

On 4/9/21 at 4:23 AM, David Hildenbrand wrote:

On 09.04.21 09:14, Alex Ghiti wrote:

On 4/9/21 at 2:51 AM, Alexandre Ghiti wrote:

From: Vitaly Wool 

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage. The physical flash address used
to link the kernel object files and for storing it has to be known
at compile time and is represented by a Kconfig option.

XIP on RISC-V will for the time being only work on MMU-enabled
kernels.


I added linux-mm and linux-arch to get feedback because I noticed that
DEBUG_VM_PGTABLE fails for SPARSEMEM (it works for FLATMEM but I think
it does not do what is expected): the fact that we don't have any struct
page to back the text and rodata in flash is the problem, but to what
extent?


Just wondering, why can't we create a memmap for that memory -- or is it
even desirable to not do that explicitly? There might be some nasty side
effects when not having a memmap for text and rodata.



Do you have examples of such effects ? Any feature that will not work 
without that ?





I would assume simply exposing the physical memory range to memblock as
RAM and marking it reserved would create a memmap that's fully
initialized like any bootmem (PG_reserved).


Or is there a reason why we cannot do that?



I did not want to do that if it was not needed as the overall goal of 
XIP kernel is to save RAM (I may be cheap but 16MB backed by struct page 
represents ~220KB).
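
To make the arithmetic behind that figure explicit: the cost of a
memmap is the number of pages in the region times the size of one
struct page.  A rough sketch, assuming 4 KiB pages; the struct page
size is not given in this thread, so the 56 bytes used in the note
below is only an assumption picked to land near the quoted figure
(demo_memmap_bytes() is a hypothetical helper):

```c
#include <stddef.h>

/* Bytes of per-page metadata needed to cover a memory region */
static inline size_t demo_memmap_bytes(size_t region_bytes,
				       size_t page_size,
				       size_t page_struct_size)
{
	/* Round up so a partial trailing page still gets an entry */
	size_t pages = (region_bytes + page_size - 1) / page_size;

	return pages * page_struct_size;
}
```

For a 16 MiB region this gives 4096 pages, so 56 bytes per page comes
to 229376 bytes (224 KiB), while 64 bytes per page would come to
262144 bytes (256 KiB).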






Also, will that memory properly be exposed in the resource tree as 
System RAM (e.g., /proc/iomem) ? Otherwise some things (/proc/kcore) 
won't work as expected - the kernel won't be included in a dump.



I have just checked and it does not appear in /proc/iomem.

Ok your conclusion would be to have struct page, I'm going to implement 
this version then using memblock as you described.


Thanks David,

Alex






Thanks,

Alex


Signed-off-by: Alexandre Ghiti  [ Rebase on top of "Move
kernel mapping outside the linear mapping" ]
Signed-off-by: Vitaly Wool 
---

Changes in v2:
- dedicated macro for XIP address fixup when MMU is not enabled yet
    o both for 32-bit and 64-bit RISC-V
- SP is explicitly set to a safe place in RAM before __copy_data call
- removed redundant alignment requirements in vmlinux-xip.lds.S
- changed long -> uintptr_t typecast in __XIP_FIXUP macro.
Changes in v3:
- rebased against latest for-next
- XIP address fixup macro now takes an argument
- SMP related fixes
Changes in v4:
- rebased against the current for-next
- less #ifdef's in C/ASM code
- dedicated XIP_FIXUP_OFFSET assembler macro in head.S
- C-specific definitions moved into #ifndef __ASSEMBLY__
- Fixed multi-core boot
Changes in v5:
- fixed build error for non-XIP kernels
Changes in v6:
- XIP_PHYS_RAM_BASE config option renamed to PHYS_RAM_BASE
- added PHYS_RAM_BASE_FIXED config flag to allow usage of
    PHYS_RAM_BASE in non-XIP configurations if needed
- XIP_FIXUP macro rewritten with a temporary variable to avoid side
    effects
- fixed crash for non-XIP kernels that don't use built-in DTB
Changes in v7:
- Fix pfn_base that required FIXUP
- Fix copy_data which lacked + 1 in size to copy
- Fix pfn_valid for FLATMEM
- Rebased on top of "Move kernel mapping outside the linear mapping":
    this is the biggest change and affected mm/init.c,
    kernel/vmlinux-xip.lds.S and include/asm/pgtable.h: XIP kernel is now
    mapped like 'normal' kernel at the end of the address space.

   arch/riscv/Kconfig  |  51 ++-
   arch/riscv/Makefile |   8 +-
   arch/riscv/boot/Makefile    |  13 +++
   arch/riscv/include/asm/page.h   |  28 ++
   arch/riscv/include/asm/pgtable.h    |  25 +-
   arch/riscv/kernel/head.S    |  46 +-
   arch/riscv/kernel/head.h    |   3 +
   arch/riscv/kernel/setup.c   |  10 ++-
   arch/riscv/kernel/vmlinux-xip.lds.S | 133 
   arch/riscv/kernel/vmlinux.lds.S |   6 ++
   arch/riscv/mm/init.c    | 118 ++--
   11 files changed, 424 insertions(+), 17 deletions(-)
   create mode 100644 arch/riscv/kernel/vmlinux-xip.lds.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8ea60a0a19ae..4d0153805927 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -28,7 +28,7 @@ config RISCV
   select ARCH_HAS_PTE_SPECIAL
   select ARCH_HAS_SET_DIRECT_MAP
   select ARCH_HAS_SET_MEMORY
-    select ARCH_HAS_STRICT_KERNEL_RWX if MMU
+    select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
   select ARCH_HAS_TICK_BROADCAST if GEN

Re: [PATCH v7] RISC-V: enable XIP

2021-04-09 Thread Alex Ghiti

On 4/9/21 at 2:51 AM, Alexandre Ghiti wrote:

From: Vitaly Wool 

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage. The physical flash address used
to link the kernel object files and for storing it has to be known
at compile time and is represented by a Kconfig option.

XIP on RISC-V will for the time being only work on MMU-enabled
kernels.

I added linux-mm and linux-arch to get feedback because I noticed that 
DEBUG_VM_PGTABLE fails for SPARSEMEM (it works for FLATMEM, but I think 
it does not do what is expected). The problem is that we don't have any 
struct page to back the text and rodata in flash, but to what extent is 
that an issue?


Thanks,

Alex


Signed-off-by: Alexandre Ghiti  [ Rebase on top of "Move
kernel mapping outside the linear mapping" ]
Signed-off-by: Vitaly Wool 
---

Changes in v2:
- dedicated macro for XIP address fixup when MMU is not enabled yet
   o both for 32-bit and 64-bit RISC-V
- SP is explicitly set to a safe place in RAM before __copy_data call
- removed redundant alignment requirements in vmlinux-xip.lds.S
- changed long -> uintptr_t typecast in __XIP_FIXUP macro.
Changes in v3:
- rebased against latest for-next
- XIP address fixup macro now takes an argument
- SMP related fixes
Changes in v4:
- rebased against the current for-next
- less #ifdef's in C/ASM code
- dedicated XIP_FIXUP_OFFSET assembler macro in head.S
- C-specific definitions moved into #ifndef __ASSEMBLY__
- Fixed multi-core boot
Changes in v5:
- fixed build error for non-XIP kernels
Changes in v6:
- XIP_PHYS_RAM_BASE config option renamed to PHYS_RAM_BASE
- added PHYS_RAM_BASE_FIXED config flag to allow usage of
   PHYS_RAM_BASE in non-XIP configurations if needed
- XIP_FIXUP macro rewritten with a temporary variable to avoid side
   effects
- fixed crash for non-XIP kernels that don't use built-in DTB
Changes in v7:
- Fix pfn_base that required FIXUP
- Fix copy_data which lacked + 1 in size to copy
- Fix pfn_valid for FLATMEM
- Rebased on top of "Move kernel mapping outside the linear mapping":
   this is the biggest change and affected mm/init.c,
   kernel/vmlinux-xip.lds.S and include/asm/pgtable.h: XIP kernel is now
   mapped like 'normal' kernel at the end of the address space.
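
As an illustration of the v6 changelog item about rewriting XIP_FIXUP with a
temporary variable: the macro body is not shown in this part of the thread, so
the following is only a sketch of the general idiom, not the code from the
patch. The DEMO_* names, the address constants and the demo_next_addr() helper
are all made up for the example, and the statement-expression form is a GNU C
extension (GCC and Clang accept it).

```c
#include <stdint.h>

/* Hypothetical layout constants, chosen only for this sketch. */
#define DEMO_XIP_PHYS_ADDR 0x20000000UL /* assumed flash base */
#define DEMO_PHYS_RAM_BASE 0x80000000UL /* assumed RAM base */
#define DEMO_XIP_SIZE      0x01000000UL /* assumed size of the fixup window */

/* Naive form: the argument text is expanded, and evaluated, several times. */
#define DEMO_FIXUP_NAIVE(addr)						\
	(((uintptr_t)(addr) >= DEMO_XIP_PHYS_ADDR &&			\
	  (uintptr_t)(addr) < DEMO_XIP_PHYS_ADDR + DEMO_XIP_SIZE) ?	\
	 (uintptr_t)(addr) - DEMO_XIP_PHYS_ADDR + DEMO_PHYS_RAM_BASE :	\
	 (uintptr_t)(addr))

/* Temporary-variable form: the argument is evaluated exactly once. */
#define DEMO_FIXUP(addr) ({						\
	uintptr_t a_ = (uintptr_t)(addr);				\
	(a_ >= DEMO_XIP_PHYS_ADDR &&					\
	 a_ < DEMO_XIP_PHYS_ADDR + DEMO_XIP_SIZE) ?			\
		a_ - DEMO_XIP_PHYS_ADDR + DEMO_PHYS_RAM_BASE : a_;	\
})

/* Helper with a visible side effect, used to count evaluations. */
static unsigned int demo_calls;

static uintptr_t demo_next_addr(void)
{
	demo_calls++;
	return DEMO_XIP_PHYS_ADDR + 0x100;
}
```

With an argument that has a side effect, such as DEMO_FIXUP(demo_next_addr()),
the naive form calls the helper three times for an in-range address, while the
temporary-variable form calls it once.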

  arch/riscv/Kconfig  |  51 ++-
  arch/riscv/Makefile |   8 +-
  arch/riscv/boot/Makefile|  13 +++
  arch/riscv/include/asm/page.h   |  28 ++
  arch/riscv/include/asm/pgtable.h|  25 +-
  arch/riscv/kernel/head.S|  46 +-
  arch/riscv/kernel/head.h|   3 +
  arch/riscv/kernel/setup.c   |  10 ++-
  arch/riscv/kernel/vmlinux-xip.lds.S | 133 
  arch/riscv/kernel/vmlinux.lds.S |   6 ++
  arch/riscv/mm/init.c| 118 ++--
  11 files changed, 424 insertions(+), 17 deletions(-)
  create mode 100644 arch/riscv/kernel/vmlinux-xip.lds.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8ea60a0a19ae..4d0153805927 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -28,7 +28,7 @@ config RISCV
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_SET_DIRECT_MAP
select ARCH_HAS_SET_MEMORY
-   select ARCH_HAS_STRICT_KERNEL_RWX if MMU
+   select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
@@ -441,7 +441,7 @@ config EFI_STUB
  
  config EFI

bool "UEFI runtime support"
-   depends on OF
+   depends on OF && !XIP_KERNEL
select LIBFDT
select UCS2_STRING
select EFI_PARAMS_FROM_FDT
@@ -465,11 +465,56 @@ config STACKPROTECTOR_PER_TASK
def_bool y
depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_TLS
  
+config PHYS_RAM_BASE_FIXED

+   bool "Explicitly specified physical RAM address"
+   default n
+
+config PHYS_RAM_BASE
+   hex "Platform Physical RAM address"
+   depends on PHYS_RAM_BASE_FIXED
+   default "0x8000"
+   help
+ This is the physical address of RAM in the system. It has to be
+ explicitly specified to run early relocations of read-write data
+ from flash to RAM.
+
+config XIP_KERNEL
+   bool "Kernel Execute-In-Place from ROM"
+   depends on MMU
+   select PHYS_RAM_BASE_FIXED
+   help
+ Execute-In-Place allows the kernel to run from non-volatile storage

Re: [PATCH v1] drm/radeon: Fix a missing check bug in radeon_dp_mst_detect()

2021-04-08 Thread Alex Deucher
Applied.  Thanks!

Alex

On Wed, Apr 7, 2021 at 2:23 AM  wrote:
>
> From: Yingjie Wang 
>
> In radeon_dp_mst_detect(), we should check whether or not @connector
> has been unregistered from userspace. If the connector is unregistered,
> we should return disconnected status.
>
> Fixes: 9843ead08f18 ("drm/radeon: add DisplayPort MST support (v2)")
> Signed-off-by: Yingjie Wang 
> ---
>  drivers/gpu/drm/radeon/radeon_dp_mst.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_dp_mst.c 
> b/drivers/gpu/drm/radeon/radeon_dp_mst.c
> index 2c32186c4acd..4e4c937c36c6 100644
> --- a/drivers/gpu/drm/radeon/radeon_dp_mst.c
> +++ b/drivers/gpu/drm/radeon/radeon_dp_mst.c
> @@ -242,6 +242,9 @@ radeon_dp_mst_detect(struct drm_connector *connector,
> to_radeon_connector(connector);
> struct radeon_connector *master = radeon_connector->mst_port;
>
> +   if (drm_connector_is_unregistered(connector))
> +   return connector_status_disconnected;
> +
> return drm_dp_mst_detect_port(connector, ctx, &master->mst_mgr,
>   radeon_connector->port);
>  }
> --
> 2.7.4
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/amd/pm: convert sysfs snprintf to sysfs_emit

2021-04-07 Thread Alex Deucher
On Tue, Apr 6, 2021 at 10:13 AM Carlis  wrote:
>
> From: Xuezhi Zhang 
>
> Fix the following coccicheck warning:
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:1940:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:1978:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:2022:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:294:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:154:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:496:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:512:9-17:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:1740:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:1667:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:2074:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:2047:9-17:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:2768:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:2738:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:2442:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:3246:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:3253:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:2458:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:3047:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:3133:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:3209:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:3216:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:2410:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:2496:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:2470:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:2426:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:2965:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:2972:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:3006:8-16:
> WARNING: use scnprintf or sprintf
> drivers/gpu/drm/amd/pm//amdgpu_pm.c:3013:8-16:
> WARNING: use scnprintf or sprintf
>
> Signed-off-by: Xuezhi Zhang 

I already applied a similar patch last week.

Thanks,

Alex


> ---
>  drivers/gpu/drm/amd/pm/amdgpu_pm.c | 58 +++---
>  1 file changed, 29 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
> b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> index 5fa65f191a37..2777966ec1ca 100644
> --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> @@ -151,7 +151,7 @@ static ssize_t amdgpu_get_power_dpm_state(struct device 
> *dev,
> pm_runtime_mark_last_busy(ddev->dev);
> pm_runtime_put_autosuspend(ddev->dev);
>
> -   return snprintf(buf, PAGE_SIZE, "%s\n",
> +   return sysfs_emit(buf, "%s\n",
> (pm == POWER_STATE_TYPE_BATTERY) ? "battery" :
> (pm == POWER_STATE_TYPE_BALANCED) ? "balanced" : 
> "performance");
>  }
> @@ -291,7 +291,7 @@ static ssize_t 
> amdgpu_get_power_dpm_force_performance_level(struct device *dev,
> pm_runtime_mark_last_busy(ddev->dev);
> pm_runtime_put_autosuspend(ddev->dev);
>
> -   return snprintf(buf, PAGE_SIZE, "%s\n",
> +   return sysfs_emit(buf, "%s\n",
> (level == AMD_DPM_FORCED_LEVEL_AUTO) ? "auto" :
> (level == AMD_DPM_FORCED_LEVEL_LOW) ? "low" :
> (level == AMD_DPM_FORCED_LEVEL_HIGH) ? "high" :
> @@ -493,7 +493,7 @@ static ssize_t amdgpu_get_pp_cur_state(struct device *dev,
> if (i == data.nums)
> i = -EINVAL;
>
> -   return snprintf(buf, PAGE_SIZE, "%d\n", i);
> +   return sysfs_emit(buf, "%d\n", i);
>  }
>
>  static ssize_t amdgpu_get_pp_force_state(struct device *dev,
> @@ -509,7 +509,7 @@ static ssize_t amdgpu_get_pp_force_state(struct device 
> *dev,
> if (adev->pp_force_state_enabled)
> return amdgpu_get_pp_cur_state(dev, attr, buf);
> else
> -  

Re: [PATCH] driver: of: Properly truncate command line if too long

2021-04-07 Thread Alex Ghiti

Hi Andy,

On 4/6/21 at 6:56 PM, Andy Shevchenko wrote:



On Tuesday, March 16, 2021, Alexandre Ghiti <a...@ghiti.fr> wrote:


In case the command line given by the user is too long, warn about it
and truncate it to the last full argument.

This is what efi already does in commit 80b1bfe1cb2f ("efi/libstub:
Don't parse overlong command lines").

Reported-by: Dmitry Vyukov <dvyu...@google.com>
Signed-off-by: Alexandre Ghiti <a...@ghiti.fr>
---
  drivers/of/fdt.c | 21 -
  1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index dcc1dd96911a..de4c6f9bac39 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -25,6 +25,7 @@
  #include 
  #include 
  #include 
+#include 

  #include   /* for COMMAND_LINE_SIZE */
  #include 
@@ -1050,9 +1051,27 @@ int __init early_init_dt_scan_chosen(unsigned
long node, const char *uname,

         /* Retrieve command line */
         p = of_get_flat_dt_prop(node, "bootargs", &l);
-       if (p != NULL && l > 0)
+       if (p != NULL && l > 0) {
                 strlcpy(data, p, min(l, COMMAND_LINE_SIZE));

+               /*
+                * If the given command line size is larger than
+                * COMMAND_LINE_SIZE, truncate it to the last complete
+                * parameter.
+                */
+               if (l > COMMAND_LINE_SIZE) {
+                       char *cmd_p = (char *)data +
COMMAND_LINE_SIZE - 1;
+
+                       while (!isspace(*cmd_p))
+                               cmd_p--;


Shouldn’t you check for cmd_p being always bigger than or equal to data?


Yes you're right.



+
+                       *cmd_p = '\0';
+
+                       pr_err("Command line is too long: truncated
to %d bytes\n",
+                              (int)(cmd_p - (char *)data + 1));


Do you really need that casting?


No, I can use %td to print a pointer difference.

I'll send a v2.

Thanks,

Alex



+               }
+       }
+
         /*
          * CONFIG_CMDLINE is meant to be a default in case nothing else
          * managed to set the command line, unless CONFIG_CMDLINE_FORCE
-- 
2.20.1




--
With Best Regards,
Andy Shevchenko
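
For reference, the truncation logic discussed above, with the lower-bound
check Andy asked for, might look roughly like the userspace sketch below. It
is not the v2 patch: demo_copy_cmdline() and its parameters are hypothetical
names, and plain memcpy() stands in for the kernel's strlcpy().

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, which holds cap bytes including the terminating NUL.
 * If src does not fit, cut the copy at the last whitespace so that no
 * partial parameter is left behind. Returns the resulting length of dst.
 */
static size_t demo_copy_cmdline(char *dst, size_t cap, const char *src)
{
	size_t len = strlen(src);
	char *p;

	if (cap == 0)
		return 0;

	if (len < cap) {
		/* Everything fits, including the NUL. */
		memcpy(dst, src, len + 1);
		return len;
	}

	/* Keep the first cap - 1 bytes and terminate. */
	memcpy(dst, src, cap - 1);
	dst[cap - 1] = '\0';

	/* Walk back to the last whitespace, never moving before dst. */
	p = dst + cap - 1;
	while (p > dst && !isspace((unsigned char)*p))
		p--;
	*p = '\0';

	return (size_t)(p - dst);
}
```

If the line contains no whitespace at all, the sketch returns an empty string
rather than walking past the start of the buffer. The returned length is a
size_t and would be printed with %zu; a raw pointer difference, as in the
reply above, is a ptrdiff_t and takes %td.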




Re: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages

2021-04-06 Thread Alex Deucher
On Mon, Mar 22, 2021 at 6:34 AM Christian König
 wrote:
>
> Hi Daniel,
>
> On 22.03.21 at 10:38, Daniel Gomez wrote:
> > On Fri, 19 Mar 2021 at 21:29, Felix Kuehling  wrote:
> >> This caused a regression in kfdtest in a large-buffer stress test after
> >> memory allocation for user pages fails:
> > I'm sorry to hear that. BTW, I guess you meant amdgpu leak patch and
> > not this one.
> > Just some background for the mem leak patch, if it helps to understand this:
> > The leak was introduced here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0b988ca1c7c4c73983b4ea96ef7c2af2263c87eb
> > where the bound status was introduced for all drm drivers including
> > radeon and amdgpu. So this patch just reverts the logic to the
> > original code but keeps the bound status. In my case, the binding
> > code allocates the user pages memory and returns without binding (at
> > amdgpu_gtt_mgr_has_gart_addr). So,
> > when the unbinding happens, the memory needs to be cleared to prevent the 
> > leak.
>
> Ah, now I understand what's happening here. Daniel your patch is not
> really correct.
>
> The problem is rather that we don't set the tt object to bound if it
> doesn't have a GTT address.
>
> Going to provide a patch for this.

Did this patch ever land?

Alex

>
> Regards,
> Christian.
>
> >
> >> [17359.536303] amdgpu: init_user_pages: Failed to get user pages: -16
> >> [17359.543746] BUG: kernel NULL pointer dereference, address: 
> >> 
> >> [17359.551494] #PF: supervisor read access in kernel mode
> >> [17359.557375] #PF: error_code(0x) - not-present page
> >> [17359.563247] PGD 0 P4D 0
> >> [17359.566514] Oops:  [#1] SMP PTI
> >> [17359.570728] CPU: 8 PID: 5944 Comm: kfdtest Not tainted 
> >> 5.11.0-kfd-fkuehlin #193
> >> [17359.578760] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 3201 
> >> 06/17/2016
> >> [17359.586971] RIP: 0010:amdgpu_ttm_backend_unbind+0x52/0x110 [amdgpu]
> >> [17359.594075] Code: 48 39 c6 74 1b 8b 53 0c 48 8d bd 80 a1 ff ff e8 24 62 
> >> 00 00 85 c0 0f 85 ab 00 00 00 c6 43 54 00 5b 5d c3 48 8b 46 10 8b 4e 50 
> >> <48> 8b 30 48 85 f6 74 ba 8b 50 0c 48 8b bf 80 a1 ff ff 83 e1 01 45
> >> [17359.614340] RSP: 0018:a4764971fc98 EFLAGS: 00010206
> >> [17359.620315] RAX:  RBX: 950e8d4edf00 RCX: 
> >> 
> >> [17359.628204] RDX:  RSI: 950e8d4edf00 RDI: 
> >> 950eadec5e80
> >> [17359.636084] RBP: 950eadec5e80 R08:  R09: 
> >> 
> >> [17359.643958] R10: 0246 R11: 0001 R12: 
> >> 950c03377800
> >> [17359.651833] R13: 950eadec5e80 R14: 950c03377858 R15: 
> >> 
> >> [17359.659701] FS:  7febb20cb740() GS:950ebfc0() 
> >> knlGS:
> >> [17359.668528] CS:  0010 DS:  ES:  CR0: 80050033
> >> [17359.675012] CR2:  CR3: 0006d700e005 CR4: 
> >> 001706e0
> >> [17359.682883] Call Trace:
> >> [17359.686063]  amdgpu_ttm_backend_destroy+0x12/0x70 [amdgpu]
> >> [17359.692349]  ttm_bo_cleanup_memtype_use+0x37/0x60 [ttm]
> >> [17359.698307]  ttm_bo_release+0x278/0x5e0 [ttm]
> >> [17359.703385]  amdgpu_bo_unref+0x1a/0x30 [amdgpu]
> >> [17359.708701]  amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x7e5/0x910 
> >> [amdgpu]
> >> [17359.716307]  kfd_ioctl_alloc_memory_of_gpu+0x11a/0x220 [amdgpu]
> >> [17359.723036]  kfd_ioctl+0x223/0x400 [amdgpu]
> >> [17359.728017]  ? kfd_dev_is_large_bar+0x90/0x90 [amdgpu]
> >> [17359.734152]  __x64_sys_ioctl+0x8b/0xd0
> >> [17359.738796]  do_syscall_64+0x2d/0x40
> >> [17359.743259]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> [17359.749205] RIP: 0033:0x7febb083b6d7
> >> [17359.753681] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 
> >> c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 
> >> <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
> >> [17359.774340] RSP: 002b:7ffdb5522cd8 EFLAGS: 0202 ORIG_RAX: 
> >> 0010
> >> [17359.782668] RAX: ffda RBX: 0001 RCX: 
> >> 7febb083b6d7
> >> [17359.790566] RDX: 7ffdb5522d60 RSI: c0284b16 RDI: 
> >> 0003
> >> [17359.798459] RBP: 7ffdb5522d10 R08: 7ffdb5522dd0 R09: 
> >> c

Re: [PATCH 1/2] vfio/pci: remove vfio_pci_nvlink2

2021-04-06 Thread Alex Williamson
On Fri, 26 Mar 2021 07:13:10 +0100
Christoph Hellwig  wrote:

> This driver never had any open userspace (which for VFIO would include
> VM kernel drivers) that use it, and thus should never have been added
> by our normal userspace ABI rules.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Greg Kroah-Hartman 
> ---
>  drivers/vfio/pci/Kconfig|   6 -
>  drivers/vfio/pci/Makefile   |   1 -
>  drivers/vfio/pci/vfio_pci.c |  18 -
>  drivers/vfio/pci/vfio_pci_nvlink2.c | 490 
>  drivers/vfio/pci/vfio_pci_private.h |  14 -
>  include/uapi/linux/vfio.h   |  38 +--
>  6 files changed, 4 insertions(+), 563 deletions(-)
>  delete mode 100644 drivers/vfio/pci/vfio_pci_nvlink2.c

Hearing no objections, applied to vfio next branch for v5.13.  Thanks,

Alex



Re: [PATCH v1] vfio/type1: Remove the almost unused check in vfio_iommu_type1_unpin_pages

2021-04-06 Thread Alex Williamson
On Tue, 6 Apr 2021 21:50:09 +0800
Shenming Lu  wrote:

> The check i > npage at the end of vfio_iommu_type1_unpin_pages is unused
> unless npage < 0, but if npage < 0, this function will return npage, which
> should return -EINVAL instead. So let's just check the parameter npage at
> the start of the function. By the way, replace unpin_exit with break.
> 
> Signed-off-by: Shenming Lu 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 45cbfd4879a5..fd4213c41743 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -960,7 +960,7 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data,
>   bool do_accounting;
>   int i;
>  
> - if (!iommu || !user_pfn)
> + if (!iommu || !user_pfn || npage <= 0)
>   return -EINVAL;
>  
>   /* Supported for v2 version only */
> @@ -977,13 +977,13 @@ static int vfio_iommu_type1_unpin_pages(void 
> *iommu_data,
>   iova = user_pfn[i] << PAGE_SHIFT;
>   dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
>   if (!dma)
> - goto unpin_exit;
> + break;
> +
>   vfio_unpin_page_external(dma, iova, do_accounting);
>   }
>  
> -unpin_exit:
>   mutex_unlock(&iommu->lock);
> - return i > npage ? npage : (i > 0 ? i : -EINVAL);
> + return i > 0 ? i : -EINVAL;
>  }
>  
>  static long vfio_sync_unpin(struct vfio_dma *dma, struct vfio_domain *domain,

Very odd behavior previously.  Applied to vfio next branch for v5.13.
Thanks,

Alex
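
To make the "very odd behavior" concrete, here is a small userspace model of
the return logic before and after the patch. It is only a sketch: demo_unpin()
and its "found" parameter (how many pages the lookup succeeds for) are
invented for illustration, and no real pinning happens.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Model of the loop and return value in vfio_iommu_type1_unpin_pages().
 * npage: number of pages the caller asks to unpin
 * found: hypothetical count of pages for which the lookup succeeds
 * old:   true selects the pre-patch return expression
 */
static int demo_unpin(int npage, int found, bool old)
{
	int i;

	/* Post-patch: reject a non-positive count before doing any work. */
	if (!old && npage <= 0)
		return -EINVAL;

	for (i = 0; i < npage; i++) {
		if (i >= found)
			break;	/* lookup failed, stop early */
	}

	if (old)
		/* i > npage only holds when npage < 0, so npage leaks out. */
		return i > npage ? npage : (i > 0 ? i : -EINVAL);

	return i > 0 ? i : -EINVAL;
}
```

In this model, demo_unpin(-3, 0, true) returns -3, an arbitrary negative value
that a caller could misread as an errno, whereas the post-patch path returns
-EINVAL. For positive counts both variants return the number of pages
processed.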



Re: [PATCH 0/4] vfio: fix a couple of spelling mistakes detected by codespell tool

2021-04-06 Thread Alex Williamson
On Fri, 26 Mar 2021 16:35:24 +0800
Zhen Lei  wrote:

> This detection and correction covers the entire drivers/vfio directory.
> 
> Zhen Lei (4):
>   vfio/type1: fix a couple of spelling mistakes
>   vfio/mdev: Fix spelling mistake "interal" -> "internal"
>   vfio/pci: fix a couple of spelling mistakes
>   vfio/platform: Fix spelling mistake "registe" -> "register"
> 
>  drivers/vfio/mdev/mdev_private.h | 2 +-
>  drivers/vfio/pci/vfio_pci.c  | 2 +-
>  drivers/vfio/pci/vfio_pci_config.c   | 2 +-
>  drivers/vfio/pci/vfio_pci_nvlink2.c  | 4 ++--
>  drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c | 2 +-
>  drivers/vfio/vfio_iommu_type1.c  | 6 +++---
>  6 files changed, 9 insertions(+), 9 deletions(-)
> 

Applied to vfio next branch for v5.13.  Thanks,

Alex



Re: [PATCH] vfio: pci: Spello fix in the file vfio_pci.c

2021-04-06 Thread Alex Williamson
On Sun, 14 Mar 2021 10:59:25 +0530
Bhaskar Chowdhury  wrote:

> s/permision/permission/
> 
> Signed-off-by: Bhaskar Chowdhury 
> ---
>  drivers/vfio/pci/vfio_pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 706de3ef94bb..62f137692a4f 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -2411,7 +2411,7 @@ static int __init vfio_pci_init(void)
>  {
>   int ret;
> 
> - /* Allocate shared config space permision data used by all devices */
> + /* Allocate shared config space permission data used by all devices */
>   ret = vfio_pci_init_perm_bits();
>   if (ret)
>   return ret;
> --
> 2.26.2
> 

Applied to vfio next branch for v5.13.  Thanks,

Alex



Re: [PATCH v3 2/5] RISC-V: Add kexec support

2021-04-06 Thread Alex Ghiti



On 4/5/21 at 4:57 AM, Nick Kossifidis wrote:

This patch adds support for kexec on RISC-V. On SMP systems it depends
on HOTPLUG_CPU in order to be able to bring up all harts after kexec.
It also needs a recent OpenSBI version that supports the HSM extension.
I tested it on riscv64 QEMU on both an smp and a non-smp system.

v5:
  * For now depend on MMU, further changes needed for NOMMU support
  * Make sure stvec is aligned
  * Cleanup some unneeded fences
  * Verify control code's buffer size
  * Compile kexec_relocate.S with medany and norelax

v4:
  * No functional changes, just re-based

v3:
  * Use the new smp_shutdown_nonboot_cpus() call.
  * Move riscv_kexec_relocate to .rodata

v2:
  * Pass needed parameters as arguments to riscv_kexec_relocate
instead of using global variables.
  * Use kimage_arch to hold the fdt address of the included fdt.
  * Use SYM_* macros on kexec_relocate.S.
  * Compatibility with STRICT_KERNEL_RWX.
  * Compatibility with HOTPLUG_CPU for SMP
  * Small cleanups

Signed-off-by: Nick Kossifidis 
---
  arch/riscv/Kconfig |  15 +++
  arch/riscv/include/asm/kexec.h |  47 
  arch/riscv/kernel/Makefile |   5 +
  arch/riscv/kernel/kexec_relocate.S | 156 
  arch/riscv/kernel/machine_kexec.c  | 186 +
  5 files changed, 409 insertions(+)
  create mode 100644 arch/riscv/include/asm/kexec.h
  create mode 100644 arch/riscv/kernel/kexec_relocate.S
  create mode 100644 arch/riscv/kernel/machine_kexec.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8ea60a0a1..3716262ef 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -389,6 +389,21 @@ config RISCV_SBI_V01
help
  This config allows kernel to use SBI v0.1 APIs. This will be
  deprecated in future once legacy M-mode software are no longer in use.
+
+config KEXEC
+   bool "Kexec system call"
+   select KEXEC_CORE
+   select HOTPLUG_CPU if SMP
+   depends on MMU
+   help
+ kexec is a system call that implements the ability to shutdown your
+ current kernel, and to start another kernel. It is like a reboot
+ but it is independent of the system firmware. And like a reboot
+ you can start any kernel with it, not just Linux.
+
+ The name comes from the similarity to the exec system call.
+
+
  endmenu
  
  menu "Boot options"

diff --git a/arch/riscv/include/asm/kexec.h b/arch/riscv/include/asm/kexec.h
new file mode 100644
index 0..efc69feb4
--- /dev/null
+++ b/arch/riscv/include/asm/kexec.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 FORTH-ICS/CARV
+ *  Nick Kossifidis 
+ */
+
+#ifndef _RISCV_KEXEC_H
+#define _RISCV_KEXEC_H
+
+/* Maximum physical address we can use pages from */
+#define KEXEC_SOURCE_MEMORY_LIMIT (-1UL)
+
+/* Maximum address we can reach in physical address mode */
+#define KEXEC_DESTINATION_MEMORY_LIMIT (-1UL)
+
+/* Maximum address we can use for the control code buffer */
+#define KEXEC_CONTROL_MEMORY_LIMIT (-1UL)
+
+/* Reserve a page for the control code buffer */
+#define KEXEC_CONTROL_PAGE_SIZE 4096


PAGE_SIZE instead?


+
+#define KEXEC_ARCH KEXEC_ARCH_RISCV
+
+static inline void
+crash_setup_regs(struct pt_regs *newregs,
+struct pt_regs *oldregs)
+{
+   /* Dummy implementation for now */
+}
+
+
+#define ARCH_HAS_KIMAGE_ARCH
+
+struct kimage_arch {
+   unsigned long fdt_addr;
+};
+
+const extern unsigned char riscv_kexec_relocate[];
+const extern unsigned int riscv_kexec_relocate_size;
+
+typedef void (*riscv_kexec_do_relocate)(unsigned long first_ind_entry,
+   unsigned long jump_addr,
+   unsigned long fdt_addr,
+   unsigned long hartid,
+   unsigned long va_pa_off);
+
+#endif
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 3dc0abde9..c2594018c 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -9,6 +9,10 @@ CFLAGS_REMOVE_patch.o  = $(CC_FLAGS_FTRACE)
  CFLAGS_REMOVE_sbi.o   = $(CC_FLAGS_FTRACE)
  endif
  
+ifdef CONFIG_KEXEC

+AFLAGS_kexec_relocate.o := -mcmodel=medany -mno-relax
+endif
+
  extra-y += head.o
  extra-y += vmlinux.lds
  
@@ -54,6 +58,7 @@ obj-$(CONFIG_SMP) += cpu_ops_sbi.o

  endif
  obj-$(CONFIG_HOTPLUG_CPU) += cpu-hotplug.o
  obj-$(CONFIG_KGDB)+= kgdb.o
+obj-${CONFIG_KEXEC}+= kexec_relocate.o machine_kexec.o


The other obj-$() lines use parentheses.

  
  obj-$(CONFIG_JUMP_LABEL)	+= jump_label.o
  
diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S

new file mode 100644
index 0..616c20771
--- /dev/null
+++ b/arch/riscv/kernel/kexec_relocate.S
@@ -0,0 +1,156 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 FORTH-ICS/CARV
+ *  Nick Kossifidis 
+ */
+
+#include 

Re: [PATCH v3 4/5] RISC-V: Add kdump support

2021-04-06 Thread Alex Ghiti

Hi Nick,

On 4/5/21 at 4:57 AM, Nick Kossifidis wrote:

This patch adds support for kdump, the kernel will reserve a
region for the crash kernel and jump there on panic. In order
for userspace tools (kexec-tools) to prepare the crash kernel
kexec image, we also need to expose some information on
/proc/iomem for the memory regions used by the kernel and for
the region reserved for crash kernel. Note that on userspace
the device tree is used to determine the system's memory
layout so the "System RAM" on /proc/iomem is ignored.

I tested this on riscv64 qemu and it works as expected. You may
test it by triggering a crash through /proc/sysrq-trigger:

echo c > /proc/sysrq-trigger

v3:
  * Move ELF_CORE_COPY_REGS to asm/elf.h instead of uapi/asm/elf.h
  * Set stvec when disabling MMU
  * Minor cleanups and re-base

v2:
  * Properly populate the ioresources tree, so that it can be
used later on for implementing strict /dev/mem.
  * Minor cleanups and re-base

Signed-off-by: Nick Kossifidis 
---
  arch/riscv/include/asm/elf.h|  6 +++
  arch/riscv/include/asm/kexec.h  | 19 ---
  arch/riscv/kernel/Makefile  |  2 +-
  arch/riscv/kernel/crash_save_regs.S | 56 +
  arch/riscv/kernel/kexec_relocate.S  | 68 -
  arch/riscv/kernel/machine_kexec.c   | 43 +---
  arch/riscv/kernel/setup.c   | 11 -
  arch/riscv/mm/init.c| 77 +
  8 files changed, 255 insertions(+), 27 deletions(-)
  create mode 100644 arch/riscv/kernel/crash_save_regs.S

diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
index 5c725e1df..f4b490cd0 100644
--- a/arch/riscv/include/asm/elf.h
+++ b/arch/riscv/include/asm/elf.h
@@ -81,4 +81,10 @@ extern int arch_setup_additional_pages(struct linux_binprm 
*bprm,
int uses_interp);
  #endif /* CONFIG_MMU */
  
+#define ELF_CORE_COPY_REGS(dest, regs)			\

+do {   \
+   *(struct user_regs_struct *)&(dest) =   \
+   *(struct user_regs_struct *)regs;   \
+} while (0);
+
  #endif /* _ASM_RISCV_ELF_H */
diff --git a/arch/riscv/include/asm/kexec.h b/arch/riscv/include/asm/kexec.h
index efc69feb4..4fd583acc 100644
--- a/arch/riscv/include/asm/kexec.h
+++ b/arch/riscv/include/asm/kexec.h
@@ -21,11 +21,16 @@
  
  #define KEXEC_ARCH KEXEC_ARCH_RISCV
  
+extern void riscv_crash_save_regs(struct pt_regs *newregs);

+
  static inline void
  crash_setup_regs(struct pt_regs *newregs,
 struct pt_regs *oldregs)
  {
-   /* Dummy implementation for now */
+   if (oldregs)
+   memcpy(newregs, oldregs, sizeof(struct pt_regs));
+   else
+   riscv_crash_save_regs(newregs);
  }
  
  
@@ -38,10 +43,12 @@ struct kimage_arch {

  const extern unsigned char riscv_kexec_relocate[];
  const extern unsigned int riscv_kexec_relocate_size;
  
-typedef void (*riscv_kexec_do_relocate)(unsigned long first_ind_entry,

-   unsigned long jump_addr,
-   unsigned long fdt_addr,
-   unsigned long hartid,
-   unsigned long va_pa_off);
+typedef void (*riscv_kexec_method)(unsigned long first_ind_entry,
+  unsigned long jump_addr,
+  unsigned long fdt_addr,
+  unsigned long hartid,
+  unsigned long va_pa_off);
+
+extern riscv_kexec_method riscv_kexec_norelocate;
  
  #endif

diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index c2594018c..07f676ad3 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -58,7 +58,7 @@ obj-$(CONFIG_SMP) += cpu_ops_sbi.o
  endif
  obj-$(CONFIG_HOTPLUG_CPU) += cpu-hotplug.o
  obj-$(CONFIG_KGDB)+= kgdb.o
-obj-${CONFIG_KEXEC}+= kexec_relocate.o machine_kexec.o
+obj-${CONFIG_KEXEC}+= kexec_relocate.o crash_save_regs.o 
machine_kexec.o
  
  obj-$(CONFIG_JUMP_LABEL)	+= jump_label.o
  
diff --git a/arch/riscv/kernel/crash_save_regs.S b/arch/riscv/kernel/crash_save_regs.S

new file mode 100644
index 0..7832fb763
--- /dev/null
+++ b/arch/riscv/kernel/crash_save_regs.S
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020 FORTH-ICS/CARV
+ *  Nick Kossifidis 
+ */
+
+#include  /* For RISCV_* and REG_* macros */
+#include  /* For CSR_* macros */
+#include  /* For offsets on pt_regs */
+#include/* For SYM_* macros */
+
+.section ".text"
+SYM_CODE_START(riscv_crash_save_regs)
+   REG_S ra,  PT_RA(a0)/* x1 */
+   REG_S sp,  PT_SP(a0)/* x2 */
+   REG_S gp,  PT_GP(a0)/* x3 */
+   REG_S tp,  PT_TP(a0)/* x4 */
+   REG_S t0,  PT_T0(a0)/* x5 */
+   REG_S t1,  PT_T1(a0)/* x6 */
+   REG_S t2,  

Re: [PATCH v6] RISC-V: enable XIP

2021-04-06 Thread Alex Ghiti



On 4/6/21 at 3:54 AM, Vitaly Wool wrote:

On Tue, Apr 6, 2021 at 8:47 AM Alex Ghiti  wrote:


Hi Vitaly,

On 4/5/21 at 4:34 AM, Vitaly Wool wrote:

On Sun, Apr 4, 2021 at 10:39 AM Vitaly Wool  wrote:


On Sat, Apr 3, 2021 at 12:00 PM Alex Ghiti  wrote:


Hi Vitaly,

On 4/1/21 at 7:10 AM, Alex Ghiti wrote:

On 4/1/21 at 4:52 AM, Vitaly Wool wrote:

Hi Alex,

On Thu, Apr 1, 2021 at 10:11 AM Alex Ghiti  wrote:


Hi,

On 3/30/21 at 4:04 PM, Alex Ghiti wrote:

On 3/30/21 at 3:33 PM, Palmer Dabbelt wrote:

On Tue, 30 Mar 2021 11:39:10 PDT (-0700), a...@ghiti.fr wrote:



On 3/30/21 at 2:26 AM, Vitaly Wool wrote:

On Tue, Mar 30, 2021 at 8:23 AM Palmer Dabbelt
 wrote:


On Sun, 21 Mar 2021 17:12:15 PDT (-0700), vitaly.w...@konsulko.com
wrote:

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage. The physical flash address used
to link the kernel object files and for storing it has to be known
at compile time and is represented by a Kconfig option.

XIP on RISC-V will for the time being only work on MMU-enabled
kernels.

Signed-off-by: Vitaly Wool 

---

Changes in v2:
- dedicated macro for XIP address fixup when MMU is not enabled yet
  o both for 32-bit and 64-bit RISC-V
- SP is explicitly set to a safe place in RAM before __copy_data call
- removed redundant alignment requirements in vmlinux-xip.lds.S
- changed long -> uintptr_t typecast in __XIP_FIXUP macro.
Changes in v3:
- rebased against latest for-next
- XIP address fixup macro now takes an argument
- SMP related fixes
Changes in v4:
- rebased against the current for-next
- less #ifdef's in C/ASM code
- dedicated XIP_FIXUP_OFFSET assembler macro in head.S
- C-specific definitions moved into #ifndef __ASSEMBLY__
- Fixed multi-core boot
Changes in v5:
- fixed build error for non-XIP kernels
Changes in v6:
- XIP_PHYS_RAM_BASE config option renamed to PHYS_RAM_BASE
- added PHYS_RAM_BASE_FIXED config flag to allow usage of
  PHYS_RAM_BASE in non-XIP configurations if needed
- XIP_FIXUP macro rewritten with a tempoarary variable to avoid
side
  effects
- fixed crash for non-XIP kernels that don't use built-in DTB
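
For readers wondering why the v6 changelog mentions a temporary variable: a fixup macro that names its argument several times evaluates it several times, which matters if the argument has side effects. Below is a minimal sketch of the idea, not the macro from the patch. The DEMO_* names and the address values are hypothetical, and the second macro relies on the GNU C statement-expression extension (accepted by gcc and clang).

```c
#include <stdint.h>

/* Hypothetical memory layout, for illustration only (not from the patch). */
#define DEMO_FLASH_BASE 0x20000000UL
#define DEMO_FLASH_SIZE 0x01000000UL
#define DEMO_RAM_BASE   0x80000000UL

/*
 * Naive form: (addr) appears three times, so an argument such as p++
 * would be evaluated more than once.
 */
#define DEMO_FIXUP_NAIVE(addr) \
	(((uintptr_t)(addr) >= DEMO_FLASH_BASE && \
	  (uintptr_t)(addr) < DEMO_FLASH_BASE + DEMO_FLASH_SIZE) ? \
	 (uintptr_t)(addr) - DEMO_FLASH_BASE + DEMO_RAM_BASE : \
	 (uintptr_t)(addr))

/*
 * Temporary-variable form: (addr) is evaluated exactly once and the
 * result is reused. Addresses inside the flash window are rebased onto
 * RAM; everything else is returned unchanged.
 */
#define DEMO_FIXUP(addr) ({ \
	uintptr_t demo_a_ = (uintptr_t)(addr); \
	(demo_a_ >= DEMO_FLASH_BASE && \
	 demo_a_ < DEMO_FLASH_BASE + DEMO_FLASH_SIZE) ? \
		demo_a_ - DEMO_FLASH_BASE + DEMO_RAM_BASE : demo_a_; })
```

With the second form, DEMO_FIXUP(p++) increments p once, while the naive form would increment it up to three times.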


So v5 landed on for-next, which generally means it's best to avoid
re-spinning the patch and instead send along fixups.  That said, the v5
is causing some testing failures for me.

I'm going to drop the v5 for now as I don't have time to test this
tonight.  I'll try and take a look soon, as it will conflict with Alex's
patches.


I can come up with the incremental patch instead pretty much straight
away if that works better.

~Vitaly


 arch/riscv/Kconfig  |  49 ++-
 arch/riscv/Makefile |   8 +-
 arch/riscv/boot/Makefile|  13 +++
 arch/riscv/include/asm/pgtable.h|  65 --
 arch/riscv/kernel/cpu_ops_sbi.c |  11 ++-
 arch/riscv/kernel/head.S|  49 ++-
 arch/riscv/kernel/head.h|   3 +
 arch/riscv/kernel/setup.c   |   8 +-
 arch/riscv/kernel/vmlinux-xip.lds.S | 132

 arch/riscv/kernel/vmlinux.lds.S |   6 ++
 arch/riscv/mm/init.c| 100 +++--
 11 files changed, 426 insertions(+), 18 deletions(-)
 create mode 100644 arch/riscv/kernel/vmlinux-xip.lds.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8ea60a0a19ae..bd6f82240c34 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -441,7 +441,7 @@ config EFI_STUB

 config EFI
  bool "UEFI runtime support"
- depends on OF
+ depends on OF && !XIP_KERNEL
  select LIBFDT
  select UCS2_STRING
  select EFI_PARAMS_FROM_FDT
@@ -465,11 +465,56 @@ config STACKPROTECTOR_PER_TASK
  def_bool y
  depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_TLS

+config PHYS_RAM_BASE_FIXED
+ bool "Explicitly specified physical RAM address"
+ default n
+
+config PHYS_RAM_BASE
+ hex "Platform Physical RAM address"
+ depends on PHYS_RAM_BASE_FIXED
+ default "0x8000"
+ help
+   This is the physical address of RAM in the system. It has to be
+   explicitly specified to run early relocations of read-write data
+   from flash to RAM.
+
+config XIP_KERNEL
+ bool "Kernel Execute-In-Place from ROM"
+ depends on MMU
+ select PHYS_RAM_BASE_FIXED
+ help
+   Execute-In-Place allows the kernel to run from non-volatile storage
+   directly addressable by the CPU, such as NOR flash. This saves RAM
+   space since the text secti

Re: [PATCH] driver: of: Properly truncate command line if too long

2021-04-06 Thread Alex Ghiti

On 4/6/21 at 9:40 AM, Rob Herring wrote:

On Sat, Apr 3, 2021 at 7:09 AM Alex Ghiti  wrote:


Hi,

On 3/16/21 at 3:38 PM, Alexandre Ghiti wrote:

In case the command line given by the user is too long, warn about it
and truncate it to the last full argument.

This is what efi already does in commit 80b1bfe1cb2f ("efi/libstub:
Don't parse overlong command lines").

Reported-by: Dmitry Vyukov 
Signed-off-by: Alexandre Ghiti 
---
   drivers/of/fdt.c | 21 -
   1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index dcc1dd96911a..de4c6f9bac39 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -25,6 +25,7 @@
   #include 
   #include 
   #include 
+#include 

   #include   /* for COMMAND_LINE_SIZE */
   #include 
@@ -1050,9 +1051,27 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,

   /* Retrieve command line */
   p = of_get_flat_dt_prop(node, "bootargs", &l);
- if (p != NULL && l > 0)
+ if (p != NULL && l > 0) {
   strlcpy(data, p, min(l, COMMAND_LINE_SIZE));

+ /*
+  * If the given command line size is larger than
+  * COMMAND_LINE_SIZE, truncate it to the last complete
+  * parameter.
+  */
+ if (l > COMMAND_LINE_SIZE) {
+ char *cmd_p = (char *)data + COMMAND_LINE_SIZE - 1;
+
+ while (!isspace(*cmd_p))
+ cmd_p--;
+
+ *cmd_p = '\0';
+
+ pr_err("Command line is too long: truncated to %d bytes\n",
+(int)(cmd_p - (char *)data + 1));
+ }
+ }
+
   /*
* CONFIG_CMDLINE is meant to be a default in case nothing else
* managed to set the command line, unless CONFIG_CMDLINE_FORCE



Any thoughts on that?


It looks fine to me, but this will need to be adapted to the generic
command line support[1][2] when that is merged. So I've been waiting
to see if that's going to happen this cycle.


Ok I'll take a look then, thanks.

Alex



Rob

[1] 
https://lore.kernel.org/lkml/cover.1616765869.git.christophe.le...@csgroup.eu/
[2] 
https://lore.kernel.org/lkml/41021d66db2ab427c14255d2a24bb4517c8b58fd.1617126961.git.danie...@cisco.com/
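
For illustration, the "truncate to the last full argument" idea from the patch above can be sketched in userspace C. This is only a sketch under stated assumptions, not the kernel code: demo_copy_cmdline() is a hypothetical name, and the backward scan here is bounded by the start of the buffer, whereas the loop in the quoted patch has no explicit lower bound. Unlike the patch, it also keeps the last argument when the cut lands exactly on an argument boundary.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst, which holds cap bytes including
 * the terminating NUL. If src does not fit, keep only whole
 * whitespace-separated arguments. Returns 1 if the command line was
 * truncated, 0 if it was copied unchanged.
 */
static int demo_copy_cmdline(char *dst, size_t cap, const char *src)
{
	size_t len = strlen(src);
	size_t n;

	if (cap == 0)
		return 1;	/* nothing can be stored at all */

	if (len < cap) {
		memcpy(dst, src, len + 1);
		return 0;
	}

	/* At most cap - 1 bytes fit in front of the NUL. */
	n = cap - 1;

	/* If the cut lands inside an argument, back up to where it starts. */
	if (!isspace((unsigned char)src[n])) {
		while (n > 0 && !isspace((unsigned char)src[n - 1]))
			n--;
	}

	/* Drop the separators left at the end of the kept part. */
	while (n > 0 && isspace((unsigned char)src[n - 1]))
		n--;

	memcpy(dst, src, n);
	dst[n] = '\0';
	return 1;
}
```

A caller would print the warning when the helper returns 1. An input with no whitespace at all ends up as an empty string instead of running the scan past the start of the buffer.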



Re: [PATCH] PCI: merge slot and bus reset implementations

2021-04-06 Thread Alex Williamson
On Sun, 4 Apr 2021 11:04:32 +0300
Leon Romanovsky  wrote:

> On Thu, Apr 01, 2021 at 10:56:16AM -0600, Alex Williamson wrote:
> > On Thu, 1 Apr 2021 15:27:37 +0300
> > Leon Romanovsky  wrote:
> >   
> > > On Thu, Apr 01, 2021 at 05:37:16AM +, Raphael Norwitz wrote:  
> > > > Slot resets are bus resets with additional logic to prevent a device
> > > > from being removed during the reset. Currently slot and bus resets have
> > > > separate implementations in pci.c, complicating higher level logic. As
> > > > discussed on the mailing list, they should be combined into a generic
> > > > function which performs an SBR. This change adds a function,
> > > > pci_reset_bus_function(), which first attempts a slot reset and then
> > > > attempts a bus reset if -ENOTTY is returned, such that there is now a
> > > > single device agnostic function to perform an SBR.
> > > > 
> > > > This new function is also needed to add SBR reset quirks and therefore
> > > > is exposed in pci.h.
> > > > 
> > > > Link: https://lkml.org/lkml/2021/3/23/911
> > > > 
> > > > Suggested-by: Alex Williamson 
> > > > Signed-off-by: Amey Narkhede 
> > > > Signed-off-by: Raphael Norwitz 
> > > > ---
> > > >  drivers/pci/pci.c   | 17 +
> > > >  include/linux/pci.h |  1 +
> > > >  2 files changed, 10 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > index 16a17215f633..12a91af2ade4 100644
> > > > --- a/drivers/pci/pci.c
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -4982,6 +4982,13 @@ static int pci_dev_reset_slot_function(struct 
> > > > pci_dev *dev, int probe)
> > > > return pci_reset_hotplug_slot(dev->slot->hotplug, probe);
> > > >  }
> > > >  
> > > > +int pci_reset_bus_function(struct pci_dev *dev, int probe)
> > > > +{
> > > > +   int rc = pci_dev_reset_slot_function(dev, probe);
> > > > +
> > > > +   return (rc == -ENOTTY) ? pci_parent_bus_reset(dev, probe) : rc; 
> > > >
> > > 
> > > > The previous coding style is the preferred one in the Linux kernel.
> > > int rc = pci_dev_reset_slot_function(dev, probe);
> > > if (rc != -ENOTTY)
> > >   return rc;
> > > return pci_parent_bus_reset(dev, probe);  
> > 
> > 
> > That'd be news to me, do you have a reference?  I've never seen
> > complaints for ternaries previously.  Thanks,  
> 
> The complaint is not about ternaries, but about the function call as one of
> the parameters, which makes it harder to read.

Sorry, I don't find a function call as a parameter to a ternary to be
extraordinary, nor do I find it to be a discouraged usage model within
the kernel.  This seems like a pretty low bar for hard to read code.
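
For reference, the two spellings under discussion behave identically. The userspace sketch below shows the "try the slot reset, fall back to the bus reset on -ENOTTY" pattern in both forms. The demo_* names and the stub reset functions are hypothetical stand-ins, not the PCI core functions.

```c
#include <errno.h>

/* Hypothetical stand-in for a reset routine taking a probe flag. */
typedef int (*demo_reset_fn)(int probe);

/* Ternary form, as written in the patch. */
static int demo_reset_ternary(demo_reset_fn slot, demo_reset_fn bus, int probe)
{
	int rc = slot(probe);

	return (rc == -ENOTTY) ? bus(probe) : rc;
}

/* Early-return form, as suggested in review. */
static int demo_reset_early_return(demo_reset_fn slot, demo_reset_fn bus,
				   int probe)
{
	int rc = slot(probe);

	if (rc != -ENOTTY)
		return rc;
	return bus(probe);
}

/* Stub reset paths used to exercise both forms. */
static int demo_slot_unsupported(int probe) { (void)probe; return -ENOTTY; }
static int demo_slot_ok(int probe) { (void)probe; return 0; }
static int demo_slot_busy(int probe) { (void)probe; return -EAGAIN; }
static int demo_bus_ok(int probe) { (void)probe; return 0; }
static int demo_bus_fail(int probe) { (void)probe; return -EIO; }
```

In both forms the bus reset runs only when the slot path reports -ENOTTY. Any other slot result, success or failure, is returned as-is.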



Re: [PATCH v6] RISC-V: enable XIP

2021-04-06 Thread Alex Ghiti

Hi Vitaly,

On 4/5/21 at 4:34 AM, Vitaly Wool wrote:

On Sun, Apr 4, 2021 at 10:39 AM Vitaly Wool  wrote:


On Sat, Apr 3, 2021 at 12:00 PM Alex Ghiti  wrote:


Hi Vitaly,

On 4/1/21 at 7:10 AM, Alex Ghiti wrote:

On 4/1/21 at 4:52 AM, Vitaly Wool wrote:

Hi Alex,

On Thu, Apr 1, 2021 at 10:11 AM Alex Ghiti  wrote:


Hi,

On 3/30/21 at 4:04 PM, Alex Ghiti wrote:

On 3/30/21 at 3:33 PM, Palmer Dabbelt wrote:

On Tue, 30 Mar 2021 11:39:10 PDT (-0700), a...@ghiti.fr wrote:



On 3/30/21 at 2:26 AM, Vitaly Wool wrote:

On Tue, Mar 30, 2021 at 8:23 AM Palmer Dabbelt
 wrote:


On Sun, 21 Mar 2021 17:12:15 PDT (-0700), vitaly.w...@konsulko.com
wrote:

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage. The physical flash address used
to link the kernel object files and for storing it has to be known
at compile time and is represented by a Kconfig option.

XIP on RISC-V will for the time being only work on MMU-enabled
kernels.

Signed-off-by: Vitaly Wool 

---

Changes in v2:
- dedicated macro for XIP address fixup when MMU is not enabled yet
  o both for 32-bit and 64-bit RISC-V
- SP is explicitly set to a safe place in RAM before __copy_data call
- removed redundant alignment requirements in vmlinux-xip.lds.S
- changed long -> uintptr_t typecast in __XIP_FIXUP macro.
Changes in v3:
- rebased against latest for-next
- XIP address fixup macro now takes an argument
- SMP related fixes
Changes in v4:
- rebased against the current for-next
- less #ifdef's in C/ASM code
- dedicated XIP_FIXUP_OFFSET assembler macro in head.S
- C-specific definitions moved into #ifndef __ASSEMBLY__
- Fixed multi-core boot
Changes in v5:
- fixed build error for non-XIP kernels
Changes in v6:
- XIP_PHYS_RAM_BASE config option renamed to PHYS_RAM_BASE
- added PHYS_RAM_BASE_FIXED config flag to allow usage of
  PHYS_RAM_BASE in non-XIP configurations if needed
- XIP_FIXUP macro rewritten with a temporary variable to avoid side
  effects
- fixed crash for non-XIP kernels that don't use built-in DTB


So v5 landed on for-next, which generally means it's best to avoid
re-spinning the patch and instead send along fixups.  That said, the v5
is causing some testing failures for me.

I'm going to drop the v5 for now as I don't have time to test this
tonight.  I'll try and take a look soon, as it will conflict with Alex's
patches.


I can come up with the incremental patch instead pretty much straight
away if that works better.

~Vitaly


arch/riscv/Kconfig  |  49 ++-
arch/riscv/Makefile |   8 +-
arch/riscv/boot/Makefile|  13 +++
arch/riscv/include/asm/pgtable.h|  65 --
arch/riscv/kernel/cpu_ops_sbi.c |  11 ++-
arch/riscv/kernel/head.S|  49 ++-
arch/riscv/kernel/head.h|   3 +
arch/riscv/kernel/setup.c   |   8 +-
arch/riscv/kernel/vmlinux-xip.lds.S | 132

arch/riscv/kernel/vmlinux.lds.S |   6 ++
arch/riscv/mm/init.c| 100 +++--
11 files changed, 426 insertions(+), 18 deletions(-)
create mode 100644 arch/riscv/kernel/vmlinux-xip.lds.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8ea60a0a19ae..bd6f82240c34 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -441,7 +441,7 @@ config EFI_STUB

config EFI
 bool "UEFI runtime support"
- depends on OF
+ depends on OF && !XIP_KERNEL
 select LIBFDT
 select UCS2_STRING
 select EFI_PARAMS_FROM_FDT
@@ -465,11 +465,56 @@ config STACKPROTECTOR_PER_TASK
 def_bool y
 depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_TLS

+config PHYS_RAM_BASE_FIXED
+ bool "Explicitly specified physical RAM address"
+ default n
+
+config PHYS_RAM_BASE
+ hex "Platform Physical RAM address"
+ depends on PHYS_RAM_BASE_FIXED
+ default "0x8000"
+ help
+   This is the physical address of RAM in the system. It has to be
+   explicitly specified to run early relocations of read-write data
+   from flash to RAM.
+
+config XIP_KERNEL
+ bool "Kernel Execute-In-Place from ROM"
+ depends on MMU
+ select PHYS_RAM_BASE_FIXED
+ help
+   Execute-In-Place allows the kernel to run from non-volatile storage
+   directly addressable by the CPU, such as NOR flash. This saves RAM
+   space since the text section of the kernel is not loaded from flash
+   to RAM.  Read-write sections, such as the data section and stack,
+  

Re: [PATCH] driver: of: Properly truncate command line if too long

2021-04-03 Thread Alex Ghiti

Hi,

On 3/16/21 at 3:38 PM, Alexandre Ghiti wrote:

In case the command line given by the user is too long, warn about it
and truncate it to the last full argument.

This is what efi already does in commit 80b1bfe1cb2f ("efi/libstub:
Don't parse overlong command lines").

Reported-by: Dmitry Vyukov 
Signed-off-by: Alexandre Ghiti 
---
  drivers/of/fdt.c | 21 -
  1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index dcc1dd96911a..de4c6f9bac39 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -25,6 +25,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include   /* for COMMAND_LINE_SIZE */

  #include 
@@ -1050,9 +1051,27 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
  
  	/* Retrieve command line */

p = of_get_flat_dt_prop(node, "bootargs", &l);
-   if (p != NULL && l > 0)
+   if (p != NULL && l > 0) {
strlcpy(data, p, min(l, COMMAND_LINE_SIZE));
  
+		/*

+* If the given command line size is larger than
+* COMMAND_LINE_SIZE, truncate it to the last complete
+* parameter.
+*/
+   if (l > COMMAND_LINE_SIZE) {
+   char *cmd_p = (char *)data + COMMAND_LINE_SIZE - 1;
+
+   while (!isspace(*cmd_p))
+   cmd_p--;
+
+   *cmd_p = '\0';
+
+   pr_err("Command line is too long: truncated to %d bytes\n",
+  (int)(cmd_p - (char *)data + 1));
+   }
+   }
+
/*
 * CONFIG_CMDLINE is meant to be a default in case nothing else
 * managed to set the command line, unless CONFIG_CMDLINE_FORCE



Any thoughts on that?

Thanks,

Alex


Re: [PATCH v6] RISC-V: enable XIP

2021-04-03 Thread Alex Ghiti

Hi Vitaly,

On 4/1/21 at 7:10 AM, Alex Ghiti wrote:

On 4/1/21 at 4:52 AM, Vitaly Wool wrote:

Hi Alex,

On Thu, Apr 1, 2021 at 10:11 AM Alex Ghiti  wrote:


Hi,

On 3/30/21 at 4:04 PM, Alex Ghiti wrote:

On 3/30/21 at 3:33 PM, Palmer Dabbelt wrote:

On Tue, 30 Mar 2021 11:39:10 PDT (-0700), a...@ghiti.fr wrote:



On 3/30/21 at 2:26 AM, Vitaly Wool wrote:

On Tue, Mar 30, 2021 at 8:23 AM Palmer Dabbelt
 wrote:


On Sun, 21 Mar 2021 17:12:15 PDT (-0700), vitaly.w...@konsulko.com
wrote:

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage. The physical flash address used
to link the kernel object files and for storing it has to be known
at compile time and is represented by a Kconfig option.

XIP on RISC-V will for the time being only work on MMU-enabled
kernels.

Signed-off-by: Vitaly Wool 

---

Changes in v2:
- dedicated macro for XIP address fixup when MMU is not enabled yet
  o both for 32-bit and 64-bit RISC-V
- SP is explicitly set to a safe place in RAM before __copy_data call
- removed redundant alignment requirements in vmlinux-xip.lds.S
- changed long -> uintptr_t typecast in __XIP_FIXUP macro.
Changes in v3:
- rebased against latest for-next
- XIP address fixup macro now takes an argument
- SMP related fixes
Changes in v4:
- rebased against the current for-next
- less #ifdef's in C/ASM code
- dedicated XIP_FIXUP_OFFSET assembler macro in head.S
- C-specific definitions moved into #ifndef __ASSEMBLY__
- Fixed multi-core boot
Changes in v5:
- fixed build error for non-XIP kernels
Changes in v6:
- XIP_PHYS_RAM_BASE config option renamed to PHYS_RAM_BASE
- added PHYS_RAM_BASE_FIXED config flag to allow usage of
  PHYS_RAM_BASE in non-XIP configurations if needed
- XIP_FIXUP macro rewritten with a temporary variable to avoid side
  effects
- fixed crash for non-XIP kernels that don't use built-in DTB


So v5 landed on for-next, which generally means it's best to avoid
re-spinning the patch and instead send along fixups.  That said, the v5
is causing some testing failures for me.

I'm going to drop the v5 for now as I don't have time to test this
tonight.  I'll try and take a look soon, as it will conflict with Alex's
patches.


I can come up with the incremental patch instead pretty much straight
away if that works better.

~Vitaly


   arch/riscv/Kconfig  |  49 ++-
   arch/riscv/Makefile |   8 +-
   arch/riscv/boot/Makefile    |  13 +++
   arch/riscv/include/asm/pgtable.h    |  65 --
   arch/riscv/kernel/cpu_ops_sbi.c |  11 ++-
   arch/riscv/kernel/head.S    |  49 ++-
   arch/riscv/kernel/head.h    |   3 +
   arch/riscv/kernel/setup.c   |   8 +-
   arch/riscv/kernel/vmlinux-xip.lds.S | 132

   arch/riscv/kernel/vmlinux.lds.S |   6 ++
   arch/riscv/mm/init.c    | 100 +++--
   11 files changed, 426 insertions(+), 18 deletions(-)
   create mode 100644 arch/riscv/kernel/vmlinux-xip.lds.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8ea60a0a19ae..bd6f82240c34 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -441,7 +441,7 @@ config EFI_STUB

   config EFI
    bool "UEFI runtime support"
- depends on OF
+ depends on OF && !XIP_KERNEL
    select LIBFDT
    select UCS2_STRING
    select EFI_PARAMS_FROM_FDT
@@ -465,11 +465,56 @@ config STACKPROTECTOR_PER_TASK
    def_bool y
    depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_TLS

+config PHYS_RAM_BASE_FIXED
+ bool "Explicitly specified physical RAM address"
+ default n
+
+config PHYS_RAM_BASE
+ hex "Platform Physical RAM address"
+ depends on PHYS_RAM_BASE_FIXED
+ default "0x8000"
+ help
+   This is the physical address of RAM in the system. It has to be
+   explicitly specified to run early relocations of read-write data
+   from flash to RAM.
+
+config XIP_KERNEL
+ bool "Kernel Execute-In-Place from ROM"
+ depends on MMU
+ select PHYS_RAM_BASE_FIXED
+ help
+   Execute-In-Place allows the kernel to run from non-volatile storage
+   directly addressable by the CPU, such as NOR flash. This saves RAM
+   space since the text section of the kernel is not loaded from flash
+   to RAM.  Read-write sections, such as the data section and stack,
+   are still copied to RAM.  The XIP kernel is not compressed since
+   it has to run directly from flash, so it will take more space to
+   store it.  The flash 
