[PATCH net-next v7] virtio_net: Support RX hash XDP hint

2024-04-12 Thread Liang Chen
The RSS hash report is a feature that's part of the virtio specification.
Currently, virtio backends like qemu, vdpa (mlx5), and potentially vhost
(still a work in progress as per [1]) support this feature. While the
capability to obtain the RSS hash has been enabled in the normal path,
it's currently missing in the XDP path. Therefore, we are introducing
XDP hints through kfuncs to allow XDP programs to access the RSS hash.

1.
https://lore.kernel.org/all/20231015141644.260646-1-akihiko.od...@daynix.com/#r
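For illustration, a minimal sketch (not part of this patch) of an XDP program
consuming the hint via the bpf_xdp_metadata_rx_hash() kfunc; the program and
section names are illustrative, and vmlinux.h is assumed to provide
struct xdp_md and enum xdp_rss_hash_type:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

/* Kfunc served by drivers implementing the xmo_rx_hash metadata op, as
 * virtio_net does after this patch. Returns 0 on success and -ENODATA
 * when no hash is available (e.g. the device lacks NETIF_F_RXHASH).
 */
extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, __u32 *hash,
				    enum xdp_rss_hash_type *rss_type) __ksym;

SEC("xdp")
int read_rx_hash(struct xdp_md *ctx)
{
	enum xdp_rss_hash_type rss_type;
	__u32 hash;

	if (!bpf_xdp_metadata_rx_hash(ctx, &hash, &rss_type))
		bpf_printk("rx hash 0x%x type %d", hash, rss_type);

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";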

Signed-off-by: Liang Chen 
---
  Changes from v6:
- fix a coding style issue
  Changes from v5:
- Preservation of the hash value has been dropped, following the conclusion
  from discussions in the v3 reviews. The virtio_net driver doesn't
  access/use the virtio_net_hdr after the XDP program execution, so
  nothing tragic should happen. As for the XDP program, if it smashes the
  entry in the virtio header, it is likely buggy anyway. Additionally, the
  Intel IGC driver also does not bother with this particular aspect.
---
 drivers/net/virtio_net.c | 55 
 1 file changed, 55 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c22d1118a133..2a1892b7b8d3 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -4621,6 +4621,60 @@ static void virtnet_set_big_packets(struct virtnet_info *vi, const int mtu)
}
 }
 
+static int virtnet_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash,
+  enum xdp_rss_hash_type *rss_type)
+{
+   const struct xdp_buff *xdp = (void *)_ctx;
+   struct virtio_net_hdr_v1_hash *hdr_hash;
+   struct virtnet_info *vi;
+
+   if (!(xdp->rxq->dev->features & NETIF_F_RXHASH))
+   return -ENODATA;
+
+   vi = netdev_priv(xdp->rxq->dev);
+   hdr_hash = (struct virtio_net_hdr_v1_hash *)(xdp->data - vi->hdr_len);
+
+   switch (__le16_to_cpu(hdr_hash->hash_report)) {
+   case VIRTIO_NET_HASH_REPORT_TCPv4:
+   *rss_type = XDP_RSS_TYPE_L4_IPV4_TCP;
+   break;
+   case VIRTIO_NET_HASH_REPORT_UDPv4:
+   *rss_type = XDP_RSS_TYPE_L4_IPV4_UDP;
+   break;
+   case VIRTIO_NET_HASH_REPORT_TCPv6:
+   *rss_type = XDP_RSS_TYPE_L4_IPV6_TCP;
+   break;
+   case VIRTIO_NET_HASH_REPORT_UDPv6:
+   *rss_type = XDP_RSS_TYPE_L4_IPV6_UDP;
+   break;
+   case VIRTIO_NET_HASH_REPORT_TCPv6_EX:
+   *rss_type = XDP_RSS_TYPE_L4_IPV6_TCP_EX;
+   break;
+   case VIRTIO_NET_HASH_REPORT_UDPv6_EX:
+   *rss_type = XDP_RSS_TYPE_L4_IPV6_UDP_EX;
+   break;
+   case VIRTIO_NET_HASH_REPORT_IPv4:
+   *rss_type = XDP_RSS_TYPE_L3_IPV4;
+   break;
+   case VIRTIO_NET_HASH_REPORT_IPv6:
+   *rss_type = XDP_RSS_TYPE_L3_IPV6;
+   break;
+   case VIRTIO_NET_HASH_REPORT_IPv6_EX:
+   *rss_type = XDP_RSS_TYPE_L3_IPV6_EX;
+   break;
+   case VIRTIO_NET_HASH_REPORT_NONE:
+   default:
+   *rss_type = XDP_RSS_TYPE_NONE;
+   }
+
+   *hash = __le32_to_cpu(hdr_hash->hash_value);
+   return 0;
+}
+
+static const struct xdp_metadata_ops virtnet_xdp_metadata_ops = {
+   .xmo_rx_hash= virtnet_xdp_rx_hash,
+};
+
 static int virtnet_probe(struct virtio_device *vdev)
 {
int i, err = -ENOMEM;
@@ -4747,6 +4801,7 @@ static int virtnet_probe(struct virtio_device *vdev)
  VIRTIO_NET_RSS_HASH_TYPE_UDP_EX);
 
dev->hw_features |= NETIF_F_RXHASH;
+   dev->xdp_metadata_ops = &virtnet_xdp_metadata_ops;
}
 
if (vi->has_rss_hash_report)
-- 
2.40.1




Re: [PATCH net-next v6] virtio_net: Support RX hash XDP hint

2024-04-12 Thread Liang Chen
On Sat, Apr 13, 2024 at 10:11 AM Jakub Kicinski  wrote:
>
> On Thu, 11 Apr 2024 16:52:16 +0800 Liang Chen wrote:
> > + switch (__le16_to_cpu(hdr_hash->hash_report)) {
> > + case VIRTIO_NET_HASH_REPORT_TCPv4:
>
> Please indent things according to the kernel coding style.
>

Sure. Thanks!

> Checkpatch finds 2 problems in this change.
> --
> pw-bot: cr



Re: [PATCH net-next v6] virtio_net: Support RX hash XDP hint

2024-04-12 Thread Jakub Kicinski
On Thu, 11 Apr 2024 16:52:16 +0800 Liang Chen wrote:
> + switch (__le16_to_cpu(hdr_hash->hash_report)) {
> + case VIRTIO_NET_HASH_REPORT_TCPv4:

Please indent things according to the kernel coding style.

Checkpatch finds 2 problems in this change.
-- 
pw-bot: cr



Re: [PATCH v3 5/7] KVM: x86: Participate in bitmap-based PTE aging

2024-04-12 Thread David Matlack
On 2024-04-01 11:29 PM, James Houghton wrote:
> Only handle the TDP MMU case for now. In other cases, if a bitmap was
> not provided, fallback to the slowpath that takes mmu_lock, or, if a
> bitmap was provided, inform the caller that the bitmap is unreliable.

I think this patch will trigger a lockdep assert in

  kvm_tdp_mmu_age_gfn_range
kvm_tdp_mmu_handle_gfn
  for_each_tdp_mmu_root
__for_each_tdp_mmu_root
  kvm_lockdep_assert_mmu_lock_held

... because it walks tdp_mmu_roots without holding mmu_lock.

Yu's patch[1] added a lockless walk to the TDP MMU. We'd need something
similar here and also update the comment above tdp_mmu_roots describing
how tdp_mmu_roots can be read locklessly.

[1] https://lore.kernel.org/kvmarm/zitx64bbx5vdj...@google.com/



Re: [PATCH v3 3/7] KVM: Add basic bitmap support into kvm_mmu_notifier_test/clear_young

2024-04-12 Thread David Matlack
On 2024-04-01 11:29 PM, James Houghton wrote:
> Add kvm_arch_prepare_bitmap_age() for architectures to indicate that
> they support bitmap-based aging in kvm_mmu_notifier_test_clear_young()
> and that they do not need KVM to grab the MMU lock for writing. This
> function allows architectures to do other locking or other preparatory
> work that it needs.

There's a lot going on here. I know it's extra work but I think the
series would be easier to understand and simplify if you introduced the
KVM support for lockless test/clear_young() first, and then introduce
support for the bitmap-based look-around.

Specifically:

 1. Make all test/clear_young() notifiers lockless. i.e. Move the
    mmu_lock into the architecture-specific code (kvm_age_gfn() and
    kvm_test_age_gfn()).

 2. Convert KVM/x86's kvm_{test,}_age_gfn() to be lockless for the TDP
    MMU.

 3. Convert KVM/arm64's kvm_{test,}_age_gfn() to hold the mmu_lock in
    read-mode.

 4. Add bitmap-based look-around support to KVM/x86 and KVM/arm64
    (probably 2-3 patches).

> 
> If an architecture does not implement kvm_arch_prepare_bitmap_age() or
> is unable to do bitmap-based aging at runtime (and marks the bitmap as
> unreliable):
>  1. If a bitmap was provided, we inform the caller that the bitmap is
> unreliable (MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE).
>  2. If a bitmap was not provided, fall back to the old logic.
> 
> Also add logic for architectures to easily use the provided bitmap if
> they are able. The expectation is that the architecture's implementation
> of kvm_gfn_test_age() will use kvm_gfn_record_young(), and
> kvm_gfn_age() will use kvm_gfn_should_age().
> 
> Suggested-by: Yu Zhao 
> Signed-off-by: James Houghton 
> ---
>  include/linux/kvm_host.h | 60 ++
>  virt/kvm/kvm_main.c  | 92 +---
>  2 files changed, 127 insertions(+), 25 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 1800d03a06a9..5862fd7b5f9b 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1992,6 +1992,26 @@ extern const struct _kvm_stats_desc 
> kvm_vm_stats_desc[];
>  extern const struct kvm_stats_header kvm_vcpu_stats_header;
>  extern const struct _kvm_stats_desc kvm_vcpu_stats_desc[];
>  
> +/*
> + * Architectures that support using bitmaps for kvm_age_gfn() and
> + * kvm_test_age_gfn should return true for kvm_arch_prepare_bitmap_age()
> + * and do any work they need to prepare. The subsequent walk will not
> + * automatically grab the KVM MMU lock, so some architectures may opt
> + * to grab it.
> + *
> + * If true is returned, a subsequent call to kvm_arch_finish_bitmap_age() is
> + * guaranteed.
> + */
> +#ifndef kvm_arch_prepare_bitmap_age
> +static inline bool kvm_arch_prepare_bitmap_age(struct mmu_notifier *mn)

I find the name of these architecture callbacks misleading/confusing.
The lockless path is used even when a bitmap is not provided. i.e.
bitmap can be NULL in between kvm_arch_prepare/finish_bitmap_age().

> +{
> + return false;
> +}
> +#endif
> +#ifndef kvm_arch_finish_bitmap_age
> +static inline void kvm_arch_finish_bitmap_age(struct mmu_notifier *mn) {}
> +#endif

kvm_arch_finish_bitmap_age() seems unnecessary. I think the KVM/arm64
code could acquire/release the mmu_lock in read-mode in
kvm_test_age_gfn() and kvm_age_gfn() right?

> +
>  #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
>  static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
>  {
> @@ -2076,9 +2096,16 @@ static inline bool 
> mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm,
>   return READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq;
>  }
>  
> +struct test_clear_young_metadata {
> + unsigned long *bitmap;
> + unsigned long bitmap_offset_end;

bitmap_offset_end is unused.

> + unsigned long end;
> + bool unreliable;
> +};
>  union kvm_mmu_notifier_arg {
>   pte_t pte;
>   unsigned long attributes;
> + struct test_clear_young_metadata *metadata;

nit: Maybe s/metadata/test_clear_young/ ?

>  };
>  
>  struct kvm_gfn_range {
> @@ -2087,11 +2114,44 @@ struct kvm_gfn_range {
>   gfn_t end;
>   union kvm_mmu_notifier_arg arg;
>   bool may_block;
> + bool lockless;

Please document this as it's somewhat subtle. A reader might think this
implies the entire operation runs without taking the mmu_lock.

>  };
>  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
>  bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
>  bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
>  bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
> +
> +static inline void kvm_age_set_unreliable(struct kvm_gfn_range *range)
> +{
> + struct test_clear_young_metadata *args = range->arg.metadata;
> +
> + args->unreliable = true;
> +}
> +static inline unsigned long kvm_young_bitmap_offset(struct kvm_gfn_range 
> *range,
> + 

Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

2024-04-12 Thread David Hildenbrand

On 11.04.24 18:55, Paolo Bonzini wrote:
> On Mon, Apr 8, 2024 at 3:56 PM Peter Xu  wrote:
>> Paolo,
>>
>> I may miss a bunch of details here (as I still remember some change_pte
>> patches previously on the list..), however not sure whether we considered
>> enable it?  Asked because I remember Andrea used to have a custom tree
>> maintaining that part:
>>
>> https://github.com/aagit/aa/commit/c761078df7a77d13ddfaeebe56a0f4bc128b1968
>
> The patch enables it only for KSM, so it would still require a bunch
> of cleanups, for example I also would still use set_pte_at() in all
> the places that are not KSM. This would at least fix the issue with
> the poor documentation of where to use set_pte_at_notify() vs
> set_pte_at().
>
> With regard to the implementation, I like the idea of disabling the
> invalidation on the MMU notifier side, but I would rather have
> MMU_NOTIFIER_CHANGE_PTE as a separate field in the range instead of
> overloading the event field.
>
>> Maybe it can't be enabled for some reason that I overlooked in the current
>> tree, or we just decided to not to?
>
> I have just learnt about the patch, nobody had ever mentioned it even
> though it's almost 2 years old... It's a lot of code though and no one

I assume Andrea used it on his tree where he also has a version of
"randprotect" (even included in that commit subject) to mitigate a KSM
security issue that was reported by some security researchers [1] a
while ago. From what I recall, the industry did not end up caring about
that security issue that much.

IIUC, with "randprotect" we get a lot more R/O protection even when not
de-duplicating a page -- thus the name. Likely, the reporter mentioned
in the commit is a researcher that played with Andrea's fix for the
security issue. But I'm just speculating at this point :)

> has ever reported an issue for over 10 years, so I think it's easiest
> to just rip the code out.

Yes. Can always be readded in a possibly cleaner fashion (like you note
above), when deemed necessary and we are willing to support it.


[1] https://gruss.cc/files/remote_dedup.pdf

--
Cheers,

David / dhildenb




Re: [PATCH v3 1/7] mm: Add a bitmap into mmu_notifier_{clear,test}_young

2024-04-12 Thread David Matlack
On 2024-04-01 11:29 PM, James Houghton wrote:
> The bitmap is provided for secondary MMUs to use if they support it. For
> test_young(), after it returns, the bitmap represents the pages that
> were young in the interval [start, end). For clear_young, it represents
> the pages that we wish the secondary MMU to clear the accessed/young bit
> for.
> 
> If a bitmap is not provided, the mmu_notifier_{test,clear}_young() API
> should be unchanged except that if young PTEs are found and the
> architecture supports passing in a bitmap, instead of returning 1,
> MMU_NOTIFIER_YOUNG_FAST is returned.
> 
> This allows MGLRU's look-around logic to work faster, resulting in a 4%
> improvement in real workloads[1]. Also introduce MMU_NOTIFIER_YOUNG_FAST
> to indicate to main mm that doing look-around is likely to be
> beneficial.
> 
> If the secondary MMU doesn't support the bitmap, it must return
> an int that contains MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE.
> 
> [1]: https://lore.kernel.org/all/20230609005935.42390-1-yuz...@google.com/
> 
> Suggested-by: Yu Zhao 
> Signed-off-by: James Houghton 
> ---
>  include/linux/mmu_notifier.h | 93 +---
>  include/trace/events/kvm.h   | 13 +++--
>  mm/mmu_notifier.c| 20 +---
>  virt/kvm/kvm_main.c  | 19 ++--
>  4 files changed, 123 insertions(+), 22 deletions(-)
> 
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index f349e08a9dfe..daaa9db625d3 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -61,6 +61,10 @@ enum mmu_notifier_event {
>  
>  #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
>  
> +#define MMU_NOTIFIER_YOUNG   (1 << 0)
> +#define MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE (1 << 1)

MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE appears to be unused by all callers
of test/clear_young(). I would vote to remove it.

> +#define MMU_NOTIFIER_YOUNG_FAST  (1 << 2)

Instead of MMU_NOTIFIER_YOUNG_FAST, how about
MMU_NOTIFIER_YOUNG_LOOK_AROUND? i.e. The secondary MMU is returning
saying it recommends doing a look-around and passing in a bitmap?

That would avoid the whole "what does FAST really mean" confusion.

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index fb49c2a60200..ca4b1ef9dfc2 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -917,10 +917,15 @@ static int kvm_mmu_notifier_clear_flush_young(struct 
> mmu_notifier *mn,
>  static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
>   struct mm_struct *mm,
>   unsigned long start,
> - unsigned long end)
> + unsigned long end,
> + unsigned long *bitmap)
>  {
>   trace_kvm_age_hva(start, end);
>  
> + /* We don't support bitmaps. Don't test or clear anything. */
> + if (bitmap)
> + return MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE;

Wouldn't it be a bug to get a bitmap here? The main MM is only supposed
to pass in a bitmap if the secondary MMU returns
MMU_NOTIFIER_YOUNG_FAST, which KVM does not do at this point.

Put another way, this check seems unnecessary.

> +
>   /*
>* Even though we do not flush TLB, this will still adversely
>* affect performance on pre-Haswell Intel EPT, where there is
> @@ -939,11 +944,17 @@ static int kvm_mmu_notifier_clear_young(struct 
> mmu_notifier *mn,
>  
>  static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
>  struct mm_struct *mm,
> -unsigned long address)
> +unsigned long start,
> +unsigned long end,
> +unsigned long *bitmap)
>  {
> - trace_kvm_test_age_hva(address);
> + trace_kvm_test_age_hva(start, end);
> +
> + /* We don't support bitmaps. Don't test or clear anything. */
> + if (bitmap)
> + return MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE;

Same thing here.



Re: [PATCH v3 0/7] mm/kvm: Improve parallelism for access bit harvesting

2024-04-12 Thread David Matlack
On 2024-04-01 11:29 PM, James Houghton wrote:
> This patchset adds a fast path in KVM to test and clear access bits on
> sptes without taking the mmu_lock. It also adds support for using a
> bitmap to (1) test the access bits for many sptes in a single call to
> mmu_notifier_test_young, and to (2) clear the access bits for many ptes
> in a single call to mmu_notifier_clear_young.

How much improvement would we get if we _just_ made test/clear_young
lockless on x86 and hold the read-lock on arm64? And then how much
benefit does the bitmap look-around add on top of that?



[PATCH v15 4/4] remoteproc: zynqmp: parse TCM from device tree

2024-04-12 Thread Tanmay Shah
ZynqMP TCM information was hardcoded in the driver. Now ZynqMP TCM
information is available in the device tree. Parse TCM information in
the driver as per the new bindings.

Signed-off-by: Tanmay Shah 
---

Changes in v15:
  - Use hardcode TCM addresses as fallback method if "reg" unavailable
  - Remove Versal and Versal-net support

Changes in v14:
  - Add Versal platform support
  - Add Versal-NET platform support
  - Maintain backward compatibility for ZynqMP platform and use hardcode
TCM addresses
  - Configure TCM based on xlnx,tcm-mode property for Versal
  - Avoid TCM configuration if that property isn't available in dt

 drivers/remoteproc/xlnx_r5_remoteproc.c | 127 ++--
 1 file changed, 120 insertions(+), 7 deletions(-)

diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
b/drivers/remoteproc/xlnx_r5_remoteproc.c
index 0f942440b4e2..7b1c12108bff 100644
--- a/drivers/remoteproc/xlnx_r5_remoteproc.c
+++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
@@ -74,8 +74,8 @@ struct mbox_info {
 };
 
 /*
- * Hardcoded TCM bank values. This will be removed once TCM bindings are
- * accepted for system-dt specifications and upstreamed in linux kernel
+ * Hardcoded TCM bank values. This will stay in driver to maintain backward
+ * compatibility with device-tree that does not have TCM information.
  */
 static const struct mem_bank_data zynqmp_tcm_banks_split[] = {
	{0xffe00000UL, 0x0, 0x10000UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 64KB each */
@@ -761,6 +761,103 @@ static struct zynqmp_r5_core *zynqmp_r5_add_rproc_core(struct device *cdev)
return ERR_PTR(ret);
 }
 
+static int zynqmp_r5_get_tcm_node_from_dt(struct zynqmp_r5_cluster *cluster)
+{
+   int i, j, tcm_bank_count, ret, tcm_pd_idx, pd_count;
+   struct of_phandle_args out_args;
+   struct zynqmp_r5_core *r5_core;
+   struct platform_device *cpdev;
+   struct mem_bank_data *tcm;
+   struct device_node *np;
+   struct resource *res;
+   u64 abs_addr, size;
+   struct device *dev;
+
+   for (i = 0; i < cluster->core_count; i++) {
+   r5_core = cluster->r5_cores[i];
+   dev = r5_core->dev;
+   np = r5_core->np;
+
+   pd_count = of_count_phandle_with_args(np, "power-domains",
+ "#power-domain-cells");
+
+   if (pd_count <= 0) {
+   dev_err(dev, "invalid power-domains property, %d\n", 
pd_count);
+   return -EINVAL;
+   }
+
+		/* First entry in power-domains list is for r5 core, rest for TCM. */
+   tcm_bank_count = pd_count - 1;
+
+   if (tcm_bank_count <= 0) {
+   dev_err(dev, "invalid TCM count %d\n", tcm_bank_count);
+   return -EINVAL;
+   }
+
+		r5_core->tcm_banks = devm_kcalloc(dev, tcm_bank_count,
+						  sizeof(struct mem_bank_data *),
+						  GFP_KERNEL);
+   if (!r5_core->tcm_banks)
+   return -ENOMEM;
+
+   r5_core->tcm_bank_count = tcm_bank_count;
+		for (j = 0, tcm_pd_idx = 1; j < tcm_bank_count; j++, tcm_pd_idx++) {
+   tcm = devm_kzalloc(dev, sizeof(struct mem_bank_data),
+  GFP_KERNEL);
+   if (!tcm)
+   return -ENOMEM;
+
+   r5_core->tcm_banks[j] = tcm;
+
+   /* Get power-domains id of TCM. */
+			ret = of_parse_phandle_with_args(np, "power-domains",
+							 "#power-domain-cells",
+							 tcm_pd_idx, &out_args);
+			if (ret) {
+				dev_err(r5_core->dev,
+					"failed to get tcm %d pm domain, ret %d\n",
+					tcm_pd_idx, ret);
+				return ret;
+			}
+			tcm->pm_domain_id = out_args.args[0];
+			of_node_put(out_args.np);
+
+			/* Get TCM address without translation. */
+			ret = of_property_read_reg(np, j, &abs_addr, &size);
+   if (ret) {
+   dev_err(dev, "failed to get reg property\n");
+   return ret;
+   }
+
+   /*
+* Remote processor can address only 32 bits
+* so convert 64-bits into 32-bits. This will discard
+* any unwanted upper 32-bits.
+*/
+   tcm->da = (u32)abs_addr;
+   tcm->size = (u32)size;
+
+   cpdev = 

[PATCH v15 2/4] dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings

2024-04-12 Thread Tanmay Shah
From: Radhey Shyam Pandey 

Introduce bindings for the TCM memory address space on the AMD-Xilinx
Zynq UltraScale+ platform. It will help in defining TCM in the device
tree and make its access platform-agnostic and data-driven.

Tightly-coupled memories (TCMs) are low-latency memory that provides
predictable instruction execution and predictable data load/store
timing. Each Cortex-R5F processor contains two 64-bit wide 64 KB memory
banks on the ATCM and BTCM ports, for a total of 128 KB of memory.

The TCM resources (reg, reg-names and power-domain) are documented for
each TCM in the R5 node. The reg and reg-names are made required
properties, as we don't want to hardcode TCM addresses for future
platforms; for the zu+ legacy implementation we ensure that old dts
without reg/reg-names still works and a stable ABI is maintained.

It also extends the examples for TCM split and lockstep modes.

Signed-off-by: Radhey Shyam Pandey 
Reviewed-by: Krzysztof Kozlowski 
Signed-off-by: Tanmay Shah 
---
 .../remoteproc/xlnx,zynqmp-r5fss.yaml | 279 --
 1 file changed, 257 insertions(+), 22 deletions(-)

diff --git 
a/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml 
b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
index 78aac69f1060..6f13da11f593 100644
--- a/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
+++ b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
@@ -18,11 +18,26 @@ description: |
 
 properties:
   compatible:
-const: xlnx,zynqmp-r5fss
+enum:
+  - xlnx,zynqmp-r5fss
+  - xlnx,versal-r5fss
+  - xlnx,versal-net-r52fss
+
+  "#address-cells":
+const: 2
+
+  "#size-cells":
+const: 2
+
+  ranges:
+description: |
+  Standard ranges definition providing address translations for
+  local R5F TCM address spaces to bus addresses.
 
   xlnx,cluster-mode:
 $ref: /schemas/types.yaml#/definitions/uint32
 enum: [0, 1, 2]
+default: 1
 description: |
   The RPU MPCore can operate in split mode (Dual-processor performance), 
Safety
   lock-step mode(Both RPU cores execute the same code in lock-step,
@@ -36,8 +51,16 @@ properties:
   1: lockstep mode (default)
   2: single cpu mode
 
+  xlnx,tcm-mode:
+$ref: /schemas/types.yaml#/definitions/uint32
+enum: [0, 1]
+description: |
+  Configure RPU TCM
+  0: split mode
+  1: lockstep mode
+
 patternProperties:
-  "^r5f-[a-f0-9]+$":
+  "^r(.*)@[0-9a-f]+$":
 type: object
 description: |
   The RPU is located in the Low Power Domain of the Processor Subsystem.
@@ -52,10 +75,22 @@ patternProperties:
 
 properties:
   compatible:
-const: xlnx,zynqmp-r5f
+enum:
+  - xlnx,zynqmp-r5f
+  - xlnx,versal-r5f
+  - xlnx,versal-net-r52f
+
+  reg:
+minItems: 1
+maxItems: 4
+
+  reg-names:
+minItems: 1
+maxItems: 4
 
   power-domains:
-maxItems: 1
+minItems: 2
+maxItems: 5
 
   mboxes:
 minItems: 1
@@ -101,35 +136,235 @@ patternProperties:
 
 required:
   - compatible
+  - reg
+  - reg-names
   - power-domains
 
-unevaluatedProperties: false
-
 required:
   - compatible
+  - "#address-cells"
+  - "#size-cells"
+  - ranges
+
+allOf:
+  - if:
+  properties:
+compatible:
+  contains:
+enum:
+  - xlnx,versal-net-r52fss
+then:
+  properties:
+xlnx,tcm-mode: false
+
+  patternProperties:
+"^r52f@[0-9a-f]+$":
+  type: object
+
+  properties:
+reg:
+  minItems: 1
+  items:
+- description: ATCM internal memory
+- description: BTCM internal memory
+- description: CTCM internal memory
+
+reg-names:
+  minItems: 1
+  items:
+- const: atcm0
+- const: btcm0
+- const: ctcm0
+
+power-domains:
+  minItems: 2
+  items:
+- description: RPU core power domain
+- description: ATCM power domain
+- description: BTCM power domain
+- description: CTCM power domain
+
+  - if:
+  properties:
+compatible:
+  contains:
+enum:
+  - xlnx,zynqmp-r5fss
+  - xlnx,versal-r5fss
+then:
+  if:
+properties:
+  xlnx,cluster-mode:
+enum: [1, 2]
+  then:
+properties:
+  xlnx,tcm-mode:
+enum: [1]
+
+patternProperties:
+  "^r5f@[0-9a-f]+$":
+type: object
+
+properties:
+  reg:
+minItems: 1
+items:
+  - description: ATCM internal memory
+  - description: BTCM internal memory
+  - description: extra ATCM 

[PATCH v15 1/4] remoteproc: zynqmp: fix lockstep mode memory region

2024-04-12 Thread Tanmay Shah
In lockstep mode, R5 core0 uses the TCM of R5 core1. The following is the
lockstep-mode memory region as per the hardware reference manual.

|  *TCM*             |   *R5 View* | *Linux view* |
| R5_0 ATCM (128 KB) | 0x0000_0000 | 0xFFE0_0000  |
| R5_0 BTCM (128 KB) | 0x0002_0000 | 0xFFE2_0000  |

However, the driver shouldn't model it as above because R5 core0 TCM and
core1 TCM have different power domains mapped to them.
Hence, the TCM address space in lockstep mode should be modeled as 64KB
regions only, where each region has its own power domain, as follows:

|  *TCM*             |   *R5 View* | *Linux view* |
| R5_0 ATCM0 (64 KB) | 0x0000_0000 | 0xFFE0_0000  |
| R5_0 BTCM0 (64 KB) | 0x0002_0000 | 0xFFE2_0000  |
| R5_0 ATCM1 (64 KB) | 0x0001_0000 | 0xFFE1_0000  |
| R5_0 BTCM1 (64 KB) | 0x0003_0000 | 0xFFE3_0000  |

This makes driver maintenance easy and makes the design robust for future
platforms as well.

Signed-off-by: Tanmay Shah 
---
 drivers/remoteproc/xlnx_r5_remoteproc.c | 146 ++--
 1 file changed, 12 insertions(+), 134 deletions(-)

diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
b/drivers/remoteproc/xlnx_r5_remoteproc.c
index cfbd97b89c26..0f942440b4e2 100644
--- a/drivers/remoteproc/xlnx_r5_remoteproc.c
+++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
@@ -84,12 +84,12 @@ static const struct mem_bank_data zynqmp_tcm_banks_split[] = {
	{0xffeb0000UL, 0x20000, 0x10000UL, PD_R5_1_BTCM, "btcm1"},
};
 
-/* In lockstep mode cluster combines each 64KB TCM and makes 128KB TCM */
+/* In lockstep mode cluster uses each 64KB TCM from second core as well */
 static const struct mem_bank_data zynqmp_tcm_banks_lockstep[] = {
-	{0xffe00000UL, 0x0, 0x20000UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 128KB each */
-	{0xffe20000UL, 0x20000, 0x20000UL, PD_R5_0_BTCM, "btcm0"},
-	{0, 0, 0, PD_R5_1_ATCM, ""},
-	{0, 0, 0, PD_R5_1_BTCM, ""},
+	{0xffe00000UL, 0x0, 0x10000UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 64KB each */
+	{0xffe20000UL, 0x20000, 0x10000UL, PD_R5_0_BTCM, "btcm0"},
+	{0xffe10000UL, 0x10000, 0x10000UL, PD_R5_1_ATCM, "atcm1"},
+	{0xffe30000UL, 0x30000, 0x10000UL, PD_R5_1_BTCM, "btcm1"},
 };
 
 /**
@@ -541,14 +541,14 @@ static int tcm_mem_map(struct rproc *rproc,
 }
 
 /*
- * add_tcm_carveout_split_mode()
+ * add_tcm_banks()
  * @rproc: single R5 core's corresponding rproc instance
  *
- * allocate and add remoteproc carveout for TCM memory in split mode
+ * allocate and add remoteproc carveout for TCM memory
  *
  * return 0 on success, otherwise non-zero value on failure
  */
-static int add_tcm_carveout_split_mode(struct rproc *rproc)
+static int add_tcm_banks(struct rproc *rproc)
 {
struct rproc_mem_entry *rproc_mem;
struct zynqmp_r5_core *r5_core;
@@ -581,10 +581,10 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc)
 ZYNQMP_PM_REQUEST_ACK_BLOCKING);
if (ret < 0) {
dev_err(dev, "failed to turn on TCM 0x%x", 
pm_domain_id);
-   goto release_tcm_split;
+   goto release_tcm;
}
 
-   dev_dbg(dev, "TCM carveout split mode %s addr=%llx, da=0x%x, 
size=0x%lx",
+   dev_dbg(dev, "TCM carveout %s addr=%llx, da=0x%x, size=0x%lx",
bank_name, bank_addr, da, bank_size);
 
rproc_mem = rproc_mem_entry_init(dev, NULL, bank_addr,
@@ -594,99 +594,16 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc)
if (!rproc_mem) {
ret = -ENOMEM;
zynqmp_pm_release_node(pm_domain_id);
-   goto release_tcm_split;
-   }
-
-   rproc_add_carveout(rproc, rproc_mem);
-   rproc_coredump_add_segment(rproc, da, bank_size);
-   }
-
-   return 0;
-
-release_tcm_split:
-   /* If failed, Turn off all TCM banks turned on before */
-   for (i--; i >= 0; i--) {
-   pm_domain_id = r5_core->tcm_banks[i]->pm_domain_id;
-   zynqmp_pm_release_node(pm_domain_id);
-   }
-   return ret;
-}
-
-/*
- * add_tcm_carveout_lockstep_mode()
- * @rproc: single R5 core's corresponding rproc instance
- *
- * allocate and add remoteproc carveout for TCM memory in lockstep mode
- *
- * return 0 on success, otherwise non-zero value on failure
- */
-static int add_tcm_carveout_lockstep_mode(struct rproc *rproc)
-{
-   struct rproc_mem_entry *rproc_mem;
-   struct zynqmp_r5_core *r5_core;
-   int i, num_banks, ret;
-   phys_addr_t bank_addr;
-   size_t bank_size = 0;
-   struct device *dev;
-   u32 pm_domain_id;
-   char *bank_name;
-   u32 da;
-
-   r5_core = rproc->priv;
-   dev = r5_core->dev;
-
-   /* Go through zynqmp banks for r5 node */
-   num_banks = r5_core->tcm_bank_count;
-
-   /*
-* In lockstep mode, TCM is contiguous memory 

[PATCH v15 3/4] dts: zynqmp: add properties for TCM in remoteproc

2024-04-12 Thread Tanmay Shah
Add properties as per the new bindings in the zynqmp remoteproc node
to represent TCM address and size.

This patch also adds an alternative remoteproc node to represent the
remoteproc cluster in split mode. By default lockstep mode is enabled,
and users should disable it before using the split-mode dts. Both
device-tree nodes can't be used simultaneously; one of them must be
disabled. For the zcu102-1.0 and zcu102-1.1 boards the remoteproc
split-mode dts node is enabled and the lockstep-mode dts node is
disabled.

Signed-off-by: Tanmay Shah 
---
 .../boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts  |  8 +++
 arch/arm64/boot/dts/xilinx/zynqmp.dtsi| 67 +--
 2 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts 
b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
index c8f71a1aec89..495ca94b45db 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
+++ b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
@@ -14,6 +14,14 @@ / {
compatible = "xlnx,zynqmp-zcu102-rev1.0", "xlnx,zynqmp-zcu102", 
"xlnx,zynqmp";
 };
 
&rproc_split {
+   status = "okay";
+};
+
&rproc_lockstep {
+   status = "disabled";
+};
+
&eeprom {
#address-cells = <1>;
#size-cells = <1>;
diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi 
b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
index 25d20d803230..ef31b0fc73d1 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
+++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
@@ -260,19 +260,76 @@ fpga_full: fpga-full {
ranges;
};
 
-   remoteproc {
+   rproc_lockstep: remoteproc@ffe00000 {
compatible = "xlnx,zynqmp-r5fss";
xlnx,cluster-mode = <1>;
+   xlnx,tcm-mode = <1>;
 
-   r5f-0 {
+   #address-cells = <2>;
+   #size-cells = <2>;
+
+	ranges = <0x0 0x0 0x0 0xffe00000 0x0 0x10000>,
+		 <0x0 0x20000 0x0 0xffe20000 0x0 0x10000>,
+		 <0x0 0x10000 0x0 0xffe10000 0x0 0x10000>,
+		 <0x0 0x30000 0x0 0xffe30000 0x0 0x10000>;
+
+   r5f@0 {
+   compatible = "xlnx,zynqmp-r5f";
+			reg = <0x0 0x0 0x0 0x10000>,
+			      <0x0 0x20000 0x0 0x10000>,
+			      <0x0 0x10000 0x0 0x10000>,
+			      <0x0 0x30000 0x0 0x10000>;
+			reg-names = "atcm0", "btcm0", "atcm1", "btcm1";
+			power-domains = <&zynqmp_firmware PD_RPU_0>,
+					<&zynqmp_firmware PD_R5_0_ATCM>,
+					<&zynqmp_firmware PD_R5_0_BTCM>,
+					<&zynqmp_firmware PD_R5_1_ATCM>,
+					<&zynqmp_firmware PD_R5_1_BTCM>;
+			memory-region = <&rproc_0_fw_image>;
+   };
+
+   r5f@1 {
+   compatible = "xlnx,zynqmp-r5f";
+			reg = <0x1 0x0 0x0 0x10000>, <0x1 0x20000 0x0 0x10000>;
+			reg-names = "atcm0", "btcm0";
+			power-domains = <&zynqmp_firmware PD_RPU_1>,
+					<&zynqmp_firmware PD_R5_1_ATCM>,
+					<&zynqmp_firmware PD_R5_1_BTCM>;
+			memory-region = <&rproc_1_fw_image>;
+   };
+   };
+
+   rproc_split: remoteproc-split@ffe00000 {
+   status = "disabled";
+   compatible = "xlnx,zynqmp-r5fss";
+   xlnx,cluster-mode = <0>;
+   xlnx,tcm-mode = <0>;
+
+   #address-cells = <2>;
+   #size-cells = <2>;
+
+	ranges = <0x0 0x0 0x0 0xffe00000 0x0 0x10000>,
+		 <0x0 0x20000 0x0 0xffe20000 0x0 0x10000>,
+		 <0x1 0x0 0x0 0xffe90000 0x0 0x10000>,
+		 <0x1 0x20000 0x0 0xffeb0000 0x0 0x10000>;
+
+   r5f@0 {
compatible = "xlnx,zynqmp-r5f";
-			power-domains = <&zynqmp_firmware PD_RPU_0>;
+			reg = <0x0 0x0 0x0 0x10000>, <0x0 0x20000 0x0 0x10000>;
+			reg-names = "atcm0", "btcm0";
+			power-domains = <&zynqmp_firmware PD_RPU_0>,
+					<&zynqmp_firmware PD_R5_0_ATCM>,
+					<&zynqmp_firmware PD_R5_0_BTCM>;
			memory-region = <&rproc_0_fw_image>;
};
 
-   r5f-1 {
+   r5f@1 {
compatible = "xlnx,zynqmp-r5f";
-			power-domains = <&zynqmp_firmware PD_RPU_1>;
+			reg = <0x1 0x0 0x0 0x10000>, <0x1 0x20000 0x0 0x10000>;
+			reg-names = "atcm0", "btcm0";
+			power-domains = <&zynqmp_firmware PD_RPU_1>,
+					<&zynqmp_firmware PD_R5_1_ATCM>,
+					<&zynqmp_firmware PD_R5_1_BTCM>;

[PATCH v15 0/4] add zynqmp TCM bindings

2024-04-12 Thread Tanmay Shah
Tightly-Coupled Memories (TCMs) are low-latency memory that provides
predictable instruction execution and predictable data load/store
timing. Each Cortex-R5F processor contains two exclusive 64 KB memory
banks on the ATCM and BTCM ports, for a total of 128 KB of memory.
In lockstep mode, the TCM of both cores (256 KB in total) is accessible
to the cluster.

As per ZynqMP Ultrascale+ Technical Reference Manual UG1085, following
is address space of TCM memory. The bindings in this patch series
introduces properties to accommodate following address space with
address translation between Linux and Cortex-R5 views.

|  *Mode*            |   *R5 View*  | *Linux view* |  Notes               |
| ------------------ | ------------ | ------------ | -------------------- |
| *Split Mode*       | *start addr* | *start addr* |                      |
| R5_0 ATCM (64 KB)  | 0x0000_0000  | 0xFFE0_0000  |                      |
| R5_0 BTCM (64 KB)  | 0x0002_0000  | 0xFFE2_0000  |                      |
| R5_1 ATCM (64 KB)  | 0x0000_0000  | 0xFFE9_0000  | alias of 0xFFE1_0000 |
| R5_1 BTCM (64 KB)  | 0x0002_0000  | 0xFFEB_0000  | alias of 0xFFE3_0000 |
|  ___               | ___          | ___          |                      |
| *Lockstep Mode*    |              |              |                      |
| R5_0 ATCM (128 KB) | 0x0000_0000  | 0xFFE0_0000  |                      |
| R5_0 BTCM (128 KB) | 0x0002_0000  | 0xFFE2_0000  |                      |

References:
UG1085 TCM address space:
https://docs.xilinx.com/r/en-US/ug1085-zynq-ultrascale-trm/Tightly-Coupled-Memory-Address-Map

---

prerequisite-patch-link: 
https://lore.kernel.org/all/d4556268-8274-4089-949f-3b97d6779...@gmail.com/
Base Branch: 6.9.rc2

Changes in v15:
  - Use hardcode TCM addresses as fallback method if "reg" unavailable
  - Use new bindings for r5fss subsystem

Changes in v14:
  - Add xlnx,tcm-mode property and use it for TCM configuration
  - Add Versal and Versal-NET platform support
  - Maintain backward compatibility for ZynqMP platform and use hardcode
TCM addresses

Changes in v13:
  - Have power-domains property for lockstep case instead of
keeping it flexible.
  - Add "items:" list in power-domains property


Radhey Shyam Pandey (1):
  dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings

Tanmay Shah (3):
  remoteproc: zynqmp: fix lockstep mode memory region
  dts: zynqmp: add properties for TCM in remoteproc
  remoteproc: zynqmp: parse TCM from device tree

 .../remoteproc/xlnx,zynqmp-r5fss.yaml | 279 --
 .../boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts  |   8 +
 arch/arm64/boot/dts/xilinx/zynqmp.dtsi|  67 -
 drivers/remoteproc/xlnx_r5_remoteproc.c   | 273 +
 4 files changed, 459 insertions(+), 168 deletions(-)


base-commit: 4d5aabb6843939fad36912be8bf109adf9af0848
-- 
2.25.1




Re: [RFC PATCH 0/4] perf: Correlating user process data to samples

2024-04-12 Thread Mathieu Desnoyers

On 2024-04-12 12:28, Beau Belgrave wrote:

> On Thu, Apr 11, 2024 at 09:52:22PM -0700, Ian Rogers wrote:
>> On Thu, Apr 11, 2024 at 5:17 PM Beau Belgrave  wrote:
>>>
>>> In the Open Telemetry profiling SIG [1], we are trying to find a way to
>>> grab a tracing association quickly on a per-sample basis. The team at
>>> Elastic has a bespoke way to do this [2], however, I'd like to see a
>>> more general way to achieve this. The folks I've been talking with seem
>>> open to the idea of just having a TLS value for this we could capture
>>
>> Presumably TLS == Thread Local Storage.
>
> Yes, the initial idea is to use thread local storage (TLS). It seems to
> be the fastest option to save a per-thread value that changes at a fast
> rate.
>
>>> upon each sample. We could then just state, Open Telemetry SDKs should
>>> have a TLS value for span correlation. However, we need a way to sample
>>> the TLS or other value(s) when a sampling event is generated. This is
>>> supported today on Windows via EventActivityIdControl() [3]. Since
>>> Open Telemetry works on both Windows and Linux, ideally we can do
>>> something as efficient for Linux based workloads.
>>>
>>> This series is to explore how it would be best possible to collect
>>> supporting data from a user process when a profile sample is collected.
>>> Having a value stored in TLS makes a lot of sense for this however
>>> there are other ways to explore. Whatever is chosen, kernel samples
>>> taken in process context should be able to get this supporting data.
>>> In these patches on X64 the fsbase and gsbase are used for this.
>>>
>>> An option to explore suggested by Mathieu Desnoyers is to utilize rseq
>>> for processes to register a value location that can be included when
>>> profiling if desired. This would allow a tighter contract between user
>>> processes and a profiler.  It would allow better labeling/categorizing
>>> the correlation values.
>>
>> It is hard to understand this idea. Are you saying stash a cookie in
>> TLS for samples to capture to indicate an activity? Restartable
>> sequences are about preemption on a CPU not of a thread, so at least
>> my intuition is that they feel different. You could stash information
>> like this today by changing the thread name which generates comm
>> events. I've wondered about having similar information in some form of
>> reserved for profiling stack slot, for example, to stash a pointer to
>> the name of a function being interpreted. Snapshotting all of a stack
>> is bad performance wise and for security. A stack slot would be able
>> to deal with nesting.
>
> You are getting the idea. A slot or tag for a thread would be great! I'm
> not a fan of overriding the thread comm name (as that already has a
> use). TLS would be fine, if we could also pass an offset + size + type.
>
> Maybe a stack slot that just points to parts of TLS? That way you could
> have a set of slots that don't require much memory and selectively copy
> them out of TLS (or where ever those slots point to in user memory).
>
> When I was talking to Mathieu about this, it seems that rseq already had
> a place to potentially put these slots. I'm unsure though how the per
> thread aspects would work.
>
> Mathieu, can you post your ideas here about that?


Sure. I'll try to summarize my thoughts here. By all means, let me
know if I'm missing important pieces of the puzzle.

First of all, here is my understanding of what information we want to
share between userspace and kernel.

A 128-bit activity ID identifies "uniquely" (as far as a 128-bit random
UUID allows) a portion of the dependency chain involved in doing some
work (e.g. answer a HTTP request) across one or many participating hosts.

Activity IDs have a parent/child relationship: a parent activity ID can
create children activity IDs.

For instance, if one host has the service "dispatch", another host
has a "web server", and a third host has a SQL database, we should
be able to follow the chain of activities needed to answer a web
query by following those activity IDs, linking them together
through parent/child relationships. This usually requires the
communication protocols to convey those activity IDs across hosts.

The reason why this information must be provided from userspace is
because it's userspace that knows where to find those activity IDs
within its application-layer communication protocols.

With tracing, taking a full trace of the activity ID spans begin/end
from all hosts allow reconstructing the activity IDs parent/child
relationships, so we typically only need to extract information about
activity ID span begin/end with parent/child info to a tracer.

Using activity IDs from a kernel profiler is trickier, because
we do not have access to the complete span begin/end trace to
reconstruct the activity ID parent/child relationship. This is
where I suspect we'd want to introduce a notion of "activity ID
stack", so a profiler could reconstruct the currently active
stack of activity IDs for the current thread by walking that
stack.

This profiling could be triggered either from an interrupt
(sampling use-case), which would then walk 
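For concreteness, a rough user-space sketch of the TLS activity-ID stack
idea being discussed; every name below is hypothetical, since no such ABI
exists yet:

#include <stdint.h>
#include <string.h>

/* 16-byte correlation value, e.g. a W3C trace-context trace-id. */
struct activity_id {
	uint8_t bytes[16];
};

#define ACTIVITY_STACK_DEPTH 8

/* Hypothetical per-thread activity-ID stack living in TLS. A profiler
 * that knows where this lives could copy the top entries on each sample.
 */
static __thread struct activity_id activity_stack[ACTIVITY_STACK_DEPTH];
static __thread unsigned int activity_depth;

static inline void push_activity(const struct activity_id *id)
{
	/* No syscall: just a TLS write, cheap enough for hot paths. */
	if (activity_depth < ACTIVITY_STACK_DEPTH)
		memcpy(&activity_stack[activity_depth], id, sizeof(*id));
	activity_depth++;
}

static inline void pop_activity(void)
{
	if (activity_depth)
		activity_depth--;
}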

Re: [RFC PATCH 0/4] perf: Correlating user process data to samples

2024-04-12 Thread Beau Belgrave
On Fri, Apr 12, 2024 at 09:12:45AM +0200, Peter Zijlstra wrote:
> 
> On Fri, Apr 12, 2024 at 12:17:28AM +, Beau Belgrave wrote:
> 
> > An idea flow would look like this:
> > User Task   Profile
> > do_work();  sample() -> IP + No activity
> > ...
> > set_activity(123);
> > ...
> > do_work();  sample() -> IP + activity (123)
> > ...
> > set_activity(124);
> > ...
> > do_work();  sample() -> IP + activity (124)
> 
> This, start with this, because until I saw this, I was utterly confused
> as to what the heck you were on about.
> 

Will do.

> I started by thinking we already have TID in samples so you can already
> associate back to user processes and got increasingly confused the
> further I went.
> 
> What you seem to want to do however is have some task-state included so
> you can see what the thread is doing.
> 

Yeah, there is typically an external context (not on the machine) that
wants to be tied to each sample. The context could be a simple integer,
UUID, or something else entirely. For OTel, this is a 16-byte array [1].

> Anyway, since we typically run stuff from NMI context, accessing user
> data is 'interesting'. As such I would really like to make this work
> depend on the call-graph rework that pushes all the user access bits
> into return-to-user.

Cool, I assume that's the SFRAME work? Are there pointers to work I
could look at and think about what a rebase looks like? Or do you have
someone in mind I should work with for this?

Thanks,
-Beau

1. https://www.w3.org/TR/trace-context/#version-format



Re: [RFC PATCH 0/4] perf: Correlating user process data to samples

2024-04-12 Thread Beau Belgrave
On Thu, Apr 11, 2024 at 09:52:22PM -0700, Ian Rogers wrote:
> On Thu, Apr 11, 2024 at 5:17 PM Beau Belgrave  
> wrote:
> >
> > In the Open Telemetry profiling SIG [1], we are trying to find a way to
> > grab a tracing association quickly on a per-sample basis. The team at
> > Elastic has a bespoke way to do this [2], however, I'd like to see a
> > more general way to achieve this. The folks I've been talking with seem
> > open to the idea of just having a TLS value for this we could capture
> 
> Presumably TLS == Thread Local Storage.
> 

Yes, the initial idea is to use thread local storage (TLS). It seems to
be the fastest option to save a per-thread value that changes at a fast
rate.

> > upon each sample. We could then just state, Open Telemetry SDKs should
> > have a TLS value for span correlation. However, we need a way to sample
> > the TLS or other value(s) when a sampling event is generated. This is
> > supported today on Windows via EventActivityIdControl() [3]. Since
> > Open Telemetry works on both Windows and Linux, ideally we can do
> > something as efficient for Linux based workloads.
> >
> > This series is to explore how it would be best possible to collect
> > supporting data from a user process when a profile sample is collected.
> > Having a value stored in TLS makes a lot of sense for this however
> > there are other ways to explore. Whatever is chosen, kernel samples
> > taken in process context should be able to get this supporting data.
> > In these patches on X64 the fsbase and gsbase are used for this.
> >
> > An option to explore suggested by Mathieu Desnoyers is to utilize rseq
> > for processes to register a value location that can be included when
> > profiling if desired. This would allow a tighter contract between user
> > processes and a profiler.  It would allow better labeling/categorizing
> > the correlation values.
> 
> It is hard to understand this idea. Are you saying stash a cookie in
> TLS for samples to capture to indicate an activity? Restartable
> sequences are about preemption on a CPU not of a thread, so at least
> my intuition is that they feel different. You could stash information
> like this today by changing the thread name which generates comm
> events. I've wondered about having similar information in some form of
> reserved for profiling stack slot, for example, to stash a pointer to
> the name of a function being interpreted. Snapshotting all of a stack
> is bad performance wise and for security. A stack slot would be able
> to deal with nesting.
> 

You are getting the idea. A slot or tag for a thread would be great! I'm
not a fan of overriding the thread comm name (as that already has a
use). TLS would be fine, if we could also pass an offset + size + type.

Maybe a stack slot that just points to parts of TLS? That way you could
have a set of slots that don't require much memory and selectively copy
them out of TLS (or where ever those slots point to in user memory).

When I was talking to Mathieu about this, it seems that rseq already had
a place to potentially put these slots. I'm unsure though how the per
thread aspects would work.

Mathieu, can you post your ideas here about that?

> > An idea flow would look like this:
> > User Task   Profile
> > do_work();  sample() -> IP + No activity
> > ...
> > set_activity(123);
> > ...
> > do_work();  sample() -> IP + activity (123)
> > ...
> > set_activity(124);
> > ...
> > do_work();  sample() -> IP + activity (124)
> >
> > Ideally, the set_activity() method would not be a syscall. It needs to
> > be very cheap as this should not bottleneck work. Ideally this is just
> > a memcpy of 16-20 bytes as it is on Windows via EventActivityIdControl()
> > using EVENT_ACTIVITY_CTRL_SET_ID.
> >
> > For those not aware, Open Telemetry allows collecting data from multiple
> > machines and show where time was spent. The tracing context is already
> > available for logs, but not for profiling samples. The idea is to show
> > where slowdowns occur and have profile samples to explain why they
> > slowed down. This must be possible without having to track context
> > switches to do this correlation. This is because the profiling rates
> > are typically 20hz - 1Khz, while the context switching rates are much
> > higher. We do not want to have to consume high context switch rates
> > just to know a correlation for a 20hz signal. Often these 20hz signals
> > are always enabled in some environments.
> >
> > Regardless if TLS, rseq, or other source is used I believe we will need
> > a way for perf_events to include it within a sample. The changes in this
> > series show how it could be done with TLS. There is some factoring work
> > under perf to make it easier to add more dump types using the existing
> > ABI. This is mostly to make the patches clearer, certainly the refactor
> > parts could get dropped and we could have duplicated/specialized paths.
> 
> fs and gs may be used 

Re: [PATCH v2] dt-bindings: mfd: qcom,spmi-pmic: Add pbs to SPMI device types

2024-04-12 Thread Krzysztof Kozlowski
On 12/04/2024 16:22, Luca Weiss wrote:
> Add the PBS (Programmable Boot Sequencer) to the list of devices.
> 
> Reviewed-by: Bjorn Andersson 
> Signed-off-by: Luca Weiss 
> ---
> Changes in v2:

Reviewed-by: Krzysztof Kozlowski 

Best regards,
Krzysztof




Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

2024-04-12 Thread Sean Christopherson
On Fri, Apr 12, 2024, Marc Zyngier wrote:
> On Fri, 12 Apr 2024 11:44:09 +0100, Will Deacon  wrote:
> > On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote:
> > Also, if you're in the business of hacking the MMU notifier code, it
> > would be really great to change the .clear_flush_young() callback so
> > that the architecture could handle the TLB invalidation. At the moment,
> > the core KVM code invalidates the whole VMID courtesy of 'flush_on_ret'
> > being set by kvm_handle_hva_range(), whereas we could do a much
> > lighter-weight and targetted TLBI in the architecture page-table code
> > when we actually update the ptes for small ranges.
> 
> Indeed, and I was looking at this earlier this week as it has a pretty
> devastating effect with NV (it blows the shadow S2 for that VMID, with
> costly consequences).
> 
> In general, it feels like the TLB invalidation should stay with the
> code that deals with the page tables, as it has a pretty good idea of
> what needs to be invalidated and how -- specially on architectures
> that have a HW-broadcast facility like arm64.

Would this be roughly on par with an in-line flush on arm64?  The simpler, more
straightforward solution would be to let architectures override flush_on_ret,
but I would prefer something like the below as x86 can also utilize a 
range-based
flush when running as a nested hypervisor.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ff0a20565f90..b65116294efe 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -601,6 +601,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
struct kvm_gfn_range gfn_range;
struct kvm_memory_slot *slot;
struct kvm_memslots *slots;
+   bool need_flush = false;
int i, idx;
 
if (WARN_ON_ONCE(range->end <= range->start))
@@ -653,10 +654,22 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
break;
}
		r.ret |= range->handler(kvm, &gfn_range);
+
+   /*
+* Use a precise gfn-based TLB flush when possible, as
+* most mmu_notifier events affect a small-ish range.
+* Fall back to a full TLB flush if the gfn-based flush
+* fails, and don't bother trying the gfn-based flush
+* if a full flush is already pending.
+*/
+		if (range->flush_on_ret && !need_flush && r.ret &&
+		    kvm_arch_flush_remote_tlbs_range(kvm, gfn_range.start,
+						     gfn_range.end - gfn_range.start + 1))
+			need_flush = true;
}
}
 
-   if (range->flush_on_ret && r.ret)
+   if (need_flush)
kvm_flush_remote_tlbs(kvm);
 
if (r.found_memslot)




Re: [PATCH v2 01/11] ring-buffer: Allow mapped field to be set without mapping

2024-04-12 Thread Vincent Donnefort
On Wed, Apr 10, 2024 at 09:25:42PM -0400, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" 
> 
> In preparation for having the ring buffer mapped to a dedicated location,
> which will have the same restrictions as user space memory mapped buffers,
> allow it to use the "mapped" field of the ring_buffer_per_cpu structure
> without having the user space meta page mapping.
> 
> When this starts using the mapped field, it will need to handle adding a
> user space mapping (and removing it) from a ring buffer that is using a
> dedicated memory range.
> 
> Signed-off-by: Steven Rostedt (Google) 
> ---
>  kernel/trace/ring_buffer.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 793ecc454039..44b1d5f1a99a 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -5223,6 +5223,9 @@ static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
>  {
>   struct trace_buffer_meta *meta = cpu_buffer->meta_page;
>  
> + if (!meta)
> + return;
> +
>   meta->reader.read = cpu_buffer->reader_page->read;
>   meta->reader.id = cpu_buffer->reader_page->id;
>   meta->reader.lost_events = cpu_buffer->lost_events;
> @@ -6167,7 +6170,7 @@ rb_get_mapped_buffer(struct trace_buffer *buffer, int cpu)
>  
>   mutex_lock(&cpu_buffer->mapping_lock);
> 
> - if (!cpu_buffer->mapped) {
> + if (!cpu_buffer->mapped || !cpu_buffer->meta_page) {
>   mutex_unlock(&cpu_buffer->mapping_lock);
>   return ERR_PTR(-ENODEV);
>   }
> @@ -6345,12 +6348,13 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu,

IIUC, we still allow this buffer to be mapped from user-space. So we can
now have mapped && !meta_page.

Then the "if (cpu_buffer->mapped) {" that skips the meta_page creation in
ring_buffer_map() should be replaced by if (cpu_buffer->meta_page).

>*/
>   raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
>   rb_setup_ids_meta_page(cpu_buffer, subbuf_ids);
> +
>   raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
>  
>   err = __rb_map_vma(cpu_buffer, vma);
>   if (!err) {
>   raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> - cpu_buffer->mapped = 1;
> + cpu_buffer->mapped++;
>   raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
>   } else {
>   kfree(cpu_buffer->subbuf_ids);
> @@ -6388,7 +6392,7 @@ int ring_buffer_unmap(struct trace_buffer *buffer, int 
> cpu)
>   mutex_lock(&buffer->mutex);
>   raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);

In this function, there's also a check for cpu_buffer->mapped > 1. This avoids
killing the meta-page while someone is still in use.

It seems like a dedicated meta_page counter will be necessary. Otherwise, in the
event of a ring-buffer mapped at boot, we would set up the meta-page on the first
mmap() and never tear it down.

>  
> - cpu_buffer->mapped = 0;
> + cpu_buffer->mapped--;
>  
>   raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
>  
> -- 
> 2.43.0
> 
> 



Re: Re: [PATCH 2/3] kernel/pid: Remove default pid_max value

2024-04-12 Thread Michal Koutný
On Thu, Apr 11, 2024 at 03:03:31PM -0700, Andrew Morton 
 wrote:
> A large increase in the maximum number of processes.

The change from (some) default to effective infinity is the crux of the
change. Because that is only a number.
(Thus I don't find the number's 12700% increase alone a big change.)

Actual maximum amount of processes is "workload dependent" and hence
should be determined based on the particular workload.

> Or did I misinterpret?

I thought you saw an issue with projection of that number into sizings
based on the default. Which of them comprises the large change in your
eyes?

Thanks,
Michal



Re: [PATCH] Bluetooth: Add more Bluetooth version defines

2024-04-12 Thread Luca Weiss
On Fri Feb 16, 2024 at 2:22 PM CET, Luca Weiss wrote:
> Add the various Bluetooth version identifiers found in the "Assigned
> Numbers" document[0] from the Bluetooth SIG.
>
> [0] https://www.bluetooth.com/specifications/assigned-numbers/

Hi all,

Is there any interest in this patch? Would be nice to get at least a
positive or negative reaction to it.

Regards
Luca

>
> Signed-off-by: Luca Weiss 
> ---
> To be clear, I don't have a use case for these extra defines myself but
> some time ago when working on Bluetooth I came across this and thought
> it would be interesting to have the list complete. No other motives.
> ---
>  include/net/bluetooth/bluetooth.h | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/include/net/bluetooth/bluetooth.h 
> b/include/net/bluetooth/bluetooth.h
> index 7ffa8c192c3f..818eb142eda3 100644
> --- a/include/net/bluetooth/bluetooth.h
> +++ b/include/net/bluetooth/bluetooth.h
> @@ -39,11 +39,20 @@
>  #endif
>  
>  /* Bluetooth versions */
+#define BLUETOOTH_VER_1_0B	0
 #define BLUETOOTH_VER_1_1	1
 #define BLUETOOTH_VER_1_2	2
 #define BLUETOOTH_VER_2_0	3
 #define BLUETOOTH_VER_2_1	4
+#define BLUETOOTH_VER_3_0	5
 #define BLUETOOTH_VER_4_0	6
+#define BLUETOOTH_VER_4_1	7
+#define BLUETOOTH_VER_4_2	8
+#define BLUETOOTH_VER_5_0	9
+#define BLUETOOTH_VER_5_1	10
+#define BLUETOOTH_VER_5_2	11
+#define BLUETOOTH_VER_5_3	12
+#define BLUETOOTH_VER_5_4	13
>  
>  /* Reserv for core and drivers use */
>  #define BT_SKB_RESERVE   8
>
> ---
> base-commit: 841c35169323cd833294798e58b9bf63fa4fa1de
> change-id: 20240216-bluetooth-defines-b810ce543191
>
> Best regards,




[PATCH v2] dt-bindings: mfd: qcom,spmi-pmic: Add pbs to SPMI device types

2024-04-12 Thread Luca Weiss
Add the PBS (Programmable Boot Sequencer) to the list of devices.

Reviewed-by: Bjorn Andersson 
Signed-off-by: Luca Weiss 
---
Changes in v2:
- Pick up tags
- Rebase on linux-next, drop merged patches
- Link to v1: 
https://lore.kernel.org/r/20240205-pmi632-ppg-v1-0-e236c95a2...@fairphone.com
---
 Documentation/devicetree/bindings/mfd/qcom,spmi-pmic.yaml | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/mfd/qcom,spmi-pmic.yaml 
b/Documentation/devicetree/bindings/mfd/qcom,spmi-pmic.yaml
index 8103fb61a16c..b7f01cbb8fff 100644
--- a/Documentation/devicetree/bindings/mfd/qcom,spmi-pmic.yaml
+++ b/Documentation/devicetree/bindings/mfd/qcom,spmi-pmic.yaml
@@ -160,6 +160,10 @@ patternProperties:
 type: object
 $ref: /schemas/nvmem/qcom,spmi-sdam.yaml#
 
+  "^pbs@[0-9a-f]+$":
+type: object
+$ref: /schemas/soc/qcom/qcom,pbs.yaml#
+
   "phy@[0-9a-f]+$":
 type: object
 $ref: /schemas/phy/qcom,snps-eusb2-repeater.yaml#

---
base-commit: fa8c2b5f446d6e8ff4bc8f67ba944b1be3aad790
change-id: 20240117-pmi632-ppg-f1efb4318722

Best regards,
-- 
Luca Weiss 




[PATCH v2] media: dt-bindings: qcom,sc7280-venus: Allow one IOMMU entry

2024-04-12 Thread Luca Weiss
Some SC7280-based boards crash when providing the "secure_non_pixel"
context bank, so allow only one iommu in the bindings also.

Acked-by: Krzysztof Kozlowski 
Signed-off-by: Luca Weiss 
---
Reference:
https://lore.kernel.org/linux-arm-msm/20231201-sc7280-venus-pas-v3-2-bc132dc5f...@fairphone.com/
---
Changes in v2:
- Pick up tags
- Otherwise just a resend, v1 was sent in January
- Link to v1: 
https://lore.kernel.org/r/20240129-sc7280-venus-bindings-v1-1-20a9ba194...@fairphone.com
---
 Documentation/devicetree/bindings/media/qcom,sc7280-venus.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/media/qcom,sc7280-venus.yaml 
b/Documentation/devicetree/bindings/media/qcom,sc7280-venus.yaml
index 8f9b6433aeb8..10c334e6b3dc 100644
--- a/Documentation/devicetree/bindings/media/qcom,sc7280-venus.yaml
+++ b/Documentation/devicetree/bindings/media/qcom,sc7280-venus.yaml
@@ -43,6 +43,7 @@ properties:
   - const: vcodec_bus
 
   iommus:
+minItems: 1
 maxItems: 2
 
   interconnects:

---
base-commit: 9ed46da14b9b9b2ad4edb3b0c545b6dbe5c00d39
change-id: 20240129-sc7280-venus-bindings-6e62a99620de

Best regards,
-- 
Luca Weiss 




[PATCH v5 5/5] Documentation: Add reconnect process for VDUSE

2024-04-12 Thread Cindy Lu
Add a document explaining the reconnect process, including what the
Userspace App needs to do and how it works with the kernel.

Signed-off-by: Cindy Lu 
---
 Documentation/userspace-api/vduse.rst | 41 +++
 1 file changed, 41 insertions(+)

diff --git a/Documentation/userspace-api/vduse.rst 
b/Documentation/userspace-api/vduse.rst
index bdb880e01132..7faa83462e78 100644
--- a/Documentation/userspace-api/vduse.rst
+++ b/Documentation/userspace-api/vduse.rst
@@ -231,3 +231,44 @@ able to start the dataplane processing as follows:
after the used ring is filled.
 
 For more details on the uAPI, please see include/uapi/linux/vduse.h.
+
+How VDUSE device reconnection works
+
+1. What is reconnection?
+
+   When the userspace application loads, it should establish a connection
+   to the vduse kernel device. Sometimes, the userspace application exits,
+   and we want to support restarting it and connecting it to the kernel
+   device again.
+
+2. How can I support reconnection in a userspace application?
+
+2.1 During initialization, the userspace application should first verify
+the existence of the device "/dev/vduse/vduse_name".
+If it doesn't exist, this is the first connection: go to step 2.2.
+If it exists, this is a reconnection: go to step 2.3.
+
+2.2 Create a new VDUSE instance with ioctl(VDUSE_CREATE_DEV) on
+/dev/vduse/control.
+When ioctl(VDUSE_CREATE_DEV) is called, the kernel allocates memory for
+the reconnect information. The total memory size is PAGE_SIZE * vq_number.
+
+2.3 Check if the information is suitable for reconnection.
+If this is a reconnection:
+Before attempting to reconnect, the userspace application needs to use
+ioctl(VDUSE_DEV_GET_CONFIG, VDUSE_DEV_GET_STATUS, VDUSE_DEV_GET_FEATURES...)
+to get the information from the kernel.
+Please review the information and confirm that it is suitable for
+reconnection.
+
+2.4 The userspace application needs to mmap the memory to userspace.
+The userspace application requires mapping one page for every vq. These
+pages should be used to save vq-related information while the system is
+running. Additionally, the application must define its own structure to
+store information for reconnection.
+
+2.5 Complete the initialization and run the application.
+While the application is running, it is important to store relevant
+information about reconnections in the mapped pages. When calling the ioctl
+VDUSE_VQ_GET_INFO to get vq information, it's necessary to check whether
+it's a reconnection. If it is a reconnection, the vq-related information
+must be read from the mapped pages.
+
+2.6 When the userspace application exits, it is necessary to unmap all the
+pages for reconnection.
-- 
2.43.0
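
A minimal, untested userspace sketch of the flow above. The device name,
error handling, and per-vq bookkeeping are illustrative assumptions; the
ioctls and the one-page-per-vq mmap layout come from this series (the
VDUSE_DEV_GET_* ioctls assume the uAPI additions in patches 1 and 2):

	#include <fcntl.h>
	#include <stdbool.h>
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <sys/mman.h>
	#include <sys/stat.h>
	#include <unistd.h>
	#include <linux/types.h>
	#include <linux/vduse.h>

	static int vduse_setup(const char *name, int vq_num)
	{
		char path[128];
		struct stat st;
		bool reconnect;
		int fd, i;

		/* step 2.1: does /dev/vduse/<name> already exist? */
		snprintf(path, sizeof(path), "/dev/vduse/%s", name);
		reconnect = (stat(path, &st) == 0);

		if (!reconnect) {
			/* step 2.2: create the instance via /dev/vduse/control
			 * with ioctl(VDUSE_CREATE_DEV, &dev_config); omitted here. */
		}

		fd = open(path, O_RDWR);
		if (fd < 0)
			return -1;

		if (reconnect) {
			/* step 2.3: re-read device state and check it is usable */
			__u8 status;

			if (ioctl(fd, VDUSE_DEV_GET_STATUS, &status))
				return -1;
			/* ... VDUSE_DEV_GET_CONFIG, VDUSE_DEV_GET_FEATURES ... */
		}

		/* step 2.4: map one page per vq for the reconnect information */
		for (i = 0; i < vq_num; i++) {
			void *page = mmap(NULL, getpagesize(),
					  PROT_READ | PROT_WRITE, MAP_SHARED,
					  fd, (off_t)i * getpagesize());

			if (page == MAP_FAILED)
				return -1;
			/* keep application-defined per-vq state in 'page' */
		}

		return fd;
	}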




[PATCH v5 4/5] vduse: Add file operation for mmap

2024-04-12 Thread Cindy Lu
Add the mmap file operation. This function will be used by the userspace
application to map the pages into userspace.

Signed-off-by: Cindy Lu 
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 57 ++
 1 file changed, 57 insertions(+)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
b/drivers/vdpa/vdpa_user/vduse_dev.c
index 2da659d5f4a8..7abe0c17cf0e 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -1465,6 +1465,61 @@ static struct vduse_dev *vduse_dev_get_from_minor(int 
minor)
return dev;
 }
 
+static vm_fault_t vduse_vm_fault(struct vm_fault *vmf)
+{
+   struct vduse_dev *dev = vmf->vma->vm_file->private_data;
+   struct vm_area_struct *vma = vmf->vma;
+   u16 index = vma->vm_pgoff;
+   struct vduse_virtqueue *vq;
+   unsigned long vaddr;
+
+   /* pages 0 ~ vq_num-1 are reserved for virtqueue state */
+   vq = dev->vqs[index];
+   vaddr = vq->vdpa_reconnect_vaddr;
+   if (remap_pfn_range(vma, vmf->address & PAGE_MASK,
+   PFN_DOWN(virt_to_phys((void *)vaddr)), PAGE_SIZE,
+   vma->vm_page_prot))
+   return VM_FAULT_SIGBUS;
+   return VM_FAULT_NOPAGE;
+}
+
+static const struct vm_operations_struct vduse_vm_ops = {
+   .fault = vduse_vm_fault,
+};
+
+static int vduse_dev_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   struct vduse_dev *dev = file->private_data;
+   unsigned long vaddr = 0;
+   unsigned long index = vma->vm_pgoff;
+   struct vduse_virtqueue *vq;
+
+   if (vma->vm_end - vma->vm_start != PAGE_SIZE)
+   return -EINVAL;
+   if ((vma->vm_flags & VM_SHARED) == 0)
+   return -EINVAL;
+   if (index >= dev->vq_num)
+   return -EINVAL;
+
+   index = array_index_nospec(index, dev->vq_num);
+   vq = dev->vqs[index];
+   vaddr = vq->vdpa_reconnect_vaddr;
+   if (vaddr == 0)
+   return -EOPNOTSUPP;
+
+   if (virt_to_phys((void *)vaddr) & (PAGE_SIZE - 1))
+   return -EINVAL;
+
+   /* Check if the Userspace App mapped the correct size */
+   if (vma->vm_end - vma->vm_start != PAGE_SIZE)
+   return -EOPNOTSUPP;
+
+   vm_flags_set(vma, VM_DONTEXPAND);
+   vma->vm_ops = &vduse_vm_ops;
+
+   return 0;
+}
+
 static int vduse_dev_open(struct inode *inode, struct file *file)
 {
int ret;
@@ -1497,6 +1552,8 @@ static const struct file_operations vduse_dev_fops = {
.unlocked_ioctl = vduse_dev_ioctl,
.compat_ioctl   = compat_ptr_ioctl,
.llseek = noop_llseek,
+   .mmap   = vduse_dev_mmap,
+
 };
 
 static ssize_t irq_cb_affinity_show(struct vduse_virtqueue *vq, char *buf)
-- 
2.43.0
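
As a usage note, the fault handler keys off vma->vm_pgoff, so a caller
selects a virtqueue purely via the mmap byte offset. A hedged sketch
(dev_fd and the vq index i are assumed):

	/* map the reconnect page of vq 'i'; must be one page and MAP_SHARED */
	void *p = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
		       MAP_SHARED, dev_fd, (off_t)i * getpagesize());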




[PATCH v5 3/5] vduse: Add function to get/free the pages for reconnection

2024-04-12 Thread Cindy Lu
Add the functions vduse_alloc_reconnnect_info_mem
and vduse_free_reconnnect_info_mem.
These functions allow vduse to allocate and free memory for reconnection
information. The amount of memory allocated is vq_num pages.
Each vq will map its own page, where the reconnection information will be
saved.

Signed-off-by: Cindy Lu 
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 40 ++
 1 file changed, 40 insertions(+)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
b/drivers/vdpa/vdpa_user/vduse_dev.c
index ef3c9681941e..2da659d5f4a8 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -65,6 +65,7 @@ struct vduse_virtqueue {
int irq_effective_cpu;
struct cpumask irq_affinity;
struct kobject kobj;
+   unsigned long vdpa_reconnect_vaddr;
 };
 
 struct vduse_dev;
@@ -1105,6 +1106,38 @@ static void vduse_vq_update_effective_cpu(struct 
vduse_virtqueue *vq)
 
vq->irq_effective_cpu = curr_cpu;
 }
+static int vduse_alloc_reconnnect_info_mem(struct vduse_dev *dev)
+{
+   unsigned long vaddr = 0;
+   struct vduse_virtqueue *vq;
+
+   for (int i = 0; i < dev->vq_num; i++) {
+   /* pages 0 ~ vq_num-1 save the reconnect info for each vq */
+   vq = dev->vqs[i];
+   vaddr = get_zeroed_page(GFP_KERNEL);
+   if (vaddr == 0)
+   return -ENOMEM;
+
+   vq->vdpa_reconnect_vaddr = vaddr;
+   }
+
+   return 0;
+}
+
+static int vduse_free_reconnnect_info_mem(struct vduse_dev *dev)
+{
+   struct vduse_virtqueue *vq;
+
+   for (int i = 0; i < dev->vq_num; i++) {
+   vq = dev->vqs[i];
+
+   if (vq->vdpa_reconnect_vaddr)
+   free_page(vq->vdpa_reconnect_vaddr);
+   vq->vdpa_reconnect_vaddr = 0;
+   }
+
+   return 0;
+}
 
 static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
@@ -1672,6 +1705,8 @@ static int vduse_destroy_dev(char *name)
mutex_unlock(&dev->lock);
return -EBUSY;
}
+   vduse_free_reconnnect_info_mem(dev);
+
dev->connected = true;
mutex_unlock(&dev->lock);
 
@@ -1855,12 +1890,17 @@ static int vduse_create_dev(struct vduse_dev_config 
*config,
ret = vduse_dev_init_vqs(dev, config->vq_align, config->vq_num);
if (ret)
goto err_vqs;
+   ret = vduse_alloc_reconnnect_info_mem(dev);
+   if (ret < 0)
+   goto err_mem;
 
__module_get(THIS_MODULE);
 
return 0;
 err_vqs:
device_destroy(&vduse_class, MKDEV(MAJOR(vduse_major), dev->minor));
+err_mem:
+   vduse_free_reconnnect_info_mem(dev);
 err_dev:
idr_remove(&vduse_idr, dev->minor);
 err_idr:
-- 
2.43.0




[PATCH v5 2/5] vduse: Add new ioctl VDUSE_DEV_GET_STATUS

2024-04-12 Thread Cindy Lu
The ioctl VDUSE_DEV_GET_STATUS is used by the userspace app
to get the device status.

Signed-off-by: Cindy Lu 
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 7 +++
 include/uapi/linux/vduse.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
b/drivers/vdpa/vdpa_user/vduse_dev.c
index ab246da27616..ef3c9681941e 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -1389,6 +1389,13 @@ static long vduse_dev_ioctl(struct file *file, unsigned 
int cmd,
ret = 0;
break;
}
+
+   case VDUSE_DEV_GET_STATUS:
+   /*
+* Returns the status read from device
+*/
+   ret = put_user(dev->status, (u8 __user *)argp);
+   break;
default:
ret = -ENOIOCTLCMD;
break;
diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
index 125d7529d91b..f501173a9d69 100644
--- a/include/uapi/linux/vduse.h
+++ b/include/uapi/linux/vduse.h
@@ -353,4 +353,6 @@ struct vduse_dev_response {
 /* get device configuration space */
 #define VDUSE_DEV_GET_CONFIG _IOR(VDUSE_BASE, 0x1b, struct vduse_config_data)
 
+#define VDUSE_DEV_GET_STATUS _IOR(VDUSE_BASE, 0x1c, __u8)
+
 #endif /* _UAPI_VDUSE_H_ */
-- 
2.43.0
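
A hedged sketch of the intended call from userspace, with dev_fd assumed to
be an open /dev/vduse/<name> descriptor:

	__u8 status;

	if (ioctl(dev_fd, VDUSE_DEV_GET_STATUS, &status) == 0)
		printf("device status: 0x%x\n", status);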




[PATCH v5 1/5] vduse: Add new ioctl VDUSE_DEV_GET_CONFIG

2024-04-12 Thread Cindy Lu
The ioctl VDUSE_DEV_GET_CONFIG is used by the userspace app
to get the device configuration space.

Signed-off-by: Cindy Lu 
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 21 +
 include/uapi/linux/vduse.h |  3 +++
 2 files changed, 24 insertions(+)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
b/drivers/vdpa/vdpa_user/vduse_dev.c
index 1c1d71d69026..ab246da27616 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -1368,6 +1368,27 @@ static long vduse_dev_ioctl(struct file *file, unsigned 
int cmd,
ret = 0;
break;
}
+   case VDUSE_DEV_GET_CONFIG: {
+   struct vduse_config_data config;
+   unsigned long size = offsetof(struct vduse_config_data, buffer);
+
+   ret = -EFAULT;
+   if (copy_from_user(&config, argp, size))
+   break;
+
+   ret = -EINVAL;
+   if (config.offset > dev->config_size || config.length == 0 ||
+   config.length > dev->config_size - config.offset)
+   break;
+
+   if (copy_to_user(argp + size, dev->config + config.offset,
+config.length)) {
+   ret = -EFAULT;
+   break;
+   }
+   ret = 0;
+   break;
+   }
default:
ret = -ENOIOCTLCMD;
break;
diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
index 11bd48c72c6c..125d7529d91b 100644
--- a/include/uapi/linux/vduse.h
+++ b/include/uapi/linux/vduse.h
@@ -350,4 +350,7 @@ struct vduse_dev_response {
};
 };
 
+/* get device configuration space */
+#define VDUSE_DEV_GET_CONFIG _IOR(VDUSE_BASE, 0x1b, struct vduse_config_data)
+
 #endif /* _UAPI_VDUSE_H_ */
-- 
2.43.0
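
Since the kernel copies the config bytes to argp + offsetof(struct
vduse_config_data, buffer), the buffer follows the header in one user
allocation. A hedged sketch (dev_fd is assumed to be an open
/dev/vduse/<name> descriptor):

	#include <stdlib.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/vduse.h>

	/* read cfg_len bytes of config space into 'out'; sketch only */
	static int read_config(int dev_fd, size_t cfg_len, void *out)
	{
		struct vduse_config_data *cfg;
		int ret;

		cfg = malloc(sizeof(*cfg) + cfg_len);
		if (!cfg)
			return -1;
		cfg->offset = 0;
		cfg->length = cfg_len;
		ret = ioctl(dev_fd, VDUSE_DEV_GET_CONFIG, cfg);
		if (!ret)
			memcpy(out, cfg->buffer, cfg_len);
		free(cfg);
		return ret;
	}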




[PATCH v5 0/5] vduse: Add support for reconnection

2024-04-12 Thread Cindy Lu
Here is the reconnect support in vduse

The kernel will allocate pages for reconnection.
Userspace needs to use mmap to map the memory to userspace and use these
pages to save the reconnect information.

Test passed with vduse + dpdk-testpmd.

Changes in v2:
1. Address the comments from v1
2. Add the document for the reconnect process

Changes in v3:
1. Move the vdpa_vq_state to the uAPI. vduse will use this to synchronize
the vq info between the kernel and the userspace app.
2. Add a new ioctl VDUSE_DEV_GET_CONFIG. The userspace app uses this to get
the config space.
3. Rewrite the commit message.
4. Only save the address for the page address and remove the index.
5. Remove the ioctl VDUSE_GET_RECONNECT_INFO; the userspace app will use the
uAPI VDUSE_RECONNCT_MMAP_SIZE to mmap.
6. Rewrite the document for the reconnect process to make it clearer.

Changes in v4:
1. Change the number of mapped pages to the number of VQs. The userspace app
can define and maintain the structure for saving reconnection information
in userspace. The kernel will not maintain this information.
2. Rewrite the document for the reconnect process to make it clearer.
3. Add the new ioctls VDUSE_DEV_GET_CONFIG/VDUSE_DEV_GET_STATUS.

Changes in v5:
1. Update the documentation for vduse reconnection.
2. Merge the patch from Dan Carpenter: vduse: Fix off by one in
vduse_dev_mmap().
3. Fix the wrong comment in the code.

Signed-off-by: Cindy Lu 

Cindy Lu (5):
  vduse: Add new ioctl VDUSE_DEV_GET_CONFIG
  vduse: Add new ioctl VDUSE_DEV_GET_STATUS
  vduse: Add function to get/free the pages for reconnection
  vduse: Add file operation for mmap
  Documentation: Add reconnect process for VDUSE

 Documentation/userspace-api/vduse.rst |  41 +
 drivers/vdpa/vdpa_user/vduse_dev.c| 125 ++
 include/uapi/linux/vduse.h|   5 ++
 3 files changed, 171 insertions(+)

-- 
2.43.0




Re: [PATCH v2] bootconfig: use memblock_free_late to free xbc memory to buddy

2024-04-12 Thread Google
On Fri, 12 Apr 2024 18:49:41 +0800
qiang4.zh...@linux.intel.com wrote:

> From: Qiang Zhang 
> 
> By the time xbc memory is freed in xbc_exit(), memblock may have handed
> memory over to the buddy allocator. So it doesn't make sense to free memory
> back to memblock. memblock_free() called by xbc_exit() even causes UAF bugs
> on architectures with CONFIG_ARCH_KEEP_MEMBLOCK disabled, like x86. The
> following KASAN log shows this case.
> 
> This patch fixes the xbc memory free problem by calling memblock_free()
> in the early xbc init error rewind path and calling memblock_free_late()
> in the xbc exit path to free memory to the buddy allocator.
> 
> [9.410890] 
> ==
> [9.418962] BUG: KASAN: use-after-free in 
> memblock_isolate_range+0x12d/0x260
> [9.426850] Read of size 8 at addr 88845dd3 by task swapper/0/1
> 
> [9.435901] CPU: 9 PID: 1 Comm: swapper/0 Tainted: G U 
> 6.9.0-rc3-00208-g586b5dfb51b9 #5
> [9.446403] Hardware name: Intel Corporation RPLP LP5 
> (CPU:RaptorLake)/RPLP LP5 (ID:13), BIOS IRPPN02.01.01.00.00.19.015.D- 
> Dec 28 2023
> [9.460789] Call Trace:
> [9.463518]  
> [9.465859]  dump_stack_lvl+0x53/0x70
> [9.469949]  print_report+0xce/0x610
> [9.473944]  ? __virt_addr_valid+0xf5/0x1b0
> [9.478619]  ? memblock_isolate_range+0x12d/0x260
> [9.483877]  kasan_report+0xc6/0x100
> [9.487870]  ? memblock_isolate_range+0x12d/0x260
> [9.493125]  memblock_isolate_range+0x12d/0x260
> [9.498187]  memblock_phys_free+0xb4/0x160
> [9.502762]  ? __pfx_memblock_phys_free+0x10/0x10
> [9.508021]  ? mutex_unlock+0x7e/0xd0
> [9.512111]  ? __pfx_mutex_unlock+0x10/0x10
> [9.516786]  ? kernel_init_freeable+0x2d4/0x430
> [9.521850]  ? __pfx_kernel_init+0x10/0x10
> [9.526426]  xbc_exit+0x17/0x70
> [9.529935]  kernel_init+0x38/0x1e0
> [9.533829]  ? _raw_spin_unlock_irq+0xd/0x30
> [9.538601]  ret_from_fork+0x2c/0x50
> [9.542596]  ? __pfx_kernel_init+0x10/0x10
> [9.547170]  ret_from_fork_asm+0x1a/0x30
> [9.551552]  
> 
> [9.555649] The buggy address belongs to the physical page:
> [9.561875] page: refcount:0 mapcount:0 mapping: index:0x1 
> pfn:0x45dd30
> [9.570821] flags: 0x200(node=0|zone=2)
> [9.576271] page_type: 0x()
> [9.580167] raw: 0200 ea0011774c48 ea0012ba1848 
> 
> [9.588823] raw: 0001   
> 
> [9.597476] page dumped because: kasan: bad access detected
> 
> [9.605362] Memory state around the buggy address:
> [9.610714]  88845dd2ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00 00
> [9.618786]  88845dd2ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00 00
> [9.626857] >88845dd3: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
> ff ff
> [9.634930]^
> [9.638534]  88845dd30080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
> ff ff
> [9.646605]  88845dd30100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
> ff ff
> [9.654675] 
> ==
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Qiang Zhang 

Looks good to me.

Acked-by: Masami Hiramatsu (Google) 

Also,

Fixes: 40caa127f3c7 ("init: bootconfig: Remove all bootconfig data when the 
init memory is removed")

Let me pick this for bootconfig/fixes.

Thanks!

> ---
> v2:
> - add an early flag in xbc_free_mem() to free memory back to memblock in
>   xbc_init error path or put memory to buddy allocator in normal xbc_exit.
> 
> ---
>  include/linux/bootconfig.h |  7 ++-
>  lib/bootconfig.c   | 19 +++
>  2 files changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/bootconfig.h b/include/linux/bootconfig.h
> index e5ee2c694401..3f4b4ac527ca 100644
> --- a/include/linux/bootconfig.h
> +++ b/include/linux/bootconfig.h
> @@ -288,7 +288,12 @@ int __init xbc_init(const char *buf, size_t size, const 
> char **emsg, int *epos);
>  int __init xbc_get_info(int *node_size, size_t *data_size);
>  
>  /* XBC cleanup data structures */
> -void __init xbc_exit(void);
> +void __init _xbc_exit(bool early);
> +
> +static inline void xbc_exit(void)
> +{
> + _xbc_exit(false);
> +}
>  
>  /* XBC embedded bootconfig data in kernel */
>  #ifdef CONFIG_BOOT_CONFIG_EMBED
> diff --git a/lib/bootconfig.c b/lib/bootconfig.c
> index c59d26068a64..f9a45adc6307 100644
> --- a/lib/bootconfig.c
> +++ b/lib/bootconfig.c
> @@ -61,9 +61,12 @@ static inline void * __init xbc_alloc_mem(size_t size)
>   return memblock_alloc(size, SMP_CACHE_BYTES);
>  }
>  
> -static inline void __init xbc_free_mem(void *addr, size_t size)
> +static inline void __init xbc_free_mem(void *addr, size_t size, bool early)
>  {
> - memblock_free(addr, size);
> + if (early)
> + 

Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

2024-04-12 Thread Marc Zyngier
On Fri, 12 Apr 2024 11:44:09 +0100,
Will Deacon  wrote:
> 
> On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote:
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index dc04bc767865..ff17849be9f4 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1768,40 +1768,6 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct 
> > kvm_gfn_range *range)
> > return false;
> >  }
> >  
> > -bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> > -{
> > -   kvm_pfn_t pfn = pte_pfn(range->arg.pte);
> > -
> > -   if (!kvm->arch.mmu.pgt)
> > -   return false;
> > -
> > -   WARN_ON(range->end - range->start != 1);
> > -
> > -   /*
> > -* If the page isn't tagged, defer to user_mem_abort() for sanitising
> > -* the MTE tags. The S2 pte should have been unmapped by
> > -* mmu_notifier_invalidate_range_end().
> > -*/
> > -   if (kvm_has_mte(kvm) && !page_mte_tagged(pfn_to_page(pfn)))
> > -   return false;
> > -
> > -   /*
> > -* We've moved a page around, probably through CoW, so let's treat
> > -* it just like a translation fault and the map handler will clean
> > -* the cache to the PoC.
> > -*
> > -* The MMU notifiers will have unmapped a huge PMD before calling
> > -* ->change_pte() (which in turn calls kvm_set_spte_gfn()) and
> > -* therefore we never need to clear out a huge PMD through this
> > -* calling path and a memcache is not required.
> > -*/
> > -   kvm_pgtable_stage2_map(kvm->arch.mmu.pgt, range->start << PAGE_SHIFT,
> > -  PAGE_SIZE, __pfn_to_phys(pfn),
> > -  KVM_PGTABLE_PROT_R, NULL, 0);
> > -
> > -   return false;
> > -}
> > -
> >  bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> >  {
> > u64 size = (range->end - range->start) << PAGE_SHIFT;
> 
> Thanks. It's nice to see this code retire:
> 
> Acked-by: Will Deacon 
> 
> Also, if you're in the business of hacking the MMU notifier code, it
> would be really great to change the .clear_flush_young() callback so
> that the architecture could handle the TLB invalidation. At the moment,
> the core KVM code invalidates the whole VMID courtesy of 'flush_on_ret'
> being set by kvm_handle_hva_range(), whereas we could do a much
> lighter-weight and targetted TLBI in the architecture page-table code
> when we actually update the ptes for small ranges.

Indeed, and I was looking at this earlier this week as it has a pretty
devastating effect with NV (it blows the shadow S2 for that VMID, with
costly consequences).

In general, it feels like the TLB invalidation should stay with the
code that deals with the page tables, as it has a pretty good idea of
what needs to be invalidated and how -- specially on architectures
that have a HW-broadcast facility like arm64.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.



Re: [PATCH 0/4] KVM, mm: remove the .change_pte() MMU notifier and set_pte_at_notify()

2024-04-12 Thread Marc Zyngier
On Fri, 05 Apr 2024 12:58:11 +0100,
Paolo Bonzini  wrote:
> 
> The .change_pte() MMU notifier callback was intended as an optimization
> and for this reason it was initially called without a surrounding
> mmu_notifier_invalidate_range_{start,end}() pair.  It was only ever
> implemented by KVM (which was also the original user of MMU notifiers)
> and the rules on when to call set_pte_at_notify() rather than set_pte_at()
> have always been pretty obscure.
> 
> It may seem a miracle that it has never caused any hard to trigger
> bugs, but there's a good reason for that: KVM's implementation has
> been nonfunctional for a good part of its existence.  Already in
> 2012, commit 6bdb913f0a70 ("mm: wrap calls to set_pte_at_notify with
> invalidate_range_start and invalidate_range_end", 2012-10-09) changed the
> .change_pte() callback to occur within an invalidate_range_start/end()
> pair; and because KVM unmaps the sPTEs during .invalidate_range_start(),
> .change_pte() has no hope of finding a sPTE to change.
> 
> Therefore, all the code for .change_pte() can be removed from both KVM
> and mm/, and set_pte_at_notify() can be replaced with just set_pte_at().
> 
> Please review!  Also feel free to take the KVM patches through the mm
> tree, as I don't expect any conflicts.
> 
> Thanks,
> 
> Paolo
> 
> Paolo Bonzini (4):
>   KVM: delete .change_pte MMU notifier callback
>   KVM: remove unused argument of kvm_handle_hva_range()
>   mmu_notifier: remove the .change_pte() callback
>   mm: replace set_pte_at_notify() with just set_pte_at()
> 
>  arch/arm64/kvm/mmu.c  | 34 -
>  arch/loongarch/include/asm/kvm_host.h |  1 -
>  arch/loongarch/kvm/mmu.c  | 32 
>  arch/mips/kvm/mmu.c   | 30 ---
>  arch/powerpc/include/asm/kvm_ppc.h|  1 -
>  arch/powerpc/kvm/book3s.c |  5 ---
>  arch/powerpc/kvm/book3s.h |  1 -
>  arch/powerpc/kvm/book3s_64_mmu_hv.c   | 12 --
>  arch/powerpc/kvm/book3s_hv.c  |  1 -
>  arch/powerpc/kvm/book3s_pr.c  |  7 
>  arch/powerpc/kvm/e500_mmu_host.c  |  6 ---
>  arch/riscv/kvm/mmu.c  | 20 --
>  arch/x86/kvm/mmu/mmu.c| 54 +--
>  arch/x86/kvm/mmu/spte.c   | 16 
>  arch/x86/kvm/mmu/spte.h   |  2 -
>  arch/x86/kvm/mmu/tdp_mmu.c| 46 ---
>  arch/x86/kvm/mmu/tdp_mmu.h|  1 -
>  include/linux/kvm_host.h  |  2 -
>  include/linux/mmu_notifier.h  | 44 --
>  include/trace/events/kvm.h| 15 
>  kernel/events/uprobes.c   |  5 +--
>  mm/ksm.c  |  4 +-
>  mm/memory.c   |  7 +---
>  mm/migrate_device.c   |  8 +---
>  mm/mmu_notifier.c | 17 -
>  virt/kvm/kvm_main.c   | 50 +
>  26 files changed, 10 insertions(+), 411 deletions(-)
> 

Reviewed-by: Marc Zyngier 

M.

-- 
Without deviation from the norm, progress is not possible.



Re: [PATCH 2/2] remoteproc: mediatek: Support MT8188 SCP core 1

2024-04-12 Thread 温倩苓
Hi AngeloGioacchino,

Thanks for the reviews.

On Thu, 2024-04-11 at 09:33 +0200, AngeloGioacchino Del Regno wrote:
> Il 11/04/24 05:37, olivia.wen ha scritto:
> > To Support MT8188 SCP core 1 for ISP driver.
> > The SCP on different chips will require different code sizes
> >   and IPI buffer sizes based on varying requirements.
> > 
> > Signed-off-by: olivia.wen 
> > ---
> >   drivers/remoteproc/mtk_common.h|  5 +--
> >   drivers/remoteproc/mtk_scp.c   | 62
> > +++---
> >   drivers/remoteproc/mtk_scp_ipi.c   |  9 --
> >   include/linux/remoteproc/mtk_scp.h |  1 +
> >   4 files changed, 62 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/remoteproc/mtk_common.h
> > b/drivers/remoteproc/mtk_common.h
> > index 6d7736a..8f37f65 100644
> > --- a/drivers/remoteproc/mtk_common.h
> > +++ b/drivers/remoteproc/mtk_common.h
> > @@ -78,7 +78,6 @@
> >   #define MT8195_L2TCM_OFFSET   0x850d0
> >   
> >   #define SCP_FW_VER_LEN32
> > -#define SCP_SHARE_BUFFER_SIZE  288
> >   
> >   struct scp_run {
> > u32 signaled;
> > @@ -110,6 +109,8 @@ struct mtk_scp_of_data {
> > u32 host_to_scp_int_bit;
> >   
> > size_t ipi_buf_offset;
> > +   u32 ipi_buffer_size;
> 
> this should be `ipi_share_buf_size`
> 
> > +   u32 max_code_size;
> 
> max_code_size should probably be dram_code_size or max_dram_size or
> dram_size.
> 
> Also, both should be size_t, not u32.

It will be fixed in the next version.

> 
> >   };
> >   
> >   struct mtk_scp_of_cluster {
> > @@ -162,7 +163,7 @@ struct mtk_scp {
> >   struct mtk_share_obj {
> > u32 id;
> > u32 len;
> > -   u8 share_buf[SCP_SHARE_BUFFER_SIZE];
> > +   u8 *share_buf;
> >   };
> >   
> >   void scp_memcpy_aligned(void __iomem *dst, const void *src,
> > unsigned int len);
> > diff --git a/drivers/remoteproc/mtk_scp.c
> > b/drivers/remoteproc/mtk_scp.c
> > index 6751829..270718d 100644
> > --- a/drivers/remoteproc/mtk_scp.c
> > +++ b/drivers/remoteproc/mtk_scp.c
> > @@ -20,7 +20,6 @@
> >   #include "mtk_common.h"
> >   #include "remoteproc_internal.h"
> >   
> > -#define MAX_CODE_SIZE 0x50
> >   #define SECTION_NAME_IPI_BUFFER ".ipi_buffer"
> >   
> >   /**
> > @@ -94,14 +93,14 @@ static void scp_ipi_handler(struct mtk_scp
> > *scp)
> >   {
> > struct mtk_share_obj __iomem *rcv_obj = scp->recv_buf;
> > struct scp_ipi_desc *ipi_desc = scp->ipi_desc;
> > -   u8 tmp_data[SCP_SHARE_BUFFER_SIZE];
> > +   u8 *tmp_data;
> > scp_ipi_handler_t handler;
> > u32 id = readl(&rcv_obj->id);
> > u32 len = readl(&rcv_obj->len);
> >   
> > -   if (len > SCP_SHARE_BUFFER_SIZE) {
> > +   if (len > scp->data->ipi_buffer_size) {
> > dev_err(scp->dev, "ipi message too long (len %d, max
> > %d)", len,
> > -   SCP_SHARE_BUFFER_SIZE);
> > +   scp->data->ipi_buffer_size);
> > return;
> > }
> > if (id >= SCP_IPI_MAX) {
> > @@ -109,6 +108,10 @@ static void scp_ipi_handler(struct mtk_scp
> > *scp)
> > return;
> > }
> >   
> > +   tmp_data = kzalloc(len, GFP_KERNEL);
> 
> I think that this will be impacting on performance a bit, especially
> if
> the scp_ipi_handler gets called frequently (and also remember that
> this
> is in interrupt context).
> 
> For best performance, you should allocate this at probe time (in
> struct mtk_scp
> or somewhere else), then:
> 
> len = ipi message length
> memset zero the tmp_data from len to ipi_buffer_size
> 
> memcpy_fromio() etc
> 
> > +   if (!tmp_data)
> > +   return;
> > +
> > scp_ipi_lock(scp, id);
> > handler = ipi_desc[id].handler;
> > if (!handler) {
> > @@ -123,6 +126,7 @@ static void scp_ipi_handler(struct mtk_scp
> > *scp)
> >   
> > scp->ipi_id_ack[id] = true;
> > wake_up(&scp->ack_wq);
> > +   kfree(tmp_data);
> 
> There's a possible memory leak. You forgot to kfree in the NULL
> handler path.
> 
> >   }
> >   

It seems more appropriate to allocate memory in the function
scp_rproc_init and free memory within the function scp_free.

And memset tmp_data to zero up to ipi_share_buffer_size in the function
scp_ipi_handler.

I will make changes like this.
If there are any other suggestions, please provide them.
Thank you.

> >   static int scp_elf_read_ipi_buf_addr(struct mtk_scp *scp,
> > @@ -133,6 +137,7 @@ static int scp_ipi_init(struct mtk_scp *scp,
> > const struct firmware *fw)
> >   {
> > int ret;
> > size_t buf_sz, offset;
> > +   size_t share_buf_offset;
> >   
> > /* read the ipi buf addr from FW itself first */
> > ret = scp_elf_read_ipi_buf_addr(scp, fw, &offset);
> > @@ -154,10 +159,12 @@ static int scp_ipi_init(struct mtk_scp *scp,
> > const struct firmware *fw)
> >   
> > scp->recv_buf = (struct mtk_share_obj __iomem *)
> > (scp->sram_base + offset);
> > +   share_buf_offset = sizeof(scp->recv_buf->id)
> > +   + sizeof(scp->recv_buf->len) + scp->data->ipi_buffer_size;
> 
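For what it's worth, the preallocation asked for above could look roughly
like this; the field and size names follow the review suggestions and are
not final:

	/* at probe/init time, allocated once: */
	scp->share_buf = kzalloc(scp->data->ipi_share_buf_size, GFP_KERNEL);
	if (!scp->share_buf)
		return -ENOMEM;

	/* in scp_ipi_handler(), no allocation in interrupt context: */
	memset(scp->share_buf + len, 0, scp->data->ipi_share_buf_size - len);
	memcpy_fromio(scp->share_buf, rcv_obj->share_buf, len);
	handler(scp->share_buf, len, ipi_desc[id].priv);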

Re: [PATCH 2/2] remoteproc: mediatek: Support MT8188 SCP core 1

2024-04-12 Thread 温倩苓
Hi Krzysztof,

On Thu, 2024-04-11 at 08:07 +0200, Krzysztof Kozlowski wrote:
>
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 11/04/2024 05:37, olivia.wen wrote:
> > +};
> > +
> >  static const struct of_device_id mtk_scp_of_match[] = {
> >  { .compatible = "mediatek,mt8183-scp", .data = &mt8183_of_data },
> >  { .compatible = "mediatek,mt8186-scp", .data = &mt8186_of_data },
> > @@ -1323,6 +1362,7 @@ static const struct of_device_id mtk_scp_of_match[] = {
> >  { .compatible = "mediatek,mt8192-scp", .data = &mt8192_of_data },
> >  { .compatible = "mediatek,mt8195-scp", .data = &mt8195_of_data },
> >  { .compatible = "mediatek,mt8195-scp-dual", .data = &mt8195_of_data_cores },
> > +{ .compatible = "mediatek,mt8188-scp-dual", .data = &mt8188_of_data_cores },
> 
> Why do you add new entries to the end? Look at the list first.
> 
> Best regards,
> Krzysztof

Thanks for the reviews.
I will change the order as follows.
> +{ .compatible = "mediatek,mt8188-scp-dual", .data = &mt8188_of_data_cores },
>  { .compatible = "mediatek,mt8195-scp-dual", .data = &mt8195_of_data_cores },

Best regards,
Olivia


Re: [PATCH 1/2] dt-bindings: remoteproc: mediatek: Support MT8188 dual-core SCP

2024-04-12 Thread 温倩苓
Hi AngeloGioacchino,

On Thu, 2024-04-11 at 09:39 +0200, AngeloGioacchino Del Regno wrote:
> Il 11/04/24 09:34, AngeloGioacchino Del Regno ha scritto:
> > Il 11/04/24 05:37, olivia.wen ha scritto:
> > > Under different applications, the MT8188 SCP can be used as
> > > single-core
> > > or dual-core.
> > > 
> > > Signed-off-by: olivia.wen 
> > > ---
> > >   Documentation/devicetree/bindings/remoteproc/mtk,scp.yaml | 3
> > > ++-
> > >   1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git
> > > a/Documentation/devicetree/bindings/remoteproc/mtk,scp.yaml 
> > > b/Documentation/devicetree/bindings/remoteproc/mtk,scp.yaml
> > > index 507f98f..7e7b567 100644
> > > --- a/Documentation/devicetree/bindings/remoteproc/mtk,scp.yaml
> > > +++ b/Documentation/devicetree/bindings/remoteproc/mtk,scp.yaml
> > > @@ -22,7 +22,7 @@ properties:
> > > - mediatek,mt8192-scp
> > > - mediatek,mt8195-scp
> > > - mediatek,mt8195-scp-dual
> > > -
> > 
> > Don't remove the blank line, it's there for readability.
> > 
> > > +  - mediatek,mt8188-scp-dual
> 
> Ah, sorry, one more comment. Please, keep the entries ordered by
> name.
> 8188 goes before 8195.
> 
> > 
> > After addressing that comment,
> > 
> > Reviewed-by: AngeloGioacchino Del Regno <
> > angelogioacchino.delre...@collabora.com>
> > 
> > > reg:
> > >   description:
> > > Should contain the address ranges for memory regions
> > > SRAM, CFG, and,
> > > @@ -195,6 +195,7 @@ allOf:
> > >   compatible:
> > > enum:
> > >   - mediatek,mt8195-scp-dual
> > > +- mediatek,mt8188-scp-dual
> 
> same here.
> 
> > >   then:
> > > properties:
> > >   reg:
> > 
> > 
> 

Thanks for the reviews.
It will be corrected in the next version.

Best regards,
Olivia


Re: [PATCH 1/2] dt-bindings: remoteproc: mediatek: Support MT8188 dual-core SCP

2024-04-12 Thread 温倩苓
Hi Krzysztof,

Thanks for the reviews.

On Thu, 2024-04-11 at 08:06 +0200, Krzysztof Kozlowski wrote:
>
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 11/04/2024 05:37, olivia.wen wrote:
> > Under different applications, the MT8188 SCP can be used as single-
> core
> > or dual-core.
> > 
> > Signed-off-by: olivia.wen 
> 
> Are you sure you use full name, not email login as name?
> 

Thanks for the reminder. I have made changes.

> > ---
> >  Documentation/devicetree/bindings/remoteproc/mtk,scp.yaml | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git
> a/Documentation/devicetree/bindings/remoteproc/mtk,scp.yaml
> b/Documentation/devicetree/bindings/remoteproc/mtk,scp.yaml
> > index 507f98f..7e7b567 100644
> > --- a/Documentation/devicetree/bindings/remoteproc/mtk,scp.yaml
> > +++ b/Documentation/devicetree/bindings/remoteproc/mtk,scp.yaml
> > @@ -22,7 +22,7 @@ properties:
> >- mediatek,mt8192-scp
> >- mediatek,mt8195-scp
> >- mediatek,mt8195-scp-dual
> > -
> > +  - mediatek,mt8188-scp-dual
> 
> Missing blank line, misordered.
> 

It will be corrected in the next version.

> >reg:
> >  description:
> >Should contain the address ranges for memory regions SRAM,
> CFG, and,
> > @@ -195,6 +195,7 @@ allOf:
> >  compatible:
> >enum:
> >  - mediatek,mt8195-scp-dual
> > +- mediatek,mt8188-scp-dual
> 
> Again, keep the order.
> 

It will be corrected in the next version.

> Best regards,
> Krzysztof
> 
> 

Best regards,
Olivia


[PATCH v2] bootconfig: use memblock_free_late to free xbc memory to buddy

2024-04-12 Thread qiang4 . zhang
From: Qiang Zhang 

By the time xbc memory is freed in xbc_exit(), memblock may have handed
memory over to the buddy allocator. So it doesn't make sense to free memory
back to memblock. memblock_free() called by xbc_exit() even causes UAF bugs
on architectures with CONFIG_ARCH_KEEP_MEMBLOCK disabled, like x86. The
following KASAN log shows this case.

This patch fixes the xbc memory free problem by calling memblock_free()
in the early xbc init error rewind path and calling memblock_free_late()
in the xbc exit path to free memory to the buddy allocator.

[9.410890] 
==
[9.418962] BUG: KASAN: use-after-free in memblock_isolate_range+0x12d/0x260
[9.426850] Read of size 8 at addr 88845dd3 by task swapper/0/1

[9.435901] CPU: 9 PID: 1 Comm: swapper/0 Tainted: G U 
6.9.0-rc3-00208-g586b5dfb51b9 #5
[9.446403] Hardware name: Intel Corporation RPLP LP5 (CPU:RaptorLake)/RPLP 
LP5 (ID:13), BIOS IRPPN02.01.01.00.00.19.015.D- Dec 28 2023
[9.460789] Call Trace:
[9.463518]  
[9.465859]  dump_stack_lvl+0x53/0x70
[9.469949]  print_report+0xce/0x610
[9.473944]  ? __virt_addr_valid+0xf5/0x1b0
[9.478619]  ? memblock_isolate_range+0x12d/0x260
[9.483877]  kasan_report+0xc6/0x100
[9.487870]  ? memblock_isolate_range+0x12d/0x260
[9.493125]  memblock_isolate_range+0x12d/0x260
[9.498187]  memblock_phys_free+0xb4/0x160
[9.502762]  ? __pfx_memblock_phys_free+0x10/0x10
[9.508021]  ? mutex_unlock+0x7e/0xd0
[9.512111]  ? __pfx_mutex_unlock+0x10/0x10
[9.516786]  ? kernel_init_freeable+0x2d4/0x430
[9.521850]  ? __pfx_kernel_init+0x10/0x10
[9.526426]  xbc_exit+0x17/0x70
[9.529935]  kernel_init+0x38/0x1e0
[9.533829]  ? _raw_spin_unlock_irq+0xd/0x30
[9.538601]  ret_from_fork+0x2c/0x50
[9.542596]  ? __pfx_kernel_init+0x10/0x10
[9.547170]  ret_from_fork_asm+0x1a/0x30
[9.551552]  

[9.555649] The buggy address belongs to the physical page:
[9.561875] page: refcount:0 mapcount:0 mapping: index:0x1 
pfn:0x45dd30
[9.570821] flags: 0x200(node=0|zone=2)
[9.576271] page_type: 0x()
[9.580167] raw: 0200 ea0011774c48 ea0012ba1848 

[9.588823] raw: 0001   

[9.597476] page dumped because: kasan: bad access detected

[9.605362] Memory state around the buggy address:
[9.610714]  88845dd2ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00
[9.618786]  88845dd2ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00
[9.626857] >88845dd3: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff
[9.634930]^
[9.638534]  88845dd30080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff
[9.646605]  88845dd30100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff
[9.654675] 
==

Cc: sta...@vger.kernel.org
Signed-off-by: Qiang Zhang 
---
v2:
- add an early flag in xbc_free_mem() to free memory back to memblock in
  xbc_init error path or put memory to buddy allocator in normal xbc_exit.

---
 include/linux/bootconfig.h |  7 ++-
 lib/bootconfig.c   | 19 +++
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/linux/bootconfig.h b/include/linux/bootconfig.h
index e5ee2c694401..3f4b4ac527ca 100644
--- a/include/linux/bootconfig.h
+++ b/include/linux/bootconfig.h
@@ -288,7 +288,12 @@ int __init xbc_init(const char *buf, size_t size, const 
char **emsg, int *epos);
 int __init xbc_get_info(int *node_size, size_t *data_size);
 
 /* XBC cleanup data structures */
-void __init xbc_exit(void);
+void __init _xbc_exit(bool early);
+
+static inline void xbc_exit(void)
+{
+   _xbc_exit(false);
+}
 
 /* XBC embedded bootconfig data in kernel */
 #ifdef CONFIG_BOOT_CONFIG_EMBED
diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index c59d26068a64..f9a45adc6307 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -61,9 +61,12 @@ static inline void * __init xbc_alloc_mem(size_t size)
return memblock_alloc(size, SMP_CACHE_BYTES);
 }
 
-static inline void __init xbc_free_mem(void *addr, size_t size)
+static inline void __init xbc_free_mem(void *addr, size_t size, bool early)
 {
-   memblock_free(addr, size);
+   if (early)
+   memblock_free(addr, size);
+   else
+   memblock_free_late(__pa(addr), size);
 }
 
 #else /* !__KERNEL__ */
@@ -73,7 +76,7 @@ static inline void *xbc_alloc_mem(size_t size)
return malloc(size);
 }
 
-static inline void xbc_free_mem(void *addr, size_t size)
+static inline void xbc_free_mem(void *addr, size_t size, bool early)
 {
free(addr);
 }
@@ -904,13 +907,13 @@ static int __init xbc_parse_tree(void)
  * If you need to reuse xbc_init() with new boot config, you can
  * use this.
  */

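A sketch of how the two paths are meant to use the flag, with the call
sites assumed from the changelog above:

	/* xbc_init() error rewind, while memblock still owns the memory: */
	_xbc_exit(true);	/* xbc_free_mem() -> memblock_free() */

	/* normal cleanup from kernel_init(), after memblock has handed the
	 * pages to the buddy allocator: */
	xbc_exit();		/* _xbc_exit(false) -> memblock_free_late() */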

Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

2024-04-12 Thread Will Deacon
On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote:
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index dc04bc767865..ff17849be9f4 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1768,40 +1768,6 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct 
> kvm_gfn_range *range)
>   return false;
>  }
>  
> -bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> -{
> - kvm_pfn_t pfn = pte_pfn(range->arg.pte);
> -
> - if (!kvm->arch.mmu.pgt)
> - return false;
> -
> - WARN_ON(range->end - range->start != 1);
> -
> - /*
> -  * If the page isn't tagged, defer to user_mem_abort() for sanitising
> -  * the MTE tags. The S2 pte should have been unmapped by
> -  * mmu_notifier_invalidate_range_end().
> -  */
> - if (kvm_has_mte(kvm) && !page_mte_tagged(pfn_to_page(pfn)))
> - return false;
> -
> - /*
> -  * We've moved a page around, probably through CoW, so let's treat
> -  * it just like a translation fault and the map handler will clean
> -  * the cache to the PoC.
> -  *
> -  * The MMU notifiers will have unmapped a huge PMD before calling
> -  * ->change_pte() (which in turn calls kvm_set_spte_gfn()) and
> -  * therefore we never need to clear out a huge PMD through this
> -  * calling path and a memcache is not required.
> -  */
> - kvm_pgtable_stage2_map(kvm->arch.mmu.pgt, range->start << PAGE_SHIFT,
> -PAGE_SIZE, __pfn_to_phys(pfn),
> -KVM_PGTABLE_PROT_R, NULL, 0);
> -
> - return false;
> -}
> -
>  bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>  {
>   u64 size = (range->end - range->start) << PAGE_SHIFT;

Thanks. It's nice to see this code retire:

Acked-by: Will Deacon 

Also, if you're in the business of hacking the MMU notifier code, it
would be really great to change the .clear_flush_young() callback so
that the architecture could handle the TLB invalidation. At the moment,
the core KVM code invalidates the whole VMID courtesy of 'flush_on_ret'
being set by kvm_handle_hva_range(), whereas we could do a much
lighter-weight and targetted TLBI in the architecture page-table code
when we actually update the ptes for small ranges.

Will



Re: [PATCH 1/2] dt-bindings: remoteproc: mediatek: Support MT8188 dual-core SCP

2024-04-12 Thread 温倩苓
Hi AngeloGioacchino,


On Thu, 2024-04-11 at 09:39 +0200, AngeloGioacchino Del Regno wrote:
> Il 11/04/24 09:34, AngeloGioacchino Del Regno ha scritto:
> > Il 11/04/24 05:37, olivia.wen ha scritto:
> > > Under different applications, the MT8188 SCP can be used as
> > > single-core
> > > or dual-core.
> > > 
> > > Signed-off-by: olivia.wen 
> > > ---
> > >   Documentation/devicetree/bindings/remoteproc/mtk,scp.yaml | 3
> > > ++-
> > >   1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git
> > > a/Documentation/devicetree/bindings/remoteproc/mtk,scp.yaml 
> > > b/Documentation/devicetree/bindings/remoteproc/mtk,scp.yaml
> > > index 507f98f..7e7b567 100644
> > > --- a/Documentation/devicetree/bindings/remoteproc/mtk,scp.yaml
> > > +++ b/Documentation/devicetree/bindings/remoteproc/mtk,scp.yaml
> > > @@ -22,7 +22,7 @@ properties:
> > > - mediatek,mt8192-scp
> > > - mediatek,mt8195-scp
> > > - mediatek,mt8195-scp-dual
> > > -
> > 
> > Don't remove the blank line, it's there for readability.
> > 
> > > +  - mediatek,mt8188-scp-dual
> 
> Ah, sorry, one more comment. Please, keep the entries ordered by
> name.
> 8188 goes before 8195.
> 
> > 
> > After addressing that comment,
> > 
> > Reviewed-by: AngeloGioacchino Del Regno <
> > angelogioacchino.delre...@collabora.com>
> > 
> > > reg:
> > >   description:
> > > Should contain the address ranges for memory regions
> > > SRAM, CFG, and,
> > > @@ -195,6 +195,7 @@ allOf:
> > >   compatible:
> > > enum:
> > >   - mediatek,mt8195-scp-dual
> > > +- mediatek,mt8188-scp-dual
> 
> same here.
> 
> > >   then:
> > > properties:
> > >   reg:
> > 
> > 
> 

Thanks for the reviews.
It will be corrected in the next version.

Best regards,
Olivia


Re: [PATCH RESEND] bootconfig: use memblock_free_late to free xbc memory to buddy

2024-04-12 Thread Qiang Zhang
On Fri, Apr 12, 2024 at 04:34:48PM +0900, Masami Hiramatsu wrote:
>On Fri, 12 Apr 2024 10:41:04 +0800
>qiang4.zh...@linux.intel.com wrote:
>
>> From: Qiang Zhang 
>> 
>> By the time xbc memory is freed, memblock has handed over memory to the
>> buddy allocator. So it doesn't make sense to free memory back to memblock.
>> memblock_free() called by xbc_exit() even causes UAF bugs on architectures
>> with CONFIG_ARCH_KEEP_MEMBLOCK disabled, like x86. The following KASAN log
>> shows this case.
>> 
>> [9.410890] 
>> ==
>> [9.418962] BUG: KASAN: use-after-free in 
>> memblock_isolate_range+0x12d/0x260
>> [9.426850] Read of size 8 at addr 88845dd3 by task swapper/0/1
>> 
>> [9.435901] CPU: 9 PID: 1 Comm: swapper/0 Tainted: G U 
>> 6.9.0-rc3-00208-g586b5dfb51b9 #5
>> [9.446403] Hardware name: Intel Corporation RPLP LP5 
>> (CPU:RaptorLake)/RPLP LP5 (ID:13), BIOS 
>> IRPPN02.01.01.00.00.19.015.D- Dec 28 2023
>> [9.460789] Call Trace:
>> [9.463518]  
>> [9.465859]  dump_stack_lvl+0x53/0x70
>> [9.469949]  print_report+0xce/0x610
>> [9.473944]  ? __virt_addr_valid+0xf5/0x1b0
>> [9.478619]  ? memblock_isolate_range+0x12d/0x260
>> [9.483877]  kasan_report+0xc6/0x100
>> [9.487870]  ? memblock_isolate_range+0x12d/0x260
>> [9.493125]  memblock_isolate_range+0x12d/0x260
>> [9.498187]  memblock_phys_free+0xb4/0x160
>> [9.502762]  ? __pfx_memblock_phys_free+0x10/0x10
>> [9.508021]  ? mutex_unlock+0x7e/0xd0
>> [9.512111]  ? __pfx_mutex_unlock+0x10/0x10
>> [9.516786]  ? kernel_init_freeable+0x2d4/0x430
>> [9.521850]  ? __pfx_kernel_init+0x10/0x10
>> [9.526426]  xbc_exit+0x17/0x70
>> [9.529935]  kernel_init+0x38/0x1e0
>> [9.533829]  ? _raw_spin_unlock_irq+0xd/0x30
>> [9.538601]  ret_from_fork+0x2c/0x50
>> [9.542596]  ? __pfx_kernel_init+0x10/0x10
>> [9.547170]  ret_from_fork_asm+0x1a/0x30
>> [9.551552]  
>> 
>> [9.555649] The buggy address belongs to the physical page:
>> [9.561875] page: refcount:0 mapcount:0 mapping: 
>> index:0x1 pfn:0x45dd30
>> [9.570821] flags: 0x200(node=0|zone=2)
>> [9.576271] page_type: 0x()
>> [9.580167] raw: 0200 ea0011774c48 ea0012ba1848 
>> 
>> [9.588823] raw: 0001   
>> 
>> [9.597476] page dumped because: kasan: bad access detected
>> 
>> [9.605362] Memory state around the buggy address:
>> [9.610714]  88845dd2ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> 00 00
>> [9.618786]  88845dd2ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> 00 00
>> [9.626857] >88845dd3: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
>> ff ff
>> [9.634930]^
>> [9.638534]  88845dd30080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
>> ff ff
>> [9.646605]  88845dd30100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
>> ff ff
>> [9.654675] 
>> ==
>> 
>
>Oops, good catch! Indeed, it is too late to use memblock_free().
>
>BTW, is it safe to call memblock_free_late() in early boot stage,
>because xbc_free_mem() will be called also from xbc_init().
>If not, we need a custom internal __xbc_exit() or xbc_cleanup()
>which is called from xbc_init() and uses memblock_free().

No, memblock_free_late() can't be used early.
Exit and Cleanup seem alike and are confusing. Maybe adding an early flag to
_xbc_exit(bool early) is clearer. I will push a v2 for this.

>
>Thank you,
>
>
>> Cc: sta...@vger.kernel.org
>> Signed-off-by: Qiang Zhang 
>> ---
>>  lib/bootconfig.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/lib/bootconfig.c b/lib/bootconfig.c
>> index c59d26068a64..4524ee944df0 100644
>> --- a/lib/bootconfig.c
>> +++ b/lib/bootconfig.c
>> @@ -63,7 +63,7 @@ static inline void * __init xbc_alloc_mem(size_t size)
>>  
>>  static inline void __init xbc_free_mem(void *addr, size_t size)
>>  {
>> -memblock_free(addr, size);
>> +memblock_free_late(__pa(addr), size);
>>  }
>>  
>>  #else /* !__KERNEL__ */
>> -- 
>> 2.39.2
>> 
>> 
>
>
>-- 
>Masami Hiramatsu (Google) 



Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()

2024-04-12 Thread Ingo Molnar


* Mike Rapoport  wrote:

> +/**
> + * enum execmem_type - types of executable memory ranges
> + *
> + * There are several subsystems that allocate executable memory.
> + * Architectures define different restrictions on placement,
> + * permissions, alignment and other parameters for memory that can be used
> + * by these subsystems.
> + * Types in this enum identify subsystems that allocate executable memory
> + * and let architectures define parameters for ranges suitable for
> + * allocations by each subsystem.
> + *
> + * @EXECMEM_DEFAULT: default parameters that would be used for types that
> + * are not explcitly defined.
> + * @EXECMEM_MODULE_TEXT: parameters for module text sections
> + * @EXECMEM_KPROBES: parameters for kprobes
> + * @EXECMEM_FTRACE: parameters for ftrace
> + * @EXECMEM_BPF: parameters for BPF
> + * @EXECMEM_TYPE_MAX:
> + */
> +enum execmem_type {
> + EXECMEM_DEFAULT,
> + EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT,
> + EXECMEM_KPROBES,
> + EXECMEM_FTRACE,
> + EXECMEM_BPF,
> + EXECMEM_TYPE_MAX,
> +};

s/explcitly
 /explicitly

Thanks,

Ingo
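
For readers skimming the thread: the enum keys per-subsystem parameters, and
a caller picks the type matching its subsystem. A hedged sketch, with the
execmem_alloc()/execmem_free() signatures assumed from this series rather
than verified:

	void *slot = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);

	if (!slot)
		return -ENOMEM;
	/* ... write the probe text, apply arch permissions as needed ... */
	execmem_free(slot);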



Re: [RFC PATCH 5/7] x86/module: perpare module loading for ROX allocations of text

2024-04-12 Thread Ingo Molnar


* Mike Rapoport  wrote:

>   for (s = start; s < end; s++) {
>   void *addr = (void *)s + *s;
> + void *wr_addr = addr + module_writable_offset(mod, addr);

So instead of repeating this pattern in a dozen of places, why not use a 
simpler method:

void *wr_addr = module_writable_address(mod, addr);

or so, since we have to pass 'addr' to the module code anyway.

The text patching code is pretty complex already.

Thanks,

Ingo
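
The suggested helper would presumably be a thin wrapper over the existing
offset lookup, something like:

	/* sketch; module_writable_offset() is from the patch under review */
	static inline void *module_writable_address(struct module *mod,
						    void *addr)
	{
		return addr + module_writable_offset(mod, addr);
	}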



Re: [PATCH v2 1/2] dt-bindings: pinctrl: qcom,pmic-gpio: Allow gpio-hog nodes

2024-04-12 Thread Linus Walleij
On Tue, Apr 9, 2024 at 8:36 PM Luca Weiss  wrote:

> Allow specifying a GPIO hog, as already used on
> qcom-msm8974-lge-nexus5-hammerhead.dts.
>
> Signed-off-by: Luca Weiss 

This patch applied to the pinctrl tree!

Yours,
Linus Walleij



Re: [PATCH RESEND] bootconfig: use memblock_free_late to free xbc memory to buddy

2024-04-12 Thread Google
On Fri, 12 Apr 2024 10:41:04 +0800
qiang4.zh...@linux.intel.com wrote:

> From: Qiang Zhang 
> 
> By the time xbc memory is freed, memblock has handed over memory to the
> buddy allocator. So it doesn't make sense to free memory back to memblock.
> memblock_free() called by xbc_exit() even causes UAF bugs on architectures
> with CONFIG_ARCH_KEEP_MEMBLOCK disabled, like x86. The following KASAN log
> shows this case.
> 
> [9.410890] 
> ==
> [9.418962] BUG: KASAN: use-after-free in 
> memblock_isolate_range+0x12d/0x260
> [9.426850] Read of size 8 at addr 88845dd3 by task swapper/0/1
> 
> [9.435901] CPU: 9 PID: 1 Comm: swapper/0 Tainted: G U 
> 6.9.0-rc3-00208-g586b5dfb51b9 #5
> [9.446403] Hardware name: Intel Corporation RPLP LP5 
> (CPU:RaptorLake)/RPLP LP5 (ID:13), BIOS IRPPN02.01.01.00.00.19.015.D- 
> Dec 28 2023
> [9.460789] Call Trace:
> [9.463518]  
> [9.465859]  dump_stack_lvl+0x53/0x70
> [9.469949]  print_report+0xce/0x610
> [9.473944]  ? __virt_addr_valid+0xf5/0x1b0
> [9.478619]  ? memblock_isolate_range+0x12d/0x260
> [9.483877]  kasan_report+0xc6/0x100
> [9.487870]  ? memblock_isolate_range+0x12d/0x260
> [9.493125]  memblock_isolate_range+0x12d/0x260
> [9.498187]  memblock_phys_free+0xb4/0x160
> [9.502762]  ? __pfx_memblock_phys_free+0x10/0x10
> [9.508021]  ? mutex_unlock+0x7e/0xd0
> [9.512111]  ? __pfx_mutex_unlock+0x10/0x10
> [9.516786]  ? kernel_init_freeable+0x2d4/0x430
> [9.521850]  ? __pfx_kernel_init+0x10/0x10
> [9.526426]  xbc_exit+0x17/0x70
> [9.529935]  kernel_init+0x38/0x1e0
> [9.533829]  ? _raw_spin_unlock_irq+0xd/0x30
> [9.538601]  ret_from_fork+0x2c/0x50
> [9.542596]  ? __pfx_kernel_init+0x10/0x10
> [9.547170]  ret_from_fork_asm+0x1a/0x30
> [9.551552]  
> 
> [9.555649] The buggy address belongs to the physical page:
> [9.561875] page: refcount:0 mapcount:0 mapping: index:0x1 
> pfn:0x45dd30
> [9.570821] flags: 0x200(node=0|zone=2)
> [9.576271] page_type: 0x()
> [9.580167] raw: 0200 ea0011774c48 ea0012ba1848 
> 
> [9.588823] raw: 0001   
> 
> [9.597476] page dumped because: kasan: bad access detected
> 
> [9.605362] Memory state around the buggy address:
> [9.610714]  88845dd2ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00 00
> [9.618786]  88845dd2ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00 00
> [9.626857] >88845dd3: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
> ff ff
> [9.634930]^
> [9.638534]  88845dd30080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
> ff ff
> [9.646605]  88845dd30100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
> ff ff
> [9.654675] 
> ==
> 

Oops, good catch! Indeed, it is too late to use memblock_free().

BTW, is it safe to call memblock_free_late() in early boot stage,
because xbc_free_mem() will be called also from xbc_init().
If not, we need a custom internal __xbc_exit() or xbc_cleanup()
which is called from xbc_init() and uses memblock_free().

Thank you,


> Cc: sta...@vger.kernel.org
> Signed-off-by: Qiang Zhang 
> ---
>  lib/bootconfig.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/bootconfig.c b/lib/bootconfig.c
> index c59d26068a64..4524ee944df0 100644
> --- a/lib/bootconfig.c
> +++ b/lib/bootconfig.c
> @@ -63,7 +63,7 @@ static inline void * __init xbc_alloc_mem(size_t size)
>  
>  static inline void __init xbc_free_mem(void *addr, size_t size)
>  {
> - memblock_free(addr, size);
> + memblock_free_late(__pa(addr), size);
>  }
>  
>  #else /* !__KERNEL__ */
> -- 
> 2.39.2
> 
> 


-- 
Masami Hiramatsu (Google) 



Re: [RFC PATCH 0/4] perf: Correlating user process data to samples

2024-04-12 Thread Peter Zijlstra


On Fri, Apr 12, 2024 at 12:17:28AM +, Beau Belgrave wrote:

> An idea flow would look like this:
> User Task Profile
> do_work();sample() -> IP + No activity
> ...
> set_activity(123);
> ...
> do_work();sample() -> IP + activity (123)
> ...
> set_activity(124);
> ...
> do_work();sample() -> IP + activity (124)

This, start with this, because until I saw this, I was utterly confused
as to what the heck you were on about.

I started by thinking we already have TID in samples so you can already
associate back to user processes and got increasingly confused the
further I went.

What you seem to want to do however is have some task-state included so
you can see what the thread is doing.

Anyway, since we typically run stuff from NMI context, accessing user
data is 'interesting'. As such I would really like to make this work
depend on the call-graph rework that pushes all the user access bits
into return-to-user.



Re: [RFC PATCH 2/7] mm: vmalloc: don't account for number of nodes for HUGE_VMAP allocations

2024-04-12 Thread Christophe Leroy


Le 11/04/2024 à 18:05, Mike Rapoport a écrit :
> From: "Mike Rapoport (IBM)" 
> 
> vmalloc allocations with VM_ALLOW_HUGE_VMAP that do not explictly
> specify node ID will use huge pages only if size_per_node is larger than
> PMD_SIZE.
> Still the actual allocated memory is not distributed between nodes and
> there is no advantage in such approach.
> On the contrary, BPF allocates PMD_SIZE * num_possible_nodes() for each
> new bpf_prog_pack, while it could do with PMD_SIZE'ed packs.
> 
> Don't account for number of nodes for VM_ALLOW_HUGE_VMAP with
> NUMA_NO_NODE and use huge pages whenever the requested allocation size
> is larger than PMD_SIZE.

Patch looks ok but message is confusing. We also use huge pages at PTE 
size, for instance 512k pages or 16k pages on powerpc 8xx, while 
PMD_SIZE is 4M.

Christophe

> 
> Signed-off-by: Mike Rapoport (IBM) 
> ---
>   mm/vmalloc.c | 9 ++---
>   1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 22aa63f4ef63..5fc8b514e457 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3737,8 +3737,6 @@ void *__vmalloc_node_range(unsigned long size, unsigned 
> long align,
>   }
>   
>   if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
> - unsigned long size_per_node;
> -
>   /*
>* Try huge pages. Only try for PAGE_KERNEL allocations,
>* others like modules don't yet expect huge pages in
> @@ -3746,13 +3744,10 @@ void *__vmalloc_node_range(unsigned long size, 
> unsigned long align,
>* supporting them.
>*/
>   
> - size_per_node = size;
> - if (node == NUMA_NO_NODE)
> - size_per_node /= num_online_nodes();
> - if (arch_vmap_pmd_supported(prot) && size_per_node >= PMD_SIZE)
> + if (arch_vmap_pmd_supported(prot) && size >= PMD_SIZE)
>   shift = PMD_SHIFT;
>   else
> - shift = arch_vmap_pte_supported_shift(size_per_node);
> + shift = arch_vmap_pte_supported_shift(size);
>   
>   align = max(real_align, 1UL << shift);
>   size = ALIGN(real_size, 1UL << shift);
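
As a worked example of the behavior change, assuming PMD_SIZE = 2 MiB
(x86-64) and two online nodes:

	/* Before: size = 3 MiB, node == NUMA_NO_NODE:
	 *   size_per_node = 3 MiB / 2 = 1.5 MiB < PMD_SIZE -> PTE mappings
	 * After:  size = 3 MiB >= PMD_SIZE                  -> PMD mappings
	 */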