Re: [PATCH v2 01/19] tools: Add gendwarfksyms

2024-08-16 Thread Greg Kroah-Hartman
On Thu, Aug 15, 2024 at 05:39:05PM +, Sami Tolvanen wrote:
> --- /dev/null
> +++ b/scripts/gendwarfksyms/dwarf.c
> @@ -0,0 +1,87 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later

Sorry, but I have to ask, do you _REALLY_ mean "or later" here and in
other places in this series?  If so, great, but I want to be sure, as I
know:

> + * Copyright (C) 2024 Google LLC

Has some issues with the types of licenses that marking will cover.

thanks,

greg k-h



Re: [PATCH v2 00/19] Implement DWARF modversions

2024-08-16 Thread Greg Kroah-Hartman
On Thu, Aug 15, 2024 at 05:39:04PM +, Sami Tolvanen wrote:
> Changes in v2:
> - Per Luis' request, dropped Rust-specific patches and added
>   gendwarfksyms as an alternative to genksyms for the entire
>   kernel.
> 
> - Added support for missing DWARF features needed to handle
>   also non-Rust code.
> 
> - Changed symbol address matching to use the symbol table
>   information instead of relying on addresses in DWARF.
> 
> - Added __gendwarfksyms_ptr patches to ensure the compiler emits
>   the necessary type information in DWARF even for symbols that
>   are defined in other TUs.
> 
> - Refactored debugging output and moved the more verbose output
>   behind --dump* flags.
> 
> - Added a --symtypes flag for generating a genksyms-style
>   symtypes output based on Petr's feedback, and refactored
>   symbol version calculations to be based on symtypes instead
>   of raw --dump-dies output.
> 
> - Based on feedback from Greg and Petr, added --stable flag and
>   support for reserved data structure fields and declaration-only
>   structures. Also added examples for using these features.

I missed the examples for this, is there a Documentation/ update
somewhere to explain this?  What patch of the series handles this?

thanks,

greg k-h



Re: [PATCH v2 16/19] gendwarfksyms: Add support for reserved structure fields

2024-08-16 Thread Greg Kroah-Hartman
On Thu, Aug 15, 2024 at 05:39:20PM +, Sami Tolvanen wrote:
> Distributions that want to maintain a stable kABI need the ability to
> add reserved fields to kernel data structures that they anticipate
> will be modified during the ABI support timeframe, either by LTS
> updates or backports.
> 
> With genksyms, developers would typically hide changes to the reserved
> fields from version calculation with #ifndef __GENKSYMS__, which would
> result in the symbol version not changing even though the actual type
> of the reserved field changes. When we process precompiled object
> files, this is again not an option.
> 
> To support stable symbol versions for reserved fields, change the
> union type processing to recognize field name prefixes, and if the
> union contains a field name that starts with __kabi_reserved, only use
> the type of that field for computing symbol versions. In other words,
> let's assume we have a structure where we want to reserve space for
> future changes:
> 
>   struct struct1 {
> long a;
> long __kabi_reserved_0; /* reserved for future use */
>   };
>   struct struct1 exported;
> 
> gendwarfksyms --debug produces the following output:
> 
>   variable structure_type struct1 {
> member base_type long int byte_size(8) encoding(5) data_member_location(0),
> member base_type long int byte_size(8) encoding(5) data_member_location(8),
>   } byte_size(16);
>   #SYMVER exported 0x67997f89
> 
> To take the reserved field into use, a distribution would replace it
> with a union, with one of the fields keeping the __kabi_reserved name
> prefix for the original type:
> 
>   struct struct1 {
> long a;
> union {
>   long __kabi_reserved_0;
>   struct {
>   int b;
>   int v;
>   };
> };
>   };

Ah, ignore my previous email, here's the --stable stuff.

But this all needs to go into some documentation somewhere, trying to
dig it out of a changelog is going to be impossible to point people at.

> +/* See dwarf.c:process_reserved */
> +#define RESERVED_PREFIX "__kabi_reserved"

Seems semi-sane, I can live with this.

I don't know if you want to take the next step and provide examples of
how to use this in "easy to use macros" for it all, but if so, that
might be nice.  Especially as I have no idea how you are going to do
this with the rust side of things, this all will work for any structures
defined in .rs code, right?

thanks,

greg k-h



Re: [PATCH v4 2/3] remoteproc: k3-r5: Acquire mailbox handle during probe routine

2024-08-16 Thread Beleswar Prasad Padhi

Hi Mathieu,

On 14-08-2024 21:22, Mathieu Poirier wrote:
Hi Beleswar,

On Thu, Aug 08, 2024 at 01:11:26PM +0530, Beleswar Padhi wrote:
> Acquire the mailbox handle during device probe and do not release handle
> in stop/detach routine or error paths. This removes the redundant
> requests for mbox handle later during rproc start/attach. This also
> allows to defer remoteproc driver's probe if mailbox is not probed yet.
> 
> Signed-off-by: Beleswar Padhi 

> ---
>  drivers/remoteproc/ti_k3_r5_remoteproc.c | 78 +---
>  1 file changed, 30 insertions(+), 48 deletions(-)
> 
> diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c b/drivers/remoteproc/ti_k3_r5_remoteproc.c

> index 57067308b3c0..8a63a9360c0f 100644
> --- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
> +++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
> @@ -194,6 +194,10 @@ static void k3_r5_rproc_mbox_callback(struct mbox_client *client, void *data)
>const char *name = kproc->rproc->name;
>u32 msg = omap_mbox_message(data);
>  
> +	/* Do not forward message from a detached core */

> +  if (kproc->rproc->state == RPROC_DETACHED)
> +  return;
> +
>dev_dbg(dev, "mbox msg: 0x%x\n", msg);
>  
>  	switch (msg) {

> @@ -229,6 +233,10 @@ static void k3_r5_rproc_kick(struct rproc *rproc, int vqid)
>mbox_msg_t msg = (mbox_msg_t)vqid;
>int ret;
>  
> +	/* Do not forward message to a detached core */

> +  if (kproc->rproc->state == RPROC_DETACHED)
> +  return;
> +
>/* send the index of the triggered virtqueue in the mailbox payload */
>ret = mbox_send_message(kproc->mbox, (void *)msg);
>if (ret < 0)
> @@ -399,12 +407,9 @@ static int k3_r5_rproc_request_mbox(struct rproc *rproc)
>client->knows_txdone = false;
>  
>  	kproc->mbox = mbox_request_channel(client, 0);

> -  if (IS_ERR(kproc->mbox)) {
> -  ret = -EBUSY;
> -  dev_err(dev, "mbox_request_channel failed: %ld\n",
> -  PTR_ERR(kproc->mbox));
> -  return ret;
> -  }
> +  if (IS_ERR(kproc->mbox))
> +  return dev_err_probe(dev, PTR_ERR(kproc->mbox),
> +   "mbox_request_channel failed\n");
>  
>  	/*

> * Ping the remote processor, this is only for sanity-sake for now;
> @@ -552,10 +557,6 @@ static int k3_r5_rproc_start(struct rproc *rproc)
>u32 boot_addr;
>int ret;
>  
> -	ret = k3_r5_rproc_request_mbox(rproc);

> -  if (ret)
> -  return ret;
> -
>boot_addr = rproc->bootaddr;
>/* TODO: add boot_addr sanity checking */
>dev_dbg(dev, "booting R5F core using boot addr = 0x%x\n", boot_addr);
> @@ -564,7 +565,7 @@ static int k3_r5_rproc_start(struct rproc *rproc)
>core = kproc->core;
>ret = ti_sci_proc_set_config(core->tsp, boot_addr, 0, 0);
>if (ret)
> -  goto put_mbox;
> +  return ret;
>  
>  	/* unhalt/run all applicable cores */

>if (cluster->mode == CLUSTER_MODE_LOCKSTEP) {
> @@ -580,13 +581,12 @@ static int k3_r5_rproc_start(struct rproc *rproc)
>if (core != core0 && core0->rproc->state == RPROC_OFFLINE) {
>dev_err(dev, "%s: can not start core 1 before core 0\n",
>__func__);
> -  ret = -EPERM;
> -  goto put_mbox;
> +  return -EPERM;
>}
>  
>  		ret = k3_r5_core_run(core);

>if (ret)
> -  goto put_mbox;
> +  return ret;
>}
>  
>  	return 0;

> @@ -596,8 +596,6 @@ static int k3_r5_rproc_start(struct rproc *rproc)
>if (k3_r5_core_halt(core))
>dev_warn(core->dev, "core halt back failed\n");
>}
> -put_mbox:
> -  mbox_free_channel(kproc->mbox);
>return ret;
>  }
>  
> @@ -658,8 +656,6 @@ static int k3_r5_rproc_stop(struct rproc *rproc)

>goto out;
>}
>  
> -	mbox_free_channel(kproc->mbox);

> -
>return 0;
>  
>  unroll_core_halt:

> @@ -674,42 +670,22 @@ static int k3_r5_rproc_stop(struct rproc *rproc)
>  /*
>   * Attach to a running R5F remote processor (IPC-only mode)
>   *
> - * The R5F attach callback only needs to request the mailbox, the remote
> - * processor is already booted, so there is no need to issue any TI-SCI
> - * commands to boot the R5F cores in IPC-only mode. This callback is invoked
> - * only in IPC-only mode.
> + * The R5F attach callback is a NOP. The remote processor is already booted, and
> + * all required resources have been acquired during probe routine, so there is
> + * no need to issue any TI-SCI commands to boot the R5F cores in IPC-only mode.

[PATCH vhost v2 00/10] vdpa/mlx5: Parallelize device suspend/resume

2024-08-16 Thread Dragos Tatulea
This series parallelizes the mlx5_vdpa device suspend and resume
operations through the firmware async API. The purpose is to reduce live
migration downtime.

The series starts with changing the VQ suspend and resume commands
to the async API. After that, the switch is made to issue multiple
commands of the same type in parallel.

Then, an additional improvement is added: keep the notifiers enabled
during suspend but make them NOPs. Upon resume, make sure that the link
state is forwarded. This shaves around 30 ms of constant time per device.

Finally, use parallel VQ suspend and resume during the CVQ MQ command.

For 1 vDPA device x 32 VQs (16 VQPs), on a large VM (256 GB RAM, 32 CPUs
x 2 threads per core), the improvements are:

+-------------------+--------+--------+-----------+
| operation         | Before | After  | Reduction |
|-------------------+--------+--------+-----------|
| mlx5_vdpa_suspend | 37 ms  | 2.5 ms | 14x       |
| mlx5_vdpa_resume  | 16 ms  | 5 ms   |  3x       |
+-------------------+--------+--------+-----------+

---
v2:
- Changed to parallel VQ suspend/resume during CVQ MQ command.
  Support added in the last 2 patches.
- Made the fw async command more generic and moved it to resources.c.
  Did that because the following series (parallel mkey ops) needs this
  code as well.
  Dropped Acked-by from Eugenio on modified patches.
- Fixed kfree -> kvfree.
- Removed extra newline caught during review.
- As discussed in the v1, the series can be pulled in completely in
  the vhost tree [0]. The mlx5_core patch was reviewed by Tariq who is
  also a maintainer for mlx5_core.

[0] - 
https://lore.kernel.org/virtualization/6582792d-8db2-4bc0-bf3a-248fe5c8f...@nvidia.com/T/#maefabb2fde5adfb322d16ca16ae64d540f75b7d2

Dragos Tatulea (10):
  net/mlx5: Support throttled commands from async API
  vdpa/mlx5: Introduce error logging function
  vdpa/mlx5: Introduce async fw command wrapper
  vdpa/mlx5: Use async API for vq query command
  vdpa/mlx5: Use async API for vq modify commands
  vdpa/mlx5: Parallelize device suspend
  vdpa/mlx5: Parallelize device resume
  vdpa/mlx5: Keep notifiers during suspend but ignore
  vdpa/mlx5: Small improvement for change_num_qps()
  vdpa/mlx5: Parallelize VQ suspend/resume for CVQ MQ command

 drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  21 +-
 drivers/vdpa/mlx5/core/mlx5_vdpa.h            |  22 +
 drivers/vdpa/mlx5/core/resources.c            |  73 ++++
 drivers/vdpa/mlx5/net/mlx5_vnet.c             | 396 +++---
 4 files changed, 361 insertions(+), 151 deletions(-)

-- 
2.45.1




[PATCH vhost v2 02/10] vdpa/mlx5: Introduce error logging function

2024-08-16 Thread Dragos Tatulea
mlx5_vdpa_err() was missing. This patch adds it and uses it in the
necessary places.

Signed-off-by: Dragos Tatulea 
Reviewed-by: Tariq Toukan 
Acked-by: Eugenio Pérez 
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  5 +
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 24 
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 50aac8fe57ef..424d445ebee4 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -135,6 +135,11 @@ int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev,
 int mlx5_vdpa_create_dma_mr(struct mlx5_vdpa_dev *mvdev);
 int mlx5_vdpa_reset_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid);
 
+#define mlx5_vdpa_err(__dev, format, ...) \
+	dev_err((__dev)->mdev->device, "%s:%d:(pid %d) error: " format, __func__, __LINE__,\
+		current->pid, ##__VA_ARGS__)
+
+
 #define mlx5_vdpa_warn(__dev, format, ...) \
	dev_warn((__dev)->mdev->device, "%s:%d:(pid %d) warning: " format, __func__, __LINE__, \
		 current->pid, ##__VA_ARGS__)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index fa78e8288ebb..12133e5d1285 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1538,13 +1538,13 @@ static int suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mv
 
	err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND);
	if (err) {
-		mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed, err: %d\n", err);
+		mlx5_vdpa_err(&ndev->mvdev, "modify to suspend failed, err: %d\n", err);
		return err;
	}
 
	err = query_virtqueue(ndev, mvq, &attr);
	if (err) {
-		mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue, err: %d\n", err);
+		mlx5_vdpa_err(&ndev->mvdev, "failed to query virtqueue, err: %d\n", err);
		return err;
	}
 
@@ -1585,7 +1585,7 @@ static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq
	 */
	err = modify_virtqueue(ndev, mvq, 0);
	if (err) {
-		mlx5_vdpa_warn(&ndev->mvdev,
+		mlx5_vdpa_err(&ndev->mvdev,
			       "modify vq properties failed for vq %u, err: %d\n",
			       mvq->index, err);
		return err;
@@ -1600,15 +1600,15 @@ static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq
	case MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY:
		return 0;
	default:
-		mlx5_vdpa_warn(&ndev->mvdev, "resume vq %u called from bad state %d\n",
+		mlx5_vdpa_err(&ndev->mvdev, "resume vq %u called from bad state %d\n",
			      mvq->index, mvq->fw_state);
		return -EINVAL;
	}
 
	err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
	if (err)
-		mlx5_vdpa_warn(&ndev->mvdev, "modify to resume failed for vq %u, err: %d\n",
-			       mvq->index, err);
+		mlx5_vdpa_err(&ndev->mvdev, "modify to resume failed for vq %u, err: %d\n",
+			      mvq->index, err);
 
	return err;
 }
@@ -2002,13 +2002,13 @@ static int setup_steering(struct mlx5_vdpa_net *ndev)
 
	ns = mlx5_get_flow_namespace(ndev->mvdev.mdev, MLX5_FLOW_NAMESPACE_BYPASS);
	if (!ns) {
-		mlx5_vdpa_warn(&ndev->mvdev, "failed to get flow namespace\n");
+		mlx5_vdpa_err(&ndev->mvdev, "failed to get flow namespace\n");
		return -EOPNOTSUPP;
	}
 
	ndev->rxft = mlx5_create_auto_grouped_flow_table(ns, &ft_attr);
	if (IS_ERR(ndev->rxft)) {
-		mlx5_vdpa_warn(&ndev->mvdev, "failed to create flow table\n");
+		mlx5_vdpa_err(&ndev->mvdev, "failed to create flow table\n");
		return PTR_ERR(ndev->rxft);
	}
	mlx5_vdpa_add_rx_flow_table(ndev);
@@ -2530,7 +2530,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa
 
	err = query_virtqueue(ndev, mvq, &attr);
	if (err) {
-		mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n");
+		mlx5_vdpa_err(mvdev, "failed to query virtqueue\n");
		return err;
	}
	state->split.avail_index = attr.used_index;
@@ -3189,7 +3189,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
	if ((flags & VDPA_RESET_F_CLEAN_MAP) &&
	    MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
		if (mlx5_vdpa_create_dma_mr(mvdev))
-			ml

[PATCH mlx5-vhost v2 01/10] net/mlx5: Support throttled commands from async API

2024-08-16 Thread Dragos Tatulea
Currently, commands that qualify as throttled can't be used via the
async API. That's due to the fact that the throttle semaphore can sleep
but the async API can't.

This patch allows throttling in the async API by using the tentative
variant of the semaphore and upon failure (semaphore at 0) returns EBUSY
to signal to the caller that they need to wait for the completion of
previously issued commands.

Furthermore, make sure that the semaphore is released in the callback.

Signed-off-by: Dragos Tatulea 
Cc: Leon Romanovsky 
Reviewed-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 21 ++-
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 20768ef2e9d2..f69c977c1569 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -1882,10 +1882,12 @@ static int cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out,
 
throttle_op = mlx5_cmd_is_throttle_opcode(opcode);
if (throttle_op) {
-   /* atomic context may not sleep */
-   if (callback)
-   return -EINVAL;
-   down(&dev->cmd.vars.throttle_sem);
+   if (callback) {
+   if (down_trylock(&dev->cmd.vars.throttle_sem))
+   return -EBUSY;
+   } else {
+   down(&dev->cmd.vars.throttle_sem);
+   }
}
 
pages_queue = is_manage_pages(in);
@@ -2091,10 +2093,19 @@ static void mlx5_cmd_exec_cb_handler(int status, void *_work)
 {
	struct mlx5_async_work *work = _work;
	struct mlx5_async_ctx *ctx;
+	struct mlx5_core_dev *dev;
+	u16 opcode;
 
	ctx = work->ctx;
-	status = cmd_status_err(ctx->dev, status, work->opcode, work->op_mod, work->out);
+	dev = ctx->dev;
+	opcode = work->opcode;
+	status = cmd_status_err(dev, status, work->opcode, work->op_mod, work->out);
	work->user_callback(status, work);
+	/* Can't access "work" from this point on. It could have been freed in
+	 * the callback.
+	 */
+   if (mlx5_cmd_is_throttle_opcode(opcode))
+   up(&dev->cmd.vars.throttle_sem);
if (atomic_dec_and_test(&ctx->num_inflight))
complete(&ctx->inflight_done);
 }
-- 
2.45.1




[PATCH vhost v2 03/10] vdpa/mlx5: Introduce async fw command wrapper

2024-08-16 Thread Dragos Tatulea
Introduce a new function mlx5_vdpa_exec_async_cmds() which
wraps the mlx5_core async firmware command API in a way
that will be used to parallelize certain operations in this
driver.

The wrapper deals with the case when mlx5_cmd_exec_cb() returns
EBUSY due to the command being throttled.

Signed-off-by: Dragos Tatulea 
Reviewed-by: Tariq Toukan 
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h | 15 ++
 drivers/vdpa/mlx5/core/resources.c | 73 ++
 2 files changed, 88 insertions(+)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 424d445ebee4..b34e9b93d56e 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -105,6 +105,18 @@ struct mlx5_vdpa_dev {
bool suspended;
 };
 
+struct mlx5_vdpa_async_cmd {
+   int err;
+   struct mlx5_async_work cb_work;
+   struct completion cmd_done;
+
+   void *in;
+   size_t inlen;
+
+   void *out;
+   size_t outlen;
+};
+
 int mlx5_vdpa_create_tis(struct mlx5_vdpa_dev *mvdev, void *in, u32 *tisn);
 void mlx5_vdpa_destroy_tis(struct mlx5_vdpa_dev *mvdev, u32 tisn);
 int mlx5_vdpa_create_rqt(struct mlx5_vdpa_dev *mvdev, void *in, int inlen, u32 *rqtn);
@@ -134,6 +146,9 @@ int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev,
unsigned int asid);
 int mlx5_vdpa_create_dma_mr(struct mlx5_vdpa_dev *mvdev);
 int mlx5_vdpa_reset_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid);
+int mlx5_vdpa_exec_async_cmds(struct mlx5_vdpa_dev *mvdev,
+ struct mlx5_vdpa_async_cmd *cmds,
+ int num_cmds);
 
 #define mlx5_vdpa_err(__dev, format, ...) \
	dev_err((__dev)->mdev->device, "%s:%d:(pid %d) error: " format, __func__, __LINE__,\
diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
index 5c5a41b64bfc..22ea32fe007b 100644
--- a/drivers/vdpa/mlx5/core/resources.c
+++ b/drivers/vdpa/mlx5/core/resources.c
@@ -321,3 +321,76 @@ void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev *mvdev)
mutex_destroy(&mvdev->mr_mtx);
res->valid = false;
 }
+
+static void virtqueue_cmd_callback(int status, struct mlx5_async_work *context)
+{
+   struct mlx5_vdpa_async_cmd *cmd =
+   container_of(context, struct mlx5_vdpa_async_cmd, cb_work);
+
+   cmd->err = mlx5_cmd_check(context->ctx->dev, status, cmd->in, cmd->out);
+   complete(&cmd->cmd_done);
+}
+
+static int issue_async_cmd(struct mlx5_vdpa_dev *mvdev,
+  struct mlx5_vdpa_async_cmd *cmds,
+  int issued,
+  int *completed)
+
+{
+   struct mlx5_vdpa_async_cmd *cmd = &cmds[issued];
+   int err;
+
+retry:
+   err = mlx5_cmd_exec_cb(&mvdev->async_ctx,
+  cmd->in, cmd->inlen,
+  cmd->out, cmd->outlen,
+  virtqueue_cmd_callback,
+  &cmd->cb_work);
+   if (err == -EBUSY) {
+   if (*completed < issued) {
+			/* Throttled by own commands: wait for oldest completion. */
+   wait_for_completion(&cmds[*completed].cmd_done);
+   (*completed)++;
+
+   goto retry;
+   } else {
+			/* Throttled by external commands: switch to sync api. */
+   err = mlx5_cmd_exec(mvdev->mdev,
+   cmd->in, cmd->inlen,
+   cmd->out, cmd->outlen);
+   if (!err)
+   (*completed)++;
+   }
+   }
+
+   return err;
+}
+
+int mlx5_vdpa_exec_async_cmds(struct mlx5_vdpa_dev *mvdev,
+ struct mlx5_vdpa_async_cmd *cmds,
+ int num_cmds)
+{
+   int completed = 0;
+   int issued = 0;
+   int err = 0;
+
+   for (int i = 0; i < num_cmds; i++)
+   init_completion(&cmds[i].cmd_done);
+
+   while (issued < num_cmds) {
+
+   err = issue_async_cmd(mvdev, cmds, issued, &completed);
+   if (err) {
+			mlx5_vdpa_err(mvdev, "error issuing command %d of %d: %d\n",
+				      issued, num_cmds, err);
+   break;
+   }
+
+   issued++;
+   }
+
+   while (completed < issued)
+   wait_for_completion(&cmds[completed++].cmd_done);
+
+   return err;
+}
-- 
2.45.1




[PATCH vhost v2 04/10] vdpa/mlx5: Use async API for vq query command

2024-08-16 Thread Dragos Tatulea
Switch firmware vq query command to be issued via the async API to
allow future parallelization.

For now the command is still serial but the infrastructure is there
to issue commands in parallel, including ratelimiting the number
of issued async commands to firmware.

A later patch will switch to issuing more commands at a time.

Signed-off-by: Dragos Tatulea 
Reviewed-by: Tariq Toukan 
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h |   2 +
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 101 ++---
 2 files changed, 78 insertions(+), 25 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index b34e9b93d56e..24fa00afb24f 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -103,6 +103,8 @@ struct mlx5_vdpa_dev {
struct workqueue_struct *wq;
unsigned int group2asid[MLX5_VDPA_NUMVQ_GROUPS];
bool suspended;
+
+   struct mlx5_async_ctx async_ctx;
 };
 
 struct mlx5_vdpa_async_cmd {
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 12133e5d1285..413b24398ef2 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1184,40 +1184,87 @@ struct mlx5_virtq_attr {
u16 used_index;
 };
 
-static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
-			   struct mlx5_virtq_attr *attr)
-{
-   int outlen = MLX5_ST_SZ_BYTES(query_virtio_net_q_out);
-   u32 in[MLX5_ST_SZ_DW(query_virtio_net_q_in)] = {};
-   void *out;
-   void *obj_context;
-   void *cmd_hdr;
-   int err;
-
-   out = kzalloc(outlen, GFP_KERNEL);
-   if (!out)
-   return -ENOMEM;
+struct mlx5_virtqueue_query_mem {
+   u8 in[MLX5_ST_SZ_BYTES(query_virtio_net_q_in)];
+   u8 out[MLX5_ST_SZ_BYTES(query_virtio_net_q_out)];
+};
 
-	cmd_hdr = MLX5_ADDR_OF(query_virtio_net_q_in, in, general_obj_in_cmd_hdr);
+static void fill_query_virtqueue_cmd(struct mlx5_vdpa_net *ndev,
+				     struct mlx5_vdpa_virtqueue *mvq,
+				     struct mlx5_virtqueue_query_mem *cmd)
+{
+	void *cmd_hdr = MLX5_ADDR_OF(query_virtio_net_q_in, cmd->in, general_obj_in_cmd_hdr);
 
	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_QUERY_GENERAL_OBJECT);
	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_type, MLX5_OBJ_TYPE_VIRTIO_NET_Q);
	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_id, mvq->virtq_id);
	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
-   err = mlx5_cmd_exec(ndev->mvdev.mdev, in, sizeof(in), out, outlen);
-   if (err)
-   goto err_cmd;
+}
+
+static void query_virtqueue_end(struct mlx5_vdpa_net *ndev,
+   struct mlx5_virtqueue_query_mem *cmd,
+   struct mlx5_virtq_attr *attr)
+{
+	void *obj_context = MLX5_ADDR_OF(query_virtio_net_q_out, cmd->out, obj_context);
 
-   obj_context = MLX5_ADDR_OF(query_virtio_net_q_out, out, obj_context);
memset(attr, 0, sizeof(*attr));
attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
	attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index);
	attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, hw_used_index);
-   kfree(out);
-   return 0;
+}
 
-err_cmd:
-   kfree(out);
+static int query_virtqueues(struct mlx5_vdpa_net *ndev,
+   int start_vq,
+   int num_vqs,
+   struct mlx5_virtq_attr *attrs)
+{
+   struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
+   struct mlx5_virtqueue_query_mem *cmd_mem;
+   struct mlx5_vdpa_async_cmd *cmds;
+   int err = 0;
+
+	WARN(start_vq + num_vqs > mvdev->max_vqs, "query vq range invalid [%d, %d), max_vqs: %u\n",
+	     start_vq, start_vq + num_vqs, mvdev->max_vqs);
+
+   cmds = kvcalloc(num_vqs, sizeof(*cmds), GFP_KERNEL);
+   cmd_mem = kvcalloc(num_vqs, sizeof(*cmd_mem), GFP_KERNEL);
+   if (!cmds || !cmd_mem) {
+   err = -ENOMEM;
+   goto done;
+   }
+
+   for (int i = 0; i < num_vqs; i++) {
+   cmds[i].in = &cmd_mem[i].in;
+   cmds[i].inlen = sizeof(cmd_mem[i].in);
+   cmds[i].out = &cmd_mem[i].out;
+   cmds[i].outlen = sizeof(cmd_mem[i].out);
+		fill_query_virtqueue_cmd(ndev, &ndev->vqs[start_vq + i], &cmd_mem[i]);
+   }
+
+   err = mlx5_vdpa_exec_async_cmds(&ndev->mvdev, cmds, num_vqs);
+   if (err) {
+		mlx5_vdpa_err(mvdev, "error issuing query cmd for vq range [%d, %d): %d\n",
+			      start_vq, start_vq + num_vqs, err);
+   goto done;
+   }
+
+   for (int i = 0; i < num_vqs; i++) {
+   struct mlx5_vdpa_async_cmd *cmd = &cmds[i];

[PATCH vhost v2 05/10] vdpa/mlx5: Use async API for vq modify commands

2024-08-16 Thread Dragos Tatulea
Switch firmware vq modify command to be issued via the async API to
allow future parallelization. The new refactored function applies the
modify on a range of vqs and waits for their execution to complete.

For now the command is still used in a serial fashion. A later patch
will switch to modifying multiple vqs in parallel.

Signed-off-by: Dragos Tatulea 
Reviewed-by: Tariq Toukan 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 154 --
 1 file changed, 106 insertions(+), 48 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 413b24398ef2..9be7a88d71a7 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1189,6 +1189,11 @@ struct mlx5_virtqueue_query_mem {
u8 out[MLX5_ST_SZ_BYTES(query_virtio_net_q_out)];
 };
 
+struct mlx5_virtqueue_modify_mem {
+   u8 in[MLX5_ST_SZ_BYTES(modify_virtio_net_q_in)];
+   u8 out[MLX5_ST_SZ_BYTES(modify_virtio_net_q_out)];
+};
+
 static void fill_query_virtqueue_cmd(struct mlx5_vdpa_net *ndev,
 struct mlx5_vdpa_virtqueue *mvq,
 struct mlx5_virtqueue_query_mem *cmd)
@@ -1298,51 +1303,30 @@ static bool modifiable_virtqueue_fields(struct mlx5_vdpa_virtqueue *mvq)
return true;
 }
 
-static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
-   struct mlx5_vdpa_virtqueue *mvq,
-   int state)
+static void fill_modify_virtqueue_cmd(struct mlx5_vdpa_net *ndev,
+ struct mlx5_vdpa_virtqueue *mvq,
+ int state,
+ struct mlx5_virtqueue_modify_mem *cmd)
 {
-   int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
-   u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
struct mlx5_vdpa_mr *desc_mr = NULL;
struct mlx5_vdpa_mr *vq_mr = NULL;
-   bool state_change = false;
void *obj_context;
void *cmd_hdr;
void *vq_ctx;
-   void *in;
-   int err;
-
-   if (mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_NONE)
-   return 0;
-
-   if (!modifiable_virtqueue_fields(mvq))
-   return -EINVAL;
 
-   in = kzalloc(inlen, GFP_KERNEL);
-   if (!in)
-   return -ENOMEM;
-
-	cmd_hdr = MLX5_ADDR_OF(modify_virtio_net_q_in, in, general_obj_in_cmd_hdr);
+	cmd_hdr = MLX5_ADDR_OF(modify_virtio_net_q_in, cmd->in, general_obj_in_cmd_hdr);
 
	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_MODIFY_GENERAL_OBJECT);
	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_type, MLX5_OBJ_TYPE_VIRTIO_NET_Q);
MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_id, mvq->virtq_id);
MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
 
-   obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
+	obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, cmd->in, obj_context);
	vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);
 
-   if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE) {
-		if (!is_valid_state_change(mvq->fw_state, state, is_resumable(ndev))) {
-   err = -EINVAL;
-   goto done;
-   }
-
+   if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE)
MLX5_SET(virtio_net_q_object, obj_context, state, state);
-   state_change = true;
-   }
 
if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS) {
MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr);
@@ -1388,38 +1372,36 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
}
 
	MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, mvq->modified_fields);
-   err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
-   if (err)
-   goto done;
+}
 
-   if (state_change)
-   mvq->fw_state = state;
+static void modify_virtqueue_end(struct mlx5_vdpa_net *ndev,
+struct mlx5_vdpa_virtqueue *mvq,
+int state)
+{
+   struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
 
if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
+   unsigned int asid = mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP];
+   struct mlx5_vdpa_mr *vq_mr = mvdev->mr[asid];
+
mlx5_vdpa_put_mr(mvdev, mvq->vq_mr);
mlx5_vdpa_get_mr(mvdev, vq_mr);
mvq->vq_mr = vq_mr;
}
 
if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY) {
+		unsigned int asid = mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP];
+   struct mlx5_vdpa_mr *desc_mr = mvdev->mr[asid];
+
mlx5_

[PATCH vhost v2 06/10] vdpa/mlx5: Parallelize device suspend

2024-08-16 Thread Dragos Tatulea
Currently device suspend works on vqs serially. Building up on previous
changes that converted vq operations to the async api, this patch
parallelizes the device suspend:
1) Suspend all active vqs in parallel.
2) Query suspended vqs in parallel.

For 1 vDPA device x 32 VQs (16 VQPs) attached to a large VM (256 GB RAM,
32 CPUs x 2 threads per core), the device suspend time is reduced from
~37 ms to ~13 ms.

A later patch will remove the link unregister operation which will make
it even faster.

Signed-off-by: Dragos Tatulea 
Reviewed-by: Tariq Toukan 
Acked-by: Eugenio Pérez 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 56 ---
 1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 9be7a88d71a7..5fba16c80dbb 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1630,49 +1630,51 @@ static int modify_virtqueues(struct mlx5_vdpa_net *ndev, int start_vq, int num_v
return err;
 }
 
-static int suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+static int suspend_vqs(struct mlx5_vdpa_net *ndev, int start_vq, int num_vqs)
 {
-   struct mlx5_virtq_attr attr;
+   struct mlx5_vdpa_virtqueue *mvq;
+   struct mlx5_virtq_attr *attrs;
+   int vq_idx, i;
int err;
 
+   if (start_vq >= ndev->cur_num_vqs)
+   return -EINVAL;
+
+   mvq = &ndev->vqs[start_vq];
if (!mvq->initialized)
return 0;
 
if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY)
return 0;
 
-	err = modify_virtqueues(ndev, mvq->index, 1, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND);
-	if (err) {
-		mlx5_vdpa_err(&ndev->mvdev, "modify to suspend failed, err: %d\n", err);
-		return err;
-	}
-
-	err = query_virtqueues(ndev, mvq->index, 1, &attr);
-	if (err) {
-		mlx5_vdpa_err(&ndev->mvdev, "failed to query virtqueue, err: %d\n", err);
+	err = modify_virtqueues(ndev, start_vq, num_vqs, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND);
+	if (err)
 		return err;
-	}
-
-   mvq->avail_idx = attr.available_index;
-   mvq->used_idx = attr.used_index;
-
-   return 0;
-}
 
-static int suspend_vqs(struct mlx5_vdpa_net *ndev)
-{
-   int err = 0;
-   int i;
+   attrs = kcalloc(num_vqs, sizeof(struct mlx5_virtq_attr), GFP_KERNEL);
+   if (!attrs)
+   return -ENOMEM;
 
-   for (i = 0; i < ndev->cur_num_vqs; i++) {
-   int local_err = suspend_vq(ndev, &ndev->vqs[i]);
+   err = query_virtqueues(ndev, start_vq, num_vqs, attrs);
+   if (err)
+   goto done;
 
-   err = local_err ? local_err : err;
+   for (i = 0, vq_idx = start_vq; i < num_vqs; i++, vq_idx++) {
+   mvq = &ndev->vqs[vq_idx];
+   mvq->avail_idx = attrs[i].available_index;
+   mvq->used_idx = attrs[i].used_index;
}
 
+done:
+   kfree(attrs);
return err;
 }
 
+static int suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
*mvq)
+{
+   return suspend_vqs(ndev, mvq->index, 1);
+}
+
 static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
*mvq)
 {
int err;
@@ -3053,7 +3055,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev 
*mvdev,
bool teardown = !is_resumable(ndev);
int err;
 
-   suspend_vqs(ndev);
+   suspend_vqs(ndev, 0, ndev->cur_num_vqs);
if (teardown) {
err = save_channels_info(ndev);
if (err)
@@ -3606,7 +3608,7 @@ static int mlx5_vdpa_suspend(struct vdpa_device *vdev)
 
down_write(&ndev->reslock);
unregister_link_notifier(ndev);
-   err = suspend_vqs(ndev);
+   err = suspend_vqs(ndev, 0, ndev->cur_num_vqs);
mlx5_vdpa_cvq_suspend(mvdev);
mvdev->suspended = true;
up_write(&ndev->reslock);
-- 
2.45.1




[PATCH vhost v2 07/10] vdpa/mlx5: Parallelize device resume

2024-08-16 Thread Dragos Tatulea
Currently device resume works on vqs serially. Building on previous
changes that converted vq operations to the async API, this patch
parallelizes the device resume.

For 1 vDPA device x 32 VQs (16 VQPs) attached to a large VM (256 GB RAM,
32 CPUs x 2 threads per core), the device resume time is reduced from
~16 ms to ~4.5 ms.

Signed-off-by: Dragos Tatulea 
Reviewed-by: Tariq Toukan 
Acked-by: Eugenio Pérez 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 40 +++
 1 file changed, 14 insertions(+), 26 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 5fba16c80dbb..0773bec917be 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1675,10 +1675,15 @@ static int suspend_vq(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtqueue *mv
return suspend_vqs(ndev, mvq->index, 1);
 }
 
-static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
*mvq)
+static int resume_vqs(struct mlx5_vdpa_net *ndev, int start_vq, int num_vqs)
 {
+   struct mlx5_vdpa_virtqueue *mvq;
int err;
 
+   if (start_vq >= ndev->mvdev.max_vqs)
+   return -EINVAL;
+
+   mvq = &ndev->vqs[start_vq];
if (!mvq->initialized)
return 0;
 
@@ -1690,13 +1695,9 @@ static int resume_vq(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mvq
/* Due to a FW quirk we need to modify the VQ fields first then 
change state.
 * This should be fixed soon. After that, a single command can 
be used.
 */
-   err = modify_virtqueues(ndev, mvq->index, 1, mvq->fw_state);
-   if (err) {
-   mlx5_vdpa_err(&ndev->mvdev,
-   "modify vq properties failed for vq %u, err: 
%d\n",
-   mvq->index, err);
+   err = modify_virtqueues(ndev, start_vq, num_vqs, mvq->fw_state);
+   if (err)
return err;
-   }
break;
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND:
if (!is_resumable(ndev)) {
@@ -1712,25 +1713,12 @@ static int resume_vq(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mvq
return -EINVAL;
}
 
-   err = modify_virtqueues(ndev, mvq->index, 1, 
MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
-   if (err)
-   mlx5_vdpa_err(&ndev->mvdev, "modify to resume failed for vq %u, 
err: %d\n",
- mvq->index, err);
-
-   return err;
+   return modify_virtqueues(ndev, start_vq, num_vqs, 
MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
 }
 
-static int resume_vqs(struct mlx5_vdpa_net *ndev)
+static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
*mvq)
 {
-   int err = 0;
-
-   for (int i = 0; i < ndev->cur_num_vqs; i++) {
-   int local_err = resume_vq(ndev, &ndev->vqs[i]);
-
-   err = local_err ? local_err : err;
-   }
-
-   return err;
+   return resume_vqs(ndev, mvq->index, 1);
 }
 
 static void teardown_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
*mvq)
@@ -3080,7 +3068,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev 
*mvdev,
return err;
}
 
-   resume_vqs(ndev);
+   resume_vqs(ndev, 0, ndev->cur_num_vqs);
 
return 0;
 }
@@ -3204,7 +3192,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
teardown_vq_resources(ndev);
 
if (ndev->setup) {
-   err = resume_vqs(ndev);
+   err = resume_vqs(ndev, 0, ndev->cur_num_vqs);
if (err) {
mlx5_vdpa_warn(mvdev, "failed to resume 
VQs\n");
goto err_driver;
@@ -3628,7 +3616,7 @@ static int mlx5_vdpa_resume(struct vdpa_device *vdev)
 
down_write(&ndev->reslock);
mvdev->suspended = false;
-   err = resume_vqs(ndev);
+   err = resume_vqs(ndev, 0, ndev->cur_num_vqs);
register_link_notifier(ndev);
up_write(&ndev->reslock);
 
-- 
2.45.1




[PATCH vhost v2 08/10] vdpa/mlx5: Keep notifiers during suspend but ignore

2024-08-16 Thread Dragos Tatulea
Unregistering notifiers is a costly operation. Instead of removing
the notifiers during device suspend and adding them back at resume,
simply ignore the call when the device is suspended.

At resume time call queue_link_work() to make sure that the device state
is propagated in case there were changes.

For 1 vDPA device x 32 VQs (16 VQPs) attached to a large VM (256 GB RAM,
32 CPUs x 2 threads per core), the device suspend time is reduced from
~13 ms to ~2.5 ms.

Signed-off-by: Dragos Tatulea 
Reviewed-by: Tariq Toukan 
Acked-by: Eugenio Pérez 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 0773bec917be..65063c507130 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2850,6 +2850,9 @@ static int event_handler(struct notifier_block *nb, 
unsigned long event, void *p
struct mlx5_eqe *eqe = param;
int ret = NOTIFY_DONE;
 
+   if (ndev->mvdev.suspended)
+   return NOTIFY_DONE;
+
if (event == MLX5_EVENT_TYPE_PORT_CHANGE) {
switch (eqe->sub_type) {
case MLX5_PORT_CHANGE_SUBTYPE_DOWN:
@@ -3595,7 +3598,6 @@ static int mlx5_vdpa_suspend(struct vdpa_device *vdev)
mlx5_vdpa_info(mvdev, "suspending device\n");
 
down_write(&ndev->reslock);
-   unregister_link_notifier(ndev);
err = suspend_vqs(ndev, 0, ndev->cur_num_vqs);
mlx5_vdpa_cvq_suspend(mvdev);
mvdev->suspended = true;
@@ -3617,7 +3619,7 @@ static int mlx5_vdpa_resume(struct vdpa_device *vdev)
down_write(&ndev->reslock);
mvdev->suspended = false;
err = resume_vqs(ndev, 0, ndev->cur_num_vqs);
-   register_link_notifier(ndev);
+   queue_link_work(ndev);
up_write(&ndev->reslock);
 
return err;
-- 
2.45.1




[PATCH vhost v2 09/10] vdpa/mlx5: Small improvement for change_num_qps()

2024-08-16 Thread Dragos Tatulea
change_num_qps() has a lot of multiplications by 2 to convert
the number of VQ pairs to the number of VQs. This patch simplifies
the code by doing the VQP -> VQ count conversion once at the
beginning, in a variable.

Signed-off-by: Dragos Tatulea 
Reviewed-by: Tariq Toukan 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 65063c507130..d1a01c229110 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2219,16 +2219,17 @@ static virtio_net_ctrl_ack handle_ctrl_mac(struct 
mlx5_vdpa_dev *mvdev, u8 cmd)
 static int change_num_qps(struct mlx5_vdpa_dev *mvdev, int newqps)
 {
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
-   int cur_qps = ndev->cur_num_vqs / 2;
+   int cur_vqs = ndev->cur_num_vqs;
+   int new_vqs = newqps * 2;
int err;
int i;
 
-   if (cur_qps > newqps) {
-   err = modify_rqt(ndev, 2 * newqps);
+   if (cur_vqs > new_vqs) {
+   err = modify_rqt(ndev, new_vqs);
if (err)
return err;
 
-   for (i = ndev->cur_num_vqs - 1; i >= 2 * newqps; i--) {
+   for (i = cur_vqs - 1; i >= new_vqs; i--) {
struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[i];
 
if (is_resumable(ndev))
@@ -2237,27 +2238,27 @@ static int change_num_qps(struct mlx5_vdpa_dev *mvdev, 
int newqps)
teardown_vq(ndev, mvq);
}
 
-   ndev->cur_num_vqs = 2 * newqps;
+   ndev->cur_num_vqs = new_vqs;
} else {
-   ndev->cur_num_vqs = 2 * newqps;
-   for (i = cur_qps * 2; i < 2 * newqps; i++) {
+   ndev->cur_num_vqs = new_vqs;
+   for (i = cur_vqs; i < new_vqs; i++) {
struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[i];
 
err = mvq->initialized ? resume_vq(ndev, mvq) : 
setup_vq(ndev, mvq, true);
if (err)
goto clean_added;
}
-   err = modify_rqt(ndev, 2 * newqps);
+   err = modify_rqt(ndev, new_vqs);
if (err)
goto clean_added;
}
return 0;
 
 clean_added:
-   for (--i; i >= 2 * cur_qps; --i)
+   for (--i; i >= cur_vqs; --i)
teardown_vq(ndev, &ndev->vqs[i]);
 
-   ndev->cur_num_vqs = 2 * cur_qps;
+   ndev->cur_num_vqs = cur_vqs;
 
return err;
 }
-- 
2.45.1




[PATCH vhost v2 10/10] vdpa/mlx5: Parallelize VQ suspend/resume for CVQ MQ command

2024-08-16 Thread Dragos Tatulea
change_num_qps() is still suspending/resuming VQs one by one.
This change switches to parallel suspend/resume.

When increasing the number of queues, the flow changes a bit for
simplicity: the setup_vq() function will always be called before
resume_vqs(). If the VQ is initialized, setup_vq() will exit early. If
the VQ is not initialized, setup_vq() will create it and resume_vqs()
will resume it.

Signed-off-by: Dragos Tatulea 
Reviewed-by: Tariq Toukan 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index d1a01c229110..822092eccb32 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2229,25 +2229,27 @@ static int change_num_qps(struct mlx5_vdpa_dev *mvdev, 
int newqps)
if (err)
return err;
 
-   for (i = cur_vqs - 1; i >= new_vqs; i--) {
-   struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[i];
-
-   if (is_resumable(ndev))
-   suspend_vq(ndev, mvq);
-   else
-   teardown_vq(ndev, mvq);
+   if (is_resumable(ndev)) {
+   suspend_vqs(ndev, new_vqs, cur_vqs - new_vqs);
+   } else {
+   for (i = new_vqs; i < cur_vqs; i++)
+   teardown_vq(ndev, &ndev->vqs[i]);
}
 
ndev->cur_num_vqs = new_vqs;
} else {
ndev->cur_num_vqs = new_vqs;
-   for (i = cur_vqs; i < new_vqs; i++) {
-   struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[i];
 
-   err = mvq->initialized ? resume_vq(ndev, mvq) : 
setup_vq(ndev, mvq, true);
+   for (i = cur_vqs; i < new_vqs; i++) {
+   err = setup_vq(ndev, &ndev->vqs[i], false);
if (err)
goto clean_added;
}
+
+   err = resume_vqs(ndev, cur_vqs, new_vqs - cur_vqs);
+   if (err)
+   goto clean_added;
+
err = modify_rqt(ndev, new_vqs);
if (err)
goto clean_added;
-- 
2.45.1




Re: [PATCH vhost 0/7] vdpa/mlx5: Parallelize device suspend/resume

2024-08-16 Thread Dragos Tatulea



On 02.08.24 15:14, Michael S. Tsirkin wrote:
> On Fri, Aug 02, 2024 at 10:20:17AM +0300, Dragos Tatulea wrote:
>> This series parallelizes the mlx5_vdpa device suspend and resume
>> operations through the firmware async API. The purpose is to reduce live
>> migration downtime.
>>
>> The series starts with changing the VQ suspend and resume commands
>> to the async API. After that, the switch is made to issue multiple
>> commands of the same type in parallel.
>>
>> Finally, a bonus improvement is thrown in: keep the notifiers enabled
>> during suspend but make it a NOP. Upon resume make sure that the link
>> state is forwarded. This shaves around 30ms per device constant time.
>>
>> For 1 vDPA device x 32 VQs (16 VQPs), on a large VM (256 GB RAM, 32 CPUs
>> x 2 threads per core), the improvements are:
>>
>> +-------------------+--------+--------+-----------+
>> | operation         | Before | After  | Reduction |
>> |-------------------+--------+--------+-----------|
>> | mlx5_vdpa_suspend | 37 ms  | 2.5 ms | 14x       |
>> | mlx5_vdpa_resume  | 16 ms  | 5 ms   |  3x       |
>> +-------------------+--------+--------+-----------+
>>
>> Note for the maintainers:
>> The first patch contains changes for mlx5_core. This must be applied
>> into the mlx5-vhost tree [0] first. Once this patch is applied on
>> mlx5-vhost, the change has to be pulled from mlx5-vdpa into the vhost
>> tree and only then the remaining patches can be applied.
> 
> Or maintainer just acks it and I apply directly.
> 
Tariq reviewed the patch; he is a mlx5_core maintainer. So consider it acked.
Just sent the v2 with the same note in the cover letter.

Thanks,
Dragos

> Let me know when all this can happen.
> 
>> [0] 
>> https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vhost
>>
>> Dragos Tatulea (7):
>>   net/mlx5: Support throttled commands from async API
>>   vdpa/mlx5: Introduce error logging function
>>   vdpa/mlx5: Use async API for vq query command
>>   vdpa/mlx5: Use async API for vq modify commands
>>   vdpa/mlx5: Parallelize device suspend
>>   vdpa/mlx5: Parallelize device resume
>>   vdpa/mlx5: Keep notifiers during suspend but ignore
>>
>>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  21 +-
>>  drivers/vdpa/mlx5/core/mlx5_vdpa.h|   7 +
>>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 435 +-
>>  3 files changed, 333 insertions(+), 130 deletions(-)
>>
>> -- 
>> 2.45.2
> 




[PATCH v7 0/5] Tracepoints and static branch in Rust

2024-08-16 Thread Alice Ryhl
An important part of a production-ready Linux kernel driver is
tracepoints. So to write production-ready Linux kernel drivers in Rust,
we must be able to call tracepoints from Rust code. This patch series
adds support for calling tracepoints declared in C from Rust.

To use the tracepoint support, you must:

1. Declare the tracepoint in a C header file as usual.

2. Add #define CREATE_RUST_TRACE_POINTS next to your
   #define CREATE_TRACE_POINTS.

3. Make sure that the header file is visible to bindgen.

4. Use the declare_trace! macro in your Rust code to generate Rust
   functions that call into the tracepoint.

For example, the kernel has a tracepoint called `sched_kthread_stop`. It
is declared like this:

TRACE_EVENT(sched_kthread_stop,
TP_PROTO(struct task_struct *t),
TP_ARGS(t),
TP_STRUCT__entry(
__array(char,   comm,   TASK_COMM_LEN   )
__field(pid_t,  pid )
),
TP_fast_assign(
memcpy(__entry->comm, t->comm, TASK_COMM_LEN);
__entry->pid= t->pid;
),
TP_printk("comm=%s pid=%d", __entry->comm, __entry->pid)
);

To call the above tracepoint from Rust code, you must first ensure that
the Rust helper for the tracepoint is generated. To do this, you would
modify kernel/sched/core.c by adding #define CREATE_RUST_TRACE_POINTS.

Next, you would include include/trace/events/sched.h in
rust/bindings/bindings_helper.h so that the exported C functions are
visible to Rust, and then you would declare the tracepoint in Rust:

declare_trace! {
fn sched_kthread_stop(task: *mut task_struct);
}

This will define an inline Rust function that checks the static key,
calling into rust_do_trace_##name if the tracepoint is active. Since
these tracepoints often take raw pointers as arguments, it may be
convenient to wrap it in a safe wrapper:

mod raw {
declare_trace! {
/// # Safety
/// `task` must point at a valid task for the duration
/// of this call.
fn sched_kthread_stop(task: *mut task_struct);
}
}

#[inline]
pub fn trace_sched_kthread_stop(task: &Task) {
// SAFETY: The pointer to `task` is valid.
unsafe { raw::sched_kthread_stop(task.as_raw()) }
}

A future expansion of the tracepoint support could generate these safe
versions automatically, but that is left as future work for now.

This is intended for use in the Rust Binder driver, which was originally
sent as an RFC [1]. The RFC did not include tracepoint support, but you
can see how it will be used in Rust Binder at [2]. The author has
verified that the tracepoint support works on Android devices.

This implementation implements support for static keys in Rust so that
the actual static branch happens in the Rust object file. However, the
__DO_TRACE body remains in C code. See v1 for an implementation where
__DO_TRACE is also implemented in Rust.

Link: 
https://lore.kernel.org/rust-for-linux/20231101-rust-binder-v1-0-08ba9197f...@google.com/
 [1]
Link: https://r.android.com/3119993 [2]
Signed-off-by: Alice Ryhl 
---
Changes in v7:
- Fix spurious file included in first patch.
- Fix issue with riscv asm.
- Fix tags on fourth patch to match fifth patch.
- Add Reviewed-by/Acked-by tags where appropriate.
- Link to v6: 
https://lore.kernel.org/r/20240808-tracepoint-v6-0-a23f800f1...@google.com

Changes in v6:
- Add support for !CONFIG_JUMP_LABEL.
- Add tracepoint to rust_print sample.
- Deduplicate inline asm.
- Require unsafe inside `declare_trace!`.
- Fix bug on x86 due to use of intel syntax.
- Link to v5: 
https://lore.kernel.org/r/20240802-tracepoint-v5-0-faa164494...@google.com

Changes in v5:
- Update first patch regarding inline asm duplication.
- Add __rust_do_trace helper to support conditions.
- Rename DEFINE_RUST_DO_TRACE_REAL to __DEFINE_RUST_DO_TRACE.
- Get rid of glob-import in tracepoint macro.
- Address safety requirements on tracepoints in docs.
- Link to v4: 
https://lore.kernel.org/rust-for-linux/20240628-tracepoint-v4-0-353d523a9...@google.com

Changes in v4:
- Move arch-specific code into rust/kernel/arch.
- Restore DEFINE_RUST_DO_TRACE at end of define_trace.h
- Link to v3: 
https://lore.kernel.org/r/20240621-tracepoint-v3-0-9e44eeea2...@google.com

Changes in v3:
- Support for Rust static_key on loongarch64 and riscv64.
- Avoid failing compilation on architectures that are missing Rust
  static_key support when the architecture does not actually use it.
- Link to v2: 
https://lore.kernel.org/r/20240610-tracepoint-v2-0-faebad81b...@google.com

Changes in v2:
- Call into C code for __DO_TRACE.
- Drop static_call patch, as it is no longer needed.
- Link to v1: 
https://lore.kernel.org/r/20240606-tracepoint-v

[PATCH v7 1/5] rust: add generic static_key_false

2024-08-16 Thread Alice Ryhl
Add just enough support for static key so that we can use it from
tracepoints. Tracepoints rely on `static_key_false` even though it is
deprecated, so we add the same functionality to Rust.

This patch only provides a generic implementation without code patching
(matching the one used when CONFIG_JUMP_LABEL is disabled). Later
patches add support for inline asm implementations that use runtime
patching.

When CONFIG_JUMP_LABEL is unset, `static_key_count` is a static inline
function, so a Rust helper is defined for `static_key_count` in this
case. If Rust is compiled with LTO, this call should get inlined. The
helper can be eliminated once we have the necessary inline asm to make
atomic operations from Rust.

Signed-off-by: Alice Ryhl 
---
 rust/bindings/bindings_helper.h |  1 +
 rust/helpers.c  |  9 +
 rust/kernel/jump_label.rs   | 29 +
 rust/kernel/lib.rs  |  1 +
 4 files changed, 40 insertions(+)

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index b940a5777330..8fd092e1b809 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/rust/helpers.c b/rust/helpers.c
index 92d3c03ae1bd..5a9bf5209cd8 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -133,6 +134,14 @@ bool rust_helper_refcount_dec_and_test(refcount_t *r)
 }
 EXPORT_SYMBOL_GPL(rust_helper_refcount_dec_and_test);
 
+#ifndef CONFIG_JUMP_LABEL
+int rust_helper_static_key_count(struct static_key *key)
+{
+   return static_key_count(key);
+}
+EXPORT_SYMBOL_GPL(rust_helper_static_key_count);
+#endif
+
 __force void *rust_helper_ERR_PTR(long err)
 {
return ERR_PTR(err);
diff --git a/rust/kernel/jump_label.rs b/rust/kernel/jump_label.rs
new file mode 100644
index ..011e1fc1d19a
--- /dev/null
+++ b/rust/kernel/jump_label.rs
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+
+// Copyright (C) 2024 Google LLC.
+
+//! Logic for static keys.
+//!
+//! C header: 
[`include/linux/jump_label.h`](srctree/include/linux/jump_label.h).
+
+/// Branch based on a static key.
+///
+/// Takes three arguments:
+///
+/// * `key` - the path to the static variable containing the `static_key`.
+/// * `keytyp` - the type of `key`.
+/// * `field` - the name of the field of `key` that contains the `static_key`.
+///
+/// # Safety
+///
+/// The macro must be used with a real static key defined by C.
+#[macro_export]
+macro_rules! static_key_false {
+($key:path, $keytyp:ty, $field:ident) => {{
+let _key: *const $keytyp = ::core::ptr::addr_of!($key);
+let _key: *const $crate::bindings::static_key = 
::core::ptr::addr_of!((*_key).$field);
+
+$crate::bindings::static_key_count(_key.cast_mut()) > 0
+}};
+}
+pub use static_key_false;
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 274bdc1b0a82..91af9f75d121 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -36,6 +36,7 @@
 pub mod firmware;
 pub mod init;
 pub mod ioctl;
+pub mod jump_label;
 #[cfg(CONFIG_KUNIT)]
 pub mod kunit;
 #[cfg(CONFIG_NET)]

-- 
2.46.0.184.g6999bdac58-goog




[PATCH v7 2/5] rust: add tracepoint support

2024-08-16 Thread Alice Ryhl
Make it possible to have Rust code call into tracepoints defined by C
code. It is still required that the tracepoint is declared in a C
header, and that this header is included in the input to bindgen.

Instead of calling __DO_TRACE directly, the exported rust_do_trace_
function calls an inline helper function. This is because the `cond`
argument does not exist at the callsite of DEFINE_RUST_DO_TRACE.

__DECLARE_TRACE always emits an inline static and an extern declaration
that is only used when CREATE_RUST_TRACE_POINTS is set. These should not
end up in the final binary, so it is not a problem that they are
sometimes emitted without a user.

Reviewed-by: Carlos Llamas 
Reviewed-by: Gary Guo 
Signed-off-by: Alice Ryhl 
---
 include/linux/tracepoint.h  | 22 +-
 include/trace/define_trace.h| 12 ++
 rust/bindings/bindings_helper.h |  1 +
 rust/kernel/lib.rs  |  1 +
 rust/kernel/tracepoint.rs   | 49 +
 5 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 6be396bb4297..5042ca588e41 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -237,6 +237,18 @@ static inline struct tracepoint 
*tracepoint_ptr_deref(tracepoint_ptr_t *p)
 #define __DECLARE_TRACE_RCU(name, proto, args, cond)
 #endif
 
+/*
+ * Declare an exported function that Rust code can call to trigger this
+ * tracepoint. This function does not include the static branch; that is done
+ * in Rust to avoid a function call when the tracepoint is disabled.
+ */
+#define DEFINE_RUST_DO_TRACE(name, proto, args)
+#define __DEFINE_RUST_DO_TRACE(name, proto, args)  \
+   notrace void rust_do_trace_##name(proto)\
+   {   \
+   __rust_do_trace_##name(args);   \
+   }
+
 /*
  * Make sure the alignment of the structure in the __tracepoints section will
  * not add unwanted padding between the beginning of the section and the
@@ -252,6 +264,13 @@ static inline struct tracepoint 
*tracepoint_ptr_deref(tracepoint_ptr_t *p)
extern int __traceiter_##name(data_proto);  \
DECLARE_STATIC_CALL(tp_func_##name, __traceiter_##name);\
extern struct tracepoint __tracepoint_##name;   \
+   extern void rust_do_trace_##name(proto);\
+   static inline void __rust_do_trace_##name(proto)\
+   {   \
+   __DO_TRACE(name,\
+   TP_ARGS(args),  \
+   TP_CONDITION(cond), 0); \
+   }   \
static inline void trace_##name(proto)  \
{   \
if (static_key_false(&__tracepoint_##name.key)) \
@@ -336,7 +355,8 @@ static inline struct tracepoint 
*tracepoint_ptr_deref(tracepoint_ptr_t *p)
void __probestub_##_name(void *__data, proto)   \
{   \
}   \
-   DEFINE_STATIC_CALL(tp_func_##_name, __traceiter_##_name);
+   DEFINE_STATIC_CALL(tp_func_##_name, __traceiter_##_name);   \
+   DEFINE_RUST_DO_TRACE(_name, TP_PROTO(proto), TP_ARGS(args))
 
 #define DEFINE_TRACE(name, proto, args)\
DEFINE_TRACE_FN(name, NULL, NULL, PARAMS(proto), PARAMS(args));
diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
index 00723935dcc7..8159294c2041 100644
--- a/include/trace/define_trace.h
+++ b/include/trace/define_trace.h
@@ -72,6 +72,13 @@
 #define DECLARE_TRACE(name, proto, args)   \
DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
 
+/* If requested, create helpers for calling these tracepoints from Rust. */
+#ifdef CREATE_RUST_TRACE_POINTS
+#undef DEFINE_RUST_DO_TRACE
+#define DEFINE_RUST_DO_TRACE(name, proto, args)\
+   __DEFINE_RUST_DO_TRACE(name, PARAMS(proto), PARAMS(args))
+#endif
+
 #undef TRACE_INCLUDE
 #undef __TRACE_INCLUDE
 
@@ -129,6 +136,11 @@
 # undef UNDEF_TRACE_INCLUDE_PATH
 #endif
 
+#ifdef CREATE_RUST_TRACE_POINTS
+# undef DEFINE_RUST_DO_TRACE
+# define DEFINE_RUST_DO_TRACE(name, proto, args)
+#endif
+
 /* We may be processing more files */
 #define CREATE_TRACE_POINTS
 
diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 8fd092e1b809..fc6f94729789 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #incl

[PATCH v7 3/5] rust: samples: add tracepoint to Rust sample

2024-08-16 Thread Alice Ryhl
This updates the Rust printing sample to invoke a tracepoint. This
ensures that we have a user in-tree from the get-go even though the
patch is being merged before its real user.

Signed-off-by: Alice Ryhl 
---
 MAINTAINERS|  1 +
 include/trace/events/rust_sample.h | 31 +++
 rust/bindings/bindings_helper.h|  1 +
 samples/rust/Makefile  |  3 ++-
 samples/rust/rust_print.rs | 18 ++
 samples/rust/rust_print_events.c   |  8 
 6 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index f328373463b0..1acf5bfddfc4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -19922,6 +19922,7 @@ C:  zulip://rust-for-linux.zulipchat.com
 P: https://rust-for-linux.com/contributing
 T: git https://github.com/Rust-for-Linux/linux.git rust-next
 F: Documentation/rust/
+F: include/trace/events/rust_sample.h
 F: rust/
 F: samples/rust/
 F: scripts/*rust*
diff --git a/include/trace/events/rust_sample.h 
b/include/trace/events/rust_sample.h
new file mode 100644
index ..dbc80ca2e465
--- /dev/null
+++ b/include/trace/events/rust_sample.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Tracepoints for `samples/rust/rust_print.rs`.
+ *
+ * Copyright (C) 2024 Google, Inc.
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM rust_sample
+
+#if !defined(_RUST_SAMPLE_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _RUST_SAMPLE_TRACE_H
+
+#include 
+
+TRACE_EVENT(rust_sample_loaded,
+   TP_PROTO(int magic_number),
+   TP_ARGS(magic_number),
+   TP_STRUCT__entry(
+   __field(int, magic_number)
+   ),
+   TP_fast_assign(
+   __entry->magic_number = magic_number;
+   ),
+   TP_printk("magic=%d", __entry->magic_number)
+);
+
+#endif /* _RUST_SAMPLE_TRACE_H */
+
+/* This part must be outside protection */
+#include 
diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index fc6f94729789..fe97256afe65 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* `bindgen` gets confused at certain things. */
 const size_t RUST_CONST_HELPER_ARCH_SLAB_MINALIGN = ARCH_SLAB_MINALIGN;
diff --git a/samples/rust/Makefile b/samples/rust/Makefile
index 03086dabbea4..f29280ec4820 100644
--- a/samples/rust/Makefile
+++ b/samples/rust/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
+ccflags-y += -I$(src)  # needed for trace events
 
 obj-$(CONFIG_SAMPLE_RUST_MINIMAL)  += rust_minimal.o
-obj-$(CONFIG_SAMPLE_RUST_PRINT)+= rust_print.o
+obj-$(CONFIG_SAMPLE_RUST_PRINT)+= rust_print.o 
rust_print_events.o
 
 subdir-$(CONFIG_SAMPLE_RUST_HOSTPROGS) += hostprogs
diff --git a/samples/rust/rust_print.rs b/samples/rust/rust_print.rs
index 6eabb0d79ea3..6d14b08cac1c 100644
--- a/samples/rust/rust_print.rs
+++ b/samples/rust/rust_print.rs
@@ -69,6 +69,8 @@ fn init(_module: &'static ThisModule) -> Result {
 
 arc_print()?;
 
+trace::trace_rust_sample_loaded(42);
+
 Ok(RustPrint)
 }
 }
@@ -78,3 +80,19 @@ fn drop(&mut self) {
 pr_info!("Rust printing macros sample (exit)\n");
 }
 }
+
+mod trace {
+use core::ffi::c_int;
+
+kernel::declare_trace! {
+/// # Safety
+///
+/// Always safe to call.
+unsafe fn rust_sample_loaded(magic: c_int);
+}
+
+pub(crate) fn trace_rust_sample_loaded(magic: i32) {
+// SAFETY: Always safe to call.
+unsafe { rust_sample_loaded(magic as c_int) }
+}
+}
diff --git a/samples/rust/rust_print_events.c b/samples/rust/rust_print_events.c
new file mode 100644
index ..a9169ff0edf1
--- /dev/null
+++ b/samples/rust/rust_print_events.c
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2024 Google LLC
+ */
+
+#define CREATE_TRACE_POINTS
+#define CREATE_RUST_TRACE_POINTS
+#include 

-- 
2.46.0.184.g6999bdac58-goog




[PATCH v7 4/5] jump_label: adjust inline asm to be consistent

2024-08-16 Thread Alice Ryhl
To avoid duplication of inline asm between C and Rust, we need to
import the inline asm from the relevant `jump_label.h` header into Rust.
To make that easier, this patch updates the header files to expose the
inline asm via a new ARCH_STATIC_BRANCH_ASM macro.

The header files are all updated to define a ARCH_STATIC_BRANCH_ASM that
takes the same arguments in a consistent order so that Rust can use the
same logic for every architecture.

Suggested-by: Peter Zijlstra (Intel) 
Acked-by: Peter Zijlstra (Intel) 
Co-developed-by: Miguel Ojeda 
Signed-off-by: Miguel Ojeda 
Signed-off-by: Alice Ryhl 
---
 arch/arm/include/asm/jump_label.h   | 14 +
 arch/arm64/include/asm/jump_label.h | 20 -
 arch/loongarch/include/asm/jump_label.h | 16 +++
 arch/riscv/include/asm/jump_label.h | 50 ++---
 arch/x86/include/asm/jump_label.h   | 38 ++---
 5 files changed, 75 insertions(+), 63 deletions(-)

diff --git a/arch/arm/include/asm/jump_label.h 
b/arch/arm/include/asm/jump_label.h
index e4eb54f6cd9f..a35aba7f548c 100644
--- a/arch/arm/include/asm/jump_label.h
+++ b/arch/arm/include/asm/jump_label.h
@@ -9,13 +9,17 @@
 
 #define JUMP_LABEL_NOP_SIZE 4
 
+/* This macro is also expanded on the Rust side. */
+#define ARCH_STATIC_BRANCH_ASM(key, label) \
+   "1:\n\t"\
+   WASM(nop) "\n\t"\
+   ".pushsection __jump_table,  \"aw\"\n\t"\
+   ".word 1b, " label ", " key "\n\t"  \
+   ".popsection\n\t"   \
+
 static __always_inline bool arch_static_branch(struct static_key *key, bool 
branch)
 {
-   asm goto("1:\n\t"
-WASM(nop) "\n\t"
-".pushsection __jump_table,  \"aw\"\n\t"
-".word 1b, %l[l_yes], %c0\n\t"
-".popsection\n\t"
+   asm goto(ARCH_STATIC_BRANCH_ASM("%c0", "%l[l_yes]")
 : :  "i" (&((char *)key)[branch]) :  : l_yes);
 
return false;
diff --git a/arch/arm64/include/asm/jump_label.h 
b/arch/arm64/include/asm/jump_label.h
index a0a5bbae7229..424ed421cd97 100644
--- a/arch/arm64/include/asm/jump_label.h
+++ b/arch/arm64/include/asm/jump_label.h
@@ -19,10 +19,14 @@
 #define JUMP_TABLE_ENTRY(key, label)   \
".pushsection   __jump_table, \"aw\"\n\t"   \
".align 3\n\t"  \
-   ".long  1b - ., %l["#label"] - .\n\t"   \
-   ".quad  %c0 - .\n\t"\
-   ".popsection\n\t"   \
-   :  :  "i"(key) :  : label
+   ".long  1b - ., " label " - .\n\t"  \
+   ".quad  " key " - .\n\t"\
+   ".popsection\n\t"
+
+/* This macro is also expanded on the Rust side. */
+#define ARCH_STATIC_BRANCH_ASM(key, label) \
+   "1: nop\n\t"\
+   JUMP_TABLE_ENTRY(key, label)
 
 static __always_inline bool arch_static_branch(struct static_key * const key,
   const bool branch)
@@ -30,8 +34,8 @@ static __always_inline bool arch_static_branch(struct 
static_key * const key,
char *k = &((char *)key)[branch];
 
asm goto(
-   "1: nop \n\t"
-   JUMP_TABLE_ENTRY(k, l_yes)
+   ARCH_STATIC_BRANCH_ASM("%c0", "%l[l_yes]")
+   :  :  "i"(k) :  : l_yes
);
 
return false;
@@ -43,9 +47,11 @@ static __always_inline bool arch_static_branch_jump(struct 
static_key * const ke
const bool branch)
 {
char *k = &((char *)key)[branch];
+
asm goto(
"1: b   %l[l_yes]   \n\t"
-   JUMP_TABLE_ENTRY(k, l_yes)
+   JUMP_TABLE_ENTRY("%c0", "%l[l_yes]")
+   :  :  "i"(k) :  : l_yes
);
return false;
 l_yes:
diff --git a/arch/loongarch/include/asm/jump_label.h 
b/arch/loongarch/include/asm/jump_label.h
index 29acfe3de3fa..8a924bd69d19 100644
--- a/arch/loongarch/include/asm/jump_label.h
+++ b/arch/loongarch/include/asm/jump_label.h
@@ -13,18 +13,22 @@
 
 #define JUMP_LABEL_NOP_SIZE4
 
-#define JUMP_TABLE_ENTRY   \
+/* This macro is also expanded on the Rust side. */
+#define JUMP_TABLE_ENTRY(key, label)   \
 ".pushsection  __jump_table, \"aw\"\n\t"   \
 ".align3   \n\t"   \
-".long 1b - ., %l[l_yes] - .   \n\t"   \
-".quad %0 - .  \n\t"   \
+".long 1b - ., " label " - .   \n\t"   \
+".quad " key " - . \n\t"   \
 ".popsection   \n\t"
 
+#define ARCH_STATIC_BRANCH_ASM(key, lab

[PATCH v7 5/5] rust: add arch_static_branch

2024-08-16 Thread Alice Ryhl
To allow the Rust implementation of static_key_false to use runtime code
patching instead of the generic implementation, pull in the relevant
inline assembly from the jump_label.h header by running the C
preprocessor on a .rs.S file. Build rules are added for .rs.S files.

Since the relevant inline asm has been adjusted to export the inline asm
via the ARCH_STATIC_BRANCH_ASM macro in a consistent way, the Rust side
does not need architecture specific code to pull in the asm.

It is not possible to use the existing C implementation of
arch_static_branch via a Rust helper because it passes the argument
`key` to inline assembly as an 'i' parameter. Any attempt to add a C
helper for this function will fail to compile because the value of `key`
must be known at compile-time.

Suggested-by: Peter Zijlstra (Intel) 
Co-developed-by: Miguel Ojeda 
Signed-off-by: Miguel Ojeda 
Signed-off-by: Alice Ryhl 
---
 rust/Makefile   |  5 ++-
 rust/kernel/.gitignore  |  3 ++
 rust/kernel/arch_static_branch_asm.rs.S |  7 
 rust/kernel/jump_label.rs   | 64 -
 rust/kernel/lib.rs  | 30 
 scripts/Makefile.build  |  9 -
 6 files changed, 115 insertions(+), 3 deletions(-)

diff --git a/rust/Makefile b/rust/Makefile
index 199e0db67962..277fcef656b8 100644
--- a/rust/Makefile
+++ b/rust/Makefile
@@ -14,6 +14,8 @@ CFLAGS_REMOVE_helpers.o = -Wmissing-prototypes 
-Wmissing-declarations
 always-$(CONFIG_RUST) += libmacros.so
 no-clean-files += libmacros.so
 
+always-$(subst y,$(CONFIG_RUST),$(CONFIG_JUMP_LABEL)) += 
kernel/arch_static_branch_asm.rs
+
 always-$(CONFIG_RUST) += bindings/bindings_generated.rs 
bindings/bindings_helpers_generated.rs
 obj-$(CONFIG_RUST) += alloc.o bindings.o kernel.o
 always-$(CONFIG_RUST) += exports_alloc_generated.h 
exports_bindings_generated.h \
@@ -409,7 +411,8 @@ $(obj)/uapi.o: $(src)/uapi/lib.rs \
 $(obj)/kernel.o: private rustc_target_flags = --extern alloc \
 --extern build_error --extern macros --extern bindings --extern uapi
 $(obj)/kernel.o: $(src)/kernel/lib.rs $(obj)/alloc.o $(obj)/build_error.o \
-$(obj)/libmacros.so $(obj)/bindings.o $(obj)/uapi.o FORCE
+$(obj)/libmacros.so $(obj)/bindings.o $(obj)/uapi.o \
+   $(obj)/kernel/arch_static_branch_asm.rs FORCE
+$(call if_changed_rule,rustc_library)
 
 endif # CONFIG_RUST
diff --git a/rust/kernel/.gitignore b/rust/kernel/.gitignore
new file mode 100644
index ..d082731007c6
--- /dev/null
+++ b/rust/kernel/.gitignore
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+/arch_static_branch_asm.rs
diff --git a/rust/kernel/arch_static_branch_asm.rs.S 
b/rust/kernel/arch_static_branch_asm.rs.S
new file mode 100644
index ..9e373d4f7567
--- /dev/null
+++ b/rust/kernel/arch_static_branch_asm.rs.S
@@ -0,0 +1,7 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+
+// Cut here.
+
+::kernel::concat_literals!(ARCH_STATIC_BRANCH_ASM("{symb} + {off} + {branch}", 
"{l_yes}"))
diff --git a/rust/kernel/jump_label.rs b/rust/kernel/jump_label.rs
index 011e1fc1d19a..7757e4f8e85e 100644
--- a/rust/kernel/jump_label.rs
+++ b/rust/kernel/jump_label.rs
@@ -23,7 +23,69 @@ macro_rules! static_key_false {
 let _key: *const $keytyp = ::core::ptr::addr_of!($key);
 let _key: *const $crate::bindings::static_key = 
::core::ptr::addr_of!((*_key).$field);
 
-$crate::bindings::static_key_count(_key.cast_mut()) > 0
+#[cfg(not(CONFIG_JUMP_LABEL))]
+{
+$crate::bindings::static_key_count(_key.cast_mut()) > 0
+}
+
+#[cfg(CONFIG_JUMP_LABEL)]
+$crate::jump_label::arch_static_branch! { $key, $keytyp, $field, false 
}
 }};
 }
 pub use static_key_false;
+
+/// Assert that the assembly block evaluates to a string literal.
+#[cfg(CONFIG_JUMP_LABEL)]
+const _: &str = include!("arch_static_branch_asm.rs");
+
+#[macro_export]
+#[doc(hidden)]
+#[cfg(CONFIG_JUMP_LABEL)]
+#[cfg(not(CONFIG_HAVE_JUMP_LABEL_HACK))]
+macro_rules! arch_static_branch {
+($key:path, $keytyp:ty, $field:ident, $branch:expr) => {'my_label: {
+$crate::asm!(
+include!(concat!(env!("SRCTREE"), 
"/rust/kernel/arch_static_branch_asm.rs"));
+l_yes = label {
+break 'my_label true;
+},
+symb = sym $key,
+off = const ::core::mem::offset_of!($keytyp, $field),
+branch = const $crate::jump_label::bool_to_int($branch),
+);
+
+break 'my_label false;
+}};
+}
+
+#[macro_export]
+#[doc(hidden)]
+#[cfg(CONFIG_JUMP_LABEL)]
+#[cfg(CONFIG_HAVE_JUMP_LABEL_HACK)]
+macro_rules! arch_static_branch {
+($key:path, $keytyp:ty, $field:ident, $branch:expr) => {'my_label: {
+$crate::asm!(
+include!(concat!(env!("SRCTREE"), 
"/rust/kernel/arch_static_branch_asm.rs"));
+l_yes = label {
+break 'my_label true;
+}

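The macro above relies on Rust's labeled block expressions (stable since Rust
1.65). A plain-Rust sketch of that control flow, with the asm jump replaced by
an ordinary runtime condition purely for illustration, is:

```rust
// Sketch of the `'my_label: { ... break 'my_label true; ... }` pattern used
// by arch_static_branch. The jump to `l_yes` emitted by the real inline asm
// is modeled here by a plain condition; this is not the kernel macro itself.
fn static_branch_stub(taken: bool) -> bool {
    'my_label: {
        if taken {
            // Corresponds to the `l_yes` jump target in the real macro.
            break 'my_label true;
        }
        // Fallthrough path: the branch was not patched in.
        break 'my_label false;
    }
}

fn main() {
    assert!(static_branch_stub(true));
    assert!(!static_branch_stub(false));
    println!("ok");
}
```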
Re: [PATCHv8 9/9] man2: Add uretprobe syscall page

2024-08-16 Thread Alejandro Colomar
Hi Steven, Jiri,

On Wed, Aug 07, 2024 at 04:27:34PM GMT, Steven Rostedt wrote:
> Just in case nobody pinged you, the rest of the series is now in Linus's
> tree.

Thanks for the ping!

I have prepared some tweaks to the patch (see below).
Also, I have some doubts.  The prototype shows that it has no arguments
(void), but the text said that arguments, if any, are arch-specific.
Does any arch have arguments?  Should we use a variadic prototype (...)?

Please add the changes proposed below to your patch (tweak anything if
you consider it appropriate), and send it as v10.

Have a lovely day!
Alex


diff --git i/man/man2/uretprobe.2 w/man/man2/uretprobe.2
index cf1c2b0d8..51b566998 100644
--- i/man/man2/uretprobe.2
+++ w/man/man2/uretprobe.2
@@ -7,50 +7,43 @@ .SH NAME
 uretprobe \- execute pending return uprobes
 .SH SYNOPSIS
 .nf
-.B int uretprobe(void)
+.B int uretprobe(void);
 .fi
 .SH DESCRIPTION
-The
 .BR uretprobe ()
-system call is an alternative to breakpoint instructions for triggering return
-uprobe consumers.
+is an alternative to breakpoint instructions
+for triggering return uprobe consumers.
 .P
 Calls to
 .BR uretprobe ()
-system call are only made from the user-space trampoline provided by the 
kernel.
+are only made from the user-space trampoline provided by the kernel.
 Calls from any other place result in a
 .BR SIGILL .
-.SH RETURN VALUE
-The
+.P
+Details of the arguments (if any) passed to
 .BR uretprobe ()
-system call return value is architecture-specific.
+are architecture-specific.
+.SH RETURN VALUE
+The return value is architecture-specific.
 .SH ERRORS
 .TP
 .B SIGILL
-The
 .BR uretprobe ()
-system call was called by a user-space program.
+was called by a user-space program.
 .SH VERSIONS
-Details of the
-.BR uretprobe ()
-system call behavior vary across systems.
+The behavior varies across systems.
 .SH STANDARDS
 None.
 .SH HISTORY
-TBD
-.SH NOTES
-The
+Linux 6.11.
+.P
 .BR uretprobe ()
-system call was initially introduced for the x86_64 architecture
+was initially introduced for the x86_64 architecture
 where it was shown to be faster than breakpoint traps.
 It might be extended to other architectures.
-.P
-The
+.SH CAVEATS
 .BR uretprobe ()
-system call exists only to allow the invocation of return uprobe consumers.
+exists only to allow the invocation of return uprobe consumers.
 It should
 .B never
 be called directly.
-Details of the arguments (if any) passed to
-.BR uretprobe ()
-and the return value are architecture-specific.

-- 



signature.asc
Description: PGP signature


[GIT PULL] DAX for 6.11

2024-08-16 Thread Ira Weiny
Hi Linux, please pull from 

  https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/ 
tags/libnvdimm-fixes-6.11-rc4

To get a fix for filesystem DAX.

It has been in -next since August 12th without any reported issues.

Thanks,
Ira Weiny

---

The following changes since commit afdab700f65e14070d8ab92175544b1c62b8bf03:

  Merge tag 'bitmap-6.11-rc' of https://github.com/norov/linux (2024-08-09 
11:18:09 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/ 
tags/libnvdimm-fixes-6.11-rc4

for you to fetch changes up to d5240fa65db071909e9d1d5adcc5fd1abc8e96fe:

  nvdimm/pmem: Set dax flag for all 'PFN_MAP' cases (2024-08-09 14:29:58 -0500)


libnvdimm fixes for v6.11-rc4

Commit f467fee48da4 ("block: move the dax flag to queue_limits") broke
the DAX tests by skipping over the legacy pmem mapping pages case.

- Set the DAX flag in this case as well.


Zhihao Cheng (1):
  nvdimm/pmem: Set dax flag for all 'PFN_MAP' cases

 drivers/nvdimm/pmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)




Re: [PATCH v11 3/9] remoteproc: k3-m4: Add a remoteproc driver for M4F subsystem

2024-08-16 Thread Mathieu Poirier
On Thu, Aug 15, 2024 at 10:46:41AM -0600, Mathieu Poirier wrote:
> Hi,
> 
> On Fri, Aug 02, 2024 at 10:21:03AM -0500, Andrew Davis wrote:
> > From: Martyn Welch 
> > 
> > The AM62x and AM64x SoCs of the TI K3 family have a Cortex M4F core in
> > the MCU domain. This core is typically used for safety applications in a
> > stand-alone mode. However, some applications (non-safety-related) may
> > want to use the M4F core as a generic remote processor with IPC to the
> > host processor. The M4F core has internal IRAM and DRAM memories that are
> > exposed to the system bus for code and data loading.
> > 
> > A remote processor driver is added to support this subsystem, including
> > being able to load and boot the M4F core. Loading includes to M4F
> > internal memories and predefined external code/data memories. The
> > carve outs for external contiguous memory is defined in the M4F device
> > node and should match with the external memory declarations in the M4F
> > image binary. The M4F subsystem has two resets. One reset is for the
> > entire subsystem i.e including the internal memories and the other, a
> > local reset is only for the M4F processing core. When loading the image,
> > the driver first releases the subsystem reset, loads the firmware image
> > and then releases the local reset to let the M4F processing core run.
> > 
> > Signed-off-by: Martyn Welch 
> > Signed-off-by: Hari Nagalla 
> > Signed-off-by: Andrew Davis 
> > ---
> >  drivers/remoteproc/Kconfig   |  13 +
> >  drivers/remoteproc/Makefile  |   1 +
> >  drivers/remoteproc/ti_k3_m4_remoteproc.c | 667 +++
> >  3 files changed, 681 insertions(+)
> >  create mode 100644 drivers/remoteproc/ti_k3_m4_remoteproc.c
> > 
> > diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
> > index dda2ada215b7c..0f0862e20a932 100644
> > --- a/drivers/remoteproc/Kconfig
> > +++ b/drivers/remoteproc/Kconfig
> > @@ -340,6 +340,19 @@ config TI_K3_DSP_REMOTEPROC
> >   It's safe to say N here if you're not interested in utilizing
> >   the DSP slave processors.
> >  
> > +config TI_K3_M4_REMOTEPROC
> > +   tristate "TI K3 M4 remoteproc support"
> > +   depends on ARCH_K3 || COMPILE_TEST
> > +   select MAILBOX
> > +   select OMAP2PLUS_MBOX
> > +   help
> > + Say m here to support TI's M4 remote processor subsystems
> > + on various TI K3 family of SoCs through the remote processor
> > + framework.
> > +
> > + It's safe to say N here if you're not interested in utilizing
> > + a remote processor.
> > +
> >  config TI_K3_R5_REMOTEPROC
> > tristate "TI K3 R5 remoteproc support"
> > depends on ARCH_K3
> > diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
> > index 91314a9b43cef..5ff4e2fee4abd 100644
> > --- a/drivers/remoteproc/Makefile
> > +++ b/drivers/remoteproc/Makefile
> > @@ -37,5 +37,6 @@ obj-$(CONFIG_ST_REMOTEPROC)   += 
> > st_remoteproc.o
> >  obj-$(CONFIG_ST_SLIM_REMOTEPROC)   += st_slim_rproc.o
> >  obj-$(CONFIG_STM32_RPROC)  += stm32_rproc.o
> >  obj-$(CONFIG_TI_K3_DSP_REMOTEPROC) += ti_k3_dsp_remoteproc.o
> > +obj-$(CONFIG_TI_K3_M4_REMOTEPROC)  += ti_k3_m4_remoteproc.o
> >  obj-$(CONFIG_TI_K3_R5_REMOTEPROC)  += ti_k3_r5_remoteproc.o
> >  obj-$(CONFIG_XLNX_R5_REMOTEPROC)   += xlnx_r5_remoteproc.o
> > diff --git a/drivers/remoteproc/ti_k3_m4_remoteproc.c 
> > b/drivers/remoteproc/ti_k3_m4_remoteproc.c
> > new file mode 100644
> > index 0..09f0484a90e10
> > --- /dev/null
> > +++ b/drivers/remoteproc/ti_k3_m4_remoteproc.c
> > @@ -0,0 +1,667 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * TI K3 Cortex-M4 Remote Processor(s) driver
> > + *
> > + * Copyright (C) 2021-2024 Texas Instruments Incorporated - 
> > https://www.ti.com/
> > + * Hari Nagalla 
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "omap_remoteproc.h"
> > +#include "remoteproc_internal.h"
> > +#include "ti_sci_proc.h"
> > +
> > +#define K3_M4_IRAM_DEV_ADDR 0x0
> > +#define K3_M4_DRAM_DEV_ADDR 0x3
> > +
> > +/**
> > + * struct k3_m4_rproc_mem - internal memory structure
> > + * @cpu_addr: MPU virtual address of the memory region
> > + * @bus_addr: Bus address used to access the memory region
> > + * @dev_addr: Device address of the memory region from remote processor 
> > view
> > + * @size: Size of the memory region
> > + */
> > +struct k3_m4_rproc_mem {
> > +   void __iomem *cpu_addr;
> > +   phys_addr_t bus_addr;
> > +   u32 dev_addr;
> > +   size_t size;
> > +};
> > +
> > +/**
> > + * struct k3_m4_rproc_mem_data - memory definitions for a remote processor
> > + * @name: name for this memory entry
> > + * @dev_addr: device address for the memory entry
> > + */
> > +struct k3_m4_rproc_mem_data {
> > +   const char *name;
> > +   const u32 dev_addr;
> > +};
> > +
> > +/**
> > + * struct k3_m4_rproc - k3 remo

Re: [PATCH v4 2/3] remoteproc: k3-r5: Acquire mailbox handle during probe routine

2024-08-16 Thread Mathieu Poirier
On Fri, Aug 16, 2024 at 01:23:59PM +0530, Beleswar Prasad Padhi wrote:
> Hi Mathieu,
> 
> On 14-08-2024 21:22, Mathieu Poirier wrote:
> > Hi Beleswar,
> > 
> > On Thu, Aug 08, 2024 at 01:11:26PM +0530, Beleswar Padhi wrote:
> > > Acquire the mailbox handle during device probe and do not release handle
> > > in stop/detach routine or error paths. This removes the redundant
> > > requests for mbox handle later during rproc start/attach. This also
> > > allows to defer remoteproc driver's probe if mailbox is not probed yet.
> > > > Signed-off-by: Beleswar Padhi 
> > > ---
> > >  drivers/remoteproc/ti_k3_r5_remoteproc.c | 78 +---
> > >  1 file changed, 30 insertions(+), 48 deletions(-)
> > > > diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c
> > b/drivers/remoteproc/ti_k3_r5_remoteproc.c
> > > index 57067308b3c0..8a63a9360c0f 100644
> > > --- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
> > > +++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
> > > @@ -194,6 +194,10 @@ static void k3_r5_rproc_mbox_callback(struct 
> > > mbox_client *client, void *data)
> > >   const char *name = kproc->rproc->name;
> > >   u32 msg = omap_mbox_message(data);
> > >  > +  /* Do not forward message from a detached core */
> > > + if (kproc->rproc->state == RPROC_DETACHED)
> > > + return;
> > > +
> > >   dev_dbg(dev, "mbox msg: 0x%x\n", msg);
> > >  >switch (msg) {
> > > @@ -229,6 +233,10 @@ static void k3_r5_rproc_kick(struct rproc *rproc, 
> > > int vqid)
> > >   mbox_msg_t msg = (mbox_msg_t)vqid;
> > >   int ret;
> > >  > +  /* Do not forward message to a detached core */
> > > + if (kproc->rproc->state == RPROC_DETACHED)
> > > + return;
> > > +
> > >   /* send the index of the triggered virtqueue in the mailbox payload */
> > >   ret = mbox_send_message(kproc->mbox, (void *)msg);
> > >   if (ret < 0)
> > > @@ -399,12 +407,9 @@ static int k3_r5_rproc_request_mbox(struct rproc 
> > > *rproc)
> > >   client->knows_txdone = false;
> > >  >kproc->mbox = mbox_request_channel(client, 0);
> > > - if (IS_ERR(kproc->mbox)) {
> > > - ret = -EBUSY;
> > > - dev_err(dev, "mbox_request_channel failed: %ld\n",
> > > - PTR_ERR(kproc->mbox));
> > > - return ret;
> > > - }
> > > + if (IS_ERR(kproc->mbox))
> > > + return dev_err_probe(dev, PTR_ERR(kproc->mbox),
> > > +  "mbox_request_channel failed\n");
> > >  >/*
> > >* Ping the remote processor, this is only for sanity-sake for now;
> > > @@ -552,10 +557,6 @@ static int k3_r5_rproc_start(struct rproc *rproc)
> > >   u32 boot_addr;
> > >   int ret;
> > >  > -  ret = k3_r5_rproc_request_mbox(rproc);
> > > - if (ret)
> > > - return ret;
> > > -
> > >   boot_addr = rproc->bootaddr;
> > >   /* TODO: add boot_addr sanity checking */
> > >   dev_dbg(dev, "booting R5F core using boot addr = 0x%x\n", boot_addr);
> > > @@ -564,7 +565,7 @@ static int k3_r5_rproc_start(struct rproc *rproc)
> > >   core = kproc->core;
> > >   ret = ti_sci_proc_set_config(core->tsp, boot_addr, 0, 0);
> > >   if (ret)
> > > - goto put_mbox;
> > > + return ret;
> > >  >/* unhalt/run all applicable cores */
> > >   if (cluster->mode == CLUSTER_MODE_LOCKSTEP) {
> > > @@ -580,13 +581,12 @@ static int k3_r5_rproc_start(struct rproc *rproc)
> > >   if (core != core0 && core0->rproc->state == RPROC_OFFLINE) {
> > >   dev_err(dev, "%s: can not start core 1 before core 0\n",
> > >   __func__);
> > > - ret = -EPERM;
> > > - goto put_mbox;
> > > + return -EPERM;
> > >   }
> > >  >ret = k3_r5_core_run(core);
> > >   if (ret)
> > > - goto put_mbox;
> > > + return ret;
> > >   }
> > >  >return 0;
> > > @@ -596,8 +596,6 @@ static int k3_r5_rproc_start(struct rproc *rproc)
> > >   if (k3_r5_core_halt(core))
> > >   dev_warn(core->dev, "core halt back failed\n");
> > >   }
> > > -put_mbox:
> > > - mbox_free_channel(kproc->mbox);
> > >   return ret;
> > >  }
> > >  > @@ -658,8 +656,6 @@ static int k3_r5_rproc_stop(struct rproc
> > *rproc)
> > >   goto out;
> > >   }
> > >  > -  mbox_free_channel(kproc->mbox);
> > > -
> > >   return 0;
> > >  >  unroll_core_halt:
> > > @@ -674,42 +670,22 @@ static int k3_r5_rproc_stop(struct rproc *rproc)
> > >  /*
> > >   * Attach to a running R5F remote processor (IPC-only m

Re: [PATCH v2 16/19] gendwarfksyms: Add support for reserved structure fields

2024-08-16 Thread Sami Tolvanen
Hi Greg,

On Fri, Aug 16, 2024 at 12:20 AM Greg Kroah-Hartman
 wrote:
>
> On Thu, Aug 15, 2024 at 05:39:20PM +, Sami Tolvanen wrote:
> > Distributions that want to maintain a stable kABI need the ability to
> > add reserved fields to kernel data structures that they anticipate
> > will be modified during the ABI support timeframe, either by LTS
> > updates or backports.
> >
> > With genksyms, developers would typically hide changes to the reserved
> > fields from version calculation with #ifndef __GENKSYMS__, which would
> > result in the symbol version not changing even though the actual type
> > of the reserved field changes. When we process precompiled object
> > files, this is again not an option.
> >
> > To support stable symbol versions for reserved fields, change the
> > union type processing to recognize field name prefixes, and if the
> > union contains a field name that starts with __kabi_reserved, only use
> > the type of that field for computing symbol versions. In other words,
> > let's assume we have a structure where we want to reserve space for
> > future changes:
> >
> >   struct struct1 {
> > long a;
> > long __kabi_reserved_0; /* reserved for future use */
> >   };
> >   struct struct1 exported;
> >
> > gendwarfksyms --debug produces the following output:
> >
> >   variable structure_type struct1 {
> > member base_type long int byte_size(8) encoding(5) 
> > data_member_location(0),
> > member base_type long int byte_size(8) encoding(5) 
> > data_member_location(8),
> >   } byte_size(16);
> >   #SYMVER exported 0x67997f89
> >
> > To take the reserved field into use, a distribution would replace it
> > with a union, with one of the fields keeping the __kabi_reserved name
> > prefix for the original type:
> >
> >   struct struct1 {
> > long a;
> > union {
> >   long __kabi_reserved_0;
> >   struct {
> >   int b;
> >   int v;
> >   };
> > };
> >
>
> Ah, ignore my previous email, here's the --stable stuff.
>
> But this all needs to go into some documentation somewhere, trying to
> dig it out of a changelog is going to be impossible to point people at.

I agree, which is why I included the details in the comments too.
There's also an example file if you scroll down a bit further, but I
can certainly add some actual documentation too. Since the --stable
bits are not really needed in the mainline kernel, do you prefer a
file in Documentation/ or is it sufficient to expand the example files
to include any missing details?

> > +/* See dwarf.c:process_reserved */
> > +#define RESERVED_PREFIX "__kabi_reserved"
>
> Seems semi-sane, I can live with this.

Is there something you'd change to make this more than semi-sane?

> I don't know if you want to take the next step and provide examples of
> how to use this in "easy to use macros" for it all, but if so, that
> might be nice.

This should already work with the macros Android uses, for example,
with minor changes. The current example file doesn't include macro
wrappers, but I can add them in the next version.

> Especially as I have no idea how you are going to do
> this with the rust side of things, this all will work for any structures
> defined in .rs code, right?

Yes, Rust structures can use the same scheme. Accessing union members
might be less convenient than in C, but can presumably be wrapped in
helper macros if needed.

Sami



Re: [POC 3/7] livepatch: Use per-state callbacks in state API tests

2024-08-16 Thread Petr Mladek
On Thu 2024-07-25 13:48:06, Miroslav Benes wrote:
> Hi,
> 
> On Fri, 10 Nov 2023, Petr Mladek wrote:
> 
> > Recent changes in the livepatch core have allowed to connect states,
> > shadow variables, and callbacks. Use these new features in
> > the state tests.
> > 
> > Use the shadow variable API to store the original loglevel. It is
> > better suited for this purpose than directly accessing the .data
> > pointer in struct klp_state.
> > 
> > Another big advantage is that the shadow variable is preserved
> > when the current patch is replaced by a new version. As a result,
> > there is no need to copy the pointer.
> > 
> > Finally, the lifetime of the shadow variable is connected with
> > the lifetime of the state. It is freed automatically when
> > it is no longer supported.
> > 
> > This results in the following changes in the code:
> > 
> >   + Rename CONSOLE_LOGLEVEL_STATE -> CONSOLE_LOGLEVEL_FIX_ID
> > because it will also be used for the shadow variable
> > 
> >   + Remove the extra code for module coming and going states
> > because the new callbacks are per-state.
> > 
> >   + Remove callbacks needed to transfer the pointer between
> > states.
> > 
> >   + Keep the versioning of the state to prevent downgrade.
> > The problem is artificial because no callbacks are
> > needed to transfer or free the shadow variable anymore.
> > 
> > Signed-off-by: Petr Mladek 
> 
> it is much cleaner now.
> 
> [...]
> 
> >  static int allocate_loglevel_state(void)
> >  {
> > -   struct klp_state *loglevel_state;
> > +   int *shadow_console_loglevel;
> >  
> > -   loglevel_state = klp_get_state(&patch, CONSOLE_LOGLEVEL_STATE);
> > -   if (!loglevel_state)
> > -   return -EINVAL;
> > +   /* Make sure that the shadow variable does not exist yet. */
> > +   shadow_console_loglevel =
> > +   klp_shadow_alloc(&console_loglevel, CONSOLE_LOGLEVEL_FIX_ID,
> > +sizeof(*shadow_console_loglevel), GFP_KERNEL,
> > +NULL, NULL);
> >  
> > -   loglevel_state->data = kzalloc(sizeof(console_loglevel), GFP_KERNEL);
> > -   if (!loglevel_state->data)
> > +   if (!shadow_console_loglevel) {
> > +   pr_err("%s: failed to allocated shadow variable for storing 
> > original loglevel\n",
> > +  __func__);
> > return -ENOMEM;
> > +   }
> >  
> > pr_info("%s: allocating space to store console_loglevel\n",
> > __func__);
> > +
> > return 0;
> >  }
> 
> Would it make sense to set is_shadow to 1 here? I mean you would pass
> klp_state down to allocate_loglevel_state() from setup callback and set
> its is_shadow member here. Because then...

Right.

> >  static void free_loglevel_state(void)
> >  {
> > -   struct klp_state *loglevel_state;
> > +   int *shadow_console_loglevel;
> >  
> > -   loglevel_state = klp_get_state(&patch, CONSOLE_LOGLEVEL_STATE);
> > -   if (!loglevel_state)
> > +   shadow_console_loglevel =
> > +   (int *)klp_shadow_get(&console_loglevel, 
> > CONSOLE_LOGLEVEL_FIX_ID);
> > +   if (!shadow_console_loglevel)
> > return;
> >  
> > pr_info("%s: freeing space for the stored console_loglevel\n",
> > __func__);
> > -   kfree(loglevel_state->data);
> > +   klp_shadow_free(&console_loglevel, CONSOLE_LOGLEVEL_FIX_ID, NULL);
> >  }
> 
> would not be needed. And release callback neither.
> 
> Or am I wrong?

No, you are perfectly right.

> We can even have both ways implemented to demonstrate different 
> approaches...

I have implemented only your approach ;-)

That said, I am going to keep the callback so that the selftest could
check that it is called at the right time. But the callback will
only print the message. And a comment would explain that it is not
really needed.

Also I am going to add a .state_dtor callback so that we could test
the shadow variable is freed. The callback will only print a message.
It is a simple shadow variable and the memory is freed automatically
together with the struct klp_shadow.

Best Regards,
Petr



Re: [PATCHv8 9/9] man2: Add uretprobe syscall page

2024-08-16 Thread Jiri Olsa
On Fri, Aug 16, 2024 at 01:42:26PM +0200, Alejandro Colomar wrote:
> Hi Steven, Jiri,
> 
> On Wed, Aug 07, 2024 at 04:27:34PM GMT, Steven Rostedt wrote:
> > Just in case nobody pinged you, the rest of the series is now in Linus's
> > tree.
> 
> Thanks for the ping!
> 
> I have prepared some tweaks to the patch (see below).
> Also, I have some doubts.  The prototype shows that it has no arguments
> (void), but the text said that arguments, if any, are arch-specific.
> Does any arch have arguments?  Should we use a variadic prototype (...)?

hi,
there are no args for x86.. it's there just to note that it might
be different on other archs, so not sure what man page should say
in such case.. keeping (void) is fine with me

> 
> Please add the changes proposed below to your patch (tweak anything if
> you consider it appropriate), and send it as v10.

it looks good to me, thanks a lot

Acked-by: Jiri Olsa 

jirka

> 
> Have a lovely day!
> Alex
> 
> 
> diff --git i/man/man2/uretprobe.2 w/man/man2/uretprobe.2
> index cf1c2b0d8..51b566998 100644
> --- i/man/man2/uretprobe.2
> +++ w/man/man2/uretprobe.2
> @@ -7,50 +7,43 @@ .SH NAME
>  uretprobe \- execute pending return uprobes
>  .SH SYNOPSIS
>  .nf
> -.B int uretprobe(void)
> +.B int uretprobe(void);
>  .fi
>  .SH DESCRIPTION
> -The
>  .BR uretprobe ()
> -system call is an alternative to breakpoint instructions for triggering 
> return
> -uprobe consumers.
> +is an alternative to breakpoint instructions
> +for triggering return uprobe consumers.
>  .P
>  Calls to
>  .BR uretprobe ()
> -system call are only made from the user-space trampoline provided by the 
> kernel.
> +are only made from the user-space trampoline provided by the kernel.
>  Calls from any other place result in a
>  .BR SIGILL .
> -.SH RETURN VALUE
> -The
> +.P
> +Details of the arguments (if any) passed to
>  .BR uretprobe ()
> -system call return value is architecture-specific.
> +are architecture-specific.
> +.SH RETURN VALUE
> +The return value is architecture-specific.
>  .SH ERRORS
>  .TP
>  .B SIGILL
> -The
>  .BR uretprobe ()
> -system call was called by a user-space program.
> +was called by a user-space program.
>  .SH VERSIONS
> -Details of the
> -.BR uretprobe ()
> -system call behavior vary across systems.
> +The behavior varies across systems.
>  .SH STANDARDS
>  None.
>  .SH HISTORY
> -TBD
> -.SH NOTES
> -The
> +Linux 6.11.
> +.P
>  .BR uretprobe ()
> -system call was initially introduced for the x86_64 architecture
> +was initially introduced for the x86_64 architecture
>  where it was shown to be faster than breakpoint traps.
>  It might be extended to other architectures.
> -.P
> -The
> +.SH CAVEATS
>  .BR uretprobe ()
> -system call exists only to allow the invocation of return uprobe consumers.
> +exists only to allow the invocation of return uprobe consumers.
>  It should
>  .B never
>  be called directly.
> -Details of the arguments (if any) passed to
> -.BR uretprobe ()
> -and the return value are architecture-specific.
> 
> -- 
> 




Re: [GIT PULL] DAX for 6.11

2024-08-16 Thread Ira Weiny
Ira Weiny wrote:
> Hi Linux, please pull from 
 ^
 Linus.

Apologies,
Ira

> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/ 
> tags/libnvdimm-fixes-6.11-rc4
> 
> To get a fix for filesystem DAX.
> 
> It has been in -next since August 12th without any reported issues.
> 
> Thanks,
> Ira Weiny
> 
> ---
> 
> The following changes since commit afdab700f65e14070d8ab92175544b1c62b8bf03:
> 
>   Merge tag 'bitmap-6.11-rc' of https://github.com/norov/linux (2024-08-09 
> 11:18:09 -0700)
> 
> are available in the Git repository at:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/ 
> tags/libnvdimm-fixes-6.11-rc4
> 
> for you to fetch changes up to d5240fa65db071909e9d1d5adcc5fd1abc8e96fe:
> 
>   nvdimm/pmem: Set dax flag for all 'PFN_MAP' cases (2024-08-09 14:29:58 
> -0500)
> 
> 
> libnvdimm fixes for v6.11-rc4
> 
> Commit f467fee48da4 ("block: move the dax flag to queue_limits") broke
> the DAX tests by skipping over the legacy pmem mapping pages case.
> 
>   - Set the DAX flag in this case as well.
> 
> 
> Zhihao Cheng (1):
>   nvdimm/pmem: Set dax flag for all 'PFN_MAP' cases
> 
>  drivers/nvdimm/pmem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 





Re: [syzbot] [kvm?] [net?] [virt?] INFO: task hung in __vhost_worker_flush

2024-08-16 Thread Sean Christopherson
On Wed, May 29, 2024, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:9b62e02e6336 Merge tag 'mm-hotfixes-stable-2024-05-25-09-1..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=16cb0eec98
> kernel config:  https://syzkaller.appspot.com/x/.config?x=3e73beba72b96506
> dashboard link: https://syzkaller.appspot.com/bug?extid=7f3bbe59e8dd2328a990
> compiler:   gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for 
> Debian) 2.40
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> Downloadable assets:
> disk image: 
> https://storage.googleapis.com/syzbot-assets/61b507f6e56c/disk-9b62e02e.raw.xz
> vmlinux: 
> https://storage.googleapis.com/syzbot-assets/6991f1313243/vmlinux-9b62e02e.xz
> kernel image: 
> https://storage.googleapis.com/syzbot-assets/65f88b96d046/bzImage-9b62e02e.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+7f3bbe59e8dd2328a...@syzkaller.appspotmail.com

#syz unset kvm



Re: [syzbot] [kvm?] [net?] [virt?] INFO: task hung in __vhost_worker_flush

2024-08-16 Thread syzbot
> On Wed, May 29, 2024, syzbot wrote:
>> Hello,
>> 
>> syzbot found the following issue on:
>> 
>> HEAD commit:9b62e02e6336 Merge tag 'mm-hotfixes-stable-2024-05-25-09-1..
>> git tree:   upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=16cb0eec98
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=3e73beba72b96506
>> dashboard link: https://syzkaller.appspot.com/bug?extid=7f3bbe59e8dd2328a990
>> compiler:   gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for 
>> Debian) 2.40
>> 
>> Unfortunately, I don't have any reproducer for this issue yet.
>> 
>> Downloadable assets:
>> disk image: 
>> https://storage.googleapis.com/syzbot-assets/61b507f6e56c/disk-9b62e02e.raw.xz
>> vmlinux: 
>> https://storage.googleapis.com/syzbot-assets/6991f1313243/vmlinux-9b62e02e.xz
>> kernel image: 
>> https://storage.googleapis.com/syzbot-assets/65f88b96d046/bzImage-9b62e02e.xz
>> 
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+7f3bbe59e8dd2328a...@syzkaller.appspotmail.com
>
> #syz unset kvm

The following labels did not exist: kvm




Re: [syzbot] [kvm?] [net?] [virt?] INFO: task hung in __vhost_worker_flush

2024-08-16 Thread Sean Christopherson
On Fri, Aug 16, 2024, syzbot wrote:
> > On Wed, May 29, 2024, syzbot wrote:
> >> Hello,
> >> 
> >> syzbot found the following issue on:
> >> 
> >> HEAD commit:9b62e02e6336 Merge tag 
> >> 'mm-hotfixes-stable-2024-05-25-09-1..
> >> git tree:   upstream
> >> console output: https://syzkaller.appspot.com/x/log.txt?x=16cb0eec98
> >> kernel config:  https://syzkaller.appspot.com/x/.config?x=3e73beba72b96506
> >> dashboard link: 
> >> https://syzkaller.appspot.com/bug?extid=7f3bbe59e8dd2328a990
> >> compiler:   gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for 
> >> Debian) 2.40
> >> 
> >> Unfortunately, I don't have any reproducer for this issue yet.
> >> 
> >> Downloadable assets:
> >> disk image: 
> >> https://storage.googleapis.com/syzbot-assets/61b507f6e56c/disk-9b62e02e.raw.xz
> >> vmlinux: 
> >> https://storage.googleapis.com/syzbot-assets/6991f1313243/vmlinux-9b62e02e.xz
> >> kernel image: 
> >> https://storage.googleapis.com/syzbot-assets/65f88b96d046/bzImage-9b62e02e.xz
> >> 
> >> IMPORTANT: if you fix the issue, please add the following tag to the 
> >> commit:
> >> Reported-by: syzbot+7f3bbe59e8dd2328a...@syzkaller.appspotmail.com
> >
> > #syz unset kvm
> 
> The following labels did not exist: kvm

Hrm, looks like there's no unset for a single subsystem, so:

#syz set subsystems: net,virt



Re: [syzbot] [kvm?] [net?] [virt?] INFO: task hung in __vhost_worker_flush

2024-08-16 Thread Michael S. Tsirkin
On Fri, Aug 16, 2024 at 11:10:32AM -0700, Sean Christopherson wrote:
> On Fri, Aug 16, 2024, syzbot wrote:
> > > On Wed, May 29, 2024, syzbot wrote:
> > >> Hello,
> > >> 
> > >> syzbot found the following issue on:
> > >> 
> > >> HEAD commit:9b62e02e6336 Merge tag 
> > >> 'mm-hotfixes-stable-2024-05-25-09-1..
> > >> git tree:   upstream
> > >> console output: https://syzkaller.appspot.com/x/log.txt?x=16cb0eec98
> > >> kernel config:  
> > >> https://syzkaller.appspot.com/x/.config?x=3e73beba72b96506
> > >> dashboard link: 
> > >> https://syzkaller.appspot.com/bug?extid=7f3bbe59e8dd2328a990
> > >> compiler:   gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for 
> > >> Debian) 2.40
> > >> 
> > >> Unfortunately, I don't have any reproducer for this issue yet.
> > >> 
> > >> Downloadable assets:
> > >> disk image: 
> > >> https://storage.googleapis.com/syzbot-assets/61b507f6e56c/disk-9b62e02e.raw.xz
> > >> vmlinux: 
> > >> https://storage.googleapis.com/syzbot-assets/6991f1313243/vmlinux-9b62e02e.xz
> > >> kernel image: 
> > >> https://storage.googleapis.com/syzbot-assets/65f88b96d046/bzImage-9b62e02e.xz
> > >> 
> > >> IMPORTANT: if you fix the issue, please add the following tag to the 
> > >> commit:
> > >> Reported-by: syzbot+7f3bbe59e8dd2328a...@syzkaller.appspotmail.com
> > >
> > > #syz unset kvm
> > 
> > The following labels did not exist: kvm
> 
> Hrm, looks like there's no unset for a single subsystem, so:
> 
> #syz set subsystems: net,virt

Must be this patchset:

https://lore.kernel.org/all/20240316004707.45557-1-michael.chris...@oracle.com/

but I don't see anything obvious there to trigger it, and it's not
reproducible yet...

-- 
MST




Re: [GIT PULL] DAX for 6.11

2024-08-16 Thread Linus Torvalds
On Fri, 16 Aug 2024 at 10:15, Ira Weiny  wrote:
>
> Ira Weiny wrote:
> > Hi Linux, please pull from
>  ^
>  Linus.
>
> Apologies,

Heh. I've been called worse. And it's a fairly common typo with people
whose fingers are used to type "Linux" and do so on auto-pilot.

I have an evil twin called Kubys, which is what happens when I type my
own name and my right hand is off by one key.

So if I occasionally mis-type my own name, I can hardly complain when
others do it..

  Linus



Re: [GIT PULL] DAX for 6.11

2024-08-16 Thread pr-tracker-bot
The pull request you sent on Fri, 16 Aug 2024 08:44:39 -0500:

> https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/ 
> tags/libnvdimm-fixes-6.11-rc4

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/e4a55b555db6d2a006551605ef4404529e878cd2

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html



Re: [PATCHv8 9/9] man2: Add uretprobe syscall page

2024-08-16 Thread Alejandro Colomar
On Fri, Aug 16, 2024 at 07:03:59PM GMT, Jiri Olsa wrote:
> On Fri, Aug 16, 2024 at 01:42:26PM +0200, Alejandro Colomar wrote:
> > Hi Steven, Jiri,
> > 
> > On Wed, Aug 07, 2024 at 04:27:34PM GMT, Steven Rostedt wrote:
> > > Just in case nobody pinged you, the rest of the series is now in Linus's
> > > tree.
> > 
> > Thanks for the ping!
> > 
> > I have prepared some tweaks to the patch (see below).
> > Also, I have some doubts.  The prototype shows that it has no arguments
> > (void), but the text said that arguments, if any, are arch-specific.
> > Does any arch have arguments?  Should we use a variadic prototype (...)?
> 
> hi,
> there are no args for x86.. it's there just to note that it might
> be different on other archs, so not sure what man page should say
> in such case.. keeping (void) is fine with me

Hmmm, then I'll remove that paragraph.  If that function is implemented
in another arch and the args are different, we can change the manual
page then.

> 
> > 
> > Please add the changes proposed below to your patch (tweak anything if
> > you consider it appropriate) and send it as v10.
> 
> it looks good to me, thanks a lot
> 
> Acked-by: Jiri Olsa 

Thanks!

Have a lovely day!
Alex

> 
> jirka
> 
> > 
> > Have a lovely day!
> > Alex
> > 
> > 
> > diff --git i/man/man2/uretprobe.2 w/man/man2/uretprobe.2
> > index cf1c2b0d8..51b566998 100644
> > --- i/man/man2/uretprobe.2
> > +++ w/man/man2/uretprobe.2
> > @@ -7,50 +7,43 @@ .SH NAME
> >  uretprobe \- execute pending return uprobes
> >  .SH SYNOPSIS
> >  .nf
> > -.B int uretprobe(void)
> > +.B int uretprobe(void);
> >  .fi
> >  .SH DESCRIPTION
> > -The
> >  .BR uretprobe ()
> > -system call is an alternative to breakpoint instructions for triggering 
> > return
> > -uprobe consumers.
> > +is an alternative to breakpoint instructions
> > +for triggering return uprobe consumers.
> >  .P
> >  Calls to
> >  .BR uretprobe ()
> > -system call are only made from the user-space trampoline provided by the 
> > kernel.
> > +are only made from the user-space trampoline provided by the kernel.
> >  Calls from any other place result in a
> >  .BR SIGILL .
> > -.SH RETURN VALUE
> > -The
> > +.P
> > +Details of the arguments (if any) passed to
> >  .BR uretprobe ()
> > -system call return value is architecture-specific.
> > +are architecture-specific.
> > +.SH RETURN VALUE
> > +The return value is architecture-specific.
> >  .SH ERRORS
> >  .TP
> >  .B SIGILL
> > -The
> >  .BR uretprobe ()
> > -system call was called by a user-space program.
> > +was called by a user-space program.
> >  .SH VERSIONS
> > -Details of the
> > -.BR uretprobe ()
> > -system call behavior vary across systems.
> > +The behavior varies across systems.
> >  .SH STANDARDS
> >  None.
> >  .SH HISTORY
> > -TBD
> > -.SH NOTES
> > -The
> > +Linux 6.11.
> > +.P
> >  .BR uretprobe ()
> > -system call was initially introduced for the x86_64 architecture
> > +was initially introduced for the x86_64 architecture
> >  where it was shown to be faster than breakpoint traps.
> >  It might be extended to other architectures.
> > -.P
> > -The
> > +.SH CAVEATS
> >  .BR uretprobe ()
> > -system call exists only to allow the invocation of return uprobe consumers.
> > +exists only to allow the invocation of return uprobe consumers.
> >  It should
> >  .B never
> >  be called directly.
> > -Details of the arguments (if any) passed to
> > -.BR uretprobe ()
> > -and the return value are architecture-specific.
> > 
> > -- 
> > 
> 

-- 





[PATCH v4] filemap: add trace events for get_pages, map_pages, and fault

2024-08-16 Thread Takaya Saeki
To allow precise tracking of page caches accessed, add new tracepoints
that trigger when a process actually accesses them.

The ureadahead program used by ChromeOS traces the disk access of
programs as they start up at boot up. It uses mincore(2) or the
'mm_filemap_add_to_page_cache' trace event to accomplish this. It stores
this information in a "pack" file and on subsequent boots, it will read
the pack file and call readahead(2) on the information so that disk
storage can be loaded into RAM before the applications actually need it.

A problem we see is that the kernel's readahead algorithm can
aggressively pull in more data than needed (to try and accomplish
the same goal), and this data is also recorded. The end result is that
the pack file contains a lot of pages on disk that are never actually
used. Calling readahead(2) on these unused pages can slow down the
system boot up times.

To solve this, add 3 new trace events, get_pages, map_pages, and fault.
These will be used to trace pages that are not only pulled in from disk,
but are actually used by the application. Only those pages will be
stored in the pack file, and this helps out the performance of boot up.

With the combination of these 3 new trace events and
mm_filemap_add_to_page_cache, we observed a reduction in the pack file
by 7.3% - 20% on ChromeOS varying by device.

Signed-off-by: Takaya Saeki 
Reviewed-by: Masami Hiramatsu (Google) 
Reviewed-by: Steven Rostedt (Google) 
---
Changelog between v4 and v3
- Fix mm_filemap_get_pages by replacing last_index with last_index - 1,
  since its bound is exclusive while mm_filemap_map_pages's is inclusive.

Changelog between v3 and v2
- Use a range notation in the printf format 

Changelog between v2 and v1
- Fix a file offset type usage by casting pgoff_t to loff_t
- Fix format string of dev and inode
 include/trace/events/filemap.h | 84 ++
 mm/filemap.c   |  4 ++
 2 files changed, 88 insertions(+)

diff --git a/include/trace/events/filemap.h b/include/trace/events/filemap.h
index 46c89c1e460c..f48fe637bfd2 100644
--- a/include/trace/events/filemap.h
+++ b/include/trace/events/filemap.h
@@ -56,6 +56,90 @@ DEFINE_EVENT(mm_filemap_op_page_cache, 
mm_filemap_add_to_page_cache,
TP_ARGS(folio)
);
 
+DECLARE_EVENT_CLASS(mm_filemap_op_page_cache_range,
+
+   TP_PROTO(
+   struct address_space *mapping,
+   pgoff_t index,
+   pgoff_t last_index
+   ),
+
+   TP_ARGS(mapping, index, last_index),
+
+   TP_STRUCT__entry(
+   __field(unsigned long, i_ino)
+   __field(dev_t, s_dev)
+   __field(unsigned long, index)
+   __field(unsigned long, last_index)
+   ),
+
+   TP_fast_assign(
+   __entry->i_ino = mapping->host->i_ino;
+   if (mapping->host->i_sb)
+   __entry->s_dev =
+   mapping->host->i_sb->s_dev;
+   else
+   __entry->s_dev = mapping->host->i_rdev;
+   __entry->index = index;
+   __entry->last_index = last_index;
+   ),
+
+   TP_printk(
+   "dev=%d:%d ino=%lx ofs=%lld-%lld",
+   MAJOR(__entry->s_dev),
+   MINOR(__entry->s_dev), __entry->i_ino,
+   ((loff_t)__entry->index) << PAGE_SHIFT,
+   ((((loff_t)__entry->last_index + 1) << PAGE_SHIFT) - 1)
+   )
+);
+
+DEFINE_EVENT(mm_filemap_op_page_cache_range, mm_filemap_get_pages,
+   TP_PROTO(
+   struct address_space *mapping,
+   pgoff_t index,
+   pgoff_t last_index
+   ),
+   TP_ARGS(mapping, index, last_index)
+);
+
+DEFINE_EVENT(mm_filemap_op_page_cache_range, mm_filemap_map_pages,
+   TP_PROTO(
+   struct address_space *mapping,
+   pgoff_t index,
+   pgoff_t last_index
+   ),
+   TP_ARGS(mapping, index, last_index)
+);
+
+TRACE_EVENT(mm_filemap_fault,
+   TP_PROTO(struct address_space *mapping, pgoff_t index),
+
+   TP_ARGS(mapping, index),
+
+   TP_STRUCT__entry(
+   __field(unsigned long, i_ino)
+   __field(dev_t, s_dev)
+   __field(unsigned long, index)
+   ),
+
+   TP_fast_assign(
+   __entry->i_ino = mapping->host->i_ino;
+   if (mapping->host->i_sb)
+   __entry->s_dev =
+   mapping->host->i_sb->s_dev;
+   else
+   __entry->s_dev = mapping->host->i_rdev;
+   __entry->index = index;
+   ),
+
+   TP_printk(
+   "dev=%d:%d ino=%lx ofs=%lld",
+   MAJOR(__entry->s_dev),
+   MINOR(__entry->s_dev), __entry->i_ino,
+   ((loff_t)__entry->index) << PAGE_SHIFT
+   )
+);
+
 TRACE_EVENT(filemap_set_wb_err,
TP_PROTO(struct address_space *mapping, errseq_

Re: [PATCHv8 9/9] man2: Add uretprobe syscall page

2024-08-16 Thread Alejandro Colomar
Hi Jiri, Steven,

On Fri, Aug 16, 2024 at 08:55:47PM GMT, Alejandro Colomar wrote:
> > hi,
> > there are no args for x86.. it's there just to note that it might
> > be different on other archs, so not sure what man page should say
> > in such case.. keeping (void) is fine with me
> 
> Hmmm, then I'll remove that paragraph.  If that function is implemented
> in another arch and the args are different, we can change the manual
> page then.
> 
> > 
> > > 
> > > Please add the changes proposed below to your patch (tweak anything if
> > > you consider it appropriate) and send it as v10.
> > 
> > it looks good to me, thanks a lot
> > 
> > Acked-by: Jiri Olsa 

I have applied your patch with the tweaks I mentioned, and added several
tags to the commit message.

It's currently here:


and will $soon be pushed to master.

Have a lovely night!
Alex


-- 





Re: [PATCH v3 14/16] modules: Support extended MODVERSIONS info

2024-08-16 Thread Michael Ellerman
Matthew Maurer  writes:
> Adds a new format for MODVERSIONS which stores each field in a separate
> ELF section. This initially adds support for variable length names, but
> could later be used to add additional fields to MODVERSIONS in a
> backwards compatible way if needed. Any new fields will be ignored by
> old user tooling, unlike the current format where user tooling cannot
> tolerate adjustments to the format (for example making the name field
> longer).
>
> Since PPC munges its version records to strip leading dots, we reproduce
> the munging for the new format.

AFAICS the existing code only strips a single leading dot, not all
leading dots?

cheers