Re: [PATCH] 9p: prevent read overrun in protocol dump tracepoint

2023-12-01 Thread asmadeus
asmad...@codewreck.org wrote on Sat, Dec 02, 2023 at 01:35:18PM +0900:
> > diff --git a/include/trace/events/9p.h b/include/trace/events/9p.h
> > index 4dfa6d7f83ba..8690a7086252 100644
> > --- a/include/trace/events/9p.h
> > +++ b/include/trace/events/9p.h
> > @@ -185,7 +185,8 @@ TRACE_EVENT(9p_protocol_dump,
> > __entry->clnt   =  clnt;
> > __entry->type   =  pdu->id;
> > __entry->tag=  pdu->tag;
> > -   memcpy(__entry->line, pdu->sdata, P9_PROTO_DUMP_SZ);
> > +   memcpy(__entry->line, pdu->sdata,
> > +   min(pdu->capacity, P9_PROTO_DUMP_SZ));

Building with W=1 yields a warning:
./include/linux/minmax.h:21:35: warning: comparison of distinct pointer types 
lacks a cast
...
./include/trace/events/9p.h:189:33: note: in expansion of macro ‘min’
  189 | min(pdu->capacity, P9_PROTO_DUMP_SZ));

I've updated the patch to:
+   min_t(size_t, pdu->capacity, P9_PROTO_DUMP_SZ));

and pushed to my -next branch:
https://github.com/martinetd/linux/commits/9p-next

-- 
Dominique Martinet | Asmadeus



Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2023-12-01 Thread Ard Biesheuvel
On Fri, 1 Dec 2023 at 23:59, Justin Chen  wrote:
>
>
>
> On 12/1/23 10:07 AM, Steven Rostedt wrote:
> > On Fri, 1 Dec 2023 09:25:59 -0800
> > Justin Chen  wrote:
> >
> >>> It appears the sub instruction at 0x6dd0 correctly accounts for the
> >>> extra 8 bytes, so the frame pointer is valid. So our assumption
> >>> that there are no gaps between the stack frames is invalid.
> >>
> >> Thanks for the assistance. The gap between stack frames depends on
> >> the function. Most do not have a gap. Some have 8 (as shown above), some
> >> have 12. A single assumption here is not going to work. I'm having a
> >> hard time finding out the reasoning for this gap. I tried disabling a
> >> bunch of gcc flags as well as -O2 and the gap still exists.
> >
> > That code was originally added because of some strange things that gcc did
> > with mcount (for example, it made a copy of the stack frame that it passed
> > to mcount, where the function graph tracer replaced the copy of the return
> > stack making the shadow stack go out of sync and crash). This was very hard
> > to debug and I added this code to detect it if it happened again.
> >
> > Well it's been over a decade since that happened (2009).
> >
> >71e308a239c09 ("function-graph: add stack frame test")
> >
> > I'm happy assuming that the compiler folks are aware of our tricks with
> > hijacking return calls and I don't expect it to happen again. We can just
> > rip out those checks. That is, if it's only causing false positives, I
> > don't think it's worth keeping around.
> >
> > Has it detected any real issues on the Arm platforms?
> >
> > -- Steve
>
> I am not familiar enough to make a call. But from my limited testing
> with ARM, I didn't see any issues. If you would like me to, I can submit
> a patch to remove the check entirely. Or maybe only disable it for ARM?
>

Please try the fix I proposed first.



Re: [PATCH] 9p: prevent read overrun in protocol dump tracepoint

2023-12-01 Thread asmadeus
JP Kobryn wrote on Fri, Dec 01, 2023 at 07:04:10PM -0800:
> An out of bounds read can occur within the tracepoint 9p_protocol_dump().
> In the fast assign, there is a memcpy that uses a constant size of 32
> (defined by the macro P9_PROTO_DUMP_SZ). When the copy is invoked, the
> source buffer is not guaranteed to match this size. It was found that in
> some cases the source buffer size is less than 32, resulting in a read
> that overruns.
> 
> The size of the source buffer is known at the time the tracepoint is
> invoked. The allocations happen within p9_fcall_init(), where the
> capacity field is set to the allocated size of the payload buffer. This
> patch fixes the overrun by using the minimum of that field (the source
> buffer size) and the size of the destination buffer when performing the
> copy.

Good catch; this is a regression from a semi-recent optimization,
commit 60ece0833b6c ("net/9p: allocate appropriate reduced message
buffers").
For some reason I thought we rounded small message allocations up to 4k,
but I've just confirmed we don't, so these overruns are quite frequent.
I'll add the Fixes tag and cc stable if there are no other comments.

Thanks!

> 
> A repro can be performed in different ways. The simplest might be
> mounting a shared filesystem (between host and guest VM) using the Plan
> 9 protocol while the tracepoint is enabled.
> 
> mount -t 9p -o trans=virtio  
> 
> The bpftrace program below can be used to show the out of bounds read.
> Note that a recent version of bpftrace is needed for the raw tracepoint
> support. The script was tested using v0.19.0.
> 
> /* from include/net/9p/9p.h */
> struct p9_fcall {
> u32 size;
> u8 id;
> u16 tag;
> size_t offset;
> size_t capacity;
> struct kmem_cache *cache;
> u8 *sdata;
> bool zc;
> };
> 
> tracepoint:9p:9p_protocol_dump
> {
> /* out of bounds read can happen when this tracepoint is enabled */
> }
> 
> rawtracepoint:9p_protocol_dump
> {
> $pdu = (struct p9_fcall *)arg1;
> $dump_sz = (uint64)32;
> 
> if ($dump_sz > $pdu->capacity) {
> printf("reading %zu bytes from src buffer of %zu bytes\n",
> $dump_sz, $pdu->capacity);
> }
> }
> 
> Signed-off-by: JP Kobryn 
> ---
>  include/trace/events/9p.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/trace/events/9p.h b/include/trace/events/9p.h
> index 4dfa6d7f83ba..8690a7086252 100644
> --- a/include/trace/events/9p.h
> +++ b/include/trace/events/9p.h
> @@ -185,7 +185,8 @@ TRACE_EVENT(9p_protocol_dump,
>   __entry->clnt   =  clnt;
>   __entry->type   =  pdu->id;
>   __entry->tag=  pdu->tag;
> - memcpy(__entry->line, pdu->sdata, P9_PROTO_DUMP_SZ);
> + memcpy(__entry->line, pdu->sdata,
> + min(pdu->capacity, P9_PROTO_DUMP_SZ));
>   ),
>   TP_printk("clnt %lu %s(tag = %d)\n%.3x: %16ph\n%.3x: %16ph\n",
> (unsigned long)__entry->clnt, show_9p_op(__entry->type),

-- 
Dominique Martinet | Asmadeus



[PATCH] 9p: prevent read overrun in protocol dump tracepoint

2023-12-01 Thread JP Kobryn
An out of bounds read can occur within the tracepoint 9p_protocol_dump().
In the fast assign, there is a memcpy that uses a constant size of 32
(defined by the macro P9_PROTO_DUMP_SZ). When the copy is invoked, the
source buffer is not guaranteed to match this size. It was found that in
some cases the source buffer size is less than 32, resulting in a read
that overruns.

The size of the source buffer is known at the time the tracepoint is
invoked. The allocations happen within p9_fcall_init(), where the
capacity field is set to the allocated size of the payload buffer. This
patch fixes the overrun by using the minimum of that field (the source
buffer size) and the size of the destination buffer when performing the
copy.

A repro can be performed in different ways. The simplest might be
mounting a shared filesystem (between host and guest VM) using the Plan
9 protocol while the tracepoint is enabled.

mount -t 9p -o trans=virtio  

The bpftrace program below can be used to show the out of bounds read.
Note that a recent version of bpftrace is needed for the raw tracepoint
support. The script was tested using v0.19.0.

/* from include/net/9p/9p.h */
struct p9_fcall {
u32 size;
u8 id;
u16 tag;
size_t offset;
size_t capacity;
struct kmem_cache *cache;
u8 *sdata;
bool zc;
};

tracepoint:9p:9p_protocol_dump
{
/* out of bounds read can happen when this tracepoint is enabled */
}

rawtracepoint:9p_protocol_dump
{
$pdu = (struct p9_fcall *)arg1;
$dump_sz = (uint64)32;

if ($dump_sz > $pdu->capacity) {
printf("reading %zu bytes from src buffer of %zu bytes\n",
$dump_sz, $pdu->capacity);
}
}

Signed-off-by: JP Kobryn 
---
 include/trace/events/9p.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/9p.h b/include/trace/events/9p.h
index 4dfa6d7f83ba..8690a7086252 100644
--- a/include/trace/events/9p.h
+++ b/include/trace/events/9p.h
@@ -185,7 +185,8 @@ TRACE_EVENT(9p_protocol_dump,
__entry->clnt   =  clnt;
__entry->type   =  pdu->id;
__entry->tag=  pdu->tag;
-   memcpy(__entry->line, pdu->sdata, P9_PROTO_DUMP_SZ);
+   memcpy(__entry->line, pdu->sdata,
+   min(pdu->capacity, P9_PROTO_DUMP_SZ));
),
TP_printk("clnt %lu %s(tag = %d)\n%.3x: %16ph\n%.3x: %16ph\n",
  (unsigned long)__entry->clnt, show_9p_op(__entry->type),
-- 
2.43.0




Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2023-12-01 Thread Justin Chen



On 12/1/23 10:07 AM, Steven Rostedt wrote:

On Fri, 1 Dec 2023 09:25:59 -0800
Justin Chen  wrote:


It appears the sub instruction at 0x6dd0 correctly accounts for the
extra 8 bytes, so the frame pointer is valid. So our assumption
that there are no gaps between the stack frames is invalid.


Thanks for the assistance. The gap between stack frames depends on
the function. Most do not have a gap. Some have 8 (as shown above), some
have 12. A single assumption here is not going to work. I'm having a
hard time finding out the reasoning for this gap. I tried disabling a
bunch of gcc flags as well as -O2 and the gap still exists.


That code was originally added because of some strange things that gcc did
with mcount (for example, it made a copy of the stack frame that it passed
to mcount, where the function graph tracer replaced the copy of the return
stack making the shadow stack go out of sync and crash). This was very hard
to debug and I added this code to detect it if it happened again.

Well it's been over a decade since that happened (2009).

   71e308a239c09 ("function-graph: add stack frame test")

I'm happy assuming that the compiler folks are aware of our tricks with
hijacking return calls and I don't expect it to happen again. We can just
rip out those checks. That is, if it's only causing false positives, I
don't think it's worth keeping around.

Has it detected any real issues on the Arm platforms?

-- Steve


I am not familiar enough to make a call. But from my limited testing 
with ARM, I didn't see any issues. If you would like me to, I can submit 
a patch to remove the check entirely. Or maybe only disable it for ARM?


Thanks,
Justin




Re: [PATCH v3 0/5] params: harden string ops and allocation ops

2023-12-01 Thread Andy Shevchenko
On Fri, Dec 01, 2023 at 09:43:34AM -0800, Kees Cook wrote:
> On Mon, 20 Nov 2023 17:11:41 +0200, Andy Shevchenko wrote:
> > A couple of patches slightly harden the string ops used in the module.
> > On top, a few cleanups.
> > 
> > Since the main part is rather hardening, I think Kees' tree is
> > the best fit for the series. It is also possible to route it via Greg's
> > sysfs (driver core?) tree, but I'm open to other options.

[...]

> Applied to for-next/hardening, thanks!

Awesome, thanks!

-- 
With Best Regards,
Andy Shevchenko





Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2023-12-01 Thread Russell King (Oracle)
On Fri, Dec 01, 2023 at 10:12:48AM +0100, Ard Biesheuvel wrote:
> It appears the sub instruction at 0x6dd0 correctly accounts for the
> extra 8 bytes, so the frame pointer is valid. So our assumption
> that there are no gaps between the stack frames is invalid.
> 
> Could you try the following change please?
> 
> --- a/arch/arm/kernel/ftrace.c
> +++ b/arch/arm/kernel/ftrace.c
> @@ -235,8 +235,12 @@
> return;
> 
> if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER)) {
> -   /* FP points one word below parent's top of stack */
> -   frame_pointer += 4;
> +   /*
> +* The top of stack of the parent is recorded in the stack
> +* frame at offset [fp, #-8].
> +*/
> +   get_kernel_nofault(frame_pointer,
> +  (unsigned long *)(frame_pointer - 8));

Yes, this will get the value of the stack pointer when the function
was entered - which may be the bottom of the parent function's stack
_or_ the start of non-register arguments to this function. So your
replacement has always been more correct.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!



Re: [PATCH v7 3/4] remoteproc: zynqmp: add pm domains support

2023-12-01 Thread Tanmay Shah


On 11/29/23 11:10 AM, Mathieu Poirier wrote:
> On Mon, Nov 27, 2023 at 10:33:05AM -0600, Tanmay Shah wrote:
> > 
> > On 11/23/23 12:11 PM, Mathieu Poirier wrote:
> > > On Wed, Nov 22, 2023 at 03:00:36PM -0600, Tanmay Shah wrote:
> > > > Hi Mathieu,
> > > > 
> > > > Please find my comments below.
> > > > 
> > > > On 11/21/23 4:59 PM, Mathieu Poirier wrote:
> > > > > Hi,
> > > > >
> > > > > On Fri, Nov 17, 2023 at 09:42:37AM -0800, Tanmay Shah wrote:
> > > > > > Use TCM pm domains extracted from device-tree
> > > > > > to power on/off TCM using general pm domain framework.
> > > > > > 
> > > > > > Signed-off-by: Tanmay Shah 
> > > > > > ---
> > > > > > 
> > > > > > Changes in v7:
> > > > > >   - %s/pm_dev1/pm_dev_core0/r
> > > > > >   - %s/pm_dev_link1/pm_dev_core0_link/r
> > > > > >   - %s/pm_dev2/pm_dev_core1/r
> > > > > >   - %s/pm_dev_link2/pm_dev_core1_link/r
> > > > > >   - remove pm_domain_id check to move next patch
> > > > > >   - add comment about how 1st entry in pm domain list is used
> > > > > >   - fix loop when jump to fail_add_pm_domains loop
> > > > > > 
> > > > > >  drivers/remoteproc/xlnx_r5_remoteproc.c | 215 
> > > > > > +++-
> > > > > >  1 file changed, 212 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
> > > > > > b/drivers/remoteproc/xlnx_r5_remoteproc.c
> > > > > > index 4395edea9a64..22bccc5075a0 100644
> > > > > > --- a/drivers/remoteproc/xlnx_r5_remoteproc.c
> > > > > > +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
> > > > > > @@ -16,6 +16,7 @@
> > > > > >  #include 
> > > > > >  #include 
> > > > > >  #include 
> > > > > > +#include 
> > > > > >  
> > > > > >  #include "remoteproc_internal.h"
> > > > > >  
> > > > > > @@ -102,6 +103,12 @@ static const struct mem_bank_data 
> > > > > > zynqmp_tcm_banks_lockstep[] = {
> > > > > >   * @rproc: rproc handle
> > > > > >   * @pm_domain_id: RPU CPU power domain id
> > > > > >   * @ipi: pointer to mailbox information
> > > > > > + * @num_pm_dev: number of tcm pm domain devices for this core
> > > > > > + * @pm_dev_core0: pm domain virtual devices for power domain 
> > > > > > framework
> > > > > > + * @pm_dev_core0_link: pm domain device links after registration
> > > > > > + * @pm_dev_core1: used only in lockstep mode. second core's pm 
> > > > > > domain virtual devices
> > > > > > + * @pm_dev_core1_link: used only in lockstep mode. second core's 
> > > > > > pm device links after
> > > > > > + * registration
> > > > > >   */
> > > > > >  struct zynqmp_r5_core {
> > > > > > struct device *dev;
> > > > > > @@ -111,6 +118,11 @@ struct zynqmp_r5_core {
> > > > > > struct rproc *rproc;
> > > > > > u32 pm_domain_id;
> > > > > > struct mbox_info *ipi;
> > > > > > +   int num_pm_dev;
> > > > > > +   struct device **pm_dev_core0;
> > > > > > +   struct device_link **pm_dev_core0_link;
> > > > > > +   struct device **pm_dev_core1;
> > > > > > +   struct device_link **pm_dev_core1_link;
> > > > > >  };
> > > > > >  
> > > > > >  /**
> > > > > > @@ -651,7 +663,8 @@ static int 
> > > > > > add_tcm_carveout_lockstep_mode(struct rproc *rproc)
> > > > > >  
> > > > > > ZYNQMP_PM_CAPABILITY_ACCESS, 0,
> > > > > >  
> > > > > > ZYNQMP_PM_REQUEST_ACK_BLOCKING);
> > > > > > if (ret < 0) {
> > > > > > -   dev_err(dev, "failed to turn on TCM 0x%x", 
> > > > > > pm_domain_id);
> > > > > > +   dev_err(dev, "failed to turn on TCM 0x%x",
> > > > > > +   pm_domain_id);
> > > > >
> > > > > Spurious change, you should have caught that.
> > > > 
> > > > Ack, need to observe changes more closely before sending them.
> > > > 
> > > > >
> > > > > > goto release_tcm_lockstep;
> > > > > > }
> > > > > >  
> > > > > > @@ -758,6 +771,189 @@ static int zynqmp_r5_parse_fw(struct rproc 
> > > > > > *rproc, const struct firmware *fw)
> > > > > > return ret;
> > > > > >  }
> > > > > >  
> > > > > > +static void zynqmp_r5_remove_pm_domains(struct rproc *rproc)
> > > > > > +{
> > > > > > +   struct zynqmp_r5_core *r5_core = rproc->priv;
> > > > > > +   struct device *dev = r5_core->dev;
> > > > > > +   struct zynqmp_r5_cluster *cluster;
> > > > > > +   int i;
> > > > > > +
> > > > > > +   cluster = platform_get_drvdata(to_platform_device(dev->parent));
> > > > > > +
> > > > > > +   for (i = 1; i < r5_core->num_pm_dev; i++) {
> > > > > > +   device_link_del(r5_core->pm_dev_core0_link[i]);
> > > > > > +   dev_pm_domain_detach(r5_core->pm_dev_core0[i], false);
> > > > > > +   }
> > > > > > +
> > > > > > +   kfree(r5_core->pm_dev_core0);
> > > > > > +   r5_core->pm_dev_core0 = NULL;
> > > > > > +   kfree(r5_core->pm_dev_core0_link);
> > > > > > +   r5_core->pm_dev_core0_link = NULL;
> > > > > > +
> > > > > > +   if (cluster->mode == SPLIT_MODE) {
> > > > > > +   

Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2023-12-01 Thread Steven Rostedt
On Fri, 1 Dec 2023 09:25:59 -0800
Justin Chen  wrote:

> > It appears the sub instruction at 0x6dd0 correctly accounts for the
> > extra 8 bytes, so the frame pointer is valid. So our assumption
> > that there are no gaps between the stack frames is invalid.
> 
> Thanks for the assistance. The gap between stack frames depends on
> the function. Most do not have a gap. Some have 8 (as shown above), some 
> have 12. A single assumption here is not going to work. I'm having a 
> hard time finding out the reasoning for this gap. I tried disabling a 
> bunch of gcc flags as well as -O2 and the gap still exists.

That code was originally added because of some strange things that gcc did
with mcount (for example, it made a copy of the stack frame that it passed
to mcount, where the function graph tracer replaced the copy of the return
stack making the shadow stack go out of sync and crash). This was very hard
to debug and I added this code to detect it if it happened again.

Well it's been over a decade since that happened (2009).

  71e308a239c09 ("function-graph: add stack frame test")

I'm happy assuming that the compiler folks are aware of our tricks with
hijacking return calls and I don't expect it to happen again. We can just
rip out those checks. That is, if it's only causing false positives, I
don't think it's worth keeping around.

Has it detected any real issues on the Arm platforms?

-- Steve



Re: [PATCH v3 0/5] params: harden string ops and allocation ops

2023-12-01 Thread Kees Cook
On Mon, 20 Nov 2023 17:11:41 +0200, Andy Shevchenko wrote:
> A couple of patches slightly harden the string ops used in the module.
> On top, a few cleanups.
> 
> Since the main part is rather hardening, I think Kees' tree is
> the best fit for the series. It is also possible to route it via Greg's
> sysfs (driver core?) tree, but I'm open to other options.
> 
> [...]

Applied to for-next/hardening, thanks!

[1/5] params: Introduce the param_unknown_fn type
  https://git.kernel.org/kees/c/aa61d651412a
[2/5] params: Do not go over the limit when getting the string length
  https://git.kernel.org/kees/c/e6c5b15619a2
[3/5] params: Use size_add() for kmalloc()
  https://git.kernel.org/kees/c/9a4a4b528bff
[4/5] params: Sort headers
  https://git.kernel.org/kees/c/18bdb5a032e8
[5/5] params: Fix multi-line comment style
  https://git.kernel.org/kees/c/c62c9771b7d6

Take care,

-- 
Kees Cook




Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2023-12-01 Thread Justin Chen



On 12/1/2023 1:12 AM, Ard Biesheuvel wrote:

On Fri, 1 Dec 2023 at 00:48, Justin Chen  wrote:


Hello,

Ran into an odd bug that I'm not sure how to solve. Tested a few
kernel versions and they all fail in the same way.

FUNCTION_GRAPH_FP_TEST was enabled with 953f534a7ed6 ("ARM: ftrace:
enable HAVE_FUNCTION_GRAPH_FP_TEST"). This test fails when
UNWINDER_FRAME_POINTER is enabled. Enable function_graph tracer and you
should see a failure similar to below.

[   63.817239] [ cut here ]
[   63.822006] WARNING: CPU: 3 PID: 1185 at kernel/trace/fgraph.c:195
ftrace_return_to_handler+0x228/0x374
[   63.831645] Bad frame pointer: expected d1e0df40, received d1e0df48
[   63.831645]   from func packet_setsockopt return to c0b558f4
[   63.843801] Modules linked in: bdc udc_core
[   63.848246] CPU: 3 PID: 1185 Comm: udhcpc Not tainted
6.1.53-0.1pre-gf0bc552d12f8 #33
[   63.856209] Hardware name: Broadcom STB (Flattened Device Tree)
[   63.862227] Backtrace:
[   63.864761]  dump_backtrace from show_stack+0x20/0x24
[   63.869982]  r7:c031cd8c r6:0009 r5:0013 r4:c11c7fac
[   63.875736]  show_stack from dump_stack_lvl+0x48/0x54
[   63.880929]  dump_stack_lvl from dump_stack+0x18/0x1c
[   63.886111]  r5:00c3 r4:c11bd92c
[   63.889764]  dump_stack from __warn+0x88/0x130
[   63.894339]  __warn from warn_slowpath_fmt+0x140/0x198
[   63.899631]  r8:d1e0deac r7:c11bd958 r6:c031cd8c r5:c11bd92c r4:
[   63.906431]  warn_slowpath_fmt from ftrace_return_to_handler+0x228/0x374
[   63.913294]  r8:c3a8d840 r7:0002 r6:d1e0df48 r5:c2377a94 r4:c269a400
[   63.920095]  ftrace_return_to_handler from return_to_handler+0xc/0x18
[   63.926699]  r8:c0cd8ed0 r7:0008 r6:c418c500 r5:0004 r4:0107
[   63.933500]  __sys_setsockopt from return_to_handler+0x0/0x18
[   63.939415]  r8:c02002bc r7:0126 r6:0003 r5: r4:0004
[   63.946217]  sys_setsockopt from return_to_handler+0x0/0x18
[   63.952053] ---[ end trace  ]---

Sure enough the top of the parent stack is off by 8. (Tested with
gcc6.3/gcc8.3/gcc12.3)
6dcc :
  6dcc:   e1a0c00d    mov   ip, sp
  6dd0:   e24dd008    sub   sp, sp, #8   <==
  6dd4:   e92ddff0    push  {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr, pc}
  6dd8:   e24cb00c    sub   fp, ip, #12
  6ddc:   e24dd06c    sub   sp, sp, #108 @ 0x6c
  6de0:   e52de004    push  {lr}         @ (str lr, [sp, #-4]!)
  6de4:   ebfe        bl    0 <__gnu_mcount_nc>

I'm not quite sure why gcc is emitting this extra 8-byte frame (maybe
some optimization?), but it isn't being accounted for, so the
FUNCTION_GRAPH_FP_TEST for ARM fails. Note that only some functions do
this. Function graph works with FUNCTION_GRAPH_FP_TEST disabled, so it
looks like the test is hitting false positives.



Thanks for the report.

It appears the sub instruction at 0x6dd0 correctly accounts for the
extra 8 bytes, so the frame pointer is valid. So our assumption
that there are no gaps between the stack frames is invalid.


Thanks for the assistance. The gap between stack frames depends on
the function. Most do not have a gap. Some have 8 (as shown above), some 
have 12. A single assumption here is not going to work. I'm having a 
hard time finding out the reasoning for this gap. I tried disabling a 
bunch of gcc flags as well as -O2 and the gap still exists.


Thanks,
Justin



Could you try the following change please?

--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -235,8 +235,12 @@
 return;

 if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER)) {
-   /* FP points one word below parent's top of stack */
-   frame_pointer += 4;
+   /*
+* The top of stack of the parent is recorded in the stack
+* frame at offset [fp, #-8].
+*/
+   get_kernel_nofault(frame_pointer,
+  (unsigned long *)(frame_pointer - 8));
 } else {
 struct stackframe frame = {
 .fp = frame_pointer,




Re: [PATCH v5] bus: mhi: host: Add tracing support

2023-12-01 Thread Steven Rostedt
On Fri, 1 Dec 2023 10:01:33 -0700
Jeffrey Hugo  wrote:

> > +DECLARE_EVENT_CLASS(mhi_process_event_ring,
> > +
> > +   TP_PROTO(const char *name, void *rp, __le64 ptr,
> > +__le32 dword0, __le32 dword1),
> > +
> > +   TP_ARGS(name, rp, ptr, dword0, dword1),
> > +
> > +   TP_STRUCT__entry(
> > +   __string(name, name)
> > +   __field(__le32, dword0)
> > +   __field(__le32, dword1)
> > +   __field(int, state)
> > +   __field(__le64, ptr)
> > +   __field(void *, rp)
> > +   ),
> > +
> > +   TP_fast_assign(
> > +   __assign_str(name, name);
> > +   __entry->rp = rp;
> > +   __entry->ptr = ptr;
> > +   __entry->dword0 = dword0;
> > +   __entry->dword1 = dword1;
> > +   __entry->state = MHI_TRE_GET_EV_STATE((struct mhi_ring_element 
> > *)entry->rp);  
> 
> "entry"?
> Also, you have the "rp" that was passed into the tracepoint, so why
> not just use that directly?

Agreed.

-- Steve



Re: [PATCH v5] bus: mhi: host: Add tracing support

2023-12-01 Thread Jeffrey Hugo

On 11/27/2023 4:09 AM, Krishna chaitanya chundru wrote:

This change adds ftrace support for the following functions, which
helps in debugging issues when there are channel state and MHI state
changes, and when data and control events are received:
1. mhi_intvec_mhi_states
2. mhi_process_data_event_ring
3. mhi_process_ctrl_ev_ring
4. mhi_gen_tre
5. mhi_update_channel_state
6. mhi_tryset_pm_state
7. mhi_pm_st_worker

Wherever trace events are added, debug messages are removed.

Signed-off-by: Krishna chaitanya chundru 
---
Changes in v5:
- Use DECLARE_EVENT_CLASS for multiple events as suggested by Steve.
- Instead of converting the address to u64 for printing, use %px to
  print the address, avoiding warnings on some platforms.
- Link to v4:
https://lore.kernel.org/r/2023-ftrace_support-v4-1-c83602399...@quicinc.com

Changes in v4:
- Fix compilation issues in the previous patch, which happened due to rebasing.
- In the defconfig, the FTRACE config is not enabled, so the compilation
  issue was not seen in my workspace.
- Link to v3:
https://lore.kernel.org/r/2023-ftrace_support-v3-1-f358d2911...@quicinc.com

Changes in v3:
- Move the trace header file from include/trace/events to
  drivers/bus/mhi/host/ so that we can include driver header files.
- Use macros directly in the trace events as suggested by Jeffrey Hugo.
- Reorder the structures in the events as suggested by Steve to avoid
  holes in the buffer.
- Removed the mhi_to_physical function as it could cause security issues.
- Removed the macros that define strings, as we can get those from
  driver headers.
- Link to v2:
https://lore.kernel.org/r/20231013-ftrace_support-v2-1-6e893ce01...@quicinc.com

Changes in v2:
- Pass the raw state into the trace event and use __print_symbolic() as
  suggested by Bjorn.
- Change mhi_pm_st_worker to mhi_pm_st_transition as suggested by Bjorn.
- Fixed the kernel test robot issues.
- Link to v1:
https://lore.kernel.org/r/20231005-ftrace_support-v1-1-23a2f394f...@quicinc.com
---
  drivers/bus/mhi/host/init.c |   3 +
  drivers/bus/mhi/host/internal.h |   1 +
  drivers/bus/mhi/host/main.c |  23 +++--
  drivers/bus/mhi/host/pm.c   |   6 +-
  drivers/bus/mhi/host/trace.h| 208 
  5 files changed, 228 insertions(+), 13 deletions(-)

diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c
index f78aefd2d7a3..6acb85f4c5f8 100644
--- a/drivers/bus/mhi/host/init.c
+++ b/drivers/bus/mhi/host/init.c
@@ -20,6 +20,9 @@
  #include 
  #include "internal.h"
  
+#define CREATE_TRACE_POINTS

+#include "trace.h"
+
  static DEFINE_IDA(mhi_controller_ida);
  
  const char * const mhi_ee_str[MHI_EE_MAX] = {

diff --git a/drivers/bus/mhi/host/internal.h b/drivers/bus/mhi/host/internal.h
index 2e139e76de4c..a02a71605907 100644
--- a/drivers/bus/mhi/host/internal.h
+++ b/drivers/bus/mhi/host/internal.h
@@ -7,6 +7,7 @@
  #ifndef _MHI_INT_H
  #define _MHI_INT_H
  
+#include "trace.h"

  #include "../common.h"
  
  extern struct bus_type mhi_bus_type;

diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
index dcf627b36e82..0d7e068e713a 100644
--- a/drivers/bus/mhi/host/main.c
+++ b/drivers/bus/mhi/host/main.c
@@ -491,11 +491,9 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, 
void *priv)
  
  	state = mhi_get_mhi_state(mhi_cntrl);

ee = mhi_get_exec_env(mhi_cntrl);
-   dev_dbg(dev, "local ee: %s state: %s device ee: %s state: %s\n",
-   TO_MHI_EXEC_STR(mhi_cntrl->ee),
-   mhi_state_str(mhi_cntrl->dev_state),
-   TO_MHI_EXEC_STR(ee), mhi_state_str(state));
  
+	trace_mhi_intvec_states(mhi_cntrl->mhi_dev->name, mhi_cntrl->ee,

+   mhi_cntrl->dev_state, ee, state);
if (state == MHI_STATE_SYS_ERR) {
dev_dbg(dev, "System error detected\n");
pm_state = mhi_tryset_pm_state(mhi_cntrl,
@@ -832,6 +830,10 @@ int mhi_process_ctrl_ev_ring(struct mhi_controller 
*mhi_cntrl,
while (dev_rp != local_rp) {
enum mhi_pkt_type type = MHI_TRE_GET_EV_TYPE(local_rp);
  
+		trace_mhi_ctrl_event(mhi_cntrl->mhi_dev->name, local_rp,

+local_rp->ptr, local_rp->dword[0],
+local_rp->dword[1]);
+
switch (type) {
case MHI_PKT_TYPE_BW_REQ_EVENT:
{
@@ -997,6 +999,9 @@ int mhi_process_data_event_ring(struct mhi_controller 
*mhi_cntrl,
while (dev_rp != local_rp && event_quota > 0) {
enum mhi_pkt_type type = MHI_TRE_GET_EV_TYPE(local_rp);
  
+		trace_mhi_data_event(mhi_cntrl->mhi_dev->name, local_rp, local_rp->ptr,

+local_rp->dword[0], local_rp->dword[1]);
+
chan = MHI_TRE_GET_EV_CHID(local_rp);
  
  		WARN_ON(chan >= mhi_cntrl->max_chan);

@@ -1235,6 +1240,8 @@ int mhi_gen_tre(struct mhi_controller *mhi_cntrl, struct 
mhi_chan *mhi_chan,

Re: [PATCH v3 0/5] params: harden string ops and allocation ops

2023-12-01 Thread Andy Shevchenko
On Mon, Nov 20, 2023 at 05:11:41PM +0200, Andy Shevchenko wrote:
> A couple of patches slightly harden the string ops used in the module.
> On top, a few cleanups.
> 
> Since the main part is rather hardening, I think Kees' tree is
> the best fit for the series. It is also possible to route it via Greg's
> sysfs (driver core?) tree, but I'm open to other options.

Kees, Greg, can you apply this series?
Or should I do something about it?

-- 
With Best Regards,
Andy Shevchenko





Re: [PATCH vhost 3/7] vdpa/mlx5: Allow modifying multiple vq fields in one modify command

2023-12-01 Thread Dragos Tatulea
On 12/01, Eugenio Perez Martin wrote:
> On Fri, Dec 1, 2023 at 11:49 AM Dragos Tatulea  wrote:
> >
> > Add a bitmask variable that tracks hw vq field changes that
> > are supposed to be applied on the next hw vq modify command.
> >
> > This will be useful to set multiple vq fields when resuming the vq.
> >
> > The state needs to remain as a parameter as it doesn't make sense to
> > make it part of the vq struct: fw_state is updated only after a
> > successful command.
> >
> 
> I don't get this paragraph, "modified_fields" is a member of
> "mlx5_vdpa_virtqueue". Am I missing something?
> 
I think this paragraph adds more confusion than clarification. I meant
to say that the state argument of modify_virtqueue() needs to remain
there, as opposed to being folded into the mlx5_vdpa_virtqueue struct.

> 
> > Signed-off-by: Dragos Tatulea 
> > ---
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 48 +--
> >  1 file changed, 40 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 12ac3397f39b..d06285e46fe2 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -120,6 +120,9 @@ struct mlx5_vdpa_virtqueue {
> > u16 avail_idx;
> > u16 used_idx;
> > int fw_state;
> > +
> > +   u64 modified_fields;
> > +
> > struct msi_map map;
> >
> > /* keep last in the struct */
> > @@ -1181,7 +1184,19 @@ static bool is_valid_state_change(int oldstate, int 
> > newstate)
> > }
> >  }
> >
> > -static int modify_virtqueue(struct mlx5_vdpa_net *ndev, struct 
> > mlx5_vdpa_virtqueue *mvq, int state)
> > +static bool modifiable_virtqueue_fields(struct mlx5_vdpa_virtqueue *mvq)
> > +{
> > +   /* Only state is always modifiable */
> > +   if (mvq->modified_fields & ~MLX5_VIRTQ_MODIFY_MASK_STATE)
> > +   return mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT 
> > ||
> > +  mvq->fw_state == 
> > MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND;
> > +
> > +   return true;
> > +}
> > +
> > +static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
> > +   struct mlx5_vdpa_virtqueue *mvq,
> > +   int state)
> >  {
> > int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
> > u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
> > @@ -1193,6 +1208,9 @@ static int modify_virtqueue(struct mlx5_vdpa_net 
> > *ndev, struct mlx5_vdpa_virtque
> > if (mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_NONE)
> > return 0;
> >
> > +   if (!modifiable_virtqueue_fields(mvq))
> > +   return -EINVAL;
> > +
> 
> I'm ok with this change, but since modified_fields is (or will be) a
> bitmap tracking changes to state, addresses, mkey, etc, doesn't have
> more sense to check it like:
> 
> if (modified_fields & 1<<CHANGE_1)
>   // perform change 1
> if (modified_fields & 1<<CHANGE_2)
>   // perform change 2
> if (modified_fields & 1<<CHANGE_3)
>   // perform change 3
> ---
> 
> Instead of:
> if (!modified_fields)
>   return
> 
> if (modified_fields & 1<<CHANGE_1)
>   // perform change 1
> if (modified_fields & 1<<CHANGE_2)
>   // perform change 2
> 
> // perform change 3, no checking, as it is the only possible value of
> modified_fields
> ---
> 
> Or am I missing something?
> 
modifiable_virtqueue_fields() is meant to check that the modification is
done only in the INIT or SUSPEND state of the queue. Or did I
misunderstand your question?

> The rest looks good to me.
>
Thanks for reviewing my patches Eugenio!

> > if (!is_valid_state_change(mvq->fw_state, state))
> > return -EINVAL;
> >
> > @@ -1208,17 +1226,28 @@ static int modify_virtqueue(struct mlx5_vdpa_net 
> > *ndev, struct mlx5_vdpa_virtque
> > MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
> >
> > obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
> > -   MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select,
> > -  MLX5_VIRTQ_MODIFY_MASK_STATE);
> > -   MLX5_SET(virtio_net_q_object, obj_context, state, state);
> > +   if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE)
> > +   MLX5_SET(virtio_net_q_object, obj_context, state, state);
> > +
> > +   MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, 
> > mvq->modified_fields);
> > err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
> > kfree(in);
> > if (!err)
> > mvq->fw_state = state;
> >
> > +   mvq->modified_fields = 0;
> > +
> > return err;
> >  }
> >
> > +static int modify_virtqueue_state(struct mlx5_vdpa_net *ndev,
> > + struct mlx5_vdpa_virtqueue *mvq,
> > + unsigned int state)
> > +{
> > +   mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_STATE;
> > +   return modify_virtqueue(ndev, mvq, state);

Re: [PATCH vhost 6/7] vdpa/mlx5: Mark vq state for modification in hw vq

2023-12-01 Thread Eugenio Perez Martin
On Fri, Dec 1, 2023 at 11:50 AM Dragos Tatulea  wrote:
>
> .set_vq_state will set the indices and mark the fields to be modified in
> the hw vq.
>
> Signed-off-by: Dragos Tatulea 

Acked-by: Eugenio Pérez 

> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c  | 8 
>  include/linux/mlx5/mlx5_ifc_vdpa.h | 2 ++
>  2 files changed, 10 insertions(+)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 2277daf4814f..6325aef045e2 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1249,6 +1249,12 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
> MLX5_SET64(virtio_q, vq_ctx, available_addr, 
> mvq->driver_addr);
> }
>
> +   if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX)
> +   MLX5_SET(virtio_net_q_object, obj_context, 
> hw_available_index, mvq->avail_idx);
> +
> +   if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX)
> +   MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, 
> mvq->used_idx);
> +
> MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, 
> mvq->modified_fields);
> err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
> if (err)
> @@ -2328,6 +2334,8 @@ static int mlx5_vdpa_set_vq_state(struct vdpa_device 
> *vdev, u16 idx,
>
> mvq->used_idx = state->split.avail_index;
> mvq->avail_idx = state->split.avail_index;
> +   mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX |
> +   MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX;
> return 0;
>  }
>
> diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h 
> b/include/linux/mlx5/mlx5_ifc_vdpa.h
> index 9594ac405740..32e712106e68 100644
> --- a/include/linux/mlx5/mlx5_ifc_vdpa.h
> +++ b/include/linux/mlx5/mlx5_ifc_vdpa.h
> @@ -146,6 +146,8 @@ enum {
> MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_PARAMS  = (u64)1 << 3,
> MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_DUMP_ENABLE = (u64)1 << 4,
> MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS   = (u64)1 << 6,
> +   MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX   = (u64)1 << 7,
> +   MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX= (u64)1 << 8,
> MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY  = (u64)1 << 14,
>  };
>
> --
> 2.42.0
>




Re: [PATCH vhost 4/7] vdpa/mlx5: Introduce per vq and device resume

2023-12-01 Thread Eugenio Perez Martin
On Fri, Dec 1, 2023 at 11:50 AM Dragos Tatulea  wrote:
>
> Implement vdpa vq and device resume if capability detected. Add support
> for suspend -> ready state change.
>
> Signed-off-by: Dragos Tatulea 

Acked-by: Eugenio Pérez 

> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 67 +++
>  1 file changed, 60 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index d06285e46fe2..68e534cb57e2 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1170,7 +1170,12 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, 
> struct mlx5_vdpa_virtqueu
> return err;
>  }
>
> -static bool is_valid_state_change(int oldstate, int newstate)
> +static bool is_resumable(struct mlx5_vdpa_net *ndev)
> +{
> +   return ndev->mvdev.vdev.config->resume;
> +}
> +
> +static bool is_valid_state_change(int oldstate, int newstate, bool resumable)
>  {
> switch (oldstate) {
> case MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT:
> @@ -1178,6 +1183,7 @@ static bool is_valid_state_change(int oldstate, int 
> newstate)
> case MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY:
> return newstate == MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND;
> case MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND:
> +   return resumable ? newstate == 
> MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY : false;
> case MLX5_VIRTIO_NET_Q_OBJECT_STATE_ERR:
> default:
> return false;
> @@ -1200,6 +1206,7 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
>  {
> int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
> u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
> +   bool state_change = false;
> void *obj_context;
> void *cmd_hdr;
> void *in;
> @@ -1211,9 +1218,6 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
> if (!modifiable_virtqueue_fields(mvq))
> return -EINVAL;
>
> -   if (!is_valid_state_change(mvq->fw_state, state))
> -   return -EINVAL;
> -
> in = kzalloc(inlen, GFP_KERNEL);
> if (!in)
> return -ENOMEM;
> @@ -1226,17 +1230,29 @@ static int modify_virtqueue(struct mlx5_vdpa_net 
> *ndev,
> MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
>
> obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
> -   if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE)
> +
> +   if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE) {
> +   if (!is_valid_state_change(mvq->fw_state, state, 
> is_resumable(ndev))) {
> +   err = -EINVAL;
> +   goto done;
> +   }
> +
> MLX5_SET(virtio_net_q_object, obj_context, state, state);
> +   state_change = true;
> +   }
>
> MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, 
> mvq->modified_fields);
> err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
> -   kfree(in);
> -   if (!err)
> +   if (err)
> +   goto done;
> +
> +   if (state_change)
> mvq->fw_state = state;
>
> mvq->modified_fields = 0;
>
> +done:
> +   kfree(in);
> return err;
>  }
>
> @@ -1430,6 +1446,24 @@ static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> suspend_vq(ndev, &ndev->vqs[i]);
>  }
>
> +static void resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
> *mvq)
> +{
> +   if (!mvq->initialized || !is_resumable(ndev))
> +   return;
> +
> +   if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND)
> +   return;
> +
> +   if (modify_virtqueue_state(ndev, mvq, 
> MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY))
> +   mlx5_vdpa_warn(&ndev->mvdev, "modify to resume failed\n");
> +}
> +
> +static void resume_vqs(struct mlx5_vdpa_net *ndev)
> +{
> +   for (int i = 0; i < ndev->mvdev.max_vqs; i++)
> +   resume_vq(ndev, &ndev->vqs[i]);
> +}
> +
>  static void teardown_vq(struct mlx5_vdpa_net *ndev, struct 
> mlx5_vdpa_virtqueue *mvq)
>  {
> if (!mvq->initialized)
> @@ -3256,6 +3290,21 @@ static int mlx5_vdpa_suspend(struct vdpa_device *vdev)
> return 0;
>  }
>
> +static int mlx5_vdpa_resume(struct vdpa_device *vdev)
> +{
> +   struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> +   struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> +
> +   mlx5_vdpa_info(mvdev, "resuming device\n");
> +
> +   down_write(&ndev->reslock);
> +   mvdev->suspended = false;
> +   resume_vqs(ndev);
> +   register_link_notifier(ndev);
> +   up_write(&ndev->reslock);
> +   return 0;
> +}
> +
>  static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
>unsigned int asid)
>  {
> @@ -3312,6 +3361,7 @@ static const struct vdpa_config_ops mlx5_vdpa_ops = 

Re: [PATCH vhost 3/7] vdpa/mlx5: Allow modifying multiple vq fields in one modify command

2023-12-01 Thread Eugenio Perez Martin
On Fri, Dec 1, 2023 at 11:49 AM Dragos Tatulea  wrote:
>
> Add a bitmask variable that tracks hw vq field changes that
> are supposed to be modified on next hw vq change command.
>
> This will be useful to set multiple vq fields when resuming the vq.
>
> The state needs to remain as a parameter as it doesn't make sense to
> make it part of the vq struct: fw_state is updated only after a
> successful command.
>

I don't get this paragraph, "modified_fields" is a member of
"mlx5_vdpa_virtqueue". Am I missing something?


> Signed-off-by: Dragos Tatulea 
> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 48 +--
>  1 file changed, 40 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 12ac3397f39b..d06285e46fe2 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -120,6 +120,9 @@ struct mlx5_vdpa_virtqueue {
> u16 avail_idx;
> u16 used_idx;
> int fw_state;
> +
> +   u64 modified_fields;
> +
> struct msi_map map;
>
> /* keep last in the struct */
> @@ -1181,7 +1184,19 @@ static bool is_valid_state_change(int oldstate, int 
> newstate)
> }
>  }
>
> -static int modify_virtqueue(struct mlx5_vdpa_net *ndev, struct 
> mlx5_vdpa_virtqueue *mvq, int state)
> +static bool modifiable_virtqueue_fields(struct mlx5_vdpa_virtqueue *mvq)
> +{
> +   /* Only state is always modifiable */
> +   if (mvq->modified_fields & ~MLX5_VIRTQ_MODIFY_MASK_STATE)
> +   return mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT ||
> +  mvq->fw_state == 
> MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND;
> +
> +   return true;
> +}
> +
> +static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
> +   struct mlx5_vdpa_virtqueue *mvq,
> +   int state)
>  {
> int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
> u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
> @@ -1193,6 +1208,9 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev, 
> struct mlx5_vdpa_virtque
> if (mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_NONE)
> return 0;
>
> +   if (!modifiable_virtqueue_fields(mvq))
> +   return -EINVAL;
> +

I'm ok with this change, but since modified_fields is (or will be) a
bitmap tracking changes to state, addresses, mkey, etc, doesn't have
more sense to check it like:

if (modified_fields & 1<<CHANGE_1)
  // perform change 1
if (modified_fields & 1<<CHANGE_2)
  // perform change 2
if (modified_fields & 1<<CHANGE_3)
  // perform change 3
---

Instead of:
if (!modified_fields)
  return

if (modified_fields & 1<<CHANGE_1)
  // perform change 1
if (modified_fields & 1<<CHANGE_2)
  // perform change 2

// perform change 3, no checking, as it is the only possible value of
modified_fields
---

Or am I missing something?

> if (!is_valid_state_change(mvq->fw_state, state))
> return -EINVAL;
>
> @@ -1208,17 +1226,28 @@ static int modify_virtqueue(struct mlx5_vdpa_net 
> *ndev, struct mlx5_vdpa_virtque
> MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
>
> obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
> -   MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select,
> -  MLX5_VIRTQ_MODIFY_MASK_STATE);
> -   MLX5_SET(virtio_net_q_object, obj_context, state, state);
> +   if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE)
> +   MLX5_SET(virtio_net_q_object, obj_context, state, state);
> +
> +   MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, 
> mvq->modified_fields);
> err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
> kfree(in);
> if (!err)
> mvq->fw_state = state;
>
> +   mvq->modified_fields = 0;
> +
> return err;
>  }
>
> +static int modify_virtqueue_state(struct mlx5_vdpa_net *ndev,
> + struct mlx5_vdpa_virtqueue *mvq,
> + unsigned int state)
> +{
> +   mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_STATE;
> +   return modify_virtqueue(ndev, mvq, state);
> +}
> +
>  static int counter_set_alloc(struct mlx5_vdpa_net *ndev, struct 
> mlx5_vdpa_virtqueue *mvq)
>  {
> u32 in[MLX5_ST_SZ_DW(create_virtio_q_counters_in)] = {};
> @@ -1347,7 +1376,7 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct 
> mlx5_vdpa_virtqueue *mvq)
> goto err_vq;
>
> if (mvq->ready) {
> -   err = modify_virtqueue(ndev, mvq, 
> MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
> +   err = modify_virtqueue_state(ndev, mvq, 
> MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
> if (err) {
> mlx5_vdpa_warn(&ndev->mvdev, "failed to modify to 
> ready vq idx %d(%d)\n",
>idx, err);
> @@ -1382,7 +1411,7 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, 
> struct mlx5_vdpa_virtqueue *m
> if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY)
> return;
>
> -   if (modify_virtqueue(ndev, mvq, 
> MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> +   if (modify_virtqueue_state(ndev, mvq, 
> MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> 

Re: [PATCH 1/2] tracing: Simplify and fix "buffered event" synchronization

2023-12-01 Thread Steven Rostedt
On Fri, 1 Dec 2023 15:17:35 +0100
Petr Pavlu  wrote:

> Ok, keeping the current approach, my plan for v2 is to prepare the
> following patches:
> 
> * Fix for the missing increment+decrement of trace_buffered_event_cnt
>   on the current CPU in trace_buffered_event_disable().
> 
>   Replace smp_call_function_many() with on_each_cpu_mask() in
>   trace_buffered_event_disable(). The on_each_cpu_mask() function has
>   also an advantage that it itself disables preemption so doing that can
>   be then removed from trace_buffered_event_disable().

OK.

> 
> * Fix the potential race between trace_buffered_event_enable() and
>   trace_event_buffer_lock_reserve() where the latter might already see
>   a valid trace_buffered_event pointer but not all initialization yet.
> 
>   I think this might be actually best to address by using the same
>   maintenance exclusion as is implemented in
>   trace_buffered_event_disable(). It would make both maintenance
>   operations consistent but for the cost of making the enable operation
>   somewhat slower.

I wouldn't do them the same just to make them consistent. I think the
smp_wmb() is sufficient. Don't you think?

> 
> * Fix the WARN_ON_ONCE(!trace_buffered_event_ref) issued in
>   trace_buffered_event_disable() when trace_buffered_event_enable()
>   previously fails.
> 
>   Add a variable/flag tracking whether trace_buffered_event is currently
>   allocated and use that for driving if a new allocation needs to be
>   done when trace_buffered_event_enable() is called, or the buffers
>   should be really freed when trace_buffered_event_disable() is invoked.
> 
>   Not sure if the mentioned alternative of leaving trace_buffered_event
>   partially initialized on failure is preferred instead.

I do not really have a preference for either solution. They both are bad if
it happens ;-)

> 
> * Fix the potential race between trace_buffered_event_disable() and
>   trace_event_buffer_lock_reserve() where the latter might still grab
>   a pointer from trace_buffered_event that is being freed.
> 
>   Replace smp_wmb() with synchronize_rcu() in
>   trace_buffered_event_disable().

Sounds good.

Thanks!

-- Steve



Re: [PATCH 1/2] tracing: Simplify and fix "buffered event" synchronization

2023-12-01 Thread Petr Pavlu
On 11/29/23 15:58, Steven Rostedt wrote:
> On Wed, 29 Nov 2023 10:22:23 +0100
> Petr Pavlu  wrote:
> 
>> Yes, I believe this should address this potential race condition.
>>
>> An alternative would be instead to update
>> trace_event_buffer_lock_reserve() as follows:
>>
>>  if (this_cpu_inc_return(trace_buffered_event_cnt) == 1) {
>>  smp_rmb();
> 
> This is the problem I have with your approach. That smp_rmb() is in the
> highly critical path. On some architectures, this has a significant impact
> on the overhead of this code. This path is called during code execution and
> increases the overhead of the tracing infrastructure.
> 
> If I'm given two solutions where one adds a smp_rmb() to the critical path
> and the other just slows down the non-critical path more, I'll take the
> slow down of non-critical path every time.
> 
>>  if ((entry = __this_cpu_read(trace_buffered_event))) {
>>  [...]
>>
>> That saves the synchronize_rcu() call but additionally modifies
>> trace_buffered_event_cnt even if trace_buffered_event is currently NULL.
>>
>> Another alternative is the approach taken by my patch which avoids more
>> RCU work and unnecessary memory modifications.
>>
>> I'd be interested if you could have a look again at what I'm proposing
>> in my patch. I think it simplifies the code while addressing these
>> problems as well. However, if you have reservations about that approach
>> then it is ok, I can fix the found problems individually as discussed.
> 
> Fix this without adding any memory barriers to the critical path, then I'll
> take another look.
> 
> FYI, this code was designed in the first place to avoid adding memory
> barriers in the critical path.

Thank you for the explanation. I ran the tests you mentioned in the
description of commit 0fc1b09ff1ff ("tracing: Use temp buffer when
filtering events") to understand this aspect a bit more. I confirm that
my proposed patch makes the tracing slower, usually by single digit
percentages. I understand this is not welcome. I also realize that the
slowdown might be even worse in different situations and on other
architectures that I checked (arm64, x86_64).

Ok, keeping the current approach, my plan for v2 is to prepare the
following patches:

* Fix for the missing increment+decrement of trace_buffered_event_cnt
  on the current CPU in trace_buffered_event_disable().

  Replace smp_call_function_many() with on_each_cpu_mask() in
  trace_buffered_event_disable(). The on_each_cpu_mask() function has
  also an advantage that it itself disables preemption so doing that can
  be then removed from trace_buffered_event_disable().

* Fix the potential race between trace_buffered_event_enable() and
  trace_event_buffer_lock_reserve() where the latter might already see
  a valid trace_buffered_event pointer but not all initialization yet.

  I think this might be actually best to address by using the same
  maintenance exclusion as is implemented in
  trace_buffered_event_disable(). It would make both maintenance
  operations consistent but for the cost of making the enable operation
  somewhat slower.

* Fix the WARN_ON_ONCE(!trace_buffered_event_ref) issued in
  trace_buffered_event_disable() when trace_buffered_event_enable()
  previously fails.

  Add a variable/flag tracking whether trace_buffered_event is currently
  allocated and use that for driving if a new allocation needs to be
  done when trace_buffered_event_enable() is called, or the buffers
  should be really freed when trace_buffered_event_disable() is invoked.

  Not sure if the mentioned alternative of leaving trace_buffered_event
  partially initialized on failure is preferred instead.

* Fix the potential race between trace_buffered_event_disable() and
  trace_event_buffer_lock_reserve() where the latter might still grab
  a pointer from trace_buffered_event that is being freed.

  Replace smp_wmb() with synchronize_rcu() in
  trace_buffered_event_disable().

Thanks,
Petr



[PATCH RFC 5/6] net: introduce page_frag_cache_drain()

2023-12-01 Thread Yunsheng Lin
When draining a page_frag_cache, most users follow
similar steps, so introduce an API to avoid code
duplication.

Signed-off-by: Yunsheng Lin 
---
 drivers/net/ethernet/google/gve/gve_main.c | 11 ++-
 drivers/net/ethernet/mediatek/mtk_wed_wo.c | 17 ++---
 drivers/nvme/host/tcp.c|  7 +--
 drivers/nvme/target/tcp.c  |  4 +---
 drivers/vhost/net.c|  4 +---
 include/linux/gfp.h|  2 ++
 mm/page_alloc.c| 10 ++
 7 files changed, 19 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve_main.c 
b/drivers/net/ethernet/google/gve/gve_main.c
index 619bf63ec935..d976190b0f4d 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -1278,17 +1278,10 @@ static void gve_unreg_xdp_info(struct gve_priv *priv)
 
 static void gve_drain_page_cache(struct gve_priv *priv)
 {
-   struct page_frag_cache *nc;
int i;
 
-   for (i = 0; i < priv->rx_cfg.num_queues; i++) {
-   nc = &priv->rx[i].page_cache;
-   if (nc->va) {
-   __page_frag_cache_drain(virt_to_page(nc->va),
-   nc->pagecnt_bias);
-   nc->va = NULL;
-   }
-   }
+   for (i = 0; i < priv->rx_cfg.num_queues; i++)
+   page_frag_cache_drain(&priv->rx[i].page_cache);
 }
 
 static int gve_open(struct net_device *dev)
diff --git a/drivers/net/ethernet/mediatek/mtk_wed_wo.c 
b/drivers/net/ethernet/mediatek/mtk_wed_wo.c
index 7ffbd4fca881..df0a3ceaf59b 100644
--- a/drivers/net/ethernet/mediatek/mtk_wed_wo.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.c
@@ -286,7 +286,6 @@ mtk_wed_wo_queue_free(struct mtk_wed_wo *wo, struct 
mtk_wed_wo_queue *q)
 static void
 mtk_wed_wo_queue_tx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
 {
-   struct page *page;
int i;
 
for (i = 0; i < q->n_desc; i++) {
@@ -298,19 +297,12 @@ mtk_wed_wo_queue_tx_clean(struct mtk_wed_wo *wo, struct 
mtk_wed_wo_queue *q)
entry->buf = NULL;
}
 
-   if (!q->cache.va)
-   return;
-
-   page = virt_to_page(q->cache.va);
-   __page_frag_cache_drain(page, q->cache.pagecnt_bias);
-   memset(&q->cache, 0, sizeof(q->cache));
+   page_frag_cache_drain(&q->cache);
 }
 
 static void
 mtk_wed_wo_queue_rx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
 {
-   struct page *page;
-
for (;;) {
void *buf = mtk_wed_wo_dequeue(wo, q, NULL, true);
 
@@ -320,12 +312,7 @@ mtk_wed_wo_queue_rx_clean(struct mtk_wed_wo *wo, struct 
mtk_wed_wo_queue *q)
skb_free_frag(buf);
}
 
-   if (!q->cache.va)
-   return;
-
-   page = virt_to_page(q->cache.va);
-   __page_frag_cache_drain(page, q->cache.pagecnt_bias);
-   memset(&q->cache, 0, sizeof(q->cache));
+   page_frag_cache_drain(&q->cache);
 }
 
 static void
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 89661a9cf850..8d4f4a06f9d9 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1338,7 +1338,6 @@ static int nvme_tcp_alloc_async_req(struct nvme_tcp_ctrl 
*ctrl)
 
 static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid)
 {
-   struct page *page;
struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
struct nvme_tcp_queue *queue = &ctrl->queues[qid];
unsigned int noreclaim_flag;
@@ -1349,11 +1348,7 @@ static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, 
int qid)
if (queue->hdr_digest || queue->data_digest)
nvme_tcp_free_crypto(queue);
 
-   if (queue->pf_cache.va) {
-   page = virt_to_head_page(queue->pf_cache.va);
-   __page_frag_cache_drain(page, queue->pf_cache.pagecnt_bias);
-   queue->pf_cache.va = NULL;
-   }
+   page_frag_cache_drain(&queue->pf_cache);
 
noreclaim_flag = memalloc_noreclaim_save();
/* ->sock will be released by fput() */
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 92b74d0b8686..f9a553d70a61 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1576,7 +1576,6 @@ static void nvmet_tcp_free_cmd_data_in_buffers(struct 
nvmet_tcp_queue *queue)
 
 static void nvmet_tcp_release_queue_work(struct work_struct *w)
 {
-   struct page *page;
struct nvmet_tcp_queue *queue =
container_of(w, struct nvmet_tcp_queue, release_work);
 
@@ -1600,8 +1599,7 @@ static void nvmet_tcp_release_queue_work(struct 
work_struct *w)
if (queue->hdr_digest || queue->data_digest)
nvmet_tcp_free_crypto(queue);
ida_free(&nvmet_tcp_queue_ida, queue->idx);
-   page = virt_to_head_page(queue->pf_cache.va);
-   __page_frag_cache_drain(page, queue->pf_cache.pagecnt_bias);
+   page_frag_cache_drain(&queue->pf_cache);
  

[PATCH RFC 6/6] tools: virtio: introduce vhost_net_test

2023-12-01 Thread Yunsheng Lin
Introduce vhost_net_test, based on virtio_test, to test
vhost_net changes in the kernel.

Signed-off-by: Yunsheng Lin 
---
 tools/virtio/Makefile |   8 +-
 tools/virtio/vhost_net_test.c | 441 ++
 2 files changed, 446 insertions(+), 3 deletions(-)
 create mode 100644 tools/virtio/vhost_net_test.c

diff --git a/tools/virtio/Makefile b/tools/virtio/Makefile
index d128925980e0..e25e99c1c3b7 100644
--- a/tools/virtio/Makefile
+++ b/tools/virtio/Makefile
@@ -1,8 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0
 all: test mod
-test: virtio_test vringh_test
+test: virtio_test vringh_test vhost_net_test
 virtio_test: virtio_ring.o virtio_test.o
 vringh_test: vringh_test.o vringh.o virtio_ring.o
+vhost_net_test: virtio_ring.o vhost_net_test.o
 
 try-run = $(shell set -e;  \
if ($(1)) >/dev/null 2>&1;  \
@@ -49,6 +50,7 @@ oot-clean: OOT_BUILD+=clean
 
 .PHONY: all test mod clean vhost oot oot-clean oot-build
 clean:
-   ${RM} *.o vringh_test virtio_test vhost_test/*.o vhost_test/.*.cmd \
-  vhost_test/Module.symvers vhost_test/modules.order *.d
+   ${RM} *.o vringh_test virtio_test vhost_net_test vhost_test/*.o \
+  vhost_test/.*.cmd vhost_test/Module.symvers \
+  vhost_test/modules.order *.d
 -include *.d
diff --git a/tools/virtio/vhost_net_test.c b/tools/virtio/vhost_net_test.c
new file mode 100644
index ..7e7b7aba3668
--- /dev/null
+++ b/tools/virtio/vhost_net_test.c
@@ -0,0 +1,441 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define RANDOM_BATCH -1
+
+static int tun_alloc(void)
+{
+   struct ifreq ifr;
+   int fd, e;
+
+   fd = open("/dev/net/tun", O_RDWR);
+   if (fd < 0) {
+   perror("Cannot open /dev/net/tun");
+   return fd;
+   }
+
+   memset(&ifr, 0, sizeof(ifr));
+
+   ifr.ifr_flags = IFF_TUN | IFF_NO_PI;
+   strncpy(ifr.ifr_name, "tun0", IFNAMSIZ);
+
+   e = ioctl(fd, TUNSETIFF, (void *) &ifr);
+   if (e < 0) {
+   perror("ioctl[TUNSETIFF]");
+   close(fd);
+   return e;
+   }
+
+   return fd;
+}
+
+/* Unused */
+void *__kmalloc_fake, *__kfree_ignore_start, *__kfree_ignore_end;
+
+struct vq_info {
+   int kick;
+   int call;
+   int num;
+   int idx;
+   void *ring;
+   /* copy used for control */
+   struct vring vring;
+   struct virtqueue *vq;
+};
+
+struct vdev_info {
+   struct virtio_device vdev;
+   int control;
+   struct pollfd fds[1];
+   struct vq_info vqs[1];
+   int nvqs;
+   void *buf;
+   size_t buf_size;
+   struct vhost_memory *mem;
+};
+
+static struct vhost_vring_file no_backend = { .index = 1, .fd = -1 },
+backend = { .index = 1, .fd = 1 };
+static const struct vhost_vring_state null_state = {};
+
+bool vq_notify(struct virtqueue *vq)
+{
+   struct vq_info *info = vq->priv;
+   unsigned long long v = 1;
+   int r;
+   r = write(info->kick, &v, sizeof v);
+   assert(r == sizeof v);
+   return true;
+}
+
+void vq_callback(struct virtqueue *vq)
+{
+}
+
+
+void vhost_vq_setup(struct vdev_info *dev, struct vq_info *info)
+{
+   struct vhost_vring_state state = { .index = info->idx };
+   struct vhost_vring_file file = { .index = info->idx };
+   unsigned long long features = dev->vdev.features;
+   struct vhost_vring_addr addr = {
+   .index = info->idx,
+   .desc_user_addr = (uint64_t)(unsigned long)info->vring.desc,
+   .avail_user_addr = (uint64_t)(unsigned long)info->vring.avail,
+   .used_user_addr = (uint64_t)(unsigned long)info->vring.used,
+   };
+   int r;
+   r = ioctl(dev->control, VHOST_SET_FEATURES, &features);
+   assert(r >= 0);
+   state.num = info->vring.num;
+   r = ioctl(dev->control, VHOST_SET_VRING_NUM, &state);
+   assert(r >= 0);
+   state.num = 0;
+   r = ioctl(dev->control, VHOST_SET_VRING_BASE, &state);
+   assert(r >= 0);
+   r = ioctl(dev->control, VHOST_SET_VRING_ADDR, &addr);
+   assert(r >= 0);
+   file.fd = info->kick;
+   r = ioctl(dev->control, VHOST_SET_VRING_KICK, &file);
+   assert(r >= 0);
+   file.fd = info->call;
+   r = ioctl(dev->control, VHOST_SET_VRING_CALL, &file);
+   assert(r >= 0);
+}
+
+static void vq_reset(struct vq_info *info, int num, struct virtio_device *vdev)
+{
+   if (info->vq)
+   vring_del_virtqueue(info->vq);
+
+   memset(info->ring, 0, vring_size(num, 4096));
+   vring_init(&info->vring, num, info->ring, 4096);
+   info->vq = vring_new_virtqueue(info->idx, num, 4096, vdev, true, false,
+  info->ring, vq_notify, 

[PATCH RFC 4/6] vhost/net: remove vhost_net_page_frag_refill()

2023-12-01 Thread Yunsheng Lin
The page frag in vhost_net_page_frag_refill() uses the
'struct page_frag' from skb_page_frag_refill(), but its
implementation is now similar to page_frag_alloc_align().

This patch removes vhost_net_page_frag_refill() by using
'struct page_frag_cache' instead of 'struct page_frag',
and allocating frags using page_frag_alloc_align().

The added benefit is that this not only unifies the page
frag implementations a little, but also yields about a
0.5% performance boost when testing with the
vhost_net_test introduced in the last patch.

Signed-off-by: Yunsheng Lin 
---
 drivers/vhost/net.c | 93 ++---
 1 file changed, 29 insertions(+), 64 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index e574e21cc0ca..805e11d598e4 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -141,10 +141,8 @@ struct vhost_net {
unsigned tx_zcopy_err;
/* Flush in progress. Protected by tx vq lock. */
bool tx_flush;
-   /* Private page frag */
-   struct page_frag page_frag;
-   /* Refcount bias of page frag */
-   int refcnt_bias;
+   /* Private page frag cache */
+   struct page_frag_cache pf_cache;
 };
 
 static unsigned vhost_net_zcopy_mask __read_mostly;
@@ -655,41 +653,6 @@ static bool tx_can_batch(struct vhost_virtqueue *vq, 
size_t total_len)
   !vhost_vq_avail_empty(vq->dev, vq);
 }
 
-static bool vhost_net_page_frag_refill(struct vhost_net *net, unsigned int sz,
-  struct page_frag *pfrag, gfp_t gfp)
-{
-   if (pfrag->page) {
-   if (pfrag->offset + sz <= pfrag->size)
-   return true;
-   __page_frag_cache_drain(pfrag->page, net->refcnt_bias);
-   }
-
-   pfrag->offset = 0;
-   net->refcnt_bias = 0;
-   if (SKB_FRAG_PAGE_ORDER) {
-   /* Avoid direct reclaim but allow kswapd to wake */
-   pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
- __GFP_COMP | __GFP_NOWARN |
- __GFP_NORETRY | __GFP_NOMEMALLOC,
- SKB_FRAG_PAGE_ORDER);
-   if (likely(pfrag->page)) {
-   pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
-   goto done;
-   }
-   }
-   pfrag->page = alloc_page(gfp);
-   if (likely(pfrag->page)) {
-   pfrag->size = PAGE_SIZE;
-   goto done;
-   }
-   return false;
-
-done:
-   net->refcnt_bias = USHRT_MAX;
-   page_ref_add(pfrag->page, USHRT_MAX - 1);
-   return true;
-}
-
 #define VHOST_NET_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD)
 
 static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
@@ -699,7 +662,6 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue 
*nvq,
struct vhost_net *net = container_of(vq->dev, struct vhost_net,
 dev);
struct socket *sock = vhost_vq_get_backend(vq);
-   struct page_frag *alloc_frag = &net->page_frag;
struct virtio_net_hdr *gso;
struct xdp_buff *xdp = &nvq->xdp[nvq->batched_xdp];
struct tun_xdp_hdr *hdr;
@@ -710,6 +672,7 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue 
*nvq,
int sock_hlen = nvq->sock_hlen;
void *buf;
int copied;
+   int ret;
 
if (unlikely(len < nvq->sock_hlen))
return -EFAULT;
@@ -719,18 +682,17 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue 
*nvq,
return -ENOSPC;
 
buflen += SKB_DATA_ALIGN(len + pad);
-   alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES);
-   if (unlikely(!vhost_net_page_frag_refill(net, buflen,
-alloc_frag, GFP_KERNEL)))
+   buf = page_frag_alloc_align(&net->pf_cache, buflen, GFP_KERNEL,
+   SMP_CACHE_BYTES);
+   if (unlikely(!buf))
return -ENOMEM;
 
-   buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
-   copied = copy_page_from_iter(alloc_frag->page,
-alloc_frag->offset +
-offsetof(struct tun_xdp_hdr, gso),
-sock_hlen, from);
-   if (copied != sock_hlen)
-   return -EFAULT;
+   copied = copy_from_iter(buf + offsetof(struct tun_xdp_hdr, gso),
+   sock_hlen, from);
+   if (copied != sock_hlen) {
+   ret = -EFAULT;
+   goto err;
+   }
 
hdr = buf;
gso = &hdr->gso;
@@ -743,27 +705,30 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue 
*nvq,
   vhost16_to_cpu(vq, gso->csum_start) +
   vhost16_to_cpu(vq, gso->csum_offset) + 2);
 
-   if (vhost16_to_cpu(vq, 

[PATCH RFC 2/6] page_frag: unify gfp bit for order 3 page allocation

2023-12-01 Thread Yunsheng Lin
Currently there appear to be three page frag implementations
which all try to allocate an order-3 page and, if that fails,
fall back to an order-0 page. Each of them allows the order-3
allocation to fail under certain conditions by using specific
gfp bits.

The gfp bits used for the order-3 allocation differ between the
implementations: __GFP_NOMEMALLOC is or'ed in to forbid access
to the emergency memory reserves in __page_frag_cache_refill(),
but not in the other implementations, while __GFP_DIRECT_RECLAIM
is masked off to avoid direct reclaim in skb_page_frag_refill(),
but not in __page_frag_cache_refill().

This patch unifies the gfp bits used between the implementations
by or'ing in __GFP_NOMEMALLOC and masking off __GFP_DIRECT_RECLAIM
for the order-3 page allocation, to avoid putting unnecessary
pressure on mm.
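The flag manipulation described above can be sketched in plain C. Note the flag bit values below are illustrative placeholders, not the real kernel GFP definitions:

```c
#include <assert.h>

/* Illustrative sketch only: these bit values are made up for the
 * example and are NOT the real kernel GFP flag values. */
typedef unsigned int gfp_t;

#define __GFP_DIRECT_RECLAIM 0x01u
#define __GFP_COMP           0x02u
#define __GFP_NOWARN         0x04u
#define __GFP_NORETRY        0x08u
#define __GFP_NOMEMALLOC     0x10u
#define GFP_KERNEL           (0x20u | __GFP_DIRECT_RECLAIM)

/* Unified gfp mask for the opportunistic order-3 allocation:
 * __GFP_DIRECT_RECLAIM is masked off (kswapd may still be woken)
 * and __GFP_NOMEMALLOC is or'ed in so the emergency reserves are
 * never touched. */
static gfp_t order3_gfp(gfp_t gfp)
{
	return (gfp & ~__GFP_DIRECT_RECLAIM) |
	       __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
}
```

If the order-3 attempt with this stricter mask fails, all three call sites fall back to an order-0 allocation using the caller's original gfp mask.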

Signed-off-by: Yunsheng Lin 
CC: Alexander Duyck 
---
 drivers/vhost/net.c | 2 +-
 mm/page_alloc.c | 4 ++--
 net/core/sock.c | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f2ed7167c848..e574e21cc0ca 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -670,7 +670,7 @@ static bool vhost_net_page_frag_refill(struct vhost_net 
*net, unsigned int sz,
/* Avoid direct reclaim but allow kswapd to wake */
pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
  __GFP_COMP | __GFP_NOWARN |
- __GFP_NORETRY,
+ __GFP_NORETRY | __GFP_NOMEMALLOC,
  SKB_FRAG_PAGE_ORDER);
if (likely(pfrag->page)) {
pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9a16305cf985..1f0b36dd81b5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4693,8 +4693,8 @@ static struct page *__page_frag_cache_refill(struct 
page_frag_cache *nc,
gfp_t gfp = gfp_mask;
 
 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-   gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
-   __GFP_NOMEMALLOC;
+   gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) |  __GFP_COMP |
+  __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
PAGE_FRAG_CACHE_MAX_ORDER);
nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
diff --git a/net/core/sock.c b/net/core/sock.c
index fef349dd72fa..4efa9cae4b0d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2904,7 +2904,7 @@ bool skb_page_frag_refill(unsigned int sz, struct 
page_frag *pfrag, gfp_t gfp)
/* Avoid direct reclaim but allow kswapd to wake */
pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
  __GFP_COMP | __GFP_NOWARN |
- __GFP_NORETRY,
+ __GFP_NORETRY | __GFP_NOMEMALLOC,
  SKB_FRAG_PAGE_ORDER);
if (likely(pfrag->page)) {
pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
-- 
2.33.0




Re: [PATCH vhost 2/7] vdpa/mlx5: Split function into locked and unlocked variants

2023-12-01 Thread Eugenio Perez Martin
On Fri, Dec 1, 2023 at 11:49 AM Dragos Tatulea  wrote:
>
> mlx5_vdpa_destroy_mr contains more logic than _mlx5_vdpa_destroy_mr.
> There is no reason for this to be the case. All the logic can go into
> the unlocked variant.
>
> Using the unlocked version is needed in a follow-up patch. And it also
> makes it more consistent with mlx5_vdpa_create_mr.
>
> Signed-off-by: Dragos Tatulea 

Acked-by: Eugenio Pérez 

> ---
>  drivers/vdpa/mlx5/core/mr.c | 31 ---
>  1 file changed, 16 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
> index 2197c46e563a..8c80d9e77935 100644
> --- a/drivers/vdpa/mlx5/core/mr.c
> +++ b/drivers/vdpa/mlx5/core/mr.c
> @@ -498,32 +498,32 @@ static void destroy_user_mr(struct mlx5_vdpa_dev 
> *mvdev, struct mlx5_vdpa_mr *mr
>
>  static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct 
> mlx5_vdpa_mr *mr)
>  {
> +   if (!mr)
> +   return;
> +
> if (mr->user_mr)
> destroy_user_mr(mvdev, mr);
> else
> destroy_dma_mr(mvdev, mr);
>
> +   for (int i = 0; i < MLX5_VDPA_NUM_AS; i++) {
> +   if (mvdev->mr[i] == mr)
> +   mvdev->mr[i] = NULL;
> +   }
> +
> vhost_iotlb_free(mr->iotlb);
> +
> +   kfree(mr);
>  }
>
>  void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev,
>   struct mlx5_vdpa_mr *mr)
>  {
> -   if (!mr)
> -   return;
> -
> mutex_lock(&mvdev->mr_mtx);
>
> _mlx5_vdpa_destroy_mr(mvdev, mr);
>
> -   for (int i = 0; i < MLX5_VDPA_NUM_AS; i++) {
> -   if (mvdev->mr[i] == mr)
> -   mvdev->mr[i] = NULL;
> -   }
> -
> mutex_unlock(&mvdev->mr_mtx);
> -
> -   kfree(mr);
>  }
>
>  void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
> @@ -535,10 +535,7 @@ void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
> mutex_lock(&mvdev->mr_mtx);
>
> mvdev->mr[asid] = new_mr;
> -   if (old_mr) {
> -   _mlx5_vdpa_destroy_mr(mvdev, old_mr);
> -   kfree(old_mr);
> -   }
> +   _mlx5_vdpa_destroy_mr(mvdev, old_mr);
>
> mutex_unlock(&mvdev->mr_mtx);
>
> @@ -546,8 +543,12 @@ void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
>
>  void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
>  {
> +   mutex_lock(&mvdev->mr_mtx);
> +
> for (int i = 0; i < MLX5_VDPA_NUM_AS; i++)
> -   mlx5_vdpa_destroy_mr(mvdev, mvdev->mr[i]);
> +   _mlx5_vdpa_destroy_mr(mvdev, mvdev->mr[i]);
> +
> +   mutex_unlock(&mvdev->mr_mtx);
>
> prune_iotlb(mvdev->cvq.iotlb);
>  }
> --
> 2.42.0
>




[PATCH V1] rpmsg: glink: smem: validate index before fifo read write

2023-12-01 Thread Deepak Kumar Singh
Fifo head and tail indices can be set to wrong values by an
untrusted remote proc. Glink smem does not validate these indices
before using them to read or write the fifo, which can result in
out-of-bounds memory access if head or tail holds an incorrect
value.

Add checks to validate the head and tail indices. The checks keep
the indices within the fifo boundaries so that no invalid memory
access is made. This may cause some packets to be dropped until
glink finds a valid packet header in the fifo again and recovers.
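The clamping logic added by the patch can be sketched outside the driver as follows (function and parameter names here are illustrative, not the driver's actual API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the number of readable bytes in a circular fifo and clamp
 * the result when a corrupted head/tail index would imply an
 * out-of-bounds length. Mirrors the check added to
 * glink_smem_rx_avail(). */
static size_t fifo_rx_avail(uint32_t head, uint32_t tail, uint32_t length)
{
	size_t len;

	if (head < tail)
		len = length - tail + head;
	else
		len = head - tail;

	/* head or tail was corrupted by the remote side: report no data
	 * rather than read past the fifo boundaries */
	if (len > length)
		len = 0;

	return len;
}
```

A corrupted index produces a length larger than the fifo itself (including via unsigned wraparound), so a single `len > length` comparison catches both directions of corruption.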

Crash signature and calltrace with wrong head and tail values:

Internal error: Oops: 9607 [#1] PREEMPT SMP
pc : __memcpy_fromio+0x34/0xb4
lr : glink_smem_rx_peak+0x68/0x94

__memcpy_fromio+0x34/0xb4
glink_smem_rx_peak+0x68/0x94
qcom_glink_native_intr+0x90/0x888

Signed-off-by: Deepak Kumar Singh 
---
 drivers/rpmsg/qcom_glink_smem.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/rpmsg/qcom_glink_smem.c b/drivers/rpmsg/qcom_glink_smem.c
index 7a982c60a8dd..9eba0aaae916 100644
--- a/drivers/rpmsg/qcom_glink_smem.c
+++ b/drivers/rpmsg/qcom_glink_smem.c
@@ -86,9 +86,14 @@ static size_t glink_smem_rx_avail(struct qcom_glink_pipe *np)
tail = le32_to_cpu(*pipe->tail);
 
if (head < tail)
-   return pipe->native.length - tail + head;
+   len = pipe->native.length - tail + head;
else
-   return head - tail;
+   len = head - tail;
+
+   if (WARN_ON_ONCE(len > pipe->native.length))
+   len = 0;
+
+   return len;
 }
 
 static void glink_smem_rx_peek(struct qcom_glink_pipe *np,
@@ -99,6 +104,10 @@ static void glink_smem_rx_peek(struct qcom_glink_pipe *np,
u32 tail;
 
tail = le32_to_cpu(*pipe->tail);
+
+   if (WARN_ON_ONCE(tail > pipe->native.length))
+   return;
+
tail += offset;
if (tail >= pipe->native.length)
tail -= pipe->native.length;
@@ -121,7 +130,7 @@ static void glink_smem_rx_advance(struct qcom_glink_pipe 
*np,
 
tail += count;
if (tail >= pipe->native.length)
-   tail -= pipe->native.length;
+   tail %= pipe->native.length;
 
*pipe->tail = cpu_to_le32(tail);
 }
@@ -146,6 +155,9 @@ static size_t glink_smem_tx_avail(struct qcom_glink_pipe 
*np)
else
avail -= FIFO_FULL_RESERVE + TX_BLOCKED_CMD_RESERVE;
 
+   if (WARN_ON_ONCE(avail > pipe->native.length))
+   avail = 0;
+
return avail;
 }
 
@@ -155,6 +167,9 @@ static unsigned int glink_smem_tx_write_one(struct 
glink_smem_pipe *pipe,
 {
size_t len;
 
+   if (WARN_ON_ONCE(head > pipe->native.length))
+   return head;
+
len = min_t(size_t, count, pipe->native.length - head);
if (len)
memcpy(pipe->fifo + head, data, len);
-- 
2.34.1




[PATCH vhost 7/7] vdpa/mlx5: Use vq suspend/resume during .set_map

2023-12-01 Thread Dragos Tatulea
Instead of tearing down and setting up vq resources, use vq
suspend/resume during .set_map to speed things up a bit.

The vq mr is updated with the new mapping while the vqs are suspended.

If the device doesn't support resumable vqs, do the old teardown and
setup dance.
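The control flow this patch introduces can be sketched as follows. All function names are illustrative stand-ins for the driver's helpers; the point is that resumable devices only suspend/resume around the mapping update, while non-resumable ones still take the full teardown path:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static char log_buf[128];

static void op(const char *name)
{
	strcat(log_buf, name);
	strcat(log_buf, ";");
}

/* Sketch of mlx5_vdpa_change_map() with the fast path */
static void change_map(bool resumable)
{
	log_buf[0] = '\0';
	op("suspend_vqs");
	if (!resumable)
		op("teardown_driver");
	op("update_mr");		/* install the new mapping */
	if (!resumable)
		op("setup_driver");
	op("resume_vqs");
}
```

On the fast path the suspended vqs are simply told (via the modified-fields mask) to pick up the new mkeys on resume, avoiding the cost of recreating every queue object.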

Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 46 --
 include/linux/mlx5/mlx5_ifc_vdpa.h |  1 +
 2 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 6325aef045e2..eb0ac3d9c9d2 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1206,6 +1206,7 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
 {
int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
+   struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
bool state_change = false;
void *obj_context;
void *cmd_hdr;
@@ -1255,6 +1256,24 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX)
MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, 
mvq->used_idx);
 
+   if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
+   struct mlx5_vdpa_mr *mr = 
mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
+
+   if (mr)
+   MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, mr->mkey);
+   else
+   mvq->modified_fields &= 
~MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY;
+   }
+
+   if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY) {
+   struct mlx5_vdpa_mr *mr = 
mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
+
+   if (mr && MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, 
desc_group_mkey_supported))
+   MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, mr->mkey);
+   else
+   mvq->modified_fields &= 
~MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY;
+   }
+
MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, 
mvq->modified_fields);
err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
if (err)
@@ -2784,24 +2803,35 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev 
*mvdev,
unsigned int asid)
 {
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+   bool teardown = !is_resumable(ndev);
int err;
 
suspend_vqs(ndev);
-   err = save_channels_info(ndev);
-   if (err)
-   return err;
+   if (teardown) {
+   err = save_channels_info(ndev);
+   if (err)
+   return err;
 
-   teardown_driver(ndev);
+   teardown_driver(ndev);
+   }
 
mlx5_vdpa_update_mr(mvdev, new_mr, asid);
 
+   for (int i = 0; i < ndev->cur_num_vqs; i++)
+   ndev->vqs[i].modified_fields |= 
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY |
+   
MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY;
+
if (!(mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK) || mvdev->suspended)
return 0;
 
-   restore_channels_info(ndev);
-   err = setup_driver(mvdev);
-   if (err)
-   return err;
+   if (teardown) {
+   restore_channels_info(ndev);
+   err = setup_driver(mvdev);
+   if (err)
+   return err;
+   }
+
+   resume_vqs(ndev);
 
return 0;
 }
diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h 
b/include/linux/mlx5/mlx5_ifc_vdpa.h
index 32e712106e68..40371c916cf9 100644
--- a/include/linux/mlx5/mlx5_ifc_vdpa.h
+++ b/include/linux/mlx5/mlx5_ifc_vdpa.h
@@ -148,6 +148,7 @@ enum {
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS   = (u64)1 << 6,
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX   = (u64)1 << 7,
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX= (u64)1 << 8,
+   MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY= (u64)1 << 11,
MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY  = (u64)1 << 14,
 };
 
-- 
2.42.0




[PATCH vhost 6/7] vdpa/mlx5: Mark vq state for modification in hw vq

2023-12-01 Thread Dragos Tatulea
.set_vq_state will set the indices and mark the fields to be modified in
the hw vq.

Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 8 
 include/linux/mlx5/mlx5_ifc_vdpa.h | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 2277daf4814f..6325aef045e2 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1249,6 +1249,12 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
}
 
+   if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX)
+   MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, 
mvq->avail_idx);
+
+   if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX)
+   MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, 
mvq->used_idx);
+
MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, 
mvq->modified_fields);
err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
if (err)
@@ -2328,6 +2334,8 @@ static int mlx5_vdpa_set_vq_state(struct vdpa_device 
*vdev, u16 idx,
 
mvq->used_idx = state->split.avail_index;
mvq->avail_idx = state->split.avail_index;
+   mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX |
+   MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX;
return 0;
 }
 
diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h 
b/include/linux/mlx5/mlx5_ifc_vdpa.h
index 9594ac405740..32e712106e68 100644
--- a/include/linux/mlx5/mlx5_ifc_vdpa.h
+++ b/include/linux/mlx5/mlx5_ifc_vdpa.h
@@ -146,6 +146,8 @@ enum {
MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_PARAMS  = (u64)1 << 3,
MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_DUMP_ENABLE = (u64)1 << 4,
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS   = (u64)1 << 6,
+   MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX   = (u64)1 << 7,
+   MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX= (u64)1 << 8,
MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY  = (u64)1 << 14,
 };
 
-- 
2.42.0




[PATCH vhost 5/7] vdpa/mlx5: Mark vq addrs for modification in hw vq

2023-12-01 Thread Dragos Tatulea
Addresses get set by .set_vq_address. hw vq addresses will be updated on
next modify_virtqueue.

Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 9 +
 include/linux/mlx5/mlx5_ifc_vdpa.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 68e534cb57e2..2277daf4814f 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1209,6 +1209,7 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
bool state_change = false;
void *obj_context;
void *cmd_hdr;
+   void *vq_ctx;
void *in;
int err;
 
@@ -1230,6 +1231,7 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
 
obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
+   vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, 
virtio_q_context);
 
if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE) {
if (!is_valid_state_change(mvq->fw_state, state, 
is_resumable(ndev))) {
@@ -1241,6 +1243,12 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
state_change = true;
}
 
+   if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS) {
+   MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr);
+   MLX5_SET64(virtio_q, vq_ctx, used_addr, mvq->device_addr);
+   MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
+   }
+
MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, 
mvq->modified_fields);
err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
if (err)
@@ -2202,6 +2210,7 @@ static int mlx5_vdpa_set_vq_address(struct vdpa_device 
*vdev, u16 idx, u64 desc_
mvq->desc_addr = desc_area;
mvq->device_addr = device_area;
mvq->driver_addr = driver_area;
+   mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS;
return 0;
 }
 
diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h 
b/include/linux/mlx5/mlx5_ifc_vdpa.h
index b86d51a855f6..9594ac405740 100644
--- a/include/linux/mlx5/mlx5_ifc_vdpa.h
+++ b/include/linux/mlx5/mlx5_ifc_vdpa.h
@@ -145,6 +145,7 @@ enum {
MLX5_VIRTQ_MODIFY_MASK_STATE= (u64)1 << 0,
MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_PARAMS  = (u64)1 << 3,
MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_DUMP_ENABLE = (u64)1 << 4,
+   MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS   = (u64)1 << 6,
MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY  = (u64)1 << 14,
 };
 
-- 
2.42.0




[PATCH vhost 4/7] vdpa/mlx5: Introduce per vq and device resume

2023-12-01 Thread Dragos Tatulea
Implement vdpa vq and device resume if capability detected. Add support
for suspend -> ready state change.
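The extended state machine can be sketched like this (enum values are illustrative, not the firmware's; the shape follows is_valid_state_change() in the diff below):

```c
#include <assert.h>
#include <stdbool.h>

enum vq_state { VQ_INIT, VQ_RDY, VQ_SUSPEND, VQ_ERR };

/* A suspended vq may go back to ready only when the device
 * advertises resumable vqs. */
static bool is_valid_state_change(enum vq_state oldstate,
				  enum vq_state newstate,
				  bool resumable)
{
	switch (oldstate) {
	case VQ_INIT:
		return newstate == VQ_RDY;
	case VQ_RDY:
		return newstate == VQ_SUSPEND;
	case VQ_SUSPEND:
		/* the transition added by this patch */
		return resumable && newstate == VQ_RDY;
	default:
		return false;
	}
}
```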

Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 67 +++
 1 file changed, 60 insertions(+), 7 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index d06285e46fe2..68e534cb57e2 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1170,7 +1170,12 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtqueu
return err;
 }
 
-static bool is_valid_state_change(int oldstate, int newstate)
+static bool is_resumable(struct mlx5_vdpa_net *ndev)
+{
+   return ndev->mvdev.vdev.config->resume;
+}
+
+static bool is_valid_state_change(int oldstate, int newstate, bool resumable)
 {
switch (oldstate) {
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT:
@@ -1178,6 +1183,7 @@ static bool is_valid_state_change(int oldstate, int 
newstate)
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY:
return newstate == MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND;
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND:
+   return resumable ? newstate == 
MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY : false;
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_ERR:
default:
return false;
@@ -1200,6 +1206,7 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
 {
int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
+   bool state_change = false;
void *obj_context;
void *cmd_hdr;
void *in;
@@ -1211,9 +1218,6 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
if (!modifiable_virtqueue_fields(mvq))
return -EINVAL;
 
-   if (!is_valid_state_change(mvq->fw_state, state))
-   return -EINVAL;
-
in = kzalloc(inlen, GFP_KERNEL);
if (!in)
return -ENOMEM;
@@ -1226,17 +1230,29 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
 
obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
-   if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE)
+
+   if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE) {
+   if (!is_valid_state_change(mvq->fw_state, state, 
is_resumable(ndev))) {
+   err = -EINVAL;
+   goto done;
+   }
+
MLX5_SET(virtio_net_q_object, obj_context, state, state);
+   state_change = true;
+   }
 
MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, 
mvq->modified_fields);
err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
-   kfree(in);
-   if (!err)
+   if (err)
+   goto done;
+
+   if (state_change)
mvq->fw_state = state;
 
mvq->modified_fields = 0;
 
+done:
+   kfree(in);
return err;
 }
 
@@ -1430,6 +1446,24 @@ static void suspend_vqs(struct mlx5_vdpa_net *ndev)
suspend_vq(ndev, &ndev->vqs[i]);
 }
 
+static void resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
*mvq)
+{
+   if (!mvq->initialized || !is_resumable(ndev))
+   return;
+
+   if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND)
+   return;
+
+   if (modify_virtqueue_state(ndev, mvq, 
MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY))
+   mlx5_vdpa_warn(&ndev->mvdev, "modify to resume failed\n");
+}
+
+static void resume_vqs(struct mlx5_vdpa_net *ndev)
+{
+   for (int i = 0; i < ndev->mvdev.max_vqs; i++)
+   resume_vq(ndev, &ndev->vqs[i]);
+}
+
 static void teardown_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
*mvq)
 {
if (!mvq->initialized)
@@ -3256,6 +3290,21 @@ static int mlx5_vdpa_suspend(struct vdpa_device *vdev)
return 0;
 }
 
+static int mlx5_vdpa_resume(struct vdpa_device *vdev)
+{
+   struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+   struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+
+   mlx5_vdpa_info(mvdev, "resuming device\n");
+
+   down_write(&ndev->reslock);
+   mvdev->suspended = false;
+   resume_vqs(ndev);
+   register_link_notifier(ndev);
+   up_write(&ndev->reslock);
+   return 0;
+}
+
 static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
   unsigned int asid)
 {
@@ -3312,6 +3361,7 @@ static const struct vdpa_config_ops mlx5_vdpa_ops = {
.get_vq_dma_dev = mlx5_get_vq_dma_dev,
.free = mlx5_vdpa_free,
.suspend = mlx5_vdpa_suspend,
+   .resume = mlx5_vdpa_resume, /* Op disabled if not supported. */
 };
 
 static int query_mtu(struct mlx5_core_dev *mdev, u16 *mtu)
@@ -3683,6 +3733,9 @@ static int mlx5v_probe(struct auxiliary_device *adev,
if 

[PATCH vhost 3/7] vdpa/mlx5: Allow modifying multiple vq fields in one modify command

2023-12-01 Thread Dragos Tatulea
Add a bitmask variable that tracks which hw vq fields should be
modified by the next hw vq modify command.

This will be useful for setting multiple vq fields when resuming
the vq.

The state needs to remain a parameter as it doesn't make sense to
make it part of the vq struct: fw_state is updated only after a
successful command.
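The bookkeeping pattern can be sketched as follows: setters only mark bits, and a single modify command pushes all marked fields and clears the mask afterwards. Bit positions and names here are illustrative:

```c
#include <assert.h>
#include <stdint.h>

#define MODIFY_MASK_STATE      (UINT64_C(1) << 0)
#define MODIFY_MASK_VQ_ADDRS   (UINT64_C(1) << 6)
#define MODIFY_MASK_AVAIL_IDX  (UINT64_C(1) << 7)

struct vq {
	uint64_t modified_fields;
};

/* Stand-in for the firmware command: in the driver this is where
 * modify_field_select is written into the command payload. Returns
 * the set of fields pushed by this command. */
static uint64_t modify_virtqueue(struct vq *mvq)
{
	uint64_t pushed = mvq->modified_fields;

	mvq->modified_fields = 0;	/* consumed by this command */
	return pushed;
}
```

This is why a later patch can mark addresses and indices from the vdpa callbacks and have them all land in one firmware command when the vq state is changed.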

Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 48 +--
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 12ac3397f39b..d06285e46fe2 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -120,6 +120,9 @@ struct mlx5_vdpa_virtqueue {
u16 avail_idx;
u16 used_idx;
int fw_state;
+
+   u64 modified_fields;
+
struct msi_map map;
 
/* keep last in the struct */
@@ -1181,7 +1184,19 @@ static bool is_valid_state_change(int oldstate, int 
newstate)
}
 }
 
-static int modify_virtqueue(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mvq, int state)
+static bool modifiable_virtqueue_fields(struct mlx5_vdpa_virtqueue *mvq)
+{
+   /* Only state is always modifiable */
+   if (mvq->modified_fields & ~MLX5_VIRTQ_MODIFY_MASK_STATE)
+   return mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT ||
+  mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND;
+
+   return true;
+}
+
+static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
+   struct mlx5_vdpa_virtqueue *mvq,
+   int state)
 {
int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
@@ -1193,6 +1208,9 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtque
if (mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_NONE)
return 0;
 
+   if (!modifiable_virtqueue_fields(mvq))
+   return -EINVAL;
+
if (!is_valid_state_change(mvq->fw_state, state))
return -EINVAL;
 
@@ -1208,17 +1226,28 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtque
MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
 
obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
-   MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select,
-  MLX5_VIRTQ_MODIFY_MASK_STATE);
-   MLX5_SET(virtio_net_q_object, obj_context, state, state);
+   if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE)
+   MLX5_SET(virtio_net_q_object, obj_context, state, state);
+
+   MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, 
mvq->modified_fields);
err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
kfree(in);
if (!err)
mvq->fw_state = state;
 
+   mvq->modified_fields = 0;
+
return err;
 }
 
+static int modify_virtqueue_state(struct mlx5_vdpa_net *ndev,
+ struct mlx5_vdpa_virtqueue *mvq,
+ unsigned int state)
+{
+   mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_STATE;
+   return modify_virtqueue(ndev, mvq, state);
+}
+
 static int counter_set_alloc(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mvq)
 {
u32 in[MLX5_ST_SZ_DW(create_virtio_q_counters_in)] = {};
@@ -1347,7 +1376,7 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mvq)
goto err_vq;
 
if (mvq->ready) {
-   err = modify_virtqueue(ndev, mvq, 
MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
+   err = modify_virtqueue_state(ndev, mvq, 
MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
if (err) {
mlx5_vdpa_warn(>mvdev, "failed to modify to ready 
vq idx %d(%d)\n",
   idx, err);
@@ -1382,7 +1411,7 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *m
if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY)
return;
 
-   if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
+   if (modify_virtqueue_state(ndev, mvq, 
MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
mlx5_vdpa_warn(>mvdev, "modify to suspend failed\n");
 
if (query_virtqueue(ndev, mvq, &attr)) {
@@ -1407,6 +1436,7 @@ static void teardown_vq(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtqueue *
return;
 
suspend_vq(ndev, mvq);
+   mvq->modified_fields = 0;
destroy_virtqueue(ndev, mvq);
dealloc_vector(ndev, mvq);
counter_set_dealloc(ndev, mvq);
@@ -2207,7 +2237,7 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device 
*vdev, u16 idx, bool ready
if (!ready) {
suspend_vq(ndev, mvq);
} 

[PATCH vhost 2/7] vdpa/mlx5: Split function into locked and unlocked variants

2023-12-01 Thread Dragos Tatulea
mlx5_vdpa_destroy_mr contains more logic than _mlx5_vdpa_destroy_mr.
There is no reason for this to be the case. All the logic can go into
the unlocked variant.

Using the unlocked version is needed in a follow-up patch. And it also
makes it more consistent with mlx5_vdpa_create_mr.
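The locked/unlocked split used here is a common kernel pattern and can be sketched as follows: the underscore-prefixed variant holds all the logic and expects the caller to hold the lock, while the public variant is a thin wrapper that only takes the lock. All names below are illustrative, not the driver's:

```c
#include <assert.h>

static int mr_mtx_held;		/* stand-in for the mutex */
static int destroyed;

static void mr_mtx_lock(void)   { assert(!mr_mtx_held); mr_mtx_held = 1; }
static void mr_mtx_unlock(void) { assert(mr_mtx_held);  mr_mtx_held = 0; }

/* caller must hold mr_mtx; the NULL check lives here now, so callers
 * that already hold the lock get the full behavior */
static void _destroy_mr(int *mr)
{
	assert(mr_mtx_held);
	if (!mr)
		return;
	destroyed++;		/* all the real teardown logic goes here */
}

static void destroy_mr(int *mr)
{
	mr_mtx_lock();
	_destroy_mr(mr);
	mr_mtx_unlock();
}
```

Callers that already hold the mutex (update_mr, destroy_mr_resources in the follow-up patch) can then call the unlocked variant directly without deadlocking.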

Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/core/mr.c | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 2197c46e563a..8c80d9e77935 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -498,32 +498,32 @@ static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, 
struct mlx5_vdpa_mr *mr
 
 static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct 
mlx5_vdpa_mr *mr)
 {
+   if (!mr)
+   return;
+
if (mr->user_mr)
destroy_user_mr(mvdev, mr);
else
destroy_dma_mr(mvdev, mr);
 
+   for (int i = 0; i < MLX5_VDPA_NUM_AS; i++) {
+   if (mvdev->mr[i] == mr)
+   mvdev->mr[i] = NULL;
+   }
+
vhost_iotlb_free(mr->iotlb);
+
+   kfree(mr);
 }
 
 void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev,
  struct mlx5_vdpa_mr *mr)
 {
-   if (!mr)
-   return;
-
mutex_lock(&mvdev->mr_mtx);
 
_mlx5_vdpa_destroy_mr(mvdev, mr);
 
-   for (int i = 0; i < MLX5_VDPA_NUM_AS; i++) {
-   if (mvdev->mr[i] == mr)
-   mvdev->mr[i] = NULL;
-   }
-
mutex_unlock(&mvdev->mr_mtx);
-
-   kfree(mr);
 }
 
 void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
@@ -535,10 +535,7 @@ void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
mutex_lock(&mvdev->mr_mtx);
 
mvdev->mr[asid] = new_mr;
-   if (old_mr) {
-   _mlx5_vdpa_destroy_mr(mvdev, old_mr);
-   kfree(old_mr);
-   }
+   _mlx5_vdpa_destroy_mr(mvdev, old_mr);
 
mutex_unlock(&mvdev->mr_mtx);
 
@@ -546,8 +543,12 @@ void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
 
 void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
 {
+   mutex_lock(&mvdev->mr_mtx);
+
for (int i = 0; i < MLX5_VDPA_NUM_AS; i++)
-   mlx5_vdpa_destroy_mr(mvdev, mvdev->mr[i]);
+   _mlx5_vdpa_destroy_mr(mvdev, mvdev->mr[i]);
+
+   mutex_unlock(&mvdev->mr_mtx);
 
prune_iotlb(mvdev->cvq.iotlb);
 }
-- 
2.42.0




[PATCH mlx5-vhost 1/7] vdpa/mlx5: Expose resumable vq capability

2023-12-01 Thread Dragos Tatulea
Necessary for checking if resumable vqs are supported by the hardware.
Actual support will be added in a downstream patch.

Signed-off-by: Dragos Tatulea 
---
 include/linux/mlx5/mlx5_ifc.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 3388007c645f..21dcfa034b7b 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1235,7 +1235,8 @@ struct mlx5_ifc_virtio_emulation_cap_bits {
 
u8 reserved_at_c0[0x13];
u8 desc_group_mkey_supported[0x1];
-   u8 reserved_at_d4[0xc];
+   u8 freeze_to_rdy_supported[0x1];
+   u8 reserved_at_d5[0xb];
 
u8 reserved_at_e0[0x20];
 
-- 
2.42.0




[PATCH vhost 0/7] vdpa/mlx5: Add support for resumable vqs

2023-12-01 Thread Dragos Tatulea
Add support for resumable vqs in the driver. This is a firmware feature
that can be used for the following benefits:
- Full device .suspend/.resume.
- .set_map doesn't need to destroy and create new vqs anymore just to
  update the map. When resumable vqs are supported it is enough to
  suspend the vqs, set the new maps, and then resume the vqs.

The first patch exposes the relevant bits in mlx5_ifc.h. That means it
needs to be applied to the mlx5-vhost tree [0] first. Once applied
there, the change has to be pulled from mlx5-vhost into the vhost tree
and only then the remaining patches can be applied. Same flow as the vq
descriptor mappings patchset [1].

To be able to use resumable vqs properly, support for selectively modifying
vq parameters was needed. This is what the middle part of the series
consists of.

[0] 
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vhost
[1] 
https://lore.kernel.org/virtualization/20231018171456.1624030-2-dtatu...@nvidia.com/

Dragos Tatulea (7):
  vdpa/mlx5: Expose resumable vq capability
  vdpa/mlx5: Split function into locked and unlocked variants
  vdpa/mlx5: Allow modifying multiple vq fields in one modify command
  vdpa/mlx5: Introduce per vq and device resume
  vdpa/mlx5: Mark vq addrs for modification in hw vq
  vdpa/mlx5: Mark vq state for modification in hw vq
  vdpa/mlx5: Use vq suspend/resume during .set_map

 drivers/vdpa/mlx5/core/mr.c|  31 +++---
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 172 +
 include/linux/mlx5/mlx5_ifc.h  |   3 +-
 include/linux/mlx5/mlx5_ifc_vdpa.h |   4 +
 4 files changed, 174 insertions(+), 36 deletions(-)

-- 
2.42.0




Re: [PATCH net-next v5 2/3] virtio/vsock: send credit update during setting SO_RCVLOWAT

2023-12-01 Thread Arseniy Krasnov



On 01.12.2023 12:48, Stefano Garzarella wrote:
> On Fri, Dec 01, 2023 at 11:35:56AM +0300, Arseniy Krasnov wrote:
>>
>>
>> On 01.12.2023 11:27, Stefano Garzarella wrote:
>>> On Thu, Nov 30, 2023 at 12:40:43PM -0500, Michael S. Tsirkin wrote:
 On Thu, Nov 30, 2023 at 03:11:19PM +0100, Stefano Garzarella wrote:
> On Thu, Nov 30, 2023 at 08:58:58AM -0500, Michael S. Tsirkin wrote:
> > On Thu, Nov 30, 2023 at 04:43:34PM +0300, Arseniy Krasnov wrote:
> > >
> > >
> > > On 30.11.2023 16:42, Michael S. Tsirkin wrote:
> > > > On Thu, Nov 30, 2023 at 04:08:39PM +0300, Arseniy Krasnov wrote:
> > > >> Send a credit update message when SO_RCVLOWAT is updated and it
> > > >> is bigger than the number of bytes in the rx queue. This is
> > > >> needed because 'poll()' will wait until the number of bytes in
> > > >> the rx queue is not smaller than SO_RCVLOWAT, so kick the sender
> > > >> to send more data. Otherwise a mutual hangup of tx and rx is
> > > >> possible: the sender waits for free space and the receiver waits
> > > >> for data in 'poll()'.
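The decision described in the quoted commit message is simple enough to sketch outside the driver (the function name is illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* On SO_RCVLOWAT update, a credit update must be sent to the peer
 * only when the new low-water mark exceeds the bytes already queued,
 * because only then would poll() block waiting for more data that
 * the sender may never be allowed to send. */
static bool rcvlowat_needs_credit_update(size_t queued_bytes,
					 size_t rcvlowat)
{
	return rcvlowat > queued_bytes;
}
```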
> > > >>
> > > >> Signed-off-by: Arseniy Krasnov 
> > > >> ---
> > > >>  Changelog:
> > > >>  v1 -> v2:
> > > >>   * Update commit message by removing 'This patch adds XXX' manner.
> > > >>   * Do not initialize 'send_update' variable - set it directly 
> > > >>during
> > > >> first usage.
> > > >>  v3 -> v4:
> > > >>   * Fit comment in 'virtio_transport_notify_set_rcvlowat()' to 80 
> > > >>chars.
> > > >>  v4 -> v5:
> > > >>   * Do not change callbacks order in transport structures.
> > > >>
> > > >>  drivers/vhost/vsock.c   |  1 +
> > > >>  include/linux/virtio_vsock.h    |  1 +
> > > >>  net/vmw_vsock/virtio_transport.c    |  1 +
> > > >>  net/vmw_vsock/virtio_transport_common.c | 27 
> > > >>+
> > > >>  net/vmw_vsock/vsock_loopback.c  |  1 +
> > > >>  5 files changed, 31 insertions(+)
> > > >>
> > > >> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > >> index f75731396b7e..4146f80db8ac 100644
> > > >> --- a/drivers/vhost/vsock.c
> > > >> +++ b/drivers/vhost/vsock.c
> > > >> @@ -451,6 +451,7 @@ static struct virtio_transport vhost_transport 
> > > >> = {
> > > >>  .notify_buffer_size   = 
> > > >>virtio_transport_notify_buffer_size,
> > > >>
> > > >>  .read_skb = virtio_transport_read_skb,
> > > >> +    .notify_set_rcvlowat  = 
> > > >> virtio_transport_notify_set_rcvlowat
> > > >>  },
> > > >>
> > > >>  .send_pkt = vhost_transport_send_pkt,
> > > >> diff --git a/include/linux/virtio_vsock.h 
> > > >> b/include/linux/virtio_vsock.h
> > > >> index ebb3ce63d64d..c82089dee0c8 100644
> > > >> --- a/include/linux/virtio_vsock.h
> > > >> +++ b/include/linux/virtio_vsock.h
> > > >> @@ -256,4 +256,5 @@ void virtio_transport_put_credit(struct 
> > > >> virtio_vsock_sock *vvs, u32 credit);
> > > >>  void virtio_transport_deliver_tap_pkt(struct sk_buff *skb);
> > > >>  int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head 
> > > >>*list);
> > > >>  int virtio_transport_read_skb(struct vsock_sock *vsk, 
> > > >>skb_read_actor_t read_actor);
> > > >> +int virtio_transport_notify_set_rcvlowat(struct vsock_sock *vsk, 
> > > >> int val);
> > > >>  #endif /* _LINUX_VIRTIO_VSOCK_H */
> > > >> diff --git a/net/vmw_vsock/virtio_transport.c 
> > > >> b/net/vmw_vsock/virtio_transport.c
> > > >> index af5bab1acee1..8007593a3a93 100644
> > > >> --- a/net/vmw_vsock/virtio_transport.c
> > > >> +++ b/net/vmw_vsock/virtio_transport.c
> > > >> @@ -539,6 +539,7 @@ static struct virtio_transport 
> > > >> virtio_transport = {
> > > >>  .notify_buffer_size   = 
> > > >>virtio_transport_notify_buffer_size,
> > > >>
> > > >>  .read_skb = virtio_transport_read_skb,
> > > >> +    .notify_set_rcvlowat  = 
> > > >> virtio_transport_notify_set_rcvlowat
> > > >>  },
> > > >>
> > > >>  .send_pkt = virtio_transport_send_pkt,
> > > >> diff --git a/net/vmw_vsock/virtio_transport_common.c 
> > > >> b/net/vmw_vsock/virtio_transport_common.c
> > > >> index f6dc896bf44c..1cb556ad4597 100644
> > > >> --- a/net/vmw_vsock/virtio_transport_common.c
> > > >> +++ b/net/vmw_vsock/virtio_transport_common.c
> > > >> @@ -1684,6 +1684,33 @@ int virtio_transport_read_skb(struct 
> > > >> vsock_sock *vsk, skb_read_actor_t recv_acto
> > > >>  }
> > > >>  EXPORT_SYMBOL_GPL(virtio_transport_read_skb);
> > > >>
> > > >> +int virtio_transport_notify_set_rcvlowat(struct vsock_sock *vsk,
> > > >> int val)
> > > >> +{
> > > >> +    struct virtio_vsock_sock *vvs = vsk->trans;
> > > >> +    bool 

Re: [PATCH net-next v5 2/3] virtio/vsock: send credit update during setting SO_RCVLOWAT

2023-12-01 Thread Stefano Garzarella

On Fri, Dec 01, 2023 at 11:35:56AM +0300, Arseniy Krasnov wrote:



On 01.12.2023 11:27, Stefano Garzarella wrote:

On Thu, Nov 30, 2023 at 12:40:43PM -0500, Michael S. Tsirkin wrote:

On Thu, Nov 30, 2023 at 03:11:19PM +0100, Stefano Garzarella wrote:

On Thu, Nov 30, 2023 at 08:58:58AM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 30, 2023 at 04:43:34PM +0300, Arseniy Krasnov wrote:
> >
> >
> > On 30.11.2023 16:42, Michael S. Tsirkin wrote:
> > > On Thu, Nov 30, 2023 at 04:08:39PM +0300, Arseniy Krasnov wrote:
> > >> Send credit update message when SO_RCVLOWAT is updated and it is bigger
> > >> than number of bytes in rx queue. It is needed, because 'poll()' will
> > >> wait until number of bytes in rx queue will be not smaller than
> > >> SO_RCVLOWAT, so kick sender to send more data. Otherwise mutual hungup
> > >> for tx/rx is possible: sender waits for free space and receiver is
> > >> waiting data in 'poll()'.
> > >>
> > >> Signed-off-by: Arseniy Krasnov 
> > >> ---
> > >>  Changelog:
> > >>  v1 -> v2:
> > >>   * Update commit message by removing 'This patch adds XXX' manner.
> > >>   * Do not initialize 'send_update' variable - set it directly during
> > >> first usage.
> > >>  v3 -> v4:
> > >>   * Fit comment in 'virtio_transport_notify_set_rcvlowat()' to 80 chars.
> > >>  v4 -> v5:
> > >>   * Do not change callbacks order in transport structures.
> > >>
> > >>  drivers/vhost/vsock.c   |  1 +
> > >>  include/linux/virtio_vsock.h    |  1 +
> > >>  net/vmw_vsock/virtio_transport.c    |  1 +
> > >>  net/vmw_vsock/virtio_transport_common.c | 27 +
> > >>  net/vmw_vsock/vsock_loopback.c  |  1 +
> > >>  5 files changed, 31 insertions(+)
> > >>
> > >> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > >> index f75731396b7e..4146f80db8ac 100644
> > >> --- a/drivers/vhost/vsock.c
> > >> +++ b/drivers/vhost/vsock.c
> > >> @@ -451,6 +451,7 @@ static struct virtio_transport vhost_transport = {
> > >>  .notify_buffer_size   = virtio_transport_notify_buffer_size,
> > >>
> > >>  .read_skb = virtio_transport_read_skb,
> > >> +    .notify_set_rcvlowat  = virtio_transport_notify_set_rcvlowat
> > >>  },
> > >>
> > >>  .send_pkt = vhost_transport_send_pkt,
> > >> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > >> index ebb3ce63d64d..c82089dee0c8 100644
> > >> --- a/include/linux/virtio_vsock.h
> > >> +++ b/include/linux/virtio_vsock.h
> > >> @@ -256,4 +256,5 @@ void virtio_transport_put_credit(struct 
virtio_vsock_sock *vvs, u32 credit);
> > >>  void virtio_transport_deliver_tap_pkt(struct sk_buff *skb);
> > >>  int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head *list);
> > >>  int virtio_transport_read_skb(struct vsock_sock *vsk, skb_read_actor_t 
read_actor);
> > >> +int virtio_transport_notify_set_rcvlowat(struct vsock_sock *vsk, int 
val);
> > >>  #endif /* _LINUX_VIRTIO_VSOCK_H */
> > >> diff --git a/net/vmw_vsock/virtio_transport.c 
b/net/vmw_vsock/virtio_transport.c
> > >> index af5bab1acee1..8007593a3a93 100644
> > >> --- a/net/vmw_vsock/virtio_transport.c
> > >> +++ b/net/vmw_vsock/virtio_transport.c
> > >> @@ -539,6 +539,7 @@ static struct virtio_transport virtio_transport = {
> > >>  .notify_buffer_size   = virtio_transport_notify_buffer_size,
> > >>
> > >>  .read_skb = virtio_transport_read_skb,
> > >> +    .notify_set_rcvlowat  = virtio_transport_notify_set_rcvlowat
> > >>  },
> > >>
> > >>  .send_pkt = virtio_transport_send_pkt,
> > >> diff --git a/net/vmw_vsock/virtio_transport_common.c 
b/net/vmw_vsock/virtio_transport_common.c
> > >> index f6dc896bf44c..1cb556ad4597 100644
> > >> --- a/net/vmw_vsock/virtio_transport_common.c
> > >> +++ b/net/vmw_vsock/virtio_transport_common.c
> > >> @@ -1684,6 +1684,33 @@ int virtio_transport_read_skb(struct vsock_sock 
*vsk, skb_read_actor_t recv_acto
> > >>  }
> > >>  EXPORT_SYMBOL_GPL(virtio_transport_read_skb);
> > >>
> > >> +int virtio_transport_notify_set_rcvlowat(struct vsock_sock *vsk,
> > >> int val)
> > >> +{
> > >> +    struct virtio_vsock_sock *vvs = vsk->trans;
> > >> +    bool send_update;
> > >> +
> > >> +    spin_lock_bh(&vvs->rx_lock);
> > >> +
> > >> +    /* If number of available bytes is less than new SO_RCVLOWAT value,
> > >> + * kick sender to send more data, because sender may sleep in
> > >> its
> > >> + * 'send()' syscall waiting for enough space at our side.
> > >> + */
> > >> +    send_update = vvs->rx_bytes < val;
> > >> +
> > >> +    spin_unlock_bh(&vvs->rx_lock);
> > >> +
> > >> +    if (send_update) {
> > >> +    int err;
> > >> +
> > >> +    err = virtio_transport_send_credit_update(vsk);
> > >> +    if (err < 0)
> > >> +    return err;
> > >> +    }
> > >> +
> > >> +    return 0;
> > >> +}
> > >
> > >
> > > I find it strange that this will send a credit update
> > > even if nothing changed since this was called previously.
> 

[PATCH v3 3/3] arm64: dts: qcom: qcm6490-fairphone-fp5: Enable venus node

2023-12-01 Thread Luca Weiss
Enable the venus node so that the video encoder/decoder will start
working.

Reviewed-by: Konrad Dybcio 
Reviewed-by: Bryan O'Donoghue 
Reviewed-by: Vikash Garodia 
Signed-off-by: Luca Weiss 
---
 arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts 
b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
index 762c5db29520..cc092735ce17 100644
--- a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
+++ b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
@@ -688,3 +688,8 @@ &usb_1_qmpphy {
 
status = "okay";
 };
+
+&venus {
+   firmware-name = "qcom/qcm6490/fairphone5/venus.mbn";
+   status = "okay";
+};

-- 
2.43.0




[PATCH v3 2/3] arm64: dts: qcom: sc7280: Move video-firmware to chrome-common

2023-12-01 Thread Luca Weiss
If the video-firmware node is present, the venus driver assumes we're on
a system that doesn't use TZ for starting venus, like on ChromeOS
devices.

Move the video-firmware node to chrome-common.dtsi so we can use venus
on non-ChromeOS devices. We also need to move the secure SID 0x2184
for iommu since (on some boards) we cannot touch that.

At the same time also disable the venus node by default in the dtsi,
like it's done on other SoCs.

Reviewed-by: Bryan O'Donoghue 
Signed-off-by: Luca Weiss 
---
 arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi | 11 +++
 arch/arm64/boot/dts/qcom/sc7280.dtsi   |  9 +++--
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi 
b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi
index 5d462ae14ba1..459ff877df54 100644
--- a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi
@@ -104,6 +104,17 @@  {
dma-coherent;
 };
 
+&venus {
+   iommus = <&apps_smmu 0x2180 0x20>,
+            <&apps_smmu 0x2184 0x20>;
+
+   status = "okay";
+
+   video-firmware {
+   iommus = <&apps_smmu 0x21a2 0x0>;
+   };
+};
+
  {
status = "okay";
 };
diff --git a/arch/arm64/boot/dts/qcom/sc7280.dtsi 
b/arch/arm64/boot/dts/qcom/sc7280.dtsi
index 326897af117a..0ff9a2484096 100644
--- a/arch/arm64/boot/dts/qcom/sc7280.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280.dtsi
@@ -3836,10 +3836,11 @@ venus: video-codec@aa0 {
<&mmss_noc MASTER_VIDEO_P0 0 &mc_virt 
SLAVE_EBI1 0>;
interconnect-names = "cpu-cfg", "video-mem";
 
-   iommus = <&apps_smmu 0x2180 0x20>,
-            <&apps_smmu 0x2184 0x20>;
+   iommus = <&apps_smmu 0x2180 0x20>;
memory-region = <&video_mem>;
 
+   status = "disabled";
+
video-decoder {
compatible = "venus-decoder";
};
@@ -3848,10 +3849,6 @@ video-encoder {
compatible = "venus-encoder";
};
 
-   video-firmware {
-   iommus = <&apps_smmu 0x21a2 0x0>;
-   };
-
venus_opp_table: opp-table {
compatible = "operating-points-v2";
 

-- 
2.43.0




[PATCH v3 1/3] media: venus: core: Set up secure memory ranges for SC7280

2023-12-01 Thread Luca Weiss
Not all SC7280 devices ship with ChromeOS firmware. Other devices need
PAS for image authentication. That requires the predefined virtual
address ranges to be passed via scm calls. Define them to enable Venus
on non-CrOS SC7280 devices.

Reviewed-by: Konrad Dybcio 
Reviewed-by: Bryan O'Donoghue 
Reviewed-by: Vikash Garodia 
Signed-off-by: Luca Weiss 
---
 drivers/media/platform/qcom/venus/core.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/media/platform/qcom/venus/core.c 
b/drivers/media/platform/qcom/venus/core.c
index 9cffe975581b..a712dd4f02a5 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -881,6 +881,10 @@ static const struct venus_resources sc7280_res = {
.vmem_size = 0,
.vmem_addr = 0,
.dma_mask = 0xe000 - 1,
+   .cp_start = 0,
+   .cp_size = 0x2580,
+   .cp_nonpixel_start = 0x100,
+   .cp_nonpixel_size = 0x2480,
.fwname = "qcom/vpu-2.0/venus.mbn",
 };
 

-- 
2.43.0




[PATCH v3 0/3] Enable venus on Fairphone 5 / non-ChromeOS sc7280 venus support

2023-12-01 Thread Luca Weiss
Devices with Qualcomm firmware (compared to ChromeOS firmware) need some
changes in the venus driver and dts layout so that venus can initialize.

Do these changes, similar to sc7180.

Signed-off-by: Luca Weiss 
---
Changes in v3:
- Move 0x2184 iommu from sc7280.dtsi to sc7280-chrome-common.dtsi since
  it seems to cause crash on some boards (Vikash)
- Pick up tags
- Link to v2: 
https://lore.kernel.org/r/20231002-sc7280-venus-pas-v2-0-bd2408891...@fairphone.com

Changes in v2:
- Reword commit message 2/3 to be clearer (Konrad)
- Link to v1: 
https://lore.kernel.org/r/20230929-sc7280-venus-pas-v1-0-9c6738cf1...@fairphone.com

---
Luca Weiss (3):
  media: venus: core: Set up secure memory ranges for SC7280
  arm64: dts: qcom: sc7280: Move video-firmware to chrome-common
  arm64: dts: qcom: qcm6490-fairphone-fp5: Enable venus node

 arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts |  5 +
 arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi | 11 +++
 arch/arm64/boot/dts/qcom/sc7280.dtsi   |  9 +++--
 drivers/media/platform/qcom/venus/core.c   |  4 
 4 files changed, 23 insertions(+), 6 deletions(-)
---
base-commit: b2a4d0696192f24f79ea71fea2d775da28fb9157
change-id: 20230929-sc7280-venus-pas-ea9630525753

Best regards,
-- 
Luca Weiss 




Re: [PATCH v2 2/3] arm64: dts: qcom: sc7280: Move video-firmware to chrome-common

2023-12-01 Thread Luca Weiss
On Tue Nov 28, 2023 at 9:14 AM CET, Vikash Garodia wrote:
>
> On 11/24/2023 9:26 PM, Luca Weiss wrote:
> > On Fri Nov 24, 2023 at 2:35 PM CET, Vikash Garodia wrote:
> >>
> >>
> >> On 11/24/2023 6:23 PM, Dmitry Baryshkov wrote:
> >>> On Fri, 24 Nov 2023 at 14:30, Vikash Garodia  
> >>> wrote:
> 
>  On 11/24/2023 5:05 PM, Luca Weiss wrote:
> > On Fri Nov 24, 2023 at 7:38 AM CET, Vikash Garodia wrote:
> >>
> >> On 11/22/2023 7:50 PM, Luca Weiss wrote:
> >>> On Wed Nov 22, 2023 at 2:17 PM CET, Vikash Garodia wrote:
> 
>  On 10/2/2023 7:50 PM, Luca Weiss wrote:
> > If the video-firmware node is present, the venus driver assumes 
> > we're on
> > a system that doesn't use TZ for starting venus, like on ChromeOS
> > devices.
> >
> > Move the video-firmware node to chrome-common.dtsi so we can use 
> > venus
> > on a non-ChromeOS devices.
> >
> > At the same time also disable the venus node by default in the dtsi,
> > like it's done on other SoCs.
> >
> > Reviewed-by: Bryan O'Donoghue 
> > Signed-off-by: Luca Weiss 
> > ---
> >  arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi | 8 
> >  arch/arm64/boot/dts/qcom/sc7280.dtsi   | 6 ++
> >  2 files changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi 
> > b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi
> > index 5d462ae14ba1..cd491e4d 100644
> > --- a/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi
> > +++ b/arch/arm64/boot/dts/qcom/sc7280-chrome-common.dtsi
> > @@ -104,6 +104,14 @@  {
> >   dma-coherent;
> >  };
> >
> > +&venus {
> > + status = "okay";
> > +
> > + video-firmware {
> > + iommus = <&apps_smmu 0x21a2 0x0>;
> > + };
> > +};
> > +
> >   {
> >   status = "okay";
> >  };
> > diff --git a/arch/arm64/boot/dts/qcom/sc7280.dtsi 
> > b/arch/arm64/boot/dts/qcom/sc7280.dtsi
> > index 66f1eb83cca7..fa53f54d4675 100644
> > --- a/arch/arm64/boot/dts/qcom/sc7280.dtsi
> > +++ b/arch/arm64/boot/dts/qcom/sc7280.dtsi
> > @@ -3740,6 +3740,8 @@ venus: video-codec@aa0 {
> > > <&apps_smmu 0x2184 0x20>;
> >> 0x2184 is a secure SID. I think qcm6490-fairphone-fp5.dts needs to 
> >> override the
> >> iommus property as well to retain only the non secure SID i.e 0x2180 ? 
> >> I am
> >> seeing below crash
> >>
> >> Call trace:
> >> [   47.663593]  qcom_smmu_write_s2cr+0x64/0xa4
> >> [   47.663616]  arm_smmu_attach_dev+0x120/0x284
> >> [   47.663647]  __iommu_attach_device+0x24/0xf8
> >> [   47.676845]  __iommu_device_set_domain+0x70/0xd0
> >> [   47.681632]  __iommu_group_set_domain_internal+0x60/0x1b4
> >> [   47.687218]  iommu_setup_default_domain+0x358/0x418
> >> [   47.692258]  __iommu_probe_device+0x3e4/0x404
> >>
> >> Could you please reconfirm if Video SID 0x2184 (and mask) is allowed 
> >> by the
> >> qcm6490-fairphone-fp5 hardware having TZ ?
> >
> > Hi,
> >
> > On FP5 it seems it's no problem to have both SIDs in there, probe and
> > using venus appears to work fine.
> >
> > Are you using different firmware than QCM6490.LA.3.0 on the device where
> > you tested this?
>  I was testing this on RB3 board which uses firmware [1].
> >>>
> >>> There is something wrong here.
> >>>
> >>> RB3 board uses venus-5.2
> >>> RB5 board uses vpu-1.0
> >>> Only sc7280 uses vpu-2.0
> >>
> >> Tested on QCM6490 IDP board, which is QCOM internal board similar to RB3 
> >> gen2.
> > 
> > In any case, I don't know much about the venus & iommu setup here. I can
> > try removing the 0x2184 SID and test if venus still works on FP5.
>
> Please remove 0x2184 SID and confirm specifically encoder works. This SID is 
> for
> encoder.
>
> > Also should the chromebooks keep that iommu entry or not?
> Chrome-common can have 0x2184 since its no-TZ based solution. So in 
> sc7280.dtsi,
> you can keep the default SID i.e 0x2180 (with respective mask) and in
> chrome-common, we can override the iommus property with 0x2180 and 0x2184.

Hi Vikash,

I'm moving 0x2184 to chrome-common in v3 but I couldn't test venus
encoding myself since I just don't know *how* to test it.

Would be nice if you could share how you test venus (decoding &
encoding) since seemingly nobody (at least in the postmarketOS
community) seems to know how to test/use it properly. See also
https://wiki.postmarketos.org/wiki/Hardware_video_acceleration

Regards
Luca

>
> Regards,
> Vikash
>
> > Regards
> > Luca
> > 
> >>
> 
>  Regards,
>  Vikash
> 
>  [1]
>  

Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2023-12-01 Thread Ard Biesheuvel
On Fri, 1 Dec 2023 at 00:48, Justin Chen  wrote:
>
> Hello,
>
> Ran into an odd bug that I am unsure what the solution is. Tested a few
> kernels versions and they all fail the same.
>
> FUNCTION_GRAPH_FP_TEST was enabled with 953f534a7ed6 ("ARM: ftrace:
> enable HAVE_FUNCTION_GRAPH_FP_TEST"). This test fails when
> UNWINDER_FRAME_POINTER is enabled. Enable function_graph tracer and you
> should see a failure similar to below.
>
> [   63.817239] [ cut here ]
> [   63.822006] WARNING: CPU: 3 PID: 1185 at kernel/trace/fgraph.c:195
> ftrace_return_to_handler+0x228/0x374
> [   63.831645] Bad frame pointer: expected d1e0df40, received d1e0df48
> [   63.831645]   from func packet_setsockopt return to c0b558f4
> [   63.843801] Modules linked in: bdc udc_core
> [   63.848246] CPU: 3 PID: 1185 Comm: udhcpc Not tainted
> 6.1.53-0.1pre-gf0bc552d12f8 #33
> [   63.856209] Hardware name: Broadcom STB (Flattened Device Tree)
> [   63.862227] Backtrace:
> [   63.864761]  dump_backtrace from show_stack+0x20/0x24
> [   63.869982]  r7:c031cd8c r6:0009 r5:0013 r4:c11c7fac
> [   63.875736]  show_stack from dump_stack_lvl+0x48/0x54
> [   63.880929]  dump_stack_lvl from dump_stack+0x18/0x1c
> [   63.886111]  r5:00c3 r4:c11bd92c
> [   63.889764]  dump_stack from __warn+0x88/0x130
> [   63.894339]  __warn from warn_slowpath_fmt+0x140/0x198
> [   63.899631]  r8:d1e0deac r7:c11bd958 r6:c031cd8c r5:c11bd92c r4:
> [   63.906431]  warn_slowpath_fmt from ftrace_return_to_handler+0x228/0x374
> [   63.913294]  r8:c3a8d840 r7:0002 r6:d1e0df48 r5:c2377a94 r4:c269a400
> [   63.920095]  ftrace_return_to_handler from return_to_handler+0xc/0x18
> [   63.926699]  r8:c0cd8ed0 r7:0008 r6:c418c500 r5:0004 r4:0107
> [   63.933500]  __sys_setsockopt from return_to_handler+0x0/0x18
> [   63.939415]  r8:c02002bc r7:0126 r6:0003 r5: r4:0004
> [   63.946217]  sys_setsockopt from return_to_handler+0x0/0x18
> [   63.952053] ---[ end trace  ]---
>
> Sure enough the top of the parent stack is off by 8. (Tested with
> gcc6.3/gcc8.3/gcc12.3)
> 6dcc :
>  6dcc:   e1a0c00dmov ip, sp
>  6dd0:   e24dd008sub sp, sp, #8 <==
>  6dd4:   e92ddff0push{r4, r5, r6, r7, r8, r9, sl,
> fp, ip, lr, pc}
>  6dd8:   e24cb00csub fp, ip, #12
>  6ddc:   e24dd06csub sp, sp, #108@ 0x6c
>  6de0:   e52de004push{lr}@ (str lr, [sp,
> #-4]!)
>  6de4:   ebfebl  0 <__gnu_mcount_nc>
>
> I'm not quite sure why gcc is putting this extra 8 byte frame (maybe
> some optimization?), but it isn't being accounted for thus the
> FUNCTION_GRAPH_FP_TEST for arm fails. Note that only some functions do
> this. Function graph works with FUNCTION_GRAPH_FP_TEST disabled, so it
> looks the test is hitting false positives.
>

Thanks for the report.

It appears the sub instruction at 0x6dd0 correctly accounts for the
extra 8 bytes, so the frame pointer is valid. So our assumption
that there are no gaps between the stack frames is invalid.

Could you try the following change please?

--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -235,8 +235,12 @@
return;

if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER)) {
-   /* FP points one word below parent's top of stack */
-   frame_pointer += 4;
+   /*
+* The top of stack of the parent is recorded in the stack
+* frame at offset [fp, #-8].
+*/
+   get_kernel_nofault(frame_pointer,
+  (unsigned long *)(frame_pointer - 8));
} else {
struct stackframe frame = {
.fp = frame_pointer,



Re: [PATCH net-next v5 2/3] virtio/vsock: send credit update during setting SO_RCVLOWAT

2023-12-01 Thread Arseniy Krasnov



On 01.12.2023 11:27, Stefano Garzarella wrote:
> On Thu, Nov 30, 2023 at 12:40:43PM -0500, Michael S. Tsirkin wrote:
>> On Thu, Nov 30, 2023 at 03:11:19PM +0100, Stefano Garzarella wrote:
>>> On Thu, Nov 30, 2023 at 08:58:58AM -0500, Michael S. Tsirkin wrote:
>>> > On Thu, Nov 30, 2023 at 04:43:34PM +0300, Arseniy Krasnov wrote:
>>> > >
>>> > >
>>> > > On 30.11.2023 16:42, Michael S. Tsirkin wrote:
>>> > > > On Thu, Nov 30, 2023 at 04:08:39PM +0300, Arseniy Krasnov wrote:
>>> > > >> Send credit update message when SO_RCVLOWAT is updated and it is 
>>> > > >> bigger
>>> > > >> than number of bytes in rx queue. It is needed, because 'poll()' will
>>> > > >> wait until number of bytes in rx queue will be not smaller than
>>> > > >> SO_RCVLOWAT, so kick sender to send more data. Otherwise mutual 
>>> > > >> hungup
>>> > > >> for tx/rx is possible: sender waits for free space and receiver is
>>> > > >> waiting data in 'poll()'.
>>> > > >>
>>> > > >> Signed-off-by: Arseniy Krasnov 
>>> > > >> ---
>>> > > >>  Changelog:
>>> > > >>  v1 -> v2:
>>> > > >>   * Update commit message by removing 'This patch adds XXX' manner.
>>> > > >>   * Do not initialize 'send_update' variable - set it directly during
>>> > > >> first usage.
>>> > > >>  v3 -> v4:
>>> > > >>   * Fit comment in 'virtio_transport_notify_set_rcvlowat()' to 80 
>>> > > >>chars.
>>> > > >>  v4 -> v5:
>>> > > >>   * Do not change callbacks order in transport structures.
>>> > > >>
>>> > > >>  drivers/vhost/vsock.c   |  1 +
>>> > > >>  include/linux/virtio_vsock.h    |  1 +
>>> > > >>  net/vmw_vsock/virtio_transport.c    |  1 +
>>> > > >>  net/vmw_vsock/virtio_transport_common.c | 27 
>>> > > >>+
>>> > > >>  net/vmw_vsock/vsock_loopback.c  |  1 +
>>> > > >>  5 files changed, 31 insertions(+)
>>> > > >>
>>> > > >> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>>> > > >> index f75731396b7e..4146f80db8ac 100644
>>> > > >> --- a/drivers/vhost/vsock.c
>>> > > >> +++ b/drivers/vhost/vsock.c
>>> > > >> @@ -451,6 +451,7 @@ static struct virtio_transport vhost_transport = 
>>> > > >> {
>>> > > >>  .notify_buffer_size   = 
>>> > > >>virtio_transport_notify_buffer_size,
>>> > > >>
>>> > > >>  .read_skb = virtio_transport_read_skb,
>>> > > >> +    .notify_set_rcvlowat  = 
>>> > > >> virtio_transport_notify_set_rcvlowat
>>> > > >>  },
>>> > > >>
>>> > > >>  .send_pkt = vhost_transport_send_pkt,
>>> > > >> diff --git a/include/linux/virtio_vsock.h 
>>> > > >> b/include/linux/virtio_vsock.h
>>> > > >> index ebb3ce63d64d..c82089dee0c8 100644
>>> > > >> --- a/include/linux/virtio_vsock.h
>>> > > >> +++ b/include/linux/virtio_vsock.h
>>> > > >> @@ -256,4 +256,5 @@ void virtio_transport_put_credit(struct 
>>> > > >> virtio_vsock_sock *vvs, u32 credit);
>>> > > >>  void virtio_transport_deliver_tap_pkt(struct sk_buff *skb);
>>> > > >>  int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head 
>>> > > >>*list);
>>> > > >>  int virtio_transport_read_skb(struct vsock_sock *vsk, 
>>> > > >>skb_read_actor_t read_actor);
>>> > > >> +int virtio_transport_notify_set_rcvlowat(struct vsock_sock *vsk, 
>>> > > >> int val);
>>> > > >>  #endif /* _LINUX_VIRTIO_VSOCK_H */
>>> > > >> diff --git a/net/vmw_vsock/virtio_transport.c 
>>> > > >> b/net/vmw_vsock/virtio_transport.c
>>> > > >> index af5bab1acee1..8007593a3a93 100644
>>> > > >> --- a/net/vmw_vsock/virtio_transport.c
>>> > > >> +++ b/net/vmw_vsock/virtio_transport.c
>>> > > >> @@ -539,6 +539,7 @@ static struct virtio_transport virtio_transport 
>>> > > >> = {
>>> > > >>  .notify_buffer_size   = 
>>> > > >>virtio_transport_notify_buffer_size,
>>> > > >>
>>> > > >>  .read_skb = virtio_transport_read_skb,
>>> > > >> +    .notify_set_rcvlowat  = 
>>> > > >> virtio_transport_notify_set_rcvlowat
>>> > > >>  },
>>> > > >>
>>> > > >>  .send_pkt = virtio_transport_send_pkt,
>>> > > >> diff --git a/net/vmw_vsock/virtio_transport_common.c 
>>> > > >> b/net/vmw_vsock/virtio_transport_common.c
>>> > > >> index f6dc896bf44c..1cb556ad4597 100644
>>> > > >> --- a/net/vmw_vsock/virtio_transport_common.c
>>> > > >> +++ b/net/vmw_vsock/virtio_transport_common.c
>>> > > >> @@ -1684,6 +1684,33 @@ int virtio_transport_read_skb(struct 
>>> > > >> vsock_sock *vsk, skb_read_actor_t recv_acto
>>> > > >>  }
>>> > > >>  EXPORT_SYMBOL_GPL(virtio_transport_read_skb);
>>> > > >>
>>> > > >> +int virtio_transport_notify_set_rcvlowat(struct vsock_sock *vsk,
>>> > > >> int val)
>>> > > >> +{
>>> > > >> +    struct virtio_vsock_sock *vvs = vsk->trans;
>>> > > >> +    bool send_update;
>>> > > >> +
>>> > > >> +    spin_lock_bh(&vvs->rx_lock);
>>> > > >> +
>>> > > >> +    /* If number of available bytes is less than new SO_RCVLOWAT 
>>> > > >> value,
>>> > > >> + * kick sender to send more data, because sender may sleep in
>>> > > >> its
>>> > > >> + * 'send()' syscall waiting for enough space at our side.
>>> > 

Re: [PATCH net-next v5 2/3] virtio/vsock: send credit update during setting SO_RCVLOWAT

2023-12-01 Thread Stefano Garzarella

On Thu, Nov 30, 2023 at 12:40:43PM -0500, Michael S. Tsirkin wrote:

On Thu, Nov 30, 2023 at 03:11:19PM +0100, Stefano Garzarella wrote:

On Thu, Nov 30, 2023 at 08:58:58AM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 30, 2023 at 04:43:34PM +0300, Arseniy Krasnov wrote:
> >
> >
> > On 30.11.2023 16:42, Michael S. Tsirkin wrote:
> > > On Thu, Nov 30, 2023 at 04:08:39PM +0300, Arseniy Krasnov wrote:
> > >> Send credit update message when SO_RCVLOWAT is updated and it is bigger
> > >> than number of bytes in rx queue. It is needed, because 'poll()' will
> > >> wait until number of bytes in rx queue will be not smaller than
> > >> SO_RCVLOWAT, so kick sender to send more data. Otherwise mutual hungup
> > >> for tx/rx is possible: sender waits for free space and receiver is
> > >> waiting data in 'poll()'.
> > >>
> > >> Signed-off-by: Arseniy Krasnov 
> > >> ---
> > >>  Changelog:
> > >>  v1 -> v2:
> > >>   * Update commit message by removing 'This patch adds XXX' manner.
> > >>   * Do not initialize 'send_update' variable - set it directly during
> > >> first usage.
> > >>  v3 -> v4:
> > >>   * Fit comment in 'virtio_transport_notify_set_rcvlowat()' to 80 chars.
> > >>  v4 -> v5:
> > >>   * Do not change callbacks order in transport structures.
> > >>
> > >>  drivers/vhost/vsock.c   |  1 +
> > >>  include/linux/virtio_vsock.h|  1 +
> > >>  net/vmw_vsock/virtio_transport.c|  1 +
> > >>  net/vmw_vsock/virtio_transport_common.c | 27 +
> > >>  net/vmw_vsock/vsock_loopback.c  |  1 +
> > >>  5 files changed, 31 insertions(+)
> > >>
> > >> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > >> index f75731396b7e..4146f80db8ac 100644
> > >> --- a/drivers/vhost/vsock.c
> > >> +++ b/drivers/vhost/vsock.c
> > >> @@ -451,6 +451,7 @@ static struct virtio_transport vhost_transport = {
> > >>  .notify_buffer_size   = 
virtio_transport_notify_buffer_size,
> > >>
> > >>  .read_skb = virtio_transport_read_skb,
> > >> +.notify_set_rcvlowat  = 
virtio_transport_notify_set_rcvlowat
> > >>  },
> > >>
> > >>  .send_pkt = vhost_transport_send_pkt,
> > >> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > >> index ebb3ce63d64d..c82089dee0c8 100644
> > >> --- a/include/linux/virtio_vsock.h
> > >> +++ b/include/linux/virtio_vsock.h
> > >> @@ -256,4 +256,5 @@ void virtio_transport_put_credit(struct 
virtio_vsock_sock *vvs, u32 credit);
> > >>  void virtio_transport_deliver_tap_pkt(struct sk_buff *skb);
> > >>  int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head *list);
> > >>  int virtio_transport_read_skb(struct vsock_sock *vsk, skb_read_actor_t 
read_actor);
> > >> +int virtio_transport_notify_set_rcvlowat(struct vsock_sock *vsk, int 
val);
> > >>  #endif /* _LINUX_VIRTIO_VSOCK_H */
> > >> diff --git a/net/vmw_vsock/virtio_transport.c 
b/net/vmw_vsock/virtio_transport.c
> > >> index af5bab1acee1..8007593a3a93 100644
> > >> --- a/net/vmw_vsock/virtio_transport.c
> > >> +++ b/net/vmw_vsock/virtio_transport.c
> > >> @@ -539,6 +539,7 @@ static struct virtio_transport virtio_transport = {
> > >>  .notify_buffer_size   = 
virtio_transport_notify_buffer_size,
> > >>
> > >>  .read_skb = virtio_transport_read_skb,
> > >> +.notify_set_rcvlowat  = 
virtio_transport_notify_set_rcvlowat
> > >>  },
> > >>
> > >>  .send_pkt = virtio_transport_send_pkt,
> > >> diff --git a/net/vmw_vsock/virtio_transport_common.c 
b/net/vmw_vsock/virtio_transport_common.c
> > >> index f6dc896bf44c..1cb556ad4597 100644
> > >> --- a/net/vmw_vsock/virtio_transport_common.c
> > >> +++ b/net/vmw_vsock/virtio_transport_common.c
> > >> @@ -1684,6 +1684,33 @@ int virtio_transport_read_skb(struct vsock_sock 
*vsk, skb_read_actor_t recv_acto
> > >>  }
> > >>  EXPORT_SYMBOL_GPL(virtio_transport_read_skb);
> > >>
> > >> +int virtio_transport_notify_set_rcvlowat(struct vsock_sock *vsk,
> > >> int val)
> > >> +{
> > >> +struct virtio_vsock_sock *vvs = vsk->trans;
> > >> +bool send_update;
> > >> +
> > >> +spin_lock_bh(&vvs->rx_lock);
> > >> +
> > >> +/* If number of available bytes is less than new SO_RCVLOWAT value,
> > >> + * kick sender to send more data, because sender may sleep in
> > >> its
> > >> + * 'send()' syscall waiting for enough space at our side.
> > >> + */
> > >> +send_update = vvs->rx_bytes < val;
> > >> +
> > >> +spin_unlock_bh(&vvs->rx_lock);
> > >> +
> > >> +if (send_update) {
> > >> +int err;
> > >> +
> > >> +err = virtio_transport_send_credit_update(vsk);
> > >> +if (err < 0)
> > >> +return err;
> > >> +}
> > >> +
> > >> +return 0;
> > >> +}
> > >
> > >
> > > I find it strange that this will send a credit update
> > > even if nothing changed since this was called previously.
> > > I'm not sure whether this is a problem protocol-wise,
> > > but