Re: [PATCH] vhost/vsock: always initialize seqpacket_allow

2024-06-26 Thread 박정준
nice patch

> 2024. 5. 16. 오전 12:05, Michael S. Tsirkin  작성:
> 
> There are two issues around seqpacket_allow:
> 1. seqpacket_allow is not initialized when socket is
>   created. Thus if features are never set, it will be
>   read uninitialized.
> 2. if VIRTIO_VSOCK_F_SEQPACKET is set and then cleared,
>   then seqpacket_allow will not be cleared appropriately
>   (existing apps I know about don't usually do this but
>it's legal and there's no way to be sure no one relies
>on this).
> 
> To fix:
>- initialize seqpacket_allow after allocation
>- set it unconditionally in set_features
> 
> Reported-by: syzbot+6c21aeb59d0e82eb2...@syzkaller.appspotmail.com
> Reported-by: Jeongjun Park 
> Fixes: ced7b713711f ("vhost/vsock: support SEQPACKET for transport").
> Cc: Arseny Krasnov 
> Cc: David S. Miller 
> Cc: Stefan Hajnoczi 
> Signed-off-by: Michael S. Tsirkin 
> Acked-by: Arseniy Krasnov 
> Tested-by: Arseniy Krasnov 
> 
> ---
> 
> 
> Reposting now it's been tested.
> 
> drivers/vhost/vsock.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index ec20ecff85c7..bf664ec9341b 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -667,6 +667,7 @@ static int vhost_vsock_dev_open(struct inode *inode, 
> struct file *file)
>}
> 
>vsock->guest_cid = 0; /* no CID assigned yet */
> +vsock->seqpacket_allow = false;
> 
>atomic_set(&vsock->queued_replies, 0);
> 
> @@ -810,8 +811,7 @@ static int vhost_vsock_set_features(struct vhost_vsock 
> *vsock, u64 features)
>goto err;
>}
> 
> -if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET))
> -vsock->seqpacket_allow = true;
> +vsock->seqpacket_allow = features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET);
> 
>for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
>vq = &vsock->vqs[i];
> --
> MST
> 



Re: [PATCH] remoteproc: mediatek: Increase MT8188 SCP core0 DRAM size

2024-06-26 Thread 陳建豪
Hi Mathieu,

Sorry for the late response.

On Mon, 2024-06-10 at 10:34 -0600, Mathieu Poirier wrote:
>
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On Thu, Jun 06, 2024 at 01:00:11PM +0200, AngeloGioacchino Del Regno
> wrote:
> > Il 06/06/24 11:06, jason-ch chen ha scritto:
> > > From: Jason Chen 
> > > 
> > > Increase MT8188 SCP core0 DRAM size for HEVC driver.
> 
> This is telling me _what_ gets done rather than _why_ it gets done.
> 
I will modify the commit message in the next version.

> > > 
> > 
> > so the second core only gets a maximum of 0x20 of SRAM?
> > I'm not sure how useful is the secondary SCP core, at this point,
> with so little
> > available SRAM but... okay, as you wish.
> > 
> > Reviewed-by: AngeloGioacchino Del Regno <
> angelogioacchino.delre...@collabora.com>
> > 
> > 
> > > Signed-off-by: Jason Chen 
> > > ---
> > >   drivers/remoteproc/mtk_scp.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/remoteproc/mtk_scp.c
> b/drivers/remoteproc/mtk_scp.c
> > > index b885a9a041e4..2119fc62c3f2 100644
> > > --- a/drivers/remoteproc/mtk_scp.c
> > > +++ b/drivers/remoteproc/mtk_scp.c
> > > @@ -1390,7 +1390,7 @@ static const struct mtk_scp_sizes_data
> default_scp_sizes = {
> > >   };
> > >   static const struct mtk_scp_sizes_data mt8188_scp_sizes = {
> > > -.max_dram_size = 0x50,
> > > +.max_dram_size = 0x80,
> 
> Do you require to fix a "reserved-memory" node in a device tree file
> to account
> for this?

Using a "reserved-memory" node to calculate max_dram_size presents
challenges due to alignment requirements in dma_alloc_coherent().

For example, 

static const struct mtk_scp_sizes_data mt8188_scp_c1_sizes = {
.max_dram_size = 0xA0,
.ipi_share_buffer_size = 600,
};

We require 2560 pages (10M) for SCP core1 usage, but alignment
constraints necessitate searching for a free region of 2^12 pages
(16M). This misalignment between the reserved 10M and the required 16M
prevents successful allocation.

Adjusting the reserved memory to 16M for core1 would lead to a 6M
wastage. To avoid this, reserving a larger memory block is advisable.
This block can be partially used by SCP core1, with the remainder
allocated to feature drivers. Consequently, setting the max_dram_size
in SCP configurations is a practical solution to meet these
requirements.

Thanks,
Jason
> 
> Thanks,
> Mathieu
> 
> > >   .ipi_share_buffer_size = 600,
> > >   };
> > 
> > 


Re: [PATCH] tools/testing/radix-tree/idr-test: add missing MODULE_DESCRIPTION define

2024-06-26 Thread Liam R. Howlett
Thanks.  These (this one and two previous patches for xarray and maple)
are needed for our testing to compile.

* Sidhartha Kumar  [240626 19:21]:
> Userspace builds of the radix-tree testing suite fails because of patch
> KUnit: add missing MODULE_DESCRIPTION() macros for lib/test_*.ko. Add the
> proper defines to  tools/testing/radix-tree/idr-test.c so
> MODULE_DESCRIPTION has a definition. This allows the build to succeed.
> 
> Fixes: 303474913271("KUnit: add missing MODULE_DESCRIPTION() macros for 
> lib/test_*.ko")
> Signed-off-by: Sidhartha Kumar 

Reviewed-by: Liam R. Howlett 

> ---
> 
> This patch is based on next-20240626
> 
>  tools/testing/radix-tree/idr-test.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/testing/radix-tree/idr-test.c 
> b/tools/testing/radix-tree/idr-test.c
> index ca24f6839d50..84b8c3c92c79 100644
> --- a/tools/testing/radix-tree/idr-test.c
> +++ b/tools/testing/radix-tree/idr-test.c
> @@ -424,6 +424,7 @@ void idr_checks(void)
>  #define module_init(x)
>  #define module_exit(x)
>  #define MODULE_AUTHOR(x)
> +#define MODULE_DESCRIPTION(X)
>  #define MODULE_LICENSE(x)
>  #define dump_stack()assert(0)
>  void ida_dump(struct ida *);
> -- 
> 2.45.2
> 



[PATCH] tools/testing/radix-tree/idr-test: add missing MODULE_DESCRIPTION define

2024-06-26 Thread Sidhartha Kumar
Userspace builds of the radix-tree testing suite fails because of patch
KUnit: add missing MODULE_DESCRIPTION() macros for lib/test_*.ko. Add the
proper defines to  tools/testing/radix-tree/idr-test.c so
MODULE_DESCRIPTION has a definition. This allows the build to succeed.

Fixes: 303474913271("KUnit: add missing MODULE_DESCRIPTION() macros for 
lib/test_*.ko")
Signed-off-by: Sidhartha Kumar 
---

This patch is based on next-20240626

 tools/testing/radix-tree/idr-test.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/radix-tree/idr-test.c 
b/tools/testing/radix-tree/idr-test.c
index ca24f6839d50..84b8c3c92c79 100644
--- a/tools/testing/radix-tree/idr-test.c
+++ b/tools/testing/radix-tree/idr-test.c
@@ -424,6 +424,7 @@ void idr_checks(void)
 #define module_init(x)
 #define module_exit(x)
 #define MODULE_AUTHOR(x)
+#define MODULE_DESCRIPTION(X)
 #define MODULE_LICENSE(x)
 #define dump_stack()assert(0)
 void ida_dump(struct ida *);
-- 
2.45.2




Re: [PATCH 2/3] arm64: dts: qcom: msm8916-lg-m216: Add initial device tree

2024-06-26 Thread Konrad Dybcio
On 23.06.2024 11:26 AM, Nikita Travkin wrote:
> From: Cristian Cozzolino 
> 
> This commit adds initial support for the LG K10 smartphone.
> 
> Support for the following features is included:
> 
> - Serial
> - Keys
> - Battery and charger
> - Accelerometer, magnetometer
> - Touchscreen
> - Sound and modem
> - Haptic
> 
> Signed-off-by: Cristian Cozzolino 
> [Nikita: Minor cleanup]
> Signed-off-by: Nikita Travkin 
> ---

That touchscreen binding could use some YAMLing 

Reviewed-by: Konrad Dybcio 

Konrad



Re: [PATCH 2/2] remoteproc: qcom: select AUXILIARY_BUS

2024-06-26 Thread Chris Lew




On 6/26/2024 12:12 PM, Dmitry Baryshkov wrote:

The QCOM_PD_MAPPER implementation made Qualcomm remoteproc drivers use
auxiliary bus for the pd-mapper subdevice. Add necessary dependency.

Reported-by: Mark Brown 
Fixes: 5b9f51b200dc ("remoteproc: qcom: enable in-kernel PD mapper")
Signed-off-by: Dmitry Baryshkov 
---
  drivers/remoteproc/Kconfig | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index 48845dc8fa85..dda2ada215b7 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -166,6 +166,7 @@ config QCOM_PIL_INFO
  
  config QCOM_RPROC_COMMON

tristate
+   select AUXILIARY_BUS
  
  config QCOM_Q6V5_COMMON

tristate




Reviewed-by: Chris Lew 



Re: [PATCH 1/2] soc: qcom: add missing pd-mapper dependencies

2024-06-26 Thread Chris Lew




On 6/26/2024 12:12 PM, Dmitry Baryshkov wrote:

The pd-mapper driver uses auxiliary bus and Qualcomm PDR message format
data. Add missing dependencies to the driver's Kconfig entry.

Reported-by: Mark Brown 
Fixes: 1ebcde047c54 ("soc: qcom: add pd-mapper implementation")
Signed-off-by: Dmitry Baryshkov 
---
  drivers/soc/qcom/Kconfig | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
index 0a2f2bfd7863..432c85bd8ad4 100644
--- a/drivers/soc/qcom/Kconfig
+++ b/drivers/soc/qcom/Kconfig
@@ -75,6 +75,8 @@ config QCOM_OCMEM
  config QCOM_PD_MAPPER
tristate "Qualcomm Protection Domain Mapper"
select QCOM_QMI_HELPERS
+   select QCOM_PDR_MSG
+   select AUXILIARY_BUS
depends on NET && QRTR
default QCOM_RPROC_COMMON
help




Reviewed-by: Chris Lew 



Re: [PATCH v9 1/8] remoteproc: qcom: Add PRNG proxy clock

2024-06-26 Thread Dmitry Baryshkov
On Tue, Jun 25, 2024 at 11:03:30AM GMT, Gokul Sriram P wrote:
> 
> On 6/22/2024 2:38 AM, Dmitry Baryshkov wrote:
> > On Fri, Jun 21, 2024 at 05:16:52PM GMT, Gokul Sriram Palanisamy wrote:
> > > PRNG clock is needed by the secure PIL, support for the same
> > > is added in subsequent patches.
> > Which 'same'?
> > What is 'secure PIL'?
>   will elaborate in the updated version.
>   To answer your question, secure PIL is signed PIL image which only
> TrustZone can authenticate and load.

Fine. So, the current driver can not load WCSS firmware on IPQ8074, is
that correct? Or was there some kind of firmware interface change? The
driver was added in 2018, so I can only hope that at that point it
worked. Could you please explain, what happened?

> > > Signed-off-by: Nikhil Prakash V 
> > > Signed-off-by: Sricharan R 
> > > Signed-off-by: Gokul Sriram Palanisamy 
> > > ---
> > >   drivers/remoteproc/qcom_q6v5_wcss.c | 65 +
> > >   1 file changed, 47 insertions(+), 18 deletions(-)


-- 
With best wishes
Dmitry



[PATCH v2] remoteproc: k3-dsp: Fix log levels where appropriate

2024-06-26 Thread Garrett Giordano
Driver was logging information as errors. Changed dev_err to dev_dbg
where appropriate.

Signed-off-by: Garrett Giordano 
---
-v2
  - Change from dev_info to dev_dbg
  - Drop k3-r5 PATCH
---
 drivers/remoteproc/ti_k3_dsp_remoteproc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/remoteproc/ti_k3_dsp_remoteproc.c 
b/drivers/remoteproc/ti_k3_dsp_remoteproc.c
index 3555b535b168..a22d41689a7d 100644
--- a/drivers/remoteproc/ti_k3_dsp_remoteproc.c
+++ b/drivers/remoteproc/ti_k3_dsp_remoteproc.c
@@ -327,7 +327,7 @@ static int k3_dsp_rproc_start(struct rproc *rproc)
goto put_mbox;
}

-   dev_err(dev, "booting DSP core using boot addr = 0x%x\n", boot_addr);
+   dev_dbg(dev, "booting DSP core using boot addr = 0x%x\n", boot_addr);
ret = ti_sci_proc_set_config(kproc->tsp, boot_addr, 0, 0);
if (ret)
goto put_mbox;
--
2.25.1




[PATCH 2/2] remoteproc: qcom: select AUXILIARY_BUS

2024-06-26 Thread Dmitry Baryshkov
The QCOM_PD_MAPPER implementation made Qualcomm remoteproc drivers use
auxiliary bus for the pd-mapper subdevice. Add necessary dependency.

Reported-by: Mark Brown 
Fixes: 5b9f51b200dc ("remoteproc: qcom: enable in-kernel PD mapper")
Signed-off-by: Dmitry Baryshkov 
---
 drivers/remoteproc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index 48845dc8fa85..dda2ada215b7 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -166,6 +166,7 @@ config QCOM_PIL_INFO
 
 config QCOM_RPROC_COMMON
tristate
+   select AUXILIARY_BUS
 
 config QCOM_Q6V5_COMMON
tristate

-- 
2.39.2




[PATCH 1/2] soc: qcom: add missing pd-mapper dependencies

2024-06-26 Thread Dmitry Baryshkov
The pd-mapper driver uses auxiliary bus and Qualcomm PDR message format
data. Add missing dependencies to the driver's Kconfig entry.

Reported-by: Mark Brown 
Fixes: 1ebcde047c54 ("soc: qcom: add pd-mapper implementation")
Signed-off-by: Dmitry Baryshkov 
---
 drivers/soc/qcom/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
index 0a2f2bfd7863..432c85bd8ad4 100644
--- a/drivers/soc/qcom/Kconfig
+++ b/drivers/soc/qcom/Kconfig
@@ -75,6 +75,8 @@ config QCOM_OCMEM
 config QCOM_PD_MAPPER
tristate "Qualcomm Protection Domain Mapper"
select QCOM_QMI_HELPERS
+   select QCOM_PDR_MSG
+   select AUXILIARY_BUS
depends on NET && QRTR
default QCOM_RPROC_COMMON
help

-- 
2.39.2




[PATCH 0/2] qcom: fix missing dependencies for the QCOM_PD_MAPPER

2024-06-26 Thread Dmitry Baryshkov
While refactoring pd-mapper to use auxiliary bus for the PD mapper
instantiation I forgot to select the bus in Kconfig entries. Fix it now.

Signed-off-by: Dmitry Baryshkov 
---
Dmitry Baryshkov (2):
  soc: qcom: add missing pd-mapper dependencies
  remoteproc: qcom: select AUXILIARY_BUS

 drivers/remoteproc/Kconfig | 1 +
 drivers/soc/qcom/Kconfig   | 2 ++
 2 files changed, 3 insertions(+)
---
base-commit: b07e1e375f6389b6715b9aca590da17039bcd447
change-id: 20240626-qcom-pd-mapper-fix-deps-1cf064ee4715

Best regards,
-- 
Dmitry Baryshkov 




Re: [PATCH 2/2] remoteproc: k3-r5: Fix log levels where appropriate

2024-06-26 Thread Wadim Egorov

Hi Garrett,

Am 26.06.24 um 18:22 schrieb Garrett Giordano:

Driver was logging information as debug. Changed dev_dbg to dev_info
where appropriate.

Signed-off-by: Garrett Giordano 
---
  drivers/remoteproc/ti_k3_r5_remoteproc.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c 
b/drivers/remoteproc/ti_k3_r5_remoteproc.c
index 50e486bcfa10..5821b6517063 100644
--- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
+++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
@@ -558,7 +558,7 @@ static int k3_r5_rproc_start(struct rproc *rproc)
  
  	boot_addr = rproc->bootaddr;

/* TODO: add boot_addr sanity checking */
-   dev_dbg(dev, "booting R5F core using boot addr = 0x%x\n", boot_addr);
+   dev_info(dev, "booting R5F core using boot addr = 0x%x\n", boot_addr);

There is no need for this debug message to be in info level.

Regards,
Wadim
  
  	/* boot vector need not be programmed for Core1 in LockStep mode */

core = kproc->core;




Re: [PATCH v3 2/2] rust: add tracepoint support

2024-06-26 Thread Steven Rostedt
On Wed, 26 Jun 2024 10:48:23 +0200
Alice Ryhl  wrote:

> >
> > Because your hooks/rust_binder.h and events/rust_binder.h use the same
> > TRACE_SYSTEM name? Could you try something like:
> >
> > #define TRACE_SYSTEM rust_binder_hook
> >
> > in your hooks/rust_binder.h?  
> 
> I was able to get it to work by moving the includes into two different
> .c files. I don't think changing TRACE_SYSTEM works because it must
> match the filename.

Try to use:

 #define TRACE_SYSTEM_VAR rust_binder_hook_other_name

in one. Then that is used as the variable for that file.

-- Steve



Re: [PATCH v2 4/4] EDAC/mce_amd: Add support for FRU Text in MCA

2024-06-26 Thread Borislav Petkov
On Wed, Jun 26, 2024 at 01:00:30PM -0500, Naik, Avadhut wrote:
> > 
> > Why are you clearing it if you're overwriting it immediately?
> > 
> Since its a local variable, wanted to ensure that the memory is zeroed out to 
> prevent
> any issues with the %s specifier, used later on.

What issues?

> Would you recommend removing that and using initializer instead for the 
> string?

I'd recommend looking at what the code does and then really thinking whether
that makes any sense.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers

2024-06-26 Thread Borislav Petkov
On Wed, Jun 26, 2024 at 12:24:20PM -0500, Naik, Avadhut wrote:
> 
> 
> On 6/26/2024 06:10, Borislav Petkov wrote:
> > On Tue, Jun 25, 2024 at 02:56:22PM -0500, Avadhut Naik wrote:
> >> AMD's Scalable MCA systems viz. Genoa will include two new registers:
> > 
> > "viz."?
> > 
> Right. Will mention Zen4 instead of Genoa.

I still don't know what "viz." means...

> Yes, I catch your drift. Will reword the commit message to explain that the
> new syndrome registers are going to be exported through the tracepoint
> in a dynamic array, as they are vendor-specific, so that usersapce error
> decoding tools can retrieve the supplemental error information within them.

Again, why?

Why is it important to have them in the tracepoint?

> >> Note: Checkpatch warnings/errors are ignored to maintain coding style.
> > 
> > This goes...
> > 
> >>
> >> [Yazen: Drop Yazen's Co-developed-by tag and moved SoB tag.]
> > 
> > Yes, you did but now your SOB chain is wrong:
> > 
> >> Signed-off-by: Avadhut Naik 
> >> Signed-off-by: Yazen Ghannam 
> > 
> > This tells me Avadhut is the author, Yazen handled it and he's sending it to
> > me. But nope, he isn't. So it needs another Avadhut SOB underneath.
> > 
> > Audit all patches pls.
> > 
> Wasn't aware of this chronology. Thanks for this information!

Well, there's documentation for that which you should've read already, before
sending patches:

https://kernel.org/doc/html/latest/process/development-process.html

and

https://kernel.org/doc/html/latest/process/submitting-patches.html

especially.

> So, IIUC, the sequence for this patch should be as follows?
> 
> Signed-off-by: Avadhut Naik 
> Signed-off-by: Yazen Ghannam 
> Signed-off-by: Avadhut Naik 

Yes, now I leave it to you to explain why. Hint: it is in those docs above.

> 
> >> ---
> > 
> > ... right under those three "---" as such notes do not belong in the commit
> > message. Remember that for the future.
> > 
> Okay. Will move the note here.

Or remove it completely. checkpatch is crap - I know. No need to have it in
every patch.

> Had considered this. But struct mce_hw_err *err wouldn't really be used in
> mce_read_aux() in patch 1. Only struct mce m, which is already available, will
> be used.

So?

> Hence, deferred the change to this patch where usage of struct mce_hw_err *err
> is actually introduced in mce_read_aux().
> 
> Do you prefer having this change in patch 1 instead?

I prefer a patch to contain one logical and complete change only. Because this
makes review easier. You should try reviewing patches sometimes too and you'll
know.

> > So that vendor data layout - is that ABI too? Or are we free to shuffle the
> > fields around in the future or even remove some?
> > 
> > This all needs to be specified somewhere explicitly so that nothing relies 
> > on
> > that layout.
> > 
> > And I'm not sure that that's enough because when userspace tools start using
> > them, then they're practically an ABI so you can't change them even if you
> > wanted to.
> > 
> > So is libtraceevent or all the other libraries going to parse this as a blob
> > and it is always going to remain such?
> > 
> > But then the tools which interpret it need to know its layout and if it
> > changes, perhaps check kernel version which then becomes RealUgly(tm).
> > 
> > So you might just as well dump the separate fields one by one, without
> > a dynamic array.
> > 
> > Or do a dynamic array but specify that their layout in struct
> > mce_hw_er.vendor.amd are cast in stone so that we're all clear on what goes
> > where.
> > 
> > Questions over questions...
> > 
> Should we document this where struct mce_hw_err is defined, in
> arch/x86/include/asm/mce.h? Or do you have any other recommendations?

I don't know. If I knew I wouldn't have questions which you can read again and
try to answer.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 1/4] x86/mce: Add wrapper for struct mce to export vendor specific info

2024-06-26 Thread Borislav Petkov
On Wed, Jun 26, 2024 at 05:11:29PM +, Luck, Tony wrote:
> > Tony, any comments? You ok with this, would that fit any Intel-specific 
> > vendor
> > fields too or do you need some additional Intel-specific changes?
> 
> It looks easy enough to add any Intel specific bits to the union later.
> 
> Is there anyway that the trace event could be "smarter" about what vendor 
> specific
> information to include based on boot_cpu_data.x86_vendor? As currently written
> Intel systems are going to see 3*u64 decoded into ascii, that are all zero. 
> Not a
> huge deal, I think it will just look like "0x0,0x0,0x0"

Hmm, good question.

Yo, Steve, is there a way to do conditional things in a TP?

For example:

@@ -83,7 +87,8 @@ TRACE_EVENT(mce_record,
__entry->walltime,
__entry->socketid,
__entry->apicid,
-   __entry->microcode)
+   __entry->microcode,

if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
__print_array(__get_dynamic_array(v_data), __entry->len / 8, 8))

i.e., print that array only when on a AMD.

I'm sure this won't fly as it is macro magic - this is just to show the
intent...

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



[PATCH v2 4/4] EDAC/mce_amd: Add support for FRU Text in MCA

2024-06-26 Thread Naik, Avadhut



On 6/26/2024 07:04, Borislav Petkov wrote:
> On Tue, Jun 25, 2024 at 02:56:24PM -0500, Avadhut Naik wrote:
>> From: Yazen Ghannam 
>>
>> A new "FRU Text in MCA" feature is defined where the Field Replaceable
>> Unit (FRU) Text for a device is represented by a string in the new
>> MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
>> bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
>>
>> The FRU Text is populated dynamically for each individual error state
>> (MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank
>> covers multiple devices, for example, a Unified Memory Controller (UMC)
>> bank that manages two DIMMs.
>>
> 
> From here...
> 
>> Print the FRU Text string, if available, when decoding an MCA error.
>>
>> Also, add field for MCA_CONFIG MSR in struct mce_hw_err as vendor specific
>> error information and save the value of the MSR. The very value can then be
>> exported through tracepoint for userspace tools like rasdaemon to print FRU
>> Text, if available.
>>
>>  Note: Checkpatch checks/warnings are ignored to maintain coding style.
> 
> ... to here goes into the trash can except what MCA_CONFIG is for being logged
> as part of the error.
> 
Will do.

>> [Yazen: Add Avadhut as co-developer for wrapper changes. ]
>>
>> Co-developed-by: Avadhut Naik 
>> Signed-off-by: Avadhut Naik 
>> Signed-off-by: Yazen Ghannam 
> 
> Ditto as for patch 3.
> 
Will do.
>> ---
> 
>> @@ -853,8 +850,18 @@ amd_decode_mce(struct notifier_block *nb, unsigned long 
>> val, void *data)
>>  
>>  if (m->status & MCI_STATUS_SYNDV) {
>>  pr_cont(", Syndrome: 0x%016llx\n", m->synd);
>> -pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 
>> 0x%016llx",
>> - err->vi.amd.synd1, err->vi.amd.synd2);
>> +if (mca_config & MCI_CONFIG_FRUTEXT) {
>> +char frutext[17];
>> +
>> +memset(frutext, 0, sizeof(frutext));
> 
> Why are you clearing it if you're overwriting it immediately?
> 
Since its a local variable, wanted to ensure that the memory is zeroed out to 
prevent
any issues with the %s specifier, used later on.
Would you recommend removing that and using initializer instead for the string?

>> +memcpy(&frutext[0], &err->vi.amd.synd1, 8);
>> +memcpy(&frutext[8], &err->vi.amd.synd2, 8);
>> +
>> +pr_emerg(HW_ERR "FRU Text: %s", frutext);
>> +} else {
>> +pr_emerg(HW_ERR "Syndrome1: 0x%016llx, 
>> Syndrome2: 0x%016llx",
>> + err->vi.amd.synd1, err->vi.amd.synd2);
>> +}
>>  }
>>  
>>  pr_cont("\n");
>> -- 
>> 2.34.1
>>
> 

-- 
Thanks,
Avadhut Naik



Re: [PATCH 3/3] arm64: dts: qcom: msm8916-lg-c50: add initial dts for LG Leon LTE

2024-06-26 Thread Konrad Dybcio
On 23.06.2024 11:26 AM, Nikita Travkin wrote:
> From: Anton Bambura 
> 
> Add initial device-tree for LG Leon LTE (lg-c50), currently supported
> features:
> - eMMC;
> - MicroSD;
> - usb in peripheral mode;
> - WiFi/BT;
> - vibration;
> - keys.
> 
> Signed-off-by: Anton Bambura 
> Signed-off-by: Nikita Travkin 
> ---

Reviewed-by: Konrad Dybcio 

Konrad



Re: [PATCH v2 3/4] x86/mce/apei: Handle variable register array size

2024-06-26 Thread Naik, Avadhut



On 6/26/2024 06:57, Borislav Petkov wrote:
> On Tue, Jun 25, 2024 at 02:56:23PM -0500, Avadhut Naik wrote:
>> From: Yazen Ghannam 
>>
>> ACPI Boot Error Record Table (BERT) is being used by the kernel to
>> report errors that occurred in a previous boot. On some modern AMD
>> systems, these very errors within the BERT are reported through the
>> x86 Common Platform Error Record (CPER) format which consists of one
>> or more Processor Context Information Structures. These context
>> structures provide a starting address and represent an x86 MSR range
>> in which the data constitutes a contiguous set of MSRs starting from,
>> and including the starting address.
>>
>> It's common, for AMD systems that implement this behavior, that the
>> MSR range represents the MCAX register space used for the Scalable MCA
>> feature. The apei_smca_report_x86_error() function decodes and passes
>> this information through the MCE notifier chain. However, this function
>> assumes a fixed register size based on the original HW/FW implementation.
>>
>> This assumption breaks with the addition of two new MCAX registers viz.
>> MCA_SYND1 and MCA_SYND2. These registers are added at the end of the
>> MCAX register space, so they won't be included when decoding the CPER
>> data.
>>
>> Rework apei_smca_report_x86_error() to support a variable register array
>> size. This covers any case where the MSR context information starts at
>> the MCAX address for MCA_STATUS and ends at any other register within
>> the MCAX register space.
>>
>> Add code comments indicating the MCAX register at each offset.
>>
>> [Yazen: Add Avadhut as co-developer for wrapper changes.]
>>
>> Co-developed-by: Avadhut Naik 
>> Signed-off-by: Avadhut Naik 
>> Signed-off-by: Yazen Ghannam 
> 
> This needs Avadhut's SOB after Yazen's.
> 
Will do. Will change to the below format:

Co-developed-by: Avadhut Naik 
Signed-off-by: Yazen Ghannam 
Signed-off-by: Avadhut Naik 

> Touchups ontop:
> 
> diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
> index 7a15f0ca1bd1..6bbeb29125a9 100644
> --- a/arch/x86/kernel/cpu/mce/apei.c
> +++ b/arch/x86/kernel/cpu/mce/apei.c
> @@ -69,7 +69,7 @@ EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
>  int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 
> lapic_id)
>  {
>   const u64 *i_mce = ((const u64 *) (ctx_info + 1));
> - unsigned int cpu, num_registers;
> + unsigned int cpu, num_regs;
>   struct mce_hw_err err;
>   struct mce *m = &err.m;
>  
> @@ -93,10 +93,10 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx 
> *ctx_info, u64 lapic_id)
>   /*
>* The number of registers in the register array is determined by
>* Register Array Size/8 as defined in UEFI spec v2.8, sec N.2.4.2.2.
> -  * Ensure that the array size includes at least 1 register.
> +  * Sanity-check registers array size.
>*/
> - num_registers = ctx_info->reg_arr_size >> 3;
> - if (!num_registers)
> + num_regs = ctx_info->reg_arr_size >> 3;
> + if (!num_regs)
>   return -EINVAL;
>  
>   mce_setup(m);
> @@ -118,13 +118,12 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx 
> *ctx_info, u64 lapic_id)
>   /*
>* The SMCA register layout is fixed and includes 16 registers.
>* The end of the array may be variable, but the beginning is known.
> -  * Switch on the number of registers. Cap the number of registers to
> -  * expected max (15).
> +  * Cap the number of registers to expected max (15).
>*/
> - if (num_registers > 15)
> - num_registers = 15;
> + if (num_regs > 15)
> + num_regs = 15;
>  
> - switch (num_registers) {
> + switch (num_regs) {
>   /* MCA_SYND2 */
>   case 15:
>   err.vi.amd.synd2 = *(i_mce + 14);
> 
Will incorporate these.

-- 
Thanks,
Avadhut Naik



[PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers

2024-06-26 Thread Naik, Avadhut



On 6/26/2024 06:10, Borislav Petkov wrote:
> On Tue, Jun 25, 2024 at 02:56:22PM -0500, Avadhut Naik wrote:
>> AMD's Scalable MCA systems viz. Genoa will include two new registers:
> 
> "viz."?
> 
Right. Will mention Zen4 instead of Genoa.

> Not a lot of people outside of AMD know what Genoa is. Zen4 is probably a lot
> more widespread.
> 
>> MCA_SYND1 and MCA_SYND2.
>>
>> These registers will include supplemental error information in addition
>> to the existing MCA_SYND register. The data within the registers is
>> considered valid if MCA_STATUS[SyndV] is set.
> 
> From here...
> 
>> Add fields for these registers as vendor-specific error information
>> in struct mce_hw_err. Save and print these registers wherever
>> MCA_STATUS[SyndV]/MCA_SYND is currently used.
>>
>> Also, modify the mce_record tracepoint to export these new registers
>> through __dynamic_array. While the sizeof() operator has been used to
>> determine the size of this __dynamic_array, the same, if needed in the
>> future can be substituted by caching the size of vendor-specific error
>> information as part of struct mce_hw_err.
> 
> ... to here this text explains what the patch does. I guess it is time for my
> boilerplate text again:
> 
> Do not talk about *what* the patch is doing in the commit message - that
> should be obvious from the diff itself. Rather, concentrate on the *why*
> it needs to be done.
> 
> Imagine one fine day you're doing git archeology, you find the place in
> the code about which you want to find out why it was changed the way it 
> is now.
> 
> You do git annotate  ... find the line, see the commit id and
> you do:
> 
> git show 
> 
> You read the commit message and there's just gibberish and nothing's
> explaining *why* that change was done. And you start scratching your
> head, trying to figure out why. Because the damn commit message is worth
> sh*t.
> 
> This happens to us maintainers at least once a week. Well, I don't want
> that to happen in my tree anymore.
> 
> You catch my drift? :)
> 
> So, now, how are those new syndromes going to be used in the tracepoint and
> why do we want them there?
> 
Yes, I catch your drift. Will reword the commit message to explain that the
new syndrome registers are going to be exported through the tracepoint
in a dynamic array, as they are vendor-specific, so that usersapce error
decoding tools can retrieve the supplemental error information within them.

>> Note: Checkpatch warnings/errors are ignored to maintain coding style.
> 
> This goes...
> 
>>
>> [Yazen: Drop Yazen's Co-developed-by tag and moved SoB tag.]
> 
> Yes, you did but now your SOB chain is wrong:
> 
>> Signed-off-by: Avadhut Naik 
>> Signed-off-by: Yazen Ghannam 
> 
> This tells me Avadhut is the author, Yazen handled it and he's sending it to
> me. But nope, he isn't. So it needs another Avadhut SOB underneath.
> 
> Audit all patches pls.
> 
Wasn't aware of this chronology. Thanks for this information!
So, IIUC, the sequence for this patch should be as follows?

Signed-off-by: Avadhut Naik 
Signed-off-by: Yazen Ghannam 
Signed-off-by: Avadhut Naik 

>> ---
> 
> ... right under those three "---" as such notes do not belong in the commit
> message. Remember that for the future.
> 
Okay. Will move the note here.

>>  arch/x86/include/asm/mce.h | 12 
>>  arch/x86/kernel/cpu/mce/amd.c  |  5 -
>>  arch/x86/kernel/cpu/mce/core.c | 24 +---
>>  drivers/edac/mce_amd.c | 10 +++---
>>  include/trace/events/mce.h |  9 +++--
>>  5 files changed, 47 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
>> index e955edb22897..2b43ba37bbda 100644
>> --- a/arch/x86/include/asm/mce.h
>> +++ b/arch/x86/include/asm/mce.h
>> @@ -122,6 +122,9 @@
>>  #define MSR_AMD64_SMCA_MC0_DESTAT   0xc0002008
>>  #define MSR_AMD64_SMCA_MC0_DEADDR   0xc0002009
>>  #define MSR_AMD64_SMCA_MC0_MISC10xc000200a
>> +/* Registers MISC2 to MISC4 are at offsets B to D. */
>> +#define MSR_AMD64_SMCA_MC0_SYND10xc000200e
>> +#define MSR_AMD64_SMCA_MC0_SYND20xc000200f
>>  #define MSR_AMD64_SMCA_MCx_CTL(x)   (MSR_AMD64_SMCA_MC0_CTL + 0x10*(x))
>>  #define MSR_AMD64_SMCA_MCx_STATUS(x)(MSR_AMD64_SMCA_MC0_STATUS + 
>> 0x10*(x))
>>  #define MSR_AMD64_SMCA_MCx_ADDR(x)  (MSR_AMD64_SMCA_MC0_ADDR + 0x10*(x))
>> @@ -132,6 +135,8 @@
>>  #define MSR_AMD64_SMCA_MCx_DESTAT(x)(MSR_AMD64_SMCA_MC0_DESTAT + 
>> 0x10*(x))
>>  #define MSR_AMD64_SMCA_MCx_DEADDR(x)(MSR_AMD64_SMCA_MC0_DEADDR + 
>> 0x10*(x))
>>  #define MSR_AMD64_SMCA_MCx_MISCy(x, y)  ((MSR_AMD64_SMCA_MC0_MISC1 + y) 
>> + (0x10*(x)))
>> +#define MSR_AMD64_SMCA_MCx_SYND1(x) (MSR_AMD64_SMCA_MC0_SYND1 + 0x10*(x))
>> +#define MSR_AMD64_SMCA_MCx_SYND2(x) (MSR_AMD64_SMCA_MC0_SYND2 + 0x10*(x))
>>  
>>  #define XEC(x, mask)(((x) >> 16) & mask)
>>  
>> @@ -189,6 +194,13 @@ enum mce_notifier_prios {
>>  
>>  struct mce_hw_err {
>>  struct 

RE: [PATCH v2 1/4] x86/mce: Add wrapper for struct mce to export vendor specific info

2024-06-26 Thread Luck, Tony
> Tony, any comments? You ok with this, would that fit any Intel-specific vendor
> fields too or do you need some additional Intel-specific changes?

It looks easy enough to add any Intel specific bits to the union later.

Is there anyway that the trace event could be "smarter" about what vendor 
specific
information to include based on boot_cpu_data.x86_vendor? As currently written
Intel systems are going to see 3*u64 decoded into ascii, that are all zero. Not 
a
huge deal, I think it will just look like "0x0,0x0,0x0"

-Tony


Re: [PATCH v7 3/5] regulator: add regulators driver for Marvell 88PM886 PMIC

2024-06-26 Thread Mark Brown
On Fri, May 31, 2024 at 07:34:58PM +0200, Karel Balej wrote:
> Support the LDO and buck regulators of the Marvell 88PM886 PMIC.

Reviewed-by: Mark Brown 


signature.asc
Description: PGP signature


Re: [PATCH] vDPA: add missing MODULE_DESCRIPTION() macros

2024-06-26 Thread Jeff Johnson
On 6/11/2024 12:22 PM, Jeff Johnson wrote:
> With ARCH=x86, make allmodconfig && make W=1 C=1 reports:
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/vdpa/vdpa.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/vdpa/ifcvf/ifcvf.o
> 
> Add the missing invocations of the MODULE_DESCRIPTION() macro.
> 
> Signed-off-by: Jeff Johnson 
> ---
>  drivers/vdpa/ifcvf/ifcvf_main.c | 1 +
>  drivers/vdpa/vdpa.c | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
> index 80d0a0460885..ccf64d7bbfaa 100644
> --- a/drivers/vdpa/ifcvf/ifcvf_main.c
> +++ b/drivers/vdpa/ifcvf/ifcvf_main.c
> @@ -894,4 +894,5 @@ static struct pci_driver ifcvf_driver = {
>  
>  module_pci_driver(ifcvf_driver);
>  
> +MODULE_DESCRIPTION("Intel IFC VF NIC driver for virtio dataplane 
> offloading");
>  MODULE_LICENSE("GPL v2");
> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> index 8d391947eb8d..1ca445e31acb 100644
> --- a/drivers/vdpa/vdpa.c
> +++ b/drivers/vdpa/vdpa.c
> @@ -1538,4 +1538,5 @@ core_initcall(vdpa_init);
>  module_exit(vdpa_exit);
>  
>  MODULE_AUTHOR("Jason Wang ");
> +MODULE_DESCRIPTION("vDPA bus");
>  MODULE_LICENSE("GPL v2");
> 
> ---
> base-commit: 83a7eefedc9b56fe7bfeff13b6c7356688ffa670
> change-id: 20240611-md-drivers-vdpa-391206d17ec3
> 
Following up to see if anything else is needed from me. Hoping to see this in
linux-next so I can remove it from my tracking spreadsheet :)

/jeff



Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-06-26 Thread Richard Cochran
On Tue, Jun 25, 2024 at 11:34:24PM +0200, Thomas Gleixner wrote:

> There is effort underway to expose PTP clocks to user space via
> VDSO.

That sounds interesting.  Has anything been posted?

Thanks,
Richard




[PATCH 1/2] remoteproc: k3-dsp: Fix log levels where appropriate

2024-06-26 Thread Garrett Giordano
Driver was logging information as errors. Changed dev_err to dev_info
where appropriate.

Signed-off-by: Garrett Giordano 
---
 drivers/remoteproc/ti_k3_dsp_remoteproc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/remoteproc/ti_k3_dsp_remoteproc.c 
b/drivers/remoteproc/ti_k3_dsp_remoteproc.c
index 3555b535b168..57d21f8ae3b7 100644
--- a/drivers/remoteproc/ti_k3_dsp_remoteproc.c
+++ b/drivers/remoteproc/ti_k3_dsp_remoteproc.c
@@ -327,7 +327,7 @@ static int k3_dsp_rproc_start(struct rproc *rproc)
goto put_mbox;
}
 
-   dev_err(dev, "booting DSP core using boot addr = 0x%x\n", boot_addr);
+   dev_info(dev, "booting DSP core using boot addr = 0x%x\n", boot_addr);
ret = ti_sci_proc_set_config(kproc->tsp, boot_addr, 0, 0);
if (ret)
goto put_mbox;
-- 
2.25.1




[PATCH 2/2] remoteproc: k3-r5: Fix log levels where appropriate

2024-06-26 Thread Garrett Giordano
Driver was logging information as debug. Changed dev_dbg to dev_info
where appropriate.

Signed-off-by: Garrett Giordano 
---
 drivers/remoteproc/ti_k3_r5_remoteproc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c 
b/drivers/remoteproc/ti_k3_r5_remoteproc.c
index 50e486bcfa10..5821b6517063 100644
--- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
+++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
@@ -558,7 +558,7 @@ static int k3_r5_rproc_start(struct rproc *rproc)
 
boot_addr = rproc->bootaddr;
/* TODO: add boot_addr sanity checking */
-   dev_dbg(dev, "booting R5F core using boot addr = 0x%x\n", boot_addr);
+   dev_info(dev, "booting R5F core using boot addr = 0x%x\n", boot_addr);
 
/* boot vector need not be programmed for Core1 in LockStep mode */
core = kproc->core;
-- 
2.25.1




Re: [PATCH v3 3/5] arm64: dts: qcom: sdx75: update reserved memory regions for mpss

2024-06-26 Thread Konrad Dybcio
On 24.06.2024 1:21 PM, Naina Mehta wrote:
> 
> 
> On 6/18/2024 7:08 PM, Konrad Dybcio wrote:
>>
>>
>> On 6/18/24 15:13, Naina Mehta wrote:
>>> Rename qdss@8880 memory region as qlink_logging memory region
>>> and add qdss_mem memory region at address of 0x8850.
>>> Split mpss_dsmharq_mem region into 2 separate regions and
>>> reduce the size of mpssadsp_mem region.
>>>
>>> Signed-off-by: Naina Mehta 
>>> ---
>>
>> Alright, we're getting somewhere. The commit message should however motivate
>> why such changes are necessary. For all we know, the splitting in two is
>> currently done for no reason, as qdss_mem and qlink_logging_mem are 
>> contiguous
>> - does the firmware have some expectations about them being separate?
>>
> 
> Since different DSM region size is required for different modem firmware, 
> mpss_dsmharq_mem region being split into 2 separate regions.
> This would provide the flexibility to remove the region which is
> not required for a particular platform.
> qlink_logging is being added at the memory region at the address of
> 0x8880 as the region is being used by modem firmware.

Ok, now put that in the commit message :)

And I suppose:

"This would provide the flexibility to remove the region which is not
required for a particular platform." - but you still pass both to the
remoteproc in patch 4. Are these regions mutually exclusive?

Konrad



Re: [PATCH v2 00/13] OMAP mailbox FIFO removal

2024-06-26 Thread Andrew Davis

On 6/26/24 9:39 AM, Dominic Rath wrote:

On 13.06.2024 14:22, Andrew Davis wrote:

We looked into this some time ago, and noticed that the IRQ approach caused 
problems in the virtio/rpmsg code. I'd like to understand if your change was 
for the same reason, or something else we missed before.



It is most likely the same reason. Seems despite its name, rproc_vq_interrupt() 
cannot
be called from an IRQ/atomic context. As the following backtrace shows, that 
function
calls down into functions which are not IRQ safe. So we needed to keep it 
threaded:

[    5.389374] BUG: scheduling while atomic: (udev-worker)/232/0x00010002
[    5.395917] Modules linked in: videobuf2_dma_contig videobuf2_memops 
videobuf2_v4l2 phy_j721e_wiz display_connector omap_mailbox(+) videodev 
tps6594_i2c(+) videobuf2_common phy_can_transceiver at24 cd6
[    5.433562] CPU: 0 PID: 232 Comm: (udev-worker) Not tainted 
6.10.0-rc1-next-20240528-dirty #10
[    5.442158] Hardware name: Texas Instruments AM69 SK (DT)
[    5.447540] Call trace:
[    5.449976]  dump_backtrace+0x94/0xec
[    5.453640]  show_stack+0x18/0x24
[    5.456944]  dump_stack_lvl+0x78/0x90
[    5.460598]  dump_stack+0x18/0x24
[    5.463900]  __schedule_bug+0x50/0x68
[    5.467552]  __schedule+0x80c/0xb0c
[    5.471029]  schedule+0x34/0x104
[    5.474243]  schedule_preempt_disabled+0x24/0x40
[    5.478845]  rwsem_down_write_slowpath+0x31c/0x56c
[    5.483622]  down_write+0x90/0x94
[    5.486924]  kernfs_add_one+0x3c/0x148
[    5.490661]  kernfs_create_dir_ns+0x50/0x94
[    5.494830]  sysfs_create_dir_ns+0x70/0x10c
[    5.498999]  kobject_add_internal+0x98/0x26c
[    5.503254]  kobject_add+0x9c/0x10c
[    5.506729]  device_add+0xc0/0x790
[    5.510120]  rpmsg_register_device_override+0x10c/0x1c0
[    5.515333]  rpmsg_register_device+0x14/0x20
[    5.519590]  virtio_rpmsg_create_channel+0xb0/0x104
[    5.524452]  rpmsg_create_channel+0x28/0x60
[    5.528622]  rpmsg_ns_cb+0x100/0x1dc
[    5.532185]  rpmsg_recv_done+0x114/0x2e4
[    5.536094]  vring_interrupt+0x68/0xa4
[    5.539833]  rproc_vq_interrupt+0x2c/0x48
[    5.543830]  k3_r5_rproc_mbox_callback+0x84/0x90 [ti_k3_r5_remoteproc]
[    5.550348]  mbox_chan_received_data+0x1c/0x2c
[    5.554779]  mbox_interrupt+0xa0/0x17c [omap_mailbox]
[    5.559820]  __handle_irq_event_percpu+0x48/0x13c
[    5.564511]  handle_irq_event+0x4c/0xac



I looked into this a bit more closely, together with the colleague who 
implemented our internal workaround, and we came up with one more concern:

Have you considered that this synchronous path from the (threaded) IRQ draining 
the mailbox down to the (potentially blocking) rpmsg_* calls might let the 
hardware mailbox grow full?

 From what I remember the vring (?) has room for 512 messages, but the hardware 
mailbox on e.g. the AM64x can only handle four messages. At least with the 
current implementation on TI's MCU+ SDK running on the R5f that could cause the 
R5f to block, waiting for room in the hardware mailbox, while there are plenty 
of vring buffers available.



We would like to switch back to the non-threaded handler at some point. As you 
mention doing this
in a threaded way increase the risk of the hardware message queue filling and 
blocking the remote side.
(Plus the threaded handling can add latency to the message handling which 
should be avoided for real-time
reasons)

The fix might be to have rpmsg_recv_done() callback simply queue the message 
and then schedule another
worker to actually do the next stage processing. That way complex actions on 
messages do not block
vring_interrupt() which should be treated like an atomic context call.

Anyway for now, I'd expect the much faster host core (2xA53 @ 1GHZ in AM64) to 
be able to outpace the
R5s and keep the mailbox drained. Are you actually running into this issue or 
is the concern based on
ensuring RT performance(not blocking on mailbox queue filled) on the R5 side?

Andrew


Best Regards,

Dominic




Re: [PATCH v2 00/13] OMAP mailbox FIFO removal

2024-06-26 Thread Dominic Rath

On 13.06.2024 14:22, Andrew Davis wrote:
We looked into this some time ago, and noticed that the IRQ approach 
caused problems in the virtio/rpmsg code. I'd like to understand if 
your change was for the same reason, or something else we missed before.




It is most likely the same reason. Seems despite its name, 
rproc_vq_interrupt() cannot
be called from an IRQ/atomic context. As the following backtrace shows, 
that function
calls down into functions which are not IRQ safe. So we needed to keep 
it threaded:


[    5.389374] BUG: scheduling while atomic: (udev-worker)/232/0x00010002
[    5.395917] Modules linked in: videobuf2_dma_contig videobuf2_memops 
videobuf2_v4l2 phy_j721e_wiz display_connector omap_mailbox(+) videodev 
tps6594_i2c(+) videobuf2_common phy_can_transceiver at24 cd6
[    5.433562] CPU: 0 PID: 232 Comm: (udev-worker) Not tainted 
6.10.0-rc1-next-20240528-dirty #10

[    5.442158] Hardware name: Texas Instruments AM69 SK (DT)
[    5.447540] Call trace:
[    5.449976]  dump_backtrace+0x94/0xec
[    5.453640]  show_stack+0x18/0x24
[    5.456944]  dump_stack_lvl+0x78/0x90
[    5.460598]  dump_stack+0x18/0x24
[    5.463900]  __schedule_bug+0x50/0x68
[    5.467552]  __schedule+0x80c/0xb0c
[    5.471029]  schedule+0x34/0x104
[    5.474243]  schedule_preempt_disabled+0x24/0x40
[    5.478845]  rwsem_down_write_slowpath+0x31c/0x56c
[    5.483622]  down_write+0x90/0x94
[    5.486924]  kernfs_add_one+0x3c/0x148
[    5.490661]  kernfs_create_dir_ns+0x50/0x94
[    5.494830]  sysfs_create_dir_ns+0x70/0x10c
[    5.498999]  kobject_add_internal+0x98/0x26c
[    5.503254]  kobject_add+0x9c/0x10c
[    5.506729]  device_add+0xc0/0x790
[    5.510120]  rpmsg_register_device_override+0x10c/0x1c0
[    5.515333]  rpmsg_register_device+0x14/0x20
[    5.519590]  virtio_rpmsg_create_channel+0xb0/0x104
[    5.524452]  rpmsg_create_channel+0x28/0x60
[    5.528622]  rpmsg_ns_cb+0x100/0x1dc
[    5.532185]  rpmsg_recv_done+0x114/0x2e4
[    5.536094]  vring_interrupt+0x68/0xa4
[    5.539833]  rproc_vq_interrupt+0x2c/0x48
[    5.543830]  k3_r5_rproc_mbox_callback+0x84/0x90 [ti_k3_r5_remoteproc]
[    5.550348]  mbox_chan_received_data+0x1c/0x2c
[    5.554779]  mbox_interrupt+0xa0/0x17c [omap_mailbox]
[    5.559820]  __handle_irq_event_percpu+0x48/0x13c
[    5.564511]  handle_irq_event+0x4c/0xac



I looked into this a bit more closely, together with the colleague who 
implemented our internal workaround, and we came up with one more concern:


Have you considered that this synchronous path from the (threaded) IRQ 
draining the mailbox down to the (potentially blocking) rpmsg_* calls 
might let the hardware mailbox grow full?


From what I remember the vring (?) has room for 512 messages, but the 
hardware mailbox on e.g. the AM64x can only handle four messages. At 
least with the current implementation on TI's MCU+ SDK running on the 
R5f that could cause the R5f to block, waiting for room in the hardware 
mailbox, while there are plenty of vring buffers available.


Best Regards,

Dominic



Re: [PATCH v2] filemap: add trace events for get_pages, map_pages, and fault

2024-06-26 Thread Steven Rostedt
On Wed, 26 Jun 2024 21:31:57 +0900
Masami Hiramatsu (Google)  wrote:

> On Thu, 20 Jun 2024 16:19:03 +
> Takaya Saeki  wrote:
> 
> > To allow precise tracking of page caches accessed, add new tracepoints
> > that trigger when a process actually accesses them.
> > 
> > The ureadahead program used by ChromeOS traces the disk access of
> > programs as they start up at boot up. It uses mincore(2) or the
> > 'mm_filemap_add_to_page_cache' trace event to accomplish this. It stores
> > this information in a "pack" file and on subsequent boots, it will read
> > the pack file and call readahead(2) on the information so that disk
> > storage can be loaded into RAM before the applications actually need it.
> > 
> > A problem we see is that due to the kernel's readahead algorithm that
> > can aggressively pull in more data than needed (to try and accomplish
> > the same goal) and this data is also recorded. The end result is that
> > the pack file contains a lot of pages on disk that are never actually
> > used. Calling readahead(2) on these unused pages can slow down the
> > system boot up times.
> > 
> > To solve this, add 3 new trace events, get_pages, map_pages, and fault.
> > These will be used to trace the pages are not only pulled in from disk,
> > but are actually used by the application. Only those pages will be
> > stored in the pack file, and this helps out the performance of boot up.
> > 
> > With the combination of these 3 new trace events and
> > mm_filemap_add_to_page_cache, we observed a reduction in the pack file
> > by 7.3% - 20% on ChromeOS varying by device.
> >   
> 
> This looks good to me from the trace-event point of view.
> 
> Reviewed-by: Masami Hiramatsu (Google) 

I added my reviewed-by on the last patch, you could have added it on
this one as it didn't change as much. But anyway, here it is again:

Reviewed-by: Steven Rostedt (Google) 

-- Steve



Re: [PATCH 0/3] Introduce msm8916 based LG devices

2024-06-26 Thread Rob Herring (Arm)


On Sun, 23 Jun 2024 14:26:29 +0500, Nikita Travkin wrote:
> This series introduces two msm8916-based LG devices:
> 
> - LG Leon LTE (c50)
> - LG LG K10 (m216)
> 
> The devices only have basic support for now.
> 
> Signed-off-by: Nikita Travkin 
> ---
> Anton Bambura (1):
>   arm64: dts: qcom: msm8916-lg-c50: add initial dts for LG Leon LTE
> 
> Cristian Cozzolino (1):
>   arm64: dts: qcom: msm8916-lg-m216: Add initial device tree
> 
> Nikita Travkin (1):
>   dt-bindings: arm: qcom: Add msm8916 based LG devices
> 
>  Documentation/devicetree/bindings/arm/qcom.yaml |   2 +
>  arch/arm64/boot/dts/qcom/Makefile   |   2 +
>  arch/arm64/boot/dts/qcom/msm8916-lg-c50.dts | 140 +
>  arch/arm64/boot/dts/qcom/msm8916-lg-m216.dts| 251 
> 
>  4 files changed, 395 insertions(+)
> ---
> base-commit: f76698bd9a8ca01d3581236082d786e9a6b72bb7
> change-id: 20240621-msm8916-lg-initial-8d4a399ec3c2
> 
> Best regards,
> --
> Nikita Travkin 
> 
> 
> 


My bot found new DTB warnings on the .dts files added or changed in this
series.

Some warnings may be from an existing SoC .dtsi. Or perhaps the warnings
are fixed by another series. Ultimately, it is up to the platform
maintainer whether these warnings are acceptable or not. No need to reply
unless the platform maintainer has comments.

If you already ran DT checks and didn't see these error(s), then
make sure dt-schema is up to date:

  pip3 install dtschema --upgrade


New warnings running 'make CHECK_DTBS=y qcom/msm8916-lg-c50.dtb 
qcom/msm8916-lg-m216.dtb' for 
20240623-msm8916-lg-initial-v1-0-6fbcf714d...@trvn.ru:

arch/arm64/boot/dts/qcom/msm8916-lg-c50.dtb: /soc@0/audio-codec@771c000: failed 
to match any schema with compatible: ['qcom,msm8916-wcd-digital-codec']
arch/arm64/boot/dts/qcom/msm8916-lg-m216.dtb: /soc@0/audio-codec@771c000: 
failed to match any schema with compatible: ['qcom,msm8916-wcd-digital-codec']
arch/arm64/boot/dts/qcom/msm8916-lg-m216.dtb: 
/soc@0/i2c@78b9000/touchscreen@34: failed to match any schema with compatible: 
['melfas,mip4_ts']
arch/arm64/boot/dts/qcom/msm8916-lg-c50.dtb: /soc@0/power-manager@b088000: 
failed to match any schema with compatible: ['qcom,msm8916-acc']
arch/arm64/boot/dts/qcom/msm8916-lg-c50.dtb: /soc@0/power-manager@b098000: 
failed to match any schema with compatible: ['qcom,msm8916-acc']
arch/arm64/boot/dts/qcom/msm8916-lg-c50.dtb: /soc@0/power-manager@b0a8000: 
failed to match any schema with compatible: ['qcom,msm8916-acc']
arch/arm64/boot/dts/qcom/msm8916-lg-c50.dtb: /soc@0/power-manager@b0b8000: 
failed to match any schema with compatible: ['qcom,msm8916-acc']
arch/arm64/boot/dts/qcom/msm8916-lg-m216.dtb: /soc@0/power-manager@b088000: 
failed to match any schema with compatible: ['qcom,msm8916-acc']
arch/arm64/boot/dts/qcom/msm8916-lg-m216.dtb: /soc@0/power-manager@b098000: 
failed to match any schema with compatible: ['qcom,msm8916-acc']
arch/arm64/boot/dts/qcom/msm8916-lg-m216.dtb: /soc@0/power-manager@b0a8000: 
failed to match any schema with compatible: ['qcom,msm8916-acc']
arch/arm64/boot/dts/qcom/msm8916-lg-m216.dtb: /soc@0/power-manager@b0b8000: 
failed to match any schema with compatible: ['qcom,msm8916-acc']








Re: [PATCH -fixes] riscv: patch: Flush the icache right after patching to avoid illegal insns

2024-06-26 Thread Conor Dooley
On Mon, Jun 24, 2024 at 10:21:41AM +0200, Alexandre Ghiti wrote:
> We cannot delay the icache flush after patching some functions as we may
> have patched a function that will get called before the icache flush.
> 
> The only way to completely avoid such scenario is by flushing the icache
> as soon as we patch a function. This will probably be costly as we don't
> batch the icache maintenance anymore.
> 
> Fixes: 6ca445d8af0e ("riscv: Fix early ftrace nop patching")
> Reported-by: Conor Dooley 
> Closes: 
> https://lore.kernel.org/linux-riscv/20240613-lubricant-breath-061192a9489a@wendy/
> Signed-off-by: Alexandre Ghiti 

So, I can't even test this right now :/ The issue is annoying enough to
reproduce that same config + compiler + commit isn't enough.


signature.asc
Description: PGP signature


Re: [PATCH v2] filemap: add trace events for get_pages, map_pages, and fault

2024-06-26 Thread Google
On Thu, 20 Jun 2024 16:19:03 +
Takaya Saeki  wrote:

> To allow precise tracking of page caches accessed, add new tracepoints
> that trigger when a process actually accesses them.
> 
> The ureadahead program used by ChromeOS traces the disk access of
> programs as they start up at boot up. It uses mincore(2) or the
> 'mm_filemap_add_to_page_cache' trace event to accomplish this. It stores
> this information in a "pack" file and on subsequent boots, it will read
> the pack file and call readahead(2) on the information so that disk
> storage can be loaded into RAM before the applications actually need it.
> 
> A problem we see is that due to the kernel's readahead algorithm that
> can aggressively pull in more data than needed (to try and accomplish
> the same goal) and this data is also recorded. The end result is that
> the pack file contains a lot of pages on disk that are never actually
> used. Calling readahead(2) on these unused pages can slow down the
> system boot up times.
> 
> To solve this, add 3 new trace events, get_pages, map_pages, and fault.
> These will be used to trace the pages are not only pulled in from disk,
> but are actually used by the application. Only those pages will be
> stored in the pack file, and this helps out the performance of boot up.
> 
> With the combination of these 3 new trace events and
> mm_filemap_add_to_page_cache, we observed a reduction in the pack file
> by 7.3% - 20% on ChromeOS varying by device.
> 

This looks good to me from the trace-event point of view.

Reviewed-by: Masami Hiramatsu (Google) 

Thanks!

> Signed-off-by: Takaya Saeki 
> ---
> Changelog between v2 and v1
> - Fix a file offset type usage by casting pgoff_t to loff_t
> - Fixed format string of dev and inode
> 
>  include/trace/events/filemap.h | 84 ++
>  mm/filemap.c   |  4 ++
>  2 files changed, 88 insertions(+)
> 
> V1:https://lore.kernel.org/all/20240618093656.1944210-1-taka...@chromium.org/
> 
> diff --git a/include/trace/events/filemap.h b/include/trace/events/filemap.h
> index 46c89c1e460c..3a94bd633bf0 100644
> --- a/include/trace/events/filemap.h
> +++ b/include/trace/events/filemap.h
> @@ -56,6 +56,90 @@ DEFINE_EVENT(mm_filemap_op_page_cache, 
> mm_filemap_add_to_page_cache,
>   TP_ARGS(folio)
>   );
>  
> +DECLARE_EVENT_CLASS(mm_filemap_op_page_cache_range,
> +
> + TP_PROTO(
> + struct address_space *mapping,
> + pgoff_t index,
> + pgoff_t last_index
> + ),
> +
> + TP_ARGS(mapping, index, last_index),
> +
> + TP_STRUCT__entry(
> + __field(unsigned long, i_ino)
> + __field(dev_t, s_dev)
> + __field(unsigned long, index)
> + __field(unsigned long, last_index)
> + ),
> +
> + TP_fast_assign(
> + __entry->i_ino = mapping->host->i_ino;
> + if (mapping->host->i_sb)
> + __entry->s_dev =
> + mapping->host->i_sb->s_dev;
> + else
> + __entry->s_dev = mapping->host->i_rdev;
> + __entry->index = index;
> + __entry->last_index = last_index;
> + ),
> +
> + TP_printk(
> + "dev=%d:%d ino=%lx ofs=%lld max_ofs=%lld",
> + MAJOR(__entry->s_dev),
> + MINOR(__entry->s_dev), __entry->i_ino,
> + ((loff_t)__entry->index) << PAGE_SHIFT,
> + ((loff_t)__entry->last_index) << PAGE_SHIFT
> + )
> +);
> +
> +DEFINE_EVENT(mm_filemap_op_page_cache_range, mm_filemap_get_pages,
> + TP_PROTO(
> + struct address_space *mapping,
> + pgoff_t index,
> + pgoff_t last_index
> + ),
> + TP_ARGS(mapping, index, last_index)
> +);
> +
> +DEFINE_EVENT(mm_filemap_op_page_cache_range, mm_filemap_map_pages,
> + TP_PROTO(
> + struct address_space *mapping,
> + pgoff_t index,
> + pgoff_t last_index
> + ),
> + TP_ARGS(mapping, index, last_index)
> +);
> +
> +TRACE_EVENT(mm_filemap_fault,
> + TP_PROTO(struct address_space *mapping, pgoff_t index),
> +
> + TP_ARGS(mapping, index),
> +
> + TP_STRUCT__entry(
> + __field(unsigned long, i_ino)
> + __field(dev_t, s_dev)
> + __field(unsigned long, index)
> + ),
> +
> + TP_fast_assign(
> + __entry->i_ino = mapping->host->i_ino;
> + if (mapping->host->i_sb)
> + __entry->s_dev =
> + mapping->host->i_sb->s_dev;
> + else
> + __entry->s_dev = mapping->host->i_rdev;
> + __entry->index = index;
> + ),
> +
> + TP_printk(
> + "dev=%d:%d ino=%lx ofs=%lld",
> + MAJOR(__entry->s_dev),
> + MINOR(__entry->s_dev), __entry->i_ino,
> + ((loff_t)__entry->index) << PAGE_SHIFT
> + )
> +);
> +
>  TRACE_EVENT(filemap_set_wb_err,
>  

[PATCH net-next v3 3/3] test/vsock: add ioctl unsent bytes test

2024-06-26 Thread Luigi Leonardi via B4 Relay
From: Luigi Leonardi 

Introduce two tests, one for SOCK_STREAM and one for SOCK_SEQPACKET, which 
checks
after a packet is delivered, that the number of unsent bytes is zero,
using ioctl SIOCOUTQ.

Signed-off-by: Luigi Leonardi 
---
 tools/testing/vsock/util.c   |  6 +--
 tools/testing/vsock/util.h   |  3 ++
 tools/testing/vsock/vsock_test.c | 85 
 3 files changed, 91 insertions(+), 3 deletions(-)

diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
index 554b290fefdc..a3d448a075e3 100644
--- a/tools/testing/vsock/util.c
+++ b/tools/testing/vsock/util.c
@@ -139,7 +139,7 @@ int vsock_bind_connect(unsigned int cid, unsigned int port, 
unsigned int bind_po
 }
 
 /* Connect to  and return the file descriptor. */
-static int vsock_connect(unsigned int cid, unsigned int port, int type)
+int vsock_connect(unsigned int cid, unsigned int port, int type)
 {
union {
struct sockaddr sa;
@@ -226,8 +226,8 @@ static int vsock_listen(unsigned int cid, unsigned int 
port, int type)
 /* Listen on  and return the first incoming connection.  The remote
  * address is stored to clientaddrp.  clientaddrp may be NULL.
  */
-static int vsock_accept(unsigned int cid, unsigned int port,
-   struct sockaddr_vm *clientaddrp, int type)
+int vsock_accept(unsigned int cid, unsigned int port,
+struct sockaddr_vm *clientaddrp, int type)
 {
union {
struct sockaddr sa;
diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
index e95e62485959..fff22d4a14c0 100644
--- a/tools/testing/vsock/util.h
+++ b/tools/testing/vsock/util.h
@@ -39,6 +39,9 @@ struct test_case {
 void init_signals(void);
 unsigned int parse_cid(const char *str);
 unsigned int parse_port(const char *str);
+int vsock_connect(unsigned int cid, unsigned int port, int type);
+int vsock_accept(unsigned int cid, unsigned int port,
+struct sockaddr_vm *clientaddrp, int type);
 int vsock_stream_connect(unsigned int cid, unsigned int port);
 int vsock_bind_connect(unsigned int cid, unsigned int port,
   unsigned int bind_port, int type);
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index f851f8961247..76bd17b4b291 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -20,6 +20,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "vsock_test_zerocopy.h"
 #include "timeout.h"
@@ -1238,6 +1240,79 @@ static void test_double_bind_connect_client(const struct 
test_opts *opts)
}
 }
 
+#define MSG_BUF_IOCTL_LEN 64
+static void test_unsent_bytes_server(const struct test_opts *opts, int type)
+{
+   unsigned char buf[MSG_BUF_IOCTL_LEN];
+   int client_fd;
+
+   client_fd = vsock_accept(VMADDR_CID_ANY, 1234, NULL, type);
+   if (client_fd < 0) {
+   perror("accept");
+   exit(EXIT_FAILURE);
+   }
+
+   recv_buf(client_fd, buf, sizeof(buf), 0, sizeof(buf));
+   control_writeln("RECEIVED");
+
+   close(client_fd);
+}
+
+static void test_unsent_bytes_client(const struct test_opts *opts, int type)
+{
+   unsigned char buf[MSG_BUF_IOCTL_LEN];
+   int ret, fd, sock_bytes_unsent;
+
+   fd = vsock_connect(opts->peer_cid, 1234, type);
+   if (fd < 0) {
+   perror("connect");
+   exit(EXIT_FAILURE);
+   }
+
+   for (int i = 0; i < sizeof(buf); i++)
+   buf[i] = rand() & 0xFF;
+
+   send_buf(fd, buf, sizeof(buf), 0, sizeof(buf));
+   control_expectln("RECEIVED");
+
+   ret = ioctl(fd, SIOCOUTQ, &sock_bytes_unsent);
+   if (ret < 0) {
+   if (errno == EOPNOTSUPP) {
+   fprintf(stderr, "Test skipped\n");
+   } else {
+   perror("ioctl");
+   exit(EXIT_FAILURE);
+   }
+   } else if (ret == 0 && sock_bytes_unsent != 0) {
+   fprintf(stderr,
+   "Unexpected 'SIOCOUTQ' value, expected 0, got %i\n",
+   sock_bytes_unsent);
+   exit(EXIT_FAILURE);
+   }
+
+   close(fd);
+}
+
+static void test_stream_unsent_bytes_client(const struct test_opts *opts)
+{
+   test_unsent_bytes_client(opts, SOCK_STREAM);
+}
+
+static void test_stream_unsent_bytes_server(const struct test_opts *opts)
+{
+   test_unsent_bytes_server(opts, SOCK_STREAM);
+}
+
+static void test_seqpacket_unsent_bytes_client(const struct test_opts *opts)
+{
+   test_unsent_bytes_client(opts, SOCK_SEQPACKET);
+}
+
+static void test_seqpacket_unsent_bytes_server(const struct test_opts *opts)
+{
+   test_unsent_bytes_server(opts, SOCK_SEQPACKET);
+}
+
 #define RCVLOWAT_CREDIT_UPD_BUF_SIZE   (1024 * 128)
 /* This define is the same as in 'include/linux/virtio_vsock.h':
  * it is used to decide when to send credit update message during
@@ -1523,6 +1598,16 

[PATCH net-next v3 2/3] vsock/virtio: add SIOCOUTQ support for all virtio based transports

2024-06-26 Thread Luigi Leonardi via B4 Relay
From: Luigi Leonardi 

Introduce support for stream_bytes_unsent and seqpacket_bytes_unsent
ioctl for virtio_transport, vhost_vsock and vsock_loopback.

For all transports the unsent bytes counter is incremented
in virtio_transport_get_credit.

In the virtio_transport (G2H) the counter is decremented each
time the host notifies the guest that it consumed the skbuffs.
In vhost-vsock (H2G) the counter is decremented after the skbuff
is queued in the virtqueue.
In vsock_loopback the counter is decremented after the skbuff is
dequeued.

Signed-off-by: Luigi Leonardi 
---
 drivers/vhost/vsock.c   |  4 +++-
 include/linux/virtio_vsock.h|  7 +++
 net/vmw_vsock/virtio_transport.c|  4 +++-
 net/vmw_vsock/virtio_transport_common.c | 35 +
 net/vmw_vsock/vsock_loopback.c  |  7 +++
 5 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index ec20ecff85c7..dba8b3ea37bf 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -244,7 +244,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
restart_tx = true;
}
 
-   consume_skb(skb);
+   virtio_transport_consume_skb_sent(skb, true);
}
} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
if (added)
@@ -451,6 +451,8 @@ static struct virtio_transport vhost_transport = {
.notify_buffer_size   = virtio_transport_notify_buffer_size,
.notify_set_rcvlowat  = 
virtio_transport_notify_set_rcvlowat,
 
+   .unsent_bytes = virtio_transport_bytes_unsent,
+
.read_skb = virtio_transport_read_skb,
},
 
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index c82089dee0c8..e74c12878213 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -134,6 +134,8 @@ struct virtio_vsock_sock {
u32 peer_fwd_cnt;
u32 peer_buf_alloc;
 
+   size_t bytes_unsent;
+
/* Protected by rx_lock */
u32 fwd_cnt;
u32 last_fwd_cnt;
@@ -193,6 +195,11 @@ s64 virtio_transport_stream_has_data(struct vsock_sock 
*vsk);
 s64 virtio_transport_stream_has_space(struct vsock_sock *vsk);
 u32 virtio_transport_seqpacket_has_data(struct vsock_sock *vsk);
 
+size_t virtio_transport_bytes_unsent(struct vsock_sock *vsk);
+
+void virtio_transport_consume_skb_sent(struct sk_buff *skb,
+  bool consume);
+
 int virtio_transport_do_socket_init(struct vsock_sock *vsk,
 struct vsock_sock *psk);
 int
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 43d405298857..fc62d2818c2c 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -311,7 +311,7 @@ static void virtio_transport_tx_work(struct work_struct 
*work)
 
virtqueue_disable_cb(vq);
while ((skb = virtqueue_get_buf(vq, &len)) != NULL) {
-   consume_skb(skb);
+   virtio_transport_consume_skb_sent(skb, true);
added = true;
}
} while (!virtqueue_enable_cb(vq));
@@ -540,6 +540,8 @@ static struct virtio_transport virtio_transport = {
.notify_buffer_size   = virtio_transport_notify_buffer_size,
.notify_set_rcvlowat  = 
virtio_transport_notify_set_rcvlowat,
 
+   .unsent_bytes = virtio_transport_bytes_unsent,
+
.read_skb = virtio_transport_read_skb,
},
 
diff --git a/net/vmw_vsock/virtio_transport_common.c 
b/net/vmw_vsock/virtio_transport_common.c
index 16ff976a86e3..3a7fa36f306b 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -463,6 +463,26 @@ void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock 
*vvs, struct sk_buff *
 }
 EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);
 
+void virtio_transport_consume_skb_sent(struct sk_buff *skb, bool consume)
+{
+   struct sock *s = skb->sk;
+
+   if (s && skb->len) {
+   struct vsock_sock *vs = vsock_sk(s);
+   struct virtio_vsock_sock *vvs;
+
+   vvs = vs->trans;
+
+   spin_lock_bh(&vvs->tx_lock);
+   vvs->bytes_unsent -= skb->len;
+   spin_unlock_bh(&vvs->tx_lock);
+   }
+
+   if (consume)
+   consume_skb(skb);
+}
+EXPORT_SYMBOL_GPL(virtio_transport_consume_skb_sent);
+
 u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)
 {
u32 ret;
@@ -475,6 +495,7 @@ u32 virtio_transport_get_credit(struct virtio_vsock_sock 
*vvs, u32 credit)
if (ret > credit)
ret = credit;
vvs->tx_cnt += ret;
+   vvs->bytes_unsent += ret;
sp

[PATCH net-next v3 1/3] vsock: add support for SIOCOUTQ ioctl for all vsock socket types.

2024-06-26 Thread Luigi Leonardi via B4 Relay
From: Luigi Leonardi 

Add support for ioctl(s) for SOCK_STREAM SOCK_SEQPACKET and SOCK_DGRAM
in AF_VSOCK.
The only ioctl available is SIOCOUTQ/TIOCOUTQ, which returns the number
of unsent bytes in the socket. This information is transport-specific
and is delegated to them using a callback.

Suggested-by: Daan De Meyer 
Signed-off-by: Luigi Leonardi 
---
 include/net/af_vsock.h   |  3 +++
 net/vmw_vsock/af_vsock.c | 60 +---
 2 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 535701efc1e5..7b5375ae7827 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -169,6 +169,9 @@ struct vsock_transport {
void (*notify_buffer_size)(struct vsock_sock *, u64 *);
int (*notify_set_rcvlowat)(struct vsock_sock *vsk, int val);
 
+   /* SIOCOUTQ ioctl */
+   size_t (*unsent_bytes)(struct vsock_sock *vsk);
+
/* Shutdown. */
int (*shutdown)(struct vsock_sock *, int);
 
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 4b040285aa78..d6140d73d122 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -112,6 +112,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
 static void vsock_sk_destruct(struct sock *sk);
@@ -1292,6 +1293,59 @@ int vsock_dgram_recvmsg(struct socket *sock, struct 
msghdr *msg,
 }
 EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg);
 
+static int vsock_do_ioctl(struct socket *sock, unsigned int cmd,
+ int __user *arg)
+{
+   struct sock *sk = sock->sk;
+   struct vsock_sock *vsk;
+   int retval;
+
+   vsk = vsock_sk(sk);
+
+   switch (cmd) {
+   case SIOCOUTQ: {
+   size_t n_bytes;
+
+   if (!vsk->transport || !vsk->transport->unsent_bytes) {
+   retval = -EOPNOTSUPP;
+   break;
+   }
+
+   if (vsk->transport->unsent_bytes) {
+   if (sock_type_connectible(sk->sk_type) && sk->sk_state 
== TCP_LISTEN) {
+   retval = -EINVAL;
+   break;
+   }
+
+   n_bytes = vsk->transport->unsent_bytes(vsk);
+   if (n_bytes < 0) {
+   retval = n_bytes;
+   break;
+   }
+
+   retval = put_user(n_bytes, arg);
+   }
+   break;
+   }
+   default:
+   retval = -ENOIOCTLCMD;
+   }
+
+   return retval;
+}
+
+static int vsock_ioctl(struct socket *sock, unsigned int cmd,
+  unsigned long arg)
+{
+   int ret;
+
+   lock_sock(sock->sk);
+   ret = vsock_do_ioctl(sock, cmd, (int __user *)arg);
+   release_sock(sock->sk);
+
+   return ret;
+}
+
 static const struct proto_ops vsock_dgram_ops = {
.family = PF_VSOCK,
.owner = THIS_MODULE,
@@ -1302,7 +1356,7 @@ static const struct proto_ops vsock_dgram_ops = {
.accept = sock_no_accept,
.getname = vsock_getname,
.poll = vsock_poll,
-   .ioctl = sock_no_ioctl,
+   .ioctl = vsock_ioctl,
.listen = sock_no_listen,
.shutdown = vsock_shutdown,
.sendmsg = vsock_dgram_sendmsg,
@@ -2286,7 +2340,7 @@ static const struct proto_ops vsock_stream_ops = {
.accept = vsock_accept,
.getname = vsock_getname,
.poll = vsock_poll,
-   .ioctl = sock_no_ioctl,
+   .ioctl = vsock_ioctl,
.listen = vsock_listen,
.shutdown = vsock_shutdown,
.setsockopt = vsock_connectible_setsockopt,
@@ -2308,7 +2362,7 @@ static const struct proto_ops vsock_seqpacket_ops = {
.accept = vsock_accept,
.getname = vsock_getname,
.poll = vsock_poll,
-   .ioctl = sock_no_ioctl,
+   .ioctl = vsock_ioctl,
.listen = vsock_listen,
.shutdown = vsock_shutdown,
.setsockopt = vsock_connectible_setsockopt,

-- 
2.45.2





[PATCH net-next v3 0/3] ioctl support for AF_VSOCK and virtio-based transports

2024-06-26 Thread Luigi Leonardi via B4 Relay
This patch series introduce the support for ioctl(s) in AF_VSOCK.
The only ioctl currently available is SIOCOUTQ, which returns
the number of unsent or unacked packets. It is available for
SOCK_STREAM, SOCK_SEQPACKET and SOCK_DGRAM.

As this information is transport-dependent, a new optional callback
is introduced: unsent_bytes.

The first patch add ioctl support in AF_VSOCK, while the second
patch introduce support for SOCK_STREAM and SOCK_SEQPACKET
in all virtio-based transports: virtio_transport (G2H),
vhost-vsock (H2G) and vsock-loopback.

The latest patch introduce two tests for this new feature.
More details can be found in each patch changelog.

v2->v3
Applied all reviewers' suggetions:
- Minor style and code changes
- atomic_int replaced with an existing spin_lock.
Introduced lock_sock on ioctl call.
Rebased to latest net-next

v1->v2
Applied all Stefano's suggestions:
- vsock_do_ioctl has been rewritten
- ioctl(SIOCOUTQ) test is skipped when it is not supported
- Minor variable/function name changes
- rebased to latest net-next

Signed-off-by: Luigi Leonardi 
---
Luigi Leonardi (3):
  vsock: add support for SIOCOUTQ ioctl for all vsock socket types.
  vsock/virtio: add SIOCOUTQ support for all virtio based transports
  test/vsock: add ioctl unsent bytes test

 drivers/vhost/vsock.c   |  4 +-
 include/linux/virtio_vsock.h|  7 +++
 include/net/af_vsock.h  |  3 ++
 net/vmw_vsock/af_vsock.c| 60 +--
 net/vmw_vsock/virtio_transport.c|  4 +-
 net/vmw_vsock/virtio_transport_common.c | 35 ++
 net/vmw_vsock/vsock_loopback.c  |  7 +++
 tools/testing/vsock/util.c  |  6 +--
 tools/testing/vsock/util.h  |  3 ++
 tools/testing/vsock/vsock_test.c| 85 +
 10 files changed, 206 insertions(+), 8 deletions(-)
---
base-commit: 50b70845fc5c22cf7e7d25b57d57b3dca1725aa5
change-id: 20240626-ioctl_next-fbbcdd596063

Best regards,
-- 
Luigi Leonardi 





Re: [PATCH v2 4/4] EDAC/mce_amd: Add support for FRU Text in MCA

2024-06-26 Thread Borislav Petkov
On Tue, Jun 25, 2024 at 02:56:24PM -0500, Avadhut Naik wrote:
> From: Yazen Ghannam 
> 
> A new "FRU Text in MCA" feature is defined where the Field Replaceable
> Unit (FRU) Text for a device is represented by a string in the new
> MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
> bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
> 
> The FRU Text is populated dynamically for each individual error state
> (MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank
> covers multiple devices, for example, a Unified Memory Controller (UMC)
> bank that manages two DIMMs.
> 

>From here...

> Print the FRU Text string, if available, when decoding an MCA error.
> 
> Also, add field for MCA_CONFIG MSR in struct mce_hw_err as vendor specific
> error information and save the value of the MSR. The very value can then be
> exported through tracepoint for userspace tools like rasdaemon to print FRU
> Text, if available.
> 
>  Note: Checkpatch checks/warnings are ignored to maintain coding style.

... to here goes into the trash can except what MCA_CONFIG is for being logged
as part of the error.

> [Yazen: Add Avadhut as co-developer for wrapper changes. ]
> 
> Co-developed-by: Avadhut Naik 
> Signed-off-by: Avadhut Naik 
> Signed-off-by: Yazen Ghannam 

Ditto as for patch 3.

> ---

> @@ -853,8 +850,18 @@ amd_decode_mce(struct notifier_block *nb, unsigned long 
> val, void *data)
>  
>   if (m->status & MCI_STATUS_SYNDV) {
>   pr_cont(", Syndrome: 0x%016llx\n", m->synd);
> - pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 
> 0x%016llx",
> -  err->vi.amd.synd1, err->vi.amd.synd2);
> + if (mca_config & MCI_CONFIG_FRUTEXT) {
> + char frutext[17];
> +
> + memset(frutext, 0, sizeof(frutext));

Why are you clearing it if you're overwriting it immediately?

> + memcpy(&frutext[0], &err->vi.amd.synd1, 8);
> + memcpy(&frutext[8], &err->vi.amd.synd2, 8);
> +
> + pr_emerg(HW_ERR "FRU Text: %s", frutext);
> + } else {
> + pr_emerg(HW_ERR "Syndrome1: 0x%016llx, 
> Syndrome2: 0x%016llx",
> +  err->vi.amd.synd1, err->vi.amd.synd2);
> + }
>   }
>  
>   pr_cont("\n");
> -- 
> 2.34.1
> 

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 3/4] x86/mce/apei: Handle variable register array size

2024-06-26 Thread Borislav Petkov
On Tue, Jun 25, 2024 at 02:56:23PM -0500, Avadhut Naik wrote:
> From: Yazen Ghannam 
> 
> ACPI Boot Error Record Table (BERT) is being used by the kernel to
> report errors that occurred in a previous boot. On some modern AMD
> systems, these very errors within the BERT are reported through the
> x86 Common Platform Error Record (CPER) format which consists of one
> or more Processor Context Information Structures. These context
> structures provide a starting address and represent an x86 MSR range
> in which the data constitutes a contiguous set of MSRs starting from,
> and including the starting address.
> 
> It's common, for AMD systems that implement this behavior, that the
> MSR range represents the MCAX register space used for the Scalable MCA
> feature. The apei_smca_report_x86_error() function decodes and passes
> this information through the MCE notifier chain. However, this function
> assumes a fixed register size based on the original HW/FW implementation.
> 
> This assumption breaks with the addition of two new MCAX registers viz.
> MCA_SYND1 and MCA_SYND2. These registers are added at the end of the
> MCAX register space, so they won't be included when decoding the CPER
> data.
> 
> Rework apei_smca_report_x86_error() to support a variable register array
> size. This covers any case where the MSR context information starts at
> the MCAX address for MCA_STATUS and ends at any other register within
> the MCAX register space.
> 
> Add code comments indicating the MCAX register at each offset.
> 
> [Yazen: Add Avadhut as co-developer for wrapper changes.]
> 
> Co-developed-by: Avadhut Naik 
> Signed-off-by: Avadhut Naik 
> Signed-off-by: Yazen Ghannam 

This needs Avadhut's SOB after Yazen's.

Touchups ontop:

diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 7a15f0ca1bd1..6bbeb29125a9 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -69,7 +69,7 @@ EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
 int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 {
const u64 *i_mce = ((const u64 *) (ctx_info + 1));
-   unsigned int cpu, num_registers;
+   unsigned int cpu, num_regs;
struct mce_hw_err err;
struct mce *m = &err.m;
 
@@ -93,10 +93,10 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx 
*ctx_info, u64 lapic_id)
/*
 * The number of registers in the register array is determined by
 * Register Array Size/8 as defined in UEFI spec v2.8, sec N.2.4.2.2.
-* Ensure that the array size includes at least 1 register.
+* Sanity-check registers array size.
 */
-   num_registers = ctx_info->reg_arr_size >> 3;
-   if (!num_registers)
+   num_regs = ctx_info->reg_arr_size >> 3;
+   if (!num_regs)
return -EINVAL;
 
mce_setup(m);
@@ -118,13 +118,12 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx 
*ctx_info, u64 lapic_id)
/*
 * The SMCA register layout is fixed and includes 16 registers.
 * The end of the array may be variable, but the beginning is known.
-* Switch on the number of registers. Cap the number of registers to
-* expected max (15).
+* Cap the number of registers to expected max (15).
 */
-   if (num_registers > 15)
-   num_registers = 15;
+   if (num_regs > 15)
+   num_regs = 15;
 
-   switch (num_registers) {
+   switch (num_regs) {
/* MCA_SYND2 */
case 15:
err.vi.amd.synd2 = *(i_mce + 14);

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH] ipvs: Avoid unnecessary calls to skb_is_gso_sctp

2024-06-26 Thread Pablo Neira Ayuso
Hi,

I have placed this patch in the nf-next tree to be included in the
next pull request.

Thanks.



Re: [PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers

2024-06-26 Thread Borislav Petkov
On Tue, Jun 25, 2024 at 02:56:22PM -0500, Avadhut Naik wrote:
> AMD's Scalable MCA systems viz. Genoa will include two new registers:

"viz."?

Not a lot of people outside of AMD know what Genoa is. Zen4 is probably a lot
more widespread.

> MCA_SYND1 and MCA_SYND2.
> 
> These registers will include supplemental error information in addition
> to the existing MCA_SYND register. The data within the registers is
> considered valid if MCA_STATUS[SyndV] is set.

>From here...

> Add fields for these registers as vendor-specific error information
> in struct mce_hw_err. Save and print these registers wherever
> MCA_STATUS[SyndV]/MCA_SYND is currently used.
> 
> Also, modify the mce_record tracepoint to export these new registers
> through __dynamic_array. While the sizeof() operator has been used to
> determine the size of this __dynamic_array, the same, if needed in the
> future can be substituted by caching the size of vendor-specific error
> information as part of struct mce_hw_err.

... to here this text explains what the patch does. I guess it is time for my
boilerplate text again:

Do not talk about *what* the patch is doing in the commit message - that
should be obvious from the diff itself. Rather, concentrate on the *why*
it needs to be done.

Imagine one fine day you're doing git archeology, you find the place in
the code about which you want to find out why it was changed the way it 
is now.

You do git annotate  ... find the line, see the commit id and
you do:

git show 

You read the commit message and there's just gibberish and nothing's
explaining *why* that change was done. And you start scratching your
head, trying to figure out why. Because the damn commit message is worth
sh*t.

This happens to us maintainers at least once a week. Well, I don't want
that to happen in my tree anymore.

You catch my drift? :)

So, now, how are those new syndromes going to be used in the tracepoint and
why do we want them there?

> Note: Checkpatch warnings/errors are ignored to maintain coding style.

This goes...

> 
> [Yazen: Drop Yazen's Co-developed-by tag and moved SoB tag.]

Yes, you did but now your SOB chain is wrong:

> Signed-off-by: Avadhut Naik 
> Signed-off-by: Yazen Ghannam 

This tells me Avadhut is the author, Yazen handled it and he's sending it to
me. But nope, he isn't. So it needs another Avadhut SOB underneath.

Audit all patches pls.

> ---

... right under those three "---" as such notes do not belong in the commit
message. Remember that for the future.

>  arch/x86/include/asm/mce.h | 12 
>  arch/x86/kernel/cpu/mce/amd.c  |  5 -
>  arch/x86/kernel/cpu/mce/core.c | 24 +---
>  drivers/edac/mce_amd.c | 10 +++---
>  include/trace/events/mce.h |  9 +++--
>  5 files changed, 47 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index e955edb22897..2b43ba37bbda 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -122,6 +122,9 @@
>  #define MSR_AMD64_SMCA_MC0_DESTAT0xc0002008
>  #define MSR_AMD64_SMCA_MC0_DEADDR0xc0002009
>  #define MSR_AMD64_SMCA_MC0_MISC1 0xc000200a
> +/* Registers MISC2 to MISC4 are at offsets B to D. */
> +#define MSR_AMD64_SMCA_MC0_SYND1 0xc000200e
> +#define MSR_AMD64_SMCA_MC0_SYND2 0xc000200f
>  #define MSR_AMD64_SMCA_MCx_CTL(x)(MSR_AMD64_SMCA_MC0_CTL + 0x10*(x))
>  #define MSR_AMD64_SMCA_MCx_STATUS(x) (MSR_AMD64_SMCA_MC0_STATUS + 0x10*(x))
>  #define MSR_AMD64_SMCA_MCx_ADDR(x)   (MSR_AMD64_SMCA_MC0_ADDR + 0x10*(x))
> @@ -132,6 +135,8 @@
>  #define MSR_AMD64_SMCA_MCx_DESTAT(x) (MSR_AMD64_SMCA_MC0_DESTAT + 0x10*(x))
>  #define MSR_AMD64_SMCA_MCx_DEADDR(x) (MSR_AMD64_SMCA_MC0_DEADDR + 0x10*(x))
>  #define MSR_AMD64_SMCA_MCx_MISCy(x, y)   ((MSR_AMD64_SMCA_MC0_MISC1 + y) 
> + (0x10*(x)))
> +#define MSR_AMD64_SMCA_MCx_SYND1(x)  (MSR_AMD64_SMCA_MC0_SYND1 + 0x10*(x))
> +#define MSR_AMD64_SMCA_MCx_SYND2(x)  (MSR_AMD64_SMCA_MC0_SYND2 + 0x10*(x))
>  
>  #define XEC(x, mask) (((x) >> 16) & mask)
>  
> @@ -189,6 +194,13 @@ enum mce_notifier_prios {
>  
>  struct mce_hw_err {
>   struct mce m;
> +
> + union vendor_info {
> + struct {
> + u64 synd1;
> + u64 synd2;
> + } amd;

I presume the intent here is for Intel or other vendors to add their
vendor-specific stuff here too?

I'm also expecting that shared fields will be promoted up to the common struct
namespace. Pls add a short comment explaining what the goal with that struct
is.

> + } vi;

Call that "vendor" so that in the code you can have

err.vendor.amd.

or

err.vendor.intel.

and so on so that it is perfectly clear what this is.

>  };
>  
>  struct notifier_block;
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index cb7dc0b1aa50..fc69d244ca7f 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mc

Re: [PATCH v2 1/4] x86/mce: Add wrapper for struct mce to export vendor specific info

2024-06-26 Thread Borislav Petkov
On Tue, Jun 25, 2024 at 02:56:21PM -0500, Avadhut Naik wrote:
> Currently, exporting new additional machine check error information
> involves adding new fields for the same at the end of the struct mce.
> This additional information can then be consumed through mcelog or
> tracepoint.
> 
> However, as new MSRs are being added (and will be added in the future)
> by CPU vendors on their newer CPUs with additional machine check error
> information to be exported, the size of struct mce will balloon on some
> CPUs, unnecessarily, since those fields are vendor-specific. Moreover,
> different CPU vendors may export the additional information in varying
> sizes.
> 
> The problem particularly intensifies since struct mce is exposed to
> userspace as part of UAPI. It's bloating through vendor-specific data
> should be avoided to limit the information being sent out to userspace.
> 
> Add a new structure mce_hw_err to wrap the existing struct mce. The same
> will prevent its ballooning since vendor-specifc data, if any, can now be
> exported through a union within the wrapper structure and through
> __dynamic_array in mce_record tracepoint.
> 
> Furthermore, new internal kernel fields can be added to the wrapper
> struct without impacting the user space API.
> 
> Note: Some Checkpatch checks have been ignored to maintain coding style.
> 
> [Yazen: Add last commit message paragraph.]
> 
> Suggested-by: Borislav Petkov (AMD) 
> Signed-off-by: Avadhut Naik 
> Signed-off-by: Yazen Ghannam 
> ---
>  arch/x86/include/asm/mce.h  |   6 +-
>  arch/x86/kernel/cpu/mce/amd.c   |  29 ++--
>  arch/x86/kernel/cpu/mce/apei.c  |  54 +++
>  arch/x86/kernel/cpu/mce/core.c  | 178 +---
>  arch/x86/kernel/cpu/mce/dev-mcelog.c|   2 +-
>  arch/x86/kernel/cpu/mce/genpool.c   |  20 +--
>  arch/x86/kernel/cpu/mce/inject.c|   4 +-
>  arch/x86/kernel/cpu/mce/internal.h  |   4 +-
>  drivers/acpi/acpi_extlog.c  |   2 +-
>  drivers/acpi/nfit/mce.c |   2 +-
>  drivers/edac/i7core_edac.c  |   2 +-
>  drivers/edac/igen6_edac.c   |   2 +-
>  drivers/edac/mce_amd.c  |   2 +-
>  drivers/edac/pnd2_edac.c|   2 +-
>  drivers/edac/sb_edac.c  |   2 +-
>  drivers/edac/skx_common.c   |   2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c |   2 +-
>  drivers/ras/amd/fmpm.c  |   2 +-
>  drivers/ras/cec.c   |   2 +-
>  include/trace/events/mce.h  |  42 +++---
>  20 files changed, 199 insertions(+), 162 deletions(-)

Ok, did some minor massaging but otherwise looks ok now.

Tony, any comments? You ok with this, would that fit any Intel-specific vendor
fields too or do you need some additional Intel-specific changes?

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



[PATCH vhost v2 24/24] vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready()

2024-06-26 Thread Dragos Tatulea
VQ indices in the range [cur_num_qps, max_vqs) represent queues that
have not yet been activated. .set_vq_ready should not activate these
VQs.

Reviewed-by: Cosmin Ratiu 
Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 573dc01df8c3..fa78e8288ebb 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1575,6 +1575,9 @@ static int resume_vq(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mvq
if (!mvq->initialized)
return 0;
 
+   if (mvq->index >= ndev->cur_num_vqs)
+   return 0;
+
switch (mvq->fw_state) {
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT:
/* Due to a FW quirk we need to modify the VQ fields first then 
change state.

-- 
2.45.1




[PATCH vhost v2 23/24] vdpa/mlx5: Don't reset VQs more than necessary

2024-06-26 Thread Dragos Tatulea
The vdpa device can be reset many times in sequence without any
significant state changes in between. Previously this was not a problem:
VQs were torn down only on first reset. But after VQ pre-creation was
introduced, each reset will delete and re-create the hardware VQs and
their associated resources.

To solve this problem, avoid resetting hardware VQs if the VQs are still
in a blank state.

Reviewed-by: Cosmin Ratiu 
Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index ea4bfd9afce9..573dc01df8c3 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3134,18 +3134,41 @@ static void init_group_to_asid_map(struct mlx5_vdpa_dev 
*mvdev)
mvdev->group2asid[i] = 0;
 }
 
+static bool needs_vqs_reset(const struct mlx5_vdpa_dev *mvdev)
+{
+   struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+   struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[0];
+
+   if (mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK)
+   return true;
+
+   if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT)
+   return true;
+
+   return mvq->modified_fields & (
+   MLX5_VIRTQ_MODIFY_MASK_STATE |
+   MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS |
+   MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX |
+   MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX
+   );
+}
+
 static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
 {
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+   bool vq_reset;
 
print_status(mvdev, 0, true);
mlx5_vdpa_info(mvdev, "performing device reset\n");
 
down_write(&ndev->reslock);
unregister_link_notifier(ndev);
-   teardown_vq_resources(ndev);
-   mvqs_set_defaults(ndev);
+   vq_reset = needs_vqs_reset(mvdev);
+   if (vq_reset) {
+   teardown_vq_resources(ndev);
+   mvqs_set_defaults(ndev);
+   }
 
if (flags & VDPA_RESET_F_CLEAN_MAP)
mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
@@ -3165,7 +3188,8 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device 
*vdev, u32 flags)
if (mlx5_vdpa_create_dma_mr(mvdev))
mlx5_vdpa_warn(mvdev, "create MR failed\n");
}
-   setup_vq_resources(ndev, false);
+   if (vq_reset)
+   setup_vq_resources(ndev, false);
up_write(&ndev->reslock);
 
return 0;

-- 
2.45.1




[PATCH vhost v2 22/24] vdpa/mlx5: Re-create HW VQs under certain conditions

2024-06-26 Thread Dragos Tatulea
There are a few conditions under which the hardware VQs need a full
teardown and setup:

- VQ size changed to something else than default value. Hardware VQ size
  modification is not supported.

- User turns off certain device features: mergeable buffers, checksum
  virtio 1.0 compliance. In these cases, the TIR and RQT need to be
  re-created.

Add a needs_teardown configuration variable and set it when detecting
the above scenarios. On next DRIVER_OK, the resources will be torn down
first.

Reviewed-by: Cosmin Ratiu 
Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +++
 drivers/vdpa/mlx5/net/mlx5_vnet.h |  1 +
 2 files changed, 16 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 8ef703f4c23d..ea4bfd9afce9 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2390,6 +2390,7 @@ static void mlx5_vdpa_set_vq_num(struct vdpa_device 
*vdev, u16 idx, u32 num)
 }
 
mvq = &ndev->vqs[idx];
+   ndev->needs_teardown = num != mvq->num_ent;
mvq->num_ent = num;
 }
 
@@ -2800,6 +2801,7 @@ static int mlx5_vdpa_set_driver_features(struct 
vdpa_device *vdev, u64 features)
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
u64 old_features = mvdev->actual_features;
+   u64 diff_features;
int err;
 
print_features(mvdev, features, true);
@@ -2822,6 +2824,14 @@ static int mlx5_vdpa_set_driver_features(struct 
vdpa_device *vdev, u64 features)
}
}
 
+   /* When below features diverge from initial device features, VQs need a 
full teardown. */
+#define NEEDS_TEARDOWN_MASK (BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) | \
+BIT_ULL(VIRTIO_NET_F_CSUM) | \
+BIT_ULL(VIRTIO_F_VERSION_1))
+
+   diff_features = mvdev->mlx_features ^ mvdev->actual_features;
+   ndev->needs_teardown = !!(diff_features & NEEDS_TEARDOWN_MASK);
+
update_cvq_info(mvdev);
return err;
 }
@@ -3038,6 +3048,7 @@ static void teardown_vq_resources(struct mlx5_vdpa_net 
*ndev)
destroy_rqt(ndev);
teardown_virtqueues(ndev);
ndev->setup = false;
+   ndev->needs_teardown = false;
 }
 
 static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
@@ -3078,6 +3089,10 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
goto err_setup;
}
register_link_notifier(ndev);
+
+   if (ndev->needs_teardown)
+   teardown_vq_resources(ndev);
+
if (ndev->setup) {
err = resume_vqs(ndev);
if (err) {
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.h 
b/drivers/vdpa/mlx5/net/mlx5_vnet.h
index 90b556a57971..00e79a7d0be8 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.h
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.h
@@ -56,6 +56,7 @@ struct mlx5_vdpa_net {
struct dentry *rx_dent;
struct dentry *rx_table_dent;
bool setup;
+   bool needs_teardown;
u32 cur_num_vqs;
u32 rqt_size;
bool nb_registered;

-- 
2.45.1




[PATCH vhost v2 21/24] vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time

2024-06-26 Thread Dragos Tatulea
Currently, hardware VQs are created right when the vdpa device gets into
DRIVER_OK state. That is easier because most of the VQ state is known by
then.

This patch switches to creating all VQs and their associated resources
at device creation time. The motivation is to reduce the vdpa device
live migration downtime by moving the expensive operation of creating
all the hardware VQs and their associated resources out of downtime on
the destination VM.

The VQs are now created in a blank state. The VQ configuration will
happen later, on DRIVER_OK. Then the configuration will be applied when
the VQs are moved to the Ready state.

When .set_vq_ready() is called on a VQ before DRIVER_OK, special care is
needed: now that the VQ is already created a resume_vq() will be
triggered too early when no mr has been configured yet. Skip calling
resume_vq() in this case, let it be handled during DRIVER_OK.

For virtio-vdpa, the device configuration is done earlier during
.vdpa_dev_add() by vdpa_register_device(). Avoid calling
setup_vq_resources() a second time in that case.

On a 64 CPU, 256 GB VM with 1 vDPA device of 16 VQps, the full VQ
resource creation + resume time was ~370ms. Now it's down to 60 ms
(only VQ config and resume). The measurements were done on a ConnectX6DX
based vDPA device.

Signed-off-by: Dragos Tatulea 
Reviewed-by: Cosmin Ratiu 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 37 -
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 324604b16b91..8ef703f4c23d 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2444,7 +2444,7 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device 
*vdev, u16 idx, bool ready
mvq = &ndev->vqs[idx];
if (!ready) {
suspend_vq(ndev, mvq);
-   } else {
+   } else if (mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK) {
if (resume_vq(ndev, mvq))
ready = false;
}
@@ -3078,10 +3078,18 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
goto err_setup;
}
register_link_notifier(ndev);
-   err = setup_vq_resources(ndev, true);
-   if (err) {
-   mlx5_vdpa_warn(mvdev, "failed to setup 
driver\n");
-   goto err_driver;
+   if (ndev->setup) {
+   err = resume_vqs(ndev);
+   if (err) {
+   mlx5_vdpa_warn(mvdev, "failed to resume 
VQs\n");
+   goto err_driver;
+   }
+   } else {
+   err = setup_vq_resources(ndev, true);
+   if (err) {
+   mlx5_vdpa_warn(mvdev, "failed to setup 
driver\n");
+   goto err_driver;
+   }
}
} else {
mlx5_vdpa_warn(mvdev, "did not expect DRIVER_OK to be 
cleared\n");
@@ -3142,6 +3150,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device 
*vdev, u32 flags)
if (mlx5_vdpa_create_dma_mr(mvdev))
mlx5_vdpa_warn(mvdev, "create MR failed\n");
}
+   setup_vq_resources(ndev, false);
up_write(&ndev->reslock);
 
return 0;
@@ -3835,8 +3844,21 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
*v_mdev, const char *name,
goto err_reg;
 
mgtdev->ndev = ndev;
+
+   /* For virtio-vdpa, the device was set up during device register. */
+   if (ndev->setup)
+   return 0;
+
+   down_write(&ndev->reslock);
+   err = setup_vq_resources(ndev, false);
+   up_write(&ndev->reslock);
+   if (err)
+   goto err_setup_vq_res;
+
return 0;
 
+err_setup_vq_res:
+   _vdpa_unregister_device(&mvdev->vdev);
 err_reg:
destroy_workqueue(mvdev->wq);
 err_res2:
@@ -3862,6 +3884,11 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev 
*v_mdev, struct vdpa_device *
 
unregister_link_notifier(ndev);
_vdpa_unregister_device(dev);
+
+   down_write(&ndev->reslock);
+   teardown_vq_resources(ndev);
+   up_write(&ndev->reslock);
+
wq = mvdev->wq;
mvdev->wq = NULL;
destroy_workqueue(wq);

-- 
2.45.1




[PATCH vhost v2 20/24] vdpa/mlx5: Use suspend/resume during VQP change

2024-06-26 Thread Dragos Tatulea
Resume a VQ if it is already created when the number of VQ pairs
increases. This is done in preparation for VQ pre-creation which is
coming in a later patch. It is necessary because calling setup_vq() on
an already created VQ will return early and will not enable the queue.

For symmetry, suspend a VQ instead of tearing it down when the number of
VQ pairs decreases. But only if the resume operation is supported.

Reviewed-by: Cosmin Ratiu 
Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index ce1f6a1f36cd..324604b16b91 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2130,14 +2130,22 @@ static int change_num_qps(struct mlx5_vdpa_dev *mvdev, 
int newqps)
if (err)
return err;
 
-   for (i = ndev->cur_num_vqs - 1; i >= 2 * newqps; i--)
-   teardown_vq(ndev, &ndev->vqs[i]);
+   for (i = ndev->cur_num_vqs - 1; i >= 2 * newqps; i--) {
+   struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[i];
+
+   if (is_resumable(ndev))
+   suspend_vq(ndev, mvq);
+   else
+   teardown_vq(ndev, mvq);
+   }
 
ndev->cur_num_vqs = 2 * newqps;
} else {
ndev->cur_num_vqs = 2 * newqps;
for (i = cur_qps * 2; i < 2 * newqps; i++) {
-   err = setup_vq(ndev, &ndev->vqs[i], true);
+   struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[i];
+
+   err = mvq->initialized ? resume_vq(ndev, mvq) : 
setup_vq(ndev, mvq, true);
if (err)
goto clean_added;
}

-- 
2.45.1




[PATCH vhost v2 19/24] vdpa/mlx5: Forward error in suspend/resume device

2024-06-26 Thread Dragos Tatulea
Start using the suspend/resume_vq() error return codes previously added.

Reviewed-by: Cosmin Ratiu 
Reviewed-by: Zhu Yanjun 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index e65d488f7a08..ce1f6a1f36cd 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3436,22 +3436,25 @@ static int mlx5_vdpa_suspend(struct vdpa_device *vdev)
 {
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+   int err;
 
mlx5_vdpa_info(mvdev, "suspending device\n");
 
down_write(&ndev->reslock);
unregister_link_notifier(ndev);
-   suspend_vqs(ndev);
+   err = suspend_vqs(ndev);
mlx5_vdpa_cvq_suspend(mvdev);
mvdev->suspended = true;
up_write(&ndev->reslock);
-   return 0;
+
+   return err;
 }
 
 static int mlx5_vdpa_resume(struct vdpa_device *vdev)
 {
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
struct mlx5_vdpa_net *ndev;
+   int err;
 
ndev = to_mlx5_vdpa_ndev(mvdev);
 
@@ -3459,10 +3462,11 @@ static int mlx5_vdpa_resume(struct vdpa_device *vdev)
 
down_write(&ndev->reslock);
mvdev->suspended = false;
-   resume_vqs(ndev);
+   err = resume_vqs(ndev);
register_link_notifier(ndev);
up_write(&ndev->reslock);
-   return 0;
+
+   return err;
 }
 
 static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,

-- 
2.45.1




[PATCH vhost v2 18/24] vdpa/mlx5: Consolidate all VQ modify to Ready to use resume_vq()

2024-06-26 Thread Dragos Tatulea
There are a few more places modifying the VQ to Ready directly. Let's
consolidate them into resume_vq().

The redundant warnings for resume_vq() errors can also be dropped.

There is one special case that needs to be handled for virtio-vdpa:
the initialized flag must be set to true earlier in setup_vq() so that
resume_vq() doesn't return early.

Reviewed-by: Cosmin Ratiu 
Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 8ab5cf1bbc43..e65d488f7a08 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -160,6 +160,7 @@ static void free_fixed_resources(struct mlx5_vdpa_net 
*ndev);
 static void mvqs_set_defaults(struct mlx5_vdpa_net *ndev);
 static int setup_vq_resources(struct mlx5_vdpa_net *ndev, bool filled);
 static void teardown_vq_resources(struct mlx5_vdpa_net *ndev);
+static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
*mvq);
 
 static bool mlx5_vdpa_debug;
 
@@ -1500,16 +1501,14 @@ static int setup_vq(struct mlx5_vdpa_net *ndev,
if (err)
goto err_vq;
 
+   mvq->initialized = true;
+
if (mvq->ready) {
-   err = modify_virtqueue_state(ndev, mvq, 
MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
-   if (err) {
-   mlx5_vdpa_warn(&ndev->mvdev, "failed to modify to ready 
vq idx %d(%d)\n",
-  idx, err);
+   err = resume_vq(ndev, mvq);
+   if (err)
goto err_modify;
-   }
}
 
-   mvq->initialized = true;
return 0;
 
 err_modify:
@@ -2422,7 +2421,6 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device 
*vdev, u16 idx, bool ready
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
struct mlx5_vdpa_virtqueue *mvq;
-   int err;
 
if (!mvdev->actual_features)
return;
@@ -2439,14 +2437,10 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device 
*vdev, u16 idx, bool ready
if (!ready) {
suspend_vq(ndev, mvq);
} else {
-   err = modify_virtqueue_state(ndev, mvq, 
MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
-   if (err) {
-   mlx5_vdpa_warn(mvdev, "modify VQ %d to ready failed 
(%d)\n", idx, err);
+   if (resume_vq(ndev, mvq))
ready = false;
-   }
}
 
-
mvq->ready = ready;
 }
 

-- 
2.45.1




[PATCH vhost v2 17/24] vdpa/mlx5: Add error code for suspend/resume VQ

2024-06-26 Thread Dragos Tatulea
Instead of blindly calling suspend/resume_vqs(), make then return error
codes.

To keep compatibility, keep suspending or resuming VQs on error and
return the last error code. The assumption here is that the error code
would be the same.

Reviewed-by: Cosmin Ratiu 
Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 77 +++
 1 file changed, 54 insertions(+), 23 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index adcc4d63cf83..8ab5cf1bbc43 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1526,71 +1526,102 @@ static int setup_vq(struct mlx5_vdpa_net *ndev,
return err;
 }
 
-static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
*mvq)
+static int suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
*mvq)
 {
struct mlx5_virtq_attr attr;
+   int err;
 
if (!mvq->initialized)
-   return;
+   return 0;
 
if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY)
-   return;
+   return 0;
 
-   if (modify_virtqueue_state(ndev, mvq, 
MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
-   mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
+   err = modify_virtqueue_state(ndev, mvq, 
MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND);
+   if (err) {
+   mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed, err: 
%d\n", err);
+   return err;
+   }
 
-   if (query_virtqueue(ndev, mvq, &attr)) {
-   mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
-   return;
+   err = query_virtqueue(ndev, mvq, &attr);
+   if (err) {
+   mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue, err: 
%d\n", err);
+   return err;
}
+
mvq->avail_idx = attr.available_index;
mvq->used_idx = attr.used_index;
+
+   return 0;
 }
 
-static void suspend_vqs(struct mlx5_vdpa_net *ndev)
+static int suspend_vqs(struct mlx5_vdpa_net *ndev)
 {
+   int err = 0;
int i;
 
-   for (i = 0; i < ndev->cur_num_vqs; i++)
-   suspend_vq(ndev, &ndev->vqs[i]);
+   for (i = 0; i < ndev->cur_num_vqs; i++) {
+   int local_err = suspend_vq(ndev, &ndev->vqs[i]);
+
+   err = local_err ? local_err : err;
+   }
+
+   return err;
 }
 
-static void resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
*mvq)
+static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
*mvq)
 {
+   int err;
+
if (!mvq->initialized)
-   return;
+   return 0;
 
switch (mvq->fw_state) {
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT:
/* Due to a FW quirk we need to modify the VQ fields first then 
change state.
 * This should be fixed soon. After that, a single command can 
be used.
 */
-   if (modify_virtqueue(ndev, mvq, 0))
+   err = modify_virtqueue(ndev, mvq, 0);
+   if (err) {
mlx5_vdpa_warn(&ndev->mvdev,
-   "modify vq properties failed for vq %u\n", 
mvq->index);
+   "modify vq properties failed for vq %u, err: 
%d\n",
+   mvq->index, err);
+   return err;
+   }
break;
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND:
if (!is_resumable(ndev)) {
mlx5_vdpa_warn(&ndev->mvdev, "vq %d is not 
resumable\n", mvq->index);
-   return;
+   return -EINVAL;
}
break;
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY:
-   return;
+   return 0;
default:
mlx5_vdpa_warn(&ndev->mvdev, "resume vq %u called from bad 
state %d\n",
   mvq->index, mvq->fw_state);
-   return;
+   return -EINVAL;
}
 
-   if (modify_virtqueue_state(ndev, mvq, 
MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY))
-   mlx5_vdpa_warn(&ndev->mvdev, "modify to resume failed for vq 
%u\n", mvq->index);
+   err = modify_virtqueue_state(ndev, mvq, 
MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
+   if (err)
+   mlx5_vdpa_warn(&ndev->mvdev, "modify to resume failed for vq 
%u, err: %d\n",
+  mvq->index, err);
+
+   return err;
 }
 
-static void resume_vqs(struct mlx5_vdpa_net *ndev)
+static int resume_vqs(struct mlx5_vdpa_net *ndev)
 {
-   for (int i = 0; i < ndev->cur_num_vqs; i++)
-   resume_vq(ndev, &ndev->vqs[i]);
+   int err = 0;
+
+   for (int i = 0; i < ndev->cur_num_vqs; i++) {
+   int local_err = resume_vq(ndev, &ndev->vqs[i]);
+
+ 

[PATCH vhost v2 16/24] vdpa/mlx5: Accept Init -> Ready VQ transition in resume_vq()

2024-06-26 Thread Dragos Tatulea
Until now resume_vq() was used only for the suspend/resume scenario.
This change also allows calling resume_vq() to bring it from Init to
Ready state (VQ initialization).

Signed-off-by: Dragos Tatulea 
Reviewed-by: Cosmin Ratiu 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 0a62ce0b4af8..adcc4d63cf83 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1557,11 +1557,31 @@ static void suspend_vqs(struct mlx5_vdpa_net *ndev)
 
 static void resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
*mvq)
 {
-   if (!mvq->initialized || !is_resumable(ndev))
+   if (!mvq->initialized)
return;
 
-   if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND)
+   switch (mvq->fw_state) {
+   case MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT:
+   /* Due to a FW quirk we need to modify the VQ fields first then 
change state.
+* This should be fixed soon. After that, a single command can 
be used.
+*/
+   if (modify_virtqueue(ndev, mvq, 0))
+   mlx5_vdpa_warn(&ndev->mvdev,
+   "modify vq properties failed for vq %u\n", 
mvq->index);
+   break;
+   case MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND:
+   if (!is_resumable(ndev)) {
+   mlx5_vdpa_warn(&ndev->mvdev, "vq %d is not 
resumable\n", mvq->index);
+   return;
+   }
+   break;
+   case MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY:
return;
+   default:
+   mlx5_vdpa_warn(&ndev->mvdev, "resume vq %u called from bad 
state %d\n",
+  mvq->index, mvq->fw_state);
+   return;
+   }
 
if (modify_virtqueue_state(ndev, mvq, 
MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY))
mlx5_vdpa_warn(&ndev->mvdev, "modify to resume failed for vq 
%u\n", mvq->index);

-- 
2.45.1




[PATCH vhost v2 15/24] vdpa/mlx5: Allow creation of blank VQs

2024-06-26 Thread Dragos Tatulea
Based on the filled flag, create VQs that are filled or blank.
Blank VQs will be filled in later through VQ modify.

Downstream patches will make use of this to pre-create blank VQs at
vdpa device creation.

Signed-off-by: Dragos Tatulea 
Reviewed-by: Cosmin Ratiu 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 85 +--
 1 file changed, 55 insertions(+), 30 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index a8ac542f30f7..0a62ce0b4af8 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -158,7 +158,7 @@ static bool is_index_valid(struct mlx5_vdpa_dev *mvdev, u16 
idx)
 
 static void free_fixed_resources(struct mlx5_vdpa_net *ndev);
 static void mvqs_set_defaults(struct mlx5_vdpa_net *ndev);
-static int setup_vq_resources(struct mlx5_vdpa_net *ndev);
+static int setup_vq_resources(struct mlx5_vdpa_net *ndev, bool filled);
 static void teardown_vq_resources(struct mlx5_vdpa_net *ndev);
 
 static bool mlx5_vdpa_debug;
@@ -874,13 +874,16 @@ static bool msix_mode_supported(struct mlx5_vdpa_dev 
*mvdev)
pci_msix_can_alloc_dyn(mvdev->mdev->pdev);
 }
 
-static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mvq)
+static int create_virtqueue(struct mlx5_vdpa_net *ndev,
+   struct mlx5_vdpa_virtqueue *mvq,
+   bool filled)
 {
int inlen = MLX5_ST_SZ_BYTES(create_virtio_net_q_in);
u32 out[MLX5_ST_SZ_DW(create_virtio_net_q_out)] = {};
struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
struct mlx5_vdpa_mr *vq_mr;
struct mlx5_vdpa_mr *vq_desc_mr;
+   u64 features = filled ? mvdev->actual_features : mvdev->mlx_features;
void *obj_context;
u16 mlx_features;
void *cmd_hdr;
@@ -898,7 +901,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtque
goto err_alloc;
}
 
-   mlx_features = get_features(ndev->mvdev.actual_features);
+   mlx_features = get_features(features);
cmd_hdr = MLX5_ADDR_OF(create_virtio_net_q_in, in, 
general_obj_in_cmd_hdr);
 
MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, 
MLX5_CMD_OP_CREATE_GENERAL_OBJECT);
@@ -906,8 +909,6 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtque
MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
 
obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
-   MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, 
mvq->avail_idx);
-   MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, 
mvq->used_idx);
MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
 mlx_features >> 3);
MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_2_0,
@@ -929,17 +930,36 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtque
MLX5_SET(virtio_q, vq_ctx, queue_index, mvq->index);
MLX5_SET(virtio_q, vq_ctx, queue_size, mvq->num_ent);
MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0,
-!!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_F_VERSION_1)));
-   MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr);
-   MLX5_SET64(virtio_q, vq_ctx, used_addr, mvq->device_addr);
-   MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
-   vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
-   if (vq_mr)
-   MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, vq_mr->mkey);
-
-   vq_desc_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
-   if (vq_desc_mr && MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, 
desc_group_mkey_supported))
-   MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, vq_desc_mr->mkey);
+!!(features & BIT_ULL(VIRTIO_F_VERSION_1)));
+
+   if (filled) {
+   MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, 
mvq->avail_idx);
+   MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, 
mvq->used_idx);
+
+   MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr);
+   MLX5_SET64(virtio_q, vq_ctx, used_addr, mvq->device_addr);
+   MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
+
+   vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
+   if (vq_mr)
+   MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, vq_mr->mkey);
+
+   vq_desc_mr = 
mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
+   if (vq_desc_mr &&
+   MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, 
desc_group_mkey_supported))
+   MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, 
vq_desc_mr->mkey);
+   } else {
+   /* If there is no mr update, make sure that the existing ones 
are

[PATCH vhost v2 14/24] vdpa/mlx5: Set mkey modified flags on all VQs

2024-06-26 Thread Dragos Tatulea
Otherwise, when virtqueues are moved from INIT to READY the latest mkey
will not be set appropriately.

Reviewed-by: Cosmin Ratiu 
Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 7f1551aa1f78..a8ac542f30f7 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2868,7 +2868,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev 
*mvdev,
 
mlx5_vdpa_update_mr(mvdev, new_mr, asid);
 
-   for (int i = 0; i < ndev->cur_num_vqs; i++)
+   for (int i = 0; i < mvdev->max_vqs; i++)
ndev->vqs[i].modified_fields |= 
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY |

MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY;
 

-- 
2.45.1




[PATCH vhost v2 13/24] vdpa/mlx5: Start off rqt_size with max VQPs

2024-06-26 Thread Dragos Tatulea
Currently rqt_size is initialized during device flag configuration.
That's because it is the earliest moment when device knows if MQ
(multi queue) is on or off.

Shift this configuration earlier to device creation time. This implies
that non-MQ devices will have a larger RQT size. But the configuration
will still be correct.

This is done in preparation for the pre-creation of hardware virtqueues
at device add time. When that change will be added, RQT will be created
at device creation time so it needs to be initialized to its max size.

Reviewed-by: Cosmin Ratiu 
Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 406cc590fe42..7f1551aa1f78 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2731,10 +2731,6 @@ static int mlx5_vdpa_set_driver_features(struct 
vdpa_device *vdev, u64 features)
return err;
 
ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
-   if (ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_MQ))
-   ndev->rqt_size = mlx5vdpa16_to_cpu(mvdev, 
ndev->config.max_virtqueue_pairs);
-   else
-   ndev->rqt_size = 1;
 
/* Interested in changes of vq features only. */
if (get_features(old_features) != get_features(mvdev->actual_features)) 
{
@@ -3718,8 +3714,12 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
*v_mdev, const char *name,
goto err_alloc;
}
 
-   if (device_features & BIT_ULL(VIRTIO_NET_F_MQ))
+   if (device_features & BIT_ULL(VIRTIO_NET_F_MQ)) {
config->max_virtqueue_pairs = cpu_to_mlx5vdpa16(mvdev, max_vqs 
/ 2);
+   ndev->rqt_size = max_vqs / 2;
+   } else {
+   ndev->rqt_size = 1;
+   }
 
ndev->mvdev.mlx_features = device_features;
mvdev->vdev.dma_dev = &mdev->pdev->dev;

-- 
2.45.1




[PATCH vhost v2 12/24] vdpa/mlx5: Set an initial size on the VQ

2024-06-26 Thread Dragos Tatulea
The virtqueue size is a pre-requisite for setting up any virtqueue
resources. For the upcoming optimization of creating virtqueues at
device add, the virtqueue size has to be configured.

The queue size check in setup_vq() will always be false. So remove it.

Signed-off-by: Dragos Tatulea 
Reviewed-by: Cosmin Ratiu 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index db86e541b788..406cc590fe42 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -58,6 +58,8 @@ MODULE_LICENSE("Dual BSD/GPL");
  */
 #define MLX5V_DEFAULT_VQ_COUNT 2
 
+#define MLX5V_DEFAULT_VQ_SIZE 256
+
 struct mlx5_vdpa_cq_buf {
struct mlx5_frag_buf_ctrl fbc;
struct mlx5_frag_buf frag_buf;
@@ -1445,9 +1447,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mvq)
u16 idx = mvq->index;
int err;
 
-   if (!mvq->num_ent)
-   return 0;
-
if (mvq->initialized)
return 0;
 
@@ -3523,6 +3522,7 @@ static void mvqs_set_defaults(struct mlx5_vdpa_net *ndev)
mvq->ndev = ndev;
mvq->fwqp.fw = true;
mvq->fw_state = MLX5_VIRTIO_NET_Q_OBJECT_NONE;
+   mvq->num_ent = MLX5V_DEFAULT_VQ_SIZE;
}
 }
 

-- 
2.45.1




[PATCH vhost v2 11/24] vdpa/mlx5: Add support for modifying the VQ features field

2024-06-26 Thread Dragos Tatulea
This is done in preparation for the pre-creation of hardware virtqueues
at device add time.

Signed-off-by: Dragos Tatulea 
Reviewed-by: Cosmin Ratiu 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 12 +++-
 include/linux/mlx5/mlx5_ifc_vdpa.h |  1 +
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index b104849f8477..db86e541b788 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1287,6 +1287,15 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0,
!!(ndev->mvdev.actual_features & 
BIT_ULL(VIRTIO_F_VERSION_1)));
 
+   if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_QUEUE_FEATURES) {
+   u16 mlx_features = get_features(ndev->mvdev.actual_features);
+
+   MLX5_SET(virtio_net_q_object, obj_context, 
queue_feature_bit_mask_12_3,
+mlx_features >> 3);
+   MLX5_SET(virtio_net_q_object, obj_context, 
queue_feature_bit_mask_2_0,
+mlx_features & 7);
+   }
+
if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
 
@@ -2734,7 +2743,8 @@ static int mlx5_vdpa_set_driver_features(struct 
vdpa_device *vdev, u64 features)
struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[i];
 
mvq->modified_fields |= (
-   MLX5_VIRTQ_MODIFY_MASK_QUEUE_VIRTIO_VERSION
+   MLX5_VIRTQ_MODIFY_MASK_QUEUE_VIRTIO_VERSION |
+   MLX5_VIRTQ_MODIFY_MASK_QUEUE_FEATURES
);
}
}
diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h 
b/include/linux/mlx5/mlx5_ifc_vdpa.h
index 34f27c01cec9..58dfa2ee7c83 100644
--- a/include/linux/mlx5/mlx5_ifc_vdpa.h
+++ b/include/linux/mlx5/mlx5_ifc_vdpa.h
@@ -150,6 +150,7 @@ enum {
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX= (u64)1 << 8,
MLX5_VIRTQ_MODIFY_MASK_QUEUE_VIRTIO_VERSION = (u64)1 << 10,
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY= (u64)1 << 11,
+   MLX5_VIRTQ_MODIFY_MASK_QUEUE_FEATURES   = (u64)1 << 12,
MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY  = (u64)1 << 14,
 };
 

-- 
2.45.1




[PATCH vhost v2 10/24] vdpa/mlx5: Add support for modifying the virtio_version VQ field

2024-06-26 Thread Dragos Tatulea
This is done in preparation for the pre-creation of hardware virtqueues
at device add time.

Signed-off-by: Dragos Tatulea 
Reviewed-by: Cosmin Ratiu 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 16 
 include/linux/mlx5/mlx5_ifc_vdpa.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 739c2886fc33..b104849f8477 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1283,6 +1283,10 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX)
MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, 
mvq->used_idx);
 
+   if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_QUEUE_VIRTIO_VERSION)
+   MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0,
+   !!(ndev->mvdev.actual_features & 
BIT_ULL(VIRTIO_F_VERSION_1)));
+
if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
 
@@ -2709,6 +2713,7 @@ static int mlx5_vdpa_set_driver_features(struct 
vdpa_device *vdev, u64 features)
 {
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+   u64 old_features = mvdev->actual_features;
int err;
 
print_features(mvdev, features, true);
@@ -2723,6 +2728,17 @@ static int mlx5_vdpa_set_driver_features(struct 
vdpa_device *vdev, u64 features)
else
ndev->rqt_size = 1;
 
+   /* Interested in changes of vq features only. */
+   if (get_features(old_features) != get_features(mvdev->actual_features)) 
{
+   for (int i = 0; i < mvdev->max_vqs; ++i) {
+   struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[i];
+
+   mvq->modified_fields |= (
+   MLX5_VIRTQ_MODIFY_MASK_QUEUE_VIRTIO_VERSION
+   );
+   }
+   }
+
update_cvq_info(mvdev);
return err;
 }
diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h 
b/include/linux/mlx5/mlx5_ifc_vdpa.h
index 40371c916cf9..34f27c01cec9 100644
--- a/include/linux/mlx5/mlx5_ifc_vdpa.h
+++ b/include/linux/mlx5/mlx5_ifc_vdpa.h
@@ -148,6 +148,7 @@ enum {
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS   = (u64)1 << 6,
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX   = (u64)1 << 7,
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX= (u64)1 << 8,
+   MLX5_VIRTQ_MODIFY_MASK_QUEUE_VIRTIO_VERSION = (u64)1 << 10,
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY= (u64)1 << 11,
MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY  = (u64)1 << 14,
 };

-- 
2.45.1




[PATCH vhost v2 08/24] vdpa/mlx5: Clear and reinitialize software VQ data on reset

2024-06-26 Thread Dragos Tatulea
The hardware VQ configuration is mirrored by data in struct
mlx5_vdpa_virtqueue . Instead of clearing just a few fields at reset,
fully clear the struct and initialize with the appropriate default
values.

As clear_vqs_ready() is used only during reset, get rid of it.

Reviewed-by: Cosmin Ratiu 
Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 16 +++-
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index c8b5c87f001d..de013b5a2815 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2941,18 +2941,6 @@ static void teardown_vq_resources(struct mlx5_vdpa_net 
*ndev)
ndev->setup = false;
 }
 
-static void clear_vqs_ready(struct mlx5_vdpa_net *ndev)
-{
-   int i;
-
-   for (i = 0; i < ndev->mvdev.max_vqs; i++) {
-   ndev->vqs[i].ready = false;
-   ndev->vqs[i].modified_fields = 0;
-   }
-
-   ndev->mvdev.cvq.ready = false;
-}
-
 static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
 {
struct mlx5_control_vq *cvq = &mvdev->cvq;
@@ -3035,12 +3023,14 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device 
*vdev, u32 flags)
down_write(&ndev->reslock);
unregister_link_notifier(ndev);
teardown_vq_resources(ndev);
-   clear_vqs_ready(ndev);
+   init_mvqs(ndev);
+
if (flags & VDPA_RESET_F_CLEAN_MAP)
mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
ndev->mvdev.status = 0;
ndev->mvdev.suspended = false;
ndev->cur_num_vqs = MLX5V_DEFAULT_VQ_COUNT;
+   ndev->mvdev.cvq.ready = false;
ndev->mvdev.cvq.received_desc = 0;
ndev->mvdev.cvq.completed_desc = 0;
memset(ndev->event_cbs, 0, sizeof(*ndev->event_cbs) * (mvdev->max_vqs + 
1));

-- 
2.45.1




[PATCH vhost v2 09/24] vdpa/mlx5: Rename init_mvqs

2024-06-26 Thread Dragos Tatulea
Function is used to set default values, so name it accordingly.

Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index de013b5a2815..739c2886fc33 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -155,7 +155,7 @@ static bool is_index_valid(struct mlx5_vdpa_dev *mvdev, u16 
idx)
 }
 
 static void free_fixed_resources(struct mlx5_vdpa_net *ndev);
-static void init_mvqs(struct mlx5_vdpa_net *ndev);
+static void mvqs_set_defaults(struct mlx5_vdpa_net *ndev);
 static int setup_vq_resources(struct mlx5_vdpa_net *ndev);
 static void teardown_vq_resources(struct mlx5_vdpa_net *ndev);
 
@@ -2810,7 +2810,7 @@ static void restore_channels_info(struct mlx5_vdpa_net 
*ndev)
int i;
 
mlx5_clear_vqs(ndev);
-   init_mvqs(ndev);
+   mvqs_set_defaults(ndev);
for (i = 0; i < ndev->mvdev.max_vqs; i++) {
mvq = &ndev->vqs[i];
ri = &mvq->ri;
@@ -3023,7 +3023,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device 
*vdev, u32 flags)
down_write(&ndev->reslock);
unregister_link_notifier(ndev);
teardown_vq_resources(ndev);
-   init_mvqs(ndev);
+   mvqs_set_defaults(ndev);
 
if (flags & VDPA_RESET_F_CLEAN_MAP)
mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
@@ -3485,7 +3485,7 @@ static void free_fixed_resources(struct mlx5_vdpa_net 
*ndev)
res->valid = false;
 }
 
-static void init_mvqs(struct mlx5_vdpa_net *ndev)
+static void mvqs_set_defaults(struct mlx5_vdpa_net *ndev)
 {
struct mlx5_vdpa_virtqueue *mvq;
int i;
@@ -3635,7 +3635,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
*v_mdev, const char *name,
}
ndev->cur_num_vqs = MLX5V_DEFAULT_VQ_COUNT;
 
-   init_mvqs(ndev);
+   mvqs_set_defaults(ndev);
allocate_irqs(ndev);
init_rwsem(&ndev->reslock);
config = &ndev->config;

-- 
2.45.1




[PATCH vhost v2 07/24] vdpa/mlx5: Initialize and reset device with one queue pair

2024-06-26 Thread Dragos Tatulea
The virtio spec says that a vdpa device should start off with one queue
pair. The driver is already compliant.

This patch moves the initialization to device add and reset times. This
is done in preparation for the pre-creation of hardware virtqueues at
device add time.

Signed-off-by: Dragos Tatulea 
Reviewed-by: Cosmin Ratiu 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index eca6f68c2eda..c8b5c87f001d 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -48,6 +48,16 @@ MODULE_LICENSE("Dual BSD/GPL");
 
 #define MLX5V_UNTAGGED 0x1000
 
+/* Device must start with 1 queue pair, as per VIRTIO v1.2 spec, section
+ * 5.1.6.5.5 "Device operation in multiqueue mode":
+ *
+ * Multiqueue is disabled by default.
+ * The driver enables multiqueue by sending a command using class
+ * VIRTIO_NET_CTRL_MQ. The command selects the mode of multiqueue
+ * operation, as follows: ...
+ */
+#define MLX5V_DEFAULT_VQ_COUNT 2
+
 struct mlx5_vdpa_cq_buf {
struct mlx5_frag_buf_ctrl fbc;
struct mlx5_frag_buf frag_buf;
@@ -2713,16 +2723,6 @@ static int mlx5_vdpa_set_driver_features(struct 
vdpa_device *vdev, u64 features)
else
ndev->rqt_size = 1;
 
-   /* Device must start with 1 queue pair, as per VIRTIO v1.2 spec, section
-* 5.1.6.5.5 "Device operation in multiqueue mode":
-*
-* Multiqueue is disabled by default.
-* The driver enables multiqueue by sending a command using class
-* VIRTIO_NET_CTRL_MQ. The command selects the mode of multiqueue
-* operation, as follows: ...
-*/
-   ndev->cur_num_vqs = 2;
-
update_cvq_info(mvdev);
return err;
 }
@@ -3040,7 +3040,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device 
*vdev, u32 flags)
mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
ndev->mvdev.status = 0;
ndev->mvdev.suspended = false;
-   ndev->cur_num_vqs = 0;
+   ndev->cur_num_vqs = MLX5V_DEFAULT_VQ_COUNT;
ndev->mvdev.cvq.received_desc = 0;
ndev->mvdev.cvq.completed_desc = 0;
memset(ndev->event_cbs, 0, sizeof(*ndev->event_cbs) * (mvdev->max_vqs + 
1));
@@ -3643,6 +3643,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
*v_mdev, const char *name,
err = -ENOMEM;
goto err_alloc;
}
+   ndev->cur_num_vqs = MLX5V_DEFAULT_VQ_COUNT;
 
init_mvqs(ndev);
allocate_irqs(ndev);

-- 
2.45.1




[PATCH vhost v2 06/24] vdpa/mlx5: Remove duplicate suspend code

2024-06-26 Thread Dragos Tatulea
Use the dedicated suspend_vqs() function instead.

Reviewed-by: Cosmin Ratiu 
Reviewed-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 51630b1935f4..eca6f68c2eda 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3355,17 +3355,12 @@ static int mlx5_vdpa_suspend(struct vdpa_device *vdev)
 {
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
-   struct mlx5_vdpa_virtqueue *mvq;
-   int i;
 
mlx5_vdpa_info(mvdev, "suspending device\n");
 
down_write(&ndev->reslock);
unregister_link_notifier(ndev);
-   for (i = 0; i < ndev->cur_num_vqs; i++) {
-   mvq = &ndev->vqs[i];
-   suspend_vq(ndev, mvq);
-   }
+   suspend_vqs(ndev);
mlx5_vdpa_cvq_suspend(mvdev);
mvdev->suspended = true;
up_write(&ndev->reslock);

-- 
2.45.1




[PATCH vhost v2 05/24] vdpa/mlx5: Iterate over active VQs during suspend/resume

2024-06-26 Thread Dragos Tatulea
No need to iterate over max number of VQs.

Reviewed-by: Cosmin Ratiu 
Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 96782b34e2b2..51630b1935f4 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1504,7 +1504,7 @@ static void suspend_vqs(struct mlx5_vdpa_net *ndev)
 {
int i;
 
-   for (i = 0; i < ndev->mvdev.max_vqs; i++)
+   for (i = 0; i < ndev->cur_num_vqs; i++)
suspend_vq(ndev, &ndev->vqs[i]);
 }
 
@@ -1522,7 +1522,7 @@ static void resume_vq(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mv
 
 static void resume_vqs(struct mlx5_vdpa_net *ndev)
 {
-   for (int i = 0; i < ndev->mvdev.max_vqs; i++)
+   for (int i = 0; i < ndev->cur_num_vqs; i++)
resume_vq(ndev, &ndev->vqs[i]);
 }
 

-- 
2.45.1




[PATCH vhost v2 04/24] vdpa/mlx5: Drop redundant check in teardown_virtqueues()

2024-06-26 Thread Dragos Tatulea
The check is done inside teardown_vq().

Reviewed-by: Cosmin Ratiu 
Reviewed-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index b4d9ef4f66c8..96782b34e2b2 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2559,16 +2559,10 @@ static int setup_virtqueues(struct mlx5_vdpa_dev *mvdev)
 
 static void teardown_virtqueues(struct mlx5_vdpa_net *ndev)
 {
-   struct mlx5_vdpa_virtqueue *mvq;
int i;
 
-   for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
-   mvq = &ndev->vqs[i];
-   if (!mvq->initialized)
-   continue;
-
-   teardown_vq(ndev, mvq);
-   }
+   for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--)
+   teardown_vq(ndev, &ndev->vqs[i]);
 }
 
 static void update_cvq_info(struct mlx5_vdpa_dev *mvdev)

-- 
2.45.1




[PATCH vhost v2 03/24] vdpa/mlx5: Drop redundant code

2024-06-26 Thread Dragos Tatulea
Originally, the second loop initialized the CVQ. But (acde3929492b
("vdpa/mlx5: Use consistent RQT size") initialized all the queues in the
first loop, so the second iteration in init_mvqs() is never called
because the first one will iterate up to max_vqs.

Reviewed-by: Cosmin Ratiu 
Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 1ad281cbc541..b4d9ef4f66c8 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3519,12 +3519,6 @@ static void init_mvqs(struct mlx5_vdpa_net *ndev)
mvq->fwqp.fw = true;
mvq->fw_state = MLX5_VIRTIO_NET_Q_OBJECT_NONE;
}
-   for (; i < ndev->mvdev.max_vqs; i++) {
-   mvq = &ndev->vqs[i];
-   memset(mvq, 0, offsetof(struct mlx5_vdpa_virtqueue, ri));
-   mvq->index = i;
-   mvq->ndev = ndev;
-   }
 }
 
 struct mlx5_vdpa_mgmtdev {

-- 
2.45.1




[PATCH vhost v2 02/24] vdpa/mlx5: Make setup/teardown_vq_resources() symmetrical

2024-06-26 Thread Dragos Tatulea
... by changing the setup_vq_resources() parameter type.

Reviewed-by: Cosmin Ratiu 
Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 3422da0e344b..1ad281cbc541 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -146,7 +146,7 @@ static bool is_index_valid(struct mlx5_vdpa_dev *mvdev, u16 
idx)
 
 static void free_fixed_resources(struct mlx5_vdpa_net *ndev);
 static void init_mvqs(struct mlx5_vdpa_net *ndev);
-static int setup_vq_resources(struct mlx5_vdpa_dev *mvdev);
+static int setup_vq_resources(struct mlx5_vdpa_net *ndev);
 static void teardown_vq_resources(struct mlx5_vdpa_net *ndev);
 
 static bool mlx5_vdpa_debug;
@@ -2862,7 +2862,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev 
*mvdev,
 
if (teardown) {
restore_channels_info(ndev);
-   err = setup_vq_resources(mvdev);
+   err = setup_vq_resources(ndev);
if (err)
return err;
}
@@ -2873,9 +2873,9 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev 
*mvdev,
 }
 
 /* reslock must be held for this function */
-static int setup_vq_resources(struct mlx5_vdpa_dev *mvdev)
+static int setup_vq_resources(struct mlx5_vdpa_net *ndev)
 {
-   struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+   struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
int err;
 
WARN_ON(!rwsem_is_locked(&ndev->reslock));
@@ -2997,7 +2997,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
goto err_setup;
}
register_link_notifier(ndev);
-   err = setup_vq_resources(mvdev);
+   err = setup_vq_resources(ndev);
if (err) {
mlx5_vdpa_warn(mvdev, "failed to setup 
driver\n");
goto err_driver;

-- 
2.45.1




[PATCH vhost v2 01/24] vdpa/mlx5: Clarify meaning thorough function rename

2024-06-26 Thread Dragos Tatulea
setup_driver()/teardown_driver() are a bit vague. These functions are
used for virtqueue resources.

Same for alloc_resources()/teardown_resources(): they represent fixed
resources that are meant to exist during the device lifetime.

Reviewed-by: Cosmin Ratiu 
Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index ecfc16151d61..3422da0e344b 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -144,10 +144,10 @@ static bool is_index_valid(struct mlx5_vdpa_dev *mvdev, 
u16 idx)
return idx <= mvdev->max_idx;
 }
 
-static void free_resources(struct mlx5_vdpa_net *ndev);
+static void free_fixed_resources(struct mlx5_vdpa_net *ndev);
 static void init_mvqs(struct mlx5_vdpa_net *ndev);
-static int setup_driver(struct mlx5_vdpa_dev *mvdev);
-static void teardown_driver(struct mlx5_vdpa_net *ndev);
+static int setup_vq_resources(struct mlx5_vdpa_dev *mvdev);
+static void teardown_vq_resources(struct mlx5_vdpa_net *ndev);
 
 static bool mlx5_vdpa_debug;
 
@@ -2848,7 +2848,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev 
*mvdev,
if (err)
return err;
 
-   teardown_driver(ndev);
+   teardown_vq_resources(ndev);
}
 
mlx5_vdpa_update_mr(mvdev, new_mr, asid);
@@ -2862,7 +2862,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev 
*mvdev,
 
if (teardown) {
restore_channels_info(ndev);
-   err = setup_driver(mvdev);
+   err = setup_vq_resources(mvdev);
if (err)
return err;
}
@@ -2873,7 +2873,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev 
*mvdev,
 }
 
 /* reslock must be held for this function */
-static int setup_driver(struct mlx5_vdpa_dev *mvdev)
+static int setup_vq_resources(struct mlx5_vdpa_dev *mvdev)
 {
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
int err;
@@ -2931,7 +2931,7 @@ static int setup_driver(struct mlx5_vdpa_dev *mvdev)
 }
 
 /* reslock must be held for this function */
-static void teardown_driver(struct mlx5_vdpa_net *ndev)
+static void teardown_vq_resources(struct mlx5_vdpa_net *ndev)
 {
 
WARN_ON(!rwsem_is_locked(&ndev->reslock));
@@ -2997,7 +2997,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
goto err_setup;
}
register_link_notifier(ndev);
-   err = setup_driver(mvdev);
+   err = setup_vq_resources(mvdev);
if (err) {
mlx5_vdpa_warn(mvdev, "failed to setup 
driver\n");
goto err_driver;
@@ -3040,7 +3040,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device 
*vdev, u32 flags)
 
down_write(&ndev->reslock);
unregister_link_notifier(ndev);
-   teardown_driver(ndev);
+   teardown_vq_resources(ndev);
clear_vqs_ready(ndev);
if (flags & VDPA_RESET_F_CLEAN_MAP)
mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
@@ -3197,7 +3197,7 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
 
ndev = to_mlx5_vdpa_ndev(mvdev);
 
-   free_resources(ndev);
+   free_fixed_resources(ndev);
mlx5_vdpa_destroy_mr_resources(mvdev);
if (!is_zero_ether_addr(ndev->config.mac)) {
pfmdev = pci_get_drvdata(pci_physfn(mvdev->mdev->pdev));
@@ -3467,7 +3467,7 @@ static int query_mtu(struct mlx5_core_dev *mdev, u16 *mtu)
return 0;
 }
 
-static int alloc_resources(struct mlx5_vdpa_net *ndev)
+static int alloc_fixed_resources(struct mlx5_vdpa_net *ndev)
 {
struct mlx5_vdpa_net_resources *res = &ndev->res;
int err;
@@ -3494,7 +3494,7 @@ static int alloc_resources(struct mlx5_vdpa_net *ndev)
return err;
 }
 
-static void free_resources(struct mlx5_vdpa_net *ndev)
+static void free_fixed_resources(struct mlx5_vdpa_net *ndev)
 {
struct mlx5_vdpa_net_resources *res = &ndev->res;
 
@@ -3735,7 +3735,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
*v_mdev, const char *name,
goto err_res;
}
 
-   err = alloc_resources(ndev);
+   err = alloc_fixed_resources(ndev);
if (err)
goto err_mr;
 
@@ -3758,7 +3758,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
*v_mdev, const char *name,
 err_reg:
destroy_workqueue(mvdev->wq);
 err_res2:
-   free_resources(ndev);
+   free_fixed_resources(ndev);
 err_mr:
mlx5_vdpa_destroy_mr_resources(mvdev);
 err_res:

-- 
2.45.1




[PATCH vhost v2 00/24] vdpa/mlx5: Pre-create HW VQs to reduce LM downtime

2024-06-26 Thread Dragos Tatulea
According to the measurements for vDPA Live Migration downtime [0], one
large source of downtime is the creation of hardware VQs and their
associated resources on the devices on the destination VM.

Previous series ([1], [2]) addressed the source part of the Live
Migration downtime. This series addresses the destination part: instead
of creating hardware VQs and their dependent resources when the device
goes into the DRIVER_OK state (which is during downtime), create "blank"
VQs at device creation time and only modify them to the received
configuration before starting the VQs (DRIVER_OK state).

The caveat here is that mlx5_vdpa VQs don't support modifying the VQ
size. VQs will be created with a convenient default size and when this
size is changed, they will be recreated.

The beginning of the series consists of refactorings and preparation.

After that, some preparations are made:
- Allow creation of "blank" VQs by not configuring them during
  create_virtqueue() if there are no modified fields.
- The VQ Init to Ready state transition is consolidated into the
  resume_vq().
- Add error handling to suspend/resume code paths.

Then VQs are created at device creation time.

Finally, the special cases that need full VQ resource recreation are
handled.

On a 64 CPU, 256 GB VM with 1 vDPA device of 16 VQps, the full VQ
resource creation + resume time was ~370ms. Now it's down to 60 ms
(only VQ config and resume). The measurements were done on a ConnectX6DX
based vDPA device.

[0] 
https://lore.kernel.org/qemu-devel/1701970793-6865-1-git-send-email-si-wei@oracle.com/
[1] https://lore.kernel.org/lkml/20231018171456.1624030-2-dtatu...@nvidia.com
[2] https://lore.kernel.org/lkml/20231219180858.120898-1-dtatu...@nvidia.com

---
Changes in v2:
- Renamed a function based on v1 review.
- Addressed small nits from v1 review.
- Added improvement numbers in commit message instead of only cover
  letter.
- Link to v1: 
https://lore.kernel.org/r/20240617-stage-vdpa-vq-precreate-v1-0-8c0483f0c...@nvidia.com

---
Dragos Tatulea (24):
  vdpa/mlx5: Clarify meaning thorough function rename
  vdpa/mlx5: Make setup/teardown_vq_resources() symmetrical
  vdpa/mlx5: Drop redundant code
  vdpa/mlx5: Drop redundant check in teardown_virtqueues()
  vdpa/mlx5: Iterate over active VQs during suspend/resume
  vdpa/mlx5: Remove duplicate suspend code
  vdpa/mlx5: Initialize and reset device with one queue pair
  vdpa/mlx5: Clear and reinitialize software VQ data on reset
  vdpa/mlx5: Rename init_mvqs
  vdpa/mlx5: Add support for modifying the virtio_version VQ field
  vdpa/mlx5: Add support for modifying the VQ features field
  vdpa/mlx5: Set an initial size on the VQ
  vdpa/mlx5: Start off rqt_size with max VQPs
  vdpa/mlx5: Set mkey modified flags on all VQs
  vdpa/mlx5: Allow creation of blank VQs
  vdpa/mlx5: Accept Init -> Ready VQ transition in resume_vq()
  vdpa/mlx5: Add error code for suspend/resume VQ
  vdpa/mlx5: Consolidate all VQ modify to Ready to use resume_vq()
  vdpa/mlx5: Forward error in suspend/resume device
  vdpa/mlx5: Use suspend/resume during VQP change
  vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time
  vdpa/mlx5: Re-create HW VQs under certain conditions
  vdpa/mlx5: Don't reset VQs more than necessary
  vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready()

 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 429 +
 drivers/vdpa/mlx5/net/mlx5_vnet.h  |   1 +
 include/linux/mlx5/mlx5_ifc_vdpa.h |   2 +
 3 files changed, 293 insertions(+), 139 deletions(-)
---
base-commit: c8fae27d141a32a1624d0d0d5419d94252824498
change-id: 20240617-stage-vdpa-vq-precreate-76df151bed08

Best regards,
-- 
Dragos Tatulea 




Re: [PATCH vhost 18/23] vdpa/mlx5: Forward error in suspend/resume device

2024-06-26 Thread Dragos Tatulea
On Sun, 2024-06-23 at 19:19 +0800, Zhu Yanjun wrote:
> 在 2024/6/17 23:07, Dragos Tatulea 写道:
> > Start using the suspend/resume_vq() error return codes previously added.
> > 
> > Signed-off-by: Dragos Tatulea 
> > Reviewed-by: Cosmin Ratiu 
> > ---
> >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 12 
> >   1 file changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index f5d5b25cdb01..0e1c1b7ff297 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -3436,22 +3436,25 @@ static int mlx5_vdpa_suspend(struct vdpa_device 
> > *vdev)
> >   {
> > struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> > +   int err;
> 
> Reverse Christmas Tree?
Would have fixed the code if it would have been part of the patch. But it isn't.

> 
> Reviewed-by: Zhu Yanjun 
> 
Thanks!

> Zhu Yanjun
> >   
> > mlx5_vdpa_info(mvdev, "suspending device\n");
> >   
> > down_write(&ndev->reslock);
> > unregister_link_notifier(ndev);
> > -   suspend_vqs(ndev);
> > +   err = suspend_vqs(ndev);
> > mlx5_vdpa_cvq_suspend(mvdev);
> > mvdev->suspended = true;
> > up_write(&ndev->reslock);
> > -   return 0;
> > +
> > +   return err;
> >   }
> >   
> >   static int mlx5_vdpa_resume(struct vdpa_device *vdev)
> >   {
> > struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > struct mlx5_vdpa_net *ndev;
> > +   int err;
> >   
> > ndev = to_mlx5_vdpa_ndev(mvdev);
> >   
> > @@ -3459,10 +3462,11 @@ static int mlx5_vdpa_resume(struct vdpa_device 
> > *vdev)
> >   
> > down_write(&ndev->reslock);
> > mvdev->suspended = false;
> > -   resume_vqs(ndev);
> > +   err = resume_vqs(ndev);
> > register_link_notifier(ndev);
> > up_write(&ndev->reslock);
> > -   return 0;
> > +
> > +   return err;
> >   }
> >   
> >   static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
> > 
> 



Re: [PATCH vhost 20/23] vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time

2024-06-26 Thread Dragos Tatulea
On Wed, 2024-06-19 at 17:54 +0200, Eugenio Perez Martin wrote:
> On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea  wrote:
> > 
> > Currently, hardware VQs are created right when the vdpa device gets into
> > DRIVER_OK state. That is easier because most of the VQ state is known by
> > then.
> > 
> > This patch switches to creating all VQs and their associated resources
> > at device creation time. The motivation is to reduce the vdpa device
> > live migration downtime by moving the expensive operation of creating
> > all the hardware VQs and their associated resources out of downtime on
> > the destination VM.
> > 
> > The VQs are now created in a blank state. The VQ configuration will
> > happen later, on DRIVER_OK. Then the configuration will be applied when
> > the VQs are moved to the Ready state.
> > 
> > When .set_vq_ready() is called on a VQ before DRIVER_OK, special care is
> > needed: now that the VQ is already created a resume_vq() will be
> > triggered too early when no mr has been configured yet. Skip calling
> > resume_vq() in this case, let it be handled during DRIVER_OK.
> > 
> > For virtio-vdpa, the device configuration is done earlier during
> > .vdpa_dev_add() by vdpa_register_device(). Avoid calling
> > setup_vq_resources() a second time in that case.
> > 
> 
> I guess this happens if virtio_vdpa is already loaded, but I cannot
> see how this is different here. Apart from the IOTLB, what else does
> it change from the mlx5_vdpa POV?
> 
I don't understand your question, could you rephrase or provide more context
please?

Thanks,
Dragos

> > Signed-off-by: Dragos Tatulea 
> > Reviewed-by: Cosmin Ratiu 
> > ---
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 37 
> > -
> >  1 file changed, 32 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 249b5afbe34a..b2836fd3d1dd 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -2444,7 +2444,7 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device 
> > *vdev, u16 idx, bool ready
> > mvq = &ndev->vqs[idx];
> > if (!ready) {
> > suspend_vq(ndev, mvq);
> > -   } else {
> > +   } else if (mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > if (resume_vq(ndev, mvq))
> > ready = false;
> > }
> > @@ -3078,10 +3078,18 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
> > *vdev, u8 status)
> > goto err_setup;
> > }
> > register_link_notifier(ndev);
> > -   err = setup_vq_resources(ndev, true);
> > -   if (err) {
> > -   mlx5_vdpa_warn(mvdev, "failed to setup 
> > driver\n");
> > -   goto err_driver;
> > +   if (ndev->setup) {
> > +   err = resume_vqs(ndev);
> > +   if (err) {
> > +   mlx5_vdpa_warn(mvdev, "failed to 
> > resume VQs\n");
> > +   goto err_driver;
> > +   }
> > +   } else {
> > +   err = setup_vq_resources(ndev, true);
> > +   if (err) {
> > +   mlx5_vdpa_warn(mvdev, "failed to 
> > setup driver\n");
> > +   goto err_driver;
> > +   }
> > }
> > } else {
> > mlx5_vdpa_warn(mvdev, "did not expect DRIVER_OK to 
> > be cleared\n");
> > @@ -3142,6 +3150,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device 
> > *vdev, u32 flags)
> > if (mlx5_vdpa_create_dma_mr(mvdev))
> > mlx5_vdpa_warn(mvdev, "create MR failed\n");
> > }
> > +   setup_vq_resources(ndev, false);
> > up_write(&ndev->reslock);
> > 
> > return 0;
> > @@ -3836,8 +3845,21 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
> > *v_mdev, const char *name,
> > goto err_reg;
> > 
> > mgtdev->ndev = ndev;
> > +
> > +   /* For virtio-vdpa, the device was set up during device register. */
> > +   if (ndev->setup)
> > +   return 0;
> > +
> > +   down_write(&ndev->reslock);
> > +   err = setup_vq_resources(ndev, false);
> > +   up_write(&ndev->reslock);
> > +   if (err)
> > +   goto err_setup_vq_res;
> > +
> > return 0;
> > 
> > +err_setup_vq_res:
> > +   _vdpa_unregister_device(&mvdev->vdev);
> >  err_reg:
> > destroy_workqueue(mvdev->wq);
> >  err_res2:
> > @@ -3863,6 +3885,11 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev 
> > *v_mdev, struct vdpa_device *
> > 
> > unregister_link_noti

Re: [PATCH v3 2/2] rust: add tracepoint support

2024-06-26 Thread Alice Ryhl
On Tue, Jun 25, 2024 at 8:11 PM Boqun Feng  wrote:
>
> On Fri, Jun 21, 2024 at 02:52:10PM +0200, Alice Ryhl wrote:
> [...]
> >
> > Hmm, I tried using the support where I have both events and hooks:
> >
> > #define CREATE_TRACE_POINTS
> > #define CREATE_RUST_TRACE_POINTS
> > #include 
> > #include 
> >
> > But it's not really working. Initially I thought that it's because I
> > need to undef DEFINE_RUST_DO_TRACE at the end of this file, but even
> > when I added that, I still get this error:
> >
> > error: redefinition of 'str__rust_binder__trace_system_name'
> >
> > Is the Rust support missing something, or is the answer just that you
> > can't have two files of the same name like this? Or am I doing
> > something else wrong?
> >
>
> Because your hooks/rust_binder.h and events/rust_binder.h use the same
> TRACE_SYSTEM name? Could you try something like:
>
> #define TRACE_SYSTEM rust_binder_hook
>
> in your hooks/rust_binder.h?

I was able to get it to work by moving the includes into two different
.c files. I don't think changing TRACE_SYSTEM works because it must
match the filename.

Alice



Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-06-26 Thread David Woodhouse
On Tue, 2024-06-25 at 15:22 -0700, John Stultz wrote:
> On Tue, Jun 25, 2024 at 2:48 PM David Woodhouse  wrote:
> > On Tue, 2024-06-25 at 23:34 +0200, Thomas Gleixner wrote:
> > > On Tue, Jun 25 2024 at 20:01, David Woodhouse wrote:
> > > > From: David Woodhouse 
> > > > 
> > > > The vmclock "device" provides a shared memory region with precision 
> > > > clock
> > > > information. By using shared memory, it is safe across Live Migration.
> > > > 
> > > > Like the KVM PTP clock, this can convert TSC-based cross timestamps into
> > > > KVM clock values. Unlike the KVM PTP clock, it does so only when such is
> > > > actually helpful.
> > > > 
> > > > The memory region of the device is also exposed to userspace so it can 
> > > > be
> > > > read or memory mapped by application which need reliable notification of
> > > > clock disruptions.
> > > 
> > > There is effort underway to expose PTP clocks to user space via VDSO.
> > 
> > Ooh, interesting. Got a reference to that please?
> > 
> > >  Can we please not expose an ad hoc interface for that?
> > 
> > Absolutely. I'm explicitly trying to intercept the virtio-rtc
> > specification here, to *avoid* having to do anything ad hoc.
> > 
> > Note that this is a "vDSO-style" interface from hypervisor to guest via
> > a shared memory region, not necessarily an actual vDSO.
> > 
> > But yes, it *is* intended to be exposed to userspace, so that userspace
> > can know the *accurate* time without a system call, and know that it
> > hasn't been perturbed by live migration.
> 
> Yea, I was going to raise a concern that just defining an mmaped
> structure means it has to trust the guest logic is as expected. It's
> good that it's versioned! :)

Right. Although it's basically a pvclock, and we've had those for ages.

The main difference here is that we add an indicator that tells the
guest that it's been live migrated, so any additional NTP/PTP
refinement that the *guest* has done of its oscillator, should now be
discarded.

It's also designed to be useful in "disruption-only" mode, where the
pvclock information isn't actually populated, so *all* it does is tell
guests that their clock is now hosed due to live migration.

That part is why it needs to be mappable directly to userspace, so that
userspace can not only get a timestamp but *also* know that it's
actually valid. All without a system call.

The critical use cases are financial systems where they incur massive
fines if they submit mis-timestamped transactions, and distributed
databases which rely on accurate timestamps (and error bounds) for
eventual coherence. Live migration can screw those completely.

I'm open to changing fairly much anything about the proposal as long as
we can address those use cases (which the existing virtio-rtc and other
KVM enlightenments do not).

> I'd fret a bit about exposing this to userland. It feels very similar
> to the old powerpc systemcfg implementation that similarly mapped just
> kernel data out to userland and was difficult to maintain as changes
> were made. Would including a code page like a proper vdso make sense
> to make this more flexible of an UABI to maintain?

I think the structure itself should be stable once we've bikeshedded it
a bit. But there is certainly some potential for vDSO functions which
help us expose it to the user...

This structure exposes a 'disruption count' which is updated every time
the TSC/counter is messed with by live migration. But what is userspace
actually going to *compare* it with?

It basically needs to compare it with the disruption count when the
clock was last synchronized, so maybe the kernel could export *that* to
vDSO too, then expose a simple vDSO function which reports whether the
clock is valid?

The 'invalid' code path could turn into an actual system call which
makes the kernel (check for itself and) call ntp_clear() when the
disruption occurs. Or maybe not just ntp_clear() but actually consume
the pvclock rate information directly and apply the *new* calibration?

That kind of thing would be great, and I've definitely tried to design
the structure so that it *can* be made a first-class citizen within the
kernel's timekeeping code and used like that.

But I was going to start with a more modest proposal that it's "just a
device", and applications which care about reliable time after LM would
have to /dev/vmclock0 and mmap it and check for themselves. (Which
would be assisted by things like the ClockBound library).




smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH V2 3/3] virtio-net: synchronize operstate with admin state on up/down

2024-06-26 Thread Michael S. Tsirkin
On Wed, Jun 26, 2024 at 09:58:32AM +0800, Jason Wang wrote:
> On Tue, Jun 25, 2024 at 4:32 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Jun 25, 2024 at 04:11:05PM +0800, Jason Wang wrote:
> > > On Tue, Jun 25, 2024 at 3:57 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Tue, Jun 25, 2024 at 03:46:44PM +0800, Jason Wang wrote:
> > > > > Workqueue is used to serialize those so we won't lose any change.
> > > >
> > > > So we don't need to re-read then?
> > > >
> > >
> > > We might have to re-read but I don't get why it is a problem for us.
> > >
> > > Thanks
> >
> > I don't think each ethtool command should force a full config read,
> > is what I mean. Only do it if really needed.
> 
> We don't, as we will check config_pending there.
> 
> Thanks

And config_pending set from an interrupt? That's fine.
But it's not what this patch does, right?

> >
> > --
> > MST
> >