[CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-02 Thread Eric W. Biederman


When switching from kthreads to vhost_tasks, two bugs were added:
1. The vhost worker tasks now show up as processes, so scripts doing
ps or ps a would incorrectly detect the vhost task as another
process.
2. kthreads disabled freezing by setting PF_NOFREEZE, but vhost
tasks didn't disable it or add support for freezing.

To fix both bugs, this switches the vhost task to be a thread in the
process that does the VHOST_SET_OWNER ioctl, and has vhost_worker call
get_signal to support SIGKILL/SIGSTOP and freeze signals. Note that
SIGKILL/SIGSTOP support is required because CLONE_THREAD requires
CLONE_SIGHAND, which requires those 2 signals to be supported.

This is a modified version of the patch written by Mike Christie,
which was itself a modified version of a patch
originally written by Linus.

Much of what depended upon PF_IO_WORKER now depends on PF_USER_WORKER,
including ignoring signals, setting up the register state, and having
get_signal return instead of calling do_group_exit.

Tidied up the vhost_task abstraction so that the definition of
vhost_task only needs to be visible inside of vhost_task.c, making
it easier to review the code and tell what needs to be done where.
As part of this the main loop has been moved from vhost_worker into
vhost_task_fn.  vhost_worker now returns true if work was done.

The main loop has been updated to call get_signal which handles
SIGSTOP, freezing, and collects the message that tells the thread to
exit as part of process exit.  This collection clears
__fatal_signal_pending.

When the vhost task has been asked to exit, it runs until it has
no more work pending and then exits instead of sleeping.

Causing the other threads to stop feeding the vhost worker work and
having the vhost worker stop when it runs out of work should be enough
to avoid hangs in coredump rendezvous and when killing threads in a
multi-threaded exec.

The vhost thread is no longer guaranteed to be the last thread to
exit, which means it is possible for a work item to be submitted after
the vhost work thread has exited.  If that happens the work item will
leak and vhost_worker_free will warn about the situation.

Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads")
Co-developed-by: Mike Christie 
Signed-off-by: Eric W. Biederman 
---

This fixes the ordering issue in vhost_task_fn so that get_signal
should now work.

This patch is a gamble that during process exit or de_thread in exec
work will not be commonly queued from other threads.

If this gamble turns out to be false the existing WARN_ON in
vhost_worker_free will fire.

Can folks test this and let us know if the WARN_ON fires?

Thank you.

 arch/x86/include/asm/fpu/sched.h |  2 +-
 arch/x86/kernel/fpu/context.h    |  2 +-
 arch/x86/kernel/fpu/core.c       |  2 +-
 drivers/vhost/vhost.c            | 24 +++-
 include/linux/sched/task.h       |  1 -
 include/linux/sched/vhost_task.h | 16 ++
 kernel/fork.c                    | 13 ++---
 kernel/signal.c                  |  4 +-
 kernel/vhost_task.c              | 99 ++--
 9 files changed, 91 insertions(+), 72 deletions(-)

diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index c2d6cd78ed0c..78fcde7b1f07 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -39,7 +39,7 @@ extern void fpu_flush_thread(void);
 static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
 {
if (cpu_feature_enabled(X86_FEATURE_FPU) &&
-   !(current->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+   !(current->flags & (PF_KTHREAD | PF_USER_WORKER))) {
save_fpregs_to_fpstate(old_fpu);
/*
 * The save operation preserved register state, so the
diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
index 9fcfa5c4dad7..af5cbdd9bd29 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/kernel/fpu/context.h
@@ -57,7 +57,7 @@ static inline void fpregs_restore_userregs(void)
	struct fpu *fpu = &current->thread.fpu;
int cpu = smp_processor_id();
 
-   if (WARN_ON_ONCE(current->flags & (PF_KTHREAD | PF_IO_WORKER)))
+   if (WARN_ON_ONCE(current->flags & (PF_KTHREAD | PF_USER_WORKER)))
return;
 
if (!fpregs_state_valid(fpu, cpu)) {
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index caf33486dc5e..1015af1ae562 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -426,7 +426,7 @@ void kernel_fpu_begin_mask(unsigned int kfpu_mask)
 
this_cpu_write(in_kernel_fpu, true);
 
-   if (!(current->flags & (PF_KTHREAD | PF_IO_WORKER)) &&
+   if (!(current->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
!test_thread_flag(TIF_NEED_FPU_LOAD)) {
set_thread_flag(TIF_NEED_FPU_LOAD);
		save_fpregs_to_fpstate(&current->thread.fpu);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a92af08e7864..85948f40ddfe 100644

Re: [PATCH 1/1] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-02 Thread Eric W. Biederman
Oleg Nesterov  writes:

> Hi Mike,
>
> sorry, but somehow I can't understand this patch...
>
> I'll try to read it with a fresh head over the weekend, but for example,
>
> On 06/01, Mike Christie wrote:
>>
>>  static int vhost_task_fn(void *data)
>>  {
>>  struct vhost_task *vtsk = data;
>> -int ret;
>> +bool dead = false;
>> +
>> +for (;;) {
>> +bool did_work;
>> +
>> +/* mb paired w/ vhost_task_stop */
>> +if (test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags))
>> +break;
>> +
>> +if (!dead && signal_pending(current)) {
>> +struct ksignal ksig;
>> +/*
>> + * Calling get_signal will block in SIGSTOP,
>> + * or clear fatal_signal_pending, but remember
>> + * what was set.
>> + *
>> + * This thread won't actually exit until all
>> + * of the file descriptors are closed, and
>> + * the release function is called.
>> + */
>> +dead = get_signal();
>> +if (dead)
>> +clear_thread_flag(TIF_SIGPENDING);
>
> this can't be right or I am totally confused.
>
> Another signal_wake_up() can come right after clear(SIGPENDING).

Technically yes.

However, please note that prepare_signal does:
	if (signal->flags & SIGNAL_GROUP_EXIT)
		return false;

In general it is wrong to receive or attempt to process a signal
after task death has been decided.

Strictly speaking that doesn't cover de_thread and coredumping, but
still, receiving any kind of signal at that point is rare and
generally wrong behavior.

Beyond that clearing TIF_SIGPENDING is just an optimization so
the thread can sleep in schedule and not spin.

> Again, I'll try to re-read this patch, but let me ask anyway...
>
> Do we have a plan B? I mean... iirc you have mentioned that you can
> change these code paths to do something like
>
>   if (killed)
>   tell_the_drivers_that_all_callbacks_will_fail();
>
>
> so that vhost_worker() can exit after get_signal() returns SIGKILL.
>
> Probably I misunderstood you, but it would be nice to avoid the changes
> in coredump/etc code just to add a temporary (iiuc!) fix.

One saving grace with the vhost code is that you need to open
device nodes that normally have root-only permissions.

If we are willing to allow races in process shutdown to cause leaks I
think we can do something better, and put the burden of work on vhost
layer.

I will follow up with a patch doing that.

Eric

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 1/1] vringh: IOMEM support

2023-06-02 Thread kernel test robot
Hi Shunsuke,

kernel test robot noticed the following build warnings:

[auto build test WARNING on mst-vhost/linux-next]
[also build test WARNING on linus/master horms-ipvs/master v6.4-rc4 
next-20230602]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Shunsuke-Mie/vringh-IOMEM-support/20230602-135351
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
patch link:
https://lore.kernel.org/r/20230602055211.309960-2-mie%40igel.co.jp
patch subject: [PATCH v4 1/1] vringh: IOMEM support
config: nios2-randconfig-s053-20230531 
(https://download.01.org/0day-ci/archive/20230603/202306031019.wwkekrgz-...@intel.com/config)
compiler: nios2-linux-gcc (GCC) 12.3.0
reproduce:
mkdir -p ~/bin
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.4-39-gce1a6720-dirty
# 
https://github.com/intel-lab-lkp/linux/commit/de2a1f5220c32e953400f225aba6bd294a8d41b8
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review 
Shunsuke-Mie/vringh-IOMEM-support/20230602-135351
git checkout de2a1f5220c32e953400f225aba6bd294a8d41b8
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.3.0 ~/bin/make.cross 
C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=nios2 
olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.3.0 ~/bin/make.cross 
C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=nios2 
SHELL=/bin/bash drivers/vhost/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202306031019.wwkekrgz-...@intel.com/

sparse warnings: (new ones prefixed by >>)
>> drivers/vhost/vringh.c:1630:21: sparse: sparse: incorrect type in argument 1 
>> (different address spaces) @@ expected void volatile [noderef] __iomem 
>> *addr @@ got struct vring_used_elem *dst @@
   drivers/vhost/vringh.c:1630:21: sparse: expected void volatile [noderef] 
__iomem *addr
   drivers/vhost/vringh.c:1630:21: sparse: got struct vring_used_elem *dst
   drivers/vhost/vringh.c:592:18: sparse: sparse: restricted __virtio16 
degrades to integer
   drivers/vhost/vringh.c:592:18: sparse: sparse: restricted __virtio16 
degrades to integer
   drivers/vhost/vringh.c:592:18: sparse: sparse: cast to restricted __virtio16
   drivers/vhost/vringh.c:1280:23: sparse: sparse: restricted __virtio16 
degrades to integer
   drivers/vhost/vringh.c:1280:23: sparse: sparse: restricted __virtio16 
degrades to integer
   drivers/vhost/vringh.c:1280:23: sparse: sparse: cast to restricted __virtio16
>> drivers/vhost/vringh.c:1610:46: sparse: sparse: incorrect type in argument 1 
>> (different address spaces) @@ expected void const volatile [noderef] 
>> __iomem *addr @@ got restricted __virtio16 const [usertype] *p @@
   drivers/vhost/vringh.c:1610:46: sparse: expected void const volatile 
[noderef] __iomem *addr
   drivers/vhost/vringh.c:1610:46: sparse: got restricted __virtio16 const 
[usertype] *p
>> drivers/vhost/vringh.c:1610:45: sparse: sparse: incorrect type in argument 2 
>> (different base types) @@ expected restricted __virtio16 [usertype] val 
>> @@ got unsigned short @@
   drivers/vhost/vringh.c:1610:45: sparse: expected restricted __virtio16 
[usertype] val
   drivers/vhost/vringh.c:1610:45: sparse: got unsigned short
>> drivers/vhost/vringh.c:1623:28: sparse: sparse: incorrect type in argument 2 
>> (different address spaces) @@ expected void const volatile [noderef] 
>> __iomem *addr @@ got void const *src @@
   drivers/vhost/vringh.c:1623:28: sparse: expected void const volatile 
[noderef] __iomem *addr
   drivers/vhost/vringh.c:1623:28: sparse: got void const *src
>> drivers/vhost/vringh.c:1637:28: sparse: sparse: incorrect type in argument 2 
>> (different address spaces) @@ expected void const volatile [noderef] 
>> __iomem *addr @@ got void *src @@
   drivers/vhost/vringh.c:1637:28: sparse: expected void const volatile 
[noderef] __iomem *addr
   drivers/vhost/vringh.c:1637:28: sparse: got void *src
>> drivers/vhost/vringh.c:1644:21: sparse: sparse: incorrect type in argument 1 
>> (different address spaces) @@ expected void volatile [noderef] __iomem 
>> *addr @@ got void *dst @@
   drivers/vhost/vringh.c:1644:21: sparse: expected void volatile [noderef] 
__

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-02 Thread Linus Torvalds
On Fri, Jun 2, 2023 at 1:59 PM Oleg Nesterov  wrote:
>
> As I said from the very beginning, this code is fine on x86 because
> atomic ops are fully serialised on x86.

Yes. Other architectures require __smp_mb__{before,after}_atomic for
the bit setting ops to actually be memory barriers.

We *should* probably have acquire/release versions of the bit test/set
helpers, but we don't, so they end up being full memory barriers with
those things. Which isn't optimal, but I doubt it matters on most
architectures.

So maybe we'll some day have a "test_bit_acquire()" and a
"set_bit_release()" etc.

Linus

Re: [PATCH 1/1] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-02 Thread Oleg Nesterov
Hi Mike,

sorry, but somehow I can't understand this patch...

I'll try to read it with a fresh head over the weekend, but for example,

On 06/01, Mike Christie wrote:
>
>  static int vhost_task_fn(void *data)
>  {
>   struct vhost_task *vtsk = data;
> - int ret;
> + bool dead = false;
> +
> + for (;;) {
> + bool did_work;
> +
> + /* mb paired w/ vhost_task_stop */
> + if (test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags))
> + break;
> +
> + if (!dead && signal_pending(current)) {
> + struct ksignal ksig;
> + /*
> +  * Calling get_signal will block in SIGSTOP,
> +  * or clear fatal_signal_pending, but remember
> +  * what was set.
> +  *
> +  * This thread won't actually exit until all
> +  * of the file descriptors are closed, and
> +  * the release function is called.
> +  */
> + dead = get_signal();
> + if (dead)
> + clear_thread_flag(TIF_SIGPENDING);

this can't be right or I am totally confused.

Another signal_wake_up() can come right after clear(SIGPENDING).


Again, I'll try to re-read this patch, but let me ask anyway...

Do we have a plan B? I mean... iirc you have mentioned that you can
change these code paths to do something like

if (killed)
tell_the_drivers_that_all_callbacks_will_fail();


so that vhost_worker() can exit after get_signal() returns SIGKILL.

Probably I misunderstood you, but it would be nice to avoid the changes
in coredump/etc code just to add a temporary (iiuc!) fix.

Oleg.



Re: [PATCH v4 1/1] vringh: IOMEM support

2023-06-02 Thread kernel test robot
Hi Shunsuke,

kernel test robot noticed the following build errors:

[auto build test ERROR on mst-vhost/linux-next]
[also build test ERROR on linus/master horms-ipvs/master v6.4-rc4 next-20230602]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Shunsuke-Mie/vringh-IOMEM-support/20230602-135351
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
patch link:
https://lore.kernel.org/r/20230602055211.309960-2-mie%40igel.co.jp
patch subject: [PATCH v4 1/1] vringh: IOMEM support
config: i386-randconfig-i003-20230531 
(https://download.01.org/0day-ci/archive/20230603/202306030216.bpwr6xv0-...@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build):
# 
https://github.com/intel-lab-lkp/linux/commit/de2a1f5220c32e953400f225aba6bd294a8d41b8
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review 
Shunsuke-Mie/vringh-IOMEM-support/20230602-135351
git checkout de2a1f5220c32e953400f225aba6bd294a8d41b8
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=i386 olddefconfig
make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202306030216.bpwr6xv0-...@intel.com/

All errors (new ones prefixed by >>):

   drivers/vhost/vringh.c: In function 'getu16_iomem':
>> drivers/vhost/vringh.c:1610:37: error: implicit declaration of function 
>> 'ioread16' [-Werror=implicit-function-declaration]
1610 | *val = vringh16_to_cpu(vrh, ioread16(p));
 | ^~~~
   drivers/vhost/vringh.c: In function 'putu16_iomem':
>> drivers/vhost/vringh.c:1616:9: error: implicit declaration of function 
>> 'iowrite16' [-Werror=implicit-function-declaration]
1616 | iowrite16(cpu_to_vringh16(vrh, val), p);
 | ^
   drivers/vhost/vringh.c: In function 'copydesc_iomem':
>> drivers/vhost/vringh.c:1623:9: error: implicit declaration of function 
>> 'memcpy_fromio'; did you mean 'memcpy_from_bvec'? 
>> [-Werror=implicit-function-declaration]
1623 | memcpy_fromio(dst, src, len);
 | ^
 | memcpy_from_bvec
   drivers/vhost/vringh.c: In function 'putused_iomem':
>> drivers/vhost/vringh.c:1630:9: error: implicit declaration of function 
>> 'memcpy_toio' [-Werror=implicit-function-declaration]
1630 | memcpy_toio(dst, src, num * sizeof(*dst));
 | ^~~
   drivers/vhost/vringh.c: At top level:
   drivers/vhost/vringh.c:1661:5: warning: no previous prototype for 
'vringh_init_iomem' [-Wmissing-prototypes]
1661 | int vringh_init_iomem(struct vringh *vrh, u64 features, unsigned int 
num,
 | ^
   drivers/vhost/vringh.c:1683:5: warning: no previous prototype for 
'vringh_getdesc_iomem' [-Wmissing-prototypes]
1683 | int vringh_getdesc_iomem(struct vringh *vrh, struct vringh_kiov 
*riov,
 | ^~~~
   drivers/vhost/vringh.c:1714:9: warning: no previous prototype for 
'vringh_iov_pull_iomem' [-Wmissing-prototypes]
1714 | ssize_t vringh_iov_pull_iomem(struct vringh *vrh, struct vringh_kiov 
*riov,
 | ^
   drivers/vhost/vringh.c:1729:9: warning: no previous prototype for 
'vringh_iov_push_iomem' [-Wmissing-prototypes]
1729 | ssize_t vringh_iov_push_iomem(struct vringh *vrh, struct vringh_kiov 
*wiov,
 | ^
   drivers/vhost/vringh.c:1744:6: warning: no previous prototype for 
'vringh_abandon_iomem' [-Wmissing-prototypes]
1744 | void vringh_abandon_iomem(struct vringh *vrh, unsigned int num)
 |  ^~~~
   drivers/vhost/vringh.c:1759:5: warning: no previous prototype for 
'vringh_complete_iomem' [-Wmissing-prototypes]
1759 | int vringh_complete_iomem(struct vringh *vrh, u16 head, u32 len)
 | ^
   drivers/vhost/vringh.c:1777:6: warning: no previous prototype for 
'vringh_notify_enable_iomem' [-Wmissing-prototypes]
1777 | bool vringh_notify_enable_iomem(struct vringh *vrh)
 |  ^~
   drivers/vhost/vringh.c:1790:6: warning: no previous prototype for 
'vringh_notify_disable_iomem' [-Wmissing-prototypes]
1790 | void vringh_notify_disable_iomem(struct vringh *vrh)
 |  ^~~
   drivers/vhost/vringh.c:1802:5: warning: no previous prototype for 
'vringh_need_notify_iomem' [-Wmissing-pr

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-02 Thread Oleg Nesterov
On 06/02, Jason Wang wrote:
>
> On Thu, Jun 1, 2023 at 3:43 PM Oleg Nesterov  wrote:
> >
> > and the final rewrite:
> >
> > if (work->node) {
> > work_next = work->node->next;
> > if (true)
> > clear_bit(&work->flags);
> > }
> >
> > so again, I do not see the load-store control dependency.
>
> This kind of optimization is suspicious. Especially considering it's
> the control expression of the loop but not a condition.

It is not about optimization.

> Looking at the assembly (x86):
>
>0x81d46c5b <+75>:callq  0x81689ac0 
> 
>0x81d46c60 <+80>:mov%rax,%r15
>0x81d46c63 <+83>:test   %rax,%rax
>0x81d46c66 <+86>:je 0x81d46c3a 
>0x81d46c68 <+88>:mov%r15,%rdi
>0x81d46c6b <+91>:mov(%r15),%r15
>0x81d46c6e <+94>:lock andb $0xfd,0x10(%rdi)
>0x81d46c73 <+99>:movl   $0x0,0x18(%rbx)
>0x81d46c7a <+106>:   mov0x8(%rdi),%rax
>0x81d46c7e <+110>:   callq  0x821b39a0
> <__x86_indirect_thunk_array>
>0x81d46c83 <+115>:   callq  0x821b4d10 
> <__SCT__cond_resched>
> ...
>
> I can see:
>
> 1) The code read node->next (+91) before clear_bit (+94)

The code does, but what about the CPU?

> 2) And the it uses a lock prefix to guarantee the execution order

As I said from the very beginning, this code is fine on x86 because
atomic ops are fully serialised on x86.

OK, we can't convince each other. I'll try to write another email when
I have time.

If this code is correct, then my understanding of memory barriers is even
worse than I think. I wouldn't be surprised, but I'd like to understand
what I have missed.

Oleg.


Re: [PATCH v4 1/1] vringh: IOMEM support

2023-06-02 Thread kernel test robot
Hi Shunsuke,

kernel test robot noticed the following build warnings:

[auto build test WARNING on mst-vhost/linux-next]
[also build test WARNING on linus/master horms-ipvs/master v6.4-rc4 
next-20230602]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Shunsuke-Mie/vringh-IOMEM-support/20230602-135351
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
patch link:
https://lore.kernel.org/r/20230602055211.309960-2-mie%40igel.co.jp
patch subject: [PATCH v4 1/1] vringh: IOMEM support
config: hexagon-randconfig-r034-20230531 
(https://download.01.org/0day-ci/archive/20230603/202306030119.ng7cqmj8-...@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project 
4faf3aaf28226a4e950c103a14f6fc1d1fdabb1b)
reproduce (this is a W=1 build):
mkdir -p ~/bin
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/intel-lab-lkp/linux/commit/de2a1f5220c32e953400f225aba6bd294a8d41b8
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review 
Shunsuke-Mie/vringh-IOMEM-support/20230602-135351
git checkout de2a1f5220c32e953400f225aba6bd294a8d41b8
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang ~/bin/make.cross W=1 
O=build_dir ARCH=hexagon olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang ~/bin/make.cross W=1 
O=build_dir ARCH=hexagon SHELL=/bin/bash drivers/vhost/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202306030119.ng7cqmj8-...@intel.com/

All warnings (new ones prefixed by >>):

   In file included from drivers/vhost/vringh.c:17:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   val = __raw_readb(PCI_IOBASE + addr);
 ~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
   ~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from 
macro '__le16_to_cpu'
   #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
 ^
   In file included from drivers/vhost/vringh.c:17:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
   ~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from 
macro '__le32_to_cpu'
   #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
 ^
   In file included from drivers/vhost/vringh.c:17:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   __ra

Re: vhost stable branch

2023-06-02 Thread Dragos Tatulea via Virtualization
On Fri, 2023-06-02 at 08:22 -0400, Michael S. Tsirkin wrote:
> On Tue, May 30, 2023 at 10:19:18AM +, Dragos Tatulea wrote:
> > On Tue, 2023-05-23 at 07:45 -0400, Michael S. Tsirkin wrote:
> > > On Tue, May 23, 2023 at 08:38:47AM +, Dragos Tatulea wrote:
> > > > On Tue, 2023-05-23 at 04:18 -0400, Michael S. Tsirkin wrote:
> > > > > On Tue, May 23, 2023 at 07:16:58AM +, Dragos Tatulea wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > I am looking for the stable branch for vdpa fixes in the vhost tree
> > > > > > [1].
> > > > > > There
> > > > > > are 3 branches that seem identical: linux-next, test and vhost.
> > > > > > linux-
> > > > > > next
> > > > > > seems
> > > > > > to be -next branch. Which one would be the stable branch?
> > > > > > 
> > > > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git
> > > > > > 
> > > > > > Thanks,
> > > > > > Dragos
> > > > > 
> > > > > I don't publish one until I am ready to send patches to Linus.
> > > > > Which should be today or tomorrow.
> > > > > 
> > > > Understood. Is it the same with patches for -next? Are they published
> > > > only
> > > > once
> > > > the patches are sent to Linus?
> > > > 
> > > > Thanks,
> > > > Dragos
> > > > 
> > > 
> > > A bit different. I start worrying about next when I'm done with stable.
> > > Also my next branch rebases frequently, I also drop and squash patches,
> > > rewrite commit log etc.
> > > 
> > Is the linux-next branch in the referenced tree [1] the next branch or is it
> > hosted somewhere else? I am asking because I haven't seen any rebases in the
> > last 4+ weeks in the referenced location. I want to make sure that I'm
> > looking
> > in the right location.
> > 
> > Thanks,
> > Dragos
> > 
> 
> pushed to next now.
> 
Thanks for the note.

Thanks,
Dragos


[PATCH V3] virtio-fs: Improved request latencies when Virtio queue is full

2023-06-02 Thread Peter-Jan Gootzen via Virtualization
When the Virtio queue is full, a work item is scheduled
to execute in 1ms that retries adding the request to the queue.
This is a large amount of time on the scale on which a
virtio-fs device can operate. When using a DPU this is around
40us baseline without going to a remote server (4k, QD=1).
This patch queues requests when the Virtio queue is full,
and when a completed request is taken off, immediately fills
it back up with queued requests.

This reduces the 99.9th percentile latencies in our tests by
60x and slightly increases the overall throughput, when using a
queue depth 2x the size of the Virtio queue size, with a
DPU-powered virtio-fs device.

Furthermore, the virtio-fs driver now also always lets -ENOMEM
errors go to userspace instead of retrying the request in the
driver.

Signed-off-by: Peter-Jan Gootzen 
---
V3: Fixed requests falling into the void when -ENOMEM and no new
incoming requests. Virtio-fs now always lets -ENOMEM bubble up to
userspace. Also made queue full condition more explicit with
-ENOSPC in `send_forget_request`.
V2: Not scheduling dispatch work anymore when not needed
and changed delayed_work structs to work_struct structs

 fs/fuse/virtio_fs.c | 46 ++---
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 4d8d4f16c727..3a3231ddb9e7 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -45,7 +45,7 @@ struct virtio_fs_vq {
struct work_struct done_work;
struct list_head queued_reqs;
struct list_head end_reqs;  /* End these requests */
-   struct delayed_work dispatch_work;
+   struct work_struct dispatch_work;
struct fuse_dev *fud;
bool connected;
long in_flight;
@@ -202,7 +202,7 @@ static void virtio_fs_drain_queue(struct virtio_fs_vq *fsvq)
}
 
	flush_work(&fsvq->done_work);
-	flush_delayed_work(&fsvq->dispatch_work);
+	flush_work(&fsvq->dispatch_work);
 }
 
 static void virtio_fs_drain_all_queues_locked(struct virtio_fs *fs)
@@ -346,6 +346,9 @@ static void virtio_fs_hiprio_done_work(struct work_struct 
*work)
dec_in_flight_req(fsvq);
}
} while (!virtqueue_enable_cb(vq) && likely(!virtqueue_is_broken(vq)));
+
+	if (!list_empty(&fsvq->queued_reqs))
+		schedule_work(&fsvq->dispatch_work);
	spin_unlock(&fsvq->lock);
 }
 
@@ -353,7 +356,7 @@ static void virtio_fs_request_dispatch_work(struct 
work_struct *work)
 {
struct fuse_req *req;
struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq,
-dispatch_work.work);
+dispatch_work);
int ret;
 
pr_debug("virtio-fs: worker %s called.\n", __func__);
@@ -385,11 +388,9 @@ static void virtio_fs_request_dispatch_work(struct 
work_struct *work)
 
ret = virtio_fs_enqueue_req(fsvq, req, true);
if (ret < 0) {
-   if (ret == -ENOMEM || ret == -ENOSPC) {
+   if (ret == -ENOSPC) {
			spin_lock(&fsvq->lock);
			list_add_tail(&req->list, &fsvq->queued_reqs);
-			schedule_delayed_work(&fsvq->dispatch_work,
- msecs_to_jiffies(1));
			spin_unlock(&fsvq->lock);
return;
}
@@ -405,8 +406,8 @@ static void virtio_fs_request_dispatch_work(struct 
work_struct *work)
 }
 
 /*
- * Returns 1 if queue is full and sender should wait a bit before sending
- * next request, 0 otherwise.
+ * Returns 0 if request has been successfully sent, otherwise -ENOSPC
+ * when the queue is full.
  */
 static int send_forget_request(struct virtio_fs_vq *fsvq,
   struct virtio_fs_forget *forget,
@@ -432,16 +433,12 @@ static int send_forget_request(struct virtio_fs_vq *fsvq,
 
	ret = virtqueue_add_outbuf(vq, &sg, 1, forget, GFP_ATOMIC);
if (ret < 0) {
-   if (ret == -ENOMEM || ret == -ENOSPC) {
+   if (ret == -ENOSPC) {
pr_debug("virtio-fs: Could not queue FORGET: err=%d. 
Will try later\n",
 ret);
			list_add_tail(&forget->list, &fsvq->queued_reqs);
-			schedule_delayed_work(&fsvq->dispatch_work,
- msecs_to_jiffies(1));
if (!in_flight)
inc_in_flight_req(fsvq);
-   /* Queue is full */
-   ret = 1;
} else {
pr_debug("virtio-fs: Could not queue FORGET: err=%d. 
Dropping it.\n",
 ret);
@@ -469,7 +466,7 @@ static void virtio_fs_hiprio_dispatch_work(struct 
work_struct *work)
 {
struct virtio_fs_forget 

Re: [PATCH V2] virtio-fs: Improved request latencies when Virtio queue is full

2023-06-02 Thread Peter-Jan Gootzen via Virtualization
On 01/06/2023 20:45, Vivek Goyal wrote:
> On Thu, Jun 01, 2023 at 10:08:50AM -0400, Stefan Hajnoczi wrote:
>> On Wed, May 31, 2023 at 04:49:39PM -0400, Vivek Goyal wrote:
>>> On Wed, May 31, 2023 at 10:34:15PM +0200, Peter-Jan Gootzen wrote:
 On 31/05/2023 21:18, Vivek Goyal wrote:
> On Wed, May 31, 2023 at 07:10:32PM +0200, Peter-Jan Gootzen wrote:
>> When the Virtio queue is full, a work item is scheduled
>> to execute in 1ms that retries adding the request to the queue.
>> This is a large amount of time on the scale on which a
>> virtio-fs device can operate. When using a DPU this is around
>> 40us baseline without going to a remote server (4k, QD=1).
>> This patch queues requests when the Virtio queue is full,
>> and when a completed request is taken off, immediately fills
>> it back up with queued requests.
>>
>> This reduces the 99.9th percentile latencies in our tests by
>> 60x and slightly increases the overall throughput, when using a
>> queue depth 2x the size of the Virtio queue size, with a
>> DPU-powered virtio-fs device.
>>
>> Signed-off-by: Peter-Jan Gootzen 
>> ---
>> V1 -> V2: Not scheduling dispatch work anymore when not needed
>> and changed delayed_work structs to work_struct structs
>>
>>  fs/fuse/virtio_fs.c | 32 +---
>>  1 file changed, 17 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
>> index 4d8d4f16c727..a676297db09b 100644
>> --- a/fs/fuse/virtio_fs.c
>> +++ b/fs/fuse/virtio_fs.c
>> @@ -45,7 +45,7 @@ struct virtio_fs_vq {
>>  struct work_struct done_work;
>>  struct list_head queued_reqs;
>>  struct list_head end_reqs;  /* End these requests */
>> -struct delayed_work dispatch_work;
>> +struct work_struct dispatch_work;
>>  struct fuse_dev *fud;
>>  bool connected;
>>  long in_flight;
>> @@ -202,7 +202,7 @@ static void virtio_fs_drain_queue(struct virtio_fs_vq *fsvq)
>>  }
>>  
>>  flush_work(&fsvq->done_work);
>> -flush_delayed_work(&fsvq->dispatch_work);
>> +flush_work(&fsvq->dispatch_work);
>>  }
>>  
>>  static void virtio_fs_drain_all_queues_locked(struct virtio_fs *fs)
>> @@ -346,6 +346,9 @@ static void virtio_fs_hiprio_done_work(struct work_struct *work)
>>  dec_in_flight_req(fsvq);
>>  }
>>  } while (!virtqueue_enable_cb(vq) && likely(!virtqueue_is_broken(vq)));
>> +
>> +if (!list_empty(&fsvq->queued_reqs))
>> +schedule_work(&fsvq->dispatch_work);
>>  spin_unlock(&fsvq->lock);
>>  }
>>  
>> @@ -353,7 +356,7 @@ static void virtio_fs_request_dispatch_work(struct work_struct *work)
>>  {
>>  struct fuse_req *req;
>>  struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq,
>> - dispatch_work.work);
>> + dispatch_work);
>>  int ret;
>>  
>>  pr_debug("virtio-fs: worker %s called.\n", __func__);
>> @@ -388,8 +391,6 @@ static void virtio_fs_request_dispatch_work(struct work_struct *work)
>>  if (ret == -ENOMEM || ret == -ENOSPC) {
>>  spin_lock(&fsvq->lock);
>>  list_add_tail(&req->list, &fsvq->queued_reqs);
>> -schedule_delayed_work(&fsvq->dispatch_work,
>> -  msecs_to_jiffies(1));
>
> Virtqueue being full is only one of the reasons for failure to queue
> the request. What if virtqueue is empty but we could not queue the
> request because lack of memory (-ENOMEM). In that case we will queue
> the request and it might not be dispatched because there is no completion.
> (Assume there is no further new request coming). That means deadlock?
>
> Thanks
> Vivek
>

 Good catch that will deadlock.

 Is default kernel behavior to indefinitely retry a file system
 request until memory is available?
>>>
>>> As of now that seems to be the behavior. I think I had copied this
>>> code from another driver. 
>>>
>>> But I guess one can argue that if memory is not available, then
>>> return -ENOMEM to user space instead of retrying in kernel.
>>>
>>> Stefan, Miklos, WDYT?
>>
>> My understanding is that file system syscalls may return ENOMEM, so this
>> is okay.
> 
> Ok. Fair enough. Thanks.
> 
> One more question. How do we know the virtqueue is full. Is -ENOSPC the
> correct error code to check and retry indefinitely. Are there other
> situations where -ENOSPC can be returned. Peter's 

Re: vhost stable branch

2023-06-02 Thread Michael S. Tsirkin
On Tue, May 30, 2023 at 10:19:18AM +, Dragos Tatulea wrote:
> On Tue, 2023-05-23 at 07:45 -0400, Michael S. Tsirkin wrote:
> > On Tue, May 23, 2023 at 08:38:47AM +, Dragos Tatulea wrote:
> > > On Tue, 2023-05-23 at 04:18 -0400, Michael S. Tsirkin wrote:
> > > > On Tue, May 23, 2023 at 07:16:58AM +, Dragos Tatulea wrote:
> > > > > Hi,
> > > > > 
> > > > > I am looking for the stable branch for vdpa fixes in the vhost tree 
> > > > > [1].
> > > > > There
> > > > > are 3 branches that seem identical: linux-next, test and vhost. linux-
> > > > > next
> > > > > seems
> > > > > to be -next branch. Which one would be the stable branch?
> > > > > 
> > > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git
> > > > > 
> > > > > Thanks,
> > > > > Dragos
> > > > 
> > > > I don't publish one until I am ready to send patches to Linus.
> > > > Which should be today or tomorrow.
> > > > 
> > > Understood. Is it the same with patches for -next? Are they published only
> > > once
> > > the patches are sent to Linus?
> > > 
> > > Thanks,
> > > Dragos
> > > 
> > 
> > A bit different. I start worrying about next when I'm done with stable.
> > Also my next branch rebases frequently, I also drop and squash patches,
> > rewrite commit log etc.
> > 
> Is the linux-next branch in the referenced tree [1] the next branch or is it
> hosted somewhere else? I am asking because I haven't seen any rebases in the
> last 4+ weeks in the referenced location. I want to make sure that I'm looking
> in the right location.
> 
> Thanks,
> Dragos
> 

pushed to next now.

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio-vdpa: Fix unchecked call to NULL set_vq_affinity

2023-06-02 Thread Michael S. Tsirkin
On Fri, May 12, 2023 at 04:55:38PM -0700, Shannon Nelson wrote:
> On 5/12/23 6:30 AM, Michael S. Tsirkin wrote:
> > 
> > On Fri, May 12, 2023 at 12:51:21PM +, Dragos Tatulea wrote:
> > > On Thu, 2023-05-04 at 14:51 -0400, Michael S. Tsirkin wrote:
> > > > On Thu, May 04, 2023 at 01:08:54PM -0400, Feng Liu wrote:
> > > > > 
> > > > > 
> > > > > On 2023-05-04 a.m.9:50, Dragos Tatulea wrote:
> > > > > > External email: Use caution opening links or attachments
> > > > > > 
> > > > > > 
> > > > > > The referenced patch calls set_vq_affinity without checking if the 
> > > > > > op is
> > > > > > valid. This patch adds the check.
> > > > > > 
> > > > > > Fixes: 3dad56823b53 ("virtio-vdpa: Support interrupt affinity 
> > > > > > spreading
> > > > > > mechanism")
> > > > > > Reviewed-by: Gal Pressman 
> > > > > > Signed-off-by: Dragos Tatulea 
> > > > > > ---
> > > > > >drivers/virtio/virtio_vdpa.c | 4 +++-
> > > > > >1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/virtio/virtio_vdpa.c 
> > > > > > b/drivers/virtio/virtio_vdpa.c
> > > > > > index eb6aee8c06b2..989e2d7184ce 100644
> > > > > > --- a/drivers/virtio/virtio_vdpa.c
> > > > > > +++ b/drivers/virtio/virtio_vdpa.c
> > > > > > @@ -385,7 +385,9 @@ static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
> > > > > >   err = PTR_ERR(vqs[i]);
> > > > > >   goto err_setup_vq;
> > > > > >   }
> > > > > > -   ops->set_vq_affinity(vdpa, i, &masks[i]);
> > > > > > +
> > > > > > +   if (ops->set_vq_affinity)
> > > > > > +   ops->set_vq_affinity(vdpa, i, &masks[i]);
> > > > > if ops->set_vq_affinity is NULL, should give an error code to err, and
> > > > > return err
> > > > 
> > > > Given we ignore return code, hardly seems like a critical thing to do.
> > > > Is it really important? affinity is an optimization isn't it?
> > > > 
> > > > > > 
> > > set_vq_affinity is optional so it's not an error if the op is not 
> > > implemented.
> > > 
> > > Is there anything else that needs to be done for this fix?
> > > 
> > > Thanks,
> > > Dragos
> > > 
> > 
> > no, it's queued already.
> 
> Are these queued into a repo that is accessible?  I haven't seen activity in
> the vhost.git where I would have expected it.  After stumbling over and
> debugging this same problem, I was happy to see it fixed, and I'd like to
> pull from a repo that has the current updates.
> 
> Thanks,
> sln

Pushed to next now.

-- 
MST



Re: [PATCH v2] vhost/vdpa: Add MSI translation tables to iommu for software-managed MSI

2023-06-02 Thread Michael S. Tsirkin
On Tue, Feb 07, 2023 at 08:08:43PM +0800, Nanyong Sun wrote:
> From: Rong Wang 
> 
> Once an iommu domain is enabled for a device, the MSI
> translation tables have to be there for software-managed MSI.
> Otherwise, a platform with software-managed MSI but without an
> irq bypass function cannot get a correct memory write event
> from pcie and will not get irqs.
> The solution is to obtain the MSI phy base address from the
> iommu reserved region, set it as the iommu MSI cookie, and the
> translation tables will then be created while requesting the irq.


OK this one seems to be going nowhere I untagged it.

> Change log
> --
> 
> v1->v2:
> - add resv iotlb to avoid overlap mapping.
> 
> Signed-off-by: Rong Wang 
> Signed-off-by: Nanyong Sun 
> ---
>  drivers/iommu/iommu.c |  1 +
>  drivers/vhost/vdpa.c  | 59 ---
>  2 files changed, 57 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 5f6a85aea501..af9c064ad8b2 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2623,6 +2623,7 @@ void iommu_get_resv_regions(struct device *dev, struct list_head *list)
>   if (ops->get_resv_regions)
>   ops->get_resv_regions(dev, list);
>  }
> +EXPORT_SYMBOL(iommu_get_resv_regions);
>  
>  /**
>   * iommu_put_resv_regions - release resered regions
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index ec32f785dfde..a58979da8acd 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -49,6 +49,7 @@ struct vhost_vdpa {
>   struct completion completion;
>   struct vdpa_device *vdpa;
>   struct hlist_head as[VHOST_VDPA_IOTLB_BUCKETS];
> + struct vhost_iotlb resv_iotlb;
>   struct device dev;
>   struct cdev cdev;
>   atomic_t opened;
> @@ -216,6 +217,8 @@ static int vhost_vdpa_reset(struct vhost_vdpa *v)
>  
>   v->in_batch = 0;
>  
> + vhost_iotlb_reset(&v->resv_iotlb);
> +
>   return vdpa_reset(vdpa);
>  }
>  
> @@ -1013,6 +1016,10 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>   msg->iova + msg->size - 1 > v->range.last)
>   return -EINVAL;
>  
> + if (vhost_iotlb_itree_first(&v->resv_iotlb, msg->iova,
> + msg->iova + msg->size - 1))
> + return -EINVAL;
> +
>   if (vhost_iotlb_itree_first(iotlb, msg->iova,
>   msg->iova + msg->size - 1))
>   return -EEXIST;
> @@ -1103,6 +1110,45 @@ static ssize_t vhost_vdpa_chr_write_iter(struct kiocb *iocb,
>   return vhost_chr_write_iter(dev, from);
>  }
>  
> +static int vhost_vdpa_resv_iommu_region(struct iommu_domain *domain, struct device *dma_dev,
> + struct vhost_iotlb *resv_iotlb)
> +{
> + struct list_head dev_resv_regions;
> + phys_addr_t resv_msi_base = 0;
> + struct iommu_resv_region *region;
> + int ret = 0;
> + bool with_sw_msi = false;
> + bool with_hw_msi = false;
> +
> + INIT_LIST_HEAD(&dev_resv_regions);
> + iommu_get_resv_regions(dma_dev, &dev_resv_regions);
> +
> + list_for_each_entry(region, &dev_resv_regions, list) {
> + ret = vhost_iotlb_add_range_ctx(resv_iotlb, region->start,
> + region->start + region->length - 1,
> + 0, 0, NULL);
> + if (ret) {
> + vhost_iotlb_reset(resv_iotlb);
> + break;
> + }
> +
> + if (region->type == IOMMU_RESV_MSI)
> + with_hw_msi = true;
> +
> + if (region->type == IOMMU_RESV_SW_MSI) {
> + resv_msi_base = region->start;
> + with_sw_msi = true;
> + }
> + }
> +
> + if (!ret && !with_hw_msi && with_sw_msi)
> + ret = iommu_get_msi_cookie(domain, resv_msi_base);
> +
> + iommu_put_resv_regions(dma_dev, &dev_resv_regions);
> +
> + return ret;
> +}
> +
>  static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v)
>  {
>   struct vdpa_device *vdpa = v->vdpa;
> @@ -1128,11 +1174,16 @@ static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v)
>  
>   ret = iommu_attach_device(v->domain, dma_dev);
>   if (ret)
> - goto err_attach;
> + goto err_alloc_domain;
>  
> - return 0;
> + ret = vhost_vdpa_resv_iommu_region(v->domain, dma_dev, &v->resv_iotlb);
> + if (ret)
> + goto err_attach_device;
>  
> -err_attach:
> + return 0;
> +err_attach_device:
> + iommu_detach_device(v->domain, dma_dev);
> +err_alloc_domain:
>   iommu_domain_free(v->domain);
>   return ret;
>  }
> @@ -1385,6 +1436,8 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)
>   goto err;
>   }
>  
> + vhost_iotlb_init(&v->resv_iotlb, 0, 0);
> +
> + r = dev_set_name(&v->dev, "vhost-vdpa-%u", minor);
>   if (r)
>   goto err;
> -- 
> 2.25.1


Re: [PATCH] scsi: virtio_scsi: Remove a useless function call

2023-06-02 Thread Michael S. Tsirkin
On Mon, May 29, 2023 at 09:35:08AM +0200, Christophe JAILLET wrote:
> 'inq_result' is known to be NULL. There is no point calling kfree().
> 
> Signed-off-by: Christophe JAILLET 

Acked-by: Michael S. Tsirkin 

> ---
>  drivers/scsi/virtio_scsi.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index 58498da9869a..bd5633667d01 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -338,10 +338,8 @@ static int virtscsi_rescan_hotunplug(struct virtio_scsi *vscsi)
>   int result, inquiry_len, inq_result_len = 256;
>   char *inq_result = kmalloc(inq_result_len, GFP_KERNEL);
>  
> - if (!inq_result) {
> - kfree(inq_result);
> + if (!inq_result)
>   return -ENOMEM;
> - }
>  
>   shost_for_each_device(sdev, shost) {
>   inquiry_len = sdev->inquiry_len ? sdev->inquiry_len : 36;
> -- 
> 2.34.1



Re: [PATCH v7 00/14] vhost: multiple worker support

2023-06-02 Thread Michael S. Tsirkin
Hi Mike,

On Fri, Apr 28, 2023 at 11:35:20AM -0500, michael.chris...@oracle.com wrote:
> The following patches were built over Linux's tree. They allow us to
> support multiple vhost workers tasks per device. The design is a modified
> version of Stefan's original idea where userspace has the kernel create a
> worker and we pass back the pid. In this version instead of passing the
> pid between user/kernel space we use a worker_id which is just an integer
> managed by the vhost driver and we allow userspace to create and free
> workers and then attach them to virtqueues at setup time.
> 
> All review comments from the past reviews should be handled. If I didn't
> reply to a review comment, I agreed with the comment and should have
> handled it in this posting. Let me know if I missed one.
> 
> Results:
> 
> 
> fio jobs        1       2       4       8       12      16
> ----------------------------------------------------------
> 1 worker        160k    488k    -       -       -       -
> worker per vq   160k    310k    620k    1300k   1836k   2326k
> 
> 
> Notes:
> 0. This used a simple fio command:
> 
> fio --filename=/dev/sdb  --direct=1 --rw=randrw --bs=4k \
> --ioengine=libaio --iodepth=128  --numjobs=$JOBS_ABOVE
> 
> and I used a VM with 16 vCPUs and 16 virtqueues.
> 
> 1. The patches were tested with LIO's emulate_pr=0 which drops the
> LIO PR lock use. This was a bottleneck at around 12 vqs/jobs.
> 
> 2. Because we have a hard limit of 1024 cmds, if the num jobs * iodepth
> was greater than 1024, I would decrease iodepth. So 12 jobs used 85 cmds,
> and 16 used 64.
> 
> 3. The perf issue above at 2 jobs is because when we only have 1 worker
> we execute more cmds per vhost_work due to all vqs funneling to one worker.
> I have 2 patches that fix this. One is just submit from the worker thread
> instead of kicking off to a workqueue like how the vhost block patches do.
> I'll post this after the worker support is merged. I'm also working on
> patches to add back batching during lio completion and do polling on the
> submission side.
> 
> We will still want the threading patches, because if we batch at the fio
> level plus use the vhost theading patches, we can see a big boost like
> below. So hopefully doing it at the kernel will allow apps to just work
> without having to be smart like fio.
> 
> fio using io_uring and batching with the iodepth_batch* settings:
> 
> fio jobs        1       2       4       8       12      16
> ----------------------------------------------------------
> 1 worker        494k    520k    -       -       -       -
> worker per vq   496k    878k    1542k   2436k   2304k   2590k


Could you rebase on latest rc and repost please?
Thanks!

> V7:
> - Add more comments about assumptions.
> - Drop refcounting and just use an attachment_cnt variable, so there
> is no confusion about when a worker is freed.
> - Do an opt-in model, because vsock has an issue where it can queue works
> before it's running and it doesn't even need multiple workers, so there 
> is no point in changing the driver or core code.
> - Add back get worker ioctl.
> - Broke up last patches to make it easier to read.
> 
> V6:
> - Rebase against vhost_task patchset.
> - Used xa instead of idr.
> V5:
> - Rebase against user_worker patchset.
> - Rebase against flush patchset.
> - Redo vhost-scsi tmf flush handling so it doesn't access vq->worker.
> V4:
> - fix vhost-sock VSOCK_VQ_RX use.
> - name functions called directly by ioctl cmd's to match the ioctl cmd.
> - break up VHOST_SET_VRING_WORKER into a new, free and attach cmd.
> - document worker lifetime, and cgroup, namespace, mm, rlimit
> inheritance, make it clear we currently only support sharing within the
> device.
> - add support to attach workers while IO is running.
> - instead of passing a pid_t of the kernel thread, pass a int allocated
> by the vhost layer with an idr.
> 
> V3:
> - fully convert vhost code to use vq based APIs instead of leaving it
> half per dev and half per vq.
> - rebase against kernel worker API.
> - Drop delayed worker creation. We always create the default worker at
> VHOST_SET_OWNER time. Userspace can create and bind workers after that.
> 
> V2:
> - change loop that we take a refcount to the worker in
> - replaced pid == -1 with define.
> - fixed tabbing/spacing coding style issue
> - use hash instead of list to lookup workers.
> - I dropped the patch that added an ioctl cmd to get a vq's worker's
> pid. I saw we might do a generic netlink interface instead.



Re: [PATCH vhost v10 06/10] virtio_ring: packed-detach: support return dma info to driver

2023-06-02 Thread Michael S. Tsirkin
On Fri, Jun 02, 2023 at 05:22:02PM +0800, Xuan Zhuo wrote:
> Under the premapped mode, the driver needs to unmap the DMA address
> after receiving the buffer. The virtio core records the DMA address,
> so the driver needs a way to get the dma info from the virtio core.
> 
> A straightforward approach is to pass an array to the virtio core when
> calling virtqueue_get_buf(). However, it is not feasible when there are
> multiple DMA addresses in the descriptor chain, and the array size is
> unknown.
> 
> To solve this problem, a helper is introduced. After calling
> virtqueue_get_buf(), the driver can call the helper to
> retrieve the dma info. If the helper function returns -EAGAIN, it means
> that there are more DMA addresses to be processed, and the driver should
> call the helper function again.


Please, keep error codes for when an actual error occurs.
A common pattern would be:
<0 - error
0 - success, done
>0 - success, more to do


> To keep track of the current position in
> the chain, a cursor must be passed to the helper function, which is
> initialized by virtqueue_get_buf().
> 
> Some processes are done inside this helper, so this helper MUST be
> called under the premapped mode.
> 
> Signed-off-by: Xuan Zhuo 
> ---
>  drivers/virtio/virtio_ring.c | 105 ---
>  include/linux/virtio.h   |   9 ++-
>  2 files changed, 103 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index cdc4349f6066..cbc22daae7e1 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1695,8 +1695,85 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
>   return needs_kick;
>  }
>  
> +static void detach_cursor_init_packed(struct vring_virtqueue *vq,
> +   struct virtqueue_detach_cursor *cursor, u16 id)
> +{
> + struct vring_desc_state_packed *state = NULL;
> + u32 len;
> +
> + state = &vq->packed.desc_state[id];
> +
> + /* Clear data ptr. */
> + state->data = NULL;
> +
> + vq->packed.desc_extra[state->last].next = vq->free_head;
> + vq->free_head = id;
> + vq->vq.num_free += state->num;
> +
> + /* init cursor */
> + cursor->curr = id;
> + cursor->done = 0;
> + cursor->pos = 0;
> +
> + if (vq->packed.desc_extra[id].flags & VRING_DESC_F_INDIRECT) {
> + len = vq->split.desc_extra[id].len;
> +
> + cursor->num = len / sizeof(struct vring_packed_desc);
> + cursor->indirect = true;
> +
> + vring_unmap_extra_packed(vq, &vq->packed.desc_extra[id]);
> + } else {
> + cursor->num = state->num;
> + cursor->indirect = false;
> + }
> +}
> +
> +static int virtqueue_detach_packed(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
> +dma_addr_t *addr, u32 *len, enum dma_data_direction *dir)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> +
> + if (unlikely(cursor->done))
> + return -EINVAL;
> +
> + if (!cursor->indirect) {
> + struct vring_desc_extra *extra;
> +
> + extra = &vq->packed.desc_extra[cursor->curr];
> + cursor->curr = extra->next;
> +
> + *addr = extra->addr;
> + *len = extra->len;
> + *dir = (extra->flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
> +
> + if (++cursor->pos == cursor->num) {
> + cursor->done = true;
> + return 0;
> + }
> + } else {
> + struct vring_packed_desc *indir_desc, *desc;
> + u16 flags;
> +
> + indir_desc = vq->packed.desc_state[cursor->curr].indir_desc;
> + desc = &indir_desc[cursor->pos];
> +
> + flags = le16_to_cpu(desc->flags);
> + *addr = le64_to_cpu(desc->addr);
> + *len = le32_to_cpu(desc->len);
> + *dir = (flags & VRING_DESC_F_WRITE) ?  DMA_FROM_DEVICE : DMA_TO_DEVICE;
> +
> + if (++cursor->pos == cursor->num) {
> + kfree(indir_desc);
> + cursor->done = true;
> + return 0;
> + }
> + }
> +
> + return -EAGAIN;
> +}
> +
>  static void detach_buf_packed(struct vring_virtqueue *vq,
> -   unsigned int id, void **ctx)
> +   unsigned int id)
>  {
>   struct vring_desc_state_packed *state = NULL;
>   struct vring_packed_desc *desc;
> @@ -1736,8 +1813,6 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
>   }
>   kfree(desc);
>   state->indir_desc = NULL;
> - } else if (ctx) {
> - *ctx = state->indir_desc;
>   }
>  }
>  
> @@ -1768,7 +1843,8 @@ static bool more_used_packed(const struct vring_virtqueue *vq)
>  
>  static void 

Re: [PATCH v2 2/3] vhost: support PACKED when setting-getting vring_base

2023-06-02 Thread Michael S. Tsirkin
On Thu, May 18, 2023 at 09:34:25AM +0200, Stefano Garzarella wrote:
> I think we should do one of these things, though:
> - mask VIRTIO_F_RING_PACKED in the stable kernels when
> VHOST_GET_FEAETURES is called
> - backport this patch on all stable kernels that support vhost-vdpa
> 
> Maybe the last one makes more sense.
> 
> Thanks,
> Stefano

OK which patches do you want to go to stable exactly?


-- 
MST



Re: [RFC PATCH net 2/3] virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2

2023-06-02 Thread Michael S. Tsirkin
On Mon, May 01, 2023 at 11:59:42AM +, Alvaro Karsz wrote:
> > First up to 4k should not be a problem. Even jumbo frames e.g. 9k
> > is highly likely to succeed. And a probe time which is often boot
> > even 64k isn't a problem ...
> > 
> > Hmm. We could allocate large buffers at probe time. Reuse them and
> > copy data over.
> > 
> > IOW reusing  GOOD_COPY_LEN flow for this case.  Not yet sure how I feel
> > about this. OTOH it removes the need for the whole feature blocking
> > approach, does it not?
> > WDYT?
> > 
> 
> It could work..
> 
> In order to remove completely the feature blocking approach, we'll need to 
> let the control commands fail (as you mentioned in the other patch).
> I'm not sure I like it, it means many warnings from virtnet..
> And it means accepting features that we know for sure that are not going to 
> work.
>

Well they will work yes? just with an extra copy.

-- 
MST 



Re: [RFC PATCH net 0/3] virtio-net: allow usage of small vrings

2023-06-02 Thread Michael S. Tsirkin
On Mon, May 01, 2023 at 11:41:44AM +, Alvaro Karsz wrote:
> > > > Why the difference?
> > > >
> > >
> > > Because the RING_SIZE < 4 case requires much more adjustments.
> > >
> > > * We may need to squeeze the virtio header into the headroom.
> > > * We may need to squeeze the GSO header into the headroom, or block the 
> > > features.
> > 
> > We alread do this though no?
> > I think we'll need to tweak hard_header_len to guarantee it's there
> > as opposed to needed_headroom ...
> > 
> > > * At the moment, without NETIF_F_SG, we can receive a skb with 2 
> > > segments, we may need to reduce it to 1.
> > 
> > You are saying clearing NETIF_F_SG does not guarantee a linear skb?
> > 
> 
> I don't know..
> I'm not sure what is the cause, but using this patchset, without any host GSO 
> feature, I can get a chain of 3 descriptors.
> Posting an example of a 4-entry ring during iperf3, acting as a client:
> (TX descriptors)
> 
> len=86   flags 0x1 addr 0xf738d000
> len=1448   flags 0x0 addr 0xf738d800
> len=86   flags 0x8081   addr 0xf738e000
> len=1184,   flags 0x8081  addr 0xf738e800
> len=264 flags 0x8080   addr 0xf738f000
> len=86   flags 0x8081   addr 0xf738f800
> len=1448   flags 0x0 addr 0xf739
> len=86   flags 0x1 addr 0xf7390800
> len=1448   flags 0x0 addr 0xf7391000
> len=86   flags 0x1 addr 0xf716a800
> len=1448   flags 0x8080   addr 0xf716b000
> len=86   flags 0x8081   addr 0xf7391800
> len=1448   flags 0x8080   addr 0xf7392000
> 
> We got a chain of 3 in here.
> This happens often.
> 
> Now, when negotiating host GSO features, I can get up to 4:
> 
> len=86   flags 0x1 addr 0xf71fc800
> len=21328 flags 0x1 addr 0xf6e00800
> len=32768 flags 0x8081   addr 0xf6e06000
> len=11064 flags 0x8080   addr 0xf6e0e000
> len=86   flags 0x8081   addr 0xf738b000
> len=1 flags 0x8080   addr 0xf738b800
> len=86   flags 0x1 addr 0xf738c000
> len=21704 flags 0x1 addr 0xf738c800
> len=32768 flags 0x1 addr 0xf7392000
> len=10688 flags 0x0 addr 0xf739a000
> len=86   flags 0x8081   addr 0xf739d000
> len=22080 flags 0x8081   addr 0xf739d800
> len=32768 flags 0x8081   addr 0xf73a3000
> len=10312 flags 0x8080   addr 0xf73ab000
> 
> TBH, I thought that this behaviour was expected until you mentioned it,
> This is why virtnet_calc_max_descs returns 3 if no host_gso feature is 
> negotiated, and 4 otherwise.
> I was thinking that we may need to use another skb to hold the TSO template 
> (for headers generation)...
> 
> Any ideas?

Something's wrong with the code apparently. Want to try sticking
printk's in the driver to see what is going on?

Documentation/networking/netdev-features.rst says:
Those features say that ndo_start_xmit can handle fragmented skbs:
NETIF_F_SG --- paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST ---
chained skbs (skb->next/prev list).


of course we can always linearize if we want to ...


> > > * We may need to change all the control commands, so class,  command and 
> > > command specific data will fit in a single segment.
> > > * We may need to disable the control command and all the features 
> > > depending on it.
> > 
> > well if we don't commands just fail as we can't add them right?
> > no corruption or stalls ...
> > 
> > > * We may need to disable NAPI?
> > 
> > hmm why napi?
> > 
> 
> I'm not sure if it's required to disable it, but I'm not sure what's the 
> point having napi if the ring size is 1..
> Will it work?

Not much point in it but it's simpler to just keep things consistent.

> > > There may be more changes..
> > >
> > > I was thinking that it may be easier to start with the easier case 
> > > RING_SIZE >= 4, make sure everything is working fine, then send a follow 
> > > up patchset with the required adjustments for RING_SIZE < 4.
> > 
> > 
> > it's ok but I'm just trying to figure out where does 4 come from.
> > 
> 
> I guess this part was not clear, sorry..
> In case of split vqs, we have ring size 2 before 4.
> And as you saw, we still get chains of 3 when NETIF_F_SG is off (Which I 
> thought was expected).
> 

Worth figuring out where do these come from.

-- 
MST



Re: [PATCH v2 5/6] tools/virtio: use canonical ftrace path

2023-06-02 Thread Michael S. Tsirkin
On Wed, Feb 15, 2023 at 03:33:49PM -0700, Ross Zwisler wrote:
> The canonical location for the tracefs filesystem is at /sys/kernel/tracing.
> 
> But, from Documentation/trace/ftrace.rst:
> 
>   Before 4.1, all ftrace tracing control files were within the debugfs
>   file system, which is typically located at /sys/kernel/debug/tracing.
>   For backward compatibility, when mounting the debugfs file system,
>   the tracefs file system will be automatically mounted at:
> 
>   /sys/kernel/debug/tracing
> 
> A few spots in tools/virtio still refer to this older debugfs
> path, so let's update them to avoid confusion.
> 
> Signed-off-by: Ross Zwisler 

queued this too. thanks!

> ---
>  tools/virtio/virtio-trace/README|  2 +-
>  tools/virtio/virtio-trace/trace-agent.c | 12 
>  2 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/virtio/virtio-trace/README b/tools/virtio/virtio-trace/README
> index b64845b823ab..cea29a2a4c0a 100644
> --- a/tools/virtio/virtio-trace/README
> +++ b/tools/virtio/virtio-trace/README
> @@ -95,7 +95,7 @@ Run
>  
>  1) Enable ftrace in the guest
>   
> - # echo 1 > /sys/kernel/debug/tracing/events/sched/enable
> + # echo 1 > /sys/kernel/tracing/events/sched/enable
>  
>  2) Run trace agent in the guest
>   This agent must be operated as root.
> diff --git a/tools/virtio/virtio-trace/trace-agent.c b/tools/virtio/virtio-trace/trace-agent.c
> index cdfe77c2b4c8..7e2d9bbf0b84 100644
> --- a/tools/virtio/virtio-trace/trace-agent.c
> +++ b/tools/virtio/virtio-trace/trace-agent.c
> @@ -18,8 +18,9 @@
>  #define PIPE_DEF_BUFS        16
>  #define PIPE_MIN_SIZE        (PAGE_SIZE*PIPE_DEF_BUFS)
>  #define PIPE_MAX_SIZE        (1024*1024)
> -#define READ_PATH_FMT\
> - "/sys/kernel/debug/tracing/per_cpu/cpu%d/trace_pipe_raw"
> +#define TRACEFS  "/sys/kernel/tracing"
> +#define DEBUGFS  "/sys/kernel/debug/tracing"
> +#define READ_PATH_FMT        "%s/per_cpu/cpu%d/trace_pipe_raw"
>  #define WRITE_PATH_FMT   "/dev/virtio-ports/trace-path-cpu%d"
>  #define CTL_PATH "/dev/virtio-ports/agent-ctl-path"
>  
> @@ -120,9 +121,12 @@ static const char *make_path(int cpu_num, bool this_is_write_path)
>   if (this_is_write_path)
>   /* write(output) path */
>   ret = snprintf(buf, PATH_MAX, WRITE_PATH_FMT, cpu_num);
> - else
> + else {
>   /* read(input) path */
> - ret = snprintf(buf, PATH_MAX, READ_PATH_FMT, cpu_num);
> + ret = snprintf(buf, PATH_MAX, READ_PATH_FMT, TRACEFS, cpu_num);
> + if (ret > 0 && access(buf, F_OK) != 0)
> + ret = snprintf(buf, PATH_MAX, READ_PATH_FMT, DEBUGFS, cpu_num);
> + }
>  
>   if (ret <= 0) {
>   pr_err("Failed to generate %s path(CPU#%d):%d\n",
> -- 
> 2.39.1.637.g21b0678d19-goog



Re: [PATCH v4 1/1] vringh: IOMEM support

2023-06-02 Thread Michael S. Tsirkin
On Fri, Jun 02, 2023 at 05:56:12PM +0800, kernel test robot wrote:
> Hi Shunsuke,
> 
> kernel test robot noticed the following build warnings:
> 
> [auto build test WARNING on mst-vhost/linux-next]
> [also build test WARNING on linus/master horms-ipvs/master v6.4-rc4 
> next-20230602]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:
> https://github.com/intel-lab-lkp/linux/commits/Shunsuke-Mie/vringh-IOMEM-support/20230602-135351
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git 
> linux-next
> patch link:
> https://lore.kernel.org/r/20230602055211.309960-2-mie%40igel.co.jp
> patch subject: [PATCH v4 1/1] vringh: IOMEM support
> config: alpha-allyesconfig 
> (https://download.01.org/0day-ci/archive/20230602/202306021725.3otsfxpf-...@intel.com/config)
> compiler: alpha-linux-gcc (GCC) 12.3.0
> reproduce (this is a W=1 build):
> mkdir -p ~/bin
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # 
> https://github.com/intel-lab-lkp/linux/commit/de2a1f5220c32e953400f225aba6bd294a8d41b8
> git remote add linux-review https://github.com/intel-lab-lkp/linux
> git fetch --no-tags linux-review 
> Shunsuke-Mie/vringh-IOMEM-support/20230602-135351
> git checkout de2a1f5220c32e953400f225aba6bd294a8d41b8
> # save the config file
> mkdir build_dir && cp config build_dir/.config
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.3.0 ~/bin/make.cross 
> W=1 O=build_dir ARCH=alpha olddefconfig
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.3.0 ~/bin/make.cross 
> W=1 O=build_dir ARCH=alpha SHELL=/bin/bash drivers/
> 
> If you fix the issue, kindly add following tag where applicable
> | Reported-by: kernel test robot 
> | Closes: 
> https://lore.kernel.org/oe-kbuild-all/202306021725.3otsfxpf-...@intel.com/
> 
> All warnings (new ones prefixed by >>):
> 
> >> drivers/vhost/vringh.c:1661:5: warning: no previous prototype for 
> >> 'vringh_init_iomem' [-Wmissing-prototypes]
> 1661 | int vringh_init_iomem(struct vringh *vrh, u64 features, unsigned 
> int num,
>  | ^
> >> drivers/vhost/vringh.c:1683:5: warning: no previous prototype for 
> >> 'vringh_getdesc_iomem' [-Wmissing-prototypes]
> 1683 | int vringh_getdesc_iomem(struct vringh *vrh, struct vringh_kiov 
> *riov,
>  | ^~~~
> >> drivers/vhost/vringh.c:1714:9: warning: no previous prototype for 
> >> 'vringh_iov_pull_iomem' [-Wmissing-prototypes]
> 1714 | ssize_t vringh_iov_pull_iomem(struct vringh *vrh, struct 
> vringh_kiov *riov,
>  | ^
> >> drivers/vhost/vringh.c:1729:9: warning: no previous prototype for 
> >> 'vringh_iov_push_iomem' [-Wmissing-prototypes]
> 1729 | ssize_t vringh_iov_push_iomem(struct vringh *vrh, struct 
> vringh_kiov *wiov,
>  | ^
> >> drivers/vhost/vringh.c:1744:6: warning: no previous prototype for 
> >> 'vringh_abandon_iomem' [-Wmissing-prototypes]
> 1744 | void vringh_abandon_iomem(struct vringh *vrh, unsigned int num)
>  |  ^~~~
> >> drivers/vhost/vringh.c:1759:5: warning: no previous prototype for 
> >> 'vringh_complete_iomem' [-Wmissing-prototypes]
> 1759 | int vringh_complete_iomem(struct vringh *vrh, u16 head, u32 len)
>  | ^
> >> drivers/vhost/vringh.c:1777:6: warning: no previous prototype for 
> >> 'vringh_notify_enable_iomem' [-Wmissing-prototypes]
> 1777 | bool vringh_notify_enable_iomem(struct vringh *vrh)
>  |  ^~
> >> drivers/vhost/vringh.c:1790:6: warning: no previous prototype for 
> >> 'vringh_notify_disable_iomem' [-Wmissing-prototypes]
> 1790 | void vringh_notify_disable_iomem(struct vringh *vrh)
>  |  ^~~
> >> drivers/vhost/vringh.c:1802:5: warning: no previous prototype for 
> >> 'vringh_need_notify_iomem' [-Wmissing-prototypes]
> 1802 | int vringh_need_notify_iomem(struct vringh *vrh)
>  | ^~~~
> 
> 
> vim +/vringh_init_iomem +1661 drivers/vhost/vringh.c


You should probably put the relevant code within an #ifdef.

>   1647
>   1648/**
>   1649 * vringh_init_iomem - initialize a vringh for a vring on 
> io-memory

Re: [PATCH v4 1/1] vringh: IOMEM support

2023-06-02 Thread kernel test robot
Hi Shunsuke,

kernel test robot noticed the following build warnings:

[auto build test WARNING on mst-vhost/linux-next]
[also build test WARNING on linus/master horms-ipvs/master v6.4-rc4 
next-20230602]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Shunsuke-Mie/vringh-IOMEM-support/20230602-135351
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
patch link:
https://lore.kernel.org/r/20230602055211.309960-2-mie%40igel.co.jp
patch subject: [PATCH v4 1/1] vringh: IOMEM support
config: alpha-allyesconfig 
(https://download.01.org/0day-ci/archive/20230602/202306021725.3otsfxpf-...@intel.com/config)
compiler: alpha-linux-gcc (GCC) 12.3.0
reproduce (this is a W=1 build):
mkdir -p ~/bin
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/intel-lab-lkp/linux/commit/de2a1f5220c32e953400f225aba6bd294a8d41b8
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review 
Shunsuke-Mie/vringh-IOMEM-support/20230602-135351
git checkout de2a1f5220c32e953400f225aba6bd294a8d41b8
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.3.0 ~/bin/make.cross 
W=1 O=build_dir ARCH=alpha olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.3.0 ~/bin/make.cross 
W=1 O=build_dir ARCH=alpha SHELL=/bin/bash drivers/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202306021725.3otsfxpf-...@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/vhost/vringh.c:1661:5: warning: no previous prototype for 
>> 'vringh_init_iomem' [-Wmissing-prototypes]
1661 | int vringh_init_iomem(struct vringh *vrh, u64 features, unsigned int 
num,
 | ^
>> drivers/vhost/vringh.c:1683:5: warning: no previous prototype for 
>> 'vringh_getdesc_iomem' [-Wmissing-prototypes]
1683 | int vringh_getdesc_iomem(struct vringh *vrh, struct vringh_kiov 
*riov,
 | ^~~~
>> drivers/vhost/vringh.c:1714:9: warning: no previous prototype for 
>> 'vringh_iov_pull_iomem' [-Wmissing-prototypes]
1714 | ssize_t vringh_iov_pull_iomem(struct vringh *vrh, struct vringh_kiov 
*riov,
 | ^
>> drivers/vhost/vringh.c:1729:9: warning: no previous prototype for 
>> 'vringh_iov_push_iomem' [-Wmissing-prototypes]
1729 | ssize_t vringh_iov_push_iomem(struct vringh *vrh, struct vringh_kiov 
*wiov,
 | ^
>> drivers/vhost/vringh.c:1744:6: warning: no previous prototype for 
>> 'vringh_abandon_iomem' [-Wmissing-prototypes]
1744 | void vringh_abandon_iomem(struct vringh *vrh, unsigned int num)
 |  ^~~~
>> drivers/vhost/vringh.c:1759:5: warning: no previous prototype for 
>> 'vringh_complete_iomem' [-Wmissing-prototypes]
1759 | int vringh_complete_iomem(struct vringh *vrh, u16 head, u32 len)
 | ^
>> drivers/vhost/vringh.c:1777:6: warning: no previous prototype for 
>> 'vringh_notify_enable_iomem' [-Wmissing-prototypes]
1777 | bool vringh_notify_enable_iomem(struct vringh *vrh)
 |  ^~
>> drivers/vhost/vringh.c:1790:6: warning: no previous prototype for 
>> 'vringh_notify_disable_iomem' [-Wmissing-prototypes]
1790 | void vringh_notify_disable_iomem(struct vringh *vrh)
 |  ^~~
>> drivers/vhost/vringh.c:1802:5: warning: no previous prototype for 
>> 'vringh_need_notify_iomem' [-Wmissing-prototypes]
1802 | int vringh_need_notify_iomem(struct vringh *vrh)
 | ^~~~


vim +/vringh_init_iomem +1661 drivers/vhost/vringh.c

  1647  
  1648  /**
  1649   * vringh_init_iomem - initialize a vringh for a vring on io-memory.
  1650   * @vrh: the vringh to initialize.
  1651   * @features: the feature bits for this ring.
  1652   * @num: the number of elements.
  1653   * @weak_barriers: true if we only need memory barriers, not I/O.
  1654   * @desc: the userspace descriptor pointer.
  1655   * @avail: the userspace avail pointer.
  1656   * @used: the userspace used pointer.
  1657   *
  1658   * Returns an error if num is invalid: you should check pointers
  1659   * yourself!
  1660   */
> 1661  int vringh_init_iomem(struct vringh *vrh, u64 features, unsigned int 
> num,
  1662bool weak_barrie

[PATCH vhost v10 08/10] virtio_ring: introduce virtqueue_dma_dev()

2023-06-02 Thread Xuan Zhuo
Add virtqueue_dma_dev() to get the DMA device for virtio, so that the
caller can do DMA operations in advance. The purpose is to keep memory
mapped across multiple add/get buf operations.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 17 +
 include/linux/virtio.h   |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 6771b9661798..56444f872967 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2459,6 +2459,23 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
 
+/**
+ * virtqueue_dma_dev - get the dma dev
+ * @_vq: the struct virtqueue we're talking about.
+ *
+ * Returns the DMA device, which can be used with the DMA API.
+ */
+struct device *virtqueue_dma_dev(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   if (vq->use_dma_api)
+   return vring_dma_dev(vq);
+   else
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_dev);
+
 /**
  * virtqueue_kick_prepare - first half of split virtqueue_kick call.
  * @_vq: the struct virtqueue
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 0a11c5b32fe5..b24f0a665390 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -78,6 +78,8 @@ int virtqueue_add_sgs(struct virtqueue *vq,
  void *data,
  gfp_t gfp);
 
+struct device *virtqueue_dma_dev(struct virtqueue *vq);
+
 bool virtqueue_kick(struct virtqueue *vq);
 
 bool virtqueue_kick_prepare(struct virtqueue *vq);
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v10 10/10] virtio_net: support dma premapped

2023-06-02 Thread Xuan Zhuo
Introduce the module parameter "experiment_premapped" to let virtio-net
do the DMA mapping itself.

When it is true, the virtqueues of virtio-net are in premapped mode:
they only handle sgs that carry a dma_address, and the driver must get
the DMA address of a buffer in order to unmap it after getting the
buffer back from the virtio core.

That will be useful when AF_XDP is enabled: AF_XDP TX and kernel packet
xmit share the TX queue, so skb xmit must also support premapped mode.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 163 +--
 1 file changed, 141 insertions(+), 22 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2396c28c0122..5898212fcb3c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -26,10 +26,11 @@
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
 
-static bool csum = true, gso = true, napi_tx = true;
+static bool csum = true, gso = true, napi_tx = true, experiment_premapped;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
 module_param(napi_tx, bool, 0644);
+module_param(experiment_premapped, bool, 0644);
 
 /* FIXME: MTU in config. */
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
@@ -142,6 +143,9 @@ struct send_queue {
 
/* Record whether sq is in reset state. */
bool reset;
+
+   /* The vq is premapped mode. */
+   bool premapped;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -174,6 +178,9 @@ struct receive_queue {
char name[16];
 
struct xdp_rxq_info xdp_rxq;
+
+   /* The vq is premapped mode. */
+   bool premapped;
 };
 
 /* This structure can contain rss message with maximum settings for 
indirection table and keysize
@@ -546,6 +553,105 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
*vi,
return skb;
 }
 
+static int virtnet_generic_unmap(struct virtqueue *vq, struct 
virtqueue_detach_cursor *cursor)
+{
+   enum dma_data_direction dir;
+   dma_addr_t addr;
+   u32 len;
+   int err;
+
+   do {
+   err = virtqueue_detach(vq, cursor, &addr, &len, &dir);
+   if (!err || err == -EAGAIN)
+   dma_unmap_page_attrs(virtqueue_dma_dev(vq), addr, len, dir, 0);
+
+   } while (err == -EAGAIN);
+
+   return err;
+}
+
+static void *virtnet_detach_unused_buf(struct virtqueue *vq, bool premapped)
+{
+   struct virtqueue_detach_cursor cursor;
+   void *buf;
+
+   if (!premapped)
+   return virtqueue_detach_unused_buf(vq);
+
+   buf = virtqueue_detach_unused_buf_premapped(vq, &cursor);
+   if (buf)
+   virtnet_generic_unmap(vq, &cursor);
+
+   return buf;
+}
+
+static void *virtnet_get_buf_ctx(struct virtqueue *vq, bool premapped, u32 
*len, void **ctx)
+{
+   struct virtqueue_detach_cursor cursor;
+   void *buf;
+
+   if (!premapped)
+   return virtqueue_get_buf_ctx(vq, len, ctx);
+
+   buf = virtqueue_get_buf_premapped(vq, len, ctx, &cursor);
+   if (buf)
+   virtnet_generic_unmap(vq, &cursor);
+
+   return buf;
+}
+
+#define virtnet_rq_get_buf(rq, plen, pctx) \
+({ \
+   typeof(rq) _rq = (rq); \
+   virtnet_get_buf_ctx(_rq->vq, _rq->premapped, plen, pctx); \
+})
+
+#define virtnet_sq_get_buf(sq, plen, pctx) \
+({ \
+   typeof(sq) _sq = (sq); \
+   virtnet_get_buf_ctx(_sq->vq, _sq->premapped, plen, pctx); \
+})
+
+static int virtnet_add_sg(struct virtqueue *vq, bool premapped,
+ struct scatterlist *sg, unsigned int num, bool out,
+ void *data, void *ctx, gfp_t gfp)
+{
+   enum dma_data_direction dir;
+   struct device *dev;
+   int err, ret;
+
+   if (!premapped)
+   return virtqueue_add_sg(vq, sg, num, out, data, ctx, gfp);
+
+   dir = out ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+   dev = virtqueue_dma_dev(vq);
+
+   ret = dma_map_sg_attrs(dev, sg, num, dir, 0);
+   if (ret != num)
+   goto err;
+
+   err = virtqueue_add_sg(vq, sg, num, out, data, ctx, gfp);
+   if (err < 0)
+   goto err;
+
+   return 0;
+
+err:
+   dma_unmap_sg_attrs(dev, sg, num, dir, 0);
+   return -ENOMEM;
+}
+
+static int virtnet_add_outbuf(struct send_queue *sq, unsigned int num, void *data)
+{
+   return virtnet_add_sg(sq->vq, sq->premapped, sq->sg, num, true, data, NULL, GFP_ATOMIC);
+}
+
+static int virtnet_add_inbuf(struct receive_queue *rq, unsigned int num, void *data,
+void *ctx, gfp_t gfp)
+{
+   return virtnet_add_sg(rq->vq, rq->premapped, rq->sg, num, false, data, ctx, gfp);
+}
+
 static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
 {
unsigned int len;
@@ -553,7 +659,7 @@ static void free_old_xmit_skbs(struct send_queue *sq, bool 
in_napi)
unsigned int bytes = 0;
void *ptr;
 
-   while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {

[PATCH vhost v10 06/10] virtio_ring: packed-detach: support return dma info to driver

2023-06-02 Thread Xuan Zhuo
Under the premapped mode, the driver needs to unmap the DMA address
after receiving the buffer. The virtio core records the DMA address,
so the driver needs a way to get the dma info from the virtio core.

A straightforward approach is to pass an array to the virtio core when
calling virtqueue_get_buf(). However, it is not feasible when there are
multiple DMA addresses in the descriptor chain, and the array size is
unknown.

To solve this problem, a helper is introduced. After calling
virtqueue_get_buf(), the driver can call the helper to retrieve one
piece of DMA info at a time. If the helper returns -EAGAIN, more DMA
addresses remain to be processed and the driver should call the helper
again. To keep track of the current position in the chain, a cursor,
which is initialized by virtqueue_get_buf(), must be passed to the
helper.

Some bookkeeping is done inside this helper, so it MUST only be called
in premapped mode.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 105 ---
 include/linux/virtio.h   |   9 ++-
 2 files changed, 103 insertions(+), 11 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cdc4349f6066..cbc22daae7e1 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1695,8 +1695,85 @@ static bool virtqueue_kick_prepare_packed(struct 
virtqueue *_vq)
return needs_kick;
 }
 
+static void detach_cursor_init_packed(struct vring_virtqueue *vq,
+ struct virtqueue_detach_cursor *cursor, 
u16 id)
+{
+   struct vring_desc_state_packed *state = NULL;
+   u32 len;
+
+   state = &vq->packed.desc_state[id];
+
+   /* Clear data ptr. */
+   state->data = NULL;
+
+   vq->packed.desc_extra[state->last].next = vq->free_head;
+   vq->free_head = id;
+   vq->vq.num_free += state->num;
+
+   /* init cursor */
+   cursor->curr = id;
+   cursor->done = 0;
+   cursor->pos = 0;
+
+   if (vq->packed.desc_extra[id].flags & VRING_DESC_F_INDIRECT) {
+   len = vq->packed.desc_extra[id].len;
+
+   cursor->num = len / sizeof(struct vring_packed_desc);
+   cursor->indirect = true;
+
+   vring_unmap_extra_packed(vq, &vq->packed.desc_extra[id]);
+   } else {
+   cursor->num = state->num;
+   cursor->indirect = false;
+   }
+}
+
+static int virtqueue_detach_packed(struct virtqueue *_vq, struct 
virtqueue_detach_cursor *cursor,
+  dma_addr_t *addr, u32 *len, enum 
dma_data_direction *dir)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   if (unlikely(cursor->done))
+   return -EINVAL;
+
+   if (!cursor->indirect) {
+   struct vring_desc_extra *extra;
+
+   extra = &vq->packed.desc_extra[cursor->curr];
+   cursor->curr = extra->next;
+
+   *addr = extra->addr;
+   *len = extra->len;
+   *dir = (extra->flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : 
DMA_TO_DEVICE;
+
+   if (++cursor->pos == cursor->num) {
+   cursor->done = true;
+   return 0;
+   }
+   } else {
+   struct vring_packed_desc *indir_desc, *desc;
+   u16 flags;
+
+   indir_desc = vq->packed.desc_state[cursor->curr].indir_desc;
+   desc = &indir_desc[cursor->pos];
+
+   flags = le16_to_cpu(desc->flags);
+   *addr = le64_to_cpu(desc->addr);
+   *len = le32_to_cpu(desc->len);
+   *dir = (flags & VRING_DESC_F_WRITE) ?  DMA_FROM_DEVICE : 
DMA_TO_DEVICE;
+
+   if (++cursor->pos == cursor->num) {
+   kfree(indir_desc);
+   cursor->done = true;
+   return 0;
+   }
+   }
+
+   return -EAGAIN;
+}
+
 static void detach_buf_packed(struct vring_virtqueue *vq,
- unsigned int id, void **ctx)
+ unsigned int id)
 {
struct vring_desc_state_packed *state = NULL;
struct vring_packed_desc *desc;
@@ -1736,8 +1813,6 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
}
kfree(desc);
state->indir_desc = NULL;
-   } else if (ctx) {
-   *ctx = state->indir_desc;
}
 }
 
@@ -1768,7 +1843,8 @@ static bool more_used_packed(const struct vring_virtqueue 
*vq)
 
 static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
  unsigned int *len,
- void **ctx)
+ void **ctx,
+ struct virtqueue_detach_cursor 
*cursor)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
u16 last_used, id, 

[PATCH vhost v10 09/10] virtio_ring: introduce virtqueue_add_sg()

2023-06-02 Thread Xuan Zhuo
Introduce virtqueue_add_sg(), so that virtio-net can use a unified API
for the RQ and the SQ.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 23 +++
 include/linux/virtio.h   |  4 
 2 files changed, 27 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 56444f872967..a00f049ea442 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2356,6 +2356,29 @@ static inline int virtqueue_add(struct virtqueue *_vq,
out_sgs, in_sgs, data, ctx, gfp);
 }
 
+/**
+ * virtqueue_add_sg - expose buffers to other end
+ * @vq: the struct virtqueue we're talking about.
+ * @sg: a scatterlist
+ * @num: the number of entries in @sg
+ * @out: whether the sg is readable by other side
+ * @data: the token identifying the buffer.
+ * @ctx: extra context for the token
+ * @gfp: how to do memory allocations (if necessary).
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
+ */
+int virtqueue_add_sg(struct virtqueue *vq, struct scatterlist *sg,
+unsigned int num, bool out, void *data,
+void *ctx, gfp_t gfp)
+{
+   return virtqueue_add(vq, &sg, num, (int)out, (int)!out, data, ctx, gfp);
+}
+EXPORT_SYMBOL_GPL(virtqueue_add_sg);
+
 /**
  * virtqueue_add_sgs - expose buffers to other end
  * @_vq: the struct virtqueue we're talking about.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b24f0a665390..1a4aa4382c53 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -55,6 +55,10 @@ struct virtqueue_detach_cursor {
unsigned pos:16;
 };
 
+int virtqueue_add_sg(struct virtqueue *vq, struct scatterlist *sg,
+unsigned int num, bool out, void *data,
+void *ctx, gfp_t gfp);
+
 int virtqueue_add_outbuf(struct virtqueue *vq,
 struct scatterlist sg[], unsigned int num,
 void *data,
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v10 05/10] virtio_ring: split-detach: support return dma info to driver

2023-06-02 Thread Xuan Zhuo
Under the premapped mode, the driver needs to unmap the DMA address
after receiving the buffer. The virtio core records the DMA address,
so the driver needs a way to get the dma info from the virtio core.

A straightforward approach is to pass an array to the virtio core when
calling virtqueue_get_buf(). However, it is not feasible when there are
multiple DMA addresses in the descriptor chain, and the array size is
unknown.

To solve this problem, a helper is introduced. After calling
virtqueue_get_buf(), the driver can call the helper to retrieve one
piece of DMA info at a time. If the helper returns -EAGAIN, more DMA
addresses remain to be processed and the driver should call the helper
again. To keep track of the current position in the chain, a cursor,
which is initialized by virtqueue_get_buf(), must be passed to the
helper.

Some bookkeeping is done inside this helper, so it MUST only be called
in premapped mode.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 118 ---
 include/linux/virtio.h   |  11 
 2 files changed, 119 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index dc109fbc05a5..cdc4349f6066 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -754,8 +754,95 @@ static bool virtqueue_kick_prepare_split(struct virtqueue 
*_vq)
return needs_kick;
 }
 
-static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
-void **ctx)
+static void detach_cursor_init_split(struct vring_virtqueue *vq,
+struct virtqueue_detach_cursor *cursor, 
u16 head)
+{
+   struct vring_desc_extra *extra;
+
+   extra = &vq->split.desc_extra[head];
+
+   /* Clear data ptr. */
+   vq->split.desc_state[head].data = NULL;
+
+   cursor->head = head;
+   cursor->done = 0;
+
+   if (extra->flags & VRING_DESC_F_INDIRECT) {
+   cursor->num = extra->len / sizeof(struct vring_desc);
+   cursor->indirect = true;
+   cursor->pos = 0;
+
+   vring_unmap_one_split(vq, head);
+
+   extra->next = vq->free_head;
+
+   vq->free_head = head;
+
+   /* Plus final descriptor */
+   vq->vq.num_free++;
+
+   } else {
+   cursor->indirect = false;
+   cursor->pos = head;
+   }
+}
+
+static int virtqueue_detach_split(struct virtqueue *_vq, struct 
virtqueue_detach_cursor *cursor,
+ dma_addr_t *addr, u32 *len, enum 
dma_data_direction *dir)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
+   int rc = -EAGAIN;
+
+   if (unlikely(cursor->done))
+   return -EINVAL;
+
+   if (!cursor->indirect) {
+   struct vring_desc_extra *extra;
+   unsigned int i;
+
+   i = cursor->pos;
+
+   extra = &vq->split.desc_extra[i];
+
+   if (vq->split.vring.desc[i].flags & nextflag) {
+   cursor->pos = extra->next;
+   } else {
+   extra->next = vq->free_head;
+   vq->free_head = cursor->head;
+   cursor->done = true;
+   rc = 0;
+   }
+
+   *addr = extra->addr;
+   *len = extra->len;
+   *dir = (extra->flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : 
DMA_TO_DEVICE;
+
+   vq->vq.num_free++;
+
+   } else {
+   struct vring_desc *indir_desc, *desc;
+   u16 flags;
+
+   indir_desc = vq->split.desc_state[cursor->head].indir_desc;
+   desc = &indir_desc[cursor->pos];
+
+   flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+   *addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
+   *len = virtio32_to_cpu(vq->vq.vdev, desc->len);
+   *dir = (flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : 
DMA_TO_DEVICE;
+
+   if (++cursor->pos == cursor->num) {
+   kfree(indir_desc);
+   cursor->done = true;
+   return 0;
+   }
+   }
+
+   return rc;
+}
+
+static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head)
 {
unsigned int i, j;
__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
@@ -799,8 +886,6 @@ static void detach_buf_split(struct vring_virtqueue *vq, 
unsigned int head,
 
kfree(indir_desc);
vq->split.desc_state[head].indir_desc = NULL;
-   } else if (ctx) {
-   *ctx = vq->split.desc_state[head].indir_desc;
}
 }
 
@@ -812,7 +897,8 @@ static bool more_used_split(const struct vring_virtqueue 
*vq)
 
 static void 

[PATCH vhost v10 07/10] virtio_ring: introduce helpers for premapped

2023-06-02 Thread Xuan Zhuo
This patch introduces three helpers for premapped mode:

* virtqueue_get_buf_premapped
* virtqueue_detach_unused_buf_premapped
* virtqueue_detach

The first two work like the non-premapped functions, but a cursor is
passed in addition.

virtqueue_detach is used to get the DMA info of the last buffer via
that cursor.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 83 
 include/linux/virtio.h   | 10 +
 2 files changed, 93 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cbc22daae7e1..6771b9661798 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2555,6 +2555,66 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned 
int *len)
return virtqueue_get_buf_ctx(_vq, len, NULL);
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_buf);
+
+/**
+ * virtqueue_get_buf_premapped - get the next used buffer
+ * @_vq: the struct virtqueue we're talking about.
+ * @len: the length written into the buffer
+ * @ctx: extra context for the token
+ * @cursor: detach cursor
+ *
+ * If the device wrote data into the buffer, @len will be set to the
+ * amount written.  This means you don't need to clear the buffer
+ * beforehand to ensure there's no data leakage in the case of short
+ * writes.
+ *
+ * Caller must ensure we don't call this with other virtqueue
+ * operations at the same time (except where noted).
+ *
+ * This is used for the premapped vq. The cursor is passed by the driver and
+ * is used for virtqueue_detach. It is initialized by the virtio core
+ * internally.
+ *
+ * Returns NULL if there are no used buffers, or the "data" token
+ * handed to virtqueue_add_*().
+ */
+void *virtqueue_get_buf_premapped(struct virtqueue *_vq, unsigned int *len,
+ void **ctx,
+ struct virtqueue_detach_cursor *cursor)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx, 
cursor) :
+virtqueue_get_buf_ctx_split(_vq, len, ctx, 
cursor);
+}
+EXPORT_SYMBOL_GPL(virtqueue_get_buf_premapped);
+
+/**
+ * virtqueue_detach - get the DMA info of the last buffer
+ * @_vq: the struct virtqueue we're talking about.
+ * @cursor: detach cursor
+ * @addr: the dma address
+ * @len: the length of the dma address
+ * @dir: the direction of the dma address
+ *
+ * This is used for the premapped vq. The cursor is initialized by
+ * virtqueue_get_buf_premapped or virtqueue_detach_unused_buf_premapped.
+ *
+ * Returns:
+ * -EAGAIN: more DMA info remains, this function should be called again.
+ * -EINVAL: the process is done, this function should not be called.
+ * 0: no more DMA info.
+ */
+int virtqueue_detach(struct virtqueue *_vq, struct virtqueue_detach_cursor 
*cursor,
+dma_addr_t *addr, u32 *len, enum dma_data_direction *dir)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   return vq->packed_ring ? virtqueue_detach_packed(_vq, cursor, addr, 
len, dir) :
+virtqueue_detach_split(_vq, cursor, addr, len, 
dir);
+}
+EXPORT_SYMBOL_GPL(virtqueue_detach);
+
 /**
  * virtqueue_disable_cb - disable callbacks
  * @_vq: the struct virtqueue we're talking about.
@@ -2682,6 +2742,29 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
 
+/**
+ * virtqueue_detach_unused_buf_premapped - detach first unused buffer
+ * @_vq: the struct virtqueue we're talking about.
+ * @cursor: detach cursor
+ *
+ * This is used for the premapped vq. The cursor is passed by the driver and
+ * is used for virtqueue_detach. It is initialized by the virtio core
+ * internally.
+ *
+ * Returns NULL or the "data" token handed to virtqueue_add_*().
+ * This is not valid on an active queue; it is useful for device
+ * shutdown or the reset queue.
+ */
+void *virtqueue_detach_unused_buf_premapped(struct virtqueue *_vq,
+   struct virtqueue_detach_cursor 
*cursor)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq, 
cursor) :
+virtqueue_detach_unused_buf_split(_vq, cursor);
+}
+EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf_premapped);
+
 static inline bool more_used(const struct vring_virtqueue *vq)
 {
return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 7f137c7a9034..0a11c5b32fe5 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -3,6 +3,7 @@
 #define _LINUX_VIRTIO_H
 /* Everything a virtio driver needs to work with any particular virtio
  * implementation. */
+#include 
 #include 
 #include 
 #include 
@@ -88,6 +89,10 @@ void *virtqueue_get_buf(struct virtqueue *vq, unsigned int 
*len);
 void *virtqueue_get_buf_ctx(struct virtqueue *vq, unsigned int *len,

[PATCH vhost v10 04/10] virtio_ring: packed: support add premapped buf

2023-06-02 Thread Xuan Zhuo
If the vq is in premapped mode, use sg_dma_address() directly.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 18212c3e056b..dc109fbc05a5 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1299,9 +1299,13 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
 
for (n = 0; n < out_sgs + in_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   if (vring_map_one_sg(vq, sg, n < out_sgs ?
-DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
-   goto unmap_release;
+   if (vq->premapped) {
+   addr = sg_dma_address(sg);
+   } else {
+   if (vring_map_one_sg(vq, sg, n < out_sgs ?
+DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
+   goto unmap_release;
+   }
 
desc[i].flags = cpu_to_le16(n < out_sgs ?
0 : VRING_DESC_F_WRITE);
@@ -1369,10 +1373,12 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
return 0;
 
 unmap_release:
-   err_idx = i;
+   if (!vq->premapped) {
+   err_idx = i;
 
-   for (i = 0; i < err_idx; i++)
-   vring_unmap_desc_packed(vq, &desc[i]);
+   for (i = 0; i < err_idx; i++)
+   vring_unmap_desc_packed(vq, &desc[i]);
+   }
 
kfree(desc);
 
@@ -1447,9 +1453,13 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
dma_addr_t addr;
 
-   if (vring_map_one_sg(vq, sg, n < out_sgs ?
-DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
-   goto unmap_release;
+   if (vq->premapped) {
+   addr = sg_dma_address(sg);
+   } else {
+   if (vring_map_one_sg(vq, sg, n < out_sgs ?
+DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
+   goto unmap_release;
+   }
 
flags = cpu_to_le16(vq->packed.avail_used_flags |
(++c == total_sg ? 0 : VRING_DESC_F_NEXT) |
@@ -1512,11 +1522,17 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
return 0;
 
 unmap_release:
+   vq->packed.avail_used_flags = avail_used_flags;
+
+   if (vq->premapped) {
+   END_USE(vq);
+   return -EIO;
+   }
+
err_idx = i;
i = head;
curr = vq->free_head;
 
-   vq->packed.avail_used_flags = avail_used_flags;
 
for (n = 0; n < total_sg; n++) {
if (i == err_idx)
-- 
2.32.0.3.g01195cf9f

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH vhost v10 00/10] virtio core prepares for AF_XDP

2023-06-02 Thread Xuan Zhuo
## About DMA APIs

Currently, virtio cannot work with the DMA APIs when the device does not
offer VIRTIO_F_ACCESS_PLATFORM.

1. I tried to make the DMA APIs return the physical address for virtio
   devices, but the DMA APIs only work with "real" devices.
2. I tried to let xsk support callbacks that get the physical address from
   the virtio-net driver and use it as the DMA address. But the xsk
   maintainers may want to replace the DMA APIs with dma-buf. That would be
   a larger effort, and we would wait too long.

So, rethinking this, we can first support premapped DMA only for devices with
VIRTIO_F_ACCESS_PLATFORM. In the case of AF_XDP, users who want to use it
already have to update the device to support VIRTIO_F_RING_RESET, and they can
enable the device's VIRTIO_F_ACCESS_PLATFORM feature at the same time.

Thanks for the help from Christoph.

=

AF_XDP (XDP socket) is an excellent kernel-bypass network framework. Its
zero-copy feature must be supported by the driver; the performance of zero
copy is very good.

ENV: Qemu with vhost.

                  vhost cpu | Guest APP CPU | Guest Softirq CPU |     PPS
------------------|---------|---------------|-------------------|--------
xmit by sockperf:     90%   |     100%      |                   |  318967
xmit by xsk:         100%   |      30%      |       33%         | 1192064
recv by sockperf:    100%   |      68%      |      100%         |  692288
recv by xsk:         100%   |      33%      |       43%         |  771670

Before implementing this in virtio-net, we first have to add these features
to the virtio core:

1. virtio core support premapped
2. virtio core support reset per-queue
3. introduce DMA APIs to virtio core

Please review.

Thanks.

v10:
 1. support setting the vq to premapped mode; the vq then only handles premapped requests.
 2. virtio-net supports doing dma mapping in advance

v9:
 1. use a flag to distinguish premapped operations instead of judging by sg.

v8:
 1. vring_sg_address: check by sg_page(sg), not dma_address, because 0 is a valid dma address
 2. remove unused code from vring_map_one_sg()

v7:
 1. virtqueue_dma_dev() return NULL when virtio is without DMA API.

v6:
 1. change the size of the flags to u32.

v5:
 1. fix for error handler
 2. add flags to record internal dma mapping

v4:
 1. rename map_inter to dma_map_internal
 2. fix: Excess function parameter 'vq' description in 'virtqueue_dma_dev'

v3:
 1. add map_inter to struct desc state to record whether the virtio core did the dma map

v2:
 1. judge whether a request is premapped based on sgs[0]->dma_address
 2. judge whether to unmap a non-indirect desc based on extra.addr
 3. judge whether to unmap an indirect desc based on indir_desc
 4. rename virtqueue_get_dma_dev to virtqueue_dma_dev

v1:
 1. expose the dma device. Do NOT introduce APIs for dma and sync yet.
 2. split some commits for easier review.




Xuan Zhuo (10):
  virtio_ring: put mapping error check in vring_map_one_sg
  virtio_ring: introduce virtqueue_set_premapped()
  virtio_ring: split: support add premapped buf
  virtio_ring: packed: support add premapped buf
  virtio_ring: split-detach: support return dma info to driver
  virtio_ring: packed-detach: support return dma info to driver
  virtio_ring: introduce helpers for premapped
  virtio_ring: introduce virtqueue_dma_dev()
  virtio_ring: introduce virtqueue_add_sg()
  virtio_net: support dma premapped

 drivers/net/virtio_net.c | 163 ++--
 drivers/virtio/virtio_ring.c | 493 +++
 include/linux/virtio.h   |  34 +++
 3 files changed, 612 insertions(+), 78 deletions(-)

--
2.32.0.3.g01195cf9f



[PATCH vhost v10 01/10] virtio_ring: put mapping error check in vring_map_one_sg

2023-06-02 Thread Xuan Zhuo
This patch puts the dma addr error check in vring_map_one_sg().

The benefits of doing this:

1. removes one check of vq->use_dma_api.
2. makes vring_map_one_sg() simpler: callers no longer need to call
   vring_mapping_error() on the return value, which simplifies the
   subsequent code.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 37 +---
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c5310eaf8b46..72ed07a604d4 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -355,9 +355,8 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
 }
 
 /* Map one sg entry. */
-static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
-  struct scatterlist *sg,
-  enum dma_data_direction direction)
+static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
+   enum dma_data_direction direction, dma_addr_t *addr)
 {
if (!vq->use_dma_api) {
/*
@@ -366,7 +365,8 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
 * depending on the direction.
 */
kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
-   return (dma_addr_t)sg_phys(sg);
+   *addr = (dma_addr_t)sg_phys(sg);
+   return 0;
}
 
/*
@@ -374,9 +374,14 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
 * the way it expects (we don't guarantee that the scatterlist
 * will exist for the lifetime of the mapping).
 */
-   return dma_map_page(vring_dma_dev(vq),
+   *addr = dma_map_page(vring_dma_dev(vq),
sg_page(sg), sg->offset, sg->length,
direction);
+
+   if (dma_mapping_error(vring_dma_dev(vq), *addr))
+   return -ENOMEM;
+
+   return 0;
 }
 
 static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
@@ -588,8 +593,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 
for (n = 0; n < out_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
-   if (vring_mapping_error(vq, addr))
+   dma_addr_t addr;
+
+   if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
goto unmap_release;
 
prev = i;
@@ -603,8 +609,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
}
for (; n < (out_sgs + in_sgs); n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
-   if (vring_mapping_error(vq, addr))
+   dma_addr_t addr;
+
+   if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
goto unmap_release;
 
prev = i;
@@ -1279,9 +1286,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 
for (n = 0; n < out_sgs + in_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   addr = vring_map_one_sg(vq, sg, n < out_sgs ?
-   DMA_TO_DEVICE : DMA_FROM_DEVICE);
-   if (vring_mapping_error(vq, addr))
+   if (vring_map_one_sg(vq, sg, n < out_sgs ?
+DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
goto unmap_release;
 
desc[i].flags = cpu_to_le16(n < out_sgs ?
@@ -1426,9 +1432,10 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
c = 0;
for (n = 0; n < out_sgs + in_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
-   DMA_TO_DEVICE : DMA_FROM_DEVICE);
-   if (vring_mapping_error(vq, addr))
+   dma_addr_t addr;
+
+   if (vring_map_one_sg(vq, sg, n < out_sgs ?
+DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
goto unmap_release;
 
flags = cpu_to_le16(vq->packed.avail_used_flags |
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v10 03/10] virtio_ring: split: support add premapped buf

2023-06-02 Thread Xuan Zhuo
If the vq is in premapped mode, use sg_dma_address() directly.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 46 ++--
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 2afdfb9e3e30..18212c3e056b 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -598,8 +598,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
dma_addr_t addr;
 
-   if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
-   goto unmap_release;
+   if (vq->premapped) {
+   addr = sg_dma_address(sg);
+   } else {
+   if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
+   goto unmap_release;
+   }
 
prev = i;
/* Note that we trust indirect descriptor
@@ -614,8 +618,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
dma_addr_t addr;
 
-   if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
-   goto unmap_release;
+   if (vq->premapped) {
+   addr = sg_dma_address(sg);
+   } else {
+   if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
+   goto unmap_release;
+   }
 
prev = i;
/* Note that we trust indirect descriptor
@@ -689,21 +697,23 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
return 0;
 
 unmap_release:
-   err_idx = i;
+   if (!vq->premapped) {
+   err_idx = i;
 
-   if (indirect)
-   i = 0;
-   else
-   i = head;
-
-   for (n = 0; n < total_sg; n++) {
-   if (i == err_idx)
-   break;
-   if (indirect) {
-   vring_unmap_one_split_indirect(vq, &desc[i]);
-   i = virtio16_to_cpu(_vq->vdev, desc[i].next);
-   } else
-   i = vring_unmap_one_split(vq, i);
+   if (indirect)
+   i = 0;
+   else
+   i = head;
+
+   for (n = 0; n < total_sg; n++) {
+   if (i == err_idx)
+   break;
+   if (indirect) {
+   vring_unmap_one_split_indirect(vq, &desc[i]);
+   i = virtio16_to_cpu(_vq->vdev, desc[i].next);
+   } else
+   i = vring_unmap_one_split(vq, i);
+   }
}
 
if (indirect)
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v10 02/10] virtio_ring: introduce virtqueue_set_premapped()

2023-06-02 Thread Xuan Zhuo
This helper allows the driver to switch the DMA mode to premapped mode.
In premapped mode, the virtio core does not do DMA mapping internally.

This only works when use_dma_api is true. If use_dma_api is false, the
DMA operations do not go through the DMA APIs, which is not the standard
way in the Linux kernel.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 40 
 include/linux/virtio.h   |  2 ++
 2 files changed, 42 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 72ed07a604d4..2afdfb9e3e30 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -172,6 +172,9 @@ struct vring_virtqueue {
/* Host publishes avail event idx */
bool event;
 
+   /* Do DMA mapping by driver */
+   bool premapped;
+
/* Head of free buffer list. */
unsigned int free_head;
/* Number we've added since last sync. */
@@ -2059,6 +2062,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
vq->packed_ring = true;
vq->dma_dev = dma_dev;
vq->use_dma_api = vring_use_dma_api(vdev);
+   vq->premapped = false;
 
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
!context;
@@ -2548,6 +2552,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
 #endif
vq->dma_dev = dma_dev;
vq->use_dma_api = vring_use_dma_api(vdev);
+   vq->premapped = false;
 
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
!context;
@@ -2691,6 +2696,41 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
 }
 EXPORT_SYMBOL_GPL(virtqueue_resize);
 
+/**
+ * virtqueue_set_premapped - set the vring premapped mode
+ * @_vq: the struct virtqueue we're talking about.
+ *
+ * Enable the premapped mode of the vq.
+ *
+ * The vring in premapped mode does not do dma internally, so the driver must
+ * do dma mapping in advance. The driver must pass the dma_address through
+ * dma_address of scatterlist. When the driver got a used buffer from
+ * the vring, it has to unmap the dma address. So the driver must call
+ * virtqueue_get_buf_premapped()/virtqueue_detach_unused_buf_premapped().
+ *
+ * This must be called before adding any buf to vring.
+ * So this should be called immediately after init vq or vq reset.
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error.
+ * 0: success.
+ * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
+ */
+int virtqueue_set_premapped(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   if (!vq->use_dma_api)
+   return -EINVAL;
+
+   vq->premapped = true;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(virtqueue_set_premapped);
+
 /* Only available for split ring */
 struct virtqueue *vring_new_virtqueue(unsigned int index,
  unsigned int num,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b93238db94e3..1fc0e1023bd4 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -78,6 +78,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
 
 unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
 
+int virtqueue_set_premapped(struct virtqueue *_vq);
+
 bool virtqueue_poll(struct virtqueue *vq, unsigned);
 
 bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
-- 
2.32.0.3.g01195cf9f
