Re: How to implement message forwarding from one CID to another in vhost driver

2024-05-28 Thread Paolo Bonzini
On Tue, May 28, 2024 at 5:53 PM Stefano Garzarella  wrote:
>
> On Tue, May 28, 2024 at 05:49:32PM GMT, Paolo Bonzini wrote:
> >On Tue, May 28, 2024 at 5:41 PM Stefano Garzarella wrote:
> >> >I think it's either that or implementing virtio-vsock in userspace
> >> >(https://lore.kernel.org/qemu-devel/30baeb56-64d2-4ea3-8e53-6a5c50999...@redhat.com/,
> >> >search for "To connect host<->guest").
> >>
> >> For in this case AF_VSOCK can't be used in the host, right?
> >> So it's similar to vhost-user-vsock.
> >
> >Not sure if I understand but in this case QEMU knows which CIDs are
> >forwarded to the host (either listen on vsock and connect to the host,
> >or vice versa), so there is no kernel and no VMADDR_FLAG_TO_HOST
> >involved.
>
> I meant that the application in the host that wants to connect to the
> guest cannot use AF_VSOCK in the host, but must use the one where QEMU
> is listening (e.g. AF_INET, AF_UNIX), right?
>
> I think one of Alex's requirements was that the application in the host
> continue to use AF_VSOCK as in their environment.

Can the host use VMADDR_CID_LOCAL for host-to-host communication? If
so, the proposed "-object vsock-forward" syntax can connect to it and
it should work as long as the application on the host does not assume
that it is on CID 3.
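
For reference, host-to-host communication over AF_VSOCK goes through the
loopback CID (VMADDR_CID_LOCAL, i.e. CID 1). A minimal client sketch,
assuming the vsock_loopback transport is available and that something is
already listening on the chosen port (both assumptions, not part of this
thread):

/* Minimal AF_VSOCK client connecting host-to-host over the loopback
 * CID.  The port number is arbitrary and a listener on it is assumed. */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

int main(void)
{
    struct sockaddr_vm addr = {
        .svm_family = AF_VSOCK,
        .svm_cid    = VMADDR_CID_LOCAL,   /* 1: host-local loopback */
        .svm_port   = 9000,               /* example port */
    };
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("vsock connect");
        return 1;
    }
    write(fd, "ping\n", 5);
    close(fd);
    return 0;
}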

Paolo




Re: How to implement message forwarding from one CID to another in vhost driver

2024-05-28 Thread Paolo Bonzini
On Tue, May 28, 2024 at 5:41 PM Stefano Garzarella  wrote:
> >I think it's either that or implementing virtio-vsock in userspace
> >(https://lore.kernel.org/qemu-devel/30baeb56-64d2-4ea3-8e53-6a5c50999...@redhat.com/,
> >search for "To connect host<->guest").
>
> For in this case AF_VSOCK can't be used in the host, right?
> So it's similar to vhost-user-vsock.

Not sure if I understand but in this case QEMU knows which CIDs are
forwarded to the host (either listen on vsock and connect to the host,
or vice versa), so there is no kernel and no VMADDR_FLAG_TO_HOST
involved.

Paolo




Re: How to implement message forwarding from one CID to another in vhost driver

2024-05-28 Thread Paolo Bonzini

On 5/27/24 09:54, Alexander Graf wrote:


On 27.05.24 09:08, Alexander Graf wrote:

Hey Stefano,

On 23.05.24 10:45, Stefano Garzarella wrote:

On Tue, May 21, 2024 at 08:50:22AM GMT, Alexander Graf wrote:

Howdy,

On 20.05.24 14:44, Dorjoy Chowdhury wrote:

Hey Stefano,

Thanks for the reply.


On Mon, May 20, 2024, 2:55 PM Stefano Garzarella wrote:

Hi Dorjoy,

On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:

Hi,

Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
emulation support in QEMU. Alexander Graf is mentoring me on this work.
A v1 patch series has already been posted to the qemu-devel mailing
list[2].


AWS Nitro Enclaves is an Amazon EC2[3] feature that allows creating
isolated execution environments, called enclaves, from Amazon EC2
instances, which are used for processing highly sensitive data. Enclaves
have no persistent storage and no external networking. The enclave VMs
are based on the Firecracker microvm and have a vhost-vsock device for
communication with the parent EC2 instance that spawned them, and a
Nitro Secure Module (NSM) device for cryptographic attestation. The
parent instance VM always has CID 3 while the enclave VM gets a dynamic
CID. The enclave VMs can communicate with the parent instance over
various ports to CID 3; for example, the init process inside an enclave
sends a heartbeat to port 9000 upon boot, expecting a heartbeat reply,
letting the parent instance know that the enclave VM has successfully
booted.

The plan is to eventually make the nitro enclave emulation in QEMU
standalone, i.e., without needing to run another VM with CID 3 with
proper vsock

If you don't have to launch another VM, maybe we can avoid vhost-vsock
and emulate virtio-vsock in user-space, having complete control over the
behavior.

So we could use this opportunity to implement virtio-vsock in QEMU [4]
or use vhost-user-vsock [5] and customize it somehow.
(Note: vhost-user-vsock already supports sibling communication, so maybe
with a few modifications it fits your case perfectly)

[4] https://gitlab.com/qemu-project/qemu/-/issues/2095
[5] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock



Thanks for letting me know. Right now I don't have a complete picture
but I will look into them. Thank you.



communication support. For this to work, one approach could be to teach
the vhost driver in kernel to forward CID 3 messages to another CID N

So in this case both CID 3 and N would be assigned to the same QEMU
process?



CID N is assigned to the enclave VM. CID 3 was supposed to be the
parent VM that spawns the enclave VM (this is how it is in AWS, where
an EC2 instance VM spawns the enclave VM from inside it and that
parent EC2 instance always has CID 3). But in the QEMU case as we
don't want a parent VM (we want to run enclave VMs standalone) we
would need to forward the CID 3 messages to host CID. I don't know if
it means CID 3 and CID N are assigned to the same QEMU process. Sorry.



There are 2 use cases here:

1) Enclave wants to treat host as parent (default). In this scenario,
the "parent instance" that shows up as CID 3 in the Enclave doesn't
really exist. Instead, when the Enclave attempts to talk to CID 3, it
should really land on CID 0 (hypervisor). When the hypervisor tries to
connect to the Enclave on port X, it should look as if it originates
from CID 3, not CID 0.

2) Multiple parent VMs. Think of an actual cloud hosting scenario.
Here, we have multiple "parent instances". Each of them thinks it's
CID 3. Each can spawn an Enclave that talks to CID 3 and reach the
parent. For this case, I think implementing all of virtio-vsock in
user space is the best path forward. But in theory, you could also
swizzle CIDs to make random "real" CIDs appear as CID 3.



Thank you for clarifying the use cases!

Also for case 1, vhost-vsock doesn't support CID 0, so in my opinion
it's easier to go into user-space with vhost-user-vsock or the built-in
device.



Sorry, I believe I meant CID 2. Effectively for case 1, when a process
on the hypervisor listens on port 1234, it should be visible as 3:1234
from the VM, and when the hypervisor process connects to
<enclave CID>:1234, it should look as if that connection came from CID 3.



Now that I'm thinking about my message again: What if we just introduce 
a sysfs/sysctl file for vsock that indicates the "host CID" (default: 
2)? Users that want vhost-vsock to behave as if the host is CID 3 can 
just write 3 to it.


It means we'd need to change all references to VMADDR_CID_HOST to 
instead refer to a global variable that indicates the new "host CID". 
It'd need some more careful massaging to not break number namespace 
assumptions (<= CID_HOST no longer works), but the idea should fly.
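
A purely illustrative sketch of that idea follows; none of these names
exist in the kernel today, they are invented here only to make the
suggestion concrete:

/* Hypothetical sketch: a configurable "host CID" that defaults to
 * VMADDR_CID_HOST (2) and could be exposed through a sysctl. */
static unsigned int vsock_host_cid = VMADDR_CID_HOST;

static bool vsock_addr_is_host_cid(unsigned int cid)
{
    /* Call sites that currently compare against VMADDR_CID_HOST would
     * use this helper instead, so writing 3 to the sysctl makes the
     * host appear to guests as CID 3. */
    return cid == vsock_host_cid;
}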


Forwarding one or more ports of a given CID to CID 2 (the host) should 
be doable with a dummy vhost client that listens to CID 3, connects to 
CID 2 and sends data back and forth.  Not hard enough to justify ch
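
A rough user-space sketch of that forwarding idea: accept vsock
connections on one side, connect to the host (CID 2) on the other, and
shuttle bytes between the two. The real proposal needs a vhost client
actually owning CID 3; here the listening side is a plain AF_VSOCK
socket and the port number is arbitrary, both assumptions made only to
keep the sketch self-contained:

#include <poll.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Copy data in both directions until either side closes. */
static void relay(int a, int b)
{
    struct pollfd fds[2] = { { .fd = a, .events = POLLIN },
                             { .fd = b, .events = POLLIN } };
    char buf[4096];

    for (;;) {
        if (poll(fds, 2, -1) < 0)
            return;
        for (int i = 0; i < 2; i++) {
            if (!(fds[i].revents & (POLLIN | POLLHUP)))
                continue;
            ssize_t n = read(fds[i].fd, buf, sizeof(buf));
            if (n <= 0)
                return;                   /* peer closed or error */
            write(fds[1 - i].fd, buf, n); /* forward to the other side */
        }
    }
}

int main(void)
{
    struct sockaddr_vm lst = { .svm_family = AF_VSOCK,
                               .svm_cid    = VMADDR_CID_ANY,  /* accept from any CID */
                               .svm_port   = 9000 };          /* example port */
    struct sockaddr_vm dst = { .svm_family = AF_VSOCK,
                               .svm_cid    = VMADDR_CID_HOST, /* 2: the host */
                               .svm_port   = 9000 };
    int srv = socket(AF_VSOCK, SOCK_STREAM, 0);

    if (srv < 0 || bind(srv, (struct sockaddr *)&lst, sizeof(lst)) < 0 ||
        listen(srv, 1) < 0) {
        perror("vsock listen");
        return 1;
    }
    for (;;) {
        int peer = accept(srv, NULL, NULL);
        int host = socket(AF_VSOCK, SOCK_STREAM, 0);

        if (peer < 0 || host < 0 ||
            connect(host, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
            perror("forward setup");
            return 1;
        }
        relay(peer, host);
        close(peer);
        close(host);
    }
}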

Re: [PATCH v2] tools: do not include scripts/Kbuild.include

2021-04-16 Thread Paolo Bonzini

On 16/04/21 15:26, Christian Borntraeger wrote:



On 16.04.21 15:00, Masahiro Yamada wrote:

Since commit d9f4ff50d2aa ("kbuild: spilt cc-option and friends to
scripts/Makefile.compiler"), some kselftests fail to build.

The tools/ directory opted out Kbuild, and went in a different
direction. They copy any kind of files to the tools/ directory
in order to do whatever they want in their world.

tools/build/Build.include mimics scripts/Kbuild.include, but some
tool Makefiles included the Kbuild one to import a feature that is
missing in tools/build/Build.include:

  - Commit ec04aa3ae87b ("tools/thermal: tmon: use "-fstack-protector"
    only if supported") included scripts/Kbuild.include from
    tools/thermal/tmon/Makefile to import the cc-option macro.

  - Commit c2390f16fc5b ("selftests: kvm: fix for compilers that do
    not support -no-pie") included scripts/Kbuild.include from
    tools/testing/selftests/kvm/Makefile to import the try-run macro.

  - Commit 9cae4ace80ef ("selftests/bpf: do not ignore clang
    failures") included scripts/Kbuild.include from
    tools/testing/selftests/bpf/Makefile to import the .DELETE_ON_ERROR
    target.

  - Commit 0695f8bca93e ("selftests/powerpc: Handle Makefile for
    unrecognized option") included scripts/Kbuild.include from
    tools/testing/selftests/powerpc/pmu/ebb/Makefile to import the
    try-run macro.

Copy what they need into tools/build/Build.include, and make them
include it instead of scripts/Kbuild.include.

Link: https://lore.kernel.org/lkml/86dadf33-70f7-a5ac-cb8c-64966d2f4...@linux.ibm.com/
Fixes: d9f4ff50d2aa ("kbuild: spilt cc-option and friends to scripts/Makefile.compiler")

Reported-by: Janosch Frank 
Reported-by: Christian Borntraeger 
Signed-off-by: Masahiro Yamada 


looks better.
Tested-by: Christian Borntraeger 



Thank you very much Masahiro, this looks great.

Paolo



Re: [PATCH 2/2] tools: do not include scripts/Kbuild.include

2021-04-15 Thread Paolo Bonzini

On 15/04/21 10:04, Masahiro Yamada wrote:

On Thu, Apr 15, 2021 at 4:40 PM Paolo Bonzini  wrote:

I think it would make sense to add try-run, cc-option and
.DELETE_ON_ERROR to tools/build/Build.include?


To be safe, I just copy-pasted what the makefiles need.
If someone wants to refactor the tool build system, that is fine,
but, to me, I do not see consistent rules or policy under tools/.


"Please put this in a common file instead of introducing duplication" is 
not asking for wholesale refactoring.


Paolo



Re: [PATCH 2/2] tools: do not include scripts/Kbuild.include

2021-04-15 Thread Paolo Bonzini

On 15/04/21 09:27, Masahiro Yamada wrote:

Since commit d9f4ff50d2aa ("kbuild: spilt cc-option and friends to
scripts/Makefile.compiler"), some kselftests fail to build.

The tools/ directory opted out Kbuild, and went in a different
direction. They copy any kind of files to the tools/ directory
in order to do whatever they want to do in their world.

tools/build/Build.include mimics scripts/Kbuild.include, but some
tool Makefiles included the Kbuild one to import a feature that is
missing in tools/build/Build.include:

  - Commit ec04aa3ae87b ("tools/thermal: tmon: use "-fstack-protector"
only if supported") included scripts/Kbuild.include from
tools/thermal/tmon/Makefile to import the cc-option macro.

  - Commit c2390f16fc5b ("selftests: kvm: fix for compilers that do
not support -no-pie") included scripts/Kbuild.include from
tools/testing/selftests/kvm/Makefile to import the try-run macro.

  - Commit 9cae4ace80ef ("selftests/bpf: do not ignore clang
failures") included scripts/Kbuild.include from
tools/testing/selftests/bpf/Makefile to import the .DELETE_ON_ERROR
target.

  - Commit 0695f8bca93e ("selftests/powerpc: Handle Makefile for
unrecognized option") included scripts/Kbuild.include from
tools/testing/selftests/powerpc/pmu/ebb/Makefile to import the
try-run macro.

Copy what they want there, and stop including scripts/Kbuild.include
from the tool Makefiles.


I think it would make sense to add try-run, cc-option and 
.DELETE_ON_ERROR to tools/build/Build.include?


Paolo


Link: https://lore.kernel.org/lkml/86dadf33-70f7-a5ac-cb8c-64966d2f4...@linux.ibm.com/
Fixes: d9f4ff50d2aa ("kbuild: spilt cc-option and friends to scripts/Makefile.compiler")
Reported-by: Janosch Frank 
Reported-by: Christian Borntraeger 
Signed-off-by: Masahiro Yamada 
---

  tools/testing/selftests/bpf/Makefile  |  3 ++-
  tools/testing/selftests/kvm/Makefile  | 12 +++-
  .../selftests/powerpc/pmu/ebb/Makefile| 11 ++-
  tools/thermal/tmon/Makefile   | 19 +--
  4 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 044bfdcf5b74..d872b9f41543 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -1,5 +1,4 @@
  # SPDX-License-Identifier: GPL-2.0
-include ../../../../scripts/Kbuild.include
  include ../../../scripts/Makefile.arch
  include ../../../scripts/Makefile.include
  
@@ -476,3 +475,5 @@ EXTRA_CLEAN := $(TEST_CUSTOM_PROGS) $(SCRATCH_DIR) $(HOST_SCRATCH_DIR)	\

prog_tests/tests.h map_tests/tests.h verifier/tests.h   \
feature \
$(addprefix $(OUTPUT)/,*.o *.skel.h no_alu32 bpf_gcc bpf_testmod.ko)
+
+.DELETE_ON_ERROR:
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index a6d61f451f88..8b45bc417d83 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -1,5 +1,15 @@
  # SPDX-License-Identifier: GPL-2.0-only
-include ../../../../scripts/Kbuild.include
+
+TMPOUT = .tmp_
+
+try-run = $(shell set -e;  \
+   TMP=$(TMPOUT)/tmp;  \
+   mkdir -p $(TMPOUT); \
+   trap "rm -rf $(TMPOUT)" EXIT; \
+   if ($(1)) >/dev/null 2>&1;\
+   then echo "$(2)"; \
+   else echo "$(3)"; \
+   fi)
  
  all:
  
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/Makefile b/tools/testing/selftests/powerpc/pmu/ebb/Makefile

index af3df79d8163..d5d3e869df93 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/Makefile
+++ b/tools/testing/selftests/powerpc/pmu/ebb/Makefile
@@ -1,5 +1,4 @@
  # SPDX-License-Identifier: GPL-2.0
-include ../../../../../../scripts/Kbuild.include
  
  noarg:

$(MAKE) -C ../../
@@ -8,6 +7,16 @@ noarg:
  CFLAGS += -m64
  
  TMPOUT = $(OUTPUT)/TMPDIR/

+
+try-run = $(shell set -e;  \
+   TMP=$(TMPOUT)/tmp;  \
+   mkdir -p $(TMPOUT); \
+   trap "rm -rf $(TMPOUT)" EXIT; \
+   if ($(1)) >/dev/null 2>&1;\
+   then echo "$(2)"; \
+   else echo "$(3)"; \
+   fi)
+
  # Toolchains may build PIE by default which breaks the assembly
  no-pie-option := $(call try-run, echo 'int main() { return 0; }' | \
  $(CC) -Werror $(KBUILD_CPPFLAGS) $(CC_OPTION_CFLAGS) -no-pie -x c - -o 
"$$TMP", -no-pie)
diff --git a/tools/thermal/tmon/Makefile b/tools/thermal/tmon/Makefile
index 59e417ec3e13..92a683e4866c 100644
--- a/tools/thermal/tmon/Makefile
+++ b/tools/thermal/tmon/Makefile
@@ -1,6 +1,21 @@
  # SPDX-License-Identifier: GPL-2.0
-# We need this for the "cc-option" macro.
-include ../../../scripts/Kbuild.include
+
+TMPOUT = .tmp_
+
+try-run = $(shell set -e;  \
+   TMP=$(TMPOUT)/tmp;  \
+   mkdir -p 

Re: [PATCH AUTOSEL 5.9 22/33] vhost scsi: add lun parser helper

2020-12-04 Thread Paolo Bonzini

On 04/12/20 16:49, Sasha Levin wrote:

On Fri, Dec 04, 2020 at 09:27:28AM +0100, Paolo Bonzini wrote:

On 01/12/20 00:59, Sasha Levin wrote:


It's quite easy to NAK a patch too, just reply saying "no" and it'll be
dropped (just like this patch was dropped right after your first reply)
so the burden on maintainers is minimal.


The maintainers are _already_ marking patches with "Cc: stable".  That 


They're not, though. Some forget, some subsystems don't mark anything,
some don't mark it as it's not stable material when it lands in their
tree but then it turns out to be one if it sits there for too long.


That means some subsystems will be worse as far as stable release 
support goes.  That's not a problem:


- some subsystems have people paid to do backports to LTS releases when 
patches don't apply; others don't, if the patch doesn't apply the bug is 
simply not fixed in LTS releases


- some subsystems are worse than others even in "normal" releases :)

(plus backports) is where the burden on maintainers should start and 
end.  I don't see the need to second guess them.


This is similar to describing our CI infrastructure as "second
guessing": why are we second guessing authors and maintainers who are
obviously doing the right thing by testing their patches and reporting
issues to them?


No, it's not the same.  CI helps find bugs before you have to waste
time bisecting regressions across thousands of commits.  The 
lack of stable tags _can_ certainly be a problem, but it solves itself 
sooner or later when people upgrade their kernel.



Are you saying that you have always gotten stable tags right? never
missed a stable tag where one should go?


Of course I did, just like I have introduced bugs.  But at least I try 
to do my best both at adding stable tags and at not introducing bugs.


Paolo



Re: [PATCH AUTOSEL 5.9 22/33] vhost scsi: add lun parser helper

2020-12-04 Thread Paolo Bonzini

On 01/12/20 00:59, Sasha Levin wrote:


It's quite easy to NAK a patch too, just reply saying "no" and it'll be
dropped (just like this patch was dropped right after your first reply)
so the burden on maintainers is minimal.


The maintainers are _already_ marking patches with "Cc: stable".  That 
(plus backports) is where the burden on maintainers should start and 
end.  I don't see the need to second guess them.


Paolo



Re: [PATCH AUTOSEL 5.9 22/33] vhost scsi: add lun parser helper

2020-11-30 Thread Paolo Bonzini

On 30/11/20 20:44, Mike Christie wrote:

I have never seen a public/open-source vhost-scsi testsuite.

For patch 23 (the one that adds the lun reset support which is built on
patch 22), we can't add it to stable right now if you wanted to, because
it has a bug in it. Michael T, sent the fix:

https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?h=linux-next&id=b4fffc177fad3c99ee049611a508ca9561bb6871

to Linus today.


Ok, so at least it was only a close call and anyway not for something 
that most people would be running on their machines.  But it still seems 
to me that the state of CI in Linux is abysmal compared to what is 
needed to arbitrarily(*) pick up patches and commit them to "stable" trees.


Paolo

(*) A ML bot is an arbitrary choice as far as we are concerned since we 
cannot know how it makes a decision.




Re: [PATCH AUTOSEL 5.9 22/33] vhost scsi: add lun parser helper

2020-11-30 Thread Paolo Bonzini

On 30/11/20 18:38, Sasha Levin wrote:
I am not aware of any public CI being done _at all_ on
vhost-scsi, by CKI or anyone else.  So autoselection should be done 
only on subsystems that have very high coverage in CI.


Where can I find a testsuite for virtio/vhost? I see one for KVM, but
where is the one that the maintainers of virtio/vhost run on patches
that come in?


I don't know of any, especially for vhost-scsi.  MikeC?

Paolo



Re: [PATCH AUTOSEL 5.9 22/33] vhost scsi: add lun parser helper

2020-11-30 Thread Paolo Bonzini

On 30/11/20 14:57, Greg KH wrote:

Every patch should be "fixing a real issue"---even a new feature.  But the
larger the patch, the more the submitters and maintainers should be trusted
rather than a bot.  The line between feature and bugfix _sometimes_ is
blurry, I would say that in this case it's not, and it makes me question how
the bot decided that this patch would be acceptable for stable (which AFAIK
is not something that can be answered).

I thought that earlier Sasha said that this patch was needed as a
prerequisite patch for a later fix, right?  If not, sorry, I've lost the
train of thought in this thread...


Yeah---sorry I am replying to 22/33 but referring to 23/33, which is the 
one that in my opinion should not be blindly accepted for stable kernels 
without the agreement of the submitter or maintainer.


Paolo



Re: [PATCH AUTOSEL 5.9 22/33] vhost scsi: add lun parser helper

2020-11-30 Thread Paolo Bonzini

On 30/11/20 14:28, Greg KH wrote:

Lines of code is not everything. If you think that this needs additional
testing then that's fine and we can drop it, but not picking up a fix
just because it's 120 lines is not something we'd do.

Starting with the first two steps in stable-kernel-rules.rst:

Rules on what kind of patches are accepted, and which ones are not, into the
"-stable" tree:

  - It must be obviously correct and tested.
  - It cannot be bigger than 100 lines, with context.

We do obviously take patches that are bigger than 100 lines, as there
are always exceptions to the rules here.  Look at all of the
spectre/meltdown patches as one such example.  Should we refuse a patch
just because it fixes a real issue yet is 101 lines long?


Every patch should be "fixing a real issue"---even a new feature.  But 
the larger the patch, the more the submitters and maintainers should be 
trusted rather than a bot.  The line between feature and bugfix 
_sometimes_ is blurry, I would say that in this case it's not, and it 
makes me question how the bot decided that this patch would be 
acceptable for stable (which AFAIK is not something that can be answered).


Paolo



Re: [PATCH AUTOSEL 5.9 22/33] vhost scsi: add lun parser helper

2020-11-30 Thread Paolo Bonzini

On 29/11/20 22:06, Sasha Levin wrote:

On Sun, Nov 29, 2020 at 06:34:01PM +0100, Paolo Bonzini wrote:

On 29/11/20 05:13, Sasha Levin wrote:

Which doesn't seem to be suitable for stable either...  Patch 3/5 in


Why not? It was sent as a fix to Linus.


Dunno, 120 lines of new code?  Even if it's okay for an rc, I don't 
see why it would be backported to stable releases and released 
without any kind of testing.  Maybe for 5.9 the chances of breaking 


Lines of code is not everything. If you think that this needs additional
testing then that's fine and we can drop it, but not picking up a fix
just because it's 120 lines is not something we'd do.


Starting with the first two steps in stable-kernel-rules.rst:

Rules on what kind of patches are accepted, and which ones are not, into 
the "-stable" tree:


 - It must be obviously correct and tested.
 - It cannot be bigger than 100 lines, with context.


Plus all the testing we have for the stable trees, yes. It goes beyond
just compiling at this point.

Your very own co-workers (https://cki-project.org/) are pushing hard on
this effort around stable kernel testing, and statements like these
aren't helping anyone.


I am not aware of any public CI being done _at all_ on vhost-scsi, 
by CKI or anyone else.  So autoselection should be done only on 
subsystems that have very high coverage in CI.


Paolo



Re: [PATCH AUTOSEL 5.9 22/33] vhost scsi: add lun parser helper

2020-11-29 Thread Paolo Bonzini

On 29/11/20 05:13, Sasha Levin wrote:
Which doesn't seem to be suitable for stable either...  Patch 3/5 in 


Why not? It was sent as a fix to Linus.


Dunno, 120 lines of new code?  Even if it's okay for an rc, I don't see 
why it would be backported to stable releases and released without 
any kind of testing.  Maybe for 5.9 the chances of breaking things are 
low, but stuff like locking rules might have changed since older 
releases like 5.4 or 4.19.  The autoselection bot does not know that, it 
basically crosses fingers that these larger-scale changes cause the 
patches not to apply or compile anymore.


Maybe it's just me, but the whole "autoselect stable patches" and 
release them is very suspicious.  You are basically crossing fingers and 
are ready to release any kind of untested crap, because you do not trust 
maintainers to mark stable patches right.  Only then, when a backport 
is broken, it's maintainers who get the blame and have to fix it.


Personally I don't care because I have asked you to opt KVM out of 
autoselection, but this is the opposite of what Greg brags about when he 
touts the virtues of the upstream stable process over vendor kernels.


Paolo

the series might be (vhost scsi: fix cmd completion race), so I can 
understand including 1/5 and 2/5 just in case, but not the rest.  Does 
the bot not understand diffstats?


Not on their own, no. What's wrong with the diffstats?





Re: [PATCH AUTOSEL 5.9 22/33] vhost scsi: add lun parser helper

2020-11-25 Thread Paolo Bonzini

On 25/11/20 19:01, Sasha Levin wrote:

On Wed, Nov 25, 2020 at 06:48:21PM +0100, Paolo Bonzini wrote:

On 25/11/20 16:35, Sasha Levin wrote:

From: Mike Christie 

[ Upstream commit 18f1becb6948cd411fd01968a0a54af63732e73c ]

Move code to parse lun from req's lun_buf to helper, so tmf code
can use it in the next patch.

Signed-off-by: Mike Christie 
Reviewed-by: Paolo Bonzini 
Acked-by: Jason Wang 
Link: https://lore.kernel.org/r/1604986403-4931-5-git-send-email-michael.chris...@oracle.com


Signed-off-by: Michael S. Tsirkin 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Sasha Levin 


This doesn't seem like stable material, does it?


It went in as a dependency for efd838fec17b ("vhost scsi: Add support
for LUN resets."), which is the next patch.


Which doesn't seem to be suitable for stable either...  Patch 3/5 in the 
series might be (vhost scsi: fix cmd completion race), so I can 
understand including 1/5 and 2/5 just in case, but not the rest.  Does 
the bot not understand diffstats?


Paolo



Re: [PATCH AUTOSEL 5.9 22/33] vhost scsi: add lun parser helper

2020-11-25 Thread Paolo Bonzini

On 25/11/20 16:35, Sasha Levin wrote:

From: Mike Christie 

[ Upstream commit 18f1becb6948cd411fd01968a0a54af63732e73c ]

Move code to parse lun from req's lun_buf to helper, so tmf code
can use it in the next patch.

Signed-off-by: Mike Christie 
Reviewed-by: Paolo Bonzini 
Acked-by: Jason Wang 
Link: https://lore.kernel.org/r/1604986403-4931-5-git-send-email-michael.chris...@oracle.com
Signed-off-by: Michael S. Tsirkin 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Sasha Levin 


This doesn't seem like stable material, does it?

Paolo


---
  drivers/vhost/scsi.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 5d8850f5aef16..ed7dc6b998f65 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -898,6 +898,11 @@ vhost_scsi_get_req(struct vhost_virtqueue *vq, struct vhost_scsi_ctx *vc,
return ret;
  }
  
+static u16 vhost_buf_to_lun(u8 *lun_buf)
+{
+   return ((lun_buf[2] << 8) | lun_buf[3]) & 0x3FFF;
+}
+
  static void
  vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
  {
@@ -1036,12 +1041,12 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
tag = vhost64_to_cpu(vq, v_req_pi.tag);
task_attr = v_req_pi.task_attr;
cdb = &v_req_pi.cdb[0];
-   lun = ((v_req_pi.lun[2] << 8) | v_req_pi.lun[3]) & 0x3FFF;
+   lun = vhost_buf_to_lun(v_req_pi.lun);
} else {
tag = vhost64_to_cpu(vq, v_req.tag);
task_attr = v_req.task_attr;
cdb = &v_req.cdb[0];
-   lun = ((v_req.lun[2] << 8) | v_req.lun[3]) & 0x3FFF;
+   lun = vhost_buf_to_lun(v_req.lun);
}
/*
 * Check that the received CDB size does not exceeded our





Re: [PATCH 1/3] sched: better handling for busy polling loops

2020-10-23 Thread Paolo Bonzini
On 23/10/20 09:19, Peter Zijlstra wrote:
>> +/*
>> + * preemption needs to be kept disabled between prepare_to_busy_poll()
>> + * and end_busy_poll().
>> + */
>> +BUG_ON(preemptible());
>> +if (allow_resched)
>> +preempt_enable();
>> +else
>> +preempt_enable_no_resched();
> NAK on @allow_resched
> 

Since KVM is the one passing false, indeed I see no reason for the
argument; you can just use preempt_enable().  There is no impact for
example on the tracking of how much time was spent polling; that
ktime_get() for the end of the polling period is done before calling
end_busy_poll().
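
A minimal sketch of the simplified helper under that suggestion; the
function names come from the RFC, but the body below is an assumption,
not the actual patch:

/* Hypothetical simplification sketched from the discussion: drop the
 * allow_resched argument and always use preempt_enable(). */
static inline void end_busy_poll(void)
{
    /* prepare_to_busy_poll() disabled preemption; it must still be off. */
    BUG_ON(preemptible());

    /* The poll-time accounting (ktime_get()) already happened before
     * this call, so allowing a reschedule here does not affect it. */
    preempt_enable();
}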

Paolo



Re: [RFC PATCH 6/7] core/metricfs: expose x86-specific irq information through metricfs

2020-08-13 Thread Paolo Bonzini
On 13/08/20 14:13, Thomas Gleixner wrote:
 Add metricfs support for displaying percpu irq counters for x86.
 The top directory is /sys/kernel/debug/metricfs/irq_x86.
 Then there is a subdirectory for each x86-specific irq counter.
 For example:

cat /sys/kernel/debug/metricfs/irq_x86/TLB/values
>>> What is 'TLB'? I'm not aware of any vector which is named TLB.
>> There's a "TLB" entry in /proc/interrupts.
> It's TLB shootdowns and not TLB.

Yes but it's using the shortcut name on the left of the table.

> +METRICFS_ITEM(LOC, apic_timer_irqs, "Local timer interrupts");
> +METRICFS_ITEM(SPU, irq_spurious_count, "Spurious interrupts");
> +METRICFS_ITEM(PMI, apic_perf_irqs, "Performance monitoring interrupts");
> +METRICFS_ITEM(IWI, apic_irq_work_irqs, "IRQ work interrupts");
> +METRICFS_ITEM(RTR, icr_read_retry_count, "APIC ICR read retries");
> +#endif
> +METRICFS_ITEM(PLT, x86_platform_ipis, "Platform interrupts");
> +#ifdef CONFIG_SMP
> +METRICFS_ITEM(RES, irq_resched_count, "Rescheduling interrupts");
> +METRICFS_ITEM(CAL, irq_call_count, "Function call interrupts");
> +METRICFS_ITEM(TLB, irq_tlb_count, "TLB shootdowns");

Paolo



Re: [RFC PATCH 6/7] core/metricfs: expose x86-specific irq information through metricfs

2020-08-13 Thread Paolo Bonzini
On 13/08/20 12:11, Thomas Gleixner wrote:
>> Add metricfs support for displaying percpu irq counters for x86.
>> The top directory is /sys/kernel/debug/metricfs/irq_x86.
>> Then there is a subdirectory for each x86-specific irq counter.
>> For example:
>>
>>cat /sys/kernel/debug/metricfs/irq_x86/TLB/values
> What is 'TLB'? I'm not aware of any vector which is named TLB.

There's a "TLB" entry in /proc/interrupts.

Paolo



Re: [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics

2020-05-27 Thread Paolo Bonzini
On 28/05/20 00:21, David Ahern wrote:
> On 5/27/20 3:07 PM, Paolo Bonzini wrote:
>> I see what you meant now.  statsfs can also be used to enumerate objects
>> if one is so inclined (with the prototype in patch 7, for example, each
>> network interface becomes a directory).
> 
> there are many use cases that have 100's to 1000's of network devices.
> Having a sysfs entry per device already bloats memory usage for these
> use cases; another filesystem with an entry per device makes that worse.
> Really the wrong direction for large scale systems.

Hi David,

IMO the important part for now is having a flexible kernel API for
exposing statistics across multiple subsystems, so that they can be
harvested in an efficient way.  The userspace API is secondary, and
multiple APIs can be added to cater for different usecases.

For example, as of the first five patches the memory usage is the same
as what is now in the mainline kernel, since all the patchset does is
take existing debugfs inodes and move them to statsfs.  I agree that, if
the concept is extended to the whole kernel, scalability and memory
usage becomes an issue; and indeed, the long-term plan is to support a
binary format that is actually _more_ efficient than the status quo for
large scale systems.

In the meanwhile, the new filesystem can be disabled (see the difference
between "STATS_FS" and "STATS_FS_API") if it imposes undesirable overhead.

Thanks,

Paolo



Re: [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics

2020-05-27 Thread Paolo Bonzini
On 27/05/20 23:27, Jakub Kicinski wrote:
> On Wed, 27 May 2020 23:07:53 +0200 Paolo Bonzini wrote:
>>> Again, I have little KVM knowledge, but BPF also uses a fd-based API,
>>> and carries stats over the same syscall interface.  
>>
>> Can BPF stats (for BPF scripts created by whatever process is running in
>> the system) be collected by an external daemon that does not have access
>> to the file descriptor?  For KVM it's of secondary importance to gather
>> stats in the program; it can be nice to have and we are thinking of a
>> way to export the stats over the fd-based API, but it's less useful than
>> system-wide monitoring.  Perhaps this is a difference between the two.
> 
> Yes, check out bpftool prog list (bpftool code is under tools/bpf/ in
> the kernel tree). BPF statistics are under a static key, so you may not
> see any on your system. My system shows e.g.:
> 
> 81: kprobe  name abc  tag cefaa9376bdaae75  gpl run_time_ns 80941 run_cnt 152
>   loaded_at 2020-05-26T13:00:24-0700  uid 0
>   xlated 512B  jited 307B  memlock 4096B  map_ids 66,64
>   btf_id 16
> 
> In this example run_time_ns and run_cnt are stats.
> 
> The first number on the left is the program ID. BPF has an IDA, and
> each object gets an integer id. So admin (or CAP_BPF, I think) can
> iterate over the ids and open fds to objects of interest.

Got it, thanks.  But then "I'd hope that whatever daemon collects [BPF]
stats doesn't run as root". :)
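
For reference, the ID walk described above can be done with libbpf along
these lines; a sketch assuming root (or CAP_BPF) and a reasonably recent
libbpf, not part of the original exchange:

#include <stdio.h>
#include <unistd.h>
#include <bpf/bpf.h>

int main(void)
{
    __u32 id = 0;

    /* Walk all loaded BPF program IDs and print their runtime stats. */
    while (bpf_prog_get_next_id(id, &id) == 0) {
        struct bpf_prog_info info = {0};
        __u32 len = sizeof(info);
        int fd = bpf_prog_get_fd_by_id(id);

        if (fd < 0)
            continue;
        if (bpf_obj_get_info_by_fd(fd, &info, &len) == 0)
            printf("prog %u: run_time_ns %llu run_cnt %llu\n", id,
                   (unsigned long long)info.run_time_ns,
                   (unsigned long long)info.run_cnt);
        close(fd);
    }
    return 0;
}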

>> Another case where stats and configuration are separate is CPUs, where
>> CPU enumeration is done in sysfs but statistics are exposed in various
>> procfs files such as /proc/interrupts and /proc/stat.
> 
> True, but I'm guessing everyone is just okay living with the legacy
> procfs format there. Otherwise I'd guess the stats would had been added
> to sysfs. I'd be curious to hear the full story there.

Yeah, it's a chicken-and-egg problem in that there's no good place in
sysfs to put statistics right now, which is part of what this filesystem
is trying to solve (the other part is the API).

You can read more about Google's usecase at
http://lkml.iu.edu/hypermail/linux/kernel/2005.0/08056.html, it does
include both network and interrupt stats and it's something that they've
been using in production for quite some time.  We'd like the statsfs API
to be the basis for including something akin to that in Linux.

To be honest, it's unlikely that Emanuele (who has just finished his
internship at Red Hat) and I will pursue the networking stats further
than the demo patch at the end of this series. However, we're trying to
make sure that the API is at least ready for that, and to probe whether
any developers from other subsystems would be interested in using
statsfs.  So thanks for bringing your point of view!

Thanks,

Paolo



Re: [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics

2020-05-27 Thread Paolo Bonzini
On 27/05/20 22:23, Jakub Kicinski wrote:
> On Wed, 27 May 2020 15:14:41 +0200 Emanuele Giuseppe Esposito wrote:
>> Regarding the config, as I said the idea is to gather multiple 
>> subsystems' statistics, therefore there wouldn't be a single 
>> configuration method like in netlink.
>> For example in kvm there are file descriptors for configuration, and 
>> creating them requires no privilege, contrary to the network interfaces.
>
> Enumerating networking interfaces, addresses, and almost all of the
> configuration requires no extra privilege. In fact I'd hope that
> whatever daemon collects network stats doesn't run as root :)
> 
> I think enumerating objects is of primary importance, and statistics 
> of those objects are subordinate.

I see what you meant now.  statsfs can also be used to enumerate objects
if one is so inclined (with the prototype in patch 7, for example, each
network interface becomes a directory).

> Again, I have little KVM knowledge, but BPF also uses a fd-based API,
> and carries stats over the same syscall interface.

Can BPF stats (for BPF scripts created by whatever process is running in
the system) be collected by an external daemon that does not have access
to the file descriptor?  For KVM it's of secondary importance to gather
stats in the program; it can be nice to have and we are thinking of a
way to export the stats over the fd-based API, but it's less useful than
system-wide monitoring.  Perhaps this is a difference between the two.

Another case where stats and configuration are separate is CPUs, where
CPU enumeration is done in sysfs but statistics are exposed in various
procfs files such as /proc/interrupts and /proc/stat.

Thanks,

Paolo



Re: [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics

2020-05-27 Thread Paolo Bonzini
On 27/05/20 15:33, Andrew Lunn wrote:
>> I don't really know a lot about the networking subsystem, and as it was
>> pointed out in another email on patch 7 by Andrew, networking needs to
>> atomically gather and display statistics in order to make them consistent,
>> and currently this is not supported by stats_fs but could be added in
>> future.
> 
> Do you have any idea how you will support atomic access? It does not
> seem easy to implement in a filesystem based model.

Hi Andrew,

there are plans to support binary access.  Emanuele and I don't really
have a plan for how to implement it, but there are developers from
Google that have ideas (because Google has a similar "metricfs" thing
in-house).

I think atomic access would use some kind of "source_ops" struct
containing create_snapshot and release_snapshot function pointers.
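
A purely illustrative sketch of that idea; these names do not exist in
the posted series and are invented here:

struct stats_fs_source;   /* per-source object from the statsfs API */

/* Hypothetical hooks a source could register so readers get a
 * consistent view of all its values at once. */
struct stats_fs_source_ops {
    /* Capture a consistent snapshot of every value owned by @source. */
    void *(*create_snapshot)(struct stats_fs_source *source);
    /* Release whatever create_snapshot() returned. */
    void (*release_snapshot)(struct stats_fs_source *source, void *snapshot);
};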

Paolo



Re: [PATCH v3] selftests: add headers_install to lib.mk

2018-05-14 Thread Paolo Bonzini
On 14/05/2018 13:58, Anders Roxell wrote:
> If the kernel headers aren't installed we can't build all the tests.
> Add a new make target rule 'khdr' in the file lib.mk to generate the
> kernel headers and that gets include for every test-dir Makefile that
> includes lib.mk If the testdir in turn have its own sub-dirs the
> top_srcdir needs to be set to the linux-rootdir to be able to generate
> the kernel headers.
> 
> Signed-off-by: Anders Roxell 
> Reviewed-by: Fathi Boudra 

Ack for KVM parts.

Paolo


Re: [virtio-dev] Re: [RFC] vhost: introduce mdev based hardware vhost backend

2018-04-10 Thread Paolo Bonzini
On 10/04/2018 06:57, Tiwei Bie wrote:
>> So you just move the abstraction layer from qemu to kernel, and you still
>> need different drivers in kernel for different device interfaces of
>> accelerators. This looks even more complex than leaving it in qemu. As you
>> said, another idea is to implement userspace vhost backend for accelerators
>> which seems easier and could co-work with other parts of qemu without
>> inventing new type of messages.
> 
> I'm not quite sure. Do you think it's acceptable to
> add various vendor specific hardware drivers in QEMU?

I think so.  We have vendor-specific quirks, and at some point there was
an idea of using quirks to implement (vendor-specific) live migration
support for assigned devices.

Paolo


Re: [virtio-dev] Re: [RFC PATCH 1/3] qemu: virtio-bypass should explicitly bind to a passthrough device

2018-04-05 Thread Paolo Bonzini
On 04/04/2018 10:02, Siwei Liu wrote:
>> pci_bus_num is almost always a bug if not done within
>> a context of a PCI host, bridge, etc.
>>
>> In particular this will not DTRT if run before guest assigns bus
>> numbers.
>>
> I was seeking means to reserve a specific pci bus slot from drivers,
> and update the driver when guest assigns the bus number but it seems
> there's no low-hanging fruits. Because of that reason the bus_num is
> only obtained until it's really needed (during get_config) and I
> assume at that point the pci bus assignment is already done. I know
> the current one is not perfect, but we need that information (PCI
> bus:slot.func number) to name the guest device correctly.

Can you use the -device "id", and look it up as

devices = container_get(qdev_get_machine(), "/peripheral");
return object_resolve_path_component(devices, id);

?
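
Wrapped up, the lookup might look like this; a sketch only, where the
PCIDevice cast and the error handling are assumptions added here:

/* Inside QEMU; assumes "qemu/osdep.h", "qom/object.h" and
 * "hw/pci/pci.h" are included.  Resolves a -device id through the
 * machine's /peripheral container. */
static PCIDevice *find_pci_device_by_id(const char *id)
{
    Object *devices = container_get(qdev_get_machine(), "/peripheral");
    Object *obj = object_resolve_path_component(devices, id);

    return obj ? PCI_DEVICE(obj) : NULL;
}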

Thanks,

Paolo


Re: [PULL] virtio, vhost: new device, fixes, speedups

2016-12-16 Thread Paolo Bonzini


On 16/12/2016 03:20, Linus Torvalds wrote:
> On Thu, Dec 15, 2016 at 3:05 PM, Michael S. Tsirkin  wrote:
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
> 
> Pulled, but I wonder...
> 
>>  Documentation/translations/zh_CN/sparse.txt|   7 +-
>>  arch/arm/plat-samsung/include/plat/gpio-cfg.h  |   2 +-
>>  drivers/crypto/virtio/virtio_crypto_common.h   | 128 +
> [...]
> 
> what are you generating these diffstats with? Because they are pretty bogus..
> 
> The end result is correct:
> 
>>  86 files changed, 2106 insertions(+), 280 deletions(-)
> 
> but the file order in the diffstat is completely random, which makes
> it very hard to compare with what I get. It also makes it hard to see
> what you changed, because it's not alphabetical like it should be
> (strictly speaking the git pathname ordering isn't really
> alphabetical, since the '/' sorts as the NUL character, but close
> enough).
> 
> I can't see the logic to the re-ordering of the lines, so I'm
> intrigued how you even generated it.

Looks like a diff.orderFile that places .h and .txt first, then .c, then
Makefile.  I've seen others propose it.

Paolo


Re: [virtio-dev] [PATCH] virtio_net: don't require ANY_LAYOUT with VERSION_1

2015-07-15 Thread Paolo Bonzini


On 15/07/2015 14:26, Michael S. Tsirkin wrote:
> ANY_LAYOUT is a compatibility feature. It's implied
> for VERSION_1 devices, and non-transitional devices
> might not offer it. Change code to behave accordingly.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/net/virtio_net.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 63c7810..7fbca37 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1828,7 +1828,8 @@ static int virtnet_probe(struct virtio_device *vdev)
>   else
>   vi->hdr_len = sizeof(struct virtio_net_hdr);
>  
> - if (virtio_has_feature(vdev, VIRTIO_F_ANY_LAYOUT))
> + if (virtio_has_feature(vdev, VIRTIO_F_ANY_LAYOUT) ||
> + virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
>   vi->any_header_sg = true;
>  
>   if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
> 

Reviewed-by: Paolo Bonzini 


Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Paolo Bonzini


On 09/07/2015 11:48, Laurent Vivier wrote:
> 
> 
> On 09/07/2015 09:49, Thomas Huth wrote:
>> The option for supporting cross-endianness legacy guests in
>> the vhost and tun code should only be available on systems
>> that support cross-endian guests.
> 
> I'm sure I misunderstand something, but what happens if we use QEMU with
> TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64
> little endian host ?

TCG does not yet support irqfd/ioeventfd, so it cannot be used with vhost.

Paolo

> Do you forbid the use of vhost in this case ?
> 
>> Signed-off-by: Thomas Huth 
>> ---
>>  arch/arm/kvm/Kconfig | 1 +
>>  arch/arm64/kvm/Kconfig   | 1 +
>>  arch/powerpc/kvm/Kconfig | 1 +
>>  drivers/net/Kconfig  | 1 +
>>  drivers/vhost/Kconfig| 1 +
>>  virt/kvm/Kconfig | 3 +++
>>  6 files changed, 8 insertions(+)
>>
>> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
>> index bfb915d..9d8f363 100644
>> --- a/arch/arm/kvm/Kconfig
>> +++ b/arch/arm/kvm/Kconfig
>> @@ -31,6 +31,7 @@ config KVM
>>  select KVM_VFIO
>>  select HAVE_KVM_EVENTFD
>>  select HAVE_KVM_IRQFD
>> +select KVM_CROSS_ENDIAN_GUESTS
>>  depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
>>  ---help---
>>Support hosting virtualized guest machines.
>> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
>> index bfffe8f..9af39fe 100644
>> --- a/arch/arm64/kvm/Kconfig
>> +++ b/arch/arm64/kvm/Kconfig
>> @@ -31,6 +31,7 @@ config KVM
>>  select KVM_VFIO
>>  select HAVE_KVM_EVENTFD
>>  select HAVE_KVM_IRQFD
>> +select KVM_CROSS_ENDIAN_GUESTS
>>  ---help---
>>Support hosting virtualized guest machines.
>>  
>> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
>> index 3caec2c..e028710 100644
>> --- a/arch/powerpc/kvm/Kconfig
>> +++ b/arch/powerpc/kvm/Kconfig
>> @@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV
>>  select KVM_BOOK3S_HV_POSSIBLE
>>  select MMU_NOTIFIER
>>  select CMA
>> +select KVM_CROSS_ENDIAN_GUESTS
>>  ---help---
>>Support running unmodified book3s_64 guest kernels in
>>virtual machines on POWER7 and PPC970 processors that have
>> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
>> index c18f9e6..0c4ce47 100644
>> --- a/drivers/net/Kconfig
>> +++ b/drivers/net/Kconfig
>> @@ -261,6 +261,7 @@ config TUN
>>  config TUN_VNET_CROSS_LE
>>  bool "Support for cross-endian vnet headers on little-endian kernels"
>>  default n
>> +depends on KVM_CROSS_ENDIAN_GUESTS
>>  ---help---
>>This option allows TUN/TAP and MACVTAP device drivers in a
>>little-endian kernel to parse vnet headers that come from a
>> diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
>> index 533eaf0..4d8ae6b 100644
>> --- a/drivers/vhost/Kconfig
>> +++ b/drivers/vhost/Kconfig
>> @@ -35,6 +35,7 @@ config VHOST
>>  
>>  config VHOST_CROSS_ENDIAN_LEGACY
>>  bool "Cross-endian support for vhost"
>> +depends on KVM_CROSS_ENDIAN_GUESTS
>>  default n
>>  ---help---
>>This option allows vhost to support guests with a different byte
>> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
>> index e2c876d..cc7b28a 100644
>> --- a/virt/kvm/Kconfig
>> +++ b/virt/kvm/Kconfig
>> @@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT
>>  config KVM_COMPAT
>> def_bool y
>> depends on COMPAT && !S390
>> +
>> +config KVM_CROSS_ENDIAN_GUESTS
>> +   bool
>>