On Sun, Jun 30, 2024 at 04:54:18PM GMT, Dorjoy Chowdhury wrote:
Hey Stefano,
Apart from my questions in my previous email, I have some others as well.

If the vhost-device-vsock modification to forward packets to
VMADDR_CID_LOCAL is implemented, does the VMADDR_FLAG_TO_HOST need to
be set by any application in the guest? I understand that the flag is
set automatically in the listen path by the driver (ref:
https://patchwork.ozlabs.org/project/netdev/patch/20201204170235.84387-4-andra...@amazon.com/#2594117
), but from the comments in the referenced patch, I am guessing that
guest applications that "connect" (as opposed to listen) need to set
the flag themselves in application code? So does VMADDR_FLAG_TO_HOST
need to be set by the guest applications that "connect", or should it
work without it? I am asking because the nitro-enclave VM has an
"init" that connects to CID 3 on boot to send a "hello" (letting the
parent VM know that it booted) and then expects a "hello" reply, but
that init does not seem to set the flag:
https://github.com/aws/aws-nitro-enclaves-sdk-bootstrap/blob/main/init/init.c#L356C1-L361C7
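
For reference, this is roughly what setting the flag on a connecting
socket would look like; a minimal sketch in C, assuming a kernel and
headers that expose svm_flags and VMADDR_FLAG_TO_HOST, with the CID and
port only as placeholders:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <linux/vm_sockets.h>

    int main(void)
    {
        int fd = socket(AF_VSOCK, SOCK_STREAM, 0);
        if (fd < 0) {
            perror("socket");
            return 1;
        }

        struct sockaddr_vm addr;
        memset(&addr, 0, sizeof(addr));
        addr.svm_family = AF_VSOCK;
        addr.svm_cid = 3;                      /* placeholder CID */
        addr.svm_port = 9000;                  /* placeholder port */
        addr.svm_flags = VMADDR_FLAG_TO_HOST;  /* force routing towards the host */

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect");
            close(fd);
            return 1;
        }

        /* ... send()/recv() as usual ... */
        close(fd);
        return 0;
    }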

Looking at the af_vsock.c code, it looks like if we don't have any H2G
transport (e.g. vhost-vsock) loaded in the VM (it is loaded for nested
VMs, so I guess for a nitro-enclave VM this should not be the case),
the packets are forwarded to the host in any case.

See https://elixir.bootlin.com/linux/latest/source/net/vmw_vsock/af_vsock.c#L469
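
For context, the transport selection there is roughly the following
(paraphrased from vsock_assign_transport(), not verbatim):

    /* simplified: picking the transport for the remote CID */
    if (vsock_use_local_transport(remote_cid))
            new_transport = transport_local;    /* vsock_loopback */
    else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g ||
             (remote_flags & VMADDR_FLAG_TO_HOST))
            new_transport = transport_g2h;      /* towards the host */
    else
            new_transport = transport_h2g;      /* towards a nested guest */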

I was following
https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock#sibling-vm-communication
to test if sibling communication works, and it seems I didn't need to
modify socat to set VMADDR_FLAG_TO_HOST. I am wondering why it works
without any modification. Here is what I do:

shell1: ./vhost-device-vsock \
    --vm guest-cid=3,uds-path=/tmp/vm3.vsock,socket=/tmp/vhost3.socket \
    --vm guest-cid=4,uds-path=/tmp/vm4.vsock,socket=/tmp/vhost4.socket

shell2: ./qemu-system-x86_64 -machine q35,memory-backend=mem0 -enable-kvm -m 8G \
    -nic user,model=virtio \
    -drive file=/home/dorjoy/Forks/test_vm/fedora2.qcow2,media=disk,if=virtio \
    --display sdl -object memory-backend-memfd,id=mem0,size=8G \
    -chardev socket,id=char0,reconnect=0,path=/tmp/vhost3.socket \
    -device vhost-user-vsock-pci,chardev=char0
   inside this guest I run: socat - VSOCK-LISTEN:9000

shell3: ./qemu-system-x86_64 -machine q35,memory-backend=mem0 -enable-kvm -m 8G \
    -nic user,model=virtio \
    -drive file=/home/dorjoy/Forks/test_vm/fedora40.qcow2,media=disk,if=virtio \
    --display sdl -object memory-backend-memfd,id=mem0,size=8G \
    -chardev socket,id=char0,reconnect=0,path=/tmp/vhost4.socket \
    -device vhost-user-vsock-pci,chardev=char0
   inside this guest I run: socat - VSOCK-CONNECT:3:9000

Then when I type something in the socat terminal of one VM and hit
'enter', it shows up in the socat terminal of the other VM. From the
vhost-device-vsock documentation, I thought I would need to patch socat
to set VMADDR_FLAG_TO_HOST, but I did not do anything with socat: I
simply did "sudo dnf install socat" in both VMs. I also looked into the
socat source code and didn't see any reference to VMADDR_FLAG_TO_HOST.
I am running Fedora 40 in both VMs. Do you know why it works without
the flag?

Yep, so the driver will forward them to the host if the H2G transport
is not loaded, like in your case. So if you set VMADDR_FLAG_TO_HOST you
are sure the packet is always forwarded to the host; if you don't set
it, it is forwarded only if you don't have a nested VM using
vhost-vsock. In that case we can't differentiate between communication
with a nested guest and communication with a sibling guest, which is
why we added the flag.

If the host uses vhost-vsock, those packets are discarded, but
vhost-device-vsock handles them.

Hope this clarifies.

Stefano


On Wed, Jun 26, 2024 at 11:43 PM Dorjoy Chowdhury
<dorjoychy...@gmail.com> wrote:

Hey Stefano,
Thanks a lot for all the details. I will look into them and reach out
if I need further input. Thanks! I have tried to summarize my
understanding below. Let me know if that sounds correct.

On Wed, Jun 26, 2024 at 2:37 PM Stefano Garzarella <sgarz...@redhat.com> wrote:
>
> Hi Dorjoy,
>
> On Tue, Jun 25, 2024 at 11:44:30PM GMT, Dorjoy Chowdhury wrote:
> >Hey Stefano,
>
> [...]
>
> >> >
> >> >So the immediate plan would be to:
> >> >
> >> >  1) Build a new vhost-vsock-forward object model that connects to
> >> >vhost as CID 3 and then forwards every packet from CID 1 to the
> >> >Enclave-CID and every packet that arrives on to CID 3 to CID 2.
> >>
> >> This though requires writing completely from scratch the virtio-vsock
> >> emulation in QEMU. If you have time that would be great, otherwise if
> >> you want to do a PoC, my advice is to start with vhost-user-vsock which
> >> is already there.
> >>
> >
> >Can you give me some more details about how I can implement the
> >daemon?
>
> We already have a daemon written in Rust, so I don't recommend you
> rewrite one from scratch, just start with that. You can find the daemon
> and instructions on how to use it with QEMU here [1].
>
> >I would appreciate some pointers to code too.
>
> I sent the pointer to it in my first reply [2].
>
> >
> >Right now, the "nitro-enclave" machine type (wip) in QEMU
> >automatically spawns a VHOST_VSOCK device with the CID equal to the
> >"guest-cid" machine option. I think this is equivalent to using the
> >"-device vhost-vsock-device,guest-cid=N" option explicitly. Does that
> >need any change? I guess instead of "vhost-vsock-device", the
> >vhost-vsock device needs to be equivalent to "-device
> >vhost-user-vsock-device,guest-cid=N"?
>
> Nope, the vhost-user-vsock device requires just a `chardev` option.
> The chardev points to the Unix socket used by QEMU to talk with the
> daemon. The daemon has a parameter to set the CID. See [1] for the
> examples.
>
> >
> >The applications inside the nitro-enclave VM will still connect and
> >talk to CID 3. So on the daemon side, do we need to spawn a device
> >that has CID 3 and then forward everything this device receives to CID
> >1 (VMADDR_CID_LOCAL) same port and everything it receives from CID 1
> >to the "guest-cid"?
>
> Yep, I think this is right.
> Note: to use VMADDR_CID_LOCAL, the host needs to load `vsock_loopback`
> kernel module.
>
> Before modifying the code, if you want to do some testing, perhaps you
> can use socat (which supports both UNIX-* and VSOCK-*). The daemon for
> now exposes two unix sockets, one is used to communicate with QEMU via
> the vhost-user protocol, and the other is to be used by the application
> to communicate with vsock sockets in the guest using the hybrid protocol
> defined by firecracker. So you could initiate a socat between the latter
> and VMADDR_CID_LOCAL, the only problem I see is that you have to send
> the first string provided by the hybrid protocol (CONNECT 1234), but for
> a PoC it should be ok.
>
> I just tried the following and it works without touching any code:
>
> shell1$ ./target/debug/vhost-device-vsock \
>      --vm guest-cid=3,socket=/tmp/vhost3.socket,uds-path=/tmp/vm3.vsock
>
> shell2$ sudo modprobe vsock_loopback
> shell2$ socat VSOCK-LISTEN:1234 UNIX-CONNECT:/tmp/vm3.vsock
>
> shell3$ qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
>      -drive file=fedora40.qcow2,format=qcow2,if=virtio\
>      -chardev socket,id=char0,path=/tmp/vhost3.socket \
>      -device vhost-user-vsock-pci,chardev=char0 \
>      -object memory-backend-memfd,id=mem,size=512M \
>      -nographic
>
>      guest$ nc --vsock -l 1234
>
> shell4$ nc --vsock 1 1234
> CONNECT 1234
>
>      Note: the `CONNECT 1234` is required by the hybrid vsock protocol
>      defined by firecracker, so if we extend the vhost-device-vsock
>      daemon to forward packets to VMADDR_CID_LOCAL, that would not be
>      needed (including running socat).
>

Understood. Just trying to think out loud what the final UX will be
from the user perspective to successfully run a nitro VM before I try
to modify vhost-device-vsock to support forwarding to
VMADDR_CID_LOCAL.
I guess because the "vhost-user-vsock" device needs to be spawned
implicitly (without any explicit option) inside nitro-enclave in QEMU,
we now need to provide the "chardev" as a machine option, so the
nitro-enclave command would look something like below:
"./qemu-system-x86_64 -M nitro-enclave,chardev=char0 -kernel
/path/to/eif -chardev socket,id=char0,path=/tmp/vhost5.socket -m 4G
--enable-kvm -cpu host"
and then set the chardev id to the vhost-user-vsock device in the code
from the machine option.

The modified "vhost-device-vsock" would need to be run with the new
option that forwards everything to VMADDR_CID_LOCAL (below, by "-z" I
mean the new option):
"./target/debug/vhost-device-vsock -z \
    --vm guest-cid=5,socket=/tmp/vhost5.socket,uds-path=/tmp/vm5.vsock"
This means the guest-cid of the nitro VM is CID 5, right?

And the applications on the host would need to use VMADDR_CID_LOCAL
for communication instead of the "guest-cid" (5) (assuming
vsock_loopback is modprobed). Let's say there are 2 applications inside
the nitro VM that connect to CID 3 on ports 9000 and 9001, and the
applications on the host listen on ports 9000 and 9001 using
VMADDR_CID_LOCAL. So, after the commands above (the QEMU VM and
vhost-device-vsock) are run, the communication between the applications
on the host and the applications in the nitro VM on ports 9000 and 9001
should just work, right, without needing to run any extra socat
commands or such? Or will the user still need to run some socat
commands for all the relevant ports (e.g., 9000 and 9001)?
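
(Just to make the expected flow concrete: with the modified daemon I
would expect something like the following to be enough on the host,
with 9000 only as a placeholder port and no socat in between:

    host$ sudo modprobe vsock_loopback
    host$ nc --vsock -l 9000

and the enclave application's connect to CID 3 port 9000 would then be
forwarded to that listener.)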

I am just wondering what kind of changes are needed in
vhost-device-vsock to forward packets to VMADDR_CID_LOCAL. Would it be
something like this: in the codepath that handles "/tmp/vm5.vsock",
upon receiving a "connect" (from inside the nitro VM) for any port,
vhost-device-vsock just connects to the same port over AF_VSOCK using
the socket system calls, and messages received on that port on
"/tmp/vm5.vsock" are then sent to that AF_VSOCK socket (see the sketch
below)? Or am I not thinking about this right and the implementation
would be something different entirely (e.g. change the CID from 3 to 2
(or 1?) on the packets before they are handled, in which case socat
would probably still be needed)? Will this also work if applications on
the host want to connect to applications inside the nitro VM (as
opposed to applications inside the nitro VM connecting to CID 3)?
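
Something like this is what I have in mind for the host-side connect
step; just a conceptual sketch in C of the idea (the actual daemon is
in Rust, and the function name and the relaying part are made up):

    #include <unistd.h>
    #include <sys/socket.h>
    #include <linux/vm_sockets.h>

    /* Hypothetical: called when the guest asks to connect to CID 3 on `port`.
     * Open a host AF_VSOCK connection to the same port on VMADDR_CID_LOCAL
     * (requires vsock_loopback), then relay bytes both ways. */
    static int forward_to_local(unsigned int port)
    {
        struct sockaddr_vm addr = {
            .svm_family = AF_VSOCK,
            .svm_cid = VMADDR_CID_LOCAL,   /* CID 1 */
            .svm_port = port,              /* same port the guest requested */
        };
        int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

        if (fd < 0)
            return -1;
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            close(fd);
            return -1;
        }

        /* ... copy data between the guest connection and fd ... */
        return fd;
    }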

Thanks and Regards,
Dorjoy


