On 7/7/23 01:38, Pekka Paalanen wrote:

That statement was based on the assumption that existing hypervisors
and VM viewer applications are not prepared to deal with hotspots
outside of cursor image. Therefore, if a guest is upgraded to a version
that uses hotspots outside of cursor images, and the VM stack is not
updated, it will malfunction.

Therefore it is best to model the new UAPI in a way that is compatible
with existing VM stack, especially since allowing this new feature
(hotspots outside of cursor image) has no known benefits.

Below I see my assumption was incorrect, but it still causes you to
fall back to something less optimal.

Okay, right.  That's why I was saying that it's not a big deal either way to VMware, but I wanted to at least make the case for allowing somewhat arbitrary hotspots just because it is semantically meaningful, and I don't know if any other hypervisors care about allowing it more than we do.


Essentially setting the hotspot properties means that the hypervisor
console can choose to either draw the cursor where the plane is located,
or use the cursor-plane + hotspot information to draw the cursor where
the user's mouse is on the client.

That works the same whether the hotspot is clamped or not.  We mostly
use clamping to avoid pathological cases (like a hotspot of MAX_UINT32),
and get away with it because real Guest applications that do this are
very rare.
My point here is that you can design the new Linux UAPI to help you:
you can rule out cases that would lead to non-optimal behaviour, like
falling back to drawing the cursor plane as you mention, when it would
be better to commandeer the cursor plane with the hotspot information.

We can't though, because we can't trust the guest kernel any more than the kernel can trust userspace.

So we need to handle these cases one way or another, both for older guests, and to ensure we don't have some security issue from a malicious guest kernel.
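
(For concreteness, the host-side sanitization I have in mind is roughly the sketch below; the names are made up and this isn't our actual code, just an illustration of the clamping.)

/* Hypothetical host-side sanitization of a guest-supplied hotspot.
 * Clamp the hotspot into the cursor image so that a pathological value
 * (e.g. UINT32_MAX) can't push the drawn cursor arbitrarily far from
 * where the client's pointer actually is. */
#include <stdint.h>

static void clamp_hotspot(uint32_t *hot_x, uint32_t *hot_y,
                          uint32_t cursor_w, uint32_t cursor_h)
{
    if (cursor_w && *hot_x >= cursor_w)
        *hot_x = cursor_w - 1;
    if (cursor_h && *hot_y >= cursor_h)
        *hot_y = cursor_h - 1;
}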


The question of which input device corresponds to which cursor plane
might be good to answer too. I presume the VM runner is configured to
expose exactly one of each, so there can be only one association?
As far as I know, all of the VM consoles are written as though they are
taking the place of what would be the physical monitors and input
devices on a native machine.  So they assume that there is one user,
sitting in front of one console, and all monitors/input devices are
being used by that user.
Ok, but having a single user does not mean that there cannot be
multiple independent pointers, especially on Wayland. The same with
keyboards.
True, and if the userspace is doing anything complicated here, the
hypervisor has to be responsible for ensuring that whatever it's doing
works with that, or else this system won't work.  I don't know that the
kernel is in a good position to police that.
What do you mean by policing here?

Isn't it the hypervisor that determines what virtual input devices will
be available to the guest OS? Therefore, the hypervisor is in a
position to expose exactly one pointer device and exactly one
cursor plane to the guest OS, which means the guest OS cannot get the
association wrong. If that's the general and expected hypervisor
policy, then there is no need to design explicit device association in
the guest kernel UAPI. If so, I'd like it to be mentioned in the kernel
docs, too.
I'm not entirely sure how to fit what you're calling a "pointer" into my
mental model of what the hypervisor is doing...
My definition: A pointer is a pointing input device that requires a
cursor image to be drawn at the right location for it to be usable.
Right, but normal desktops (and our consoles) expect multiple input devices to feed into a single cursor.  So the connection between the on-screen cursor and the corresponding input devices is not clear to me when you start talking about multiple pointers, even without any hypervisors in the picture.


For a typical Linux Guest, we currently expose 3+ virtual mouse devices,
and choose which one to send input to based on what their guest drivers
are doing, and what kind of input we got from the client.  We expect the
input from all of those to end up in the same user desktop session,
which we expect to own all the virtual screens, and that the user
only gets one cursor image from that.
Why do you need to expose so many pointer devices? Just curious.
For one, we don't know what drivers are actually going to be running in the Guest.  If someone configured the Guest to not support the PS/2 mouse, or didn't have USB support compiled in, we still need to be able to send mouse input.  (We actually ran into this for years with some Linux distro installers, where they had frozen their installer with some weird/older kernel configs and just didn't support our preferred vmmouse device.)  So we plug them all in at boot, and then try to pick the one that looks the most active.

But we also need to be able to send both absolute/relative events, and we had trouble getting Guests to support both of those coming from the same virtual mouse device, so if the client changes mouse modes we would route those to the appropriate device dynamically.

There are some other quirky situations, like some absolute virtual mice not supporting the entire multimon topology correctly or mouse buttons not applying properly when things get split across mouse devices, but those are less common.


If you do expose multiple mouse (pointer) devices, then the guest OS can
choose to use each of them as a different independent cursor on the
same desktop. The only thing stopping that is that it's not usually
useful, so it's not done.

Therefore, what you need to document in the Linux UAPI instead is that
*all* pointer devices are associated with the *same* cursor plane. That
forbids the multi-pointer scenario the VM stack cannot handle.
At least all mouse input devices that the hypervisor console is going to use to send input to that desktop, yeah.  (You could still have non-hypervisor input sources that don't enter the picture, like some kind of passthrough/remote device or something.)

So I guess I'm not clear on what kind of usermode<=>kernel contract you
want here if the kernel isn't what's owning the translation between the
mouse input and the cursor position.  The hypervisor awkwardly has to
straddle both the input/graphics domain, and we do so by making
assumptions about how the user desktop is going to behave.
Correct. In order to reduce that awkwardness, I encourage you to write
down the expectations and requirements in this new Linux UAPI (the KMS
cursor plane hotspot property). Then you can be much more confident
about how a random Linux desktop will behave.

It will also help the reviewers here to understand what the new UAPI is
and how it is supposed to work.

The cursor hotspot is, I think, fairly straightforward as far as what it means for how hypervisor clients are expected to draw the mouse, and Zack's working on that part.
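
(To spell out what I mean by "draw the mouse": the console places the guest's cursor image so that the hotspot pixel lands on the client's local pointer position. A minimal sketch, with made-up names:)

/* Hypothetical console-side drawing: position the guest's cursor image
 * so that its hotspot pixel sits exactly on the client's local pointer. */
#include <stdint.h>

struct guest_cursor {
    int32_t hot_x, hot_y;   /* hotspot taken from the guest's cursor plane */
    /* ... cursor image, width, height, ... */
};

/* Provided elsewhere by the console; stand-in for the real blit path. */
static void blit_cursor_image(const struct guest_cursor *cur,
                              int32_t x, int32_t y);

static void draw_guest_cursor(const struct guest_cursor *cur,
                              int32_t client_ptr_x, int32_t client_ptr_y)
{
    int32_t draw_x = client_ptr_x - cur->hot_x;
    int32_t draw_y = client_ptr_y - cur->hot_y;

    blit_cursor_image(cur, draw_x, draw_y);
}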

My point was that how the hypervisor then sends input is sort of outside the scope of the graphics portion here, and I think probably outside the current Linux UAPI entirely (unless there's some other input/topology system somewhere else I'm not familiar with).

However, in your case, the userspace (Wayland or X11 display server) is
not free to choose any arbitrary input-cursor association it might
want. You have a specific expectation that all pointing devices control
the same pointer. Since the hotspot property is exclusive to your use
case, I think it makes sense to write down the expectations with the
hotspot property. Guest userspace has to explicitly program for the
hotspot property anyway, so it can also take care of your requirements.
I see your point, and I can see the value in documenting something to that effect, if only because it's /useful/ for userspaces trying to work with this.  (And the only way anyone is using this today.)

But I'm still a little fuzzy on what portion of that is within the Linux UAPI scope for the hotspot...

It seems like it might be more useful to restrict the scope of the Linux UAPI contract for how the hotspot property works to just how userspace can expect the hypervisors to display it on screen, and not try to tie in any expectations for how mouse input is going to work.
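
(From the guest userspace side, I'd expect that display contract to boil down to something like the sketch below: attach the hotspot to the cursor plane in the same atomic commit that updates the cursor, and trust the hypervisor to draw it. The HOTSPOT_X/HOTSPOT_Y naming here assumes what's in Zack's series; the helper and the other names are made up, and the property IDs would of course have to be looked up from the plane beforehand.)

/* Hypothetical guest-compositor sketch: set the hotspot alongside the
 * cursor plane position in one atomic commit. */
#include <errno.h>
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

static int set_cursor_with_hotspot(int drm_fd, uint32_t cursor_plane_id,
                                   uint32_t prop_crtc_x, uint32_t prop_crtc_y,
                                   uint32_t prop_hotspot_x, uint32_t prop_hotspot_y,
                                   int32_t crtc_x, int32_t crtc_y,
                                   uint32_t hot_x, uint32_t hot_y)
{
    drmModeAtomicReq *req = drmModeAtomicAlloc();
    int ret;

    if (!req)
        return -ENOMEM;

    /* Where the cursor plane is placed on the CRTC... */
    drmModeAtomicAddProperty(req, cursor_plane_id, prop_crtc_x, crtc_x);
    drmModeAtomicAddProperty(req, cursor_plane_id, prop_crtc_y, crtc_y);

    /* ...and which pixel inside the cursor image is the "tip". */
    drmModeAtomicAddProperty(req, cursor_plane_id, prop_hotspot_x, hot_x);
    drmModeAtomicAddProperty(req, cursor_plane_id, prop_hotspot_y, hot_y);

    ret = drmModeAtomicCommit(drm_fd, req, 0, NULL);
    drmModeAtomicFree(req);
    return ret;
}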

Like, VMware is using virtual mouse devices here, but another hypervisor might have no kernel mouse device and instead inject input entirely through a userspace daemon?  So I'm not sure it even makes sense to express the input part of the contract in terms of kernel input devices.

I guess my fear is that I don't want to lock this down in a way that excludes some well-meaning hypervisor that implements the graphics part correctly but does something just weird enough on the input side to not be considered compliant.

So I guess I would vote for trying to include something to that effect as context or explanation, but not try to strictly define how that works?

--Michael Banack
