Re: [PATCH 0/2] drm/virtio: introduce the HOST_PAGE_SIZE feature

2024-08-22 Thread Gurchetan Singh
On Thu, Aug 22, 2024 at 8:29 AM Sergio Lopez Pascual  wrote:

> Gurchetan Singh  writes:
>
> > On Thu, Aug 8, 2024 at 3:38 AM Sergio Lopez Pascual 
> wrote:
> >
> >> Gurchetan Singh  writes:
> >>
> >> > On Tue, Aug 6, 2024 at 1:15 PM Rob Clark  wrote:
> >> >
> >> >> On Tue, Aug 6, 2024 at 9:15 AM Gurchetan Singh
> >> >>  wrote:
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Mon, Aug 5, 2024 at 2:14 AM Sergio Lopez Pascual <
> s...@redhat.com>
> >> >> wrote:
> >> >> >>
> >> >> >> Dmitry Osipenko  writes:
> >> >> >>
> >> >> >> > On 7/23/24 14:49, Sergio Lopez wrote:
> >> >> >> >> There's an increasing number of machines supporting multiple
> >> >> >> >> page sizes, and on these machines the host and a guest can each
> >> >> >> >> be running with a different page size.
> >> >> >> >>
> >> >> >> >> For what pertains to virtio-gpu, this is not a problem if the
> >> >> >> >> page size of the guest happens to be bigger than or equal to the
> >> >> >> >> host's, but it will potentially lead to failures in memory
> >> >> >> >> allocations and/or mappings otherwise.
> >> >> >> >
> >> >> >> > Please describe the concrete problem you're trying to solve.
> >> >> >> > Guest memory allocation consists of guest pages; I don't see how
> >> >> >> > knowledge of the host page size helps anything in userspace.
> >> >> >> >
> >> >> >> > I suspect you want this for host blobs, but then it should be
> >> >> >> > virtio_gpu_vram_create() that uses max(host_page_sz,
> >> >> >> > guest_page_size), AFAICT. It's the kernel that is responsible for
> >> >> >> > memory management; userspace can't be trusted to do that.
> >> >> >>
> >> >> >> Mesa's Vulkan/Venus uses CREATE_BLOB to ask the host to create, and
> >> >> >> map into the guest, device-backed memory and shmem regions.
> >> >> >> The CREATE_BLOB ioctl doesn't update
> >> >> >> drm_virtgpu_resource_create->size, so the guest kernel (and, as a
> >> >> >> consequence, the host kernel) can't override the user's request.
> >> >> >>
> >> >> >> I'd like Mesa's Vulkan/Venus in the guest to be able to obtain the
> >> >> >> host page size to align the size of the CREATE_BLOB requests as
> >> >> >> required.
> >> >> >
> >> >> >
> >> >> > gfxstream solves this problem by putting the relevant information in
> >> >> > the capabilities obtained from the host:
> >> >> >
> >> >> > https://android.googlesource.com/platform/hardware/google/gfxstream/+/refs/heads/main/host/virtio-gpu-gfxstream-renderer.cpp#1691
> >> >> >
> >> >> > If you want to be paranoid, you can also validate that
> >> >> > ResourceCreateBlob::size is properly host-page aligned when that
> >> >> > request reaches the host.
> >> >> >
> >> >> > So you can probably solve this problem using current interfaces.
> >> >> > Whether it's cleaner for all context types to use the capabilities,
> >> >> > or to have all VMMs expose VIRTIO_GPU_F_HOST_PAGE_SIZE, is the
> >> >> > cost/benefit tradeoff.
> >> >> >
> >> >>
> >> >> I guess solving it in a context-type specific way is possible.  But I
> >> >> think it is a relatively universal constraint.  And maybe it makes
> >> >> sense for virtgpu guest kernel to enforce alignment (at least it can
> >> >> return an error synchronously) in addition to the host.
> >> >>
> >> >
> >> > virtio-media may have support for VIRTIO_MEDIA_CMD_MMAP too, so it
> >> > could run into this issue.
> >> >
> >> >
> >&

Re: [PATCH 0/2] drm/virtio: introduce the HOST_PAGE_SIZE feature

2024-08-09 Thread Gurchetan Singh
On Thu, Aug 8, 2024 at 3:38 AM Sergio Lopez Pascual  wrote:

> Gurchetan Singh  writes:
>
> > On Tue, Aug 6, 2024 at 1:15 PM Rob Clark  wrote:
> >
> >> On Tue, Aug 6, 2024 at 9:15 AM Gurchetan Singh
> >>  wrote:
> >> >
> >> >
> >> >
> >> > On Mon, Aug 5, 2024 at 2:14 AM Sergio Lopez Pascual 
> >> wrote:
> >> >>
> >> >> Dmitry Osipenko  writes:
> >> >>
> >> >> > On 7/23/24 14:49, Sergio Lopez wrote:
> >> >> >> There's an increasing number of machines supporting multiple page
> >> >> >> sizes, and on these machines the host and a guest can each be
> >> >> >> running with a different page size.
> >> >> >>
> >> >> >> For what pertains to virtio-gpu, this is not a problem if the page
> >> >> >> size of the guest happens to be bigger than or equal to the host's,
> >> >> >> but it will potentially lead to failures in memory allocations
> >> >> >> and/or mappings otherwise.
> >> >> >
> >> >> > Please describe the concrete problem you're trying to solve. Guest
> >> >> > memory allocation consists of guest pages; I don't see how knowledge
> >> >> > of the host page size helps anything in userspace.
> >> >> >
> >> >> > I suspect you want this for host blobs, but then it should be
> >> >> > virtio_gpu_vram_create() that uses max(host_page_sz,
> >> >> > guest_page_size), AFAICT. It's the kernel that is responsible for
> >> >> > memory management; userspace can't be trusted to do that.
> >> >>
> >> >> Mesa's Vulkan/Venus uses CREATE_BLOB to ask the host to create, and
> >> >> map into the guest, device-backed memory and shmem regions.
> >> >> The CREATE_BLOB ioctl doesn't update drm_virtgpu_resource_create->size,
> >> >> so the guest kernel (and, as a consequence, the host kernel) can't
> >> >> override the user's request.
> >> >>
> >> >> I'd like Mesa's Vulkan/Venus in the guest to be able to obtain the
> >> >> host page size to align the size of the CREATE_BLOB requests as
> >> >> required.
> >> >
> >> >
> >> > gfxstream solves this problem by putting the relevant information in
> >> > the capabilities obtained from the host:
> >> >
> >> > https://android.googlesource.com/platform/hardware/google/gfxstream/+/refs/heads/main/host/virtio-gpu-gfxstream-renderer.cpp#1691
> >> >
> >> > If you want to be paranoid, you can also validate that
> >> > ResourceCreateBlob::size is properly host-page aligned when that
> >> > request reaches the host.
> >> >
> >> > So you can probably solve this problem using current interfaces.
> >> > Whether it's cleaner for all context types to use the capabilities, or
> >> > to have all VMMs expose VIRTIO_GPU_F_HOST_PAGE_SIZE, is the
> >> > cost/benefit tradeoff.
> >> >
> >>
> >> I guess solving it in a context-type specific way is possible.  But I
> >> think it is a relatively universal constraint.  And maybe it makes
> >> sense for virtgpu guest kernel to enforce alignment (at least it can
> >> return an error synchronously) in addition to the host.
> >>
> >
> > virtio-media may have support for VIRTIO_MEDIA_CMD_MMAP too, so it could
> > run into this issue.
> >
> >
> https://github.com/chromeos/virtio-media?tab=readme-ov-file#shared-memory-regions
> >
> > virtio-fs also has the DAX window which uses the same memory mapping
> > mechanism.
> >
> > https://virtio-fs.gitlab.io/design.html
> >
> > Maybe this should not be a virtio-gpu thing, but a virtio thing?
>
> This is true, but finding a common place to put the page size is really
> hard in practice. I don't think we can borrow space in the feature bits
> for that (and that would probably be abusing its purpose quite a bit)
> and extending the transport configuration registers is quite cumbersome
> and, in general, undesirable.
>
> That leaves us with the device-specific config space, and that implies a
> device-specific feature bit as it's implemented in this series.
>
> The Shared Memory Regions section of the VIRTIO spec, while it doesn't
> talk specifically about page size, also gives us a hint about this being
> the right direction:
>

Can't we just modify the Shared Memory region PCI capability to include
page size?  We can either:

1) keep the same size struct + header (VIRTIO_PCI_CAP_SHARED_MEMORY_CFG),
and just hijack one of the padding fields. If the padding field is zero, we
can just say it's 4096.

or

2) expand the size of the struct with a new
VIRTIO_PCI_CAP_SHARED_MEMORY_CFG2.

(sketch here: crrev.com/c/5778179)

The benefit is that this would also work with virtio-fs (though I'm not sure
anyone uses the DAX window), and possibly virtio-media in the future.
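
To make option 1 concrete, here is a rough C sketch of what reusing a
padding byte could look like. This is only an illustration, not spec text:
the layout mirrors the existing virtio_pci_cap/virtio_pci_cap64 structs, and
encoding the page size as a power-of-two shift (0 meaning the current 4096
default) is an assumption of the sketch.

/* Sketch of option 1: advertise the host page size in one of the padding
 * bytes of the shared memory capability, encoded as a power-of-two shift.
 * A shift of 0 keeps today's meaning (4 KiB), so existing devices stay
 * compatible.  Illustrative only.
 */
struct virtio_pci_shm_cap {
	struct virtio_pci_cap cap;	/* cfg_type = VIRTIO_PCI_CAP_SHARED_MEMORY_CFG */
	__le32 offset_hi;		/* high 32 bits of the region offset */
	__le32 length_hi;		/* high 32 bits of the region length */
};

static inline unsigned long shm_host_page_size(const struct virtio_pci_shm_cap *shm)
{
	__u8 shift = shm->cap.padding[0];	/* hypothetical reuse of padding[0] */

	return shift ? 1UL << shift : 4096;
}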


>
> "
> 2.10 Shared Memory Regions
> (...)
> Memory consistency rules vary depending on the region and the device
> and they will be specified as required by each device."
> "
>
> Thanks,
> Sergio.
>
>


Re: [PATCH 0/2] drm/virtio: introduce the HOST_PAGE_SIZE feature

2024-08-07 Thread Gurchetan Singh
On Tue, Aug 6, 2024 at 1:15 PM Rob Clark  wrote:

> On Tue, Aug 6, 2024 at 9:15 AM Gurchetan Singh
>  wrote:
> >
> >
> >
> > On Mon, Aug 5, 2024 at 2:14 AM Sergio Lopez Pascual 
> wrote:
> >>
> >> Dmitry Osipenko  writes:
> >>
> >> > On 7/23/24 14:49, Sergio Lopez wrote:
> >> >> There's an increasing number of machines supporting multiple page
> >> >> sizes, and on these machines the host and a guest can each be running
> >> >> with a different page size.
> >> >>
> >> >> For what pertains to virtio-gpu, this is not a problem if the page
> >> >> size of the guest happens to be bigger than or equal to the host's,
> >> >> but it will potentially lead to failures in memory allocations and/or
> >> >> mappings otherwise.
> >> >
> >> > Please describe the concrete problem you're trying to solve. Guest
> >> > memory allocation consists of guest pages; I don't see how knowledge of
> >> > the host page size helps anything in userspace.
> >> >
> >> > I suspect you want this for host blobs, but then it should be
> >> > virtio_gpu_vram_create() that uses max(host_page_sz,
> >> > guest_page_size), AFAICT. It's the kernel that is responsible for
> >> > memory management; userspace can't be trusted to do that.
> >>
> >> Mesa's Vulkan/Venus uses CREATE_BLOB to ask the host to create, and map
> >> into the guest, device-backed memory and shmem regions.
> >> The CREATE_BLOB ioctl doesn't update drm_virtgpu_resource_create->size,
> >> so the guest kernel (and, as a consequence, the host kernel) can't
> >> override the user's request.
> >>
> >> I'd like Mesa's Vulkan/Venus in the guest to be able to obtain the host
> >> page size to align the size of the CREATE_BLOB requests as required.
> >
> >
> > gfxstream solves this problem by putting the relevant information in the
> > capabilities obtained from the host:
> >
> > https://android.googlesource.com/platform/hardware/google/gfxstream/+/refs/heads/main/host/virtio-gpu-gfxstream-renderer.cpp#1691
> >
> > If you want to be paranoid, you can also validate that
> > ResourceCreateBlob::size is properly host-page aligned when that request
> > reaches the host.
> >
> > So you can probably solve this problem using current interfaces.
> > Whether it's cleaner for all context types to use the capabilities, or to
> > have all VMMs expose VIRTIO_GPU_F_HOST_PAGE_SIZE, is the cost/benefit
> > tradeoff.
> >
>
> I guess solving it in a context-type specific way is possible.  But I
> think it is a relatively universal constraint.  And maybe it makes
> sense for virtgpu guest kernel to enforce alignment (at least it can
> return an error synchronously) in addition to the host.
>

virtio-media may have support for VIRTIO_MEDIA_CMD_MMAP too, so it could run
into this issue.

https://github.com/chromeos/virtio-media?tab=readme-ov-file#shared-memory-regions

virtio-fs also has the DAX window which uses the same memory mapping
mechanism.

https://virtio-fs.gitlab.io/design.html

Maybe this should not be a virtio-gpu thing, but a virtio thing?


>
> BR,
> -R
>
> >>
> >>
> >> Thanks,
> >> Sergio.
> >>
>


Re: [PATCH 0/2] drm/virtio: introduce the HOST_PAGE_SIZE feature

2024-08-06 Thread Gurchetan Singh
On Mon, Aug 5, 2024 at 2:14 AM Sergio Lopez Pascual  wrote:

> Dmitry Osipenko  writes:
>
> > On 7/23/24 14:49, Sergio Lopez wrote:
> >> There's an increasing number of machines supporting multiple page sizes,
> >> and on these machines the host and a guest can each be running with a
> >> different page size.
> >>
> >> For what pertains to virtio-gpu, this is not a problem if the page size
> >> of the guest happens to be bigger than or equal to the host's, but it
> >> will potentially lead to failures in memory allocations and/or mappings
> >> otherwise.
> >
> > Please describe the concrete problem you're trying to solve. Guest memory
> > allocation consists of guest pages; I don't see how knowledge of the host
> > page size helps anything in userspace.
> >
> > I suspect you want this for host blobs, but then it should be
> > virtio_gpu_vram_create() that uses max(host_page_sz,
> > guest_page_size), AFAICT. It's the kernel that is responsible for memory
> > management; userspace can't be trusted to do that.
>
> Mesa's Vulkan/Venus uses CREATE_BLOB to ask the host to create, and map
> into the guest, device-backed memory and shmem regions.
> The CREATE_BLOB ioctl doesn't update drm_virtgpu_resource_create->size,
> so the guest kernel (and, as a consequence, the host kernel) can't
> override the user's request.
>
> I'd like Mesa's Vulkan/Venus in the guest to be able to obtain the host
> page size to align the size of the CREATE_BLOB requests as required.
>

gfxstream solves this problem by putting the relevant information in the
capabilities obtained from the host:

https://android.googlesource.com/platform/hardware/google/gfxstream/+/refs/heads/main/host/virtio-gpu-gfxstream-renderer.cpp#1691

If you want to be paranoid, you can also validate that
ResourceCreateBlob::size is properly host-page aligned when that
request reaches the host.

So you can probably solve this problem using current interfaces.  Whether
it's cleaner for all context types to use the capabilities, or to have all
VMMs expose VIRTIO_GPU_F_HOST_PAGE_SIZE, is the cost/benefit tradeoff.
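
For illustration, a minimal sketch of the guest userspace side, assuming the
host page size has been read from a context-specific capability as gfxstream
does above. The helper name and the blob_mem/blob_flags choices are
placeholders rather than Venus code; the point is simply rounding the
requested size up to the larger of the guest and host page sizes before
issuing CREATE_BLOB.

#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include "drm-uapi/virtgpu_drm.h"	/* virtgpu uapi header; adjust the path for your tree */

static uint64_t align_up(uint64_t v, uint64_t a)
{
	return (v + a - 1) & ~(a - 1);
}

/* host_page_size would come from the context's capset/capabilities. */
static int create_mappable_blob(int drm_fd, uint64_t size, uint64_t blob_id,
				uint64_t host_page_size, uint32_t *bo_handle)
{
	uint64_t guest_page_size = (uint64_t)sysconf(_SC_PAGESIZE);
	uint64_t align = host_page_size > guest_page_size ? host_page_size
							  : guest_page_size;
	struct drm_virtgpu_resource_create_blob blob = {
		.blob_mem   = VIRTGPU_BLOB_MEM_HOST3D,
		.blob_flags = VIRTGPU_BLOB_FLAG_USE_MAPPABLE,
		.blob_id    = blob_id,
		/* round up so a host with larger pages can still map it */
		.size       = align_up(size, align),
	};
	int ret = ioctl(drm_fd, DRM_IOCTL_VIRTGPU_RESOURCE_CREATE_BLOB, &blob);

	if (!ret)
		*bo_handle = blob.bo_handle;
	return ret;
}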


>
> Thanks,
> Sergio.
>
>


Re: [RFC 0/7] drm/virtio: Import scanout buffers from other devices

2024-06-14 Thread Gurchetan Singh
On Thu, May 30, 2024 at 12:21 AM Kasireddy, Vivek 
wrote:

> Hi Gurchetan,
>
> >
> > On Fri, May 24, 2024 at 11:33 AM Kasireddy, Vivek
> >  wrote:
> >
> >
> >   Hi,
> >
> >   Sorry, my previous reply got messed up as a result of HTML
> > formatting. This is
> >   a plain text version of the same reply.
> >
> >   >
> >   >
> >   >   Having virtio-gpu import scanout buffers (via prime) from
> other
> >   >   devices means that we'd be adding a head to headless GPUs
> > assigned
> >   >   to a Guest VM or additional heads to regular GPU devices
> that
> > are
> >   >   passthrough'd to the Guest. In these cases, the Guest
> > compositor
> >   >   can render into the scanout buffer using a primary GPU and
> has
> > the
> >   >   secondary GPU (virtio-gpu) import it for display purposes.
> >   >
> >   >   The main advantage with this is that the imported scanout
> > buffer can
> >   >   either be displayed locally on the Host (e.g, using Qemu +
> GTK
> > UI)
> >   >   or encoded and streamed to a remote client (e.g, Qemu +
> Spice
> > UI).
> >   >   Note that since Qemu uses udmabuf driver, there would be no
> >   > copies
> >   >   made of the scanout buffer as it is displayed. This should
> be
> >   >   possible even when it might reside in device memory such
> has
> >   > VRAM.
> >   >
> >   >   The specific use-case that can be supported with this
> series is
> > when
> >   >   running Weston or other guest compositors with "additional-
> > devices"
> >   >   feature (./weston --drm-device=card1 --additional-
> > devices=card0).
> >   >   More info about this feature can be found at:
> >   >   https://gitlab.freedesktop.org/wayland/weston/-
> >   > /merge_requests/736
> >   >
> >   >   In the above scenario, card1 could be a dGPU or an iGPU and
> > card0
> >   >   would be virtio-gpu in KMS only mode. However, the case
> > where this
> >   >   patch series could be particularly useful is when card1 is
> a GPU
> > VF
> >   >   that needs to share its scanout buffer (in a zero-copy
> way) with
> > the
> >   >   GPU PF on the Host. Or, it can also be useful when the
> scanout
> > buffer
> >   >   needs to be shared between any two GPU devices (assuming
> > one of
> >   > them
> >   >   is assigned to a Guest VM) as long as they are P2P DMA
> > compatible.
> >   >
> >   >
> >   >
> >   > Is passthrough iGPU-only or passthrough dGPU-only something you
> > intend to
> >   > use?
> >   Our main use-case involves passthrough’g a headless dGPU VF device
> > and sharing
> >   the Guest compositor’s scanout buffer with dGPU PF device on the
> > Host. Same goal for
> >   headless iGPU VF to iGPU PF device as well.
> >
> >
> >
> > Just to check my understanding: the same physical {i, d}GPU is
> partitioned
> > into the VF and PF, but the PF handles host-side display integration and
> > rendering?
> Yes, that is mostly right. In a nutshell, the same physical GPU is
> partitioned
> into one PF device and multiple VF devices. Only the PF device has access
> to
> the display hardware and can do KMS (on the Host). The VF devices are
> headless with no access to display hardware (cannot do KMS but can do
> render/
> encode/decode) and are generally assigned (or passthrough'd) to the Guest
> VMs.
> Some more details about this model can be found here:
>
> https://lore.kernel.org/dri-devel/20231110182231.1730-1-michal.wajdec...@intel.com/
>
> >
> >
> >   However, using a combination of iGPU and dGPU where either of
> > them can be passthrough’d
> >   to the Guest is something I think can be supported with this patch
> > series as well.
> >
> >   >
> >   > If it's a dGPU + iGPU setup, then the way other people seem to
> do it
> > is a
> >   > "virtualized" iGPU (via virgl/gfxstream/take your pick) and pass-
> > through the
> >   > dGPU.
> >   >
> >   > For example, AMD seems to use virgl to allocate and import into
> > the dGPU.
> >   >
> >   > https://gitlab.freedesktop.org/mesa/mesa/-
> > /merge_requests/23896
> >   >
> >   > https://lore.kernel.org/all/20231221100016.4022353-1-
> >   > julia.zh...@amd.com/ 
> >   >
> >   >
> >   > ChromeOS also uses that method (see crrev.com/c/3764931) [cc: dGPU
> >   > architect +Dominik Behr]
> >   >
> >   > So if iGPU + dGPU is the primary use case, you should be able to
> > use these
> >   > methods as well.  The model would "virtualized iGPU" +
> > passthrough dGPU,
> >   > not split SoCs.
> >   In our use-case, the goal is to have only one pri

Re: [RFC 0/7] drm/virtio: Import scanout buffers from other devices

2024-05-29 Thread Gurchetan Singh
On Fri, May 24, 2024 at 11:33 AM Kasireddy, Vivek 
wrote:

> Hi,
>
> Sorry, my previous reply got messed up as a result of HTML formatting.
> This is
> a plain text version of the same reply.
>
> >
> >
> >   Having virtio-gpu import scanout buffers (via prime) from other
> >   devices means that we'd be adding a head to headless GPUs assigned
> >   to a Guest VM or additional heads to regular GPU devices that are
> >   passthrough'd to the Guest. In these cases, the Guest compositor
> >   can render into the scanout buffer using a primary GPU and has the
> >   secondary GPU (virtio-gpu) import it for display purposes.
> >
> >   The main advantage with this is that the imported scanout buffer
> can
> >   either be displayed locally on the Host (e.g, using Qemu + GTK UI)
> >   or encoded and streamed to a remote client (e.g, Qemu + Spice UI).
> >   Note that since Qemu uses the udmabuf driver, there would be no
> >   copies made of the scanout buffer as it is displayed. This should be
> >   possible even when it might reside in device memory such as VRAM.
> >
> >   The specific use-case that can be supported with this series is
> when
> >   running Weston or other guest compositors with "additional-devices"
> >   feature (./weston --drm-device=card1 --additional-devices=card0).
> >   More info about this feature can be found at:
> >   https://gitlab.freedesktop.org/wayland/weston/-
> > /merge_requests/736
> >
> >   In the above scenario, card1 could be a dGPU or an iGPU and card0
> >   would be virtio-gpu in KMS only mode. However, the case where this
> >   patch series could be particularly useful is when card1 is a GPU VF
> >   that needs to share its scanout buffer (in a zero-copy way) with
> the
> >   GPU PF on the Host. Or, it can also be useful when the scanout
> buffer
> >   needs to be shared between any two GPU devices (assuming one of
> > them
> >   is assigned to a Guest VM) as long as they are P2P DMA compatible.
> >
> >
> >
> > Is passthrough iGPU-only or passthrough dGPU-only something you intend to
> > use?
> Our main use-case involves passing through a headless dGPU VF device and
> sharing the Guest compositor’s scanout buffer with the dGPU PF device on
> the Host. Same goal for a headless iGPU VF to iGPU PF device as well.
>

Just to check my understanding: the same physical {i, d}GPU is partitioned
into the VF and PF, but the PF handles host-side display integration and
rendering?


> However, using a combination of iGPU and dGPU where either of them can be
> passthrough’d
> to the Guest is something I think can be supported with this patch series
> as well.
>
> >
> > If it's a dGPU + iGPU setup, then the way other people seem to do it is a
> > "virtualized" iGPU (via virgl/gfxstream/take your pick) and pass-through
> the
> > dGPU.
> >
> > For example, AMD seems to use virgl to allocate and import into the dGPU.
> >
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23896
> >
> > https://lore.kernel.org/all/20231221100016.4022353-1-
> > julia.zh...@amd.com/
> >
> >
> > ChromeOS also uses that method (see crrev.com/c/3764931
> >  ) [cc: dGPU architect +Dominik Behr
> >  ]
> >
> > So if iGPU + dGPU is the primary use case, you should be able to use
> > these methods as well.  The model would be "virtualized iGPU" +
> > passthrough dGPU, not split SoCs.
> In our use-case, the goal is to have only one primary GPU (passthrough’d
> iGPU/dGPU)
> do all the rendering (using native DRI drivers) for clients/compositor and
> all the outputs
> and share the scanout buffers with the secondary GPU (virtio-gpu). Since
> this is mostly
> how Mutter (and also Weston) work in a multi-GPU setup, I am not sure if
> virgl is needed.
>

I think you can probably use virgl with the PF and others probably will,
but supporting multiple methods in Linux is not unheard of.

Does your patchset need the Mesa kmsro patchset to function correctly?

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9592

If so, I would try to get that reviewed first to meet DRM requirements (
https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements).
You might explicitly call out the design decision you're making: ("We can
probably use virgl as the virtualized iGPU via PF, but that adds
unnecessary complexity b/c __").

> And, doing it this way means that no other userspace components need to be
> modified on both the Guest and the Host.
>
> >
> >
> >
> >   As part of the import, the virtio-gpu driver shares the dma
> >   addresses and lengths with Qemu which then determines whether
> > the
> >   memory region they belong to is owned by a PCI device or whether it
> >   is part of the Guest's system ram. If it is the former, it
> identifies
> >   the devid (or bdf) and bar and provides this info (along with
> offsets
> 

Re: [RFC 0/7] drm/virtio: Import scanout buffers from other devices

2024-05-23 Thread Gurchetan Singh
On Thu, Mar 28, 2024 at 2:01 AM Vivek Kasireddy 
wrote:

> Having virtio-gpu import scanout buffers (via prime) from other
> devices means that we'd be adding a head to headless GPUs assigned
> to a Guest VM or additional heads to regular GPU devices that are
> passthrough'd to the Guest. In these cases, the Guest compositor
> can render into the scanout buffer using a primary GPU and has the
> secondary GPU (virtio-gpu) import it for display purposes.
>
> The main advantage with this is that the imported scanout buffer can
> either be displayed locally on the Host (e.g, using Qemu + GTK UI)
> or encoded and streamed to a remote client (e.g, Qemu + Spice UI).
> Note that since Qemu uses udmabuf driver, there would be no copies
> made of the scanout buffer as it is displayed. This should be
> possible even when it might reside in device memory such as VRAM.
>
> The specific use-case that can be supported with this series is when
> running Weston or other guest compositors with "additional-devices"
> feature (./weston --drm-device=card1 --additional-devices=card0).
> More info about this feature can be found at:
> https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/736
>
> In the above scenario, card1 could be a dGPU or an iGPU and card0
> would be virtio-gpu in KMS only mode. However, the case where this
> patch series could be particularly useful is when card1 is a GPU VF
> that needs to share its scanout buffer (in a zero-copy way) with the
> GPU PF on the Host. Or, it can also be useful when the scanout buffer
> needs to be shared between any two GPU devices (assuming one of them
> is assigned to a Guest VM) as long as they are P2P DMA compatible.
>

Is passthrough iGPU-only or passthrough dGPU-only something you intend to
use?

If it's a dGPU + iGPU setup, then the way other people seem to do it is a
"virtualized" iGPU (via virgl/gfxstream/take your pick) and pass-through
the dGPU.

For example, AMD seems to use virgl to allocate and import into the dGPU.

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23896
https://lore.kernel.org/all/20231221100016.4022353-1-julia.zh...@amd.com/

ChromeOS also uses that method (see crrev.com/c/3764931) [cc: dGPU
architect +Dominik Behr ]

So if iGPU + dGPU is the primary use case, you should be able to use these
methods as well.  The model would be "virtualized iGPU" + passthrough dGPU,
not split SoCs.


> As part of the import, the virtio-gpu driver shares the dma
> addresses and lengths with Qemu which then determines whether the
> memory region they belong to is owned by a PCI device or whether it
> is part of the Guest's system ram. If it is the former, it identifies
> the devid (or bdf) and bar and provides this info (along with offsets
> and sizes) to the udmabuf driver. In the latter case, instead of the
> devid and bar it provides the memfd. The udmabuf driver then
> creates a dmabuf using this info that Qemu shares with Spice for
> encode via Gstreamer.
>
> Note that the virtio-gpu driver registers a move_notify() callback
> to track location changes associated with the scanout buffer and
> sends attach/detach backing cmds to Qemu when appropriate. And,
> synchronization (that is, ensuring that Guest and Host are not
> using the scanout buffer at the same time) is ensured by pinning/
> unpinning the dmabuf as part of plane update and using a fence
> in resource_flush cmd.


I'm not sure how QEMU's display paths work, but with crosvm, if you share
the guest-created dmabuf with the display and the guest moves the backing
pages, the only recourse is to destroy the surface and show a black screen
to the user: not the best experience.

Only amdgpu calls dma_buf_move_notify(..), and you're probably testing on
Intel only, so you may not be hitting that code path anyway.  I forgot the
exact reason, but apparently udmabuf may not work with amdgpu displays, and
it seems the virtualized iGPU + dGPU is the way to go for amdgpu anyway.
So I recommend just pinning the buffer for the lifetime of the import, for
simplicity and correctness.
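
A rough sketch of what "pin for the lifetime of the import" could look like
on the importer side, assuming the attachment was created with
dma_buf_dynamic_attach() (the non-dynamic path is already pinned by the core
at map time). This is not the virtio-gpu patch itself, just an illustration
of the approach; the wrapper name is made up and error handling is
simplified.

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>
#include <linux/err.h>

/* Map the attachment once at import time and keep the backing storage
 * pinned until the import is torn down, so the exporter can never move
 * the pages while the host side still references them.
 */
static struct sg_table *example_import_and_pin(struct dma_buf_attachment *attach)
{
	struct sg_table *sgt;
	int ret;

	dma_resv_lock(attach->dmabuf->resv, NULL);

	ret = dma_buf_pin(attach);
	if (ret) {
		sgt = ERR_PTR(ret);
		goto unlock;
	}

	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt))
		dma_buf_unpin(attach);

unlock:
	dma_resv_unlock(attach->dmabuf->resv);
	return sgt;
}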


> This series is available at:
> https://gitlab.freedesktop.org/Vivek/drm-tip/-/commits/virtgpu_import_rfc
>
> along with additional patches for Qemu and Spice here:
> https://gitlab.freedesktop.org/Vivek/qemu/-/commits/virtgpu_dmabuf_pcidev
> https://gitlab.freedesktop.org/Vivek/spice/-/commits/encode_dmabuf_v4
>
> Patchset overview:
>
> Patch 1:   Implement VIRTIO_GPU_CMD_RESOURCE_DETACH_BACKING cmd
> Patch 2-3: Helpers to initialize, import, free imported object
> Patch 4-5: Import and use buffers from other devices for scanout
> Patch 6-7: Have udmabuf driver create dmabuf from PCI bars for P2P DMA
>
> This series is tested using the following method:
> - Run Qemu with the following relevant options:
>   qemu-system-x86_64 -m 4096m 
>   -device vfio-pci,host=:03:00.0
>   -device virtio-vga,max_outputs=1,blob=true,xres=1920,yres=1080
>   -spice
> port=3001,gl=on,disable-ticketing=on,preferred

Re: [RFC] drm/msm: Add GPU memory traces

2024-03-05 Thread Gurchetan Singh
On Mon, Mar 4, 2024 at 6:04 PM Rob Clark  wrote:

> On Mon, Mar 4, 2024 at 5:38 PM Gurchetan Singh
>  wrote:
> >
> >
> >
> >
> > On Fri, Mar 1, 2024 at 10:54 AM Rob Clark  wrote:
> >>
> >> From: Rob Clark 
> >>
> >> Perfetto can use these traces to track global and per-process GPU memory
> >> usage.
> >>
> >> Signed-off-by: Rob Clark 
> >> ---
> >> I realized the tracepoint that perfetto uses to show GPU memory usage
> >> globally and per-process was already upstream, but with no users.
> >>
> >> This overlaps a bit with fdinfo, but ftrace is a lighter weight
> >> mechanism and fits better with perfetto (plus it is already supported in
> >> trace_processor and the perfetto UI, whereas something fdinfo based would
> >> require new code to be added in perfetto).
> >>
> >> We could probably do this more globally (i.e. drm_gem_get/put_pages() and
> >> drm_gem_handle_create_tail()/drm_gem_object_release_handle()) if folks
> >> prefer.  Not sure where that leaves the TTM drivers.
> >>
> >>  drivers/gpu/drm/msm/Kconfig   |  1 +
> >>  drivers/gpu/drm/msm/msm_drv.h |  5 +
> >>  drivers/gpu/drm/msm/msm_gem.c | 37 +++
> >>  drivers/gpu/drm/msm/msm_gpu.h |  8 
> >>  4 files changed, 51 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/msm/Kconfig b/drivers/gpu/drm/msm/Kconfig
> >> index f202f26adab2..e4c912fcaf22 100644
> >> --- a/drivers/gpu/drm/msm/Kconfig
> >> +++ b/drivers/gpu/drm/msm/Kconfig
> >> @@ -33,6 +33,7 @@ config DRM_MSM
> >> select PM_OPP
> >> select NVMEM
> >> select PM_GENERIC_DOMAINS
> >> +   select TRACE_GPU_MEM
> >> help
> >>   DRM/KMS driver for MSM/snapdragon.
> >>
> >> diff --git a/drivers/gpu/drm/msm/msm_drv.h
> b/drivers/gpu/drm/msm/msm_drv.h
> >> index 16a7cbc0b7dd..cb8f7e804b5b 100644
> >> --- a/drivers/gpu/drm/msm/msm_drv.h
> >> +++ b/drivers/gpu/drm/msm/msm_drv.h
> >> @@ -137,6 +137,11 @@ struct msm_drm_private {
> >> struct msm_rd_state *hangrd;   /* debugfs to dump hanging
> submits */
> >> struct msm_perf_state *perf;
> >>
> >> +   /**
> >> +* total_mem: Total/global amount of memory backing GEM objects.
> >> +*/
> >> +   atomic64_t total_mem;
> >> +
> >> /**
> >>  * List of all GEM objects (mainly for debugfs, protected by
> obj_lock
> >>  * (acquire before per GEM object lock)
> >> diff --git a/drivers/gpu/drm/msm/msm_gem.c
> b/drivers/gpu/drm/msm/msm_gem.c
> >> index 175ee4ab8a6f..e04c4af5d154 100644
> >> --- a/drivers/gpu/drm/msm/msm_gem.c
> >> +++ b/drivers/gpu/drm/msm/msm_gem.c
> >> @@ -12,6 +12,9 @@
> >>  #include 
> >>
> >>  #include 
> >> +#include 
> >> +
> >> +#include 
> >>
> >>  #include "msm_drv.h"
> >>  #include "msm_fence.h"
> >> @@ -33,6 +36,34 @@ static bool use_pages(struct drm_gem_object *obj)
> >> return !msm_obj->vram_node;
> >>  }
> >>
> >> +static void update_device_mem(struct msm_drm_private *priv, ssize_t
> size)
> >> +{
> >> +   uint64_t total_mem = atomic64_add_return(size,
> &priv->total_mem);
> >> +   trace_gpu_mem_total(0, 0, total_mem);
> >> +}
> >> +
> >> +static void update_ctx_mem(struct drm_file *file, ssize_t size)
> >> +{
> >> +   struct msm_file_private *ctx = file->driver_priv;
> >> +   uint64_t ctx_mem = atomic64_add_return(size, &ctx->ctx_mem);
> >> +
> >> +   rcu_read_lock(); /* Locks file->pid! */
> >> +   trace_gpu_mem_total(0, pid_nr(file->pid), ctx_mem);
> >> +   rcu_read_unlock();
> >> +
> >> +}
> >> +
> >> +static int msm_gem_open(struct drm_gem_object *obj, struct drm_file
> *file)
> >> +{
> >> +   update_ctx_mem(file, obj->size);
> >> +   return 0;
> >> +}
> >> +
> >> +static void msm_gem_close(struct drm_gem_object *obj, struct drm_file
> *file)
> >> +{
> >> +   update_ctx_mem(file, -obj->size);
> >> +}
> >> +
> >>  /*
> >>   * Cache sync.. this is a bit over-complicated, to fit dma-mapping
> >>   * API.  Really GPU cache is

Re: [RFC] drm/msm: Add GPU memory traces

2024-03-04 Thread Gurchetan Singh
On Fri, Mar 1, 2024 at 10:54 AM Rob Clark  wrote:

> From: Rob Clark 
>
> Perfetto can use these traces to track global and per-process GPU memory
> usage.
>
> Signed-off-by: Rob Clark 
> ---
> I realized the tracepoint that perfetto uses to show GPU memory usage
> globally and per-process was already upstream, but with no users.
>
> This overlaps a bit with fdinfo, but ftrace is a lighter weight
> mechanism and fits better with perfetto (plus it is already supported in
> trace_processor and the perfetto UI, whereas something fdinfo based would
> require new code to be added in perfetto).
>
> We could probably do this more globally (i.e. drm_gem_get/put_pages() and
> drm_gem_handle_create_tail()/drm_gem_object_release_handle()) if folks
> prefer.  Not sure where that leaves the TTM drivers.
>
>  drivers/gpu/drm/msm/Kconfig   |  1 +
>  drivers/gpu/drm/msm/msm_drv.h |  5 +
>  drivers/gpu/drm/msm/msm_gem.c | 37 +++
>  drivers/gpu/drm/msm/msm_gpu.h |  8 
>  4 files changed, 51 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/Kconfig b/drivers/gpu/drm/msm/Kconfig
> index f202f26adab2..e4c912fcaf22 100644
> --- a/drivers/gpu/drm/msm/Kconfig
> +++ b/drivers/gpu/drm/msm/Kconfig
> @@ -33,6 +33,7 @@ config DRM_MSM
> select PM_OPP
> select NVMEM
> select PM_GENERIC_DOMAINS
> +   select TRACE_GPU_MEM
> help
>   DRM/KMS driver for MSM/snapdragon.
>
> diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
> index 16a7cbc0b7dd..cb8f7e804b5b 100644
> --- a/drivers/gpu/drm/msm/msm_drv.h
> +++ b/drivers/gpu/drm/msm/msm_drv.h
> @@ -137,6 +137,11 @@ struct msm_drm_private {
> struct msm_rd_state *hangrd;   /* debugfs to dump hanging submits
> */
> struct msm_perf_state *perf;
>
> +   /**
> +* total_mem: Total/global amount of memory backing GEM objects.
> +*/
> +   atomic64_t total_mem;
> +
> /**
>  * List of all GEM objects (mainly for debugfs, protected by
> obj_lock
>  * (acquire before per GEM object lock)
> diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
> index 175ee4ab8a6f..e04c4af5d154 100644
> --- a/drivers/gpu/drm/msm/msm_gem.c
> +++ b/drivers/gpu/drm/msm/msm_gem.c
> @@ -12,6 +12,9 @@
>  #include 
>
>  #include 
> +#include 
> +
> +#include 
>
>  #include "msm_drv.h"
>  #include "msm_fence.h"
> @@ -33,6 +36,34 @@ static bool use_pages(struct drm_gem_object *obj)
> return !msm_obj->vram_node;
>  }
>
> +static void update_device_mem(struct msm_drm_private *priv, ssize_t size)
> +{
> +   uint64_t total_mem = atomic64_add_return(size, &priv->total_mem);
> +   trace_gpu_mem_total(0, 0, total_mem);
> +}
> +
> +static void update_ctx_mem(struct drm_file *file, ssize_t size)
> +{
> +   struct msm_file_private *ctx = file->driver_priv;
> +   uint64_t ctx_mem = atomic64_add_return(size, &ctx->ctx_mem);
> +
> +   rcu_read_lock(); /* Locks file->pid! */
> +   trace_gpu_mem_total(0, pid_nr(file->pid), ctx_mem);
> +   rcu_read_unlock();
> +
> +}
> +
> +static int msm_gem_open(struct drm_gem_object *obj, struct drm_file *file)
> +{
> +   update_ctx_mem(file, obj->size);
> +   return 0;
> +}
> +
> +static void msm_gem_close(struct drm_gem_object *obj, struct drm_file
> *file)
> +{
> +   update_ctx_mem(file, -obj->size);
> +}
> +
>  /*
>   * Cache sync.. this is a bit over-complicated, to fit dma-mapping
>   * API.  Really GPU cache is out of scope here (handled on cmdstream)
> @@ -156,6 +187,8 @@ static struct page **get_pages(struct drm_gem_object
> *obj)
> return p;
> }
>
> +   update_device_mem(dev->dev_private, obj->size);
> +
> msm_obj->pages = p;
>
> msm_obj->sgt = drm_prime_pages_to_sg(obj->dev, p, npages);
> @@ -209,6 +242,8 @@ static void put_pages(struct drm_gem_object *obj)
> msm_obj->sgt = NULL;
> }
>
> +   update_device_mem(obj->dev->dev_private, -obj->size);
> +
> if (use_pages(obj))
> drm_gem_put_pages(obj, msm_obj->pages, true,
> false);
> else
> @@ -1118,6 +1153,8 @@ static const struct vm_operations_struct vm_ops = {
>
>  static const struct drm_gem_object_funcs msm_gem_object_funcs = {
> .free = msm_gem_free_object,
> +   .open = msm_gem_open,
> +   .close = msm_gem_close,
> .pin = msm_gem_prime_pin,
> .unpin = msm_gem_prime_unpin,
> .get_sg_table = msm_gem_prime_get_sg_table,
> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> index 2bfcb222e353..f7d2a7d6f8cc 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.h
> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> @@ -428,6 +428,14 @@ struct msm_file_private {
>  * level.
>  */
> struct drm_sched_entity *entities[NR_SCHED_PRIORITIES *
> MSM_GPU_MAX_RINGS];
> +
> +   /

Re: [PATCH v2 0/1] Implementation of resource_query_layout

2023-12-26 Thread Gurchetan Singh
On Thu, Dec 21, 2023 at 2:01 AM Julia Zhang  wrote:

> Hi all,
>
> Sorry for the late reply. This is v2 of the implementation of
> resource_query_layout. It adds a new ioctl to let the guest query
> information about a host resource, originally from Daniel Stone. We add
> some changes to support querying the correct stride of a host resource
> before it's created, in order to blit data from the dGPU to the virtio
> iGPU for the dGPU prime feature.
>
> Changes from v1 to v2:
> -Squash two patches to a single patch.
> -A small modification of VIRTIO_GPU_F_RESOURCE_QUERY_LAYOUT
>
>
> Below is the description of v1:
> This adds an implementation of resource_query_layout to get information
> about how the host has actually allocated the buffer. It is currently used
> to query the stride of a guest linear resource for dGPU prime on guest VMs.
>

You can use a context-specific protocol or even the virgl capabilities [for
a linear strided resource].  For example, Sommelier does the following:

https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/vm_tools/sommelier/virtualization/virtgpu_channel.cc#549

i.e., you should be able to avoid the extra ioctl + hypercall.
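
For reference, a minimal sketch of that path: userspace pulls the capset
blob once via DRM_IOCTL_VIRTGPU_GET_CAPS (the guest kernel caches capsets,
so repeated reads don't turn into per-resource hypercalls) and derives the
layout/stride rules from whatever the context type encodes in the blob. The
parsing side is context-specific and not shown; the helper name is made up.

#include <stdint.h>
#include <sys/ioctl.h>
#include "drm-uapi/virtgpu_drm.h"	/* virtgpu uapi header; adjust the path for your tree */

/* Fetch a capset blob of a known id/version into a caller-provided buffer.
 * Working out stride/layout constraints from the blob is up to the context
 * type (virgl, venus, native context, ...).
 */
static int example_get_capset(int drm_fd, uint32_t cap_set_id,
			      uint32_t cap_set_ver, void *buf, uint32_t len)
{
	struct drm_virtgpu_get_caps args = {
		.cap_set_id  = cap_set_id,
		.cap_set_ver = cap_set_ver,
		.addr        = (uint64_t)(uintptr_t)buf,
		.size        = len,
	};

	return ioctl(drm_fd, DRM_IOCTL_VIRTGPU_GET_CAPS, &args);
}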


>
> v1 of kernel side:
> https://lore.kernel.org/xen-devel/20231110074027.24862-1-julia.zh...@amd.com/T/#t
>
> v1 of qemu side:
> https://lore.kernel.org/qemu-devel/20231110074027.24862-1-julia.zh...@amd.com/T/#t
>
> Daniel Stone (1):
>   drm/virtio: Implement RESOURCE_GET_LAYOUT ioctl
>
>  drivers/gpu/drm/virtio/virtgpu_drv.c   |  1 +
>  drivers/gpu/drm/virtio/virtgpu_drv.h   | 22 -
>  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 66 ++
>  drivers/gpu/drm/virtio/virtgpu_kms.c   |  8 +++-
>  drivers/gpu/drm/virtio/virtgpu_vq.c| 63 
>  include/uapi/drm/virtgpu_drm.h | 21 
>  include/uapi/linux/virtio_gpu.h| 30 
>  7 files changed, 208 insertions(+), 3 deletions(-)
>
> --
> 2.34.1
>
>


Re: [PATCH v3 2/2] drm/uapi: add explicit virtgpu context debug name

2023-11-13 Thread Gurchetan Singh
On Sat, Nov 11, 2023 at 2:37 PM Dmitry Osipenko <
dmitry.osipe...@collabora.com> wrote:

> On 10/18/23 21:17, Gurchetan Singh wrote:
> > + case VIRTGPU_CONTEXT_PARAM_DEBUG_NAME:
> > + if (vfpriv->explicit_debug_name) {
> > + ret = -EINVAL;
> > + goto out_unlock;
> > + }
> > +
> > + ret = strncpy_from_user(vfpriv->debug_name,
> > + u64_to_user_ptr(value),
> > + DEBUG_NAME_MAX_LEN - 1);
> > +
> > + if (ret < 0) {
> > + ret = -EFAULT;
> > + goto out_unlock;
> > + }
> > +
> > + vfpriv->explicit_debug_name = true;
> > + break;
>
> Spotted a problem here. The ret needs to be set to zero on success. I'll
> send the fix shortly. Gurchetan, you should've been getting the
> DRM_IOCTL_VIRTGPU_CONTEXT_INIT failure from gfxstream when you tested
> this patch, shouldn't you?
>

To accommodate older kernels/QEMU, gfxstream doesn't fail if CONTEXT_INIT
fails.  So the guest thought it failed and didn't react, but the value was
propagated to the host.


>
> Also noticed that the patch title says "drm/uapi" instead of
> "drm/virtio". My bad for not noticing it earlier. Please be more careful
> next time too :)
>
> --
> Best regards,
> Dmitry
>
>


Re: [PATCH v1] drm/virtio: Fix return value for VIRTGPU_CONTEXT_PARAM_DEBUG_NAME

2023-11-13 Thread Gurchetan Singh
On Sat, Nov 11, 2023 at 2:43 PM Dmitry Osipenko <
dmitry.osipe...@collabora.com> wrote:

> strncpy_from_user() returns the number of copied bytes and not zero on
> success. The non-zero return value of the ioctl is treated as an error.
> Return zero on success instead of the number of copied bytes.
>
> Fixes: 7add80126bce ("drm/uapi: add explicit virtgpu context debug name")
> Signed-off-by: Dmitry Osipenko 
>

Reviewed-by: Gurchetan Singh 


> ---
>  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> index 1e2042419f95..e4f76f315550 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> @@ -665,6 +665,7 @@ static int virtio_gpu_context_init_ioctl(struct
> drm_device *dev,
> goto out_unlock;
>
> vfpriv->explicit_debug_name = true;
> +   ret = 0;
> break;
> default:
> ret = -EINVAL;
> --
> 2.41.0
>
>


Re: [PATCH v3 2/2] drm/uapi: add explicit virtgpu context debug name

2023-11-10 Thread Gurchetan Singh
On Tue, Oct 31, 2023 at 8:55 AM Gurchetan Singh 
wrote:

>
>
> On Sun, Oct 22, 2023 at 4:50 PM Dmitry Osipenko <
> dmitry.osipe...@collabora.com> wrote:
>
>> On 10/18/23 21:17, Gurchetan Singh wrote:
>> > There are two problems with the current method of determining the
>> > virtio-gpu debug name.
>> >
>> > 1) TASK_COMM_LEN is defined to be 16 bytes only, and this is a
>> >Linux kernel idiom (see PR_SET_NAME + PR_GET_NAME). Though,
>> >Android/FreeBSD get around this via setprogname(..)/getprogname(..)
>> >in libc.
>> >
>> >On Android, names longer than 16 bytes are common.  For example,
>> >one often encounters a program like "com.android.systemui".
>> >
>> >The virtio-gpu spec allows the debug name to be up to 64 bytes, so
>> >ideally userspace should be able to set debug names up to 64 bytes.
>> >
>> > 2) The current implementation determines the debug name using whatever
>> >    task initiated virtgpu.  This could be a "RenderThread" of a
>> >    larger program, when we actually want to propagate the debug name
>> >    of the program.
>> >
>> > To fix these issues, add a new CONTEXT_INIT param that allows userspace
>> > to set the debug name when creating a context.
>> >
>> > It takes a null-terminated C-string as the param value. The length of
>> the
>> > string (excluding the terminator) **should** be <= 64 bytes.  Otherwise,
>> > the debug_name will be truncated to 64 bytes.
>> >
>> > Link to open-source userspace:
>> >
>> https://android-review.googlesource.com/c/platform/hardware/google/gfxstream/+/2787176
>> >
>> > Signed-off-by: Gurchetan Singh 
>> > Reviewed-by: Josh Simonot 
>> > ---
>> > Fixes suggested by Dmitry Osipenko
>> > v2:
>> > - Squash implementation and UAPI change into one commit
>> > - Avoid unnecessary casts
>> > - Use bool when necessary
>> > v3:
>> > - Use DEBUG_NAME_MAX_LEN - 1 when copying string
>> >
>> >  drivers/gpu/drm/virtio/virtgpu_drv.h   |  5 
>> >  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 39 ++
>> >  include/uapi/drm/virtgpu_drm.h |  2 ++
>> >  3 files changed, 40 insertions(+), 6 deletions(-)
>>
>> Gerd, do you have objections to this UAPI change?
>>
>
> Bump.  I say we wait another week and see if anyone cares [I suspect
> nobody does].
>
>
> https://drm.pages.freedesktop.org/maintainer-tools/committer-drm-misc.html#merge-criteria
>
> As per DRM guidelines, if there are no open comments and the change is
> reviewed, it is mergeable.
>

*hears crickets*

Can we merge this now?


>
>>
>> --
>> Best regards,
>> Dmitry
>>
>>


Re: [PATCH v18 25/26] drm/virtio: Support shmem shrinking

2023-11-03 Thread Gurchetan Singh
On Sun, Oct 29, 2023 at 4:03 PM Dmitry Osipenko <
dmitry.osipe...@collabora.com> wrote:

> Support generic drm-shmem memory shrinker and add new madvise IOCTL to
> the VirtIO-GPU driver. BO cache manager of Mesa driver will mark BOs as
> "don't need" using the new IOCTL to let shrinker purge the marked BOs on
> OOM, the shrinker will also evict unpurgeable shmem BOs from memory if
> guest supports SWAP file or partition.
>
> Acked-by: Gerd Hoffmann 
> Signed-off-by: Daniel Almeida 
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/gpu/drm/virtio/virtgpu_drv.h| 13 +-
>  drivers/gpu/drm/virtio/virtgpu_gem.c| 35 ++
>  drivers/gpu/drm/virtio/virtgpu_ioctl.c  | 25 ++
>  drivers/gpu/drm/virtio/virtgpu_kms.c|  8 
>  drivers/gpu/drm/virtio/virtgpu_object.c | 61 +
>  drivers/gpu/drm/virtio/virtgpu_vq.c | 40 
>  include/uapi/drm/virtgpu_drm.h  | 14 ++
>  7 files changed, 195 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h
> b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index 421f524ae1de..33a78b24c272 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -278,7 +278,7 @@ struct virtio_gpu_fpriv {
>  };
>
>  /* virtgpu_ioctl.c */
> -#define DRM_VIRTIO_NUM_IOCTLS 12
> +#define DRM_VIRTIO_NUM_IOCTLS 13
>  extern struct drm_ioctl_desc virtio_gpu_ioctls[DRM_VIRTIO_NUM_IOCTLS];
>  void virtio_gpu_create_context(struct drm_device *dev, struct drm_file
> *file);
>
> @@ -316,6 +316,8 @@ void virtio_gpu_array_put_free_delayed(struct
> virtio_gpu_device *vgdev,
>  void virtio_gpu_array_put_free_work(struct work_struct *work);
>  int virtio_gpu_array_prepare(struct virtio_gpu_device *vgdev,
>  struct virtio_gpu_object_array *objs);
> +int virtio_gpu_gem_host_mem_release(struct virtio_gpu_object *bo);
> +int virtio_gpu_gem_madvise(struct virtio_gpu_object *obj, int madv);
>  int virtio_gpu_gem_pin(struct virtio_gpu_object *bo);
>  void virtio_gpu_gem_unpin(struct virtio_gpu_object *bo);
>
> @@ -329,6 +331,8 @@ void virtio_gpu_cmd_create_resource(struct
> virtio_gpu_device *vgdev,
> struct virtio_gpu_fence *fence);
>  void virtio_gpu_cmd_unref_resource(struct virtio_gpu_device *vgdev,
>struct virtio_gpu_object *bo);
> +int virtio_gpu_cmd_release_resource(struct virtio_gpu_device *vgdev,
> +   struct virtio_gpu_object *bo);
>  void virtio_gpu_cmd_transfer_to_host_2d(struct virtio_gpu_device *vgdev,
> uint64_t offset,
> uint32_t width, uint32_t height,
> @@ -349,6 +353,9 @@ void virtio_gpu_object_attach(struct virtio_gpu_device
> *vgdev,
>   struct virtio_gpu_object *obj,
>   struct virtio_gpu_mem_entry *ents,
>   unsigned int nents);
> +void virtio_gpu_object_detach(struct virtio_gpu_device *vgdev,
> + struct virtio_gpu_object *obj,
> + struct virtio_gpu_fence *fence);
>  void virtio_gpu_cursor_ping(struct virtio_gpu_device *vgdev,
> struct virtio_gpu_output *output);
>  int virtio_gpu_cmd_get_display_info(struct virtio_gpu_device *vgdev);
> @@ -492,4 +499,8 @@ void virtio_gpu_vram_unmap_dma_buf(struct device *dev,
>  int virtio_gpu_execbuffer_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file);
>
> +/* virtgpu_gem_shrinker.c */
> +int virtio_gpu_gem_shrinker_init(struct virtio_gpu_device *vgdev);
> +void virtio_gpu_gem_shrinker_fini(struct virtio_gpu_device *vgdev);
> +
>  #endif
> diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c
> b/drivers/gpu/drm/virtio/virtgpu_gem.c
> index 97e67064c97e..748f7bbb0e6d 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_gem.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_gem.c
> @@ -147,10 +147,20 @@ void virtio_gpu_gem_object_close(struct
> drm_gem_object *obj,
> struct virtio_gpu_device *vgdev = obj->dev->dev_private;
> struct virtio_gpu_fpriv *vfpriv = file->driver_priv;
> struct virtio_gpu_object_array *objs;
> +   struct virtio_gpu_object *bo;
>
> if (!vgdev->has_virgl_3d)
> return;
>
> +   bo = gem_to_virtio_gpu_obj(obj);
> +
> +   /*
> +* Purged BO was already detached and released, the resource ID
> +* is invalid by now.
> +*/
> +   if (!virtio_gpu_gem_madvise(bo, VIRTGPU_MADV_WILLNEED))
> +   return;
> +
> objs = virtio_gpu_array_alloc(1);
> if (!objs)
> return;
> @@ -315,6 +325,31 @@ int virtio_gpu_array_prepare(struct virtio_gpu_device
> *vgdev,
> return ret;
>  }
>
> +int virtio_gpu_gem_madvise(struct virtio_gpu_object *bo, int madv)
> +{
> +   if (virtio_gpu_is_

Re: [PATCH v3 2/2] drm/uapi: add explicit virtgpu context debug name

2023-10-31 Thread Gurchetan Singh
On Sun, Oct 22, 2023 at 4:50 PM Dmitry Osipenko <
dmitry.osipe...@collabora.com> wrote:

> On 10/18/23 21:17, Gurchetan Singh wrote:
> > There are two problems with the current method of determining the
> > virtio-gpu debug name.
> >
> > 1) TASK_COMM_LEN is defined to be 16 bytes only, and this is a
> >Linux kernel idiom (see PR_SET_NAME + PR_GET_NAME). Though,
> >Android/FreeBSD get around this via setprogname(..)/getprogname(..)
> >in libc.
> >
> >On Android, names longer than 16 bytes are common.  For example,
> >one often encounters a program like "com.android.systemui".
> >
> >The virtio-gpu spec allows the debug name to be up to 64 bytes, so
> >ideally userspace should be able to set debug names up to 64 bytes.
> >
> > 2) The current implementation determines the debug name using whatever
> >    task initiated virtgpu.  This could be a "RenderThread" of a
> >    larger program, when we actually want to propagate the debug name
> >    of the program.
> >
> > To fix these issues, add a new CONTEXT_INIT param that allows userspace
> > to set the debug name when creating a context.
> >
> > It takes a null-terminated C-string as the param value. The length of the
> > string (excluding the terminator) **should** be <= 64 bytes.  Otherwise,
> > the debug_name will be truncated to 64 bytes.
> >
> > Link to open-source userspace:
> >
> https://android-review.googlesource.com/c/platform/hardware/google/gfxstream/+/2787176
> >
> > Signed-off-by: Gurchetan Singh 
> > Reviewed-by: Josh Simonot 
> > ---
> > Fixes suggested by Dmitry Osipenko
> > v2:
> > - Squash implementation and UAPI change into one commit
> > - Avoid unnecessary casts
> > - Use bool when necessary
> > v3:
> > - Use DEBUG_NAME_MAX_LEN - 1 when copying string
> >
> >  drivers/gpu/drm/virtio/virtgpu_drv.h   |  5 
> >  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 39 ++
> >  include/uapi/drm/virtgpu_drm.h |  2 ++
> >  3 files changed, 40 insertions(+), 6 deletions(-)
>
> Gerd, do you have objections to this UAPI change?
>

Bump.  I say we wait another week and see if anyone cares [I suspect nobody
does].


https://drm.pages.freedesktop.org/maintainer-tools/committer-drm-misc.html#merge-criteria

As per DRM guidelines, if there are no open comments and the change is
reviewed, it is mergeable.


>
> --
> Best regards,
> Dmitry
>
>


Re: [PATCH v3] drm/virtio: add new virtio gpu capset definitions

2023-10-18 Thread Gurchetan Singh
On Tue, Oct 10, 2023 at 9:41 PM Huang Rui  wrote:

> On Tue, Oct 10, 2023 at 11:52:14PM +0800, Dmitry Osipenko wrote:
> > On 10/10/23 18:40, Dmitry Osipenko wrote:
> > > On 10/10/23 16:57, Huang Rui wrote:
> > >> These definitions are used for qemu, and qemu imports these macros in
> > >> the headers to enable gfxstream, venus, cross domain, and drm (native
> > >> context) for virtio gpu. So they should be added even though the kernel
> > >> doesn't use them.
> > >>
> > >> Signed-off-by: Huang Rui 
> > >> Reviewed-by: Akihiko Odaki 
> > >> ---
> > >>
> > >> Changes V1 -> V2:
> > >> - Add all capsets including gfxstream and venus in kernel header
> (Dmitry Osipenko)
> > >>
> > >> Changes V2 -> V3:
> > >> - Add missed capsets including cross domain and drm (native context)
> > >>   (Dmitry Osipenko)
> > >>
> > >> v1:
> https://lore.kernel.org/lkml/20230915105918.3763061-1-ray.hu...@amd.com/
> > >> v2:
> https://lore.kernel.org/lkml/20231010032553.1138036-1-ray.hu...@amd.com/
> > >>
> > >>  include/uapi/linux/virtio_gpu.h | 4 
> > >>  1 file changed, 4 insertions(+)
> > >>
> > >> diff --git a/include/uapi/linux/virtio_gpu.h
> b/include/uapi/linux/virtio_gpu.h
> > >> index f556fde07b76..240911c8da31 100644
> > >> --- a/include/uapi/linux/virtio_gpu.h
> > >> +++ b/include/uapi/linux/virtio_gpu.h
> > >> @@ -309,6 +309,10 @@ struct virtio_gpu_cmd_submit {
> > >>
> > >>  #define VIRTIO_GPU_CAPSET_VIRGL 1
> > >>  #define VIRTIO_GPU_CAPSET_VIRGL2 2
> > >> +#define VIRTIO_GPU_CAPSET_GFXSTREAM 3
> > >
> > > The GFXSTREAM capset isn't correct; it should be GFXSTREAM_VULKAN, in
> > > accordance with [1] and [2]. There are more capsets for GFXSTREAM.
> > >
> > > [1]
> > >
> https://github.com/google/crosvm/blob/main/rutabaga_gfx/src/rutabaga_utils.rs#L172
> > >
> > > [2]
> > >
> https://patchwork.kernel.org/project/qemu-devel/patch/20231006010835.444-7-gurchetansi...@chromium.org/
> >
> > Though, maybe those are "rutabaga" capsets that are not related to
> > virtio-gpu, because crosvm has other defs for virtio-gpu capsets [3].
> > The DRM capset is oddly missing in [3], and the code uses the "rutabaga"
> > capset for DRM and virtio-gpu.
> >
> > [3]
> >
> https://github.com/google/crosvm/blob/main/devices/src/virtio/gpu/protocol.rs#L416
>
> Yes, [3] is the file that I referred to when adding these capset
> definitions. And there it's defined as gfxstream, not gfxstream_vulkan.
>
> >
> > Gurchetan, could you please clarify which capset definitions are
> > related to virtio-gpu and gfxstream: the
> > GFXSTREAM_VULKAN/GLES/MAGMA/COMPOSER ones, or just the single GFXSTREAM?


It should be GFXSTREAM_VULKAN.  The rest are more experimental and easy to
modify in terms of the enum value, should the need arise.

I imagine the virtio-spec update to reflect the GFXSTREAM to
GFXSTREAM_VULKAN change will happen eventually.
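
For reference, this is roughly how the capset list reads with the
GFXSTREAM_VULKAN naming. VIRGL/VIRGL2 are already in the header; the
VENUS/CROSS_DOMAIN/DRM values below follow the crosvm/rutabaga definitions
linked above, so treat everything beyond GFXSTREAM_VULKAN as my reading of
those sources rather than settled spec text.

#define VIRTIO_GPU_CAPSET_VIRGL            1
#define VIRTIO_GPU_CAPSET_VIRGL2           2
#define VIRTIO_GPU_CAPSET_GFXSTREAM_VULKAN 3
#define VIRTIO_GPU_CAPSET_VENUS            4
#define VIRTIO_GPU_CAPSET_CROSS_DOMAIN     5
#define VIRTIO_GPU_CAPSET_DRM              6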


> >
>
> Gurchetan, may we have your insight?
>
> Thanks,
> Ray
>
> > --
> > Best regards,
> > Dmitry
> >
>


[PATCH v3 2/2] drm/uapi: add explicit virtgpu context debug name

2023-10-18 Thread Gurchetan Singh
There are two problems with the current method of determining the
virtio-gpu debug name.

1) TASK_COMM_LEN is defined to be 16 bytes only, and this is a
   Linux kernel idiom (see PR_SET_NAME + PR_GET_NAME). Though,
   Android/FreeBSD get around this via setprogname(..)/getprogname(..)
   in libc.

   On Android, names longer than 16 bytes are common.  For example,
   one often encounters a program like "com.android.systemui".

   The virtio-gpu spec allows the debug name to be up to 64 bytes, so
   ideally userspace should be able to set debug names up to 64 bytes.

2) The current implementation determines the debug name using whatever
   task initiated virtgpu.  This could be a "RenderThread" of a
   larger program, when we actually want to propagate the debug name
   of the program.

To fix these issues, add a new CONTEXT_INIT param that allows userspace
to set the debug name when creating a context.

It takes a null-terminated C-string as the param value. The length of the
string (excluding the terminator) **should** be <= 64 bytes.  Otherwise,
the debug_name will be truncated to 64 bytes.

Link to open-source userspace:
https://android-review.googlesource.com/c/platform/hardware/google/gfxstream/+/2787176

Signed-off-by: Gurchetan Singh 
Reviewed-by: Josh Simonot 
---
Fixes suggested by Dmitry Osipenko
v2:
- Squash implementation and UAPI change into one commit
- Avoid unnecessary casts
- Use bool when necessary
v3:
- Use DEBUG_NAME_MAX_LEN - 1 when copying string
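
For reference, a minimal sketch of how userspace would set the debug name at
context-init time with this param. Only the UAPI names come from this patch;
the helper, includes, and error handling are illustrative.

#include <stdint.h>
#include <sys/ioctl.h>
#include "drm-uapi/virtgpu_drm.h"	/* virtgpu uapi header; adjust the path for your tree */

/* Pass the application's own name (e.g. from getprogname()) instead of
 * whatever thread happened to open the device node.
 */
static int example_set_debug_name(int drm_fd, const char *app_name)
{
	struct drm_virtgpu_context_set_param params[] = {
		{
			.param = VIRTGPU_CONTEXT_PARAM_DEBUG_NAME,
			/* NUL-terminated string, <= 64 bytes excluding the terminator */
			.value = (uint64_t)(uintptr_t)app_name,
		},
	};
	struct drm_virtgpu_context_init init = {
		.num_params     = 1,
		.ctx_set_params = (uint64_t)(uintptr_t)params,
	};

	return ioctl(drm_fd, DRM_IOCTL_VIRTGPU_CONTEXT_INIT, &init);
}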

 drivers/gpu/drm/virtio/virtgpu_drv.h   |  5 
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 39 ++
 include/uapi/drm/virtgpu_drm.h |  2 ++
 3 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 96365a772f77..bb7d86a0c6a1 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -58,6 +58,9 @@
 #define MAX_CAPSET_ID 63
 #define MAX_RINGS 64
 
+/* See virtio_gpu_ctx_create. One additional character for NULL terminator. */
+#define DEBUG_NAME_MAX_LEN 65
+
 struct virtio_gpu_object_params {
unsigned long size;
bool dumb;
@@ -274,6 +277,8 @@ struct virtio_gpu_fpriv {
uint64_t base_fence_ctx;
uint64_t ring_idx_mask;
struct mutex context_lock;
+   char debug_name[DEBUG_NAME_MAX_LEN];
+   bool explicit_debug_name;
 };
 
 /* virtgpu_ioctl.c */
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index 8d13b17c215b..65811e818925 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -42,12 +42,19 @@
 static void virtio_gpu_create_context_locked(struct virtio_gpu_device *vgdev,
 struct virtio_gpu_fpriv *vfpriv)
 {
-   char dbgname[TASK_COMM_LEN];
+   if (vfpriv->explicit_debug_name) {
+   virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
+ vfpriv->context_init,
+ strlen(vfpriv->debug_name),
+ vfpriv->debug_name);
+   } else {
+   char dbgname[TASK_COMM_LEN];
 
-   get_task_comm(dbgname, current);
-   virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
- vfpriv->context_init, strlen(dbgname),
- dbgname);
+   get_task_comm(dbgname, current);
+   virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
+ vfpriv->context_init, strlen(dbgname),
+ dbgname);
+   }
 
vfpriv->context_created = true;
 }
@@ -107,6 +114,9 @@ static int virtio_gpu_getparam_ioctl(struct drm_device 
*dev, void *data,
case VIRTGPU_PARAM_SUPPORTED_CAPSET_IDs:
value = vgdev->capset_id_mask;
break;
+   case VIRTGPU_PARAM_EXPLICIT_DEBUG_NAME:
+   value = vgdev->has_context_init ? 1 : 0;
+   break;
default:
return -EINVAL;
}
@@ -580,7 +590,7 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
return -EINVAL;
 
/* Number of unique parameters supported at this time. */
-   if (num_params > 3)
+   if (num_params > 4)
return -EINVAL;
 
ctx_set_params = memdup_user(u64_to_user_ptr(args->ctx_set_params),
@@ -642,6 +652,23 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
 
vfpriv->ring_idx_mask = value;
break;
+   case VIRTGPU_CONTEXT_PARAM_DEBUG_NAME:
+   if (vfpriv->explicit_debug_name) {
+   

[PATCH v3 1/2] drm/virtio: use uint64_t more in virtio_gpu_context_init_ioctl

2023-10-18 Thread Gurchetan Singh
drm_virtgpu_context_set_param defines both param and
value to be u64s.

Signed-off-by: Gurchetan Singh 
Reviewed-by: Josh Simonot 
Reviewed-by: Dmitry Osipenko 
---
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index b24b11f25197..8d13b17c215b 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -565,8 +565,8 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
 void *data, struct drm_file *file)
 {
int ret = 0;
-   uint32_t num_params, i, param, value;
-   uint64_t valid_ring_mask;
+   uint32_t num_params, i;
+   uint64_t valid_ring_mask, param, value;
size_t len;
struct drm_virtgpu_context_set_param *ctx_set_params = NULL;
struct virtio_gpu_device *vgdev = dev->dev_private;
-- 
2.42.0.655.g421f12c284-goog



[PATCH v2 2/2] drm/uapi: add explicit virtgpu context debug name

2023-10-18 Thread Gurchetan Singh
There are two problems with the current method of determining the
virtio-gpu debug name.

1) TASK_COMM_LEN is defined to be 16 bytes only, and this is a
   Linux kernel idiom (see PR_SET_NAME + PR_GET_NAME). Though,
   Android/FreeBSD get around this via setprogname(..)/getprogname(..)
   in libc.

   On Android, names longer than 16 bytes are common.  For example,
   one often encounters a program like "com.android.systemui".

   The virtio-gpu spec allows the debug name to be up to 64 bytes, so
   ideally userspace should be able to set debug names up to 64 bytes.

2) The current implementation determines the debug name using whatever
   task initiated virtgpu.  This could be a "RenderThread" of a
   larger program, when we actually want to propagate the debug name
   of the program.

To fix these issues, add a new CONTEXT_INIT param that allows userspace
to set the debug name when creating a context.

It takes a null-terminated C-string as the param value. The length of the
string (excluding the terminator) **should** be <= 64 bytes.  Otherwise,
the debug_name will be truncated to 64 bytes.

Link to open-source userspace:
https://android-review.googlesource.com/c/platform/hardware/google/gfxstream/+/2787176

Signed-off-by: Gurchetan Singh 
Reviewed-by: Josh Simonot 
---
v2: Fixes suggested by Dmitry Osipenko
- Squash implementation and UAPI change into one commit
- Avoid unnecessary casts
- Use bool when necessary
- Add case for when length of string exceeds DEBUG_NAME_MAX_LEN.

 drivers/gpu/drm/virtio/virtgpu_drv.h   |  5 +++
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 46 ++
 include/uapi/drm/virtgpu_drm.h |  2 ++
 3 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 96365a772f77..bb7d86a0c6a1 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -58,6 +58,9 @@
 #define MAX_CAPSET_ID 63
 #define MAX_RINGS 64
 
+/* See virtio_gpu_ctx_create. One additional character for NULL terminator. */
+#define DEBUG_NAME_MAX_LEN 65
+
 struct virtio_gpu_object_params {
unsigned long size;
bool dumb;
@@ -274,6 +277,8 @@ struct virtio_gpu_fpriv {
uint64_t base_fence_ctx;
uint64_t ring_idx_mask;
struct mutex context_lock;
+   char debug_name[DEBUG_NAME_MAX_LEN];
+   bool explicit_debug_name;
 };
 
 /* virtgpu_ioctl.c */
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index 8d13b17c215b..357d670361a0 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -42,12 +42,19 @@
 static void virtio_gpu_create_context_locked(struct virtio_gpu_device *vgdev,
 struct virtio_gpu_fpriv *vfpriv)
 {
-   char dbgname[TASK_COMM_LEN];
+   if (vfpriv->explicit_debug_name) {
+   virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
+ vfpriv->context_init,
+ strlen(vfpriv->debug_name),
+ vfpriv->debug_name);
+   } else {
+   char dbgname[TASK_COMM_LEN];
 
-   get_task_comm(dbgname, current);
-   virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
- vfpriv->context_init, strlen(dbgname),
- dbgname);
+   get_task_comm(dbgname, current);
+   virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
+ vfpriv->context_init, strlen(dbgname),
+ dbgname);
+   }
 
vfpriv->context_created = true;
 }
@@ -107,6 +114,9 @@ static int virtio_gpu_getparam_ioctl(struct drm_device 
*dev, void *data,
case VIRTGPU_PARAM_SUPPORTED_CAPSET_IDs:
value = vgdev->capset_id_mask;
break;
+   case VIRTGPU_PARAM_EXPLICIT_DEBUG_NAME:
+   value = vgdev->has_context_init ? 1 : 0;
+   break;
default:
return -EINVAL;
}
@@ -580,7 +590,7 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
return -EINVAL;
 
/* Number of unique parameters supported at this time. */
-   if (num_params > 3)
+   if (num_params > 4)
return -EINVAL;
 
ctx_set_params = memdup_user(u64_to_user_ptr(args->ctx_set_params),
@@ -642,6 +652,30 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
 
vfpriv->ring_idx_mask = value;
break;
+   case VIRTGPU_CONTEXT_PARAM_DEBUG_NAME:
+   if (vfpriv->explicit_debug_name) {
+   

[PATCH v2 1/2] drm/virtio: use uint64_t more in virtio_gpu_context_init_ioctl

2023-10-18 Thread Gurchetan Singh
drm_virtgpu_context_set_param defines both param and
value to be u64s.

Signed-off-by: Gurchetan Singh 
Reviewed-by: Josh Simonot 
Reviewed-by: Dmitry Osipenko 
---
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index b24b11f25197..8d13b17c215b 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -565,8 +565,8 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
 void *data, struct drm_file *file)
 {
int ret = 0;
-   uint32_t num_params, i, param, value;
-   uint64_t valid_ring_mask;
+   uint32_t num_params, i;
+   uint64_t valid_ring_mask, param, value;
size_t len;
struct drm_virtgpu_context_set_param *ctx_set_params = NULL;
struct virtio_gpu_device *vgdev = dev->dev_private;
-- 
2.42.0.655.g421f12c284-goog



[PATCH 3/3] drm/virtio: implement debug name via CONTEXT_INIT

2023-10-16 Thread Gurchetan Singh
This allows setting the debug name during CONTEXT_INIT.

Signed-off-by: Gurchetan Singh 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h   |  4 +++
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 38 ++
 2 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 96365a772f77..c0702d630e05 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -58,6 +58,8 @@
 #define MAX_CAPSET_ID 63
 #define MAX_RINGS 64
 
+#define DEBUG_NAME_MAX_LEN 64
+
 struct virtio_gpu_object_params {
unsigned long size;
bool dumb;
@@ -274,6 +276,8 @@ struct virtio_gpu_fpriv {
uint64_t base_fence_ctx;
uint64_t ring_idx_mask;
struct mutex context_lock;
+   char debug_name[DEBUG_NAME_MAX_LEN];
+   char explicit_debug_name;
 };
 
 /* virtgpu_ioctl.c */
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index 8d13b17c215b..4d6d44a4c899 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -42,12 +42,19 @@
 static void virtio_gpu_create_context_locked(struct virtio_gpu_device *vgdev,
 struct virtio_gpu_fpriv *vfpriv)
 {
-   char dbgname[TASK_COMM_LEN];
+   if (vfpriv->explicit_debug_name) {
+   virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
+ vfpriv->context_init,
+ strlen(vfpriv->debug_name),
+ vfpriv->debug_name);
+   } else {
+   char dbgname[TASK_COMM_LEN];
 
-   get_task_comm(dbgname, current);
-   virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
- vfpriv->context_init, strlen(dbgname),
- dbgname);
+   get_task_comm(dbgname, current);
+   virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
+ vfpriv->context_init, strlen(dbgname),
+ dbgname);
+   }
 
vfpriv->context_created = true;
 }
@@ -107,6 +114,9 @@ static int virtio_gpu_getparam_ioctl(struct drm_device 
*dev, void *data,
case VIRTGPU_PARAM_SUPPORTED_CAPSET_IDs:
value = vgdev->capset_id_mask;
break;
+   case VIRTGPU_PARAM_EXPLICIT_DEBUG_NAME:
+   value = vgdev->has_context_init ? 1 : 0;
+   break;
default:
return -EINVAL;
}
@@ -580,7 +590,7 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
return -EINVAL;
 
/* Number of unique parameters supported at this time. */
-   if (num_params > 3)
+   if (num_params > 4)
return -EINVAL;
 
ctx_set_params = memdup_user(u64_to_user_ptr(args->ctx_set_params),
@@ -642,6 +652,22 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
 
vfpriv->ring_idx_mask = value;
break;
+   case VIRTGPU_CONTEXT_PARAM_DEBUG_NAME:
+   if (vfpriv->explicit_debug_name) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   ret = strncpy_from_user(vfpriv->debug_name,
+   (const char __user *)u64_to_user_ptr(value),
+   DEBUG_NAME_MAX_LEN);
+   if (ret < 0) {
+   ret = -EFAULT;
+   goto out_unlock;
+   }
+
+   vfpriv->explicit_debug_name = true;
+   break;
default:
ret = -EINVAL;
goto out_unlock;
-- 
2.42.0.655.g421f12c284-goog



[PATCH 2/3] drm/uapi: add explicit virtgpu context debug name

2023-10-16 Thread Gurchetan Singh
There are two problems with the current method of determining the
virtio-gpu debug name.

1) TASK_COMM_LEN is defined to be 16 bytes only, and this is a
   Linux kernel idiom (see PR_SET_NAME + PR_GET_NAME). Though,
   Android/FreeBSD get around this via setprogname(..)/getprogname(..)
   in libc.

   On Android, names longer than 16 bytes are common.  For example,
   one often encounters a program like "com.android.systemui".

   The virtio-gpu spec allows the debug name to be up to 64 bytes, so
   ideally userspace should be able to set debug names up to 64 bytes.

2) The current implementation determines the debug name using whatever
   task initiated virtgpu.  This could be a "RenderThread" of a
   larger program, when we actually want to propagate the debug name
   of the program.

To fix these issues, add a new CONTEXT_INIT param that allows userspace
to set the debug name when creating a context.

It takes a null-terminated C-string as the param value.

Link to open-source userspace:
https://android-review.googlesource.com/c/platform/hardware/google/gfxstream/+/2787176

Signed-off-by: Gurchetan Singh 
---
 include/uapi/drm/virtgpu_drm.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/drm/virtgpu_drm.h b/include/uapi/drm/virtgpu_drm.h
index b1d0e56565bc..c2ce71987e9b 100644
--- a/include/uapi/drm/virtgpu_drm.h
+++ b/include/uapi/drm/virtgpu_drm.h
@@ -97,6 +97,7 @@ struct drm_virtgpu_execbuffer {
 #define VIRTGPU_PARAM_CROSS_DEVICE 5 /* Cross virtio-device resource sharing  
*/
 #define VIRTGPU_PARAM_CONTEXT_INIT 6 /* DRM_VIRTGPU_CONTEXT_INIT */
 #define VIRTGPU_PARAM_SUPPORTED_CAPSET_IDs 7 /* Bitmask of supported 
capability set ids */
+#define VIRTGPU_PARAM_EXPLICIT_DEBUG_NAME 8 /* Ability to set debug name from 
userspace */
 
 struct drm_virtgpu_getparam {
__u64 param;
@@ -198,6 +199,7 @@ struct drm_virtgpu_resource_create_blob {
 #define VIRTGPU_CONTEXT_PARAM_CAPSET_ID   0x0001
 #define VIRTGPU_CONTEXT_PARAM_NUM_RINGS   0x0002
 #define VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK 0x0003
+#define VIRTGPU_CONTEXT_PARAM_DEBUG_NAME  0x0004
 struct drm_virtgpu_context_set_param {
__u64 param;
__u64 value;
-- 
2.42.0.655.g421f12c284-goog



[PATCH 1/3] drm/virtio: use uint64_t more in virtio_gpu_context_init_ioctl

2023-10-16 Thread Gurchetan Singh
drm_virtgpu_context_set_param defines both param and
value to be u64s.

Signed-off-by: Gurchetan Singh 
---
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index b24b11f25197..8d13b17c215b 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -565,8 +565,8 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
 void *data, struct drm_file *file)
 {
int ret = 0;
-   uint32_t num_params, i, param, value;
-   uint64_t valid_ring_mask;
+   uint32_t num_params, i;
+   uint64_t valid_ring_mask, param, value;
size_t len;
struct drm_virtgpu_context_set_param *ctx_set_params = NULL;
struct virtio_gpu_device *vgdev = dev->dev_private;
-- 
2.42.0.655.g421f12c284-goog



Re: [PATCH v6 0/3] Add sync object UAPI support to VirtIO-GPU driver

2023-07-28 Thread Gurchetan Singh
On Wed, Jul 19, 2023 at 11:58 AM Dmitry Osipenko <
dmitry.osipe...@collabora.com> wrote:

> 27.06.2023 20:16, Rob Clark пишет:
> ...
> >> Now these are just suggestions, and while I think they are good, you
> can safely ignore them.
> >>
> >> But there's also the DRM requirements, which state "userspace side must
> be fully reviewed and tested to the standards of that user-space
> project.".  So I think to meet the minimum requirements, I think we should
> at-least have one of the following (not all, just one) reviewed:
> >>
> >> 1) venus using the new uapi
> >> 2) gfxstream vk using the new uapi
> >> 3) amdgpu nctx out of "draft" mode and using the new uapi.
> >> 4) virtio-intel using new uapi
> >> 5) turnip using your new uapi
> >
> > forgot to mention this earlier, but
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23533
> >
> > Dmitry, you can also add, if you haven't already:
> >
> > Tested-by: Rob Clark 
>
> Gurchetan, Turnip Mesa virtio support is ready to be merged upstream,
> it's using this new syncobj UAPI. Could you please give yours r-b if you
> don't have objections?
>

Given that the Turnip native context has been reviewed using this UAPI, your
change now meets the requirements and is ready to merge.

One thing I noticed is you might need explicit padding between
`num_out_syncobjs` and `in_syncobjs`.  Otherwise, feel free to add my
acked-by.
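
For what it's worth, the padding concern is the usual UAPI alignment rule; a
generic sketch (not the actual virtgpu struct layout) of what explicit
padding looks like:

  #include <linux/types.h>

  /* Illustrative only: if a __u64 member follows an odd number of __u32
   * fields, 64-bit builds insert 4 bytes of implicit padding while 32-bit
   * x86 does not, so the layouts diverge.  An explicit pad keeps the
   * struct identical across ABIs and avoids uninitialized holes. */
  struct example_execbuffer_syncobjs {
          __u32 num_in_syncobjs;
          __u32 num_out_syncobjs;
          __u32 syncobj_stride;
          __u32 pad;            /* explicit, must be zero */
          __u64 in_syncobjs;    /* pointer to syncobj descriptor array */
          __u64 out_syncobjs;
  };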


>
> --
> Best regards,
> Dmitry
>
>


[PATCH v4] drm/virtio: conditionally allocate virtio_gpu_fence

2023-07-07 Thread Gurchetan Singh
We don't want to create a fence for every command submission.  It's
only necessary when userspace provides a waitable token for submission.
This could be:

1) bo_handles, to be used with VIRTGPU_WAIT
2) out_fence_fd, to be used with dma_fence apis
3) a ring_idx provided with VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK
   + DRM event API
4) syncobjs in the future

The use case is just submitting a command to the host and expecting
no response.  For example, gfxstream has GFXSTREAM_CONTEXT_PING that
just wakes up the host side worker threads.  There's also
CROSS_DOMAIN_CMD_SEND which just sends data to the Wayland server.

This prevents the need to signal the automatically created
virtio_gpu_fence.

In addition, VIRTGPU_EXECBUF_RING_IDX is checked when creating a
DRM event object.  VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK is
already defined in terms of per-context rings.  It was theoretically
possible to create a DRM event on the global timeline (ring_idx == 0),
if the context enabled DRM event polling.  However, that wouldn't
work, and userspace (Sommelier) doesn't do that.  Explicitly disallow it
for clarity.
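
For illustration (a hypothetical userspace sketch, not part of this patch),
a fire-and-forget submission that would now skip fence allocation looks
roughly like:

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <xf86drm.h>
  #include <drm/virtgpu_drm.h>

  /* No FENCE_FD_OUT, no RING_IDX DRM event, no bo_handles to wait on via
   * VIRTGPU_WAIT, so the kernel no longer allocates a virtio_gpu_fence. */
  static int submit_no_fence(int drm_fd, void *cmd, uint32_t cmd_size)
  {
          struct drm_virtgpu_execbuffer exbuf = {
                  .flags = 0,
                  .size = cmd_size,
                  .command = (uintptr_t)cmd,
                  .num_bo_handles = 0,
          };

          return drmIoctl(drm_fd, DRM_IOCTL_VIRTGPU_EXECBUFFER, &exbuf);
  }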

Signed-off-by: Gurchetan Singh 
---
 v2: Fix indent (Dmitry)
 v3: Refactor drm fence event checks to avoid possible NULL deref (Dmitry)
 v4: More detailed commit message about additional drm fence event checks (Dmitry)

 drivers/gpu/drm/virtio/virtgpu_submit.c | 28 +
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_submit.c 
b/drivers/gpu/drm/virtio/virtgpu_submit.c
index cf3c04b16a7a..004364cf86d7 100644
--- a/drivers/gpu/drm/virtio/virtgpu_submit.c
+++ b/drivers/gpu/drm/virtio/virtgpu_submit.c
@@ -64,13 +64,9 @@ static int virtio_gpu_fence_event_create(struct drm_device 
*dev,
 struct virtio_gpu_fence *fence,
 u32 ring_idx)
 {
-   struct virtio_gpu_fpriv *vfpriv = file->driver_priv;
struct virtio_gpu_fence_event *e = NULL;
int ret;
 
-   if (!(vfpriv->ring_idx_mask & BIT_ULL(ring_idx)))
-   return 0;
-
e = kzalloc(sizeof(*e), GFP_KERNEL);
if (!e)
return -ENOMEM;
@@ -161,21 +157,27 @@ static int virtio_gpu_init_submit(struct 
virtio_gpu_submit *submit,
  struct drm_file *file,
  u64 fence_ctx, u32 ring_idx)
 {
+   int err;
+   struct virtio_gpu_fence *out_fence;
struct virtio_gpu_fpriv *vfpriv = file->driver_priv;
struct virtio_gpu_device *vgdev = dev->dev_private;
-   struct virtio_gpu_fence *out_fence;
-   int err;
+   bool drm_fence_event = (exbuf->flags & VIRTGPU_EXECBUF_RING_IDX) &&
+  (vfpriv->ring_idx_mask & BIT_ULL(ring_idx));
 
memset(submit, 0, sizeof(*submit));
 
-   out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx);
-   if (!out_fence)
-   return -ENOMEM;
+   if ((exbuf->flags & VIRTGPU_EXECBUF_FENCE_FD_OUT) || drm_fence_event ||
+exbuf->num_bo_handles)
+   out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx);
+   else
+   out_fence = NULL;
 
-   err = virtio_gpu_fence_event_create(dev, file, out_fence, ring_idx);
-   if (err) {
-   dma_fence_put(&out_fence->f);
-   return err;
+   if (drm_fence_event) {
+   err = virtio_gpu_fence_event_create(dev, file, out_fence, ring_idx);
+   if (err) {
+   dma_fence_put(&out_fence->f);
+   return err;
+   }
}
 
submit->out_fence = out_fence;
-- 
2.41.0.255.g8b1d071c50-goog



Re: [PATCH v3] drm/virtio: conditionally allocate virtio_gpu_fence

2023-07-07 Thread Gurchetan Singh
On Fri, Jul 7, 2023 at 10:35 AM Dmitry Osipenko
 wrote:
>
> On 7/7/23 20:04, Dmitry Osipenko wrote:
> > On 7/7/23 18:43, Gurchetan Singh wrote:
> >> @@ -161,21 +157,27 @@ static int virtio_gpu_init_submit(struct 
> >> virtio_gpu_submit *submit,
> >>struct drm_file *file,
> >>u64 fence_ctx, u32 ring_idx)
> >>  {
> >> +int err;
> >> +struct virtio_gpu_fence *out_fence;
> >>  struct virtio_gpu_fpriv *vfpriv = file->driver_priv;
> >>  struct virtio_gpu_device *vgdev = dev->dev_private;
> >> -struct virtio_gpu_fence *out_fence;
> >> -int err;
> >> +bool drm_fence_event = (exbuf->flags & VIRTGPU_EXECBUF_RING_IDX) &&
> >> +   (vfpriv->ring_idx_mask & BIT_ULL(ring_idx));
> >
> > Previously, when VIRTGPU_EXECBUF_RING_IDX flag wasn't specified, the
> > fence event was created for a default ring_idx=0. Now you changed this
> > behaviour and event will never be created without
> > VIRTGPU_EXECBUF_RING_IDX flag being set.

ring_idx = 0 is fine, but without VIRTGPU_EXECBUF_RING_IDX that means
the global timeline.

It's an additional check for the case where userspace says it wants
to use per-context fencing and polling, but actually submits on the global
timeline.  Userspace never does this since it wouldn't work, so it's more
of a pathological edge-case check than any UAPI change.

> >
> > Could you please point me at the corresponding userspace code that polls
> > DRM FD fence event?

https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/vm_tools/sommelier/virtualization/virtgpu_channel.cc#216

Used with the following flow:

https://crosvm.dev/book/devices/wayland.html

If you wish to test, please do apply this change:

https://chromium-review.googlesource.com/c/chromiumos/platform2/+/4605854

> >
> > It's unclear whether there is a possible userspace regression here or
> > not. If there is no regression, then in general such behavioural changes
> > should be done in a separate commit having detailed description
> > explaining why behaviour changes.

Sommelier isn't formally packaged in the Linux distro style yet, and it
always specifies RING_IDX when polling, so there are no regressions here.
Maybe a separate commit is overkill (since the 2nd commit would just delete
the newly added checks); what about more detail in the commit message
instead?

>
> I see that venus does the polling and ring_idx_mask is a
> context-specific param, hence it's irrelevant to a generic ctx 0. Still
> it's now necessary to specify the EXECBUF_RING_IDX flag even if ctx has
> one ring, which is UAPI change.

It doesn't seem like venus enables POLL_RINGS_MASK to poll since that
param is zero?

https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/virtio/vulkan/vn_renderer_virtgpu.c#L617



>
> --
> Best regards,
> Dmitry
>


[PATCH v3] drm/virtio: conditionally allocate virtio_gpu_fence

2023-07-07 Thread Gurchetan Singh
We don't want to create a fence for every command submission.  It's
only necessary when userspace provides a waitable token for submission.
This could be:

1) bo_handles, to be used with VIRTGPU_WAIT
2) out_fence_fd, to be used with dma_fence apis
3) a ring_idx provided with VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK
   + DRM event API
4) syncobjs in the future

The use case is just submitting a command to the host and expecting
no response.  For example, gfxstream has GFXSTREAM_CONTEXT_PING that
just wakes up the host side worker threads.  There's also
CROSS_DOMAIN_CMD_SEND which just sends data to the Wayland server.

This prevents the need to signal the automatically created
virtio_gpu_fence.

Signed-off-by: Gurchetan Singh 
---
 v2: Fix indent (Dmitry)
 v3: Refactor drm fence event checks to avoid possible NULL deref (Dmitry)

 drivers/gpu/drm/virtio/virtgpu_submit.c | 28 +
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_submit.c 
b/drivers/gpu/drm/virtio/virtgpu_submit.c
index cf3c04b16a7a..004364cf86d7 100644
--- a/drivers/gpu/drm/virtio/virtgpu_submit.c
+++ b/drivers/gpu/drm/virtio/virtgpu_submit.c
@@ -64,13 +64,9 @@ static int virtio_gpu_fence_event_create(struct drm_device 
*dev,
 struct virtio_gpu_fence *fence,
 u32 ring_idx)
 {
-   struct virtio_gpu_fpriv *vfpriv = file->driver_priv;
struct virtio_gpu_fence_event *e = NULL;
int ret;
 
-   if (!(vfpriv->ring_idx_mask & BIT_ULL(ring_idx)))
-   return 0;
-
e = kzalloc(sizeof(*e), GFP_KERNEL);
if (!e)
return -ENOMEM;
@@ -161,21 +157,27 @@ static int virtio_gpu_init_submit(struct 
virtio_gpu_submit *submit,
  struct drm_file *file,
  u64 fence_ctx, u32 ring_idx)
 {
+   int err;
+   struct virtio_gpu_fence *out_fence;
struct virtio_gpu_fpriv *vfpriv = file->driver_priv;
struct virtio_gpu_device *vgdev = dev->dev_private;
-   struct virtio_gpu_fence *out_fence;
-   int err;
+   bool drm_fence_event = (exbuf->flags & VIRTGPU_EXECBUF_RING_IDX) &&
+  (vfpriv->ring_idx_mask & BIT_ULL(ring_idx));
 
memset(submit, 0, sizeof(*submit));
 
-   out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx);
-   if (!out_fence)
-   return -ENOMEM;
+   if ((exbuf->flags & VIRTGPU_EXECBUF_FENCE_FD_OUT) || drm_fence_event ||
+exbuf->num_bo_handles)
+   out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx);
+   else
+   out_fence = NULL;
 
-   err = virtio_gpu_fence_event_create(dev, file, out_fence, ring_idx);
-   if (err) {
-   dma_fence_put(&out_fence->f);
-   return err;
+   if (drm_fence_event) {
+   err = virtio_gpu_fence_event_create(dev, file, out_fence, ring_idx);
+   if (err) {
+   dma_fence_put(&out_fence->f);
+   return err;
+   }
}
 
submit->out_fence = out_fence;
-- 
2.41.0.255.g8b1d071c50-goog



Re: [PATCH v2] drm/virtio: conditionally allocate virtio_gpu_fence

2023-07-05 Thread Gurchetan Singh
On Wed, Jun 28, 2023 at 8:58 AM Gurchetan Singh
 wrote:
>
> We don't want to create a fence for every command submission.  It's
> only necessary when userspace provides a waitable token for submission.
> This could be:
>
> 1) bo_handles, to be used with VIRTGPU_WAIT
> 2) out_fence_fd, to be used with dma_fence apis
> 3) a ring_idx provided with VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK
>+ DRM event API
> 4) syncobjs in the future
>
> The use case for just submitting a command to the host, and expected
> no response.  For example, gfxstream has GFXSTREAM_CONTEXT_PING that
> just wakes up the host side worker threads.  There's also
> CROSS_DOMAIN_CMD_SEND which just sends data to the Wayland server.
>
> This prevents the need to signal the automatically created
> virtio_gpu_fence.
>
> Signed-off-by: Gurchetan Singh 
> Reviewed-by: 
> ---
>  v2: Fix indent (Dmitry)
>
>  drivers/gpu/drm/virtio/virtgpu_submit.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_submit.c 
> b/drivers/gpu/drm/virtio/virtgpu_submit.c
> index cf3c04b16a7a..8c7e15c31164 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_submit.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_submit.c
> @@ -168,9 +168,13 @@ static int virtio_gpu_init_submit(struct 
> virtio_gpu_submit *submit,
>
> memset(submit, 0, sizeof(*submit));
>
> -   out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx);
> -   if (!out_fence)
> -   return -ENOMEM;
> +   if ((exbuf->flags & VIRTGPU_EXECBUF_FENCE_FD_OUT) ||
> +   ((exbuf->flags & VIRTGPU_EXECBUF_RING_IDX) &&
> +   (vfpriv->ring_idx_mask & BIT_ULL(ring_idx))) ||
> +   exbuf->num_bo_handles)
> +   out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, 
> ring_idx);
> +   else
> +   out_fence = NULL;
>
> err = virtio_gpu_fence_event_create(dev, file, out_fence, ring_idx);
> if (err) {
> --

Ping for additional reviews or merge.

> 2.41.0.162.gfafddb0af9-goog
>


Re: [PATCH] drm/virtio: conditionally allocate virtio_gpu_fence

2023-06-28 Thread Gurchetan Singh
On Mon, Jun 19, 2023 at 3:02 PM Dmitry Osipenko <
dmitry.osipe...@collabora.com> wrote:

> On 6/13/23 20:43, Gurchetan Singh wrote:
> > We don't want to create a fence for every command submission.  It's
> > only necessary when userspace provides a waitable token for submission.
> > This could be:
> >
> > 1) bo_handles, to be used with VIRTGPU_WAIT
> > 2) out_fence_fd, to be used with dma_fence apis
> > 3) a ring_idx provided with VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK
> >+ DRM event API
> > 4) syncobjs in the future
> >
> > The use case for just submitting a command to the host, and expecting
> > no response.  For example, gfxstream has GFXSTREAM_CONTEXT_PING that
> > just wakes up the host side worker threads.  There's also
> > CROSS_DOMAIN_CMD_SEND which just sends data to the Wayland server.
> >
> > This prevents the need to signal the automatically created
> > virtio_gpu_fence.
> >
> > Signed-off-by: Gurchetan Singh 
> > ---
> >  drivers/gpu/drm/virtio/virtgpu_submit.c | 10 +++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/virtio/virtgpu_submit.c
> b/drivers/gpu/drm/virtio/virtgpu_submit.c
> > index cf3c04b16a7a..add106c06ab2 100644
> > --- a/drivers/gpu/drm/virtio/virtgpu_submit.c
> > +++ b/drivers/gpu/drm/virtio/virtgpu_submit.c
> > @@ -168,9 +168,13 @@ static int virtio_gpu_init_submit(struct
> virtio_gpu_submit *submit,
> >
> >   memset(submit, 0, sizeof(*submit));
> >
> > - out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx);
> > - if (!out_fence)
> > - return -ENOMEM;
> > + if ((exbuf->flags & VIRTGPU_EXECBUF_FENCE_FD_OUT) ||
> > + ((exbuf->flags & VIRTGPU_EXECBUF_RING_IDX) &&
> > + (vfpriv->ring_idx_mask & BIT_ULL(ring_idx))) ||
> > + exbuf->num_bo_handles)
> > + out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx,
> ring_idx);
> > + else
> > + out_fence = NULL;
> >
> >   err = virtio_gpu_fence_event_create(dev, file, out_fence,
> ring_idx);
> >   if (err) {
>
> Looks okay, code indentation may be improved a tad to make it more
> eye-friendly:
>
> +   if ((exbuf->flags & VIRTGPU_EXECBUF_FENCE_FD_OUT) ||
> +  ((exbuf->flags & VIRTGPU_EXECBUF_RING_IDX) &&
> (vfpriv->ring_idx_mask & BIT_ULL(ring_idx))) ||
> +exbuf->num_bo_handles)
> +   out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx,
> ring_idx);
> +   else
> +   out_fence = NULL;
>
> Checkpatch will complain about this variant, but the complaint can be
> ignored in this case.
>

Added you r-b and fixed indent in v2.


>
> --
> Best regards,
> Dmitry
>
>


[PATCH v2] drm/virtio: conditionally allocate virtio_gpu_fence

2023-06-28 Thread Gurchetan Singh
We don't want to create a fence for every command submission.  It's
only necessary when userspace provides a waitable token for submission.
This could be:

1) bo_handles, to be used with VIRTGPU_WAIT
2) out_fence_fd, to be used with dma_fence apis
3) a ring_idx provided with VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK
   + DRM event API
4) syncobjs in the future

The use case is just submitting a command to the host and expecting
no response.  For example, gfxstream has GFXSTREAM_CONTEXT_PING that
just wakes up the host side worker threads.  There's also
CROSS_DOMAIN_CMD_SEND which just sends data to the Wayland server.

This prevents the need to signal the automatically created
virtio_gpu_fence.

Signed-off-by: Gurchetan Singh 
Reviewed-by: 
---
 v2: Fix indent (Dmitry)

 drivers/gpu/drm/virtio/virtgpu_submit.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_submit.c 
b/drivers/gpu/drm/virtio/virtgpu_submit.c
index cf3c04b16a7a..8c7e15c31164 100644
--- a/drivers/gpu/drm/virtio/virtgpu_submit.c
+++ b/drivers/gpu/drm/virtio/virtgpu_submit.c
@@ -168,9 +168,13 @@ static int virtio_gpu_init_submit(struct virtio_gpu_submit 
*submit,
 
memset(submit, 0, sizeof(*submit));
 
-   out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx);
-   if (!out_fence)
-   return -ENOMEM;
+   if ((exbuf->flags & VIRTGPU_EXECBUF_FENCE_FD_OUT) ||
+   ((exbuf->flags & VIRTGPU_EXECBUF_RING_IDX) &&
+   (vfpriv->ring_idx_mask & BIT_ULL(ring_idx))) ||
+   exbuf->num_bo_handles)
+   out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx);
+   else
+   out_fence = NULL;
 
err = virtio_gpu_fence_event_create(dev, file, out_fence, ring_idx);
if (err) {
-- 
2.41.0.162.gfafddb0af9-goog



[PATCH] drm/virtio: conditionally allocate virtio_gpu_fence

2023-06-13 Thread Gurchetan Singh
We don't want to create a fence for every command submission.  It's
only necessary when userspace provides a waitable token for submission.
This could be:

1) bo_handles, to be used with VIRTGPU_WAIT
2) out_fence_fd, to be used with dma_fence apis
3) a ring_idx provided with VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK
   + DRM event API
4) syncobjs in the future

The use case is just submitting a command to the host and expecting
no response.  For example, gfxstream has GFXSTREAM_CONTEXT_PING that
just wakes up the host side worker threads.  There's also
CROSS_DOMAIN_CMD_SEND which just sends data to the Wayland server.

This prevents the need to signal the automatically created
virtio_gpu_fence.

Signed-off-by: Gurchetan Singh 
---
 drivers/gpu/drm/virtio/virtgpu_submit.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_submit.c 
b/drivers/gpu/drm/virtio/virtgpu_submit.c
index cf3c04b16a7a..add106c06ab2 100644
--- a/drivers/gpu/drm/virtio/virtgpu_submit.c
+++ b/drivers/gpu/drm/virtio/virtgpu_submit.c
@@ -168,9 +168,13 @@ static int virtio_gpu_init_submit(struct virtio_gpu_submit 
*submit,
 
memset(submit, 0, sizeof(*submit));
 
-   out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx);
-   if (!out_fence)
-   return -ENOMEM;
+   if ((exbuf->flags & VIRTGPU_EXECBUF_FENCE_FD_OUT) ||
+   ((exbuf->flags & VIRTGPU_EXECBUF_RING_IDX) &&
+   (vfpriv->ring_idx_mask & BIT_ULL(ring_idx))) ||
+   exbuf->num_bo_handles)
+   out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx);
+   else
+   out_fence = NULL;
 
err = virtio_gpu_fence_event_create(dev, file, out_fence, ring_idx);
if (err) {
-- 
2.34.1



Re: [PATCH v6 0/3] Add sync object UAPI support to VirtIO-GPU driver

2023-05-12 Thread Gurchetan Singh
On Thu, May 11, 2023 at 7:33 PM Dmitry Osipenko <
dmitry.osipe...@collabora.com> wrote:

> On 5/12/23 03:17, Gurchetan Singh wrote:
> ...
> > Can we get one of the Mesa MRs reviewed first?  There's currently no
> > virtio-intel MR AFAICT, and the amdgpu one is marked as "Draft:".
> >
> > Even for the amdgpu, Pierre suggests the feature "will be marked as
> > experimental both in Mesa and virglrenderer" and we can revise as needed.
> > The DRM requirements seem to warn against adding an UAPI too hastily...
> >
> > You can get the deqp-vk 1.2 tests to pass with the current UAPI, if you
> > just change your mesa <--> virglrenderer protocol a little.  Perhaps that
> > way is even better, since you plumb the in sync-obj into host-side
> command
> > submission.
> >
> > Without inter-context sharing of the fence, this MR really only adds
> guest
> > kernel syntactic sugar.
> >
> > Note I'm not against syntactic sugar, but I just want to point out that
> you
> > can likely merge the native context work without any UAPI changes, in
> case
> > it's not clear.
> >
> > If this was for the core drm syncobj implementation, and not just
> >> driver ioctl parsing and wiring up the core helpers, I would agree
> >> with you.
> >>
> >
> > There are several possible and viable paths to get the features in
> question
> > (VK1.2 syncobjs, and inter-context fence sharing).  There are paths
> > entirely without the syncobj, paths that only use the syncobj for the
> > inter-context fence sharing case and create host syncobjs for VK1.2,
> paths
> > that also use guest syncobjs in every proxied command submission.
> >
> > It's really hard to tell which one is better.  Here's my suggestion:
> >
> > 1) Get the native contexts reviewed/merged in Mesa/virglrenderer using
> the
> > current UAPI.  Options for VK1.2 include: pushing down the syncobjs to
> the
> > host, and simulating the syncobj (as already done).  It's fine to mark
> > these contexts as "experimental" like msm-experimental.  That will allow
> > you to experiment with the protocols, come up with tests, and hopefully
> > determine an answer to the host versus guest syncobj question.
> >
> > 2) Once you've completed (1), try to add UAPI changes for features that
> are
> > missing or things that are suboptimal with the knowledge gained from
> doing
> > (2).
> >
> > WDYT?
>
> Having syncobj support available by DRM driver is a mandatory
> requirement for native contexts because userspace (Mesa) relies on sync
> objects support presence. In particular, Intel Mesa driver checks
> whether DRM driver supports sync objects to decide which features are
> available, ANV depends on the syncobj support.


> I'm not familiar with a history of Venus and its limitations. Perhaps
> the reason it's using host-side syncobjs is to have 1:1 Vulkan API
> mapping between guest and host. Not sure if Venus could use guest
> syncobjs instead or there are problems with that.
>

Why not submit a Venus MR?  It's already in-tree, and you can see how your
API works in scenarios with a host side timeline semaphore (aka syncobj).
I think they are also interested in fencing/sync improvements.


>
> When syncobj was initially added to kernel, it was done from the needs
> of supporting Vulkan wait API. For Venus the actual Vulkan driver is on
> host side, while for native contexts it's on guest side. Native contexts
> don't need syncobj on host side, it will be unnecessary overhead for
> every nctx to have it on host. Hence, if there is no good reason for
> host-side syncobjs, then why do that?


Depends on your threading model.  You can have the following scenarios:

1) N guest contexts : 1 host thread
2) N guest contexts : N host threads for each context
3) 1:1 thread

I think the native context is single-threaded (1), IIRC?  If the goal is to
push command submission to the host (for inter-context sharing), I think
you'll at least want (2).  For a 1:1 model (a la gfxstream), one host
thread can put another thread's out_sync_objs as its in_sync_objs (in the
same virtgpu context).  I think that's kind of the goal of timeline
semaphores, with the example given by Khronos being a compute thread + a
graphics thread.

I'm not saying one threading model is better than any other, perhaps the
native context using the host driver in the guest is so good, it doesn't
matter.  I'm just saying these are the types of discussions we can have if
we tried to get one the Mesa MRs merged first ;-)


> Native contexts pass deqp

Re: [PATCH v6 0/3] Add sync object UAPI support to VirtIO-GPU driver

2023-05-11 Thread Gurchetan Singh
On Mon, May 8, 2023 at 6:59 AM Rob Clark  wrote:

> On Wed, May 3, 2023 at 10:07 AM Gurchetan Singh
>  wrote:
> >
> >
> >
> > On Mon, May 1, 2023 at 8:38 AM Dmitry Osipenko <
> dmitry.osipe...@collabora.com> wrote:
> >>
> >> On 4/16/23 14:52, Dmitry Osipenko wrote:
> >> > We have multiple Vulkan context types that are awaiting for the
> addition
> >> > of the sync object DRM UAPI support to the VirtIO-GPU kernel driver:
> >> >
> >> >  1. Venus context
> >> >  2. Native contexts (virtio-freedreno, virtio-intel, virtio-amdgpu)
> >> >
> >> > Mesa core supports DRM sync object UAPI, providing Vulkan drivers
> with a
> >> > generic fencing implementation that we want to utilize.
> >> >
> >> > This patch adds initial sync objects support. It creates fundament
> for a
> >> > further fencing improvements. Later on we will want to extend the
> VirtIO-GPU
> >> > fencing API with passing fence IDs to host for waiting, it will be a
> new
> >> > additional VirtIO-GPU IOCTL and more. Today we have several
> VirtIO-GPU context
> >> > drivers in works that require VirtIO-GPU to support sync objects UAPI.
> >> >
> >> > The patch is heavily inspired by the sync object UAPI implementation
> of the
> >> > MSM driver.
> >>
> >> Gerd, do you have any objections to merging this series?
> >>
> >> We have AMDGPU [1] and Intel [2] native context WIP drivers depending on
> >> the sync object support. It is the only part missing from kernel today
> >> that is wanted by the native context drivers. Otherwise, there are few
> >> other things in Qemu and virglrenderer left to sort out.
> >>
> >> [1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658
> >> [2]
> https://gitlab.freedesktop.org/digetx/mesa/-/commits/native-context-iris
> >
> >
> > I'm not saying this change isn't good, just it's probably possible to
> implement the native contexts (even up to even VK1.2) without it.  But this
> patch series may be the most ergonomic way to do it, given how Mesa is
> designed.  But you probably want one of Mesa MRs reviewed first before
> merging (I added a comment on the amdgpu change) and that is a requirement
> [a].
> >
> > [a] "The userspace side must be fully reviewed and tested to the
> standards of that user space project. For e.g. mesa this means piglit
> testcases and review on the mailing list. This is again to ensure that the
> new interface actually gets the job done." -- from the requirements
> >
>
> tbh, the syncobj support is all drm core, the only driver specifics is
> the ioctl parsing.  IMHO existing tests and the two existing consumers
> are sufficient.  (Also, considering that additional non-drm
> dependencies involved.)
>

Can we get one of the Mesa MRs reviewed first?  There's currently no
virtio-intel MR AFAICT, and the amdgpu one is marked as "Draft:".

Even for the amdgpu, Pierre suggests the feature "will be marked as
experimental both in Mesa and virglrenderer" and we can revise as needed.
The DRM requirements seem to warn against adding a UAPI too hastily...

You can get the deqp-vk 1.2 tests to pass with the current UAPI, if you
just change your mesa <--> virglrenderer protocol a little.  Perhaps that
way is even better, since you plumb the in sync-obj into host-side command
submission.

Without inter-context sharing of the fence, this MR really only adds guest
kernel syntactic sugar.

Note I'm not against syntactic sugar, but I just want to point out that you
can likely merge the native context work without any UAPI changes, in case
it's not clear.

If this was for the core drm syncobj implementation, and not just
> driver ioctl parsing and wiring up the core helpers, I would agree
> with you.
>

There are several possible and viable paths to get the features in question
(VK1.2 syncobjs, and inter-context fence sharing).  There are paths
entirely without the syncobj, paths that only use the syncobj for the
inter-context fence sharing case and create host syncobjs for VK1.2, paths
that also use guest syncobjs in every proxied command submission.

It's really hard to tell which one is better.  Here's my suggestion:

1) Get the native contexts reviewed/merged in Mesa/virglrenderer using the
current UAPI.  Options for VK1.2 include: pushing down the syncobjs to the
host, and simulating the syncobj (as already done).  It's fine to mark
these contexts as "experimental" like msm-experimental.  That will allow
you to experiment with the protocols, come up with tests, and hopefully
determine an answer to the host versus guest syncobj question.

2) Once you've completed (1), try to add UAPI changes for features that are
missing or things that are suboptimal, with the knowledge gained from doing
(1).

WDYT?


>
> BR,
> -R
>


Re: [PATCH v6 0/3] Add sync object UAPI support to VirtIO-GPU driver

2023-05-03 Thread Gurchetan Singh
On Mon, May 1, 2023 at 8:38 AM Dmitry Osipenko <
dmitry.osipe...@collabora.com> wrote:

> On 4/16/23 14:52, Dmitry Osipenko wrote:
> > We have multiple Vulkan context types that are awaiting for the addition
> > of the sync object DRM UAPI support to the VirtIO-GPU kernel driver:
> >
> >  1. Venus context
> >  2. Native contexts (virtio-freedreno, virtio-intel, virtio-amdgpu)
> >
> > Mesa core supports DRM sync object UAPI, providing Vulkan drivers with a
> > generic fencing implementation that we want to utilize.
> >
> > This patch adds initial sync objects support. It creates fundament for a
> > further fencing improvements. Later on we will want to extend the
> VirtIO-GPU
> > fencing API with passing fence IDs to host for waiting, it will be a new
> > additional VirtIO-GPU IOCTL and more. Today we have several VirtIO-GPU
> context
> > drivers in works that require VirtIO-GPU to support sync objects UAPI.
> >
> > The patch is heavily inspired by the sync object UAPI implementation of
> the
> > MSM driver.
>
> Gerd, do you have any objections to merging this series?
>
> We have AMDGPU [1] and Intel [2] native context WIP drivers depending on
> the sync object support. It is the only part missing from kernel today
> that is wanted by the native context drivers. Otherwise, there are few
> other things in Qemu and virglrenderer left to sort out.
>
> [1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658
> [2]
> https://gitlab.freedesktop.org/digetx/mesa/-/commits/native-context-iris


I'm not saying this change isn't good, just it's probably possible to
implement the native contexts (even up to VK1.2) without it.  But this
patch series may be the most ergonomic way to do it, given how Mesa is
designed.  But you probably want one of Mesa MRs reviewed first before
merging (I added a comment on the amdgpu change) and that is a requirement
[a].

[a] "The userspace side must be fully reviewed and tested to the standards
of that user space project. For e.g. mesa this means piglit testcases and
review on the mailing list. This is again to ensure that the new interface
actually gets the job done." -- from the requirements


>
>
> --
> Best regards,
> Dmitry
>
>


Re: [PATCH v6 0/3] Add sync object UAPI support to VirtIO-GPU driver

2023-04-24 Thread Gurchetan Singh
On Wed, Apr 19, 2023 at 2:22 PM Dmitry Osipenko
 wrote:
>
> Hello Gurchetan,
>
> On 4/18/23 02:17, Gurchetan Singh wrote:
> > On Sun, Apr 16, 2023 at 4:53 AM Dmitry Osipenko <
> > dmitry.osipe...@collabora.com> wrote:
> >
> >> We have multiple Vulkan context types that are awaiting for the addition
> >> of the sync object DRM UAPI support to the VirtIO-GPU kernel driver:
> >>
> >>  1. Venus context
> >>  2. Native contexts (virtio-freedreno, virtio-intel, virtio-amdgpu)
> >>
> >> Mesa core supports DRM sync object UAPI, providing Vulkan drivers with a
> >> generic fencing implementation that we want to utilize.
> >>
> >> This patch adds initial sync objects support. It creates fundament for a
> >> further fencing improvements. Later on we will want to extend the
> >> VirtIO-GPU
> >> fencing API with passing fence IDs to host for waiting, it will be a new
> >> additional VirtIO-GPU IOCTL and more. Today we have several VirtIO-GPU
> >> context
> >> drivers in works that require VirtIO-GPU to support sync objects UAPI.
> >>
> >> The patch is heavily inspired by the sync object UAPI implementation of the
> >> MSM driver.
> >>
> >
> > The changes seem good, but I would recommend getting a full end-to-end
> > solution (i.e, you've proxied the host fence with these changes and shared
> > with the host compositor) working first.  You'll never know what you'll
> > find after completing this exercise.  Or is that the plan already?
> >
> > Typically, you want to land the uAPI and virtio spec changes last.
> > Mesa/gfxstream/virglrenderer/crosvm all have the ability to test out
> > unstable uAPIs ...
>
> The proxied host fence isn't directly related to sync objects, though I
> prepared code such that it could be extended with a proxied fence later
> on, based on a prototype that was made some time ago.

Proxying the host fence is the novel bit.  If you have code that does
this, you should rebase/send that out (even as an RFC) so it's easier
to see how the pieces fit.

Right now, if you've only tested synchronization objects within the
same virtio-gpu context (where the guest-side wait is skipped), I think you
can already do that with the current uAPI (since ideally you'd wait on
the host side and can encode the sync resource in the command stream).

Also, try to come up with a simple test (so we can meet the requirements here
[a]) that showcases the new feature/capability.  An example would be
the virtio-intel native context sharing a fence with KMS or even
Wayland.

[a] 
https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements

>
> The proxied host fence shouldn't require UAPI changes, but only
> virtio-gpu proto extension. Normally, all in-fences belong to a job's
> context, and thus, waits are skipped by the guest kernel. Hence, fence
> proxying is a separate feature from sync objects, it can be added
> without sync objects.
>
> Sync objects primarily wanted by native context drivers because Mesa
> relies on the sync object UAPI presence. It's one of direct blockers for
> Intel and AMDGPU drivers, both of which has been using this sync object
> UAPI for a few months and now wanting it to land upstream.
>
> --
> Best regards,
> Dmitry
>


Re: [PATCH v6 0/3] Add sync object UAPI support to VirtIO-GPU driver

2023-04-17 Thread Gurchetan Singh
On Sun, Apr 16, 2023 at 4:53 AM Dmitry Osipenko <
dmitry.osipe...@collabora.com> wrote:

> We have multiple Vulkan context types that are awaiting for the addition
> of the sync object DRM UAPI support to the VirtIO-GPU kernel driver:
>
>  1. Venus context
>  2. Native contexts (virtio-freedreno, virtio-intel, virtio-amdgpu)
>
> Mesa core supports DRM sync object UAPI, providing Vulkan drivers with a
> generic fencing implementation that we want to utilize.
>
> This patch adds initial sync objects support. It creates fundament for a
> further fencing improvements. Later on we will want to extend the
> VirtIO-GPU
> fencing API with passing fence IDs to host for waiting, it will be a new
> additional VirtIO-GPU IOCTL and more. Today we have several VirtIO-GPU
> context
> drivers in works that require VirtIO-GPU to support sync objects UAPI.
>
> The patch is heavily inspired by the sync object UAPI implementation of the
> MSM driver.
>

The changes seem good, but I would recommend getting a full end-to-end
solution (i.e., you've proxied the host fence with these changes and shared
with the host compositor) working first.  You'll never know what you'll
find after completing this exercise.  Or is that the plan already?

Typically, you want to land the uAPI and virtio spec changes last.
Mesa/gfxstream/virglrenderer/crosvm all have the ability to test out
unstable uAPIs ...


>
> Changelog:
>
> v6: - Added zeroing out of syncobj_desc, as was suggested by Emil Velikov.
>
> - Fixed memleak in error code path which was spotted by Emil Velikov.
>
> - Switched to u32/u64 instead of uint_t. Previously was keeping
>   uint_t style of the virtgpu_ioctl.c, in the end decided to change
>   it because it's not a proper kernel coding style after all.
>
> - Kept single drm_virtgpu_execbuffer_syncobj struct for both in/out
>   sync objects. There was a little concern about whether it would be
>   worthwhile to have separate in/out descriptors, in practice it's
>   unlikely that we will extend the descs in a foreseeable future.
>   There is no overhead in using same struct since we want to pad it
>   to 64b anyways and it shouldn't be a problem to separate the descs
>   later on if we will want to do that.
>
> - Added r-b from Emil Velikov.
>
> v5: - Factored out dma-fence unwrap API usage into separate patch as was
>   suggested by Emil Velikov.
>
> - Improved and documented the job submission reorderings as was
>   requested by Emil Velikov. Sync file FD is now installed after
>   job is submitted to virtio to further optimize reorderings.
>
> - Added comment for the kvalloc, as was requested by Emil Velikov.
>
> - The num_in/out_syncobjs now is set only after completed parsing
>   of pre/post deps, as was requested by Emil Velikov.
>
> v4: - Added r-b from Rob Clark to the "refactoring" patch.
>
> - Replaced for/while(ptr && itr) with if (ptr), like was suggested by
>   Rob Clark.
>
> - Dropped NOWARN and NORETRY GFP flags and switched syncobj patch
>   to use kvmalloc.
>
> - Removed unused variables from syncobj patch that were borrowed by
>   accident from another (upcoming) patch after one of git rebases.
>
> v3: - Switched to use dma_fence_unwrap_for_each(), like was suggested by
>   Rob Clark.
>
> - Fixed missing dma_fence_put() in error code path that was spotted by
>   Rob Clark.
>
> - Removed obsoleted comment to virtio_gpu_execbuffer_ioctl(), like was
>   suggested by Rob Clark.
>
> v2: - Fixed chain-fence context matching by making use of
>   dma_fence_chain_contained().
>
> - Fixed potential uninitialized var usage in error code patch of
>   parse_post_deps(). MSM driver had a similar issue that is fixed
>   already in upstream.
>
> - Added new patch that refactors job submission code path. I found
>   that it was very difficult to add a new/upcoming host-waits feature
>   because of how variables are passed around the code, the
> virtgpu_ioctl.c
>   also was growing to unmanageable size.
>
> Dmitry Osipenko (3):
>   drm/virtio: Refactor and optimize job submission code path
>   drm/virtio: Wait for each dma-fence of in-fence array individually
>   drm/virtio: Support sync objects
>
>  drivers/gpu/drm/virtio/Makefile |   2 +-
>  drivers/gpu/drm/virtio/virtgpu_drv.c|   3 +-
>  drivers/gpu/drm/virtio/virtgpu_drv.h|   4 +
>  drivers/gpu/drm/virtio/virtgpu_ioctl.c  | 182 
>  drivers/gpu/drm/virtio/virtgpu_submit.c | 530 
>  include/uapi/drm/virtgpu_drm.h  |  16 +-
>  6 files changed, 552 insertions(+), 185 deletions(-)
>  create mode 100644 drivers/gpu/drm/virtio/virtgpu_submit.c
>
> --
> 2.39.2
>
>


Re: [PATCH] drm/virtio: Fix capset-id query size

2022-02-15 Thread Gurchetan Singh
On Tue, Feb 15, 2022 at 5:15 PM Rob Clark  wrote:

> From: Rob Clark 
>
> The UABI was already defined for pointer to 64b value, and all the
> userspace users of this ioctl that I could find are already using a
> uint64_t (but zeroing it out to work around kernel only copying 32b).
> Unfortunately this ioctl doesn't have a length field, so out of paranoia
> I restricted the change to copy 64b to the single 64b param that can be
> queried.
>
> Fixes: 78aa20fa4381 ("drm/virtio: implement context init: advertise
> feature to userspace")
> Signed-off-by: Rob Clark 
>

Reviewed-by: Gurchetan Singh 


> ---
>  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> index 0f2f3f54dbf9..0158d27d5645 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> @@ -269,7 +269,8 @@ static int virtio_gpu_getparam_ioctl(struct drm_device
> *dev, void *data,
>  {
> struct virtio_gpu_device *vgdev = dev->dev_private;
> struct drm_virtgpu_getparam *param = data;
> -   int value;
> +   int value, ret, sz = sizeof(int);
> +   uint64_t value64;
>
> switch (param->param) {
> case VIRTGPU_PARAM_3D_FEATURES:
> @@ -291,13 +292,20 @@ static int virtio_gpu_getparam_ioctl(struct
> drm_device *dev, void *data,
> value = vgdev->has_context_init ? 1 : 0;
> break;
> case VIRTGPU_PARAM_SUPPORTED_CAPSET_IDs:
> -   value = vgdev->capset_id_mask;
> +   value64 = vgdev->capset_id_mask;
> +   sz = sizeof(value64);
> break;
> default:
> return -EINVAL;
> }
> -   if (copy_to_user(u64_to_user_ptr(param->value), &value,
> sizeof(int)))
> -   return -EFAULT;
> +
> +   if (sz == sizeof(int)) {
> +   if (copy_to_user(u64_to_user_ptr(param->value), &value,
> sz))
> +   return -EFAULT;
> +   } else {
> +   if (copy_to_user(u64_to_user_ptr(param->value), &value64,
> sz))
> +   return -EFAULT;
> +   }
>
> return 0;
>  }
> --
> 2.34.1
>
>


Re: [PATCH] drm/virtio: Fix NULL vs IS_ERR checking in virtio_gpu_object_shmem_init

2022-01-07 Thread Gurchetan Singh
On Tue, Dec 21, 2021 at 11:26 PM Miaoqian Lin  wrote:

> Since drm_prime_pages_to_sg() function return error pointers.
> The drm_gem_shmem_get_sg_table() function returns error pointers too.
> Using IS_ERR() to check the return value to fix this.
>
> Fixes: f651c8b05542("drm/virtio: factor out the sg_table from
> virtio_gpu_object")
> Signed-off-by: Miaoqian Lin 
>

Reviewed-by: Gurchetan Singh 


> ---
>  drivers/gpu/drm/virtio/virtgpu_object.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c
> b/drivers/gpu/drm/virtio/virtgpu_object.c
> index f648b0e24447..8bb80289672c 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_object.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_object.c
> @@ -168,9 +168,9 @@ static int virtio_gpu_object_shmem_init(struct
> virtio_gpu_device *vgdev,
>  * since virtio_gpu doesn't support dma-buf import from other
> devices.
>  */
> shmem->pages = drm_gem_shmem_get_sg_table(&bo->base.base);
> -   if (!shmem->pages) {
> +   if (IS_ERR(shmem->pages)) {
> drm_gem_shmem_unpin(&bo->base.base);
> -   return -EINVAL;
> +   return PTR_ERR(shmem->pages);
> }
>
> if (use_dma_api) {
> --
> 2.17.1
>
>


[PATCH 2/2] drm/virtio: use drm_poll(..) instead of virtio_gpu_poll(..)

2021-11-22 Thread Gurchetan Singh
From: Gurchetan Singh 

With the use of dummy events, we can drop virtgpu specific
behavior.

Fixes: cd7f5ca33585 ("drm/virtio: implement context init: add 
virtio_gpu_fence_event")
Reported-by: Daniel Vetter 
Signed-off-by: Gurchetan Singh 
---
 drivers/gpu/drm/virtio/virtgpu_drv.c | 42 +---
 1 file changed, 1 insertion(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c 
b/drivers/gpu/drm/virtio/virtgpu_drv.c
index d86e1ad4a972..5072dbb0669a 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -157,36 +157,6 @@ static void virtio_gpu_config_changed(struct virtio_device 
*vdev)
schedule_work(&vgdev->config_changed_work);
 }
 
-static __poll_t virtio_gpu_poll(struct file *filp,
-   struct poll_table_struct *wait)
-{
-   struct drm_file *drm_file = filp->private_data;
-   struct virtio_gpu_fpriv *vfpriv = drm_file->driver_priv;
-   struct drm_device *dev = drm_file->minor->dev;
-   struct virtio_gpu_device *vgdev = dev->dev_private;
-   struct drm_pending_event *e = NULL;
-   __poll_t mask = 0;
-
-   if (!vgdev->has_virgl_3d || !vfpriv || !vfpriv->ring_idx_mask)
-   return drm_poll(filp, wait);
-
-   poll_wait(filp, &drm_file->event_wait, wait);
-
-   if (!list_empty(&drm_file->event_list)) {
-   spin_lock_irq(&dev->event_lock);
-   e = list_first_entry(&drm_file->event_list,
-struct drm_pending_event, link);
-   drm_file->event_space += e->event->length;
-   list_del(&e->link);
-   spin_unlock_irq(&dev->event_lock);
-
-   kfree(e);
-   mask |= EPOLLIN | EPOLLRDNORM;
-   }
-
-   return mask;
-}
-
 static struct virtio_device_id id_table[] = {
{ VIRTIO_ID_GPU, VIRTIO_DEV_ANY_ID },
{ 0 },
@@ -226,17 +196,7 @@ MODULE_AUTHOR("Dave Airlie ");
 MODULE_AUTHOR("Gerd Hoffmann ");
 MODULE_AUTHOR("Alon Levy");
 
-static const struct file_operations virtio_gpu_driver_fops = {
-   .owner  = THIS_MODULE,
-   .open   = drm_open,
-   .release= drm_release,
-   .unlocked_ioctl = drm_ioctl,
-   .compat_ioctl   = drm_compat_ioctl,
-   .poll   = virtio_gpu_poll,
-   .read   = drm_read,
-   .llseek = noop_llseek,
-   .mmap   = drm_gem_mmap
-};
+DEFINE_DRM_GEM_FOPS(virtio_gpu_driver_fops);
 
 static const struct drm_driver driver = {
.driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_RENDER | 
DRIVER_ATOMIC,
-- 
2.34.0.rc2.393.gf8c9666880-goog



[PATCH 1/2] drm/virtgpu api: define a dummy fence signaled event

2021-11-22 Thread Gurchetan Singh
From: Gurchetan Singh 

The current virtgpu implementation of poll(..) drops events
when VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK is enabled (otherwise
it's like a normal DRM driver).

This is because paravirtualized userspace receives responses in a
buffer of type BLOB_MEM_GUEST, not via read(..).

To be in line with other DRM drivers and avoid specialized behavior,
it is possible to define a dummy event for virtgpu.  Paravirtualized
userspace will now have to call read(..) on the DRM fd to receive the
dummy event.
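
For illustration only, a minimal userspace sketch of the new flow might
look like this (drm_fd is assumed to be an already-open virtgpu render
node whose context was created with VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK;
exact header paths depend on the libdrm setup):

#include <poll.h>
#include <unistd.h>
#include <drm.h>          /* struct drm_event */
#include <virtgpu_drm.h>  /* VIRTGPU_EVENT_FENCE_SIGNALED */

static void drain_one_fence_event(int drm_fd)
{
	struct pollfd pfd = { .fd = drm_fd, .events = POLLIN };
	struct drm_event ev;

	if (poll(&pfd, 1, -1) <= 0 || !(pfd.revents & POLLIN))
		return;

	/* The dummy event carries no payload beyond the header. */
	if (read(drm_fd, &ev, sizeof(ev)) == sizeof(ev) &&
	    ev.type == VIRTGPU_EVENT_FENCE_SIGNALED) {
		/* The host response is already in the BLOB_MEM_GUEST buffer. */
	}
}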

Fixes: b10790434cf2 ("drm/virtgpu api: create context init feature")
Reported-by: Daniel Vetter 
Signed-off-by: Gurchetan Singh 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h   | 1 -
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 2 +-
 include/uapi/drm/virtgpu_drm.h | 7 +++
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index e0265fe74aa5..0a194aaad419 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -138,7 +138,6 @@ struct virtio_gpu_fence_driver {
spinlock_t   lock;
 };
 
-#define VIRTGPU_EVENT_FENCE_SIGNALED_INTERNAL 0x1000
 struct virtio_gpu_fence_event {
struct drm_pending_event base;
struct drm_event event;
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index 5618a1d5879c..3607646d3229 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -54,7 +54,7 @@ static int virtio_gpu_fence_event_create(struct drm_device 
*dev,
if (!e)
return -ENOMEM;
 
-   e->event.type = VIRTGPU_EVENT_FENCE_SIGNALED_INTERNAL;
+   e->event.type = VIRTGPU_EVENT_FENCE_SIGNALED;
e->event.length = sizeof(e->event);
 
ret = drm_event_reserve_init(dev, file, &e->base, &e->event);
diff --git a/include/uapi/drm/virtgpu_drm.h b/include/uapi/drm/virtgpu_drm.h
index a13e20cc66b4..0512fde5e697 100644
--- a/include/uapi/drm/virtgpu_drm.h
+++ b/include/uapi/drm/virtgpu_drm.h
@@ -196,6 +196,13 @@ struct drm_virtgpu_context_init {
__u64 ctx_set_params;
 };
 
+/*
+ * Event code that's given when VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK is in
+ * effect.  The event size is sizeof(drm_event), since there is no additional
+ * payload.
+ */
+#define VIRTGPU_EVENT_FENCE_SIGNALED 0x9000
+
 #define DRM_IOCTL_VIRTGPU_MAP \
DRM_IOWR(DRM_COMMAND_BASE + DRM_VIRTGPU_MAP, struct drm_virtgpu_map)
 
-- 
2.34.0.rc2.393.gf8c9666880-goog



[PATCH 0/2] virtgpu dummy events

2021-11-22 Thread Gurchetan Singh
From: Gurchetan Singh 

There was a desire to not have a virtgpu-specific implementation of
poll(..), but there wasn't any real event to return either.

Solution: a dummy event carrying just an event code

Context:

https://lists.freedesktop.org/archives/dri-devel/2021-November/330950.html

Userspace:

crrev.com/c/3296610

This series applies to drm-misc-fixes.

Gurchetan Singh (2):
  drm/virtgpu api: define a dummy fence signaled event
  drm/virtio: use drm_poll(..) instead of virtio_gpu_poll(..)

 drivers/gpu/drm/virtio/virtgpu_drv.c   | 42 +-
 drivers/gpu/drm/virtio/virtgpu_drv.h   |  1 -
 drivers/gpu/drm/virtio/virtgpu_ioctl.c |  2 +-
 include/uapi/drm/virtgpu_drm.h |  7 +
 4 files changed, 9 insertions(+), 43 deletions(-)

-- 
2.34.0.rc2.393.gf8c9666880-goog



Re: [PATCH v3 11/12] drm/virtio: implement context init: add virtio_gpu_fence_event

2021-11-19 Thread Gurchetan Singh
On Fri, Nov 19, 2021 at 9:38 AM Rob Clark  wrote:

> On Thu, Nov 18, 2021 at 12:53 AM Daniel Vetter  wrote:
> >
> > On Tue, Nov 16, 2021 at 06:31:10PM -0800, Gurchetan Singh wrote:
> > > On Tue, Nov 16, 2021 at 7:43 AM Daniel Vetter  wrote:
> > >
> > > > On Mon, Nov 15, 2021 at 07:26:14PM +, Kasireddy, Vivek wrote:
> > > > > Hi Daniel, Greg,
> > > > >
> > > > > If it is the same or a similar crash reported here:
> > > > >
> > > >
> https://lists.freedesktop.org/archives/dri-devel/2021-November/330018.html
> > > > > and here:
> > > >
> https://lists.freedesktop.org/archives/dri-devel/2021-November/330212.html
> > > > > then the fix is already merged:
> > > > >
> > > >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d89c0c8322ecdc9a2ec84b959b6f766be082da76
> > >
> > > Yeah but that still leaves the problem of why exactly virtgpu is
> > > > reinventing drm_poll here?
> > >
> > >
> > > > Can you please replace it with drm_poll like all other drivers,
> including
> > > > the ones that have private events?
> > > >
> > >
> > > Hi Daniel,
> > >
> > > Allow me to explain the use case a bit.  It's for when virtgpu KMS is
> not
> > > used, but a special Wayland compositor does wayland passthrough
> instead:
> > >
> > >
> https://www.youtube.com/watch?v=WwrXqDERFm8
> https://www.youtube.com/watch?v=EkNBsBx501Q
> > >
> > > This technique has gained much popularity in the virtualized laptop
> > > space, where it offers better performance/user experience than virtgpu
> > > KMS.  The relevant paravirtualized userspace is "Sommelier":
> > >
> > >
> https://chromium.googlesource.com/chromiumos/platform2/+/main/vm_tools/sommelier/
> > >
> https://chromium.googlesource.com/chromiumos/platform2/+/main/vm_tools/sommelier/virtualization/virtgpu_channel.cc
> > >
> > > Previously, we were using the out-of-tree virtio-wl device and there
> > > were many discussions on how we could get this upstream:
> > >
> > >
> https://lists.freedesktop.org/archives/dri-devel/2017-December/160309.html
> > > https://lists.oasis-open.org/archives/virtio-dev/202002/msg5.html
> > >
> > > Extending virtgpu was deemed the least intrusive option:
> > >
> > > https://www.spinics.net/lists/kvm/msg159206.html
> > >
> > > We ultimately settled on the context type abstraction and used
> > > virtio_gpu_poll to tell the guest "hey, we have a Wayland event".  The
> > > host response is actually in a buffer of type BLOB_MEM_GUEST.
> > >
> > > It is possible to use drm_poll(..), but that would have to be
> > > accompanied by a drm_read(..).  You'll need to define a dummy
> > > VIRTGPU_EVENT_FENCE_SIGNALED in the uapi too.
> > >
> > > That's originally how I did it, but some pointed out that's
> > > unnecessary since the host response is in the BLOB_MEM_GUEST buffer
> > > and virtgpu event is a dummy event.  So we decided just to modify
> > > virtio_gpu_poll(..) to have the desired semantics in that case.
> > >
> > > For the regular virtio-gpu KMS path, things remain unchanged.
> > >
> > > There are of course other ways to do it (perhaps polling a dma_fence),
> > > but that was the cleanest way we could find.
> > >
> > > It's not rare for virtio to do "special things" (see virtio_dma_buf_ops,
> > > virtio_dma_ops), since these are fake devices.
> >
> > These are all internal interfaces, not uapi.
> >
> > > We're open to other ideas, but hopefully that answers some of your
> > > questions.
> >
> > Well for one, why does the commit message not explain any of this. You're
> > building uapi, which is forever, it's paramount all considerations are
> > properly explained.
> >
> > Second, I really don't like that youre redefining poll semantics in
> > incompatible ways from all other drm drivers. If you want special poll
> > semantics then just create a sperate fd for that (or a dma_fence or
> > whatever, maybe that saves some typing), but bending the drm fd semantics
> > is no good at all. We have tons of different fd with their dedicated
> > semantics in drm, trying to shoehorn it all into one just isn't very good
> > design.
> >
> > Or do

Re: [PATCH v3 11/12] drm/virtio: implement context init: add virtio_gpu_fence_event

2021-11-16 Thread Gurchetan Singh
On Tue, Nov 16, 2021 at 7:43 AM Daniel Vetter  wrote:

> On Mon, Nov 15, 2021 at 07:26:14PM +, Kasireddy, Vivek wrote:
> > Hi Daniel, Greg,
> >
> > If it is the same or a similar crash reported here:
> >
> https://lists.freedesktop.org/archives/dri-devel/2021-November/330018.html
> > and here:
> https://lists.freedesktop.org/archives/dri-devel/2021-November/330212.html
> > then the fix is already merged:
> >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d89c0c8322ecdc9a2ec84b959b6f766be082da76

Yeah but that still leaves the problem of why exactly virtgpu is
> reinventing drm_poll here?


> Can you please replace it with drm_poll like all other drivers, including
> the ones that have private events?
>

Hi Daniel,

Allow me to explain the use case a bit.  It's for when virtgpu KMS is not
used, but a special Wayland compositor does wayland passthrough instead:

https://www.youtube.com/watch?v=WwrXqDERFm8
https://www.youtube.com/watch?v=EkNBsBx501Q

This technique has gained much popularity in the virtualized laptop
space, where it offers better performance/user experience than virtgpu
KMS.  The relevant paravirtualized userspace is "Sommelier":

https://chromium.googlesource.com/chromiumos/platform2/+/main/vm_tools/sommelier/
https://chromium.googlesource.com/chromiumos/platform2/+/main/vm_tools/sommelier/virtualization/virtgpu_channel.cc

Previously, we were using the out-of-tree virtio-wl device and there
were many discussions on how we could get this upstream:

https://lists.freedesktop.org/archives/dri-devel/2017-December/160309.html
https://lists.oasis-open.org/archives/virtio-dev/202002/msg5.html

Extending virtgpu was deemed the least intrusive option:

https://www.spinics.net/lists/kvm/msg159206.html

We ultimately settled on the context type abstraction and used
virtio_gpu_poll to tell the guest "hey, we have a Wayland event".  The
host response is actually in a buffer of type BLOB_MEM_GUEST.

It is possible to use drm_poll(..), but that would have to be
accompanied by a drm_read(..).  You'll need to define a dummy
VIRTGPU_EVENT_FENCE_SIGNALED in the uapi too.

That's originally how I did it, but some pointed out that's
unnecessary since the host response is in the BLOB_MEM_GUEST buffer
and virtgpu event is a dummy event.  So we decided just to modify
virtio_gpu_poll(..) to have the desired semantics in that case.

For the regular virtio-gpu KMS path, things remain unchanged.

There are of course other ways to do it (perhaps polling a dma_fence),
but that was the cleanest way we could find.

It's not rare for virtio to do "special things" (see virtio_dma_buf_ops,
virtio_dma_ops), since these are fake devices.

We're open to other ideas, but hopefully that answers some of your
questions.


> Thanks, Daniel
>
> >
> > Thanks,
> > Vivek
> >
> > > On Sat, Nov 13, 2021 at 03:51:48PM +0100, Greg KH wrote:
> > > > On Tue, Sep 21, 2021 at 04:20:23PM -0700, Gurchetan Singh wrote:
> > > > > Similar to DRM_VMW_EVENT_FENCE_SIGNALED.  Sends a pollable event
> > > > > to the DRM file descriptor when a fence on a specific ring is
> > > > > signaled.
> > > > >
> > > > > One difference is the event is not exposed via the UAPI -- this is
> > > > > because host responses are on a shared memory buffer of type
> > > > > BLOB_MEM_GUEST [this is the common way to receive responses with
> > > > > virtgpu].  As such, there is no context specific read(..)
> > > > > implementation either -- just a poll(..) implementation.
> > > > >
> > > > > Signed-off-by: Gurchetan Singh 
> > > > > Acked-by: Nicholas Verne 
> > > > > ---
> > > > >  drivers/gpu/drm/virtio/virtgpu_drv.c   | 43
> +-
> > > > >  drivers/gpu/drm/virtio/virtgpu_drv.h   |  7 +
> > > > >  drivers/gpu/drm/virtio/virtgpu_fence.c | 10 ++
> > > > >  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 34 
> > > > >  4 files changed, 93 insertions(+), 1 deletion(-)
> > > >
> > > > This commit seems to cause a crash in a virtual drm gpu driver for
> > > > Android.  I have reverted this, and the next commit in the series
> from
> > > > Linus's tree and all is good again.
> > > >
> > > > Any ideas?
> > >
> > > Well no, but also this patch looks very questionable of hand-rolling
> > > drm_poll. Yes you can do driver private events like
> > > DRM_VMW_EVENT_FENCE_SIGNALED, that's fine. But you really should not
> need
> > > to hand-roll the poll callback. vmwgfx (which generally is a very old
> > > driver which has lots of custom stuff, so not a great example) doesn't
> do
> > > that either.
> > >
> > > So that part should go no matter what I think.
> > > -Daniel
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
>


Re: [PATCH] drm/virtio: add null check in virtio_gpu_poll

2021-11-15 Thread Gurchetan Singh
On Mon, Nov 15, 2021 at 9:58 AM Gurchetan Singh 
wrote:

> From: Gurchetan Singh 
>
> If vfpriv is null, we shouldn't check vfpriv->ring_idx_mask.
>
> Signed-off-by: Gurchetan Singh 
> ---
>  drivers/gpu/drm/virtio/virtgpu_drv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c
> b/drivers/gpu/drm/virtio/virtgpu_drv.c
> index 749db18dcfa2..7975ea06b316 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
> @@ -166,7 +166,7 @@ static __poll_t virtio_gpu_poll(struct file *filp,
> struct drm_pending_event *e = NULL;
> __poll_t mask = 0;
>
> -   if (!vfpriv->ring_idx_mask)
> +   if (!vfpriv || !vfpriv->ring_idx_mask)
> return drm_poll(filp, wait);
>
> poll_wait(filp, &drm_file->event_wait, wait);
>

Never mind, it looks like the fix was merged in the main tree and will make
it back to drm-misc-next:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d89c0c8322ecdc9a2ec84b959b6f766be082da76

-- 
> 2.34.0.rc1.387.gb447b232ab-goog
>
>


[PATCH] drm/virtio: add null check in virtio_gpu_poll

2021-11-15 Thread Gurchetan Singh
From: Gurchetan Singh 

If vfpriv is null, we shouldn't check vfpriv->ring_idx_mask.

Signed-off-by: Gurchetan Singh 
---
 drivers/gpu/drm/virtio/virtgpu_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c 
b/drivers/gpu/drm/virtio/virtgpu_drv.c
index 749db18dcfa2..7975ea06b316 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -166,7 +166,7 @@ static __poll_t virtio_gpu_poll(struct file *filp,
struct drm_pending_event *e = NULL;
__poll_t mask = 0;
 
-   if (!vfpriv->ring_idx_mask)
+   if (!vfpriv || !vfpriv->ring_idx_mask)
return drm_poll(filp, wait);
 
poll_wait(filp, &drm_file->event_wait, wait);
-- 
2.34.0.rc1.387.gb447b232ab-goog



[RFC PATCH 8/8] drm: trace memory import per DRM device

2021-10-20 Thread Gurchetan Singh
- drm_gem_prime_import_dev increases the per-device import memory
  counter

  * Most drivers use it.

  * drivers that have a (*gem_prime_import) callback will need
additional changes, which can be done if everyone likes the
overall RFC.

- drm_prime_gem_destroy decreases the per-device import memory
  counter.

  * All drivers seem to use it?

Signed-off-by: Gurchetan Singh 
---
 drivers/gpu/drm/drm_prime.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 1afcae0c4038..c2057b7a63b4 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -955,6 +955,7 @@ struct drm_gem_object *drm_gem_prime_import_dev(struct 
drm_device *dev,
 
obj->import_attach = attach;
obj->resv = dma_buf->resv;
+   drm_gem_trace_gpu_mem_total(dev, obj->size, true);
 
return obj;
 
@@ -1055,6 +1056,7 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, 
struct sg_table *sg)
struct dma_buf_attachment *attach;
struct dma_buf *dma_buf;
 
+   drm_gem_trace_gpu_mem_total(obj->dev, -obj->size, true);
attach = obj->import_attach;
if (sg)
dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
-- 
2.25.1



[RFC PATCH 2/8] drm: add new tracepoint fields to drm_device and drm_file

2021-10-20 Thread Gurchetan Singh
For struct drm_device, add:
- mem_total
- import_mem_total

For struct drm_file, add:
- mem_instance
- import_mem_instance

Signed-off-by: Gurchetan Singh 
---
 include/drm/drm_device.h | 16 
 include/drm/drm_file.h   | 16 
 2 files changed, 32 insertions(+)

diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index 604b1d1b2d72..35a96bda5320 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -298,6 +298,22 @@ struct drm_device {
 */
struct drm_fb_helper *fb_helper;
 
+   /**
+* @mem_total:
+*
+* The total size of all GEM objects known to this DRM device.  Used
+* with `gpu_mem_total` tracepoint.
+*/
+   atomic64_t mem_total;
+
+   /**
+* @import_mem_total:
+*
+* The total size of all GEM objects imported into this DRM device from
+* external exporters.  Used with `gpu_mem_total` tracepoint.
+*/
+   atomic64_t import_mem_total;
+
/* Everything below here is for legacy driver, never use! */
/* private: */
 #if IS_ENABLED(CONFIG_DRM_LEGACY)
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index a3acb7ac3550..a5b9befcf1db 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -362,6 +362,22 @@ struct drm_file {
 */
struct drm_prime_file_private prime;
 
+   /**
+* @mem_instance:
+*
+* The total size of all GEM objects known to this instance of the DRM
+* device.  Used with `gpu_mem_instance` tracepoint.
+*/
+   atomic64_t mem_instance;
+
+   /**
+* @import_mem_instance:
+*
+* The total size of all GEM objects imported into this instance of the
+* DRM device.  Used with `gpu_mem_instance` tracepoint.
+*/
+   atomic64_t import_mem_instance;
+
/* private: */
 #if IS_ENABLED(CONFIG_DRM_LEGACY)
unsigned long lock_count; /* DRI1 legacy lock count */
-- 
2.25.1



[RFC PATCH 4/8] drm: start using drm_gem_trace_gpu_mem_total

2021-10-20 Thread Gurchetan Singh
- drm_gem_private_object_init(..) increases the total memory
  counter.

  * All GEM objects (whether allocated or imported) seem to begin
there.
  * If there's a better place/method, please do let
me know.

- drm_gem_object_free(..) decreases the total memory counter.

Signed-off-by: Gurchetan Singh 
---
 drivers/gpu/drm/drm_gem.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 24a719b79400..528d7b29dccf 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -213,6 +213,7 @@ void drm_gem_private_object_init(struct drm_device *dev,
obj->resv = &obj->_resv;
 
drm_vma_node_reset(&obj->vma_node);
+   drm_gem_trace_gpu_mem_total(dev, obj->size, false);
 }
 EXPORT_SYMBOL(drm_gem_private_object_init);
 
@@ -1015,6 +1016,10 @@ drm_gem_object_free(struct kref *kref)
struct drm_gem_object *obj =
container_of(kref, struct drm_gem_object, refcount);
 
+   struct drm_device *dev = obj->dev;
+
+   drm_gem_trace_gpu_mem_total(dev, -obj->size, false);
+
if (WARN_ON(!obj->funcs->free))
return;
 
-- 
2.25.1



[RFC PATCH 7/8] drm: trace memory import per DRM file

2021-10-20 Thread Gurchetan Singh
- drm_gem_prime_fd_to_handle increases the per-instance imported
  memory counter

- drm_gem_remove_prime_handles decreases the per-instance imported
  memory counter on non-fake imports.

Signed-off-by: Gurchetan Singh 
---
 drivers/gpu/drm/drm_gem.c   | 3 +++
 drivers/gpu/drm/drm_prime.c | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 7637be0ceb74..c07568ea8442 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -231,6 +231,9 @@ drm_gem_remove_prime_handles(struct drm_gem_object *obj, 
struct drm_file *filp)
drm_prime_remove_buf_handle_locked(&filp->prime,
   obj->dma_buf,
   &removed_real_import);
+   if (removed_real_import)
+   drm_gem_trace_gpu_mem_instance(dev, filp, -obj->size,
+  true);
}
mutex_unlock(&filp->prime.lock);
 }
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 31f033ec8549..1afcae0c4038 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -349,6 +349,8 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
 
dma_buf_put(dma_buf);
 
+   drm_gem_trace_gpu_mem_instance(dev, file_priv, obj->size, true);
+
return 0;
 
 fail:
-- 
2.25.1



[RFC PATCH 5/8] drm: start using drm_gem_trace_gpu_mem_instance

2021-10-20 Thread Gurchetan Singh
- drm_gem_handle_create_tail(..) increases the per instance
  memory counter.

- drm_gem_object_release_handle(..) decreases the per instance
  memory counter.

Signed-off-by: Gurchetan Singh 
---
 drivers/gpu/drm/drm_gem.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 528d7b29dccf..6f70419f2c90 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -298,6 +298,7 @@ drm_gem_object_release_handle(int id, void *ptr, void *data)
 {
struct drm_file *file_priv = data;
struct drm_gem_object *obj = ptr;
+   struct drm_device *dev = file_priv->minor->dev;
 
if (obj->funcs->close)
obj->funcs->close(obj, file_priv);
@@ -305,6 +306,7 @@ drm_gem_object_release_handle(int id, void *ptr, void *data)
drm_gem_remove_prime_handles(obj, file_priv);
drm_vma_node_revoke(&obj->vma_node, file_priv);
 
+   drm_gem_trace_gpu_mem_instance(dev, file_priv, -obj->size, false);
drm_gem_object_handle_put_unlocked(obj);
 
return 0;
@@ -447,6 +449,7 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
goto err_revoke;
}
 
+   drm_gem_trace_gpu_mem_instance(dev, file_priv, obj->size, false);
*handlep = handle;
return 0;
 
-- 
2.25.1



[RFC PATCH 6/8] drm: track real and fake imports in drm_prime_member

2021-10-20 Thread Gurchetan Singh
Sometimes, an exported dma-buf is added to the import list.
That messes up tracepoint accounting, so track real vs.
fake imports to correct this.

Signed-off-by: Gurchetan Singh 
---
 drivers/gpu/drm/drm_gem.c  |  5 -
 drivers/gpu/drm/drm_internal.h |  4 ++--
 drivers/gpu/drm/drm_prime.c| 18 +-
 3 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 6f70419f2c90..7637be0ceb74 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -226,8 +226,11 @@ drm_gem_remove_prime_handles(struct drm_gem_object *obj, 
struct drm_file *filp)
 */
mutex_lock(&filp->prime.lock);
if (obj->dma_buf) {
+   struct drm_device *dev = filp->minor->dev;
+   bool removed_real_import = false;
drm_prime_remove_buf_handle_locked(&filp->prime,
-  obj->dma_buf);
+  obj->dma_buf,
+  &removed_real_import);
}
mutex_unlock(&filp->prime.lock);
 }
diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 17f3548c8ed2..40d572e46e2a 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -75,8 +75,8 @@ int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void 
*data,
 void drm_prime_init_file_private(struct drm_prime_file_private *prime_fpriv);
 void drm_prime_destroy_file_private(struct drm_prime_file_private 
*prime_fpriv);
 void drm_prime_remove_buf_handle_locked(struct drm_prime_file_private 
*prime_fpriv,
-   struct dma_buf *dma_buf);
-
+   struct dma_buf *dma_buf,
+   bool *removed_real_import);
 /* drm_drv.c */
 struct drm_minor *drm_minor_acquire(unsigned int minor_id);
 void drm_minor_release(struct drm_minor *minor);
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index deb23dbec8b5..31f033ec8549 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -90,13 +90,15 @@
 struct drm_prime_member {
struct dma_buf *dma_buf;
uint32_t handle;
+   bool fake_import;
 
struct rb_node dmabuf_rb;
struct rb_node handle_rb;
 };
 
 static int drm_prime_add_buf_handle(struct drm_prime_file_private *prime_fpriv,
-   struct dma_buf *dma_buf, uint32_t handle)
+   struct dma_buf *dma_buf, uint32_t handle,
+   bool fake_import)
 {
struct drm_prime_member *member;
struct rb_node **p, *rb;
@@ -108,6 +110,7 @@ static int drm_prime_add_buf_handle(struct 
drm_prime_file_private *prime_fpriv,
get_dma_buf(dma_buf);
member->dma_buf = dma_buf;
member->handle = handle;
+   member->fake_import = fake_import;
 
rb = NULL;
p = &prime_fpriv->dmabufs.rb_node;
@@ -188,9 +191,11 @@ static int drm_prime_lookup_buf_handle(struct 
drm_prime_file_private *prime_fpri
 }
 
 void drm_prime_remove_buf_handle_locked(struct drm_prime_file_private 
*prime_fpriv,
-   struct dma_buf *dma_buf)
+   struct dma_buf *dma_buf,
+   bool *removed_real_import)
 {
struct rb_node *rb;
+   *removed_real_import = false;
 
rb = prime_fpriv->dmabufs.rb_node;
while (rb) {
@@ -201,6 +206,9 @@ void drm_prime_remove_buf_handle_locked(struct 
drm_prime_file_private *prime_fpr
rb_erase(&member->handle_rb, &prime_fpriv->handles);
rb_erase(&member->dmabuf_rb, &prime_fpriv->dmabufs);
 
+   if (!member->fake_import)
+   *removed_real_import = true;
+
dma_buf_put(dma_buf);
kfree(member);
return;
@@ -303,7 +311,6 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
return PTR_ERR(dma_buf);
 
mutex_lock(&file_priv->prime.lock);
-
ret = drm_prime_lookup_buf_handle(&file_priv->prime,
dma_buf, handle);
if (ret == 0)
@@ -315,6 +322,7 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
obj = dev->driver->gem_prime_import(dev, dma_buf);
else
obj = drm_gem_prime_import(dev, dma_buf);
+
if (IS_ERR(obj)) {
ret = PTR_ERR(obj);
goto out_unlock;
@@ -334,7 +342,7 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
goto out_put;
 
ret = drm_prime_add_buf_handle(&file_priv->prime,
-  

[RFC PATCH 1/8] tracing/gpu: modify gpu_mem_total

2021-10-20 Thread Gurchetan Singh
The existing gpu_mem_total tracepoint [1] is not currently used by
any in-tree consumers, we should add some.

In addition, there's a desire to report imported memory via the
counters too [2].

To do this, we'll have to redefine the event to:

a) Change 'pid' to 'ctx_id'

The reason is that the DRM subsystem is designed with GEM objects, DRM
devices and DRM files in mind.  A GEM object is associated with a DRM
device, and it may be shared between one or more DRM files.

Per-instance (or "context") counters make more sense than per-process
counters for DRM.  For GPUs that use per-process counters (kgsl), this
change is backwards compatible.

b) add an "import_mem_total" field

We're just appending a field, so no problem here.  Change "size" to
"mem_total" as well (name changes are backwards compatible).

[1] https://lore.kernel.org/r/20200302234840.57188-1-zzyi...@google.com/
[2] https://www.spinics.net/lists/kernel/msg4062769.html
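
For illustration only, here is a hypothetical caller using the new
signature, and roughly what it would print (values are made up; real
callers are added by the drm_gem helpers later in this series):

/* Global (ctx_id == 0) update: 8 MiB known to the device, 4 MiB imported. */
trace_gpu_mem_total(gpu_id, 0, 8 * 1024 * 1024, 4 * 1024 * 1024);

/* Resulting ftrace line (roughly):
 * gpu_mem_total: gpu_id=0, ctx_id=0, mem total=8388608, mem import total=4194304
 */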

Signed-off-by: Gurchetan Singh 
---
 include/trace/events/gpu_mem.h | 61 --
 1 file changed, 43 insertions(+), 18 deletions(-)

diff --git a/include/trace/events/gpu_mem.h b/include/trace/events/gpu_mem.h
index 26d871f96e94..198b87f50356 100644
--- a/include/trace/events/gpu_mem.h
+++ b/include/trace/events/gpu_mem.h
@@ -14,41 +14,66 @@
 #include 
 
 /*
- * The gpu_memory_total event indicates that there's an update to either the
- * global or process total gpu memory counters.
+ * The gpu_mem_total event indicates that there's an update to local or
+ * global gpu memory counters.
  *
- * This event should be emitted whenever the kernel device driver allocates,
- * frees, imports, unimports memory in the GPU addressable space.
+ * This event should be emitted whenever a GPU device (ctx_id == 0):
  *
- * @gpu_id: This is the gpu id.
+ *   1) allocates memory.
+ *   2) frees memory.
+ *   3) imports memory from an external exporter.
  *
- * @pid: Put 0 for global total, while positive pid for process total.
+ * OR when a GPU device instance (ctx_id != 0):
  *
- * @size: Size of the allocation in bytes.
+ *   1) allocates or acquires a reference to memory from another instance.
+ *   2) frees or releases a reference to memory from another instance.
+ *   3) imports memory from another GPU device instance.
  *
+ * When ctx_id == 0, both the mem_total and import_mem_total counters
+ * represent a global total.  When ctx_id != 0, these counters represent
+ * an instance-specific total.
+ *
+ * Note allocation does not necessarily mean backing the memory with pages.
+ *
+ * @gpu_id: unique ID of the GPU.
+ *
+ * @ctx_id: an ID for specific instance of the GPU device.
+ *
+ * @mem_total: - total size of memory known to a GPU device, including
+ *  imports (ctx_id == 0)
+ *- total size of memory known to a GPU device instance
+ *  (ctx_id != 0)
+ *
+ * @import_mem_total: - size of memory imported from outside GPU
+ * device (ctx_id == 0)
+ *   - size of memory imported into GPU device instance.
+ * (ctx_id != 0)
  */
 TRACE_EVENT(gpu_mem_total,
 
-   TP_PROTO(uint32_t gpu_id, uint32_t pid, uint64_t size),
+   TP_PROTO(u32 gpu_id, u32 ctx_id, u64 mem_total, u64 import_mem_total),
 
-   TP_ARGS(gpu_id, pid, size),
+   TP_ARGS(gpu_id, ctx_id, mem_total, import_mem_total),
 
TP_STRUCT__entry(
-   __field(uint32_t, gpu_id)
-   __field(uint32_t, pid)
-   __field(uint64_t, size)
+   __field(u32, gpu_id)
+   __field(u32, ctx_id)
+   __field(u64, mem_total)
+   __field(u64, import_mem_total)
),
 
TP_fast_assign(
__entry->gpu_id = gpu_id;
-   __entry->pid = pid;
-   __entry->size = size;
+   __entry->ctx_id = ctx_id;
+   __entry->mem_total = mem_total;
+   __entry->import_mem_total = import_mem_total;
),
 
-   TP_printk("gpu_id=%u pid=%u size=%llu",
-   __entry->gpu_id,
-   __entry->pid,
-   __entry->size)
+   TP_printk("gpu_id=%u, ctx_id=%u, mem total=%llu, mem import total=%llu",
+ __entry->gpu_id,
+ __entry->ctx_id,
+ __entry->mem_total,
+ __entry->import_mem_total)
 );
 
 #endif /* _TRACE_GPU_MEM_H */
-- 
2.25.1



[RFC PATCH 3/8] drm: add helper functions for gpu_mem_total and gpu_mem_instance

2021-10-20 Thread Gurchetan Singh
- Add helper functions for above tracepoints in the drm_gem.{h,c}
  files

- Given more tracepoints, a drm_trace.* file may be started

Signed-off-by: Gurchetan Singh 
---
 drivers/gpu/drm/Kconfig   |  1 +
 drivers/gpu/drm/drm_gem.c | 49 +++
 include/drm/drm_gem.h |  7 ++
 3 files changed, 57 insertions(+)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index b91f0ce8154c..cef8545df1c9 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -15,6 +15,7 @@ menuconfig DRM
select I2C_ALGOBIT
select DMA_SHARED_BUFFER
select SYNC_FILE
+   select TRACE_GPU_MEM
 # gallium uses SYS_kcmp for os_same_file_description() to de-duplicate
 # device and dmabuf fd. Let's make sure that is available for our userspace.
select KCMP
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 4dcdec6487bb..24a719b79400 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -49,6 +49,8 @@
 #include 
 #include 
 
+#include 
+
 #include "drm_internal.h"
 
 /** @file drm_gem.c
@@ -138,6 +140,53 @@ int drm_gem_object_init(struct drm_device *dev,
 }
 EXPORT_SYMBOL(drm_gem_object_init);
 
+/**
+ * drm_gem_trace_gpu_mem_total - emit a total memory trace event
+ * @dev: drm_device to emit trace event for
+ * @delta: size change
+ * @imported: whether the imported or total memory counter should be used
+ *
+ * Emits a `gpu_mem_total` trace event with given parameters.
+ */
+void
+drm_gem_trace_gpu_mem_total(struct drm_device *dev, s64 delta, bool imported)
+{
+   if (imported)
+   atomic64_add(delta, &dev->import_mem_total);
+   else
+   atomic64_add(delta, &dev->mem_total);
+
+   trace_gpu_mem_total(dev->primary->index, 0,
+   atomic64_read(&dev->mem_total),
+   atomic64_read(&dev->import_mem_total));
+}
+EXPORT_SYMBOL(drm_gem_trace_gpu_mem_total);
+
+/**
+ * drm_gem_trace_gpu_mem_instance - emit a per instance memory trace event
+ * @dev: drm_device associated with DRM file
+ * @file: drm_file to emit event for
+ * @delta: size change
+ * @imported: whether the imported or total memory counter should be used
+ *
+ * Emits a `gpu_mem_instance` trace event with given parameters.
+ */
+void
+drm_gem_trace_gpu_mem_instance(struct drm_device *dev, struct drm_file *file,
+  s64 delta, bool imported)
+{
+   if (imported)
+   atomic64_add(delta, &file->import_mem_instance);
+   else
+   atomic64_add(delta, &file->mem_instance);
+
+   trace_gpu_mem_total(dev->primary->index,
+   file_inode(file->filp)->i_ino,
+   atomic64_read(&file->mem_instance),
+   atomic64_read(&file->import_mem_instance));
+}
+EXPORT_SYMBOL(drm_gem_trace_gpu_mem_instance);
+
 /**
  * drm_gem_private_object_init - initialize an allocated private GEM object
  * @dev: drm_device the object should be initialized for
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 35e7f44c2a75..d61937cce222 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -342,6 +342,13 @@ struct drm_gem_object {
 
 void drm_gem_object_release(struct drm_gem_object *obj);
 void drm_gem_object_free(struct kref *kref);
+
+void drm_gem_trace_gpu_mem_total(struct drm_device *dev, s64 delta,
+bool imported);
+void drm_gem_trace_gpu_mem_instance(struct drm_device *dev,
+   struct drm_file *file,
+   s64 delta, bool imported);
+
 int drm_gem_object_init(struct drm_device *dev,
struct drm_gem_object *obj, size_t size);
 void drm_gem_private_object_init(struct drm_device *dev,
-- 
2.25.1



[RFC PATCH 0/8] GPU memory tracepoints

2021-10-20 Thread Gurchetan Singh
This is the latest iteration of the GPU memory tracepoints [1].

In the past, there were questions about the "big picture" of memory  
accounting [2], especially given related work on dma-buf heaps and DRM
cgroups [3].  Also, there was a desire for a non-driver specific solution.

The great news is the dma-buf heaps work has recently landed [4].  It uses
sysfs, and the plan is to use it in conjunction with the tracepoint
solution [5].  We're aiming for the GPU tracepoint to calculate totals
per DRM-instance (a proxy for per-process on Android) and per-DRM device.

The cgroups work looks terrific too and hopefully we can deduplicate code once
that's merged.  Though that's a bit of an implementation detail, so long as
the "GPU tracepoints" + "dma-buf heap stats" plan sounds good for Android.

This series modifies the GPU memory tracepoint API in a non-breaking fashion
(patch 1), and adds accounting via the GEM subsystem (patches 2 --> 7). Given
the multiple places where memory events happen, there are a bunch of trace
events scattered in various places.  The hardest part is allocation, where
each driver has its own API.  If there's a better way, do say so.

The last patch is incomplete; we would like general feedback before proceeding
further.

[1] https://lore.kernel.org/lkml/20200302235044.59163-1-zzyi...@google.com/
[2] https://lists.freedesktop.org/archives/dri-devel/2021-January/295120.html
[3] https://www.spinics.net/lists/cgroups/msg27867.html
[4] https://www.spinics.net/lists/linux-doc/msg97788.html
[5] https://source.android.com/devices/graphics/implement-dma-buf-gpu-mem

Gurchetan Singh (8):
  tracing/gpu: modify gpu_mem_total
  drm: add new tracepoint fields to drm_device and drm_file
  drm: add helper functions for gpu_mem_total and gpu_mem_instance
  drm: start using drm_gem_trace_gpu_mem_total
  drm: start using drm_gem_trace_gpu_mem_instance
  drm: track real and fake imports in drm_prime_member
  drm: trace memory import per DRM file
  drm: trace memory import per DRM device

 drivers/gpu/drm/Kconfig|  1 +
 drivers/gpu/drm/drm_gem.c  | 65 +-
 drivers/gpu/drm/drm_internal.h |  4 +--
 drivers/gpu/drm/drm_prime.c| 22 +---
 include/drm/drm_device.h   | 16 +
 include/drm/drm_file.h | 16 +
 include/drm/drm_gem.h  |  7 
 include/trace/events/gpu_mem.h | 61 +--
 8 files changed, 166 insertions(+), 26 deletions(-)

-- 
2.25.1



[PATCH v3 10/12] drm/virtio: implement context init: handle VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK

2021-09-21 Thread Gurchetan Singh
For the Sommelier guest Wayland proxy, it's desirable for the
DRM fd to be pollable in response to a host compositor event.
This can also be used by the 3D driver to poll events on a CPU
timeline.

This enables the DRM fd associated with a particular 3D context
to be polled independent of KMS events.  The parameter
VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK specifies the pollable
rings.
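
For illustration only, a userspace sketch of how a context might opt in
(hypothetical fd variable; the ioctl and param defines are assumed to come
from the virtgpu uapi additions earlier in this series):

#include <stdint.h>
#include <xf86drm.h>
#include <virtgpu_drm.h>

/* Create a 3D context with two fence rings and make ring 0 pollable. */
static int init_pollable_context(int fd)
{
	struct drm_virtgpu_context_set_param params[] = {
		{ VIRTGPU_CONTEXT_PARAM_NUM_RINGS,       2 },
		{ VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK, 1 << 0 },
	};
	struct drm_virtgpu_context_init init = {
		.num_params = 2,
		.ctx_set_params = (uintptr_t)params,
	};

	return drmIoctl(fd, DRM_IOCTL_VIRTGPU_CONTEXT_INIT, &init);
}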

Signed-off-by: Gurchetan Singh 
Acked-by: Nicholas Verne 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h   |  1 +
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 22 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index cca9ab505deb..cb60d52c2bd1 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -266,6 +266,7 @@ struct virtio_gpu_fpriv {
bool context_created;
uint32_t num_rings;
uint64_t base_fence_ctx;
+   uint64_t ring_idx_mask;
struct mutex context_lock;
 };
 
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index 262f79210283..be7b22a03884 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -694,6 +694,7 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
 {
int ret = 0;
uint32_t num_params, i, param, value;
+   uint64_t valid_ring_mask;
size_t len;
struct drm_virtgpu_context_set_param *ctx_set_params = NULL;
struct virtio_gpu_device *vgdev = dev->dev_private;
@@ -707,7 +708,7 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
return -EINVAL;
 
/* Number of unique parameters supported at this time. */
-   if (num_params > 2)
+   if (num_params > 3)
return -EINVAL;
 
ctx_set_params = memdup_user(u64_to_user_ptr(args->ctx_set_params),
@@ -761,12 +762,31 @@ static int virtio_gpu_context_init_ioctl(struct 
drm_device *dev,
vfpriv->base_fence_ctx = dma_fence_context_alloc(value);
vfpriv->num_rings = value;
break;
+   case VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK:
+   if (vfpriv->ring_idx_mask) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   vfpriv->ring_idx_mask = value;
+   break;
default:
ret = -EINVAL;
goto out_unlock;
}
}
 
+   if (vfpriv->ring_idx_mask) {
+   valid_ring_mask = 0;
+   for (i = 0; i < vfpriv->num_rings; i++)
+   valid_ring_mask |= 1 << i;
+
+   if (~valid_ring_mask & vfpriv->ring_idx_mask) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+   }
+
virtio_gpu_create_context_locked(vgdev, vfpriv);
virtio_gpu_notify(vgdev);
 
-- 
2.33.0.464.g1972c5931b-goog



[PATCH v3 07/12] drm/virtio: implement context init: plumb {base_fence_ctx, ring_idx} to virtio_gpu_fence_alloc

2021-09-21 Thread Gurchetan Singh
These were defined in the previous commit. We'll need these
parameters when allocating a dma_fence.  The use case for this
is multiple synchronization timelines.

The maximum number of timelines per 3D instance will be 32. Usually,
only 2 are needed -- one for CPU commands, and another for GPU
commands.

As such, we'll need to specify these parameters when allocating a
dma_fence.

vgdev->fence_drv.context is the "default" fence context for 2D mode
and old userspace.

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h   | 5 +++--
 drivers/gpu/drm/virtio/virtgpu_fence.c | 4 +++-
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 9 +
 drivers/gpu/drm/virtio/virtgpu_plane.c | 3 ++-
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 401aec1a5efb..a5142d60c2fa 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -426,8 +426,9 @@ struct drm_plane *virtio_gpu_plane_init(struct 
virtio_gpu_device *vgdev,
int index);
 
 /* virtgpu_fence.c */
-struct virtio_gpu_fence *virtio_gpu_fence_alloc(
-   struct virtio_gpu_device *vgdev);
+struct virtio_gpu_fence *virtio_gpu_fence_alloc(struct virtio_gpu_device 
*vgdev,
+   uint64_t base_fence_ctx,
+   uint32_t ring_idx);
 void virtio_gpu_fence_emit(struct virtio_gpu_device *vgdev,
  struct virtio_gpu_ctrl_hdr *cmd_hdr,
  struct virtio_gpu_fence *fence);
diff --git a/drivers/gpu/drm/virtio/virtgpu_fence.c 
b/drivers/gpu/drm/virtio/virtgpu_fence.c
index d28e25e8409b..24c728b65d21 100644
--- a/drivers/gpu/drm/virtio/virtgpu_fence.c
+++ b/drivers/gpu/drm/virtio/virtgpu_fence.c
@@ -71,7 +71,9 @@ static const struct dma_fence_ops virtio_gpu_fence_ops = {
.timeline_value_str  = virtio_gpu_timeline_value_str,
 };
 
-struct virtio_gpu_fence *virtio_gpu_fence_alloc(struct virtio_gpu_device 
*vgdev)
+struct virtio_gpu_fence *virtio_gpu_fence_alloc(struct virtio_gpu_device 
*vgdev,
+   uint64_t base_fence_ctx,
+   uint32_t ring_idx)
 {
struct virtio_gpu_fence_driver *drv = &vgdev->fence_drv;
struct virtio_gpu_fence *fence = kzalloc(sizeof(struct 
virtio_gpu_fence),
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index f5281d1e30e1..f51f3393a194 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -173,7 +173,7 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device 
*dev, void *data,
goto out_memdup;
}
 
-   out_fence = virtio_gpu_fence_alloc(vgdev);
+   out_fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context, 0);
if(!out_fence) {
ret = -ENOMEM;
goto out_unresv;
@@ -288,7 +288,7 @@ static int virtio_gpu_resource_create_ioctl(struct 
drm_device *dev, void *data,
if (params.size == 0)
params.size = PAGE_SIZE;
 
-   fence = virtio_gpu_fence_alloc(vgdev);
+   fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context, 0);
if (!fence)
return -ENOMEM;
ret = virtio_gpu_object_create(vgdev, ¶ms, &qobj, fence);
@@ -367,7 +367,7 @@ static int virtio_gpu_transfer_from_host_ioctl(struct 
drm_device *dev,
if (ret != 0)
goto err_put_free;
 
-   fence = virtio_gpu_fence_alloc(vgdev);
+   fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context, 0);
if (!fence) {
ret = -ENOMEM;
goto err_unlock;
@@ -427,7 +427,8 @@ static int virtio_gpu_transfer_to_host_ioctl(struct 
drm_device *dev, void *data,
goto err_put_free;
 
ret = -ENOMEM;
-   fence = virtio_gpu_fence_alloc(vgdev);
+   fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context,
+  0);
if (!fence)
goto err_unlock;
 
diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c 
b/drivers/gpu/drm/virtio/virtgpu_plane.c
index a49fd9480381..6d3cc9e238a4 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -256,7 +256,8 @@ static int virtio_gpu_plane_prepare_fb(struct drm_plane 
*plane,
return 0;
 
if (bo->dumb && (plane->state->fb != new_state->fb)) {
-   vgfb->fence = virtio_gpu_fence_alloc(vgdev);
+   vgfb->fence = virtio_gpu_fence_alloc(vgdev, 
vgdev->fence_drv.context,
+0

[PATCH v3 06/12] drm/virtio: implement context init: track {ring_idx, emit_fence_info} in virtio_gpu_fence

2021-09-21 Thread Gurchetan Singh
Each fence should be associated with a [fence ID, fence_context,
seqno].  The seqno is just the fence id.

To get the fence context, we add the ring_idx to the 3D context's
base_fence_ctx.  The ring_idx is between 0 and 31, inclusive.

Each 3D context will have its own base_fence_ctx.  The ring_idx will
be emitted to host userspace when emit_fence_info is true.
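
As a sketch of the resulting mapping (illustrative helper only, not part of
this patch):

/* Each ring gets its own dma_fence context, derived from the per-context
 * base; ring_idx is limited to 0..31. */
static uint64_t ring_fence_context(uint64_t base_fence_ctx, uint32_t ring_idx)
{
	return base_fence_ctx + ring_idx;
}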

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 9996abf60e3a..401aec1a5efb 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -139,7 +139,9 @@ struct virtio_gpu_fence_driver {
 
 struct virtio_gpu_fence {
struct dma_fence f;
+   uint32_t ring_idx;
uint64_t fence_id;
+   bool emit_fence_info;
struct virtio_gpu_fence_driver *drv;
struct list_head node;
 };
-- 
2.33.0.464.g1972c5931b-goog



[PATCH v3 08/12] drm/virtio: implement context init: stop using drv->context when creating fence

2021-09-21 Thread Gurchetan Singh
The plumbing is all here to do this.  Since we always use the
default fence context when allocating a fence, this makes no
functional difference.

We can't process just the largest fence id anymore, since fence ids
are now associated with different timelines.  It's fine for fence_id
260 to signal before 259.  As such, process each fence_id
individually.

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_fence.c | 16 ++--
 drivers/gpu/drm/virtio/virtgpu_vq.c| 15 +++
 2 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_fence.c 
b/drivers/gpu/drm/virtio/virtgpu_fence.c
index 24c728b65d21..98a00c1e654d 100644
--- a/drivers/gpu/drm/virtio/virtgpu_fence.c
+++ b/drivers/gpu/drm/virtio/virtgpu_fence.c
@@ -75,20 +75,25 @@ struct virtio_gpu_fence *virtio_gpu_fence_alloc(struct 
virtio_gpu_device *vgdev,
uint64_t base_fence_ctx,
uint32_t ring_idx)
 {
+   uint64_t fence_context = base_fence_ctx + ring_idx;
struct virtio_gpu_fence_driver *drv = &vgdev->fence_drv;
struct virtio_gpu_fence *fence = kzalloc(sizeof(struct 
virtio_gpu_fence),
GFP_KERNEL);
+
if (!fence)
return fence;
 
fence->drv = drv;
+   fence->ring_idx = ring_idx;
+   fence->emit_fence_info = !(base_fence_ctx == drv->context);
 
/* This only partially initializes the fence because the seqno is
 * unknown yet.  The fence must not be used outside of the driver
 * until virtio_gpu_fence_emit is called.
 */
-   dma_fence_init(&fence->f, &virtio_gpu_fence_ops, &drv->lock, 
drv->context,
-  0);
+
+   dma_fence_init(&fence->f, &virtio_gpu_fence_ops, &drv->lock,
+  fence_context, 0);
 
return fence;
 }
@@ -110,6 +115,13 @@ void virtio_gpu_fence_emit(struct virtio_gpu_device *vgdev,
 
cmd_hdr->flags |= cpu_to_le32(VIRTIO_GPU_FLAG_FENCE);
cmd_hdr->fence_id = cpu_to_le64(fence->fence_id);
+
+   /* Only currently defined fence param. */
+   if (fence->emit_fence_info) {
+   cmd_hdr->flags |=
+   cpu_to_le32(VIRTIO_GPU_FLAG_INFO_RING_IDX);
+   cmd_hdr->ring_idx = (u8)fence->ring_idx;
+   }
 }
 
 void virtio_gpu_fence_event_process(struct virtio_gpu_device *vgdev,
diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c 
b/drivers/gpu/drm/virtio/virtgpu_vq.c
index db7741549ab0..7c052efe8836 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -199,7 +199,7 @@ void virtio_gpu_dequeue_ctrl_func(struct work_struct *work)
struct list_head reclaim_list;
struct virtio_gpu_vbuffer *entry, *tmp;
struct virtio_gpu_ctrl_hdr *resp;
-   u64 fence_id = 0;
+   u64 fence_id;
 
INIT_LIST_HEAD(&reclaim_list);
spin_lock(&vgdev->ctrlq.qlock);
@@ -226,23 +226,14 @@ void virtio_gpu_dequeue_ctrl_func(struct work_struct 
*work)
DRM_DEBUG("response 0x%x\n", 
le32_to_cpu(resp->type));
}
if (resp->flags & cpu_to_le32(VIRTIO_GPU_FLAG_FENCE)) {
-   u64 f = le64_to_cpu(resp->fence_id);
-
-   if (fence_id > f) {
-   DRM_ERROR("%s: Oops: fence %llx -> %llx\n",
- __func__, fence_id, f);
-   } else {
-   fence_id = f;
-   }
+   fence_id = le64_to_cpu(resp->fence_id);
+   virtio_gpu_fence_event_process(vgdev, fence_id);
}
if (entry->resp_cb)
entry->resp_cb(vgdev, entry);
}
wake_up(&vgdev->ctrlq.ack_queue);
 
-   if (fence_id)
-   virtio_gpu_fence_event_process(vgdev, fence_id);
-
list_for_each_entry_safe(entry, tmp, &reclaim_list, list) {
if (entry->objs)
virtio_gpu_array_put_free_delayed(vgdev, entry->objs);
-- 
2.33.0.464.g1972c5931b-goog



[PATCH v3 12/12] drm/virtio: implement context init: advertise feature to userspace

2021-09-21 Thread Gurchetan Singh
This advertises the context init feature to userspace, along with
a mask of supported capabilities.

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index fdaa7f3d9eeb..5618a1d5879c 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -286,6 +286,12 @@ static int virtio_gpu_getparam_ioctl(struct drm_device 
*dev, void *data,
case VIRTGPU_PARAM_CROSS_DEVICE:
value = vgdev->has_resource_assign_uuid ? 1 : 0;
break;
+   case VIRTGPU_PARAM_CONTEXT_INIT:
+   value = vgdev->has_context_init ? 1 : 0;
+   break;
+   case VIRTGPU_PARAM_SUPPORTED_CAPSET_IDs:
+   value = vgdev->capset_id_mask;
+   break;
default:
return -EINVAL;
}
-- 
2.33.0.464.g1972c5931b-goog



[PATCH v3 11/12] drm/virtio: implement context init: add virtio_gpu_fence_event

2021-09-21 Thread Gurchetan Singh
Similar to DRM_VMW_EVENT_FENCE_SIGNALED.  Sends a pollable event
to the DRM file descriptor when a fence on a specific ring is
signaled.

One difference is the event is not exposed via the UAPI -- this is
because host responses are on a shared memory buffer of type
BLOB_MEM_GUEST [this is the common way to receive responses with
virtgpu].  As such, there is no context specific read(..)
implementation either -- just a poll(..) implementation.

Signed-off-by: Gurchetan Singh 
Acked-by: Nicholas Verne 
---
 drivers/gpu/drm/virtio/virtgpu_drv.c   | 43 +-
 drivers/gpu/drm/virtio/virtgpu_drv.h   |  7 +
 drivers/gpu/drm/virtio/virtgpu_fence.c | 10 ++
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 34 
 4 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c 
b/drivers/gpu/drm/virtio/virtgpu_drv.c
index 9d963f1fda8f..749db18dcfa2 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -29,6 +29,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -155,6 +157,35 @@ static void virtio_gpu_config_changed(struct virtio_device 
*vdev)
schedule_work(&vgdev->config_changed_work);
 }
 
+static __poll_t virtio_gpu_poll(struct file *filp,
+   struct poll_table_struct *wait)
+{
+   struct drm_file *drm_file = filp->private_data;
+   struct virtio_gpu_fpriv *vfpriv = drm_file->driver_priv;
+   struct drm_device *dev = drm_file->minor->dev;
+   struct drm_pending_event *e = NULL;
+   __poll_t mask = 0;
+
+   if (!vfpriv->ring_idx_mask)
+   return drm_poll(filp, wait);
+
+   poll_wait(filp, &drm_file->event_wait, wait);
+
+   if (!list_empty(&drm_file->event_list)) {
+   spin_lock_irq(&dev->event_lock);
+   e = list_first_entry(&drm_file->event_list,
+struct drm_pending_event, link);
+   drm_file->event_space += e->event->length;
+   list_del(&e->link);
+   spin_unlock_irq(&dev->event_lock);
+
+   kfree(e);
+   mask |= EPOLLIN | EPOLLRDNORM;
+   }
+
+   return mask;
+}
+
 static struct virtio_device_id id_table[] = {
{ VIRTIO_ID_GPU, VIRTIO_DEV_ANY_ID },
{ 0 },
@@ -194,7 +225,17 @@ MODULE_AUTHOR("Dave Airlie ");
 MODULE_AUTHOR("Gerd Hoffmann ");
 MODULE_AUTHOR("Alon Levy");
 
-DEFINE_DRM_GEM_FOPS(virtio_gpu_driver_fops);
+static const struct file_operations virtio_gpu_driver_fops = {
+   .owner  = THIS_MODULE,
+   .open   = drm_open,
+   .release= drm_release,
+   .unlocked_ioctl = drm_ioctl,
+   .compat_ioctl   = drm_compat_ioctl,
+   .poll   = virtio_gpu_poll,
+   .read   = drm_read,
+   .llseek = noop_llseek,
+   .mmap   = drm_gem_mmap
+};
 
 static const struct drm_driver driver = {
.driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_RENDER | 
DRIVER_ATOMIC,
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index cb60d52c2bd1..e0265fe74aa5 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -138,11 +138,18 @@ struct virtio_gpu_fence_driver {
spinlock_t   lock;
 };
 
+#define VIRTGPU_EVENT_FENCE_SIGNALED_INTERNAL 0x1000
+struct virtio_gpu_fence_event {
+   struct drm_pending_event base;
+   struct drm_event event;
+};
+
 struct virtio_gpu_fence {
struct dma_fence f;
uint32_t ring_idx;
uint64_t fence_id;
bool emit_fence_info;
+   struct virtio_gpu_fence_event *e;
struct virtio_gpu_fence_driver *drv;
struct list_head node;
 };
diff --git a/drivers/gpu/drm/virtio/virtgpu_fence.c 
b/drivers/gpu/drm/virtio/virtgpu_fence.c
index 98a00c1e654d..f28357dbde35 100644
--- a/drivers/gpu/drm/virtio/virtgpu_fence.c
+++ b/drivers/gpu/drm/virtio/virtgpu_fence.c
@@ -152,11 +152,21 @@ void virtio_gpu_fence_event_process(struct 
virtio_gpu_device *vgdev,
continue;
 
dma_fence_signal_locked(&curr->f);
+   if (curr->e) {
+   drm_send_event(vgdev->ddev, &curr->e->base);
+   curr->e = NULL;
+   }
+
list_del(&curr->node);
dma_fence_put(&curr->f);
}
 
dma_fence_signal_locked(&signaled->f);
+   if (signaled->e) {
+   drm_send_event(vgdev->ddev, &signaled->e->base);
+   signaled->e = NULL;
+   }
+
list_del(&signaled->node);

[PATCH v3 05/12] drm/virtio: implement context init: support init ioctl

2021-09-21 Thread Gurchetan Singh
From: Anthoine Bourgeois 

This implements the context initialization ioctl.  A list of params
is passed in by userspace, and kernel driver validates them.  The
only currently supported param is VIRTGPU_CONTEXT_PARAM_CAPSET_ID.

If the context has already been initialized, -EEXIST is returned.
This happens when Linux userspace does a dumb_create followed by
opening the Mesa virgl driver with the same virtgpu instance.

However, for most applications, 3D contexts will be explicitly
initialized when the feature is available.
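
For illustration only, a userspace sketch of explicit initialization
(hypothetical fd and capset_id variables; needs <stdint.h>, <xf86drm.h>
and the virtgpu uapi header added by this series):

struct drm_virtgpu_context_set_param param = {
	.param = VIRTGPU_CONTEXT_PARAM_CAPSET_ID,
	.value = capset_id,	/* e.g. the capset id reported by the host */
};
struct drm_virtgpu_context_init init = {
	.num_params = 1,
	.ctx_set_params = (uintptr_t)&param,
};

/* A failure with errno == EEXIST means the context was already created
 * implicitly by an earlier ioctl. */
drmIoctl(fd, DRM_IOCTL_VIRTGPU_CONTEXT_INIT, &init);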

Signed-off-by: Anthoine Bourgeois 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h   |  6 +-
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 96 --
 drivers/gpu/drm/virtio/virtgpu_vq.c|  4 +-
 3 files changed, 98 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 5e1958a522ff..9996abf60e3a 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -259,12 +259,13 @@ struct virtio_gpu_device {
 
 struct virtio_gpu_fpriv {
uint32_t ctx_id;
+   uint32_t context_init;
bool context_created;
struct mutex context_lock;
 };
 
 /* virtgpu_ioctl.c */
-#define DRM_VIRTIO_NUM_IOCTLS 11
+#define DRM_VIRTIO_NUM_IOCTLS 12
 extern struct drm_ioctl_desc virtio_gpu_ioctls[DRM_VIRTIO_NUM_IOCTLS];
 void virtio_gpu_create_context(struct drm_device *dev, struct drm_file *file);
 
@@ -342,7 +343,8 @@ int virtio_gpu_cmd_get_capset(struct virtio_gpu_device 
*vgdev,
  struct virtio_gpu_drv_cap_cache **cache_p);
 int virtio_gpu_cmd_get_edids(struct virtio_gpu_device *vgdev);
 void virtio_gpu_cmd_context_create(struct virtio_gpu_device *vgdev, uint32_t 
id,
-  uint32_t nlen, const char *name);
+  uint32_t context_init, uint32_t nlen,
+  const char *name);
 void virtio_gpu_cmd_context_destroy(struct virtio_gpu_device *vgdev,
uint32_t id);
 void virtio_gpu_cmd_context_attach_resource(struct virtio_gpu_device *vgdev,
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index 5c1ad1596889..f5281d1e30e1 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -38,20 +38,30 @@
VIRTGPU_BLOB_FLAG_USE_SHAREABLE | \
VIRTGPU_BLOB_FLAG_USE_CROSS_DEVICE)
 
+/* Must be called with &virtio_gpu_fpriv.struct_mutex held. */
+static void virtio_gpu_create_context_locked(struct virtio_gpu_device *vgdev,
+struct virtio_gpu_fpriv *vfpriv)
+{
+   char dbgname[TASK_COMM_LEN];
+
+   get_task_comm(dbgname, current);
+   virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
+ vfpriv->context_init, strlen(dbgname),
+ dbgname);
+
+   vfpriv->context_created = true;
+}
+
 void virtio_gpu_create_context(struct drm_device *dev, struct drm_file *file)
 {
struct virtio_gpu_device *vgdev = dev->dev_private;
struct virtio_gpu_fpriv *vfpriv = file->driver_priv;
-   char dbgname[TASK_COMM_LEN];
 
mutex_lock(&vfpriv->context_lock);
if (vfpriv->context_created)
goto out_unlock;
 
-   get_task_comm(dbgname, current);
-   virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
- strlen(dbgname), dbgname);
-   vfpriv->context_created = true;
+   virtio_gpu_create_context_locked(vgdev, vfpriv);
 
 out_unlock:
mutex_unlock(&vfpriv->context_lock);
@@ -662,6 +672,79 @@ static int virtio_gpu_resource_create_blob_ioctl(struct 
drm_device *dev,
return 0;
 }
 
+static int virtio_gpu_context_init_ioctl(struct drm_device *dev,
+void *data, struct drm_file *file)
+{
+   int ret = 0;
+   uint32_t num_params, i, param, value;
+   size_t len;
+   struct drm_virtgpu_context_set_param *ctx_set_params = NULL;
+   struct virtio_gpu_device *vgdev = dev->dev_private;
+   struct virtio_gpu_fpriv *vfpriv = file->driver_priv;
+   struct drm_virtgpu_context_init *args = data;
+
+   num_params = args->num_params;
+   len = num_params * sizeof(struct drm_virtgpu_context_set_param);
+
+   if (!vgdev->has_context_init || !vgdev->has_virgl_3d)
+   return -EINVAL;
+
+   /* Number of unique parameters supported at this time. */
+   if (num_params > 1)
+   return -EINVAL;
+
+   ctx_set_params = memdup_user(u64_to_user_ptr(args->ctx_set_params),
+len);
+
+   if (IS_ERR(ctx_set_params))
+   return PTR_ERR(ctx_set_params);
+
+   mutex_lock(&vfpriv->context_lock);
+   if (vfpriv->context

[PATCH v3 09/12] drm/virtio: implement context init: allocate an array of fence contexts

2021-09-21 Thread Gurchetan Singh
We don't want fences from different 3D contexts (virgl, gfxstream,
venus) to be on the same timeline.  With explicit context creation,
we can specify the number of rings each context wants.

Execbuffer can specify which ring to use.
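
As a sketch of the userspace side (cmd, cmd_size and fd are placeholders;
error handling elided), a submission routed to ring 1 would look like:

    struct drm_virtgpu_execbuffer exbuf = {
            .flags = VIRTGPU_EXECBUF_RING_IDX,
            .size = cmd_size,
            .command = (uintptr_t)cmd,
            .ring_idx = 1,    /* must be < the NUM_RINGS value set at init */
    };

    ret = ioctl(fd, DRM_IOCTL_VIRTGPU_EXECBUFFER, &exbuf);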

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h   |  3 +++
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 34 --
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index a5142d60c2fa..cca9ab505deb 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -56,6 +56,7 @@
 #define STATE_ERR 2
 
 #define MAX_CAPSET_ID 63
+#define MAX_RINGS 64
 
 struct virtio_gpu_object_params {
unsigned long size;
@@ -263,6 +264,8 @@ struct virtio_gpu_fpriv {
uint32_t ctx_id;
uint32_t context_init;
bool context_created;
+   uint32_t num_rings;
+   uint64_t base_fence_ctx;
struct mutex context_lock;
 };
 
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index f51f3393a194..262f79210283 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -99,6 +99,11 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device 
*dev, void *data,
int in_fence_fd = exbuf->fence_fd;
int out_fence_fd = -1;
void *buf;
+   uint64_t fence_ctx;
+   uint32_t ring_idx;
+
+   fence_ctx = vgdev->fence_drv.context;
+   ring_idx = 0;
 
if (vgdev->has_virgl_3d == false)
return -ENOSYS;
@@ -106,6 +111,17 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device 
*dev, void *data,
if ((exbuf->flags & ~VIRTGPU_EXECBUF_FLAGS))
return -EINVAL;
 
+   if ((exbuf->flags & VIRTGPU_EXECBUF_RING_IDX)) {
+   if (exbuf->ring_idx >= vfpriv->num_rings)
+   return -EINVAL;
+
+   if (!vfpriv->base_fence_ctx)
+   return -EINVAL;
+
+   fence_ctx = vfpriv->base_fence_ctx;
+   ring_idx = exbuf->ring_idx;
+   }
+
exbuf->fence_fd = -1;
 
virtio_gpu_create_context(dev, file);
@@ -173,7 +189,7 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device 
*dev, void *data,
goto out_memdup;
}
 
-   out_fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context, 0);
+   out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx);
if(!out_fence) {
ret = -ENOMEM;
goto out_unresv;
@@ -691,7 +707,7 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
return -EINVAL;
 
/* Number of unique parameters supported at this time. */
-   if (num_params > 1)
+   if (num_params > 2)
return -EINVAL;
 
ctx_set_params = memdup_user(u64_to_user_ptr(args->ctx_set_params),
@@ -731,6 +747,20 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
 
vfpriv->context_init |= value;
break;
+   case VIRTGPU_CONTEXT_PARAM_NUM_RINGS:
+   if (vfpriv->base_fence_ctx) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   if (value > MAX_RINGS) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   vfpriv->base_fence_ctx = dma_fence_context_alloc(value);
+   vfpriv->num_rings = value;
+   break;
default:
ret = -EINVAL;
goto out_unlock;
-- 
2.33.0.464.g1972c5931b-goog



[PATCH v3 03/12] drm/virtio: implement context init: track valid capabilities in a mask

2021-09-21 Thread Gurchetan Singh
The valid capability IDs are between 1 and 63, and are defined in the
virtio-gpu spec.  This is used for error checking in the subsequent
patches.  We're currently only using 2 capability IDs, so this
should be plenty for the immediate future.

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h |  3 +++
 drivers/gpu/drm/virtio/virtgpu_kms.c | 18 +-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 0c4810982530..3023e16be0d6 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -55,6 +55,8 @@
 #define STATE_OK 1
 #define STATE_ERR 2
 
+#define MAX_CAPSET_ID 63
+
 struct virtio_gpu_object_params {
unsigned long size;
bool dumb;
@@ -245,6 +247,7 @@ struct virtio_gpu_device {
 
struct virtio_gpu_drv_capset *capsets;
uint32_t num_capsets;
+   uint64_t capset_id_mask;
struct list_head cap_cache;
 
/* protects uuid state when exporting */
diff --git a/drivers/gpu/drm/virtio/virtgpu_kms.c 
b/drivers/gpu/drm/virtio/virtgpu_kms.c
index f3379059f324..58a65121c200 100644
--- a/drivers/gpu/drm/virtio/virtgpu_kms.c
+++ b/drivers/gpu/drm/virtio/virtgpu_kms.c
@@ -65,6 +65,7 @@ static void virtio_gpu_get_capsets(struct virtio_gpu_device 
*vgdev,
   int num_capsets)
 {
int i, ret;
+   bool invalid_capset_id = false;
 
vgdev->capsets = kcalloc(num_capsets,
 sizeof(struct virtio_gpu_drv_capset),
@@ -78,19 +79,34 @@ static void virtio_gpu_get_capsets(struct virtio_gpu_device 
*vgdev,
virtio_gpu_notify(vgdev);
ret = wait_event_timeout(vgdev->resp_wq,
 vgdev->capsets[i].id > 0, 5 * HZ);
-   if (ret == 0) {
+   /*
+* Capability ids are defined in the virtio-gpu spec and are
+* between 1 to 63, inclusive.
+*/
+   if (!vgdev->capsets[i].id ||
+   vgdev->capsets[i].id > MAX_CAPSET_ID)
+   invalid_capset_id = true;
+
+   if (ret == 0)
DRM_ERROR("timed out waiting for cap set %d\n", i);
+   else if (invalid_capset_id)
+   DRM_ERROR("invalid capset id %u", vgdev->capsets[i].id);
+
+   if (ret == 0 || invalid_capset_id) {
spin_lock(&vgdev->display_info_lock);
kfree(vgdev->capsets);
vgdev->capsets = NULL;
spin_unlock(&vgdev->display_info_lock);
return;
}
+
+   vgdev->capset_id_mask |= 1 << vgdev->capsets[i].id;
DRM_INFO("cap set %d: id %d, max-version %d, max-size %d\n",
 i, vgdev->capsets[i].id,
 vgdev->capsets[i].max_version,
 vgdev->capsets[i].max_size);
}
+
vgdev->num_capsets = num_capsets;
 }
 
-- 
2.33.0.464.g1972c5931b-goog



[PATCH v3 04/12] drm/virtio: implement context init: probe for feature

2021-09-21 Thread Gurchetan Singh
From: Anthoine Bourgeois 

Let's probe for VIRTIO_GPU_F_CONTEXT_INIT.

Create a new DRM_INFO(..) line since the current one is getting
too long.

Signed-off-by: Anthoine Bourgeois 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_debugfs.c | 1 +
 drivers/gpu/drm/virtio/virtgpu_drv.c | 1 +
 drivers/gpu/drm/virtio/virtgpu_drv.h | 1 +
 drivers/gpu/drm/virtio/virtgpu_kms.c | 8 +++-
 4 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_debugfs.c 
b/drivers/gpu/drm/virtio/virtgpu_debugfs.c
index c2b20e0ee030..b6954e2f75e6 100644
--- a/drivers/gpu/drm/virtio/virtgpu_debugfs.c
+++ b/drivers/gpu/drm/virtio/virtgpu_debugfs.c
@@ -52,6 +52,7 @@ static int virtio_gpu_features(struct seq_file *m, void *data)
vgdev->has_resource_assign_uuid);
 
virtio_gpu_add_bool(m, "blob resources", vgdev->has_resource_blob);
+   virtio_gpu_add_bool(m, "context init", vgdev->has_context_init);
virtio_gpu_add_int(m, "cap sets", vgdev->num_capsets);
virtio_gpu_add_int(m, "scanouts", vgdev->num_scanouts);
if (vgdev->host_visible_region.len) {
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c 
b/drivers/gpu/drm/virtio/virtgpu_drv.c
index ed85a7863256..9d963f1fda8f 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -172,6 +172,7 @@ static unsigned int features[] = {
VIRTIO_GPU_F_EDID,
VIRTIO_GPU_F_RESOURCE_UUID,
VIRTIO_GPU_F_RESOURCE_BLOB,
+   VIRTIO_GPU_F_CONTEXT_INIT,
 };
 static struct virtio_driver virtio_gpu_driver = {
.feature_table = features,
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 3023e16be0d6..5e1958a522ff 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -236,6 +236,7 @@ struct virtio_gpu_device {
bool has_resource_assign_uuid;
bool has_resource_blob;
bool has_host_visible;
+   bool has_context_init;
struct virtio_shm_region host_visible_region;
struct drm_mm host_visible_mm;
 
diff --git a/drivers/gpu/drm/virtio/virtgpu_kms.c 
b/drivers/gpu/drm/virtio/virtgpu_kms.c
index 58a65121c200..21f410901694 100644
--- a/drivers/gpu/drm/virtio/virtgpu_kms.c
+++ b/drivers/gpu/drm/virtio/virtgpu_kms.c
@@ -191,13 +191,19 @@ int virtio_gpu_init(struct drm_device *dev)
(unsigned long)vgdev->host_visible_region.addr,
(unsigned long)vgdev->host_visible_region.len);
}
+   if (virtio_has_feature(vgdev->vdev, VIRTIO_GPU_F_CONTEXT_INIT)) {
+   vgdev->has_context_init = true;
+   }
 
-   DRM_INFO("features: %cvirgl %cedid %cresource_blob %chost_visible\n",
+   DRM_INFO("features: %cvirgl %cedid %cresource_blob %chost_visible",
 vgdev->has_virgl_3d? '+' : '-',
 vgdev->has_edid? '+' : '-',
 vgdev->has_resource_blob ? '+' : '-',
 vgdev->has_host_visible ? '+' : '-');
 
+   DRM_INFO("features: %ccontext_init\n",
+vgdev->has_context_init ? '+' : '-');
+
ret = virtio_find_vqs(vgdev->vdev, 2, vqs, callbacks, names, NULL);
if (ret) {
DRM_ERROR("failed to find virt queues\n");
-- 
2.33.0.464.g1972c5931b-goog



[PATCH v3 02/12] drm/virtgpu api: create context init feature

2021-09-21 Thread Gurchetan Singh
This change allows creating contexts depending on a set of
context parameters.  The meaning of each of the parameters
is listed below:

1) VIRTGPU_CONTEXT_PARAM_CAPSET_ID

This determines the type of a context based on the capability set
ID.  For example, the current capsets:

VIRTIO_GPU_CAPSET_VIRGL
VIRTIO_GPU_CAPSET_VIRGL2

define a Gallium, TGSI based "virgl" context.  We only need 1 capset
ID per context type, though virgl has two due to a bug that has since
been fixed.

The use case is the "gfxstream" rendering library and "venus"
renderer.

gfxstream doesn't do Gallium/TGSI translation and mostly relies on
auto-generated API streaming.  Certain users prefer gfxstream over
virgl for GLES on GLES emulation.  {gfxstream vk}/{venus} are also
required for Vulkan emulation.  The maximum capset ID is 63.

The goal is for guest userspace to choose the optimal context type
depending on the situation/hardware.

2) VIRTGPU_CONTEXT_PARAM_NUM_RINGS

This tells the number of independent command rings that the context
will use.  This value may be zero and is inferred to be zero if
VIRTGPU_CONTEXT_PARAM_NUM_RINGS is not passed in.  This is for backwards
compatibility for virgl, which has one big giant command ring for all
commands.

The maximum number of rings is 64.  In practice, multi-queue or
multi-ring submission is used for powerful dGPUs and virtio-gpu
may not be the best option in that case (see PCI passthrough or
rendernode forwarding).

3) VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK

This is a mask of ring indices for which the DRM fd is pollable.
For example, if VIRTGPU_CONTEXT_PARAM_NUM_RINGS is 2, then the mask
may be:

[ring idx]  |  [1 << ring_idx] | final mask
-------------------------------------------
0           |  1               | 1
1           |  2               | 3

The "Sommelier" guest Wayland proxy uses this to poll for events
from the host compositor.
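
Tying the three parameters together, a context that wants two rings, both
pollable, might pass an array like this to DRM_IOCTL_VIRTGPU_CONTEXT_INIT
(capset_id here stands for whichever protocol the guest driver speaks):

    struct drm_virtgpu_context_set_param params[] = {
            { VIRTGPU_CONTEXT_PARAM_CAPSET_ID,       capset_id },
            { VIRTGPU_CONTEXT_PARAM_NUM_RINGS,       2 },
            { VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK, 0x3 },  /* rings 0 and 1 */
    };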

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
Acked-by: Nicholas Verne 
---
 include/uapi/drm/virtgpu_drm.h | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/include/uapi/drm/virtgpu_drm.h b/include/uapi/drm/virtgpu_drm.h
index b9ec26e9c646..a13e20cc66b4 100644
--- a/include/uapi/drm/virtgpu_drm.h
+++ b/include/uapi/drm/virtgpu_drm.h
@@ -47,12 +47,15 @@ extern "C" {
 #define DRM_VIRTGPU_WAIT 0x08
 #define DRM_VIRTGPU_GET_CAPS  0x09
 #define DRM_VIRTGPU_RESOURCE_CREATE_BLOB 0x0a
+#define DRM_VIRTGPU_CONTEXT_INIT 0x0b
 
 #define VIRTGPU_EXECBUF_FENCE_FD_IN0x01
 #define VIRTGPU_EXECBUF_FENCE_FD_OUT   0x02
+#define VIRTGPU_EXECBUF_RING_IDX   0x04
 #define VIRTGPU_EXECBUF_FLAGS  (\
VIRTGPU_EXECBUF_FENCE_FD_IN |\
VIRTGPU_EXECBUF_FENCE_FD_OUT |\
+   VIRTGPU_EXECBUF_RING_IDX |\
0)
 
 struct drm_virtgpu_map {
@@ -68,6 +71,8 @@ struct drm_virtgpu_execbuffer {
__u64 bo_handles;
__u32 num_bo_handles;
__s32 fence_fd; /* in/out fence fd (see 
VIRTGPU_EXECBUF_FENCE_FD_IN/OUT) */
+   __u32 ring_idx; /* command ring index (see VIRTGPU_EXECBUF_RING_IDX) */
+   __u32 pad;
 };
 
 #define VIRTGPU_PARAM_3D_FEATURES 1 /* do we have 3D features in the hw */
@@ -75,6 +80,8 @@ struct drm_virtgpu_execbuffer {
 #define VIRTGPU_PARAM_RESOURCE_BLOB 3 /* DRM_VIRTGPU_RESOURCE_CREATE_BLOB */
 #define VIRTGPU_PARAM_HOST_VISIBLE 4 /* Host blob resources are mappable */
 #define VIRTGPU_PARAM_CROSS_DEVICE 5 /* Cross virtio-device resource sharing  
*/
+#define VIRTGPU_PARAM_CONTEXT_INIT 6 /* DRM_VIRTGPU_CONTEXT_INIT */
+#define VIRTGPU_PARAM_SUPPORTED_CAPSET_IDs 7 /* Bitmask of supported 
capability set ids */
 
 struct drm_virtgpu_getparam {
__u64 param;
@@ -173,6 +180,22 @@ struct drm_virtgpu_resource_create_blob {
__u64 blob_id;
 };
 
+#define VIRTGPU_CONTEXT_PARAM_CAPSET_ID   0x0001
+#define VIRTGPU_CONTEXT_PARAM_NUM_RINGS   0x0002
+#define VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK 0x0003
+struct drm_virtgpu_context_set_param {
+   __u64 param;
+   __u64 value;
+};
+
+struct drm_virtgpu_context_init {
+   __u32 num_params;
+   __u32 pad;
+
+   /* pointer to drm_virtgpu_context_set_param array */
+   __u64 ctx_set_params;
+};
+
 #define DRM_IOCTL_VIRTGPU_MAP \
DRM_IOWR(DRM_COMMAND_BASE + DRM_VIRTGPU_MAP, struct drm_virtgpu_map)
 
@@ -212,6 +235,10 @@ struct drm_virtgpu_resource_create_blob {
DRM_IOWR(DRM_COMMAND_BASE + DRM_VIRTGPU_RESOURCE_CREATE_BLOB,   \
struct drm_virtgpu_resource_create_blob)
 
+#define DRM_IOCTL_VIRTGPU_CONTEXT_INIT \
+   DRM_IOWR(DRM_COMMAND_BASE + DRM_VIRTGPU_CONTEXT_INIT,   \
+   struct drm_virtgpu_context_init)
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.33.0.464.g1972c5931b-goog



[PATCH v3 01/12] virtio-gpu api: multiple context types with explicit initialization

2021-09-21 Thread Gurchetan Singh
This feature allows for each virtio-gpu 3D context to be created
with a "context_init" variable.  This variable can specify:

 - the type of protocol used by the context via the capset id.
   This is useful for differentiating virgl, gfxstream, and venus
   protocols by host userspace.

 - other things in the future, such as the version of the context.

In addition, each different context needs one or more timelines, so
for example a virgl context's waiting can be independent of a
gfxstream context's waiting.

VIRTIO_GPU_FLAG_INFO_RING_IDX is introduced to tell the
host which per-context command ring (or "hardware queue", distinct
from the virtio-queue) the fence should be associated with.

The new capability sets (gfxstream, venus etc.) are only defined in
the virtio-gpu spec and not defined in the header.
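
On the host side, a sketch of pulling the new fields back out (the helper
names below are hypothetical; byte-order handling and VMM plumbing are
implementation details, not part of this patch):

    #include <endian.h>
    #include <stdint.h>
    #include <linux/virtio_gpu.h>

    /* Which protocol does this context speak? */
    static uint32_t ctx_create_capset_id(const struct virtio_gpu_ctx_create *cc)
    {
            return le32toh(cc->context_init) &
                   VIRTIO_GPU_CONTEXT_INIT_CAPSET_ID_MASK;
    }

    /* Which per-context ring does a fenced command's fence belong to? */
    static uint8_t fence_ring_idx(const struct virtio_gpu_ctrl_hdr *hdr)
    {
            if (le32toh(hdr->flags) & VIRTIO_GPU_FLAG_INFO_RING_IDX)
                    return hdr->ring_idx;
            return 0;       /* old guests: everything on ring 0 */
    }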

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 include/uapi/linux/virtio_gpu.h | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/virtio_gpu.h b/include/uapi/linux/virtio_gpu.h
index 97523a95781d..f556fde07b76 100644
--- a/include/uapi/linux/virtio_gpu.h
+++ b/include/uapi/linux/virtio_gpu.h
@@ -59,6 +59,11 @@
  * VIRTIO_GPU_CMD_RESOURCE_CREATE_BLOB
  */
 #define VIRTIO_GPU_F_RESOURCE_BLOB   3
+/*
+ * VIRTIO_GPU_CMD_CREATE_CONTEXT with
+ * context_init and multiple timelines
+ */
+#define VIRTIO_GPU_F_CONTEXT_INIT4
 
 enum virtio_gpu_ctrl_type {
VIRTIO_GPU_UNDEFINED = 0,
@@ -122,14 +127,20 @@ enum virtio_gpu_shm_id {
VIRTIO_GPU_SHM_ID_HOST_VISIBLE = 1
 };
 
-#define VIRTIO_GPU_FLAG_FENCE (1 << 0)
+#define VIRTIO_GPU_FLAG_FENCE (1 << 0)
+/*
+ * If the following flag is set, then ring_idx contains the index
+ * of the command ring that needs to used when creating the fence
+ */
+#define VIRTIO_GPU_FLAG_INFO_RING_IDX (1 << 1)
 
 struct virtio_gpu_ctrl_hdr {
__le32 type;
__le32 flags;
__le64 fence_id;
__le32 ctx_id;
-   __le32 padding;
+   __u8 ring_idx;
+   __u8 padding[3];
 };
 
 /* data passed in the cursor vq */
@@ -269,10 +280,11 @@ struct virtio_gpu_resource_create_3d {
 };
 
 /* VIRTIO_GPU_CMD_CTX_CREATE */
+#define VIRTIO_GPU_CONTEXT_INIT_CAPSET_ID_MASK 0x00ff
 struct virtio_gpu_ctx_create {
struct virtio_gpu_ctrl_hdr hdr;
__le32 nlen;
-   __le32 padding;
+   __le32 context_init;
char debug_name[64];
 };
 
-- 
2.33.0.464.g1972c5931b-goog



[PATCH v3 00/12] Context types, v3

2021-09-21 Thread Gurchetan Singh
Version 2 of context types:

https://lists.oasis-open.org/archives/virtio-dev/202108/msg00141.html

Changes since RFC:
   * le32 info --> {u8 ring_idx + u8 padding[3]}.
   * Max rings is now 64.

Changes since v1:
   * Document plan regarding context types + display combinations that
 need implicit sync in patch 9.

Changes since v2:
   * u8 ring_idx --> __u8 ring_idx to fix buildbot issues

Anthoine Bourgeois (2):
  drm/virtio: implement context init: probe for feature
  drm/virtio: implement context init: support init ioctl

Gurchetan Singh (10):
  virtio-gpu api: multiple context types with explicit initialization
  drm/virtgpu api: create context init feature
  drm/virtio: implement context init: track valid capabilities in a mask
  drm/virtio: implement context init: track {ring_idx, emit_fence_info}
in virtio_gpu_fence
  drm/virtio: implement context init: plumb {base_fence_ctx, ring_idx}
to virtio_gpu_fence_alloc
  drm/virtio: implement context init: stop using drv->context when
creating fence
  drm/virtio: implement context init: allocate an array of fence
contexts
  drm/virtio: implement context init: handle
VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK
  drm/virtio: implement context init: add virtio_gpu_fence_event
  drm/virtio: implement context init: advertise feature to userspace

 drivers/gpu/drm/virtio/virtgpu_debugfs.c |   1 +
 drivers/gpu/drm/virtio/virtgpu_drv.c |  44 -
 drivers/gpu/drm/virtio/virtgpu_drv.h |  28 +++-
 drivers/gpu/drm/virtio/virtgpu_fence.c   |  30 +++-
 drivers/gpu/drm/virtio/virtgpu_ioctl.c   | 195 +--
 drivers/gpu/drm/virtio/virtgpu_kms.c |  26 ++-
 drivers/gpu/drm/virtio/virtgpu_plane.c   |   3 +-
 drivers/gpu/drm/virtio/virtgpu_vq.c  |  19 +--
 include/uapi/drm/virtgpu_drm.h   |  27 
 include/uapi/linux/virtio_gpu.h  |  18 ++-
 10 files changed, 355 insertions(+), 36 deletions(-)

-- 
2.33.0.464.g1972c5931b-goog



[PATCH v2 12/12] drm/virtio: implement context init: advertise feature to userspace

2021-09-16 Thread Gurchetan Singh
This advertises the context init feature to userspace, along with
a mask of supported capabilities.
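
A userspace sketch of the probe (error handling and boilerplate elided;
capset_id is whatever id the driver cares about):

    uint64_t has_context_init = 0, capset_mask = 0;
    struct drm_virtgpu_getparam gp;

    gp.param = VIRTGPU_PARAM_CONTEXT_INIT;
    gp.value = (uintptr_t)&has_context_init;
    ioctl(fd, DRM_IOCTL_VIRTGPU_GETPARAM, &gp);

    gp.param = VIRTGPU_PARAM_SUPPORTED_CAPSET_IDs;
    gp.value = (uintptr_t)&capset_mask;
    ioctl(fd, DRM_IOCTL_VIRTGPU_GETPARAM, &gp);

    if (has_context_init && (capset_mask & (1ULL << capset_id)))
            /* this capset id can be passed to DRM_VIRTGPU_CONTEXT_INIT */;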

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index fdaa7f3d9eeb..5618a1d5879c 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -286,6 +286,12 @@ static int virtio_gpu_getparam_ioctl(struct drm_device 
*dev, void *data,
case VIRTGPU_PARAM_CROSS_DEVICE:
value = vgdev->has_resource_assign_uuid ? 1 : 0;
break;
+   case VIRTGPU_PARAM_CONTEXT_INIT:
+   value = vgdev->has_context_init ? 1 : 0;
+   break;
+   case VIRTGPU_PARAM_SUPPORTED_CAPSET_IDs:
+   value = vgdev->capset_id_mask;
+   break;
default:
return -EINVAL;
}
-- 
2.33.0.464.g1972c5931b-goog



[PATCH v2 11/12] drm/virtio: implement context init: add virtio_gpu_fence_event

2021-09-16 Thread Gurchetan Singh
Similar to DRM_VMW_EVENT_FENCE_SIGNALED.  Sends a pollable event
to the DRM file descriptor when a fence on a specific ring is
signaled.

One difference is the event is not exposed via the UAPI -- this is
because host responses are on a shared memory buffer of type
BLOB_MEM_GUEST [this is the common way to receive responses with
virtgpu].  As such, there is no context specific read(..)
implementation either -- just a poll(..) implementation.
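
A rough sketch of the consuming side (drm_fd, running and
handle_host_response() are placeholders for whatever the proxy or driver
does with the shared response buffer):

    #include <poll.h>

    struct pollfd pfd = { .fd = drm_fd, .events = POLLIN };

    while (running) {
            if (poll(&pfd, 1, -1) <= 0)
                    continue;
            if (pfd.revents & POLLIN) {
                    /* A fence on one of this context's rings signaled.  The
                     * event itself carries no payload; the response lives in
                     * the shared BLOB_MEM_GUEST buffer, so there is nothing
                     * to read() from the DRM fd. */
                    handle_host_response();
            }
    }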

Signed-off-by: Gurchetan Singh 
Acked-by: Nicholas Verne 
---
 drivers/gpu/drm/virtio/virtgpu_drv.c   | 43 +-
 drivers/gpu/drm/virtio/virtgpu_drv.h   |  7 +
 drivers/gpu/drm/virtio/virtgpu_fence.c | 10 ++
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 34 
 4 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c 
b/drivers/gpu/drm/virtio/virtgpu_drv.c
index 9d963f1fda8f..749db18dcfa2 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -29,6 +29,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -155,6 +157,35 @@ static void virtio_gpu_config_changed(struct virtio_device 
*vdev)
schedule_work(&vgdev->config_changed_work);
 }
 
+static __poll_t virtio_gpu_poll(struct file *filp,
+   struct poll_table_struct *wait)
+{
+   struct drm_file *drm_file = filp->private_data;
+   struct virtio_gpu_fpriv *vfpriv = drm_file->driver_priv;
+   struct drm_device *dev = drm_file->minor->dev;
+   struct drm_pending_event *e = NULL;
+   __poll_t mask = 0;
+
+   if (!vfpriv->ring_idx_mask)
+   return drm_poll(filp, wait);
+
+   poll_wait(filp, &drm_file->event_wait, wait);
+
+   if (!list_empty(&drm_file->event_list)) {
+   spin_lock_irq(&dev->event_lock);
+   e = list_first_entry(&drm_file->event_list,
+struct drm_pending_event, link);
+   drm_file->event_space += e->event->length;
+   list_del(&e->link);
+   spin_unlock_irq(&dev->event_lock);
+
+   kfree(e);
+   mask |= EPOLLIN | EPOLLRDNORM;
+   }
+
+   return mask;
+}
+
 static struct virtio_device_id id_table[] = {
{ VIRTIO_ID_GPU, VIRTIO_DEV_ANY_ID },
{ 0 },
@@ -194,7 +225,17 @@ MODULE_AUTHOR("Dave Airlie ");
 MODULE_AUTHOR("Gerd Hoffmann ");
 MODULE_AUTHOR("Alon Levy");
 
-DEFINE_DRM_GEM_FOPS(virtio_gpu_driver_fops);
+static const struct file_operations virtio_gpu_driver_fops = {
+   .owner  = THIS_MODULE,
+   .open   = drm_open,
+   .release= drm_release,
+   .unlocked_ioctl = drm_ioctl,
+   .compat_ioctl   = drm_compat_ioctl,
+   .poll   = virtio_gpu_poll,
+   .read   = drm_read,
+   .llseek = noop_llseek,
+   .mmap   = drm_gem_mmap
+};
 
 static const struct drm_driver driver = {
.driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_RENDER | 
DRIVER_ATOMIC,
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index cb60d52c2bd1..e0265fe74aa5 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -138,11 +138,18 @@ struct virtio_gpu_fence_driver {
spinlock_t   lock;
 };
 
+#define VIRTGPU_EVENT_FENCE_SIGNALED_INTERNAL 0x1000
+struct virtio_gpu_fence_event {
+   struct drm_pending_event base;
+   struct drm_event event;
+};
+
 struct virtio_gpu_fence {
struct dma_fence f;
uint32_t ring_idx;
uint64_t fence_id;
bool emit_fence_info;
+   struct virtio_gpu_fence_event *e;
struct virtio_gpu_fence_driver *drv;
struct list_head node;
 };
diff --git a/drivers/gpu/drm/virtio/virtgpu_fence.c 
b/drivers/gpu/drm/virtio/virtgpu_fence.c
index 98a00c1e654d..f28357dbde35 100644
--- a/drivers/gpu/drm/virtio/virtgpu_fence.c
+++ b/drivers/gpu/drm/virtio/virtgpu_fence.c
@@ -152,11 +152,21 @@ void virtio_gpu_fence_event_process(struct 
virtio_gpu_device *vgdev,
continue;
 
dma_fence_signal_locked(&curr->f);
+   if (curr->e) {
+   drm_send_event(vgdev->ddev, &curr->e->base);
+   curr->e = NULL;
+   }
+
list_del(&curr->node);
dma_fence_put(&curr->f);
}
 
dma_fence_signal_locked(&signaled->f);
+   if (signaled->e) {
+   drm_send_event(vgdev->ddev, &signaled->e->base);
+   signaled->e = NULL;
+   }
+
list_del(&signaled->node);

[PATCH v2 09/12] drm/virtio: implement context init: allocate an array of fence contexts

2021-09-16 Thread Gurchetan Singh
We don't want fences from different 3D contexts (virgl, gfxstream,
venus) to be on the same timeline.  With explicit context creation,
we can specify the number of rings each context wants.

Execbuffer can specify which ring to use.

Note: virgl + Xserver + virtgpu KMS may need implicit sync
support in case a buffer object comes from a different context
type.  This can be added later when the relevant context types
support multiple rings, by waiting on the reservation object
associated with the foreign context's buffer object.

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h   |  3 +++
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 34 --
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index a5142d60c2fa..cca9ab505deb 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -56,6 +56,7 @@
 #define STATE_ERR 2
 
 #define MAX_CAPSET_ID 63
+#define MAX_RINGS 64
 
 struct virtio_gpu_object_params {
unsigned long size;
@@ -263,6 +264,8 @@ struct virtio_gpu_fpriv {
uint32_t ctx_id;
uint32_t context_init;
bool context_created;
+   uint32_t num_rings;
+   uint64_t base_fence_ctx;
struct mutex context_lock;
 };
 
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index f51f3393a194..262f79210283 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -99,6 +99,11 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device 
*dev, void *data,
int in_fence_fd = exbuf->fence_fd;
int out_fence_fd = -1;
void *buf;
+   uint64_t fence_ctx;
+   uint32_t ring_idx;
+
+   fence_ctx = vgdev->fence_drv.context;
+   ring_idx = 0;
 
if (vgdev->has_virgl_3d == false)
return -ENOSYS;
@@ -106,6 +111,17 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device 
*dev, void *data,
if ((exbuf->flags & ~VIRTGPU_EXECBUF_FLAGS))
return -EINVAL;
 
+   if ((exbuf->flags & VIRTGPU_EXECBUF_RING_IDX)) {
+   if (exbuf->ring_idx >= vfpriv->num_rings)
+   return -EINVAL;
+
+   if (!vfpriv->base_fence_ctx)
+   return -EINVAL;
+
+   fence_ctx = vfpriv->base_fence_ctx;
+   ring_idx = exbuf->ring_idx;
+   }
+
exbuf->fence_fd = -1;
 
virtio_gpu_create_context(dev, file);
@@ -173,7 +189,7 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device 
*dev, void *data,
goto out_memdup;
}
 
-   out_fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context, 0);
+   out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx);
if(!out_fence) {
ret = -ENOMEM;
goto out_unresv;
@@ -691,7 +707,7 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
return -EINVAL;
 
/* Number of unique parameters supported at this time. */
-   if (num_params > 1)
+   if (num_params > 2)
return -EINVAL;
 
ctx_set_params = memdup_user(u64_to_user_ptr(args->ctx_set_params),
@@ -731,6 +747,20 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
 
vfpriv->context_init |= value;
break;
+   case VIRTGPU_CONTEXT_PARAM_NUM_RINGS:
+   if (vfpriv->base_fence_ctx) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   if (value > MAX_RINGS) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   vfpriv->base_fence_ctx = dma_fence_context_alloc(value);
+   vfpriv->num_rings = value;
+   break;
default:
ret = -EINVAL;
goto out_unlock;
-- 
2.33.0.464.g1972c5931b-goog



[PATCH v2 08/12] drm/virtio: implement context init: stop using drv->context when creating fence

2021-09-16 Thread Gurchetan Singh
The plumbing is all here to do this.  Since we always use the
default fence context when allocating a fence, this makes no
functional difference.

We can't process just the largest fence id anymore, since it's
associated with different timelines.  It's fine for fence_id
260 to signal before 259.  As such, process each fence_id
individually.

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_fence.c | 16 ++--
 drivers/gpu/drm/virtio/virtgpu_vq.c| 15 +++
 2 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_fence.c 
b/drivers/gpu/drm/virtio/virtgpu_fence.c
index 24c728b65d21..98a00c1e654d 100644
--- a/drivers/gpu/drm/virtio/virtgpu_fence.c
+++ b/drivers/gpu/drm/virtio/virtgpu_fence.c
@@ -75,20 +75,25 @@ struct virtio_gpu_fence *virtio_gpu_fence_alloc(struct 
virtio_gpu_device *vgdev,
uint64_t base_fence_ctx,
uint32_t ring_idx)
 {
+   uint64_t fence_context = base_fence_ctx + ring_idx;
struct virtio_gpu_fence_driver *drv = &vgdev->fence_drv;
struct virtio_gpu_fence *fence = kzalloc(sizeof(struct 
virtio_gpu_fence),
GFP_KERNEL);
+
if (!fence)
return fence;
 
fence->drv = drv;
+   fence->ring_idx = ring_idx;
+   fence->emit_fence_info = !(base_fence_ctx == drv->context);
 
/* This only partially initializes the fence because the seqno is
 * unknown yet.  The fence must not be used outside of the driver
 * until virtio_gpu_fence_emit is called.
 */
-   dma_fence_init(&fence->f, &virtio_gpu_fence_ops, &drv->lock, 
drv->context,
-  0);
+
+   dma_fence_init(&fence->f, &virtio_gpu_fence_ops, &drv->lock,
+  fence_context, 0);
 
return fence;
 }
@@ -110,6 +115,13 @@ void virtio_gpu_fence_emit(struct virtio_gpu_device *vgdev,
 
cmd_hdr->flags |= cpu_to_le32(VIRTIO_GPU_FLAG_FENCE);
cmd_hdr->fence_id = cpu_to_le64(fence->fence_id);
+
+   /* Only currently defined fence param. */
+   if (fence->emit_fence_info) {
+   cmd_hdr->flags |=
+   cpu_to_le32(VIRTIO_GPU_FLAG_INFO_RING_IDX);
+   cmd_hdr->ring_idx = (u8)fence->ring_idx;
+   }
 }
 
 void virtio_gpu_fence_event_process(struct virtio_gpu_device *vgdev,
diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c 
b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 496f8ce4cd41..938331554632 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -205,7 +205,7 @@ void virtio_gpu_dequeue_ctrl_func(struct work_struct *work)
struct list_head reclaim_list;
struct virtio_gpu_vbuffer *entry, *tmp;
struct virtio_gpu_ctrl_hdr *resp;
-   u64 fence_id = 0;
+   u64 fence_id;
 
INIT_LIST_HEAD(&reclaim_list);
spin_lock(&vgdev->ctrlq.qlock);
@@ -232,23 +232,14 @@ void virtio_gpu_dequeue_ctrl_func(struct work_struct 
*work)
DRM_DEBUG("response 0x%x\n", 
le32_to_cpu(resp->type));
}
if (resp->flags & cpu_to_le32(VIRTIO_GPU_FLAG_FENCE)) {
-   u64 f = le64_to_cpu(resp->fence_id);
-
-   if (fence_id > f) {
-   DRM_ERROR("%s: Oops: fence %llx -> %llx\n",
- __func__, fence_id, f);
-   } else {
-   fence_id = f;
-   }
+   fence_id = le64_to_cpu(resp->fence_id);
+   virtio_gpu_fence_event_process(vgdev, fence_id);
}
if (entry->resp_cb)
entry->resp_cb(vgdev, entry);
}
wake_up(&vgdev->ctrlq.ack_queue);
 
-   if (fence_id)
-   virtio_gpu_fence_event_process(vgdev, fence_id);
-
list_for_each_entry_safe(entry, tmp, &reclaim_list, list) {
if (entry->objs)
virtio_gpu_array_put_free_delayed(vgdev, entry->objs);
-- 
2.33.0.464.g1972c5931b-goog



[PATCH v2 10/12] drm/virtio: implement context init: handle VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK

2021-09-16 Thread Gurchetan Singh
For the Sommelier guest Wayland proxy, it's desirable for the
DRM fd to be pollable in response to a host compositor event.
This can also be used by the 3D driver to poll events on a CPU
timeline.

This enables the DRM fd associated with a particular 3D context
to be polled independent of KMS events.  The parameter
VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK specifies the pollable
rings.
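
For reference, a mask that makes every ring of an n-ring context pollable
(the kernel rejects any bit at or above the ring count) can be built as:

    uint64_t poll_mask = (num_rings >= 64) ? ~0ULL : ((1ULL << num_rings) - 1);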

Signed-off-by: Gurchetan Singh 
Acked-by: Nicholas Verne 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h   |  1 +
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 22 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index cca9ab505deb..cb60d52c2bd1 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -266,6 +266,7 @@ struct virtio_gpu_fpriv {
bool context_created;
uint32_t num_rings;
uint64_t base_fence_ctx;
+   uint64_t ring_idx_mask;
struct mutex context_lock;
 };
 
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index 262f79210283..be7b22a03884 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -694,6 +694,7 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
 {
int ret = 0;
uint32_t num_params, i, param, value;
+   uint64_t valid_ring_mask;
size_t len;
struct drm_virtgpu_context_set_param *ctx_set_params = NULL;
struct virtio_gpu_device *vgdev = dev->dev_private;
@@ -707,7 +708,7 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
return -EINVAL;
 
/* Number of unique parameters supported at this time. */
-   if (num_params > 2)
+   if (num_params > 3)
return -EINVAL;
 
ctx_set_params = memdup_user(u64_to_user_ptr(args->ctx_set_params),
@@ -761,12 +762,31 @@ static int virtio_gpu_context_init_ioctl(struct 
drm_device *dev,
vfpriv->base_fence_ctx = dma_fence_context_alloc(value);
vfpriv->num_rings = value;
break;
+   case VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK:
+   if (vfpriv->ring_idx_mask) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   vfpriv->ring_idx_mask = value;
+   break;
default:
ret = -EINVAL;
goto out_unlock;
}
}
 
+   if (vfpriv->ring_idx_mask) {
+   valid_ring_mask = 0;
+   for (i = 0; i < vfpriv->num_rings; i++)
+   valid_ring_mask |= 1 << i;
+
+   if (~valid_ring_mask & vfpriv->ring_idx_mask) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+   }
+
virtio_gpu_create_context_locked(vgdev, vfpriv);
virtio_gpu_notify(vgdev);
 
-- 
2.33.0.464.g1972c5931b-goog



[PATCH v2 05/12] drm/virtio: implement context init: support init ioctl

2021-09-16 Thread Gurchetan Singh
From: Anthoine Bourgeois 

This implements the context initialization ioctl.  A list of params
is passed in by userspace, and kernel driver validates them.  The
only currently supported param is VIRTGPU_CONTEXT_PARAM_CAPSET_ID.

If the context has already been initialized, -EEXIST is returned.
This happens when Linux userspace does a dumb_create followed by
opening the Mesa virgl driver with the same virtgpu instance.

However, for most applications, 3D contexts will be explicitly
initialized when the feature is available.

Signed-off-by: Anthoine Bourgeois 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h   |  6 +-
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 96 --
 drivers/gpu/drm/virtio/virtgpu_vq.c|  4 +-
 3 files changed, 98 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 5e1958a522ff..9996abf60e3a 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -259,12 +259,13 @@ struct virtio_gpu_device {
 
 struct virtio_gpu_fpriv {
uint32_t ctx_id;
+   uint32_t context_init;
bool context_created;
struct mutex context_lock;
 };
 
 /* virtgpu_ioctl.c */
-#define DRM_VIRTIO_NUM_IOCTLS 11
+#define DRM_VIRTIO_NUM_IOCTLS 12
 extern struct drm_ioctl_desc virtio_gpu_ioctls[DRM_VIRTIO_NUM_IOCTLS];
 void virtio_gpu_create_context(struct drm_device *dev, struct drm_file *file);
 
@@ -342,7 +343,8 @@ int virtio_gpu_cmd_get_capset(struct virtio_gpu_device 
*vgdev,
  struct virtio_gpu_drv_cap_cache **cache_p);
 int virtio_gpu_cmd_get_edids(struct virtio_gpu_device *vgdev);
 void virtio_gpu_cmd_context_create(struct virtio_gpu_device *vgdev, uint32_t 
id,
-  uint32_t nlen, const char *name);
+  uint32_t context_init, uint32_t nlen,
+  const char *name);
 void virtio_gpu_cmd_context_destroy(struct virtio_gpu_device *vgdev,
uint32_t id);
 void virtio_gpu_cmd_context_attach_resource(struct virtio_gpu_device *vgdev,
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index 5c1ad1596889..f5281d1e30e1 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -38,20 +38,30 @@
VIRTGPU_BLOB_FLAG_USE_SHAREABLE | \
VIRTGPU_BLOB_FLAG_USE_CROSS_DEVICE)
 
+/* Must be called with &virtio_gpu_fpriv.struct_mutex held. */
+static void virtio_gpu_create_context_locked(struct virtio_gpu_device *vgdev,
+struct virtio_gpu_fpriv *vfpriv)
+{
+   char dbgname[TASK_COMM_LEN];
+
+   get_task_comm(dbgname, current);
+   virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
+ vfpriv->context_init, strlen(dbgname),
+ dbgname);
+
+   vfpriv->context_created = true;
+}
+
 void virtio_gpu_create_context(struct drm_device *dev, struct drm_file *file)
 {
struct virtio_gpu_device *vgdev = dev->dev_private;
struct virtio_gpu_fpriv *vfpriv = file->driver_priv;
-   char dbgname[TASK_COMM_LEN];
 
mutex_lock(&vfpriv->context_lock);
if (vfpriv->context_created)
goto out_unlock;
 
-   get_task_comm(dbgname, current);
-   virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
- strlen(dbgname), dbgname);
-   vfpriv->context_created = true;
+   virtio_gpu_create_context_locked(vgdev, vfpriv);
 
 out_unlock:
mutex_unlock(&vfpriv->context_lock);
@@ -662,6 +672,79 @@ static int virtio_gpu_resource_create_blob_ioctl(struct 
drm_device *dev,
return 0;
 }
 
+static int virtio_gpu_context_init_ioctl(struct drm_device *dev,
+void *data, struct drm_file *file)
+{
+   int ret = 0;
+   uint32_t num_params, i, param, value;
+   size_t len;
+   struct drm_virtgpu_context_set_param *ctx_set_params = NULL;
+   struct virtio_gpu_device *vgdev = dev->dev_private;
+   struct virtio_gpu_fpriv *vfpriv = file->driver_priv;
+   struct drm_virtgpu_context_init *args = data;
+
+   num_params = args->num_params;
+   len = num_params * sizeof(struct drm_virtgpu_context_set_param);
+
+   if (!vgdev->has_context_init || !vgdev->has_virgl_3d)
+   return -EINVAL;
+
+   /* Number of unique parameters supported at this time. */
+   if (num_params > 1)
+   return -EINVAL;
+
+   ctx_set_params = memdup_user(u64_to_user_ptr(args->ctx_set_params),
+len);
+
+   if (IS_ERR(ctx_set_params))
+   return PTR_ERR(ctx_set_params);
+
+   mutex_lock(&vfpriv->context_lock);
+   if (vfpriv->context

[PATCH v2 06/12] drm/virtio: implement context init: track {ring_idx, emit_fence_info} in virtio_gpu_fence

2021-09-16 Thread Gurchetan Singh
Each fence should be associated with a [fence ID, fence_context,
seqno].  The seqno is just the fence id.

To get the fence context, we add the ring_idx to the 3D context's
base_fence_ctx.  The ring_idx is between 0 and 31, inclusive.

Each 3D context will have its own base_fence_ctx. The ring_idx will
be emitted to host userspace, when emit_fence_info is true.
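
As a concrete (purely illustrative) example: if dma_fence_context_alloc()
hands a context a base_fence_ctx of 100, its rings map to dma-fence contexts
100, 101, 102, and so on:

    fence_context = base_fence_ctx + ring_idx;   /* e.g. 100 + 2 = 102 for ring 2 */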

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 9996abf60e3a..401aec1a5efb 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -139,7 +139,9 @@ struct virtio_gpu_fence_driver {
 
 struct virtio_gpu_fence {
struct dma_fence f;
+   uint32_t ring_idx;
uint64_t fence_id;
+   bool emit_fence_info;
struct virtio_gpu_fence_driver *drv;
struct list_head node;
 };
-- 
2.33.0.464.g1972c5931b-goog



[PATCH v2 07/12] drm/virtio: implement context init: plumb {base_fence_ctx, ring_idx} to virtio_gpu_fence_alloc

2021-09-16 Thread Gurchetan Singh
These were defined in the previous commit. We'll need these
parameters when allocating a dma_fence.  The use case for this
is multiple synchronization timelines.

The maximum number of timelines per 3D instance will be 32. Usually,
only 2 are needed -- one for CPU commands, and another for GPU
commands.

As such, we'll need to specify these parameters when allocating a
dma_fence.

vgdev->fence_drv.context is the "default" fence context for 2D mode
and old userspace.

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h   | 5 +++--
 drivers/gpu/drm/virtio/virtgpu_fence.c | 4 +++-
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 9 +
 drivers/gpu/drm/virtio/virtgpu_plane.c | 3 ++-
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 401aec1a5efb..a5142d60c2fa 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -426,8 +426,9 @@ struct drm_plane *virtio_gpu_plane_init(struct 
virtio_gpu_device *vgdev,
int index);
 
 /* virtgpu_fence.c */
-struct virtio_gpu_fence *virtio_gpu_fence_alloc(
-   struct virtio_gpu_device *vgdev);
+struct virtio_gpu_fence *virtio_gpu_fence_alloc(struct virtio_gpu_device 
*vgdev,
+   uint64_t base_fence_ctx,
+   uint32_t ring_idx);
 void virtio_gpu_fence_emit(struct virtio_gpu_device *vgdev,
  struct virtio_gpu_ctrl_hdr *cmd_hdr,
  struct virtio_gpu_fence *fence);
diff --git a/drivers/gpu/drm/virtio/virtgpu_fence.c 
b/drivers/gpu/drm/virtio/virtgpu_fence.c
index d28e25e8409b..24c728b65d21 100644
--- a/drivers/gpu/drm/virtio/virtgpu_fence.c
+++ b/drivers/gpu/drm/virtio/virtgpu_fence.c
@@ -71,7 +71,9 @@ static const struct dma_fence_ops virtio_gpu_fence_ops = {
.timeline_value_str  = virtio_gpu_timeline_value_str,
 };
 
-struct virtio_gpu_fence *virtio_gpu_fence_alloc(struct virtio_gpu_device 
*vgdev)
+struct virtio_gpu_fence *virtio_gpu_fence_alloc(struct virtio_gpu_device 
*vgdev,
+   uint64_t base_fence_ctx,
+   uint32_t ring_idx)
 {
struct virtio_gpu_fence_driver *drv = &vgdev->fence_drv;
struct virtio_gpu_fence *fence = kzalloc(sizeof(struct 
virtio_gpu_fence),
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index f5281d1e30e1..f51f3393a194 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -173,7 +173,7 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device 
*dev, void *data,
goto out_memdup;
}
 
-   out_fence = virtio_gpu_fence_alloc(vgdev);
+   out_fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context, 0);
if(!out_fence) {
ret = -ENOMEM;
goto out_unresv;
@@ -288,7 +288,7 @@ static int virtio_gpu_resource_create_ioctl(struct 
drm_device *dev, void *data,
if (params.size == 0)
params.size = PAGE_SIZE;
 
-   fence = virtio_gpu_fence_alloc(vgdev);
+   fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context, 0);
if (!fence)
return -ENOMEM;
ret = virtio_gpu_object_create(vgdev, ¶ms, &qobj, fence);
@@ -367,7 +367,7 @@ static int virtio_gpu_transfer_from_host_ioctl(struct 
drm_device *dev,
if (ret != 0)
goto err_put_free;
 
-   fence = virtio_gpu_fence_alloc(vgdev);
+   fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context, 0);
if (!fence) {
ret = -ENOMEM;
goto err_unlock;
@@ -427,7 +427,8 @@ static int virtio_gpu_transfer_to_host_ioctl(struct 
drm_device *dev, void *data,
goto err_put_free;
 
ret = -ENOMEM;
-   fence = virtio_gpu_fence_alloc(vgdev);
+   fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context,
+  0);
if (!fence)
goto err_unlock;
 
diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c 
b/drivers/gpu/drm/virtio/virtgpu_plane.c
index a49fd9480381..6d3cc9e238a4 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -256,7 +256,8 @@ static int virtio_gpu_plane_prepare_fb(struct drm_plane 
*plane,
return 0;
 
if (bo->dumb && (plane->state->fb != new_state->fb)) {
-   vgfb->fence = virtio_gpu_fence_alloc(vgdev);
+   vgfb->fence = virtio_gpu_fence_alloc(vgdev, 
vgdev->fence_drv.context,
+0

[PATCH v2 04/12] drm/virtio: implement context init: probe for feature

2021-09-16 Thread Gurchetan Singh
From: Anthoine Bourgeois 

Let's probe for VIRTIO_GPU_F_CONTEXT_INIT.

Create a new DRM_INFO(..) line since the current one is getting
too long.

Signed-off-by: Anthoine Bourgeois 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_debugfs.c | 1 +
 drivers/gpu/drm/virtio/virtgpu_drv.c | 1 +
 drivers/gpu/drm/virtio/virtgpu_drv.h | 1 +
 drivers/gpu/drm/virtio/virtgpu_kms.c | 8 +++-
 4 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_debugfs.c 
b/drivers/gpu/drm/virtio/virtgpu_debugfs.c
index c2b20e0ee030..b6954e2f75e6 100644
--- a/drivers/gpu/drm/virtio/virtgpu_debugfs.c
+++ b/drivers/gpu/drm/virtio/virtgpu_debugfs.c
@@ -52,6 +52,7 @@ static int virtio_gpu_features(struct seq_file *m, void *data)
vgdev->has_resource_assign_uuid);
 
virtio_gpu_add_bool(m, "blob resources", vgdev->has_resource_blob);
+   virtio_gpu_add_bool(m, "context init", vgdev->has_context_init);
virtio_gpu_add_int(m, "cap sets", vgdev->num_capsets);
virtio_gpu_add_int(m, "scanouts", vgdev->num_scanouts);
if (vgdev->host_visible_region.len) {
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c 
b/drivers/gpu/drm/virtio/virtgpu_drv.c
index ed85a7863256..9d963f1fda8f 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -172,6 +172,7 @@ static unsigned int features[] = {
VIRTIO_GPU_F_EDID,
VIRTIO_GPU_F_RESOURCE_UUID,
VIRTIO_GPU_F_RESOURCE_BLOB,
+   VIRTIO_GPU_F_CONTEXT_INIT,
 };
 static struct virtio_driver virtio_gpu_driver = {
.feature_table = features,
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 3023e16be0d6..5e1958a522ff 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -236,6 +236,7 @@ struct virtio_gpu_device {
bool has_resource_assign_uuid;
bool has_resource_blob;
bool has_host_visible;
+   bool has_context_init;
struct virtio_shm_region host_visible_region;
struct drm_mm host_visible_mm;
 
diff --git a/drivers/gpu/drm/virtio/virtgpu_kms.c 
b/drivers/gpu/drm/virtio/virtgpu_kms.c
index 58a65121c200..21f410901694 100644
--- a/drivers/gpu/drm/virtio/virtgpu_kms.c
+++ b/drivers/gpu/drm/virtio/virtgpu_kms.c
@@ -191,13 +191,19 @@ int virtio_gpu_init(struct drm_device *dev)
(unsigned long)vgdev->host_visible_region.addr,
(unsigned long)vgdev->host_visible_region.len);
}
+   if (virtio_has_feature(vgdev->vdev, VIRTIO_GPU_F_CONTEXT_INIT)) {
+   vgdev->has_context_init = true;
+   }
 
-   DRM_INFO("features: %cvirgl %cedid %cresource_blob %chost_visible\n",
+   DRM_INFO("features: %cvirgl %cedid %cresource_blob %chost_visible",
 vgdev->has_virgl_3d? '+' : '-',
 vgdev->has_edid? '+' : '-',
 vgdev->has_resource_blob ? '+' : '-',
 vgdev->has_host_visible ? '+' : '-');
 
+   DRM_INFO("features: %ccontext_init\n",
+vgdev->has_context_init ? '+' : '-');
+
ret = virtio_find_vqs(vgdev->vdev, 2, vqs, callbacks, names, NULL);
if (ret) {
DRM_ERROR("failed to find virt queues\n");
-- 
2.33.0.464.g1972c5931b-goog



[PATCH v2 01/12] virtio-gpu api: multiple context types with explicit initialization

2021-09-16 Thread Gurchetan Singh
This feature allows for each virtio-gpu 3D context to be created
with a "context_init" variable.  This variable can specify:

 - the type of protocol used by the context via the capset id.
   This is useful for differentiating virgl, gfxstream, and venus
   protocols by host userspace.

 - other things in the future, such as the version of the context.

In addition, each different context needs one or more timelines, so
for example a virgl context's waiting can be independent of a
gfxstream context's waiting.

VIRTIO_GPU_FLAG_INFO_RING_IDX is introduced to tell the
host which per-context command ring (or "hardware queue", distinct
from the virtio-queue) the fence should be associated with.

The new capability sets (gfxstream, venus etc.) are only defined in
the virtio-gpu spec and not defined in the header.

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 include/uapi/linux/virtio_gpu.h | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/virtio_gpu.h b/include/uapi/linux/virtio_gpu.h
index 97523a95781d..b0e3d91dfab7 100644
--- a/include/uapi/linux/virtio_gpu.h
+++ b/include/uapi/linux/virtio_gpu.h
@@ -59,6 +59,11 @@
  * VIRTIO_GPU_CMD_RESOURCE_CREATE_BLOB
  */
 #define VIRTIO_GPU_F_RESOURCE_BLOB   3
+/*
+ * VIRTIO_GPU_CMD_CREATE_CONTEXT with
+ * context_init and multiple timelines
+ */
+#define VIRTIO_GPU_F_CONTEXT_INIT4
 
 enum virtio_gpu_ctrl_type {
VIRTIO_GPU_UNDEFINED = 0,
@@ -122,14 +127,20 @@ enum virtio_gpu_shm_id {
VIRTIO_GPU_SHM_ID_HOST_VISIBLE = 1
 };
 
-#define VIRTIO_GPU_FLAG_FENCE (1 << 0)
+#define VIRTIO_GPU_FLAG_FENCE (1 << 0)
+/*
+ * If the following flag is set, then ring_idx contains the index
+ * of the command ring that needs to used when creating the fence
+ */
+#define VIRTIO_GPU_FLAG_INFO_RING_IDX (1 << 1)
 
 struct virtio_gpu_ctrl_hdr {
__le32 type;
__le32 flags;
__le64 fence_id;
__le32 ctx_id;
-   __le32 padding;
+   u8 ring_idx;
+   u8 padding[3];
 };
 
 /* data passed in the cursor vq */
@@ -269,10 +280,11 @@ struct virtio_gpu_resource_create_3d {
 };
 
 /* VIRTIO_GPU_CMD_CTX_CREATE */
+#define VIRTIO_GPU_CONTEXT_INIT_CAPSET_ID_MASK 0x00ff
 struct virtio_gpu_ctx_create {
struct virtio_gpu_ctrl_hdr hdr;
__le32 nlen;
-   __le32 padding;
+   __le32 context_init;
char debug_name[64];
 };
 
-- 
2.33.0.464.g1972c5931b-goog



[PATCH v2 03/12] drm/virtio: implement context init: track valid capabilities in a mask

2021-09-16 Thread Gurchetan Singh
The valid capability IDs are between 1 and 63, and are defined in the
virtio-gpu spec.  This is used for error checking in the subsequent
patches.  We're currently only using 2 capability IDs, so this
should be plenty for the immediate future.

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h |  3 +++
 drivers/gpu/drm/virtio/virtgpu_kms.c | 18 +-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 0c4810982530..3023e16be0d6 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -55,6 +55,8 @@
 #define STATE_OK 1
 #define STATE_ERR 2
 
+#define MAX_CAPSET_ID 63
+
 struct virtio_gpu_object_params {
unsigned long size;
bool dumb;
@@ -245,6 +247,7 @@ struct virtio_gpu_device {
 
struct virtio_gpu_drv_capset *capsets;
uint32_t num_capsets;
+   uint64_t capset_id_mask;
struct list_head cap_cache;
 
/* protects uuid state when exporting */
diff --git a/drivers/gpu/drm/virtio/virtgpu_kms.c 
b/drivers/gpu/drm/virtio/virtgpu_kms.c
index f3379059f324..58a65121c200 100644
--- a/drivers/gpu/drm/virtio/virtgpu_kms.c
+++ b/drivers/gpu/drm/virtio/virtgpu_kms.c
@@ -65,6 +65,7 @@ static void virtio_gpu_get_capsets(struct virtio_gpu_device 
*vgdev,
   int num_capsets)
 {
int i, ret;
+   bool invalid_capset_id = false;
 
vgdev->capsets = kcalloc(num_capsets,
 sizeof(struct virtio_gpu_drv_capset),
@@ -78,19 +79,34 @@ static void virtio_gpu_get_capsets(struct virtio_gpu_device 
*vgdev,
virtio_gpu_notify(vgdev);
ret = wait_event_timeout(vgdev->resp_wq,
 vgdev->capsets[i].id > 0, 5 * HZ);
-   if (ret == 0) {
+   /*
+* Capability ids are defined in the virtio-gpu spec and are
+* between 1 to 63, inclusive.
+*/
+   if (!vgdev->capsets[i].id ||
+   vgdev->capsets[i].id > MAX_CAPSET_ID)
+   invalid_capset_id = true;
+
+   if (ret == 0)
DRM_ERROR("timed out waiting for cap set %d\n", i);
+   else if (invalid_capset_id)
+   DRM_ERROR("invalid capset id %u", vgdev->capsets[i].id);
+
+   if (ret == 0 || invalid_capset_id) {
spin_lock(&vgdev->display_info_lock);
kfree(vgdev->capsets);
vgdev->capsets = NULL;
spin_unlock(&vgdev->display_info_lock);
return;
}
+
+   vgdev->capset_id_mask |= 1 << vgdev->capsets[i].id;
DRM_INFO("cap set %d: id %d, max-version %d, max-size %d\n",
 i, vgdev->capsets[i].id,
 vgdev->capsets[i].max_version,
 vgdev->capsets[i].max_size);
}
+
vgdev->num_capsets = num_capsets;
 }
 
-- 
2.33.0.464.g1972c5931b-goog



[PATCH v2 02/12] drm/virtgpu api: create context init feature

2021-09-16 Thread Gurchetan Singh
This change allows creating contexts depending on a set of
context parameters.  The meaning of each of the parameters
is listed below:

1) VIRTGPU_CONTEXT_PARAM_CAPSET_ID

This determines the type of a context based on the capability set
ID.  For example, the current capsets:

VIRTIO_GPU_CAPSET_VIRGL
VIRTIO_GPU_CAPSET_VIRGL2

define a Gallium, TGSI based "virgl" context.  We only need 1 capset
ID per context type, though virgl has two due to a bug that has since
been fixed.

The use case is the "gfxstream" rendering library and "venus"
renderer.

gfxstream doesn't do Gallium/TGSI translation and mostly relies on
auto-generated API streaming.  Certain users prefer gfxstream over
virgl for GLES on GLES emulation.  {gfxstream vk}/{venus} are also
required for Vulkan emulation.  The maximum capset ID is 63.

The goal is for guest userspace to choose the optimal context type
depending on the situation/hardware.

2) VIRTGPU_CONTEXT_PARAM_NUM_RINGS

This tells the number of independent command rings that the context
will use.  This value may be zero and is inferred to be zero if
VIRTGPU_CONTEXT_PARAM_NUM_RINGS is not passed in.  This is for backwards
compatibility for virgl, which has one big giant command ring for all
commands.

The maximum number of rings is 64.  In practice, multi-queue or
multi-ring submission is used for powerful dGPUs and virtio-gpu
may not be the best option in that case (see PCI passthrough or
rendernode forwarding).

3) VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK

This is a mask of ring indices for which the DRM fd is pollable.
For example, if VIRTGPU_CONTEXT_PARAM_NUM_RINGS is 2, then the mask
may be:

[ring idx]  |  [1 << ring_idx] | final mask
-------------------------------------------
0           |  1               | 1
1           |  2               | 3

The "Sommelier" guest Wayland proxy uses this to poll for events
from the host compositor.

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
Acked-by: Nicholas Verne 
---
 include/uapi/drm/virtgpu_drm.h | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/include/uapi/drm/virtgpu_drm.h b/include/uapi/drm/virtgpu_drm.h
index b9ec26e9c646..a13e20cc66b4 100644
--- a/include/uapi/drm/virtgpu_drm.h
+++ b/include/uapi/drm/virtgpu_drm.h
@@ -47,12 +47,15 @@ extern "C" {
 #define DRM_VIRTGPU_WAIT 0x08
 #define DRM_VIRTGPU_GET_CAPS  0x09
 #define DRM_VIRTGPU_RESOURCE_CREATE_BLOB 0x0a
+#define DRM_VIRTGPU_CONTEXT_INIT 0x0b
 
 #define VIRTGPU_EXECBUF_FENCE_FD_IN0x01
 #define VIRTGPU_EXECBUF_FENCE_FD_OUT   0x02
+#define VIRTGPU_EXECBUF_RING_IDX   0x04
 #define VIRTGPU_EXECBUF_FLAGS  (\
VIRTGPU_EXECBUF_FENCE_FD_IN |\
VIRTGPU_EXECBUF_FENCE_FD_OUT |\
+   VIRTGPU_EXECBUF_RING_IDX |\
0)
 
 struct drm_virtgpu_map {
@@ -68,6 +71,8 @@ struct drm_virtgpu_execbuffer {
__u64 bo_handles;
__u32 num_bo_handles;
__s32 fence_fd; /* in/out fence fd (see 
VIRTGPU_EXECBUF_FENCE_FD_IN/OUT) */
+   __u32 ring_idx; /* command ring index (see VIRTGPU_EXECBUF_RING_IDX) */
+   __u32 pad;
 };
 
 #define VIRTGPU_PARAM_3D_FEATURES 1 /* do we have 3D features in the hw */
@@ -75,6 +80,8 @@ struct drm_virtgpu_execbuffer {
 #define VIRTGPU_PARAM_RESOURCE_BLOB 3 /* DRM_VIRTGPU_RESOURCE_CREATE_BLOB */
 #define VIRTGPU_PARAM_HOST_VISIBLE 4 /* Host blob resources are mappable */
 #define VIRTGPU_PARAM_CROSS_DEVICE 5 /* Cross virtio-device resource sharing  
*/
+#define VIRTGPU_PARAM_CONTEXT_INIT 6 /* DRM_VIRTGPU_CONTEXT_INIT */
+#define VIRTGPU_PARAM_SUPPORTED_CAPSET_IDs 7 /* Bitmask of supported 
capability set ids */
 
 struct drm_virtgpu_getparam {
__u64 param;
@@ -173,6 +180,22 @@ struct drm_virtgpu_resource_create_blob {
__u64 blob_id;
 };
 
+#define VIRTGPU_CONTEXT_PARAM_CAPSET_ID   0x0001
+#define VIRTGPU_CONTEXT_PARAM_NUM_RINGS   0x0002
+#define VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK 0x0003
+struct drm_virtgpu_context_set_param {
+   __u64 param;
+   __u64 value;
+};
+
+struct drm_virtgpu_context_init {
+   __u32 num_params;
+   __u32 pad;
+
+   /* pointer to drm_virtgpu_context_set_param array */
+   __u64 ctx_set_params;
+};
+
 #define DRM_IOCTL_VIRTGPU_MAP \
DRM_IOWR(DRM_COMMAND_BASE + DRM_VIRTGPU_MAP, struct drm_virtgpu_map)
 
@@ -212,6 +235,10 @@ struct drm_virtgpu_resource_create_blob {
DRM_IOWR(DRM_COMMAND_BASE + DRM_VIRTGPU_RESOURCE_CREATE_BLOB,   \
struct drm_virtgpu_resource_create_blob)
 
+#define DRM_IOCTL_VIRTGPU_CONTEXT_INIT \
+   DRM_IOWR(DRM_COMMAND_BASE + DRM_VIRTGPU_CONTEXT_INIT,   \
+   struct drm_virtgpu_context_init)
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.33.0.464.g1972c5931b-goog



[PATCH v2 00/12] Context types

2021-09-16 Thread Gurchetan Singh
Version 2 of context types:

https://lists.oasis-open.org/archives/virtio-dev/202108/msg00141.html

Changes since RFC:
   * le32 info --> {u8 ring_idx + u8 padding[3]}.
   * Max rings is now 64.

Changes since v1:
   * Document plan regarding context types + display combinations that
 need implicit sync in patch 9 commit message.

Anthoine Bourgeois (2):
  drm/virtio: implement context init: probe for feature
  drm/virtio: implement context init: support init ioctl

Gurchetan Singh (10):
  virtio-gpu api: multiple context types with explicit initialization
  drm/virtgpu api: create context init feature
  drm/virtio: implement context init: track valid capabilities in a mask
  drm/virtio: implement context init: track {ring_idx, emit_fence_info}
in virtio_gpu_fence
  drm/virtio: implement context init: plumb {base_fence_ctx, ring_idx}
to virtio_gpu_fence_alloc
  drm/virtio: implement context init: stop using drv->context when
creating fence
  drm/virtio: implement context init: allocate an array of fence
contexts
  drm/virtio: implement context init: handle
VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK
  drm/virtio: implement context init: add virtio_gpu_fence_event
  drm/virtio: implement context init: advertise feature to userspace

 drivers/gpu/drm/virtio/virtgpu_debugfs.c |   1 +
 drivers/gpu/drm/virtio/virtgpu_drv.c |  44 -
 drivers/gpu/drm/virtio/virtgpu_drv.h |  28 +++-
 drivers/gpu/drm/virtio/virtgpu_fence.c   |  30 +++-
 drivers/gpu/drm/virtio/virtgpu_ioctl.c   | 195 +--
 drivers/gpu/drm/virtio/virtgpu_kms.c |  26 ++-
 drivers/gpu/drm/virtio/virtgpu_plane.c   |   3 +-
 drivers/gpu/drm/virtio/virtgpu_vq.c  |  19 +--
 include/uapi/drm/virtgpu_drm.h   |  27 
 include/uapi/linux/virtio_gpu.h  |  18 ++-
 10 files changed, 355 insertions(+), 36 deletions(-)

-- 
2.33.0.464.g1972c5931b-goog



Re: [virtio-dev] [PATCH v1 09/12] drm/virtio: implement context init: allocate an array of fence contexts

2021-09-16 Thread Gurchetan Singh
On Wed, Sep 15, 2021 at 5:11 PM Chia-I Wu  wrote:

>  i
>
> On Tue, Sep 14, 2021 at 6:26 PM Gurchetan Singh
>  wrote:
> >
> >
> >
> > On Tue, Sep 14, 2021 at 10:53 AM Chia-I Wu  wrote:
> >>
> >> ,On Mon, Sep 13, 2021 at 6:57 PM Gurchetan Singh
> >>  wrote:
> >> >
> >> >
> >> >
> >> >
> >> > On Mon, Sep 13, 2021 at 11:52 AM Chia-I Wu  wrote:
> >> >>
> >> >> .
> >> >>
> >> >> On Mon, Sep 13, 2021 at 10:48 AM Gurchetan Singh
> >> >>  wrote:
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Fri, Sep 10, 2021 at 12:33 PM Chia-I Wu 
> wrote:
> >> >> >>
> >> >> >> On Wed, Sep 8, 2021 at 6:37 PM Gurchetan Singh
> >> >> >>  wrote:
> >> >> >> >
> >> >> >> > We don't want fences from different 3D contexts (virgl,
> gfxstream,
> >> >> >> > venus) to be on the same timeline.  With explicit context
> creation,
> >> >> >> > we can specify the number of ring each context wants.
> >> >> >> >
> >> >> >> > Execbuffer can specify which ring to use.
> >> >> >> >
> >> >> >> > Signed-off-by: Gurchetan Singh 
> >> >> >> > Acked-by: Lingfeng Yang 
> >> >> >> > ---
> >> >> >> >  drivers/gpu/drm/virtio/virtgpu_drv.h   |  3 +++
> >> >> >> >  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 34
> --
> >> >> >> >  2 files changed, 35 insertions(+), 2 deletions(-)
> >> >> >> >
> >> >> >> > diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h
> b/drivers/gpu/drm/virtio/virtgpu_drv.h
> >> >> >> > index a5142d60c2fa..cca9ab505deb 100644
> >> >> >> > --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> >> >> >> > +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> >> >> >> > @@ -56,6 +56,7 @@
> >> >> >> >  #define STATE_ERR 2
> >> >> >> >
> >> >> >> >  #define MAX_CAPSET_ID 63
> >> >> >> > +#define MAX_RINGS 64
> >> >> >> >
> >> >> >> >  struct virtio_gpu_object_params {
> >> >> >> > unsigned long size;
> >> >> >> > @@ -263,6 +264,8 @@ struct virtio_gpu_fpriv {
> >> >> >> > uint32_t ctx_id;
> >> >> >> > uint32_t context_init;
> >> >> >> > bool context_created;
> >> >> >> > +   uint32_t num_rings;
> >> >> >> > +   uint64_t base_fence_ctx;
> >> >> >> > struct mutex context_lock;
> >> >> >> >  };
> >> >> >> >
> >> >> >> > diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> >> >> >> > index f51f3393a194..262f79210283 100644
> >> >> >> > --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> >> >> >> > +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> >> >> >> > @@ -99,6 +99,11 @@ static int
> virtio_gpu_execbuffer_ioctl(struct drm_device *dev, void *data,
> >> >> >> > int in_fence_fd = exbuf->fence_fd;
> >> >> >> > int out_fence_fd = -1;
> >> >> >> > void *buf;
> >> >> >> > +   uint64_t fence_ctx;
> >> >> >> > +   uint32_t ring_idx;
> >> >> >> > +
> >> >> >> > +   fence_ctx = vgdev->fence_drv.context;
> >> >> >> > +   ring_idx = 0;
> >> >> >> >
> >> >> >> > if (vgdev->has_virgl_3d == false)
> >> >> >> > return -ENOSYS;
> >> >> >> > @@ -106,6 +111,17 @@ static int
> virtio_gpu_execbuffer_ioctl(struct drm_device *dev, void *data,
> >> >> >> > if ((exbuf->flags & ~VIRTGPU_EXECBUF_FLAGS))
> >> >> >> > return -EINVAL;
> >> >> >> >
> >> >> >> > +   if ((exbuf->flags & 

Re: [virtio-dev] Re: [PATCH v1 08/12] drm/virtio: implement context init: stop using drv->context when creating fence

2021-09-15 Thread Gurchetan Singh
On Tue, Sep 14, 2021 at 10:53 PM Gerd Hoffmann  wrote:

> On Wed, Sep 08, 2021 at 06:37:13PM -0700, Gurchetan Singh wrote:
> > The plumbing is all here to do this.  Since we always use the
> > default fence context when allocating a fence, this makes no
> > functional difference.
> >
> > We can't process just the largest fence id anymore, since it's
> > it's associated with different timelines.  It's fine for fence_id
> > 260 to signal before 259.  As such, process each fence_id
> > individually.
>
> I guess you need to also update virtio_gpu_fence_event_process()
> then?  It currently has the strict ordering logic baked in ...
>

The update to virtio_gpu_fence_event_process was done as a preparation a
few months back:

https://cgit.freedesktop.org/drm/drm-misc/commit/drivers/gpu/drm/virtio/virtgpu_fence.c?id=36549848ed27c22bb2ffd5d1468efc6505b05f97



>
> take care,
>   Gerd
>
>
>
>


Re: [virtio-dev] [PATCH v1 09/12] drm/virtio: implement context init: allocate an array of fence contexts

2021-09-14 Thread Gurchetan Singh
On Tue, Sep 14, 2021 at 10:53 AM Chia-I Wu  wrote:

> ,On Mon, Sep 13, 2021 at 6:57 PM Gurchetan Singh
>  wrote:
> >
> >
> >
> >
> > On Mon, Sep 13, 2021 at 11:52 AM Chia-I Wu  wrote:
> >>
> >> .
> >>
> >> On Mon, Sep 13, 2021 at 10:48 AM Gurchetan Singh
> >>  wrote:
> >> >
> >> >
> >> >
> >> > On Fri, Sep 10, 2021 at 12:33 PM Chia-I Wu  wrote:
> >> >>
> >> >> On Wed, Sep 8, 2021 at 6:37 PM Gurchetan Singh
> >> >>  wrote:
> >> >> >
> >> >> > We don't want fences from different 3D contexts (virgl, gfxstream,
> >> >> > venus) to be on the same timeline.  With explicit context creation,
> >> >> > we can specify the number of ring each context wants.
> >> >> >
> >> >> > Execbuffer can specify which ring to use.
> >> >> >
> >> >> > Signed-off-by: Gurchetan Singh 
> >> >> > Acked-by: Lingfeng Yang 
> >> >> > ---
> >> >> >  drivers/gpu/drm/virtio/virtgpu_drv.h   |  3 +++
> >> >> >  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 34
> --
> >> >> >  2 files changed, 35 insertions(+), 2 deletions(-)
> >> >> >
> >> >> > diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h
> b/drivers/gpu/drm/virtio/virtgpu_drv.h
> >> >> > index a5142d60c2fa..cca9ab505deb 100644
> >> >> > --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> >> >> > +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> >> >> > @@ -56,6 +56,7 @@
> >> >> >  #define STATE_ERR 2
> >> >> >
> >> >> >  #define MAX_CAPSET_ID 63
> >> >> > +#define MAX_RINGS 64
> >> >> >
> >> >> >  struct virtio_gpu_object_params {
> >> >> > unsigned long size;
> >> >> > @@ -263,6 +264,8 @@ struct virtio_gpu_fpriv {
> >> >> > uint32_t ctx_id;
> >> >> > uint32_t context_init;
> >> >> > bool context_created;
> >> >> > +   uint32_t num_rings;
> >> >> > +   uint64_t base_fence_ctx;
> >> >> > struct mutex context_lock;
> >> >> >  };
> >> >> >
> >> >> > diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> >> >> > index f51f3393a194..262f79210283 100644
> >> >> > --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> >> >> > +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> >> >> > @@ -99,6 +99,11 @@ static int virtio_gpu_execbuffer_ioctl(struct
> drm_device *dev, void *data,
> >> >> > int in_fence_fd = exbuf->fence_fd;
> >> >> > int out_fence_fd = -1;
> >> >> > void *buf;
> >> >> > +   uint64_t fence_ctx;
> >> >> > +   uint32_t ring_idx;
> >> >> > +
> >> >> > +   fence_ctx = vgdev->fence_drv.context;
> >> >> > +   ring_idx = 0;
> >> >> >
> >> >> > if (vgdev->has_virgl_3d == false)
> >> >> > return -ENOSYS;
> >> >> > @@ -106,6 +111,17 @@ static int virtio_gpu_execbuffer_ioctl(struct
> drm_device *dev, void *data,
> >> >> > if ((exbuf->flags & ~VIRTGPU_EXECBUF_FLAGS))
> >> >> > return -EINVAL;
> >> >> >
> >> >> > +   if ((exbuf->flags & VIRTGPU_EXECBUF_RING_IDX)) {
> >> >> > +   if (exbuf->ring_idx >= vfpriv->num_rings)
> >> >> > +   return -EINVAL;
> >> >> > +
> >> >> > +   if (!vfpriv->base_fence_ctx)
> >> >> > +   return -EINVAL;
> >> >> > +
> >> >> > +   fence_ctx = vfpriv->base_fence_ctx;
> >> >> > +   ring_idx = exbuf->ring_idx;
> >> >> > +   }
> >> >> > +
> >> >> > exbuf->fence_fd = -1;
> >> >> >
> >> >> > virtio_gpu_create_context(dev, file);
> >> >> > @@ -173,7 +189,7 @@ static int virtio_gpu_execbuffer_ioctl(struct
> drm_device *d

Re: [virtio-dev] [PATCH v1 09/12] drm/virtio: implement context init: allocate an array of fence contexts

2021-09-13 Thread Gurchetan Singh
On Mon, Sep 13, 2021 at 11:52 AM Chia-I Wu  wrote:

> .
>
> On Mon, Sep 13, 2021 at 10:48 AM Gurchetan Singh
>  wrote:
> >
> >
> >
> > On Fri, Sep 10, 2021 at 12:33 PM Chia-I Wu  wrote:
> >>
> >> On Wed, Sep 8, 2021 at 6:37 PM Gurchetan Singh
> >>  wrote:
> >> >
> >> > We don't want fences from different 3D contexts (virgl, gfxstream,
> >> > venus) to be on the same timeline.  With explicit context creation,
> >> > we can specify the number of ring each context wants.
> >> >
> >> > Execbuffer can specify which ring to use.
> >> >
> >> > Signed-off-by: Gurchetan Singh 
> >> > Acked-by: Lingfeng Yang 
> >> > ---
> >> >  drivers/gpu/drm/virtio/virtgpu_drv.h   |  3 +++
> >> >  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 34
> --
> >> >  2 files changed, 35 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h
> b/drivers/gpu/drm/virtio/virtgpu_drv.h
> >> > index a5142d60c2fa..cca9ab505deb 100644
> >> > --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> >> > +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> >> > @@ -56,6 +56,7 @@
> >> >  #define STATE_ERR 2
> >> >
> >> >  #define MAX_CAPSET_ID 63
> >> > +#define MAX_RINGS 64
> >> >
> >> >  struct virtio_gpu_object_params {
> >> > unsigned long size;
> >> > @@ -263,6 +264,8 @@ struct virtio_gpu_fpriv {
> >> > uint32_t ctx_id;
> >> > uint32_t context_init;
> >> > bool context_created;
> >> > +   uint32_t num_rings;
> >> > +   uint64_t base_fence_ctx;
> >> > struct mutex context_lock;
> >> >  };
> >> >
> >> > diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> >> > index f51f3393a194..262f79210283 100644
> >> > --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> >> > +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> >> > @@ -99,6 +99,11 @@ static int virtio_gpu_execbuffer_ioctl(struct
> drm_device *dev, void *data,
> >> > int in_fence_fd = exbuf->fence_fd;
> >> > int out_fence_fd = -1;
> >> > void *buf;
> >> > +   uint64_t fence_ctx;
> >> > +   uint32_t ring_idx;
> >> > +
> >> > +   fence_ctx = vgdev->fence_drv.context;
> >> > +   ring_idx = 0;
> >> >
> >> > if (vgdev->has_virgl_3d == false)
> >> > return -ENOSYS;
> >> > @@ -106,6 +111,17 @@ static int virtio_gpu_execbuffer_ioctl(struct
> drm_device *dev, void *data,
> >> > if ((exbuf->flags & ~VIRTGPU_EXECBUF_FLAGS))
> >> > return -EINVAL;
> >> >
> >> > +   if ((exbuf->flags & VIRTGPU_EXECBUF_RING_IDX)) {
> >> > +   if (exbuf->ring_idx >= vfpriv->num_rings)
> >> > +   return -EINVAL;
> >> > +
> >> > +   if (!vfpriv->base_fence_ctx)
> >> > +   return -EINVAL;
> >> > +
> >> > +   fence_ctx = vfpriv->base_fence_ctx;
> >> > +   ring_idx = exbuf->ring_idx;
> >> > +   }
> >> > +
> >> > exbuf->fence_fd = -1;
> >> >
> >> > virtio_gpu_create_context(dev, file);
> >> > @@ -173,7 +189,7 @@ static int virtio_gpu_execbuffer_ioctl(struct
> drm_device *dev, void *data,
> >> > goto out_memdup;
> >> > }
> >> >
> >> > -   out_fence = virtio_gpu_fence_alloc(vgdev,
> vgdev->fence_drv.context, 0);
> >> > +   out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx,
> ring_idx);
> >> > if(!out_fence) {
> >> > ret = -ENOMEM;
> >> > goto out_unresv;
> >> > @@ -691,7 +707,7 @@ static int virtio_gpu_context_init_ioctl(struct
> drm_device *dev,
> >> > return -EINVAL;
> >> >
> >> > /* Number of unique parameters supported at this time. */
> >> > -   if (num_params > 1)
> >> > +   if (num_params > 2)
> >> > return -EINVAL;
> >

Re: [virtio-dev] [PATCH v1 09/12] drm/virtio: implement context init: allocate an array of fence contexts

2021-09-13 Thread Gurchetan Singh
On Fri, Sep 10, 2021 at 12:33 PM Chia-I Wu  wrote:

> On Wed, Sep 8, 2021 at 6:37 PM Gurchetan Singh
>  wrote:
> >
> > We don't want fences from different 3D contexts (virgl, gfxstream,
> > venus) to be on the same timeline.  With explicit context creation,
> > we can specify the number of ring each context wants.
> >
> > Execbuffer can specify which ring to use.
> >
> > Signed-off-by: Gurchetan Singh 
> > Acked-by: Lingfeng Yang 
> > ---
> >  drivers/gpu/drm/virtio/virtgpu_drv.h   |  3 +++
> >  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 34 --
> >  2 files changed, 35 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h
> b/drivers/gpu/drm/virtio/virtgpu_drv.h
> > index a5142d60c2fa..cca9ab505deb 100644
> > --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> > +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> > @@ -56,6 +56,7 @@
> >  #define STATE_ERR 2
> >
> >  #define MAX_CAPSET_ID 63
> > +#define MAX_RINGS 64
> >
> >  struct virtio_gpu_object_params {
> > unsigned long size;
> > @@ -263,6 +264,8 @@ struct virtio_gpu_fpriv {
> > uint32_t ctx_id;
> > uint32_t context_init;
> > bool context_created;
> > +   uint32_t num_rings;
> > +   uint64_t base_fence_ctx;
> > struct mutex context_lock;
> >  };
> >
> > diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> > index f51f3393a194..262f79210283 100644
> > --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> > +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> > @@ -99,6 +99,11 @@ static int virtio_gpu_execbuffer_ioctl(struct
> drm_device *dev, void *data,
> > int in_fence_fd = exbuf->fence_fd;
> > int out_fence_fd = -1;
> > void *buf;
> > +   uint64_t fence_ctx;
> > +   uint32_t ring_idx;
> > +
> > +   fence_ctx = vgdev->fence_drv.context;
> > +   ring_idx = 0;
> >
> > if (vgdev->has_virgl_3d == false)
> > return -ENOSYS;
> > @@ -106,6 +111,17 @@ static int virtio_gpu_execbuffer_ioctl(struct
> drm_device *dev, void *data,
> > if ((exbuf->flags & ~VIRTGPU_EXECBUF_FLAGS))
> > return -EINVAL;
> >
> > +   if ((exbuf->flags & VIRTGPU_EXECBUF_RING_IDX)) {
> > +   if (exbuf->ring_idx >= vfpriv->num_rings)
> > +   return -EINVAL;
> > +
> > +   if (!vfpriv->base_fence_ctx)
> > +   return -EINVAL;
> > +
> > +   fence_ctx = vfpriv->base_fence_ctx;
> > +   ring_idx = exbuf->ring_idx;
> > +   }
> > +
> > exbuf->fence_fd = -1;
> >
> > virtio_gpu_create_context(dev, file);
> > @@ -173,7 +189,7 @@ static int virtio_gpu_execbuffer_ioctl(struct
> drm_device *dev, void *data,
> > goto out_memdup;
> > }
> >
> > -   out_fence = virtio_gpu_fence_alloc(vgdev,
> vgdev->fence_drv.context, 0);
> > +   out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx);
> > if(!out_fence) {
> > ret = -ENOMEM;
> > goto out_unresv;
> > @@ -691,7 +707,7 @@ static int virtio_gpu_context_init_ioctl(struct
> drm_device *dev,
> > return -EINVAL;
> >
> > /* Number of unique parameters supported at this time. */
> > -   if (num_params > 1)
> > +   if (num_params > 2)
> > return -EINVAL;
> >
> > ctx_set_params =
> memdup_user(u64_to_user_ptr(args->ctx_set_params),
> > @@ -731,6 +747,20 @@ static int virtio_gpu_context_init_ioctl(struct
> drm_device *dev,
> >
> > vfpriv->context_init |= value;
> > break;
> > +   case VIRTGPU_CONTEXT_PARAM_NUM_RINGS:
> > +   if (vfpriv->base_fence_ctx) {
> > +   ret = -EINVAL;
> > +   goto out_unlock;
> > +   }
> > +
> > +   if (value > MAX_RINGS) {
> > +   ret = -EINVAL;
> > +   goto out_unlock;
> > +   }
> > +
> > +   vfpriv->base_fence_ctx =
> dma_fence_context_alloc(value);
> With multiple fence contexts, we should do somethin

[PATCH v1 12/12] drm/virtio: implement context init: advertise feature to userspace

2021-09-08 Thread Gurchetan Singh
This advertises the context init feature to userspace, along with
a mask of supported capabilities.
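
(A rough userspace sketch, not part of the patch, of probing the two new
params.  'fd' is assumed to be an open virtgpu render node; as with the
existing params, the 'value' field carries a pointer to where the result
is written.)

  #include <stdint.h>
  #include <xf86drm.h>
  #include <drm/virtgpu_drm.h>

  /* Returns the supported capset id bitmask, or 0 when context init is
   * not advertised (illustrative only). */
  static uint64_t probe_capset_ids(int fd)
  {
          uint64_t has_init = 0, id_mask = 0;
          struct drm_virtgpu_getparam gp = {
                  .param = VIRTGPU_PARAM_CONTEXT_INIT,
                  .value = (uintptr_t)&has_init,
          };

          if (drmIoctl(fd, DRM_IOCTL_VIRTGPU_GETPARAM, &gp) || !has_init)
                  return 0;

          gp.param = VIRTGPU_PARAM_SUPPORTED_CAPSET_IDs;
          gp.value = (uintptr_t)&id_mask;
          if (drmIoctl(fd, DRM_IOCTL_VIRTGPU_GETPARAM, &gp))
                  return 0;

          return id_mask;   /* bit N set => capset id N is supported */
  }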

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index fdaa7f3d9eeb..5618a1d5879c 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -286,6 +286,12 @@ static int virtio_gpu_getparam_ioctl(struct drm_device 
*dev, void *data,
case VIRTGPU_PARAM_CROSS_DEVICE:
value = vgdev->has_resource_assign_uuid ? 1 : 0;
break;
+   case VIRTGPU_PARAM_CONTEXT_INIT:
+   value = vgdev->has_context_init ? 1 : 0;
+   break;
+   case VIRTGPU_PARAM_SUPPORTED_CAPSET_IDs:
+   value = vgdev->capset_id_mask;
+   break;
default:
return -EINVAL;
}
-- 
2.33.0.153.gba50c8fa24-goog



[PATCH v1 11/12] drm/virtio: implement context init: add virtio_gpu_fence_event

2021-09-08 Thread Gurchetan Singh
Similar to DRM_VMW_EVENT_FENCE_SIGNALED.  Sends a pollable event
to the DRM file descriptor when a fence on a specific ring is
signaled.

One difference is that the event is not exposed via the UAPI -- this is
because host responses are on a shared memory buffer of type
BLOB_MEM_GUEST [this is the common way to receive responses with
virtgpu].  As such, there is no context specific read(..)
implementation either -- just a poll(..) implementation.
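
(Illustrative consumer-side sketch, not part of the patch; 'fd' and
process_host_responses() are assumptions.)

  #include <poll.h>

  /* Wait for a per-ring fence event.  There is nothing to read(): the
   * poll() implementation added below consumes the pending event itself,
   * and the payload is then fetched from the shared response buffer. */
  struct pollfd pfd = { .fd = fd, .events = POLLIN };

  if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN))
          process_host_responses();   /* hypothetical helper */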

Signed-off-by: Gurchetan Singh 
Acked-by: Nicholas Verne 
---
 drivers/gpu/drm/virtio/virtgpu_drv.c   | 43 +-
 drivers/gpu/drm/virtio/virtgpu_drv.h   |  7 +
 drivers/gpu/drm/virtio/virtgpu_fence.c | 10 ++
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 34 
 4 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c 
b/drivers/gpu/drm/virtio/virtgpu_drv.c
index 9d963f1fda8f..749db18dcfa2 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -29,6 +29,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -155,6 +157,35 @@ static void virtio_gpu_config_changed(struct virtio_device 
*vdev)
schedule_work(&vgdev->config_changed_work);
 }
 
+static __poll_t virtio_gpu_poll(struct file *filp,
+   struct poll_table_struct *wait)
+{
+   struct drm_file *drm_file = filp->private_data;
+   struct virtio_gpu_fpriv *vfpriv = drm_file->driver_priv;
+   struct drm_device *dev = drm_file->minor->dev;
+   struct drm_pending_event *e = NULL;
+   __poll_t mask = 0;
+
+   if (!vfpriv->ring_idx_mask)
+   return drm_poll(filp, wait);
+
+   poll_wait(filp, &drm_file->event_wait, wait);
+
+   if (!list_empty(&drm_file->event_list)) {
+   spin_lock_irq(&dev->event_lock);
+   e = list_first_entry(&drm_file->event_list,
+struct drm_pending_event, link);
+   drm_file->event_space += e->event->length;
+   list_del(&e->link);
+   spin_unlock_irq(&dev->event_lock);
+
+   kfree(e);
+   mask |= EPOLLIN | EPOLLRDNORM;
+   }
+
+   return mask;
+}
+
 static struct virtio_device_id id_table[] = {
{ VIRTIO_ID_GPU, VIRTIO_DEV_ANY_ID },
{ 0 },
@@ -194,7 +225,17 @@ MODULE_AUTHOR("Dave Airlie ");
 MODULE_AUTHOR("Gerd Hoffmann ");
 MODULE_AUTHOR("Alon Levy");
 
-DEFINE_DRM_GEM_FOPS(virtio_gpu_driver_fops);
+static const struct file_operations virtio_gpu_driver_fops = {
+   .owner  = THIS_MODULE,
+   .open   = drm_open,
+   .release= drm_release,
+   .unlocked_ioctl = drm_ioctl,
+   .compat_ioctl   = drm_compat_ioctl,
+   .poll   = virtio_gpu_poll,
+   .read   = drm_read,
+   .llseek = noop_llseek,
+   .mmap   = drm_gem_mmap
+};
 
 static const struct drm_driver driver = {
.driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_RENDER | 
DRIVER_ATOMIC,
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index cb60d52c2bd1..e0265fe74aa5 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -138,11 +138,18 @@ struct virtio_gpu_fence_driver {
spinlock_t   lock;
 };
 
+#define VIRTGPU_EVENT_FENCE_SIGNALED_INTERNAL 0x1000
+struct virtio_gpu_fence_event {
+   struct drm_pending_event base;
+   struct drm_event event;
+};
+
 struct virtio_gpu_fence {
struct dma_fence f;
uint32_t ring_idx;
uint64_t fence_id;
bool emit_fence_info;
+   struct virtio_gpu_fence_event *e;
struct virtio_gpu_fence_driver *drv;
struct list_head node;
 };
diff --git a/drivers/gpu/drm/virtio/virtgpu_fence.c 
b/drivers/gpu/drm/virtio/virtgpu_fence.c
index 98a00c1e654d..f28357dbde35 100644
--- a/drivers/gpu/drm/virtio/virtgpu_fence.c
+++ b/drivers/gpu/drm/virtio/virtgpu_fence.c
@@ -152,11 +152,21 @@ void virtio_gpu_fence_event_process(struct 
virtio_gpu_device *vgdev,
continue;
 
dma_fence_signal_locked(&curr->f);
+   if (curr->e) {
+   drm_send_event(vgdev->ddev, &curr->e->base);
+   curr->e = NULL;
+   }
+
list_del(&curr->node);
dma_fence_put(&curr->f);
}
 
dma_fence_signal_locked(&signaled->f);
+   if (signaled->e) {
+   drm_send_event(vgdev->ddev, &signaled->e->base);
+   signaled->e = NULL;
+   }
+
list_del(&signaled->node);

[PATCH v1 10/12] drm/virtio: implement context init: handle VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK

2021-09-08 Thread Gurchetan Singh
For the Sommelier guest Wayland proxy, it's desirable for the
DRM fd to be pollable in response to a host compositor event.
This can also be used by the 3D driver to poll events on a CPU
timeline.

This enables the DRM fd associated with a particular 3D context
to be polled independent of KMS events.  The parameter
VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK specifies the pollable
rings.
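
(Illustrative only: the mask may only name rings that were requested via
VIRTGPU_CONTEXT_PARAM_NUM_RINGS.  Assuming <stdint.h> and a 'num_rings'
variable, an "everything pollable" mask can be computed as below; setting
any bit beyond that range makes the init ioctl fail with -EINVAL, per the
validation added in this patch.)

  /* e.g. num_rings == 2 yields 0x3 */
  uint64_t poll_mask = (num_rings == 64) ? ~0ULL : (1ULL << num_rings) - 1;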

Signed-off-by: Gurchetan Singh 
Acked-by: Nicholas Verne 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h   |  1 +
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 22 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index cca9ab505deb..cb60d52c2bd1 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -266,6 +266,7 @@ struct virtio_gpu_fpriv {
bool context_created;
uint32_t num_rings;
uint64_t base_fence_ctx;
+   uint64_t ring_idx_mask;
struct mutex context_lock;
 };
 
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index 262f79210283..be7b22a03884 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -694,6 +694,7 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
 {
int ret = 0;
uint32_t num_params, i, param, value;
+   uint64_t valid_ring_mask;
size_t len;
struct drm_virtgpu_context_set_param *ctx_set_params = NULL;
struct virtio_gpu_device *vgdev = dev->dev_private;
@@ -707,7 +708,7 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
return -EINVAL;
 
/* Number of unique parameters supported at this time. */
-   if (num_params > 2)
+   if (num_params > 3)
return -EINVAL;
 
ctx_set_params = memdup_user(u64_to_user_ptr(args->ctx_set_params),
@@ -761,12 +762,31 @@ static int virtio_gpu_context_init_ioctl(struct 
drm_device *dev,
vfpriv->base_fence_ctx = dma_fence_context_alloc(value);
vfpriv->num_rings = value;
break;
+   case VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK:
+   if (vfpriv->ring_idx_mask) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   vfpriv->ring_idx_mask = value;
+   break;
default:
ret = -EINVAL;
goto out_unlock;
}
}
 
+   if (vfpriv->ring_idx_mask) {
+   valid_ring_mask = 0;
+   for (i = 0; i < vfpriv->num_rings; i++)
+   valid_ring_mask |= 1 << i;
+
+   if (~valid_ring_mask & vfpriv->ring_idx_mask) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+   }
+
virtio_gpu_create_context_locked(vgdev, vfpriv);
virtio_gpu_notify(vgdev);
 
-- 
2.33.0.153.gba50c8fa24-goog



[PATCH v1 09/12] drm/virtio: implement context init: allocate an array of fence contexts

2021-09-08 Thread Gurchetan Singh
We don't want fences from different 3D contexts (virgl, gfxstream,
venus) to be on the same timeline.  With explicit context creation,
we can specify the number of rings each context wants.

Execbuffer can specify which ring to use.
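
(Hypothetical userspace sketch; 'fd', 'cmd', 'cmd_size' and 'out_fence_fd'
are assumptions.  Submitting to ring 1 of a context created with at least
two rings could look like this; the resulting fence then lives on the
timeline base_fence_ctx + 1.)

  struct drm_virtgpu_execbuffer eb = {
          .flags    = VIRTGPU_EXECBUF_RING_IDX | VIRTGPU_EXECBUF_FENCE_FD_OUT,
          .size     = cmd_size,
          .command  = (uintptr_t)cmd,
          .ring_idx = 1,
          .fence_fd = -1,
  };

  if (drmIoctl(fd, DRM_IOCTL_VIRTGPU_EXECBUFFER, &eb) == 0)
          out_fence_fd = eb.fence_fd;   /* sync_file fence for ring 1 only */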

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h   |  3 +++
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 34 --
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index a5142d60c2fa..cca9ab505deb 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -56,6 +56,7 @@
 #define STATE_ERR 2
 
 #define MAX_CAPSET_ID 63
+#define MAX_RINGS 64
 
 struct virtio_gpu_object_params {
unsigned long size;
@@ -263,6 +264,8 @@ struct virtio_gpu_fpriv {
uint32_t ctx_id;
uint32_t context_init;
bool context_created;
+   uint32_t num_rings;
+   uint64_t base_fence_ctx;
struct mutex context_lock;
 };
 
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index f51f3393a194..262f79210283 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -99,6 +99,11 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device 
*dev, void *data,
int in_fence_fd = exbuf->fence_fd;
int out_fence_fd = -1;
void *buf;
+   uint64_t fence_ctx;
+   uint32_t ring_idx;
+
+   fence_ctx = vgdev->fence_drv.context;
+   ring_idx = 0;
 
if (vgdev->has_virgl_3d == false)
return -ENOSYS;
@@ -106,6 +111,17 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device 
*dev, void *data,
if ((exbuf->flags & ~VIRTGPU_EXECBUF_FLAGS))
return -EINVAL;
 
+   if ((exbuf->flags & VIRTGPU_EXECBUF_RING_IDX)) {
+   if (exbuf->ring_idx >= vfpriv->num_rings)
+   return -EINVAL;
+
+   if (!vfpriv->base_fence_ctx)
+   return -EINVAL;
+
+   fence_ctx = vfpriv->base_fence_ctx;
+   ring_idx = exbuf->ring_idx;
+   }
+
exbuf->fence_fd = -1;
 
virtio_gpu_create_context(dev, file);
@@ -173,7 +189,7 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device 
*dev, void *data,
goto out_memdup;
}
 
-   out_fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context, 0);
+   out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx);
if(!out_fence) {
ret = -ENOMEM;
goto out_unresv;
@@ -691,7 +707,7 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
return -EINVAL;
 
/* Number of unique parameters supported at this time. */
-   if (num_params > 1)
+   if (num_params > 2)
return -EINVAL;
 
ctx_set_params = memdup_user(u64_to_user_ptr(args->ctx_set_params),
@@ -731,6 +747,20 @@ static int virtio_gpu_context_init_ioctl(struct drm_device 
*dev,
 
vfpriv->context_init |= value;
break;
+   case VIRTGPU_CONTEXT_PARAM_NUM_RINGS:
+   if (vfpriv->base_fence_ctx) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   if (value > MAX_RINGS) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   vfpriv->base_fence_ctx = dma_fence_context_alloc(value);
+   vfpriv->num_rings = value;
+   break;
default:
ret = -EINVAL;
goto out_unlock;
-- 
2.33.0.153.gba50c8fa24-goog



[PATCH v1 05/12] drm/virtio: implement context init: support init ioctl

2021-09-08 Thread Gurchetan Singh
From: Anthoine Bourgeois 

This implements the context initialization ioctl.  A list of params
is passed in by userspace, and kernel driver validates them.  The
only currently supported param is VIRTGPU_CONTEXT_PARAM_CAPSET_ID.

If the context has already been initialized, -EEXIST is returned.
This happens after Linux userspace does a dumb_create followed by
opening the Mesa virgl driver with the same virtgpu instance.

However, for most applications, 3D contexts will be explicitly
initialized when the feature is available.

Signed-off-by: Anthoine Bourgeois 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h   |  6 +-
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 96 --
 drivers/gpu/drm/virtio/virtgpu_vq.c|  4 +-
 3 files changed, 98 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 5e1958a522ff..9996abf60e3a 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -259,12 +259,13 @@ struct virtio_gpu_device {
 
 struct virtio_gpu_fpriv {
uint32_t ctx_id;
+   uint32_t context_init;
bool context_created;
struct mutex context_lock;
 };
 
 /* virtgpu_ioctl.c */
-#define DRM_VIRTIO_NUM_IOCTLS 11
+#define DRM_VIRTIO_NUM_IOCTLS 12
 extern struct drm_ioctl_desc virtio_gpu_ioctls[DRM_VIRTIO_NUM_IOCTLS];
 void virtio_gpu_create_context(struct drm_device *dev, struct drm_file *file);
 
@@ -342,7 +343,8 @@ int virtio_gpu_cmd_get_capset(struct virtio_gpu_device 
*vgdev,
  struct virtio_gpu_drv_cap_cache **cache_p);
 int virtio_gpu_cmd_get_edids(struct virtio_gpu_device *vgdev);
 void virtio_gpu_cmd_context_create(struct virtio_gpu_device *vgdev, uint32_t 
id,
-  uint32_t nlen, const char *name);
+  uint32_t context_init, uint32_t nlen,
+  const char *name);
 void virtio_gpu_cmd_context_destroy(struct virtio_gpu_device *vgdev,
uint32_t id);
 void virtio_gpu_cmd_context_attach_resource(struct virtio_gpu_device *vgdev,
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index 5c1ad1596889..f5281d1e30e1 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -38,20 +38,30 @@
VIRTGPU_BLOB_FLAG_USE_SHAREABLE | \
VIRTGPU_BLOB_FLAG_USE_CROSS_DEVICE)
 
+/* Must be called with &virtio_gpu_fpriv.struct_mutex held. */
+static void virtio_gpu_create_context_locked(struct virtio_gpu_device *vgdev,
+struct virtio_gpu_fpriv *vfpriv)
+{
+   char dbgname[TASK_COMM_LEN];
+
+   get_task_comm(dbgname, current);
+   virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
+ vfpriv->context_init, strlen(dbgname),
+ dbgname);
+
+   vfpriv->context_created = true;
+}
+
 void virtio_gpu_create_context(struct drm_device *dev, struct drm_file *file)
 {
struct virtio_gpu_device *vgdev = dev->dev_private;
struct virtio_gpu_fpriv *vfpriv = file->driver_priv;
-   char dbgname[TASK_COMM_LEN];
 
mutex_lock(&vfpriv->context_lock);
if (vfpriv->context_created)
goto out_unlock;
 
-   get_task_comm(dbgname, current);
-   virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
- strlen(dbgname), dbgname);
-   vfpriv->context_created = true;
+   virtio_gpu_create_context_locked(vgdev, vfpriv);
 
 out_unlock:
mutex_unlock(&vfpriv->context_lock);
@@ -662,6 +672,79 @@ static int virtio_gpu_resource_create_blob_ioctl(struct 
drm_device *dev,
return 0;
 }
 
+static int virtio_gpu_context_init_ioctl(struct drm_device *dev,
+void *data, struct drm_file *file)
+{
+   int ret = 0;
+   uint32_t num_params, i, param, value;
+   size_t len;
+   struct drm_virtgpu_context_set_param *ctx_set_params = NULL;
+   struct virtio_gpu_device *vgdev = dev->dev_private;
+   struct virtio_gpu_fpriv *vfpriv = file->driver_priv;
+   struct drm_virtgpu_context_init *args = data;
+
+   num_params = args->num_params;
+   len = num_params * sizeof(struct drm_virtgpu_context_set_param);
+
+   if (!vgdev->has_context_init || !vgdev->has_virgl_3d)
+   return -EINVAL;
+
+   /* Number of unique parameters supported at this time. */
+   if (num_params > 1)
+   return -EINVAL;
+
+   ctx_set_params = memdup_user(u64_to_user_ptr(args->ctx_set_params),
+len);
+
+   if (IS_ERR(ctx_set_params))
+   return PTR_ERR(ctx_set_params);
+
+   mutex_lock(&vfpriv->context_lock);
+   if (vfpriv->context

[PATCH v1 08/12] drm/virtio: implement context init: stop using drv->context when creating fence

2021-09-08 Thread Gurchetan Singh
The plumbing is all here to do this.  Since we always use the
default fence context when allocating a fence, this makes no
functional difference.

We can't process just the largest fence id anymore, since fence ids
are now associated with different timelines.  It's fine for fence_id
260 to signal before 259.  As such, process each fence_id
individually.

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_fence.c | 16 ++--
 drivers/gpu/drm/virtio/virtgpu_vq.c| 15 +++
 2 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_fence.c 
b/drivers/gpu/drm/virtio/virtgpu_fence.c
index 24c728b65d21..98a00c1e654d 100644
--- a/drivers/gpu/drm/virtio/virtgpu_fence.c
+++ b/drivers/gpu/drm/virtio/virtgpu_fence.c
@@ -75,20 +75,25 @@ struct virtio_gpu_fence *virtio_gpu_fence_alloc(struct 
virtio_gpu_device *vgdev,
uint64_t base_fence_ctx,
uint32_t ring_idx)
 {
+   uint64_t fence_context = base_fence_ctx + ring_idx;
struct virtio_gpu_fence_driver *drv = &vgdev->fence_drv;
struct virtio_gpu_fence *fence = kzalloc(sizeof(struct 
virtio_gpu_fence),
GFP_KERNEL);
+
if (!fence)
return fence;
 
fence->drv = drv;
+   fence->ring_idx = ring_idx;
+   fence->emit_fence_info = !(base_fence_ctx == drv->context);
 
/* This only partially initializes the fence because the seqno is
 * unknown yet.  The fence must not be used outside of the driver
 * until virtio_gpu_fence_emit is called.
 */
-   dma_fence_init(&fence->f, &virtio_gpu_fence_ops, &drv->lock, 
drv->context,
-  0);
+
+   dma_fence_init(&fence->f, &virtio_gpu_fence_ops, &drv->lock,
+  fence_context, 0);
 
return fence;
 }
@@ -110,6 +115,13 @@ void virtio_gpu_fence_emit(struct virtio_gpu_device *vgdev,
 
cmd_hdr->flags |= cpu_to_le32(VIRTIO_GPU_FLAG_FENCE);
cmd_hdr->fence_id = cpu_to_le64(fence->fence_id);
+
+   /* Only currently defined fence param. */
+   if (fence->emit_fence_info) {
+   cmd_hdr->flags |=
+   cpu_to_le32(VIRTIO_GPU_FLAG_INFO_RING_IDX);
+   cmd_hdr->ring_idx = (u8)fence->ring_idx;
+   }
 }
 
 void virtio_gpu_fence_event_process(struct virtio_gpu_device *vgdev,
diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c 
b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 496f8ce4cd41..938331554632 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -205,7 +205,7 @@ void virtio_gpu_dequeue_ctrl_func(struct work_struct *work)
struct list_head reclaim_list;
struct virtio_gpu_vbuffer *entry, *tmp;
struct virtio_gpu_ctrl_hdr *resp;
-   u64 fence_id = 0;
+   u64 fence_id;
 
INIT_LIST_HEAD(&reclaim_list);
spin_lock(&vgdev->ctrlq.qlock);
@@ -232,23 +232,14 @@ void virtio_gpu_dequeue_ctrl_func(struct work_struct 
*work)
DRM_DEBUG("response 0x%x\n", 
le32_to_cpu(resp->type));
}
if (resp->flags & cpu_to_le32(VIRTIO_GPU_FLAG_FENCE)) {
-   u64 f = le64_to_cpu(resp->fence_id);
-
-   if (fence_id > f) {
-   DRM_ERROR("%s: Oops: fence %llx -> %llx\n",
- __func__, fence_id, f);
-   } else {
-   fence_id = f;
-   }
+   fence_id = le64_to_cpu(resp->fence_id);
+   virtio_gpu_fence_event_process(vgdev, fence_id);
}
if (entry->resp_cb)
entry->resp_cb(vgdev, entry);
}
wake_up(&vgdev->ctrlq.ack_queue);
 
-   if (fence_id)
-   virtio_gpu_fence_event_process(vgdev, fence_id);
-
list_for_each_entry_safe(entry, tmp, &reclaim_list, list) {
if (entry->objs)
virtio_gpu_array_put_free_delayed(vgdev, entry->objs);
-- 
2.33.0.153.gba50c8fa24-goog



[PATCH v1 07/12] drm/virtio: implement context init: plumb {base_fence_ctx, ring_idx} to virtio_gpu_fence_alloc

2021-09-08 Thread Gurchetan Singh
These were defined in the previous commit. We'll need these
parameters when allocating a dma_fence.  The use case for this
is multiple synchronization timelines.

The maximum number of timelines per 3D instance will be 32. Usually,
only 2 are needed -- one for CPU commands, and another for GPU
commands.

As such, we'll need to specify these parameters when allocating a
dma_fence.

vgdev->fence_drv.context is the "default" fence context for 2D mode
and old userspace.

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h   | 5 +++--
 drivers/gpu/drm/virtio/virtgpu_fence.c | 4 +++-
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 9 +
 drivers/gpu/drm/virtio/virtgpu_plane.c | 3 ++-
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 401aec1a5efb..a5142d60c2fa 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -426,8 +426,9 @@ struct drm_plane *virtio_gpu_plane_init(struct 
virtio_gpu_device *vgdev,
int index);
 
 /* virtgpu_fence.c */
-struct virtio_gpu_fence *virtio_gpu_fence_alloc(
-   struct virtio_gpu_device *vgdev);
+struct virtio_gpu_fence *virtio_gpu_fence_alloc(struct virtio_gpu_device 
*vgdev,
+   uint64_t base_fence_ctx,
+   uint32_t ring_idx);
 void virtio_gpu_fence_emit(struct virtio_gpu_device *vgdev,
  struct virtio_gpu_ctrl_hdr *cmd_hdr,
  struct virtio_gpu_fence *fence);
diff --git a/drivers/gpu/drm/virtio/virtgpu_fence.c 
b/drivers/gpu/drm/virtio/virtgpu_fence.c
index d28e25e8409b..24c728b65d21 100644
--- a/drivers/gpu/drm/virtio/virtgpu_fence.c
+++ b/drivers/gpu/drm/virtio/virtgpu_fence.c
@@ -71,7 +71,9 @@ static const struct dma_fence_ops virtio_gpu_fence_ops = {
.timeline_value_str  = virtio_gpu_timeline_value_str,
 };
 
-struct virtio_gpu_fence *virtio_gpu_fence_alloc(struct virtio_gpu_device 
*vgdev)
+struct virtio_gpu_fence *virtio_gpu_fence_alloc(struct virtio_gpu_device 
*vgdev,
+   uint64_t base_fence_ctx,
+   uint32_t ring_idx)
 {
struct virtio_gpu_fence_driver *drv = &vgdev->fence_drv;
struct virtio_gpu_fence *fence = kzalloc(sizeof(struct 
virtio_gpu_fence),
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index f5281d1e30e1..f51f3393a194 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -173,7 +173,7 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device 
*dev, void *data,
goto out_memdup;
}
 
-   out_fence = virtio_gpu_fence_alloc(vgdev);
+   out_fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context, 0);
if(!out_fence) {
ret = -ENOMEM;
goto out_unresv;
@@ -288,7 +288,7 @@ static int virtio_gpu_resource_create_ioctl(struct 
drm_device *dev, void *data,
if (params.size == 0)
params.size = PAGE_SIZE;
 
-   fence = virtio_gpu_fence_alloc(vgdev);
+   fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context, 0);
if (!fence)
return -ENOMEM;
ret = virtio_gpu_object_create(vgdev, ¶ms, &qobj, fence);
@@ -367,7 +367,7 @@ static int virtio_gpu_transfer_from_host_ioctl(struct 
drm_device *dev,
if (ret != 0)
goto err_put_free;
 
-   fence = virtio_gpu_fence_alloc(vgdev);
+   fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context, 0);
if (!fence) {
ret = -ENOMEM;
goto err_unlock;
@@ -427,7 +427,8 @@ static int virtio_gpu_transfer_to_host_ioctl(struct 
drm_device *dev, void *data,
goto err_put_free;
 
ret = -ENOMEM;
-   fence = virtio_gpu_fence_alloc(vgdev);
+   fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context,
+  0);
if (!fence)
goto err_unlock;
 
diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c 
b/drivers/gpu/drm/virtio/virtgpu_plane.c
index a49fd9480381..6d3cc9e238a4 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -256,7 +256,8 @@ static int virtio_gpu_plane_prepare_fb(struct drm_plane 
*plane,
return 0;
 
if (bo->dumb && (plane->state->fb != new_state->fb)) {
-   vgfb->fence = virtio_gpu_fence_alloc(vgdev);
+   vgfb->fence = virtio_gpu_fence_alloc(vgdev, 
vgdev->fence_drv.context,
+0

[PATCH v1 06/12] drm/virtio: implement context init: track {ring_idx, emit_fence_info} in virtio_gpu_fence

2021-09-08 Thread Gurchetan Singh
Each fence should be associated with a [fence ID, fence_context,
seqno].  The seqno number is just the fence id.

To get the fence context, we add the ring_idx to the 3D context's
base_fence_ctx.  The ring_idx is between 0 and 31, inclusive.

Each 3D context will have its own base_fence_ctx. The ring_idx will
be emitted to host userspace when emit_fence_info is true.
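
(Restating the mapping the later fence patches implement -- illustrative
only, not new logic:)

  fence_context   = base_fence_ctx + ring_idx;        /* ring_idx in [0, 31] */
  emit_fence_info = (base_fence_ctx != drv->context); /* per-ring fences only */
  seqno           = fence_id;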

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 9996abf60e3a..401aec1a5efb 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -139,7 +139,9 @@ struct virtio_gpu_fence_driver {
 
 struct virtio_gpu_fence {
struct dma_fence f;
+   uint32_t ring_idx;
uint64_t fence_id;
+   bool emit_fence_info;
struct virtio_gpu_fence_driver *drv;
struct list_head node;
 };
-- 
2.33.0.153.gba50c8fa24-goog



[PATCH v1 04/12] drm/virtio: implement context init: probe for feature

2021-09-08 Thread Gurchetan Singh
From: Anthoine Bourgeois 

Let's probe for VIRTIO_GPU_F_CONTEXT_INIT.

Create a new DRM_INFO(..) line since the current one is getting
too long.

Signed-off-by: Anthoine Bourgeois 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_debugfs.c | 1 +
 drivers/gpu/drm/virtio/virtgpu_drv.c | 1 +
 drivers/gpu/drm/virtio/virtgpu_drv.h | 1 +
 drivers/gpu/drm/virtio/virtgpu_kms.c | 8 +++-
 4 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_debugfs.c 
b/drivers/gpu/drm/virtio/virtgpu_debugfs.c
index c2b20e0ee030..b6954e2f75e6 100644
--- a/drivers/gpu/drm/virtio/virtgpu_debugfs.c
+++ b/drivers/gpu/drm/virtio/virtgpu_debugfs.c
@@ -52,6 +52,7 @@ static int virtio_gpu_features(struct seq_file *m, void *data)
vgdev->has_resource_assign_uuid);
 
virtio_gpu_add_bool(m, "blob resources", vgdev->has_resource_blob);
+   virtio_gpu_add_bool(m, "context init", vgdev->has_context_init);
virtio_gpu_add_int(m, "cap sets", vgdev->num_capsets);
virtio_gpu_add_int(m, "scanouts", vgdev->num_scanouts);
if (vgdev->host_visible_region.len) {
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c 
b/drivers/gpu/drm/virtio/virtgpu_drv.c
index ed85a7863256..9d963f1fda8f 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -172,6 +172,7 @@ static unsigned int features[] = {
VIRTIO_GPU_F_EDID,
VIRTIO_GPU_F_RESOURCE_UUID,
VIRTIO_GPU_F_RESOURCE_BLOB,
+   VIRTIO_GPU_F_CONTEXT_INIT,
 };
 static struct virtio_driver virtio_gpu_driver = {
.feature_table = features,
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 3023e16be0d6..5e1958a522ff 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -236,6 +236,7 @@ struct virtio_gpu_device {
bool has_resource_assign_uuid;
bool has_resource_blob;
bool has_host_visible;
+   bool has_context_init;
struct virtio_shm_region host_visible_region;
struct drm_mm host_visible_mm;
 
diff --git a/drivers/gpu/drm/virtio/virtgpu_kms.c 
b/drivers/gpu/drm/virtio/virtgpu_kms.c
index 58a65121c200..21f410901694 100644
--- a/drivers/gpu/drm/virtio/virtgpu_kms.c
+++ b/drivers/gpu/drm/virtio/virtgpu_kms.c
@@ -191,13 +191,19 @@ int virtio_gpu_init(struct drm_device *dev)
(unsigned long)vgdev->host_visible_region.addr,
(unsigned long)vgdev->host_visible_region.len);
}
+   if (virtio_has_feature(vgdev->vdev, VIRTIO_GPU_F_CONTEXT_INIT)) {
+   vgdev->has_context_init = true;
+   }
 
-   DRM_INFO("features: %cvirgl %cedid %cresource_blob %chost_visible\n",
+   DRM_INFO("features: %cvirgl %cedid %cresource_blob %chost_visible",
 vgdev->has_virgl_3d? '+' : '-',
 vgdev->has_edid? '+' : '-',
 vgdev->has_resource_blob ? '+' : '-',
 vgdev->has_host_visible ? '+' : '-');
 
+   DRM_INFO("features: %ccontext_init\n",
+vgdev->has_context_init ? '+' : '-');
+
ret = virtio_find_vqs(vgdev->vdev, 2, vqs, callbacks, names, NULL);
if (ret) {
DRM_ERROR("failed to find virt queues\n");
-- 
2.33.0.153.gba50c8fa24-goog



[PATCH v1 03/12] drm/virtio: implement context init: track valid capabilities in a mask

2021-09-08 Thread Gurchetan Singh
The valid capability IDs are between 1 and 63, and are defined in the
virtio-gpu spec.  This is used for error checking in the subsequent
patches.  We're currently only using 2 capability IDs, so this
should be plenty for the immediate future.
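
As a concrete illustration (values assumed from the two existing virgl
capsets, ids 1 and 2): the mask built by the following patches would be
(1 << 1) | (1 << 2) = 0x6, and validating an id is just a test of
capset_id_mask & (1 << id).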

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h |  3 +++
 drivers/gpu/drm/virtio/virtgpu_kms.c | 18 +-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 0c4810982530..3023e16be0d6 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -55,6 +55,8 @@
 #define STATE_OK 1
 #define STATE_ERR 2
 
+#define MAX_CAPSET_ID 63
+
 struct virtio_gpu_object_params {
unsigned long size;
bool dumb;
@@ -245,6 +247,7 @@ struct virtio_gpu_device {
 
struct virtio_gpu_drv_capset *capsets;
uint32_t num_capsets;
+   uint64_t capset_id_mask;
struct list_head cap_cache;
 
/* protects uuid state when exporting */
diff --git a/drivers/gpu/drm/virtio/virtgpu_kms.c 
b/drivers/gpu/drm/virtio/virtgpu_kms.c
index f3379059f324..58a65121c200 100644
--- a/drivers/gpu/drm/virtio/virtgpu_kms.c
+++ b/drivers/gpu/drm/virtio/virtgpu_kms.c
@@ -65,6 +65,7 @@ static void virtio_gpu_get_capsets(struct virtio_gpu_device 
*vgdev,
   int num_capsets)
 {
int i, ret;
+   bool invalid_capset_id = false;
 
vgdev->capsets = kcalloc(num_capsets,
 sizeof(struct virtio_gpu_drv_capset),
@@ -78,19 +79,34 @@ static void virtio_gpu_get_capsets(struct virtio_gpu_device 
*vgdev,
virtio_gpu_notify(vgdev);
ret = wait_event_timeout(vgdev->resp_wq,
 vgdev->capsets[i].id > 0, 5 * HZ);
-   if (ret == 0) {
+   /*
+* Capability ids are defined in the virtio-gpu spec and are
+* between 1 to 63, inclusive.
+*/
+   if (!vgdev->capsets[i].id ||
+   vgdev->capsets[i].id > MAX_CAPSET_ID)
+   invalid_capset_id = true;
+
+   if (ret == 0)
DRM_ERROR("timed out waiting for cap set %d\n", i);
+   else if (invalid_capset_id)
+   DRM_ERROR("invalid capset id %u", vgdev->capsets[i].id);
+
+   if (ret == 0 || invalid_capset_id) {
spin_lock(&vgdev->display_info_lock);
kfree(vgdev->capsets);
vgdev->capsets = NULL;
spin_unlock(&vgdev->display_info_lock);
return;
}
+
+   vgdev->capset_id_mask |= 1 << vgdev->capsets[i].id;
DRM_INFO("cap set %d: id %d, max-version %d, max-size %d\n",
 i, vgdev->capsets[i].id,
 vgdev->capsets[i].max_version,
 vgdev->capsets[i].max_size);
}
+
vgdev->num_capsets = num_capsets;
 }
 
-- 
2.33.0.153.gba50c8fa24-goog



[PATCH v1 02/12] drm/virtgpu api: create context init feature

2021-09-08 Thread Gurchetan Singh
This change allows creating contexts depending on a set of
context parameters.  The meaning of each of the parameters
is listed below:

1) VIRTGPU_CONTEXT_PARAM_CAPSET_ID

This determines the type of a context based on the capability set
ID.  For example, the current capsets:

VIRTIO_GPU_CAPSET_VIRGL
VIRTIO_GPU_CAPSET_VIRGL2

define a Gallium, TGSI based "virgl" context.  We only need 1 capset
ID per context type, though virgl has two due to a bug that has since
been fixed.

The use case is the "gfxstream" rendering library and "venus"
renderer.

gfxstream doesn't do Gallium/TGSI translation and mostly relies on
auto-generated API streaming.  Certain users prefer gfxstream over
virgl for GLES on GLES emulation.  {gfxstream vk}/{venus} are also
required for Vulkan emulation.  The maximum capset ID is 63.

The goal is for guest userspace to choose the optimal context type
depending on the situation/hardware.

2) VIRTGPU_CONTEXT_PARAM_NUM_RINGS

This tells the number of independent command rings that the context
will use.  This value may be zero and is inferred to be zero if
VIRTGPU_CONTEXT_PARAM_NUM_RINGS is not passed in.  This is for backwards
compatibility for virgl, which has one big giant command ring for all
commands.

The maximum number of rings is 64.  In practice, multi-queue or
multi-ring submission is used for powerful dGPUs and virtio-gpu
may not be the best option in that case (see PCI passthrough or
rendernode forwarding).

3) VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK

This is a mask of ring indices for which the DRM fd is pollable.
For example, if VIRTGPU_CONTEXT_PARAM_NUM_RINGS is 2, then the mask
may be:

[ring idx]  |  [1 << ring_idx] | final mask
-------------------------------------------
     0      |         1        |      1
     1      |         2        |      3

The "Sommelier" guest Wayland proxy uses this to poll for events
from the host compositor.

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
Acked-by: Nicholas Verne 
---
 include/uapi/drm/virtgpu_drm.h | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/include/uapi/drm/virtgpu_drm.h b/include/uapi/drm/virtgpu_drm.h
index b9ec26e9c646..a13e20cc66b4 100644
--- a/include/uapi/drm/virtgpu_drm.h
+++ b/include/uapi/drm/virtgpu_drm.h
@@ -47,12 +47,15 @@ extern "C" {
 #define DRM_VIRTGPU_WAIT 0x08
 #define DRM_VIRTGPU_GET_CAPS  0x09
 #define DRM_VIRTGPU_RESOURCE_CREATE_BLOB 0x0a
+#define DRM_VIRTGPU_CONTEXT_INIT 0x0b
 
 #define VIRTGPU_EXECBUF_FENCE_FD_IN0x01
 #define VIRTGPU_EXECBUF_FENCE_FD_OUT   0x02
+#define VIRTGPU_EXECBUF_RING_IDX   0x04
 #define VIRTGPU_EXECBUF_FLAGS  (\
VIRTGPU_EXECBUF_FENCE_FD_IN |\
VIRTGPU_EXECBUF_FENCE_FD_OUT |\
+   VIRTGPU_EXECBUF_RING_IDX |\
0)
 
 struct drm_virtgpu_map {
@@ -68,6 +71,8 @@ struct drm_virtgpu_execbuffer {
__u64 bo_handles;
__u32 num_bo_handles;
__s32 fence_fd; /* in/out fence fd (see 
VIRTGPU_EXECBUF_FENCE_FD_IN/OUT) */
+   __u32 ring_idx; /* command ring index (see VIRTGPU_EXECBUF_RING_IDX) */
+   __u32 pad;
 };
 
 #define VIRTGPU_PARAM_3D_FEATURES 1 /* do we have 3D features in the hw */
@@ -75,6 +80,8 @@ struct drm_virtgpu_execbuffer {
 #define VIRTGPU_PARAM_RESOURCE_BLOB 3 /* DRM_VIRTGPU_RESOURCE_CREATE_BLOB */
 #define VIRTGPU_PARAM_HOST_VISIBLE 4 /* Host blob resources are mappable */
 #define VIRTGPU_PARAM_CROSS_DEVICE 5 /* Cross virtio-device resource sharing  
*/
+#define VIRTGPU_PARAM_CONTEXT_INIT 6 /* DRM_VIRTGPU_CONTEXT_INIT */
+#define VIRTGPU_PARAM_SUPPORTED_CAPSET_IDs 7 /* Bitmask of supported 
capability set ids */
 
 struct drm_virtgpu_getparam {
__u64 param;
@@ -173,6 +180,22 @@ struct drm_virtgpu_resource_create_blob {
__u64 blob_id;
 };
 
+#define VIRTGPU_CONTEXT_PARAM_CAPSET_ID   0x0001
+#define VIRTGPU_CONTEXT_PARAM_NUM_RINGS   0x0002
+#define VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK 0x0003
+struct drm_virtgpu_context_set_param {
+   __u64 param;
+   __u64 value;
+};
+
+struct drm_virtgpu_context_init {
+   __u32 num_params;
+   __u32 pad;
+
+   /* pointer to drm_virtgpu_context_set_param array */
+   __u64 ctx_set_params;
+};
+
 #define DRM_IOCTL_VIRTGPU_MAP \
DRM_IOWR(DRM_COMMAND_BASE + DRM_VIRTGPU_MAP, struct drm_virtgpu_map)
 
@@ -212,6 +235,10 @@ struct drm_virtgpu_resource_create_blob {
DRM_IOWR(DRM_COMMAND_BASE + DRM_VIRTGPU_RESOURCE_CREATE_BLOB,   \
struct drm_virtgpu_resource_create_blob)
 
+#define DRM_IOCTL_VIRTGPU_CONTEXT_INIT \
+   DRM_IOWR(DRM_COMMAND_BASE + DRM_VIRTGPU_CONTEXT_INIT,   \
+   struct drm_virtgpu_context_init)
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.33.0.153.gba50c8fa24-goog



[PATCH v1 00/12] Context types

2021-09-08 Thread Gurchetan Singh
Version 1 of context types:

https://lists.oasis-open.org/archives/virtio-dev/202108/msg00141.html

Changes since RFC:
   * le32 info --> {u8 ring_idx + u8 padding[3]}.
   * Max rings is now 64.

Anthoine Bourgeois (2):
  drm/virtio: implement context init: probe for feature
  drm/virtio: implement context init: support init ioctl

Gurchetan Singh (10):
  virtio-gpu api: multiple context types with explicit initialization
  drm/virtgpu api: create context init feature
  drm/virtio: implement context init: track valid capabilities in a mask
  drm/virtio: implement context init: track {ring_idx, emit_fence_info}
in virtio_gpu_fence
  drm/virtio: implement context init: plumb {base_fence_ctx, ring_idx}
to virtio_gpu_fence_alloc
  drm/virtio: implement context init: stop using drv->context when
creating fence
  drm/virtio: implement context init: allocate an array of fence
contexts
  drm/virtio: implement context init: handle
VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK
  drm/virtio: implement context init: add virtio_gpu_fence_event
  drm/virtio: implement context init: advertise feature to userspace

 drivers/gpu/drm/virtio/virtgpu_debugfs.c |   1 +
 drivers/gpu/drm/virtio/virtgpu_drv.c |  44 -
 drivers/gpu/drm/virtio/virtgpu_drv.h |  28 +++-
 drivers/gpu/drm/virtio/virtgpu_fence.c   |  30 +++-
 drivers/gpu/drm/virtio/virtgpu_ioctl.c   | 195 +--
 drivers/gpu/drm/virtio/virtgpu_kms.c |  26 ++-
 drivers/gpu/drm/virtio/virtgpu_plane.c   |   3 +-
 drivers/gpu/drm/virtio/virtgpu_vq.c  |  19 +--
 include/uapi/drm/virtgpu_drm.h   |  27 
 include/uapi/linux/virtio_gpu.h  |  18 ++-
 10 files changed, 355 insertions(+), 36 deletions(-)

-- 
2.33.0.153.gba50c8fa24-goog



[PATCH v1 01/12] virtio-gpu api: multiple context types with explicit initialization

2021-09-08 Thread Gurchetan Singh
This feature allows for each virtio-gpu 3D context to be created
with a "context_init" variable.  This variable can specify:

 - the type of protocol used by the context via the capset id.
   This is useful for differentiating virgl, gfxstream, and venus
   protocols by host userspace.

 - other things in the future, such as the version of the context.

In addition, each different context needs one or more timelines, so
for example a virgl context's waiting can be independent of a
gfxstream context's waiting.

VIRTIO_GPU_FLAG_INFO_RING_IDX is introduced to tell the
host which per-context command ring (or "hardware queue", distinct
from the virtio-queue) the fence should be associated with.

The new capability sets (gfxstream, venus etc.) are only defined in
the virtio-gpu spec and not defined in the header.
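
(A rough host/VMM-side sketch of consuming the new fields -- hypothetical
code, not part of this guest-facing patch; 'cmd' and 'hdr' are assumed
pointers to the request structures defined below.)

  #include <endian.h>

  uint32_t context_init = le32toh(cmd->context_init);
  uint32_t capset_id    = context_init & VIRTIO_GPU_CONTEXT_INIT_CAPSET_ID_MASK;
  /* route the new context to virgl, gfxstream or venus based on capset_id */

  if (le32toh(hdr->flags) & VIRTIO_GPU_FLAG_INFO_RING_IDX) {
          uint8_t ring_idx = hdr->ring_idx;
          /* attach the fence to that per-context ring's timeline */
  }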

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 include/uapi/linux/virtio_gpu.h | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/virtio_gpu.h b/include/uapi/linux/virtio_gpu.h
index 97523a95781d..b0e3d91dfab7 100644
--- a/include/uapi/linux/virtio_gpu.h
+++ b/include/uapi/linux/virtio_gpu.h
@@ -59,6 +59,11 @@
  * VIRTIO_GPU_CMD_RESOURCE_CREATE_BLOB
  */
 #define VIRTIO_GPU_F_RESOURCE_BLOB   3
+/*
+ * VIRTIO_GPU_CMD_CREATE_CONTEXT with
+ * context_init and multiple timelines
+ */
+#define VIRTIO_GPU_F_CONTEXT_INIT4
 
 enum virtio_gpu_ctrl_type {
VIRTIO_GPU_UNDEFINED = 0,
@@ -122,14 +127,20 @@ enum virtio_gpu_shm_id {
VIRTIO_GPU_SHM_ID_HOST_VISIBLE = 1
 };
 
-#define VIRTIO_GPU_FLAG_FENCE (1 << 0)
+#define VIRTIO_GPU_FLAG_FENCE (1 << 0)
+/*
+ * If the following flag is set, then ring_idx contains the index
+ * of the command ring that needs to used when creating the fence
+ */
+#define VIRTIO_GPU_FLAG_INFO_RING_IDX (1 << 1)
 
 struct virtio_gpu_ctrl_hdr {
__le32 type;
__le32 flags;
__le64 fence_id;
__le32 ctx_id;
-   __le32 padding;
+   u8 ring_idx;
+   u8 padding[3];
 };
 
 /* data passed in the cursor vq */
@@ -269,10 +280,11 @@ struct virtio_gpu_resource_create_3d {
 };
 
 /* VIRTIO_GPU_CMD_CTX_CREATE */
+#define VIRTIO_GPU_CONTEXT_INIT_CAPSET_ID_MASK 0x00ff
 struct virtio_gpu_ctx_create {
struct virtio_gpu_ctrl_hdr hdr;
__le32 nlen;
-   __le32 padding;
+   __le32 context_init;
char debug_name[64];
 };
 
-- 
2.33.0.153.gba50c8fa24-goog



[RFC PATCH 05/12] drm/virtio: implement context init: support init ioctl

2021-08-25 Thread Gurchetan Singh
From: Anthoine Bourgeois 

This implements the context initialization ioctl.  A list of params
is passed in by userspace, and kernel driver validates them.  The
only currently supported param is VIRTGPU_CONTEXT_PARAM_CAPSET_ID.

If the context has already been initialized, -EEXIST is returned.
This happens after Linux userspace does a dumb_create followed by
opening the Mesa virgl driver with the same virtgpu instance.

However, for most applications, 3D contexts will be explicitly
initialized when the feature is available.

Signed-off-by: Anthoine Bourgeois 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h   |  6 +-
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 96 --
 drivers/gpu/drm/virtio/virtgpu_vq.c|  4 +-
 3 files changed, 98 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 5e1958a522ff..9996abf60e3a 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -259,12 +259,13 @@ struct virtio_gpu_device {
 
 struct virtio_gpu_fpriv {
uint32_t ctx_id;
+   uint32_t context_init;
bool context_created;
struct mutex context_lock;
 };
 
 /* virtgpu_ioctl.c */
-#define DRM_VIRTIO_NUM_IOCTLS 11
+#define DRM_VIRTIO_NUM_IOCTLS 12
 extern struct drm_ioctl_desc virtio_gpu_ioctls[DRM_VIRTIO_NUM_IOCTLS];
 void virtio_gpu_create_context(struct drm_device *dev, struct drm_file *file);
 
@@ -342,7 +343,8 @@ int virtio_gpu_cmd_get_capset(struct virtio_gpu_device *vgdev,
  struct virtio_gpu_drv_cap_cache **cache_p);
 int virtio_gpu_cmd_get_edids(struct virtio_gpu_device *vgdev);
 void virtio_gpu_cmd_context_create(struct virtio_gpu_device *vgdev, uint32_t id,
-  uint32_t nlen, const char *name);
+  uint32_t context_init, uint32_t nlen,
+  const char *name);
 void virtio_gpu_cmd_context_destroy(struct virtio_gpu_device *vgdev,
uint32_t id);
 void virtio_gpu_cmd_context_attach_resource(struct virtio_gpu_device *vgdev,
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index 5c1ad1596889..f5281d1e30e1 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -38,20 +38,30 @@
VIRTGPU_BLOB_FLAG_USE_SHAREABLE | \
VIRTGPU_BLOB_FLAG_USE_CROSS_DEVICE)
 
+/* Must be called with &virtio_gpu_fpriv.context_lock held. */
+static void virtio_gpu_create_context_locked(struct virtio_gpu_device *vgdev,
+struct virtio_gpu_fpriv *vfpriv)
+{
+   char dbgname[TASK_COMM_LEN];
+
+   get_task_comm(dbgname, current);
+   virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
+ vfpriv->context_init, strlen(dbgname),
+ dbgname);
+
+   vfpriv->context_created = true;
+}
+
 void virtio_gpu_create_context(struct drm_device *dev, struct drm_file *file)
 {
struct virtio_gpu_device *vgdev = dev->dev_private;
struct virtio_gpu_fpriv *vfpriv = file->driver_priv;
-   char dbgname[TASK_COMM_LEN];
 
mutex_lock(&vfpriv->context_lock);
if (vfpriv->context_created)
goto out_unlock;
 
-   get_task_comm(dbgname, current);
-   virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
- strlen(dbgname), dbgname);
-   vfpriv->context_created = true;
+   virtio_gpu_create_context_locked(vgdev, vfpriv);
 
 out_unlock:
mutex_unlock(&vfpriv->context_lock);
@@ -662,6 +672,79 @@ static int virtio_gpu_resource_create_blob_ioctl(struct drm_device *dev,
return 0;
 }
 
+static int virtio_gpu_context_init_ioctl(struct drm_device *dev,
+void *data, struct drm_file *file)
+{
+   int ret = 0;
+   uint32_t num_params, i, param, value;
+   size_t len;
+   struct drm_virtgpu_context_set_param *ctx_set_params = NULL;
+   struct virtio_gpu_device *vgdev = dev->dev_private;
+   struct virtio_gpu_fpriv *vfpriv = file->driver_priv;
+   struct drm_virtgpu_context_init *args = data;
+
+   num_params = args->num_params;
+   len = num_params * sizeof(struct drm_virtgpu_context_set_param);
+
+   if (!vgdev->has_context_init || !vgdev->has_virgl_3d)
+   return -EINVAL;
+
+   /* Number of unique parameters supported at this time. */
+   if (num_params > 1)
+   return -EINVAL;
+
+   ctx_set_params = memdup_user(u64_to_user_ptr(args->ctx_set_params),
+len);
+
+   if (IS_ERR(ctx_set_params))
+   return PTR_ERR(ctx_set_params);
+
+   mutex_lock(&vfpriv->context_lock);
+   if (vfpriv->context

[RFC PATCH 12/12] drm/virtio: implement context init: advertise feature to userspace

2021-08-25 Thread Gurchetan Singh
This advertises the context init feature to userspace, along with
a mask of the supported capability set (capset) ids.
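
For reference, a minimal userspace probe sketch; the helper is
hypothetical and assumes libdrm's drmIoctl() and the existing
drm_virtgpu_getparam uapi:

#include <stdint.h>
#include <xf86drm.h>
#include "virtgpu_drm.h"        /* uapi header; include path depends on build */

/* Hypothetical helper: returns 0 on error or when the param is unknown. */
static uint64_t virtgpu_getparam(int fd, uint64_t param)
{
        uint64_t value = 0;
        struct drm_virtgpu_getparam args = {
                .param = param,
                .value = (uintptr_t)&value,
        };

        if (drmIoctl(fd, DRM_IOCTL_VIRTGPU_GETPARAM, &args))
                return 0;
        return value;
}

/*
 * Usage: gate capset queries on the feature bit, then read the id mask:
 *   if (virtgpu_getparam(fd, VIRTGPU_PARAM_CONTEXT_INIT))
 *           mask = virtgpu_getparam(fd, VIRTGPU_PARAM_SUPPORTED_CAPSET_IDs);
 */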

Signed-off-by: Gurchetan Singh 
Acked-by: Lingfeng Yang 
---
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index 393897adbbaa..4e77a333005e 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -286,6 +286,12 @@ static int virtio_gpu_getparam_ioctl(struct drm_device *dev, void *data,
case VIRTGPU_PARAM_CROSS_DEVICE:
value = vgdev->has_resource_assign_uuid ? 1 : 0;
break;
+   case VIRTGPU_PARAM_CONTEXT_INIT:
+   value = vgdev->has_context_init ? 1 : 0;
+   break;
+   case VIRTGPU_PARAM_SUPPORTED_CAPSET_IDs:
+   value = vgdev->capset_id_mask;
+   break;
default:
return -EINVAL;
}
-- 
2.33.0.259.gc128427fd7-goog


