Re: [RFC PATCH 0/8] *** Per context fencing ***
  Hi,

> > > I am still catching up, but IIUC, indeed I don't think the host needs
> > > to depend on fence_id.  We should be able to repurpose fence_id.
> >
> > I'd rather ignore it altogether for FENCE_V2 (or whatever we call the
> > feature flag).
>
> That's fine when one assumes each virtqueue has one host GPU timeline.
> But when there are multiple (e.g., multiplexing multiple contexts over
> one virtqueue, or multiple VkQueues), fence_id can be repurposed as
> the host timeline id.

Why do you need an id for that?  You can simply keep track of the
submit_3d commands instead.

cheers,
  Gerd
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
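Gerd's "keep track of the submit_3d commands instead" idea can be sketched roughly as below: the host keeps the in-flight submits of a timeline on a list and the renderer reports completion by pointer, so no id namespace is needed. All structure and function names here are invented for illustration; this is not actual virtio-gpu or virglrenderer code.

```c
#include <stddef.h>

/* One in-flight SUBMIT_3D command. */
struct submit {
    struct submit *next;
    int completed;
};

/* Per-context (or per-timeline) FIFO of in-flight submits. */
struct ctx_timeline {
    struct submit *head;   /* oldest in-flight submit */
    struct submit **tail;
};

static void timeline_init(struct ctx_timeline *tl)
{
    tl->head = NULL;
    tl->tail = &tl->head;
}

static void timeline_queue(struct ctx_timeline *tl, struct submit *s)
{
    s->next = NULL;
    s->completed = 0;
    *tl->tail = s;
    tl->tail = &s->next;
}

/* The renderer tells us *which* submit finished; since submits on one
 * timeline complete in order, everything up to and including it is done. */
static void timeline_complete(struct ctx_timeline *tl, struct submit *done)
{
    struct submit *s = tl->head;
    while (s) {
        s->completed = 1;
        if (s == done)
            break;
        s = s->next;
    }
    tl->head = done->next;
    if (!tl->head)
        tl->tail = &tl->head;
}
```

The point of the sketch is that the completion callback identifies the command directly (here by pointer, in the driver via the virtio_gpu_vbuffer), making a fence_id redundant on this path.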
Re: [RFC PATCH 0/8] *** Per context fencing ***
On Mon, Mar 16, 2020 at 3:44 PM Gerd Hoffmann wrote:
>
>   Hi,
>
> > >> At virtio level it is pretty simple:  The host completes the SUBMIT_3D
> > >> virtio command when it finished rendering, period.
> > >>
> > >> On the guest side we don't need the fence_id.  The completion callback
> > >> gets passed the virtio_gpu_vbuffer, so it can figure which command did
> > >> actually complete without looking at virtio_gpu_ctrl_hdr->fence_id.
> > >>
> > >> On the host side we depend on the fence_id right now, but only because
> > >> that is the way the virgl_renderer_callbacks->write_fence interface is
> > >> designed.  We have to change that anyway for per-context (or whatever)
> > >> fences, so it should not be a problem to drop the fence_id dependency
> > >> too and just pass around an opaque pointer instead.
> >
> > I am still catching up, but IIUC, indeed I don't think the host needs
> > to depend on fence_id.  We should be able to repurpose fence_id.
>
> I'd rather ignore it altogether for FENCE_V2 (or whatever we call the
> feature flag).

That's fine when one assumes each virtqueue has one host GPU timeline.
But when there are multiple (e.g., multiplexing multiple contexts over
one virtqueue, or multiple VkQueues), fence_id can be repurposed as
the host timeline id.

> > On the other hand, the VIRTIO_GPU_FLAG_FENCE flag is interesting, and
> > it indicates that the vbuf is on the host GPU timeline instead of the
> > host CPU timeline.
>
> Yep, we have to keep that (unless we do command completion on GPU
> timeline unconditionally with FENCE_V2).

I think it will be useful when EXECBUFFER is used for metadata query and
write the metadata directly to a guest BO's sg list.  We want the query
to be on the CPU timeline.
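One way to read the "fence_id can be repurposed as the host timeline id" suggestion is to split the 64-bit field into a timeline id plus a per-timeline sequence number. The layout below is invented purely for illustration and is not part of the virtio spec:

```c
#include <stdint.h>

/* Hypothetical encoding: top 8 bits select the host timeline, the
 * remaining 56 bits are a sequence number that only needs to be
 * monotonic within that timeline. */
#define TIMELINE_SHIFT 56

static uint64_t pack_fence(uint8_t timeline_id, uint64_t seq)
{
    return ((uint64_t)timeline_id << TIMELINE_SHIFT) |
           (seq & ((1ULL << TIMELINE_SHIFT) - 1));
}

static uint8_t fence_timeline(uint64_t fence_id)
{
    return (uint8_t)(fence_id >> TIMELINE_SHIFT);
}

static uint64_t fence_seq(uint64_t fence_id)
{
    return fence_id & ((1ULL << TIMELINE_SHIFT) - 1);
}
```

With such an encoding the host could demultiplex fences from a shared virtqueue onto the right timeline without any extra header field, which is the scenario (multiple contexts or VkQueues over one virtqueue) being discussed here.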
Re: [RFC PATCH 0/8] *** Per context fencing ***
  Hi,

> >> At virtio level it is pretty simple:  The host completes the SUBMIT_3D
> >> virtio command when it finished rendering, period.
> >>
> >> On the guest side we don't need the fence_id.  The completion callback
> >> gets passed the virtio_gpu_vbuffer, so it can figure which command did
> >> actually complete without looking at virtio_gpu_ctrl_hdr->fence_id.
> >>
> >> On the host side we depend on the fence_id right now, but only because
> >> that is the way the virgl_renderer_callbacks->write_fence interface is
> >> designed.  We have to change that anyway for per-context (or whatever)
> >> fences, so it should not be a problem to drop the fence_id dependency
> >> too and just pass around an opaque pointer instead.
>
> I am still catching up, but IIUC, indeed I don't think the host needs
> to depend on fence_id.  We should be able to repurpose fence_id.

I'd rather ignore it altogether for FENCE_V2 (or whatever we call the
feature flag).

> On the other hand, the VIRTIO_GPU_FLAG_FENCE flag is interesting, and
> it indicates that the vbuf is on the host GPU timeline instead of the
> host CPU timeline.

Yep, we have to keep that (unless we do command completion on GPU
timeline unconditionally with FENCE_V2).

cheers,
  Gerd
Re: [RFC PATCH 0/8] *** Per context fencing ***
On Thu, Mar 12, 2020 at 4:08 PM Gurchetan Singh wrote:
>
> On Thu, Mar 12, 2020 at 2:29 AM Gerd Hoffmann wrote:
>>
>> On Wed, Mar 11, 2020 at 04:36:16PM -0700, Gurchetan Singh wrote:
>> > On Wed, Mar 11, 2020 at 3:36 AM Gerd Hoffmann wrote:
>> >
>> > >   Hi,
>> > >
>> > > > I should've been more clear -- this is an internal cleanup/preparation
>> > > > and the per-context changes are invisible to host userspace.
>> > >
>> > > Ok, it wasn't clear that you don't flip the switch yet.  In general the
>> > > commit messages could be a bit more verbose ...
>> > >
>> > > I'm wondering though why we need the new fence_id in the first place.
>> > > Isn't it enough to have per-context (instead of global) last_seq?
>> >
>> > Heh, that was to leave open the possibility of multiple timelines per
>> > context.  Roughly speaking,
>> >
>> > V2 -- multiple processes
>> > V3 -- multiple processes and multiple threads (due to VK multi-threaded
>> > command buffers)
>> >
>> > I think we all agree on V2.  It seems we still have to discuss V3
>> > (multi-queue, thread pools, a fence context associated with each thread)
>> > a bit more before we start landing pieces.
>>
>> While thinking about the whole thing a bit more ...
>> Do we need virtio_gpu_ctrl_hdr->fence_id at all?
>
> A fence ID could be useful for sharing fences across virtio devices.
> Think FENCE_ASSIGN_UUID, akin to RESOURCE_ASSIGN_UUID (+dstevens@).
>
>> At virtio level it is pretty simple:  The host completes the SUBMIT_3D
>> virtio command when it finished rendering, period.
>>
>> On the guest side we don't need the fence_id.  The completion callback
>> gets passed the virtio_gpu_vbuffer, so it can figure which command did
>> actually complete without looking at virtio_gpu_ctrl_hdr->fence_id.
>>
>> On the host side we depend on the fence_id right now, but only because
>> that is the way the virgl_renderer_callbacks->write_fence interface is
>> designed.  We have to change that anyway for per-context (or whatever)
>> fences, so it should not be a problem to drop the fence_id dependency
>> too and just pass around an opaque pointer instead.

I am still catching up, but IIUC, indeed I don't think the host needs
to depend on fence_id.  We should be able to repurpose fence_id.

On the other hand, the VIRTIO_GPU_FLAG_FENCE flag is interesting, and
it indicates that the vbuf is on the host GPU timeline instead of the
host CPU timeline.

> For multiple GPU timelines per context, the (process local) sync object
> handle looks interesting:
>
> https://patchwork.kernel.org/patch/9758565/
>
> Some have extended EXECBUFFER to support this flow:
>
> https://patchwork.freedesktop.org/patch/msgid/1499289202-25441-1-git-send-email-jason.ekstr...@intel.com

I think this only affects the kernel/userspace interface?

I know there was work being done to support VK_KHR_timeline_semaphore,
which is something we definitely want.  I don't know if it is the only
way for the userspace to gain the extension support.  I need to do my
homework...
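The "opaque pointer instead of fence_id" change to the write_fence interface that is being discussed could look roughly like the sketch below. The `v2` callback and the surrounding names are hypothetical, made up for illustration; they are not the actual virglrenderer API.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical evolution of virgl_renderer_callbacks: today the host is
 * handed back a bare global fence_id; a per-context variant could hand
 * back an opaque pointer chosen by the VMM at submit time instead. */
struct virgl_callbacks_v2 {
    int version;
    /* current-style interface: only a global fence_id */
    void (*write_fence)(void *cookie, uint32_t fence_id);
    /* hypothetical replacement: opaque per-fence pointer */
    void (*write_fence2)(void *cookie, void *fence);
};

/* Toy stand-in for the renderer completing a fence. */
static void renderer_complete(const struct virgl_callbacks_v2 *cbs,
                              void *cookie, uint32_t id, void *fence)
{
    if (cbs->version >= 2 && cbs->write_fence2)
        cbs->write_fence2(cookie, fence);  /* no fence_id involved */
    else if (cbs->write_fence)
        cbs->write_fence(cookie, id);
}

/* Example VMM-side callback that just records which fence completed. */
static void *last_fence;
static void capture_fence(void *cookie, void *fence)
{
    (void)cookie;
    last_fence = fence;
}
```

With a shape like this, the VMM can stash whatever it likes behind the pointer (a dma_fence lookup key, a per-context record, a UUID for cross-device sharing), and the renderer never needs to interpret it.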
Re: [RFC PATCH 0/8] *** Per context fencing ***
On Thu, Mar 12, 2020 at 2:29 AM Gerd Hoffmann wrote:

> On Wed, Mar 11, 2020 at 04:36:16PM -0700, Gurchetan Singh wrote:
> > On Wed, Mar 11, 2020 at 3:36 AM Gerd Hoffmann wrote:
> >
> > >   Hi,
> > >
> > > > I should've been more clear -- this is an internal cleanup/preparation
> > > > and the per-context changes are invisible to host userspace.
> > >
> > > Ok, it wasn't clear that you don't flip the switch yet.  In general the
> > > commit messages could be a bit more verbose ...
> > >
> > > I'm wondering though why we need the new fence_id in the first place.
> > > Isn't it enough to have per-context (instead of global) last_seq?
> >
> > Heh, that was to leave open the possibility of multiple timelines per
> > context.  Roughly speaking,
> >
> > V2 -- multiple processes
> > V3 -- multiple processes and multiple threads (due to VK multi-threaded
> > command buffers)
> >
> > I think we all agree on V2.  It seems we still have to discuss V3
> > (multi-queue, thread pools, a fence context associated with each thread)
> > a bit more before we start landing pieces.
>
> While thinking about the whole thing a bit more ...
> Do we need virtio_gpu_ctrl_hdr->fence_id at all?

A fence ID could be useful for sharing fences across virtio devices.
Think FENCE_ASSIGN_UUID, akin to RESOURCE_ASSIGN_UUID (+dstevens@).

> At virtio level it is pretty simple:  The host completes the SUBMIT_3D
> virtio command when it finished rendering, period.
>
> On the guest side we don't need the fence_id.  The completion callback
> gets passed the virtio_gpu_vbuffer, so it can figure which command did
> actually complete without looking at virtio_gpu_ctrl_hdr->fence_id.
>
> On the host side we depend on the fence_id right now, but only because
> that is the way the virgl_renderer_callbacks->write_fence interface is
> designed.  We have to change that anyway for per-context (or whatever)
> fences, so it should not be a problem to drop the fence_id dependency
> too and just pass around an opaque pointer instead.

For multiple GPU timelines per context, the (process local) sync object
handle looks interesting:

https://patchwork.kernel.org/patch/9758565/

Some have extended EXECBUFFER to support this flow:

https://patchwork.freedesktop.org/patch/msgid/1499289202-25441-1-git-send-email-jason.ekstr...@intel.com

> cheers,
>   Gerd
Re: [RFC PATCH 0/8] *** Per context fencing ***
On Wed, Mar 11, 2020 at 04:36:16PM -0700, Gurchetan Singh wrote:
> On Wed, Mar 11, 2020 at 3:36 AM Gerd Hoffmann wrote:
>
> >   Hi,
> >
> > > I should've been more clear -- this is an internal cleanup/preparation
> > > and the per-context changes are invisible to host userspace.
> >
> > Ok, it wasn't clear that you don't flip the switch yet.  In general the
> > commit messages could be a bit more verbose ...
> >
> > I'm wondering though why we need the new fence_id in the first place.
> > Isn't it enough to have per-context (instead of global) last_seq?
>
> Heh, that was to leave open the possibility of multiple timelines per
> context.  Roughly speaking,
>
> V2 -- multiple processes
> V3 -- multiple processes and multiple threads (due to VK multi-threaded
> command buffers)
>
> I think we all agree on V2.  It seems we still have to discuss V3
> (multi-queue, thread pools, a fence context associated with each thread)
> a bit more before we start landing pieces.

While thinking about the whole thing a bit more ...
Do we need virtio_gpu_ctrl_hdr->fence_id at all?

At virtio level it is pretty simple:  The host completes the SUBMIT_3D
virtio command when it finished rendering, period.

On the guest side we don't need the fence_id.  The completion callback
gets passed the virtio_gpu_vbuffer, so it can figure which command did
actually complete without looking at virtio_gpu_ctrl_hdr->fence_id.

On the host side we depend on the fence_id right now, but only because
that is the way the virgl_renderer_callbacks->write_fence interface is
designed.  We have to change that anyway for per-context (or whatever)
fences, so it should not be a problem to drop the fence_id dependency
too and just pass around an opaque pointer instead.

cheers,
  Gerd
Re: [RFC PATCH 0/8] *** Per context fencing ***
  Hi,

> I will start with... how many timelines do we want to expose per
> context?  In my mind, it goes like
>
> V1: 1 timeline per virtqueue (there is one timeline for ctrlq right now)
> V2: 1 timeline per context (VK and GL on different timelines)
> V3: N timelines per context (each VkQueue in a VK context gets a timeline?)
> V4: N+M timelines per context (each CPU thread also gets a timeline?!?!)
>
> I certainly don't know if V4 is a good idea or not...

I'd expect apps to use one VkQueue per thread, so v3 makes sense but v4
not so much I think ...

cheers,
  Gerd
Re: [RFC PATCH 0/8] *** Per context fencing ***
  Hi,

> Can virtqueues be added dynamically?

No.

> Or can we have fixed but enough (e.g., 64) virtqueues?

Well, I wouldn't bet that any specific number is enough.  When gtk
starts rendering using vulkan by default the number of contexts of a
standard gnome desktop will be pretty high I guess ...

We can have a host-configurable number of virtqueues.  Typically the
guest can figure the number of available queues from config space.  One
common usage pattern (seen in block/net devices) is to have the number
of virtqueues equal the number of vcpus, so each vcpu has a dedicated
virtqueue.

For virtio-gpu it is probably more useful to bind contexts to
virtqueues.  As long as we have enough virtqueues each context can have
its own.  Once we run out of virtqueues we have to start sharing.

On the host side it probably makes sense to have one process per
virtqueue.

cheers,
  Gerd
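The "bind contexts to virtqueues, share once we run out" policy described above can be sketched as a trivial mapping function. The reservation of queue 0 for non-context traffic and the round-robin sharing rule are assumptions made for the example, not something specified anywhere:

```c
#include <stdint.h>

/* Map a context id to a virtqueue index.  num_queues would come from
 * virtio config space; here it is just a parameter.  While there are
 * enough queues each context gets its own; after that, contexts share
 * (round-robin here, but any stable policy would do). */
static uint32_t ctx_to_virtqueue(uint32_t ctx_id, uint32_t num_queues)
{
    if (num_queues <= 1)
        return 0;                       /* everything shares the one queue */

    uint32_t data_queues = num_queues - 1;  /* queue 0 kept for ctrl/display */
    return 1 + (ctx_id % data_queues);
}
```

With one host process per virtqueue, as suggested above, this same function would also decide which renderer process serves a context.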
Re: [RFC PATCH 0/8] *** Per context fencing ***
On Wed, Mar 11, 2020 at 4:36 PM Gurchetan Singh wrote:
>
> On Wed, Mar 11, 2020 at 3:36 AM Gerd Hoffmann wrote:
>>
>>   Hi,
>>
>> > I should've been more clear -- this is an internal cleanup/preparation
>> > and the per-context changes are invisible to host userspace.
>>
>> Ok, it wasn't clear that you don't flip the switch yet.  In general the
>> commit messages could be a bit more verbose ...
>>
>> I'm wondering though why we need the new fence_id in the first place.
>> Isn't it enough to have per-context (instead of global) last_seq?
>
> Heh, that was to leave open the possibility of multiple timelines per
> context.  Roughly speaking,

Yeah, I think we will need multiple timelines per context.

> V2 -- multiple processes
> V3 -- multiple processes and multiple threads (due to VK multi-threaded
> command buffers)
>
> I think we all agree on V2.  It seems we still have to discuss V3
> (multi-queue, thread pools, a fence context associated with each thread)
> a bit more before we start landing pieces.

In addition to multiple threads, we should also consider multiple
VkQueues.

I will start with... how many timelines do we want to expose per
context?  In my mind, it goes like

V1: 1 timeline per virtqueue (there is one timeline for ctrlq right now)
V2: 1 timeline per context (VK and GL on different timelines)
V3: N timelines per context (each VkQueue in a VK context gets a timeline?)
V4: N+M timelines per context (each CPU thread also gets a timeline?!?!)

I certainly don't know if V4 is a good idea or not...

>> > Multi-queue sounds very interesting indeed, especially with VK
>> > multi-threaded command submission.  That to me is V3 rather than V2..
>> > let's start easy!
>>
>> Having v2 if we plan to obsolete it with v3 soon doesn't look like a
>> good plan to me.  It'll make backward compatibility more complex for
>> no good reason ...
>>
>> Also: Does virglrenderer render different contexts in parallel today?
>> Only in case it does we'll actually get benefits from per-context
>> fences.  But I think it doesn't, so there is no need to rush.
>>
>> I think we should better have a rough plan for parallel rendering first,
>> then go start implementing the pieces needed.
>>
>> cheers,
>>   Gerd
Re: [RFC PATCH 0/8] *** Per context fencing ***
On Wed, Mar 11, 2020 at 3:36 AM Gerd Hoffmann wrote:

>   Hi,
>
> > I should've been more clear -- this is an internal cleanup/preparation
> > and the per-context changes are invisible to host userspace.
>
> Ok, it wasn't clear that you don't flip the switch yet.  In general the
> commit messages could be a bit more verbose ...
>
> I'm wondering though why we need the new fence_id in the first place.
> Isn't it enough to have per-context (instead of global) last_seq?

Heh, that was to leave open the possibility of multiple timelines per
context.  Roughly speaking,

V2 -- multiple processes
V3 -- multiple processes and multiple threads (due to VK multi-threaded
command buffers)

I think we all agree on V2.  It seems we still have to discuss V3
(multi-queue, thread pools, a fence context associated with each thread)
a bit more before we start landing pieces.

> > Multi-queue sounds very interesting indeed, especially with VK
> > multi-threaded command submission.  That to me is V3 rather than V2..
> > let's start easy!
>
> Having v2 if we plan to obsolete it with v3 soon doesn't look like a
> good plan to me.  It'll make backward compatibility more complex for
> no good reason ...
>
> Also: Does virglrenderer render different contexts in parallel today?
> Only in case it does we'll actually get benefits from per-context
> fences.  But I think it doesn't, so there is no need to rush.
>
> I think we should better have a rough plan for parallel rendering first,
> then go start implementing the pieces needed.
>
> cheers,
>   Gerd
Re: [RFC PATCH 0/8] *** Per context fencing ***
On Wed, Mar 11, 2020 at 3:36 AM Gerd Hoffmann wrote:
>
>   Hi,
>
> > I should've been more clear -- this is an internal cleanup/preparation
> > and the per-context changes are invisible to host userspace.
>
> Ok, it wasn't clear that you don't flip the switch yet.  In general the
> commit messages could be a bit more verbose ...
>
> I'm wondering though why we need the new fence_id in the first place.
> Isn't it enough to have per-context (instead of global) last_seq?
>
> > Multi-queue sounds very interesting indeed, especially with VK
> > multi-threaded command submission.  That to me is V3 rather than V2..
> > let's start easy!
>
> Having v2 if we plan to obsolete it with v3 soon doesn't look like a
> good plan to me.  It'll make backward compatibility more complex for
> no good reason ...

I agree we want to study multi-queue a little bit before doing v2.  If
we do decide that multi-queue will be v3, we should at least design v2
in a forward-compatible way.

Every VK context (or GL context if we go multi-process GL) is isolated.
I think there will need to be at least one virtqueue for each VK
context.  Can virtqueues be added dynamically?  Or can we have fixed but
enough (e.g., 64) virtqueues?

Multi-threaded command submission is not helped by multi-queue unless we
go with one virtqueue for each VkQueue in a VK context.  Otherwise,
multi-queue only makes context scheduling easier, which is not a
priority yet IMO.

> Also: Does virglrenderer render different contexts in parallel today?
> Only in case it does we'll actually get benefits from per-context
> fences.  But I think it doesn't, so there is no need to rush.
>
> I think we should better have a rough plan for parallel rendering first,
> then go start implementing the pieces needed.

It will be soon.  Each VK context will be rendered by a different
renderer process.

Besides, VK contexts and GL contexts are not on the same timeline.  We
don't want one to delay another by presenting a unified timeline.

> cheers,
>   Gerd
Re: [RFC PATCH 0/8] *** Per context fencing ***
  Hi,

> I should've been more clear -- this is an internal cleanup/preparation
> and the per-context changes are invisible to host userspace.

Ok, it wasn't clear that you don't flip the switch yet.  In general the
commit messages could be a bit more verbose ...

I'm wondering though why we need the new fence_id in the first place.
Isn't it enough to have per-context (instead of global) last_seq?

> Multi-queue sounds very interesting indeed, especially with VK
> multi-threaded command submission.  That to me is V3 rather than V2..
> let's start easy!

Having v2 if we plan to obsolete it with v3 soon doesn't look like a
good plan to me.  It'll make backward compatibility more complex for
no good reason ...

Also: Does virglrenderer render different contexts in parallel today?
Only in case it does we'll actually get benefits from per-context
fences.  But I think it doesn't, so there is no need to rush.

I think we should better have a rough plan for parallel rendering first,
then go start implementing the pieces needed.

cheers,
  Gerd
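The "per-context (instead of global) last_seq" alternative raised here can be sketched as follows. The struct and function names are invented for illustration; the actual driver keeps this state in virtio_gpu_device/virtio_gpu_fence_driver:

```c
#include <stdint.h>
#include <stdbool.h>

/* Per-context fence bookkeeping: each context has its own monotonically
 * increasing sequence number, independent of every other context. */
struct vgpu_ctx {
    uint64_t emitted_seq;   /* last seq handed out for this context */
    uint64_t last_seq;      /* last seq the host reported finished */
};

static uint64_t ctx_emit_fence(struct vgpu_ctx *ctx)
{
    return ++ctx->emitted_seq;
}

/* Host completion event for this context: since fences on one context's
 * timeline signal in order, this retires every fence with seq' <= seq. */
static void ctx_fence_event(struct vgpu_ctx *ctx, uint64_t seq)
{
    if (seq > ctx->last_seq)
        ctx->last_seq = seq;
}

static bool ctx_fence_done(const struct vgpu_ctx *ctx, uint64_t seq)
{
    return seq <= ctx->last_seq;
}
```

The limitation Gurchetan points out below applies directly: one `last_seq` per context gives exactly one timeline per context, so multiple timelines per context (per-VkQueue, per-thread) need something more than this.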
Re: [RFC PATCH 0/8] *** Per context fencing ***
On Tue, Mar 10, 2020 at 12:43 AM Gerd Hoffmann wrote:

> On Mon, Mar 09, 2020 at 06:08:10PM -0700, Gurchetan Singh wrote:
> > We don't want fences from different 3D contexts/processes (GL, VK) to
> > be on the same timeline.  Sending this out as a RFC to solicit feedback
> > on the general approach.
>
> NACK.
>
> virtio fences are global, period.  You can't simply change fence
> semantics like this.  At least not without a virtio protocol update
> because guest and host need to be on the same page here.

I should've been more clear -- this is an internal cleanup/preparation
and the per-context changes are invisible to host userspace.

> Just look at virgl_renderer_callbacks->write_fences() and how it
> doesn't consider contexts at all.
>
> So one way out would be to add a virtio feature flag for this, so guest
> & host can negotiate whenever fences are global or per-context.

Yeah, we'll need something like VIRTIO_GPU_FLAG_FENCE_V2 eventually,
which means virgl_write_fence can no longer assume fence_id is the
global sequence number.  It's a bit tricky, and we have to consider a
few cases:

1) Current kernel / current host userspace.  Everything works.

2) Newer kernel (with this series) on current host userspace.  For that,
fence_id needs to be a monotonically increasing value, which it remains
to be.  I ran the standard test apps (Unigine Valley, dEQP, etc.) with
this change and things seem fine.

3) Current kernel on newer host userspace.  New host won't see
VIRTIO_GPU_FLAG_FENCE_V2, everything should work as usual.

4) Newer kernel on newer host userspace.  virgl_write_fence signals
fences only in a specific context (or possibly only one fence at a
time).  The guest kernel processing based on {fence_id, fence_context}
will make a difference in a multi-process environment.

If I have things right (and again, it's a bit tricky), the virtio
protocol update will be required at (4).  It would be nice to get in
refactorings to avoid mega-changes if we agree on the general approach.

Side note: Fences do have an associated context ID in virglrenderer [1],
though we don't pass down the correct ctx ID just yet [2].

[1] https://gitlab.freedesktop.org/virgl/virglrenderer/-/blob/master/src/virglrenderer.h#L204
[2] https://github.com/qemu/qemu/blob/master/hw/display/virtio-gpu-3d.c#L490

> Another approach would be to go multiqueue, with each virtqueue being
> roughly the same as a rendering pipeline on physical hardware, then have
> per-virtqueue fences.

Multi-queue sounds very interesting indeed, especially with VK
multi-threaded command submission.  That to me is V3 rather than V2..
let's start easy!

> When going multiqueue we might also rethink the cursor queue approach.
> I think it makes sense to simply allow sending cursor commands to any
> queue then.  A separate cursor queue continues to be an option for the
> guest then, but I'm not sure how useful that actually is in practice
> given that cursor image updates are regular resource updates and have to
> go through the control queue, so virtio_gpu_cursor_plane_update() has to
> wait for the resource update finish before it can queue the cursor
> command.  I suspect the idea to fast-track cursor updates with a
> separate queue doesn't work that well in practice because of that.
>
> cheers,
>   Gerd
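The four kernel/host combinations enumerated above reduce to a standard feature-negotiation rule: the new fence semantics are used only when both sides advertise the (still hypothetical) FENCE_V2 feature, otherwise both fall back to the global timeline. A minimal sketch:

```c
#include <stdbool.h>

enum fence_mode { FENCE_GLOBAL, FENCE_PER_CONTEXT };

/* Per-context fencing only when BOTH guest and host advertise the
 * feature; any mixed old/new combination stays on the global timeline.
 * "FENCE_V2" is a placeholder name from the discussion, not a defined
 * virtio feature bit. */
static enum fence_mode negotiate(bool guest_has_v2, bool host_has_v2)
{
    return (guest_has_v2 && host_has_v2) ? FENCE_PER_CONTEXT : FENCE_GLOBAL;
}
```

This is why case (2) above still needs fence_id to stay globally monotonic: a new guest cannot know at submit time that it is talking to an old host until negotiation says so.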
Re: [RFC PATCH 0/8] *** Per context fencing ***
On Mon, Mar 09, 2020 at 06:08:10PM -0700, Gurchetan Singh wrote:
> We don't want fences from different 3D contexts/processes (GL, VK) to
> be on the same timeline.  Sending this out as a RFC to solicit feedback
> on the general approach.

NACK.

virtio fences are global, period.  You can't simply change fence
semantics like this.  At least not without a virtio protocol update
because guest and host need to be on the same page here.

Just look at virgl_renderer_callbacks->write_fences() and how it
doesn't consider contexts at all.

So one way out would be to add a virtio feature flag for this, so guest
& host can negotiate whenever fences are global or per-context.

Another approach would be to go multiqueue, with each virtqueue being
roughly the same as a rendering pipeline on physical hardware, then have
per-virtqueue fences.

When going multiqueue we might also rethink the cursor queue approach.
I think it makes sense to simply allow sending cursor commands to any
queue then.  A separate cursor queue continues to be an option for the
guest then, but I'm not sure how useful that actually is in practice
given that cursor image updates are regular resource updates and have to
go through the control queue, so virtio_gpu_cursor_plane_update() has to
wait for the resource update finish before it can queue the cursor
command.  I suspect the idea to fast-track cursor updates with a
separate queue doesn't work that well in practice because of that.

cheers,
  Gerd
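The global semantics the NACK refers to can be modeled as below: the host reports a single value and the guest retires every pending fence with an id at or below it, regardless of which context queued it. This is a simplified illustration of the behavior, not the actual driver code:

```c
#include <stdint.h>
#include <stdbool.h>

struct pending {
    uint64_t id;       /* globally monotonic fence id */
    uint32_t ctx;      /* which 3D context queued it (ignored today) */
    bool signaled;
};

/* Today's model: one global timeline.  Signaling fence N implicitly
 * signals every fence <= N, even if it belongs to a different context
 * whose rendering hasn't actually finished -- which is exactly what
 * breaks once contexts render in parallel. */
static void write_fence_global(struct pending *f, int n, uint64_t last_id)
{
    for (int i = 0; i < n; i++)
        if (f[i].id <= last_id)
            f[i].signaled = true;  /* context is never consulted */
}
```

Making this per-context without negotiation would silently change what a signaled id means to an old guest or host, hence the insistence on a feature flag or a protocol update.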