Re: [Intel-gfx] [PATCH v11 01/10] drm/syncobj: add sideband payload

2019-09-04 Thread Lionel Landwerlin

Jason: Are you alright with this?

Assuming you are, I think we should merge it.

Thanks,

-Lionel

On 29/08/2019 08:26, Zhou, David(ChunMing) wrote:

v6 is fine with me as well; RB on it to go ahead.

Out of curiosity, why is "[PATCH v11 01/10]" in the subject?

-David


Re: [Intel-gfx] [PATCH v11 01/10] drm/syncobj: add sideband payload

2019-08-29 Thread Lionel Landwerlin

On 29/08/2019 08:26, Zhou, David(ChunMing) wrote:

v6 is fine with me as well; RB on it to go ahead.

Out of curiosity, why is "[PATCH v11 01/10]" in the subject?



v11 is the series on the intel-gfx mailing list, which depends on the
timeline semaphore feature in our driver.

That's why I included this patch in it.


-Lionel




-David


Re: [Intel-gfx] [PATCH v11 01/10] drm/syncobj: add sideband payload

2019-08-28 Thread Zhou, David(ChunMing)
v6 is fine with me as well; RB on it to go ahead.

Out of curiosity, why is "[PATCH v11 01/10]" in the subject?

-David

-Original Message-
From: Lionel Landwerlin  
Sent: Wednesday, August 28, 2019 10:33 PM
To: intel-gfx@lists.freedesktop.org
Cc: Lionel Landwerlin ; Zhou, David(ChunMing) 
; Koenig, Christian ; Jason 
Ekstrand 
Subject: [PATCH v11 01/10] drm/syncobj: add sideband payload

The Vulkan timeline semaphores allow signaling to happen on a point of the
timeline without all of its dependencies having been created.

The current two implementations (AMD/Intel) of the Vulkan spec on top of the
Linux kernel use a thread to wait for the dependencies of a given point to
materialize, delaying the actual submission to the kernel driver until the
wait completes.

If a binary semaphore is submitted for signaling alongside a timeline
semaphore waiting for completion, the drm syncobj associated with that binary
semaphore will not have a DMA fence associated with it by the time
vkQueueSubmit() returns. This, and the fact that a binary semaphore can be
signaled and unsignaled before its DMA fences materialize, means we cannot
rely on the fence within the syncobj alone; we also need a sideband payload
verifying that the fence in the syncobj matches the last submission from the
Vulkan API point of view.

This change adds a sideband payload that is incremented for each signaled
syncobj when vkQueueSubmit() is called. The next vkQueueSubmit() waiting on
the syncobj will read the sideband payload and wait for a fence chain element
with a seqno greater than or equal to the sideband payload value to be added
into the fence chain, then use that fence to trigger the submission on the
kernel driver.

v2: Use a separate ioctl to get/set the sideband value (Christian)

v3: Use 2 ioctls for get/set (Christian)

v4: Use a single new ioctl

v5: Fix a bunch of blatant mistakes
Store payload atomically (Chris)

v6: Only touch atomic value once (Jason)

Signed-off-by: Lionel Landwerlin 
Reviewed-by: David Zhou  (v5)
Cc: Christian Koenig 
Cc: Jason Ekstrand 
Cc: David(ChunMing) Zhou 
---
 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_ioctl.c    |  3 ++
 drivers/gpu/drm/drm_syncobj.c  | 59 +-
 include/drm/drm_syncobj.h      |  9 ++
 include/uapi/drm/drm.h         | 17 ++
 5 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 51a2055c8f18..e297dfd85019 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -208,6 +208,8 @@ int drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
 				      struct drm_file *file_private);
 int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
 			    struct drm_file *file_private);
+int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file_private);
 
 /* drm_framebuffer.c */
 void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index f675a3bb2c88..644d0bc800a4 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -703,6 +703,9 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
 		      DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
 		      DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_BINARY, drm_syncobj_binary_ioctl,
+		      DRM_RENDER_ALLOW),
+
 	DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 0),
 	DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, drm_crtc_queue_sequence_ioctl, 0),
 	DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, DRM_MASTER),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 4b5c7b0ed714..732310b2b367 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1224,8 +1224,10 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
 	if (ret < 0)
 		return ret;
 
-	for (i = 0; i < args->count_handles; i++)
+	for (i = 0; i < args->count_handles; i++) {
 		drm_syncobj_replace_fence(syncobjs[i], NULL);
+		atomic64_set(&syncobjs[i]->binary_payload, 0);
+	}
 
 	drm_syncobj_array_free(syncobjs, args->count_handles);
 
@@ -1395,6 +1397,61 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
 		if (ret)
 			break;
 	}
+
+	drm_syncobj_array_free(syncobjs, args->count_handles);
+
+	return ret;
+}
+
+int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file_private)
+{
+	struct drm_syncobj_binary_array *args = data;
+