Re: [PATCH v5 1/1] drm/syncobj: add sideband payload

2019-08-16 Thread zhoucm1
If it is not submitted yet,  Reviewed-by: Chunming Zhou 



-David

On 2019-08-09 21:43, Lionel Landwerlin wrote:

The Vulkan timeline semaphores allow signaling to happen on a point of
the timeline without all of its dependencies having been created.

The current 2 implementations (AMD/Intel) of the Vulkan spec on top of
the Linux kernel are using a thread to wait on the dependencies of a
given point to materialize and delay actual submission to the kernel
driver until the wait completes.

If a binary semaphore is submitted for signaling alongside a timeline
semaphore wait, the drm syncobj associated with that binary semaphore
will not have a DMA fence associated with it by the time
vkQueueSubmit() returns. This, and the fact that a binary semaphore can
be signaled and unsignaled before its DMA fence materializes, means
that we cannot rely solely on the fence within the syncobj; we also
need a sideband payload verifying that the fence in the syncobj matches
the last submission from the Vulkan API point of view.

This change adds a sideband payload that is incremented with the
signaled syncobj when vkQueueSubmit() is called. The next
vkQueueSubmit() waiting on the syncobj will read the sideband payload
and wait for a fence chain element with a seqno greater than or equal
to the sideband payload value to be added into the fence chain, and use
that fence to trigger the submission on the kernel driver.
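
To illustrate, a minimal userspace sketch of the intended flow. The
uapi hunk is truncated below, so the struct layout and the flag name
here are assumptions reconstructed from the kernel side of the patch,
not the final uapi:

#include <stdint.h>
#include <sys/ioctl.h>

/* Assumed layout, mirroring what drm_syncobj_binary_ioctl() reads. */
struct drm_syncobj_binary_array {
	uint64_t handles;      /* pointer to u32 syncobj handles */
	uint64_t access_flags; /* pointer to u32 read/write flags */
	uint64_t values;       /* pointer to u64 payload values */
	uint32_t count_handles;
	uint32_t pad;
};

#define SYNCOBJ_BINARY_WRITE (1u << 0) /* assumed flag name */

/* Submission thread, signal side: bump the sideband payload so that
 * the next waiter knows which fence chain seqno to wait for. The ioctl
 * number is passed in because only its name (DRM_IOCTL_SYNCOBJ_BINARY)
 * is visible in this patch. */
static int bump_binary_payload(int fd, unsigned long req,
			       uint32_t handle, uint64_t value)
{
	uint32_t flags = SYNCOBJ_BINARY_WRITE;
	struct drm_syncobj_binary_array args = {
		.handles = (uintptr_t)&handle,
		.access_flags = (uintptr_t)&flags,
		.values = (uintptr_t)&value,
		.count_handles = 1,
	};

	return ioctl(fd, req, &args);
}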

v2: Use a separate ioctl to get/set the sideband value (Christian)

v3: Use 2 ioctls for get/set (Christian)

v4: Use a single new ioctl

v5: fix a bunch of blatant mistakes
 Store payload atomically (Chris)

Signed-off-by: Lionel Landwerlin 
Cc: Christian Koenig 
Cc: Jason Ekstrand 
Cc: David(ChunMing) Zhou 
---
  drivers/gpu/drm/drm_internal.h |  2 ++
  drivers/gpu/drm/drm_ioctl.c|  3 ++
  drivers/gpu/drm/drm_syncobj.c  | 58 +-
  include/drm/drm_syncobj.h  |  9 ++
  include/uapi/drm/drm.h | 17 ++
  5 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 51a2055c8f18..e297dfd85019 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -208,6 +208,8 @@ int drm_syncobj_timeline_signal_ioctl(struct drm_device 
*dev, void *data,
  struct drm_file *file_private);
  int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
+int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data,
+struct drm_file *file_private);
  
  /* drm_framebuffer.c */

  void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index f675a3bb2c88..644d0bc800a4 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -703,6 +703,9 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
  DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_BINARY, drm_syncobj_binary_ioctl,
+ DRM_RENDER_ALLOW),
+
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 
0),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, 
drm_crtc_queue_sequence_ioctl, 0),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, 
DRM_MASTER),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index b927e482e554..d2d3a8d1374d 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1150,8 +1150,10 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void 
*data,
if (ret < 0)
return ret;
  
-	for (i = 0; i < args->count_handles; i++)

+   for (i = 0; i < args->count_handles; i++) {
drm_syncobj_replace_fence(syncobjs[i], NULL);
+   atomic64_set(&syncobjs[i]->binary_payload, 0);
+   }
  
  	drm_syncobj_array_free(syncobjs, args->count_handles);
  
@@ -1321,6 +1323,60 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,

if (ret)
break;
}
+
+   drm_syncobj_array_free(syncobjs, args->count_handles);
+
+   return ret;
+}
+
+int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data,
+struct drm_file *file_private)
+{
+   struct drm_syncobj_binary_array *args = data;
+   struct drm_syncobj **syncobjs;
+   u32 __user *access_flags = u64_to_user_ptr(args->access_flags);
+   u64 __user *values = u64_to_user_ptr(args->values);
+   u32 i;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
+   return -EOPNOTSUPP;
+
+   if (args->pad != 0)
+   return -EINVAL;
+
+   

Re: Threaded submission & semaphore sharing

2019-08-02 Thread zhoucm1



On 2019-08-02 17:41, Lionel Landwerlin wrote:

Hey David,

On 02/08/2019 12:11, zhoucm1 wrote:


Hi Lionel,

For binary semaphores, I guess everyone will assume the application 
guarantees that the wait comes after the signal, whether the semaphore 
is shared or used intra-process.


I think the two options below can fix your problem:

a. Can we extend vkWaitForFence so that it is able to wait on 
fence availability? If the fence is available, then it's safe to do 
the semaphore wait in vkQueueSubmit.




I'm sorry, but I don't understand what vkWaitForFence() has to do with 
this problem.


The test case we're struggling with doesn't use that API.


Can you maybe explain a bit more how it relates?


vkQueueSubmit()
vkWaitForFenceAvailable()
vkQueueSubmit()
vkWaitForFenceAvailable()
vkQueueSubmit()
vkWaitForFenceAvailable()

Sorry, but that could make the application program very ugly.



b. Make waitBeforeSignal valid for binary semaphores as well; that 
way, it is reasonable to add wait/signal counting for binary 
syncobjs.




Yeah essentially the change we're proposing internally makes binary 
semaphores use syncobj timelines.


Will you raise an MR to add language for waitBeforeSignal support 
of binary semaphores to the Vulkan spec?


-David


There is just another u64 associated with them.


-Lionel




-David


On 2019-08-02 14:27, Lionel Landwerlin wrote:

On 02/08/2019 09:10, Koenig, Christian wrote:



On 02.08.2019 07:38, Lionel Landwerlin wrote:


On 02/08/2019 08:21, Koenig, Christian wrote:



On 02.08.2019 07:17, Lionel Landwerlin
<mailto:lionel.g.landwer...@intel.com> wrote:

On 02/08/2019 08:08, Koenig, Christian wrote:

Hi Lionel,

Well that looks more like your test case is buggy.

According to the code the ctx1 queue always waits
for sem1 and ctx2 queue always waits for sem2.


That's supposed to be the same underlying syncobj
because it's exported from one VkDevice as opaque FD
from sem1 and imported into sem2.


Well then that's still buggy and won't synchronize at all.

When ctx1 waits for a semaphore and then signals the same
semaphore there is no guarantee that ctx2 will run in
between jobs.

It's perfectly valid in this case to first run all jobs
from ctx1 and then all jobs from ctx2.


That's not really how I see the semaphores working.

The spec describes VkSemaphore as an interface to an internal
payload opaque to the application.


When ctx1 waits on the semaphore, it waits on the payload put
there by the previous iteration.


And who says that it's not waiting for its own previous payload?



That's what I understood from your previous comment: "there is no 
guarantee that ctx2 will run in between jobs"





See if the payload is a counter this won't work either. Keep in 
mind that this has the semantic of a semaphore. Whoever grabs the 
semaphore first wins and can run, everybody else has to wait.



What performs the "grab" here?

I thought that would be vkQueueSubmit().

Since that occurs from a single application thread, execution should 
then be ordered ctx1, ctx2, ctx1, ...



Thanks for your time on this,


-Lionel




Then it proceeds to signal it by replacing the internal payload.


That's an implementation detail of our sync objects, but I don't 
think that this behavior is part of the Vulkan specification.


Regards,
Christian.


ctx2 then waits on that and replaces the payload again with the
new internal synchronization object.


The internal payload is a dma fence in our case and signaling
just replaces a dma fence by another or puts one where there
was none before.

So we should have created a dependency link between all the
submissions, and they should then be executed in the order of the
QueueSubmit() calls.


-Lionel



It only prevents running both at the same time and as far
as I can see that still works even with threaded submission.

You need at least two semaphores for a tandem submission.

Regards,
Christian.



This way there can't be any synchronisation between
the two.

Regards,
Christian.

On 02.08.2019 06:55, Lionel Landwerlin
<mailto:lionel.g.landwer...@intel.com> wrote:
Hey Christian,

The problem boils down to the fact that we don't
immediately create dma fences when calling
vkQueueSubmit().
This is delayed to a thread.

From a single application thread, you can
QueueSubmit() to 2 queues from 2 different devices.
Each QueueSubmit to one queue has a dependency on
the previous QueueS

Re: Threaded submission & semaphore sharing

2019-08-02 Thread zhoucm1

Hi Lionel,

For binary semaphores, I guess everyone will assume the application 
guarantees that the wait comes after the signal, whether the semaphore 
is shared or used intra-process.


I think the two options below can fix your problem:

a. Can we extend vkWaitForFence so that it is able to wait on 
fence availability? If the fence is available, then it's safe to do the 
semaphore wait in vkQueueSubmit.


b. Make waitBeforeSignal valid for binary semaphores as well; that 
way, it is reasonable to add wait/signal counting for binary syncobjs.



-David


On 2019-08-02 14:27, Lionel Landwerlin wrote:

On 02/08/2019 09:10, Koenig, Christian wrote:



On 02.08.2019 07:38, Lionel Landwerlin wrote:


On 02/08/2019 08:21, Koenig, Christian wrote:



On 02.08.2019 07:17, Lionel Landwerlin wrote:

On 02/08/2019 08:08, Koenig, Christian wrote:

Hi Lionel,

Well that looks more like your test case is buggy.

According to the code the ctx1 queue always waits for
sem1 and ctx2 queue always waits for sem2.


That's supposed to be the same underlying syncobj because
it's exported from one VkDevice as opaque FD from sem1
and imported into sem2.


Well then that's still buggy and won't synchronize at all.

When ctx1 waits for a semaphore and then signals the same
semaphore there is no guarantee that ctx2 will run in between
jobs.

It's perfectly valid in this case to first run all jobs from
ctx1 and then all jobs from ctx2.


That's not really how I see the semaphores working.

The spec describes VkSemaphore as an interface to an internal
payload opaque to the application.


When ctx1 waits on the semaphore, it waits on the payload put
there by the previous iteration.


And who says that it's not waiting for its own previous payload?



That's what I understood from your previous comment: "there is no 
guarantee that ctx2 will run in between jobs"





See if the payload is a counter this won't work either. Keep in mind 
that this has the semantic of a semaphore. Whoever grabs the 
semaphore first wins and can run, everybody else has to wait.



What performs the "grab" here?

I thought that would be vkQueueSubmit().

Since that occurs from a single application thread, execution should 
then be ordered ctx1, ctx2, ctx1, ...



Thanks for your time on this,


-Lionel




Then it proceeds to signal it by replacing the internal payload.


That's an implementation detail of our sync objects, but I don't 
think that this behavior is part of the Vulkan specification.


Regards,
Christian.


ctx2 then waits on that and replaces the payload again with the
new internal synchronization object.


The internal payload is a dma fence in our case and signaling
just replaces a dma fence by another or puts one where there was
none before.

So we should have created a dependency link between all the
submissions, and they should then be executed in the order of the
QueueSubmit() calls.


-Lionel



It only prevents running both at the same time and as far as
I can see that still works even with threaded submission.

You need at least two semaphores for a tandem submission.

Regards,
Christian.



This way there can't be any synchronisation between
the two.

Regards,
Christian.

On 02.08.2019 06:55, Lionel Landwerlin wrote:
Hey Christian,

The problem boils down to the fact that we don't
immediately create dma fences when calling
vkQueueSubmit().
This is delayed to a thread.

From a single application thread, you can
QueueSubmit() to 2 queues from 2 different devices.
Each QueueSubmit to one queue has a dependency on the
previous QueueSubmit on the other queue through an
exported/imported semaphore.

From the API point of view the state of the semaphore
should be changed after each QueueSubmit().
The problem is that it's not because of the thread
and because you might have those 2 submission threads
tied to different VkDevice/VkInstance or even
different applications (synchronizing themselves
outside the vulkan API).

Hope that makes sense.
It's not really easy to explain by mail, the best
explanation is probably reading the test :

https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788
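
For readers without the test open, the pattern boils down to the
following (a paraphrase in plain C, not the actual crucible code; sem1
and sem2 are the same underlying syncobj, exported from one device and
imported into the other):

#include <vulkan/vulkan.h>

/* Ping-pong between two queues through one shared binary semaphore:
 * each submission waits on the shared semaphore, then signals it again. */
static void ping_pong(VkQueue q1, VkQueue q2,
		      VkSemaphore sem1, VkSemaphore sem2, int rounds)
{
	VkPipelineStageFlags stage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;

	for (int i = 0; i < rounds; i++) {
		VkSubmitInfo s1 = {
			.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
			.waitSemaphoreCount = i == 0 ? 0 : 1, /* nothing to wait on yet */
			.pWaitSemaphores = &sem1,
			.pWaitDstStageMask = &stage,
			.signalSemaphoreCount = 1,
			.pSignalSemaphores = &sem1,
		};
		VkSubmitInfo s2 = {
			.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
			.waitSemaphoreCount = 1,
			.pWaitSemaphores = &sem2,
			.pWaitDstStageMask = &stage,
			.signalSemaphoreCount = 1,
			.pSignalSemaphores = &sem2,
		};

		/* All from one application thread: q1, then q2, per round. */
		vkQueueSubmit(q1, 1, &s1, VK_NULL_HANDLE);
		vkQueueSubmit(q2, 1, &s2, VK_NULL_HANDLE);
	}
}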


Re: [PATCH] drm/syncobj: remove boring message

2019-07-30 Thread zhoucm1



On 2019-07-30 17:40, Lionel Landwerlin wrote:

On 30/07/2019 12:36, Daniel Vetter wrote:

On Tue, Jul 30, 2019 at 05:31:26PM +0800, zhoucm1 wrote:


On 2019-07-30 17:27, Daniel Vetter wrote:

On Mon, Jul 29, 2019 at 04:20:39PM +0800, Chunming Zhou wrote:

It is normal that binary syncobj replaces the underlying fence.

Signed-off-by: Chunming Zhou 

Do we hit this with one of the syncobj igts?
Unfortunately, no. It's only hit in the AMDGPU path, which combines 
timeline and binary into the same path when timeline is enabled.



We can totally build that case with sw_fences which is what one of the 
IGT tests does.

OK, Thank you.

-David



-Lionel



Looks like lionel has something, maybe help review that?
-Daniel


-David

-Daniel


---
   drivers/gpu/drm/drm_syncobj.c | 3 ---
   1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index 929f7c64f9a2..bc7ec1679e4d 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -151,9 +151,6 @@ void drm_syncobj_add_point(struct drm_syncobj 
*syncobj,

   spin_lock(&syncobj->lock);
   prev = drm_syncobj_fence_get(syncobj);
-    /* You are adding an unorder point to timeline, which could 
cause payload returned from query_ioctl is 0! */

-    if (prev && prev->seqno >= point)
-    DRM_ERROR("You are adding an unorder point to timeline!\n");
   dma_fence_chain_init(chain, prev, fence, point);
    rcu_assign_pointer(syncobj->fence, &chain->base);
--
2.17.1


Re: [PATCH] drm/syncobj: remove boring message

2019-07-30 Thread zhoucm1



On 2019-07-30 17:27, Daniel Vetter wrote:

On Mon, Jul 29, 2019 at 04:20:39PM +0800, Chunming Zhou wrote:

It is normal that binary syncobj replaces the underlying fence.

Signed-off-by: Chunming Zhou 

Do we hit this with one of the syncobj igts?
Unfortunately, no. It's only hit in the AMDGPU path, which combines 
timeline and binary into the same path when timeline is enabled.


-David

-Daniel


---
  drivers/gpu/drm/drm_syncobj.c | 3 ---
  1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 929f7c64f9a2..bc7ec1679e4d 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -151,9 +151,6 @@ void drm_syncobj_add_point(struct drm_syncobj *syncobj,
	spin_lock(&syncobj->lock);
  
  	prev = drm_syncobj_fence_get(syncobj);

-   /* You are adding an unorder point to timeline, which could cause 
payload returned from query_ioctl is 0! */
-   if (prev && prev->seqno >= point)
-   DRM_ERROR("You are adding an unorder point to timeline!\n");
dma_fence_chain_init(chain, prev, fence, point);
	rcu_assign_pointer(syncobj->fence, &chain->base);
  
--

2.17.1


Re: [PATCH libdrm] libdrm: wrap new flexible syncobj query interface v2

2019-07-25 Thread zhoucm1
Thanks guys, since I have no write permission to libdrm, I need your 
help to push the patch.


-David
On 2019-07-25 16:11, Chunming Zhou wrote:

v2: nit-picks fix

Signed-off-by: Chunming Zhou 
Cc: Lionel Landwerlin 
Cc: Christian König 
Reviewed-by: Christian König 
For the xf86drm.[ch] part : Reviewed-by: Lionel Landwerlin 

---
  amdgpu/amdgpu-symbol-check |  1 +
  amdgpu/amdgpu.h| 18 ++
  amdgpu/amdgpu_cs.c | 10 ++
  include/drm/drm.h  |  3 ++-
  xf86drm.c  | 15 +++
  xf86drm.h  |  2 ++
  6 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
index 274b4c6d..597a99b2 100755
--- a/amdgpu/amdgpu-symbol-check
+++ b/amdgpu/amdgpu-symbol-check
@@ -56,6 +56,7 @@ amdgpu_cs_syncobj_export_sync_file2
  amdgpu_cs_syncobj_import_sync_file
  amdgpu_cs_syncobj_import_sync_file2
  amdgpu_cs_syncobj_query
+amdgpu_cs_syncobj_query2
  amdgpu_cs_syncobj_reset
  amdgpu_cs_syncobj_signal
  amdgpu_cs_syncobj_timeline_signal
diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 9d9b0832..19f74bd6 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1591,6 +1591,24 @@ int amdgpu_cs_syncobj_timeline_wait(amdgpu_device_handle 
dev,
  int amdgpu_cs_syncobj_query(amdgpu_device_handle dev,
uint32_t *handles, uint64_t *points,
unsigned num_handles);
+/**
+ *  Query sync objects last signaled or submitted point.
+ *
+ * \param   dev- \c [in] self-explanatory
+ * \param   handles - \c [in] array of sync object handles
+ * \param   points - \c [out] array of sync points returned, which presents
+ * syncobj payload.
+ * \param   num_handles - \c [in] self-explanatory
+ * \param   flags   - \c [in] a bitmask of DRM_SYNCOBJ_QUERY_FLAGS_*
+ *
+ * \return   0 on success\n
+ *  -ETIME - Timeout
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_query2(amdgpu_device_handle dev,
+uint32_t *handles, uint64_t *points,
+unsigned num_handles, uint32_t flags);
  
  /**

   *  Export kernel sync object to shareable fd.
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 977fa3cf..01e2b2c8 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -721,6 +721,16 @@ drm_public int 
amdgpu_cs_syncobj_query(amdgpu_device_handle dev,
return drmSyncobjQuery(dev->fd, handles, points, num_handles);
  }
  
+drm_public int amdgpu_cs_syncobj_query2(amdgpu_device_handle dev,

+   uint32_t *handles, uint64_t *points,
+   unsigned num_handles, uint32_t flags)
+{
+   if (!dev)
+   return -EINVAL;
+
+   return drmSyncobjQuery2(dev->fd, handles, points, num_handles, flags);
+}
+
  drm_public int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
uint32_t handle,
int *shared_fd)
diff --git a/include/drm/drm.h b/include/drm/drm.h
index 532787bf..af37a80b 100644
--- a/include/drm/drm.h
+++ b/include/drm/drm.h
@@ -771,11 +771,12 @@ struct drm_syncobj_array {
__u32 pad;
  };
  
+#define DRM_SYNCOBJ_QUERY_FLAGS_LAST_SUBMITTED (1 << 0) /* last available point on timeline syncobj */

  struct drm_syncobj_timeline_array {
__u64 handles;
__u64 points;
__u32 count_handles;
-   __u32 pad;
+   __u32 flags;
  };
  
  
diff --git a/xf86drm.c b/xf86drm.c

index 953fc762..28a61264 100644
--- a/xf86drm.c
+++ b/xf86drm.c
@@ -4314,6 +4314,21 @@ drm_public int drmSyncobjQuery(int fd, uint32_t 
*handles, uint64_t *points,
  return 0;
  }
  
+drm_public int drmSyncobjQuery2(int fd, uint32_t *handles, uint64_t *points,

+   uint32_t handle_count, uint32_t flags)
+{
+struct drm_syncobj_timeline_array args;
+
+memclear(args);
+args.handles = (uintptr_t)handles;
+args.points = (uintptr_t)points;
+args.count_handles = handle_count;
+args.flags = flags;
+
+return drmIoctl(fd, DRM_IOCTL_SYNCOBJ_QUERY, &args);
+}
+
+
  drm_public int drmSyncobjTransfer(int fd,
  uint32_t dst_handle, uint64_t dst_point,
  uint32_t src_handle, uint64_t src_point,
diff --git a/xf86drm.h b/xf86drm.h
index 3fb1d1ca..55ceaed9 100644
--- a/xf86drm.h
+++ b/xf86drm.h
@@ -884,6 +884,8 @@ extern int drmSyncobjTimelineWait(int fd, uint32_t 
*handles, uint64_t *points,
  uint32_t *first_signaled);
  extern int drmSyncobjQuery(int fd, uint32_t *handles, uint64_t *points,
   uint32_t handle_count);
+extern int drmSyncobjQuery2(int fd, uint32_t *handles, uint64_t *points,
+   uint32_t handle_count, uint32_t flags);
  extern int drmSyncobjTransfer(int fd,
  uint32_t 
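
For completeness, a short usage sketch of the wrapper added above;
error handling is elided, and DRM_SYNCOBJ_QUERY_FLAGS_LAST_SUBMITTED
comes from the drm.h hunk of this patch:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include "xf86drm.h"

/* Query both the last signaled and the last submitted point of a
 * timeline syncobj through drmSyncobjQuery2(). */
static void dump_timeline_state(int fd, uint32_t handle)
{
	uint64_t signaled = 0, submitted = 0;

	/* flags == 0 behaves like drmSyncobjQuery(): last signaled point. */
	drmSyncobjQuery2(fd, &handle, &signaled, 1, 0);
	/* The new flag asks for the last submitted (available) point. */
	drmSyncobjQuery2(fd, &handle, &submitted, 1,
			 DRM_SYNCOBJ_QUERY_FLAGS_LAST_SUBMITTED);

	printf("syncobj %u: signaled=%" PRIu64 ", submitted=%" PRIu64 "\n",
	       handle, signaled, submitted);
}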

Re: [PATCH libdrm] libdrm: wrap new flexible syncobj query interface v2

2019-07-25 Thread zhoucm1
Thanks guys. Since I have no write permission to libdrm master, I need your 
help to push the patch.


-David


On 2019-07-25 16:11, Chunming Zhou wrote:

v2: nit-picks fix

Signed-off-by: Chunming Zhou 
Cc: Lionel Landwerlin 
Cc: Christian König 
Reviewed-by: Christian König 
For the xf86drm.[ch] part : Reviewed-by: Lionel Landwerlin 

---
  amdgpu/amdgpu-symbol-check |  1 +
  amdgpu/amdgpu.h| 18 ++
  amdgpu/amdgpu_cs.c | 10 ++
  include/drm/drm.h  |  3 ++-
  xf86drm.c  | 15 +++
  xf86drm.h  |  2 ++
  6 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
index 274b4c6d..597a99b2 100755
--- a/amdgpu/amdgpu-symbol-check
+++ b/amdgpu/amdgpu-symbol-check
@@ -56,6 +56,7 @@ amdgpu_cs_syncobj_export_sync_file2
  amdgpu_cs_syncobj_import_sync_file
  amdgpu_cs_syncobj_import_sync_file2
  amdgpu_cs_syncobj_query
+amdgpu_cs_syncobj_query2
  amdgpu_cs_syncobj_reset
  amdgpu_cs_syncobj_signal
  amdgpu_cs_syncobj_timeline_signal
diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 9d9b0832..19f74bd6 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1591,6 +1591,24 @@ int amdgpu_cs_syncobj_timeline_wait(amdgpu_device_handle 
dev,
  int amdgpu_cs_syncobj_query(amdgpu_device_handle dev,
uint32_t *handles, uint64_t *points,
unsigned num_handles);
+/**
+ *  Query sync objects last signaled or submitted point.
+ *
+ * \param   dev- \c [in] self-explanatory
+ * \param   handles - \c [in] array of sync object handles
+ * \param   points - \c [out] array of sync points returned, which presents
+ * syncobj payload.
+ * \param   num_handles - \c [in] self-explanatory
+ * \param   flags   - \c [in] a bitmask of DRM_SYNCOBJ_QUERY_FLAGS_*
+ *
+ * \return   0 on success\n
+ *  -ETIME - Timeout
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_query2(amdgpu_device_handle dev,
+uint32_t *handles, uint64_t *points,
+unsigned num_handles, uint32_t flags);
  
  /**

   *  Export kernel sync object to shareable fd.
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 977fa3cf..01e2b2c8 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -721,6 +721,16 @@ drm_public int 
amdgpu_cs_syncobj_query(amdgpu_device_handle dev,
return drmSyncobjQuery(dev->fd, handles, points, num_handles);
  }
  
+drm_public int amdgpu_cs_syncobj_query2(amdgpu_device_handle dev,

+   uint32_t *handles, uint64_t *points,
+   unsigned num_handles, uint32_t flags)
+{
+   if (!dev)
+   return -EINVAL;
+
+   return drmSyncobjQuery2(dev->fd, handles, points, num_handles, flags);
+}
+
  drm_public int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
uint32_t handle,
int *shared_fd)
diff --git a/include/drm/drm.h b/include/drm/drm.h
index 532787bf..af37a80b 100644
--- a/include/drm/drm.h
+++ b/include/drm/drm.h
@@ -771,11 +771,12 @@ struct drm_syncobj_array {
__u32 pad;
  };
  
+#define DRM_SYNCOBJ_QUERY_FLAGS_LAST_SUBMITTED (1 << 0) /* last available point on timeline syncobj */

  struct drm_syncobj_timeline_array {
__u64 handles;
__u64 points;
__u32 count_handles;
-   __u32 pad;
+   __u32 flags;
  };
  
  
diff --git a/xf86drm.c b/xf86drm.c

index 953fc762..28a61264 100644
--- a/xf86drm.c
+++ b/xf86drm.c
@@ -4314,6 +4314,21 @@ drm_public int drmSyncobjQuery(int fd, uint32_t 
*handles, uint64_t *points,
  return 0;
  }
  
+drm_public int drmSyncobjQuery2(int fd, uint32_t *handles, uint64_t *points,

+   uint32_t handle_count, uint32_t flags)
+{
+struct drm_syncobj_timeline_array args;
+
+memclear(args);
+args.handles = (uintptr_t)handles;
+args.points = (uintptr_t)points;
+args.count_handles = handle_count;
+args.flags = flags;
+
+return drmIoctl(fd, DRM_IOCTL_SYNCOBJ_QUERY, &args);
+}
+
+
  drm_public int drmSyncobjTransfer(int fd,
  uint32_t dst_handle, uint64_t dst_point,
  uint32_t src_handle, uint64_t src_point,
diff --git a/xf86drm.h b/xf86drm.h
index 3fb1d1ca..55ceaed9 100644
--- a/xf86drm.h
+++ b/xf86drm.h
@@ -884,6 +884,8 @@ extern int drmSyncobjTimelineWait(int fd, uint32_t 
*handles, uint64_t *points,
  uint32_t *first_signaled);
  extern int drmSyncobjQuery(int fd, uint32_t *handles, uint64_t *points,
   uint32_t handle_count);
+extern int drmSyncobjQuery2(int fd, uint32_t *handles, uint64_t *points,
+   uint32_t handle_count, uint32_t flags);
  extern int drmSyncobjTransfer(int fd,
  uint32_t 

Re: [PATCH] drm/syncobj: extend syncobj query ability v2

2019-07-24 Thread zhoucm1



On 2019-07-24 03:20, Lionel Landwerlin wrote:

On 23/07/2019 17:21, Chunming Zhou wrote:

User space needs a flexible query ability so that the UMD can get the
last signaled or submitted point.
v2:
add sanitizer checking.

Change-Id: I6512b430524ebabe715e602a2bf5abb0a7e780ea
Signed-off-by: Chunming Zhou 
Cc: Lionel Landwerlin 
Cc: Christian König 


Reviewed-by: Lionel Landwerlin 


Thanks.
Which branch should this patch go to?
Is it OK to push to amd-staging-drm-next?
Or should it go to drm-misc?
If drm-misc, I need your help to push it since I have no write 
permission.


-David



---
  drivers/gpu/drm/drm_syncobj.c | 34 --
  include/uapi/drm/drm.h    |  3 ++-
  2 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index 3d400905100b..3fc8f66ada68 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1197,7 +1197,7 @@ drm_syncobj_timeline_signal_ioctl(struct 
drm_device *dev, void *data,

  if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
  return -EOPNOTSUPP;
  -    if (args->pad != 0)
+    if (args->flags != 0)
  return -EINVAL;
    if (args->count_handles == 0)
@@ -1268,7 +1268,7 @@ int drm_syncobj_query_ioctl(struct drm_device 
*dev, void *data,

  if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
  return -EOPNOTSUPP;
  -    if (args->pad != 0)
+    if (args->flags & ~DRM_SYNCOBJ_QUERY_FLAGS_LAST_SUBMITTED)
  return -EINVAL;
    if (args->count_handles == 0)
@@ -1291,23 +1291,29 @@ int drm_syncobj_query_ioctl(struct drm_device 
*dev, void *data,

  if (chain) {
  struct dma_fence *iter, *last_signaled = NULL;
  -    dma_fence_chain_for_each(iter, fence) {
-    if (!iter)
-    break;
-    dma_fence_put(last_signaled);
-    last_signaled = dma_fence_get(iter);
-    if (!to_dma_fence_chain(last_signaled)->prev_seqno)
-    /* It is most likely that timeline has
- * unorder points. */
-    break;
+    if (args->flags &
+    DRM_SYNCOBJ_QUERY_FLAGS_LAST_SUBMITTED) {
+    point = fence->seqno;
+    } else {
+    dma_fence_chain_for_each(iter, fence) {
+    if (!iter)
+    break;
+    dma_fence_put(last_signaled);
+    last_signaled = dma_fence_get(iter);
+    if (!to_dma_fence_chain(last_signaled)->prev_seqno)
+    /* It is most likely that timeline has
+    * unorder points. */
+    break;
+    }
+    point = dma_fence_is_signaled(last_signaled) ?
+    last_signaled->seqno :
+ to_dma_fence_chain(last_signaled)->prev_seqno;
  }
-    point = dma_fence_is_signaled(last_signaled) ?
-    last_signaled->seqno :
- to_dma_fence_chain(last_signaled)->prev_seqno;
  dma_fence_put(last_signaled);
  } else {
  point = 0;
  }
+    dma_fence_put(fence);
  ret = copy_to_user(&points[i], &point, sizeof(uint64_t));
  ret = ret ? -EFAULT : 0;
  if (ret)
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 661d73f9a919..fd987ce24d9f 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -777,11 +777,12 @@ struct drm_syncobj_array {
  __u32 pad;
  };
  +#define DRM_SYNCOBJ_QUERY_FLAGS_LAST_SUBMITTED (1 << 0) /* last 
available point on timeline syncobj */

  struct drm_syncobj_timeline_array {
  __u64 handles;
  __u64 points;
  __u32 count_handles;
-    __u32 pad;
+    __u32 flags;
  };






Re: [PATCH] drm/syncobj: return meaningful value to user space

2019-07-22 Thread zhoucm1



On 2019-07-22 16:46, Lionel Landwerlin wrote:

On 18/07/2019 14:13, Chunming Zhou wrote:
If WAIT_FOR_SUBMIT isn't set and there is meanwhile no underlying fence 
on the syncobj, then return a non-blocking error code to user space.

Signed-off-by: Chunming Zhou 
---
  drivers/gpu/drm/drm_syncobj.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index 361a01a08c18..929f7c64f9a2 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -252,7 +252,7 @@ int drm_syncobj_find_fence(struct drm_file 
*file_private,

  return 0;
  dma_fence_put(*fence);
  } else {
-    ret = -EINVAL;
+    ret = -ENOTBLK;



This will only return the new error when there is no chain fence in 
the syncobj?

If you all agree, that's best.
I've checked the original EINVAL; there are 3 situations which would return 
EINVAL:

a. invalid flags
b. invalid count_handles
c. failure to find a fence in the syncobj.

If the user makes sure the parameters are sanitized, then EINVAL can be 
used to identify "lack of fence in syncobj", which is waitBeforeSignal. 
I use it that way in my current implementation.
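
To make that concrete, a sketch of how a UMD could use the proposed
error code; this assumes the patch is applied (today the wait ioctls
return EINVAL here) and relies on drmSyncobjWait() returning -errno:

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include "xf86drm.h"

/* Probe a binary syncobj without blocking. With the proposed change,
 * -ENOTBLK would specifically mean "no fence attached yet", i.e. the
 * application issued the wait before the corresponding signal. */
static bool is_wait_before_signal(int fd, uint32_t handle)
{
	uint32_t first;
	int ret = drmSyncobjWait(fd, &handle, 1, 0 /* timeout_nsec */,
				 0 /* no WAIT_FOR_SUBMIT */, &first);

	return ret == -ENOTBLK;
}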




Don't you want the new error code after dma_fence_chain_find_seqno() too?
No, I don't want that; I just want a meaningful and unique error 
code for the UMD.





Which make me realize there is probably a bug with this code :


ret = dma_fence_chain_find_seqno(fence, point);
if (!ret)
    return 0;
dma_fence_put(*fence);


Sounds like the condition should be

if (ret)

        return ret;


I realize we have introduced a blocking behavior on the transfer ioctl.

If we're going to change this to return EWOULDBLOCK, we might want to 
get rid of it.

Sounds right, but I think the current implementation is acceptable as well.

-David



-Lionel



  }
    if (!(flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT))
@@ -832,7 +832,7 @@ static signed long 
drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,

  if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) {
  continue;
  } else {
-    timeout = -EINVAL;
+    timeout = -ENOTBLK;
  goto cleanup_entries;
  }
  }






Re: [PATCH] drm/syncobj: return meaningful value to user space

2019-07-19 Thread zhoucm1



On 2019-07-19 16:13, Lionel Landwerlin wrote:

On 18/07/2019 17:33, Chunming Zhou wrote:

On 2019/7/18 22:08, Lionel Landwerlin wrote:

On 18/07/2019 16:02, Chunming Zhou wrote:

On 2019/7/18 19:31, Lionel Landwerlin wrote:

On 18/07/2019 14:13, Chunming Zhou wrote:
If WAIT_FOR_SUBMIT isn't set and there is meanwhile no underlying fence
on the syncobj, then return a non-blocking error code to user space.

Signed-off-by: Chunming Zhou 
---
    drivers/gpu/drm/drm_syncobj.c | 4 ++--
    1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c
b/drivers/gpu/drm/drm_syncobj.c
index 361a01a08c18..929f7c64f9a2 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -252,7 +252,7 @@ int drm_syncobj_find_fence(struct drm_file
*file_private,
    return 0;
    dma_fence_put(*fence);
    } else {
-    ret = -EINVAL;
+    ret = -ENOTBLK;
    }
      if (!(flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT))
@@ -832,7 +832,7 @@ static signed long
drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
    if (flags & 
DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) {

    continue;
    } else {
-    timeout = -EINVAL;
+    timeout = -ENOTBLK;
    goto cleanup_entries;
    }
    }

This would break existing tests for binary syncobjs.

How does this break binary syncobjs?


This is used in the submission path of several drivers.

Changing the error code will change what the drivers are reporting to
userspace and could break tests.


i915 doesn't use that function so it's not affected but
lima/panfrost/vc4 seem to be.


Can anyone from vc4 confirm this? There are many places in wait_ioctl
that could previously return EINVAL; I'm not sure what userspace expects
from them.







Is this really what we want?
I want to use this meaningful return value to judge if 
WaitBeforeSignal happens.

I think this is the cheapest change for that.


I thought the plan was to add a new ioctl to query the last submitted
value.

Yes, that is an optional way too. I just thought changing the return
value would be very cheap, wouldn't it?


-David



I could be misguided but I thought the kernel policy was to never 
break userspace.

But no one has pointed out exactly how this breaks userspace, have they?

-David


I'm not sure where this sits :/


-Lionel





Did I misunderstand?


Thanks,


-Lionel



-David



-Lionel







Re: [PATCH 1/2] update drm.h

2019-06-06 Thread zhoucm1

https://gitlab.freedesktop.org/mesa/drm, where is the merge request button?

-David


On 2019年06月06日 18:20, Michel Dänzer wrote:

On 2019-05-24 7:15 a.m., zhoucm1 wrote:

Can anyone pick this up into gitlab for libdrm?

Can you create a merge request?





Re: [PATCH 1/2] update drm.h

2019-05-23 Thread zhoucm1

Can anyone pick this up into gitlab for libdrm?


Thanks,

-David


On 2019-05-22 18:46, Koenig, Christian wrote:

On 22.05.19 at 11:07, Chunming Zhou wrote:

 a) delta: only DRM_CAP_SYNCOBJ_TIMELINE
 b) Generated using make headers_install.
 c) Generated from origin/drm-misc-next commit 
982c0500fd1a8012c31d3c9dd8de285129904656

Signed-off-by: Chunming Zhou 
Suggested-by: Michel Dänzer 

Reviewed-by: Christian König 


---
   include/drm/drm.h | 2 ++
   1 file changed, 2 insertions(+)

diff --git a/include/drm/drm.h b/include/drm/drm.h
index c893f3b4..438abde3 100644
--- a/include/drm/drm.h
+++ b/include/drm/drm.h
@@ -44,6 +44,7 @@ typedef unsigned int drm_handle_t;
   
   #else /* One of the BSDs */
   
+#include <stdint.h>

   #include <sys/ioccom.h>
   #include <sys/types.h>
   typedef int8_t   __s8;
@@ -643,6 +644,7 @@ struct drm_gem_open {
   #define DRM_CAP_PAGE_FLIP_TARGET 0x11
   #define DRM_CAP_CRTC_IN_VBLANK_EVENT 0x12
   #define DRM_CAP_SYNCOBJ  0x13
+#define DRM_CAP_SYNCOBJ_TIMELINE   0x14
   
   /** DRM_IOCTL_GET_CAP ioctl argument type */

   struct drm_get_cap {



Re: [PATCH 06/10] drm/ttm: fix busy memory to fail other user v10

2019-05-23 Thread zhoucm1



On 2019-05-22 20:59, Christian König wrote:


BOs on the LRU might be blocked during command submission
and cause OOM situations.

Avoid this by blocking for the first busy BO not locked by
the same ticket as the BO we are searching space for.

v10: completely start over with the patch since we didn't
  handled a whole bunch of corner cases.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/ttm/ttm_bo.c | 77 ++--
  1 file changed, 66 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 4c6389d849ed..861facac33d4 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -771,32 +771,72 @@ EXPORT_SYMBOL(ttm_bo_eviction_valuable);
   * b. Otherwise, trylock it.
   */
  static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
-   struct ttm_operation_ctx *ctx, bool *locked)
+   struct ttm_operation_ctx *ctx, bool *locked, bool *busy)
  {
 bool ret = false;

-   *locked = false;
 if (bo->resv == ctx->resv) {
 reservation_object_assert_held(bo->resv);
 if (ctx->flags & TTM_OPT_FLAG_ALLOW_RES_EVICT
  || !list_empty(&bo->ddestroy))
 ret = true;
+   *locked = false;
+   if (busy)
+   *busy = false;
 } else {
-   *locked = reservation_object_trylock(bo->resv);
-   ret = *locked;
+   ret = reservation_object_trylock(bo->resv);
+   *locked = ret;
+   if (busy)
+   *busy = !ret;
 }

 return ret;
  }

+/**
+ * ttm_mem_evict_wait_busy - wait for a busy BO to become available
+ *
+ * @busy_bo: BO which couldn't be locked with trylock
+ * @ctx: operation context
+ * @ticket: acquire ticket
+ *
+ * Try to lock a busy buffer object to avoid failing eviction.
+ */
+static int ttm_mem_evict_wait_busy(struct ttm_buffer_object *busy_bo,
+  struct ttm_operation_ctx *ctx,
+  struct ww_acquire_ctx *ticket)
+{
+   int r;
+
+   if (!busy_bo || !ticket)
+   return -EBUSY;
+
+   if (ctx->interruptible)
+   r = reservation_object_lock_interruptible(busy_bo->resv,
+ ticket);
+   else
+   r = reservation_object_lock(busy_bo->resv, ticket);
+
+   /*
+* TODO: It would be better to keep the BO locked until allocation is at
+* least tried one more time, but that would mean a much larger rework
+* of TTM.
+*/
+   if (!r)
+   reservation_object_unlock(busy_bo->resv);
+
+   return r == -EDEADLK ? -EAGAIN : r;
+}
+
  static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
uint32_t mem_type,
const struct ttm_place *place,
-  struct ttm_operation_ctx *ctx)
+  struct ttm_operation_ctx *ctx,
+  struct ww_acquire_ctx *ticket)
  {
+   struct ttm_buffer_object *bo = NULL, *busy_bo = NULL;
 struct ttm_bo_global *glob = bdev->glob;
  struct ttm_mem_type_manager *man = &bdev->man[mem_type];
-   struct ttm_buffer_object *bo = NULL;
 bool locked = false;
 unsigned i;
 int ret;
@@ -804,8 +844,15 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
  spin_lock(&glob->lru_lock);
 for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
  list_for_each_entry(bo, &man->lru[i], lru) {
-   if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked))
+   bool busy;
+
+   if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
+   &busy)) {
+   if (busy && !busy_bo &&
+   bo->resv->lock.ctx != ticket)
+   busy_bo = bo;
 continue;
+   }

 if (place && !bdev->driver->eviction_valuable(bo,
   place)) {
@@ -824,8 +871,13 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
 }

 if (!bo) {
+   if (busy_bo)
+   ttm_bo_get(busy_bo);
  spin_unlock(&glob->lru_lock);
-   return -EBUSY;
+   ret = ttm_mem_evict_wait_busy(busy_bo, ctx, ticket);
If you rely on EAGAIN, why do you still try to lock busy_bo? Is there
any negative effect if we directly return EAGAIN without trying the lock?


-David

+   if (busy_bo)
+   ttm_bo_put(busy_bo);
+   return ret;
 }

 

Re: [PATCH 01/10] drm/ttm: Make LRU removal optional.

2019-05-23 Thread zhoucm1



On 2019-05-22 20:59, Christian König wrote:


We are already doing this for DMA-buf imports and also for
amdgpu VM BOs for quite a while now.

If this doesn't run into any problems we are probably going
to stop removing BOs from the LRU altogether.

Signed-off-by: Christian König 
---
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  9 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c|  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c   |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  4 ++--
  drivers/gpu/drm/qxl/qxl_release.c |  2 +-
  drivers/gpu/drm/radeon/radeon_gem.c   |  2 +-
  drivers/gpu/drm/radeon/radeon_object.c|  2 +-
  drivers/gpu/drm/ttm/ttm_execbuf_util.c| 20 +++
  drivers/gpu/drm/virtio/virtgpu_ioctl.c|  2 +-
  drivers/gpu/drm/vmwgfx/vmwgfx_resource.c  |  3 ++-
  drivers/gpu/drm/vmwgfx/vmwgfx_validation.h|  2 +-
  include/drm/ttm/ttm_bo_driver.h   |  5 -
  include/drm/ttm/ttm_execbuf_util.h|  3 ++-
  13 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index e1cae4a37113..647e18f9e136 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -574,7 +574,7 @@ static int reserve_bo_and_vm(struct kgd_mem *mem,
  amdgpu_vm_get_pd_bo(vm, &ctx->list, &ctx->vm_pd[0]);

  ret = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->list,
-false, &ctx->duplicates);
+false, &ctx->duplicates, true);
 if (!ret)
 ctx->reserved = true;
 else {
@@ -647,7 +647,7 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
 }

  ret = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->list,
-false, &ctx->duplicates);
+false, &ctx->duplicates, true);
 if (!ret)
 ctx->reserved = true;
 else
@@ -1800,7 +1800,8 @@ static int validate_invalid_user_pages(struct 
amdkfd_process_info *process_info)
 }

 /* Reserve all BOs and page tables for validation */
-   ret = ttm_eu_reserve_buffers(&ticket, &resv_list, false, &duplicates);
+   ret = ttm_eu_reserve_buffers(&ticket, &resv_list, false, &duplicates,
+true);
  WARN(!list_empty(&duplicates), "Duplicates should be empty");
 if (ret)
 goto out_free;
@@ -2006,7 +2007,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, 
struct dma_fence **ef)
 }

  ret = ttm_eu_reserve_buffers(&ctx.ticket, &ctx.list,
-false, &duplicate_save);
+false, &duplicate_save, true);
 if (ret) {
 pr_debug("Memory eviction: TTM Reserve Failed. Try again\n");
 goto ttm_reserve_fail;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d72cc583ebd1..fff558cf385b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 }

  r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
-  &duplicates);
+  &duplicates, true);
 if (unlikely(r != 0)) {
 if (r != -ERESTARTSYS)
 DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 54dd02a898b9..06f83cac0d3a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -79,7 +79,7 @@ int amdgpu_map_static_csa(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
  list_add(&csa_tv.head, &list);
  amdgpu_vm_get_pd_bo(vm, &list, &pd);

-   r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL);
+   r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL, true);
 if (r) {
 DRM_ERROR("failed to reserve CSA,PD BOs: err=%d\n", r);
 return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 7b840367004c..d513a5ad03dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -171,7 +171,7 @@ void amdgpu_gem_object_close(struct drm_gem_object *obj,

  amdgpu_vm_get_pd_bo(vm, &list, &vm_pd);

-   r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates);
+   r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates, true);
 if (r) {
 dev_err(adev->dev, "leaking bo va because "
 "we fail to reserve bo (%d)\n", r);
@@ -608,7 +608,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,

  amdgpu_vm_get_pd_bo(&fpriv->vm, &list, &vm_pd);

-   r = ttm_eu_reserve_buffers(&ticket, &list, true, &duplicates);
+   r = ttm_eu_reserve_buffers(&ticket, &list, true, &duplicates, true);
   

Re: drm-sync timeline signaling

2019-05-21 Thread zhoucm1

Sorry for the late response.

Although we don't expect that, drm_syncobj_timeline_signal_ioctl already 
handles this case, I think. It can handle both (point value > 0) and 
(point value = 0).



-David
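
A sketch of what that looks like through the libdrm wrapper, assuming
drmSyncobjTimelineSignal() maps straight onto
drm_syncobj_timeline_signal_ioctl (error handling elided):

#include <stdint.h>
#include "xf86drm.h"

/* Signal a binary and a timeline syncobj in one call: a point value of
 * 0 treats the entry like a binary syncobj, a value > 0 inserts that
 * point into the timeline. */
static int signal_mixed(int fd, uint32_t binary_handle,
			uint32_t timeline_handle, uint64_t point)
{
	uint32_t handles[2] = { binary_handle, timeline_handle };
	uint64_t points[2] = { 0, point };

	return drmSyncobjTimelineSignal(fd, handles, points, 2);
}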


On 2019-05-21 16:44, Lionel Landwerlin wrote:


Ping?

On 16/05/2019 15:49, Lionel Landwerlin wrote:

Hi all,

While picking up the IGT tests for timeline syncobj,
I noticed that although we deal with multi wait across both timeline
(with point value > 0) and binary (point value = 0) syncobjs,
we don't seem to have a similar behavior with signaling.

Do you have any thought on this?
I'm considering writing some docs but I'm not quite sure whether this
difference between signaling/waiting was intentional or just overlooked.

Thanks,

-Lionel

Re: [PATCH libdrm] enable syncobj test depending on capability

2019-05-17 Thread zhoucm1
Ping. Could you help check the patch in to gitlab? My connection to 
gitlab still has problems.



Thanks,

-David


On 2019-05-16 19:03, Zhou, David(ChunMing) wrote:

could you help push this patch as well?

Thanks,
-David

 Original Message 
Subject: Re: [PATCH libdrm] enable syncobj test depending on capability
From: "Koenig, Christian"
To: "Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org
CC:

On 16.05.19 at 12:46, Chunming Zhou wrote:
> Feature is controlled by DRM_CAP_SYNCOBJ_TIMELINE drm capability.
>
> Signed-off-by: Chunming Zhou 

Reviewed-by: Christian König 

> ---
>   include/drm/drm.h    | 1 +
>   tests/amdgpu/syncobj_tests.c | 8 
>   2 files changed, 9 insertions(+)
>
> diff --git a/include/drm/drm.h b/include/drm/drm.h
> index c893f3b4..532787bf 100644
> --- a/include/drm/drm.h
> +++ b/include/drm/drm.h
> @@ -643,6 +643,7 @@ struct drm_gem_open {
>   #define DRM_CAP_PAGE_FLIP_TARGET    0x11
>   #define DRM_CAP_CRTC_IN_VBLANK_EVENT    0x12
>   #define DRM_CAP_SYNCOBJ 0x13
> +#define DRM_CAP_SYNCOBJ_TIMELINE 0x14
>
>   /** DRM_IOCTL_GET_CAP ioctl argument type */
>   struct drm_get_cap {
> diff --git a/tests/amdgpu/syncobj_tests.c b/tests/amdgpu/syncobj_tests.c
> index a0c627d7..869ed88e 100644
> --- a/tests/amdgpu/syncobj_tests.c
> +++ b/tests/amdgpu/syncobj_tests.c
> @@ -22,6 +22,7 @@
>   */
>
>   #include "CUnit/Basic.h"
> +#include "xf86drm.h"
>
>   #include "amdgpu_test.h"
>   #include "amdgpu_drm.h"
> @@ -36,6 +37,13 @@ static void amdgpu_syncobj_timeline_test(void);
>
>   CU_BOOL suite_syncobj_timeline_tests_enable(void)
>   {
> + int r;
> + uint64_t cap = 0;
> +
> + r = drmGetCap(drm_amdgpu[0], DRM_CAP_SYNCOBJ_TIMELINE, &cap);
> + if (r || cap == 0)
> + return CU_FALSE;
> +
>    return CU_TRUE;
>   }
>




Re: [PATCH libdrm 7/7] add syncobj timeline tests v3

2019-05-16 Thread zhoucm1



On 2019-05-16 18:09, Christian König wrote:


On 16.05.19 at 10:16, zhoucm1 wrote:

I was able to push changes to libdrm, but now it seems that after libdrm
migrated to gitlab, I can't anymore. What steps do I need to get my
permission back? I can already log into gitlab with my old freedesktop
account.


@Christian, Can you help submit this patch set to libdrm first?


Done. And I think you can now request write permission to a repository
through the web-interface and all the "owners" of the project can grant
that to you.

Is there any guide for that? I failed to find where to request permission.

-David


Christian.




Thanks,

-David


On 2019-05-16 16:07, Chunming Zhou wrote:

v2: drop DRM_SYNCOBJ_CREATE_TYPE_TIMELINE, fix timeout calculation,
 fix some warnings
v3: add export/import and cpu signal testing cases

Signed-off-by: Chunming Zhou 
Acked-by: Christian König 
Acked-by: Lionel Landwerlin 
---
  tests/amdgpu/Makefile.am |   3 +-
  tests/amdgpu/amdgpu_test.c   |  11 ++
  tests/amdgpu/amdgpu_test.h   |  21 +++
  tests/amdgpu/meson.build |   2 +-
  tests/amdgpu/syncobj_tests.c | 290 
+++

  5 files changed, 325 insertions(+), 2 deletions(-)
  create mode 100644 tests/amdgpu/syncobj_tests.c

diff --git a/tests/amdgpu/Makefile.am b/tests/amdgpu/Makefile.am
index 48278848..920882d0 100644
--- a/tests/amdgpu/Makefile.am
+++ b/tests/amdgpu/Makefile.am
@@ -34,4 +34,5 @@ amdgpu_test_SOURCES = \
  uve_ib.h \
  deadlock_tests.c \
  vm_tests.c    \
-    ras_tests.c
+    ras_tests.c \
+    syncobj_tests.c
diff --git a/tests/amdgpu/amdgpu_test.c b/tests/amdgpu/amdgpu_test.c
index 35c8bf6c..73403fb4 100644
--- a/tests/amdgpu/amdgpu_test.c
+++ b/tests/amdgpu/amdgpu_test.c
@@ -57,6 +57,7 @@
  #define DEADLOCK_TESTS_STR "Deadlock Tests"
  #define VM_TESTS_STR "VM Tests"
  #define RAS_TESTS_STR "RAS Tests"
+#define SYNCOBJ_TIMELINE_TESTS_STR "SYNCOBJ TIMELINE Tests"
    /**
   *  Open handles for amdgpu devices
@@ -123,6 +124,12 @@ static CU_SuiteInfo suites[] = {
  .pCleanupFunc = suite_ras_tests_clean,
  .pTests = ras_tests,
  },
+    {
+    .pName = SYNCOBJ_TIMELINE_TESTS_STR,
+    .pInitFunc = suite_syncobj_timeline_tests_init,
+    .pCleanupFunc = suite_syncobj_timeline_tests_clean,
+    .pTests = syncobj_timeline_tests,
+    },
    CU_SUITE_INFO_NULL,
  };
@@ -176,6 +183,10 @@ static Suites_Active_Status suites_active_stat[]
= {
  .pName = RAS_TESTS_STR,
  .pActive = suite_ras_tests_enable,
  },
+    {
+    .pName = SYNCOBJ_TIMELINE_TESTS_STR,
+    .pActive = suite_syncobj_timeline_tests_enable,
+    },
  };
    diff --git a/tests/amdgpu/amdgpu_test.h 
b/tests/amdgpu/amdgpu_test.h

index bcd0bc7e..36675ea3 100644
--- a/tests/amdgpu/amdgpu_test.h
+++ b/tests/amdgpu/amdgpu_test.h
@@ -216,6 +216,27 @@ CU_BOOL suite_ras_tests_enable(void);
  extern CU_TestInfo ras_tests[];
    +/**
+ * Initialize syncobj timeline test suite
+ */
+int suite_syncobj_timeline_tests_init();
+
+/**
+ * Deinitialize syncobj timeline test suite
+ */
+int suite_syncobj_timeline_tests_clean();
+
+/**
+ * Decide if the suite is enabled by default or not.
+ */
+CU_BOOL suite_syncobj_timeline_tests_enable(void);
+
+/**
+ * Tests in syncobj timeline test suite
+ */
+extern CU_TestInfo syncobj_timeline_tests[];
+
+
  /**
   * Helper functions
   */
diff --git a/tests/amdgpu/meson.build b/tests/amdgpu/meson.build
index 95ed9305..1726cb43 100644
--- a/tests/amdgpu/meson.build
+++ b/tests/amdgpu/meson.build
@@ -24,7 +24,7 @@ if dep_cunit.found()
  files(
    'amdgpu_test.c', 'basic_tests.c', 'bo_tests.c', 'cs_tests.c',
    'vce_tests.c', 'uvd_enc_tests.c', 'vcn_tests.c',
'deadlock_tests.c',
-  'vm_tests.c', 'ras_tests.c',
+  'vm_tests.c', 'ras_tests.c', 'syncobj_tests.c',
  ),
  dependencies : [dep_cunit, dep_threads],
  include_directories : [inc_root, inc_drm,
include_directories('../../amdgpu')],
diff --git a/tests/amdgpu/syncobj_tests.c 
b/tests/amdgpu/syncobj_tests.c

new file mode 100644
index ..a0c627d7
--- /dev/null
+++ b/tests/amdgpu/syncobj_tests.c
@@ -0,0 +1,290 @@
+/*
+ * Copyright 2017 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person
obtaining a
+ * copy of this software and associated documentation files (the
"Software"),
+ * to deal in the Software without restriction, including without
limitation
+ * the rights to use, copy, modify, merge, publish, distribute,
sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom
the
+ * Software is furnished to do so, subject to the following 
conditions:

+ *
+ * The above copyright notice and this permission notice shall be
included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT W

Re: [PATCH libdrm 7/7] add syncobj timeline tests v3

2019-05-16 Thread zhoucm1
I was able to push changes to libdrm, but now it seems that after libdrm 
migrated to gitlab, I can't anymore. What steps do I need to get my 
permission back? I can already log into gitlab with my old freedesktop account.


@Christian, Can you help submit this patch set to libdrm first?


Thanks,

-David


On 2019-05-16 16:07, Chunming Zhou wrote:

v2: drop DRM_SYNCOBJ_CREATE_TYPE_TIMELINE, fix timeout calculation,
 fix some warnings
v3: add export/import and cpu signal testing cases

Signed-off-by: Chunming Zhou 
Acked-by: Christian König 
Acked-by: Lionel Landwerlin 
---
  tests/amdgpu/Makefile.am |   3 +-
  tests/amdgpu/amdgpu_test.c   |  11 ++
  tests/amdgpu/amdgpu_test.h   |  21 +++
  tests/amdgpu/meson.build |   2 +-
  tests/amdgpu/syncobj_tests.c | 290 +++
  5 files changed, 325 insertions(+), 2 deletions(-)
  create mode 100644 tests/amdgpu/syncobj_tests.c

diff --git a/tests/amdgpu/Makefile.am b/tests/amdgpu/Makefile.am
index 48278848..920882d0 100644
--- a/tests/amdgpu/Makefile.am
+++ b/tests/amdgpu/Makefile.am
@@ -34,4 +34,5 @@ amdgpu_test_SOURCES = \
uve_ib.h \
deadlock_tests.c \
vm_tests.c  \
-   ras_tests.c
+   ras_tests.c \
+   syncobj_tests.c
diff --git a/tests/amdgpu/amdgpu_test.c b/tests/amdgpu/amdgpu_test.c
index 35c8bf6c..73403fb4 100644
--- a/tests/amdgpu/amdgpu_test.c
+++ b/tests/amdgpu/amdgpu_test.c
@@ -57,6 +57,7 @@
  #define DEADLOCK_TESTS_STR "Deadlock Tests"
  #define VM_TESTS_STR "VM Tests"
  #define RAS_TESTS_STR "RAS Tests"
+#define SYNCOBJ_TIMELINE_TESTS_STR "SYNCOBJ TIMELINE Tests"
  
  /**

   *  Open handles for amdgpu devices
@@ -123,6 +124,12 @@ static CU_SuiteInfo suites[] = {
.pCleanupFunc = suite_ras_tests_clean,
.pTests = ras_tests,
},
+   {
+   .pName = SYNCOBJ_TIMELINE_TESTS_STR,
+   .pInitFunc = suite_syncobj_timeline_tests_init,
+   .pCleanupFunc = suite_syncobj_timeline_tests_clean,
+   .pTests = syncobj_timeline_tests,
+   },
  
  	CU_SUITE_INFO_NULL,

  };
@@ -176,6 +183,10 @@ static Suites_Active_Status suites_active_stat[] = {
.pName = RAS_TESTS_STR,
.pActive = suite_ras_tests_enable,
},
+   {
+   .pName = SYNCOBJ_TIMELINE_TESTS_STR,
+   .pActive = suite_syncobj_timeline_tests_enable,
+   },
  };
  
  
diff --git a/tests/amdgpu/amdgpu_test.h b/tests/amdgpu/amdgpu_test.h

index bcd0bc7e..36675ea3 100644
--- a/tests/amdgpu/amdgpu_test.h
+++ b/tests/amdgpu/amdgpu_test.h
@@ -216,6 +216,27 @@ CU_BOOL suite_ras_tests_enable(void);
  extern CU_TestInfo ras_tests[];
  
  
+/**

+ * Initialize syncobj timeline test suite
+ */
+int suite_syncobj_timeline_tests_init();
+
+/**
+ * Deinitialize syncobj timeline test suite
+ */
+int suite_syncobj_timeline_tests_clean();
+
+/**
+ * Decide if the suite is enabled by default or not.
+ */
+CU_BOOL suite_syncobj_timeline_tests_enable(void);
+
+/**
+ * Tests in syncobj timeline test suite
+ */
+extern CU_TestInfo syncobj_timeline_tests[];
+
+
  /**
   * Helper functions
   */
diff --git a/tests/amdgpu/meson.build b/tests/amdgpu/meson.build
index 95ed9305..1726cb43 100644
--- a/tests/amdgpu/meson.build
+++ b/tests/amdgpu/meson.build
@@ -24,7 +24,7 @@ if dep_cunit.found()
  files(
'amdgpu_test.c', 'basic_tests.c', 'bo_tests.c', 'cs_tests.c',
'vce_tests.c', 'uvd_enc_tests.c', 'vcn_tests.c', 'deadlock_tests.c',
-  'vm_tests.c', 'ras_tests.c',
+  'vm_tests.c', 'ras_tests.c', 'syncobj_tests.c',
  ),
  dependencies : [dep_cunit, dep_threads],
  include_directories : [inc_root, inc_drm, 
include_directories('../../amdgpu')],
diff --git a/tests/amdgpu/syncobj_tests.c b/tests/amdgpu/syncobj_tests.c
new file mode 100644
index ..a0c627d7
--- /dev/null
+++ b/tests/amdgpu/syncobj_tests.c
@@ -0,0 +1,290 @@
+/*
+ * Copyright 2017 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF 

Re: [PATCH libdrm 1/7] addr cs chunk for syncobj timeline

2019-05-14 Thread zhoucm1

Thank you, Lionel.

-David


On 2019-05-14 17:49, Lionel Landwerlin wrote:


With the small nits, patches 2 & 4 are : Reviewed-by: Lionel Landwerlin

The other patches are a bit amdgpu specific so maybe you might want
someone more familiar with amdgpu to review them.
Still I didn't see anything wrong with them so remaining patches are :
Acked-by: Lionel Landwerlin 

I'll send the IGT stuff shortly.

Thanks,

-Lionel



Re: [PATCH libdrm 1/7] addr cs chunk for syncobj timeline

2019-05-13 Thread zhoucm1

ping... for patch set.


On 2019-05-13 17:52, Chunming Zhou wrote:


Signed-off-by: Chunming Zhou 
---
  include/drm/amdgpu_drm.h | 9 +
  1 file changed, 9 insertions(+)

diff --git a/include/drm/amdgpu_drm.h b/include/drm/amdgpu_drm.h
index d0701ffc..3d0318e6 100644
--- a/include/drm/amdgpu_drm.h
+++ b/include/drm/amdgpu_drm.h
@@ -528,6 +528,8 @@ struct drm_amdgpu_gem_va {
  #define AMDGPU_CHUNK_ID_SYNCOBJ_OUT 0x05
  #define AMDGPU_CHUNK_ID_BO_HANDLES  0x06
  #define AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES 0x07
+#define AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT    0x08
+#define AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL  0x09

  struct drm_amdgpu_cs_chunk {
 __u32   chunk_id;
@@ -608,6 +610,13 @@ struct drm_amdgpu_cs_chunk_sem {
 __u32 handle;
  };

+struct drm_amdgpu_cs_chunk_syncobj {
+   __u32 handle;
+   __u32 flags;
+   __u64 point;
+};
+
+
  #define AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ 0
  #define AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ_FD  1
  #define AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD2
--
2.17.1
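For readers less familiar with the amdgpu CS ABI, here is a minimal userspace
sketch of how a submission might fill this chunk to signal a timeline point.
The helper name and the surrounding submission plumbing are hypothetical; only
the struct layout and the AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL ID come from
the patch above:

#include <stdint.h>
#include <string.h>
#include "amdgpu_drm.h"

/* Hypothetical helper: describe one timeline-signal chunk for a CS ioctl.
 * The chunk data must stay alive until the ioctl returns. */
static void fill_timeline_signal_chunk(struct drm_amdgpu_cs_chunk *chunk,
				       struct drm_amdgpu_cs_chunk_syncobj *data,
				       uint32_t syncobj_handle, uint64_t point)
{
	memset(data, 0, sizeof(*data));
	data->handle = syncobj_handle;	/* drm_syncobj to signal */
	data->flags = 0;
	data->point = point;		/* timeline value to attach */

	chunk->chunk_id = AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL;
	chunk->length_dw = sizeof(*data) / 4;	/* 16 bytes -> 4 dwords */
	chunk->chunk_data = (uintptr_t)data;
}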


Re: [PATCH 1/2] drm/ttm: fix busy memory to fail other user v6

2019-05-07 Thread zhoucm1



On 2019-05-07 19:13, Koenig, Christian wrote:

On 2019-05-07 13:08, zhoucm1 wrote:


On 2019年05月07日 18:53, Koenig, Christian wrote:

On 2019-05-07 11:36, Chunming Zhou wrote:

A heavy GPU job can occupy memory for a long time, which leads other users
to fail to get memory.

This basically picks up Christian's idea:

1. Reserve the BO in DC using a ww_mutex ticket (trivial).
2. If we then run into this EBUSY condition in TTM check if the BO
we need memory for (or rather the ww_mutex of its reservation
object) has a ticket assigned.
3. If we have a ticket we grab a reference to the first BO on the
LRU, drop the LRU lock and try to grab the reservation lock with the
ticket.
4. If getting the reservation lock with the ticket succeeded we
check if the BO is still the first one on the LRU in question (the
BO could have moved).
5. If the BO is still the first one on the LRU in question we try to
evict it as we would evict any other BO.
6. If any of the "If's" above fail we just back off and return -EBUSY.
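A condensed pseudo-C sketch of steps 2-6 above, with simplified locking and
placeholder helpers standing in for the real TTM internals (illustrative
only, this is not the patch itself):

/* Placeholders, not real TTM API: */
struct ttm_buffer_object *first_bo_on_lru(struct ttm_mem_type_manager *man);
int evict_locked_bo(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx);

static int evict_first_busy_bo(struct ttm_bo_global *glob,
			       struct ttm_mem_type_manager *man,
			       struct ttm_operation_ctx *ctx,
			       struct ww_acquire_ctx *ticket)
{
	struct ttm_buffer_object *first_bo;
	int ret;

	if (!ticket)					/* step 2: no ticket */
		return -EBUSY;

	spin_lock(&glob->lru_lock);
	first_bo = first_bo_on_lru(man);		/* step 3: grab a ref, */
	if (!first_bo) {				/* drop the LRU lock   */
		spin_unlock(&glob->lru_lock);
		return -EBUSY;
	}
	ttm_bo_get(first_bo);
	spin_unlock(&glob->lru_lock);

	ret = ww_mutex_lock(&first_bo->resv->lock, ticket);
	if (ret)					/* step 6: back off */
		goto out_put;

	spin_lock(&glob->lru_lock);
	if (first_bo != first_bo_on_lru(man)) {		/* step 4: BO moved? */
		spin_unlock(&glob->lru_lock);
		ww_mutex_unlock(&first_bo->resv->lock);
		ret = -EBUSY;
		goto out_put;
	}
	spin_unlock(&glob->lru_lock);

	ret = evict_locked_bo(first_bo, ctx);		/* step 5: evict */
	ww_mutex_unlock(&first_bo->resv->lock);
out_put:
	ttm_bo_put(first_bo);
	return ret;
}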

v2: fix some minor check
v3: address Christian v2 comments.
v4: fix some missing
v5: handle first_bo unlock and bo_get/put
v6: abstract a unified iterate function, and handle all possible
use cases, not only pinned BOs.

Change-Id: I21423fb922f885465f13833c41df1e134364a8e7
Signed-off-by: Chunming Zhou 
---
    drivers/gpu/drm/ttm/ttm_bo.c | 113 ++-
    1 file changed, 97 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
b/drivers/gpu/drm/ttm/ttm_bo.c
index 8502b3ed2d88..bbf1d14d00a7 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -766,11 +766,13 @@ EXPORT_SYMBOL(ttm_bo_eviction_valuable);
     * b. Otherwise, trylock it.
     */
    static bool ttm_bo_evict_swapout_allowable(struct
ttm_buffer_object *bo,
-    struct ttm_operation_ctx *ctx, bool *locked)
+    struct ttm_operation_ctx *ctx, bool *locked, bool *busy)
    {
    bool ret = false;
       *locked = false;
+    if (busy)
+    *busy = false;
    if (bo->resv == ctx->resv) {
    reservation_object_assert_held(bo->resv);
    if (ctx->flags & TTM_OPT_FLAG_ALLOW_RES_EVICT
@@ -779,35 +781,45 @@ static bool
ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
    } else {
    *locked = reservation_object_trylock(bo->resv);
    ret = *locked;
+    if (!ret && busy)
+    *busy = true;
    }
       return ret;
    }
    -static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
-   uint32_t mem_type,
-   const struct ttm_place *place,
-   struct ttm_operation_ctx *ctx)
+static struct ttm_buffer_object*
+ttm_mem_find_evitable_bo(struct ttm_bo_device *bdev,
+ struct ttm_mem_type_manager *man,
+ const struct ttm_place *place,
+ struct ttm_operation_ctx *ctx,
+ struct ttm_buffer_object **first_bo,
+ bool *locked)
    {
-    struct ttm_bo_global *glob = bdev->glob;
-    struct ttm_mem_type_manager *man = &bdev->man[mem_type];
    struct ttm_buffer_object *bo = NULL;
-    bool locked = false;
-    unsigned i;
-    int ret;
+    int i;
    -    spin_lock(&glob->lru_lock);
+    if (first_bo)
+    *first_bo = NULL;
    for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
    list_for_each_entry(bo, >lru[i], lru) {
-    if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked))
+    bool busy = false;
+    if (!ttm_bo_evict_swapout_allowable(bo, ctx, locked,
+    &busy)) {

A newline between declaration and code please.


+    if (first_bo && !(*first_bo) && busy) {
+    ttm_bo_get(bo);
+    *first_bo = bo;
+    }
    continue;
+    }
       if (place && !bdev->driver->eviction_valuable(bo,
  place)) {
-    if (locked)
+    if (*locked)
    reservation_object_unlock(bo->resv);
    continue;
    }
+
    break;
    }
    @@ -818,9 +830,66 @@ static int ttm_mem_evict_first(struct
ttm_bo_device *bdev,
    bo = NULL;
    }
    +    return bo;
+}
+
+static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
+   uint32_t mem_type,
+   const struct ttm_place *place,
+   struct ttm_operation_ctx *ctx)
+{
+    struct ttm_bo_global *glob = bdev->glob;
+    struct ttm_mem_type_manager *man = &bdev->man[mem_type];
+    struct ttm_buffer_object *bo = NULL, *first_bo = NULL;
+    bool locked = false;
+    int ret;
+
+    spin_lock(&glob->lru_lock);
+    bo = ttm_mem_find_evitable_bo(bdev, man, place, ctx, &first_bo,
+  &locked);
    if (!bo) {
+    struct ttm_operation_ctx busy_ctx;
+
    spin_unlock(&glob->lru_lock);

Re: [PATCH 1/2] drm/ttm: fix busy memory to fail other user v6

2019-05-07 Thread zhoucm1



On 2019-05-07 18:53, Koenig, Christian wrote:

On 2019-05-07 11:36, Chunming Zhou wrote:

A heavy GPU job can occupy memory for a long time, which leads other users to
fail to get memory.

This basically picks up Christian's idea:

1. Reserve the BO in DC using a ww_mutex ticket (trivial).
2. If we then run into this EBUSY condition in TTM check if the BO we need 
memory for (or rather the ww_mutex of its reservation object) has a ticket 
assigned.
3. If we have a ticket we grab a reference to the first BO on the LRU, drop the 
LRU lock and try to grab the reservation lock with the ticket.
4. If getting the reservation lock with the ticket succeeded we check if the BO 
is still the first one on the LRU in question (the BO could have moved).
5. If the BO is still the first one on the LRU in question we try to evict it 
as we would evict any other BO.
6. If any of the "If's" above fail we just back off and return -EBUSY.

v2: fix some minor check
v3: address Christian v2 comments.
v4: fix some missing
v5: handle first_bo unlock and bo_get/put
v6: abstract a unified iterate function, and handle all possible use cases,
not only pinned BOs.

Change-Id: I21423fb922f885465f13833c41df1e134364a8e7
Signed-off-by: Chunming Zhou 
---
   drivers/gpu/drm/ttm/ttm_bo.c | 113 ++-
   1 file changed, 97 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 8502b3ed2d88..bbf1d14d00a7 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -766,11 +766,13 @@ EXPORT_SYMBOL(ttm_bo_eviction_valuable);
* b. Otherwise, trylock it.
*/
   static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
-   struct ttm_operation_ctx *ctx, bool *locked)
+   struct ttm_operation_ctx *ctx, bool *locked, bool *busy)
   {
bool ret = false;
   
   	*locked = false;

+   if (busy)
+   *busy = false;
if (bo->resv == ctx->resv) {
reservation_object_assert_held(bo->resv);
if (ctx->flags & TTM_OPT_FLAG_ALLOW_RES_EVICT
@@ -779,35 +781,45 @@ static bool ttm_bo_evict_swapout_allowable(struct 
ttm_buffer_object *bo,
} else {
*locked = reservation_object_trylock(bo->resv);
ret = *locked;
+   if (!ret && busy)
+   *busy = true;
}
   
   	return ret;

   }
   
-static int ttm_mem_evict_first(struct ttm_bo_device *bdev,

-  uint32_t mem_type,
-  const struct ttm_place *place,
-  struct ttm_operation_ctx *ctx)
+static struct ttm_buffer_object*
+ttm_mem_find_evitable_bo(struct ttm_bo_device *bdev,
+struct ttm_mem_type_manager *man,
+const struct ttm_place *place,
+struct ttm_operation_ctx *ctx,
+struct ttm_buffer_object **first_bo,
+bool *locked)
   {
-   struct ttm_bo_global *glob = bdev->glob;
-   struct ttm_mem_type_manager *man = &bdev->man[mem_type];
struct ttm_buffer_object *bo = NULL;
-   bool locked = false;
-   unsigned i;
-   int ret;
+   int i;
   
-	spin_lock(&glob->lru_lock);

+   if (first_bo)
+   *first_bo = NULL;
for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
list_for_each_entry(bo, >lru[i], lru) {
-   if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked))
+   bool busy = false;
+   if (!ttm_bo_evict_swapout_allowable(bo, ctx, locked,
+   &busy)) {

A newline between declaration and code please.


+   if (first_bo && !(*first_bo) && busy) {
+   ttm_bo_get(bo);
+   *first_bo = bo;
+   }
continue;
+   }
   
   			if (place && !bdev->driver->eviction_valuable(bo,

  place)) {
-   if (locked)
+   if (*locked)
reservation_object_unlock(bo->resv);
continue;
}
+
break;
}
   
@@ -818,9 +830,66 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,

bo = NULL;
}
   
+	return bo;

+}
+
+static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
+  uint32_t mem_type,
+  const struct ttm_place *place,
+  struct ttm_operation_ctx *ctx)
+{
+   struct ttm_bo_global *glob = bdev->glob;
+   struct ttm_mem_type_manager *man = &bdev->man[mem_type];
+   struct 

Re: [PATCH] drm/ttm: fix busy memory to fail other user v5

2019-05-07 Thread zhoucm1



On 2019-05-07 17:03, Christian König wrote:


Ping!
In fact, I already addressed your comments and was just waiting for Prike's
test results. Anyway, I'll send v6 first.


-David


Marek is going to need this for his GDS patches as well.

Christian.

On 2019-04-30 11:10, Christian König wrote:

On 2019-04-30 09:01, Chunming Zhou wrote:

A heavy GPU job can occupy memory for a long time, which leads other users
to fail to get memory.

This basically picks up Christian's idea:

1. Reserve the BO in DC using a ww_mutex ticket (trivial).
2. If we then run into this EBUSY condition in TTM check if the BO we
need memory for (or rather the ww_mutex of its reservation object)
has a ticket assigned.
3. If we have a ticket we grab a reference to the first BO on the
LRU, drop the LRU lock and try to grab the reservation lock with the
ticket.
4. If getting the reservation lock with the ticket succeeded we check
if the BO is still the first one on the LRU in question (the BO could
have moved).
5. If the BO is still the first one on the LRU in question we try to
evict it as we would evict any other BO.
6. If any of the "If's" above fail we just back off and return -EBUSY.

v2: fix some minor check
v3: address Christian v2 comments.
v4: fix some missing
v5: handle first_bo unlock and bo_get/put

Change-Id: I21423fb922f885465f13833c41df1e134364a8e7
Signed-off-by: Chunming Zhou 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    |  7 +-
  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 22 +++--
  drivers/gpu/drm/ttm/ttm_bo.c  | 81 +--

  3 files changed, 99 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index affde72b44db..523773e85284 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -811,7 +811,12 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo
*bo, u32 domain,
   u64 min_offset, u64 max_offset)
  {
  struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
-    struct ttm_operation_ctx ctx = { false, false };
+    struct ttm_operation_ctx ctx = {
+    .interruptible = false,
+    .no_wait_gpu = false,
+    .resv = bo->tbo.resv,
+    .flags = 0
+    };
  int r, i;
    if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm))
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index a5cacf846e1b..cc3677c4a4c2 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4101,6 +4101,9 @@ static int dm_plane_helper_prepare_fb(struct
drm_plane *plane,
  struct amdgpu_device *adev;
  struct amdgpu_bo *rbo;
  struct dm_plane_state *dm_plane_state_new, *dm_plane_state_old;
+    struct list_head list, duplicates;
+    struct ttm_validate_buffer tv;
+    struct ww_acquire_ctx ticket;
  uint64_t tiling_flags;
  uint32_t domain;
  int r;
@@ -4117,9 +4120,18 @@ static int dm_plane_helper_prepare_fb(struct
drm_plane *plane,
  obj = new_state->fb->obj[0];
  rbo = gem_to_amdgpu_bo(obj);
  adev = amdgpu_ttm_adev(rbo->tbo.bdev);
-    r = amdgpu_bo_reserve(rbo, false);
-    if (unlikely(r != 0))
+    INIT_LIST_HEAD(&list);
+    INIT_LIST_HEAD(&duplicates);
+
+    tv.bo = &rbo->tbo;
+    tv.num_shared = 1;
+    list_add(&tv.head, &list);
+
+    r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates);
+    if (r) {
+    dev_err(adev->dev, "fail to reserve bo (%d)\n", r);
  return r;
+    }
    if (plane->type != DRM_PLANE_TYPE_CURSOR)
  domain = amdgpu_display_supported_domains(adev);
@@ -4130,21 +4142,21 @@ static int dm_plane_helper_prepare_fb(struct
drm_plane *plane,
  if (unlikely(r != 0)) {
  if (r != -ERESTARTSYS)
  DRM_ERROR("Failed to pin framebuffer with error %d\n", 
r);

-    amdgpu_bo_unreserve(rbo);
+    ttm_eu_backoff_reservation(&ticket, &list);
  return r;
  }
    r = amdgpu_ttm_alloc_gart(&rbo->tbo);
  if (unlikely(r != 0)) {
  amdgpu_bo_unpin(rbo);
-    amdgpu_bo_unreserve(rbo);
+    ttm_eu_backoff_reservation(&ticket, &list);
  DRM_ERROR("%p bind failed\n", rbo);
  return r;
  }
    amdgpu_bo_get_tiling_flags(rbo, &tiling_flags);
  -    amdgpu_bo_unreserve(rbo);
+    ttm_eu_backoff_reservation(&ticket, &list);


Well I can only repeat myself, please put that into a separate patch!


    afb->address = amdgpu_bo_gpu_offset(rbo);
  diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
b/drivers/gpu/drm/ttm/ttm_bo.c
index 8502b3ed2d88..2c4963e105d9 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -766,11 +766,13 @@ EXPORT_SYMBOL(ttm_bo_eviction_valuable);
   * b. Otherwise, trylock it.
   */
  static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object
*bo,
-    struct ttm_operation_ctx *ctx, bool *locked)
+    struct ttm_operation_ctx *ctx, bool *locked, bool *busy)
  {
  bool ret = false;
   

Re: [PATCH] drm/ttm: fix busy memory to fail other user v4

2019-04-26 Thread zhoucm1

Please ignore v4. I will send v5 instead.


On 2019-04-26 17:53, Chunming Zhou wrote:

A heavy GPU job can occupy memory for a long time, which leads other users to
fail to get memory.

This basically picks up Christian's idea:

1. Reserve the BO in DC using a ww_mutex ticket (trivial).
2. If we then run into this EBUSY condition in TTM check if the BO we need 
memory for (or rather the ww_mutex of its reservation object) has a ticket 
assigned.
3. If we have a ticket we grab a reference to the first BO on the LRU, drop the 
LRU lock and try to grab the reservation lock with the ticket.
4. If getting the reservation lock with the ticket succeeded we check if the BO 
is still the first one on the LRU in question (the BO could have moved).
5. If the BO is still the first one on the LRU in question we try to evict it 
as we would evict any other BO.
6. If any of the "If's" above fail we just back off and return -EBUSY.

v2: fix some minor check
v3: address Christian v2 comments.
v4: fix some missing

Change-Id: I21423fb922f885465f13833c41df1e134364a8e7
Signed-off-by: Chunming Zhou 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c|  6 +-
  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 21 +-
  drivers/gpu/drm/ttm/ttm_bo.c  | 74 +--
  3 files changed, 92 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index affde72b44db..032edf477827 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -811,7 +811,11 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
domain,
 u64 min_offset, u64 max_offset)
  {
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
-   struct ttm_operation_ctx ctx = { false, false };
+   struct ttm_operation_ctx ctx = {
+   .interruptible = false,
+   .no_wait_gpu = false,
+   .resv = bo->tbo.resv,
+   };
int r, i;
  
  	if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm))

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index a5cacf846e1b..2957ac38dcb0 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4101,6 +4101,9 @@ static int dm_plane_helper_prepare_fb(struct drm_plane 
*plane,
struct amdgpu_device *adev;
struct amdgpu_bo *rbo;
struct dm_plane_state *dm_plane_state_new, *dm_plane_state_old;
+   struct list_head list, duplicates;
+   struct ttm_validate_buffer tv;
+   struct ww_acquire_ctx ticket;
uint64_t tiling_flags;
uint32_t domain;
int r;
@@ -4120,6 +4123,18 @@ static int dm_plane_helper_prepare_fb(struct drm_plane 
*plane,
r = amdgpu_bo_reserve(rbo, false);
if (unlikely(r != 0))
return r;
+   INIT_LIST_HEAD(&list);
+   INIT_LIST_HEAD(&duplicates);
+
+   tv.bo = &rbo->tbo;
+   tv.num_shared = 1;
+   list_add(&tv.head, &list);
+
+   r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates);
+   if (r) {
+   dev_err(adev->dev, "fail to reserve bo (%d)\n", r);
+   return r;
+   }
  
  	if (plane->type != DRM_PLANE_TYPE_CURSOR)

domain = amdgpu_display_supported_domains(adev);
@@ -4130,21 +4145,21 @@ static int dm_plane_helper_prepare_fb(struct drm_plane 
*plane,
if (unlikely(r != 0)) {
if (r != -ERESTARTSYS)
DRM_ERROR("Failed to pin framebuffer with error %d\n", 
r);
-   amdgpu_bo_unreserve(rbo);
+   ttm_eu_backoff_reservation(&ticket, &list);
return r;
}
  
  	r = amdgpu_ttm_alloc_gart(&rbo->tbo);

if (unlikely(r != 0)) {
amdgpu_bo_unpin(rbo);
-   amdgpu_bo_unreserve(rbo);
+   ttm_eu_backoff_reservation(&ticket, &list);
DRM_ERROR("%p bind failed\n", rbo);
return r;
}
  
  	amdgpu_bo_get_tiling_flags(rbo, &tiling_flags);
  
-	amdgpu_bo_unreserve(rbo);

+   ttm_eu_backoff_reservation(&ticket, &list);
  
  	afb->address = amdgpu_bo_gpu_offset(rbo);
  
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c

index 8502b3ed2d88..f4f506dad2b0 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -766,11 +766,12 @@ EXPORT_SYMBOL(ttm_bo_eviction_valuable);
   * b. Otherwise, trylock it.
   */
  static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
-   struct ttm_operation_ctx *ctx, bool *locked)
+   struct ttm_operation_ctx *ctx, bool *locked, bool *busy)
  {
bool ret = false;
  
  	*locked = false;

+   *busy = false;
if (bo->resv == ctx->resv) {
reservation_object_assert_held(bo->resv);
if (ctx->flags & TTM_OPT_FLAG_ALLOW_RES_EVICT
@@ -779,6 +780,8 @@ static bool ttm_bo_evict_swapout_allowable(struct 

Re: [PATCH] drm/ttm: fix busy memory to fail other user v3

2019-04-26 Thread zhoucm1



On 2019-04-26 17:11, Koenig, Christian wrote:

On 2019-04-26 11:07, zhoucm1 wrote:

[SNIP]

+ spin_lock(&glob->lru_lock);
+    for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
+    if (list_empty(&man->lru[i]))
+    continue;
+    bo = list_first_entry(&man->lru[i],
+  struct ttm_buffer_object,
+  lru);

You now need to check all BOs on the LRU, because the first one where a
trylock failed is not necessarily the first one on the list.

Sorry, I don't quite follow you on this. Could you explain in a bit more
detail how to do that?
I thought that after ww_mutex_lock, CS is done, and then DC can pick up any
of them.

The first_bo is not necessarily the first BO of the LRU, but the first
BO which failed in the trylock.


+
+    break;
+    }
+    /* verify if the BO has been moved */
+    if (first_bo != bo) {

And in this case we would abort here without a good reason.

I got what you said now. I will change it soon.

-David


Christian.


+ spin_unlock(&glob->lru_lock);
+    ww_mutex_unlock(&first_bo->resv->lock);
+    return -EBUSY;
+    }
+    spin_unlock(&glob->lru_lock);
+    /* ok, pick the first busy BO to wait on and evict */
   }
     kref_get(>list_kref);
@@ -1784,7 +1819,10 @@ int ttm_bo_swapout(struct ttm_bo_global
*glob, struct ttm_operation_ctx *ctx)
   spin_lock(&glob->lru_lock);
   for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
   list_for_each_entry(bo, &glob->swap_lru[i], swap) {
-    if (ttm_bo_evict_swapout_allowable(bo, ctx, &locked)) {
+    bool busy = false;

Better make the parameter optional.

Will do.

-David

Christian.


+
+    if (ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
+   &busy)) {
   ret = 0;
   break;
   }



Re: [PATCH] drm/ttm: fix busy memory to fail other user v3

2019-04-26 Thread zhoucm1



On 2019-04-26 16:31, Christian König wrote:

On 2019-04-25 09:39, Chunming Zhou wrote:
A heavy GPU job can occupy memory for a long time, which leads other users
to fail to get memory.


This basically picks up Christian's idea:

1. Reserve the BO in DC using a ww_mutex ticket (trivial).


Any reason you don't want to split this into a separate patch?
Will do that after locking down the solution; keeping it in one patch for
now makes it convenient to review.




2. If we then run into this EBUSY condition in TTM check if the BO we 
need memory for (or rather the ww_mutex of its reservation object) 
has a ticket assigned.
3. If we have a ticket we grab a reference to the first BO on the 
LRU, drop the LRU lock and try to grab the reservation lock with the 
ticket.
4. If getting the reservation lock with the ticket succeeded we check 
if the BO is still the first one on the LRU in question (the BO could 
have moved).
5. If the BO is still the first one on the LRU in question we try to 
evict it as we would evict any other BO.

6. If any of the "If's" above fail we just back off and return -EBUSY.

v2: fix some minor check
v3: address Christian v2 comments.

Change-Id: I21423fb922f885465f13833c41df1e134364a8e7
Signed-off-by: Chunming Zhou 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    |  7 ++-
  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 21 ++--
  drivers/gpu/drm/ttm/ttm_bo.c  | 48 +--
  3 files changed, 67 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

index affde72b44db..015bf62277d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -811,7 +811,12 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo 
*bo, u32 domain,

   u64 min_offset, u64 max_offset)
  {
  struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
-    struct ttm_operation_ctx ctx = { false, false };
+    struct ttm_operation_ctx ctx = {
+    .interruptible = false,
+    .no_wait_gpu = false,
+    .resv = bo->tbo.resv,
+    .flags = TTM_OPT_FLAG_ALLOW_RES_EVICT
+    };


Please completely drop this chunk.

If we allow evicting BOs from the same process during pinning it could 
lead to some unforeseen side effects.

Will remove that flag.




  int r, i;
    if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm))
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c

index a5cacf846e1b..2957ac38dcb0 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4101,6 +4101,9 @@ static int dm_plane_helper_prepare_fb(struct 
drm_plane *plane,

  struct amdgpu_device *adev;
  struct amdgpu_bo *rbo;
  struct dm_plane_state *dm_plane_state_new, *dm_plane_state_old;
+    struct list_head list, duplicates;
+    struct ttm_validate_buffer tv;
+    struct ww_acquire_ctx ticket;
  uint64_t tiling_flags;
  uint32_t domain;
  int r;
@@ -4120,6 +4123,18 @@ static int dm_plane_helper_prepare_fb(struct 
drm_plane *plane,

  r = amdgpu_bo_reserve(rbo, false);
  if (unlikely(r != 0))
  return r;
+    INIT_LIST_HEAD(&list);
+    INIT_LIST_HEAD(&duplicates);
+
+    tv.bo = &rbo->tbo;
+    tv.num_shared = 1;
+    list_add(&tv.head, &list);
+
+    r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates);
+    if (r) {
+    dev_err(adev->dev, "fail to reserve bo (%d)\n", r);
+    return r;
+    }
    if (plane->type != DRM_PLANE_TYPE_CURSOR)
  domain = amdgpu_display_supported_domains(adev);
@@ -4130,21 +4145,21 @@ static int dm_plane_helper_prepare_fb(struct 
drm_plane *plane,

  if (unlikely(r != 0)) {
  if (r != -ERESTARTSYS)
  DRM_ERROR("Failed to pin framebuffer with error %d\n", r);
-    amdgpu_bo_unreserve(rbo);
+    ttm_eu_backoff_reservation(&ticket, &list);
  return r;
  }
    r = amdgpu_ttm_alloc_gart(&rbo->tbo);
  if (unlikely(r != 0)) {
  amdgpu_bo_unpin(rbo);
-    amdgpu_bo_unreserve(rbo);
+    ttm_eu_backoff_reservation(&ticket, &list);
  DRM_ERROR("%p bind failed\n", rbo);
  return r;
  }
    amdgpu_bo_get_tiling_flags(rbo, &tiling_flags);
  -    amdgpu_bo_unreserve(rbo);
+    ttm_eu_backoff_reservation(&ticket, &list);


This part looks good to me, but as noted above should probably be in a 
separate patch.



    afb->address = amdgpu_bo_gpu_offset(rbo);
  diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
b/drivers/gpu/drm/ttm/ttm_bo.c

index 8502b3ed2d88..dbcf958d1b43 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -766,11 +766,12 @@ EXPORT_SYMBOL(ttm_bo_eviction_valuable);
   * b. Otherwise, trylock it.
   */
  static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object 
*bo,

-    struct ttm_operation_ctx *ctx, bool *locked)
+    struct ttm_operation_ctx *ctx, bool *locked, bool *busy)
  {
  bool ret = false;
   

Re: [PATCH] gpu/docs: Clarify what userspace means for gl

2019-04-24 Thread zhoucm1



On 2019-04-25 03:22, Eric Anholt wrote:

"Zhou, David(ChunMing)"  writes:


Will Linux be Mesa-only Linux? I thought Linux was an open platform.
This will impact our OpenGL/AMDVLK stacks (MIT open source), and I am not sure about ROCm:
1. How do we deal with a uAPI that OpenGL/AMDVLK needs but Mesa doesn't?
Reject it?
2. If OpenGL/AMDVLK developers work on a HW feature that no Mesa developers
work on, can it not be upstreamed either?

I believe these questions are already covered by

"+Other userspace is only admissible if exposing a given feature through OpenGL or
+OpenGL ES would result in a technically unsound design, incomplete driver or
+an implementation which isn't useful in real world usage."

If OpenGL needs the interface, then you need a Mesa implementation.
It's time for you to work with the community to build that or get it
built.  Or, in AMD's case, work with the Mesa developers that you
already employ.

If OpenGL doesn't need it, but Vulkan needs it, then we don't have a
clear policy in place, and this patch doesn't change that.  I would
personally say that AMDVLK doesn't qualify given that as far as I know
there is not open review of proposed patches to the project as they're
being developed.
Do I understand you correctly that, as long as the stack is openly
developed, it will be able to drive new uAPI?


-David


Re: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-24 Thread zhoucm1

OK, let's drop mine then. Could you draft a solution for that?


-David


On 2019-04-24 16:59, Koenig, Christian wrote:

On 2019-04-24 10:11, zhoucm1 wrote:




On 2019-04-24 16:07, Christian König wrote:

This is used in a work item, so you don't need to check for signals.

will remove.


And checking if the LRU is populated is mandatory here

How do I check that outside of TTM? The code is in DM.


Well just use a static cast on the first entry of the LRU.

We can't upstream that solution anyway, so just a hack should do for now.




or otherwise you can run into an endless loop.

I already added a timeout for that.


Thinking more about it we most likely actually need to grab the lock 
on the first BO entry from the LRU.




-David


Christian.

On 2019-04-24 09:59, zhoucm1 wrote:


How about the new attached patch?


-David


On 2019-04-24 15:30, Christian König wrote:
That would change the semantics of ttm_bo_mem_space() and so could
change the return code in an IOCTL as well. Not a good idea, because
that could have a lot of side effects.


Instead in the calling DC code you could check if you get an 
-ENOMEM and then call schedule().


If after the schedule() we see that we have now BOs on the LRU we 
can try again and see if pinning the frame buffer now succeeds.


Christian.
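A minimal sketch of that schedule()-and-retry suggestion, as a hypothetical
helper around the pin in the DC prepare_fb path (the helper name and the
retry bound are made up; this is not the actual DC code):

/* Hypothetical: retry pinning while TTM returns -ENOMEM, yielding so
 * eviction elsewhere can make progress. */
static int pin_fb_bo_with_retry(struct amdgpu_bo *rbo, u32 domain)
{
	int retries = 100;	/* arbitrary bound, assumption */
	int r;

	for (;;) {
		r = amdgpu_bo_reserve(rbo, false);
		if (r)
			return r;
		r = amdgpu_bo_pin(rbo, domain);
		amdgpu_bo_unreserve(rbo);
		if (r != -ENOMEM || !--retries)
			return r;
		schedule();	/* let other work free memory */
	}
}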

On 2019-04-24 09:17, zhoucm1 wrote:


I made a patch, attached.

I'm not sure how to write a patch matching your proposal. Could you make a
patch for that if mine isn't enough?


-David

On 2019-04-24 15:12, Christian König wrote:

how about just adding a wrapper for the pin function as below?

I considered this as well and don't think it will work reliably.

We could use it as a band-aid for this specific problem, but in
general we need to improve the handling in TTM to resolve those
kinds of resource conflicts.


Regards,
Christian.

On 2019-04-23 17:09, Zhou, David(ChunMing) wrote:
>3. If we have a ticket we grab a reference to the first BO on 
the LRU, drop the LRU lock and try to grab the reservation lock 
with the ticket.


The BO on the LRU is already locked by the CS user; can it be dropped
here by the DC user? And if the DC user then grabs its lock with a ticket,
how does CS grab it again?


If you think waiting in TTM has this risk, how about just
adding a wrapper for the pin function as below?

amdgpu_get_pin_bo_timeout()
{
	do {
		amdgpu_bo_reserve();
		r = amdgpu_bo_pin();

		if (!r)
			break;
		amdgpu_bo_unreserve();
		timeout--;
	} while (timeout > 0);
}

 Original Message 
Subject: Re: [PATCH] ttm: wait mem space if user allow while 
gpu busy

From: Christian König
To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, 
Prike" ,dri-devel@lists.freedesktop.org

CC:

Well, that's not so easy offhand.

The basic problem here is that when you busy wait at this place 
you can easily run into situations where application A busy 
waits for B while B busy waits for A -> deadlock.


So what we need here is the deadlock detection logic of the 
ww_mutex. To use this we at least need to do the following steps:


1. Reserve the BO in DC using a ww_mutex ticket (trivial).

2. If we then run into this EBUSY condition in TTM check if the 
BO we need memory for (or rather the ww_mutex of its 
reservation object) has a ticket assigned.


3. If we have a ticket we grab a reference to the first BO on 
the LRU, drop the LRU lock and try to grab the reservation lock 
with the ticket.


4. If getting the reservation lock with the ticket succeeded we 
check if the BO is still the first one on the LRU in question 
(the BO could have moved).


5. If the BO is still the first one on the LRU in question we 
try to evict it as we would evict any other BO.


6. If any of the "If's" above fail we just back off and return 
-EBUSY.


Steps 2-5 are certainly not trivial, but doable as far as I can 
see.


Have fun :)
Christian.
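For reference, the bare ww_mutex ticket pattern the deadlock detection
described above builds on; a minimal two-lock sketch, not the TTM
integration itself:

#include <linux/ww_mutex.h>

static DEFINE_WW_CLASS(sketch_ww_class);

/* Lock two ww_mutexes with a shared ticket; on -EDEADLK, back off and
 * retry with the contended lock taken first. The caller unlocks both
 * and calls ww_acquire_fini(ticket) when done. */
static int lock_two(struct ww_mutex *a, struct ww_mutex *b,
		    struct ww_acquire_ctx *ticket)
{
	int ret;

	ww_acquire_init(ticket, &sketch_ww_class);

	ret = ww_mutex_lock(a, ticket);
	if (ret)
		goto fail;

	ret = ww_mutex_lock(b, ticket);
	while (ret == -EDEADLK) {
		/* Back off: drop what we hold, sleep on the contended
		 * lock with our ticket, then retry the remaining one. */
		ww_mutex_unlock(a);
		ww_mutex_lock_slow(b, ticket);
		swap(a, b);
		ret = ww_mutex_lock(b, ticket);
	}
	if (ret) {
		ww_mutex_unlock(a);
		goto fail;
	}

	ww_acquire_done(ticket);
	return 0;

fail:
	ww_acquire_fini(ticket);
	return ret;
}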

On 2019-04-23 15:19, Zhou, David(ChunMing) wrote:
How about adding an additional ctx->resv condition inline to address
your concern? Also, as long as we don't wait on the same user, it
shouldn't lead to deadlock.


Otherwise, any other idea?

 Original Message 
Subject: Re: [PATCH] ttm: wait mem space if user allow while 
gpu busy

From: Christian König
To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
,dri-devel@lists.freedesktop.org

CC:

Well that is certainly a NAK because it can lead to deadlock 
in the

memory management.

You can't just busy wait with all those locks held.

Regards,
Christian.

On 2019-04-23 03:45, Liang, Prike wrote:
> Acked-by: Prike Liang 
>
> Thanks,
> Prike
> -Original Message-
> From: Chunming Zhou 
> Sent: Monday, April 22, 2019 6:39 PM
> To: dri-devel@lists.freedesktop.org
> Cc: Liang, Prike ; Zhou, 
David(ChunMing) 
> Subject: [PATCH] ttm: wait mem space if user allow while gpu 
busy

>
> A heavy GPU job can occupy memory for a long time, which could
lead other users to fail to get memory.

>
> 

Re: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-24 Thread zhoucm1



On 2019-04-24 16:07, Christian König wrote:

This is used in a work item, so you don't need to check for signals.

will remove.


And checking if the LRU is populated is mandatory here

How do I check that outside of TTM? The code is in DM.


or otherwise you can run into an endless loop.

I already added a timeout for that.

-David


Christian.

On 2019-04-24 09:59, zhoucm1 wrote:


How about the new attached patch?


-David


On 2019-04-24 15:30, Christian König wrote:
That would change the semantics of ttm_bo_mem_space() and so could
change the return code in an IOCTL as well. Not a good idea, because
that could have a lot of side effects.


Instead in the calling DC code you could check if you get an -ENOMEM 
and then call schedule().


If after the schedule() we see that we have now BOs on the LRU we 
can try again and see if pinning the frame buffer now succeeds.


Christian.

On 2019-04-24 09:17, zhoucm1 wrote:


I made a patch, attached.

I'm not sure how to write a patch matching your proposal. Could you make a
patch for that if mine isn't enough?


-David

On 2019-04-24 15:12, Christian König wrote:

how about just adding a wrapper for the pin function as below?

I considered this as well and don't think it will work reliably.

We could use it as a band-aid for this specific problem, but in
general we need to improve the handling in TTM to resolve those
kinds of resource conflicts.


Regards,
Christian.

On 2019-04-23 17:09, Zhou, David(ChunMing) wrote:
>3. If we have a ticket we grab a reference to the first BO on 
the LRU, drop the LRU lock and try to grab the reservation lock 
with the ticket.


The BO on the LRU is already locked by the CS user; can it be dropped
here by the DC user? And if the DC user then grabs its lock with a ticket,
how does CS grab it again?


If you think waiting in TTM has this risk, how about just adding
a wrapper for the pin function as below?

amdgpu_get_pin_bo_timeout()
{
	do {
		amdgpu_bo_reserve();
		r = amdgpu_bo_pin();

		if (!r)
			break;
		amdgpu_bo_unreserve();
		timeout--;
	} while (timeout > 0);
}

 Original Message 
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike" 
,dri-devel@lists.freedesktop.org

CC:

Well, that's not so easy offhand.

The basic problem here is that when you busy wait at this place 
you can easily run into situations where application A busy waits 
for B while B busy waits for A -> deadlock.


So what we need here is the deadlock detection logic of the 
ww_mutex. To use this we at least need to do the following steps:


1. Reserve the BO in DC using a ww_mutex ticket (trivial).

2. If we then run into this EBUSY condition in TTM check if the 
BO we need memory for (or rather the ww_mutex of its reservation 
object) has a ticket assigned.


3. If we have a ticket we grab a reference to the first BO on the 
LRU, drop the LRU lock and try to grab the reservation lock with 
the ticket.


4. If getting the reservation lock with the ticket succeeded we 
check if the BO is still the first one on the LRU in question 
(the BO could have moved).


5. If the BO is still the first one on the LRU in question we try 
to evict it as we would evict any other BO.


6. If any of the "If's" above fail we just back off and return 
-EBUSY.


Steps 2-5 are certainly not trivial, but doable as far as I can see.

Have fun :)
Christian.

On 2019-04-23 15:19, Zhou, David(ChunMing) wrote:
How about adding an additional ctx->resv condition inline to address your
concern? Also, as long as we don't wait on the same user, it shouldn't lead
to deadlock.


Otherwise, any other idea?

 Original Message 
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu 
busy

From: Christian König
To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
,dri-devel@lists.freedesktop.org

CC:

Well that is certainly a NAK because it can lead to deadlock in the
memory management.

You can't just busy wait with all those locks held.

Regards,
Christian.

On 2019-04-23 03:45, Liang, Prike wrote:
> Acked-by: Prike Liang 
>
> Thanks,
> Prike
> -Original Message-
> From: Chunming Zhou 
> Sent: Monday, April 22, 2019 6:39 PM
> To: dri-devel@lists.freedesktop.org
> Cc: Liang, Prike ; Zhou, David(ChunMing) 


> Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>
> A heavy GPU job can occupy memory for a long time, which could lead
other users to fail to get memory.

>
> Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
> Signed-off-by: Chunming Zhou 
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 6 --
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 
100644

> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/d

Re: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-24 Thread zhoucm1

How about the new attached patch?


-David


On 2019-04-24 15:30, Christian König wrote:
That would change the semantics of ttm_bo_mem_space() and so could
change the return code in an IOCTL as well. Not a good idea, because
that could have a lot of side effects.


Instead in the calling DC code you could check if you get an -ENOMEM 
and then call schedule().


If after the schedule() we see that we have now BOs on the LRU we can 
try again and see if pinning the frame buffer now succeeds.


Christian.

On 2019-04-24 09:17, zhoucm1 wrote:


I made a patch, attached.

I'm not sure how to write a patch matching your proposal. Could you make a
patch for that if mine isn't enough?


-David

On 2019-04-24 15:12, Christian König wrote:

how about just adding a wrapper for the pin function as below?

I considered this as well and don't think it will work reliably.

We could use it as a band-aid for this specific problem, but in
general we need to improve the handling in TTM to resolve those
kinds of resource conflicts.


Regards,
Christian.

On 2019-04-23 17:09, Zhou, David(ChunMing) wrote:
>3. If we have a ticket we grab a reference to the first BO on the 
LRU, drop the LRU lock and try to grab the reservation lock with 
the ticket.


The BO on the LRU is already locked by the CS user; can it be dropped here
by the DC user? And if the DC user then grabs its lock with a ticket, how
does CS grab it again?


If you think waiting in TTM has this risk, how about just adding a
wrapper for the pin function as below?

amdgpu_get_pin_bo_timeout()
{
	do {
		amdgpu_bo_reserve();
		r = amdgpu_bo_pin();

		if (!r)
			break;
		amdgpu_bo_unreserve();
		timeout--;
	} while (timeout > 0);
}

 Original Message 
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike" 
,dri-devel@lists.freedesktop.org

CC:

Well, that's not so easy offhand.

The basic problem here is that when you busy wait at this place you 
can easily run into situations where application A busy waits for B 
while B busy waits for A -> deadlock.


So what we need here is the deadlock detection logic of the 
ww_mutex. To use this we at least need to do the following steps:


1. Reserve the BO in DC using a ww_mutex ticket (trivial).

2. If we then run into this EBUSY condition in TTM check if the BO 
we need memory for (or rather the ww_mutex of its reservation 
object) has a ticket assigned.


3. If we have a ticket we grab a reference to the first BO on the 
LRU, drop the LRU lock and try to grab the reservation lock with 
the ticket.


4. If getting the reservation lock with the ticket succeeded we 
check if the BO is still the first one on the LRU in question (the 
BO could have moved).


5. If the BO is still the first one on the LRU in question we try 
to evict it as we would evict any other BO.


6. If any of the "If's" above fail we just back off and return -EBUSY.

Steps 2-5 are certainly not trivial, but doable as far as I can see.

Have fun :)
Christian.

On 2019-04-23 15:19, Zhou, David(ChunMing) wrote:
How about adding an additional ctx->resv condition inline to address your
concern? Also, as long as we don't wait on the same user, it shouldn't lead
to deadlock.


Otherwise, any other idea?

 Original Message 
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
,dri-devel@lists.freedesktop.org

CC:

Well that is certainly a NAK because it can lead to deadlock in the
memory management.

You can't just busy wait with all those locks held.

Regards,
Christian.

On 2019-04-23 03:45, Liang, Prike wrote:
> Acked-by: Prike Liang 
>
> Thanks,
> Prike
> -Original Message-
> From: Chunming Zhou 
> Sent: Monday, April 22, 2019 6:39 PM
> To: dri-devel@lists.freedesktop.org
> Cc: Liang, Prike ; Zhou, David(ChunMing) 


> Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>
> A heavy GPU job can occupy memory for a long time, which could lead
other users to fail to get memory.

>
> Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
> Signed-off-by: Chunming Zhou 
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 6 --
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 100644

> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct 
ttm_buffer_object *bo,

>    if (mem->mm_node)
>    break;
>    ret = ttm_mem_evict_first(bdev, mem_type, place, 
ctx);

> - if (unlikely(ret != 0))
> - return ret;
> + if (unlikely(ret != 0)) {
> +

Re: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-24 Thread zhoucm1

I made a patch, attached.

I'm not sure how to write a patch matching your proposal. Could you make a
patch for that if mine isn't enough?


-David

On 2019-04-24 15:12, Christian König wrote:

how about just adding a wrapper for the pin function as below?

I considered this as well and don't think it will work reliably.

We could use it as a band-aid for this specific problem, but in
general we need to improve the handling in TTM to resolve those
kinds of resource conflicts.


Regards,
Christian.

On 2019-04-23 17:09, Zhou, David(ChunMing) wrote:
>3. If we have a ticket we grab a reference to the first BO on the 
LRU, drop the LRU lock and try to grab the reservation lock with the 
ticket.


The BO on the LRU is already locked by the CS user; can it be dropped here by
the DC user? And if the DC user then grabs its lock with a ticket, how does
CS grab it again?


If you think waiting in TTM has this risk, how about just adding a
wrapper for the pin function as below?

amdgpu_get_pin_bo_timeout()
{
	do {
		amdgpu_bo_reserve();
		r = amdgpu_bo_pin();

		if (!r)
			break;
		amdgpu_bo_unreserve();
		timeout--;
	} while (timeout > 0);
}

 Original Message 
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike" 
,dri-devel@lists.freedesktop.org

CC:

Well, that's not so easy offhand.

The basic problem here is that when you busy wait at this place you 
can easily run into situations where application A busy waits for B 
while B busy waits for A -> deadlock.


So what we need here is the deadlock detection logic of the ww_mutex. 
To use this we at least need to do the following steps:


1. Reserve the BO in DC using a ww_mutex ticket (trivial).

2. If we then run into this EBUSY condition in TTM check if the BO we 
need memory for (or rather the ww_mutex of its reservation object) 
has a ticket assigned.


3. If we have a ticket we grab a reference to the first BO on the 
LRU, drop the LRU lock and try to grab the reservation lock with the 
ticket.


4. If getting the reservation lock with the ticket succeeded we check 
if the BO is still the first one on the LRU in question (the BO could 
have moved).


5. If the BO is still the first one on the LRU in question we try to 
evict it as we would evict any other BO.


6. If any of the "If's" above fail we just back off and return -EBUSY.

Steps 2-5 are certainly not trivial, but doable as far as I can see.

Have fun :)
Christian.

On 2019-04-23 15:19, Zhou, David(ChunMing) wrote:
How about adding an additional ctx->resv condition inline to address your
concern? Also, as long as we don't wait on the same user, it shouldn't lead
to deadlock.


Otherwise, any other idea?

 Original Message 
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
,dri-devel@lists.freedesktop.org

CC:

Well that is certainly a NAK because it can lead to deadlock in the
memory management.

You can't just busy wait with all those locks held.

Regards,
Christian.

On 2019-04-23 03:45, Liang, Prike wrote:
> Acked-by: Prike Liang 
>
> Thanks,
> Prike
> -Original Message-
> From: Chunming Zhou 
> Sent: Monday, April 22, 2019 6:39 PM
> To: dri-devel@lists.freedesktop.org
> Cc: Liang, Prike ; Zhou, David(ChunMing) 


> Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>
> A heavy GPU job can occupy memory for a long time, which could lead
other users to fail to get memory.

>
> Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
> Signed-off-by: Chunming Zhou 
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 6 --
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 100644

> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct 
ttm_buffer_object *bo,

>    if (mem->mm_node)
>    break;
>    ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
> - if (unlikely(ret != 0))
> - return ret;
> + if (unlikely(ret != 0)) {
> + if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
> + return ret;
> + }
>    } while (1);
>    mem->mem_type = mem_type;
>    return ttm_bo_add_move_fence(bo, man, mem);
> --
> 2.17.1
>






Re: [PATCH 2/9] drm/syncobj: add new drm_syncobj_add_point interface v4

2019-04-01 Thread zhoucm1



On 2019-04-01 16:19, Lionel Landwerlin wrote:

On 01/04/2019 06:54, Zhou, David(ChunMing) wrote:



-Original Message-
From: Lionel Landwerlin 
Sent: Saturday, March 30, 2019 10:09 PM
To: Koenig, Christian ; Zhou, David(ChunMing)
; dri-devel@lists.freedesktop.org; amd-
g...@lists.freedesktop.org; ja...@jlekstrand.net; Hector, Tobias

Subject: Re: [PATCH 2/9] drm/syncobj: add new drm_syncobj_add_point
interface v4

On 28/03/2019 15:18, Christian König wrote:

Am 28.03.19 um 14:50 schrieb Lionel Landwerlin:

On 25/03/2019 08:32, Chunming Zhou wrote:

From: Christian König 

Use the dma_fence_chain object to create a timeline of fence objects
instead of just replacing the existing fence.

v2: rebase and cleanup
v3: fix garbage collection parameters
v4: add unorder point check, print a warn calltrace

Signed-off-by: Christian König 
Cc: Lionel Landwerlin 
---
   drivers/gpu/drm/drm_syncobj.c | 39 +++
   include/drm/drm_syncobj.h |  5 +
   2 files changed, 44 insertions(+)

diff --git a/drivers/gpu/drm/drm_syncobj.c
b/drivers/gpu/drm/drm_syncobj.c index 5329e66598c6..19a9ce638119
100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -122,6 +122,45 @@ static void drm_syncobj_remove_wait(struct
drm_syncobj *syncobj,
   spin_unlock(&syncobj->lock);
   }
   +/**
+ * drm_syncobj_add_point - add new timeline point to the syncobj
+ * @syncobj: sync object to add the timeline point to
+ * @chain: chain node to use to add the point
+ * @fence: fence to encapsulate in the chain node
+ * @point: sequence number to use for the point
+ *
+ * Add the chain node as new timeline point to the syncobj.
+ */
+void drm_syncobj_add_point(struct drm_syncobj *syncobj,
+   struct dma_fence_chain *chain,
+   struct dma_fence *fence,
+   uint64_t point)
+{
+    struct syncobj_wait_entry *cur, *tmp;
+    struct dma_fence *prev;
+
+    dma_fence_get(fence);
+
+    spin_lock(&syncobj->lock);
+
+    prev = drm_syncobj_fence_get(syncobj);
+    /* You are adding an unordered point to the timeline, which could
+     * cause the payload returned from query_ioctl to be 0! */
+    WARN_ON_ONCE(prev && prev->seqno >= point);


I think the WARN/BUG macros should only fire when there is an issue
with programming from within the kernel.

But this particular warning can be triggered by an application.


Probably best to just remove it?

Yeah, that was also my argument against it.

Key point here is that we still want to note somehow that userspace
did something wrong and returning an error is not an option.

Maybe just use DRM_ERROR with a static variable to print the message
only once.

Christian.
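A minimal sketch of that idea, assuming it replaces the WARN_ON_ONCE() next
to the check in drm_syncobj_add_point() (illustrative only):

	/* Sketch: report out-of-order points, but only once per boot. */
	if (prev && prev->seqno >= point) {
		static bool warned;

		if (!warned) {
			DRM_ERROR("adding unordered timeline point %llu (last %llu)\n",
				  point, prev->seqno);
			warned = true;
		}
	}

The kernel's printk_once() would be another way to get the same one-shot
behaviour.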

I don't really see any point in printing an error once. If you run your
application twice, you end up thinking there was an issue just on the
first run, but it's actually always wrong.

Apart from this nitpick, are there any other concerns about pushing the
whole patch set? Is it time to push it?


-David



Looks good to me.
Does that mean we can add your R-b to the patch set so that we can submit it
to the drm-misc-next branch?




I have an additional change to make drm_syncobj_find_fence() also 
return the drm_syncobj : 
https://github.com/djdeath/linux/commit/0b7732b267b931339d71fe6f493ea6fa4eab453e


This is needed in i915 to avoid looking up the drm_syncobj handle twice.

Our driver allows waiting on the syncobj's dma_fence that we're then
going to replace, so we need to get both the fence & syncobj at the same
time.


I guess it can go in a follow-up series.

Yes, agree.

Thanks for your effort as well,
-David



-Lionel




Unless we're willing to take the syncobj lock for longer periods of 
time when
adding points, I guess we'll have to defer validation to validation 
layers.



-Lionel



-Lionel



+    dma_fence_chain_init(chain, prev, fence, point);
+    rcu_assign_pointer(syncobj->fence, &chain->base);
+
+    list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
+    list_del_init(&cur->node);
+    syncobj_wait_syncobj_func(syncobj, cur);
+    }
+    spin_unlock(&syncobj->lock);
+
+    /* Walk the chain once to trigger garbage collection */
+    dma_fence_chain_for_each(fence, prev);
+    dma_fence_put(prev);
+}
+EXPORT_SYMBOL(drm_syncobj_add_point);
+
   /**
    * drm_syncobj_replace_fence - replace fence in a sync object.
    * @syncobj: Sync object to replace fence in diff --git
a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h index
0311c9fdbd2f..6cf7243a1dc5 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -27,6 +27,7 @@
   #define __DRM_SYNCOBJ_H__
     #include 
+#include 
     struct drm_file;
   @@ -112,6 +113,10 @@ drm_syncobj_fence_get(struct drm_syncobj
*syncobj)
     struct drm_syncobj *drm_syncobj_find(struct drm_file
*file_private,
    u32 handle);
+void drm_syncobj_add_point(struct drm_syncobj *syncobj,
+   struct dma_fence_chain *chain,
+   struct dma_fence *fence,
+   

Re: [PATCH 1/9] dma-buf: add new dma_fence_chain container v6

2019-03-22 Thread zhoucm1

How about the attached?

If OK, I will merge it into patch #1.


-David


On 2019-03-21 22:40, Christian König wrote:
No, atomic cmpxchg is a hardware operation. If you want to replace 
that you need a lock again.


Maybe just add a comment and use an explicit cast to void* ? Not sure 
if that silences the warning.


Christian.

On 2019-03-21 15:13, Zhou, David(ChunMing) wrote:

Can cmpxchg be replaced by some simple C statement?
Otherwise we have to remove __rcu from chain->prev.

-David

 Original Message 
Subject: Re: [PATCH 1/9] dma-buf: add new dma_fence_chain container v6
From: Christian König
To: "Zhou, David(ChunMing)" ,kbuild test robot ,"Zhou, David(ChunMing)"
CC: 
kbuild-...@01.org,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org,lionel.g.landwer...@intel.com,ja...@jlekstrand.net,"Koenig, 
Christian" ,"Hector, Tobias"


Hi David,

For the cmpxchg() case, offhand I don't know either. Looks like so far
nobody has used cmpxchg() with RCU-protected structures.

The other cases should be replaced by RCU_INIT_POINTER() or
rcu_dereference_protected(.., true);

Regards,
Christian.
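For the cmpxchg() itself, one possible shape is a __force cast telling
sparse that the address-space mismatch is intentional; just a sketch of
that option, not necessarily the final fix:

	/* chain->prev is __rcu annotated; cmpxchg() operates on the plain
	 * pointer, so strip the annotation explicitly for sparse. */
	tmp = cmpxchg((struct dma_fence __force **)&chain->prev,
		      prev, replacement);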

On 2019-03-21 07:34, zhoucm1 wrote:
> Hi Lionel and Christian,
>
> Below is the robot report for chain->prev, after __rcu was added as you
> suggested.
>
> How to fix this line "tmp = cmpxchg(&chain->prev, prev, replacement);"?

> I checked the kernel header files; it seems there is no cmpxchg for RCU.
>
> Any suggestion to fix this robot report?
>
> Thanks,
> -David
>
> On 2019-03-21 08:24, kbuild test robot wrote:
>> Hi Chunming,
>>
>> I love your patch! Perhaps something to improve:
>>
>> [auto build test WARNING on linus/master]
>> [also build test WARNING on v5.1-rc1 next-20190320]
>> [if your patch is applied to the wrong git tree, please drop us a
>> note to help improve the system]
>>
>> url:
>> 
https://github.com/0day-ci/linux/commits/Chunming-Zhou/dma-buf-add-new-dma_fence_chain-container-v6/20190320-223607

>> reproduce:
>>  # apt-get install sparse
>>  make ARCH=x86_64 allmodconfig
>>  make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'
>>
>>
>> sparse warnings: (new ones prefixed by >>)
>>
>>>> drivers/dma-buf/dma-fence-chain.c:73:23: sparse: incorrect type in
>>>> initializer (different address spaces) @@    expected struct
>>>> dma_fence [noderef] *__old @@    got  dma_fence [noderef]
>>>> *__old @@
>>     drivers/dma-buf/dma-fence-chain.c:73:23: expected struct
>> dma_fence [noderef] *__old
>>     drivers/dma-buf/dma-fence-chain.c:73:23: got struct dma_fence
>> *[assigned] prev
>>>> drivers/dma-buf/dma-fence-chain.c:73:23: sparse: incorrect type in
>>>> initializer (different address spaces) @@    expected struct
>>>> dma_fence [noderef] *__new @@    got  dma_fence [noderef]
>>>> *__new @@
>>     drivers/dma-buf/dma-fence-chain.c:73:23: expected struct
>> dma_fence [noderef] *__new
>>     drivers/dma-buf/dma-fence-chain.c:73:23: got struct dma_fence
>> *[assigned] replacement
>>>> drivers/dma-buf/dma-fence-chain.c:73:21: sparse: incorrect type in
>>>> assignment (different address spaces) @@    expected struct
>>>> dma_fence *tmp @@    got struct dma_fence [noderef] >>> dma_fence *tmp @@
>>     drivers/dma-buf/dma-fence-chain.c:73:21: expected struct
>> dma_fence *tmp
>>     drivers/dma-buf/dma-fence-chain.c:73:21: got struct dma_fence
>> [noderef] *[assigned] __ret
>>>> drivers/dma-buf/dma-fence-chain.c:190:28: sparse: incorrect type in
>>>> argument 1 (different address spaces) @@    expected struct
>>>> dma_fence *fence @@    got struct dma_fence struct dma_fence 
*fence @@

>>     drivers/dma-buf/dma-fence-chain.c:190:28: expected struct
>> dma_fence *fence
>>     drivers/dma-buf/dma-fence-chain.c:190:28: got struct dma_fence
>> [noderef] *prev
>>>> drivers/dma-buf/dma-fence-chain.c:222:21: sparse: incorrect type in
>>>> assignment (different address spaces) @@    expected struct
>>>> dma_fence [noderef] *prev @@    got [noderef] *prev @@
>>     drivers/dma-buf/dma-fence-chain.c:222:21: expected struct
>> dma_fence [noderef] *prev
>>     drivers/dma-buf/dma-fence-chain.c:222:21: got struct dma_fence
>> *prev
>>     drivers/dma-buf/dma-fence-chain.c:235:33: sparse: expression
>> using sizeof(void)
>>     drivers/dma-buf/dma-fence-chain.c:235:33: sparse: expression
>> using sizeof(void)
>>
>> vim +73 drivers/dma-buf/dma-fence-chain.c
>>
>>  38
>>  39    /**
&g

Re: [PATCH 1/9] dma-buf: add new dma_fence_chain container v6

2019-03-21 Thread zhoucm1

Hi Lionel and Christian,

Below is the robot report for chain->prev, after __rcu was added as you
suggested.


How to fix this line "tmp = cmpxchg(&chain->prev, prev, replacement);"?
I checked the kernel header files; it seems there is no cmpxchg for RCU.

Any suggestions for fixing this robot report?

Thanks,
-David

On 2019-03-21 08:24, kbuild test robot wrote:

Hi Chunming,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.1-rc1 next-20190320]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Chunming-Zhou/dma-buf-add-new-dma_fence_chain-container-v6/20190320-223607
reproduce:
 # apt-get install sparse
 make ARCH=x86_64 allmodconfig
 make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'


sparse warnings: (new ones prefixed by >>)


>> drivers/dma-buf/dma-fence-chain.c:73:23: sparse: incorrect type in initializer (different address spaces) @@    expected struct dma_fence [noderef] <asn:4> *__old @@    got struct dma_fence *[assigned] prev @@
   drivers/dma-buf/dma-fence-chain.c:73:23:    expected struct dma_fence [noderef] <asn:4> *__old
   drivers/dma-buf/dma-fence-chain.c:73:23:    got struct dma_fence *[assigned] prev
>> drivers/dma-buf/dma-fence-chain.c:73:23: sparse: incorrect type in initializer (different address spaces) @@    expected struct dma_fence [noderef] <asn:4> *__new @@    got struct dma_fence *[assigned] replacement @@
   drivers/dma-buf/dma-fence-chain.c:73:23:    expected struct dma_fence [noderef] <asn:4> *__new
   drivers/dma-buf/dma-fence-chain.c:73:23:    got struct dma_fence *[assigned] replacement
>> drivers/dma-buf/dma-fence-chain.c:73:21: sparse: incorrect type in assignment (different address spaces) @@    expected struct dma_fence *tmp @@    got struct dma_fence [noderef] <asn:4> *[assigned] __ret @@
   drivers/dma-buf/dma-fence-chain.c:73:21:    expected struct dma_fence *tmp
   drivers/dma-buf/dma-fence-chain.c:73:21:    got struct dma_fence [noderef] <asn:4> *[assigned] __ret
>> drivers/dma-buf/dma-fence-chain.c:190:28: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct dma_fence *fence @@    got struct dma_fence [noderef] <asn:4> *prev @@
   drivers/dma-buf/dma-fence-chain.c:190:28:    expected struct dma_fence *fence
   drivers/dma-buf/dma-fence-chain.c:190:28:    got struct dma_fence [noderef] <asn:4> *prev
>> drivers/dma-buf/dma-fence-chain.c:222:21: sparse: incorrect type in assignment (different address spaces) @@    expected struct dma_fence [noderef] <asn:4> *prev @@    got struct dma_fence *prev @@
   drivers/dma-buf/dma-fence-chain.c:222:21:    expected struct dma_fence [noderef] <asn:4> *prev
   drivers/dma-buf/dma-fence-chain.c:222:21:    got struct dma_fence *prev
   drivers/dma-buf/dma-fence-chain.c:235:33: sparse: expression using sizeof(void)
   drivers/dma-buf/dma-fence-chain.c:235:33: sparse: expression using sizeof(void)

vim +73 drivers/dma-buf/dma-fence-chain.c

 38 
 39 /**
 40  * dma_fence_chain_walk - chain walking function
 41  * @fence: current chain node
 42  *
 43  * Walk the chain to the next node. Returns the next fence or NULL if 
we are at
 44  * the end of the chain. Garbage collects chain nodes which are already
 45  * signaled.
 46  */
 47 struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
 48 {
 49 struct dma_fence_chain *chain, *prev_chain;
 50 struct dma_fence *prev, *replacement, *tmp;
 51 
 52 chain = to_dma_fence_chain(fence);
 53 if (!chain) {
 54 dma_fence_put(fence);
 55 return NULL;
 56 }
 57 
 58 while ((prev = dma_fence_chain_get_prev(chain))) {
 59 
 60 prev_chain = to_dma_fence_chain(prev);
 61 if (prev_chain) {
 62 if (!dma_fence_is_signaled(prev_chain->fence))
 63 break;
 64 
 65 replacement = dma_fence_chain_get_prev(prev_chain);
 66 } else {
 67 if (!dma_fence_is_signaled(prev))
 68 break;
 69 
 70 replacement = NULL;
 71 }
 72 
   > 73  tmp = cmpxchg(&chain->prev, prev, replacement);
 74 if (tmp == prev)
 75 dma_fence_put(tmp);
 76 else
 77 dma_fence_put(replacement);
 78 dma_fence_put(prev);
 79 }
 80 
 81 dma_fence_put(fence);
 82 return prev;
 83 }
 84 EXPORT_SYMBOL(dma_fence_chain_walk);
 85 
 86 /**
 87  * dma_fence_chain_find_seqno - find fence chain node by seqno
 88  * @pfence: pointer to the chain node where to start
 89  * @seqno: the sequence 

Re: [PATCH 8/9] drm/syncobj: add timeline signal ioctl for syncobj v3

2019-03-19 Thread zhoucm1



On 2019年03月19日 19:54, Lionel Landwerlin wrote:

On 15/03/2019 12:09, Chunming Zhou wrote:
v2: individually allocate the chain array, since chain nodes are freed 
independently.
v3: all existing points must already be signaled before the CPU performs 
the signal operation,

 so add a check condition for that.

Signed-off-by: Chunming Zhou 
---
  drivers/gpu/drm/drm_internal.h |   2 +
  drivers/gpu/drm/drm_ioctl.c    |   2 +
  drivers/gpu/drm/drm_syncobj.c  | 103 +
  include/uapi/drm/drm.h |   1 +
  4 files changed, 108 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h 
b/drivers/gpu/drm/drm_internal.h

index dd11ae5f1eef..d9a483a5fce0 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -190,6 +190,8 @@ int drm_syncobj_reset_ioctl(struct drm_device 
*dev, void *data,

  struct drm_file *file_private);
  int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
+int drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void 
*data,

+  struct drm_file *file_private);
  int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
  struct drm_file *file_private);
  diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 92b3b7b2fd81..d337f161909c 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -696,6 +696,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
    DRM_UNLOCKED|DRM_RENDER_ALLOW),
  DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
    DRM_UNLOCKED|DRM_RENDER_ALLOW),
+    DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL, 
drm_syncobj_timeline_signal_ioctl,

+  DRM_UNLOCKED|DRM_RENDER_ALLOW),
  DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
    DRM_UNLOCKED|DRM_RENDER_ALLOW),
  DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, 
drm_crtc_get_sequence_ioctl, DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index 306c7b7e2770..eaeb038f97d7 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1183,6 +1183,109 @@ drm_syncobj_signal_ioctl(struct drm_device 
*dev, void *data,

  return ret;
  }
  +int
+drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
+  struct drm_file *file_private)
+{
+    struct drm_syncobj_timeline_array *args = data;
+    struct drm_syncobj **syncobjs;
+    struct dma_fence_chain **chains;
+    uint64_t *points;
+    uint32_t i, j, timeline_count = 0;
+    int ret;
+
+    if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+    return -EOPNOTSUPP;
+
+    if (args->pad != 0)
+    return -EINVAL;
+
+    if (args->count_handles == 0)
+    return -EINVAL;
+
+    ret = drm_syncobj_array_find(file_private,
+ u64_to_user_ptr(args->handles),
+ args->count_handles,
+ &syncobjs);
+    if (ret < 0)
+    return ret;
+
+    for (i = 0; i < args->count_handles; i++) {
+    struct dma_fence_chain *chain;
+    struct dma_fence *fence;
+
+    fence = drm_syncobj_fence_get(syncobjs[i]);
+    chain = to_dma_fence_chain(fence);
+    if (chain) {
+    struct dma_fence *iter;
+
+    dma_fence_chain_for_each(iter, fence) {
+    if (!iter)
+    break;
+    if (!dma_fence_is_signaled(iter)) {
+    dma_fence_put(iter);
+    DRM_ERROR("Client must guarantee all existing timeline points signaled before performing host signal operation!");

+    ret = -EPERM;
+    goto out;



Sorry if I'm failing to remember whether we discussed this before.


Signaling a point from the host should be fine even if the previous 
points in the timeline are not signaled.

ok, will remove that checking.



After all this can happen on the device side as well (out of order 
signaling).



I thought the thing we didn't want is out of order submission.

Just checking the last chain node seqno against the host signal point 
should be enough.



What about simply returning -EPERM, so we can warn the application from 
userspace?

OK, will add that.
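
A rough sketch of that relaxed check (illustrative only; it assumes the points[] array has already been copied from userspace by this stage, which would mean moving the copy_from_user() earlier in the ioctl):

    fence = drm_syncobj_fence_get(syncobjs[i]);
    chain = to_dma_fence_chain(fence);
    if (chain && points[i] <= fence->seqno) {
        /* only the last chain node's seqno is compared */
        dma_fence_put(fence);
        ret = -EPERM;   /* let userspace warn the application */
        goto out;
    }
    dma_fence_put(fence);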





+    }
+    }
+    }
+    }
+
+    points = kmalloc_array(args->count_handles, sizeof(*points),
+   GFP_KERNEL);
+    if (!points) {
+    ret = -ENOMEM;
+    goto out;
+    }
+    if (!u64_to_user_ptr(args->points)) {
+    memset(points, 0, args->count_handles * sizeof(uint64_t));
+    } else if (copy_from_user(points, u64_to_user_ptr(args->points),
+  sizeof(uint64_t) * args->count_handles)) {
+    ret = -EFAULT;
+    goto err_points;
+    }
+
+
+    for (i = 0; i < 

Re: [PATCH 09/11] drm/syncobj: add transition iotcls between binary and timeline

2019-02-20 Thread zhoucm1



On 2019年02月20日 15:59, Koenig, Christian wrote:

Am 20.02.19 um 05:53 schrieb zhoucm1:




On 2019年02月19日 19:32, Koenig, Christian wrote:

Hi David,


Could you have a look if it's reasonable?


Patch #1 is also something I already fixed on my local branch.

But patch #2 won't work like this.

We can't return an error from drm_syncobj_add_point() because we 
already submitted work to the hardware. And just dropping the fence 
like you do in the patch is a clearly no-go as well.


Then do you have any idea how to skip the out-of-order signal point?


No, I don't think we can actually do this.
But as Lionel pointed out, user mode shouldn't query a smaller timeline 
payload compared to last time, so we must skip the out-of-order signal point!


-David



The only solution I can see would be to lock down the syncobj to 
modifications while command submission is in progress. And that in 
turn would mean a huge bunch of ww_mutex overhead we will certainly 
want to avoid.


Christian.



-David


Regards,
Christian.

Am 19.02.19 um 11:46 schrieb zhoucm1:


Hi Lionel,

the attached should fix your problem and also messed signal order.

Hi Christian,

Could you have a look if it's reasonable?


btw: I pushed the change to 
https://github.com/amingriyue/timeline-syncobj-kernel, which is 
already rebased to the latest drm-misc (kernel 5.0). You can directly 
use that branch.



-David


On 2019年02月19日 01:01, Koenig, Christian wrote:

Am 18.02.19 um 13:07 schrieb Lionel Landwerlin:

Thanks guys :)

You mentioned that signaling out of order is illegal.
Is this illegal with regard to the vulkan spec or to the syncobj 
implementation?


David is the expert on that, but as far as I know that is 
forbidden by the vulkan spec.


I'm not finding anything in the vulkan spec that makes out of 
order signaling illegal.
That's why I came up with this test, just verifying that the 
timeline does not go backward in terms of its payload.


Well we need to handle this case gracefully in the kernel, so it 
is still a good testcase.


Christian.



-Lionel

On 18/02/2019 11:01, Koenig, Christian wrote:

Hi David,

well I think Lionel is testing the invalid signal order on 
purpose :)


Anyway we really need to handle invalid order gracefully here, 
e.g. either the same way as during CS, or we abort and return an 
error message.


I think just using the same approach as during CS is the best 
we can do.


Regards,
Christian


Am 18.02.2019 11:35 schrieb "Zhou, David(ChunMing)" 
:


Hi Lionel,

I checked your igt test case,

uint64_t points[5] = { 1, 5, 3, 7, 6 };

which is illegal signal order.

I must admit we should handle it gracefully if signaling isn't 
in order, and it shouldn't lead to deadlock.


Hi Christian,

Can we just ignore when signal point X <= timeline Y? Or just 
give a warning?


Otherwise, unexpected use cases like Lionel's easily lead 
to deadlock.



-David


On 2019年02月15日 22:28, Lionel Landwerlin wrote:

Hi David,

Thanks a lot for pointing me to the tests you've added in IGT.
While adding a test with that signals fences imported into a timeline
syncobj out of order, I ran into a deadlock.
Here is the test :
https://github.com/djdeath/intel-gpu-tools/commit/1e46cf7e7bff09b78a24367ddc2314f97eb0a1b9

Trying to kill the deadlocked process I got this backtrace :


[   33.969136] [IGT] syncobj_timeline: starting subtest signal-order
[   60.452823] watchdog: BUG: soft lockup - CPU#6 stuck for 23s!
[syncobj_timelin:2021]
[   60.452826] Modules linked in: rfcomm cmac bnep binfmt_misc
nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio sch_fq_codel ib_iser snd_hda_intel
rdma_cm iw_cm snd_hda_codec ib_cm snd_hda_core snd_hwdep intel_rapl
snd_pcm ib_core x86_pkg_temp_thermal intel_powerclamp configf
s coretemp iscsi_tcp snd_seq_midi libiscsi_tcp snd_seq_midi_event
libiscsi kvm_intel scsi_transport_iscsi kvm btusb snd_rawmidi irqbypass
btrtl intel_cstate intel_rapl_perf btbcm btintel bluetooth snd_seq
snd_seq_device snd_timer input_leds ecdh_generic snd soundcore mei_me
mei intel_pch_thermal mac_hid acpi_pad parp
ort_pc ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_decompress
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid hid i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit
ghash_clmulni_intel prime_numbers
drm_kms_helper aesni_intel syscopyarea sysfillrect
[   60.452876]  sysimgblt fb_sys_fops aes_x86_64 crypto_simd sdhci_pci
cryptd drm e1000e glue_helper cqhci sdhci wmi video
[   60.452881] CPU: 6 PID: 2021 Comm: syncobj_timelin Tainted: G
U    5.0.0-rc5+ #337
[   60.452882] Hardware name:  /NUC6i7KYB, BIOS
KYSKLi70.86A.0042.2016.0929.1933 09/29/2016
[   60.452886] RIP: 0010:dma_fence_chain_walk+0x22c/0x260
[   60.452888] Code: ff e9 93 fe ff ff 48 8b 45 08 48 8b 40 18 48 85 c0
74 0c 48 89 ef e8 33 0f 58 00 84 c0 75 23 f0 41 ff 4d 00 0f 88 99 87 2f
00 <0f>

Re: [PATCH 09/11] drm/syncobj: add transition iotcls between binary and timeline

2019-02-19 Thread zhoucm1



On 2019年02月19日 19:32, Koenig, Christian wrote:

Hi David,


Could you have a look if it's reasonable?


Patch #1 is also something I already fixed on my local branch.

But patch #2 won't work like this.

We can't return an error from drm_syncobj_add_point() because we 
already submitted work to the hardware. And just dropping the fence 
like you do in the patch is a clearly no-go as well.


Then do you have any idea how to skip the out-of-order signal point?

-David


Regards,
Christian.

Am 19.02.19 um 11:46 schrieb zhoucm1:


Hi Lionel,

the attached should fix your problem and also messed signal order.

Hi Christian,

Could you have a look if it's reasonable?


btw: I pushed the change to 
https://github.com/amingriyue/timeline-syncobj-kernel, which is 
already rebased to the latest drm-misc (kernel 5.0). You can directly use 
that branch.



-David


On 2019年02月19日 01:01, Koenig, Christian wrote:

Am 18.02.19 um 13:07 schrieb Lionel Landwerlin:

Thanks guys :)

You mentioned that signaling out of order is illegal.
Is this illegal with regard to the vulkan spec or to the syncobj 
implementation?


David is the expert on that, but as far as I know that is forbidden 
by the vulkan spec.


I'm not finding anything in the vulkan spec that makes out of order 
signaling illegal.
That's why I came up with this test, just verifying that the 
timeline does not go backward in terms of its payload.


Well we need to handle this case gracefully in the kernel, so it is 
still a good testcase.


Christian.



-Lionel

On 18/02/2019 11:01, Koenig, Christian wrote:

Hi David,

well I think Lionel is testing the invalid signal order on purpose :)

Anyway we really need to handle invalid order gracefully here, e.g. 
either the same way as during CS, or we abort and return an error 
message.


I think just using the same approach as during CS is the best we 
can do.


Regards,
Christian


Am 18.02.2019 11:35 schrieb "Zhou, David(ChunMing)" 
:


Hi Lionel,

I checked your igt test case,

uint64_t points[5] = { 1, 5, 3, 7, 6 };

which is illegal signal order.

I must admit we should handle it gracefully if signaling isn't 
in order, and it shouldn't lead to deadlock.


Hi Christian,

Can we just ignore when signal point X <= timeline Y? Or just give 
a warning?


Otherwise, unexpected use cases like Lionel's easily lead to 
deadlock.



-David


On 2019年02月15日 22:28, Lionel Landwerlin wrote:

Hi David,

Thanks a lot for pointing me to the tests you've added in IGT.
While adding a test with that signals fences imported into a timeline
syncobj out of order, I ran into a deadlock.
Here is the test :
https://github.com/djdeath/intel-gpu-tools/commit/1e46cf7e7bff09b78a24367ddc2314f97eb0a1b9

Trying to kill the deadlocked process I got this backtrace :


[   33.969136] [IGT] syncobj_timeline: starting subtest signal-order
[   60.452823] watchdog: BUG: soft lockup - CPU#6 stuck for 23s!
[syncobj_timelin:2021]
[   60.452826] Modules linked in: rfcomm cmac bnep binfmt_misc
nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio sch_fq_codel ib_iser snd_hda_intel
rdma_cm iw_cm snd_hda_codec ib_cm snd_hda_core snd_hwdep intel_rapl
snd_pcm ib_core x86_pkg_temp_thermal intel_powerclamp configf
s coretemp iscsi_tcp snd_seq_midi libiscsi_tcp snd_seq_midi_event
libiscsi kvm_intel scsi_transport_iscsi kvm btusb snd_rawmidi irqbypass
btrtl intel_cstate intel_rapl_perf btbcm btintel bluetooth snd_seq
snd_seq_device snd_timer input_leds ecdh_generic snd soundcore mei_me
mei intel_pch_thermal mac_hid acpi_pad parp
ort_pc ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_decompress
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid hid i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit
ghash_clmulni_intel prime_numbers
drm_kms_helper aesni_intel syscopyarea sysfillrect
[   60.452876]  sysimgblt fb_sys_fops aes_x86_64 crypto_simd sdhci_pci
cryptd drm e1000e glue_helper cqhci sdhci wmi video
[   60.452881] CPU: 6 PID: 2021 Comm: syncobj_timelin Tainted: G
U    5.0.0-rc5+ #337
[   60.452882] Hardware name:  /NUC6i7KYB, BIOS
KYSKLi70.86A.0042.2016.0929.1933 09/29/2016
[   60.452886] RIP: 0010:dma_fence_chain_walk+0x22c/0x260
[   60.452888] Code: ff e9 93 fe ff ff 48 8b 45 08 48 8b 40 18 48 85 c0
74 0c 48 89 ef e8 33 0f 58 00 84 c0 75 23 f0 41 ff 4d 00 0f 88 99 87 2f
00 <0f> 85 05 fe ff ff 4c 89 ef e8 56 ea ff ff 48 89 d8 5b 5d 41 5c 41
[   60.452888] RSP: 0018:9a5804653ca8 EFLAGS: 00010296 ORIG_RAX:
ff13
[   60.452889] RAX:  RBX: 8f5690fb2480 RCX:
8f5690fb2f00
[   60.452890] RDX: 003e3730 RSI:  RDI:
8f5690fb2180
[   60.452891] RBP: 8f5690fb2180 R08:  R09:
8f5690fb2eb0
[   60.452891] R10:  R11: 8f5660469860 R12:
8f5690fb2f68
[   60.452892] R13: 8f5690fb2f00 R14: 0

Re: [PATCH 09/11] drm/syncobj: add transition iotcls between binary and timeline

2019-02-19 Thread zhoucm1

Hi Lionel,

the attached should fix your problem and also messed signal order.

Hi Christian,

Could you have a look if it's reasonable?


btw: I pushed the change to 
https://github.com/amingriyue/timeline-syncobj-kernel, which is already 
rebased to the latest drm-misc (kernel 5.0). You can directly use that branch.



-David


On 2019年02月19日 01:01, Koenig, Christian wrote:

Am 18.02.19 um 13:07 schrieb Lionel Landwerlin:

Thanks guys :)

You mentioned that signaling out of order is illegal.
Is this illegal with regard to the vulkan spec or to the syncobj 
implementation?


David is the expert on that, but as far as I know that is forbidden by 
the vulkan spec.


I'm not finding anything in the vulkan spec that makes out of order 
signaling illegal.
That's why I came up with this test, just verifying that the timeline 
does not go backward in terms of its payload.


Well we need to handle this case gracefully in the kernel, so it is 
still a good testcase.


Christian.



-Lionel

On 18/02/2019 11:01, Koenig, Christian wrote:

Hi David,

well I think Lionel is testing the invalid signal order on purpose :)

Anyway we really need to handle invalid order gracefully here, e.g. 
either the same way as during CS, or we abort and return an error 
message.


I think just using the same approach as during CS is the best we 
can do.


Regards,
Christian


Am 18.02.2019 11:35 schrieb "Zhou, David(ChunMing)" 
:


Hi Lionel,

I checked your igt test case,

uint64_t points[5] = { 1, 5, 3, 7, 6 };

which is illegal signal order.

I must admit we should handle it gracefully if signaling isn't 
in order, and it shouldn't lead to deadlock.


Hi Christian,

Can we just ignore when signal point X <= timeline Y? Or just give a 
warning?


Otherwise, unexpected use cases like Lionel's easily lead to 
deadlock.



-David


On 2019年02月15日 22:28, Lionel Landwerlin wrote:

Hi David,

Thanks a lot for pointing me to the tests you've added in IGT.
While adding a test with that signals fences imported into a timeline
syncobj out of order, I ran into a deadlock.
Here is the test :
https://github.com/djdeath/intel-gpu-tools/commit/1e46cf7e7bff09b78a24367ddc2314f97eb0a1b9

Trying to kill the deadlocked process I got this backtrace :


[   33.969136] [IGT] syncobj_timeline: starting subtest signal-order
[   60.452823] watchdog: BUG: soft lockup - CPU#6 stuck for 23s!
[syncobj_timelin:2021]
[   60.452826] Modules linked in: rfcomm cmac bnep binfmt_misc
nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio sch_fq_codel ib_iser snd_hda_intel
rdma_cm iw_cm snd_hda_codec ib_cm snd_hda_core snd_hwdep intel_rapl
snd_pcm ib_core x86_pkg_temp_thermal intel_powerclamp configf
s coretemp iscsi_tcp snd_seq_midi libiscsi_tcp snd_seq_midi_event
libiscsi kvm_intel scsi_transport_iscsi kvm btusb snd_rawmidi irqbypass
btrtl intel_cstate intel_rapl_perf btbcm btintel bluetooth snd_seq
snd_seq_device snd_timer input_leds ecdh_generic snd soundcore mei_me
mei intel_pch_thermal mac_hid acpi_pad parp
ort_pc ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_decompress
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid hid i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit
ghash_clmulni_intel prime_numbers
drm_kms_helper aesni_intel syscopyarea sysfillrect
[   60.452876]  sysimgblt fb_sys_fops aes_x86_64 crypto_simd sdhci_pci
cryptd drm e1000e glue_helper cqhci sdhci wmi video
[   60.452881] CPU: 6 PID: 2021 Comm: syncobj_timelin Tainted: G
U    5.0.0-rc5+ #337
[   60.452882] Hardware name:  /NUC6i7KYB, BIOS
KYSKLi70.86A.0042.2016.0929.1933 09/29/2016
[   60.452886] RIP: 0010:dma_fence_chain_walk+0x22c/0x260
[   60.452888] Code: ff e9 93 fe ff ff 48 8b 45 08 48 8b 40 18 48 85 c0
74 0c 48 89 ef e8 33 0f 58 00 84 c0 75 23 f0 41 ff 4d 00 0f 88 99 87 2f
00 <0f> 85 05 fe ff ff 4c 89 ef e8 56 ea ff ff 48 89 d8 5b 5d 41 5c 41
[   60.452888] RSP: 0018:9a5804653ca8 EFLAGS: 00010296 ORIG_RAX:
ff13
[   60.452889] RAX:  RBX: 8f5690fb2480 RCX:
8f5690fb2f00
[   60.452890] RDX: 003e3730 RSI:  RDI:
8f5690fb2180
[   60.452891] RBP: 8f5690fb2180 R08:  R09:
8f5690fb2eb0
[   60.452891] R10:  R11: 8f5660469860 R12:
8f5690fb2f68
[   60.452892] R13: 8f5690fb2f00 R14: 0003 R15:
8f5655a45fc0
[   60.452913] FS:  7fdc5c459980() GS:8f569eb8()
knlGS:
[   60.452913] CS:  0010 DS:  ES:  CR0: 80050033
[   60.452914] CR2: 7f9d74336dd8 CR3: 00084a67e004 CR4:
003606e0
[   60.452915] DR0:  DR1:  DR2:

[   60.452915] DR3:  DR6: fffe0ff0 DR7:
0400
[   60.452916] Call Trace:
[   60.452958]  drm_syncobj_add_point+0x102/0x160 [drm]
[   60.452965]  ? 

Re: [PATCH 09/11] drm/syncobj: add transition iotcls between binary and timeline

2019-02-18 Thread zhoucm1

Hi Lionel,

I checked your igt test case,

uint64_t points[5] = { 1, 5, 3, 7, 6 };

which is illegal signal order.

I must admit we should handle it gracefully if signaling isn't in order, 
and it shouldn't lead to deadlock.


Hi Christian,

Can we just ignore when signal point X <= timeline Y? Or just give a 
warning?


Otherwise, unexpected use cases like Lionel's easily lead to 
deadlock.
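
As a rough sketch of the "just give a warning" option (placement and names hypothetical, modeled on drm_syncobj_add_point()):

    /* Sketch: detect a non-monotonic point and warn instead of deadlocking;
     * 'prev' is assumed to be the fence of the current last chain node. */
    if (prev && prev->seqno >= point)
        DRM_ERROR("adding out-of-order timeline point %llu (last: %llu)\n",
                  point, prev->seqno);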



-David


On 2019年02月15日 22:28, Lionel Landwerlin wrote:

Hi David,

Thanks a lot for pointing me to the tests you've added in IGT.
While adding a test with that signals fences imported into a timeline
syncobj out of order, I ran into a deadlock.
Here is the test :
https://github.com/djdeath/intel-gpu-tools/commit/1e46cf7e7bff09b78a24367ddc2314f97eb0a1b9

Trying to kill the deadlocked process I got this backtrace :


[   33.969136] [IGT] syncobj_timeline: starting subtest signal-order
[   60.452823] watchdog: BUG: soft lockup - CPU#6 stuck for 23s!
[syncobj_timelin:2021]
[   60.452826] Modules linked in: rfcomm cmac bnep binfmt_misc
nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio sch_fq_codel ib_iser snd_hda_intel
rdma_cm iw_cm snd_hda_codec ib_cm snd_hda_core snd_hwdep intel_rapl
snd_pcm ib_core x86_pkg_temp_thermal intel_powerclamp configf
s coretemp iscsi_tcp snd_seq_midi libiscsi_tcp snd_seq_midi_event
libiscsi kvm_intel scsi_transport_iscsi kvm btusb snd_rawmidi irqbypass
btrtl intel_cstate intel_rapl_perf btbcm btintel bluetooth snd_seq
snd_seq_device snd_timer input_leds ecdh_generic snd soundcore mei_me
mei intel_pch_thermal mac_hid acpi_pad parp
ort_pc ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_decompress
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid hid i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit
ghash_clmulni_intel prime_numbers
drm_kms_helper aesni_intel syscopyarea sysfillrect
[   60.452876]  sysimgblt fb_sys_fops aes_x86_64 crypto_simd sdhci_pci
cryptd drm e1000e glue_helper cqhci sdhci wmi video
[   60.452881] CPU: 6 PID: 2021 Comm: syncobj_timelin Tainted: G
U    5.0.0-rc5+ #337
[   60.452882] Hardware name:  /NUC6i7KYB, BIOS
KYSKLi70.86A.0042.2016.0929.1933 09/29/2016
[   60.452886] RIP: 0010:dma_fence_chain_walk+0x22c/0x260
[   60.452888] Code: ff e9 93 fe ff ff 48 8b 45 08 48 8b 40 18 48 85 c0
74 0c 48 89 ef e8 33 0f 58 00 84 c0 75 23 f0 41 ff 4d 00 0f 88 99 87 2f
00 <0f> 85 05 fe ff ff 4c 89 ef e8 56 ea ff ff 48 89 d8 5b 5d 41 5c 41
[   60.452888] RSP: 0018:9a5804653ca8 EFLAGS: 00010296 ORIG_RAX:
ff13
[   60.452889] RAX:  RBX: 8f5690fb2480 RCX:
8f5690fb2f00
[   60.452890] RDX: 003e3730 RSI:  RDI:
8f5690fb2180
[   60.452891] RBP: 8f5690fb2180 R08:  R09:
8f5690fb2eb0
[   60.452891] R10:  R11: 8f5660469860 R12:
8f5690fb2f68
[   60.452892] R13: 8f5690fb2f00 R14: 0003 R15:
8f5655a45fc0
[   60.452913] FS:  7fdc5c459980() GS:8f569eb8()
knlGS:
[   60.452913] CS:  0010 DS:  ES:  CR0: 80050033
[   60.452914] CR2: 7f9d74336dd8 CR3: 00084a67e004 CR4:
003606e0
[   60.452915] DR0:  DR1:  DR2:

[   60.452915] DR3:  DR6: fffe0ff0 DR7:
0400
[   60.452916] Call Trace:
[   60.452958]  drm_syncobj_add_point+0x102/0x160 [drm]
[   60.452965]  ? drm_syncobj_fd_to_handle_ioctl+0x1b0/0x1b0 [drm]
[   60.452971]  drm_syncobj_transfer_ioctl+0x10f/0x180 [drm]
[   60.452978]  drm_ioctl_kernel+0xac/0xf0 [drm]
[   60.452984]  drm_ioctl+0x2eb/0x3b0 [drm]
[   60.452990]  ? drm_syncobj_fd_to_handle_ioctl+0x1b0/0x1b0 [drm]
[   60.452992]  ? sw_sync_ioctl+0x347/0x370
[   60.452994]  do_vfs_ioctl+0xa4/0x640
[   60.452995]  ? __fput+0x134/0x220
[   60.452997]  ? do_fcntl+0x1a5/0x650
[   60.452998]  ksys_ioctl+0x70/0x80
[   60.452999]  __x64_sys_ioctl+0x16/0x20
[   60.453002]  do_syscall_64+0x55/0x110
[   60.453004]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   60.453005] RIP: 0033:0x7fdc5b6e45d7
[   60.453006] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00
48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
[   60.453007] RSP: 002b:7fff25c4d198 EFLAGS: 0206 ORIG_RAX:
0010
[   60.453008] RAX: ffda RBX:  RCX:
7fdc5b6e45d7
[   60.453008] RDX: 7fff25c4d200 RSI: c02064cc RDI:
0003
[   60.453009] RBP: 7fff25c4d1d0 R08:  R09:
001e
[   60.453010] R10:  R11: 0206 R12:
563d3959e4d0
[   60.453010] R13: 7fff25c4d620 R14:  R15:

[   88.447359] watchdog: BUG: soft lockup - CPU#6 stuck for 22s!
[syncobj_timelin:2021]

Re: [PATCH 06/11] drm/syncobj: add timeline payload query ioctl v4

2019-02-17 Thread zhoucm1



On 2019年02月17日 03:22, Christian König wrote:

Am 15.02.19 um 20:31 schrieb Lionel Landwerlin via amd-gfx:

On 07/12/2018 09:55, Chunming Zhou wrote:

user mode can query timeline payload.
v2: check return value of copy_to_user
v3: handle querying entry by entry
v4: rebase on new chain container, simplify interface

Signed-off-by: Chunming Zhou 
Cc: Daniel Rakos 
Cc: Jason Ekstrand 
Cc: Bas Nieuwenhuizen 
Cc: Dave Airlie 
Cc: Christian König 
Cc: Chris Wilson 
---
  drivers/gpu/drm/drm_internal.h |  2 ++
  drivers/gpu/drm/drm_ioctl.c    |  2 ++
  drivers/gpu/drm/drm_syncobj.c  | 43 
++

  include/uapi/drm/drm.h | 10 
  4 files changed, 57 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h 
b/drivers/gpu/drm/drm_internal.h

index 18b41e10195c..dab4d5936441 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -184,6 +184,8 @@ int drm_syncobj_reset_ioctl(struct drm_device 
*dev, void *data,

  struct drm_file *file_private);
  int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+    struct drm_file *file_private);
    /* drm_framebuffer.c */
  void drm_framebuffer_print_info(struct drm_printer *p, unsigned 
int indent,

diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index a9a17ed35cc4..7578ef6dc1d1 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -681,6 +681,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
    DRM_UNLOCKED|DRM_RENDER_ALLOW),
  DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
    DRM_UNLOCKED|DRM_RENDER_ALLOW),
+    DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
+  DRM_UNLOCKED|DRM_RENDER_ALLOW),
  DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, 
drm_crtc_get_sequence_ioctl, DRM_UNLOCKED),
  DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, 
drm_crtc_queue_sequence_ioctl, DRM_UNLOCKED),
  DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, 
drm_mode_create_lease_ioctl, DRM_MASTER|DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index 348079bb0965..f97fa00ca1d0 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1061,3 +1061,46 @@ drm_syncobj_signal_ioctl(struct drm_device 
*dev, void *data,

    return ret;
  }
+
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+    struct drm_file *file_private)
+{
+    struct drm_syncobj_timeline_array *args = data;
+    struct drm_syncobj **syncobjs;
+    uint64_t __user *points = u64_to_user_ptr(args->points);
+    uint32_t i;
+    int ret;
+
+    if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+    return -ENODEV;
+
+    if (args->pad != 0)
+    return -EINVAL;
+
+    if (args->count_handles == 0)
+    return -EINVAL;
+
+    ret = drm_syncobj_array_find(file_private,
+ u64_to_user_ptr(args->handles),
+ args->count_handles,
+ &syncobjs);
+    if (ret < 0)
+    return ret;
+
+    for (i = 0; i < args->count_handles; i++) {
+    struct dma_fence_chain *chain;
+    struct dma_fence *fence;
+    uint64_t point;
+
+    fence = drm_syncobj_fence_get(syncobjs[i]);
+    chain = to_dma_fence_chain(fence);
+    point = chain ? fence->seqno : 0;



Sorry, I don't want to sound annoying, but this looks like it 
could report values going backward.


Well please be annoying as much as you can :) But yeah all that stuff 
has been discussed before as well.




Anything adding a point X to a timeline that has reached value Y, with X 
< Y, would trigger that.


Yes, that can indeed happen.

Trigger what? When adding X (X < Y), does the query then return 0?
Why would this happen?
No, syncobj->fence should always be there, and it is the last chain node, if 
one was ever added.


-David
But adding a timeline point X which is before the already added point 
Y is illegal in the first place :)


So when the application does something stupid and breaks it can just 
keep the pieces.


In the kernel we still do the most defensive thing and sync to 
everything in this case.


I'm just not sure if we should print an error into syslog or just 
continue silently.


Regards,
Christian.



Either through the submission or userspace signaling or importing 
another syncpoint's fence.



-Lionel



+    ret = copy_to_user(&points[i], &point, sizeof(uint64_t));
+    ret = ret ? -EFAULT : 0;
+    if (ret)
+    break;
+    }
+    drm_syncobj_array_free(syncobjs, args->count_handles);
+
+    return ret;
+}
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 0092111d002c..b2c36f2b2599 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -767,6 +767,14 @@ struct drm_syncobj_array {
  __u32 pad;
  };
  +struct 

Re: [PATCH] drm/ttm: stop always moving BOs on the LRU on page fault

2019-01-13 Thread zhoucm1



On 2019年01月11日 21:15, Christian König wrote:

Move the BO on the LRU only when it is actually moved by a DMA
operation.

Signed-off-by: Christian König 

Tested-And-Reviewed-by: Chunming Zhou 

I just sent the lru_notify v2 patches, please review them. With yours and 
mine, the OOM issue is fixed without negative effects.


-David

---
  drivers/gpu/drm/ttm/ttm_bo_vm.c | 19 ---
  1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index a1d977fbade5..e86a29a1e51f 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -71,7 +71,7 @@ static vm_fault_t ttm_bo_vm_fault_idle(struct 
ttm_buffer_object *bo,
ttm_bo_get(bo);
	up_read(&vmf->vma->vm_mm->mmap_sem);
(void) dma_fence_wait(bo->moving, true);
-   ttm_bo_unreserve(bo);
+   reservation_object_unlock(bo->resv);
ttm_bo_put(bo);
goto out_unlock;
}
@@ -131,11 +131,7 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 * for reserve, and if it fails, retry the fault after waiting
 * for the buffer to become unreserved.
 */
-   err = ttm_bo_reserve(bo, true, true, NULL);
-   if (unlikely(err != 0)) {
-   if (err != -EBUSY)
-   return VM_FAULT_NOPAGE;
-
+   if (unlikely(!reservation_object_trylock(bo->resv))) {
if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
ttm_bo_get(bo);
@@ -165,6 +161,8 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
}
  
  	if (bdev->driver->fault_reserve_notify) {

+   struct dma_fence *moving = dma_fence_get(bo->moving);
+
err = bdev->driver->fault_reserve_notify(bo);
switch (err) {
case 0:
@@ -177,6 +175,13 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
ret = VM_FAULT_SIGBUS;
goto out_unlock;
}
+
+   if (bo->moving != moving) {
+   spin_lock(&bdev->glob->lru_lock);
+   ttm_bo_move_to_lru_tail(bo, NULL);
+   spin_unlock(&bdev->glob->lru_lock);
+   }
+   dma_fence_put(moving);
}
  
  	/*

@@ -291,7 +296,7 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
  out_io_unlock:
ttm_mem_io_unlock(man);
  out_unlock:
-   ttm_bo_unreserve(bo);
+   reservation_object_unlock(bo->resv);
return ret;
  }
  


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH -next] drm/amdgpu: Fix return value check in amdgpu_allocate_static_csa()

2018-12-03 Thread zhoucm1



On 2018年12月04日 14:39, Wei Yongjun wrote:

Fix the return value check which testing the wrong variable
in amdgpu_allocate_static_csa().

Fixes: 7946340fa389 ("drm/amdgpu: Move csa related code to separate file")
Signed-off-by: Wei Yongjun 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 0c590dd..a5fbc6f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -43,7 +43,7 @@ int amdgpu_allocate_static_csa(struct amdgpu_device *adev, 
struct amdgpu_bo **bo
r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
domain, bo,
NULL, &ptr);
-   if (!bo)
+   if (!r)
return -ENOMEM;
I guess the original is correct as well; if you want to change it, make 
it like below, not your 'if (!r)':

                if (r)
                        return r;

-David
  
  	memset(ptr, 0, size);






___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 02/11] dma-buf: add new dma_fence_chain container v2

2018-12-02 Thread zhoucm1



On 2018年11月28日 22:50, Christian König wrote:

Lockless container implementation similar to a dma_fence_array, but with
only two elements per node and automatic garbage collection.

v2: properly document dma_fence_chain_for_each, add dma_fence_chain_find_seqno,
 drop prev reference during garbage collection if it's not a chain fence.

Signed-off-by: Christian König 
---
  drivers/dma-buf/Makefile  |   3 +-
  drivers/dma-buf/dma-fence-chain.c | 235 ++
  include/linux/dma-fence-chain.h   |  79 +
  3 files changed, 316 insertions(+), 1 deletion(-)
  create mode 100644 drivers/dma-buf/dma-fence-chain.c
  create mode 100644 include/linux/dma-fence-chain.h

diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index 0913a6ccab5a..1f006e083eb9 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,4 +1,5 @@
-obj-y := dma-buf.o dma-fence.o dma-fence-array.o reservation.o seqno-fence.o
+obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
+reservation.o seqno-fence.o
  obj-$(CONFIG_SYNC_FILE)   += sync_file.o
  obj-$(CONFIG_SW_SYNC) += sw_sync.o sync_debug.o
  obj-$(CONFIG_UDMABUF) += udmabuf.o
diff --git a/drivers/dma-buf/dma-fence-chain.c 
b/drivers/dma-buf/dma-fence-chain.c
new file mode 100644
index ..de05101fc48d
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -0,0 +1,235 @@
+/*
+ * fence-chain: chain fences together in a timeline
+ *
+ * Copyright (C) 2018 Advanced Micro Devices, Inc.
+ * Authors:
+ * Christian König 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include 
+
+static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
+
+/**
+ * dma_fence_chain_get_prev - use RCU to get a reference to the previous fence
+ * @chain: chain node to get the previous node from
+ *
+ * Use dma_fence_get_rcu_safe to get a reference to the previous fence of the
+ * chain node.
+ */
+static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain 
*chain)
+{
+   struct dma_fence *prev;
+
+   rcu_read_lock();
+   prev = dma_fence_get_rcu_safe(&chain->prev);
+   rcu_read_unlock();
+   return prev;
+}
+
+/**
+ * dma_fence_chain_walk - chain walking function
+ * @fence: current chain node
+ *
+ * Walk the chain to the next node. Returns the next fence or NULL if we are at
+ * the end of the chain. Garbage collects chain nodes which are already
+ * signaled.
+ */
+struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
+{
+   struct dma_fence_chain *chain, *prev_chain;
+   struct dma_fence *prev, *replacement, *tmp;
+
+   chain = to_dma_fence_chain(fence);
+   if (!chain) {
+   dma_fence_put(fence);
+   return NULL;
+   }
+
+   while ((prev = dma_fence_chain_get_prev(chain))) {
+
+   prev_chain = to_dma_fence_chain(prev);
+   if (prev_chain) {
+   if (!dma_fence_is_signaled(prev_chain->fence))
+   break;
+
+   replacement = dma_fence_chain_get_prev(prev_chain);
+   } else {
+   if (!dma_fence_is_signaled(prev))
+   break;
+
+   replacement = NULL;
+   }
+
+   tmp = cmpxchg(&chain->prev, prev, replacement);
+   if (tmp == prev)
+   dma_fence_put(tmp);
+   else
+   dma_fence_put(replacement);
+   dma_fence_put(prev);
+   }
+
+   dma_fence_put(fence);
+   return prev;
+}
+EXPORT_SYMBOL(dma_fence_chain_walk);
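
(For illustration only: a consumer of this helper walks a chain roughly as below. This is a sketch of what the dma_fence_chain_for_each() macro from this series expands to; dma_fence_chain_walk() consumes the reference passed in and returns a referenced previous node, or NULL at the end of the chain.)

    struct dma_fence *iter = dma_fence_get(fence);

    while (iter) {
        /* e.g. inspect dma_fence_is_signaled(iter) here */
        iter = dma_fence_chain_walk(iter); /* drops iter's ref, returns prev */
    }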
+
+/**
+ * dma_fence_chain_find_seqno - find fence chain node by seqno
+ * @pfence: pointer to the chain node where to start
+ * @seqno: the sequence number to search for
+ *
+ * Advance the fence pointer to the chain node which will signal this sequence
+ * number. If no sequence number is provided then this is a no-op.
+ *
+ * Returns EINVAL if the fence is not a chain node or the sequence number has
+ * not yet advanced far enough.
+ */
+int dma_fence_chain_find_seqno(struct dma_fence **pfence, uint64_t seqno)
+{
+   struct dma_fence_chain *chain;
+
+   if (!seqno)
+   return 0;
+
+   chain = to_dma_fence_chain(*pfence);
+   if (!chain || chain->base.seqno < seqno)
+   return -EINVAL;
+
+   dma_fence_chain_for_each(*pfence) {
+   if ((*pfence)->context != chain->base.context ||
+   

Re: [PATCH 04/11] drm/syncobj: use only a single stub fence

2018-11-29 Thread zhoucm1
Could you move this one to dma-fence as you said? It will be used in 
other places as well.


-David


On 2018年11月28日 22:50, Christian König wrote:

Extract of useful code from the timeline work. Let's use just a single
stub fence instance instead of allocating a new one all the time.

Signed-off-by: Chunming Zhou 
Signed-off-by: Christian König 
---
  drivers/gpu/drm/drm_syncobj.c | 67 ++-
  1 file changed, 35 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index b92e3c726229..f78321338c1f 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,10 +56,8 @@
  #include "drm_internal.h"
  #include 
  
-struct drm_syncobj_stub_fence {

-   struct dma_fence base;
-   spinlock_t lock;
-};
+static DEFINE_SPINLOCK(stub_fence_lock);
+static struct dma_fence stub_fence;
  
  static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence)

  {
@@ -71,6 +69,25 @@ static const struct dma_fence_ops drm_syncobj_stub_fence_ops 
= {
.get_timeline_name = drm_syncobj_stub_fence_get_name,
  };
  
+/**

+ * drm_syncobj_get_stub_fence - return a signaled fence
+ *
+ * Return a stub fence which is already signaled.
+ */
+static struct dma_fence *drm_syncobj_get_stub_fence(void)
+{
+   spin_lock(&stub_fence_lock);
+   if (!stub_fence.ops) {
+   dma_fence_init(&stub_fence,
+  &drm_syncobj_stub_fence_ops,
+  &stub_fence_lock,
+  0, 0);
+   dma_fence_signal_locked(&stub_fence);
+   }
+   spin_unlock(&stub_fence_lock);
+
+   return dma_fence_get(&stub_fence);
+}
  
  /**

   * drm_syncobj_find - lookup and reference a sync object.
@@ -188,23 +205,18 @@ void drm_syncobj_replace_fence(struct drm_syncobj 
*syncobj,
  }
  EXPORT_SYMBOL(drm_syncobj_replace_fence);
  
-static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)

+/**
+ * drm_syncobj_assign_null_handle - assign a stub fence to the sync object
+ * @syncobj: sync object to assign the fence on
+ *
+ * Assign an already signaled stub fence to the sync object.
+ */
+static void drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
  {
-   struct drm_syncobj_stub_fence *fence;
-   fence = kzalloc(sizeof(*fence), GFP_KERNEL);
-   if (fence == NULL)
-   return -ENOMEM;
+   struct dma_fence *fence = drm_syncobj_get_stub_fence();
  
-	spin_lock_init(&fence->lock);

-   dma_fence_init(&fence->base, &drm_syncobj_stub_fence_ops,
-  &fence->lock, 0, 0);
-   dma_fence_signal(&fence->base);
-
-   drm_syncobj_replace_fence(syncobj, &fence->base);
-
-   dma_fence_put(&fence->base);
-
-   return 0;
+   drm_syncobj_replace_fence(syncobj, fence);
+   dma_fence_put(fence);
  }
  
  /**

@@ -272,7 +284,6 @@ EXPORT_SYMBOL(drm_syncobj_free);
  int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags,
   struct dma_fence *fence)
  {
-   int ret;
struct drm_syncobj *syncobj;
  
  	syncobj = kzalloc(sizeof(struct drm_syncobj), GFP_KERNEL);

@@ -283,13 +294,8 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, 
uint32_t flags,
	INIT_LIST_HEAD(&syncobj->cb_list);
	spin_lock_init(&syncobj->lock);
  
-	if (flags & DRM_SYNCOBJ_CREATE_SIGNALED) {

-   ret = drm_syncobj_assign_null_handle(syncobj);
-   if (ret < 0) {
-   drm_syncobj_put(syncobj);
-   return ret;
-   }
-   }
+   if (flags & DRM_SYNCOBJ_CREATE_SIGNALED)
+   drm_syncobj_assign_null_handle(syncobj);
  
  	if (fence)

drm_syncobj_replace_fence(syncobj, fence);
@@ -980,11 +986,8 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void 
*data,
if (ret < 0)
return ret;
  
-	for (i = 0; i < args->count_handles; i++) {

-   ret = drm_syncobj_assign_null_handle(syncobjs[i]);
-   if (ret < 0)
-   break;
-   }
+   for (i = 0; i < args->count_handles; i++)
+   drm_syncobj_assign_null_handle(syncobjs[i]);
  
  	drm_syncobj_array_free(syncobjs, args->count_handles);
  


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH libdrm 4/5] wrap syncobj timeline query/wait APIs for amdgpu v3

2018-11-29 Thread zhoucm1



On 2018年11月28日 22:50, Christian König wrote:

From: Chunming Zhou 

v2: symbols are stored in lexical order.
v3: drop export/import and extra query indirection

Signed-off-by: Chunming Zhou 
Signed-off-by: Christian König 
---
  amdgpu/amdgpu-symbol-check |  2 ++
  amdgpu/amdgpu.h| 39 +++
  amdgpu/amdgpu_cs.c | 23 +++
  3 files changed, 64 insertions(+)

diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
index 6f5e0f95..4553736f 100755
--- a/amdgpu/amdgpu-symbol-check
+++ b/amdgpu/amdgpu-symbol-check
@@ -49,8 +49,10 @@ amdgpu_cs_submit
  amdgpu_cs_submit_raw
  amdgpu_cs_syncobj_export_sync_file
  amdgpu_cs_syncobj_import_sync_file
+amdgpu_cs_syncobj_query
  amdgpu_cs_syncobj_reset
  amdgpu_cs_syncobj_signal
+amdgpu_cs_syncobj_timeline_wait
  amdgpu_cs_syncobj_wait
  amdgpu_cs_wait_fences
  amdgpu_cs_wait_semaphore
diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index dc51659a..330658a0 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1489,6 +1489,45 @@ int amdgpu_cs_syncobj_wait(amdgpu_device_handle dev,
   int64_t timeout_nsec, unsigned flags,
   uint32_t *first_signaled);
  
+/**

+ *  Wait for one or all sync objects on their points to signal.
+ *
+ * \param   dev- \c [in] self-explanatory
+ * \param   handles - \c [in] array of sync object handles
+ * \param   points - \c [in] array of sync points to wait
+ * \param   num_handles - \c [in] self-explanatory
+ * \param   timeout_nsec - \c [in] self-explanatory
+ * \param   flags   - \c [in] a bitmask of DRM_SYNCOBJ_WAIT_FLAGS_*
+ * \param   first_signaled - \c [in] self-explanatory
+ *
+ * \return   0 on success\n
+ *  -ETIME - Timeout
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_timeline_wait(amdgpu_device_handle dev,
+   uint32_t *handles, uint64_t *points,
+   unsigned num_handles,
+   int64_t timeout_nsec, unsigned flags,
+   uint32_t *first_signaled);
+/**
+ *  Query sync objects payloads.
+ *
+ * \param   dev- \c [in] self-explanatory
+ * \param   handles - \c [in] array of sync object handles
+ * \param   points - \c [out] array of sync points returned, which presents
+ * syncobj payload.
+ * \param   num_handles - \c [in] self-explanatory
+ *
+ * \return   0 on success\n
+ *  -ETIME - Timeout
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_query(amdgpu_device_handle dev,
+   uint32_t *handles, uint64_t *points,
+   unsigned num_handles);
+
  /**
   *  Export kernel sync object to shareable fd.
   *
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 3b8231aa..e4a547c6 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -661,6 +661,29 @@ drm_public int amdgpu_cs_syncobj_wait(amdgpu_device_handle 
dev,
  flags, first_signaled);
  }
  
+drm_public int amdgpu_cs_syncobj_timeline_wait(amdgpu_device_handle dev,

+  uint32_t *handles, uint64_t 
*points,
+  unsigned num_handles,
+  int64_t timeout_nsec, unsigned 
flags,
+  uint32_t *first_signaled)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjTimelineWait(dev->fd, handles, points, num_handles,
+ timeout_nsec, flags, first_signaled);
+}
+
+drm_public int amdgpu_cs_syncobj_query(amdgpu_device_handle dev,
+  uint32_t *handles, uint64_t *points,
This interface is public to UMD; I think they would prefer "uint64_t **points" 
for batch queries. I've verified this before; it works well and is more convenient.
If num_handles is removed, meaning only one syncobj is queried, I agree 
with "uint64_t *point".


-David
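
(For illustration only, not part of the patch: a batch query with the signature above could look like this, where h0/h1 are hypothetical syncobj handles.)

    uint32_t handles[2] = { h0, h1 };
    uint64_t payloads[2] = { 0, 0 };
    int r = amdgpu_cs_syncobj_query(dev, handles, payloads, 2);
    /* on success, payloads[i] holds the current payload of handles[i] */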

+  unsigned num_handles)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjQuery(dev->fd, handles, points, num_handles);
+}
+
  drm_public int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
uint32_t handle,
int *shared_fd)


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: restart syncobj timeline changes v2

2018-11-29 Thread zhoucm1
Looks very, very good. I applied them locally and tested with 
./amdgpu_test -s 9 and IGT's syncobj_basic/wait today.


+Daniel, Chris, Eric, could you have a look as well?


-David



On 2018年11月28日 22:50, Christian König wrote:

Tested this patch set more extensively in the last two weeks and fixed tons of 
additional bugs.

Still only testing with hand made DRM patches, but those are now rather 
reliable at least on amdgpu. Setting up igt is the next thing on the TODO list.

UAPI seems to be pretty solid already except for two changes:
1. Dropping an extra flag in the wait interface which was default behavior 
anyway.
2. Dropped the extra indirection in the query interface.

In addition to that, I'm wondering if we shouldn't replace the flags parameter of 
find_fence() with a timeout value instead, to limit how long we want to wait for 
a fence to appear.

Please test and comment,
Christian.
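
(A rough sketch of that find_fence() idea, with a hypothetical signature rather than an actual patch:)

    int drm_syncobj_find_fence(struct drm_file *file_private,
                               u32 handle, u64 point,
                               int64_t timeout_nsec, /* would replace 'flags' */
                               struct dma_fence **fence);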

___
amd-gfx mailing list
amd-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH libdrm 3/5] add timeline wait/query ioctl v2

2018-11-29 Thread zhoucm1



On 2018年11月28日 22:50, Christian König wrote:

From: Chunming Zhou 

v2: drop export/import

Signed-off-by: Chunming Zhou 
---
  xf86drm.c | 44 
  xf86drm.h |  8 
  2 files changed, 52 insertions(+)

diff --git a/xf86drm.c b/xf86drm.c
index 71ad54ba..afa2f466 100644
--- a/xf86drm.c
+++ b/xf86drm.c
@@ -4277,3 +4277,47 @@ drm_public int drmSyncobjSignal(int fd, const uint32_t 
*handles,
  ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_SIGNAL, &args);
  return ret;
  }
+
+drm_public int drmSyncobjTimelineWait(int fd, uint32_t *handles, uint64_t 
*points,
+ unsigned num_handles,
+ int64_t timeout_nsec, unsigned flags,
+ uint32_t *first_signaled)
+{
+struct drm_syncobj_timeline_wait args;
+int ret;
+
+memclear(args);
+args.handles = (uintptr_t)handles;
+args.points = (uint64_t)(uintptr_t)points;
+args.timeout_nsec = timeout_nsec;
+args.count_handles = num_handles;
+args.flags = flags;
+
+ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, &args);
+if (ret < 0)
+return -errno;
+
+if (first_signaled)
+*first_signaled = args.first_signaled;
+return ret;
+}
+
+
+drm_public int drmSyncobjQuery(int fd, uint32_t *handles, uint64_t *points,
We should change 'uint64_t *points' to 'uint64_t **points'; otherwise, 
userspace always needs to copy into its own variables.


-David
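
(To illustrate the extra copy with the current signature, as hypothetical caller code:)

    uint64_t tmp[n];                       /* flat array required by the wrapper */
    drmSyncobjQuery(fd, handles, tmp, n);
    for (i = 0; i < n; i++)
        *caller_points[i] = tmp[i];        /* copy back into the caller's variables */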

+  uint32_t handle_count)
+{
+struct drm_syncobj_timeline_query args;
+int ret;
+
+memclear(args);
+args.handles = (uintptr_t)handles;
+args.points = (uint64_t)(uintptr_t)points;
+args.count_handles = handle_count;
+
+ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_QUERY, &args);
+if (ret)
+return ret;
+return 0;
+}
+
+
diff --git a/xf86drm.h b/xf86drm.h
index 7773d71a..2dae1694 100644
--- a/xf86drm.h
+++ b/xf86drm.h
@@ -870,11 +870,19 @@ extern int drmSyncobjFDToHandle(int fd, int obj_fd, 
uint32_t *handle);
  
  extern int drmSyncobjImportSyncFile(int fd, uint32_t handle, int sync_file_fd);

  extern int drmSyncobjExportSyncFile(int fd, uint32_t handle, int 
*sync_file_fd);
+extern int drmSyncobjImportSyncFile2(int fd, uint32_t handle, uint64_t point, 
int sync_file_fd);
+extern int drmSyncobjExportSyncFile2(int fd, uint32_t handle, uint64_t point, 
int *sync_file_fd);
  extern int drmSyncobjWait(int fd, uint32_t *handles, unsigned num_handles,
  int64_t timeout_nsec, unsigned flags,
  uint32_t *first_signaled);
  extern int drmSyncobjReset(int fd, const uint32_t *handles, uint32_t 
handle_count);
  extern int drmSyncobjSignal(int fd, const uint32_t *handles, uint32_t 
handle_count);
+extern int drmSyncobjTimelineWait(int fd, uint32_t *handles, uint64_t *points,
+ unsigned num_handles,
+ int64_t timeout_nsec, unsigned flags,
+ uint32_t *first_signaled);
+extern int drmSyncobjQuery(int fd, uint32_t *handles, uint64_t *points,
+  uint32_t handle_count);
  
  #if defined(__cplusplus)

  }


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 01/11] dma-buf: make fence sequence numbers 64 bit

2018-11-29 Thread zhoucm1



On 2018年11月28日 22:50, Christian König wrote:

For a lot of use cases we need 64bit sequence numbers. Currently drivers
overload the dma_fence structure to store the additional bits.

Stop doing that and make the sequence number in the dma_fence always
64bit.

For compatibility with hardware which can do only 32bit sequences the
comparisons in __dma_fence_is_later still only takes the lower 32bits as
significant.

Can't we compare the 64-bit variables directly? Can we do it as below?

-static inline bool __dma_fence_is_later(u32 f1, u32 f2)
+static inline bool __dma_fence_is_later(u64 f1, u64 f2)
 {
-   return (int)(f1 - f2) > 0;
+   return (f1 > f2) ? true : false;

 }
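
(As an illustration of why the patch keeps wraparound semantics for hardware limited to 32-bit sequence numbers — a standalone userspace example, not kernel code:)

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* wraparound-aware comparison as in the patch: only the low 32 bits count */
    static bool is_later_wrap32(uint64_t f1, uint64_t f2)
    {
        return (int32_t)((uint32_t)f1 - (uint32_t)f2) > 0;
    }

    int main(void)
    {
        /* a 32-bit hardware counter that just wrapped: 2 follows 0xfffffffe */
        printf("wrap-aware: %d\n", is_later_wrap32(2, 0xfffffffe));     /* 1 */
        printf("plain >   : %d\n", (uint64_t)2 > (uint64_t)0xfffffffe); /* 0 */
        return 0;
    }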

-David



Signed-off-by: Christian König 
---
  drivers/dma-buf/dma-fence.c|  2 +-
  drivers/dma-buf/sw_sync.c  |  2 +-
  drivers/dma-buf/sync_file.c|  4 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c |  2 +-
  drivers/gpu/drm/i915/i915_sw_fence.c   |  2 +-
  drivers/gpu/drm/i915/intel_engine_cs.c |  2 +-
  drivers/gpu/drm/vgem/vgem_fence.c  |  4 ++--
  include/linux/dma-fence.h  | 14 +++---
  8 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 1551ca7df394..37e24b69e94b 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -615,7 +615,7 @@ EXPORT_SYMBOL(dma_fence_wait_any_timeout);
   */
  void
  dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
-  spinlock_t *lock, u64 context, unsigned seqno)
+  spinlock_t *lock, u64 context, u64 seqno)
  {
BUG_ON(!lock);
BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index 53c1d6d36a64..32dcf7b4c935 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -172,7 +172,7 @@ static bool timeline_fence_enable_signaling(struct 
dma_fence *fence)
  static void timeline_fence_value_str(struct dma_fence *fence,
char *str, int size)
  {
-   snprintf(str, size, "%d", fence->seqno);
+   snprintf(str, size, "%lld", fence->seqno);
  }
  
  static void timeline_fence_timeline_value_str(struct dma_fence *fence,

diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
index 35dd06479867..4f6305ca52c8 100644
--- a/drivers/dma-buf/sync_file.c
+++ b/drivers/dma-buf/sync_file.c
@@ -144,7 +144,7 @@ char *sync_file_get_name(struct sync_file *sync_file, char 
*buf, int len)
} else {
struct dma_fence *fence = sync_file->fence;
  
-		snprintf(buf, len, "%s-%s%llu-%d",

+   snprintf(buf, len, "%s-%s%llu-%lld",
 fence->ops->get_driver_name(fence),
 fence->ops->get_timeline_name(fence),
 fence->context,
@@ -258,7 +258,7 @@ static struct sync_file *sync_file_merge(const char *name, 
struct sync_file *a,
  
  			i_b++;

} else {
-   if (pt_a->seqno - pt_b->seqno <= INT_MAX)
+   if (__dma_fence_is_later(pt_a->seqno, pt_b->seqno))
add_fence(fences, , pt_a);
else
add_fence(fences, , pt_b);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
index 12f2bf97611f..bfaf5c6323be 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
@@ -388,7 +388,7 @@ void amdgpu_sa_bo_dump_debug_info(struct amdgpu_sa_manager 
*sa_manager,
   soffset, eoffset, eoffset - soffset);
  
  		if (i->fence)

-   seq_printf(m, " protected by 0x%08x on context %llu",
+   seq_printf(m, " protected by 0x%016llx on context %llu",
   i->fence->seqno, i->fence->context);
  
  		seq_printf(m, "\n");

diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
b/drivers/gpu/drm/i915/i915_sw_fence.c
index 6dbeed079ae5..11bcdabd5177 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -393,7 +393,7 @@ static void timer_i915_sw_fence_wake(struct timer_list *t)
if (!fence)
return;
  
-	pr_notice("Asynchronous wait on fence %s:%s:%x timed out (hint:%pS)\n",

+   pr_notice("Asynchronous wait on fence %s:%s:%llx timed out 
(hint:%pS)\n",
  cb->dma->ops->get_driver_name(cb->dma),
  cb->dma->ops->get_timeline_name(cb->dma),
  cb->dma->seqno,
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c 
b/drivers/gpu/drm/i915/intel_engine_cs.c
index 217ed3ee1cab..f28a66c67d34 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1236,7 +1236,7 @@ static void print_request(struct drm_printer *m,
  
  	x = 

Re: [PATCH 7/7] drm/syncobj: use the timeline point in drm_syncobj_find_fence

2018-11-23 Thread zhoucm1



On 2018年11月23日 18:10, Koenig, Christian wrote:

Am 23.11.18 um 03:36 schrieb zhoucm1:


On 2018年11月22日 19:30, Christian König wrote:

Am 22.11.18 um 07:52 schrieb zhoucm1:


On 2018年11月15日 19:12, Christian König wrote:

Implement finding the right timeline point in drm_syncobj_find_fence.

Signed-off-by: Christian König 
---
   drivers/gpu/drm/drm_syncobj.c | 10 +-
   1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c
b/drivers/gpu/drm/drm_syncobj.c
index 589d884ccd58..d42c51520da4 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -307,9 +307,17 @@ int drm_syncobj_find_fence(struct drm_file
*file_private,
   return -ENOENT;
     *fence = drm_syncobj_fence_get(syncobj);
-    if (!*fence) {
+    if (!*fence)
   ret = -EINVAL;
+
+    if (!ret && point) {
+    dma_fence_chain_for_each(*fence) {
+    if (!to_dma_fence_chain(*fence) ||
+    (*fence)->seqno <= point)
+    break;

This condition isn't enough to find the proper point. Two examples:
a. No garbage collection happens; the points in the chain are
1---3---6---9---12---18---20. If the user wants point 17, then
we should return node 18.

And that is exactly what's wrong in the original logic. In this case
we need to return 12, not 18 because point 17 could have already been
garbage collected.

I don't think so; in case 'a' I already assume there is no garbage
collection. If the user wants point 17, then we should return node 18.
Timeline means point[N] must be signaled later than point[N-1].
Point[12] can only guarantee that point[1]~point[12] are signaled.
Point[18] being signaled guarantees that point[17] is signaled.
So in this case we need to return 18, not 12; that is the key timeline concept.

No, exactly that's incorrect. When we ask for 17 and can't find it then
this means it either never existed or that it is signaled already.

Returning a lower number in this case or even a stub fence is perfectly
fine since we only need to wait for that one in this case.

If we return 18 in this case then we add incorrect synchronization when
there shouldn't be any.
No, that will make the timeline not work at all and break timeline semantics
completely.


If point 18 and point 20 don't exist, the chain is
1---3---6---9---12. If the user wants point 17, do you also return
12? If yes, that is absolutely incorrect. The answer should be NO,
right? Point 17 should be waited on until a bigger point arrives.


For the chain 1---3---6---9---12---18---20, if the user wants to wait on
any one of points 13, 14, 15, 16, 17 or 18, we must wait for point 18; this is
the timeline semantic. A quick sketch of that rule follows below.
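A toy model of that rule (illustration only; the struct and helper below are
made up for the example and are not the kernel's chain-fence code):

#include <stdint.h>
#include <stddef.h>

/* Chain nodes ordered newest -> oldest, e.g. 20 -> 18 -> 12 -> 9 -> 6 -> 3 -> 1. */
struct chain_node {
	uint64_t seqno;
	struct chain_node *prev;	/* next-older point */
};

/* A wait for `point` must use the oldest node whose seqno is >= point:
 * signaling point[N] implies point[N-1] is signaled, so for point 17 in
 * 1-3-6-9-12-18-20 this returns the node with seqno 18, never 12. */
static struct chain_node *find_wait_node(struct chain_node *newest,
					 uint64_t point)
{
	struct chain_node *n, *best = NULL;

	for (n = newest; n; n = n->prev) {
		if (n->seqno >= point)
			best = n;	/* candidate; an older node may still fit */
		else
			break;		/* everything older is < point too */
	}
	return best;	/* NULL: point is newer than the whole chain, keep waiting */
}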


You can also check sw_sync.c for timeline meaning.

-David


Christian.


-David

b. Garbage collection happens on point 6; the chain would be updated to
1---3---9---12---18---20. If the user wants point 5, then we
should return node 3, but if the user wants point 7, then we
should return node 9.

Why? That doesn't seem to make any sense to me.


I still have no idea how to satisfy all these requirements with your
current chain-fence. All this logic is the same as what we encountered
before; we're walking through it again. After solving these problems, I
guess the design will end up similar to the previous one.

In fact, I don't know what problem the previous design has; maybe there
are some bugs, but can't we fix those bugs over time? Who can make
sure their implementation never has bugs?

Well, there were numerous problems with the original design. For
example we need to reject the requirement that timeline fences are in
order, because that doesn't make sense in the kernel.

When userspace does something like submitting fences in the order 1,
5, 3 then it is broken and can keep the pieces. In other words the
kernel should not care about that, but rather make sure that it never
loses any synchronization no matter what.

Regards,
Christian.



-David

+    }
   }
+
   drm_syncobj_put(syncobj);
   return ret;
   }


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 7/7] drm/syncobj: use the timeline point in drm_syncobj_find_fence

2018-11-22 Thread zhoucm1



On 2018年11月22日 19:30, Christian König wrote:

Am 22.11.18 um 07:52 schrieb zhoucm1:



On 2018年11月15日 19:12, Christian König wrote:

Implement finding the right timeline point in drm_syncobj_find_fence.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/drm_syncobj.c | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index 589d884ccd58..d42c51520da4 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -307,9 +307,17 @@ int drm_syncobj_find_fence(struct drm_file 
*file_private,

  return -ENOENT;
    *fence = drm_syncobj_fence_get(syncobj);
-    if (!*fence) {
+    if (!*fence)
  ret = -EINVAL;
+
+    if (!ret && point) {
+    dma_fence_chain_for_each(*fence) {
+    if (!to_dma_fence_chain(*fence) ||
+    (*fence)->seqno <= point)
+    break;

This condition isn't enough to find the proper point. Two examples:
a. No garbage collection happens; the points in the chain are 
1---3---6---9---12---18---20. If the user wants point 17, then 
we should return node 18.


And that is exactly what's wrong in the original logic. In this case 
we need to return 12, not 18 because point 17 could have already been 
garbage collected.
I don't think so; in case 'a' I already assume there is no garbage 
collection. If the user wants point 17, then we should return node 18.

Timeline means point[N] must be signaled later than point[N-1].
Point[12] can only guarantee that point[1]~point[12] are signaled.
Point[18] being signaled guarantees that point[17] is signaled.
So in this case we need to return 18, not 12, which is the key timeline concept.

-David


b. Garbage collection happens on point 6; the chain would be updated to 
1---3---9---12---18---20. If the user wants point 5, then we should 
return node 3, but if the user wants point 7, then we should 
return node 9.


Why? That doesn't seem to make any sense to me.

I still have no idea how to satisfy all these requirements with your 
current chain-fence. All this logic is the same as what we encountered 
before; we're walking through it again. After solving these problems, I 
guess the design will end up similar to the previous one.


In fact, I don't know what problem the previous design has; maybe there 
are some bugs, but can't we fix those bugs over time? Who can make 
sure their implementation never has bugs?


Well, there were numerous problems with the original design. For 
example we need to reject the requirement that timeline fences are in 
order, because that doesn't make sense in the kernel.


When userspace does something like submitting fences in the order 1, 
5, 3 then it is broken and can keep the pieces. In other words the 
kernel should not care about that, but rather make sure that it never 
loses any synchronization no matter what.


Regards,
Christian.




-David

+    }
  }
+
  drm_syncobj_put(syncobj);
  return ret;
  }






___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 7/7] drm/syncobj: use the timeline point in drm_syncobj_find_fence

2018-11-21 Thread zhoucm1



On 2018年11月15日 19:12, Christian König wrote:

Implement finding the right timeline point in drm_syncobj_find_fence.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/drm_syncobj.c | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 589d884ccd58..d42c51520da4 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -307,9 +307,17 @@ int drm_syncobj_find_fence(struct drm_file *file_private,
return -ENOENT;
  
  	*fence = drm_syncobj_fence_get(syncobj);

-   if (!*fence) {
+   if (!*fence)
ret = -EINVAL;
+
+   if (!ret && point) {
+   dma_fence_chain_for_each(*fence) {
+   if (!to_dma_fence_chain(*fence) ||
+   (*fence)->seqno <= point)
+   break;

This condition isn't enough to find the proper point. Two examples:
a. No garbage collection happens; the points in the chain are 
1---3---6---9---12---18---20. If the user wants point 17, then we 
should return node 18.
b. Garbage collection happens on point 6; the chain would be updated to 
1---3---9---12---18---20. If the user wants point 5, then we should 
return node 3, but if the user wants point 7, then we should return 
node 9.


I still have no idea how to satisfy all these requirements with your 
current chain-fence. All this logic is the same as what we encountered 
before; we're walking through it again. After solving these problems, I guess 
the design will end up similar to the previous one.


In fact, I don't know what problem the previous design has; maybe there are 
some bugs, but can't we fix those bugs over time? Who can make sure their 
implementation never has bugs?



-David

+   }
}
+
drm_syncobj_put(syncobj);
return ret;
  }


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 2/2] drm: Revert syncobj timeline changes.

2018-11-12 Thread zhoucm1



On 2018年11月12日 18:16, Christian König wrote:

Am 09.11.18 um 23:26 schrieb Eric Anholt:

Eric Anholt  writes:


[ Unknown signature status ]
zhoucm1  writes:


On 2018年11月09日 00:52, Christian König wrote:

Am 08.11.18 um 17:07 schrieb Koenig, Christian:

Am 08.11.18 um 17:04 schrieb Eric Anholt:

Daniel suggested I submit this, since we're still seeing regressions
from it.  This is a revert to before 48197bc564c7 ("drm: add syncobj
timeline support v9") and its followon fixes.

This is a harmless false positive from lockdep, Chunming and I are
already working on a fix.

On the other hand we had enough trouble with that patch, so if it
really bothers you feel free to add my Acked-by: Christian König
  and push it.

NAK, please no, I don't think this is needed; the warning isn't
related to the syncobj timeline at all, but to a fence-array
implementation flaw that syncobj merely exposes.
In addition, Christian already has a fix for this warning, which I've tested.
Christian, please send it out for public review.

I backed out my revert of #2 (#1 still necessary) after adding the
lockdep regression fix, and now my CTS run got oomkilled after just a
few hours, with these notable lines in the unreclaimable slab info list:

[ 6314.373099] drm_sched_fence69095KB  69095KB
[ 6314.373653] kmemleak_object   428249KB 428384KB
[ 6314.373736] kmalloc-262144   256KB256KB
[ 6314.373743] kmalloc-131072   128KB128KB
[ 6314.373750] kmalloc-65536 64KB 64KB
[ 6314.373756] kmalloc-32768   1472KB   1728KB
[ 6314.373763] kmalloc-16384 64KB 64KB
[ 6314.373770] kmalloc-8192 208KB208KB
[ 6314.373778] kmalloc-40962408KB   2408KB
[ 6314.373784] kmalloc-2048 288KB336KB
[ 6314.373792] kmalloc-10241457KB   1512KB
[ 6314.373800] kmalloc-512  854KB   1048KB
[ 6314.373808] kmalloc-256  188KB268KB
[ 6314.373817] kmalloc-19269141KB  69142KB
[ 6314.373824] kmalloc-64 47703KB  47704KB
[ 6314.373886] kmalloc-12846396KB  46396KB
[ 6314.373894] kmem_cache31KB 35KB

No results from kmemleak, though.

OK, it looks like the #2 revert probably isn't related to the OOM issue.
Running a single job on otherwise unused DRM, watching /proc/slabinfo
every second for drm_sched_fence, I get:

drm_sched_fence0  0192   211 : tunables   32   168 : 
slabdata  0  0  0 : globalstat   0  0 0000  
  000 : cpustat  0  0  0  0
drm_sched_fence   16 21192   211 : tunables   32   168 : 
slabdata  1  1  0 : globalstat  16 16 1000  
  000 : cpustat  5  1  6  0
drm_sched_fence   13 21192   211 : tunables   32   168 : 
slabdata  1  1  0 : globalstat  16 16 1000  
  000 : cpustat  5  1  6  0
drm_sched_fence6 21192   211 : tunables   32   168 : 
slabdata  1  1  0 : globalstat  16 16 1000  
  000 : cpustat  5  1  6  0
drm_sched_fence4 21192   211 : tunables   32   168 : 
slabdata  1  1  0 : globalstat  16 16 1000  
  000 : cpustat  5  1  6  0
drm_sched_fence2 21192   211 : tunables   32   168 : 
slabdata  1  1  0 : globalstat  16 16 1000  
  000 : cpustat  5  1  6  0
drm_sched_fence0 21192   211 : tunables   32   168 : 
slabdata  0  1  0 : globalstat  16 16 1000  
  000 : cpustat  5  1  6  0

So we generate a ton of fences, and I guess free them slowly because of
RCU?  And presumably kmemleak was sucking up lots of memory because of
how many of these objects were laying around.

Hi Eric,
Thanks for testing. I checked the code: it looks like we forgot to signal 
the fence-array immediately, since no callback is installed on the 
fence-array; everything then waits in fence_wait or until the syncobj is 
freed, and that's why you see them "free slowly".
Maybe we just need a one-line change, as attached. Could you please give 
it a try with your tests?


BTW, I also didn't find a fence leak; otherwise, please point me to it.

Thanks,
David


That is certainly possible. Another possibility is that we don't drop 
the reference in dma-fence-array early enough.


E.g. the dma-fence-array will keep the reference to its fences until 
it is destroyed, which is a bit late when you chain multiple 
dma-fence-array objects together.


David, can you take a look at this and propose a fix? That would 
probably be good to have fixed in dma-fence-array separately from the 
timeline work.
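One possible shape of that fix, as a hypothetical sketch (the callback and
struct names follow drivers/dma-buf/dma-fence-array.c, but the early-put
logic is only an illustration of the idea, not the posted fix, and the real
callback defers signaling to irq_work, which is elided here):

static void dma_fence_array_cb_func(struct dma_fence *f,
				    struct dma_fence_cb *cb)
{
	struct dma_fence_array_cb *array_cb =
		container_of(cb, struct dma_fence_array_cb, cb);
	struct dma_fence_array *array = array_cb->array;

	if (atomic_dec_and_test(&array->num_pending)) {
		unsigned int i;

		dma_fence_signal(&array->base);

		/* All children have signaled; the array no longer needs its
		 * references to them, so drop them here instead of waiting
		 * for dma_fence_array_release(). */
		for (i = 0; i < array->num_fences; i++) {
			dma_fence_put(array->fences[i]);
			array->fences[i] = NULL;
		}
	}
	dma_fence_put(&array->base);
}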

Re: [PATCH 2/2] drm: Revert syncobj timeline changes.

2018-11-12 Thread zhoucm1



On 2018年11月12日 18:48, Chris Wilson wrote:

Quoting Christian König (2018-11-12 10:16:01)

Am 09.11.18 um 23:26 schrieb Eric Anholt:

 Eric Anholt  writes:


 [ Unknown signature status ]
 zhoucm1  writes:


 On 2018年11月09日 00:52, Christian König wrote:

 Am 08.11.18 um 17:07 schrieb Koenig, Christian:

 Am 08.11.18 um 17:04 schrieb Eric Anholt:

 Daniel suggested I submit this, since we're still 
seeing regressions
 from it.  This is a revert to before 48197bc564c7 
("drm: add syncobj
 timeline support v9") and its followon fixes.

 This is a harmless false positive from lockdep, Chunming 
and I are
 already working on a fix.

 On the other hand we had enough trouble with that patch, so if 
it
 really bothers you feel free to add my Acked-by: Christian 
König
  and push it.

 NAK, please no, I don't think this is needed; the warning isn't
related to the syncobj timeline at all, but to a fence-array
implementation flaw that syncobj merely exposes.
 In addition, Christian already has a fix for this warning, which I've
tested.
 Christian, please send it out for public review.

 I backed out my revert of #2 (#1 still necessary) after adding the
 lockdep regression fix, and now my CTS run got oomkilled after just a
 few hours, with these notable lines in the unreclaimable slab info 
list:

 [ 6314.373099] drm_sched_fence69095KB  69095KB
 [ 6314.373653] kmemleak_object   428249KB 428384KB
 [ 6314.373736] kmalloc-262144   256KB256KB
 [ 6314.373743] kmalloc-131072   128KB128KB
 [ 6314.373750] kmalloc-65536 64KB 64KB
 [ 6314.373756] kmalloc-32768   1472KB   1728KB
 [ 6314.373763] kmalloc-16384 64KB 64KB
 [ 6314.373770] kmalloc-8192 208KB208KB
 [ 6314.373778] kmalloc-40962408KB   2408KB
 [ 6314.373784] kmalloc-2048 288KB336KB
 [ 6314.373792] kmalloc-10241457KB   1512KB
 [ 6314.373800] kmalloc-512  854KB   1048KB
 [ 6314.373808] kmalloc-256  188KB268KB
 [ 6314.373817] kmalloc-19269141KB  69142KB
 [ 6314.373824] kmalloc-64 47703KB  47704KB
 [ 6314.373886] kmalloc-12846396KB  46396KB
 [ 6314.373894] kmem_cache31KB 35KB

 No results from kmemleak, though.

 OK, it looks like the #2 revert probably isn't related to the OOM issue.
 Running a single job on otherwise unused DRM, watching /proc/slabinfo
 every second for drm_sched_fence, I get:

 drm_sched_fence0  0192   211 : tunables   32   168 
: slabdata  0  0  0 : globalstat   0  0 000
0000 : cpustat  0  0  0  0
 drm_sched_fence   16 21192   211 : tunables   32   168 
: slabdata  1  1  0 : globalstat  16 16 100
0000 : cpustat  5  1  6  0
 drm_sched_fence   13 21192   211 : tunables   32   168 
: slabdata  1  1  0 : globalstat  16 16 100
0000 : cpustat  5  1  6  0
 drm_sched_fence6 21192   211 : tunables   32   168 
: slabdata  1  1  0 : globalstat  16 16 100
0000 : cpustat  5  1  6  0
 drm_sched_fence4 21192   211 : tunables   32   168 
: slabdata  1  1  0 : globalstat  16 16 100
0000 : cpustat  5  1  6  0
 drm_sched_fence2 21192   211 : tunables   32   168 
: slabdata  1  1  0 : globalstat  16 16 100
0000 : cpustat  5  1  6  0
 drm_sched_fence0 21192   211 : tunables   32   168 
: slabdata  0  1  0 : globalstat  16 16 100
0000 : cpustat  5  1  6  0

 So we generate a ton of fences, and I guess free them slowly because of
 RCU?  And presumably kmemleak was sucking up lots of memory because of
 how many of these objects were laying around.


That is certainly possible. Another possibility is that we don't drop the
reference in dma-fence-array early enough.

E.g. the dma-fence-array will keep the reference to its fences until it is
destroyed, which is a bit late when you chain multiple dma-fence-array object

Re: [PATCH 2/2] drm: Revert syncobj timeline changes.

2018-11-12 Thread zhoucm1



On 2018年11月12日 18:16, Christian König wrote:

Am 09.11.18 um 23:26 schrieb Eric Anholt:

Eric Anholt  writes:


[ Unknown signature status ]
zhoucm1  writes:


On 2018年11月09日 00:52, Christian König wrote:

Am 08.11.18 um 17:07 schrieb Koenig, Christian:

Am 08.11.18 um 17:04 schrieb Eric Anholt:

Daniel suggested I submit this, since we're still seeing regressions
from it.  This is a revert to before 48197bc564c7 ("drm: add syncobj
timeline support v9") and its followon fixes.

This is a harmless false positive from lockdep, Chunming and I are
already working on a fix.

On the other hand we had enough trouble with that patch, so if it
really bothers you feel free to add my Acked-by: Christian König
  and push it.

NAK, please no, I don't think this is needed; the warning isn't
related to the syncobj timeline at all, but to a fence-array
implementation flaw that syncobj merely exposes.
In addition, Christian already has a fix for this warning, which I've tested.
Christian, please send it out for public review.

I backed out my revert of #2 (#1 still necessary) after adding the
lockdep regression fix, and now my CTS run got oomkilled after just a
few hours, with these notable lines in the unreclaimable slab info list:

[ 6314.373099] drm_sched_fence69095KB  69095KB
[ 6314.373653] kmemleak_object   428249KB 428384KB
[ 6314.373736] kmalloc-262144   256KB256KB
[ 6314.373743] kmalloc-131072   128KB128KB
[ 6314.373750] kmalloc-65536 64KB 64KB
[ 6314.373756] kmalloc-32768   1472KB   1728KB
[ 6314.373763] kmalloc-16384 64KB 64KB
[ 6314.373770] kmalloc-8192 208KB208KB
[ 6314.373778] kmalloc-40962408KB   2408KB
[ 6314.373784] kmalloc-2048 288KB336KB
[ 6314.373792] kmalloc-10241457KB   1512KB
[ 6314.373800] kmalloc-512  854KB   1048KB
[ 6314.373808] kmalloc-256  188KB268KB
[ 6314.373817] kmalloc-19269141KB  69142KB
[ 6314.373824] kmalloc-64 47703KB  47704KB
[ 6314.373886] kmalloc-12846396KB  46396KB
[ 6314.373894] kmem_cache31KB 35KB

No results from kmemleak, though.

OK, it looks like the #2 revert probably isn't related to the OOM issue.

Before you judge whether it is a memleak, to be honest, you should scan it first.


Running a single job on otherwise unused DRM, watching /proc/slabinfo
every second for drm_sched_fence, I get:

drm_sched_fence0  0192   211 : tunables   32   168 : 
slabdata  0  0  0 : globalstat   0  0 0000  
  000 : cpustat  0  0  0  0
drm_sched_fence   16 21192   211 : tunables   32   168 : 
slabdata  1  1  0 : globalstat  16 16 1000  
  000 : cpustat  5  1  6  0
drm_sched_fence   13 21192   211 : tunables   32   168 : 
slabdata  1  1  0 : globalstat  16 16 1000  
  000 : cpustat  5  1  6  0
drm_sched_fence6 21192   211 : tunables   32   168 : 
slabdata  1  1  0 : globalstat  16 16 1000  
  000 : cpustat  5  1  6  0
drm_sched_fence4 21192   211 : tunables   32   168 : 
slabdata  1  1  0 : globalstat  16 16 1000  
  000 : cpustat  5  1  6  0
drm_sched_fence2 21192   211 : tunables   32   168 : 
slabdata  1  1  0 : globalstat  16 16 1000  
  000 : cpustat  5  1  6  0
drm_sched_fence0 21192   211 : tunables   32   168 : 
slabdata  0  1  0 : globalstat  16 16 1000  
  000 : cpustat  5  1  6  0

So we generate a ton of fences, and I guess free them slowly because of
RCU?  And presumably kmemleak was sucking up lots of memory because of
how many of these objects were laying around.


That is certainly possible. Another possibility is that we don't drop 
the reference in dma-fence-array early enough.


E.g. the dma-fence-array will keep the reference to its fences until 
it is destroyed, which is a bit late when you chain multiple 
dma-fence-array objects together.

Good point, but it needs confirming.



David, can you take a look at this and propose a fix? That would 
probably be good to have fixed in dma-fence-array separately from the 
timeline work.

Yeah, I'll find some free time for it.

Thanks,
David Zhou


Thanks,
Christian.



___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel




___
dri-devel mailing list

Re: [PATCH 2/2] drm: Revert syncobj timeline changes.

2018-11-08 Thread zhoucm1



On 2018年11月09日 00:52, Christian König wrote:

Am 08.11.18 um 17:07 schrieb Koenig, Christian:

Am 08.11.18 um 17:04 schrieb Eric Anholt:

Daniel suggested I submit this, since we're still seeing regressions
from it.  This is a revert to before 48197bc564c7 ("drm: add syncobj
timeline support v9") and its followon fixes.

This is a harmless false positive from lockdep, Chunming and I are
already working on a fix.


On the other hand we had enough trouble with that patch, so if it 
really bothers you feel free to add my Acked-by: Christian König 
 and push it.
NAK, please no, I don't think this is needed; the warning isn't related to 
the syncobj timeline at all, but to a fence-array implementation flaw that 
syncobj merely exposes.
In addition, Christian already has a fix for this warning, which I've tested. 
Christian, please send it out for public review.
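For reference, a compressed sketch of the pattern in the trace below
(hypothetical helper; the real path is dma_fence_add_callback() ->
dma_fence_array_enable_signaling() -> dma_fence_add_callback() on each
child). All dma_fence spinlocks share one lockdep class, so lockdep reports
recursion even though two distinct locks are taken:

static void enable_array_signaling(struct dma_fence *waited,
				   struct dma_fence *child)
{
	unsigned long flags, child_flags;

	spin_lock_irqsave(waited->lock, flags);		/* held by the waiter */
	spin_lock_irqsave(child->lock, child_flags);	/* same lock class:
							 * "possible recursive
							 * locking detected" */
	/* ... install the callback on the child ... */
	spin_unlock_irqrestore(child->lock, child_flags);
	spin_unlock_irqrestore(waited->lock, flags);
}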


-David


Christian.



Christian.


Fixes this on first V3D testcase execution:

[   48.767088] 
[   48.772410] WARNING: possible recursive locking detected
[   48.39] 4.19.0-rc6+ #489 Not tainted
[   48.781668] 
[   48.786993] shader_runner/3284 is trying to acquire lock:
[   48.792408] ce309d7f (&(>lock)->rlock){}, at: 
dma_fence_add_callback+0x30/0x23c

[   48.800714]
[   48.800714] but task is already holding lock:
[   48.806559] c5952bd3 (&(>lock)->rlock){}, at: 
dma_fence_add_callback+0x30/0x23c

[   48.814862]
[   48.814862] other info that might help us debug this:
[   48.821410]  Possible unsafe locking scenario:
[   48.821410]
[   48.827338]    CPU0
[   48.829788]    
[   48.832239]   lock(&(>lock)->rlock);
[   48.836434]   lock(&(>lock)->rlock);
[   48.840640]
[   48.840640]  *** DEADLOCK ***
[   48.840640]
[   48.846582]  May be due to missing lock nesting notation
[  130.763560] 1 lock held by cts-runner/3270:
[  130.767745]  #0: 7834b793 (&(>lock)->rlock){-...}, at: 
dma_fence_add_callback+0x30/0x23c

[  130.776461]
 stack backtrace:
[  130.780825] CPU: 1 PID: 3270 Comm: cts-runner Not tainted 
4.19.0-rc6+ #486

[  130.787706] Hardware name: Broadcom STB (Flattened Device Tree)
[  130.793645] [] (unwind_backtrace) from [] 
(show_stack+0x10/0x14)
[  130.801404] [] (show_stack) from [] 
(dump_stack+0xa8/0xd4)
[  130.808642] [] (dump_stack) from [] 
(__lock_acquire+0x848/0x1a68)
[  130.816483] [] (__lock_acquire) from [] 
(lock_acquire+0xd8/0x22c)
[  130.824326] [] (lock_acquire) from [] 
(_raw_spin_lock_irqsave+0x54/0x68)
[  130.832777] [] (_raw_spin_lock_irqsave) from 
[] (dma_fence_add_callback+0x30/0x23c)
[  130.842183] [] (dma_fence_add_callback) from 
[] (dma_fence_array_enable_signaling+0x58/0xec)
[  130.852371] [] (dma_fence_array_enable_signaling) from 
[] (dma_fence_add_callback+0xe8/0x23c)
[  130.862647] [] (dma_fence_add_callback) from 
[] (drm_syncobj_wait_ioctl+0x518/0x614)
[  130.872143] [] (drm_syncobj_wait_ioctl) from 
[] (drm_ioctl_kernel+0xb0/0xf0)
[  130.880940] [] (drm_ioctl_kernel) from [] 
(drm_ioctl+0x1d8/0x390)
[  130.888782] [] (drm_ioctl) from [] 
(do_vfs_ioctl+0xb0/0x8ac)
[  130.896187] [] (do_vfs_ioctl) from [] 
(ksys_ioctl+0x34/0x60)
[  130.903593] [] (ksys_ioctl) from [] 
(ret_fast_syscall+0x0/0x28)


Cc: Chunming Zhou 
Cc: Christian König 
Cc: Daniel Vetter 
Signed-off-by: Eric Anholt 
---
   drivers/gpu/drm/drm_syncobj.c | 359 
+++---

   include/drm/drm_syncobj.h |  73 ---
   2 files changed, 105 insertions(+), 327 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index da8175d9c6ff..e2c5b3ca4824 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,9 +56,6 @@
   #include "drm_internal.h"
   #include 
   -/* merge normal syncobj to timeline syncobj, the point interval 
is 1 */

-#define DRM_SYNCOBJ_BINARY_POINT 1
-
   struct drm_syncobj_stub_fence {
   struct dma_fence base;
   spinlock_t lock;
@@ -74,29 +71,7 @@ static const struct dma_fence_ops 
drm_syncobj_stub_fence_ops = {

   .get_timeline_name = drm_syncobj_stub_fence_get_name,
   };
   -struct drm_syncobj_signal_pt {
-    struct dma_fence_array *fence_array;
-    u64    value;
-    struct list_head list;
-};
-
-static DEFINE_SPINLOCK(signaled_fence_lock);
-static struct dma_fence signaled_fence;
   -static struct dma_fence *drm_syncobj_get_stub_fence(void)
-{
-    spin_lock(&signaled_fence_lock);
-    if (!signaled_fence.ops) {
-    dma_fence_init(&signaled_fence,
-   &drm_syncobj_stub_fence_ops,
-   &signaled_fence_lock,
-   0, 0);
-    dma_fence_signal_locked(&signaled_fence);
-    }
-    spin_unlock(&signaled_fence_lock);
-
-    return dma_fence_get(&signaled_fence);
-}
   /**
    * drm_syncobj_find - lookup and reference a sync object.
    * @file_private: drm file private pointer
@@ -123,27 +98,6 @@ struct drm_syncobj *drm_syncobj_find(struct 
drm_file *file_private,

   }
   EXPORT_SYMBOL(drm_syncobj_find);
   -static struct 

Re: [PATCH 3/5] drm: add timeline support for syncobj export/import

2018-11-02 Thread zhoucm1


On 2018年11月02日 17:54, Koenig, Christian wrote:

Am 02.11.18 um 10:42 schrieb zhoucm1:


On 2018年11月02日 16:46, Christian König wrote:

Am 02.11.18 um 09:25 schrieb Chunming Zhou:

user space can specify timeline point fence to export/import.

Signed-off-by: Chunming Zhou 
Cc: Daniel Rakos 
Cc: Jason Ekstrand 
Cc: Bas Nieuwenhuizen 
Cc: Dave Airlie 
Cc: Christian König 
Cc: Chris Wilson 

 From the coding it looks good to me, but I can't judge if that really
makes sense to export/import individual points.

I would have rather expected that always the whole timeline is
exported/imported. Otherwise the whole thing with waiting for sync
points to appear doesn't really make sense to me.

We can CPU-wait on individual points, so why can't we export/import them?
I confirmed with Daniel before, he said the extension need both them,
export/import individual points and export/import the whole timeline
semaphore.
More discussion is
https://gitlab.khronos.org/vulkan/vulkan/merge_requests/2696

Ah! It looked initially to me that this was the only way to
export/import timeline sync objects.

Thinking more about it wouldn't it be better to provide a function to
signal a sync point from another sync point?

That would provide the same functionality, would be cleaner to implement
and more flexible on top.

E.g. if you want to export only a specific sync point, you first create a
new syncobj and then move this sync point over into that syncobj.

Sorry, I didn't completely get your meaning. Could you go into a bit more
detail? In any case, I think we need to meet these export/import goals:
1. Export a specific sync point or a binary syncobj fence to a sync file fd or 
handle.

2. Import a fence from an fd/handle to a timeline sync point or a binary syncobj.
3. Export/import the whole timeline semaphore.

I'm still not sure how what you mentioned can achieve these; goal 1, for
example, would look like the sketch below.
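A simplified sketch of goal 1, built from the quoted patch's
drm_syncobj_export_sync_file() with its new point argument (function name
here is made up; error handling abridged):

static int export_point_to_sync_file(struct drm_file *file_private,
				     u32 handle, u64 point, int *p_fd)
{
	struct sync_file *sync_file;
	struct dma_fence *fence;
	int ret;
	int fd = get_unused_fd_flags(O_CLOEXEC);

	if (fd < 0)
		return fd;

	/* Resolve the syncobj handle to the fence at `point`. */
	ret = drm_syncobj_find_fence(file_private, handle, point, 0, &fence);
	if (ret)
		goto err_put_fd;

	sync_file = sync_file_create(fence);
	dma_fence_put(fence);
	if (!sync_file) {
		ret = -ENOMEM;
		goto err_put_fd;
	}

	fd_install(fd, sync_file->file);
	*p_fd = fd;
	return 0;

err_put_fd:
	put_unused_fd(fd);
	return ret;
}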

Thanks,
David


Otherwise you mix multiple operations into one IOCTL and that is usually
not a good idea.

Regards,
Christian.


Thanks,
David Zhou

Christian.


---
   drivers/gpu/drm/drm_internal.h |  4 ++
   drivers/gpu/drm/drm_ioctl.c    |  4 ++
   drivers/gpu/drm/drm_syncobj.c  | 76
++
   include/uapi/drm/drm.h | 11 +
   4 files changed, 88 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_internal.h
b/drivers/gpu/drm/drm_internal.h
index 9c4826411a3c..5ad6cbdb68ab 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -181,6 +181,10 @@ int drm_syncobj_handle_to_fd_ioctl(struct
drm_device *dev, void *data,
  struct drm_file *file_private);
   int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void
*data,
  struct drm_file *file_private);
+int drm_syncobj_handle_to_fd_ioctl2(struct drm_device *dev, void
*data,
+    struct drm_file *file_private);
+int drm_syncobj_fd_to_handle_ioctl2(struct drm_device *dev, void
*data,
+    struct drm_file *file_private);
   int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
  struct drm_file *file_private);
   int drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void
*data,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 7578ef6dc1d1..364d26e949cf 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -673,6 +673,10 @@ static const struct drm_ioctl_desc drm_ioctls[]
= {
     DRM_UNLOCKED|DRM_RENDER_ALLOW),
   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE,
drm_syncobj_fd_to_handle_ioctl,
     DRM_UNLOCKED|DRM_RENDER_ALLOW),
+    DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD2,
drm_syncobj_handle_to_fd_ioctl2,
+  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+    DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE2,
drm_syncobj_fd_to_handle_ioctl2,
+  DRM_UNLOCKED|DRM_RENDER_ALLOW),
   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
     DRM_UNLOCKED|DRM_RENDER_ALLOW),
   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT,
drm_syncobj_timeline_wait_ioctl,
diff --git a/drivers/gpu/drm/drm_syncobj.c
b/drivers/gpu/drm/drm_syncobj.c
index 9ad58d0d21cd..dffc42ba2f91 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -677,7 +677,7 @@ static int drm_syncobj_fd_to_handle(struct
drm_file *file_private,
   }
     static int drm_syncobj_import_sync_file_fence(struct drm_file
*file_private,
-  int fd, int handle)
+  int fd, int handle, uint64_t point)
   {
   struct dma_fence *fence = sync_file_get_fence(fd);
   struct drm_syncobj *syncobj;
@@ -691,14 +691,14 @@ static int
drm_syncobj_import_sync_file_fence(struct drm_file *file_private,
   return -ENOENT;
   }
   -    drm_syncobj_replace_fence(syncobj, 0, fence);
+    drm_syncobj_replace_fence(syncobj, point, fence);
   dma_fence_put(fence);
   drm_syncobj_put(syncobj);
   return 0;
   }
     static int

Re: [PATCH 3/5] drm: add timeline support for syncobj export/import

2018-11-02 Thread zhoucm1



On 2018年11月02日 16:46, Christian König wrote:

Am 02.11.18 um 09:25 schrieb Chunming Zhou:

user space can specify timeline point fence to export/import.

Signed-off-by: Chunming Zhou 
Cc: Daniel Rakos 
Cc: Jason Ekstrand 
Cc: Bas Nieuwenhuizen 
Cc: Dave Airlie 
Cc: Christian König 
Cc: Chris Wilson 


From the coding it looks good to me, but I can't judge if that really 
makes sense to export/import individual points.


I would have rather expected that always the whole timeline is 
exported/imported. Otherwise the whole thing with waiting for sync 
points to appear doesn't really make sense to me.

We can CPU-wait on individual points, so why can't we export/import them?
I confirmed with Daniel before, he said the extension need both them, 
export/import individual points and export/import the whole timeline 
semaphore.
More discussion is 
https://gitlab.khronos.org/vulkan/vulkan/merge_requests/2696


Thanks,
David Zhou


Christian.


---
  drivers/gpu/drm/drm_internal.h |  4 ++
  drivers/gpu/drm/drm_ioctl.c    |  4 ++
  drivers/gpu/drm/drm_syncobj.c  | 76 ++
  include/uapi/drm/drm.h | 11 +
  4 files changed, 88 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_internal.h 
b/drivers/gpu/drm/drm_internal.h

index 9c4826411a3c..5ad6cbdb68ab 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -181,6 +181,10 @@ int drm_syncobj_handle_to_fd_ioctl(struct 
drm_device *dev, void *data,

 struct drm_file *file_private);
  int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_private);
+int drm_syncobj_handle_to_fd_ioctl2(struct drm_device *dev, void *data,
+    struct drm_file *file_private);
+int drm_syncobj_fd_to_handle_ioctl2(struct drm_device *dev, void *data,
+    struct drm_file *file_private);
  int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_private);
  int drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void 
*data,

diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 7578ef6dc1d1..364d26e949cf 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -673,6 +673,10 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
    DRM_UNLOCKED|DRM_RENDER_ALLOW),
  DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE, 
drm_syncobj_fd_to_handle_ioctl,

    DRM_UNLOCKED|DRM_RENDER_ALLOW),
+    DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD2, 
drm_syncobj_handle_to_fd_ioctl2,

+  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+    DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE2, 
drm_syncobj_fd_to_handle_ioctl2,

+  DRM_UNLOCKED|DRM_RENDER_ALLOW),
  DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
    DRM_UNLOCKED|DRM_RENDER_ALLOW),
  DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, 
drm_syncobj_timeline_wait_ioctl,
diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index 9ad58d0d21cd..dffc42ba2f91 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -677,7 +677,7 @@ static int drm_syncobj_fd_to_handle(struct 
drm_file *file_private,

  }
    static int drm_syncobj_import_sync_file_fence(struct drm_file 
*file_private,

-  int fd, int handle)
+  int fd, int handle, uint64_t point)
  {
  struct dma_fence *fence = sync_file_get_fence(fd);
  struct drm_syncobj *syncobj;
@@ -691,14 +691,14 @@ static int 
drm_syncobj_import_sync_file_fence(struct drm_file *file_private,

  return -ENOENT;
  }
  -    drm_syncobj_replace_fence(syncobj, 0, fence);
+    drm_syncobj_replace_fence(syncobj, point, fence);
  dma_fence_put(fence);
  drm_syncobj_put(syncobj);
  return 0;
  }
    static int drm_syncobj_export_sync_file(struct drm_file 
*file_private,

-    int handle, int *p_fd)
+    int handle, uint64_t point, int *p_fd)
  {
  int ret;
  struct dma_fence *fence;
@@ -708,7 +708,7 @@ static int drm_syncobj_export_sync_file(struct 
drm_file *file_private,

  if (fd < 0)
  return fd;
  -    ret = drm_syncobj_find_fence(file_private, handle, 0, 0, &fence);
+    ret = drm_syncobj_find_fence(file_private, handle, point, 0, &fence);

  if (ret)
  goto err_put_fd;
  @@ -817,9 +817,14 @@ drm_syncobj_handle_to_fd_ioctl(struct 
drm_device *dev, void *data,
  args->flags != 
DRM_SYNCOBJ_HANDLE_TO_FD_FLAGS_EXPORT_SYNC_FILE)

  return -EINVAL;
  -    if (args->flags & 
DRM_SYNCOBJ_HANDLE_TO_FD_FLAGS_EXPORT_SYNC_FILE)
+    if (args->flags & 
DRM_SYNCOBJ_HANDLE_TO_FD_FLAGS_EXPORT_SYNC_FILE) {

+    struct drm_syncobj *syncobj = drm_syncobj_find(file_private,
+   args->handle);
+    if (!syncobj || syncobj->type != DRM_SYNCOBJ_TYPE_BINARY)
+

Re: [PATCH] drm/syncobj: Mark local add/remove callback functions as static

2018-11-01 Thread zhoucm1



On 2018年10月31日 20:07, Chris Wilson wrote:

drivers/gpu/drm/drm_syncobj.c:181:6: warning: no previous prototype for 
‘drm_syncobj_add_callback’ [-Wmissing-prototypes]
drivers/gpu/drm/drm_syncobj.c:190:6: warning: no previous prototype for 
‘drm_syncobj_remove_callback’ [-Wmissing-prototypes]

Fixing that leads to

drivers/gpu/drm/drm_syncobj.c:181:13: warning: ‘drm_syncobj_add_callback’ 
defined but not used [-Wunused-function]

so remove the unused drm_syncobj_add_callback() entirely.

Signed-off-by: Chris Wilson 

Reviewed-by: Chunming Zhou 


---
  drivers/gpu/drm/drm_syncobj.c | 19 +--
  1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index d3e2335b88f9..4dca5f7e8c4b 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -123,9 +123,9 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file 
*file_private,
  }
  EXPORT_SYMBOL(drm_syncobj_find);
  
-static struct dma_fence

-*drm_syncobj_find_signal_pt_for_point(struct drm_syncobj *syncobj,
- uint64_t point)
+static struct dma_fence *
+drm_syncobj_find_signal_pt_for_point(struct drm_syncobj *syncobj,
+uint64_t point)
  {
struct drm_syncobj_signal_pt *signal_pt;
  
@@ -178,17 +178,8 @@ static void drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,

	mutex_unlock(&syncobj->cb_mutex);
  }
  
-void drm_syncobj_add_callback(struct drm_syncobj *syncobj,
-			      struct drm_syncobj_cb *cb,
-			      drm_syncobj_func_t func)
-{
-	mutex_lock(&syncobj->cb_mutex);
-	drm_syncobj_add_callback_locked(syncobj, cb, func);
-	mutex_unlock(&syncobj->cb_mutex);
-}
-
-void drm_syncobj_remove_callback(struct drm_syncobj *syncobj,
-				 struct drm_syncobj_cb *cb)
+static void drm_syncobj_remove_callback(struct drm_syncobj *syncobj,
+					struct drm_syncobj_cb *cb)
  {
	mutex_lock(&syncobj->cb_mutex);
	list_del_init(&cb->node);


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [igt-dev] [PATCH] RFC: Make igts for cross-driver stuff mandatory?

2018-10-26 Thread zhoucm1



On 2018年10月26日 16:32, Daniel Vetter wrote:

On Fri, Oct 26, 2018 at 5:50 AM Zhou, David(ChunMing)
 wrote:

To make igt cross-driver, I think you should rename it first; it shouldn't be an
Intel-specific thing. No company wants its employees working on another company's stuff.
You could rename it to DGT (drm graphics test) and publish it alongside libdrm, 
or merge it directly into libdrm; then everyone can use it and develop it on the same 
page. This is only my personal opinion.

We renamed it to IGT GPU tools; that was even enough for the ARM folks.
If this is seriously what AMD expects before considering it, I'm not sure
what to say.

Alex, Christian, is this the official AMD stance that you can't touch
stuff because of the letter i?
Nope, as I said before, this is just my personal thought, and I'm not sure 
what the others' opinions are.


-David

-Daniel



Regards,
David


-Original Message-
From: dri-devel  On Behalf Of Eric
Anholt
Sent: Friday, October 26, 2018 12:36 AM
To: Sean Paul ; Daniel Vetter 
Cc: IGT development ; Intel Graphics
Development ; DRI Development ; amd-...@lists.freedesktop.org
Subject: Re: [igt-dev] [PATCH] RFC: Make igts for cross-driver stuff
mandatory?

Sean Paul  writes:


On Fri, Oct 19, 2018 at 10:50:49AM +0200, Daniel Vetter wrote:

Hi all,

This is just to collect feedback on this idea, and see whether the
overall dri-devel community stands on all this. I think the past few
cross-vendor uapi extensions all came with igts attached, and
personally I think there's lots of value in having them: A
cross-vendor interface isn't useful if every driver implements it
slightly differently.

I think there's 2 questions here:

- Do we want to make such testcases mandatory?


Yes, more testing == better code.



- If yes, are we there yet, or is there something crucially missing
   still?

In my experience, no. Last week while trying to replicate an intel-gfx
CI failure, I tried compiling igt for one of my (intel) chromebooks.
It seems like cross-compilation (or, in my case, just specifying
prefix/ld_library_path/sbin_path) is broken on igt. If we want to
impose restrictions across the entire subsystem, we need to make sure
that everyone can build and deploy igt easily.

I managed to hack around everything and get it working, but I still
haven't tried switching out the toolchain. Once we have some GitLab CI
to validate cross-compilation, then we can consider making IGT mandatory.

It's possible that I'm just a meson n00b and didn't use the right
incantation, so maybe it already works, but then we need better

documentation.

I've pasted my horrible hacks below, I also didn't have libunwind, so
removed its usage.

I've also had to cut out libunwind for cross-compiling on many occasions.
Worst library.





___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/syncobj: Avoid kmalloc(GFP_KERNEL) under spinlock

2018-10-26 Thread zhoucm1

Thanks. Could you help submit it to drm-misc again?

-David


On 2018年10月26日 15:43, Christian König wrote:

Am 26.10.18 um 08:20 schrieb Chunming Zhou:
drivers/gpu/drm/drm_syncobj.c:202:4-14: ERROR: function 
drm_syncobj_find_signal_pt_for_point called on line 390 inside lock 
on line 389 but uses GFP_KERNEL


   Find functions that refer to GFP_KERNEL but are called with locks 
held.


Generated by: scripts/coccinelle/locks/call_kern.cocci

v2:
syncobj->timeline still needs protect.

v3:
use a global signaled fence instead of re-allocation.

v4:
Don't need moving lock.
Don't expose func.

v5:
rename func and directly return.

Tested by: syncobj_wait and ./deqp-vk -n dEQP-VK.*semaphore* with
lock debug kernel options enabled.

Signed-off-by: Chunming Zhou 
Cc: Maarten Lankhorst 
Cc: intel-...@lists.freedesktop.org
Cc: Christian König 
Cc: Chris Wilson 
CC: Julia Lawall 
Reviewed-by: Chris Wilson 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/drm_syncobj.c | 36 ++-
  1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index b7eaa603f368..d1c6f21c72b5 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -80,6 +80,23 @@ struct drm_syncobj_signal_pt {
  struct list_head list;
  };
  +static DEFINE_SPINLOCK(signaled_fence_lock);
+static struct dma_fence signaled_fence;
+
+static struct dma_fence *drm_syncobj_get_stub_fence(void)
+{
+    spin_lock(&signaled_fence_lock);
+    if (!signaled_fence.ops) {
+    dma_fence_init(&signaled_fence,
+   &drm_syncobj_stub_fence_ops,
+   &signaled_fence_lock,
+   0, 0);
+    dma_fence_signal_locked(&signaled_fence);
+    }
+    spin_unlock(&signaled_fence_lock);
+
+    return dma_fence_get(&signaled_fence);
+}
  /**
   * drm_syncobj_find - lookup and reference a sync object.
   * @file_private: drm file private pointer
@@ -113,23 +130,8 @@ static struct dma_fence
  struct drm_syncobj_signal_pt *signal_pt;
    if ((syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) &&
-    (point <= syncobj->timeline)) {
-    struct drm_syncobj_stub_fence *fence =
-    kzalloc(sizeof(struct drm_syncobj_stub_fence),
-    GFP_KERNEL);
-
-    if (!fence)
-    return NULL;
-    spin_lock_init(&fence->lock);
-    dma_fence_init(&fence->base,
-   &drm_syncobj_stub_fence_ops,
-   &fence->lock,
-   syncobj->timeline_context,
-   point);
-
-    dma_fence_signal(&fence->base);
-    return &fence->base;
-    }
+    (point <= syncobj->timeline))
+    return drm_syncobj_get_stub_fence();
    list_for_each_entry(signal_pt, &syncobj->signal_pt_list, list) {
  if (point > signal_pt->value)




___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Intel-gfx] [PATCH] drm: fix call_kern.cocci warnings

2018-10-25 Thread zhoucm1



On 2018年10月25日 17:38, Christian König wrote:

Am 25.10.18 um 11:35 schrieb Daniel Vetter:

On Thu, Oct 25, 2018 at 04:36:34PM +0800, Chunming Zhou wrote:
drivers/gpu/drm/drm_syncobj.c:202:4-14: ERROR: function 
drm_syncobj_find_signal_pt_for_point called on line 390 inside lock 
on line 389 but uses GFP_KERNEL


   Find functions that refer to GFP_KERNEL but are called with locks 
held.

Uh ... if your internal validation doesn't catch stuff like this (boils
down to a) run code b) with sleep debugging enabled) what exactly do you
do?


Who says we do any internal validation? ;)
It's my mistake; I switched to a new kernel and didn't check whether these 
DEBUG options were enabled.

I shouldn't make this kind of mistake, anyway.

Sorry for that,
David


But kidding aside, this code path is only taken after garbage 
collection has already cleaned up the original fence and we need to 
return a dummy.


I think we just never covered that in the testing.

Christian.



Not really inspiring confindence.
-Daniel


Semantic patch information:
   The proposed change of converting the GFP_KERNEL is not 
necessarily the
   correct one.  It may be desired to unlock the lock, or to not 
call the

   function under the lock in the first place.

Generated by: scripts/coccinelle/locks/call_kern.cocci

Signed-off-by: Chunming Zhou 
Cc: Maarten Lankhorst 
Cc: intel-...@lists.freedesktop.org
Cc: Christian König 
---
  drivers/gpu/drm/drm_syncobj.c | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index b7eaa603f368..c9099549ddcb 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -111,6 +111,7 @@ static struct dma_fence
    uint64_t point)
  {
  struct drm_syncobj_signal_pt *signal_pt;
+    struct dma_fence *fence = NULL;
    if ((syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) &&
  (point <= syncobj->timeline)) {
@@ -131,15 +132,18 @@ static struct dma_fence
  return &fence->base;
  }
  +    spin_lock(&syncobj->pt_lock);
  list_for_each_entry(signal_pt, &syncobj->signal_pt_list, list) {
  if (point > signal_pt->value)
  continue;
  if ((syncobj->type == DRM_SYNCOBJ_TYPE_BINARY) &&
  (point != signal_pt->value))
  continue;
-    return dma_fence_get(&signal_pt->fence_array->base);
+    fence = dma_fence_get(&signal_pt->fence_array->base);
+    break;
  }
-    return NULL;
+    spin_unlock(&syncobj->pt_lock);
+    return fence;
  }
    static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj,
@@ -166,9 +170,7 @@ static void drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,
  }
    mutex_lock(&syncobj->cb_mutex);
-    spin_lock(&syncobj->pt_lock);
  *fence = drm_syncobj_find_signal_pt_for_point(syncobj, pt_value);
-    spin_unlock(&syncobj->pt_lock);
  if (!*fence)
  drm_syncobj_add_callback_locked(syncobj, cb, func);
  mutex_unlock(&syncobj->cb_mutex);
@@ -379,11 +381,9 @@ drm_syncobj_point_get(struct drm_syncobj *syncobj, u64 point, u64 flags,
  if (ret < 0)
  return ret;
  }
-    spin_lock(&syncobj->pt_lock);
  *fence = drm_syncobj_find_signal_pt_for_point(syncobj, point);
  if (!*fence)
  ret = -EINVAL;
-    spin_unlock(&syncobj->pt_lock);
  return ret;
  }
  --
2.17.1

___
Intel-gfx mailing list
intel-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm: fix call_kern.cocci warnings v2

2018-10-25 Thread zhoucm1



On 2018年10月25日 17:30, Koenig, Christian wrote:

Am 25.10.18 um 11:28 schrieb zhoucm1:




On 2018年10月25日 17:23, Koenig, Christian wrote:

Am 25.10.18 um 11:20 schrieb zhoucm1:




On 2018年10月25日 17:11, Koenig, Christian wrote:

Am 25.10.18 um 11:03 schrieb zhoucm1:




On 2018年10月25日 16:56, Christian König wrote:

+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -111,15 +111,16 @@ static struct dma_fence
    uint64_t point)
  {
  struct drm_syncobj_signal_pt *signal_pt;
+    struct dma_fence *f = NULL;
+    struct drm_syncobj_stub_fence *fence =
+    kzalloc(sizeof(struct drm_syncobj_stub_fence),
+    GFP_KERNEL);
  +    if (!fence)
+    return NULL;
+    spin_lock(&syncobj->pt_lock);


How about using a single static stub fence like I suggested? 

Sorry, I don't get your meaning; how would we do that?


Add a new function drm_syncobj_stub_fence_init() which is called 
from drm_core_init() when the module is loaded.


In drm_syncobj_stub_fence_init() you initialize one static 
stub_fence which is then used over and over again.

It seems that would not work; we could need more than one stub fence.


Mhm, why? I mean it is just a signaled fence,


If A gets the global stub fence and hasn't put it yet, and then B comes 
along, how does B re-use the global stub fence? Did I misunderstand 
anything?


dma_fence_get()? The whole thing is reference counted, every time you 
need it you grab another reference.


Since we initialize it globally, the reference count never becomes zero, so 
it is never released.
I get your meaning now: it's always a signaled fence, so there's no need to 
re-initialize it. That's OK.


Thanks,
David



Christian.



David

context and sequence number are irrelevant.

Christian.



David


Since its reference count never goes down to zero it should never 
be freed. In doubt maybe add a .free callback which just calls 
BUG() to catch reference count issues.


Christian.



Thanks,
David












___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm: fix call_kern.cocci warnings v2

2018-10-25 Thread zhoucm1



On 2018年10月25日 17:23, Koenig, Christian wrote:

Am 25.10.18 um 11:20 schrieb zhoucm1:




On 2018年10月25日 17:11, Koenig, Christian wrote:

Am 25.10.18 um 11:03 schrieb zhoucm1:




On 2018年10月25日 16:56, Christian König wrote:

+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -111,15 +111,16 @@ static struct dma_fence
    uint64_t point)
  {
  struct drm_syncobj_signal_pt *signal_pt;
+    struct dma_fence *f = NULL;
+    struct drm_syncobj_stub_fence *fence =
+    kzalloc(sizeof(struct drm_syncobj_stub_fence),
+    GFP_KERNEL);
  +    if (!fence)
+    return NULL;
+    spin_lock(&syncobj->pt_lock);


How about using a single static stub fence like I suggested? 

Sorry, I don't get your meaning; how would we do that?


Add a new function drm_syncobj_stub_fence_init() which is called 
from drm_core_init() when the module is loaded.


In drm_syncobj_stub_fence_init() you initialize one static 
stub_fence which is then used over and over again.

It seems that would not work; we could need more than one stub fence.


Mhm, why? I mean it is just a signaled fence,


If A gets the global stub fence and hasn't put it yet, and then B comes along, 
how does B re-use the global stub fence? Did I misunderstand anything?


David

context and sequence number are irrelevant.

Christian.



David


Since its reference count never goes down to zero it should never be 
freed. In doubt maybe add a .free callback which just calls BUG() to 
catch reference count issues.


Christian.



Thanks,
David








___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm: fix call_kern.cocci warnings v2

2018-10-25 Thread zhoucm1



On 2018年10月25日 17:11, Koenig, Christian wrote:

Am 25.10.18 um 11:03 schrieb zhoucm1:




On 2018年10月25日 16:56, Christian König wrote:

+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -111,15 +111,16 @@ static struct dma_fence
    uint64_t point)
  {
  struct drm_syncobj_signal_pt *signal_pt;
+    struct dma_fence *f = NULL;
+    struct drm_syncobj_stub_fence *fence =
+    kzalloc(sizeof(struct drm_syncobj_stub_fence),
+    GFP_KERNEL);
  +    if (!fence)
+    return NULL;
+    spin_lock(&syncobj->pt_lock);


How about using a single static stub fence like I suggested? 

Sorry, I don't get your meaning; how would we do that?


Add a new function drm_syncobj_stub_fence_init() which is called from 
drm_core_init() when the module is loaded.


In drm_syncobj_stub_fence_init() you initialize one static stub_fence 
which is then used over and over again.

It seems that would not work; we could need more than one stub fence.

David


Since its reference count never goes down to zero it should never be 
freed. In doubt maybe add a .free callback which just calls BUG() to 
catch reference count issues.


Christian.



Thanks,
David




___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm: fix call_kern.cocci warnings v2

2018-10-25 Thread zhoucm1



On 2018年10月25日 16:56, Christian König wrote:

+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -111,15 +111,16 @@ static struct dma_fence
    uint64_t point)
  {
  struct drm_syncobj_signal_pt *signal_pt;
+    struct dma_fence *f = NULL;
+    struct drm_syncobj_stub_fence *fence =
+    kzalloc(sizeof(struct drm_syncobj_stub_fence),
+    GFP_KERNEL);
  +    if (!fence)
+    return NULL;
+    spin_lock(&syncobj->pt_lock);


How about using a single static stub fence like I suggested? 

Sorry, I don't get your meaning; how would we do that?

Thanks,
David
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm: fix call_kern.cocci warnings

2018-10-25 Thread zhoucm1

Please ignore this one; please review v2 instead.



On 2018年10月25日 16:36, Chunming Zhou wrote:

drivers/gpu/drm/drm_syncobj.c:202:4-14: ERROR: function 
drm_syncobj_find_signal_pt_for_point called on line 390 inside lock on line 389 
but uses GFP_KERNEL

   Find functions that refer to GFP_KERNEL but are called with locks held.

Semantic patch information:
   The proposed change of converting the GFP_KERNEL is not necessarily the
   correct one.  It may be desired to unlock the lock, or to not call the
   function under the lock in the first place.

Generated by: scripts/coccinelle/locks/call_kern.cocci

Signed-off-by: Chunming Zhou 
Cc: Maarten Lankhorst 
Cc: intel-...@lists.freedesktop.org
Cc: Christian König 
---
  drivers/gpu/drm/drm_syncobj.c | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index b7eaa603f368..c9099549ddcb 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -111,6 +111,7 @@ static struct dma_fence
  uint64_t point)
  {
struct drm_syncobj_signal_pt *signal_pt;
+   struct dma_fence *fence = NULL;
  
  	if ((syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) &&

(point <= syncobj->timeline)) {
@@ -131,15 +132,18 @@ static struct dma_fence
		return &fence->base;
	}
  
+	spin_lock(&syncobj->pt_lock);
	list_for_each_entry(signal_pt, &syncobj->signal_pt_list, list) {
		if (point > signal_pt->value)
			continue;
		if ((syncobj->type == DRM_SYNCOBJ_TYPE_BINARY) &&
		    (point != signal_pt->value))
			continue;
-		return dma_fence_get(&signal_pt->fence_array->base);
+		fence = dma_fence_get(&signal_pt->fence_array->base);
+		break;
	}
-	return NULL;
+	spin_unlock(&syncobj->pt_lock);
+	return fence;
  }
  
  static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj,
@@ -166,9 +170,7 @@ static void drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,
	}
  
	mutex_lock(&syncobj->cb_mutex);
-	spin_lock(&syncobj->pt_lock);
	*fence = drm_syncobj_find_signal_pt_for_point(syncobj, pt_value);
-	spin_unlock(&syncobj->pt_lock);
	if (!*fence)
		drm_syncobj_add_callback_locked(syncobj, cb, func);
	mutex_unlock(&syncobj->cb_mutex);
@@ -379,11 +381,9 @@ drm_syncobj_point_get(struct drm_syncobj *syncobj, u64 point, u64 flags,
	if (ret < 0)
		return ret;
	}
-	spin_lock(&syncobj->pt_lock);
	*fence = drm_syncobj_find_signal_pt_for_point(syncobj, point);
	if (!*fence)
		ret = -EINVAL;
-	spin_unlock(&syncobj->pt_lock);
return ret;
  }
  


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm: fix call_kern.cocci warnings (fwd)

2018-10-25 Thread zhoucm1

will send a fix soon.


Thanks,

David


On 2018年10月25日 15:57, Koenig, Christian wrote:

Am 25.10.18 um 09:51 schrieb Maarten Lankhorst:

Op 25-10-18 om 08:53 schreef Christian König:

Am 25.10.18 um 03:28 schrieb Zhou, David(ChunMing):

Reviewed-by: Chunming Zhou 

NAK, GFP_ATOMIC should be avoided.

The correct solution is to move the allocation out of the spinlock or drop the 
lock and reacquire.

Yeah +1. Especially in a case like this where it's obvious to prevent. :)

Another possibility would be to not allocate the dummy fence at all.

E.g. we just need a global instance of that which is always signaled and
has a reference count of +1.

Christian.
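For reference, a minimal sketch of that suggestion (it mirrors the
drm_syncobj_get_stub_fence() helper from the later patch; the init-hook
placement in drm_core_init() is an assumption):

static DEFINE_SPINLOCK(signaled_fence_lock);
static struct dma_fence signaled_fence;

/* Assumed to be called once, e.g. from drm_core_init(). */
void drm_syncobj_stub_fence_init(void)
{
	dma_fence_init(&signaled_fence, &drm_syncobj_stub_fence_ops,
		       &signaled_fence_lock, 0, 0);
	dma_fence_signal(&signaled_fence);
}

struct dma_fence *drm_syncobj_get_stub_fence(void)
{
	/* The refcount starts at 1 and callers only ever add references,
	 * so the static fence is never released. */
	return dma_fence_get(&signaled_fence);
}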


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm: fix deadlock of syncobj v5

2018-10-23 Thread zhoucm1



On 2018年10月23日 17:01, Chris Wilson wrote:

Quoting Chunming Zhou (2018-10-23 08:57:54)

v2:
add a mutex between sync_cb execution and free.
v3:
clearly separating the roles for pt_lock and cb_mutex (Chris)
v4:
the cb_mutex should be taken outside of the pt_lock around this if() block. 
(Chris)
v5:
fix a corner case

Tested by syncobj_basic and syncobj_wait of igt.

Signed-off-by: Chunming Zhou 
Cc: Daniel Vetter 
Cc: Chris Wilson 
Cc: Christian König 
Cc: intel-...@lists.freedesktop.org
Reviewed-by: Chris Wilson 
---
  drivers/gpu/drm/drm_syncobj.c | 55 +++
  include/drm/drm_syncobj.h |  8 +++--
  2 files changed, 36 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 57bf6006394d..679a56791e34 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -125,23 +125,26 @@ static int drm_syncobj_fence_get_or_add_callback(struct 
drm_syncobj *syncobj,
 if (!ret)
 return 1;
  
-   spin_lock(&syncobj->lock);

+   mutex_lock(&syncobj->cb_mutex);
 /* We've already tried once to get a fence and failed.  Now that we
  * have the lock, try one more time just to be sure we don't add a
  * callback when a fence has already been set.
  */
+   spin_lock(&syncobj->pt_lock);
 if (!list_empty(&syncobj->signal_pt_list)) {
-   spin_unlock(&syncobj->lock);
+   spin_unlock(&syncobj->pt_lock);
 drm_syncobj_search_fence(syncobj, 0, 0, fence);

Hmm, just thinking of other ways of tidying this up

mutex_lock(cb_lock);
spin_lock(pt_lock);
*fence = drm_syncobj_find_signal_pt_for_point();
spin_unlock(pt_lock);
if (!*fence)
drm_syncobj_add_callback_locked(syncobj, cb, func);
mutex_unlock(cb_lock);

i.e. get rid of the early return and we can even drop the int return here
as it is unimportant and unused.
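
Applied to the code above, the tidied helper would look roughly like this;
a sketch against the names used in this thread, not the committed patch:

static void drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,
                                                  struct dma_fence **fence,
                                                  struct drm_syncobj_cb *cb,
                                                  drm_syncobj_func_t func)
{
        u64 pt_value = 0;       /* binary syncobjs look up point 0 */

        mutex_lock(&syncobj->cb_mutex);
        /* find_signal_pt_for_point() takes pt_lock internally */
        *fence = drm_syncobj_find_signal_pt_for_point(syncobj, pt_value);
        if (!*fence)
                drm_syncobj_add_callback_locked(syncobj, cb, func);
        mutex_unlock(&syncobj->cb_mutex);
}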

Yes. Do you need me to send a v6, or will you make a separate patch as an improvement?

Thanks,
David Zhou

-Chris




Re: [PATCH] drm: fix deadlock of syncobj v4

2018-10-23 Thread zhoucm1



On 2018年10月23日 15:51, Chris Wilson wrote:

Quoting Chunming Zhou (2018-10-23 02:50:08)

v2:
add a mutex between sync_cb execution and free.
v3:
clearly separating the roles for pt_lock and cb_mutex (Chris)
v4:
the cb_mutex should be taken outside of the pt_lock around this if() block. 
(Chris)

Tested by syncobj_basic and syncobj_wait of igt.

Signed-off-by: Chunming Zhou 
Cc: Daniel Vetter 
Cc: Chris Wilson 
Cc: Christian König 
---
  drivers/gpu/drm/drm_syncobj.c | 51 ++-
  include/drm/drm_syncobj.h |  8 --
  2 files changed, 33 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 57bf6006394d..315f08132f6d 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -125,23 +125,24 @@ static int drm_syncobj_fence_get_or_add_callback(struct 
drm_syncobj *syncobj,
 if (!ret)
 return 1;
  
-   spin_lock(&syncobj->lock);

+   mutex_lock(&syncobj->cb_mutex);
 /* We've already tried once to get a fence and failed.  Now that we
  * have the lock, try one more time just to be sure we don't add a
  * callback when a fence has already been set.
  */
+   spin_lock(&syncobj->pt_lock);
 if (!list_empty(&syncobj->signal_pt_list)) {
-   spin_unlock(&syncobj->lock);
+   spin_unlock(&syncobj->pt_lock);
 drm_syncobj_search_fence(syncobj, 0, 0, fence);
 if (*fence)

mutex_unlock(&syncobj->cb_mutex);

fixed.



With that,
Reviewed-by: Chris Wilson 

Thanks



Can you please resend with Cc: intel-...@lists.freedesktop.org so we can
double check the fix.

Resent. Could you help submit the patch to drm-misc?

Thanks,
David Zhou

-Chris




Re: [PATCH] drm: fix deadlock of syncobj v2

2018-10-22 Thread zhoucm1



On 2018年10月22日 16:34, Chris Wilson wrote:

Quoting Chunming Zhou (2018-10-21 12:14:24)

v2:
add a mutex between sync_cb execution and free.

The result would appear to be that syncobj->lock is relegated to
protecting the pt_list and the mutex would only be needed for the
syncobj->cb_list.

The patch looks correct for resolving the deadlock and avoiding
introducing any new fails, but I do wonder if

spinlock_t pt_lock;
struct mutex cb_mutex;

and clearly separating the roles would not be a better approach.
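
Spelled out, the separated layout might look like the following sketch; the
field set is trimmed to what this thread discusses:

struct drm_syncobj {
        struct kref refcount;
        /* pt_lock protects only signal_pt_list (safe to take from
         * fence callbacks) */
        spinlock_t pt_lock;
        struct list_head signal_pt_list;
        /* cb_mutex serializes callback execution against callback free */
        struct mutex cb_mutex;
        struct list_head cb_list;
};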

good idea, thanks,

-David

-Chris




Re: [PATCH] drm: fix deadlock of syncobj

2018-10-19 Thread zhoucm1



On 2018年10月19日 20:08, Koenig, Christian wrote:

Am 19.10.18 um 14:01 schrieb zhoucm1:


On 2018年10月19日 19:26, zhoucm1 wrote:


On 2018年10月19日 18:50, Chris Wilson wrote:

Quoting Chunming Zhou (2018-10-19 11:26:41)

Signed-off-by: Chunming Zhou 
Cc: Daniel Vetter 
Cc: Chris Wilson 
Cc: Christian König 
---
   drivers/gpu/drm/drm_syncobj.c | 7 +--
   1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c
b/drivers/gpu/drm/drm_syncobj.c
index 57bf6006394d..2f3c14cb5156 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -344,13 +344,16 @@ void drm_syncobj_replace_fence(struct
drm_syncobj *syncobj,
  drm_syncobj_create_signal_pt(syncobj, fence, pt_value);
  if (fence) {
  struct drm_syncobj_cb *cur, *tmp;
+   struct list_head cb_list;
+   INIT_LIST_HEAD(&cb_list);

LIST_HEAD(cb_list); // does both in one


spin_lock(&syncobj->lock);
-   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
+   list_splice_init(&syncobj->cb_list, &cb_list);

Steal the snapshot of the list under the lock, ok.


+   spin_unlock(&syncobj->lock);
+   list_for_each_entry_safe(cur, tmp, &cb_list, node) {
  list_del_init(>node);

Races against external caller of drm_syncobj_remove_callback().
However,
it looks like that race is just fine, but we don't guard against the
struct drm_syncobj_cb itself being freed, leading to all sort of fun
for
an interrupted drm_syncobj_array_wait_timeout.

Thanks quick review, I will use "while (!list_empty()) { e =
list_first_entry(); list_del(e)" to avoid deadlock.

this still cannot resolve the freeing problem; do you mind if I change the 
spinlock to a mutex?

How does that help?

What you could do is to merge the array of fences into the beginning of
the signal_pt, e.g. something like this:

struct drm_syncobj_signal_pt {
	struct dma_fence fences[2];
	struct dma_fence_array *fence_array;
	u64 value;
	struct list_head list;
};

This way the drm_syncobj_signal_pt is freed when the fence_array is
freed. That should be sufficient if we correctly reference count the
fence_array.

I'm not sure which problem you mean; the deadlock reason is:
the cb func will call drm_syncobj_search_fence(), which will need to grab 
the lock, hence the deadlock.


But when we steal the list or use "while (!list_empty()) { e = 
list_first_entry(); list_del(e) }", both will encounter another freeing 
problem: the syncobj_cb could be freed when the wait times out.


If we change to a mutex, then we can move the lock taken inside 
_search_fence out.
Another way is to add a separate spinlock for signal_pt_list instead of 
sharing one lock with cb_list.


Regards,
David


Christian.


Thanks,
David Zhou

will send v2 in one minute.

Regards,
David Zhou

That kfree seems to undermine the validity of stealing the list.
-Chris




Re: [PATCH] drm: fix deadlock of syncobj

2018-10-19 Thread zhoucm1



On 2018年10月19日 19:26, zhoucm1 wrote:



On 2018年10月19日 18:50, Chris Wilson wrote:

Quoting Chunming Zhou (2018-10-19 11:26:41)

Signed-off-by: Chunming Zhou 
Cc: Daniel Vetter 
Cc: Chris Wilson 
Cc: Christian König 
---
  drivers/gpu/drm/drm_syncobj.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index 57bf6006394d..2f3c14cb5156 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -344,13 +344,16 @@ void drm_syncobj_replace_fence(struct 
drm_syncobj *syncobj,

 drm_syncobj_create_signal_pt(syncobj, fence, pt_value);
 if (fence) {
 struct drm_syncobj_cb *cur, *tmp;
+   struct list_head cb_list;
+   INIT_LIST_HEAD(&cb_list);

LIST_HEAD(cb_list); // does both in one


spin_lock(&syncobj->lock);
-   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {

+   list_splice_init(&syncobj->cb_list, &cb_list);

Steal the snapshot of the list under the lock, ok.


+   spin_unlock(&syncobj->lock);
+   list_for_each_entry_safe(cur, tmp, &cb_list, node) {
 list_del_init(>node);

Races against external caller of drm_syncobj_remove_callback(). However,
it looks like that race is just fine, but we don't guard against the
struct drm_syncobj_cb itself being freed, leading to all sort of fun for
an interrupted drm_syncobj_array_wait_timeout.
Thanks for the quick review. I will use "while (!list_empty()) { e = 
list_first_entry(); list_del(e) }" to avoid the deadlock.
This still cannot resolve the freeing problem; do you mind if I change 
the spinlock to a mutex?


Thanks,
David Zhou


will send v2 in one minute.

Regards,
David Zhou


That kfree seems to undermine the validity of stealing the list.
-Chris






Re: [PATCH] drm: fix deadlock of syncobj

2018-10-19 Thread zhoucm1



On 2018年10月19日 18:50, Chris Wilson wrote:

Quoting Chunming Zhou (2018-10-19 11:26:41)

Signed-off-by: Chunming Zhou 
Cc: Daniel Vetter 
Cc: Chris Wilson 
Cc: Christian König 
---
  drivers/gpu/drm/drm_syncobj.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 57bf6006394d..2f3c14cb5156 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -344,13 +344,16 @@ void drm_syncobj_replace_fence(struct drm_syncobj 
*syncobj,
 drm_syncobj_create_signal_pt(syncobj, fence, pt_value);
 if (fence) {
 struct drm_syncobj_cb *cur, *tmp;
+   struct list_head cb_list;
+   INIT_LIST_HEAD(&cb_list);

LIST_HEAD(cb_list); // does both in one


 	spin_lock(&syncobj->lock);
-   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
+   list_splice_init(&syncobj->cb_list, &cb_list);

Steal the snapshot of the list under the lock, ok.


+   spin_unlock(&syncobj->lock);
+   list_for_each_entry_safe(cur, tmp, &cb_list, node) {
 list_del_init(>node);

Races against external caller of drm_syncobj_remove_callback(). However,
it looks like that race is just fine, but we don't guard against the
struct drm_syncobj_cb itself being freed, leading to all sort of fun for
an interrupted drm_syncobj_array_wait_timeout.
Thanks for the quick review. I will use "while (!list_empty()) { e = 
list_first_entry(); list_del(e) }" to avoid the deadlock.


will send v2 in one minute.

Regards,
David Zhou


That kfree seems to undermine the validity of stealing the list.
-Chris




Re: [PATCH 2/7] drm: add syncobj timeline support v8

2018-10-19 Thread zhoucm1


[snip]

Went boom:

https://bugs.freedesktop.org/show_bug.cgi?id=108490

Can we revert pls?

Sorry for the bug; please do.
In fact, the bug was already caught and fixed; the fix just isn't 
in patch #1 but in patch #2:


Have you reverted? If not, I can send that fix in one minute.


Regards,
David Zhou


Also, can we please have igts for this stuff so that intel-gfx-ci could
test this properly before it's all fireworks?

Hi Daniel V,

Could you point me to which problem I should encounter when I run syncobj_wait of igt?

jenkins@jenkins-MS-7984:~/freedesktop/igt-gpu-tools/tests$ ./syncobj_wait
IGT-Version: 1.23-g94ebd21 (x86_64) (Linux: 4.19.0-rc5-custom+ x86_64)
Test requirement not met in function igt_require_sw_sync, file 
sw_sync.c:240:

Test requirement: kernel_has_sw_sync()
Last errno: 2, No such file or directory

Thanks,
David Zhou
Seems we cannot avoid igt now and the Vulkan CTS isn't enough. I will 
find some time next week to learn IGT; looks like a v10 is needed.


Regards,
David Zhou


Thanks, Daniel


The rest in the series looks good to me as well,

Can I get your RB on them first?

but I certainly want the radv/anv developers to take a look as 
well as

Daniel suggested.
Ping @Dave/Bas/Jason or other radv/anv developers: could any of you take
a look at the rest of the series for the u/k interface, so that we can
move to the next step for the libdrm patches?

Thanks,
David

Christian.


Thanks,
David Zhou


-Daniel



Re: [PATCH 2/7] drm: add syncobj timeline support v8

2018-10-19 Thread zhoucm1



On 2018年10月19日 17:20, zhoucm1 wrote:



On 2018年10月19日 16:55, Daniel Vetter wrote:

On Fri, Oct 19, 2018 at 10:29:55AM +0800, zhoucm1 wrote:


On 2018年10月18日 19:50, Christian König wrote:

On 18.10.18 at 05:11, zhoucm1 wrote:


On 2018年10月17日 18:24, Daniel Vetter wrote:

On Wed, Oct 17, 2018 at 11:29 AM Koenig, Christian
 wrote:

On 17.10.18 at 11:17, zhoucm1 wrote:

[SNIP]

    +struct drm_syncobj_signal_pt {
+    struct dma_fence_array *base;
Out of curiosity, why the pointer and not embedding? base is kinda
misleading for a pointer.

Yeah, Christian doesn't like the signal_pt lifecycle being the same as
the fence's, so it's a pointer.
If you don't like the 'base' name, I can change it.

Well I never said that you can't embed the fence array into
the signal_pt.

You just need to make sure that we don't affect the drm_syncobj
lifecycle as well, e.g. that we don't also need to keep that around.

I don't see a problem with that, as long as drm_syncobj keeps a
reference to the fence while it's on the timeline list. Which it
already does. And embedding would avoid that 2nd separate 
allocation,

aside from making base less confusing.

That was indeed my initial implementation for signal_pt/wait_pt,
fence based, but after long and many discussions we got to the
current solution; as you see, the version is up to v8 :).

As for why the pointer and not embedding?
Two reasons:
1. their lifecycles are not the same.
2. it is a fence array usage, which always needs a separate
allocation, which seems mandatory.
So it is a pointer.

But the name is historical from the initial version, and indeed is
kinda misleading for a pointer; I will change it to fence_array
instead in the coming v9.

To avoid running into a v10 I've just pushed this version upstream :)

Thanks a lot.

(This time reply to the right patch, silly me)

Went boom:

https://bugs.freedesktop.org/show_bug.cgi?id=108490

Can we revert pls?

Sorry for the bug; please do.
In fact, the bug was already caught and fixed; the fix just isn't in 
patch #1 but in patch #2:


Have you reverted? If not, I can send that fix in one minute.


Regards,
David Zhou


Also, can we please have igts for this stuff so that intel-gfx-ci could
test this properly before it's all fireworks?
Seems we cannot avoid igt now and the Vulkan CTS isn't enough. I will find 
some time next week to learn IGT; looks like a v10 is needed.


Regards,
David Zhou


Thanks, Daniel


The rest in the series looks good to me as well,

Can I get your RB on them first?


but I certainly want the radv/anv developers to take a look as well as
Daniel suggested.
Ping @Dave/Bas/Jason or other radv/anv developers: could any of you take
a look at the rest of the series for the u/k interface, so that we can
move to the next step for the libdrm patches?

Thanks,
David

Christian.


Thanks,
David Zhou


-Daniel



Re: [PATCH 2/7] drm: add syncobj timeline support v8

2018-10-19 Thread zhoucm1



On 2018年10月19日 16:55, Daniel Vetter wrote:

On Fri, Oct 19, 2018 at 10:29:55AM +0800, zhoucm1 wrote:


On 2018年10月18日 19:50, Christian König wrote:

On 18.10.18 at 05:11, zhoucm1 wrote:


On 2018年10月17日 18:24, Daniel Vetter wrote:

On Wed, Oct 17, 2018 at 11:29 AM Koenig, Christian
 wrote:

On 17.10.18 at 11:17, zhoucm1 wrote:

[SNIP]

    +struct drm_syncobj_signal_pt {
+    struct dma_fence_array *base;

Out of curiosity, why the pointer and not embedding? base is kinda
misleading for a pointer.

Yeah, Christian doesn't like the signal_pt lifecycle being the same as
the fence's, so it's a pointer.
If you don't like the 'base' name, I can change it.

Well I never said that you can't embed the fence array into
the signal_pt.

You just need to make sure that we don't affect the drm_syncobj
lifecycle as well, e.g. that we don't also need to keep that around.

I don't see a problem with that, as long as drm_syncobj keeps a
reference to the fence while it's on the timeline list. Which it
already does. And embedding would avoid that 2nd separate allocation,
aside from making base less confusing.

That was indeed my initial implementation for signal_pt/wait_pt,
fence based, but after long and many discussions we got to the
current solution; as you see, the version is up to v8 :).

As for why the pointer and not embedding?
Two reasons:
1. their lifecycles are not the same.
2. it is a fence array usage, which always needs a separate
allocation, which seems mandatory.
So it is a pointer.

But the name is historical from the initial version, and indeed is
kinda misleading for a pointer; I will change it to fence_array
instead in the coming v9.

To avoid running into a v10 I've just pushed this version upstream :)

Thanks a lot.

(This time reply to the right patch, silly me)

Went boom:

https://bugs.freedesktop.org/show_bug.cgi?id=108490

Can we revert pls?

Sorry for the bug; please do.


Also, can we please have igts for this stuff so that intel-gfx-ci could
test this properly before it's all fireworks?
Seems we cannot avoid igt now and the Vulkan CTS isn't enough. I will find 
some time next week to learn IGT; looks like a v10 is needed.


Regards,
David Zhou


Thanks, Daniel


The rest in the series looks good to me as well,

Can I get your RB on them first?


but I certainly want the radv/anv developers to take a look as well as
Daniel suggested.

Ping @Dave/Bas/Jason or other radv/anv developers: could any of you take
a look at the rest of the series for the u/k interface, so that we can
move to the next step for the libdrm patches?

Thanks,
David

Christian.


Thanks,
David Zhou


-Daniel



Re: [PATCH 2/7] drm: add syncobj timeline support v8

2018-10-18 Thread zhoucm1



On 2018年10月18日 19:50, Christian König wrote:

On 18.10.18 at 05:11, zhoucm1 wrote:



On 2018年10月17日 18:24, Daniel Vetter wrote:

On Wed, Oct 17, 2018 at 11:29 AM Koenig, Christian
 wrote:

On 17.10.18 at 11:17, zhoucm1 wrote:

[SNIP]

   +struct drm_syncobj_signal_pt {
+    struct dma_fence_array *base;

Out of curiosity, why the pointer and not embedding? base is kinda
misleading for a pointer.

Yeah, Christian doesn't like the signal_pt lifecycle being the same as
the fence's, so it's a pointer.
If you don't like the 'base' name, I can change it.
Well I never said that you can't embed the fence array into the 
signal_pt.


You just need to make sure that we don't affect the drm_syncobj
lifecycle as well, e.g. that we don't also need to keep that around.

I don't see a problem with that, as long as drm_syncobj keeps a
reference to the fence while it's on the timeline list. Which it
already does. And embedding would avoid that 2nd separate allocation,
aside from making base less confusing.
That was indeed my initial implementation for signal_pt/wait_pt, 
fence based, but after long and many discussions we got to the current 
solution; as you see, the version is up to v8 :).


As for why the pointer and not embedding?
Two reasons:
1. their lifecycles are not the same.
2. it is a fence array usage, which always needs a separate allocation, 
which seems mandatory.

So it is a pointer.

But the name is historical from the initial version, and indeed is kinda 
misleading for a pointer; I will change it to fence_array instead in 
the coming v9.
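
For readers following along: the lifecycle argument is that
dma_fence_array_create() always allocates its own object and releases
itself through dma_fence_free(), so it cannot simply be embedded. A
sketch with the field renamed as discussed:

struct drm_syncobj_signal_pt {
        struct dma_fence_array *fence_array; /* separately allocated and
                                              * refcounted; it frees itself
                                              * via dma_fence_free() */
        u64 value;                           /* timeline point it represents */
        struct list_head list;               /* entry in signal_pt_list */
};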


To avoid running into a v10 I've just pushed this version upstream :)

Thanks a lot.


The rest in the series looks good to me as well,

Can I get your RB on them first?

but I certainly want the radv/anv developers to take a look as well as 
Daniel suggested.
Ping @Dave/Bas/Jason or other radv/anv developers, Could anyone of you 
take a look the rest of series for u/k interface? So that we can move to 
next step for libdrm patches?


Thanks,
David


Christian.



Thanks,
David Zhou


-Daniel




Re: [PATCH 2/7] drm: add syncobj timeline support v8

2018-10-17 Thread zhoucm1



On 2018年10月17日 18:24, Daniel Vetter wrote:

On Wed, Oct 17, 2018 at 11:29 AM Koenig, Christian
 wrote:

On 17.10.18 at 11:17, zhoucm1 wrote:

[SNIP]

   +struct drm_syncobj_signal_pt {
+struct dma_fence_array *base;

Out of curiosity, why the pointer and not embedding? base is kinda
misleading for a pointer.

Yeah, Christian doesn't like the signal_pt lifecycle being the same as
the fence's, so it's a pointer.
If you don't like the 'base' name, I can change it.

Well I never said that you can't embed the fence array into the signal_pt.

You just need to make sure that we don't affect the drm_syncobj
lifecycle as well, e.g. that we don't also need to keep that around.

I don't see a problem with that, as long as drm_syncobj keeps a
reference to the fence while it's on the timeline list. Which it
already does. And embedding would avoid that 2nd separate allocation,
aside from making base less confusing.
That was indeed my initial implementation for signal_pt/wait_pt, fence 
based, but after long and many discussions we got to the current solution; 
as you see, the version is up to v8 :).


As for why the pointer and not embedding?
Two reasons:
1. their lifecycles are not the same.
2. it is a fence array usage, which always needs a separate allocation, 
which seems mandatory.

So it is a pointer.

But the name is historical from the initial version, and indeed is kinda 
misleading for a pointer; I will change it to fence_array instead in the 
coming v9.


Thanks,
David Zhou


-Daniel




Re: [PATCH 2/7] drm: add syncobj timeline support v8

2018-10-17 Thread zhoucm1

+Jason as well.


On 2018年10月17日 18:22, Daniel Vetter wrote:

On Wed, Oct 17, 2018 at 11:17 AM zhoucm1  wrote:



On 2018年10月17日 16:09, Daniel Vetter wrote:

On Mon, Oct 15, 2018 at 04:55:48PM +0800, Chunming Zhou wrote:

This patch is for VK_KHR_timeline_semaphore extension, semaphore is called 
syncobj in kernel side:
This extension introduces a new type of syncobj that has an integer payload
identifying a point in a timeline. Such timeline syncobjs support the
following operations:
 * CPU query - A host operation that allows querying the payload of the
   timeline syncobj.
 * CPU wait - A host operation that allows a blocking wait for a
   timeline syncobj to reach a specified value.
 * Device wait - A device operation that allows waiting for a
   timeline syncobj to reach a specified value.
 * Device signal - A device operation that allows advancing the
   timeline syncobj to a specified value.
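
For reference, these four operations map onto the Vulkan API roughly as
follows. This is a hedged userspace sketch using the Vulkan 1.2 core names
for VK_KHR_timeline_semaphore; it assumes an already created VkDevice named
device, and error handling is omitted:

VkSemaphoreTypeCreateInfo type_info = {
    .sType = VK_STRUCTURE_TYPE_SEMAPHORE_TYPE_CREATE_INFO,
    .semaphoreType = VK_SEMAPHORE_TYPE_TIMELINE,
    .initialValue = 0,
};
VkSemaphoreCreateInfo create_info = {
    .sType = VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO,
    .pNext = &type_info,
};
VkSemaphore sem;
vkCreateSemaphore(device, &create_info, NULL, &sem);

uint64_t value;
vkGetSemaphoreCounterValue(device, sem, &value);   /* CPU query */

uint64_t wait_value = 8;                           /* CPU wait for point 8 */
VkSemaphoreWaitInfo wait_info = {
    .sType = VK_STRUCTURE_TYPE_SEMAPHORE_WAIT_INFO,
    .semaphoreCount = 1,
    .pSemaphores = &sem,
    .pValues = &wait_value,
};
vkWaitSemaphores(device, &wait_info, UINT64_MAX);

VkSemaphoreSignalInfo signal_info = {              /* signal point 9 */
    .sType = VK_STRUCTURE_TYPE_SEMAPHORE_SIGNAL_INFO,
    .semaphore = sem,
    .value = 9,
};
vkSignalSemaphore(device, &signal_info);

Device-side wait/signal values travel in a VkTimelineSemaphoreSubmitInfo
chained to vkQueueSubmit(); the host-side vkSignalSemaphore() above stands
in for that here.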

v1:
Since it's a timeline, the earlier time point (PT) is always signaled
before the later PT.
a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation
fence; when the PT[N] fence is signaled, the timeline will increase to
the value of PT[N].
b. wait PT design:
A wait PT fence is signaled by the timeline reaching the point value:
when the timeline is increasing, the wait PTs' values are compared with
the new timeline value; if a PT value is lower than the timeline value,
the wait PT is signaled, otherwise it stays in the list. The syncobj
wait operation can wait on any point of the timeline, so an RB tree is
needed to order them. And a wait PT could be ahead of the signal PT; we
need a submission fence to handle that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate patch. 
(Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate patch.
5. drop the submission_fence implementation and instead use wait_event() for 
that. (Christian)
6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and Christian)
  a. normal syncobj signal op will create a signal PT to tail of signal pt 
list.
  b. normal syncobj wait op will create a wait pt with last signal point, 
and this wait PT is only signaled by related signal point PT.
2. many bug fix and clean up
3. stub fence moving is moved to other patch.

v4:
1. fix RB tree loop with while(node=rb_first(...)). (Christian)
2. fix syncobj lifecycle. (Christian)
3. only enable_signaling when there is wait_pt. (Christian)
4. fix timeline path issues.
5. write a timeline test in libdrm

v5: (Christian)
1. semaphore is called syncobj in kernel side.
2. don't need 'timeline' characters in some function name.
3. keep syncobj cb.

v6: (Christian)
1. merge syncobj_timeline to syncobj structure.
2. simplify some check sentences.
3. some misc change.
4. fix CTS failed issue.

v7: (Christian)
1. error handling when creating signal pt.
2. remove timeline naming in func.
3. export flags in find_fence.
4. allow reset timeline.

v8:
1. use wait_event_interruptible without timeout
2. rename _TYPE_INDIVIDUAL to _TYPE_BINARY

individual syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore*
timeline syncobj is tested by ./amdgpu_test -s 9

Can we please have these low-level syncobj tests as part of igt, together
with all the other syncobj tests which are there already?

Good suggestion; I'm just not familiar with igt (build, run
cmd...), maybe we can add it later.


Really doesn't
make much sense imo to split things on the test suite front.

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
Reviewed-by: Christian König 
---
   drivers/gpu/drm/drm_syncobj.c  | 287 ++---
   drivers/gpu/drm/i915/i915_gem_execbuffer.c |   2 +-
   include/drm/drm_syncobj.h  |  65 ++---
   include/uapi/drm/drm.h |   1 +
   4 files changed, 281 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index f796c9fc3858..67472bd77c83 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,9 @@
   #include "drm_internal.h"
   #include 

+/* merge normal syncobj to timeline syncobj, the point interval is 1 */
+#define DRM_SYNCOBJ_BINARY_POINT 1
+
   struct drm_syncobj_stub_fence {
  struct dma_fence base;
  spinlock_t lock;
@@ -82,6 +85,11 @@ static const struct dma_fence_ops drm_syncobj_stub_fence_ops 
= {
  .release = drm_syncobj_stub_fence_release,
   };

+struct drm_syncobj_signal_pt {
+struct dma_fence_array *base;

Out of curiosity, why the pointer and not embedding? base is kinda
misleading for a pointer.

Yeah, Christian doesn't like the signal_pt lifecycle being the same as
the fence's, so it's a pointer.
If you don't like the 'base' name, I can change it.

Re: [PATCH 2/7] drm: add syncobj timeline support v8

2018-10-17 Thread zhoucm1



On 2018年10月17日 16:09, Daniel Vetter wrote:

On Mon, Oct 15, 2018 at 04:55:48PM +0800, Chunming Zhou wrote:

This patch is for VK_KHR_timeline_semaphore extension, semaphore is called 
syncobj in kernel side:
This extension introduces a new type of syncobj that has an integer payload
identifying a point in a timeline. Such timeline syncobjs support the
following operations:
* CPU query - A host operation that allows querying the payload of the
  timeline syncobj.
* CPU wait - A host operation that allows a blocking wait for a
  timeline syncobj to reach a specified value.
* Device wait - A device operation that allows waiting for a
  timeline syncobj to reach a specified value.
* Device signal - A device operation that allows advancing the
  timeline syncobj to a specified value.

v1:
Since it's a timeline, the earlier time point (PT) is always signaled
before the later PT.
a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation
fence; when the PT[N] fence is signaled, the timeline will increase to
the value of PT[N].
b. wait PT design:
A wait PT fence is signaled by the timeline reaching the point value:
when the timeline is increasing, the wait PTs' values are compared with
the new timeline value; if a PT value is lower than the timeline value,
the wait PT is signaled, otherwise it stays in the list. The syncobj
wait operation can wait on any point of the timeline, so an RB tree is
needed to order them. And a wait PT could be ahead of the signal PT; we
need a submission fence to handle that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate patch. 
(Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate patch.
5. drop the submission_fence implementation and instead use wait_event() for 
that. (Christian)
6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and Christian)
 a. normal syncobj signal op will create a signal PT to tail of signal pt 
list.
 b. normal syncobj wait op will create a wait pt with last signal point, 
and this wait PT is only signaled by related signal point PT.
2. many bug fix and clean up
3. stub fence moving is moved to other patch.

v4:
1. fix RB tree loop with while(node=rb_first(...)). (Christian)
2. fix syncobj lifecycle. (Christian)
3. only enable_signaling when there is wait_pt. (Christian)
4. fix timeline path issues.
5. write a timeline test in libdrm

v5: (Christian)
1. semaphore is called syncobj in kernel side.
2. don't need 'timeline' characters in some function name.
3. keep syncobj cb.

v6: (Christian)
1. merge syncobj_timeline to syncobj structure.
2. simplify some check sentences.
3. some misc change.
4. fix CTS failed issue.

v7: (Christian)
1. error handling when creating signal pt.
2. remove timeline naming in func.
3. export flags in find_fence.
4. allow reset timeline.

v8:
1. use wait_event_interruptible without timeout
2. rename _TYPE_INDIVIDUAL to _TYPE_BINARY

individual syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore*
timeline syncobj is tested by ./amdgpu_test -s 9

Can we please have these low-level syncobj tests as part of igt, together
with all the other syncobj tests which are there already?
Good suggestion; I'm just not familiar with igt (build, run 
cmd...), maybe we can add it later.



Really doesn't
make much sense imo to split things on the test suite front.

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
Reviewed-by: Christian König 
---
  drivers/gpu/drm/drm_syncobj.c  | 287 ++---
  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   2 +-
  include/drm/drm_syncobj.h  |  65 ++---
  include/uapi/drm/drm.h |   1 +
  4 files changed, 281 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index f796c9fc3858..67472bd77c83 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,9 @@
  #include "drm_internal.h"
  #include 
  
+/* merge normal syncobj to timeline syncobj, the point interval is 1 */

+#define DRM_SYNCOBJ_BINARY_POINT 1
+
  struct drm_syncobj_stub_fence {
struct dma_fence base;
spinlock_t lock;
@@ -82,6 +85,11 @@ static const struct dma_fence_ops drm_syncobj_stub_fence_ops 
= {
.release = drm_syncobj_stub_fence_release,
  };
  
+struct drm_syncobj_signal_pt {

+   struct dma_fence_array *base;

Out of curiosity, why the pointer and not embedding? base is kinda
misleading for a pointer.
Yeah, Christian doesn't like the signal_pt lifecycle being the same as 
the fence's, so it's a pointer.

If you don't like the 'base' name, I can change it.




+   u64    value;
+   struct list_head list;
+};
  
  /**

   * drm_syncobj_find - lookup and reference a 

Re: [PATCH 7/7] drm/amdgpu: update version for timeline syncobj support in amdgpu

2018-10-17 Thread zhoucm1



On 2018年10月16日 20:54, Christian König wrote:

I've added my rb to patch #1 and pushed it to drm-misc-next.

I would really like to get an rb from other people on patch #2 before 
proceeding.


Daniel, Dave and all the other usual suspects on the list what is your 
opinion on this implementation?
Thanks for the heads up. @Daniel, @Dave, or the others, could you take a 
look at the series?


Thanks,
David


Christian.

On 15.10.2018 at 11:04, Koenig, Christian wrote:

I'm on sick leave today.

But I will see what I can do later in the afternoon,
Christian.

On 15.10.2018 at 11:01, Zhou, David(ChunMing) wrote:

Ping...
Christian, could I get your RB on the series, and your help to push it to 
drm-misc?

After that I can rebase libdrm header file based on drm-next.

Thanks,
David Zhou


-Original Message-
From: amd-gfx  On Behalf Of
Chunming Zhou
Sent: Monday, October 15, 2018 4:56 PM
To: dri-devel@lists.freedesktop.org
Cc: Zhou, David(ChunMing) ; amd-
g...@lists.freedesktop.org
Subject: [PATCH 7/7] drm/amdgpu: update version for timeline syncobj
support in amdgpu

Signed-off-by: Chunming Zhou 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 6870909da926..58cba492ba55 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -70,9 +70,10 @@
    * - 3.25.0 - Add support for sensor query info (stable pstate 
sclk/mclk).

    * - 3.26.0 - GFX9: Process AMDGPU_IB_FLAG_TC_WB_NOT_INVALIDATE.
    * - 3.27.0 - Add new chunk to to AMDGPU_CS to enable BO_LIST 
creation.

+ * - 3.28.0 - Add syncobj timeline support to AMDGPU_CS.
    */
   #define KMS_DRIVER_MAJOR    3
-#define KMS_DRIVER_MINOR    27
+#define KMS_DRIVER_MINOR    28
   #define KMS_DRIVER_PATCHLEVEL    0

   int amdgpu_vram_limit = 0;
--
2.17.1



Re: [PATCH 1/4] [RFC]drm: add syncobj timeline support v6

2018-09-19 Thread zhoucm1



On 2018年09月19日 16:07, Christian König wrote:

On 19.09.2018 at 10:03, Zhou, David(ChunMing) wrote:



-Original Message-
From: amd-gfx  On Behalf Of
Christian König
Sent: Wednesday, September 19, 2018 3:45 PM
To: Zhou, David(ChunMing) ; Zhou,
David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; Daniel Vetter ; amd-
g...@lists.freedesktop.org
Subject: Re: [PATCH 1/4] [RFC]drm: add syncobj timeline support v6

On 19.09.2018 at 09:32, zhoucm1 wrote:


On 2018年09月19日 15:18, Christian König wrote:

On 19.09.2018 at 06:26, Chunming Zhou wrote:

[snip]

   *fence = NULL;
   drm_syncobj_add_callback_locked(syncobj, cb, func);
@@ -164,6 +177,153 @@ void drm_syncobj_remove_callback(struct drm_syncobj *syncobj,
   spin_unlock(&syncobj->lock);
   }
   +static void drm_syncobj_timeline_init(struct drm_syncobj
*syncobj)

We still have _timeline_ in the name here.

the func is relevant to the timeline members; or which name would be proper?
Yeah, but we now use the timeline implementation for the individual 
syncobj

as well.

Not a big issue, but I would just name it
drm_syncobj_init()/drm_syncobj_fini.
There is already a drm_syncobj_init/fini in drm_syncobj.c; can any other 
name be suggested?


Hui what? I actually checked that there is no 
drm_syncobj_init()/drm_syncobj_fini() in drm_syncobj.c before 
suggesting it. Am I missing something?

messed syncobj_create/destroy in brain :(




+{
+    spin_lock(&syncobj->lock);
+    syncobj->timeline_context = dma_fence_context_alloc(1);

[snip]

+}
+
+int drm_syncobj_lookup_fence(struct drm_syncobj *syncobj, u64 point,
+   struct dma_fence **fence) {
+
+    return drm_syncobj_search_fence(syncobj, point,
+    DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,

I still have a bad feeling about setting that flag as default because it
might change the behavior for the UAPI.

Maybe export drm_syncobj_search_fence directly? E.g. with the flags
parameter.

The previous v5 indeed did this; you asked me to wrap it. Need I change it back?

No, the problem is that drm_syncobj_find_fence() is still using
drm_syncobj_lookup_fence() which sets the flag instead of
drm_syncobj_search_fence() without the flag.

That changes the UAPI behavior because previously we would have 
returned

an error code and now we block for a fence to appear.

So I think the right solution would be to add the flags parameter to
drm_syncobj_find_fence() and let the driver decide if we need to 
block or

get -ENOENT.

Got your meaning.
Exporting the flag in the func is easy,
but the driver doesn't pass a flag; which flag is proper by default? We 
still need to give a default flag in the patch, don't we?


Well, the proper solution is to keep the old behavior as it is for now.

So passing 0 as flag by default and making sure we get a -ENOENT in 
that case sounds like the right approach to me.


Adding the DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT flag can happen when 
the driver starts to provide a proper point as well.

Sounds more flexible.
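
The direction agreed on here might look roughly like the sketch below, with
the flags parameter exported; this is illustrative, not the committed code:

int drm_syncobj_find_fence(struct drm_file *file_private,
                           u32 handle, u64 point, u64 flags,
                           struct dma_fence **fence)
{
        struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
        int ret;

        if (!syncobj)
                return -ENOENT;

        /* flags == 0 keeps the old behavior (-ENOENT instead of blocking);
         * callers pass DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT to block */
        ret = drm_syncobj_search_fence(syncobj, point, flags, fence);
        drm_syncobj_put(syncobj);
        return ret;
}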

v7 will come soon.

thanks,
David Zhou


Christian.



Thanks,
David Zhou


Regards,
Christian.


Regards,
David Zhou

Regards,
Christian.


+    fence);
+}
+EXPORT_SYMBOL(drm_syncobj_lookup_fence);
+
   /**
    * drm_syncobj_find_fence - lookup and reference the fence in a
sync object
    * @file_private: drm file private pointer @@ -228,7 +443,7 @@
static int drm_syncobj_assign_null_handle(struct
drm_syncobj *syncobj)
    * @fence: out parameter for the fence
    *
    * This is just a convenience function that combines
drm_syncobj_find() and
- * drm_syncobj_fence_get().
+ * drm_syncobj_lookup_fence().
    *
    * Returns 0 on success or a negative error value on failure. On
success @fence
    * contains a reference to the fence, which must be released by
calling @@ -236,18 +451,11 @@ static int
drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
    */
   int drm_syncobj_find_fence(struct drm_file *file_private,
  u32 handle, u64 point,
-   struct dma_fence **fence) -{
+   struct dma_fence **fence) {
   struct drm_syncobj *syncobj = drm_syncobj_find(file_private,
handle);
-    int ret = 0;
-
-    if (!syncobj)
-    return -ENOENT;
+    int ret;
-    *fence = drm_syncobj_fence_get(syncobj);
-    if (!*fence) {
-    ret = -EINVAL;
-    }
+    ret = drm_syncobj_lookup_fence(syncobj, point, fence);
   drm_syncobj_put(syncobj);
   return ret;
   }
@@ -264,7 +472,7 @@ void drm_syncobj_free(struct kref *kref)
   struct drm_syncobj *syncobj = container_of(kref,
  struct drm_syncobj,
  refcount);
-    drm_syncobj_replace_fence(syncobj, 0, NULL);
+    drm_syncobj_timeline_fini(syncobj);
   kfree(syncobj);
   }
   EXPORT_SYMBOL(drm_syncobj_free);
@@ -294,6 +502,11 @@ int drm_syncobj_create(struct drm_syncobj
**out_syncobj, uint32_t flags,
  kref_init(&syncobj->refcount);
  INIT_LIST_HEAD(&syncobj->cb_list);
  spin_lock_init(&syncobj->lock);

Re: [PATCH 1/4] [RFC]drm: add syncobj timeline support v6

2018-09-19 Thread zhoucm1



On 2018年09月19日 15:18, Christian König wrote:

On 19.09.2018 at 06:26, Chunming Zhou wrote:

[snip]

  *fence = NULL;
  drm_syncobj_add_callback_locked(syncobj, cb, func);
@@ -164,6 +177,153 @@ void drm_syncobj_remove_callback(struct 
drm_syncobj *syncobj,

  spin_unlock(&syncobj->lock);
  }
  +static void drm_syncobj_timeline_init(struct drm_syncobj *syncobj)


We still have _timeline_ in the name here.

the func is relevant to the timeline members; or which name would be proper?




+{
+    spin_lock(&syncobj->lock);
+    syncobj->timeline_context = dma_fence_context_alloc(1);

[snip]

+}
+
+int drm_syncobj_lookup_fence(struct drm_syncobj *syncobj, u64 point,
+   struct dma_fence **fence) {
+
+    return drm_syncobj_search_fence(syncobj, point,
+    DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,


I still have a bad feeling about setting that flag as default because it 
might change the behavior for the UAPI.


Maybe export drm_syncobj_search_fence directly? E.g. with the flags 
parameter.

The previous v5 indeed did this; you asked me to wrap it. Need I change it back?

Regards,
David Zhou


Regards,
Christian.


+    fence);
+}
+EXPORT_SYMBOL(drm_syncobj_lookup_fence);
+
  /**
   * drm_syncobj_find_fence - lookup and reference the fence in a 
sync object

   * @file_private: drm file private pointer
@@ -228,7 +443,7 @@ static int drm_syncobj_assign_null_handle(struct 
drm_syncobj *syncobj)

   * @fence: out parameter for the fence
   *
   * This is just a convenience function that combines 
drm_syncobj_find() and

- * drm_syncobj_fence_get().
+ * drm_syncobj_lookup_fence().
   *
   * Returns 0 on success or a negative error value on failure. On 
success @fence
   * contains a reference to the fence, which must be released by 
calling
@@ -236,18 +451,11 @@ static int 
drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)

   */
  int drm_syncobj_find_fence(struct drm_file *file_private,
 u32 handle, u64 point,
-   struct dma_fence **fence)
-{
+   struct dma_fence **fence) {
  struct drm_syncobj *syncobj = drm_syncobj_find(file_private, 
handle);

-    int ret = 0;
-
-    if (!syncobj)
-    return -ENOENT;
+    int ret;
-    *fence = drm_syncobj_fence_get(syncobj);
-    if (!*fence) {
-    ret = -EINVAL;
-    }
+    ret = drm_syncobj_lookup_fence(syncobj, point, fence);
  drm_syncobj_put(syncobj);
  return ret;
  }
@@ -264,7 +472,7 @@ void drm_syncobj_free(struct kref *kref)
  struct drm_syncobj *syncobj = container_of(kref,
 struct drm_syncobj,
 refcount);
-    drm_syncobj_replace_fence(syncobj, 0, NULL);
+    drm_syncobj_timeline_fini(syncobj);
  kfree(syncobj);
  }
  EXPORT_SYMBOL(drm_syncobj_free);
@@ -294,6 +502,11 @@ int drm_syncobj_create(struct drm_syncobj 
**out_syncobj, uint32_t flags,

  kref_init(&syncobj->refcount);
  INIT_LIST_HEAD(&syncobj->cb_list);
  spin_lock_init(&syncobj->lock);
+    if (flags & DRM_SYNCOBJ_CREATE_TYPE_TIMELINE)
+    syncobj->type = DRM_SYNCOBJ_TYPE_TIMELINE;
+    else
+    syncobj->type = DRM_SYNCOBJ_TYPE_INDIVIDUAL;
+    drm_syncobj_timeline_init(syncobj);
    if (flags & DRM_SYNCOBJ_CREATE_SIGNALED) {
  ret = drm_syncobj_assign_null_handle(syncobj);
@@ -576,7 +789,8 @@ drm_syncobj_create_ioctl(struct drm_device *dev, 
void *data,

  return -ENODEV;
    /* no valid flags yet */
-    if (args->flags & ~DRM_SYNCOBJ_CREATE_SIGNALED)
+    if (args->flags & ~(DRM_SYNCOBJ_CREATE_SIGNALED |
+    DRM_SYNCOBJ_CREATE_TYPE_TIMELINE))
  return -EINVAL;
    return drm_syncobj_create_as_handle(file_private,
@@ -669,9 +883,8 @@ static void syncobj_wait_syncobj_func(struct 
drm_syncobj *syncobj,

  struct syncobj_wait_entry *wait =
  container_of(cb, struct syncobj_wait_entry, syncobj_cb);
-    /* This happens inside the syncobj lock */
-    wait->fence = dma_fence_get(rcu_dereference_protected(syncobj->fence,
-                                lockdep_is_held(&syncobj->lock)));
+    drm_syncobj_search_fence(syncobj, 0, 0, &wait->fence);
+
  wake_up_process(wait->task);
  }
  @@ -698,7 +911,8 @@ static signed long 
drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,

  signaled_count = 0;
  for (i = 0; i < count; ++i) {
  entries[i].task = current;
-    entries[i].fence = drm_syncobj_fence_get(syncobjs[i]);
+    ret = drm_syncobj_search_fence(syncobjs[i], 0, 0,
+                                   &entries[i].fence);
  if (!entries[i].fence) {
  if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) {
  continue;
@@ -970,12 +1184,19 @@ drm_syncobj_reset_ioctl(struct drm_device 
*dev, void *data,

  if (ret < 0)
  return ret;
-    for (i = 0; i < args->count_handles; i++)
-    drm_syncobj_replace_fence(syncobjs[i], 0, NULL);
-
+    for (i = 0; i < args->count_handles; i++) {
+    if (syncobjs[i]->type == DRM_SYNCOBJ_TYPE_TIMELINE) {
+    

Re: [PATCH 1/4] [RFC]drm: add syncobj timeline support v6

2018-09-19 Thread zhoucm1



On 2018年09月19日 15:18, Christian König wrote:

On 19.09.2018 at 06:26, Chunming Zhou wrote:

[snip]

  *fence = NULL;
  drm_syncobj_add_callback_locked(syncobj, cb, func);
@@ -164,6 +177,153 @@ void drm_syncobj_remove_callback(struct 
drm_syncobj *syncobj,

  spin_unlock(&syncobj->lock);
  }
  +static void drm_syncobj_timeline_init(struct drm_syncobj *syncobj)


We still have _timeline_ in the name here.

the func is relevant to the timeline members; or which name would be proper?




+{
+    spin_lock(&syncobj->lock);
+    syncobj->timeline_context = dma_fence_context_alloc(1);

[snip]

+}
+
+int drm_syncobj_lookup_fence(struct drm_syncobj *syncobj, u64 point,
+   struct dma_fence **fence) {
+
+    return drm_syncobj_search_fence(syncobj, point,
+    DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,


I still have a bad feeling about setting that flag as default because it 
might change the behavior for the UAPI.


Maybe export drm_syncobj_search_fence directly? E.g. with the flags 
parameter.

The previous v5 indeed did this; you asked me to wrap it. Need I change it back?

Regards,
David Zhou


Regards,
Christian.


+    fence);
+}
+EXPORT_SYMBOL(drm_syncobj_lookup_fence);
+
  /**
   * drm_syncobj_find_fence - lookup and reference the fence in a 
sync object

   * @file_private: drm file private pointer
@@ -228,7 +443,7 @@ static int drm_syncobj_assign_null_handle(struct 
drm_syncobj *syncobj)

   * @fence: out parameter for the fence
   *
   * This is just a convenience function that combines 
drm_syncobj_find() and

- * drm_syncobj_fence_get().
+ * drm_syncobj_lookup_fence().
   *
   * Returns 0 on success or a negative error value on failure. On 
success @fence
   * contains a reference to the fence, which must be released by 
calling
@@ -236,18 +451,11 @@ static int 
drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)

   */
  int drm_syncobj_find_fence(struct drm_file *file_private,
 u32 handle, u64 point,
-   struct dma_fence **fence)
-{
+   struct dma_fence **fence) {
  struct drm_syncobj *syncobj = drm_syncobj_find(file_private, 
handle);

-    int ret = 0;
-
-    if (!syncobj)
-    return -ENOENT;
+    int ret;
  -    *fence = drm_syncobj_fence_get(syncobj);
-    if (!*fence) {
-    ret = -EINVAL;
-    }
+    ret = drm_syncobj_lookup_fence(syncobj, point, fence);
  drm_syncobj_put(syncobj);
  return ret;
  }
@@ -264,7 +472,7 @@ void drm_syncobj_free(struct kref *kref)
  struct drm_syncobj *syncobj = container_of(kref,
 struct drm_syncobj,
 refcount);
-    drm_syncobj_replace_fence(syncobj, 0, NULL);
+    drm_syncobj_timeline_fini(syncobj);
  kfree(syncobj);
  }
  EXPORT_SYMBOL(drm_syncobj_free);
@@ -294,6 +502,11 @@ int drm_syncobj_create(struct drm_syncobj 
**out_syncobj, uint32_t flags,

  kref_init(&syncobj->refcount);
  INIT_LIST_HEAD(&syncobj->cb_list);
  spin_lock_init(&syncobj->lock);
+    if (flags & DRM_SYNCOBJ_CREATE_TYPE_TIMELINE)
+    syncobj->type = DRM_SYNCOBJ_TYPE_TIMELINE;
+    else
+    syncobj->type = DRM_SYNCOBJ_TYPE_INDIVIDUAL;
+    drm_syncobj_timeline_init(syncobj);
    if (flags & DRM_SYNCOBJ_CREATE_SIGNALED) {
  ret = drm_syncobj_assign_null_handle(syncobj);
@@ -576,7 +789,8 @@ drm_syncobj_create_ioctl(struct drm_device *dev, 
void *data,

  return -ENODEV;
    /* no valid flags yet */
-    if (args->flags & ~DRM_SYNCOBJ_CREATE_SIGNALED)
+    if (args->flags & ~(DRM_SYNCOBJ_CREATE_SIGNALED |
+    DRM_SYNCOBJ_CREATE_TYPE_TIMELINE))
  return -EINVAL;
    return drm_syncobj_create_as_handle(file_private,
@@ -669,9 +883,8 @@ static void syncobj_wait_syncobj_func(struct 
drm_syncobj *syncobj,

  struct syncobj_wait_entry *wait =
  container_of(cb, struct syncobj_wait_entry, syncobj_cb);
-    /* This happens inside the syncobj lock */
-    wait->fence = dma_fence_get(rcu_dereference_protected(syncobj->fence,
-                                lockdep_is_held(&syncobj->lock)));
+    drm_syncobj_search_fence(syncobj, 0, 0, &wait->fence);
+
  wake_up_process(wait->task);
  }
  @@ -698,7 +911,8 @@ static signed long 
drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,

  signaled_count = 0;
  for (i = 0; i < count; ++i) {
  entries[i].task = current;
-    entries[i].fence = drm_syncobj_fence_get(syncobjs[i]);
+    ret = drm_syncobj_search_fence(syncobjs[i], 0, 0,
+                                   &entries[i].fence);
  if (!entries[i].fence) {
  if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) {
  continue;
@@ -970,12 +1184,19 @@ drm_syncobj_reset_ioctl(struct drm_device 
*dev, void *data,

  if (ret < 0)
  return ret;
-    for (i = 0; i < args->count_handles; i++)
-    drm_syncobj_replace_fence(syncobjs[i], 0, NULL);
-
+    for (i = 0; i < args->count_handles; i++) {
+    if (syncobjs[i]->type == DRM_SYNCOBJ_TYPE_TIMELINE) {
+    

Re: [PATCH] [RFC]drm: add syncobj timeline support v5

2018-09-18 Thread zhoucm1



On 2018年09月18日 16:32, Christian König wrote:

+    for (i = 0; i < args->count_handles; i++) {
+    if (syncobjs[i]->type == DRM_SYNCOBJ_TYPE_TIMELINE) {
+    DRM_ERROR("timeline syncobj cannot reset!\n");


Why not? I mean that should still work or do I miss anything?
the timeline semaphore spec doesn't require a reset interface; it says the 
timeline value can only be changed by signal operations.


Yeah, but we don't care about the timeline spec in the kernel.

The question is rather whether it still makes sense to support that, and as 
far as I can see it should be trivial to reinitialize the object.

Hi Daniel Rakos,

Could you give a comment on this question? Is it necessary to support a 
timeline reset interface? I only see that the timeline value can be changed 
by signal operations in the spec.



Thanks,
David Zhou


Re: [PATCH] [RFC]drm: add syncobj timeline support v5

2018-09-17 Thread zhoucm1



On 2018年09月17日 16:37, Christian König wrote:

On 14.09.2018 at 12:37, Chunming Zhou wrote:
This patch is for VK_KHR_timeline_semaphore extension, semaphore is 
called syncobj in kernel side:
This extension introduces a new type of syncobj that has an integer 
payload

identifying a point in a timeline. Such timeline syncobjs support the
following operations:
    * CPU query - A host operation that allows querying the payload of the
      timeline syncobj.
    * CPU wait - A host operation that allows a blocking wait for a
  timeline syncobj to reach a specified value.
    * Device wait - A device operation that allows waiting for a
  timeline syncobj to reach a specified value.
    * Device signal - A device operation that allows advancing the
  timeline syncobj to a specified value.

Since it's a timeline, the earlier time point (PT) is always signaled 
before the later PT.

a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation 
fence; when the PT[N] fence is signaled,
the timeline will increase to the value of PT[N].
b. wait PT design:
A wait PT fence is signaled by the timeline reaching the point value: 
when the timeline is increasing, the wait PTs' values are compared with 
the new timeline value; if a PT value is lower than the timeline value, 
the wait PT is signaled, otherwise it stays in the list. The syncobj 
wait operation can wait on any point of the timeline,
so an RB tree is needed to order them. And a wait PT could be ahead of 
the signal PT; we need a submission fence to
perform that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate 
patch. (Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate 
patch.
5. drop the submission_fence implementation and instead use 
wait_event() for that. (Christian)

6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and 
Christian)
 a. normal syncobj signal op will create a signal PT to tail of 
signal pt list.
 b. normal syncobj wait op will create a wait pt with last signal 
point, and this wait PT is only signaled by related signal point PT.

2. many bug fix and clean up
3. stub fence moving is moved to other patch.

v4:
1. fix RB tree loop with while(node=rb_first(...)). (Christian)
2. fix syncobj lifecycle. (Christian)
3. only enable_signaling when there is wait_pt. (Christian)
4. fix timeline path issues.
5. write a timeline test in libdrm

v5: (Christian)
1. semaphore is called syncobj in kernel side.
2. don't need 'timeline' characters in some function name.
3. keep syncobj cb

normal syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore*
timeline syncobj is tested by ./amdgpu_test -s 9

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/drm_syncobj.c  | 294 ++---
  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
  include/drm/drm_syncobj.h  |  62 +++--
  include/uapi/drm/drm.h |   1 +
  4 files changed, 292 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index e9ce623d049e..e78d076f2703 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,9 @@
  #include "drm_internal.h"
  #include 
  +/* merge normal syncobj to timeline syncobj, the point interval is 
1 */

+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
  struct drm_syncobj_stub_fence {
  struct dma_fence base;
  spinlock_t lock;
@@ -82,6 +85,11 @@ static const struct dma_fence_ops 
drm_syncobj_stub_fence_ops = {

  .release = drm_syncobj_stub_fence_release,
  };
  +struct drm_syncobj_signal_pt {
+    struct dma_fence_array *base;
+    u64    value;
+    struct list_head list;
+};
    /**
   * drm_syncobj_find - lookup and reference a sync object.
@@ -124,7 +132,7 @@ static int 
drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,

  {
  int ret;
  -    *fence = drm_syncobj_fence_get(syncobj);
+    ret = drm_syncobj_search_fence(syncobj, 0, 0, fence);
  if (*fence)


Don't we need to check ret here instead?

Both are OK; if you prefer checking ret, I will update it in v6.




  return 1;
  @@ -133,10 +141,10 @@ static int 
drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,

   * have the lock, try one more time just to be sure we don't add a
   * callback when a fence has already been set.
   */
-    if (syncobj->fence) {
-    *fence = dma_fence_get(rcu_dereference_protected(syncobj->fence,
-                           lockdep_is_held(&syncobj->lock)));
-    ret = 1;
+    if (fence) {
+    drm_syncobj_search_fence(syncobj, 0, 0, fence);
+    if (*fence)
+    ret = 1;


That doesn't look correct to me; drm_syncobj_search_fence() would try 
to grab the lock once more.


That 

Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-14 Thread zhoucm1



On 2018年09月13日 18:22, Christian König wrote:

On 13.09.2018 at 11:35, Zhou, David(ChunMing) wrote:



-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 5:20 PM
To: Zhou, David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-...@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

On 13.09.2018 at 11:11, Zhou, David(ChunMing) wrote:

-Original Message-
From: Christian König 
Sent: Thursday, September 13, 2018 4:50 PM
To: Zhou, David(ChunMing) ; Koenig, Christian
; dri-devel@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-...@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

On 13.09.2018 at 09:43, Zhou, David(ChunMing) wrote:

-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 2:56 PM
To: Zhou, David(ChunMing) ; Zhou,
David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-...@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

On 13.09.2018 at 04:15, zhoucm1 wrote:

On 2018年09月12日 19:05, Christian König wrote:

[SNIP]
+static void drm_syncobj_find_signal_pt_for_wait_pt(struct drm_syncobj *syncobj,
+   struct drm_syncobj_wait_pt *wait_pt) {

That whole approach still looks horribly complicated to me.

It's already very close to what you said before.


Especially the separation of signal and wait pt is completely
unnecessary as far as I can see.
When a wait pt is requested we just need to search for the
signal point which it will trigger.

Yeah, I tried this, but when I implement the cpu wait ioctl on a
specific point, we need an advanced wait pt fence; otherwise, we
could still need the old syncobj cb.

Why? I mean you just need to call drm_syncobj_find_fence() and when
that one returns NULL you use wait_event_*() to wait for a signal
point >= your wait point to appear and try again.
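
A sketch of that retry loop, assuming the syncobj grows a wait queue wq
that is woken whenever a new signal point is attached; both wq and the
helper call are illustrative:

struct dma_fence *fence;

fence = drm_syncobj_find_signal_pt_for_point(syncobj, point);
while (!fence) {
        /* sleep until another signal point is attached, then look again */
        if (wait_event_interruptible(syncobj->wq,
                        (fence = drm_syncobj_find_signal_pt_for_point(syncobj,
                                                                      point))))
                return -ERESTARTSYS;
}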

e.g. when there are 3 syncobjs (A, B, C) to wait on and all of A, B, C
have no fence yet, then, as you said, while drm_syncobj_find_fence(A) is
working on wait_event, syncobj B and syncobj C could already be
signaled; then we don't know which one is signaled first, which is
needed when the wait ioctl returns.

I don't really see a problem with that. When you wait for the first
one you need to wait for A,B,C at the same time anyway.

So what you do is to register a fence callback on the fences you
already have and for the syncobj which doesn't yet have a fence you
make sure that they wake up your thread when they get one.

So essentially exactly what drm_syncobj_fence_get_or_add_callback()
already does today.

So do you mean we still need to use the old syncobj CB for that?

Yes, as far as I can see it should work.


    Advanced wait pt is bad?

Well it isn't bad, I just don't see any advantage in it.

The advantage is to replace old syncobj cb.


The existing mechanism
should already be able to handle that.
I thought about it a bit more; we don't need that mechanism at all.
If we use the advanced wait pt, we can easily use a fence array to
achieve it for the wait ioctl. We should use existing kernel features
as much as possible rather than invent another, shouldn't we?

I remember you said that before.

Yeah, but the syncobj cb is an existing feature.
This is obviously a workaround for the wait ioctl. Do you see it used
anywhere else?



And I absolutely don't see a
need to modify that and replace it with something far more complex.
The wait ioctl is simplified much more by the fence array, not made
more complex; we just need to allocate a wait pt. If we keep the old
syncobj cb workaround, all the wait pt logic is still there; we only
save the allocation and wait pt handling, which in fact isn't the
complex part at all. And compared with the ugly syncobj cb, this is
simpler.


I strongly disagree on that. You just need to extend the syncobj cb 
with the sequence number and you are done.


We could clean that up in the long term by adding some wait_multi 
event macro, but for now just adding the sequence number should do the 
trick.
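
A rough sketch of that suggestion; the point field and the helper
function are illustrative assumptions, not existing drm_syncobj code:

	struct drm_syncobj_cb {
		struct list_head node;
		drm_syncobj_func_t func;
		u64 point;	/* assumed extension: value this waiter needs */
	};

	/* In the signal path, only fire callbacks whose point was reached: */
	static void signal_cbs_up_to(struct drm_syncobj *syncobj, u64 value)
	{
		struct drm_syncobj_cb *cur, *tmp;

		list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
			if (value >= cur->point)
				cur->func(syncobj, cur);
		}
	}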


Quoting Daniel Vetter's comment on v1: "

Specifically for this stuff here having unified future fence semantics
will allow drivers to do clever stuff with them.

"
I think the advanced wait pt is a similar concept to the 'future
fence' Daniel Vetter mentioned before, which is obviously the right
direction.



Anyway, if there are no other comments I will change the patch as you
like, so that the patch can land soon.


Thanks,
David Zhou


Regards,
Christian.



Thanks,
David Zhou

Regards,
Christian.


Thanks,
David Zhou

Christian.


Thanks,
David Zhou

Regards,
Christian.


Back to my implementation: it already fixes all your earlier concerns
and can easily be used in the wait ioctl. If you feel it is
complicated, I guess that is because we merged all the logic and much
cleanup into one patch. In fact, it already is very simple:
timeline_init/fini, create signal/

Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-14 Thread zhoucm1



On 2018年09月14日 11:14, zhoucm1 wrote:



On 2018年09月13日 18:22, Christian König wrote:

Am 13.09.2018 um 11:35 schrieb Zhou, David(ChunMing):



-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 5:20 PM
To: Zhou, David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-...@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 11:11 schrieb Zhou, David(ChunMing):

-Original Message-
From: Christian König 
Sent: Thursday, September 13, 2018 4:50 PM
To: Zhou, David(ChunMing) ; Koenig, Christian
; dri-devel@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-...@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 09:43 schrieb Zhou, David(ChunMing):

-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 2:56 PM
To: Zhou, David(ChunMing) ; Zhou,
David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-...@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 04:15 schrieb zhoucm1:

On 2018年09月12日 19:05, Christian König wrote:

[SNIP]
+static void drm_syncobj_find_signal_pt_for_wait_pt(struct
drm_syncobj *syncobj,
+   struct drm_syncobj_wait_pt
+*wait_pt) {

That whole approach still looks horribly complicated to me.

It's already very close to what you said before.


Especially the separation of signal and wait pt is completely
unnecessary as far as I can see.
When a wait pt is requested we just need to search for the
signal point which it will trigger.

Yeah, I tried this, but when I implemented the CPU wait ioctl on a
specific point, we needed an advanced wait pt fence; otherwise, we
would still need the old syncobj cb.

Why? I mean you just need to call drm_syncobj_find_fence() and when
that one returns NULL you use wait_event_*() to wait for a signal
point >= your wait point to appear and try again.

e.g. when there are 3 syncobjs (A, B, C) to wait on and none of them
has a fence yet, then, as you said, while drm_syncobj_find_fence(A)
is blocked in wait_event, syncobjB and syncobjC could already be
signaled; then we don't know which one was signaled first, which is
needed when the wait ioctl returns.
I don't really see a problem with that. When you wait for the first
one you need to wait for A, B, C at the same time anyway.

So what you do is to register a fence callback on the fences you
already have, and for the syncobj which doesn't yet have a fence you
make sure that they wake up your thread when they get one.

So essentially exactly what drm_syncobj_fence_get_or_add_callback()
already does today.

So do you mean we still need to use the old syncobj CB for that?

Yes, as far as I can see it should work.


    Advanced wait pt is bad?

Well it isn't bad, I just don't see any advantage in it.

The advantage is that it replaces the old syncobj cb.


The existing mechanism
should already be able to handle that.
I thought about it a bit more; we don't need that mechanism at all.
If we use the advanced wait pt, we can easily use a fence array to
achieve it for the wait ioctl. We should use existing kernel features
as much as possible rather than invent another, shouldn't we?

I remember you said that before.

Yeah, but the syncobj cb is an existing feature.
This is obviously a workaround for the wait ioctl. Do you see it used
anywhere else?



And I absolutely don't see a
need to modify that and replace it with something far more complex.
The wait ioctl is simplified much more by the fence array, not made
more complex; we just need to allocate a wait pt. If we keep the old
syncobj cb workaround, all the wait pt logic is still there; we only
save the allocation and wait pt handling, which in fact isn't the
complex part at all. And compared with the ugly syncobj cb, this is
simpler.


I strongly disagree on that. You just need to extend the syncobj cb 
with the sequence number and you are done.


We could clean that up in the long term by adding some wait_multi 
event macro, but for now just adding the sequence number should do 
the trick.


Quoting Daniel Vetter's comment on v1: "

Specifically for this stuff here having unified future fence semantics
will allow drivers to do clever stuff with them.

"
I think the advanced wait pt is a similar concept to the 'future
fence' Daniel Vetter mentioned before, which is obviously the right
direction.



Anyway, if there are no other comments I will change the patch as you
like, so that the patch can land soon.
When I try to remove the wait pt future fence, I run into another
problem: drm_syncobj_find_fence cannot get a fence if the signal pt
has already been collected as garbage, and then CS reports an error.
Any idea for that?
I still think the future fence is the right thing. Could you give it
further thought? Otherwise, we will need various workarounds.


Thanks,
David Zhou


Thanks,
David Zhou


Regards,
Christian.




Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-12 Thread zhoucm1



On 2018年09月12日 19:05, Christian König wrote:

Am 12.09.2018 um 12:20 schrieb zhoucm1:

[SNIP]

Drop the term semaphore here, better use syncobj.
This is from the VK_KHR_timeline_semaphore extension description, not
my invention; I just quoted it. On the kernel side we call it a
syncobj; in the UMD they still call it a semaphore.


Yeah, but we don't care about closed-source UMD names in the kernel,
and the open-source UMD calls it syncobj as well.




[SNIP]
+static void drm_syncobj_find_signal_pt_for_wait_pt(struct 
drm_syncobj *syncobj,

+   struct drm_syncobj_wait_pt *wait_pt)
+{


That whole approach still looks horribly complicated to me.

It's already very close to what you said before.



Especially the separation of signal and wait pt is completely 
unnecessary as far as I can see.
When a wait pt is requested we just need to search for the signal 
point which it will trigger.
Yeah, I tried this, but when I implemented the CPU wait ioctl on a
specific point, we needed an advanced wait pt fence; otherwise, we
would still need the old syncobj cb.


Why? I mean you just need to call drm_syncobj_find_fence() and when 
that one returns NULL you use wait_event_*() to wait for a signal 
point >= your wait point to appear and try again.
e.g. when there are 3 syncobjs (A, B, C) to wait on and none of them
has a fence yet, then, as you said, while drm_syncobj_find_fence(A)
is blocked in wait_event, syncobjB and syncobjC could already be
signaled; then we don't know which one was signaled first, which is
needed when the wait ioctl returns.


Back to my implementation: it already fixes all your earlier concerns
and can easily be used in the wait ioctl. If you feel it is
complicated, I guess that is because we merged all the logic and much
cleanup into one patch. In fact, it already is very simple:
timeline_init/fini, create signal/wait_pt, find signal_pt for
wait_pt, garbage collection; just those.


Thanks,
David Zhou


Regards,
Christian.




Thanks,
David Zhou


Regards,
Christian.

+	struct drm_syncobj_timeline *timeline = &syncobj->syncobj_timeline;
+	struct drm_syncobj_signal_pt *signal_pt;
+	int ret;
+
+	if (wait_pt->signal_pt_fence) {
+		return;
+	} else if ((syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) &&
+		   (wait_pt->value <= timeline->timeline)) {
+		dma_fence_signal(&wait_pt->base.base);
+		rb_erase(&wait_pt->node, &timeline->wait_pt_tree);
+		RB_CLEAR_NODE(&wait_pt->node);
+		dma_fence_put(&wait_pt->base.base);
+		return;
+	}
+
+	list_for_each_entry(signal_pt, &timeline->signal_pt_list, list) {
+		if (wait_pt->value < signal_pt->value)
+			continue;
+		if ((syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) &&
+		    (wait_pt->value != signal_pt->value))
+			continue;
+		wait_pt->signal_pt_fence = dma_fence_get(&signal_pt->base->base);
+		ret = dma_fence_add_callback(wait_pt->signal_pt_fence,
+					     &wait_pt->wait_cb,
+					     wait_pt_func);
+		if (ret == -ENOENT) {
+			dma_fence_signal(&wait_pt->base.base);
+			dma_fence_put(wait_pt->signal_pt_fence);
+			wait_pt->signal_pt_fence = NULL;
+			rb_erase(&wait_pt->node, &timeline->wait_pt_tree);
+			RB_CLEAR_NODE(&wait_pt->node);
+			dma_fence_put(&wait_pt->base.base);
+		} else if (ret < 0) {
+			dma_fence_put(wait_pt->signal_pt_fence);
+			DRM_ERROR("add callback error!");
+		} else {
+			/* after adding callback, remove from rb tree */
+			rb_erase(&wait_pt->node, &timeline->wait_pt_tree);
+			RB_CLEAR_NODE(&wait_pt->node);
+		}
+		return;
+	}
+	/* signaled pt was released */
+	if (!wait_pt->signal_pt_fence &&
+	    (wait_pt->value <= timeline->signal_point)) {
+		dma_fence_signal(&wait_pt->base.base);
+		rb_erase(&wait_pt->node, &timeline->wait_pt_tree);
+		RB_CLEAR_NODE(&wait_pt->node);
+		dma_fence_put(&wait_pt->base.base);
+	}
  }
-void drm_syncobj_add_callback(struct drm_syncobj *syncobj,
-			      struct drm_syncobj_cb *cb,
-			      drm_syncobj_func_t func)
+static int drm_syncobj_timeline_create_signal_pt(struct drm_syncobj *syncobj,
+						 struct dma_fence *fence,
+						 u64 point)
 {
+	struct drm_syncobj_signal_pt *signal_pt =
+		kzalloc(sizeof(struct drm_syncobj_signal_pt), GFP_KERNEL);
+	struct drm_syncobj_signal_pt *tail_pt;
+	struct dma_fence **fences;
+	struct rb_node *node;
+	struct drm_syncobj_wait_pt *tail_wait_pt = NULL;
+	int num_fences = 0;
+	int ret = 0, i;
+
+	if (!signal_pt)
+		return -ENOMEM;
+	if (syncobj->syncobj_timeline.signal_point >= point) {
+		DRM_WARN("A later signal is ready!")

Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-12 Thread zhoucm1



On 2018年09月12日 15:22, Christian König wrote:

Ping? Have you seen my comments here?

Sorry, I didn't see this reply. Inline...



Looks like you haven't addressed any of them in your last mail.

Christian.

Am 06.09.2018 um 09:25 schrieb Christian König:

Am 06.09.2018 um 08:25 schrieb Chunming Zhou:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an 
integer payload

identifying a point in a timeline. Such timeline semaphores support the


Drop the term semaphore here, better use syncobj.
This is from the VK_KHR_timeline_semaphore extension description, not
my invention; I just quoted it. On the kernel side we call it a
syncobj; in the UMD they still call it a semaphore.





following operations:
    * CPU query - A host operation that allows querying the payload 
of the

  timeline semaphore.
    * CPU wait - A host operation that allows a blocking wait for a
  timeline semaphore to reach a specified value.
    * Device wait - A device operation that allows waiting for a
  timeline semaphore to reach a specified value.
    * Device signal - A device operation that allows advancing the
  timeline semaphore to a specified value.

Since it's a timeline, that means the earlier time point (PT) is
always signaled before the later PT.

a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal
operation fence; when the PT[N] fence is signaled,
the timeline will increase to the value of PT[N].
b. wait PT design:
A wait PT fence is signaled by the timeline reaching its point value;
when the timeline is increasing, we compare the wait PTs' values with
the new timeline value, and if a PT value is lower than the timeline
value, that wait PT is signaled, otherwise it is kept in the list.
The semaphore wait operation can wait on any point of the timeline,
so an RB tree is needed to order them. And a wait PT can be ahead of
the signal PT; we need a submission fence to
perform that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate 
patch. (Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate 
patch.
5. drop the submission_fence implementation and instead use 
wait_event() for that. (Christian)

6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and 
Christian)
 a. normal syncobj signal op will create a signal PT to tail of 
signal pt list.
 b. normal syncobj wait op will create a wait pt with last 
signal point, and this wait PT is only signaled by related signal 
point PT.

2. many bug fix and clean up
3. stub fence moving is moved to other patch.

v4:
1. fix RB tree loop with while(node=rb_first(...)). (Christian)
2. fix syncobj lifecycle. (Christian)
3. only enable_signaling when there is wait_pt. (Christian)

Tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/drm_syncobj.c  | 516 
+

  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
  include/drm/drm_syncobj.h  |  78 ++--
  include/uapi/drm/drm.h |   1 +
  4 files changed, 448 insertions(+), 151 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index e9ce623d049e..94b31de23858 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,9 @@
  #include "drm_internal.h"
  #include 
  +/* merge normal syncobj to timeline syncobj, the point interval 
is 1 */

+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
  struct drm_syncobj_stub_fence {
  struct dma_fence base;
  spinlock_t lock;
@@ -82,6 +85,18 @@ static const struct dma_fence_ops 
drm_syncobj_stub_fence_ops = {

  .release = drm_syncobj_stub_fence_release,
  };
  +struct drm_syncobj_signal_pt {
+    struct dma_fence_array *base;
+    u64    value;
+    struct list_head list;
+};
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_pt_fence;
+    struct dma_fence_cb wait_cb;
+    struct rb_node   node;
+    u64    value;
+};
    /**
   * drm_syncobj_find - lookup and reference a sync object.
@@ -109,61 +124,238 @@ struct drm_syncobj *drm_syncobj_find(struct 
drm_file *file_private,

  }
  EXPORT_SYMBOL(drm_syncobj_find);
-static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj,
-					     struct drm_syncobj_cb *cb,
-					     drm_syncobj_func_t func)
+static void drm_syncobj_timeline_init(struct drm_syncobj *syncobj,
+				      struct drm_syncobj_timeline *syncobj_timeline)


Since we merged timeline and singleton syncobj you can drop the extra 
_timeline_ part in the function name I think.

Will try in v5.




 {
-	cb->func = func;
-	list_add_tail(&cb->node, &syncobj->cb_list);
+	

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread zhoucm1



On 2018年09月04日 17:20, Christian König wrote:

Am 04.09.2018 um 11:00 schrieb zhoucm1:



On 2018年09月04日 16:42, Christian König wrote:

Am 04.09.2018 um 10:27 schrieb zhoucm1:



On 2018年09月04日 16:05, Christian König wrote:

Am 04.09.2018 um 09:53 schrieb zhoucm1:

[SNIP]


How about this idea:
1. Each signaling point is a fence implementation with an rb 
node.
2. Each node keeps a reference to the last previously inserted 
node.

3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage 
collection and remove the first node from the tree as long as 
it is signaled.


5. When enable_signaling is requested for a node we cascade 
that to the left using rb_prev.
    This ensures that signaling is enabled for the current 
fence as well as all previous fences.


6. A wait just looks into the tree for the signal point lower than
or equal to the requested sequence number.
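
A minimal sketch of the garbage-collection step from point 4, under
the assumption that each node embeds its fence and the tree is
ordered by point value (names are illustrative):

	static void gc_signaled_points(struct rb_root *tree)
	{
		struct rb_node *node;

		/* Pop nodes from the front of the tree while they are
		 * signaled; stop at the first unsignaled point. */
		while ((node = rb_first(tree))) {
			struct signal_pt *pt =
				rb_entry(node, struct signal_pt, node);

			if (!dma_fence_is_signaled(&pt->fence))
				break;
			rb_erase(node, tree);
			dma_fence_put(&pt->fence); /* drop the tree's reference */
		}
	}
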
After re-thinking your idea, I think it doesn't work, since there is
no timeline value acting as a line:
signal pt values don't have to be contiguous and can jump with each
signal operation, like 1, 4, 8, 15, 19. E.g. there are five
signal_pts,
signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19; if a
wait pt is 7, do you mean this wait only needs signal_pt1 and
signal_pt4? That's certainly not right; we need to make sure the
timeline value is bigger than the wait pt value, which means
signal_pt8 is needed for wait_pt7.


That can be defined as we like it, e.g. when a wait operation asks 
for 7 we can return 8 as well.
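
For illustration, a lookup that returns the lowest signal point at or
above the requested value, so a wait for 7 finds 8 in the 1, 4, 8,
15, 19 example (types and fields here are assumptions):

	struct signal_pt {		/* assumed layout */
		struct rb_node node;
		u64 value;
	};

	static struct signal_pt *find_signal_pt(struct rb_root *root,
						u64 wait_value)
	{
		struct rb_node *node = root->rb_node;
		struct signal_pt *best = NULL;

		while (node) {
			struct signal_pt *pt =
				rb_entry(node, struct signal_pt, node);

			if (pt->value >= wait_value) {
				best = pt;	/* candidate; try lower */
				node = node->rb_left;
			} else {
				node = node->rb_right;
			}
		}
		return best;	/* NULL if no point >= wait_value remains */
	}
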
If we define it that way, the problem comes back: if 8 has been
removed by garbage collection, will you return 15?


The garbage collection is only done for signaled nodes. So when 8 is 
already garbage collected and 7 is asked we know that we don't need 
to return anything.
8 is a signaled node; a waitA/signal operation does the garbage
collection, so how does waitB(7) know the garbage history?


Well, we of course keep track of what the last garbage-collected
number is, don't we?


Since there is no timeline as a line, I think this is not the right
direction.


That is actually intended. There is no infinite timeline here, just
a window of the last not-yet-signaled fences.
No one said it's an infinite timeline; the timeline will stop
increasing when the syncobj is released.


Yeah, but the syncobj can live for a very very long time. Not having 
some form of shrinking it when fences are signaled is certainly not 
going to fly very far.

I will try to fix this problem.
BTW, when I tried your suggestion, I found it will be difficult to
implement drm_syncobj_array_wait_timeout with your idea, since it
needs first_signaled. If there is an unsignaled syncobj, we still
have to register a cb on timeline value changes, which brings us back
to needing enable_signaling.


Thanks,
David Zhou


Regards,
Christian.



Anyway, kref is a good way to solve the 'free' problem. I will try to
use it to improve my patch and will of course refer to your idea. :)


Thanks,
David Zhou


Otherwise you will never be able to release nodes from the tree 
since you always need to keep them around just in case somebody asks 
for a lower number.


Regards,
Christian.





The key is that as soon as a signal point is added, adding a
previous point is no longer allowed.

That's the intention.

Regards,
David Zhou




7. When the sync object is released we use 
rbtree_postorder_for_each_entry_safe() and drop the extra 
reference to each node, but never call rb_erase!
    This way the rb_tree stays in memory, but without a root 
(e.g. the sync object). It only destructs itself when the 
looked up references to the nodes are dropped.
And here, who will destroy the rb node, since no one does
enable_signaling and there is no callback for them to free themselves?


The node will be destroyed when the last reference drops, not when 
enable_signaling is called.


In other words the sync_obj keeps the references to each tree 
object to provide the wait operation, as soon as the sync_obj is 
destroyed we don't need that functionality any more.


We don't even need to wait for anything to be signaled, this way 
we can drop all unused signal points as soon as the sync_obj is 
destroyed.


Only the used ones will stay alive and provide the necessary 
functionality to provide the signal for each wait operation.


Regards,
Christian.



Regards,
David Zhou


Well that is quite a bunch of logic, but I think that should 
work fine.
Yeah, it could work; a simple timeline reference can also solve the
'free' problem.


I think this approach is still quite a bit better, 


e.g. you don't run into circular dependency problems, it needs less
memory, and each node always has the same size, which means we can
use a kmem_cache for it.


Regards,
Christian.



Thanks,
David Zhou















Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread zhoucm1



On 2018年09月04日 16:42, Christian König wrote:

Am 04.09.2018 um 10:27 schrieb zhoucm1:



On 2018年09月04日 16:05, Christian König wrote:

Am 04.09.2018 um 09:53 schrieb zhoucm1:

[SNIP]


How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted 
node.

3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection 
and remove the first node from the tree as long as it is signaled.


5. When enable_signaling is requested for a node we cascade that 
to the left using rb_prev.
    This ensures that signaling is enabled for the current fence 
as well as all previous fences.


6. A wait just looks into the tree for the signal point lower than or
equal to the requested sequence number.
After re-thinking your idea, I think it doesn't work, since there is
no timeline value acting as a line:
signal pt values don't have to be contiguous and can jump with each
signal operation, like 1, 4, 8, 15, 19. E.g. there are five
signal_pts,
signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19; if a
wait pt is 7, do you mean this wait only needs signal_pt1 and
signal_pt4? That's certainly not right; we need to make sure the
timeline value is bigger than the wait pt value, which means
signal_pt8 is needed for wait_pt7.


That can be defined as we like it, e.g. when a wait operation asks 
for 7 we can return 8 as well.
If we define it that way, the problem comes back: if 8 has been
removed by garbage collection, will you return 15?


The garbage collection is only done for signaled nodes. So when 8 is 
already garbage collected and 7 is asked we know that we don't need to 
return anything.
8 is a signaled node; a waitA/signal operation does the garbage
collection, so how does waitB(7) know the garbage history?




Since there is no timeline as a line, I think this is not the right
direction.


That is actually intended. There is no infinite timeline here, just a
window of the last not-yet-signaled fences.
No one said it's an infinite timeline; the timeline will stop
increasing when the syncobj is released.


Anyway, kref is a good way to solve the 'free' problem. I will try to
use it to improve my patch and will of course refer to your idea. :)


Thanks,
David Zhou


Otherwise you will never be able to release nodes from the tree since 
you always need to keep them around just in case somebody asks for a 
lower number.


Regards,
Christian.





The key is that as soon as a signal point is added, adding a previous
point is no longer allowed.

That's the intention.

Regards,
David Zhou




7. When the sync object is released we use 
rbtree_postorder_for_each_entry_safe() and drop the extra 
reference to each node, but never call rb_erase!
    This way the rb_tree stays in memory, but without a root 
(e.g. the sync object). It only destructs itself when the looked 
up references to the nodes are dropped.
And here, who will destroy the rb node, since no one does
enable_signaling and there is no callback for them to free themselves?


The node will be destroyed when the last reference drops, not when 
enable_signaling is called.


In other words the sync_obj keeps the references to each tree object 
to provide the wait operation, as soon as the sync_obj is destroyed 
we don't need that functionality any more.


We don't even need to wait for anything to be signaled, this way we 
can drop all unused signal points as soon as the sync_obj is destroyed.


Only the used ones will stay alive and provide the necessary 
functionality to provide the signal for each wait operation.


Regards,
Christian.



Regards,
David Zhou


Well that is quite a bunch of logic, but I think that should 
work fine.
Yeah, it could work; a simple timeline reference can also solve the
'free' problem.


I think this approach is still quite a bit better, 


e.g. you don't run into circular dependency problems, it needs less
memory, and each node always has the same size, which means we can
use a kmem_cache for it.


Regards,
Christian.



Thanks,
David Zhou














Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread zhoucm1



On 2018年09月04日 16:05, Christian König wrote:

Am 04.09.2018 um 09:53 schrieb zhoucm1:

[SNIP]


How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted node.
3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection 
and remove the first node from the tree as long as it is signaled.


5. When enable_signaling is requested for a node we cascade that 
to the left using rb_prev.
    This ensures that signaling is enabled for the current fence 
as well as all previous fences.


6. A wait just looks into the tree for the signal point lower than or
equal to the requested sequence number.
After re-thinking your idea, I think it doesn't work, since there is
no timeline value acting as a line:
signal pt values don't have to be contiguous and can jump with each
signal operation, like 1, 4, 8, 15, 19. E.g. there are five
signal_pts,
signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19; if a
wait pt is 7, do you mean this wait only needs signal_pt1 and
signal_pt4? That's certainly not right; we need to make sure the
timeline value is bigger than the wait pt value, which means
signal_pt8 is needed for wait_pt7.


That can be defined as we like it, e.g. when a wait operation asks for 
7 we can return 8 as well.
If we define it that way, the problem comes back: if 8 has been
removed by garbage collection, will you return 15? Since there is no
timeline as a line, I think this is not the right direction.




The key is that as soon as a signal point is added, adding a previous
point is no longer allowed.

That's the intention.

Regards,
David Zhou




7. When the sync object is released we use 
rbtree_postorder_for_each_entry_safe() and drop the extra 
reference to each node, but never call rb_erase!
    This way the rb_tree stays in memory, but without a root (e.g. 
the sync object). It only destructs itself when the looked up 
references to the nodes are dropped.
And here, who will destroy the rb node, since no one does
enable_signaling and there is no callback for them to free themselves?


The node will be destroyed when the last reference drops, not when 
enable_signaling is called.


In other words the sync_obj keeps the references to each tree object 
to provide the wait operation, as soon as the sync_obj is destroyed we 
don't need that functionality any more.


We don't even need to wait for anything to be signaled, this way we 
can drop all unused signal points as soon as the sync_obj is destroyed.


Only the used ones will stay alive and provide the necessary 
functionality to provide the signal for each wait operation.


Regards,
Christian.



Regards,
David Zhou


Well that is quite a bunch of logic, but I think that should work 
fine.
Yeah, it could work; a simple timeline reference can also solve the
'free' problem.


I think this approach is still quite a bit better, 


e.g. you don't run into circular dependency problems, it needs less
memory, and each node always has the same size, which means we can
use a kmem_cache for it.


Regards,
Christian.



Thanks,
David Zhou










Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread zhoucm1



On 2018年09月04日 15:00, Christian König wrote:

Am 04.09.2018 um 06:04 schrieb zhoucm1:



On 2018年09月03日 19:19, Christian König wrote:

Am 03.09.2018 um 12:07 schrieb Chunming Zhou:



在 2018/9/3 16:50, Christian König 写道:

Am 03.09.2018 um 06:13 schrieb Chunming Zhou:



在 2018/8/30 19:32, Christian König 写道:

[SNIP]



+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for?

They are used to record wait op points and signal op points.
For a timeline, they are connected by the timeline value and work
like this:
    a. A signal pt increases the timeline value to the signal_pt
value when the signal_pt is signaled. A signal_pt depends on the
previous pt's fence and its own signal fence from the signal op.

    b. The wait pt tree checks whether the timeline value has passed
its own value.

For normal, it works like this:
    a. A signal pt increases syncobj_timeline->signal_point by 1
every time a signal op is performed.
    b. When a wait op comes in, the wait pt fetches the last signal
pt value above as its wait point. The wait pt is only signaled by the
signal_pt with the equal point value.
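
Condensing those rules into a sketch (names assumed, not from the
patch):

	/* Timeline case: when signal_pt N's fence signals, the timeline
	 * jumps to N's value and every wait_pt at or below it fires.
	 * Normal case: signal_point just increments by 1 per signal op,
	 * and a wait pt fires only on the signal_pt with the exact same
	 * value. */
	static void on_signal_pt_signaled(struct timeline *tl,
					  struct signal_pt *pt)
	{
		tl->value = pt->value;			/* a. advance the timeline */
		signal_wait_pts_up_to(tl, tl->value);	/* b. wake wait pts <= value */
	}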




And why is the stub fence their base?
Good question. I tried to kzalloc them separately as well when I was
debugging them, and I ran into a problem:
I would look up/find a wait_pt or signal_pt successfully, but when I
tried to use it, it was sometimes already freed, resulting in a NULL
pointer.
And generally, when we look them up, we often need their stub fence
as well, and their lifecycles are the same.
For those reasons, I made the stub fence their base.
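
That embedding is the usual kernel pattern: with the stub fence as a
member, a point and its fence share one allocation and lifetime, and
converting between them is just container_of. Illustrative, given a
dma_fence pointer `fence` known to come from a wait point:

	struct drm_syncobj_wait_pt *wait_pt =
		container_of(fence, struct drm_syncobj_wait_pt, base.base);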


That sounds like you only did this because you messed up the 
lifecycle.


Additional to that I don't think you correctly considered the 
lifecycle of the waits and the sync object itself. E.g. 
blocking in drm_syncobj_timeline_fini() until all waits are 
done is not a good idea.


What you should do instead is to create a fence_array object with all
the fences we need to wait for when a wait point is requested.
Yeah, this was our initial discussion result, but when I tried to do
that, I found it cannot meet the advance-signal requirement:
    a. We need to consider that the wait and signal pt values are not
a one-to-one match; it's difficult to find the dependent point, or at
least there is some overhead.


As far as I can see that is independent of using a fence array here.
You can either use a ring buffer or an rb-tree, but when you want to
wait for a specific point we need to condense the not yet signaled
fences into an array.
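
A sketch of that condensing step with the existing dma_fence_array
API; the caller is assumed to have collected the unsignaled fences up
to the requested point into fences[]:

	static struct dma_fence *condense_fences(struct dma_fence **fences,
						 int n)
	{
		struct dma_fence_array *array;

		/* signal_on_any = false: the array signals once *all*
		 * fences have signaled; it takes over the fences[]
		 * references on success. */
		array = dma_fence_array_create(n, fences,
					       dma_fence_context_alloc(1),
					       1, false);
		return array ? &array->base : NULL;
	}
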
Again, we need to find the range the specific point falls in; we
should stay close to timeline semantics. I also referred to the
sw_sync.c timeline, where usually the wait_pt is signaled by a
timeline point. And I agree we can implement it with several methods,
but I don't think there are fundamental differences.


The problem is that with your current approach you need the 
sync_obj alive for the synchronization to work. That is most 
likely not a good idea.
Indeed, I will fix that. How about only creating a fence array for
every wait pt when the syncobj is released? At syncobj release, the
wait pt must have waited on the signal operation, so we can easily
condense the fences for every wait pt. And in the meantime, we can
keep the advantage of timeline-based wait pts.


That could work, but I don't see how you want to replace the already 
issued fence with a fence_array when the sync object is destroyed.


Additional to that I would rather prefer a consistent handling, e.g. 
without extra rarely used code paths.
Ah, I found an easy way: we just need to make the syncobj_timeline
structure reference-counted. This way the syncobj itself can be
released first; wait_pt/signal_pt don't need the syncobj at all.

Every wait_pt/signal_pt keeps a reference to the syncobj_timeline.


I've thought about that as well, but came to the conclusion that you
run into problems because of circular dependencies.


E.g. the sync_obj references the sync_point and the sync_point
references the sync_obj.
The sync_obj can be freed first; only the sync point references the
syncobj_timeline structure, and the syncobj_timeline doesn't
reference the sync_pt, so there is no circular dependency.
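
A sketch of that refcounting scheme with kref (the struct layout here
is assumed):

	struct drm_syncobj_timeline {
		struct kref refcount;
		/* ... wait_pt tree, signal_pt list, timeline value ... */
	};

	static void timeline_release(struct kref *kref)
	{
		kfree(container_of(kref, struct drm_syncobj_timeline,
				   refcount));
	}

	/* Each wait_pt/signal_pt takes a reference at creation ... */
	kref_get(&timeline->refcount);
	/* ... and drops it when the point itself is freed: */
	kref_put(&timeline->refcount, timeline_release);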




Additional to that, the memory footprint is quite a bit larger
because you need to keep the sync_obj around as well.
All signaled sync_pts are freed immediately, except for the
syncobj_timeline structure; where does the extra memory footprint
come from?












Additional to that you enable signaling without a need from the 
waiting side. That is rather bad for implementations which need 
that optimization.
Do you mean increasing the timeline based on the signal fence is not
better, and we should only update the timeline value when requested
by a wait pt?


Yes, exactly.

This way, we will not update timeline va

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-03 Thread zhoucm1



On 2018年09月03日 19:19, Christian König wrote:

Am 03.09.2018 um 12:07 schrieb Chunming Zhou:



在 2018/9/3 16:50, Christian König 写道:

Am 03.09.2018 um 06:13 schrieb Chunming Zhou:



在 2018/8/30 19:32, Christian König 写道:

[SNIP]



+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for?

They are used to record wait op points and signal op points.
For a timeline, they are connected by the timeline value and work
like this:
    a. A signal pt increases the timeline value to the signal_pt
value when the signal_pt is signaled. A signal_pt depends on the
previous pt's fence and its own signal fence from the signal op.

    b. The wait pt tree checks whether the timeline value has passed
its own value.

For normal, it works like this:
    a. A signal pt increases syncobj_timeline->signal_point by 1
every time a signal op is performed.
    b. When a wait op comes in, the wait pt fetches the last signal
pt value above as its wait point. The wait pt is only signaled by the
signal_pt with the equal point value.




And why is the stub fence their base?
Good question. I tried to kzalloc them separately as well when I was
debugging them, and I ran into a problem:
I would look up/find a wait_pt or signal_pt successfully, but when I
tried to use it, it was sometimes already freed, resulting in a NULL
pointer.
And generally, when we look them up, we often need their stub fence
as well, and their lifecycles are the same.

For those reasons, I made the stub fence their base.


That sounds like you only did this because you messed up the 
lifecycle.


Additional to that I don't think you correctly considered the 
lifecycle of the waits and the sync object itself. E.g. blocking 
in drm_syncobj_timeline_fini() until all waits are done is not a 
good idea.


What you should do instead is to create a fence_array object with all
the fences we need to wait for when a wait point is requested.
Yeah, this was our initial discussion result, but when I tried to do
that, I found it cannot meet the advance-signal requirement:
    a. We need to consider that the wait and signal pt values are not
a one-to-one match; it's difficult to find the dependent point, or at
least there is some overhead.


As far as I can see that is independent of using a fence array here.
You can either use a ring buffer or an rb-tree, but when you want to
wait for a specific point we need to condense the not yet signaled
fences into an array.
Again, we need to find the range the specific point falls in; we
should stay close to timeline semantics. I also referred to the
sw_sync.c timeline, where usually the wait_pt is signaled by a
timeline point. And I agree we can implement it with several methods,
but I don't think there are fundamental differences.


The problem is that with your current approach you need the sync_obj 
alive for the synchronization to work. That is most likely not a 
good idea.
Indeed, I will fix that. How about only creating a fence array for
every wait pt when the syncobj is released? At syncobj release, the
wait pt must have waited on the signal operation, so we can easily
condense the fences for every wait pt. And in the meantime, we can
keep the advantage of timeline-based wait pts.


That could work, but I don't see how you want to replace the already 
issued fence with a fence_array when the sync object is destroyed.


Additional to that I would rather prefer a consistent handling, e.g. 
without extra rarely used code paths.
Ah, I found an easy way: we just need to make the syncobj_timeline
structure reference-counted. This way the syncobj itself can be
released first; wait_pt/signal_pt don't need the syncobj at all.

Every wait_pt/signal_pt keeps a reference to the syncobj_timeline.







Additional to that you enable signaling without a need from the 
waiting side. That is rather bad for implementations which need that 
optimization.
Do you mean increasing the timeline based on the signal fence is not
better, and we should only update the timeline value when requested
by a wait pt?


Yes, exactly.

This way, we will not update the timeline value immediately and
cannot free the signal pt immediately, and we also need to account
for that in CPU query and wait.


That is actually the better coding style. We usually try to avoid 
doing things in interrupt handlers as much as possible.
OK, I see your concern. How about delaying the handling to a
workqueue? That way, we only increase the timeline value and wake up
the workqueue in the fence cb. Is that acceptable?
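
A minimal sketch of that split (the field names are assumptions): the
fence callback stays cheap and irq-safe, while the heavy lifting
moves to process context:

	static void signal_pt_fence_cb(struct dma_fence *fence,
				       struct dma_fence_cb *cb)
	{
		struct signal_pt *pt = container_of(cb, struct signal_pt, cb);

		atomic64_set(&pt->timeline->value, pt->value); /* bump value */
		schedule_work(&pt->timeline->work);	/* defer the rest */
	}

	static void timeline_work_func(struct work_struct *work)
	{
		/* process context: signal the ready wait pts and
		 * garbage-collect the signaled signal pts */
	}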





How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted node.
3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage 

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-08-30 Thread zhoucm1



On 2018年08月30日 15:25, Christian König wrote:

Am 30.08.2018 um 05:50 schrieb zhoucm1:



On 2018年08月29日 19:42, Christian König wrote:

Am 29.08.2018 um 12:44 schrieb Chunming Zhou:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an 
integer payload
identifying a point in a timeline. Such timeline semaphores support 
the

following operations:
    * CPU query - A host operation that allows querying the payload 
of the

  timeline semaphore.
    * CPU wait - A host operation that allows a blocking wait for a
  timeline semaphore to reach a specified value.
    * Device wait - A device operation that allows waiting for a
  timeline semaphore to reach a specified value.
    * Device signal - A device operation that allows advancing the
  timeline semaphore to a specified value.

Since it's a timeline, that means the earlier time point (PT) is
always signaled before the later PT.

a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal
operation fence; when the PT[N] fence is signaled,
the timeline will increase to the value of PT[N].
b. wait PT design:
A wait PT fence is signaled by the timeline reaching its point value;
when the timeline is increasing, we compare the wait PTs' values with
the new timeline value, and if a PT value is lower than the timeline
value, that wait PT is signaled, otherwise it is kept in the list.
The semaphore wait operation can wait on any point of the timeline,
so an RB tree is needed to order them. And a wait PT can be ahead of
the signal PT; we need a submission fence to
perform that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate 
patch. (Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate 
patch.
5. drop the submission_fence implementation and instead use 
wait_event() for that. (Christian)

6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and 
Christian)
 a. normal syncobj signal op will create a signal PT to tail of 
signal pt list.
 b. normal syncobj wait op will create a wait pt with last 
signal point, and this wait PT is only signaled by related signal 
point PT.

2. some bug fix and clean up
3. tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj

TODO:
1. CPU query and wait on timeline semaphore.

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/drm_syncobj.c  | 593 
-

  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
  include/drm/drm_syncobj.h  |  78 +--
  include/uapi/drm/drm.h |   1 +
  4 files changed, 505 insertions(+), 171 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index ab43559398d0..f701d9ef1b81 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,50 @@
  #include "drm_internal.h"
  #include 
  +/* merge normal syncobj to timeline syncobj, the point interval 
is 1 */

+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
+struct drm_syncobj_stub_fence {
+    struct dma_fence base;
+    spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct 
dma_fence *fence)

+{
+    return "syncobjstub";
+}
+
+static bool drm_syncobj_stub_fence_enable_signaling(struct 
dma_fence *fence)

+{
+    return !dma_fence_is_signaled(fence);
+}
+static void drm_syncobj_stub_fence_release(struct dma_fence *f)
+{
+    kfree(f);
+}
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+    .get_driver_name = drm_syncobj_stub_fence_get_name,
+    .get_timeline_name = drm_syncobj_stub_fence_get_name,
+    .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+    .release = drm_syncobj_stub_fence_release,
+};


Do we really need to move that around? Could reduce the size of the 
patch quite a bit if we don't.


The stub fence is used widely in both normal and timeline syncobj; if
you think it increases the patch size, I can do a separate patch for
that.





+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for?

They are used to record wait op points and signal op points.
For a timeline, they are connected by the timeline value and work
like this:
    a. A signal pt increases the timeline value to the signal_pt
value when the signal_pt is signaled. A signal_pt depends on the
previous pt's fence and its own signal fence from the signal op.

    b.

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-08-29 Thread zhoucm1



On 2018年08月29日 19:42, Christian König wrote:

Am 29.08.2018 um 12:44 schrieb Chunming Zhou:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer 
payload

identifying a point in a timeline. Such timeline semaphores support the
following operations:
    * CPU query - A host operation that allows querying the payload 
of the

  timeline semaphore.
    * CPU wait - A host operation that allows a blocking wait for a
  timeline semaphore to reach a specified value.
    * Device wait - A device operation that allows waiting for a
  timeline semaphore to reach a specified value.
    * Device signal - A device operation that allows advancing the
  timeline semaphore to a specified value.

Since it's a timeline, that means the earlier time point (PT) is
always signaled before the later PT.

a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal
operation fence; when the PT[N] fence is signaled,
the timeline will increase to the value of PT[N].
b. wait PT design:
A wait PT fence is signaled by the timeline reaching its point value;
when the timeline is increasing, we compare the wait PTs' values with
the new timeline value, and if a PT value is lower than the timeline
value, that wait PT is signaled, otherwise it is kept in the list.
The semaphore wait operation can wait on any point of the timeline,
so an RB tree is needed to order them. And a wait PT can be ahead of
the signal PT; we need a submission fence to
perform that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate 
patch. (Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate 
patch.
5. drop the submission_fence implementation and instead use 
wait_event() for that. (Christian)

6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and 
Christian)
 a. normal syncobj signal op will create a signal PT to tail of 
signal pt list.
 b. normal syncobj wait op will create a wait pt with last signal 
point, and this wait PT is only signaled by related signal point PT.

2. some bug fix and clean up
3. tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj

TODO:
1. CPU query and wait on timeline semaphore.

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/drm_syncobj.c  | 593 -
  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
  include/drm/drm_syncobj.h  |  78 +--
  include/uapi/drm/drm.h |   1 +
  4 files changed, 505 insertions(+), 171 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index ab43559398d0..f701d9ef1b81 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,50 @@
  #include "drm_internal.h"
  #include 
  +/* merge normal syncobj to timeline syncobj, the point interval is 
1 */

+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
+struct drm_syncobj_stub_fence {
+    struct dma_fence base;
+    spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence 
*fence)

+{
+    return "syncobjstub";
+}
+
+static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence 
*fence)

+{
+    return !dma_fence_is_signaled(fence);
+}
+static void drm_syncobj_stub_fence_release(struct dma_fence *f)
+{
+    kfree(f);
+}
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+    .get_driver_name = drm_syncobj_stub_fence_get_name,
+    .get_timeline_name = drm_syncobj_stub_fence_get_name,
+    .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+    .release = drm_syncobj_stub_fence_release,
+};


Do we really need to move that around? Could reduce the size of the 
patch quite a bit if we don't.


The stub fence is used widely in both normal and timeline syncobj; if
you think it increases the patch size, I can do a separate patch for
that.





+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for?

They are used to record wait op points and signal op points.
For a timeline, they are connected by the timeline value and work
like this:
    a. A signal pt increases the timeline value to the signal_pt
value when the signal_pt is signaled. A signal_pt depends on the
previous pt's fence and its own signal fence from the signal op.

    b. The wait pt tree checks whether the timeline value has passed
its own value.

For normal, it works like this:
    a. A signal pt increases 1 for 

Re: [PATCH 5/5] drm: add syncobj timeline support v2

2018-08-23 Thread zhoucm1



On 2018年08月23日 17:15, Christian König wrote:

Am 23.08.2018 um 10:25 schrieb Chunming Zhou:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer 
payload

identifying a point in a timeline. Such timeline semaphores support the
following operations:
    * Host query - A host operation that allows querying the payload 
of the

  timeline semaphore.
    * Host wait - A host operation that allows a blocking wait for a
  timeline semaphore to reach a specified value.


I think I have an idea what "Host" means in this context, but it
would probably be better to describe it.


How about "CPU"?


    * Device wait - A device operation that allows waiting for a
  timeline semaphore to reach a specified value.
    * Device signal - A device operation that allows advancing the
  timeline semaphore to a specified value.

Since it's a timeline, that means the earlier time point (PT) is
always signaled before the later PT.

a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal
operation fence; when the PT[N] fence is signaled,
the timeline will increase to the value of PT[N].
b. wait PT design:
A wait PT fence is signaled by the timeline reaching its point value;
when the timeline is increasing, we compare the wait PTs' values with
the new timeline value, and if a PT value is lower than the timeline
value, that wait PT is signaled, otherwise it is kept in the list.
The semaphore wait operation can wait on any point of the timeline,
so an RB tree is needed to order them. And a wait PT can be ahead of
the signal PT; we need a submission fence to
perform that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate 
patch. (Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate 
patch.
5. drop the submission_fence implementation and instead use 
wait_event() for that. (Christian)

6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)


I really liked Daniel's idea to handle the classic syncobj like a
timeline syncobj with just 1 entry. That can probably simplify the
implementation quite a bit.
Yeah, with the timeline it seems we can remove the old
syncobj->fence, right? I will try to unify them in an additional
patch.


Thanks,
David Zhou


Additional to that an amdgpu patch which shows how the interface is to 
be used is probably something Daniel will want to see as well.


Christian.



TODO:
1. CPU query and wait on timeline semaphore.
2. test application (Daniel Vetter)

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/drm_syncobj.c | 383 
+++---

  include/drm/drm_syncobj.h |  28 +++
  include/uapi/drm/drm.h    |   1 +
  3 files changed, 389 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index 6227df2cc0a4..f738d78edf65 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,44 @@
  #include "drm_internal.h"
  #include 
  +struct drm_syncobj_stub_fence {
+    struct dma_fence base;
+    spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence 
*fence)

+{
+    return "syncobjstub";
+}
+
+static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence 
*fence)

+{
+    return !dma_fence_is_signaled(fence);
+}
+
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+    .get_driver_name = drm_syncobj_stub_fence_get_name,
+    .get_timeline_name = drm_syncobj_stub_fence_get_name,
+    .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+    .release = NULL,
+};
+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};
+
  /**
   * drm_syncobj_find - lookup and reference a sync object.
   * @file_private: drm file private pointer
@@ -137,6 +175,150 @@ void drm_syncobj_remove_callback(struct 
drm_syncobj *syncobj,

spin_unlock(&syncobj->lock);
  }
  +static void drm_syncobj_timeline_signal_wait_pts(struct 
drm_syncobj *syncobj)

+{
+    struct rb_node *node = NULL;
+    struct drm_syncobj_wait_pt *wait_pt = NULL;
+
+	spin_lock(&syncobj->lock);
+	for (node = rb_first(&syncobj->syncobj_timeline.wait_pt_tree);
+	     node != NULL; ) {
+		wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
+		node = rb_next(node);
+		if (wait_pt->value <= syncobj->syncobj_timeline.timeline) {
+			dma_fence_signal(&wait_pt->base.base);
+			rb_erase(&wait_pt->node,
+				 &syncobj->syncobj_timeline.wait_pt_tree);
+ 

Re: [PATCH 5/5] drm: add syncobj timeline support v2

2018-08-23 Thread zhoucm1



On 2018年08月23日 17:08, Daniel Vetter wrote:

On Thu, Aug 23, 2018 at 04:25:42PM +0800, Chunming Zhou wrote:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer payload
identifying a point in a timeline. Such timeline semaphores support the
following operations:
* Host query - A host operation that allows querying the payload of the
  timeline semaphore.
* Host wait - A host operation that allows a blocking wait for a
  timeline semaphore to reach a specified value.
* Device wait - A device operation that allows waiting for a
  timeline semaphore to reach a specified value.
* Device signal - A device operation that allows advancing the
  timeline semaphore to a specified value.

Since it's a timeline, that means the earlier time point (PT) is always signaled
before the later PT.
a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation fence; when
the PT[N] fence is signaled,
the timeline will increase to the value of PT[N].
b. wait PT design:
A wait PT fence is signaled by the timeline reaching its point value; when the
timeline is increasing, we compare the wait PTs' values with the new timeline
value, and if a PT value is lower than the timeline value, that wait PT is
signaled, otherwise it is kept in the list. The semaphore wait operation can wait
on any point of the timeline, so an RB tree is needed to order them. And a wait PT
can be ahead of the signal PT; we need a submission fence to
perform that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate patch. 
(Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate patch.
5. drop the submission_fence implementation and instead use wait_event() for 
that. (Christian)
6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

Depending upon how it's going to be used, this is the wrong thing to do.


TODO:
1. CPU query and wait on timeline semaphore.
2. test application (Daniel Vetter)

I also had some more suggestions, around aligning the two concepts of
future fences
The submission fence is replaced by wait_event, so I didn't address
your future-fence suggestion. You're welcome to explain the
future-fence idea further.
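For reference, the 'future fence' idea is roughly a placeholder fence
handed out immediately and completed once the real fence
materializes; a hedged sketch (all names here are assumptions):

	struct future_fence {
		struct dma_fence base;	/* what waiters see right away */
		struct dma_fence_cb cb;
		spinlock_t lock;
	};

	static void future_fence_cb(struct dma_fence *real,
				    struct dma_fence_cb *cb)
	{
		struct future_fence *ff =
			container_of(cb, struct future_fence, cb);

		dma_fence_signal(&ff->base);	/* forward the completion */
	}

	/* Once the real fence exists: */
	if (dma_fence_add_callback(real, &ff->cb, future_fence_cb))
		dma_fence_signal(&ff->base); /* it had already signaled */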

and at least trying to merge the timeline and the other
fence (which really is just a special case of a timeline with only 1
slot).

Could you elaborate? Do you mean merging syncobj->fence into a timeline point?

Thanks,
David Zhou

-Daniel


Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/drm_syncobj.c | 383 +++---
  include/drm/drm_syncobj.h |  28 +++
  include/uapi/drm/drm.h|   1 +
  3 files changed, 389 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 6227df2cc0a4..f738d78edf65 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,44 @@
  #include "drm_internal.h"
  #include 
  
+struct drm_syncobj_stub_fence {
+   struct dma_fence base;
+   spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence)
+{
+return "syncobjstub";
+}
+
+static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence)
+{
+	return !dma_fence_is_signaled(fence);
+}
+
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+   .get_driver_name = drm_syncobj_stub_fence_get_name,
+   .get_timeline_name = drm_syncobj_stub_fence_get_name,
+   .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+   .release = NULL,
+};
+
+struct drm_syncobj_wait_pt {
+   struct drm_syncobj_stub_fence base;
+   u64value;
+   struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+   struct drm_syncobj_stub_fence base;
+   struct dma_fence *signal_fence;
+   struct dma_fence *pre_pt_base;
+   struct dma_fence_cb signal_cb;
+   struct dma_fence_cb pre_pt_cb;
+   struct drm_syncobj *syncobj;
+   u64value;
+   struct list_head list;
+};
+
  /**
   * drm_syncobj_find - lookup and reference a sync object.
   * @file_private: drm file private pointer
@@ -137,6 +175,150 @@ void drm_syncobj_remove_callback(struct drm_syncobj *syncobj,
	spin_unlock(&syncobj->lock);
  }
  
+static void drm_syncobj_timeline_signal_wait_pts(struct drm_syncobj *syncobj)
+{
+	struct rb_node *node = NULL;
+	struct drm_syncobj_wait_pt *wait_pt = NULL;
+
+	spin_lock(&syncobj->lock);
+	for(node = rb_first(&syncobj->syncobj_timeline.wait_pt_tree);
+	    node != NULL; ) {
+		wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
+		node = rb_next(node);
+		if (wait_pt->value <= syncobj->syncobj_timeline.timeline) {
+			dma_fence_signal(&wait_pt->base.base);
+
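
[Editor's note: for readers unfamiliar with the dma-fence API, here is a
minimal sketch of bringing one of the stub fences defined above to life.
The helper name is hypothetical, and a real user would likely share one
fence context rather than allocate one per fence:]

#include <linux/dma-fence.h>
#include <linux/slab.h>

static struct drm_syncobj_stub_fence *stub_fence_create(void)
{
	struct drm_syncobj_stub_fence *stub;

	stub = kzalloc(sizeof(*stub), GFP_KERNEL);
	if (!stub)
		return NULL;

	spin_lock_init(&stub->lock);
	/* Tie the fence to the ops table from the patch above. */
	dma_fence_init(&stub->base, &drm_syncobj_stub_fence_ops,
		       &stub->lock, dma_fence_context_alloc(1), 0);
	return stub;
}

The fence is later completed with dma_fence_signal(&stub->base) and its
reference dropped with dma_fence_put(&stub->base); since .release is NULL,
the dma-fence core falls back to its default kfree-based release.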

Re: [PATCH 2/2] [RFC]drm: add syncobj timeline support

2018-08-22 Thread zhoucm1



On 2018-08-22 17:31, Daniel Vetter wrote:

On Wed, Aug 22, 2018 at 05:28:17PM +0800, zhoucm1 wrote:


On 2018-08-22 17:24, Daniel Vetter wrote:

On Wed, Aug 22, 2018 at 04:49:28PM +0800, Chunming Zhou wrote:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer payload
identifying a point in a timeline. Such timeline semaphores support the
following operations:
* Host query - A host operation that allows querying the payload of the
  timeline semaphore.
* Host wait - A host operation that allows a blocking wait for a
  timeline semaphore to reach a specified value.
* Device wait - A device operation that allows waiting for a
  timeline semaphore to reach a specified value.
* Device signal - A device operation that allows advancing the
  timeline semaphore to a specified value.

Since it's a timeline, an earlier time point (PT) is always signaled before
a later PT.
a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation's
fence; when the PT[N] fence is signaled, the timeline advances to the value
of PT[N].
b. wait PT design:
A wait PT fence is signaled once the timeline reaches its point value.
Whenever the timeline advances, the wait PT values are compared against the
new timeline value; any PT whose value is lower than the timeline value is
signaled, the rest stay in the list. A semaphore wait operation can wait on
any point of the timeline, so an RB tree is needed to keep the wait points
ordered. A wait PT can also be submitted ahead of its signal PT, which
requires a submission fence.
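
[Editor's note: "an RB tree is needed to keep the wait points ordered"
translates to the usual kernel rb-tree insertion idiom. A minimal sketch,
keyed by the point's timeline value; wait_pt_insert() is a hypothetical
helper, and the struct layout follows the drm_syncobj_wait_pt definition
quoted earlier in this archive:]

#include <linux/rbtree.h>

static void wait_pt_insert(struct rb_root *root,
			   struct drm_syncobj_wait_pt *pt)
{
	struct rb_node **p = &root->rb_node, *parent = NULL;

	while (*p) {
		struct drm_syncobj_wait_pt *cur =
			rb_entry(*p, struct drm_syncobj_wait_pt, node);

		parent = *p;
		/* Smaller values go left, so rb_first() walks the
		 * pending points in increasing timeline order. */
		if (pt->value < cur->value)
			p = &(*p)->rb_left;
		else
			p = &(*p)->rb_right;
	}
	rb_link_node(&pt->node, parent, p);
	rb_insert_color(&pt->node, root);
}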

TODO:
CPU query and wait on timeline semaphore.

Another TODO: igt testcases for all the corner cases. We already have
other syncobj tests in there.

Yes, I'm also trying to find where the tests should be written. Could you
point me to a directory?

There's already tests/syncobj_basic.c and tests/syncobj_wait.c. Either
extend those, or probably better to start a new tests/syncobj_timeline.c
since I expect this will have a lot of corner-cases we need to check.
I couldn't find them in either the kernel tree or libdrm. Could you point
me to the tests you mean?
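
[Editor's note: the syncobj tests live in the igt (intel-gpu-tools)
repository, not in the kernel tree or libdrm, which is presumably why they
were hard to find. A hypothetical skeleton for the suggested
tests/syncobj_timeline.c; the subtest names are made up, and
drmSyncobjCreate()/drmSyncobjDestroy() are the existing libdrm wrappers:]

#include "igt.h"

igt_main
{
	int fd = -1;

	igt_fixture
		fd = drm_open_driver(DRIVER_ANY);

	igt_subtest("create-destroy") {
		uint32_t handle = 0;

		igt_assert_eq(drmSyncobjCreate(fd, 0, &handle), 0);
		igt_assert(handle);
		igt_assert_eq(drmSyncobjDestroy(fd, handle), 0);
	}

	/* Further subtests would cover the timeline corner cases:
	 * wait-before-signal, out-of-order points, host query/wait. */

	igt_fixture
		close(fd);
}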


Thanks,
David Zhou

-Daniel


Thanks,
David Zhou

That would also help with understanding how this is supposed to be used,
since I'm a bit too dense to immediately get your algorithm by just
staring at the code.



Change-Id: I9f09aae225e268442c30451badac40406f24262c
Signed-off-by: Chunming Zhou 
Cc: Christian König 
Cc: Dave Airlie 
Cc: Daniel Rakos 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |   7 +-
    drivers/gpu/drm/drm_syncobj.c  | 385 -
   drivers/gpu/drm/v3d/v3d_gem.c  |   4 +-
   drivers/gpu/drm/vc4/vc4_gem.c  |   2 +-
   include/drm/drm_syncobj.h  |  45 +++-
   include/uapi/drm/drm.h |   3 +-
   6 files changed, 435 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d42d1c8f78f6..463cc8960723 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1105,7 +1105,7 @@ static int amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,
   {
	int r;
	struct dma_fence *fence;
-	r = drm_syncobj_find_fence(p->filp, handle, &fence);
+	r = drm_syncobj_find_fence(p->filp, handle, &fence, 0);
if (r)
return r;
@@ -1193,8 +1193,9 @@ static void amdgpu_cs_post_dependencies(struct amdgpu_cs_parser *p)
   {
int i;
-   for (i = 0; i < p->num_post_dep_syncobjs; ++i)
-   drm_syncobj_replace_fence(p->post_dep_syncobjs[i], p->fence);
+   for (i = 0; i < p->num_post_dep_syncobjs; ++i) {
+   drm_syncobj_signal_fence(p->post_dep_syncobjs[i], p->fence, 0);
+   }
   }
   static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 70af32d0def1..3709f36c901e 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -187,6 +187,191 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
   }
   EXPORT_SYMBOL(drm_syncobj_replace_fence);
+static void drm_syncobj_timeline_signal_submission_fences(struct drm_syncobj *syncobj)
+{
+	struct rb_node *node = NULL;
+	struct drm_syncobj_wait_pt *wait_pt = NULL;
+
+	spin_lock(&syncobj->lock);
+	for(node = rb_first(&syncobj->syncobj_timeline.wait_pt_tree);
+	    node != NULL; node = rb_next(node)) {
+		wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
+		if (wait_pt->value <= syncobj->syncobj_timeline.signal_point) {
+			if (wait_pt->submission_fence)
+				dma_fence_signal(&wait_pt->submission_fence->base);
+		} else {
+			/* the loop is from left to right, the later entry value is
+			 * bigger, so don't need to check any more */
+
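
[Editor's note: the hunks above extend drm_syncobj_find_fence() with a
trailing timeline-point parameter. A hypothetical caller resolving a
specific point might look like this; sketch only, with the argument order
taken from the amdgpu hunk above and error handling trimmed:]

	struct dma_fence *fence;
	int r;

	/* Look up the fence backing point 42 of this syncobj handle. */
	r = drm_syncobj_find_fence(file_private, handle, &fence, 42);
	if (r)
		return r;

	/* ... install the fence as a dependency of the new job ... */

	dma_fence_put(fence);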
