Re: [PATCH 06/11] drm/syncobj: add timeline payload query ioctl v4

2019-02-15 Thread Lionel Landwerlin via amd-gfx

On 07/12/2018 09:55, Chunming Zhou wrote:

user mode can query timeline payload.
v2: check return value of copy_to_user
v3: handle querying entry by entry
v4: rebase on new chain container, simplify interface

Signed-off-by: Chunming Zhou 
Cc: Daniel Rakos 
Cc: Jason Ekstrand 
Cc: Bas Nieuwenhuizen 
Cc: Dave Airlie 
Cc: Christian König 
Cc: Chris Wilson 
---
  drivers/gpu/drm/drm_internal.h |  2 ++
  drivers/gpu/drm/drm_ioctl.c|  2 ++
  drivers/gpu/drm/drm_syncobj.c  | 43 ++
  include/uapi/drm/drm.h | 10 
  4 files changed, 57 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 18b41e10195c..dab4d5936441 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -184,6 +184,8 @@ int drm_syncobj_reset_ioctl(struct drm_device *dev, void 
*data,
struct drm_file *file_private);
  int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_private);
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private);
  
  /* drm_framebuffer.c */

  void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index a9a17ed35cc4..7578ef6dc1d1 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -681,6 +681,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 
DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, 
drm_crtc_queue_sequence_ioctl, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, 
DRM_MASTER|DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 348079bb0965..f97fa00ca1d0 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1061,3 +1061,46 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void 
*data,
  
  	return ret;

  }
+
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private)
+{
+   struct drm_syncobj_timeline_array *args = data;
+   struct drm_syncobj **syncobjs;
+   uint64_t __user *points = u64_to_user_ptr(args->points);
+   uint32_t i;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+   return -ENODEV;
+
+   if (args->pad != 0)
+   return -EINVAL;
+
+   if (args->count_handles == 0)
+   return -EINVAL;
+
+   ret = drm_syncobj_array_find(file_private,
+u64_to_user_ptr(args->handles),
+args->count_handles,
+&syncobjs);
+   if (ret < 0)
+   return ret;
+
+   for (i = 0; i < args->count_handles; i++) {
+   struct dma_fence_chain *chain;
+   struct dma_fence *fence;
+   uint64_t point;
+
+   fence = drm_syncobj_fence_get(syncobjs[i]);
+   chain = to_dma_fence_chain(fence);
+   point = chain ? fence->seqno : 0;



Sorry, I don' t want to sound annoying, but this looks like this could 
report values going backward.


Anything add a point X to a timeline that has reached value Y with X < Y 
would trigger that.


Either through the submission or userspace signaling or importing 
another syncpoint's fence.



-Lionel



+   ret = copy_to_user(&points[i], &point, sizeof(uint64_t));
+   ret = ret ? -EFAULT : 0;
+   if (ret)
+   break;
+   }
+   drm_syncobj_array_free(syncobjs, args->count_handles);
+
+   return ret;
+}
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 0092111d002c..b2c36f2b2599 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -767,6 +767,14 @@ struct drm_syncobj_array {
__u32 pad;
  };
  
+struct drm_syncobj_timeline_array {

+   __u64 handles;
+   __u64 points;
+   __u32 count_handles;
+   __u32 pad;
+};
+
+
  /* Query current scanout sequence number */
  struct drm_crtc_get_sequence {
__u32 crtc_id;  /* requested crtc_id */
@@ -924,6 +932,8 @@ extern "C" {
  #define DRM_IOCTL_MODE_REVOKE_LEASE   DRM_IOWR(0xC9, struct 
drm_mode_revoke_lease)
  
  #define DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT	DRM_IOWR(0xCA, struct drm_syncobj_timeline_wait)

+#define DRM_IOCTL_SYNCOBJ_QUERYDRM_IOWR(0xCB, struct 
drm_

Re: [PATCH 02/11] dma-buf: add new dma_fence_chain container v4

2019-02-15 Thread Lionel Landwerlin via amd-gfx

On 15/02/2019 14:32, Koenig, Christian wrote:

Am 15.02.19 um 15:23 schrieb Lionel Landwerlin:

Hi Christian, David,

For timeline semaphore we need points to signaled in order.
I'm struggling to understand how this fence-chain implementation
preserves ordering of the seqnos.

One of the scenario I can see an issue happening is when you have a
timeline with points 1 & 2 and userspace submits for 2 different
engines :
     - first with let's say a blitter style engine on point 2
     - then a 3d style engine on point 1

Yeah, and where exactly is the problem?

Seqno 1 will signal when the 3d style engine finishes work.

And seqno 2 will signal when both seqno 1 is signaled and the blitter
style engine has finished its work.


That's not really how I understood the spec, but I might be wrong.

What makes me thing 1 should be signaled as soon as 2 is signaled
(regardless of whether the fence attached on point 1 is been signaled),
is that the spec defines wait & signal operations in term of the value
of the timeline.


-Lionel




Another scenario would be signaling a timeline with points 1 & 2 with
those points in reverse order in the submission array.

That is actually illegal in the spec, but actually handled gracefully as
well.

E.g. when you add seqno 1 to the syncobj container it will only signal
when 2 is signaled as well.







Regards,
Christian.


-Lionel

On 07/12/2018 09:55, Chunming Zhou wrote:

From: Christian König 

Lockless container implementation similar to a dma_fence_array, but with
only two elements per node and automatic garbage collection.

v2: properly document dma_fence_chain_for_each, add
dma_fence_chain_find_seqno,
  drop prev reference during garbage collection if it's not a
chain fence.
v3: use head and iterator for dma_fence_chain_for_each
v4: fix reference count in dma_fence_chain_enable_signaling

Signed-off-by: Christian König 
---
   drivers/dma-buf/Makefile  |   3 +-
   drivers/dma-buf/dma-fence-chain.c | 241 ++
   include/linux/dma-fence-chain.h   |  81 ++
   3 files changed, 324 insertions(+), 1 deletion(-)
   create mode 100644 drivers/dma-buf/dma-fence-chain.c
   create mode 100644 include/linux/dma-fence-chain.h

diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index 0913a6ccab5a..1f006e083eb9 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,4 +1,5 @@
-obj-y := dma-buf.o dma-fence.o dma-fence-array.o reservation.o
seqno-fence.o
+obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
+ reservation.o seqno-fence.o
   obj-$(CONFIG_SYNC_FILE)    += sync_file.o
   obj-$(CONFIG_SW_SYNC)    += sw_sync.o sync_debug.o
   obj-$(CONFIG_UDMABUF)    += udmabuf.o
diff --git a/drivers/dma-buf/dma-fence-chain.c
b/drivers/dma-buf/dma-fence-chain.c
new file mode 100644
index ..0c5e3c902fa0
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -0,0 +1,241 @@
+/*
+ * fence-chain: chain fences together in a timeline
+ *
+ * Copyright (C) 2018 Advanced Micro Devices, Inc.
+ * Authors:
+ *    Christian König 
+ *
+ * This program is free software; you can redistribute it and/or
modify it
+ * under the terms of the GNU General Public License version 2 as
published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
License for
+ * more details.
+ */
+
+#include 
+
+static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
+
+/**
+ * dma_fence_chain_get_prev - use RCU to get a reference to the
previous fence
+ * @chain: chain node to get the previous node from
+ *
+ * Use dma_fence_get_rcu_safe to get a reference to the previous
fence of the
+ * chain node.
+ */
+static struct dma_fence *dma_fence_chain_get_prev(struct
dma_fence_chain *chain)
+{
+    struct dma_fence *prev;
+
+    rcu_read_lock();
+    prev = dma_fence_get_rcu_safe(&chain->prev);
+    rcu_read_unlock();
+    return prev;
+}
+
+/**
+ * dma_fence_chain_walk - chain walking function
+ * @fence: current chain node
+ *
+ * Walk the chain to the next node. Returns the next fence or NULL
if we are at
+ * the end of the chain. Garbage collects chain nodes which are already
+ * signaled.
+ */
+struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
+{
+    struct dma_fence_chain *chain, *prev_chain;
+    struct dma_fence *prev, *replacement, *tmp;
+
+    chain = to_dma_fence_chain(fence);
+    if (!chain) {
+    dma_fence_put(fence);
+    return NULL;
+    }
+
+    while ((prev = dma_fence_chain_get_prev(chain))) {
+
+    prev_chain = to_dma_fence_chain(prev);
+    if (prev_chain) {
+    if (!dma_fence_is_signaled(prev_chain->fence))
+    break;
+
+    replacement = dma_fence_chain_get_prev(prev_chain);
+    } else {
+   

Re: [PATCH 02/11] dma-buf: add new dma_fence_chain container v4

2019-02-15 Thread Lionel Landwerlin via amd-gfx

Hi Christian, David,

For timeline semaphore we need points to signaled in order.
I'm struggling to understand how this fence-chain implementation 
preserves ordering of the seqnos.


One of the scenario I can see an issue happening is when you have a 
timeline with points 1 & 2 and userspace submits for 2 different engines :

    - first with let's say a blitter style engine on point 2
    - then a 3d style engine on point 1

Another scenario would be signaling a timeline with points 1 & 2 with 
those points in reverse order in the submission array.


-Lionel

On 07/12/2018 09:55, Chunming Zhou wrote:

From: Christian König 

Lockless container implementation similar to a dma_fence_array, but with
only two elements per node and automatic garbage collection.

v2: properly document dma_fence_chain_for_each, add dma_fence_chain_find_seqno,
 drop prev reference during garbage collection if it's not a chain fence.
v3: use head and iterator for dma_fence_chain_for_each
v4: fix reference count in dma_fence_chain_enable_signaling

Signed-off-by: Christian König 
---
  drivers/dma-buf/Makefile  |   3 +-
  drivers/dma-buf/dma-fence-chain.c | 241 ++
  include/linux/dma-fence-chain.h   |  81 ++
  3 files changed, 324 insertions(+), 1 deletion(-)
  create mode 100644 drivers/dma-buf/dma-fence-chain.c
  create mode 100644 include/linux/dma-fence-chain.h

diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index 0913a6ccab5a..1f006e083eb9 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,4 +1,5 @@
-obj-y := dma-buf.o dma-fence.o dma-fence-array.o reservation.o seqno-fence.o
+obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
+reservation.o seqno-fence.o
  obj-$(CONFIG_SYNC_FILE)   += sync_file.o
  obj-$(CONFIG_SW_SYNC) += sw_sync.o sync_debug.o
  obj-$(CONFIG_UDMABUF) += udmabuf.o
diff --git a/drivers/dma-buf/dma-fence-chain.c 
b/drivers/dma-buf/dma-fence-chain.c
new file mode 100644
index ..0c5e3c902fa0
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -0,0 +1,241 @@
+/*
+ * fence-chain: chain fences together in a timeline
+ *
+ * Copyright (C) 2018 Advanced Micro Devices, Inc.
+ * Authors:
+ * Christian König 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include 
+
+static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
+
+/**
+ * dma_fence_chain_get_prev - use RCU to get a reference to the previous fence
+ * @chain: chain node to get the previous node from
+ *
+ * Use dma_fence_get_rcu_safe to get a reference to the previous fence of the
+ * chain node.
+ */
+static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain 
*chain)
+{
+   struct dma_fence *prev;
+
+   rcu_read_lock();
+   prev = dma_fence_get_rcu_safe(&chain->prev);
+   rcu_read_unlock();
+   return prev;
+}
+
+/**
+ * dma_fence_chain_walk - chain walking function
+ * @fence: current chain node
+ *
+ * Walk the chain to the next node. Returns the next fence or NULL if we are at
+ * the end of the chain. Garbage collects chain nodes which are already
+ * signaled.
+ */
+struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
+{
+   struct dma_fence_chain *chain, *prev_chain;
+   struct dma_fence *prev, *replacement, *tmp;
+
+   chain = to_dma_fence_chain(fence);
+   if (!chain) {
+   dma_fence_put(fence);
+   return NULL;
+   }
+
+   while ((prev = dma_fence_chain_get_prev(chain))) {
+
+   prev_chain = to_dma_fence_chain(prev);
+   if (prev_chain) {
+   if (!dma_fence_is_signaled(prev_chain->fence))
+   break;
+
+   replacement = dma_fence_chain_get_prev(prev_chain);
+   } else {
+   if (!dma_fence_is_signaled(prev))
+   break;
+
+   replacement = NULL;
+   }
+
+   tmp = cmpxchg(&chain->prev, prev, replacement);
+   if (tmp == prev)
+   dma_fence_put(tmp);
+   else
+   dma_fence_put(replacement);
+   dma_fence_put(prev);
+   }
+
+   dma_fence_put(fence);
+   return prev;
+}
+EXPORT_SYMBOL(dma_fence_chain_walk);
+
+/**
+ * dma_fence_chain_find_seqno - find fence chain node by seqno
+ * @pfence: pointer to the chain node where to start
+ * @seqno: the sequence number to search for
+ *
+ * Advance the fence point

Re: [PATCH 09/11] drm/syncobj: add transition iotcls between binary and timeline

2019-02-15 Thread Lionel Landwerlin via amd-gfx

Hi David,

Thanks a lot for point me to the tests you've added in IGT.
While adding a test with that signals fences imported into a timeline 
syncobj out of order, I ran into a deadlock.
Here is the test : 
https://github.com/djdeath/intel-gpu-tools/commit/1e46cf7e7bff09b78a24367ddc2314f97eb0a1b9


Trying to kill the deadlocked process I got this backtrace :


[   33.969136] [IGT] syncobj_timeline: starting subtest signal-order
[   60.452823] watchdog: BUG: soft lockup - CPU#6 stuck for 23s! 
[syncobj_timelin:2021]
[   60.452826] Modules linked in: rfcomm cmac bnep binfmt_misc 
nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek 
snd_hda_codec_generic ledtrig_audio sch_fq_codel ib_iser snd_hda_intel 
rdma_cm iw_cm snd_hda_codec ib_cm snd_hda_core snd_hwdep intel_rapl 
snd_pcm ib_core x86_pkg_temp_thermal intel_powerclamp configf
s coretemp iscsi_tcp snd_seq_midi libiscsi_tcp snd_seq_midi_event 
libiscsi kvm_intel scsi_transport_iscsi kvm btusb snd_rawmidi irqbypass 
btrtl intel_cstate intel_rapl_perf btbcm btintel bluetooth snd_seq 
snd_seq_device snd_timer input_leds ecdh_generic snd soundcore mei_me 
mei intel_pch_thermal mac_hid acpi_pad parp
ort_pc ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_decompress 
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
hid_generic usbhid hid i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit 
ghash_clmulni_intel prime_numbers

drm_kms_helper aesni_intel syscopyarea sysfillrect
[   60.452876]  sysimgblt fb_sys_fops aes_x86_64 crypto_simd sdhci_pci 
cryptd drm e1000e glue_helper cqhci sdhci wmi video
[   60.452881] CPU: 6 PID: 2021 Comm: syncobj_timelin Tainted: G 
U    5.0.0-rc5+ #337
[   60.452882] Hardware name:  /NUC6i7KYB, BIOS 
KYSKLi70.86A.0042.2016.0929.1933 09/29/2016

[   60.452886] RIP: 0010:dma_fence_chain_walk+0x22c/0x260
[   60.452888] Code: ff e9 93 fe ff ff 48 8b 45 08 48 8b 40 18 48 85 c0 
74 0c 48 89 ef e8 33 0f 58 00 84 c0 75 23 f0 41 ff 4d 00 0f 88 99 87 2f 
00 <0f> 85 05 fe ff ff 4c 89 ef e8 56 ea ff ff 48 89 d8 5b 5d 41 5c 41
[   60.452888] RSP: 0018:9a5804653ca8 EFLAGS: 00010296 ORIG_RAX: 
ff13
[   60.452889] RAX:  RBX: 8f5690fb2480 RCX: 
8f5690fb2f00
[   60.452890] RDX: 003e3730 RSI:  RDI: 
8f5690fb2180
[   60.452891] RBP: 8f5690fb2180 R08:  R09: 
8f5690fb2eb0
[   60.452891] R10:  R11: 8f5660469860 R12: 
8f5690fb2f68
[   60.452892] R13: 8f5690fb2f00 R14: 0003 R15: 
8f5655a45fc0
[   60.452913] FS:  7fdc5c459980() GS:8f569eb8() 
knlGS:

[   60.452913] CS:  0010 DS:  ES:  CR0: 80050033
[   60.452914] CR2: 7f9d74336dd8 CR3: 00084a67e004 CR4: 
003606e0
[   60.452915] DR0:  DR1:  DR2: 

[   60.452915] DR3:  DR6: fffe0ff0 DR7: 
0400

[   60.452916] Call Trace:
[   60.452958]  drm_syncobj_add_point+0x102/0x160 [drm]
[   60.452965]  ? drm_syncobj_fd_to_handle_ioctl+0x1b0/0x1b0 [drm]
[   60.452971]  drm_syncobj_transfer_ioctl+0x10f/0x180 [drm]
[   60.452978]  drm_ioctl_kernel+0xac/0xf0 [drm]
[   60.452984]  drm_ioctl+0x2eb/0x3b0 [drm]
[   60.452990]  ? drm_syncobj_fd_to_handle_ioctl+0x1b0/0x1b0 [drm]
[   60.452992]  ? sw_sync_ioctl+0x347/0x370
[   60.452994]  do_vfs_ioctl+0xa4/0x640
[   60.452995]  ? __fput+0x134/0x220
[   60.452997]  ? do_fcntl+0x1a5/0x650
[   60.452998]  ksys_ioctl+0x70/0x80
[   60.452999]  __x64_sys_ioctl+0x16/0x20
[   60.453002]  do_syscall_64+0x55/0x110
[   60.453004]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   60.453005] RIP: 0033:0x7fdc5b6e45d7
[   60.453006] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 
48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
[   60.453007] RSP: 002b:7fff25c4d198 EFLAGS: 0206 ORIG_RAX: 
0010
[   60.453008] RAX: ffda RBX:  RCX: 
7fdc5b6e45d7
[   60.453008] RDX: 7fff25c4d200 RSI: c02064cc RDI: 
0003
[   60.453009] RBP: 7fff25c4d1d0 R08:  R09: 
001e
[   60.453010] R10:  R11: 0206 R12: 
563d3959e4d0
[   60.453010] R13: 7fff25c4d620 R14:  R15: 

[   88.447359] watchdog: BUG: soft lockup - CPU#6 stuck for 22s! 
[syncobj_timelin:2021]



-Lionel


On 07/12/2018 09:55, Chunming Zhou wrote:

we need to import/export timeline point

Signed-off-by: Chunming Zhou 
---
  drivers/gpu/drm/drm_internal.h |  4 +++
  drivers/gpu/drm/drm_ioctl.c|  6 
  drivers/gpu/drm/drm_syncobj.c  | 66 ++
  include/uapi/drm/drm.h | 10 ++
  4 files changed, 86 insertions(+)

diff --git a/drivers/gpu/drm/drm_interna