Re: [PATCH v2 1/3] drm: Support importing dmabufs into drivers without DMA
On Mon, Feb 22, 2021 at 01:43:26PM +0100, Thomas Zimmermann wrote: ... > > [ 60.050199] [ cut here ] > [ 60.055092] WARNING: CPU: 0 PID: 1403 at kernel/dma/mapping.c:190 > dma_map_sg_attrs+0x8f/0xc0 > [ 60.064331] Modules linked in: af_packet(E) rfkill(E) dmi_sysfs(E) > intel_rapl_msr(E) intel_rapl_common(E) snd_hda_codec_realtek(E) > snd_hda_codec_generic(E) ledtrig_audio(E) snd_hda_codec_hdmi(E) > x86_pkg_temp_thermal(E) intel_powerclam) > [ 60.064801] wmi(E) video(E) btrfs(E) blake2b_generic(E) libcrc32c(E) > crc32c_intel(E) xor(E) raid6_pq(E) sg(E) dm_multipath(E) dm_mod(E) > scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) msr(E) efivarfs(E) > [ 60.170778] CPU: 0 PID: 1403 Comm: Xorg.bin Tainted: GE > 5.11.0-rc5-1-default+ #747 > [ 60.179841] Hardware name: Dell Inc. OptiPlex 9020/0N4YC8, BIOS A24 > 10/24/2018 > [ 60.187145] RIP: 0010:dma_map_sg_attrs+0x8f/0xc0 > [ 60.191822] Code: 4d 85 ff 75 2b 49 89 d8 44 89 e1 44 89 f2 4c 89 ee 48 89 > ef e8 62 30 00 00 85 c0 78 36 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 0b <0f> 0b > 31 c0 eb ed 49 8d 7f 50 e8 72 f5 2a 00 49 8b 47 50 49 89 d8 > [ 60.210770] RSP: 0018:c90001d6fc18 EFLAGS: 00010246 > [ 60.216062] RAX: RBX: 0020 RCX: > b31e677b > [ 60.223274] RDX: dc00 RSI: 888212c10600 RDI: > 8881b08a9488 > [ 60.230501] RBP: 8881b08a9030 R08: 0020 R09: > 888212c10600 > [ 60.237710] R10: ed10425820df R11: 0001 R12: > > [ 60.244939] R13: 888212c10600 R14: 0008 R15: > > [ 60.252155] FS: 7f08ac2b2f00() GS:8887cbe0() > knlGS: > [ 60.260333] CS: 0010 DS: ES: CR0: 80050033 > [ 60.266150] CR2: 55831c899be8 CR3: 00015372e006 CR4: > 001706f0 > [ 60.273369] Call Trace: > [ 60.275845] drm_gem_map_dma_buf+0xb4/0x120 > [ 60.280089] dma_buf_map_attachment+0x15d/0x1e0 > [ 60.284688] drm_gem_prime_import_dev.part.0+0x5d/0x140 > [ 60.289984] drm_gem_prime_fd_to_handle+0x213/0x280 > [ 60.294931] ? drm_prime_destroy_file_private+0x30/0x30 > [ 60.300224] drm_ioctl_kernel+0x131/0x180 > [ 60.304291] ? drm_setversion+0x230/0x230 > [ 60.308366] ? drm_prime_destroy_file_private+0x30/0x30 > [ 60.313659] drm_ioctl+0x2f1/0x500 > [ 60.317118] ? drm_version+0x150/0x150 > [ 60.320903] ? lock_downgrade+0xa0/0xa0 > [ 60.324806] ? do_vfs_ioctl+0x5f4/0x680 > [ 60.328694] ? __fget_files+0x13e/0x210 > [ 60.332591] ? ioctl_fiemap.isra.0+0x1a0/0x1a0 > [ 60.337102] ? __fget_files+0x15d/0x210 > [ 60.340990] __x64_sys_ioctl+0xb9/0xf0 > [ 60.344795] do_syscall_64+0x33/0x80 > [ 60.348427] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [ 60.353542] RIP: 0033:0x7f08ac7b53cb > [ 60.357168] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff > 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d > 01 f0 ff ff 73 01 c3 48 8b 0d 75 ba 0c 00 f7 d8 64 89 01 48 > [ 60.376108] RSP: 002b:7ffeabc89fc8 EFLAGS: 0246 ORIG_RAX: > 0010 > [ 60.383758] RAX: ffda RBX: 7ffeabc8a00c RCX: > 7f08ac7b53cb > [ 60.390970] RDX: 7ffeabc8a00c RSI: c00c642e RDI: > 000d > [ 60.398221] RBP: c00c642e R08: 55831c691d00 R09: > 55831b2ec010 > [ 60.405446] R10: 7f08acf6ada0 R11: 0246 R12: > 55831c691d00 > [ 60.412667] R13: 000d R14: 007e9000 R15: > > [ 60.419903] irq event stamp: 672893 > [ 60.423441] hardirqs last enabled at (672913): [] > console_unlock+0x44d/0x500 > [ 60.432230] hardirqs last disabled at (672922): [] > console_unlock+0x443/0x500 > [ 60.441021] softirqs last enabled at (672940): [] > __do_softirq+0x3dd/0x554 > [ 60.449634] softirqs last disabled at (672931): [] > asm_call_irq_on_stack+0x12/0x20 > [ 60.458871] ---[ end trace f2f88696eb17806c ]--- > Copy-pasting the etnire stack trace to the commit log really hurts readability and provides no additional value to the reader, IMHO. > This patch introduces struct drm_driver.gem_prime_create_object, which ^ Documentation/process/submitting-patches.rst: Describe your changes in imperative mood, e.g. "make xyzzy do frotz" instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy to do frotz", as if you are giving orders to the codebase to change its behaviour. > Cc: Christoph Hellwig > Cc: Greg Kroah-Hartman > Cc: Alan Stern > Cc: Johan Hovold > Cc: Andy Shevchenko &g
[PATCH v3 00/20] seqlock: Extend seqcount API with associated locks
Hi, This is v3 of the seqlock patch series: [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks https://lore.kernel.org/lkml/20200519214547.352050-1-a.darw...@linutronix.de [PATCH v2 00/18] https://lore.kernel.org/lkml/20200608005729.1874024-1-a.darw...@linutronix.de It's based over: git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git locking/core to get Peter's lockdep irqstate tracking series below, which untangles mainline seqlock.h<=>sched.h 'current->' task_struct circular dependency: https://lkml.kernel.org/r/linuxppc-dev/20200623083645.277342...@infradead.org Changelog-v3: - Re-add lockdep non-preemptibility checks on seqcount_t write paths. They were removed from v2 due to the circular dependencies mentioned. - Slight rebase over the new v5.8-rc1 KCSAN seqlock.h changes - Collect seqcount_t call-sites acked-by tags Thanks, 8<------ Ahmed S. Darwish (20): Documentation: locking: Describe seqlock design and usage seqlock: Properly format kernel-doc code samples seqlock: Add missing kernel-doc annotations lockdep: Add preemption enabled/disabled assertion APIs seqlock: lockdep assert non-preemptibility on seqcount_t write seqlock: Extend seqcount API with associated locks dma-buf: Remove custom seqcount lockdep class key dma-buf: Use sequence counter with associated wound/wait mutex sched: tasks: Use sequence counter with associated spinlock netfilter: conntrack: Use sequence counter with associated spinlock netfilter: nft_set_rbtree: Use sequence counter with associated rwlock xfrm: policy: Use sequence counters with associated lock timekeeping: Use sequence counter with associated raw spinlock vfs: Use sequence counter with associated spinlock raid5: Use sequence counter with associated spinlock iocost: Use sequence counter with associated spinlock NFSv4: Use sequence counter with associated spinlock userfaultfd: Use sequence counter with associated spinlock kvm/eventfd: Use sequence counter with associated spinlock hrtimer: Use sequence counter with associated raw spinlock Documentation/locking/index.rst | 1 + Documentation/locking/seqlock.rst | 242 + MAINTAINERS | 2 +- block/blk-iocost.c| 5 +- drivers/dma-buf/dma-resv.c| 15 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 - drivers/md/raid5.c| 2 +- drivers/md/raid5.h| 2 +- fs/dcache.c | 2 +- fs/fs_struct.c| 4 +- fs/nfs/nfs4_fs.h | 2 +- fs/nfs/nfs4state.c| 2 +- fs/userfaultfd.c | 4 +- include/linux/dcache.h| 2 +- include/linux/dma-resv.h | 4 +- include/linux/fs_struct.h | 2 +- include/linux/hrtimer.h | 2 +- include/linux/kvm_irqfd.h | 2 +- include/linux/lockdep.h | 18 + include/linux/sched.h | 2 +- include/linux/seqlock.h | 872 ++ include/linux/seqlock_types_internal.h| 186 include/net/netfilter/nf_conntrack.h | 2 +- init/init_task.c | 3 +- kernel/fork.c | 2 +- kernel/time/hrtimer.c | 13 +- kernel/time/timekeeping.c | 19 +- lib/Kconfig.debug | 1 + net/netfilter/nf_conntrack_core.c | 5 +- net/netfilter/nft_set_rbtree.c| 4 +- net/xfrm/xfrm_policy.c| 10 +- virt/kvm/eventfd.c| 2 +- 32 files changed, 1211 insertions(+), 225 deletions(-) create mode 100644 Documentation/locking/seqlock.rst create mode 100644 include/linux/seqlock_types_internal.h base-commit: 997e89fa345e9006f311cf9f9c8fd9f7d96c240f -- 2.20.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v3 07/20] dma-buf: Remove custom seqcount lockdep class key
Commit 3c3b177a9369 ("reservation: add support for read-only access using rcu") introduced a sequence counter to manage updates to reservations. Back then, the reservation object initializer reservation_object_init() was always inlined. Having the sequence counter initialization inlined meant that each of the call sites would have a different lockdep class key, which would've broken lockdep's deadlock detection. The aforementioned commit thus introduced, and exported, a custom seqcount lockdep class key and name. The commit 8735f16803f00 ("dma-buf: cleanup reservation_object_init...") transformed the reservation object initializer to a normal non-inlined C function. seqcount_init(), which automatically defines the seqcount lockdep class key and must be called non-inlined, can now be safely used. Remove the seqcount custom lockdep class key, name, and export. Use seqcount_init() inside the dma reservation object initializer. Signed-off-by: Ahmed S. Darwish Reviewed-by: Sebastian Andrzej Siewior Acked-by: Daniel Vetter --- drivers/dma-buf/dma-resv.c | 9 + include/linux/dma-resv.h | 2 -- 2 files changed, 1 insertion(+), 10 deletions(-) diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index b45f8514dc82..15efa0c2dacb 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -51,12 +51,6 @@ DEFINE_WD_CLASS(reservation_ww_class); EXPORT_SYMBOL(reservation_ww_class); -struct lock_class_key reservation_seqcount_class; -EXPORT_SYMBOL(reservation_seqcount_class); - -const char reservation_seqcount_string[] = "reservation_seqcount"; -EXPORT_SYMBOL(reservation_seqcount_string); - /** * dma_resv_list_alloc - allocate fence list * @shared_max: number of fences we need space for @@ -135,9 +129,8 @@ subsys_initcall(dma_resv_lockdep); void dma_resv_init(struct dma_resv *obj) { ww_mutex_init(>lock, _ww_class); + seqcount_init(>seq); - __seqcount_init(>seq, reservation_seqcount_string, - _seqcount_class); RCU_INIT_POINTER(obj->fence, NULL); RCU_INIT_POINTER(obj->fence_excl, NULL); } diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index ee50d10f052b..a6538ae7d93f 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -46,8 +46,6 @@ #include extern struct ww_class reservation_ww_class; -extern struct lock_class_key reservation_seqcount_class; -extern const char reservation_seqcount_string[]; /** * struct dma_resv_list - a list of shared fences -- 2.20.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v3 08/20] dma-buf: Use sequence counter with associated wound/wait mutex
A sequence counter write side critical section must be protected by some form of locking to serialize writers. If the serialization primitive is not disabling preemption implicitly, preemption has to be explicitly disabled before entering the sequence counter write side critical section. The dma-buf reservation subsystem uses plain sequence counters to manage updates to reservations. Writer serialization is accomplished through a wound/wait mutex. Acquiring a wound/wait mutex does not disable preemption, so this needs to be done manually before and after the write side critical section. Use the newly-added seqcount_ww_mutex_t instead: - It associates the ww_mutex with the sequence count, which enables lockdep to validate that the write side critical section is properly serialized. - It removes the need to explicitly add preempt_disable/enable() around the write side critical section because the write_begin/end() functions for this new data type automatically do this. If lockdep is disabled this ww_mutex lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish Acked-by: Daniel Vetter --- drivers/dma-buf/dma-resv.c | 8 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 -- include/linux/dma-resv.h | 2 +- 3 files changed, 2 insertions(+), 10 deletions(-) diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 15efa0c2dacb..a7631352a486 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -129,7 +129,7 @@ subsys_initcall(dma_resv_lockdep); void dma_resv_init(struct dma_resv *obj) { ww_mutex_init(>lock, _ww_class); - seqcount_init(>seq); + seqcount_ww_mutex_init(>seq, >lock); RCU_INIT_POINTER(obj->fence, NULL); RCU_INIT_POINTER(obj->fence_excl, NULL); @@ -260,7 +260,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence) fobj = dma_resv_get_list(obj); count = fobj->shared_count; - preempt_disable(); write_seqcount_begin(>seq); for (i = 0; i < count; ++i) { @@ -282,7 +281,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence) smp_store_mb(fobj->shared_count, count); write_seqcount_end(>seq); - preempt_enable(); dma_fence_put(old); } EXPORT_SYMBOL(dma_resv_add_shared_fence); @@ -309,14 +307,12 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence) if (fence) dma_fence_get(fence); - preempt_disable(); write_seqcount_begin(>seq); /* write_seqcount_begin provides the necessary memory barrier */ RCU_INIT_POINTER(obj->fence_excl, fence); if (old) old->shared_count = 0; write_seqcount_end(>seq); - preempt_enable(); /* inplace update, no shared fences */ while (i--) @@ -394,13 +390,11 @@ int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src) src_list = dma_resv_get_list(dst); old = dma_resv_get_excl(dst); - preempt_disable(); write_seqcount_begin(>seq); /* write_seqcount_begin provides the necessary memory barrier */ RCU_INIT_POINTER(dst->fence_excl, new); RCU_INIT_POINTER(dst->fence, dst_list); write_seqcount_end(>seq); - preempt_enable(); dma_resv_list_free(src_list); dma_fence_put(old); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index b91b5171270f..ff4b583cb96a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -258,11 +258,9 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo, new->shared_count = k; /* Install the new fence list, seqcount provides the barriers */ - preempt_disable(); write_seqcount_begin(>seq); RCU_INIT_POINTER(resv->fence, new); write_seqcount_end(>seq); - preempt_enable(); /* Drop the references to the removed fences or move them to ef_list */ for (i = j, k = 0; i < old->shared_count; ++i) { diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index a6538ae7d93f..d44a77e8a7e3 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -69,7 +69,7 @@ struct dma_resv_list { */ struct dma_resv { struct ww_mutex lock; - seqcount_t seq; + seqcount_ww_mutex_t seq; struct dma_fence __rcu *fence_excl; struct dma_resv_list __rcu *fence; -- 2.20.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v2 06/18] dma-buf: Use sequence counter with associated wound/wait mutex
A sequence counter write side critical section must be protected by some form of locking to serialize writers. If the serialization primitive is not disabling preemption implicitly, preemption has to be explicitly disabled before entering the sequence counter write side critical section. The dma-buf reservation subsystem uses plain sequence counters to manage updates to reservations. Writer serialization is accomplished through a wound/wait mutex. Acquiring a wound/wait mutex does not disable preemption, so this needs to be done manually before and after the write side critical section. Use the newly-added seqcount_ww_mutex_t instead: - It associates the ww_mutex with the sequence count, which enables lockdep to validate that the write side critical section is properly serialized. - It removes the need to explicitly add preempt_disable/enable() around the write side critical section because the write_begin/end() functions for this new data type automatically do this. If lockdep is disabled this ww_mutex lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish --- drivers/dma-buf/dma-resv.c | 8 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 -- include/linux/dma-resv.h | 2 +- 3 files changed, 2 insertions(+), 10 deletions(-) diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 590ce7ad60a0..3aba2b2bfc48 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -128,7 +128,7 @@ subsys_initcall(dma_resv_lockdep); void dma_resv_init(struct dma_resv *obj) { ww_mutex_init(>lock, _ww_class); - seqcount_init(>seq); + seqcount_ww_mutex_init(>seq, >lock); RCU_INIT_POINTER(obj->fence, NULL); RCU_INIT_POINTER(obj->fence_excl, NULL); @@ -259,7 +259,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence) fobj = dma_resv_get_list(obj); count = fobj->shared_count; - preempt_disable(); write_seqcount_begin(>seq); for (i = 0; i < count; ++i) { @@ -281,7 +280,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence) smp_store_mb(fobj->shared_count, count); write_seqcount_end(>seq); - preempt_enable(); dma_fence_put(old); } EXPORT_SYMBOL(dma_resv_add_shared_fence); @@ -308,14 +306,12 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence) if (fence) dma_fence_get(fence); - preempt_disable(); write_seqcount_begin(>seq); /* write_seqcount_begin provides the necessary memory barrier */ RCU_INIT_POINTER(obj->fence_excl, fence); if (old) old->shared_count = 0; write_seqcount_end(>seq); - preempt_enable(); /* inplace update, no shared fences */ while (i--) @@ -393,13 +389,11 @@ int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src) src_list = dma_resv_get_list(dst); old = dma_resv_get_excl(dst); - preempt_disable(); write_seqcount_begin(>seq); /* write_seqcount_begin provides the necessary memory barrier */ RCU_INIT_POINTER(dst->fence_excl, new); RCU_INIT_POINTER(dst->fence, dst_list); write_seqcount_end(>seq); - preempt_enable(); dma_resv_list_free(src_list); dma_fence_put(old); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 6a5b91d23fd9..c71c0bb6ce26 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -258,11 +258,9 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo, new->shared_count = k; /* Install the new fence list, seqcount provides the barriers */ - preempt_disable(); write_seqcount_begin(>seq); RCU_INIT_POINTER(resv->fence, new); write_seqcount_end(>seq); - preempt_enable(); /* Drop the references to the removed fences or move them to ef_list */ for (i = j, k = 0; i < old->shared_count; ++i) { diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index a6538ae7d93f..d44a77e8a7e3 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -69,7 +69,7 @@ struct dma_resv_list { */ struct dma_resv { struct ww_mutex lock; - seqcount_t seq; + seqcount_ww_mutex_t seq; struct dma_fence __rcu *fence_excl; struct dma_resv_list __rcu *fence; -- 2.20.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v2 00/18] seqlock: Extend seqcount API with associated locks
Hi, This is v2 of the seqlock patch series: [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks https://lore.kernel.org/lkml/20200519214547.352050-1-a.darw...@linutronix.de Patches 1=>3 of this v2 series add documentation for the existing seqlock.h datatypes and APIs. Hopefully they can hit v5.8 -rc2 or -rc3. Changelog-v2 1. Drop, for now, the seqlock v1 patches #7 and #8. These patches added lockdep non-preemptibility checks to seqcount_t write paths, but they now depend on on-going work by Peter: [PATCH v3 0/5] lockdep: Change IRQ state tracking to use per-cpu variables https://lkml.kernel.org/r/20200529213550.683440...@infradead.org [PATCH 00/14] x86/entry: disallow #DB more and x86/entry lockdep/nmi https://lkml.kernel.org/r/20200529212728.795169...@infradead.org Once Peter's work get merged, I'll send the non-preemptibility checks as a separate series. 2. Drop the v1 seqcount_t call-sites bugfixes. I've already posted them in an isolated series. They got merged into their respective trees, and will hit v5.8-rc1 soon: [PATCH v2 0/6] seqlock: seqcount_t call sites bugfixes https://lore.kernel.org/lkml/20200603144949.1122421-1-a.darw...@linutronix.de 3. Patch #1: Add a small paragraph explaining that seqcount_t/seqlock_t cannot be used if the protected data contains pointers. A similar paragraph already existed in seqlock.h, but got mistakenly dropped. 4. Patch #2: Don't add RST directives inside kernel-doc comments. Peter doesn't like them :) I've kept the indentation though, and found a minimal way for Sphinx to properly render these code samples without too much disruption. 5. Patch #3: Brush up the introduced kernel-doc comments. Make them more consistent overall, and more concise. Thanks, 8<-- Ahmed S. Darwish (18): Documentation: locking: Describe seqlock design and usage seqlock: Properly format kernel-doc code samples seqlock: Add missing kernel-doc annotations seqlock: Extend seqcount API with associated locks dma-buf: Remove custom seqcount lockdep class key dma-buf: Use sequence counter with associated wound/wait mutex sched: tasks: Use sequence counter with associated spinlock netfilter: conntrack: Use sequence counter with associated spinlock netfilter: nft_set_rbtree: Use sequence counter with associated rwlock xfrm: policy: Use sequence counters with associated lock timekeeping: Use sequence counter with associated raw spinlock vfs: Use sequence counter with associated spinlock raid5: Use sequence counter with associated spinlock iocost: Use sequence counter with associated spinlock NFSv4: Use sequence counter with associated spinlock userfaultfd: Use sequence counter with associated spinlock kvm/eventfd: Use sequence counter with associated spinlock hrtimer: Use sequence counter with associated raw spinlock Documentation/locking/index.rst | 1 + Documentation/locking/seqlock.rst | 242 + MAINTAINERS | 2 +- block/blk-iocost.c| 5 +- drivers/dma-buf/dma-resv.c| 15 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 - drivers/md/raid5.c| 2 +- drivers/md/raid5.h| 2 +- fs/dcache.c | 2 +- fs/fs_struct.c| 4 +- fs/nfs/nfs4_fs.h | 2 +- fs/nfs/nfs4state.c| 2 +- fs/userfaultfd.c | 4 +- include/linux/dcache.h| 2 +- include/linux/dma-resv.h | 4 +- include/linux/fs_struct.h | 2 +- include/linux/hrtimer.h | 2 +- include/linux/kvm_irqfd.h | 2 +- include/linux/sched.h | 2 +- include/linux/seqlock.h | 855 ++ include/linux/seqlock_types_internal.h| 187 include/net/netfilter/nf_conntrack.h | 2 +- init/init_task.c | 3 +- kernel/fork.c | 2 +- kernel/time/hrtimer.c | 13 +- kernel/time/timekeeping.c | 19 +- net/netfilter/nf_conntrack_core.c | 5 +- net/netfilter/nft_set_rbtree.c| 4 +- net/xfrm/xfrm_policy.c| 10 +- virt/kvm/eventfd.c| 2 +- 30 files changed, 1175 insertions(+), 226 deletions(-) create mode 100644 Documentation/locking/seqlock.rst create mode 100644 include/linux/seqlock_types_internal.h base-commit: 3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162 -- 2.20.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v2 05/18] dma-buf: Remove custom seqcount lockdep class key
Commit 3c3b177a9369 ("reservation: add support for read-only access using rcu") introduced a sequence counter to manage updates to reservations. Back then, the reservation object initializer reservation_object_init() was always inlined. Having the sequence counter initialization inlined meant that each of the call sites would have a different lockdep class key, which would've broken lockdep's deadlock detection. The aforementioned commit thus introduced, and exported, a custom seqcount lockdep class key and name. The commit 8735f16803f00 ("dma-buf: cleanup reservation_object_init...") transformed the reservation object initializer to a normal non-inlined C function. seqcount_init(), which automatically defines the seqcount lockdep class key and must be called non-inlined, can now be safely used. Remove the seqcount custom lockdep class key, name, and export. Use seqcount_init() inside the dma reservation object initializer. Signed-off-by: Ahmed S. Darwish Reviewed-by: Sebastian Andrzej Siewior Acked-by: Daniel Vetter --- drivers/dma-buf/dma-resv.c | 9 + include/linux/dma-resv.h | 2 -- 2 files changed, 1 insertion(+), 10 deletions(-) diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 4264e64788c4..590ce7ad60a0 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -50,12 +50,6 @@ DEFINE_WD_CLASS(reservation_ww_class); EXPORT_SYMBOL(reservation_ww_class); -struct lock_class_key reservation_seqcount_class; -EXPORT_SYMBOL(reservation_seqcount_class); - -const char reservation_seqcount_string[] = "reservation_seqcount"; -EXPORT_SYMBOL(reservation_seqcount_string); - /** * dma_resv_list_alloc - allocate fence list * @shared_max: number of fences we need space for @@ -134,9 +128,8 @@ subsys_initcall(dma_resv_lockdep); void dma_resv_init(struct dma_resv *obj) { ww_mutex_init(>lock, _ww_class); + seqcount_init(>seq); - __seqcount_init(>seq, reservation_seqcount_string, - _seqcount_class); RCU_INIT_POINTER(obj->fence, NULL); RCU_INIT_POINTER(obj->fence_excl, NULL); } diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index ee50d10f052b..a6538ae7d93f 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -46,8 +46,6 @@ #include extern struct ww_class reservation_ww_class; -extern struct lock_class_key reservation_seqcount_class; -extern const char reservation_seqcount_string[]; /** * struct dma_resv_list - a list of shared fences -- 2.20.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v2 0/6] seqlock: seqcount_t call sites bugfixes
On Thu, Jun 04, 2020 at 09:28:41AM +0200, Daniel Vetter wrote: > On Wed, Jun 03, 2020 at 04:49:43PM +0200, Ahmed S. Darwish wrote: > > Hi, > > > > Since patch #7 and #8 from the series: > > > >[PATCH v1 00/25] seqlock: Extend seqcount API with associated locks > > > > https://lore.kernel.org/lkml/20200519214547.352050-1-a.darw...@linutronix.de > > > > are now pending on the lockdep/x86 IRQ state tracking patch series: > > > >[PATCH 00/14] x86/entry: disallow #DB more and x86/entry lockdep/nmi > >https://lkml.kernel.org/r/20200529212728.795169...@infradead.org > > > >[PATCH v3 0/5] lockdep: Change IRQ state tracking to use per-cpu > > variables > >https://lkml.kernel.org/r/20200529213550.683440...@infradead.org > > > > This is a repost only of the seqcount_t call sites bugfixes that were on > > top of the seqlock patch series. > > > > These fixes are independent, and can thus be merged on their own. I'm > > reposting them now so they can at least hit -rc2 or -rc3. > > I'm confused on what I should do with patch 6 here for dma-buf. Looks like > just a good cleanup/prep work, so I'd queue it for linux-next and 5.9, but > sounds like you want this in earlier. Do you need this in 5.8-rc for some > work meant for 5.9? Will this go in through some topic branch directly? > Should I apply it? > > Patch itself lgtm, I'm just confused what I should do with it. > My apologies for the confusion. The cover letter is indeed misleading w.r.t. the dma-buf patch. It isn't a bugfix, so it shouldn't hit -rc. Since without this patch compiling the seqcount series will fail, it will be best to merge it through tip instead. So all I need for now is a reviewed-by tag :) I will forwoard it to the tip tree afterwards. Thanks, -- Ahmed S. Darwish Linutronix GmbH ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v2 0/6] seqlock: seqcount_t call sites bugfixes
Hi, Since patch #7 and #8 from the series: [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks https://lore.kernel.org/lkml/20200519214547.352050-1-a.darw...@linutronix.de are now pending on the lockdep/x86 IRQ state tracking patch series: [PATCH 00/14] x86/entry: disallow #DB more and x86/entry lockdep/nmi https://lkml.kernel.org/r/20200529212728.795169...@infradead.org [PATCH v3 0/5] lockdep: Change IRQ state tracking to use per-cpu variables https://lkml.kernel.org/r/20200529213550.683440...@infradead.org This is a repost only of the seqcount_t call sites bugfixes that were on top of the seqlock patch series. These fixes are independent, and can thus be merged on their own. I'm reposting them now so they can at least hit -rc2 or -rc3. Changelog-v2: - patch #1: Add a missing up_read() on netdev_get_name() error path exit. Thanks to Dan/kbuild-bot report. - patch #4: new patch, invalid preemptible context found by the new lockdep checks added in the seqlock series + kbuild-bot. Thanks, 8<-- Ahmed S. Darwish (6): net: core: device_rename: Use rwsem instead of a seqcount net: phy: fixed_phy: Remove unused seqcount u64_stats: Document writer non-preemptibility requirement net: mdiobus: Disable preemption upon u64_stats update block: nr_sects_write(): Disable preemption on seqcount write dma-buf: Remove custom seqcount lockdep class key block/blk.h| 2 ++ drivers/dma-buf/dma-resv.c | 9 +-- drivers/net/phy/fixed_phy.c| 26 drivers/net/phy/mdio_bus.c | 2 ++ include/linux/dma-resv.h | 2 -- include/linux/u64_stats_sync.h | 43 ++ net/core/dev.c | 40 ++- 7 files changed, 56 insertions(+), 68 deletions(-) base-commit: 3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162 -- 2.20.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v2 6/6] dma-buf: Remove custom seqcount lockdep class key
Commit 3c3b177a9369 ("reservation: add support for read-only access using rcu") introduced a sequence counter to manage updates to reservations. Back then, the reservation object initializer reservation_object_init() was always inlined. Having the sequence counter initialization inlined meant that each of the call sites would have a different lockdep class key, which would've broken lockdep's deadlock detection. The aforementioned commit thus introduced, and exported, a custom seqcount lockdep class key and name. The commit 8735f16803f00 ("dma-buf: cleanup reservation_object_init...") transformed the reservation object initializer to a normal non-inlined C function. seqcount_init(), which automatically defines the seqcount lockdep class key and must be called non-inlined, can now be safely used. Remove the seqcount custom lockdep class key, name, and export. Use seqcount_init() inside the dma reservation object initializer. Signed-off-by: Ahmed S. Darwish Reviewed-by: Sebastian Andrzej Siewior --- drivers/dma-buf/dma-resv.c | 9 + include/linux/dma-resv.h | 2 -- 2 files changed, 1 insertion(+), 10 deletions(-) diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 4264e64788c4..590ce7ad60a0 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -50,12 +50,6 @@ DEFINE_WD_CLASS(reservation_ww_class); EXPORT_SYMBOL(reservation_ww_class); -struct lock_class_key reservation_seqcount_class; -EXPORT_SYMBOL(reservation_seqcount_class); - -const char reservation_seqcount_string[] = "reservation_seqcount"; -EXPORT_SYMBOL(reservation_seqcount_string); - /** * dma_resv_list_alloc - allocate fence list * @shared_max: number of fences we need space for @@ -134,9 +128,8 @@ subsys_initcall(dma_resv_lockdep); void dma_resv_init(struct dma_resv *obj) { ww_mutex_init(>lock, _ww_class); + seqcount_init(>seq); - __seqcount_init(>seq, reservation_seqcount_string, - _seqcount_class); RCU_INIT_POINTER(obj->fence, NULL); RCU_INIT_POINTER(obj->fence_excl, NULL); } diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index ee50d10f052b..a6538ae7d93f 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -46,8 +46,6 @@ #include extern struct ww_class reservation_ww_class; -extern struct lock_class_key reservation_seqcount_class; -extern const char reservation_seqcount_string[]; /** * struct dma_resv_list - a list of shared fences -- 2.20.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v1 13/25] dma-buf: Use sequence counter with associated wound/wait mutex
On Wed, May 20, 2020, Christian König wrote: > Am 19.05.20 um 23:45 schrieb Ahmed S. Darwish: > > A sequence counter write side critical section must be protected by some > > form of locking to serialize writers. If the serialization primitive is > > not disabling preemption implicitly, preemption has to be explicitly > > disabled before entering the sequence counter write side critical > > section. > > > > The dma-buf reservation subsystem uses plain sequence counters to manage > > updates to reservations. Writer serialization is accomplished through a > > wound/wait mutex. > > > > Acquiring a wound/wait mutex does not disable preemption, so this needs > > to be done manually before and after the write side critical section. > > > > Use the newly-added seqcount_ww_mutex_t instead: > > > >- It associates the ww_mutex with the sequence count, which enables > > lockdep to validate that the write side critical section is properly > > serialized. > > > >- It removes the need to explicitly add preempt_disable/enable() > > around the write side critical section because the write_begin/end() > > functions for this new data type automatically do this. > > > > If lockdep is disabled this ww_mutex lock association is compiled out > > and has neither storage size nor runtime overhead. > > Mhm, is the dma_resv object the only user of this new seqcount_ww_mutex > variant ? > > If yes we are trying to get rid of this sequence counter for quite some > time, so I would rather invest the additional time to finish this. > In this patch series, each extra "seqcount with associated lock" data type costs us, exactly: - 1 typedef definition, seqcount_ww_mutex_t - 1 static initializer, SEQCNT_WW_MUTEX_ZERO() - 1 runtime initializer, seqcount_ww_mutex_init() Definitions for the typedef and the 2 initializers above are template-code one liners. The logic which automatically disables preemption upon entering a seqcount_ww_mutex_t write side critical section is also already shared with seqcount_mutex_t and any future, preemptible, associated lock. So, yes, dma-resv is the only user of seqcount_ww_mutex. But even in that case, given the one liner template code nature of seqcount_ww_mutex_t logic, it does not make sense to block the dma_resv and amdgpu change until at some point in the future the sequence counter is completely removed. **If and when** the sequence counter gets removed, please just remove the seqcount_ww_mutex_t data type with it. It will be extremely simple. > Regards, > Christian. > Thanks, -- Ahmed S. Darwish Linutronix GmbH ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v1 00/25] seqlock: Extend seqcount API with associated locks
Hi, A sequence counter write side critical section must be protected by some form of locking to serialize writers. If the serialization primitive is not disabling preemption implicitly, preemption has to be explicitly disabled before entering the write side critical section. There is no built-in debugging mechanism to verify that the lock used for writer serialization is held and preemption is disabled. Some usage sites like dma-buf have explicit lockdep checks for the writer-side lock, but this covers only a small portion of the sequence counter usage in the kernel. Add new sequence counter types which allows to associate a lock to the sequence counter at initialization time. The seqcount API functions are extended to provide appropriate lockdep assertions depending on the seqcount/lock type. For sequence counters with associated locks that do not implicitly disable preemption, preemption protection is enforced in the sequence counter write side functions. This removes the need to explicitly add preempt_disable/enable() around the write side critical sections: the write_begin/end() functions for these new sequence counter types automatically do this. Extend the lockdep API with a macro asserting that preemption is disabled. Use it to verify that preemption is disabled for all sequence counters write side critical sections. If lockdep is disabled, these lock associations and non-preemptibility checks are compiled out and have neither storage size nor runtime overhead. If lockdep is enabled, a pointer to the lock is stored in the seqcount and the write side API functions enable lockdep assertions. The following seqcount types with associated locks are introduced: seqcount_spinlock_t seqcount_raw_spinlock_t seqcount_rwlock_t seqcount_mutex_t seqcount_ww_mutex_t This lock association is not only useful for debugging purposes, it also provides a mechanism for PREEMPT_RT to prevent writer starvation. On RT kernels spinlocks and rwlocks are substituted with sleeping locks and the code sections protected by these locks become preemptible, which has the same problem as write side critical section with preemption enabled on a non-RT kernel. RT utilizes this association by storing the provided lock pointer and in case that a reader sees an active writer (seqcount is odd), it does not spin, but blocks on the associated lock similar to read_seqbegin_or_lock(). By using the lockdep debugging mechanisms added in this patch series, a number of erroneous seqcount call-sites were discovered across the kernel. The fixes are included at the beginning of the series. Thanks, 8<-- Ahmed S. Darwish (25): net: core: device_rename: Use rwsem instead of a seqcount mm/swap: Don't abuse the seqcount latching API net: phy: fixed_phy: Remove unused seqcount block: nr_sects_write(): Disable preemption on seqcount write u64_stats: Document writer non-preemptibility requirement dma-buf: Remove custom seqcount lockdep class key lockdep: Add preemption disabled assertion API seqlock: lockdep assert non-preemptibility on seqcount_t write Documentation: locking: Describe seqlock design and usage seqlock: Add RST directives to kernel-doc code samples and notes seqlock: Add missing kernel-doc annotations seqlock: Extend seqcount API with associated locks dma-buf: Use sequence counter with associated wound/wait mutex sched: tasks: Use sequence counter with associated spinlock netfilter: conntrack: Use sequence counter with associated spinlock netfilter: nft_set_rbtree: Use sequence counter with associated rwlock xfrm: policy: Use sequence counters with associated lock timekeeping: Use sequence counter with associated raw spinlock vfs: Use sequence counter with associated spinlock raid5: Use sequence counter with associated spinlock iocost: Use sequence counter with associated spinlock NFSv4: Use sequence counter with associated spinlock userfaultfd: Use sequence counter with associated spinlock kvm/eventfd: Use sequence counter with associated spinlock hrtimer: Use sequence counter with associated raw spinlock Documentation/locking/index.rst | 1 + Documentation/locking/seqlock.rst | 239 + MAINTAINERS | 2 +- block/blk-iocost.c| 5 +- block/blk.h | 2 + drivers/dma-buf/dma-resv.c| 15 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 - drivers/md/raid5.c| 2 +- drivers/md/raid5.h| 2 +- drivers/net/phy/fixed_phy.c | 25 +- fs/dcache.c | 2 +- fs/fs_struct.c| 4 +- fs/nfs/nfs4_fs.h | 2 +- fs/nfs/nfs4state.c| 2 +- fs/userfaultfd.c | 4 +- incl
[PATCH v1 06/25] dma-buf: Remove custom seqcount lockdep class key
Commit 3c3b177a9369 ("reservation: add support for read-only access using rcu") introduced a sequence counter to manage updates to reservations. Back then, the reservation object initializer reservation_object_init() was always inlined. Having the sequence counter initialization inlined meant that each of the call sites would have a different lockdep class key, which would've broken lockdep's deadlock detection. The aforementioned commit thus introduced, and exported, a custom seqcount lockdep class key and name. The commit 8735f16803f00 ("dma-buf: cleanup reservation_object_init...") transformed the reservation object initializer to a normal non-inlined C function. seqcount_init(), which automatically defines the seqcount lockdep class key and must be called non-inlined, can now be safely used. Remove the seqcount custom lockdep class key, name, and export. Use seqcount_init() inside the dma reservation object initializer. Signed-off-by: Ahmed S. Darwish Reviewed-by: Sebastian Andrzej Siewior --- drivers/dma-buf/dma-resv.c | 9 + include/linux/dma-resv.h | 2 -- 2 files changed, 1 insertion(+), 10 deletions(-) diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 4264e64788c4..590ce7ad60a0 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -50,12 +50,6 @@ DEFINE_WD_CLASS(reservation_ww_class); EXPORT_SYMBOL(reservation_ww_class); -struct lock_class_key reservation_seqcount_class; -EXPORT_SYMBOL(reservation_seqcount_class); - -const char reservation_seqcount_string[] = "reservation_seqcount"; -EXPORT_SYMBOL(reservation_seqcount_string); - /** * dma_resv_list_alloc - allocate fence list * @shared_max: number of fences we need space for @@ -134,9 +128,8 @@ subsys_initcall(dma_resv_lockdep); void dma_resv_init(struct dma_resv *obj) { ww_mutex_init(>lock, _ww_class); + seqcount_init(>seq); - __seqcount_init(>seq, reservation_seqcount_string, - _seqcount_class); RCU_INIT_POINTER(obj->fence, NULL); RCU_INIT_POINTER(obj->fence_excl, NULL); } diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index ee50d10f052b..a6538ae7d93f 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -46,8 +46,6 @@ #include extern struct ww_class reservation_ww_class; -extern struct lock_class_key reservation_seqcount_class; -extern const char reservation_seqcount_string[]; /** * struct dma_resv_list - a list of shared fences -- 2.20.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v1 13/25] dma-buf: Use sequence counter with associated wound/wait mutex
A sequence counter write side critical section must be protected by some form of locking to serialize writers. If the serialization primitive is not disabling preemption implicitly, preemption has to be explicitly disabled before entering the sequence counter write side critical section. The dma-buf reservation subsystem uses plain sequence counters to manage updates to reservations. Writer serialization is accomplished through a wound/wait mutex. Acquiring a wound/wait mutex does not disable preemption, so this needs to be done manually before and after the write side critical section. Use the newly-added seqcount_ww_mutex_t instead: - It associates the ww_mutex with the sequence count, which enables lockdep to validate that the write side critical section is properly serialized. - It removes the need to explicitly add preempt_disable/enable() around the write side critical section because the write_begin/end() functions for this new data type automatically do this. If lockdep is disabled this ww_mutex lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish --- drivers/dma-buf/dma-resv.c | 8 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 -- include/linux/dma-resv.h | 2 +- 3 files changed, 2 insertions(+), 10 deletions(-) diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 590ce7ad60a0..3aba2b2bfc48 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -128,7 +128,7 @@ subsys_initcall(dma_resv_lockdep); void dma_resv_init(struct dma_resv *obj) { ww_mutex_init(>lock, _ww_class); - seqcount_init(>seq); + seqcount_ww_mutex_init(>seq, >lock); RCU_INIT_POINTER(obj->fence, NULL); RCU_INIT_POINTER(obj->fence_excl, NULL); @@ -259,7 +259,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence) fobj = dma_resv_get_list(obj); count = fobj->shared_count; - preempt_disable(); write_seqcount_begin(>seq); for (i = 0; i < count; ++i) { @@ -281,7 +280,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence) smp_store_mb(fobj->shared_count, count); write_seqcount_end(>seq); - preempt_enable(); dma_fence_put(old); } EXPORT_SYMBOL(dma_resv_add_shared_fence); @@ -308,14 +306,12 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence) if (fence) dma_fence_get(fence); - preempt_disable(); write_seqcount_begin(>seq); /* write_seqcount_begin provides the necessary memory barrier */ RCU_INIT_POINTER(obj->fence_excl, fence); if (old) old->shared_count = 0; write_seqcount_end(>seq); - preempt_enable(); /* inplace update, no shared fences */ while (i--) @@ -393,13 +389,11 @@ int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src) src_list = dma_resv_get_list(dst); old = dma_resv_get_excl(dst); - preempt_disable(); write_seqcount_begin(>seq); /* write_seqcount_begin provides the necessary memory barrier */ RCU_INIT_POINTER(dst->fence_excl, new); RCU_INIT_POINTER(dst->fence, dst_list); write_seqcount_end(>seq); - preempt_enable(); dma_resv_list_free(src_list); dma_fence_put(old); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 9dff792c9290..87fd32aae8f9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -258,11 +258,9 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo, new->shared_count = k; /* Install the new fence list, seqcount provides the barriers */ - preempt_disable(); write_seqcount_begin(>seq); RCU_INIT_POINTER(resv->fence, new); write_seqcount_end(>seq); - preempt_enable(); /* Drop the references to the removed fences or move them to ef_list */ for (i = j, k = 0; i < old->shared_count; ++i) { diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index a6538ae7d93f..d44a77e8a7e3 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -69,7 +69,7 @@ struct dma_resv_list { */ struct dma_resv { struct ww_mutex lock; - seqcount_t seq; + seqcount_ww_mutex_t seq; struct dma_fence __rcu *fence_excl; struct dma_resv_list __rcu *fence; -- 2.20.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm: headers: Add neccessary include files and guards
Hi, On Wed, Mar 27, 2019 at 09:30:59AM +0100, Daniel Vetter wrote: > On Wed, Mar 27, 2019 at 12:23:43AM +0100, Ahmed S. Darwish wrote: > > Otherwise gcc will complain about unknown types, and declarations > > inside parameter lists, if "drm_internal.h" is used in C files with > > less headers than what's now typically done under drivers/gpu/drm/. > > > > Signed-off-by: Ahmed S. Darwish > > --- > > > > Notes: > > This was triggered by the in-development drm-panic code. > > > > drivers/gpu/drm/drm_crtc_helper_internal.h | 5 + > > drivers/gpu/drm/drm_crtc_internal.h| 5 + > > drivers/gpu/drm/drm_internal.h | 12 > > include/drm/drm_auth.h | 9 + > > 4 files changed, 31 insertions(+) > > > > diff --git a/drivers/gpu/drm/drm_crtc_helper_internal.h > > b/drivers/gpu/drm/drm_crtc_helper_internal.h > > index b5ac1581e623..ee8d8682db09 100644 > > --- a/drivers/gpu/drm/drm_crtc_helper_internal.h > > +++ b/drivers/gpu/drm/drm_crtc_helper_internal.h > > @@ -20,6 +20,9 @@ > > * OTHER DEALINGS IN THE SOFTWARE. > > */ > > > > +#ifndef _DRM_CRTC_HELPER_INTERNAL_H > > +#define _DRM_CRTC_HELPER_INTERNAL_H > > I'm kinda wondering how you managed to include these multiple times? > Oh, this was only added for completeness / consistency. The actual errors triggered are the missing declarations. > > + > > /* > > * This header file contains mode setting related functions and definitions > > * which are only used within the drm kms helper module as internal > > @@ -75,3 +78,5 @@ enum drm_mode_status drm_encoder_mode_valid(struct > > drm_encoder *encoder, > > const struct drm_display_mode > > *mode); > > enum drm_mode_status drm_connector_mode_valid(struct drm_connector > > *connector, > > struct drm_display_mode *mode); > > + > > +#endif /* _DRM_CRTC_HELPER_INTERNAL_H */ > > diff --git a/drivers/gpu/drm/drm_crtc_internal.h > > b/drivers/gpu/drm/drm_crtc_internal.h > > index 216f2a9ee3d4..840c1cb2eb7b 100644 > > --- a/drivers/gpu/drm/drm_crtc_internal.h > > +++ b/drivers/gpu/drm/drm_crtc_internal.h > > @@ -25,6 +25,9 @@ > > * OTHER DEALINGS IN THE SOFTWARE. > > */ > > > > +#ifndef _DRM_CRTC_INTERNAL_H > > +#define _DRM_CRTC_INTERNAL_H > > + > > /* > > * This header file contains mode setting related functions and definitions > > * which are only used within the drm module as internal implementation > > details > > @@ -252,3 +255,5 @@ int drm_mode_page_flip_ioctl(struct drm_device *dev, > > void drm_mode_fixup_1366x768(struct drm_display_mode *mode); > > void drm_reset_display_info(struct drm_connector *connector); > > u32 drm_add_display_info(struct drm_connector *connector, const struct > > edid *edid); > > + > > +#endif /* _DRM_CRTC_INTERNAL_H */ > > diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h > > index 251d67e04c2d..a1b68836b048 100644 > > --- a/drivers/gpu/drm/drm_internal.h > > +++ b/drivers/gpu/drm/drm_internal.h > > @@ -21,7 +21,17 @@ > > * OTHER DEALINGS IN THE SOFTWARE. > > */ > > > > +#ifndef _DRM_INTERNAL_H > > +#define _DRM_INTERNAL_H > > + > > +#include > > + > > +#include > > +#include > > +#include > > +#include > > #include > > +#include > > If you only need these for the structures used in pointers, then please > use forward declarations, don't pull in the entire thing. > > I'm also wondering whether we actually need drm_ioctl.h, or whether a few > forward decls would be good enough. > Oh, I see, forward decls then .. let's drop this. > > > > #define DRM_IF_MAJOR 1 > > #define DRM_IF_MINOR 4 > > @@ -191,3 +201,5 @@ int drm_syncobj_signal_ioctl(struct drm_device *dev, > > void *data, > > void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent, > > const struct drm_framebuffer *fb); > > int drm_framebuffer_debugfs_init(struct drm_minor *minor); > > + > > +#endif /* _DRM_INTERNAL_H */ > > diff --git a/include/drm/drm_auth.h b/include/drm/drm_auth.h > > index 86bff9841b54..a1a59fbf26b1 100644 > > --- a/include/drm/drm_auth.h > > +++ b/include/drm/drm_auth.h > > @@ -28,6 +28,15 @@ > > #ifndef _DRM_AUTH_H_ > > #define _DRM_AUTH_H_ > > > > +#include > > +#include > > +#include > > + > > +#include > > + > > +#include > > +#include > > Same comment about forward decls. Plus it would be good to also clean up > drm_auth.c, i.e. drop the drmP.h include and put the drm_auth.h include as > the very first one. To make sure the header is now working correctly > stand-alone. > will do. > -Daniel > thanks, --darwi ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH] drm: headers: Add neccessary include files and guards
Otherwise gcc will complain about unknown types, and declarations inside parameter lists, if "drm_internal.h" is used in C files with less headers than what's now typically done under drivers/gpu/drm/. Signed-off-by: Ahmed S. Darwish --- Notes: This was triggered by the in-development drm-panic code. drivers/gpu/drm/drm_crtc_helper_internal.h | 5 + drivers/gpu/drm/drm_crtc_internal.h| 5 + drivers/gpu/drm/drm_internal.h | 12 include/drm/drm_auth.h | 9 + 4 files changed, 31 insertions(+) diff --git a/drivers/gpu/drm/drm_crtc_helper_internal.h b/drivers/gpu/drm/drm_crtc_helper_internal.h index b5ac1581e623..ee8d8682db09 100644 --- a/drivers/gpu/drm/drm_crtc_helper_internal.h +++ b/drivers/gpu/drm/drm_crtc_helper_internal.h @@ -20,6 +20,9 @@ * OTHER DEALINGS IN THE SOFTWARE. */ +#ifndef _DRM_CRTC_HELPER_INTERNAL_H +#define _DRM_CRTC_HELPER_INTERNAL_H + /* * This header file contains mode setting related functions and definitions * which are only used within the drm kms helper module as internal @@ -75,3 +78,5 @@ enum drm_mode_status drm_encoder_mode_valid(struct drm_encoder *encoder, const struct drm_display_mode *mode); enum drm_mode_status drm_connector_mode_valid(struct drm_connector *connector, struct drm_display_mode *mode); + +#endif /* _DRM_CRTC_HELPER_INTERNAL_H */ diff --git a/drivers/gpu/drm/drm_crtc_internal.h b/drivers/gpu/drm/drm_crtc_internal.h index 216f2a9ee3d4..840c1cb2eb7b 100644 --- a/drivers/gpu/drm/drm_crtc_internal.h +++ b/drivers/gpu/drm/drm_crtc_internal.h @@ -25,6 +25,9 @@ * OTHER DEALINGS IN THE SOFTWARE. */ +#ifndef _DRM_CRTC_INTERNAL_H +#define _DRM_CRTC_INTERNAL_H + /* * This header file contains mode setting related functions and definitions * which are only used within the drm module as internal implementation details @@ -252,3 +255,5 @@ int drm_mode_page_flip_ioctl(struct drm_device *dev, void drm_mode_fixup_1366x768(struct drm_display_mode *mode); void drm_reset_display_info(struct drm_connector *connector); u32 drm_add_display_info(struct drm_connector *connector, const struct edid *edid); + +#endif /* _DRM_CRTC_INTERNAL_H */ diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h index 251d67e04c2d..a1b68836b048 100644 --- a/drivers/gpu/drm/drm_internal.h +++ b/drivers/gpu/drm/drm_internal.h @@ -21,7 +21,17 @@ * OTHER DEALINGS IN THE SOFTWARE. */ +#ifndef _DRM_INTERNAL_H +#define _DRM_INTERNAL_H + +#include + +#include +#include +#include +#include #include +#include #define DRM_IF_MAJOR 1 #define DRM_IF_MINOR 4 @@ -191,3 +201,5 @@ int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data, void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent, const struct drm_framebuffer *fb); int drm_framebuffer_debugfs_init(struct drm_minor *minor); + +#endif /* _DRM_INTERNAL_H */ diff --git a/include/drm/drm_auth.h b/include/drm/drm_auth.h index 86bff9841b54..a1a59fbf26b1 100644 --- a/include/drm/drm_auth.h +++ b/include/drm/drm_auth.h @@ -28,6 +28,15 @@ #ifndef _DRM_AUTH_H_ #define _DRM_AUTH_H_ +#include +#include +#include + +#include + +#include +#include + /* * Legacy DRI1 locking data structure. Only here instead of in drm_legacy.h for * include ordering reasons. -- darwi http://darwish.chasingpointers.com ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v2 0/3] drm: Add panic handling
=> Now that the dust has settled, here's a summary of this huge 50-email thread (thanks Daniel, Noralf, John, everyone!). => Parts of this document are a direct rewording of Daniel's replies, so I took the liberty of adding a Co-developed-by tag here.. => This is only a summary, and _not_ an official patch submission. It's now Show-me-the-code time ;-) Subject: [PATCH] Documentation: gpu: Add initial DRM panic design Co-developed-by: Daniel Vetter Signed-off-by: Ahmed S. Darwish --- Documentation/gpu/drm-panic-design.rst | 124 + 1 file changed, 124 insertions(+) create mode 100644 Documentation/gpu/drm-panic-design.rst diff --git a/Documentation/gpu/drm-panic-design.rst b/Documentation/gpu/drm-panic-design.rst new file mode 100644 index ..ba306193f347 --- /dev/null +++ b/Documentation/gpu/drm-panic-design.rst @@ -0,0 +1,124 @@ + + +A DRM-based panic viewer + + +The Linux Kernel currently contains the necessary plumbing for viewing +a kernel panic log using DRM-based kmsg dumpers, even if the system is +currently running a graphical session (e.g. wayland). + +.. _drm_panic_design: + +Implementation design += + +Code pathes running in a panic context face several constraints: + +1. Code must be fully synchronous: interrupts are disabled +2. Dynamic memory allocations are not allowed +3. Cannot acquire any mutexes: atomic context, due to 1. +4. Cannot acquire any spin locks: potential spinning-forever risk + +For the *DRM* panic code, the extra conditions below apply: + +5. Code must only trylock relevant DRM subsystem locks +6. If any trylock operation fails, the code *must not* go through +7. All rendering is done on the GPU's "current display buffer" used + at the time of panic(): no HW programming is done at all. +8. The code must be non-intrusive, and *must not* affect any other + panic handling mechanism (netconsole, ramoops, kexec, etc.) + +Rationale follows. + + +Spin locks +-- + +Acquiring a spin lock in a panic() context is potentially lethal: +the lock might've been already acquired, _permanently_, by another +core that is now fully shut down through an IPI from the panic()-ing +core. + +Moreover, at least on x86, the first target architecture for this +work, the panic()-ing core wait by default for *a full second* until +all other cores finish their non-preemptible regions and terminate. +If that did not work out, it even tries more aggressively with NMIs. + +So if the other non panic()-ing cores was holding a DRM-related lock +through spin_lock_irqsave() for more than one second, then it's a +bug in the DRM layer code. Thus, the DRM panic viewer cannot do +anything and should just exit. [A] + +What if the non panic()-ing core was holding a DRM lock through +barebone spin_lock()? Interrupts are enabled there, so the IPI will be +handled, and thus that core will effectively die while the lock is +*forever held*. [B] + + +Trylocking +-- + +The DRM panic code always *tries* to acquire the *minimum relevant +set* of DRM related locks, through the basic :c:func:`spin_trylock()` +mechanism. + +From case [A] and case [B] above, if the trylock operation fails, +there's no point in retrying it multiple times: the relevant locks +are in a broken and unrecoverable state anyway. + +Moreover, The panic code cannot also just ignore the DRM locks and +force its way through: a broken non-preemptible DRM region implies +either invalid SW state (e.g. broken linked list pointers), or a GPU +in an unknown HW state. + +A GPU in an unknown HW state is quite dangerous: it has access to the +full system memory, and if poked incorrectly, there's a really good +chance it can kill the entire machine. + + +GPU hardware access +--- + +In the current state, a full GPU reset, modesetting, or even disabling +GPU planes, is not doable under a panic() context: it implies going +through a potentially huge set of DRM call-chains that cannot be +sanely verified against the :ref:`drm_panic_design` requirements +(e.g. memory allocations, spinlocks, etc.). + +The current approach is simple: run the minimal amount of code +necessary to draw pixels onto the current scanout buffers. Instead +of disabling GPU planes, the biggest visible rectangle is just picked. + +*Usually* there should be a main window that is big enough to show the +oops. + + +CI testing +-- + +One of the things that killed earlier linux DRM panic handling efforts, +beside getting into deep DRM call-chains that cannot be verified, was +that it couldn't be tested except with real oopses. + +The first set of bug reports was whack-a-molde kind of bugs where the +oops displayed was caused by the DRM panic handler itself instead of +the real oops causing the panic. + +Thus, the :ref:`drm_panic_design` requirements was created. Moreover: + + - Special hooks are added at the spin_lock(
Re: [PATCH v2 1/3] drm: Add support for panic message output
Hi, On Wed, Mar 13, 2019 at 09:35:05AM +0100, Daniel Vetter wrote: > On Tue, Mar 12, 2019 at 11:13:03PM +0100, Ahmed S. Darwish wrote: > > On Mon, Mar 11, 2019 at 11:33:15PM +0100, Noralf Trønnes wrote: > > > Den 11.03.2019 20.23, skrev Daniel Vetter: [……] > > > > > > > > class_for_each_device uses klist, which only uses an irqsave spinlock. I > > > > think that's good enough. Comment to that effect would be good e.g. > > > > > > > > /* based on klist, which uses only a spin_lock_irqsave, which we > > > > * assume still works */ > > > > > > > > If we aim for perfect this should be a trylock still, maybe using our > > > > own > > > > device list. > > > > > > > > I definitely agree here. > > > > The lock may already be locked either by a stopped CPU, or by the > > very same CPU we execute panic() on (e.g. NMI panic() on the > > printing CPU). > > > > This is why it's very common for example in serial consoles, which > > are usually careful about re-entrance and panic contexts, to do: > > > > xx_console_write(...) { > > if (oops_in_progress) > > locked = spin_trylock_irqsave(>lock, flags); > > else > > spin_lock_irqsave(>lock, flags); > > } > > > > I'm quite positive we should do the same for panic drm drivers. > > Yeah Ideally all the locking in the drm path would be trylock only. > > I wonder whether lockdep could help us validate this, with some "don't > allow anything except trylocks in this context". It's easy to audit the > core code with review, but drivers are much tougher. And often end up with > really deep callchains to get at the backing buffers. [……] > > > > Instead what I had in mind is to recreate the worst > > > > possible panic context as much as feasible (disabling interrupts should > > > > be > > > > a good start, maybe we can even do an nmi callback), and then call our > > > > panic implementation. That way we can test the panic handler in a > > > > non-destructive way (i.e. aside from last dmesg lines printed to the > > > > screen nothing bad happens to the kernel: No real panic, no oops, no > > > > tainting). > > > > > > The interrupt case I can do, nmi I have no idea. > > > > > > > I agree too. Disabling interrupts + CONFIG_DEBUG_ATOMIC_SLEEP > > would be a nice non-destructive test-case emulation. > > See above, if we can somehow emulate "all locks are held, only allow > trylock" with lockdep that would be great too. > H, to prototype things faster, I'll do it as a hook: spin_lock(spinlock_t *lock) { might_spin_forever_under_panic_path(); ... } Just like how mutexes and DEBUG_ATOMIC_SLEEP do it today: void mutex_lock(struct mutex *lock) { might_sleep(); ... } Hopefully the locking maintainers can accept something like this. If not, let's play with lockdep (which is a bit complex). ===> Thanks to Sebastian for suggesting this! > Plus nmi context, once/if that somehow becomes relevant. > > The thing that killed the old drm panic handling code definitely was that > we flat out couldnt' test it except with real oopses. And that's just > whack-a-mole and bug reporter frustration if you first have a few patch > iterations around "oh sry, it's still some oops in the panic handler, not > the first one that we're seeing" :-/ > Oh, that's a critical point: full data > some data > no data > invalid/useless data First impressions for the feature will definitely count. Hopefully the hooks above and some heavy DRM CI testing __beforehand__ can improve things before publishing to a wider audience.. Cheers, -- darwi http://darwish.chasingpointers.com ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v2 1/3] drm: Add support for panic message output
[[ Adding Sebastian, who is quite experienced in intricate locking situations due to daily PREEMPT_RT work.. ]] On Wed, Mar 13, 2019 at 09:37:10AM +0100, Daniel Vetter wrote: > On Wed, Mar 13, 2019 at 08:49:17AM +0100, John Ogness wrote: > > On 2019-03-12, Ahmed S. Darwish wrote: > > > On Wed, Mar 13, 2019 at 09:37:10AM +0100, Daniel Vetter wrote: [……] > > > > > > > > If we aim for perfect this should be a trylock still, maybe using > > > > our own device list. > > > > > > > > > > I definitely agree here. > > > > > > The lock may already be locked either by a stopped CPU, or by the > > > very same CPU we execute panic() on (e.g. NMI panic() on the > > > printing CPU). > > > > > > This is why it's very common for example in serial consoles, which > > > are usually careful about re-entrance and panic contexts, to do: > > > > > > xx_console_write(...) { > > > if (oops_in_progress) > > > locked = spin_trylock_irqsave(>lock, flags); > > > else > > > spin_lock_irqsave(>lock, flags); > > > } > > > > > > I'm quite positive we should do the same for panic drm drivers. > > > > This construction will continue, even if the trylock fails. It only > > makes sense to do this if the driver has a chance of being > > successful. Ignoring locks is a serious issue. I personally am quite > > unhappy that the serial drivers do this, which was part of my motivation > > for the new printk design I'm working on. > > > > If the driver is not capable of doing something useful on a failed > > trylock, then I recommend just skipping that device. Maybe trying it > > again later after trying all the devices? > > Ah yes missed that. If the trylock fails anywhere, we must bail out. > > Not sure retrying is useful, my experience from at least drm is that > either you're lucky, and drm wasn't doing anything right when the machine > blew up, and then the trylocks will all go through. Or you're unlucky, and > most likely that means drm itself blew up, and no amount of retrying is > going to help. I wouldn't bother. > Ack, retrying most probably won't help. I see that, at least in x86: kernel/panic.c::panic() => kernel/panic.c::smp_send_stop() => arch/x86/incluse/smp.h::smp_send_stop(0) => arch/x86/kernel/smp.c::native_stop_other_cpus(wait) /* * Don't wait longer than a second if the caller * didn't ask us to wait. */ timeout = USEC_PER_SEC; while (num_online_cpus() > 1 && (wait || timeout--)) udelay(1); So the panic()-ing core wait by default for _a full second_ until all other cores finish their non-preemptible regions. If that did not work out, it even tries more aggressively with NMIs. So if one of the other non panic()-ing cores was holding a DRM related lock through spin_lock_irqsave() for more than one second, well, it's a bug in the DRM layer code anyway.. [A] Problem is, what will happen if the non panic()-ing core was holding a DRM lock through barebone spin_lock()? Interrupts are enabled there, so the IPI will be handled, and thus that core will effectively die while the lock is forever held :-( [B] But, well, yeah, I don't think there's a solution for the [B] part except by spin_trylock() the fewest amount of locks possible, and bailing out if anyone of them is held. For reference, I asked over IRC why not just resetting GPU state, but it's not easy as it sounds to people outside of the gfx world: [abridged] So it seems Windows is forcing driver writers to implement a gpu reset method [*] since Windows vista yeah not going to do that from panic there's a non-zero chance that gpu reset takes down the system with at least lots of hw/driver combos I think all our drivers also have gpu reset support if something is still rendering while we panic, mea culpa only way around that is switching planes which we tried with the old approach, and that's just too hard you end up running a few 100kloc of code worst case no way to audit/validate that for panic context but the rendering shouldn't really matter yeah, actually modesetting in a panic seems hopeless. at least for modern compositors anholt, yeah I killed that idea so they only render to the backbuffer and we won't show that with a panic yep. just draw on a plane. for everything but uncomposited x11, it'll already be idle. so only really an issue on boot splash and frontbuffer x11 and there a few cursors/characters drawn won't really matter in most cases anholt, yeah that's t
Re: [PATCH v2 1/3] drm: Add support for panic message output
Hi, [[ CCing John for the trylock parts ]] On Mon, Mar 11, 2019 at 11:33:15PM +0100, Noralf Trønnes wrote: > > > Den 11.03.2019 20.23, skrev Daniel Vetter: > > On Mon, Mar 11, 2019 at 06:42:16PM +0100, Noralf Trønnes wrote: > >> This adds support for outputting kernel messages on panic(). > >> A kernel message dumper is used to dump the log. The dumper iterates > >> over each DRM device and it's crtc's to find suitable framebuffers. > >> > >> All the other dumpers are run before this one except mtdoops. > >> Only atomic drivers are supported. > >> > >> Signed-off-by: Noralf Trønnes > > > > Bunch of comments/ideas for you or Darwish below, whoever picks this up. > > Actually it would ne nice if Darwish could pick it up since he will do > it on i915 which will be useful to a much broader audience. > If not I'll respin when I'm done with the drm_fb_helper refactoring. > Yup, I'll be more than happy to do this.. while preserving all of Noralf's authorship and copyright notices of course. I guess it can be: - Handle the comments posted by Daniel and others (I'll post some questions too) - Add the necessary i915 specific bits - Test, post v3/v4/../vn. Rinse and repeat. Keep it local at dri-devel until getting the necessary S-o-Bs. - Post to wider audience (some feedback from distribution folks would also be nice, before posting to lkml) More comments below.. [...] > >> + > >> +static void drm_panic_kmsg_dump(struct kmsg_dumper *dumper, > >> + enum kmsg_dump_reason reason) > >> +{ > >> + class_for_each_device(drm_class, NULL, dumper, drm_panic_dev_iter); > > > > class_for_each_device uses klist, which only uses an irqsave spinlock. I > > think that's good enough. Comment to that effect would be good e.g. > > > > /* based on klist, which uses only a spin_lock_irqsave, which we > > * assume still works */ > > > > If we aim for perfect this should be a trylock still, maybe using our own > > device list. > > I definitely agree here. The lock may already be locked either by a stopped CPU, or by the very same CPU we execute panic() on (e.g. NMI panic() on the printing CPU). This is why it's very common for example in serial consoles, which are usually careful about re-entrance and panic contexts, to do: xx_console_write(...) { if (oops_in_progress) locked = spin_trylock_irqsave(>lock, flags); else spin_lock_irqsave(>lock, flags); } I'm quite positive we should do the same for panic drm drivers. John? > >> +} > >> + > >> +static struct kmsg_dumper drm_panic_kmsg_dumper = { > >> + .dump = drm_panic_kmsg_dump, > >> + .max_reason = KMSG_DUMP_PANIC, > >> +}; > >> + > >> +static ssize_t drm_panic_file_panic_write(struct file *file, > >> +const char __user *user_buf, > >> +size_t count, loff_t *ppos) > >> +{ > >> + unsigned long long val; > >> + char buf[24]; > >> + size_t size; > >> + ssize_t ret; > >> + > >> + size = min(sizeof(buf) - 1, count); > >> + if (copy_from_user(buf, user_buf, size)) > >> + return -EFAULT; > >> + > >> + buf[size] = '\0'; > >> + ret = kstrtoull(buf, 0, ); > >> + if (ret) > >> + return ret; > >> + > >> + drm_panic_kmsg_dumper.max_reason = KMSG_DUMP_OOPS; > >> + wmb(); > >> + > >> + /* Do a real test with: echo c > /proc/sysrq-trigger */ > >> + > >> + if (val == 0) { > >> + pr_info("Test panic screen using kmsg_dump(OOPS)\n"); > >> + kmsg_dump(KMSG_DUMP_OOPS); > >> + } else if (val == 1) { > >> + char *null_pointer = NULL; > >> + > >> + pr_info("Test panic screen using NULL pointer dereference\n"); > >> + *null_pointer = 1; > >> + } else { > >> + return -EINVAL; > >> + } > > > > This isn't quite what I had in mind, since it still kills the kernel (like > > sysrq-trigger). > > If val == 0, it doesn't kill the kernel, it only dumps the kernel log. > And it doesn't taint the kernel either. > > > Instead what I had in mind is to recreate the worst > > possible panic context as much as feasible (disabling interrupts should be > > a good start, maybe we can even do an nmi callback), and then call our > > panic implementation. That way we can test the panic handler in a > > non-destructive way (i.e. aside from last dmesg lines printed to the > > screen nothing bad happens to the kernel: No real panic, no oops, no > > tainting). > > The interrupt case I can do, nmi I have no idea. > I agree too. Disabling interrupts + CONFIG_DEBUG_ATOMIC_SLEEP would be a nice non-destructive test-case emulation. thanks! -- darwi http://darwish.chasingpointers.com ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v2 1/3] drm: Add support for panic message output
Hi, [[ CCing John for the trylock parts ]] On Mon, Mar 11, 2019 at 11:33:15PM +0100, Noralf Trønnes wrote: > > Den 11.03.2019 20.23, skrev Daniel Vetter: > > On Mon, Mar 11, 2019 at 06:42:16PM +0100, Noralf Trønnes wrote: > >> This adds support for outputting kernel messages on panic(). > >> A kernel message dumper is used to dump the log. The dumper iterates > >> over each DRM device and it's crtc's to find suitable framebuffers. > >> > >> All the other dumpers are run before this one except mtdoops. > >> Only atomic drivers are supported. > >> > >> Signed-off-by: Noralf Trønnes > > > > Bunch of comments/ideas for you or Darwish below, whoever picks this up. > > Actually it would ne nice if Darwish could pick it up since he will do > it on i915 which will be useful to a much broader audience. > If not I'll respin when I'm done with the drm_fb_helper refactoring. > Yup, I'll be more than happy to do this.. while preserving all of Noralf's authorship and copyright notices of course. I guess it can be: - Handle the comments posted by Daniel and others (I'll post some questions too) - Add the necessary i915 specific bits - Test, post v3/v4/../vn. Rinse and repeat. Keep it local at dri-devel until getting the necessary S-o-Bs. - Post to wider audience (some feedback from distribution folks would also be nice, before posting to lkml) More comments below.. [...] > >> + > >> +static void drm_panic_kmsg_dump(struct kmsg_dumper *dumper, > >> + enum kmsg_dump_reason reason) > >> +{ > >> + class_for_each_device(drm_class, NULL, dumper, drm_panic_dev_iter); > > > > class_for_each_device uses klist, which only uses an irqsave spinlock. I > > think that's good enough. Comment to that effect would be good e.g. > > > > /* based on klist, which uses only a spin_lock_irqsave, which we > > * assume still works */ > > > > If we aim for perfect this should be a trylock still, maybe using our own > > device list. > > I definitely agree here. The lock may already be locked either by a stopped CPU, or by the very same CPU we execute panic() on (e.g. NMI panic() on the printing CPU). This is why it's very common for example in serial consoles, which are usually careful about re-entrance and panic contexts, to do: xx_console_write(...) { if (oops_in_progress) locked = spin_trylock_irqsave(>lock, flags); else spin_lock_irqsave(>lock, flags); } I'm quite positive we should do the same for panic drm drivers. John? > >> +} > >> + > >> +static struct kmsg_dumper drm_panic_kmsg_dumper = { > >> + .dump = drm_panic_kmsg_dump, > >> + .max_reason = KMSG_DUMP_PANIC, > >> +}; > >> + > >> +static ssize_t drm_panic_file_panic_write(struct file *file, > >> +const char __user *user_buf, > >> +size_t count, loff_t *ppos) > >> +{ > >> + unsigned long long val; > >> + char buf[24]; > >> + size_t size; > >> + ssize_t ret; > >> + > >> + size = min(sizeof(buf) - 1, count); > >> + if (copy_from_user(buf, user_buf, size)) > >> + return -EFAULT; > >> + > >> + buf[size] = '\0'; > >> + ret = kstrtoull(buf, 0, ); > >> + if (ret) > >> + return ret; > >> + > >> + drm_panic_kmsg_dumper.max_reason = KMSG_DUMP_OOPS; > >> + wmb(); > >> + > >> + /* Do a real test with: echo c > /proc/sysrq-trigger */ > >> + > >> + if (val == 0) { > >> + pr_info("Test panic screen using kmsg_dump(OOPS)\n"); > >> + kmsg_dump(KMSG_DUMP_OOPS); > >> + } else if (val == 1) { > >> + char *null_pointer = NULL; > >> + > >> + pr_info("Test panic screen using NULL pointer dereference\n"); > >> + *null_pointer = 1; > >> + } else { > >> + return -EINVAL; > >> + } > > > > This isn't quite what I had in mind, since it still kills the kernel (like > > sysrq-trigger). > > If val == 0, it doesn't kill the kernel, it only dumps the kernel log. > And it doesn't taint the kernel either. > > > Instead what I had in mind is to recreate the worst > > possible panic context as much as feasible (disabling interrupts should be > > a good start, maybe we can even do an nmi callback), and then call our > > panic implementation. That way we can test the panic handler in a > > non-destructive way (i.e. aside from last dmesg lines printed to the > > screen nothing bad happens to the kernel: No real panic, no oops, no > > tainting). > > The interrupt case I can do, nmi I have no idea. > I agree too. Disabling interrupts + CONFIG_DEBUG_ATOMIC_SLEEP would be a nice non-destructive test-case emulation. thanks! -- darwi http://darwish.chasingpointers.com ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v2 1/3] drm: Add support for panic message output
Hi, On Tue, Mar 12, 2019 at 11:58:10AM +0100, Daniel Vetter wrote: > On Mon, Mar 11, 2019 at 11:33:15PM +0100, Noralf Trønnes wrote: > > > > > > Den 11.03.2019 20.23, skrev Daniel Vetter: > > > On Mon, Mar 11, 2019 at 06:42:16PM +0100, Noralf Trønnes wrote: [...] > > >> +} > > >> + > > >> +static struct kmsg_dumper drm_panic_kmsg_dumper = { > > >> +.dump = drm_panic_kmsg_dump, > > >> +.max_reason = KMSG_DUMP_PANIC, > > >> +}; > > >> + > > >> +static ssize_t drm_panic_file_panic_write(struct file *file, > > >> + const char __user *user_buf, > > >> + size_t count, loff_t *ppos) > > >> +{ > > >> +unsigned long long val; > > >> +char buf[24]; > > >> +size_t size; > > >> +ssize_t ret; > > >> + > > >> +size = min(sizeof(buf) - 1, count); > > >> +if (copy_from_user(buf, user_buf, size)) > > >> +return -EFAULT; > > >> + > > >> +buf[size] = '\0'; > > >> +ret = kstrtoull(buf, 0, ); > > >> +if (ret) > > >> +return ret; > > >> + > > >> +drm_panic_kmsg_dumper.max_reason = KMSG_DUMP_OOPS; > > >> +wmb(); > > >> + > > >> +/* Do a real test with: echo c > /proc/sysrq-trigger */ > > >> + > > >> +if (val == 0) { > > >> +pr_info("Test panic screen using kmsg_dump(OOPS)\n"); > > >> +kmsg_dump(KMSG_DUMP_OOPS); > > >> +} else if (val == 1) { > > >> +char *null_pointer = NULL; > > >> + > > >> +pr_info("Test panic screen using NULL pointer > > >> dereference\n"); > > >> +*null_pointer = 1; > > >> +} else { > > >> +return -EINVAL; > > >> +} >> > > > > > This isn't quite what I had in mind, since it still kills the kernel (like > > > sysrq-trigger). > > > > If val == 0, it doesn't kill the kernel, it only dumps the kernel log. > > And it doesn't taint the kernel either. > > Ah I didn't realize that. Sounds like a good option to keep. > > > > Instead what I had in mind is to recreate the worst > > > possible panic context as much as feasible (disabling interrupts should be > > > a good start, maybe we can even do an nmi callback), and then call our > > > panic implementation. That way we can test the panic handler in a > > > non-destructive way (i.e. aside from last dmesg lines printed to the > > > screen nothing bad happens to the kernel: No real panic, no oops, no > > > tainting). > > > > The interrupt case I can do, nmi I have no idea. > > I just read the printk nmi code again and it looks like there's now even > more special handling for issues happening in nmi context, so we should > never see an oops from nmi context. So local_irq_disable() wrapping should > be good enough for testing. > -Daniel > Hmmm, John is on a mission to change that soon: - https://lore.kernel.org/lkml/20190212143003.48446-1-john.ogn...@linutronix.de - https://lwn.net/Articles/780556 A v2 is on the way, and there's a lot of interest from different groups, including RT, to get that in. So I guess it would be nice to cover NMI contexts from the start, not necessarily from a non-destructive CI testing perspective, but at least while doing normal tests and code reviews. (Is it possible to do non-destructive NMI tests? I don't know..) thanks! -- darwi http://darwish.chasingpointers.com ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: DRM-based Oops viewer
Hello Jani, On Mon, Mar 11, 2019 at 11:04:19AM +0200, Jani Nikula wrote: > On Sun, 10 Mar 2019, "Ahmed S. Darwish" wrote: > > Hello DRM/UEFI maintainers, > > > > Several years ago, I wrote a set of patches to dump the kernel > > log to disk upon panic -- through BIOS INT 0x13 services. [1] > > > > The overwhelming response was that it's unsafe to do this in a > > generic manner. Linus proposed a video-based viewer instead: [2] > > > > If you want to do the BIOS services thing, do it for video: copy the > > oops to low RAM, return to real mode, re-run the graphics card POST > > routines to initialize text-mode, and use the BIOS to print out the > > oops. That is WAY less scary than writing to disk. > > > > Of course it's 2019 now though, and it's quite known that > > Intel is officially obsoleting the PC/AT BIOS by 2020.. [3] > > > > Researching whether this can be done from UEFI, it was also clear > > that UEFI "Runtime Services" do not provide any re-initialization > > routines. [4] > > > > The maximum possible that UEFI can provide is a GOP-provided > > framebuffer that's ready to use by the OS -- even after the UEFI > > boot phase is marked as done through ExitBootServices(). [5] > > > > Of course, once native drivers like i915 or radeon take over, > > such a framebuffer is toast... [6] > > > > Thus a possible remaining option, is to display the oops through > > "minimal" DRM drivers provided for each HW variant... Since > > these special drivers will run only and fully under a panic() > > context though, several constraints exist: > > > > - The code should be fully synchronous (irqs are disabled) > > - It should not allocate any dynamic memory > > - It should make minimal assumptions about HW state > > - It should not chain into any other kernel subsystem > > - It has ample freedom to use delay-based loops and the > > like, the kernel is already dead. > > > > How feasible is it to have such a special "DRM viewoops" > > framework + its minimal drivers in the kernel? > > Please first better define what you want to achieve. > Oh I thought this was clear.. What I want to achieve is: - for normal day-to-day x86 laptops users - properly inform the user when a kernel panic happens during a running graphical session (e.g. wayland/gnome/kde/...). The current situation is dismal: the screen _just freezes_, and users are left wondering what the heck has really happened to their system (?) Some out-of-the-box notification mechanism, for everyday distros like Fedora and Ubuntu that can be enabled by default, would improve the situation considerably.. Yes, there are many _developer_ features that can mitigate the issue somewhat, but they're not really useful for everyday "normal" usage: - netconsole is definitely not an option. It implies a lab setting where two machines are always connected through a network. - ramoops is _completely irrelevant_ for normal users. It's mostly for embedded systems and the like; requires intimate knowledge of the hardware by the user translated into DT bindings or special platform_data struct.. - kexec/kdump partially solves the save-to-disk problem, but doesn't solve the user notification part.. Maybe a new "kexec/kview" solution can be useful, but distributions don't enable kexec/k* solutions _by default_. Maybe they fear the extra burden of maintaining two kernels at the same time? or the requirement of reserving memory for the crashkernel beforehand? Linux should not just "freeze the screen" upon panic, even if a crashkernel is not present .. _some_ sane default built-in user notification mechanism should be there. - efivars are neat, they partially solve the save-to-disk problem, but does not solve the user notification problem. Moreover, they always carry the risk of bricking laptops.. > Do you want to store the dmesg or oops (like your original series > suggests) or do you want to display the oops? The original save-to-disk series was only shown for context. This is a pure display solution; no disk is invovled at all. > Do you want the facility to be functioning at all times, or only > when specifically requested in advance by the user? At all times, as a basic "sane default" for laptop-oriented distributions to enable (think ubuntu, fedora, mint, etc.) > If you want to display the oops, do you want it to > also work when the display is disabled at the time of the oops? If the screen is disabled, then this is definitely out of scope. This only deals with classical laptop usage scenarios, w
Re: DRM-based Oops viewer
On Mon, Mar 11, 2019 at 02:49:41PM +0100, Daniel Vetter wrote: > On Mon, Mar 11, 2019 at 11:04:19AM +0200, Jani Nikula wrote: > > On Sun, 10 Mar 2019, "Ahmed S. Darwish" wrote: > > > Hello DRM/UEFI maintainers, > > > > > > Several years ago, I wrote a set of patches to dump the kernel > > > log to disk upon panic -- through BIOS INT 0x13 services. [1] > > > > > > The overwhelming response was that it's unsafe to do this in a > > > generic manner. Linus proposed a video-based viewer instead: [2] > > > > > > If you want to do the BIOS services thing, do it for video: copy the > > > oops to low RAM, return to real mode, re-run the graphics card POST > > > routines to initialize text-mode, and use the BIOS to print out the > > > oops. That is WAY less scary than writing to disk. > > > > > > Of course it's 2019 now though, and it's quite known that > > > Intel is officially obsoleting the PC/AT BIOS by 2020.. [3] > > > > > > Researching whether this can be done from UEFI, it was also clear > > > that UEFI "Runtime Services" do not provide any re-initialization > > > routines. [4] > > > > > > The maximum possible that UEFI can provide is a GOP-provided > > > framebuffer that's ready to use by the OS -- even after the UEFI > > > boot phase is marked as done through ExitBootServices(). [5] > > > > > > Of course, once native drivers like i915 or radeon take over, > > > such a framebuffer is toast... [6] > > > > > > Thus a possible remaining option, is to display the oops through > > > "minimal" DRM drivers provided for each HW variant... Since > > > these special drivers will run only and fully under a panic() > > > context though, several constraints exist: > > > > > > - The code should be fully synchronous (irqs are disabled) > > > - It should not allocate any dynamic memory > > > - It should make minimal assumptions about HW state > > > - It should not chain into any other kernel subsystem > > > - It has ample freedom to use delay-based loops and the > > > like, the kernel is already dead. > > > > > > How feasible is it to have such a special "DRM viewoops" > > > framework + its minimal drivers in the kernel? > > > > Please first better define what you want to achieve. > > > > Do you want to store the dmesg or oops (like your original series > > suggests) or do you want to display the oops? Do you want the facility > > to be functioning at all times, or only when specifically requested in > > advance by the user? If you want to display the oops, do you want it to > > also work when the display is disabled at the time of the oops? What if > > the display is at attached to a port on a dock? > > > > There's at least kdump, ramoops, and netconsole that can be used to > > achieve some of what you want. How do they fall short for you? > > Assuming the use-case is to get an oops to display on a kms driver, we do > have a fairly comprehensive plan of what that's should look like: > > https://dri.freedesktop.org/docs/drm/gpu/todo.html#make-panic-handling-work > > This takes into account all the failed previous attempts at trying to get > an oops to display. It's conceptually a match with your viewoops framework > I think. Thanks a lot Daniel for the reference! Yup, this is a conceptual match indeed! It's great to know that at the maintainer level there is some agreement on, awareness of, and plans for, the general topic... I'll jump to Noralf Trønnes's just-posted patches then and see how to move from there :) all the best, --darwi http://darwish.chasingpointers.com ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
DRM-based Oops viewer
Hello DRM/UEFI maintainers, Several years ago, I wrote a set of patches to dump the kernel log to disk upon panic -- through BIOS INT 0x13 services. [1] The overwhelming response was that it's unsafe to do this in a generic manner. Linus proposed a video-based viewer instead: [2] If you want to do the BIOS services thing, do it for video: copy the oops to low RAM, return to real mode, re-run the graphics card POST routines to initialize text-mode, and use the BIOS to print out the oops. That is WAY less scary than writing to disk. Of course it's 2019 now though, and it's quite known that Intel is officially obsoleting the PC/AT BIOS by 2020.. [3] Researching whether this can be done from UEFI, it was also clear that UEFI "Runtime Services" do not provide any re-initialization routines. [4] The maximum possible that UEFI can provide is a GOP-provided framebuffer that's ready to use by the OS -- even after the UEFI boot phase is marked as done through ExitBootServices(). [5] Of course, once native drivers like i915 or radeon take over, such a framebuffer is toast... [6] Thus a possible remaining option, is to display the oops through "minimal" DRM drivers provided for each HW variant... Since these special drivers will run only and fully under a panic() context though, several constraints exist: - The code should be fully synchronous (irqs are disabled) - It should not allocate any dynamic memory - It should make minimal assumptions about HW state - It should not chain into any other kernel subsystem - It has ample freedom to use delay-based loops and the like, the kernel is already dead. How feasible is it to have such a special "DRM viewoops" framework + its minimal drivers in the kernel? The target is to start from i915, since that's what in my laptop now, and work from there.. Some final notes: - The NT kernel has a similar concept, but for storage instead. They're used to dump core under kernel panic() situations, and are called "Minoport storage drivers". [7] - Since Windows 7+, a very fancy Blue Screen of Death is displayed, with Unicode and whatnot, implying GPU drivers involvement. [8] - Mac OS X also does something similar [9] - On Linux laptops, the current situation is _really_ bad. In any graphical session, type "echo c > /proc/sysrq-trigger"; the screen will just completely freeze... Desired first goal: just print the panic() log Thanks a lot, [1] https://lore.kernel.org/lkml/20110125134748.GA10051@laptop [2] https://lore.kernel.org/lkml/AANLkTinU0KYiCd4p=z+=ojbkeeot2g+cayvdru02k...@mail.gmail.com [3] https://uefi.org/sites/default/files/resources/Brian_Richardson_Intel_Final.pdf [4] UEFI v2.7 spec, Chapter 8, "Services — Runtime Services" [5] UEFI v2.7 spec, Section 12.9, "Graphics Output Protocol" "The Graphics Output Protocol supports this capability by providing the EFI OS loader access to a hardware frame buffer and enough information to allow the OS to draw directly to the graphics output device." [6] linux/drivers/gpu/drm/i915/i915_drv.c::i915_kick_out_firmware_fb() linux/drivers/gpu/drm/radeon/radeon_drv.c::radeon_pci_probe() [7] https://docs.microsoft.com/en-us/windows-hardware/drivers/storage/restrictions-on-miniport-drivers-that-manage-the-boot-drive [8] https://upload.wikimedia.org/wikipedia/commons/archive/5/56/20181019151937%21Bsodwindows10.png [9] https://upload.wikimedia.org/wikipedia/commons/4/4a/Mac_OS_X_10.2_Kernel_Panic.jpg --darwi http://darwish.chasingpointers.com ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel