[off topic: plain text mail please]

On Fri, 9 Aug 2019 12:41:42 +0000 Martin Wilck wrote:
> 
> This happened to me today, running kernel 5.3.0-rc3-1.g571863b-default
> (5.3-rc3 with just a few patches on top), after starting a KVM virtual
> machine. The X screen was frozen. Remote login via ssh was still
> possible, thus I was able to retrieve basic logs.

Thanks for the report.
> 
> sysrq-w showed two blocked processes (kcompactd0 and KVM). After a
> minute, the same two processes were still blocked. KVM seems to try to
> acquire a lock that kcompactd is holding. kcompactd is waiting for IO
> to complete on pages owned by the i915 driver.
> 
> kcompactd stack:
> 
> Aug 09 12:12:48 apollon.suse.de kernel: sysrq: Show Blocked State
> Aug 09 12:12:48 apollon.suse.de kernel: task                        PC stack   pid father
> Aug 09 12:12:48 apollon.suse.de kernel: kcompactd0      D    0    43      2 0x80004000
> Aug 09 12:12:48 apollon.suse.de kernel: Call Trace:
> Aug 09 12:12:48 apollon.suse.de kernel:  ? __schedule+0x2af/0x6a0
> Aug 09 12:12:48 apollon.suse.de kernel:  schedule+0x33/0x90
> Aug 09 12:12:48 apollon.suse.de kernel:  io_schedule+0x12/0x40
> Aug 09 12:12:48 apollon.suse.de kernel:  __lock_page+0x123/0x200
> Aug 09 12:12:48 apollon.suse.de kernel:  ? gen8_ppgtt_clear_pdp+0xc0/0x140 [i915]
> Aug 09 12:12:48 apollon.suse.de kernel:  ? file_fdatawait_range+0x20/0x20
> Aug 09 12:12:48 apollon.suse.de kernel:  set_page_dirty_lock+0x49/0x50
> Aug 09 12:12:48 apollon.suse.de kernel:  i915_gem_userptr_put_pages+0x13f/0x1c0 [i915]

The two lines above show that commit aa56a292ce62 ("drm/i915/userptr: Acquire
the page lock around set_page_dirty()") is the culprit.

> Aug 09 12:12:48 apollon.suse.de kernel:  __i915_gem_object_put_pages+0x5e/0xa0 [i915]
> Aug 09 12:12:48 apollon.suse.de kernel:  userptr_mn_invalidate_range_start+0x1ff/0x220 [i915]
> Aug 09 12:12:48 apollon.suse.de kernel:  __mmu_notifier_invalidate_range_start+0x57/0xa0
> Aug 09 12:12:48 apollon.suse.de kernel:  try_to_unmap_one+0xa0b/0xae0
> Aug 09 12:12:48 apollon.suse.de kernel:  ? __mod_lruvec_state+0x3f/0xf0
> Aug 09 12:12:48 apollon.suse.de kernel:  rmap_walk_file+0xf2/0x250
> Aug 09 12:12:48 apollon.suse.de kernel:  try_to_unmap+0xa6/0xe0

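The page is already locked before try_to_unmap() is called, and the dirty
page table entry is handled in try_to_unmap_one(), so the page lock added
in aa56a292ce62 is overkill in this call trace: set_page_dirty_lock()
sleeps in lock_page() waiting for a lock that the migration path above it
already holds. The bigger pain is that the commit cannot simply be
reverted, because of the Fixes tag it carries.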
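A minimal userspace model may help show why a non-blocking trylock avoids
the hang where a sleeping lock_page() does not. It assumes a hypothetical
struct model_page whose atomic flag stands in for PG_locked (the real
trylock_page() likewise does a test-and-set on that bit); model_page,
model_trylock_page(), model_unlock_page() and model_put_pages() are
illustrative names, not kernel API. Only the dirtying step is modelled;
mark_page_accessed() and put_page() are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of a struct page (not kernel code):
 * only the PG_locked and PG_dirty bits matter for this sketch.
 */
struct model_page {
	atomic_bool locked;	/* stands in for PG_locked */
	bool dirty;		/* stands in for PG_dirty  */
};

/* Like trylock_page(): set the lock bit, succeed only if it was clear. */
static bool model_trylock_page(struct model_page *page)
{
	return !atomic_exchange(&page->locked, true);
}

static void model_unlock_page(struct model_page *page)
{
	/* Unlocking a page that is not locked would be a bug. */
	assert(atomic_load(&page->locked));
	atomic_store(&page->locked, false);
}

/*
 * Models the dirtying loop of i915_gem_userptr_put_pages() with trylock:
 * a page is dirtied only if its lock can be taken without sleeping. A page
 * whose lock is already held (e.g. by the migration path that called us)
 * is skipped rather than waited on. Returns the number of skipped pages.
 */
static size_t model_put_pages(struct model_page *pages, size_t n,
			      bool obj_dirty)
{
	size_t skipped = 0;

	for (size_t i = 0; i < n; i++) {
		if (!obj_dirty)
			continue;
		if (model_trylock_page(&pages[i])) {
			pages[i].dirty = true;	/* set_page_dirty() */
			model_unlock_page(&pages[i]);
		} else {
			/* A sleeping lock_page() here could wait forever. */
			skipped++;
		}
	}
	return skipped;
}
```

For example, if the caller already holds the lock of page 0 of a two-page
dirty object, model_put_pages() dirties page 1, skips page 0 and returns 1.
The trade-off of the trylock approach is that a page whose lock is busy is
not marked dirty at this point.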

> Aug 09 12:12:48 apollon.suse.de kernel:  ? page_remove_rmap+0x290/0x290
> Aug 09 12:12:48 apollon.suse.de kernel:  ? page_not_mapped+0x20/0x20
> Aug 09 12:12:48 apollon.suse.de kernel:  ? page_get_anon_vma+0x80/0x80
> Aug 09 12:12:48 apollon.suse.de kernel:  migrate_pages+0x8cd/0xbc0
> Aug 09 12:12:48 apollon.suse.de kernel:  ? fast_isolate_freepages+0x6b0/0x6b0
> Aug 09 12:12:48 apollon.suse.de kernel:  ? move_freelist_tail+0xb0/0xb0
> Aug 09 12:12:48 apollon.suse.de kernel:  compact_zone+0x669/0xc80
> Aug 09 12:12:48 apollon.suse.de kernel:  ? entry_SYSCALL_64_after_hwframe+0xb8/0xbe
> Aug 09 12:12:48 apollon.suse.de kernel:  kcompactd_do_work+0x120/0x290
> 
> KVM stack:
> 
> Aug 09 12:12:48 apollon.suse.de kernel: CPU 0/KVM       D    0 25189      1 0x00000320
> Aug 09 12:12:48 apollon.suse.de kernel: Call Trace:
> Aug 09 12:12:48 apollon.suse.de kernel:  ? __schedule+0x2af/0x6a0
> Aug 09 12:12:48 apollon.suse.de kernel:  schedule+0x33/0x90
> Aug 09 12:12:48 apollon.suse.de kernel:  schedule_preempt_disabled+0xa/0x10
> Aug 09 12:12:48 apollon.suse.de kernel:  __mutex_lock.isra.0+0x172/0x4d0
> Aug 09 12:12:48 apollon.suse.de kernel:  userptr_mn_invalidate_range_start+0x1bf/0x220 [i915]
> Aug 09 12:12:48 apollon.suse.de kernel:  __mmu_notifier_invalidate_range_start+0x57/0xa0
> Aug 09 12:12:48 apollon.suse.de kernel:  try_to_unmap_one+0xa0b/0xae0
> Aug 09 12:12:48 apollon.suse.de kernel:  rmap_walk_file+0xf2/0x250
> Aug 09 12:12:48 apollon.suse.de kernel:  try_to_unmap+0xa6/0xe0
> Aug 09 12:12:48 apollon.suse.de kernel:  ? page_remove_rmap+0x290/0x290
> Aug 09 12:12:48 apollon.suse.de kernel:  ? page_not_mapped+0x20/0x20
> Aug 09 12:12:48 apollon.suse.de kernel:  ? page_get_anon_vma+0x80/0x80
> Aug 09 12:12:48 apollon.suse.de kernel:  migrate_pages+0x8cd/0xbc0
> Aug 09 12:12:48 apollon.suse.de kernel:  ? fast_isolate_freepages+0x6b0/0x6b0
> Aug 09 12:12:48 apollon.suse.de kernel:  ? move_freelist_tail+0xb0/0xb0
> Aug 09 12:12:48 apollon.suse.de kernel:  compact_zone+0x669/0xc80
> Aug 09 12:12:48 apollon.suse.de kernel:  compact_zone_order+0xc6/0xf0
> Aug 09 12:12:48 apollon.suse.de kernel:  try_to_compact_pages+0xcc/0x2a0
> Aug 09 12:12:48 apollon.suse.de kernel:  __alloc_pages_direct_compact+0x7c/0x150
> Aug 09 12:12:48 apollon.suse.de kernel:  __alloc_pages_slowpath+0x1ee/0xd00
> Aug 09 12:12:48 apollon.suse.de kernel:  ? vmx_vcpu_load+0x100/0x120 [kvm_intel]
> 
> Full logs can be found under https://pastebin.com/KJ6tccj4
> I haven't yet tried if this is reproducible.

Set the page dirty only if the page lock can be taken without sleeping;
if it cannot, someone else is taking care of the page.

--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -663,7 +663,7 @@ i915_gem_userptr_put_pages(struct drm_i9
        i915_gem_gtt_finish_pages(obj, pages);
 
        for_each_sgt_page(page, sgt_iter, pages) {
-               if (obj->mm.dirty)
+               if (obj->mm.dirty) {
                        /*
                         * As this may not be anonymous memory (e.g. shmem)
                         * but exist on a real mapping, we have to lock
@@ -672,8 +672,15 @@ i915_gem_userptr_put_pages(struct drm_i9
                         * prevent the inode from being truncated.
                         * Play safe and take the lock.
                         */
-                       set_page_dirty_lock(page);
-
+                       if (trylock_page(page)) {
+                               set_page_dirty(page);
+                               unlock_page(page);
+                       }
+                       /*
+                        * else someone else holds the page lock, and
+                        * waiting for it here could deadlock; skip it.
+                        */
+               }
                mark_page_accessed(page);
                put_page(page);
        }
--

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
