(please CC me if replying)

Hi,

recently I saw a report about an oops in 2.6.38-rc7 in i915, attached at
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/733780

Unfortunately the oops is cut (without call trace):
[126167.230394] BUG: unable to handle kernel paging request at 00100104
[126167.230699] IP: [<f8c2ce44>] i915_gem_retire_requests_ring+0xd4/0x240 [i915]
[126167.231042] *pdpt = 00000000314c1001 *pde = 0000000000000000 
[126167.231314] Oops: 0002 [#1] SMP 
[126167.231471] last sysfs file: 
/sys/devices/LNXSYSTM:00/device:00/PNP0C0A:00/power_supply/BAT1/current_now
[126167.231901] Modules linked in: snd_seq_dummy nls_utf8 isofs btrfs 
zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs 
exportfs reiserfs cryptd aes_i586 aes_generic binfmt_misc vboxnetadp vboxnetflt 
vboxdrv parport_pc ppdev snd_hda_codec_hdmi snd_hda_codec_conexant 
snd_hda_intel snd_hda_codec snd_hwdep arc4 snd_pcm snd_seq_midi snd_rawmidi 
snd_seq_midi_event snd_seq uvcvideo videodev snd_timer snd_seq_device joydev 
iwlagn iwlcore mac80211 snd cfg80211 soundcore i915 drm_kms_helper 
snd_page_alloc psmouse drm serio_raw i2c_algo_bit video lp parport usbhid hid 
sky2 sdhci_pci ahci sdhci libahci
[126167.232018] 
[126167.232018] Pid: 1101, comm: Xorg Not tainted 2.6.38-6-generic-pae 
#34-Ubuntu Gateway                          MC7833U                         /   
     
[126167.232018] EIP: 0060:[<f8c2ce44>] EFLAGS: 00213246 CPU: 0
[126167.232018] EIP is at i915_gem_retire_requests_ring+0xd4/0x240 [i915]
[126167.232018] EAX: 00200200 EBX: f1ac25b0 ECX: 00000040 EDX: 00100100
[126167.232018] ESI: f1a2801c EDI: e87fc060 EBP: ef4d7dd8 ESP: ef4d7db0
[126167.232018]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[126167.232018] Process Xorg (pid: 1101, ti=ef4d6000 task=f1ba6500 
task.ti=ef4d6000)
[126167.232018] Stack:
[126167.232018]  f1a28000 f1a2809c f1a28094 0058bd97 f1aa2400 f1a2801c 0058bd7b 
0058bd85
[126167.232018]  f1a2801c f1a28000 ef4d7e38 f8c2e995 ef4d7e30 ef4d7e60 c14d1ebc 
f6b3a040
[126167.232018]  f1522cc0 000000db 00000000 f1ba6500 ffffffa1 00000000 00000001 
f1a29214
[126167.232018] Call Trace:


But we already can see some things. After downloading the debug symbols
for kernel above [1] and extracting the files (dpkg-deb -x), setting up
the source files, in gdb we can get this:

(gdb) l *(i915_gem_retire_requests_ring+0xd4)
0x17e74 is in i915_gem_retire_requests_ring 
(/build/buildd/linux-2.6.38/include/linux/list.h:88).
warning: Source file is more recent than executable.
83       * This is only for internal list manipulation where we know
84       * the prev/next entries already!
85       */
86      static inline void __list_del(struct list_head * prev, struct list_head 
* next)
87      {
88              next->prev = prev;
89              prev->next = next;
90      }
91      
92      /**

So the oops is in a list_del call. And looking at the EDX and
dereferenced value, it's LIST_POISON1 + 4, as
p &((struct list_head *)0)->prev == 0x4

So indeed the oops is happening on __list_del at next->prev = prev,
called from i915_gem_retire_requests_ring, probably from the
i915_gem_request_remove_from_client call, going back in the code for
example:
(gdb) l *(i915_gem_retire_requests_ring+0xc6)
0x17e66 is in i915_gem_retire_requests_ring 
(/build/buildd/linux-2.6.38/drivers/gpu/drm/i915/i915_gem.c:1748).
warning: Source file is more recent than executable.
1743    static inline void
1744    i915_gem_request_remove_from_client(struct drm_i915_gem_request 
*request)
1745    {
1746            struct drm_i915_file_private *file_priv = request->file_priv;
1747    
1748            if (!file_priv)
1749                    return;
1750    
1751            spin_lock(&file_priv->mm.lock);
1752            list_del(&request->client_list);

But without the call trace no way to know for certain. Anyway the oops
with LIST_POISON1 means we are trying to delete from the list an already
deleted item. As I can see in the code (I'm not used to that code, so
may be there is something I don't see), it seems we indeed have the
possibility to remove an request->client_list twice, which would cause
the above, we do list_del(&request->client_list) on both
i915_gem_request_remove_from_client and i915_gem_release

I don't know if it's the most correct fix, but perhaps the simple fix
is needed in the code. It's against latest Linus tree. We may have an
already removed client_list, or we didn't add any item to client_list
(file_priv NULL in i915_add_request)

Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesin...@canonical.com>
---
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 36e66cc..6077c0d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1749,8 +1749,10 @@ i915_gem_request_remove_from_client(struct 
drm_i915_gem_request *request)
                return;
 
        spin_lock(&file_priv->mm.lock);
-       list_del(&request->client_list);
-       request->file_priv = NULL;
+       if (request->file_priv) {
+               list_del(&request->client_list);
+               request->file_priv = NULL;
+       }
        spin_unlock(&file_priv->mm.lock);
 }
 
@@ -4043,8 +4045,10 @@ void i915_gem_release(struct drm_device *dev, struct 
drm_file *file)
                request = list_first_entry(&file_priv->mm.request_list,
                                           struct drm_i915_gem_request,
                                           client_list);
-               list_del(&request->client_list);
-               request->file_priv = NULL;
+               if (request->file_priv) {
+                       list_del(&request->client_list);
+                       request->file_priv = NULL;
+               }
        }
        spin_unlock(&file_priv->mm.lock);
 }

[1] 
http://ddebs.ubuntu.com/pool/main/l/linux/linux-image-2.6.38-6-generic-pae-dbgsym_2.6.38-6.34_i386.ddeb
-- 
[]'s
Herton
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Reply via email to