Re: X freezes kernel during exit [Re: 2.6.23-rc3-mm1]

Jiri Slaby Mon, 17 Sep 2007 04:10:57 -0700

On 09/09/2007 04:43 PM, Jiri Slaby wrote:
> On 09/09/2007 04:33 PM, Andi Kleen wrote:
>> On Sun, Sep 09, 2007 at 04:26:22PM +0200, Jiri Slaby wrote:
>>> BTW it is reproducible for me on two different machines (i386-x86_64,
>>> radeon-intel), don't you have the problem too?
>> No problems here with a radeon, no.
>>
>> Does your CPU have clflush or not in /proc/cpuinfo?
> 
> BTW this is how my flush_kernel_map looks like:
> 
> static void flush_kernel_map(void *arg)
> {
>         struct flush_arg *a = (struct flush_arg *)arg;
>         struct page *pg;
>         unsigned int xx = 0;
> 
>         /* When clflush is available use it because it is
>            much cheaper than WBINVD. */
>         printk("%s: 1\n", __func__);
>         if (a->full_flush || !cpu_has_clflush)
>                 asm volatile("wbinvd" ::: "memory");
>         else list_for_each_entry(pg, &a->l, lru) {
>         printk("%s: %10u 1a\n", __func__, xx++);
>                 if (PageFlush(pg))
>                         clflush_cache_range(page_address(pg), PAGE_SIZE);
>         }
>         printk("%s: 2\n", __func__);
>         __flush_tlb_all();
>         printk("%s: 3\n", __func__);
> }
> 
> It outputs 1a in the infinite loop with incrementing xx. But only in this 
> case,
> some global_flush_tlb are OK. e.g.:
> Sep 10 01:39:19 localhost kernel: agpgart: Detected an Intel G33 Chipset.
> Sep 10 01:39:19 localhost kernel: global_flush_tlb: 1
> Sep 10 01:39:19 localhost kernel: flush_kernel_map: 1
> Sep 10 01:39:19 localhost kernel: flush_kernel_map:          0 1a
> Sep 10 01:39:19 localhost kernel: flush_kernel_map:          1 1a
> Sep 10 01:39:19 localhost kernel: flush_kernel_map: 2
> Sep 10 01:39:19 localhost kernel: flush_kernel_map: 3
> Sep 10 01:39:19 localhost kernel: flush_kernel_map: 1
> Sep 10 01:39:19 localhost kernel: flush_kernel_map:          0 1a
> Sep 10 01:39:19 localhost kernel: flush_kernel_map:          1 1a
> Sep 10 01:39:19 localhost kernel: flush_kernel_map: 2
> Sep 10 01:39:19 localhost kernel: flush_kernel_map: 3
> Sep 10 01:39:19 localhost kernel: global_flush_tlb: 2
> Sep 10 01:39:19 localhost kernel: global_flush_tlb: 3
> Sep 10 01:39:19 localhost kernel: agpgart: Detected 6140K stolen memory.
> 
> It seems, that the list is broken only on X shutdown. How can be 
> deferred-pages
> list inited in some bad manner, when list_replace_init is called on it? Weird.


Ok, here comes a BUG with trace:
set status page addr 0x00033000
list_add corruption. next->prev should be prev (ffffffff8068ae70), but was
ffffffff80697a50. (next=ffff81000117fbd0).
------------[ cut here ]------------
kernel BUG at /home/l/latest/xxx/lib/list_debug.c:27!
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:02.0/enable
CPU 0
Modules linked in: ipv6 floppy sr_mod rtc_cmos rtc_core cdrom ehci_hcd rtc_lib
usbhid
Pid: 1639, comm: X Not tainted 2.6.23-rc4-mm1_64 #23
RIP: 0010:[<ffffffff80332f49>]  [<ffffffff80332f49>] __list_add+0x39/0x60
RSP: 0018:ffff81000547bd48  EFLAGS: 00010296
RAX: 0000000000000079 RBX: ffff81000117a380 RCX: ffffffff8068b450
RDX: ffff81000317d6a0 RSI: 0000000000000001 RDI: ffffffff8068b420
RBP: ffff81000547bd48 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000006da2
R13: ffff810006c10d10 R14: ffff810006da2000 R15: 8000000000000163
FS:  00007f7a05258710(0000) GS:ffffffff806d1000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000a33040 CR3: 0000000004a43000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process X (pid: 1639, threadinfo ffff81000547a000, task ffff81000317d6a0)
Stack:  ffff81000547bd58 ffffffff80332f7c ffff81000547bdc8 ffffffff80225c56
 ffffffff80225cb5 8000000000000163 ffffffff806830e8 ffffffff806830a0
 ffffffff806830a0 ffff810006da2000 ffff81000547bda8 0000000000006da2
Call Trace:
 [<ffffffff80332f7c>] list_add+0xc/0x10
 [<ffffffff80225c56>] __change_page_attr+0x376/0x390
 [<ffffffff80225cb5>] change_page_attr_addr+0x45/0x140
 [<ffffffff80225d16>] change_page_attr_addr+0xa6/0x140
 [<ffffffff80225de3>] change_page_attr+0x33/0x40
 [<ffffffff80387b64>] agp_generic_destroy_page+0x44/0x70
 [<ffffffff80388645>] agp_free_memory+0x65/0xd0
 [<ffffffff80386d49>] agp_free_memory_wrap+0x39/0x60
 [<ffffffff803873fb>] agp_release+0xdb/0x1c0
 [<ffffffff80299551>] __fput+0xd1/0x1b0
 [<ffffffff802996b6>] fput+0x16/0x20
 [<ffffffff802964e6>] filp_close+0x56/0x90
 [<ffffffff80297bed>] sys_close+0xad/0x110
 [<ffffffff8020bd0e>] system_call+0x7e/0x83

INFO: lockdep is turned off.

Code: 0f 0b eb fe 0f 1f 00 48 89 f1 4c 89 c2 48 89 c6 48 c7 c7 e8
RIP  [<ffffffff80332f49>] __list_add+0x39/0x60
 RSP <ffff81000547bd48>

regards,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: X freezes kernel during exit [Re: 2.6.23-rc3-mm1]

Reply via email to