On 09/09/2007 04:43 PM, Jiri Slaby wrote: > On 09/09/2007 04:33 PM, Andi Kleen wrote: >> On Sun, Sep 09, 2007 at 04:26:22PM +0200, Jiri Slaby wrote: >>> BTW it is reproducible for me on two different machines (i386-x86_64, >>> radeon-intel), don't you have the problem too? >> No problems here with a radeon, no. >> >> Does your CPU have clflush or not in /proc/cpuinfo? > > BTW this is how my flush_kernel_map looks like: > > static void flush_kernel_map(void *arg) > { > struct flush_arg *a = (struct flush_arg *)arg; > struct page *pg; > unsigned int xx = 0; > > /* When clflush is available use it because it is > much cheaper than WBINVD. */ > printk("%s: 1\n", __func__); > if (a->full_flush || !cpu_has_clflush) > asm volatile("wbinvd" ::: "memory"); > else list_for_each_entry(pg, &a->l, lru) { > printk("%s: %10u 1a\n", __func__, xx++); > if (PageFlush(pg)) > clflush_cache_range(page_address(pg), PAGE_SIZE); > } > printk("%s: 2\n", __func__); > __flush_tlb_all(); > printk("%s: 3\n", __func__); > } > > It outputs 1a in the infinite loop with incrementing xx. But only in this > case, > some global_flush_tlb are OK. e.g.: > Sep 10 01:39:19 localhost kernel: agpgart: Detected an Intel G33 Chipset. > Sep 10 01:39:19 localhost kernel: global_flush_tlb: 1 > Sep 10 01:39:19 localhost kernel: flush_kernel_map: 1 > Sep 10 01:39:19 localhost kernel: flush_kernel_map: 0 1a > Sep 10 01:39:19 localhost kernel: flush_kernel_map: 1 1a > Sep 10 01:39:19 localhost kernel: flush_kernel_map: 2 > Sep 10 01:39:19 localhost kernel: flush_kernel_map: 3 > Sep 10 01:39:19 localhost kernel: flush_kernel_map: 1 > Sep 10 01:39:19 localhost kernel: flush_kernel_map: 0 1a > Sep 10 01:39:19 localhost kernel: flush_kernel_map: 1 1a > Sep 10 01:39:19 localhost kernel: flush_kernel_map: 2 > Sep 10 01:39:19 localhost kernel: flush_kernel_map: 3 > Sep 10 01:39:19 localhost kernel: global_flush_tlb: 2 > Sep 10 01:39:19 localhost kernel: global_flush_tlb: 3 > Sep 10 01:39:19 localhost kernel: agpgart: Detected 6140K stolen memory. > > It seems, that the list is broken only on X shutdown. How can be > deferred-pages > list inited in some bad manner, when list_replace_init is called on it? Weird.
Ok, here comes a BUG with trace: set status page addr 0x00033000 list_add corruption. next->prev should be prev (ffffffff8068ae70), but was ffffffff80697a50. (next=ffff81000117fbd0). ------------[ cut here ]------------ kernel BUG at /home/l/latest/xxx/lib/list_debug.c:27! invalid opcode: 0000 [1] SMP last sysfs file: /devices/pci0000:00/0000:00:02.0/enable CPU 0 Modules linked in: ipv6 floppy sr_mod rtc_cmos rtc_core cdrom ehci_hcd rtc_lib usbhid Pid: 1639, comm: X Not tainted 2.6.23-rc4-mm1_64 #23 RIP: 0010:[<ffffffff80332f49>] [<ffffffff80332f49>] __list_add+0x39/0x60 RSP: 0018:ffff81000547bd48 EFLAGS: 00010296 RAX: 0000000000000079 RBX: ffff81000117a380 RCX: ffffffff8068b450 RDX: ffff81000317d6a0 RSI: 0000000000000001 RDI: ffffffff8068b420 RBP: ffff81000547bd48 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000006da2 R13: ffff810006c10d10 R14: ffff810006da2000 R15: 8000000000000163 FS: 00007f7a05258710(0000) GS:ffffffff806d1000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000a33040 CR3: 0000000004a43000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process X (pid: 1639, threadinfo ffff81000547a000, task ffff81000317d6a0) Stack: ffff81000547bd58 ffffffff80332f7c ffff81000547bdc8 ffffffff80225c56 ffffffff80225cb5 8000000000000163 ffffffff806830e8 ffffffff806830a0 ffffffff806830a0 ffff810006da2000 ffff81000547bda8 0000000000006da2 Call Trace: [<ffffffff80332f7c>] list_add+0xc/0x10 [<ffffffff80225c56>] __change_page_attr+0x376/0x390 [<ffffffff80225cb5>] change_page_attr_addr+0x45/0x140 [<ffffffff80225d16>] change_page_attr_addr+0xa6/0x140 [<ffffffff80225de3>] change_page_attr+0x33/0x40 [<ffffffff80387b64>] agp_generic_destroy_page+0x44/0x70 [<ffffffff80388645>] agp_free_memory+0x65/0xd0 [<ffffffff80386d49>] agp_free_memory_wrap+0x39/0x60 [<ffffffff803873fb>] agp_release+0xdb/0x1c0 [<ffffffff80299551>] __fput+0xd1/0x1b0 [<ffffffff802996b6>] fput+0x16/0x20 [<ffffffff802964e6>] filp_close+0x56/0x90 [<ffffffff80297bed>] sys_close+0xad/0x110 [<ffffffff8020bd0e>] system_call+0x7e/0x83 INFO: lockdep is turned off. Code: 0f 0b eb fe 0f 1f 00 48 89 f1 4c 89 c2 48 89 c6 48 c7 c7 e8 RIP [<ffffffff80332f49>] __list_add+0x39/0x60 RSP <ffff81000547bd48> regards, -- Jiri Slaby ([EMAIL PROTECTED]) Faculty of Informatics, Masaryk University - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/