Hi Marcelo, Thank you for the quick response. I actually didn't anticipate that quick of a response.
> > I was wondering if someone could help me with this. I'm porting an > > ASPLOV paper miss ratio curve from 2.4.20 2.6.11.6 and eventually to > > Planet Labs kernel. It's a novel idea for memory management. In > > porting I at run time I'm consistently hitting kernel bugs at four > > different places bad_page, bad_range, in rmap.c > > BUG(page_mapcount(page)< 0), and failing at apm_do_idle. All of these > > functions except apm_do_idle seem to be new functions from 2.4.20 to > > 2.6.11.6. I'm pretty sure I'm forgetting to account for certain > > things when modifying the pages, but I'm not sure where. > > Having the information which bad_page etc. dump out would definately help. > > I can't figure out what is going on with the data you provide, probably > someone else can. Here are some data dumps that I have: for bad_page, bad_range, and rmap.c error 1: When this function has an error it traces back to bad_range and bad_page. ------------[ cut here ]------------ kernel BUG at mm/page_alloc.c:647! invalid operand: 0000 [#1] Modules linked in: CPU: 0 EIP: 0060:[<c0137af8>] Not tainted VLI EFLAGS: 00000202 (2.6.11-kgdb) EIP is at buffered_rmqueue+0x178/0x1d0 eax: 00000001 ebx: c033c164 ecx: fff99e0b edx: 00001000 esi: c033c164 edi: 00000246 ebp: c033c180 esp: c7db5df0 ds: 007b es: 007b ss: 0068 Process nbench (pid: 301, threadinfo=c7db4000 task=c1273020) Stack: c033c190 c7e3b000 c033c178 000080d2 00000000 c033c164 00000000 00000000 000080d2 c013800d 00000001 00000000 00000000 c11657e0 00000001 00000000 c1273020 00000010 c033c3cc 00000000 c03e0aa0 003ba025 c7dbffd0 c12f385c Call Trace: [<c013800d>] __alloc_pages+0x3fd/0x460 [<c0142e19>] do_anonymous_page+0x59/0x120 [<c0142f3e>] do_no_page+0x5e/0x280 [<c014338c>] handle_mm_fault+0x12c/0x1b0 [<c0113a76>] do_page_fault+0x176/0x5db [<c014479d>] vma_merge+0xbd/0x180 [<c0107cd1>] old_mmap+0xc1/0x100 [<c0113900>] do_page_fault+0x0/0x5db [<c0102f8b>] error_code+0x2b/0x30 Code: 00 00 00 8b 44 24 08 83 c4 14 5b 5e 5f 5d c3 8b 54 24 10 8b 44 24 08 e8 c7 f6 ff ff eb e5 0f 0b 5e 02 ed f4 2f c0 e9 7c ff ff ff <0f> 0b 87 02 ed f4 2f c0 e9 09 ff ff ff 9c 5f fa 8b 54 24 10 89 <1>Unable to handle kernel paging request at virtual address 00100104 printing eip: c0137927 *pde = 00000000 Oops: 0002 [#2] Modules linked in: CPU: 0 EIP: 0060:[<c0137927>] Not tainted VLI EFLAGS: 00000097 (2.6.11-kgdb) error 2: main error: For rmap.c: ------------[ cut here ]------------ kernel BUG at mm/rmap.c:487! invalid operand: 0000 [#1] Modules linked in: CPU: 0 EIP: 0060:[<c0147b5d>] Not tainted VLI EFLAGS: 00000292 (2.6.11-kgdb) EIP is at page_remove_rmap+0x3d/0x70 eax: ffffffe0 ebx: c10fc6a0 ecx: 00000000 edx: c7d9de74 esi: c7d78124 edi: 00010000 ebp: c10fc6a0 esp: c7d9de70 ds: 007b es: 007b ss: 0068 Process ls (pid: 302, threadinfo=c7d9c000 task=c1273020) Stack: c02ffb8f 00001000 c0141442 c0300460 c7ddcfa4 00000000 c12f116c 07e35025 00000000 08048000 c03bda38 08448000 c7dde084 08058000 c03bda38 c01415ab 00010000 00000000 c7ddeb7c b83e9000 08048000 c7dde084 08058000 c03bda38 Call Trace: [<c0141442>] zap_pte_range+0x162/0x280 [<c01415ab>] zap_pmd_range+0x4b/0x70 [<c014160d>] zap_pud_range+0x3d/0x60 [<c0141694>] unmap_page_range+0x64/0x80 [<c01417b9>] unmap_vmas+0x109/0x220 [<c0145e87>] exit_mmap+0x67/0x130 [<c0116fde>] mmput+0x1e/0x70 [<c011ab99>] do_exit+0x99/0x360 [<c011d2d3>] tasklet_action+0x43/0x70 [<c011d099>] __do_softirq+0x79/0x90 [<c011aecf>] do_group_exit+0x2f/0x70 [<c0102de3>] syscall_call+0x7/0xb Code: 08 ff 0f 98 c0 84 c0 74 3a 8b 43 08 40 78 26 8b 43 08 40 78 16 8b 5c 24 04 ba ff ff ff ff b8 10 00 00 00 83 c4 08 e9 43 08 ff ff <0f> 0b e7 01 7c fb 2f c0 eb e0 c7 04 24 8f fb 2f c0 e8 6d 14 fd <6>note: ls[302] exited with preempt_count 1 ------------[ cut here ]------------ kernel BUG at mm/rmap.c:487! invalid operand: 0000 [#1] Modules linked in: CPU: 0 EIP: 0060:[<c0147b3d>] Not tainted VLI EFLAGS: 00000296 (2.6.11-kgdb) EIP is at page_remove_rmap+0x3d/0x70 eax: ffffffa0 ebx: c10fb680 ecx: 00000000 edx: c7dc9e88 esi: c7d37fd0 edi: 00021000 ebp: c10fb680 esp: c7dc9e84 ds: 007b es: 007b ss: 0068 Process nbench (pid: 301, threadinfo=c7dc8000 task=c1273020) Stack: c02ffb6f 00016000 c0141436 00000246 00000100 00000000 c7dc9ef8 07db4067 00000000 b7fde000 c03bda38 b83de000 c7dcfb80 b7fff000 c03bda38 c014159b 00021000 00000000 c01205f0 00000100 b7fde000 c7dcfb80 b7fff000 c03bda38 Call Trace: [<c0141436>] zap_pte_range+0x156/0x270 [<c014159b>] zap_pmd_range+0x4b/0x70 [<c01205f0>] cascade+0x30/0x50 [<c01415fd>] zap_pud_range+0x3d/0x60 [<c0106bb9>] timer_interrupt+0x49/0xe0 [<c0141684>] unmap_page_range+0x64/0x80 [<c01417a9>] unmap_vmas+0x109/0x220 [<c01457da>] unmap_region+0x6a/0xd0 [<c0145a91>] do_munmap+0xf1/0x130 [<c0145b10>] sys_munmap+0x40/0x70 [<c0102de3>] syscall_call+0x7/0xb Code: 08 ff 0f 98 c0 84 c0 74 3a 8b 43 08 40 78 26 8b 43 08 40 78 16 8b 5c 24 04 ba ff ff ff ff b8 10 00 00 00 83 c4 08 e9 63 08 ff ff <0f> 0b e7 01 5c fb 2f c0 eb e0 c7 04 24 6f fb 2f c0 e8 8d 14 fd <6>note: nbench[301] exited with preempt_count 1 scheduling while atomic: nbench/0x00000001/301 [<c02e9038>] __switch_to_end+0x20c/0x214 [<c01046eb>] do_IRQ+0x3b/0x70 [<c0102f52>] common_interrupt+0x1a/0x20 [<c02e9acd>] rwsem_down_read_failed+0x8d/0x170 [<c011bff0>] .text.lock.exit+0x27/0x87 [<c011ab99>] do_exit+0x99/0x360 [<c0103678>] die+0x138/0x140 [<c01039d0>] do_invalid_op+0x0/0xc0 [<c0103a6f>] do_invalid_op+0x9f/0xc0 [<c0120ad6>] update_process_times+0x166/0x210 [<c0147b3d>] page_remove_rmap+0x3d/0x70 [<c0132a79>] handle_IRQ_event+0x29/0x60 [<c011d099>] __do_softirq+0x79/0x90 [<c01046eb>] do_IRQ+0x3b/0x70 [<c0102f52>] common_interrupt+0x1a/0x20 [<c0102f8b>] error_code+0x2b/0x30 [<c0147b3d>] page_remove_rmap+0x3d/0x70 [<c0141436>] zap_pte_range+0x156/0x270 [<c014159b>] zap_pmd_range+0x4b/0x70 [<c01205f0>] cascade+0x30/0x50 [<c01415fd>] zap_pud_range+0x3d/0x60 [<c0106bb9>] timer_interrupt+0x49/0xe0 [<c0141684>] unmap_page_range+0x64/0x80 [<c01417a9>] unmap_vmas+0x109/0x220 [<c01457da>] unmap_region+0x6a/0xd0 [<c0145a91>] do_munmap+0xf1/0x130 [<c0145b10>] sys_munmap+0x40/0x70 [<c0102de3>] syscall_call+0x7/0xb > Why dont you post the code (in case its GPL)... The patch is here. https://wiki.planet-lab.org/twiki/pub/Planetlab/UIUCProject/mrc2.6.11.6patch.diff.txt.txt It's for the 2.6.11.6 kernel. To replicate the error fully before compiling: please comment the mrc_reset_pgtables(p); in mm/mrc.c on line 433. If one comments out mrc_scan as well then there's no run time crashes at all. So all of the errors comes from mrc_scan and mrc_reset_pgtables, it's just I can't figure out how it's corrupting the pages, and how I should account for that in the code. Also, in make xconfig a menu should come up. Go to "Kernel hacking" then uncheck "Sleep-inside-spinlock checking" I usually use the virtual emulator qemu when debugging. Finally when running mrc. You want to start a program and get the process id then do the following: echo "<process id> 1 1 100 100" > /proc/sys/vm/mrc_info This will turn mrc on for the given process. All MRC code is surrounded by //MRC or something like it for easy recognition. I usually use qemu if you use qemu you may want to download this: http://fabrice.bellard.free.fr/qemu/linux-test-0.5.1.tar.gz http://fabrice.bellard.free.fr/qemu/qemu-0.7.0.tar.gz and when running qemu.sh in linux-test change the script to "qemu -nographic -hda linux.img -kernel /boot/vmlinuz-2.6.11-kgdb -append "console=ttyS0 root=/dev/hda sb=0x220,5,1,5 ide2=noprobe ide3=noprobe ide4=noprobe ide5=noprobe" I'd appreciate any direction you can provide. Thanks for your help in advance. I look forward to hearing from you. Zao - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/