Re: freelist corruption: more info
I wrote:

> Trying to fix some make release problems, I've kept running into the
> same freelist corruption problems that kris and dougb experienced
> earlier this week.  Main difference is that I notice when the box
> (-CURRENT from 29 May, GENERIC kernel, UP) crashes.  :-p

At dougb's urging, I applied Tor's patch to ffs_softdep.c.  I *think*
the results were positive; my machine made it through a make release
apparently successfully.  Got the following entries in
/var/log/messages though:

May 30 20:10:10 bmah-freebsd-1 /boot/kernel/kernel: handle_written_filepage: active pagedep
May 31 02:18:30 bmah-freebsd-1 /boot/kernel/kernel: handle_written_filepage: active pagedep
May 31 02:18:30 bmah-freebsd-1 /boot/kernel/kernel: handle_written_filepage: active pagedep

I guess it should be obvious by now, but I have softupdates enabled for
all filesystems except for /.

Thanks,

Bruce.

PGP signature
freelist corruption: more info
Trying to fix some make release problems, I've kept running into the
same freelist corruption problems that kris and dougb experienced
earlier this week.  Main difference is that I notice when the box
(-CURRENT from 29 May, GENERIC kernel, UP) crashes.  :-p

Not being a -CURRENT guru, I haven't decided if I'm going to try Tor
Egge's patch or just slug it out to try to finish fixing make release
(which is my main goal at this point).

Just as an FYI, here's the tombstone and a stack trace in case it's
useful to anyone.

Cheers,

Bruce.

-8-8-

Data modified on freelist: word 2 of object 0xc1985a00 size 52 previous type pagedep (0xd6adc0de != 0xdeadc0de)

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xdeadc0e8
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0376ab8
stack pointer           = 0x10:0xcba7fb9c
frame pointer           = 0x10:0xcba7fb9c
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 17 (swi3: cambio)
kernel: type 12 trap, code=0
Stopped at      worklist_remove+0x1c:   cmpw    $0,0xa(%ecx)
db> trace
worklist_remove(deadc0de) at worklist_remove+0x1c
free_diradd(deadc0de) at free_diradd+0x26
free_newdirblk(c1396b70) at free_newdirblk+0x32
handle_written_inodeblock(c241a300,c64135d8) at handle_written_inodeblock+0x2b2
bufdone(c64135d8,cba7ff40,c0136a1b,c64135d8,c1394400) at bufdone+0x101
bufdonebio(c64135d8) at bufdonebio+0xe
dadone(c127f400,c1394400) at dadone+0x1fb
camisr(c048ccd4) at camisr+0x1c5
ithread_loop(c0e48980,cba7ffa8) at ithread_loop+0x2bf
fork_exit(c022c118,c0e48980,cba7ffa8) at fork_exit+0xb4
fork_trampoline() at fork_trampoline+0x8
db>
Re: freelist corruption
Peter Jeremy wrote:

> On 2001-May-27 20:36:54 -0700, Kris Kennaway [EMAIL PROTECTED] wrote:
> > I've been getting rather a lot of these tonight..any ideas?
> >
> > May 27 18:52:06 xor /boot/kernel/kernel: Data modified on freelist: word 2 of object 0xc1a60100 size 64 previous type pagedep (0xd6adc0de != 0xdeadc0de)
>
> If this isn't an ECC system

I got one of these on my ECC system:

May 25 01:16:20 kern.crit Master /boot/kernel/kernel: Data modified on freelist: word 2 of object 0xc1a58dc0 size 52 previous type vfscache (0xd6adc0de != 0xdeadc0de)

I'm using the following experimental patch to avoid system crashes and
the freelist corruption message.

The softupdate code seems to free pagedep structures with the NEWBLOCK
flag set (which indicates that a newdirblk structure is currently
pointing to the pagedep structure).  When the newdirblk structure is
freed later on, it clears the NEWBLOCK flag, changing 0xdeadc0de to
0xd6adc0de.  If the memory for the pagedep structure has been reused
for something else, the system might crash.  free_newdirblk will
typically be on the ddb stack backtrace.

- Tor Egge

Index: sys/ufs/ffs/ffs_softdep.c
===================================================================
RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_softdep.c,v
retrieving revision 1.97
diff -u -r1.97 ffs_softdep.c
--- sys/ufs/ffs/ffs_softdep.c	2001/05/19 19:24:26	1.97
+++ sys/ufs/ffs/ffs_softdep.c	2001/05/24 01:48:22
@@ -1932,6 +1932,11 @@
 				WORKLIST_INSERT(&inodedep->id_bufwait,
 				    &dirrem->dm_list);
 			}
+			if ((pagedep->pd_state & NEWBLOCK) != 0) {
+				FREE_LOCK(&lk);
+				panic("deallocate_dependencies: "
+				    "active pagedep");
+			}
 			WORKLIST_REMOVE(&pagedep->pd_list);
 			LIST_REMOVE(pagedep, pd_hash);
 			WORKITEM_FREE(pagedep, D_PAGEDEP);
@@ -3930,8 +3935,12 @@
 	 * is written back to disk.
 	 */
 	if (LIST_FIRST(&pagedep->pd_pendinghd) == 0) {
-		LIST_REMOVE(pagedep, pd_hash);
-		WORKITEM_FREE(pagedep, D_PAGEDEP);
+		if ((pagedep->pd_state & NEWBLOCK) != 0) {
+			printf("handle_written_filepage: active pagedep\n");
+		} else {
+			LIST_REMOVE(pagedep, pd_hash);
+			WORKITEM_FREE(pagedep, D_PAGEDEP);
+		}
 	}
 	return (0);
 }
Re: freelist corruption
On 2001-May-27 20:36:54 -0700, Kris Kennaway [EMAIL PROTECTED] wrote:
> I've been getting rather a lot of these tonight..any ideas?
>
> May 27 18:52:06 xor /boot/kernel/kernel: Data modified on freelist: word 2 of object 0xc1a60100 size 64 previous type pagedep (0xd6adc0de != 0xdeadc0de)

If this isn't an ECC system, it could be a flaky SIMM (or flaky cache).
There's a single-bit difference.  (Though I'd expect more obvious
problems if bit 27 was incorrectly reading as zero at a detectable
rate.)

Peter

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message
Re: freelist corruption
On Mon, May 28, 2001 at 01:48:33PM +1000, Peter Jeremy wrote:
> On 2001-May-27 20:36:54 -0700, Kris Kennaway [EMAIL PROTECTED] wrote:
> > I've been getting rather a lot of these tonight..any ideas?
> >
> > May 27 18:52:06 xor /boot/kernel/kernel: Data modified on freelist: word 2 of object 0xc1a60100 size 64 previous type pagedep (0xd6adc0de != 0xdeadc0de)
>
> If this isn't an ECC system, it could be a flaky SIMM (or flaky cache).
> There's a single bit difference.  (Though I'd expect more obvious
> problems if bit 27 was incorrectly reading as zero at a detectable
> rate).

Could be, but I'm not having other problems on this system which I'd
attribute to bad memory.

Kris
Re: freelist corruption
Peter Jeremy wrote:

> On 2001-May-27 20:36:54 -0700, Kris Kennaway [EMAIL PROTECTED] wrote:
> > I've been getting rather a lot of these tonight..any ideas?
> >
> > May 27 18:52:06 xor /boot/kernel/kernel: Data modified on freelist: word 2 of object 0xc1a60100 size 64 previous type pagedep (0xd6adc0de != 0xdeadc0de)
>
> If this isn't an ECC system

I got one of these on my ECC system:

May 25 01:16:20 kern.crit Master /boot/kernel/kernel: Data modified on freelist: word 2 of object 0xc1a58dc0 size 52 previous type vfscache (0xd6adc0de != 0xdeadc0de)