Re: freelist corruption: more info

2001-05-31 Thread Bruce A. Mah

I wrote:

> Trying to fix some make release problems, I've kept running into the
> same freelist corruption problems that kris and dougb experienced
> earlier this week.  Main difference is that I notice when the box
> (-CURRENT from 29 May, GENERIC kernel, UP) crashes.  :-p

At dougb's urging, I applied Tor's patch to ffs_softdep.c.  I *think* 
the results were positive; my machine made it through a make release 
apparently successfully.

Got the following entries in /var/log/messages though:

May 30 20:10:10 bmah-freebsd-1 /boot/kernel/kernel: handle_written_filepage: active pagedep
May 31 02:18:30 bmah-freebsd-1 /boot/kernel/kernel: handle_written_filepage: active pagedep
May 31 02:18:30 bmah-freebsd-1 /boot/kernel/kernel: handle_written_filepage: active pagedep

I guess it should be obvious by now but I have softupdates enabled for 
all filesystems except for /.

Thanks,

Bruce.







freelist corruption: more info

2001-05-30 Thread Bruce A. Mah

Trying to fix some make release problems, I've kept running into the
same freelist corruption problems that kris and dougb experienced
earlier this week.  Main difference is that I notice when the box
(-CURRENT from 29 May, GENERIC kernel, UP) crashes.  :-p

Not being a -CURRENT guru, I haven't decided if I'm going to try Tor
Egge's patch or just slug it out to try to finish fixing make release 
(which is my main goal at this point).

Just as an FYI, here's the tombstone and a stack trace in case it's
useful to anyone.

Cheers,

Bruce.

-8-8-

Data modified on freelist: word 2 of object 0xc1985a00 size 52 previous type pagedep (0xd6adc0de != 0xdeadc0de)


Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xdeadc0e8
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0376ab8
stack pointer           = 0x10:0xcba7fb9c
frame pointer           = 0x10:0xcba7fb9c
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 17 (swi3: cambio)
kernel: type 12 trap, code=0
Stopped at      worklist_remove+0x1c:   cmpw    $0,0xa(%ecx)
db> trace
worklist_remove(deadc0de) at worklist_remove+0x1c
free_diradd(deadc0de) at free_diradd+0x26
free_newdirblk(c1396b70) at free_newdirblk+0x32
handle_written_inodeblock(c241a300,c64135d8) at handle_written_inodeblock+0x2b2
bufdone(c64135d8,cba7ff40,c0136a1b,c64135d8,c1394400) at bufdone+0x101
bufdonebio(c64135d8) at bufdonebio+0xe
dadone(c127f400,c1394400) at dadone+0x1fb
camisr(c048ccd4) at camisr+0x1c5
ithread_loop(c0e48980,cba7ffa8) at ithread_loop+0x2bf
fork_exit(c022c118,c0e48980,cba7ffa8) at fork_exit+0xb4
fork_trampoline() at fork_trampoline+0x8
db>
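
The fault address in the tombstone lines up with the instruction ddb stopped on. The trace shows worklist_remove() being called with 0xdeadc0de, and the faulting instruction reads 0xa bytes past the pointer in %ecx. A minimal sketch of that arithmetic follows, assuming %ecx holds the function's argument (the dump does not print the register contents); the names FREED_POISON and effective_address are made up for illustration.

```c
#include <stdint.h>

/* Pointer value passed to worklist_remove() in the trace. It matches the
 * fill pattern that the "Data modified on freelist" message expects. */
#define FREED_POISON 0xdeadc0deu

/* Effective address of a displacement(%reg) operand on a 32-bit machine.
 * Assumption: the base register holds the poisoned pointer. */
uint32_t effective_address(uint32_t base, uint32_t displacement)
{
    return base + displacement;
}
```

Calling effective_address(FREED_POISON, 0xa) gives 0xdeadc0e8, the reported fault virtual address, which suggests the code followed a pointer read out of already-freed memory.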







Re: freelist corruption

2001-05-28 Thread Tor . Egge

> Peter Jeremy wrote:
> >
> > On 2001-May-27 20:36:54 -0700, Kris Kennaway [EMAIL PROTECTED] wrote:
> > > I've been getting rather a lot of these tonight..any ideas?
> > >
> > > May 27 18:52:06 xor /boot/kernel/kernel: Data modified on freelist: word 2 of object 0xc1a60100 size 64 previous type pagedep (0xd6adc0de != 0xdeadc0de)
> >
> > If this isn't an ECC system
>
> I got one of these on my ECC system:
>
> May 25 01:16:20 kern.crit Master /boot/kernel/kernel: Data modified on
> freelist: word 2 of object 0xc1a58dc0 size 52 previous type vfscache
> (0xd6adc0de != 0xdeadc0de)

I'm using the following experimental patch to avoid system crashes and
the freelist corruption message.  The softupdates code seems to free
pagedep structures with the NEWBLOCK flag set (which indicates that a
newdirblk structure is currently pointing to the pagedep structure).
When the newdirblk structure is freed later on, it clears the NEWBLOCK
flag, changing 0xdeadc0de to 0xd6adc0de.  If the memory for the
pagedep structure has been reused for something else, the system might
crash.  free_newdirblk will typically be on the ddb stack backtrace.

- Tor Egge



Index: sys/ufs/ffs/ffs_softdep.c
===================================================================
RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_softdep.c,v
retrieving revision 1.97
diff -u -r1.97 ffs_softdep.c
--- sys/ufs/ffs/ffs_softdep.c	2001/05/19 19:24:26	1.97
+++ sys/ufs/ffs/ffs_softdep.c	2001/05/24 01:48:22
@@ -1932,6 +1932,11 @@
 				WORKLIST_INSERT(&inodedep->id_bufwait,
 				    &dirrem->dm_list);
 			}
+			if ((pagedep->pd_state & NEWBLOCK) != 0) {
+				FREE_LOCK(&lk);
+				panic("deallocate_dependencies: "
+				      "active pagedep");
+			}
 			WORKLIST_REMOVE(&pagedep->pd_list);
 			LIST_REMOVE(pagedep, pd_hash);
 			WORKITEM_FREE(pagedep, D_PAGEDEP);
@@ -3930,8 +3935,12 @@
 	 * is written back to disk.
 	 */
 	if (LIST_FIRST(&pagedep->pd_pendinghd) == 0) {
-		LIST_REMOVE(pagedep, pd_hash);
-		WORKITEM_FREE(pagedep, D_PAGEDEP);
+		if ((pagedep->pd_state & NEWBLOCK) != 0) {
+			printf("handle_written_filepage: active pagedep\n");
+		} else {
+			LIST_REMOVE(pagedep, pd_hash);
+			WORKITEM_FREE(pagedep, D_PAGEDEP);
+		}
 	}
 	return (0);
 }
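
The mechanism Tor describes (a late flag clear through a stale pointer landing in memory that has already been freed and filled with 0xdeadc0de) can be imitated in user space. The following is only a rough sketch under stated assumptions: the object is modelled as a plain array of 32-bit words, STALE_FLAG is a hypothetical mask chosen to match the single differing bit in the log messages rather than taken from the kernel headers, and none of the function names exist in FreeBSD.

```c
#include <stddef.h>
#include <stdint.h>

#define POISON     0xdeadc0deu  /* fill pattern quoted in the log messages */
#define STALE_FLAG 0x08000000u  /* hypothetical mask: the bit that differs in word 2 */
#define NWORDS     13           /* 13 * 4 = 52 bytes, the size in Bruce's report */

struct fake_object {
    uint32_t words[NWORDS];
};

/* Stand-in for a debugging allocator's free path: overwrite the object
 * with a known pattern so later writes can be detected. */
void poison_on_free(struct fake_object *obj)
{
    for (size_t i = 0; i < NWORDS; i++)
        obj->words[i] = POISON;
}

/* Stand-in for the late flag clear through a stale pointer. */
void stale_clear_flag(struct fake_object *obj)
{
    obj->words[2] &= ~STALE_FLAG;
}

/* Stand-in for the allocator's check: index of the first word that no
 * longer holds the pattern, or -1 if the object is untouched. */
int first_modified_word(const struct fake_object *obj)
{
    for (size_t i = 0; i < NWORDS; i++)
        if (obj->words[i] != POISON)
            return (int)i;
    return -1;
}
```

After poison_on_free() followed by stale_clear_flag(), first_modified_word() reports word 2 and that word reads 0xd6adc0de, the same pair of values seen in the "Data modified on freelist" messages. The patch above avoids this by not freeing the pagedep while NEWBLOCK is still set.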



Re: freelist corruption

2001-05-27 Thread Peter Jeremy

On 2001-May-27 20:36:54 -0700, Kris Kennaway [EMAIL PROTECTED] wrote:
> I've been getting rather a lot of these tonight..any ideas?
>
> May 27 18:52:06 xor /boot/kernel/kernel: Data modified on freelist: word 2 of object 0xc1a60100 size 64 previous type pagedep (0xd6adc0de != 0xdeadc0de)

If this isn't an ECC system, it could be a flaky SIMM (or flaky
cache).  There's a single bit difference.  (Though I'd expect more
obvious problems if bit 27 was incorrectly reading as zero at a
detectable rate).
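
The single-bit observation is easy to check by XORing the two values from the log line. This is a small sketch, not anything from the kernel; the function name single_bit_difference is made up for illustration.

```c
#include <stdint.h>

/* Return the index of the only bit in which the two words differ,
 * or -1 if they differ in no bits or in more than one bit. */
int single_bit_difference(uint32_t expected, uint32_t found)
{
    uint32_t diff = expected ^ found;
    int bit = 0;

    /* A power of two has exactly one bit set. */
    if (diff == 0 || (diff & (diff - 1)) != 0)
        return -1;

    while ((diff >>= 1) != 0)
        bit++;
    return bit;
}
```

For the logged pair, single_bit_difference(0xdeadc0de, 0xd6adc0de) returns 27, matching the bit named above.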

Peter

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: freelist corruption

2001-05-27 Thread Kris Kennaway

On Mon, May 28, 2001 at 01:48:33PM +1000, Peter Jeremy wrote:
> On 2001-May-27 20:36:54 -0700, Kris Kennaway [EMAIL PROTECTED] wrote:
> > I've been getting rather a lot of these tonight..any ideas?
> >
> > May 27 18:52:06 xor /boot/kernel/kernel: Data modified on freelist: word 2 of object 0xc1a60100 size 64 previous type pagedep (0xd6adc0de != 0xdeadc0de)
>
> If this isn't an ECC system, it could be a flaky SIMM (or flaky
> cache).  There's a single bit difference.  (Though I'd expect more
> obvious problems if bit 27 was incorrectly reading as zero at a
> detectable rate).

Could be, but I'm not having other problems on this system which I'd
attribute to bad memory.

Kris



Re: freelist corruption

2001-05-27 Thread Doug Barton

Peter Jeremy wrote:
>
> On 2001-May-27 20:36:54 -0700, Kris Kennaway [EMAIL PROTECTED] wrote:
> > I've been getting rather a lot of these tonight..any ideas?
> >
> > May 27 18:52:06 xor /boot/kernel/kernel: Data modified on freelist: word 2 of object 0xc1a60100 size 64 previous type pagedep (0xd6adc0de != 0xdeadc0de)
>
> If this isn't an ECC system

I got one of these on my ECC system:

May 25 01:16:20 kern.crit Master /boot/kernel/kernel: Data modified on
freelist: word 2 of object 0xc1a58dc0 size 52 previous type vfscache
(0xd6adc0de != 0xdeadc0de)
