Re: Help! (was: Crash while newfs'ing innocent vinum volume on fresh system.)
On Friday, 19 March 1999 at 23:30:44 -0600, Russell Neeper wrote:
> I decided to give vinum a try a few days ago and ran into the same
> problem as Vallo - after using it for a short period of time it caused
> a kernel panic due to a page fault.
>
> I spent some time with kgdb today and believe that I have found the bug.

This message came in literally two seconds after I sent a reply to
Vallo about the bug:

  Mar 20 16:00:54 allegro sendmail[7086]: QAA07084: to=, delay=00:00:07, xdelay=00:00:07, mailer=esmtp, relay=solaris.matti.ee. [194.126.98.135], stat=Sent (HAA07473 Message accepted for delivery)
  Mar 20 16:00:56 allegro sendmail[7087]: QAA07087: from=, size=5117, class=0, pri=35117, nrcpts=1, msgid=<19990319233044.a10...@net.tamu.edu>, bodytype=8BITMIME, proto=ESMTP, relay=newnet.tamu.edu [128.194.177.50]
  Mar 20 16:00:57 allegro sendmail[7088]: QAA07087: to=, delay=00:00:04, xdelay=00:00:01, mailer=local, stat=Sent

Anyway, wonderful! Exactly right. I'm very impressed that you found
it at effectively the same time as me. More comments further down.

> On Thu, Mar 18, 1999 at 01:31:23PM +1030, Greg Lehey wrote:
>> This is a problem I've seen before, but it completely baffles me. The
>> request passed to launch_requests (frame 10) has been deallocated.
>> Some of the debug code I put in caught it:
>>
>> (kgdb) p freeinfo[7]
>> $2 = {
>>   time = {
>>     tv_sec = 921669613,
>>     tv_usec = 289712
>>   },
>>   seq = 24,
>>   size = 36,
>>   line = 174,
>>   address = 0xf0a3cb00 "ÞÀÞh\235\"ðÞÀÞÀÈ£ðÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞ",
>>   file = "vinuminterrupt.c"
>> }
>>
>> This was called from freerq, which frees the complete request. freerq
>> is called from only four places: one on completion of the request
>> (which in this case is just about to be started), one if the request
>> is aborted (which also sets bp->b_error, which is not set here), once
>> in a read context (which is not applicable here: it's a write), and
>> once just before the call to launch_requests in frame 11:
>
> The best that I can tell, the problem is with the first call that you
> listed: "on completion of the request". The function 'complete_rqe' is
> called asynchronously by an interrupt at the completion of the I/O
> request.

Correct.

>> So where is this coming from? I'm completely baffled. It doesn't
>> happen to most people, though I have had reports of one or two other
>> cases. About the only clue is that the problem didn't occur when I
>> removed the debug memory allocator, but I don't know whether it went
>> away or into hiding. I'd really like to find out what's going on
>> here.
>
> I think that removing the debug memory allocator just made it go into
> hiding because it changed the timing of the code.

Possibly. It would work in the right direction.

> Freeing the request structure in the interrupt routine is causing a
> race condition in the function 'launch_requests'. Interrupts must
> be disabled around any and all code which refers to the request
> chain and this wasn't being done. I have created a patch that seems
> to fix the problem. However, there could be other places in the
> code that refer to the request chain without disabling interrupts.
> After looking at it for only a few hours, I'm not familiar enough
> with it to tell.

This is the only place.

> Here's the patch:
>
> diff -u vinum/vinumrequest.c vinum-mod/vinumrequest.c
> --- vinum/vinumrequest.c	Thu Mar 18 20:21:46 1999
> +++ vinum-mod/vinumrequest.c	Fri Mar 19 22:55:49 1999
> @@ -258,13 +258,8 @@
>  biodone(bp);
>  freerq(rq);
>  return -1;
> - } { /* XXX */
> - int result;
> - int s = splhigh();
> - result = launch_requests(rq, reviveok); /* now start the requests if we can */
> - splx(s);
> - return result;
>  }
> + return launch_requests(rq, reviveok); /* now start the requests if we can */
>  } else
>  /*
>  * This is a write operation. We write to all
> @@ -366,6 +361,7 @@
>  if (debug & DEBUG_LASTREQS)
>  logrq(loginfo_user_bpl, rq->bp, rq->bp);
>  #endif
> +s = splbio();
>  for (rqg = rq->rqg; rqg != NULL; rqg = rqg->next) {/* through the whole request chain */
>  rqg->active = rqg->count; /* they're all active */
>  rq->active++; /* one more active request group */
> @@ -396,13 +392,13 @@
>  logrq(loginfo_rqe, rqe, rq->bp);
>  #endif
>  /* fire off the request */
> - s = splbio();
>  (*bdevsw[major(rqe->b.b_dev)]->d_strategy) (&rqe->b);
> - splx(s);
>  }
>  /* XXX Do we need caching? Think about this more */
>  }
>  }
> +splx(s);
> +
>  return 0;
>  }
>
> I removed the
Re: Help! (was: Crash while newfs'ing innocent vinum volume on fresh system.)
I decided to give vinum a try a few days ago and ran into the same
problem as Vallo - after using it for a short period of time it caused
a kernel panic due to a page fault.

I spent some time with kgdb today and believe that I have found the bug.

On Thu, Mar 18, 1999 at 01:31:23PM +1030, Greg Lehey wrote:
> This is a problem I've seen before, but it completely baffles me. The
> request passed to launch_requests (frame 10) has been deallocated.
> Some of the debug code I put in caught it:
>
> (kgdb) p freeinfo[7]
> $2 = {
>   time = {
>     tv_sec = 921669613,
>     tv_usec = 289712
>   },
>   seq = 24,
>   size = 36,
>   line = 174,
>   address = 0xf0a3cb00 "ÞÀÞh\235\"ðÞÀÞÀÈ£ðÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞ",
>   file = "vinuminterrupt.c"
> }
>
> This was called from freerq, which frees the complete request. freerq
> is called from only four places: one on completion of the request
> (which in this case is just about to be started), one if the request
> is aborted (which also sets bp->b_error, which is not set here), once
> in a read context (which is not applicable here: it's a write), and
> once just before the call to launch_requests in frame 11:

The best that I can tell, the problem is with the first call that you
listed: "on completion of the request". The function 'complete_rqe' is
called asynchronously by an interrupt at the completion of the I/O
request.

> So where is this coming from? I'm completely baffled. It doesn't
> happen to most people, though I have had reports of one or two other
> cases. About the only clue is that the problem didn't occur when I
> removed the debug memory allocator, but I don't know whether it went
> away or into hiding. I'd really like to find out what's going on
> here.

I think that removing the debug memory allocator just made it go into
hiding because it changed the timing of the code.

Freeing the request structure in the interrupt routine is causing a
race condition in the function 'launch_requests'. Interrupts must
be disabled around any and all code which refers to the request
chain and this wasn't being done. I have created a patch that seems
to fix the problem. However, there could be other places in the
code that refer to the request chain without disabling interrupts.
After looking at it for only a few hours, I'm not familiar enough
with it to tell.

Hope this helps. Here's the patch:

diff -u vinum/vinumrequest.c vinum-mod/vinumrequest.c
--- vinum/vinumrequest.c	Thu Mar 18 20:21:46 1999
+++ vinum-mod/vinumrequest.c	Fri Mar 19 22:55:49 1999
@@ -258,13 +258,8 @@
 biodone(bp);
 freerq(rq);
 return -1;
- } { /* XXX */
- int result;
- int s = splhigh();
- result = launch_requests(rq, reviveok); /* now start the requests if we can */
- splx(s);
- return result;
 }
+ return launch_requests(rq, reviveok); /* now start the requests if we can */
 } else
 /*
 * This is a write operation. We write to all
@@ -366,6 +361,7 @@
 if (debug & DEBUG_LASTREQS)
 logrq(loginfo_user_bpl, rq->bp, rq->bp);
 #endif
+s = splbio();
 for (rqg = rq->rqg; rqg != NULL; rqg = rqg->next) {/* through the whole request chain */
 rqg->active = rqg->count; /* they're all active */
 rq->active++; /* one more active request group */
@@ -396,13 +392,13 @@
 logrq(loginfo_rqe, rqe, rq->bp);
 #endif
 /* fire off the request */
- s = splbio();
 (*bdevsw[major(rqe->b.b_dev)]->d_strategy) (&rqe->b);
- splx(s);
 }
 /* XXX Do we need caching? Think about this more */
 }
 }
+splx(s);
+
 return 0;
 }

I removed the splhigh/splx from around the first call of launch_requests
because, as far as I can tell, it became redundant after adding
splbio/splx around the for loop in the launch_requests function.

-- 
Russell Neeper                 Texas A&M University
russell-nee...@tamu.edu        Computing & Information Services
                               Network Group

To Unsubscribe: send mail to majord...@freebsd.org
with "unsubscribe freebsd-current" in the body of the message
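The race described in this message can be imitated in user space. What
follows is only a minimal sketch under stated assumptions, not the vinum
code: every name in it (model_request, model_splbio, model_splx,
model_launch) is hypothetical, and the "interrupt" is simulated by a flag
that holds completions back until the level is restored, which is merely
one way to mimic the effect of splbio()/splx() outside a kernel.

```c
#include <stdbool.h>

#define MAX_PENDING 16

/* Hypothetical stand-in for a vinum request; not the real structure. */
struct model_request {
    int active;   /* sub-requests launched but not yet completed */
    bool freed;   /* set where the real completion path would call freerq() */
};

static bool intr_blocked;                          /* models a raised spl */
static struct model_request *pending[MAX_PENDING]; /* completions held back */
static int npending;

/* Completion work: the last outstanding completion releases the request. */
static void complete_one(struct model_request *rq)
{
    if (--rq->active == 0)
        rq->freed = true;
}

/* Models the completion interrupt: it runs at once unless blocked. */
static void deliver_completion(struct model_request *rq)
{
    if (intr_blocked)
        pending[npending++] = rq;
    else
        complete_one(rq);
}

/* Block simulated interrupts and return the previous state. */
static bool model_splbio(void)
{
    bool old = intr_blocked;
    intr_blocked = true;
    return old;
}

/* Restore the previous state and run whatever was held back. */
static void model_splx(bool old)
{
    intr_blocked = old;
    if (intr_blocked)
        return;
    for (int i = 0; i < npending; i++)
        complete_one(pending[i]);
    npending = 0;
}

/*
 * Launch n sub-requests on a device so fast that each one completes
 * immediately.  Returns true if the loop touched the request after the
 * completion path had already released it (the use-after-free case).
 */
static bool model_launch(struct model_request *rq, int n, bool protect)
{
    bool touched_freed = false;
    bool s = false;

    if (n > MAX_PENDING)
        n = MAX_PENDING;          /* keep the pending queue in bounds */
    rq->active = 0;
    rq->freed = false;
    if (protect)
        s = model_splbio();
    for (int i = 0; i < n; i++) {
        if (rq->freed)
            touched_freed = true; /* the real code would walk freed memory */
        rq->active++;
        deliver_completion(rq);
    }
    if (protect)
        model_splx(s);
    return touched_freed;
}
```

In this model, launching two or more sub-requests without protection
lets the first completion drop the count to zero while the loop is still
walking the request. With the whole loop bracketed, the completions only
run after the loop has finished, which is the effect the patch aims for.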
IPSEC support?
Is there any IPSEC support available for current? I've found support
for 2.2.8, but not so far for current.

Steven P. Donegan                      email: done...@quick.net
Sr. Network Infrastructure Engineer    ICBM: N 33' 47.538/W 117' 59.687
WANG Global                            (within 1 meter - 133 ASL)
Re: deadlock in 3.1-RELEASE
Andrew Heybey writes:
| When the deadlock does occur, "ps" (in ddb) says that there are many
| processes in vmwait. The pagedaemon is in an inode wait. The stack
| trace is in default_halt() (which I assume just means that there are
| no runnable processes). The system is not short of memory (unless
| "short of memory" means that it is attempting to use it all as a disk
| cache).
|
| A search of cvs-committers for "vmwait deadlock" did not reveal (to my
| ignorant eye, anyway) any fixes to -current that would apply to this
| problem.

I have an environment that triggered this in less than 1/2 hour.
Julian, with the help of his friends (i.e. Matt & Alan), has brought in
some changes from -current that got rid of my problem. Getting the
latest RELENG_3 stuff should fix it. My processes got stuck on vmwait.

Doug A.
Re: How to add a new bootdevice to the new boot code ???
> In article <199903171103.naa13...@ceia.nordier.com> you wrote:
> > Søren Schmidt wrote:
> >
> >> OK, easy enough, this is what I want to do:
> >>
> >> Boot from an ata disk on major# 30, device name "ad", plain and simple.
> >
> > I'd be inclined to handle this outside the boot code, by treating the
> > passed in major# as describing the device rather than specifying
> > the driver.
>
> Why not have the boot blocks pass in a device 'name' rather than a
> major number. If the goal is to ditch major numbers entirely with
> a properly working DEVFS, then using major numbers in the new boot
> loader seems to be the wrong way to go. Until DEVFS is a reality,
> the kernel will still need to perform a name to major number translation,
> but it should be left up to the kernel.

Because there's no way to work out a name either. All the loader has
to go on is the BIOS unit number and the disklabel, the latter of which
can't be relied on to be up-to-date (ie. it reflects what the disk was
when it was laid out, not what some nominal kernel is going to call
it).

The *only* way for this to work is for the kernel to hunt for the root
device, possibly with some helping hints from the loader.

-- 
\\  Sometimes you're ahead,       \\  Mike Smith
\\  sometimes you're behind.      \\  m...@smith.net.au
\\  The race is long, and in the  \\  msm...@freebsd.org
\\  end it's only with yourself.  \\  msm...@cdrom.com
Re: How to add a new bootdevice to the new boot code ???
In article <199903171103.naa13...@ceia.nordier.com> you wrote:
> Søren Schmidt wrote:
>
>> OK, easy enough, this is what I want to do:
>>
>> Boot from an ata disk on major# 30, device name "ad", plain and simple.
>
> I'd be inclined to handle this outside the boot code, by treating the
> passed in major# as describing the device rather than specifying
> the driver.

Why not have the boot blocks pass in a device 'name' rather than a
major number. If the goal is to ditch major numbers entirely with
a properly working DEVFS, then using major numbers in the new boot
loader seems to be the wrong way to go. Until DEVFS is a reality,
the kernel will still need to perform a name to major number translation,
but it should be left up to the kernel.

-- 
Justin
deadlock in 3.1-RELEASE
[Meta question: should I send this kind of thing to -current or
-stable? I experienced it under 3.1, but I don't know if the people
who are working on the VM and/or IO systems read -stable?]

I can wedge my 3.1-RELEASE system under the following conditions:

    Two fxp fast ethernet interfaces, each receiving ~15k 512-byte
    pkts/sec.

    All of the above data (~15MB/sec) being written to a ccd partition
    striped across three disks.

    A couple of processes also trying to read the data from disk.

It takes anywhere from 30 minutes to several hours to occur. It has
happened with both an AIC7890 and an NCR 895 (Tekram 390U2W) disk
controller.

When the deadlock does occur, "ps" (in ddb) says that there are many
processes in vmwait. The pagedaemon is in an inode wait. The stack
trace is in default_halt() (which I assume just means that there are
no runnable processes). The system is not short of memory (unless
"short of memory" means that it is attempting to use it all as a disk
cache).

A search of cvs-committers for "vmwait deadlock" did not reveal (to my
ignorant eye, anyway) any fixes to -current that would apply to this
problem.

andrew
Re: How to add a new bootdevice to the new boot code ???
I have an LS-120 and I'd be happy to test the new boot code with it.

Bob

-- 
Bob Willcox          The man who follows the crowd will usually get no
b...@luke.pmr.com    further than the crowd. The man who walks alone is
Austin, TX           likely to find himself in places no one has ever
                     been. -- Alan Ashley-Pitt
Re: repeated ufs_dirbad() panics on 4.0-c
On Thu, Mar 18, 1999 at 06:36:09PM -0800, Matthew Dillon wrote:
# :On Thu, Mar 18, 1999 at 12:28:35PM +0300, Mikhail A. Sokolov wrote:
# :# Hello,
# :# panic: ffs_valloc: dup alloc
# :
# :And a brand new one (for today):
# :
# :IdlePTD 2682880
# :initial pcb at 21c7b8
# :panicstr: ffs_blkfree: freeing free frag
# :panic messages:
# :---
# :panic: ffs_blkfree: freeing free frag

# I'm running out of ideas. Ok, three more things:

Well, me too..

# First, when you updated your /usr/src/sys tree from cvs, did you also
# update /usr/src/contrib/sys? aka softupdates?

Yes, I'm running a cvsupd server myself and stuff ;)

# Second, Make sure you are using softlinks for the softupdates files in
# /usr/src/sys/ufs/ffs/, pointing to their actual location in contrib,
# rather than copies of the files.

Of course

# Third, Try turning off reallocblks:
# sysctl -w vfs.ffs.doreallocblks=0

That has been in use on ~90% of the machines since it was decided on
sometime in November 1998.

-- 
-mishania
Re: panics while reading Solaris CDROM
>I get very consistent panics when doing ``find . -type f |xargs grep
>foo'' on a Solaris CDROM in my Plextor 8x CDROM drive (device cd0). I'm
>not sure how to proceed in fixing this.

This should be fixed now. I got very consistent panics for
`cd /dosD/windows; find . | xargs cksum' on an msdosfs with a block
size of 2K :). cd9660 also has a block size of 2K, and getnewbuf()
returned corrupt buffers when it reused buffers that had b_data offset
2K into the space reserved for the buffer data.

Bruce
Re: panics while reading Solaris CDROM
On Fri, Mar 19, 1999 at 01:11:17AM -0800, David O'Brien wrote:
> I get very consistent panics when doing ``find . -type f |xargs grep
> foo'' on a Solaris CDROM in my Plextor 8x CDROM drive (device cd0). I'm
> not sure how to proceed in fixing this.

On another machine (CVSuped and make world on Thur evening), the same
command also crashes. The machine was so wedged, I couldn't drop into
DDB, nor scroll to the first panic messages, but this is the last one:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x8
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc012f531
stack pointer           = 0x10:0xc4ab966c
frame pointer           = 0x10:0xc4a69678
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL0, pres1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL=0
current process         = 328 (egrep)
interrupt mask          = net tty bio cam
trap number             = 12
panic: page fault
(da0:ahc0:0:0:0) Synchronize cache failed, status == 0xb, scsi status == 0x0
panics while reading Solaris CDROM
I get very consistent panics when doing ``find . -type f |xargs grep
foo'' on a Solaris CDROM in my Plextor 8x CDROM drive (device cd0). I'm
not sure how to proceed in fixing this. The core files from the panic
seem to be useless -- I can't get anything useful out of ``where'' with
a kernel w/debugging symbols:

    savecore: reboot after panic: vm_fault: fault on nofault entry, addr: c2923000
    savecore: writing core to /var/crash/vmcore.2
    savecore: writing kernel to /var/crash/kernel.2

    (kgdb) symbol-file ./kernel
    (kgdb) exec-file /var/crash/kernel.2
    (kgdb) core-file /var/crash/vmcore.2
    IdlePTD 7630437
    initial pcb at 2783e4
    panic messages:
    ---
    dmesg: kvm_read: invalid address (c026ed38)
    ---
    #0 0x2520656c in ?? ()
    (kgdb)

In DDB I get:

    panic: vm_fault on nofault entry, addr: c2b08000
    db> trace
    vm_fault(c029c854, c2b08000, 1, 0, c4f147e0) at vm_fault+0x120
    trap_pfault(c56bdb40, 0, c2b08000, 8000, c0a3500) at trap_pfault+0xf0
    ...
    ---trap 0xc, eip=0xc0140146, esp=0xc56bdb7c, ebp=0xc56bdd1c ---
    cd9660_lookup(c56bdd4c, c56de414, 16, c56bdf24, c56fc980) at cd9660_lookup+0x226
    vfs_cache_lookup(c56bdd9c, c56d5ac0, c56bdf00, c56bdf24, 0) at vfs_cache_lookup+0x269
    lookup(56bdf00, 0, c56bdf00, 9e9, c08fb580) at lookup+0x305
    namei(56bdf00, 0, c56bdf94, fffc, c0a3e600) at namei+0x1cd
    vn_open(c56bdf00, 1, 9e9, c4f147e0, c027bdfc) at vn_open+0x1cd
    open(c4f147e0, c56bdf94, 9fb, bfbf9a4b, 8050f0d) at open+0xbb
    syscall(2f, 2f, 8050f0d, bfbf9a4b, bfbec648) at syscall+0x19b
    Xint0x80_syscall() at Xint0x80_syscall+0x2c

My kernel is compiled with ``option DIAGNOSTIC''; my only KLDs are
linux.ko and daemon_saver.ko.
FreeBSD 4.0-CURRENT #40: Fri Mar 19 00:20:08 PST 1999
    ro...@dragon.nuxi.com:/FBSD/src/sys/compile/DRAGON
Timecounter "i8254" frequency 1193182 Hz
Timecounter "TSC" frequency 199432607 Hz
CPU: AMD-K6tm w/ multimedia extensions (199.43-MHz 586-class CPU)
  Origin = "AuthenticAMD" Id = 0x562 Stepping=2
  Features=0x8001bf
real memory = 100663296 (98304K bytes)
avail memory = 94785536 (92564K bytes)
Preloaded elf kernel "kernel" at 0xc0301000.
ccd0-1: Concatenated disk drivers
Probing for devices on PCI bus 0:
chip0: rev 0x03 on pci0.0.0
chip1: rev 0x01 on pci0.7.0
ide_pci0: rev 0x00 on pci0.7.1
fxp0: rev 0x02 int a irq 12 on pci0.17.0
fxp0: Ethernet address 00:a0:c9:8c:c4:49
vga0: rev 0x01 int a irq 9 on pci0.18.0
ahc0: rev 0x00 int a irq 11 on pci0.19.0
ahc0: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs
ahc1: rev 0x00 int a irq 15 on pci0.20.0
ahc1: aic7870 Single Channel A, SCSI Id=7, 16/255 SCBs
Probing for PnP devices:
CSN 1 Vendor ID: CTL00c7 [0xc7008c0e] Serial 0x101fd54b Comp ID: PNPb02f [0x2fb0d041]
pcm1 (SB16pnp sn 0x101fd54b) at 0x220-0x22f irq 5 drq 1 flags 0x15 on isa
Probing for devices on the ISA bus:
sc0 on isa
sc0: VGA color <16 virtual consoles, flags=0x0>
ed0 at 0x280-0x29f irq 10 maddr 0xd8000 msize 16384 on isa
ed0: address 00:00:c0:ee:e8:bf, type SMC8216/SMC8216C (16 bit)
atkbdc0 at 0x60-0x6f on motherboard
atkbd0 irq 1 on isa
sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
ppc0 at 0x378 irq 7 on isa
ppc0: SMC FDC37C665GT chipset (PS2/NIBBLE) in COMPATIBLE mode
plip0: on ppbus 0
lpt0: on ppbus 0
lpt0: Interrupt-driven port
ppi0: on ppbus 0
pca0 on motherboard
pca0: PC speaker audio driver
pcm0 not probed due to drq conflict with pcm1 at 1
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: NEC 72065B
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
fd1: 1.2MB 5.25in
wdc0 at 0x1f0-0x1f7 irq 14 flags 0xa0ffa0ff on isa
wdc0: unit 0 (wd0): , DMA, 32-bit, multi-block-16
wd0: 6180MB (12658275 sectors), 13395 cyls, 15 heads, 63 S/T, 512 B/S
wdc0: unit 1 (wd1): , DMA, 32-bit, multi-block-16
wd1: 8063MB (16514064 sectors), 16383 cyls, 16 heads, 63 S/T, 512 B/S
vga0 at 0x3b0-0x3df maddr 0xa msize 131072 on isa
npx0 on motherboard
npx0: INT 16 interface
IP packet filtering initialized, divert enabled, rule-based forwarding disabled, logging disabled
Waiting 2 seconds for SCSI devices to settle
da12 at ahc1 bus 0 target 2 lun 0
da12: Fixed Direct Access SCSI-2 device
da12: 10.000MB/s transfers (10.000MHz, offset 15), Tagged Queueing Enabled
da12: 2049MB (4197520 512 byte sectors: 255H 63S/T 261C)
da13 at ahc1 bus 0 target 3 lun 0
da13: Fixed Direct Access SCSI-2 device
da13: 10.000MB/s transfers (10.000MHz, offset 15), Tagged Queueing Enabled
da13: 2033MB (4165272 512 byte sectors: 255H 63S/T 259C)
da11 at ahc1 bus 0 target 1 lun 0
da11: Fixed Direct Access SCSI-2 device
da11: 10.000MB/s transfers (10.000MHz, offset 15), Tagged Queueing Enabled
da11: 2047MB (4194058 512 byte sectors: 255H 63S/T 261C)
da1 at ahc0 bus 0 target 1 lun 0
da1: Fixed Direct Access SCSI-2 device
da1: 20.000MB/s transfers (20.000MHz, offset 15), Tagged Queueing Enabled
da1: 1222MB (2503872 512 byte s