Re: Panic in 6.2-PRERELEASE with bge on amd64
On Sun, 7 Jan 2007, Sven Willenberger wrote:

> I am starting a new thread on this as what I had assumed was a panic in
> nfsd turns out to be an issue with the bge driver. This is an amd64 box,
> dual processor (SMP kernel) that happens to be running nfsd. About every
> 3-5 days the kernel panics and I have finally managed to get a core dump.
> The system: FreeBSD 6.2-PRERELEASE #8: Tue Jan 2 10:57:39 EST 2007

Like most NIC drivers, bge unlocks and re-locks around its call to
ether_input() in its interrupt handler. This isn't very safe, and it
certainly causes panics for bge. I often see it panic when bringing the
interface down and up while input is arriving, on a non-SMP non-amd64
(actually i386) non-6.x (actually -current) system. Bringing the interface
down is probably the worst case. It creates a null pointer for bge_intr()
to follow.

> The short and dirty of the dump:
> ...
> --- trap 0xc, rip = 0x801d5f17, rsp = 0xb371ab50, rbp = 0xb371aba0 ---
> bge_rxeof() at bge_rxeof+0x3b7

What is the instruction here?

> bge_intr() at bge_intr+0x1c8
> ithread_loop() at ithread_loop+0x14c
> fork_exit() at fork_exit+0xbb
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xb371ad00, rbp = 0 ---
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> fault virtual address = 0x28

Looks like a null pointer panic anyway. I guess the instruction is
movl to/from 0x28(%reg) where %reg is a null pointer.

> ...
> #8 0x801db818 in bge_intr (xsc=0x0) at /usr/src/sys/dev/bge/if_bge.c:2707

What is the statement here? It presumably follows a null pointer and only
the expression for the pointer is interesting. xsc is already null but
that is probably a bug in gdb, or the result of excessive optimization.
Compiling kernels with -O2 has little effect except to break debugging.

I rarely use gdb on kernels and haven't looked closely enough using ddb
to see where the null pointer for the panic on down/up came from.

BTW, the sbdrop panic in -current isn't bge-only or SMP-only. I saw it
once for sk on a non-SMP system. It rarely happens for non-SMP (much more
rarely than the panic in bge_intr()). Under -current, on an SMP amd64
system with bge, it happens almost every time on close of the socket for
a ttcp server if input is arriving at the time of the close. I haven't
seen it for 6.x.

Bruce
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
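The reasoning behind "looks like a null pointer panic" can be sketched as a small heuristic: a fault address far below the page size is most plausibly a structure member accessed through a NULL base pointer, with the fault address equal to the member offset. The snippet below is only an illustration; the function name and the 4096-byte page size are assumptions, not anything taken from the kernel sources.

```python
PAGE_SIZE = 4096  # assumed base page size; the first page is normally unmapped


def classify_fault(fault_addr: int) -> str:
    """Guess whether a faulting address looks like NULL plus a member offset."""
    if 0 <= fault_addr < PAGE_SIZE:
        return (
            "likely NULL pointer dereference, "
            f"member offset {fault_addr:#x} ({fault_addr} bytes)"
        )
    return "not an obvious NULL dereference"


# Fault address reported in the panic message quoted in this thread.
print(classify_fault(0x28))
```

This only narrows the search: it says some pointer was NULL and a field 40 bytes into the pointed-to structure was touched, which is why the faulting instruction and source statement are the next things worth looking at.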
Re: kernel panic on 6.2-RC2 with GENERIC.
Jan Mikkelsen wrote:
> (Scott: I should have emailed you this earlier, but Christmas and various
> other things got in the way.)
>
> Ian West wrote:
>> On Sun, Jan 07, 2007 at 02:25:02PM -0500, Mike Tancsa wrote:
>>> At 11:43 AM 1/7/2007, Craig Rodrigues wrote:
>>>> On Fri, Jan 05, 2007 at 06:59:10PM +0200, Nikolay Pavlov wrote:
>>>> [ Areca kernel panic, IO failures ... ]
>>
>> I have seen this identical fault with the new areca driver, my machine
>> is opteron hardware, but running a regular i386/SMP kernel/world. With
>> everything at 6.2RC2 (as of 29th of December) except the areca driver
>> the machine is rock solid, with the 29th of December version of the
>> areca driver the box will crash on extract of a large tar file, removal
>> of a large directory structure, or pretty much anything that does a lot
>> of disk io to different files/locations. There is no error log prior to
>> seeing the following messages..
>>
>> Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=433078272, length=8192)]error = 5
>> Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=433111040, length=16384)]error = 5
>> Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=433209344, length=16384)]error = 5
>> Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=433242112, length=32768)]error = 5
>> Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437612544, length=4096)]error = 5
>> Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437616640, length=12288)]error = 5
>> Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437633024, length=6144)]error = 5
>> Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437639168, length=2048)]error = 5
>> Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437641216, length=6144)]error = 5
>>
>> There are a string of these, followed by a crash and reboot. The file
>> system state can be left very dirty to the point where background fsck
>> seems unable to recover it.
>>
>> The areca card in question is running the latest firmware/boot and has
>> shown no problems either before, or since backing out the areca driver.
>>
>> The volume I ran the tests on was a 250G on a raid6 raid set.
>
> I have seen various problems with various Areca drivers. All on
> 6.2-RC1/amd64 with an Areca RAID-6 volume.
>
> Areca 1.20.00.02 seems to work fine.
>
> Areca 1.20.00.12 (from the Areca website) seems to have data corruption
> problems. My tests involve doing a "diff -r" on a filesystem with 2GB of
> data. It will occasionally find differences in files. On examination, the
> last 640 bytes of the first block of the affected file contain data from
> another file "nearby" in the filesystem. Unmounting and remounting the
> filesystems and rerunning the test shows no problem, or a difference in
> another file entirely. I think this is the cause of the g_vfs_done
> failures with this version of the driver; the offsets are wrong because
> the data is corrupted.
>
> Areca 1.20.00.13 (as currently in the tree) does not seem to have data
> corruption problems, but I can trigger g_vfs_done failures under heavy
> I/O.
>
> I have raised this with Areca support, and I'm waiting to hear back from
> Erich Chen.
>
> Regards,
>
> Jan Mikkelsen

I discussed this issue at length with the release engineering team today,
and we're going to go ahead with keeping the .013 version in 6.2 since it
has been working very reliably for a number of other testers, and
reverting it at this late stage of the release represents more risk. A
note about this issue will likely be put into the 6.2 errata document as
well. I plan to dig into this problem next week unless Areca fixes it
first. Please let me know if you hear anything from them.

Scott
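For anyone collecting these reports, the g_vfs_done lines are regular enough to summarize mechanically. The sketch below is a hypothetical helper (the regular expression and function name are my own, not part of any FreeBSD tool); it pulls out the device, operation, offset and length, and maps the numeric error to its errno name. Error 5 is EIO, a generic I/O error reported by the lower layers.

```python
import errno
import re

# Matches the kernel message format seen in the reports in this thread.
LOG_RE = re.compile(
    r"g_vfs_done\(\):(?P<dev>\w+)\[(?P<op>\w+)\(offset=(?P<offset>\d+), "
    r"length=(?P<length>\d+)\)\]error = (?P<err>\d+)"
)


def parse_g_vfs_done(lines):
    """Return one dict per g_vfs_done failure found in the given log lines."""
    records = []
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            records.append({
                "dev": m["dev"],
                "op": m["op"],
                "offset": int(m["offset"]),
                "length": int(m["length"]),
                "errno": errno.errorcode.get(int(m["err"]), "unknown"),
            })
    return records


# Two lines copied from the report quoted above.
SAMPLE = [
    "Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=433078272, length=8192)]error = 5",
    "Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437641216, length=6144)]error = 5",
]

records = parse_g_vfs_done(SAMPLE)
for r in records:
    print(r["dev"], r["op"], r["offset"], r["length"], r["errno"])
```

Feeding a whole /var/log/messages through this would show whether the failures cluster on one slice or one offset range, which may be useful detail to pass on to Areca.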
Re: Fatal trap 12: page fault while in kernel mode
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 I'm up and running on the patch now as well ... - --On Sunday, January 07, 2007 17:02:40 -0800 Kevin Oberman <[EMAIL PROTECTED]> wrote: >> Date: Sun, 7 Jan 2007 14:03:41 + (GMT) >> From: Robert Watson <[EMAIL PROTECTED]> >> Sender: [EMAIL PROTECTED] >> >> On Sat, 6 Jan 2007, Marc G. Fournier wrote: >> >> > Just had the following happen on a FreeBSD 6.2-PRERELEASE #7: Sun Dec 17> >> > 01:28:52 AST 2006 system ... amd64, HP Proliant, 6G of RAM ... have core > >> > if there is information that I can provide out of it ... >> > >> > Fatal trap 12: page fault while in kernel mode >> > cpuid = 0; apic id = 00 >> > fault virtual address = 0x18c >> > fault code = supervisor read, page not present >> > instruction pointer = 0x8:0x801f9053 >> > stack pointer = 0x10:0xb5c78b30 >> > frame pointer = 0x10:0xb5c78b60 >> > code segment= base 0x0, limit 0xf, type 0x1b >> >= DPL 0, pres 1, long 1, def32 0, gran 1 >> > processor eflags= resume, IOPL = 0 >> > current process = 5 (thread taskq) >> > trap number = 12 >> > panic: page fault >> > cpuid = 0 >> > Uptime: 8d22h25m40s >> > >> > (kgdb) where >> > # 0 doadump () at pcpu.h:172 >> > # 1 0x80203955 in boot (howto=260) at >> > /usr/src/sys/kern/kern_shutdown.c:409 >> > # 2 0x80204065 in panic (fmt=0xff019b667720 >> > "X\223f\233\001???\020?c\233\001???") at >> > /usr/src/sys/kern/kern_shutdown.c:565 >> > # 3 0x803287a6 in trap_fatal (frame=0xc, eva=1844674298110007> >> > # 4784) at >> > /usr/src/sys/amd64/amd64/trap.c:660 >> > # 4 0x80328cd8 in trap (frame= >> > {tf_rdi = 112, tf_rsi = -1092609476832, tf_rdx = 6, tf_rcx => >> > 3221225730, tf_r8 = -1245213424, tf_r9 = -1092609476832, tf_rax = 1, >> > tf_rbx = - -1096874331952, tf_rbp = -1245213856, tf_r10 = -2142258536, >> > tf_r11 > = 0, tf_r12 = 4, tf_r13 = -1092609476832, tf_r14 = 4, tf_r15 = 1, >> > tf_trapno > = 12, tf_addr = 396, tf_flags = -2145197496, tf_err = 0, >> > tf_rip = -2145415085, tf_c> s = 8, tf_rflags = 65538, tf_rsp = >> > 
-1245213888, tf_ss = 16}) at /usr/src/sys/amd64/amd64/trap.c:238
>> > # 5 0x80313c6b in calltrap () at
>> > /usr/src/sys/amd64/amd64/exception.S:168
>> > # 6 0x801f9053 in _mtx_lock_sleep (m=0xff009d31f0d0,
>> > tid=18446742981100074784, opts=6, file=0xc102 02 out of bounds>,
>> > line=-1245213424) at /usr/src/sys/kern/kern_mutex.c:546
>> > # 7 0x8025b1ac in unp_gc (arg=0x70, pending=-1687783648) at
>> > /usr/src/sys/kern/uipc_usrreq.c:1714
>> > # 8 0x8022c314 in taskqueue_run (queue=0xff844800) at
>> > /usr/src/sys/kern/subr_taskqueue.c:257
>> > # 9 0x8022d0e7 in taskqueue_thread_loop (arg=0x70) at
>> > /usr/src/sys/kern/subr_taskqueue.c:376
>> > # 10 0x801e7b76 in fork_exit (callout=0x8022d060 , arg=0x805030d0,
>> > frame=0xb5c78c50) at /usr/src/sys/kern/kern_fork.c:821
>> > # 11 0x80313fce in fork_trampoline () at
>> > /usr/src/sys/amd64/amd64/exception.S:394
>>
>> This is a NULL pointer dereference in the UNIX domain socket code. John
>> Baldwin recently committed a fix for a bug with these symptoms to
>> 7-CURRENT, with an MFC planned in the near future. The fix won't make
>> 6.2-RELEASE, but assuming it tests out well over the next few weeks, we
>> will cut an errata patch/announcement for it. I believe you can pull
>> down his 6-STABLE version at:
>>
>> http://people.FreeBSD.org/~jhb/patches/unp_gc.patch
>>
>> This same patch is currently in testing on mx1.FreeBSD.org.
>>
>> (John CC'd)
>>
>> Robert N M Watson
>> Computer Laboratory
>> University of Cambridge
>
> I have installed this on my system, but the panics have always been very
> erratic, so it may be a while before I am sure whether this fixes it. At
> the moment the system has been up for 7 days, although I have had
> multiple crashes in a single day.
> --
> R. Kevin Oberman, Network Engineer
> Energy Sciences Network (ESnet)
> Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
> E-mail: [EMAIL PROTECTED] Phone: +1 510 486-8634
> Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751

- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED]
Yahoo . yscrappy Skype: hub.org ICQ . 7615664
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (FreeBSD)

iD8DBQFFobPh4QvfyHIvDvMRAuGBAJ4vwJoVIRmbdHK6wqBxneuUzjekfACgr4Ys
2DSldX3rTRAHkng3UqKO+8U=
=FtuJ
-END PGP SIGNATURE-
Re: kernel panic on 6.2-RC2 with GENERIC.
Craig Rodrigues wrote:
> On Fri, Jan 05, 2007 at 06:59:10PM +0200, Nikolay Pavlov wrote:
>> Hello folks.
>> I have kernel panic on GENERIC kernel while executing postmark.
>
> The sequence of steps that Nikolay used to produce this panic was:
>
> - install benchmarks/postmark from ports
>
> root# postmark
> PostMark v1.5 : 3/27/01
> pm>set number=1
> pm>set transactions=1
> pm>set subdirectories=1
> pm>show
> pm>run

I was able to perform this test without issue on my latest test server
using an Adaptec 2130SLP using Raid1 and 2 Ultra320 Scsi-3 drives. Plain
install, GENERIC-SMP kernel:

su-2.05b# postmark
PostMark v1.5 : 3/27/01
pm>set number=1
pm>set transactions=1
pm>set subdirectories=1
pm>set location /tmp
pm>run
Creating subdirectories...Done
Creating files...Done
Performing transactions..Done
Deleting files...Done
Deleting subdirectories...Done
Time:
        102 seconds total
        69 seconds of transactions (144 per second)

Files:
        15027 created (147 per second)
                Creation alone: 1 files (588 per second)
                Mixed with transactions: 5027 files (72 per second)
        4990 read (72 per second)
        5009 appended (72 per second)
        15027 deleted (147 per second)
                Deletion alone: 10054 files (628 per second)
                Mixed with transactions: 4973 files (72 per second)

Data:
        27.14 megabytes read (272.46 kilobytes per second)
        85.08 megabytes written (854.14 kilobytes per second)
pm>quit
su-2.05b# uname -a
FreeBSD testserv1.aci 6.2-RC2 FreeBSD 6.2-RC2 #0: Sun Dec 24 23:42:30 UTC 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP i386

Cheers,
Jeff
Re: Fatal trap 12: page fault while in kernel mode
> Date: Sun, 7 Jan 2007 14:03:41 + (GMT) > From: Robert Watson <[EMAIL PROTECTED]> > Sender: [EMAIL PROTECTED] > > On Sat, 6 Jan 2007, Marc G. Fournier wrote: > > > Just had the following happen on a FreeBSD 6.2-PRERELEASE #7: Sun Dec 17> > > 01:28:52 AST 2006 system ... amd64, HP Proliant, 6G of RAM ... have core > > > if > > there is information that I can provide out of it ... > > > > Fatal trap 12: page fault while in kernel mode > > cpuid = 0; apic id = 00 > > fault virtual address = 0x18c > > fault code = supervisor read, page not present > > instruction pointer = 0x8:0x801f9053 > > stack pointer = 0x10:0xb5c78b30 > > frame pointer = 0x10:0xb5c78b60 > > code segment= base 0x0, limit 0xf, type 0x1b > >= DPL 0, pres 1, long 1, def32 0, gran 1 > > processor eflags= resume, IOPL = 0 > > current process = 5 (thread taskq) > > trap number = 12 > > panic: page fault > > cpuid = 0 > > Uptime: 8d22h25m40s > > > > (kgdb) where > > #0 doadump () at pcpu.h:172 > > #1 0x80203955 in boot (howto=260) at > > /usr/src/sys/kern/kern_shutdown.c:409 > > #2 0x80204065 in panic (fmt=0xff019b667720 > > "X\223f\233\001ÿÿÿ\020µc\233\001ÿÿÿ") at > > /usr/src/sys/kern/kern_shutdown.c:565 > > #3 0x803287a6 in trap_fatal (frame=0xc, eva=1844674298110007> > > 4784) at > > /usr/src/sys/amd64/amd64/trap.c:660 > > #4 0x80328cd8 in trap (frame= > > {tf_rdi = 112, tf_rsi = -1092609476832, tf_rdx = 6, tf_rcx => > > 3221225730, > > tf_r8 = -1245213424, tf_r9 = -1092609476832, tf_rax = 1, tf_rbx = > > - -1096874331952, tf_rbp = -1245213856, tf_r10 = -2142258536, tf_r11 > = 0, > > tf_r12 > > = 4, tf_r13 = -1092609476832, tf_r14 = 4, tf_r15 = 1, tf_trapno > = 12, > > tf_addr = > > 396, tf_flags = -2145197496, tf_err = 0, tf_rip = -2145415085, tf_c> s = 8, > > tf_rflags = 65538, tf_rsp = -1245213888, tf_ss = 16}) at > > /usr/src/sys/amd64/amd64/trap.c:238 > > #5 0x80313c6b in calltrap () at > > /usr/src/sys/amd64/amd64/exception.S:168 > > #6 0x801f9053 in _mtx_lock_sleep (m=0xff009d31f0d0, > > 
tid=18446742981100074784, opts=6, file=0xc102 02 out of bounds>,
> > line=-1245213424) at /usr/src/sys/kern/kern_mutex.c:546
> > #7 0x8025b1ac in unp_gc (arg=0x70, pending=-1687783648) at
> > /usr/src/sys/kern/uipc_usrreq.c:1714
> > #8 0x8022c314 in taskqueue_run (queue=0xff844800) at
> > /usr/src/sys/kern/subr_taskqueue.c:257
> > #9 0x8022d0e7 in taskqueue_thread_loop (arg=0x70) at
> > /usr/src/sys/kern/subr_taskqueue.c:376
> > #10 0x801e7b76 in fork_exit (callout=0x8022d060 , arg=0x805030d0,
> > frame=0xb5c78c50) at /usr/src/sys/kern/kern_fork.c:821
> > #11 0x80313fce in fork_trampoline () at
> > /usr/src/sys/amd64/amd64/exception.S:394
>
> This is a NULL pointer dereference in the UNIX domain socket code. John
> Baldwin recently committed a fix for a bug with these symptoms to
> 7-CURRENT, with an MFC planned in the near future. The fix won't make
> 6.2-RELEASE, but assuming it tests out well over the next few weeks, we
> will cut an errata patch/announcement for it. I believe you can pull down
> his 6-STABLE version at:
>
> http://people.FreeBSD.org/~jhb/patches/unp_gc.patch
>
> This same patch is currently in testing on mx1.FreeBSD.org.
>
> (John CC'd)
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge

I have installed this on my system, but the panics have always been very
erratic, so it may be a while before I am sure whether this fixes it. At
the moment the system has been up for 7 days, although I have had
multiple crashes in a single day.
--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED] Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751
Re: Intel PRO/1000 PT Desktop NIC, PCIe 1x supported in FreeBSD 6.2?
On 1/7/07, "Jack Vogel" <[EMAIL PROTECTED]> wrote:
> Yes, all released Intel PCI-E wired NICs are supported, sometimes not all
> features of the hardware are supported, but my job at Intel is to keep
> the driver working with new hardware, and to add support for features
> that don't have such now.

And my sincere thanks to both Intel and you for the efforts to make Intel
wired Ethernet devices work well with FreeBSD.

Now, if someone could just explain why Intel is so good about this area
of OSS support but still refuses to even discuss issues with OSS support
for wireless cards. Ah, well. Intel still does far better than many
hardware suppliers, in no small part due to Jack's efforts.
--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED] Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751
Re: Source MAC addresses when bridge(4) used
Peter Jeremy wrote: > I've just noticed an number of unpexected "IP address changed MAC" > messages on one of the hosts in my network. It is connected via a > FreeBSD bridge to the rest of my network (there aren't enuf network > ports in my son's bedroom). The configuration looks like: > > +-+ +-+ > | | | | > | laptop1 |-| desktop |--> Rest of network > | |dc0 tl0| |rl0 via dumb switch > +-+ +-+ > > The desktop network configuration is: > tl0: flags=8943 mtu 1500 > ether 00:00:24:28:98:9a > media: Ethernet autoselect (100baseTX ) > status: active > rl0: flags=8943 mtu 1500 > options=8 > inet 192.168.123.36 netmask 0xff00 broadcast 192.168.123.255 > ether 00:20:ed:78:9c:a3 > media: Ethernet autoselect (100baseTX ) > status: active > lo0: flags=8049 mtu 16384 > inet 127.0.0.1 netmask 0xff00 > bridge0: flags=8043 mtu 1500 > ether ca:a9:aa:1e:71:32 > priority 32768 hellotime 2 fwddelay 15 maxage 20 > member: tl0 flags=3 > member: rl0 flags=3 > > laptop1 is regularly reporting that 192.168.123.36 (the IP address of > the desktop) is switching between the two adapters in it: > Jan 6 07:27:09 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 6 08:09:45 laptop1 kernel: arp: 192.168.123.36 moved from > 00:20:ed:78:9c:a3 to 00:00:24:28:98:9a on dc0 > Jan 6 08:46:11 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 6 09:29:00 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 6 12:12:12 laptop1 kernel: arp: 192.168.123.36 moved from > 00:20:ed:78:9c:a3 to 00:00:24:28:98:9a on dc0 > Jan 6 12:15:31 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 6 13:06:42 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 6 16:48:45 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 6 17:32:22 
laptop1 kernel: arp: 192.168.123.36 moved from > 00:20:ed:78:9c:a3 to 00:00:24:28:98:9a on dc0 > Jan 6 17:33:33 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 6 17:53:45 laptop1 kernel: arp: 192.168.123.36 moved from > 00:20:ed:78:9c:a3 to 00:00:24:28:98:9a on dc0 > Jan 6 17:57:05 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 6 18:17:20 laptop1 kernel: arp: 192.168.123.36 moved from > 00:20:ed:78:9c:a3 to 00:00:24:28:98:9a on dc0 > Jan 6 18:24:48 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 6 18:45:08 laptop1 kernel: arp: 192.168.123.36 moved from > 00:20:ed:78:9c:a3 to 00:00:24:28:98:9a on dc0 > Jan 6 18:48:19 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 6 19:08:45 laptop1 kernel: arp: 192.168.123.36 moved from > 00:20:ed:78:9c:a3 to 00:00:24:28:98:9a on dc0 > Jan 6 19:11:50 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 6 19:32:15 laptop1 kernel: arp: 192.168.123.36 moved from > 00:20:ed:78:9c:a3 to 00:00:24:28:98:9a on dc0 > Jan 6 19:33:07 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 6 19:56:34 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 6 22:44:24 laptop1 kernel: arp: 192.168.123.36 moved from > 00:20:ed:78:9c:a3 to 00:00:24:28:98:9a on dc0 > Jan 6 23:04:26 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > > Even more unexpectedly, laptop1 is repeating the same "moved" message: > Jan 7 00:46:55 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 7 01:38:09 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 7 02:29:26 laptop1 kernel: arp: 
192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 7 03:20:39 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 7 04:28:59 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 7 05:18:50 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 7 06:28:31 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > Jan 7 07:16:05 laptop1 kernel: arp: 192.168.123.36 moved from > 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0 > > Both hosts are running 6.1-STABLE: > laptop1: FreeBSD lapt
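The pattern in the arp log (mostly alternating moves, then a run of identical ones) is easier to see when the messages are tallied by direction. This is only a rough sketch: the regular expression and the function name are assumptions of mine, and the sample lines are the first three entries of the log with the mail line wrapping undone.

```python
import re
from collections import Counter

# Matches the kernel's "arp: <ip> moved from <mac> to <mac> on <if>" message.
MOVED_RE = re.compile(
    r"arp: (?P<ip>\S+) moved from (?P<old>[0-9a-f:]{17}) "
    r"to (?P<new>[0-9a-f:]{17}) on (?P<ifname>\w+)"
)


def tally_moves(lines):
    """Count arp 'moved' messages per (old MAC, new MAC) direction."""
    counts = Counter()
    for line in lines:
        m = MOVED_RE.search(line)
        if m:
            counts[(m["old"], m["new"])] += 1
    return counts


SAMPLE = [
    "Jan 6 07:27:09 laptop1 kernel: arp: 192.168.123.36 moved from 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0",
    "Jan 6 08:09:45 laptop1 kernel: arp: 192.168.123.36 moved from 00:20:ed:78:9c:a3 to 00:00:24:28:98:9a on dc0",
    "Jan 6 08:46:11 laptop1 kernel: arp: 192.168.123.36 moved from 00:00:24:28:98:9a to 00:20:ed:78:9c:a3 on dc0",
]

counts = tally_moves(SAMPLE)
for (old, new), n in counts.items():
    print(f"{old} -> {new}: {n}")
```

A large imbalance between the two directions would match the "repeating the same moved message" observation, since a real flip-flop should produce roughly equal counts each way.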
RE: kernel panic on 6.2-RC2 with GENERIC.
(Scott: I should have emailed you this earlier, but Christmas and various other things got in the way.) Ian West wrote: > On Sun, Jan 07, 2007 at 02:25:02PM -0500, Mike Tancsa wrote: > > At 11:43 AM 1/7/2007, Craig Rodrigues wrote: > > >On Fri, Jan 05, 2007 at 06:59:10PM +0200, Nikolay Pavlov wrote: >>> [ Areca kernel panic, IO failures ... ] > I have seen this identical fault with the new areca driver, my machine > is opteron hardware, but running a regular i386/SMP kernel/world. With > everything at 6.2RC2 (as of 29th of December) except the areca driver > the machine is rock solid, with the 29th of december version of the > areca driver the box will crash on extract of a large tar > file, removal > of a large directory structure, or pretty much anything that > does a lot > of disk io to different files/locations. There is no error > log prior to > seeing the following messages.. > > Dec 29 14:26:44 aleph kernel: > g_vfs_done():da0s1g[WRITE(offset=433078272, length=8192)]error = 5 > Dec 29 14:26:44 aleph kernel: > g_vfs_done():da0s1g[WRITE(offset=433111040, length=16384)]error = 5 > Dec 29 14:26:44 aleph kernel: > g_vfs_done():da0s1g[WRITE(offset=433209344, length=16384)]error = 5 > Dec 29 14:26:44 aleph kernel: > g_vfs_done():da0s1g[WRITE(offset=433242112, length=32768)]error = 5 > Dec 29 14:26:44 aleph kernel: > g_vfs_done():da0s1g[WRITE(offset=437612544, length=4096)]error = 5 > Dec 29 14:26:44 aleph kernel: > g_vfs_done():da0s1g[WRITE(offset=437616640, length=12288)]error = 5 > Dec 29 14:26:44 aleph kernel: > g_vfs_done():da0s1g[WRITE(offset=437633024, length=6144)]error = 5 > Dec 29 14:26:44 aleph kernel: > g_vfs_done():da0s1g[WRITE(offset=437639168, length=2048)]error = 5 > Dec 29 14:26:44 aleph kernel: > g_vfs_done():da0s1g[WRITE(offset=437641216, length=6144)]error = 5 > > There are a string of these, followed by a crash and reboot. > The file system > state can be left very dirty to the point where background > fsck seems unable > to recover it. 
>
> The areca card in question is running the latest firmware/boot and
> has shown no problems either before, or since backing out the areca
> driver.
>
> The volume I ran the tests on was a 250G on a raid6 raid set.

I have seen various problems with various Areca drivers. All on
6.2-RC1/amd64 with an Areca RAID-6 volume.

Areca 1.20.00.02 seems to work fine.

Areca 1.20.00.12 (from the Areca website) seems to have data corruption
problems. My tests involve doing a "diff -r" on a filesystem with 2GB of
data. It will occasionally find differences in files. On examination, the
last 640 bytes of the first block of the affected file contain data from
another file "nearby" in the filesystem. Unmounting and remounting the
filesystems and rerunning the test shows no problem, or a difference in
another file entirely. I think this is the cause of the g_vfs_done
failures with this version of the driver; the offsets are wrong because
the data is corrupted.

Areca 1.20.00.13 (as currently in the tree) does not seem to have data
corruption problems, but I can trigger g_vfs_done failures under heavy
I/O.

I have raised this with Areca support, and I'm waiting to hear back from
Erich Chen.

Regards,

Jan Mikkelsen
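Once "diff -r" flags a file, locating the damaged byte range is a simple comparison against a known-good copy. The following is a sketch only: the function name is hypothetical, and the 16384-byte block size in the demonstration is an assumption chosen for illustration, since the message does not state the filesystem block size.

```python
def differing_range(a: bytes, b: bytes):
    """Return (first, last) byte offsets where a and b differ, or None."""
    limit = min(len(a), len(b))
    diffs = [i for i in range(limit) if a[i] != b[i]]
    if len(a) != len(b):
        # Any length mismatch counts as a difference in the tail.
        diffs.extend(range(limit, max(len(a), len(b))))
    return (diffs[0], diffs[-1]) if diffs else None


BLOCK = 16384  # assumed block size, for illustration only
good = bytes(BLOCK)
# Simulate the reported symptom: the last 640 bytes of the first block differ.
bad = good[:BLOCK - 640] + b"\xff" * 640

print(differing_range(good, bad))
```

Running this over each pair of files that differ would confirm whether the corruption always covers exactly the tail of the first block, which is the kind of detail that helps a driver author narrow things down.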
Panic in 6.2-PRERELEASE with bge on amd64
I am starting a new thread on this as what I had assumed was a panic in
nfsd turns out to be an issue with the bge driver. This is an amd64 box,
dual processor (SMP kernel) that happens to be running nfsd. About every
3-5 days the kernel panics and I have finally managed to get a core dump.

The system: FreeBSD 6.2-PRERELEASE #8: Tue Jan 2 10:57:39 EST 2007

The short and dirty of the dump:

# kgdb /usr/obj/usr/src/sys/MSPOOL/kernel.debug /var/crash/vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
lock order reversal: (sleepable after non-sleepable)
 1st 0x8836b010 bge0 (network driver) @ /usr/src/sys/dev/bge/if_bge.c:2675
 2nd 0x805f26b0 user map (user map) @ /usr/src/sys/vm/vm_map.c:3074
KDB: stack backtrace:
witness_checkorder() at witness_checkorder+0x4da
_sx_xlock() at _sx_xlock+0x51
vm_map_lookup() at vm_map_lookup+0x44
vm_fault() at vm_fault+0xba
trap_pfault() at trap_pfault+0x13c
trap() at trap+0x1f9
calltrap() at calltrap+0x5
--- trap 0xc, rip = 0x801d5f17, rsp = 0xb371ab50, rbp = 0xb371aba0 ---
bge_rxeof() at bge_rxeof+0x3b7
bge_intr() at bge_intr+0x1c8
ithread_loop() at ithread_loop+0x14c
fork_exit() at fork_exit+0xbb
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xb371ad00, rbp = 0 ---

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x28
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0x801d5f17
stack pointer           = 0x10:0xb371ab50
frame pointer           = 0x10:0xb371aba0
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 28 (irq24: bge0)
trap number             = 12
panic: page fault
cpuid = 1
Uptime: 3d4h18m42s

#0 doadump () at pcpu.h:172
172 pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0 doadump () at pcpu.h:172
#1 0x802771b9 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2 0x80276c4b in panic (fmt=0x8044160c "%s") at /usr/src/sys/kern/kern_shutdown.c:565
#3 0x803ebba6 in trap_fatal (frame=0xc, eva=18446742978291675136) at /usr/src/sys/amd64/amd64/trap.c:660
#4 0x803ebee3 in trap_pfault (frame=0xb371aaa0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:573
#5 0x803ec0f9 in trap (frame=
    {tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 499, tf_r8 = 2521427970,
    tf_r9 = -1099500152320, tf_rax = 0, tf_rbx = -1263948192,
    tf_rbp = -1284396128, tf_r10 = 0, tf_r11 = 0, tf_r12 = -2009681920,
    tf_r13 = 0, tf_r14 = 0, tf_r15 = -1099499984896, tf_trapno = 12,
    tf_addr = 40, tf_flags = -1263948192, tf_err = 2, tf_rip = -2145558761,
    tf_cs = 8, tf_rflags = 66071, tf_rsp = -1284396192, tf_ss = 16})
    at /usr/src/sys/amd64/amd64/trap.c:352
#6 0x803d779b in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168
#7 0x801d5f17 in bge_rxeof (sc=0x8836b000) at /usr/src/sys/dev/bge/if_bge.c:2528
#8 0x801db818 in bge_intr (xsc=0x0) at /usr/src/sys/dev/bge/if_bge.c:2707
#9 0x8025f2bc in ithread_loop (arg=0xffb1b320) at /usr/src/sys/kern/kern_intr.c:682
#10 0x8025e00b in fork_exit (callout=0x8025f170 , arg=0xffb1b320, frame=0xb371ac50) at /usr/src/sys/kern/kern_fork.c:821
#11 0x803d7afe in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:394

If more information is needed (disassemble, etc) please let me know. In
the interim I may switch to either using the base100 ethernet port (fxp)
or turn off SMP.

Sven
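kgdb prints the trapframe fields as signed decimals, which makes it hard to line them up with the hex values in the panic message. The sketch below (helper names are hypothetical) converts such a value to unsigned 64-bit hex and decodes the low three bits of the x86 page-fault error code: bit 0 is set for a protection violation and clear for page not present, bit 1 is set for a write, bit 2 is set for user mode.

```python
def to_u64_hex(value: int) -> str:
    """Render a signed 64-bit register value as unsigned hexadecimal."""
    return f"{value & 0xFFFFFFFFFFFFFFFF:#018x}"


def decode_pf_error(err: int) -> str:
    """Decode the low three bits of an x86 page-fault error code."""
    mode = "user" if err & 0x4 else "supervisor"
    access = "write" if err & 0x2 else "read"
    cause = "protection violation" if err & 0x1 else "page not present"
    return f"{mode} {access}, {cause}"


# Values copied from the trapframe in frame #5 of the backtrace.
print("tf_rip  =", to_u64_hex(-2145558761))
print("tf_addr =", to_u64_hex(40))
print("tf_err  =", decode_pf_error(2))
```

The low 32 bits of the converted tf_rip match the instruction pointer 0x801d5f17 in the panic message, tf_addr is the 0x28 fault address, and tf_err = 2 decodes to the same "supervisor write, page not present" that the kernel printed.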
Re: kernel panic on 6.2-RC2 with GENERIC.
On Sun, Jan 07, 2007 at 02:25:02PM -0500, Mike Tancsa wrote: > At 11:43 AM 1/7/2007, Craig Rodrigues wrote: > >On Fri, Jan 05, 2007 at 06:59:10PM +0200, Nikolay Pavlov wrote: > >> Hello folks. > >> I have kernel panic on GENERIC kernel while executing postmark. > > > >The sequence of steps that Nikolay used to produce this panic was: > > > >- install benchmarks/postmark from ports > > > >root# postmark > >PostMark v1.5 : 3/27/01 > >pm>set number=1 > >pm>set transactions=1 > >pm>set subdirectories=1 > >pm>show > >pm>run > > I am able to do this on an AMD64 on a AREAC RAID6 file system and on > a plain old ata drive on i386 without issue. > > the i386 is a few weeks old but I will cvsup and re-try to confirm on > both today > > [tyan-1u]# postmark > PostMark v1.5 : 3/27/01 > pm>set number=1 > pm>set transactions=1 > pm>set subdirectories=1 > pm>set location /tmp > pm>run > Creating subdirectories...Done > Creating files...Done > Performing transactions..Done > Deleting files...Done > Deleting subdirectories...Done > Time: > 481 seconds total > 233 seconds of transactions (42 per second) > > Files: > 15027 created (31 per second) > Creation alone: 1 files (62 per second) > Mixed with transactions: 5027 files (21 per second) > 4990 read (21 per second) > 5009 appended (21 per second) > 15027 deleted (31 per second) > Deletion alone: 10054 files (115 per second) > Mixed with transactions: 4973 files (21 per second) > > Data: > 27.14 megabytes read (57.78 kilobytes per second) > 85.08 megabytes written (181.13 kilobytes per second) > pm>quit > [tyan-1u]# uname -a > FreeBSD tyan-1u.sentex.ca 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #1: > Mon Dec 11 17:45:45 EST > 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/tyan i386 > [tyan-1u]# > > > and amd64 > > pm>show > Current configuration is: > The base number of files is 1 > Transactions: 1 > Files range between 500 bytes and 9.77 kilobytes in size > Working directory: > /mnt (weight=1) > 1 subdirectories will be used > Block sizes 
are: read=512 bytes, write=512 bytes > Biases are: read/append=5, create/delete=5 > Using Unix buffered file I/O > Random number generator seed is 42 > Report format is verbose. > pm>run > Creating subdirectories...Done > Creating files...Done > Performing transactions..Done > Deleting files...Done > Deleting subdirectories...Done > Time: > 310 seconds total > 155 seconds of transactions (64 per second) > > Files: > 15027 created (48 per second) > Creation alone: 1 files (103 per second) > Mixed with transactions: 5027 files (32 per second) > 4990 read (32 per second) > 5009 appended (32 per second) > 15027 deleted (48 per second) > Deletion alone: 10054 files (173 per second) > Mixed with transactions: 4973 files (32 per second) > > Data: > 27.14 megabytes read (89.65 kilobytes per second) > 85.08 megabytes written (281.04 kilobytes per second) > pm>quit > [r2-releng6-64]# uname -a > FreeBSD r2-releng6-64.sentex.ca 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE > #0: Thu Dec 28 23:13:18 EST > 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/router amd64 > [r2-releng6-64]# > > > both file systems have normal newfs options and fairly standard > kernels with default /etc/make.conf and both are SMP I have seen this identical fault with the new areca driver, my machine is opteron hardware, but running a regular i386/SMP kernel/world. With everything at 6.2RC2 (as of 29th of December) except the areca driver the machine is rock solid, with the 29th of december version of the areca driver the box will crash on extract of a large tar file, removal of a large directory structure, or pretty much anything that does a lot of disk io to different files/locations. There is no error log prior to seeing the following messages.. 
Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=433078272, length=8192)]error = 5
Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=433111040, length=16384)]error = 5
Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=433209344, length=16384)]error = 5
Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=433242112, length=32768)]error = 5
Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437612544, length=4096)]error = 5
Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437616640, length=12288)]error = 5
Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437633024, length=6144)]error = 5
Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437639168, length=2048)]error = 5
Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437641216, length=6144)]error = 5

There are a string of these,
Re: Intel PRO/1000 PT Desktop NIC, PCIe 1x supported in FreeBSD 6.2?
On 1/7/07, Erik Trulsson <[EMAIL PROTECTED]> wrote:
> On Sun, Jan 07, 2007 at 04:56:26PM +0100, O. Hartmann wrote:
> > Hello,
> > the company I'm working is about to purchae some additional NICs for
> > some replacement built-in NICs of nForce 405-based desktop PCs. I would
> > like to purchase the above mentioned NICs from Intel, hoping the em()
> > driver is capable of handling the NICs.
>
> I have one of those cards (Intel PRO/1000 PT Desktop NIC) myself, and it
> works just fine under 6-STABLE. So far I have not had any problems at all
> with it.

Yes, all released Intel PCI-E wired NICs are supported. Sometimes not all features of the hardware are supported, but my job at Intel is to keep the driver working with new hardware and to add support for features that do not have it yet.

Happy New Year,

Jack
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: kernel panic on 6.2-RC2 with GENERIC.
At 11:43 AM 1/7/2007, Craig Rodrigues wrote: On Fri, Jan 05, 2007 at 06:59:10PM +0200, Nikolay Pavlov wrote: > Hello folks. > I have kernel panic on GENERIC kernel while executing postmark. The sequence of steps that Nikolay used to produce this panic was: - install benchmarks/postmark from ports root# postmark PostMark v1.5 : 3/27/01 pm>set number=1 pm>set transactions=1 pm>set subdirectories=1 pm>show pm>run I am able to do this on an AMD64 on a AREAC RAID6 file system and on a plain old ata drive on i386 without issue. the i386 is a few weeks old but I will cvsup and re-try to confirm on both today [tyan-1u]# postmark PostMark v1.5 : 3/27/01 pm>set number=1 pm>set transactions=1 pm>set subdirectories=1 pm>set location /tmp pm>run Creating subdirectories...Done Creating files...Done Performing transactions..Done Deleting files...Done Deleting subdirectories...Done Time: 481 seconds total 233 seconds of transactions (42 per second) Files: 15027 created (31 per second) Creation alone: 1 files (62 per second) Mixed with transactions: 5027 files (21 per second) 4990 read (21 per second) 5009 appended (21 per second) 15027 deleted (31 per second) Deletion alone: 10054 files (115 per second) Mixed with transactions: 4973 files (21 per second) Data: 27.14 megabytes read (57.78 kilobytes per second) 85.08 megabytes written (181.13 kilobytes per second) pm>quit [tyan-1u]# uname -a FreeBSD tyan-1u.sentex.ca 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #1: Mon Dec 11 17:45:45 EST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/tyan i386 [tyan-1u]# and amd64 pm>show Current configuration is: The base number of files is 1 Transactions: 1 Files range between 500 bytes and 9.77 kilobytes in size Working directory: /mnt (weight=1) 1 subdirectories will be used Block sizes are: read=512 bytes, write=512 bytes Biases are: read/append=5, create/delete=5 Using Unix buffered file I/O Random number generator seed is 42 Report format is verbose. 
pm>run Creating subdirectories...Done Creating files...Done Performing transactions..Done Deleting files...Done Deleting subdirectories...Done Time: 310 seconds total 155 seconds of transactions (64 per second) Files: 15027 created (48 per second) Creation alone: 1 files (103 per second) Mixed with transactions: 5027 files (32 per second) 4990 read (32 per second) 5009 appended (32 per second) 15027 deleted (48 per second) Deletion alone: 10054 files (173 per second) Mixed with transactions: 4973 files (32 per second) Data: 27.14 megabytes read (89.65 kilobytes per second) 85.08 megabytes written (281.04 kilobytes per second) pm>quit [r2-releng6-64]# uname -a FreeBSD r2-releng6-64.sentex.ca 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #0: Thu Dec 28 23:13:18 EST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/router amd64 [r2-releng6-64]# both file systems have normal newfs options and fairly standard kernels with default /etc/make.conf and both are SMP ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: (audit?) Panic in 6.2-PRERELEASE
On Sun, Jan 07, 2007 at 06:05:39PM +, Robert Watson wrote: > > On Sun, 7 Jan 2007, Ceri Davies wrote: > > >>Could you try printing *td->td_ar? Maybe this will give us a clue as to > >>how far it got. In particular, this may be able to more reliably give us > >>the file descriptor number, which is audited early in the system call. > >>You might find that 'td' is corrupted in many layers of the stack, keep > >>going up until you find one where it's good. It may well be that > >>td->td_ar->k_ar.ar_arg_fd is correct, and might confirm that uap->fd is > >>correct still. We'd like also to know if ARG_SOCKINFO, ARG_VNODE1, or > >>ARG_VNODE2 is set in the k_ar.ar_valid_arg field. This may tell us some > >>more about the file descriptor even though it appears to have vanished. > > > >*td->td_ar is null (0x0) in both cases... > > I'm actually beginning to wonder if this is actually audit-related at all. > Something is clearly not right, and the audit code should not actually have > been entered at all there. Perhaps we're being mislead by the stack trace > corruption into thinking audit is involved. I've wondered the same. > >>I'm quite worried by the fact that the file descriptor seems not to be > >>present any more -- this suggests a file descriptor related race of the > >>sort that is both quite difficult to figure out and also quite a risk. > >>It's strange that it would only trigger with audit, however--perhaps > >>audit stretches out the race. Is this an SMP box? > > > >It's certainly looking quite nasty. This system is UP hardware without > >options SMP. > > > >... > > > >If it's at all useful, I can provide access to this system and the dumps. > > Yeah, I think at this point that would probably be the most helpful thing. OK, you should be able to log in as [EMAIL PROTECTED] with your freefall key. Details in ~rwatson/README once you're logged in. > Could you confirm that the kernel.debug you're using definitely matches the > version of the kernel in the core dump? 
Yes, definitely.

Thanks again,

Ceri
--
That must be wonderful! I don't understand it at all.
  -- Moliere
Re: make buildworld is always breaking at various points
Hello,

> I keep having troubles compiling either 6.1-RELEASE and 6.2-RC2.
> I downloaded sources, extracted them with install.sh and did a cvsup.
> [..]
> # make buildworld
> it breaks with these last lines:
>
> [.. lines suppressed ..]
> building shared library libpmc.so.3
> ===> lib/libpthread (all)
> [.. lines suppressed ..]
> cc -O2 -fno-strict-aliasing -pipe -march=pentium4 -DPTHREAD_KERNEL
> -I/usr/src/lib/libpthread/../libc/include
> -I/usr/src/lib/libpthread/thread
> -I/usr/src/lib/libpthread/../../include
> -I/usr/src/lib/libpthread/arch/i386/include
> -I/usr/src/lib/libpthread/sys
> -I/usr/src/lib/libpthread/../../libexec/rtld-elf
> -I/usr/src/lib/libpthread/../../libexec/rtld-elf/i386 -fno-builtin
> -D_LOCK_DEBUG -D_PTHREADS_INVARIANTS -Wall
> -I/usr/src/lib/libpthread/../libc/i386 -c
> /usr/src/lib/libpthread/thread/thr_condattr_pshared.c
> make: don't know how to make
> /usr/src.lib/libpthread/arch/i386/include/pthread_md.h. Stop
> *** Error code 2

When make tries to rebuild source files, this is often an indication of mis-set system clocks. Check the date/time settings on your machine.

Wolfgang
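One way to act on the clock suggestion is to look for source files whose modification time lies in the future, which is a typical symptom of a clock that was wrong when the tree was extracted or updated. The following is only a sketch: the function name and the 60-second slack are assumptions made for illustration, not part of any FreeBSD tool.

```python
import os
import time


def files_from_the_future(root, now=None, slack=60.0):
    """Return sorted paths under root whose mtime is later than now + slack seconds."""
    now = time.time() if now is None else now
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
            if mtime > now + slack:
                found.append(path)
    return sorted(found)
```

Calling `files_from_the_future("/usr/src")` and getting a non-empty list would suggest checking the system date before the next buildworld. An empty list does not rule out other causes.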
Re: make buildworld is always breaking at various points
On Sun, 2007-Jan-07 17:44:24 +0100, Christoph Illnar wrote:
>I keep having troubles compiling either 6.1-RELEASE and 6.2-RC2.
>I downloaded sources, extracted them with install.sh and did a cvsup.
>
>My installed system is 6.1-RELEASE and I keep trying to compile it on my
>own.

Is the failure consistent? I suspect you may have bad RAM.

>===> lib/libpthread (all)
>
>[.. lines suppressed ..]
>
>cc -O2 -fno-strict-aliasing -pipe -march=pentium4 -DPTHREAD_KERNEL
>-I/usr/src/lib/libpthread/../libc/include
>-I/usr/src/lib/libpthread/thread
>-I/usr/src/lib/libpthread/../../include
>-I/usr/src/lib/libpthread/arch/i386/include
>-I/usr/src/lib/libpthread/sys
>-I/usr/src/lib/libpthread/../../libexec/rtld-elf
>-I/usr/src/lib/libpthread/../../libexec/rtld-elf/i386 -fno-builtin
>-D_LOCK_DEBUG -D_PTHREADS_INVARIANTS -Wall
>-I/usr/src/lib/libpthread/../libc/i386 -c
>/usr/src/lib/libpthread/thread/thr_condattr_init.c
>
>cc -O2 -fno-strict-aliasing -pipe -march=pentium4 -DPTHREAD_KERNEL
>-I/usr/src/lib/libpthread/../libc/include
>-I/usr/src/lib/libpthread/thread
>-I/usr/src/lib/libpthread/../../include
>-I/usr/src/lib/libpthread/arch/i386/include
>-I/usr/src/lib/libpthread/sys
>-I/usr/src/lib/libpthread/../../libexec/rtld-elf
>-I/usr/src/lib/libpthread/../../libexec/rtld-elf/i386 -fno-builtin
>-D_LOCK_DEBUG -D_PTHREADS_INVARIANTS -Wall
>-I/usr/src/lib/libpthread/../libc/i386 -c
>/usr/src/lib/libpthread/thread/thr_condattr_pshared.c
>
>make: don't know how to make
>/usr/src.lib/libpthread/arch/i386/include/pthread_md.h. Stop
>*** Error code 2

Note that:
1) "/usr/src.lib/" does not normally exist;
2) "." and "/" differ by 1 bit;
3) The cc line shows "-I/usr/src/lib/libpthread/arch/i386/include";
4) Compiling thr_condattr_init.c uses the same #include sequence to
   successfully load "pthread_md.h";
5) None of the test build boxes are reporting any problems.

Please try running a memory test, or swapping your RAM.

--
Peter Jeremy
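The single-bit observation in point 2 is easy to check by hand. The sketch below compares the path from the cc line with the path in the make error and counts how many bits each mismatching character differs by. The helper name is made up for this illustration, and a one-bit difference is consistent with a memory error without proving one.

```python
def differing_bits(a: str, b: str) -> int:
    """Return the number of bit positions in which two single characters differ."""
    return bin(ord(a) ^ ord(b)).count("1")


# Path as it appears in the -I option, and as it appears in the make error.
good = "/usr/src/lib/libpthread/arch/i386/include/pthread_md.h"
bad = "/usr/src.lib/libpthread/arch/i386/include/pthread_md.h"

# Every offset where the two strings disagree, with the number of flipped bits.
mismatches = [
    (i, g, b, differing_bits(g, b))
    for i, (g, b) in enumerate(zip(good, bad))
    if g != b
]

for index, expected, seen, bits in mismatches:
    print(
        f"offset {index}: {expected!r} (0x{ord(expected):02x}) vs "
        f"{seen!r} (0x{ord(seen):02x}), {bits} bit(s) differ"
    )
```

The only mismatch is the "/" (0x2f) that became "." (0x2e), which is a change in the lowest bit alone.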
Re: (audit?) Panic in 6.2-PRERELEASE
On Sun, 7 Jan 2007, Ceri Davies wrote:

>> Could you try printing *td->td_ar? Maybe this will give us a clue as to
>> how far it got. In particular, this may be able to more reliably give us
>> the file descriptor number, which is audited early in the system call.
>> You might find that 'td' is corrupted in many layers of the stack, keep
>> going up until you find one where it's good. It may well be that
>> td->td_ar->k_ar.ar_arg_fd is correct, and might confirm that uap->fd is
>> correct still. We'd like also to know if ARG_SOCKINFO, ARG_VNODE1, or
>> ARG_VNODE2 is set in the k_ar.ar_valid_arg field. This may tell us some
>> more about the file descriptor even though it appears to have vanished.
>
> *td->td_ar is null (0x0) in both cases...

I'm actually beginning to wonder if this is actually audit-related at all. Something is clearly not right, and the audit code should not actually have been entered at all there. Perhaps we're being misled by the stack trace corruption into thinking audit is involved.

>> I'm quite worried by the fact that the file descriptor seems not to be
>> present any more -- this suggests a file descriptor related race of the
>> sort that is both quite difficult to figure out and also quite a risk.
>> It's strange that it would only trigger with audit, however--perhaps
>> audit stretches out the race. Is this an SMP box?
>
> It's certainly looking quite nasty. This system is UP hardware without
> options SMP.
>
> ...
>
> If it's at all useful, I can provide access to this system and the dumps.

Yeah, I think at this point that would probably be the most helpful thing. Could you confirm that the kernel.debug you're using definitely matches the version of the kernel in the core dump?

Robert N M Watson
Computer Laboratory
University of Cambridge
make buildworld is always breaking at various points
Hello list,

I keep having trouble compiling both 6.1-RELEASE and 6.2-RC2. I downloaded sources, extracted them with install.sh and did a cvsup.

My installed system is 6.1-RELEASE and I keep trying to compile it on my own. Playing with various parameters did not help, and neither did switching to the sources of 6.2-RC2.

My compilation system is a P4-550 with 2G RAM running on an Asus P5P800.

This is my /etc/make.conf:

PERL_VER=5.8.8
PERL_VERSION=5.8.8
SUP_UPDATE=yes
SUP=/usr/local/bin/cvsup
SUPFLAGS=-L 1
SUPHOST=cvsup.at.freebsd.org
SUPFILE=/home/franz/cvsupfile
CPUTYPE=pentium4
KERNCONF=MYKERNEL
NO_ATM=true # do not build ATM related programs and libraries
NO_BLUETOOTH=true # do not build Bluetooth related stuff
NO_FORTRAN=true # do not build g77 and related libraries
NO_GAMES=true # do not build games (games/ subdir)
NO_GDB=true # do not build GDB
NO_I4B=true # do not build isdn4bsd package
NO_INET6=true # do not build IPv6 related programs and
NO_IPFILTER=true # do not build IP Filter package
NO_KERBEROS=true # do not build and install Kerberos 5 (KTH
NO_NIS=true
NO_PROFILE=true # Avoid compiling profiled libraries
NO_SENDMAIL=true # do not build sendmail and related programs
PPP_NO_NAT=true # do not build with NAT support (see
PPP_NO_NETGRAPH=true # do not build with Netgraph support
PPP_NO_RADIUS=true # do not build with RADIUS support
NO_BIND_LIBS_LWRES=true # Do not install the lwres library

Some of the exclusions were added because buildworld broke at those points.
This is my /usr/src/sys/i386/conf/MYKERNEL:

machine i386
cpu I686_CPU
ident MY-P4-SMP
options SMP # Symmetric MultiProcessor
options MPTABLE_FORCE_HTT # Enable HTT CPUs with the MP
options IPI_PREEMPTION
options SCHED_4BSD # 4BSD scheduler
options PREEMPTION # Enable kernel thread
options INET # InterNETworking
options FFS # Berkeley Fast Filesystem
options SOFTUPDATES # Enable FFS soft updates
options UFS_ACL # Support for access control
options UFS_DIRHASH # Improve performance on big
options MD_ROOT # MD is a potential root device
options NFSCLIENT # Network Filesystem Client
options NFSSERVER # Network Filesystem Server
options NFS_ROOT # NFS usable as /, requires
options PROCFS # Process filesystem (requires
options PSEUDOFS # Pseudo-filesystem framework
options GEOM_GPT # GUID Partition Tables.
options COMPAT_43 # Compatible with BSD 4.3 [KEEP
options COMPAT_FREEBSD4 # Compatible with FreeBSD4
options COMPAT_FREEBSD5 # Compatible with FreeBSD5
options SCSI_DELAY=5000
options SYSVSHM # SYSV-style shared memory
options SYSVMSG # SYSV-style message queues
options SYSVSEM # SYSV-style semaphores
options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options KBD_INSTALL_CDEV # install a CDEV entry in /dev
options AHC_REG_PRETTY_PRINT # Print register bitfields in
options AHD_REG_PRETTY_PRINT # Print register bitfields in
options ADAPTIVE_GIANT # Giant mutex is adaptive.
options SC_DISABLE_REBOOT
device apic # I/O APIC
device eisa
device pci
device ata
device atadisk # ATA disk drives
device atapicd # ATAPI CDROM drives
options ATA_STATIC_ID
device atkbdc # AT keyboard controller
device atkbd # AT keyboard
device psm # PS/2 mouse
device kbdmux # keyboard multiplexer
device vga # VGA video card driver
device splash # Splash screen and screen saver support
device sc
device agp # support several AGP chipsets
device pmtimer
device sio # 8250, 16[45]50 based serial ports
device ppc
device ppbus # Parallel port bus (required)
device lpt # Printer
device plip # TCP/IP over parallel
device ppi # Parallel port interface device
device nve # nVidia nForce MCP on-board Ethernet
device sk # SysKonnect SK-984x & SK-982x gigabit
device loop # Network loopback
device random # Entropy device
device ether # Ethernet support
device pty # Pseudo-t
Re: (audit?) Panic in 6.2-PRERELEASE
On Sun, Jan 07, 2007 at 11:49:56AM +, Robert Watson wrote: > On Sat, 6 Jan 2007, Ceri Davies wrote: > > >>>So far it's happened this morning and yesterday morning. I haven't seen > >>>it before that. I don't know the cause so I can't reproduce it at will, > >>>but the logs don't give any indication. Chances are that it will happen > >>>again tomorrow, but we'll see. > >> > >>Hmm. It looks like you printf *(td->td_proc->p_fd->fd_ofiles) without > >>the array index. Could you repeat that, but with the array index -- > >>i.e., td->td_proc->p_fd->fd_ofiles[uap->fd]? Also, it would probably be > >>useful to print uap->fd. Right now you're printing stdin (index 0), but > >>if the index is non-0, we want a different file. > > > >Very tactfully put :) Sorry about that. > > > >None of the uap->fd's seem to be valid. In the first case, uap->fd is way > >too high for the length of fd_ofiles, which only has 21 elements: > > > >(kgdb) up 8 > >#8 0xc04c470d in fstat (td=0xc2eeb180, uap=0xd610dc74) at > >/usr/src/sys/kern/kern_descrip.c:1075 > >1075error = kern_fstat(td, uap->fd, &ub); > >(kgdb) p uap->fd > >$1 = 89 > >(kgdb) p *td->td_proc->p_fd->fd_ofiles[uap->fd] > >Cannot access memory at address 0x0 > > > >In the second, uap->fd is nonsense: > > > >(kgdb) up 8 > >#8 0xc04c470d in fstat (td=0xc3109300, uap=0xd617ec74) at > >/usr/src/sys/kern/kern_descrip.c:1075 > >1075error = kern_fstat(td, uap->fd, &ub); > >(kgdb) p uap->fd > >$1 = -1023449232 > >(kgdb) > > Hmm. So, I reviewed audit_arg_file() closely, and after staring at the > code a lot, couldn't see anything obvious in either the socket or the > vnode/fifo case. I did fix one other bug there, however, which can never > actually be exercised in 7-CURRENT, and is fairly unlikely in 6-STABLE, and > will MFC that in a week. OK, thanks. > Could you try printing *td->td_ar? Maybe this will give us a clue as to > how far it got. 
In particular, this may be able to more reliably give us > the file descriptor number, which is audited early in the system call. You > might find that 'td' is corrupted in many layers of the stack, keep going > up until you find one where it's good. It may well be that > td->td_ar->k_ar.ar_arg_fd is correct, and might confirm that uap->fd is > correct still. We'd like also to know if ARG_SOCKINFO, ARG_VNODE1, or > ARG_VNODE2 is set in the k_ar.ar_valid_arg field. This may tell us some > more about the file descriptor even though it appears to have vanished. *td->td_ar is null (0x0) in both cases... > I'm quite worried by the fact that the file descriptor seems not to be > present any more -- this suggests a file descriptor related race of the > sort that is both quite difficult to figure out and also quite a risk. > It's strange that it would only trigger with audit, however--perhaps audit > stretches out the race. Is this an SMP box? It's certainly looking quite nasty. This system is UP hardware without options SMP. > Could you print the entire contents of *td->td_proc->p_fd? 
First case:

(kgdb) p *td->td_proc->p_fd
$2 = {fd_ofiles = 0xc3441000, fd_ofileflags = 0xc3441100 "",
  fd_cdir = 0xc367f110, fd_rdir = 0xc2ce2bb0, fd_jdir = 0x0, fd_nfiles = 64,
  fd_map = 0xc3b65970, fd_lastfile = 20, fd_freefile = 16, fd_cmask = 63,
  fd_refcnt = 1, fd_holdcnt = 1, fd_mtx = {mtx_object = {
      lo_class = 0xc06ad4c4, lo_name = 0xc067c0fd "filedesc structure",
      lo_type = 0xc067c0fd "filedesc structure", lo_flags = 196608,
      lo_list = {tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0},
    mtx_lock = 4, mtx_recurse = 0}, fd_locked = 0, fd_wanted = 0,
  fd_kqlist = {slh_first = 0x0}, fd_holdleaderscount = 0,
  fd_holdleaderswakeup = 0}

Second case:

(kgdb) p *td->td_proc->p_fd
$2 = {fd_ofiles = 0xc2d23600, fd_ofileflags = 0xc2d23700 "",
  fd_cdir = 0xc31b8660, fd_rdir = 0xc2ce2bb0, fd_jdir = 0x0, fd_nfiles = 64,
  fd_map = 0xc2e9c1c0, fd_lastfile = 20, fd_freefile = 17, fd_cmask = 63,
  fd_refcnt = 1, fd_holdcnt = 1, fd_mtx = {mtx_object = {
      lo_class = 0xc06ad4c4, lo_name = 0xc067c0fd "filedesc structure",
      lo_type = 0xc067c0fd "filedesc structure", lo_flags = 196608,
      lo_list = {tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0},
    mtx_lock = 4, mtx_recurse = 0}, fd_locked = 0, fd_wanted = 0,
  fd_kqlist = {slh_first = 0x0}, fd_holdleaderscount = 0,
  fd_holdleaderswakeup = 0}

If it's at all useful, I can provide access to this system and the dumps.

Ceri
--
That must be wonderful! I don't understand it at all.
  -- Moliere
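The two uap->fd values reported earlier in this message (89 and -1023449232) can be cross-checked against fd_nfiles = 64 from the dumps above. What follows is a minimal sketch in Python rather than kernel C. The helper names are hypothetical and this is not the kernel's lookup code. It only shows the range check a descriptor would have to pass, and what the negative value looks like as an unsigned 32-bit pattern.

```python
def fd_in_table(fd: int, fd_nfiles: int) -> bool:
    """True if fd could index a descriptor table with fd_nfiles slots."""
    return 0 <= fd < fd_nfiles


def as_u32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as its unsigned bit pattern."""
    return value & 0xFFFFFFFF


FD_NFILES = 64  # fd_nfiles in both dumps

for label, fd in (("first case", 89), ("second case", -1023449232)):
    verdict = "in range" if fd_in_table(fd, FD_NFILES) else "out of range"
    print(f"{label}: uap->fd = {fd} (0x{as_u32(fd):08x}) is {verdict}")
```

Neither value passes the check. The second one, read as unsigned, is 0xc2ff6770, which sits numerically among the kernel pointers printed in the dumps (for example fd_rdir = 0xc2ce2bb0). That might mean the argument structure was overwritten with a pointer, but this is speculation from the numbers alone.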
Re: kernel panic on 6.2-RC2 with GENERIC.
On Fri, Jan 05, 2007 at 06:59:10PM +0200, Nikolay Pavlov wrote:
> Hello folks.
> I have kernel panic on GENERIC kernel while executing postmark.

The sequence of steps that Nikolay used to produce this panic was:

- install benchmarks/postmark from ports

root# postmark
PostMark v1.5 : 3/27/01
pm>set number=1
pm>set transactions=1
pm>set subdirectories=1
pm>show
pm>run

--
Craig Rodrigues
[EMAIL PROTECTED]
Re: Intel PRO/1000 PT Desktop NIC, PCIe 1x supported in FreeBSD 6.2?
On Sun, Jan 07, 2007 at 04:56:26PM +0100, O. Hartmann wrote:
> Hello,
> the company I'm working is about to purchae some additional NICs for
> some replacement built-in NICs of nForce 405-based desktop PCs. I would
> like to purchase the above mentioned NICs from Intel, hoping the em()
> driver is capable of handling the NICs.

I have one of those cards (Intel PRO/1000 PT Desktop NIC) myself, and it works just fine under 6-STABLE. So far I have not had any problems at all with it.

> I found in the list of supported hardware that Intels PRO/1000 series is
> supported by the em() driver, but it is said that the nVidia 4XX chipset
> is also supported, but not the specific Realtek PHYS/chip used on some
> ASROCK boards (AM2NF6G-VSTA). So I would like to ask here first.
>
> Othr suggestions for good stable and fast NICs are welcome. Thanks.
>
> Regards,
> Oliver
>
> P.S. I do not have problems coming along with 6-STABLE after 6.2 gets
> released, so if you plan integrating support for the nVidia nForce 405
> and/or this mentioned specific Intel PRO/1000 NIC shortly after the
> launch of 6.2-RELEASE I will also welcome positive answeres about this.
>
> -
> Intel PRO/1000 PT Desktop Adapter, 1x 1000Base-T, PCIe x1, low profile
> (EXPI9300PTL)

--
Erik Trulsson
[EMAIL PROTECTED]
Re: Intel PRO/1000 PT Desktop NIC, PCIe 1x supported in FreeBSD 6.2?
Erik Trulsson wrote: On Sun, Jan 07, 2007 at 04:56:26PM +0100, O. Hartmann wrote: Hello, the company I'm working is about to purchae some additional NICs for some replacement built-in NICs of nForce 405-based desktop PCs. I would like to purchase the above mentioned NICs from Intel, hoping the em() driver is capable of handling the NICs. I have one of those cards (Intel PRO/1000 PT Desktop NIC) myself, and it works just fine under 6-STABLE. So far I have not had any problems at all with it. I found in the list of supported hardware that Intels PRO/1000 series is supported by the em() driver, but it is said that the nVidia 4XX chipset is also supported, but not the specific Realtek PHYS/chip used on some ASROCK boards (AM2NF6G-VSTA). So I would like to ask here first. Othr suggestions for good stable and fast NICs are welcome. Thanks. Regards, Oliver P.S. I do not have problems coming along with 6-STABLE after 6.2 gets released, so if you plan integrating support for the nVidia nForce 405 and/or this mentioned specific Intel PRO/1000 NIC shortly after the launch of 6.2-RELEASE I will also welcome positive answeres about this. - Intel PRO/1000 PT Desktop Adapter, 1x 1000Base-T, PCIe x1, low profile (EXPI9300PTL) Thank you, that helps a lot :-) Regards, Oliver ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: kernel panic on 6.2-RC2 with GENERIC.
On Friday, 5 January 2007 at 18:00:29 -0500, Craig Rodrigues wrote:
> On Fri, Jan 05, 2007 at 06:59:10PM +0200, Nikolay Pavlov wrote:
> > Hello folks.
> > I have kernel panic on GENERIC kernel while executing postmark.
>
> What is postmark?
> Can you give the exact sequence of steps used to produce this panic?

Sure:

benchmarks/postmark

PostMark is the benchmark used in the NetApp Technical Report TR-3022, "PostMark: A New File System Benchmark". The paper fully explains how to use this tool.

From the paper's Abstract:

Existing file system benchmarks are deficient in portraying performance in the ephemeral small-file regime used by Internet software, especially:

* electronic mail
* netnews
* web-based commerce

PostMark is a new benchmark to measure performance for this class of application.

WWW: http://www.netapp.com/tech_library/3022.html

root# postmark
PostMark v1.5 : 3/27/01
pm>set number=1
pm>set transactions=1
pm>set subdirectories=1
pm>show
Current configuration is:
The base number of files is 1
Transactions: 1
Files range between 500 bytes and 9.77 kilobytes in size
Working directory: /usr/home/quetzal
1 subdirectories will be used
Block sizes are: read=512 bytes, write=512 bytes
Biases are: read/append=5, create/delete=5
Using Unix buffered file I/O
Random number generator seed is 42
Report format is verbose.

And then:

pm>run

Actually I can trigger this panic even with rm -rf "some dir with many files" or a background fsck after a crash. I can also trigger it with rsync over many (~100G) files. My system is very unstable with the 6.2-RC2 kernel, but with the 6.1 kernel I can't crash it.

Here are successful postmark results for 6.1:

Creating subdirectories...Done
Creating files...Done
Performing transactions..Done
Deleting files...Done
Deleting subdirectories...Done
Time:
  1196 seconds total
  556 seconds of transactions (17 per second)

Files:
  15027 created (12 per second)
    Creation alone: 1 files (32 per second)
    Mixed with transactions: 5027 files (9 per second)
  4990 read (8 per second)
  5009 appended (9 per second)
  15027 deleted (12 per second)
    Deletion alone: 10054 files (30 per second)
    Mixed with transactions: 4973 files (8 per second)

Data:
  27.14 megabytes read (23.24 kilobytes per second)
  85.08 megabytes written (72.84 kilobytes per second)

>
> --
> Craig Rodrigues
> [EMAIL PROTECTED]

--
==
- Best regards, Nikolay Pavlov. <<<---
==
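The "created" rates in the three PostMark reports in this thread can be reproduced from the totals, which makes the runs easier to compare. This is a sketch under one assumption: that the reported rate is files created divided by total run time, truncated to a whole number. The figures come from the reports; the labels and the function name are only illustrative.

```python
# (label, files created, total seconds, rate printed in the report)
runs = [
    ("6.1 kernel, Nikolay's machine", 15027, 1196, 12),
    ("6.2-PRERELEASE i386 (tyan-1u)", 15027, 481, 31),
    ("6.2-PRERELEASE amd64 (r2-releng6-64)", 15027, 310, 48),
]


def files_per_second(files: int, seconds: int) -> int:
    """Whole files per second, truncated the way the reports appear to be."""
    return files // seconds


for label, files, seconds, reported in runs:
    rate = files_per_second(files, seconds)
    print(f"{label}: {files} files / {seconds} s = {rate}/s (report says {reported}/s)")
```

All three computed values agree with the printed ones, so the assumption at least fits these reports. The machines and file systems differ, so the rates say little about 6.1 versus 6.2 by themselves.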
Intel PRO/1000 PT Desktop NIC, PCIe 1x supported in FreeBSD 6.2?
Hello,

the company I'm working for is about to purchase some additional NICs as replacements for the built-in NICs of nForce 405-based desktop PCs. I would like to purchase the above mentioned NICs from Intel, hoping the em() driver is capable of handling the NICs.

I found in the list of supported hardware that Intel's PRO/1000 series is supported by the em() driver, but it is said that the nVidia 4XX chipset is also supported, but not the specific Realtek PHYS/chip used on some ASROCK boards (AM2NF6G-VSTA). So I would like to ask here first.

Other suggestions for good stable and fast NICs are welcome. Thanks.

Regards,
Oliver

P.S. I do not have problems coming along with 6-STABLE after 6.2 gets released, so if you plan integrating support for the nVidia nForce 405 and/or this mentioned specific Intel PRO/1000 NIC shortly after the launch of 6.2-RELEASE I will also welcome positive answers about this.

-
Intel PRO/1000 PT Desktop Adapter, 1x 1000Base-T, PCIe x1, low profile (EXPI9300PTL)

--
O. Hartmann
fxp(4) and lockups on RELENG_6_x
Hello.

We are running an (IRC) server that stops responding on the network under high-rate traffic (i.e. a DDoS attack). The network remains locked up even after the original attack stops. However, running tcpdump (which switches the interface into promiscuous mode) unlocks networking and things work again.

At the moment, we are running 6.2-RC1 cvsupped on Dec 10, with if_fxp.c from Nov 11 (previously, we had 6.1 for a while, with the same issues):

if_fxp.c,v 1.240.2.10.2.1 2006/11/20 16:21:12

The same machine used to run FreeBSD 4.11 without any problems.

Any help/pointers/suggestions would be appreciated.

More hardware details:

[EMAIL PROTECTED]:3:0: class=0x02 card=0x10408086 chip=0x12298086 rev=0x0c hdr=0x00
    vendor = 'Intel Corporation'
    device = '82550/1/7/8/9 EtherExpress PRO/100(B) Ethernet Adapter'
    class= network
    subclass = ethernet

fxp0: port 0xc800-0xc83f mem 0xd902-0xd9020fff,0xd900-0xd901 irq 11 at device 3.0 on pci2
miibus0: on fxp0
fxp0: Ethernet address: 00:02:b3:90:65:86

interrupt total rate
irq11: fxp067322 0

--
() ASCII Ribbon Campaign
/\ Support plain text e-mail
Re: Fatal trap 12: page fault while in kernel mode
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Working on upgrading and applying patch right now ... thanks ... - --On Sunday, January 07, 2007 14:03:41 + Robert Watson <[EMAIL PROTECTED]> wrote: > > On Sat, 6 Jan 2007, Marc G. Fournier wrote: > >> Just had the following happen on a FreeBSD 6.2-PRERELEASE #7: Sun Dec 17 >> 01:28:52 AST 2006 system ... amd64, HP Proliant, 6G of RAM ... have core if >> there is information that I can provide out of it ... >> >> Fatal trap 12: page fault while in kernel mode >> cpuid = 0; apic id = 00 >> fault virtual address = 0x18c >> fault code = supervisor read, page not present >> instruction pointer = 0x8:0x801f9053 >> stack pointer = 0x10:0xb5c78b30 >> frame pointer = 0x10:0xb5c78b60 >> code segment= base 0x0, limit 0xf, type 0x1b >>= DPL 0, pres 1, long 1, def32 0, gran 1 >> processor eflags= resume, IOPL = 0 >> current process = 5 (thread taskq) >> trap number = 12 >> panic: page fault >> cpuid = 0 >> Uptime: 8d22h25m40s >> >> (kgdb) where >> # 0 doadump () at pcpu.h:172 >> # 1 0x80203955 in boot (howto=260) at >> /usr/src/sys/kern/kern_shutdown.c:409 >> # 2 0x80204065 in panic (fmt=0xff019b667720 >> "X\223f\233\001ÿÿÿ\020µc\233\001ÿÿÿ") at >> /usr/src/sys/kern/kern_shutdown.c:565 >> # 3 0x803287a6 in trap_fatal (frame=0xc, eva=18446742981100074784) >> # at >> /usr/src/sys/amd64/amd64/trap.c:660 >> # 4 0x80328cd8 in trap (frame= >> {tf_rdi = 112, tf_rsi = -1092609476832, tf_rdx = 6, tf_rcx = 3221225730, >> tf_r8 = -1245213424, tf_r9 = -1092609476832, tf_rax = 1, tf_rbx = >> - -1096874331952, tf_rbp = -1245213856, tf_r10 = -2142258536, tf_r11 = 0, >> tf_r12 = 4, tf_r13 = -1092609476832, tf_r14 = 4, tf_r15 = 1, tf_trapno = 12, >> tf_addr = 396, tf_flags = -2145197496, tf_err = 0, tf_rip = -2145415085, >> tf_cs = 8, tf_rflags = 65538, tf_rsp = -1245213888, tf_ss = 16}) at >> /usr/src/sys/amd64/amd64/trap.c:238 >> # 5 0x80313c6b in calltrap () at >> /usr/src/sys/amd64/amd64/exception.S:168 >> # 6 0x801f9053 in _mtx_lock_sleep 
(m=0xff009d31f0d0, >> tid=18446742981100074784, opts=6, file=0xc102 > bounds>, line=-1245213424) at /usr/src/sys/kern/kern_mutex.c:546 >> # 7 0x8025b1ac in unp_gc (arg=0x70, pending=-1687783648) at >> /usr/src/sys/kern/uipc_usrreq.c:1714 >> # 8 0x8022c314 in taskqueue_run (queue=0xff844800) at >> /usr/src/sys/kern/subr_taskqueue.c:257 >> # 9 0x8022d0e7 in taskqueue_thread_loop (arg=0x70) at >> /usr/src/sys/kern/subr_taskqueue.c:376 >> # 10 0x801e7b76 in fork_exit (callout=0x8022d060 >> , arg=0x805030d0, frame=0xb5c78c50) at >> /usr/src/sys/kern/kern_fork.c:821 >> # 11 0x80313fce in fork_trampoline () at >> /usr/src/sys/amd64/amd64/exception.S:394 > > This is a NULL pointer dereference in the UNIX domain socket code. John > Baldwin recently committed a fix for a bug with these symptoms to 7-CURRENT, > with an MFC planned in the near future. The fix won't make 6.2-RELEASE, but > assuming it tests out well over the next few weeks, we will cut an errata > patch/announcement for it. I believe you can pull down his 6-STABLE version > at: > >http://people.FreeBSD.org/~jhb/patches/unp_gc.patch > > This same patch is currently in texting on mx1.FreeBSD.org. > > (John CC'd) > > Robert N M Watson > Computer Laboratory > University of Cambridge - Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED] Yahoo . yscrappy Skype: hub.orgICQ . 7615664 -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.5 (FreeBSD) iD8DBQFFoQ8w4QvfyHIvDvMRAuTzAKDrPBUZ0dRgdujdSzQjbFyh2xiYcACgm8Oa adOhc5QuzI99WsjjjWaSi64= =lmyP -END PGP SIGNATURE- ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
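The trap frame quoted above prints its registers as signed decimals, which makes them hard to match against the hexadecimal values in the panic message. A small sketch of the conversion follows. The function name is made up for this note, and only three fields are taken from the frame.

```python
def as_u64(value: int) -> int:
    """Reinterpret a signed 64-bit integer as its unsigned bit pattern."""
    return value & 0xFFFFFFFFFFFFFFFF


# Fields copied from the quoted trap frame.
frame = {
    "tf_addr": 396,
    "tf_rip": -2145415085,
    "tf_rbp": -1245213856,
}

for name, value in frame.items():
    print(f"{name} = {value} = 0x{as_u64(value):x}")
```

tf_addr comes out as 0x18c, the reported fault virtual address. The low 32 bits of tf_rip and tf_rbp are 0x801f9053 and 0xb5c78b60, matching the instruction pointer and frame pointer in the panic text as shown in this archive. A fault address this small is what one would expect from reading a structure member through a NULL pointer, which agrees with Robert's diagnosis.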
Re: Fatal trap 12: page fault while in kernel mode
On Sat, 6 Jan 2007, Marc G. Fournier wrote:

> Just had the following happen on a FreeBSD 6.2-PRERELEASE #7: Sun Dec 17
> 01:28:52 AST 2006 system ... amd64, HP Proliant, 6G of RAM ... have core if
> there is information that I can provide out of it ...
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x18c
> fault code = supervisor read, page not present
> instruction pointer = 0x8:0x801f9053
> stack pointer = 0x10:0xb5c78b30
> frame pointer = 0x10:0xb5c78b60
> code segment = base 0x0, limit 0xf, type 0x1b
>              = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = resume, IOPL = 0
> current process = 5 (thread taskq)
> trap number = 12
> panic: page fault
> cpuid = 0
> Uptime: 8d22h25m40s
>
> (kgdb) where
> #0 doadump () at pcpu.h:172
> #1 0x80203955 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
> #2 0x80204065 in panic (fmt=0xff019b667720 "X\223f\233\001ÿÿÿ\020µc\233\001ÿÿÿ") at /usr/src/sys/kern/kern_shutdown.c:565
> #3 0x803287a6 in trap_fatal (frame=0xc, eva=18446742981100074784) at /usr/src/sys/amd64/amd64/trap.c:660
> #4 0x80328cd8 in trap (frame=
>    {tf_rdi = 112, tf_rsi = -1092609476832, tf_rdx = 6, tf_rcx = 3221225730,
>    tf_r8 = -1245213424, tf_r9 = -1092609476832, tf_rax = 1,
>    tf_rbx = -1096874331952, tf_rbp = -1245213856, tf_r10 = -2142258536,
>    tf_r11 = 0, tf_r12 = 4, tf_r13 = -1092609476832, tf_r14 = 4, tf_r15 = 1,
>    tf_trapno = 12, tf_addr = 396, tf_flags = -2145197496, tf_err = 0,
>    tf_rip = -2145415085, tf_cs = 8, tf_rflags = 65538, tf_rsp = -1245213888,
>    tf_ss = 16}) at /usr/src/sys/amd64/amd64/trap.c:238
> #5 0x80313c6b in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168
> #6 0x801f9053 in _mtx_lock_sleep (m=0xff009d31f0d0, tid=18446742981100074784, opts=6, file=0xc102 , line=-1245213424) at /usr/src/sys/kern/kern_mutex.c:546
> #7 0x8025b1ac in unp_gc (arg=0x70, pending=-1687783648) at /usr/src/sys/kern/uipc_usrreq.c:1714
> #8 0x8022c314 in taskqueue_run (queue=0xff844800) at /usr/src/sys/kern/subr_taskqueue.c:257
> #9 0x8022d0e7 in taskqueue_thread_loop (arg=0x70) at /usr/src/sys/kern/subr_taskqueue.c:376
> #10 0x801e7b76 in fork_exit (callout=0x8022d060 , arg=0x805030d0, frame=0xb5c78c50) at /usr/src/sys/kern/kern_fork.c:821
> #11 0x80313fce in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:394

This is a NULL pointer dereference in the UNIX domain socket code. John Baldwin recently committed a fix for a bug with these symptoms to 7-CURRENT, with an MFC planned in the near future. The fix won't make 6.2-RELEASE, but assuming it tests out well over the next few weeks, we will cut an errata patch/announcement for it. I believe you can pull down his 6-STABLE version at:

http://people.FreeBSD.org/~jhb/patches/unp_gc.patch

This same patch is currently in testing on mx1.FreeBSD.org.

(John CC'd)

Robert N M Watson
Computer Laboratory
University of Cambridge
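For anyone cross-checking the trap frame against the panic message: kgdb prints the tf_* registers as signed decimals, so they need converting before they can be compared with the hex values in the "Fatal trap" block. The following is only a rough sketch of that conversion; the helper names are made up for illustration and are not part of kgdb or any FreeBSD tool, and the 4 KB page size is an assumption. Note that this archived copy shows addresses such as 0x801f9053 without their upper bits, so only the low 32 bits can be compared directly.

```python
# Helper for reading the kgdb trap frame quoted above. kgdb prints the tf_*
# registers as signed decimals; masking to 64 bits recovers the hex form.
# The function names are illustrative only, not part of any FreeBSD tool.

MASK64 = (1 << 64) - 1
PAGE_SIZE = 4096  # assumption: 4 KB pages


def as_hex64(value):
    """Render a signed 64-bit integer as an unsigned hex string."""
    return hex(value & MASK64)


def looks_like_null_deref(fault_addr, page_size=PAGE_SIZE):
    """Heuristic: a fault inside the first page usually means a NULL
    base pointer plus a small structure-member offset."""
    return 0 <= fault_addr < page_size


# Values copied from the trap frame in the report.
tf_rip = -2145415085
tf_rbp = -1245213856
tf_addr = 396

print(as_hex64(tf_rip))   # 0xffffffff801f9053
print(as_hex64(tf_rbp))   # 0xffffffffb5c78b60
print(as_hex64(tf_addr))  # 0x18c
print(looks_like_null_deref(tf_addr))  # True
```

The low 32 bits of tf_rip and tf_rbp line up with the instruction pointer and frame pointer in the panic message, and tf_addr equals the fault virtual address 0x18c. A fault address that small is consistent with the NULL pointer diagnosis, although the check is only a heuristic.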
Re: Livelock in 6.2-RC1
On Sat, 6 Jan 2007, Frode Nordahl wrote:

> I am experiencing a rare livelock on four of my backend mail servers running
> 6.1-STABLE, 6.2-BETA2 and 6.2-RC1. They are running OpenLDAP slapd, postfix
> and UW-IMAPD. The servers can run for months without any problem, but
> nevertheless I have experienced this problem on multiple versions and
> different hardware configurations about 5 times since September / October
> 2006.
>
> Server is responding to pings, but all other activity halts. On one occasion
> when one of the servers displayed this behaviour it managed to recover from
> the situation by itself after being gone for 20-30 minutes.

Recovery is a sign of possible livelock, but otherwise this description sounds more like deadlock than livelock. Note that deadlock can be in a specific subsystem, so other services may still keep running -- for example, interrupts and the in-bound network stack generally have no interaction with the file system, so a file system deadlock can leave ping and the keyboard working.

The first step in diagnosing both livelock and deadlock is to figure out what the system is actually doing. I'd start out with the following commands:

  show pcpu
  show allpcpu
  trace
  alltrace
  ps
  show lockedvnods
  show locks
  show alllocks

(The last two won't work unless you have WITNESS compiled in).

The fact that you can get into the debugger and run debugging commands is a good sign; the fact that the debugger breaks into the idle thread suggests that the system has at least one idle CPU.

Robert N M Watson
Computer Laboratory
University of Cambridge

> Typical hardware configuration:
> CPU 2x Xeon 3.06GHz or 1x Core2Duo 2.00GHz (SMP)
> RAM 4 GB RAM
> DISK Intel SRCU42X (amr) or Dell PERC 5/i (mfi)
>
> Kernel config:
> include GENERIC
> options KDB # Enable kernel debugger support.
> options BREAK_TO_DEBUGGER
> options DDB # Support DDB.
> options GDB # Support remote GDB.
> options QUOTA
> options SMP
>
> On the last crash I collected the following info from DDB:
>
> db> tr
> Tracing pid 11 tid 15 td 0xc8f90780
> kdb_enter(c092f08b) at kdb_enter+0x2b
> siointr1(c9120800) at siointr1+0xce
> siointr(c9120800) at siointr+0x5e
> intr_execute_handlers(c8f864c8,e7b14c94,4,e7b14cd8,c0889503,...) at intr_execute_handlers+0xe1
> lapic_handle_intr(3d) at lapic_handle_intr+0x2e
> Xapic_isr1() at Xapic_isr1+0x33
> --- interrupt, eip = 0xc0b5b0e5, esp = 0xe7b14cd8, ebp = 0xe7b14cd8 ---
> acpi_cpu_c1(0,0,e7b14cf8,c8f90780,1,...) at acpi_cpu_c1+0x5
> acpi_cpu_idle(e7b14d10,c066a779,c8f8fa78,c066a6e4,e7b14d24,...) at acpi_cpu_idle+0x152
> cpu_idle(c8f8fa78,c066a6e4,e7b14d24,c066a465,0,...) at cpu_idle+0x28
> idle_proc(0,e7b14d38) at idle_proc+0x95
> fork_exit(c066a6e4,0,e7b14d38) at fork_exit+0x71
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xe7b14d6c, ebp = 0 ---
>
> db> show lockedbufs
> buf at 0xdd08cbd0
> b_flags = 0x2000
> b_error = 0, b_bufsize = 16384, b_bcount = 16384, b_resid = 0
> b_bufobj = (0xc937ed80), b_data = 0xdea14000, b_blkno = 14386688
> b_npages = 4, pages(OBJ, IDX, PA): (0xc1045210, 0x1b70c0, 0xdbe35000),(0xc1045210, 0x1b70c1, 0xc17d6000),(0xc1045210, 0x1b70c2, 0x582d7000),(0xc1045210, 0x1b70c3, 0x84498000)
>
> I have a crashdump or two available for further investigation.
>
> --
> Frode Nordahl
Re: debugging kernel options
On Sat, 30 Dec 2006, Karol Kwiatkowski wrote: Robert Watson wrote: On Sat, 30 Dec 2006, Karol Kwiatkowski wrote: Robert Watson wrote:

P.S. out of curiosity - now that I have configured kernel with DDB and KDB options, is there any performance penalty of running such kernel?

No, it shouldn't really have any effect on performance. The one thing to watch out for is that your system will no longer reboot automatically on a panic, as it will drop to the debugger, by default. You can change this by setting debug.debugger_on_panic to 0, in which case you will likely want to set debug.trace_on_panic to 1 so it prints a stack trace before rebooting (which is often sufficient, combined with the trap frame and panic message, to debug the problem). Right now these are sysctls, not tunables, but you can change the default using options KDB_UNATTENDED (which flips the default to not entering the debugger and rebooting) and options KDB_TRACE (which causes a trace to be printed on panic by default). Probably they should also be tunables so that loader.conf entries will work.

Great explanation, thank you. I turned on debugging on my desktop computer which, apart from normal every day use, is 'testing' STABLE by running it :) I'm perfectly fine with the defaults, at least for now.

BTW, I have added some new documentation to the Developer's Handbook on the various compile-time kernel debugging options, what their impact is, etc:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-options.html

The kernel debugging section of the Developer's Handbook seems to be getting a bit long in the tooth; I may have a chance to do some further updating of it this weekend. In particular, it seems to focus mostly on crash dumps, and many problems are more easily debugged using information in DDB.

BTW, if you're running X on your desktop, be aware that it's X that does all the video mode management. If your box enters the debugger while in X, the debugger doesn't know how to switch back to text mode (and X isn't running, obviously), so while you'll be talking to the debugger, the chances you'll see anything comprehensible are actually quite low. For this reason, I normally also use a serial console when debugging desktop boxes: I can always plug my notebook in with a serial cable to see why it's entered the debugger.

Right, I haven't thought about that. I guess without a serial console my best option is to set debug.debugger_on_panic to 0, debug.trace_on_panic to 1 and keep crash dump with kernel.debug for later examination, isn't it? The whole point of doing this, as I am not really experienced in debugging, is to have the information saved somewhere in case of a panic.

Yes -- if you have no firewire/serial console option (i.e., no extra notebook and null modem cable, or no serial port), then crash dumps are the best way to go. Setting the sysctls as above is good. Something I've been thinking of doing for a while is adding a scripting facility to DDB, which would allow you to have a script of DDB commands run on crash but before the dump, displaying useful debugging information which would then appear in the dump itself...

Robert N M Watson
Computer Laboratory
University of Cambridge
Re: (audit?) Panic in 6.2-PRERELEASE
On Sat, 6 Jan 2007, Ceri Davies wrote:

So far it's happened this morning and yesterday morning. I haven't seen it before that. I don't know the cause so I can't reproduce it at will, but the logs don't give any indication. Chances are that it will happen again tomorrow, but we'll see.

Hmm. It looks like you printf *(td->td_proc->p_fd->fd_ofiles) without the array index. Could you repeat that, but with the array index -- i.e., td->td_proc->p_fd->fd_ofiles[uap->fd]? Also, it would probably be useful to print uap->fd. Right now you're printing stdin (index 0), but if the index is non-0, we want a different file.

Very tactfully put :) Sorry about that. None of the uap->fd's seem to be valid. In the first case, uap->fd is way too high for the length of fd_ofiles, which only has 21 elements:

(kgdb) up 8
#8 0xc04c470d in fstat (td=0xc2eeb180, uap=0xd610dc74) at /usr/src/sys/kern/kern_descrip.c:1075
1075 error = kern_fstat(td, uap->fd, &ub);
(kgdb) p uap->fd
$1 = 89
(kgdb) p *td->td_proc->p_fd->fd_ofiles[uap->fd]
Cannot access memory at address 0x0

In the second, uap->fd is nonsense:

(kgdb) up 8
#8 0xc04c470d in fstat (td=0xc3109300, uap=0xd617ec74) at /usr/src/sys/kern/kern_descrip.c:1075
1075 error = kern_fstat(td, uap->fd, &ub);
(kgdb) p uap->fd
$1 = -1023449232
(kgdb)

Hmm. So, I reviewed audit_arg_file() closely, and after staring at the code a lot, couldn't see anything obvious in either the socket or the vnode/fifo case. I did fix one other bug there, however, which can never actually be exercised in 7-CURRENT, and is fairly unlikely in 6-STABLE, and will MFC that in a week.

Could you try printing *td->td_ar? Maybe this will give us a clue as to how far it got. In particular, this may be able to more reliably give us the file descriptor number, which is audited early in the system call. You might find that 'td' is corrupted in many layers of the stack; keep going up until you find one where it's good. It may well be that td->td_ar->k_ar.ar_arg_fd is correct, and might confirm that uap->fd is correct still. We'd like also to know if ARG_SOCKINFO, ARG_VNODE1, or ARG_VNODE2 is set in the k_ar.ar_valid_arg field. This may tell us some more about the file descriptor even though it appears to have vanished.

I'm quite worried by the fact that the file descriptor seems not to be present any more -- this suggests a file descriptor related race of the sort that is both quite difficult to figure out and also quite a risk. It's strange that it would only trigger with audit, however--perhaps audit stretches out the race. Is this an SMP box? Could you print the entire contents of *td->td_proc->p_fd?

Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge
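As a quick sanity check on the two uap->fd values from the kgdb sessions in this thread, the sketch below tests each against the table length reported for the first case (21 entries) and also shows the value reinterpreted as an unsigned 32-bit number. The helper names are invented for illustration, and using 21 as the table length for the second case is an assumption, since the thread only gives the length for the first.

```python
# Sanity checks for the uap->fd values printed in the two kgdb sessions.
# Helper names are illustrative; they do not correspond to kernel functions.

MASK32 = (1 << 32) - 1


def as_hex32(value):
    """Render a signed 32-bit integer as an unsigned hex string."""
    return hex(value & MASK32)


def fd_in_range(fd, nfiles):
    """True only if fd can index a descriptor table with nfiles slots."""
    return 0 <= fd < nfiles


NFILES = 21  # length of fd_ofiles reported for the first case

for fd in (89, -1023449232):
    # Prints the raw value, whether it is a usable index, and its hex form.
    print(fd, fd_in_range(fd, NFILES), as_hex32(fd))
```

Neither value is a usable index. The second one reads as 0xc2ff6770 when viewed as unsigned, which is in the same 0xc....... range as the td pointers shown in the trace; that might hint that a kernel address ended up where the descriptor number should be, but this is only a guess from the numbers and not something the dump confirms.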