named coredumping
Dear colleagues,

today I've got a problem with named exiting on signal 11. Yes, I've searched the archives and found that POKED TIMER is a problem in our threads. Has anyone faced this problem and worked around it? Or maybe even fixed it? Is there a detailed description of the problem anywhere?

Jun 19 06:00:58 daemon.warn ns named[44534]: *** POKED TIMER ***
Jun 19 06:28:40 daemon.warn ns named[44534]: *** POKED TIMER ***
Jun 19 06:49:54 daemon.warn ns named[44534]: *** POKED TIMER ***
Jun 19 07:18:19 daemon.warn ns named[44534]: *** POKED TIMER ***
Jun 19 07:18:38 kern.info ns kernel: pid 44534 (named), uid 53: exited on signal 11

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Routes not deleted after link down
On Sat, Jun 18, 2005 at 10:14:32PM +0200, Jose M Rodriguez wrote:
J> Second, you may need a route daemon for this. ospf is a well known
J> candidate where convergence in case of lost link is a must.

While an OSPF daemon may stop advertising the affected route to its neighbors, the kernel will still have the route installed, and thus the box won't be able to contact other hosts on the connected net, even though they are reachable via an alternate path.

I've checked that Cisco routers remove the route from the FIB when the interface link goes down. I haven't checked Junipers yet.

From my viewpoint, removing the route (or marking it unusable) is correct behavior for a router. I'm not sure it is correct for a desktop.

My vote is that we should implement this functionality and make it switchable via sysctl. I'd leave the default as is.

What is the opinion of other networkers?

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
Re: Routes not deleted after link down
On Sunday 19 June 2005 10:29, Gleb Smirnoff wrote:
> On Sat, Jun 18, 2005 at 10:14:32PM +0200, Jose M Rodriguez wrote:
> J> Second, you may need a route daemon for this. ospf is a well known
> J> candidate where convergence in case of lost link is a must.
>
> While an OSPF daemon may stop advertising the affected route to its neighbors, the kernel will still have the route installed, and thus the box won't be able to contact other hosts on the connected net, even though they are reachable via an alternate path.

A routing protocol should be responsible for removing affected routes from the FIB. For example, quagga should remove all routes learned via a particular OSPF neighbour when that neighbour is no longer reachable because the link went down. But when no daemons are used (`static' and `connected' are also `routing protocols'), the kernel should be responsible for doing that.

> I've checked that Cisco routers remove the route from the FIB when the interface link goes down. I haven't checked Junipers yet.

Junipers do the same. It is the only feasible behaviour for a router.

> From my viewpoint, removing the route (or marking it unusable) is correct behavior for a router. I'm not sure it is correct for a desktop.

Sure.

> My vote is that we should implement this functionality and make it switchable via sysctl. I'd leave the default as is.

Agree.
Re: re0 no carrier problem - Patches found in archives didn't work.
[04:12 19-06-2005] Tom Pepper [EMAIL PROTECTED]
> this may seem obvious, but do you still have the same issue if you statically define the mediatype to a fixed link speed and duplex? lots of interfaces have problems negotiating autoselect with switches, as protocols vary widely. the blinking is indicative of autoselect setup woes...

This wasn't obvious to me! Thanks for the answer!

When I set the media by hand it showed 'active', but no data could be transmitted (yellow link diode on the switch, not blinking). I then pulled the switch with my home LAN from the wall and plugged in my re0 instead. It worked! So this seems to be a %$R#(@%*$ switch issue.

The funny thing is that with media set manually (or sometimes with autoselect) everything works fine with this switch (I used it all day yesterday until evening without a problem). Now, when I wait some time with media set manually, it starts transmitting data and everything works fine. Why does this happen? Why is it so unpredictable? Can it be fixed in software, or should I throw the switch out the window and buy a new one?

Thanks and best regards,
m.
Re: Routes not deleted after link down
On Sunday, 19 June 2005 10:48, Michal Vanco wrote:
> On Sunday 19 June 2005 10:29, Gleb Smirnoff wrote:
> > On Sat, Jun 18, 2005 at 10:14:32PM +0200, Jose M Rodriguez wrote:
> > J> Second, you may need a route daemon for this. ospf is a well known
> > J> candidate where convergence in case of lost link is a must.
> >
> > While an OSPF daemon may stop advertising the affected route to its neighbors, the kernel will still have the route installed, and thus the box won't be able to contact other hosts on the connected net, even though they are reachable via an alternate path.
>
> A routing protocol should be responsible for removing affected routes from the FIB. For example, quagga should remove all routes learned via a particular OSPF neighbour when that neighbour is no longer reachable because the link went down. But when no daemons are used (`static' and `connected' are also `routing protocols'), the kernel should be responsible for doing that.
>
> > I've checked that Cisco routers remove the route from the FIB when the interface link goes down. I haven't checked Junipers yet.
>
> Junipers do the same. It is the only feasible behaviour for a router.
>
> > From my viewpoint, removing the route (or marking it unusable) is correct behavior for a router. I'm not sure it is correct for a desktop.
>
> Sure.
>
> > My vote is that we should implement this functionality and make it switchable via sysctl. I'd leave the default as is.

I'm not sure about this. I also think that a devd or monitor daemon would be enough and easy to implement.

I think NetBSD already has some kind of net monitor daemon for pppoe support (via sppp). I'm not sure whether route support is included, but it seems easier and cleaner than a kernel-based solution.

-- 
josemi

> Agree.
NFS-related hang in 5.4?
Hi,

when doing large file transfers (backing up jails using tar+gzip to a neighboring server), NFS has a tendency to lock up on me. This usually happens after quite a while - like a few hours or so. Also, before the hang, performance is generally bad.

KDB trace:

db> trace
Tracing pid 56 tid 100064 td 0xc1a18600
kdb_enter(c096bad3,4,480758,c08dcbf9,f5) at kdb_enter+0x30
siointr1(c1a8e000,c1a18600,c1a148d4,c1a12700,c1a12700) at siointr1+0xe7
siointr(c1a8e000,0,0,4,c1a18600) at siointr+0x78
intr_execute_handlers(c19bd090,d54807bc,d5480818,c08d05a3,34) at intr_execute_handlers+0x88
lapic_handle_intr(34) at lapic_handle_intr+0x3a
Xapic_isr1() at Xapic_isr1+0x33
--- interrupt, eip = 0xc06b8490, esp = 0xd5480800, ebp = 0xd5480818 ---
_mtx_lock_sleep(c0a1cd2c,c1a18600,0,0,0) at _mtx_lock_sleep+0xb0
udp_input(c2d4,14,c1a99000,1,0) at udp_input+0x257
ip_input(c2d4,0,0,0,0) at ip_input+0x590
transmit_event(c1c64100,2094,0,c1d58a80,7f4220) at transmit_event+0x107
ready_event_wfq(c1c64100,2094,0,c1d58a80,c06d860a) at ready_event_wfq+0x511
dummynet_io(c2bd2e00,64,1,d54809c8,c2bd2e00) at dummynet_io+0x519
ipfw_check_out(0,d5480a24,c1a99000,2,c1d1821c) at ipfw_check_out+0xf1
pfil_run_hooks(c0a1c160,d5480a9c,c1a99000,2,c1d1821c) at pfil_run_hooks+0x138
ip_output(c2bd2e00,0,0,0,0) at ip_output+0x593
udp_output(c1d1821c,c2bd2e00,0,0,c1a18600) at udp_output+0x597
udp_send(c2242654,0,c1e12100,0,0) at udp_send+0x30
sosend(c2242654,0,0,c1e12100,0) at sosend+0x6f1
nfs_send(c2242654,c1d57860,c1e12100,c2313900,1c) at nfs_send+0xc9
nfs_request(c22cf108,c1e12a00,7,0,c20bb300) at nfs_request+0x342
nfs_writerpc(c22cf108,d5480ca4,c20bb300,d5480c94,d5480c98) at nfs_writerpc+0x2a0
nfs_doio(cbf75e08,c20bb300,0,c094f9b4,0) at nfs_doio+0x508
nfssvc_iod(c0a21828,d5480d38,0,0,0) at nfssvc_iod+0x1db
fork_exit(c07c5150,c0a21828,d5480d38) at fork_exit+0x80
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xd5480d6c, ebp = 0 ---

I cannot seem to kill process 56 (nfsiod), so I have to reset the box.

Anyone got a clue? What can I do to ease debugging here? Next time it happens I can probably make a dump, at least I will have a debug kernel running then.

/Eirik
Re: 5.3-p16/i386 unknown reason console hang
On 6/14/05, Rong-En Fan [EMAIL PROTECTED] wrote:
> Hello all,
>
> Recently, one of our 5.3-p16/i386 machines has been hanging frequently. Details:
>
> 1. I can switch vty, but can't log in (after typing the username, it hangs)
> 2. it responds to ping, but not to other tcp/udp services
> 3. I can break into ddb

I'm getting exactly the same behaviour on 4.11-STABLE/i386. After typing the username on the console, the machine hangs. It works like a charm with the previous kernel (4.8-STABLE). The upgrade was done today in the usual way (build world/kernel, install world/kernel, mergemaster, reboot with /usr/local/etc/rc.d empty to ensure only base services run).

otis

-- 
Sincerely yours,
Juraj Lutter
Re: re0 no carrier problem - Patches found in archives didn't work.
At 07:09 AM 6/19/2005, Marcin Koziej wrote:
> [04:12 19-06-2005] Tom Pepper [EMAIL PROTECTED]
> > this may seem obvious, but do you still have the same issue if you statically define the mediatype to a fixed link speed and duplex? lots of interfaces have problems negotiating autoselect with switches, as protocols vary widely. the blinking is indicative of autoselect setup woes...
>
> This wasn't obvious to me! Thanks for the answer!
>
> When I set the media by hand it showed 'active', but no data could be transmitted (yellow link diode on the switch, not blinking). I then pulled the switch with my home LAN from the wall and plugged in my re0 instead. It worked! So this seems to be a %$R#(@%*$ switch issue.
>
> The funny thing is that with media set manually (or sometimes with autoselect) everything works fine with this switch (I used it all day yesterday until evening without a problem). Now, when I wait some time with media set manually, it starts transmitting data and everything works fine. Why does this happen? Why is it so unpredictable? Can it be fixed in software, or should I throw the switch out the window and buy a new one?

I may be replying to the wrong thread, but if I remember a previous post correctly, this is for a Realtek gig-e card going to a gig-e switch... Just another obvious point, but you are using either Cat-5e or Cat-6 cables, right? Just thought I'd offer another obvious suggestion. :)

Vinny Abello
Network Engineer
Server Management
[EMAIL PROTECTED]
(973)300-9211 x 125
(973)940-6125 (Direct)
PGP Key Fingerprint: 3BC5 9A48 FC78 03D3 82E0 E935 5325 FBCB 0100 977A

Tellurian Networks - The Ultimate Internet Connection
http://www.tellurian.com (888)TELLURIAN

Courage is resistance to fear, mastery of fear - not absence of fear -- Mark Twain
Re: re0 no carrier problem - Patches found in archives didn't work.
> > this may seem obvious, but do you still have the same issue if you statically define the mediatype to a fixed link speed and duplex? lots of interfaces have problems negotiating autoselect with switches, as protocols vary widely. the blinking is indicative of autoselect setup woes...
>
> When I set the media by hand it showed 'active', but no data could be transmitted (yellow link diode on the switch, not blinking). I then pulled the switch with my home LAN from the wall and plugged in my re0 instead. It worked! So this seems to be a %$R#(@%*$ switch issue.
>
> The funny thing is that with media set manually (or sometimes with autoselect) everything works fine with this switch (I used it all day yesterday until evening without a problem). Now, when I wait some time with media set manually, it starts transmitting data and everything works fine. Why does this happen? Why is it so unpredictable? Can it be fixed in software, or should I throw the switch out the window and buy a new one?

I'm not sure it is solely switch related. I have been encountering the very same problem for months now, using different links and switches. Moreover, it seems to be FreeBSD specific, since the same machine works better with other OSes.

See PR kern/80005 for a longer description of a very similar problem.

-- 
-jpeg.
Re: 5.4-p1 crash
On Sat, 18 Jun 2005, Philippe PEGON wrote:
> > Unfortunately this is a known bug in FreeBSD; check the archives for more discussion. Doug White tried to look at fixing it before 5.4-RELEASE but I think he gave up.
>
> Do you know if someone is working on it? I sent two mails to freebsd-stable about it without a solution, and this bug is really annoying:
>
> http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/015952.html
> and
> http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/015864.html
>
> There is a PR for it: kern/74319

This sounds very similar to a serial-console-related tty bug I was experiencing on -STABLE a few months ago, which is believed to have been worked around by tweaks made to 5.4 before release. In particular, there are reference-counting bugs in the 5.x tty code that are fixed by a partial rewrite of the tty code in 6.x, but those changes are too large and disruptive to merge to RELENG_5. If the problem is persisting, it may be worth trying to merge anyway, but it is a pretty big change and would break device driver binary compatibility, etc. What we might want to do here is wait until 6.x has settled out a bit more, then consider merging it to 5.x once 6.x has been burned in with similar workloads and has continued not to exhibit the 5.x tty reference bugs.

Robert N M Watson
Re: NFS-related hang in 5.4?
On Sun, 19 Jun 2005, Eirik Øverby wrote:
> when doing large file transfers (backing up jails using tar+gzip to a neighboring server), NFS has a tendency to lock up on me. This usually happens after quite a while - like a few hours or so. Also, before the hang, performance is generally bad.

Hmm. Looks like a bug in dummynet. ipfw should not be directly re-injecting UDP traffic back into the input path from an outbound path, or it risks re-entering, generating lock order problems, etc. It should be getting dropped into the netisr queue to be processed from the netisr context.

Is it possible to configure dummynet out of your configuration, and see if the problem goes away?

Robert N M Watson

> KDB trace:
>
> db> trace
> Tracing pid 56 tid 100064 td 0xc1a18600
> kdb_enter(c096bad3,4,480758,c08dcbf9,f5) at kdb_enter+0x30
> siointr1(c1a8e000,c1a18600,c1a148d4,c1a12700,c1a12700) at siointr1+0xe7
> siointr(c1a8e000,0,0,4,c1a18600) at siointr+0x78
> intr_execute_handlers(c19bd090,d54807bc,d5480818,c08d05a3,34) at intr_execute_handlers+0x88
> lapic_handle_intr(34) at lapic_handle_intr+0x3a
> Xapic_isr1() at Xapic_isr1+0x33
> --- interrupt, eip = 0xc06b8490, esp = 0xd5480800, ebp = 0xd5480818 ---
> _mtx_lock_sleep(c0a1cd2c,c1a18600,0,0,0) at _mtx_lock_sleep+0xb0
> udp_input(c2d4,14,c1a99000,1,0) at udp_input+0x257
> ip_input(c2d4,0,0,0,0) at ip_input+0x590
> transmit_event(c1c64100,2094,0,c1d58a80,7f4220) at transmit_event+0x107
> ready_event_wfq(c1c64100,2094,0,c1d58a80,c06d860a) at ready_event_wfq+0x511
> dummynet_io(c2bd2e00,64,1,d54809c8,c2bd2e00) at dummynet_io+0x519
> ipfw_check_out(0,d5480a24,c1a99000,2,c1d1821c) at ipfw_check_out+0xf1
> pfil_run_hooks(c0a1c160,d5480a9c,c1a99000,2,c1d1821c) at pfil_run_hooks+0x138
> ip_output(c2bd2e00,0,0,0,0) at ip_output+0x593
> udp_output(c1d1821c,c2bd2e00,0,0,c1a18600) at udp_output+0x597
> udp_send(c2242654,0,c1e12100,0,0) at udp_send+0x30
> sosend(c2242654,0,0,c1e12100,0) at sosend+0x6f1
> nfs_send(c2242654,c1d57860,c1e12100,c2313900,1c) at nfs_send+0xc9
> nfs_request(c22cf108,c1e12a00,7,0,c20bb300) at nfs_request+0x342
> nfs_writerpc(c22cf108,d5480ca4,c20bb300,d5480c94,d5480c98) at nfs_writerpc+0x2a0
> nfs_doio(cbf75e08,c20bb300,0,c094f9b4,0) at nfs_doio+0x508
> nfssvc_iod(c0a21828,d5480d38,0,0,0) at nfssvc_iod+0x1db
> fork_exit(c07c5150,c0a21828,d5480d38) at fork_exit+0x80
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xd5480d6c, ebp = 0 ---
>
> I cannot seem to kill process 56 (nfsiod), so I have to reset the box.
>
> Anyone got a clue? What can I do to ease debugging here? Next time it happens I can probably make a dump, at least I will have a debug kernel running then.
>
> /Eirik
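One way to act on Robert's suggestion is to first find which ipfw rules hand packets to dummynet. The following is a minimal sketch, assuming the ruleset is managed with ipfw(8) and that dummynet rules appear in the `ipfw list` output with a `pipe` or `queue` keyword; `dummynet_rules` is a made-up helper name, not an existing tool.

```sh
#!/bin/sh
# Hypothetical helper: print only the ipfw rules that use a dummynet
# pipe or queue. Assumes "pipe"/"queue" appears as a separate word in
# the `ipfw list` output. Returns non-zero when no such rule exists.
dummynet_rules() {
    ipfw list | grep -Ew 'pipe|queue'
}
```

Run as root, `dummynet_rules` lists the candidate rules; each could then be removed with `ipfw delete` and its rule number for the duration of the test, and `ipfw pipe show` displays the configured pipes themselves.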
Re: rebooting problem
On Fri, 17 Jun 2005 14:19:41 +0200 GMane [EMAIL PROTECTED] wrote:
> Hi Cian,
>
> Cian Hughes [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
> > Do you also have a hanging problem if you shutdown with `shutdown -p now`?
>
> No, the problem is only with the reboot command.
>
> > Are you using ACPI? Have you ensured you are using the latest BIOS for your system?
>
> I turned off ACPI in the BIOS (but from the dmesg it seems to be on).

You need to add a line to /boot/device.hints to disable acpi:

hint.acpi.0.disabled=1

FWIW, I've been seeing the same problem on both my 32-bit Athlon box and on my Athlon 64, both running 5.4-STABLE. One thing I've discovered is that the machine will reboot properly if booted in single-user mode. It appears to be a problem only when shutting down from multi-user mode.

The shutdown process gets as far as outputting the "Uptime:" line, and then just hangs.

-- 
Conrad J. Sabatier [EMAIL PROTECTED] -- In Unix veritas
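The hint can simply be added with an editor; the snippet below is only a small sketch of doing it idempotently. `add_acpi_hint` is a hypothetical helper name, and the quoted value follows the usual `name="value"` style used in /boot/device.hints.

```sh
#!/bin/sh
# Hypothetical helper: append the ACPI-disable hint unless the file
# already contains a setting for hint.acpi.0.disabled.
# The target file defaults to /boot/device.hints.
add_acpi_hint() {
    hints=${1:-/boot/device.hints}
    if ! grep -q '^hint\.acpi\.0\.disabled=' "$hints" 2>/dev/null; then
        echo 'hint.acpi.0.disabled="1"' >> "$hints"
    fi
}
```

The hint takes effect at the next boot; whether ACPI is still active afterwards can be checked in the dmesg output, as in the quoted message.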
Re: re0 no carrier problem - Patches found in archives didn't work.
At 07:09 AM 19/06/2005, Marcin Koziej wrote:
> When I set the media by hand it showed 'active', but no data could be transmitted (yellow link diode on switch, not blinking). I then pulled the switch with my home lan from the wall and plugged my re0 instead - It worked! So this seems to be the %$R#(@%*$ switch issue. The funny thing is, that with media set manually (or sometimes with autoselect) everything works fine with this switch (i have used it all day yesterday till evening without problem). Now, when i

Both sides have to agree on the setting, i.e. if the switch is set to autoneg, your network card must also be set to autoneg, and vice versa. Unless it's a managed switch, leave the NIC on autonegotiate. If it's still failing, I suspect it's something with the re driver. It's possible the NIC and switch don't like each other (it has happened in the past with certain vendors), but this is pretty rare these days.

---Mike
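For reference, the two settings Mike describes map onto ifconfig(8) media options roughly as follows. This is a sketch: `set_link` is a made-up wrapper, and `100baseTX` / `full-duplex` are only example values; the media types a given card supports are listed by `ifconfig -m re0`.

```sh
#!/bin/sh
# Hypothetical wrapper around ifconfig(8):
#   set_link re0 autoselect              -> let the NIC autonegotiate
#   set_link re0 100baseTX full-duplex   -> force speed and duplex
set_link() {
    ifname=$1
    media=$2
    duplex=${3:-}
    if [ -n "$duplex" ]; then
        ifconfig "$ifname" media "$media" mediaopt "$duplex"
    else
        ifconfig "$ifname" media "$media"
    fi
}
```

With an unmanaged switch, which always autonegotiates, only the first form matches the advice above.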
Re: Routes not deleted after link down
Gleb Smirnoff wrote:
> My vote is that we should implement this functionality and make it switchable via sysctl. I'd leave the default as is.
>
> What is opinion of other networkers?

How about also adding a sysctl for setting a delay between the event and the disabling of the route? Then even people with roaming wlan cards can benefit.

Also, in my opinion the route should be disabled (moved to a passive route table, maybe?) and not deleted.

At my old job I came across situations where the lack of this feature caused headaches and, once or twice, the loss of a customer.

-- 
Sten Daniel Sørsdal
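The delay suggested here behaves like a hold-down timer: act only if the link is still down after a grace period. The sketch below illustrates the idea in userland and is not the proposed kernel sysctl; `link_is_up` and `route_holddown` are hypothetical names, and it assumes ifconfig reports `status: active` for a link that is up.

```sh
#!/bin/sh
# True if ifconfig reports an active link on the given interface.
link_is_up() {
    ifconfig "$1" | grep -q 'status: active'
}

# route_holddown IFNAME SECONDS COMMAND...
# Run COMMAND only if the link is down now and still down after SECONDS.
route_holddown() {
    ifname=$1
    delay=$2
    shift 2
    if link_is_up "$ifname"; then return 0; fi
    sleep "$delay"
    if link_is_up "$ifname"; then return 0; fi
    "$@"
}
```

For example, `route_holddown wi0 10 route delete -net 192.0.2.0/24` (interface and prefix are placeholders) would drop the route only after a 10-second outage, so a roaming wireless card that reassociates quickly keeps its routes.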
Re: Routes not deleted after link down
At 04:29 AM 19/06/2005, Gleb Smirnoff wrote:
> On Sat, Jun 18, 2005 at 10:14:32PM +0200, Jose M Rodriguez wrote:
> J> Second, you may need a route daemon for this. ospf is a well known
> J> candidate where convergence in case of lost link is a must.
>
> I've checked that Cisco routers remove route from FIB when interface link goes down. I haven't checked Junipers yet.
>
> From my viewpoint, removing route (or marking it unusable) is a correct behavior for router. Not sure it is correct for desktop.
>
> My vote is that we should implement this functionality and make it switchable via sysctl. I'd leave the default as is.

I like this idea as well, but wouldn't you need to control how the routes come back after the interface comes back up? This seems more the province of a routing daemon like quagga than a kernel feature, no?

---Mike
Re: ATA_DMA errors (and fs corruption!)
twesky wrote:
> I am having ATA_DMA errors on 5.4R and 5 STABLE up to June 16 (haven't done a cvsup again). It doesn't happen on 5.3R or lower.

I have got the same problem. I tried yesterday's kernel and I got lots of ATA DMA errors.

A question: do you have a VIA IDE controller like mine?

atapci0: VIA 8235 UDMA133 controller port 0xfc00-0xfc0f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 17.1 on pci0

[EMAIL PROTECTED]:17:1: class=0x01018a card=0x05711849 chip=0x05711106 rev=0x06 hdr=0x00
    vendor   = 'VIA Technologies Inc'
    device   = 'VT82 EIDE Controller (All VIA Chipsets)'
    class    = mass storage
    subclass = ATA

Today I noticed that the short experiment with the latest -STABLE destroyed a part of my /usr partition. It looked like this (with the May 9th kernel today):

kernel: handle_workitem_freeblocks: block count
kernel: bad block 50333952, ino 1743780
kernel: pid 56 (syncer), uid 0 inumber 1743780 on /usr: bad block
kernel: bad block 3221252091, ino 1743780
klotz kernel: pid 56 (syncer), uid 0 inumber 1743780 on /usr: bad block
kernel: bad block 144119931884736777, ino 1743780
kernel: pid 56 (syncer), uid 0 inumber 1743780 on /usr: bad block
kernel: bad block 72340173158093844, ino 1743780
kernel: pid 56 (syncer), uid 0 inumber 1743780 on /usr: bad block
kernel: bad block 1104111992832, ino 1743780
kernel: pid 56 (syncer), uid 0 inumber 1743780 on /usr: bad block
kernel: handle_workitem_freeblocks: block count
kernel: handle_workitem_freeblocks: block count
kernel: bad block 1865342872522620032, ino 1743783

While shutting down I got this:

Jun 19 22:04:21 klotz kernel: /usr: unmount pending error: blocks -3561100369582 68157 files 0

I restored the fs in single-user mode. And now it runs fine with the kernel from May 9th. See also my earlier post.

Martin
Re: 5.4-p1 crash
Robert Watson wrote:
> This sounds very similar to a serial-console-related tty bug I was experiencing on -STABLE a few months ago, which is believed to have been worked around by tweaks made to 5.4 before release. In particular, there are reference-counting bugs in the 5.x tty code that are fixed by a partial rewrite of the tty code in 6.x, but those changes are too large and disruptive to merge to RELENG_5. If the problem is persisting, it may be worth trying to merge anyway, but it is a pretty big change and would break device driver binary compatibility, etc. What we might want to do here is wait until 6.x has settled out a bit more, then consider merging it to 5.x once 6.x has been burned in with similar workloads and has continued not to exhibit the 5.x tty reference bugs.

Thanks for your answer. As I said in other posts, we have a FreeBSD 5.4-p1 machine which connects every fifteen minutes, using an expect program, to a lot of network devices to retrieve some information. It seems that this is the culprit; the server crashed almost every day. We reduced the frequency to once per hour, and that attenuates the problem. This panic is easy to reproduce with this simple expect program (see below) by running it 6 times simultaneously and waiting a few hours. I tested it on an HP DL360 with 2 CPUs. If that can help, I can test this on -CURRENT next week.

#! /usr/local/bin/expect
set timeout 60
set host [lindex $argv 0]
set pass PASSWORD
spawn ssh [EMAIL PROTECTED]
expect {
    continue*(yes/no) { send yes\r ; exp_continue }
    assword: { send $pass\r }
}
expect *# { send ls\r }
expect *# { send exit\r }
puts Done.

> Robert N M Watson

-- 
Philippe PEGON
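The reproduction recipe above (six copies at once) can be scripted. The following is a minimal sketch, assuming the expect program is saved as an executable file; the name `./probe.exp` and the host argument in the usage line are placeholders that do not come from the original post.

```sh
#!/bin/sh
# run_parallel N COMMAND...
# Start N copies of COMMAND in the background and wait for all of them.
run_parallel() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &
        i=$((i + 1))
    done
    wait
}
```

`run_parallel 6 ./probe.exp switch1.example.org` starts six simultaneous sessions; wrapping that call in a loop keeps the load going for the few hours mentioned above.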
SCSI troubles
Hi,

I am running 5.4-STABLE (FreeBSD 5.4-STABLE #0: Sun May 8 18:23:40 CEST 2005) with an Adaptec 29320 or 2930U2 controller, and under heavy SCSI load I get the following problems, ending with a system freeze. Does anybody have an opinion on where the problem could be?

Thanks a lot
Tomas Randa

da0: SEAGATE ST336754LW 0002 Fixed Direct Access SCSI-3 device
da0: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled

kernel: ahc0: WARNING no command for scb 0 (cmdcmplt)
kernel: QOUTPOS = 252
kernel: ahc0: WARNING no command for scb 0 (cmdcmplt)
kernel: QOUTPOS = 253
kernel: ahc0: WARNING no command for scb 0 (cmdcmplt)
kernel: QOUTPOS = 254
kernel: ahc0: WARNING no command for scb 0 (cmdcmplt)
kernel: QOUTPOS = 255
kernel: ahc0: Recovery Initiated
kernel: Dump Card State Begins
kernel: ahc0: Dumping Card State while idle, at SEQADDR 0x8
kernel: Card was paused
kernel: ACCUM = 0x0, SINDEX = 0x7, DINDEX = 0xe4, ARG_2 = 0x0
kernel: HCNT = 0x0 SCBPTR = 0xa
kernel: SCSISIGI[0x0] ERROR[0x0] SCSIBUSL[0x0] LASTPHASE[0x1]:(P_BUSFREE)
kernel: SCSISEQ[0x12]:(ENAUTOATNP|ENRSELI) SBLKCTL[0xa]:(SELWIDE|SELBUSB)
kernel: SCSIRATE[0x0] SEQCTL[0x10]:(FASTMODE) SEQ_FLAGS[0xc0]:(NO_CDB_SENT|NOT_IDENTIFIED
kernel: SSTAT0[0x0] SSTAT1[0xa]:(PHASECHG|BUSFREE) SSTAT2[0x0]
kernel: SSTAT3[0x0] SIMODE0[0x8]:(ENSWRAP) SIMODE1[0xa4]:(ENSCSIPERR|ENSCSIRST|ENSELTIMO)
kernel: SXFRCTL0[0x80]:(DFON) DFCNTRL[0x0] DFSTATUS[0x89]:(FIFOEMP|HDONE|PRELOAD_AVAIL)
kernel: STACK: 0x0 0x167 0x10d 0x3
kernel: SCB count = 70
kernel: Kernel NEXTQSCB = 43
kernel: Card NEXTQSCB = 43
kernel: QINFIFO entries:
kernel: Waiting Queue entries:
kernel: Disconnected Queue entries:
kernel: QOUTFIFO entries:
kernel: Sequencer Free SCB List: 10 0 20 23 12 15 19 29 18 28 26 7 5 27 9 17 30 1 2 16 4
kernel: Sequencer SCB Info:
kernel: 0 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x7]
kernel: SCB_LUN[0x0] SCB_TAG[0xff]
kernel: 1 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x7]
kernel: SCB_LUN[0x0] SCB_TAG[0xff]
kernel: 2 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x7]
kernel: SCB_LUN[0x0] SCB_TAG[0xff]
Re: 5.4-p1 crash
On Fri, 17 Jun 2005, Mitch Parks wrote:
> Below are details regarding another crash on a Dell 2600 SMP (HTT and USB disabled). It has been 9 days since the last crash. I didn't have the serial console in place for this last crash, but it is now.

As noted, the ttwakeup() panic is a known bug. The best thing we have for a fix is this patch:

http://people.freebsd.org/~mlaier/tty.t_pgrp.diff

Please give it a try and report back if you have any more panics (or don't :-) ).

-- 
Doug White                    |  FreeBSD: The Power to Serve
[EMAIL PROTECTED]             |  www.FreeBSD.org
Re: Weird fdisk behavior
On Sat, 18 Jun 2005, Baldur Gislason wrote:
> I am trying to add another partition to my root drive, it has a few gigabytes of unpartitioned space. Whenever I try to run fdisk -u it says
> cannot open disk /dev/ad0: No such file or directory
> ad0 does exist, why does fdisk say otherwise? fdisk can display the partition table but it can't alter it. securelevel is -1 and this is FreeBSD 5.4-STABLE from the beginning of May this year.

Be aware that fdisk will fake up a partition table if the volume has no table. Look for a message like:

warning: invalid partition table found

If you see that then there is no FDISK partition table.

-- 
Doug White                    |  FreeBSD: The Power to Serve
[EMAIL PROTECTED]             |  www.FreeBSD.org
Re: Weird fdisk behavior
Hi,

On 18/06/2005, at 1:28 PM, Baldur Gislason wrote:
> I am trying to add another partition to my root drive, it has a few gigabytes of unpartitioned space. Whenever I try to run fdisk -u it says
> cannot open disk /dev/ad0: No such file or directory
> ad0 does exist, why does fdisk say otherwise? fdisk can display the partition table but it can't alter it. securelevel is -1 and this is FreeBSD 5.4-STABLE from the beginning of May this year.

Try setting

sysctl kern.geom.debugflags=16

before running fdisk. This stops GEOM from protecting the drive/partition and allows fdisk to open it.

I think fdisk and bsdlabel have been taught about GEOM in 6-current, but I might very well be wrong.

Cheers

Phil Murray
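The workaround above can be wrapped so that the protection is restored afterwards. This is a sketch only: `fdisk_unlocked` is a hypothetical helper, the value 16 is the one given in the message, and it has to run as root. Lifting the protection on a disk with mounted file systems is risky, so the flag is put back to its previous value as soon as fdisk exits.

```sh
#!/bin/sh
# fdisk_unlocked ARGS...
# Temporarily set kern.geom.debugflags=16, run fdisk with ARGS, then
# restore the previous value and return fdisk's exit status.
fdisk_unlocked() {
    old=$(sysctl -n kern.geom.debugflags) || return 1
    sysctl kern.geom.debugflags=16 || return 1
    rc=0
    fdisk "$@" || rc=$?
    sysctl kern.geom.debugflags="$old"
    return "$rc"
}
```

Usage for the case in this thread would be `fdisk_unlocked -u ad0`.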
Re: NFS-related hang in 5.4?
On 19. jun. 2005, at 20.06, Robert Watson wrote: On Sun, 19 Jun 2005, Eirik verby wrote: when doing large file transfers (backing up jails using tar+gzip to a neighboring server), NFS has a tendency to lock up on me. This usually happens after quite a while - like a few hours or so. Also, before the hang, performance is generally bad. Hmm. Looks like a bug in dummynet. ipfw should not be directly re- injecting UDP traffic back into the input path from an outbound path, or it risks re-entering, generating lock order problems, etc. It should be getting dropped into the netisr queue to be processed from the netisr context. This problem would exist across all 5.4 installations, both i386 and amd64? Would it depend on heavy load, or could it theoretically happen at any time when there's traffic? All three of my fbsd5 servers (dual opteron, dual p3-1ghz, dual p3-700mhz) are experiencing random hangs with ~a few weeks between, impression is that if running single-cpu mode they are all stable. All using dummynet in a comparable manner. Ideas? Is it possible to configure dummynet out of your configuration, and see if the problem goes away? I'm running a test right now, will let you know in the morning. 
Robert N M Watson KDB trace: db trace Tracing pid 56 tid 100064 td 0xc1a18600 kdb_enter(c096bad3,4,480758,c08dcbf9,f5) at kdb_enter+0x30 siointr1(c1a8e000,c1a18600,c1a148d4,c1a12700,c1a12700) at siointr1 +0xe7 siointr(c1a8e000,0,0,4,c1a18600) at siointr+0x78 intr_execute_handlers(c19bd090,d54807bc,d5480818,c08d05a3,34) at intr_execute_handlers+0x88 lapic_handle_intr(34) at lapic_handle_intr+0x3a Xapic_isr1() at Xapic_isr1+0x33 --- interrupt, eip = 0xc06b8490, esp = 0xd5480800, ebp = 0xd5480818 --- _mtx_lock_sleep(c0a1cd2c,c1a18600,0,0,0) at _mtx_lock_sleep+0xb0 udp_input(c2d4,14,c1a99000,1,0) at udp_input+0x257 ip_input(c2d4,0,0,0,0) at ip_input+0x590 transmit_event(c1c64100,2094,0,c1d58a80,7f4220) at transmit_event+0x107 ready_event_wfq(c1c64100,2094,0,c1d58a80,c06d860a) at ready_event_wfq+0x511 dummynet_io(c2bd2e00,64,1,d54809c8,c2bd2e00) at dummynet_io+0x519 ipfw_check_out(0,d5480a24,c1a99000,2,c1d1821c) at ipfw_check_out+0xf1 pfil_run_hooks(c0a1c160,d5480a9c,c1a99000,2,c1d1821c) at pfil_run_hooks+0x138 ip_output(c2bd2e00,0,0,0,0) at ip_output+0x593 udp_output(c1d1821c,c2bd2e00,0,0,c1a18600) at udp_output+0x597 udp_send(c2242654,0,c1e12100,0,0) at udp_send+0x30 sosend(c2242654,0,0,c1e12100,0) at sosend+0x6f1 nfs_send(c2242654,c1d57860,c1e12100,c2313900,1c) at nfs_send+0xc9 nfs_request(c22cf108,c1e12a00,7,0,c20bb300) at nfs_request+0x342 nfs_writerpc(c22cf108,d5480ca4,c20bb300,d5480c94,d5480c98) at nfs_writerpc+0x2a0 nfs_doio(cbf75e08,c20bb300,0,c094f9b4,0) at nfs_doio+0x508 nfssvc_iod(c0a21828,d5480d38,0,0,0) at nfssvc_iod+0x1db fork_exit(c07c5150,c0a21828,d5480d38) at fork_exit+0x80 fork_trampoline() at fork_trampoline+0x8 --- trap 0x1, eip = 0, esp = 0xd5480d6c, ebp = 0 --- I cannot seem to kill process 56 (nfsiod), so I have to reset the box. Anyone got a clue? What can I do to ease debugging here? Next time it happens I can probably make a dump, at least I will have a debug kernel running then. 
/Eirik
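The re-entry Robert describes, where the outbound path calls straight back into the input path instead of handing the packet to a queue, can be illustrated with a small userland model. This is only a sketch: every name in it (`toy_stack`, `toy_output`, `toy_input`, `toy_run_queue`) is hypothetical and none of it is FreeBSD kernel code. It also assumes that the lock held on the way out is one the input path needs, which is what the trace suggests (udp_output ... dummynet_io ... ip_input, udp_input, _mtx_lock_sleep) but is not confirmed in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are real FreeBSD kernel symbols. */

#define QUEUE_CAP 16

struct toy_stack {
    bool   lock_held;          /* stands in for a non-recursive mutex */
    int    reentry_conflicts;  /* times input ran while the lock was held */
    int    delivered;          /* packets the input path handled cleanly */
    int    queue[QUEUE_CAP];   /* stands in for a deferred-work queue */
    size_t queue_len;
};

/* Input path: needs the lock, so it must not run while it is already held. */
void toy_input(struct toy_stack *s, int pkt)
{
    assert(s != NULL);
    (void)pkt;
    if (s->lock_held) {
        /* A real kernel thread would block or deadlock at this point. */
        s->reentry_conflicts++;
        return;
    }
    s->lock_held = true;
    s->delivered++;
    s->lock_held = false;
}

/* Drain deferred packets from a context that holds no locks. */
void toy_run_queue(struct toy_stack *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < s->queue_len; i++)
        toy_input(s, s->queue[i]);
    s->queue_len = 0;
}

/* Output path: holds the lock while the packet-filter hook runs. */
void toy_output(struct toy_stack *s, int pkt, bool defer)
{
    assert(s != NULL);
    s->lock_held = true;
    if (!defer)
        toy_input(s, pkt);                 /* direct re-injection */
    else if (s->queue_len < QUEUE_CAP)
        s->queue[s->queue_len++] = pkt;    /* hand off for later; dropped if full */
    s->lock_held = false;
}
```

Calling `toy_output(&s, 1, false)` records a conflict because the input path runs while the lock is still held. Calling `toy_output(&s, 1, true)` followed by `toy_run_queue(&s)` delivers the packet only after the output path has released the lock, which is the shape of the fix Robert suggests.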
Re: ATA_DMA errors (and fs corruption!)
Here is my controller:

atapci0: Intel ICH4 UDMA100 controller port 0x1860-0x186f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0

The last known good stable version for me was approx. April 25, and my next cvsup was May 17, but I have problems with 5.4 Release so I assume (probably incorrectly) that something changed between April 25 and 5.4R. I don't exactly recall my shutdown errors, but I did have to restore my file systems to get my laptop back to a functioning state.

On 6/19/05, Martin [EMAIL PROTECTED] wrote:

twesky wrote:

I am having ATA_DMA errors on 5.4R and 5 STABLE up to June 16 (haven't done a cvsup again). It doesn't happen on 5.3R or lower.

I have got the same problem. I tried yesterday's kernel and I got lots of ATA DMA errors. A question: do you have a VIA IDE controller like mine?

atapci0: VIA 8235 UDMA133 controller port 0xfc00-0xfc0f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 17.1 on pci0

[EMAIL PROTECTED]:17:1: class=0x01018a card=0x05711849 chip=0x05711106 rev=0x06 hdr=0x00
    vendor = 'VIA Technologies Inc'
    device = 'VT82 EIDE Controller (All VIA Chipsets)'
    class = mass storage
    subclass = ATA

Today, I noticed, the short experiment with the latest -STABLE destroyed a part of my /usr partition.
It looked like this (with May 9th kernel today):

kernel: handle_workitem_freeblocks: block count
kernel: bad block 50333952, ino 1743780
kernel: pid 56 (syncer), uid 0 inumber 1743780 on /usr: bad block
kernel: bad block 3221252091, ino 1743780
klotz kernel: pid 56 (syncer), uid 0 inumber 1743780 on /usr: bad block
kernel: bad block 144119931884736777, ino 1743780
kernel: pid 56 (syncer), uid 0 inumber 1743780 on /usr: bad block
kernel: bad block 72340173158093844, ino 1743780
kernel: pid 56 (syncer), uid 0 inumber 1743780 on /usr: bad block
kernel: bad block 1104111992832, ino 1743780
kernel: pid 56 (syncer), uid 0 inumber 1743780 on /usr: bad block
kernel: handle_workitem_freeblocks: block count
kernel: handle_workitem_freeblocks: block count
kernel: bad block 1865342872522620032, ino 1743783

While shutting down I got this:

Jun 19 22:04:21 klotz kernel: /usr: unmount pending error: blocks -3561100369582 68157 files 0

I restored the fs in single-user mode. And now it runs fine with the kernel (May 9th). See also my earlier post.

Martin
Re: 5.4-p1 crash
On Sun, 19 Jun 2005, Doug White wrote:

On Fri, 17 Jun 2005, Mitch Parks wrote:

Below are details regarding another crash on a Dell 2600 SMP (HTT and USB disabled). It has been 9 days since the last crash. I didn't have the serial console in place for this last crash, but it is now.

As noted, the ttwakeup() panic is a known bug. The best thing we have for a fix is this patch:

http://people.freebsd.org/~mlaier/tty.t_pgrp.diff

Please give it a try and report back if you have any more panics (or don't :-) ). Thanks!

This patch appears to be for 5.3, but I manually applied the chunk of the patch that didn't apply cleanly and the countdown is on. I'll report back in 10 days unless something bad happens before then. Below is the patch chunk #10 that I actually applied rather than the one given. If I've done something bad here by removing the PGRP_LOCK please let me know.

Hunk #6 succeeded at 1154 (offset -51 lines).
Hunk #7 succeeded at 1215 (offset -6 lines).
Hunk #8 succeeded at 1203 (offset -51 lines).
Hunk #9 succeeded at 1946 (offset -5 lines).
Hunk #10 failed at 2562.
Hunk #11 succeeded at 2847 (offset -212 lines).
1 out of 11 hunks failed--saving rejects to tty.c.rej

@@ -2495,19 +2511,21 @@
  * On return following a ttyprintf(), we set tp->t_rocount to 0 so
  * that pending input will be retyped on BS.
  */
+ sx_slock(&proctree_lock);
  if (tp->t_session == NULL) {
+     sx_sunlock(&proctree_lock);
      ttyprintf(tp, "not a controlling terminal\n");
      tp->t_rocount = 0;
      return;
  }
  if (tp->t_pgrp == NULL) {
+     sx_sunlock(&proctree_lock);
      ttyprintf(tp, "no foreground process group\n");
      tp->t_rocount = 0;
      return;
  }
- PGRP_LOCK(tp->t_pgrp);
- if ((p = LIST_FIRST(&tp->t_pgrp->pg_members)) == 0) {
-     PGRP_UNLOCK(tp->t_pgrp);
+ if ((p = LIST_FIRST(&tp->t_pgrp->pg_members)) == NULL) {
+     sx_sunlock(&proctree_lock);
      ttyprintf(tp, "empty foreground process group\n");
      tp->t_rocount = 0;
      return;

Or the complete patch:

http://kuoi.asui.uidaho.edu/~mitch/crash/tty_5.4.patch

Mitch Parks
[EMAIL PROTECTED]
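The hunk above takes one shared lock at the top and releases it on every early-return path before reporting. A minimal userland sketch of that pattern follows, under stated assumptions: `toy_sx`, `toy_tty`, `toy_pgrp`, `toy_ttyinfo` and the status codes are hypothetical stand-ins, not the kernel's sx(9) API or tty structures, and the final unlock on the success path is assumed because the hunk does not show the rest of the function.

```c
#include <assert.h>
#include <stddef.h>

/* Userland stand-ins; these are NOT the kernel's sx(9) lock or tty types. */
struct toy_sx { int shared_holders; };

void toy_slock(struct toy_sx *l)
{
    l->shared_holders++;
}

void toy_sunlock(struct toy_sx *l)
{
    assert(l->shared_holders > 0);   /* catches an unlock without a lock */
    l->shared_holders--;
}

struct toy_pgrp { const int *first_member; };  /* NULL when the group is empty */
struct toy_tty  { const void *session; struct toy_pgrp *pgrp; int rocount; };

enum toy_status { TOY_OK, TOY_NO_SESSION, TOY_NO_PGRP, TOY_EMPTY_PGRP };

/* Mirrors the hunk's control flow: lock once, unlock before each return. */
enum toy_status toy_ttyinfo(struct toy_tty *tp, struct toy_sx *tree_lock)
{
    toy_slock(tree_lock);
    if (tp->session == NULL) {
        toy_sunlock(tree_lock);
        tp->rocount = 0;
        return TOY_NO_SESSION;
    }
    if (tp->pgrp == NULL) {
        toy_sunlock(tree_lock);
        tp->rocount = 0;
        return TOY_NO_PGRP;
    }
    if (tp->pgrp->first_member == NULL) {
        toy_sunlock(tree_lock);
        tp->rocount = 0;
        return TOY_EMPTY_PGRP;
    }
    /* Assumed: the remainder of the function also drops the lock. */
    toy_sunlock(tree_lock);
    return TOY_OK;
}
```

Whichever branch is taken, `shared_holders` is back at zero when `toy_ttyinfo` returns. That balance is the property to check when hand-applying a rejected hunk like this one.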
ndis no longer detects netgear wg311v2
I recently moved, and for some reason my computer started crashing when I tried to make it associate with my new wireless network. (It worked fine on the old wireless network.) I figured the first thing I should do to fix the problem is update my sources, since it may have already been fixed. Unfortunately the updated sources don't recognize my wireless card at all.

Before the update (hand-transcribed):

# uname -a
FreeBSD lojak.washington.edu 5.4-PRERELEASE FreeBSD 5.4-PRERELEASE #0: Sat Apr 2 11:50:53 PST 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/CUSTOM i386
# kldload FwRad16.bin.ko
# kldload ndis
# kldload if_ndis
ndis0: NETGEAR WG311v2 802.11g Wireless PCI Adapter mem 0xfb02-0xfb03,0xfb04-0xfb041fff irq 16 at device 4.0 on pci2
ndis0: NDIS API version: 5.1
ndis0: Ethernet address: 00:09:5b:ba:da:ef
ndis0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
ndis0: 11g rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
# ifconfig ndis0
ndis0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
        ether 00:09:5b:da:ef
        media: IEEE 802.11 Wireless Ethernet autoselect
        status: no carrier
        ssid "" channel -1
        authmode OPEN powersavemode OFF powersavesleep 100
        rtsthreshold 2312 protmode CTS
        wepmode OFF weptxkey 1
# wicontrol ndis0 -l
0 stations:
# ifconfig ndis0 ssid "The Penthouse" channel 11 up
ndis0: link up
# dhclient ndis0
ndis0: link up

I can't see the whole panic screen, but here's the bottom:

panic: page fault
cpuid = 0
boot() called on cpu#0
Uptime: 11m20s
Dumping 1023 MB

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0x0
stack pointer           = 0x10:0xe4e1fcec
frame pointer           = 0x10:0xe4e1fd0c
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 5 (thread taskq)
trap number             = 12

I can work on getting the crash dump if that's useful, but the behavior seems to have changed in the last couple of months. Now I get:

# uname -a
FreeBSD lojak.washington.edu 5.4-STABLE FreeBSD 5.4-STABLE #0: Sat Jun 14:20:40 PDT 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/CUSTOM i386
# kldload FwRad16.bin.ko
# kldload ndis
warning: KLD '/boot/kernel/if_ndis.ko' is newer than the linker.hints file
# kldload if_ndis
kldload: can't load if_ndis: File exists
# kldstat
Id Refs Address    Size     Name
 1   10 0xc040     4a42f0   kernel
 2   14 0xc08a5000 56270    acpi.ko
 3    1 0xc2e83000 14000    FwRad16.bin.ko
 4    1 0xc2e97000 9000     if_ndis.ko
 5    1 0xc2ea     12000    ndis.ko
 6    1 0xc2ec5000 b000     pccard.ko
# ifconfig ndis0
ifconfig: interface ndis0 does not exist
# kldunload if_ndis
# kldload if_ndis
# ifconfig ndis0
ifconfig: interface ndis0 does not exist

I haven't seen anything in UPDATING that accounts for this. Does anyone have any idea of where to look for clues?

Thanks very much,
--
Evan Dower
Software Development Engineer
Amazon.com, Inc.
Public key: http://students.washington.edu/evantd/pgp-pub-key.txt
Key fingerprint = D321 FA24 4BDA F82D 53A9 5B27 7D15 5A4F 033F 887D
Re: SCSI troubles
On Mon, Jun 20, 2005 at 12:10:27AM +0200, Tomas Randa wrote:

I am running 5.4-STABLE (FreeBSD 5.4-STABLE #0: Sun May 8 18:23:40 CEST 2005) with an Adaptec 29320 or 2930U2 controller, and under heavy load on SCSI I have the following problems, ending with a system freeze. Does anybody have an opinion on where the problem could be?

kernel: ahc0: WARNING no command for scb 0 (cmdcmplt)
kernel: QOUTPOS = 252

Why are you using ahc() rather than ahd()? Have you tried the ahd() driver previously? What do you consider heavy load? I have had no problems with the ahd() and a 29320.

Bruce
--
I like bad!
Bruce Burden, Austin, TX.
- Thuganlitha
The Power and the Prophet
Robert Don Hughes