Panic in route.c:579 on SSH connect with 11-CURRENT at r293913
Hello,
with 11-CURRENT at r293913 I'm seeing this panic as soon as I try to
connect through SSH:

Unread portion of the kernel message buffer:
panic: _mtx_lock_sleep: recursed on non-recursive mutex rtentry @ /usr/src/sys/net/route.c:579

(kgdb) bt
#0 doadump (textdump=-2122574672) at pcpu.h:221
#1 0x803823b6 in db_fncall (dummy1=, dummy2=, dummy3=, dummy4=) at /usr/src/sys/ddb/db_command.c:568
#2 0x80381e4e in db_command (cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:440
#3 0x80381be4 in db_command_loop () at /usr/src/sys/ddb/db_command.c:493
#4 0x8038467b in db_trap (type=, code=0) at /usr/src/sys/ddb/db_main.c:251
#5 0x80a5d893 in kdb_trap (type=3, code=0, tf=) at /usr/src/sys/kern/subr_kdb.c:654
#6 0x80e6a2a8 in trap (frame=0xfe011b3b21e0) at /usr/src/sys/amd64/amd64/trap.c:556
#7 0x80e4ad47 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:234
#8 0x80a5cf7b in kdb_enter (why=0x8137b8dc "panic", msg=0x80 ) at cpufunc.h:63
#9 0x80a2046f in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:750
#10 0x80a202c6 in kassert_panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:647
#11 0x80a04441 in __mtx_lock_sleep (c=0xf80006b89cf0, tid=, opts=, file=, line=1) at /usr/src/sys/kern/kern_mutex.c:396
#12 0x80a0412d in __mtx_lock_flags (c=, opts=0, file=0x81395a63 "/usr/src/sys/net/route.c", line=579) at /usr/src/sys/kern/kern_mutex.c:222
#13 0x80b10ffe in rtredirect_fib (dst=0xfe011b3b2600, gateway=0xfe011b3b25f0, netmask=0x0, flags=6, src=0xfe011b3b25e0, fibnum=0) at /usr/src/sys/net/route.c:579
#14 0x80b6cad7 in icmp_input (mp=0xfe011b3b2670, offp=0xfe011b3b266c, proto=1) at /usr/src/sys/netinet/ip_icmp.c:614
#15 0x80b6d5cd in ip_input (m=0x4) at /usr/src/sys/netinet/ip_input.c:786
#16 0x80b0c861 in netisr_dispatch_src (proto=, source=, m=0xf80006720b00) at /usr/src/sys/net/netisr.c:972
#17 0x80b029be in ether_demux (ifp=, m=) at /usr/src/sys/net/if_ethersubr.c:803
#18 0x80b03704 in ether_nh_input (m=) at /usr/src/sys/net/if_ethersubr.c:609
#19 0x80b0c861 in netisr_dispatch_src (proto=, source=, m=0xf80006720b00) at /usr/src/sys/net/netisr.c:972
#20 0x80b02cbf in ether_input (ifp=0xf80003f2b000, m=0x0) at /usr/src/sys/net/if_ethersubr.c:713
#21 0x808a1b43 in vtnet_rxq_eof (rxq=0xf80003f06e00) at /usr/src/sys/dev/virtio/network/if_vtnet.c:1732
#22 0x808a284e in vtnet_rx_vq_intr (xrxq=0xf80003f06e00) at /usr/src/sys/dev/virtio/network/if_vtnet.c:1863
#23 0x809e8ef6 in intr_event_execute_handlers (p=, ie=0xf80003ede200) at /usr/src/sys/kern/kern_intr.c:1262
#24 0x809e9586 in ithread_loop (arg=0xf80003cbbc60) at /usr/src/sys/kern/kern_intr.c:1275
#25 0x809e67b4 in fork_exit (callout=0x809e94e0 , arg=0xf80003cbbc60, frame=0xfe011b3b29c0) at /usr/src/sys/kern/kern_fork.c:1010
#26 0x80e4b27e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:609
#27 0x in ?? ()
Current language: auto; currently minimal

This is a bhyve VM with a VirtIO network adapter:

virtio_pci0: port 0x2000-0x201f mem 0xc000-0xc0001fff irq 16 at device 2.0 on pci0
vtnet0: on virtio_pci0
vtnet0: Ethernet address: 00:a0:98:51:ed:26
001.48 [ 421] vtnet_netmap_attach max rings 1
vtnet0: netmap queues/slots: TX 1/1024, RX 1/1024
001.49 [ 426] vtnet_netmap_attach virtio attached txq=1, txd=1024 rxq=1, rxd=1024

This may be caused by the recent routing work, but I'm not quite sure.
I have the dump and I'm able to reproduce this easily, so more
information can be provided if necessary.

Regards,
Yamagi

--
Homepage: www.yamagi.org
XMPP: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: Panic in route.c:579 on SSH connect with 11-CURRENT at r293913
Hello,
updating to r294020 solves the issue for me. Thank you. :)

Regards,
Yamagi

On Thu, 14 Jan 2016 19:31:45 +0300
Alexander V. Chernikov <melif...@freebsd.org> wrote:

> 14.01.2016, 19:16, "Alexander V. Chernikov" <melif...@freebsd.org>:
> > 14.01.2016, 18:29, "Yamagi Burmeister" <li...@yamagi.org>:
> >> Hello,
> >> with 11-CURRENT at r293913 I'm seeing this panic as soon as I'm trying
> >> to connect through SSH:
> >>
> >> Unread portion of the kernel message buffer:
> >> panic: _mtx_lock_sleep: recursed on non-recursive mutex rtentry
> >> @ /usr/src/sys/net/route.c:579
> >
> > This seems to be caused by r293466. I'll do more investigation and reply.
> Should be fixed in r294020.

--
Homepage: www.yamagi.org
XMPP: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
Re: Enabling VIMAGE by default for FreeBSD 11?
Hello,
it's been a while since I tested VIMAGE, but the last time, somewhere
in 10-CURRENT, some UMA memory leaks were left when destroying vnets.
They weren't showstoppers for most workloads, but pretty annoying...
Have those been fixed?

Regards,
Yamagi

On Sat, 11 Oct 2014 10:58:13 -0700
Craig Rodrigues <rodr...@freebsd.org> wrote:

> Hi,
>
> What action items are left to enable VIMAGE by default for FreeBSD 11?
>
> Not everyone uses bhyve, so VIMAGE is quite useful when using jails.
>
> --
> Craig

--
Homepage: www.yamagi.org
XMPP: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
Re: Kernel memory corruption(?) with age(4)
On Fri, 1 Apr 2011, YongHyeon PYUN wrote:

> On Thu, Mar 31, 2011 at 09:59:12PM +0200, Yamagi Burmeister wrote:
>> On Thu, 31 Mar 2011, YongHyeon PYUN wrote:
>>> Thanks a lot! It seems the L1 controller has data corruption issue
>>> when 64bit DMA addressing is used. Try this one.
>>>
>>> Oops, there was a bug in previous patch. Try this instead.
>>
>> Okay, that patch seems to do the trick. This was just a short test
>> run of about one hour with just 50gb copied, but without the patch
>> the system would have crashed in the first 20 minutes. I'll do a
>> more comprehensive test over night and report back tomorrow morning.
>
> Fix committed to HEAD(r220249, r220252).
> Thanks a lot for testing!

No problem.

--
Homepage: www.yamagi.org
Jabber: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
Re: Kernel memory corruption(?) with age(4)
On Wed, 30 Mar 2011, YongHyeon PYUN wrote:

>> Okay, I did a test run with RX checksum, TX checksum and both
>> disabled. In all three cases the crash occurs within about 20
>> minutes. I'm not sure either that age(4) is the problem, but it
>> definitely has something to do with it, since with another NIC
>> driver the same scenario is rock solid...
>
> OK.
>
>> The workload: It's an NFS3 server (FreeBSD's non-experimental
>> implementation), serving and receiving files of about 250 to 500
>> megabytes at about 20mb/s. The clients are FreeBSD 7 and 8 systems
>> and are mounting the shares via TCP. The connection is 1000mbit/s
>> via a dumb gigabit switch.
>
> That's too broad to narrow down the issue. :-(
> I'm not sure but your box seems to have more than 4GB memory. Could
> you limit the available memory to 3GB via loader.conf and test it
> again?

All boxes are quadcore machines with 8GB RAM, running FreeBSD/amd64.

After limiting the memory via hw.physmem to 3GB the problems are gone.
The box has been running crash-free for more than 6 hours and has
served over 300GB of data via age(4).

--
Homepage: www.yamagi.org
Jabber: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
Re: Kernel memory corruption(?) with age(4)
On Thu, 31 Mar 2011, YongHyeon PYUN wrote:

>> All boxes are quadcore machines with 8GB RAM, running FreeBSD/amd64.
>>
>> After limiting the memory via hw.physmem to 3GB the problems are
>> gone. The box is running crashfree for more than 6 hours and has
>> served over 300GB of data via age(4).
>
> Thanks for testing. Remove the hw.physmem configuration and try
> attached patch and let me know how it goes.

Thanks for your help, but the patch doesn't work. Another random
panic - this time a page fault in kernel mode - with nothing related
to age(4) or the network stack in the backtrace...

Maybe it'll help to know about a bug fix in the Linux atl1 driver, now
replaced by atlx. In git commit
5f08e46b621a769e52a9545a23ab1d5fb2aec1d4 64 bit DMA was disabled:

  64-bit DMA causes data corruption with atl1. We don't know why, and
  Atheros is working on it. For now, just use 32-bit DMA. This is a
  big hack that is probably wrong, but it stops the bleeding.

There was no later follow-up on it. I think that this can't be a
problem on FreeBSD, but maybe I'm reading the driver code wrong. The
kernel.org gitweb URL is:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.23.y.git;a=commitdiff;h=5f08e46b621a769e52a9545a23ab1d5fb2aec1d4

--
Homepage: www.yamagi.org
Jabber: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
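To make the 4GB discussion concrete: on a machine with 8GB of RAM a DMA buffer can end up above the 4GB mark or straddle it, while limiting physical memory to 3GB keeps every buffer below it. The two checks below are a rough sketch of that distinction only. The names dma_fits_32bit() and dma_crosses_4gb() are invented for illustration and are not taken from age(4) or from the patches mentioned in this thread; both assume len > 0 and that addr + len does not wrap around.

```c
#include <stdbool.h>
#include <stdint.h>

#define DMA_32BIT_LIMIT	UINT64_C(0x100000000)	/* 4GB */

/*
 * Hypothetical helper: true if the buffer [addr, addr + len) lies
 * entirely below 4GB, i.e. is reachable with 32-bit DMA addresses.
 */
static bool
dma_fits_32bit(uint64_t addr, uint64_t len)
{
	return (addr < DMA_32BIT_LIMIT && len <= DMA_32BIT_LIMIT - addr);
}

/*
 * Hypothetical helper: true if the first and last byte of the buffer
 * differ in their upper 32 address bits, i.e. the buffer straddles a
 * 4GB boundary.
 */
static bool
dma_crosses_4gb(uint64_t addr, uint64_t len)
{
	uint64_t last = addr + len - 1;

	return ((addr >> 32) != (last >> 32));
}
```

For example, a 4KB buffer at 0xFFFFF000 ends exactly at the 4GB mark and still fits, while an 8KB buffer at the same address crosses the boundary.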
Re: Kernel memory corruption(?) with age(4)
On Thu, 31 Mar 2011, YongHyeon PYUN wrote:

> Thanks a lot! It seems the L1 controller has data corruption issue
> when 64bit DMA addressing is used. Try this one.
>
> Oops, there was a bug in previous patch. Try this instead.

Okay, that patch seems to do the trick. This was just a short test run
of about one hour with just 50gb copied, but without the patch the
system would have crashed in the first 20 minutes. I'll do a more
comprehensive test over night and report back tomorrow morning.

--
Homepage: www.yamagi.org
Jabber: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
Kernel memory corruption(?) with age(4)
Hi,
I recently got four roughly two-year-old Asus M3A-H/HDMI mainboards
with an integrated Attansic L1 ethernet controller. This NIC is
supported by age(4) and recognized by FreeBSD:

age0: <Attansic Technology Corp, L1 Gigabit Ethernet> mem 0xfeac-0xfeaf irq 18 at device 0.0 on pci2
age0: 1280 Tx FIFO, 2364 Rx FIFO
age0: Using 1 MSI messages.
age0: 4GB boundary crossed, switching to 32bit DMA addressing mode.
miibus0: <MII bus> on age0
atphy0: <Atheros F1 10/100/1000 PHY> PHY 0 on miibus0
atphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto
age0: Ethernet address: 00:23:54:31:a0:12
age0: [FILTER]

age0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=c319b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MCAST,WOL_MAGIC,VLAN_HWTSO,LINKSTATE>
	ether 00:23:54:31:a0:12
	inet6 fe80::223:54ff:fe31:a012%age0 prefixlen 64 scopeid 0x1
	nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
	media: Ethernet autoselect (none)
	status: no carrier

All four boxes are unstable if the Attansic NIC is in use; none of
them survived more than 60 minutes of ~20mb/s network traffic. I
managed to get some coredumps and extracted the backtraces. Since
every time one of the boxes panicked I got a different panic message
and a different backtrace with a different subsystem involved, I
suspected broken hardware.

I plugged an em(4) NIC into the PCI slot and wasn't able to reproduce
the problem; in fact the boxes ran rock solid for several days. Next
I set up Windows 7, installed the Attansic vendor driver and did
another run. All went smoothly, no crash for nearly 24 hours.

My guess is kernel memory corruption by age(4), which would explain
all the different backtraces and the different panic messages. This
problem is reproducible in at least FreeBSD 7.4 and 8.2, with TSO4
both enabled and disabled. I'm willing to debug this, but I really
don't know how. So any help or a pointer in the right direction would
be appreciated.

Three backtraces, all of them occurred while receiving and sending
data via NFS over the age(4) NIC:

panic: initiate_write_filepage: dir inum 50001080 != new 0
cpuid = 2

#0 doadump () at /usr/src/sys/kern/kern_shutdown.c:251
#1 0x8018604c in db_fncall (dummy1=Variable dummy1 is not available. ) at /usr/src/sys/ddb/db_command.c:548
#2 0x80186381 in db_command (last_cmdp=0x806178c0, cmd_table=Variable cmd_table is not available. ) at /usr/src/sys/ddb/db_command.c:445
#3 0x801865d0 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#4 0x80188619 in db_trap (type=Variable type is not available. ) at /usr/src/sys/ddb/db_main.c:229
#5 0x8024d7fe in kdb_trap (type=3, code=0, tf=0xff8243513720) at /usr/src/sys/kern/subr_kdb.c:546
#6 0x80424366 in trap (frame=0xff8243513720) at /usr/src/sys/amd64/amd64/trap.c:566
#7 0x8040c234 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:224
#8 0x8024d99d in kdb_enter (why=0x80479419 panic, msg=0xa Address 0xa out of bounds) at cpufunc.h:63
#9 0x8021c4f0 in panic (fmt=Variable fmt is not available. ) at /usr/src/sys/kern/kern_shutdown.c:575
#10 0x80c5925e in softdep_fsync_mountdev () from /boot/kernel/ufs.ko
#11 0xff00067a0460 in ?? ()
#12 0x in ?? ()
#13 0xff0167d49988 in ?? ()
#14 0xff000694000e in ?? ()
#15 0xff0006b32800 in ?? ()
#16 0xff81ef201bd0 in ?? ()
#17 0xff81ef201bd0 in ?? ()
#18 0xff0006b613b0 in ?? ()
#19 0xff0006b614c8 in ?? ()
#20 0xff0156024878 in ?? ()
#21 0xff8243513980 in ?? ()
#22 0x80c5c174 in ffs_flushfiles () from /boot/kernel/ufs.ko
#23 0xff81ef201bd0 in ?? ()
#24 0xff013c210a80 in ?? ()
#25 0x0004 in ?? ()
#26 0x in ?? ()
#27 0xff82435139b0 in ?? ()
#28 0x80c3ea25 in ufs_do_nfs4_acl_inheritance () from /boot/kernel/ufs.ko
#29 0xff82435139b0 in ?? ()
#30 0x80459fb5 in VOP_STRATEGY_APV (vop=0xff00067a0460, a=0xff0167d49980) at vnode_if.c:2169
Previous frame inner to this frame (corrupt stack?)

Fatal trap 9: general protection fault while in kernel mode
cpuid = 2; apic id = 02
instruction pointer = 0x20:0x8020ca0e
stack pointer = 0x28:0xff82435139e0
frame pointer = 0x28:0xff8243513a00
code segment = base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 21 (syncer)

#0 doadump () at /usr/src/sys/kern/kern_shutdown.c:251
#1 0x8018604c in db_fncall (dummy1=Variable dummy1 is not available. ) at /usr/src/sys/ddb/db_command.c:548
#2 0x80186381 in db_command (last_cmdp=0x806178c0, cmd_table=Variable cmd_table is not available. )
Re: Kernel memory corruption(?) with age(4)
On Wed, 30 Mar 2011, YongHyeon PYUN wrote:

> On Wed, Mar 30, 2011 at 04:22:23PM +0200, Yamagi Burmeister wrote:
>> All four boxes are unstable if the Attansic NIC is in use; none of
>> them survived more than 60 minutes of ~20mb/s network traffic. I
>> managed to get some coredumps and extracted the backtraces. Since
>> every time one of the boxes panicked I got a different panic message
>> and a different backtrace with a different subsystem involved, I
>> suspected broken hardware.
>>
>> I plugged an em(4) NIC into the PCI slot and wasn't able to
>> reproduce the problem; in fact the boxes ran rock solid for several
>> days. Next I set up Windows 7, installed the Attansic vendor driver
>> and did another run. All went smoothly, no crash for nearly 24 hours.
>>
>> My guess is kernel memory corruption by age(4), which would explain
>> all the different backtraces and the different panic messages. This
>> problem is reproducible in at least FreeBSD 7.4 and 8.2, with TSO4
>> both enabled and disabled. I'm willing to debug this, but I really
>> don't know how. So any help or a pointer in the right direction
>> would be appreciated.
>
> AFAIK this is the first report for possible memory corruption
> triggered by age(4). I'm still not sure whether it's caused by age(4)
> but you can disable RX checksum offloading and see whether that makes
> any difference. Since I have no longer access to the hardware it
> would be even better if you can tell me which traffic pattern
> triggered the issue.

Okay, I did a test run with RX checksum, TX checksum and both
disabled. In all three cases the crash occurs within about 20 minutes.
I'm not sure either that age(4) is the problem, but it definitely has
something to do with it, since with another NIC driver the same
scenario is rock solid...

The workload: It's an NFS3 server (FreeBSD's non-experimental
implementation), serving and receiving files of about 250 to 500
megabytes at about 20mb/s. The clients are FreeBSD 7 and 8 systems and
are mounting the shares via TCP. The connection is 1000mbit/s via a
dumb gigabit switch.

--
Homepage: www.yamagi.org
Jabber: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
Re: Juniper e3k with ports limitied to 100Mbit and re NICs on MSI MoBo: problems with duplex negotiation (Hetzner host provider discard FreeBSD support due this bug)
On Tue, 11 Jan 2011, Lev Serebryakov wrote:

> Very large and famous (due to very attractive prices) hosting
> provider Hetzner.de discards FreeBSD support on dedicated servers,
> because these servers cannot negotiate 100Mbit/DUPLEX when switches'
> ports are limited to 100Mbit (1Gbit connection costs additional
> money) only under FreeBSD. Linux works fine.
>
> Switches known to be Juniper e3k series.
> MoBos of servers are different assortment of MSI MoBos with Realtek
> (re driver) network-on-board.
>
> Symptoms are: NIC can not negotiate/set duplex when switch port is
> limited to 100Mbit/Duplex. Duplex can not be set even manually via
> ifconfig:
>
> media: Ethernet 100baseTX full-duplex (100baseTX half-duplex)
>
> Is it a known problem? Maybe, -CURRENT driver has fix for it?
>
> Unfortunately, I can not provide more information, as I don't have
> server at Hetzner (I'm planning to order one, but due to these
> problems, I'm not sure now, as I need FreeBSD), and all this
> information is collected in communication with people who HAVE
> servers with FreeBSD installed.

Hi,
I've got several Hetzner EQ4 and on all these machines FreeBSD 8.1
runs just fine. I've never seen this strange negotiation problem
myself. But maybe I was just lucky and got working mainboard and NIC
combinations. So if further information is needed, I'm happy to
provide it.

Some data:

% ifconfig re0
re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=389b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
	[snip]
	nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
	media: Ethernet autoselect (100baseTX full-duplex)
	status: active

$ dmesg
re0: <RealTek 8168/8111 B/C/CP/D/DP/E PCIe Gigabit Ethernet> port 0xe800-0xe8ff mem 0xfbeff000-0xfbef,0xf6ff-0xf6ff irq 16 at device 0.0 on pci6
re0: Using 1 MSI messages
re0: Chip rev. 0x3c00
re0: MAC rev. 0x0040
miibus0: <MII bus> on re0
rgephy0: <RTL8169S/8110S/8211B media interface> PHY 1 on miibus0
rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
re0: Ethernet address: 40:61:86:f3:d7:20
re0: [FILTER]

Also have a look at the FreeBSD section in the Hetzner Wiki:
http://wiki.hetzner.de/index.php/FreeBSD
It's in German, but Google can translate it :)

Ciao,
Yamagi

--
Homepage: www.yamagi.org
Jabber: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
Re: Juniper e3k with ports limitied to 100Mbit and re NICs on MSI MoBo: problems with duplex negotiation (Hetzner host provider discard FreeBSD support due this bug)
On Tue, 11 Jan 2011, Bjoern A. Zeeb wrote:

>> I've got several Hetzner EQ4 and on all these machines FreeBSD 8.1
>> runs just fine. I've never seen this strange negotiation problem
>> myself. But maybe I was just lucky and got working mainboard and nic
>> combinations. So if further information is needed, I'm happy to
>> provide it.
>
> A lot of us do. There is a problem with the re(4) setup as well in
> that if you do not send packets out yourself the port takes a very
> long time to come up and unblocked. I haven't discussed that with
> them or tested with an updated HEAD (since end of October).

I never said that this problem doesn't exist. :) Lev Serebryakov said
that everything works fine in DC11 and DC12, and my servers are in
DC12, so I was just lucky...

> But yes, I am running HEAD on an EQ4 as well.
>
> If you have problems and a personal email contact at Hetzner feel
> free to talk to me. I am local (a couple of 100km away in the same
> country) and a FreeBSD committer and I can probably figure things out
> with them or properly proxy requests.

Sadly no. My only contact to Hetzner is the service e-mail address and
the phone number for business clients. They are for all customers and
probably can't help with such problems. There are special technical
contacts for each DC, but those are only available for customers with
hardware in that DC and with specific problems. So someone with a
server in DC13 could write a service request in which the problem is
explained and ask for help. Maybe they're willing to assist in
tracking down and solving the problem.

Ciao,
Yamagi

--
Homepage: www.yamagi.org
Jabber: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
Re: [patch] WOL support for nfe(4)
On Wed, 10 Nov 2010, Ian Smith wrote:

> On Tue, 9 Nov 2010, Pyun YongHyeon wrote:
>> On Tue, Nov 09, 2010 at 10:01:36PM +0100, Yamagi Burmeister wrote:
>>> On Tue, 9 Nov 2010, Pyun YongHyeon wrote:
> [..]
>>>> You can switch to suspend mode with acpiconf -s1. If all goes
>>>> well, driver would put the controller into suspend mode after
>>>> reprogramming controller to accept WOL frames. After that, you
>>>> can wakeup the box by sending a WOL magic packet.
>>>
>>> Okay, It thought that S3 is required. Put the box into S1, waited
>>> some minutes and send the magic packet. The video didn't resume but
>>> I was able to login via SSH. So waking up by sending the WOL magic
>>> packet works.
>>
>> Thanks for testing. Probably you want to poke jkim@ to address video
>> resume issue.
>
> It _may_ be just a matter of toggling the value of
> hw.acpi.reset_video ?

No, it doesn't. But... This is a ~5 years old die-hard server board.
Those machines are running headless; only this test box has a graphics
adapter plugged into it. Not even a new one, but an old Geforce FX
5300 PCIe which isn't supported by nVidia any more. The manpower
required to get this working is better spent on other ACPI tasks or
modern hardware.

--
Homepage: www.yamagi.org
Jabber: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
Re: [patch] WOL support for nfe(4)
On Wed, 10 Nov 2010, Pyun YongHyeon wrote:

> On Tue, Nov 09, 2010 at 01:34:21PM -0800, Pyun YongHyeon wrote:
>> On Tue, Nov 09, 2010 at 10:01:36PM +0100, Yamagi Burmeister wrote:
>>> On Tue, 9 Nov 2010, Pyun YongHyeon wrote:
>>>>> No, the link stays at 1000Mbps so the driver must manually
>>>>> switch back to 10/100Mbps.
>>>>
>>>> Hmm, this is real problem for WOL. Establishing 1000Mbps link to
>>>> accept WOL frames is really bad idea since it can draw more power
>>>> than 375mA. Consuming more power than 375mA is violation of PCI
>>>> specification and some system may completely shutdown the power
>>>> to protect hardware against over-current damage which in turn
>>>> means WOL wouldn't work anymore. Even if WOL work with 1000Mbps
>>>> link for all nfe(4) controllers, it would dissipate much more
>>>> power.
>>>>
>>>> Because nfe(4) controllers are notorious for using various PHYs,
>>>> it's hard to write a code to reliably establish 10/100Mbps link
>>>> in driver. In addition, nfe(4) is known to be buggy in link state
>>>> handling such that forced media selection didn't work well. I'll
>>>> see what could be done in this week if I find spare time.
>>>
>>> Hmm... Maybe just add a hint to the manpage that WOL is possible
>>> broken?
>>
>> I think this may not be enough. Because it can damage your hardware
>> under certain conditions if protection circuit was not there.
>
> Ok, I updated patch which will change link speed to 10/100Mbps when
> shutdown/suspend is initiated. You can get the patch at the following
> URL. Please give it a try and let me know whether it really changes
> link speed to 10/100Mbps. If it does not work as expected, show me
> the dmesg output of your system.
>
> http://people.freebsd.org/~yongari/nfe/nfe.wol.patch2

Okay, that does the trick. At shutdown the link speed is changed to
10/100Mbps; at boot - either via WOL magic packet or manual startup -
it's changed back to 1000Mbps.

Thanks again,
Yamagi

--
Homepage: www.yamagi.org
Jabber: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
Re: [patch] WOL support for nfe(4)
Thanks for your reply.

On Mon, 8 Nov 2010, Pyun YongHyeon wrote:

> Thanks for the patch. I attached slightly modified code to better
> match other WOL capable drivers in tree. Because data sheet is not
> available I blindly made a patch based on your code. I have a couple
> of questions which I can't verify on real hardware (I have no more
> access to the hardware).
>
> o If you established a gigabit link with link partner and shutdown
>   your box, does the established link automatically change to 10 or
>   100Mbps? You can check it on your link partner. If your link
>   partner still reports it established 1000Mbps link, we have to do
>   other necessary work in driver (i.e. manually switching to
>   10/100Mbps).

No, the link stays at 1000Mbps so the driver must manually switch back
to 10/100Mbps.

> o When you put your box into suspend mode, can you wake up your box
>   with WOL magic packet?

I'm sorry but I can't test that since none of those boxes supports
suspend:

% sysctl hw.acpi.suspend_state
hw.acpi.suspend_state: NONE

> o When your system boots up with/without WOL magic packet, sending
>   WOL magic packets from other hosts can hang your box?

No, they don't. No matter if the box was started by sending the WOL
magic packet or by hand, it survives all WOL packets I send to it.

> o If you disabled WOL with ifconfig before system shutdown, can you
>   still wakeup your box with WOL magic packet?

No, I can't. WOL is disabled and the box must be started manually.

> o If you reprogram your station address with ifconfig (i.e. ifconfig
>   nfe0 ether xx:xx:xx:xx:xx:xx), can you still wakeup your box with
>   WOL magic packet?

Yes, by sending the WOL magic packet to the new station address.
Sending it to the original address doesn't work.

> The patch I made didn't take into account management firmware so if
> you use the patch with IPMI, IPMI wouldn't work. But I think that's
> not an issue since all other parts of nfe(4) also ignores management
> firmware at this moment.

I can't test that, because none of these machines has the IPMI option
installed. Sorry.

Ciao,
Yamagi

--
Homepage: www.yamagi.org
Jabber: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
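For reference, the "WOL magic packet" used in these tests has a simple payload: six 0xFF bytes followed by the target station address repeated sixteen times. That is also why the packet has to carry the new address after the station address has been reprogrammed. The sketch below only builds that 102-byte payload; the function name wol_build_payload() and the macro names are made up for illustration, and sending the buffer (for example as a UDP broadcast) is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WOL_MAC_LEN	6	/* length of an Ethernet station address */
#define WOL_SYNC_LEN	6	/* leading 0xFF bytes */
#define WOL_MAC_REPEAT	16	/* copies of the station address */
#define WOL_PAYLOAD_LEN	(WOL_SYNC_LEN + WOL_MAC_REPEAT * WOL_MAC_LEN)

/*
 * Hypothetical helper: fill out[] with the magic packet payload for
 * the given station address. out must hold WOL_PAYLOAD_LEN bytes.
 */
static void
wol_build_payload(const uint8_t *mac, uint8_t *out)
{
	size_t i;

	/* Synchronization stream: six bytes of 0xFF. */
	memset(out, 0xff, WOL_SYNC_LEN);

	/* Sixteen back-to-back copies of the station address. */
	for (i = 0; i < WOL_MAC_REPEAT; i++)
		memcpy(out + WOL_SYNC_LEN + i * WOL_MAC_LEN, mac,
		    WOL_MAC_LEN);
}
```

A caller would pass the six address bytes of the interface to wake and a 102-byte buffer, then hand the buffer to whatever transport is used to reach the sleeping machine's network segment.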
Re: [patch] WOL support for nfe(4)
On Tue, 9 Nov 2010, Pyun YongHyeon wrote:

>> No, the link stays at 1000Mbps so the driver must manually switch
>> back to 10/100Mbps.
>
> Hmm, this is real problem for WOL. Establishing 1000Mbps link to
> accept WOL frames is really bad idea since it can draw more power
> than 375mA. Consuming more power than 375mA is violation of PCI
> specification and some system may completely shutdown the power to
> protect hardware against over-current damage which in turn means WOL
> wouldn't work anymore. Even if WOL work with 1000Mbps link for all
> nfe(4) controllers, it would dissipate much more power.
>
> Because nfe(4) controllers are notorious for using various PHYs, it's
> hard to write a code to reliably establish 10/100Mbps link in driver.
> In addition, nfe(4) is known to be buggy in link state handling such
> that forced media selection didn't work well. I'll see what could be
> done in this week if I find spare time.

Hmm... Maybe just add a hint to the manpage that WOL is possibly
broken? Nevertheless, thanks for your work, it's much appreciated :)

>>> o When you put your box into suspend mode, can you wake up your
>>>   box with WOL magic packet?
>>
>> I'm sorry but I can't test that since none of those boxes supports
>> suspend:
>>
>> % sysctl hw.acpi.suspend_state
>> hw.acpi.suspend_state: NONE
>
> You can switch to suspend mode with acpiconf -s1. If all goes well,
> driver would put the controller into suspend mode after reprogramming
> controller to accept WOL frames. After that, you can wakeup the box
> by sending a WOL magic packet.

Okay, I thought that S3 was required. I put the box into S1, waited
some minutes and sent the magic packet. The video didn't resume but I
was able to log in via SSH. So waking up by sending the WOL magic
packet works.

--
Homepage: www.yamagi.org
Jabber: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
[patch] WOL support for nfe(4)
Hi,
some time ago we migrated a lot of boxes from Linux to FreeBSD. Those
machines have a NVIDIA nForce4 CK804 MCP4 network adapter, supported
by nfe(4). Even though nfe(4) at least tries to enable the WOL
capability of the NIC, it doesn't work, and nfe(4) doesn't integrate
with FreeBSD's (new) WOL framework. Since we are in need of WOL I
spent some minutes to implement it the correct way.

Attached are two patches:
- if_nfe_wol_8.1.diff against FreeBSD 8.1-RELEASE-p1, this one is used
  on our servers.
- if_nfe_wol_current.diff against -CURRENT r214831. This one is
  _untested_! But it should work...

In case the patches are stripped by mailman they can be found here:
http://deponie.yamagi.org/freebsd/nfe/

This patch works reliably on our machines and nfe(4) runs without any
problems with it. Nevertheless, my skills in writing network drivers
are somewhat limited, therefore a review by someone with better
knowledge of the WOL framework and maybe nfe(4) itself would be highly
appreciated.

Ciao,
Yamagi

--
Homepage: www.yamagi.org
Jabber: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB

--- if_nfe.c	2010-11-05 10:41:04.672351879 +0100
+++ if_nfe.c	2010-11-05 10:41:09.259689584 +0100
@@ -125,6 +125,7 @@
 static void nfe_sysctl_node(struct nfe_softc *);
 static void nfe_stats_clear(struct nfe_softc *);
 static void nfe_stats_update(struct nfe_softc *);
+static void nfe_enable_wol(struct nfe_softc *);
 
 #ifdef NFE_DEBUG
 static int nfedebug = 0;
@@ -599,6 +600,10 @@
 	ifp->if_capabilities |= IFCAP_POLLING;
 #endif
 
+	/* Wake on LAN support */
+	ifp->if_capabilities |= IFCAP_WOL_MAGIC;
+	ifp->if_capenable = ifp->if_capabilities;
+
 	/* Do MII setup */
 	error = mii_attach(dev, &sc->nfe_miibus, ifp, nfe_ifmedia_upd,
 	    nfe_ifmedia_sts, BMSR_DEFCAPMASK, MII_PHY_ANY, MII_OFFSET_ANY, 0);
@@ -769,6 +774,10 @@
 
 	NFE_LOCK(sc);
 	ifp = sc->nfe_ifp;
+
+	/* Disable WOL bits */
+	NFE_WRITE(sc, NFE_WOL_CTL, 0);
+
 	if (ifp->if_flags & IFF_UP)
 		nfe_init_locked(sc);
 	sc->nfe_suspended = 0;
@@ -1752,6 +1761,12 @@
 			ifp->if_hwassist &= ~CSUM_TSO;
 		}
 
+		if ((mask & IFCAP_WOL) != 0 &&
+		    (ifp->if_capabilities & IFCAP_WOL) != 0) {
+			if ((mask & IFCAP_WOL_MAGIC) != 0)
+				ifp->if_capenable ^= IFCAP_WOL_MAGIC;
+		}
+
 		if (init > 0 && (ifp->if_drv_flags & IFF_DRV_RUNNING) != 0) {
 			ifp->if_drv_flags &= ~IFF_DRV_RUNNING;
 			nfe_init(sc);
@@ -2746,7 +2761,6 @@
 	NFE_WRITE(sc, NFE_STATUS, sc->mii_phyaddr << 24 | NFE_STATUS_MAGIC);
 
 	NFE_WRITE(sc, NFE_SETUP_R4, NFE_R4_MAGIC);
-	NFE_WRITE(sc, NFE_WOL_CTL, NFE_WOL_MAGIC);
 
 	sc->rxtxctl &= ~NFE_RXTX_BIT2;
 	NFE_WRITE(sc, NFE_RXTX_CTL, sc->rxtxctl);
@@ -2806,12 +2820,6 @@
 	/* abort Tx */
 	NFE_WRITE(sc, NFE_TX_CTL, 0);
 
-	/* disable Rx */
-	NFE_WRITE(sc, NFE_RX_CTL, 0);
-
-	/* disable interrupts */
-	nfe_disable_intr(sc);
-
 	sc->nfe_link = 0;
 
 	/* free Rx and Tx mbufs still in the queues. */
@@ -2923,9 +2931,12 @@
 	sc = device_get_softc(dev);
 
 	NFE_LOCK(sc);
+	nfe_enable_wol(sc);
+	NFE_UNLOCK(sc);
+
+	NFE_LOCK(sc);
 	ifp = sc->nfe_ifp;
 	nfe_stop(ifp);
-	/* nfe_reset(sc); */
 	NFE_UNLOCK(sc);
 
 	return (0);
@@ -3212,3 +3223,17 @@
 		stats->rx_broadcast += NFE_READ(sc, NFE_TX_BROADCAST);
 	}
 }
+
+static void
+nfe_enable_wol(struct nfe_softc *sc)
+{
+	struct ifnet *ifp;
+
+	NFE_LOCK_ASSERT(sc);
+
+	ifp = sc->nfe_ifp;
+
+	if ((ifp->if_capenable & IFCAP_WOL_MAGIC) != 0)
+		NFE_WRITE(sc, NFE_WOL_CTL, NFE_WOL_MAGIC);
+}
+
--- if_nfe.c	2010-11-05 10:36:43.300738161 +0100
+++ if_nfe.c	2010-11-05 10:39:04.712603916 +0100
@@ -125,6 +125,7 @@
 static void nfe_sysctl_node(struct nfe_softc *);
 static void nfe_stats_clear(struct nfe_softc *);
 static void nfe_stats_update(struct nfe_softc *);
+static void nfe_enable_wol(struct nfe_softc *);
 
 #ifdef NFE_DEBUG
 static int nfedebug = 0;
@@ -600,6 +601,10 @@
 	ifp->if_capabilities |= IFCAP_POLLING;
 #endif
 
+	/* Wake on LAN support */
+	ifp->if_capabilities |= IFCAP_WOL_MAGIC;
+	ifp->if_capenable = ifp->if_capabilities;
+
 	/* Do MII setup */
 	if (mii_phy_probe(dev, &sc->nfe_miibus, nfe_ifmedia_upd,
 	    nfe_ifmedia_sts)) {
@@ -770,6 +775,10 @@
 
 	NFE_LOCK(sc);
 	ifp = sc->nfe_ifp;
+
+	/* Disable WOL bits */
+	NFE_WRITE(sc, NFE_WOL_CTL, 0);
+
 	if (ifp->if_flags & IFF_UP)
 		nfe_init_locked(sc);
 	sc->nfe_suspended = 0;
@@ -1753,6 +1762,12 @@
 			ifp->if_hwassist &= ~CSUM_TSO;
 		}
 
+		if ((mask & IFCAP_WOL) != 0 &&