Panic in route.c:579 on SSH connect with 11-CURRENT at r293913

2016-01-14 Thread Yamagi Burmeister
Hello,
with 11-CURRENT at r293913 I'm seeing this panic as soon as I'm trying
to connect through SSH:


Unread portion of the kernel message buffer:
panic: _mtx_lock_sleep: recursed on non-recursive mutex rtentry
@ /usr/src/sys/net/route.c:579

(kgdb) bt
#0  doadump (textdump=-2122574672) at pcpu.h:221
#1  0x803823b6 in db_fncall (dummy1=, 
dummy2=, dummy3=, 
dummy4=) at /usr/src/sys/ddb/db_command.c:568
#2  0x80381e4e in db_command (cmd_table=0x0)
at /usr/src/sys/ddb/db_command.c:440
#3  0x80381be4 in db_command_loop ()
at /usr/src/sys/ddb/db_command.c:493
#4  0x8038467b in db_trap (type=, code=0)
at /usr/src/sys/ddb/db_main.c:251
#5  0x80a5d893 in kdb_trap (type=3, code=0, tf=)
at /usr/src/sys/kern/subr_kdb.c:654
#6  0x80e6a2a8 in trap (frame=0xfe011b3b21e0)
at /usr/src/sys/amd64/amd64/trap.c:556
#7  0x80e4ad47 in calltrap ()
at /usr/src/sys/amd64/amd64/exception.S:234
#8  0x80a5cf7b in kdb_enter (why=0x8137b8dc "panic", 
msg=0x80 ) at cpufunc.h:63
#9  0x80a2046f in vpanic (fmt=, 
ap=) at /usr/src/sys/kern/kern_shutdown.c:750
#10 0x80a202c6 in kassert_panic (fmt=)
at /usr/src/sys/kern/kern_shutdown.c:647
#11 0x80a04441 in __mtx_lock_sleep (c=0xf80006b89cf0, 
tid=, opts=, 
file=, line=1) at /usr/src/sys/kern/kern_mutex.c:396
#12 0x80a0412d in __mtx_lock_flags (c=, opts=0, 
file=0x81395a63 "/usr/src/sys/net/route.c", line=579)
at /usr/src/sys/kern/kern_mutex.c:222
#13 0x80b10ffe in rtredirect_fib (dst=0xfe011b3b2600, 
gateway=0xfe011b3b25f0, netmask=0x0, flags=6, src=0xfe011b3b25e0, 
fibnum=0) at /usr/src/sys/net/route.c:579
#14 0x80b6cad7 in icmp_input (mp=0xfe011b3b2670, 
offp=0xfe011b3b266c, proto=1) at /usr/src/sys/netinet/ip_icmp.c:614
#15 0x80b6d5cd in ip_input (m=0x4)
at /usr/src/sys/netinet/ip_input.c:786
#16 0x80b0c861 in netisr_dispatch_src (proto=, 
source=, m=0xf80006720b00)
at /usr/src/sys/net/netisr.c:972
#17 0x80b029be in ether_demux (ifp=, 
m=) at /usr/src/sys/net/if_ethersubr.c:803
#18 0x80b03704 in ether_nh_input (m=)
at /usr/src/sys/net/if_ethersubr.c:609
#19 0x80b0c861 in netisr_dispatch_src (proto=, 
source=, m=0xf80006720b00)
at /usr/src/sys/net/netisr.c:972
#20 0x80b02cbf in ether_input (ifp=0xf80003f2b000, m=0x0)
at /usr/src/sys/net/if_ethersubr.c:713
#21 0x808a1b43 in vtnet_rxq_eof (rxq=0xf80003f06e00)
at /usr/src/sys/dev/virtio/network/if_vtnet.c:1732
#22 0x808a284e in vtnet_rx_vq_intr (xrxq=0xf80003f06e00)
at /usr/src/sys/dev/virtio/network/if_vtnet.c:1863
#23 0x809e8ef6 in intr_event_execute_handlers (
p=, ie=0xf80003ede200)
at /usr/src/sys/kern/kern_intr.c:1262
#24 0x809e9586 in ithread_loop (arg=0xf80003cbbc60)
at /usr/src/sys/kern/kern_intr.c:1275
#25 0x809e67b4 in fork_exit (
callout=0x809e94e0 , arg=0xf80003cbbc60, 
frame=0xfe011b3b29c0) at /usr/src/sys/kern/kern_fork.c:1010
#26 0x80e4b27e in fork_trampoline ()
at /usr/src/sys/amd64/amd64/exception.S:609
#27 0x in ?? ()
Current language:  auto; currently minimal


This is a bhyve VM with a VirtIO network adapter:

virtio_pci0:  port 0x2000-0x201f mem 
0xc000-0xc0001fff irq 16 at device 2.0 on pci0
vtnet0:  on virtio_pci0
vtnet0: Ethernet address: 00:a0:98:51:ed:26
001.48 [ 421] vtnet_netmap_attach   max rings 1
vtnet0: netmap queues/slots: TX 1/1024, RX 1/1024
001.49 [ 426] vtnet_netmap_attach   virtio attached txq=1, txd=1024 
rxq=1, rxd=1024


This may be caused by the recent routing work, but I'm not quite
sure. I have the dump and I'm able to reproduce this easily so
more information can be provided if necessary.
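
For context, the panic message says that a thread tried to take a mutex it already held, and that the mutex was not created as recursive. The following is only a toy userland model of that check, not FreeBSD's mtx(9) code; `toy_mtx`, `toy_lock` and `toy_unlock` are hypothetical names used for illustration.

```c
#include <stdbool.h>

/* Toy model of a mutex; this is NOT the kernel's struct mtx. */
struct toy_mtx {
	int  owner;      /* thread id of the holder, 0 if unowned */
	int  depth;      /* extra acquisitions made by the owner */
	bool recursable; /* models a mutex that was created as recursive */
};

/*
 * Returns 0 on success, -1 where the kernel would panic with
 * "recursed on non-recursive mutex", and -2 if another thread owns
 * the mutex (a real mutex would block or spin here instead).
 */
static int
toy_lock(struct toy_mtx *m, int tid)
{
	if (m->owner == 0) {
		m->owner = tid;
		return (0);
	}
	if (m->owner != tid)
		return (-2);
	if (!m->recursable)
		return (-1);
	m->depth++;
	return (0);
}

/* Drops one level of recursion, or releases the mutex entirely. */
static void
toy_unlock(struct toy_mtx *m)
{
	if (m->depth > 0)
		m->depth--;
	else
		m->owner = 0;
}
```

In this model the second `toy_lock()` call by the same thread on a non-recursive mutex is the case that corresponds to the panic above.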

Regards,
Yamagi

-- 
Homepage:  www.yamagi.org
XMPP:  yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: Panic in route.c:579 on SSH connect with 11-CURRENT at r293913

2016-01-14 Thread Yamagi Burmeister
Hello,
updating to r294020 solves the issue for me. Thank you. :)

Regards,
Yamagi

On Thu, 14 Jan 2016 19:31:45 +0300
Alexander V. Chernikov <melif...@freebsd.org> wrote:

> 14.01.2016, 19:16, "Alexander V. Chernikov" <melif...@freebsd.org>:
> > 14.01.2016, 18:29, "Yamagi Burmeister" <li...@yamagi.org>:
> >>  Hello,
> >>  with 11-CURRENT at r293913 I'm seeing this panic as soon as I'm trying
> >>  to connect through SSH:
> >>
> >>  Unread portion of the kernel message buffer:
> >>  panic: _mtx_lock_sleep: recursed on non-recursive mutex rtentry
> >>  @ /usr/src/sys/net/route.c:579
> >
> > This seems to be caused by r293466. I'll do more investigation and reply.
> Should be fixed in r294020.


-- 
Homepage:  www.yamagi.org
XMPP:  yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB


Re: Enabling VIMAGE by default for FreeBSD 11?

2014-10-12 Thread Yamagi Burmeister
Hello,
it's been a while since I tested VIMAGE, but the last time I did, somewhere
in 10-CURRENT, some UMA memory leaks remained after destroying vnets.
They weren't showstoppers for most workloads, but pretty annoying...
Have those been fixed?

Regards,
Yamagi

On Sat, 11 Oct 2014 10:58:13 -0700
Craig Rodrigues <rodr...@freebsd.org> wrote:

> Hi,
>
> What action items are left to enable VIMAGE by default for FreeBSD 11?
>
> Not everyone uses bhyve, so VIMAGE is quite useful when using jails.
>
> --
> Craig
> ___
> freebsd-virtualizat...@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
> To unsubscribe, send any mail to
> freebsd-virtualization-unsubscr...@freebsd.org


-- 
Homepage:  www.yamagi.org
XMPP:  yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB


Re: Kernel memory corruption(?) with age(4)

2011-04-02 Thread Yamagi Burmeister

On Fri, 1 Apr 2011, YongHyeon PYUN wrote:

> On Thu, Mar 31, 2011 at 09:59:12PM +0200, Yamagi Burmeister wrote:
>> On Thu, 31 Mar 2011, YongHyeon PYUN wrote:
>>
>>> Thanks a lot! It seems the L1 controller has data corruption issue
>>> when 64bit DMA addressing is used. Try this one.
>>>
>>> Oops, there was a bug in previous patch.
>>> Try this instead.
>>
>> Okay, that patch seems to do the trick. This was just a short test run
>> of about one hour with just 50gb copied, but without the patch the
>> system would have crashed in the first 20 minutes. I'll do a more
>> comprehensive test over night and report back tomorrow morning.
>
> Fix committed to HEAD(r220249, r220252).
> Thanks a lot for testing!


No problem.

--
Homepage: www.yamagi.org
Jabber:   yam...@yamagi.org
GnuPG/GPG:0xEFBCCBCB


Re: Kernel memory corruption(?) with age(4)

2011-03-31 Thread Yamagi Burmeister

On Wed, 30 Mar 2011, YongHyeon PYUN wrote:

>> Okay, I did a test run with RX checksum, TX checksum and both disabled.
>> In all three cases the crash occurs within about 20 minutes. I'm also
>> not sure that age(4) is the problem, but it definitely has something to
>> do with the problem, since with another NIC driver the same scenario is
>> rock solid...
>
> OK.
>
>> The workload: It's an NFS3 server (FreeBSD's non-experimental
>> implementation), serving and receiving files of about 250 to 500
>> megabytes at about 20mb/s. The clients are FreeBSD 7 and 8 systems and
>> are mounting the shares via TCP. The connection is 1000mbit/s via a
>> dumb gigabit switch.
>
> That's too broad to narrow down the issue. :-(
> I'm not sure but your box seems to have more than 4GB memory. Could
> you limit the available memory to 3GB via loader.conf and test it
> again?

All boxes are quad-core machines with 8GB RAM, running FreeBSD/amd64.
After limiting the memory to 3GB via hw.physmem the problems are gone.
The box has been running crash-free for more than 6 hours and has served
over 300GB of data via age(4).

--
Homepage: www.yamagi.org
Jabber:   yam...@yamagi.org
GnuPG/GPG:0xEFBCCBCB


Re: Kernel memory corruption(?) with age(4)

2011-03-31 Thread Yamagi Burmeister

On Thu, 31 Mar 2011, YongHyeon PYUN wrote:

>> All boxes are quad-core machines with 8GB RAM, running FreeBSD/amd64.
>> After limiting the memory to 3GB via hw.physmem the problems are gone.
>> The box has been running crash-free for more than 6 hours and has served
>> over 300GB of data via age(4).
>
> Thanks for testing. Remove the hw.physmem configuration and try
> attached patch and let me know how it goes.

Thanks for your help, but the patch doesn't work. Another random panic,
this time a page fault in kernel mode, with nothing related to age(4) or
the network stack in the backtrace...

Maybe it'll help to know about a bug fix in the Linux atl1 driver, now
replaced by atlx. In git commit 5f08e46b621a769e52a9545a23ab1d5fb2aec1d4
64-bit DMA was disabled:

  64-bit DMA causes data corruption with atl1.  We don't know why, and
  Atheros is working on it. For now, just use 32-bit DMA. This is a big
  hack that is probably wrong, but it stops the bleeding.

There was no later follow-up on it. I think that this can't be the problem
on FreeBSD, but maybe I'm reading the driver code wrong. The kernel.org
gitweb URL is:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.23.y.git;a=commitdiff;h=5f08e46b621a769e52a9545a23ab1d5fb2aec1d4
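
To make the 32-bit versus 64-bit DMA distinction concrete, here is a minimal sketch of the two address checks involved: whether a buffer is fully reachable with 32-bit addressing, and whether it straddles a 4 GB boundary. The helper names `dma_fits_32bit` and `dma_crosses_4gb` are hypothetical and are not taken from age(4) or from the Linux atl1 driver. The sketch assumes `addr + len` does not overflow 64 bits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FOUR_GB (UINT64_C(1) << 32)

/* True if the whole buffer [addr, addr + len) lies below 4 GB. */
static bool
dma_fits_32bit(uint64_t addr, size_t len)
{
	return (len != 0 && addr < FOUR_GB && len <= FOUR_GB - addr);
}

/* True if the buffer starts and ends in different 4 GB windows. */
static bool
dma_crosses_4gb(uint64_t addr, size_t len)
{
	if (len == 0)
		return (false);
	return ((addr >> 32) != ((addr + len - 1) >> 32));
}
```

With physical memory limited to 3GB, as in the earlier hw.physmem test, every buffer would be expected to pass the first check, which is at least consistent with the crashes disappearing in that configuration.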

--
Homepage: www.yamagi.org
Jabber:   yam...@yamagi.org
GnuPG/GPG:0xEFBCCBCB


Re: Kernel memory corruption(?) with age(4)

2011-03-31 Thread Yamagi Burmeister

On Thu, 31 Mar 2011, YongHyeon PYUN wrote:

> Thanks a lot! It seems the L1 controller has data corruption issue
> when 64bit DMA addressing is used. Try this one.
>
> Oops, there was a bug in previous patch.
> Try this instead.


Okay, that patch seems to do the trick. This was just a short test run
of about one hour with just 50gb copied, but without the patch the
system would have crashed in the first 20 minutes. I'll do a more
comprehensive test over night and report back tomorrow morning.

--
Homepage: www.yamagi.org
Jabber:   yam...@yamagi.org
GnuPG/GPG:0xEFBCCBCB


Kernel memory corruption(?) with age(4)

2011-03-30 Thread Yamagi Burmeister

Hi,
I recently got four Asus M3A-H/HDMI mainboards, about two years old, with
an integrated Attansic L1 ethernet controller. This NIC is supported by
age(4) and recognized by FreeBSD:



age0: <Attansic Technology Corp, L1 Gigabit Ethernet>
   mem 0xfeac-0xfeaf irq 18 at device 0.0 on pci2
age0: 1280 Tx FIFO, 2364 Rx FIFO
age0: Using 1 MSI messages.
age0: 4GB boundary crossed, switching to 32bit DMA addressing mode.
miibus0: <MII bus> on age0
atphy0: <Atheros F1 10/100/1000 PHY> PHY 0 on miibus0
atphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX,
 1000baseT-FDX-master, auto
age0: Ethernet address: 00:23:54:31:a0:12
age0: [FILTER]



age0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=c319b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,
WOL_MCAST,WOL_MAGIC,VLAN_HWTSO,LINKSTATE>
ether 00:23:54:31:a0:12
inet6 fe80::223:54ff:fe31:a012%age0 prefixlen 64 scopeid 0x1
nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
media: Ethernet autoselect (none)
status: no carrier



All four boxes are unstable if the Attansic NIC is in use; none of them
survived more than 60 minutes of ~20mb/s network traffic. I managed to
get some coredumps and extracted the backtraces. Since every time one of
the boxes panicked I got a different panic message and a different backtrace
with a different subsystem involved, I suspected broken hardware. I
plugged an em(4) NIC into the PCI slot and wasn't able to reproduce the
problem; in fact the boxes ran rock solid for several days. Next I set
up Windows 7, installed the Attansic vendor driver and did another
run. All went smoothly, no crash for nearly 24 hours.

My guess is kernel memory corruption by age(4), which would explain all
the different backtraces and the different panic messages. This problem
is reproducible in at least FreeBSD 7.4 and 8.2 and with TSO4 enabled
and disabled. I'm willing to debug this, but I really don't know how. So
any help or a pointer in the right direction would be appreciated.



Three backtraces, all of them occurred while receiving and sending data
via NFS over the age(4) NIC:

panic: initiate_write_filepage: dir inum 50001080 != new 0
cpuid = 2

#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:251
#1  0x8018604c in db_fncall (dummy1=Variable dummy1 is not available.
) at /usr/src/sys/ddb/db_command.c:548
#2  0x80186381 in db_command (last_cmdp=0x806178c0, cmd_table=Variable 
cmd_table is not available.
) at /usr/src/sys/ddb/db_command.c:445
#3  0x801865d0 in db_command_loop () at 
/usr/src/sys/ddb/db_command.c:498
#4  0x80188619 in db_trap (type=Variable type is not available.
) at /usr/src/sys/ddb/db_main.c:229
#5  0x8024d7fe in kdb_trap (type=3, code=0, tf=0xff8243513720) at 
/usr/src/sys/kern/subr_kdb.c:546
#6  0x80424366 in trap (frame=0xff8243513720) at 
/usr/src/sys/amd64/amd64/trap.c:566
#7  0x8040c234 in calltrap () at 
/usr/src/sys/amd64/amd64/exception.S:224
#8  0x8024d99d in kdb_enter (why=0x80479419 panic, msg=0xa 
Address 0xa out of bounds) at cpufunc.h:63
#9  0x8021c4f0 in panic (fmt=Variable fmt is not available.
) at /usr/src/sys/kern/kern_shutdown.c:575
#10 0x80c5925e in softdep_fsync_mountdev () from /boot/kernel/ufs.ko
#11 0xff00067a0460 in ?? ()
#12 0x in ?? ()
#13 0xff0167d49988 in ?? ()
#14 0xff000694000e in ?? ()
#15 0xff0006b32800 in ?? ()
#16 0xff81ef201bd0 in ?? ()
#17 0xff81ef201bd0 in ?? ()
#18 0xff0006b613b0 in ?? ()
#19 0xff0006b614c8 in ?? ()
#20 0xff0156024878 in ?? ()
#21 0xff8243513980 in ?? ()
#22 0x80c5c174 in ffs_flushfiles () from /boot/kernel/ufs.ko
#23 0xff81ef201bd0 in ?? ()
#24 0xff013c210a80 in ?? ()
#25 0x0004 in ?? ()
#26 0x in ?? ()
#27 0xff82435139b0 in ?? ()
#28 0x80c3ea25 in ufs_do_nfs4_acl_inheritance () from 
/boot/kernel/ufs.ko
#29 0xff82435139b0 in ?? ()
#30 0x80459fb5 in VOP_STRATEGY_APV (vop=0xff00067a0460, 
a=0xff0167d49980) at vnode_if.c:2169
Previous frame inner to this frame (corrupt stack?)



Fatal trap 9: general protection fault while in kernel mode
cpuid = 2; apic id = 02
instruction pointer = 0x20:0x8020ca0e
stack pointer   = 0x28:0xff82435139e0
frame pointer   = 0x28:0xff8243513a00
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 21 (syncer)

#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:251
#1  0x8018604c in db_fncall (dummy1=Variable dummy1 is not available.
) at /usr/src/sys/ddb/db_command.c:548
#2  0x80186381 in db_command (last_cmdp=0x806178c0, cmd_table=Variable 
cmd_table is not available.
) 

Re: Kernel memory corruption(?) with age(4)

2011-03-30 Thread Yamagi Burmeister

On Wed, 30 Mar 2011, YongHyeon PYUN wrote:

> On Wed, Mar 30, 2011 at 04:22:23PM +0200, Yamagi Burmeister wrote:
>
>> All four boxes are unstable if the Attansic NIC is in use; none of them
>> survived more than 60 minutes of ~20mb/s network traffic. I managed to
>> get some coredumps and extracted the backtraces. Since every time one of
>> the boxes panicked I got a different panic message and a different backtrace
>> with a different subsystem involved, I suspected broken hardware. I
>> plugged an em(4) NIC into the PCI slot and wasn't able to reproduce the
>> problem; in fact the boxes ran rock solid for several days. Next I set
>> up Windows 7, installed the Attansic vendor driver and did another
>> run. All went smoothly, no crash for nearly 24 hours.
>>
>> My guess is kernel memory corruption by age(4), which would explain all
>> the different backtraces and the different panic messages. This problem
>> is reproducible in at least FreeBSD 7.4 and 8.2 and with TSO4 enabled
>> and disabled. I'm willing to debug this, but I really don't know how. So
>> any help or a pointer in the right direction would be appreciated.
>
> AFAIK this is the first report for possible memory corruption
> triggered by age(4). I'm still not sure whether it's caused by
> age(4) but you can disable RX checksum offloading and see whether
> that makes any difference.
> Since I have no longer access to the hardware it would be even
> better if you can tell me which traffic pattern triggered the
> issue.

Okay, I did a test run with RX checksum, TX checksum and both disabled.
In all three cases the crash occurs within about 20 minutes. I'm also
not sure that age(4) is the problem, but it definitely has something to
do with the problem, since with another NIC driver the same scenario is
rock solid...

The workload: It's an NFS3 server (FreeBSD's non-experimental
implementation), serving and receiving files of about 250 to 500
megabytes at about 20mb/s. The clients are FreeBSD 7 and 8 systems and
are mounting the shares via TCP. The connection is 1000mbit/s via a
dumb gigabit switch.

--
Homepage: www.yamagi.org
Jabber:   yam...@yamagi.org
GnuPG/GPG:0xEFBCCBCB


Re: Juniper e3k with ports limitied to 100Mbit and re NICs on MSI MoBo: problems with duplex negotiation (Hetzner host provider discard FreeBSD support due this bug)

2011-01-11 Thread Yamagi Burmeister

On Tue, 11 Jan 2011, Lev Serebryakov wrote:

>  Very large and famous (due to very attractive prices) hosting
> provider Hetzner.de discards FreeBSD support on dedicated servers,
> because these servers cannot negotiate 100Mbit/DUPLEX when
> switches' ports are limited to 100Mbit (1Gbit connection costs
> additional money) only under FreeBSD. Linux works fine.
>
>  Switches known to be Juniper e3k series.
>
>  MoBos of servers are different assortment of MSI MoBos with Realtek
> (re driver) network-on-board.
>
>  Symptoms are: NIC cannot negotiate/set duplex when switch port is
> limited to 100Mbit/Duplex. Duplex cannot be set even manually via
> ifconfig:
>
> media: Ethernet 100baseTX full-duplex (100baseTX half-duplex)
>
>  Is it a known problem? Maybe, -CURRENT driver has fix for it?
>
>  Unfortunately, I cannot provide more information, as I don't have
> server at Hetzner (I'm planning to order one, but due to these
> problems, I'm not sure now, as I need FreeBSD), and all this
> information is collected in communication with people who HAVE servers
> with FreeBSD installed.


Hi,
I've got several Hetzner EQ4 and on all these machines FreeBSD 8.1 runs
just fine. I've never seen this strange negotiation problem myself. But
maybe I was just lucky and got working mainboard and NIC combinations.
So if further information is needed, I'm happy to provide it.

Some data:

% ifconfig re0
  re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
  options=389b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
  [snip]
  nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
  media: Ethernet autoselect (100baseTX full-duplex)
  status: active

$ dmesg
  re0: <RealTek 8168/8111 B/C/CP/D/DP/E PCIe Gigabit Ethernet> port 0xe800-0xe8ff mem 0xfbeff000-0xfbef,0xf6ff-0xf6ff irq 16 at device 0.0 on pci6
  re0: Using 1 MSI messages
  re0: Chip rev. 0x3c00
  re0: MAC rev. 0x0040
  miibus0: <MII bus> on re0
  rgephy0: <RTL8169S/8110S/8211B media interface> PHY 1 on miibus0
  rgephy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
  re0: Ethernet address: 40:61:86:f3:d7:20
  re0: [FILTER]
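
As an aside, the hexadecimal `options=` value in the ifconfig output is a bitmask of interface capabilities. The sketch below decodes such a value into names. The bit values are what I believe FreeBSD's `IFCAP_*` flags in net/if.h to be, restricted to the flags that appear in this thread; treat them as assumptions and check the header. `cap_names` and `decode_caps` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed capability bits; only the flags seen in this thread. */
struct cap_name {
	uint32_t    bit;
	const char *name;
};

static const struct cap_name cap_names[] = {
	{ 0x00001, "RXCSUM" },
	{ 0x00002, "TXCSUM" },
	{ 0x00008, "VLAN_MTU" },
	{ 0x00010, "VLAN_HWTAGGING" },
	{ 0x00080, "VLAN_HWCSUM" },
	{ 0x00100, "TSO4" },
	{ 0x00800, "WOL_UCAST" },
	{ 0x01000, "WOL_MCAST" },
	{ 0x02000, "WOL_MAGIC" },
	{ 0x40000, "VLAN_HWTSO" },
	{ 0x80000, "LINKSTATE" },
};

/* Writes a comma-separated list of the set capability names into buf. */
static void
decode_caps(uint32_t caps, char *buf, size_t buflen)
{
	size_t i, used = 0;
	int n;

	if (buflen == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < sizeof(cap_names) / sizeof(cap_names[0]); i++) {
		if ((caps & cap_names[i].bit) == 0)
			continue;
		n = snprintf(buf + used, buflen - used, "%s%s",
		    used > 0 ? "," : "", cap_names[i].name);
		if (n < 0 || (size_t)n >= buflen - used)
			break;	/* output would be truncated */
		used += (size_t)n;
	}
}
```

Under these assumptions, decoding 0x389b yields the same list that ifconfig prints for re0 above, including the three WOL flags.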

Also have a look at the FreeBSD section in the Hetzner Wiki:
http://wiki.hetzner.de/index.php/FreeBSD
It's in German, but Google can translate it :)

Ciao,
Yamagi

--
Homepage: www.yamagi.org
Jabber:   yam...@yamagi.org
GnuPG/GPG:0xEFBCCBCB


Re: Juniper e3k with ports limitied to 100Mbit and re NICs on MSI MoBo: problems with duplex negotiation (Hetzner host provider discard FreeBSD support due this bug)

2011-01-11 Thread Yamagi Burmeister

On Tue, 11 Jan 2011, Bjoern A. Zeeb wrote:

>> I've got several Hetzner EQ4 and on all these machines FreeBSD 8.1 runs
>> just fine. I've never seen this strange negotiation problem myself. But
>> maybe I was just lucky and got working mainboard and NIC combinations.
>> So if further information is needed, I'm happy to provide it.
>
> A lot of us do.  There is a problem with the re(4) setup as well in
> that if you do not send packets out yourself the port takes a very
> long time to come up and unblocked.  I haven't discussed that with
> them or tested with an updated HEAD (since end of October).

I never said that this problem doesn't exist. :) Lev Serebryakov said
that everything works fine in DC11 and DC12; my servers are in DC12, so
I was just lucky...

> But yes, I am running HEAD on an EQ4 as well.  If you have problems
> and a personal email contact at Hetzner feel free to talk to me.
> I am local (a couple of 100km away in the same country) and a FreeBSD
> committer and I can probably figure things out with them or properly
> proxy requests.

Sadly no. My only contact at Hetzner is the service e-mail address and
the phone number for business clients. They are for all customers and
probably can't help with such problems. There are special technical
contacts for each DC, but those are only available for customers with
hardware in that DC and with specific problems. So someone with a server
in DC13 could write a service request in which the problem is explained
and ask for help. Maybe they're willing to assist in tracking down
and solving the problem.

Ciao,
Yamagi

--
Homepage: www.yamagi.org
Jabber:   yam...@yamagi.org
GnuPG/GPG:0xEFBCCBCB


Re: [patch] WOL support for nfe(4)

2010-11-10 Thread Yamagi Burmeister

On Wed, 10 Nov 2010, Ian Smith wrote:


On Tue, 9 Nov 2010, Pyun YongHyeon wrote:
 On Tue, Nov 09, 2010 at 10:01:36PM +0100, Yamagi Burmeister wrote:
  On Tue, 9 Nov 2010, Pyun YongHyeon wrote:
[..]
  You can switch to suspend mode with acpiconf -s1. If all goes
  well, driver would put the controller into suspend mode after
  reprogramming controller to accept WOL frames. After that, you can
  wakeup the box by sending a WOL magic packet.
 
  Okay, I thought that S3 was required. I put the box into S1, waited some
  minutes and sent the magic packet. The video didn't resume but I was
  able to log in via SSH. So waking up by sending the WOL magic packet
  works.
 

 Thanks for testing. Probably you want to poke jkim@ to address
 video resume issue.

It _may_ be just a matter of toggling the value of hw.acpi.reset_video ?


No, it doesn't. But... This is a ~5-year-old die-hard server board.
Those machines are running headless, only this test box has a graphics
adapter plugged into it. Not even a new one, but an old Geforce FX 5300
PCIe which isn't supported by nVidia any more. The manpower required to
get this working is better spent on other ACPI tasks or modern hardware.

--
Homepage: www.yamagi.org
Jabber:   yam...@yamagi.org
GnuPG/GPG:0xEFBCCBCB


Re: [patch] WOL support for nfe(4)

2010-11-10 Thread Yamagi Burmeister

On Wed, 10 Nov 2010, Pyun YongHyeon wrote:


On Tue, Nov 09, 2010 at 01:34:21PM -0800, Pyun YongHyeon wrote:

On Tue, Nov 09, 2010 at 10:01:36PM +0100, Yamagi Burmeister wrote:

On Tue, 9 Nov 2010, Pyun YongHyeon wrote:


No, the link stays at 1000Mbps so the driver must manually switch back
to 10/100Mbps.



Hmm, this is real problem for WOL. Establishing 1000Mbps link to
accept WOL frames is really bad idea since it can draw more power
than 375mA. Consuming more power than 375mA is violation of
PCI specification and some system may completely shutdown the power
to protect hardware against over-current damage which in turn means
WOL wouldn't work anymore. Even if WOL work with 1000Mbps link for
all nfe(4) controllers, it would dissipate much more power.

Because nfe(4) controllers are notorious for using various PHYs,
it's hard to write a code to reliably establish 10/100Mbps link in
driver. In addition, nfe(4) is known to be buggy in link state
handling such that forced media selection didn't work well. I'll
see what could be done in this week if I find spare time.


Hmm... Maybe just add a hint to the manpage that WOL is possibly broken?


I think this may not be enough. Because it can damage your hardware
under certain conditions if protection circuit was not there.



Ok, I updated patch which will change link speed to 10/100Mbps when
shutdown/suspend is initiated.  You can get the patch at the
following URL. Please give it a try and let me know whether it
really changes link speed to 10/100Mbps. If it does not work as
expected, show me the dmesg output of your system.

http://people.freebsd.org/~yongari/nfe/nfe.wol.patch2


Okay, that does the trick. At shutdown the link speed is changed to
10/100Mbps, at boot - either via WOL magic packet or manual startup -
it's changed back to 1000Mbps.

Thanks again,
Yamagi

--
Homepage: www.yamagi.org
Jabber:   yam...@yamagi.org
GnuPG/GPG:0xEFBCCBCB


Re: [patch] WOL support for nfe(4)

2010-11-09 Thread Yamagi Burmeister


Thanks for your reply.

On Mon, 8 Nov 2010, Pyun YongHyeon wrote:

> Thanks for the patch. I attached slightly modified code to
> better match other WOL capable drivers in tree. Because data sheet
> is not available I blindly made a patch based on your code. I have
> a couple of questions which I can't verify on real hardware (I
> have no more access to the hardware).
>
> o If you established a gigabit link with link partner and shutdown
>  your box, does the established link automatically change to 10 or
>  100Mbps? You can check it on your link partner. If your link
>  partner still reports it established 1000Mbps link, we have to
>  do other necessary work in driver (i.e. manually switching to
>  10/100Mbps).

No, the link stays at 1000Mbps so the driver must manually switch back
to 10/100Mbps.

> o When you put your box into suspend mode, can you wake up your box
>  with WOL magic packet?

I'm sorry but I can't test that since none of those boxes supports
suspend:

  % sysctl hw.acpi.suspend_state
  hw.acpi.suspend_state: NONE

> o When your system boots up with/without WOL magic packet, sending
>  WOL magic packets from other hosts can hang your box?

No, they don't. No matter if the box was started by sending the WOL magic
packet or by hand, it survives all WOL packets I send to it.

> o If you disabled WOL with ifconfig before system shutdown, can you
>  still wakeup your box with WOL magic packet?

No, I can't. WOL is disabled and the box must be started manually.

> o If you reprogram your station address with ifconfig (i.e. ifconfig
>  nfe0 ether xx:xx:xx:xx:xx:xx), can you still wakeup your box with
>  WOL magic packet?

Yes, by sending the WOL magic packet to the new station address.
Sending it to the original address doesn't work.
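
The reason the station address matters is that the magic packet carries the target MAC address in its payload. As far as I know the usual format is six 0xFF bytes followed by sixteen repetitions of the 6-byte address; the sketch below builds that payload. `build_magic_packet` and the `WOL_*` constants are hypothetical names, and putting the payload on the wire (for example as a UDP broadcast) is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WOL_MAC_LEN	6	/* length of an Ethernet address */
#define WOL_SYNC_LEN	6	/* six 0xFF bytes at the start */
#define WOL_MAC_REPEATS	16	/* target address is repeated 16 times */
#define WOL_PAYLOAD_LEN	(WOL_SYNC_LEN + WOL_MAC_REPEATS * WOL_MAC_LEN)

/*
 * Fills out with a magic packet payload for the given station address.
 * Returns the payload length (102), or 0 if the buffer is too small.
 */
static size_t
build_magic_packet(const uint8_t mac[WOL_MAC_LEN], uint8_t *out,
    size_t outlen)
{
	size_t i;

	if (outlen < WOL_PAYLOAD_LEN)
		return (0);
	memset(out, 0xFF, WOL_SYNC_LEN);
	for (i = 0; i < WOL_MAC_REPEATS; i++)
		memcpy(out + WOL_SYNC_LEN + i * WOL_MAC_LEN, mac,
		    WOL_MAC_LEN);
	return (WOL_PAYLOAD_LEN);
}
```

A payload built for the original address would therefore not match once the controller has been reprogrammed with a new one, which fits the observation above.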


> The patch I made didn't take into account management firmware so
> if you use the patch with IPMI, IPMI wouldn't work. But I think
> that's not an issue since all other parts of nfe(4) also ignore
> management firmware at this moment.


I can't test that, because none of these machines has the IPMI option
installed. Sorry.

Ciao,
Yamagi

--
Homepage: www.yamagi.org
Jabber:   yam...@yamagi.org
GnuPG/GPG:0xEFBCCBCB


Re: [patch] WOL support for nfe(4)

2010-11-09 Thread Yamagi Burmeister

On Tue, 9 Nov 2010, Pyun YongHyeon wrote:

>> No, the link stays at 1000Mbps so the driver must manually switch back
>> to 10/100Mbps.
>
> Hmm, this is real problem for WOL. Establishing 1000Mbps link to
> accept WOL frames is really bad idea since it can draw more power
> than 375mA. Consuming more power than 375mA is violation of
> PCI specification and some system may completely shutdown the power
> to protect hardware against over-current damage which in turn means
> WOL wouldn't work anymore. Even if WOL work with 1000Mbps link for
> all nfe(4) controllers, it would dissipate much more power.
>
> Because nfe(4) controllers are notorious for using various PHYs,
> it's hard to write a code to reliably establish 10/100Mbps link in
> driver. In addition, nfe(4) is known to be buggy in link state
> handling such that forced media selection didn't work well. I'll
> see what could be done in this week if I find spare time.

Hmm... Maybe just add a hint to the manpage that WOL is possibly broken?
Nevertheless, thanks for your work, it's much appreciated :)

>>> o When you put your box into suspend mode, can you wake up your box
>>> with WOL magic packet?
>>
>> I'm sorry but I can't test that since none of those boxes supports
>> suspend:
>>
>>   % sysctl hw.acpi.suspend_state
>>   hw.acpi.suspend_state: NONE
>
> You can switch to suspend mode with acpiconf -s1. If all goes
> well, driver would put the controller into suspend mode after
> reprogramming controller to accept WOL frames. After that, you can
> wakeup the box by sending a WOL magic packet.

Okay, I thought that S3 was required. I put the box into S1, waited some
minutes and sent the magic packet. The video didn't resume but I was
able to log in via SSH. So waking up by sending the WOL magic packet
works.

--
Homepage: www.yamagi.org
Jabber:   yam...@yamagi.org
GnuPG/GPG:0xEFBCCBCB


[patch] WOL support for nfe(4)

2010-11-05 Thread Yamagi Burmeister

Hi,

some time ago we migrated a lot of boxes from Linux to FreeBSD. Those
machines have an NVIDIA nForce4 CK804 MCP4 network adapter, supported
by nfe(4). Although nfe(4) at least tries to enable the WOL capability of
the NIC, it doesn't work, and nfe(4) doesn't integrate with FreeBSD's (new)
WOL framework. Since we need WOL, I spent some time implementing it the
correct way.

Attached are two patches:
- if_nfe_wol_8.1.diff against FreeBSD 8.1-RELEASE-p1, this one is used
  on our servers.
- if_nfe_wol_current.diff against -CURRENT r214831. This one is
  _untested_! But it should work...

In case the patches are stripped by Mailman, they can be found here:
http://deponie.yamagi.org/freebsd/nfe/

This patch works reliably on our machines and nfe(4) runs without any
problems with it. Nevertheless, my skills in writing network drivers
are somewhat limited, so a review by someone with better knowledge
of the WOL framework and maybe nfe(4) itself would be highly appreciated.
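
For readers skimming the diff, the capability handling it adds to the ioctl path boils down to toggling the WOL magic bit when the caller asks for a change and the hardware advertises WOL. The following userland sketch mirrors that logic; the `CAP_*` constants and `toggle_wol_magic` are hypothetical stand-ins, and computing `mask` as requested XOR enabled is my assumption about how the surrounding handler derives it.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's IFCAP_WOL_* flags. */
#define CAP_WOL_UCAST	0x0800u
#define CAP_WOL_MCAST	0x1000u
#define CAP_WOL_MAGIC	0x2000u
#define CAP_WOL		(CAP_WOL_UCAST | CAP_WOL_MCAST | CAP_WOL_MAGIC)

/*
 * Returns the new set of enabled capabilities after applying a request.
 * Only the WOL magic bit is handled, as in the patch.
 */
static uint32_t
toggle_wol_magic(uint32_t supported, uint32_t enabled, uint32_t requested)
{
	uint32_t mask = requested ^ enabled;	/* bits the caller wants changed */

	if ((mask & CAP_WOL) != 0 && (supported & CAP_WOL) != 0) {
		if ((mask & CAP_WOL_MAGIC) != 0)
			enabled ^= CAP_WOL_MAGIC;
	}
	return (enabled);
}
```

In this sketch a request is ignored when the interface does not advertise any WOL capability, which matches the guard in the patch.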

Ciao,
Yamagi

--
Homepage: www.yamagi.org
Jabber:   yam...@yamagi.org
GnuPG/GPG:0xEFBCCBCB

--- if_nfe.c	2010-11-05 10:41:04.672351879 +0100
+++ if_nfe.c	2010-11-05 10:41:09.259689584 +0100
@@ -125,6 +125,7 @@
 static void nfe_sysctl_node(struct nfe_softc *);
 static void nfe_stats_clear(struct nfe_softc *);
 static void nfe_stats_update(struct nfe_softc *);
+static void nfe_enable_wol(struct nfe_softc *);
 
 #ifdef NFE_DEBUG
 static int nfedebug = 0;
@@ -599,6 +600,10 @@
ifp->if_capabilities |= IFCAP_POLLING;
 #endif
 
+   /* Wake on LAN support */
+   ifp->if_capabilities |= IFCAP_WOL_MAGIC;
+   ifp->if_capenable = ifp->if_capabilities;
+
/* Do MII setup */
error = mii_attach(dev, &sc->nfe_miibus, ifp, nfe_ifmedia_upd,
nfe_ifmedia_sts, BMSR_DEFCAPMASK, MII_PHY_ANY, MII_OFFSET_ANY, 0);
@@ -769,6 +774,10 @@
 
NFE_LOCK(sc);
ifp = sc->nfe_ifp;
+
+   /* Disable WOL bits */
+   NFE_WRITE(sc, NFE_WOL_CTL, 0);
+
if (ifp->if_flags & IFF_UP)
nfe_init_locked(sc);
sc->nfe_suspended = 0;
@@ -1752,6 +1761,12 @@
ifp->if_hwassist &= ~CSUM_TSO;
}
 
+   if ((mask & IFCAP_WOL) != 0 &&
+   (ifp->if_capabilities & IFCAP_WOL) != 0) {
+   if ((mask & IFCAP_WOL_MAGIC) != 0)
+   ifp->if_capenable ^= IFCAP_WOL_MAGIC;
+   }
+
if (init > 0 && (ifp->if_drv_flags & IFF_DRV_RUNNING) != 0) {
ifp->if_drv_flags &= ~IFF_DRV_RUNNING;
nfe_init(sc);
@@ -2746,7 +2761,6 @@
NFE_WRITE(sc, NFE_STATUS, sc->mii_phyaddr << 24 | NFE_STATUS_MAGIC);
 
NFE_WRITE(sc, NFE_SETUP_R4, NFE_R4_MAGIC);
-   NFE_WRITE(sc, NFE_WOL_CTL, NFE_WOL_MAGIC);
 
sc->rxtxctl &= ~NFE_RXTX_BIT2;
NFE_WRITE(sc, NFE_RXTX_CTL, sc->rxtxctl);
@@ -2806,12 +2820,6 @@
/* abort Tx */
NFE_WRITE(sc, NFE_TX_CTL, 0);
 
-   /* disable Rx */
-   NFE_WRITE(sc, NFE_RX_CTL, 0);
-
-   /* disable interrupts */
-   nfe_disable_intr(sc);
-
sc->nfe_link = 0;
 
/* free Rx and Tx mbufs still in the queues. */
@@ -2923,9 +2931,12 @@
sc = device_get_softc(dev);
 
NFE_LOCK(sc);
+   nfe_enable_wol(sc);
+   NFE_UNLOCK(sc);
+
+   NFE_LOCK(sc);
ifp = sc->nfe_ifp;
nfe_stop(ifp);
-   /* nfe_reset(sc); */
NFE_UNLOCK(sc);
 
return (0);
@@ -3212,3 +3223,17 @@
stats->rx_broadcast += NFE_READ(sc, NFE_TX_BROADCAST);
}
 }
+
+static void
+nfe_enable_wol(struct nfe_softc *sc)
+{
+   struct ifnet *ifp;
+
+   NFE_LOCK_ASSERT(sc);
+
+   ifp = sc->nfe_ifp;
+
+   if ((ifp->if_capenable & IFCAP_WOL_MAGIC) != 0)
+   NFE_WRITE(sc, NFE_WOL_CTL, NFE_WOL_MAGIC);
+}
+
--- if_nfe.c	2010-11-05 10:36:43.300738161 +0100
+++ if_nfe.c	2010-11-05 10:39:04.712603916 +0100
@@ -125,6 +125,7 @@
 static void nfe_sysctl_node(struct nfe_softc *);
 static void nfe_stats_clear(struct nfe_softc *);
 static void nfe_stats_update(struct nfe_softc *);
+static void nfe_enable_wol(struct nfe_softc *);
 
 #ifdef NFE_DEBUG
 static int nfedebug = 0;
@@ -600,6 +601,10 @@
ifp->if_capabilities |= IFCAP_POLLING;
 #endif
 
+   /* Wake on LAN support */
+   ifp->if_capabilities |= IFCAP_WOL_MAGIC;
+   ifp->if_capenable = ifp->if_capabilities;
+
/* Do MII setup */
if (mii_phy_probe(dev, &sc->nfe_miibus, nfe_ifmedia_upd,
nfe_ifmedia_sts)) {
@@ -770,6 +775,10 @@
 
NFE_LOCK(sc);
ifp = sc->nfe_ifp;
+
+   /* Disable WOL bits */
+   NFE_WRITE(sc, NFE_WOL_CTL, 0);
+
if (ifp->if_flags & IFF_UP)
nfe_init_locked(sc);
sc->nfe_suspended = 0;
@@ -1753,6 +1762,12 @@
ifp->if_hwassist &= ~CSUM_TSO;
}
 
+   if ((mask & IFCAP_WOL) != 0 &&
+