high cpu usage on natd / dhcpd

2012-01-25 Thread Matthew Luckie

Hi

I have a small system running FreeBSD 8.2 that does NAT using ipfw and 
natd to systems attached to two interfaces: em0 and wlan0.  I have a 
dhcpd daemon issuing leases on those interfaces.  The system has an em1 
interface plugged into a cable modem where it obtains a DHCP lease from 
an ISP.


For some reason, when traffic from the Internet terminates on the system 
itself (I scp a file from the computer) the natd and dhcpd processes 
consume significant CPU, and the throughput is less than I expect. 
Traffic that passes through to a computer behind the NAT flows without 
causing the natd or dhcpd processes to measurably consume CPU.


From top:

CPU: 10.9% user,  0.0% nice, 56.0% system, 21.1% interrupt, 12.0% idle
Mem: 225M Active, 92M Inact, 162M Wired, 556K Cache, 112M Buf, 1506M Free
  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
 1222 root        1 104    0  3572K  1448K RUN      1:29 39.36% natd
 1676 root        1  62    0  5340K  3544K select   0:59 24.56% dhcpd

What is going on?  My ipfw ruleset is below, and is based on the example 
in the FreeBSD handbook.


1 allow ip from any to any via lo0
2 allow ip from any to any via em0
3 allow ip from any to any via wlan0
00101 divert 8668 ip from any to any in via em1
00102 check-state
00110 skipto 500 tcp from any to any out via em1 setup keep-state
00111 skipto 500 udp from any to any out via em1 keep-state
00112 skipto 500 icmp from any to any out via em1 keep-state
00201 allow udp from any to any dst-port 68 in keep-state
00202 allow tcp from any to me dst-port 80 in via em1 setup keep-state
00210 allow tcp from 130.217.250.13 to me in via em1 setup keep-state
00211 allow tcp from 199.109.33.1 to me in via em1 setup keep-state
00212 allow tcp from 192.172.226.78 to me in via em1 setup keep-state
00213 allow tcp from 192.172.226.95 to me in via em1 setup keep-state
00230 allow tcp from any to me dst-port 6984 in via em1 setup keep-state
00231 allow udp from any to me dst-port 6984 in via em1
00240 allow icmp from any to me in via em1
00300 unreach filter-prohib log ip from any to any
00500 divert 8668 ip from any to any out via em1
00501 allow ip from any to any
65535 allow ip from any to any
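
A simplified sketch (Python, not part of the original post) of how the first four rules of this ruleset sort packets by interface and direction. It models only the `via` and `in` options and ignores protocols, addresses, and dynamic state, so it is an illustration under those assumptions, not a faithful ipfw emulation. The names `Rule` and `first_match` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    number: int
    action: str
    via: str                  # interface the rule is bound to
    direction: Optional[str]  # "in", "out", or None for both directions


# Only the leading rules of the ruleset above; later rules are omitted.
RULES = [
    Rule(1, "allow", "lo0", None),
    Rule(2, "allow", "em0", None),
    Rule(3, "allow", "wlan0", None),
    Rule(101, "divert 8668", "em1", "in"),
]


def first_match(iface: str, direction: str) -> Optional[Rule]:
    """Return the first modelled rule matching interface and direction."""
    for rule in RULES:
        if rule.via == iface and rule.direction in (None, direction):
            return rule
    return None


for iface, direction in (("em0", "in"), ("wlan0", "out"), ("em1", "in")):
    print(iface, direction, "->", first_match(iface, direction))
```

Under this model every packet received on em1 reaches the divert rule first, whether it is addressed to the router itself or to a host behind the NAT, while em0 and wlan0 traffic is accepted before natd is consulted.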
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: 9-stable - ifmedia_set: no match for 0x0/0xfffffff

2012-01-25 Thread Randy Bush
ok, i 
  o used device.hints to disable both bge interfaces
  o booted successfully
  o used serial console
  o ifconfiged bge0 to the normal addresses
  o and it is working

i suspect that something sucks in bge initialization at startup.
insightful, i know.  sorry.

randy


Re: msk0: watchdog timeout interface hang

2012-01-25 Thread Arnaud Lacombe
Hi,

On Wed, Jan 25, 2012 at 3:26 PM, Kim Culhan  wrote:
> Running 10-current from 01-20-12
> the msk0 interface hung, on the console:
>
> msk0: watchdog timeout
> msk0: prefetch unit stuck?
> msk0: initialization failed: no memory for Rx buffers
>
> Verbose boot dmesg output attached.
>
known issue affecting at least 8-STABLE, 9-STABLE (assumed) and
-current. Already reported in these threads:

http://lists.freebsd.org/pipermail/freebsd-net/2011-December/030635.html

http://lists.freebsd.org/pipermail/freebsd-questions/2011-November/235646.html

 - Arnaud

> Any help is greatly appreciated.
>
> -kim
>


Re: 9-stable - ifmedia_set: no match for 0x0/0xfffffff

2012-01-25 Thread Randy Bush
way cool.  a /boot/device.hints entry of
hint.acpi.bge.1.disable=1
did disable bge1.  but now it's bge0, and i need that interface.  and
media are present!

so i tried /etc/rc.conf

ifconfig_bge0="198.180.150.1/25 media 1000baseTX"
ifconfig_bge0_ipv6="inet6 2001:418:8006::1/64"
ifconfig_bge0_alias0="inet 198.180.150.2/32"
ifconfig_bge1="media 1000baseTX"

pcib4:  irq 12 at device 28.2 on pci0
pcib0: allocated type 3 (0xd010-0xd01f) for rid 20 of pcib4
pcib4:   domain0
pcib4:   secondary bus 4
pcib4:   subordinate bus   4
pcib4:   memory decode 0xd010-0xd01f
pcib4:   no prefetched decode
ACPI: Found matching pin for 4.0.INTA at func 0: 12
pci4:  on pcib4
pci4: domain=0, physical bus=4
found-> vendor=0x14e4, dev=0x1659, revid=0x11
domain=0, bus=4, slot=0, func=0
class=02-00-00, hdrtype=0x00, mfdev=0
cmdreg=0x0006, statreg=0x0010, cachelnsz=8 (dwords)
lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
intpin=a, irq=12
powerspec 2  supports D0 D3  current D0
MSI supports 8 messages, 64 bit
map[10]: type Memory, range 64, base 0xd010, size 16, enabled
pcib4: allocated memory range (0xd010-0xd010) for rid 10 of pci0:4:0:0
pcib4: matched entry for 4.0.INTA (src \_SB_.PCI0.LNKC:0)
pcib4: slot 0 INTA routed to irq 12 via \_SB_.PCI0.LNKC
pci0:4:0:0: bad VPD cksum, remain 14
bge0:  mem 0xd010-0xd010 irq 12 at device 0.0 on pci4
bge0: CHIP ID 0x4101; ASIC REV 0x04; CHIP REV 0x41; PCI-E
miibus0:  on bge0
brgphy0:  PHY 1 on miibus0
brgphy0: OUI 0x001018, model 0x0018, rev. 0
brgphy0:  no media present
ifmedia_set: no match for 0x0/0xfff
panic: ifmedia_set
KDB: stack backtrace:
#0 0xc05bc257 at kdb_backtrace+0x47
#1 0xc058db2f at panic+0xaf
#2 0xc063e3d1 at ifmedia_set+0x41
#3 0xc04e94fa at miibus_mediainit+0x8a
#4 0xc04e227f at brgphy_attach+0x3bf
#5 0xc05b5f6f at device_attach+0x36f
#6 0xc05b745c at device_probe_and_attach+0x2c
#7 0xc05b7489 at bus_generic_attach+0x19
#8 0xc04e9987 at miibus_attach+0xd7
#9 0xc05b5f6f at device_attach+0x36f
#10 0xc05b745c at device_probe_and_attach+0x2c
#11 0xc05b7489 at bus_generic_attach+0x19
#12 0xc04e9f0c at mii_attach+0x40c
#13 0xc04db0f3 at bge_attach+0x3a93
#14 0xc05b5f6f at device_attach+0x36f
#15 0xc05b745c at device_probe_and_attach+0x2c
#16 0xc05b7489 at bus_generic_attach+0x19
#17 0xc049e984 at acpi_pci_attach+0x194
Uptime: 1s
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.

randy


Re: kern/164475: [gre] gre misses RUNNING flag after a reboot

2012-01-25 Thread linimon
Old Synopsis: gre misses RUNNING flag after a reboot
New Synopsis: [gre] gre misses RUNNING flag after a reboot

Responsible-Changed-From-To: freebsd-bugs->freebsd-net
Responsible-Changed-By: linimon
Responsible-Changed-When: Thu Jan 26 02:23:58 UTC 2012
Responsible-Changed-Why: 
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=164475


Re: kern/164495: [igb] connect double head igb to switch cause system to halt

2012-01-25 Thread linimon
Old Synopsis: connect double head igb to switch cause system to halt
New Synopsis: [igb] connect double head igb to switch cause system to halt

Responsible-Changed-From-To: freebsd-bugs->freebsd-net
Responsible-Changed-By: linimon
Responsible-Changed-When: Thu Jan 26 02:23:09 UTC 2012
Responsible-Changed-Why: 
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=164495


9-stable - ifmedia_set: no match for 0x0/0xfffffff

2012-01-25 Thread Randy Bush
day old i386 current

bge1:  mem 0xd020-0xd020 irq 10 at device 0.0 on pci5
bge1: CHIP ID 0x4101; ASIC REV 0x04; CHIP REV 0x41; PCI-E
miibus1:  on bge1
brgphy1:  PHY 1 on miibus1
brgphy1: OUI 0x001018, model 0x0018, rev. 0
brgphy1:  no media present
ifmedia_set: no match for 0x0/0xfff
panic: ifmedia_set
KDB: stack backtrace:
#0 0xc05bc257 at kdb_backtrace+0x47
#1 0xc058db2f at panic+0xaf
#2 0xc063e3d1 at ifmedia_set+0x41
#3 0xc04e94fa at miibus_mediainit+0x8a
#4 0xc04e227f at brgphy_attach+0x3bf
#5 0xc05b5f6f at device_attach+0x36f
#6 0xc05b745c at device_probe_and_attach+0x2c
#7 0xc05b7489 at bus_generic_attach+0x19
#8 0xc04e9987 at miibus_attach+0xd7
#9 0xc05b5f6f at device_attach+0x36f
#10 0xc05b745c at device_probe_and_attach+0x2c
#11 0xc05b7489 at bus_generic_attach+0x19
#12 0xc04e9f0c at mii_attach+0x40c
#13 0xc04db0f3 at bge_attach+0x3a93
#14 0xc05b5f6f at device_attach+0x36f
#15 0xc05b745c at device_probe_and_attach+0x2c
#16 0xc05b7489 at bus_generic_attach+0x19
#17 0xc049e984 at acpi_pci_attach+0x194
Uptime: 1s
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.

randy


Re: low network speed

2012-01-25 Thread Rick Macklem
Eugene M. Zheganin wrote:
> Hi.
> 
> I'm suffering from low network performance on one of my FreeBSDs.
> I have an i386 8.2-RELEASE machine with an fxp(4) adapter. It's
> connected through a bunch of Catalyst 2950s to another 8.2. While other
> machines in this server room using the same sequence of switches and
> the same target source server (which, btw, is equipped with an em(4)
> and a gigabit link via a Catalyst 3750) show sufficient speed, this
> particular machine, when using scp, starts at 200 Kbytes/sec and,
> while copying the file, shows a speed of about 600-800 Kbytes/sec.
> 
> I've added this tweak to the sysctl:
> 
> net.local.stream.recvspace=196605
> net.local.stream.sendspace=196605
> net.inet.tcp.sendspace=196605
> net.inet.tcp.recvspace=196605
> net.inet.udp.recvspace=196605
> kern.ipc.maxsockbuf=2621440
> kern.ipc.somaxconn=4096
> net.inet.tcp.sendbuf_max=524288
> net.inet.tcp.recvbuf_max=524288
> 
> With these settings the copy starts at 9.5 Mbytes/sec, but then,
> as the file is copied, drops to 3.5 Mbytes/sec in about two to three
> minutes.
> 
> Is there some way to maintain 9.5 Mbytes/sec (I like this speed more)?
> 
You might want to try disabling the hardware checksumming via ifconfig.
(I very vaguely recall doing that for a fxp(4) interface some time ago,
 but am probably completely wrong.:-)

rick

> 
> Thanks.
> Eugene.
> 
> P.S. This machine also runs zfs, I don't know if it's important but I
> decided to mention it.


Problem with nat traversal

2012-01-25 Thread Christer Hermansson
I have a problem with NAT traversal. The server is directly connected to
the Internet; the client is behind a gateway that uses NAT.


The problem is that the server tries to respond to the client's internal
private address, 192.168.1.10 (and the ISP sends ICMP messages back to
the server, saying that it blocks 192.168 addresses).


(I don't have access to the real output from tcpdump right now...)

tcpdump on the server shows something like this:

 client-ext-ip > srv-ext-ip UDP 500
 srv-ext-ip UDP 500 > client-ext-ip

 client-ext-ip > srv-ext-ip UDP 500
 srv-ext-ip UDP 500 > client-ext-ip

 client-ext-ip > srv-ext-ip UDP 4500
 srv-ext-ip 4500 > client-INT-ip UDP
 icmp from isp-router telling client-INT-ip is filtered

 client-ext-ip > srv-ext-ip UDP 4500
 srv-ext-ip 4500 > client-INT-ip UDP
 icmp from isp-router telling client-INT-ip is filtered

 client-ext-ip > srv-ext-ip UDP 4500
 srv-ext-ip 4500 > client-INT-ip UDP
 icmp from isp-router telling client-INT-ip is filtered

windump on the client with win7 shows something like this:

 client-ext-ip > srv-ext-ip UDP 500
 srv-ext-ip UDP 500 > client-ext-ip

 client-ext-ip > srv-ext-ip UDP 500
 srv-ext-ip UDP 500 > client-ext-ip

 client-ext-ip > srv-ext-ip UDP 4500
 client-ext-ip > srv-ext-ip UDP 4500
 client-ext-ip > srv-ext-ip UDP 4500

I get the same problem with

FreeBSD 8.1R i386 + ipsec-tools 0.8.0
FreeBSD 8.2R amd64 + ipsec-tools 0.7.3
FreeBSD 8.2R amd64 + ipsec-tools 0.8.0

I have compiled the kernel with

options IPSEC
options IPSEC_DEBUG
options IPSEC_FILTERTUNNEL
options IPSEC_NAT_T
device crypto
device enc

and I have "nat_traversal on" in racoon.conf.

Why is the server trying to send packets to the client's internal address?




Re: "ifconfig media off"?

2012-01-25 Thread Marius Strobl
On Sat, Jan 21, 2012 at 12:58:08AM +0100, Stefan Bethke wrote:
> On 14.12.2011 at 02:16, Marius Strobl wrote:
> 
> > On Tue, Dec 13, 2011 at 10:53:48AM -0800, YongHyeon PYUN wrote:
> >> On Tue, Dec 13, 2011 at 11:04:51AM +0100, Stefan Bethke wrote:
> >>> On 13.12.2011 at 03:50, YongHyeon PYUN wrote:
> >>> 
> >>>> On Tue, Dec 13, 2011 at 12:56:22AM +0100, Stefan Bethke wrote:
> >>>>> I'm currently writing a driver to configure an ethernet switch chip
> >>>>> (see TL-WR1043ND on -embedded).
> >>>>> 
> >>>>> I noticed that there doesn't seem to be a way to power down a phy right
> >>>>> now through the ifconfig media command.
> >>>>> 
> >>>>> Would there be objections to extend the media subtype definitions to
> >>>>> include an "off", "poweroff" or "down" media subtype, and add code to
> >>>>> the relevant phy drivers to power down the phy for this media subtype?
> >>>>> 
> >>>>> The difference between media subtype "none" and this new one would be
> >>>>> that there will be no link, even if there is a physical connection.
> >>>>> With media subtype "none", a 10 MBit/s half-duplex connection is
> >>>>> established, potentially confusing the remote end about the
> >>>>> availability of this link.  On the local side, the link is down, so no
> >>>>> packets are exchanged.
> >>>>> 
> >>>> 
> >>>> I think "none" means "isolated" so should have no established link
> >>>> and probably you can also power down the PHY.
> >>>> I vaguely guess the PHY of switch chip does not correctly support
> >>>> isolated mode so you may have wanted to power down.
> >>> 
> >>> 
> >>> After looking at the code a bit more, I think the common code just 
> >>> doesn't set the BMCR_PDOWN (but clears it when bringing up the PHY).
> >>> 
> >> 
> >> Yes, and most PHYs could be powered down when BMCR_ISO is chosen.
> >> I'm not sure whether this could be applied to hardwares that
> >> support multiple PHYs(i.e. internal and external transceivers)
> >> though.  Marius may have some opinions on this(CCed).
> >> However powering down PHY with BMCR_ISO looks natural to me.
> >> 
> >>> Index: sys/dev/mii/mii_physubr.c
> >>> ===
> >>> --- sys/dev/mii/mii_physubr.c (revision 228402)
> >>> +++ sys/dev/mii/mii_physubr.c (working copy)
> >>> @@ -58,7 +58,7 @@
> >>>  */
> >>> static const struct mii_media mii_media_table[MII_NMEDIA] = {
> >>>   /* None */
> >>> - { BMCR_ISO, ANAR_CSMA,
> >>> + { BMCR_ISO | BMCR_PDOWN,ANAR_CSMA,
> >>> 0, },
> >>> 
> >>>   /* 10baseT */
> >>> 
> >>> I've opened kern/163240.
> >>> http://www.freebsd.org/cgi/query-pr.cgi?pr=163240
> 
> I'd like to revisit this.  Just to reiterate my motivation for the change: I 
> want to be able to indicate to the remote end that my station is not active.  
> With the PHY just isolated from the MII, the link stays up and functional 
> (and even autoneg continues to work), so the remote has no indication that 
> it's just shouting into a void.

Yes, I understand the motivation and generally agree that this should
be implemented. IMO the above is just a quick hack and not a proper
solution, though; on the other hand, I don't see a need to grow an "off"
media for this either.
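
For readers following the patch quoted above, here is a minimal sketch (Python, not part of the thread) of the two control bits it combines. The bit values are those of the MII Basic Mode Control Register as defined in IEEE 802.3 clause 22 and mirrored in FreeBSD's sys/dev/mii/mii.h; the helper `bmcr_flags` is a hypothetical name used only for illustration.

```python
from typing import List

# MII Basic Mode Control Register (register 0) bits.
BMCR_PDOWN = 0x0800  # bit 11: power down the PHY
BMCR_ISO = 0x0400    # bit 10: isolate the PHY from the MII


def bmcr_flags(value: int) -> List[str]:
    """Return the names of the isolate/power-down bits set in value."""
    names = []
    if value & BMCR_ISO:
        names.append("BMCR_ISO")
    if value & BMCR_PDOWN:
        names.append("BMCR_PDOWN")
    return names


none_before = BMCR_ISO               # "none" media entry before the patch
none_after = BMCR_ISO | BMCR_PDOWN   # "none" media entry with the patch

print(hex(none_before), bmcr_flags(none_before))
print(hex(none_after), bmcr_flags(none_after))
```

Because the two bits do not overlap, a driver could in principle set either one without the other, which is the sense in which they are independent controls.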

> 
> > I don't think powering down the PHY along with IFM_NONE especially
> > in that way is a good idea for several reasons:
> > - It's incomplete as not all PHY drivers use mii_phy_add_media()/
> >  mii_phy_setmedia().
> > - Even for those that do IFM_NONE isn't added when the PHY driver
> >  sets MIIF_NOISOLATE (for some PHYs BMCR_ISO either just doesn't
> >  work as especially the built-in ones probably have been designed
> >  with only single-PHY configurations in mind or even wedges the
> >  chip up to the point that even a reset doesn't get it working
> >  again). In general though, BMCR_ISO and BMCR_PDOWN are orthogonal
> >  (even in IEEE 802.3-2008 as far as I can see), i.e. while BMCR_ISO
> >  might be broken, BMCR_PDOWN could work (actually I'd expect
> >  BMCR_PDOWN to be less fragile than BMCR_ISO).
> 
> I didn't expect my suggestion to be the be-all end-all, only a quick and easy 
> way to allow compliant PHYs to be powered down, and I'm not sure why a 
> "complete" solution is required.  I'd assume that PHYs setting MIIF_NOISOLATE 
> have specific requirements, so it's OK to not have the power-down option 
> available there.  (Plus I don't have hardware I could test that case on).

I wouldn't call it "specific requirements". The PHY drivers I've
flagged with MIIF_NOISOLATE so far fall into one of two categories:
a) Setting BMCR_ISO just doesn't have any effect and the PHY happily
   continues to pass traffic. Setting MIIF_NOISOLATE in this case
   is done in order to not add a non-working "none" media.
b) Upon setting BMCR_ISO the hardware wedges in such a way that a
   power-cycle is required in order to get it into a working state
   again. MIIF_NOISOLATE is set here in order to protect the users
   from s

Re: kern/164490: [pfil] Incorrect IP checksum on pfil pass from ip_output()

2012-01-25 Thread linimon
Old Synopsis: Incorrect IP checksum on pfil pass from ip_output()
New Synopsis: [pfil] Incorrect IP checksum on pfil pass from ip_output()

Responsible-Changed-From-To: freebsd-bugs->freebsd-net
Responsible-Changed-By: linimon
Responsible-Changed-When: Wed Jan 25 19:58:14 UTC 2012
Responsible-Changed-Why: 
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=164490


Re: low network speed

2012-01-25 Thread Ivan Voras

On 25/01/2012 06:27, Eugene M. Zheganin wrote:
> Hi.
>
> I'm suffering from low network performance on one of my FreeBSDs.
> I have an i386 8.2-RELEASE machine with an fxp(4) adapter. It's
> connected through a bunch of Catalyst 2950s to another 8.2.


Another thing to try would be to upgrade both ends to 8-STABLE and try 
the high-performance network buffer sizing in ssh (enabled by default).
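
As background on why buffer sizing can matter here: a sender using a fixed window can have at most one window of unacknowledged data in flight per round trip, so throughput is bounded by window / RTT. The sketch below (Python, not part of the thread) applies that bound to the net.inet.tcp.recvspace value from the original post. The RTT values are assumptions chosen for illustration, not measurements, and `max_throughput` is a hypothetical helper name.

```python
def max_throughput(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput in bytes/sec: one window per round trip."""
    return window_bytes / rtt_seconds


RECVSPACE = 196605  # net.inet.tcp.recvspace from the original post

# Assumed round-trip times, in milliseconds.
for rtt_ms in (0.5, 5.0, 50.0):
    bound = max_throughput(RECVSPACE, rtt_ms / 1000.0)
    print(f"RTT {rtt_ms:5.1f} ms -> at most {bound / 1e6:8.2f} MB/s")
```

The same bound applies to any fixed window, whether it is the kernel socket buffer or one maintained inside an application.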





livelock with full loaded em(4)

2012-01-25 Thread Anton Yuzhaninov

Hello.

I have test boxes with an em(4) network card (Intel 82563EB).
FreeBSD version: 8.2-STABLE from 2012-01-15, amd64.

When this NIC is fully loaded, a livelock occurs: the system is unresponsive
even from the local console.

To generate load I use netsend from /usr/src/tools/tools/netrate/,
but other traffic sources (e.g. TCP instead of UDP) cause the same problem.

Two conditions are needed for this livelock:

1. With the NIC fully loaded, the kernel thread "em1 taskq" hogs a CPU.

top -zISHP output with the interface load slightly below full.
Traffic is generated by
# netsend 172.16.0.2 9001 8500 14300 3600
where 14300 is packets per second:

112 processes: 10 running, 82 sleeping, 20 waiting
CPU 0:  0.0% user,  0.0% nice, 27.1% system,  0.0% interrupt, 72.9% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 2:  2.3% user,  0.0% nice, 97.7% system,  0.0% interrupt,  0.0% idle
CPU 3:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 4:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 5:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 6:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 7:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 26M Active, 378M Inact, 450M Wired, 132K Cache, 63M Buf, 15G Free
Swap: 8192M Total, 8192M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 7737 ayuzhaninov 119    0  5832K  1116K CPU2    2   0:04 100.00% netsend
    0 root        -68    0     0K   144K -       0   2:17  22.27% {em1 taskq}

top -zISHP output at full interface load (some drops occur). Load is
generated by
# netsend 172.16.0.2 9001 8500 14400 3600
112 processes: 11 running, 81 sleeping, 20 waiting
CPU 0:  0.0% user,  0.0% nice,  100% system,  0.0% interrupt,  0.0% idle
CPU 1:  4.1% user,  0.0% nice, 95.9% system,  0.0% interrupt,  0.0% idle
CPU 2:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 3:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 4:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 5:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 6:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 7:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 26M Active, 378M Inact, 450M Wired, 132K Cache, 63M Buf, 15G Free
Swap: 8192M Total, 8192M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
    0 root        -68    0     0K   144K CPU0    0   2:17 100.00% {em1 taskq}
 7759 ayuzhaninov 119    0  5832K  1116K CPU1    1   0:01 100.00% netsend

So pps increased from 14300 to 14400 (0.7%), but the CPU load from the
"em1 taskq" thread increased from 27.1% to 100.00%.
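
To put the two packet rates in perspective, here is a small worked calculation (Python, not part of the original post). It assumes that the third netsend argument, 8500, is the UDP payload size in bytes, and it ignores UDP, IP, and Ethernet header overhead, so the figures are payload rates only. `payload_mbit_per_sec` is a hypothetical helper name.

```python
PAYLOAD_BYTES = 8500  # assumed meaning of the third netsend argument


def payload_mbit_per_sec(pps: int, payload_bytes: int = PAYLOAD_BYTES) -> float:
    """Payload bit rate in Mbit/s, ignoring all header overhead."""
    return pps * payload_bytes * 8 / 1e6


low, high = 14300, 14400
print(payload_mbit_per_sec(low))           # 972.4
print(payload_mbit_per_sec(high))          # 979.2
print(round((high - low) / low * 100, 1))  # 0.7
```

If the link is gigabit Ethernet, both runs sit close to its nominal 1000 Mbit/s once headers are added, which would be consistent with drops starting to appear at the higher rate.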

This is strange at the least, but the system still works fine until I run
sysctl dev.cpu.0.temperature

2. The sysctl handler code for coretemp must be executed on the target CPU,
e.g. for dev.cpu.0.temperature the code is executed on CPU0.

If CPU0 is fully loaded by "em1 taskq", the sysctl handler for
dev.cpu.0.temperature acquires the Giant mutex and then tries to run code
on CPU0, but it can't because CPU0 is busy.

If the Giant mutex is held for a long time, the system is unresponsive. In my
case Giant is acquired when sysctl dev.cpu.0.temperature starts and is held
the whole time netsend is running.

This seems to be a scheduler problem:
1. Why does "em1 taskq" run only on CPU0 (there is no affinity set for this thread)?

# procstat -k 0 | egrep '(PID|em1)'
  PID    TID COMM             TDNAME           KSTACK
    0 100038 kernel           em1 taskq
# cpuset -g -t 100038
tid 100038 mask: 0, 1, 2, 3, 4, 5, 6, 7

2. Why is "em1 taskq" not preempted to execute the sysctl handler code? This
is not a short-term condition: if netsend runs for an hour, "em1 taskq"
is not preempted for an hour, and sysctl stays in the running state the
whole time without getting a chance to execute.

--
 Anton Yuzhaninov

P.S. I tried EM_MULTIQUEUE, but it doesn't help in my case.


Re: Ethernet Switch Framework

2012-01-25 Thread Stefan Bethke
On 25.01.2012 at 08:12, Adrian Chadd wrote:

> So when will you two have something consensus-y to commit? :-)
> 
> What I'm hoping for is:
> 
> * some traction on the MII bus / MDIO bus split and tidyup from stb, which is 
> nice;
> * ray's switch API for speaking to userland with;
> * agreeing on whether the correct place to put the driver(s) is where stb, 
> ray, or a mix of both approaches says so.
> 
> I've been (mostly) trying to stay out of this to see where both of you have 
> gone. I think we've made some good progress; now it's time to solidify a 
> design for the first pass of what we want in -HEAD and figure out how to move 
> forward.

My suggestion is to take my bus attachment code (incl. mdio and miiproxy) and 
ray's ioctl and userland code.

Aleksandr's approach for the driver attachment is to have a generic switch 
"bus" driver that abstracts the mii, i2c, memory mapped I/O, etc. busses the 
devices are physically attached to, and present a uniform register file to the 
chip-specific switch driver.  I believe that this is unnecessarily complicated 
for two reasons: newbus already provides that abstraction, and chip-specific 
drivers usually differ in so many aspects, including their register files, that 
code sharing between chips will be somewhat limited anyway.

One aspect that I would enjoy looking into in more detail is how register 
accesses on, for example, MDIO, can be provided through the bus space API.  
From my cursory reading, it seems that the code currently is tailored towards 
register accesses that can be implemented through CPU native instructions, 
instead of indirectly through a controller.

Aleksandr has defined a quite comprehensive ethernet switch control API that 
the framework provides towards in-kernel clients as well as userland.  I think 
it would be really helpful if we could concentrate on those API functions that 
can be controlled through the userland utility, have immediate use cases (for 
example, VLAN configuration on the TL-WR1043ND to separate the WAN from the LAN 
ports), and we have test hardware for.  In short, don't commit dead code.

Having a description of the generic switch model that the API assumes and 
driver-specific documentation also wouldn't hurt.  (Yes, I'm volunteering.)


Stefan

-- 
Stefan Bethke   Fon +49 151 14070811
