Re: ipsec with ipfw
On 2017-03-13 11:01, Andrey V. Elsukov wrote:

On 12.03.2017 00:23, Hooman Fazaeli wrote:

Hi,

As you know, ipsec/setkey provide only a limited syntax for defining security policies: a single subnet/host, protocol number and optional port may be used to specify the traffic's source and destination. I was thinking about the idea of using ipfw as the packet selector for ipsec, much like it is used with dummynet. Something like:

ipfw add 100 ipsec 2 tcp from to 80,443,110,139

What should this rule do? How do you plan to implement policy lookup for inbound packets?

For instance, outbound packets matching the rule would go through the tunnel whose index is 2. The tunnel itself is defined using setkey, with something like:

spdadd 2 esp/tunnel/1.1.1.1-2.2.2.2/require

It is basically the same as spdadd, but without the src/dst/proto/port specification. A similar rule would be written for inbound packets. This is just to illustrate the idea; obviously, the exact mechanism needs further thought and investigation (e.g., the issue of stateful vs. stateless rules).

One important aspect, as s...@zxy.spb.ru pointed out, is how to deal with IKE/ISAKMP to support the mechanism, since the current protocol requires the negotiating parties to exchange and match the subject-to-ipsec traffic specification in SA payloads (which is restricted to a single subnet+proto+port). I was thinking about some form of labeling (like MPLS) plus custom payload types or DOIs. Your ideas are welcome.

--
Best regards
Hooman Fazaeli

___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
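To make the proposal concrete, here is a hedged sketch of the policy-count saving. The addresses are hypothetical, and the "ipsec 2" action plus the index-only spdadd are the *proposed* syntax from this thread, not anything setkey or ipfw actually accepts today:

```
# Today: one spdadd per (subnet, proto, port) tuple -- four policies for four ports.
spdadd 10.0.0.0/24 192.168.1.0/24[80]  tcp -P out ipsec esp/tunnel/1.1.1.1-2.2.2.2/require;
spdadd 10.0.0.0/24 192.168.1.0/24[443] tcp -P out ipsec esp/tunnel/1.1.1.1-2.2.2.2/require;
spdadd 10.0.0.0/24 192.168.1.0/24[110] tcp -P out ipsec esp/tunnel/1.1.1.1-2.2.2.2/require;
spdadd 10.0.0.0/24 192.168.1.0/24[139] tcp -P out ipsec esp/tunnel/1.1.1.1-2.2.2.2/require;

# Proposed: one ipfw rule selects the traffic, one selector-less policy defines
# the tunnel (hypothetical syntax, per the discussion above).
ipfw add 100 ipsec 2 tcp from 10.0.0.0/24 to 192.168.1.0/24 80,443,110,139
spdadd 2 esp/tunnel/1.1.1.1-2.2.2.2/require
```

The win is that ipfw's richer matching (port lists, tables, interfaces) replaces an N-fold expansion of near-identical policies.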
ipsec with ipfw
Hi,

As you know, ipsec/setkey provide only a limited syntax for defining security policies: a single subnet/host, protocol number and optional port may be used to specify the traffic's source and destination. I was thinking about the idea of using ipfw as the packet selector for ipsec, much like it is used with dummynet. Something like:

ipfw add 100 ipsec 2 tcp from to 80,443,110,139

What do you think? Are you interested in such a feature? Is it worth the effort? What are the implementation challenges?

--
Best regards
Hooman Fazaeli
Re: projects/routing announcement/status
On 2016-01-22 03:11, Alexander V. Chernikov wrote:

I would like to introduce the routing rework which started as the projects/routing SVN branch. It has been around for quite a long time, and some of the code has made its way to HEAD, but there hasn't been any public announcement. So, what is projects/routing about?

First, it is about bringing more scalability by solving the most annoying problems on the packet output path. To be more specific, it eliminates 2 out of 4 locks, converts the other 2 to rmlock(9), and adds infrastructure to reduce locking to a single rmlock for certain traffic types. With these changes, the OS is able to forward 12 Mpps on a 16-core box for both IPv4/IPv6, which is 6-10 times better than stock HEAD. Second, it eases hacking by avoiding direct access to route/lltable internals and providing a higher-level API instead. Third, it is about bringing advanced features like route multipath, and even more speed, by adding a modular lookup API permitting the use of different route lookup algorithms based on server role.

A description with graphs and links is available at: http://wiki.freebsd.org/ProjectsRoutingProposal
The API is described in: http://wiki.freebsd.org/ProjectsRoutingProposal/API
Current status is available at: http://wiki.freebsd.org/ProjectsRoutingProposal/ConversionStatus

It is probably much more convenient to read the project details on the wiki; however, I'll try to summarise the most important things here (wiki readers can skip to the end). The typical packet processing path (forwarding for a router, or output for a web server) consists of:

- a routing lookup (radix read rwlock + routing entry (rte) mutex lock)
- (optionally) an interface address (ifa) atomic refcount acquire/release
- a link-level entry (lle, llentry) lookup (afdata read rwlock + llentry read (or write) lock)

The most annoying one is the rtentry mutex. The only goal of this mutex is to provide rtentry refcounting so consumer code can use it without the risk of the rtentry being deleted.
We solve this by saving all needed data into an on-stack optimised structure instead of refcounting. Additionally, we are trying to pre-calculate the data we need to pass by using special next-hop structures instead of route entries. Several functions (differing in returned info and relative overhead) for retrieving routing data are provided. Most of the consumers have already been switched to the new KPI; the actual output/forward paths are not converted yet. It should be noted that since individual rtentries are not returned, it is not possible to do per-ifa output packet accounting (as observed in netstat -s).

The route table lock is switched to an ipfw-like dual-locking mode (read rmlock() for the data path, rwlock for config changes, route export, etc.). The reasons for having the rwlock are to 1) provide serialization for control-plane work not directly used by the data path and 2) avoid acquiring contested/sleeping locks under the rmlock. See projects/routing r287078 for an example. The lltable entry locks were eliminated in r291853 and r292155. The lltable lock is also planned to be converted to the dual-locking model, with similar reasoning; however, instead of (ab)using the AFDATA lock, it needs to be converted to a per-lltable set of locks.

Open problems: SCTP/Flowtable reference rtentries directly. It is not possible to convert the ip[6]_output() path without dealing with that.
Brief merge plan:

- Discuss/merge the new routing KPI for the data path
- Discuss/merge the lltable dual-lock (WIP)
- Discuss/merge the explicit nexthop changes
- Discuss/merge the IPv4/IPv6 output path (along with converted sctp/flowtable)
- Discuss/merge the route table dual-lock

Current outstanding reviews (I encourage you to take a look at these):

- D5009 (IPv4 fast forwarding conversion)
- D5010 (IPv6 forwarding conversion)
- D4794 (deal with per-ifa output counters)
- D4962 (new LLE lookup functions, no sockaddrs in the lltable data path)
- D4751 (move all lltable code to separate files)

First, thanks for the effort. I personally very much appreciate any improvements made to the network-related code. Second, have you considered replacing the existing radix tree with a faster data structure, especially Luigi's DXR tables? (http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf) I apologize if the question is not very relevant to your work.

--
Best regards
Hooman Fazaeli
Re: kernel panic with netgraph and mpd3.8
On 2016-07-10 10:49, Donald Baud via freebsd-net wrote:

Hi, I'm running an L2TP LNS through mpd3.8 and it has crashed twice in 24h. This is a new project replacing a Cisco 7206: 700 sessions, 800 Mbit/s. I am not familiar with troubleshooting kernel panics. I suspect the crash is happening inside the netgraph module, because it occurs at instruction pointer = 0x20:0x81c38283. I included the two crash logs. I need some help to figure out what to do next. -Dbaud

- Upgrade to mpd 5 (/usr/ports/net/mpd5)
- Try the workarounds below:

https://lists.freebsd.org/pipermail/freebsd-bugs/2014-June/056548.html
https://lists.freebsd.org/pipermail/freebsd-bugs/2014-June/056549.html
https://lists.freebsd.org/pipermail/freebsd-net/2014-June/038954.html

--
Best regards
Hooman Fazaeli
Re: tcp window scaling + syn cookies problem
On 2016-03-07 4:26 PM, Hooman Fazaeli wrote:

Hi,

In our network, Windows clients connect to the internet via our custom-developed transparent TCP proxy (running on 7.3). Things work fine, except that _sometimes_ downloads by some of the Windows clients become very slow. To debug the problem, we inspected a few packet traces and found that it happens because the proxy TCP stack forgets the client's window scale factor, as illustrated in the following packet trace (a download from the ftp.freebsd.org site; x.y.z.y is a Windows 8 client):

1. 15:09:32.765713 IP (tos 0x0, ttl 63, id 16510, offset 0, flags [DF], proto TCP (6), length 52) x.y.z.y.57430 > 96.47.72.72.80: S, cksum 0x8343 (correct), 1530161492:1530161492(0) win 8192
2. 15:09:32.765729 IP (tos 0x0, ttl 64, id 55869, offset 0, flags [none], proto TCP (6), length 52) 96.47.72.72.80 > x.y.z.y.57430: S, cksum 0xe2c0 (correct), 503882603:503882603(0) ack 1530161493 win 65535
3. 15:09:32.766071 IP (tos 0x0, ttl 63, id 16511, offset 0, flags [DF], proto TCP (6), length 40) x.y.z.y.57430 > 96.47.72.72.80: ., cksum 0x2192 (correct), ack 1 win 256
4. 15:09:32.770074 IP (tos 0x0, ttl 63, id 16512, offset 0, flags [DF], proto TCP (6), length 408) x.y.z.y.57430 > 96.47.72.72.80: P, cksum 0x259c (correct), 1:369(368) ack 1 win 256
5. 15:09:32.869286 IP (tos 0x0, ttl 64, id 57834, offset 0, flags [none], proto TCP (6), length 40) 96.47.72.72.80 > x.y.z.y.57430: ., cksum 0x2122 (correct), ack 369 win 65535
6. 15:09:33.180983 IP (tos 0x0, ttl 64, id 64495, offset 0, flags [none], proto TCP (6), length 296) 96.47.72.72.80 > x.y.z.y.57430: ., cksum 0xbd5a (correct), 1:257(256) ack 369 win 65535
7. 15:09:33.231475 IP (tos 0x0, ttl 63, id 16513, offset 0, flags [DF], proto TCP (6), length 40) x.y.z.y.57430 > 96.47.72.72.80: ., cksum 0x1f23 (correct), ack 257 win 255
8. 15:09:33.231494 IP (tos 0x0, ttl 64, id 248, offset 0, flags [none], proto TCP (6), length 295) 96.47.72.72.80 > x.y.z.y.57430: ., cksum 0xc9b6 (correct), 257:512(255) ack 369 win 65535
9. 15:09:33.282256 IP (tos 0x0, ttl 63, id 16514, offset 0, flags [DF], proto TCP (6), length 40) x.y.z.y.57430 > 96.47.72.72.80: ., cksum 0x1e25 (correct), ack 512 win 254
10. 15:09:33.282279 IP (tos 0x0, ttl 64, id 1283, offset 0, flags [none], proto TCP (6), length 294) 96.47.72.72.80 > x.y.z.y.57430: ., cksum 0x1e25 (correct), 512:766(254) ack 369 win 65535
11. 15:09:33.333006 IP (tos 0x0, ttl 63, id 16515, offset 0, flags [DF], proto TCP (6), length 40) x.y.z.y.57430 > 96.47.72.72.80: ., cksum 0x1d28 (correct), ack 766 win 253
12. 15:09:33.333023 IP (tos 0x0, ttl 64, id 2520, offset 0, flags [none], proto TCP (6), length 293) 96.47.72.72.80 > x.y.z.y.57430: ., cksum 0x1d28 (correct), 766:1019(253) ack 369 win 65535
13. 15:09:33.383926 IP (tos 0x0, ttl 63, id 16516, offset 0, flags [DF], proto TCP (6), length 40) x.y.z.y.57430 > 96.47.72.72.80: ., cksum 0x1c2c (correct), ack 1019 win 252

As can be seen, the client advertises a window scale factor of 8 and then correctly sets each packet's window size based on the advertised factor. But the proxy seems to forget the client's scale factor and sends only as much data as the client's unscaled window size from the previous ACK. Setting 'net.inet.tcp.syncookies' to zero seems to fix the problem, and the download speed becomes as expected. Is this bad interaction between window scaling and SYN cookies a known problem? Why does it happen? Has it been fixed in a later FreeBSD version? Thanks in advance.
A few minutes after posting, I found the following thread, which describes an exact duplicate of our problem: https://lists.freebsd.org/pipermail/freebsd-net/2013-February/034519.html

--
Best regards
Hooman Fazaeli
tcp window scaling + syn cookies problem
Hi,

In our network, Windows clients connect to the internet via our custom-developed transparent TCP proxy (running on 7.3). Things work fine, except that _sometimes_ downloads by some of the Windows clients become very slow. To debug the problem, we inspected a few packet traces and found that it happens because the proxy TCP stack forgets the client's window scale factor, as illustrated in the following packet trace (a download from the ftp.freebsd.org site; x.y.z.y is a Windows 8 client):

1. 15:09:32.765713 IP (tos 0x0, ttl 63, id 16510, offset 0, flags [DF], proto TCP (6), length 52) x.y.z.y.57430 > 96.47.72.72.80: S, cksum 0x8343 (correct), 1530161492:1530161492(0) win 8192
2. 15:09:32.765729 IP (tos 0x0, ttl 64, id 55869, offset 0, flags [none], proto TCP (6), length 52) 96.47.72.72.80 > x.y.z.y.57430: S, cksum 0xe2c0 (correct), 503882603:503882603(0) ack 1530161493 win 65535
3. 15:09:32.766071 IP (tos 0x0, ttl 63, id 16511, offset 0, flags [DF], proto TCP (6), length 40) x.y.z.y.57430 > 96.47.72.72.80: ., cksum 0x2192 (correct), ack 1 win 256
4. 15:09:32.770074 IP (tos 0x0, ttl 63, id 16512, offset 0, flags [DF], proto TCP (6), length 408) x.y.z.y.57430 > 96.47.72.72.80: P, cksum 0x259c (correct), 1:369(368) ack 1 win 256
5. 15:09:32.869286 IP (tos 0x0, ttl 64, id 57834, offset 0, flags [none], proto TCP (6), length 40) 96.47.72.72.80 > x.y.z.y.57430: ., cksum 0x2122 (correct), ack 369 win 65535
6. 15:09:33.180983 IP (tos 0x0, ttl 64, id 64495, offset 0, flags [none], proto TCP (6), length 296) 96.47.72.72.80 > x.y.z.y.57430: ., cksum 0xbd5a (correct), 1:257(256) ack 369 win 65535
7. 15:09:33.231475 IP (tos 0x0, ttl 63, id 16513, offset 0, flags [DF], proto TCP (6), length 40) x.y.z.y.57430 > 96.47.72.72.80: ., cksum 0x1f23 (correct), ack 257 win 255
8. 15:09:33.231494 IP (tos 0x0, ttl 64, id 248, offset 0, flags [none], proto TCP (6), length 295) 96.47.72.72.80 > x.y.z.y.57430: ., cksum 0xc9b6 (correct), 257:512(255) ack 369 win 65535
9. 15:09:33.282256 IP (tos 0x0, ttl 63, id 16514, offset 0, flags [DF], proto TCP (6), length 40) x.y.z.y.57430 > 96.47.72.72.80: ., cksum 0x1e25 (correct), ack 512 win 254
10. 15:09:33.282279 IP (tos 0x0, ttl 64, id 1283, offset 0, flags [none], proto TCP (6), length 294) 96.47.72.72.80 > x.y.z.y.57430: ., cksum 0x1e25 (correct), 512:766(254) ack 369 win 65535
11. 15:09:33.333006 IP (tos 0x0, ttl 63, id 16515, offset 0, flags [DF], proto TCP (6), length 40) x.y.z.y.57430 > 96.47.72.72.80: ., cksum 0x1d28 (correct), ack 766 win 253
12. 15:09:33.333023 IP (tos 0x0, ttl 64, id 2520, offset 0, flags [none], proto TCP (6), length 293) 96.47.72.72.80 > x.y.z.y.57430: ., cksum 0x1d28 (correct), 766:1019(253) ack 369 win 65535
13. 15:09:33.383926 IP (tos 0x0, ttl 63, id 16516, offset 0, flags [DF], proto TCP (6), length 40) x.y.z.y.57430 > 96.47.72.72.80: ., cksum 0x1c2c (correct), ack 1019 win 252

As can be seen, the client advertises a window scale factor of 8 and then correctly sets each packet's window size based on the advertised factor. But the proxy seems to forget the client's scale factor and sends only as much data as the client's unscaled window size from the previous ACK. Setting 'net.inet.tcp.syncookies' to zero seems to fix the problem, and the download speed becomes as expected. Is this bad interaction between window scaling and SYN cookies a known problem? Why does it happen? Has it been fixed in a later FreeBSD version? Thanks in advance.

--
Best regards
Hooman Fazaeli
Re: Bridge Interfaces and ARPs
On 12/3/2015 5:24 PM, Jason Van Patten wrote:

Hey gang - I posted this to the FreeBSD user forums but figured I'd send a message off to the list to see if anyone has any input, guidance, or ideas. Emailing diagrams around isn't good form (IMHO) but having a diagram handy will help with the discussion. So please glance at: http://pics.lateapex.net/vz.png

Background: I have a business class Verizon FIOS connection for Internet at home. Along with that connection, I have 13 (not 14!) static IPs from VZ. They almost fall within a proper CIDR block, but not quite: 1.2.3.210 - 1.2.3.222. I don't own .209, so I can't claim 1.2.3.208/28 as my IP block (dammit!) The subnet for the static IPs is a /24, and the default route is *Verizon's* router: 1.2.3.1.

There are a number of different choices for this network layout: DMZ, bridging, or binat. I chose bridging so that I don't have the complexity of binatting, and yet have some protection for the servers via my router. So, per the drawing, the FreeBSD router's em0 is connected to the Verizon equipment, while re0 and re1 are both connected to a managed Cisco switch, on different VLANs:

- VLAN 10 for re0: Public IPs (public services, etc.)
- VLAN 20 for re1: Private IPs (NAS, wireless AP, etc.)

Via the router, VLAN 10 and Verizon's network are bridged together. The bridge interface on the router has IP 1.2.3.222/24 with a default route set to 1.2.3.1. All servers on VLAN 10 have IPs within the allocated range (.210 - .220) and the same default route.

Now: the problem. I used the LAGG'd server as an example in the diagram, but the same thing is happening with other servers: the router is learning ARP entries for the IPs I own *from* Verizon's router. As soon as the router caches that bad entry, it no longer routes traffic to those public IPs *from* VLAN 20 (private side). So, in other words, a laptop on the wireless network won't be able to get to 1.2.3.215.
My work-around for now has been a series of static ARP entries on the router for each of my public servers. That seems to work fine, but I wonder if there's something I might be doing wrong? If I didn't include enough info, fire away. Thanks!

Can you post the output of the following commands (on the FreeBSD router)?

# ifconfig
# ifconfig bridgeX addr
# arp -na
# netstat -nr -f inet
# sysctl net.inet.ip

--
Best regards
Hooman Fazaeli
mbuf statistics
Hi,

On an idle FreeBSD 9.3 system:

vmstat -z | egrep "mbuf_cluster|ITEM" | column -t
ITEM           SIZE   LIMIT   USED   FREE  REQ    FAIL  SLEEP
mbuf_cluster:  2048,  10284,  1152,  56,   4237,  0,    0

netstat -mb | grep "mbuf clusters in use"
512/696/1208/10284 mbuf clusters in use (current/cache/total/max)

One can see that current + cache == total == USED + FREE, but the current/cache values reported by netstat are very different from the USED/FREE values reported by vmstat, so they must have different meanings. The question is: what is the exact meaning of the USED/FREE and current/cache values? Is there any relationship between them?

--
Best regards
Hooman Fazaeli
tcp window scaling (rfc1323) problem
Hi,

We connect to the Internet through a TCP proxy running on FreeBSD 8.3-RELEASE. Everything works, except that Instagram clients frequently fail to get/refresh some images and feeds. I have checked everything that could be the cause of the problem and found that setting net.inet.tcp.rfc1323 to zero improves the situation. Googling a bit, I found reports of a window scaling implementation bug in older FreeBSD releases (e.g., https://lists.freebsd.org/pipermail/freebsd-hackers/2007-January/019070.html). My question is: in which version of FreeBSD is the window scaling bug known to be fixed? Is there any known problem related to window scaling in newer (8+) FreeBSD versions? Thanks in advance.

--
Best regards
Hooman Fazaeli
Re: Locking Memory Question
On 7/30/2015 5:22 AM, Laurie Jennings via freebsd-net wrote:

On Wed, 7/29/15, John-Mark Gurney <j...@funkthat.com> wrote:

Laurie Jennings via freebsd-net wrote this message on Wed, Jul 29, 2015 at 15:26 -0700:

I have a problem and I can't quite figure out where to look. This is what I'm doing: I have an IOCTL to read a block of data, but the data is too large to return via ioctl. So to get the data, I allocate a block in a kernel module:

foo = malloc(1024000, M_DEVBUF, M_WAITOK);

I pass up a pointer and in user space map it using /dev/kmem.

An easier solution would be for your ioctl to pass in a userland pointer and then use copyout(9) to push the data to userland... This means the userland process doesn't have to have /dev/kmem access... Is there a reason you need to use kmem? The only reason you list above is that it's too large via ioctl, but a copyout is fine, and would handle all page faults for you.

I'm using kmem because the only options I could think of were to 1) use shared memory, 2) use kmem, or 3) use a huge ioctl structure. I'm not clear how I'd do that: the data being passed up from the kernel is of variable size. To use copyout I'd have to pass a pointer to a static buffer, right? Is there a way to malloc user space memory from within an ioctl call? Or would I just have to pass down a pointer to a huge buffer large enough for the largest possible answer? thanks, Laurie

You can use two IOCTLs. Get the block size from the kernel module with the first ioctl, and malloc(3) a buffer in userland with that size. Then use a second ioctl to pass the address of the allocated buffer to the kernel module. The module may use copyout(9) to copy the in-kernel data to the user space buffer.
--
Best regards
Hooman Fazaeli
Re: IPSEC in GENERIC [was: Re: netmap in GENERIC, by default, on HEAD]
On 11/6/2014 1:30 PM, Olivier Cochard-Labbé wrote:

How does one correctly bench IPsec performance? For benching forwarding performance I generate minimum-size packets (2000 flows: 100 different source IPs * 20 different destination IPs), e.g. with this netmap pkt-gen invocation:

pkt-gen -i ix0 -f tx -n 10 -l 60 -d 9.1.1.1:2000-9.1.1.100 -s 8.1.1.1:2000-8.1.1.20

This permits me to obtain the maximum PPS forwarded by the server. But for benching IPsec: is the PPS with minimum-size packets a useful value?

Maybe off-topic: how much PPS, and on which hardware?

--
Best regards.
Hooman Fazaeli
Re: transparent udp proxy
On 10/31/2014 8:11 PM, Adrian Chadd wrote:

Hi, if it's missing in 10 or later then please file a bug and I'll see what it'll take to add another socket option to return the original destination address+port. Thanks, -adrian

Here it is: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194758

On 31 October 2014 08:00, Hooman Fazaeli <hoomanfaza...@gmail.com> wrote:

On 10/31/2014 5:30 PM, Mark Felder wrote:

I'm not sure if this is what you're looking for, but perhaps the solution is in net/samplicator? From the project's website: "This simple program listens for UDP datagrams on a network port, and sends copies of these datagrams on to a set of destinations. Optionally, it can perform sampling, i.e. rather than forwarding every packet, forward only 1 in N. Another option is that it can spoof the IP source address, so that the copies appear to come from the original source, rather than the relay. Currently only supports IPv4."

Thanks. I do not think it provides what I am looking for. I am not looking for an application performing a specific task, but a mechanism to get the __original__ destination address and port of packets forwarded to a local UDP proxy by ipfw fwd rules. As far as I have figured out, the original destination address may be obtained with IP_RECVDSTADDR on 9.0+ (but not on 8.x and older versions), but there seems to be no mechanism to get the _original_ destination _port_. (Apart from this missing mechanism, my proxy is functional and performs what it is intended to do.)

--
Best regards.
Hooman Fazaeli
Re: transparent udp proxy
On 10/31/2014 8:30 PM, Ian Smith wrote:

On Fri, 31 Oct 2014 18:30:00 +0330, Hooman Fazaeli wrote:

I am not looking for an application performing a specific task, but a mechanism to get the __original__ destination address and port of packets forwarded to a local UDP proxy by ipfw fwd rules. As far as I have figured out, the original destination address may be obtained with IP_RECVDSTADDR on 9.0+ (but not on 8.x and older versions), but there seems to be no mechanism to get the _original_ destination _port_.

: ipfw add 10 fwd localhost,7000 udp from any to any recv em1

Given these are local packets, note that ipfw(8) /fwd states: "The fwd action does not change the contents of the packet at all. In particular, the destination address remains unmodified, so packets forwarded to another system will usually be rejected by that system unless there is a matching rule on that system to capture them. For packets forwarded locally, the local address of the socket will be set to the original destination address of the packet. This makes the netstat(1) entry look rather weird but is intended for use with transparent proxy servers."

For FreeBSD versions before 9.0, that description is only correct for TCP packets; for 9.0+, it is true for both UDP and TCP. Old kernels (before 9.0) change the destination of UDP packets forwarded to a local address to the forwarded-to address and port (those specified in the fwd rule).

Has the destination port in the received packet been changed to 7000? If not, you're all set. If so, where else could the dst port be stored? cheers, Ian

There is no way to get the destination port; that is the problem. recvmsg(2) only returns the source address+port and the destination IP address (on 9.0+).

--
Best regards.
Hooman Fazaeli
Re: transparent udp proxy
On 10/31/2014 8:11 PM, Adrian Chadd wrote:

Hi, if it's missing in 10 or later then please file a bug and I'll see what it'll take to add another socket option to return the original destination address+port. Thanks, -adrian

Thanks. I will check ASAP.

--
Best regards.
Hooman Fazaeli
transparent udp proxy
Hi,

In my setup, I use a fwd rule to forward all UDP traffic to my local proxy:

ipfw add 10 fwd localhost,7000 udp from any to any recv em1

The proxy needs to know the original destination address of the forwarded datagrams, but there seems to be no way to obtain that address. Using recvmsg with IP_RECVDSTADDR does not help, because it returns the next-hop address instead of the original destination. This is because udp_input() overwrites the packet's destination with the next-hop address before calling ip_savecontrol().

It seems easy to change udp_input() to pass the original destination address to ip_savecontrol(). Another solution would be to implement an IP_RECVDSTSOCKADDR option, which records the original destination address:port as a 'struct sockaddr_in[6]' in the packet's control data. Comments/suggestions are welcome.

--
Best regards.
Hooman Fazaeli
Re: transparent udp proxy
On 10/31/2014 2:18 PM, Andrey V. Elsukov wrote:

On 31.10.2014 12:50, Hooman Fazaeli wrote:

Hi, in my setup, I use a fwd rule to forward all UDP traffic to my local proxy: ipfw add 10 fwd localhost,7000 udp from any to any recv em1. The proxy needs to know the original destination address of the forwarded datagrams, but there seems to be no way to obtain that address. Using recvmsg with IP_RECVDSTADDR does not help, because it returns the next-hop address instead of the original destination. This is because udp_input() overwrites the packet's destination with the next-hop address before calling ip_savecontrol().

Hi, udp_input() doesn't overwrite the destination address. Probably you have NAT that does this.

There is no NAT involved. I checked this in the 8.4 source: http://fxr.watson.org/fxr/source/netinet/udp_usrreq.c?v=FREEBSD8#L461

--
Best regards.
Hooman Fazaeli
Re: transparent udp proxy
On 10/31/2014 3:38 PM, Andrey V. Elsukov wrote:

On 31.10.2014 15:04, Hooman Fazaeli wrote:

There is no NAT involved. I checked this in the 8.4 source: http://fxr.watson.org/fxr/source/netinet/udp_usrreq.c?v=FREEBSD8#L461

The more recent FreeBSD versions don't overwrite the destination address: https://svnweb.freebsd.org/base?view=revision&revision=225044

Yes, it seems so. But the problem of obtaining the original destination port remains.

--
Best regards.
Hooman Fazaeli
Re: transparent udp proxy
On 10/31/2014 5:30 PM, Mark Felder wrote:

I'm not sure if this is what you're looking for, but perhaps the solution is in net/samplicator? From the project's website: "This simple program listens for UDP datagrams on a network port, and sends copies of these datagrams on to a set of destinations. Optionally, it can perform sampling, i.e. rather than forwarding every packet, forward only 1 in N. Another option is that it can spoof the IP source address, so that the copies appear to come from the original source, rather than the relay. Currently only supports IPv4."

Thanks. I do not think it provides what I am looking for. I am not looking for an application performing a specific task, but a mechanism to get the __original__ destination address and port of packets forwarded to a local UDP proxy by ipfw fwd rules. As far as I have figured out, the original destination address may be obtained with IP_RECVDSTADDR on 9.0+ (but not on 8.x and older versions), but there seems to be no mechanism to get the _original_ destination _port_. (Apart from this missing mechanism, my proxy is functional and performs what it is intended to do.)

--
Best regards.
Hooman Fazaeli
Re: pf stuck
On 9/30/2014 12:12 AM, Andrea Venturoli wrote: On 09/29/14 20:21, Ermal Luçi wrote: Probably it is better if you ask this on freebsd-pf@. Thanks, I see you have already cc:ed it. Though this sounds like the state limit being reached. Can this happen even if all my pf rules have no state? No. Anyway, you can check state statistics with: pfctl -s info ; pfctl -s memory -- Best regards. Hooman Fazaeli
Re: UDP/TCP versus IP frames - subtle out of order packets with hardware hashing
On 7/15/2014 5:14 AM, Adrian Chadd wrote: Hi, Whilst digging into UDP receive side scaling on the intel ixgbe(4) NIC, I stumbled across how it hashes traffic between IP fragmented traffic and non IP-fragmented traffic. Here's how it surfaced: * the ixgbe(4) NIC is configured to hash on both IP (2-tuple) and TCP/UDP (4-tuple); * when a non-fragmented UDP frame comes in, it's hashed on the 4-tuple and comes into queue A; * when a fragmented UDP frame comes in, it's hashed on the IP 2-tuple and comes into queue B. So if there's a mix of small and large datagrams, we'll end up with some packets coming in via queue A and some by queue B. In normal operation that'll result in out of order packets. For the RSS stuff I'm working on it means that some packets will match the PCBGROUP setup and some won't. By default UDP configures a 2-tuple hash so it expects packets to come in hashed appropriately. But that only matches for large frames. For small frames it'll be hashed via the 4-tuple and it won't match. The ip reassembly code doesn't recalculate the flowid/flowtype once it's finished. It'd be nice to do that before further processing so it can be placed in the right netisr. So there's a couple of semi-overlapping issues: * Right now we could get TCP and UDP frames out of order. I'd like to at least have ixgbe(4) hash on the 2-tuple for UDP rather than the 4-tuple. That fixes that silly corner case. It's not likely going to show up except for things like forwarding workloads. Maybe people doing memcached work, I'm not sure. * Whether or not to calculate the flowid/flowtype in ip_reass() (or maybe in the netisr input path, in case there's no flowid assigned) so work is better distributed; * .. then if we do that, we could do 4-tuple UDP hashing again and we'd just recalculate for any large frames. Here's what happened with Linux and ixgbe in 2010 on this topic: http://comments.gmane.org/gmane.linux.network/166687 What do people think? 
-a Doesn't the problem apply to TCP too? TCP may be fragmented as well, though it is less likely because of the MSS. -- Best regards. Hooman Fazaeli
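To make the queue-splitting concrete, here is a sketch of the Toeplitz hash that RSS-capable NICs such as the 82599 compute, shown with the widely published default RSS key. The addresses, ports, and queue count below are made up for illustration. A fragment is hashed over the 2-tuple only, while a complete datagram is hashed over the 4-tuple, so the two generally land on different queues — which is exactly the reordering hazard described above:

```python
import socket

# The well-known default RSS key (40 bytes) used in RSS hash examples.
RSS_KEY = bytes([
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa])

def toeplitz(data: bytes, key: bytes = RSS_KEY) -> int:
    """Toeplitz hash: for every set input bit, XOR in the 32-bit key
    window that starts at that bit position."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    h = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                h ^= (key_int >> (key_bits - 32 - (i * 8 + b))) & 0xFFFFFFFF
    return h

src = socket.inet_aton("10.0.0.1")          # made-up flow
dst = socket.inet_aton("10.0.0.2")
ports = (12345).to_bytes(2, "big") + (80).to_bytes(2, "big")

h2 = toeplitz(src + dst)            # 2-tuple: what an IP fragment gets
h4 = toeplitz(src + dst + ports)    # 4-tuple: what a whole UDP datagram gets

queues = 8
print("fragment queue:", h2 % queues, "datagram queue:", h4 % queues)
```

Since the low bits of the hash select the queue, the same logical flow is steered to two different queues depending on whether the NIC saw a fragment or a full datagram — the motivation for hashing UDP on the 2-tuple, or recomputing the flowid after ip_reass().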
Re: FreeBSD 9 w/ MPD5 crashes as LNS with 300+ tunnels. Netgraph issue?
and core dump somewhere for download so we can have a closer look at panic trace. -- Best regards. Hooman Fazaeli
Re: TSO and FreeBSD vs Linux
On 9/4/2013 9:23 AM, Julian Elischer wrote: On 9/4/13 6:49 AM, David Wolfskill wrote: On Tue, Sep 03, 2013 at 12:27:34PM -0700, David Wolfskill wrote: ... As soon as I issued sudo net.inet.tcp.tso=0 ... the copy worked without a hitch or a whine. And I was able to copy all 117709618 bytes, not just 2097152 (2^21). The above command should (of course) have read sudo sysctl net.inet.tcp.tso=0 Also: I normally had the em0 NIC on the machine in question connected to a Netgear GS105 (5-port Gigabit switch). In the process of trouble-shooting the problem with NFS writes, I bypassed that switch and connected the em0 NIC directly to the jack in my cube. In that configuration, the em0 NIC showed media: Ethernet 1000baseT (autoselect), while connected to the GS105, it showed media: Ethernet 100baseTX (autoselect). While the NFS write worked whether or not I had the GS105 in the path, it seemed ... suboptimal ... to have a NIC capable of 1000baseT connected to a Gigabit switch, but negotiating at 100baseTX. So I tried setting the media via ifconfig em0 media 1000baseT; after a few seconds, it finally woke back up, and now reports media: Ethernet 1000baseT (1000baseT full-duplex). So it appears that the em(4) driver and Intel 82578DM NIC fail to negotiate 1000baseT with the Netgear GS105. yeah auto-negotiation seems a bit fragile.. not just for us either.. I often end up hardwiring it in rc.conf. I had also experienced similar problems (one case was an 82574 with a Cisco 3550). I also remember cases where auto-select worked but fixed media did not (the link settled at 100 Mb/s or half-duplex). I am curious: what are the exact technical reasons for such media problems? Are they more hardware- or driver-related? -- Best regards. Hooman Fazaeli
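Julian's "hardwiring it in rc.conf" amounts to something like the following. The interface name, address, and media values are illustrative assumptions; check `ifconfig -m em0` for the media types your NIC actually advertises before pinning anything.

```shell
# /etc/rc.conf
# Pin em0 at gigabit full-duplex instead of trusting autonegotiation.
# (Illustrative values only.)
ifconfig_em0="inet 192.168.1.10/24 media 1000baseT mediaopt full-duplex"
```

Forcing media sidesteps a flaky negotiation with a particular link partner, at the cost of having to remember the setting if the cable is ever moved to a different switch.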
Re: Intel 4-port ethernet adaptor link aggregation issue
On 8/2/2013 2:44 AM, Joe Moog wrote: On Aug 1, 2013, at 4:27 PM, Joe Moog joem...@ebureau.com wrote: On Aug 1, 2013, at 3:55 PM, Ryan Stone ryst...@gmail.com wrote: Have you tried using only two ports, but both from the NIC? My suspicion would be that the problem is in the lagg's handling of more than 2 ports rather than the driver, especially given that it is the igb driver in all cases. Ryan: We have done this successfully with two ports on the NIC, on another hardware-identical host. That said, it is entirely possible that this is a shortcoming of lagg. Can you think of any sort of workaround? Our desired implementation really requires the inclusion of all 4 ports in the lagg. Failing this we're looking at the likelihood of 10G ethernet, but with that comes significant overhead, both cost and administration (before anybody tries to force the cost debate, remember that there are 10G router modules and 10G-capable distribution switches involved, never mind the cabling and SFPs -- it's not just a $600 10G card for the host). I'd like to defer that requirement as long as possible. 4 aggregated gig ports would serve us perfectly well for the near-term. Thanks Joe UPDATE: After additional testing, I'm beginning to suspect the igb driver. With our setup, ifconfig identifies all the ethernet ports as igb(0-5). I configured igb0 with a single static IP address (say, 192.168.1.10), and was able to connect to the host administratively. While connected, I enabled another port as a second standalone port, again with a unique address (say, 192.168.1.20), and was able to access the host via that interface as well. The problem arises when we attempt to similarly add a third interface to the mix -- and it doesn't seem to matter what interface(s) we use, or in what order we activate them. Always on the third interface, that third interface fails to respond despite showing active both in ifconfig and on the switch. 
If there is anything else I could try that would be useful to help identify where the issue may reside, please let me know. Thanks Joe Assign IP addresses from __different__ subnets to the four NIC ports and re-test (e.g., 192.168.0.10/24, 192.168.1.10/24, 192.168.2.10/24, 192.168.3.10/24). -- Best regards. Hooman Fazaeli
Re: netmap bridge can transmit big packets at line rate?
On 5/21/2013 5:10 PM, Barney Cordoba wrote: --- On Tue, 5/21/13, liujie liu...@263.net wrote: From: liujie liu...@263.net Subject: Re: netmap bridge can transmit big packets at line rate? To: freebsd-net@freebsd.org Date: Tuesday, May 21, 2013, 5:25 AM Hi, Prof. Luigi Rizzo. Firstly I should thank you for netmap. I tried to send an e-mail to you yesterday, but it was rejected. I used two machines to test the netmap bridge, both with i7-2600 CPUs and Intel 82599 dual-interface cards. One worked as sender and receiver with pkt-gen; the other worked as a bridge with bridge.c. As you said, I felt confused too when I saw the big-packet performance drop. I tried to change the memory parameters of netmap (netmap_mem1.c, netmap_mem2.c), but that did not resolve the problem. 60-byte packets: send 14882289 pps, recv 13994753 pps; 124-byte: send 8445770 pps, recv 7628942 pps; 252-byte: send 4529819 pps, recv 3757843 pps; 508-byte: send 2350815 pps, recv 1645647 pps; 1514-byte: send 814288 pps, recv 489133 pps. These numbers indicate you're tx'ing 7.2Gb/s with 60 byte packets and 9.8Gb/s with 1514, so maybe you just need a new calculator? BC As Barney pointed out already, your numbers are reasonable. You have almost saturated the link with 1514-byte packets. In the case of 64-byte packets, you do not achieve line rate, probably because of congestion on the bus. Can you show us top -SI output on the sender machine? -- Best regards. Hooman Fazaeli
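Barney's back-of-the-envelope check can be reproduced. Counting the per-frame overhead of preamble, SFD, inter-frame gap, and FCS, 10GbE line rate for minimum-size frames is about 14.88 Mpps, and the reported 1514-byte send rate works out to roughly 9.86 Gb/s of payload — essentially a saturated link:

```python
OVERHEAD = 24  # 7B preamble + 1B SFD + 12B inter-frame gap + 4B FCS per frame
LINE_RATE = 10_000_000_000  # 10GbE, bits/s

def max_pps(pkt_bytes):
    """Theoretical packets/s at line rate for a given packet size (excl. FCS)."""
    return LINE_RATE // ((pkt_bytes + OVERHEAD) * 8)

def payload_gbps(pps, pkt_bytes):
    """Payload throughput implied by a measured packet rate."""
    return pps * pkt_bytes * 8 / 1e9

print(max_pps(60))                               # theoretical max for 60-byte packets
print(round(payload_gbps(14_882_289, 60), 2))    # payload Gb/s at the reported 60-byte rate
print(round(payload_gbps(814_288, 1514), 2))     # payload Gb/s at the reported 1514-byte rate
```

The reported 14.88 Mpps at 60 bytes and 814 kpps at 1514 bytes are both at (or within clock tolerance of) line rate on the transmit side; the gap is on the receive/bridge side.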
Re: High CPU interrupt load on intel I350T4 with igb on 8.3
On 5/11/2013 8:26 PM, Barney Cordoba wrote: Clearly you don't understand the problem. Your logic is that because other drivers are defective also, it's not a driver problem? The problem is caused by a multi-threaded driver that haphazardly launches tasks and that doesn't manage the case where the rest of the system can't handle the load. It's no different than a driver that barfs when mbuf clusters are exhausted. The answer isn't to increase memory or mbufs, even though that may alleviate the problem. The answer is to fix the driver, so that it doesn't crash the system for an event that is wholly predictable. igb 1) has too many locks and 2) exacerbates the problem by binding to cpus, which causes it to not only have to wait for the lock to free, but also for a specific cpu to become free. So it chugs along happily until it encounters a bottleneck, at which point it quickly blows up the entire system in a domino effect. It needs to manage locks more efficiently, and also to detect when the backup is unmanageable. Ever since FreeBSD 5 the answer has been it's fixed in 7, or it's fixed in 9, or it's fixed in 10. There will always be bottlenecks, and no driver should blow up the system no matter what intermediate code may present a problem. It's the driver's responsibility to behave and to drop packets if necessary. BC And how should the driver behave? You suggest dropping packets. Even if we accept that dropping packets is a good strategy in all configurations (which I doubt), the driver is definitely not the best place to implement it, since that involves duplicating similar code across drivers. Somewhere like the Ethernet layer is a much better place to watch the packet load and drop packets to prevent them from eating all the cores. Furthermore, ignoring the fact that pf is not optimized for multi-processor systems, and blaming drivers for not adjusting themselves to this shortcoming of pf, is a bit unfair, I believe. -- Best regards. 
Hooman Fazaeli
Re: 'no buffer space available' after switch goes down on freeBSD 7.3
On 12/25/2012 4:31 AM, Ryan Stone wrote: I don't believe that this is fixed in later versions of the driver. The problem is that when the interface loses link the transmit queue can fill up. Once that happens the driver never gets any more calls from the network stack to make it send packets. Pinging the interface fixes it because the driver processes rx and tx from the same context, so when it receives a packet it starts transmitting again. The patch that I sent fixes the problem by forcing the driver to process the tx queue whenever link goes from down to up. I have not tested it, but it is apparently fixed: http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/e1000/if_em.c#rev1.21.2.23 -- Best regards. Hooman Fazaeli
Re: ping: sendto: No buffer space available
On 9/27/2012 9:38 PM, Rudy wrote: On 09/27/2012 11:00 AM, Rudy wrote: Rebooting and/or the settings change seems to have stopped the errors. Here is a pretty little graph showing error rate on em1 for the past 3 days. http://www.monkeybrains.net/images/ErrorRate-em1.png Interesting... if I zoom in on the graph, I see the errors were 'every other sample period' until I rebooted the box. http://www.monkeybrains.net/images/ErrorRate-em1-zoom.png How much traffic (bytes/s and packets/s) and of what type is passing through this box?
Re: ping: sendto: No buffer space available
On 9/24/2012 7:50 PM, Rudy (bulk) wrote: Sometimes when I try to ping a neighbor machine (plugged directly in with no switch involved), I get: ping: sendto: No buffer space available ping: sendto: No buffer space available If I reset the interface 'ifconfig em1 down; ifconfig em1 up' the problem goes away. The pings are: FreeBSD 8.3 em1 -- FreeBSD 9.0 em2 and I am seeing the issue on the FreeBSD 8.3 machine. The box has 6GB of free ram and is a quagga router. What do I need to tune? Thanks! Rudy # netstat -m 10236/8454/18690 mbufs in use (current/cache/total) 10234/5388/15622/262144 mbuf clusters in use (current/cache/total/max) 10234/5382 mbuf+clusters out of packet secondary zone in use (current/cache) 0/327/327/12800 4k (page size) jumbo clusters in use (current/cache/total/max) 0/3070/3070/6400 9k jumbo clusters in use (current/cache/total/max) 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max) 23027K/41827K/64854K bytes allocated to network (current/cache/total) 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters) 0/0/0 requests for jumbo clusters denied (4k/9k/16k) 0/0/0 sfbufs in use (current/peak/max) 0 requests for sfbufs denied 0 requests for sfbufs delayed 0 requests for I/O initiated by sendfile 0 calls to protocol drain routines # ifconfig em1 em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO> ether 00:25:90:56:60:7f inet 10.1.1.1 netmask 0xfffc broadcast 10.1.1.3 media: Ethernet autoselect (1000baseT full-duplex) status: active FreeBSD 8.3 ### loader.conf: net.link.ifqmaxlen=1024 hw.em.rxd=1024 hw.em.txd=1024 ### sysctl.conf: kern.timecounter.hardware=HPET net.route.netisr_maxqlen=2048 net.inet.ip.intr_queue_maxlen=1024 kern.ipc.somaxconn=256 kern.random.sys.harvest.interrupt=0 kern.random.sys.harvest.ethernet=0 net.inet.raw.maxdgram=16384 net.inet.raw.recvspace=16384 net.inet.icmp.icmplim=1000 
net.inet.ip.fastforwarding=1 kern.ipc.nmbclusters=262144 net.inet.icmp.drop_redirect=1 dev.em.0.rx_processing_limit=200 dev.em.1.rx_processing_limit=200 dev.em.2.rx_processing_limit=200 dev.em.3.rx_processing_limit=200 net.link.ether.inet.max_age=300 hw.intr_storm_threshold=9000 # Security net.inet.ip.redirect=0 net.inet.ip.sourceroute=0 net.inet.ip.accept_sourceroute=0 net.inet.icmp.maskrepl=0 Not sure if it matters, but here are the tunings on the other box: FreeBSD 9.0 ### loader.conf: net.link.ifqmaxlen=512 ### sysctl.conf: net.inet.ip.fastforwarding=1 kern.ipc.nmbclusters=262144 kern.timecounter.hardware=HPET net.inet.ip.rtminexpire=2 net.inet.ip.rtmaxcache=1024 dev.igb.0.rx_processing_limit=480 dev.igb.1.rx_processing_limit=480 net.inet.icmp.icmplim=1000 kern.random.sys.harvest.interrupt=0 kern.random.sys.harvest.ethernet=0 net.link.ether.inet.max_age=300 ##Sat Apr 21 00:06:48 PDT 2012 net.inet.ip.redirect=0 net.route.netisr_maxqlen=2048 The most likely cause is that the interface send queue has become full and stayed in that condition. What type of NIC is at the other end of the link? Can you post the output of: # sysctl dev.em.1
Re: ping: sendto: No buffer space available
On 9/25/2012 11:08 AM, Rudy (bulk) wrote: On 9/24/12 11:52 PM, Hooman Fazaeli wrote: sysctl dev.em.1 From the side having the 'No buffer space available' (FreeBSD 8.3 Sep 13 2012) # sysctl dev.em.1 dev.em.1.%desc: Intel(R) PRO/1000 Network Connection 7.3.2 dev.em.1.%driver: em dev.em.1.%location: slot=0 function=0 dev.em.1.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x15d9 subdevice=0x class=0x02 dev.em.1.%parent: pci5 dev.em.1.nvm: -1 dev.em.1.debug: -1 dev.em.1.fc: 3 dev.em.1.rx_int_delay: 0 dev.em.1.tx_int_delay: 66 dev.em.1.rx_abs_int_delay: 66 dev.em.1.tx_abs_int_delay: 66 dev.em.1.rx_processing_limit: 200 dev.em.1.eee_control: 0 dev.em.1.link_irq: 6379725883 dev.em.1.mbuf_alloc_fail: 0 dev.em.1.cluster_alloc_fail: 0 dev.em.1.dropped: 0 dev.em.1.tx_dma_fail: 0 dev.em.1.rx_overruns: 0 dev.em.1.watchdog_timeouts: 0 dev.em.1.device_control: 1477444168 dev.em.1.rx_control: 67141634 dev.em.1.fc_high_water: 18432 dev.em.1.fc_low_water: 16932 dev.em.1.queue0.txd_head: 188 dev.em.1.queue0.txd_tail: 188 dev.em.1.queue0.tx_irq: 760427663 dev.em.1.queue0.no_desc_avail: 0 dev.em.1.queue0.rxd_head: 300 dev.em.1.queue0.rxd_tail: 297 dev.em.1.queue0.rx_irq: 838300057 dev.em.1.mac_stats.excess_coll: 0 dev.em.1.mac_stats.single_coll: 0 dev.em.1.mac_stats.multiple_coll: 0 dev.em.1.mac_stats.late_coll: 0 dev.em.1.mac_stats.collision_count: 0 dev.em.1.mac_stats.symbol_errors: 0 dev.em.1.mac_stats.sequence_errors: 0 dev.em.1.mac_stats.defer_count: 0 dev.em.1.mac_stats.missed_packets: 580251107926 dev.em.1.mac_stats.recv_no_buff: 895 dev.em.1.mac_stats.recv_undersize: 0 dev.em.1.mac_stats.recv_fragmented: 0 dev.em.1.mac_stats.recv_oversize: 0 dev.em.1.mac_stats.recv_jabber: 0 dev.em.1.mac_stats.recv_errs: 0 dev.em.1.mac_stats.crc_errs: 0 dev.em.1.mac_stats.alignment_errs: 0 dev.em.1.mac_stats.coll_ext_errs: 0 dev.em.1.mac_stats.xon_recvd: 809 dev.em.1.mac_stats.xon_txd: 684 dev.em.1.mac_stats.xoff_recvd: 580251112172 dev.em.1.mac_stats.xoff_txd: 580251108668 
dev.em.1.mac_stats.total_pkts_recvd: 582154845658 dev.em.1.mac_stats.good_pkts_recvd: 1903732156 dev.em.1.mac_stats.bcast_pkts_recvd: 923 dev.em.1.mac_stats.mcast_pkts_recvd: 0 dev.em.1.mac_stats.rx_frames_64: 257128416 dev.em.1.mac_stats.rx_frames_65_127: 702676478 dev.em.1.mac_stats.rx_frames_128_255: 225331435 dev.em.1.mac_stats.rx_frames_256_511: 59888288 dev.em.1.mac_stats.rx_frames_512_1023: 4176 dev.em.1.mac_stats.rx_frames_1024_1522: 610930363 dev.em.1.mac_stats.good_octets_recvd: 1057190106675 dev.em.1.mac_stats.good_octets_txd: 1502996801989 dev.em.1.mac_stats.total_pkts_txd: 582709483882 dev.em.1.mac_stats.good_pkts_txd: 2458374408 dev.em.1.mac_stats.bcast_pkts_txd: 73 dev.em.1.mac_stats.mcast_pkts_txd: 0 dev.em.1.mac_stats.tx_frames_64: 314613253 dev.em.1.mac_stats.tx_frames_65_127: 841961719 dev.em.1.mac_stats.tx_frames_128_255: 268669868 dev.em.1.mac_stats.tx_frames_256_511: 73341358 dev.em.1.mac_stats.tx_frames_512_1023: 62765737 dev.em.1.mac_stats.tx_frames_1024_1522: 897022473 dev.em.1.mac_stats.tso_txd: 1880 dev.em.1.mac_stats.tso_ctx_fail: 0 dev.em.1.interrupts.asserts: 6331439142 dev.em.1.interrupts.rx_pkt_timer: 0 dev.em.1.interrupts.rx_abs_timer: 0 dev.em.1.interrupts.tx_pkt_timer: 0 dev.em.1.interrupts.tx_abs_timer: 0 dev.em.1.interrupts.tx_queue_empty: 0 dev.em.1.interrupts.tx_queue_min_thresh: 0 dev.em.1.interrupts.rx_desc_min_thresh: 0 dev.em.1.interrupts.rx_overrun: 74346455 And the the other end of the link (FreeBSD 9.0-STABLE Feb 1 2012) # sysctl dev.em.2 dev.em.2.%desc: Intel(R) PRO/1000 Network Connection 7.2.3 dev.em.2.%driver: em dev.em.2.%location: slot=0 function=0 dev.em.2.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x15d9 subdevice=0x10d3 class=0x02 dev.em.2.%parent: pci7 dev.em.2.nvm: -1 dev.em.2.debug: -1 dev.em.2.rx_int_delay: 0 dev.em.2.tx_int_delay: 66 dev.em.2.rx_abs_int_delay: 66 dev.em.2.tx_abs_int_delay: 66 dev.em.2.rx_processing_limit: 100 dev.em.2.flow_control: 3 dev.em.2.eee_control: 0 dev.em.2.link_irq: 
6379294926 dev.em.2.mbuf_alloc_fail: 0 dev.em.2.cluster_alloc_fail: 0 dev.em.2.dropped: 0 dev.em.2.tx_dma_fail: 0 dev.em.2.rx_overruns: 0 dev.em.2.watchdog_timeouts: 0 dev.em.2.device_control: 1477444168 dev.em.2.rx_control: 67141634 dev.em.2.fc_high_water: 18432 dev.em.2.fc_low_water: 16932 dev.em.2.queue0.txd_head: 735 dev.em.2.queue0.txd_tail: 735 dev.em.2.queue0.tx_irq: 839960061 dev.em.2.queue0.no_desc_avail: 0 dev.em.2.queue0.rxd_head: 237 dev.em.2.queue0.rxd_tail: 236 dev.em.2.queue0.rx_irq: 762108556 dev.em.2.mac_stats.excess_coll: 0 dev.em.2.mac_stats.single_coll: 0 dev.em.2.mac_stats.multiple_coll: 0 dev.em.2.mac_stats.late_coll: 0 dev.em.2.mac_stats.collision_count: 0 dev.em.2.mac_stats.symbol_errors: 0 dev.em.2.mac_stats.sequence_errors: 0 dev.em.2.mac_stats.defer_count: 0 dev.em.2.mac_stats.missed_packets: 580252415422 dev.em.2.mac_stats.recv_no_buff: 3211 dev.em.2.mac_stats.recv_undersize: 0 dev.em.2.mac_stats.recv_fragmented: 0 dev.em.2.mac_stats.recv_oversize: 0
Re: FreeBSD 9.0-R em0 issues?
On 8/11/2012 2:17 PM, Karl Pielorz wrote: --On 11 August 2012 12:36 +0430 Hooman Fazaeli hoomanfaza...@gmail.com wrote: Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll em0 1500 Link#5 00:25:90:31:82:46 355482 10612864185945 0 291109 3032246910270 1516123455135 82574L with ASPM enabled is known to cause a problem like yours. (See: http://www.google.com/#hl=en&sclient=psy-ab&q=82574L+%2B+ASPM) However, some time ago, jack committed a fix which disabled ASPM to fix the problem. I recommend getting and compiling the latest e1000 source from CVS (which is version 7.3.2) and seeing what happens. Hi, In the midst of trying to get this onto the machine (without the NIC working - which was fun), during a reboot the NIC suddenly disappeared completely. Rebooting the machine again gives a 50/50 chance of the NIC probing when FreeBSD comes up - half the time I'm left with em1 only, and no em0. It looks like this has gone from a 'possible software' issue to a 'probable hardware' issue now? - I've moved the connection over to em1, I'll see how I get on with that. I have also seen this problem on different hardware, but I cannot recall whether I fixed it with a hardware replacement or a driver update. Anyway, it is worth giving the driver update a try.
Re: FreeBSD 9.0-R em0 issues?
On 8/10/2012 11:24 PM, Karl Pielorz wrote: Hi, Apologies for posting to -net as well - I originally posted this to -hackers, but was advised to re-post it here... A FreeBSD 9.0-R amd64 box - based on a SuperMicro X8DTL-IF Rev. 2.01 w/ Intel L5630 and 6GB of RAM - seems to have issues with its onboard NIC (em driver based - i.e. em0). The machine runs fine - but then suddenly loses all network connectivity. Nothing is logged on the console, or /var/log/messages. Doing an 'ifconfig em0 down' then up doesn't fix it. Rebooting the box does fix it for a while. Having dug around Google - I've now set hw.em.enable_msix=0 - the box ran the whole of the day with that set, before again having em0 wedge up. When it does this, 'netstat -n -i' returns silly figures - i.e. if I catch it even moments after it's done it - it'll claim to have suffered billions of input/output and collision errors (huge amounts more than the amount of traffic that machine would have handled) - e.g. Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll em0 1500 Link#5 00:25:90:31:82:46 355482 10612864185945 0 291109 3032246910270 1516123455135 82574L with ASPM enabled is known to cause a problem like yours. (See: http://www.google.com/#hl=en&sclient=psy-ab&q=82574L+%2B+ASPM) However, some time ago, jack committed a fix which disabled ASPM to fix the problem. I recommend getting and compiling the latest e1000 source from CVS (which is version 7.3.2) and seeing what happens.
Re: FreeBSD 10G forwarding performance @Intel
On 7/16/2012 10:13 PM, Alexander V. Chernikov wrote: Old kernel from previous letters, same setup: net.inet.ip.fw.enable=0: 2.3 MPPS; net.inet.ip.fw.enable=1 with net.inet.ip.fw.update_counters=0: 1.93 MPPS; with net.inet.ip.fw.update_counters=1: 1.74 MPPS. Kernel with ipfw pcpu counters: net.inet.ip.fw.enable=0: 2.3 MPPS; net.inet.ip.fw.enable=1 with net.inet.ip.fw.update_counters=0: 1.93 MPPS; with net.inet.ip.fw.update_counters=1: 1.93 MPPS. Counters seem to be working without any (significant) overhead. (Maybe I'm wrong somewhere?) Additionally, I've got (from my previous pcpu attempt) a small patch permitting ipfw to re-use the rule map allocation instead of reallocating on every rule. This saves a bit of system time: loading 20k rules with the ipfw binary gives us 5.1s system time before and 4.1s after. This may be slightly off-topic, but have you tested (or do you plan to test) with bidirectional traffic?
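The reason per-CPU counters show no measurable overhead in the numbers above is that each CPU increments a private, cacheline-local slot with no locking or atomics, and the per-slot values are only folded together when the counter is read (FreeBSD's counter(9) API works this way). A user-space sketch of the idea, with made-up thread and iteration counts:

```python
import threading

NCPU = 4
ITERS = 100_000

# One counter slot per "cpu". Each writer touches only its own slot,
# so the hot path needs no lock and no shared-cacheline atomics --
# mirroring the per-CPU counter approach used for the ipfw counters.
slots = [0] * NCPU

def worker(cpu):
    for _ in range(ITERS):
        slots[cpu] += 1  # in-kernel this would be a counter_u64_add()-style update

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NCPU)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(slots)  # fold the per-cpu values only when the counter is read
print(total)
```

The trade-off is that reads are slightly more expensive (a sum over all slots) and momentarily inconsistent, which is fine for statistics like per-rule packet/byte counts.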
Re: Intel 82574L interface wedging - em7.3.2/8.2-STABLE
Dear Jason, With a link_irq of 4, I still guess your problem is snd_buf filling up during a temporary link loss (see: http://lists.freebsd.org/pipermail/freebsd-net/2011-November/030424.html). I use a patched version of e1000 which addresses this issue and works well for me, but it is based on 7.2.3 and I have only tested it on 7.3-RELEASE. If interested, I can send you the sources for testing. You may also port my changes to 7.3.2 and roll your own version. On 3/8/2012 12:27 AM, Jason Wolfe wrote: I'm sure it's getting old with all of the recent work put into the e1000 driver, but this is still ongoing with MSI-X enabled. Most machines are running an 8.2-STABLE from early Feb, though it appears there have been no relevant changes in RELENG_8 since then. I've disabled all possible em options on the devices also to rule that out and am still seeing the issue. I guess reverting back to MSI-X disabled is the next step if nothing is spotted. This box had been doing between 1 and 1.5Gb/s steady for the 26 days before the network hang.
Re: Intel 82574L interface wedging - em7.3.2/8.2-STABLE
On 3/11/2012 5:31 AM, Adrian Chadd wrote: Are you able to post the patch here? Maybe Jack can look at what's going on and apply it to the latest Intel ethernet driver. Adrian

Below is the patch for if_em.c (7.2.3). It simply checks the driver's queue status when the link state changes (inactive -> active) and starts the transmit task if the queue(s) are not empty. It also contains stuff I have added to compile on 7, plus some code for test and diagnostics. Hope it helps.

--- if_em.c.orig	2011-10-27 14:47:20.0 +0330
+++ if_em.c	2011-11-19 16:11:54.0 +0330
@@ -85,6 +85,14 @@
 #include "e1000_82571.h"
 #include "if_em.h"
 
+#if !defined(DISABLE_FIXUPS) && __FreeBSD_version < 800000
+static __inline int
+pci_find_cap(device_t dev, int capability, int *capreg)
+{
+	return (PCI_FIND_EXTCAP(device_get_parent(dev), dev, capability, capreg));
+}
+#endif
+
 /*
  * Set this to one to display debug statistics
  */
@@ -93,7 +101,11 @@
 /*
  * Driver version:
  */
+#ifdef PKG_VERSION
+char em_driver_version[] = "version 7.2.3 (ifdrivers-" PKG_VERSION ")";
+#else
 char em_driver_version[] = "7.2.3";
+#endif
 
 /*
  * PCI Device ID Table
@@ -293,6 +305,11 @@
 static poll_handler_t em_poll;
 #endif /* POLLING */
 
+#ifndef DISABLE_FIXUPS
+static int em_sysctl_snd_ifq_len(SYSCTL_HANDLER_ARGS);
+static int em_sysctl_snd_ifq_drv_len(SYSCTL_HANDLER_ARGS);
+#endif
+
 /*
  * FreeBSD Device Interface Entry Points
  */
@@ -399,6 +416,23 @@
 /* Global used in WOL setup with multiport cards */
 static int global_quad_port_a = 0;
 
+#ifndef DISABLE_FIXUPS
+static int enable_hang_fixup = 1;
+TUNABLE_INT("hw.em.enable_hang_fixup", &enable_hang_fixup);
+SYSCTL_INT(_hw_em, OID_AUTO, enable_hang_fixup, CTLFLAG_RW, &enable_hang_fixup, 0,
+    "Enable rx/tx hang fixup");
+
+static int em_regard_tx_link_status = 1;
+TUNABLE_INT("hw.em.regard_tx_link_status", &em_regard_tx_link_status);
+SYSCTL_INT(_hw_em, OID_AUTO, regard_tx_link_status, CTLFLAG_RW, &em_regard_tx_link_status, 0,
+    "Regard tx link status");
+
+static int link_master_slave = e1000_ms_hw_default;
+TUNABLE_INT("hw.em.link_master_slave", &link_master_slave);
+SYSCTL_INT(_hw_em, OID_AUTO, link_master_slave, CTLFLAG_RW, &link_master_slave,
+    0, "Link negotiation master/slave type");
+#endif
+
 /*
  * Device identification routine
  *
@@ -411,7 +445,11 @@
 static int
 em_probe(device_t dev)
 {
+#ifdef PKG_VERSION
+	char	adapter_name[sizeof(em_driver_version) + 60];
+#else
 	char	adapter_name[60];
+#endif
 	u16	pci_vendor_id = 0;
 	u16	pci_device_id = 0;
 	u16	pci_subvendor_id = 0;
@@ -864,7 +902,11 @@
 	int err = 0, enq = 0;
 
 	if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) !=
+#ifndef DISABLE_FIXUPS
 	    IFF_DRV_RUNNING || adapter->link_active == 0) {
+#else
+	    IFF_DRV_RUNNING || (em_regard_tx_link_status && !adapter->link_active)) {
+#endif
 		if (m != NULL)
 			err = drbr_enqueue(ifp, txr->br, m);
 		return (err);
@@ -962,9 +1004,17 @@
 	if ((ifp->if_drv_flags & (IFF_DRV_RUNNING|IFF_DRV_OACTIVE)) !=
 	    IFF_DRV_RUNNING)
 		return;
+#ifdef _TEST
+	if (adapter->forced_link_status == 0)
+		return;
+#endif
+
+#ifdef DISABLE_FIXUPS
 	if (!adapter->link_active)
+#else
+	if (em_regard_tx_link_status && !adapter->link_active)
 		return;
+#endif
 
 	while (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) {
 		/* Call cleanup if number of TX descriptors low */
@@ -977,6 +1027,17 @@
 		IFQ_DRV_DEQUEUE(&ifp->if_snd, m_head);
 		if (m_head == NULL)
 			break;
+#ifdef _TEST
+		if (adapter->forced_xmit_error == ENOMEM) {
+			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
+			IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
+			break;
+		} else if (adapter->forced_xmit_error != 0) {
+			m_freem(m_head);
+			m_head = NULL;
+			break;
+		} else
+#endif
 		/*
 		 * Encapsulation can modify our pointer, and or make it
 		 * NULL on failure. In that event, we can't requeue.
@@ -1141,6 +1202,10 @@
 	adapter->hw.phy.reset_disable = FALSE;
 	/* Check SOL/IDER usage */
 	EM_CORE_LOCK(adapter);
+#ifndef DISABLE_FIXUPS
+	if (adapter->hw.phy.media_type == e1000_media_type_copper)
+		adapter->hw.phy.ms_type = link_master_slave;
+#endif
 	if (e1000_check_reset_block(&adapter->hw)) {
Re: em0 hangs on 8-STABLE again
Dear Jack,

Is the problem related to link loss fixed in this version? The problem was that if if_snd fills up during a link_active == 0 period, the stack never calls em_start again, because em does not kick off TX when the link becomes active again.

On 1/29/2012 9:51 PM, Jack Vogel wrote: No, I told Mike I'd get it into 8.x, have just been busy, but will try and get it pushed up in the queue. Jack

2012/1/29 Lev Serebryakov l...@freebsd.org: Hello, Mike. You wrote on 29 January 2012, 16:54:59: My home server lost connection on em0 this night again. It was a persistent problem some time ago, but with version 7.2.3 it is the first time, and with worse symptoms. 7.3.0 from HEAD is quite stable for me. Hopefully it will be MFC'd soon :) I'm afraid that MFC'd means to 9-STABLE now :( -- // Black Lion AKA Lev Serebryakov l...@freebsd.org
Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled
On 11/10/2011 3:39 AM, Adrian Chadd wrote: There's no locking around the OACTIVE flag set/clear, right? Is it possible that multiple TX threads are fiddling with OACTIVE and then it's not being properly cleared and TX kicked? Adrian

If we check for OACTIVE periodically (for instance, in local_timer), then after a transient resource shortage the driver will eventually end up with OACTIVE cleared. Under frequent resource shortages, the driver may spend more time OACTIVE than not, or the flag may toggle constantly, but there is not much the driver can do about that, and simple locking around the OACTIVE set/clear does not change the situation. The problem _is_ low resources, and the only fix is to increase them. The problems we should focus on here are two things:
1- The driver _must_ be able to recover from OACTIVE after transient resource shortages.
2- It is desirable to do this as fast as possible.
Doing recovery in local_timer accommodates the first need, but it is very far from the second. One possible solution for 2 would be to defer setting OACTIVE until N consecutive transmissions fail (e.g., N == 75% of (if_snd.ifq_maxlen - if_snd.ifq_len)). The overhead is a little wasted CPU time in longer OACTIVE states. We still need local_timer to recover from these states.
Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled
On 11/10/2011 3:39 AM, Adrian Chadd wrote: There's no locking around the OACTIVE flag set/clear, right? Is it possible that multiple TX threads are fiddling with OACTIVE and then it's not being properly cleared and TX kicked? Adrian

Sorry! I forgot to clean up the last message... here is the correct one:

If we check for OACTIVE periodically (for instance, in local_timer), then after a transient resource shortage the driver will eventually end up with OACTIVE cleared. Under frequent resource shortages, the driver may spend more time OACTIVE than not, or the flag may toggle constantly, but there is not much the driver can do about that, and simple locking around the OACTIVE set/clear does not change the situation. The problem _is_ low resources, and the only fix is to increase them. The problems we should focus on here are two things:
1- The driver _must_ be able to recover from OACTIVE after transient resource shortages.
2- It is desirable to do this as fast as possible.
Doing recovery in local_timer accommodates the first need, but it is very far from the second. One possible solution for 2 would be to defer setting OACTIVE until N consecutive transmissions fail (e.g., N == 75% of (if_snd.ifq_maxlen - if_snd.ifq_len)). The overhead is a little wasted CPU time consumed in longer OACTIVE states. We still need local_timer to recover from these states.
Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled
On 11/8/2011 11:00 PM, Adrian Chadd wrote: On 8 November 2011 09:21, Hooman Fazaeli hoomanfaza...@gmail.com wrote: With MSIX enabled, the link task (em_handle_link) does _not_ trigger _start when the link changes state from inactive to active (which it should). If if_snd quickly fills up during a temporary link loss, transmission is stopped forever and the driver never recovers from that state. The last patch should have reduced the frequency of the problem, but it assumes every IFQ_ENQUEUE is followed by an if_start, which is not a true assumption.

FWIW, I saw something very similar with the if_arge code port from Linux. If the TX queue filled up and wasn't serviced before it hit completely full, it was never drained. It may be worthwhile auditing some of the other NIC drivers to ensure this kind of situation isn't occurring. Especially if they came from Linux. :-) That's a great catch, I hope it finally fixes the if_em issues with MSIX. :-) Adrian

Just for the record, I should inform you that igb, ixgb and ixgbe have the same issue. I have not checked other drivers. And there is another subtle problem with all these drivers: if transmit (xxx_xmit) fails due to a temporary memory shortage (e.g., a DMA mapping failure with ENOMEM), the driver may enter the OACTIVE state and _never_ recover! The scenario is much as before:
- if_start is executed.
- xxx_xmit fails with ENOMEM.
- xxx_start_locked sets OACTIVE. Note that this is different from a low TX descriptor condition, which also sets OACTIVE.
- The stack enqueues packets in if_snd but does not call if_start since the driver is OACTIVE.
- The stack enqueues more packets until if_snd fills up and packets start to drop.
- Since there is nowhere in the driver's code to retry transmission when memory becomes available again (xxx_local_timer is a candidate), the driver remains OACTIVE forever, until it is re-initialized.
I am working on patches for em/igb/ixgb/ixgbe to fix these issues and would be happy to share them with anyone who is interested. Since these are really severe problems, I hope the gurus apply official fixes ASAP.
Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled
On 11/8/2011 10:23 PM, Jason Wolfe wrote: On Tue, Nov 8, 2011 at 10:21 AM, Hooman Fazaeli hoomanfaza...@gmail.com wrote:

I have allocated more time to the problem and guess I can explain what your problem is. With MSIX disabled, the driver uses the fast interrupt handler (em_irq_fast), which calls the rx/tx task and then checks for a link status change. This implies that the rx/tx task is executed with every link state change. This is not efficient, as it is a waste of time to start transmission when the link is down. However, it has the effect that after a temporary link loss (active-inactive-active), _start is executed and transmission continues normally. The value of link_toggles (3) clearly indicates that you had such a transition when the problem occurred. With MSIX enabled, the link task (em_handle_link) does _not_ trigger _start when the link changes state from inactive to active (which it should). If if_snd quickly fills up during a temporary link loss, transmission is stopped forever and the driver never recovers from that state. The last patch should have reduced the frequency of the problem, but it assumes every IFQ_ENQUEUE is followed by an if_start, which is not a true assumption. If you are willing to test, I can prepare another patch for you to fix the issue in a different and more reliable way.

Hooman, Thanks again for the assist, it sounds like this may also be why we see a bit higher latency with MSI-X disabled on this chipset. I'm happy to test any patches as I have a handful of boxes set aside to 'research' this issue. Hopefully the testing here helps along any patches to the tree for others' benefit also. Jason

Latency may or may not be related. I am doing more tests and will post my findings soon.
Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled
On 11/7/2011 9:24 PM, Jason Wolfe wrote: On Mon, Oct 31, 2011 at 1:13 AM, Hooman Fazaeli hoomanfaza...@gmail.com wrote:

Attached is a patch for if_em.c. It flushes the interface queue when it is full and the link is not active. Please note that when this happens, drops keep increasing on the interface, and this will trigger your scripts as before. You need to change the scripts a little, as follows:

check interface TX status
if (interface TX seems hung) {
    sleep 5
    check interface TX status
    if (interface TX seems hung) {
        reset the interface
    }
}

For MULTIQUEUE, it just disables the check for link status (which is not good), so please test in non-MULTIQUEUE mode. The patch also contains some minor fixups to compile on 7, plus a fix from r1.69 which addressed an RX hang problem (the fix was later removed in r1.70). I included it for Emil to give it a try. Please let me know if you have any problems with the patch.

Hooman, Unfortunately one of the servers just had a wedge event a couple of hours ago with this patch. To confirm your changes should cause a recovery within the time I'm allowing, here is the current format:

check interface TX status
if (interface TX seems hung) {
    sleep 3
    check packets out
    sleep 2
    check packets out
    if (packets not incrementing) {
        reset the interface
    }
}

I bounced em0 because dropped packets incremented 1749543 to 1749708 and the interface is not incrementing packets out.
4:10AM up 6 days, 15:23, 0 users, load averages: 0.02, 0.12, 0.14

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC>
	ether X
	inet6 X%em0 prefixlen 64 scopeid 0x1
	nd6 options=1<PERFORMNUD>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC>
	ether X
	inet6 X%em1 prefixlen 64 scopeid 0x2
	nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
ipfw0: flags=8801<UP,SIMPLEX,MULTICAST> metric 0 mtu 65536
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=3<RXCSUM,TXCSUM>
	inet 127.0.0.1 netmask 0xff00
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
	inet X.X.X.X netmask 0x
	inet X.X.X.X netmask 0x
	inet X.X.X.X netmask 0x
	nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC>
	ether X
	inet X.X.X.X netmask 0xff00 broadcast X.X.X.X
	inet6 X%lagg0 prefixlen 64 scopeid 0x5
	inet6 X prefixlen 64 autoconf
	nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
	media: Ethernet autoselect
	status: active
	laggproto loadbalance
	laggport: em0 flags=4<ACTIVE>
	laggport: em1 flags=4<ACTIVE>

interrupt            total        rate
irq3: uart1          3810         0
cpu0: timer          1147568087   2000
irq256: em0:rx 0     59779710     104
irq257: em0:tx 0     2771888652   4831
irq258: em0:link     1            0
irq259: em1:rx 0     3736828886   6512
irq260: em1:tx 0     2790566376   4863
irq261: em1:link     27286        0
irq262: mps0         395687386    689
cpu1: timer          1147559894   2000
cpu2: timer          1147559901   2000
cpu3: timer          1147559902   2000
Total                14345029891  25001

13466/4144/17610 mbufs in use (current/cache/total)
2567/2635/5202/5853720 mbuf clusters in use (current/cache/total/max)
2567/633 mbuf+clusters out of packet secondary zone in use (current/cache)
6798/554/7352/2926859 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
35692K/8522K/44214K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

Name  Mtu  Network       Address           Ipkts       Ierrs  Idrop  Opkts       Oerrs   Coll  Drop
em0   1500 Link#1        00:25:90:2b:e5:75 60747643    0      0      11246408092 0       0     1750763
em0   1500 fe80:1::225:9 fe80:1::225:90ff: 0           -      -      4           -       -     -
em1   1500 Link#2        00:25:90:2b:e5:75 11237195776 123950 0      11344722383 0       0     545682
em1   1500 fe80:2::225:9 fe80:2::225:90ff: 0           -      -      1           -       -     -
lagg0 1500 Link#5        00:25:90:2b:e5:75 11297850142 0      0      22588666102 2296445 0     0
lagg0 1500 69.164.38.0/2 69.164.38.83      10189108030 -      -      22592881776 -       -     -
lagg0 1500 fe80:5::225:9 fe80:5::225:90ff: 24          -      -      28          -       -     -
lagg0 1500 2607:f4e8:310 2607:f4e8:310:12: 19578       -      -      19591       -       -     -

kern.msgbuf: Nov 7 04:10:06 cds1033 kernel
Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled
On 10/31/2011 7:33 AM, Jason Wolfe wrote: Thanks for looking into this. I'd be happy to test any patch thrown my way, but keep in mind my issue is only tickled when MSI-X is enabled. My interfaces aren't bouncing, though it might be possible some unique path in the MSI-X code is causing a throughput hang akin to connectivity loss? Jack, is the delta you're speaking of against the 7.2.4 code? I did manage to get the code from Intel compiled with a couple minutes of work, but haven't loaded it up yet as I didn't see anything that caught my untrained eye in the diffs. I'll wait until it's ported over and would be happy to test if needed. Conveniently enough, I just received another report from my test boxes with a pretty stock loader.conf. I had forgotten to remove the advanced options from the interfaces after I cycled them to pick up fc_setting=0. Fixed that up just meow.

hw.em.fc_setting=0
cc_cubic_load=YES

I bounced em0 because dropped packets incremented 368756 to 369124 and the interface is not incrementing packets out.
5:35PM up 2 days, 17:45, 0 users, load averages: 0.34, 0.45, 0.48

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC>
	ether X
	inet6 X%em0 prefixlen 64 scopeid 0x1
	nd6 options=1<PERFORMNUD>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC>
	ether X
	inet6 X%em1 prefixlen 64 scopeid 0x2
	inet6 X prefixlen 64 autoconf
	nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
ipfw0: flags=8801<UP,SIMPLEX,MULTICAST> metric 0 mtu 65536
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=3<RXCSUM,TXCSUM>
	inet 127.0.0.1 netmask 0xff00
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
	inet X.X.X.X netmask 0x
	inet X.X.X.X netmask 0x
	inet X.X.X.X netmask 0x
	inet X.X.X.X netmask 0x
	inet X.X.X.X netmask 0x
	inet X.X.X.X netmask 0x
	inet X.X.X.X netmask 0x
	inet X.X.X.X netmask 0x
	nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC>
	ether X
	inet X.X.X.X netmask 0xff00 broadcast X.X.X.X
	inet6 X%lagg0 prefixlen 64 scopeid 0x5
	nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
	media: Ethernet autoselect
	status: active
	laggproto loadbalance
	laggport: em0 flags=4<ACTIVE>
	laggport: em1 flags=4<ACTIVE>

interrupt            total       rate
irq3: uart1          3456        0
cpu0: timer          473404250   2000
irq256: em0:rx 0     24614350    103
irq257: em0:tx 0     1220810972  5157
irq258: em0:link     1           0
irq259: em1:rx 0     1533295149  6477
irq260: em1:tx 0     1194032538  5044
irq261: em1:link     3272        0
irq262: mps0         189602667   801
cpu3: timer          473396089   2000
cpu1: timer          473396089   2000
cpu2: timer          473396081   2000
Total                6055954914  25585

32999/8476/41475 mbufs in use (current/cache/total)
4064/3398/7462/5872038 mbuf clusters in use (current/cache/total/max)
4064/800 mbuf+clusters out of packet secondary zone in use (current/cache)
24900/669/25569/2936019 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
115977K/11591K/127568K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
61 requests for I/O initiated by sendfile
0 calls to protocol drain routines

Name  Mtu  Network       Address           Ipkts      Ierrs Idrop Opkts       Oerrs  Coll  Drop
em0   1500 Link#1        00:25:90:2a:a2:d7 24946787   0     0     5734180355  0      0     369844
em0   1500 fe80:1::225:9 fe80:1::225:90ff: 0          -     -     2           -      -     -
em1   1500 Link#2        00:25:90:2a:a2:d7 5220869518 15996 0     5429971995  0      0     37009
em1   1500 fe80:2::225:9 fe80:2::225:90ff: 0          -     -     1           -      -     -
em1   1500 2607:f4e8:310 2607:f4e8:310:12: 0          -     -     0           -      -     -
lagg0 1500 Link#5        00:25:90:2a:a2:d7 5245767782 0     0     11162877037 406853 0     0
lagg0 1500 69.164.38.0/2 69.164.38.69      4776881809 -     -     11164303625 -      -     -
lagg0 1500 fe80:5::225:9 fe80:5::225:90ff: 0          -     -     3           -      -     -

kern.msgbuf:
Oct 30 17:08:38 cds1019 kernel: ifa_add_loopback_route: insertion failed
Oct 30 17:12:10 cds1019 kernel: ifa_add_loopback_route: insertion failed
Oct 30 17:20:20 cds1019 last message repeated 3 times
Oct 30 17:32:13 cds1019 last message repeated 4 times
Oct 30 17:34:27 cds1019 kernel: ifa_add_loopback_route: insertion failed
Oct 30 17:35:03 cds1019 kernel: Interface is RUNNING and INACTIVE
Oct 30 17:35:03 cds1019 kernel: em0: hw tdh = 818, hw tdt = 818
Oct 30 17:35:03 cds1019 kernel: em0: hw rdh = 99, hw rdt = 98
Oct 30 17:35:03 cds1019 kernel: em0: Tx
Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled
On 10/31/2011 11:43 AM, Emil Muratov wrote:

You may try these settings and see if they help:
- hw.em.fc_setting=0 (in /boot/loader.conf)
- hw.em.rxd=4096 (in /boot/loader.conf)
- hw.em.txd=4096 (in /boot/loader.conf)
- Fix speed and duplex at both link sides. After doing that, confirm on the FreeBSD box (with ifconfig) and on the other device (with whatever command it provides) that the same speed and duplex are used by both devices.

You also have high values for dev.em.0.rx/tx_[abs]_int_delay. If you have set them manually, remove them or replace them with these in loader.conf:
hw.em.rx_int_delay=0
hw.em.tx_int_delay=66
hw.em.tx_abs_int_delay=66
hw.em.rx_abs_int_delay=66
These may be set via the corresponding sysctls too.

Still no luck with the above settings; I've got more lockups a couple of times. Here are the recent details:

= 11.10.30-23:43:06 ...
interface em0 is down... we have Ierrs and no ingoing packets for 5 secs, interface em0 must be toggled
11:43PM up 1 day, 3:01, 2 users, load averages: 0.76, 0.64, 0.70

== vmstat -i ==
interrupt            total        rate
irq18: ehci0         1145540      11
irq22: nfe0          473895599    4872
cpu0: timer          195004026    2005
irq256: ahci0        12832958131
irq257: em0:rx 0     95571051982
irq258: em0:tx 0     88777545912
irq259: em0:link     946          0
cpu3: timer          195003397    2005
cpu1: timer          195003398    2005
cpu2: timer          195003399    2005
Total                1452237859   14932

== netstat -m ==
5424/1701/7125 mbufs in use (current/cache/total)
719/1185/1904/51200 mbuf clusters in use (current/cache/total/max)
719/582 mbuf+clusters out of packet secondary zone in use (current/cache)
329/583/912/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
4095/342/4437/12800 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
40978K/8205K/49183K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/6663503/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

== netstat -ind ==
Name  Mtu  Network       Address           Ipkts     Ierrs Idrop Opkts     Oerrs Coll Drop
usbus 0    Link#1                          0         0     0     0         0     0    0
usbus 0    Link#2                          0         0     0     0         0     0    0
nfe0  1500 Link#3        00:25:22:21:86:89 196018201 0     0     350650768 0     0    664
nfe0  1500 fe80::225:22f fe80::225:22ff:fe 0         -     -     0         -     -    -
nfe0  1500 10.16.128.0/1 10.16.189.71      6         -     -     29787707  -     -    -
em0   9000 Link#4        00:1b:21:ab:bf:4a 175676617 949   0     101627139 0     0    0
em0   9000 192.168.168.0 192.168.168.1     7628423   -     -     13654747  -     -    -
em0   9000 fe80::21b:21f fe80::21b:21ff:fe 45        -     -     5747      -     -    -
em0   9000 2002:d5xx:xxx 2002:d5xx::x:     153       -     -     159       -     -    -

Oct 30 23:43:06 ion kernel: Interface is RUNNING and INACTIVE
Oct 30 23:43:07 ion kernel: em0: hw tdh = 2656, hw tdt = 3271
Oct 30 23:43:07 ion kernel: em0: hw rdh = 2112, hw rdt = 2111
Oct 30 23:43:07 ion kernel: em0: Tx Queue Status = 1
Oct 30 23:43:07 ion kernel: em0: TX descriptors avail = 3481
Oct 30 23:43:07 ion kernel: em0: Tx Descriptors avail failure = 0
Oct 30 23:43:07 ion kernel: em0: RX discarded packets = 0
Oct 30 23:43:07 ion kernel: em0: RX Next to Check = 2112
Oct 30 23:43:07 ion kernel: em0: RX Next to Refresh = 2111

net.inet.ip.intr_queue_maxlen: 4096
net.inet.ip.intr_queue_drops: 0
dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.3
dev.em.0.%driver: em
dev.em.0.%location: slot=0 function=0
dev.em.0.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x8086 subdevice=0xa01f class=0x02
dev.em.0.%parent: pci2
dev.em.0.nvm: -1
dev.em.0.debug: -1
dev.em.0.rx_int_delay: 0
dev.em.0.tx_int_delay: 66
dev.em.0.rx_abs_int_delay: 66
dev.em.0.tx_abs_int_delay: 66
dev.em.0.rx_processing_limit: 100
dev.em.0.flow_control: 0
dev.em.0.eee_control: 0
dev.em.0.link_irq: 956
dev.em.0.mbuf_alloc_fail: 0
dev.em.0.cluster_alloc_fail: 0
dev.em.0.dropped: 0
dev.em.0.tx_dma_fail: 1
dev.em.0.rx_overruns: 0
dev.em.0.watchdog_timeouts: 0
dev.em.0.device_control: 1074790984
dev.em.0.rx_control: 100827170
dev.em.0.fc_high_water: 11264
dev.em.0.fc_low_water: 9764
dev.em.0.queue0.txd_head: 2656
dev.em.0.queue0.txd_tail: 3274
dev.em.0.queue0.tx_irq: 88769608
dev.em.0.queue0.no_desc_avail: 0
dev.em.0.queue0.rxd_head: 2112
Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled
On 10/31/2011 12:51 PM, Emil Muratov wrote: On 31.10.2011 12:13, Hooman Fazaeli wrote:

Thanks for looking into this. I'd be happy to test any patch thrown my way, but keep in mind my issue is only tickled when MSI-X is enabled. My interfaces aren't bouncing, though it might be possible some unique path in the MSI-X code is causing a throughput hang akin to connectivity loss? Jack, is the delta you're speaking of against the 7.2.4 code? I did manage to get the code from Intel compiled with a couple minutes of work, but haven't loaded it up yet as I didn't see anything that caught my untrained eye in the diffs. I'll wait until it's ported over and would be happy to test if needed. Conveniently enough, I just received another report from my test boxes with a pretty stock loader.conf. I had forgotten to remove the advanced options from the interfaces after I cycled them to pick up fc_setting=0. Fixed that up just meow. hw.em.fc_setting=0 cc_cubic_load=YES Jason

Attached is a patch for if_em.c. It flushes the interface queue when it is full and the link is not active. Please note that when this happens, drops keep increasing on the interface, and this will trigger your scripts as before. You need to change the scripts a little, as follows: check interface TX status; if (interface TX seems hung) { sleep 5; check interface TX status; if (interface TX seems hung) { reset the interface } }. For MULTIQUEUE, it just disables the check for link status (which is not good), so please test in non-MULTIQUEUE mode. The patch also contains some minor fixups to compile on 7, plus a fix from r1.69 which addressed an RX hang problem (the fix was later removed in r1.70). I included it for Emil to give it a try. Please let me know if you have any problems with the patch.

Hi! Thanks for the update. But I can't make it; there is an error in the build process. Can you kindly take a look at it?

emil@ion /usr/src/sys/dev/e1000 (0) sudo patch < /home/emil/patches/if_em/if_em.c.patch
Password:
Hmm... Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|--- if_em.c.orig	2011-10-31 11:43:35.0 +0330
|+++ if_em.c	2011-10-31 11:43:35.0 +0330
--------------------------
Patching file if_em.c using Plan A...
Hunk #1 succeeded at 85.
Hunk #2 succeeded at 101.
Hunk #3 succeeded at 382 (offset -29 lines).
Hunk #4 succeeded at 400 (offset -29 lines).
Hunk #5 succeeded at 857 (offset -29 lines).
Hunk #6 succeeded at 960 (offset -29 lines).
Hunk #7 succeeded at 1420 (offset -29 lines).
Hunk #8 succeeded at 1436 (offset -29 lines).
Hunk #9 succeeded at 1466 (offset -29 lines).
Hunk #10 succeeded at 2230 (offset -29 lines).
Hunk #11 succeeded at 2338 (offset -29 lines).
Hunk #12 succeeded at 2350 (offset -29 lines).
Hunk #13 succeeded at 3799 (offset -29 lines).
Hunk #14 succeeded at 5164 with fuzz 2 (offset -29 lines).
Hunk #15 succeeded at 5616 (offset -4 lines).
done

emil@ion /usr/src/sys/dev/e1000 (0) sudo patch < /home/emil/patches/if_em/if_em.h.patch
Hmm... Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|--- if_em.h.orig	2011-10-31 11:43:34.0 +0330
|+++ if_em.h	2011-10-31 11:43:35.0 +0330
--------------------------
Patching file if_em.h using Plan A...
Hunk #1 succeeded at 438.
done

root@ion /usr/src/sys/modules/em (0) make
Warning: Object directory not changed from original /usr/src/sys/modules/em
awk -f @/tools/makeobjops.awk @/kern/device_if.m -h
awk -f @/tools/makeobjops.awk @/kern/bus_if.m -h
awk -f @/tools/makeobjops.awk @/dev/pci/pci_if.m -h
:> opt_inet.h
cc -O2 -pipe -march=nocona -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc -I/usr/src/sys/modules/em/../../dev/e1000 -I. -I@ -I@/contrib/altq -finline-limit=8000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-common -fno-omit-frame-pointer -mcmodel=kernel -mno-red-zone -mfpmath=387 -mno-sse -mno-sse2 -mno-sse3 -mno-mmx -mno-3dnow -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fstack-protector -std=iso9899:1999 -fstack-protector -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions -c /usr/src/sys/modules/em/../../dev/e1000/if_em.c
/usr/src/sys/modules/em/../../dev/e1000/if_em.c:387: error: 'sysctl__hw_em_children' undeclared here (not in a function)
*** Error code 1
Stop in /usr/src/sys/modules/em.

Please sync your sys/dev/e1000 with HEAD and try again:

setenv CVSROOT :pserver:anon...@anoncvs.freebsd.org:/home/ncvs
cvs login
password: enter anonymous
cd /usr/src
Re: kern/162028: [ixgbe] [patch] misplaced #endif in ixgbe.c
The following reply was made to PR kern/162028; it has been noted by GNATS.

From: Hooman Fazaeli hoomanfaza...@gmail.com
To: Sergey Kandaurov pluk...@gmail.com
Cc: bug-follo...@freebsd.org
Subject: Re: kern/162028: [ixgbe] [patch] misplaced #endif in ixgbe.c
Date: Sun, 30 Oct 2011 11:03:44 +0330

On 10/29/2011 4:28 PM, Sergey Kandaurov wrote: I have a more complete patch. Can you test it please?

Index: sys/dev/ixgbe/ixgbe.c
===================================================================
--- sys/dev/ixgbe/ixgbe.c	(revision 226068)
+++ sys/dev/ixgbe/ixgbe.c	(working copy)
@@ -867,16 +867,15 @@
 static int
 ixgbe_ioctl(struct ifnet * ifp, u_long command, caddr_t data)
 {
 	struct adapter	*adapter = ifp->if_softc;
-	struct ifreq	*ifr = (struct ifreq *) data;
+	struct ifreq	*ifr = (struct ifreq *)data;
 #if defined(INET) || defined(INET6)
-	struct ifaddr	*ifa = (struct ifaddr *)data;
-	bool		avoid_reset = FALSE;
+	struct ifaddr	*ifa = (struct ifaddr *)data;
 #endif
-	int		error = 0;
+	bool		avoid_reset = FALSE;
+	int		error = 0;
 
 	switch (command) {
-
-	case SIOCSIFADDR:
+	case SIOCSIFADDR:
 #ifdef INET
 		if (ifa->ifa_addr->sa_family == AF_INET)
 			avoid_reset = TRUE;
 #endif
@@ -885,7 +884,6 @@ ixgbe_ioctl(struct ifnet * ifp, u_long command, ca
 #ifdef INET6
 		if (ifa->ifa_addr->sa_family == AF_INET6)
 			avoid_reset = TRUE;
 #endif
-#if defined(INET) || defined(INET6)
 		/*
 		** Calling init results in link renegotiation,
 		** so we avoid doing it when possible.
@@ -894,12 +892,13 @@ ixgbe_ioctl(struct ifnet * ifp, u_long command, ca
 			ifp->if_flags |= IFF_UP;
 			if (!(ifp->if_drv_flags & IFF_DRV_RUNNING))
 				ixgbe_init(adapter);
+#ifdef INET
 			if (!(ifp->if_flags & IFF_NOARP))
 				arp_ifinit(ifp, ifa);
+#endif
 		} else
 			error = ether_ioctl(ifp, command, data);
 		break;
-#endif
 	case SIOCSIFMTU:
 		IOCTL_DEBUGOUT("ioctl: SIOCSIFMTU (Set Interface MTU)");
 		if (ifr->ifr_mtu > IXGBE_MAX_FRAME_SIZE - ETHER_HDR_LEN) {

Sure. I am very busy right now. Will test as soon as I can.
___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled
I finally managed to reproduce an effect similar to Jason's case. It may not be the exact same issue, but it is a serious problem and must be addressed.
1. Push packets out on em/igb at a high rate.
2. Disconnect the cable and wait for a few seconds. netstat -ind shows that Drops are increasing.
3. Re-connect the cable. Both sides of the link re-negotiate and the link comes up.
4. But ... no packets are ever transmitted again, and Drops keep increasing!
This is because em/lem/igb and some other drivers (e.g., bce) have a check at the very beginning of their _start function which tests link status and returns immediately if it is inactive. This behavior causes if_snd to fill up in step 2, and once that happens, IFQ_HANDOFF never calls if_start again, even after the link becomes active. A cable unplug is not necessary to trigger the issue: any temporary link loss (e.g., during re-negotiation) can potentially lead to the aforementioned problem. IMHO, this is not a driver issue and the real fix would be to change IFQ_HANDOFF to call if_start when the queue is full. Jason, if you are interested, I can prepare a patch for you to address this issue in if_em and see if it helps.
--- if_em.c.orig	2011-10-27 21:09:33.000000000 +0330
+++ if_em.c	2011-10-27 21:46:18.000000000 +0330
@@ -85,6 +85,14 @@
 #include "e1000_82571.h"
 #include "if_em.h"
 
+#if !defined(DISABLE_FIXUPS) && __FreeBSD_version < 800000
+static __inline int
+pci_find_cap(device_t dev, int capability, int *capreg)
+{
+	return (PCI_FIND_EXTCAP(device_get_parent(dev), dev, capability, capreg));
+}
+#endif
+
 /*
  * Set this to one to display debug statistics
  */
@@ -399,6 +407,12 @@
 /* Global used in WOL setup with multiport cards */
 static int global_quad_port_a = 0;
 
+#ifndef DISABLE_FIXUPS
+static int em_rx_hang_fixup = 0;
+SYSCTL_INT(_hw_em, OID_AUTO, rx_hang_fixup, CTLFLAG_RW, &em_rx_hang_fixup, 0,
+    "Enable/disable r1.69 RX hang fixup code");
+#endif
+
 /*
  * Device identification routine
  */
@@ -864,7 +878,11 @@
 	int err = 0, enq = 0;
 
 	if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) !=
+#ifdef DISABLE_FIXUPS
 	    IFF_DRV_RUNNING || adapter->link_active == 0) {
+#else
+	    IFF_DRV_RUNNING) {
+#endif
 		if (m != NULL)
 			err = drbr_enqueue(ifp, txr->br, m);
 		return (err);
@@ -963,8 +981,10 @@
 	    IFF_DRV_RUNNING)
 		return;
 
+#ifdef DISABLE_FIXUPS
 	if (!adapter->link_active)
 		return;
+#endif
 
 	while (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) {
 		/* Call cleanup if number of TX descriptors low */
@@ -1414,7 +1434,11 @@
  *  Legacy polling routine: note this only works with single queue
  *
  */
+#if !defined(DISABLE_FIXUPS) && __FreeBSD_version < 800000
+static void
+#else
 static int
+#endif
 em_poll(struct ifnet *ifp, enum poll_cmd cmd, int count)
 {
 	struct adapter *adapter = ifp->if_softc;
@@ -1426,7 +1450,11 @@
 	EM_CORE_LOCK(adapter);
 	if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) {
 		EM_CORE_UNLOCK(adapter);
+#if !defined(DISABLE_FIXUPS) && __FreeBSD_version < 800000
+		return;
+#else
 		return (0);
+#endif
 	}
 
 	if (cmd == POLL_AND_CHECK_STATUS) {
@@ -1452,8 +1480,11 @@
 		em_start_locked(ifp, txr);
 #endif
 	EM_TX_UNLOCK(txr);
-
+#if !defined(DISABLE_FIXUPS) && __FreeBSD_version < 800000
+	return;
+#else
 	return (rx_done);
+#endif
 }
 #endif /* DEVICE_POLLING */
 
@@ -2213,6 +2244,16 @@
 	    e1000_get_laa_state_82571(&adapter->hw))
 		e1000_rar_set(&adapter->hw, adapter->hw.mac.addr, 0);
 
+#ifndef DISABLE_FIXUPS
+	if (em_rx_hang_fixup) {
+		/* trigger tq to refill rx ring queue if it is empty */
+		for (int i = 0; i < adapter->num_queues; i++, rxr++) {
+			if (rxr->next_to_check == rxr->next_to_refresh) {
+				taskqueue_enqueue(rxr->tq, &rxr->rx_task);
+			}
+		}
+	}
+#endif
 	/* Mask to use in the irq trigger */
 	if (adapter->msix_mem)
 		trigger = rxr->ims; /* RX for 82574 */
@@ -3766,7 +3807,7 @@
 	 * If we have a minimum free, clear IFF_DRV_OACTIVE
 	 * to tell the stack that it is OK to send packets.
 	 */
-	if (txr->tx_avail > EM_MAX_SCATTER)
+	if (txr->tx_avail >= EM_MAX_SCATTER)
 		ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
 
 	/* Disable watchdog if all clean */
@@ -5553,4 +5594,8 @@
 	    rxr->rx_discarded);
 	device_printf(dev, "RX Next to Check = %d\n",
 	    rxr->next_to_check);
Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled
On 10/30/2011 6:03 PM, Ryan Stone wrote: On Sun, Oct 30, 2011 at 4:57 AM, Hooman Fazaeli hoomanfaza...@gmail.com wrote: IMHO, this is not a driver issue and the real fix would be to change IFQ_HANDOFF to call if_start when the queue is full. I'm not sure that's the right approach. 99% of the time, calling if_start when the queue is full will be a waste of time. It seems to me that the link interrupt handler needs to kick off the tx task to drain the tx queue instead. If the queue were not full, the system would be spending CPU on sending those packets anyway. Now that it is full, a little extra time is spent to recover from a (temporary) problem. Not a big deal! Furthermore, the most common case for the queue being full is the stack sending packets too fast; in that case OACTIVE is set and if_start is not called at all. Changing HANDOFF has the benefit that it is simple, can be implemented quickly, and fixes the problem once for all drivers, including similar bugs not yet discovered. It also makes drivers' code simpler.
Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled
On 10/27/2011 9:59 AM, Emil Muratov wrote: Hi Hooman Here is what I've got when the script triggered just in time when the interface was locked 11.10.26-23:39:10 ... interface em0 is down... FreeBSD ion.hotplug.ru 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu Oct 20 20:20:25 MSD 2011 r...@epia.home .lan:/usr/obj/usr/src/sys/ION6debug amd64 11:39PM up 1:12, 2 users, load averages: 0.26, 0.48, 0.58 == vmstat -i == interrupt total rate irq22: nfe0 16644480 3865 cpu0: timer 8610122 1999 irq256: ahci0 606705140 irq257: em0:rx 0 3896622904 irq258: em0:tx 0 2762957641 irq259: em0:link 620 0 cpu3: timer 8609499 1999 cpu1: timer 8609499 1999 cpu2: timer 8609499 1999 Total 58350003 13550 == netstat -ind == NameMtu Network Address Ipkts Ierrs IdropOpkts Oerrs Coll Drop usbus 0 Link#1 0 0 00 0 00 usbus 0 Link#2 0 0 00 0 00 nfe0 1500 Link#3 00:25:22:21:86:89 7157140 0 0 12266747 0 00 nfe0 1500 fe80::225:22f fe80::225:22ff:fe0 - - 85 - -- nfe0 1500 10.16.128.0/1 10.16.189.71 0 - -48135 - -- em09000 Link#4 00:1b:21:ab:bf:4a 5465087 623 0 2862028 0 0 113 em09000 192.168.168.0 192.168.168.1 764085 - - 1005078 - -- em09000 fe80::21b:21f fe80::21b:21ff:fe 45 - - 252 - -- em09000 2002:d58d:871 2002:d58d:8715:1: 73 - - 38 - -- wifi 1500 Link#7 00:1b:21:ab:bf:4a 347 0 0 350 0 00 wifi 1500 192.168.168.6 192.168.168.65 0 - -0 - -- wifi 1500 fe80::225:x fe80::225:x:x0 - - 349 - - - wifi 1500 2002:x:x 2002:x:x:2:0 - -0 - -- wifio 1500 Link#8 00:1b:21:ab:bf:4a59559 0 0 114639 0 00 wifio 1500 192.168.168.8 192.168.168.81 0 - - 160 - -- wifio 1500 fe80::225:x fe80::225:x:x0 - -0 - - - stf0 1280 Link#95725 0 0 6125 420 00 stf0 1280 2002:x:x 2002:x:x::1 1878 - - 1121 - -- ng0* 1500 Link#10 0 0 00 0 00 ng1* 1500 Link#11 0 0 00 0 00 ng21492 Link#127143733 0 0 12234436 0 00 ng21492 213.141.x.x 213.141.x.x 4735932 - - 8480089 - -- ng21492 fe80::x:x fe80::x:x:x0 - -1 - -- tun0 1455 Link#13350 0 0 172 0 00 tun0 1455 fe80::225:x fe80::225:x:x0 - -2 - - - tun0 1455 192.168.169.1 192.168.169.1 117 - - 167 - -- 
Oct 26 23:39:11 ion kernel: em0: hw tdh = 975, hw tdt = 944 Oct 26 23:39:11 ion kernel: em0: hw rdh = 960, hw rdt = 959 Oct 26 23:39:11 ion kernel: em0: Tx Queue Status = 1 Oct 26 23:39:11 ion kernel: em0: TX descriptors avail = 31 Oct 26 23:39:11 ion kernel: em0: Tx Descriptors avail failure = 0 Oct 26 23:39:11 ion kernel: em0: RX discarded packets = 0 Oct 26 23:39:11 ion kernel: em0: RX Next to Check = 960 Oct 26 23:39:11 ion kernel: em0: RX Next to Refresh = 959 net.inet.ip.intr_queue_maxlen: 4096 net.inet.ip.intr_queue_drops: 0 dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.3 dev.em.0.%driver: em dev.em.0.%location: slot=0 function=0 dev.em.0.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x8086 subdevice=0xa01f class=0x02 dev.em.0.%parent: pci2 dev.em.0.nvm: -1 dev.em.0.debug: -1 dev.em.0.rx_int_delay: 200 dev.em.0.tx_int_delay: 200 dev.em.0.rx_abs_int_delay: 4096 dev.em.0.tx_abs_int_delay: 4096 dev.em.0.rx_processing_limit: 100 dev.em.0.flow_control: 3 dev.em.0.eee_control: 0 dev.em.0.link_irq: 648 dev.em.0.mbuf_alloc_fail: 0 dev.em.0.cluster_alloc_fail: 0 dev.em.0.dropped: 0 dev.em.0.tx_dma_fail: 0 dev.em.0.rx_overruns: 0 dev.em.0.watchdog_timeouts: 0 dev.em.0.device_control: 1477444168 dev.em.0.rx_control: 100827170 dev.em.0.fc_high_water: 11264 dev.em.0.fc_low_water: 9764 dev.em.0.queue0.txd_head: 975 dev.em.0.queue0.txd_tail: 944 dev.em.0.queue0.tx_irq: 2762762 dev.em.0.queue0.no_desc_avail: 0 dev.em.0.queue0.rxd_head: 960 dev.em.0.queue0.rxd_tail: 959 dev.em.0.queue0.rx_irq: 3895860 dev.em.0.mac_stats.excess_coll: 0
Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled
Hi Jason, Have you tried: hw.em.fc_setting=0 (in loader.conf) ifconfig emX -tso -lro -rxcsum -txcsum -vlanhwtag -wol with MSIX and no multiqueue? Advanced features have always been a source of problems, so this is worth a try and helps narrow down the possibilities. It would also be helpful if you provide 'ifconfig' output when the problem happens. And a question: does the interface's RX also hang, or is it just TX? On 10/26/2011 12:25 AM, Jason Wolfe wrote: On Fri, Oct 7, 2011 at 2:14 PM, Jason Wolfe nitrobo...@gmail.com wrote: Bumping rx/tx descriptors to 2048 was actually for performance reasons and not to try to get around the issue. I did some fairly in-depth testing and found under heavy load it performed the best with those settings. As mentioned on the other thread I'll re-enable MSI-X on a few servers here and collect uptime and the kernel msgbuf in addition. I'll bump the descriptors down to 512 to try and increase our chances and compile the driver with EM_MULTIQUEUE also. Jason Hi there, So I have a small pool of servers running EM_MULTIQUEUE with lower descriptors as promised and just received an alert of an event. I have a fairly large pool of servers on the same hardware running the same OS/driver sans MSI-X and multiqueue with not a single 'wedge' event in about 2 months now, and it seems multiqueue has not changed the commonality of the issue. Here is my loader.conf followed by everything collected: net.inet.tcp.tcbhashsize=4096 net.inet.tcp.syncache.hashsize=1024 net.inet.tcp.syncache.bucketlimit=512 net.inet.tcp.syncache.cachelimit=65536 net.inet.tcp.hostcache.hashsize=1024 net.inet.tcp.hostcache.bucketlimit=512 net.inet.tcp.hostcache.cachelimit=65536 hw.em.rxd=512 hw.em.txd=512 cc_cubic_load=YES I bounced em1 because dropped packets incremented 1386169 to 1386355 and the interface is not incrementing packets out.
1:30PM up 4 days, 6:19, 0 users, load averages: 0.18, 0.38, 0.42 interrupt total rate irq3: uart1 5816 0 cpu0: timer 736655476 2000 irq256: em0:rx 0 38122306 103 irq257: em0:tx 0 1605535054 4359 irq258: em0:link 1 0 irq259: em1:rx 0 2192460862 5952 irq260: em1:tx 0 1599049303 4341 irq261: em1:link 4172 0 irq262: mps0 212448927 576 cpu2: timer 736647277 2000 cpu3: timer 736647302 2000 cpu1: timer 736647302 2000 Total 8594223798 2 27653/6022/33675 mbufs in use (current/cache/total) 3054/3196/6250/5700670 mbuf clusters in use (current/cache/total/max) 3054/1041 mbuf+clusters out of packet secondary zone in use (current/cache) 23266/1642/24908/2850335 4k (page size) jumbo clusters in use (current/cache/total/max) 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max) 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max) 106085K/14465K/120550K bytes allocated to network (current/cache/total) 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters) 0/0/0 requests for jumbo clusters denied (4k/9k/16k) 0/0/0 sfbufs in use (current/peak/max) 0 requests for sfbufs denied 0 requests for sfbufs delayed 22 requests for I/O initiated by sendfile 0 calls to protocol drain routines Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll Drop em0 1500Link#1 00:25:90:1f:f5:7d 38575296 0 0 6300959828 0 0 706638 em0 1500 fe80:1::225:9 fe80:1::225:90ff: 0 - - 3 - - - em1 1500Link#2 00:25:90:1f:f5:7d 6091053202 22415 0 6327642657 0 0 1386797 em1 1500 fe80:2::225:9 fe80:2::225:90ff: 0 - - 1 - - - lagg0 1500Link#5 00:25:90:1f:f5:7d 6129556798 0 0 12627493094 2093435 0 0 lagg0 1500 69.164.38.0/2 69.164.38.93 5429109508 - - 12630422599 - - - lagg0 1500 fe80:5::225:9 fe80:5::225:90ff: 12 - - 17 - - - lagg0 1500 2607:f4e8:310 2607:f4e8:310:12: 13655 - - 13663 - - - kern.msgbuf: Oct 25 13:30:04 cds1043 kernel: Interface is RUNNING and INACTIVE Oct 25 13:30:04 cds1043 kernel: em0: hw tdh = 105, hw tdt = 158 Oct 25 13:30:04 cds1043 kernel: em0: hw rdh = 191, hw rdt = 
190 Oct 25 13:30:04 cds1043 kernel: em0: Tx Queue Status = 0 Oct 25 13:30:04 cds1043 kernel: em0: TX descriptors avail = 422 Oct 25 13:30:04 cds1043 kernel: em0: Tx Descriptors avail failure = 0 Oct 25 13:30:04 cds1043 kernel: em0: RX discarded packets = 0 Oct 25 13:30:04 cds1043 kernel: em0: RX Next to Check = 192 Oct 25 13:30:04 cds1043 kernel: em0: RX Next to Refresh = 191 Oct 25 13:30:04 cds1043 kernel: Interface is RUNNING and INACTIVE Oct 25 13:30:04 cds1043 kernel: em1: hw tdh = 159, hw tdt = 159 Oct 25 13:30:04 cds1043 kernel: em1: hw rdh = 193, hw rdt = 191 Oct 25 13:30:04 cds1043 kernel: em1: Tx Queue Status = 0 Oct 25 13:30:04 cds1043 kernel: em1: TX descriptors avail = 512 Oct 25 13:30:04 cds1043 kernel: em1: Tx Descriptors avail failure = 0 Oct 25 13:30:04 cds1043 kernel: em1: RX discarded packets = 0 Oct 25 13:30:04 cds1043 kernel: em1: RX Next to Check = 407 Oct 25 13:30:04 cds1043 kernel: em1: RX Next to Refresh = 436 net.inet.ip.intr_queue_maxlen: 512 net.inet.ip.intr_queue_drops: 0 dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.3 dev.em.0.%driver: em dev.em.0.%location:
Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled
Hi, Can you please post the output of these commands _when_ the problem happens? uname -a sysctl dev.em netstat -ind ifconfig I've got almost the same problem with an Intel 82574L based NIC. My platform is an Nvidia ION running a 1.6 GHz Atom, and the NIC is an external PCI-Express adapter. Unlike Jason's case, mine always gets stuck receiving traffic: its Ierrs increase while Ipkts do not. Thanks to Jason's script I can see those locks, and the interface flaps every several hours. My system is not a heavily loaded server but just a home NAS/router, usually routing at 100 Mbps or less. Neither disabling MSIX nor tuning txd/rxd helps.
misplaced #endif in ixgbe
A misplaced #endif in ixgbe_ioctl() causes all sorts of problems when INET and INET6 are undefined. Please see the attached patch.

--- ixgbe.c.orig	2011-10-17 20:37:17.000000000 +0330
+++ ixgbe.c	2011-10-17 20:38:40.000000000 +0330
@@ -898,8 +898,8 @@
 			arp_ifinit(ifp, ifa);
 		} else
 			error = ether_ioctl(ifp, command, data);
-		break;
 #endif
+		break;
 	case SIOCSIFMTU:
 		IOCTL_DEBUGOUT("ioctl: SIOCSIFMTU (Set Interface MTU)");
 		if (ifr->ifr_mtu > IXGBE_MAX_FRAME_SIZE - ETHER_HDR_LEN) {
Re: em(4) high latency w/o msix
8.2-RELEASE and stable/8 have the same problem. Ping RTT triples when MSIX is disabled. On 10/3/2011 11:50 AM, Jack Vogel wrote: Can you try the driver in 8.2 and possibly stable/8 to see the behavior there. And, just curious, why are you disabling MSIX? Jack On Mon, Oct 3, 2011 at 12:51 AM, Hooman Fazaeli faza...@sepehrs.com wrote: Hi Jack The hardware is a PCIe network appliance with 3 port modules. The ports I have used in the test are 82574L, residing on a 4-port module. Anyway, as I noted in my last mail, the stock 7.3-RELEASE driver does not expose this problem on the same hardware. On 10/2/2011 7:38 PM, Jack Vogel wrote: On what hardware? Jack On Sun, Oct 2, 2011 at 6:42 AM, Hooman Fazaeli faza...@sepehrs.com wrote: The latest em(4) driver from HEAD seems to have high latency when MSIX is disabled. With MSIX enabled (hw.em.enable_msix=1): # ping -c5 192.168.1.83 PING 192.168.1.83 (192.168.1.83): 56 data bytes 64 bytes from 192.168.1.83: icmp_seq=0 ttl=64 time=0.055 ms 64 bytes from 192.168.1.83: icmp_seq=1 ttl=64 time=0.076 ms 64 bytes from 192.168.1.83: icmp_seq=2 ttl=64 time=0.066 ms 64 bytes from 192.168.1.83: icmp_seq=3 ttl=64 time=0.051 ms 64 bytes from 192.168.1.83: icmp_seq=4 ttl=64 time=0.063 ms --- 192.168.1.83 ping statistics --- 5 packets transmitted, 5 packets received, 0.0% packet loss round-trip min/avg/max/stddev = 0.051/0.062/0.076/0.009 ms With MSIX disabled: # ping -c5 192.168.1.83 PING 192.168.1.83 (192.168.1.83): 56 data bytes 64 bytes from 192.168.1.83: icmp_seq=0 ttl=64 time=0.180 ms 64 bytes from 192.168.1.83: icmp_seq=1 ttl=64 time=0.164 ms 64 bytes from 192.168.1.83: icmp_seq=2 ttl=64 time=0.169 ms 64 bytes from 192.168.1.83: icmp_seq=3 ttl=64 time=0.172 ms 64 bytes from 192.168.1.83:
icmp_seq=4 ttl=64 time=0.167 ms --- 192.168.1.83 ping statistics --- 5 packets transmitted, 5 packets received, 0.0% packet loss round-trip min/avg/max/stddev = 0.164/0.170/0.180/0.005 ms As you see, without MSIX, RTT increases by a factor of 3. I also tested the following drivers: - igb(4) from HEAD: OK. - Stock 7.3-RELEASE: OK. - Stock 7.4-RELEASE: problem exists. Any ideas?
intel checksum offload
Hi list, The datasheet for the Intel 82576 advertises IP TX/RX checksum offload, but the driver does not set CSUM_IP in ifp->if_hwassist. Does this mean that the driver (and chip) do not support IP TX checksum offload, or that TX support is just not yet included in the driver?
Re: em driver, 82574L chip, and possibly ASPM
I have similar problems on a couple of 7.3 boxes with the latest driver from -CURRENT. I just wanted to know if your 7.x boxes work fine, so I can look for the cause elsewhere. On 2/7/2011 3:23 AM, Mike Tancsa wrote: So far so good. I would often get a hang on the level zero dumps to my backup server Sunday AM, and it made it through! So a good sign, but not a definitive sign. I have a PCIe em card that has this chipset as well and was showing the same sort of problem in a customer's RELENG_7 box. I will see if I can get the customer to try the card in their box with the patch for RELENG_7, as it would show this issue at least once a day until I pulled the card for an older version ---Mike On 2/4/2011 1:12 PM, Jack Vogel wrote: Was curious too, but being more patient than you :) Jack On Fri, Feb 4, 2011 at 10:09 AM, Sean Bruno sean...@yahoo-inc.com wrote: Any more data on this problem or do we have to wait a while? Sean On Wed, 2011-02-02 at 10:28 -0800, Mike Tancsa wrote: On 2/2/2011 12:37 PM, Jack Vogel wrote: So has everyone that wanted to get something testing been able to do so? I have been testing in the back and will deploy to my production box this afternoon. As I am not able to reproduce it easily, it will be a bit before I can say the issue is gone. Jan however, was able to trigger it with greater ease ? ---Mike Jack On Tue, Feb 1, 2011 at 7:03 PM, Mike Tancsa m...@sentex.net wrote: On 2/1/2011 5:03 PM, Sean Bruno wrote: On Tue, 2011-02-01 at 13:43 -0800, Jack Vogel wrote: To those who are going to test, here is the if_em.c, based on head, with my changes. I have to leave for the afternoon, and have not had a chance to build this, but it should work. I will check back in the later evening.
Any blatant problems Sean, feel free to fix them :) Jack I suspect that line 1490 should be: if (more_rx || (ifp->if_drv_flags & IFF_DRV_OACTIVE)) { I have hacked up a RELENG_8 version which I think is correct, including the above change: http://www.tancsa.com/if_em-8.c

--- if_em.c.orig	2011-02-01 21:47:14.000000000 -0500
+++ if_em.c	2011-02-01 21:47:19.000000000 -0500
@@ -30,7 +30,7 @@
   POSSIBILITY OF SUCH DAMAGE.
 
 ******************************************************************************/
-/*$FreeBSD: src/sys/dev/e1000/if_em.c,v 1.21.2.20 2011/01/22 01:37:53 jfv Exp $*/
+/*$FreeBSD$*/
 
 #ifdef HAVE_KERNEL_OPTION_HEADERS
 #include "opt_device_polling.h"
@@ -93,7 +93,7 @@
 /*
  * Driver version:
  */
-char em_driver_version[] = "7.1.9";
+char em_driver_version[] = "7.1.9-test";
 
 /*
  * PCI Device ID Table
@@ -927,11 +927,10 @@
 	if (!adapter->link_active)
 		return;
 
-	/* Call cleanup if number of TX descriptors low */
-	if (txr->tx_avail <= EM_TX_CLEANUP_THRESHOLD)
-		em_txeof(txr);
-
 	while (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) {
+		/* First cleanup if TX descriptors low */
+		if (txr->tx_avail <= EM_TX_CLEANUP_THRESHOLD)
+			em_txeof(txr);
 		if (txr->tx_avail < EM_MAX_SCATTER) {
 			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
 			break;
@@ -1411,8 +1410,7 @@
 	if (!drbr_empty(ifp, txr->br))
 		em_mq_start_locked(ifp, txr, NULL);
 #else
-	if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
-		em_start_locked(ifp, txr);
+	em_start_locked(ifp, txr);
 #endif
 	EM_TX_UNLOCK(txr);
 
@@ -1475,11 +1473,10 @@
 	struct ifnet	*ifp = adapter->ifp;
 	struct tx_ring	*txr = adapter->tx_rings;
 	struct rx_ring	*rxr = adapter->rx_rings;
-	bool		more;
-
-	if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
-		more = em_rxeof(rxr, adapter->rx_process_limit, NULL);
+	bool		more_rx;
+
+	more_rx = em_rxeof(rxr, adapter->rx_process_limit, NULL);
 
 	EM_TX_LOCK(txr);
 	em_txeof(txr);
@@ -1487,12 +1484,10 @@
 	if (!drbr_empty(ifp, txr->br))
 		em_mq_start_locked(ifp, txr, NULL);
 #else
-	if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
-		em_start_locked(ifp, txr);
+	em_start_locked(ifp, txr);
 #endif
-	em_txeof(txr);
 	EM_TX_UNLOCK(txr);
-	if (more) {
+	if (more_rx || (ifp->if_drv_flags & IFF_DRV_OACTIVE)) {
 		taskqueue_enqueue(adapter->tq, &adapter->que_task);
 		return;
 	}
@@ -1604,7 +1599,6 @@
 	if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
 		em_start_locked(ifp, txr);
 #endif
-	em_txeof(txr);
 	E1000_WRITE_REG(&adapter->hw, E1000_IMS, txr->ims);
 	EM_TX_UNLOCK(txr);
Re: em driver, 82574L chip, and possibly ASPM
Can you please share the patch for FreeBSD 7? On 2/7/2011 3:23 AM, Mike Tancsa wrote: So far so good. I would often get a hang on the level zero dumps to my backup server Sunday AM, and it made it through! So a good sign, but not a definitive sign. I have a PCIe em card that has this chipset as well and was showing the same sort of problem in a customer's RELENG_7 box. I will see if I can get the customer to try the card in their box with the patch for RELENG_7 as it would show this issue at least once a day until I pulled the card for an older version ---Mike
Re: Introducing netmap: line-rate packet send/receive at 10Gbit/s
Thanks for the work. Is the source for the driver patches available? On 6/3/2011 3:01 AM, Luigi Rizzo wrote: Hi, we have recently worked on a project, called netmap, which lets FreeBSD send/receive packets at line rate even at 10 Gbit/s with very low CPU overhead: one core at 1.33 GHz does 14.88 Mpps with a modified ixgbe driver, which gives plenty of CPU cycles to handle multiple interfaces and/or do useful work (packet forwarding, analysis, etc.) You can find full documentation and source code and even a picobsd image at http://info.iet.unipi.it/~luigi/netmap/ The system uses memory-mapped packet buffers to reduce the cost of data movement, but this alone would not be enough to make it useful or novel. Netmap uses many other small but important tricks to make the system fast, safe and easy to use, and to support transmission, reception, and communication with the host stack. You can see full details in the documentation at the above link. Feedback welcome. cheers luigi -+--- Prof. Luigi RIZZO, ri...@iet.unipi.it . Dip. di Ing. dell'Informazione http://www.iet.unipi.it/~luigi/ . Universita` di Pisa TEL +39-050-2211611 . via Diotisalvi 2 Mobile +39-338-6809875 . 56122 PISA (Italy) -+---
broadcom 57710 support
Does anyone know if there is any near-term plan to develop drivers for network cards based on the Broadcom NetXtreme II 57710 10 GbE controller? --- best regards Hooman Fazaeli