Re: UDP sendto() returning ENOBUFS - "No buffer space available"
On Fri, 18 Jul 2014, Adrian Chadd wrote: On 18 July 2014 13:40, Bruce Evans wrote: On Fri, 18 Jul 2014, hiren panchasara wrote: On Wed, Jul 16, 2014 at 11:00 AM, Adrian Chadd wrote: Hi! So the UDP transmit path is udp_usrreqs->pru_send() == udp_send() -> udp_output() -> ip_output() udp_output() does do a M_PREPEND() which can return ENOBUFS. ip_output can also return ENOBUFS. it doesn't look like the socket code (eg sosend_dgram()) is doing any buffering - it's just copying the frame and stuffing it up to the driver. No queuing involved before the NIC. Right. Thanks for confirming. Most buffering should be in ifq above the NIC. For UDP, I think udp_output() puts buffers on the ifq and calls the driver for every one, but the driver shouldn't do anything for most calls. The driver can't possibly do anything if its ring buffer is full, and shouldn't do anything if it is nearly full. Buffers accumulate in the ifq until the driver gets around to them or the queue fills up. Most ENOBUFS errors are for when it fills up. It can very easily fill up, especially since it is too small in most configurations. Just loop calling sendto(). This will fill the ifq almost instantly unless the hardware is faster than the software. For if_transmit() drivers, there's no ifp queue. The queuing is being done in the driver. For drivers with if_transmit(), they may end up doing direct DMA ring dispatch or they may have a buf_ring in front of it.There's no ifq anymore. It upsets the ALTQ people too. Ah, a new source of bugs. Most drivers don't use this yet. Most still use ifq with the bogus size of (tx_ring_size - 1): Ones converted to the indirect API: % dev/bge/if_bge.c: if_setsendqlen(ifp, BGE_TX_RING_CNT - 1); % dev/bxe/bxe.c:if_setsendqlen(ifp, sc->tx_ring_size); bxe is one of the few without the silly subtraction of 1. % dev/e1000/if_em.c:if_setsendqlen(ifp, adapter->num_tx_desc - 1); % dev/e1000/if_lem.c: if_setsendqlen(ifp, adapter->num_tx_desc - 1); % dev/fxp/if_fxp.c: if_setsendqlen(ifp, FXP_NTXCB - 1); % dev/nfe/if_nfe.c: if_setsendqlen(ifp, NFE_TX_RING_COUNT - 1); Ones not converted: % dev/ae/if_ae.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen; % dev/ae/if_ae.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen); The double setting is related to ALTQ. I grepped for maxlen to find both. I might have missed alternative spellings. ifqmaxlen is usually 50, so all drivers using it have very little buffering. Even if their tx ring is tiny, this 50 is too small above 1 or 10 Mbps. % dev/age/if_age.c: ifp->if_snd.ifq_drv_maxlen = AGE_TX_RING_CNT - 1; % dev/age/if_age.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen); % dev/alc/if_alc.c: ifp->if_snd.ifq_drv_maxlen = ALC_TX_RING_CNT - 1; % dev/alc/if_alc.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen); % dev/ale/if_ale.c: ifp->if_snd.ifq_drv_maxlen = ALE_TX_RING_CNT - 1; % dev/ale/if_ale.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen); % dev/an/if_an.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen); % dev/an/if_an.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen; % dev/asmc/asmc.c: uint8_t maxlen; % dev/asmc/asmc.c: maxlen = type[0]; Grepping for maxlen unfortunately found related things. I deleted most after this. 
% dev/ath/if_ath.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen); % dev/ath/if_ath.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen; % dev/bce/if_bce.c: ifp->if_snd.ifq_drv_maxlen = USABLE_TX_BD_ALLOC; % dev/bce/if_bce.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen); % dev/bfe/if_bfe.c: ifp->if_snd.ifq_drv_maxlen = BFE_TX_QLEN; % dev/bm/if_bm.c: ifp->if_snd.ifq_drv_maxlen = BM_MAX_TX_PACKETS; % dev/bwi/if_bwi.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen); % dev/bwi/if_bwi.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen; % dev/bwn/if_bwn.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen); % dev/bwn/if_bwn.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen; % dev/cadence/if_cgem.c:ifp->if_snd.ifq_drv_maxlen = IFQ_MAXLEN; % dev/cas/if_cas.c: ifp->if_snd.ifq_drv_maxlen = CAS_TXQUEUELEN; % dev/ce/if_ce.c: d->queue.ifq_maxlen = ifqmaxlen; % dev/ce/if_ce.c: d->hi_queue.ifq_maxlen = ifqmaxlen; % dev/ce/if_ce.c: d->rqueue.ifq_maxlen = ifqmaxlen; % dev/ce/if_ce.c: d->rqueue.ifq_maxlen = ifqmaxlen; Seems silly to have many tiny queues, especially when their length is nominal and can be changed by tunables if not sysctls so that it is not actually tiny. But good for latency. % dev/cm/smc90cx6.c:ifp->if_snd.ifq_maxlen = ifqmaxlen; % dev/cp/if_cp.c: d->queue.ifq_maxlen = ifqmaxlen; % dev/cp/if_cp.c: d-&
Re: UDP sendto() returning ENOBUFS - "No buffer space available"
> On Jul 18, 2014, at 23:34, Adrian Chadd wrote: > > It upsets the ALTQ people too. I'm an ALTQ person (pfSense, so maybe one of the biggest) and I'm not upset. That cr*p needs to die in a fire. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: UDP sendto() returning ENOBUFS - "No buffer space available"
Hi, On 18 July 2014 13:40, Bruce Evans wrote: > On Fri, 18 Jul 2014, hiren panchasara wrote: > >> On Wed, Jul 16, 2014 at 11:00 AM, Adrian Chadd wrote: >>> >>> Hi! >>> >>> So the UDP transmit path is udp_usrreqs->pru_send() == udp_send() -> >>> udp_output() -> ip_output() >>> >>> udp_output() does do a M_PREPEND() which can return ENOBUFS. ip_output >>> can also return ENOBUFS. >>> >>> it doesn't look like the socket code (eg sosend_dgram()) is doing any >>> buffering - it's just copying the frame and stuffing it up to the >>> driver. No queuing involved before the NIC. >> >> >> Right. Thanks for confirming. > > > Most buffering should be in ifq above the NIC. For UDP, I think > udp_output() puts buffers on the ifq and calls the driver for every > one, but the driver shouldn't do anything for most calls. The > driver can't possibly do anything if its ring buffer is full, and > shouldn't do anything if it is nearly full. Buffers accumulate in > the ifq until the driver gets around to them or the queue fills up. > Most ENOBUFS errors are for when it fills up. It can very easily > fill up, especially since it is too small in most configurations. > Just loop calling sendto(). This will fill the ifq almost > instantly unless the hardware is faster than the software. For if_transmit() drivers, there's no ifp queue. The queuing is being done in the driver. For drivers with if_transmit(), they may end up doing direct DMA ring dispatch or they may have a buf_ring in front of it.There's no ifq anymore. It upsets the ALTQ people too. -a ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
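To illustrate the if_transmit() arrangement described above, here is a minimal sketch of such a method for an imaginary driver. The foo_* names (foo_softc, foo_br, foo_tx_mtx, foo_start_locked) are placeholders and not from any real driver; only the drbr_enqueue() call is the stock FreeBSD buf_ring API. With no ifq in front of the driver, a full ring turns directly into ENOBUFS for the caller, which is ultimately what sendto() reports.

	#include <sys/param.h>
	#include <sys/lock.h>
	#include <sys/mutex.h>
	#include <sys/mbuf.h>
	#include <sys/buf_ring.h>
	#include <net/if.h>
	#include <net/if_var.h>

	/* struct foo_softc and foo_start_locked() are hypothetical and elided. */
	static int
	foo_transmit(struct ifnet *ifp, struct mbuf *m)
	{
		struct foo_softc *sc = ifp->if_softc;
		int error;

		/*
		 * Stage the packet on the driver's buf_ring.  When the ring
		 * is full, the error propagates straight back up the stack.
		 */
		error = drbr_enqueue(ifp, sc->foo_br, m);
		if (error != 0)
			return (error);

		/* Start transmission if nobody else is already draining the ring. */
		if (mtx_trylock(&sc->foo_tx_mtx)) {
			foo_start_locked(sc);
			mtx_unlock(&sc->foo_tx_mtx);
		}
		return (0);
	}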
Re: UDP sendto() returning ENOBUFS - "No buffer space available"
On Fri, 18 Jul 2014, hiren panchasara wrote: On Wed, Jul 16, 2014 at 11:00 AM, Adrian Chadd wrote: Hi! So the UDP transmit path is udp_usrreqs->pru_send() == udp_send() -> udp_output() -> ip_output() udp_output() does do a M_PREPEND() which can return ENOBUFS. ip_output can also return ENOBUFS. it doesn't look like the socket code (eg sosend_dgram()) is doing any buffering - it's just copying the frame and stuffing it up to the driver. No queuing involved before the NIC. Right. Thanks for confirming. Most buffering should be in ifq above the NIC. For UDP, I think udp_output() puts buffers on the ifq and calls the driver for every one, but the driver shouldn't do anything for most calls. The driver can't possibly do anything if its ring buffer is full, and shouldn't do anything if it is nearly full. Buffers accumulate in the ifq until the driver gets around to them or the queue fills up. Most ENOBUFS errors are for when it fills up. It can very easily fill up, especially since it is too small in most configurations. Just loop calling sendto(). This will fill the ifq almost instantly unless the hardware is faster than the software. So a _well behaved_ driver will return ENOBUFS _and_ not queue the frame. However, it's entirely plausible that the driver isn't well behaved - the intel drivers screwed up here and there with transmit queue and failure to queue vs failure to transmit. No, the driver doesn't have much control over the ifq. So yeah, try tweaking the tx ring descriptor for the driver your'e using and see how big a bufring it's allocating. Yes, so I am dealing with Broadcom BCM5706/BCM5708 Gigabit Ethernet, i.e. bce(4). I bumped up tx_pages from 2 (default) to 8 where each page is 255 buffer descriptors. I am seeing quite nice improvement on stable/10 where I can send *more* stuff :-) 255 is not many. I am most familiar with bge where there is a single tx ring with 511 or 512 buffer descriptors (some bge's have more, but this is unportable and was not supported last time I looked. The extras might be only for input). One of my bge's can do 640 kpps with tiny packets (only 80 kpps with normal packets) and the other only 200 (?) kpps (both should be limited mainly by the PCI bus, but the slow one is limited by it being a dumbed down 5705"plus"). At 640 kpps, it takes 800 microseconds to transmit 512 packets. (There is 1 packet per buffer descriptor for small packets.) Considerable buffering in ifq is needed to prevent the transmitter running dry whenever the application stops generating packets for more than 800 microseconds for some reason, but the default buffering is stupidly small. The default is given by net.inet.ifqmaxlen and some corresponding macros, and is still just 50. 50 was enough for 1 Mpbs ethernet and perhaps even for 10 Mbps, but is now too small. Most drivers don't use it, but use their own too-small value. bge uses just its own ring buffer size of 511. I use 1 or 4 depending on hz: % diff -u2 if_bge.c~ if_bge.c % --- if_bge.c~ 2012-03-13 02:13:48.144002000 + % +++ if_bge.c 2012-03-13 02:13:50.123023000 + % @@ -3315,5 +3316,6 @@ % ifp->if_start = bge_start; % ifp->if_init = bge_init; % - ifp->if_snd.ifq_drv_maxlen = BGE_TX_RING_CNT - 1; % + ifp->if_snd.ifq_drv_maxlen = BGE_TX_RING_CNT + % + imax(4 * tick, 1) / 1; % IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen); % IFQ_SET_READY(&ifp->if_snd); 4 is what is needed for 4 tick's worth of buffering at hz = 100. 
4 is far too large where 50 is far too small, but something like it is needed when hz is large due to another problem: select() on the ENOBUFS condition is broken (unsupported), so when sendto() returns ENOBUFS there is no way for the application to tell how long it should wait before retrying. If it wants to burn CPU then it can spin calling sendto(). Otherwise, it should sleep, but with a sleep granularity of 1 tick this requires several ticks' worth of buffering to avoid the transmitter running dry. Large queue lengths give a large latency for packets at the end of the queue and give no chance of the working set fitting in an Ln cache for small n. The precise stupidly small value of (tx_ring_count - 1) for the ifq length seems to be for no good reason. Subtracting 1 is apparently to increase the chance that all packets in the ifq can be fitted into the tx ring. But this is silly since the ifq packet count is in different units from the buffer descriptor count. For normal-size packets, there are 2 descriptors per packet. So in the usual case where the ifq is full, only about half of it can be moved to the tx ring. And this is good since it gives a little more buffering. Otherwise, the effective buffering is just what is in the tx ring, since none is left in the ifq after transferring everything.
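Since there is no select()able event for the ENOBUFS condition, an application that wants to stay near line rate without burning CPU has little choice but to sleep briefly and retry. A minimal userland sketch of that back-off follows; the sleep interval is an arbitrary choice, for exactly the reason given above.

	#include <sys/types.h>
	#include <sys/socket.h>
	#include <errno.h>
	#include <unistd.h>

	/* Retry sendto() with a short sleep whenever the queue is full. */
	static ssize_t
	send_with_backoff(int s, const void *buf, size_t len,
	    const struct sockaddr *dst, socklen_t dstlen, useconds_t sleep_us)
	{
		ssize_t n;

		for (;;) {
			n = sendto(s, buf, len, 0, dst, dstlen);
			if (n >= 0 || errno != ENOBUFS)
				return (n);
			/* Queue full: back off instead of spinning. */
			usleep(sleep_us);
		}
	}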
Re: UDP sendto() returning ENOBUFS - "No buffer space available"
On Wed, Jul 16, 2014 at 11:00 AM, Adrian Chadd wrote: > Hi! > > So the UDP transmit path is udp_usrreqs->pru_send() == udp_send() -> > udp_output() -> ip_output() > > udp_output() does do a M_PREPEND() which can return ENOBUFS. ip_output > can also return ENOBUFS. > > it doesn't look like the socket code (eg sosend_dgram()) is doing any > buffering - it's just copying the frame and stuffing it up to the > driver. No queuing involved before the NIC. Right. Thanks for confirming. > > So a _well behaved_ driver will return ENOBUFS _and_ not queue the > frame. However, it's entirely plausible that the driver isn't well > behaved - the intel drivers screwed up here and there with transmit > queue and failure to queue vs failure to transmit. > > So yeah, try tweaking the tx ring descriptor for the driver your'e > using and see how big a bufring it's allocating. Yes, so I am dealing with Broadcom BCM5706/BCM5708 Gigabit Ethernet, i.e. bce(4). I bumped up tx_pages from 2 (default) to 8 where each page is 255 buffer descriptors. I am seeing quite nice improvement on stable/10 where I can send *more* stuff :-) cheers, Hiren ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: UDP sendto() returning ENOBUFS - "No buffer space available"
Hi! So the UDP transmit path is udp_usrreqs->pru_send() == udp_send() -> udp_output() -> ip_output() udp_output() does do a M_PREPEND() which can return ENOBUFS. ip_output can also return ENOBUFS. it doesn't look like the socket code (eg sosend_dgram()) is doing any buffering - it's just copying the frame and stuffing it up to the driver. No queuing involved before the NIC. So a _well behaved_ driver will return ENOBUFS _and_ not queue the frame. However, it's entirely plausible that the driver isn't well behaved - the intel drivers screwed up here and there with transmit queue and failure to queue vs failure to transmit. So yeah, try tweaking the tx ring descriptor for the driver your'e using and see how big a bufring it's allocating. -a On 16 July 2014 01:58, hiren panchasara wrote: > Return values in sendto() manpage says: > > [ENOBUFS] The system was unable to allocate an internal buffer. > The operation may succeed when buffers become avail- > able. > > [ENOBUFS] The output queue for a network interface was full. > This generally indicates that the interface has > stopped sending, but may be caused by transient con- > gestion. > > If I hit the first condition, it should reflect as failures in > "netstat -m". Is that a correct assumption? > > I want to understand what happens when/if we hit the second condition. > And how to prevent that from happening. > Is it just application's job to rate-limit data it sends to the n/w > interface card so that it doesn't saturate? > Does kernel do any sort of queuing in the case of ENOBUFS? OR does the > message just gets dropped? > > For an application sending a lot of UDP data and returning ENOBUFS, > what all udp and other tunables I should tweak? I can only think of: > - number of tx ring descriptors - increasing this will get us more txds. > - kern.ipc.maxsockbuf: Increasing this will increase buffer size > allocated for sockets. > > what else? > > Any comments/suggestions/corrections? > > cheers, > Hiren > ___ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org" ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
UDP sendto() returning ENOBUFS - "No buffer space available"
Return values in sendto() manpage says: [ENOBUFS] The system was unable to allocate an internal buffer. The operation may succeed when buffers become avail- able. [ENOBUFS] The output queue for a network interface was full. This generally indicates that the interface has stopped sending, but may be caused by transient con- gestion. If I hit the first condition, it should reflect as failures in "netstat -m". Is that a correct assumption? I want to understand what happens when/if we hit the second condition. And how to prevent that from happening. Is it just application's job to rate-limit data it sends to the n/w interface card so that it doesn't saturate? Does kernel do any sort of queuing in the case of ENOBUFS? OR does the message just gets dropped? For an application sending a lot of UDP data and returning ENOBUFS, what all udp and other tunables I should tweak? I can only think of: - number of tx ring descriptors - increasing this will get us more txds. - kern.ipc.maxsockbuf: Increasing this will increase buffer size allocated for sockets. what else? Any comments/suggestions/corrections? cheers, Hiren ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
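On the kern.ipc.maxsockbuf side, that sysctl only caps what a process may ask for with SO_SNDBUF, and requests above it are rejected; as the reply above notes, the UDP send path does not actually queue data in the socket buffer, so for UDP a bigger send buffer mainly raises the largest datagram the socket will accept. A small sketch of requesting one (the 1 MB figure is arbitrary):

	#include <sys/types.h>
	#include <sys/socket.h>
	#include <stdio.h>

	static int
	grow_sndbuf(int s)
	{
		int want = 1024 * 1024;		/* arbitrary example value */
		socklen_t len = sizeof(want);

		/* Fails with ENOBUFS if 'want' exceeds kern.ipc.maxsockbuf. */
		if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &want, len) == -1) {
			perror("setsockopt(SO_SNDBUF)");
			return (-1);
		}
		if (getsockopt(s, SOL_SOCKET, SO_SNDBUF, &want, &len) == 0)
			printf("SO_SNDBUF is now %d bytes\n", want);
		return (0);
	}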
ENOBUFS and DNS...
< said: > If I were to tweak the sysctl net.inet.ip.intr_queue_maxlen from its > default of 50 up, would that possibly help named? No, it will not have any effect on your problem. The IP input queue is only on receive, and your problem is on transmit. The only thing that could possibly help your problem is increasing your output queue length, and it is already quite substantial; doing this will probably hurt as much as it helps, since the output queue is serviced in strict FIFO order and there is no way to ``call back'' a packet once it makes it there. Something like ALTQ might help if you are able to use a WFQ discipline and assign a high weight to DNS traffic. -GAWollman ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "[EMAIL PROTECTED]"
ENOBUFS and DNS...
[Drop hostname part of IPv6-only address above to obtain IPv4-capable e-mail, or just drop me from the recipients and I'll catch up from the archives] Hello, "%s"! I've read in this list from a couple years ago, several discussions about ENOBUFS being returned to UDP-using applications. This is what I'm experiencing with BIND when I get hit with lots of queries over a slow link. I'm serving DNS info for my subdomain, with an off-site secondary. I'm on a dial-in now (no comments please); I don't ever remember seeing this with a cable modem connection (about 2-4x upstream speed than now, with downstream speed higher still). When I send a mail to the FreeBSD lists, shortly after, I get hit with lots of DNS queries to verify my address(es). My modem is saturated both down- and upstream for some minutes. For a minute or two, `named' spits out syslog messages about insufficient resources, as the replies it tries to make return ENOBUFS. If I were to tweak the sysctl net.inet.ip.intr_queue_maxlen from its default of 50 up, would that possibly help named? Or might that cause problems elsewhere? Or should I ignore this, or would the best possible solution be for me simply not to send any more mail to the lists? I can think of a few possibilties for this being made worse over my thin pipe. Comments about my thoughts below are welcome, to help me improve my understanding of things. I'm usually filling the downstream pipe even without the queries coming in (pay-per-minute so I'm trying to maximize use of pipe). This alone may worsen things, as incoming queries see a high latency, causing them to be repeated before a response is received, possibly causing other nameservers to initiate queries to me, resulting in many more queries coming in than if I returned answers promptly. The size of the outgoing responses is larger than the queries, so it takes more time to push out responses than it does for them to come in. These factors combined with the timeouts/retries that resolvers and nameservers have, mean that no matter what I do, things won't get a lot better for me. (As a note, when I sent mails over the cable modem, a different mailing list software was used by FreeBSD. Still, I'd see heaps of queries shortly after, just as now. This in the event the current software makes the deliveries faster at the same time, causing more simultaneous queries to me. Also, perhaps more sites are doing not only sender validation but also validation of the from address due to spam growth the last year.) I suspect that not all sites are able to successfuly query me, as after the initial couple minutes of ENOBUFS problems and as the incoming queries taper off, some time later I'll see a repeat of the ENOBUFS problem, as I'm assuming another round of attempts is made to dispose of the queue built up at freebsd.org. If I'm still online when that happens, to be queried, of course. I haven't looked to see whether BIND does anything special when an ENOBUFS pops up in order not to drop the response. Perhaps if it were to do so, queueing responses, things would only get worse as the backlog continues to increase, so by the time responses get sent, the requester has already given up (after sending a few more queries to increase the backlog further). Thus in such a case the better thing is to drop random responses in order to get fewer of them out in a more timely fashion. 
Or perhaps I shouldn't worry, trusting that the sites which fail to receive a response from me directly after a few tries might poke the offsite secondary nameserver, and that the error-recovery is handled by DNS, so I shouldn't do anything to UDP to try to help. Anyway, just for fun, I'm going to double the above sysctl value for this message and see how things change. Later I'll think about suspending my downloads to speed up incoming queries. Also, I just remembered that userland ppp allows me to prioritize certain traffic so I should try that too, though normally the downloads I do only snarf a few hundred bytes/sec from the outgoing pipe, so that might help little As noted, comments about my ideas are welcome. Thanks, Barry Bouwsma ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: mpd: two links make one disconnect (ENOBUFS, LCP no reply)
Hi, On Wed, 10 Dec 2003, Giovanni P. Tirloni wrote: > common: > set bundle disable multilink > set bundle enable compression > set bundle yes encryption ^^^ please remove this line You don't need ECP for MPPE (Microsoft Point to Point Encryption) Maybe this option is confusing the windoze clients. > set ccp yes mppc > set ccp yes mpp-e40 > set ccp yes mpp-e56 > set ccp yes mpp-e128 > set ccp yes mpp-stateless > set ipcp enable vjcomp > set iface enable proxy-arp > set iface route 192.168.1.253/24 > set ipcp dns 1.2.3.4 > set link deny pap chap > set link enable chap-md5 chap-msv1 chap-msv2 BTW: you can just enable chap-msv1 and chap-msv2, because when using MPPE MS-CHAP is mandatory. Could you please post (in private) more of your logfile and your mpd.links? bye, -- --- -- Michael Bretterklieber - http://www.bretterklieber.com A-Quadrat Automation GmbH - http://www.a-quadrat.at Tel: ++43-(0)3172-41679 - GSM: ++43-(0)699 12861847 --- -- "...the number of UNIX installations has grown to 10, with more expected..." - Dennis Ritchie and Ken Thompson, June 1972 ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "[EMAIL PROTECTED]"
mpd: two links make one disconnect (ENOBUFS, LCP no reply)
Hi, The behaviour I'm having with mpd-3.15 is that it establishes the first connection in ng0 and when I try to open another connection it works but drops the first one after sometime because it stops answering the LCP echos. When both are established I can ping the last one but the ping to the first IP returns ENOBUFS (probably because the link is being dropped). Anything related to the PPTP output window? Here is the log entries after both links are established (they show as connected in the win2k and winxp boxes and pptp0 was answering the LCP echos): Dec 10 11:02:22 servidor mpd: [pptp1] exec: command returned 256 Dec 10 11:02:22 servidor mpd: [pptp1] IFACE: Up event Dec 10 11:02:24 servidor mpd: [pptp1] ECP: SendConfigReq #4 Dec 10 11:02:24 servidor mpd: [pptp1] LCP: rec'd Protocol Reject #9 link 0 (Opened) Dec 10 11:02:24 servidor mpd: [pptp1] LCP: protocol ECP was rejected Dec 10 11:02:24 servidor mpd: [pptp1] ECP: protocol was rejected by peer Dec 10 11:02:24 servidor mpd: [pptp1] ECP: state change Req-Sent --> Stopped Dec 10 11:02:24 servidor mpd: [pptp1] ECP: LayerFinish Dec 10 11:03:20 servidor mpd: [pptp0] LCP: no reply to 1 echo request(s) Dec 10 11:03:25 servidor mpd: [pptp0] LCP: no reply to 2 echo request(s) Dec 10 11:03:30 servidor mpd: [pptp0] LCP: no reply to 3 echo request(s) Dec 10 11:03:35 servidor mpd: [pptp0] LCP: no reply to 4 echo request(s) Dec 10 11:03:40 servidor mpd: [pptp0] LCP: no reply to 5 echo request(s) Dec 10 11:03:45 servidor mpd: [pptp0] LCP: no reply to 6 echo request(s) Dec 10 11:03:50 servidor mpd: [pptp0] LCP: no reply to 7 echo request(s) Dec 10 11:03:50 servidor mpd: [pptp0] LCP: peer not responding to echo requests Dec 10 11:03:50 servidor mpd: [pptp0] LCP: LayerFinish Dec 10 11:03:50 servidor mpd: [pptp0] LCP: LayerStart Dec 10 11:03:50 servidor mpd: [pptp0] LCP: state change Opened --> Starting Dec 10 11:03:50 servidor mpd: [pptp0] LCP: phase shift NETWORK --> DEAD Dec 10 11:03:50 servidor mpd: [pptp0] setting interface ng0 MTU to 1500 bytes Dec 10 11:03:50 servidor mpd: [pptp0] up: 0 links, total bandwidth 9600 bps Dec 10 11:03:50 servidor mpd: [pptp0] IPCP: Down event Dec 10 11:03:50 servidor mpd: [pptp0] IPCP: state change Opened --> Starting Dec 10 11:03:50 servidor mpd: [pptp0] IPCP: LayerDown Dec 10 11:03:50 servidor mpd: [pptp0] IFACE: Down event Dec 10 11:03:50 servidor mpd: [pptp0] exec: /sbin/route delete 192.168.1.253 -iface lo0 Dec 10 11:03:50 servidor mpd: [pptp0] exec: /usr/sbin/arp -d 192.168.1.220 Dec 10 11:03:50 servidor mpd: [pptp0] exec: /sbin/ifconfig ng0 down delete -link0 Dec 10 11:03:50 servidor mpd: [pptp0] CCP: Down event Dec 10 11:03:50 servidor mpd: [pptp0] CCP: state change Opened --> Starting Dec 10 11:03:50 servidor mpd: [pptp0] CCP: LayerDown Dec 10 11:03:50 servidor mpd: [pptp0] CCP: Close event Dec 10 11:03:50 servidor mpd: [pptp0] CCP: state change Starting --> Initial Dec 10 11:03:50 servidor mpd: [pptp0] CCP: LayerFinish Dec 10 11:03:50 servidor mpd: [pptp0] ECP: Down event Dec 10 11:03:50 servidor mpd: [pptp0] ECP: state change Stopped --> Starting Dec 10 11:03:50 servidor mpd: [pptp0] ECP: LayerStart Dec 10 11:03:50 servidor mpd: [pptp0] ECP: Close event # netstat -m mbuf usage: GEN cache: 0/0 (in use/in pool) CPU #0 cache: 2/256 (in use/in pool) Total: 2/256 (in use/in pool) Mbuf cache high watermark: 512 Maximum possible: 27136 Allocated mbuf types: 2 mbufs allocated to data 0% of mbuf map consumed mbuf cluster usage: GEN cache: 0/80 (in use/in pool) CPU #0 cache: 0/128 (in use/in pool) Total: 0/208 (in 
use/in pool)
Cluster cache high watermark: 128
Maximum possible: 13568
1% of cluster map consumed
480 KBytes of wired memory reserved (0% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

After much tweaking here is my mpd.conf:

mpd.conf ---
default:
 load pptp0
 load pptp1
common:
 set bundle disable multilink
 set bundle enable compression
 set bundle yes encryption
 set ccp yes mppc
 set ccp yes mpp-e40
 set ccp yes mpp-e56
 set ccp yes mpp-e128
 set ccp yes mpp-stateless
 set ipcp enable vjcomp
 set iface enable proxy-arp
 set iface route 192.168.1.253/24
 set ipcp dns 1.2.3.4
 set link deny pap chap
 set link enable chap-md5 chap-msv1 chap-msv2
 set ipcp nbns 192.168.1.254
pptp0:
 new -i ng0 pptp0 pptp0
 set ipcp ranges 192.168.1.253/32 192.168.1.220/24
 load common
pptp1:
 new -i ng1 pptp1 pptp1
 set ipcp ranges 192.168.1.253/32 192.168.1.221/24
 load common
mpd.conf -

Thanks in adv
RE: bug in bge driver with ENOBUFS on 4.7
> From: Don Bowman [mailto:don@;sandvine.com] > In bge_rxeof(), there can end up being a condition which causes > the driver to endlessly interrupt. > > if (bge_newbuf_std(sc, sc->bge_std, NULL) == ENOBUFS) { > ifp->if_ierrors++; > bge_newbuf_std(sc, sc->bge_std, m); > continue; > } > > happens. Now, bge_newbuf_std returns ENOBUFS. 'm' is also NULL. > This causes the received packet to not be dequeued, and the driver > will then go straight back into interrupt as the chip will > reassert the interrupt as soon as we return. More information... It would appear that we're looping here in the rx interrupt, the variable 'stdcnt' which counts the number of standard-sized packets pulled off per iteration is huge (indicating we've overrun the ring multiple times). while(sc->bge_rx_saved_considx != sc->bge_rdata->bge_status_block.bge_idx[0].bge_rx_prod_idx) { is the construct that controls when we exit the loop. Clearly in my case this is never becoming false. I see 'sc->bge_rx_saved_considx' as 201, and the RHS of the expression as 38442. This doesn't seem correct, I think that both numbers must be <= BGE_SSLOTS. (kgdb) p/x *cur_rx $10 = {bge_addr = {bge_addr_hi = 0x0, bge_addr_lo = 0xca2d802}, bge_len = 0x4a, bge_idx = 0xc8, bge_flags = 0x7004, bge_type = 0x0, bge_tcp_udp_csum = 0x9992, bge_ip_csum = 0x, bge_vlan_tag = 0x0, bge_error_flag = 0x0, bge_rsvd = 0x0, bge_opaque = 0x0} Any suggestions anyone? To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
bug in bge driver with ENOBUFS on 4.7
In bge_rxeof(), there can end up being a condition which causes the driver to endlessly interrupt.

	if (bge_newbuf_std(sc, sc->bge_std, NULL) == ENOBUFS) {
		ifp->if_ierrors++;
		bge_newbuf_std(sc, sc->bge_std, m);
		continue;
	}

happens. Now, bge_newbuf_std returns ENOBUFS. 'm' is also NULL. This causes the received packet to not be dequeued, and the driver will then go straight back into interrupt as the chip will reassert the interrupt as soon as we return. Suggestions on a fix? I'm not sure why I ran out of mbufs, I have

kern.ipc.nmbclusters: 9
kern.ipc.nmbufs: 28

(kgdb) p/x mbstat
$11 = {m_mbufs = 0x3a0, m_clusters = 0x39c, m_spare = 0x0, m_clfree = 0x212, m_drops = 0x0, m_wait = 0x0, m_drain = 0x0, m_mcfail = 0x0, m_mpfail = 0x0, m_msize = 0x100, m_mclbytes = 0x800, m_minclsize = 0xd5, m_mlen = 0xec, m_mhlen = 0xd4}

but bge_newbuf_std() does this:

	if (m == NULL) {
		MGETHDR(m_new, M_DONTWAIT, MT_DATA);
		if (m_new == NULL) {
			return(ENOBUFS);
		}

and then returns ENOBUFS. This is with 4.7-RELEASE. --don ([EMAIL PROTECTED] www.sandvine.com) To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
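One common recovery pattern in BSD rx handlers, sketched here for a made-up foo driver and not offered as the actual bge fix (foo_newbuf(), FOO_INC(), and the softc fields are all invented), is to recycle the old mbuf back into the descriptor, count the drop, and still advance the consumer index so the same slot cannot keep retriggering the interrupt:

	static void
	foo_rxeof(struct foo_softc *sc)
	{
		struct ifnet *ifp = sc->foo_ifp;
		uint16_t cons = sc->foo_rx_cons;
		struct mbuf *m;

		while (cons != sc->foo_rx_prod) {
			m = sc->foo_rx_chain[cons];

			if (foo_newbuf(sc, cons, NULL) == ENOBUFS) {
				/*
				 * No replacement mbuf: drop the packet, put
				 * the old mbuf back in the descriptor, and
				 * consume the slot anyway so the loop (and
				 * the chip) makes forward progress.
				 */
				ifp->if_ierrors++;
				foo_newbuf(sc, cons, m);
				FOO_INC(cons, FOO_RX_RING_CNT);
				continue;
			}
			FOO_INC(cons, FOO_RX_RING_CNT);
			(*ifp->if_input)(ifp, m);	/* hand the packet up */
		}
		sc->foo_rx_cons = cons;
	}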
Performance of em driver (Was: ENOBUFS)
On Fri, 18 Oct 2002, Kelly Yancey wrote: > Hmm. Might that explain the abysmal performance of the em driver with > packets smaller than 333 bytes? > > Kelly > This is just a follow-up to report that thanks to Luigi and Prafulla we were able to track down the cause of the problems I was seeing with the em driver/hardware. In our test environment we had left the IP packet queue (net.inet.ip.intr_queue_maxlen) at its default value of 50 which, when using the em card, was overflowing causing the dropped packets. While it is curious that it was not overflowing using the bge card, clearly 50 packets is a restrictive maximum queue size for any decent amount of traffic. Below are some of the results from our testing. First, a note about the methodology: traffic was generated using 7 10/100 ethernet ports of a SmartBits 600 (each port was set to generate 14.25Mbps of traffic for a aggregate of 99.75Mbps, slightly higher than the theoretical maximum wirespeed). The traffic was then VLAN tagged before being passed to a 1.8Ghz Pentium 4 running FreeBSD 4.5p19 where it was untagged and passed back to the SmartBits. The numbers quoted below are the actual amount of traffic that was delivered back to the SmartBits. The kernel involved included a number of modifications proprietary to NTTMCL so the numbers are going to differ from a stock kernel and I only present them for comparative purposes between the different network configurations. Also note that all interfaces were configured for 100base-TX full-duplex. Frame Size NICs queue ipfw 64 128 192 bge->fxp 50 0 79.708 97.325 98.124 Mbps bge->fxp 1000 0 80.172 97.325 98.124 Mbps em->fxp1000 0 77.590 97.325 98.124 Mbps bge->fxp 5032 39.097 97.325 98.124 Mbps bge->fxp 100032 62.011 97.325 98.124 Mbps em->fxp100032 63.651 97.325 98.124 Mbps The numbers in the ipfw column are the number of non-matching rules in the ruleset before an "allow all from any to any" rule. Kelly -- Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org} -- [EMAIL PROTECTED] "And say, finally, whether peace is best preserved by giving energy to the government or information to the people. This last is the most certain and the most legitimate engine of government." -- Thomas Jefferson to James Madison, 1787. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
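For anyone chasing the same problem, the overflow Kelly describes is easy to watch for from userland. A small sketch, assuming the net.inet.ip.intr_queue_drops counter exported by kernels of this era alongside intr_queue_maxlen:

	#include <sys/types.h>
	#include <sys/sysctl.h>
	#include <stdio.h>

	int
	main(void)
	{
		int maxlen = 0, drops = 0;
		size_t len;

		len = sizeof(maxlen);
		if (sysctlbyname("net.inet.ip.intr_queue_maxlen", &maxlen, &len,
		    NULL, 0) == -1)
			perror("intr_queue_maxlen");
		len = sizeof(drops);
		if (sysctlbyname("net.inet.ip.intr_queue_drops", &drops, &len,
		    NULL, 0) == -1)
			perror("intr_queue_drops");

		/* A steadily climbing drop counter means ipintrq is overflowing. */
		printf("ipintrq maxlen %d, drops %d\n", maxlen, drops);
		return (0);
	}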
Re: ENOBUFS
> In special cases, the error induced by having interrupts blocked > causes errors which are much larger than polling alone. Which conditions block interrupts for longer than, say, a millisecond? Disk errors / wakeups? Anything occurring in "normal" conditions? Pete To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
On Fri, 18 Oct 2002, Kelly Yancey wrote: > > You should definitely clarify how fast the smartbits unit is pushing > > out traffic, and whether its speed depends on the measured RTT. > > > > It doesn't sound like the box is that smart. As it was explained to me, the > test setup includes a desired 'load' to put on the wire: it is measured as a > percentage of the wire speed. Since our SmartBit unit only supports > 100base-T and doesn't understand vlans, we have to use 7 separate outbound > ports, each configured for 14.25% load. To the GigE interface, this should > appear as 99.75 megabits of data (including all headers/framing). > Oops. That was actually the explanation of the SmartBits 'desired ILoad' which I didn't quote in the posted numbers. The actual number of packets transmitted is based on RTT. Sorry for the confusion, Kelly -- Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org} "No nation is permitted to live in ignorance with impunity." -- Thomas Jefferson, 1821. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
On Fri, 18 Oct 2002, Luigi Rizzo wrote: > Oh, I *thought* the numbers you reported were pps but now i see that > nowhere you mentioned that. > Sorry. I just checked with our tester. Those are the total number of packets sent during the test. Each test lasted 10 seconds, so divde by 10 to get pps. > But if things are as you say, i am seriously puzzled on what you > are trying to measure -- the output interface (fxp) is a 100Mbit/s > card which cannot possibly support the load you are trying to offer > to saturate the input link. > We don't want to saturate the input link, only saturate the outbound link (100Mps). Oddly enough, the em card cannot do this with any packets less than 333 bytes and drops ~50% of the packets. But clearly this isn't a bottlenext issue because the drop-off isn't smooth. 332 byte backs cause ~50% packet loss; 333 byte packets cause 0% packet loss. > You should definitely clarify how fast the smartbits unit is pushing > out traffic, and whether its speed depends on the measured RTT. > It doesn't sound like the box is that smart. As it was explained to me, the test setup includes a desired 'load' to put on the wire: it is measured as a percentage of the wire speed. Since our SmartBit unit only supports 100base-T and doesn't understand vlans, we have to use 7 separate outbound ports, each configured for 14.25% load. To the GigE interface, this should appear as 99.75 megabits of data (including all headers/framing). > It might well be that what you are > seeing is saturation of ipintrq, which happens because of some > strange timing issue -- nothing to do with the board. > I don't understand why it would only happen with the em card and not with the bge under the exact same traffic (or even more demanding traffic, i.e. 64byte frames). Also, wouldn't packet gradually subside as we approached the 333 byte magic limit rather than the sudden drop-off we are seeing? > In any case, at least in my experience, a 1GHz box with two em > cards can easily forward between 350 and 400kpps (64-byte frames) with a > 4.6-ish kernel, and a 2.4GHz box goes above 650kpps. > We expect our kernel to be slower than that (we typically see ~120kpps for 64-byte frames using the bge driver and a 5701-based card) because we are using an fxp card for outbound traffic and have added additional code to the ip_input() processing. The point isn't absolute numbers, though, but trying to figure out why when using the em driver (and only with the em driver!) we see ~50% packet loss with packets smaller than 333 bytes (no matter what size, just that it is smaller). That is, 64 byte frames: ~50% packet loss; 332 byte frames: ~50% packet loss; 333 byte frames: 0% packet loss. That sort of sudden drop doesn't look like a bottleneck to me. We've mostly written the em driver off because of this. The bge driver works just fine performance wise; it was the sporadic watchdog timeouts that led us to investigate the Intel cards to begin with. I only mentioned it on-list because earlier Jim McGrath alluded to similar performance issues with the Intel GigE cards and small frames. Actually, at this point, I'm hoping that your polling patches for the em driver workaround whatever problem is causing the packet loss and am eagerly awaiting them to be committed. 
:) Thanks, Kelly > > On Fri, Oct 18, 2002 at 11:13:54AM -0700, Kelly Yancey wrote: > > On Fri, 18 Oct 2002, Luigi Rizzo wrote: > > > > > How is the measurement done, does the box under test act as a router > > > with the smartbit pushing traffic in and expecting it back ? > > > > > The box has 2 interfaces, a fxp and a em (or bge). The GigE interface is > > configured with 7 VLANs. THe SmartBit produces X byte UDP datagrams that go > > through a Foundry ServerIron switch for VLAN tagging and then to the GigE > > interface (where they are untagged). The box is acting as a router and all > > traffic is directed out the fxp interface where it returns to the SmartBit. > > > > > The numbers are strange, anyways. > > > > > > A frame of N bytes takes (N*8+160) nanoseconds on the wire, which > > > for 330-byte frames should amount to 100/(330*8+160) ~= 357kpps, > > > not the 249 or so you are seeing. Looks as if the times were 40% off. > > > > > > > Yeah, I've never made to much sense of the actual numbers myself. Our > > resident SmartBit expert runs the tests and provides me with the results. I > > use them more for getting an idea of the relative performance of one > > configuration over another and not as absolute numbers themselves. I'll check > > with our resident expert and see if he can explain how it calculates those > > numbers. The point being, though, that there is an undeniable drop-off with > > 332 byte or smaller packets. We have never seen any such drop-off using the > > bge driver. > > > > Thanks, > > > > Kelly > > > > > cheers > > > luigi > > > > > > On Fri, Oct 18, 2002 at 10:45:13AM -0700, Kelly Yanc
Re: ENOBUFS
On Fri, 18 Oct 2002, Prafulla Deuskar wrote: > FYI. 82543 doesn't support PCI-X protocol. > For PCI-X support use 82544, 82545 or 82546 based cards. > > -Prafulla > That is alright, we aren't expecting PCI-X speeds. It is just that our only PCI slot on the motherboard (1U rack-mount system) is a PCI-X slot. Shouldn't the 82543 still function normally but only as at PCI speeds? Thanks, Kelly > > Kelly Yancey [[EMAIL PROTECTED]] wrote: > > On Fri, 18 Oct 2002, Luigi Rizzo wrote: > > > > > On Fri, Oct 18, 2002 at 10:27:04AM -0700, Kelly Yancey wrote: > > > ... > > > > Hmm. Might that explain the abysmal performance of the em driver with > > > > packets smaller than 333 bytes? > > > > > > what do you mean ? it works great for me. even on -current i > > > can push out over 400kpps (64byte frames) on a 2.4GHz box. > > > > > > luigi > > > > > > > Using a SmartBit to push traffic across a 1.8Ghz P4; 82543 chipset card > > plugged into PCI-X bus: > > > > FrameSize TxFramesRxFramesLostFrames Lost (%) > > 330 249984 129518 120466 48.19 > > 331 249144 127726 121418 48.73 > > 332 248472 140817 107655 43.33 > > 333 247800 247800 0 0 > > > > It has no trouble handling frames 333 bytes or larger. But for any frame > > 332 bytes or smaller we consistently see ~50% packet loss. This same machine > > easily pushes ~100Mps with the very same frame sizes using a bge card rather > > than em. > > > > I've gotten the same results with both em driver version 1.3.14 and 1.3.15 > > on both FreeBSD 4.5 and 4.7 (all 4 combinations, that is). > > > > Kelly > > > > -- > > Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org} > > FreeBSD, The Power To Serve: http://www.freebsd.org/ > > > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > > with "unsubscribe freebsd-net" in the body of the message > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-net" in the body of the message > -- Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org} "No nation is permitted to live in ignorance with impunity." -- Thomas Jefferson, 1821. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
Oh, I *thought* the numbers you reported were pps but now i see that nowhere you mentioned that. But if things are as you say, i am seriously puzzled on what you are trying to measure -- the output interface (fxp) is a 100Mbit/s card which cannot possibly support the load you are trying to offer to saturate the input link. You should definitely clarify how fast the smartbits unit is pushing out traffic, and whether its speed depends on the measured RTT. It might well be that what you are seeing is saturation of ipintrq, which happens because of some strange timing issue -- nothing to do with the board. In any case, at least in my experience, a 1GHz box with two em cards can easily forward between 350 and 400kpps (64-byte frames) with a 4.6-ish kernel, and a 2.4GHz box goes above 650kpps. cheers luigi On Fri, Oct 18, 2002 at 11:13:54AM -0700, Kelly Yancey wrote: > On Fri, 18 Oct 2002, Luigi Rizzo wrote: > > > How is the measurement done, does the box under test act as a router > > with the smartbit pushing traffic in and expecting it back ? > > > The box has 2 interfaces, a fxp and a em (or bge). The GigE interface is > configured with 7 VLANs. THe SmartBit produces X byte UDP datagrams that go > through a Foundry ServerIron switch for VLAN tagging and then to the GigE > interface (where they are untagged). The box is acting as a router and all > traffic is directed out the fxp interface where it returns to the SmartBit. > > > The numbers are strange, anyways. > > > > A frame of N bytes takes (N*8+160) nanoseconds on the wire, which > > for 330-byte frames should amount to 100/(330*8+160) ~= 357kpps, > > not the 249 or so you are seeing. Looks as if the times were 40% off. > > > > Yeah, I've never made to much sense of the actual numbers myself. Our > resident SmartBit expert runs the tests and provides me with the results. I > use them more for getting an idea of the relative performance of one > configuration over another and not as absolute numbers themselves. I'll check > with our resident expert and see if he can explain how it calculates those > numbers. The point being, though, that there is an undeniable drop-off with > 332 byte or smaller packets. We have never seen any such drop-off using the > bge driver. > > Thanks, > > Kelly > > > cheers > > luigi > > > > On Fri, Oct 18, 2002 at 10:45:13AM -0700, Kelly Yancey wrote: > > ... > > > > can push out over 400kpps (64byte frames) on a 2.4GHz box. > > > > > > > > luigi > > > > > > > > > > Using a SmartBit to push traffic across a 1.8Ghz P4; 82543 chipset card > > > plugged into PCI-X bus: > > > > > > FrameSize TxFramesRxFramesLostFrames Lost (%) > > > 330 249984 129518 120466 48.19 > > > 331 249144 127726 121418 48.73 > > > 332 248472 140817 107655 43.33 > > > 333 247800 247800 0 0 > > > > > > It has no trouble handling frames 333 bytes or larger. But for any frame > > > 332 bytes or smaller we consistently see ~50% packet loss. This same machine > > > easily pushes ~100Mps with the very same frame sizes using a bge card rather > > > than em. > > > > > > I've gotten the same results with both em driver version 1.3.14 and 1.3.15 > > > on both FreeBSD 4.5 and 4.7 (all 4 combinations, that is). 
> > > > > > Kelly > > > > > > -- > > > Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org} > > > FreeBSD, The Power To Serve: http://www.freebsd.org/ > > > > > > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > > > with "unsubscribe freebsd-net" in the body of the message > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > > with "unsubscribe freebsd-net" in the body of the message > > > > -- > Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org} > Join distributed.net Team FreeBSD: http://www.posi.net/freebsd/Team-FreeBSD/ > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-net" in the body of the message To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
On Fri, 18 Oct 2002, Luigi Rizzo wrote: > How is the measurement done, does the box under test act as a router > with the smartbit pushing traffic in and expecting it back ? > The box has 2 interfaces, a fxp and a em (or bge). The GigE interface is configured with 7 VLANs. THe SmartBit produces X byte UDP datagrams that go through a Foundry ServerIron switch for VLAN tagging and then to the GigE interface (where they are untagged). The box is acting as a router and all traffic is directed out the fxp interface where it returns to the SmartBit. > The numbers are strange, anyways. > > A frame of N bytes takes (N*8+160) nanoseconds on the wire, which > for 330-byte frames should amount to 100/(330*8+160) ~= 357kpps, > not the 249 or so you are seeing. Looks as if the times were 40% off. > Yeah, I've never made to much sense of the actual numbers myself. Our resident SmartBit expert runs the tests and provides me with the results. I use them more for getting an idea of the relative performance of one configuration over another and not as absolute numbers themselves. I'll check with our resident expert and see if he can explain how it calculates those numbers. The point being, though, that there is an undeniable drop-off with 332 byte or smaller packets. We have never seen any such drop-off using the bge driver. Thanks, Kelly > cheers > luigi > > On Fri, Oct 18, 2002 at 10:45:13AM -0700, Kelly Yancey wrote: > ... > > > can push out over 400kpps (64byte frames) on a 2.4GHz box. > > > > > > luigi > > > > > > > Using a SmartBit to push traffic across a 1.8Ghz P4; 82543 chipset card > > plugged into PCI-X bus: > > > > FrameSize TxFramesRxFramesLostFrames Lost (%) > > 330 249984 129518 120466 48.19 > > 331 249144 127726 121418 48.73 > > 332 248472 140817 107655 43.33 > > 333 247800 247800 0 0 > > > > It has no trouble handling frames 333 bytes or larger. But for any frame > > 332 bytes or smaller we consistently see ~50% packet loss. This same machine > > easily pushes ~100Mps with the very same frame sizes using a bge card rather > > than em. > > > > I've gotten the same results with both em driver version 1.3.14 and 1.3.15 > > on both FreeBSD 4.5 and 4.7 (all 4 combinations, that is). > > > > Kelly > > > > -- > > Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org} > > FreeBSD, The Power To Serve: http://www.freebsd.org/ > > > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > > with "unsubscribe freebsd-net" in the body of the message > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-net" in the body of the message > -- Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org} Join distributed.net Team FreeBSD: http://www.posi.net/freebsd/Team-FreeBSD/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
FYI. 82543 doesn't support PCI-X protocol. For PCI-X support use 82544, 82545 or 82546 based cards. -Prafulla Kelly Yancey [[EMAIL PROTECTED]] wrote: > On Fri, 18 Oct 2002, Luigi Rizzo wrote: > > > On Fri, Oct 18, 2002 at 10:27:04AM -0700, Kelly Yancey wrote: > > ... > > > Hmm. Might that explain the abysmal performance of the em driver with > > > packets smaller than 333 bytes? > > > > what do you mean ? it works great for me. even on -current i > > can push out over 400kpps (64byte frames) on a 2.4GHz box. > > > > luigi > > > > Using a SmartBit to push traffic across a 1.8Ghz P4; 82543 chipset card > plugged into PCI-X bus: > > FrameSize TxFramesRxFramesLostFrames Lost (%) > 330 249984 129518 120466 48.19 > 331 249144 127726 121418 48.73 > 332 248472 140817 107655 43.33 > 333 247800 247800 0 0 > > It has no trouble handling frames 333 bytes or larger. But for any frame > 332 bytes or smaller we consistently see ~50% packet loss. This same machine > easily pushes ~100Mps with the very same frame sizes using a bge card rather > than em. > > I've gotten the same results with both em driver version 1.3.14 and 1.3.15 > on both FreeBSD 4.5 and 4.7 (all 4 combinations, that is). > > Kelly > > -- > Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org} > FreeBSD, The Power To Serve: http://www.freebsd.org/ > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-net" in the body of the message To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
How is the measurement done, does the box under test act as a router with the smartbit pushing traffic in and expecting it back ? The numbers are strange, anyways. A frame of N bytes takes (N*8+160) nanoseconds on the wire, which for 330-byte frames should amount to 10^9/(330*8+160) ~= 357 kpps, not the 249 or so you are seeing. Looks as if the times were 40% off. cheers luigi On Fri, Oct 18, 2002 at 10:45:13AM -0700, Kelly Yancey wrote: ... > > can push out over 400kpps (64byte frames) on a 2.4GHz box. > > > > luigi > > > > Using a SmartBit to push traffic across a 1.8Ghz P4; 82543 chipset card > plugged into PCI-X bus: > > FrameSize TxFramesRxFramesLostFrames Lost (%) > 330 249984 129518 120466 48.19 > 331 249144 127726 121418 48.73 > 332 248472 140817 107655 43.33 > 333 247800 247800 0 0 > > It has no trouble handling frames 333 bytes or larger. But for any frame > 332 bytes or smaller we consistently see ~50% packet loss. This same machine > easily pushes ~100Mps with the very same frame sizes using a bge card rather > than em. > > I've gotten the same results with both em driver version 1.3.14 and 1.3.15 > on both FreeBSD 4.5 and 4.7 (all 4 combinations, that is). > > Kelly > > -- > Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org} > FreeBSD, The Power To Serve: http://www.freebsd.org/ > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-net" in the body of the message To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
On Fri, 18 Oct 2002, Luigi Rizzo wrote: > On Fri, Oct 18, 2002 at 10:27:04AM -0700, Kelly Yancey wrote: > ... > > Hmm. Might that explain the abysmal performance of the em driver with > > packets smaller than 333 bytes? > > what do you mean ? it works great for me. even on -current i > can push out over 400kpps (64byte frames) on a 2.4GHz box. > > luigi > Using a SmartBit to push traffic across a 1.8GHz P4; 82543 chipset card plugged into PCI-X bus:

FrameSize  TxFrames  RxFrames  LostFrames  Lost (%)
330        249984    129518    120466      48.19
331        249144    127726    121418      48.73
332        248472    140817    107655      43.33
333        247800    247800    0           0

It has no trouble handling frames 333 bytes or larger. But for any frame 332 bytes or smaller we consistently see ~50% packet loss. This same machine easily pushes ~100Mbps with the very same frame sizes using a bge card rather than em. I've gotten the same results with both em driver version 1.3.14 and 1.3.15 on both FreeBSD 4.5 and 4.7 (all 4 combinations, that is). Kelly -- Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org} FreeBSD, The Power To Serve: http://www.freebsd.org/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
On Fri, Oct 18, 2002 at 06:21:37PM +0300, Petri Helenius wrote: ... > Luigi´s polling work would be useful here. That would lead to incorrect > timestamps > on the packets, though? polling introduce an extra uncertainty which might be as large as an entire clock tick, yes. But even with interrupts, you cannot trust the time when the interrupt driver is run -- there are cases where an ISR is delayed by 10ms or more. And even when it runs, it might take quite a bit of time (up to a few 100's of microseconds) to drain the receive queue from packets received earlier. in normal cases, timestamps are reasonably accurate in both cases. In special cases, the error induced by having interrupts blocked causes errors which are much larger than polling alone. cheers luigi To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
On Fri, Oct 18, 2002 at 10:27:04AM -0700, Kelly Yancey wrote: ... > Hmm. Might that explain the abysmal performance of the em driver with > packets smaller than 333 bytes? what do you mean ? it works great for me. even on -current i can push out over 400kpps (64byte frames) on a 2.4GHz box. luigi To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
On Fri, 18 Oct 2002, Petri Helenius wrote: > > > > just reading the source code, yes, it appears that the card has > > support for delayed rx/tx interrupts -- see RIDV and TIDV definitions > > and usage in sys/dev/em/* . I don't know in what units are the values > > (28 and 128, respectively), but it does appear that tx interrupts are > > delayed a bit more than rx interrupts. > > > The thing what is looking suspect is also the "small packet interrupt" feature > which does not seem to get modified in the em driver but is on the defines. > > If that would be on by default, we´d probably see interrupts "too often" > because it tries to optimize interrupts for good throughput on small number > of TCP streams. > Hmm. Might that explain the abysmal performance of the em driver with packets smaller than 333 bytes? Kelly -- Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org} FreeBSD, The Power To Serve: http://www.freebsd.org/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
Transmit/Receive Interrupt Delay values are in units of 1.024 microseconds, so the values of 28 and 128 mentioned earlier work out to roughly 29 and 131 microseconds respectively. The em driver currently uses these to enable interrupt coalescing on the cards. Thanks, Prafulla To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
In reply to "Jim McGrath" <[EMAIL PROTECTED]> : > > > Where could I get the errata sheet? > > Product Specification Updates i.e. errata, and the Product Specification > itself are available from Intel under a Non Disclosure Agreement. Unless > you work for a company that is doing business with Intel, they are probably > not obtainable. > > > > Could the numbers be packet thresholds? 28 and 128 packets respectively? > > > I can't answer that directly because of NDA. Let us apply some logic here. > If they were packet counts, under very low load conditions e.g. a single > telnet session, the telnet link would be unusable. This leads us to the > conclusion that they must be time values. Based on the source code for the sk driver (look for "interrupt moderation" in if_sk.c) I would suspect that those values represent time in microseconds. My guess (based on no privileged information whatsoever) is that if we've not interrupted in microseconds and we have something to send (or we've received something) go ahead and raise an interrupt. Just a guess. I'm perfectly willing to be wrong about this --eli > > Jim > > Anything else that can be done? Does PCI width/speed affect the amount of > > time spent in the kernel interrupt or are the PCI transfers asynchronous? > > > > Pete > > > > - Original Message - > > From: "Jim McGrath" <[EMAIL PROTECTED]> > > To: "Luigi Rizzo" <[EMAIL PROTECTED]>; "Petri Helenius" <[EMAIL PROTECTED]> > > Cc: "Lars Eggert" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> > > Sent: Friday, October 18, 2002 7:49 AM > > Subject: RE: ENOBUFS > > > > > > > Careful here. Read the errata sheet!! I do not believe the em > > driver uses > > > these parameters, and possibly for a good reason. > > > > > > Jim > > > > > > > -Original Message- > > > > From: [EMAIL PROTECTED] > > > > [mailto:owner-freebsd-net@;FreeBSD.ORG]On Behalf Of Luigi Rizzo > > > > Sent: Thursday, October 17, 2002 11:12 PM > > > > To: Petri Helenius > > > > Cc: Lars Eggert; [EMAIL PROTECTED] > > > > Subject: Re: ENOBUFS > > > > > > > > > > > > On Thu, Oct 17, 2002 at 11:55:24PM +0300, Petri Helenius wrote: > > > > ... > > > > > I seem to get about 5-6 packets on an interrupt. Is this tunable? At > > > > > > > > just reading the source code, yes, it appears that the card has > > > > support for delayed rx/tx interrupts -- see RIDV and TIDV definitions > > > > and usage in sys/dev/em/* . I don't know in what units are the values > > > > (28 and 128, respectively), but it does appear that tx interrupts are > > > > delayed a bit more than rx interrupts. > > > > > > > > They are not user-configurable at the moment though, you need > > to rebuild > > > > the kernel. > > > > > > > > cheers > > > > luigi > > > > > > > > > 50kpps the card generates 10k interrupts a second. Sending generates > > > > > way less. This is about 300Mbps so with the average packet size of > > > > > 750 there should be room for more packets on the interface queue > > > > > before needing to service an interrupt? > > > > > > > > > > What´s the way to access kernel adapter-structure? Is there > > an utility > > > > > that can view the values there? > > > > > > > > > > > Pete > > > > > > > > > > > > > > > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > > > > with "unsubscribe freebsd-net" in the body of the message > > > > > > > > > > > > > > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-net" in the body of the message msg07192/pgp0.pgp Description: PGP signature
RE: ENOBUFS
> The chips I have are 82546. Is your recommendation to steer away > from Intel > Gigabit Ethernet chips? What would be more optimal alternative? > The 82543/82544 chips worked well in vanilla configurations. I never played with an 82546. The em driver is supported by Intel, so any chip features it uses should be safe. When testing with a SmartBits, 64 byte packets and high line utilization, I ran into problems when RIDV was enabled. This may be fixed with the 82546, but I have no way of verifying this. > Maybe I´ll play with the value and see what happens. Any comments on > the question how to access the adapter structure from userland? > We added sysctls to the wx driver to allow tuning/testing of various parameters. The same could be done to the em driver. You will likely need to do more than just modify fields in the adapter structure. The control register you are targeting will need to be rewritten by the sysctl. Jim To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
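A rough sketch of the kind of sysctl Jim describes, meant for if_em.c. This is not taken from the driver: the tx_int_delay softc field, the TIDV register name, and E1000_WRITE_REG() are assumptions based on the em sources of that era, and the handler would still have to be attached with SYSCTL_ADD_PROC() (or a static SYSCTL_PROC node) from the attach routine.

/*
 * Fragment intended for if_em.c: let a sysctl change the transmit interrupt
 * delay and push it into the hardware, since updating the softc alone does
 * nothing (the control register has to be rewritten, as noted above).
 * tx_int_delay, TIDV and E1000_WRITE_REG() are assumed names.
 */
static int
em_sysctl_tx_int_delay(SYSCTL_HANDLER_ARGS)
{
        struct adapter *adapter = (struct adapter *)arg1;
        int error, val;

        val = adapter->tx_int_delay;
        error = sysctl_handle_int(oidp, &val, 0, req);
        if (error != 0 || req->newptr == NULL)
                return (error);
        if (val < 0 || val > 0xffff)    /* register assumed to be 16 bits wide */
                return (EINVAL);
        adapter->tx_int_delay = val;
        E1000_WRITE_REG(&adapter->hw, TIDV, adapter->tx_int_delay);
        return (0);
}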
Re: ENOBUFS
(I´ll throw in the address found in the README of the driver, maybe somebody there has access to appropriate documentation / is willing to work on documenting tunables and optimizing the performance) > I have to tread carefully here because I was under NDA at my previous > company. My work was with the wx driver, but hardware problems are hardware > problems. There are a lot of performance enhancing features in the 82544. The chips I have are 82546. Is your recommendation to steer away from Intel Gigabit Ethernet chips? What would be more optimal alternative? > You will notice that the em driver does not use them. This may be for a > reason :-( Our implementation ran with transmit interrupts disabled, so I > can't comment on TIDV and am not allowed to comment on RIDV. > Maybe I´ll play with the value and see what happens. Any comments on the question how to access the adapter structure from userland? > The Receive Descriptor Threshold interrupt showed promise under high load > (Rx interrupts disabled) but you would need to add a timeout function, 1 > msec. or faster, to process receive descriptors under low load. > Luigi´s polling work would be useful here. That would lead to incorrect timestamps on the packets, though? Pete > > > -Original Message- > > From: [EMAIL PROTECTED] > > [mailto:owner-freebsd-net@;FreeBSD.ORG]On Behalf Of Luigi Rizzo > > Sent: Friday, October 18, 2002 12:56 AM > > To: Jim McGrath > > Cc: Petri Helenius; Lars Eggert; [EMAIL PROTECTED] > > Subject: Re: ENOBUFS > > > > > > On Fri, Oct 18, 2002 at 12:49:04AM -0400, Jim McGrath wrote: > > > Careful here. Read the errata sheet!! I do not believe the em > > driver uses > > > these parameters, and possibly for a good reason. > > > > as if i had access to the data sheets :) > > > > cheers > > luigi > > > Jim > > > > > > > -Original Message- > > > > From: [EMAIL PROTECTED] > > > > [mailto:owner-freebsd-net@;FreeBSD.ORG]On Behalf Of Luigi Rizzo > > > > Sent: Thursday, October 17, 2002 11:12 PM > > > > To: Petri Helenius > > > > Cc: Lars Eggert; [EMAIL PROTECTED] > > > > Subject: Re: ENOBUFS > > > > > > > > > > > > On Thu, Oct 17, 2002 at 11:55:24PM +0300, Petri Helenius wrote: > > > > ... > > > > > I seem to get about 5-6 packets on an interrupt. Is this tunable? At > > > > > > > > just reading the source code, yes, it appears that the card has > > > > support for delayed rx/tx interrupts -- see RIDV and TIDV definitions > > > > and usage in sys/dev/em/* . I don't know in what units are the values > > > > (28 and 128, respectively), but it does appear that tx interrupts are > > > > delayed a bit more than rx interrupts. > > > > > > > > They are not user-configurable at the moment though, you need > > to rebuild > > > > the kernel. > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > > with "unsubscribe freebsd-net" in the body of the message > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
RE: ENOBUFS
> Where could I get the errata sheet? Product Specification Updates i.e. errata, and the Product Specification itself are available from Intel under a Non Disclosure Agreement. Unless you work for a company that is doing business with Intel, they are probably not obtainable. > > Could the numbers be packet thresholds? 28 and 128 packets respectively? > I can't answer that directly because of NDA. Let us apply some logic here. If they were packet counts, under very low load conditions e.g. a single telnet session, the telnet link would be unusable. This leads us to the conclusion that they must be time values. Jim > Anything else that can be done? Does PCI width/speed affect the amount of > time spent in the kernel interrupt or are the PCI transfers asynchronous? > > Pete > > - Original Message - > From: "Jim McGrath" <[EMAIL PROTECTED]> > To: "Luigi Rizzo" <[EMAIL PROTECTED]>; "Petri Helenius" <[EMAIL PROTECTED]> > Cc: "Lars Eggert" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> > Sent: Friday, October 18, 2002 7:49 AM > Subject: RE: ENOBUFS > > > > Careful here. Read the errata sheet!! I do not believe the em > driver uses > > these parameters, and possibly for a good reason. > > > > Jim > > > > > -Original Message- > > > From: [EMAIL PROTECTED] > > > [mailto:owner-freebsd-net@;FreeBSD.ORG]On Behalf Of Luigi Rizzo > > > Sent: Thursday, October 17, 2002 11:12 PM > > > To: Petri Helenius > > > Cc: Lars Eggert; [EMAIL PROTECTED] > > > Subject: Re: ENOBUFS > > > > > > > > > On Thu, Oct 17, 2002 at 11:55:24PM +0300, Petri Helenius wrote: > > > ... > > > > I seem to get about 5-6 packets on an interrupt. Is this tunable? At > > > > > > just reading the source code, yes, it appears that the card has > > > support for delayed rx/tx interrupts -- see RIDV and TIDV definitions > > > and usage in sys/dev/em/* . I don't know in what units are the values > > > (28 and 128, respectively), but it does appear that tx interrupts are > > > delayed a bit more than rx interrupts. > > > > > > They are not user-configurable at the moment though, you need > to rebuild > > > the kernel. > > > > > > cheers > > > luigi > > > > > > > 50kpps the card generates 10k interrupts a second. Sending generates > > > > way less. This is about 300Mbps so with the average packet size of > > > > 750 there should be room for more packets on the interface queue > > > > before needing to service an interrupt? > > > > > > > > What´s the way to access kernel adapter-structure? Is there > an utility > > > > that can view the values there? > > > > > > > > > Pete > > > > > > > > > > > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > > > with "unsubscribe freebsd-net" in the body of the message > > > > > > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
RE: ENOBUFS
I have to tread carefully here because I was under NDA at my previous company. My work was with the wx driver, but hardware problems are hardware problems. There are a lot of performance enhancing features in the 82544. You will notice that the em driver does not use them. This may be for a reason :-( Our implementation ran with transmit interrupts disabled, so I can't comment on TIDV and am not allowed to comment on RIDV. The Receive Descriptor Threshold interrupt showed promise under high load (Rx interrupts disabled) but you would need to add a timeout function, 1 msec. or faster, to process receive descriptors under low load. Jim > -Original Message- > From: [EMAIL PROTECTED] > [mailto:owner-freebsd-net@;FreeBSD.ORG]On Behalf Of Luigi Rizzo > Sent: Friday, October 18, 2002 12:56 AM > To: Jim McGrath > Cc: Petri Helenius; Lars Eggert; [EMAIL PROTECTED] > Subject: Re: ENOBUFS > > > On Fri, Oct 18, 2002 at 12:49:04AM -0400, Jim McGrath wrote: > > Careful here. Read the errata sheet!! I do not believe the em > driver uses > > these parameters, and possibly for a good reason. > > as if i had access to the data sheets :) > > cheers > luigi > > Jim > > > > > -Original Message- > > > From: [EMAIL PROTECTED] > > > [mailto:owner-freebsd-net@;FreeBSD.ORG]On Behalf Of Luigi Rizzo > > > Sent: Thursday, October 17, 2002 11:12 PM > > > To: Petri Helenius > > > Cc: Lars Eggert; [EMAIL PROTECTED] > > > Subject: Re: ENOBUFS > > > > > > > > > On Thu, Oct 17, 2002 at 11:55:24PM +0300, Petri Helenius wrote: > > > ... > > > > I seem to get about 5-6 packets on an interrupt. Is this tunable? At > > > > > > just reading the source code, yes, it appears that the card has > > > support for delayed rx/tx interrupts -- see RIDV and TIDV definitions > > > and usage in sys/dev/em/* . I don't know in what units are the values > > > (28 and 128, respectively), but it does appear that tx interrupts are > > > delayed a bit more than rx interrupts. > > > > > > They are not user-configurable at the moment though, you need > to rebuild > > > the kernel. > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-net" in the body of the message > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
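The 1 msec timeout Jim mentions could be an ordinary timeout(9) callout that re-arms itself, along these lines. Only a sketch: em_clean_rx_ring() is a hypothetical stand-in for whatever routine actually processes the receive descriptors, and the kernel needs HZ high enough (e.g. 1000) for the interval to really be 1 ms. As Petri points out above, packets picked up this way carry the timestamp of the poll rather than of their arrival.

/*
 * Fragment for if_em.c: poll the receive ring from a self-rearming callout
 * so packets are still picked up promptly while receive interrupts are
 * disabled.  em_clean_rx_ring() is a hypothetical stand-in.
 */
static struct callout_handle em_rx_poll_handle;

static void
em_rx_poll(void *arg)
{
        struct adapter *adapter = arg;
        int s;

        s = splimp();
        em_clean_rx_ring(adapter);              /* hypothetical */
        splx(s);
        em_rx_poll_handle = timeout(em_rx_poll, adapter,
            hz / 1000 > 0 ? hz / 1000 : 1);     /* ~1 ms when HZ >= 1000 */
}

It would be started once from the attach routine and cancelled with untimeout() on detach.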
Re: ENOBUFS
> > just reading the source code, yes, it appears that the card has > support for delayed rx/tx interrupts -- see RIDV and TIDV definitions > and usage in sys/dev/em/* . I don't know in what units are the values > (28 and 128, respectively), but it does appear that tx interrupts are > delayed a bit more than rx interrupts. > The thing what is looking suspect is also the "small packet interrupt" feature which does not seem to get modified in the em driver but is on the defines. If that would be on by default, we´d probably see interrupts "too often" because it tries to optimize interrupts for good throughput on small number of TCP streams. Should these questions be posted to the authors of the driver? Pete To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
Where could I get the errata sheet? Could the numbers be packet thresholds? 28 and 128 packets respectively? Anything else that can be done? Does PCI width/speed affect the amount of time spent in the kernel interrupt or are the PCI transfers asynchronous? Pete - Original Message - From: "Jim McGrath" <[EMAIL PROTECTED]> To: "Luigi Rizzo" <[EMAIL PROTECTED]>; "Petri Helenius" <[EMAIL PROTECTED]> Cc: "Lars Eggert" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Friday, October 18, 2002 7:49 AM Subject: RE: ENOBUFS > Careful here. Read the errata sheet!! I do not believe the em driver uses > these parameters, and possibly for a good reason. > > Jim > > > -Original Message- > > From: [EMAIL PROTECTED] > > [mailto:owner-freebsd-net@;FreeBSD.ORG]On Behalf Of Luigi Rizzo > > Sent: Thursday, October 17, 2002 11:12 PM > > To: Petri Helenius > > Cc: Lars Eggert; [EMAIL PROTECTED] > > Subject: Re: ENOBUFS > > > > > > On Thu, Oct 17, 2002 at 11:55:24PM +0300, Petri Helenius wrote: > > ... > > > I seem to get about 5-6 packets on an interrupt. Is this tunable? At > > > > just reading the source code, yes, it appears that the card has > > support for delayed rx/tx interrupts -- see RIDV and TIDV definitions > > and usage in sys/dev/em/* . I don't know in what units are the values > > (28 and 128, respectively), but it does appear that tx interrupts are > > delayed a bit more than rx interrupts. > > > > They are not user-configurable at the moment though, you need to rebuild > > the kernel. > > > > cheers > > luigi > > > > > 50kpps the card generates 10k interrupts a second. Sending generates > > > way less. This is about 300Mbps so with the average packet size of > > > 750 there should be room for more packets on the interface queue > > > before needing to service an interrupt? > > > > > > What´s the way to access kernel adapter-structure? Is there an utility > > > that can view the values there? > > > > > > > Pete > > > > > > > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > > with "unsubscribe freebsd-net" in the body of the message > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
On Fri, Oct 18, 2002 at 12:49:04AM -0400, Jim McGrath wrote: > Careful here. Read the errata sheet!! I do not believe the em driver uses > these parameters, and possibly for a good reason. as if i had access to the data sheets :) cheers luigi > Jim > > > -Original Message- > > From: [EMAIL PROTECTED] > > [mailto:owner-freebsd-net@;FreeBSD.ORG]On Behalf Of Luigi Rizzo > > Sent: Thursday, October 17, 2002 11:12 PM > > To: Petri Helenius > > Cc: Lars Eggert; [EMAIL PROTECTED] > > Subject: Re: ENOBUFS > > > > > > On Thu, Oct 17, 2002 at 11:55:24PM +0300, Petri Helenius wrote: > > ... > > > I seem to get about 5-6 packets on an interrupt. Is this tunable? At > > > > just reading the source code, yes, it appears that the card has > > support for delayed rx/tx interrupts -- see RIDV and TIDV definitions > > and usage in sys/dev/em/* . I don't know in what units are the values > > (28 and 128, respectively), but it does appear that tx interrupts are > > delayed a bit more than rx interrupts. > > > > They are not user-configurable at the moment though, you need to rebuild > > the kernel. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
RE: ENOBUFS
Careful here. Read the errata sheet!! I do not believe the em driver uses these parameters, and possibly for a good reason. Jim > -Original Message- > From: [EMAIL PROTECTED] > [mailto:owner-freebsd-net@;FreeBSD.ORG]On Behalf Of Luigi Rizzo > Sent: Thursday, October 17, 2002 11:12 PM > To: Petri Helenius > Cc: Lars Eggert; [EMAIL PROTECTED] > Subject: Re: ENOBUFS > > > On Thu, Oct 17, 2002 at 11:55:24PM +0300, Petri Helenius wrote: > ... > > I seem to get about 5-6 packets on an interrupt. Is this tunable? At > > just reading the source code, yes, it appears that the card has > support for delayed rx/tx interrupts -- see RIDV and TIDV definitions > and usage in sys/dev/em/* . I don't know in what units are the values > (28 and 128, respectively), but it does appear that tx interrupts are > delayed a bit more than rx interrupts. > > They are not user-configurable at the moment though, you need to rebuild > the kernel. > > cheers > luigi > > > 50kpps the card generates 10k interrupts a second. Sending generates > > way less. This is about 300Mbps so with the average packet size of > > 750 there should be room for more packets on the interface queue > > before needing to service an interrupt? > > > > What´s the way to access kernel adapter-structure? Is there an utility > > that can view the values there? > > > > > Pete > > > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-net" in the body of the message > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
On Thu, Oct 17, 2002 at 11:55:24PM +0300, Petri Helenius wrote: ... > I seem to get about 5-6 packets on an interrupt. Is this tunable? At just reading the source code, yes, it appears that the card has support for delayed rx/tx interrupts -- see RIDV and TIDV definitions and usage in sys/dev/em/* . I don't know in what units are the values (28 and 128, respectively), but it does appear that tx interrupts are delayed a bit more than rx interrupts. They are not user-configurable at the moment though, you need to rebuild the kernel. cheers luigi > 50kpps the card generates 10k interrupts a second. Sending generates > way less. This is about 300Mbps so with the average packet size of > 750 there should be room for more packets on the interface queue > before needing to service an interrupt? > > What´s the way to access kernel adapter-structure? Is there an utility > that can view the values there? > > > Pete > > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
> > Less :-) Let me tell you tomorrow, don't have the numbers here right now. I seem to get about 5-6 packets on an interrupt. Is this tunable? At 50kpps the card generates 10k interrupts a second. Sending generates way less. This is about 300Mbps so with the average packet size of 750 there should be room for more packets on the interface queue before needing to service an interrupt? What's the way to access kernel adapter-structure? Is there a utility that can view the values there? > Pete To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
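On the recurring question of how to peek at the kernel adapter structure from userland: short of adding driver sysctls (discussed above in this thread), the conventional 4.x answer is libkvm. A minimal sketch that reads the stock global ifqmaxlen as an example; getting at a particular driver's softc is the same mechanism with more pointer-chasing through the ifnet list. Build with -lkvm and run with access to /dev/kmem (root or group kmem).

#include <sys/types.h>
#include <fcntl.h>
#include <kvm.h>
#include <limits.h>
#include <nlist.h>
#include <stdio.h>

int
main(void)
{
        kvm_t *kd;
        char errbuf[_POSIX2_LINE_MAX];
        struct nlist nl[] = { { "_ifqmaxlen" }, { NULL } };
        int val;

        kd = kvm_openfiles(NULL, NULL, NULL, O_RDONLY, errbuf);
        if (kd == NULL) {
                fprintf(stderr, "kvm_openfiles: %s\n", errbuf);
                return (1);
        }
        if (kvm_nlist(kd, nl) != 0 || nl[0].n_value == 0) {
                fprintf(stderr, "cannot find _ifqmaxlen\n");
                return (1);
        }
        if (kvm_read(kd, nl[0].n_value, &val, sizeof(val)) != sizeof(val)) {
                fprintf(stderr, "kvm_read: %s\n", kvm_geterr(kd));
                return (1);
        }
        printf("ifqmaxlen = %d\n", val);
        kvm_close(kd);
        return (0);
}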
Re: ENOBUFS
> Sam Leffler wrote: > > Try my port of the netbsd kttcp kernel module. You can find it at > > > > http://www.freebsd.org/~sam > > this seems to use some things from netbsd like > so_rcv.sb_lastrecord and SBLASTRECORDCHK/SBLASTMBUFCHK. > Is there something else I need to apply to build it on > freebsd -STABLE? > Sorry, I ported Jason's tail pointer stuff to -stable before kttcp so it assumes that's installed. If you don't want to redo kttcp you might try applying thorpe-stable.patch from the same directory. FWIW I've been running with that patch in my production systems for many months w/o incident. I never committed it because I didn't see noticeable performance improvements. Sam To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
RE: ENOBUFS
Sam Leffler wrote: > Try my port of the netbsd kttcp kernel module. You can find it at > > http://www.freebsd.org/~sam this seems to use some things from netbsd like so_rcv.sb_lastrecord and SBLASTRECORDCHK/SBLASTMBUFCHK. Is there something else I need to apply to build it on freebsd -STABLE? --don ([EMAIL PROTECTED] www.sandvine.com) To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
> > The 900Mbps are similar to what I see here on similar hardware. > > What kind of receive performance do you observe? I haven´t got that > far yet. > > > > For your two-interface setup, are the 600Mbps aggregate send rate on > > both interfaces, or do you see 600Mbps per interface? In the latter > > 600Mbps per interface. I´m going to try this out also on -CURRENT > to see if it changes anything. Interrupts do not seem to pose a big > problem because I´m seeing only a few thousand em interrupts > a second but since every packet involves a write call there are >100k > syscalls a second. > > > case, is your CPU maxed out? Only one can be in the kernel under > > -stable, so the second one won't help much. With small packets like > > that, you may be interrupt-bound. (Until Luigi releases polling for em > > interfaces... :-) > > > I´ll try changing the packet sizes to figure out optimum. > Try my port of the netbsd kttcp kernel module. You can find it at http://www.freebsd.org/~sam It will eliminate the system calls. Don't recall if you said your system is a dual-processor; I never tried it on SMP hardware. Sam To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
On Wed, Oct 16, 2002 at 08:57:19AM +0300, Petri Helenius wrote: > > > > how large are the packets and how fast is the box ? > > Packets go out at an average size of 1024 bytes. The box is dual > P4 Xeon 2400/400 so I think it should qualify as "fast" ? I disabled yes, it qualifies as fast. With this kind of box, a trivial program can send short (18 byte payload, 64 byte total) UDP frames at 5-600kpps, with quite a bit of time i suspect is being spent in the userland-kernel transition (with some tricks to skip that i went up to ~680kpps). > The information I´m looking for is how to instrument where the hard to tell -- see if short packets you get the same performance i mention above, then maybe try some tricks such as sending short bursts (5-10 pkts at a time) on each of the interfaces. Maybe using a UP kernel as opposed to an SMP one might give you slightly better performance, i am not sure though. There might be some minor optimizations here and there which could possibly help (e.g. make th em driver use m_getcl(), remove IPSEC from the kernel if you have it) but you are essentially close to the speed you can get with that box (within a factor of 2, probably). cheers luigi > > on a fast box you should be able to generate packets faster than wire > > speed for sizes around 500bytes, meaning that you are going to saturate > > the queue no matter how large it is. > > > > cheers > > luigi > > > > > em-interface is running 66/64 and is there a way to see interface queue > depth? > > > em0: port > 0x3040-0x307f > > > mem 0xfc22-0xfc23 irq 17 at device 3.0 on pci2 > > > em0: Speed:1000 Mbps Duplex:Full > > > pcib2: at device 29.0 on pci1 > > > IOAPIC #2 intpin 0 -> irq 16 > > > IOAPIC #2 intpin 6 -> irq 17 > > > IOAPIC #2 intpin 7 -> irq 18 > > > pci2: on pcib2 > > > > > > The OS is 4.7-RELEASE. > > > > > > Pete > > > > > > > > > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > > > with "unsubscribe freebsd-net" in the body of the message > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
Petri Helenius wrote: >>The 900Mbps are similar to what I see here on similar hardware. > > What kind of receive performance do you observe? I haven´t got that > far yet. Less :-) Let me tell you tomorrow, don't have the numbers here right now. > 600Mbps per interface. I´m going to try this out also on -CURRENT > to see if it changes anything. Interrupts do not seem to pose a big > problem because I´m seeing only a few thousand em interrupts > a second but since every packet involves a write call there are >100k > syscalls a second. So maybe syscalls/second are the bottleneck. On -current, try enabling zero copy sockets, it seems to help somewhat. Other than that, I've not found -current to be much different in terms of performance. If you're just interested in maxing throughput, try sending over TCP with large write sizes. In that case, syscall overhead is less, since you amortize it over multiple packets. (But there are different issues that can limit TCP throughput.) > I´ll try changing the packet sizes to figure out optimum. I think I remember that 4K packets were fastest with the em hardware in our case. Lars -- Lars Eggert <[EMAIL PROTECTED]> USC Information Sciences Institute smime.p7s Description: S/MIME Cryptographic Signature
Re: ENOBUFS
> The 900Mbps are similar to what I see here on similar hardware. What kind of receive performance do you observe? I haven´t got that far yet. > > For your two-interface setup, are the 600Mbps aggregate send rate on > both interfaces, or do you see 600Mbps per interface? In the latter 600Mbps per interface. I´m going to try this out also on -CURRENT to see if it changes anything. Interrupts do not seem to pose a big problem because I´m seeing only a few thousand em interrupts a second but since every packet involves a write call there are >100k syscalls a second. > case, is your CPU maxed out? Only one can be in the kernel under > -stable, so the second one won't help much. With small packets like > that, you may be interrupt-bound. (Until Luigi releases polling for em > interfaces... :-) > I´ll try changing the packet sizes to figure out optimum. Pete To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
Petri Helenius wrote: >>how large are the packets and how fast is the box ? > > > Packets go out at an average size of 1024 bytes. The box is dual > P4 Xeon 2400/400 so I think it should qualify as "fast" ? I disabled > hyperthreading to figure out if it was causing problems. I seem to > be able to send packets at a rate in the 900Mbps when just sending > them out with a process. If I do similar sending on two interfaces at > same time, it tops out at 600Mbps. The 900Mbps are similar to what I see here on similar hardware. For your two-interface setup, are the 600Mbps aggregate send rate on both interfaces, or do you see 600Mbps per interface? In the latter case, is your CPU maxed out? Only one can be in the kernel under -stable, so the second one won't help much. With small packets like that, you may be interrupt-bound. (Until Luigi releases polling for em interfaces... :-) Lars -- Lars Eggert <[EMAIL PROTECTED]> USC Information Sciences Institute smime.p7s Description: S/MIME Cryptographic Signature
Re: ENOBUFS
> > how large are the packets and how fast is the box ? Packets go out at an average size of 1024 bytes. The box is dual P4 Xeon 2400/400 so I think it should qualify as "fast" ? I disabled hyperthreading to figure out if it was causing problems. I seem to be able to send packets at a rate in the 900Mbps when just sending them out with a process. If I do similar sending on two interfaces at same time, it tops out at 600Mbps. The information I´m looking for is how to instrument where the bottleneck is to either tune the parameters or report a bug in PCI or em code. (or just simply swap the GE hardware to something that works better) Pete > on a fast box you should be able to generate packets faster than wire > speed for sizes around 500bytes, meaning that you are going to saturate > the queue no matter how large it is. > > cheers > luigi > > > em-interface is running 66/64 and is there a way to see interface queue depth? > > em0: port 0x3040-0x307f > > mem 0xfc22-0xfc23 irq 17 at device 3.0 on pci2 > > em0: Speed:1000 Mbps Duplex:Full > > pcib2: at device 29.0 on pci1 > > IOAPIC #2 intpin 0 -> irq 16 > > IOAPIC #2 intpin 6 -> irq 17 > > IOAPIC #2 intpin 7 -> irq 18 > > pci2: on pcib2 > > > > The OS is 4.7-RELEASE. > > > > Pete > > > > > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > > with "unsubscribe freebsd-net" in the body of the message > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
Petri Helenius wrote: >>Probably means that your outgoing interface queue is filling up. >>ENOBUFS is the only way the kernel has to tell you ``slow down!''. >> > > How much should I be able to send to two em interfaces on one > 66/64 PCI ? I've seen netperf UDP throughputs of ~950Mbps with a fiber em card and 4K datagrams on a 2.4GHz P4. Lars -- Lars Eggert <[EMAIL PROTECTED]> USC Information Sciences Institute
Re: ENOBUFS
On Wed, Oct 16, 2002 at 02:04:11AM +0300, Petri Helenius wrote: > > > > What rate are you sending these packets at? A standard interface queue > > length is 50 packets, you get ENOBUFS when it's full. > > > This might explain the phenomenan. (packets are going out bursty, with average > hovering at ~500Mbps:ish) I recomplied kernel with IFQ_MAXLEN of 5000 > but there seems to be no change in the behaviour. How do I make sure that how large are the packets and how fast is the box ? on a fast box you should be able to generate packets faster than wire speed for sizes around 500bytes, meaning that you are going to saturate the queue no matter how large it is. cheers luigi > em-interface is running 66/64 and is there a way to see interface queue depth? > em0: port 0x3040-0x307f > mem 0xfc22-0xfc23 irq 17 at device 3.0 on pci2 > em0: Speed:1000 Mbps Duplex:Full > pcib2: at device 29.0 on pci1 > IOAPIC #2 intpin 0 -> irq 16 > IOAPIC #2 intpin 6 -> irq 17 > IOAPIC #2 intpin 7 -> irq 18 > pci2: on pcib2 > > The OS is 4.7-RELEASE. > > Pete > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-net" in the body of the message To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
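The arithmetic behind "you are going to saturate the queue no matter how large it is", ignoring Ethernet framing overhead (which only lowers the drain rate further): a gigabit wire drains at most about 250k 500-byte packets per second, so any software send rate above that fills a queue of any finite length; the 5000-slot queue tried above buys only tens of milliseconds. A throwaway calculation with placeholder numbers:

#include <stdio.h>

int
main(void)
{
        double link_bps = 1e9;          /* GigE */
        double pkt_bytes = 500.0;       /* packet size used above */
        double send_pps = 400e3;        /* placeholder software send rate */
        double qlen = 5000.0;           /* the IFQ_MAXLEN value tried above */
        double wire_pps = link_bps / (pkt_bytes * 8.0);

        printf("wire drains at most ~%.0f pkt/s\n", wire_pps);
        if (send_pps > wire_pps)
                printf("at %.0f pkt/s a %.0f-slot queue fills in ~%.1f ms\n",
                    send_pps, qlen, qlen / (send_pps - wire_pps) * 1e3);
        return (0);
}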
Re: ENOBUFS
> > Probably means that your outgoing interface queue is filling up. > ENOBUFS is the only way the kernel has to tell you ``slow down!''. > How much should I be able to send to two em interfaces on one 66/64 PCI ? Pete To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
> > What rate are you sending these packets at? A standard interface queue > length is 50 packets, you get ENOBUFS when it's full. > This might explain the phenomenon. (packets are going out bursty, with average hovering at ~500Mbps:ish) I recompiled the kernel with IFQ_MAXLEN of 5000 but there seems to be no change in the behaviour. How do I make sure that em-interface is running 66/64 and is there a way to see interface queue depth? em0: port 0x3040-0x307f mem 0xfc22-0xfc23 irq 17 at device 3.0 on pci2 em0: Speed:1000 Mbps Duplex:Full pcib2: at device 29.0 on pci1 IOAPIC #2 intpin 0 -> irq 16 IOAPIC #2 intpin 6 -> irq 17 IOAPIC #2 intpin 7 -> irq 18 pci2: on pcib2 The OS is 4.7-RELEASE. Pete To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
ENOBUFS
< said: > My processes writing to SOCK_DGRAM sockets are getting ENOBUFS Probably means that your outgoing interface queue is filling up. ENOBUFS is the only way the kernel has to tell you ``slow down!''. -GAWollman To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ENOBUFS
On Wed, 16 Oct 2002, Petri Helenius wrote: > > My processes writing to SOCK_DGRAM sockets are getting ENOBUFS > while netstat -s counter under the heading of "ip" is incrementing: > 7565828 output packets dropped due to no bufs, etc. > but netstat -m shows: my guess is that the interface has no more room in its output queue.. when you get the error, back off a bit.. > > netstat -m > 579/1440/131072 mbufs in use (current/peak/max): > 578 mbufs allocated to data > 1 mbufs allocated to packet headers > 576/970/32768 mbuf clusters in use (current/peak/max) > 2300 Kbytes allocated to network (2% of mb_map in use) > 0 requests for memory denied > 0 requests for memory delayed > 0 calls to protocol drain routines > > Where should I start looking? The interface is em > > Pete > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-net" in the body of the message > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
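In the sending program, "back off a bit" can be as simple as pausing briefly whenever sendto() fails with ENOBUFS instead of counting it as a drop. A minimal sketch; the destination address, port, payload size, and the 1 ms pause are all placeholders:

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
        char payload[1024];
        struct sockaddr_in dst;
        int s;

        memset(payload, 0, sizeof(payload));
        memset(&dst, 0, sizeof(dst));
        dst.sin_len = sizeof(dst);
        dst.sin_family = AF_INET;
        dst.sin_port = htons(9);                        /* discard */
        dst.sin_addr.s_addr = inet_addr("192.0.2.1");   /* placeholder */

        if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
                perror("socket");
                return (1);
        }
        for (;;) {
                if (sendto(s, payload, sizeof(payload), 0,
                    (struct sockaddr *)&dst, sizeof(dst)) < 0) {
                        if (errno == ENOBUFS) {
                                usleep(1000);   /* interface queue full: back off */
                                continue;
                        }
                        perror("sendto");
                        break;
                }
        }
        return (1);
}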
Re: ENOBUFS
Petri Helenius wrote: > My processes writing to SOCK_DGRAM sockets are getting ENOBUFS > while netstat -s counter under the heading of "ip" is incrementing: > 7565828 output packets dropped due to no bufs, etc. What rate are you sending these packets at? A standard interface queue length is 50 packets, you get ENOBUFS when it's full. Lars -- Lars Eggert <[EMAIL PROTECTED]> USC Information Sciences Institute
ENOBUFS
My processes writing to SOCK_DGRAM sockets are getting ENOBUFS while netstat -s counter under the heading of "ip" is incrementing: 7565828 output packets dropped due to no bufs, etc. but netstat -m shows: > netstat -m 579/1440/131072 mbufs in use (current/peak/max): 578 mbufs allocated to data 1 mbufs allocated to packet headers 576/970/32768 mbuf clusters in use (current/peak/max) 2300 Kbytes allocated to network (2% of mb_map in use) 0 requests for memory denied 0 requests for memory delayed 0 calls to protocol drain routines Where should I start looking? The interface is em Pete To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ip_output and ENOBUFS
Julian Elischer writes: > > > On Wed, 27 Mar 2002, Andrew Gallatin wrote: > > > > > Archie Cobbs writes: > > > Luigi Rizzo writes: > > > > > Is if_tx_rdy() something that can be used generally or does it only > > > > > work with dummynet ? > > > > > > > > well, the function is dummynet-specific, but I would certainly like > > > > a generic callback list to be implemented in ifnet which is > > > > invoked on tx_empty events. > > > > > > Me too :-) > > > > > > > The problem as usual is that you have to touch every single device > > > > driver... Fortunately we can leave the ifnet structure unmodified > > > > because i just discovered there is an ifindex2ifnet array which is > > > > managed and can be extended to point to additional ifnet state that > > > > does not fit in the immutable one... > > > > > > Why is it important to avoid changing 'struct ifnet' ? > > > > To maintain binary compatability for commercial network drivers. > > > > Currently, network driver modules built on 4.1.1 work on all versions > > of FreeBSD through 4.5-STABLE. > > > Not QUITE true.. > > they ar ebroken in some cases for 4.4 amd 4.5 due to a renumberring > of SYSINIT orderings, but I fixed that and they should work in 4.6 > again.. I know we hit it here with some cards we have.. > I just made a small patch in teh local trees to allow us to use them. > > Some cards may not hit this problem. I've never tried loading our driver at boot (we have customers load it manually, or via a /usr/local/etc/rc.d script very late in boot). 4.5 works fine for us. There was a bit of breakage just after 4.5 when for ARP support for variable length link level addresses was MFCed, but I caught that early.. Drew To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ip_output and ENOBUFS
On Wed, 27 Mar 2002, Andrew Gallatin wrote: > > Archie Cobbs writes: > > Luigi Rizzo writes: > > > > Is if_tx_rdy() something that can be used generally or does it only > > > > work with dummynet ? > > > > > > well, the function is dummynet-specific, but I would certainly like > > > a generic callback list to be implemented in ifnet which is > > > invoked on tx_empty events. > > > > Me too :-) > > > > > The problem as usual is that you have to touch every single device > > > driver... Fortunately we can leave the ifnet structure unmodified > > > because i just discovered there is an ifindex2ifnet array which is > > > managed and can be extended to point to additional ifnet state that > > > does not fit in the immutable one... > > > > Why is it important to avoid changing 'struct ifnet' ? > > To maintain binary compatability for commercial network drivers. > > Currently, network driver modules built on 4.1.1 work on all versions > of FreeBSD through 4.5-STABLE. > Not QUITE true.. they are broken in some cases for 4.4 and 4.5 due to a renumbering of SYSINIT orderings, but I fixed that and they should work in 4.6 again.. I know we hit it here with some cards we have.. I just made a small patch in the local trees to allow us to use them. Some cards may not hit this problem. > > Drew > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-net" in the body of the message > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ip_output and ENOBUFS
On Wed, 27 Mar 2002, Archie Cobbs wrote: > Luigi Rizzo writes: > > > Is if_tx_rdy() something that can be used generally or does it only > > > work with dummynet ? > > > > well, the function is dummynet-specific, but I would certainly like > > a generic callback list to be implemented in ifnet which is > > invoked on tx_empty events. > > Me too :-) > > > The problem as usual is that you have to touch every single device > > driver... Fortunately we can leave the ifnet structure unmodified > > because i just discovered there is an ifindex2ifnet array which is > > managed and can be extended to point to additional ifnet state that > > does not fit in the immutable one... > > Why is it important to avoid changing 'struct ifnet' ? You can't touch struct ifnet in a released line of systems e.g. 4.x must not touch struct ifnet of break binary compatibility with drivers written for earlier 4.x systems. (and not available in source).. it turns out that sync interface cards are the single largest set of binary drivers... > > -Archie > > __ > Archie Cobbs * Packet Design * http://www.packetdesign.com > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-net" in the body of the message > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ip_output and ENOBUFS
On Wed, 27 Mar 2002, Luigi Rizzo wrote: > On Wed, Mar 27, 2002 at 09:53:00AM -0800, Archie Cobbs wrote: > ... > > > managed and can be extended to point to additional ifnet state that > > > does not fit in the immutable one... > > > > Why is it important to avoid changing 'struct ifnet' ? > > backward compatibility with binary-only drivers ... > Not that i care too much (in the end it is for the benefit of > a limited set of people which could as well not upgrade, vs. > preventing useful functionality to be added in a safe way), > but some people do and i can see their point. > On the other hand, this also means we can never progress if not > on major releases or unless changes slip in unnoticed... It is possible to hang extra info off the ifnet structure if one is careful. just not IN it.. > > cheers > luigi > > > -Archie > > > > __ > > Archie Cobbs * Packet Design * http://www.packetdesign.com > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > > with "unsubscribe freebsd-net" in the body of the message > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-net" in the body of the message > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ip_output and ENOBUFS
On Wed, Mar 27, 2002 at 09:53:00AM -0800, Archie Cobbs wrote: ... > > managed and can be extended to point to additional ifnet state that > > does not fit in the immutable one... > > Why is it important to avoid changing 'struct ifnet' ? backward compatibility with binary-only drivers ... Not that i care too much (in the end it is for the benefit of a limited set of people which could as well not upgrade, vs. preventing useful functionality to be added in a safe way), but some people do and i can see their point. On the other hand, this also means we can never progress if not on major releases or unless changes slip in unnoticed... cheers luigi > -Archie > > __ > Archie Cobbs * Packet Design * http://www.packetdesign.com > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-net" in the body of the message To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ip_output and ENOBUFS
Archie Cobbs writes: > Luigi Rizzo writes: > > > Is if_tx_rdy() something that can be used generally or does it only > > > work with dummynet ? > > > > well, the function is dummynet-specific, but I would certainly like > > a generic callback list to be implemented in ifnet which is > > invoked on tx_empty events. > > Me too :-) > > > The problem as usual is that you have to touch every single device > > driver... Fortunately we can leave the ifnet structure unmodified > > because i just discovered there is an ifindex2ifnet array which is > > managed and can be extended to point to additional ifnet state that > > does not fit in the immutable one... > > Why is it important to avoid changing 'struct ifnet' ? To maintain binary compatibility for commercial network drivers. Currently, network driver modules built on 4.1.1 work on all versions of FreeBSD through 4.5-STABLE. Drew To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ip_output and ENOBUFS
Luigi Rizzo writes: > > Is if_tx_rdy() something that can be used generally or does it only > > work with dummynet ? > > well, the function is dummynet-specific, but I would certainly like > a generic callback list to be implemented in ifnet which is > invoked on tx_empty events. Me too :-) > The problem as usual is that you have to touch every single device > driver... Fortunately we can leave the ifnet structure unmodified > because i just discovered there is an ifindex2ifnet array which is > managed and can be extended to point to additional ifnet state that > does not fit in the immutable one... Why is it important to avoid changing 'struct ifnet' ? -Archie __ Archie Cobbs * Packet Design * http://www.packetdesign.com To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ip_output and ENOBUFS
On Tue, Mar 26, 2002 at 10:48:33PM -0800, Archie Cobbs wrote: > Luigi Rizzo writes: > > As a matter of fact, i even implemented a similar thing in dummynet, > > and if device drivers call if_tx_rdy() when they complete a > > transmission, then the tx interrupt can be used to clock > > packets out of the dummynet pipes. A patch for if_tun.c is below, > > So if_tx_rdy() sounds like my if_get_next().. guess you already did that :-) yes, but it does not solve the problem of the original poster who wanted to block/wakeup processes getting enobufs. Signals just do not propagate beyond the pipe they are sent to. > Is if_tx_rdy() something that can be used generally or does it only > work with dummynet ? well, the function is dummynet-specific, but I would certainly like a generic callback list to be implemented in ifnet which is invoked on tx_empty events. So kernel modules could hook their own callback to the list and get notified of events when they occur. The problem as usual is that you have to touch every single device driver... Fortunately we can leave the ifnet structure unmodified because i just discovered there is an ifindex2ifnet array which is managed and can be extended to point to additional ifnet state that does not fit in the immutable one... cheers luigi To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ip_output and ENOBUFS
Luigi Rizzo writes: > > Along those lines, this might be a handy thing to add... > > > > int if_get_next(struct ifnet *ifp); /* runs at splimp() */ > > > > This function tries to "get" the next packet scheduled to go > > out interface 'ifp' and, if successful, puts it on &ifp->if_snd > > (the interface output queue for 'ifp') and returns 1; otherwise, > > it returns zero. > > how is this different from having a longer device queue ? The idea is that if_get_next() may in turn call some scheduling code that intelligently decides what packet gets to go next. So, when this kind of thing is enabled, the device queue basically always has either zero or one packets on it. In effect, this allows you to move the interface output queue out of the (dumb) device driver upwards in the networking stack, where e.g. a netgraph node can make the scheduling decision. The existing fixed length FIFO queues at each device mean you can't do intelligent scheduling of packets, because you can't manage that queue, because part of "managing" the queue is knowing when it goes empty. -Archie __ Archie Cobbs * Packet Design * http://www.packetdesign.com To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ip_output and ENOBUFS
Luigi Rizzo writes: > As a matter of fact, i even implemented a similar thing in dummynet, > and if device drivers call if_tx_rdy() when they complete a > transmission, then the tx interrupt can be used to clock > packets out of the dummynet pipes. A patch for if_tun.c is below, So if_tx_rdy() sounds like my if_get_next().. guess you already did that :-) Is if_tx_rdy() something that can be used generally or does it only work with dummynet ? -Archie __ Archie Cobbs * Packet Design * http://www.packetdesign.com To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ip_output and ENOBUFS
the ENOBUFS is very typical with UDP applications that try to send as fast as possible (e.g. the various network test utilities in ports), and as i said in a previous message, putting up a mechanism to pass around queue full/queue not full events is expensive because it might trigger on every single packet, and possibly have to wakeup multiple processes each time (with only one being able to succeed). The tcp handling of ENOBUFS is much cheaper. TCP is not waken up by the device, but from acks coming from the other side, or from timeouts. So there is not per-packet overhead just to implement this mechanism. As a matter of fact, i even implemented a similar thing in dummynet, and if device drivers call if_tx_rdy() when they complete a transmission, then the tx interrupt can be used to clock packets out of the dummynet pipes. A patch for if_tun.c is below, and if_tx_rdy() is in netinet/ip_dummynet.c. You could replace the call to if_tx_rdy with a wakeup() using some appropriate argument to wake up threads waiting for devices to become ready. cheers luigi > lcvs diff -u if_tun.c Index: if_tun.c === RCS file: /home/ncvs/src/sys/net/if_tun.c,v retrieving revision 1.51.2.2 diff -u -r1.51.2.2 if_tun.c --- if_tun.c28 Jul 1999 15:08:06 - 1.51.2.2 +++ if_tun.c19 Jun 2000 12:07:17 - @@ -19,6 +19,7 @@ #include "opt_devfs.h" #include "opt_inet.h" +#include "opt_ipdn.h" #include #include @@ -162,6 +163,10 @@ ifp = &tp->tun_if; tp->tun_flags |= TUN_OPEN; TUNDEBUG("%s%d: open\n", ifp->if_name, ifp->if_unit); +#ifdef DUMMYNET + if (ifp->if_snd.ifq_len == 0) /* better be! */ + if_tx_rdy(ifp); +#endif return (0); } @@ -487,6 +492,10 @@ } } } while (m0 == 0); +#ifdef DUMMYNET + if (ifp->if_snd.ifq_len == 0) + if_tx_rdy(ifp); +#endif splx(s); while (m0 && uio->uio_resid > 0 && error == 0) { On Tue, Mar 26, 2002 at 09:09:17AM -0800, Lars Eggert wrote: > Matthew Luckie wrote: > > hmm, we looked at how other protocols handled the ENOBUFS case from > > ip_output. > > > > tcp_output calls tcp_quench on this error. > > > > while the interface may not be able to send any more packets than it > > does currently, closing the congestion window back to 1 segment > > seems a severe way to handle this error, knowing that the network > > did not drop the packet due to congestion. Ideally, there might be > > some form of blocking until such time as a mbuf comes available. > > This sounds as if it will be much easier come FreeBSD 5.0 > > TCP will almost never encouter this scenario, since it's self-clocking. > The NIC is very rarely the bottleneck resource for a given network > connection. Have you looked at mean queue lengths for NICs? They are > typically zero or one. The NIC will only be the bottleneck if you are > sending at a higher rate than line speed and your burt time is too long > to be absorbed by the queue. > > > I'm aware that if people are hitting this condition, they need to > > increase the number of mbufs to get maximum performance. > > No. ENOBUFS in ip_output almost always means that your NIC queue is > full, which isn't controlled through mbufs. You can make the queue > longer, but that won't help if you're sending too fast. > > > This section of code has previously been discussed here: > > http://docs.freebsd.org/cgi/getmsg.cgi?fetch=119188+0+archive/2000/fr- > > eebsd-net/2730.freebsd-net and has been in use for many years (a > > This is a slightly different problem than you describe. 
What Archie saw > was an ENOBUFS being handled like a loss inside the network, even though > the sender has information locally that can allow it to make smarter > retransmission decisions. > > Lars > -- > Lars Eggert <[EMAIL PROTECTED]> Information Sciences Institute > http://www.isi.edu/larse/ University of Southern California To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ip_output and ENOBUFS
On Tue, Mar 26, 2002 at 10:10:05PM -0800, Archie Cobbs wrote: > Luigi Rizzo writes: ... > Along those lines, this might be a handy thing to add... > > int if_get_next(struct ifnet *ifp); /* runs at splimp() */ > > This function tries to "get" the next packet scheduled to go > out interface 'ifp' and, if successful, puts it on &ifp->if_snd > (the interface output queue for 'ifp') and returns 1; otherwise, > it returns zero. how is this different from having a longer device queue ? cheers luigi > Then, each device driver can be modified (over time) to invoke > this function when it gets a transmit interrupt and it's output > queue is empty. If the function returns 1, grab the new packet > off the queue and schedule it for transmission. > > Once this is done it becomes much easier to hack together ideas > for queueing and scheduling e.g., a netgraph node that does packet > scheduling. > > I think ALTQ does something like this. It would be nice if it > was generic enough that other mechanisms besides ALTQ (like > netgraph) could also use it. I'm not that familiar with how > ALTQ is implemented. > > -Archie > > __ > Archie Cobbs * Packet Design * http://www.packetdesign.com > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-net" in the body of the message To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ip_output and ENOBUFS
Luigi Rizzo writes: > > >if you could suggest a few modifications that would be required, i'd like > > >to pursue this further. > > > > Look at tsleep/wakeup on ifnet of if_snd. > > I am under the impression that implementing this mechanism would > not be so trivial. It is not immediate to tell back to the caller > on which interface ip_output() failed. Nor there is a common place > that i know of where you can be notified that a packet was successfully > transmitted -- i suspect you should patch all individual drivers. > Finally, there is the question on whether you do a wakeup as soon > as you get a free slot in the queue (in which case you most likely > end up paying the cost of a tsleep/wakeup pair on each transmission), > or you put some histeresys. Along those lines, this might be a handy thing to add... int if_get_next(struct ifnet *ifp); /* runs at splimp() */ This function tries to "get" the next packet scheduled to go out interface 'ifp' and, if successful, puts it on &ifp->if_snd (the interface output queue for 'ifp') and returns 1; otherwise, it returns zero. Then, each device driver can be modified (over time) to invoke this function when it gets a transmit interrupt and it's output queue is empty. If the function returns 1, grab the new packet off the queue and schedule it for transmission. Once this is done it becomes much easier to hack together ideas for queueing and scheduling e.g., a netgraph node that does packet scheduling. I think ALTQ does something like this. It would be nice if it was generic enough that other mechanisms besides ALTQ (like netgraph) could also use it. I'm not that familiar with how ALTQ is implemented. -Archie __ Archie Cobbs * Packet Design * http://www.packetdesign.com To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
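To make the proposal a little more concrete, one possible shape for if_get_next(), assuming some external scheduler (dummynet, ALTQ, a netgraph node, ...) has registered a callback that hands back one mbuf at a time. The hook pointer is entirely hypothetical; where to keep such state without growing struct ifnet is exactly what the rest of the thread argues about. A converted driver would call this from its transmit-done path when if_snd has gone empty and, on a return of 1, restart its start routine.

#include <sys/param.h>
#include <sys/mbuf.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

/*
 * Hypothetical registration point: an external scheduler stores a callback
 * here (per interface in a real design, e.g. keyed by if_index) that hands
 * back the next packet to transmit on ifp, or NULL if it has nothing queued.
 */
struct mbuf *(*if_getnext_hook)(struct ifnet *) = NULL;

int
if_get_next(struct ifnet *ifp)          /* runs at splimp() */
{
        struct mbuf *m;

        if (if_getnext_hook == NULL)
                return (0);
        m = (*if_getnext_hook)(ifp);
        if (m == NULL)
                return (0);
        IF_ENQUEUE(&ifp->if_snd, m);    /* driver dequeues and transmits it */
        return (1);
}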
Re: ip_output and ENOBUFS
Matthew Luckie wrote: > hmm, we looked at how other protocols handled the ENOBUFS case from > ip_output. > > tcp_output calls tcp_quench on this error. > > while the interface may not be able to send any more packets than it > does currently, closing the congestion window back to 1 segment > seems a severe way to handle this error, knowing that the network > did not drop the packet due to congestion. Ideally, there might be > some form of blocking until such time as a mbuf comes available. > This sounds as if it will be much easier come FreeBSD 5.0 TCP will almost never encounter this scenario, since it's self-clocking. The NIC is very rarely the bottleneck resource for a given network connection. Have you looked at mean queue lengths for NICs? They are typically zero or one. The NIC will only be the bottleneck if you are sending at a higher rate than line speed and your burst time is too long to be absorbed by the queue. > I'm aware that if people are hitting this condition, they need to > increase the number of mbufs to get maximum performance. No. ENOBUFS in ip_output almost always means that your NIC queue is full, which isn't controlled through mbufs. You can make the queue longer, but that won't help if you're sending too fast. > This section of code has previously been discussed here: > http://docs.freebsd.org/cgi/getmsg.cgi?fetch=119188+0+archive/2000/freebsd-net/2730.freebsd-net and has been in use for many years (a This is a slightly different problem than you describe. What Archie saw was an ENOBUFS being handled like a loss inside the network, even though the sender has information locally that can allow it to make smarter retransmission decisions. Lars -- Lars Eggert <[EMAIL PROTECTED]> Information Sciences Institute http://www.isi.edu/larse/ University of Southern California
Re: ip_output and ENOBUFS
> I am under the impression that implementing this mechanism would > not be so trivial. hmm, we looked at how other protocols handled the ENOBUFS case from ip_output. tcp_output calls tcp_quench on this error. while the interface may not be able to send any more packets than it does currently, closing the congestion window back to 1 segment seems a severe way to handle this error, knowing that the network did not drop the packet due to congestion. Ideally, there might be some form of blocking until such time as a mbuf comes available. This sounds as if it will be much easier come FreeBSD 5.0 I'm aware that if people are hitting this condition, they need to increase the number of mbufs to get maximum performance. This section of code has previously been discussed here: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=119188+0+archive/2000/freebsd-net/2730.freebsd-net and has been in use for many years (a glance at TCP/IP Illustrated Vol 2 shows similar code), so there is probably a good reason that I am not aware of for this code to be in place. Comments? To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ip_output and ENOBUFS
On Mon, Mar 25, 2002 at 02:06:19PM -0800, Lars Eggert wrote: > Matthew Luckie wrote: > >>>Is there a mechanism to tell when ip_output should be called again? ... > >if you could suggest a few modifications that would be required, i'd like > >to pursue this further. > > Look at tsleep/wakeup on ifnet of if_snd. I am under the impression that implementing this mechanism would not be so trivial. It is not immediate to tell back to the caller on which interface ip_output() failed. Nor there is a common place that i know of where you can be notified that a packet was successfully transmitted -- i suspect you should patch all individual drivers. Finally, there is the question on whether you do a wakeup as soon as you get a free slot in the queue (in which case you most likely end up paying the cost of a tsleep/wakeup pair on each transmission), or you put some histeresys. cheers luigi > Lars > -- > Lars Eggert <[EMAIL PROTECTED]> Information Sciences Institute > http://www.isi.edu/larse/ University of Southern California To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ip_output and ENOBUFS
On Mon, 25 Mar 2002, Matthew Luckie wrote: > Hi > > > Is there a mechanism to tell when ip_output should be called again? > Ideally, I would block until such time as i could send it via ip_output no, there is no such mechanism that I know of.. > > (please CC: me on any responses) > > Matthew Luckie > [EMAIL PROTECTED] > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-net" in the body of the message
Re: ip_output and ENOBUFS
Lars Eggert wrote: > Matthew Luckie wrote: > Is there a mechanism to tell when ip_output should be called again? Ideally, I would block until such time as i could send it via ip_output >>> >>> >>> You probably get that because the outbound interface queue gets full, >>> so you want to block your caller until space becomes available there. >>> There currently is no such mechanism (AFAIK, and talking about >>> -STABLE here), but it's not too much work to add. >> >> >> if you could suggest a few modifications that would be required, i'd like >> to pursue this further. > > > Look at tsleep/wakeup on ifnet of if_snd. ^^ or Sorry, big fingers. -- Lars Eggert <[EMAIL PROTECTED]> Information Sciences Institute http://www.isi.edu/larse/ University of Southern California smime.p7s Description: S/MIME Cryptographic Signature
Re: ip_output and ENOBUFS
Matthew Luckie wrote: >>>Is there a mechanism to tell when ip_output should be called again? >>>Ideally, I would block until such time as i could send it via ip_output >> >>You probably get that because the outbound interface queue gets full, so >>you want to block your caller until space becomes available there. There >>currently is no such mechanism (AFAIK, and talking about -STABLE here), >>but it's not too much work to add. > > if you could suggest a few modifications that would be required, i'd like > to pursue this further. Look at tsleep/wakeup on ifnet of if_snd. Lars -- Lars Eggert <[EMAIL PROTECTED]> Information Sciences Institute http://www.isi.edu/larse/ University of Southern California
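Roughly what that suggestion could look like; only a sketch, and as Luigi notes elsewhere in the thread it means adding a wakeup() to every driver and putting in some hysteresis so a tsleep/wakeup pair is not paid per packet. build_probe() is a hypothetical stand-in for Matthew's packet construction; the mbuf has to be rebuilt before retrying because the output path frees it even when it fails with ENOBUFS.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/errno.h>
#include <sys/mbuf.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>
#include <net/route.h>
#include <netinet/in.h>
#include <netinet/ip_var.h>

static struct mbuf *build_probe(void);  /* hypothetical packet construction */

/*
 * Caller side: block until the interface send queue drains instead of
 * passing ENOBUFS back to the application.
 */
static int
send_blocking(struct ifnet *ifp, struct route *ro)
{
        struct mbuf *m;
        int error;

        for (;;) {
                m = build_probe();      /* rebuild: ip_output frees the mbuf on failure */
                error = ip_output(m, NULL, ro, 0, NULL);
                if (error != ENOBUFS)
                        return (error);
                /* queue was full; wait for a driver wakeup (with a timeout, to be safe) */
                (void)tsleep(&ifp->if_snd, PSOCK, "ifqful", hz / 100);
        }
}

/*
 * Driver side: called from each driver's transmit-completion path once it
 * has freed descriptors; the threshold provides the hysteresis Luigi asks for.
 */
static void
ifq_drained_wakeup(struct ifnet *ifp)
{
        if (ifp->if_snd.ifq_len < ifp->if_snd.ifq_maxlen / 2)
                wakeup(&ifp->if_snd);
}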
Re: ip_output and ENOBUFS
> > Is there a mechanism to tell when ip_output should be called again? > > Ideally, I would block until such time as I could send it via ip_output > You probably get that because the outbound interface queue gets full, so > you want to block your caller until space becomes available there. There > currently is no such mechanism (AFAIK, and talking about -STABLE here), > but it's not too much work to add. I've worked at a layer above ip_output, but I haven't looked too deeply at this issue and the code in the kernel. If you could suggest a few modifications that would be required, I'd like to pursue this further. Thanks for your response.
Re: ip_output and ENOBUFS
Matthew Luckie wrote: > I have written a syscall that creates a packet in kernel-space, > timestamps it, and then sends it via ip_output > > If the user-space application uses this system call faster than the > packets can be sent, ip_output will return ENOBUFS. > > Is there a mechanism to tell when ip_output should be called again? > Ideally, I would block until such time as I could send it via ip_output You probably get that because the outbound interface queue gets full, so you want to block your caller until space becomes available there. There currently is no such mechanism (AFAIK, and talking about -STABLE here), but it's not too much work to add. Not sure if this is really useful, though. Usually the NIC doesn't limit your transmission speed; it's losses inside the network that do. Also, why a new system call? Is it that much more efficient than raw IP? Lars -- Lars Eggert <[EMAIL PROTECTED]> Information Sciences Institute http://www.isi.edu/larse/ University of Southern California
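As a userland point of comparison (assuming raw IP is acceptable, as suggested above), ENOBUFS from sendto() can simply be treated as "back off briefly and retry". A minimal sketch: the function name and the 1 ms pause are arbitrary choices, and building the packet (e.g. with IP_HDRINCL) is left to the caller.

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <errno.h>
    #include <unistd.h>

    /*
     * Send one prebuilt datagram, pausing briefly whenever the
     * outbound queue is full (ENOBUFS) and then retrying.
     */
    static int
    send_with_retry(int s, const void *pkt, size_t len,
        const struct sockaddr_in *dst)
    {
            for (;;) {
                    if (sendto(s, pkt, len, 0,
                        (const struct sockaddr *)dst, sizeof(*dst)) != -1)
                            return (0);
                    if (errno != ENOBUFS)
                            return (-1);
                    usleep(1000);   /* arbitrary back-off */
            }
    }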
ip_output and ENOBUFS
Hi I have written a syscall that creates a packet in kernel-space, timestamps it, and then sends it via ip_output. If the user-space application uses this system call faster than the packets can be sent, ip_output will return ENOBUFS. Is there a mechanism to tell when ip_output should be called again? Ideally, I would block until such time as I could send it via ip_output. (Please CC: me on any responses.) Matthew Luckie [EMAIL PROTECTED]
Re: ENOBUFS and network performance tuning
On Tue, 25 Sep 2001, Jeff Behl wrote: > Any other guidelines to help tune a FreeBSD box for this sort of use > would be greatly appreciated. Currently, the only change we make is > increasing MAXUSERS to 128, though I'm not sure this is the preferred > approach. That's the simplest approach, as it bumps up numerous kernel settings. With 4.4 you can tune it in loader.conf, so changing the setting isn't a big deal. You should probably check how many sockets are sticking around in the TIME_WAIT state and compare it to kern.ipc.maxsockets - that may be the limit you're hitting first. Mike "Silby" Silbersack
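One way to make the comparison suggested above is to count TIME_WAIT entries in netstat -n output by hand and look up the limit with sysctl. If you would rather read the limit programmatically, a small sketch follows; everything here other than the sysctl name mentioned above is illustrative.

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
            int maxsockets;
            size_t len = sizeof(maxsockets);

            /* Read the socket limit discussed above. */
            if (sysctlbyname("kern.ipc.maxsockets", &maxsockets, &len,
                NULL, 0) == -1) {
                    perror("sysctlbyname");
                    return (1);
            }
            printf("kern.ipc.maxsockets = %d\n", maxsockets);
            return (0);
    }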
ENOBUFS and network performance tuning
I have 4.3, and soon to be 4.4, boxes dedicated to a single app which basically 'bounces' traffic between two incoming TCP connections. After around 240 sessions (each session consisting of two incoming connections with traffic being passed between them), I started getting ENOBUFS errors. netstat -m showed mbufs never peaked, so we increased kern.ipc.somaxconn from 128 -> 256. Should this help the problem? Any other guidelines to help tune a FreeBSD box for this sort of use would be greatly appreciated. Currently, the only change we make is increasing MAXUSERS to 128, though I'm not sure this is the preferred approach. Also, is there a definitive guide to what all the kernel variables (sysctl -a) are? thanks Jeff