Proposed patch, convert IFQ_MAXLEN to kernel tunable...
Hi,

It turns out that the last time anyone looked at this constant was
before 1994, and it's very likely time to turn it into a kernel
tunable.  On hosts that have a high rate of packet transmission,
packets can be dropped at the interface queue because this value is
too small.  Rather than make a sweeping code change, I propose the
following change to the macro, and updating a couple of places in the
IP and IPv6 stacks that were using this macro to set their own global
variables.

I have tested this in my test lab at work; it is not as yet in
production at my day job, but will be soon.

Best,
George

Index: netinet/ip_input.c
===================================================================
--- netinet/ip_input.c	(revision 183299)
+++ netinet/ip_input.c	(working copy)
@@ -133,7 +133,6 @@
 struct pfil_head inet_pfil_hook;	/* Packet filter hooks */
 
 static struct ifqueue ipintrq;
-static int	ipqmaxlen = IFQ_MAXLEN;
 
 extern	struct domain inetdomain;
 extern	struct protosw inetsw[];
@@ -265,7 +264,7 @@
 
 	/* Initialize various other remaining things. */
 	ip_id = time_second & 0xffff;
-	ipintrq.ifq_maxlen = ipqmaxlen;
+	ipintrq.ifq_maxlen = IFQ_MAXLEN;
 	mtx_init(&ipintrq.ifq_mtx, "ip_inq", NULL, MTX_DEF);
 	netisr_register(NETISR_IP, ip_input, &ipintrq, NETISR_MPSAFE);
 }
Index: net/if.c
===================================================================
--- net/if.c	(revision 183299)
+++ net/if.c	(working copy)
@@ -135,7 +135,14 @@
 #endif
 
 int	if_index = 0;
-int	ifqmaxlen = IFQ_MAXLEN;
+
+int	ifqmaxlen = 50;
+TUNABLE_INT("net.ifqmaxlen", &ifqmaxlen);
+
+SYSCTL_INT(_net, OID_AUTO, ifqmaxlen, CTLFLAG_RD,
+    &ifqmaxlen, 0,
+    "interface queue length");
+
 struct	ifnethead ifnet;	/* depend on static init XXX */
 struct	ifgrouphead ifg_head;
 struct	mtx ifnet_lock;
Index: net/if.h
===================================================================
--- net/if.h	(revision 183299)
+++ net/if.h	(working copy)
@@ -221,7 +221,7 @@
 #define	IFCAP_WOL	(IFCAP_WOL_UCAST | IFCAP_WOL_MCAST | IFCAP_WOL_MAGIC)
 #define	IFCAP_TOE	(IFCAP_TOE4 | IFCAP_TOE6)
 
-#define	IFQ_MAXLEN	50
+#define	IFQ_MAXLEN	ifqmaxlen
 #define	IFNET_SLOWHZ	1	/* granularity is 1 second */
 
 /*
Index: netinet6/ip6_input.c
===================================================================
--- netinet6/ip6_input.c	(revision 183299)
+++ netinet6/ip6_input.c	(working copy)
@@ -115,7 +115,6 @@
 u_char ip6_protox[IPPROTO_MAX];
 
 static struct ifqueue ip6intrq;
-static int	ip6qmaxlen = IFQ_MAXLEN;
 struct	in6_ifaddr *in6_ifaddr;
 
 extern struct callout in6_tmpaddrtimer_ch;
@@ -178,7 +177,7 @@
 		printf("%s: WARNING: unable to register pfil hook, "
 		    "error %d\n", __func__, i);
 
-	ip6intrq.ifq_maxlen = ip6qmaxlen;
+	ip6intrq.ifq_maxlen = IFQ_MAXLEN;
	mtx_init(&ip6intrq.ifq_mtx, "ip6_inq", NULL, MTX_DEF);
	netisr_register(NETISR_IPV6, ip6_input, &ip6intrq, 0);
	scope6_init();
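For context, the patch hinges on the TUNABLE_INT/SYSCTL_INT pairing.
Here is a minimal sketch of that idiom, assuming the FreeBSD 7-era
KPI; the "example_qlen" names below are hypothetical, not from the
patch.  TUNABLE_INT copies a loader.conf setting into the variable
early at boot, and SYSCTL_INT then exposes the same variable (here
read-only) at run time.

	#include <sys/param.h>
	#include <sys/kernel.h>
	#include <sys/sysctl.h>

	/* Compiled-in default, used when the loader doesn't override it. */
	static int example_qlen = 50;

	/* Copy "debug.example_qlen" from /boot/loader.conf at boot, if set. */
	TUNABLE_INT("debug.example_qlen", &example_qlen);

	/* Expose the resulting value read-only in the sysctl tree. */
	SYSCTL_INT(_debug, OID_AUTO, example_qlen, CTLFLAG_RD,
	    &example_qlen, 0, "example queue length");

With this in place, debug.example_qlen="200" in /boot/loader.conf
takes effect at boot, and sysctl debug.example_qlen reports it
afterwards.  CTLFLAG_RW instead of CTLFLAG_RD would allow run-time
changes, but only code that re-reads the variable would notice them.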
Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...
Hi,

On Tue, Sep 23, 2008 at 03:29:06PM -0400, [EMAIL PROTECTED] wrote:
> It turns out that the last time anyone looked at this constant was
> before 1994, and it's very likely time to turn it into a kernel
> tunable.  On hosts that have a high rate of packet transmission,
> packets can be dropped at the interface queue because this value is
> too small.  Rather than make a sweeping code change, I propose the
> following change to the macro, and updating a couple of places in the
> IP and IPv6 stacks that were using this macro to set their own global
> variables.
> 
> I have tested this in my test lab at work; it is not as yet in
> production at my day job, but will be soon.
> 
It's not that bad -- most modern Ethernet drivers initialize interface
input queues themselves, and don't depend on IFQ_MAXLEN.  The IPv4
input queue is tunable via net.inet.ip.intr_queue_maxlen.  The IPv6
queue can similarly be made tunable.  I agree that ifqmaxlen can be
made tunable, because there are still a lot of drivers (mostly for old
hardware) that use ifqmaxlen and IFQ_MAXLEN, but I'm against changing
the definition of IFQ_MAXLEN.  Imagine some code like this:

	void *x[IFQ_MAXLEN];	// here it's 50

And some function that does:

	for (i = 0; i < IFQ_MAXLEN; i++) {	// not necessarily 50
		x[i] = NULL;
	}

Cheers,
-- 
Ruslan Ermilov
[EMAIL PROTECTED]
FreeBSD committer
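To make the objection concrete, here is a sketch (a hypothetical file,
not from the tree) of the two ways such code breaks once IFQ_MAXLEN
expands to a variable instead of a constant:

	/* Today IFQ_MAXLEN is a compile-time constant: */
	#define	IFQ_MAXLEN	50

	static void *x[IFQ_MAXLEN];	/* OK: constant array size */

	/*
	 * Under the proposed "#define IFQ_MAXLEN ifqmaxlen", the line
	 * above becomes "static void *x[ifqmaxlen];" -- a variably sized
	 * array at file scope, which is a hard compile error in C.  And
	 * even where a variable size is legal (a block-scope array), the
	 * array is sized from the variable's value at declaration time,
	 * while a loop such as
	 *
	 *	for (i = 0; i < IFQ_MAXLEN; i++)
	 *		x[i] = NULL;
	 *
	 * re-reads the variable at iteration time; if the two reads ever
	 * see different values, the loop writes out of bounds.
	 */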
Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...
At Wed, 24 Sep 2008 00:17:18 +0400,
Ruslan Ermilov wrote:
> 
> Hi,
> 
> On Tue, Sep 23, 2008 at 03:29:06PM -0400, [EMAIL PROTECTED] wrote:
> > It turns out that the last time anyone looked at this constant was
> > before 1994, and it's very likely time to turn it into a kernel
> > tunable.  On hosts that have a high rate of packet transmission,
> > packets can be dropped at the interface queue because this value is
> > too small.  Rather than make a sweeping code change, I propose the
> > following change to the macro, and updating a couple of places in the
> > IP and IPv6 stacks that were using this macro to set their own global
> > variables.
> > 
> > I have tested this in my test lab at work; it is not as yet in
> > production at my day job, but will be soon.
> > 
> It's not that bad -- most modern Ethernet drivers initialize interface
> input queues themselves, and don't depend on IFQ_MAXLEN.  The IPv4
> input queue is tunable via net.inet.ip.intr_queue_maxlen.  The IPv6
> queue can similarly be made tunable.  I agree that ifqmaxlen can be
> made tunable, because there are still a lot of drivers (mostly for old
> hardware) that use ifqmaxlen and IFQ_MAXLEN, but I'm against changing
> the definition of IFQ_MAXLEN.  Imagine some code like this:
> 

Sorry, this is about the output queue, not the input queue.  Though
there are both input and output queues that depend on this.

> 	void *x[IFQ_MAXLEN];	// here it's 50
> 
> And some function that does:
> 
> 	for (i = 0; i < IFQ_MAXLEN; i++) {	// not necessarily 50
> 		x[i] = NULL;
> 	}
> 

I found no occurrences of the above in our code base.  I used cscope
to search all of src/sys.  Are you aware of any occurrences of this?

Best,
George
Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...
Hi,

I agree with the intent of the change, that the IPv4 and IPv6 input
queues should have a tunable queue length.  However, the change as
provided makes the definition of IFQ_MAXLEN global and dependent upon
a variable.

[EMAIL PROTECTED] wrote:
> Hi,
> 
> It turns out that the last time anyone looked at this constant was
> before 1994, and it's very likely time to turn it into a kernel
> tunable.  On hosts that have a high rate of packet transmission,
> packets can be dropped at the interface queue because this value is
> too small.  Rather than make a sweeping code change, I propose the
> following change to the macro, and updating a couple of places in the
> IP and IPv6 stacks that were using this macro to set their own global
> variables.

This isn't appropriate for many uses of ifq's, which might be internal
to a given driver or subsystem and which may use IFQ_MAXLEN for
convenience, as Ruslan has pointed out.  I have code elsewhere which
does this.

Can you please do this on a per-protocol-stack basis?  i.e. give IPv4
and IPv6 their own TUNABLE queue length.

thanks
BMS
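As a sketch only, the per-protocol route might look like the following
in ip6_input.c, mirroring the existing net.inet.ip.intr_queue_maxlen
handling.  The IPv6 tunable and sysctl names below are assumptions,
not identifiers that exist in the tree at this revision:

	/* netinet6/ip6_input.c (where the static ip6intrq lives) */
	#include <sys/param.h>
	#include <sys/kernel.h>
	#include <sys/sysctl.h>

	SYSCTL_DECL(_net_inet6_ip6);

	/* IFQ_MAXLEN stays a compile-time constant; only the default moves. */
	static int ip6qmaxlen = IFQ_MAXLEN;

	/* Boot-time override via /boot/loader.conf ... */
	TUNABLE_INT("net.inet6.ip6.intr_queue_maxlen", &ip6qmaxlen);

	/* ... and a run-time knob pointing at the queue itself, as the
	 * IPv4 stack already does for net.inet.ip.intr_queue_maxlen. */
	SYSCTL_INT(_net_inet6_ip6, OID_AUTO, intr_queue_maxlen, CTLFLAG_RW,
	    &ip6intrq.ifq_maxlen, 0, "Maximum size of the IPv6 input queue");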
Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...
[EMAIL PROTECTED] wrote:
> ...
> I found no occurrences of the above in our code base.  I used cscope
> to search all of src/sys.  Are you aware of any occurrences of this?

I have been using IFQ_MAXLEN to size buffer queues internal to some
IGMPv3 stuff.

I don't feel comfortable with a change which sizes the queues for both
the IPv4 and IPv6 stacks from a variable which is obscured by a macro.

thanks
BMS
Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...
At Wed, 24 Sep 2008 15:50:32 +0100,
Bruce M. Simpson wrote:
> 
> Hi,
> 
> I agree with the intent of the change, that the IPv4 and IPv6 input
> queues should have a tunable queue length.  However, the change as
> provided makes the definition of IFQ_MAXLEN global and dependent upon
> a variable.
> 
> [EMAIL PROTECTED] wrote:
> > Hi,
> > 
> > It turns out that the last time anyone looked at this constant was
> > before 1994, and it's very likely time to turn it into a kernel
> > tunable.  On hosts that have a high rate of packet transmission,
> > packets can be dropped at the interface queue because this value is
> > too small.  Rather than make a sweeping code change, I propose the
> > following change to the macro, and updating a couple of places in the
> > IP and IPv6 stacks that were using this macro to set their own global
> > variables.
> 
> This isn't appropriate for many uses of ifq's, which might be internal
> to a given driver or subsystem and which may use IFQ_MAXLEN for
> convenience, as Ruslan has pointed out.  I have code elsewhere which
> does this.
> 
> Can you please do this on a per-protocol-stack basis?  i.e. give IPv4
> and IPv6 their own TUNABLE queue length.
> 

Actually what we'd need is N of these, since my target is actually the
send queue, not the input queue.  Let me look at this some more.

Best,
George
Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...
George V. Neville-Neil wrote this message on Tue, Sep 23, 2008 at 15:29 -0400:
> It turns out that the last time anyone looked at this constant was
> before 1994, and it's very likely time to turn it into a kernel
> tunable.  On hosts that have a high rate of packet transmission,
> packets can be dropped at the interface queue because this value is
> too small.  Rather than make a sweeping code change, I propose the
> following change to the macro, and updating a couple of places in the
> IP and IPv6 stacks that were using this macro to set their own global
> variables.

The better solution is to resurrect rwatson's patch that eliminates
the interface queue and does direct dispatch to the Ethernet driver.
Usually the driver already has a queue of 512 or more packets, so
putting them into a second queue doesn't provide much benefit besides
increasing the amount of locking necessary to deliver packets...

-- 
John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...
At Wed, 24 Sep 2008 12:53:31 -0700,
John-Mark Gurney wrote:
> 
> George V. Neville-Neil wrote this message on Tue, Sep 23, 2008 at 15:29 -0400:
> > It turns out that the last time anyone looked at this constant was
> > before 1994, and it's very likely time to turn it into a kernel
> > tunable.  On hosts that have a high rate of packet transmission,
> > packets can be dropped at the interface queue because this value is
> > too small.  Rather than make a sweeping code change, I propose the
> > following change to the macro, and updating a couple of places in the
> > IP and IPv6 stacks that were using this macro to set their own global
> > variables.
> 
> The better solution is to resurrect rwatson's patch that eliminates
> the interface queue and does direct dispatch to the Ethernet driver.
> Usually the driver already has a queue of 512 or more packets, so
> putting them into a second queue doesn't provide much benefit besides
> increasing the amount of locking necessary to deliver packets...

Actually I am making this change because I found that on 10G hardware
the queue is too small.  Also, there are many systems where you might
want to raise this, usually ones that are heavily biased towards
transmit only, like a multicast repeater of some sort.

Best,
George
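Some back-of-the-envelope arithmetic in support of the 10G point; the
frame size is an assumption (full-sized 1500-byte Ethernet frames),
and the program below is purely illustrative:

	#include <stdio.h>

	/* How long does a queue of IFQ_MAXLEN packets last at line rate? */
	int
	main(void)
	{
		const double qlen = 50.0;		/* IFQ_MAXLEN */
		const double frame_bits = 1500.0 * 8.0;	/* assumed frame size */
		const double rates[] = { 1e8, 1e9, 1e10 }; /* 100M, 1G, 10G */
		size_t i;

		for (i = 0; i < sizeof(rates) / sizeof(rates[0]); i++)
			printf("%6.0f Mbps: queue drains in %8.1f usec\n",
			    rates[i] / 1e6,
			    qlen * frame_bits / rates[i] * 1e6);
		return (0);
	}

At 10 Gbps a 50-packet queue drains in roughly 60 usec, so anything
that delays the refill by even a scheduling tick guarantees drops; at
100 Mbps the same 50 packets last 6 msec, which helps explain why the
constant went unquestioned for so long.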
Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...
[EMAIL PROTECTED] wrote:
> At Wed, 24 Sep 2008 12:53:31 -0700,
> John-Mark Gurney wrote:
> > 
> > George V. Neville-Neil wrote this message on Tue, Sep 23, 2008 at 15:29 -0400:
> > > It turns out that the last time anyone looked at this constant was
> > > before 1994, and it's very likely time to turn it into a kernel
> > > tunable.  On hosts that have a high rate of packet transmission,
> > > packets can be dropped at the interface queue because this value is
> > > too small.  Rather than make a sweeping code change, I propose the
> > > following change to the macro, and updating a couple of places in the
> > > IP and IPv6 stacks that were using this macro to set their own global
> > > variables.
> > 
> > The better solution is to resurrect rwatson's patch that eliminates
> > the interface queue and does direct dispatch to the Ethernet driver.
> > Usually the driver already has a queue of 512 or more packets, so
> > putting them into a second queue doesn't provide much benefit besides
> > increasing the amount of locking necessary to deliver packets...
> 
> Actually I am making this change because I found that on 10G hardware
> the queue is too small.  Also, there are many systems where you might
> want to raise this, usually ones that are heavily biased towards
> transmit only, like a multicast repeater of some sort.

One system I have seen that I thought made sense used to define queue
length globally in msecs, and each interface interpreted that to a
different length.

> Best,
> George
Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...
On Wed, 24 Sep 2008, [EMAIL PROTECTED] wrote:

> At Wed, 24 Sep 2008 12:53:31 -0700,
> John-Mark Gurney wrote:
>> George V. Neville-Neil wrote this message on Tue, Sep 23, 2008 at 15:29 -0400:
>>> It turns out that the last time anyone looked at this constant was
>>> before 1994, and it's very likely time to turn it into a kernel
>>> tunable.  On hosts that have a high rate of packet transmission,
>>> packets can be dropped at the interface queue because this value is
>>> too small.  Rather than make a sweeping code change, I propose the
>>> following change to the macro, and updating a couple of places in the
>>> IP and IPv6 stacks that were using this macro to set their own global
>>> variables.

The global value is rarely used (except by low-end and/or 10 Mbps
hardware, where it doesn't matter).  One important place where it is
used (and that you changed) is for ipintrq, but this should have a
different default anyway, and it already has its own sysctl to fix up
the broken default.

>> The better solution is to resurrect rwatson's patch that eliminates
>> the interface queue and does direct dispatch to the Ethernet driver.

If this were really better, then rwatson would have committed it.

>> Usually the driver already has a queue of 512 or more packets, so
>> putting them into a second queue doesn't provide much benefit besides
>> increasing the amount of locking necessary to deliver packets...

No, the second queue provides the fairly large benefit of doing
watermark processing via double buffering.  Most or all network
drivers do no watermark processing, except accidentally via the second
queue.

Even with watermarks in the driver queue, it still takes a fairly
large external queue to prevent the transmitter running dry under
load.  E.g., in my version of the bge driver, there is essentially a
watermark at 112 descriptors before the end of the driver tx queue.
Under load, the driver tx queue is normally filled with 496
descriptors on every output interrupt.  The driver enters the
IFF_DRV_OACTIVE state, and upper layers stop trying to transfer
packets to the driver queue.  The next output interrupt occurs after
384 of the 496 descriptors have been handled, so that 112 remain.
Then IFF_DRV_OACTIVE is cleared, and upper layers resume transferring
packets to the driver queue.  For output to stream, and for
efficiency, it is essential for the upper layers to have plenty of
packets to transfer at this point, but without the double buffering
they would have none.  When things are working right, the upper
layers produce more than 384 descriptors by transferring at this
point, so that the driver queue fills again.  For output to stream,
it is essential to produce 1 new descriptor faster than the hardware
can handle the 112 remaining ones, and then to keep producing new
descriptors faster than the hardware can handle the previous new
ones.  With my hardware, the 112 descriptors take about 150 usec to
handle.  Buffering in userland is too far away to deliver new packets
within this time in many cases (though syscalls are fast enough,
scheduling delays are usually too long).

> Actually I am making this change because I found that on 10G hardware
> the queue is too small.  Also, there are many systems where you might
> want to raise this, usually ones that are heavily biased towards
> transmit only, like a multicast repeater of some sort.

You would rarely want the global default to apply to all interfaces.
Anyway, interfaces with a hardware queue length (er, ring count) of
512 normally don't use the default, but use their ring count (bogusly
less 1) for the software queue length too.  This can be too small too.

In some cases, since select() for writing on sockets attached to
hardware doesn't work (it should block while all queues are full and
return soon after IFF_DRV_OACTIVE transitions from set to clear and
the queues become non-full, but it only understands the socket buffer
and doesn't look at the queues), context switches to the application
producing the packets are delayed until at least the next scheduling
tick or two, which happens 2-20 msec later.  I use sendq lengths of
~2 for bge and em to prevent the queues running dry with a scheduling
latency of 20 msec.  (10 Gbps would require preposterous queue
lengths of ~20 for the same result.)

However, even a queue length of (512+512-1) is bad for performance.
Cycling through mbufs in a circular way ensures busting of caches
once the queue length is large enough for the memory to exceed the
cache size[s].  With a queue length of 50 and today's cache sizes, it
is almost possible to fit everything into the L1 cache.

Bruce
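A compressed sketch of the tx-side pattern described above.  This is
hypothetical driver code: the xx_ names, softc layout, and helper
functions are invented for illustration, real drivers such as bge work
through their descriptor rings, and all locking is omitted.

	#include <sys/param.h>
	#include <sys/mbuf.h>
	#include <sys/socket.h>
	#include <net/if.h>
	#include <net/if_var.h>

	#define	XX_TX_RING_CNT	512	/* xx_tx_free starts at this */
	#define	XX_TX_WATERMARK	112	/* reclaim threshold, per above */

	struct xx_softc {
		struct ifnet	*xx_ifp;
		int		xx_tx_free;	/* free tx descriptors */
	};

	static int	xx_encap(struct xx_softc *, struct mbuf *);
	static int	xx_txreclaim(struct xx_softc *);

	static void
	xx_start(struct ifnet *ifp)
	{
		struct xx_softc *sc = ifp->if_softc;
		struct mbuf *m;

		/* Drain the software send queue into the hardware ring. */
		while (sc->xx_tx_free > 0) {
			IFQ_DRV_DEQUEUE(&ifp->if_snd, m);
			if (m == NULL)
				break;
			(void)xx_encap(sc, m);	/* consumes one descriptor */
			sc->xx_tx_free--;
		}
		if (sc->xx_tx_free == 0) {
			/* Ring full: tell upper layers to stop sending. */
			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
		}
	}

	/* Tx-completion interrupt. */
	static void
	xx_txeof(struct xx_softc *sc)
	{
		struct ifnet *ifp = sc->xx_ifp;

		sc->xx_tx_free += xx_txreclaim(sc);
		if (sc->xx_tx_free >= XX_TX_WATERMARK) {
			/*
			 * Past the watermark: reopen the tap.  The refill
			 * comes from the software queue; without that
			 * second queue there would be nothing to transfer
			 * here, and the transmitter could run dry before
			 * userland wakes up.
			 */
			ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
			xx_start(ifp);
		}
	}

The numbers in the bge example map onto this sketch: the ring holds
496 in-flight descriptors when IFF_DRV_OACTIVE is set, and the
interrupt that clears it fires with 112 still pending, giving the
upper layers about 150 usec of hardware runway to refill the ring.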