[LARTC] Re: [PATCH] HTB O(1) class lookup
On Monday 05 February 2007 11:16, Jarek Poplawski wrote:
> On 01-02-2007 12:30, Andi Kleen wrote:
>> Simon Lodal [EMAIL PROTECTED] writes:
>>> Memory is generally not an issue, but CPU is, and you can not beat the
>>> CPU efficiency of plain array lookup (always faster, and constant time).
>
> Probably for some old (or embedded) lean boxes used for small network
> routers, with memory hungry iptables, memory could be an issue.

Sure, but if they are that constrained they probably do not run HTB in the first place. We are talking about 4k initially, up to 256k worst case (or 512k if your router is 64bit, which is unlikely if small is a priority).

> And the worst memory consumption case considered by Patrick should be
> relatively unlikely. Anyway, such an approach, assuming that most users do
> something this (reasonable) way, doesn't look like good programming
> practice.

The current hash algorithm also assumes certain usage patterns, namely that you choose classids that generate different hash keys (= distribute uniformly across the buckets), or scalability will suffer very quickly. Even at 64 classes you would probably see htb_find() near the top of a profiling analysis. But I would say that it is just as unlikely as choosing 64 classids that cause my patch to allocate all 256k. In these unlikely cases, my patch only wastes passive memory, while the current HTB wastes CPU to a point where it can severely limit routing performance.

> I wonder, why not try, at least for a while, to make this a compile
> (menuconfig) option with a comment: recommended for a large number of
> classes. After hash optimization and some testing, final decisions could
> be made.

I decided not to do it because it would mean too many ifdefs (ugly + unmaintainable code).

Regards
Simon

___
LARTC mailing list
LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc
[LARTC] Re: [PATCH] HTB O(1) class lookup
On Thursday 01 February 2007 12:30, Andi Kleen wrote:
> Simon Lodal [EMAIL PROTECTED] writes:
>> Memory is generally not an issue, but CPU is, and you can not beat the
>> CPU efficiency of plain array lookup (always faster, and constant time).
>
> Actually that's not true when the array doesn't fit in cache. The cost of
> going out to memory over caches is so large (a factor of 100 and more)
> that algorithms with a smaller cache footprint can often easily beat
> algorithms that execute far fewer instructions, if they have fewer cache
> misses. That is because not all instructions have the same cost; anything
> in core is very fast, but going out to memory is very slow.
>
> That said, I think I agree with your analysis that a two-level array is
> probably the right data structure for this, and likely not less cache
> efficient than the hash table.

Good point. The 2-level lookup generates 3 memory accesses (including getting at the htb_class struct). List traversal will generate many more memory accesses, unless the lists have 3 or fewer entries (currently that only holds true for up to 48 classes, uniformly distributed).

It is difficult to judge if the tables will be in cache or not. The tables are clearly extra baggage for the cachelines, compared to only having the htb_class structs (which are going to be fetched anyway). On the other hand, if you have 10k classes, they are usually (in the real world) allocated for individual users, of which at most half are active at any time. With hashing, all 10k classes are fetched into cachelines all the time, only in order to traverse lists. That is 150k of wasted cache (5000 x 32 bytes); plenty for 10k pointers in lookup tables.

Regards
Simon
[LARTC] Re: [PATCH] HTB O(1) class lookup
On Thursday 01 February 2007 07:08, Patrick McHardy wrote:
> Simon Lodal wrote:
>> This patch changes HTB's class storage from hash+lists to a two-level
>> linear array, so it can do constant time (O(1)) class lookup by classid.
>> It improves scalability for large numbers of classes.
>>
>> Without the patch, ~14k htb classes can starve a Xeon-3.2 at only 15kpps,
>> using most of its cycles traversing lists in htb_find(). The patch
>> eliminates this problem, and has a measurable impact even with a few
>> hundred classes.
>>
>> Previously, scalability could be improved by increasing HTB_HSIZE,
>> modifying the hash function, and recompiling, but this patch works for
>> everyone without a recompile and scales better too.
>
> I agree that the current fixed sized hashes (additionally quite small by
> default) are a big problem with many classes, for all of HTB/HFSC/CBQ.
> But I think your approach is a bit wasteful; with unfortunately chosen
> classids, 128 classes are enough to reach the maximum memory usage of
> ~512kb (with 4k pages and 8 byte pointers).

I think it is a non-issue since it does not happen in practice. Generally there are two ways to assign classids:

1) Manually, used when you have few classes. People usually use 100, 101, 102, 200, 201 etc (probably unaware that they are hex). With 4k pages and 32bit pointers, everything below classid 400 is within the first page, which covers most few-classes examples you can find lying around.

2) Script generated; in practice this is required if you have many classes. The classids will then usually be sequential, at least in large chunks, which means minimal memory waste, and an optimal case for plain linear lookup; hashing them can only be wasteful.

> I have a patch for HFSC which introduces dynamic resizing of the class
> hash. I have planned to generalize it (similar to tcf_hashinfo) and
> convert HTB and CBQ as well, which as a nice side effect will allow
> getting rid of some duplicated code, like hash walking.

I have not been looking into HFSC and CBQ, and was not aware that they have similar issues.

> If you give me a few days I'll try to finish and post it.

Memory is generally not an issue, but CPU is, and you can not beat the CPU efficiency of plain array lookup (always faster, and constant time).

If anything, I would find it more relevant to use array lookup with dynamic adjustment of the array size (HTB_CLS_ARRAY_SIZE in my patch); start out small to waste less memory, and increase up to PAGE_SIZE as needed. But then, it is probably too much effort for the gain (a few kb in machines that should have plenty of RAM anyway), and requires more code = more complexity, bugs, maintenance.

Regards
Simon
[LARTC] [PATCH] HTB O(1) class lookup
This patch changes HTB's class storage from hash+lists to a two-level linear array, so it can do constant time (O(1)) class lookup by classid. It improves scalability for large numbers of classes.

Without the patch, ~14k htb classes can starve a Xeon-3.2 at only 15kpps, using most of its cycles traversing lists in htb_find(). The patch eliminates this problem, and has a measurable impact even with a few hundred classes. Previously, scalability could be improved by increasing HTB_HSIZE, modifying the hash function, and recompiling, but this patch works for everyone without a recompile and scales better too.

The patch is for 2.6.20-rc6; I have older ones for 2.6.18 and 2.6.19 if anyone is interested.

Signed-off-by: Simon Lodal [EMAIL PROTECTED]

--- linux-2.6.20-rc6.base/net/sched/sch_htb.c	2007-01-25 03:19:28.0 +0100
+++ linux-2.6.20-rc6/net/sched/sch_htb.c	2007-02-01 05:44:36.0 +0100
@@ -68,16 +68,37 @@
     one less than their parent.
 */
 
-#define HTB_HSIZE 16		/* classid hash size */
-#define HTB_EWMAC 2		/* rate average over HTB_EWMAC*HTB_HSIZE sec */
-#define HTB_RATECM 1		/* whether to use rate computer */
+#define HTB_MAX_CLS		(TC_H_MIN(-1)+1)
+#define HTB_CLS_ARRAY_SIZE	PAGE_SIZE
+#define HTB_CLS_PER_ARRAY	(HTB_CLS_ARRAY_SIZE / sizeof(void*))
+#define HTB_CLS_ARRAYS		(HTB_MAX_CLS / HTB_CLS_PER_ARRAY)
+#define HTB_CLS_ARRAY(CID)	(CID / HTB_CLS_PER_ARRAY)
+#define HTB_CLS_INDEX(CID)	(CID & (HTB_CLS_PER_ARRAY-1))
+
 #define HTB_HYSTERESIS 1	/* whether to use mode hysteresis for speedup */
-#define HTB_VER 0x30011		/* major must be matched with number suplied by TC as version */
+#define HTB_VER 0x30012		/* major must be matched with number suplied by TC as version */
 
 #if HTB_VER >> 16 != TC_HTB_PROTOVER
 #error "Mismatched sch_htb.c and pkt_sch.h"
 #endif
 
+/* Whether to use rate computer. This is only used for statistics output to
+   userspace (tc -s class show dev ...); if you do not care about that and
+   want the last bit of performance, disable this. */
+#define HTB_RATECM
+
+#ifdef HTB_RATECM
+/* Time between rate computation updates, in seconds, for each class. */
+#define HTB_RATECM_UPDATE_INTERVAL 16
+/* How many HTB_RATECM_UPDATE_INTERVAL intervals to average over when
+   computing the rate. If set to 1, the computed rate will be exactly the
+   observed rate of the last interval. If set to higher values, the computed
+   rate will converge slower, but also vary less with small, temporary
+   changes in traffic. */
+#define HTB_RATECM_AVERAGE_INTERVALS 2
+#endif
+
 /* used internaly to keep status of single class */
 enum htb_cmode {
 	HTB_CANT_SEND,		/* class can't send and can't borrow */
@@ -104,7 +125,7 @@
 	/* topology */
 	int level;		/* our level (see above) */
 	struct htb_class *parent;	/* parent class */
-	struct hlist_node hlist;	/* classid hash list item */
+	struct list_head clist;		/* classid list item */
 	struct list_head sibling;	/* sibling list item */
 	struct list_head children;	/* children list */
@@ -165,9 +186,13 @@
 	return rate->data[slot];
 }
 
+typedef struct htb_class* htb_cls_array[HTB_CLS_PER_ARRAY];
+
 struct htb_sched {
-	struct list_head root;	/* root classes list */
-	struct hlist_head hash[HTB_HSIZE];	/* hashed by classid */
+	struct list_head root;	/* root classes list */
+	struct list_head clist;	/* all classes list */
+	/* all classes arrays for fast lookup */
+	htb_cls_array* classes[HTB_CLS_ARRAYS];
 	struct list_head drops[TC_HTB_NUMPRIO];	/* active leaves (for drops) */
 	/* self list - roots of self generating tree */
@@ -198,8 +223,10 @@
 	psched_time_t now;	/* cached dequeue time */
 	struct timer_list timer;	/* send delay timer */
 #ifdef HTB_RATECM
-	struct timer_list rttim;	/* rate computer timer */
-	int recmp_bucket;	/* which hash bucket to recompute next */
+	struct timer_list rttim;	/* rate computer timer */
+	int clscnt;		/* total number of classes */
+	struct list_head *rtcur;	/* current class to update rate timer for */
+	int rtiter;		/* current iteration (1..UPDATE_INTERVAL) */
 #endif
 	/* non shaped skbs; let them go directly thru */
@@ -209,32 +236,22 @@
 	long direct_pkts;
 };
 
-/* compute hash of size HTB_HSIZE for given handle */
-static inline int htb_hash(u32 h)
-{
-#if HTB_HSIZE != 16
-#error "Declare new hash for your HTB_HSIZE"
-#endif
-	h ^= h >> 8;	/* stolen from cbq_hash */
-	h ^= h >> 4;
-	return h & 0xf;
-}
-
-/* find class in global hash table using given handle */
+/* find class in class arrays using given handle */
 static inline struct htb_class *htb_find(u32 handle, struct Qdisc *sch)
 {
 	struct htb_sched *q = qdisc_priv(sch);
-	struct hlist_node *p;
-	struct htb_class *cl;
+	htb_cls_array* a;
+	int minorid;
 
 	if (TC_H_MAJ(handle) != sch->handle)
 		return NULL;
 
-	hlist_for_each_entry(cl, p, q->hash + htb_hash(handle), hlist) {
-		if (cl->classid == handle)
-			return cl;
-	}
-	return NULL;
+	minorid = TC_H_MIN(handle);
+	a = q->classes[HTB_CLS_ARRAY(minorid)];
+	if (a == NULL)
+		return
Re: [LARTC] how to setup massive traffic shaping? (2 class B nets)
If you use HTB, you need to compile it with HTB_HSIZE set to at least 256 (in sch_htb.c). Otherwise your CPU will be fully loaded by even a few kpps of traffic. The problem is how HTB stores the classes; it is not very efficient when there are thousands of them. I do not know if other qdiscs have the same problem. I am working on a better patch for that, but it is not ready yet.

Regards,
Simon

On Tuesday 19 September 2006 13:42, Тимур Сафин wrote:
> Hello
>
> I have 2 class-B networks (172.22.0.0/16 and 172.23.0.0/16, over 130k of
> ip's) and need to set up tbf traffic shapers with 64kb/s for each ip from
> 172.22.0.0/16 and 128kb/s for each ip from 172.23.0.0/16. I have just
> read LARTC and don't understand how to use u32 for decreasing the number
> of rules and hashing.
Re: [LARTC] process id with firewall and tc
Routing, firewalling and shaping run in the kernel and have no pid. Instead you can get/set /proc flags, and check for the presence of certain data structures.

/proc/sys/net/ipv4/ip_forward is the routing master switch. If 0, the machine forwards nothing. You can both set and get the value, which should be relatively easy from a web page. Beware that setting it to 1 may reset other /proc keys to default values.

For iptables firewalling you probably need to check if your rules are loaded or not; a script parsing 'iptables -nL' output could do it. Or you could use a condition match enabled at the beginning of each table, and drop all traffic if the condition is false. The /proc/net/ipt_condition/enabled value can then be read and set as a master switch from the web page.

Shaping has no /proc files, and no way to create a master switch, so you need a script that parses 'tc qdisc show dev eth0' or 'tc class show dev eth0' output.

Regards,
Simon

On Saturday 16 September 2006 15:38, William Bohannan wrote:
> Not sure this is the correct place to post this, but I am looking to have
> the status of the firewall and traffic control (active, disabled, stopped
> etc) on a webpage, controlled via something like a pid, as the machine
> has many things running on it: firewall, traffic control, data collection
> for graphing the traffic flows, as well as other services like squid etc.
> Any ideas would be most helpful.
>
> Kind Regards
> William
Re: [LARTC] Performance problem on a linux bridge used for shaping.
I have similar hardware, load and trouble.

Interrupts are only sent to one CPU, instead of all of them, because spreading them turned out to be pure overhead. I think the default was changed somewhere around 2.6.10 or .12, but I have forgotten the url.

There is a CONFIG_IRQBALANCE option in the kernel, but last time I checked (2.6.16) it did not work very well; it almost never does anything. So I have turned it off. I use the userspace irqbalance daemon that periodically sets the smp_affinity's, with the effect that ingoing and outgoing traffic are each handled by their own CPU (assuming that no other interrupts or processes are significant). That helps some. But we only shape in one direction, and it can not help spread the shaping load between the CPUs.

There is also an acpi_irq_balance kernel parameter (not related to the kernel irq balancer), which apparently uses the APIC to do interrupt round-robin. It worked surprisingly well (perfectly, actually) on an old dual Celeron that I tested; the network interrupts are spread nicely and evenly across the CPUs. It is probably very chipset dependent, and I have not yet tested it on the firewalls. But as I understand from this: http://vger.kernel.org/~davem/cgi-bin/blog.cgi/2006/09/14#netconf2006_day2 it might not even be an advantage, since the current shaping code would just make the CPUs step on each other's toes.

Regards,
Simon

On Saturday 16 September 2006 20:39, Alexandru Dragoi wrote:
> Hello,
>
> Here is the situation. There is a machine with 3 Intel gigabit cards, 2
> of them on PCI-X and in a bridge; the 3rd is used only for management
> access. The machine is a dual Xeon 2.8GHz with HT. With the 2.6.8 kernel
> from debian (testing) and htb with u32 on, I usually get about 30-40%
> software interrupts on CPU0 and CPU2, and without htb and u32, 10% less.
>
> Now, if I boot the 2.6.17.9 kernel, at first all irqs are on the same
> CPU. I managed with smp_affinity to move the irq of one card to a
> different CPU. In these circumstances, I have about 20% or a little less
> soft interrupts on each CPU without shaping, but about 60-70% of soft
> interrupts with shaping, and sometimes there is packet loss; dropped
> packets are also shown in ifconfig. The htb script is the same. I have
> u32 performance counters enabled on u32 in 2.6.17.9. I also have NAPI, of
> course. I can't think of anything else that can cause the problem, but it
> seems to be something in the kernel.
>
> Here is the output of lspci:
> # lspci
> 0000:00:00.0 Host bridge: Intel Corporation E7320 Memory Controller Hub (rev 0a)
> 0000:00:00.1 ff00: Intel Corporation E7320 Error Reporting Registers (rev 0a)
> 0000:00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 0a)
> 0000:00:03.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A1 (rev 0a)
> 0000:00:1c.0 PCI bridge: Intel Corporation 6300ESB 64-bit PCI-X Bridge (rev 02)
> 0000:00:1d.0 USB Controller: Intel Corporation 6300ESB USB Universal Host Controller (rev 02)
> 0000:00:1d.1 USB Controller: Intel Corporation 6300ESB USB Universal Host Controller (rev 02)
> 0000:00:1d.4 System peripheral: Intel Corporation 6300ESB Watchdog Timer (rev 02)
> 0000:00:1d.5 PIC: Intel Corporation 6300ESB I/O Advanced Programmable Interrupt Controller (rev 02)
> 0000:00:1d.7 USB Controller: Intel Corporation 6300ESB USB2 Enhanced Host Controller (rev 02)
> 0000:00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 0a)
> 0000:00:1f.0 ISA bridge: Intel Corporation 6300ESB LPC Interface Controller (rev 02)
> 0000:00:1f.2 IDE interface: Intel Corporation 6300ESB SATA Storage Controller (rev 02)
> 0000:00:1f.3 SMBus: Intel Corporation 6300ESB SMBus Controller (rev 02)
> 0000:03:02.0 Ethernet controller: Intel Corporation 82544EI Gigabit Ethernet Controller (Copper) (rev 02)
> 0000:03:03.0 Ethernet controller: Intel Corporation 82544EI Gigabit Ethernet Controller (Copper) (rev 02)
> 0000:04:02.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
> 0000:04:03.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet Controller (rev 05)
>
> The traffic is somewhere at 40kpps and 120mbit up, 120mbit down.
> Suggestions about better hardware and kernel, or links with docs about
> these, are really welcome.
>
> Bye
[LARTC] Large number of HTB classes
I am planning a setup with thousands of classes in an HTB qdisc, say from 1:1000 to 1:2000, each with a very small rate and a big ceil, for fair sharing of a 45mbit link. I suspect some problems could be lurking in there. Anyone having good/bad experience with such a number of classes?

Simon

___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc
HOWTO: http://lartc.org/
Re: [LARTC] Large number of HTB classes
HFSC seems interesting, but does it really work well? I am not afraid of new stuff, but bleeding edge is probably too risky.

That HTB problem: I guess you mean that it is possible to have available bandwidth, but once some of it has been distributed between users, all users' ceils go below their quantums? But that implies that users have low ceils? Wouldn't it be solved by setting all users' ceils to the full link bandwidth?

The doubly linked list patch you mention, I believe, should have been in the vanilla kernel since 2.4.20-22 or something? I use 2.4.27 here. If not, do you have a link for it?

Simon

Tomasz Paszkowski wrote:
> On Fri, Aug 27, 2004 at 10:46:59AM +0200, Simon Lodal wrote:
>> I am planning a setup with thousands of classes in a HTB qdisc, say from
>> 1:1000 to 1:2000, each with a very small rate and a big ceil, for fair
>> sharing of a 45mbit link.
>
> Consider using HFSC. HTB is not the best solution for such a number of
> classes with small rates. Users will not be able to get the whole ceil
> even if there is available bandwidth. Secondly, make sure you're using a
> kernel with the doubly linked list patch applied to the qdisc api.