Re: [LARTC] Re: HFSC
Patrick McHardy wrote:
> Nuutti Kotivuori wrote:
>> Patrick McHardy wrote:
>>
>> BTW the Blue scheduler patch for 2.6 seems to be working nicely -
>> but I haven't had the time to run the tests on it that I wished to,
>> so I haven't posted anything further about it.
>
> I have in the mean time read up on SFB, maybe I'll extend blue when
> I find the time.

That reminds me - I have an "extension" to Blue I'd like to try and
cook up, if I ever manage to get the time: ingress Blue. Basically
just a token bucket on ingress, like traffic policing has - but using
Blue on top of it. Running out of tokens means packet drop, so
increase the probability; the bucket overflowing with tokens means
the link is idle, so decrease the probability.

I have a feeling something like that might work well when trying to
reduce packet queueing at the ISP on a slow inbound link - better
than the usual strict ingress police or using IMQ with RED and such.

> PSCHED_GETTIMEOFDAY (or PSCHED_CPU in case of the kernel) are
> important for HFSC to work properly, PSCHED_JIFFIES has too low
> resolution.

That might be what was messing up my other simulations as well -
thanks for the heads up, I will see what comes out of that.

[...]

> I know what the problem is. Try:
>
>   ip link set lo txqueuelen 1

Works like a dream!

> or upgrade to 2.6.4. The problem got introduced when fixing
> an off-by-one in pfifo_fast, before it would enqueue one packet
> with a txqueuelen of 0. In 2.6.4 this behaviour is restored,
> although it's a misconfiguration anyway to use leaf-queues with
> a limit of 1 for anything but well-formed flows.

Figures... :-) So I just got bitten by the default txqueuelen of 0
for lo, which I didn't even happen to think about. Time to upgrade to
2.6.4 as well, it seems.

-- Naked

___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/
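The ingress-Blue behaviour described in the message above could be
sketched as follows. This is an editorial illustration of the idea,
not code from any kernel patch; the class name, constants, and method
signatures are all invented:

```python
class IngressBlue:
    """Sketch of the 'ingress Blue' idea: a token bucket whose
    empty/overflow events drive a Blue-style drop probability.
    All names and constants here are illustrative."""

    def __init__(self, rate, bucket, step=0.02):
        self.rate = rate        # tokens (bytes) refilled per second
        self.bucket = bucket    # bucket depth in bytes
        self.tokens = bucket
        self.p = 0.0            # Blue drop probability
        self.step = step        # probability adjustment per event

    def refill(self, dt):
        self.tokens += self.rate * dt
        if self.tokens >= self.bucket:
            # bucket overflowing with tokens: link idle, back off
            self.tokens = self.bucket
            self.p = max(0.0, self.p - self.step)

    def enqueue(self, size, rnd):
        if self.tokens < size:
            # out of tokens: congestion, raise drop probability
            self.p = min(1.0, self.p + self.step)
            return False        # drop
        if rnd < self.p:
            return False        # probabilistic early drop
        self.tokens -= size
        return True             # accept


b = IngressBlue(rate=1000, bucket=1500)
```

The point of the sketch is only that the two bucket events map onto
Blue's two probability updates; a real qdisc would of course work in
jiffies and skb sizes rather than floats.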
Re: [LARTC] Re: HFSC
Patrick McHardy wrote:
> Late reply but here it is ;)

No worries - it wasn't exactly brief and I did have other stuff to
spend my time on.

BTW the Blue scheduler patch for 2.6 seems to be working nicely - but
I haven't had the time to run the tests on it that I wished to, so I
haven't posted anything further about it.

> Nuutti Kotivuori wrote:
>> Patrick McHardy wrote:

[...]

> I think it can be expressed easier like this:
>
> b(t) =
> {
>   m1 * t                 t <= d
>   m1 * d + m2 * (t - d)  t > d
> }
>
> b_rt(t) <= b_ls(t) <= b_ul(t) for all t >= 0

Yes, certainly - I just wished to eliminate t from it all.

> No error is signalled when these are violated.

Right.

> The latter is correct, the class will participate in link-sharing,
> but will only be selected by the real-time criterion under full
> load. It will also be punished later wrt. excess bandwidth as long
> as the parent constantly stays active.

Ah, yes. Makes perfect sense.

> It will still respect the upper-limit curve, but I'm not sure about
> the consequences for sibling-classes and the parent's active child
> list, I need to think about this some more. In any case it's not
> advisable to do so.

Okay.

> The sum of all realtime service curves must not be bigger than the
> service curve of the link itself, otherwise the service can't be
> guaranteed.

Nod.

> For link-sharing curves, it's actually not important that they don't
> exceed their parent because they only define a share, not an
> absolute amount of service. Only the relative differences between
> siblings matter.

Makes sense.

> Adding n curves gives you (in the worst case) an (n+1)-ary curve,
> you can calculate it like this:
>
> sc1: m1 = 100kbit, d = 1s,    m2 = 200kbit
> sc2: m1 = 50kbit,  d = 0.25s, m2 = 300kbit
> sc3: m1 = 200kbit, d = 1.5s,  m2 = 500kbit
>
> m =
> {
>   350kbit   d <= 0.25s
>   600kbit   0.25s < d <= 1s
>   700kbit   1s < d <= 1.5s
>   1000kbit  d > 1.5s
> }

Right.
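Patrick's worked addition example can be reproduced with a short
script. This is a hypothetical helper, not an existing tool; only the
three example curves and the resulting slopes come from the mail:

```python
def sum_curves(curves):
    """Sum two-piece service curves given as (m1, d, m2) tuples.
    Returns (start, end, slope) segments of the resulting
    piecewise-linear curve, one per interval between breakpoints."""
    breaks = sorted({d for _, d, _ in curves})
    edges = [0.0] + breaks + [float('inf')]
    segments = []
    for lo, hi in zip(edges, edges[1:]):
        # a curve contributes m2 once its breakpoint d is behind us
        slope = sum(m2 if d <= lo else m1 for m1, d, m2 in curves)
        segments.append((lo, hi, slope))
    return segments


# Patrick's example, slopes in kbit: gives 350 up to 0.25s,
# 600 up to 1s, 700 up to 1.5s, and 1000 beyond.
segs = sum_curves([(100, 1.0, 200), (50, 0.25, 300), (200, 1.5, 500)])
```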
I think there's a need for a small tool to make these calculations -
and perhaps even to automatically scale other curves to maintain the
restrictions. But that is for the future; manual calculation will do
for now.

> If it is possible to fulfill all demands with the available
> excess-bandwidth then there is no difference. The real difference is
> of a different kind. A parent's link-sharing service curve might be
> violated by the real-time criterion used for one of its
> children. The parent's siblings will suffer from this as well
> (link-sharing wise) because they share the same parent, and part of
> the service given to all siblings of this parent has been used in
> violation of link-sharing, so link-sharing only leaves will
> suffer. An example for this is in the HFSC paper on page 6.

Right, I think I understand this now.

>>> - The sum of all real-time curves must not exceed 100%.

[...]

> Yes, actually link capacity. Already explained above.
>
>> And what happens if they do?

[...]

> Nothing bad will happen, only the guarantees can't be met anymore.
> It will still pick the class with the smallest deadline.

Okay, this is good to know.

>> If only relative difference matters, why must link-sharing service
>> curves be larger than real-time service curves? And smaller than
>> upper-limit service curves?
>
> They don't. It just makes it simpler to assure that the service
> given to a class is at least the amount defined by the real-time
> curve, which is usually what you want.

Exactly.

>> I also seem to have a lot of difficulties in trying to simulate the
>> behaviour of the qdisc.

[...]

> Have you applied the patches from trash.net/~kaber/hfsc/tcsim ?
> With HZ=1000 and PSCHED_GETTIMEOFDAY as clocksource I got very
> good results.

I tried both HZ=1000 and HZ=100, and the results were odd. But I
think I didn't touch the clocksource at all. I will try later on with
PSCHED_GETTIMEOFDAY as well.
>> Also, something as trivial as this:
>>
>>   tc qdisc add dev $DEV root handle 1: hfsc default 1
>>   tc class add dev $DEV parent 1: classid 1:1 hfsc rt m2 100kbps
>>
>> seems to work for 'eth0' but not the 'lo' interface, whereas for
>> example the 'tbf' qdisc does work for 'lo' as well. If I run those
>> commands on 'lo', every packet shows up as dropped by the qdisc.
>
> Works ok here .. do you mean inside tcsim ?

No, I don't mean inside tcsim. Here is a full transcript:

shiro:~# export DEV=lo
shiro:~# tc -s -d qdisc show dev $DEV
shiro:~# ping localhost -
[LARTC] Re: Any danger in thrashing 'tc'?
David McNab wrote:
> Is there any danger in a prog which repeatedly clears the ingress
> and root egress qdiscs, and sets up new ones, even as frequently as
> every 5-15 seconds?

[...]

> As you can see, the prog will be frequently spitting heaps of tc
> commands, constantly taking down the ingress and root egress qdiscs,
> and creating new ones.
>
> So, am I likely to hit on any unintended consequences (apart from
> the minor cpu spikes)?

When you remove the qdiscs, you will cause all queued packets to be
dropped. When you add qdiscs, you have obviously reset the burst
values for all of them, so each then has a full burst to spend (if
you use bursts anywhere, that is).

This does not happen when you change parameters with 'tc qdisc
change', but not all values can be changed that way. And obviously it
does not happen if you just change iptables marking rules to mark
packets differently.

As for your actual problem - I would suggest looking for some other
solution than mangling rules/qdiscs every few seconds.

-- Naked
Re: [LARTC] Re: Blue and SFB
Patrick McHardy wrote:
> I've completed the port and tested it yesterday, unfortunately it's
> not useable in the real world as is. There is a strong bias against
> non-ECN flows because their packets are simply dropped instead of
> marked. At high load (50 ECN vs. 1 non-ECN flow) and a marking
> probability of about 10% the non-ECN flow simply stalls.
> I can send you the patch if you're interested ..

Thank you, I am really interested. I will try out how it behaves for
me in various circumstances.

What you say, though, is probably true - and the situation is even
more accentuated considering that different TCP stacks react to ECN
and packet drops differently - a single drop percentage will not be
enough.

Which of course brings us to SFB - with Stochastic Fair Blue, the
drop percentage for the non-ECN flow should be significantly lower
and the connections should transfer more or less fairly.

-- Naked
[LARTC] Re: Blue and SFB
Patrick McHardy wrote:
> There is a blue implementation for Linux at
> http://home.sch.bme.hu/~bartoki/projects/thesis/2001-May-26/

Nice! I briefly scanned the implementation and it didn't look too
bad. There are some oddities here and there (such as changing HZ from
100 to 1024??). But, as the implementation is rather old, it would
require a complete overhaul for 2.6, I think.

It is a shame it wasn't worked into the Linux kernel when it was
still current, as I think the algorithm could have a lot of uses.
What is the process of getting new traffic schedulers into the
kernel? I guess most of the netfilter stuff goes through netfilter
development and patch-o-matic - but there isn't anything similar for
QoS, is there?

> BTW: Regarding your remaining HFSC questions (I just discovered
> them on lartc), I'm going to answer them tomorrow.

No rush. I did realize that I posted the reply only to the list
through gmane, but didn't consider it important enough to resend
directly.

-- Naked
[LARTC] Blue and SFB (Was: how to add my own traffic scheduler to TC)
Kennedy Cheng wrote:
> What are the steps needed to add my own traffic scheduler to TC?

From your small example there, I would guess you are working on a
Blue scheduler for TC. Hence I'd like to ask publicly from everyone
around - has anyone seen any efforts to implement Blue[1]? What about
SFB, i.e. Stochastic Fair Blue?

The Blue algorithm seems really straightforward to implement, so I
really wonder why it hasn't been done already. SFB is of course more
complex in several ways, but perhaps a lot of code could be reused
from SFQ, since the basic idea of hashing flows is the same.

Thanks,
-- Naked

Footnotes:
[1] http://www.thefengs.com/wuchang/work/blue/
[LARTC] Re: how do you rate limit routable traffic without rate limiting LAN protocols like arps and igmp?
Michael A. D'Annunzio wrote:
> I know if I remove the default parameter, traffic not matching
> any filter is sent over the root queue, but I need to have a defined
> default.

Define the default as whatever class you wish non-IP traffic to fall
into, and then filter _all_ IP traffic into a certain class, e.g.:

  tc filter add dev $DEV parent 1: protocol ip prio 18 u32 \
    match ip dst 0.0.0.0/0 flowid 1:20

Or, even more modularly, use the MARK target at the end of your
iptables ruleset, after all other markings:

  iptables -t mangle -A POSTROUTING -m mark --mark 0 -j MARK --set-mark 3
  iptables -t mangle -A OUTPUT -m mark --mark 0 -j MARK --set-mark 3

And then just set up your tc to match the marks to classes:

  tc filter add dev $DEV parent 1: protocol ip prio 10 \
    handle 3 fw \
    flowid 1:30

Or a variety of other solutions. I am not sure if this will solve
your problem exactly, though - since having problems with ARP traffic
seems really odd.

-- Naked
[LARTC] Re: Strange tc issue
Roy Walker wrote:
> I guess the issue centers around every linux box I have shows the
> pfifo_qdisc when I do an 'ip link show'. But when I do a tc it does
> not do this. This definitely tells me that tc is not reading the
> qdiscs properly which also probably means it is not setting them
> right either. The problem is that it compiles properly and does not
> give me any errors during build.
>
> This is really starting to drive me crazy. Appreciate your help.

The things you say seem partly contradictory, and it is really not
clear what the problem is - however, I will give an example of what
things do on *my* system so you can spot the discrepancy between our
systems.

,----
| ... no configuration ...
| shiro:~# ip link show dev eth0
| 7: eth0: mtu 1500 qdisc pfifo_fast qlen 100
|     link/ether 00:30:1b:ae:6a:66 brd ff:ff:ff:ff:ff:ff
| shiro:~# tc qdisc show dev eth0
| qdisc pfifo_fast 0: [Unknown qdisc, optlen=20]
| shiro:~# tc class show dev eth0
| shiro:~# tc filter show dev eth0
|
| ... add a qdisc ...
| shiro:~# tc qdisc add dev eth0 root pfifo limit 100
|
| ... show configuration again ...
| shiro:~# ip link show eth0
| 7: eth0: mtu 1500 qdisc pfifo qlen 100
|     link/ether 00:30:1b:ae:6a:66 brd ff:ff:ff:ff:ff:ff
| shiro:~# tc qdisc show dev eth0
| qdisc pfifo 8002: limit 100p
| shiro:~# tc class show dev eth0
| shiro:~# tc filter show dev eth0
`----

And no loss of connectivity to anywhere, or anything of the like.

If this does not work for you, then there is something seriously
wrong either with your kernel or with tc. If it does work for you, I
suggest specifying exactly which commands you are running that cause
your problems - there might be something wrong there.

-- Naked
[LARTC] Re: HFSC
Patrick McHardy wrote:
> The combinations you list are correct. Real-time curves are only
> valid for leaf-classes, whereas link-sharing and upper-limit curves
> are valid for all classes in the hierarchy.

Right, after a bit of experimentation and thinking, I realized this.

> When multiple curves are used, the following must hold:
> rt <= ls <= ul

If this is for all t, in practice this means:

  if d > 0
    m1(rt) <= m1(ls) <= m1(ul)
    m2(rt) <= m2(ls) <= m2(ul)
    if m1 < m2
      d(rt) >= d(ls) >= d(ul)
    elsif m1 > m2
      d(rt) <= d(ls) <= d(ul)
    else
      d irrelevant
  else
    m1 irrelevant
    m2(rt) <= m2(ls) <= m2(ul)

Am I correct? What happens if these values are violated? Are any
errors signalled?

Also, I have very little clue why these must hold as such. Obviously
if a link-sharing curve is smaller than the real-time curve, then
when the class participates in link-sharing, it would have to have
sent less than it already has. But if this is so, does the algorithm
break totally, or does it only mean that the class does not
participate in link-sharing before the excess bandwidth share for the
class, based on the relative link-sharing service curve, goes above
the real-time service curve? The latter would not necessarily be an
unwanted behaviour.

Then if the upper-limit service curve is smaller than the
link-sharing curve, what would this cause? Naive assumptions lead me
to think it would merely mean that the class participates in
link-sharing based on the relative service curve it has, but never
ends up taking more than what the upper-limit service curve
dictates. E.g. in a case with a relatively large link-sharing service
curve and a smaller upper-limit service curve, the class would get a
big share out of a small amount of excess bandwidth shared, but as
the bandwidth to share is increased, the upper-limit service curve
will cap it at a constant limit. Or again, does the algorithm break
somehow?
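The rt <= ls <= ul condition can also be sanity-checked numerically
from the two-piece definition of b(t). This is an illustrative
sketch, not part of tc; the sampled comparison is a crude stand-in
for an exact piecewise check:

```python
def b(m1, d, m2):
    """Two-piece service curve b(t): slope m1 until d, then m2."""
    return lambda t: m1 * t if t <= d else m1 * d + m2 * (t - d)

def leq(f, g, horizon=10.0, samples=1000):
    """Check f(t) <= g(t) on a sampled grid up to `horizon` seconds."""
    return all(f(i * horizon / samples) <= g(i * horizon / samples) + 1e-9
               for i in range(samples + 1))


# e.g. same m1 and d, larger m2: the ls curve dominates the rt curve
rt = b(100, 1.0, 200)
ls = b(100, 1.0, 300)
```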
And I am even more confused when I think about what this means for
the interior classes and their service curves. Apparently the service
curves for parent classes are respected, but does that mean that the
service curve for a parent class would have to be equal to or larger
than the sum of all child classes' service curves? If so, how does
one calculate this? It would be an n-ary curve if calculated
exactly. Or if not, then what happens if the service curves of the
child classes exceed the service curve of the parent class? Obviously
here we are talking only about link-sharing service curves (and
upper-limit service curves), as real-time service curves are always
fulfilled and only handled by leaf classes.

> To understand why there are two (forgetting about upper-limit curves
> for now) different curves you need to know that scheduling in HFSC
> is based on two criteria: the real-time criterion which ensures that
> the guarantees of leaf-classes are met and the link-sharing
> criterion which tries to satisfy the service curves of intermediate
> classes and distributes excess bandwidth fairly. The reason why
> there are two different criteria is that in the Fair Service
> Link-sharing model that is approximated by HFSC it is not always
> possible to guarantee the service of all classes simultaneously at
> all times (with non-linear service curves). HFSC chooses to
> guarantee the service curves of (real-time) leaf-classes (because
> only leaves carry packets), and uses the link-sharing criterion to
> minimize the discrepancy between the actual service received and the
> service defined by the Fair Service Link-sharing Model.

Right. I think I understand this, but I am not so certain of the
implications for actual use.

> The upper-limit curve is used to limit the link-sharing
> curve. Without an upper-limit curve, packets are dequeued at the
> speed the underlying device is capable of.
> For example in the case of software devices, this is not very
> desirable, so you can limit the total output rate.

I came to this conclusion by experimentation. So the upper-limit
service curve can be used to shape link-sharing usage - but real-time
service curves are fulfilled regardless of it? So the end result
would be that a class with only a real-time service curve throttles
itself to that rate, with a link-sharing service curve it becomes
work-conserving, and with an upper-limit service curve it throttles
itself again.

> For your other questions:
> - If you specify only a real-time service curve the class will not
>   participate in link-sharing. This means it can only send at its
>   configured rate. The difference to a link-share+upper-limit curve
>   is that the service is guaranteed.
>
> - If you specify only a link-share curve there are no deadlines and
>   no guarantees can be given.

Right. After a while I realized this. But again I am somewhat
uncertain of the ramifications. If we assume classes that have
real-time service curves equal to link-sharing service curves, and
compare those to classes
[LARTC] Re: tcng version 9l
Werner Almesberger wrote:
> Since I cleaned up so many things for Gentoo yesterday, here's one
> for Debian 3.0. The main problems were:

I still have one more problem with the latest Debian unstable. This
is the failure message I am getting:

  cc -g -Wall -Wstrict-prototypes -Wmissing-prototypes \
    -Wmissing-declarations -I../shared -Iklib -Iklib/include \
    -Iulib/iproute2/include -I. -DVERSION=\"`cat ../VERSION`\" \
    -DTOPDIR=\"/home/naked/src/tcng/tcng\" \
    -DTCC_CMD=\"/home/naked/src/tcng/tcng/bin/tcc\" \
    -DKFULLVERSION=\"2.4.25\" \
    -DKFULLVERSIONNUM=`printf "0x%02x%02x%02x" 2 4 25` \
    -DIVERSION=\"010824\" -c -o tcsim.o tcsim.c
  In file included from /usr/include/bits/sigcontext.h:28,
                   from /usr/include/signal.h:326,
                   from tcsim.c:15:
  /usr/include/asm/sigcontext.h:79: error: parse error before '*' token
  /usr/include/asm/sigcontext.h:82: error: parse error before '}' token

This is caused by having klib/include in the include path, and it
overriding what linux/compiler.h is. If I add:

  #define __user
  #define __kernel

to tcsim/klib/include/linux/compiler.h, everything works perfectly
and the tests pass (Passed all 1534 tests (24 conditional tests
skipped)).

-- Naked
[LARTC] Re: HFSC
Patrick McHardy wrote:
> This is currently all there is. If you have some specific questions,
> just ask (but please CC lartc). If anyone wants to write some
> documentation I'd be happy to help, but I don't have time for it
> myself.

I am not sure if the original poster has specific questions, but I
sure do. I just recently got into this HFSC mess myself, so I'm a bit
fuzzy on all the terms and differences in implementation.

I read the paper (SIGCOMM97) on HFSC and I think I understood most of
it. But there are some things in the implementation that I couldn't
really figure out. I'll quote the usage here for reference:

,----
| Usage: ... hfsc [ rt SC ] [ ls SC ] [ ul SC ]
|
| SC := [ [ m1 BPS ] [ d SEC ] m2 BPS ]
|
|  m1 : slope of first segment
|  d  : x-coordinate of intersection
|  m2 : slope of second segment
`----

Okay, the SC parameters I think I understand rather well - they are
there to define the service curve itself. But the way hfsc takes
three optional service curves puzzles me. I believe 'rt' refers to
'Real-Time Service Curve', 'ls' to 'Link Sharing Service Curve' and
'ul' to 'Upper Limit Service Curve'.

If I understand correctly, the SIGCOMM97 paper mentioned that the
link-sharing selection need not be the same as the real-time
selection, but in the examples assumed for simplicity that they
were. Also, from the source I conclude that the 'Upper Limit Service
Curve' cannot be specified without the 'Link Sharing Service Curve'.

So, this, all in all, baffles me :-) The possible combinations I can
make here are:

  Real-Time
  Link-Sharing
  Real-Time, Link-Sharing
  Link-Sharing, Upper-Limit
  Real-Time, Link-Sharing, Upper-Limit

How do these behave? If I specify *only* the real-time curve, what is
used for the link-sharing part? Or does that mean that there is no
sharing? Or if I only specify the link-sharing curve, does that mean
that there are no specific deadlines for packets, just that they are
sent based on the link-sharing model?
And what actually is the upper-limit service curve? I take it that it
is some kind of a packet drop curve, but I don't know how it would
behave - nor why it would require the link-sharing curve.

So, any pointers on these would be helpful - or, if you manage to get
the time, a specific explanation. I will probably cook up at least an
example script using HFSC for normal QoS if I manage to understand
how it works, perhaps even some documentation.

-- Naked
[LARTC] Re: ESFQ Modification
Robert Kurjata wrote:
> Some time ago I faced a problem in limiting traffic on a host with
> multiple uplinks. Since all the stuff worked nicely it seemed that
> there would be no problems. But then I realized that P2P users are
> smart enough to bypass limits, as sfq doesn't give fair sharing in
> this case (thousands of connections from one user versus a few from
> the other). I tried IMQ but its instability in my configuration was
> painful. So I made something like this:
>
> 1. I use the IPMARK patch for iptables to mark all the connections
>    in the P2P related class depending on source IP (I use SNAT),
> 2. modified ESFQ to create the hash depending on FWMARK instead of
>    src ip,
> 3. and it worked. So I have an uplink policy based on source ip in a
>    snat-ed environment without using IMQ.
>
> I'm looking for opinions, cause I may be wrong in this.
> Patch for the files below, cause it's short.

Quite an unorthodox solution, I must say. But I guess it's as valid
as anything. SFQ and ESFQ are usually for situations where you have a
large amount of connections (hashes) that you just cannot track
individually. Hence stochastic - and hence the options for
perturbation and so on.

If there are only a few hashes, as is most likely in your NFMARK
case, most of the time they will hit separate hash buckets - but on
some perturbation, they might hit the same hash bucket and fairness
is not achieved. The patch, from my brief peek, looked rather okay,
though.

Have you looked into the WRR scheduler? It is meant to give an equal
share of bandwidth to all 'local' machines with weighted round-robin
scheduling, and sounds like exactly what you are looking for.

-- Naked
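The collision concern above - a few marks occasionally landing in the
same bucket after a perturbation - is just the birthday problem, and
can be quantified with a few lines (an editorial illustration; bucket
counts are examples, not taken from the patch):

```python
def collision_prob(flows, buckets):
    """Probability that at least two of `flows` distinct marks land
    in the same one of `buckets` hash buckets for a single random
    perturbation (the classic birthday bound)."""
    p_all_distinct = 1.0
    for i in range(flows):
        p_all_distinct *= (buckets - i) / buckets
    return 1.0 - p_all_distinct
```

With a few dozen marks and a 1024-bucket hash, some perturbation
putting two users in one bucket is not a rare event at all, which is
why the hashing-on-fwmark scheme degrades fairness now and then.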
[LARTC] Re: SMP and tc
Andreas Hess wrote:
> I wonder if anyone has run tc on an e.g. dual processor system? As
> far as I know under linux-2.6 it is possible that two processors
> receive and process packets of one NIC. Is this right? And if yes,
> does it work fine?

Yes, it is working fine. There are several locks in the packet
processing code that span all processors though, so it's not entirely
separate, only mostly so.

-- Naked
[LARTC] Anyone using PCI/USB xDSL under Linux?
I am looking to find people who are using either a PCI xDSL card or a
USB xDSL modem right now under Linux. I am not entirely sure where I
should go looking for people who do this, but they definitely exist
now that there are drivers for several modems.

Specifically I am interested in performance and latency. When using
an ethernet-connected xDSL modem, you obviously get the ethernet
latency for transmitting the packet, and you need to limit your send
speed to something below the actual link speed to own the queue. The
ethernet latency is largely insignificant, but owning and having
exact control over the send queue is intriguing.

So, if anyone has any input on the matter or knows where I might look
further, I would be very interested to know.

TIA,
-- Naked
[LARTC] Re: Latency low enough for gaming
Patrick Petersen wrote:
> I have learned a lot from the lartc list archive, but this specific
> one leaves me with no clue. I have been able to get real close to
> normal latency by capping incoming traffic at around 1200kbit, but
> it's no fun throwing away almost half your bandwidth.
>
> Can I get any recommendations?

Let's get the problem statement clear first. First of all, it is
obvious that the high latency is a result of queueing at the ISP,
before the packets are sent over the slow link to your router. ISPs
normally have very long queues.

Secondly, one needs to understand that there isn't really a damn
thing you can do about it. If someone ping-floods you, it will
saturate your downlink and latency will go through the roof. This
cannot be prevented except by having access to your ISP.

And thirdly, the only thing you can do is to either discard or delay
perfectly good packets which have already travelled over your slow
link and spent bandwidth on it. If you drop a packet, it will most
likely have to be resent and will again use up the same
bandwidth. The only good this does is to try to make the connections
throttle themselves when they notice that packets aren't getting
through. TCP does this, and a few application-level UDP protocols do
this, but not much else.

So, your *goal* in a single sentence: force TCP to send packets
slower than your downlink speed. If you can manage this, then no
packets are queued at your ISP and you can prioritise traffic
perfectly on your router.

So, how does TCP work, then? On a connection, TCP has a window size
in both directions, which is the amount of new packets that can be
transferred without getting an acknowledgement for the packets
already sent. Every packet sent is put on a resend queue, and removed
from there when an acknowledgement is received for that packet. If an
acknowledgement doesn't arrive for a while, the packet is resent.
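The relationship between window size, round-trip time, and achievable
rate is the bandwidth-delay product, and it shows why clamping the
advertised window would pace a sender. A numerical sketch (the
function and its numbers are illustrative, not an implementation of
anything):

```python
def window_for_rate(rate_bps, rtt_s, mss=1460):
    """Receive-window ceiling (bytes) that holds a TCP sender to
    roughly rate_bps over a path with round-trip time rtt_s: the
    bandwidth-delay product, floored at one segment."""
    bdp = rate_bps / 8 * rtt_s       # bits/s -> bytes in flight
    return max(mss, int(bdp))


# e.g. a 1500 kbit/s downlink at 100 ms RTT needs only ~18 kB of
# window - far below the traditional 64 kB maximum
w = window_for_rate(1_500_000, 0.1)
```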
So what happens when a packet is dropped is that the connection
stalls for a moment, because a packet is unacknowledged and the send
window limits the amount of packets that can be in transit. TCP
stacks also throttle themselves when they notice that packets are
being dropped.

Traditionally, the maximum window size was 64kB - that is, a maximum
of 64kB of data can be unacknowledged on the link. Then the internet
became full of links which have a large bandwidth, but also lots of
latency. TCP window scaling was invented, and now window sizes can be
much larger than that.

Also, traditionally TCP only acknowledged up to the last contiguous
packet - that is, it wouldn't send acknowledgements for the packets
that arrived after the missing packet. A loss of a single packet
usually caused a short stall in the connection. This was augmented by
cool retransmission logic, which allowed TCP to recover from the drop
of a single packet without a stall. And yet later, selective
acknowledgements were invented, which allow TCP to tell the other end
exactly which packets it is missing, and now TCP survives quite high
packet loss reasonably well.

So, what's the solution? How to make TCP throttle properly? The
*real* solution would be to implement a packet mangler which would
mutilate outgoing TCP ACK packets such that it would only give out
transmission windows matching the speed the link is configured
to. However, to my knowledge, no free software implements this. I
might work up a patch later, if I can come up with a good design.

But, short of implementing the *real* solution, there are several
things you can do to improve the situation. But first, let's see what
is happening now.

Right now, your scripts shove all incoming traffic into an HTB,
inside which the selection of packets happens through ESFQ. The HTB
has to be limited to a rate *smaller* than the actual downlink for it
to have any effect whatsoever. And even so, what you do is that you
queue (e.g.
delay) packets (a maximum of 128 packets as per ESFQ), and then
fairly drop traffic that comes in faster. So what does TCP do about
it? Latency is higher because of queueing at your router, or queueing
at the ISP, so the large window sizes allow for a lot of packets to
be in transit, waiting to be transferred. A bunch of packets are
dropped, so those are retransmitted as soon as possible (at the
arrival of the next selective acknowledgement), again filling up the
queue. TCP will always try to transfer a bit faster than the speed it
can get packets through, to take immediate advantage of improved
conditions.

With a single TCP stream, the queue size at your router or ISP is
negligible, so it doesn't hurt latency much. But when there is a
large amount of connections transferring as fast as they can, there's
a lot of overshooting, and what you described happens - the only way
to prevent queueing at the ISP is to limit the bandwidth to half of
the actual link speed. What
[LARTC] Re: ACK Packet Detection
Alan Ford wrote:
> I'm trying to understand how the wondershaper ACK match works. Can
> somebody help me decode it?
>
> | tc filter add dev $DEV parent 1: protocol ip prio 10 u32 \
> |   match ip protocol 6 0xff \
>
> TCP.

Aye.

> Do these start from the start of the IP header, or the TCP header?

The IP header - there's no state information relayed between matches,
so these matches cannot know that the protocol is TCP.

> | match u8 0x05 0x0f at 0 \
>
> If this is start of TCP header - source port is over 1280?

First byte of the IP packet: the first nibble is the version, the
second nibble is the header length in words. 0x45 is what it is
normally - i.e. a 20-byte IP header, no options. So this just makes
sure there are no IP options on the packet.

> | match u16 0x0000 0xffc0 at 2 \
>
> Something about the destination port, I'm a bit confused by the
> netmask. Surely not "under 64", which is how I'm reading it?
>
> Or, if this is from the start of the IP header, is this packet
> length? Under 64 bytes? Might make more sense...

Length below 64 bytes. TCP has no length field - and the only thing
which separates an ACK packet with no data transmitted from an ACK
packet which carries data as well is indeed the packet length.

> | match u8 0x10 0xff at 33 \
>
> ???
>
> Acknowledgement number starts with 0x10 ?

The ACK bit is on in the TCP flags - and everything else is off.

> | flowid 1:10

That should do it. However, I prefer to do the same thing in
netfilter and then just use that information on the traffic control
side. An example from a 'ferm' script:

  proto tcp tcp-flags ALL ACK length 0:63 MARK setmark 1;

This one is almost identical to the one shown above and much easier
to understand.

-- Naked
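The byte offsets in those u32 matches can be double-checked by
building a minimal IPv4+TCP ACK header by hand; the field values here
(ports, TTL, window) are arbitrary placeholders:

```python
import struct

# 20-byte IPv4 header, no options
ip = struct.pack('!BBHHHBBH4s4s',
                 0x45, 0,       # version/IHL = 0x45, TOS
                 40,            # total length: 20 IP + 20 TCP
                 0, 0,          # id, frag
                 64, 6,         # TTL, protocol = 6 (TCP)
                 0,             # checksum (left zero here)
                 bytes(4), bytes(4))  # src, dst addresses

# 20-byte TCP header, ACK flag only, no payload
tcp = struct.pack('!HHLLBBHHH',
                  1234, 80,     # ports
                  1, 1,         # seq, ack numbers
                  5 << 4,       # data offset = 5 words
                  0x10,         # flags: ACK only
                  8192, 0, 0)   # window, checksum, urgent

pkt = ip + tcp

assert pkt[0] & 0x0f == 0x05                  # u8 0x05 0x0f at 0
assert struct.unpack('!H', pkt[2:4])[0] < 64  # u16 0x0000 0xffc0 at 2
assert pkt[33] == 0x10                        # u8 0x10 0xff at 33
```

Offset 33 is the TCP flags byte: 20 bytes of IP header plus 13 bytes
into the TCP header, which is why the match only works when the IP
header carries no options.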
[LARTC] Re: wondershaper + htb limiting ftp sends
mark ryan wrote:
> Or, ideally, I would like to limit proftpd itself... however there
> doesn't seem to be a way to do that with linux. Windows can but I
> guess Linux can't.

Proftpd does have plenty of ways to limit bandwidth, specifically for
certain commands only and for whichever directions you wish, and all
that. Just peruse the documentation. The operating system has nothing
to do with it.

-- Naked
Re: [LARTC] Re: per-session QoS
Ben wrote:
> On Thu, 2004-02-05 at 18:03, Nuutti Kotivuori wrote:
>
>> What you would wish to do is have a simple per connection token
>> bucket, and just DROP every packet exceeding the rate in the
>> connection, am I right?
>
> I don't want to lose data, so dropping packets definitely seems
> like the wrong thing to do. Unless that's how ingress filters work?
> I haven't used them before.

Dropping packets will not mean losing data - it just means that the
TCP connections have to resend the packets, and in general it means
that the connection will throttle itself to the configured rate.

But ingress filtering as it is now works exactly like that. The
packet that you are receiving has already reached your machine and
you either drop it or accept it. If you wish to do something further,
you can look into IMQ.

> Fortunately I have access to the code of my server application,
> because it sounds like the easiest thing is going to be to just put
> per-session rate limiting into that.

Right, well, it probably is the easiest solution - just note that you
will be working from behind your own receive buffers and TCP windows,
which means that the connection might initially accept (burst) more
data than you expect before the buffers fill.

-- Naked
[LARTC] Re: per-session QoS
Ben wrote:
> Hey guys, I'm looking for a way to limit ingress throughput for each
> tcp session to a destination port on my server. I've found lots of
> ways to limit total throughput to a given port on an ip-level, but
> that's not quite the same thing.
>
> I'm somewhat surprised this doesn't seem to be implemented
> already. Maybe it is and I'm not seeing it?

I have a need for a very similar thing. But in my case, I wish to
schedule tcp sessions into a different transfer class if they
transfer faster than a certain speed.

Doing this on the actual traffic control side of things seems tricky,
since none of the qdiscs have any notion of connections or tcp
sessions. Doing it by way of the 'connbytes' match, i.e. by storing
the data in the connection tracking table, seems rather easily
doable.

What you would wish to do is have a simple per-connection token
bucket, and just DROP every packet exceeding the rate in the
connection, am I right?

What I would wish for is a bit more complex. I'd like to have a
per-connection token bucket, but have it such that when it runs out
of tokens, the rule stops matching - but every packet will still take
whatever tokens there are in the bucket. The rule would start
matching again only after a certain amount of tokens has again
amassed in the bucket. This is to prevent too rapid churn between
different transfer classes per connection.

And I haven't found anything which would do this for me anywhere. So,
I might code it myself if no other solution comes up.

-- Naked
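The hysteresis behaviour described above - stop matching when
drained, resume only after the bucket refills past a threshold -
could be sketched like this. An editorial illustration of the idea,
not existing netfilter code; all names and thresholds are invented:

```python
class HysteresisBucket:
    """Per-connection token bucket with hysteresis: once drained it
    stops 'matching' and only resumes after the bucket has refilled
    past a separate resume threshold."""

    def __init__(self, rate, size, resume):
        self.rate = rate        # bytes of tokens per second
        self.size = size        # bucket depth in bytes
        self.resume = resume    # tokens needed to start matching again
        self.tokens = size
        self.matching = True

    def update(self, dt):
        self.tokens = min(self.size, self.tokens + self.rate * dt)
        if not self.matching and self.tokens >= self.resume:
            self.matching = True

    def packet(self, length):
        # every packet drains tokens, even while the rule is not
        # matching - this is what dampens the class churn
        self.tokens = max(0.0, self.tokens - length)
        if self.tokens == 0.0:
            self.matching = False
        return self.matching
```

The gap between drain (0 tokens) and resume (`resume` tokens) is what
prevents a connection from flapping between transfer classes on every
packet.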