[LARTC] Re: RFC - bandwidth optimization idea
From: [EMAIL PROTECTED] (Paul Hampson)

> Wait, you're trying to send more data than the link can take? Then

No, of course I don't expect to send more than the limit.

> send UDP, throttle it at the local end with a drop-oldest qdisc. Then
> you get the effect of 'most recent data is best'. Anything more

Yes, that gives me "most recent is best", but that does not do what I want except in a few weird cases. If every packet is independent, perhaps it would suffice to always send the newest, e.g., if I were trying to tell the other side what's the latest clock time. (In that case I'd also limit the queue length to one.)

> You gotta prioritise your data, using TOS or diffserv or something.
> Set your voice to real-time, so it always gets sent, and then your
> other applications can use unused packet-times. Use a dropping qdisc

This may be the best I can do in the current world where the facility I described does not exist. It does not solve the problem I described. TOS/diffserv etc. is more for use by the intervening infrastructure, and this problem applies even in the case where there is no congestion or delay at all in that infrastructure, but only in the link from the sending machine. Using real time is just a matter of giving one application priority over others. First, the link itself may have varying bandwidth, and second, the other applications might also have urgent data to send. Dropping packets can be disastrous if they happen to contain critical data that is not duplicated in other packets. At the very least I have to be able to find out which ones were dropped. But better than all of that is the ability to decide what to send at the last moment.

> I have a vague recollection that this sort of thing is discussed in
> Tanenbaum's Computer Networks textbook, to do with positional data
> of satellites or something. (eg. if the positional data is delayed,
> we write it off, we don't want to delay the data about where we are
> _now_ in order to know where we were _then_)

If the goal is to listen to the sound from .2 sec ago and it takes .1 sec to get there, then clearly it's a waste of time to send data that's older than .1 sec. But the packet in the queue might have some data that's older and some that's newer. I can't drop part of it. Instead I'd like to know that the packet is about to be sent now, and respond by finding the best data to send now.

From: Ed W [EMAIL PROTECTED]

> This is a total pain to optimise. Ideally I would like an API to be
> able to limit the congestion window on the local machine for a
> particular connection (which I don't think exists on either windows
> or linux?). This way the OS will report that the queue is full
> quickly to the local program without buffering up a ton of data.
> The issue in my case is that you have two simultaneous streams in
> transit for email, one to receive new mail and one to send mail out.
> In the case of the sat phone it's possible to have net buffers which
> are 20 secs or so long and so when you send out a status message to
> say "email received successfully, send me the next one", it can end
> up queued behind a bunch of lower priority data for a VERY long time.
> Often these buffers are on the remote ISP end where you have very
> little control. This is a serious slowdown on a link which is
> costing you $1.50/min.

I'm not sure I follow the problem, but if you're saying that one stream should have priority over the other, it seems you could do that with two different queues, one with priority over the other. Or something like sfq could at least prevent one connection from waiting for the other to send a lot of data.

___
LARTC mailing list
LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc
[LARTC] Re: RFC - bandwidth optimization idea
From: Andreas Klauer [EMAIL PROTECTED]

> Doesn't every QDisc work that way? When the kernel wants to send a
> packet, it calls the appropriate dequeue() function in the QDisc.
> I'm not a kernel developer so this guess might be wrong.

That's correct, but this operation takes a packet from an OS queue, and the only control the application has over that queue is to put something into it. One way to view the idea is that I want to make it convenient for the application to decide what to put into the queue at the latest possible time without losing any of its available bandwidth. Think in terms of an OS callback to the application saying "I'm ready to send your data now, what should I send?"

> But still, I don't think that the queueing is the main problem with
> your idea... the main problem is, how do you decide what's important
> and what not, and what's obsolete?

This is up to the application of course. See below.

From: [EMAIL PROTECTED] (Paul Hampson)

> I believe the general solution to this is to use UDP, and make sure

The scheme I describe wouldn't make a lot of sense for tcp, which after all specifies congestion control, retransmission, etc. But UDP still goes through the queuing that I want to optimize.

> your source machine doesn't queue up packets locally (eg. ethernet
> network contention) and let the best-effort nature of UDP deal with
> dropping stuff that gets delayed.

The problem is that the OS is not helpful in avoiding queuing up packets locally. That's part of what I'm trying to fix. For instance, a relatively cheap approximation would be to give the application a way to see how many packets it has in the queue. Then it could at least delay its decision about what to put into the queue until the queue was short. Even better would be to see an estimate of how long it will be before the next packet it enqueues will be sent, like "your call will be answered in approximately 4 minutes".

> I'm not sure there's any way to have an 'I changed my mind about
> sending that' interface into your network stack... And generally it
> wouldn't be useful, data spends longer in transit than it does in
> your queues.

That depends on the rate at which the queue is emptied. If your queue has a rate limit of 10bps then your packets can spend a long time in the queue.
- There are slow links. (For instance, I recall hearing that submarines have very low rates.)
- The application might be allocated a small part of the bandwidth shared with other applications.

It occurs to me that an example where this would be helpful is transmitting voice data over a low bandwidth link (like a cell phone). Suppose you know that the actual transit time is .1 sec and you want the listener to always hear what the speaker was saying .2 sec ago at the best possible quality. Suppose the available bandwidth is shared with other applications. The voice application doesn't know when they will want to send or how urgent their data might be. Someone else decides that. It just wants to send the best possible data in the bandwidth allocated to it. I imagine it is continually sampling the input and revising what it considers to be the most valuable unsent data for the last .1 sec. Whenever the OS decides it's time to send the next voice packet, I want it to send the latest idea of what's most valuable. I don't want to have to put data into the queue to wait for times that might depend on what urgent communication might be required by other applications.
[LARTC] RFC - bandwidth optimization idea
I'm interested in all of
- opinions about why this is a good or bad idea
- pointers to similar proposals or products that already exist
- implementation suggestions

This is meant for real time applications that have small available bandwidth and so have to consider carefully what's the best way to use that bandwidth. I imagine that things happen that cause them to continually reevaluate what's the most important/urgent thing to send next. I want to make it possible for them to delay the choice until the OS is actually ready to send that next packet.

The reason they can't do this now is that the OS enqueues packets. Suppose an application uses udp or tcp to tell the OS to send some data. It then discovers that data is obsolete. The old data might still be in the queue to be sent but it's too late to recall it. One way to avoid that is to always delay telling the OS to send something until the OS is almost ready to send the next packet from the queue that your data will enter. But that's not so easy to do, and there's a big penalty if you wait just a little too long.

What I want, at least conceptually, is that the application maintains its own queue of data to be sent, ordered by priority. Whenever the OS is ready to send the next packet for that application, it removes the highest priority packet (if any) from the queue and sends it.
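To make the idea concrete, here is a minimal user-space sketch of the application-side queue described above. It is only an illustration: the class name `PullQueue` and its methods are hypothetical, and nothing like the "ready to send" callback exists in the kernel as far as this thread establishes. The point is that data can be replaced or withdrawn right up to the moment `next_packet()` is called.

```python
import heapq
import itertools
from typing import Optional


class PullQueue:
    """Hypothetical application-side queue, consulted only when a send slot opens."""

    def __init__(self) -> None:
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: equal priorities stay FIFO
        self._live = {}                    # key -> heap entry still eligible to be sent

    def offer(self, key: str, priority: int, payload: bytes) -> None:
        """Add data for `key`, replacing any unsent data with the same key."""
        self.withdraw(key)
        entry = [-priority, next(self._counter), key, payload]
        self._live[key] = entry
        heapq.heappush(self._heap, entry)

    def withdraw(self, key: str) -> None:
        """Recall data that became obsolete before it was sent."""
        entry = self._live.pop(key, None)
        if entry is not None:
            entry[3] = None  # mark as cancelled; it is skipped when popped

    def next_packet(self) -> Optional[bytes]:
        """Stand-in for the OS callback: return the most important data right now."""
        while self._heap:
            _, _, key, payload = heapq.heappop(self._heap)
            if payload is not None:
                del self._live[key]
                return payload
        return None
```

For example, offering a "clock" update twice leaves only the newer one to be sent, which is the "too late to recall it" case the proposal wants to avoid.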
[LARTC] facilities to output to monitoring interfaces
How can one copy packets to a monitoring interface? For a start I'd like to know how to just copy all of those that arrive on eth1 out to eth2 in addition to whatever else would normally happen to them.

After that, a number of interesting possibilities:
- Copy only those with specified properties. (I suppose a random probability of copying fits into this category.)
- Copy only those that are actually sent (so if the packet is dropped anywhere along the way there's no false positive).
- Copy only part of the packet, say, only the first 64 bytes.
- Extract specified parts of packets and collect the results into larger packets that hold the data for many of the original packets.

___
LARTC mailing list / LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/mailman/listinfo/lartc
HOWTO: http://lartc.org/
[LARTC] HTB burstable for 2 interface , how ?
>          INTERNET
>             |
>             | eth0 202.14.41.1
>         BW.Manager
>            |  |
>            |  +-- eth1 192.168.1.0/24
>            |
>            +----- eth2 192.168.2.0/24
>
> Total incoming bandwidth to eth0 is 1024kbps should be shared to
> eth1 and eth2, which mean each get 512Kbps and burstable to 1024Kbps
> if other host is idle.

This doesn't make sense to me. The fact that an internal host is idle does not justify not sending traffic TO it. The suggestions to use IMQ+HTB seem to miss the problem that if someone sends 1024 to eth1 then nobody has a chance to even begin to send anything to eth2.

I think you want to allow borrowing only as long as the total incoming rate from eth0 is sufficiently less than 1024 to be sure that those sending to the lesser used internal interface can speed up. In effect I think you have to sacrifice some part of your 1024 to make sure the shaping is done at your machine. I'm not sure how much you have to sacrifice. But suppose it's 24K, so you then have two htb classes that have rate 500, ceil 1000. And the parent class also has ceil 1000. That's critical. That means that if we send at full rate to eth1 then we still have room for someone to start sending to eth2. Then when someone does start sending, he initially gets 24K to eth2. At that point HTB reduces the traffic to eth1 by 24K in order to stay below total 1000. Then the guy sending to eth2 can increase by 24K, which will cause eth1 to drop another 24, etc.

As you can see, the amount you reserve (you might say waste) also limits how fast the traffic equalizes.

Does this make sense to everyone out there?
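The step-by-step equalization described above can be sketched as a very crude round-based model. This is an assumption-laden illustration, not a simulation of HTB or TCP: the function name `equalize` is made up, each "round" lets the new sender grow by exactly the reserved headroom, and the borrower is then trimmed so the total stays at the parent ceil.

```python
def equalize(link_kbit: int = 1024, reserve_kbit: int = 24):
    """Return the (eth1, eth2) rates after each round of the toy model."""
    if not 0 < reserve_kbit < link_kbit:
        raise ValueError("reserve must be between 0 and the link rate")
    ceil_kbit = link_kbit - reserve_kbit   # parent ceil, e.g. 1000
    rate_kbit = ceil_kbit // 2             # guaranteed rate per class, e.g. 500
    eth1, eth2 = ceil_kbit, 0              # eth1 starts out borrowing everything
    history = [(eth1, eth2)]
    while eth2 < rate_kbit:
        eth2 = min(eth2 + reserve_kbit, rate_kbit)  # newcomer grows into the slack
        eth1 = ceil_kbit - eth2                     # borrower is cut back to fit
        history.append((eth1, eth2))
    return history


if __name__ == "__main__":
    rounds = equalize()
    print(len(rounds) - 1, "rounds to reach", rounds[-1])
```

With the 24K reserve from the example, the model needs 21 rounds to reach 500/500; doubling the reserve roughly halves that, which is the trade-off between wasted bandwidth and how fast the traffic equalizes.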
[LARTC] iptables u32 match code for review/testing/...
of the IP header itself.
 ...
 0>>22&0x3C@0>>24=0

The first 0 means read bytes 0-3, >>22 means shift that 22 bits to the right. Shifting 24 bits would give the first byte, so only 22 bits is four times that plus a few more bits. &3C then eliminates the two extra bits on the right and the first four bits of the first byte. For instance, if IHL=5 then the IP header is 20 (4 x 5) bytes long. In this case bytes 0-1 are (in binary) xxxx0101 yyzzzzzz, >>22 gives the 10 bit value xxxx0101yy and &3C gives 010100. @ means to use this number as a new offset into the packet, and read four bytes starting from there. This is the first 4 bytes of the icmp payload, of which byte 0 is the icmp type. Therefore we simply shift the value 24 to the right to throw out all but the first byte and compare the result with 0.

Example: tcp payload bytes 8-11 is any of 1, 2, 5 or 8

First we test that the packet is a tcp packet (similar to icmp).
 --u32 "6&0xFF=6 && ...
Next, test that it's not a fragment (same as above).
 ... 0>>22&0x3C@12>>26&0x3C@8=1,2,5,8"

0>>22&3C as above computes the number of bytes in the IP header. @ makes this the new offset into the packet, which is the start of the tcp header. The length of the tcp header (again in 32 bit words) is the left half of byte 12 of the tcp header. The 12>>26&3C computes this length in bytes (similar to the IP header before). @ makes this the new offset, which is the start of the tcp payload. Finally 8 reads bytes 8-11 of the payload and = checks whether the result is any of 1, 2, 5 or 8
*/

#include <linux/module.h>
#include <linux/skbuff.h>
#include <linux/netfilter_ipv4/ipt_u32.h>
#include <linux/netfilter_ipv4/ip_tables.h>
/* #include <asm-i386/timex.h> for timing */

MODULE_AUTHOR("Don Cohen [EMAIL PROTECTED]");
MODULE_DESCRIPTION("IP tables u32 matching module");
MODULE_LICENSE("GPL");

static int
match(const struct sk_buff *skb,
      const struct net_device *in,
      const struct net_device *out,
      const void *matchinfo,
      int offset,
      const void *hdr,
      u_int16_t datalen,
      int *hotdrop)
{
    const struct ipt_u32 *data = matchinfo;
    int testind, i;
    unsigned char *origbase = (unsigned char *)skb->nh.iph;
    unsigned char *base = origbase;
    unsigned char *head = skb->head;
    unsigned char *end = skb->end;
    int nnums, nvals;
    u_int32_t pos, val;
    /* unsigned long long cycles1, cycles2, cycles3, cycles4;
       cycles1 = get_cycles(); */

    for (testind = 0; testind < data->ntests; testind++) {
        base = origbase; /* reset for each test */
        pos = data->tests[testind].location[0].number;
        /* refuse to read outside the packet buffer */
        if (base + pos + 3 > end || base + pos < head)
            return 0;
        /* read four bytes in network (big-endian) order */
        val = (base[pos] << 24) + (base[pos+1] << 16) +
              (base[pos+2] << 8) + base[pos+3];
        nnums = data->tests[testind].nnums;
        for (i = 1; i < nnums; i++) {
            u_int32_t number = data->tests[testind].location[i].number;
            switch (data->tests[testind].location[i].nextop) {
            case IPT_U32_AND:
                val = val & number;
                break;
            case IPT_U32_LEFTSH:
                val = val << number;
                break;
            case IPT_U32_RIGHTSH:
                val = val >> number;
                break;
            case IPT_U32_AT:
                /* current value becomes the new base offset */
                base = base + val;
                pos = number;
                if (base + pos + 3 > end || base + pos < head)
                    return 0;
                val = (base[pos] << 24) + (base[pos+1] << 16) +
                      (base[pos+2] << 8) + base[pos+3];
                break;
            }
        }
        /* the test succeeds if val falls in any of the listed ranges */
        nvals = data->tests[testind].nvalues;
        for (i = 0; i < nvals; i++) {
            if ((data->tests[testind].value[i].min <= val) &&
                (val <= data->tests[testind].value[i].max)) {
                break;
            }
        }
        if (i >= data->tests[testind].nvalues) {
            /* cycles2 = get_cycles();
               printk("failed %d in %d cycles\n", testind, cycles2-cycles1); */
            return 0;
        }
    }
    /* cycles2 = get_cycles();
       printk("succeeded in %d cycles\n", cycles2-cycles1); */
    return 1;
}

static int
checkentry(const char *tablename,
           const struct ipt_ip *ip,
           void *matchinfo,
           unsigned int matchsize,
           unsigned int hook_mask)
{
    if (matchsize != IPT_ALIGN(sizeof(struct ipt_u32)))
        return 0;
    return 1;
}

static struct ipt_match u32_match
= { { NULL, NULL }, "u32", match, checkentry, NULL, THIS_MODULE };

static int __init init(void)
{
    return ipt_register_match(&u32_match);
}

static void __exit fini(void)
{
    ipt_unregister_match(&u32_match);
}

module_init(init);
module_exit(fini);

iptables-1.2.7a/extensions/libipt_u32.c

/* Shared library add-on to iptables to add u32 matching,
   generalized matching on values found at packet offsets

   Detailed doc is in the kernel module source
   net/ipv4/netfilter/ipt_u32.c
*/
#include <stdio.h>
#include <netdb.h>
#include <string.h>
#include <stdlib.h>
#include <getopt.h>
#include <iptables.h>
#include <linux/netfilter_ipv4/ipt_u32.h>
#include <errno.h>
#include <ctype.h>

/* Function which prints out usage message. */
static void
help(void)
{
    printf(
"u32 v%s options:\n"
" --u32 tests\n"
" tests := location = value | tests && location = value\n"
" value := range | value , range\n"
" range := number
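For anyone who wants to try the u32 expressions from the documentation above without loading a kernel module, here is a small user-space sketch of the same evaluation rules. It is not part of the patch: `u32_test` and `u32_match` are hypothetical helper names, and the parser only handles the operators described in the comment (`&`, `<<`, `>>`, `@`, `=` with comma-separated values or `min:max` ranges, and `&&` between tests).

```python
import re


def _read32(packet: bytes, offset: int):
    """Read 4 bytes big-endian, or None if that would leave the packet."""
    if offset < 0 or offset + 4 > len(packet):
        return None
    return int.from_bytes(packet[offset:offset + 4], "big")


def u32_test(packet: bytes, test: str) -> bool:
    """Evaluate one 'location=value' test, e.g. '0>>22&0x3C@0>>24=0'."""
    location, values = test.replace(" ", "").split("=")
    tokens = re.findall(r">>|<<|&|@|[^&@<>]+", location)
    base = 0
    val = _read32(packet, int(tokens[0], 0))
    if val is None:
        return False
    for op, arg in zip(tokens[1::2], tokens[2::2]):
        number = int(arg, 0)
        if op == "&":
            val &= number
        elif op == "<<":
            val = (val << number) & 0xFFFFFFFF  # keep it a 32-bit value
        elif op == ">>":
            val >>= number
        elif op == "@":
            base += val                          # value becomes the new base
            val = _read32(packet, base + number)
            if val is None:
                return False
    for item in values.split(","):
        low, _, high = item.partition(":")
        lo = int(low, 0)
        hi = int(high, 0) if high else lo
        if lo <= val <= hi:
            return True
    return False


def u32_match(packet: bytes, expr: str) -> bool:
    """All tests joined by '&&' must succeed."""
    return all(u32_test(packet, t) for t in expr.split("&&"))
```

Feeding it a hand-built 20-byte IP header followed by an ICMP header whose first byte is 0 makes `0>>22&0x3C@0>>24=0` succeed, matching the walk-through in the comment.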
Re: [LARTC] how to get the latency down on maxed out classes?
Abraham van der Merwe writes:

> Hi Don!
>
> > > I then tried fifos. With small packet fifos the packet loss is
> > > just too great to be of any use and even then the latency is
> > > quite high (~200ms).

A small detail: what are small packet fifos? You mean fifos that can only hold a small number of packets? Or fifos that only hold packets with small numbers of bytes?

> > You consider 200ms high? One max size packet = 1500 bytes = 12Kbit
> > which is about 200ms on a 64Kbit link. You can't expect to do
> > better.
>
> The problem is that with 200ms the packet loss is so much that the
> link is effectively useless (90% packet loss). As soon as I make the
> queue big enough to not drop significant amounts of packets, the
> latency goes way up (3 secs).

I don't understand the connection between 200ms and packet loss. If you make the queue small (in packet capacity) then worst case latency decreases. Packet loss occurs in either case whenever a packet arrives and the queue is full. If you try to send at a higher rate than allowed then you will fill the queue in either case (a small queue more quickly, of course), and from then on you will lose packets. If you send packets at twice the allowed rate you lose half of them; if you send at 10 times the allowed rate you lose 90%.

The fact that you're losing lots of packets, though, indicates to me that you're acting like an attacker, and dropping most of that traffic is therefore exactly the right thing to do. If you were using a correctly working tcp it would not continue to send at 10 times the allowed rate. It would notice that packets were being lost and would slow down until the loss rate became very small.

Similarly, I don't understand the latency issue. An application that cares about latency will not create a large backlog. What is this application that is sending faster than the link allows and wants a low latency, and why is it misbehaving?
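The two figures used in this exchange follow from simple arithmetic, sketched below. The function names are made up for illustration; the only assumptions are a sender that keeps offering traffic at a constant rate and a link that drains at a fixed rate.

```python
def loss_fraction(offered_rate: float, allowed_rate: float) -> float:
    """Long-run fraction of packets dropped once the queue has filled."""
    if offered_rate <= allowed_rate:
        return 0.0
    return 1.0 - allowed_rate / offered_rate


def serialization_delay_ms(packet_bytes: int, link_bps: float) -> float:
    """Time to put one packet on the wire at the given link rate."""
    return packet_bytes * 8 * 1000 / link_bps


if __name__ == "__main__":
    print(loss_fraction(2, 1))                 # sending at twice the rate
    print(loss_fraction(10, 1))                # sending at ten times the rate
    print(serialization_delay_ms(1500, 64000)) # one full-size packet at 64 kbit/s
```

One 1500-byte packet at 64 kbit/s works out to 187.5 ms, which is where the "about 200ms" floor comes from: no queue setting can get a ping out faster than the packet already being transmitted.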
[LARTC] how to get the latency down on maxed out classes?
> lets say I want to limit traffic to/from client to 64kbit. now,
> client opens a tcp connection blasting away at full speed. If client
> now pings isp, it gets on average around 7 seconds latency. I tried
> to improve this by using SFQ on the leaf nodes of my HTB hierarchy,
> but that does not really improve the situation, only makes it much
> worse. with SFQ I get anything between 250ms and 13 seconds latency.

You understand what's going on here? As I recall, both pfifo and sfq default to queues of length 128 packets. If you fill that with 1500 byte packets you have ~200Kbytes which is about 1.6Mbits. At 64Kbit/sec that would take ~25 sec to send, so your latency could be as high as 25 sec. You can limit this latency by reducing the queue size. On the other hand, the application that fills the queue evidently doesn't mind large latency. Otherwise it wouldn't fill the queue.

I think I posted to this list once a description (maybe even the code?) of another way to limit latency: drop packets that have been in the queue for more than a timeout period (I tend to use 3 sec).

SFQ should have the desirable result that one tcp connection won't slow down another one or a ping.

> I then tried fifos. With small packet fifos the packet loss is just
> too great to be of any use and even then the latency is quite high
> (~200ms).

You consider 200ms high? One max size packet = 1500 bytes = 12Kbit which is about 200ms on a 64Kbit link. You can't expect to do better.

> I'm thinking of using RED, but the number of parameters is daunting
> and I have no idea how the HTB rate correlates to packet size and
> burst rates for red.

RED should be independent of HTB.
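Both ways of bounding latency mentioned above (a shorter queue, or a limited lifetime in the queue) can be sketched in a few lines. This is a user-space illustration only, not the code referred to in the message: `worst_case_latency_s` and `AgeLimitedFifo` are hypothetical names, and the 3 second default is just the timeout the message says tends to be used.

```python
from collections import deque


def worst_case_latency_s(limit_packets: int, packet_bytes: int, rate_bps: float) -> float:
    """Time to drain a completely full FIFO at a fixed rate."""
    return limit_packets * packet_bytes * 8 / rate_bps


class AgeLimitedFifo:
    """FIFO that silently discards packets queued longer than max_age_s."""

    def __init__(self, max_age_s: float = 3.0) -> None:
        self.max_age_s = max_age_s
        self._queue = deque()

    def enqueue(self, packet, now: float) -> None:
        self._queue.append((now, packet))

    def dequeue(self, now: float):
        """Return the oldest packet that is still fresh enough, else None."""
        while self._queue:
            enqueued_at, packet = self._queue.popleft()
            if now - enqueued_at <= self.max_age_s:
                return packet
        return None
```

With 128 packets of 1500 bytes at 64 kbit/s the first function gives 24 seconds, in line with the rough estimate in the message; the age-limited queue caps the delay at the timeout regardless of how many packets are allowed in.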
[LARTC] use or non-use of multiple processors in forwarding
More info on this problem:

> /proc/stat shows me that all of the [packet forwarding] work is done
> by one cpu, the other does nothing.

/var/log/messages shows the following interesting data:

 kernel: enabled ExtINT on CPU#0
 kernel: masked ExtINT on CPU#1

Does anyone know what this means, whether it would be related to the problem above, and if so, how to change it?
[LARTC] use or non-use of multiple processors in forwarding
I'm testing to see how fast A can ping C without losing packets.

 A -- B -- C

B is a dual processor (Intel(R) XEON(TM) CPU 1.80GHz) machine. /proc/stat shows me that all of the work is done by one cpu, the other does nothing. Does anyone have any ideas of why this should be the case and what I can do to change it? I'm hoping I can get higher throughput if both of the CPUs participate.
[LARTC] Traffic shaping for upload download
> Lets say we want to limit a customer usage to 256kbit total.

That is, you want to limit upload+download. Whether or not it can be done, I think it's worth pointing out that this is nonsense. It makes sense to allocate A+B only if A and B can be used to replace each other. Upload and download are not like that. They're more like food and air: you need both. If you have no air it won't do any good to be given more food.

In your case the analogous thing is that you have a total of 1Mbit up and 1Mbit down available, two users, and you allocate 1Mbit total to each. One decides to attack the other by using 1Mbit upload. You decide that's fair, the other (the victim) can just use 1Mbit download. Well, maybe he can, but it won't do him any good.

I suggest instead that you allocate upload and download bandwidth separately.
Re: [LARTC] Re: [release] ipsysctl tutorial 1.0.1
> I'd like to ask for some clarifications, if not quoting, in the
> tutorial on page x321.html (not sure of section numbers) re: syn
> cookies.

I don't understand what the question is here.

> Dan Bernstein (everyone's favorite mathematician :-) ) makes it very

I was not aware of that.

> clear on http://cr.yp.to/syncookies.html that your warnings are
> primarily FUD. For the sake of quoting:
>
> A few people (notably Alexey Kuznetsov, Wichert Akkerman, and Perry
> Metzger) have been spreading misinformation about SYN cookies. Here
> are some of their bogus claims:

I was also not aware of any such controversy, but I think the points below are correct.

> * SYN cookies ``present serious violation of TCP protocol.''
>   Reality: SYN cookies are fully compliant with the TCP protocol.
>   Every packet sent by a SYN-cookie server is something that could
>   also have been sent by a non-SYN-cookie server.
> * SYN cookies ``do not allow to use TCP extensions'' such as large
>   windows. Reality: SYN cookies don't hurt TCP extensions. A
>   connection saved by SYN cookies can't use large windows; but the
>   same is true without SYN cookies, because the connection would
>   have been destroyed.
> * SYN cookies cause ``massive hanging connections.'' Reality: With
>   or without SYN cookies, connections occasionally hang because a
>   computer or network is overloaded. Applications deal with this by
>   simply dropping idle connections.
> * SYN cookies cause ``serious degradation of service.'' Reality: SYN
>   cookies /improve/ service. They do take a small amount of CPU time
>   to compute, but that CPU time has to be spent anyway for
>   hard-to-predict sequence numbers; see RFC 1948.
> * SYN cookies cause ``magic resets.'' Reality: SYN cookies never
>   cause resets.
>
> These people also have the annoying habit of crediting their bogus
> claims to other people, such as me. I don't know whether to
> attribute this to malice or stupidity; either way, I would like the
> record to be set straight.
>
> I invited Kuznetsov to either retract or defend his claims. He
> refused to do so. I'm sure he's aware by now that his claims are
> false, and that any attempted defense will be promptly ripped to
> shreds; but he's still not admitting his errors. It's unfortunate
> that he doesn't have more respect for the truth.
>
> I also invited Akkerman to either retract or defend his claims. He
> did not respond.
[LARTC] congestion problem
> Client --- R1 --- R2 --- R3 --- Web
>
> the Client it's me, the R1 router it's mine (so i can control it),
> the R2 is my provider router, and R3 is the provider's provider
> router.
> R2 - R3 is a 2mbit link
> R1 - R2 is a 10mbit link
> R2 have multiple interfaces and other 10mbit links
> I have a 32kbit guaranteed bandwidth on the R2-R3, but without limit
> (rate 32kbit, ceil 2mbit)

You have guaranteed 32K upstream, downstream or both? There's something strange about that in any case. For upstream, how does R2 know which packets are from you? Source address? Then some other customer of your ISP could deny you service by spoofing your address (unless your ISP filters that). Downstream is also strange, first because your ISP's ISP would then have to know about you, second because you have little control over what others send you. So if that is controlled at all it should be shaped in accordance with your wishes.

You talk about downloading. But in that case the bandwidth is used mostly downstream. You have limited control over that. Assuming the servers are using tcp you could control the acks (more to the point, the windows) you send back to limit the rate at which they send to you. Of course, 32Kbit is slow enough that you're never likely to be happy with download speed.
[LARTC] re: keeping up with file sharing programs
I have a different proposal. I think you should use ESFQ, always based on the internal IP address, i.e., in the outbound direction base it on source, inbound use dest. (This is to separately share upload and download bandwidth.) That means that someone trying to use small bandwidth will get it right away while those trying to use a lot will have to share it equally with others.

I'm not optimistic about the other schemes that have been suggested:
- identify good ports
  Then people will start using those ports for the bad stuff.
- identify bad ip addresses
  Then people will go around borrowing each other's computers.
- even a separate class for the residence halls
  Then people will go to the academic buildings to use the computers there.

At a college I'd even expect that faculty members would let students borrow their computers, so there's not even much point to giving faculty a separate class.

It's true that this might cause trouble for people trying to do large downloads of important stuff. But are you supposed to know which stuff is important? When people claim that what they're doing is important you can put them in the important class and tell all in that class who the others are and that they're competing with each other, so they can complain to each other before they complain to you. In fact, I'd try to monitor the usage of such people and distribute the results to them all, so they know who to blame. When people complain about others whose stuff they think is not really important you can let some higher academic authority make the call.
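The effect being claimed for ESFQ keyed on the internal address can be illustrated with a toy per-host round robin. This is deliberately simplified and is not how ESFQ is implemented (ESFQ hashes the chosen field into a fixed number of buckets); the class name `PerHostRoundRobin` and its methods are hypothetical.

```python
from collections import OrderedDict, deque


class PerHostRoundRobin:
    """One queue per internal host, served one packet at a time in turn."""

    def __init__(self) -> None:
        self._queues = OrderedDict()

    def enqueue(self, src_ip: str, dst_ip: str, outbound: bool, packet) -> None:
        # Outbound traffic is keyed on source, inbound on destination,
        # so the key is always the internal address.
        key = src_ip if outbound else dst_ip
        self._queues.setdefault(key, deque()).append(packet)

    def dequeue(self):
        if not self._queues:
            return None
        key, queue = next(iter(self._queues.items()))
        packet = queue.popleft()
        del self._queues[key]
        if queue:
            self._queues[key] = queue  # host goes to the back of the rotation
        return packet
```

A host with one packet waiting gets served on the next turn no matter how many packets a heavy user has queued, which is the "small bandwidth gets it right away" property.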
[LARTC] re: Anyone else seen this one? (ping)
> Anyone else seen this one?
>
> ping -n {remote host}
> { 6 second delay }
> 64 bytes from 216.168.105.33: icmp_seq=0 ttl=255 time=6sec
> 64 bytes from 216.168.105.33: icmp_seq=1 ttl=255 time=5sec
> 64 bytes from 216.168.105.33: icmp_seq=2 ttl=255 time=4sec
> 64 bytes from 216.168.105.33: icmp_seq=3 ttl=255 time=3sec
> 64 bytes from 216.168.105.33: icmp_seq=4 ttl=255 time=2sec
> 64 bytes from 216.168.105.33: icmp_seq=5 ttl=255 time=1sec
> 64 bytes from 216.168.105.33: icmp_seq=6 ttl=255 time=242usec
> 64 bytes from 216.168.105.33: icmp_seq=7 ttl=255 time=250usec
> ...

Yes, and I know just how to make it happen. You put ping packets and some other type of packets into the same low rate class. Then you send a bunch of the other kind of packet and start your ping. Suppose your class is allowed to send 10 pps and you start with 60 other packets (perhaps even ping packets belonging to someone else) in the queue when you start your ping. 6 seconds later all of your ping packets get to the head of the queue and are sent in the next second. Of course, the first one was sent 6 sec ago, but the next was sent 5 sec ago, the third 4 sec ago, etc. The first 6 replies all return at about the same time and look like those above. The rest appear at 1 sec intervals as you send them and look like the last two.

BTW, this is pretty similar to the example that led me to suggest a limited lifetime in the queue.
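The scenario above is easy to reproduce on paper. The sketch below is a toy model under stated assumptions: a single FIFO drained at exactly one packet per tick (a tick is 1/rate_pps seconds), the whole backlog already queued when the first ping is sent, one ping per second, and no other traffic arriving afterwards. `ping_delays` is a made-up name.

```python
def ping_delays(backlog: int = 60, rate_pps: int = 10, pings: int = 10):
    """Queueing delay of each ping, in ticks of 1/rate_pps seconds."""
    ticks_per_ping = rate_pps      # one ping per second
    next_free = backlog            # the backlog occupies ticks 0 .. backlog-1
    delays = []
    for n in range(pings):
        arrival = n * ticks_per_ping
        sent = max(arrival, next_free)   # wait for everything queued earlier
        delays.append(sent - arrival)
        next_free = sent + 1
    return delays


if __name__ == "__main__":
    print([d / 10 for d in ping_delays()])  # delays in seconds at 10 pps
```

In this model the first ping waits 6.0 s, the following ones about a second less each, and once the backlog is gone the delay drops to zero, which is the same staircase as in the quoted ping output.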
Re: [LARTC] Iptables, SNAT/MASQ, Multiple gateways
Simon Matthews writes:

> OK, this may be a reasonable approach, but how do I force it
> initiate connections from the fast interface, yet allow it to fail
> over to the slow interface if the system removes the route to the
> fast gateway because it has detected that it is not responding?

Off hand I don't know anything built in for this (I look forward to hearing an answer from someone who does), but I don't think this is really what you want anyway. It's not as if your link is the only one that could fail! If ISP1's upstream link fails then you want to use ISP2 for all traffic other than that intended for ISP1 itself. And of course, problems further upstream prevent you from reaching certain addresses but not others, and you don't really know which without a global view of the routing.

I think the right solution involves monitoring the traffic. There's a wide range of things you could do, the simplest being simply detecting that the link is not responding. You could also try to detect tcp retransmits, measure RTT, aggregate data to measure how well individual connections are working, further aggregate data to determine which address blocks are working well and which poorly, etc. Then use that data to decide which of your links to use for a given destination.

I actually sent a proposal to this list that I think provides a good solution to the general problem: an extension to TCP (possibly even IP) that supports multiple addresses/ports. This would even allow you to switch addresses in the middle of a connection. I think what I described before applies more to the machine on the other side of your connection, which now would know both of your addresses. Whenever it does a tcp retransmit it switches the address. It therefore tends to stay on the one that works most reliably. (Perhaps this algorithm could be improved to take speed into account too.)

This discussion points out that something similar should be done on your end: you should switch the output interface you use when you retransmit. Of course this is not yet implemented. It's on my queue, but not close to the beginning. I'd be glad if someone out there could beat me to it.
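The "switch on retransmit" rule is simple enough to state in a few lines. The sketch below only illustrates the selection policy; `PathSelector` is a hypothetical name, and hooking it into a real TCP stack is exactly the part that is described above as not yet implemented.

```python
class PathSelector:
    """Stay on the current path until a retransmit, then move to the next one."""

    def __init__(self, paths) -> None:
        self._paths = list(paths)
        if not self._paths:
            raise ValueError("at least one path is required")
        self._index = 0

    @property
    def current(self):
        return self._paths[self._index]

    def on_retransmit(self):
        """Called whenever a segment has to be retransmitted."""
        self._index = (self._index + 1) % len(self._paths)
        return self.current
```

Because a path that works produces few retransmits, the selector naturally spends most of its time on the more reliable link; it makes no attempt to account for speed.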
[LARTC] Anything out there that is similar to Cisco's WFQ?
From: CIT/Paul [EMAIL PROTECTED]

> Any help would be greatly appreciated :) This is much better than
> SFQ :

Sounds like SFQ to me. Can you tell us what the differences are?
[LARTC] RE: Anything out there that is similar to Cisco's WFQ?
Paul writes:

> No SFQ is not like WFQ... WRR is the closest thing to cisco's
> fair-queue.. WRR keeps track of the connections using the
> ip_conntrack .. that's sort of what cisco's fair-queue does and it
> checks the bandwidth streams and gives lower priority to the higher
> streams and larger packets.. it's meant to reduce latency for
> traffic shaping and it does :) I haven't tried WRR but it looks like
> the closest thing to it although it doesn't take everything in to
> account as cisco's flow based WFQ does..

This is not very convincing. Do you actually know how WFQ works? If so, please tell us. The doc you sent did not describe how it works but what the effects are, and those are entirely consistent with what SFQ does. High bandwidth flows are limited, low bandwidth flows get lower latency. Can you describe some effect that's different?
[LARTC] Gigabit Etnernet router
Date: Mon, 24 Jun 2002 16:33:32 +0200 (CEST)
From: M.F. PSIkappa [EMAIL PROTECTED]
Subject: [LARTC] Gigabit Etnernet router

> Hi, I would like to build a new router with 3 Gigabit Ethernet cards. Do I need a dual-processor system or not? I would like to have traffic control (htb or cbq/sfq) and a firewall (iptables) on this router. Can you recommend a good motherboard with 64-bit PCI-X?

More to the point, where can you get a motherboard with 3 64x66 PCI buses?(!!) Note that one gigabit card can actually use 2 gigabits of PCI bandwidth (one in, one out), and a PCI bus is nowhere near 100% efficient, so one 64x66 PCI bus has enough bandwidth to handle 1 such card at full bandwidth, not enough for 2.
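
The arithmetic behind that claim, as a small sketch. The 60% efficiency figure is purely an assumption for illustration (the message only says the bus is "nowhere near 100% efficient"), and 66 MHz is taken as exactly 66e6 cycles per second.

```python
BUS_WIDTH_BITS = 64
BUS_CLOCK_MHZ = 66            # nominal clock, taken as 66e6 cycles/s
CARD_MBIT_EACH_WAY = 1000     # gigabit Ethernet, one direction

# Theoretical peak of one 64-bit/66 MHz PCI bus, in Mbit/s.
raw_bus_mbit = BUS_WIDTH_BITS * BUS_CLOCK_MHZ

# A card forwarding at line rate moves 1 Gbit/s in and 1 Gbit/s out.
per_card_mbit = 2 * CARD_MBIT_EACH_WAY


def cards_at_full_rate(efficiency):
    """Cards one bus can feed at full rate, for an assumed bus efficiency."""
    return int(raw_bus_mbit * efficiency // per_card_mbit)


ideal = cards_at_full_rate(1.0)      # only with a perfectly efficient bus
realistic = cards_at_full_rate(0.6)  # 0.6 is an assumed, not measured, figure
```

Even the theoretical peak leaves almost no headroom for a second card, so any realistic efficiency brings the answer down to one card per bus.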
Re: [LARTC] Questions about IMQ
Patrick McHardy writes:
>> We're adding an htb as the qdisc for a child class of htb ? Why? Isn't that just wasting time? Can't all 10: stuff be done with 1: instead?
> The root qdisc is used for delay simulation, 10:0 is the real qdisc ( http://luxik.cdi.cz/~devik/qos/htb/manual/userg.htm#prio )

So I was right, just to waste time. That was not part of the spec, as I recall. So I suggest it be removed from the example.

> If more than one imq device is used you specify the one which should get the packet with --todev argument to IMQ target.

Be sure to put that in the doc. I didn't see it there. I suppose the default is imq0 ?

> skb->dev doesn't get changed if that's what you mean ..

Ok, important to know that. I gather there is currently no way to read the imq mark from netfilter.

>> When the packet is eventually dequeued (if not dropped) then it goes where? I'm hoping it goes to the beginning of pre-routing so we can apply conntrack/nat/mangle rules to it with -i imq0.
> No it doesn't. I think it doesn't make any sense to use any kind of iptables rules on packets passing imq because all of them come from/go to real devices which you can use in your rules.

But if you could read the imq mark then it would make a lot of sense. These two things in combination would allow me to do what I want without changing the code. As it is, it looks like I need a local variant of IMQ that runs before conntrack. (On the other hand, this is probably the more efficient solution anyhow.)

>> I suspect this is not the case, since I see in the patch code nf_reinject(skb, info, NF_ACCEPT) I'm not even sure netfilter supports what I want. I see in http://netfilter.samba.org/documentation/HOWTO//netfilter-hacking-HOWTO-3.html 5.NF_REPEAT: call this hook again. but what's this hook ? Is it the imq hook or pre_routing ?
> it's imq hook. from net/core/netfilter.c: nf_reinject(...)

As I thought, there's no convenient way for you to do what I want.

> you can easily change this order. i guess you already noticed if you looked at the imq source.

Right. But this is not a change that everyone would want.

> but are you sure this is necessary ? i guess your connection must be extremely fast if someone wants to dos you through a connection tracking table fillup attack ...

My idea of extremely fast has changed recently. Maybe it's a bit ahead of yours. First, I'm interested in protecting against attacks from inside the firewall, and these are typically connected at 100Mbit. Is that fast enough? Next I've been playing with gigabit cards. Finally I visited sprint a few weeks ago and they're not interested in anything as slow as one gigabit. Although, for a firewall, I admit that seems fast enough for the time being.

> Changing skb->dev to imq0 would result in something like this:
> ... -> NF_HOOK(..) -> imq -> qdisc -> reinject -> continue NF_HOOK -> ... -> dev_queue_xmit -> qdisc -> imq -> reinject (CRASH!)

If you mean it could result in infinite loops, yes, but this is not the first invention of infinite loops. If your rules do the right things then the loops can also be avoided. Besides, that requires my other request, that the reinject go back to the beginning of the prerouting hook. Without that it was completely plausible that the skb dev could have been changed. But I'm not complaining. I just wanted to know.

> If you look at the imq source you find a imq_skb_destructor, i thought about adding a comment that it's meant to save rusty's life. if skb's are freed inside qdiscs kfree_skb will call the destructor which will do necessary things to protect rusty :)

Ok, I wouldn't want to contribute to his early demise. This tends to confirm my first guess, which was that the important thing here was to free skbs when they are no longer in use. I guess user mode can't free them, but perhaps the better solution would have been to free them before a copy is sent to user space and then recreate them if the copy ever came back. But I digress... Thanks for all the answers.
[LARTC] a small htb question (I think)
In the output of ip addr:

  2: eth0: ... qdisc htb qlen 100

Does qlen 100 have anything to do with htb?
Re: [LARTC] SFQ buckets/extensions
Alexander Atanasov writes:
> SFQ classifies connections with ports, esfq classifies them also just by IP, so we can have flows:
>   SRC IP+proto+DST IP+PORT - one flow
>   just DST IP - one flow
>   another we can think of - one flow
> So i think just about packets of size bytes but without protos which they carry and TCP with its features to tune becomes a kind of exception.

I don't think this is true. First, of course, almost all packets are TCP. Second, I'd expect that multiple TCP streams along the same path tend to equalize, assuming of course that they are not differentiated along the way. E.g., with just fifo queuing I'd expect that two scp's along the same path would tend toward equal bandwidth. If this is true then multiple TCP streams along the same path would tend to act like one. Perhaps someone else out there can either confirm or debunk that expectation? Of course, things get more complicated when the streams have the same source but different destinations. In that case I'd expect them to again adjust to the differences in bandwidth to the different destinations. So maybe sub-subqueues below the source IP subqueues are not so important.

>> No, the errors are accumulated. It's always within one packet of the ideal in terms of what's sent. The advantage of measuring queue length in bytes would be more fairness in dropping, i.e., you should drop from a queue with 10 packets each of length 1000 bytes instead of a queue with 20 packets each of length 100 bytes.
> I didn't get it ? What do you mean - have different queues against packet sizes ?

I was suggesting that when you add a packet to a subqueue you record not just the fact that there are now 16 packets in that subqueue, but that there are 1600 bytes. Then the limit on total queue size is measured in bytes. When you enqueue, you check to see whether this packet would cause the limit to be exceeded, and if so you drop from the subqueue with the most bytes. That's closer to the spirit of SFQ, but it's probably more expensive in run time. The current implementation has a very fast way to determine which subqueue has the most packets. The object is to make enqueue/dequeue small constant time operations. I don't see (so far) how to do that with queue lengths measured in bytes.
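
A minimal sketch of the byte-based accounting suggested above, assuming tail drop and a plain dictionary of subqueues. The class name ByteLimitedSFQ and the 12000-byte limit are hypothetical. Note that finding the victim here is a linear scan over subqueues, which is exactly the run-time cost the message worries about.

```python
from collections import deque


class ByteLimitedSFQ:
    """Toy model: the queue limit is in bytes, drops hit the fattest subqueue."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.total_bytes = 0
        self.subqueues = {}   # flow key -> deque of packet sizes
        self.queued = {}      # flow key -> bytes currently queued

    def enqueue(self, flow, size):
        self.subqueues.setdefault(flow, deque()).append(size)
        self.queued[flow] = self.queued.get(flow, 0) + size
        self.total_bytes += size
        dropped = []
        while self.total_bytes > self.limit_bytes:
            # Linear scan: not the constant-time operation real SFQ aims for.
            victim = max(self.queued, key=self.queued.get)
            dropped.append((victim, self._drop_tail(victim)))
        return dropped

    def _drop_tail(self, flow):
        size = self.subqueues[flow].pop()
        self.queued[flow] -= size
        self.total_bytes -= size
        if not self.subqueues[flow]:
            del self.subqueues[flow]
            del self.queued[flow]
        return size


sfq = ByteLimitedSFQ(limit_bytes=12000)
for _ in range(10):
    sfq.enqueue("big", 1000)      # 10 packets of 1000 bytes
for _ in range(20):
    sfq.enqueue("small", 100)     # 20 packets of 100 bytes
dropped = sfq.enqueue("small", 100)   # pushes the total past the limit
```

With packet counting the drop would land on the 20-packet subqueue; with byte counting it lands on the subqueue holding 10000 bytes.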
[LARTC] (E)SFQ HRR (=Hierarchical Round Robin)
From: Martin Devera [EMAIL PROTECTED]
Subject: [LARTC] (E)SFQ suggestion

> Hi, just a simple note. Maybe it is already in progress :) There are attempts to replace the hashing routine in SFQ to consider IPs or ports. What about using HRR - round robin around a bunch of IP addresses and then a smaller WRR for ports per IP ? It would solve both problems - fairness between computers (IP) and between flows on that single computer ...

(Took me a moment to figure out HRR=Hierarchical Round Robin.) Yep, that has been on my queue for a long time. (Though I'm interested in a different case than source IP in first level and other stuff in second.)

Problems:
- it's a lot more storage (instead of small constant space per subqueue we now have constant per sub-subqueue, so something like 128 -> 128 x 20)
- double the time for two lookups
- nontrivial change in code. Of course, the temptation would be to make the code work with n levels.

So far I've been able to get by without it. That, along with the disincentives above, accounts for it remaining on the queue for so long. Some day I'll need it. If I'm lucky someone else will get there before me.
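
To make the two-level idea concrete, here is a small sketch: round robin over IP addresses, then round robin over ports within the chosen address. It is a simplification of the proposal (plain round robin at both levels rather than WRR, counting packets rather than bytes), and the class name HierarchicalRoundRobin is hypothetical.

```python
from collections import OrderedDict, deque


class HierarchicalRoundRobin:
    """Toy two-level scheduler: hosts take turns, flows take turns per host."""

    def __init__(self):
        self.hosts = OrderedDict()   # ip -> OrderedDict(port -> deque)

    def enqueue(self, ip, port, packet):
        flows = self.hosts.setdefault(ip, OrderedDict())
        flows.setdefault(port, deque()).append(packet)

    def dequeue(self):
        if not self.hosts:
            return None
        ip, flows = next(iter(self.hosts.items()))   # first lookup: host
        port, queue = next(iter(flows.items()))      # second lookup: flow
        packet = queue.popleft()
        if queue:
            flows.move_to_end(port)      # flow goes to the back of its host
        else:
            del flows[port]
        if flows:
            self.hosts.move_to_end(ip)   # host goes to the back of the ring
        else:
            del self.hosts[ip]
        return packet


hrr = HierarchicalRoundRobin()
for p in ("a1", "a2", "a3"):
    hrr.enqueue("10.0.0.1", 1000, p)   # host A, busy flow
hrr.enqueue("10.0.0.1", 2000, "b1")    # host A, second flow
for p in ("c1", "c2", "c3"):
    hrr.enqueue("10.0.0.2", 9000, p)   # host B, single flow

order = []
while (pkt := hrr.dequeue()) is not None:
    order.append(pkt)
```

Host B gets every other slot even though host A has two flows, and A's two flows alternate within A's share. The two dictionary lookups per dequeue correspond to the "double the time for two lookups" cost listed above.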
[LARTC] SFQ buckets/extensions
...

> What if SFQ were to start with a minimal number of buckets, and track how 'deep' each bucket was, then go to a larger number of bits (2/4 at a time?) if the buckets hit a certain depth? Theoretically, this would mean that 'fairness' would be achieved more often in current collision situations but that a smaller number of buckets would be necessary to achieve fairness in currently low-collision situations. I haven't looked at the SFQ code in a while, so I don't know how much benefit this would be in terms of processing time, or even how expensive it would be to change hash sizes on the fly, but at a certain level of resolution (+/- 2-4 bits), the changes wouldn't be terribly frequent anyway.

A few reactions:
- The only runtime cost of lots of buckets is a small amount of storage for each bucket. Allocating buckets at runtime also introduces the problem that you could run out of space.
- There's no advantage to having many more buckets than the number of packets you're willing to queue, which is typically only on the order of a few hundred.

extensions

> And all the discussions tend to lead to the conclusion that there should be an sfq option (when the queue is created) for:
> a) how big the hash is
> b) whether to take into account source ports or not
> c) whether to take into account destination ports or not
> d) etc. :)
> Maybe someone who's written a qdisc would feel up to this?

I've been hoping to get to it, since I have other stuff I'd like to incorporate into a new sfq version.

From: Alexander Atanasov [EMAIL PROTECTED]
> I've done some work in this direction, probably needs more work, and it's poorly tested - expect b00ms ;) This adds a new qdisc for now - esfq which is a 100% clone of original sfq.
> - You can set all sfq parameters: hash table size, queue depths, queue limits.
> - You can choose from 3 hash types: original(classic), dst ip, src ip.
> Things to consider: perturbation with dst and src hashes is not good IMHO, you can try with perturb 0 if it causes trouble. Please, see the attached files.
> Playing with it gives interesting results:
> higher depth - makes flows equal slower
> small depth - makes flows equal faster
> limit kills big delays when set at about 75-85% of depth.

I don't understand what these last three lines mean. Could you explain?

> Needs testing and measurements - that's why i made it a separate qdisc and not a patch over sfq, i wanted to compare both. Any feedback good or bad is welcome.

I'll send you my current module, also a variant of SFQ. It contains doc that I think is worth including, also changes some of the code to be more understandable, separates the number of packets allowed in the queue from the number of buckets, supports the time limit (discussed in earlier messages), controls these things via /proc, maybe a few other things I'm forgetting. This version does not support hashing on different properties of the packet, cause it uses a totally different criterion for identifying subclasses of traffic. You can discard that and restore the sfq hash with your modifications. I think (hope) these changes are pretty much independent.
Re: [LARTC] SFQ buckets/extensions
Alexander Atanasov writes:
> At first look - i think i'll have to incorporate my changes into your work. I've not done much, just added hashes and unlocked what Alexey Kuznetsov did.

Not quite that simple. You have to throw out about half of my file, mostly the last third or so, which replaces the hash function, plus most of the /proc stuff, and probably a lot of other little pieces scattered here and there. I now recall a few other things - I support configuration of service weights for different subqueues, which makes no sense for sfq, also I record the amount of service (bytes and packets) per subqueue and report these to the tc -s -d stuff, which also makes no sense for sfq. After removing all that stuff you then have to restore the hash.

>>> Playing with it gives interesting results:
>>> higher depth - makes flows equal slower
>>> small depth - makes flows equal faster
>>> limit kills big delays when set at about 75-85% of depth.
>> I don't understand what these last three lines mean. Could you explain?
> depth is how many packets are queued on a row of the hash table. If you have large queues (higher depth) sfq reacts slower when a new flow appears (it has to do more work to make queue lengths equal). When you have short queues it reacts faster, so adjusting depth to your bandwidth and traffic type can make it do better work. I set bounded cbq class 320kbits and esfq with dst hash: Start an upload - it gets 40KB. Start second one - it should get 20KB asap to be fair. With depth 128 it would take it let's say 6 sec. to make both 20KB, with depth 64 about 3sec - drop packets early with shorter queue. (i've to make some exact measurements since this is just an example and may not be correct).

I don't see why that should be the case. And I don't recall ever observing it. This adaptation time should be practically zero. There's no work in making the queues equal. (Let's use the word queue to mean the whole SFQ and subqueue for the part sharing a hash index.) If you have, say, 100 packets in one subqueue and 10 in another they're already sharing the bandwidth 50-50.

> limit sets a threshold on queued packets - if a packet exceeds it, it's dropped so delay is smaller, but when it tries to make flows equal it counts depth, not limit. With above example depth 128 and limit 100: When first upload enqueues 100 packets sfq starts to drop, but the goal to make flows equal is 64 packets in queue. Flow doesn't get the 28 packets which are to be enqueued and delayed for a long time and probably dropped when received.

I disagree that the goal is to make the subqueues the same length. The goal is to serve them with the same bandwidth (as long as they don't become empty.) Queue length depends on how many packets each download is willing to send without an ack. If one is willing to send 100 and the other is willing to send 10, then the subqueues will likely be length 100 and 10, but each will still get the same bandwidth. Without window scaling the max window size is 64K which is only about 45 packets, so it's not really normal to have 100 packets in a subqueue.
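
The point that subqueue length does not determine the share of bandwidth can be checked with a toy round-robin server. This is a sketch only: round_robin_service is a hypothetical helper that counts packets rather than bytes, so it assumes equal packet sizes.

```python
from collections import deque


def round_robin_service(subqueues, slots):
    """Send `slots` packets, one per non-empty subqueue in turn.

    Returns how many packets each subqueue got to send.
    """
    served = {name: 0 for name in subqueues}
    active = deque(name for name, q in subqueues.items() if q)
    while slots > 0 and active:
        name = active.popleft()
        subqueues[name].popleft()
        served[name] += 1
        slots -= 1
        if subqueues[name]:
            active.append(name)   # still backlogged: back of the ring
    return served


subqueues = {
    "long": deque(range(100)),   # sender willing to have 100 packets in flight
    "short": deque(range(10)),   # sender willing to have 10 packets in flight
}
served = round_robin_service(subqueues, slots=20)
```

Both flows send the same number of packets over the 20 slots, even though one subqueue is ten times longer than the other.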
Re: [LARTC] Re: More on qdiscs - about dangling backlogs
Patrick McHardy writes:
> I don't think dropping at dequeue is necessary.

Here's an example showing it is. I have an SFQ with max queue size 128 and very low rate, say about 1 packet/sec, that I use to limit the rate of SYNs. Now as part of a test I send a syn flood, say 200 packets in one second. In that first second SFQ drops 200-128 but the time limit won't drop any more on enqueue. Now we're sending one packet/sec and I try to open a tcp connection. Suppose the age limit is 5 sec. and we drop on enqueue only. If I try to open the tcp connection 3 sec after the flood, none of the 124 or so packets in the queue has expired and it's going to take 2 min. for my syn to get through. Whereas, if dequeue drops expired packets then I can get through in 2 sec.

> The goal is not to provide a maximum in-queue time, if the qdisc is able to send there is no reason to drop.

The reason to drop is that it's a waste of time/bandwidth to send expired packets, and even worse to make something unexpired wait for them (and even worse yet if it has to wait long enough to expire itself).

> The problem is if the qdisc is not able to send many packets can get queued until drops occur. This means it takes a long time until the sender receives indication of congestion. For TCP, congestion is indicated by either consecutive ACKs with the same ACK number or SACKs.

In this case tcp should stop cause it doesn't get any acks cause its packets are not getting forwarded. Or if the problem is the other direction, it should stop cause it's not getting the acks.

> ACKs are only generated by the receiver if something was actually received, so by dropping packets after some timeout the time until a duplicate ACK is generated becomes smaller.

I think now we're getting into the subject of why it's good to drop packets that have been waiting for a long time. If your objective is to generate a duplicate ack then I'm not sure dropping packets is the right way. After all, you might drop a packet that would have generated one sooner. For that matter, you might have dropped one that would have generated a non-duplicate ack, which would be even better for tcp. Nevertheless, I do think it's good to drop packets that are not forwarded in a timely fashion. It's just not a simple argument.

> If you assume the expiration time to be smaller than the time required to fill the sender's congestion window, it doesn't make sense anymore to drop packets during dequeue as this could possibly prevent a duplicate ACK from being generated - relay indication of congestion. It sounds better to me to check for expired packets during enqueue (timers would probably be too expensive i guess) and drop them before enqueueing the new packet.

I'm not sure what you have in mind for the check here. I was expecting that no packets would actually take a long time between being received/generated and being enqueued. Rather, when you enqueue one, you can look in the queue to find any others that have been in the queue too long. But this does not catch as many cases as checking at dequeue. I was not proposing to set any timers. Just record the time the packet arrives and when you dequeue, see how long ago that was.

> With SFQ, what about not keeping a per-packet timeout but a per-flow congestion-indication timeout which would drop the first packet of a flow after max(age of any packet from this flow) >= timeout ? This would assure congestion indication to be delivered to the sender as soon as the flow is chosen to send again after the timeout occurred (as long as qlen(flow) > 1 which is probably the case).

I don't think I understand what you're proposing. If we enqueue 50 packets from one flow all at once, and after sending 10 they all expire (since they're all about the same age), then you want to drop one? Why is it worth sending the rest?
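
The SYN-flood example at the start of this message can be replayed with a toy simulation. Assumptions: a single FIFO stands in for the SFQ, exactly one packet is sent per second starting at t=0, the whole flood arrives at t=0, and the function name delay_for_new_packet is hypothetical.

```python
from collections import deque


def delay_for_new_packet(flood, limit, age_limit, arrival, drop_at_dequeue):
    """Seconds a packet arriving at `arrival` waits before it is sent.

    Returns None if that packet itself expires in the queue.
    """
    queue = deque()
    for _ in range(flood):          # flood arrives at t=0
        if len(queue) < limit:      # tail drop at enqueue when full
            queue.append(("flood", 0))
    t = 0
    inserted = False
    while True:
        if not inserted and t >= arrival:
            queue.append(("probe", arrival))
            inserted = True
        if drop_at_dequeue:
            # Discard expired packets before choosing what to send.
            while queue and t - queue[0][1] >= age_limit:
                kind, _ = queue.popleft()
                if kind == "probe":
                    return None
        if queue:
            kind, _ = queue.popleft()   # one packet sent this second
            if kind == "probe":
                return t - arrival
        t += 1


with_drop = delay_for_new_packet(200, 128, 5, 3, drop_at_dequeue=True)
without_drop = delay_for_new_packet(200, 128, 5, 3, drop_at_dequeue=False)
```

Under these assumptions the new SYN waits 2 seconds when expired packets are dropped at dequeue and about two minutes otherwise, matching the figures in the message. The model ignores the case where the waiting SYN would itself be considered expired when nothing is dropped at dequeue.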
[LARTC] Re: More on qdiscs - about dangling backlogs
Martin Devera writes:
> Hi, only a few notes on the theme. You are right with the displacement and bad enqueue byte counters. Maybe it would be better to count packets at dequeue time only in classful qdisc. It also makes better sense because qdisc can also instruct SFQ to drop packet - this drop

I don't understand. What other qdisc instructs SFQ to drop?

> is then not decremented from HTB/CBQ's stats. HTB uses enqueue time counting because CBQ does it too and at very

I figured that's why it worked that way.

> beginning I based my work on CBQ. Seems it is time to change things. Another important info: you really CAN'T drop packets in dequeue routine if you don't want to fool classful parent. Much of the logic in CBQ/HTB/ATM/PRIO qdisc is based on the existence of backlog.

I gather you only care whether this qdisc is waiting to send, not how much is in the queue.

> When you drop in dequeue, parent will think that he itself still has some packets somewhere (in children) and will constantly attempt to find them. And will be confused by the fact that it can't. Enqueue routine can give you confirmation of packet enqueue, drop routine the same. Only dequeue can't say hey there is your skb and by the way two others were dropped.

What a pity.

> SOLUTIONS: One way is to monitor (store) q.qlen variable of all children of classful qdisc. When I call enqueue/drop/dequeue on it I'll see its discrepancy against last stored value and will update my counter

This seems like the right solution, except that you shouldn't even need to store a counter. Just use the one in the child qdisc. BTW, I think you are justified in assuming that this counter can only change in two ways: decrease when dequeue is called (but possibly by more than one packet) and increase by 1 when enqueue is called. And I guess also requeue. Which raises another problem, since that's not called by the parent, right?

> accordingly. In the same way we could add q.bytelen and then we would be able to do the same for bytes - but this is probably not necessary as we need to know only bytes dequeued ...

My impression is that every qdisc is currently supposed to keep track of # packets in the queue, but not necessarily # bytes. Best not to change that, especially since we don't even have any currently proposed use for it.

> There is other way - add parameter to dequeue routine which tells us how many packets we should decrease our qlen by. But I think that former approach is simpler and prone to silent drops (for example by some timer).

I think you're saying the former approach is better, but prone suggests otherwise. I prefer the former approach. It's better for all the other qdiscs out there to keep working without change. There are only a few classful qdiscs that would have to change, and they could still work unchanged with qdiscs that don't do silent drops.

> Don do you plan to implement these corrections for CBQ/PRIO ?

Not any time soon. The last time I tried to read CBQ it was in order to figure out what all those parameters mean, and I didn't get very far. Now that I no longer use it I have less incentive.

> I'll do the change for HTB.

Thanks. But which ones? It sounds like you plan to
- count as sent only what's returned from dequeue
- ignore the return value of enqueue and check the queue length to see whether it's empty

I like those, but also suggested that HTB would be a good place to add code for dropping stale packets. If you're interested in doing that I should mention: Not all packets come with timestamps (e.g., locally generated ones). Solution: at enqueue, if the timestamp is 0, set it to the current time.
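
A toy model of the approach preferred above: the parent keeps no backlog counter of its own, reads the child's qlen instead, and counts as sent only what dequeue returns. The child silently drops stale packets at dequeue and stamps packets whose timestamp is 0 at enqueue. AgingFifo and Parent are hypothetical names, and time is passed in explicitly rather than read from a clock.

```python
from collections import deque


class AgingFifo:
    """Toy child qdisc that silently drops stale packets when dequeuing."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.packets = deque()   # (timestamp, payload)

    @property
    def qlen(self):
        return len(self.packets)

    def enqueue(self, payload, now, timestamp=0):
        # Packets that arrive without a timestamp (0) are stamped here.
        self.packets.append((timestamp or now, payload))

    def dequeue(self, now):
        while self.packets and now - self.packets[0][0] > self.max_age:
            self.packets.popleft()   # silent drop; the parent is not told
        return self.packets.popleft()[1] if self.packets else None


class Parent:
    """Toy classful parent that trusts the child's qlen for its backlog."""

    def __init__(self, child):
        self.child = child
        self.sent = 0

    def backlog(self):
        return self.child.qlen       # no separate counter to get out of sync

    def enqueue(self, payload, now):
        self.child.enqueue(payload, now)

    def dequeue(self, now):
        payload = self.child.dequeue(now)
        if payload is not None:
            self.sent += 1           # count only what was really returned
        return payload


parent = Parent(AgingFifo(max_age=5))
parent.enqueue("a", now=1)
parent.enqueue("b", now=1)
parent.enqueue("c", now=8)
backlog_before = parent.backlog()
first = parent.dequeue(now=10)       # "a" and "b" are stale by now
backlog_after = parent.backlog()
```

Because the parent never stored its own count, the two silent drops cannot leave it believing that packets are still waiting somewhere below it.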
[LARTC] rp filter questions
The rp_filter is also explained here: http://lartc.org/HOWTO//cvs/2.4routing/html/c1182.html#AEN1188

The above says:

  for i in /proc/sys/net/ipv4/conf/*/rp_filter ; do
    echo 1 > $i
  done

First question: ls /proc/sys/net/ipv4/conf/*/rp_filter =>

  /proc/sys/net/ipv4/conf/all/rp_filter
  /proc/sys/net/ipv4/conf/default/rp_filter
  /proc/sys/net/ipv4/conf/eth0/rp_filter
  /proc/sys/net/ipv4/conf/eth1/rp_filter
  /proc/sys/net/ipv4/conf/eth2/rp_filter
  /proc/sys/net/ipv4/conf/lo/rp_filter

What do all and default do? Could the loop above be replaced by just one?

Second question: How does the runtime cost of rp_filter compare with that of rules like

  iptables -A FORWARD -i eth1 -s ! 10.0.0.0/8 -j DROP

I assume in one case you have to do a route lookup, in the other you have to iterate over the appropriate rules. What are these costs? Ideally the answers should be in terms of variables we know, such as the number of rules, the number of rules per interface, the number of routes, etc.
[LARTC] re: Per-connection routing for multiple uplinks/providers
> I have been digging through the Lartc documentation as well as Netfilter, etc. and haven't found much on per-connection routing for multiple uplinks/providers. What I would like to do is cleanly move packets out to the Internet over two (maybe 3) separate interfaces, utilizing all of the bandwidth, and avoiding snags.

What you (and everyone else) would really like is to make your two or three links act like one link with bandwidth equal to the sum of the parts. As long as those different links have different ip addresses (which will surely be the case if they connect to different providers) this cannot be done. You can indeed send traffic out in a manner that uses all of your bandwidth (assuming your providers don't do the ingress/egress filtering that they should - a pretty safe bet at the moment, sad to say). This does introduce additional problems since packets you send are now much more likely to arrive out of order, and for that reason alone it's probably not a good idea.

A more fundamental problem is that the incoming packets cannot share the different links in the same way. When you send a packet out you have to choose one IP address as its source. The reply will be sent to that address and will have to arrive on the link with that address. Thus, for example, if you have two links with the same incoming bandwidth and only one connection, you can't use more than half of your total incoming bandwidth for that connection.

> I could use a round-robin scheduler, which would put consecutive packets on different interfaces. I think this will run into problems when the reply packets come back. Maybe not ??

As long as the provider does no filtering this will work, but will also cause packets to arrive out of order, which is bad for performance.

> I read through Arthur Leeuwen's documentation (http://lartc.org/HOWTO//cvs/2.4routing/html/x247.html ) on a scheme for dividing the outgoing packets on a per-route basis. Packets going to the same destination will go through the same interface. This gets around the round-robin problem, but I think this is not 'fair' in the sense that one interface might accumulate more routes than the other, and there does not seem to be a mechanism (other than periodically flushing the route tables) for evening out the flows. It is pretty simple though and I will use this as a first chop solution.

Who cares about fairness in the number of routes? The important thing is the bandwidth used by those routes. And you can't balance that, since you don't know when you choose the route what bandwidth will be used by that route.

> Another approach to the problem would be to do a round-robin on a per-connection basis. Each new connection would go out of the 'next' interface.

Again, the problem is that when you have to choose you don't know what the bandwidth of the connection will be. You'd do a little better to measure the bandwidth being used currently on each link and assign the next connection to the link with the most unused bandwidth. But of course, this is still only a poor approximation of what you want.
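
The "most unused bandwidth" heuristic from the last paragraph, as a tiny sketch. The link names, capacities, and measured rates are made-up numbers, and pick_link is a hypothetical helper; how the current usage would actually be measured is left open.

```python
def pick_link(links):
    """Return the link with the most unused bandwidth.

    `links` maps a link name to (capacity, currently measured use),
    both in the same unit (kbit/s in the example below).
    """
    return max(links, key=lambda name: links[name][0] - links[name][1])


links = {
    "isp1": (1500, 1200),   # 300 kbit/s unused
    "isp2": (768, 100),     # 668 kbit/s unused
}
choice = pick_link(links)
```

As the message notes, this is still only an approximation: the choice is made before anyone knows how much bandwidth the new connection will use.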
Re: [LARTC] Per-connection routing for multiple uplinks/providers
Bob Gustafson writes:
> But, But, - this is really just software. We are not trying to cram wine bottles down the internet pipe (although many would really like to do that!).

The limitations I point out are inherent in tcp/ip. I think I sent a proposal to this list describing a modification to tcp that would allow one connection to use many ip addresses (for each endpoint). That would allow substantial improvement, since you would be able to switch addresses in mid stream (in a live connection). It would not solve all of the problems. In particular, you would not be able to efficiently use both/all addresses at once because tcp has been adapted to work well in the case where packets arrive in order. That could also perhaps be overcome with changes to tcp. Note, however, that these changes would only help you in cases where both machines are using the modified versions.

> From the requestee point of view, I know how much bandwidth I need to listen to the BBC newscast, or to a company conference call. I can also request email and ftp sessions to work in the 'background' at a lower bandwidth allocation (cost?), but if I am talking to someone interactively, it would be nice if my packets were transferred at a regular rate without jitter or delay. IP doesn't do this, and one can argue that it cannot. But, the whole thing is run by software and software can change.

All of the things above can already be done on a single link. What cannot be done is make two links work like one with the sum of the bandwidth.