Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
Russell Stuart wrote: On Sat, 2007-01-20 at 09:47 +0100, Patrick McHardy wrote: Russell Stuart wrote: b. There is no compatibility problem. Again, (b). You seem to have something in mind, it would be easier if you would just explain exactly where you think there is a problem. I though I had :(. Consider: Line speed is 256 K bits/sec. Protocol: ADSL/ATM (PPPoE VC/LLC) (Overhead is 42 bytes + cell pad). Kernel HZ is 1000. cell_log = 8. Below is a table which shows the RTAB that would be sent to a pre-STAB kernel: IP Datagram Packet Size Packet Size Ticks to Packet Size Seen by KernelOn the Wire Send packet RTAB[0]=2-14..-7 0..7 53..53 1.656 RTAB[1]=2 -6..1 8..1553..53 1.656 RTAB[2]=3 2..9 16..2353..106 3.313 RTAB[3]=3 10..17 24..31 106..106 3.313 ... RTAB[9]=5 58..63 72..79 106..159 4.968 RTAB[10]=564..71 80..87 159..159 4.968 Below is the same thing for a post-STAB kernel: IP Datagram Packet Size Packet Size Ticks to Packet Size Seen by KernelOn the Wire Send packet RTAB[0]=0 - Undefined as no STAB entry is 0. RTAB[1]=0 - Undefined as no STAB entry is 0. ... RTAB[5]=0 - Undefined as no STAB entry is 0. RTAB[6]=2-14..-7 0..7 53..53 1.656 RTAB[7]=2 -6..1 8..1553..53 1.656 RTAB[8]=3 2..9 16..2353..106 3.313 RTAB[9]=3 10..17 24..31 106..106 3.313 ... RTAB[15]=558..63 72..79 106..159 4.968 RTAB[16]=564..71 80..87 159..159 4.968 The two RTAB's are different. Thus you must send different RTAB's to pre-STAB and post-STAB kernels. How is tc to decide which one to send? I did add code that checked uname once to solve a very similar problem in tc, but that got my wrist slapped. If the users asks to use STABs, send the modified RTAB. If the kernel doesn't support STABs it will return an error, which is good enough. Replacing RTAB with STAB would solve the problem, BTW, as the post-STAB kernel would ignore the RTAB. It would also solve another problem. The granularity of RTAB sucks for VOIP (my area of interest). Eg on a 2 M bit link, one ATM cell takes 0.0848 ticks to send, two cells 0.170 ticks, three cells 0.2544 ticks. RTP voice packets are typically two or three cells. RTAB only holds an integral number of ticks of course, making the current traffic control engine useless for VOIP links with speeds of around 2.5M bit or above. This could be fixed in an STAB implementation. I think this is a different problem. If you replace RTABs by STABs you again can't use it for anything that is only interested in the size, not the transmission time (HFSC, SFQ, ...). - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
On Wed, 2007-01-24 at 17:38 +0100, Patrick McHardy wrote: The two RTAB's are different. Thus you must send different RTAB's to pre-STAB and post-STAB kernels. How is tc to decide which one to send? I did add code that checked uname once to solve a very similar problem in tc, but that got my wrist slapped. If the users asks to use STABs, send the modified RTAB. If the kernel doesn't support STABs it will return an error, which is good enough. Yuk! Now the user has to say whether he wants to use STAB's or not? Currently, apart from some debugging params to tc, the user isn't even aware that the traffic control is implemented in terms of RTAB's. That is how it should be - it is an implementation detail. I think this is a different problem. If you replace RTABs by STABs you again can't use it for anything that is only interested in the size, not the transmission time (HFSC, SFQ, ...). I was a little too brief. The comment stems from the observation that in all current implementations: const A_CONSTANT; for (i = 0; i 256; i += 1) assert(RTAB[i] == STAB[i] * A_CONSTANT); Ergo, if in addition to implementing STAB as you plan to, A_CONSTANT was shipped to the kernel then RTAB could be replaced. A_CONSTANT could be set so the calculation would return the time it would take to send a packet in micro seconds, say (a figure I just pulled out of the air). This is 1000 times more precise than the kernel can do now. It wouldn't be perfect - the kernel would send the packets in bursts. But it would be good enough to solve my problem with VOIP, I think. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
Russell Stuart wrote: Yuk! Now the user has to say whether he wants to use STAB's or not? Currently, apart from some debugging params to tc, the user isn't even aware that the traffic control is implemented in terms of RTAB's. That is how it should be - it is an implementation detail. Of course he has to, just like your atm parameter. In case of stabs it would be something like stab atm. I think this is a different problem. If you replace RTABs by STABs you again can't use it for anything that is only interested in the size, not the transmission time (HFSC, SFQ, ...). I was a little too brief. The comment stems from the observation that in all current implementations: const A_CONSTANT; for (i = 0; i 256; i += 1) assert(RTAB[i] == STAB[i] * A_CONSTANT); Ergo, if in addition to implementing STAB as you plan to, A_CONSTANT was shipped to the kernel then RTAB could be replaced. At least look at the patch I sent. STAB mapping is _not_ a multiplication by a constant (which wouldn't be able to express minimum packet size or padding to multiples of cell sizes). - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
On Thu, 2007-01-25 at 01:06 +0100, Patrick McHardy wrote: Of course he has to, just like your atm parameter. In case of stabs it would be something like stab atm. The difference being with atm he is telling the kernel he has an ATM line, but with STAB he is telling the kernel how it should internally calculate packet lengths and transmission times. Why should he care how the kernel does it? Surely this should be hidden. As it is now, BTW. I was a little too brief. The comment stems from the observation that in all current implementations: const A_CONSTANT; for (i = 0; i 256; i += 1) assert(RTAB[i] == STAB[i] * A_CONSTANT); Ergo, if in addition to implementing STAB as you plan to, A_CONSTANT was shipped to the kernel then RTAB could be replaced. At least look at the patch I sent. STAB mapping is _not_ a multiplication by a constant (which wouldn't be able to express minimum packet size or padding to multiples of cell sizes). You have read this the wrong way. Firstly I did look at the patch you posted - and commented on it in some detail. Admittedly it was a while ago. The details are fuzzy now, but I am pretty sure I know what you want achieve and how you plan to go about it. Secondly, I am not saying STAB is a multiplication by a constant - any more than RTAB is multiplication by a constant. I am fully aware that STAB can take into account MPU's, cell padding, additional protocol overhead and whatnot - just like RTAB does now. What I am saying is that on or about is summed up in the tc_calc_xmittime() function in tc_core.c. It is a grand total of 1 line long: return tc_core_usec2tick(100*((double)size/rate)); Later on in the code, we have this assignment: rtab[i] = tc_calc_xmittime(bps, sz); Which can be re-written: rtab[i] = tc_core_usec2tick(100*((double)size/rate)); If STAB was introduced and was kept to a fixed 256 elements (I am not suggesting it should be), the core change would be to change the line above to read: stab[i] = size; rtab[i] = tc_core_usec2tick(100*((double)stab[i]/rate)); Thus my assertion that: RTAB[i] = STAB[i] * A_CONSTANT is literally true. A_CONSTANT is currently always 100 / rate * HZ. All I am saying is that in the kernel, wherever we have a reference to rtab[i] now, after your stab is introduced that reference could be replaced by stab[i] * A_CONSTANT, provided A_CONSTANT was sent with stab by tc. Doing this would have several advantages: a. Uses less space than having both RTAB and STAB. b. Provides the framework to allow the kernel to overcomes my problem with HZ being too slow for ATM links 2M bips/sec. c. Puts an end to my whinging about compatibility problems above. We can just send the old RTAB. It will be ignored by kernels that use STAB. Are you objecting to this because I hadn't spelt it out clearly, or is there something deeper I am missing? It doesn't seem like a radical proposal to me. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
Russell Stuart wrote: On Fri, 2007-01-19 at 13:19 +0100, Patrick McHardy wrote: [...] I don't understand - too many negates here without parens. Are you saying: a. Backward / Forward compatibility between the kernel and its user space tools isn't an issue, or b. There is no compatibility problem. Again, (b). You seem to have something in mind, it would be easier if you would just explain exactly where you think there is a problem. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
On Sat, 2007-01-20 at 09:47 +0100, Patrick McHardy wrote: Russell Stuart wrote: b. There is no compatibility problem. Again, (b). You seem to have something in mind, it would be easier if you would just explain exactly where you think there is a problem. I though I had :(. Consider: Line speed is 256 K bits/sec. Protocol: ADSL/ATM (PPPoE VC/LLC) (Overhead is 42 bytes + cell pad). Kernel HZ is 1000. cell_log = 8. Below is a table which shows the RTAB that would be sent to a pre-STAB kernel: IP Datagram Packet Size Packet Size Ticks to Packet Size Seen by KernelOn the Wire Send packet RTAB[0]=2-14..-7 0..7 53..53 1.656 RTAB[1]=2 -6..1 8..1553..53 1.656 RTAB[2]=3 2..9 16..2353..106 3.313 RTAB[3]=3 10..17 24..31 106..106 3.313 ... RTAB[9]=5 58..63 72..79 106..159 4.968 RTAB[10]=564..71 80..87 159..159 4.968 Below is the same thing for a post-STAB kernel: IP Datagram Packet Size Packet Size Ticks to Packet Size Seen by KernelOn the Wire Send packet RTAB[0]=0 - Undefined as no STAB entry is 0. RTAB[1]=0 - Undefined as no STAB entry is 0. ... RTAB[5]=0 - Undefined as no STAB entry is 0. RTAB[6]=2-14..-7 0..7 53..53 1.656 RTAB[7]=2 -6..1 8..1553..53 1.656 RTAB[8]=3 2..9 16..2353..106 3.313 RTAB[9]=3 10..17 24..31 106..106 3.313 ... RTAB[15]=558..63 72..79 106..159 4.968 RTAB[16]=564..71 80..87 159..159 4.968 The two RTAB's are different. Thus you must send different RTAB's to pre-STAB and post-STAB kernels. How is tc to decide which one to send? I did add code that checked uname once to solve a very similar problem in tc, but that got my wrist slapped. Replacing RTAB with STAB would solve the problem, BTW, as the post-STAB kernel would ignore the RTAB. It would also solve another problem. The granularity of RTAB sucks for VOIP (my area of interest). Eg on a 2 M bit link, one ATM cell takes 0.0848 ticks to send, two cells 0.170 ticks, three cells 0.2544 ticks. RTP voice packets are typically two or three cells. RTAB only holds an integral number of ticks of course, making the current traffic control engine useless for VOIP links with speeds of around 2.5M bit or above. This could be fixed in an STAB implementation. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
Russell Stuart wrote: On Thu, 2007-01-18 at 12:37 +0100, Patrick McHardy wrote: Or are you proposing tc behave differently on different kernel versions. (I have no problem with that, but isn't it officially frowned upon?) Yes. There is no way you can make this work on old kernels, nobody expects that. The important part is that everything continues to work as before and that both old and new iproute binaries work properly on both old and new kernels (new iproute on old kernels without STABs obviously). I thought that some degree of compatibility was expected. At the very least the newest version of tc must work on _any_ kernel as least as well as the version it replaces did. I also though newer kernels should work older version of iproute2, albeit without the features added in the newer versions. Are you saying this is not so? No, thats exactly what I'm saying. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
On Fri, 2007-01-19 at 13:19 +0100, Patrick McHardy wrote: Russell Stuart wrote: I thought that some degree of compatibility was expected. At the very least the newest version of tc must work on _any_ kernel as least as well as the version it replaces did. I also though newer kernels should work older version of iproute2, albeit without the features added in the newer versions. Are you saying this is not so? No, thats exactly what I'm saying. I don't understand - too many negates here without parens. Are you saying: a. Backward / Forward compatibility between the kernel and its user space tools isn't an issue, or b. There is no compatibility problem. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
On Thu, 2007-01-18 at 05:05 +0100, Patrick McHardy wrote: Yesterday I was chatting about this at LCA 2007, and it dawned on me that there is a problem with the dual RTAB/STAB approach. Currently the lookup in the kernel is time_to_transmit_a_packet = RTAB[packet_length_seen_by_kernel] As I understand it, you are proposing to change that to time_to_transmit_a_packet = RTAB[STAB[packet_length_seen_by_kernel]] = RTAB[packet_length_seen_on_the_wire] Given RTAB is the same in both cases the results of the calculation will be different (and ergo wrong in one case or the other). RTAB can't change and remain compatible with old kernels. Ergo this approach breaks backward compatibility. RTABs don't change, they continue to work as before. But when an STAB is present the lookup is based on the STAB size mapping. Neither one is wrong, RTABs calculate the transmission time based only on the specified rate, RTABs + STABs calculate the transmission time based on the rate, but include external overhead. No argument with RTAB works as before. But aren't you proposing to feed it the accurate packet lengths calculated STAB? For example, if the VOIP IP datagram is 60 bytes, older kernels will index RTAB by (60 + ethernet_header_size), STAB kernels will index it by 159 bytes (3 ATM cells - say). If tc doesn't do a uname call or something, then it will send the same RTAB to both kernels. Obviously the results returned by the RTAB lookup will be different. If you aren't proposing to feed the RTAB lookup with the output of STAB, then I still don't understand why the current ATM patch isn't needed. Or are you proposing tc behave differently on different kernel versions. (I have no problem with that, but isn't it officially frowned upon?) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
On Thu, 2007-01-18 at 12:37 +0100, Patrick McHardy wrote: Or are you proposing tc behave differently on different kernel versions. (I have no problem with that, but isn't it officially frowned upon?) Yes. There is no way you can make this work on old kernels, nobody expects that. The important part is that everything continues to work as before and that both old and new iproute binaries work properly on both old and new kernels (new iproute on old kernels without STABs obviously). I thought that some degree of compatibility was expected. At the very least the newest version of tc must work on _any_ kernel as least as well as the version it replaces did. I also though newer kernels should work older version of iproute2, albeit without the features added in the newer versions. Are you saying this is not so? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
On Thu, 2006-11-30 at 14:07 +0100, Patrick McHardy wrote: Qdiscs don't use RTABs to measure rates but to calculate transmission times. Transmission time is always related to the length, the difference between our patches is that you modify the RTABs in advance to include the overhead in the calculation, my patch changes the length used to look up the transmission time. Which works with or without RTABs. Thanks, I understand now. I had decided that that STAB was indeed the way forward, and hence I should devote my time to making it work with ADSL/ATM. The only issue is now I don't have any spare time right now. Yesterday I was chatting about this at LCA 2007, and it dawned on me that there is a problem with the dual RTAB/STAB approach. Currently the lookup in the kernel is time_to_transmit_a_packet = RTAB[packet_length_seen_by_kernel] As I understand it, you are proposing to change that to time_to_transmit_a_packet = RTAB[STAB[packet_length_seen_by_kernel]] = RTAB[packet_length_seen_on_the_wire] Given RTAB is the same in both cases the results of the calculation will be different (and ergo wrong in one case or the other). RTAB can't change and remain compatible with old kernels. Ergo this approach breaks backward compatibility. I can't see any way forward but to break the link between RTAB and STAB. Personally I think this is a good thing, as I think STAB should replace RTAB completely. One way to do this is to give the kernel STAB + transmission rate, so the calculation above becomes: time_to_transmit_a_packet = STAB[packet_length_seen_by_kernel] * transmission_rate RTAB disappears in new kernels, and the compatibility problem goes with it. Can you think of other ways forward? I need some way forward so I can get this thing working. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
Russell Stuart wrote: On Thu, 2006-11-30 at 14:07 +0100, Patrick McHardy wrote: Qdiscs don't use RTABs to measure rates but to calculate transmission times. Transmission time is always related to the length, the difference between our patches is that you modify the RTABs in advance to include the overhead in the calculation, my patch changes the length used to look up the transmission time. Which works with or without RTABs. Thanks, I understand now. Glad we found some common ground :) I had decided that that STAB was indeed the way forward, and hence I should devote my time to making it work with ADSL/ATM. The only issue is now I don't have any spare time right now. Yesterday I was chatting about this at LCA 2007, and it dawned on me that there is a problem with the dual RTAB/STAB approach. Currently the lookup in the kernel is time_to_transmit_a_packet = RTAB[packet_length_seen_by_kernel] As I understand it, you are proposing to change that to time_to_transmit_a_packet = RTAB[STAB[packet_length_seen_by_kernel]] = RTAB[packet_length_seen_on_the_wire] Given RTAB is the same in both cases the results of the calculation will be different (and ergo wrong in one case or the other). RTAB can't change and remain compatible with old kernels. Ergo this approach breaks backward compatibility. RTABs don't change, they continue to work as before. But when an STAB is present the lookup is based on the STAB size mapping. Neither one is wrong, RTABs calculate the transmission time based only on the specified rate, RTABs + STABs calculate the transmission time based on the rate, but include external overhead. I can't see any way forward but to break the link between RTAB and STAB. Personally I think this is a good thing, as I think STAB should replace RTAB completely. One way to do this is to give the kernel STAB + transmission rate, so the calculation above becomes: time_to_transmit_a_packet = STAB[packet_length_seen_by_kernel] * transmission_rate RTAB disappears in new kernels, and the compatibility problem goes with it. (STAB[..] / transmission_rate) Removing RTABs would break compatibilty, I don't really see why adding STABs would. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
First, sorry for letting you wait so long .. Russell Stuart wrote: On Tue, 2006-10-24 at 18:19 +0200, Patrick McHardy wrote: No, my patch works for qdiscs with and without RTABs, this is where they overlap. Could you explain how this works? I didn't see how qdiscs that used RTAB to measure rates of transmission could use your STAB to do the same thing. At least not without substantial modifications to your patch. Qdiscs don't use RTABs to measure rates but to calculate transmission times. Transmission time is always related to the length, the difference between our patches is that you modify the RTABs in advance to include the overhead in the calculation, my patch changes the length used to look up the transmission time. Which works with or without RTABs. No, as we already discussed, SFQ uses the packet size for calculating remaining quanta, and fairness would increase if the real transmission size (and time) were used. RED uses the backlog size to calculate the drop probabilty (and supports attaching inner qdiscs nowadays), so keeping accurate backlog statistics seems to be a win as well (besides their use for estimators). It is also possible to specify the maximum latency for TBF instead of a byte limit (which is passed as max. backlog value to the inner bfifo qdisc), this would also need accurate backlog statistics. This is all beside the point if you can show how you patch gets rid of RTAB - currently I am acting under the assumption it doesn't. If it does you get all you describe for free. Why? Otherwise - yes, you are correct. The ATM patch does not introduce accurate packet lengths into the kernel, which is what is required to give the benefits you describe. But that was never the ATM patches goal. The ATM patch gives accurate rate calculations for ATM links, nothing more. Accurate packet length calculations is apparently the goal of your patch, and I wish you luck with it. Again, its not rate calculations but transmission time calculations, which _are a function of the length_. Ethernet, VLAN, Tunnels, ... its especially useful for tunnels if you also shape on the underlying device since the qdisc on the tunnel device and the qdisc on the underlying device should ideally be in sync (otherwise no accurate bandwidth reservation is possible). Hmmm - not as far as I am aware. In all those cases the IP layer breaks up the data into MTU sized packets before they get to the scheduler. ATM is the only technology I am known of where setting the MTU to be bigger than the end to end link can support is normal. Thats not the point. If I want to do scheduling on the ipip device and on the underlying device at the same time I need to reserve the amount of bandwidth given to the ipip device + the bandwidth uses for encapsulation on the underlying device. The easy way to do this is to use the same amount of bandwidth on both devices and make the scheduler on the ipip device aware that some overhead is going to be added. The hard way is to calculate the worst case (bandwidth / minimum packet size * overhead per packet) and add that on the underlying device. Either you or Jesper pointed to this code in iproute: for (i=0; i256; i++) { unsigned sz = (icell_log); ... rtab[i] = tc_core_usec2tick(100*((double)sz/bps)); which tends to underestimate the transmission time by using the smallest possible size for each cell. Firstly, yes you are correct. It will under some circumstances underestimate the number of cells it takes to send a packet. The reason is because the whole aim of the ATM patch was to make maximum use of the ATM link, while at the same time keeping control of scheduling decisions. To keep control of scheduling decisions, we must _never_ overestimate the speed of the link. If we do the ISP will take control of the scheduling. Underestimating the transmission time is equivalent to overestimating the rate. At first sight this seems a minor issue. Its not, because the error can be large. An example of overestimating the link speed would be were one RTAB entry covers both the 2 and 3 cell cases. If we say the IP packet is going to use 2 cells, and in fact it uses 3, then the error is 50%. This is a huge error, and in fact eliminating this error is the whole point of the ATM patch. As an example of its impact, I was trying to make VOIP work over a shared link. If the ISP starts making the scheduling decisions then VOIP packets start being dropped or delayed, rendering VOIP unusable. So in order to use VOIP on the link I have to understate the link capacity by 50%. As it happens, VOIP generates a stream of packets in the 2-3 cell size range, the actual size depending on the codec negotiated by the end points. Jesper in his thesis gives perhaps an more important example what happens if you overestimate the link speed. It turns out in interacts with TCP's flow
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
Russell Stuart wrote: On Mon, 2006-10-23 at 14:39 +0200, Patrick McHardy wrote: The implementation may be different, but the intention and the result is the same. I probably would mind less if it wouldn't affect userspace compatibility, but we need to carry this stuff for ever even if we add another implementation that covers all qdiscs. So where is the overlap? Your patch will make qdiscs that use packet length work with ATM. Ours makes the rest, ie those that use Alexy's RTAB, work with ATM. They would probably both apply with minimal conflicts. You have previously said you have no intention of changing the current RTAB implantation with your STAB patch. No, my patch works for qdiscs with and without RTABs, this is where they overlap. As an aside non-work conserving qdisc's that do scheduling are the real targets of ATM patch. The rest are not effected by ATM overly. The only one of those that doesn't use Alexy's RTAB is the one you introduced - HFSC. You are the best person to fix things so HFSC does work with ATM, and that is what I thought you were doing with the STAB patch. No, as we already discussed, SFQ uses the packet size for calculating remaining quanta, and fairness would increase if the real transmission size (and time) were used. RED uses the backlog size to calculate the drop probabilty (and supports attaching inner qdiscs nowadays), so keeping accurate backlog statistics seems to be a win as well (besides their use for estimators). It is also possible to specify the maximum latency for TBF instead of a byte limit (which is passed as max. backlog value to the inner bfifo qdisc), this would also need accurate backlog statistics. What do I need to explain further? As I stated several times, I would like to see a patch that handles all qdiscs. And it should probably not have hardcoded ATM values, there is other media that behaves similar. I am not aware of other media that behaves in a similar way, although I am no expert. Ethernet, VLAN, Tunnels, ... its especially useful for tunnels if you also shape on the underlying device since the qdisc on the tunnel device and the qdisc on the underlying device should ideally be in sync (otherwise no accurate bandwidth reservation is possible). What have I missed? The hard coded ATM values don't effect this patch btw, they are a user space thing only. Sure, I'm just mentioning it. Is seems to be quite deeply codified in userspace though. A negative cell align is possible, and in fact is typical. If slot ended up being less than 0 then the configuration parameters passed to tc would of been wrong - they could not of matched the actual traffic. The error can't be detected in tc, but it can't be allowed to cause the kernel to index off the end of an array either. I'm not sure I understand what you're saying here. The transmission time gets _smaller_ by transmitting over ATM? Does this have anything to do with the off-by-one during rate table calculation you or Jesper noticed? There is nothing I would describe as an off-by-one error in the RTAB calculation, so I can't be sure what you are referring to. Either you or Jesper pointed to this code in iproute: for (i=0; i256; i++) { unsigned sz = (icell_log); ... rtab[i] = tc_core_usec2tick(100*((double)sz/bps)); which tends to underestimate the transmission time by using the smallest possible size for each cell. The packetisation done by ATM does introduce rounding / truncation errors of up to ((1 cell_log) - 1). Negative cell alignments is the easiest way to fix that. Doesn't this cause underestimates as well for minimum sized (within a cell size) packets? The error is the same as the one I mentioned above (apparently in the opposite direction though), which is why I thought this might be related. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
On Tue, 24 Oct 2006, Patrick McHardy wrote: Russell Stuart wrote: On Mon, 2006-10-23 at 14:39 +0200, Patrick McHardy wrote: The implementation may be different, but the intention and the result is the same. I probably would mind less if it wouldn't affect userspace compatibility, but we need to carry this stuff for ever even if we add another implementation that covers all qdiscs. So where is the overlap? Your patch will make qdiscs that use packet length work with ATM. Ours makes the rest, ie those that use Alexy's RTAB, work with ATM. They would probably both apply with minimal conflicts. You have previously said you have no intention of changing the current RTAB implantation with your STAB patch. No, my patch works for qdiscs with and without RTABs, this is where they overlap. Patrick, your patch does not solve the alignment issue of the RTABs. (The kernel part of) our patch (Me and Russells), simply gives the ability to align the RTAB. We will still need this ability when your STAB patch is in place. cut Sure, I'm just mentioning it. Is seems to be quite deeply codified in userspace though. Yes, the userspace patch is where we do the ugly complicated stuff. cut I'm not sure I understand what you're saying here. The transmission time gets _smaller_ by transmitting over ATM? Does this have anything to do with the off-by-one during rate table calculation you or Jesper noticed? There is nothing I would describe as an off-by-one error in the RTAB calculation, so I can't be sure what you are referring to. Either you or Jesper pointed to this code in iproute: It was me, Jesper. for (i=0; i256; i++) { unsigned sz = (icell_log); ... rtab[i] = tc_core_usec2tick(100*((double)sz/bps)); which tends to underestimate the transmission time by using the smallest possible size for each cell. Correct. My original patch only corrected this off-by-one error (and parsed on an overhead argument to the kernel). Russell, took the step further and made it possible to adjust the alignment from userspace. cut Cheers, Jesper Brouer -- --- MSc. Master of Computer Science Dept. of Computer Science, University of Copenhagen Author of http://www.adsl-optimizer.dk --- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
On Tue, 2006-10-24 at 18:19 +0200, Patrick McHardy wrote: No, my patch works for qdiscs with and without RTABs, this is where they overlap. Could you explain how this works? I didn't see how qdiscs that used RTAB to measure rates of transmission could use your STAB to do the same thing. At least not without substantial modifications to your patch. As an aside non-work conserving qdisc's that do scheduling are the real targets of ATM patch. The rest are not effected by ATM overly. The only one of those that doesn't use Alexy's RTAB is the one you introduced - HFSC. You are the best person to fix things so HFSC does work with ATM, and that is what I thought you were doing with the STAB patch. No, as we already discussed, SFQ uses the packet size for calculating remaining quanta, and fairness would increase if the real transmission size (and time) were used. RED uses the backlog size to calculate the drop probabilty (and supports attaching inner qdiscs nowadays), so keeping accurate backlog statistics seems to be a win as well (besides their use for estimators). It is also possible to specify the maximum latency for TBF instead of a byte limit (which is passed as max. backlog value to the inner bfifo qdisc), this would also need accurate backlog statistics. This is all beside the point if you can show how you patch gets rid of RTAB - currently I am acting under the assumption it doesn't. If it does you get all you describe for free. Otherwise - yes, you are correct. The ATM patch does not introduce accurate packet lengths into the kernel, which is what is required to give the benefits you describe. But that was never the ATM patches goal. The ATM patch gives accurate rate calculations for ATM links, nothing more. Accurate packet length calculations is apparently the goal of your patch, and I wish you luck with it. Ethernet, VLAN, Tunnels, ... its especially useful for tunnels if you also shape on the underlying device since the qdisc on the tunnel device and the qdisc on the underlying device should ideally be in sync (otherwise no accurate bandwidth reservation is possible). Hmmm - not as far as I am aware. In all those cases the IP layer breaks up the data into MTU sized packets before they get to the scheduler. ATM is the only technology I am known of where setting the MTU to be bigger than the end to end link can support is normal. What have I missed? The hard coded ATM values don't effect this patch btw, they are a user space thing only. Sure, I'm just mentioning it. Is seems to be quite deeply codified in userspace though. We will have to disagree on that. Jesper and I had the discussion. It came down to using a #define, or letting it be defined on the command line. I actually wrote code for both. In the end I decided the command line option was a waste of time. The difference in lines of code is not huge, however. Either you or Jesper pointed to this code in iproute: for (i=0; i256; i++) { unsigned sz = (icell_log); ... rtab[i] = tc_core_usec2tick(100*((double)sz/bps)); which tends to underestimate the transmission time by using the smallest possible size for each cell. This is going to be long, Patrick. I know you don't like long emails, so if you don't have the patience for it stop reading now. The remainder of this email addresses this one point. Firstly, yes you are correct. It will under some circumstances underestimate the number of cells it takes to send a packet. The reason is because the whole aim of the ATM patch was to make maximum use of the ATM link, while at the same time keeping control of scheduling decisions. To keep control of scheduling decisions, we must _never_ overestimate the speed of the link. If we do the ISP will take control of the scheduling. At first sight this seems a minor issue. Its not, because the error can be large. An example of overestimating the link speed would be were one RTAB entry covers both the 2 and 3 cell cases. If we say the IP packet is going to use 2 cells, and in fact it uses 3, then the error is 50%. This is a huge error, and in fact eliminating this error is the whole point of the ATM patch. As an example of its impact, I was trying to make VOIP work over a shared link. If the ISP starts making the scheduling decisions then VOIP packets start being dropped or delayed, rendering VOIP unusable. So in order to use VOIP on the link I have to understate the link capacity by 50%. As it happens, VOIP generates a stream of packets in the 2-3 cell size range, the actual size depending on the codec negotiated by the end points. Jesper in his thesis gives perhaps an more important example what happens if you overestimate the link speed. It turns out in interacts with TCP's flow control badly, slowing down all TCP flows over the link. The reasons are subtle so I won't go into it here. But the end result is if
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
On Thu, 2006-10-19 at 16:38 +0200, Patrick McHardy wrote: I still think this patch shouldn't go in. There's no point in doing the same thing twice, and I haven't heard a compelling argument why it has to be done in a way that only helps qdiscs using rtabs while ignoring statistics and estimators (I even provided a patch to show how to do it without these limitations). As far as I can see one patch changes the way the kernel calculates packet lengths, and the other modifies RTAB. I can not see the significant overlap between the patches that you talk about. As for why haven't got a new patch from me that addresses the doing the same thing twice issue - it is because I can see no such issue. I have asked you repeatedly to explain it further, but you have not done so. As for providing a patch - I believe at the time you called it something you were working on. As far as I know you still are working on it. Besides, even if you did intend me to take it further, I am not particularly interested in the packet length issue as it does not effect the real world performance of any of the qdiscs I use. Besides that: +static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, int pktlen) +{ + int slot = pktlen + rtab-rate.cell_align; + if (slot 0) + slot = 0; Why would it go negative? A negative cell_align doesn't make sense I guess. A negative cell align is possible, and in fact is typical. If slot ended up being less than 0 then the configuration parameters passed to tc would of been wrong - they could not of matched the actual traffic. The error can't be detected in tc, but it can't be allowed to cause the kernel to index off the end of an array either. + slot = rtab-rate.cell_log; + if (slot 255) + return rtab-data[255] + 1; Whats the point of this? Is it just to keep htb giant statistics working? Yes. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
Russell Stuart wrote: On Thu, 2006-10-19 at 16:38 +0200, Patrick McHardy wrote: I still think this patch shouldn't go in. There's no point in doing the same thing twice, and I haven't heard a compelling argument why it has to be done in a way that only helps qdiscs using rtabs while ignoring statistics and estimators (I even provided a patch to show how to do it without these limitations). As far as I can see one patch changes the way the kernel calculates packet lengths, and the other modifies RTAB. I can not see the significant overlap between the patches that you talk about. The implementation may be different, but the intention and the result is the same. I probably would mind less if it wouldn't affect userspace compatibility, but we need to carry this stuff for ever even if we add another implementation that covers all qdiscs. As for why haven't got a new patch from me that addresses the doing the same thing twice issue - it is because I can see no such issue. I have asked you repeatedly to explain it further, but you have not done so. What do I need to explain further? As I stated several times, I would like to see a patch that handles all qdiscs. And it should probably not have hardcoded ATM values, there is other media that behaves similar. As for providing a patch - I believe at the time you called it something you were working on. As far as I know you still are working on it. Besides, even if you did intend me to take it further, I am not particularly interested in the packet length issue as it does not effect the real world performance of any of the qdiscs I use. No, I'm not working on it currently, it was more meant for demonstration purposes. If you're not interested in taking the opinion of people working on the code into account thats your problem. Besides that: +static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, int pktlen) +{ + int slot = pktlen + rtab-rate.cell_align; + if (slot 0) + slot = 0; Why would it go negative? A negative cell_align doesn't make sense I guess. A negative cell align is possible, and in fact is typical. If slot ended up being less than 0 then the configuration parameters passed to tc would of been wrong - they could not of matched the actual traffic. The error can't be detected in tc, but it can't be allowed to cause the kernel to index off the end of an array either. I'm not sure I understand what you're saying here. The transmission time gets _smaller_ by transmitting over ATM? Does this have anything to do with the off-by-one during rate table calculation you or Jesper noticed? + slot = rtab-rate.cell_log; + if (slot 255) + return rtab-data[255] + 1; Whats the point of this? Is it just to keep htb giant statistics working? Yes. TBF and policers already make sure this can never happen, this is what HTB should do as well. If it is also needed for CBQ it is a bugfix and should be done seperately. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
On Mon, 2006-10-23 at 14:39 +0200, Patrick McHardy wrote: The implementation may be different, but the intention and the result is the same. I probably would mind less if it wouldn't affect userspace compatibility, but we need to carry this stuff for ever even if we add another implementation that covers all qdiscs. So where is the overlap? Your patch will make qdiscs that use packet length work with ATM. Ours makes the rest, ie those that use Alexy's RTAB, work with ATM. They would probably both apply with minimal conflicts. You have previously said you have no intention of changing the current RTAB implantation with your STAB patch. As an aside non-work conserving qdisc's that do scheduling are the real targets of ATM patch. The rest are not effected by ATM overly. The only one of those that doesn't use Alexy's RTAB is the one you introduced - HFSC. You are the best person to fix things so HFSC does work with ATM, and that is what I thought you were doing with the STAB patch. What do I need to explain further? As I stated several times, I would like to see a patch that handles all qdiscs. And it should probably not have hardcoded ATM values, there is other media that behaves similar. I am not aware of other media that behaves in a similar way, although I am no expert. What have I missed? The hard coded ATM values don't effect this patch btw, they are a user space thing only. Besides that: +static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, int pktlen) +{ + int slot = pktlen + rtab-rate.cell_align; + if (slot 0) + slot = 0; Why would it go negative? A negative cell_align doesn't make sense I guess. A negative cell align is possible, and in fact is typical. If slot ended up being less than 0 then the configuration parameters passed to tc would of been wrong - they could not of matched the actual traffic. The error can't be detected in tc, but it can't be allowed to cause the kernel to index off the end of an array either. I'm not sure I understand what you're saying here. The transmission time gets _smaller_ by transmitting over ATM? Does this have anything to do with the off-by-one during rate table calculation you or Jesper noticed? There is nothing I would describe as an off-by-one error in the RTAB calculation, so I can't be sure what you are referring to. The packetisation done by ATM does introduce rounding / truncation errors of up to ((1 cell_log) - 1). Negative cell alignments is the easiest way to fix that. + slot = rtab-rate.cell_log; + if (slot 255) + return rtab-data[255] + 1; Whats the point of this? Is it just to keep htb giant statistics working? Yes. TBF and policers already make sure this can never happen, this is what HTB should do as well. If it is also needed for CBQ it is a bugfix and should be done seperately. OK. I will change it. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
jamal wrote: The poor guy has been persistent and has some good ideas and we need to encourage him to stick around. Why dont you help him get the patch in the shape you think is reasonable? I know you are busy elsewhere and your patch has been a while since you last promised. I will try to help as well. I have explained what I would like to see several times and sent a patch to demonstrate it. I'd be happy to help further (I would like to see this go in eventually), but there's nothing I can do besides repeating myself as long as we only get to see the same patch again and again. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
jamal wrote: ACKed-by: Jamal Hadi Salim When Patrick has his patch ready after this goes in we can revisit. NACK. I still think this patch shouldn't go in. There's no point in doing the same thing twice, and I haven't heard a compelling argument why it has to be done in a way that only helps qdiscs using rtabs while ignoring statistics and estimators (I even provided a patch to show how to do it without these limitations). Besides that: +static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, int pktlen) +{ + int slot = pktlen + rtab-rate.cell_align; + if (slot 0) + slot = 0; Why would it go negative? A negative cell_align doesn't make sense I guess. + slot = rtab-rate.cell_log; + if (slot 255) + return rtab-data[255] + 1; Whats the point of this? Is it just to keep htb giant statistics working? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
On Thu, 2006-19-10 at 16:38 +0200, Patrick McHardy wrote: jamal wrote: ACKed-by: Jamal Hadi Salim When Patrick has his patch ready after this goes in we can revisit. NACK. I still think this patch shouldn't go in. There's no point in doing the same thing twice, and I haven't heard a compelling argument why it has to be done in a way that only helps qdiscs using rtabs while ignoring statistics and estimators (I even provided a patch to show how to do it without these limitations). The poor guy has been persistent and has some good ideas and we need to encourage him to stick around. Why dont you help him get the patch in the shape you think is reasonable? I know you are busy elsewhere and your patch has been a while since you last promised. I will try to help as well. Besides that: I will let Russell respond to the critique (As well as the concept of stats and estimator); Russell please try to be brief and to the point (I still have to learn that lesson myself ;-). cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
From: jamal [EMAIL PROTECTED] Date: Tue, 17 Oct 2006 09:07:24 -0400 ACKed-by: Jamal Hadi Salim When Patrick has his patch ready after this goes in we can revisit. If anything it's too late to put this into 2.6.19 so I'll likely therefore toss it into net-2.6.20 whenever I open that tree up. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
On Tue, 2006-17-10 at 09:34 +1000, Russell Stuart wrote: The Linux traffic's control engine inaccurately calculates transmission times for packets sent over ADSL links. For some packet sizes the error rises to over 50%. This occurs because ADSL uses ATM as its link layer transport, and ATM transmits packets in fixed sized 53 byte cells. This changes the kernel rate table lookup, to be able to lookup packet transmission times over all ATM links, including ADSL, with perfect accuracy. The accuracy is dependent on the rate table that is calculated in userspace by iproute2 command tc. A longer presentation of the patch, its rational, what it does and how to use it can be found here: http://www.stuart.id.au/russell/files/tc/tc-atm/ A earlier version of the patch, and a _detailed_ empirical investigation of its effects can be found here: http://www.adsl-optimizer.dk/ Signed-off-by: Jesper Dangaard Brouer [EMAIL PROTECTED] Signed-off-by: Russell Stuart [EMAIL PROTECTED] ACKed-by: Jamal Hadi Salim When Patrick has his patch ready after this goes in we can revisit. cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html