Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-09-04 Thread Russell Stuart
On Wed, 2006-08-09 at 07:33 -0400, jamal wrote:
> > Jamal - unless there are other outstanding issues I have 
> > missed or someone has had second thoughts this means you
> > should be OK with the patch going in.  Can we get it into
> > Dave M's tree now?
> 
> Hi Russell,
> 
> My apologies; at some point i lost track of the discussion because
> things seemed under control. 
> Lets hear from Patrick - I have a lot of faith in him; i just happen to
> know he is occupied elsewhere as we speak (which may explain his
> silence). So please be a little patient with him. In the meantime please
> dont be discouraged in continuing to contribute - sometimes this is how
> things go.
> 
> You are right that if there is no generic solution from Patrick we
> should not hold your patch. My understanding of Patrick's patch is it
> covers for your case; i may have misread that given you and him are
> still discussing the differences. 
> 
> Perhaps one approach is to push your patch in first and then when
> Patrick is ready he can operate on that. Patrick, thoughts?

Jamal, I think its time to assume Patrick has lost interest.
Is it possible to get this in now?


-- 
VGER BF report: U 0.542723
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-08-09 Thread jamal
On Wed, 2006-09-08 at 08:01 +1000, Russell Stuart wrote:
> On Mon, 2006-07-31 at 09:06 +1000, Russell Stuart wrote:
> > It has gone quiet again.  In my mind the one unresolved issue
> > is whether Patrick intended to remove RTAB with his patch.
> > If not, the ATM patch as it stands will have to go in.
> > 
> > Patrick - it would be nice to hear from you.
> 
> It appears Patrick isn't going to reply.  The situation as
> I see it is:
> 
> - Patrick's last post said his STAB patch wasn't going to
>   touch RTAB, so regardless of whether Patrick's STAB
>   patch proceeds or not the ATM patch will be needed to
>   make the current kernel work with ATM.
> 
> - If I read Jesper's last post correctly, he said Patrick
>   acknowledged to him in a private email that RTAB had to 
>   change to accommodate ATM, and in fact called it a bug
>   in RTAB.  Fixing this "bug" is what the kernel part of
>   the current ATM patch does - and it does it in a 
>   backward compatible way.
> 
> - Jamal, you said that this ATM patch should go in if 
>   Patrick didn't come up with a better solution.
> 
> Jamal - unless there are other outstanding issues I have 
> missed or someone has had second thoughts this means you
> should be OK with the patch going in.  Can we get it into
> Dave M's tree now?

Hi Russell,

My apologies; at some point i lost track of the discussion because
things seemed under control. 
Lets hear from Patrick - I have a lot of faith in him; i just happen to
know he is occupied elsewhere as we speak (which may explain his
silence). So please be a little patient with him. In the meantime please
dont be discouraged in continuing to contribute - sometimes this is how
things go.

You are right that if there is no generic solution from Patrick we
should not hold your patch. My understanding of Patrick's patch is it
covers for your case; i may have misread that given you and him are
still discussing the differences. 

Perhaps one approach is to push your patch in first and then when
Patrick is ready he can operate on that. Patrick, thoughts?

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-08-08 Thread Russell Stuart
On Mon, 2006-07-31 at 09:06 +1000, Russell Stuart wrote:
> It has gone quiet again.  In my mind the one unresolved issue
> is whether Patrick intended to remove RTAB with his patch.
> If not, the ATM patch as it stands will have to go in.
> 
> Patrick - it would be nice to hear from you.

It appears Patrick isn't going to reply.  The situation as
I see it is:

- Patrick's last post said his STAB patch wasn't going to
  touch RTAB, so regardless of whether Patrick's STAB
  patch proceeds or not the ATM patch will be needed to
  make the current kernel work with ATM.

- If I read Jesper's last post correctly, he said Patrick
  acknowledged to him in a private email that RTAB had to 
  change to accommodate ATM, and in fact called it a bug
  in RTAB.  Fixing this "bug" is what the kernel part of
  the current ATM patch does - and it does it in a 
  backward compatible way.

- Jamal, you said that this ATM patch should go in if 
  Patrick didn't come up with a better solution.

Jamal - unless there are other outstanding issues I have 
missed or someone has had second thoughts this means you
should be OK with the patch going in.  Can we get it into
Dave M's tree now?

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-07-30 Thread Russell Stuart
On Thu, 2006-07-20 at 14:56 +1000, Russell Stuart wrote:
> On Wed, 2006-07-19 at 16:50 +0200, Patrick McHardy wrote:
> > Please excuse my silence, I was travelling and am still catching up
> > with my mails.
> 
> Sorry.  Had I realised you were busy I would of
> waited.
> 
> > > - As it stands, it doesn't help the qdiscs that use 
> > >   RTAB.  So unless he proposes to remove RTAB entirely 
> > >   the ATM patch as it will still have to go in.
> > 
> > Why? The length calculated by my STABs (or something similar)
> > is used by _all_ qdiscs. Not only for transmission time calculation,
> > but also for statistics and estimators.
> 
> Oh.  I didn't see where it is used for the time 
> calculation in your patch.  Did I miss something,
> or is that the unfinished bit?
> 
> This is possibly my stumbling block.  If you don't remove
> RTAB the ATM patch as stands will be needed.  Your patch
> didn't remove RTAB, and you didn't say it was intended to,
> so I presume it wasn't going to.

It has gone quiet again.  In my mind the one unresolved issue
is whether Patrick intended to remove RTAB with his patch.
If not, the ATM patch as it stands will have to go in.

Patrick - it would be nice to hear from you.



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL (RTAB BUG)

2006-07-20 Thread Alexey Kuznetsov
Hello!

> It shouldn't be.  Any decimal number can be expressed
> as a fraction, eg:

I remember this. :-) I stalled selecting corrects divisors
to fight over/underflows. Not becuase it was difficult,
just because did not see a reason to do this.



> But doing so would get rid of the table implementation 
> and the flexibility it has given us to date.  For that 
> reason I feel uncomfortable with it.
> 
> The engineering decision becomes this - are there any
> other protocols like ATM out there that could justify 
> such a change? 

Is it faster? You say, yes. Is it required? You say, yes.
Is there some protocols, which needs more flexibility? No.

> know a good deal more about them than I do.  What say
> you?

Frankly, I seriously believed that rtabs is a good way to handle ATM. :-)
I seriously believed that you have to do something like:
((packet_size+cell_payload-1)/cell_payload)*cell_size

So, if in reality even this protocol does not justify keeping ratbs, kill
them.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL (RTAB BUG)

2006-07-19 Thread Russell Stuart
On Thu, 2006-07-20 at 01:00 +0400, Alexey Kuznetsov wrote:
> Hello!

So you really do exist?  I thought it was just
rumour.

> Well, if fixed point arithmetics is not a problem.

It shouldn't be.  Any decimal number can be expressed
as a fraction, eg:

  0.00123 = 123/10

Which can be calculated as a multiply and a divide. With
MTU's up to 2048, it should be possible to do this with
99.% accuracy (ie 2048/2^23).

With a bit more work in userspace (ie in tc), it can be
be reduced to a multiply and a shift.

> Plus, remember, the function is not R*size, it is at least
> R*size+addend, to account for link overhead. Plus account for padding
> of small packets. Plus, when policing it should deaccount already added
> link headers, QoS counts only network payload.

Yes, it is flexible - and has served us well up until
now.  It doesn't work well for ATM, but with a small
bit of extra calculation in the kernel it could.
However, it turns out that ATM is a special case.  If 
ATM's cell payload was 58 bytes instead of 48 bytes 
(say), then it would not be possible to produce a RTAB 
that had small errors (eg < 10%) for smallish packet 
sizes (< 290 bytes).  I seem to have trouble 
explaining why in a concise way that people understand, 
so I won't try here.

So when Alan Cox said our ATM patch didn't solve the 
packetisation problem in general, he was right as our
patch just built upon RTAB.  Patrick's STAB proposal 
in general either for that matter, as it is just another 
implementation of RTAB with the same limitations.  The 
only way I can think of to solve it in general is to 
move many more calculations into the kernel - as I 
proposed in a long winded answer to Patrick earlier 
in this thread.

But doing so would get rid of the table implementation 
and the flexibility it has given us to date.  For that 
reason I feel uncomfortable with it.

The engineering decision becomes this - are there any
other protocols like ATM out there that could justify 
such a change?  (In my more cynical moments I think of 
it differently - has/is the world going to make a 
second engineering fuck up on the scale of ATM again?  
How on earth did anyone decide that pushing data 
packets over ATM, as happens in ADSL, was a good 
idea?)  I know of no other such protocols.  But then
I don't have an encyclopedic knowledge of comms
protocols, so that doesn't mean much.  I suspect you
know a good deal more about them than I do.  What say
you?

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-07-19 Thread Russell Stuart
On Wed, 2006-07-19 at 16:50 +0200, Patrick McHardy wrote:
> Please excuse my silence, I was travelling and am still catching up
> with my mails.

Sorry.  Had I realised you were busy I would of
waited.

> > - As it stands, it doesn't help the qdiscs that use 
> >   RTAB.  So unless he proposes to remove RTAB entirely 
> >   the ATM patch as it will still have to go in.
> 
> Why? The length calculated by my STABs (or something similar)
> is used by _all_ qdiscs. Not only for transmission time calculation,
> but also for statistics and estimators.

Oh.  I didn't see where it is used for the time 
calculation in your patch.  Did I miss something,
or is that the unfinished bit?

This is possibly my stumbling block.  If you don't remove
RTAB the ATM patch as stands will be needed.  Your patch
didn't remove RTAB, and you didn't say it was intended to,
so I presume it wasn't going to.

>  If the length calculation
> doesn't fit for ATM, that can be fixed.

Yes of course.  Just to be clear: as far as I am concerned
this never was an issue.

> > - A bit of effort was put into making this current
> >   ATM patch both backwards and forwards compatible.
> >   Patricks patch would work with newer kernels,
> >   obviously.  Older kernels, and in particular the
> >   kernel that Debian is Etch is likely to distribute
> >   would miss out.
> 
> True, but it provides more consistency, and making current
> kernels behave better is more important than old kernels.

I guess provided the new "tc" works with older kernels this
is OK - although a disappoint to me.  Works here being defined
as "works as well as a previous the version of tc does".  For 
me not working would be OK as well provided "tc" issued a 
warning message to the effect that it "needs kernel version 
XXX or above"", but doing that would probably require it to 
look at the kernel version.  Looking at the kernel version 
in tc seems to be frowned upon.

> You seem to have misunderstood my patch. It doesn't need to
> touch RTABs, it just calculates the packet length as seen
> on the wire (whereever it is) and uses that thoughout the
> entire qdisc layer.

No, you have it in reverse - as I said above.  My problem is 
that your patch does not touch RTAB.  Several qdiscs really 
don't care about the length of a packet (other than for 
keeping track of stats) - they just care about how long 
it takes to send.  Off the top of my these are HTB, CBQ 
and TBF.  They use RTAB to make this calculation.  So unless
you replace RTAB with STAB the current ATM patch will still 
be needed.

> > One other point - the optimisation Patrick proposes
> > for STAB (over RTAB) was to make the number of entries
> > variable.  This seems like a good idea.  However there 
> > is no such thing as a free lunch, and if you did 
> > indeed reduce the number of entries to 16 for Ethernet 
> > (as I think Patrick suggested), then each entry would
> > cover 1500/16 = 93 different packet lengths.  Ie,
> > entry 0 would cover packet lengths 0..93, entry 1
> > 94..186, and so on.  A single entry can't be right
> > for all those packet lengths, so again we are back
> > to a average 30% error for typical VOIP length
> > packets.
> 
> My patch doesn't uses fixed sized cells, so it can deal
> with anything, worst case is you use one cell per packet
> size. Optimizing size and lookup speed for ethernet makes
> a lot more sense than optimizing for ADSL.

I was just responding to a point you made earlier, when
you said STAB could only use 16 entries as opposed to the
256 used by RTAB.  I suspect nobody would actually do that 
because of the inaccuracy it creates, so the comparison is
perhaps unfair.  I agree the flexibility of making STAB 
variable length is a good idea, and comes at 0 cost in 
the kernel.

Andy Furniss wrote:
> > Russell Stuart wrote:
> >> The kernel will have to do a shift and a division
> >> for each packet, which I assume is permissible.
> > 
> > 
> > I guess that is for others to decide :-) I think Patrick has a point
> > about sfq/htb drr, Like you I guess, I thought that alot of extra per
> > packet calculations would have got an instant NO.
> 
> Its only done once per packet (currently, it might be interesting to
> override the length for specific classes and their childs, for example
> if you do queueing on eth0 and have an DSL router one hop apart).
> The division is gone in my patch btw.

Unlike the packet length the time calculation can't be
cached in the skb.  Most classes in HTB/CBQ use different
packet transmission rates.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL (RTAB BUG)

2006-07-19 Thread Alexey Kuznetsov
Hello!

> Guess that Alexey wrote these RTAB lookup in a time where array lookups 
> was faster... now we have that memory lookups are the bottleneck.

No, they were slower from the very beginning. If I remember correctly,
there is comment about this somewhere.

I just did not find any simple way to do 32 bit fixed point arithmetics
scaling from bps to Gbps and was lazy to investigate this further,
tables are much simpler and more flexible.


> What about removing the RTAB system entirely?

Well, if fixed point arithmetics is not a problem.

Plus, remember, the function is not R*size, it is at least
R*size+addend, to account for link overhead. Plus account for padding
of small packets. Plus, when policing it should deaccount already added
link headers, QoS counts only network payload.

Alexey
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL (RTAB BUG)

2006-07-19 Thread Jesper Dangaard Brouer


Russell Stuart wrote:


- As it stands, it doesn't help the qdiscs that use 
RTAB.  So unless he proposes to remove RTAB entirely the ATM patch as 
it will still have to go in.


Here is a very important point here:

 The RTAB (rate-table) in the kernel is NOT aligned, this is the ONLY
 reason why we need to patch the kernel.

This why the kernel RTAB patch is still needed even with Patrick's 
excelent STAB patch.  (Maybe the RTAB system should be removed entirely?)


I have discussed this RTAB issue with Patrick (off list) and he denoted 
this RTAB issue as a regular BUG.



This is the problem with the RTAB:
--
The "hash" function for packet size lookups is a simple binary shift
operation.

 rtab[pkt_len>>cell_log] = pkt_xmit_time;

With a cell_log of 3 (the default), this implies that:
pkt_len  0 to  7 goes into entry 0,
and pkt_len  8 to 15 goes into entry 1,
and pkt_len 16 to 23 goes into entry 2,
and pkt_len 24 to 31 goes into entry 3,
and so on...

Current mapping:
entry[0](maps: 0- 7)=xmit_size:0
entry[1](maps: 8-15)=xmit_size:8
entry[2](maps:16-23)=xmit_size:16
entry[3](maps:24-31)=xmit_size:24
entry[4](maps:32-39)=xmit_size:32
entry[5](maps:40-47)=xmit_size:40
entry[6](maps:48-55)=xmit_size:48

When the table is constructed (in tc_core.c) the pkt_xmit_time is 
calculated from the lower boundary.  Meaning that transmitting a 7 byte 
packet "costs" 0 bytes to transmit.  The zero transmit cost is properly 
not a real-world problem as the IP header bounds the minimum packet size.


for (i=0; i<256; i++) {
  unsigned sz = (i<>cell_log]

and adjusting the xmit_size accordingly.
Giving the table:

 entry[0](maps:1-8)=xmit_size:8
 entry[1](maps:9-16)=xmit_size:16
 entry[2](maps:15-24)=xmit_size:24
 entry[3](maps:24-32)=xmit_size:32
 entry[4](maps:33-40)=xmit_size:40
 entry[5](maps:41-48)=xmit_size:48
 entry[6](maps:47-56)=xmit_size:56

The xmit_size is done like this:

for (i=0; i<256; i++) {
  unsigned sz = ((i+1)<>cell_log]


Another fix for the RTAB:
-
Remove the RTAB array lookups and calculate the pkt_xmit_time instead.

Guess that Alexey wrote these RTAB lookup in a time where array lookups 
was faster... now we have that memory lookups are the bottleneck.


What about removing the RTAB system entirely?


Cheers,
  Jesper Brouer

--
---
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
---
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-07-19 Thread Patrick McHardy
Andy Furniss wrote:
> Russell Stuart wrote:
> 
>> - As it stands, it doesn't help the qdiscs that use   RTAB.  So unless
>> he proposes to remove RTAB entirely   the ATM patch as it will still
>> have to go in.
> 
> 
> Hmm - I was just looking at the kernel changes to htb. The only
> difference is the len - I am blindly assuming that it does/will return
> the link lengths properly for atm.
> 
> So for atm, qdisc_tx_len(skb) will always return lengths that are
> multiples of 53.
> 
> If nothing else were done we would suffer innacuarcy from the cell_log
> just like eth.
> 
> But no other kernel hack would be needed to do it perfectly - rather
> like we (who patch for atm already) just fill the tc generated rate
> table with what we like, that would be an option.

That is how it should work. If the calculation doesn't fit, lets fix it.

>> The kernel will have to do a shift and a division
>> for each packet, which I assume is permissible.
> 
> 
> I guess that is for others to decide :-) I think Patrick has a point
> about sfq/htb drr, Like you I guess, I thought that alot of extra per
> packet calculations would have got an instant NO.

Its only done once per packet (currently, it might be interesting to
override the length for specific classes and their childs, for example
if you do queueing on eth0 and have an DSL router one hop apart).
The division is gone in my patch btw.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-07-19 Thread Patrick McHardy
Russell Stuart wrote:
> On Tue, 2006-07-18 at 22:46 +0100, Andy Furniss wrote: 
> 
>>FWIW I think it may be possible to do it Patricks' way, as if I read it 
>>properly he will end up with the ATM cell train length which gets 
>>shifted by cell_log and looked up as before. The ATM length will be in 
>>steps of 53 so with cell_log 3 or 4 I think there will be no collisions 
>>- so special rate tables for ATM can still be made perfect.

Please excuse my silence, I was travelling and am still catching up
with my mails.

> Patrick is proposing that the packet lengths be sent to 
> the kernel in a similar way to how transmission times (ie
> RTAB) is sent now.  I agree that is how things should be 
> done - but it doesn't have much to do with the ATM patch, 
> other than he has allowed for ATM in the way he does the 
> calculation in the kernel [1].
> 
> In particular:
> 
> - As it stands, it doesn't help the qdiscs that use 
>   RTAB.  So unless he proposes to remove RTAB entirely 
>   the ATM patch as it will still have to go in.

Why? The length calculated by my STABs (or something similar)
is used by _all_ qdiscs. Not only for transmission time calculation,
but also for statistics and estimators. If the length calculation
doesn't fit for ATM, that can be fixed.

> - A bit of effort was put into making this current
>   ATM patch both backwards and forwards compatible.
>   Patricks patch would work with newer kernels,
>   obviously.  Older kernels, and in particular the
>   kernel that Debian is Etch is likely to distribute
>   would miss out.

True, but it provides more consistency, and making current
kernels behave better is more important than old kernels.

> If Patrick did intend remove RTAB entirely then he
> needs to add a fair bit more into his patch.  Since 
> RTAB is just STAB scaled, its certainly possible.
> The kernel will have to do a shift and a division
> for each packet, which I assume is permissible.

You seem to have misunderstood my patch. It doesn't need to
touch RTABs, it just calculates the packet length as seen
on the wire (whereever it is) and uses that thoughout the
entire qdisc layer.

>>As you say, I think mpu should be added aswell - so eth/other can benefit.
> 
> 
> Not really.  The MPU is reflected in the STAB table,
> just as it is for RTAB.
> 
> One other point - the optimisation Patrick proposes
> for STAB (over RTAB) was to make the number of entries
> variable.  This seems like a good idea.  However there 
> is no such thing as a free lunch, and if you did 
> indeed reduce the number of entries to 16 for Ethernet 
> (as I think Patrick suggested), then each entry would
> cover 1500/16 = 93 different packet lengths.  Ie,
> entry 0 would cover packet lengths 0..93, entry 1
> 94..186, and so on.  A single entry can't be right
> for all those packet lengths, so again we are back
> to a average 30% error for typical VOIP length
> packets.

My patch doesn't uses fixed sized cells, so it can deal
with anything, worst case is you use one cell per packet
size. Optimizing size and lookup speed for ethernet makes
a lot more sense than optimizing for ADSL.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-07-19 Thread Andy Furniss

Russell Stuart wrote:
On Tue, 2006-07-18 at 22:46 +0100, Andy Furniss wrote: 

FWIW I think it may be possible to do it Patricks' way, as if I read it 
properly he will end up with the ATM cell train length which gets 
shifted by cell_log and looked up as before. The ATM length will be in 
steps of 53 so with cell_log 3 or 4 I think there will be no collisions 
- so special rate tables for ATM can still be made perfect.



Patrick is proposing that the packet lengths be sent to 
the kernel in a similar way to how transmission times (ie
RTAB) is sent now.  I agree that is how things should be 
done - but it doesn't have much to do with the ATM patch, 
other than he has allowed for ATM in the way he does the 
calculation in the kernel [1].


In particular:

- As it stands, it doesn't help the qdiscs that use 
  RTAB.  So unless he proposes to remove RTAB entirely 
  the ATM patch as it will still have to go in.


Hmm - I was just looking at the kernel changes to htb. The only 
difference is the len - I am blindly assuming that it does/will return 
the link lengths properly for atm.


So for atm, qdisc_tx_len(skb) will always return lengths that are 
multiples of 53.


If nothing else were done we would suffer innacuarcy from the cell_log 
just like eth.


But no other kernel hack would be needed to do it perfectly - rather 
like we (who patch for atm already) just fill the tc generated rate 
table with what we like, that would be an option.




- A bit of effort was put into making this current
  ATM patch both backwards and forwards compatible.
  Patricks patch would work with newer kernels,
  obviously.  Older kernels, and in particular the
  kernel that Debian is Etch is likely to distribute
  would miss out.

If Patrick did intend remove RTAB entirely then he
needs to add a fair bit more into his patch.  Since 
RTAB is just STAB scaled, its certainly possible.

The kernel will have to do a shift and a division
for each packet, which I assume is permissible.


I guess that is for others to decide :-) I think Patrick has a point 
about sfq/htb drr, Like you I guess, I thought that alot of extra per 
packet calculations would have got an instant NO.





As you say, I think mpu should be added aswell - so eth/other can benefit.



Not really.  The MPU is reflected in the STAB table,
just as it is for RTAB.


OK, I was thinking of what Jamal said about helping others, so 
everything TC should be capable of accepting mpu and overhead with these 
patches - or is more work needed?


It will be good to be able to say

tc ... police rate 500kbit mpu 60 overhead 24 ... for eth.
(Assuming eth mpu/overhead are really 46/38 - p in mpu is payload AIUI 
so 60 and 24 come from allowing for skb->len being IP+14)


or for ATM + pppoa something like

tc ... police rate 500kbit overhead 10 atm ...

In the case of eth someone already added mpu/overhead for HTB and it 
doesn't need any extra per packet calcs. I guess this way it would.




One other point - the optimisation Patrick proposes
for STAB (over RTAB) was to make the number of entries
variable.  This seems like a good idea.  However there 
is no such thing as a free lunch, and if you did 
indeed reduce the number of entries to 16 for Ethernet 
(as I think Patrick suggested), then each entry would

cover 1500/16 = 93 different packet lengths.  Ie,
entry 0 would cover packet lengths 0..93, entry 1
94..186, and so on.  A single entry can't be right
for all those packet lengths, so again we are back
to a average 30% error for typical VOIP length
packets.


I agree less accuracy will not be nice. But as an option it could be the 
only way you can do 1/10Gig + jumbo frames.


Andy.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-07-18 Thread Russell Stuart
On Tue, 2006-07-18 at 22:46 +0100, Andy Furniss wrote: 
> FWIW I think it may be possible to do it Patricks' way, as if I read it 
> properly he will end up with the ATM cell train length which gets 
> shifted by cell_log and looked up as before. The ATM length will be in 
> steps of 53 so with cell_log 3 or 4 I think there will be no collisions 
> - so special rate tables for ATM can still be made perfect.

Patrick is proposing that the packet lengths be sent to 
the kernel in a similar way to how transmission times (ie
RTAB) is sent now.  I agree that is how things should be 
done - but it doesn't have much to do with the ATM patch, 
other than he has allowed for ATM in the way he does the 
calculation in the kernel [1].

In particular:

- As it stands, it doesn't help the qdiscs that use 
  RTAB.  So unless he proposes to remove RTAB entirely 
  the ATM patch as it will still have to go in.

- A bit of effort was put into making this current
  ATM patch both backwards and forwards compatible.
  Patricks patch would work with newer kernels,
  obviously.  Older kernels, and in particular the
  kernel that Debian is Etch is likely to distribute
  would miss out.

If Patrick did intend remove RTAB entirely then he
needs to add a fair bit more into his patch.  Since 
RTAB is just STAB scaled, its certainly possible.
The kernel will have to do a shift and a division
for each packet, which I assume is permissible.

> As you say, I think mpu should be added aswell - so eth/other can benefit.

Not really.  The MPU is reflected in the STAB table,
just as it is for RTAB.

One other point - the optimisation Patrick proposes
for STAB (over RTAB) was to make the number of entries
variable.  This seems like a good idea.  However there 
is no such thing as a free lunch, and if you did 
indeed reduce the number of entries to 16 for Ethernet 
(as I think Patrick suggested), then each entry would
cover 1500/16 = 93 different packet lengths.  Ie,
entry 0 would cover packet lengths 0..93, entry 1
94..186, and so on.  A single entry can't be right
for all those packet lengths, so again we are back
to a average 30% error for typical VOIP length
packets.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-07-18 Thread Andy Furniss

Russell Stuart wrote:

On Sat, 2006-06-24 at 10:13 -0400, jamal wrote:


And yes, I was arguing that the tc scheme you describe would not be so
bad either if the cost of making a generic change is expensive.





Patrick seems to have a simple way to compensate generically for link
layer fragmentation, so i will not argue the practically; hopefully that
settles it? ;->



Things seem to have died down.  Patrick's patch seemed 
unrelated to ATM to me.  I did put up another suggestion, 
but I don't think anybody was too impressed with the 
idea.  So that leave the current ATM patch as the only 
one we have on the table that addresses the ATM issue.


FWIW I think it may be possible to do it Patricks' way, as if I read it 
properly he will end up with the ATM cell train length which gets 
shifted by cell_log and looked up as before. The ATM length will be in 
steps of 53 so with cell_log 3 or 4 I think there will be no collisions 
- so special rate tables for ATM can still be made perfect.


As you say, I think mpu should be added aswell - so eth/other can benefit.


Andy.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-07-18 Thread jamal
Russell,
I apologize i havent followed the latest discussion to the detail.
My understanding of Patricks work was it solved the ATM problem as well.
I think he is busy somewhere - lets give him an opportunity to respond
and i will try to catchup with the thread as well.

cheers,
jamal

On Tue, 2006-18-07 at 12:06 +1000, Russell Stuart wrote:
> On Sat, 2006-06-24 at 10:13 -0400, jamal wrote:
> > And yes, I was arguing that the tc scheme you describe would not be so
> > bad either if the cost of making a generic change is expensive.
> 
> > Patrick seems to have a simple way to compensate generically for link
> > layer fragmentation, so i will not argue the practically; hopefully that
> > settles it? ;->
> 
> Things seem to have died down.  Patrick's patch seemed 
> unrelated to ATM to me.  I did put up another suggestion, 
> but I don't think anybody was too impressed with the 
> idea.  So that leave the current ATM patch as the only 
> one we have on the table that addresses the ATM issue.
> 
> Since you don't think it is "too bad", can we proceed 
> with it?
> 
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-07-17 Thread Russell Stuart
On Sat, 2006-06-24 at 10:13 -0400, jamal wrote:
> And yes, I was arguing that the tc scheme you describe would not be so
> bad either if the cost of making a generic change is expensive.

> Patrick seems to have a simple way to compensate generically for link
> layer fragmentation, so i will not argue the practically; hopefully that
> settles it? ;->

Things seem to have died down.  Patrick's patch seemed 
unrelated to ATM to me.  I did put up another suggestion, 
but I don't think anybody was too impressed with the 
idea.  So that leave the current ATM patch as the only 
one we have on the table that addresses the ATM issue.

Since you don't think it is "too bad", can we proceed 
with it?

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-07-10 Thread Russell Stuart
On Fri, 2006-07-07 at 10:00 +0200, Patrick McHardy wrote:
> Russell Stuart wrote:
> > Unfortunately you do things in the wrong order for ATM.
> > See: http://mailman.ds9a.nl/pipermail/lartc/2006q1/018314.html
> > for an overview of the problem, and then the attached email for
> > a detailed description of how the current patch addresses it.
> > It is a trivial fix.
> 
> Actually that was the part I didn't understand, you keep talking
> (also in that comment in tc_core.c) about an "unknown overhead".
> What is that and why would it be unknown? The mail you attached
> is quite long, is there an simple example that shows what you
> mean?

The "unknown overhead" is just the overhead passed to tc
using the "tc ... overhead xxx" option.  It is probably
what you intended to put into your addend attribute.

It is "unknown" because the kernel currently doesn't use
it.  It is passed in the tc_ratespec, but is ignored by
the kernel as are most fields in there.

The easy way to fix the "ATM" problem described in the big
comment is simply to add the "overhead" to the packet 
length before doing the RTAB lookup.  (Identical comments 
apply to STAB).  If you don't accept this or understand
why, then go read the "long emails" which attempt to
explain it in detail.  Jesper's initial version of the
patch did just that, BTW.

However if you do that then you have to adjust RTAB for
all cases (not just ATM) to reflect that the kernel is 
now adding the overhead.  Thus the RTAB tc sends to the 
kernel now changes for different kernel versions, making 
modern versions of tc incompatible with older kernels, 
and visa versa.  I didn't consider that acceptable.

My solution to this to give the kernel the old format
RTAB (ie the one that assumed the kernel didn't add the
overhead) and a small adjustment.  This small adjustment 
is called cell_align in the ATM patch.  You do the same 
thing with cell_align as the previous solution did with 
the overhead - ie add it in just before looking up RTAB.  
This is in effect all the kernel part of the ATM patch
does - make the kernel accept the cell_align option,
and add it to skb->len before looking up RTAB.

The difference between cell_align and overhead is that
cell_align is always 0 when there is no packetisation,
and even when non zero it is small (less than 1< > However, now you lot have made me go away and think, I have
> > another idea on how to attack this.  Perhaps it will be
> > more palatable to you.  It would replace RTAB and STAB with
> > a 28 byte structure for most protocol stacks - well all I can
> > think of off the top of my head, anyway.  RTAB would have to
> > remain for backwards compatibility, of course.
> 
> Can you describe in more detail?

OK, but first I want to make the point that the only
reason I suggest this is to get some sort of ATM
patch into the kernel, as the current patch on the
table is having a rough time.

Alan Cox made the point earlier (if I understood him
correctly) that this tabling lookup probably isn't
a big win on modern CPU's - we may be better off
moving it all into the kernel.  Thinking about this,
I tried to come up with a way of describing the
mapping between skb->len and the on the wire packet
length for every protocol I know.  This is what I
came up with.

Assume we have a packet length L, which is to be
transported by some protocol.  For now we consider
one protocol only, ie: TCP, PPP, ATM, Ethernet or
whatever.   I will generalise it to multiple protocols
later.  I think a generalised transformation can be 
made using using 5 numbers which are applied in this
order:

  Overhead - A fixed overhead that is added to L.

  Mpu  - Minimum packet size.  If the result of
 (Overhead+L) is smaller that this, then 
 the new result becomes this size.

  Round- The result is then rounded up to this
 many bytes.  For protocols that always
 transmit single bytes this figure would be
 1.  If there were some protocol that
 transmitted data as 4 byte chunks then this
 would be 4.  For ATM it is 48.

  CellPay  - If the packet is broken down into smaller
 packets when sent, then this is the amount
 of data that will fit into each chunk.

  CallOver - This is the additional overhead each cell
 carries.

The idea is the kernel would do this calculation on the
fly for each packet.  If you represent this set of number 
numbers as a comma separated list in the order they were 
presented above, then here are some examples:

  IP:   20
  Ethernet: 18,64
  PPP:  2
  ATM:  0,0,48,48,5

It may be that 5 numbers are a overkill.  It is for all
protocols I am aware of - for those you could get away
with 4.  But I am no expert.

The next step is to generalise for many protocols.  As
the protocols are stacked the length output by one 
protocol becoming the input length for the downstream 
one.  So we just need to apply the same tran

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-07-07 Thread Patrick McHardy
Russell Stuart wrote:
> Unfortunately you do things in the wrong order for ATM.
> See: http://mailman.ds9a.nl/pipermail/lartc/2006q1/018314.html
> for an overview of the problem, and then the attached email for
> a detailed description of how the current patch addresses it.
> It is a trivial fix.

Actually that was the part I didn't understand, you keep talking
(also in that comment in tc_core.c) about an "unknown overhead".
What is that and why would it be unknown? The mail you attached
is quite long, is there an simple example that shows what you
mean?

> As I said earlier, RTAB and STAB contain the same numbers,
> just scaled differently.  The ATM patch stuffed around with
> RTAB.  With your patch in place it will have to do the same 
> exactly the same thing with STAB - because RTAB and STAB
> carry the same data.  So to me the two patches seem
> orthogonal.

Yes. As I said in the beginning of this thread, we could still
do the RTAB thing for qdiscs that support it. But since the
visible size should be the same for all qdiscs (including child
ones) we need to do the initial calculation anyway, so I don't
see any purpose in still adjusting the rate tables and using
skb->len instead of using the adjusted size.

> One observation is the size optimisation you applied to STAB, 
> making it variable length, could also be applied to RTAB.  
> In fact it should be.  Then they would be identical, apart 
> from the scaling.  Even the lookup operation (performed in
> qdisc_init_len in your patch) would be identical.

We can do that. I'm not attached to the size tables :)

> However, now you lot have made me go away and think, I have
> another idea on how to attack this.  Perhaps it will be
> more palatable to you.  It would replace RTAB and STAB with
> a 28 byte structure for most protocol stacks - well all I can
> think of off the top of my head, anyway.  RTAB would have to
> remain for backwards compatibility, of course.

Can you describe in more detail?
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-07-05 Thread Russell Stuart
On Tue, 2006-07-04 at 15:29 +0200, Patrick McHardy wrote:
> Unfortunately I still didn't got to cleaning them up, so I'm sending
> them in their preliminary state. Its not much that is missing, but
> the netem usage of skb->cb needs to be integrated better, I failed
> to move it to the qdisc_skb_cb so far because of circular includes.

Cleanups aside, architecturally the bulk of your patch 
looks like a no-brainier to me.  The calculation of
packet length should be in one place.  Caching it in
skb->cb was a nice touch.

> But nothing unfixable. I'm mostly interested if the current size-tables
> can express what you need for ATM, I wasn't able to understand the
> big comment in tc_core.c in your patch.

Unfortunately you do things in the wrong order for ATM.
See: http://mailman.ds9a.nl/pipermail/lartc/2006q1/018314.html
for an overview of the problem, and then the attached email for
a detailed description of how the current patch addresses it.
It is a trivial fix.

As I said earlier, RTAB and STAB contain the same numbers,
just scaled differently.  The ATM patch stuffed around with
RTAB.  With your patch in place it will have to do the same 
exactly the same thing with STAB - because RTAB and STAB
carry the same data.  So to me the two patches seem
orthogonal.

One observation is the size optimisation you applied to STAB, 
making it variable length, could also be applied to RTAB.  
In fact it should be.  Then they would be identical, apart 
from the scaling.  Even the lookup operation (performed in
qdisc_init_len in your patch) would be identical.

However, now you lot have made me go away and think, I have
another idea on how to attack this.  Perhaps it will be
more palatable to you.  It would replace RTAB and STAB with
a 28 byte structure for most protocol stacks - well all I can
think of off the top of my head, anyway.  RTAB would have to
remain for backwards compatibility, of course.

--- Begin Message ---
Explanation of rate calculation code.

tc_core.c lines 53..61, tc_align_to_cells():

  Does the same thing as you function of the same name.
  Code is intended to be semantically equivalent.


tc_core.c lines 145..160, in tc_calc_ratespec:

  There are two material differences to your code.  Both
  are in this line:

int sz = ((i+1)len/8]:  1  1  1  1  1  1  1  1  1 x2  2  2

Each column is for a particular packet length.  The rows
show the packet length as seen by a layer of the protocol.  
The top row, "ATM Payload", contains packet length seen 
by the ATM layer - ie the ATM payload.  It includes all 
overheads bar cell overhead and padding.  It is always 
equal to: skb->len + overhead.

The second row, "skb->len" contains the packet seen by 
the kernel.  It is just the figure in the first line
less the "overhead" passed to tc.

The third line shows how "tc" will calculate the rate 
table entry for the packet length.  It contains the
number of ATM cells "tc" thinks the packet will occupy.
I am assuming a cell_log of 3 here, meaning each rate
table entry is used for 8 different packet lengths.  
In an unpatched kernel a single rate table entry may 
apply to packets which use different numbers of ATM 
cells.  Since the rate table entry is a single number 
it cant be right for all packet lengths when that 
happens.  If faced with this ambiguity tc uses the number
of cells occupied by the longest packet the rate table 
entry applies to. This means the rate table will be wrong 
for the shorter packets, and this is highlighted in the
diagram an "x".

Thus we see from the table that for a 39 byte packet tc
thinks it will take 1 cell to send.  But for a 48 byte
packet the table says it will take "x2" packets.  The "x"
means it will really take 1 packet, but because this rate
table entry is also used by 49..55 byte packets tc used
the bigger value for that rate table entry.

We can fix the errant "x2" entry by sliding the rate
table along so it sits under the smallest packet length
it could apply to (in this case 49).  To achieve that
the rate table must be slide one byte to right.  This 
can be achieved by adding an "alignment" figure of -1 
to the packet length before looking up the rate table.
Here is the same diagram as above, but with two 
additional lines added showing the effect of adding the
alignment:

Overhead 0
--
ATM Payload:  39 40 41 42 43 44 45 46 47 48 49 50
skb->len: 39 40 41 42 43 44 45 46 47 48 49 50
rtab[skb->len/8]:  1  1  1  1  1  1  1  1  1 x2  2  2
skb->len-1:   38 39 40 41 42 43 44 45 46 47 48 49
rtab[(skb->len-1)/8]:  1  1  1  1  1  1  1  1  1  1  2  2

The rows "skb->len-1" and "rtab[(skb->len-1)/8]" are
copies of the rows above that have been moved slightly.
They have to be copies, as the rtab computed by tc hasn't
changed, so for example 47 must still yield 1 cell, and
48 must still yield 2 cells.

Below is the same diagram repeated for overheads 1 .. 9.
The are all pre

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-07-04 Thread Patrick McHardy
jamal wrote:
> On Tue, 2006-04-07 at 15:29 +0200, Patrick McHardy wrote:
> 
>>Russell Stuart wrote:
> 
> [..]
> 
>>>Without seeing your actual proposal it is difficult to
>>>judge whether this is a reasonable trade-off or not.
>>>Hopefully we will see your code soon.  Do you have any
>>>idea when?
>>
>>Unfortunately I still didn't got to cleaning them up, so I'm sending
>>them in their preliminary state. Its not much that is missing, but
>>the netem usage of skb->cb needs to be integrated better, I failed
>>to move it to the qdisc_skb_cb so far because of circular includes.
>>But nothing unfixable. I'm mostly interested if the current size-tables
>>can express what you need for ATM, I wasn't able to understand the
>>big comment in tc_core.c in your patch.
>>
> 
> 
> Looks good from within the range of "change within reason" of addressed
> problem. The cb on the qdisc seems only usable for netem, correct?

Yes, it has the same limitations as current netem cb usage. Really
makeing it useable for all qdiscs would require reserving a few bytes
for every level, so far that isn't necessary and I would prefer to
just add a time_to_send field for netem. The problem with this is
that it currently requires sch_generic.h and pkt_sched.h to include
one another, so I did the qdisc_skb_cb() thing to at least get it to
compile for now.

> Also while not unreasonable, i wasnt sure how qdisc_enqueue_root()
> fit in the grand scheme of things for this change (it seemed out of
> place).

Its there as a spot to do the initial time calculations and store them
in the cb. I didn't want to put this in net/core/dev.c.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-07-04 Thread jamal
On Tue, 2006-04-07 at 15:29 +0200, Patrick McHardy wrote:
> Russell Stuart wrote:
[..]
> > Without seeing your actual proposal it is difficult to
> > judge whether this is a reasonable trade-off or not.
> > Hopefully we will see your code soon.  Do you have any
> > idea when?
> 
> Unfortunately I still didn't got to cleaning them up, so I'm sending
> them in their preliminary state. Its not much that is missing, but
> the netem usage of skb->cb needs to be integrated better, I failed
> to move it to the qdisc_skb_cb so far because of circular includes.
> But nothing unfixable. I'm mostly interested if the current size-tables
> can express what you need for ATM, I wasn't able to understand the
> big comment in tc_core.c in your patch.
> 

Looks good from within the range of "change within reason" of addressed
problem. The cb on the qdisc seems only usable for netem, correct?
Also while not unreasonable, i wasnt sure how qdisc_enqueue_root()
fit in the grand scheme of things for this change (it seemed out of
place).

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-07-04 Thread Patrick McHardy
Russell Stuart wrote:
> On 26/06/2006 9:10 PM, Patrick McHardy wrote:
> 
>>> 5.  We still did have to modify the kernel for ATM.  That was
>>>because of its rather unusual characteristics.  However,
>>>it you look at the size of modifications made to the kernel
>>>verses the size made to the user space tool, (37 lines
>>>versus 303 lines,) the bulk of the work was does in user
>>>space.
>>
>>
>> I'm sorry, but arguing that a limited special case solution is
>> better because it needs slightly less code is just not reasonable.
> 
> 
> Without seeing your actual proposal it is difficult to
> judge whether this is a reasonable trade-off or not.
> Hopefully we will see your code soon.  Do you have any
> idea when?

Unfortunately I still didn't got to cleaning them up, so I'm sending
them in their preliminary state. Its not much that is missing, but
the netem usage of skb->cb needs to be integrated better, I failed
to move it to the qdisc_skb_cb so far because of circular includes.
But nothing unfixable. I'm mostly interested if the current size-tables
can express what you need for ATM, I wasn't able to understand the
big comment in tc_core.c in your patch.

[NET_SCHED]: Add accessor function for packet length for qdiscs

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

---
commit 2a6508576111d82246ee018edbcc4b0f0d18acad
tree 8be27ab6040ea90ed11728763e5b8fcf9e221b67
parent 31304c909e6945b005af62cd55a582e9c010a0b4
author Patrick McHardy <[EMAIL PROTECTED]> Tue, 04 Jul 2006 15:03:01 +0200
committer Patrick McHardy <[EMAIL PROTECTED]> Tue, 04 Jul 2006 15:03:01 +0200

 include/net/sch_generic.h |9 +++--
 net/sched/sch_atm.c   |4 ++--
 net/sched/sch_cbq.c   |   12 ++--
 net/sched/sch_dsmark.c|2 +-
 net/sched/sch_fifo.c  |2 +-
 net/sched/sch_gred.c  |   12 ++--
 net/sched/sch_hfsc.c  |8 
 net/sched/sch_htb.c   |8 
 net/sched/sch_netem.c |6 +++---
 net/sched/sch_prio.c  |2 +-
 net/sched/sch_red.c   |2 +-
 net/sched/sch_sfq.c   |   14 +++---
 net/sched/sch_tbf.c   |6 +++---
 net/sched/sch_teql.c  |4 ++--
 14 files changed, 48 insertions(+), 43 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index b0e9108..75d7a55 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -184,12 +184,17 @@ tcf_destroy(struct tcf_proto *tp)
kfree(tp);
 }
 
+static inline unsigned int qdisc_tx_len(struct sk_buff *skb)
+{
+   return skb->len;
+}
+
 static inline int __qdisc_enqueue_tail(struct sk_buff *skb, struct Qdisc *sch,
   struct sk_buff_head *list)
 {
__skb_queue_tail(list, skb);
-   sch->qstats.backlog += skb->len;
-   sch->bstats.bytes += skb->len;
+   sch->qstats.backlog += qdisc_tx_len(skb);
+   sch->bstats.bytes += qdisc_tx_len(skb);
sch->bstats.packets++;
 
return NET_XMIT_SUCCESS;
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index dbf44da..4df305e 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -453,9 +453,9 @@ #endif
if (flow) flow->qstats.drops++;
return ret;
}
-   sch->bstats.bytes += skb->len;
+   sch->bstats.bytes += qdisc_tx_len(skb);
sch->bstats.packets++;
-   flow->bstats.bytes += skb->len;
+   flow->bstats.bytes += qdisc_tx_len(skb);
flow->bstats.packets++;
/*
 * Okay, this may seem weird. We pretend we've dropped the packet if
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 80b7f6a..5d705e2 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -404,7 +404,7 @@ static int
 cbq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
struct cbq_sched_data *q = qdisc_priv(sch);
-   int len = skb->len;
+   int len = qdisc_tx_len(skb);
int ret;
struct cbq_class *cl = cbq_classify(skb, sch, &ret);
 
@@ -688,7 +688,7 @@ #ifdef CONFIG_NET_CLS_POLICE
 
 static int cbq_reshape_fail(struct sk_buff *skb, struct Qdisc *child)
 {
-   int len = skb->len;
+   int len = qdisc_tx_len(skb);
struct Qdisc *sch = child->__parent;
struct cbq_sched_data *q = qdisc_priv(sch);
struct cbq_class *cl = q->rx_class;
@@ -915,7 +915,7 @@ cbq_dequeue_prio(struct Qdisc *sch, int 
if (skb == NULL)
goto skip_class;
 
-   cl->deficit -= skb->len;
+   cl->deficit -= qdisc_tx_len(skb);
q->tx_class = cl;
q->tx_borrowed = borrow;
if (borrow != cl) {
@@ -923,11 +923,11 @@ #ifndef CBQ_XSTATS_BORROWS_BYTES
borrow->xstats.borrows++;
cl->xstats.borrows++;
 #else
-   borrow->xstats.borrows += skb->len;
-  

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-07-02 Thread jamal
On Sun, 2006-02-07 at 06:23 +0200, Patrick McHardy wrote:
[..]
> > - Any message to user space may be lost whether it has ifi_change or
> > not. You need some way to figure out a message was lost and declare your
> > state may be invalid. The -ENOBUFs is one way to discover stale state.
> 
> Thats not the point. ifi_change doesn't inform you about updates, it
> informs you about specific changes. 

yes, 

> And you can't determine those in
> the case of lost notifications without keeping the old state yourself.

nod,

> It may save you some time _comparing_ (not searching) in case it wasn't
> lost, 

This is one of the points of where it has value.

Take a hypothetical example of when you have a gazillion events/sec
happening, this is an exageration in case of ifi_change, just assume it
so for illustration purposes. Assume no messages are lost. Assume you
are only interested in the netdevice becoming  promisc or un-promisc.
In such a case, you can quickly check if among the vector of events
that this just happened or not.

> but it only contains a very limited set of information and nothing
> saves you from searching the entry a message refers to based on the
> primary keys (which unfortunately aren't even defined for every message
> type, which means they're not even clearly defined inside the kernel.
> routing rules are one example). 

In some cases it does provide you with details - example in the case of
booleans where it will tell you something changed from 0->1 or 1->0.
Actually the promisc case i gave is a bad example because it falls under
such a category although i hope it still made the point i was trying to
make. 

But yes in general, it is just supposed to provide you with info of
"dude, you got a message from X and Y" - which is more useful than the
select() "dude, you got a message". You have then to go and read that
message if you are interested or if you are busy you can go later.

> Its a hack for some specific usage case,
> and since you're defending this so hard, I'm naturally wondering if
> you introduced it?
> 

No, i didnt - it has been there for years; it may/must have been Alexey
who introduced it.

The reason i am defending it is i find it useful and it is a classical
approach in event management; i have seen it used in a lot of middleware
as an optimization. When you have a vector of events that can be
reported, then as optimization you dont waste resources by sending
details about the event in case nobody is interested in hearing about
the details or is ready to process them.

BTW,theres a consistency bug/hack at the moment in iproute2 on its
usage.

> > - by looking at ifi_change i can tell what changed from about 10 
> > other things reflected in the flags. If you get an ENOBUFS in this case
> > (or any other), it means your state is not consistent and to have
> > reliable info you need to query the kernel.
> 
> The flags only reflect a single piece of information, one that userspace
> needs to be able to reconstruct anyway. Once again the point: netlink
> notifications are _update_ messages, not state change messages. This is
> a fundamental difference, misleading userspace (see tc monitor) to
> believing it is something different for very limited usage cases does
> nothing good. I hereby state my believe that 99% of netlink users
> are broken.
> 

yes, it would mislead users who dont use it appropriately such as ip mon
but that doesnt make it less valuable. I thought ip mon would bitch if
it received enobufs? 

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-07-01 Thread Patrick McHardy
Sorry, missed your reply so far.

>>Then what is the intent, it doesn't carry any other information?
> 
> 
> Generally it is a filter to what the ifi_flags reflect.
>>From an event architecture scalability perspective, it is useful to be
> able to look at what changed without having to query the source. That is
> what ifi_change provides.  So it is not "Lazy userspace" rather it is an
> effective optimization.
> You keep less state and you have less message exchanges between user and
> kernel.

It might have been, but the situation is a bit more complicated today,
and special case solutions (aka hacks) don't work very well for the
huge diversity of interest in update messages. Besides that, it is
flawed.

>>It includes information that are not available any other way from
>>the kernel, yet the information is not transmitted reliably. How
>>could a program that relies on this possibly work reliable?
> 
> 
> Ok, so first lets separate them as two different issues:
> 
> - Any message to user space may be lost whether it has ifi_change or
> not. You need some way to figure out a message was lost and declare your
> state may be invalid. The -ENOBUFs is one way to discover stale state.

Thats not the point. ifi_change doesn't inform you about updates, it
informs you about specific changes. And you can't determine those in
the case of lost notifications without keeping the old state yourself.
It may save you some time _comparing_ (not searching) in case it wasn't
lost, but it only contains a very limited set of information and nothing
saves you from searching the entry a message refers to based on the
primary keys (which unfortunately aren't even defined for every message
type, which means they're not even clearly defined inside the kernel.
routing rules are one example). Its a hack for some specific usage case,
and since you're defending this so hard, I'm naturally wondering if
you introduced it?

> - by looking at ifi_change i can tell what changed from about 10 
> other things reflected in the flags. If you get an ENOBUFS in this case
> (or any other), it means your state is not consistent and to have
> reliable info you need to query the kernel.

The flags only reflect a single piece of information, one that userspace
needs to be able to reconstruct anyway. Once again the point: netlink
notifications are _update_ messages, not state change messages. This is
a fundamental difference, misleading userspace (see tc monitor) to
believing it is something different for very limited usage cases does
nothing good. I hereby state my believe that 99% of netlink users
are broken.

> Hope that makes sense.

Me too :)

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-27 Thread Patrick McHardy
Russell Stuart wrote:
> Without seeing your actual proposal it is difficult to
> judge whether this is a reasonable trade-off or not.
> Hopefully we will see your code soon.  Do you have any
> idea when?

Probably not today, I'll try to get it into shape until tomomorrow.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-27 Thread jamal
On Mon, 2006-26-06 at 13:21 +0200, Patrick McHardy wrote:
> jamal wrote:

> > scalability issues abound when you have a gazillion things to look at.
> > There used or may still be a way to tell from looking at netlink socket
> > that an error occurred since last time - such as "a message was lost".
> > You could use that to tell a message was lost and do scanning only
> > then.  
> 
> It returns -ENOBUFS on socket overrun. Without it netlink notifications
> wouldn't be very useable as you couldn't figure out when you missed
> some.
> 

I think you are probably right; it is -ENOBUFS. I recall something along
the lines of a socket->error flag somewhere. I will go and snoop on
code.

> >> but the ifi_change field will be different
> >>between notifications and dumps even if the object itself didn't
> >>change. "Lazy userspace" because looking at ifi_change is obviously
> >>only useful if it doesn't keep its last known state and tries to
> >>derive the change from update notifications alone .. which means it
> >>fails when notifications are lost.
> >>
> > 
> > 
> > But thats not the real intent for it. 
> 
> Then what is the intent, it doesn't carry any other information?

Generally it is a filter to what the ifi_flags reflect.
>From an event architecture scalability perspective, it is useful to be
able to look at what changed without having to query the source. That is
what ifi_change provides.  So it is not "Lazy userspace" rather it is an
effective optimization.
You keep less state and you have less message exchanges between user and
kernel.

> It includes information that are not available any other way from
> the kernel, yet the information is not transmitted reliably. How
> could a program that relies on this possibly work reliable?

Ok, so first lets separate them as two different issues:

- Any message to user space may be lost whether it has ifi_change or
not. You need some way to figure out a message was lost and declare your
state may be invalid. The -ENOBUFs is one way to discover stale state.
- by looking at ifi_change i can tell what changed from about 10 
other things reflected in the flags. If you get an ENOBUFS in this case
(or any other), it means your state is not consistent and to have
reliable info you need to query the kernel.

Hope that makes sense.

cheers,
jamal


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-26 Thread Russell Stuart

On 26/06/2006 9:10 PM, Patrick McHardy wrote:

5.  We still did have to modify the kernel for ATM.  That was
   because of its rather unusual characteristics.  However,
   it you look at the size of modifications made to the kernel
   verses the size made to the user space tool, (37 lines
   versus 303 lines,) the bulk of the work was does in user
   space.


I'm sorry, but arguing that a limited special case solution is
better because it needs slightly less code is just not reasonable.


Without seeing your actual proposal it is difficult to
judge whether this is a reasonable trade-off or not.
Hopefully we will see your code soon.  Do you have any
idea when?
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-26 Thread Patrick McHardy
jamal wrote:
> On Fri, 2006-23-06 at 16:32 +0200, Patrick McHardy wrote:
> 
>>I don't think it should carry both old and new speed. Netlink
>>notifications usually provide a snapshot of the new state, but
>>no indication what changed, with one notable exception, the
>>ifi_change field, which IMO is a hack for lazy userspace. 
> 
> 
> I am quiet fond of the ifi_change ;->
> 
>>Since
>>notifications can get lost, userspace needs to resync occasionally.
>>The naiive approach (works for every other object) to determine if
>>the object state changed from my last known state is to compare
>>all attributes ..
> 
> 
> scalability issues abound when you have a gazillion things to look at.
> There used or may still be a way to tell from looking at netlink socket
> that an error occurred since last time - such as "a message was lost".
> You could use that to tell a message was lost and do scanning only
> then.  

It returns -ENOBUFS on socket overrun. Without it netlink notifications
wouldn't be very useable as you couldn't figure out when you missed
some.

>> but the ifi_change field will be different
>>between notifications and dumps even if the object itself didn't
>>change. "Lazy userspace" because looking at ifi_change is obviously
>>only useful if it doesn't keep its last known state and tries to
>>derive the change from update notifications alone .. which means it
>>fails when notifications are lost.
>>
> 
> 
> But thats not the real intent for it. 

Then what is the intent, it doesn't carry any other information?
It includes information that are not available any other way from
the kernel, yet the information is not transmitted reliably. How
could a program that relies on this possibly work reliable?

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-26 Thread Patrick McHardy
Russell Stuart wrote:
> On Fri, 2006-06-23 at 17:21 +0200, Patrick McHardy wrote: 
> 
>>Not really. The randomization doesn't happen by default, but it doesn't
>>influence this anyway. SFQ allows flows to send up to "quantum" bytes
>>at a time before moving on to the next one. A flow that sends 75 * 20
>>byte will in the eyes of SFQ use 1500bytes, on the (ethernet) wire it
>>needs 4800bytes. A flow that sents 1500byte packets will only need
>>1504 bytes on the wire, but will be treated equally. So it does make
>>a different for SFQ.
> 
> 
> I hadn't even thought to check.  My bad.  The S in SFQ stands 
> for stochastic, so something that does without randomisation 
> the algorithm implemented couldn't really be called SFQ - 
> particularly as it weakens the algorithm considerably.  I 
> hope that most users do specify a perturb.

Its not as great as you think. Changing hash-functions on the
fly causes reordering for non-idle flows when bucket-lengths
aren't distributed even. I never use it.

> Your 20 byte example is hardly realistic.  skb->len includes 
> the 14 byte ethernet header, so there is a total of 6 data 
> bytes in a 20 byte packet.  The IP header alone is 20 bytes.  
> TCP as implemented on Linux adds another 32 bytes (20 + the 
> rtt option).  In other words I agree with Jamal's comments 
> elsewhere - optimising for MPU sized packets doesn't seem 
> like a win.

The point is that SFQ does care about packet sizes, and this is
true for both MPU-sized and other packets.

>>Its not about cleanup, its about providing the same capabilities
>>to all qdiscs instead of just a few selected ones and generalizing
>>it so it is also usable for non-ATM overhead calculations.
> 
> 
> Perhaps I chose my words poorly.
> 
> My intent was to contrast the size and goals of the two
> proposed patches.  The ATM patch is a 37 line patch.  It 
> includes some minor cleanups.  From the pseudo code you 
> have posted what you are proposing is a more ambitious and 
> much larger patch that moves a chunk of user space code 
> into the kernel.  I am a complete newbie when it comes to 
> getting code into the kernel, but that strikes me as 
> contentious.  I would rather not have the ATM patch 
> depend on it.
> 
> By the by, here are a couple of observations:
> 
> 1.  The entries in the current rtab are already very closely 
> related to packet lengths.  They are actually the packet
> length multiplied by a constant that converts the units
> from "bytes" to "jiffies".  The constant is the same for
> all entries in the table.
> 
> 2.  As such, the current rtab could already be used by SFQ
> and any other qdisc that needs to know the packet length.
> That SFQ doesn't do this is probably because it doesn't
> effect its performance overly.

The rtab includes the transmission time, which is related to,
but still is something different than the length. You can't
calculate the transmission time without a rate, which is
not needed otherwise for SFQ for example. The way rtabs
are used also needs more space, my current size tables
only use as much space as needed, which is 16 entries for
ethernet instead of 256.

> 3.  Be that as it may, the current RTAB isn't in the most
> convenient form for SFQ, and I am guessing it is in a 
> very inconvenient form for HFSC.  Adding a new version 
> that is identical except that it contains the raw packet 
> length would be a simple change.  In that format it
> could be used by all qdiscs.  The users of the existing
> rtab would have to do the multiplication that converts
> the packet length to jiffies in the kernel.  This means
> the conceptually at least, should the gootput change
> you need to change this one constant, not the entire
> table.

Well, I don't care much whether we use rtabs or something new,
but rtabs are meant for something different and as such are not
optimally suited for this.

> 4.  Much as you seem to dislike having the rate / packet length
> calculations in user space, having them there makes it easy 
> to add new technologies such as ATM.  You just have to 
> change a user space tool - not the kernel.

I don't dislike it for beeing in userspace, I dislike it for
a) beeing used for this since it only covers TBF-based qdiscs
b) beeing used for an ATM "special case" which is not special
   at all.

I should also note that my tables don't come out of the random
generator but are provided by userspace as well. Unless the
mechanism is unable to express the needs of a particular link
layer type all you have to do is to provide a different set
of values without touching any code at all.

> 5.  We still did have to modify the kernel for ATM.  That was
> because of its rather unusual characteristics.  However,
> it you look at the size of modifications made to the kernel
> verses the size made to the user space tool, (37 lines
> versus 303 lines,) the bulk of the work was does in user

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-25 Thread Russell Stuart

On 25/06/2006 12:13 AM, jamal wrote:

You can actually stop reading here if you have gathered the view at
this point that i am not objecting to the simple approach Patrick is
going with...


Perhaps this is my problem.  I am not sure I understand
what Patrick is proposing.  I can wait for his patch, I
guess.

Indeed and i referred to it in the exchanges. 
And yes, I was arguing that the tc scheme you describe would not be so

bad either if the cost of making a generic change is expensive.


OK.  I take it from this you think there is merit in
the idea of adding code so the kernel can calculate
the ATM link speeds correctly.  The discussion is
really about the best way to go about it?

If so, excellent.  I am not really too fussy about how
it is achieved, I just want my VOIP connections to
work well on stock kernels.


There are a lot of link layer issues that you may end up knowing of
(other than the ATM fragmentation overhead) in regards to something
downstream and you keep adding knobs is just adding more bloat. 
Example: If that 3rd hop was wireless that happened to be doing CDMA RLP

with a lot of retransmits, or wireless that varied its throughput from
1-3Mbps at any point in time or it was a satellite link that had a lot
of latency etc etc. You could always have some way to tweak these via
the kernel. In-fact people have written schedulers specifically for
these sorts of link layer problems (I think even some of the IEEE 802.11
or wimax folks have standardized specific schedulers). You basically
have to draw a line somewhere. My line was "can it be done via user
space? yes - do it there".


If you mean by adding lots of knobs, you mean we need a knob
for 802.11, a knob for ATM, a knob for ethernet and so on,
then we do need lots of knobs.  And you need to know which
of those layers is the bottle neck, so you know what knob to
fit.  But you only ever need one knob on a given link.

I can only think of two ways out of needing lots of knobs.
One is to have a qdisc that doesn't need to know the link
speed in order to shape traffic to it gets to the scheduling
and not someone upstream.  Sounds like black magic to me,
but perhaps HFSC does this - I have not read the papers
yet, but I plan to do so soon.

The second way is to automatically calculate the link speed,
using a daemon perhaps :).  Again it sounds like black
magic.  Note that there is already code in the kernel that
does this, but it lives in the layers above - in TCP and
DCCP.  I am referring to Westwood, and friends.  These
algorithms live in the layers above because the need feed
back from the network - which can only come from the other
end of connection unless ECN is working.

I have not been able to figure out how Patrick intends to
solve these problems from his posts so far, so I am waiting
for his code.  Hopefully it will include a lot of comments.


Patrick seems to have a simple way to compensate generically for link
layer fragmentation, so i will not argue the practically; hopefully that
settles it? ;->


Yes, it does.  It will be interesting to see what Patrick
comes up with.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-25 Thread Russell Stuart
On Fri, 2006-06-23 at 17:21 +0200, Patrick McHardy wrote: 
> Not really. The randomization doesn't happen by default, but it doesn't
> influence this anyway. SFQ allows flows to send up to "quantum" bytes
> at a time before moving on to the next one. A flow that sends 75 * 20
> byte will in the eyes of SFQ use 1500bytes, on the (ethernet) wire it
> needs 4800bytes. A flow that sents 1500byte packets will only need
> 1504 bytes on the wire, but will be treated equally. So it does make
> a different for SFQ.

I hadn't even thought to check.  My bad.  The S in SFQ stands 
for stochastic, so something that does without randomisation 
the algorithm implemented couldn't really be called SFQ - 
particularly as it weakens the algorithm considerably.  I 
hope that most users do specify a perturb.

Your 20 byte example is hardly realistic.  skb->len includes 
the 14 byte ethernet header, so there is a total of 6 data 
bytes in a 20 byte packet.  The IP header alone is 20 bytes.  
TCP as implemented on Linux adds another 32 bytes (20 + the 
rtt option).  In other words I agree with Jamal's comments 
elsewhere - optimising for MPU sized packets doesn't seem 
like a win.

> Not a problem as long as the new stuff doesn't break anything existing.
> My patch introduces a TCA_STAB (for size table), similar to the _RTAB
> attributes. Old iproute with new kernel and new iproute with old kernel
> both work fine.

OK, good.

> Its not about cleanup, its about providing the same capabilities
> to all qdiscs instead of just a few selected ones and generalizing
> it so it is also usable for non-ATM overhead calculations.

Perhaps I chose my words poorly.

My intent was to contrast the size and goals of the two
proposed patches.  The ATM patch is a 37 line patch.  It 
includes some minor cleanups.  From the pseudo code you 
have posted what you are proposing is a more ambitious and 
much larger patch that moves a chunk of user space code 
into the kernel.  I am a complete newbie when it comes to 
getting code into the kernel, but that strikes me as 
contentious.  I would rather not have the ATM patch 
depend on it.

By the by, here are a couple of observations:

1.  The entries in the current rtab are already very closely 
related to packet lengths.  They are actually the packet
length multiplied by a constant that converts the units
from "bytes" to "jiffies".  The constant is the same for
all entries in the table.

2.  As such, the current rtab could already be used by SFQ
and any other qdisc that needs to know the packet length.
That SFQ doesn't do this is probably because it doesn't
effect its performance overly.

3.  Be that as it may, the current RTAB isn't in the most
convenient form for SFQ, and I am guessing it is in a 
very inconvenient form for HFSC.  Adding a new version 
that is identical except that it contains the raw packet 
length would be a simple change.  In that format it
could be used by all qdiscs.  The users of the existing
rtab would have to do the multiplication that converts
the packet length to jiffies in the kernel.  This means
the conceptually at least, should the gootput change
you need to change this one constant, not the entire
table.

4.  Much as you seem to dislike having the rate / packet length
calculations in user space, having them there makes it easy 
to add new technologies such as ATM.  You just have to 
change a user space tool - not the kernel.

5.  We still did have to modify the kernel for ATM.  That was
because of its rather unusual characteristics.  However,
it you look at the size of modifications made to the kernel
verses the size made to the user space tool, (37 lines
versus 303 lines,) the bulk of the work was does in user
space.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-24 Thread jamal
On Fri, 2006-23-06 at 16:32 +0200, Patrick McHardy wrote:
> jamal wrote:

> > If you do it in user space you will need a daemon of some form; this is
> > my preference but it seems a lot of people hate daemons - the standard
> > claim is it is counter-usability. Such people are forgiving if you built
> > the daemon into the kernel as a thread. Perhaps the netcarrier that
> > Stefan Rompf has added could be extended to handle this)
> 
> I absolutely disagree, in my opinion we currently have too few
> specialized daemons and a lot too much wild shell scripting with
> seding, awking, grepping. 

Like i said i too prefer a daemon; but i have experienced a lot of
people who dont (especially in small embedded devices).

> I'm actually working on a daemon that
> provides a small scripting language to express networking configuration
> states and react to result set changes and state transitions.
> Its far from complete, but already allows you to express things like
> "on transition from { none, link flags !up } to { link flags up }
> execute /etc/network/link/$LINK_NAME up" (same for routes, addresses,
> basically all object types) or "for each link with flags lowerup,up
> execute add-to-bridge.sh". The value of each expression is dumped
> into the environment on execution, so you can comfortably use
> $LINK_NAME or $LINK_MTU instead of having to cut it out the
> "ip link list" output. Should be trivial to support link speed changes
> once we have notifications for that.
> 

cool - a neteventd


> I don't think it should carry both old and new speed. Netlink
> notifications usually provide a snapshot of the new state, but
> no indication what changed, with one notable exception, the
> ifi_change field, which IMO is a hack for lazy userspace. 

I am quiet fond of the ifi_change ;->

> Since
> notifications can get lost, userspace needs to resync occasionally.
> The naiive approach (works for every other object) to determine if
> the object state changed from my last known state is to compare
> all attributes ..

scalability issues abound when you have a gazillion things to look at.
There used or may still be a way to tell from looking at netlink socket
that an error occurred since last time - such as "a message was lost".
You could use that to tell a message was lost and do scanning only
then.  

>  but the ifi_change field will be different
> between notifications and dumps even if the object itself didn't
> change. "Lazy userspace" because looking at ifi_change is obviously
> only useful if it doesn't keep its last known state and tries to
> derive the change from update notifications alone .. which means it
> fails when notifications are lost.
> 

But thats not the real intent for it. 

cheers,
jamal

PS:- I will look at your other postings and respond later i have to run
for now.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-24 Thread jamal
On Fri, 2006-23-06 at 22:37 +1000, Russell Stuart wrote: 
>  Sorry for the digest like reply :(

;-> I know this discussion has gone in a few directions already
and i am sure as you kept responding to the emails it became obvious.
Let me just jump to one point so we dont repeat a lot of the same things
over and over..
You can actually stop reading here if you have gathered the view at
this point that i am not objecting to the simple approach Patrick is
going with...

> On Tue, 2006-06-20 at 10:06 -0400, jamal wrote:
>  But you can measure the bandwidth you are 
> getting from your ISP and plug that into the tc command 
> line.  The web page I sent to you describes how to do this
> for ADSL lines.
> 

Indeed and i referred to it in the exchanges. 
And yes, I was arguing that the tc scheme you describe would not be so
bad either if the cost of making a generic change is expensive.

To reiterate the scheduler "owns the resources" of the link
i.e link level bandwidth - aka throughput; however, teaching the
scheduler as in your description in that page about effective
throughput aka "goodput" is also within reason if you knew _it was
consistent_ . The diagram i drew was:

|Linux| --ethernet-- |Modem| --DSL-- |DSLAM| --ATM-- |BRAS| 

As can be seen from above, Linux has direct ownership of the ethernet
link resources (within limits of fixed bandwidth no AN etc). Just like a
disk quota scheduler owns the disk resources or CPU scheduler owns the
CPU cycles etc. Likewise, the DSLAM downstream owns scheduling of the
ATM link resources.

if you have knowledge that ATM runs between DSLAM and BRAS and that
the effective throughput is in-fact lower than the advertised one at
that point in the end2end path, then i admit it is fine to do as you
described in your web-page. 
Given the ubiquity of ADSL it is also ok to have simple tweak knobs for
the scheduler in Linux to allow for it to be able to compensate for the
ATM fragmentation 3 hops down. 

There are a lot of link layer issues that you may end up knowing of
(other than the ATM fragmentation overhead) in regards to something
downstream and you keep adding knobs is just adding more bloat. 
Example: If that 3rd hop was wireless that happened to be doing CDMA RLP
with a lot of retransmits, or wireless that varied its throughput from
1-3Mbps at any point in time or it was a satellite link that had a lot
of latency etc etc. You could always have some way to tweak these via
the kernel. In-fact people have written schedulers specifically for
these sorts of link layer problems (I think even some of the IEEE 802.11
or wimax folks have standardized specific schedulers). You basically
have to draw a line somewhere. My line was "can it be done via user
space? yes - do it there".

Patrick seems to have a simple way to compensate generically for link
layer fragmentation, so i will not argue the practically; hopefully that
settles it? ;->

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-23 Thread Patrick McHardy
Russell Stuart wrote:
> On Thu, 2006-06-22 at 14:29 -0400, jamal wrote: 
> 

> On Tue, 2006-06-20 at 03:04 +0200, Patrick McHardy wrote: 
> 
>>What about qdiscs like SFQ (which uses the packet size in quantum
>>calculations)? I guess it would make sense to use the wire-length
>>there as well.
> 
> 
> Being pedantic, SQF automatically assigns traffic to classes 
> and gives each class an equal share of the available bandwidth.  
> As I am sure you are aware SQF's trick is that it randomly 
> changes its classification algorithm - every second in the Linux 
> implementation.  If there are errors in rate calculation this 
> randomisation will ensure they are distributed equally between 
> the classes as time goes on.  So no, accurate packets sizes are 
> not that important to SQF.
> 
> But they are important to many other qdiscs, and I am sure 
> that was your point.  SQF just happened to be a bad example.

Not really. The randomization doesn't happen by default, but it doesn't
influence this anyway. SFQ allows flows to send up to "quantum" bytes
at a time before moving on to the next one. A flow that sends 75 * 20
byte will in the eyes of SFQ use 1500bytes, on the (ethernet) wire it
needs 4800bytes. A flow that sents 1500byte packets will only need
1504 bytes on the wire, but will be treated equally. So it does make
a different for SFQ.

> On Tue, 2006-06-20 at 16:45 +0200, Patrick McHardy wrote: 
> 
>>Handling all qdiscs would mean adding a pointer to a mapping table
>>to struct net_device and using something like "skb_wire_len(skb, dev)"
>>instead of skb->len in the queueing layer. That of course doesn't
>>mean that we can't still provide pre-adjusted ratetables for qdiscs
>>that use them.
> 
> 
> Yes, that would work well, and is probably how it should of
> been done when the kernel stuff was originally written.  As 
> it happens Jesper's original solution was closer to this.  The 
> reason we choose not to go that way it is would change the 
> kernel-userspace API.   The current patch solves the problem 
> and works well as possible on all kernel versions - both 
> patched and unpatched.

Not a problem as long as the new stuff doesn't break anything existing.
My patch introduces a TCA_STAB (for size table), similar to the _RTAB
attributes. Old iproute with new kernel and new iproute with old kernel
both work fine.

> Now that I think about to change things the way you suggest
> here does seem simple enough.  But it probably belongs in a 
> different patch.  We wrote this patch to fix a specific problem 
> with ATM links, and it should succeed or fail on the merits 
> of doing that.  Cleaning up the kernel code to do what you 
> suggest is a different issue.  Let whether it to should be 
> done, or not, be based on its own merits.

Its not about cleanup, its about providing the same capabilities
to all qdiscs instead of just a few selected ones and generalizing
it so it is also usable for non-ATM overhead calculations.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-23 Thread Patrick McHardy
jamal wrote:
> On Tue, 2006-20-06 at 17:09 +0200, Patrick McHardy wrote:
> 
>>The thing I'm not sure about is whether this wouldn't be handled better
>>by userspace, 
> 
> 
> If you do it in user space you will need a daemon of some form; this is
> my preference but it seems a lot of people hate daemons - the standard
> claim is it is counter-usability. Such people are forgiving if you built
> the daemon into the kernel as a thread. Perhaps the netcarrier that
> Stefan Rompf has added could be extended to handle this)

I absolutely disagree, in my opinion we currently have too few
specialized daemons and a lot too much wild shell scripting with
seding, awking, grepping. I'm actually working on a daemon that
provides a small scripting language to express networking configuration
states and react to result set changes and state transitions.
Its far from complete, but already allows you to express things like
"on transition from { none, link flags !up } to { link flags up }
execute /etc/network/link/$LINK_NAME up" (same for routes, addresses,
basically all object types) or "for each link with flags lowerup,up
execute add-to-bridge.sh". The value of each expression is dumped
into the environment on execution, so you can comfortably use
$LINK_NAME or $LINK_MTU instead of having to cut it out the
"ip link list" output. Should be trivial to support link speed changes
once we have notifications for that.

>>if the link layer speed changes you might not want
>>proportional scaling but prefer to still give a fixed amount of that
>>bandwidth to some class, for example VoIP traffic. Do we have netlink
>>notifications for link speed changes?
> 
> 
> Not there at the moment - but we do emit event for other link layer
> stuff like carrier on/off - so adding this should be trivial and a
> reasonable thing to have; with a caveat: it will be link-layer specific;
> so whoever ends up adding will have to be careful to make sure it is not
> hard-coded to be specific to ethernet-like netdevices. It could probably
> be reported together with link state as a TLV like ETHER_SPEED_CHANGED
> which carries probably old speed and new speed
> and maybe even reason why it changed.

I don't think it should carry both old and new speed. Netlink
notifications usually provide a snapshot of the new state, but
no indication what changed, with one notable exception, the
ifi_change field, which IMO is a hack for lazy userspace. Since
notifications can get lost, userspace needs to resync occasionally.
The naiive approach (works for every other object) to determine if
the object state changed from my last known state is to compare
all attributes .. but the ifi_change field will be different
between notifications and dumps even if the object itself didn't
change. "Lazy userspace" because looking at ifi_change is obviously
only useful if it doesn't keep its last known state and tries to
derive the change from update notifications alone .. which means it
fails when notifications are lost.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-23 Thread Russell Stuart
On Thu, 2006-06-22 at 14:29 -0400, jamal wrote: 
> Russell,
> 
> I did look at what you sent me and somewhere in those discussions i
> argue that the changes compensate to make the rate be a goodput
> instead of advertised throughput.

I did see that, but didn't realise you were responding to 
me.  A lot of discussion has gone on since and evidently 
quite a bit of which was addressed to me.  I will try to 
answer the some of the points.   Sorry for the digest 
like reply :(

On Wed, 2006-06-14 at 11:57 +0100, Alan Cox wrote:
> I'm 
> not sure if that matters but for modern processors I'm also sceptical
> that the clever computation is actually any faster than just doing the
> maths, especially if something cache intensive is also running.

Assuming you are referring to the rate tables - I hadn't
thought about it, but I guess I would agree.   However, this 
patch wasn't trying to radically re-engineer the traffic 
control engines rate calculation code.  Quite the reverse I -
was was trying to change it as little as possible.  The kernel 
part of the patch actually only introduced one small change - 
the optional addition of a constant the packet length.

On Thu, 2006-06-15 at 08:57 -0400, jamal wrote: 
> But i dont think it is ACKs perse that you or Russell are contending
> cause these issues. It's the presence of ATM . And all evidence seems to
> point to the fact that ISPs bill you for something other than your
> point of view, no?

I don't know about anywhere else, but certainly here in
Australia some ISP's creative in how they advertise their
link speeds.  Again that is not the issue we were trying 
to address with the patch.

On Thu, 2006-06-15 at 08:57 -0400, jamal wrote: 
> You are still speaking ATM (and the above may still be valid), but: 
> Could you for example look at the netdevice->type and from that figure
> out the link layer overhead and compensate for it.

As others have pointed out, this doesn't work for the ADSL 
user.  An ADSL modem is connected to the box using either 
ethernet, wireless or USB.

On Thu, 2006-06-15 at 09:03 -0400, jamal wrote: 
> It is probably doable by just looking at netdevice->type and figuring
> the link layer technology. Totally in user space and building the
> compensated for tables there before telling the kernel (advantage is no
> kernel changes and therefore it would work with older kernels as well).

Others have had this same thought, and have spent time trying
to come up with a user space only solution.  They failed because 
it isn't possible.  To understand why see this thread:

  http://mailman.ds9a.nl/pipermail/lartc/2006q1/018314.html

Also, the user space patch does improve the performance of 
older kernels (ie unpatched kernels).  Rather than getting 
the rate wrong 99.9% of the time, older kernels only get it 
wrong 14% of the time, on average.

On Tue, 2006-06-20 at 03:04 +0200, Patrick McHardy wrote: 
> What about qdiscs like SFQ (which uses the packet size in quantum
> calculations)? I guess it would make sense to use the wire-length
> there as well.

Being pedantic, SQF automatically assigns traffic to classes 
and gives each class an equal share of the available bandwidth.  
As I am sure you are aware SQF's trick is that it randomly 
changes its classification algorithm - every second in the Linux 
implementation.  If there are errors in rate calculation this 
randomisation will ensure they are distributed equally between 
the classes as time goes on.  So no, accurate packets sizes are 
not that important to SQF.

But they are important to many other qdiscs, and I am sure 
that was your point.  SQF just happened to be a bad example.

On Tue, 2006-06-20 at 10:06 -0400, jamal wrote:
> What this means is that Linux computes based on ethernet
> headers. Somewhere downstream ATM (refer to above) comes in and that
> causes mismatch in what Linux expects to be the bandwidth and what
> your service provider who doesnt account for the ATM overhead when
> they sell you "1.5Mbps".
> Reminds me of hard disk vendors who define 1K to be 1000 to show
> how large their drives are.
> Yes, Linux cant tell if your service provider is lying to you.

No, it can't.  But you can measure the bandwidth you are 
getting from your ISP and plug that into the tc command 
line.  The web page I sent to you describes how to do this
for ADSL lines.

On Tue, 2006-06-20 at 10:06 -0400, jamal wrote:
> > On Mon, 2006-19-06 at 21:31 +0200, Jesper Dangaard Brouer wrote:
> > The issue here is, that ATM does not have fixed overhead (due to alignment 
> > and padding).  This means that a fixed reduction of the bandwidth is not 
> > the solution.  We could reduce the bandwidth to the worst-case overhead, 
> > which is 62%, I do not think that is a good solution...
> > 
> 
> I dont see it as wrong to be honest with you. Your mileage may vary.

Jamal am I reading this correctly?  Did you just say that you 
don't see having to reduce your available bandwidth by 62% to 
take account of defic

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-22 Thread jamal
On Tue, 2006-20-06 at 17:09 +0200, Patrick McHardy wrote:
> jamal wrote:

> "Depend on bandwidth" is not the right term. All of TBF, HTB and HFSC
> provide bandwidth per time, but with TBF and HTB the relation between
> the amount of bandwidth is linear to the amount of time, with HFSC
> it is only on a linear on larger scale since it uses service curves,
> which are represented as two linear pieces. So you have bandwidth b1
> for time t1, bandwidth b2 after that until eternity. By scaling the
> clock rate you alter after how much time b2 kicks in, which affects
> the guaranteed delays. The end result should be that both bandwidth
> and delay scale up or down proportionally, but I'm not sure that this
> is what HFSC would do in all cases (on small scale). But it should
> be easy to answer with a bit more time for visualizing it.
> 

Ok, this makes things a little trickier though, no? 

> The thing I'm not sure about is whether this wouldn't be handled better
> by userspace, 

If you do it in user space you will need a daemon of some form; this is
my preference but it seems a lot of people hate daemons - the standard
claim is it is counter-usability. Such people are forgiving if you built
the daemon into the kernel as a thread. Perhaps the netcarrier that
Stefan Rompf has added could be extended to handle this)
Note, if you wanna do it right as well you will factor in other things
like some wireless technologies which changes their throughput
capability over a period of time ( A lot of these guys try to have their
own hardware level schedulers to compensate for this).

> if the link layer speed changes you might not want
> proportional scaling but prefer to still give a fixed amount of that
> bandwidth to some class, for example VoIP traffic. Do we have netlink
> notifications for link speed changes?

Not there at the moment - but we do emit event for other link layer
stuff like carrier on/off - so adding this should be trivial and a
reasonable thing to have; with a caveat: it will be link-layer specific;
so whoever ends up adding will have to be careful to make sure it is not
hard-coded to be specific to ethernet-like netdevices. It could probably
be reported together with link state as a TLV like ETHER_SPEED_CHANGED
which carries probably old speed and new speed
and maybe even reason why it changed.

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-21 Thread Krzysztof Matusik
Dnia środa, 21 czerwca 2006 14:54, Patrick McHardy napisał:
> > I'd love to see this one implemented. I'm using HFSC more than a year and
> > it never provides proper QoS on ATM/ADSL links; low delays can never be
> > achieved even with significant throttling below the h/w link bandwidth.
>
> Mhh .. I trust you picked a proper clocksource and timer frequency?

250Hz or 1000Hz and jiffies (for a SMP machine)
I don't mean I've got big delays though and those are lowest with HFSC, when 
link is satiated.

IMHO since ATM is sending small packets constantly thorough its link (ATM 
specific) encapsulated data defragmentation cannot simply be achieved (or it 
simply isn't done by modems). As a result one can't simply throttle qdisc 
basing on computations like 
UpperLimit = ( bandwidth*(1 - header_bits/frame_size )
having bandwidth specified by ATM link speed rate, as it can be reported by 
modem, and expect it won't ever be congested.
This seems to be main problem. Link asymmetry, window size etc don't affect 
line congestion and so delays that much I suppose.
The same with incorrect timer :-).

> I hacked up a patch yesterday. I want to do a bit more testing before
> posting it, but its hard to really test the effectiveness because even
> without this patch my DSL line already delivers higher throughput and
> much better delay than it should (its a throttled SDSL line sold as
> ADSL). I'll try to do some testing on an ethernet link now.

I'll follow this thread then.

regards
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-21 Thread Patrick McHardy
Krzysztof Matusik wrote:
> Dnia wtorek, 20 czerwca 2006 17:16, Patrick McHardy napisał:
> 
>>The code wouldn't be very complicated, it just adds some overhead. If
>>you do something like I described in my previous mail the overhead for
>>people not using it would be an additional pointer test before reading
>>skb->len. I guess we could also make it a compile time option.
>>I personally think this is something that really improves our quality
>>of implementation, after all, its "wire" resources qdiscs are meant
>>to manage.
> 
> 
> I'd love to see this one implemented. I'm using HFSC more than a year and it 
> never provides proper QoS on ATM/ADSL links; low delays can never be achieved 
> even with significant throttling below the h/w link bandwidth.

Mhh .. I trust you picked a proper clocksource and timer frequency?

> This would help a lot regarding the amount of adsl users but on the other 
> hand- there's not many HFSC implementations in real-life I guess (users seem 
> to be afraid of it's 'complexity'). 
> This idea doesn't seem look dirty- is there a chance to implement it in the 
> kernel and iproute?

I hacked up a patch yesterday. I want to do a bit more testing before
posting it, but its hard to really test the effectiveness because even
without this patch my DSL line already delivers higher throughput and
much better delay than it should (its a throttled SDSL line sold as
ADSL). I'll try to do some testing on an ethernet link now.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-21 Thread Krzysztof Matusik
Dnia wtorek, 20 czerwca 2006 17:16, Patrick McHardy napisał:
> jamal wrote:
> > On Tue, 2006-20-06 at 03:04 +0200, Patrick McHardy wrote:
> >>It would be nice to have support for HFSC as well, which unfortunately
> >>needs to be done in the kernel since it doesn't use rate tables.
> >>What about qdiscs like SFQ (which uses the packet size in quantum
> >>calculations)? I guess it would make sense to use the wire-length
> >>there as well.
> >
> > Didnt even think of that ;->
> > Is it getting too complicated?
>
> The code wouldn't be very complicated, it just adds some overhead. If
> you do something like I described in my previous mail the overhead for
> people not using it would be an additional pointer test before reading
> skb->len. I guess we could also make it a compile time option.
> I personally think this is something that really improves our quality
> of implementation, after all, its "wire" resources qdiscs are meant
> to manage.

I'd love to see this one implemented. I'm using HFSC more than a year and it 
never provides proper QoS on ATM/ADSL links; low delays can never be achieved 
even with significant throttling below the h/w link bandwidth.
This would help a lot regarding the amount of adsl users but on the other 
hand- there's not many HFSC implementations in real-life I guess (users seem 
to be afraid of it's 'complexity'). 
This idea doesn't seem look dirty- is there a chance to implement it in the 
kernel and iproute?

regards
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-20 Thread Patrick McHardy
jamal wrote:
> On Tue, 2006-20-06 at 03:04 +0200, Patrick McHardy wrote:
> 
>>It would be nice to have support for HFSC as well, which unfortunately
>>needs to be done in the kernel since it doesn't use rate tables.
>>What about qdiscs like SFQ (which uses the packet size in quantum
>>calculations)? I guess it would make sense to use the wire-length
>>there as well.
> 
> 
> Didnt even think of that ;-> 
> Is it getting too complicated? 

The code wouldn't be very complicated, it just adds some overhead. If
you do something like I described in my previous mail the overhead for
people not using it would be an additional pointer test before reading
skb->len. I guess we could also make it a compile time option.
I personally think this is something that really improves our quality
of implementation, after all, its "wire" resources qdiscs are meant
to manage.

> BTW, I forgot to mention one thing on the bandwidth issue is we could do
> is send netlink events on link speed changes too; some listener
> somewhere would then do the adjustment.

See the mail I just wrote :)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-20 Thread Patrick McHardy
jamal wrote:
> On Tue, 2006-20-06 at 02:54 +0200, Patrick McHardy wrote:
> 
>>jamal wrote:
>>
>>>- For further reflection: Have you considered the case where the rate
>>>table has already been considered on some link speed in user space and
>>>then somewhere post-config the physical link speed changes? This would
>>>happen in the case where ethernet AN is involved and the partner makes
>>>some changes (use ethtool). 
>>>
> 
> [..]
> 
>>I've thought about this a couple of times, scaling the virtual clock
>>rate should be enough for "simple" qdiscs like TBF or HTB, which have
>>a linear relation between time and bandwidth. I haven't really thought
>>about the effects on HFSC yet, on a small scale the relation is
>>non-linear. 
> 
> 
> Does HFSC not depend on bandwith? How is rate control achieved?

"Depend on bandwidth" is not the right term. All of TBF, HTB and HFSC
provide bandwidth per time, but with TBF and HTB the relation between
the amount of bandwidth is linear to the amount of time, with HFSC
it is only on a linear on larger scale since it uses service curves,
which are represented as two linear pieces. So you have bandwidth b1
for time t1, bandwidth b2 after that until eternity. By scaling the
clock rate you alter after how much time b2 kicks in, which affects
the guaranteed delays. The end result should be that both bandwidth
and delay scale up or down proportionally, but I'm not sure that this
is what HFSC would do in all cases (on small scale). But it should
be easy to answer with a bit more time for visualizing it.

The thing I'm not sure about is whether this wouldn't be handled better
by userspace, if the link layer speed changes you might not want
proportional scaling but prefer to still give a fixed amount of that
bandwidth to some class, for example VoIP traffic. Do we have netlink
notifications for link speed changes?

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-20 Thread jamal
On Tue, 2006-20-06 at 03:04 +0200, Patrick McHardy wrote:
> jamal wrote:
> > You are still speaking ATM (and the above may still be valid), but: 
> > Could you for example look at the netdevice->type and from that figure
> > out the link layer overhead and compensate for it.
> > Obviously a lot more useful if such activity is doable in user space
> > without any knowledge of the kernel? and therefore zero change to the
> > kernel and everything then becomes forward and backward compatible.
> 
> It would be nice to have support for HFSC as well, which unfortunately
> needs to be done in the kernel since it doesn't use rate tables.
> What about qdiscs like SFQ (which uses the packet size in quantum
> calculations)? I guess it would make sense to use the wire-length
> there as well.

Didnt even think of that ;-> 
Is it getting too complicated? 

BTW, I forgot to mention one thing on the bandwidth issue is we could do
is send netlink events on link speed changes too; some listener
somewhere would then do the adjustment.

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-20 Thread jamal
On Tue, 2006-20-06 at 02:54 +0200, Patrick McHardy wrote:
> jamal wrote:
> > - For further reflection: Have you considered the case where the rate
> > table has already been considered on some link speed in user space and
> > then somewhere post-config the physical link speed changes? This would
> > happen in the case where ethernet AN is involved and the partner makes
> > some changes (use ethtool). 
> > 
[..]
> I've thought about this a couple of times, scaling the virtual clock
> rate should be enough for "simple" qdiscs like TBF or HTB, which have
> a linear relation between time and bandwidth. I haven't really thought
> about the effects on HFSC yet, on a small scale the relation is
> non-linear. 

Does HFSC not depend on bandwith? How is rate control achieved?

> But this is a different problem from trying to accomodate
> for link-layer overhead.
> 

Yes it is different issue.

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-20 Thread Jesper Dangaard Brouer

On Mon, 2006-06-19 at 22:35 -0700, Chris Wedgwood wrote:
> On Wed, Jun 14, 2006 at 11:40:04AM +0200, Jesper Dangaard Brouer wrote:
> 
> > The Linux traffic's control engine inaccurately calculates
> > transmission times for packets sent over ADSL links.  For some
> > packet sizes the error rises to over 50%.  This occurs because ADSL
> > uses ATM as its link layer transport, and ATM transmits packets in
> > fixed sized 53 byte cells.
> 
> What if AAL5 is used?  The cell-alignment math is going to be wrong
> there surely?

Actually it _is_ AAL5 which is accounted for.

See Chapter 5 "ADSL link layer overhead" (page 48-54).
http://www.adsl-optimizer.dk/thesis/main_final_hyper.pdf

-- 
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network developer
  Cand. Scient Datalog / MSc.
  Author of http://adsl-optimizer.dk



signature.asc
Description: This is a digitally signed message part


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-19 Thread Chris Wedgwood
On Wed, Jun 14, 2006 at 11:40:04AM +0200, Jesper Dangaard Brouer wrote:

> The Linux traffic's control engine inaccurately calculates
> transmission times for packets sent over ADSL links.  For some
> packet sizes the error rises to over 50%.  This occurs because ADSL
> uses ATM as its link layer transport, and ATM transmits packets in
> fixed sized 53 byte cells.

What if AAL5 is used?  The cell-alignment math is going to be wrong
there surely?

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-19 Thread Patrick McHardy
jamal wrote:
> You are still speaking ATM (and the above may still be valid), but: 
> Could you for example look at the netdevice->type and from that figure
> out the link layer overhead and compensate for it.
> Obviously a lot more useful if such activity is doable in user space
> without any knowledge of the kernel? and therefore zero change to the
> kernel and everything then becomes forward and backward compatible.

It would be nice to have support for HFSC as well, which unfortunately
needs to be done in the kernel since it doesn't use rate tables.
What about qdiscs like SFQ (which uses the packet size in quantum
calculations)? I guess it would make sense to use the wire-length
there as well.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-19 Thread Patrick McHardy
jamal wrote:
> - For further reflection: Have you considered the case where the rate
> table has already been considered on some link speed in user space and
> then somewhere post-config the physical link speed changes? This would
> happen in the case where ethernet AN is involved and the partner makes
> some changes (use ethtool). 
> 
> I would say the last bullet is a more interesting problem than a corner
> case of some link layer technology that has high overhead.
> Your work would be more interesting if it was generic for many link
> layers instead of just ATM.

I've thought about this a couple of times, scaling the virtual clock
rate should be enough for "simple" qdiscs like TBF or HTB, which have
a linear relation between time and bandwidth. I haven't really thought
about the effects on HFSC yet, on a small scale the relation is
non-linear. But this is a different problem from trying to accomodate
for link-layer overhead.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-15 Thread jamal
On Wed, 2006-14-06 at 14:55 +0200, Jesper Dangaard Brouer wrote:
> On Wed, 2006-06-14 at 08:06 -0400, jamal wrote:

> > - Have you tried to do a long-lived session such as a large FTP and 
> > seen how far off the deviation was? That would provide some interesting
> > data point.
> 
> The deviation can be calculated.  The impact is of cause small for large
> packets.  
>
> But the argument that bulk TCP transfers is not as badly
> affected, is wrong because all the TCP ACK packets gets maximum penalty.
> 

ACKs have always played a prominent role. The last numbers i have seen
for North America (but i think probably valid globaly) show in the range
of 40% ACKs in internet traffic
http://netweb.usc.edu/~rsinha/pkt-sizes/
I suspect a lot of these stats are on their way to change with voip, p2p
etc.
But i dont think it is ACKs perse that you or Russell are contending
cause these issues. It's the presence of ATM . And all evidence seems to
point to the fact that ISPs bill you for something other than your
point of view, no?

> On an ADSL link with more than 8 bytes overhead, a 40 bytes TCP ACK will
> use more that one ATM frame, causing 2 ATM frames to be send that
> consumes 106 bytes, eg. 62% overhead.  On a small upstream ADSL line
> that hurts! (See thesis page 53, table 5.3 "Overhead summary").
> 

But how are you connected to the DSLAM? In north America it is typically
ethernet. If i use the current tables i dont see much of a problem with
say cable modems. Are you trying to compensate for the accounting
differences between what your service provider measures (accounting for
their ATM cells) and what you do accounting for your ethernet frames? 
I guess i am lost as to where the ATM is in the topology and more
importantly whether we (Linux) mis-account or whether your approach is
trying to compensate for the ISPs mis-accounting.

> 
> > - To be a devil's advocate (and not claim there is no issue), where do
> > you draw the line with "overhead"? 
> > Example the smallest ethernet packet is 64 bytes of which 14 bytes are
> > ethernet headers ("overhead" for IP) - and this is not counting CRC etc.
> > If you were to set an MTU of say 64 bytes and tried to do a http or ftp,
> > how accurate do you think the calculation would be? I would think not
> > very different.
> 
> I do think we handle this situation, but I'm not quite sure that I fully
> understand the question (sorry).
> 

Assume the following:
- You had ethernet end to end. Is there still a problem?
- Take it a notch up and assume you had ethernet with MTU of 64B. This
way you will have all your packets being small and having high overhead.
Do you still have a problem?

> 
> > Does it matter if it is accurate on the majority of the cases?
> > - For further reflection: Have you considered the case where the rate
> > table has already been considered on some link speed in user space and
> > then somewhere post-config the physical link speed changes? This would
> > happen in the case where ethernet AN is involved and the partner makes
> > some changes (use ethtool). 
> >
> > I would say the last bullet is a more interesting problem than a corner
> > case of some link layer technology that has high overhead.
> 
> We only claim to do magic on ATM/ADSL links... nothing else ;-)
> 

This is well and good given the focus of your thesis. Up/down here we
need something more generic. Your masters-thesis is a good start but
consider doing the phd next and complete this work;->

> 
> > Your work would be more interesting if it was generic for many link
> > layers instead of just ATM.
> 
> Well, we did consider to do so, but we though that it would be harder to
> get it into the kernel.
> 
> Actually thats the reason for the defines:
>  #define  ATM_CELL_SIZE   53
>  #define  ATM_CELL_PAYLOAD48
> 
> Changing these should should make it possible to adapt to any other SAR
> (Segment And Reasembly) link layer.  
> 

You are still speaking ATM (and the above may still be valid), but: 
Could you for example look at the netdevice->type and from that figure
out the link layer overhead and compensate for it.
Obviously a lot more useful if such activity is doable in user space
without any knowledge of the kernel? and therefore zero change to the
kernel and everything then becomes forward and backward compatible.

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-15 Thread jamal
On Wed, 2006-14-06 at 14:55 +0200, Jesper Dangaard Brouer wrote:
> On Wed, 2006-06-14 at 08:06 -0400, jamal wrote:

> > - Have you tried to do a long-lived session such as a large FTP and 
> > seen how far off the deviation was? That would provide some interesting
> > data point.
> 
> The deviation can be calculated.  The impact is of cause small for large
> packets.  
>
> But the argument that bulk TCP transfers is not as badly
> affected, is wrong because all the TCP ACK packets gets maximum penalty.
> 

ACKs have always played a prominent role. The last numbers i have seen
for North America (but i think probably valid globaly) show in the range
of 40% ACKs in internet traffic
http://netweb.usc.edu/~rsinha/pkt-sizes/
I suspect a lot of these stats are on their way to change with voip, p2p
etc.
But i dont think it is ACKs perse that you or Russell are contending
cause these issues. It's the presence of ATM . And all evidence seems to
point to the fact that ISPs bill you for something other than your
point of view, no?

> On an ADSL link with more than 8 bytes overhead, a 40 bytes TCP ACK will
> use more that one ATM frame, causing 2 ATM frames to be send that
> consumes 106 bytes, eg. 62% overhead.  On a small upstream ADSL line
> that hurts! (See thesis page 53, table 5.3 "Overhead summary").
> 

But how are you connected to the DSLAM? In north America it is typically
ethernet. If i use the current tables i dont see much of a problem with
say cable modems. Are you trying to compensate for the accounting
differences between what your service provider measures (accounting for
their ATM cells) and what you do accounting for your ethernet frames? 
I guess i am lost as to where the ATM is in the topology and more
importantly whether we (Linux) mis-account or whether your approach is
trying to compensate for the ISPs mis-accounting.

> 
> > - To be a devil's advocate (and not claim there is no issue), where do
> > you draw the line with "overhead"? 
> > Example the smallest ethernet packet is 64 bytes of which 14 bytes are
> > ethernet headers ("overhead" for IP) - and this is not counting CRC etc.
> > If you were to set an MTU of say 64 bytes and tried to do a http or ftp,
> > how accurate do you think the calculation would be? I would think not
> > very different.
> 
> I do think we handle this situation, but I'm not quite sure that I fully
> understand the question (sorry).
> 

Assume the following:
- You had ethernet end to end. Is there still a problem?
- Take it a notch up and assume you had ethernet with MTU of 64B. This
way you will have all your packets being small and having high overhead.
Do you still have a problem?

> 
> > Does it matter if it is accurate on the majority of the cases?
> > - For further reflection: Have you considered the case where the rate
> > table has already been considered on some link speed in user space and
> > then somewhere post-config the physical link speed changes? This would
> > happen in the case where ethernet AN is involved and the partner makes
> > some changes (use ethtool). 
> >
> > I would say the last bullet is a more interesting problem than a corner
> > case of some link layer technology that has high overhead.
> 
> We only claim to do magic on ATM/ADSL links... nothing else ;-)
> 

This is well and good given the focus of your thesis. Up/down here we
need something more generic. Your masters-thesis is a good start but
consider doing the phd next and complete this work;->

> 
> > Your work would be more interesting if it was generic for many link
> > layers instead of just ATM.
> 
> Well, we did consider to do so, but we though that it would be harder to
> get it into the kernel.
> 
> Actually thats the reason for the defines:
>  #define  ATM_CELL_SIZE   53
>  #define  ATM_CELL_PAYLOAD48
> 
> Changing these should should make it possible to adapt to any other SAR
> (Segment And Reasembly) link layer.  
> 

You are still speaking ATM (and the above may still be valid), but: 
Could you for example look at the netdevice->type and from that figure
out the link layer overhead and compensate for it.
Obviously a lot more useful if such activity is doable in user space
without any knowledge of the kernel? and therefore zero change to the
kernel and everything then becomes forward and backward compatible.

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-14 Thread Andy Furniss

jamal wrote:

I have taken linux-kernel off the list.

Russell's site is inaccessible to me (I actually think this is related
to some DNS issues i may be having) and your masters is too long to
spend 2 minutes and glean it; so heres a question or two for you:

- Have you tried to do a long-lived session such as a large FTP and 
seen how far off the deviation was? That would provide some interesting

data point.
- To be a devil's advocate (and not claim there is no issue), where do
you draw the line with "overhead"? 


Me and many others have run a smilar hack for years, there is also a 
userspace project still alive which does the same.


The difference is that without it I would need to sacrifice almost half 
my 288kbit atm/dsl showtime bandwidth to be sure of control.


With the modification I can run at 286kbit / 288 and know I will never 
have jitter worse than the bitrate latency of a mtu packet. The 286 
figure was choses to allow a full buffer to drain/ allow for timer 
innaccuracy etc. On a p200 with tsc, 2.6.12 it's never gone over for me 
- though talking of timers I notice on my desktop 2.6.16 I gain 2 
minutes a day now.


Andy.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-14 Thread Jesper Dangaard Brouer
On Wed, 2006-06-14 at 10:27 -0400, Phillip Susi wrote:
> Jesper Dangaard Brouer wrote:
> > The Linux traffic's control engine inaccurately calculates
> > transmission times for packets sent over ADSL links.  For
> > some packet sizes the error rises to over 50%.  This occurs
> > because ADSL uses ATM as its link layer transport, and ATM
> > transmits packets in fixed sized 53 byte cells.
> > 
> 
> I could have sworn that DSL uses its own framing protocol that is 
> similar to the frame/superframe structure of HDSL ( T1 ) lines and over 
> that you can run ATM or ethernet.  Or is it typically ethernet -> ATM -> 
> HDSL?

Nope, not according to the ADSL standards G.992.1 and G.992.2.

> In any case, why does the kernel care about the exact time that the IP 
> packet has been received and reassembled on the headend?

I think you have misunderstood what the rate table does...
(There is an explaination in the thesis page 57 section 6.1.2)
http://www.adsl-optimizer.dk/thesis/

-- 
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network developer
  Cand. Scient Datalog / MSc.
  Author of http://adsl-optimizer.dk



signature.asc
Description: This is a digitally signed message part


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-14 Thread Phillip Susi

Jesper Dangaard Brouer wrote:

The Linux traffic's control engine inaccurately calculates
transmission times for packets sent over ADSL links.  For
some packet sizes the error rises to over 50%.  This occurs
because ADSL uses ATM as its link layer transport, and ATM
transmits packets in fixed sized 53 byte cells.



I could have sworn that DSL uses its own framing protocol that is 
similar to the frame/superframe structure of HDSL ( T1 ) lines and over 
that you can run ATM or ethernet.  Or is it typically ethernet -> ATM -> 
HDSL?


In any case, why does the kernel care about the exact time that the IP 
packet has been received and reassembled on the headend?



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-14 Thread Jesper Dangaard Brouer

On Wed, 2006-06-14 at 08:06 -0400, jamal wrote:

> Russell's site is inaccessible to me (I actually think this is related
> to some DNS issues i may be having) 

Strange, I have access to Russell's site.  Maybe its his redirect
feature that confuses your browser, try:
 http://ace-host.stuart.id.au/russell/files/tc/tc-atm/

> and your masters is too long to
> spend 2 minutes and glean it; so heres a question or two for you:

Yes, I is quite long and very detailed.  But it worth reading (... says
the author him self ;-))


> - Have you tried to do a long-lived session such as a large FTP and 
> seen how far off the deviation was? That would provide some interesting
> data point.

The deviation can be calculated.  The impact is of cause small for large
packets.  But the argument that bulk TCP transfers is not as badly
affected, is wrong because all the TCP ACK packets gets maximum penalty.

On an ADSL link with more than 8 bytes overhead, a 40 bytes TCP ACK will
use more that one ATM frame, causing 2 ATM frames to be send that
consumes 106 bytes, eg. 62% overhead.  On a small upstream ADSL line
that hurts! (See thesis page 53, table 5.3 "Overhead summary").


> - To be a devil's advocate (and not claim there is no issue), where do
> you draw the line with "overhead"? 
> Example the smallest ethernet packet is 64 bytes of which 14 bytes are
> ethernet headers ("overhead" for IP) - and this is not counting CRC etc.
> If you were to set an MTU of say 64 bytes and tried to do a http or ftp,
> how accurate do you think the calculation would be? I would think not
> very different.

I do think we handle this situation, but I'm not quite sure that I fully
understand the question (sorry).


> Does it matter if it is accurate on the majority of the cases?
> - For further reflection: Have you considered the case where the rate
> table has already been considered on some link speed in user space and
> then somewhere post-config the physical link speed changes? This would
> happen in the case where ethernet AN is involved and the partner makes
> some changes (use ethtool). 
>
> I would say the last bullet is a more interesting problem than a corner
> case of some link layer technology that has high overhead.

We only claim to do magic on ATM/ADSL links... nothing else ;-)


> Your work would be more interesting if it was generic for many link
> layers instead of just ATM.

Well, we did consider to do so, but we though that it would be harder to
get it into the kernel.

Actually thats the reason for the defines:
 #defineATM_CELL_SIZE   53
 #defineATM_CELL_PAYLOAD48

Changing these should should make it possible to adapt to any other SAR
(Segment And Reasembly) link layer.  


> On Wed, 2006-14-06 at 11:40 +0200, Jesper Dangaard Brouer wrote:
> > The Linux traffic's control engine inaccurately calculates
> > transmission times for packets sent over ADSL links.  For
> > some packet sizes the error rises to over 50%.  This occurs
> > because ADSL uses ATM as its link layer transport, and ATM
> > transmits packets in fixed sized 53 byte cells.
> > 
> > The following patches to iproute2 and the kernel add an
> > option to calculate traffic transmission times over all
> > ATM links, including ADSL, with perfect accuracy.
> > 
> > A longer presentation of the patch, its rational, what it
> > does and how to use it can be found here:
> >http://www.stuart.id.au/russell/files/tc/tc-atm/
> > 
> > A earlier version of the patch, and a _detailed_ empirical
> > investigation of its effects can be found here:
> >http://www.adsl-optimizer.dk/
> > 
> > The patches are both backwards and forwards compatible.
> > This means unpatched kernels will work with a patched
> > version of iproute2, and an unpatched iproute2 will work
> > on patches kernels.
> > 
> > 
> > This is a combined effort of Jesper Brouer and Russell Stuart,
> > to get these patches into the upstream kernel.
> > 
> > Let the discussion start about what we need to change to get this
> > upstream?
> > 
> > We see this as a feature enhancement, as thus hope that it can be
> > queued in davem's net-2.6.18.git tree.
> > 
> > ---
> > Regards,
> >  Jesper Brouer & Russell Stuart.
> > 
> 

Thanks for your comments :-)

-- 
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network developer
  Cand. Scient Datalog / MSc.
  Author of http://adsl-optimizer.dk



signature.asc
Description: This is a digitally signed message part


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-14 Thread jamal

I have taken linux-kernel off the list.

Russell's site is inaccessible to me (I actually think this is related
to some DNS issues i may be having) and your masters is too long to
spend 2 minutes and glean it; so heres a question or two for you:

- Have you tried to do a long-lived session such as a large FTP and 
seen how far off the deviation was? That would provide some interesting
data point.
- To be a devil's advocate (and not claim there is no issue), where do
you draw the line with "overhead"? 
Example the smallest ethernet packet is 64 bytes of which 14 bytes are
ethernet headers ("overhead" for IP) - and this is not counting CRC etc.
If you were to set an MTU of say 64 bytes and tried to do a http or ftp,
how accurate do you think the calculation would be? I would think not
very different.
Does it matter if it is accurate on the majority of the cases?
- For further reflection: Have you considered the case where the rate
table has already been considered on some link speed in user space and
then somewhere post-config the physical link speed changes? This would
happen in the case where ethernet AN is involved and the partner makes
some changes (use ethtool). 

I would say the last bullet is a more interesting problem than a corner
case of some link layer technology that has high overhead.
Your work would be more interesting if it was generic for many link
layers instead of just ATM.


cheers,
jamal

On Wed, 2006-14-06 at 11:40 +0200, Jesper Dangaard Brouer wrote:
> The Linux traffic's control engine inaccurately calculates
> transmission times for packets sent over ADSL links.  For
> some packet sizes the error rises to over 50%.  This occurs
> because ADSL uses ATM as its link layer transport, and ATM
> transmits packets in fixed sized 53 byte cells.
> 
> The following patches to iproute2 and the kernel add an
> option to calculate traffic transmission times over all
> ATM links, including ADSL, with perfect accuracy.
> 
> A longer presentation of the patch, its rational, what it
> does and how to use it can be found here:
>http://www.stuart.id.au/russell/files/tc/tc-atm/
> 
> A earlier version of the patch, and a _detailed_ empirical
> investigation of its effects can be found here:
>http://www.adsl-optimizer.dk/
> 
> The patches are both backwards and forwards compatible.
> This means unpatched kernels will work with a patched
> version of iproute2, and an unpatched iproute2 will work
> on patches kernels.
> 
> 
> This is a combined effort of Jesper Brouer and Russell Stuart,
> to get these patches into the upstream kernel.
> 
> Let the discussion start about what we need to change to get this
> upstream?
> 
> We see this as a feature enhancement, as thus hope that it can be
> queued in davem's net-2.6.18.git tree.
> 
> ---
> Regards,
>  Jesper Brouer & Russell Stuart.
> 

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-14 Thread Jesper Dangaard Brouer

The Linux traffic's control engine inaccurately calculates
transmission times for packets sent over ADSL links.  For
some packet sizes the error rises to over 50%.  This occurs
because ADSL uses ATM as its link layer transport, and ATM
transmits packets in fixed sized 53 byte cells.

The following patches to iproute2 and the kernel add an
option to calculate traffic transmission times over all
ATM links, including ADSL, with perfect accuracy.

A longer presentation of the patch, its rational, what it
does and how to use it can be found here:
   http://www.stuart.id.au/russell/files/tc/tc-atm/

A earlier version of the patch, and a _detailed_ empirical
investigation of its effects can be found here:
   http://www.adsl-optimizer.dk/

The patches are both backwards and forwards compatible.
This means unpatched kernels will work with a patched
version of iproute2, and an unpatched iproute2 will work
on patches kernels.


This is a combined effort of Jesper Brouer and Russell Stuart,
to get these patches into the upstream kernel.

Let the discussion start about what we need to change to get this
upstream?

We see this as a feature enhancement, as thus hope that it can be
queued in davem's net-2.6.18.git tree.

---
Regards,
 Jesper Brouer & Russell Stuart.



signature.asc
Description: This is a digitally signed message part