On Tue, 2006-07-04 at 15:29 +0200, Patrick McHardy wrote:
> Unfortunately I still didn't got to cleaning them up, so I'm sending
> them in their preliminary state. Its not much that is missing, but
> the netem usage of skb->cb needs to be integrated better, I failed
> to move it to the qdisc_skb_cb so far because of circular includes.

Cleanups aside, architecturally the bulk of your patch 
looks like a no-brainier to me.  The calculation of
packet length should be in one place.  Caching it in
skb->cb was a nice touch.

> But nothing unfixable. I'm mostly interested if the current size-tables
> can express what you need for ATM, I wasn't able to understand the
> big comment in tc_core.c in your patch.

Unfortunately you do things in the wrong order for ATM.
See: http://mailman.ds9a.nl/pipermail/lartc/2006q1/018314.html
for an overview of the problem, and then the attached email for
a detailed description of how the current patch addresses it.
It is a trivial fix.

As I said earlier, RTAB and STAB contain the same numbers,
just scaled differently.  The ATM patch stuffed around with
RTAB.  With your patch in place it will have to do the same 
exactly the same thing with STAB - because RTAB and STAB
carry the same data.  So to me the two patches seem
orthogonal.

One observation is the size optimisation you applied to STAB, 
making it variable length, could also be applied to RTAB.  
In fact it should be.  Then they would be identical, apart 
from the scaling.  Even the lookup operation (performed in
qdisc_init_len in your patch) would be identical.

However, now you lot have made me go away and think, I have
another idea on how to attack this.  Perhaps it will be
more palatable to you.  It would replace RTAB and STAB with
a 28 byte structure for most protocol stacks - well all I can
think of off the top of my head, anyway.  RTAB would have to
remain for backwards compatibility, of course.

--- Begin Message ---
Explanation of rate calculation code.

tc_core.c lines 53..61, tc_align_to_cells():

  Does the same thing as you function of the same name.
  Code is intended to be semantically equivalent.


tc_core.c lines 145..160, in tc_calc_ratespec:

  There are two material differences to your code.  Both
  are in this line:

    int sz = ((i+1)<<cell_log) - 1 + overhead;

  The first is that where I add the overhead, you don't.
  This is because I am calculating the rate table so it
  works as well as close to optimal as possible on an
  unpatched kernel.  Given the background in your draft
  email, I assume I don't have to prove you that it does
  in fact calculate the best possible table for an 
  unpatched kernel.

  The second change is I use:
    ((i+1)<<cell_log) - 1
  Whereas you had:
    ((i+1)<<cell_log)

  The number calculated by this expression:
    ((i+1) << cell_log)
  is the shortest packet length rtab[(i+1)] applies to.  
  Ergo, that number minus 1 must be the longest packet 
  length the previous rtab entry, rtab[i], applies to.
  So the intent of the change is that in unpatched kernels,
  rtab[i] will contain the time to send the longest 
  packet that uses it.  I suspect your code ends up doing 
  the same thing on kernels that add overhead to the 
  packet length.

There should be no other semantic differences between and
how you and I calculate the rate table.  If so this should
be sufficient to show that if your code calculated an ideal
rate table for kernels that added the overhead to the 
packet length before looking up the rate table, my code 
should calculate the best possible table for unpatched 
kernels.  It won't be ideal because some table entries will
apply to packets using different numbers of ATM cells.  
This is unavoidable.  But where that happens my code will 
always be conservative, ie use the longest packet length.

I think backward the compatibility increases the odds of
the patch being accepted.  The next step is to patch the
kernel it can get 100% accurate results by using this
table.

Consider this "diagram":

Overhead 0
----------
ATM Payload:          39 40 41 42 43 44 45 46 47 48 49 50
skb->len:             39 40 41 42 43 44 45 46 47 48 49 50
rtab[skb->len/8]:      1  1  1  1  1  1  1  1  1 x2  2  2

Each column is for a particular packet length.  The rows
show the packet length as seen by a layer of the protocol.  
The top row, "ATM Payload", contains packet length seen 
by the ATM layer - ie the ATM payload.  It includes all 
overheads bar cell overhead and padding.  It is always 
equal to: skb->len + overhead.

The second row, "skb->len" contains the packet seen by 
the kernel.  It is just the figure in the first line
less the "overhead" passed to tc.

The third line shows how "tc" will calculate the rate 
table entry for the packet length.  It contains the
number of ATM cells "tc" thinks the packet will occupy.
I am assuming a cell_log of 3 here, meaning each rate
table entry is used for 8 different packet lengths.  
In an unpatched kernel a single rate table entry may 
apply to packets which use different numbers of ATM 
cells.  Since the rate table entry is a single number 
it cant be right for all packet lengths when that 
happens.  If faced with this ambiguity tc uses the number
of cells occupied by the longest packet the rate table 
entry applies to. This means the rate table will be wrong 
for the shorter packets, and this is highlighted in the
diagram an "x".

Thus we see from the table that for a 39 byte packet tc
thinks it will take 1 cell to send.  But for a 48 byte
packet the table says it will take "x2" packets.  The "x"
means it will really take 1 packet, but because this rate
table entry is also used by 49..55 byte packets tc used
the bigger value for that rate table entry.

We can fix the errant "x2" entry by sliding the rate
table along so it sits under the smallest packet length
it could apply to (in this case 49).  To achieve that
the rate table must be slide one byte to right.  This 
can be achieved by adding an "alignment" figure of -1 
to the packet length before looking up the rate table.
Here is the same diagram as above, but with two 
additional lines added showing the effect of adding the
alignment:

Overhead 0
----------
ATM Payload:          39 40 41 42 43 44 45 46 47 48 49 50
skb->len:             39 40 41 42 43 44 45 46 47 48 49 50
rtab[skb->len/8]:      1  1  1  1  1  1  1  1  1 x2  2  2
skb->len-1:           38 39 40 41 42 43 44 45 46 47 48 49
rtab[(skb->len-1)/8]:  1  1  1  1  1  1  1  1  1  1  2  2

The rows "skb->len-1" and "rtab[(skb->len-1)/8]" are
copies of the rows above that have been moved slightly.
They have to be copies, as the rtab computed by tc hasn't
changed, so for example 47 must still yield 1 cell, and
48 must still yield 2 cells.

Below is the same diagram repeated for overheads 1 .. 9.
The are all pretty much the same as the one above.  The
only changes is the increasing overhead, which in turn 
changes the number of columns in error.  The alignment 
changes to compensate.  You will see it follows a 
straightforward pattern as the overhead increases.  

Overhead 1
----------
ATM Payload:          39 40 41 42 43 44 45 46 47 48 49 50
skb->len:             38 39 40 41 42 43 44 45 46 47 48 49
rtab[skb->len/8]:      1  1  1  1  1  1  1  1  1  1  2  2
skb->len-0:           38 39 40 41 42 43 44 45 46 47 48 49
rtab[(skb->len-0)/8]:  1  1  1  1  1  1  1  1  1  1  2  2

Overhead 2
----------
ATM Payload:          39 40 41 42 43 44 45 46 47 48 49 50
skb->len:             37 38 39 40 41 42 43 44 45 46 47 48
rtab[skb->len/8]:      1  1  1 x2 x2 x2 x2 x2 x2 x2  2  2
skb->len-7:           30 31 32 33 34 35 36 37 38 39 40 41
rtab[(skb->len-7)/8]:  1  1  1  1  1  1  1  1  1  1  2  2

Overhead 3
----------
ATM Payload:          39 40 41 42 43 44 45 46 47 48 49 50
skb->len:             36 37 38 39 40 41 42 43 44 45 46 47
rtab[skb->len/8]:      1  1  1  1 x2 x2 x2 x2 x2 x2  2  2
skb->len-6:           30 31 32 33 34 35 36 37 38 39 40 41
rtab[(skb->len-6)/8]:  1  1  1  1  1  1  1  1  1  1  2  2

Overhead 4
----------
ATM Payload:          39 40 41 42 43 44 45 46 47 48 49 50
skb->len:             35 36 37 38 39 40 41 42 43 44 45 46
rtab[skb->len/8]:      1  1  1  1  1 x2 x2 x2 x2 x2  2  2
skb->len-5:           30 31 32 33 34 35 36 37 38 39 40 41
rtab[(skb->len-5)/8]:  1  1  1  1  1  1  1  1  1  1  2  2

Overhead 5
----------
ATM Payload:          39 40 41 42 43 44 45 46 47 48 49 50
skb->len:             34 35 36 37 38 39 40 41 42 43 44 45
rtab[skb->len/8]:      1  1  1  1  1  1 x2 x2 x2 x2  2  2
skb->len-4:           30 31 32 33 34 35 36 37 38 39 40 41
rtab[(skb->len-4)/8]:  1  1  1  1  1  1  1  1  1  1  2  2

Overhead 6
----------
ATM Payload:          39 40 41 42 43 44 45 46 47 48 49 50
skb->len:             33 34 35 36 37 38 39 40 41 42 43 44
rtab[skb->len/8]:      1  1  1  1  1  1  1 x2 x2 x2  2  2
skb->len-3:           30 31 32 33 34 35 36 37 38 39 40 41
rtab[(skb->len-3)/8]:  1  1  1  1  1  1  1  1  1  1  2  2

Overhead 7
----------
ATM Payload:          39 40 41 42 43 44 45 46 47 48 49 50
skb->len:             32 33 34 35 36 37 38 39 40 41 42 43
rtab[skb->len/8]:      1  1  1  1  1  1  1  1 x2 x2  2  2
skb->len-2:           30 31 32 33 34 35 36 37 38 39 40 41
rtab[(skb->len-2)/8]:  1  1  1  1  1  1  1  1  1  1  2  2

Overhead 8
----------
ATM Payload:          39 40 41 42 43 44 45 46 47 48 49 50
skb->len:             31 32 33 34 35 36 37 38 39 40 41 42
rtab[skb->len/8]:      1  1  1  1  1  1  1  1  1 x2  2  2
skb->len-1:           30 31 32 33 34 35 36 37 38 39 40 41
rtab[(skb->len-1)/8]:  1  1  1  1  1  1  1  1  1  1  2  2

Overhead 9
----------
ATM Payload:          39 40 41 42 43 44 45 46 47 48 49 50
skb->len:             30 31 32 33 34 35 36 37 38 39 40 41
rtab[skb->len/8]:      1  1  1  1  1  1  1  1  1  1  2  2
skb->len-0:           30 31 32 33 34 35 36 37 38 39 40 41
rtab[(skb->len-0)/8]:  1  1  1  1  1  1  1  1  1  1  2  2

Notice that at overhead 8 has an alignment of -1, and this
is the same alignment as overhead 0 had.  This is because
the errors have moved through one rate table entry and are
now into the next one, so the "fix" is the same.

We can draw up a table showing the required alignment for
each overhead, like this:

  overhead | alignment
  ---------+----------
      0    |    -1
      1    |     0
      2    |    -7
      3    |    -6
      4    |    -5
      5    |    -4
      6    |    -3
      7    |    -2

We only need 8 elements, as once we get to 8 (or more
accurately 1<<cell_log) it repeats.  If the kernel
adds the alignment figure from the table to the packet
size before looking up the rate table, then it will 
always get accurate results from the optimal table for 
the unpatched kernel.

The alignment figure is calculated by "tc" and passed to
the kernel, much as you passed the "overhead" to the 
kernel in your code.  In my case it will always be small.

And finally, we come to ......

tc_core.c line 122, in tc_calc_cell_align.  The expression 
I use to calculate the alignment is:

    return (overhead + cell_size - 2) % cell_size - cell_size + 1

You could try and mathematically prove it does yield the
table above, but since there are only 8 values that
matter (the % in the expression ensures it repeats after
that), it much easier to just plug in the overheads 0..7
and verify it does come up with the correct alignment.

QED, for a cell_log of 3 anyway.

--- End Message ---

Reply via email to