Re: [LARTC] Re: HFSC

2004-03-21 Thread Nuutti Kotivuori
Patrick McHardy wrote:
> Nuutti Kotivuori wrote:
>> Patrick McHardy wrote:
>>
>> BTW the Blue scheduler patch for 2.6 seems to be working nicely -
>> but I haven't had the time to run the tests on it that I wished to,
>> so I haven't posted anything further about it.
>
> I have in the mean time read up on SFB, maybe I'll extend blue when
> I find the time.

That reminds me - I have an "extension" to Blue I'd like to try and
cook up, if I ever manage to get the time.

Ingress Blue. Basically just having a token bucket on ingress, just
like traffic policing has - but using Blue on that. Running out of
tokens means packet drop, so increase probability. Bucket overflowing
with tokens means link idle, so decrease probability.

I have a feeling something like that might work well when trying to
reduce packet queueing at the ISP on a slow inbound link - better than
the usual strict ingress police or using IMQ with RED and such.

> PSCHED_GETTIMEOFDAY (or PSCHED_CPU in case of the kernel) are
> important for HFSC to work properly, PSCHED_JIFFIES has too low
> resolution.

That might be what was messing up my other simulations as well -
thanks for the heads up, I will see what comes out of that.

[...]

> I know what the problem is. Try: ip link set lo txqueuelen 1

Works like a dream!

> or upgrade to 2.6.4. The problem got introduced when fixing
> an off-by-one in pfifo_fast, before it would enqueue one packet
> with a txqueuelen of 0. In 2.6.4 this behaviour is restored,
> although it's a misconfiguration anyway to use leaf-queues with
> a limit of 1 for anything but well-formed flows.

Figures... :-) So I just got bitten by the default txqueuelen of 0 on
lo, which I didn't even think to check. Time to upgrade to 2.6.4 as
well, it seems.

-- Naked
___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


Re: [LARTC] Re: HFSC

2004-03-21 Thread Nuutti Kotivuori
Patrick McHardy wrote:
> Late reply but here it is ;)

No worries - it wasn't exactly brief and I did have other stuff to
spend my time on.

BTW the Blue scheduler patch for 2.6 seems to be working nicely - but
I haven't had the time to run the tests on it that I wished to, so I
haven't posted anything further about it.

> Nuutti Kotivuori wrote:
>> Patrick McHardy wrote:

[...]

> I think it can be expressed easier like this:
>
> b(t) =
> {
> m1 * t                  t <= d
> m1 * d + m2 * (t - d)   t > d
> }
>
> b_rt(t) <= b_ls(t) <= b_ul(t) for all t >= 0

Yes, certainly - I just wished to eliminate t from it all.

> No error is signalled when these are violated.

Right.

> The latter is correct, the class will participate in link-sharing,
> but will only be selected by the real-time criterion under full
> load.  It will also be punished later wrt. excess bandwidth as long
> as the parent constantly stays active.

Ah, yes. Makes perfect sense.

> It will still respect the upper-limit curve, but I'm not sure about
> the consequences for sibling-classes and the parent's active child
> list, I need to think about this some more. In any case it's not
> advisable to do so.

Okay.

> The sum of all realtime service curves must not be bigger than the
> service curve of the link itself, otherwise the service can't be
> guaranteed.

Nod.

> For link-sharing curves, it's actually not important that they don't
> exceed their parent because they only define a share, not an
> absolute amount of service. Only the relative differences between
> siblings matter.

Makes sense.

> Adding n curves gives you (in the worst case) a (n+1)-ary curve, you
> can calculate it like this:
>
> sc1: m1 = 100kbit, d = 1s, m2 = 200kbit
> sc2: m1 = 50kbit, d = 0.25s, m2 = 300kbit
> sc3: m1 = 200kbit, d = 1.5s, m2 = 500kbit
> -
> m =
> {
>   350kbit     d <= 0.25s
>   600kbit     0.25s < d <= 1s
>   700kbit     1s < d <= 1.5s
>   1000kbit    d > 1.5s
> }

Right. I think there's a need for a small tool to make these
calculations - and perhaps even to automatically scale the other
curves to maintain the restrictions. But that is for the future,
manual calculation will do for now.
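
In the meantime, a throwaway awk sketch along these lines could do the
summing (entirely my own improvisation, not an existing tool): feed it
one "m1 d m2" triple per line, rates in kbit and d in seconds, and it
prints the total slope on each interval, like in your breakdown above:

  awk '
  { m1[NR] = $1; d[NR] = $2; m2[NR] = $3 }
  END {
      n = NR
      # collect the distinct breakpoints, then insertion-sort them
      k = 0
      for (i = 1; i <= n; i++) {
          seen = 0
          for (j = 1; j <= k; j++) if (bp[j] == d[i]) seen = 1
          if (!seen) bp[++k] = d[i]
      }
      for (i = 2; i <= k; i++)
          for (j = i; j > 1 && bp[j-1] > bp[j]; j--) {
              t = bp[j]; bp[j] = bp[j-1]; bp[j-1] = t
          }
      # on each interval a curve contributes m1 before its own d, m2 after it
      prev = 0
      for (i = 1; i <= k; i++) {
          total = 0
          for (j = 1; j <= n; j++)
              total += (bp[i] <= d[j]) ? m1[j] : m2[j]
          if (i == 1) printf "%gkbit\td <= %gs\n", total, bp[i]
          else        printf "%gkbit\t%gs < d <= %gs\n", total, prev, bp[i]
          prev = bp[i]
      }
      total = 0
      for (j = 1; j <= n; j++) total += m2[j]
      printf "%gkbit\td > %gs\n", total, prev
  }'

Feeding it the three curves from your example reproduces the
350/600/700/1000kbit steps.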

> If it is possible to fulfill all demands with the available
> excess-bandwidth then there is no difference. The real difference is
> of a different kind. A parent's link-sharing service curve might be
> violated by the real-time criterion used for one of its
> children. The parent's siblings will suffer from this as well
> (link-sharing wise) because they share the same parent and part of
> the service given to all siblings of this parent has been used in
> violation of link-sharing, so link-sharing-only leaves will
> suffer. An example for this is in the HFSC paper on page 6.

Right, I think I understand this now.

>>> - The sum of all real-time curves must not exceed 100%.
[...]
> Yes, actually link capacity. Already explained above.
>
>> And what happens if they do?
[...]
> Nothing bad will happen, only the guarantees can't be met anymore.
> It will still pick the class with the smallest deadline.

Okay, this is good to know.

>> If only relative difference matters, why must link-sharing service
>> curves be larger than real-time service curves? And smaller than
>> upper-limit service curves?
>
> They don't. It just makes it simpler to assure that the service
> given to a class is at least the amount defined by the real-time
> curve, which is usually what you want.

Exactly.

>> I also seem to have a lot of difficulties in trying to simulate the
>> behaviour of the qdisc.
[...]
> Have you applied the patches from trash.net/~kaber/hfsc/tcsim ?
> With HZ=1000 and PSCHED_GETTIMEOFDAY as clocksource I got very
> good results.

I tried HZ=1000 and HZ=100 both, and the results were odd. But I think
I didn't touch the clocksource at all. I will try later on with
PSCHED_GETTIMEOFDAY as well.

>> Also, something as trivial as this:
>> tc qdisc add dev $DEV root handle 1: hfsc default 1
>> tc class add dev $DEV parent 1: classid 1:1 hfsc rt m2 100kbps
>> seems to work for 'eth0' but not the 'lo' interface, whereas for example
>> the 'tbf' qdisc does work for 'lo' as well. If I run those commands on
>> 'lo', every packet shows up as dropped by the qdisc.
>
> Works ok here .. do you mean inside tcsim ?

No, I don't mean inside tcsim. Here is a full transcript:

*
shiro:~# export DEV=lo
shiro:~# tc -s -d qdisc show dev $DEV
shiro:~# ping localhost -

[LARTC] Re: Any danger in thrashing 'tc'?

2004-03-12 Thread Nuutti Kotivuori
David McNab wrote:
> Is there any danger in a prog which repeatedly clears the ingress
> and root egress qdiscs, and sets up new ones, even as frequently as
> every 5-15 seconds?

[...]

> As you can see, the prog will be frequently spitting heaps of tc
> commands, constantly taking down the ingress and root egress qdiscs,
> and creating new ones.
>
> So, am I likely to hit on any unintended consequences (apart from
> the minor cpu spikes)?

When you remove the qdiscs, you will cause all queued packets to be
dropped. When you add qdiscs, the burst values for all of them are
obviously reset, so each of them has a full burst to spend (if you use
bursts anywhere, that is).

This does not happen when you change parameters that can be changed by
'tc qdisc change', but not all values can be changed. And obviously
this does not happen if you just change iptables marking rules to mark
packets differently.
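
For example, a rate change on an existing tbf root can be done in
place (made-up rates; whether 'change' is accepted depends on the
qdisc and the parameter):

  # initial setup
  tc qdisc add dev eth0 root tbf rate 256kbit burst 10k latency 50ms
  # later: adjust the rate without tearing the qdisc down
  tc qdisc change dev eth0 root tbf rate 192kbit burst 10k latency 50ms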

As for your actual problem - I would suggest looking for some other
solution than mangling rules/qdiscs every few seconds.

-- Naked


___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


Re: [LARTC] Re: Blue and SFB

2004-03-06 Thread Nuutti Kotivuori
Patrick McHardy wrote:
> I've completed the port and tested it yesterday, unfortunately it's
> not useable in the real world as is. There is a strong bias against
> non-ECN flows because their packets are simply dropped instead of
> marked. At high load (50 ECN vs. 1 non-ECN flow) and a marking
> probability of about 10% the non-ECN flow simply stalls.
> I can send you the patch if you're interested ..

Thank you, I am really interested.

I will try how it behaves for me in various circumstances.

What you say, though, is probably true - and the situation is even
more accentuated when considering that different TCP stacks react to
ECN and packet drops differently - a single drop percent will not be
enough.

Which of course brings us to SFB - with Stochastic Fair Blue, the drop
percentage for the non-ECN flow should be significantly lower and the
connections should transfer more or less fairly.

-- Naked

___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


[LARTC] Re: Blue and SFB

2004-03-04 Thread Nuutti Kotivuori
Patrick McHardy wrote:
> There is a blue implementation for Linux at
> http://home.sch.bme.hu/~bartoki/projects/thesis/2001-May-26/

Nice!

I briefly scanned the implementation and it didn't look too bad. Some
oddities here and there (such as changing HZ from 100 to 1024??).

But, as the implementation is rather old, it would require a complete
overhaul for 2.6, I think. It is a shame it wasn't worked into the
Linux kernel while it was still current, as I think the algorithm
could have a lot of uses.

What is the process of getting new traffic schedulers into the kernel? I
guess most of the netfilter stuff goes through netfilter development
and patch-o-matic - but there isn't anything similar for QoS, is
there?

> BTW: Regarding your remaining HFSC questions (I just discovered
> them on lartc), I'm going to answer them tomorrow.

No rush. I did realize that I posted the reply only to the list
through gmane, but didn't consider it important enough to resend
directly.

-- Naked

___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


[LARTC] Blue and SFB (Was: how to add my own traffic scheduler to TC)

2004-03-03 Thread Nuutti Kotivuori
Kennedy Cheng wrote:
> What are the steps needed to add my own traffic scheduler to TC?

From your small example there, I would guess you are working on a Blue
scheduler for TC.

Hence I'd like to ask publicly of everyone around - has anyone seen
any efforts to implement Blue[1]? What about SFB, i.e. Stochastic Fair
Blue?

The Blue algorithm seems really straightforward to implement, so I
really wonder why it hasn't been done already. SFB is of course more
complex in several ways, but perhaps a lot of code could be reused
from SFQ, since the basic idea of hashing flows is the same.

Thanks,
-- Naked

Footnotes: 
[1]  http://www.thefengs.com/wuchang/work/blue/


___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


[LARTC] Re: how do you rate limit routable traffic without rate limiting LAN protocols like arps and igmp?

2004-03-02 Thread Nuutti Kotivuori
Michael A. D'Annunzio wrote:
> I know if I remove the default parameter, traffic not matching
> any filter is sent over the root queue, but I need to have a defined
> default.

Define the default as whatever class you wish non-IP traffic to fall
into, and then filter _all_ IP traffic into a certain class, e.g.:

  tc filter add dev $DEV parent 1: protocol ip prio 18 u32 \
 match ip dst 0.0.0.0/0 flowid 1:20

Or, even more modularly, use the MARK target at the end of your
iptables ruleset after all other markings:

  iptables -t mangle -A POSTROUTING -m mark --mark 0 -j MARK --set-mark 3
  iptables -t mangle -A OUTPUT -m mark --mark 0 -j MARK --set-mark 3

And then just set up your tc to match the marks to classes:

  tc filter add dev $DEV parent 1: protocol ip prio 10 \
  handle 3 fw \
  flowid 1:30

Or a variety of other solutions.

I am not sure if this will solve your problem exactly, though - since
having problems with ARP traffic in the first place seems really odd.

-- Naked


___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


[LARTC] Re: Strange tc issue

2004-03-01 Thread Nuutti Kotivuori
Roy Walker wrote:
> I guess the issue centers around every linux box I have shows the
> pfifo_qdisc when I do an 'ip link show'.  But when I do a tc it does
> not do this.  This definitely tells me that tc is not reading the
> qdiscs properly which also probably means it is not setting them
> right either.  Problem is that it compiles properly and does not
> give me an errors during build.
>
> This is really starting to drive me crazy.  Appreciate your help.

The things you say seem partly contradictory, and it is really not
clear what the problem is - however, I will give an example of what
things look like on *my* system so you can spot the discrepancy with
yours.

,
| ... no configuration ...
| shiro:~# ip link show dev eth0
| 7: eth0:  mtu 1500 qdisc pfifo_fast qlen 100
| link/ether 00:30:1b:ae:6a:66 brd ff:ff:ff:ff:ff:ff
| shiro:~# tc qdisc show dev eth0
| qdisc pfifo_fast 0: [Unknown qdisc, optlen=20] 
| shiro:~# tc class show dev eth0
| shiro:~# tc filter show dev eth0
|
| ... add a qdisc ...
| shiro:~# tc qdisc add dev eth0 root pfifo limit 100
|
| ... show configuration again ...
| shiro:~# ip link show eth0
| 7: eth0:  mtu 1500 qdisc pfifo qlen 100
| link/ether 00:30:1b:ae:6a:66 brd ff:ff:ff:ff:ff:ff
| shiro:~# tc qdisc show dev eth0
| qdisc pfifo 8002: limit 100p
| shiro:~# tc class show dev eth0
| shiro:~# tc filter show dev eth0
`

And no loss of connectivity to anywhere, or anything of the like. If
this does not work for you, then there is something seriously wrong
either with your kernel or with tc. If this does work for you, I
suggest specifying exactly which commands you are running that cause
your problems - there might be something wrong there.

-- Naked

___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


[LARTC] Re: HFSC

2004-03-01 Thread Nuutti Kotivuori
Patrick McHardy wrote:
> The combinations you list are correct. Real-time curves are only
> valid for leaf-classes, whereas link-sharing and upper-limit curves
> are valid for all classes in the hierarchy.

Right, after a bit of experimentation and thinking, I realized this.

> When multiple curves are used, the following must hold:
> rt <= ls <= ul

If this is for all t, in practice this means:

  if d > 0
     m1(rt) <= m1(ls) <= m1(ul)
     m2(rt) <= m2(ls) <= m2(ul)
     if m1 < m2
        d(rt) >= d(ls) >= d(ul)
     elsif m1 > m2
        d(rt) <= d(ls) <= d(ul)
     else
        d irrelevant
  else
     m1 irrelevant
     m2(rt) <= m2(ls) <= m2(ul)

Am I correct? What happens if these values are violated? Are any
errors signalled?

Also, I have very little clue why these must hold as such.

Obviously if a link-sharing curve is smaller than the real-time curve,
then when the class participates in link-sharing, it would have to
have sent less than it already has. But if this is so, does the
algorithm break totally, or does that only mean that the class does
not participate in link-sharing before the excess bandwidth share for
the class based on the relative link-sharing service curve goes above
the real-time service curve? The latter would not necessarily be an
unwanted behaviour.

Then if the upper-limit service curve is smaller than the link-sharing
curve, what would this cause? Naive assumptions would lead me to think
that it would merely mean that the class participates in link-sharing
based on the relative service curve it has, but never ends up taking
more than what the upper-limit service curve dictates. Eg. in a case
with a relatively large link-sharing service curve and a smaller
upper-limit service curve the class would get a big share out of a
small amount of excess bandwidth shared, but as bandwidth to share is
increased, upper-limit service curve will limit it to a constant
limit. Or again, does the algorithm break somehow?

And I am even more confused when I think what this means for the
interior classes and their service curves. Apparently the service
curves for parent classes are respected, but does that mean that the
service curve for a parent class would have to be equal to or larger
than the sum of all the child classes' service curves? If so, how does one
calculate this? It would be an n-ary curve if calculated exactly. Or
if not so, then what happens if the service curves of child classes
exceed the service curve of the parent class? Obviously here we are
talking only about link-sharing service curves (and upper-limit
service curves) as real-time service curves are always fulfilled and
only handled by leaf classes.

> To understand why there are two (forgetting about upper-limit curves
> for now) different curves you need to know that scheduling in HFSC
> is based on two criteria: the real-time criterion which ensures that
> the guarantees of leaf-classes are met and the link-sharing
> criterion which tries to satisfy the service curves of intermediate
> classes and distributes excess bandwidth fairly. The reason why
> there are two different criteria is that in the Fair Service
> Link-sharing model that is approximated by HFSC it is not always
> possible to guarantee the service of all classes simultaneously at
> all times (with non-linear service curves). HFSC chooses to
> guarantee the service curves of (real-time) leaf-classes (because
> only leaves carry packets), and uses the link-sharing criterion to
> minimize the discrepancy between the actual service received and the
> service defined by the Fair Service Link-sharing Model.

Right. I think I understand this, but I am not so certain of the
implications for actual use.

> The upper-limit curve is used to limit the link-sharing
> curve. Without an upper-limit curve, packets are dequeued at the
> speed the underlying device is capable of. For example in the case
> of software devices, this is not very desirable, so you can limit
> the total output rate.

I came to this conclusion by experimentation. So the upper-limit
service curve can be used to shape link-sharing usage - but real-time
service curves are fulfilled regardless of it? So the end result would
be that a class with only a real-time service curve throttles itself
to that rate, adding a link-sharing service curve makes it
work-conserving, and adding an upper-limit service curve throttles it
again.
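
To check my own understanding, a minimal sketch of those three cases
on a 1mbit link might look like this ($DEV being the outgoing
interface, rates made up, and completely untested on my part):

  tc qdisc add dev $DEV root handle 1: hfsc default 20
  tc class add dev $DEV parent 1:  classid 1:1  hfsc ls m2 1mbit ul m2 1mbit
  # guaranteed 100kbit, shares in the excess, but never above 500kbit:
  tc class add dev $DEV parent 1:1 classid 1:10 hfsc rt m2 100kbit ls m2 100kbit ul m2 500kbit
  # pure link-sharing class for the rest, no guarantees:
  tc class add dev $DEV parent 1:1 classid 1:20 hfsc ls m2 900kbit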

> For your other questions:
> - If you specify only a real-time service curve the class will not
> participate in link-sharing. This means it can only send at it's
> configured rate. The difference to a link-share+upper-limit curve
> is that the service is guaranteed.
>
> - If you specify only a link-share curve there are no deadlines and
> no guarantees can be given.

Right. After a while I realized this. But again I am somewhat
uncertain of the ramifications.

If we assume classes that have real-time service curves equal to
link-sharing service curves, and compare those to classes 

[LARTC] Re: tcng version 9l

2004-02-29 Thread Nuutti Kotivuori
Werner Almesberger wrote:
> Since I cleaned up so many things for Gentoo yesterday, here's one
> for Debian 3.0. The main problems were:

I still have one more problem with the latest Debian unstable. This is the
failure message I am getting:

cc -g -Wall -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations 
-I../shared -Iklib -Iklib/include -Iulib/iproute2/include -I. -DVERSION=\"`cat 
../VERSION`\" -DTOPDIR=\"/home/naked/src/tcng/tcng\"  
-DTCC_CMD=\"/home/naked/src/tcng/tcng/bin/tcc\" -DKFULLVERSION=\"2.4.25\" 
-DKFULLVERSIONNUM=`printf "0x%02x%02x%02x" 2 4 25` -DIVERSION=\"010824\"   -c -o 
tcsim.o tcsim.c
In file included from /usr/include/bits/sigcontext.h:28,
 from /usr/include/signal.h:326,
 from tcsim.c:15:
/usr/include/asm/sigcontext.h:79: error: parse error before '*' token
/usr/include/asm/sigcontext.h:82: error: parse error before '}' token

Which is caused by having klib/include in the include path, so that it
overrides the system linux/compiler.h. If I add:

#define __user
#define __kernel

to tcsim/klib/include/linux/compiler.h, everything works perfectly and
tests pass (Passed all 1534 tests (24 conditional tests skipped)).

-- Naked

___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


[LARTC] Re: HFSC

2004-02-28 Thread Nuutti Kotivuori
Patrick McHardy wrote:
> This is currently all there is. If you have some specific questions,
> just ask (but please CC lartc). If anyone wants to write some
> documentation I'd be happy to help, but I don't have time for it
> myself.

I am not sure if the original poster has specific questions, but I
sure do.

I just recently got into this HFSC mess myself, so I'm a bit fuzzy on
all the terms and the differences in the implementation. I read the
paper (SIGCOMM '97) on HFSC and I think I understood most of it. But
there are some things in the implementation that I couldn't really
figure out.

I'll quote the usage here for reference:

,
| Usage: ... hfsc [ rt SC ] [ ls SC ] [ ul SC ]
|  
| SC := [ [ m1 BPS ] d SEC ] m2 BPS
|  
|  m1 : slope of first segment
|  d  : x-coordinate of intersection
|  m2 : slope of second segment
`

Okay, the SC parameters I think I understand rather well - they are
there to define the service curve itself. But the way hfsc takes three
optional parameters of service curves puzzles me.

I believe 'rt' refers to 'Real-Time Service Curve', 'ls' to 'Link
Sharing Service Curve' and 'ul' to 'Upper Limit Service Curve'. If I
understand correctly, the SIGCOMM '97 paper mentioned that the link
sharing selection need not be the same as the real time selection, but
in examples assumed for simplicity that they were. Also, from the
source I conclude that 'Upper Limit Service Curve' cannot be specified
without 'Link Sharing Service Curve'.

So, this, all in all, baffles me :-)

The possible combinations I can make here are:

  Real-Time
  Link-Sharing
  Real-Time, Link-Sharing
  Link-Sharing, Upper-Limit
  Real-Time, Link-Sharing, Upper-Limit

How do these behave? If I specify *only* the real-time curve, what is
used for the link-sharing part? Or does that mean that there is no
sharing? Or if I only specify the link-sharing curve, does that mean
that no specific deadlines exist for packets, just that they are sent
based on the link-sharing model? And what actually is the upper-limit
service curve? I take it that it is some kind of a packet drop curve,
but I don't know how it would behave - nor why it would require the
link-sharing curve.

So, any pointers on these would be helpful - or a specific
explanation, if you manage to find the time.

I will probably cook up at least an example script using HFSC for
normal QoS if I manage to understand how it works, perhaps even some
documentation.

-- Naked
___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


[LARTC] Re: ESFQ Modification

2004-02-26 Thread Nuutti Kotivuori
Robert Kurjata wrote:
> Some time ago I faced a problem in limiting traffic on a host with
> multiple uplinks. Since all the stuff worked nicely it seemed that
> there would be no problems. But then I realized that P2P users are
> smart enough to bypass limits, as sfq doesn't give fair sharing in
> this case (thousands of connections from one user versus a few from
> the other).  I tried IMQ but its instability in my configuration was
> painful.  So I made something like this:
>
> 1. i use IPMARK patch for the iptables to mark all the connections
> in P2P related class depending on source IP (i use SNAT),
> 2. modified ESFQ to create the hash depending on FWMARK instead of
> src ip,
> 3. and it worked. So I have uplink policy based on source ip in a
> snat-ed environment without using IMQ.
>
> I'm looking for the opinions, cause I may be wrong in this.
> Patch for the files below, cause it's short

Quite an unorthodox solution, I must say. But I guess it's as valid as
anything. SFQ and ESFQ are usually for situations where you have a
large number of connections (hashes) that you just cannot track
individually. Hence stochastic - and hence the options for
perturbation and so on. If there are only a few hashes, as is most
likely in your NFMARK case, most of the time they will hit separate
hash buckets - but after some perturbation, they might hit the same
hash bucket again and fairness is not achieved.

The patch itself, from my brief peek, looked rather okay, though.

Have you looked into the WRR scheduler? It is meant to give an equal
share of bandwidth to all 'local' machines with weighted round robin
scheduling and sounds like exactly what you are looking for.

-- Naked

___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


[LARTC] Re: SMP and tc

2004-02-25 Thread Nuutti Kotivuori
Andreas Hess wrote:
> I wonder if anyone has run tc on an e.g. dual processor system?  As
> far as I know under linux-2.6 it is possible that two processors
> receive and process packets of one NIC. Is this right?  And if yes,
> does it work fine?

Yes, it is working fine. There are several locks in packet processing
code that span all processors though, so it's not entirely separate,
only mostly so.

-- Naked


___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


[LARTC] Anyone using PCI/USB xDSL under Linux?

2004-02-24 Thread Nuutti Kotivuori
I am looking to find people who are using either a PCI xDSL card or a
USB xDSL modem right now under Linux. I am not entirely sure where I
should go to look for people who do this, but they definitely exist
now that there are drivers for several modems.

Specifically I am interested in performance and latency.

When using an ethernet-connected xDSL modem, you obviously get the
ethernet latency for transmitting the packet and you need to limit
your send speed to something below the actual link speed to own the
queue. The ethernet latency is largely insignificant, but owning and
having exact control over the send queue is intriguing.

So, if anyone has any input on the matter or knows where I might look
further, I would be very interested to know.

TIA,
-- Naked


___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


[LARTC] Re: Latency low enough for gaming

2004-02-18 Thread Nuutti Kotivuori
Patrick Petersen wrote:
> I have learned a lot from the lartc list archive, but this specific one
> leaves me with no clue. I have been able to get real close to normal
> latency by capping incoming traffic at around 1200kbits, but its no
> fun throwing away almost half your bandwidth.
>
> Can i get any recommendations?

Let's get the problem statement clear first.

  First of all, it is obvious that the high latency is a result of
  queueing at the ISP, before the packets are sent over the slow link
  to your router. ISPs have very long queues normally.

  Secondly, one needs to understand that there isn't really a damn
  thing you can do about it. If someone ping-floods you, it will
  saturate your downlink and latency will go through the roof. This
  cannot be prevented except by having access to your ISP.

  And thirdly, the only thing you can do is to either discard or
  delay perfectly good packets which have already travelled over your
  slow link and spent bandwidth on it. If you drop a packet, it will
  most likely have to be resent and again use up the same
  bandwidth. And the only good this does is to try and make the
  connections throttle themselves when they notice that packets aren't
  getting through. TCP does this, and a few application level UDP
  protocols do this, but not much else.

So, to your *goal* in a single sentence:

  Force TCP to send packets slower than your downlink speed.

If you can manage this, then no packets are queued at your ISP and you
can prioritise traffic perfectly on your router.

So, how does TCP work, then?

  On a connection, TCP has a window size in both directions, which is
  the amount of new packets that can be transferred without getting an
  acknowledgement for the packets already sent. Every packet sent is
  put on a re-send queue, and removed from there when an
  acknowledgement is received for that packet. If an acknowledgement
  doesn't arrive for a while, the packet is re-sent.

  So what happens when a packet is dropped, is that the connection
  stalls for a moment, because a packet is unacknowledged and send
  window limits the amount of packets that can be in transit. TCP
  stacks also throttle themselves when they notice that packets are
  being dropped.

  Traditionally, the maximum window size was 64kB - that is, a maximum
  of 64kB of data can be unacknowledged on the link. Then the
  internet became full of links which have a large bandwidth, but also
  lots of latency. TCP window scaling was invented, and now window
  sizes can be much larger than that.

  Also, traditionally TCP only acknowledged up to the last contiguous
  packet - that is, it wouldn't send acknowledgements for the packets
  that arrived after the missing packet. A loss of a single packet
  usually caused a short stall in the connection. This was augmented
  by cool retransmission logic, which allowed TCP to recover from the
  dropping of a single packet without a stall. And yet later selective
  acknowledgements were invented, which allows TCP to tell the other
  end exactly which packets it is missing, and now TCP survives quite
  high packet loss reasonably well.

So, what's the solution? How to make TCP throttle properly?

  The *real* solution would be to implement a packet mangler which
  would rewrite outgoing TCP ACK packets so that they only advertise
  as much receive window as the configured link speed allows.
  However, to my knowledge, no free software implements this. I
  might work up a patch later, if I can come up with a good design.

But, short of implementing the *real* solution, there are several
things you can do to improve the situation. But first, let's see what
is happening now.

  Right now, your scripts shove all incoming traffic into an HTB,
  inside which the selection of packets happens through ESFQ. The HTB
  has to be limited to a rate *smaller* than the actual downlink for
  it to have any effect whatsoever. And even so, what you do is that
  you queue (i.e. delay) packets (a maximum of 128 packets as per
  ESFQ), and then fairly drop the traffic that comes in faster.
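
  (For reference, the bare bones of that kind of setup - not your
  actual script, the interface name and rate are made up, and plain
  SFQ stands in for ESFQ - would be something like:

    DOWNLINK=1600   # kbit, deliberately below the real downlink speed
    tc qdisc add dev eth1 root handle 1: htb default 10
    tc class add dev eth1 parent 1: classid 1:10 htb rate ${DOWNLINK}kbit
    tc qdisc add dev eth1 parent 1:10 handle 10: sfq perturb 10

  attached to the LAN-facing interface so it only sees the forwarded
  downstream traffic.)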

  So what does TCP do about it? Latency is higher because of queueing
  at your router, or queuing at the ISP, so the large window sizes
  allow for a lot of packets to be in transit, waiting to be
  transferred. A bunch of packets are dropped, so those are
  retransmitted as soon as possible (at the arrival of the next
  selective acknowledgement), again filling up the queue. TCP will
  always try to transfer a bit faster than the speed it can get
  packets through, to take immediate advantage of improved conditions.

  With a single TCP stream, the queue size at your router or ISP is
  negligible, so it doesn't hurt latency much. But when there are a
  large amount of connections transferring as fast as they can,
  there's a lot of overshooting and what you described happens - the
  only way to prevent queuing at the ISP is to limit the bandwidth to half
  of the actual link speed.

What 

[LARTC] Re: ACK Packet Detection

2004-02-08 Thread Nuutti Kotivuori
Alan Ford wrote:
> I'm trying to understand how the wondershaper ACK match works. Can
> somebody help me decode it?
>
> |tc filter add dev $DEV parent 1: protocol ip prio 10 u32 \
> |   match ip protocol 6 0xff \
>
> TCP.

Aye.

> Do these start from the start of the IP header, or the TCP header?

IP header - there's no state information relayed between matches - so
these matches cannot know that the protocol is TCP.

> |   match u8 0x05 0x0f at 0 \
>
> If this is start of TCP header - source port is over 1280?

First byte of the IP packet: the first nibble is the version, the
second nibble is the header length in 32-bit words. 0x45 is what it
normally is - i.e. a 20-byte IP header, no options. So this just makes
sure there are no IP options on the packet.

> |   match u16 0x0000 0xffc0 at 2 \
>
> Something about the destination port, I'm a bit confused by the
> netmask.  Surely not "under 64", which is how I'm reading it?
>
> Or, if this is from the start of the IP header, is this packet
> length?  Under 64 bytes? Might make more sense...

Total length below 64 bytes. TCP has no length field - and the only
thing which separates a bare ACK packet carrying no data from an ACK
packet which has data as well is indeed the packet length.

> |   match u8 0x10 0xff at 33 \ 
>
> ???
>
> Acknowledgement number starts with 0x10 ?

ACK bit is on in TCP flags - and everything else is off.

> |   flowid 1:10

That should do it. However, I prefer to do the same thing in netfilter
and then just use that information on the traffic control side.

Example from a 'ferm' script:

  proto tcp tcp-flags ALL ACK length 0:63 MARK setmark 1;

This one is almost identical to the one shown above and much easier to
understand.
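
In plain iptables the same mark rule would look roughly like this (the
exact chain - POSTROUTING, OUTPUT or FORWARD - depends on where your
traffic flows):

  iptables -t mangle -A POSTROUTING -p tcp --tcp-flags ALL ACK \
      -m length --length 0:63 -j MARK --set-mark 1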

-- Naked


___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


[LARTC] Re: wondershaper + htb limiting ftp sends

2004-02-08 Thread Nuutti Kotivuori
mark ryan wrote:
> Or, ideally, I would like to limit proftpd itself... however there
> doesn't seem to be a way to do that with linux.  Windows can but I
> guess Linux can't.

ProFTPD does have plenty of ways to limit bandwidth specifically for
certain commands only, for whichever transfer directions you wish, and
all that.

Just peruse the documentation. The operating system has nothing to do
with it.

-- Naked


___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


Re: [LARTC] Re: per-session QoS

2004-02-05 Thread Nuutti Kotivuori
Ben wrote:
> On Thu, 2004-02-05 at 18:03, Nuutti Kotivuori wrote:
>
>> What you would wish to do is have a simple per connection token
>> bucket, and just DROP every packet exceeding the rate in the
>> connection, am I right?
>
> I don't want to lose data, so dropping packets definitely seems
> like the wrong thing to do. Unless that's how ingress filters work? 
> I haven't used them before.

Dropping packets will not mean losing data - it just means that the
TCP connections have to resend the packets and in general means that
the connection will throttle itself to the configured rate.

But ingress filtering as it is now works exactly like that. The packet
that you are receiving has already reached your machine and you either
drop it or accept it. If you wish to do something further, you can
look into IMQ.
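
For illustration, the "drop it or accept it" style of ingress policing
looks roughly like this (port and rate are placeholders; note that
this polices the whole port in aggregate, not each TCP session
separately, which is exactly the limitation you ran into):

  tc qdisc add dev eth0 handle ffff: ingress
  tc filter add dev eth0 parent ffff: protocol ip prio 1 u32 \
     match ip protocol 6 0xff \
     match ip dport 5001 0xffff \
     police rate 200kbit burst 20k drop flowid :1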

> Fortunately I have access to the code of my server application,
> because it sounds like the easiest thing is going to be to just put
> per-session rate limiting into that.

Right, well, it probably is the easiest solution - just note that you
will be working from behind your own receive buffers and TCP windows,
which means that the connection might initially accept (burst) more
data than you expect before the buffers fill.

-- Naked

___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


[LARTC] Re: per-session QoS

2004-02-05 Thread Nuutti Kotivuori
Ben wrote:
> Hey guys, I'm looking for a way to limit ingress throughput for each
> tcp session to a destination port on my server. I've found lots of
> ways to limit total throughput to a given port on an ip-level, but
> that's not quite the same thing.
>
> I'm somewhat surprised this doesn't seem to be implemented
> already. Maybe it is and I'm not seeing it?

I have a need for a very similar thing. But in my case, I wish to
move TCP sessions to a different transfer class if they transfer
faster than a certain speed.

Doing this on the actual traffic control side of things seems tricky,
since none of the qdiscs have any notion of connections or TCP
sessions. Doing it by way of the 'connbytes' match, e.g. by storing
the data in the connection tracking table, seems rather easily
doable.
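
As a rough sketch of that direction (this uses the option syntax of
later iptables versions - the patch-o-matic match may differ - and it
matches total bytes transferred rather than the current rate, so it is
only half of what I actually want):

  iptables -t mangle -A POSTROUTING -p tcp -m connbytes \
     --connbytes 2000000: --connbytes-dir both --connbytes-mode bytes \
     -j MARK --set-mark 4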

What you would wish to do is have a simple per connection token
bucket, and just DROP every packet exceeding the rate in the
connection, am I right?

What I would wish for is a bit more complex. I'd like to have a
per-connection token bucket, but have it such that when it runs out of
tokens, the rule stops matching, but every packet will still take
whatever tokens there are in the bucket. And the rule would start
matching again only after a certain amount of tokens has again been
amassed in the bucket. This is to prevent too rapid churn between
different transfer classes per connection.

And I haven't found anything which would do this for me anywhere.

So, I might code it myself if no other solution comes up.

-- Naked

___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/