[LARTC] Re: RFC - bandwidth optimization idea

2005-07-11 Thread Don Cohen
  From: [EMAIL PROTECTED] (Paul Hampson)
  Wait, you're trying to send more data than the link can take? Then
No, of course I don't expect to send more than the limit.

  send UDP, throttle it at the local end with a drop-oldest qdisc. Then
  you get the effect of 'most recent data is best'. Anything more
Yes, that gives me "most recent is best", but that does not do what I
want except in a few weird cases.  If every packet is independent,
perhaps it would suffice to always send the newest, e.g., if I were
trying to tell the other side what's the latest clock time.  (In that
case I'd also limit the queue length to one.)

  You gotta prioritise your data, using TOS or diffserv or something.
  Set your voice to real-time, so it always gets sent, and then your
  other applications can use unused packet-times. Use a dropping qdisc
This may be the best I can do in the current world where the
facility I described does not exist.  It does not solve the problem
I described.  TOS/diffserv etc. is more for use by the intervening
infrastructure and this problem applies even in the case where there
is no congestion or delay at all in that infrastructure, but only in
the link from the sending machine.  Using real time is just a matter
of giving one application priority over others.  First, the link
itself may have varying bandwidth, and second the other applications
might also have urgent data to send.  Dropping packets can be
disastrous if they happen to contain critical data that is not 
duplicated in other packets.  At the very least I have to be able to
find out which ones were dropped.  But better than all of that is
the ability to decide what to send at the last moment.

  I have a vague recollection that this sort of thing is discussed in
  Tanenbaum's Computer Networks textbook, to do with positional data
  of satellites or something. (eg. if the positional data is delayed,
  we write it off, we don't want to delay the data about where we are
  _now_ in order to know where we were _then_)
If the goal is to listen to the sound from .2 sec ago and it takes .1
sec to get there then clearly it's a waste of time to send data that's
older than .1 sec.  But the packet in the queue might have some data
that's older and some that's newer.  I can't drop part of it.  Instead
I'd like to know that the packet is about to be sent now, and respond
by finding the best data to send now.

  From: Ed W [EMAIL PROTECTED]
  This is a total pain to optimise.  Ideally I would like an API to be 
  able to limit the congestion window on the local machine for a 
  particular connection (which I don't think exists on either windows or 
  linux?).  This way the OS will report that the queue is full quickly to 
  the local program without buffering up a ton of data.
  
  The issue in my case is that you have two simultaneous streams in 
  transit for email, one to receive new mail and one to send mail out.  In 
  the case of the sat phone it's possible to have net buffers which are 20 
  secs or so long and so when you send out a status message to say "email
  received successfully, send me the next one", it can end up queued
  behind a bunch of lower priority data for a VERY long time.  Often these
  buffers are on the remote ISP end where you have very little control.  
  This is a serious slowdown on a link which is costing you $1.50/min.
I'm not sure I follow the problem, but if you're saying that one
stream should have priority over the other, it seems you could do
that with two different queues, one with priority over the other.
Or something like sfq could at least prevent one connection from
waiting for the other to send a lot of data.
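Something along these lines might do it (untested sketch; the port is
just a placeholder for whatever identifies the control stream):

tc qdisc add dev eth0 root handle 1: prio bands 2 \
    priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
# control/status packets go to band 1:1, served strictly before 1:2
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip dport 1234 0xffff flowid 1:1

Of course that only helps with the queue on your end, not the 20 secs
of buffering at the remote ISP.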
___
LARTC mailing list
LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc


[LARTC] Re: RFC - bandwidth optimization idea

2005-07-09 Thread Don Cohen

  From: Andreas Klauer [EMAIL PROTECTED]
  Doesn't every QDisc work that way? When the kernel wants to send a packet, 
  it calls the appropriate dequeue() function in the QDisc. I'm not a kernel 
  developer so this guess might be wrong.
That's correct, but this operation takes a packet from an OS queue
and the only control the application has over that queue is to put
something into it.  One way to view the idea is that I want to make
it convenient for the application to decide what to put into the 
queue at the latest possible time without losing any of its available
bandwidth.  Think in terms of an OS callback to the application
saying "I'm ready to send your data now; what should I send?"
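To make that concrete, a sketch of what such an interface might look
like from the application side - nothing like this exists, all the
names are invented:

#include <stddef.h>

/* Filled in by the application: write up to maxlen bytes of the most
   valuable data available right now into buf; return the number of
   bytes written, or 0 for "nothing worth sending at the moment". */
typedef size_t (*ready_to_send_fn)(void *buf, size_t maxlen, void *cookie);

/* Hypothetical registration against a UDP socket: instead of reading
   a FIFO of already-committed packets, the stack would call fn at the
   moment it can actually transmit for this socket. */
int register_send_callback(int sockfd, ready_to_send_fn fn, void *cookie);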

  But still, I don't think that the queueing is the main problem with your 
  idea... the main problem is, how do you decide what's important and what 
  not, and what's obsolete?
This is up to the application of course.  See below.

  From: [EMAIL PROTECTED] (Paul Hampson)
  I believe the general solution to this is to use UDP, and make sure
The scheme I describe wouldn't make a lot of sense for TCP, which
after all specifies congestion control, retransmission, etc.
But UDP still goes through the queuing that I want to optimize.

  your source machine doesn't queue up packets locally (eg. ethernet
  network contention) and let the best-effort nature of UDP deal with
  dropping stuff that gets delayed.
The problem is that the OS is not helpful in avoiding queuing up
packets locally.  That's part of what I'm trying to fix.
For instance, a relatively cheap approximation would be to give
the application a way to see how many packets it has in the queue.
Then it could at least delay its decision about what to put into
the queue until the queue was short.  Even better would be to 
see an estimate of how long it will be before the next packet it
enqueues will be sent - like "your call will be answered in
approximately 4 minutes".
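As a sketch, with invented ioctl names (nothing like this exists;
SIOCOUTQ is close in spirit but reports only socket buffer bytes):

#include <sys/ioctl.h>

#define SIOCOUTQPKTS 0x894C  /* invented: packets we still have queued */
#define SIOCOUTQETA  0x894D  /* invented: usec until our next dequeue  */

static int worth_enqueueing_now(int sockfd)
{
    int queued, eta_usec;

    if (ioctl(sockfd, SIOCOUTQPKTS, &queued) < 0 ||
        ioctl(sockfd, SIOCOUTQETA, &eta_usec) < 0)
        return 1;            /* no information: just send */
    /* commit data only when the queue is empty or a send is imminent */
    return queued == 0 || eta_usec < 10000;
}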

  I'm not sure there's any way to have an 'I changed my mind about
  sending that' interface into your network stack... And generally
  it wouldn't be useful, data spends longer in transit than it does
  in your queues.
That depends on the rate at which the queue is emptied.
If your queue has a rate limit of 10bps then your packets can spend
a long time in the queue.
- There are slow links 
  (For instance, I recall hearing that submarines have very low rates.)
- The application might be allocated a small part of the bandwidth
  shared with other applications.

It occurs to me that an example where this would be helpful is
transmitting voice data over a low bandwidth link (like a cell phone).
Suppose you know that the actual transit time is .1 sec and you want
the listener to always hear what the speaker was saying .2 sec ago at
the best possible quality.
   
Suppose the available bandwidth is shared with other applications.
The voice application doesn't know when they will want to send or how
urgent their data might be.  Someone else decides that.  It just wants
to send the best possible data in the bandwidth allocated to it.  I
imagine it continually sampling the input and revising what it
considers to be the most valuable unsent data for the last .1 sec.
Whenever the OS decides it's time to send the next voice packet I want
it to send the latest idea of what's most valuable.  I don't want to
have to put data into the queue, where it may wait for a time that
depends on what urgent communication other applications require.
___
LARTC mailing list
LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc


[LARTC] RFC - bandwidth optimization idea

2005-07-08 Thread Don Cohen

I'm interested in all of
- opinions about why this is a good or bad idea
- pointers to similar proposals or products that already exist
- implementation suggestions

This is meant for real time applications that have small available
bandwidth and so they have to consider carefully what's the best way
to use that bandwidth.  I imagine that things happen that cause them
to continually reevaluate what's the most important/urgent thing to
send next.  I want to make it possible for them to delay the choice
until the OS is actually ready to send that next packet.  The reason
they can't do this now is that the OS enqueues packets.  Suppose an
application uses UDP or TCP to tell the OS to send some data.  It then
discovers that the data is obsolete.  The old data might still be in the
queue to be sent but it's too late to recall it.  One way to avoid
that is to always delay telling the OS to send something until the OS
is almost ready to send the next packet from the queue that your data
will enter.  But that's not so easy to do, and there's a big penalty
if you wait just a little too long.  What I want, at least
conceptually, is that the application maintains its own queue of data
to be sent, ordered by priority.  Whenever the OS is ready to send the
next packet for that application, it removes the highest priority
packet (if any) from the queue and sends it.
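In code, the application's side of that contract might look like this
(a sketch only - the hook and the priority queue helpers are invented):

#include <stdlib.h>
#include <string.h>

struct app_packet {
    int priority;               /* larger = more urgent */
    size_t len;
    unsigned char data[1472];   /* one UDP payload */
};

/* the application's own priority queue, details elided */
extern struct app_packet *pq_pop_highest(void);

/* hypothetical hook the OS calls when it is ready to transmit the
   next packet for this socket, instead of reading its own FIFO */
size_t on_ready_to_send(void *buf, size_t maxlen)
{
    struct app_packet *p = pq_pop_highest();
    size_t n;

    if (p == NULL)
        return 0;               /* nothing worth sending right now */
    n = p->len < maxlen ? p->len : maxlen;
    memcpy(buf, p->data, n);
    free(p);
    return n;
}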
___
LARTC mailing list
LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc


[LARTC] facilities to output to monitoring interfaces

2005-01-30 Thread Don Cohen

How can one copy packets to a monitoring interface?
For a start I'd like to know how to just copy all of those
that arrive on eth1 out to eth2 in addition to whatever else
would normally happen to them.
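(The closest mechanism I'm aware of is the ingress qdisc plus the
mirred action, assuming a kernel built with both - untested sketch:

tc qdisc add dev eth1 ingress
tc filter add dev eth1 parent ffff: protocol all u32 match u32 0 0 \
    action mirred egress mirror dev eth2

But I'd like to hear about alternatives, and that says nothing about
the refinements below.)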

After that, a number of interesting possibilities:

- Copy only those with specified properties.
(I suppose a random probability of copying fits into this
category.)

- Copy only those that are actually sent (so if the packet
is dropped anywhere along the way there's no false positive).

- Copy only part of the packet, say, only the first 64 bytes.

- Extract specified parts of packets and collect the results
into larger packets that hold the data for many of the original
packets.
___
LARTC mailing list / LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


[LARTC] HTB burstable for 2 interface , how ?

2003-07-04 Thread Don Cohen

  INTERNET
  |
  |eth0 202.14.41.1
  BW.Manager
  | |
  | +eth1  192.168.1.0/24
  |
  +--eth2  192.168.2.0/24

  Total incoming bandwidth to eth0 is 1024kbps
  should be shared to eth1 and eth2, which means each gets 512Kbps and is
  burstable to 1024Kbps if the other host is idle.

This doesn't make sense to me.
The fact that an internal host is idle does not justify not sending
traffic TO it.

The suggestions to use IMQ+HTB seem to miss the problem that
if someone sends 1024 to eth1 then nobody has a chance to even
begin to send anything to eth2.

I think you want to allow borrowing only as long as the total
incoming rate from eth0 is sufficiently less than 1024 to be sure
that those sending to the lesser used internal interface can speed up.
In effect I think you have to sacrifice some part of your 1024 to make
sure the shaping is done at your machine.  I'm not sure how much you
have to sacrifice.  But suppose it's 24K; you then have two htb
classes that have rate 500K, ceil 1000K.  And the parent class also
has ceil 1000K.  That's critical.  That means that if we send at full
rate to eth1 then we still have room for someone to start sending to
eth2.  Then when someone does start sending, he initially gets 24K to
eth2.  At that point HTB reduces the traffic to eth1 by 24K in order
to stay below the total 1000K.  Then the guy sending to eth2 can increase
by 24K, which will cause eth1 to drop another 24K, etc.
As you can see, the amount you reserve (you might say waste) also
limits how fast the traffic equalizes.  
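In tc terms the setup above would look something like this (device
name and handles are just for illustration):

tc qdisc add dev imq0 root handle 1: htb default 10
tc class add dev imq0 parent 1: classid 1:1 htb rate 1000kbit ceil 1000kbit
tc class add dev imq0 parent 1:1 classid 1:10 htb rate 500kbit ceil 1000kbit
tc class add dev imq0 parent 1:1 classid 1:20 htb rate 500kbit ceil 1000kbit

The point again: the parent's ceil stays below the real 1024 so that
the bottleneck is at your machine, not at the ISP.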

Does this make sense to everyone out there?
___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


[LARTC] iptables u32 match code for review/testing/...

2002-12-27 Thread Don Cohen
 The length of the IP header (IHL) in 32 bit words is stored in the
 right half of byte 0 of the IP header itself.
 ... 0>>22&0x3C@0>>24=0
 The first 0 means read bytes 0-3,
 >>22 means shift that 22 bits to the right.  Shifting 24 bits would give
   the first byte, so only 22 bits is four times that plus a few more bits.
 &3C then eliminates the two extra bits on the right and the first four
 bits of the first byte.
 For instance, if IHL=5 then the IP header is 20 (4 x 5) bytes long.
 In this case bytes 0-1 are (in binary) xxxx0101 yyzzzzzz,
 >>22 gives the 10 bit value xxxx0101yy and &3C gives 00010100.
 @ means to use this number as a new offset into the packet, and read
 four bytes starting from there.  This is the first 4 bytes of the icmp
 payload, of which byte 0 is the icmp type.  Therefore we simply shift
 the value 24 to the right to throw out all but the first byte and compare
 the result with 0.

Example: 
 tcp payload bytes 8-12 is any of 1, 2, 5 or 8
 First we test that the packet is a tcp packet (similar to icmp).
 --u32 "6&0xFF=6 && ...
 Next, test that it's not a fragment (same as above).
 ... 0>>22&0x3C@12>>26&0x3C@8=1,2,5,8
 0>>22&3C as above computes the number of bytes in the IP header.
 @ makes this the new offset into the packet, which is the start of the
 tcp header.  The length of the tcp header (again in 32 bit words) is
 the left half of byte 12 of the tcp header.  The 12>>26&3C
 computes this length in bytes (similar to the IP header before).
 @ makes this the new offset, which is the start of the tcp payload.
 Finally 8 reads bytes 8-12 of the payload and = checks whether the
 result is any of 1, 2, 5 or 8
*/

#include <linux/module.h>
#include <linux/skbuff.h>

#include <linux/netfilter_ipv4/ipt_u32.h>
#include <linux/netfilter_ipv4/ip_tables.h>

/* #include <asm-i386/timex.h> for timing */

MODULE_AUTHOR("Don Cohen <[EMAIL PROTECTED]>");
MODULE_DESCRIPTION("IP tables u32 matching module");
MODULE_LICENSE("GPL");

static int
match(const struct sk_buff *skb,
      const struct net_device *in,
      const struct net_device *out,
      const void *matchinfo,
      int offset,
      const void *hdr,
      u_int16_t datalen,
      int *hotdrop)
{
  const struct ipt_u32 *data = matchinfo;
  int testind, i;
  unsigned char *origbase = (unsigned char *)skb->nh.iph;
  unsigned char *base = origbase;
  unsigned char *head = skb->head;
  unsigned char *end = skb->end;
  int nnums, nvals;
  u_int32_t pos, val;
  /* unsigned long long cycles1, cycles2, cycles3, cycles4;
     cycles1 = get_cycles(); */
  for (testind = 0; testind < data->ntests; testind++) {
    base = origbase; /* reset for each test */
    pos = data->tests[testind].location[0].number;
    /* all four bytes to be read must lie within the skb */
    if (base + pos + 3 > end || base + pos < head) return 0;
    /* assemble bytes pos..pos+3 into a 32 bit big-endian value */
    val = (base[pos] << 24) + (base[pos+1] << 16) +
          (base[pos+2] << 8) + base[pos+3];
    nnums = data->tests[testind].nnums;
    for (i = 1; i < nnums; i++) {
      u_int32_t number = data->tests[testind].location[i].number;
      switch (data->tests[testind].location[i].nextop) {
      case IPT_U32_AND: val = val & number; break;
      case IPT_U32_LEFTSH: val = val << number; break;
      case IPT_U32_RIGHTSH: val = val >> number; break;
      case IPT_U32_AT:
        /* @: use val as a new base offset and reload 4 bytes */
        base = base + val;
        pos = number;
        if (base + pos + 3 > end || base + pos < head) return 0;
        val = (base[pos] << 24) + (base[pos+1] << 16) +
              (base[pos+2] << 8) + base[pos+3];
        break;
      }
    }
    nvals = data->tests[testind].nvalues;
    /* the test succeeds if val falls in any of the listed ranges */
    for (i = 0; i < nvals; i++) {
      if ((data->tests[testind].value[i].min <= val) &&
          (val <= data->tests[testind].value[i].max))
        break;
    }
    if (i >= data->tests[testind].nvalues) {
      /* cycles2 = get_cycles();
         printk("failed %d in %d cycles\n", testind, cycles2-cycles1); */
      return 0;
    }
  }
  /* cycles2 = get_cycles();
     printk("succeeded in %d cycles\n", cycles2-cycles1); */
  return 1;
}

static int
checkentry(const char *tablename,
           const struct ipt_ip *ip,
           void *matchinfo,
           unsigned int matchsize,
           unsigned int hook_mask)
{
  if (matchsize != IPT_ALIGN(sizeof(struct ipt_u32)))
    return 0;
  return 1;
}

static struct ipt_match u32_match
  = { { NULL, NULL }, "u32", &match, &checkentry, NULL, THIS_MODULE };

static int __init init(void)
{
  return ipt_register_match(&u32_match);
}

static void __exit fini(void)
{
  ipt_unregister_match(&u32_match);
}

module_init(init);
module_exit(fini);

 iptables-1.2.7a/extensions/libipt_u32.c
/* Shared library add-on to iptables to add u32 matching,
   generalized matching on values found at packet offsets

   Detailed doc is in the kernel module source
   net/ipv4/netfilter/ipt_u32.c
*/
#include <stdio.h>
#include <netdb.h>
#include <string.h>
#include <stdlib.h>
#include <getopt.h>
#include <iptables.h>
#include <linux/netfilter_ipv4/ipt_u32.h>
#include <errno.h>
#include <ctype.h>

/* Function which prints out usage message. */
static void
help(void)
{
  printf(
"u32 v%s options:\n"
" --u32 tests\n"
"  tests := location = value | tests && location = value\n"
"  value := range | value , range\n"
"  range := number

Re: [LARTC] how to get the latency down on maxed out classes?

2002-12-09 Thread Don Cohen
Abraham van der Merwe writes:
  Hi Don!
  
  I then tried fifos. With small packet fifos the packet loss is just
  too great to be of any use and even then the latency is quite high (~200ms).

A small detail: what are small packet fifos?  You mean fifos that
can only hold a small number of packets?  Or fifos that only hold
packets with small numbers of bytes?

   You consider 200ms high?  One max size packet = 1500 bytes = 12Kbit
   which is about 200ms on a 64Kbit link.  You can't expect to do better.
  
  The problem is that with 200ms the packet loss is so much that the link is
  effectively useless (90% packet loss). As soon as I make the queue big
  enough to not drop significant amounts of packets, the latency goes way up
  (3 secs).

I don't understand the connection between 200ms and packet loss.
If you make the queue small (in packet capacity) then worst case
latency decreases.  Packet loss occurs in either case whenever a
packet arrives and the queue is full.

If you try to send at a higher rate than allowed then you will fill
the queue in either case (a small queue more quickly, of course),
and from then on you will lose packets.  If you send packets at twice 
the allowed rate you lose half of them, if you send at 10 times the
allowed rate you lose 90%.

The fact that you're losing lots of packets, though, indicates to me
that you're acting like an attacker, and dropping most of that traffic
is therefore exactly the right thing to do.

If you were using a correctly working tcp it would not continue to
send at 10 times the allowed rate.  It would notice that packets were
being lost and would slow down until the loss rate became very small.

Similarly, I don't understand the latency issue.  An application that
cares about latency will not create a large backlog.

What is this application that is sending faster than the link allows
and wants a low latency, and why is it misbehaving?
___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



[LARTC] how to get the latency down on maxed out classes?

2002-12-08 Thread Don Cohen
  lets say I want to limit traffic to/from client to 64kbit. now, client opens
  a tcp connection blasting away at full speed.
  
  If client now pings isp, it gets on average around 7 seconds latency. I
  tried to improve this by using SFQ on the leaf nodes of my HTB hierarchy,
  but that does not really improve the situation, only makes it much worse.
  with SFQ I get anything between 250ms and 13 seconds latency.

You understand what's going on here?
As I recall, both pfifo and sfq default to queues of length 128
packets.  If you fill that with 1500 byte packets you have ~190Kbytes,
which is about 1.5Mbit.  At 64Kbit/sec that takes ~24 sec to
send, so your latency could approach half a minute.
You can limit this latency by reducing the queue size.
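For example, assuming an HTB leaf class 1:10 on eth0 (handles are
hypothetical):

tc qdisc add dev eth0 parent 1:10 handle 10: pfifo limit 10

Ten 1500 byte packets at 64Kbit/sec bounds the worst-case queueing
delay at roughly 1.9 sec.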

On the other hand, the application that fills the queue evidently
doesn't mind large latency.  Otherwise it wouldn't fill the queue.

I think I posted to this list once a description (maybe even the
code?) of another way to limit latency - drop packets that have been
in the queue for more than a timeout period (I tend to use 3 sec).
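As a sketch against a 2.4-style qdisc (this is not the code I posted;
the field and helper names vary by kernel version):

#include <linux/skbuff.h>
#include <net/pkt_sched.h>

#define AGE_LIMIT_SEC 3

static struct sk_buff *timed_dequeue(struct Qdisc *sch)
{
    struct sk_buff *skb;
    struct timeval now;

    do_gettimeofday(&now);
    /* keep discarding until we find a packet young enough to be useful */
    while ((skb = __skb_dequeue(&sch->q)) != NULL) {
        if (now.tv_sec - skb->stamp.tv_sec < AGE_LIMIT_SEC)
            return skb;          /* fresh enough: send it */
        kfree_skb(skb);          /* expired: drop and try the next one */
        sch->stats.drops++;
    }
    return NULL;
}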

SFQ should have the desirable result that one tcp connection won't
slow down another one or a ping.

  I then tried fifos. With small packet fifos the packet loss is just
  too great to be of any use and even then the latency is quite high (~200ms).
You consider 200ms high?  One max size packet = 1500 bytes = 12Kbit
which is about 200ms on a 64Kbit link.  You can't expect to do better.

  I'm thinking of using RED, but the number of parameters is daunting and I
  have no idea how the HTB rate correlates to packet size and burst rates for
  red.
RED should be independent of HTB.  
___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



[LARTC] use or non-use of multiple processors in forwarding

2002-11-11 Thread Don Cohen
More info on this problem:
  /proc/stat shows me that all of the [packet forwarding] work is
   done by one cpu, the other does nothing. 

/var/log/messages shows the following interesting data:
 kernel: enabled ExtINT on CPU#0
 kernel: masked ExtINT on CPU#1

Does anyone know what this means, whether it would be related to the
problem above, and if so, how to change it?
___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



[LARTC] use or non-use of multiple processors in forwarding

2002-11-08 Thread Don Cohen

I'm testing to see how fast A can ping C without losing packets.
  A -- B -- C
B is a dual processor (Intel(R) XEON(TM) CPU 1.80GHz) machine.

/proc/stat shows me that all of the work is done by one cpu, the other
does nothing. 
Does anyone have any ideas of why this should be the case and what
I can do to change it?  I'm hoping I can get higher throughput if
both of the cpu's participate.
___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



[LARTC] Traffic shaping for upload download

2002-11-01 Thread Don Cohen
  Lets say we want to limit a customer usage to 256kbit total.
That is, you want to limit upload+download.

Whether or not it can be done, I think it's worth pointing out that
this is nonsense.  It makes sense to allocate A+B only if A and B can
be used to replace each other.  Upload and Download are not like that.
They're more like food and air - you need both.  If you have no air it
won't do any good to be given more food.

In your case the analogous thing is that you have a total of 1Mbit up
and 1Mbit down available, two users, and you allocate 1Mbit total to
each.  One decides to attack the other by using 1Mbit upload.  You
decide that's fair, the other (the victim) can just use 1Mbit
download.  Well, maybe he can, but it won't do him any good.  

I suggest instead that you allocate upload and download bandwidth
separately.
___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



Re: [LARTC] Re: [release] ipsysctl tutorial 1.0.1

2002-10-28 Thread Don Cohen

  I'd like to ask for some clarifications, if not quoting, in the tutorial 
  on page x321.html (not sure of section numbers) re: syn cookies.

I don't understand what the question is here.

  Dan Bernstein (everyone's favorite mathematician :-) ) makes it very 

I was not aware of that.

  clear on http://cr.yp.to/syncookies.html that your warnings are 
  primarily FUD.  For the sake of quoting:
  A few people (notably Alexey Kuznetsov, Wichert Akkerman, and Perry 
  Metzger) have been spreading misinformation about SYN cookies. Here are 
  some of their bogus claims:

I was also not aware of any such controversy, but I think the points
below are correct.

  * SYN cookies ``present serious violation of TCP protocol.''
Reality: SYN cookies are fully compliant with the TCP protocol.
Every packet sent by a SYN-cookie server is something that could
also have been sent by a non-SYN-cookie server.
  * SYN cookies ``do not allow to use TCP extensions'' such as large
windows. Reality: SYN cookies don't hurt TCP extensions. A
connection saved by SYN cookies can't use large windows; but the
same is true without SYN cookies, because the connection would
have been destroyed.
  * SYN cookies cause ``massive hanging connections.'' Reality: With
or without SYN cookies, connections occasionally hang because a
computer or network is overloaded. Applications deal with this by
simply dropping idle connections.
  * SYN cookies cause ``serious degradation of service.'' Reality: SYN
cookies /improve/ service. They do take a small amount of CPU time
to compute, but that CPU time has to be spent anyway for
hard-to-predict sequence numbers; see RFC 1948.
  * SYN cookies cause ``magic resets.'' Reality: SYN cookies never
cause resets.
  
  These people also have the annoying habit of crediting their bogus 
  claims to other people, such as me. I don't know whether to attribute 
  this to malice or stupidity; either way, I would like the record to be 
  set straight.
  
  I invited Kuznetsov to either retract or defend his claims. He refused 
  to do so. I'm sure he's aware by now that his claims are false, and that 
  any attempted defense will be promptly ripped to shreds; but he's still 
  not admitting his errors. It's unfortunate that he doesn't have more 
  respect for the truth.
  
  I also invited Akkerman to either retract or defend his claims. He did 
  not respond.
  
___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



[LARTC] congestion problem

2002-10-11 Thread Don Cohen


  Client --- R1 --- R2 --- R3 --- Web

  the Client it's me, the R1 router it's mine (so I can control it), the
  R2 is my provider's router, and R3 is the provider's provider's router.

  R2 - R3 is a 2mbit link
  R1 - R2 is a 10mbit link
  R2 have multiple interfaces and other 10mbit links
  I have a 32kbit guaranteed bandwidth on the R2-R3, but without limit (rate
  32kbit, ceil 2mbit)
You have guaranteed 32K upstream, downstream or both?
There's something strange about that in any case.
For upstream, how does r2 know which packets are from you?
Source address?  Then some other customer of your ISP could deny
you service by spoofing your address (unless your ISP filters that).
Downstream is also strange, first because your ISP's ISP would then
have to know about you, second because you have little control over what
others send you.  So if that is controlled at all it should be
shaped in accordance with your wishes.

You talk about downloading.  But in that case the bandwidth is used
mostly downstream.  You have limited control over that.  Assuming
the servers are using tcp you could control the acks (more to the
point the windows) you send back to limit the rate at which they send
to you.
Of course, 32Kbit is slow enough that you're never likely to be happy
with download speed.

___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



[LARTC] re: keeping up with file sharing programs

2002-10-09 Thread Don Cohen


I have a different proposal.
I think you should use ESFQ, always based on the internal IP address,
i.e., in the outbound direction base it on source, inbound use dest.
(This is to separately share upload and download bandwidth.)
That means that someone trying to use small bandwidth will get it
right away while those trying to use a lot will have to share it
equally with others.

I'm not optimistic about the other schemes that have been suggested:
- identify good ports 
  Then people will start using those ports for the bad stuff.
- identify bad ip addresses
  Then people will go around borrowing each other's computers.
- even a separate class for the residence halls
  Then people will go to the academic buildings to use the computers
  there.  At a college I'd even expect that faculty members would
  let students borrow their computers, so there's not even much
  point to giving faculty a separate class.

It's true that this might cause trouble for people trying to do large
downloads of important stuff.  But are you supposed to know which
stuff is important?  When people claim that what they're doing is
important you can put them in the important class, tell all in
that class who the others are, and point out that they're competing with
each other - so they can complain to each other before they complain to you.
In fact, I'd try to monitor the usage of such people and distribute
the results to them all, so they know who to blame.
When people complain about others whose stuff they think is not 
really important you can let some higher academic authority make 
the call.
___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



[LARTC] re: Anyone else seen this one? (ping)

2002-10-01 Thread Don Cohen

 Anyone else seen this one?
  
  ping -n {remote host}
  { 6 second delay }
  64 bytes from 216.168.105.33: icmp_seq=0 ttl=255 time=6sec
  64 bytes from 216.168.105.33: icmp_seq=1 ttl=255 time=5sec
  64 bytes from 216.168.105.33: icmp_seq=2 ttl=255 time=4sec
  64 bytes from 216.168.105.33: icmp_seq=3 ttl=255 time=3sec
  64 bytes from 216.168.105.33: icmp_seq=4 ttl=255 time=2sec
  64 bytes from 216.168.105.33: icmp_seq=5 ttl=255 time=1sec
  64 bytes from 216.168.105.33: icmp_seq=6 ttl=255 time=242usec
  64 bytes from 216.168.105.33: icmp_seq=7 ttl=255 time=250usec
  ...

Yes, and I know just how to make it happen.

You put ping packets and some other type of packets into the same
low rate class.  Then you send a bunch of the other kind of packet
and start your ping.  Suppose your class is allowed to send 10 pps
and you start with 60 other packets (perhaps even ping packets
belonging to someone else) in the queue when you start your ping.
6 seconds later all of your ping packets get to the head of the queue
and are sent in the next second.  Of course, the first one was sent 6
sec ago, but the next was sent 5 sec. ago, the third 4 sec. ago, etc.
The first 6 replies all return at about the same time and look like
those above.  The rest appear at 1 sec intervals as you send them
and look like the last two.

BTW, this is pretty similar to the example that led me to suggest a
limited lifetime in the queue.

___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



Re: [LARTC] Iptables, SNAT/MASQ, Multiple gateways

2002-09-30 Thread Don Cohen

Simon Matthews writes:
  OK, this may be a reasonable approach, but how do I force it initiate 
  connections from the fast interface, yet allow it to fail over to the 
  slow interface if the system removes the route to the fast gateway because
  it has detected that it is not responding? 

Off hand I don't know anything built in for this (I look forward to
hearing an answer from someone who does), but I don't think this is 
really what you want anyway.  It's not as if your link is the only one
that could fail!
If ISP1's upstream link fails then you want to use ISP2 for all
traffic other than that intended for ISP1 itself.  And of course, 
problems further upstream prevent you from reaching certain addresses
but not others, and you don't really know which without a global view
of the routing.

I think the right solution involves monitoring the traffic.
There's a wide range of things you could do, the simplest being 
simply detecting that the link is not responding.  You could also
try to detect tcp retransmits, measure RTT, aggregate data to measure
how well individual connections are working, further aggregate data to
determine which addresses blocks are working well and which poorly, etc.
Then use that data to decide which of your links to use for a given
destination.

I actually sent a proposal to this list that I think provides a good
solution to the general problem: an extension to TCP (possibly even
IP) that supports multiple addresses/ports.  This would even allow you
to switch addresses in the middle of a connection.  I think what I
described before applies more to the machine on the other side of your
connection, which now would know both of your addresses.  Whenever it
does a tcp retransmit it switches the address.  It therefore tends to
stay on the one that works most reliably.  (Perhaps this algorithm
could be improved to take speed into account too.)  This discussion
points out that something similar should be done on your end: you
should switch the output interface you use when you retransmit.

Of course this is not yet implemented.  It's on my queue, but not
close to the beginning.  I'd be glad if someone out there could beat
me to it. 
___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



[LARTC] Anything out there that is similar to Cisco's WFQ?

2002-07-10 Thread Don Cohen

  From: CIT/Paul [EMAIL PROTECTED]
  Any help would be greatly appreciated :) This is much better than SFQ :

Sounds like SFQ to me.  Can you tell us what the differences are?
___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



[LARTC] RE: Anything out there that is similar to Cisco's WFQ?

2002-07-10 Thread Don Cohen

Paul writes:
  No SFQ is not like WFQ... WRR is the closest thing to cisco's
  fair-queue..
  WRR keeps track of the connections using the ip_conntrack .. that's sort of
  what
  cisco's fair-queue does and it checks the bandwidth streams and gives lower
  priority
  to the higher streams and larger packets.. it's meant to reduce latency for
  traffic
  shaping and it does :)
  I haven't tried WRR but it looks like the closest thing to it although it
  doesn't
  take everything in to account as cisco's flow based WFQ does..

This is not very convincing.  Do you actually know how WFQ
works?  If so, please tell us.  The doc you sent did not describe how
it works but what the effects are, and those are entirely consistent
with what SFQ does. 
High bandwidth flows are limited, low bandwidth flows get lower
latency.  Can you describe some effect that's different?
___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



[LARTC] Gigabit Ethernet router

2002-06-24 Thread Don Cohen

  Date: Mon, 24 Jun 2002 16:33:32 +0200 (CEST)
  From: M.F. PSIkappa [EMAIL PROTECTED]
  Subject: [LARTC] Gigabit Ethernet router
  
  Hi,
  I would like to build new router with 3 Gigabit Ethernet card. Need I
  dual procesor system or not ? I would like to have trafic controling (htb
  or cbq/sfq) and firewall (iptables) on this router.
  Can you recommend me some good motherborad with 64-bit PCI-X ?

More to the point, where can you get a motherboard with 3 64x66 PCI
buses?(!!)  Note that one gigabit card actually can use 2 gigabits of
PCI bandwidth (one in, one out), and a PCI bus is nowhere near 100%
efficient, so one 64x66 PCI bus (64 bits x 66 MHz = ~4.2 Gbit/s peak)
has enough bandwidth to handle 1 such card at full bandwidth, not
enough for 2.

___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



Re: [LARTC] Questions about IMQ

2002-06-13 Thread Don Cohen

Patrick McHardy writes:
  We're adding an htb as the qdisc for a child class of htb ?  Why?
  Isn't that just wasting time?  Can't all 10: stuff be done with 1:
  instead? 
  
  The root qdisc is used for delay simulation, 10:0 is the real qdisc
  ( http://luxik.cdi.cz/~devik/qos/htb/manual/userg.htm#prio )

So I was right, just to waste time.  That was not part of the spec, as
I recall.  So I suggest it be removed from the example.

  If more than one imq device is used you specify the one which should get 
  the packet with --todev argument to IMQ target.

Be sure to put that in the doc.  I didn't see it there.
I suppose the default is imq0 ?

  skb->dev doesn't get changed if that's what you mean ..

Ok, important to know that.  I gather there is currently no way to
read the imq mark from netfilter.  

  When the packet is eventually dequeued (if not dropped) then it
  goes where?  I'm hoping it goes to the beginning of pre-routing
  so we can apply conntrack/nat/mangle rules to it with -i imq0.
  
  No it doesn't. I think it doesn't make any sense to use any kind of 
  iptables rules on packets
  passing imq because all of them come from/go to real devices which you 
  can use in your rules.

But if you could read the imq mark then it would make a lot of sense.
These two things in combination would allow me to do what I want
without changing the code.  As it is, it looks like I need a local
variant of IMQ that runs before conntrack.  (On the other hand, this
is probably the more efficient solution anyhow.)

  I suspect this is not the case, since I see in the patch code
nf_reinject(skb, info, NF_ACCEPT)
  I'm not even sure netfilter supports what I want.
  I see in
   http://netfilter.samba.org/documentation/HOWTO//netfilter-hacking-HOWTO-3.html
5.NF_REPEAT: call this hook again. 
  but what's this hook ?  Is it the imq hook or pre_routing ?
  
  it's imq hook. from net/core/netfilter.c:
  nf_reinject(...)

As I thought, there's no convenient way for you to do what I want.

  you can easily change this order. i guess you already noticed if you 
  looked at the imq source.

Right.  But this is not a change that everyone would want.

  but are you sure this is necessary ? I guess your connection must be
  extremely fast if someone
  wants to DoS you through a connection tracking table fillup attack ...

My idea of extremely fast has changed recently.  Maybe it's a bit
ahead of yours.  First, I'm interested in protecting against attacks
from inside the firewall, and these are typically connected at
100Mbit.  Is that fast enough?  Next I've been playing with gigabit
cards.  Finally I visited sprint a few weeks ago and they're not
interested in anything as slow as one gigabit.  Although, for a
firewall, I admit that seems fast enough for the time being.

  Changing skb->dev to imq0 would result in something like this:
  ... -> NF_HOOK(..) -> imq -> qdisc -> reinject -> continue NF_HOOK ->
  ... -> dev_queue_xmit -> qdisc -> imq -> reinject (CRASH!)

If you mean it could result in infinite loops, yes, but this is not
the first invention of infinite loops.  If your rules do the right
things then the loops can also be avoided.  Besides, that requires
my other request, that the reinject go back to the beginning of the
prerouting hook.  Without that it was completely plausible that the
skb dev could have been changed.  But I'm not complaining.  I just
wanted to know.

  If you look at the imq source you find a imq_skb_destructor, i though 
  about adding a comment that
  it's meant to save rusty's life. if skb's are freed inside qdiscs 
  kfree_skb will call the destructor which
  will do necessary things to protect rusty :)
Ok, I wouldn't want to contribute to his early demise.  This tends to
confirm my first guess, which was that the important thing here was to
free skbs when they are no longer in use.  I guess user mode can't free
them, but perhaps the better solution would have been to free them
before a copy is sent to user space and then recreating them if the
copy ever came back.  But I digress...

Thanks for all the answers.
___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



[LARTC] a small htb question (I think)

2002-06-10 Thread Don Cohen

in the output of ip addr:

2: eth0:  ...  qdisc htb qlen 100

Does qlen 100 have anything to do with htb?
___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



Re: [LARTC] SFQ buckets/extensions

2002-06-07 Thread Don Cohen

Alexander Atanasov writes:

  SFQ classifies connections by ports; esfq classifies them just by IP, so
  we can have flows:
   SRC IP+proto+DST IP+PORT - one flow
   just DST IP - one flow
   another we can think of - one flow
  So I think of it as just packets of some size in bytes, without the protos
  they carry; TCP, with its features to tune, becomes a kind of exception.

I don't think this is true.  First, of course, almost all packets are TCP.
Second, I'd expect that multiple TCP streams along the same path tend
to equalize, assuming of course that they are not differentiated along
the way.  E.g., with just fifo queuing I'd expect that two scp's along
the same path would tend toward equal bandwidth.  If this is true then 
multiple TCP streams along the same path would tend to act like one.

Perhaps someone else out there can either confirm or debunk that
expectation?
Of course, things get more complicated when the streams have the same
source but different destinations.  In that case I'd expect them to
again adjust to the differences in bandwidth to the different
destinations.  So maybe sub-subqueues below the source IP subqueues
are not so important. 

   No, the errors are accumulated.  It's always within one packet of the
   ideal in terms of what's sent.  The advantage of measuring queue
   length in bytes would be more fairness in dropping, i.e., you should
   drop from a queue with 10 packets each of length 1000 bytes instead of
   a queue with 20 packets each of length 100 bytes.
  
   I didn't get it?  What do you mean - have different queues
  against packet sizes?

I was suggesting that when you add a packet to a subqueue you not just
record the fact that there are now 16 packets in that subqueue, but
1600 bytes.  Then the limit on total queue size is measured in bytes.
When you enqueue, you check to see whether this packet would cause the
limit to be exceeded, and if so you drop from the subqueue with the
most bytes.

That's closer to the spirit of SFQ, but it's probably more expensive
in run time.  The current implementation has a very fast way to
determine which subqueue has the most packets.  The object is to make
enqueue/dequeue small constant time operations.  I don't see (so far)
how to do that with queue lengths measured in bytes.
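A sketch of byte-based accounting at enqueue (struct and helper names
invented; the linear scan for the longest subqueue is exactly the step
that breaks the constant-time design):

#include <linux/skbuff.h>

struct byte_sfq_data {
    unsigned total_bytes, byte_limit;
    unsigned bytes[128];            /* bytes queued per subqueue */
    /* subqueue storage elided */
};

extern unsigned byte_sfq_hash(struct byte_sfq_data *q, struct sk_buff *skb);
extern unsigned subqueue_with_most_bytes(struct byte_sfq_data *q);
/* drops the tail packet of subqueue x and decrements the byte counts */
extern void drop_tail_bytes(struct byte_sfq_data *q, unsigned x);
extern void append_to_subqueue(struct byte_sfq_data *q, unsigned x,
                               struct sk_buff *skb);

static int byte_sfq_enqueue(struct sk_buff *skb, struct byte_sfq_data *q)
{
    unsigned x = byte_sfq_hash(q, skb);     /* pick a subqueue */

    /* enforce a limit in bytes rather than in packets */
    while (q->total_bytes + skb->len > q->byte_limit)
        drop_tail_bytes(q, subqueue_with_most_bytes(q)); /* O(n) scan */

    q->bytes[x] += skb->len;
    q->total_bytes += skb->len;
    append_to_subqueue(q, x, skb);
    return 0;
}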
___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



[LARTC] (E)SFQ HRR (=Hierarchical Round Robin)

2002-06-06 Thread Don Cohen

  From: Martin Devera [EMAIL PROTECTED]
  Subject: [LARTC] (E)SFQ suggestion
  Hi,
  just simple note. Maybe it is already in progress :)
  
  There are attempts to replace hashing routine in SFQ to
  consider IPs or ports.
  What about to use HRR - roundrobin around bunch of IP
  adresses and then smaller WRR for ports per IP ?
  It would solve both problems - fairnes between computers
  (IP) and between flows on than single computer ...

(Took me a moment to figure out HRR=Hierarchical Round Robin.)

Yep, that has been on my queue for a long time.
(Though I'm interested in a different case than source IP in first
level and other stuff in second.)

Problems:
- it's a lot more storage (instead of small constant space per
subqueue we now have constant per sub-subqueue, so something 
like 128 = 128 x 20)
- double the time for two lookups
- nontrivial change in code.  Of course, the temptation would be to
make the code work with n levels.  

So far I've been able to get by without it.  That, along with the
disincentives above account for it remaining on the queue for so
long.  Some day I'll need it.  If I'm lucky someone else will get
there before me.

___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



[LARTC] SFQ buckets/extensions

2002-06-05 Thread Don Cohen


  ... What if SFQ were to start with a minimal number of buckets, and
  track how 'deep' each bucket was, then go to a larger number of bits
  (2/4 at a time?) if the buckets hit a certain depth?  Theoretically,
  this would mean that 'fairness' would be achieved more often in current
  collision situations but that a smaller number of buckets would be
  necessary to achieve fairness in currently low-collision situations.
  
  I haven't looked at the SFQ code in a while, so I don't know how much
  benefit this would be in terms of processing time, or even how expensive
  it would be to change hash sizes on the fly, but at a certain level of
  resolution (+/- 2-4 bits), the changes wouldn't be terribly frequent
  anyway.

A few reactions:
- The only runtime cost of lots of buckets is a small amount of
storage for each bucket.  Allocating buckets at runtime also
introduces the problem that you could run out of space.
- There's no advantage to having many more buckets than the number
of packets you're willing to queue, which is typically only on the
order of a few hundred.

 extensions
  And all the discussions tend to lead to the conclusion that there should
  be an sfq option (when the queue is created) for:
   a) how big the hash is
   b) whether to take into account source ports or not
   c) whether to take into account destination ports or not
   d) etc. :)
  
  Maybe someone who's written a qdisc would feel up to this?

I've been hoping to get to it, since I have other stuff I'd like to
incorporate into a new sfq version.  

  From: Alexander Atanasov [EMAIL PROTECTED]
   I've done some in this direction , probably needs more work, and
  it's poorly tested - expect b00ms ;)
   
   This adds a new qdisc for now - esfq which is a 100% clone of
  original sfq.
   - You can set all sfq parameters: hash table size, queue depths,
  queue limits.
   - You can choose from 3 hash types: original(classic), dst ip, src
  ip.
   Things to consider: perturbation with dst and src hashes is not
  good IMHO, you can try with perturb 0 if it causes trouble.
  
   Please, see the attached files.
  
   Playing with it gives interesting results:
   higher depth - makes flows equal slower
   small depth  - makes flows equal faster
   limit kills big delays when set at about 75-85% of depth.

I don't understand what these last three lines mean.  Could you
explain?

  
   Needs testings and mesurements - that's why i made it
  separate qdisc and not a patch over sfq, i wanted to compare both.
  
   Any feedback good or bad is welcome. 

I'll send you my current module, also a variant of SFQ.  It contains
doc that I think is worth including, also changes some of the code to
be more understandable, separates the number of packets allowed in the
queue from the number of buckets, supports the time limit (discussed
in earlier messages), controls these things via /proc, maybe a few
other things I'm forgetting.  This version does not support hashing on
different properties of the packet, cause it uses a totally different
criterion for identifying subclasses of traffic.  You can discard
that and restore the sfq hash with your modifications.  I think (hope)
these changes are pretty much independent.
___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



Re: [LARTC] SFQ buckets/extensions

2002-06-05 Thread Don Cohen

Alexander Atanasov writes:
   At first look - I think I'll have to incorporate my changes into
  your work. I've not done much, just added hashes and unlocked what
  Alexey Kuznetsov did.
Not quite that simple.  You have to throw out about half of my file,
mostly the last third or so, which replaces the hash function, plus 
most of the /proc stuff, and probably a lot of other little pieces
scattered here and there.
I now recall a few other things - I support configuration of service
weights for different subqueues, which makes no sense for sfq, also
I record the amount of service (bytes and packets) per subqueue and
report these to the tc -s -d stuff, which also makes no sense for sfq.
After removing all that stuff you then have to restore the hash.

 Playing with it gives interesting results:
 higher depth - makes flows equal slower
 small depth  - makes flows equal faster
 limit kills big delays when set at about 75-85% of depth.
   
   I don't understand what these last three lines mean.  Could you
   explain?
  
   depth is how many packets can be queued on a row of the
  hash table. If you have large queues (higher depth) sfq reacts slower when
  a new flow appears (it has to do more work to make queue lengths equal
  ). When you have short queues it reacts faster, so adjusting depth
  to your bandwidth and traffic type can make it do better work.
  I set bounded cbq class 320kbits and esfq with dst hash:
   Start an upload - it gets 40KB
   Start second one - it should get 20KB asap to be fair.
  With depth 128 it would take it let's say 6 sec. to make both 20KB, with
  depth 64 about 3sec - drop packets early with shorter queue.
  (I have to make some exact measurements since this is just an example
  and may not be correct).
I don't see why that should be the case.  And I don't recall ever
observing it.  This adaptation time should be practically zero.
There's no work in making the queues equal.
(Let's use the word queue to mean the whole SFQ and subqueue for the
part sharing a hash index.)
If you have, say, 100 packets in one subqueue and 10 in another they're
already sharing the bandwidth 50-50. 

   limit sets a threshold on queued packets - if a packet exceeds it,
  it's dropped so the delay is smaller, but when it tries to make flows
  equal it counts depth, not limit. With the above example, depth 128 and
  limit 100:
   When the first upload enqueues 100 packets sfq starts to drop,
  but the goal to make flows equal is 64 packets in queue. The flow doesn't get
  the 28 packets which are to be enqueued and delayed for a long time and
  probably dropped when received.
I disagree that the goal is to make the subqueues the same length.
The goal is to serve them with the same bandwidth (as long as they
don't become empty.)
Queue length depends on how many packets each download is willing to
send without an ack.  If one is willing to send 100 and the other is
willing to send 10, then the subqueues will likely be length 100 and
10, but each will still get the same bandwidth.  Without window
scaling the max window size is 64K which is only about 45 packets,
so it's not really normal to have 100 packets in a subqueue.

___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



Re: [LARTC] Re: More on qdiscs - about dangling backlogs

2002-05-06 Thread Don Cohen

Patrick McHardy writes:

  I don't think dropping at dequeue is necessary. 
Here's an example showing it is.
I have an SFQ with max queue size 128 and very low rate, say about
1 packet/sec that I use to limit the rate of SYN's.
Now as part of a test I send a syn flood, say 200 packets in one
second.  In that first second SFQ drops 200-128 but the time limit
won't drop any more on enqueue.  Now we're sending one packet/sec
and I try to open a tcp connection.
Suppose the age limit is 5 sec. and we drop on enqueue only.
If I try to open the tcp connection 3 sec after the flood, none of the
124 or so packets in the queue has expired and it's going to take 2
min. for my syn to get through.  Whereas, if dequeue drops expired
packets then I can get through in 2 sec.

   The goal is not to provide
  a maximum in-queue time, if the qdisc is able to send there is no reason 
  to drop.
The reason to drop is that it's a waste of time/bandwidth to send
expired packets, and even worse to make something unexpired wait for
them (and even worse yet if it has to wait long enough to expire
itself). 

The problem is, if the qdisc is not able to send, many packets
  can get queued until drops occur. This means it takes a long time until
  the sender receives indication of congestion. For TCP, congestion is 
  indicated by either consequent ACKS with the same ACK number or SACKs. 
In this case tcp should stop cause it doesn't get any acks cause its
packets are not getting forwarded.  Or if the problem is the other
direction, it should stop cause it's not getting the acks.

  ACKs are only generated by the receiver if something was actualy 
  received, so by dropping packets after some timeout the time until a 
  duplicate ACK is generated becomes smaller.
I think now we're getting into the subject of why it's good to 
drop packets that have been waiting for a long time.
If your objective is to generate a duplicate ack then I'm not sure
dropping packets is the right way.  After all, you might drop a packet
that would have generated one sooner.  For that matter, you might have
dropped one that would have generated a non-duplicate ack, which would
be even better for tcp.  Nevertheless, I do think it's good to drop 
packets that are not forwarded in a timely fashion.  It's just not a
simple argument.

  If you assume the expiration time to be smaller than the time requried 
  to fill the senders congestion window, it doesn't makes sense anymore to 
  drop packets during dequeue as this could possibly prevent a duplicate 
  ACK from beeing generated - relay indication of congestion.
  It sounds better to me to check for expired packets during enqueue 
  (timers would probably be too expensive i guess) and drop them before 
  enqueueing the new packet.
I'm not sure what you have in mind for the check here.
I was expecting that no packets would actually take a long time
between being received/generated and being enqueued.  Rather, when you
enqueue one, you can look in the queue to find any others that have
been in the queue too long.  But this does not catch as many cases as
checking at dequeue.
I was not proposing to set any timers.  Just record the time the packet
arrives and when you dequeue, see how long ago that was.

  With SFQ, what about not keeping a per-packet timeout but a per-flow 
  congestion-indication timeout which would drop the first packet of a 
  flow after max(age of any packet from this flow) = timeout ? This would 
  assure congestion indication to be delivered to the sender as soon as 
  the flow is chosen to send again after the timeout occured (as long as 
  qlen(flow)  1 which is probably the case).

I don't think I understand what you're proposing.  If we enqueue 50
packets from one flow all at once, and after sending 10 they all
expire (since they're all about the same age), then you want to drop
one?  Why is it worth sending the rest?
___
LARTC mailing list / [EMAIL PROTECTED]
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/



[LARTC] Re: More on qdiscs - about dangling backlogs

2002-05-05 Thread Don Cohen

Martin Devera writes:
  Hi,
  only a few notes on the theme. You are right with the displacement
  and bad enqueue byte counters. Maybe it would be better to count
  packets at dequeue time only in classful qdiscs. It also makes better
  sense because the qdisc can also instruct SFQ to drop a packet - this drop

I don't understand.  What other qdisc instructs SFQ to drop?

  is then not decremented from HTB/CBQ's stats.
  HTB uses enqueue time counting because CBQ does it too and at very

I figured that's why it worked that way.

  begining I based my work on CBQ. Seems it is time to change things.

  Another important info: you really CAN'T drop packets in dequeue
  routine if you don't want to fool a classful parent. Much of the logic
  in CBQ/HTB/ATM/PRIO qdiscs is based on the existence of backlog.

I gather you only care whether this qdisc is waiting to send, not how
much is in the queue.

  When you drop in dequeue, parent will think that he itself still
  has some packets somewhere (in children) and will constantly attempt
  to find them. And will be confused by the fact that it can't.

  Enqueue routine can give you confirmation of packet enqueue, drop
  routine the same. Only dequeue can't say hey there is your skb and by
  the way two others was dropped. What a pity.

  SOLUTIONS:
  One way is to monitor (store) q.qlen variable of all children of
  classfull qdisc. When I call enqueue/drop/dequeue on it I'll see
  its discrepancy agains last stored value and will update my counter

This seems like the right solution, except that you shouldn't even
need to store a counter.  Just use the one in the child qdisc.
BTW, I think you are justified in assuming that this counter can only
change in two ways: decrease when dequeue is called (but possibly by
more than one packet) and increase by 1 when enqueue is called.
And I guess also requeue.  Which raises another problem, since that's
not called by the parent, right?

  accordingly. In the same way we could add q.bytelen and then we would be
  able to do the same for bytes - but this is probably not necessary
  as we need to know only bytes dequeued ...

My impression is that every qdisc is currently supposed to keep track
of # packets in the queue, but not necessarily # bytes.
Best not to change that, especially since we don't even have any
currently proposed use for it.

  There is another way - add a parameter to the dequeue routine which tells
  us how many packets we should decrease our qlen by. But I think that the
  former approach is simpler and prone to silent drops (for example by some
  timer).

I think you're saying the former approach is better, but "prone"
suggests otherwise.

I prefer the former approach.  It's better for all the other qdiscs
out there to keep working without change.  There are only a few 
classful qdiscs that would have to change, and they could still work
unchanged with qdiscs that don't do silent drops.
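
To spell out the approach I prefer, in outline and with invented names
(the real code lives in net/sched/ and looks different):

  #define MAXCH 8

  struct qdisc {
      int qlen;                      /* every qdisc maintains its own */
      struct qdisc *children[MAXCH];
      int nchildren;
  };

  /* A classful parent that wants to know whether it has backlog sums
   * the children's own counters instead of keeping a shadow copy that
   * silent drops (e.g. from a timer inside a child) would put out of
   * sync. */
  static int parent_backlog(const struct qdisc *parent) {
      int i, total = 0;
      for (i = 0; i < parent->nchildren; i++)
          total += parent->children[i]->qlen;
      return total;
  }

That way the parent never has to trust its own guess about what the
children are holding.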

  Don do you plan to implement these corrections for CBQ/PRIO ?

Not any time soon.  The last time I tried to read CBQ it was in order
to figure out what all those parameters mean, and I didn't get very
far.  Now that I no longer use it I have less incentive.

  I'll do the change for HTB.
Thanks.
But which changes?
It sounds like you plan to
- count as sent only what's returned from dequeue
- ignore the return value of enqueue and check the queue length to see
  whether it's empty
I like those, but I also suggested that HTB would be a good place to add
code for dropping stale packets.  If you're interested in doing that, I
should mention:
Not all packets come with timestamps (e.g., locally generated ones).
Solution: at enqueue, if the timestamp is 0, set it to the current
time. 
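
A minimal sketch of that fix (invented names again; in a real qdisc it
would be the skb's timestamp field that gets tested):

  #include <sys/time.h>

  struct pkt { long stamp_usec; };   /* 0 means "never stamped" */

  static long now_usec(void) {
      struct timeval tv;
      gettimeofday(&tv, NULL);
      return tv.tv_sec * 1000000L + tv.tv_usec;
  }

  /* Locally generated packets may arrive with no timestamp; give them
   * one at enqueue so the age check at dequeue always has something
   * to compare against. */
  static void stamp_if_missing(struct pkt *p) {
      if (p->stamp_usec == 0)
          p->stamp_usec = now_usec();
  }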



[LARTC] rp filter questions

2002-05-03 Thread Don Cohen

   The rp_filter is also explained here:
  http://lartc.org/HOWTO//cvs/2.4routing/html/c1182.html#AEN1188
The above says:
  for i in /proc/sys/net/ipv4/conf/*/rp_filter ; do
    echo 1 > $i
  done

First question:
 ls /proc/sys/net/ipv4/conf/*/rp_filter
 =
 /proc/sys/net/ipv4/conf/all/rp_filter
 /proc/sys/net/ipv4/conf/default/rp_filter
 /proc/sys/net/ipv4/conf/eth0/rp_filter
 /proc/sys/net/ipv4/conf/eth1/rp_filter
 /proc/sys/net/ipv4/conf/eth2/rp_filter
 /proc/sys/net/ipv4/conf/lo/rp_filter

What do "all" and "default" do?
Could the loop above be replaced by just one echo?

Second question:
How does the runtime cost of rp_filter compare with that of rules like
iptables -A FORWARD -i eth1 -s ! 10.0.0.0/8 -j DROP

I assume in one case you have to do a route lookup, in the other you
have to iterate over the appropriate rules.  What are these costs?
Ideally the answers should be in terms of variables we know, such as 
the number of rules, the number of rules per interface, the number of
routes, etc.





[LARTC] re: Per-connection routing for multiple uplinks/providers

2002-04-16 Thread Don Cohen

  I have been digging through the Lartc documentation as well as Netfilter,
  etc. and haven't found much on per-connection routing for multiple
  uplinks/providers.
  
  What I would like to do is cleanly move packets out to the Internet over
  two (maybe 3) separate interfaces, utilizing all of the bandwidth, and
  avoiding snags.

What you (and everyone else) would really like is to make your two or
three links act like one link with bandwidth equal to the sum of the
parts.  As long as the links have different IP addresses (which will
surely be the case if they connect to different providers), this
cannot be done.  You can indeed send traffic out in a manner that uses
all of your outgoing bandwidth (assuming your providers don't do the
ingress/egress filtering that they should - a pretty safe bet at the
moment, sad to say).  But this introduces additional problems: packets
you send are now much more likely to arrive out of order, and for that
reason alone it's probably not a good idea.

A more fundamental problem is that the incoming packets cannot share
the different links in the same way.  When you send a packet out you
have to choose one IP address as its source.  The reply will be sent
to that address and will have to arrive on the link with that address.
Thus, for example, if you have two links with the same incoming
bandwidth and only one connection, you can't use more than half of
your total incoming bandwidth for that connection.

  I could use a round-robin scheduler, which would put consecutive packets on
  different interfaces. I think this will run into problems when the reply
  packets come back. Maybe not ??
As long as the providers do no filtering this will work, but it will
also cause packets to arrive out of order, which is bad for
performance.

  I read through Arthur Leeuwen's documentation
  (http://lartc.org/HOWTO//cvs/2.4routing/html/x247.html )
  on a scheme for dividing the outgoing packets on a per-route basis. Packets
  going to the same destination will go through the same interface. This gets
  around the round-robin problem, but I think this is not 'fair' in the sense
  that one interface might accumulate more routes than the other, and there
  does not seem to be a mechanism (other than periodically flushing the route
  tables) for evening out the flows.  It is pretty simple though and I will
  use this as a first chop solution.
Who cares about fairness in the number of routes?  The important thing
is the bandwidth used by those routes.  And you can't balance that,
since, when you choose the route, you don't know what bandwidth that
route will end up using.

  Another approach to the problem would be to do a round-robin on a
  per-connection basis. Each new connection would go out of the 'next'
  interface.
Again, the problem is that when you have to choose, you don't know what
the bandwidth of the connection will be.  You'd do a little better
to measure the bandwidth being used currently on each link and assign
the next connection to the link with the most unused bandwidth.  But
of course, this is still only a poor approximation of what you want.
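
For what it's worth, here's a toy version of that heuristic (all names
and numbers invented):

  #include <stdio.h>

  struct link {
      const char *name;
      double capacity_kbps;     /* nominal link capacity   */
      double in_use_kbps;       /* currently measured load */
  };

  /* Pick the link with the most unused bandwidth.  Still a guess:
   * the new connection's bandwidth is unknown when we choose. */
  static struct link *pick_link(struct link *links, int n) {
      struct link *best = &links[0];
      int i;
      for (i = 1; i < n; i++)
          if (links[i].capacity_kbps - links[i].in_use_kbps >
              best->capacity_kbps - best->in_use_kbps)
              best = &links[i];
      return best;
  }

  int main(void) {
      struct link links[] = { { "eth1", 512.0, 300.0 },
                              { "eth2", 256.0,  20.0 } };
      printf("next connection -> %s\n", pick_link(links, 2)->name);
      return 0;
  }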




Re: [LARTC] Per-connection routing for multiple uplinks/providers

2002-04-16 Thread Don Cohen


Bob Gustafson writes:
  But, But, - this is really just software. We are not trying to cram wine
  bottles down the internet pipe (although many would really like to do
  that!).

The limitations I point out are inherent in tcp/ip.  I think I sent a
proposal to this list describing a modification to tcp that would
allow one connection to use many ip addresses (for each endpoint).
That would allow substantial improvement, since you would be able to
switch addresses in mid stream (in a live connection).  It would not
solve all of the problems.  In particular, you would not be able to
efficiently use both/all addresses at once because tcp has been
adapted to work well in the case where packets arrive in order.  That
could also perhaps be overcome with changes to tcp.  Note, however,
that these changes would only help you in cases where both machines
are using the modified versions.

  From the requestee point of view, I know how much bandwidth I need to
  listen to the BBC newscast, or to a company conference call. I can also
  request email and ftp sessions to work in the 'background' at a lower
  bandwidth allocation (cost?), but if I am talking to someone interactively,
  it would be nice if my packets were transferred at a regular rate without
  jitter or delay. IP doesn't do this, and one can argue that it cannot. But,
  the whole thing is run by software and software can change.

All of the things above can already be done on a single link.
What cannot be done is to make two links work like one with the
sum of their bandwidths.