[PATCH 2/2] [ACKVEC]: Schedule Sync as out-of-band mechanism

2008-02-25 Thread Gerrit Renker
The problem with Ack Vectors is that 

  i) their length is variable and can in principle grow quite large,
 ii) it is hard to predict exactly how large they will be.

Due to the second point, reducing the MPS by a worst-case length does not seem
a good idea; in particular when on average there is enough room for the Ack
Vector and an increase in length is only momentary (e.g. due to some burst
loss), after which the Ack Vector returns to its normal/average length.

The solution taken by this patch is to subtract a minimum-expected Ack Vector
length from the MPS (previous patch), and to defer any larger Ack Vectors onto
a separate Sync - but only if indeed there is no space left on the skb.

This patch provides the infrastructure to schedule Sync-packets for transporting
(urgent) out-of-band data. Its signalling is quicker than scheduling an Ack,
since it does not need to wait for new application data.

It can thus serve other parts of the DCCP code as well.
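
The placement rule described above can be sketched in isolation as follows. This
is a minimal userspace illustration, not the kernel code: the enum, the function
name ackvec_place() and its parameters are hypothetical; only DCCPAV_MIN_OPTLEN
(16) and the two-part condition are taken from this patch series.

```c
#include <assert.h>
#include <stdint.h>

/* Value taken from patch 1/2 of this series. */
#define DCCPAV_MIN_OPTLEN 16

/* Hypothetical outcome names, for illustration only. */
enum ackvec_action {
	ACKVEC_INSERT,		/* Ack Vector fits: put it on this packet */
	ACKVEC_DEFER_TO_SYNC	/* no room: carry it on a separate Sync   */
};

/*
 * Defer only when the Ack Vector option is longer than the headroom that was
 * already subtracted from the MPS *and* the packet would exceed the MPS.
 */
static inline enum ackvec_action ackvec_place(uint32_t av_len, uint32_t opt_len,
					      uint32_t payload_len, uint32_t mps)
{
	assert(mps > 0);
	if (av_len > DCCPAV_MIN_OPTLEN &&
	    av_len + opt_len + payload_len > mps)
		return ACKVEC_DEFER_TO_SYNC;
	return ACKVEC_INSERT;
}
```

An Ack Vector of at most DCCPAV_MIN_OPTLEN bytes is never deferred in this
sketch, since that much room has already been reserved when computing the MPS.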

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 include/linux/dccp.h |2 ++
 net/dccp/options.c   |   24 
 net/dccp/output.c|8 
 3 files changed, 30 insertions(+), 4 deletions(-)

--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -475,6 +475,7 @@ struct dccp_ackvec;
  * @dccps_hc_rx_insert_options - receiver wants to add options when acking
  * @dccps_hc_tx_insert_options - sender wants to add options when sending
  * @dccps_server_timewait - server holds timewait state on close (RFC 4340, 8.3)
+ * @dccps_sync_scheduled - flag which signals "send out-of-band message soon"
  * @dccps_xmit_timer - timer for when CCID is not ready to send
  * @dccps_syn_rtt - RTT sample from Request/Response exchange (in usecs)
  */
@@ -515,6 +516,7 @@ struct dccp_sock {
    __u8    dccps_hc_rx_insert_options:1;
    __u8    dccps_hc_tx_insert_options:1;
    __u8    dccps_server_timewait:1;
+   __u8    dccps_sync_scheduled:1;
struct timer_list   dccps_xmit_timer;
 };
 
--- a/net/dccp/options.c
+++ b/net/dccp/options.c
@@ -428,6 +428,7 @@ int dccp_insert_option_ackvec(struct soc
 {
struct dccp_sock *dp = dccp_sk(sk);
struct dccp_ackvec *av = dp->dccps_hc_rx_ackvec;
+   struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb);
const u16 buflen = dccp_ackvec_buflen(av);
/* Figure out how many options do we need to represent the ackvec */
const u16 nr_opts = DIV_ROUND_UP(buflen, DCCP_SINGLE_OPT_MAXLEN);
@@ -436,10 +437,25 @@ int dccp_insert_option_ackvec(struct soc
const unsigned char *tail, *from;
unsigned char *to;
 
-   if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN)
+   if (dcb->dccpd_opt_len + len > DCCP_MAX_OPT_LEN) {
+   DCCP_WARN("Lacking space for %u bytes on %s packet\n", len,
+ dccp_packet_name(dcb->dccpd_type));
return -1;
-
-   DCCP_SKB_CB(skb)->dccpd_opt_len += len;
+   }
+   /*
+* Since Ack Vectors are variable-length, we can not always predict
+* their size. To catch exception cases where the space is running out
+* on the skb, a separate Sync is scheduled to carry the Ack Vector.
+*/
+   if (len > DCCPAV_MIN_OPTLEN &&
+   len + dcb->dccpd_opt_len + skb->len > dp->dccps_mss_cache) {
+   DCCP_WARN("No space left for Ack Vector (%u) on skb (%u+%u), "
+ "MPS=%u ==> reduce payload size?\n", len, skb->len,
+ dcb->dccpd_opt_len, dp->dccps_mss_cache);
+   dp->dccps_sync_scheduled = 1;
+   return 0;
+   }
+   dcb->dccpd_opt_len += len;
 
to   = skb_push(skb, len);
len  = buflen;
@@ -480,7 +496,7 @@ int dccp_insert_option_ackvec(struct soc
/*
 * Each sent Ack Vector is recorded in the list, as per A.2 of RFC 4340.
 */
-   if (dccp_ackvec_update_records(av, DCCP_SKB_CB(skb)->dccpd_seq, nonce))
+   if (dccp_ackvec_update_records(av, dcb->dccpd_seq, nonce))
return -ENOBUFS;
return 0;
 }
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -292,6 +292,8 @@ void dccp_write_xmit(struct sock *sk, in
if (err)
DCCP_BUG("err=%d after ccid_hc_tx_packet_sent",
 err);
+   if (dp->dccps_sync_scheduled)
+   dccp_send_sync(sk, dp->dccps_gsr, DCCP_PKT_SYNC);
} else {
dccp_pr_debug("packet discarded due to err=%d\n", err);
kfree_skb(skb);
@@ -564,6 +566,12 @@ void dccp_send_sync(struct sock *sk, con
DCCP_SKB_CB(skb)->dccpd_type = pkt_ty

[PATCH 1/2] [DCCP]: Leave headroom for options when calculating the MPS

2008-02-25 Thread Gerrit Renker
The Maximum Packet Size (MPS) is of interest for applications which want
to transfer data, so it is only relevant to the data transfer phase of a
connection (unless one wants to send data on the DCCP-Request, but that is
not considered here).

The strategy chosen to deal with this requirement is to leave room only for
those options that may appear on data packets.

A special consideration applies to Ack Vectors: this is purely guesswork,
since these can have any length between 3 and 1020 bytes. The strategy
chosen here is to subtract a configurable minimum; the value of 16 bytes
(2 bytes for type/length plus 14 Ack Vector cells) was found by
experimentation. If people find this too much or too little, it could
later be turned into a Kconfig option.

There are currently no CCID-specific header options which may appear on data
packets, hence it is not necessary to define a corresponding CCID field.
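
The headroom arithmetic can be sketched as a small standalone function. This is
an illustration under stated assumptions: round_up_to() and
dccp_option_headroom() are hypothetical names, and the per-option byte counts
are the ones listed in the code comment of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DCCPAV_MIN_OPTLEN 16	/* estimated minimum Ack Vector option length */

/* Round x up to the next multiple of align (align must be non-zero). */
static inline uint32_t round_up_to(uint32_t x, uint32_t align)
{
	assert(align != 0);
	return ((x + align - 1) / align) * align;
}

/* Option bytes to subtract from the MPS, padded to a multiple of 4. */
static inline uint32_t dccp_option_headroom(bool send_ndp_count,
					    bool ackvec_enabled)
{
	uint32_t len = 1	/* Slow Receiver  (11.6) */
		     + 6	/* Timestamp      (13.1) */
		     + 10	/* Timestamp Echo (13.3) */
		     + 6;	/* Data Checksum  (9.3)  */

	if (send_ndp_count)
		len += 8;	/* NDP count (7.7) */
	if (ackvec_enabled)
		len += DCCPAV_MIN_OPTLEN;
	return round_up_to(len, 4);
}
```

With neither NDP count nor Ack Vectors in use the sketch reserves 24 bytes; with
both enabled it reserves 48 bytes.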

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Acked-by: Ian McDonald <[EMAIL PROTECTED]>
---
 net/dccp/ackvec.h |3 +++
 net/dccp/output.c |   22 ++
 2 files changed, 17 insertions(+), 8 deletions(-)

--- a/net/dccp/ackvec.h
+++ b/net/dccp/ackvec.h
@@ -21,6 +21,9 @@
 /* We can spread an ack vector across multiple options */
 #define DCCP_MAX_ACKVEC_LEN (DCCP_SINGLE_OPT_MAXLEN * 2)
 
+/* Estimated minimum average Ack Vector length - used for updating MPS */
+#define DCCPAV_MIN_OPTLEN  16
+
 #define DCCP_ACKVEC_STATE_RECEIVED 0
 #define DCCP_ACKVEC_STATE_ECN_MARKED   (1 << 6)
 #define DCCP_ACKVEC_STATE_NOT_RECEIVED (3 << 6)
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -153,21 +153,27 @@ unsigned int dccp_sync_mss(struct sock *
struct inet_connection_sock *icsk = inet_csk(sk);
struct dccp_sock *dp = dccp_sk(sk);
u32 ccmps = dccp_determine_ccmps(dp);
-   int cur_mps = ccmps ? min(pmtu, ccmps) : pmtu;
+   u32 cur_mps = ccmps ? min(pmtu, ccmps) : pmtu;
 
/* Account for header lengths and IPv4/v6 option overhead */
cur_mps -= (icsk->icsk_af_ops->net_header_len + icsk->icsk_ext_hdr_len +
sizeof(struct dccp_hdr) + sizeof(struct dccp_hdr_ext));
 
/*
-* FIXME: this should come from the CCID infrastructure, where, say,
-* TFRC will say it wants TIMESTAMPS, ELAPSED time, etc, for now lets
-* put a rough estimate for NDP + TIMESTAMP + TIMESTAMP_ECHO + ELAPSED
-* TIME + TFRC_OPT_LOSS_EVENT_RATE + TFRC_OPT_RECEIVE_RATE + padding to
-* make it a multiple of 4
+* Leave enough headroom for common DCCP header options.
+* This only considers options which may appear on DCCP-Data packets, as
+* per table 3 in RFC 4340, 5.8. When running out of space for other
+* options (eg. Ack Vector which can take up to 255 bytes), it is better
+* to schedule a separate Ack. Thus we leave headroom for the following:
+*  - 1 byte for Slow Receiver (11.6)
+*  - 6 bytes for Timestamp (13.1)
+*  - 10 bytes for Timestamp Echo (13.3)
+*  - 8 bytes for NDP count (7.7, when activated)
+*  - 6 bytes for Data Checksum (9.3)
+*  - %DCCPAV_MIN_OPTLEN bytes for Ack Vector size (11.4, when enabled)
 */
-
-   cur_mps -= ((5 + 6 + 10 + 6 + 6 + 6 + 3) / 4) * 4;
+   cur_mps -= roundup(1 + 6 + 10 + dp->dccps_send_ndp_count * 8 + 6 +
+  (dp->dccps_hc_rx_ackvec ? DCCPAV_MIN_OPTLEN : 0), 4);
 
/* And store cached results */
icsk->icsk_pmtu_cookie = pmtu;
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[DCCP] [Patch v2 0/2] [Revision]: Fix Ack Vector MPS Minimum Value

2008-02-25 Thread Gerrit Renker
This is a resubmission to fix a problem with accounting for Ack Vector
length in the MPS.

The present solution did not work well: the MPS did not account for Ack Vectors,
so that applications which relied on the MPS value via getopt were disadvantaged
by having all their Ack Vectors put onto Syncs -- sorely degrading performance.

This was found by testing with the gstreamer DCCP plugin.

Hence the revision of these two patches implements a new strategy -- it now
 * subtracts an estimated minimum from the MPS - 
   currently set to 16 bytes (found via experimentation) and
 * schedules a Sync only if the actual length is greater than
   this minimum _and_ there is no space left on the skb.

Patch #1: Is the revised version of the account-for-option-sizes-in-MPS patch.
Patch #2: Is the revised "exception handler" for overly large Ack Vectors.

Both patches have been uploaded to the test tree on
git://eden-feed.erg.abdn.ac.uk/dccp_exp [dccp]


[PATCH 02/10] [ACKVEC]: Ack Vector interface clean-up

2008-02-21 Thread Gerrit Renker
This patch brings the Ack Vector interface up to date. Its main purpose is
to lay the basis for the subsequent patches of this set, which will use the
new data structure fields and routines.

There are no real algorithmic changes, rather an adaptation:

 (1) Replaced the static Ack Vector size (2) with a #define so that it can
 be adapted (with low loss / Ack Ratio, a value of 1 works, so 2 seems
 to be sufficient for the moment) and added a solution so that computing
 the ECN nonce will continue to work - even with larger Ack Vectors.

 (2) Replaced the #defines for Ack Vector states with a complete enum.

 (3) Replaced #defines to compute Ack Vector length and state with general
 purpose routines (inlines), and updated code to use these.

 (4) Added a `tail' field (conversion to circular buffer in subsequent patch).

 (5) Updated the (outdated) documentation for Ack Vector struct.

 (6) All sequence number containers now trimmed to 48 bits.

 (7) Removal of unused bits:
 * removed dccpav_ack_nonce from struct dccp_ackvec, since this is already
   redundantly stored in the `dccpavr_ack_nonce' (of Ack Vector record);
 * removed unused function to print Ack Vectors;
 * removed Elapsed Time for Ack Vectors (it was nowhere used);
 * replaced semantics of dccpavr_sent_len with dccpavr_ack_runlen, since
   the code needs to be able to remember the old run length;
 * reduced the de-/allocation routines (redundant / duplicate tests).
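
As an illustration of items (2) and (3), the state and run length of a single
Ack Vector cell can be extracted with two small inlines. This is a sketch with
hypothetical names (av_cell_state(), av_cell_runlen(), av_make_cell()); it
assumes the cell layout of RFC 4340, 11.4 (two state bits followed by a 6-bit
run length) and the state values that appear in the diffs of this series.

```c
#include <assert.h>
#include <stdint.h>

/* Upper two bits of a cell encode the state (RFC 4340, 11.4). */
enum ackvec_state {
	AV_RECEIVED	= 0x00,
	AV_ECN_MARKED	= 0x40,
	AV_NOT_RECEIVED	= 0xC0
};

#define AV_STATE_MASK	0xC0
#define AV_MAX_RUNLEN	0x3F	/* lower six bits hold the run length */

static inline uint8_t av_cell_state(uint8_t cell)
{
	return cell & AV_STATE_MASK;
}

static inline uint8_t av_cell_runlen(uint8_t cell)
{
	return cell & AV_MAX_RUNLEN;
}

/* Build a cell from a state and a run length of at most AV_MAX_RUNLEN. */
static inline uint8_t av_make_cell(enum ackvec_state state, uint8_t runlen)
{
	assert(runlen <= AV_MAX_RUNLEN);
	return (uint8_t)(state | runlen);
}
```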


Justification for removing Elapsed Time information [can be removed]:
---------------------------------------------------------------------
 1. The Elapsed Time information for Ack Vectors was nowhere used in the code.
 2. DCCP does not implement rate-based pacing of acknowledgments. The only
recommendation for always including Elapsed Time is in section 11.3 of
RFC 4340: "Receivers that rate-pace acknowledgements SHOULD [...]
include Elapsed Time options". But such is not the case here.
 3. It does not really improve estimation accuracy. The Elapsed Time field only
records the time between the arrival of the last acknowledgeable packet and
the time the Ack Vector is sent out. Since Linux does not (yet) implement
delayed Acks, the time difference will typically be small, since often the
arrival of a data packet triggers sending feedback at the HC-receiver.


Justification for changes in de-/allocation routines [can be removed]:
----------------------------------------------------------------------
  * INIT_LIST_HEAD in dccp_ackvec_record_new was redundant, since the list
pointers were later overwritten when the node was added via list_add();
  * dccp_ackvec_record_new() was called in a single place only;
  * calls to list_del_init() before calling dccp_ackvec_record_delete() were
redundant, since subsequently the entire element was k-freed;
  * since all calls to dccp_ackvec_record_delete() were preceded by a call to
list_del_init(), the WARN_ON test would never evaluate to true;
  * since all calls to dccp_ackvec_record_delete() were made from within
list_for_each_entry_safe(), the test for avr == NULL was redundant;
  * list_empty() in ackvec_free was redundant, since the same condition is
embedded in the loop condition of the subsequent list_for_each_entry_safe().


Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ackvec.c  |  205 +++
 net/dccp/ackvec.h  |  103 +---
 net/dccp/ccids/ccid2.c |   13 +--
 net/dccp/input.c   |6 +-
 4 files changed, 129 insertions(+), 198 deletions(-)

--- a/net/dccp/ackvec.h
+++ b/net/dccp/ackvec.h
@@ -3,83 +3,92 @@
 /*
  *  net/dccp/ackvec.h
  *
- *  An implementation of the DCCP protocol
+ *  An implementation of Ack Vectors for the DCCP protocol
+ *  Copyright (c) 2007 University of Aberdeen, Scotland, UK
  *  Copyright (c) 2005 Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
- *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
 
 #include 
-#include 
 #include 
 #include 
 
 /* maximum size of a single TLV-encoded option (sans type/len bytes) */
 #define DCCP_SINGLE_OPT_MAXLEN  253
-/* We can spread an ack vector across multiple options */
-#define DCCP_MAX_ACKVEC_LEN (DCCP_SINGLE_OPT_MAXLEN * 2)
+/*
+ * Ack Vector buffer space is static, in multiples of %DCCP_SINGLE_OPT_MAXLEN,
+ * the maximum size of a single Ack Vector. Setting %DCCPAV_NUM_ACKVECS to 1
+ * will be sufficient for most cases of low Ack Ratios, using a value of 2 gives
+ * more headroom if Ack Ratio is higher or when the sender acknowledges slowly.
+ */
+#define DCCPAV_NUM_ACKVECS 2
+#define DCCPAV_MAX_ACKVEC_LEN  (DCCP_SINGLE_OPT_MAXLEN * DCCPAV_NUM_ACKVECS)
+
+enum dccp_ackvec_states {
+   

[PATCH 09/10] [ACKVEC]: Remove old infrastructure

2008-02-21 Thread Gerrit Renker
This removes
 * functions for which updates have been provided in the preceding patches and
 * the @av_vec_len field - it is no longer necessary since the buffer length is
   now always computed dynamically;
 * conditional debugging code (CONFIG_IP_DCCP_ACKVEC).

The reason for removing the conditional debugging code is that Ack Vectors are
an almost inevitable necessity - RFC 4341 says that for CCID-2, Ack Vectors must
be used. Furthermore, the code is only of interest during development; after
extensive testing with this patch set, having the debug code around is no longer
of real help.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/Kconfig   |3 -
 net/dccp/Makefile  |5 +-
 net/dccp/ackvec.c  |  252 
 net/dccp/ackvec.h  |   79 +---
 net/dccp/ccids/Kconfig |1 -
 5 files changed, 3 insertions(+), 337 deletions(-)

--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -9,19 +9,10 @@
  *  under the terms of the GNU General Public License as published by the
  *  Free Software Foundation; version 2 of the License;
  */
-
-#include "ackvec.h"
 #include "dccp.h"
-
-#include 
-#include 
-#include 
 #include 
-#include 
 #include 
 
-#include 
-
 static struct kmem_cache *dccp_ackvec_slab;
 static struct kmem_cache *dccp_ackvec_record_slab;
 
@@ -284,249 +275,6 @@ void dccp_ackvec_input(struct dccp_ackvec *av, u64 seqno, u8 state)
}
 }
 
-/*
- * If several packets are missing, the HC-Receiver may prefer to enter multiple
- * bytes with run length 0, rather than a single byte with a larger run length;
- * this simplifies table updates if one of the missing packets arrives.
- */
-static inline int dccp_ackvec_set_buf_head_state(struct dccp_ackvec *av,
-const unsigned int packets,
-const unsigned char state)
-{
-   unsigned int gap;
-   long new_head;
-
-   if (av->av_vec_len + packets > DCCPAV_MAX_ACKVEC_LEN)
-   return -ENOBUFS;
-
-   gap  = packets - 1;
-   new_head = av->av_buf_head - packets;
-
-   if (new_head < 0) {
-   if (gap > 0) {
-   memset(av->av_buf, DCCPAV_NOT_RECEIVED,
-  gap + new_head + 1);
-   gap = -new_head;
-   }
-   new_head += DCCPAV_MAX_ACKVEC_LEN;
-   }
-
-   av->av_buf_head = new_head;
-
-   if (gap > 0)
-   memset(av->av_buf + av->av_buf_head + 1,
-  DCCPAV_NOT_RECEIVED, gap);
-
-   av->av_buf[av->av_buf_head] = state;
-   av->av_vec_len += packets;
-   return 0;
-}
-
-/*
- * Implements the RFC 4340, Appendix A
- */
-int dccp_ackvec_add(struct dccp_ackvec *av, const struct sock *sk,
-   const u64 ackno, const u8 state)
-{
-   u8 *cur_head = av->av_buf + av->av_buf_head,
-  *buf_end  = av->av_buf + DCCPAV_MAX_ACKVEC_LEN;
-   /*
-* Check at the right places if the buffer is full, if it is, tell the
-* caller to start dropping packets till the HC-Sender acks our ACK
-* vectors, when we will free up space in av_buf.
-*
-* We may well decide to do buffer compression, etc, but for now lets
-* just drop.
-*
-* From Appendix A.1.1 (`New Packets'):
-*
-*  Of course, the circular buffer may overflow, either when the
-*  HC-Sender is sending data at a very high rate, when the
-*  HC-Receiver's acknowledgements are not reaching the HC-Sender,
-*  or when the HC-Sender is forgetting to acknowledge those acks
-*  (so the HC-Receiver is unable to clean up old state). In this
-*  case, the HC-Receiver should either compress the buffer (by
-*  increasing run lengths when possible), transfer its state to
-*  a larger buffer, or, as a last resort, drop all received
-*  packets, without processing them whatsoever, until its buffer
-*  shrinks again.
-*/
-
-   /* See if this is the first ackno being inserted */
-   if (av->av_vec_len == 0) {
-   *cur_head = state;
-   av->av_vec_len = 1;
-   } else if (after48(ackno, av->av_buf_ackno)) {
-   const u64 delta = dccp_delta_seqno(av->av_buf_ackno, ackno);
-
-   /*
-* Look if the state of this packet is the same as the
-* previous ackno and if so if we can bump the head len.
-*/
-   if (delta == 1 && dccp_ackvec_state(cur_head) == state &&
-   dccp_ackvec_runlen(cur_head) < DCCPAV_MAX_RUNLEN)
-   *cur_head += 1;
-  

[PATCH 10/10] [ACKVEC]: Separate option parsing from CCID processing

2008-02-21 Thread Gerrit Renker
This patch removes a near-identical duplication of code: large parts
of dccp_parse_options() re-appeared as ccid2_ackvector() in ccid2.c.

Apart from the duplication, this caused two more problems:
 1. CCIDs should not need to be concerned with parsing header options;
 2. one cannot assume that Ack Vectors appear as a contiguous area within an
skb; it is legal to insert other options and/or padding in between. The
current code would throw an error and stop reading in such a case.

The patch provides a new data structure and associated list housekeeping.

Only small changes were necessary to integrate with CCID-2: data structure
initialisation, adapt list traversal routine, and add call to the provided
cleanup routine.

The latter also led to fixing the following BUG: CCID-2 so far ignored
Ack Vectors on all packets other than Ack/DataAck, which is incorrect,
since Ack Vectors can be present on any packet that has an Ack field.

Details:

 * received Ack Vectors are parsed by dccp_parse_options() alone, which passes
   the result on to the CCID-specific routine ccid_hc_tx_parse_options();
 * CCIDs interested in using/decoding Ack Vector information will add code
   to fetch parsed Ack Vectors via this interface;
 * a data structure, `struct dccp_ackvec_parsed' is provided as interface;
 * this structure arranges Ack Vectors of the same skb into a FIFO order;
 * a doubly-linked list is used to keep the required FIFO code small.
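
To illustrate why parsing must tolerate non-contiguous Ack Vectors, the
following userspace sketch walks an options area and collects every Ack Vector
option while skipping padding and unrelated options. It is not the kernel
parser: collect_ackvecs(), struct av_chunk and the fixed-size output array are
hypothetical. It assumes the RFC 4340 option encoding (types below 32 are
single-byte options; all others carry a length byte that counts the type and
length bytes) and the Ack Vector option types 38 (nonce 0) and 39 (nonce 1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_ACK_VECTOR_0 38	/* Ack Vector, ECN nonce 0 */
#define OPT_ACK_VECTOR_1 39	/* Ack Vector, ECN nonce 1 */
#define MAX_CHUNKS	 8	/* arbitrary limit for this sketch */

struct av_chunk {
	const uint8_t *vec;	/* start of the vector inside the options */
	uint8_t len;		/* number of cells */
	uint8_t nonce;		/* 0 or 1 */
};

/* Returns the number of chunks stored in @out, or -1 on malformed input. */
static inline int collect_ackvecs(const uint8_t *opt, size_t optlen,
				  struct av_chunk *out, int max)
{
	size_t i = 0;
	int n = 0;

	assert(opt != NULL && out != NULL);
	while (i < optlen) {
		uint8_t type = opt[i], len;

		if (type < 32) {	/* single-byte option, e.g. Padding */
			i++;
			continue;
		}
		if (i + 1 >= optlen)
			return -1;
		len = opt[i + 1];
		if (len < 2 || i + len > optlen)
			return -1;
		if (type == OPT_ACK_VECTOR_0 || type == OPT_ACK_VECTOR_1) {
			if (n == max)
				return -1;
			out[n].vec   = opt + i + 2;
			out[n].len   = (uint8_t)(len - 2);
			out[n].nonce = type & 1;
			n++;		/* chunks stay in arrival (FIFO) order */
		}
		i += len;
	}
	return n;
}
```

The chunks come out in the order in which they appeared in the packet, which is
the FIFO property the details above ask for.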

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ackvec.c  |   28 ++
 net/dccp/ackvec.h  |   19 +++
 net/dccp/ccids/ccid2.c |  136 +++-
 net/dccp/ccids/ccid2.h |2 +
 net/dccp/options.c |   17 ---
 5 files changed, 102 insertions(+), 100 deletions(-)

--- a/net/dccp/ackvec.h
+++ b/net/dccp/ackvec.h
@@ -112,4 +112,23 @@ static inline bool dccp_ackvec_is_empty(const struct dccp_ackvec *av)
 {
return av->av_overflow == 0 && av->av_buf_head == av->av_buf_tail;
 }
+
+/**
+ * struct dccp_ackvec_parsed  -  Record offsets of Ack Vectors in skb
+ * @vec:   start of vector (offset into skb)
+ * @len:   length of @vec
+ * @nonce: whether @vec had an ECN nonce of 0 or 1
+ * @node:  FIFO - arranged in descending order of ack_ackno
+ * This structure is used by CCIDs to access Ack Vectors in a received skb.
+ */
+struct dccp_ackvec_parsed {
+   u8   *vec,
+len,
+nonce:1;
+   struct list_head node;
+};
+
+extern int dccp_ackvec_parsed_add(struct list_head *head,
+ u8 *vec, u8 len, u8 nonce);
+extern void dccp_ackvec_parsed_cleanup(struct list_head *parsed_chunks);
 #endif /* _ACKVEC_H */
--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -345,6 +345,34 @@ free_records:
}
 }
 
+/*
+ * Routines to keep track of Ack Vectors received in an skb
+ */
+int dccp_ackvec_parsed_add(struct list_head *head, u8 *vec, u8 len, u8 nonce)
+{
+   struct dccp_ackvec_parsed *new = kmalloc(sizeof(*new), GFP_ATOMIC);
+
+   if (new == NULL)
+   return -ENOBUFS;
+   new->vec   = vec;
+   new->len   = len;
+   new->nonce = nonce;
+
+   list_add_tail(&new->node, head);
+   return 0;
+}
+EXPORT_SYMBOL_GPL(dccp_ackvec_parsed_add);
+
+void dccp_ackvec_parsed_cleanup(struct list_head *parsed_chunks)
+{
+   struct dccp_ackvec_parsed *cur, *next;
+
+   list_for_each_entry_safe(cur, next, parsed_chunks, node)
+   kfree(cur);
+   INIT_LIST_HEAD(parsed_chunks);
+}
+EXPORT_SYMBOL_GPL(dccp_ackvec_parsed_cleanup);
+
 int __init dccp_ackvec_init(void)
 {
dccp_ackvec_slab = kmem_cache_create("dccp_ackvec",
--- a/net/dccp/options.c
+++ b/net/dccp/options.c
@@ -135,13 +135,6 @@ int dccp_parse_options(struct sock *sk, struct dccp_request_sock *dreq,
if (rc)
goto out_invalid_option;
break;
-   case DCCPO_ACK_VECTOR_0:
-   case DCCPO_ACK_VECTOR_1:
-   if (dccp_packet_without_ack(skb))   /* RFC 4340, 11.4 */
-   break;
-   dccp_pr_debug("%s Ack Vector (len=%u)\n", dccp_role(sk),
- len);
-   break;
case DCCPO_TIMESTAMP:
if (len != 4)
goto out_invalid_option;
@@ -229,6 +222,16 @@ int dccp_parse_options(struct sock *sk, struct dccp_request_sock *dreq,
 value) != 0)
goto out_invalid_option;
break;
+   case DCCPO_ACK_VECTOR_0:
+   case DCCPO_ACK_VECTOR_1:
+   if (dccp_packet_without_ack(sk

[PATCH 04/10] [ACKVEC]: Implementation of circular Ack Vector buffer with overflow handling

2008-02-21 Thread Gerrit Renker
This completes the implementation of a circular buffer for Ack Vectors, by
extending the current (linear array-based) implementation.  The changes are:

 (a) An `overflow' flag to deal with the case of overflow. As before, dynamic
 growth of the buffer will not be supported; but code will be added to deal
 robustly with overflowing Ack Vector buffers.

 (b) A `tail_seqno' field. When naively implementing the algorithm of Appendix A
 in RFC 4340, problems arise whenever subsequent Ack Vector records overlap,
 which can bring the entire run length calculation completely out of synch.
 (This is documented on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/\
 ack_vectors/tracking_tail_ackno/ .)

 (c) The buffer length is now computed dynamically (i.e. current fill level),
 as the span from head to tail.

As a result, dccp_ackvec_pending() is now simpler - the #ifdef is no longer
necessary since buf_empty is always true when IP_DCCP_ACKVEC is not configured.
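
A minimal sketch of the dynamic length computation and the emptiness test is
given below. The names (struct av_ring, av_idx_sub(), av_buflen(),
av_is_empty()) are hypothetical; the emptiness condition is the one shown in the
diff, while returning the full buffer length in the overflow case is an
assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AV_BUF_LEN 506	/* 2 * 253, as DCCPAV_MAX_ACKVEC_LEN in this series */

struct av_ring {
	uint16_t head;		/* begin of the live portion             */
	uint16_t tail;		/* first index after the live portion    */
	bool	 overflow;	/* if set, head == tail means "full"     */
};

/* Index difference a - b in modulo-buffersize arithmetic. */
static inline uint16_t av_idx_sub(uint16_t a, uint16_t b)
{
	assert(a < AV_BUF_LEN && b < AV_BUF_LEN);
	return (uint16_t)((AV_BUF_LEN + a - b) % AV_BUF_LEN);
}

/* Current fill level: the span from head to tail. */
static inline uint16_t av_buflen(const struct av_ring *r)
{
	if (r->overflow)	/* assumption: an overflowed buffer is full */
		return AV_BUF_LEN;
	return av_idx_sub(r->tail, r->head);
}

static inline bool av_is_empty(const struct av_ring *r)
{
	return !r->overflow && r->head == r->tail;
}
```

Since head moves from right to left, a head of 500 and a tail of 4 describe a
live portion that wraps around the end of the array and is 10 cells long.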

Note on overflow handling:
--------------------------
 The Ack Vector code previously simply started to drop packets when the
 Ack Vector buffer overflowed. This means that the userspace application
 will not be able to receive, only because of an Ack Vector storage problem.

 Furthermore, overflow may be transient, so that applications may later
 recover from the overflow. Recovering from dropped packets is more difficult
 (e.g. video key frames).

 Hence the patch uses a different policy: when the buffer overflows, the oldest
 entries are subsequently overwritten. This has a higher chance of recovery.
 Details are on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ack_vectors/

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ackvec.c  |   33 +++--
 net/dccp/ackvec.h  |   17 ++---
 net/dccp/dccp.h|   14 +++---
 net/dccp/options.c |   12 ++--
 4 files changed, 58 insertions(+), 18 deletions(-)

--- a/net/dccp/ackvec.h
+++ b/net/dccp/ackvec.h
@@ -22,6 +22,7 @@
  * the maximum size of a single Ack Vector. Setting %DCCPAV_NUM_ACKVECS to 1
  * will be sufficient for most cases of low Ack Ratios, using a value of 2 gives
  * more headroom if Ack Ratio is higher or when the sender acknowledges slowly.
+ * The maximum value is bounded by the u16 types for indices and functions.
  */
 #define DCCPAV_NUM_ACKVECS 2
 #define DCCPAV_MAX_ACKVEC_LEN  (DCCP_SINGLE_OPT_MAXLEN * DCCPAV_NUM_ACKVECS)
@@ -53,8 +54,10 @@ static inline u8 dccp_ackvec_state(const u8 *cell)
  * @av_buf_head:   head index; begin of live portion in @av_buf
  * @av_buf_tail:   tail index; first index _after_ the live portion in @av_buf
  * @av_buf_ackno:  highest seqno of acknowledgeable packet recorded in @av_buf
+ * @av_tail_ackno: lowest  seqno of acknowledgeable packet recorded in @av_buf
  * @av_buf_nonce:  ECN nonce sums, each covering subsequent segments of up to
  *%DCCP_SINGLE_OPT_MAXLEN cells in the live portion of @av_buf
+ * @av_overflow:   if 1 then buf_head == buf_tail indicates buffer wraparound
  * @av_records:   list of %dccp_ackvec_record (Ack Vectors sent previously)
  * @av_veclen:length of the live portion of @av_buf
  */
@@ -63,7 +66,9 @@ struct dccp_ackvec {
u16 av_buf_head;
u16 av_buf_tail;
u64 av_buf_ackno:48;
+   u64 av_tail_ackno:48;
    bool            av_buf_nonce[DCCPAV_NUM_ACKVECS];
+   u8              av_overflow:1;
    struct list_head av_records;
u16 av_vec_len;
 };
@@ -111,10 +116,11 @@ extern int dccp_ackvec_parse(struct sock *sk, const struct sk_buff *skb,
 const u8 *value, const u8 len);
 
 extern int  dccp_ackvec_update_records(struct dccp_ackvec *av, u64 seq, u8 sum);
+extern u16  dccp_ackvec_buflen(const struct dccp_ackvec *av);
 
-static inline int dccp_ackvec_pending(const struct dccp_ackvec *av)
+static inline bool dccp_ackvec_is_empty(const struct dccp_ackvec *av)
 {
-   return av->av_vec_len;
+   return av->av_overflow == 0 && av->av_buf_head == av->av_buf_tail;
 }
 #else /* CONFIG_IP_DCCP_ACKVEC */
 static inline int dccp_ackvec_init(void)
@@ -158,9 +164,14 @@ static inline int dccp_ackvec_update_records(struct dccp_ackvec *av, u64 seq, u8
return -1;
 }
 
-static inline int dccp_ackvec_pending(const struct dccp_ackvec *av)
+static inline u16 dccp_ackvec_buflen(const struct dccp_ackvec *av)
 {
return 0;
 }
+
+static inline bool dccp_ackvec_is_empty(const struct dccp_ackvec *av)
+{
+   return true;
+}
 #endif /* CONFIG_IP_DCCP_ACKVEC */
 #endif /* _ACKVEC_H */
--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -30,8 +30,8 @@ struct dccp_ackvec *dccp_ackvec_alloc(const gfp_t priority)
struct dccp_ackvec *av

[PATCH 05/10] [ACKVEC]: Algorithm to update buffer state

2008-02-21 Thread Gerrit Renker
This provides a routine to consistently update the buffer state when the
peer acknowledges receipt of Ack Vectors; updating state in the list of Ack
Vectors as well as in the circular buffer.

While based on RFC 4340, several additional (and necessary) precautions were
added to protect the consistency of the buffer state. These additions are
essential, since analysis and experience showed that the basic algorithm was
insufficient for this task (which led to problems that were hard to debug).

The algorithm now
 * deals with HC-sender acknowledging to HC-receiver and vice versa,
 * keeps track of the last unacknowledged but received seqno in tail_ackno,
 * has special cases to reset the overflow condition when appropriate,
 * is protected against receiving older information (would mess up buffer
   state).

NOTE: The older code performed an unnecessary step, where the sender cleared
Ack Vector state by parsing the Ack Vector received by the HC-receiver. Doing
this was entirely redundant, since
 * the receiver always puts the full acknowledgment window (groups 2,3 in
   11.4.2) into the Ack Vectors it sends; hence the HC-receiver is only
   interested in the highest state that the HC-sender received;
 * this means that the acknowledgment number on the (Data)Ack from the
   HC-sender is sufficient; and work done in parsing earlier state is not
   necessary, since the later state subsumes the earlier one (see also
   RFC 4340, A.4).
This older interface (dccp_ackvec_parse()) is therefore removed.
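
Detecting "older information" comes down to comparing 48-bit sequence numbers in
circular arithmetic. The sketch below shows one way to do this in userspace;
seq48_delta() and seq48_before() are hypothetical stand-ins for the
dccp_delta_seqno() and before48() helpers used in the diff, and their exact
kernel definitions are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEQ48_MASK ((UINT64_C(1) << 48) - 1)

/* Signed distance from @from to @to, modulo 2^48. */
static inline int64_t seq48_delta(uint64_t from, uint64_t to)
{
	uint64_t d;

	assert(from <= SEQ48_MASK && to <= SEQ48_MASK);
	d = (to - from) & SEQ48_MASK;
	if (d & (UINT64_C(1) << 47))		/* sign-extend bit 47 */
		return (int64_t)d - (INT64_C(1) << 48);
	return (int64_t)d;
}

/* True if @a comes before @b in circular sequence space. */
static inline bool seq48_before(uint64_t a, uint64_t b)
{
	return seq48_delta(a, b) > 0;
}
```

A negative distance between the tracked tail and the acknowledged number then
identifies an acknowledgment that refers to state already discarded.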

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ackvec.c  |   88 
 net/dccp/ackvec.h  |6 +++
 net/dccp/input.c   |4 +-
 net/dccp/options.c |   12 ++-
 4 files changed, 100 insertions(+), 10 deletions(-)

--- a/net/dccp/ackvec.h
+++ b/net/dccp/ackvec.h
@@ -116,6 +116,7 @@ extern int dccp_ackvec_parse(struct sock *sk, const struct sk_buff *skb,
 const u8 *value, const u8 len);
 
 extern int  dccp_ackvec_update_records(struct dccp_ackvec *av, u64 seq, u8 sum);
+extern void dccp_ackvec_clear_state(struct dccp_ackvec *av, const u64 ackno);
 extern u16  dccp_ackvec_buflen(const struct dccp_ackvec *av);
 
 static inline bool dccp_ackvec_is_empty(const struct dccp_ackvec *av)
@@ -147,6 +148,11 @@ static inline int dccp_ackvec_add(struct dccp_ackvec *av, const struct sock *sk,
return -1;
 }
 
+static inline void dccp_ackvec_clear_state(struct dccp_ackvec *av,
+   const u64 ackno)
+{
+}
+
 static inline void dccp_ackvec_check_rcv_ackno(struct dccp_ackvec *av,
   struct sock *sk, const u64 ackno)
 {
--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -95,6 +95,24 @@ int dccp_ackvec_update_records(struct dccp_ackvec *av, u64 seqno, u8 nonce_sum)
return 0;
 }
 
+static struct dccp_ackvec_record *dccp_ackvec_lookup(struct list_head *av_list,
+const u64 ackno)
+{
+   struct dccp_ackvec_record *avr;
+   /*
+* Exploit that records are inserted in descending order of sequence
+* number, start with the oldest record first. If @ackno is `before'
+* the earliest ack_ackno, the packet is too old to be considered.
+*/
+   list_for_each_entry_reverse(avr, av_list, avr_node) {
+   if (avr->avr_ack_seqno == ackno)
+   return avr;
+   if (before48(ackno, avr->avr_ack_seqno))
+   break;
+   }
+   return NULL;
+}
+
 /*
  * Buffer index and length computation using modulo-buffersize arithmetic.
  * Note that, as pointers move from right to left, head is `before' tail.
@@ -359,6 +377,76 @@ int dccp_ackvec_parse(struct sock *sk, const struct sk_buff *skb,
return 0;
 }
 
+/**
+ * dccp_ackvec_clear_state  -  Perform house-keeping / garbage-collection
+ * This routine is called when the peer acknowledges the receipt of Ack Vectors
+ * up to and including @ackno. While based on section A.3 of RFC 4340, here
+ * are additional precautions to prevent corrupted buffer state. In particular,
+ * we use tail_ackno to identify outdated records; it always marks the earliest
+ * packet of group (2) in 11.4.2.
+ */
+void dccp_ackvec_clear_state(struct dccp_ackvec *av, const u64 ackno)
+ {
+   struct dccp_ackvec_record *avr, *next;
+   u8 runlen_now, eff_runlen;
+   s64 delta;
+
+   avr = dccp_ackvec_lookup(&av->av_records, ackno);
+   if (avr == NULL)
+   return;
+   /*
+* Deal with outdated acknowledgments: this arises when e.g. there are
+* several old records and the acks from the peer come in slowly. In
+* that case we may still have records that pre-date tail_ackno.
+*/
+   delta = dccp_delta_seqno(av->av_tail_ackno, avr->avr_ack_ackno);
+   if (delta <

[PATCH 06/10] [ACKVEC]: Update code for the Ack Vector input/registration routine

2008-02-21 Thread Gerrit Renker
This patch updates the code which registers new packets as received, using the
new circular buffer interface. It contributes a new algorithm which
* supports both tail/head pointers and buffer wrap-around and
* deals with overflow (head/tail move in lock-step).

The updated code is also partitioned differently, into
1. dealing with the empty buffer,
2. adding new packets into non-empty buffer,
3. reserving space when encountering a `hole' in the sequence space,
4. updating old state and deciding when old state is irrelevant.

Protection against large burst losses: With regard to (3), it is too costly to
reserve space when there are large bursts of losses. When bursts get too large,
the code no longer reserves space and just fills in cells normally. This
measure reduces space consumption by a factor of 63.
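
For background, a sketch of how one Ack Vector cell packs state and run length
(RFC 4340, 11.4): the upper two bits carry the state, the lower six bits the
run length, and a cell with run length r describes r + 1 consecutive packets
(compare `distance += runlen + 1' in the code below). The av_* names are
hypothetical; the value 0xC0 for "not received" is taken from the RFC's
state 3 rather than from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Cell states in the upper two bits (names here are illustrative). */
enum av_state {
	AV_RECEIVED     = 0x00,
	AV_ECN_MARKED   = 0x40,
	AV_NOT_RECEIVED = 0xC0,	/* assumed from RFC 4340 state 3 */
};

#define AV_STATE_MASK  0xC0u
#define AV_RUNLEN_MASK 0x3Fu	/* 6 bits: run lengths 0..63 */

/* Build a cell from a state and a run length. */
static inline uint8_t av_cell(enum av_state state, uint8_t runlen)
{
	assert(runlen <= AV_RUNLEN_MASK);
	return (uint8_t)(state | runlen);
}

static inline uint8_t av_cell_state(uint8_t cell)
{
	return cell & AV_STATE_MASK;
}

static inline uint8_t av_cell_runlen(uint8_t cell)
{
	return cell & AV_RUNLEN_MASK;
}

/* Number of consecutive packets described by one cell. */
static inline unsigned int av_cell_packets(uint8_t cell)
{
	return av_cell_runlen(cell) + 1u;
}
```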

The code reuses in part the previous implementation by Arnaldo de Melo.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ackvec.c |  150 +
 net/dccp/ackvec.h |9 +++
 2 files changed, 159 insertions(+), 0 deletions(-)

--- a/net/dccp/ackvec.h
+++ b/net/dccp/ackvec.h
@@ -27,6 +27,9 @@
 #define DCCPAV_NUM_ACKVECS 2
 #define DCCPAV_MAX_ACKVEC_LEN  (DCCP_SINGLE_OPT_MAXLEN * DCCPAV_NUM_ACKVECS)
 
+/* Threshold for coping with large bursts of losses */
+#define DCCPAV_BURST_THRESH(DCCPAV_MAX_ACKVEC_LEN / 8)
+
 enum dccp_ackvec_states {
DCCPAV_RECEIVED =   0x00,
DCCPAV_ECN_MARKED = 0x40,
@@ -115,6 +118,7 @@ extern int dccp_ackvec_parse(struct sock *sk, const struct 
sk_buff *skb,
 u64 *ackno, const u8 opt,
 const u8 *value, const u8 len);
 
+extern void dccp_ackvec_input(struct dccp_ackvec *av, u64 seqno, u8 state);
 extern int  dccp_ackvec_update_records(struct dccp_ackvec *av, u64 seq, u8 sum);
 extern void dccp_ackvec_clear_state(struct dccp_ackvec *av, const u64 ackno);
 extern u16  dccp_ackvec_buflen(const struct dccp_ackvec *av);
@@ -142,6 +146,11 @@ static inline void dccp_ackvec_free(struct dccp_ackvec *av)
 {
 }
 
+static inline void dccp_ackvec_input(struct dccp_ackvec *av, u64 seq, u8 state)
+{
+
+}
+
 static inline int dccp_ackvec_add(struct dccp_ackvec *av, const struct sock *sk,
  const u64 ackno, const u8 state)
 {
--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -134,6 +134,156 @@ u16 dccp_ackvec_buflen(const struct dccp_ackvec *av)
return dccp_ackvec_idx_sub(av->av_buf_tail, av->av_buf_head);
 }
 
+/**
+ * dccp_ackvec_update_old  -  Update previous state as per RFC 4340, 11.4.1
+ * @av:non-empty buffer to update
+ * @distance:   negative or zero distance of @seqno from buf_ackno downward
+ * @seqno: the (old) sequence number whose record is to be updated
+ * @state: state in which packet carrying @seqno was received
+ */
+static void dccp_ackvec_update_old(struct dccp_ackvec *av, s64 distance,
+  u64 seqno, enum dccp_ackvec_states state)
+{
+   u16 ptr = av->av_buf_head;
+
+   BUG_ON(distance > 0);
+   if (unlikely(dccp_ackvec_is_empty(av)))
+   return;
+
+   do {
+   u8 runlen = dccp_ackvec_runlen(av->av_buf + ptr);
+
+   if (distance + runlen >= 0) {
+   /*
+* Only update the state if packet has not been received
+* yet. This is OK as per the second table in RFC 4340,
+* 11.4.1; i.e. here we are using the following table:
+* RECEIVED
+*  0   1   3
+*  S +---+---+---+
+*  T   0 | 0 | 0 | 0 |
+*  O +---+---+---+
+*  R   1 | 1 | 1 | 1 |
+*  E +---+---+---+
+*  D   3 | 0 | 1 | 3 |
+*+---+---+---+
+* The "Not Received" state was set by reserve_seats().
+*/
+   if (av->av_buf[ptr] == DCCPAV_NOT_RECEIVED)
+   av->av_buf[ptr] = state;
+   else
+   dccp_pr_debug("Not changing %llu state to %u\n",
+ (unsigned long long)seqno, state);
+   break;
+   }
+
+   distance += runlen + 1;
+   ptr   = dccp_ackvec_idx_add(ptr, 1);
+
+   } while (ptr != av->av_buf_tail);
+}
+
+/* Mark @num entries after buf_head as "Not yet received". */
+static void dccp_ackvec_reserve_seats(struct dccp_ackvec *av, u1

[PATCH 03/10] [ACKVEC]: Separate internals of Ack Vectors from option-parsing code

2008-02-21 Thread Gerrit Renker
This patch
 * separates Ack Vector housekeeping code from option-insertion code;
 * shifts option-specific code from ackvec.c into options.c;
 * introduces a dedicated routine to take care of the Ack Vector records;
 * simplifies the dccp_ackvec_insert_avr() routine: the BUG_ON was redundant,
   since the list is automatically arranged in descending order of ack_seqno.
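
As a rough illustration of the length computation that moves into options.c
(`nr_opts' and `len' in dccp_insert_option_ackvec() below), here is a
user-space sketch. The value 253 for the per-option payload limit and the
helper names are assumptions made for this example; each option adds 2 bytes
for its type and length fields.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed payload limit of a single option (255 minus type and length). */
#define SINGLE_OPT_MAXLEN 253u
/* Two Ack Vector options at most, as with DCCPAV_NUM_ACKVECS = 2. */
#define MAX_ACKVEC_LEN    (2u * SINGLE_OPT_MAXLEN)

/* Number of options needed to carry vec_len bytes (rounding up). */
static inline uint16_t ackvec_nr_opts(uint16_t vec_len)
{
	assert(vec_len <= MAX_ACKVEC_LEN);
	return (uint16_t)((vec_len + SINGLE_OPT_MAXLEN - 1) / SINGLE_OPT_MAXLEN);
}

/* Total option space: payload plus 2 header bytes per option. */
static inline uint16_t ackvec_opt_space(uint16_t vec_len)
{
	return (uint16_t)(vec_len + 2 * ackvec_nr_opts(vec_len));
}
```

Under these assumptions a 300-byte Ack Vector needs two options and occupies
304 bytes of option space.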

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ackvec.c  |  100 +--
 net/dccp/ackvec.h  |5 +--
 net/dccp/options.c |   60 +++
 3 files changed, 80 insertions(+), 85 deletions(-)

--- a/net/dccp/ackvec.h
+++ b/net/dccp/ackvec.h
@@ -110,7 +110,7 @@ extern int dccp_ackvec_parse(struct sock *sk, const struct 
sk_buff *skb,
 u64 *ackno, const u8 opt,
 const u8 *value, const u8 len);
 
-extern int dccp_insert_option_ackvec(struct sock *sk, struct sk_buff *skb);
+extern int  dccp_ackvec_update_records(struct dccp_ackvec *av, u64 seq, u8 sum);
 
 static inline int dccp_ackvec_pending(const struct dccp_ackvec *av)
 {
@@ -153,8 +153,7 @@ static inline int dccp_ackvec_parse(struct sock *sk, const 
struct sk_buff *skb,
return -1;
 }
 
-static inline int dccp_insert_option_ackvec(const struct sock *sk,
-   const struct sk_buff *skb)
+static inline int dccp_ackvec_update_records(struct dccp_ackvec *av, u64 seq, u8 nonce)
 {
return -1;
 }
--- a/net/dccp/options.c
+++ b/net/dccp/options.c
@@ -428,6 +428,66 @@ static int dccp_insert_option_timestamp_echo(struct 
dccp_sock *dp,
return 0;
 }
 
+int dccp_insert_option_ackvec(struct sock *sk, struct sk_buff *skb)
+{
+   struct dccp_sock *dp = dccp_sk(sk);
+   struct dccp_ackvec *av = dp->dccps_hc_rx_ackvec;
+   /* Figure out how many options do we need to represent the ackvec */
+   const u16 nr_opts = DIV_ROUND_UP(av->av_vec_len, DCCP_SINGLE_OPT_MAXLEN);
+   u16 len = av->av_vec_len + 2 * nr_opts;
+   u8 i, nonce = 0;
+   const unsigned char *tail, *from;
+   unsigned char *to;
+
+   if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN)
+   return -1;
+
+   DCCP_SKB_CB(skb)->dccpd_opt_len += len;
+
+   to   = skb_push(skb, len);
+   len  = av->av_vec_len;
+   from = av->av_buf + av->av_buf_head;
+   tail = av->av_buf + DCCPAV_MAX_ACKVEC_LEN;
+
+   for (i = 0; i < nr_opts; ++i) {
+   int copylen = len;
+
+   if (len > DCCP_SINGLE_OPT_MAXLEN)
+   copylen = DCCP_SINGLE_OPT_MAXLEN;
+
+   /*
+* RFC 4340, 12.2: Encode the Nonce Echo for this Ack Vector via
+* its type; ack_nonce is the sum of all individual buf_nonce's.
+*/
+   nonce ^= av->av_buf_nonce[i];
+
+   *to++ = DCCPO_ACK_VECTOR_0 + av->av_buf_nonce[i];
+   *to++ = copylen + 2;
+
+   /* Check if buf_head wraps */
+   if (from + copylen > tail) {
+   const u16 tailsize = tail - from;
+
+   memcpy(to, from, tailsize);
+   to  += tailsize;
+   len -= tailsize;
+   copylen -= tailsize;
+   from= av->av_buf;
+   }
+
+   memcpy(to, from, copylen);
+   from += copylen;
+   to   += copylen;
+   len  -= copylen;
+   }
+   /*
+* Each sent Ack Vector is recorded in the list, as per A.2 of RFC 4340.
+*/
+   if (dccp_ackvec_update_records(av, DCCP_SKB_CB(skb)->dccpd_seq, nonce))
+   return -ENOBUFS;
+   return 0;
+}
+
 /**
  * dccp_insert_option_mandatory  -  Mandatory option (5.8.2)
  * Note that since we are using skb_push, this function needs to be called
--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -55,99 +55,35 @@ void dccp_ackvec_free(struct dccp_ackvec *av)
}
 }
 
-static void dccp_ackvec_insert_avr(struct dccp_ackvec *av,
-  struct dccp_ackvec_record *avr)
-{
-   /*
-* AVRs are sorted by seqno. Since we are sending them in order, we
-* just add the AVR at the head of the list.
-* -sorbo.
-*/
-   if (!list_empty(&av->av_records)) {
-   const struct dccp_ackvec_record *head =
-   list_entry(av->av_records.next,
-  struct dccp_ackvec_record,
-  avr_node);
-   BUG_ON(before48(avr->avr_ack_seqno, head->avr_ack_seqno));
-   }
-
-   list_add(&avr->avr_node, &av->av_records);
-}
-
-int dccp_insert_option_ackvec(struct sock *sk, struct sk_buff *sk

[PATCH 08/10] [ACKVEC]: Schedule SyncAck when running out of space

2008-02-21 Thread Gerrit Renker
The problem with Ack Vectors is that

  i) their length is variable and can in principle grow quite large,
 ii) it is hard to predict exactly how large they will be.

Due to the second point it does not seem a good idea to reduce the MPS,
in particular when on average there is enough room for the Ack Vector
and an increase in length is only momentary, due to some burst loss, after
which the Ack Vector returns to its normal/average length.

The solution taken by this patch to address the outstanding FIXME is
to schedule a separate Sync when running out of space on the skb, and to
log a warning into the syslog.

The mechanism can also be used for other out-of-band signalling: its signalling
is quicker than scheduling an Ack, since it does not need to wait
for new data.
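
The flag handling can be pictured with the following toy model. It is a sketch
under assumptions: the toy_* names and MAX_OPT_LEN are hypothetical, and the
point at which the flag is set (here: when the Ack Vector does not fit) is
inferred from the description above, since the options.c hunk is cut off in
this archive. Sending after a transmitted packet and clearing the flag in the
Sync routine follow the output.c hunks below.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OPT_LEN 100u	/* hypothetical limit, not DCCP_MAX_OPT_LEN */

struct toy_sock {
	bool sync_scheduled;	/* mirrors dccps_sync_scheduled */
	unsigned int syncs_sent;
};

/* Try to fit av_len bytes of Ack Vector; on failure, schedule a Sync. */
static inline int toy_insert_ackvec(struct toy_sock *sk, unsigned int opt_len,
				    unsigned int av_len)
{
	assert(opt_len <= MAX_OPT_LEN);
	if (opt_len + av_len > MAX_OPT_LEN) {
		sk->sync_scheduled = true;	/* assumed placement */
		return -1;
	}
	return 0;
}

/* Sending the Sync clears the flag, as in dccp_send_sync(). */
static inline void toy_send_sync(struct toy_sock *sk)
{
	sk->sync_scheduled = false;
	sk->syncs_sent++;
}

/* Called after a data packet has been transmitted. */
static inline void toy_after_xmit(struct toy_sock *sk)
{
	if (sk->sync_scheduled)
		toy_send_sync(sk);
}
```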

 Additional Note regarding MPS:
 ------------------------------
 It is possible to lower MPS according to the average length of Ack Vectors;
 the following argues why this does not seem to be a good idea.

 When determining the average Ack Vector length, a moving-average is more
 useful than a normal average, since sudden peaks (burst losses) are better
 dampened. The Ack Vector buffer would have a field `av_avg_len' which tracks
 this moving average and MPS would be reduced by this value (plus 2 bytes for
 type/value for each full Ack Vector).
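
 To make the moving-average idea concrete, a minimal sketch of an
 exponentially weighted average is given here. The weight of 1/8 and the
 function name are assumptions for illustration only; nothing like this is
 part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical weight: new_avg = (7 * old_avg + sample) / 8. */
#define AVG_SHIFT 3

/* Update the moving average of the Ack Vector length with a new sample. */
static inline uint32_t av_avg_update(uint32_t avg, uint32_t sample)
{
	assert(avg <= 0xFFFF && sample <= 0xFFFF);
	if (avg == 0)		/* first sample initialises the average */
		return sample;
	return ((avg << AVG_SHIFT) - avg + sample) >> AVG_SHIFT;
}
```

 With an average of 8 bytes, a single burst sample of 72 bytes raises the
 average only to 16, which is the dampening effect described above.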

 However, this means that the MPS decreases in the middle of an established
 connection. For a user who has tuned his/her application to work with the
 MPS taken at the beginning of the connection this can be very counter-
 intuitive and annoying.

 (Over the long term there should be some adjustment to reduce MPS at least
  by a minimum when Ack Vectors are used; some applications may rely on the
  exact value of the MPS).

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 include/linux/dccp.h |2 ++
 net/dccp/options.c   |   23 +++
 net/dccp/output.c|8 
 3 files changed, 29 insertions(+), 4 deletions(-)

--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -475,6 +475,7 @@ struct dccp_ackvec;
  * @dccps_hc_rx_insert_options - receiver wants to add options when acking
  * @dccps_hc_tx_insert_options - sender wants to add options when sending
  * @dccps_server_timewait - server holds timewait state on close (RFC 4340, 8.3)
+ * @dccps_sync_scheduled - flag which signals "send out-of-band message soon"
  * @dccps_xmit_timer - timer for when CCID is not ready to send
  * @dccps_syn_rtt - RTT sample from Request/Response exchange (in usecs)
  */
@@ -515,6 +516,7 @@ struct dccp_sock {
__u8dccps_hc_rx_insert_options:1;
__u8dccps_hc_tx_insert_options:1;
__u8dccps_server_timewait:1;
+   __u8dccps_sync_scheduled:1;
struct timer_list   dccps_xmit_timer;
 };
 
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -290,6 +290,8 @@ void dccp_write_xmit(struct sock *sk, int block)
if (err)
DCCP_BUG("err=%d after ccid_hc_tx_packet_sent",
 err);
+   if (dp->dccps_sync_scheduled)
+   dccp_send_sync(sk, dp->dccps_gsr, DCCP_PKT_SYNC);
} else {
dccp_pr_debug("packet discarded due to err=%d\n", err);
kfree_skb(skb);
@@ -562,6 +564,12 @@ void dccp_send_sync(struct sock *sk, const u64 ackno,
DCCP_SKB_CB(skb)->dccpd_type = pkt_type;
DCCP_SKB_CB(skb)->dccpd_ack_seq = ackno;
 
+   /*
+* Clear the flag in case the Sync was scheduled for out-of-band data,
+* such as carrying a long Ack Vector.
+*/
+   dccp_sk(sk)->dccps_sync_scheduled = 0;
+
dccp_transmit_skb(sk, skb);
 }
--- a/net/dccp/options.c
+++ b/net/dccp/options.c
@@ -428,6 +428,7 @@ int dccp_insert_option_ackvec(struct sock *sk, struct 
sk_buff *skb)
 {
struct dccp_sock *dp = dccp_sk(sk);
struct dccp_ackvec *av = dp->dccps_hc_rx_ackvec;
+   struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb);
const u16 buflen = dccp_ackvec_buflen(av);
/* Figure out how many options do we need to represent the ackvec */
const u16 nr_opts = DIV_ROUND_UP(buflen, DCCP_SINGLE_OPT_MAXLEN);
@@ -436,10 +437,24 @@ int dccp_insert_option_ackvec(struct sock *sk, struct 
sk_buff *skb)
const unsigned char *tail, *from;
unsigned char *to;
 
-   if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN)
+   if (dcb->dccpd_opt_len + len > DCCP_MAX_OPT_LEN) {
+   DCCP_WARN("Lacking space for %u bytes on %s packet\n", len,
+ dccp_packet_name(dcb->dccpd_type));
return -1;
-
-   DCCP_SKB_CB(skb)->dccpd_opt_len += len;
+   }
+   /*
+*

[PATCH 01/10] [ACKVEC]: Need to ignore Ack Vectors on request sockets

2008-02-21 Thread Gerrit Renker
This fixes an oversight from an earlier patch.

The issue is that Ack Vectors must not be parsed on request sockets, since
the Ack Vector feature depends on the selection of the (TX) CCID. During the
initial handshake the CCIDs are undefined, and so RFC 4340, 10.3 applies:

 "Using CCID-specific options and feature options during a negotiation
  for the corresponding CCID feature is NOT RECOMMENDED [...]"

Worse, it is not even possible: when the server receives the Request from the
client, the CCID and Ack vector features are undefined; when the Ack finalising
the 3-way handshake arrives, the request socket has not been cloned yet into a
full socket. (This order is necessary, since otherwise the newly created socket
would have to be destroyed whenever an option error occurred - a malicious
hacker could simply send garbage options and exploit this.)

As a result, it is not feasible to parse Ack Vectors on request sockets, since
their sink (the CCIDs interested in using Ack Vector information) is undefined.

Hence disabled by this patch.
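
The resulting filter can be sketched as a stand-alone predicate. The OPT_*
names and the function name are hypothetical; the option numbers 38 and 39 for
the two Ack Vector types are taken from RFC 4340 (Table 3), and 128 is the
start of the CCID-specific range as in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
	OPT_ACK_VECTOR_0         = 38,	/* Ack Vector [Nonce 0] */
	OPT_ACK_VECTOR_1         = 39,	/* Ack Vector [Nonce 1] */
	OPT_MIN_RX_CCID_SPECIFIC = 128,
};

/* The nonce bit selects between two adjacent option types. */
static_assert(OPT_ACK_VECTOR_1 == OPT_ACK_VECTOR_0 + 1,
	      "Ack Vector option types must be adjacent");

/* True if the option is to be skipped while still on a request socket. */
static inline bool ignore_on_request_sock(bool is_request_sock, uint8_t opt)
{
	return is_request_sock &&
	       (opt >= OPT_MIN_RX_CCID_SPECIFIC ||
		opt == OPT_ACK_VECTOR_0 || opt == OPT_ACK_VECTOR_1);
}
```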

Two further, minor changes:
 * replaced magic numbers for CCID-specific options with symbolic constants;
 * replaced local variables `idx' with computation in argument list (btw,
   these indices are nowhere used -- are they really still needed???).

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 include/linux/dccp.h |6 --
 net/dccp/options.c   |   19 +++
 2 files changed, 11 insertions(+), 14 deletions(-)

--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -165,8 +165,10 @@ enum {
DCCPO_TIMESTAMP_ECHO = 42,
DCCPO_ELAPSED_TIME = 43,
DCCPO_MAX = 45,
-   DCCPO_MIN_CCID_SPECIFIC = 128,
-   DCCPO_MAX_CCID_SPECIFIC = 255,
+   DCCPO_MIN_RX_CCID_SPECIFIC = 128,   /* from sender to receiver */
+   DCCPO_MAX_RX_CCID_SPECIFIC = 191,
+   DCCPO_MIN_TX_CCID_SPECIFIC = 192,   /* from receiver to sender */
+   DCCPO_MAX_TX_CCID_SPECIFIC = 255,
 };
 
 /* DCCP CCIDS */
--- a/net/dccp/options.c
+++ b/net/dccp/options.c
@@ -104,9 +104,10 @@ int dccp_parse_options(struct sock *sk, struct 
dccp_request_sock *dreq,
 *
 * CCID-specific options are ignored during connection setup, as
 * negotiation may still be in progress (see RFC 4340, 10.3).
-*
+* The same applies to Ack Vectors, as these depend on the CCID.
 */
-   if (dreq != NULL && opt >= 128)
+   if (dreq != NULL && (opt >= DCCPO_MIN_RX_CCID_SPECIFIC ||
+   opt == DCCPO_ACK_VECTOR_0 || opt == DCCPO_ACK_VECTOR_1))
goto ignore_option;
 
switch (opt) {
@@ -226,23 +227,17 @@ int dccp_parse_options(struct sock *sk, struct 
dccp_request_sock *dreq,
dccp_pr_debug("%s rx opt: ELAPSED_TIME=%d\n",
  dccp_role(sk), elapsed_time);
break;
-   case 128 ... 191: {
-   const u16 idx = value - options;
-
+   case DCCPO_MIN_RX_CCID_SPECIFIC ... DCCPO_MAX_RX_CCID_SPECIFIC:
if (ccid_hc_rx_parse_options(dp->dccps_hc_rx_ccid, sk,
-opt, len, idx,
+opt, len, value - options,
 value) != 0)
goto out_invalid_option;
-   }
break;
-   case 192 ... 255: {
-   const u16 idx = value - options;
-
+   case DCCPO_MIN_TX_CCID_SPECIFIC ... DCCPO_MAX_TX_CCID_SPECIFIC:
if (ccid_hc_tx_parse_options(dp->dccps_hc_tx_ccid, sk,
-opt, len, idx,
+opt, len, value - options,
 value) != 0)
goto out_invalid_option;
-   }
break;
default:
DCCP_CRIT("DCCP(%p): option %d(len=%d) not "
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 07/10] [ACKVEC]: Consolidate Ack-Vector processing within main DCCP module

2008-02-21 Thread Gerrit Renker
This aggregates Ack Vector processing (handling input and clearing old state)
into one function, for the following reasons and benefits:
 * all Ack Vector-specific processing is now in one place;
 * duplicated code is removed;
 * ensuring sanity: from an Ack Vector point of view, it is better to clear the
old state first before entering new state;
 * Ack Event handling happens mostly within the CCIDs, not the main DCCP module.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/input.c |   32 ++--
 1 files changed, 10 insertions(+), 22 deletions(-)

--- a/net/dccp/input.c
+++ b/net/dccp/input.c
@@ -159,13 +159,16 @@ static void dccp_rcv_reset(struct sock *sk, struct 
sk_buff *skb)
dccp_time_wait(sk, DCCP_TIME_WAIT, 0);
 }
 
-static void dccp_event_ack_recv(struct sock *sk, struct sk_buff *skb)
+static void dccp_handle_ackvec_processing(struct sock *sk, struct sk_buff *skb)
 {
-   struct dccp_sock *dp = dccp_sk(sk);
+   struct dccp_ackvec *av = dccp_sk(sk)->dccps_hc_rx_ackvec;
+   struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb);
 
-   if (dp->dccps_hc_rx_ackvec != NULL)
-   dccp_ackvec_clear_state(dp->dccps_hc_rx_ackvec,
-   DCCP_SKB_CB(skb)->dccpd_ack_seq);
+   if (av != NULL) {
+   if (dcb->dccpd_ack_seq != DCCP_PKT_WITHOUT_ACK_SEQ)
+   dccp_ackvec_clear_state(av, dcb->dccpd_ack_seq);
+   dccp_ackvec_input(av, dcb->dccpd_seq, DCCPAV_RECEIVED);
+   }
 }
 
 static void dccp_deliver_input_to_ccids(struct sock *sk, struct sk_buff *skb)
@@ -364,21 +367,13 @@ discard:
 int dccp_rcv_established(struct sock *sk, struct sk_buff *skb,
 const struct dccp_hdr *dh, const unsigned len)
 {
-   struct dccp_sock *dp = dccp_sk(sk);
-
if (dccp_check_seqno(sk, skb))
goto discard;
 
if (dccp_parse_options(sk, NULL, skb))
goto discard;
 
-   if (DCCP_SKB_CB(skb)->dccpd_ack_seq != DCCP_PKT_WITHOUT_ACK_SEQ)
-   dccp_event_ack_recv(sk, skb);
-
-   if (dp->dccps_hc_rx_ackvec != NULL &&
-   dccp_ackvec_add(dp->dccps_hc_rx_ackvec, sk,
-   DCCP_SKB_CB(skb)->dccpd_seq, DCCPAV_RECEIVED))
-   goto discard;
+   dccp_handle_ackvec_processing(sk, skb);
dccp_deliver_input_to_ccids(sk, skb);
 
return __dccp_rcv_established(sk, skb, dh, len);
@@ -621,14 +616,7 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff 
*skb,
if (dccp_parse_options(sk, NULL, skb))
goto discard;
 
-   if (dcb->dccpd_ack_seq != DCCP_PKT_WITHOUT_ACK_SEQ)
-   dccp_event_ack_recv(sk, skb);
-
-   if (dp->dccps_hc_rx_ackvec != NULL &&
-   dccp_ackvec_add(dp->dccps_hc_rx_ackvec, sk,
-   DCCP_SKB_CB(skb)->dccpd_seq, DCCPAV_RECEIVED))
-   goto discard;
-
+   dccp_handle_ackvec_processing(sk, skb);
dccp_deliver_input_to_ccids(sk, skb);
}
 


[DCCP] [Patch 0/10]: Finish and repair existing Ack Vector implementation

2008-02-21 Thread Gerrit Renker
This is a re-packaged resubmission (reduction from about 20 small patches) of the
Ack Vector patch set, which accomplishes two main things.

 First, it completes the implementation of a circular Ack Vector buffer. So far
 the buffer was implemented as a linear array which dropped packets on overflow.

 Second, it makes the Ack Vector implementation workable for Ack Ratios greater
 than 1. The code does not currently work when Ack Ratio is greater than 1.
 
 The reason that existing problems were not observed is that the main module 
 sets Ack Ratio to 1, which in effect bypasses Ack Vectors entirely.
 
 Several bugs with regard to a basic implementation of RFC 4340 Ack Vectors
 were fixed and protections against corrupting buffer state added:
 * successive overlapping Ack Vectors were corrupting buffer state;
 * a protection against outdated entries was missing;
 * a modification was necessary to compute ECN nonces over multiple vectors;
 * reserving entries (one per packet) didn't work well with large burst losses;
 
A summary of the problems that were fixed and full code documentation is on
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ack_vectors/

The code has been tested extensively and has been part of the test tree for over
two months.

Ack Vectors are a necessary requirement of CCID-2, so it is expected that CCID-2
can subsequently also be improved.
 
 Short summary
 -------------
 Patch # 1: Fixes an oversight: Ack Vectors need to be ignored as long as
feature-negotiation is in progress.
 Patch # 2: Ack Vector interface clean-up -- preparation for subsequent patches.
 Patch # 3: Separates Ack Vector - specific code from option-inserting code.
 Patch # 4: Completes the implementation of a fully circular Ack Vector buffer. 
 Patch # 5: Adds a tested algorithm to update Ack Vector buffer state.
 Patch # 6: Updates and revises the way new Ack Vector information is registered.
 Patch # 7: Consolidates Ack Vector processing within DCCP main module.
 Patch # 8: Adds a fallback solution - scheduling a Sync when out-of-space.
 Patch # 9: Removes older and now unused parts of the Ack Vector infrastructure.
 Patch #10: Separates the task of parsing Ack Vectors from the CCID(-2) code.

The revised patch set has been uploaded to
git://eden-feed.erg.abdn.ac.uk/dccp_exp [dccp]


[PATCH 1/5] [DCCP]: Extend CCID packet dequeueing interface

2008-02-14 Thread Gerrit Renker
This extends the packet dequeuing interface of dccp_write_xmit() to allow
 1. CCIDs to take care of timing when the next packet may be sent;
 2. delayed sending (as before, with an inter-packet gap up to 65.535 seconds).

The main purpose is to take CCID2 out of its polling mode (when it is network-
limited, it tries every millisecond to send, without interruption).
The interface can also be used to support other CCIDs.

The mode of operation for (2) is as follows:
 * new packet is enqueued via dccp_sendmsg() => dccp_write_xmit(),
 * ccid_hc_tx_send_packet() detects that it may not send (e.g. window full),
 * it signals this condition via `CCID_PACKET_WILL_DEQUEUE_LATER',
 * dccp_write_xmit() returns without further action;
 * after some time the wait-condition for CCID becomes true,
 * that CCID schedules the tasklet,
 * tasklet function calls ccid_hc_tx_send_packet() via dccp_write_xmit(),
 * since the wait-condition is now true, ccid_hc_tx_send_packet() returns "send now",
 * packet is sent, and possibly more (since dccp_write_xmit() loops).
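
The dispatch on the CCID's return code can be sketched as below. Note the
assumptions: the constant values are chosen for this example so that the mode
codes lie above the delay range 1..65535 ms (the values in the archived diff
below appear truncated), and the PKT_*/ACT_* names and xmit_step() are
hypothetical. Only the structure of dequeue_eval() follows
ccid_packet_dequeue_eval() from the patch.

```c
#include <assert.h>

/* Assumed encoding: delays occupy 1..PKT_DELAY_MAX, mode codes lie above. */
enum dequeue_decision {
	PKT_SEND_AT_ONCE       = 0x00000,
	PKT_DELAY_MAX          = 0x0FFFF,	/* 65535 ms = 65.535 seconds */
	PKT_DELAY              = 0x10000,
	PKT_WILL_DEQUEUE_LATER = 0x20000,
	PKT_ERR                = 0xF0000,
};

static_assert(PKT_DELAY > PKT_DELAY_MAX, "mode codes must not look like delays");

/* Map a raw CCID return code onto one of the decisions above. */
static inline int dequeue_eval(int rc)
{
	if (rc < 0)
		return PKT_ERR;
	if (rc == 0)
		return PKT_SEND_AT_ONCE;
	if (rc <= PKT_DELAY_MAX)
		return PKT_DELAY;
	return rc;
}

enum action { ACT_SEND, ACT_ARM_TIMER, ACT_RETURN, ACT_DROP };

/* What the transmit loop would do next for a given return code. */
static inline enum action xmit_step(int rc)
{
	switch (dequeue_eval(rc)) {
	case PKT_SEND_AT_ONCE:
		return ACT_SEND;
	case PKT_DELAY:
		return ACT_ARM_TIMER;	/* rc is the delay in milliseconds */
	case PKT_WILL_DEQUEUE_LATER:
		return ACT_RETURN;	/* CCID schedules the tasklet later */
	default:
		return ACT_DROP;
	}
}
```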

Code reuse: the tasklet function calls dccp_write_xmit(), the timer function
reduces to a wrapper around the same code.

If the tasklet finds that the socket is locked, it re-schedules the tasklet
function (not the tasklet) after one jiffy.

Changed DCCP_BUG to DCCP_WARN when transmit_skb returns an error (e.g. when a
local qdisc is used, NET_XMIT_DROP=1 can be returned for many packets).

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 include/linux/dccp.h   |4 +-
 net/dccp/ccid.h|   37 -
 net/dccp/ccids/ccid3.c |4 +-
 net/dccp/output.c  |  103 +++
 net/dccp/timer.c   |   25 ++-
 5 files changed, 122 insertions(+), 51 deletions(-)

--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -476,7 +476,8 @@ struct dccp_ackvec;
  * @dccps_hc_tx_insert_options - sender wants to add options when sending
  * @dccps_server_timewait - server holds timewait state on close (RFC 4340, 8.3)
  * @dccps_sync_scheduled - flag which signals "send out-of-band message soon"
- * @dccps_xmit_timer - timer for when CCID is not ready to send
+ * @dccps_xmitlet - tasklet scheduled by the TX CCID to dequeue data packets
+ * @dccps_xmit_timer - used by the TX CCID to delay sending (rate-based pacing)
  * @dccps_syn_rtt - RTT sample from Request/Response exchange (in usecs)
  */
 struct dccp_sock {
@@ -517,6 +518,7 @@ struct dccp_sock {
__u8dccps_hc_tx_insert_options:1;
__u8dccps_server_timewait:1;
__u8dccps_sync_scheduled:1;
+   struct tasklet_struct   dccps_xmitlet;
struct timer_list   dccps_xmit_timer;
 };
 
--- a/net/dccp/ccid.h
+++ b/net/dccp/ccid.h
@@ -124,13 +124,44 @@ static inline int ccid_get_current_id(struct dccp_sock 
*dp, bool rx)
 extern void ccid_hc_rx_delete(struct ccid *ccid, struct sock *sk);
 extern void ccid_hc_tx_delete(struct ccid *ccid, struct sock *sk);
 
+/*
+ * Congestion control of queued data packets via CCID decision.
+ *
+ * The TX CCID performs its congestion-control by indicating whether and when a
+ * queued packet may be sent, using the return code of ccid_hc_tx_send_packet().
+ * The following modes are supported:
+ *  - autonomous dequeueing (CCID internally schedules dccps_xmitlet);
+ *  - timer-based pacing (CCID returns a delay value in milliseconds).
+ * Modes and error handling are identified using the symbolic constants below.
+ */
+enum ccid_dequeueing_decision {
+   CCID_PACKET_SEND_AT_ONCE =   0x0,
+   CCID_PACKET_DELAY =  0x1,
+   CCID_PACKET_WILL_DEQUEUE_LATER = 0x2,
+   CCID_PACKET_ERR =0xF,
+};
+
+/* maximum possible number of milliseconds to delay a packet (65.535 seconds) */
+#define CCID_PACKET_DELAY_MAX  0x
+#define CCID_PACKET_DELAY_MAX_USEC (CCID_PACKET_DELAY_MAX * USEC_PER_MSEC)
+
+static inline int ccid_packet_dequeue_eval(int return_code)
+{
+   if (return_code < 0)
+   return CCID_PACKET_ERR;
+   if (return_code == 0)
+   return CCID_PACKET_SEND_AT_ONCE;
+   if (return_code <= CCID_PACKET_DELAY_MAX)
+   return CCID_PACKET_DELAY;
+   return return_code;
+}
+
 static inline int ccid_hc_tx_send_packet(struct ccid *ccid, struct sock *sk,
 struct sk_buff *skb)
 {
-   int rc = 0;
if (ccid->ccid_ops->ccid_hc_tx_send_packet != NULL)
-   rc = ccid->ccid_ops->ccid_hc_tx_send_packet(sk, skb);
-   return rc;
+   return ccid->ccid_ops->ccid_hc_tx_send_packet(sk, skb);
+   return CCID_PACKET_SEND_AT_ONCE;
 }
 
 static inline void ccid_hc_tx_packet_sent(struct ccid *ccid, struct sock *sk,
--- a/net/dccp/ccids/ccid3.c
+

[PATCH 2/5] [CCID]: Refine the wait-for-ccid mechanism

2008-02-14 Thread Gerrit Renker
This extends the existing wait-for-ccid routine so that it may be used with
different types of CCID. It further addresses the problems listed below.

The code checks whether the write queue is non-empty and grants the TX CCID up to
`timeout' jiffies to drain the queue. It will instead purge that queue if
 * the delay suggested by the CCID exceeds the time budget;
 * a socket error occurred while waiting for the CCID;
 * there is a signal pending (e.g. annoyed user pressed Control-C);
 * the CCID does not support delays (we don't know how long it will take).


 D e t a i l s  [can be removed]
 -------------------------------
DCCP's sending mechanism functions a bit like non-blocking I/O: dccp_sendmsg()
will enqueue up to net.dccp.default.tx_qlen packets (default=5), without waiting
for them to be released to the network.

Rate-based CCIDs, such as CCID3/4, can impose sending delays of up to
64 seconds (t_mbi in RFC 3448). Hence the write queue may still contain packets
when the application closes. Since the write queue is congestion-controlled by
the CCID, draining the queue is also under control of the CCID.

There are several problems that needed to be addressed:
 1) The queue-drain mechanism only works with rate-based CCIDs. If CCID2 for
example has a full TX queue and becomes network-limited just as the
application wants to close, then waiting for CCID2 to become unblocked could
lead to an indefinite delay (i.e., application "hangs").
 2) Since each TX CCID in turn uses a feedback mechanism, there may be changes
in its sending policy while the queue is being drained. This can lead to
further delays during which the application will not be able to terminate.
 3) The minimum wait time for CCID3/4 can be expected to be the queue length
times the current inter-packet delay. For example if tx_qlen=100 and a delay
of 15 ms is used for each packet, then the application would have to wait
for a minimum of 1.5 seconds before being allowed to exit.
 4) There is no way for the user/application to control this behaviour. It would
be good to use the timeout argument of dccp_close() as an upper bound. Then
the maximum time that an application is willing to wait for its CCIDs can
be set via the SO_LINGER option.

These problems are addressed by giving the CCID a grace period of up to the
`timeout' value.
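
A small sketch may help to picture the time-budget logic: the first helper
reproduces the arithmetic of problem (3), the second combines the purge
conditions listed at the top of this message. The struct and function names
are hypothetical and the sketch is not the code of dccp_flush_write_queue().

```c
#include <assert.h>
#include <stdbool.h>

/* Lower bound on the drain time: queue length times inter-packet gap. */
static inline unsigned long min_drain_ms(unsigned long qlen,
					 unsigned long gap_ms)
{
	return qlen * gap_ms;
}

/* Hypothetical snapshot of the conditions checked while waiting. */
struct drain_state {
	long delay;			/* delay suggested by the CCID */
	long budget;			/* remaining time budget, same unit */
	bool sock_error;		/* socket error occurred */
	bool signal_pending;		/* e.g. Control-C */
	bool ccid_supports_delay;	/* CCID can tell how long it takes */
};

/* True if the write queue should be purged instead of waiting further. */
static inline bool should_purge(const struct drain_state *s)
{
	assert(s->budget >= 0);
	return s->delay > s->budget || s->sock_error ||
	       s->signal_pending || !s->ccid_supports_delay;
}
```

For tx_qlen=100 and a 15 ms gap, min_drain_ms() yields 1500 ms, matching the
1.5 seconds mentioned in (3).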

The wait-for-ccid function is, as before, used when the application
 (a) has read all the data in its receive buffer and
 (b) if SO_LINGER was set with a non-zero linger time, or
 (c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ
 state (client application closes after receiving CloseReq).

In addition, there is a catch-all case by calling __skb_queue_purge() after
waiting for the CCID. This is necessary since the write queue may still have
data when
 (a) the host has been passively-closed,
 (b) abnormal termination (unread data, zero linger time),
 (c) wait-for-ccid could not finish within the given time limit.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/dccp.h   |3 +-
 net/dccp/output.c |  122 ++---
 net/dccp/proto.c  |   15 ++-
 net/dccp/timer.c  |2 +-
 4 files changed, 86 insertions(+), 56 deletions(-)

--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -215,8 +215,9 @@ extern void dccp_reqsk_send_ack(struct sk_buff *sk, struct 
request_sock *rsk);
 extern void dccp_send_sync(struct sock *sk, const u64 seq,
   const enum dccp_pkt_type pkt_type);
 
-extern void dccp_write_xmit(struct sock *sk, int block);
+extern void dccp_write_xmit(struct sock *sk);
 extern void dccp_write_space(struct sock *sk);
+extern void dccp_flush_write_queue(struct sock *sk, long *time_budget);
 
 extern void dccp_init_xmit_timers(struct sock *sk);
 static inline void dccp_clear_xmit_timers(struct sock *sk)
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -204,49 +204,29 @@ void dccp_write_space(struct sock *sk)
 }
 
 /**
- * dccp_wait_for_ccid - Wait for ccid to tell us we can send a packet
+ * dccp_wait_for_ccid  -  Await CCID send permission
  * @sk:socket to wait for
- * @skb:   current skb to pass on for waiting
- * @delay: sleep timeout in milliseconds (> 0)
- * This function is called by default when the socket is closed, and
- * when a non-zero linger time is set on the socket. For consistency
+ * @delay: timeout in jiffies
+ * This is used by CCIDs which need to delay the send time in process context.
  */
-static int dccp_wait_for_ccid(struct sock *sk, struct sk_buff *skb, int delay)
+static int dccp_wait_for_ccid(struct sock *sk, unsigned long delay)
 {
-   struct dccp_sock *dp = dccp_sk(sk);
DEFINE_WAIT(wait);
-   unsigned long jiffdelay;
-   int rc;
-
-   do {
-   dccp_pr_debug("delayed send by %d msec\n", delay);
-

[PATCH 5/5] [CCID]: Unused argument

2008-02-14 Thread Gerrit Renker
This removes the argument `more' from ccid_hc_tx_packet_sent, since it was
nowhere used in the entire code.

(Anecdotally, this argument was not even used in the original KAME code where
 the function originally came from; compare the variable moreToSend in the
 freebsd61-dccp-kame-28.08.2006.patch now maintained by Emmanuel Lochin.)

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ccid.h|6 +++---
 net/dccp/ccids/ccid2.c |2 +-
 net/dccp/ccids/ccid3.c |3 +--
 net/dccp/output.c  |2 +-
 4 files changed, 6 insertions(+), 7 deletions(-)

--- a/net/dccp/ccid.h
+++ b/net/dccp/ccid.h
@@ -75,7 +75,7 @@ struct ccid_operations {
int (*ccid_hc_tx_send_packet)(struct sock *sk,
  struct sk_buff *skb);
void(*ccid_hc_tx_packet_sent)(struct sock *sk,
- int more, unsigned int len);
+ unsigned int len);
void(*ccid_hc_rx_get_info)(struct sock *sk,
   struct tcp_info *info);
void(*ccid_hc_tx_get_info)(struct sock *sk,
@@ -165,10 +165,10 @@ static inline int ccid_hc_tx_send_packet(struct ccid 
*ccid, struct sock *sk,
 }
 
 static inline void ccid_hc_tx_packet_sent(struct ccid *ccid, struct sock *sk,
- int more, unsigned int len)
+ unsigned int len)
 {
if (ccid->ccid_ops->ccid_hc_tx_packet_sent != NULL)
-   ccid->ccid_ops->ccid_hc_tx_packet_sent(sk, more, len);
+   ccid->ccid_ops->ccid_hc_tx_packet_sent(sk, len);
 }
 
 static inline void ccid_hc_rx_packet_recv(struct ccid *ccid, struct sock *sk,
--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -224,7 +224,7 @@ static void ccid2_start_rto_timer(struct sock *sk)
   jiffies + hctx->ccid2hctx_rto);
 }
 
-static void ccid2_hc_tx_packet_sent(struct sock *sk, int more, unsigned int len)
+static void ccid2_hc_tx_packet_sent(struct sock *sk, unsigned int len)
 {
struct dccp_sock *dp = dccp_sk(sk);
struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk);
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -376,8 +376,7 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct sk_buff *skb)
return CCID_PACKET_SEND_AT_ONCE;
 }
 
-static void ccid3_hc_tx_packet_sent(struct sock *sk, int more,
-   unsigned int len)
+static void ccid3_hc_tx_packet_sent(struct sock *sk, unsigned int len)
 {
struct ccid3_hc_tx_sock *hctx = ccid3_hc_tx_sk(sk);
 
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -264,7 +264,7 @@ static void dccp_xmit_packet(struct sock *sk)
 * end this error is indistinguishable from loss, so that finally (if
 * the peer has no bugs) the drop is reported via receiver feedback.
 */
-   ccid_hc_tx_packet_sent(dp->dccps_hc_tx_ccid, sk, 0, len);
+   ccid_hc_tx_packet_sent(dp->dccps_hc_tx_ccid, sk, len);
 
/*
 * If the CCID needs to transfer additional header options out-of-band
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] [CCID2]: Stop polling

2008-02-14 Thread Gerrit Renker
This updates CCID2 to use the CCID dequeuing mechanism, converting it from
the previous constant polling to an event-driven mechanism.
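
For illustration only (this sketch is not part of the patch): the event-driven
idea boils down to remembering whether the sender was blocked before feedback
is processed, and waking it only on the transition from blocked to unblocked.
The names cwnd_sketch and acks_unblock_sender() are made up for this example;
only the pipe >= cwnd test mirrors ccid2_cwnd_network_limited() in the diff.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two CCID-2 fields used by the check. */
struct cwnd_sketch {
	unsigned int pipe;	/* packets currently in flight */
	unsigned int cwnd;	/* congestion window, in packets */
};

/* Same test as ccid2_cwnd_network_limited(): no room left in the window. */
static bool network_limited(const struct cwnd_sketch *s)
{
	return s->pipe >= s->cwnd;
}

/*
 * Account for newly acknowledged packets and report whether the sender
 * should be woken up, i.e. whether it was blocked before and is not now.
 * In the patch the wake-up is a tasklet_schedule(); here it is a return value.
 */
static bool acks_unblock_sender(struct cwnd_sketch *s, unsigned int acked)
{
	const bool was_blocked = network_limited(s);

	s->pipe = acked > s->pipe ? 0 : s->pipe - acked;
	return was_blocked && !network_limited(s);
}
```

A sender that was not blocked in the first place never needs a wake-up, which
is why the old state is sampled before the window is updated.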

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid2.c |   21 +
 net/dccp/ccids/ccid2.h |5 +
 2 files changed, 18 insertions(+), 8 deletions(-)

--- a/net/dccp/ccids/ccid2.h
+++ b/net/dccp/ccids/ccid2.h
@@ -70,6 +70,11 @@ struct ccid2_hc_tx_sock {
struct list_headccid2hctx_parsed_ackvecs;
 };
 
+static inline bool ccid2_cwnd_network_limited(struct ccid2_hc_tx_sock *hctx)
+{
+   return (hctx->ccid2hctx_pipe >= hctx->ccid2hctx_cwnd);
+}
+
 struct ccid2_hc_rx_sock {
int ccid2hcrx_data;
 };
--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -124,12 +124,9 @@ static int ccid2_hc_tx_alloc_seq(struct ccid2_hc_tx_sock *hctx)
 
 static int ccid2_hc_tx_send_packet(struct sock *sk, struct sk_buff *skb)
 {
-   struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk);
-
-   if (hctx->ccid2hctx_pipe < hctx->ccid2hctx_cwnd)
-   return 0;
-
-   return 1; /* XXX CCID should dequeue when ready instead of polling */
+   if (ccid2_cwnd_network_limited(ccid2_hc_tx_sk(sk)))
+   return CCID_PACKET_WILL_DEQUEUE_LATER;
+   return CCID_PACKET_SEND_AT_ONCE;
 }
 
 static void ccid2_change_l_ack_ratio(struct sock *sk, u32 val)
@@ -169,6 +166,7 @@ static void ccid2_hc_tx_rto_expire(unsigned long data)
 {
struct sock *sk = (struct sock *)data;
struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk);
+   const bool sender_was_blocked = ccid2_cwnd_network_limited(hctx);
long s;
 
bh_lock_sock(sk);
@@ -189,8 +187,6 @@ static void ccid2_hc_tx_rto_expire(unsigned long data)
if (s > 60)
hctx->ccid2hctx_rto = 60 * HZ;
 
-   ccid2_start_rto_timer(sk);
-
/* adjust pipe, cwnd etc */
hctx->ccid2hctx_ssthresh = hctx->ccid2hctx_cwnd / 2;
if (hctx->ccid2hctx_ssthresh < 2)
@@ -207,6 +203,11 @@ static void ccid2_hc_tx_rto_expire(unsigned long data)
hctx->ccid2hctx_rpdupack = -1;
ccid2_change_l_ack_ratio(sk, 1);
ccid2_hc_tx_check_sanity(hctx);
+
+   /* if we were blocked before, we may now send cwnd=1 packet */
+   if (sender_was_blocked)
+   tasklet_schedule(&dccp_sk(sk)->dccps_xmitlet);
+   ccid2_start_rto_timer(sk);
 out:
bh_unlock_sock(sk);
sock_put(sk);
@@ -461,6 +462,7 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
 {
struct dccp_sock *dp = dccp_sk(sk);
struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk);
+   const bool sender_was_blocked = ccid2_cwnd_network_limited(hctx);
struct dccp_ackvec_parsed *avp;
u64 ackno, seqno;
struct ccid2_seq *seqp;
@@ -646,6 +648,9 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
 
ccid2_hc_tx_check_sanity(hctx);
 done:
+   /* check if incoming Acks allow pending packets to be sent */
+   if (sender_was_blocked && !ccid2_cwnd_network_limited(hctx))
+   tasklet_schedule(&dccp_sk(sk)->dccps_xmitlet);
dccp_ackvec_parsed_cleanup(&hctx->ccid2hctx_parsed_ackvecs);
 }
 


[PATCH 3/5] [DCCP]: Empty the write queue when disconnecting

2008-02-14 Thread Gerrit Renker
dccp_disconnect() can be called for several reasons:

 1. when the connection setup failed (inet_stream_connect());
 2. when shutting down (inet_shutdown(), inet_csk_listen_stop());
 3. when aborting the connection (dccp_close() with 0 linger time).

In case (1) the write queue is empty. This patch empties the write queue,
if in case (2) or (3) it was not yet empty.

This avoids triggering the write-queue BUG_TRAP in sk_stream_kill_queues()
later on.

It also seems the natural thing to do: when breaking an association, delete all
packets that were originally intended for the soon-to-be-disconnected end
(compare with the call to tcp_write_queue_purge() in tcp_disconnect()).

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/proto.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -277,7 +277,9 @@ int dccp_disconnect(struct sock *sk, int flags)
sk->sk_err = ECONNRESET;
 
dccp_clear_xmit_timers(sk);
+
__skb_queue_purge(&sk->sk_receive_queue);
+   __skb_queue_purge(&sk->sk_write_queue);
if (sk->sk_send_head != NULL) {
__kfree_skb(sk->sk_send_head);
sk->sk_send_head = NULL;


[DCCP] [PATCH 0/5]: Extend CCID interface so that CCID-2 stops polling

2008-02-14 Thread Gerrit Renker
This set of patches extends the packet sending/dequeuing interface, which is 
currently restricted to using time intervals only. This forces CCID-2 into
a constant polling mode, which is removed in patch #4.

Patch #1: Extends the CCID packet dequeuing interface to allow CCIDs to
  autonomously schedule sending. Previously it was timer-only,
  so that CCID-2 polls uninterruptedly.
Patch #2: Adds a similar extension to the routine which drains the packet queue
  at the end of a connection (still under congestion control).
Patch #3: Please clean up your queue when disconnecting.
Patch #4: With the previous changes, CCID-2 is taken out of its polling mode.
Patch #5: Removes the `more' argument from tx_packet_sent.


[PATCH 4/6] [DCCP]: Fix the adjustments to AWL and SWL

2008-01-28 Thread Gerrit Renker
This fixes a problem and a potential loophole with regard to seqno/ackno
validity: the problem is that the initial adjustments to AWL/SWL were
only performed at the beginning of the connection, during the handshake.

Since the Sequence Window feature is always at least Wmin=32 (7.5.2),
it is however necessary to perform these adjustments at least for the first
W/W' (variables as per 7.5.1) packets in the lifetime of a connection.

This requirement is complicated by the fact that W/W' can change at any time
during the lifetime of a connection.

Therefore the consequence is to perform this safety check each time SWL/AWL
are updated.
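
As a rough userspace sketch of the safety check (not the code of this patch):
the adjustments are SWL := max(GSR + 1 - floor(W/4), ISR) and
AWL := max(GSS - W' + 1, ISS), evaluated in circular 48-bit arithmetic. The
helper names below are invented for the example, and the parameter names w and
w_prime simply follow the W/W' notation of 7.5.1.

```c
#include <stdint.h>

#define SEQ48_MASK ((UINT64_C(1) << 48) - 1)

/* 48-bit modular addition and subtraction. */
static uint64_t add48(uint64_t a, uint64_t b) { return (a + b) & SEQ48_MASK; }
static uint64_t sub48(uint64_t a, uint64_t b) { return (a - b) & SEQ48_MASK; }

/* Non-zero if a comes strictly before b in circular 48-bit sequence space. */
static int before48(uint64_t a, uint64_t b)
{
	uint64_t d = sub48(b, a);

	return d != 0 && d < (UINT64_C(1) << 47);
}

/* AWL := max(GSS - W' + 1, ISS): never let AWL fall below ISS. */
static uint64_t adjusted_awl(uint64_t gss, uint64_t iss, uint64_t w_prime)
{
	uint64_t awl = add48(sub48(gss, w_prime), 1);

	return before48(awl, iss) ? iss : awl;
}

/* SWL := max(GSR + 1 - floor(W/4), ISR): never let SWL fall below ISR. */
static uint64_t adjusted_swl(uint64_t gsr, uint64_t isr, uint64_t w)
{
	uint64_t swl = sub48(add48(gsr, 1), w / 4);

	return before48(swl, isr) ? isr : swl;
}
```

Early in the connection the max() clamps to the initial sequence number; once
more than W (or W') packets have been exchanged it no longer has any effect,
so applying it on every update is harmless.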

A second problem solved by this patch is that the remote/local Sequence Window
feature values (which set the bounds for AWL/SWL/SWH) are undefined until the
feature negotiation has completed.

During the initial handshake we have more stringent sequence number protection;
the changes added by this patch ensure that {A,S}W{L,H} are within the correct
bounds at the instant that feature negotiation completes (since the SeqWin
feature activation handlers call dccp_update_gsr/gss()).

A detailed rationale is below -- can be removed from the commit message.


1. Server sequence number checks during initial handshake
---------------------------------------------------------
The server can not use the fields of the listening socket for seqno/ackno checks
and thus needs to store all relevant information on a per-connection basis on
the dccp_request socket. This is a size-constrained structure and has currently
only ISS (dreq_iss) and ISR (dreq_isr) defined.
Adding further fields (SW{L,H}, AW{L,H}) would increase the size of the struct
and it is questionable whether this will have any practical gain. The currently
implemented solution is as follows.
 * receiving first Request: dccp_v{4,6}_conn_request sets
ISR := P.seqno, ISS := dccp_v{4,6}_init_sequence()

 * sending first Response:  dccp_v{4,6}_send_response via dccp_make_response()
sets P.seqno := ISS, sets P.ackno := ISR

 * receiving retransmitted Request: dccp_check_req() overrides ISR := P.seqno

 * answering retransmitted Request: dccp_make_response() sets ISS += 1,
otherwise as per first Response

 * completing the handshake: succeeds in dccp_check_req() for the first Ack
 where P.ackno == ISS (P.seqno is not tested)

 * creating child socket: ISS, ISR are copied from the request_sock

This solution will succeed whenever the server can receive the Request and the
subsequent Ack in succession, without retransmissions. If there is packet loss,
the client needs to retransmit until this condition succeeds; it will otherwise
eventually give up. Adding further fields to the request_sock could increase
the robustness a bit, in that it would make it possible to let a reordered Ack
(from a retransmitted Response) pass. The argument against such a solution is
that if the packet loss is not persistent and an Ack gets through, why not
wait for the one answering the original Response? If the loss is persistent, it
is probably better not to start the connection in the first place.
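
The bookkeeping listed above can be sketched in a few lines of userspace C.
This is only an illustration of the ISS/ISR rules as described in this
message; struct req_sketch and the function names are hypothetical and do not
exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEQ48_MASK ((UINT64_C(1) << 48) - 1)

/* Hypothetical stand-in for the two fields kept on the request socket. */
struct req_sketch {
	uint64_t iss;	/* Initial Sequence number Sent */
	uint64_t isr;	/* Initial Sequence number Received */
};

/* First Request: ISR := P.seqno, ISS := freshly chosen value. */
static void first_request(struct req_sketch *r, uint64_t p_seqno,
			  uint64_t chosen_iss)
{
	r->isr = p_seqno;
	r->iss = chosen_iss & SEQ48_MASK;
}

/* Retransmitted Request: ISR is overridden, the new Response uses ISS + 1. */
static void retransmitted_request(struct req_sketch *r, uint64_t p_seqno)
{
	r->isr = p_seqno;
	r->iss = (r->iss + 1) & SEQ48_MASK;
}

/* The handshake completes on the first Ack with P.ackno == ISS. */
static bool ack_completes_handshake(const struct req_sketch *r,
				    uint64_t p_ackno)
{
	return p_ackno == r->iss;
}
```

With only these two fields, an Ack is accepted only if it acknowledges the
most recently sent Response, which is the simplicity/robustness trade-off
discussed above.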

Long story short: the present design (by Arnaldo) is simple and will likely work
just as well as a more complicated solution. As a consequence, {A,S}W{L,H} are
not needed until the moment the request_sock is cloned into the accept queue.

At that stage feature negotiation has completed, so that the values for the
local and remote Sequence Window feature (7.5.2) are known, i.e. we are now in
a better position to compute {A,S}W{L,H}.


2. Client sequence number checks during initial handshake
---------------------------------------------------------
Until entering PARTOPEN the client does not need the adjustments, since it
constrains the Ack window to the packet it sent.

 * sending first Request: dccp_v{4,6}_connect() chooses ISS,
  dccp_connect() then sets GAR := ISS (as per 8.5),
  dccp_transmit_skb() (with the previous bug fix) sets
 GSS := ISS, AWL := ISS, AWH := GSS
 * n-th retransmitted Request (with previous patch):
  dccp_retransmit_skb() via timer calls
  dccp_transmit_skb(), which sets GSS := ISS+n
  and then AWL := ISS, AWH := ISS+n

 * receiving any Response: dccp_rcv_request_sent_state_process()
   -- accepts packet if AWL <= P.ackno <= AWH;
   -- sets GSR = ISR = P.seqno

 * sending the Ack completing the handshake: dccp_send_ack() calls
   dccp_transmit_skb(), which sets GSS += 1
   and AWL := ISS, AWH := GSS


Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/dccp.h  |   20 
 net/dccp/input.c |   18 ++
 net/d

[PATCH 6/6] [DCCP-PROBE]: Reduce noise in output and convert to ktime_t

2008-01-28 Thread Gerrit Renker
This fixes the problem that dccp_probe output can grow quite large without
apparent benefit (many identical data points), creating huge files (up to
over one Gigabyte for a few minutes' test run) which are very hard to
post-process (in one instance it got so bad that gnuplot ate up all memory
plus swap).

The cause for the problem is that the kprobe is inserted into dccp_sendmsg,
which can be called in a polling-mode (whenever the TX queue is full due to
congestion-control issues, EAGAIN is returned). This creates many very
similar data points, i.e. the increase of processing time does not increase
the quality/information of the probe output.

The fix is to attach the probe to a different function -- write_xmit was
chosen since it gets called continually (both via userspace and timer);
an input-path function would stop sampling as soon as the other end stops
sending feedback.

For comparison the output file sizes for the same 20 second test
run over a lossy link:
   * before / without patch:  118   Megabytes
   * after  / with patch:   1.2 Megabytes
and there was much less noise in the output.

To allow backward compatibility with scripts that people use, the now-unused
`size' field in the output has been replaced with the CCID identifier. This
also serves for future compatibility - support for CCID2 is work in progress
(depends on the still unfinished SRTT/RTTVAR updates).

While at it, the update to ktime_t was also performed.
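
To illustrate the new timestamp format (seconds, a dot, then nine digits of
nanoseconds), here is a small userspace sketch. It works on plain nanosecond
counters instead of ktime_t, and format_elapsed() is a name made up for this
example.

```c
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC_SKETCH UINT64_C(1000000000)

/*
 * Write the time elapsed between start_ns and now_ns as "sec.nanosec",
 * zero-padding the fractional part to nine digits. Returns what
 * snprintf() returns, i.e. the number of characters needed.
 */
static int format_elapsed(char *buf, size_t size,
			  uint64_t start_ns, uint64_t now_ns)
{
	uint64_t delta = now_ns - start_ns;
	unsigned long sec  = (unsigned long)(delta / NSEC_PER_SEC_SKETCH);
	unsigned long nsec = (unsigned long)(delta % NSEC_PER_SEC_SKETCH);

	return snprintf(buf, size, "%lu.%09lu", sec, nsec);
}
```

Working on a single difference avoids the manual borrow between the seconds
and sub-seconds fields that a struct timeval subtraction needs.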

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/probe.c |   68 ++---
 1 files changed, 23 insertions(+), 45 deletions(-)

--- a/net/dccp/probe.c
+++ b/net/dccp/probe.c
@@ -46,77 +46,55 @@ struct {
struct kfifo  *fifo;
spinlock_tlock;
wait_queue_head_t wait;
-   struct timevaltstart;
+   ktime_t   start;
 } dccpw;
 
-static void printl(const char *fmt, ...)
-{
-   va_list args;
-   int len;
-   struct timeval now;
-   char tbuf[256];
-
-   va_start(args, fmt);
-   do_gettimeofday(&now);
-
-   now.tv_sec -= dccpw.tstart.tv_sec;
-   now.tv_usec -= dccpw.tstart.tv_usec;
-   if (now.tv_usec < 0) {
-   --now.tv_sec;
-   now.tv_usec += 1000000;
-   }
-
-   len = sprintf(tbuf, "%lu.%06lu ",
- (unsigned long) now.tv_sec,
- (unsigned long) now.tv_usec);
-   len += vscnprintf(tbuf+len, sizeof(tbuf)-len, fmt, args);
-   va_end(args);
-
-   kfifo_put(dccpw.fifo, tbuf, len);
-   wake_up(&dccpw.wait);
-}
-
-static int jdccp_sendmsg(struct kiocb *iocb, struct sock *sk,
-struct msghdr *msg, size_t size)
+static void jdccp_write_xmit(struct sock *sk)
 {
const struct inet_sock *inet = inet_sk(sk);
struct ccid3_hc_tx_sock *hctx = NULL;
+   struct timespec tv;
+   char tbuf[256];
+   int len, ccid = ccid_get_current_id(dccp_sk(sk), false);
 
-   if (ccid_get_current_id(dccp_sk(sk), false) == DCCPC_CCID3)
+   if (ccid == DCCPC_CCID3)
hctx = ccid3_hc_tx_sk(sk);
 
-   if (port == 0 || ntohs(inet->dport) == port ||
-   ntohs(inet->sport) == port) {
-   if (hctx)
-   printl("%d.%d.%d.%d:%u %d.%d.%d.%d:%u %d %d %d %d %u "
-  "%llu %llu %d\n",
+   if (!port || ntohs(inet->dport) == port || ntohs(inet->sport) == port) {
+
+   tv  = ktime_to_timespec(ktime_sub(ktime_get(), dccpw.start));
+   len = sprintf(tbuf, "%lu.%09lu %d.%d.%d.%d:%u %d.%d.%d.%d:%u %d",
+  (unsigned long) tv.tv_sec,
+  (unsigned long) tv.tv_nsec,
   NIPQUAD(inet->saddr), ntohs(inet->sport),
-  NIPQUAD(inet->daddr), ntohs(inet->dport), size,
+  NIPQUAD(inet->daddr), ntohs(inet->dport), ccid);
+
+   if (hctx)
+   len += sprintf(tbuf + len, " %d %d %d %u %llu %llu %d",
   hctx->ccid3hctx_s, hctx->ccid3hctx_rtt,
   hctx->ccid3hctx_p, hctx->ccid3hctx_x_calc,
   hctx->ccid3hctx_x_recv >> 6,
   hctx->ccid3hctx_x >> 6, hctx->ccid3hctx_t_ipi);
-   else
-   printl("%d.%d.%d.%d:%u %d.%d.%d.%d:%u %d\n",
-  NIPQUAD(inet->saddr), ntohs(inet->sport),
-  NIPQUAD(inet->daddr), ntohs(inet->dport), size);
+
+   len += sprintf(tbuf + len, "\n");
+   kfifo_put(dccpw.fifo, tbuf, len);
+   wake_up(&dccpw.wait);
}
 
jprobe_return();
-  

[PATCH 5/6] [DCCP]: Merge now-reduced connect_init() function

2008-01-28 Thread Gerrit Renker
After moving the assignment of GAR/ISS from dccp_connect_init() to
dccp_transmit_skb(), the former function becomes very small, so that
a merger with dccp_connect() suggested itself.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/output.c |   18 +-
 1 files changed, 5 insertions(+), 13 deletions(-)

--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -444,8 +444,9 @@ int dccp_send_reset(struct sock *sk, enum dccp_reset_codes code)
 /*
  * Do all connect socket setups that can be done AF independent.
  */
-static inline void dccp_connect_init(struct sock *sk)
+int dccp_connect(struct sock *sk)
 {
+   struct sk_buff *skb;
struct dccp_sock *dp = dccp_sk(sk);
struct dst_entry *dst = __sk_dst_get(sk);
struct inet_connection_sock *icsk = inet_csk(sk);
@@ -455,22 +456,12 @@ static inline void dccp_connect_init(struct sock *sk)
 
dccp_sync_mss(sk, dst_mtu(dst));
 
-   /* Initialise GAR as per 8.5; AWL/AWH are set in dccp_transmit_skb() */
-   dp->dccps_gar = dp->dccps_iss;
-
-   icsk->icsk_retransmits = 0;
-}
-
-int dccp_connect(struct sock *sk)
-{
-   struct sk_buff *skb;
-   struct inet_connection_sock *icsk = inet_csk(sk);
-
/* do not connect if feature negotiation setup fails */
if (dccp_feat_finalise_settings(dccp_sk(sk)))
return -EPROTO;
 
-   dccp_connect_init(sk);
+   /* Initialise GAR as per 8.5; AWL/AWH are set in dccp_transmit_skb() */
+   dp->dccps_gar = dp->dccps_iss;
 
skb = alloc_skb(sk->sk_prot->max_header, sk->sk_allocation);
if (unlikely(skb == NULL))
@@ -486,6 +477,7 @@ int dccp_connect(struct sock *sk)
DCCP_INC_STATS(DCCP_MIB_ACTIVEOPENS);
 
/* Timer for repeating the REQUEST until an answer. */
+   icsk->icsk_retransmits = 0;
inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
  icsk->icsk_rto, DCCP_RTO_MAX);
return 0;


[PATCH 3/6] [DCCP]: Bug-Fix - AWL was never updated

2008-01-28 Thread Gerrit Renker
This patch was triggered by finding the following message in the syslog:
 "kernel: dccp_check_seqno: DCCP: Step 6 failed for DATAACK packet, [...]
   P.ackno exists or LAWL(82947089) <= P.ackno(82948208)
<= S.AWH(82948728), sending SYNC..."

Note the difference between AWH and AWL: it is 1639 packets (while Sequence
Window was actually at 100). A closer look at the trace showed that
LAWL = AWL = 82947089 equalled the ISS on the Response.

The cause of the bug was that AWL was only ever set on the first packet - the
DCCP-Request sent by dccp_v{4,6}_connect().

The fix is to continually update AWL/AWH with each new packet (as GSS=AWH).

In addition, AWL/AWH are now updated to enforce more stringent checks on the
initial sequence numbers when connecting:
 * AWL is initialised to ISS and remains at this value;
 * AWH is always set to GSS (via dccp_update_gss());
 * so on the first Request: AWL =  AWH = ISS,
   and on the n-th Request: AWL = ISS, AWH = ISS+n.

As a consequence, only Response packets that refer to Requests sent by this
host will pass, all others are discarded. This is the intention and in effect
implements the initial adjustments for AWL as specified in RFC 4340, 7.5.1.
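
A minimal userspace sketch of the resulting Ack window during connection
setup, assuming the rules listed above (AWL stays at ISS, AWH follows GSS).
The struct and function names are invented for illustration and are not the
kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEQ48_MASK ((UINT64_C(1) << 48) - 1)

/* Hypothetical subset of the client's sequence number state. */
struct client_sketch {
	uint64_t iss, gss, awl, awh;
};

/* 48-bit modular subtraction. */
static uint64_t sub48(uint64_t a, uint64_t b) { return (a - b) & SEQ48_MASK; }

/*
 * Send the initial Request (retransmits == 0) or a retransmission:
 * the first Request uses ISS, every later one the next sequence number.
 */
static void send_request(struct client_sketch *c, unsigned int retransmits)
{
	c->gss = retransmits == 0 ? c->iss : (c->gss + 1) & SEQ48_MASK;
	c->awl = c->iss;	/* AWL remains at ISS */
	c->awh = c->gss;	/* AWH always equals GSS */
}

/* A Response passes only if AWL <= P.ackno <= AWH (circular comparison). */
static bool response_acceptable(const struct client_sketch *c,
				uint64_t p_ackno)
{
	return sub48(p_ackno, c->awl) <= sub48(c->awh, c->awl);
}
```

After the initial Request plus two retransmissions the window is
[ISS, ISS+2], so a Response acknowledging any other sequence number is
discarded.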

Note: A problem that remains is that ISS can potentially be under-run even after
  the initial handshake; this is addressed in a subsequent patch.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/output.c |   34 +++---
 1 files changed, 15 insertions(+), 19 deletions(-)

--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -53,8 +53,11 @@ static int dccp_transmit_skb(struct sock *sk, struct sk_buff *skb)
  dccp_packet_hdr_len(dcb->dccpd_type);
int err, set_ack = 1;
u64 ackno = dp->dccps_gsr;
-
-   dccp_inc_seqno(&dp->dccps_gss);
+   /*
+* Increment GSS here already in case the option code needs it.
+* Update GSS for real only if option processing below succeeds.
+*/
+   dcb->dccpd_seq = ADD48(dp->dccps_gss, 1);
 
switch (dcb->dccpd_type) {
case DCCP_PKT_DATA:
@@ -66,6 +69,9 @@ static int dccp_transmit_skb(struct sock *sk, struct sk_buff *skb)
 
case DCCP_PKT_REQUEST:
set_ack = 0;
+   /* Use ISS on the first (non-retransmitted) Request. */
+   if (icsk->icsk_retransmits == 0)
+   dcb->dccpd_seq = dp->dccps_iss;
/* fall through */
 
case DCCP_PKT_SYNC:
@@ -84,14 +90,11 @@ static int dccp_transmit_skb(struct sock *sk, struct sk_buff *skb)
break;
}
 
-   dcb->dccpd_seq = dp->dccps_gss;
-
if (dccp_insert_options(sk, skb)) {
kfree_skb(skb);
return -EPROTO;
}
 
-
/* Build DCCP header and checksum it. */
dh = dccp_zeroed_hdr(skb, dccp_header_size);
dh->dccph_type  = dcb->dccpd_type;
@@ -103,7 +106,7 @@ static int dccp_transmit_skb(struct sock *sk, struct sk_buff *skb)
/* XXX For now we're using only 48 bits sequence numbers */
dh->dccph_x = 1;
 
-   dp->dccps_awh = dp->dccps_gss;
+   dccp_update_gss(sk, dcb->dccpd_seq);
dccp_hdr_set_seq(dh, dp->dccps_gss);
if (set_ack)
dccp_hdr_set_ack(dccp_hdr_ack_bits(skb), ackno);
@@ -112,6 +115,11 @@ static int dccp_transmit_skb(struct sock *sk, struct sk_buff *skb)
case DCCP_PKT_REQUEST:
dccp_hdr_request(skb)->dccph_req_service =
dp->dccps_service;
+   /*
+* Limit Ack window to ISS <= P.ackno <= GSS, so that
+* only Responses to Requests we sent are considered.
+*/
+   dp->dccps_awl = dp->dccps_iss;
break;
case DCCP_PKT_RESET:
dccp_hdr_reset(skb)->dccph_reset_code =
@@ -447,19 +455,7 @@ static inline void dccp_connect_init(struct sock *sk)
 
dccp_sync_mss(sk, dst_mtu(dst));
 
-   /*
-* SWL and AWL are initially adjusted so that they are not less than
-* the initial Sequence Numbers received and sent, respectively:
-*  SWL := max(GSR + 1 - floor(W/4), ISR),
-*  AWL := max(GSS - W' + 1, ISS).
-* These adjustments MUST be applied only at the beginning of the
-* connection.
-*/
-   dccp_update_gss(sk, dp->dccps_iss);
-   dccp_set_seqn

[PATCH 1/6] [CCID3]: Function returned wrong value

2008-01-28 Thread Gerrit Renker
This fixes a bug in the reverse lookup of p, given a value f(p) (used in the
reverse-lookup of the first/synthetic loss interval in RFC 3448, 6.3.1):
instead of p, the function returned the smallest table value f(p).

The smallest tabulated value of

   10^6 * f(p),  with  f(p) = sqrt(2*p/3) + 12 * sqrt(3*p/8) * (32 * p^3 + p),

for p=0.0001 is 8172.

Since this value is scaled by 10^6, the impact of this bug is that a loss
of 8172/10^6 = 0.8172% was reported whenever the input was below the table
resolution of 0.01%.

This means that the value was over 80 times too high, resulting in large spikes
of the initial loss interval, and thus unnecessarily reducing the throughput.
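
The figures above can be cross-checked with a few lines of userspace C. This
is only a sketch: it uses floating point (which the kernel code does not), the
function names are made up, and a small Newton iteration replaces sqrt() so
that the example does not depend on libm.

```c
/* Square root by Newton iteration, adequate for this illustration. */
static double sketch_sqrt(double x)
{
	double r = x > 1.0 ? x : 1.0;
	int i;

	if (x <= 0.0)
		return 0.0;
	for (i = 0; i < 100; i++)
		r = 0.5 * (r + x / r);
	return r;
}

/* f(p) exactly as written in the description above. */
static double tfrc_f(double p)
{
	return sketch_sqrt(2.0 * p / 3.0) +
	       12.0 * sketch_sqrt(3.0 * p / 8.0) * (32.0 * p * p * p + p);
}

/* 10^6 * f(p), truncated to an integer like a table entry. */
static unsigned int tfrc_f_scaled(double p)
{
	return (unsigned int)(tfrc_f(p) * 1e6);
}
```

tfrc_f_scaled(0.0001) evaluates to 8172. The loss rate p = 0.0001 itself is
100 when scaled by 10^6, and 8172 / 100 is roughly 81.7, which is the "over 80
times too high" mentioned above.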

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ccids/lib/tfrc_equation.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

--- a/net/dccp/ccids/lib/tfrc_equation.c
+++ b/net/dccp/ccids/lib/tfrc_equation.c
@@ -677,11 +677,11 @@ u32 tfrc_calc_x_reverse_lookup(u32 fvalue)
/* Error cases. */
if (fvalue < tfrc_calc_x_lookup[0][1]) {
DCCP_WARN("fvalue %d smaller than resolution\n", fvalue);
-   return tfrc_calc_x_lookup[0][1];
+   return TFRC_SMALLEST_P;
}
if (fvalue > tfrc_calc_x_lookup[TFRC_CALC_X_ARRSIZE - 1][0]) {
DCCP_WARN("fvalue %d exceeds bounds!\n", fvalue);
-   return 1000000;
+   return 1000000; /* The maximum: 100% scaled by 10^6 */
}
 
if (fvalue <= tfrc_calc_x_lookup[TFRC_CALC_X_ARRSIZE - 1][1]) {


[DCCP] [Patch 0/6] [BUG-Fix]: Fixes for CCID3, seqnos, and dccp_probe

2008-01-28 Thread Gerrit Renker
This is a set of bug fixes for CCID3 and general DCCP. 

Please consider patches #1, #2, #3. The remainder are for the test tree
(but are fixes nonetheless) and may not apply directly onto mainline; with
regard to patch #6, please see note at end of message.

Patch #1: Fixes a CCID3 bug: when loss was encountered, the value returned at 
  minimum resolution was wrong and over 80 times too high.

Patch #2: Fixes a bug in the assignment of GAR (used ISR instead of ISS).

Patch #3: Another bug fix: AWL was only ever set after the Response.

Patch #4: Fixes adjustments to AWL and SWL. These were only updated at the
  beginning of the connection (where the statements reduced to much
  simpler assignments), but need to be adjusted for the reception
  of subsequent packets also.
  (Note: This patch is best used after going through the feature
 negotiation patches, since the feature handlers for Sequence
 Window ensure that these values are up to date.)

Patch #5: As a consequence of the preceding patches, the dccp_connect_init
  function may be merged into dccp_connect. This is a suggestion (which
  would make sense), not a bug fix.

Patch #6: A fix for dccp_probe - this used to be noisy and created huge
  files. By attaching the dccp_probe onto a different function, this
  was fixed, the output file size shrank by a factor of over 100,
  with qualitatively the same or better output.


I have put these after the feature-negotiation patches onto

git://eden-feed.erg.abdn.ac.uk/dccp_exp {dccp,ccid4}

and updated the CCID4 tree to take advantage of the dccp_probe update.

If people would like to use the dccp_probe update for a mainline kernel,
I have uploaded a similar patch to patch #6 onto
http://www.erg.abdn.ac.uk/users/gerrit/dccp/dccp_probe_for_mainline.diff


[PATCH 2/6] [DCCP]: Bug in initial acknowledgment number assignment

2008-01-28 Thread Gerrit Renker
Step 8.5 in RFC 4340 says for the newly cloned socket

   Initialize S.GAR := S.ISS,

but what in fact the code (minisocks.c) does is

   Initialize S.GAR := S.ISR,

which is wrong (typo?) -- fixed by the patch.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/minisocks.c |7 +++
 1 files changed, 3 insertions(+), 4 deletions(-)

--- a/net/dccp/minisocks.c
+++ b/net/dccp/minisocks.c
@@ -122,13 +122,12 @@ struct sock *dccp_create_openreq_child(struct sock *sk,
 *Initialize S.GAR := S.ISS
 *Set S.ISR, S.GSR, S.SWL, S.SWH from packet or Init Cookies
 */
+   newdp->dccps_gar = newdp->dccps_iss = dreq->dreq_iss;
+   dccp_update_gss(newsk, dreq->dreq_iss);
 
-   newdp->dccps_gar = newdp->dccps_isr = dreq->dreq_isr;
+   newdp->dccps_isr = dreq->dreq_isr;
dccp_update_gsr(newsk, dreq->dreq_isr);
 
-   newdp->dccps_iss = dreq->dreq_iss;
-   dccp_update_gss(newsk, dreq->dreq_iss);
-
/*
 * SWL and AWL are initially adjusted so that they are not less than
 * the initial Sequence Numbers received and sent, respectively:


Re: FW: ccid2/ccid3 oopses

2008-01-10 Thread Gerrit Renker
| > So maybe the cause triggering this oops is somewhere else.
| 
| yes, probably.  sorry - i didn`t tell or maybe i didn`t know when writing
| my first mail to module authors and forget to add that before forwarding here.
| 
| for me , the problem does not happen with suse kernel of the day
| (2.6.24-rc6-git7-20080102160500-default, .config attached) but it happens
| with vanilla 2.6.24-rc6 (mostly allmodconfig, also attached)
| 
There are 256 differences between the two .config files. I think there are other
people on the list who will be able to give more information regarding the
.config files. What struck me in the one which doesn't work is that
 -- CONFIG_DEBUG_KERNEL and
 -- CONFIG_DEBUG_BUGVERBOSE were not set. Both are very useful for bug-hunting,
the latter is much better for decoding oopses.

Can't say anything about the Suse kernel. We use the plain kernel from
www.kernel.org, specifically the netdev-tree:
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
If you can't get further here, try with a kernel.org kernel or check Suse
forums.

 1. the tests yesterday were done on the DCCP test tree based on the above
    netdev-2.6 2.6.24-rc7 tree from git://eden-feed.erg.abdn.ac.uk/dccp_exp
    (dccp subtree)
    Tested your for-loop 60 seconds each for CCID3/4 -- no oops.

 2. also repeated the tests on an unmodified 2.6.24-rc7 tree from netdev-2.6
    (today)
    120 seconds for-loop each -- no oops.

As said, if the above does not help, try a www.kernel.org kernel (or one of the
above trees) first.
| 
| >| > >> the easiest way to reproduce is:
| > | > >> 
| > | > >> while true;do modprobe dccp_ccid2/3;modprobe -r dccp_ccid2/3;done
| > | > >> after short time, the kernel oopses (messages below)
| > | > >> 


Re: FW: ccid2/ccid3 oopses

2008-01-09 Thread Gerrit Renker
| > >> the easiest way to reproduce is:
| > >> 
| > >> while true;do modprobe dccp_ccid2/3;modprobe -r dccp_ccid2/3;done
| > >> after short time, the kernel oopses (messages below)
| > >> 

| 
| Gerrit, the control socket isn't attached to any CCID module, so the
| CCID modules should be safe to remove, and IIRC they were safe to
| unload.
| 
Ah, right. I have misread the email. And can confirm the above: running
the for-loop at the top of the message (60 seconds uninterrupted for
CCID2,3 each) brought no oopses.
So maybe the cause triggering this oops is somewhere else.


Re: FW: ccid2/ccid3 oopses

2008-01-09 Thread Gerrit Renker
Roland, -

>> apparently, i got crashes when loading/unloading other driver modules just
>> after ccid2 or ccid3 had been loaded/unloaded _once_ (have not used them at
>> all, just modprobe module;modprobe -r module) >
>> 

>> the easiest way to reproduce is:
>> 
>> while true;do modprobe dccp_ccid2/3;modprobe -r dccp_ccid2/3;done
>> after short time, the kernel oopses (messages below)
>> 
>> i`m not sure if this is worth to be filed at kernel bugzilla, so i`m 
>> contacting
>> you personally first.
>>
The issue is known: once loaded, the DCCP modules can not be unloaded
without causing a crash as the one you have observed. This is due to the
fact that dccp_ipv{4,6} use control sockets which need to be released
before the module can be unloaded.
When the control sockets are not released then crashes will always
result.
In earlier versions of DCCP there was a kernel option known as "unload hack",
which conditionally inserted 
sock_release(dccp_v{4,6}_ctl_socket);
in 
dccp_v{4,6}_exit()

However, as the name says, it is a hack since there are other issues to 
be considered:
* sockets in timewait state
* other wait states (e.g. half-open connections)
* memory which has not been released
* module dependencies

With regard to the latter, I am normally using the Unload Hack and
release modules in the following order:

dccp_probe => dccp_ccid2 => dccp_ccid3 => dccp_tfrc_lib =>
dccp_ipv6  => dccp_ipv4  => dccp_diag  => dccp

Long story short
 * the CCID/DCCP modules can currently not safely be unloaded
 * maybe we should disable module unloading for the mainline kernel
 * if anyone is interested to use the unload hack, here is the old patch
   http://www.erg.abdn.ac.uk/users/gerrit/dccp/testing_dccp/Unload_Hack.diff

Please feel free to come back on this issue
Gerrit


[PATCH 3/3] [CCID2]: Add option-parsing code to process Ack Vectors at the HC-sender

2008-01-08 Thread Gerrit Renker
This is patch 2 in the set and uses the routines provided by the previous
patch to implement parsing of received Ack Vectors, replacing duplicate code.
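
For readers unfamiliar with the option layout, the following userspace sketch
shows the kind of walk that the removed ccid2_ackvector() performed: options
up to type 31 are a single byte, all others carry a length byte that counts
the type and length fields as well. The function name and the numeric
constants (31, 38, 39, as I understand them from RFC 4340) are assumptions of
this example, not taken from the patch.

```c
#include <stddef.h>

#define OPT_MAX_SINGLE_BYTE	31	/* types 0..31 have no length/value */
#define OPT_ACK_VECTOR_0	38	/* Ack Vector, nonce 0 */
#define OPT_ACK_VECTOR_1	39	/* Ack Vector, nonce 1 */

/*
 * Find the next Ack Vector in opts[0..optlen), starting at *offset.
 * Returns 1 and sets *vec, *veclen and *offset (first byte after the
 * option) if one was found, 0 if there is none, -1 on a malformed option.
 */
static int next_ack_vector(const unsigned char *opts, size_t optlen,
			   size_t *offset,
			   const unsigned char **vec, size_t *veclen)
{
	size_t pos = *offset;

	while (pos < optlen) {
		const unsigned char type = opts[pos++];
		const unsigned char *value = NULL;
		size_t len = 0;

		if (type > OPT_MAX_SINGLE_BYTE) {
			if (pos == optlen)
				return -1;
			len = opts[pos++];
			if (len < 3)	/* same bound as the removed code */
				return -1;
			len -= 2;	/* strip type and length fields */
			if (len > optlen - pos)
				return -1;
			value = opts + pos;
			pos += len;
		}

		if (type == OPT_ACK_VECTOR_0 || type == OPT_ACK_VECTOR_1) {
			*vec = value;
			*veclen = len;
			*offset = pos;
			return 1;
		}
	}
	return 0;
}
```

Calling the function repeatedly with the updated offset visits every Ack
Vector on a packet in turn.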

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid2.c |  132 
 net/dccp/ccids/ccid2.h |2 +
 2 files changed, 46 insertions(+), 88 deletions(-)

--- a/net/dccp/ccids/ccid2.h
+++ b/net/dccp/ccids/ccid2.h
@@ -47,6 +47,7 @@ struct ccid2_seq {
  * @ccid2hctx_lastrtt -time RTT was last measured
  * @ccid2hctx_rpseq - last consecutive seqno
  * @ccid2hctx_rpdupack - dupacks since rpseq
+ * @ccid2hctx_parsed_ackvecs: list of Ack Vectors received on current skb
 */
 struct ccid2_hc_tx_sock {
u32 ccid2hctx_cwnd;
@@ -66,6 +67,7 @@ struct ccid2_hc_tx_sock {
int ccid2hctx_rpdupack;
unsigned long   ccid2hctx_last_cong;
u64 ccid2hctx_high_ack;
+   struct list_headccid2hctx_parsed_ackvecs;
 };
 
 struct ccid2_hc_rx_sock {
--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -320,68 +320,6 @@ static void ccid2_hc_tx_packet_sent(struct sock *sk, int more, unsigned int len)
 #endif
 }
 
-/* XXX Lame code duplication!
- * returns -1 if none was found.
- * else returns the next offset to use in the function call.
- */
-static int ccid2_ackvector(struct sock *sk, struct sk_buff *skb, int offset,
-  unsigned char **vec, unsigned char *veclen)
-{
-   const struct dccp_hdr *dh = dccp_hdr(skb);
-   unsigned char *options = (unsigned char *)dh + dccp_hdr_len(skb);
-   unsigned char *opt_ptr;
-   const unsigned char *opt_end = (unsigned char *)dh +
-   (dh->dccph_doff * 4);
-   unsigned char opt, len;
-   unsigned char *value;
-
-   BUG_ON(offset < 0);
-   options += offset;
-   opt_ptr = options;
-   if (opt_ptr >= opt_end)
-   return -1;
-
-   while (opt_ptr != opt_end) {
-   opt   = *opt_ptr++;
-   len   = 0;
-   value = NULL;
-
-   /* Check if this isn't a single byte option */
-   if (opt > DCCPO_MAX_RESERVED) {
-   if (opt_ptr == opt_end)
-   goto out_invalid_option;
-
-   len = *opt_ptr++;
-   if (len < 3)
-   goto out_invalid_option;
-   /*
-* Remove the type and len fields, leaving
-* just the value size
-*/
-   len -= 2;
-   value   = opt_ptr;
-   opt_ptr += len;
-
-   if (opt_ptr > opt_end)
-   goto out_invalid_option;
-   }
-
-   switch (opt) {
-   case DCCPO_ACK_VECTOR_0:
-   case DCCPO_ACK_VECTOR_1:
-   *vec= value;
-   *veclen = len;
-   return offset + (opt_ptr - options);
-   }
-   }
-
-   return -1;
-
-out_invalid_option:
-   DCCP_BUG("Invalid option - this should not happen (previous parsing)!");
-   return -1;
-}
-
 static void ccid2_hc_tx_kill_rto_timer(struct sock *sk)
 {
struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk);
@@ -502,15 +440,30 @@ static void ccid2_congestion_event(struct sock *sk, struct ccid2_seq *seqp)
ccid2_change_l_ack_ratio(sk, hctx->ccid2hctx_cwnd);
 }
 
+static int ccid2_hc_tx_parse_options(struct sock *sk, unsigned char option,
+unsigned char len, u16 idx,
+unsigned char *value)
+{
+   struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk);
+
+   switch (option) {
+   case DCCPO_ACK_VECTOR_0:
+   return dccp_ackvec_parsed_add(&hctx->ccid2hctx_parsed_ackvecs,
+ value, len, 0);
+   case DCCPO_ACK_VECTOR_1:
+   return dccp_ackvec_parsed_add(&hctx->ccid2hctx_parsed_ackvecs,
+ value, len, 1);
+   }
+   return 0;
+}
+
 static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
 {
struct dccp_sock *dp = dccp_sk(sk);
struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk);
+   struct dccp_ackvec_parsed *avp;
u64 ackno, seqno;
struct ccid2_seq *seqp;
-   unsigned char *vector;
-   unsigned char veclen;
-   int offset = 0;
int done = 0;
unsigned int maxincr = 0;
 
@@ -545,13 +498,13 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
}
 
/* check forward path congestion */
-   /* still didn't send out new data packet

[PATCH 1/3] [ACKVEC]: Schedule SyncAck when running out of space

2008-01-08 Thread Gerrit Renker
The problem with Ack Vectors is that

  i) their length is variable and can in principle grow quite large,
 ii) it is hard to predict exactly how large they will be.

Due to the second point it does not seem a good idea to reduce the MPS;
in particular when on average there is enough room for the Ack Vector
and an increase in length is momentarily due to some burst loss, after
which the Ack Vector returns to its normal/average length.

The solution taken by this patch to address the outstanding FIXME is
to schedule a separate Sync when running out of space on the skb, and to
log a warning into the syslog.

The mechanism can also be used for other out-of-band signalling: it does
quicker signalling than scheduling an Ack, since it does not need to wait
for new data.
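
The decision described above can be sketched in userspace as follows. This is
only an illustration under stated assumptions: `struct sketch_sock`, the result
enum and the `max_opt_len` parameter are hypothetical stand-ins for the kernel
state (`dccps_mss_cache`, `dccps_sync_scheduled`, `DCCP_MAX_OPT_LEN`), not the
patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the socket fields used by the patch. */
struct sketch_sock {
	uint32_t mps;            /* plays the role of dccps_mss_cache */
	bool     sync_scheduled; /* plays the role of dccps_sync_scheduled */
};

enum sketch_result { SKETCH_INSERTED, SKETCH_DEFERRED, SKETCH_ERROR };

/*
 * Decide what to do with an Ack Vector of @av_len bytes on a packet that
 * already carries @opt_len option bytes and @payload_len payload bytes.
 */
static enum sketch_result
sketch_insert_ackvec(struct sketch_sock *sk, uint32_t opt_len,
		     uint32_t payload_len, uint32_t av_len,
		     uint32_t max_opt_len)
{
	/* Hard limit: the option area itself cannot hold the Ack Vector. */
	if (opt_len + av_len > max_opt_len)
		return SKETCH_ERROR;

	/* Soft limit: no room left within the MPS, defer to a separate Sync. */
	if (opt_len + payload_len + av_len > sk->mps) {
		sk->sync_scheduled = true;
		return SKETCH_DEFERRED;
	}
	return SKETCH_INSERTED;
}
```

The deferred case returns success to the caller, so the data packet still goes
out; only the Ack Vector travels later on the scheduled Sync.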

 Additional Note:
 
 It is possible to lower MPS according to the average length of Ack Vectors;
 the following argues why this does not seem to be a good idea.

 When determining the average Ack Vector length, a moving-average is more
 useful than a normal average, since sudden peaks (burst losses) are better
 dampened. The Ack Vector buffer would have a field `av_avg_len' which tracks
 this moving average and MPS would be reduced by this value (plus 2 bytes for
 type/length for each full Ack Vector).

 However, this means that the MPS decreases in the middle of an established
 connection. For a user who has tuned his/her application to work with the
 MPS taken at the beginning of the connection this can be very counter-
 intuitive and annoying.
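
 To make the effect concrete, here is a small sketch of the rejected approach.
 It is not part of the patch: the weight of 1/8 and all `sketch_` names are
 assumptions chosen for illustration; only the 253-byte option limit and the
 2 bytes of type/length overhead come from the code.

```c
#include <stdint.h>

#define SKETCH_OPT_MAXLEN 253u  /* value bytes of a single Ack Vector option */

/* Moving average with an assumed weight of 1/8 for the new sample. */
static uint32_t sketch_avg_update(uint32_t avg, uint32_t sample)
{
	return (7u * avg + sample) / 8u;
}

/* MPS left after reserving the average length plus 2 bytes per option. */
static uint32_t sketch_reduced_mps(uint32_t base_mps, uint32_t avg_len)
{
	uint32_t nr_opts = (avg_len + SKETCH_OPT_MAXLEN - 1u) / SKETCH_OPT_MAXLEN;
	uint32_t reserve = avg_len + 2u * nr_opts;

	return reserve < base_mps ? base_mps - reserve : 0u;
}
```

 With a base MPS of 1400 and a steady average of 20 bytes, a single burst
 sample of 200 bytes moves the average to 42 and the usable MPS from 1378 to
 1356, which is exactly the mid-connection change argued against above.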

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 include/linux/dccp.h |2 ++
 net/dccp/options.c   |   28 +++-
 net/dccp/output.c|8 
 3 files changed, 29 insertions(+), 9 deletions(-)

--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -475,6 +475,7 @@ struct dccp_ackvec;
  * @dccps_hc_rx_insert_options - receiver wants to add options when acking
  * @dccps_hc_tx_insert_options - sender wants to add options when sending
  * @dccps_server_timewait - server holds timewait state on close (RFC 4340, 8.3)
+ * @dccps_sync_scheduled - flag which signals "send out-of-band message soon"
  * @dccps_xmit_timer - timer for when CCID is not ready to send
  * @dccps_syn_rtt - RTT sample from Request/Response exchange (in usecs)
  */
@@ -515,6 +516,7 @@ struct dccp_sock {
__u8dccps_hc_rx_insert_options:1;
__u8dccps_hc_tx_insert_options:1;
__u8dccps_server_timewait:1;
+   __u8dccps_sync_scheduled:1;
struct timer_list   dccps_xmit_timer;
 };
 
--- a/net/dccp/options.c
+++ b/net/dccp/options.c
@@ -426,7 +426,9 @@ static int dccp_insert_option_timestamp_echo(struct dccp_sock *dp,
 
 int dccp_insert_option_ackvec(struct sock *sk, struct sk_buff *skb)
 {
-   struct dccp_ackvec *av = dccp_sk(sk)->dccps_hc_rx_ackvec;
+   struct dccp_sock *dp = dccp_sk(sk);
+   struct dccp_ackvec *av = dp->dccps_hc_rx_ackvec;
+   struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb);
const u16 buflen = dccp_ackvec_buflen(av);
/* Figure out how many options do we need to represent the ackvec */
const u16 nr_opts = DIV_ROUND_UP(buflen, DCCP_SINGLE_OPT_MAXLEN);
@@ -435,16 +437,24 @@ int dccp_insert_option_ackvec(struct sock *sk, struct sk_buff *skb)
const unsigned char *tail, *from;
unsigned char *to;
 
-   if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN) {
-   /*
-* FIXME: when running out of option space while piggybacking on
-* DataAck, return 0 here and schedule a separate Ack instead.
-*/
+   if (dcb->dccpd_opt_len + len > DCCP_MAX_OPT_LEN) {
DCCP_WARN("Lacking space for %u bytes on %s packet\n", len,
- dccp_packet_name(DCCP_SKB_CB(skb)->dccpd_type));
+ dccp_packet_name(dcb->dccpd_type));
return -1;
}
-   DCCP_SKB_CB(skb)->dccpd_opt_len += len;
+   /*
+* Since Ack Vectors are variable-length, we can not always predict
+* their size. To catch exception cases where the space is running out
+* on the skb, a separate Sync is scheduled to carry the Ack Vector.
+*/
+   if (dcb->dccpd_opt_len + skb->len + len > dp->dccps_mss_cache) {
+   DCCP_WARN("No space left for Ack Vector (%u) on skb (%u+%u), "
+ "MPS=%u ==> reduce payload size?\n", len, skb->len,
+ dcb->dccpd_opt_len, dp->dccps_mss_cache);
+   dp->dccps_sync_scheduled = 1;
+   return 0;
+   }
+   dcb->dccpd_opt_len += len;
 
to   = skb_push(skb, len);
len  = buflen;
@@ -485,7 +495,7 @@ int dccp_insert

[PATCH 2/3] [ACKVEC]: Separate skb option parsing from CCID-specific code

2008-01-08 Thread Gerrit Renker
This patch replaces a near-identical duplication of code; large parts of
dccp_parse_options() are repeated as ccid2_ackvector() in ccid2.c.

Two problems are involved, apart from the duplication:
 1. no separation of concerns: CCIDs should not need to be concerned with the
details of parsing header options;
 2. one can not assume that Ack Vectors appear as a contiguous area within an
skb, it is legal to insert other options and/or padding in between. The
current code would throw an error and stop reading in such a case. Not good.
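
Point 2 can be illustrated with a userspace sketch of an option walk that
tolerates padding and unrelated options between Ack Vectors. The function name
is hypothetical; the option numbers (38/39 for Ack Vectors, types up to 31 as
single-byte options) follow RFC 4340, and the bounds checks are one possible
choice rather than a copy of dccp_parse_options().

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_RESERVED  31  /* types 0..31 are single-byte options */
#define SKETCH_ACK_VECTOR_0  38
#define SKETCH_ACK_VECTOR_1  39

/*
 * Count the Ack Vector options in @opts, skipping anything in between.
 * Returns the count, or -1 if an option is malformed.
 */
static int sketch_count_ackvecs(const uint8_t *opts, size_t optlen)
{
	size_t pos = 0;
	int found = 0;

	while (pos < optlen) {
		uint8_t type = opts[pos++];
		uint8_t len;

		if (type <= SKETCH_MAX_RESERVED)  /* e.g. padding */
			continue;
		if (pos == optlen)
			return -1;                /* length byte missing */
		len = opts[pos++];
		/* @len counts the type and length bytes as well. */
		if (len < 2 || (size_t)(len - 2) > optlen - pos)
			return -1;
		if (type == SKETCH_ACK_VECTOR_0 || type == SKETCH_ACK_VECTOR_1)
			found++;
		pos += (size_t)(len - 2);         /* skip the value bytes */
	}
	return found;
}
```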

Details:

 * received Ack Vectors are parsed by dccp_parse_options() alone, which passes
   the result on to the CCID-specific ccid_hc_tx_parse_options();
 * CCIDs interested in using/decoding Ack Vector information need to add code
   to receive parsed Ack Vectors via this interface;
 * a data structure, `struct dccp_ackvec_parsed' is provided, which arranges all
   Ack Vectors of a single skb into a list of parsed chunks;
 * a doubly-linked list was used since insertion needs to be at the tail end;
 * the associated list handling and list de-allocation routines.
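
The list handling can be pictured with the following userspace sketch. It
uses malloc()/free() and a hand-rolled circular list instead of the kernel's
list_head and kmalloc(); all `sketch_` names are hypothetical, and re-initialising
the list in the cleanup routine is a choice made here, not something taken
from the patch.

```c
#include <stdint.h>
#include <stdlib.h>

struct sketch_chunk {
	const uint8_t *vec;   /* start of the vector inside the packet */
	uint8_t len;          /* length of @vec */
	uint8_t nonce;        /* ECN nonce of this vector, 0 or 1 */
	struct sketch_chunk *prev, *next;
};

/* Initialise an empty circular list; @head is a sentinel node. */
static void sketch_list_init(struct sketch_chunk *head)
{
	head->prev = head->next = head;
}

/* Append one parsed Ack Vector at the tail; returns 0, or -1 if out of memory. */
static int sketch_parsed_add(struct sketch_chunk *head, const uint8_t *vec,
			     uint8_t len, uint8_t nonce)
{
	struct sketch_chunk *new = malloc(sizeof(*new));

	if (new == NULL)
		return -1;
	new->vec   = vec;
	new->len   = len;
	new->nonce = nonce;
	/* Tail insertion: link between the current last node and the sentinel. */
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
	return 0;
}

/* Free all chunks and leave the list empty and reusable. */
static void sketch_parsed_cleanup(struct sketch_chunk *head)
{
	struct sketch_chunk *cur = head->next, *next;

	while (cur != head) {
		next = cur->next;
		free(cur);
		cur = next;
	}
	sketch_list_init(head);
}
```

Because insertion is at the tail, a consumer walking from the head sees the
Ack Vectors in the order in which they appeared in the packet.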

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ackvec.c  |   28 
 net/dccp/ackvec.h  |   19 +++
 net/dccp/options.c |   17 ++---
 3 files changed, 57 insertions(+), 7 deletions(-)

--- a/net/dccp/ackvec.h
+++ b/net/dccp/ackvec.h
@@ -112,4 +112,23 @@ static inline bool dccp_ackvec_is_empty(const struct dccp_ackvec *av)
 {
return av->av_overflow == 0 && av->av_buf_head == av->av_buf_tail;
 }
+
+/**
+ * struct dccp_ackvec_parsed  -  Record offsets of Ack Vectors in skb
+ * @vec:   start of vector (offset into skb)
+ * @len:   length of @vec
+ * @nonce: whether @vec had an ECN nonce of 0 or 1
+ * @node:  FIFO - arranged in descending order of ack_ackno
+ * This structure is used by CCIDs to access Ack Vectors in a received skb.
+ */
+struct dccp_ackvec_parsed {
+   u8   *vec,
+len,
+nonce:1;
+   struct list_head node;
+};
+
+extern int dccp_ackvec_parsed_add(struct list_head *head,
+ u8 *vec, u8 len, u8 nonce);
+extern void dccp_ackvec_parsed_cleanup(struct list_head *parsed_chunks);
 #endif /* _ACKVEC_H */
--- a/net/dccp/options.c
+++ b/net/dccp/options.c
@@ -135,13 +135,6 @@ int dccp_parse_options(struct sock *sk, struct dccp_request_sock *dreq,
if (rc)
goto out_invalid_option;
break;
-   case DCCPO_ACK_VECTOR_0:
-   case DCCPO_ACK_VECTOR_1:
-   if (dccp_packet_without_ack(skb))   /* RFC 4340, 11.4 */
-   break;
-   dccp_pr_debug("%s Ack Vector (len=%u)\n", dccp_role(sk),
- len);
-   break;
case DCCPO_TIMESTAMP:
if (len != 4)
goto out_invalid_option;
@@ -229,6 +222,16 @@ int dccp_parse_options(struct sock *sk, struct dccp_request_sock *dreq,
 value) != 0)
goto out_invalid_option;
break;
+   case DCCPO_ACK_VECTOR_0:
+   case DCCPO_ACK_VECTOR_1:
+   if (dccp_packet_without_ack(skb))   /* RFC 4340, 11.4 */
+   break;
+   /*
+* Ack vectors are processed by the TX CCID if it is
+* interested. The RX CCID need not parse Ack Vectors,
+* since it is only interested in clearing old state.
+* Fall through.
+*/
case DCCPO_MIN_TX_CCID_SPECIFIC ... DCCPO_MAX_TX_CCID_SPECIFIC:
if (ccid_hc_tx_parse_options(dp->dccps_hc_tx_ccid, sk,
 opt, len, value - options,
--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -345,6 +345,34 @@ free_records:
}
 }
 
+/*
+ * Routines to keep track of Ack Vectors received in an skb
+ */
+int dccp_ackvec_parsed_add(struct list_head *head, u8 *vec, u8 len, u8 nonce)
+{
+   struct dccp_ackvec_parsed *new = kmalloc(sizeof(*new), GFP_ATOMIC);
+
+   if (new == NULL)
+   return -ENOBUFS;
+   new->vec   = vec;
+   new->len   = len;
+   new->nonce = nonce;
+
+   list_add_tail(&new->node, head);
+   return 0;
+}
+EXPORT_SYMBOL_GPL(dccp_ackvec_parsed_add);
+
+void dccp_ackvec_parsed_cleanup(struct list_head *parsed_chunks)
+{
+   struct dccp_ackvec_parsed *cur, *next;
+
+   list_for_each_entry_safe(cur, next, parsed_chunks, node)
+   kfree(cur);
+  

[DCCP] [PATCH 0/3]: Finishing Ack Vector patch set

2008-01-08 Thread Gerrit Renker
This is the "new stuff" for Ack Vectors, completing the Ack Vector work.

All other patches are as before, with the single exception of the update sent
yesterday (the recovery strategy for dealing with suddenly large losses).

Arnaldo, can you please indicate whether I should resubmit the older patches?

After some more testing I am positive that these are now ready to be considered.

Patch #1: Addresses the pending FIXME: if Ack Vectors grow too large for a
          packet, their variable-length information is transported via a
          separate/scheduled Sync. This mechanism can also be used for other
          out-of-band signalling (e.g. slow-receiver option).

Patch #2: Since there is already a rich option-parsing infrastructure in
          options.c, it is not necessary to replicate the code to parse Ack
          Vectors. This patch adds utility routines to store parsed Ack
          Vectors, which are processed by the existing infrastructure via the
          hc_tx_parse_options() interface.

Patch #3: Integrates patch #2 for use with CCID2.

All patches have been uploaded to git://eden-feed.erg.abdn.ac.uk/dccp_exp


[DCCP] [Announce]: Test tree updates

2008-01-07 Thread Gerrit Renker
This is an edited list of recent changes in the test tree

git://eden-feed.erg.abdn.ac.uk/dccp_exp

At the top of each block is the name of the patch, followed by a short 
description of the change, and the actual (or abridged if obvious) inter-diff.

Some of these changes refer to improved patches/bug-fixes, which are to be
submitted soon.

Gerrit


[DCCP]: Registration routines for changing feature values


==> Added symbolic constants for the Sequence Window / Ack Ratio limits
==> Added these constants to feat.c / ccid2.c (not shown)

--- a/net/dccp/feat.h
+++ b/net/dccp/feat.h
@@ -14,6 +14,15 @@
 #include 
 #include "dccp.h"
 
+/*
+ * Known limits of feature values.
+ */
+/* Ack Ratio takes 2-byte integer values (11.3) */
+#define DCCPF_ACK_RATIO_MAX    0xFFFF
+/* Wmin=32 and Wmax=2^46-1 from 7.5.2 */
+#define DCCPF_SEQ_WMIN         32
+#define DCCPF_SEQ_WMAX         0x3FFFFFFFFFFFull
+
 enum dccp_feat_type {
FEAT_AT_RX   = 1,   /* located at RX side of half-connection  */
FEAT_AT_TX   = 2,   /* located at TX side of half-connection  */


[DCCP]: Implement both feature-local and feature-remote Sequence Window feature

==> Removed the default initialisation of the Sequence Window feature value.
This is redundant; a better solution is implemented in a subsequent patch set.
==> Also calls the Sequence Window handlers immediately, so that the sequence
and acknowledgment validity windows are updated.

--- a/net/dccp/feat.c
+++ b/net/dccp/feat.c
@@ -1086,19 +1086,6 @@ int dccp_feat_init(struct sock *sk)
rc = dccp_feat_register_sp(fn, sp[i].feat_num, sp[i].is_local,
sp[i].mandatory, sp[i].val, sp[i].len);
 
-   /*
-* Initial values for the remote and local Sequence Window feature. This
-* is only for the client startup phase, to seed AWL/SWL. Until then,
-*  - the default of 100 is enough for 75 Request-retransmissions,
-*  - sequence window checks are not performed in state LISTEN/REQUEST,
-*  - the only Ack window check is for the Ack completing the handshake.
-* After the handshake, local/remote Sequence Window will be updated
-* with the negotiated values (or the defaults again if not different).
-* The server's AWL/SWL derive directly from the negotiated values.
-*/
-   for (i = 0; rc == 0 && i <= 1; i++)
-   rc = dccp_feat_activate(sk, DCCPF_SEQUENCE_WINDOW, i, NULL);
-
kfree(sp[0].val);
kfree(sp[1].val);
return rc;
--- a/net/dccp/minisocks.c
+++ b/net/dccp/minisocks.c
@@ -319,10 +319,17 @@ int dccp_hdlr_ccid(struct sock *sk, u64 
 
 int dccp_hdlr_seq_win(struct sock *sk, u64 seq_win, bool rx)
 {
-   if (rx)
-   dccp_sk(sk)->dccps_r_seq_win = seq_win;
-   else
-   dccp_sk(sk)->dccps_l_seq_win = seq_win;
+   struct dccp_sock *dp = dccp_sk(sk);
+   
+   if (rx) {
+   dp->dccps_r_seq_win = seq_win;
+   /* propagate changes to update SWL/SWH */
+   dccp_update_gsr(sk, dp->dccps_gsr);
+   } else {
+   dp->dccps_l_seq_win = seq_win;
+   /* propagate changes to update AWL */
+   dccp_update_gss(sk, dp->dccps_gss);
+   }
return 0;
 }
 

[DCCP]: Support exchange of NN options in (PART)OPEN state

==> Lifted the restriction to exchanging only Ack Ratio options, since Sequence
Window values also need to be updated in established state.
Patches to actually do this for CCID2 are work in progress.

--- a/net/dccp/feat.c
+++ b/net/dccp/feat.c
@@ -1176,12 +1177,7 @@ static u8 dccp_feat_handle_nn_establishe
} else if (type != FEAT_NN) {
return 0;
}
-   /*
-* The restriction to Ack Ratio is here for safety: it may be possible
-* to lift this and also update Sequence Window dynamically.
-*/
-if (feat != DCCPF_ACK_RATIO)
-   return 0;
+
/*
 * We don't accept empty Confirms, since in fast-path feature
 * negotiation the values are enabled immediately after sending

[ACKVEC]: Update Ack Vector input routine

==> Added a "cope with large bursts" recovery hook as follows:

  When a packet is missing, the Ack Vector code normally reserves one entry.
  This causes problems with larger losses, since the space requirements are
  O(burst_length).

  This patch defines a threshold for bursty loss; when this threshold is
  exceeded, Ack Vector cells are populated up to their limit, without the
  expensive space reservation.
  
  The advantage of this strategy is to reduce A

[DCCP] [Patch 1/1] [Bug-Fix]: Need to ignore Ack Vectors on request sockets

2007-12-21 Thread Gerrit Renker
[ACKVEC]: Need to ignore Ack Vectors on request sockets

This fixes an oversight (mine) from an earlier patch and is in principle a
bug-fix, although with the current code the bug does not become visible.

The issue is that Ack Vectors must not be parsed on request sockets, since
the Ack Vector feature depends on the selection of the (TX) CCID. During the
initial handshake the CCIDs are undefined, and so RFC 4340, 10.3 applies:

 "Using CCID-specific options and feature options during a negotiation
  for the corresponding CCID feature is NOT RECOMMENDED [...]"

Worse, it is not even possible: when the server receives the Request from the 
client, the CCID and Ack vector features are undefined; when the Ack finalising
the 3-way handshake arrives, the request socket has not been cloned yet into a
new (full) socket. This order is necessary, since otherwise the newly created
socket would have to be destroyed whenever an option error occurred (a malicious
hacker could simply send garbage options and exploit this).
As a result, it is not feasible to parse Ack Vectors on request sockets, since
their destination (the CCIDs interested in using Ack Vector information) is
undefined. Hence disabled by this patch.

Two more changes suggested themselves:
 * replaced magic numbers for CCID-specific options with symbolic constants;
 * replaced local variables `idx' with computation in argument list.
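
The resulting rule for request sockets can be summarised by a small predicate.
This is a sketch only: the function and the `SKETCH_` constants are
hypothetical names mirroring the `DCCPO_` values from the patch and the Ack
Vector option numbers (38/39) from RFC 4340.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
	SKETCH_ACK_VECTOR_0         = 38,
	SKETCH_ACK_VECTOR_1         = 39,
	SKETCH_MIN_RX_CCID_SPECIFIC = 128,  /* from sender to receiver */
	SKETCH_MAX_RX_CCID_SPECIFIC = 191,
	SKETCH_MIN_TX_CCID_SPECIFIC = 192,  /* from receiver to sender */
	SKETCH_MAX_TX_CCID_SPECIFIC = 255,
};

/*
 * True if option @opt has to be skipped: on a request socket the CCIDs are
 * not negotiated yet, so neither CCID-specific options nor Ack Vectors have
 * anyone to be delivered to.
 */
static bool sketch_ignore_option(bool on_request_sock, uint8_t opt)
{
	if (!on_request_sock)
		return false;
	return opt >= SKETCH_MIN_RX_CCID_SPECIFIC ||
	       opt == SKETCH_ACK_VECTOR_0 || opt == SKETCH_ACK_VECTOR_1;
}
```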

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 include/linux/dccp.h |6 --
 net/dccp/options.c   |   19 +++
 2 files changed, 11 insertions(+), 14 deletions(-)

--- a/net/dccp/options.c
+++ b/net/dccp/options.c
@@ -104,9 +104,10 @@ int dccp_parse_options(struct sock *sk, 
 *
 * CCID-specific options are ignored during connection setup, as
 * negotiation may still be in progress (see RFC 4340, 10.3).
-*
+* The same applies to Ack Vectors, as these depend on the CCID.
 */
-   if (dreq != NULL && opt >= 128)
+   if (dreq != NULL && (opt >= DCCPO_MIN_RX_CCID_SPECIFIC ||
+   opt == DCCPO_ACK_VECTOR_0 || opt == DCCPO_ACK_VECTOR_1))
goto ignore_option;
 
switch (opt) {
@@ -222,23 +223,17 @@ int dccp_parse_options(struct sock *sk, 
dccp_pr_debug("%s rx opt: ELAPSED_TIME=%d\n",
  dccp_role(sk), elapsed_time);
break;
-   case 128 ... 191: {
-   const u16 idx = value - options;
-
+   case DCCPO_MIN_RX_CCID_SPECIFIC ... DCCPO_MAX_RX_CCID_SPECIFIC:
if (ccid_hc_rx_parse_options(dp->dccps_hc_rx_ccid, sk,
-opt, len, idx,
+opt, len, value - options,
 value) != 0)
goto out_invalid_option;
-   }
break;
-   case 192 ... 255: {
-   const u16 idx = value - options;
-
+   case DCCPO_MIN_TX_CCID_SPECIFIC ... DCCPO_MAX_TX_CCID_SPECIFIC:
if (ccid_hc_tx_parse_options(dp->dccps_hc_tx_ccid, sk,
-opt, len, idx,
+opt, len, value - options,
 value) != 0)
goto out_invalid_option;
-   }
break;
default:
DCCP_CRIT("DCCP(%p): option %d(len=%d) not "
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -165,8 +165,10 @@ enum {
DCCPO_TIMESTAMP_ECHO = 42,
DCCPO_ELAPSED_TIME = 43,
DCCPO_MAX = 45,
-   DCCPO_MIN_CCID_SPECIFIC = 128,
-   DCCPO_MAX_CCID_SPECIFIC = 255,
+   DCCPO_MIN_RX_CCID_SPECIFIC = 128,   /* from sender to receiver */
+   DCCPO_MAX_RX_CCID_SPECIFIC = 191,
+   DCCPO_MIN_TX_CCID_SPECIFIC = 192,   /* from receiver to sender */
+   DCCPO_MAX_TX_CCID_SPECIFIC = 255,
 };
 
 /* DCCP CCIDS */
-- 


[PATCH 07/14] [ACKVEC]: Unnecessary to parse Ack Vectors when clearing HC-receiver state

2007-12-19 Thread Gerrit Renker
This removes code clearing Ack Vector state when it is acknowledged via an
Ack Vector by the peer. That code is redundant, since
 * the receiver always puts the full acknowledgment window (groups 2,3 in
   11.4.2) into the Ack Vectors it sends; hence the HC-receiver is only
   interested in the highest state that the HC-sender received;
 * this means that the acknowledgment number on the (Data)Ack from the HC-sender
   is sufficient; and work done in parsing earlier state is not necessary, since
   the later state subsumes the earlier one (see also RFC 4340, A.4).

Other (minor) changes:
 * uses the suggestion made in the code comment, to traverse from oldest record
   to youngest - allowing faster hits when the list is large;

 * removed the unused argument variable `sk' from check_rcv_ackno;

 * renamed check_rcv_ackno => clear_state, since then it is clearer
   what is done by the function (also same name as in A.3 of RFC 4340);

 * the variable `ackno' now becomes unused in options.c, and therefore
   is now used for other (space-saving) purposes.
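
The oldest-first lookup can be sketched in userspace as follows. The records
are modelled as a plain array of sequence numbers, newest first, instead of
the kernel list; `sketch_before48()` is an assumed re-implementation of a
48-bit circular comparison and not the kernel's before48() itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SEQ48_MASK ((UINT64_C(1) << 48) - 1)

/* True if 48-bit sequence number @a precedes @b in circular arithmetic. */
static bool sketch_before48(uint64_t a, uint64_t b)
{
	uint64_t delta = (b - a) & SKETCH_SEQ48_MASK;

	return delta != 0 && delta < (UINT64_C(1) << 47);
}

/*
 * @seqnos holds the sequence numbers of packets that carried an Ack Vector,
 * in descending order. Returns the index of the record matching @ackno,
 * or -1 if the acknowledgment is too old or matches no record.
 */
static int sketch_lookup(const uint64_t *seqnos, size_t n, uint64_t ackno)
{
	for (size_t i = n; i-- > 0; ) {      /* start with the oldest record */
		if (seqnos[i] == ackno)
			return (int)i;
		if (sketch_before48(ackno, seqnos[i]))
			break;               /* all remaining records are newer */
	}
	return -1;
}
```

Once a record is found, that record and everything older (higher index in
this sketch) describe state the peer has already seen and can be dropped.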

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ackvec.c  |  114 ++--
 net/dccp/ackvec.h  |   17 +---
 net/dccp/input.c   |4 +-
 net/dccp/options.c |   12 ++
 4 files changed, 30 insertions(+), 117 deletions(-)

--- a/net/dccp/ackvec.h
+++ b/net/dccp/ackvec.h
@@ -102,13 +102,8 @@ extern void dccp_ackvec_free(struct dccp_ackvec *av);
 extern int dccp_ackvec_add(struct dccp_ackvec *av, const struct sock *sk,
   const u64 ackno, const u8 state);
 
-extern void dccp_ackvec_check_rcv_ackno(struct dccp_ackvec *av,
-   struct sock *sk, const u64 ackno);
-extern int dccp_ackvec_parse(struct sock *sk, const struct sk_buff *skb,
-u64 *ackno, const u8 opt,
-const u8 *value, const u8 len);
-
 extern int  dccp_ackvec_update_records(struct dccp_ackvec *av, u64 seq, u8 sum);
+extern void dccp_ackvec_clear_state(struct dccp_ackvec *av, const u64 ackno);
 
 static inline int dccp_ackvec_pending(const struct dccp_ackvec *av)
 {
@@ -139,18 +134,10 @@ static inline int dccp_ackvec_add(struct dccp_ackvec *av, const struct sock *sk,
return -1;
 }
 
-static inline void dccp_ackvec_check_rcv_ackno(struct dccp_ackvec *av,
-  struct sock *sk, const u64 ackno)
+static inline void dccp_ackvec_clear_state(struct dccp_ackvec *av, const u64 ackno)
 {
 }
 
-static inline int dccp_ackvec_parse(struct sock *sk, const struct sk_buff *skb,
-   const u64 *ackno, const u8 opt,
-   const u8 *value, const u8 len)
-{
-   return -1;
-}
-
 static inline int dccp_ackvec_update_records(struct dccp_ackvec *av, u64 seq, u8 nonce)
 {
return -1;
--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -86,6 +86,24 @@ int dccp_ackvec_update_records(struct dccp_ackvec *av, u64 seqno, u8 nonce_sum)
return 0;
 }
 
+static struct dccp_ackvec_record *dccp_ackvec_lookup(struct list_head *av_list,
+const u64 ackno)
+{
+   struct dccp_ackvec_record *avr;
+   /*
+* Exploit that records are inserted in descending order of sequence
+* number, start with the oldest record first. If @ackno is `before'
+* the earliest ack_ackno, the packet is too old (or is a duplicate).
+*/
+   list_for_each_entry_reverse(avr, av_list, avr_node) {
+   if (avr->avr_ack_seqno == ackno)
+   return avr;
+   if (before48(ackno, avr->avr_ack_seqno))
+   break;
+   }
+   return NULL;
+}
+
 /*
  * If several packets are missing, the HC-Receiver may prefer to enter multiple
  * bytes with run length 0, rather than a single byte with a larger run length;
@@ -234,101 +252,13 @@ static void dccp_ackvec_throw_record(struct dccp_ackvec *av,
}
 }
 
-void dccp_ackvec_check_rcv_ackno(struct dccp_ackvec *av, struct sock *sk,
-const u64 ackno)
-{
-   struct dccp_ackvec_record *avr;
-
-   /*
-* If we traverse backwards, it should be faster when we have large
-* windows. We will be receiving ACKs for stuff we sent a while back
-* -sorbo.
-*/
-   list_for_each_entry_reverse(avr, &av->av_records, avr_node) {
-   if (ackno == avr->avr_ack_seqno) {
-   dccp_pr_debug("%s ACK packet 0, len=%d, ack_seqno=%llu, 
"
- "ack_ackno=%llu, ACKED!\n",
- dccp_role(sk), avr->avr_ack_runlen,
- (unsigned long long)avr->avr_ack_seqno,
- (unsigned long long)avr->avr

[PATCH 02/14] [ACKVEC]: Update Ack Vector fields

2007-12-19 Thread Gerrit Renker
This patch updates the fields used by the Ack Vector structures:

 * buf_tail was missing and has been added as struct member (support for
   updating this field follows in subsequent patches);

 * the buf_nonce is now an array, each element covering up to 253 bytes of
   buffer state - this simplifies computing the ECN nonce for large Ack Vectors.

In particular, there are the following changes:

 * since buf_nonce and buffer size use the same number, introduced a constant to
   set the maximum number of Ack Vectors in the buffer (tried Kconfig, but with
   low Ack Ratio rates, a size of 1 is fully sufficient, 2 seems to work for a
   wide range of cases; Kconfig remains an option for later);

 * removed the field dccpav_ack_nonce from struct dccp_ackvec, since this is
   already redundantly stored in the `dccpavr_ack_nonce' (Ack Vector record);

 * replaced semantics of dccpavr_sent_len with dccpavr_ack_runlen, since the
   code needs to be able to remember the old run length;

 * also updated the documentation on sending Ack Vectors, it was outdated
   (ack_runlen was not considered);

 * all sequence numbers now truncated to 48 bits (the assignment of 2^48 to
   buf_ackno in dccp_ackvec_alloc() has been removed, since dccp_ackvec_add()
   overrides this value each time a new packet is added to the buffer).
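
The relationship between buffer length, number of options and the per-option
nonces can be sketched as below. The `sketch_` names are hypothetical; the
values 253 and 2 correspond to DCCP_SINGLE_OPT_MAXLEN and DCCPAV_NUM_ACKVECS,
and the XOR mirrors the nonce summation used when the options are written.

```c
#include <stdint.h>

#define SKETCH_OPT_MAXLEN   253u  /* value bytes of one Ack Vector option */
#define SKETCH_NUM_ACKVECS  2u    /* options the buffer can fill at most */

/* Number of Ack Vector options needed for @buflen bytes of vector state. */
static uint16_t sketch_nr_opts(uint16_t buflen)
{
	return (uint16_t)((buflen + SKETCH_OPT_MAXLEN - 1u) / SKETCH_OPT_MAXLEN);
}

/* Bytes on the wire: the state plus a type and a length byte per option. */
static uint16_t sketch_wire_len(uint16_t buflen)
{
	return (uint16_t)(buflen + 2u * sketch_nr_opts(buflen));
}

/* One-bit sum (XOR) of the nonces of all options needed for @buflen bytes. */
static uint8_t sketch_nonce_sum(const uint8_t nonce[SKETCH_NUM_ACKVECS],
				uint16_t buflen)
{
	uint16_t nr_opts = sketch_nr_opts(buflen);
	uint8_t sum = 0;

	for (uint16_t i = 0; i < nr_opts && i < SKETCH_NUM_ACKVECS; i++)
		sum ^= nonce[i] & 1u;
	return sum;
}
```

Keeping one nonce per 253-byte segment means each option can be emitted with
its own type (Ack Vector 0 or 1) without rescanning the buffer.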

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ackvec.c |   48 
 net/dccp/ackvec.h |   77 +++--
 2 files changed, 63 insertions(+), 62 deletions(-)

--- a/net/dccp/ackvec.h
+++ b/net/dccp/ackvec.h
@@ -18,8 +18,14 @@
 
 /* maximum size of a single TLV-encoded option (sans type/len bytes) */
 #define DCCP_SINGLE_OPT_MAXLEN  253
-/* We can spread an ack vector across multiple options */
-#define DCCP_MAX_ACKVEC_LEN (DCCP_SINGLE_OPT_MAXLEN * 2)
+/*
+ * Ack Vector buffer space is static, in multiples of %DCCP_SINGLE_OPT_MAXLEN,
+ * the maximum size of a single Ack Vector. Setting %DCCPAV_NUM_ACKVECS to 1
+ * will be sufficient for most cases of low Ack Ratios, using a value of 2 gives
+ * more headroom if Ack Ratio is higher or when the sender acknowledges slowly.
+ */
+#define DCCPAV_NUM_ACKVECS 2
+#define DCCPAV_MAX_ACKVEC_LEN  (DCCP_SINGLE_OPT_MAXLEN * DCCPAV_NUM_ACKVECS)
 
 #define DCCP_ACKVEC_STATE_RECEIVED 0
 #define DCCP_ACKVEC_STATE_ECN_MARKED   (1 << 6)
@@ -28,58 +34,53 @@
 #define DCCP_ACKVEC_STATE_MASK 0xC0 /* 11000000 */
 #define DCCP_ACKVEC_LEN_MASK   0x3F /* 00111111 */
 
-/** struct dccp_ackvec - ack vector
- *
- * This data structure is the one defined in RFC 4340, Appendix A.
- *
- * @av_buf_head - circular buffer head
- * @av_buf_tail - circular buffer tail
- * @av_buf_ackno - ack # of the most recent packet acknowledgeable in the
- *buffer (i.e. %av_buf_head)
- * @av_buf_nonce - the one-bit sum of the ECN Nonces on all packets acked
- *by the buffer with State 0
+/** struct dccp_ackvec - Ack Vector main data structure
  *
- * Additionally, the HC-Receiver must keep some information about the
- * Ack Vectors it has recently sent. For each packet sent carrying an
- * Ack Vector, it remembers four variables:
+ * This implements a fixed-size circular buffer within an array and is largely
+ * based on Appendix A of RFC 4340.
  *
- * @av_records - list of dccp_ackvec_record
- * @av_ack_nonce - the one-bit sum of the ECN Nonces for all State 0.
- *
- * @av_time - the time in usecs
- * @av_buf - circular buffer of acknowledgeable packets
+ * @av_buf:   circular buffer storage area
+ * @av_buf_head:   head index; begin of live portion in @av_buf
+ * @av_buf_tail:   tail index; first index _after_ the live portion in @av_buf
+ * @av_buf_ackno:  highest seqno of acknowledgeable packet recorded in @av_buf
+ * @av_buf_nonce:  ECN nonce sums, each covering subsequent segments of up to
+ *%DCCP_SINGLE_OPT_MAXLEN cells in the live portion of @av_buf
+ * @av_records:   list of %dccp_ackvec_record (Ack Vectors sent previously)
+ * @av_time:  the time in usecs
+ * @av_vec_len:   length of the live portion of @av_buf
  */
 struct dccp_ackvec {
-   u64 av_buf_ackno;
+   u8  av_buf[DCCPAV_MAX_ACKVEC_LEN];
+   u16 av_buf_head;
+   u16 av_buf_tail;
+   u64 av_buf_ackno:48;
+   boolav_buf_nonce[DCCPAV_NUM_ACKVECS];
struct list_headav_records;
ktime_t av_time;
-   u16 av_buf_head;
u16 av_vec_len;
-   u8  av_buf_nonce;
-   u8  av_ack_nonce;
-   u8  av_buf[DCCP_MAX_ACKVEC_LEN];
 };
 
-/** struct dccp_ackvec_record - ack vector record
+/** struct dccp_ackvec_record - Records information about sent Ack Ve

[PATCH 06/14] [ACKVEC]: Simplify adding Ack Vector records, split option-specific code

2007-12-19 Thread Gerrit Renker
This patch
 * simplifies the dccp_ackvec_insert_avr() routine: the BUG_ON was redundant,
   since the list is automatically arranged in descending order of ack_no;
 * separates Ack Vector housekeeping code from option-specific code;
 * shifts option-specific code into options.c.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ackvec.c  |  107 +---
 net/dccp/ackvec.h  |5 +-
 net/dccp/options.c |   65 +++
 3 files changed, 85 insertions(+), 92 deletions(-)

--- a/net/dccp/ackvec.h
+++ b/net/dccp/ackvec.h
@@ -108,7 +108,7 @@ extern int dccp_ackvec_parse(struct sock *sk, const struct sk_buff *skb,
 u64 *ackno, const u8 opt,
 const u8 *value, const u8 len);
 
-extern int dccp_insert_option_ackvec(struct sock *sk, struct sk_buff *skb);
+extern int  dccp_ackvec_update_records(struct dccp_ackvec *av, u64 seq, u8 sum);
 
 static inline int dccp_ackvec_pending(const struct dccp_ackvec *av)
 {
@@ -151,8 +151,7 @@ static inline int dccp_ackvec_parse(struct sock *sk, const struct sk_buff *skb,
return -1;
 }
 
-static inline int dccp_insert_option_ackvec(const struct sock *sk,
-   const struct sk_buff *skb)
+static inline int dccp_ackvec_update_records(struct dccp_ackvec *av, u64 seq, u8 nonce)
 {
return -1;
 }
--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -54,106 +54,35 @@ void dccp_ackvec_free(struct dccp_ackvec *av)
}
 }
 
-static void dccp_ackvec_insert_avr(struct dccp_ackvec *av,
-  struct dccp_ackvec_record *avr)
-{
-   /*
-* AVRs are sorted by seqno. Since we are sending them in order, we
-* just add the AVR at the head of the list.
-* -sorbo.
-*/
-   if (!list_empty(&av->av_records)) {
-   const struct dccp_ackvec_record *head =
-   list_entry(av->av_records.next,
-  struct dccp_ackvec_record,
-  avr_node);
-   BUG_ON(before48(avr->avr_ack_seqno, head->avr_ack_seqno));
-   }
-
-   list_add(&avr->avr_node, &av->av_records);
-}
-
-int dccp_insert_option_ackvec(struct sock *sk, struct sk_buff *skb)
+/**
+ * dccp_ackvec_update_records  -  Record information about sent Ack Vectors
+ * @av:Ack Vector records to update
+ * @seqno: Sequence number of the packet carrying the Ack Vector just sent
+ * @nonce_sum: The sum of all buffer nonces contained in the Ack Vector
+ */
+int dccp_ackvec_update_records(struct dccp_ackvec *av, u64 seqno, u8 nonce_sum)
 {
-   struct dccp_sock *dp = dccp_sk(sk);
-   struct dccp_ackvec *av = dp->dccps_hc_rx_ackvec;
-   /* Figure out how many options do we need to represent the ackvec */
-   const u16 nr_opts = DIV_ROUND_UP(av->av_vec_len, 
DCCP_SINGLE_OPT_MAXLEN);
-   u16 len = av->av_vec_len + 2 * nr_opts;
-   u8 i, nonce = 0;
-   const unsigned char *tail, *from;
-   unsigned char *to;
struct dccp_ackvec_record *avr;
 
-   if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN) {
-   /*
-* FIXME: when running out of option space while piggybacking on
-* DataAck, return 0 here and schedule a separate Ack instead.
-*/
-   DCCP_WARN("Lacking space for %u bytes on %s packet\n", len,
- dccp_packet_name(DCCP_SKB_CB(skb)->dccpd_type));
-   return -1;
-   }
-
avr = kmem_cache_alloc(dccp_ackvec_record_slab, GFP_ATOMIC);
if (avr == NULL)
-   return -1;
-
-   DCCP_SKB_CB(skb)->dccpd_opt_len += len;
-
-   to   = skb_push(skb, len);
-   len  = av->av_vec_len;
-   from = av->av_buf + av->av_buf_head;
-   tail = av->av_buf + DCCPAV_MAX_ACKVEC_LEN;
-
-   for (i = 0; i < nr_opts; ++i) {
-   int copylen = len;
-
-   if (len > DCCP_SINGLE_OPT_MAXLEN)
-   copylen = DCCP_SINGLE_OPT_MAXLEN;
-
-   /*
-* RFC 4340, 12.2: Encode the Nonce Echo for this Ack Vector via
-* its type; ack_nonce is the sum of all individual buf_nonce's.
-*/
-   nonce ^= av->av_buf_nonce[i];
-
-   *to++ = DCCPO_ACK_VECTOR_0 + av->av_buf_nonce[i];
-   *to++ = copylen + 2;
-
-   /* Check if buf_head wraps */
-   if (from + copylen > tail) {
-   const u16 tailsize = tail - from;
-
-   memcpy(to, from, tailsize);
-   to  += tailsize;
-   len -= tailsize;
-   copylen -= tailsize;
-  

[PATCH 04/14] [ACKVEC]: Inlines for run length and state

2007-12-19 Thread Gerrit Renker
This provides inlines for Ack Vector run length and state, which allow wrapping
several instances of the same code into function calls.

The unused function dccp_ackvec_print() (which also relied on the older
constants) has been removed.
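
As a rough userspace illustration of what the two inlines do, the sketch below decodes a single Ack Vector cell (state in the top two bits, run length in the low six bits). The names av_runlen(), av_state() and av_try_extend() are made up for this example and are not part of the patch; av_try_extend() only mimics the "bump the head run length" test used in dccp_ackvec_add().

```c
#include <assert.h>
#include <stdint.h>

#define AV_MAX_RUNLEN 0x3F	/* low six bits; mirrors DCCPAV_MAX_RUNLEN */

/* Run length: the low six bits of the cell. */
static inline uint8_t av_runlen(const uint8_t *cell)
{
	return *cell & AV_MAX_RUNLEN;
}

/* State: the top two bits of the cell, left in place (not shifted down). */
static inline uint8_t av_state(const uint8_t *cell)
{
	return *cell & (uint8_t)~AV_MAX_RUNLEN;
}

/*
 * Hypothetical helper: extend the run in @cell by one packet, provided the
 * state matches and the run length still has room.
 * Returns 1 if the cell was extended, 0 otherwise.
 */
static int av_try_extend(uint8_t *cell, uint8_t state)
{
	/* A state value must not use any of the run-length bits. */
	assert((state & AV_MAX_RUNLEN) == 0);

	if (av_state(cell) != state || av_runlen(cell) >= AV_MAX_RUNLEN)
		return 0;
	*cell += 1;	/* the run length occupies the low bits, so +1 is safe */
	return 1;
}
```

Passing a pointer to the cell rather than (av, index) is what lets the same helpers be reused from ccid2.c, which only sees the raw vector.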

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ackvec.c  |   71 +++-
 net/dccp/ackvec.h  |   13 +++-
 net/dccp/ccids/ccid2.c |7 ++---
 3 files changed, 30 insertions(+), 61 deletions(-)

--- a/net/dccp/ackvec.h
+++ b/net/dccp/ackvec.h
@@ -30,8 +30,17 @@
 #define DCCP_ACKVEC_STATE_ECN_MARKED   (1 << 6)
 #define DCCP_ACKVEC_STATE_NOT_RECEIVED (3 << 6)
 
-#define DCCP_ACKVEC_STATE_MASK 0xC0 /* 1100 */
-#define DCCP_ACKVEC_LEN_MASK   0x3F /* 0011 */
+#define DCCPAV_MAX_RUNLEN  0x3F
+
+static inline u8 dccp_ackvec_runlen(const u8 *cell)
+{
+   return *cell & DCCPAV_MAX_RUNLEN;
+}
+
+static inline u8 dccp_ackvec_state(const u8 *cell)
+{
+   return *cell & ~DCCPAV_MAX_RUNLEN;
+}
 
 /** struct dccp_ackvec - Ack Vector main data structure
  *
--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -135,7 +135,7 @@ int dccp_insert_option_ackvec(struct sock *sk, struct 
sk_buff *skb)
avr->avr_ack_ptr= av->av_buf_head;
avr->avr_ack_ackno  = av->av_buf_ackno;
avr->avr_ack_nonce  = nonce;
-   avr->avr_ack_runlen = av->av_buf[av->av_buf_head] & 
DCCP_ACKVEC_LEN_MASK;
+   avr->avr_ack_runlen = dccp_ackvec_runlen(av->av_buf + av->av_buf_head);
 
dccp_ackvec_insert_avr(av, avr);
 
@@ -178,18 +178,6 @@ void dccp_ackvec_free(struct dccp_ackvec *av)
kmem_cache_free(dccp_ackvec_slab, av);
 }
 
-static inline u8 dccp_ackvec_state(const struct dccp_ackvec *av,
-  const u32 index)
-{
-   return av->av_buf[index] & DCCP_ACKVEC_STATE_MASK;
-}
-
-static inline u8 dccp_ackvec_len(const struct dccp_ackvec *av,
-const u32 index)
-{
-   return av->av_buf[index] & DCCP_ACKVEC_LEN_MASK;
-}
-
 /*
  * If several packets are missing, the HC-Receiver may prefer to enter multiple
  * bytes with run length 0, rather than a single byte with a larger run length;
@@ -234,6 +222,8 @@ static inline int dccp_ackvec_set_buf_head_state(struct 
dccp_ackvec *av,
 int dccp_ackvec_add(struct dccp_ackvec *av, const struct sock *sk,
const u64 ackno, const u8 state)
 {
+   u8 *cur_head = av->av_buf + av->av_buf_head,
+  *buf_end  = av->av_buf + DCCPAV_MAX_ACKVEC_LEN;
/*
 * Check at the right places if the buffer is full, if it is, tell the
 * caller to start dropping packets till the HC-Sender acks our ACK
@@ -258,7 +248,7 @@ int dccp_ackvec_add(struct dccp_ackvec *av, const struct 
sock *sk,
 
/* See if this is the first ackno being inserted */
if (av->av_vec_len == 0) {
-   av->av_buf[av->av_buf_head] = state;
+   *cur_head = state;
av->av_vec_len = 1;
} else if (after48(ackno, av->av_buf_ackno)) {
const u64 delta = dccp_delta_seqno(av->av_buf_ackno, ackno);
@@ -267,10 +257,9 @@ int dccp_ackvec_add(struct dccp_ackvec *av, const struct 
sock *sk,
 * Look if the state of this packet is the same as the
 * previous ackno and if so if we can bump the head len.
 */
-   if (delta == 1 &&
-   dccp_ackvec_state(av, av->av_buf_head) == state &&
-   dccp_ackvec_len(av, av->av_buf_head) < DCCP_ACKVEC_LEN_MASK)
-   av->av_buf[av->av_buf_head]++;
+   if (delta == 1 && dccp_ackvec_state(cur_head) == state &&
+   dccp_ackvec_runlen(cur_head) < DCCPAV_MAX_RUNLEN)
+   *cur_head += 1;
else if (dccp_ackvec_set_buf_head_state(av, delta, state))
return -ENOBUFS;
} else {
@@ -283,21 +272,18 @@ int dccp_ackvec_add(struct dccp_ackvec *av, const struct 
sock *sk,
 *  could reduce the complexity of this scan.)
 */
u64 delta = dccp_delta_seqno(ackno, av->av_buf_ackno);
-   u32 index = av->av_buf_head;
 
while (1) {
-   const u8 len = dccp_ackvec_len(av, index);
-   const u8 state = dccp_ackvec_state(av, index);
+   const u8 len = dccp_ackvec_runlen(cur_head);
/*
 * valid packets not yet in av_buf have a reserved
 * entry, with a len equal to 0.
 */
-   if (state == DCCP_ACKVEC_STATE_NOT_RECEIVED &&
-   len == 0 &&am

[PATCH 09/14] [ACKVEC]: Support for circular Ack Vector buffer with overflow handling

2007-12-19 Thread Gerrit Renker
This adds missing bits to complete the implementation of a circular buffer for
Ack Vector information, with the following changes:

 (a) An `overflow' flag to deal with the case of overflow. As before, dynamic
 growth of the buffer will not be supported; but code will be added to deal
 more gracefully with overflowing Ack Vector buffers (if that ever happens).

 (b) A `tail_seqno' field. When naively implementing the algorithm of Appendix A
 in RFC 4340, problems arise whenever subsequent Ack Vector records overlap,
 which can bring the entire run length calculation completely out of sync.
 (This is documented on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/\
 ack_vectors/tracking_tail_ackno/ .)

This further allowed simplifying dccp_ackvec_pending() - the #ifdef is no longer
necessary since buf_empty is always true when IP_DCCP_ACKVEC is not configured.

Note on overflow handling:
--------------------------
 The Ack Vector code previously simply started to drop packets when the
 Ack Vector buffer overflowed. This means that the userspace application
 will not be able to get its data, only because of an Ack Vector problem.

 Furthermore, overflow may be transient, so that the application later
 recovers from the overflow. Recovering from lost packets is more difficult
 (e.g. video key frames).

 Hence the patch uses a different policy: when the buffer overflows, the oldest
 entries are subsequently overwritten. This has a higher chance of recovery.
 Details are on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ack_vectors/
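
The overwrite-oldest policy can be sketched in userspace as follows. This is only an illustration under assumptions: the struct, RING_LEN and the ring_*() names are invented for the example, and the push logic is a guess at how head, tail and the overflow flag interact, since the hunk that uses av_overflow is not shown in full here. Clearing the overflow flag (done when the peer acknowledges Ack Vectors) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_LEN 8	/* illustrative size; the patch uses DCCPAV_MAX_ACKVEC_LEN */

struct ring {
	uint8_t  buf[RING_LEN];
	uint16_t head;		/* index of the newest entry */
	uint16_t tail;		/* first index after the oldest entry */
	bool     overflow;	/* if set, head == tail means "full", not "empty" */
};

static void ring_init(struct ring *r)
{
	r->head = r->tail = RING_LEN - 1;
	r->overflow = false;
}

static bool ring_is_empty(const struct ring *r)
{
	return !r->overflow && r->head == r->tail;
}

static uint16_t ring_len(const struct ring *r)
{
	if (r->overflow)
		return RING_LEN;
	return (uint16_t)((r->tail + RING_LEN - r->head) % RING_LEN);
}

/* Add one entry at the head; the head moves towards lower indices. */
static void ring_push(struct ring *r, uint8_t val)
{
	/* Filling the last free cell (or being full already) means overflow. */
	if (ring_len(r) + 1 >= RING_LEN)
		r->overflow = true;

	r->head = (uint16_t)((r->head + RING_LEN - 1) % RING_LEN);
	if (r->overflow)
		r->tail = r->head;	/* head and tail move in lock-step */

	/* In the overflow case this overwrites the oldest entry. */
	r->buf[r->head] = val;
}
```

With this convention nothing is ever rejected: once the buffer is full, each new entry replaces the oldest one, which is the behaviour the note above argues for.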

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ackvec.c  |   11 ++-
 net/dccp/ackvec.h  |   12 
 net/dccp/dccp.h|   14 +++---
 net/dccp/options.c |5 ++---
 4 files changed, 27 insertions(+), 15 deletions(-)

--- a/net/dccp/ackvec.h
+++ b/net/dccp/ackvec.h
@@ -53,8 +53,10 @@ static inline u8 dccp_ackvec_state(const u8 *cell)
  * @av_buf_head:   head index; begin of live portion in @av_buf
  * @av_buf_tail:   tail index; first index _after_ the live portion in @av_buf
  * @av_buf_ackno:  highest seqno of acknowledgeable packet recorded in @av_buf
+ * @av_tail_ackno: lowest  seqno of acknowledgeable packet recorded in @av_buf
  * @av_buf_nonce:  ECN nonce sums, each covering subsequent segments of up to
  *%DCCP_SINGLE_OPT_MAXLEN cells in the live portion of @av_buf
+ * @av_overflow:   if 1 then buf_head == buf_tail indicates buffer wraparound
  * @av_records:   list of %dccp_ackvec_record (Ack Vectors sent 
previously)
  * @av_veclen:length of the live portion of @av_buf
  */
@@ -63,7 +65,9 @@ struct dccp_ackvec {
u16 av_buf_head;
u16 av_buf_tail;
u64 av_buf_ackno:48;
+   u64 av_tail_ackno:48;
boolav_buf_nonce[DCCPAV_NUM_ACKVECS];
+   u8  av_overflow:1;
struct list_headav_records;
u16 av_vec_len;
 };
@@ -107,9 +111,9 @@ extern int dccp_ackvec_add(struct dccp_ackvec *av, const 
struct sock *sk,
 extern int  dccp_ackvec_update_records(struct dccp_ackvec *av, u64 seq, u8 
sum);
 extern void dccp_ackvec_clear_state(struct dccp_ackvec *av, const u64 ackno);
 
-static inline int dccp_ackvec_pending(const struct dccp_ackvec *av)
+static inline bool dccp_ackvec_is_empty(const struct dccp_ackvec *av)
 {
-   return av->av_vec_len;
+   return av->av_overflow == 0 && av->av_buf_head == av->av_buf_tail;
 }
 #else /* CONFIG_IP_DCCP_ACKVEC */
 static inline int dccp_ackvec_init(void)
@@ -145,9 +149,9 @@ static inline int dccp_ackvec_update_records(struct 
dccp_ackvec *av, u64 seq, u8
return -1;
 }
 
-static inline int dccp_ackvec_pending(const struct dccp_ackvec *av)
+static inline bool dccp_ackvec_is_empty(const struct dccp_ackvec *av)
 {
-   return 0;
+   return true;
 }
 #endif /* CONFIG_IP_DCCP_ACKVEC */
 #endif /* _ACKVEC_H */
--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -29,7 +29,8 @@ struct dccp_ackvec *dccp_ackvec_alloc(const gfp_t priority)
struct dccp_ackvec *av = kmem_cache_alloc(dccp_ackvec_slab, priority);
 
if (av != NULL) {
-   av->av_buf_head  = DCCPAV_MAX_ACKVEC_LEN - 1;
+   av->av_buf_head = av->av_buf_tail = DCCPAV_MAX_ACKVEC_LEN - 1;
+   av->av_overflow = 0;
av->av_vec_len   = 0;
memset(av->av_buf_nonce, 0, sizeof(av->av_buf_nonce));
INIT_LIST_HEAD(&av->av_records);
@@ -74,6 +75,14 @@ int dccp_ackvec_update_records(struct dccp_ackvec *av, u64 
seqno, u8 nonce_sum)
avr->avr_ack_nonce  = nonce_sum;
avr->avr_ack_runlen = dccp_ackvec_runlen(av->av_buf + av->av_buf_head);
/*
+* When the buffer overfl

[PATCH 05/14] [ACKVEC]: Smaller allocation/deallocation routines

2007-12-19 Thread Gerrit Renker
This patch removes redundant bits, implementing the same functionality with
less code.

The details are:
  * The INIT_LIST_HEAD in dccp_ackvec_record_new() was redundant, since the
    list pointers were later overwritten when the node was added via
    list_add().

  * dccp_ackvec_record_new() was called only in one instance in the entire
    ackvec code.

  * The calls to list_del_init() before calling dccp_ackvec_record_delete()
    were redundant, since subsequently the entire element was freed anyway.

  * Similarly, since all calls to dccp_ackvec_record_delete() were preceded
    by a call to list_del_init(), the WARN_ON test would never evaluate to
    true.

  * Also, since all calls to dccp_ackvec_record_delete() were made from within
    list_for_each_entry_safe(), the test for avr == NULL was redundant.

  * The list_empty() test in ackvec_free was redundant, since the same
    condition is embedded in the loop condition of the subsequent
    list_for_each_entry_safe().
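
The reasoning in the list above can be shown with a small userspace analogue. It uses a plain singly-linked list and malloc()/free() instead of the kernel list and slab APIs, and all names (struct record, record_add(), record_purge()) are invented for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

struct record {
	struct record *next;
	unsigned long  ackno;
};

struct record_list {
	struct record *first;
};

/* Prepend a new record; returns 0 on success, -1 if allocation fails. */
static int record_add(struct record_list *l, unsigned long ackno)
{
	struct record *r = malloc(sizeof(*r));

	if (r == NULL)
		return -1;
	r->ackno = ackno;
	r->next  = l->first;	/* no need to pre-initialise r->next elsewhere */
	l->first = r;
	return 0;
}

/*
 * Free every record and return how many were freed.
 *  - No separate "is the list empty?" test: the loop condition covers it.
 *  - No per-node unlinking: the whole list is dropped, so the head is
 *    reset once at the end.
 *  - No NULL test on the node inside the loop: the iterator never yields one.
 */
static unsigned int record_purge(struct record_list *l)
{
	struct record *cur, *next;
	unsigned int freed = 0;

	assert(l != NULL);
	for (cur = l->first; cur != NULL; cur = next) {
		next = cur->next;	/* fetch before free(), like the _safe iterator */
		free(cur);
		freed++;
	}
	l->first = NULL;
	return freed;
}
```

Purging an empty list is simply a loop that runs zero times, which is why the explicit emptiness test adds nothing.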

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ackvec.c |   78 +++--
 1 files changed, 28 insertions(+), 50 deletions(-)

--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -24,24 +24,34 @@
 static struct kmem_cache *dccp_ackvec_slab;
 static struct kmem_cache *dccp_ackvec_record_slab;
 
-static struct dccp_ackvec_record *dccp_ackvec_record_new(void)
+struct dccp_ackvec *dccp_ackvec_alloc(const gfp_t priority)
 {
-   struct dccp_ackvec_record *avr =
-   kmem_cache_alloc(dccp_ackvec_record_slab, GFP_ATOMIC);
+   struct dccp_ackvec *av = kmem_cache_alloc(dccp_ackvec_slab, priority);
+
+   if (av != NULL) {
+   av->av_buf_head  = DCCPAV_MAX_ACKVEC_LEN - 1;
+   av->av_vec_len   = 0;
+   memset(av->av_buf_nonce, 0, sizeof(av->av_buf_nonce));
+   INIT_LIST_HEAD(&av->av_records);
+   }
+   return av;
+}
 
-   if (avr != NULL)
-   INIT_LIST_HEAD(&avr->avr_node);
+static void dccp_ackvec_purge_records(struct dccp_ackvec *av)
+{
+   struct dccp_ackvec_record *cur, *next;
 
-   return avr;
+   list_for_each_entry_safe(cur, next, &av->av_records, avr_node)
+   kmem_cache_free(dccp_ackvec_record_slab, cur);
+   INIT_LIST_HEAD(&av->av_records);
 }
 
-static void dccp_ackvec_record_delete(struct dccp_ackvec_record *avr)
+void dccp_ackvec_free(struct dccp_ackvec *av)
 {
-   if (unlikely(avr == NULL))
-   return;
-   /* Check if deleting a linked record */
-   WARN_ON(!list_empty(&avr->avr_node));
-   kmem_cache_free(dccp_ackvec_record_slab, avr);
+   if (likely(av != NULL)) {
+   dccp_ackvec_purge_records(av);
+   kmem_cache_free(dccp_ackvec_slab, av);
+   }
 }
 
 static void dccp_ackvec_insert_avr(struct dccp_ackvec *av,
@@ -85,7 +95,7 @@ int dccp_insert_option_ackvec(struct sock *sk, struct sk_buff 
*skb)
return -1;
}
 
-   avr = dccp_ackvec_record_new();
+   avr = kmem_cache_alloc(dccp_ackvec_record_slab, GFP_ATOMIC);
if (avr == NULL)
return -1;
 
@@ -147,37 +157,6 @@ int dccp_insert_option_ackvec(struct sock *sk, struct 
sk_buff *skb)
return 0;
 }
 
-struct dccp_ackvec *dccp_ackvec_alloc(const gfp_t priority)
-{
-   struct dccp_ackvec *av = kmem_cache_alloc(dccp_ackvec_slab, priority);
-
-   if (av != NULL) {
-   av->av_buf_head  = DCCPAV_MAX_ACKVEC_LEN - 1;
-   av->av_vec_len   = 0;
-   memset(av->av_buf_nonce, 0, sizeof(av->av_buf_nonce));
-   INIT_LIST_HEAD(&av->av_records);
-   }
-
-   return av;
-}
-
-void dccp_ackvec_free(struct dccp_ackvec *av)
-{
-   if (unlikely(av == NULL))
-   return;
-
-   if (!list_empty(&av->av_records)) {
-   struct dccp_ackvec_record *avr, *next;
-
-   list_for_each_entry_safe(avr, next, &av->av_records, avr_node) {
-   list_del_init(&avr->avr_node);
-   dccp_ackvec_record_delete(avr);
-   }
-   }
-
-   kmem_cache_free(dccp_ackvec_slab, av);
-}
-
 /*
  * If several packets are missing, the HC-Receiver may prefer to enter multiple
  * bytes with run length 0, rather than a single byte with a larger run length;
@@ -321,8 +300,8 @@ static void dccp_ackvec_throw_record(struct dccp_ackvec *av,
 
/* free records */
list_for_each_entry_safe_from(avr, next, &av->av_records, avr_node) {
-   list_del_init(&avr->avr_node);
-   dccp_ackvec_record_delete(avr);
+   list_del(&avr->avr_node);
+   kmem_cache_free(dccp_ackvec_record_slab, avr);
}
 }
 
@@ -431,10 +410,9 @@ int __init dccp_ackvec_init(void)
if (dccp_ackvec_slab == NULL)
 

[PATCH 08/14] [ACKVEC]: Use enum to enumerate Ack Vector states

2007-12-19 Thread Gerrit Renker
This replaces 3 #defines with an enum containing all possible
Ack Vector states as per RFC 4340, 11.4. This helps to reduce
the length of several expressions.
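
A minimal userspace sketch of how such an enum combines with the six-bit run length is given below. The enum values follow the state encoding of RFC 4340, 11.4 (state in the top two bits of a cell); the names av_make_cell(), av_cell_state() and av_cell_arrived() are hypothetical and exist only in this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ack Vector states, already shifted into the top two bits of a cell. */
enum av_state {
	AV_RECEIVED     = 0x00,
	AV_ECN_MARKED   = 0x40,
	AV_RESERVED     = 0x80,
	AV_NOT_RECEIVED = 0xC0
};

#define AV_MAX_RUNLEN 0x3F

/* Build a cell from a state and a run length (0..63). */
static uint8_t av_make_cell(enum av_state state, uint8_t runlen)
{
	assert(runlen <= AV_MAX_RUNLEN);
	return (uint8_t)(state | runlen);
}

/* Extract the state of a cell. */
static enum av_state av_cell_state(uint8_t cell)
{
	return (enum av_state)(cell & 0xC0);
}

/* True if the cell describes packets that arrived (ECN-marked or not). */
static bool av_cell_arrived(uint8_t cell)
{
	switch (av_cell_state(cell)) {
	case AV_RECEIVED:
	case AV_ECN_MARKED:
		return true;
	case AV_RESERVED:
	case AV_NOT_RECEIVED:
		break;
	}
	return false;
}
```

Listing all four values, including the reserved one, lets a switch over the state be checked for completeness by the compiler.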

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ackvec.c  |7 +++
 net/dccp/ackvec.h  |   10 ++
 net/dccp/ccids/ccid2.c |7 +++
 net/dccp/input.c   |9 +++--
 4 files changed, 15 insertions(+), 18 deletions(-)

--- a/net/dccp/ackvec.h
+++ b/net/dccp/ackvec.h
@@ -26,10 +26,12 @@
 #define DCCPAV_NUM_ACKVECS 2
 #define DCCPAV_MAX_ACKVEC_LEN  (DCCP_SINGLE_OPT_MAXLEN * DCCPAV_NUM_ACKVECS)
 
-#define DCCP_ACKVEC_STATE_RECEIVED 0
-#define DCCP_ACKVEC_STATE_ECN_MARKED   (1 << 6)
-#define DCCP_ACKVEC_STATE_NOT_RECEIVED (3 << 6)
-
+enum dccp_ackvec_states {
+   DCCPAV_RECEIVED =   0x00,
+   DCCPAV_ECN_MARKED = 0x40,
+   DCCPAV_RESERVED =   0x80,
+   DCCPAV_NOT_RECEIVED =   0xC0
+};
 #define DCCPAV_MAX_RUNLEN  0x3F
 
 static inline u8 dccp_ackvec_runlen(const u8 *cell)
--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -124,7 +124,7 @@ static inline int dccp_ackvec_set_buf_head_state(struct 
dccp_ackvec *av,
 
if (new_head < 0) {
if (gap > 0) {
-   memset(av->av_buf, DCCP_ACKVEC_STATE_NOT_RECEIVED,
+   memset(av->av_buf, DCCPAV_NOT_RECEIVED,
   gap + new_head + 1);
gap = -new_head;
}
@@ -135,7 +135,7 @@ static inline int dccp_ackvec_set_buf_head_state(struct 
dccp_ackvec *av,
 
if (gap > 0)
memset(av->av_buf + av->av_buf_head + 1,
-  DCCP_ACKVEC_STATE_NOT_RECEIVED, gap);
+  DCCPAV_NOT_RECEIVED, gap);
 
av->av_buf[av->av_buf_head] = state;
av->av_vec_len += packets;
@@ -205,8 +205,7 @@ int dccp_ackvec_add(struct dccp_ackvec *av, const struct 
sock *sk,
 * valid packets not yet in av_buf have a reserved
 * entry, with a len equal to 0.
 */
-   if (*cur_head == DCCP_ACKVEC_STATE_NOT_RECEIVED &&
-   delta == 0) {
+   if (*cur_head == DCCPAV_NOT_RECEIVED && delta == 0) {
dccp_pr_debug("Found %llu reserved seat!\n",
  (unsigned long long)ackno);
*cur_head = state;
--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -605,13 +605,12 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, 
struct sk_buff *skb)
const u8 state = dccp_ackvec_state(vector);
 
/* new packet received or marked */
-   if (state != DCCP_ACKVEC_STATE_NOT_RECEIVED &&
+   if (state != DCCPAV_NOT_RECEIVED &&
!seqp->ccid2s_acked) {
-   if (state ==
-   DCCP_ACKVEC_STATE_ECN_MARKED) {
+   if (state == DCCPAV_ECN_MARKED)
ccid2_congestion_event(sk,
   seqp);
-   } else
+   else
ccid2_new_ack(sk, seqp,
  &maxincr);
 
--- a/net/dccp/input.c
+++ b/net/dccp/input.c
@@ -377,8 +377,7 @@ int dccp_rcv_established(struct sock *sk, struct sk_buff 
*skb,
 
if (dp->dccps_hc_rx_ackvec != NULL &&
dccp_ackvec_add(dp->dccps_hc_rx_ackvec, sk,
-   DCCP_SKB_CB(skb)->dccpd_seq,
-   DCCP_ACKVEC_STATE_RECEIVED))
+   DCCP_SKB_CB(skb)->dccpd_seq, DCCPAV_RECEIVED))
goto discard;
dccp_deliver_input_to_ccids(sk, skb);
 
@@ -511,8 +510,7 @@ static int dccp_rcv_request_sent_state_process(struct sock 
*sk,
 
if (dp->dccps_hc_rx_ackvec != NULL &&
dccp_ackvec_add(dp->dccps_hc_rx_ackvec, sk,
-   DCCP_SKB_CB(skb)->dccpd_seq,
-   DCCP_ACKVEC_STATE_RECEIVED))
+   DCCP_SKB_CB(skb)->dccpd_seq, 
DCCPAV_RECEIVED))
goto unable_to_proceed;
 
dccp_send_ack(sk);
@@ -639,8 +637,7 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff 
*skb,
 
if (dp->dccps_hc_rx_ackvec != NULL &&
dccp_ackvec_add(dp->dccps_hc_rx_ackvec, sk,
-   DCCP_S

[PATCH 12/14] [ACKVEC]: Update Ack Vector input routine

2007-12-19 Thread Gerrit Renker
This patch updates the code which registers new packets as received, using the
new circular buffer interface. It now
* supports both tail/head pointers and buffer wrap-around,
* deals with overflow (head/tail move in lock-step), and
* uses dynamic buffer-length computation.

The updated code is also partitioned differently, into
1. dealing with the empty buffer,
2. adding new packets into non-empty buffer,
3. reserving space when encountering a `hole' in the sequence space,
4. updating old state and deciding when old state is irrelevant.

Much of the old code and naming is reused.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ackvec.c |  123 +
 net/dccp/ackvec.h |6 +++
 2 files changed, 129 insertions(+), 0 deletions(-)

--- a/net/dccp/ackvec.h
+++ b/net/dccp/ackvec.h
@@ -109,6 +109,7 @@ extern void dccp_ackvec_free(struct dccp_ackvec *av);
 extern int dccp_ackvec_add(struct dccp_ackvec *av, const struct sock *sk,
   const u64 ackno, const u8 state);
 
+extern void dccp_ackvec_input(struct dccp_ackvec *av, u64 seqno, u8 state);
 extern int  dccp_ackvec_update_records(struct dccp_ackvec *av, u64 seq, u8 
sum);
 extern void dccp_ackvec_clear_state(struct dccp_ackvec *av, const u64 ackno);
 extern u16  dccp_ackvec_buflen(const struct dccp_ackvec *av);
@@ -136,6 +137,11 @@ static inline void dccp_ackvec_free(struct dccp_ackvec *av)
 {
 }
 
+static inline void dccp_ackvec_input(struct dccp_ackvec *av, u64 seq, u8 state)
+{
+
+}
+
 static inline int dccp_ackvec_add(struct dccp_ackvec *av, const struct sock 
*sk,
  const u64 ackno, const u8 state)
 {
--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -134,6 +134,129 @@ u16 dccp_ackvec_buflen(const struct dccp_ackvec *av)
return dccp_ackvec_idx_sub(av->av_buf_tail, av->av_buf_head);
 }
 
+/* Mark @num entries after buf_head as "Not yet received". */
+static void dccp_ackvec_reserve_seats(struct dccp_ackvec *av, u16 num)
+{
+   u16 start = dccp_ackvec_idx_add(av->av_buf_head, 1),
+   len   = DCCPAV_MAX_ACKVEC_LEN - start;
+
+   /* check for buffer wrap-around */
+   if (num > len) {
+   memset(av->av_buf + start, DCCPAV_NOT_RECEIVED, len);
+   start = 0;
+   num  -= len;
+   }
+   if (num)
+   memset(av->av_buf + start, DCCPAV_NOT_RECEIVED, num);
+}
+
+/**
+ * dccp_ackvec_update_old  -  Update previous state as per RFC 4340, 11.4.1
+ * @av:non-empty buffer to update
+ * @distance:   negative or zero distance of @seqno from buf_ackno downward
+ * @seqno: the (old) sequence number whose record is to be updated
+ * @state: state in which packet carrying @seqno was received
+ */
+static void dccp_ackvec_update_old(struct dccp_ackvec *av, s64 distance,
+  u64 seqno, enum dccp_ackvec_states state)
+{
+   u16 ptr = av->av_buf_head;
+
+   BUG_ON(distance > 0);
+   if (unlikely(dccp_ackvec_is_empty(av)))
+   return;
+
+   do {
+   u8 runlen = dccp_ackvec_runlen(av->av_buf + ptr);
+
+   if (distance + runlen >= 0) {
+   /*
+* Only update the state if packet has not been received
+* yet. This is OK as per the second table in RFC 4340,
+* 11.4.1; i.e. here we are using the following table:
+* RECEIVED
+*  0   1   3
+*  S +---+---+---+
+*  T   0 | 0 | 0 | 0 |
+*  O +---+---+---+
+*  R   1 | 1 | 1 | 1 |
+*  E +---+---+---+
+*  D   3 | 0 | 1 | 3 |
+*+---+---+---+
+* The "Not Received" state was set by reserve_seats().
+*/
+   if (av->av_buf[ptr] == DCCPAV_NOT_RECEIVED)
+   av->av_buf[ptr] = state;
+   else
+   dccp_pr_debug("Not changing %llu state to %u\n",
+ (unsigned long long)seqno, state);
+   break;
+   }
+
+   distance += runlen + 1;
+   ptr   = dccp_ackvec_idx_add(ptr, 1);
+
+   } while (ptr != av->av_buf_tail);
+}
+
+/**
+ * dccp_ackvec_add_new  -  Record one or more new entries in Ack Vector buffer
+ * @av: container of buffer to update (can be empty or 
non-empty)
+ * @num_packets: number of packets to register (mu

[PATCH 10/14] [ACKVEC]: Determine buffer length dynamically

2007-12-19 Thread Gerrit Renker
The length of the circular Ack Vector buffer is now determined dynamically,
as the span from head to tail.
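
A small worked example of the head-to-tail span may help. It is a userspace sketch with an illustrative buffer size of 10 (BUF_LEN is an assumption standing in for DCCPAV_MAX_ACKVEC_LEN); idx_add(), idx_sub() and buf_len() mirror the shape of the helpers added by this patch but are not the kernel functions themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BUF_LEN 10	/* illustrative; stands in for DCCPAV_MAX_ACKVEC_LEN */

/* (a + b) modulo the buffer size. */
static uint16_t idx_add(uint16_t a, uint16_t b)
{
	return (uint16_t)((a + b) % BUF_LEN);
}

/* (a - b) modulo the buffer size, written without negative intermediates. */
static uint16_t idx_sub(uint16_t a, uint16_t b)
{
	assert(b <= BUF_LEN);
	return idx_add(a, (uint16_t)(BUF_LEN - b));
}

/*
 * Length of the live portion. Head is "before" tail because entries are
 * added towards lower indices, so the length is tail - head (mod BUF_LEN).
 * head == tail is ambiguous, hence the separate overflow flag.
 */
static uint16_t buf_len(uint16_t head, uint16_t tail, bool overflow)
{
	if (overflow)
		return BUF_LEN;
	return idx_sub(tail, head);
}
```

With head = 8 and tail = 2 the live cells are 8, 9, 0 and 1, so the computed length is 4 even though tail is numerically smaller than head.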

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ackvec.c  |   21 +
 net/dccp/ackvec.h  |7 +++
 net/dccp/options.c |7 ---
 3 files changed, 32 insertions(+), 3 deletions(-)

--- a/net/dccp/ackvec.h
+++ b/net/dccp/ackvec.h
@@ -22,6 +22,7 @@
  * the maximum size of a single Ack Vector. Setting %DCCPAV_NUM_ACKVECS to 1
  * will be sufficient for most cases of low Ack Ratios, using a value of 2 
gives
  * more headroom if Ack Ratio is higher or when the sender acknowledges slowly.
+ * The maximum value is bounded by the u16 types for indices and functions.
  */
 #define DCCPAV_NUM_ACKVECS 2
 #define DCCPAV_MAX_ACKVEC_LEN  (DCCP_SINGLE_OPT_MAXLEN * DCCPAV_NUM_ACKVECS)
@@ -110,6 +111,7 @@ extern int dccp_ackvec_add(struct dccp_ackvec *av, const 
struct sock *sk,
 
 extern int  dccp_ackvec_update_records(struct dccp_ackvec *av, u64 seq, u8 
sum);
 extern void dccp_ackvec_clear_state(struct dccp_ackvec *av, const u64 ackno);
+extern u16  dccp_ackvec_buflen(const struct dccp_ackvec *av);
 
 static inline bool dccp_ackvec_is_empty(const struct dccp_ackvec *av)
 {
@@ -149,6 +151,11 @@ static inline int dccp_ackvec_update_records(struct 
dccp_ackvec *av, u64 seq, u8
return -1;
 }
 
+static inline u16 dccp_ackvec_buflen(const struct dccp_ackvec *av)
+{
+   return 0;
+}
+
 static inline bool dccp_ackvec_is_empty(const struct dccp_ackvec *av)
 {
return true;
--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -114,6 +114,27 @@ static struct dccp_ackvec_record 
*dccp_ackvec_lookup(struct list_head *av_list,
 }
 
 /*
+ * Buffer index and length computation using modulo-buffersize arithmetic.
+ * Note that, as pointers move from right to left, head is `before' tail.
+ */
+static inline u16 dccp_ackvec_idx_add(const u16 a, const u16 b)
+{
+   return (a + b) % DCCPAV_MAX_ACKVEC_LEN;
+}
+
+static inline u16 dccp_ackvec_idx_sub(const u16 a, const u16 b)
+{
+   return dccp_ackvec_idx_add(a, DCCPAV_MAX_ACKVEC_LEN - b);
+}
+
+u16 dccp_ackvec_buflen(const struct dccp_ackvec *av)
+{
+   if (unlikely(av->av_overflow))
+   return DCCPAV_MAX_ACKVEC_LEN;
+   return dccp_ackvec_idx_sub(av->av_buf_tail, av->av_buf_head);
+}
+
+/*
  * If several packets are missing, the HC-Receiver may prefer to enter multiple
  * bytes with run length 0, rather than a single byte with a larger run length;
  * this simplifies table updates if one of the missing packets arrives.
--- a/net/dccp/options.c
+++ b/net/dccp/options.c
@@ -432,9 +432,10 @@ static int dccp_insert_option_timestamp_echo(struct 
dccp_sock *dp,
 int dccp_insert_option_ackvec(struct sock *sk, struct sk_buff *skb)
 {
struct dccp_ackvec *av = dccp_sk(sk)->dccps_hc_rx_ackvec;
+   const u16 buflen = dccp_ackvec_buflen(av);
/* Figure out how many options do we need to represent the ackvec */
-   const u16 nr_opts = DIV_ROUND_UP(av->av_vec_len, 
DCCP_SINGLE_OPT_MAXLEN);
-   u16 len = av->av_vec_len + 2 * nr_opts;
+   const u16 nr_opts = DIV_ROUND_UP(buflen, DCCP_SINGLE_OPT_MAXLEN);
+   u16 len = buflen + 2 * nr_opts;
u8 i, nonce = 0;
const unsigned char *tail, *from;
unsigned char *to;
@@ -451,7 +452,7 @@ int dccp_insert_option_ackvec(struct sock *sk, struct 
sk_buff *skb)
DCCP_SKB_CB(skb)->dccpd_opt_len += len;
 
to   = skb_push(skb, len);
-   len  = av->av_vec_len;
+   len  = buflen;
from = av->av_buf + av->av_buf_head;
tail = av->av_buf + DCCPAV_MAX_ACKVEC_LEN;
 
-- 
1.5.3.GIT

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 03/14] [ACKVEC]: Use Elapsed Time separately

2007-12-19 Thread Gerrit Renker
This decouples the use of Elapsed Time options from the use of Ack Vectors, so
that Elapsed Time options are no longer added automatically to each Ack Vector.

There are three reasons for this:
 1. The Elapsed Time information is not used anywhere in the code.

 2. DCCP does not implement rate-based pacing of acknowledgments. The only
recommendation for always including Elapsed Time is in section 11.3 of
RFC 4340: "Receivers that rate-pace acknowledgements SHOULD [...]
include Elapsed Time options". But such is not the case here.

 3. It does not really improve estimation accuracy. The Elapsed Time field only
records the time between the arrival of the last acknowledgeable packet and
the time the Ack Vector is sent out. Since Linux does not (yet) implement
delayed Acks, the time difference will typically be small, since often the
arrival of a data packet triggers sending feedback at the HC-receiver.
If the Ack Vector has a wide coverage (up to 16192) then the Elapsed Time is
not very meaningful for older packets.
If indeed elapsed time is required by the endpoint or CCID, then using
Timestamp options can provide the same information (elapsed time field
in the Timestamp Echo option).
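
For reference, the conversion that the removed code performed (microseconds divided by 10) can be sketched as below. The unit of 10 microseconds and the 2-or-4-byte option data are taken from the Elapsed Time option in RFC 4340, 13.2; the function names and the "0 means omit the option" convention are assumptions of this example, the latter modelled on the removed `elapsed_time != 0` test.

```c
#include <assert.h>
#include <stdint.h>

#define ELAPSED_UNIT_USECS 10	/* Elapsed Time counts hundredths of milliseconds */

/* Convert a delay in microseconds to Elapsed Time units, saturating at 32 bits. */
static uint32_t usecs_to_elapsed(uint64_t usecs)
{
	uint64_t units = usecs / ELAPSED_UNIT_USECS;

	return units > UINT32_MAX ? UINT32_MAX : (uint32_t)units;
}

/*
 * Number of option data bytes needed for @elapsed:
 * 0 (omit the option) if the value is zero, otherwise 2 or 4.
 */
static unsigned int elapsed_data_len(uint32_t elapsed)
{
	if (elapsed == 0)
		return 0;
	return elapsed <= 0xFFFF ? 2 : 4;
}
```

Any delay below 10 microseconds rounds down to zero, in which case the option would have been skipped anyway.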

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ackvec.c |   11 ---
 net/dccp/ackvec.h |3 ---
 2 files changed, 0 insertions(+), 14 deletions(-)

--- a/net/dccp/ackvec.h
+++ b/net/dccp/ackvec.h
@@ -12,7 +12,6 @@
  */
 
 #include 
-#include 
 #include 
 #include 
 
@@ -46,7 +45,6 @@
  * @av_buf_nonce:  ECN nonce sums, each covering subsequent segments of up to
  *%DCCP_SINGLE_OPT_MAXLEN cells in the live portion of @av_buf
  * @av_records:   list of %dccp_ackvec_record (Ack Vectors sent 
previously)
- * @av_time:  the time in usecs
  * @av_veclen:length of the live portion of @av_buf
  */
 struct dccp_ackvec {
@@ -56,7 +54,6 @@ struct dccp_ackvec {
u64 av_buf_ackno:48;
boolav_buf_nonce[DCCPAV_NUM_ACKVECS];
struct list_headav_records;
-   ktime_t av_time;
u16 av_vec_len;
 };
 
--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -71,18 +71,9 @@ int dccp_insert_option_ackvec(struct sock *sk, struct 
sk_buff *skb)
const u16 nr_opts = DIV_ROUND_UP(av->av_vec_len, 
DCCP_SINGLE_OPT_MAXLEN);
u16 len = av->av_vec_len + 2 * nr_opts;
u8 i, nonce = 0;
-   u32 elapsed_time;
const unsigned char *tail, *from;
unsigned char *to;
struct dccp_ackvec_record *avr;
-   suseconds_t delta;
-
-   delta = ktime_us_delta(ktime_get_real(), av->av_time);
-   elapsed_time = delta / 10;
-
-   if (elapsed_time != 0 &&
-   dccp_insert_option_elapsed_time(sk, skb, elapsed_time))
-   return -1;
 
if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN) {
/*
@@ -162,7 +153,6 @@ struct dccp_ackvec *dccp_ackvec_alloc(const gfp_t priority)
 
if (av != NULL) {
av->av_buf_head  = DCCPAV_MAX_ACKVEC_LEN - 1;
-   av->av_time  = ktime_set(0, 0);
av->av_vec_len   = 0;
memset(av->av_buf_nonce, 0, sizeof(av->av_buf_nonce));
INIT_LIST_HEAD(&av->av_records);
@@ -321,7 +311,6 @@ int dccp_ackvec_add(struct dccp_ackvec *av, const struct 
sock *sk,
}
 
av->av_buf_ackno = ackno;
-   av->av_time = ktime_get_real();
 out:
return 0;
 

[PATCH 01/14] [CCID2]: Ack Vectors can happen on most packets

2007-12-19 Thread Gerrit Renker
CCID2 feedback only considers Ack/DataAck packets, but Ack Vectors can appear
on other packets, too. In RFC 4340, 11.4 the only restriction is that they must
not appear on Data or Request packets.

A possible case is for instance a Sync packet used for feature-negotiation
(as used e.g. in section 6.6.1 of RFC 4340). In this case the host must answer
with a SyncAck (7.5.4). Now, if the Sync packet carried an Ack Vector then
the SyncAck acknowledges the receipt of that Ack Vector (thus changing the
Acknowledgment Window, cf. 11.4.2); but the current code does not parse the
received Ack Vector.

The fix is in parsing Ack Vectors on all packet types which are allowed to
carry an Ack Vector.
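
The difference between the old and the new filter can be sketched in userspace as follows. The packet type numbers follow RFC 4340; the enum and function names are invented for the example, and pkt_without_ack() is only assumed to behave like the kernel's dccp_packet_without_ack(), i.e. to be true for exactly the two types that lack an Acknowledgement Number.

```c
#include <assert.h>
#include <stdbool.h>

/* DCCP packet types, numbered as in RFC 4340. */
enum pkt_type {
	PKT_REQUEST = 0,
	PKT_RESPONSE,
	PKT_DATA,
	PKT_ACK,
	PKT_DATAACK,
	PKT_CLOSEREQ,
	PKT_CLOSE,
	PKT_RESET,
	PKT_SYNC,
	PKT_SYNCACK
};

/* Old behaviour: only Ack and DataAck were considered for Ack Vectors. */
static bool old_filter_accepts(enum pkt_type t)
{
	return t == PKT_ACK || t == PKT_DATAACK;
}

/* Data and Request carry no Acknowledgement Number, hence no Ack Vector. */
static bool pkt_without_ack(enum pkt_type t)
{
	return t == PKT_DATA || t == PKT_REQUEST;
}

/* New behaviour: every type that may carry an Ack Vector is considered. */
static bool new_filter_accepts(enum pkt_type t)
{
	return !pkt_without_ack(t);
}
```

A Sync packet is the case from the description above: the old filter rejects it, the new one accepts it.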

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid2.c |7 +--
 1 files changed, 1 insertions(+), 6 deletions(-)

--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -549,13 +549,8 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, 
struct sk_buff *skb)
if (hctx->ccid2hctx_seqh == hctx->ccid2hctx_seqt)
return;
 
-   switch (DCCP_SKB_CB(skb)->dccpd_type) {
-   case DCCP_PKT_ACK:
-   case DCCP_PKT_DATAACK:
-   break;
-   default:
+   if (dccp_packet_without_ack(skb))
return;
-   }
 
ackno = DCCP_SKB_CB(skb)->dccpd_ack_seq;
if (after48(ackno, hctx->ccid2hctx_high_ack))
-- 
1.5.3.GIT


[PATCH 11/14] [ACKVEC]: Implement algorithm to update buffer state

2007-12-19 Thread Gerrit Renker
This implements an algorithm to consistently update the buffer state when
the peer acknowledges receipt of Ack Vectors; updating state in the list of
Ack Vectors as well as in the circular buffer.

The algorithm
 * deals with HC-sender acknowledging to HC-receiver and vice versa,
 * keeps track of the last unacknowledged but received seqno in tail_ackno,
 * has special cases to reset the overflow condition when appropriate,
 * is protected against receiving older information (would mess up buffer
   state).
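
Two of the checks in this algorithm, rejecting outdated records and capping the run length for overlapping Ack Vectors, can be sketched in userspace as below. delta48() is a stand-in for dccp_delta_seqno() written from the 48-bit circular sequence space of DCCP; effective_runlen() and its -1 return value are inventions of this example.

```c
#include <assert.h>
#include <stdint.h>

#define SEQ48_MASK ((UINT64_C(1) << 48) - 1)

/* Signed distance from @a to @b in 48-bit circular sequence space. */
static int64_t delta48(uint64_t a, uint64_t b)
{
	uint64_t d = (b - a) & SEQ48_MASK;

	/* Bit 47 set means the distance is negative: sign-extend it. */
	if (d & (UINT64_C(1) << 47))
		return (int64_t)d - (INT64_C(1) << 48);
	return (int64_t)d;
}

/*
 * Run length that may be subtracted for a record acknowledged by the peer.
 * Returns -1 if the record pre-dates @tail_ackno (outdated), otherwise the
 * smaller of the recorded run length and the number of packets between
 * @tail_ackno and the record's ackno.
 */
static int effective_runlen(uint64_t tail_ackno, uint64_t rec_ackno,
			    uint8_t rec_runlen)
{
	int64_t delta = delta48(tail_ackno, rec_ackno);

	if (delta < 0)
		return -1;
	return delta < rec_runlen ? (int)delta : rec_runlen;
}
```

For instance, with tail_ackno 100, a record at ackno 103 with run length 5 overlaps what has already been cleared, so only 3 is subtracted.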

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ackvec.c |   82 -
 1 files changed, 62 insertions(+), 20 deletions(-)

--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -262,34 +262,76 @@ out_duplicate:
return -EILSEQ;
 }
 
-static void dccp_ackvec_throw_record(struct dccp_ackvec *av,
-struct dccp_ackvec_record *avr)
-{
-   struct dccp_ackvec_record *next;
+/**
+ * dccp_ackvec_clear_state  -  Perform house-keeping / garbage-collection
+ * This routine is called when the peer acknowledges the receipt of Ack Vectors
+ * up to and including @ackno. While based on section A.3 of RFC 4340, here
+ * are additional precautions to prevent corrupted buffer state. In particular,
+ * we use tail_ackno to identify outdated records; it always marks the earliest
+ * packet of group (2) in 11.4.2.
+ */
+void dccp_ackvec_clear_state(struct dccp_ackvec *av, const u64 ackno)
+ {
+   struct dccp_ackvec_record *avr, *next;
+   u8 runlen_now, eff_runlen;
+   s64 delta;
 
-   /* sort out vector length */
-   if (av->av_buf_head <= avr->avr_ack_ptr)
-   av->av_vec_len = avr->avr_ack_ptr - av->av_buf_head;
-   else
-   av->av_vec_len = DCCPAV_MAX_ACKVEC_LEN - 1 -
-av->av_buf_head + avr->avr_ack_ptr;
+   avr = dccp_ackvec_lookup(&av->av_records, ackno);
+   if (avr == NULL)
+   return;
+   /*
+* Deal with outdated acknowledgments: this arises when e.g. there are
+* several old records and the acks from the peer come in slowly. In
+* that case we may still have records that pre-date tail_ackno.
+*/
+   delta = dccp_delta_seqno(av->av_tail_ackno, avr->avr_ack_ackno);
+   if (delta < 0)
+   goto free_records;
+   /*
+* Deal with overlapping Ack Vectors: don't subtract more than the
+* number of packets between tail_ackno and ack_ackno.
+*/
+   eff_runlen = delta < avr->avr_ack_runlen ? delta : avr->avr_ack_runlen;
 
-   /* free records */
+   runlen_now = dccp_ackvec_runlen(av->av_buf + avr->avr_ack_ptr);
+   /*
+* The run length of Ack Vector cells does not decrease over time. If
+* the run length is the same as at the time the Ack Vector was sent, we
+* free the ack_ptr cell. That cell can however not be freed if the run
+* length has increased: in this case we need to move the tail pointer
+* backwards (towards higher indices), to its next-oldest neighbour.
+*/
+   if (runlen_now > eff_runlen) {
+
+   av->av_buf[avr->avr_ack_ptr] -= eff_runlen + 1;
+   av->av_buf_tail = dccp_ackvec_idx_add(avr->avr_ack_ptr, 1);
+
+   /* This move may not have cleared the overflow flag. */
+   if (av->av_overflow)
+   av->av_overflow = (av->av_buf_head == av->av_buf_tail);
+   } else {
+   av->av_buf_tail = avr->avr_ack_ptr;
+   /*
+* We have made sure that avr points to a valid cell within the
+* buffer. This cell is either older than head, or equals head
+* (empty buffer): in both cases we no longer have any overflow.
+*/
+   av->av_overflow = 0;
+   }
+
+   /*
+* The peer has acknowledged up to and including ack_ackno. Hence the
+* first packet in group (2) of 11.4.2 is the successor of ack_ackno.
+*/
+   av->av_tail_ackno = ADD48(avr->avr_ack_ackno, 1);
+
+free_records:
list_for_each_entry_safe_from(avr, next, &av->av_records, avr_node) {
list_del(&avr->avr_node);
kmem_cache_free(dccp_ackvec_record_slab, avr);
}
 }
 
-void dccp_ackvec_clear_state(struct dccp_ackvec *av, const u64 ackno)
-{
-   struct dccp_ackvec_record *avr;
-
-   avr = dccp_ackvec_lookup(&av->av_records, ackno);
-   if (avr != NULL)
-   dccp_ackvec_throw_record(av, avr);
-}
-
 int __init dccp_ackvec_init(void)
 {
dccp_ackvec_slab = kmem_cache_create("dccp_ackvec",
-- 
1.5.3.GIT
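
Note on the diff above: the tail update in dccp_ackvec_clear_state() can be
tried out in user space with the sketch below. It follows the structure of
the patch, but everything in it is an assumption made for illustration: the
struct, the 8-cell buffer, the helper names, and the reading of a cell as
"low 6 bits = run length". The record lookup and the freeing of records are
left out.

```c
#include <assert.h>
#include <stdint.h>

#define AV_LEN      8        /* hypothetical buffer size for this sketch */
#define AV_RUNLEN   0x3F     /* assumed: low 6 bits of a cell hold the run length */
#define SEQ48_MASK  ((UINT64_C(1) << 48) - 1)

struct av_sketch {
	uint8_t  buf[AV_LEN];
	uint16_t head, tail;
	uint8_t  overflow;
	uint64_t tail_ackno;
};

/* Signed distance s2 - s1 in the 48-bit circular sequence space. */
static int64_t delta48(uint64_t s1, uint64_t s2)
{
	uint64_t d = (s2 - s1) & SEQ48_MASK;

	/* sign-extend from bit 47 */
	if (d & (UINT64_C(1) << 47))
		return (int64_t)d - (INT64_C(1) << 48);
	return (int64_t)d;
}

/*
 * Apply an acknowledgment for the record (ack_ackno, ack_ptr, ack_runlen).
 * Returns 0 if the record pre-dates tail_ackno (outdated), 1 otherwise.
 */
static int av_clear(struct av_sketch *av, uint64_t ack_ackno,
		    uint16_t ack_ptr, uint8_t ack_runlen)
{
	int64_t delta = delta48(av->tail_ackno, ack_ackno);
	uint8_t eff_runlen, runlen_now;

	assert(ack_ptr < AV_LEN);
	if (delta < 0)
		return 0;	/* outdated: leave the buffer state untouched */

	/* never subtract more than lies between tail_ackno and ack_ackno */
	eff_runlen = delta < ack_runlen ? (uint8_t)delta : ack_runlen;
	runlen_now = av->buf[ack_ptr] & AV_RUNLEN;

	if (runlen_now > eff_runlen) {
		/* the cell grew after it was sent: shrink it, keep it live */
		av->buf[ack_ptr] -= eff_runlen + 1;
		av->tail = (uint16_t)((ack_ptr + 1) % AV_LEN);
		if (av->overflow)
			av->overflow = (av->head == av->tail);
	} else {
		/* the cell is unchanged: it is fully acknowledged */
		av->tail = ack_ptr;
		av->overflow = 0;
	}
	av->tail_ackno = (ack_ackno + 1) & SEQ48_MASK;
	return 1;
}
```

For example, a cell with run length 5 that is acknowledged with an effective
run length of 2 keeps run length 2 afterwards (three of its six packets are
removed), and the tail moves to the slot after that cell.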



[DCCP] [RFC] [Patch 0/14]: Ack Vector implementation + fixes

2007-12-19 Thread Gerrit Renker
This set of patches adds functionality to the existing DCCP Ack Vector
implementation, extends it to a full circular buffer, and fixes two
previously undiscovered problems which otherwise result in a corrupted
buffer state. 
It is important that Ack Vectors run reliably since otherwise problems in
other parts (in particular CCID2) can not be fixed.

Quite some testing has been performed, so I am almost convinced that it is
bug-free. To make 100% sure that it contains no further hidden issues of the
kind found before, I would like to post this as an RFC, to be submitted
eventually next year.


Patch  #1: Fixes the problem that CCID2 ignores Ack Vectors on some packets.
Patch  #2: Adds a tail pointer to the buffer and makes the size configurable.
Patch  #3: Decouples the use of Elapsed Time from the use of Ack Vectors. 
Patch  #4: Provides inlines for Ack Vector state and run length.
Patch  #5: Removes redundancies from the allocation / de-allocation routines.
Patch  #6: Splits Ack Vector specific code from option-specific code.
Patch  #7: Avoids doing unnecessary work when clearing state: the sender of
   Ack Vectors is only interested in the highest-received Ack Number
   when clearing state. Hence parsing any Ack Vectors for this purpose
   (which only contain lower Ack Numbers) is entirely unnecessary.
Patch  #8: Replaces #defines for Ack Vector states with enum.
Patch  #9: Provides two extensions required for robustness: overflow handling
   (indicated by a flag) and tracking the lowest-unacknowledged sequence
   number (tail_ackno)
Patch #10: Inlines for buffer index manipulation, dynamic buffer length calculation.
Patch #11: Implements algorithm to clear buffer state (non-trivial).
Patch #12: Updates the code which registers incoming packets in the buffer.
Patch #13: Aggregates Ack-Vector specific code into one function, thus making it
   easier/simpler for the main DCCP module to use Ack Vectors.
Patch #14: Cleans up old and now unused bits.
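
As a rough illustration of how the tail pointer (Patch #2), the overflow flag
(Patch #9) and the dynamic length calculation (Patch #10) fit together, here
is a user-space sketch. The struct, the capacity of 16 and the function name
are assumptions made for this note; it is not the code from the patches.

```c
#include <assert.h>
#include <stdint.h>

#define AV_MAX_LEN 16	/* hypothetical capacity for this sketch */

struct av_ring {
	uint16_t head;		/* newest cell when non-empty; grows to lower indices */
	uint16_t tail;		/* one past the oldest live cell */
	uint8_t  overflow;	/* set when a full buffer makes head equal tail */
};

/* Number of live cells, derived from head/tail instead of a stored counter. */
static uint16_t av_ring_len(const struct av_ring *av)
{
	assert(av->head < AV_MAX_LEN && av->tail < AV_MAX_LEN);
	if (av->overflow)
		return AV_MAX_LEN;
	/* circular distance from head up to tail */
	return (uint16_t)((av->tail + AV_MAX_LEN - av->head) % AV_MAX_LEN);
}
```

The flag is what distinguishes the two situations in which head equals tail:
without it the buffer is empty, with it the buffer is full.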


The above patches have been uploaded to 

git://eden-feed.erg.abdn.ac.uk/dccp_exp [dccp]

but please note that the server will be switched off from 21/12 ... 01/08.

Detailed information and documentation can be found on 
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ack_vectors/


[PATCH 14/14] [ACKVEC]: Remove old infrastructure

2007-12-19 Thread Gerrit Renker
This removes
 * functions for which updates have been provided in the preceding patches and
 * the @av_vec_len field - it is no longer necessary since the buffer length is
   now always computed dynamically;
 * conditional debugging code (CONFIG_IP_DCCP_ACKVEC).

The reason for removing the latter is that Ack Vectors are an almost
inevitable necessity - RFC 4341 says if you use CCID2, Ack Vectors
must be used. Furthermore, the code would be only interesting for going
bug-hunting. Having dealt with several bugs/problems in the code already,
I am positive that the old debugging code is no longer necessary.

Also updated copyrights.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/Kconfig   |3 -
 net/dccp/Makefile  |5 +-
 net/dccp/ackvec.c  |  141 +---
 net/dccp/ackvec.h  |   64 +-
 net/dccp/ccids/Kconfig |1 -
 5 files changed, 7 insertions(+), 207 deletions(-)

--- a/net/dccp/Kconfig
+++ b/net/dccp/Kconfig
@@ -25,9 +25,6 @@ config INET_DCCP_DIAG
def_tristate y if (IP_DCCP = y && INET_DIAG = y)
def_tristate m
 
-config IP_DCCP_ACKVEC
-   bool
-
 source "net/dccp/ccids/Kconfig"
 
 menu "DCCP Kernel Hacking"
--- a/net/dccp/Makefile
+++ b/net/dccp/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_IP_DCCP) += dccp.o dccp_ipv4.o
 
-dccp-y := ccid.o feat.o input.o minisocks.o options.o output.o proto.o timer.o
+dccp-y := ccid.o feat.o input.o minisocks.o options.o \
+ output.o proto.o timer.o ackvec.o
 
 dccp_ipv4-y := ipv4.o
 
@@ -8,8 +9,6 @@ dccp_ipv4-y := ipv4.o
 obj-$(subst y,$(CONFIG_IP_DCCP),$(CONFIG_IPV6)) += dccp_ipv6.o
 dccp_ipv6-y := ipv6.o
 
-dccp-$(CONFIG_IP_DCCP_ACKVEC) += ackvec.o
-
 obj-$(CONFIG_INET_DCCP_DIAG) += dccp_diag.o
 obj-$(CONFIG_NET_DCCPPROBE) += dccp_probe.o
 
--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -1,26 +1,18 @@
 /*
  *  net/dccp/ackvec.c
  *
- *  An implementation of the DCCP protocol
+ *  An implementation of Ack Vectors for the DCCP protocol
+ *  Copyright (c) 2007 University of Aberdeen, Scotland, UK
  *  Copyright (c) 2005 Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
  *
  *  This program is free software; you can redistribute it and/or modify it
  *  under the terms of the GNU General Public License as published by the
  *  Free Software Foundation; version 2 of the License;
  */
-
-#include "ackvec.h"
 #include "dccp.h"
-
-#include 
-#include 
-#include 
 #include 
-#include 
 #include 
 
-#include 
-
 static struct kmem_cache *dccp_ackvec_slab;
 static struct kmem_cache *dccp_ackvec_record_slab;
 
@@ -31,7 +23,6 @@ struct dccp_ackvec *dccp_ackvec_alloc(const gfp_t priority)
if (av != NULL) {
av->av_buf_head = av->av_buf_tail = DCCPAV_MAX_ACKVEC_LEN - 1;
av->av_overflow = 0;
-   av->av_vec_len   = 0;
memset(av->av_buf_nonce, 0, sizeof(av->av_buf_nonce));
INIT_LIST_HEAD(&av->av_records);
}
@@ -257,134 +248,6 @@ void dccp_ackvec_input(struct dccp_ackvec *av, u64 seqno, u8 state)
}
 }
 
-/*
- * If several packets are missing, the HC-Receiver may prefer to enter multiple
- * bytes with run length 0, rather than a single byte with a larger run length;
- * this simplifies table updates if one of the missing packets arrives.
- */
-static inline int dccp_ackvec_set_buf_head_state(struct dccp_ackvec *av,
-const unsigned int packets,
-const unsigned char state)
-{
-   unsigned int gap;
-   long new_head;
-
-   if (av->av_vec_len + packets > DCCPAV_MAX_ACKVEC_LEN)
-   return -ENOBUFS;
-
-   gap  = packets - 1;
-   new_head = av->av_buf_head - packets;
-
-   if (new_head < 0) {
-   if (gap > 0) {
-   memset(av->av_buf, DCCPAV_NOT_RECEIVED,
-  gap + new_head + 1);
-   gap = -new_head;
-   }
-   new_head += DCCPAV_MAX_ACKVEC_LEN;
-   }
-
-   av->av_buf_head = new_head;
-
-   if (gap > 0)
-   memset(av->av_buf + av->av_buf_head + 1,
-  DCCPAV_NOT_RECEIVED, gap);
-
-   av->av_buf[av->av_buf_head] = state;
-   av->av_vec_len += packets;
-   return 0;
-}
-
-/*
- * Implements the RFC 4340, Appendix A
- */
-int dccp_ackvec_add(struct dccp_ackvec *av, const struct sock *sk,
-   const u64 ackno, const u8 state)
-{
-   u8 *cur_head = av->av_buf + av->av_buf_head,
-  *buf_end  = av->av_buf + DCCPAV_MAX_ACKVEC_LEN;
-   /*
-* Check at the right places if the buffer is full, if it is, tell the
-* caller to start dropping packets till the HC-Sender acks our ACK
-* vectors, when we will f

[PATCH 13/14] [ACKVEC]: Aggregate Ack-Vector related processing into single function

2007-12-19 Thread Gerrit Renker
This aggregates Ack Vector processing (handling input and clearing old state)
into one function, for the following reasons and benefits:
 * duplicated code is removed;
 * all Ack Vector-specific processing is now in one place;
 * sanity: from an Ack Vector point of view, it is better to clear the old
   state first before entering new state;
 * Ack Event handling happens mostly within the CCIDs, not the main DCCP module.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/input.c |   38 +++---
 1 files changed, 11 insertions(+), 27 deletions(-)

--- a/net/dccp/input.c
+++ b/net/dccp/input.c
@@ -159,13 +159,16 @@ static void dccp_rcv_reset(struct sock *sk, struct sk_buff *skb)
dccp_time_wait(sk, DCCP_TIME_WAIT, 0);
 }
 
-static void dccp_event_ack_recv(struct sock *sk, struct sk_buff *skb)
+static void dccp_handle_ackvec_processing(struct sock *sk, struct sk_buff *skb)
 {
-   struct dccp_sock *dp = dccp_sk(sk);
+   struct dccp_ackvec *av = dccp_sk(sk)->dccps_hc_rx_ackvec;
+   struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb);
 
-   if (dp->dccps_hc_rx_ackvec != NULL)
-   dccp_ackvec_clear_state(dp->dccps_hc_rx_ackvec,
-   DCCP_SKB_CB(skb)->dccpd_ack_seq);
+   if (av != NULL) {
+   if (dcb->dccpd_ack_seq != DCCP_PKT_WITHOUT_ACK_SEQ)
+   dccp_ackvec_clear_state(av, dcb->dccpd_ack_seq);
+   dccp_ackvec_input(av, dcb->dccpd_seq, DCCPAV_RECEIVED);
+   }
 }
 
 static void dccp_deliver_input_to_ccids(struct sock *sk, struct sk_buff *skb)
@@ -364,21 +367,13 @@ discard:
 int dccp_rcv_established(struct sock *sk, struct sk_buff *skb,
 const struct dccp_hdr *dh, const unsigned len)
 {
-   struct dccp_sock *dp = dccp_sk(sk);
-
if (dccp_check_seqno(sk, skb))
goto discard;
 
if (dccp_parse_options(sk, NULL, skb))
goto discard;
 
-   if (DCCP_SKB_CB(skb)->dccpd_ack_seq != DCCP_PKT_WITHOUT_ACK_SEQ)
-   dccp_event_ack_recv(sk, skb);
-
-   if (dp->dccps_hc_rx_ackvec != NULL &&
-   dccp_ackvec_add(dp->dccps_hc_rx_ackvec, sk,
-   DCCP_SKB_CB(skb)->dccpd_seq, DCCPAV_RECEIVED))
-   goto discard;
+   dccp_handle_ackvec_processing(sk, skb);
dccp_deliver_input_to_ccids(sk, skb);
 
return __dccp_rcv_established(sk, skb, dh, len);
@@ -507,11 +502,7 @@ static int dccp_rcv_request_sent_state_process(struct sock *sk,
 */
if (dccp_feat_activate_values(sk, &dp->dccps_featneg))
goto unable_to_proceed;
-
-   if (dp->dccps_hc_rx_ackvec != NULL &&
-   dccp_ackvec_add(dp->dccps_hc_rx_ackvec, sk,
-   DCCP_SKB_CB(skb)->dccpd_seq, DCCPAV_RECEIVED))
-   goto unable_to_proceed;
+   dccp_handle_ackvec_processing(sk, skb);
 
dccp_send_ack(sk);
return -1;
@@ -632,14 +623,7 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
if (dccp_parse_options(sk, NULL, skb))
goto discard;
 
-   if (dcb->dccpd_ack_seq != DCCP_PKT_WITHOUT_ACK_SEQ)
-   dccp_event_ack_recv(sk, skb);
-
-   if (dp->dccps_hc_rx_ackvec != NULL &&
-   dccp_ackvec_add(dp->dccps_hc_rx_ackvec, sk,
-   DCCP_SKB_CB(skb)->dccpd_seq, DCCPAV_RECEIVED))
-   goto discard;
-
+   dccp_handle_ackvec_processing(sk, skb);
dccp_deliver_input_to_ccids(sk, skb);
}
 
-- 
1.5.3.GIT
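
Note on the diff above: the ordering argument ("clear the old state first
before entering new state") can be checked with a small user-space mock. The
trace structure, the sentinel value and all names below are hypothetical;
only the control flow is modelled on dccp_handle_ackvec_processing().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "this packet carries no Ack Number". */
#define PKT_WITHOUT_ACK_SEQ UINT64_MAX

enum av_step { AV_CLEAR = 1, AV_INPUT };

/* Records the order in which the two Ack Vector operations ran. */
struct av_trace {
	enum av_step steps[2];
	int nsteps;
};

static void av_record(struct av_trace *t, enum av_step step)
{
	assert(t->nsteps < 2);
	t->steps[t->nsteps++] = step;
}

static void handle_ackvec(struct av_trace *t, uint64_t ack_seq)
{
	if (t == NULL)		/* Ack Vectors are not in use on this socket */
		return;
	if (ack_seq != PKT_WITHOUT_ACK_SEQ)
		av_record(t, AV_CLEAR);	/* peer acked: drop old state first */
	av_record(t, AV_INPUT);		/* then register the new packet */
}
```

One behavioural difference is visible in the diff itself: the old call sites
jumped to discard (or unable_to_proceed) when dccp_ackvec_add() failed,
whereas the new helper returns void, so its call sites no longer branch on
an Ack Vector error.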



[DCCP] [PATCH 1/1]: Whitespace / outdated documentation

2007-12-18 Thread Gerrit Renker
[CCID3]: Whitespace cleanups and outdated documentation

This removes outdated documentation which should have been removed earlier
(x_recv and rtt now appear twice; p was removed from rx_sock), and removes
newly introduced whitespace.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |6 +++---
 net/dccp/ccids/ccid3.h |3 ---
 2 files changed, 3 insertions(+), 6 deletions(-)

--- a/net/dccp/ccids/ccid3.h
+++ b/net/dccp/ccids/ccid3.h
@@ -133,9 +133,6 @@ enum ccid3_hc_rx_states {
 
 /** struct ccid3_hc_rx_sock - CCID3 receiver half-connection socket
  *
- *  @ccid3hcrx_x_recv  -  Receiver estimate of send rate (RFC 3448 4.3)
- *  @ccid3hcrx_rtt  -  Receiver estimate of rtt (non-standard)
- *  @ccid3hcrx_p  -  Current loss event rate (RFC 3448 5.4)
  *  @ccid3hcrx_last_counter  -  Tracks window counter (RFC 4342, 8.1)
  *  @ccid3hcrx_state  -  Receiver state, one of %ccid3_hc_rx_states
  *  @ccid3hcrx_bytes_recv  -  Total sum of DCCP payload bytes
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -437,9 +437,9 @@ static void ccid3_hc_tx_packet_recv(stru
ccid3_hc_tx_set_state(sk, TFRC_SSTATE_FBACK);
 
if (hctx->ccid3hctx_t_rto == 0) {
-   /*
+   /*
 * Initial feedback packet: Larger Initial Windows (4.2)
-*/
+*/
hctx->ccid3hctx_x= rfc3390_initial_rate(sk);
hctx->ccid3hctx_t_ld = now;
 
@@ -451,7 +451,7 @@ static void ccid3_hc_tx_packet_recv(stru
 * First feedback after nofeedback timer expiry (4.3)
 */
goto done_computing_x;
-   }
+   }
}
 
/* Update sending rate (step 4 of [RFC 3448, 4.3]) */
-- 


Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode

2007-12-18 Thread Gerrit Renker
| 
| I didn't hear any objections so here is the patch again.
| 
| [SNMP]: Fix SNMP counters with PREEMPT
| 
I don't feel competent to say whether this is correct, but it seems that
this is going a long way towards resolving older problems with the SNMP
counters.

What I can say is that a year ago, when doing the UDP/-Lite work, I found
that the counters were not accurate, which was due to the double-counting
problem that Wang Chen brought up again.

Resolving this issue was stalled at that time due to the fact that a
counter may be incremented on one CPU and then later decremented on
another.

It looks as if this work is reaching the cause of the problem, which would
be good since the problem propagates to all users of such counters (UDP,
UDP-Lite, and similar MIB counters). So thank you for taking this issue up,
I hope that this will lead to a patch set resolving the counter issues.

Best regards,
Gerrit


Re: [RFC]: Break up a patch in two (rfc3448bis changes to feedback reception)

2007-12-17 Thread Gerrit Renker
|   The end result should be equivalent, but please take a look and
That is a good catch - this patch was a pain to keep updated exactly due
to the many indentation levels. I had a quick look, the patch looks ok.

Just a small suggestion - since the RTT lookup code in tx_packet_recv()
is new, would it make sense to group it with the RTT validation, as e.g.



[DCCP] [Announce]: Ack Vector Implementation Notes

2007-12-13 Thread Gerrit Renker
I've been working on making DCCP Ack Vectors more robust, dealing more
gracefully with buffer overflow, and fixing two cases which will lead to
corrupted buffer state.

The encountered problems and implementation strategy are documented on
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ack_vectors/

I'd be glad for feedback, in particular if there are any errors or points
which may have been overlooked.


Re: [PATCH 8/8] [PATCH v2] [CCID3]: Interface CCID3 code with newer Loss Intervals Database

2007-12-12 Thread Gerrit Renker
| > +static struct kmem_cache  *tfrc_lh_slab  __read_mostly;/* <=== */
| 
| Yup, this one, is introduced as above but is not initialized at the
| module init routine, please see, it should be OK and we can move on:
| 
| http://git.kernel.org/?p=linux/kernel/git/acme/net-2.6.25.git;a=commitdiff;h=a925429ce2189b548dc19037d3ebd4ff35ae4af7
| 
Sorry for the confusion - you were right, the initialisation was sitting
in the wrong patch, not the one in the subject line. In your online
version the problem is fixed. Thanks a lot for all the work and for the
clarification.

Gerrit


Re: [PATCH 8/8] [PATCH v2] [CCID3]: Interface CCID3 code with newer Loss Intervals Database

2007-12-12 Thread Gerrit Renker
| This time around I'm not doing any reordering, just trying to use your
| patches as is, but adding this patch as-is produces a kernel that will
| crash, no?
| 
| > The loss history and the RX/TX packet history slabs are all created in
| > tfrc.c using the three different __init routines of the dccp_tfrc_lib.
| 
| Yes, the init routines are called and in turn they create the slab
| caches, but up to the patch "[PATCH 8/8] [PATCH v2] [CCID3]: Interface
| CCID3 code with newer Loss Intervals Database" the new li slab is not
| being created, no? See what I'm talking?
| 
Sorry, there is some weird kind of mix-up going on. Can you please check
your patch set: it seems this email exchange refers to an older variant.
In the most recent patch set, the slab is introduced in the patch

[TFRC]: Ringbuffer to track loss interval history

--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -27,6 +23,54 @@ struct dccp_li_hist_entry {
u32  dccplih_interval;
 };

+static struct kmem_cache  *tfrc_lh_slab  __read_mostly;/* <=== */
+/* Loss Interval weights from [RFC 3448, 5.4], scaled by 10 */
+static const int tfrc_lh_weights[NINTERVAL] = { 10, 10, 10, 10, 8, 6, 4, 2 };
// ...

And this is 6/8, i.e. before 8/8, cf.
http://www.mail-archive.com/[EMAIL PROTECTED]/msg03000.html
 
I don't know which tree you are working off, would it be possible to
check against the test tree
git://eden-feed.erg.abdn.ac.uk/dccp_exp [dccp]



Re: [PATCH 8/8] [PATCH v2] [CCID3]: Interface CCID3 code with newer Loss Intervals Database

2007-12-11 Thread Gerrit Renker
| When interfacing we must make sure that ccid3 tfrc_lh_slab is created
| and then tfrc_li_cachep is not needed. I'm doing this while keeping
| the structure of the patches, i.e. one introducing, the other removing.
| But we need to create tfrc_lh_slab if we want the tree to be bisectable.
| 
| I'm doing this and keeping your Signed-off-line, please holler if you
| disagree for some reason.
If you are just shifting and reordering then that is fine with me. But
it seems you mean a different patch since in this one there is no slab
initialisation. 
The loss history and the RX/TX packet history slabs are all created in
tfrc.c using the three different __init routines of the dccp_tfrc_lib.


[PATCH 3/4] [PATCH v2] [TFRC]: Ringbuffer to track loss interval history

2007-12-10 Thread Gerrit Renker
A ringbuffer-based implementation of loss interval history is easier to
maintain, allocate, and update.

The `swap' routine to keep the RX history sorted was written by
Arnaldo Carvalho de Melo, simplifying an earlier macro-based variant.

Details:
 * access to the Loss Interval Records via macro wrappers (with safety checks);
 * simplified, on-demand allocation of entries (no extra memory consumption on
   lossless links); cache allocation is local to the module / exported as 
service;
 * provision of RFC-compliant algorithm to re-compute average loss interval;
 * provision of comprehensive, new loss detection algorithm
- support for all cases of loss, including re-ordered/duplicate packets;
- waiting for NDUPACK=3 packets to fill the hole;
- updating loss records when a late-arriving packet fills a hole.
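
The ring-buffer access pattern behind the first two items can be sketched in
user space as follows. The index formula is the one used by LIH_INDEX() in
this patch; the ring size of 9 and the helper lih_slot_of() are assumptions
made for the illustration, since LIH_SIZE is not defined in the quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

#define LIH_SIZE 9	/* assumed ring size for this sketch */

/* Map an ever-increasing counter onto ring slots, filling from the top down. */
static uint8_t lih_index(uint8_t ctr)
{
	return (uint8_t)(LIH_SIZE - 1 - (ctr % LIH_SIZE));
}

/* Slot of the i-th most recent interval after `counter` entries were written. */
static uint8_t lih_slot_of(uint8_t counter, uint8_t i)
{
	assert(i < counter);
	return lih_index((uint8_t)(counter - i - 1));
}
```

With three entries written, the most recent one (i = 0) sits in slot 6 and
the oldest one (i = 2) in slot 8; the tenth entry wraps around and reuses
slot 8.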

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
 net/dccp/ccids/lib/loss_interval.c  |  161 +++-
 net/dccp/ccids/lib/loss_interval.h  |   56 ++-
 net/dccp/ccids/lib/packet_history.c |  202 +++
 net/dccp/ccids/lib/packet_history.h |   11 ++-
 net/dccp/ccids/lib/tfrc.h   |3 +
 5 files changed, 423 insertions(+), 10 deletions(-)

diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c
index c0a933a..39980d1 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -1,6 +1,7 @@
 /*
  *  net/dccp/ccids/lib/loss_interval.c
  *
+ *  Copyright (c) 2007   The University of Aberdeen, Scotland, UK
  *  Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand.
  *  Copyright (c) 2005-7 Ian McDonald <[EMAIL PROTECTED]>
  *  Copyright (c) 2005 Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
@@ -10,12 +11,7 @@
  *  the Free Software Foundation; either version 2 of the License, or
  *  (at your option) any later version.
  */
-
-#include 
 #include 
-#include "../../dccp.h"
-#include "loss_interval.h"
-#include "packet_history.h"
 #include "tfrc.h"
 
 #define DCCP_LI_HIST_IVAL_F_LENGTH  8
@@ -27,6 +23,54 @@ struct dccp_li_hist_entry {
u32  dccplih_interval;
 };
 
+static struct kmem_cache  *tfrc_lh_slab  __read_mostly;
+/* Loss Interval weights from [RFC 3448, 5.4], scaled by 10 */
+static const int tfrc_lh_weights[NINTERVAL] = { 10, 10, 10, 10, 8, 6, 4, 2 };
+
+/* implements LIFO semantics on the array */
+static inline u8 LIH_INDEX(const u8 ctr)
+{
+   return (LIH_SIZE - 1 - (ctr % LIH_SIZE));
+}
+
+/* the `counter' index always points at the next entry to be populated */
+static inline struct tfrc_loss_interval *tfrc_lh_peek(struct tfrc_loss_hist *lh)
+{
+   return lh->counter ? lh->ring[LIH_INDEX(lh->counter - 1)] : NULL;
+}
+
+/* given i with 0 <= i <= k, return I_i as per the rfc3448bis notation */
+static inline u32 tfrc_lh_get_interval(struct tfrc_loss_hist *lh, const u8 i)
+{
+   BUG_ON(i >= lh->counter);
+   return lh->ring[LIH_INDEX(lh->counter - i - 1)]->li_length;
+}
+
+/*
+ * On-demand allocation and de-allocation of entries
+ */
+static struct tfrc_loss_interval *tfrc_lh_demand_next(struct tfrc_loss_hist *lh)
+{
+   if (lh->ring[LIH_INDEX(lh->counter)] == NULL)
+   lh->ring[LIH_INDEX(lh->counter)] = kmem_cache_alloc(tfrc_lh_slab,
+   GFP_ATOMIC);
+   return lh->ring[LIH_INDEX(lh->counter)];
+}
+
+void tfrc_lh_cleanup(struct tfrc_loss_hist *lh)
+{
+   if (!tfrc_lh_is_initialised(lh))
+   return;
+
+   for (lh->counter = 0; lh->counter < LIH_SIZE; lh->counter++)
+   if (lh->ring[LIH_INDEX(lh->counter)] != NULL) {
+   kmem_cache_free(tfrc_lh_slab,
+   lh->ring[LIH_INDEX(lh->counter)]);
+   lh->ring[LIH_INDEX(lh->counter)] = NULL;
+   }
+}
+EXPORT_SYMBOL_GPL(tfrc_lh_cleanup);
+
 static struct kmem_cache *dccp_li_cachep __read_mostly;
 
 static inline struct dccp_li_hist_entry *dccp_li_hist_entry_new(const gfp_t prio)
@@ -98,6 +142,65 @@ u32 dccp_li_hist_calc_i_mean(struct list_head *list)
 
 EXPORT_SYMBOL_GPL(dccp_li_hist_calc_i_mean);
 
+static void tfrc_lh_calc_i_mean(struct tfrc_loss_hist *lh)
+{
+   u32 i_i, i_tot0 = 0, i_tot1 = 0, w_tot = 0;
+   int i, k = tfrc_lh_length(lh) - 1; /* k is as in rfc3448bis, 5.4 */
+
+   for (i=0; i <= k; i++) {
+   i_i = tfrc_lh_get_interval(lh, i);
+
+   if (i < k) {
+   i_tot0 += i_i * tfrc_lh_weights[i];
+   w_tot  += tfrc_lh_weights[i];
+   }
+   if (i > 0)
+   i_tot1 += i_i * tfrc_lh_weights[i-1];
+   }
+
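
Note: the quoted hunk is cut off before the end of tfrc_lh_calc_i_mean().
The sketch below repeats the loop in user space and completes it with the
final step of RFC 3448, section 5.4 (I_mean = max(I_tot0, I_tot1) / W_tot).
That last line, the array-based interface and the n >= 2 guard are
assumptions made for this note, not text recovered from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NINTERVAL 8
/* Loss interval weights from RFC 3448, 5.4, scaled by 10 as in the patch. */
static const uint32_t lh_weights[NINTERVAL] = { 10, 10, 10, 10, 8, 6, 4, 2 };

/*
 * Weighted average loss interval. ivals[0] is the most recent interval I_0,
 * n is the number of intervals available (2 <= n <= NINTERVAL + 1).
 */
static uint32_t lh_i_mean(const uint32_t *ivals, int n)
{
	uint32_t i_tot0 = 0, i_tot1 = 0, w_tot = 0;
	int i, k = n - 1;

	/* with a single interval w_tot would stay 0 and the division would trap */
	assert(n >= 2 && n <= NINTERVAL + 1);
	for (i = 0; i <= k; i++) {
		if (i < k) {		/* I_0 .. I_(k-1) */
			i_tot0 += ivals[i] * lh_weights[i];
			w_tot  += lh_weights[i];
		}
		if (i > 0)		/* I_1 .. I_k */
			i_tot1 += ivals[i] * lh_weights[i - 1];
	}
	return (i_tot0 > i_tot1 ? i_tot0 : i_tot1) / w_tot;
}
```

Because both sums use the same scaled weights, the factor of 10 cancels in
the division. Taking the maximum means a long, still-open interval I_0 can
raise the average but a short one cannot lower it.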

[PATCH 1/4] [PATCH v2] [TFRC]: Put RX/TX initialisation into tfrc.c

2007-12-10 Thread Gerrit Renker
This separates RX/TX initialisation and puts all packet history / loss intervals
initialisation into tfrc.c.
The organisation is uniform: slab declaration -> {rx,tx}_init() -> {rx,tx}_exit()

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ccids/lib/packet_history.c |   68 --
 net/dccp/ccids/lib/tfrc.c   |   31 
 2 files changed, 55 insertions(+), 44 deletions(-)

diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index af44082..727b17d 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -57,6 +57,22 @@ struct tfrc_tx_hist_entry {
  */
 static struct kmem_cache *tfrc_tx_hist_slab;
 
+int __init tfrc_tx_packet_history_init(void)
+{
+   tfrc_tx_hist_slab = kmem_cache_create("tfrc_tx_hist",
+ sizeof(struct tfrc_tx_hist_entry),
+ 0, SLAB_HWCACHE_ALIGN, NULL);
+   return tfrc_tx_hist_slab == NULL ? -ENOBUFS : 0;
+}
+
+void tfrc_tx_packet_history_exit(void)
+{
+   if (tfrc_tx_hist_slab != NULL) {
+   kmem_cache_destroy(tfrc_tx_hist_slab);
+   tfrc_tx_hist_slab = NULL;
+   }
+}
+
 static struct tfrc_tx_hist_entry *
tfrc_tx_hist_find_entry(struct tfrc_tx_hist_entry *head, u64 seqno)
 {
@@ -119,6 +135,22 @@ EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt);
  */
 static struct kmem_cache *tfrc_rx_hist_slab;
 
+int __init tfrc_rx_packet_history_init(void)
+{
+   tfrc_rx_hist_slab = kmem_cache_create("tfrc_rxh_cache",
+ sizeof(struct tfrc_rx_hist_entry),
+ 0, SLAB_HWCACHE_ALIGN, NULL);
+   return tfrc_rx_hist_slab == NULL ? -ENOBUFS : 0;
+}
+
+void tfrc_rx_packet_history_exit(void)
+{
+   if (tfrc_rx_hist_slab != NULL) {
+   kmem_cache_destroy(tfrc_rx_hist_slab);
+   tfrc_rx_hist_slab = NULL;
+   }
+}
+
 /**
  * tfrc_rx_hist_index - index to reach n-th entry after loss_start
  */
@@ -316,39 +348,3 @@ keep_ref_for_next_time:
return sample;
 }
 EXPORT_SYMBOL_GPL(tfrc_rx_hist_sample_rtt);
-
-__init int packet_history_init(void)
-{
-   tfrc_tx_hist_slab = kmem_cache_create("tfrc_tx_hist",
- sizeof(struct tfrc_tx_hist_entry), 0,
- SLAB_HWCACHE_ALIGN, NULL);
-   if (tfrc_tx_hist_slab == NULL)
-   goto out_err;
-
-   tfrc_rx_hist_slab = kmem_cache_create("tfrc_rx_hist",
- sizeof(struct tfrc_rx_hist_entry), 0,
- SLAB_HWCACHE_ALIGN, NULL);
-   if (tfrc_rx_hist_slab == NULL)
-   goto out_free_tx;
-
-   return 0;
-
-out_free_tx:
-   kmem_cache_destroy(tfrc_tx_hist_slab);
-   tfrc_tx_hist_slab = NULL;
-out_err:
-   return -ENOBUFS;
-}
-
-void packet_history_exit(void)
-{
-   if (tfrc_tx_hist_slab != NULL) {
-   kmem_cache_destroy(tfrc_tx_hist_slab);
-   tfrc_tx_hist_slab = NULL;
-   }
-
-   if (tfrc_rx_hist_slab != NULL) {
-   kmem_cache_destroy(tfrc_rx_hist_slab);
-   tfrc_rx_hist_slab = NULL;
-   }
-}
diff --git a/net/dccp/ccids/lib/tfrc.c b/net/dccp/ccids/lib/tfrc.c
index 3a7a183..20763fa 100644
--- a/net/dccp/ccids/lib/tfrc.c
+++ b/net/dccp/ccids/lib/tfrc.c
@@ -14,27 +14,42 @@ module_param(tfrc_debug, bool, 0444);
 MODULE_PARM_DESC(tfrc_debug, "Enable debug messages");
 #endif
 
+extern int  tfrc_tx_packet_history_init(void);
+extern void tfrc_tx_packet_history_exit(void);
+extern int  tfrc_rx_packet_history_init(void);
+extern void tfrc_rx_packet_history_exit(void);
+
 extern int  dccp_li_init(void);
 extern void dccp_li_exit(void);
-extern int packet_history_init(void);
-extern void packet_history_exit(void);
 
 static int __init tfrc_module_init(void)
 {
int rc = dccp_li_init();
 
-   if (rc == 0) {
-   rc = packet_history_init();
-   if (rc != 0)
-   dccp_li_exit();
-   }
+   if (rc)
+   goto out;
+
+   rc = tfrc_tx_packet_history_init();
+   if (rc)
+   goto out_free_loss_intervals;
 
+   rc = tfrc_rx_packet_history_init();
+   if (rc)
+   goto out_free_tx_history;
+   return 0;
+
+out_free_tx_history:
+   tfrc_tx_packet_history_exit();
+out_free_loss_intervals:
+   dccp_li_exit();
+out:
return rc;
 }
 
 static void __exit tfrc_module_exit(void)
 {
-   packet_history_exit();
+   tfrc_rx_packet_history_exit();
+   tfrc_tx_packet_history_exit();
dccp_li_exit();
 }
 
-- 
1.5.3.GIT
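
Note on the diff above: the goto-based unwinding in tfrc_module_init() can be
exercised in user space with mock init/exit routines. Everything below (the
fail_at knob, the *_up flags, the function names) is hypothetical scaffolding;
only the label structure follows the patch.

```c
#include <assert.h>

/* Test knob: which step fails (0 = none, 1 = li, 2 = tx, 3 = rx). */
static int fail_at;
/* Which mock caches currently exist. */
static int li_up, tx_up, rx_up;

static int li_init(void) { if (fail_at == 1) return -1; li_up = 1; return 0; }
static int tx_init(void) { if (fail_at == 2) return -1; tx_up = 1; return 0; }
static int rx_init(void) { if (fail_at == 3) return -1; rx_up = 1; return 0; }
static void li_exit(void) { li_up = 0; }
static void tx_exit(void) { tx_up = 0; }
static void rx_exit(void) { rx_up = 0; }

static int module_init_sketch(void)
{
	int rc;

	/* the sketch expects a clean start, as at module load time */
	assert(!li_up && !tx_up && !rx_up);

	rc = li_init();
	if (rc)
		goto out;
	rc = tx_init();
	if (rc)
		goto out_free_li;
	rc = rx_init();
	if (rc)
		goto out_free_tx;
	return 0;

out_free_tx:
	tx_exit();		/* falls through: undo in reverse order */
out_free_li:
	li_exit();
out:
	return rc;
}

static void module_exit_sketch(void)
{
	rx_exit();
	tx_exit();
	li_exit();
}
```

Each label undoes exactly the steps that had succeeded before the failure,
and the exit routine tears down in the reverse order of initialisation.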



[PATCH 4/4] [CCID3]: Redundant debugging output / documentation

2007-12-10 Thread Gerrit Renker
Each time feedback is sent two lines are printed:

ccid3_hc_rx_send_feedback: client ... - entry
ccid3_hc_rx_send_feedback: Interval ...usec, X_recv=..., 1/p=...

The first line is redundant and thus removed.

Further, documentation of ccid3_hc_rx_sock (capitalisation) is made consistent.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |2 --
 net/dccp/ccids/ccid3.h |4 ++--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 60fcb31..b92069b 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -685,8 +685,6 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk,
ktime_t now;
s64 delta = 0;
 
-   ccid3_pr_debug("%s(%p) - entry \n", dccp_role(sk), sk);
-
if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_TERM))
return;
 
diff --git a/net/dccp/ccids/ccid3.h b/net/dccp/ccids/ccid3.h
index 3c33dc6..6ceeb80 100644
--- a/net/dccp/ccids/ccid3.h
+++ b/net/dccp/ccids/ccid3.h
@@ -135,9 +135,9 @@ enum ccid3_hc_rx_states {
  *
  *  @ccid3hcrx_x_recv  -  Receiver estimate of send rate (RFC 3448 4.3)
  *  @ccid3hcrx_rtt  -  Receiver estimate of rtt (non-standard)
- *  @ccid3hcrx_p  -  current loss event rate (RFC 3448 5.4)
+ *  @ccid3hcrx_p  -  Current loss event rate (RFC 3448 5.4)
  *  @ccid3hcrx_last_counter  -  Tracks window counter (RFC 4342, 8.1)
- *  @ccid3hcrx_state  -  receiver state, one of %ccid3_hc_rx_states
+ *  @ccid3hcrx_state  -  Receiver state, one of %ccid3_hc_rx_states
  *  @ccid3hcrx_bytes_recv  -  Total sum of DCCP payload bytes
  *  @ccid3hcrx_tstamp_last_feedback  -  Time at which last feedback was sent
  *  @ccid3hcrx_tstamp_last_ack  -  Time at which last feedback was sent
-- 
1.5.3.GIT



[PATCH 2/4] [PATCH v2] [TFRC]: Loss interval code needs the macros/inlines that were moved

2007-12-10 Thread Gerrit Renker
This moves the inlines (which were previously declared as macros) back into
packet_history.h, since the loss detection code needs to be able to read
entries from the RX history in order to create the relevant loss entries: it
needs at least tfrc_rx_hist_loss_prev() and tfrc_rx_hist_last_rcv(), which in
turn require the definition of the other inlines (macros).

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ccids/lib/packet_history.c |   35 ---
 net/dccp/ccids/lib/packet_history.h |   35 +++
 2 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index 1346045..22114c6 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -151,41 +151,6 @@ void tfrc_rx_packet_history_exit(void)
}
 }
 
-/**
- * tfrc_rx_hist_index - index to reach n-th entry after loss_start
- */
-static inline u8 tfrc_rx_hist_index(const struct tfrc_rx_hist *h, const u8 n)
-{
-   return (h->loss_start + n) & TFRC_NDUPACK;
-}
-
-/**
- * tfrc_rx_hist_last_rcv - entry with highest-received-seqno so far
- */
-static inline struct tfrc_rx_hist_entry *
-   tfrc_rx_hist_last_rcv(const struct tfrc_rx_hist *h)
-{
-   return h->ring[tfrc_rx_hist_index(h, h->loss_count)];
-}
-
-/**
- * tfrc_rx_hist_entry - return the n-th history entry after loss_start
- */
-static inline struct tfrc_rx_hist_entry *
-   tfrc_rx_hist_entry(const struct tfrc_rx_hist *h, const u8 n)
-{
-   return h->ring[tfrc_rx_hist_index(h, n)];
-}
-
-/**
- * tfrc_rx_hist_loss_prev - entry with highest-received-seqno before loss was detected
- */
-static inline struct tfrc_rx_hist_entry *
-   tfrc_rx_hist_loss_prev(const struct tfrc_rx_hist *h)
-{
-   return h->ring[h->loss_start];
-}
-
 /* has the packet contained in skb been seen before? */
 int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb)
 {
diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h
index 3dfd182..e58b0fc 100644
--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -84,6 +84,41 @@ struct tfrc_rx_hist {
 #define rtt_sample_prev  loss_start
 };
 
+/**
+ * tfrc_rx_hist_index - index to reach n-th entry after loss_start
+ */
+static inline u8 tfrc_rx_hist_index(const struct tfrc_rx_hist *h, const u8 n)
+{
+   return (h->loss_start + n) & TFRC_NDUPACK;
+}
+
+/**
+ * tfrc_rx_hist_last_rcv - entry with highest-received-seqno so far
+ */
+static inline struct tfrc_rx_hist_entry *
+   tfrc_rx_hist_last_rcv(const struct tfrc_rx_hist *h)
+{
+   return h->ring[tfrc_rx_hist_index(h, h->loss_count)];
+}
+
+/**
+ * tfrc_rx_hist_entry - return the n-th history entry after loss_start
+ */
+static inline struct tfrc_rx_hist_entry *
+   tfrc_rx_hist_entry(const struct tfrc_rx_hist *h, const u8 n)
+{
+   return h->ring[tfrc_rx_hist_index(h, n)];
+}
+
+/**
+ * tfrc_rx_hist_loss_prev - entry with highest-received-seqno before loss was detected
+ */
+static inline struct tfrc_rx_hist_entry *
+   tfrc_rx_hist_loss_prev(const struct tfrc_rx_hist *h)
+{
+   return h->ring[h->loss_start];
+}
+
 extern void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h,
const struct sk_buff *skb, const u32 ndp);
 
-- 
1.5.3.GIT



[PATCH 0/4] [TFRC]: Revised Loss Intervals Patches (macro-less, new swap function)

2007-12-10 Thread Gerrit Renker
This revision updates earlier patches, following discussion,
and adds one additional cleanup patch at the end.


Patch #1: Revision of initialisation patch; fixed calling __exit function 
  from __init function - identified by Arnaldo.

Patch #2: Revision - re-converted tfrc_rx_hist_entry() back to inline,
  following discussion with Arnaldo.

Patch #3: Reworked - loss intervals database.  Individual changes:
  - replaced tfrc_rx_hist_swap() with routine suggested by Arnaldo;
  - replaced all access macros with inlines or in-place(s);
  - replaced LIH_INDEX also with inline instead of macro.
  
Patch #4: Removes redundant debugging output from syslog.


Re: [PATCH 5/8] [TFRC]: Loss interval code needs the macros/inlines that were moved

2007-12-10 Thread Gerrit Renker
| > 
| >   distcc[24516] ERROR: compile /root/.ccache/packet_his.tmp.aspire.home.net.24512.i on _tiptop failed
| >   /usr/src/davem-2.6/net/dccp/ccids/lib/packet_history.c: In function '__one_after_loss':
| >   /usr/src/davem-2.6/net/dccp/ccids/lib/packet_history.c:266: error: lvalue required as unary '&' operand

| 
| Because you do it this way:
| 
| tfrc_rx_hist_swap(&TFRC_RX_HIST_ENTRY(h, 0), &TFRC_RX_HIST_ENTRY(h, 3));
| 
| I checked and at least in this patch series all uses are of this type,
| so why not do it using just the indexes, which would be simpler:
| 
| tfrc_rx_hist_swap(h, 0, 3);
| 
| With this implementation:
| 
| static void tfrc_rx_hist_swap(struct tfrc_rx_hist *h, const int a, const int b)
| {
|   const int idx_a = tfrc_rx_hist_index(h, a),
|             idx_b = tfrc_rx_hist_index(h, b);
|   struct tfrc_rx_hist_entry *tmp = h->ring[idx_a];
| 
|   h->ring[idx_a] = h->ring[idx_b];
|   h->ring[idx_b] = tmp;
| }
| 
Agreed, that is useful in the present case, since then everything uses
inlines. The only suggestion I'd like to make is to use `u8' instead of 
`int' since the indices will have very low values.
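
As a rough userspace sketch of the index-based swap with `u8' indices (the
struct layout and the names rx_hist, hist_entry, rx_hist_index, rx_hist_swap
and NDUPACK are stand-ins assumed for illustration, not the kernel
definitions):

```c
#include <stdint.h>

/* Assumed for illustration: a ring of NDUPACK + 1 = 4 slots, so that
 * NDUPACK doubles as the index mask (mirrors the use of TFRC_NDUPACK). */
#define NDUPACK 3

struct hist_entry {
	uint64_t seqno;
};

struct rx_hist {
	struct hist_entry *ring[NDUPACK + 1];
	uint8_t loss_start;
};

/* index to reach the n-th entry after loss_start */
static inline uint8_t rx_hist_index(const struct rx_hist *h, const uint8_t n)
{
	return (h->loss_start + n) & NDUPACK;
}

/* swap the a-th and b-th entries after loss_start; indices are tiny, so u8 */
static void rx_hist_swap(struct rx_hist *h, const uint8_t a, const uint8_t b)
{
	const uint8_t idx_a = rx_hist_index(h, a),
		      idx_b = rx_hist_index(h, b);
	struct hist_entry *tmp = h->ring[idx_a];

	h->ring[idx_a] = h->ring[idx_b];
	h->ring[idx_b] = tmp;
}
```

Since the function works on slots of the ring itself, callers never need to
take the address of a returned pointer, which is what triggered the lvalue
error quoted above.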

There is a related point: you will probably have noticed that loss_interval.c
also uses macros. I don't know if you are planning to convert these into
inlines as well. I think there would be less benefit in converting these, since
they are local to loss_interval.c and mostly serve to improve readability.

As I have at least one other patch to revise (plus another minor one),
I'll rework this according to the above. 


Re: [PATCH 2/8] [TFRC]: Put RX/TX initialisation into tfrc.c

2007-12-10 Thread Gerrit Renker
| > This separates RX/TX initialisation and puts all packet history / loss intervals
| > initialisation into tfrc.c.
| > The organisation is uniform: slab declaration -> {rx,tx}_init() -> {rx,tx}_exit()
| 
| NAK, you can't call a __exit marked routine from a __init marked
| routine.
| 
Ok thanks, will fix that in revision 2.


[PATCH 6/8] [PATCH v2] [TFRC]: Ringbuffer to track loss interval history

2007-12-08 Thread Gerrit Renker
A ringbuffer-based implementation of loss interval history is easier to
maintain, allocate, and update.

Details:
 * access to the Loss Interval Records via macro wrappers (with safety checks);
 * simplified, on-demand allocation of entries (no extra memory consumption on
   lossless links); cache allocation is local to the module / exported as
   service;
 * provision of RFC-compliant algorithm to re-compute average loss interval;
 * provision of comprehensive, new loss detection algorithm
    - support for all cases of loss, including re-ordered/duplicate packets;
    - waiting for NDUPACK=3 packets to fill the hole;
    - updating loss records when a late-arriving packet fills a hole.
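
The re-computation of the average loss interval can be sketched in userspace
as follows. This mirrors the loop of tfrc_lh_calc_i_mean() in the diff below,
but takes a plain array instead of the ring buffer; the names lh_weights and
lh_calc_i_mean, and the array-based interface, are assumptions made for the
example only:

```c
#include <assert.h>
#include <stdint.h>

#define NINTERVAL 8

/* Loss Interval weights from [RFC 3448, 5.4], scaled by 10 */
static const uint32_t lh_weights[NINTERVAL] = { 10, 10, 10, 10, 8, 6, 4, 2 };

/*
 * intervals[0] is the open interval I_0, intervals[n-1] the oldest one.
 * Assumes 2 <= n <= NINTERVAL + 1, so that the weight sum is non-zero.
 */
static uint32_t lh_calc_i_mean(const uint32_t *intervals, int n)
{
	uint32_t i_tot0 = 0, i_tot1 = 0, w_tot = 0;
	int i, k = n - 1;

	assert(n >= 2 && n <= NINTERVAL + 1);

	for (i = 0; i <= k; i++) {
		if (i < k) {		/* sum including I_0, excluding I_k */
			i_tot0 += intervals[i] * lh_weights[i];
			w_tot  += lh_weights[i];
		}
		if (i > 0)		/* sum excluding I_0, including I_k */
			i_tot1 += intervals[i] * lh_weights[i - 1];
	}
	/* the open interval only counts if it increases the mean */
	return (i_tot0 > i_tot1 ? i_tot0 : i_tot1) / w_tot;
}
```

With intervals { 5, 10, 10 } the short open interval is ignored and the mean
is 10; with { 30, 10, 10 } the open interval is included and the mean is 20.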

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
 net/dccp/ccids/lib/loss_interval.c  |  158 ++-
 net/dccp/ccids/lib/loss_interval.h  |   58 ++-
 net/dccp/ccids/lib/packet_history.c |  201 +++
 net/dccp/ccids/lib/packet_history.h |   11 ++-
 net/dccp/ccids/lib/tfrc.h   |3 +
 5 files changed, 421 insertions(+), 10 deletions(-)

diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c
index c0a933a..b0e59fb 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -1,6 +1,7 @@
 /*
  *  net/dccp/ccids/lib/loss_interval.c
  *
+ *  Copyright (c) 2007   The University of Aberdeen, Scotland, UK
  *  Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand.
  *  Copyright (c) 2005-7 Ian McDonald <[EMAIL PROTECTED]>
  *  Copyright (c) 2005 Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
@@ -10,12 +11,7 @@
  *  the Free Software Foundation; either version 2 of the License, or
  *  (at your option) any later version.
  */
-
-#include 
 #include 
-#include "../../dccp.h"
-#include "loss_interval.h"
-#include "packet_history.h"
 #include "tfrc.h"
 
 #define DCCP_LI_HIST_IVAL_F_LENGTH  8
@@ -27,6 +23,51 @@ struct dccp_li_hist_entry {
u32  dccplih_interval;
 };
 
+static struct kmem_cache  *tfrc_lh_slab  __read_mostly;
+/* Loss Interval weights from [RFC 3448, 5.4], scaled by 10 */
+static const int tfrc_lh_weights[NINTERVAL] = { 10, 10, 10, 10, 8, 6, 4, 2 };
+
+/*
+ * Access macros: These require that at least one entry is present in lh,
+ * and implement array semantics (0 is first, n-1 is the last of n entries).
+ */
+#define __lih_index(lh, n) LIH_INDEX((lh)->counter - (n) - 1)
+#define __lih_entry(lh, n) (lh)->ring[__lih_index(lh, n)]
+#define __curr_entry(lh)   (lh)->ring[LIH_INDEX((lh)->counter - 1)]
+#define __next_entry(lh)   (lh)->ring[LIH_INDEX((lh)->counter)]
+
+/* given i with 0 <= i <= k, return I_i as per the rfc3448bis notation */
+static inline u32 tfrc_lh_get_interval(struct tfrc_loss_hist *lh, u8 i)
+{
+   BUG_ON(i >= lh->counter);
+   return __lih_entry(lh, i)->li_length;
+}
+
+static inline struct tfrc_loss_interval *tfrc_lh_peek(struct tfrc_loss_hist *lh)
+{
+   return lh->counter ? __curr_entry(lh) : NULL;
+}
+
+/*
+ * On-demand allocation and de-allocation of entries
+ */
+static struct tfrc_loss_interval *tfrc_lh_demand_next(struct tfrc_loss_hist *lh)
+{
+   if (__next_entry(lh) == NULL)
+   __next_entry(lh) = kmem_cache_alloc(tfrc_lh_slab, GFP_ATOMIC);
+
+   return __next_entry(lh);
+}
+
+void tfrc_lh_cleanup(struct tfrc_loss_hist *lh)
+{
+   if (tfrc_lh_is_initialised(lh))
+   for (lh->counter = 0; lh->counter < LIH_SIZE; lh->counter++)
+   if (__next_entry(lh) != NULL)
+   kmem_cache_free(tfrc_lh_slab, __next_entry(lh));
+}
+EXPORT_SYMBOL_GPL(tfrc_lh_cleanup);
+
 static struct kmem_cache *dccp_li_cachep __read_mostly;
 
 static inline struct dccp_li_hist_entry *dccp_li_hist_entry_new(const gfp_t prio)
@@ -98,6 +139,65 @@ u32 dccp_li_hist_calc_i_mean(struct list_head *list)
 
 EXPORT_SYMBOL_GPL(dccp_li_hist_calc_i_mean);
 
+static void tfrc_lh_calc_i_mean(struct tfrc_loss_hist *lh)
+{
+   u32 i_i, i_tot0 = 0, i_tot1 = 0, w_tot = 0;
+   int i, k = tfrc_lh_length(lh) - 1; /* k is as in rfc3448bis, 5.4 */
+
+   for (i=0; i <= k; i++) {
+   i_i = tfrc_lh_get_interval(lh, i);
+
+   if (i < k) {
+   i_tot0 += i_i * tfrc_lh_weights[i];
+   w_tot  += tfrc_lh_weights[i];
+   }
+   if (i > 0)
+   i_tot1 += i_i * tfrc_lh_weights[i-1];
+   }
+
+   BUG_ON(w_tot == 0);
+   lh->i_mean = max(i_tot0, i_tot1) / w_tot;
+}
+
+/**
+ * tfrc_lh_update_i_mean  -  Update the `open' loss interval I_0
+ * For recomputing p: returns `true' if p > p_prev  <=>  1/p < 1/p_prev
+ */
+u8 tfrc_lh_update_i_mean(struct tfrc_

[PATCH 8/8] [PATCH v2] [CCID3]: Interface CCID3 code with newer Loss Intervals Database

2007-12-08 Thread Gerrit Renker
This hooks up the TFRC Loss Interval database with CCID 3 packet reception.
In addition, it makes the CCID-specific computation of the first loss
interval (which requires access to all the guts of CCID3) local to ccid3.c.

The patch also fixes an omission in the DCCP code, that of a default /
fallback RTT value (defined in section 3.4 of RFC 4340 as 0.2 sec); while
at it, the upper bound of 4 seconds for an RTT sample has been reduced to
match the initial TCP RTO value of 3 seconds from [RFC 1122, 4.2.3.1].

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |   72 
 net/dccp/ccids/ccid3.h |   10 +++---
 net/dccp/dccp.h|7 +++-
 3 files changed, 70 insertions(+), 19 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 60fcb31..4d0de21 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -1,6 +1,7 @@
 /*
  *  net/dccp/ccids/ccid3.c
  *
+ *  Copyright (c) 2007   The University of Aberdeen, Scotland, UK
  *  Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand.
  *  Copyright (c) 2005-7 Ian McDonald <[EMAIL PROTECTED]>
  *
@@ -33,11 +34,7 @@
  *  along with this program; if not, write to the Free Software
  *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
  */
-#include "../ccid.h"
 #include "../dccp.h"
-#include "lib/packet_history.h"
-#include "lib/loss_interval.h"
-#include "lib/tfrc.h"
 #include "ccid3.h"
 
 #include 
@@ -759,6 +756,46 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb)
return 0;
 }
 
+/** ccid3_first_li  -  Implements [RFC 3448, 6.3.1]
+ *
+ * Determine the length of the first loss interval via inverse lookup.
+ * Assume that X_recv can be computed by the throughput equation
+ *                s
+ *    X_recv = --------
+ *             R * fval
+ * Find some p such that f(p) = fval; return 1/p (scaled).
+ */
+static u32 ccid3_first_li(struct sock *sk)
+{
+   struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
+   u32 x_recv, p, delta;
+   u64 fval;
+
+   if (hcrx->ccid3hcrx_rtt == 0) {
+   DCCP_WARN("No RTT estimate available, using fallback RTT\n");
+   hcrx->ccid3hcrx_rtt = DCCP_FALLBACK_RTT;
+   }
+
+   delta = ktime_to_us(net_timedelta(hcrx->ccid3hcrx_tstamp_last_feedback));
+   x_recv = scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta);
+   if (x_recv == 0) {  /* would also trigger divide-by-zero */
+   DCCP_WARN("X_recv==0\n");
+   if ((x_recv = hcrx->ccid3hcrx_x_recv) == 0) {
+   DCCP_BUG("stored value of X_recv is zero");
+   return ~0U;
+   }
+   }
+
+   fval = scaled_div(hcrx->ccid3hcrx_s, hcrx->ccid3hcrx_rtt);
+   fval = scaled_div32(fval, x_recv);
+   p = tfrc_calc_x_reverse_lookup(fval);
+
+   ccid3_pr_debug("%s(%p), receive rate=%u bytes/s, implied "
+  "loss rate=%u\n", dccp_role(sk), sk, x_recv, p);
+
+   return p == 0 ? ~0U : scaled_div(1, p);
+}
+
 static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
 {
struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
@@ -796,6 +833,14 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
/*
 * Handle pending losses and otherwise check for new loss
 */
+   if (tfrc_rx_hist_loss_pending(&hcrx->ccid3hcrx_hist) &&
+   tfrc_rx_handle_loss(&hcrx->ccid3hcrx_hist,
+   &hcrx->ccid3hcrx_li_hist,
+   skb, ndp, ccid3_first_li, sk) ) {
+   do_feedback = CCID3_FBACK_PARAM_CHANGE;
+   goto done_receiving;
+   }
+
if (tfrc_rx_hist_new_loss_indicated(&hcrx->ccid3hcrx_hist, skb, ndp))
goto update_records;
 
@@ -805,7 +850,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
if (unlikely(!is_data_packet))
goto update_records;
 
-   if (list_empty(&hcrx->ccid3hcrx_li_hist)) {  /* no loss so far: p = 0 */
+   if (!tfrc_lh_is_initialised(&hcrx->ccid3hcrx_li_hist)) {
const u32 sample = tfrc_rx_hist_sample_rtt(&hcrx->ccid3hcrx_hist, skb);
/*
 * Empty loss history: no loss so far, hence p stays 0.
@@ -814,6 +859,13 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
 */
if (sample != 0)
hcrx->ccid3hcrx_rtt = tfrc_ewma(hcrx->ccid3hcrx_rtt, sample, 9);
+
+   } else if (tfrc_lh_update_i_mean(&hcrx->ccid3hcrx_li_hist, skb)) {
+   

[PATCH 5/8] [TFRC]: Loss interval code needs the macros/inlines that were moved

2007-12-08 Thread Gerrit Renker
This moves the inlines (which were previously declared as macros) back into
packet_history.h, since the loss detection code needs to be able to read
entries from the RX history in order to create the relevant loss entries: it
needs at least tfrc_rx_hist_loss_prev() and tfrc_rx_hist_last_rcv(), which in
turn require the definition of the other inlines (macros).

Additionally, in one case the use of inlines instead of a macro broke the
algorithm: rx_hist_swap() (introduced in next patch) needs to be able to swap
the history entries; when using an inline returning a pointer instead, one
gets compilation errors such as:

  distcc[24516] ERROR: compile /root/.ccache/packet_his.tmp.aspire.home.net.24512.i on _tiptop failed
  /usr/src/davem-2.6/net/dccp/ccids/lib/packet_history.c: In function '__one_after_loss':
  /usr/src/davem-2.6/net/dccp/ccids/lib/packet_history.c:266: error: lvalue required as unary '&' operand
  /usr/src/davem-2.6/net/dccp/ccids/lib/packet_history.c:267: error: lvalue required as unary '&' operand
  /usr/src/davem-2.6/net/dccp/ccids/lib/packet_history.c: In function '__two_after_loss':
  /usr/src/davem-2.6/net/dccp/ccids/lib/packet_history.c:298: error: lvalue required as unary '&' operand
  /usr/src/davem-2.6/net/dccp/ccids/lib/packet_history.c:299: error: lvalue required as unary '&' operand
  /usr/src/davem-2.6/net/dccp/ccids/lib/packet_history.c:336: error: lvalue required as unary '&' operand
  /usr/src/davem-2.6/net/dccp/ccids/lib/packet_history.c:337: error: lvalue required as unary '&' operand
  make[4]: *** [net/dccp/ccids/lib/packet_history.o] Error 1
  make[3]: *** [net/dccp/ccids/lib] Error 2
  make[2]: *** [net/dccp/ccids] Error 2
  make[1]: *** [net/dccp/] Error 2
  make: *** [sub-make] Error 2

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ccids/lib/packet_history.c |   37 +--
 net/dccp/ccids/lib/packet_history.h |   32 ++
 2 files changed, 33 insertions(+), 36 deletions(-)

diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index 6256bec..428636e 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -151,41 +151,6 @@ void __exit tfrc_rx_packet_history_exit(void)
}
 }
 
-/**
- * tfrc_rx_hist_index - index to reach n-th entry after loss_start
- */
-static inline u8 tfrc_rx_hist_index(const struct tfrc_rx_hist *h, const u8 n)
-{
-   return (h->loss_start + n) & TFRC_NDUPACK;
-}
-
-/**
- * tfrc_rx_hist_last_rcv - entry with highest-received-seqno so far
- */
-static inline struct tfrc_rx_hist_entry *
-   tfrc_rx_hist_last_rcv(const struct tfrc_rx_hist *h)
-{
-   return h->ring[tfrc_rx_hist_index(h, h->loss_count)];
-}
-
-/**
- * tfrc_rx_hist_entry - return the n-th history entry after loss_start
- */
-static inline struct tfrc_rx_hist_entry *
-   tfrc_rx_hist_entry(const struct tfrc_rx_hist *h, const u8 n)
-{
-   return h->ring[tfrc_rx_hist_index(h, n)];
-}
-
-/**
- * tfrc_rx_hist_loss_prev - entry with highest-received-seqno before loss was detected
- */
-static inline struct tfrc_rx_hist_entry *
-   tfrc_rx_hist_loss_prev(const struct tfrc_rx_hist *h)
-{
-   return h->ring[h->loss_start];
-}
-
 /* has the packet contained in skb been seen before? */
 int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb)
 {
@@ -196,7 +161,7 @@ int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb)
return 1;
 
for (i = 1; i <= h->loss_count; i++)
-   if (tfrc_rx_hist_entry(h, i)->tfrchrx_seqno == seq)
+   if (TFRC_RX_HIST_ENTRY(h, i)->tfrchrx_seqno == seq)
return 1;
 
return 0;
diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h
index 3dfd182..6ea25cd 100644
--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -84,6 +84,38 @@ struct tfrc_rx_hist {
 #define rtt_sample_prev  loss_start
 };
 
+/**
+ * tfrc_rx_hist_index - index to reach n-th entry after loss_start
+ */
+static inline u8 tfrc_rx_hist_index(const struct tfrc_rx_hist *h, const u8 n)
+{
+   return (h->loss_start + n) & TFRC_NDUPACK;
+}
+
+/**
+ * tfrc_rx_hist_last_rcv - entry with highest-received-seqno so far
+ */
+static inline struct tfrc_rx_hist_entry *
+   tfrc_rx_hist_last_rcv(const struct tfrc_rx_hist *h)
+{
+   return h->ring[tfrc_rx_hist_index(h, h->loss_count)];
+}
+
+/**
+ * tfrc_rx_hist_entry - return the n-th history entry after loss_start
+ * This has to be a macro since rx_hist_swap needs to be able to swap entries.
+ */
+#define TFRC_RX_HIST_ENTRY(h, n)   ((h)->ring[tfrc_rx_

[PATCH 2/8] [TFRC]: Put RX/TX initialisation into tfrc.c

2007-12-08 Thread Gerrit Renker
This separates RX/TX initialisation and puts all packet history / loss intervals
initialisation into tfrc.c.
The organisation is uniform: slab declaration -> {rx,tx}_init() -> {rx,tx}_exit()

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ccids/lib/packet_history.c |   68 --
 net/dccp/ccids/lib/tfrc.c   |   31 
 2 files changed, 55 insertions(+), 44 deletions(-)

diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index 54cd23e..af0db71 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -57,6 +57,22 @@ struct tfrc_tx_hist_entry {
  */
 static struct kmem_cache *tfrc_tx_hist_slab;
 
+int __init tfrc_tx_packet_history_init(void)
+{
+   tfrc_tx_hist_slab = kmem_cache_create("tfrc_tx_hist",
+ sizeof(struct tfrc_tx_hist_entry),
+ 0, SLAB_HWCACHE_ALIGN, NULL);
+   return tfrc_tx_hist_slab == NULL ? -ENOBUFS : 0;
+}
+
+void __exit tfrc_tx_packet_history_exit(void)
+{
+   if (tfrc_tx_hist_slab != NULL) {
+   kmem_cache_destroy(tfrc_tx_hist_slab);
+   tfrc_tx_hist_slab = NULL;
+   }
+}
+
 static struct tfrc_tx_hist_entry *
tfrc_tx_hist_find_entry(struct tfrc_tx_hist_entry *head, u64 seqno)
 {
@@ -119,6 +135,22 @@ EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt);
  */
 static struct kmem_cache *tfrc_rx_hist_slab;
 
+int __init tfrc_rx_packet_history_init(void)
+{
+   tfrc_rx_hist_slab = kmem_cache_create("tfrc_rxh_cache",
+ sizeof(struct tfrc_rx_hist_entry),
+ 0, SLAB_HWCACHE_ALIGN, NULL);
+   return tfrc_rx_hist_slab == NULL ? -ENOBUFS : 0;
+}
+
+void __exit tfrc_rx_packet_history_exit(void)
+{
+   if (tfrc_rx_hist_slab != NULL) {
+   kmem_cache_destroy(tfrc_rx_hist_slab);
+   tfrc_rx_hist_slab = NULL;
+   }
+}
+
 /**
  * tfrc_rx_hist_index - index to reach n-th entry after loss_start
  */
@@ -321,39 +353,3 @@ keep_ref_for_next_time:
return sample;
 }
 EXPORT_SYMBOL_GPL(tfrc_rx_hist_sample_rtt);
-
-__init int packet_history_init(void)
-{
-   tfrc_tx_hist_slab = kmem_cache_create("tfrc_tx_hist",
- sizeof(struct tfrc_tx_hist_entry), 0,
- SLAB_HWCACHE_ALIGN, NULL);
-   if (tfrc_tx_hist_slab == NULL)
-   goto out_err;
-
-   tfrc_rx_hist_slab = kmem_cache_create("tfrc_rx_hist",
- sizeof(struct tfrc_rx_hist_entry), 0,
- SLAB_HWCACHE_ALIGN, NULL);
-   if (tfrc_rx_hist_slab == NULL)
-   goto out_free_tx;
-
-   return 0;
-
-out_free_tx:
-   kmem_cache_destroy(tfrc_tx_hist_slab);
-   tfrc_tx_hist_slab = NULL;
-out_err:
-   return -ENOBUFS;
-}
-
-void packet_history_exit(void)
-{
-   if (tfrc_tx_hist_slab != NULL) {
-   kmem_cache_destroy(tfrc_tx_hist_slab);
-   tfrc_tx_hist_slab = NULL;
-   }
-
-   if (tfrc_rx_hist_slab != NULL) {
-   kmem_cache_destroy(tfrc_rx_hist_slab);
-   tfrc_rx_hist_slab = NULL;
-   }
-}
diff --git a/net/dccp/ccids/lib/tfrc.c b/net/dccp/ccids/lib/tfrc.c
index 3a7a183..20763fa 100644
--- a/net/dccp/ccids/lib/tfrc.c
+++ b/net/dccp/ccids/lib/tfrc.c
@@ -14,27 +14,42 @@ module_param(tfrc_debug, bool, 0444);
 MODULE_PARM_DESC(tfrc_debug, "Enable debug messages");
 #endif
 
+extern int  tfrc_tx_packet_history_init(void);
+extern void tfrc_tx_packet_history_exit(void);
+extern int  tfrc_rx_packet_history_init(void);
+extern void tfrc_rx_packet_history_exit(void);
+
 extern int  dccp_li_init(void);
 extern void dccp_li_exit(void);
-extern int packet_history_init(void);
-extern void packet_history_exit(void);
 
 static int __init tfrc_module_init(void)
 {
int rc = dccp_li_init();
 
-   if (rc == 0) {
-   rc = packet_history_init();
-   if (rc != 0)
-   dccp_li_exit();
-   }
+   if (rc)
+   goto out;
+
+   rc = tfrc_tx_packet_history_init();
+   if (rc)
+   goto out_free_loss_intervals;
 
+   rc = tfrc_rx_packet_history_init();
+   if (rc)
+   goto out_free_tx_history;
+   return 0;
+
+out_free_tx_history:
+   tfrc_tx_packet_history_exit();
+out_free_loss_intervals:
+   dccp_li_exit();
+out:
return rc;
 }
 
 static void __exit tfrc_module_exit(void)
 {
-   packet_history_exit();
+   tfrc_rx_packet_history_exit();
+   tfrc_tx_packet_history_exit();
dccp_li_exit();
 }
 


[PATCH 1/8] [TFRC]: Whitespace cleanups

2007-12-08 Thread Gerrit Renker
Just some tidy-ups to keep git/quilt happy. Also moved up the
comment "Receiver routines" above the first occurrence of RX
history routines.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c  |   16 
 net/dccp/ccids/lib/loss_interval.c  |2 +-
 net/dccp/ccids/lib/packet_history.c |   12 ++--
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index faacffa..a5246f7 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -780,7 +780,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
 */
}
goto update_records;
-   }
+   }
 
if (tfrc_rx_hist_duplicate(&hcrx->ccid3hcrx_hist, skb))
return; /* done receiving */
@@ -792,7 +792,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
 */
hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, payload, 9);
hcrx->ccid3hcrx_bytes_recv += payload;
-   }
+   }
 
/*
 * Handle pending losses and otherwise check for new loss
@@ -808,11 +808,11 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
 
if (list_empty(&hcrx->ccid3hcrx_li_hist)) {  /* no loss so far: p = 0 */
const u32 sample = tfrc_rx_hist_sample_rtt(&hcrx->ccid3hcrx_hist, skb);
-   /*
-* Empty loss history: no loss so far, hence p stays 0.
-* Sample RTT values, since an RTT estimate is required for the
-* computation of p when the first loss occurs; RFC 3448, 6.3.1.
-*/
+   /*
+* Empty loss history: no loss so far, hence p stays 0.
+* Sample RTT values, since an RTT estimate is required for the
+* computation of p when the first loss occurs; RFC 3448, 6.3.1.
+*/
if (sample != 0)
hcrx->ccid3hcrx_rtt = tfrc_ewma(hcrx->ccid3hcrx_rtt, sample, 9);
}
@@ -823,7 +823,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct 
sk_buff *skb)
if (SUB16(dccp_hdr(skb)->dccph_ccval, hcrx->ccid3hcrx_last_counter) > 3)
do_feedback = CCID3_FBACK_PERIODIC;
 
-update_records:
+update_records:
tfrc_rx_hist_add_packet(&hcrx->ccid3hcrx_hist, skb, ndp);
 
if (do_feedback)
diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c
index 7e0714a..c0a933a 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -131,7 +131,7 @@ static u32 dccp_li_calc_first_li(struct sock *sk,
 {
 /*
  * FIXME:
- * Will be rewritten in the upcoming new loss intervals code. 
+ * Will be rewritten in the upcoming new loss intervals code.
  * Has to be commented ou because it relies on the old rx history
  * data structures
  */
diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index e197389..54cd23e 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -114,6 +114,11 @@ u32 tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head, const u64 seqno,
 EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt);
 
 
+/*
+ * Receiver History Routines
+ */
+static struct kmem_cache *tfrc_rx_hist_slab;
+
 /**
  * tfrc_rx_hist_index - index to reach n-th entry after loss_start
  */
@@ -131,11 +136,6 @@ static inline struct tfrc_rx_hist_entry *
return h->ring[tfrc_rx_hist_index(h, h->loss_count)];
 }
 
-/*
- * Receiver History Routines
- */
-static struct kmem_cache *tfrc_rx_hist_slab;
-
 void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h,
 const struct sk_buff *skb,
 const u32 ndp)
@@ -278,7 +278,7 @@ u32 tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h, const struct sk_buff *skb)
 {
u32 sample = 0,
delta_v = SUB16(dccp_hdr(skb)->dccph_ccval,
-   tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval);
+   tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval);
 
if (delta_v < 1 || delta_v > 4) {   /* unsuitable CCVal delta */
if (h->rtt_sample_prev == 2) {  /* previous candidate stored */


[PATCH 7/8] [TFRC]: CCID3 (and CCID4) needs to access these inlines

2007-12-08 Thread Gerrit Renker
This moves two inlines back to packet_history.h: these are not private
to packet_history.c, but are needed by CCID3/4 to detect whether a new
loss is indicated, or whether a loss is already pending.
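
For illustration, the check that these inlines perform can be sketched in
userspace as below. DCCP sequence numbers are 48 bits wide (RFC 4340), so the
distance is computed modulo 2^48; the helper delta_seqno48() is an assumed
stand-in for dccp_delta_seqno(), and struct rx_state is a simplified,
hypothetical replacement for struct tfrc_rx_hist:

```c
#include <stdint.h>

#define SEQ48_MASK	((UINT64_C(1) << 48) - 1)

/* signed distance from s1 to s2 in 48-bit circular sequence space */
static int64_t delta_seqno48(uint64_t s1, uint64_t s2)
{
	uint64_t d = (s2 - s1) & SEQ48_MASK;

	if (d & (UINT64_C(1) << 47))		/* upper half: negative */
		return (int64_t)d - (INT64_C(1) << 48);
	return (int64_t)d;
}

/* hypothetical, reduced receiver state */
struct rx_state {
	uint64_t last_seqno;	/* highest sequence number received so far */
	uint8_t  loss_count;	/* non-zero while a loss is pending */
};

/* any data packets missing between last reception and seqno? */
static int new_loss_indicated(struct rx_state *h, uint64_t seqno, uint32_t ndp)
{
	int64_t delta = delta_seqno48(h->last_seqno, seqno);

	/* a gap only counts as loss if NDP does not account for it */
	if (delta > 1 && (int64_t)ndp < delta)
		h->loss_count = 1;

	return h->loss_count;
}
```

A caller such as the CCID3 receive path would first test the pending flag
(loss_count) and only otherwise ask whether the newly arrived packet
indicates a fresh loss.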

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ccids/lib/packet_history.c |   26 --
 net/dccp/ccids/lib/packet_history.h |   28 ++--
 2 files changed, 26 insertions(+), 28 deletions(-)

diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index a421032..760631e 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -188,32 +188,6 @@ void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h,
 }
 EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet);
 
-/* initialise loss detection and disable RTT sampling */
-static inline void tfrc_rx_hist_loss_indicated(struct tfrc_rx_hist *h)
-{
-   h->loss_count = 1;
-}
-
-/* indicate whether previously a packet was detected missing */
-static inline int tfrc_rx_hist_loss_pending(const struct tfrc_rx_hist *h)
-{
-   return h->loss_count;
-}
-
-/* any data packets missing between last reception and skb ? */
-int tfrc_rx_hist_new_loss_indicated(struct tfrc_rx_hist *h,
-   const struct sk_buff *skb, u32 ndp)
-{
-   int delta = dccp_delta_seqno(tfrc_rx_hist_last_rcv(h)->tfrchrx_seqno,
-DCCP_SKB_CB(skb)->dccpd_seq);
-
-   if (delta > 1 && ndp < delta)
-   tfrc_rx_hist_loss_indicated(h);
-
-   return tfrc_rx_hist_loss_pending(h);
-}
-EXPORT_SYMBOL_GPL(tfrc_rx_hist_new_loss_indicated);
-
 static void tfrc_rx_hist_swap(struct tfrc_rx_hist_entry **a,
  struct tfrc_rx_hist_entry **b)
 {
diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h
index 6df9582..cb3220f 100644
--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -115,12 +115,36 @@ static inline struct tfrc_rx_hist_entry *
return h->ring[h->loss_start];
 }
 
+/* initialise loss detection and disable RTT sampling */
+static inline void tfrc_rx_hist_loss_indicated(struct tfrc_rx_hist *h)
+{
+   h->loss_count = 1;
+}
+
+/* indicate whether previously a packet was detected missing */
+static inline int tfrc_rx_hist_loss_pending(const struct tfrc_rx_hist *h)
+{
+   return h->loss_count;
+}
+
+/* any data packets missing between last reception and skb ? */
+static inline int tfrc_rx_hist_new_loss_indicated(struct tfrc_rx_hist *h,
+ const struct sk_buff *skb, u32 ndp)
+{
+   int delta = dccp_delta_seqno(tfrc_rx_hist_last_rcv(h)->tfrchrx_seqno,
+DCCP_SKB_CB(skb)->dccpd_seq);
+
+   if (delta > 1 && ndp < delta)
+   tfrc_rx_hist_loss_indicated(h);
+
+   return tfrc_rx_hist_loss_pending(h);
+}
+
 extern void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h,
const struct sk_buff *skb, const u32 ndp);
 
 extern int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb);
-extern int tfrc_rx_hist_new_loss_indicated(struct tfrc_rx_hist *h,
-  const struct sk_buff *skb, u32 ndp);
+
 struct tfrc_loss_hist;
 extern int  tfrc_rx_handle_loss(struct tfrc_rx_hist *, struct tfrc_loss_hist *,
struct sk_buff *skb, u32 ndp,
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/8] [TFRC]: Need separate entry_from_skb routine

2007-12-08 Thread Gerrit Renker
This again separates tfrc_rx_hist_packet() from tfrc_rx_hist_entry_from_skb(),
which was the format used in the original patch submission. The reason for this
is the requirement -- used by the subsequent patches -- to be able to insert a
newly arrived skb in arbitrary places within the ringbuffer history.

The add_packet() function is only a special case of this usage: it always
inserts at the front, and thus updates the highest-received sequence number.
That was the main reason why it was originally called tfrc_rx_hist_update() --
the old name better reflected what the function did.
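
To illustrate the split, here is a minimal user-space sketch, not the kernel
code. `struct pkt', entry_from_pkt() and add_packet() are hypothetical
stand-ins for the skb, tfrc_rx_hist_entry_from_skb() and
tfrc_rx_hist_add_packet(); the index computation follows the hist_index()
macro of the RX history patch. The entries are embedded in the ring here for
brevity, whereas the patch allocates them separately.

```c
#include <stdint.h>

#define NDUPACK 3               /* same value as in the patch series */

/* Hypothetical stand-in for the fields read from the skb. */
struct pkt {
	uint64_t seqno;
	uint8_t  ccval;
	uint8_t  type;
};

struct hist_entry {
	uint64_t seqno;
	uint8_t  ccval;
	uint8_t  type;
	uint32_t ndp;
};

struct rx_hist {
	struct hist_entry ring[NDUPACK + 1];
	uint8_t loss_count;     /* callers keep this within 0..NDUPACK */
	uint8_t loss_start;     /* callers keep this within 0..NDUPACK */
};

/* Index of the n-th entry after loss_start. The mask is valid because
 * NDUPACK + 1 is a power of two. */
static unsigned int hist_index(const struct rx_hist *h, unsigned int n)
{
	return (h->loss_start + n) & NDUPACK;
}

/* General case: fill an arbitrary history entry from a packet. */
static void entry_from_pkt(struct hist_entry *e, const struct pkt *p,
			   uint32_t ndp)
{
	e->seqno = p->seqno;
	e->ccval = p->ccval;
	e->type  = p->type;
	e->ndp   = ndp;
}

/* Special case: always write to the "last received" slot. */
static void add_packet(struct rx_hist *h, const struct pkt *p, uint32_t ndp)
{
	entry_from_pkt(&h->ring[hist_index(h, h->loss_count)], p, ndp);
}
```

With the helper separated out, later code can call entry_from_pkt() on any
ring slot, while add_packet() keeps its role of recording the
highest-received sequence number.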

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ccids/lib/packet_history.c |   35 ---
 1 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index 4d37396..6256bec 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -168,21 +168,6 @@ static inline struct tfrc_rx_hist_entry *
return h->ring[tfrc_rx_hist_index(h, h->loss_count)];
 }
 
-void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h,
-const struct sk_buff *skb,
-const u32 ndp)
-{
-   struct tfrc_rx_hist_entry *entry = tfrc_rx_hist_last_rcv(h);
-   const struct dccp_hdr *dh = dccp_hdr(skb);
-
-   entry->tfrchrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq;
-   entry->tfrchrx_ccval = dh->dccph_ccval;
-   entry->tfrchrx_type  = dh->dccph_type;
-   entry->tfrchrx_ndp   = ndp;
-   entry->tfrchrx_tstamp = ktime_get_real();
-}
-EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet);
-
 /**
  * tfrc_rx_hist_entry - return the n-th history entry after loss_start
  */
@@ -218,6 +203,26 @@ int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(tfrc_rx_hist_duplicate);
 
+static void tfrc_rx_hist_entry_from_skb(struct tfrc_rx_hist_entry *entry,
+   const struct sk_buff *skb, u32 ndp)
+{
+   const struct dccp_hdr *dh = dccp_hdr(skb);
+
+   entry->tfrchrx_seqno  = DCCP_SKB_CB(skb)->dccpd_seq;
+   entry->tfrchrx_ccval  = dh->dccph_ccval;
+   entry->tfrchrx_type   = dh->dccph_type;
+   entry->tfrchrx_ndp= ndp;
+   entry->tfrchrx_tstamp = ktime_get_real();
+}
+
+/* commit packet details of skb to history (record highest received seqno) */
+void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h,
+const struct sk_buff *skb, u32 ndp)
+{
+   tfrc_rx_hist_entry_from_skb(tfrc_rx_hist_last_rcv(h), skb, ndp);
+}
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet);
+
 /* initialise loss detection and disable RTT sampling */
 static inline void tfrc_rx_hist_loss_indicated(struct tfrc_rx_hist *h)
 {


[PATCH 3/8] [TFRC/CCID3]: Remove now unused functions / function calls

2007-12-08 Thread Gerrit Renker
This removes two things which have now become redundant:
 1. The function tfrc_rx_hist_entry_delete() is no longer referenced anywhere.
 2. The CCID3 HC-receiver still inserted timestamps, but since received
    timestamps are not parsed/referenced/used by the HC-sender, this serves
    no function.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c  |3 +--
 net/dccp/ccids/lib/packet_history.c |5 -
 2 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index a5246f7..60fcb31 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -750,8 +750,7 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb)
x_recv = htonl(hcrx->ccid3hcrx_x_recv);
pinv   = htonl(hcrx->ccid3hcrx_pinv);
 
-   if (dccp_insert_option_timestamp(sk, skb) ||
-   dccp_insert_option(sk, skb, TFRC_OPT_LOSS_EVENT_RATE,
+   if (dccp_insert_option(sk, skb, TFRC_OPT_LOSS_EVENT_RATE,
   &pinv, sizeof(pinv)) ||
dccp_insert_option(sk, skb, TFRC_OPT_RECEIVE_RATE,
   &x_recv, sizeof(x_recv)))
diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index af0db71..4d37396 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -183,11 +183,6 @@ void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h,
 }
 EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet);
 
-static inline void tfrc_rx_hist_entry_delete(struct tfrc_rx_hist_entry *entry)
-{
-   kmem_cache_free(tfrc_rx_hist_slab, entry);
-}
-
 /**
  * tfrc_rx_hist_entry - return the n-th history entry after loss_start
  */


[PATCH 0/8] [DCCP]: Updates and fixes to ensure code works with recent changes

2007-12-08 Thread Gerrit Renker
Some of the recent changes in 2.6.25 cause problems with the existing
implementation (i.e. break the code). This patch set therefore provides fixes
and resubmits one subsequent patch which has not been considered so far, but
which is part of the tfrc_lib package.

  Patch #1: Performs whitespace cleanups.
  Patch #2: Migrates all loss interval / packet history initialisation code
            into tfrc.c.
  Patch #3: Removes two unused functions/function calls that have become
            obsolete now.
  Patch #4: Splits rx_hist_add_packet() into its original constituents -- they
            are needed.
  Patch #5: Restores the parts of the macro/inline conversion that broke the
            algorithm.
  Patch #6: Is a v2 patch - the Loss Intervals code, now updated to work with
            the recent changes.
  Patch #7: Reverts hiding inlines which are needed by the calling CCID module.
  Patch #8: Also a patch v2 - shows how all the new stuff is integrated to work
            with CCID3.

The code compiles cleanly, all patches have been uploaded to the test tree
(backported from 2.6.25),
  git://eden-feed.erg.abdn.ac.uk/dccp_exp   [dccp]

So far only a few quick bandwidth tests have been performed. These merely
confirm that the stack does not crash. Since most of this patch set deals with
loss detection and re-ordering, some more detailed tests are needed to ensure
that the code, as before, deals well with loss, reordering, and duplication
(this requires at least one NetEm box).

Updates to the CCID4 subtree need to be suspended for a few days. There are too
many fiddly changes all over the place; until we understand exactly what is
going on and if there has been a regression, it is not a good idea to track
everything.



Re: [RFC2][PATCH 7/7] [TFRC]: New rx history code

2007-12-06 Thread Gerrit Renker
|   The first six patches in this series are unmodified, so if you
| are OK with them please send me your Signed-off-by.
Patches [1/7], [2/7], and [6/7] already have a signed-off and there are
no changes. Just acknowledged [3..5/7], will look at [7/7] now.

Cheers
Gerrit


Re: [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_

2007-12-06 Thread Gerrit Renker
| Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>


Re: [PATCH 4/7] [TFRC]: Make the rx history slab be global

2007-12-06 Thread Gerrit Renker
| Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>


Re: [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency

2007-12-06 Thread Gerrit Renker
| Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>


Re: [PATCH v2 0/3][BUG-FIX]: Test tree updates and bug fixes

2007-12-05 Thread Gerrit Renker
| Thanks, I folded this into the reorganized RX history handling patch,
| together with reverting ccid3_hc_rx_packet_recv to very close to your
| original patch, with this changes:
| 
| 1. no need to calculate the payload size for non data packets as this
|value won't be used.
| 2. Initialize hcrx->ccid3hcrx_bytes_recv with the payload size when
|hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA.
| 3. tfrc_rx_hist_duplicate() is only called when ccid3hcrx_state !=
|TFRC_RSTATE_NO_DATA, so it doesn't need to goto the done_receiving
|label (that was removed as this was its only use) as do_feedback
|would always be CCID3_FBACK_NONE and so the test would always fail
|and no feedback would be sent, so just return right there.
| 
| Now it reads:
| 
| static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
| {
|   struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
|   enum ccid3_fback_type do_feedback = CCID3_FBACK_NONE;
|   const u32 ndp = dccp_sk(sk)->dccps_options_received.dccpor_ndp;
|   const bool is_data_packet = dccp_data_packet(skb);
| 
|   if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) {
|   if (is_data_packet) {
|   const u32 payload = skb->len - dccp_hdr(skb)->dccph_doff * 4;
|   do_feedback = FBACK_INITIAL;
|   ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
|   hcrx->ccid3hcrx_s =
|   hcrx->ccid3hcrx_bytes_recv = payload_size;

  ==> Please see other email regarding bytes_recv, but I think you already got
      that. Maybe one could then write
        hcrx->ccid3hcrx_s = skb->len - dccp_hdr(skb)->dccph_doff * 4;
 
|   }
|   goto update_records;
|   }
| 
|   if (tfrc_rx_hist_duplicate(&hcrx->ccid3hcrx_hist, skb))
|   return; /* done receiving */
| 
|   if (is_data_packet) {
|   const u32 payload = skb->len - dccp_hdr(skb)->dccph_doff * 4;
|   /*
|* Update moving-average of s and the sum of received payload bytes
|*/
|   hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, payload_size, 9);
|   hcrx->ccid3hcrx_bytes_recv += payload_size;
|   }
| 
|   /*
|* Handle pending losses and otherwise check for new loss
|*/
|   if (tfrc_rx_hist_new_loss_indicated(&hcrx->ccid3hcrx_hist, skb, ndp))
|   goto update_records;
| 
|   /*
|* Handle data packets: RTT sampling and monitoring p
|*/
|   if (unlikely(!is_data_packet))
|   goto update_records;
| 
|   if (list_empty(&hcrx->ccid3hcrx_li_hist)) {  /* no loss so far: p = 0 */
|   const u32 sample = tfrc_rx_hist_sample_rtt(&hcrx->ccid3hcrx_hist, skb);
==> If you like, you could add the original comment here that p=0 if no loss
    occurred, i.e.
/*
 * Empty loss history: no loss so far, hence p stays 0.
 * Sample RTT values, since an RTT estimate is required for the
 * computation of p when the first loss occurs; RFC 3448, 6.3.1.
 */
|   if (sample != 0)
|   hcrx->ccid3hcrx_rtt = tfrc_ewma(hcrx->ccid3hcrx_rtt, sample, 9);
|   }
| 
|   /*
|* Check if the periodic once-per-RTT feedback is due; RFC 4342, 10.3
|*/
|   if (SUB16(dccp_hdr(skb)->dccph_ccval, hcrx->ccid3hcrx_last_counter) > 3)
|   do_feedback = CCID3_FBACK_PERIODIC;
| 
| update_records:   
|   tfrc_rx_hist_add_packet(&hcrx->ccid3hcrx_hist, skb, ndp);
| 
==>  A jump label is missing here. It is not needed by this patch, and above
     you have replaced it with a return + comment, but it is needed in a later
     patch: when a new loss event occurs, control jumps to `done_receiving'
     and sends a feedback packet with type FBACK_PARAM_CHANGE.
done_receiving:
|   if (do_feedback)
|   ccid3_hc_rx_send_feedback(sk, skb, do_feedback);
| }
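
As an aside on the periodic-feedback test in the quoted function: the window
counter is a 4-bit value, so the difference has to be taken modulo 16. The
sketch below reuses the SUB16() macro from the RX history patch;
periodic_feedback_due() is a hypothetical helper name used only for
illustration. With TFRC_WIN_COUNT_PER_RTT = 4, a difference greater than 3
means that at least one RTT's worth of counter ticks has passed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Subtraction a-b modulo-16, respects circular wrap-around (from the patch) */
#define SUB16(a, b) (((a) + 16 - (b)) & 0xF)

/* Hypothetical helper: both arguments are 4-bit window counter values. */
static bool periodic_feedback_due(uint8_t ccval, uint8_t last_counter)
{
	return SUB16(ccval, last_counter) > 3;
}
```

For example, a counter that moved from 14 to 2 has advanced by 4 ticks
across the wrap-around, so feedback would be due.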
| 

| Now to some questions and please bear with me as I haven't got to the
| patches after this:
| 
| tfrc_rx_hist->loss_count as of now is a boolean, my understanding is
| that you are counting loss events, i.e. it doesn't matter in:
|
It is not a boolean, but uses a hidden trick which maybe should be commented:
 * here and in the TCP world NDUPACK = 3
 * hence the bitfield size for loss_count is 2 bits, which can express
   at most 3 = NDUPACK (that is why it is declared as loss_count:2)
 * the trick is that when the loss count increases beyond 3, it automatically
   cycles back to 0 (although the code does not rely on that feature
   and does this explicitly);
 * the same applies to loss_start.
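
The wrap-around of a 2-bit field can be shown in a few lines of plain C. This
is only a sketch: `struct counters' and bump_loss_count() are hypothetical
names, not part of the patch.

```c
struct counters {
	unsigned int loss_count:2;      /* can hold 0..3 only */
	unsigned int loss_start:2;
};

/* Increment the 2-bit counter and return its new value. */
static unsigned int bump_loss_count(struct counters *c)
{
	c->loss_count++;        /* 3 + 1 is reduced modulo 4, giving 0 */
	return c->loss_count;
}
```

Starting from 0, successive calls yield 1, 2, 3 and then 0 again, since an
unsigned bit-field stores its value modulo 2^width.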


| /* any data packets missing between last reception and skb ? */
| int tfrc_rx_hist_new_loss_indicated(struct tfrc_rx_hist *h,
|   const struct sk_buff *skb, u32 ndp)
| {
|

Re: [PATCH v2 0/3][BUG-FIX]: Test tree updates and bug fixes

2007-12-05 Thread Gerrit Renker
| > @@ -788,8 +782,8 @@ static void ccid3_hc_rx_packet_recv(stru
| > if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) {
| > if (is_data_packet) {
| > do_feedback = FBACK_INITIAL;
| > +   hcrx->ccid3hcrx_s = payload_size;
| > ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
| > -   ccid3_hc_rx_update_s(hcrx, payload_size);
| 
| We have to set ccid3hcrx_bytes_recv to the payload_size here too, I'm
| fixing this on the reworked patch that introduces the RX history.
| 
I almost made the same mistake again by agreeing prematurely.

But updating ccid3hcrx_bytes_recv is in fact not needed here, and if it
were done it would have no usable effect. The reason is that the
first data packet will trigger the initial feedback; and in the initial
feedback packet X_recv (which is ccid3hcrx_bytes_recv / the_time_spent)
is set to 0 (RFC 3448, 6.3).

For this reason, updating bytes_recv for the first data packet is also not
in the flowchart on
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/
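
A small sketch of the argument, under stated assumptions: the enum, the
function name x_recv() and the microsecond time unit are made up for
illustration and are not the kernel interface. It only shows that the byte
count of the first data packet cannot influence the initial feedback.

```c
#include <stdint.h>

/* Hypothetical feedback types, for illustration only. */
enum fback_type { FBACK_TYPE_INITIAL, FBACK_TYPE_PERIODIC };

/*
 * Receive rate in bytes per second: bytes received divided by the time
 * spent (assumed here to be given in microseconds). The initial feedback
 * reports 0 regardless of bytes_recv.
 */
static uint32_t x_recv(enum fback_type type, uint32_t bytes_recv,
		       uint32_t elapsed_us)
{
	if (type == FBACK_TYPE_INITIAL || elapsed_us == 0)
		return 0;
	return (uint32_t)(((uint64_t)bytes_recv * 1000000u) / elapsed_us);
}
```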


Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches

2007-12-05 Thread Gerrit Renker
| > Patch #3: 
| > -
| >   Renames s/tfrc_tx_hist/tfrc_tx_hist_slab/g; no problems.
| 
| This was for consistency with the other slabs in DCCP:
| 
| [EMAIL PROTECTED] net-2.6.25]$ find net/dccp/ -name "*.c" | xargs grep 'struct kmem_cache'
Thanks, I will put the same naming scheme also in the test tree (tomorrow).

 
| > Patch #4:
| > -
| > Just a suggestion, it is possible to simplify this further,
| >   by doing all the initialisation / cleanup in tfrc.c:
| > int __init {rx,tx}_packet_history_init()
| > {
| > tfrc_{rx,tx}_hist_slab = kmem_cache_create("tfrc_{rx,tx}_hist", ...);
| > return tfrc_{rx,tx}_hist_slab == NULL ? -ENOBUFS : 0;
| > }
| >   and then call these successively in tfrc_module_init().
| 
| Idea here was to have each C source file to have a init module. Perhaps
| we should try to break packet_history.c into tx_packet_history and
| rx_packet_history.c. We can do that later to try to meet the goal of
| being able to see what is being replaced.
|
I think this is a great idea, since then rx_packet_history.c could also
take up all the internals of the RX packet history list, as is
currently done for the TX history, and it could possibly also
incorporate packet_history_internal.h.


|  
| > Patch #7:
| > -
|  
| >  * tfrc_rx_hist_entry_data_packet() is not needed:
| >- a similar function, called dccp_data_packet(), was introduced in patch 2/7
| 
| I thought about that, but dccp_data_packet is for skbs,
| tfrc_rx_hist_entry_data_packet is for tfrc_rx_hist_entries, I guess we
| should just make dccp_data_packet receive the packet type and not an
| object that has a packet type field.
| 
The question which of the two interfaces is generally better to use is
best left to you. Two functions doing almost the same thing can probably
be replaced by just one.
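
One way to read the suggestion of passing the packet type rather than an
object is sketched below. The enum and the function name are hypothetical;
the numeric type values follow RFC 4340, 5.1, and the classification follows
RFC 4340, 7.7, which treats Request, Response, Data and DataAck as data
packets.

```c
#include <stdbool.h>

/* Packet type values as assigned in RFC 4340, section 5.1. */
enum pkt_type {
	PKT_REQUEST = 0,
	PKT_RESPONSE,
	PKT_DATA,
	PKT_ACK,
	PKT_DATAACK,
	PKT_CLOSEREQ,
	PKT_CLOSE,
	PKT_RESET,
	PKT_SYNC,
	PKT_SYNCACK
};

/* Hypothetical single test that works for skbs and history entries alike,
 * because the caller extracts the type field first. */
static bool is_data_packet_type(enum pkt_type type)
{
	return type == PKT_DATA    || type == PKT_DATAACK ||
	       type == PKT_REQUEST || type == PKT_RESPONSE;
}
```

Both callers would then pass their own type field, so only one function needs
to be maintained.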


Re: [PATCH 7/7] [TFRC] New rx history code

2007-12-05 Thread Gerrit Renker
| > In the end this went back to the same questions and issues that were tackled between
| > February and March this year. I wish we could have had this discussion then; it would
| > have made this email unnecessary.
| 
| That is why it is important that the changeset comment collects these
| scattered notes and discussions. Think about in some years from now we
| will be left with a situation where we would have to look at many places
| to understand some aspects of what is in the code. While this happens
| and we have google for that I think that since you keep such detailed
| notes it is not a problem to get them in the changesets.
|  
I agree, the changelogs are all a bit short. When condensing from 40 to 16
patches the changelogs were also cut down, for fear of being too verbose. I
will go through the patches in the next days and see if/where this can be
improved; more detail would probably have saved some trouble.

| >  * But where I need to interact is when changes are made to the algorithm, even if these may
| >seem small. The underlying reasons that lead to the code may not be immediately clear,
| >but since this is a solution for a specific problem, there are also specific reasons; which
| >is why for algorithm changes (and underlying data structure) I'd ask you to discuss this first
| >before committing the patch.
| 
| I did this for the RX handling, where changes were made. Did that for
| the TX but ended up committing before comments from you, but I think it's
| fair to say that the changes for the TX history were more organizational
| than in the essence of the algorithm.
|  
It was a similar situation: the RFC patch came out on Thursday afternoon; I
replied a bit hurriedly on Friday that it seemed ok, but found the full
confirmation only on Sunday after putting all recent changes into the test
tree. And yes, you did a good job of it.


[PATCH v2 0/3][BUG-FIX]: Test tree updates and bug fixes

2007-12-05 Thread Gerrit Renker
This revision fixes a bug present in the per-socket allocation of RX history 
entries; identification of this bug is thanks to Arnaldo Carvalho de Melo.

The bug was in not deallocating history entries when the allocation of 
one array element failed. The solution in this revised patch set is the
original one written by Arnaldo.
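
For readers unfamiliar with this class of bug, the following user-space
sketch shows the general roll-back pattern. It is not the code from the
patch: RING_SIZE, `struct entry', ring_alloc() and the demo_alloc() wrapper
with its fail_after knob are all hypothetical, with malloc()/free() standing
in for the slab allocator.

```c
#include <stdlib.h>

#define RING_SIZE 4

struct entry {
	unsigned long seqno;
};

/* Demo knob: number of allocations that succeed before one is forced to
 * fail; -1 means never fail. */
static int fail_after = -1;

static void *demo_alloc(size_t size)
{
	if (fail_after == 0)
		return NULL;
	if (fail_after > 0)
		fail_after--;
	return malloc(size);
}

/* Allocate all ring entries, or none: on failure, free what was allocated
 * so far and leave every touched slot NULL. */
static int ring_alloc(struct entry *ring[RING_SIZE])
{
	int i;

	for (i = 0; i < RING_SIZE; i++) {
		ring[i] = demo_alloc(sizeof(*ring[i]));
		if (ring[i] == NULL)
			goto out_free;
	}
	return 0;

out_free:
	while (i-- != 0) {
		free(ring[i]);
		ring[i] = NULL;
	}
	return -1;
}
```

Without the out_free path, a failure on the third element would leak the
first two, which is the kind of problem described above.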

--> Patch v2 <-
[TFRC]: New RX history implementation

This provides a new, self-contained and generic RX history service for TFRC
based protocols.

Details:
 * new data structure, initialisation and cleanup routines;
 * allocation of dccp_rx_hist entries local to packet_history.c, 
   as a service exported by the dccp_tfrc_lib module.
 * interface to automatically track highest-received seqno;
 * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1);
 * a generic function to test for `data packets' as per  RFC 4340, sec. 7.7.


Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
 net/dccp/ccids/lib/packet_history.c |  126 ---
 net/dccp/ccids/lib/packet_history.h |  144 +++-
 net/dccp/ccids/lib/tfrc_module.c|   26 --
 net/dccp/dccp.h |   12 +++
 4 files changed, 285 insertions(+), 23 deletions(-)

--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -1,3 +1,5 @@
+#ifndef _DCCP_PKT_HIST_
+#define _DCCP_PKT_HIST_
 /*
  *  Packet RX/TX history data structures and routines for TFRC-based protocols.
  *
@@ -32,10 +34,6 @@
  *  along with this program; if not, write to the Free Software
  *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
  */
-
-#ifndef _DCCP_PKT_HIST_
-#define _DCCP_PKT_HIST_
-
 #include 
 #include 
 #include 
@@ -43,9 +41,13 @@
 
 /* Number of later packets received before one is considered lost */
 #define TFRC_RECV_NUM_LATE_LOSS 3
+/* Number of packets to wait after a missing packet (RFC 4342, 6.1) */
+#define NDUPACK 3
 
 #define TFRC_WIN_COUNT_PER_RTT  4
 #define TFRC_WIN_COUNT_LIMIT   16
+/* Subtraction a-b modulo-16, respects circular wrap-around */
+#define SUB16(a,b) (((a) + 16 - (b)) & 0xF)
 
 struct tfrc_tx_hist_entry;
 
@@ -66,6 +68,23 @@ struct dccp_rx_hist_entry {
ktime_t  dccphrx_tstamp;
 };
 
+
+/**
+ *   tfrc_rx_hist_entry  -  Store information about a single received packet
+ *   @seqno:   DCCP packet sequence number
+ *   @ccval:   window counter value of packet (RFC 4342, 8.1)
+ *   @ptype:   the type (5.1) of the packet
+ *   @ndp: the NDP count (if any) of the packet
+ *   @stamp:   actual receive time of packet
+ */
+struct tfrc_rx_hist_entry {
+   u64 seqno:48,
+   ccval:4,
+   ptype:4;
+   u32 ndp;
+   ktime_t stamp;
+};
+
 struct dccp_rx_hist {
struct kmem_cache *dccprxh_slab;
 };
@@ -73,6 +92,123 @@ struct dccp_rx_hist {
 extern struct dccp_rx_hist *dccp_rx_hist_new(const char *name);
 extern voiddccp_rx_hist_delete(struct dccp_rx_hist *hist);
 
+/**
+ *   tfrc_rx_hist  -  RX history structure for TFRC-based protocols
+ *
+ *   @ring:Packet history for RTT sampling and loss detection
+ *   @loss_count:  Number of entries in circular history
+ *   @loss_start:  Movable index (for loss detection)
+ *   @rtt_sample_prev:  Used during RTT sampling, points to candidate entry
+ */
+struct tfrc_rx_hist {
+   struct tfrc_rx_hist_entry   *ring[NDUPACK + 1];
+   u8  loss_count:2,
+   loss_start:2;
+#define rtt_sample_prevloss_start
+};
+
+/*
+ * Macros for loss detection.
+ * @loss_prev:  entry with highest-received-seqno before loss was detected
+ * @hist_index: index to reach n-th entry after loss_start
+ * @hist_entry: return the n-th history entry  after loss_start
+ * @last_rcv:   entry with highest-received-seqno so far
+ */
+#define loss_prev(h)   (h)->ring[(h)->loss_start]
+#define hist_index(h, n)   (((h)->loss_start + (n)) & NDUPACK)
+#define hist_entry(h, n)   (h)->ring[hist_index(h, n)]
+#define last_rcv(h)(h)->ring[hist_index(h, (h)->loss_count)]
+
+/*
+ * Macros to access history entries for RTT sampling.
+ * @rtt_last_s: reference entry to compute RTT samples against
+ * @rtt_prev_s: previously suitable (wrt rtt_last_s) RTT-sampling entry
+ */
+#define rtt_last_s(h)  (h)->ring[0]
+#define rtt_prev_s(h)  (h)->ring[(h)->rtt_sample_prev]
+
+/* initialise loss detection and disable RTT sampling */
+static inline void tfrc_rx_hist_loss_indicated(struct tfrc_rx_hist *h)
+{
+   h->loss_count = 1;
+}
+
+/* indicate whether previously a packet was detected missing */
+static inline int tfrc_rx_loss_pending(struct tfrc_r

[PATCH v2 0/3][BUG-FIX]: Test tree updates and bug fixes

2007-12-05 Thread Gerrit Renker
This fixes a problem in the initial revision of the patch: The loss interval
history was not de-allocated when the initialisation of the packet history
failed.  Thanks are again due to Arnaldo for identifying this problem.

The interdiff to the previous revision is:

--- b/net/dccp/ccids/lib/tfrc_module.c
+++ b/net/dccp/ccids/lib/tfrc_module.c
@@ -26,7 +26,12 @@
 
if (rc == 0)
rc = packet_history_init();
+   if (rc == 0)
+   goto out;
 
+out_free_loss_intervals:
+   dccp_li_exit();
+out:
return rc;
 }

-> Patch v2 <---
[TFRC]: Provide central source file and debug facility

This patch changes the tfrc_lib module in the following manner:

 (1) a dedicated tfrc_module source file  (this is later populated 
 with TX/RX hist and LI Database routines);
 (2) a dedicated tfrc_pr_debug macro with toggle switch `tfrc_debug'.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
 net/dccp/ccids/Kconfig  |   12 +---
 net/dccp/ccids/lib/Makefile |3 +-
 net/dccp/ccids/lib/loss_interval.c  |2 -
 net/dccp/ccids/lib/packet_history.c |   28 ++--
 net/dccp/ccids/lib/packet_history.h |3 --
 net/dccp/ccids/lib/tfrc.h   |   17 +---
 net/dccp/ccids/lib/tfrc_module.c|   50 
 7 files changed, 78 insertions(+), 37 deletions(-)

--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -39,8 +39,7 @@
 #include 
 #include 
 #include 
-
-#include "../../dccp.h"
+#include "tfrc.h"
 
 /* Number of later packets received before one is considered lost */
 #define TFRC_RECV_NUM_LATE_LOSS 3
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -35,7 +35,6 @@
  *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
  */
 
-#include 
 #include 
 #include "packet_history.h"
 
@@ -277,39 +276,18 @@ void dccp_rx_hist_purge(struct dccp_rx_h
 
 EXPORT_SYMBOL_GPL(dccp_rx_hist_purge);
 
-extern int __init dccp_li_init(void);
-extern void dccp_li_exit(void);
-
-static __init int packet_history_init(void)
+int __init packet_history_init(void)
 {
-   if (dccp_li_init() != 0)
-   goto out;
-
tfrc_tx_hist = kmem_cache_create("tfrc_tx_hist",
 sizeof(struct tfrc_tx_hist_entry), 0,
 SLAB_HWCACHE_ALIGN, NULL);
-   if (tfrc_tx_hist == NULL)
-   goto out_li_exit;
-
-   return 0;
-out_li_exit:
-   dccp_li_exit();
-out:
-   return -ENOBUFS;
+   return tfrc_tx_hist == NULL ? -ENOBUFS : 0;
 }
-module_init(packet_history_init);
 
-static __exit void packet_history_exit(void)
+void __exit packet_history_exit(void)
 {
if (tfrc_tx_hist != NULL) {
kmem_cache_destroy(tfrc_tx_hist);
tfrc_tx_hist = NULL;
}
-   dccp_li_exit();
 }
-module_exit(packet_history_exit);
-
-MODULE_AUTHOR("Ian McDonald <[EMAIL PROTECTED]>, "
- "Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>");
-MODULE_DESCRIPTION("DCCP TFRC library");
-MODULE_LICENSE("GPL");
--- a/net/dccp/ccids/lib/tfrc.h
+++ b/net/dccp/ccids/lib/tfrc.h
@@ -3,10 +3,11 @@
 /*
  *  net/dccp/ccids/lib/tfrc.h
  *
- *  Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
- *  Copyright (c) 2005 Ian McDonald <[EMAIL PROTECTED]>
- *  Copyright (c) 2005 Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
- *  Copyright (c) 2003 Nils-Erik Mattsson, Joacim Haggmark, Magnus Erixzon
+ *  Copyright (c) 2007   The University of Aberdeen, Scotland, UK
+ *  Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand.
+ *  Copyright (c) 2005-6 Ian McDonald <[EMAIL PROTECTED]>
+ *  Copyright (c) 2005   Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
+ *  Copyright (c) 2003   Nils-Erik Mattsson, Joacim Haggmark, Magnus Erixzon
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -15,6 +16,14 @@
  */
 #include 
 #include 
+#include "../../dccp.h"
+
+#ifdef CONFIG_IP_DCCP_TFRC_DEBUG
+extern int tfrc_debug;
+#define tfrc_pr_debug(format, a...)DCCP_PR_DEBUG(tfrc_debug, format, ##a)
+#else
+#define tfrc_pr_debug(format, a...)
+#endif
 
 /* integer-arithmetic divisions of type (a * 100)/b */
 static inline u64 scaled_div(u64 a, u32 b)
--- /dev/null
+++ b/net/dccp/ccids/lib/tfrc_module.c
@@ -0,0 +1,50 @@
+/*
+ * TFRC: main module holding the pieces of the TFRC library together
+ *
+ * Copyright (c) 2007   The University of Aberdeen, S

[PATCH v2 0/3][BUG-FIX]: Test tree updates and bug fixes

2007-12-05 Thread Gerrit Renker
This patch removes the following redundancies:
  * ccid3_hc_rx_update_s() is only called for data packets (that is what it
    should be called for);
  * each call to ccid3_hc_rx_update_s() is wrapped inside an
    "if (is_data_packet)" test;
  * therefore testing each time if "len > 0" is redundant (pointed out by
    Arnaldo);
  * no other code calls this function, hence the inline function can go.

Also replaced the first call to updating s with direct assignment of
`payload_size'; this has also been pointed out by Arnaldo.
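
The moving average itself can be sketched as follows. This is an assumption
about how the weight argument is interpreted (in tenths, so that 9 gives the
old average a weight of 0.9) and about seeding an empty average with the
first sample; ewma() is a stand-in name, not a quote of tfrc_ewma().

```c
#include <stdint.h>

/*
 * Exponentially weighted moving average with the weight given in tenths.
 * Assumption: an average of 0 means "no sample yet", in which case the
 * new value is taken over directly.
 */
static uint32_t ewma(uint32_t avg, uint32_t newval, uint8_t weight)
{
	return avg ? (weight * avg + (10 - weight) * newval) / 10 : newval;
}
```

Under these assumptions, the first update of an empty average is equivalent
to a plain assignment, which is consistent with replacing the first call by
`s = payload_size'.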

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |   13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -654,12 +654,6 @@ static void ccid3_hc_rx_set_state(struct
hcrx->ccid3hcrx_state = state;
 }
 
-static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len)
-{
-   if (likely(len > 0))/* don't update on empty packets (e.g. ACKs) */
-   hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9);
-}
-
 static void ccid3_hc_rx_send_feedback(struct sock *sk, struct sk_buff *skb,
  enum ccid3_fback_type fbtype)
 {
@@ -788,8 +782,8 @@ static void ccid3_hc_rx_packet_recv(stru
if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) {
if (is_data_packet) {
do_feedback = FBACK_INITIAL;
+   hcrx->ccid3hcrx_s = payload_size;
ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
-   ccid3_hc_rx_update_s(hcrx, payload_size);
}
goto update_records;
}
@@ -798,7 +792,10 @@ static void ccid3_hc_rx_packet_recv(stru
goto done_receiving;
 
if (is_data_packet) {
-   ccid3_hc_rx_update_s(hcrx, payload_size);
+   /*
+* Update moving-average of s and the sum of received payload bytes
+*/
+   hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, payload_size, 9);
hcrx->ccid3hcrx_bytes_recv += payload_size;
}
 


[PATCH v2 0/3][BUG-FIX]: Test tree updates and bug fixes

2007-12-05 Thread Gerrit Renker
Orthogonal to the ongoing discussion of patches, here are updates
of existing ones. This is mainly to keep the test tree in synch;
I'd like to upload the new test tree since -rc4 is out.

Arnaldo has actually pointed out more bugs than I have given him credit for,
these are contained in this patch set.

Patch #1: Fixes the allocation of per-socket RX history array elements
  (implements the solution contributed by Arnaldo).

Patch #2: Fixes a similar problem in the tfrc_module.c -- this solution
  is also present in Arnaldo's patch set, but he did not explicitly
  point it out. The update fixes this problem (it used to disappear
  later in the patch set, when the full initialisation was made for
  TX/RX histories and Loss Intervals; so it is only temporary).

Patch #3: Removes the redundant test "len > 0"; as a result one inline function
  becomes obsolete. Thanks are again due to Arnaldo.

I have not yet updated the CCID4 subtree with regard to patch#3 but will do
as soon as work here permits.

The updates to the tree are now available at
git://eden-feed.erg.abdn.ac.uk/dccp_exp {dccp,ccid4} 



Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches

2007-12-05 Thread Gerrit Renker
Today being Wednesday, below is some feedback after working through the patch
set. Hope this is helpful.

Patch #1:
-
  Several new good points are introduced:
  - IP_DCCP_TFRC_DEBUG is made dependent on IP_DCCP_TFRC_DEBUG which is useful
  - the select from CONFIG_IP_DCCP_CCID3 => CONFIG_DCCP_TFRC_LIB
  - the cleanup action in tfrc_module_init() (when packet_history_init() fails)
was previously missing, this is a good catch.
  Also a note: tfrc_pr_debug() is not currently used (but may be later should
  the code common to both CCID3 and CCID4 be shared).
  
Patches #2/#6:
--------------
  Separated from main patch, no problems (were initially submitted in this format).
  I wonder whether switching back to smaller patch sizes is better?

Patch #3:
---------
  Renames s/tfrc_tx_hist/tfrc_tx_hist_slab/g; no problems.

Patch #4:
---------
  packet_history_init() initialises both RX and TX history and is later called by the
  module_init() function in net/dccp/ccids/lib/tfrc.c. Just a suggestion: it is possible
  to simplify this further by doing all the initialisation / cleanup in tfrc.c:

        int __init {rx,tx}_packet_history_init()
        {
                tfrc_{rx,tx}_hist_slab = kmem_cache_create("tfrc_{rx,tx}_hist", ...);
                return tfrc_{rx,tx}_hist_slab == NULL ? -ENOBUFS : 0;
        }

  and then call these successively in tfrc_module_init().
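
  A minimal user-space sketch of this suggestion follows. It is only a model:
  struct cache, cache_create(), cache_destroy() and tfrc_module_init_model() are
  hypothetical stand-ins for the kernel slab API and for tfrc_module_init(), and the
  `fail' arguments exist solely to exercise the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct kmem_cache. */
struct cache {
	const char *name;
};

struct cache *tfrc_rx_hist_slab;
struct cache *tfrc_tx_hist_slab;

/* Stand-in for kmem_cache_create(); 'fail' forces an allocation failure. */
struct cache *cache_create(const char *name, int fail)
{
	struct cache *c = fail ? NULL : malloc(sizeof(*c));

	if (c != NULL)
		c->name = name;
	return c;
}

/* Stand-in for kmem_cache_destroy(); also clears the caller's pointer. */
void cache_destroy(struct cache **c)
{
	assert(c != NULL);
	free(*c);
	*c = NULL;
}

int rx_packet_history_init(int fail)
{
	tfrc_rx_hist_slab = cache_create("tfrc_rx_hist", fail);
	return tfrc_rx_hist_slab == NULL ? -ENOBUFS : 0;
}

int tx_packet_history_init(int fail)
{
	tfrc_tx_hist_slab = cache_create("tfrc_tx_hist", fail);
	return tfrc_tx_hist_slab == NULL ? -ENOBUFS : 0;
}

/* Model of tfrc_module_init(): initialise in turn, undo RX if TX fails. */
int tfrc_module_init_model(int fail_rx, int fail_tx)
{
	int rc = rx_packet_history_init(fail_rx);

	if (rc == 0) {
		rc = tx_packet_history_init(fail_tx);
		if (rc != 0)
			cache_destroy(&tfrc_rx_hist_slab);
	}
	return rc;
}
```

  The point of the sketch is that each initialiser stays a two-line function, while
  the unwinding on failure lives in one place, the module init routine.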

Patch #5:
---------
  The naming scheme introduced here is
        s/dccp_rx/tfrc_rx/g;
        s/dccphrx/tfrchrx/g;
  I wonder, while at it, would it be possible to extend this to other parts
  of the code? Basically this is the same discussion as earlier on [EMAIL PROTECTED]
  with Leandro, who first came up with this naming scheme. There the same question
  came up and the result of the discussion was that a prefix of `tfrchrx' is cryptic;
  if something simpler is possible then it would be great.

Patch #7:
---------
 * ccid3_hc_rx_detect_loss() can be fully removed since no other code references it
   any further.
 * bytes_recv also counts the payload_size's of non-data packets, which is wrong (it
   should only sum up the bytes transferred as data packets)

 * loss handling is not correctly taken care of: unlike in the other part, both data
   and non-data packets are used to detect loss (this is not correctly handled in the
   current Linux implementation).


 * tfrc_rx_hist_entry_data_packet() is not needed:
   - a similar function, called dccp_data_packet(), was introduced in patch 2/7
   - code compiles cleanly without tfrc_rx_hist_entry_data_packet()
   - all references to this function are deleted by patch 7/7
 * is another header file (net/dccp/ccids/lib/packet_history_internal.h) really necessary?
   - net/dccp/ccids/lib already has 3 header files and 4 source files
   - with the removal of tfrc_rx_hist_entry_data_packet(), only struct tfrc_rx_hist_entry
     remains as the sole occupant of that file
   - how about hiding the internals of struct tfrc_rx_hist_entry by putting it into
     packet_history.c, as it was done previously in a similar way for the TX packet history?

 * in ccid3_hc_rx_insert_options(), due to hunk-shifting or something else, there is
   still the call to dccp_insert_option_timestamp(sk, skb)
   --> this was meant to be removed by an earlier patch (which also removed the
       Elapsed Time option);
   --> in the original submission of this patch the call to dccp_insert_option_timestamp()
       no longer appeared (as can be found in the [EMAIL PROTECTED] mailing list archives),
       and the test tree likewise does not use it;
   --> it can be removed with full confidence since no part of the code uses timestamps
       sent by the HC-receiver (only the HC-sender timestamps are used); and it is further
       not required by the spec to send HC-receiver timestamps (RFC 4342, section 6)

 * one of the two variables ccid3hcrx_tstamp_last_feedback and ccid3hcrx_tstamp_last_ack
   is redundant and can be removed (this may be part of a later patch); the variable name
   ccid3hcrx_tstamp_last_feedback is very long (its function is clear from the type ktime_t).


 * the inlines are a good idea regarding type safety, but I take it that we can now throw
   overboard the old rule of 80 characters per line? Due to the longer names of the
   inlines, some expressions end at column 98 (cf. tfrc_rx_hist_sample_rtt(); but to be
   honest I'd rather get rid of that function since the receiver-RTT sampling is
   notoriously inaccurate (wrong place to measure) and then there is little left to argue
   with the inlines).

 * with regard to RX history initialisation, would you be amenable to the idea of
   performing the RX/TX history and loss intervals history initialisation in tfrc.c, as
   suggested for patch 1/7 (shortens the routines)?


 * tfrc_rx_hist_entry is lacking documentation (my fault, had been forgotten in the
   initial submission):
/** 
 * tfrc_r

Re: [PATCH 7/7][TAKE 2][TFRC] New rx history code

2007-12-05 Thread Gerrit Renker
| I found a problem that I'm still investigating if was introduced by this
| patch or if was already present. When sending 1 million 256 byte packets
| with ttcp over loopback, using ccid3 it is crashing, the test machine
| I'm using doesn't have a serial port, its a notebook, will switch to
| another that has and provide the backtrace. It doesn't happens every
| time.
| 
CCID3 is difficult due to the TX queue. Small packets are faster on the
wire and fill up the TX queue faster. Since there is little loss on LANs,
the slow-start algorithm will soon reach link capacity; but CCID3 cannot
deal effectively with high speeds.
What is known not to work well at the moment is bidirectional data
transfer (e.g. an echo server/client). This led to the comment in
tfrc_rx_sample_rtt(); the support for bidirectional data transfer
needs some more work, which however requires making one-directional
transfer work well first.

| Here is tfrc_rx_hist_alloc back using ring of pointers with the fixed
| error path.
| 
Thank you - I was just about to send a similar patch as update since
you clearly identified this bug. I will resubmit with your version
and upload it to the test tree.


| +int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h)
|  {
| +       int i;
| +
| +       for (i = 0; i <= TFRC_NDUPACK; i++) {
| +               h->ring[i] = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC);
| +               if (h->ring[i] == NULL)
| +                       goto out_free;
| +       }
| +
| +       h->loss_count = h->loss_start = 0;
| +       return 0;
| +
| +out_free:
| +       while (i-- != 0) {
| +               kmem_cache_free(tfrc_rx_hist_slab, h->ring[i]);
| +               h->ring[i] = NULL;
|         }
| +       return -ENOBUFS;
|  }
| 


Re: [PATCH 7/7] [TFRC] New rx history code

2007-12-05 Thread Gerrit Renker
Quoting Arnaldo:
| Em Tue, Dec 04, 2007 at 06:55:17AM +, Gerrit Renker escreveu:
| > NAK. You have changed the control flow of the algorithm and the underlying
| > data structure. Originally it had been an array of pointers, and this had
| > been a design decision right from the first submissions in March. From I of
| > http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt
| >  
| >  "1. 'ring' is an array of pointers rather than a contiguous area in
| >  memory. The reason is that, when having to swap adjacent entries
| >  to sort the history, it is easier to swap pointers than to copy
| >  (heavy-weight) history-entry data structs."
| > 
| > But in your algorithm it becomes a normal linear array.
| > 
| > As a consequence all subsequent code needs to be rewritten to use
| > copying instead of swapping pointers. And there is a lot of that, since
| > the loss detection code makes heavy use of swapping entries in order to
| > cope with reordered and/or delayed packets.
| 
| So lets not do that, the decision was made based on the RX history patch
| alone, where, as pointed out, no swapping occurs.
|  
| > This is not what I would call "entirely equivalent" as you claim. Your
| > algorithm is fundamentally different, the control flow is not the same,
| > nor are the underlying data structures.
| 
| Its is equivalent up to what is in the tree, thank you for point out
| that in subsequent patches it will be used.
| 
| Had this design decision been made explicit in the changeset comment
| that introduces the data structure I wouldn't have tried to reduce the
| number of allocations.
| 
| And you even said that it was a good decision when first reacting to
| this change, which I saw as positive feedback for the RFC I sent.
| 
That was my fault. Your solution "looked" much better to me but even I had forgotten
that the algorithm is based on an array of pointers. When I finally sat down
and looked through the patch set I found the original notes from March and
saw that using a linear array instead of an array of pointers will require quite
a few changes and means changing the algorithm. I then thought through whether there
could be any advantage in using a linear array as in this patch, but could find none.
In the end this went back to the same questions and issues that were tackled between
February and March this year. I wish we could have had this discussion then; it would
have made this email unnecessary.

I thank you for your reply and I am sorry if you took offense at my perhaps a little strong
reaction. Here is the essence of what I tried to convey but perhaps did not succeed in conveying:

 * I am absolutely ok with you making changes to variable names, namespaces, file
   organisation, overall arrangement etc (as per previous email). You have experience and
   this will often be a benefit. For instance, you combined tfrc_entry_from_skb() with
   tfrc_rx_hist_update() to give a new function, tfrc_rx_hist_add_packet(). This is ok
   since entry_from_skb() is not otherwise used (and I only found this out when reading
   your patch). (On the other hand, since this is a 4-element ring buffer, it is no real
   adding, the same entry will frequently be overwritten, this is why it was initially
   called hist_update().)

 * I am also not so much concerned about credit. As long as the (changed) end result is
   at least as good as or hopefully better than the submitted input, the author name is
   a side issue; so I do think that your approach is sound and fair. The only place where
   credits are needed is for the SatSix project (www.ist-satsix.org) which funded this
   work. This is why some of the copyrights now include "University of Aberdeen". What in
   any case I'd like to add is at least the Signed-off-by: if it turns out to be a mess I
   want to contribute my part of responsibility.

 * But where I need to interact is when changes are made to the algorithm, even if these
   may seem small. The underlying reasons that led to the code may not be immediately
   clear, but since this is a solution for a specific problem, there are also specific
   reasons; which is why for algorithm changes (and underlying data structure) I'd ask
   you to discuss this first before committing the patch.

 * In part you are already doing this (message subject); it may be necessary to adapt the
   way of discussion/presentation. The subject is complex (spent a week with the flow
   chart only) and it is a lot - initially this had been 40 small patches.
   
| > | I modified it just to try to make it closer to the existing API, hide details from
| > | the CCIDs and fix a couple bugs found in the initial implementation.
| > What is "a couple bugs"? So far you have pointe

Re: [PATCH 7/7] [TFRC] New rx history code

2007-12-03 Thread Gerrit Renker
NAK. You have changed the control flow of the algorithm and the underlying
data structure. Originally it had been an array of pointers, and this had
been a design decision right from the first submissions in March. From I of
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt
 
 "1. 'ring' is an array of pointers rather than a contiguous area in
 memory. The reason is that, when having to swap adjacent entries
 to sort the history, it is easier to swap pointers than to copy
 (heavy-weight) history-entry data structs."

But in your algorithm it becomes a normal linear array.

As a consequence all subsequent code needs to be rewritten to use
copying instead of swapping pointers. And there is a lot of that, since
the loss detection code makes heavy use of swapping entries in order to
cope with reordered and/or delayed packets.
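
To illustrate the difference, here is a small user-space sketch of a ring held as
an array of pointers, where sorting two slots means exchanging two pointers. The
struct layouts and the name rx_hist_swap() are simplified, hypothetical stand-ins
and not the code from the patch set; only NDUPACK = 3 is taken from it.

```c
#include <assert.h>
#include <stdint.h>

#define NDUPACK 3

/* Hypothetical, reduced history entry; the real struct carries more fields. */
struct hist_entry {
	uint64_t seqno;
	uint32_t ndp;
};

/* The ring holds pointers to entries, not the entries themselves. */
struct rx_hist {
	struct hist_entry *ring[NDUPACK + 1];
};

/* Swap two ring slots by exchanging pointers; no entry data is copied. */
void rx_hist_swap(struct rx_hist *h, unsigned int a, unsigned int b)
{
	struct hist_entry *tmp;

	assert(a <= NDUPACK && b <= NDUPACK);
	tmp = h->ring[a];
	h->ring[a] = h->ring[b];
	h->ring[b] = tmp;
}
```

With a linear array of structs the same operation would have to copy two whole
entries through a temporary, and its cost would grow with the size of the entry.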

This is not what I would call "entirely equivalent" as you claim. Your
algorithm is fundamentally different, the control flow is not the same,
nor are the underlying data structures.

| Credit here goes to Gerrit Renker, that provided the initial implementation for
| this new codebase.
| 
| I modified it just to try to make it closer to the existing API, hide details from
| the CCIDs and fix a couple bugs found in the initial implementation.
What is "a couple bugs"? So far you have pointed out only one problem and that was
agreed, it can be fixed. But it remains a side issue, it is not a failure of the
algorithm per se. What is worse here is that you take this single occurrence as a
kind of carte blanche to mess around with the code as you please. Where please are
the other bugs you are referring to?

I am not buying into this and am not convinced that you are understanding
what you are doing. It is the third time that you take patches which
have been submitted on [EMAIL PROTECTED] for over half a year, for which you
have offered no more than a sentence of feedback, release them under
your name, and introduce fundamental changes which alter the behaviour.

The first instance was the set of ktime patches which I had developed
for the test tree and which you extended to DCCP. In this case there were
in fact three bugs that you added in migrating my patches. I know this
because it messed up the way CCID3 behaved and I spent several
hours chasing these. And, in contrast to the above, it is not a mere
claim: that is recorded in the netdev mail archives.

The second instance was when you released the TX history patches under
your name. Pro forma there was an RFC patch at 3pm, de facto it was
checked in a few hours later: input not welcome.

The third instance is now where you change in essence the floor
underneath this work. Since you are using a different basis, the
algorithm is (in addition to the changes in control flow) necessarily
different. 

I have provided documentation, written test modules, and am able to prove
that the algorithm as submitted works reasonably correctly. In addition, the
behaviour has been verified using regression tests.

I am not prepared to take your claims and expressions of "deepest
respect" at face value since your actions - not your words - show that
you are, in fact, not dealing respectfully with work which has taken
months to complete and verify.

During 9 months on [EMAIL PROTECTED] you did not provide so much as a paragraph
of feedback. Instead you mopped up code from the test tree, modified it
according to your own whims and now, in the circle of your
invitation-only buddies, start to talk about having discussions and 
iterations. The only iteration I can see is in fixing up the things you
introduced, so it is not you who has to pay the price.

Sorry, unless you can offer a more mature model of collaboration,
consider me out of this and the patches summarily withdrawn. I am not
prepared to throw away several months of work just because you feel
inspired to do as you please with the work of others. 

Gerrit

| Original changeset comment from Gerrit:
|   ---
| This provides a new, self-contained and generic RX history service for TFRC
| based protocols.
| 
| Details:
|  * new data structure, initialisation and cleanup routines;
|  * allocation of dccp_rx_hist entries local to packet_history.c,
|    as a service exported by the dccp_tfrc_lib module.
|  * interface to automatically track highest-received seqno;
|  * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1);
|  * a generic function to test for `data packets' as per  RFC 4340, sec. 7.7.
|   ---
| 
| Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
| ---
|  net/dccp/ccids/ccid3.c   |  255 
|  net/dccp/ccids/ccid3.h   |   14 +-
|  net/dccp/ccids/lib/loss_interval.c   |   14 +-
|  net/dccp/ccids/lib/packet_history.c 

Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches

2007-12-03 Thread Gerrit Renker
| > | > static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len)
| > | > {
| > | >         if (likely(len > 0))    /* don't update on empty packets (e.g. ACKs) */
| > | >                 hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9);
| > | > }
| > | 
| > |   And we also just do test for len > 0 in update_s, that looks like
| > | also excessive, no?
| > Hm, I think we need to make it robust against API bugs and/or zero-sized
| > data packets. The check `len > 0' may seem redundant but it catches such
| > a condition. For a moving average an accidental zero value will have
| > quite an impact on the overall value. In CCID3 it is
| > 
| > x_new = 0.9 * x_old + 0.1 * len
| > 
| > So if len is accidentally 0 where it should not be, then each time the
| > moving average is reduced to 90%.
| 
| So we must make it a BUG_ON, not something that is to be always present.
|  
I think it should be a warning condition since it can be triggered when
the remote party sends zero-sized packets. It may be good to log this
into the syslog to warn about possibly misbehaving apps/peers/remote
stacks.
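
The effect of a zero-sized sample can be checked with a small user-space sketch.
tfrc_ewma() below uses the same arithmetic as the function quoted in this thread
(with the kernel types replaced by stdint ones); ewma_repeat() is a hypothetical
helper added only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the tfrc_ewma() quoted in this thread (weight in tenths). */
uint32_t tfrc_ewma(uint32_t avg, uint32_t newval, uint8_t weight)
{
	return avg ? (weight * avg + (10 - weight) * newval) / 10 : newval;
}

/* Hypothetical helper: feed n consecutive samples of the same length. */
uint32_t ewma_repeat(uint32_t avg, uint32_t len, unsigned int n)
{
	while (n-- > 0)
		avg = tfrc_ewma(avg, len, 9);
	return avg;
}
```

Starting from an average of 1000 bytes, one zero-length sample gives 900 and a
second one gives 810, so every stray zero costs 10% of the current estimate of s.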


| > As a comparison - the entire patch set took about a full month to do.
| > But that meant I am reasonably sure the algorithm is sound and can cope
| > with problematic conditions.
| 
| And from what I saw so far that is my impression too, if you look at
| what I'm doing it is:
| 
| 1. go thru the whole patch trying to understand hunk by hunk
You are doing a great job - in particular as it really is a lot of material.

| 2. do consistency changes (add namespace prefixes)
| 3. reorganize the code to look more like what is already there, we
|    both have different backgrounds and tastes about how code should
|    be written, so its only normal that if we want to keep the code
|    consistent, and I want that, I jump into things I think should be
|    "reworded", while trying to keep the algorithm expressed by you.
|
Agree, that is not always easy to get right. I try to stick as close as
possible to existing conventions but of course that is my
interpretation, so I am already anticipating such changes/comments here.

| think about further automatization on regression testing.
| 
If it is of any use, some scripts and setups are at the bottom of the page at
http://www.erg.abdn.ac.uk/users/gerrit/dccp/testing_dccp/


Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches

2007-12-03 Thread Gerrit Renker
| > Are you suggesting using netdev exclusively or in addition to [EMAIL PROTECTED]
| 
| Well, since at least one person that has contributed significantly in
| the past has said he can't cope with traffic on netdev, we can CC
| [EMAIL PROTECTED]
I have a similar problem with the traffic but agree and will copy as well.


| > have been posted and re-posted for a period of over 9 months on [EMAIL PROTECTED], and
| 
| Being posted and re-posted does not guarantee that the patch is OK or
| that is in a form that is acceptable to all tree maintainers.
With the first point I couldn't agree more, but this is not really what I meant - the point
was that despite posting and re-posting there was often silence. And now there is feedback,
in the form of a patch set made by you; and all that I am asking for is just to be given the
time to look that through. Last time an RFC patch appeared at 3pm and was checked in the same
evening (only found out next morning).
With your experience and long background as a maintainer maybe you expect quicker turn-around
times, but I also had waited patiently until you had had a chance to review the stuff I sent.


| > | . The code that allocates the RX ring deals with failures when one of the entries in
| > |   the ring buffer is not successfully allocated, the original code was leaking the
| > |   successfully allocated entries.
| 
| Sorry for not point out exactly this, here it goes:
| 
| Your original patch:
| 
| +int tfrc_rx_hist_init(struct tfrc_rx_hist *h)
| +{
| +       int i;
| +
| +       for (i = 0; i <= NDUPACK; i++) {
| +               h->ring[i] = kmem_cache_alloc(tfrc_rxh_cache, GFP_ATOMIC);
| +               if (h->ring[i] == NULL)
| +                       return 1;
| +       }
| +       h->loss_count = 0;
| +       h->loss_start = 0;
| +       return 0;
| +}
| +EXPORT_SYMBOL_GPL(tfrc_rx_hist_init);
| 
I expected this and it actually was very clear from your original message. I fully back
your solution on this point, see below. But my question above was rather: are there any
other bugs apart from the above leakage, which is what the previous email seemed to indicate?

With regard to your solution - you are using

        int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h)
        {
                h->ring = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC);
                h->loss_count = h->loss_start = 0;
                return h->ring == NULL;
        }

which is better not only for one but for two reasons. It solves the leakage and in addition
makes the entire code simpler. Fully agreed.
  

| 
| > | . I haven't checked if all the code was commited, as I tried to introduce just what was
| > |   immediatelly used, probably we'll need to do some changes when working on the merge
| > |   of your loss intervals code.
| > Sorry I don't understand this point.
| 
| Let me check now and tell you for sure:
| 
| tfrc_rx_hist_delta_seqno and tfrc_rx_hist_swap were not included, as
| they were not used, we should introduce them later, when getting to the
| working on the loss interval code.
Ah thanks, that was really not clear. Just beginning to work through the set.

| > static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
| > {
| >         // ...
| >         u32 sample, payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4;
| > 
| >         if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) {
| >                 if (is_data_packet) {
| >                         do_feedback = FBACK_INITIAL;
| >                         ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
| >                         ccid3_hc_rx_update_s(hcrx, payload_size);
| >                 }
| >                 goto update_records;
| >         }
| > 
| > ==> Non-data packets are ignored for the purposes of computing s (this is in the RFC),
| > consequently update_s() is only called for data packets; using the two following
| > functions:
| > 
| > 
| > static inline u32 tfrc_ewma(const u32 avg, const u32 newval, const u8 weight)
| > {
| >         return avg ? (weight * avg + (10 - weight) * newval) / 10 : newval;
| > }
| 
| I hadn't considered that tfrc_ewma would for every packet check if the
| avg was 0 and I find it suboptimal now that I look at it, we are just
| feeding data packets, no? 
Yes exactly, only data packets are used for s.

| > static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len)
| > {
| >         if (likely(len > 0))    /* don't update on empty packets (e.g. ACKs) */
| >                 hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9);
| > }
| 
|   And we also just do test for len > 0 in update_s, that looks like
| also excessive, no?
Hm, I think we need to make it robust against API bugs and/or zero-sized
data packets. The check `len > 0' may seem redundant but it catches such
a condition. For a moving average an accidental zero value will 

[PATCH 5/6] [TFRC]: Ringbuffer to track loss interval history

2007-12-03 Thread Gerrit Renker
A ringbuffer-based implementation of loss interval history is easier to
maintain, allocate, and update.

Details:
 * access to the Loss Interval Records via macro wrappers (with safety checks);
 * simplified, on-demand allocation of entries (no extra memory consumption on
   lossless links); cache allocation is local to the module / exported as service;
 * provision of RFC-compliant algorithm to re-compute average loss interval;
 * provision of comprehensive, new loss detection algorithm
   - support for all cases of loss, including re-ordered/duplicate packets;
   - waiting for NDUPACK=3 packets to fill the hole;
   - updating loss records when a late-arriving packet fills a hole.
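
The re-computation of the average loss interval can be modelled in user space as
sketched below. The arithmetic follows tfrc_lh_calc_i_mean() from this patch; the
flat array argument, the names lh_weights and lh_calc_i_mean(), and the bound
n <= NINTERVAL + 1 are assumptions made for this illustration, since the sketch
does not use the ring buffer.

```c
#include <assert.h>
#include <stdint.h>

#define NINTERVAL 8

/* Loss interval weights from RFC 3448, 5.4, scaled by 10 (as in the patch). */
const uint32_t lh_weights[NINTERVAL] = { 10, 10, 10, 10, 8, 6, 4, 2 };

/*
 * ival[0] is the open (most recent) interval I_0, ival[n-1] the oldest one.
 * Assumes 2 <= n <= NINTERVAL + 1, so that the weight sum cannot be zero.
 */
uint32_t lh_calc_i_mean(const uint32_t *ival, unsigned int n)
{
	uint32_t i_tot0 = 0, i_tot1 = 0, w_tot = 0;
	unsigned int i, k;

	assert(n >= 2 && n <= NINTERVAL + 1);
	k = n - 1;
	for (i = 0; i <= k; i++) {
		if (i < k) {		/* sum over I_0 .. I_(k-1) */
			i_tot0 += ival[i] * lh_weights[i];
			w_tot  += lh_weights[i];
		}
		if (i > 0)		/* sum over I_1 .. I_k */
			i_tot1 += ival[i] * lh_weights[i - 1];
	}
	/* the larger of the two weighted sums determines the mean */
	return (i_tot0 > i_tot1 ? i_tot0 : i_tot1) / w_tot;
}
```

For the intervals {10, 100, 100} this yields 100: the short open interval is
ignored because the sum that excludes I_0 is the larger one.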

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
 net/dccp/ccids/lib/loss_interval.c  |  158 +-
 net/dccp/ccids/lib/loss_interval.h  |   58 +++-
 net/dccp/ccids/lib/packet_history.c |  183 +++
 net/dccp/ccids/lib/packet_history.h |4 +
 net/dccp/ccids/lib/tfrc.h   |3 +
 5 files changed, 400 insertions(+), 6 deletions(-)

diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c
index 0ebdc67..cde5c7f 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -1,6 +1,7 @@
 /*
  *  net/dccp/ccids/lib/loss_interval.c
  *
+ *  Copyright (c) 2007   The University of Aberdeen, Scotland, UK
  *  Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand.
  *  Copyright (c) 2005-7 Ian McDonald <[EMAIL PROTECTED]>
  *  Copyright (c) 2005 Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
@@ -10,12 +11,7 @@
  *  the Free Software Foundation; either version 2 of the License, or
  *  (at your option) any later version.
  */
-
-#include 
 #include 
-#include "../../dccp.h"
-#include "loss_interval.h"
-#include "packet_history.h"
 #include "tfrc.h"
 
 #define DCCP_LI_HIST_IVAL_F_LENGTH  8
@@ -27,6 +23,51 @@ struct dccp_li_hist_entry {
u32  dccplih_interval;
 };
 
+static struct kmem_cache  *tfrc_lh_slab  __read_mostly;
+/* Loss Interval weights from [RFC 3448, 5.4], scaled by 10 */
+static const int tfrc_lh_weights[NINTERVAL] = { 10, 10, 10, 10, 8, 6, 4, 2 };
+
+/*
+ * Access macros: These require that at least one entry is present in lh,
+ * and implement array semantics (0 is first, n-1 is the last of n entries).
+ */
+#define __lih_index(lh, n) LIH_INDEX((lh)->counter - (n) - 1)
+#define __lih_entry(lh, n) (lh)->ring[__lih_index(lh, n)]
+#define __curr_entry(lh)   (lh)->ring[LIH_INDEX((lh)->counter - 1)]
+#define __next_entry(lh)   (lh)->ring[LIH_INDEX((lh)->counter)]
+
+/* given i with 0 <= i <= k, return I_i as per the rfc3448bis notation */
+static inline u32 tfrc_lh_get_interval(struct tfrc_loss_hist *lh, u8 i)
+{
+   BUG_ON(i >= lh->counter);
+   return __lih_entry(lh, i)->li_length;
+}
+
+static inline struct tfrc_loss_interval *tfrc_lh_peek(struct tfrc_loss_hist *lh)
+{
+   return lh->counter ? __curr_entry(lh) : NULL;
+}
+
+/*
+ * On-demand allocation and de-allocation of entries
+ */
+static struct tfrc_loss_interval *tfrc_lh_demand_next(struct tfrc_loss_hist *lh)
+{
+   if (__next_entry(lh) == NULL)
+   __next_entry(lh) = kmem_cache_alloc(tfrc_lh_slab, GFP_ATOMIC);
+
+   return __next_entry(lh);
+}
+
+void tfrc_lh_cleanup(struct tfrc_loss_hist *lh)
+{
+   if (tfrc_lh_is_initialised(lh))
+   for (lh->counter = 0; lh->counter < LIH_SIZE; lh->counter++)
+   if (__next_entry(lh) != NULL)
+   kmem_cache_free(tfrc_lh_slab, __next_entry(lh));
+}
+EXPORT_SYMBOL_GPL(tfrc_lh_cleanup);
+
 static struct kmem_cache *dccp_li_cachep __read_mostly;
 
 static inline struct dccp_li_hist_entry *dccp_li_hist_entry_new(const gfp_t prio)
@@ -98,6 +139,65 @@ u32 dccp_li_hist_calc_i_mean(struct list_head *list)
 
 EXPORT_SYMBOL_GPL(dccp_li_hist_calc_i_mean);
 
+static void tfrc_lh_calc_i_mean(struct tfrc_loss_hist *lh)
+{
+   u32 i_i, i_tot0 = 0, i_tot1 = 0, w_tot = 0;
+   int i, k = tfrc_lh_length(lh) - 1; /* k is as in rfc3448bis, 5.4 */
+
+   for (i=0; i <= k; i++) {
+   i_i = tfrc_lh_get_interval(lh, i);
+
+   if (i < k) {
+   i_tot0 += i_i * tfrc_lh_weights[i];
+   w_tot  += tfrc_lh_weights[i];
+   }
+   if (i > 0)
+   i_tot1 += i_i * tfrc_lh_weights[i-1];
+   }
+
+   BUG_ON(w_tot == 0);
+   lh->i_mean = max(i_tot0, i_tot1) / w_tot;
+}
+
+/**
+ * tfrc_lh_update_i_mean  -  Update the `open' loss interval I_0
+ * For recomputing p: returns `true' if p > p_prev  <=>  1/p < 1/p_prev
+ */
+u8 tfrc_lh_update_i_mean(struct tfrc_

[PATCH 3/6] [CCID3]: Hook up with new RX history interface

2007-12-03 Thread Gerrit Renker
In addition, it makes two corrections to the code:

 1. The receiver of a half-connection does not set window counter values;
only the sender sets window counters [RFC 4342, sections 5 and 8.1].
 2. The computation of X_recv does currently not conform to TFRC/RFC 3448,
since this specification requires that X_recv be computed over the last
R_m seconds (sec. 6.2). The patch tackles this problem as it
  - explicitly distinguishes the types of feedback (using an enum);
  - uses previous value of X_recv when sending feedback due to a parameter change;
  - makes all state changes local to ccid3_hc_tx_packet_recv;
  - assigns feedback type according to incident (previously only used flag `do_feedback').
   Further and detailed information is at
   http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/#8._Computing_X_recv_
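
The handling of X_recv per feedback type can be modelled in user space as in the
following sketch. The enum constants are those used in the patch; struct rx_state,
bytes_per_second() and update_x_recv() are hypothetical names, and the sketch
assumes that scaled_div32() converts bytes per microsecond interval into bytes per
second, which is not shown in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

enum ccid3_fback_type {
	FBACK_INITIAL,
	FBACK_PERIODIC,
	FBACK_PARAM_CHANGE
};

/* Hypothetical receiver state, reduced to the fields needed here. */
struct rx_state {
	uint32_t x_recv;	/* receive rate in bytes per second */
	uint32_t bytes_recv;	/* bytes received since the last feedback */
};

/* Bytes over an interval in microseconds, scaled to bytes per second. */
uint32_t bytes_per_second(uint32_t bytes, int64_t delta_us)
{
	return (uint32_t)(((uint64_t)bytes * 1000000u) / (uint64_t)delta_us);
}

/* Returns 1 if feedback is to be sent for this type, 0 otherwise. */
int update_x_recv(struct rx_state *rx, enum ccid3_fback_type fbtype,
		  int64_t delta_us)
{
	switch (fbtype) {
	case FBACK_INITIAL:
		rx->x_recv = 0;
		break;
	case FBACK_PARAM_CHANGE:
		if (rx->x_recv > 0)
			break;	/* reuse the previous value of X_recv */
		/* fall through */
	case FBACK_PERIODIC:
		if (delta_us > 0)
			rx->x_recv = bytes_per_second(rx->bytes_recv, delta_us);
		break;
	default:
		return 0;
	}
	return 1;
}
```

A parameter change thus leaves a non-zero X_recv untouched and only falls back
to the bytes received since the last feedback when X_recv is still 0.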

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |  292 ++--
 net/dccp/ccids/ccid3.h |   38 ---
 2 files changed, 102 insertions(+), 228 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 5dea690..ef243dc 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -1,6 +1,7 @@
 /*
  *  net/dccp/ccids/ccid3.c
  *
+ *  Copyright (c) 2007   The University of Aberdeen, Scotland, UK
  *  Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand.
  *  Copyright (c) 2005-7 Ian McDonald <[EMAIL PROTECTED]>
  *
@@ -49,8 +50,6 @@ static int ccid3_debug;
 #define ccid3_pr_debug(format, a...)
 #endif
 
-static struct dccp_rx_hist *ccid3_rx_hist;
-
 /*
  * Transmitter Half-Connection Routines
  */
@@ -675,52 +674,53 @@ static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len)
hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9);
 }
 
-static void ccid3_hc_rx_send_feedback(struct sock *sk)
+static void ccid3_hc_rx_send_feedback(struct sock *sk, struct sk_buff *skb,
+ enum ccid3_fback_type fbtype)
 {
struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
struct dccp_sock *dp = dccp_sk(sk);
-   struct dccp_rx_hist_entry *packet;
-   ktime_t now;
-   suseconds_t delta;
-
-   ccid3_pr_debug("%s(%p) - entry \n", dccp_role(sk), sk);
+   ktime_t now = ktime_get_real();
+   s64 delta = 0;
 
-   now = ktime_get_real();
+   if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_TERM))
+   return;
 
-   switch (hcrx->ccid3hcrx_state) {
-   case TFRC_RSTATE_NO_DATA:
+   switch (fbtype) {
+   case FBACK_INITIAL:
hcrx->ccid3hcrx_x_recv = 0;
+   hcrx->ccid3hcrx_pinv   = ~0U;   /* see RFC 4342, 8.5 */
break;
-   case TFRC_RSTATE_DATA:
-   delta = ktime_us_delta(now,
-  hcrx->ccid3hcrx_tstamp_last_feedback);
-   DCCP_BUG_ON(delta < 0);
-   hcrx->ccid3hcrx_x_recv =
-   scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta);
+   case FBACK_PARAM_CHANGE:
+   /*
+* When parameters change (new loss or p > p_prev), we do not
+* have a reliable estimate for R_m of [RFC 3448, 6.2] and so
+* need to  reuse the previous value of X_recv. However, when
+* X_recv was 0 (due to early loss), this would kill X down to
+* s/t_mbi (i.e. one packet in 64 seconds).
+* To avoid such drastic reduction, we approximate X_recv as
+* the number of bytes since last feedback.
+* This is a safe fallback, since X is bounded above by X_calc.
+*/
+   if (hcrx->ccid3hcrx_x_recv > 0)
+   break;
+   /* fall through */
+   case FBACK_PERIODIC:
+   delta = ktime_us_delta(now, hcrx->ccid3hcrx_last_feedback);
+   if (delta <= 0)
+   DCCP_BUG("delta (%ld) <= 0", (long)delta);
+   else
+   hcrx->ccid3hcrx_x_recv =
+   scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta);
break;
-   case TFRC_RSTATE_TERM:
-   DCCP_BUG("%s(%p) - Illegal state TERM", dccp_role(sk), sk);
-   return;
-   }
-
-   packet = dccp_rx_hist_find_data_packet(&hcrx->ccid3hcrx_hist);
-   if (unlikely(packet == NULL)) {
-   DCCP_WARN("%s(%p), no data packet in history!\n",
- dccp_role(sk), sk);
+   default:
return;
}
+   ccid3_pr_debug("Interval %ldusec, X_recv=%u, 1/p=%u\n", (long)delta,
+  hcrx->ccid3hcrx_x

[PATCH 2/6] [TFRC]: New RX history implementation

2007-12-03 Thread Gerrit Renker
This provides a new, self-contained and generic RX history service for TFRC
based protocols.

Details:
 * new data structure, initialisation and cleanup routines;
 * allocation of dccp_rx_hist entries local to packet_history.c,
   as a service exported by the dccp_tfrc_lib module;
 * interface to automatically track highest-received seqno;
 * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1);
 * a generic function to test for `data packets' as per RFC 4340, sec. 7.7.
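
The receiver-based RTT estimation listed above can be sketched in user space as follows. This is only an illustration under stated assumptions: the names sub16() and sample_rtt_us() are hypothetical, the window counter (CCVal) is taken to be a 4-bit value that advances once per quarter RTT, and the scaling 4 * elapsed / delta is one possible reading, not necessarily the arithmetic used by the patch.

```c
#include <stdint.h>

/* Difference of two 4-bit window counter values, modulo 16. */
static unsigned int sub16(unsigned int a, unsigned int b)
{
	return (a + 16 - b) & 0xF;
}

/*
 * Hypothetical helper: derive an RTT sample (in microseconds) from the
 * CCVal of the current packet, the CCVal of an earlier packet, and the
 * time elapsed between their arrivals.
 * Returns 0 when the CCVal difference is outside 1..4, i.e. when no
 * sample can be computed; the caller is expected to check for this.
 */
static uint32_t sample_rtt_us(uint8_t ccval_now, uint8_t ccval_last,
			      uint32_t elapsed_us)
{
	unsigned int delta_v = sub16(ccval_now, ccval_last);

	if (delta_v < 1 || delta_v > 4)
		return 0;
	/* delta_v quarter-RTTs took elapsed_us, so a full RTT is 4/delta_v of it */
	return (uint32_t)((uint64_t)4 * elapsed_us / delta_v);
}
```

A zero return value plays the same role as in the tfrc_rx_sample_rtt() comment further down: it means "no sample", not "RTT of zero".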


Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
 net/dccp/ccids/lib/packet_history.c |  120 ++---
 net/dccp/ccids/lib/packet_history.h |  143 ++-
 net/dccp/ccids/lib/tfrc_module.c|   27 ++-
 net/dccp/dccp.h |   12 +++
 4 files changed, 281 insertions(+), 21 deletions(-)

diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index ebcb6c0..2f9e0a0 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -34,7 +34,6 @@
  *  along with this program; if not, write to the Free Software
  *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
  */
-
 #include 
 #include "packet_history.h"
 
@@ -55,6 +54,22 @@ struct tfrc_tx_hist_entry {
  */
 static struct kmem_cache *tfrc_tx_hist;
 
+int __init tx_packet_history_init(void)
+{
+   tfrc_tx_hist = kmem_cache_create("tfrc_tx_hist",
+sizeof(struct tfrc_tx_hist_entry), 0,
+SLAB_HWCACHE_ALIGN, NULL);
+   return tfrc_tx_hist == NULL ? -ENOBUFS : 0;
+}
+
+void tx_packet_history_cleanup(void)
+{
+   if (tfrc_tx_hist != NULL) {
+   kmem_cache_destroy(tfrc_tx_hist);
+   tfrc_tx_hist = NULL;
+   }
+}
+
 static struct tfrc_tx_hist_entry *
tfrc_tx_hist_find_entry(struct tfrc_tx_hist_entry *head, u64 seqno)
 {
@@ -264,6 +279,49 @@ void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist,
 
 EXPORT_SYMBOL_GPL(dccp_rx_hist_add_packet);
 
+static struct kmem_cache *tfrc_rxh_cache;
+
+int __init rx_packet_history_init(void)
+{
+   tfrc_rxh_cache = kmem_cache_create("tfrc_rxh_cache",
+  sizeof(struct tfrc_rx_hist_entry),
+  0, SLAB_HWCACHE_ALIGN, NULL);
+   return tfrc_rxh_cache == NULL ? -ENOBUFS : 0;
+}
+
+void rx_packet_history_cleanup(void)
+{
+   if (tfrc_rxh_cache != NULL) {
+   kmem_cache_destroy(tfrc_rxh_cache);
+   tfrc_rxh_cache = NULL;
+   }
+}
+
+int tfrc_rx_hist_init(struct tfrc_rx_hist *h)
+{
+   int i;
+
+   for (i = 0; i <= NDUPACK; i++) {
+   h->ring[i] = kmem_cache_alloc(tfrc_rxh_cache, GFP_ATOMIC);
+   if (h->ring[i] == NULL)
+   return 1;
+   }
+   h->loss_count = 0;
+   h->loss_start = 0;
+   return 0;
+}
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_init);
+
+void tfrc_rx_hist_cleanup(struct tfrc_rx_hist *h)
+{
+   int i;
+
+   for (i=0; i <= NDUPACK; i++)
+   if (h->ring[i] != NULL)
+   kmem_cache_free(tfrc_rxh_cache, h->ring[i]);
+}
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_cleanup);
+
 void dccp_rx_hist_purge(struct dccp_rx_hist *hist, struct list_head *list)
 {
struct dccp_rx_hist_entry *entry, *next;
@@ -276,18 +334,56 @@ void dccp_rx_hist_purge(struct dccp_rx_hist *hist, struct list_head *list)
 
 EXPORT_SYMBOL_GPL(dccp_rx_hist_purge);
 
-int __init packet_history_init(void)
+/**
+ * tfrc_rx_sample_rtt  -  Sample RTT from timestamp / CCVal
+ * Based on ideas presented in RFC 4342, 8.1. Returns 0 if it was not able
+ * to compute a sample with given data - calling function should check this.
+ */
+u32 tfrc_rx_sample_rtt(struct tfrc_rx_hist *h, struct sk_buff *skb)
 {
-   tfrc_tx_hist = kmem_cache_create("tfrc_tx_hist",
-sizeof(struct tfrc_tx_hist_entry), 0,
-SLAB_HWCACHE_ALIGN, NULL);
-   return tfrc_tx_hist == NULL ? -ENOBUFS : 0;
-}
+   u32 sample = 0,
+   delta_v = SUB16(dccp_hdr(skb)->dccph_ccval, rtt_last_s(h)->ccval);
+
+   if (delta_v < 1 || delta_v > 4) {   /* unsuitable CCVal delta */
+
+   if (h->rtt_sample_prev == 2) {  /* previous candidate stored */
+   sample = SUB16(rtt_prev_s(h)->ccval,
+  rtt_last_s(h)->ccval);
+   if (sample)
+   sample = 4 / sample
+  * ktime_us_delta(rtt_prev_s(h)->stamp,
+   rtt_last_s(h)->stamp);
+   else/*
+* FIXME: Thi

[PATCH 4/6] [TFRC]: Remove old RX history interface

2007-12-03 Thread Gerrit Renker
Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
 net/dccp/ccids/lib/loss_interval.c  |9 ++
 net/dccp/ccids/lib/packet_history.c |  162 ---
 net/dccp/ccids/lib/packet_history.h |   84 --
 3 files changed, 9 insertions(+), 246 deletions(-)

diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c
index 4fe3c1c..0ebdc67 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -129,6 +129,12 @@ static u32 dccp_li_calc_first_li(struct sock *sk,
 u16 s, u32 bytes_recv,
 u32 previous_x_recv)
 {
+   /*
+* XXX This function still relies on the old RX interface and thus can not be
+* kept. But it can also not be removed since the loss interval code calls it.
+* Since it will be replaced anyway, comment it out for this moment.
+*/
+#ifdef __THIS_IS_RESOLVED_IN_NEXT_PATCH__
struct dccp_rx_hist_entry *entry, *next, *tail = NULL;
u32 x_recv, p;
suseconds_t rtt, delta;
@@ -220,6 +226,9 @@ found:
return ~0;
else
return 100 / p;
+#else
+   return ~0;
+#endif
 }
 
 void dccp_li_update_li(struct sock *sk,
diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index 2f9e0a0..e87e445 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -129,156 +129,6 @@ EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt);
 /*
  * Receiver History Routines
  */
-struct dccp_rx_hist *dccp_rx_hist_new(const char *name)
-{
-   struct dccp_rx_hist *hist = kmalloc(sizeof(*hist), GFP_ATOMIC);
-   static const char dccp_rx_hist_mask[] = "rx_hist_%s";
-   char *slab_name;
-
-   if (hist == NULL)
-   goto out;
-
-   slab_name = kmalloc(strlen(name) + sizeof(dccp_rx_hist_mask) - 1,
-   GFP_ATOMIC);
-   if (slab_name == NULL)
-   goto out_free_hist;
-
-   sprintf(slab_name, dccp_rx_hist_mask, name);
-   hist->dccprxh_slab = kmem_cache_create(slab_name,
-sizeof(struct dccp_rx_hist_entry),
-0, SLAB_HWCACHE_ALIGN, NULL);
-   if (hist->dccprxh_slab == NULL)
-   goto out_free_slab_name;
-out:
-   return hist;
-out_free_slab_name:
-   kfree(slab_name);
-out_free_hist:
-   kfree(hist);
-   hist = NULL;
-   goto out;
-}
-
-EXPORT_SYMBOL_GPL(dccp_rx_hist_new);
-
-void dccp_rx_hist_delete(struct dccp_rx_hist *hist)
-{
-   const char* name = kmem_cache_name(hist->dccprxh_slab);
-
-   kmem_cache_destroy(hist->dccprxh_slab);
-   kfree(name);
-   kfree(hist);
-}
-
-EXPORT_SYMBOL_GPL(dccp_rx_hist_delete);
-
-int dccp_rx_hist_find_entry(const struct list_head *list, const u64 seq,
-   u8 *ccval)
-{
-   struct dccp_rx_hist_entry *packet = NULL, *entry;
-
-   list_for_each_entry(entry, list, dccphrx_node)
-   if (entry->dccphrx_seqno == seq) {
-   packet = entry;
-   break;
-   }
-
-   if (packet)
-   *ccval = packet->dccphrx_ccval;
-
-   return packet != NULL;
-}
-
-EXPORT_SYMBOL_GPL(dccp_rx_hist_find_entry);
-struct dccp_rx_hist_entry *
-   dccp_rx_hist_find_data_packet(const struct list_head *list)
-{
-   struct dccp_rx_hist_entry *entry, *packet = NULL;
-
-   list_for_each_entry(entry, list, dccphrx_node)
-   if (entry->dccphrx_type == DCCP_PKT_DATA ||
-   entry->dccphrx_type == DCCP_PKT_DATAACK) {
-   packet = entry;
-   break;
-   }
-
-   return packet;
-}
-
-EXPORT_SYMBOL_GPL(dccp_rx_hist_find_data_packet);
-
-void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist,
-   struct list_head *rx_list,
-   struct list_head *li_list,
-   struct dccp_rx_hist_entry *packet,
-   u64 nonloss_seqno)
-{
-   struct dccp_rx_hist_entry *entry, *next;
-   u8 num_later = 0;
-
-   list_add(&packet->dccphrx_node, rx_list);
-
-   num_later = TFRC_RECV_NUM_LATE_LOSS + 1;
-
-   if (!list_empty(li_list)) {
-   list_for_each_entry_safe(entry, next, rx_list, dccphrx_node) {
-   if (num_later == 0) {
-   if (after48(nonloss_seqno,
-  entry->dccphrx_seqno)) {
-   list_del_init(&entry->dccphrx_node);
-   dccp_rx_hist_entry_delete(hist, entry);
-   }
-   

[PATCH 1/6] [TFRC]: Provide central source file and debug facility

2007-12-03 Thread Gerrit Renker
This patch changes the tfrc_lib module in the following manner:

 (1) a dedicated tfrc_module source file  (this is later populated
 with TX/RX hist and LI Database routines);
 (2) a dedicated tfrc_pr_debug macro with toggle switch `tfrc_debug'.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
 net/dccp/ccids/Kconfig  |   12 ++---
 net/dccp/ccids/lib/Makefile |3 +-
 net/dccp/ccids/lib/loss_interval.c  |2 +-
 net/dccp/ccids/lib/packet_history.c |   28 ++---
 net/dccp/ccids/lib/packet_history.h |3 +-
 net/dccp/ccids/lib/tfrc.h   |   17 ++---
 net/dccp/ccids/lib/tfrc_module.c|   45 +++
 7 files changed, 73 insertions(+), 37 deletions(-)
 create mode 100644 net/dccp/ccids/lib/tfrc_module.c

diff --git a/net/dccp/ccids/Kconfig b/net/dccp/ccids/Kconfig
index 3d7d867..a0c42d7 100644
--- a/net/dccp/ccids/Kconfig
+++ b/net/dccp/ccids/Kconfig
@@ -63,10 +63,6 @@ config IP_DCCP_CCID3
 
  If in doubt, say M.
 
-config IP_DCCP_TFRC_LIB
-   depends on IP_DCCP_CCID3
-   def_tristate IP_DCCP_CCID3
-
 config IP_DCCP_CCID3_DEBUG
  bool "CCID3 debugging messages"
  depends on IP_DCCP_CCID3
@@ -110,5 +106,13 @@ config IP_DCCP_CCID3_RTO
   is serious network congestion: experimenting with larger values should
   therefore not be performed on WANs.
 
+# The TFRC Library: currently only has CCID 3 as customer
+config IP_DCCP_TFRC_LIB
+   depends on IP_DCCP_CCID3
+   def_tristate IP_DCCP_CCID3
+
+config IP_DCCP_TFRC_DEBUG
+   bool
+   default y if IP_DCCP_CCID3_DEBUG
 
 endmenu
diff --git a/net/dccp/ccids/lib/Makefile b/net/dccp/ccids/lib/Makefile
index 5f940a6..1295635 100644
--- a/net/dccp/ccids/lib/Makefile
+++ b/net/dccp/ccids/lib/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_IP_DCCP_TFRC_LIB) += dccp_tfrc_lib.o
 
-dccp_tfrc_lib-y := loss_interval.o packet_history.o tfrc_equation.o
+dccp_tfrc_lib-y := tfrc_module.o tfrc_equation.o \
+   packet_history.o loss_interval.o
diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c
index f2ca4eb..4fe3c1c 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -285,7 +285,7 @@ int __init dccp_li_init(void)
return dccp_li_cachep == NULL ? -ENOBUFS : 0;
 }
 
-void dccp_li_exit(void)
+void __exit dccp_li_exit(void)
 {
if (dccp_li_cachep != NULL) {
kmem_cache_destroy(dccp_li_cachep);
diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index 4805de9..ebcb6c0 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -35,7 +35,6 @@
  *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
  */
 
-#include 
 #include 
 #include "packet_history.h"
 
@@ -277,39 +276,18 @@ void dccp_rx_hist_purge(struct dccp_rx_hist *hist, struct list_head *list)
 
 EXPORT_SYMBOL_GPL(dccp_rx_hist_purge);
 
-extern int __init dccp_li_init(void);
-extern void dccp_li_exit(void);
-
-static __init int packet_history_init(void)
+int __init packet_history_init(void)
 {
-   if (dccp_li_init() != 0)
-   goto out;
-
tfrc_tx_hist = kmem_cache_create("tfrc_tx_hist",
 sizeof(struct tfrc_tx_hist_entry), 0,
 SLAB_HWCACHE_ALIGN, NULL);
-   if (tfrc_tx_hist == NULL)
-   goto out_li_exit;
-
-   return 0;
-out_li_exit:
-   dccp_li_exit();
-out:
-   return -ENOBUFS;
+   return tfrc_tx_hist == NULL ? -ENOBUFS : 0;
 }
-module_init(packet_history_init);
 
-static __exit void packet_history_exit(void)
+void __exit packet_history_exit(void)
 {
if (tfrc_tx_hist != NULL) {
kmem_cache_destroy(tfrc_tx_hist);
tfrc_tx_hist = NULL;
}
-   dccp_li_exit();
 }
-module_exit(packet_history_exit);
-
-MODULE_AUTHOR("Ian McDonald <[EMAIL PROTECTED]>, "
- "Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>");
-MODULE_DESCRIPTION("DCCP TFRC library");
-MODULE_LICENSE("GPL");
diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h
index 0670f46..9a2642e 100644
--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -39,8 +39,7 @@
 #include 
 #include 
 #include 
-
-#include "../../dccp.h"
+#include "tfrc.h"
 
 /* Number of later packets received before one is considered lost */
 #define TFRC_RECV_NUM_LATE_LOSS 3
diff --git a/net/dccp/ccids/lib/tfrc.h b/net/dccp/ccids/lib/tfrc.h
index 5a0ba86..ab8848c 100644
--- a/net/dccp/ccids/lib/tfrc.h
+++ b/net/dccp/ccids/lib/tfrc.h
@@ -3,10 +3,11 @@
 /*
  *  net/dccp/ccids/lib/tfrc.h
  *
- *  Copyright (c) 2005 The University of W

[PATCH 6/6] [CCID]: Interface CCID3 code with newer Loss Intervals Database

2007-12-03 Thread Gerrit Renker
This hooks up the TFRC Loss Interval database with CCID 3 packet reception.
In addition, it makes the CCID-specific computation of the first loss
interval (which requires access to all the guts of CCID3) local to ccid3.c.

The patch also fixes an omission in the DCCP code, that of a default /
fallback RTT value (defined in section 3.4 of RFC 4340 as 0.2 sec); while
at it, the upper bound of 4 seconds for an RTT sample has been reduced to
match the initial TCP RTO value of 3 seconds from [RFC 1122, 4.2.3.1].
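
As a rough illustration of the first-loss-interval computation and of the fallback RTT mentioned above, the sketch below rearranges X_recv = s / (R * fval) into fval = s / (R * X_recv). The helper name first_li_fval(), the constant names and the fixed-point scale of 10^6 are assumptions made for this example; the subsequent inverse lookup of p from fval is not shown.

```c
#include <stdint.h>

#define FALLBACK_RTT_US	200000u		/* 0.2 s fallback RTT, as stated above */
#define SCALE		1000000u	/* assumed fixed-point scale for fval */

/*
 * Hypothetical helper: compute fval = s / (R * X_recv), scaled by SCALE.
 *   s       packet size in bytes
 *   rtt_us  RTT estimate in microseconds (0 means "no estimate yet")
 *   x_recv  receive rate in bytes per second
 * Returns 0 when x_recv is 0; the caller must treat that as "no result".
 */
static uint64_t first_li_fval(uint32_t s, uint32_t rtt_us, uint32_t x_recv)
{
	uint64_t s_over_r;

	if (rtt_us == 0)		/* no RTT sample so far: use fallback */
		rtt_us = FALLBACK_RTT_US;
	if (x_recv == 0)		/* avoid division by zero */
		return 0;

	s_over_r = (uint64_t)s * SCALE / rtt_us;	/* s / R in bytes per second */
	return s_over_r * SCALE / x_recv;		/* (s / R) / X_recv, scaled */
}
```

For instance, with s = 1460 bytes, no RTT estimate (so R = 0.2 s) and X_recv = 73000 bytes/s, s / R is 7300 bytes/s and fval comes out as 0.1, i.e. 100000 in the scaled representation.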

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |   76 +++
 net/dccp/ccids/ccid3.h |8 ++--
 net/dccp/dccp.h|7 +++-
 3 files changed, 72 insertions(+), 19 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index ef243dc..bf6d043 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -34,11 +34,7 @@
  *  along with this program; if not, write to the Free Software
  *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
  */
-#include "../ccid.h"
 #include "../dccp.h"
-#include "lib/packet_history.h"
-#include "lib/loss_interval.h"
-#include "lib/tfrc.h"
 #include "ccid3.h"
 
 #include 
@@ -751,6 +747,46 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb)
return 0;
 }
 
+/** ccid3_first_li  -  Implements [RFC 3448, 6.3.1]
+ *
+ * Determine the length of the first loss interval via inverse lookup.
+ * Assume that X_recv can be computed by the throughput equation
+ *                s
+ *     X_recv = --------
+ *              R * fval
+ * Find some p such that f(p) = fval; return 1/p (scaled).
+ */
+static u32 ccid3_first_li(struct sock *sk)
+{
+   struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
+   u32 x_recv, p, delta;
+   u64 fval;
+
+   if (hcrx->ccid3hcrx_rtt == 0) {
+   DCCP_WARN("No RTT estimate available, using fallback RTT\n");
+   hcrx->ccid3hcrx_rtt = DCCP_FALLBACK_RTT;
+   }
+
+   delta = ktime_to_us(net_timedelta(hcrx->ccid3hcrx_last_feedback));
+   x_recv = scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta);
+   if (x_recv == 0) {  /* would also trigger divide-by-zero */
+   DCCP_WARN("X_recv==0\n");
+   if ((x_recv = hcrx->ccid3hcrx_x_recv) == 0) {
+   DCCP_BUG("stored value of X_recv is zero");
+   return ~0U;
+   }
+   }
+
+   fval = scaled_div(hcrx->ccid3hcrx_s, hcrx->ccid3hcrx_rtt);
+   fval = scaled_div32(fval, x_recv);
+   p = tfrc_calc_x_reverse_lookup(fval);
+
+   ccid3_pr_debug("%s(%p), receive rate=%u bytes/s, implied "
+  "loss rate=%u\n", dccp_role(sk), sk, x_recv, p);
+
+   return p == 0 ? ~0U : scaled_div(1, p);
+}
+
 static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
 {
struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
@@ -779,6 +815,14 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
/*
 *  Handle pending losses and otherwise check for new loss
 */
+   if (tfrc_rx_loss_pending(&hcrx->ccid3hcrx_hist) &&
+   tfrc_rx_handle_loss(&hcrx->ccid3hcrx_hist,
+   &hcrx->ccid3hcrx_li_hist,
+   skb, ndp, ccid3_first_li, sk) ) {
+   do_feedback = FBACK_PARAM_CHANGE;
+   goto done_receiving;
+   }
+
if (tfrc_rx_new_loss_indicated(&hcrx->ccid3hcrx_hist, skb, ndp))
goto update_records;
 
@@ -788,13 +832,23 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
if (unlikely(!is_data_packet))
goto update_records;
 
-   if (list_empty(&hcrx->ccid3hcrx_li_hist)) {  /* no loss so far: p = 0 */
-
+   if (! tfrc_lh_is_initialised(&hcrx->ccid3hcrx_li_hist)) {
+   /*
+* Empty loss history: no loss so far, hence p stays 0.
+* Sample RTT values, since an RTT estimate is required for the
+* computation of p when the first loss occurs; RFC 3448, 6.3.1.
+*/
sample = tfrc_rx_sample_rtt(&hcrx->ccid3hcrx_hist, skb);
if (sample != 0)
hcrx->ccid3hcrx_rtt =
tfrc_ewma(hcrx->ccid3hcrx_rtt, sample, 9);
 
+   } else if (tfrc_lh_update_i_mean(&hcrx->ccid3hcrx_li_hist, skb)) {
+   /*
+* Step (3) of [RFC 3448, 6.1]: Recompute I_mean and, if I_mean
+* has decreased (resp. p has increased), send feedback now.
+*/
+ 

[RFC][Resend][PATCH 0/7]: CCID3/TFRC RX history and loss intervals implementation

2007-12-03 Thread Gerrit Renker
I am re-submitting the original patch set which Arnaldo referred to, due to:

 1/  It has been reworked to patch/compile against the 2.6.25 tree, which led
     to a few changes (not conceptual in nature).

 2/  Patch #7 in Arnaldo's set says that this patch set introduced some bugs.

     Please can you point out if and what those are, so that they can be fixed
     (just saying that there are bugs without saying where they are is not very
     helpful).

Overview:
-
Patch #1: Provides a central module source file for dccp_tfrc_lib.
Patch #2: Implements a new (reduced) RX history interface.
Patch #3: Hooks CCID3 up with the new RX history interface.
Patch #4: Removes the old/previous RX history interface.
Patch #5: Ringbuffer-based loss intervals database.
Patch #6: Interface CCID3 with loss intervals database. 

The algorithms that are implemented here have been documented on 
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/


Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches

2007-12-03 Thread Gerrit Renker
Hi Arnaldo,

Thank you for going through this. I have just backported your recent patches of
2.6.25 to the DCCP/CCID4/Faster Restart test tree at
git://eden-feed.erg.abdn.ac.uk/dccp_exp {dccp,ccid4,dccp_fr}
as per subsequent message.
|do, so please consider moving DCCP discussion to netdev@vger.kernel.org,
|where lots of smart networking folks are present and can help our efforts
|on turning RFCs to code.
Are you suggesting using netdev exclusively or in addition to [EMAIL PROTECTED]?
  

|   Please take a look at this patch series where I reorganized your work on
| the new TFRC rx history handling code. I'll wait for your considerations and
| then do as many interactions as reasonable to get your work merged.
|
|   It should be completely equivalent, plus some fixes and optimizations,
| such as:
It will be necessary to address these points one-by-one. Before diving into
making fixes and `optimisations', have you tested your code? The patches you
are referring to have been posted and re-posted for a period of over 9 months
on [EMAIL PROTECTED], and there are regression tests which show that this code
improves on the existing Linux implementation. These are labelled as `test
tree' on
http://www.linux-foundation.org/en/Net:DCCP_Testing#Regression_testing
So if you are making changes to the code, I would like to ask if you have run
similar regression tests, to avoid having to step back later.


 
| . The code that allocates the RX ring deals with failures when one of the
|   entries in the ring buffer is not successfully allocated, the original
|   code was leaking the successfully allocated entries.
|
| . We do just one allocation for the ring buffer, as the number of entries is
|   fixed we should just do one allocation and not TFRC_NDUPACK times.
Will look at the first point in the patch; with regard to the second point I
agree, this will make the code simpler, which is good.

| . I haven't checked if all the code was committed, as I tried to introduce
|   just what was immediately used, probably we'll need to do some changes
|   when working on the merge of your loss intervals code.
Sorry, I don't understand this point.

| . I changed the ccid3_hc_rx_packet_recv code to set hcrx->ccid3hcrx_s for the
|   first non-data packet instead of calling ccid3_hc_rx_set_state, that would
|   use 0 as the initial value in the EWMA calculation.
This is a misunderstanding. Non-data packets are not considered in the moving
average for the data packet size `s'; and it would be an error to do so
(consider 40-byte Acks vs. 1460-byte data packets; this is also in RFC 4342).
Where would the zero initial value come from? I think this is also a
misunderstanding. Please have a look below:
    static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
    {
            // ...
            u32 sample, payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4;

            if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) {
                    if (is_data_packet) {
                            do_feedback = FBACK_INITIAL;
                            ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
                            ccid3_hc_rx_update_s(hcrx, payload_size);
                    }
                    goto update_records;
            }

==> Non-data packets are ignored for the purposes of computing s (this is in
the RFC), consequently update_s() is only called for data packets; using the
two following functions:


    static inline u32 tfrc_ewma(const u32 avg, const u32 newval, const u8 weight)
    {
            return avg ? (weight * avg + (10 - weight) * newval) / 10 : newval;
    }

    static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len)
    {
            if (likely(len > 0))    /* don't update on empty packets (e.g. ACKs) */
                    hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9);
    }

==> Hence I can't see where a zero value should come from: ccid3hcrx_s is
initialised with zero (memset(...,0,...)); when first called, update_s() will
feed a non-zero payload size to tfrc_ewma(), which will return `newval' =
payload_size, hence the first data packet will contribute a non-zero
payload_size.
Zero-sized DCCP-Data packets are pathological and are ignored by the CCID
calculations (not by the receiver); a corresponding counterpart for zero-sized
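
The behaviour of the two helpers quoted above can be checked in isolation. The following is a user-space sketch with hypothetical names (ewma() and update_s() mirror tfrc_ewma() and ccid3_hc_rx_update_s(), with stdint types standing in for u32/u8); it assumes the average starts out as zero, as after the memset.

```c
#include <stdint.h>

/*
 * Moving average with weight/10 on the old value. A zero average is
 * treated as "not yet initialised" and replaced by the new sample.
 */
static uint32_t ewma(uint32_t avg, uint32_t newval, uint8_t weight)
{
	return avg ? (weight * avg + (10 - weight) * newval) / 10 : newval;
}

/* Stand-in for the update routine: empty packets leave s untouched. */
static void update_s(uint32_t *s, int len)
{
	if (len > 0)
		*s = ewma(*s, (uint32_t)len, 9);
}
```

Starting from s = 0, a zero-length packet leaves s at 0, the first 1460-byte data packet sets s to 1460, and a following 1000-byte packet moves it to (9 * 1460 + 1000) / 10 = 1414.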

| 
|   It is available at:
| 
| master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.25
| 
Need to do this separately. As said, the code has been developed and tested
over a long time; it took a long while until it acted predictably, so being
careful is very important.

I would rather not have my patches merged and continue to run a test tree if
the current changes alter the behaviour for the worse.