Re: [PATCH 5/8]: Move debugging macro to header file

2006-12-19 Thread Arnaldo Carvalho de Melo

On 12/19/06, Ian McDonald <[EMAIL PROTECTED]> wrote:

Gerrit,

From memory it is when I do make modules_install that it fails as it
is a linking issue not a compiling issue. The relevant part of the
config is that I've got CCID3 debugging turned on of course.


The loop comes because dccp_ccid3.ko needs symbols provided by
dccp_tfrc_lib.ko, so if we try to export ccid3_pr_debug to be used by
dccp_tfrc_lib.ko we'll get the loop, so I guess the right thing to do
is to introduce tfrc_pr_debug, that will give us more flexibility,
allowing one to select ccid3 debugging, tfrc debugging, both or none.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/8]: Move debugging macro to header file

2006-12-19 Thread Arnaldo Carvalho de Melo

On 12/19/06, Ian McDonald <[EMAIL PROTECTED]> wrote:

On 12/20/06, Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> wrote:
> The loop comes because dccp_ccid3.ko needs symbols provided by
> dccp_tfrc_lib.ko, so if we try to export ccid3_pr_debug to be used by
> dccp_tfrc_lib.ko we'll get the loop, so I guess the right thing to do
> is to introduce tfrc_pr_debug, that will give us more flexibility,
> allowing one to select ccid3 debugging, tfrc debugging, both or none.
>
> - Arnaldo
>
I think the right thing to do is not to introduce another level of debugging.

People want to debug or they don't in my opinion. I think we should do
away with ccid3_pr_debug and ccid2_pr_debug. I always turn them all on
or all off when working with testing (or add my own statements in).


If the way to go is a boolean, i.e. to debug or not to debug we have
to remove ccid{2,3}_pr_debug and stick to using dccp_pr_debug
everywhere, that would eliminate the loop as dccp.ko doesn't directly
uses any code from dccp_ipv[4,6]. ccid[2, 3] or tfrc.

But I think that being able to debug just the dccp core, or just
ccid3, or just tfrc is better.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/8]: Move debugging macro to header file

2006-12-19 Thread Arnaldo Carvalho de Melo

On 12/19/06, Ian McDonald <[EMAIL PROTECTED]> wrote:

On 12/20/06, Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> wrote:
> > I think the right thing to do is not to introduce another level of 
debugging.
> >
> > People want to debug or they don't in my opinion. I think we should do
> > away with ccid3_pr_debug and ccid2_pr_debug. I always turn them all on
> > or all off when working with testing (or add my own statements in).
>
> If the way to go is a boolean, i.e. to debug or not to debug we have
> to remove ccid{2,3}_pr_debug and stick to using dccp_pr_debug
> everywhere, that would eliminate the loop as dccp.ko doesn't directly
> uses any code from dccp_ipv[4,6]. ccid[2, 3] or tfrc.
>
Agree and that's what I prefer.

> But I think that being able to debug just the dccp core, or just
> ccid3, or just tfrc is better.
>
I'll go with your choice on this one. In the interim I'll carry on
using dccp_pr_debug for tfrc debugging as short on time.


Agreed, in the short term just use dccp_pr_debug in tfrc, it should just work.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


HEADS UP: DCCP debugging

2006-12-27 Thread Arnaldo Carvalho de Melo
I've been partying and working on an OSTRA rebirth, this time based on
my current work on DWARF tools, so to give just a heads up, here is the
current stage, explanations below:

1. build the kernel as usual + select CONFIG_DEBUG_INFO to insert
debugging information, the resulting kernel will be larger but otherwise
the same, the debug information, just like with any other project built
with 'gcc -g' will be in separate ELF sections.

2. Run ctracer, the tool I'm writing, available at
http://www.kernel.org/git/?p=linux/kernel/git/acme/pahole.git;a=summary

ctracer --recursive -dir net/dccp/ --glob "*.ko" --kprobes vmlinux sock > 
ctracer.c

'--recursive --dir net/dccp/' --glob "*.ko"' will tell ctracer to
consider just the files in the net/dccp/ subtree

'--kprobes vmlinux' its just for ctracer to find the needed structs and
function prototypes for the kprobes routines and for the current trace
relay agent, a lame one that just uses plain printk, but will be
replaced by something along the lines of what is used in blktrace, or
just the same, perhaps, haven't checked yet.

finally 'sock' tells ctracer that we're interested only in functions
that receives as one of its parameter a pointer to struct sock.

The resulting ctracer.c file is available at:
http://oops.ghostprotocols.net:81/acme/sock_ctracer.c

Then, using the instructions in Documentation/kprobes.txt, we build the
resulting kernel module, ctracer.c, for reference the makefile is very
simple:

- 8< --
[EMAIL PROTECTED] ctracer_example]$ cat Makefile
obj-m := ctracer.o
KDIR := /pub/scm/linux/kernel/git/acme/linux-2.6/
PWD := $(shell pwd)
default:
$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
[EMAIL PROTECTED] ctracer_example]$
- 8< --

Then we load the DCCP modules:

[EMAIL PROTECTED] ~]# modprobe dccp_ccid3
[   86.873785] CCID: Registered CCID 3 (ccid3)
[EMAIL PROTECTED] ~]# modprobe dccp_ipv4

And then the ctracer.ko module, that will complain about some symbols
not being found, i.e. I haven't loaded ccid2.ko, dccp_ipv6.ko nor the
dccpprobe.ko (that I hope to turn moot after I finish this work):

[EMAIL PROTECTED] ~]# insmod ./ctracer.ko
[   94.835016] register_jprobe(ccid2_hc_tx_packet_recv) failed, returned -22
[   94.906545] register_kretprobe(ccid2_hc_tx_packet_recv) failed, returned -22
[   94.991244] register_jprobe(ccid2_hc_tx_packet_sent) failed, returned -22
[   95.072229] register_kretprobe(ccid2_hc_tx_packet_sent) failed, returned -22
[   95.154525] register_jprobe(ccid2_hc_tx_send_packet) failed, returned -22
[   95.236687] register_kretprobe(ccid2_hc_tx_send_packet) failed, returned -22
[   95.315511] register_jprobe(ccid2_hc_tx_init) failed, returned -22
[   95.389672] register_kretprobe(ccid2_hc_tx_init) failed, returned -22
[   95.473402] register_jprobe(ccid2_hc_tx_exit) failed, returned -22
[   95.559145] register_kretprobe(ccid2_hc_tx_exit) failed, returned -22
[   95.642080] register_jprobe(ccid2_hc_rx_packet_recv) failed, returned -22
[   95.725702] register_kretprobe(ccid2_hc_rx_packet_recv) failed, returned -22
[   95.820803] register_jprobe(ccid2_hc_tx_dec_pipe) failed, returned -22
[   95.903664] register_kretprobe(ccid2_hc_tx_dec_pipe) failed, returned -22
[   95.986252] register_jprobe(ccid2_hc_tx_kill_rto_timer) failed, returned -22
[   96.071975] register_kretprobe(ccid2_hc_tx_kill_rto_timer) failed, returned 
-22
[   96.151152] register_jprobe(ccid2_start_rto_timer) failed, returned -22
[   96.243973] register_kretprobe(ccid2_start_rto_timer) failed, returned -22
[   96.327091] register_jprobe(ccid2_change_l_ack_ratio) failed, returned -22
[   96.413197] register_kretprobe(ccid2_change_l_ack_ratio) failed, returned -22

This soft lockup is harmless so far, something related to qemu +
kprobes, still have to investigate, but has not prevented me from
testing what I have so far:

[  106.611513] BUG: soft lockup detected on CPU#0!
[  106.625659]  [] update_process_times+0x28/0x70
[  106.641645]  [] timer_interrupt+0x2b/0x50
[  106.648264]  [] handle_IRQ_event+0x25/0x50
[  106.655091]  [] handle_level_irq+0x0/0xc0
[  106.668005]  [] handle_level_irq+0x55/0xc0
[  106.716414]  [] do_IRQ+0x63/0xb0
[  106.720906]  [] common_interrupt+0x23/0x28
[  106.726948]  [] kallsyms_expand_symbol+0x79/0xb0
[  106.738641]  [] kallsyms_lookup_name+0x3d/0x80
[  106.753817]  [] longjmp_break_handler+0x0/0x130
[  106.771530]  [] jprobe_init+0x41/0xaf [ctracer]
[  106.777810]  [] __register_kprobe+0x45/0x270
[  106.828679]  [] jprobe_init+0x41/0xaf [ctracer]
[  106.841134]  [] sys_init_module+0x155/0x1740
[  106.849205]  [] syscall_call+0x7/0xb
[  106.859973]  ===
[  115.678750] register_jprobe(dccp_diag_get_info) failed, returned -22
[  115.846025] register_kretprobe(dccp_diag_get_info) failed, returned -22
[  118.297134] register_jprobe(dccp_v6_conn_request) failed, returned -22
[  118.452448] register_kretprobe(dccp_v6_conn_request) failed, returned -22
[

Re: [SUMMARY]: Problems with loss interval recalculation / audit against [RFC 3448, 5.4]

2007-01-08 Thread Arnaldo Carvalho de Melo

On 1/8/07, Eddie Kohler <[EMAIL PROTECTED]> wrote:

Gerrit,

First I would like to send a NEWBIE REQUEST.

How can I get a complete copy of the DCCP source checked out, including
your/Ian's patches?  Where is there something on line that explains the git
process?  Thanks for your/anyone's help.


Git process:

Install git (if you have problems with this, let us know):

Then:

git-clone git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.git

With this you should have David Miller networking tree, the canonical
one when working on linux networking code, mine is just a staging area
for David as his is a staging area for Linus.

At this point Gerrit is maintaining a set of patches, not using git,
so you'll have to get the patches he is staging for me at:

http://www.erg.abdn.ac.uk/users/gerrit/dccp/patch-backlog/the_whole_lot.tar.gz

Apply them in order and you'll have the current status quo, I hope to
go over the patches Gerrit is graciously maintaining soon, but the
process above should be simple enough for you to follow.

Questions?

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1]: Remove build warnings on 64-bit architecture

2007-01-12 Thread Arnaldo Carvalho de Melo

On 1/12/07, Ian McDonald <[EMAIL PROTECTED]> wrote:

On 13/01/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:





> @@ -899,7 +901,7 @@ static void ccid3_hc_rx_packet_recv(stru
> DCCP_BUG_ON(r_sample < 0);
> if (unlikely(r_sample <= t_elapsed))
> DCCP_WARN("r_sample=%ldus, t_elapsed=%ldus\n",
> - r_sample, t_elapsed);
> + (long)r_sample, (long)t_elapsed);
> else
> r_sample -= t_elapsed;
> CCID3_RTT_SANITY_CHECK(r_sample);
> -

I can't comment on suseconds stuff as don't know so much...


Checked this last month, look at the URL to see the diff:

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=1fba78b6cba14bd37fdb12c5367f1e4d58ff2e0f

commit 1fba78b6cba14bd37fdb12c5367f1e4d58ff2e0f
Author: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Date:   Sun Dec 10 15:39:29 2006 -0200

   [DCCP] ccid3: Fixup some type conversions related to rtts

   Spotted by David Miller when compiling on sparc64, I reproduced it here on
   parisc64, that are the only platforms to define __kernel_suseconds_t as an
   'int', all the others, x86_64 and x86 included typedef it as a
'long', but from
   the definition of suseconds_t it should just be an 'int' on
platforms where it
   is >= 32bits, it would not require all the castings from
suseconds_t to (int)
   when printking variables of this type, that are not needed on parisc64 and
   sparc64.

   Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHES][0/2] DCCP fixes

2007-03-06 Thread Arnaldo Carvalho de Melo

Hi David,

  Please consider pulling from:

master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.git

  These are just bugfixes, so should go into 2.6.21,

Best Regards,

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2][DCCP]: Correctly split CCID half connections

2007-03-06 Thread Arnaldo Carvalho de Melo

This fixes a bug caused by a previous patch, which causes DCCP servers in
LISTEN state to not receive packets.

This patch changes the logic so that
* servers in either LISTEN or OPEN state get the RX half connection packets
* clients in OPEN state get the TX half connection packets

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
From 253a86dae9b01b2be15ef838a19de7ccde575fb4 Mon Sep 17 00:00:00 2001
From: Gerrit Renker <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 13:02:35 -0300
Subject: [PATCH] [DCCP]: Correctly split CCID half connections

This fixes a bug caused by a previous patch, which causes DCCP servers in
LISTEN state to not receive packets.

This patch changes the logic so that
 * servers in either LISTEN or OPEN state get the RX half connection packets
 * clients in OPEN state get the TX half connection packets

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/input.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/dccp/input.c b/net/dccp/input.c
index 4dee462..287099f 100644
--- a/net/dccp/input.c
+++ b/net/dccp/input.c
@@ -256,10 +256,10 @@ int dccp_rcv_established(struct sock *sk, struct sk_buff *skb,
 	 *(only one is active at a time); when moving to bidirectional
 	 *service, this needs to be revised.
 	 */
-	if (dccp_sk(sk)->dccps_role == DCCP_ROLE_SERVER)
-		ccid_hc_rx_packet_recv(dp->dccps_hc_rx_ccid, sk, skb);
-	else
+	if (dccp_sk(sk)->dccps_role == DCCP_ROLE_CLIENT)
 		ccid_hc_tx_packet_recv(dp->dccps_hc_tx_ccid, sk, skb);
+	else	/* listening or connected server */
+		ccid_hc_rx_packet_recv(dp->dccps_hc_rx_ccid, sk, skb);
 
 	return __dccp_rcv_established(sk, skb, dh, len);
 discard:
@@ -495,10 +495,10 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
 			goto discard;
 
 		/* XXX see the comments in dccp_rcv_established about this */
-		if (dccp_sk(sk)->dccps_role == DCCP_ROLE_SERVER)
-			ccid_hc_rx_packet_recv(dp->dccps_hc_rx_ccid, sk, skb);
-		else
+		if (dccp_sk(sk)->dccps_role == DCCP_ROLE_CLIENT)
 			ccid_hc_tx_packet_recv(dp->dccps_hc_tx_ccid, sk, skb);
+		else
+			ccid_hc_rx_packet_recv(dp->dccps_hc_rx_ccid, sk, skb);
 	}
 
 	/*
-- 
1.5.0.2



[PATCH 2/2][DCCP]: Set RTO for newly created child socket

2007-03-06 Thread Arnaldo Carvalho de Melo

This mirrors a recent change in tcp_open_req_child, whereby the icsk_rto of the
newly created child socket was not set (but rather on the parent socket). Same
fix for DCCP.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
From 2d53968791b732cba42c3ff2c371060a349da2d7 Mon Sep 17 00:00:00 2001
From: Gerrit Renker <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 13:05:08 -0300
Subject: [PATCH] [DCCP]: Set RTO for newly created child socket

This mirrors a recent change in tcp_open_req_child, whereby the icsk_rto of the
newly created child socket was not set (but rather on the parent socket). Same
fix for DCCP.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/minisocks.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c
index 6656bb4..6d235b3 100644
--- a/net/dccp/minisocks.c
+++ b/net/dccp/minisocks.c
@@ -103,7 +103,7 @@ struct sock *dccp_create_openreq_child(struct sock *sk,
 
 	if (newsk != NULL) {
 		const struct dccp_request_sock *dreq = dccp_rsk(req);
-		struct inet_connection_sock *newicsk = inet_csk(sk);
+		struct inet_connection_sock *newicsk = inet_csk(newsk);
 		struct dccp_sock *newdp = dccp_sk(newsk);
 		struct dccp_minisock *newdmsk = dccp_msk(newsk);
 
-- 
1.5.0.2



Re: [ANNOUNCE]: Patch Preview

2007-03-12 Thread Arnaldo Carvalho de Melo

On 3/12/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:

I have put quite a lot of work into
  * bringing CCID 3 up to spec
  * implement loss detection according to RFC 3448/4342
  * add loss intervals handling according to rfc3448bis
  * many cool new features
  * ample documentation, applications, and test modules for most features

However, since patches are small for review-ability, this amounts to a total of 
75 patches,
including the patch backlog; please see suggestion (2).

1. PREVIEW: Since mostly the following files are modified:
  net/dccp/ccids/lib/ccid3.*
  net/dccp/ccids/lib/loss_interval.*
  net/dccp/ccids/lib/packet_history.*
   I have posted combined patches onto

   http://www.erg.abdn.ac.uk/users/gerrit/dccp/patches_working-set/

   These apply in-order (see series file) onto today's tree. They have been 
extensively tested,
   can offer a test module for most essential features.


2. SUGGESTION: Since the patch backlog is really quite old now (over 2 months), 
I would like
   to suggest to re-submit it as one, combined patch, in the format of
   
http://www.erg.abdn.ac.uk/users/gerrit/dccp/patches_working-set/01_The_Patch_Backlog.diff
   (maybe a bit more verbose)
   ==> But this depends on whether people are ok with this (see (3))


No, what I think is the best to do is to consider your patch backlog
as a separate git tree that now will be pulled, preserving the
changesets history.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE]: Patch Preview

2007-03-12 Thread Arnaldo Carvalho de Melo

On 3/12/07, Ian McDonald <[EMAIL PROTECTED]> wrote:

On 3/13/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:
> I have put quite a lot of work into
>   * bringing CCID 3 up to spec
>   * implement loss detection according to RFC 3448/4342
>   * add loss intervals handling according to rfc3448bis
>   * many cool new features
>   * ample documentation, applications, and test modules for most features

Gerrit,

This sounds great. Well done!
>
> 3. MISSED? I am not sure if I have missed Sign-offs or Acked-byes. Ian (and 
others) can you please
>check that http://www.erg.abdn.ac.uk/users/gerrit/dccp/patch-backlog/
>is correct? In particular, only the following two out of 25 are not 
acked/signed-off:
> * 
http://www.erg.abdn.ac.uk/users/gerrit/dccp/patch-backlog/3e_CCID3_send_time_race_condition.diff
>   (but I believe you said you were ok with it, since the patch was 
changed  to update t_ipi when `s' changes)

Acked-by: Ian McDonald <[EMAIL PROTECTED]>

> * 
http://www.erg.abdn.ac.uk/users/gerrit/dccp/patch-backlog/4a_debug_always_no_module.diff
>

I'm not so confident when it comes to the debugging code which is why
I didn't comment. One thing is I notice this code which looks like it
could be useful if ported to DCCP:
http://master.kernel.org/git/?p=linux/kernel/git/davem/net-2.6.22.git;a=commit;h=8e2101a9faab43dcf307d46784a1ede7997fe729

This allows turning debug code on/off by sysctl.

> 4. SUMMARY of patch sets 2 and 3:
>
> * avoiding accumulation of sending credits
> * faster & more efficient time-stamping (timestamps from NIC)
> * honour initial (SYN handshake) RTT estimate
> * implement rfc3448bis nofeedback timer handling
> - ignore feedback after nofeedback timer expiry when p=0
> - do not reduce X after nofeedback timer expiry
> * implement larger initial windows as described in rfc3448bis
> * resolve locking issues for SMP
> - atomic context between sending packet & updating CCID 
structures
> - atomic context when updating t_nom
> - protect transmitter data structures against asynchronous 
changes
> - keep garbage collection separate from access
>
> * clean up and minimize socket data structures
> * allow retrieving current Maximum Packet Size via socket option
> * compilation fixes for 64-bit architectures
> * function for distinguishing `data' and `non-data' (NDP) packet types
> * routines to perform modulo-16 (window counter CCval) arithmetic
> * simplify the kernel API (split functionality into modules)
> * completely revised RX history structure
> - circular list using only minimum of entries
> - object-oriented programming interface (like abstract data 
type)
> - reusing the same structure for both RTT sampling and loss 
detection
> - generic service, available to other CCIDs
> * loss detection
> - cope with delayed, late, and re-ordered packets
> - detect when a sequence hole is later filled
> - take NDP information into account
> - assign losses of the same RTT to the same loss event
> - recycle history structures after loss detection (cope with 
loss bursts)
>
> * loss interval management
> - also ring list since simplifies interface
> - object-oriented programming interface (like abstract data 
type)
> - on-demand allocation: less memory consumption on loss-free 
connections
> - handing over from loss detection to loss interval history 
completely automated
>   by module (reusable for future CCIDs)
>
> Enjoy,
> Gerrit
>
The new patches sound great but I find them hard to review all
combined. Arnaldo has asked for a git tree I see. Can that be


No need for a git tree, I can apply it one by one here,what I said is
that what gerryt has is equivalent to a git tree, i.e. a series of
patches with comments and I don't want to lose that comments neither
the separation.

Will submit some more csets I have here for the skb layer header
pointers series and merge this after a quick review (famous last
words, lets see if its true today :-) ).

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE]: Patch Preview

2007-03-12 Thread Arnaldo Carvalho de Melo

On 3/12/07, Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> wrote:

On 3/12/07, Ian McDonald <[EMAIL PROTECTED]> wrote:



> The new patches sound great but I find them hard to review all
> combined. Arnaldo has asked for a git tree I see. Can that be

No need for a git tree, I can apply it one by one here,what I said is
that what gerryt has is equivalent to a git tree, i.e. a series of
patches with comments and I don't want to lose that comments neither
the separation.

Will submit some more csets I have here for the skb layer header
pointers series and merge this after a quick review (famous last
words, lets see if its true today :-) ).


I started merging some, to speed up the process I'll accept some even
with reservations on some aspects of it, such as:

1a_simplify_seqno_arithmetic.diff:

- use of all upercase (ADD48, etc)
- DCCP_MAX_SEQNO should really be something like whats in
include/linux/kernel.h:

#define INT_MAX ((int)(~0U>>1))
#define INT_MIN (-INT_MAX - 1)
#define UINT_MAX(~0U)
#define LONG_MAX((long)(~0UL>>1))
#define LONG_MIN(-LONG_MAX - 1)
#define ULONG_MAX   (~0UL)
#define LLONG_MAX   ((long long)(~0ULL>>1))
#define LLONG_MIN   (-LLONG_MAX - 1)
#define ULLONG_MAX  (~0ULL)

But named UINT48_MAX or something in this direction, In fact I don't
see why these things are in kernel.h, they probably would be better of
in limits.h, like in userland.


1e_dccp_inc_seqno-no-pointer-variant.diff:

What is the point? INC48() is introduced but not used anywhere besides
in dccp_inc_seqno, where there is no changes with this patch, does the
other patches after this series use it?

The others in the 1e series looks OK apart from the issues mentioned
above, will try and continue tomorrow

Thanks a lot!

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/1][DCCP] getsockopt: Fix DCCP_SOCKOPT_[SEND,RECV]_CSCOV

2007-03-28 Thread Arnaldo Carvalho de Melo
David,

Please push for 2.6.21 and stable (CCed for good measure).

- Arnaldo

---

We were only checking if there was enough space to put the int, but left len as
specified by the (malicious) user, sigh, fix it by setting len to sizeof(val) 
and
transfering just one int worth of data, the one asked for.

Also check for negative len values.

Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>

diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index cf28c53..6607b7b 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -575,7 +575,7 @@ static int do_dccp_getsockopt(struct sock *sk, int level, 
int optname,
if (get_user(len, optlen))
return -EFAULT;
 
-   if (len < sizeof(int))
+   if (len < (int)sizeof(int))
return -EINVAL;
 
dp = dccp_sk(sk);
@@ -589,9 +589,11 @@ static int do_dccp_getsockopt(struct sock *sk, int level, 
int optname,
   (__be32 __user *)optval, optlen);
case DCCP_SOCKOPT_SEND_CSCOV:
val = dp->dccps_pcslen;
+   len = sizeof(val);
break;
case DCCP_SOCKOPT_RECV_CSCOV:
val = dp->dccps_pcrlen;
+   len = sizeof(val);
break;
case 128 ... 191:
return ccid_hc_rx_getsockopt(dp->dccps_hc_rx_ccid, sk, optname,
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1][DCCP] getsockopt: Fix DCCP_SOCKOPT_[SEND,RECV]_CSCOV

2007-03-28 Thread Arnaldo Carvalho de Melo

On 3/28/07, Ian McDonald <[EMAIL PROTECTED]> wrote:

On 3/29/07, Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> wrote:
> David,
>
> Please push for 2.6.21 and stable (CCed for good measure).
>
> - Arnaldo
>
> ---
>
> We were only checking if there was enough space to put the int, but left len 
as
> specified by the (malicious) user, sigh, fix it by setting len to sizeof(val) 
and
> transfering just one int worth of data, the one asked for.
>
> Also check for negative len values.
>
Part of the issue here is possibly that we are using signed ints here
and the type from userspace is socklen_t which by my quick check is
unsigned on my system.

I haven't checked how this is defined on other architectures yet but
if this is the case we should tidyup to remove other possible errors
of this type. I'll look into this some more as time permits.

Or am I talking through a whole in my head?



One way or the other we are safe now, no?

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 39/43]: Protect against Reset/Sync floods due to buggy applications

2007-04-10 Thread Arnaldo Carvalho de Melo

On 4/10/07, Eddie Kohler <[EMAIL PROTECTED]> wrote:

Hi Gerrit,

Gerrit Renker wrote:
> Listen Eddie,
>
> I am not interested in your offline email. All further such email will
> be directed back to the list.
>
> Eddie Kohler wrote:
> |  Glad to help!
> |  E

As you requested, this mail is being cc'd to the kernel mailing list.

In future, every time I send you a three-word note, I will make sure to cc the
kernel mailing list.

I deeply, deeply apologize for whatever it is I have done this time to make
you angry, although I have no idea what that could be.  I hereby acknowledge
all of your many thousands of contributions to the DCCP kernel implementation.

All hail Gerrit!!~*$716617*&!197`987(*&!(*&([EMAIL PROTECTED]&(*#&([EMAIL 
PROTECTED]


Oh, the fun, keep up the good work on reviewing the patches, as usual
for the not so recent past I've been unable to participate fully, and
was feeling ashamed for that, but in hindsight its not that bad, at
least the patches can brew in this so joyfull exchange of messages, at
some point I'll just jump in and read those messages to harvest the
patches for inclusion in mainline, as such I really hope that all
interested parties look at Gerrit and Ian patches and test and report
with it on top of mainline.

Best Regards,

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/25]: Cheaper & smaller timestamping

2007-04-12 Thread Arnaldo Carvalho de Melo

On 4/2/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:

|  > [DCCP]: Cheaper & smaller timestamping

|  A couple of comments though:
|  - this is how I had the code originally in many cases and Arnaldo
|  changed - can't remember why.
Arnaldo's code uses a reference point, the skb_get_timestamp() also did
that. May be useful against old incarnations across reboots.


I got to this cset now in today's merge session, the cset that did that was:

b0e567806d16586629468c824dfb2e71155df7da

Comment was, unfortunately, not so clear:


[EMAIL PROTECTED] net-2.6.22]$ git-show b0e567806d16586629468c824dfb2e71155df7da
commit b0e567806d16586629468c824dfb2e71155df7da
Author: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Date:   Fri Sep 9 02:38:35 2005 -0300

   [DCCP] Introduce dccp_timestamp

   To start the timestamps with 0.0ms, easing the integer maths in
the CCIDs, this
   probably will be reworked to use the to be introduced struct timeval_offset
   infrastructure out of skb_get_timestamp, etc.

   Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---

What are the architectures you've been testing? All 64 bits? IIRC the
problem was related to overflows, suseconds_t is unsigned long, so 32
bits on i386...


|  - we should be looking at using the new ktime as monotonic but that
|  can be a task for another time.
Excellent idea, this may allow to make CCID3 more precise.

By sheer magic, skb_get_timestamp does this already, as I just saw:

static inline void skb_get_timestamp(const struct sk_buff *skb, struct timeval 
*stamp)
{
*stamp = ktime_to_timeval(skb->tstamp);
}

so we only need to replace the do_gettimeofday() at some time.


Yes, skb_get_timestamp seems to be the right thing to do, but now I'm
uneasy about applying the patch that essentially reverts
b0e567806d16586629468c824dfb2e71155df7da, will try not applying it and
applying the followup patches, will report the results soon.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/25]: Cheaper & smaller timestamping

2007-04-12 Thread Arnaldo Carvalho de Melo

On 4/12/07, Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> wrote:

On 4/2/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:
> |  > [DCCP]: Cheaper & smaller timestamping
>
> |  A couple of comments though:
> |  - this is how I had the code originally in many cases and Arnaldo
> |  changed - can't remember why.
> Arnaldo's code uses a reference point, the skb_get_timestamp() also did
> that. May be useful against old incarnations across reboots.

I got to this cset now in today's merge session, the cset that did that was:

b0e567806d16586629468c824dfb2e71155df7da

Comment was, unfortunately, not so clear:


[EMAIL PROTECTED] net-2.6.22]$ git-show b0e567806d16586629468c824dfb2e71155df7da
commit b0e567806d16586629468c824dfb2e71155df7da
Author: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Date:   Fri Sep 9 02:38:35 2005 -0300

[DCCP] Introduce dccp_timestamp

To start the timestamps with 0.0ms, easing the integer maths in
the CCIDs, this
probably will be reworked to use the to be introduced struct timeval_offset
infrastructure out of skb_get_timestamp, etc.

Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---

What are the architectures you've been testing? All 64 bits? IIRC the
problem was related to overflows, suseconds_t is unsigned long, so 32
bits on i386...

> |  - we should be looking at using the new ktime as monotonic but that
> |  can be a task for another time.
> Excellent idea, this may allow to make CCID3 more precise.
>
> By sheer magic, skb_get_timestamp does this already, as I just saw:
>
> static inline void skb_get_timestamp(const struct sk_buff *skb, struct 
timeval *stamp)
> {
> *stamp = ktime_to_timeval(skb->tstamp);
> }
>
> so we only need to replace the do_gettimeofday() at some time.

Yes, skb_get_timestamp seems to be the right thing to do, but now I'm
uneasy about applying the patch that essentially reverts
b0e567806d16586629468c824dfb2e71155df7da, will try not applying it and
applying the followup patches, will report the results soon.


OK, you did the conversion, applying it.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 19/25]: Remove unused fields in packet history structure

2007-04-12 Thread Arnaldo Carvalho de Melo

On 3/26/07, Ian McDonald <[EMAIL PROTECTED]> wrote:

On 3/22/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:
> [CCID 3]: Remove unused fields in packet history structure
>
Acked-by: Ian McDonald <[EMAIL PROTECTED]>


Indeed it reduced the size:

[EMAIL PROTECTED] net-2.6.22]$ cat /tmp/before
/*  /home/acme/git/net-2.6.22/net/dccp/ccids/lib/packet_history.h:55 */
struct dccp_tx_hist_entry {
   struct list_head   dccphtx_node; /* 016 */
   u64dccphtx_seqno:48; /*16 8 */
   u64dccphtx_sent:1;   /*16 8 */

   /* XXX 15 bits hole, try to pack */

   u32dccphtx_rtt;  /*24 4 */

   /* XXX 4 bytes hole, try to pack */

   struct timeval dccphtx_tstamp;   /*3216 */
}; /* size: 48, cachelines: 1 */
  /* sum members: 44, holes: 1, sum holes: 4 */
  /* bit holes: 1, sum bit holes: 15 bits */
  /* last cacheline: 48 bytes */
[EMAIL PROTECTED] net-2.6.22]$ pahole -e
../build/mica/net-2.6.22/net/dccp/ccids/ccid3.o dccp_tx_hist_entry /*
 /home/acme/git/net-2.6.22/net/dccp/ccids/lib/packet_history.h:55
*/
struct dccp_tx_hist_entry {
   struct list_head   dccphtx_node; /* 016 */
   u64dccphtx_seqno;/*16 8 */
   struct timeval dccphtx_tstamp;   /*2416 */
}; /* size: 40, cachelines: 1 */
  /* last cacheline: 40 bytes */
[EMAIL PROTECTED] net-2.6.22]$

I wonder if we could make this a single linked list, lemme see.. I
think so, and by doing that we get the struct down to 32 bytes, making
it cacheline friendly, and also simplify the operations, i.e. that
opencoded list operation would just go away, will leave this for
later, in a TODO file.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 24/25]: Macro for moving average

2007-04-12 Thread Arnaldo Carvalho de Melo

On 4/2/07, Ian McDonald <[EMAIL PROTECTED]> wrote:

On 4/2/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:
> Quoting Ian McDonald:
> |  > +/**
> |  > + * Macro for exponentially weighted moving average
> |  > + * @weight: Weight to be used as damping factor, in units of 1/10
> |  > + * Beware that @val is evaluated multiple times.
> |  > + */
> |  > +#define TFRC_EWMA(val, newval, weight) \
> |  > + val = val? ((weight) * val + (10 - (weight)) * (newval)) / 10 : 
(newval)
> |  > +
> |  Just wondering why you pass weight when you only use 9 anyway and it
> |  just adds a step. Is it used in following patches with a different
> |  weight?
> |
> In the following patches it is used one more time, which is for the 
`Preventing
> Oscillations' changeset (this is not in the next patch set but in the one 
after that);
> here again the weight 9/10 is used. I think it is useful to keep the `weight' 
argument,
> since RFC 3448 suggests to use a different weight for the RTT sample when the 
`Oscillation
> Prevention' e.g. is not deployed (a q close to zero).
> I think it is better to keep the weight argument, it allows changing the code 
later, while
> still providing the abstraction: for CCID 3 there are 4 uses of this macro 
alone.
>
OK.

Acked-by: Ian McDonald <[EMAIL PROTECTED]>


I don't like macros that receive as a parameter something that will be
changed, its more readable to have it as a inline function and return
the new value, even if the resulting code uses more characters, so I'm
applying the attached patch and will take care of the fallout, that is
small from what I grepped at the_whole_lot.

- Arnaldo
diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 9474331..428aa92 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -183,7 +183,7 @@ static inline void ccid3_hc_tx_update_s(struct ccid3_hc_tx_sock *hctx, int len)
 {
 	const u16 old_s = hctx->ccid3hctx_s;
 
-	hctx->ccid3hctx_s = old_s == 0 ? len : (9 * old_s + len) / 10;
+	hctx->ccid3hctx_s = tfrc_ewma(hctx->ccid3hctx_s, len, 9);
 
 	if (hctx->ccid3hctx_s != old_s)
 		ccid3_update_send_interval(hctx);
@@ -474,8 +474,7 @@ static void ccid3_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
 	 *
 	 * q is a constant, RFC 3448 recommends 0.9
 	 */
-	hctx->ccid3hctx_rtt = hctx->ccid3hctx_rtt == 0 ?  r_sample
-			: (9 * hctx->ccid3hctx_rtt + r_sample) / 10;
+	hctx->ccid3hctx_rtt = tfrc_ewma(hctx->ccid3hctx_rtt, r_sample, 9);
 
 	/*
 	 * Update allowed sending rate as per draft rfc3448bis, 4.2/4.3
@@ -724,8 +723,7 @@ static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len)
 	if (unlikely(len == 0))	/* don't update on empty packets (e.g. ACKs) */
 		ccid3_pr_debug("Packet payload length is 0 - not updating\n");
 	else
-		hcrx->ccid3hcrx_s = hcrx->ccid3hcrx_s == 0 ? len :
-(9 * hcrx->ccid3hcrx_s + len) / 10;
+		hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9);
 }
 
 static void ccid3_hc_rx_send_feedback(struct sock *sk)
diff --git a/net/dccp/ccids/lib/tfrc.h b/net/dccp/ccids/lib/tfrc.h
index faf5f7e..f26e9e9 100644
--- a/net/dccp/ccids/lib/tfrc.h
+++ b/net/dccp/ccids/lib/tfrc.h
@@ -37,6 +37,17 @@ static inline u32 scaled_div32(u64 a, u32 b)
 	return result;
 }
 
+/**
+ * Exponentially weighted moving average
+ * @weight: Weight to be used as damping factor, in units of 1/10
+ */
+static inline u32 tfrc_ewma(const u32 val, const u32 newval, const u8 weight)
+{
+	if (val != 0)
+		return (weight * val + (10 - weight) * newval) / 10; 
+	return newval;
+}
+
 extern u32 tfrc_calc_x(u16 s, u32 R, u32 p);
 extern u32 tfrc_calc_x_reverse_lookup(u32 fvalue);
 


RFC: patch backlog

2007-04-12 Thread Arnaldo Carvalho de Melo

Hi,

  I've applied the first 29 patches in Gerrit's patch backlog at
http://www.erg.abdn.ac.uk/users/gerrit/dccp/patch-backlog/, i.e. all
the way to "[CCID3]: History access is redundant when sending
feedback", aka 10d_CCID3-RX-Feedback_Sans-History-Access.diff, its now
at:

Web access:
http://git.kernel.org/?p=linux/kernel/git/acme/net-2.6.22.git;a=shortlog

GIT URL:
git://git.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.22.git

 I understand there is still some disagreement over this one:

"[CCID3]: Avoid accumulation of large send credit"

http://git.kernel.org/?p=linux/kernel/git/acme/net-2.6.22.git;a=commitdiff;h=86a703f86b9e54c1c12679e34aa1e3cf16360076

  I have to read the arguments presented by Eddie and Ian, and we
can always fix this at some point when consensus is reached, but for
now, to help in getting the pile of good work Gerrit has done I've
merged it.

   The few changes I made were commented in messages I've sent to
this list today, will try to find time again to go over the rest of
the backlog, there are some that were acked by Ian, but not some, so
if Ian could take a look at the ones without an ACK I'd really
appreciate.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/29]: dccp patch backlog

2007-04-12 Thread Arnaldo Carvalho de Melo
Hi David,

Please consider pulling from:

master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.22

Best Regards,

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 01/29] Fix bug in the calculation of very low sending rates

2007-04-12 Thread Arnaldo Carvalho de Melo
This fixes an error in the calculation of t_ipi when X converges towards
very low sending rates (between 1 and 64 bytes per second).

Although this case may not sound likely, it can be reproduced by connecting,
hitting enter (1 byte sent) and waiting for some time, during which the
nofeedback timer halves the sending rate until finally it reaches the region
1..64 bytes/sec. Computing X is handled correctly (tested separately); but by
dividing X _before_ entering the calculation of t_ipi, X becomes zero as
a result.  This in turn triggers a BUG condition caught in scaled_div().

Fixed by replacing with equivalent statement and explicit typecast for good
measure.

Calculation verified and effect of patch tested - reduced never below 1 byte
per 64 seconds afterwards, i.e. not allowing divide-by-zero.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Acked-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index c3abd76..d7d9ce7 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -102,8 +102,8 @@ static inline u64 rfc3390_initial_rate(struct sock *sk)
 static inline void ccid3_update_send_interval(struct ccid3_hc_tx_sock *hctx)
 {
/* Calculate new t_ipi = s / X_inst (X_inst is in 64 * bytes/second) */
-   hctx->ccid3hctx_t_ipi = scaled_div(hctx->ccid3hctx_s,
-  hctx->ccid3hctx_x >> 6);
+   hctx->ccid3hctx_t_ipi = scaled_div32(((u64)hctx->ccid3hctx_s) << 6,
+hctx->ccid3hctx_x);
 
/* Calculate new delta by delta = min(t_ipi / 2, t_gran / 2) */
hctx->ccid3hctx_delta = min_t(u32, hctx->ccid3hctx_t_ipi / 2,
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 02/29] Avoid accumulation of large send credit

2007-04-12 Thread Arnaldo Carvalho de Melo
Problem:
 Large backlogs of packets will accumulate in the CCID3 TX module when
  (i)   the application idles and/or
  (ii)  the application emits at a rate slower than the allowed rate X/s and/or
  (iii) due to scheduling inaccuracy (resolution only up to HZ).

 The consequence is that a huge burst of packets can be sent immediately, which
 violates the allowed sending rate and can (worst case) choke the network.
 Furthermore (iii) is especially likely when using high-speed (Gbit) links.

Fix:
 Avoid any backlog of sending time which is greater than one whole t_ipi. This
 is a conservative tight bound which will ensure safe operation with regard to
 the allowed fair-share rate in a wide range of possible hardware 
configurations.
 The value is derived below with a simple model, showing that this choice is 
safe
 from an operational point of view.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |   12 ++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index d7d9ce7..c623761 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -362,7 +362,15 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct 
sk_buff *skb)
case TFRC_SSTATE_NO_FBACK:
case TFRC_SSTATE_FBACK:
delay = timeval_delta(&hctx->ccid3hctx_t_nom, &now);
-   ccid3_pr_debug("delay=%ld\n", (long)delay);
+   /*
+* Lagging behind for more than a full t_ipi: when this occurs,
+* a send credit accrues which causes packet storms, violating
+* even the average allowed sending rate. This case happens if
+* the application idles for some time, or if it emits packets
+* at a rate smaller than X/s. Avoid such accumulation.
+*/
+   if (delay + (suseconds_t)hctx->ccid3hctx_t_ipi < 0)
+   hctx->ccid3hctx_t_nom = now;
/*
 *  Scheduling of packet transmissions [RFC 3448, 4.6]
 *
@@ -371,7 +379,7 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct 
sk_buff *skb)
 * else
 *   // send the packet in (t_nom - t_now) milliseconds.
 */
-   if (delay - (suseconds_t)hctx->ccid3hctx_delta >= 0)
+   else if (delay - (suseconds_t)hctx->ccid3hctx_delta >= 0)
return delay / 1000L;
 
ccid3_hc_tx_update_win_count(hctx, &now);
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 03/29] Debug statements for Elapsed Time option

2007-04-12 Thread Arnaldo Carvalho de Melo
This prints the value of the parsed Elapsed Time when received via a
Timestamp Echo option [RFC 4342, 13.3].

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Acked-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/options.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/dccp/options.c b/net/dccp/options.c
index 14b6212..34d536d 100644
--- a/net/dccp/options.c
+++ b/net/dccp/options.c
@@ -172,21 +172,25 @@ int dccp_parse_options(struct sock *sk, struct sk_buff 
*skb)
opt_recv->dccpor_timestamp_echo = ntohl(*(__be32 
*)value);
 
dccp_pr_debug("%s rx opt: TIMESTAMP_ECHO=%u, len=%d, "
- "ackno=%llu, ",  dccp_role(sk),
+ "ackno=%llu", dccp_role(sk),
  opt_recv->dccpor_timestamp_echo,
  len + 2,
  (unsigned long long)
  DCCP_SKB_CB(skb)->dccpd_ack_seq);
 
 
-   if (len == 4)
+   if (len == 4) {
+   dccp_pr_debug_cat("\n");
break;
+   }
 
if (len == 6)
elapsed_time = ntohs(*(__be16 *)(value + 4));
else
elapsed_time = ntohl(*(__be32 *)(value + 4));
 
+   dccp_pr_debug_cat(", ELAPSED_TIME=%d\n", elapsed_time);
+
/* Give precedence to the biggest ELAPSED_TIME */
if (elapsed_time > opt_recv->dccpor_elapsed_time)
opt_recv->dccpor_elapsed_time = elapsed_time;
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/29] Complete documentation of dccp_sock

2007-04-12 Thread Arnaldo Carvalho de Melo
This fills in missing documentation for dccp_sock fields.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 include/linux/dccp.h |   21 ++---
 1 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/include/linux/dccp.h b/include/linux/dccp.h
index e668cf5..fda2148 100644
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -465,21 +465,20 @@ struct dccp_ackvec;
  * @dccps_service_list - second .. last service code on passive socket
  * @dccps_timestamp_time - time of latest TIMESTAMP option
  * @dccps_timestamp_echo - latest timestamp received on a TIMESTAMP option
- * @dccps_l_ack_ratio -
- * @dccps_r_ack_ratio -
+ * @dccps_l_ack_ratio - feature-local Ack Ratio
+ * @dccps_r_ack_ratio - feature-remote Ack Ratio
  * @dccps_pcslen - sender   partial checksum coverage (via sockopt)
  * @dccps_pcrlen - receiver partial checksum coverage (via sockopt)
  * @dccps_ndp_count - number of Non Data Packets since last data packet
- * @dccps_mss_cache -
- * @dccps_minisock -
+ * @dccps_mss_cache - current value of MSS (path MTU minus header sizes)
+ * @dccps_minisock - associated minisock (accessed via dccp_msk)
  * @dccps_hc_rx_ackvec - rx half connection ack vector
- * @dccps_hc_rx_ccid -
- * @dccps_hc_tx_ccid -
- * @dccps_options_received -
- * @dccps_epoch -
- * @dccps_role - Role of this sock, one of %dccp_role
- * @dccps_hc_rx_insert_options -
- * @dccps_hc_tx_insert_options -
+ * @dccps_hc_rx_ccid - CCID used for the receiver (or receiving 
half-connection)
+ * @dccps_hc_tx_ccid - CCID used for the sender (or sending half-connection)
+ * @dccps_options_received - parsed set of retrieved options
+ * @dccps_role - role of this sock, one of %dccp_role
+ * @dccps_hc_rx_insert_options - receiver wants to add options when acking
+ * @dccps_hc_tx_insert_options - sender wants to add options when sending
  * @dccps_xmit_timer - timer for when CCID is not ready to send
  * @dccps_syn_rtt - RTT sample from Request/Response exchange (in usecs)
  */
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 05/29] Cheaper & smaller timestamping

2007-04-12 Thread Arnaldo Carvalho de Melo
Currently dccp_timestamp calculates timestamps relative to an offset computed
at initialisation time.  This patch reduces the cost of timestamping by removing
the offset computation. This affords the advantages that

 * there are now fewer instructions per single timestamp;
 * the size of the dccp_sock becomes smaller (minus @dccps_epoch).

(NB: Using a reference epoch is not strictly necessary, since all known 
timestamping
 operations in DCCP only need relative time differences, but not absolute 
time.
 I have tested this patch for a while on different platforms, found no 
problems.)

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Acked-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 include/linux/dccp.h|1 -
 net/dccp/ackvec.c   |4 ++--
 net/dccp/ccids/ccid3.c  |   18 +-
 net/dccp/ccids/lib/packet_history.h |2 +-
 net/dccp/dccp.h |2 --
 net/dccp/input.c|2 +-
 net/dccp/minisocks.c|1 -
 net/dccp/options.c  |   22 +++---
 net/dccp/proto.c|1 -
 9 files changed, 16 insertions(+), 37 deletions(-)

diff --git a/include/linux/dccp.h b/include/linux/dccp.h
index fda2148..c297212 100644
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -511,7 +511,6 @@ struct dccp_sock {
struct ccid *dccps_hc_rx_ccid;
struct ccid *dccps_hc_tx_ccid;
struct dccp_options_receiveddccps_options_received;
-   struct timeval  dccps_epoch;
enum dccp_role  dccps_role:2;
__u8dccps_hc_rx_insert_options:1;
__u8dccps_hc_tx_insert_options:1;
diff --git a/net/dccp/ackvec.c b/net/dccp/ackvec.c
index 01030f3..a2bf318 100644
--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -82,7 +82,7 @@ int dccp_insert_option_ackvec(struct sock *sk, struct sk_buff 
*skb)
if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN)
return -1;
 
-   dccp_timestamp(sk, &now);
+   do_gettimeofday(&now);
elapsed_time = timeval_delta(&now, &av->dccpav_time) / 10;
 
if (elapsed_time != 0 &&
@@ -321,7 +321,7 @@ int dccp_ackvec_add(struct dccp_ackvec *av, const struct 
sock *sk,
}
 
av->dccpav_buf_ackno = ackno;
-   dccp_timestamp(sk, &av->dccpav_time);
+   do_gettimeofday(&av->dccpav_time);
 out:
return 0;
 
diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index c623761..b756e95 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -273,7 +273,7 @@ static void ccid3_hc_tx_no_feedback_timer(unsigned long 
data)
  (2 * TFRC_T_MBI));
 
if (hctx->ccid3hctx_p == 0)
-   dccp_timestamp(sk, &now);
+   do_gettimeofday(&now);
} else {
hctx->ccid3hctx_x_recv = hctx->ccid3hctx_x_calc;
hctx->ccid3hctx_x_recv <<= 4;
@@ -325,7 +325,7 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct 
sk_buff *skb)
if (unlikely(skb->len == 0))
return -EBADMSG;
 
-   dccp_timestamp(sk, &now);
+   do_gettimeofday(&now);
 
switch (hctx->ccid3hctx_state) {
case TFRC_SSTATE_NO_SENT:
@@ -418,7 +418,7 @@ static void ccid3_hc_tx_packet_sent(struct sock *sk, int 
more,
}
dccp_tx_hist_add_entry(&hctx->ccid3hctx_hist, packet);
 
-   dccp_timestamp(sk, &now);
+   do_gettimeofday(&now);
packet->dccphtx_tstamp = now;
packet->dccphtx_seqno  = dccp_sk(sk)->dccps_gss;
packet->dccphtx_rtt= hctx->ccid3hctx_rtt;
@@ -469,7 +469,7 @@ static void ccid3_hc_tx_packet_recv(struct sock *sk, struct 
sk_buff *skb)
else   /* can not exceed 100% */
hctx->ccid3hctx_p = 100 / pinv;
 
-   dccp_timestamp(sk, &now);
+   do_gettimeofday(&now);
 
/*
 * Calculate new round trip sample as per [RFC 3448, 4.3] by
@@ -751,7 +751,7 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk)
 
ccid3_pr_debug("%s(%p) - entry \n", dccp_role(sk), sk);
 
-   dccp_timestamp(sk, &now);
+   do_gettimeofday(&now);
 
switch (hcrx->ccid3hcrx_state) {
case TFRC_RSTATE_NO_DATA:
@@ -903,7 +903,7 @@ found:
return ~0;
}
 
-   dccp_timestamp(sk, &tstamp);
+   do_gettimeofday(&tstamp);
delta = timeval_delta(&tstamp, &hcr

[PATCH 06/29] Use skb timestamp for TX side

2007-04-12 Thread Arnaldo Carvalho de Melo
 This patch uses skb timestamps (and provides the necessary enable/disable
 mechanism) instead of layer-4 timestamping. This affords a more accurate
 RTT estimation.

Background:

 The sender currently computes the timestamp of received (Ack) packets when
 the packet is delivered to layer 4. The skb timestamp is taken earlier, and
 the differences are notable: dDuring test runs under different load conditions,
 the layer-4 timestamp was observed to be on average up to 100 microseconds,
 in the extreme up to a millisecond, later than the skb timestamp.
 This has a negative impact on the RTT estimation (reduced accuracy).

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Acked-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index b756e95..a20b636 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -469,7 +469,7 @@ static void ccid3_hc_tx_packet_recv(struct sock *sk, struct 
sk_buff *skb)
else   /* can not exceed 100% */
hctx->ccid3hctx_p = 100 / pinv;
 
-   do_gettimeofday(&now);
+   skb_get_timestamp(skb, &now);
 
/*
 * Calculate new round trip sample as per [RFC 3448, 4.3] by
@@ -645,6 +645,7 @@ static int ccid3_hc_tx_init(struct ccid *ccid, struct sock 
*sk)
ccid3_hc_tx_no_feedback_timer;
hctx->ccid3hctx_no_feedback_timer.data = (unsigned long)sk;
init_timer(&hctx->ccid3hctx_no_feedback_timer);
+   net_enable_timestamp();
 
return 0;
 }
@@ -657,6 +658,7 @@ static void ccid3_hc_tx_exit(struct sock *sk)
 
ccid3_hc_tx_set_state(sk, TFRC_SSTATE_TERM);
sk_stop_timer(sk, &hctx->ccid3hctx_no_feedback_timer);
+   net_disable_timestamp();
 
/* Empty packet history */
dccp_tx_hist_purge(ccid3_tx_hist, &hctx->ccid3hctx_hist);
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 07/29] Use skb timestamp for receiver side

2007-04-12 Thread Arnaldo Carvalho de Melo
 This patch replaces calls to do_gettimeofday at the receiver CCID 3
 with skb timestamps. The skb timestamps are computed earlier in the
 receive path, experimental measurements have shown that up to several
 hundred microseconds can lie between the skb receive timestamp and the
 timestamp taken when CCID 3 receives the packet. This difference has
 a negative impact on RTT estimation (reduced accuracy).

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Acked-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/lib/packet_history.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/dccp/ccids/lib/packet_history.h 
b/net/dccp/ccids/lib/packet_history.h
index b40b9f9..330b58f 100644
--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -151,7 +151,7 @@ static inline struct dccp_rx_hist_entry *
entry->dccphrx_ccval = dh->dccph_ccval;
entry->dccphrx_type  = dh->dccph_type;
entry->dccphrx_ndp   = ndp;
-   do_gettimeofday(&entry->dccphrx_tstamp);
+   skb_get_timestamp(skb, &entry->dccphrx_tstamp);
}
 
return entry;
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 08/29] Honour initial RTT estimate

2007-04-12 Thread Arnaldo Carvalho de Melo
This is a small optimisation which improves the accuracy of TX
RTT sampling when an initial RTT sample (e.g. from the intial
Request/Response exchange) is available.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Acked-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |   13 ++---
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index a20b636..9d661e4 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -469,28 +469,29 @@ static void ccid3_hc_tx_packet_recv(struct sock *sk, 
struct sk_buff *skb)
else   /* can not exceed 100% */
hctx->ccid3hctx_p = 100 / pinv;
 
-   skb_get_timestamp(skb, &now);
-
/*
 * Calculate new round trip sample as per [RFC 3448, 4.3] by
 *  R_sample  =  (now - t_recvdata) - t_elapsed
 */
+   skb_get_timestamp(skb, &now);
r_sample = dccp_sample_rtt(sk, &now, &packet->dccphtx_tstamp);
 
/*
-* Update RTT estimate by
-* If (No feedback recv)
+* Update RTT estimate (honours RTT from SYN exchange):
+* If (No RTT sample so far)
 *R = R_sample;
 * Else
 *R = q * R + (1 - q) * R_sample;
 *
 * q is a constant, RFC 3448 recomments 0.9
 */
+   hctx->ccid3hctx_rtt = hctx->ccid3hctx_rtt == 0 ?  r_sample
+   : (9 * hctx->ccid3hctx_rtt + r_sample) / 10;
+
if (hctx->ccid3hctx_state == TFRC_SSTATE_NO_FBACK) {
/*
 * Larger Initial Windows [RFC 4342, sec. 5]
 */
-   hctx->ccid3hctx_rtt  = r_sample;
hctx->ccid3hctx_x= rfc3390_initial_rate(sk);
hctx->ccid3hctx_t_ld = now;
 
@@ -504,8 +505,6 @@ static void ccid3_hc_tx_packet_recv(struct sock *sk, struct 
sk_buff *skb)
 
ccid3_hc_tx_set_state(sk, TFRC_SSTATE_FBACK);
} else {
-   hctx->ccid3hctx_rtt = (9 * hctx->ccid3hctx_rtt +
-  r_sample) / 10;
 
/* Update sending rate (step 4 of [RFC 3448, 4.3]) */
if (hctx->ccid3hctx_p > 0)
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 09/29] Ignore irrelevant states on receiving feedback

2007-04-12 Thread Arnaldo Carvalho de Melo
This removes two currently irrelevant states in TX feedback handling:

 * the NO_SENT state is not triggered, since currently DCCP only operates in
   half-connections (in which this state is irrelevant). When upgrading to
   bidirectional half-connections, other changes are required anyhow, so that
   there is no loss in removing this currently unsupported state.

 * the TERM (terminating) state is uninteresting.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |  207 
 1 files changed, 102 insertions(+), 105 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 9d661e4..22ba3f9 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -442,130 +442,127 @@ static void ccid3_hc_tx_packet_recv(struct sock *sk, 
struct sk_buff *skb)
  DCCP_SKB_CB(skb)->dccpd_type == DCCP_PKT_DATAACK))
return;
 
+   /* only the established state is of interest here */
+   switch (hctx->ccid3hctx_state) {
+   case TFRC_SSTATE_NO_FBACK:  /* fall through */
+   case TFRC_SSTATE_FBACK: break;
+   default:return;
+   }
+
+   /* get packet from history to look up t_recvdata */
+   packet = dccp_tx_hist_find_entry(&hctx->ccid3hctx_hist,
+DCCP_SKB_CB(skb)->dccpd_ack_seq);
+   if (unlikely(packet == NULL)) {
+   DCCP_WARN("%s(%p), seqno %llu(%s) doesn't exist in history!\n",
+ dccp_role(sk), sk,
+ (unsigned long long)DCCP_SKB_CB(skb)->dccpd_ack_seq,
+ dccp_packet_name(DCCP_SKB_CB(skb)->dccpd_type));
+   return;
+   }
+
opt_recv = &hctx->ccid3hctx_options_received;
 
-   switch (hctx->ccid3hctx_state) {
-   case TFRC_SSTATE_NO_FBACK:
-   case TFRC_SSTATE_FBACK:
-   /* get packet from history to look up t_recvdata */
-   packet = dccp_tx_hist_find_entry(&hctx->ccid3hctx_hist,
- DCCP_SKB_CB(skb)->dccpd_ack_seq);
-   if (unlikely(packet == NULL)) {
-   DCCP_WARN("%s(%p), seqno %llu(%s) doesn't exist "
- "in history!\n",  dccp_role(sk), sk,
-   (unsigned long long)DCCP_SKB_CB(skb)->dccpd_ack_seq,
-   dccp_packet_name(DCCP_SKB_CB(skb)->dccpd_type));
-   return;
-   }
+   /* Update receive rate in units of 64 * bytes/second */
+   hctx->ccid3hctx_x_recv = opt_recv->ccid3or_receive_rate;
+   hctx->ccid3hctx_x_recv <<= 6;
 
-   /* Update receive rate in units of 64 * bytes/second */
-   hctx->ccid3hctx_x_recv = opt_recv->ccid3or_receive_rate;
-   hctx->ccid3hctx_x_recv <<= 6;
+   /* Update loss event rate */
+   pinv = opt_recv->ccid3or_loss_event_rate;
+   if (pinv == ~0U || pinv == 0)  /* see RFC 4342, 8.5   */
+   hctx->ccid3hctx_p = 0;
+   else   /* can not exceed 100% */
+   hctx->ccid3hctx_p = 100 / pinv;
 
-   /* Update loss event rate */
-   pinv = opt_recv->ccid3or_loss_event_rate;
-   if (pinv == ~0U || pinv == 0)  /* see RFC 4342, 8.5   */
-   hctx->ccid3hctx_p = 0;
-   else   /* can not exceed 100% */
-   hctx->ccid3hctx_p = 100 / pinv;
+   /*
+* Calculate new round trip sample as per [RFC 3448, 4.3] by
+* R_sample  =  (t_now - t_recvdata) - t_elapsed
+*/
+   skb_get_timestamp(skb, &now);
+   r_sample = dccp_sample_rtt(sk, &now, &packet->dccphtx_tstamp);
 
-   /*
-* Calculate new round trip sample as per [RFC 3448, 4.3] by
-*  R_sample  =  (now - t_recvdata) - t_elapsed
-*/
-   skb_get_timestamp(skb, &now);
-   r_sample = dccp_sample_rtt(sk, &now, &packet->dccphtx_tstamp);
+   /*
+* Update RTT estimate (honours RTT from SYN exchange):
+* If (No RTT sample so far)
+*R = R_sample;
+* Else
+*R = q * R + (1 - q) * R_sample;
+*
+* q is a constant, RFC 3448 recommends 0.9
+*/
+   hctx->ccid3hctx_rtt = hctx->ccid3hctx_rtt == 0 ?  r_sample
+   : (9 * hctx->ccid3hctx_rtt + r_sample) / 10;
 
+   if (hctx->ccid3hctx_state == TFRC_SSTATE_NO_FBACK) {
/*
-* Update RTT estimate 

[PATCH 10/29] Shorten statement for updating p

2007-04-12 Thread Arnaldo Carvalho de Melo
This shortens the statement for updating the loss event
rate p when a feedback packet is received.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Acked-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |7 ++-
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 22ba3f9..c3711e1 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -466,12 +466,9 @@ static void ccid3_hc_tx_packet_recv(struct sock *sk, 
struct sk_buff *skb)
hctx->ccid3hctx_x_recv = opt_recv->ccid3or_receive_rate;
hctx->ccid3hctx_x_recv <<= 6;
 
-   /* Update loss event rate */
+   /* Update loss event rate (scaled by 1e6), cf. RFC 4342, 8.5 */
pinv = opt_recv->ccid3or_loss_event_rate;
-   if (pinv == ~0U || pinv == 0)  /* see RFC 4342, 8.5   */
-   hctx->ccid3hctx_p = 0;
-   else   /* can not exceed 100% */
-   hctx->ccid3hctx_p = 100 / pinv;
+   hctx->ccid3hctx_p = (pinv == ~0U || pinv == 0) ? 0 : scaled_div(1, 
pinv);
 
/*
 * Calculate new round trip sample as per [RFC 3448, 4.3] by
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 11/29] Set idle state after nofeedback timer expiry

2007-04-12 Thread Arnaldo Carvalho de Melo
This corrects a mistake with regard to draft rfc3448bis: the `idle' state was
mistakenly entered before the allowed sending rate X was recomputed. With
regard to draft rfc3448bis, section 4.3 / step (4) (implemented by
ccid3_hc_tx_update_x), which caused erroneous calculation.

Fixed by: moving the update to the `idle' state to the end of no_feedback_timer.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index c3711e1..e166f5c 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -233,8 +233,6 @@ static void ccid3_hc_tx_no_feedback_timer(unsigned long 
data)
ccid3_pr_debug("%s(%p, state=%s) - entry \n", dccp_role(sk), sk,
   ccid3_tx_state_name(hctx->ccid3hctx_state));
 
-   hctx->ccid3hctx_idle = 1;
-
switch (hctx->ccid3hctx_state) {
case TFRC_SSTATE_NO_FBACK:
/* RFC 3448, 4.4: Halve send rate directly */
@@ -294,6 +292,8 @@ static void ccid3_hc_tx_no_feedback_timer(unsigned long 
data)
goto out;
}
 
+   hctx->ccid3hctx_idle = 1;
+
 restart_timer:
sk_reset_timer(sk, &hctx->ccid3hctx_no_feedback_timer,
   jiffies + usecs_to_jiffies(t_nfb));
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 13/29] Nofeedback timer according to rfc3448bis

2007-04-12 Thread Arnaldo Carvalho de Melo
This implements the changes to the nofeedback timer handling suggested
in draft rfc3448bis, section 4.4. In particular, these changes mean:

 * better handling of the lossless case (p == 0)
 * the timestamp for computing t_ld becomes obsolete
 * much more recent document (RFC 34484 is 4 years old)
 * concepts in rfc3448bis arose from a real, working implementation (cf. sec. 
12)

 Further Changes:

 1. The CCID 3 state now changes from FBACK back to NO_FBACK when the 
nofeedback timer
expires - this is required for updating X when feedback is received (cf. 
draft
rfc3448bis, 4.2); this better reflects that no feedback was heard since the 
time has
been started; the corresponding support for X is implemented in subsequent 
patch.
 2. Two more irrelevant states are removed (as in earlier patch for 
packet_sent):
 o TFRC_SSTATE_NO_SENT (logically not possible), and
 o TFRC_SSTATE_TERM(irrelevant, since terminating anyway).

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |   48 
 1 files changed, 20 insertions(+), 28 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 471db59..63ac4ae 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -220,7 +220,6 @@ static void ccid3_hc_tx_no_feedback_timer(unsigned long 
data)
 {
struct sock *sk = (struct sock *)data;
struct ccid3_hc_tx_sock *hctx = ccid3_hc_tx_sk(sk);
-   struct timeval now;
unsigned long t_nfb = USEC_PER_SEC / 5;
 
bh_lock_sock(sk);
@@ -233,24 +232,27 @@ static void ccid3_hc_tx_no_feedback_timer(unsigned long 
data)
ccid3_pr_debug("%s(%p, state=%s) - entry \n", dccp_role(sk), sk,
   ccid3_tx_state_name(hctx->ccid3hctx_state));
 
-   switch (hctx->ccid3hctx_state) {
-   case TFRC_SSTATE_NO_FBACK:
-   /* RFC 3448, 4.4: Halve send rate directly */
+   if (hctx->ccid3hctx_state == TFRC_SSTATE_FBACK)
+   ccid3_hc_tx_set_state(sk, TFRC_SSTATE_NO_FBACK);
+   else if (hctx->ccid3hctx_state != TFRC_SSTATE_NO_FBACK)
+   goto out;
+
+   /*
+* Determine new allowed sending rate X as per draft rfc3448bis, 4.4
+*/
+   if (hctx->ccid3hctx_t_rto == 0 ||   /* no feedback received yet */
+   hctx->ccid3hctx_p == 0) {
+
+   /* halve send rate directly */
hctx->ccid3hctx_x = max(hctx->ccid3hctx_x / 2,
(((__u64)hctx->ccid3hctx_s) << 6) /
TFRC_T_MBI);
-
-   ccid3_pr_debug("%s(%p, state=%s), updated tx rate to %u "
-  "bytes/s\n", dccp_role(sk), sk,
-  ccid3_tx_state_name(hctx->ccid3hctx_state),
-  (unsigned)(hctx->ccid3hctx_x >> 6));
ccid3_update_send_interval(hctx);
-   break;
-   case TFRC_SSTATE_FBACK:
+   } else {
/*
-*  Modify the cached value of X_recv [RFC 3448, 4.4]
+*  Modify the cached value of X_recv
 *
-*  If (p == 0 || X_calc > 2 * X_recv)
+*  If (X_calc > 2 * X_recv)
 *X_recv = max(X_recv / 2, s / (2 * t_mbi));
 *  Else
 *X_recv = X_calc / 4;
@@ -259,29 +261,19 @@ static void ccid3_hc_tx_no_feedback_timer(unsigned long 
data)
 */
BUG_ON(hctx->ccid3hctx_p && !hctx->ccid3hctx_x_calc);
 
-   if (hctx->ccid3hctx_p == 0 ||
-   (hctx->ccid3hctx_x_calc > (hctx->ccid3hctx_x_recv >> 5))) {
-
+   if (hctx->ccid3hctx_x_calc > (hctx->ccid3hctx_x_recv >> 5))
hctx->ccid3hctx_x_recv =
max(hctx->ccid3hctx_x_recv / 2,
(((__u64)hctx->ccid3hctx_s) << 6) /
  (2 * TFRC_T_MBI));
-
-   if (hctx->ccid3hctx_p == 0)
-   do_gettimeofday(&now);
-   } else {
+   else {
hctx->ccid3hctx_x_recv = hctx->ccid3hctx_x_calc;
hctx->ccid3hctx_x_recv <<= 4;
}
-   /* Now recalculate X [RFC 3448, 4.3, step (4)] */
-   ccid3_hc_tx_update_x(sk, &now);
-   break;
-   case TFRC_SSTATE_NO_SENT:
-   DCCP_BUG("%s(%p) - Illegal state NO_SENT", dccp_role(sk), sk);
-   /* fall through */
-   case TFRC_SSTATE_TERM:
-   goto

[PATCH 14/29] Implement rfc3448bis changes to feedback reception

2007-04-12 Thread Arnaldo Carvalho de Melo
This implements the algorithm to update the allowed sending rate X upon
receiving feedback packets, as described in draft rfc3448bis, 4.2/4.3.

Some changes (use of Larger Initial Windows) were already present, this patch
adds the remaining ones. In particular:

 * use t_RTO to distinguish the case "initial feedback packet":
   as per earlier patch, it uses "state == NO_FBACK" to imply that
 (a) either nofeedback timer expired (when t_RTO != 0) or
 (b) initial feedback packet (when t_RTO == 0);
 * no reduction of sending rate afer nofeedback timer expiry when p == 0;
 * implements the clause ``if [...] not the first packet after a nofeedback
   timer'' of draft rfc3448bis, 4.3.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |   64 +--
 1 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 63ac4ae..131001b 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -419,7 +419,6 @@ static void ccid3_hc_tx_packet_sent(struct sock *sk, int 
more,
 
 static void ccid3_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
 {
-   const struct dccp_sock *dp = dccp_sk(sk);
struct ccid3_hc_tx_sock *hctx = ccid3_hc_tx_sk(sk);
struct ccid3_options_received *opt_recv;
struct dccp_tx_hist_entry *packet;
@@ -481,41 +480,45 @@ static void ccid3_hc_tx_packet_recv(struct sock *sk, 
struct sk_buff *skb)
hctx->ccid3hctx_rtt = hctx->ccid3hctx_rtt == 0 ?  r_sample
: (9 * hctx->ccid3hctx_rtt + r_sample) / 10;
 
+   /*
+* Update allowed sending rate as per draft rfc3448bis, 4.2/4.3
+*/
if (hctx->ccid3hctx_state == TFRC_SSTATE_NO_FBACK) {
-   /*
-* Larger Initial Windows [RFC 4342, sec. 5]
-*/
-   hctx->ccid3hctx_x= rfc3390_initial_rate(sk);
-   hctx->ccid3hctx_t_ld = now;
 
-   ccid3_update_send_interval(hctx);
+   ccid3_hc_tx_set_state(sk, TFRC_SSTATE_FBACK);
 
-   ccid3_pr_debug("%s(%p), s=%u, MSS=%u, "
-  "R_sample=%uus, X=%u\n", dccp_role(sk),
-  sk, hctx->ccid3hctx_s,
-  dp->dccps_mss_cache, r_sample,
-  (unsigned)(hctx->ccid3hctx_x >> 6));
+   if (hctx->ccid3hctx_t_rto == 0) {
+   /*
+* Initial feedback packet: Larger Initial Windows (4.2)
+*/
+   hctx->ccid3hctx_x= rfc3390_initial_rate(sk);
+   hctx->ccid3hctx_t_ld = now;
 
-   ccid3_hc_tx_set_state(sk, TFRC_SSTATE_FBACK);
-   } else {
+   ccid3_update_send_interval(hctx);
 
-   /* Update sending rate (step 4 of [RFC 3448, 4.3]) */
-   if (hctx->ccid3hctx_p > 0)
-   hctx->ccid3hctx_x_calc =
-   tfrc_calc_x(hctx->ccid3hctx_s,
-   hctx->ccid3hctx_rtt,
-   hctx->ccid3hctx_p);
-   ccid3_hc_tx_update_x(sk, &now);
-
-   ccid3_pr_debug("%s(%p), RTT=%uus (sample=%uus), s=%u, "
-  "p=%u, X_calc=%u, X_recv=%u, X=%u\n",
-  dccp_role(sk),
-  sk, hctx->ccid3hctx_rtt, r_sample,
-  hctx->ccid3hctx_s, hctx->ccid3hctx_p,
-  hctx->ccid3hctx_x_calc,
-  (unsigned)(hctx->ccid3hctx_x_recv >> 6),
-  (unsigned)(hctx->ccid3hctx_x >> 6));
+   goto done_computing_x;
+
+   } else if (hctx->ccid3hctx_p == 0) {
+   /*
+* First feedback after nofeedback timer expiry (4.3)
+*/
+   goto done_computing_x;
+   }
}
+   /* perform step (4) of draft rfc3448bis, section 4.3 */
+   if (hctx->ccid3hctx_p > 0)
+   hctx->ccid3hctx_x_calc = tfrc_calc_x(hctx->ccid3hctx_s,
+hctx->ccid3hctx_rtt,
+hctx->ccid3hctx_p);
+   ccid3_hc_tx_update_x(sk, &now);
+
+done_computing_x:
+   ccid3_pr_debug("%s(%p), RTT=%uus (sample=%uus), s=%u, p=%u, X_calc=%u, "
+  "X_recv=%u, X=%u\n", dccp_role(sk), sk,
+  hctx->ccid3hctx_rtt, r_sample, hctx->cc

[PATCH 15/29] Disable softirq when sending and notifying CCID

2007-04-12 Thread Arnaldo Carvalho de Melo
This disables softirqs when performing the CCID-specific send operation
in dccp_write_xmit, so that actual sending, and calling the CCID post-send
routine becomes an atomic unit.

Why this needs to be done:

 The function dccp_write_xmit can be called both in user context (via
 dccp_sendmsg) and via timer softirq (dccp_write_xmit_timer). It does

  1. call the CCID-specific `pre-send' routine ccid_hc_tx_send_packet()
  2. ship the skb via dccp_transmit_skb
  3. call the CCID-specific `post-send' routine ccid_hc_tx_packet_sent().

 The last one does e.g. accounting by updating data records (as in CCID 3).

 The transition from 2 ... 3 should be atomic and not be interrupted
 by softirqs.  The reason is that the TX and RX halves of the CCID modules
 share data structures and both halves change state. If the sending process is
 allowed to be interrupted by the reception of a DCCP packet via softirq
 handler, then state and data structures of the sender can become corrupted.

 Here is an actual example whose effects were observed and lead to this patch:
 in CCID 3 the sender records a timestamp when ccid_hc_tx_packet_sent() is 
called.
 If the application is sending via dccp_sendmsg, it may be interrupted and run a
 little while later. Suppose that such interruption happens between steps (2) 
and
 (3) above: the packet has been sent, and immediately afterwards dccp_sendmsg is
 interrupted. Meanwhile the transmitted skb reaches the other side, and an Ack
 comes back; this Ack is processed via softirq (which is allowed to interrupt
 dccp_sendmsg); only then step (3) is performed, but too late: the timestamp
 taken in ccid3_hc_tx_packet_sent is now /after/ the Ack has come in.

 In the observed case, negative RTT samples (i.e. Acks arriving before the
 sent packet was registered) were the result.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Acked-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/output.c |   13 ++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/dccp/output.c b/net/dccp/output.c
index c8d843e..0978bc2 100644
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -250,11 +250,18 @@ void dccp_write_xmit(struct sock *sk, int block)
else
dcb->dccpd_type = DCCP_PKT_DATA;
 
+   /*
+* Transmission and calling the post-send CCID operation
+* must not be interrupted by other processing (e.g.
+* packet reception), otherwise strange errors result.
+*/
+   local_bh_disable();
err = dccp_transmit_skb(sk, skb);
ccid_hc_tx_packet_sent(dp->dccps_hc_tx_ccid, sk, 0, 
len);
-   if (err)
-   DCCP_BUG("err=%d after ccid_hc_tx_packet_sent",
-err);
+   local_bh_enable();
+
+   if (unlikely(err))
+   DCCP_BUG("dccp_transmit_skb returned %d", err);
} else {
dccp_pr_debug("packet discarded due to err=%d\n", err);
kfree_skb(skb);
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 16/29] Preparation to fit TX list locking

2007-04-12 Thread Arnaldo Carvalho de Melo
This patch prepares adding TX history locking by changing all
functions which return a reference to a TX history object:

 1) removal of dccp_tx_hist_head  -  nowhere used in the code;
 2) change dccp_tx_hist_find_entry  -  the sole use is to look up a
timestamp for an Ack-ed packet; and then do garbage-collection.

The latter function is changed to return the desired timestamp instead of
a history object reference.

Not returning object references is necessary, since otherwise lists could
not be garbage-collected (or would have to use atomic reference counts).

Furthermore, the garbage-collection is now integrated into (2), since it
requires first a read access and then a write access at the same place in
the list: giving up the lock here would permit race condition.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c  |   18 +++---
 net/dccp/ccids/lib/packet_history.c |   27 +--
 net/dccp/ccids/lib/packet_history.h |   16 ++--
 3 files changed, 30 insertions(+), 31 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 131001b..0029979 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -421,10 +421,9 @@ static void ccid3_hc_tx_packet_recv(struct sock *sk, 
struct sk_buff *skb)
 {
struct ccid3_hc_tx_sock *hctx = ccid3_hc_tx_sk(sk);
struct ccid3_options_received *opt_recv;
-   struct dccp_tx_hist_entry *packet;
-   struct timeval now;
+   struct timeval now, t_send;
unsigned long t_nfb;
-   u32 pinv, r_sample;
+   u32 pinv, r_sample, rc;
 
BUG_ON(hctx == NULL);
 
@@ -440,10 +439,10 @@ static void ccid3_hc_tx_packet_recv(struct sock *sk, 
struct sk_buff *skb)
default:return;
}
 
-   /* get packet from history to look up t_recvdata */
-   packet = dccp_tx_hist_find_entry(&hctx->ccid3hctx_hist,
-DCCP_SKB_CB(skb)->dccpd_ack_seq);
-   if (unlikely(packet == NULL)) {
+   /* look up t_recvdata in packet history */
+   rc = dccp_tx_hist_get_send_time(ccid3_tx_hist, &hctx->ccid3hctx_hist,
+   DCCP_SKB_CB(skb)->dccpd_ack_seq, 
&t_send);
+   if (unlikely(!rc)) {
DCCP_WARN("%s(%p), seqno %llu(%s) doesn't exist in history!\n",
  dccp_role(sk), sk,
  (unsigned long long)DCCP_SKB_CB(skb)->dccpd_ack_seq,
@@ -466,7 +465,7 @@ static void ccid3_hc_tx_packet_recv(struct sock *sk, struct 
sk_buff *skb)
 * R_sample  =  (t_now - t_recvdata) - t_elapsed
 */
skb_get_timestamp(skb, &now);
-   r_sample = dccp_sample_rtt(sk, &now, &packet->dccphtx_tstamp);
+   r_sample = dccp_sample_rtt(sk, &now, &t_send);
 
/*
 * Update RTT estimate (honours RTT from SYN exchange):
@@ -523,9 +522,6 @@ done_computing_x:
/* unschedule no feedback timer */
sk_stop_timer(sk, &hctx->ccid3hctx_no_feedback_timer);
 
-   /* remove all packets older than the one acked from history */
-   dccp_tx_hist_purge_older(ccid3_tx_hist,
-&hctx->ccid3hctx_hist, packet);
/*
 * As we have calculated new ipi, delta, t_nom it is possible
 * that we now can send a packet, so wake up dccp_wait_for_ccid
diff --git a/net/dccp/ccids/lib/packet_history.c 
b/net/dccp/ccids/lib/packet_history.c
index 2e8ef42..d149b19 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -85,21 +85,36 @@ void dccp_tx_hist_delete(struct dccp_tx_hist *hist)
 
 EXPORT_SYMBOL_GPL(dccp_tx_hist_delete);
 
-struct dccp_tx_hist_entry *
-   dccp_tx_hist_find_entry(const struct list_head *list, const u64 seq)
+/**
+ *  dccp_tx_hist_get_send_time  -  Retrieve timestamp of sent packet
+ *
+ *  @hist:   history slab associated with TX history
+ *  @list:   TX history list head
+ *  @seq:48-bit sequence number of packet in question
+ *  @t_send: timestamp to be returned
+ *  Returns 1 on success and then garbage-collects all older entries.
+ */
+int dccp_tx_hist_get_send_time(struct dccp_tx_hist *hist,
+  struct list_head *list, u64 seq,
+  struct timeval *t_send)
 {
-   struct dccp_tx_hist_entry *packet = NULL, *entry;
+   struct dccp_tx_hist_entry *entry;
+   int found = 0;
 
list_for_each_entry(entry, list, dccphtx_node)
if (entry->dccphtx_seqno == seq) {
-   packet = entry;
+   found = 1;
+   *t_send = entry->dccphtx_tstamp;
break;
}
 
-   return packet;
+   if (found)
+  

[PATCH 17/29] Locking for packet history TX list

2007-04-12 Thread Arnaldo Carvalho de Melo
Currently the packet history lists are accessed asynchronously from different
contexts, but without any locking protection. This can lead to race conditions,
corruption and facilitates unpredictable behaviour.

In particular, this patch

 1) adds a R/W spinlock to protect the list against concurrent access;
 2) makes dccp_tx_hist_add_entry local to packet_history.c
 - locking should only be visible in the source file, but not the header 
file.

(NB - there are at the moment no pure readers for this list, so in principle a
 normal spinlock could alternatively be used).

  D e t a i l s
  -
 The question is whether to disable softirqs.

 (1) dccp_tx_hist_get_send_time is called in the following, simplified manner:

ccid3_hc_tx_packet_recv  <-  dccp_v{4,6}_do_rcv  <-  dccp_v{4,6}_rcv
 <-  release_sock

 dccp_v{4,6}_do_recv is called as backlog handler for dccp_v{4,6}_rcv, which
 means softirq context (via dccp_v{4,6}_rcv -> sk_receive_skb). The second
 alternative (release_sock) is in user context.

 (2) dccp_tx_hist_purge is only called in ccid3_hc_tx_exit, in this manner:

 ccid3_hc_tx_exit <- ccid_delete <- ccid_hc_tx_delete <- 
dccp_feat_update_ccid
  <- 
dccp_create_openreq_child
  <- dccp_init_sock
  <- dccp_destroy_sock

 Since dccp_tx_hist_purge is called at the end of an old or a new 
connection,
 disabling softirqs is not strictly necessary.

 (3) dccp_tx_hist_add_entry is called in mixed contexts:
 ccid3_hc_tx_packet_sent <- dccp_write_xmit <- dccp_write_xmit_timer 
[Softirq]
<- dccp_send_close   [User 
Context]
<- dccp_sendmsg  [User 
Context]

 Since the previous patch disables softirqs generally when 
ccid_hc_tx_packet_sent is
 called, disabling softirqs here is also not strictly necessary.

Due to the different context in (1) and so as not to make too many assumptions, 
I think
that the safest thing to do is to disable softirqs in all three cases. This 
prevents
strange problems should the code change somewhat in the future.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/lib/packet_history.c |   17 +
 net/dccp/ccids/lib/packet_history.h |6 +-
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/net/dccp/ccids/lib/packet_history.c 
b/net/dccp/ccids/lib/packet_history.c
index d149b19..a5dc450 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -41,6 +41,8 @@
 /*
  * Transmitter History Routines
  */
+DEFINE_RWLOCK(dccp_tx_hist_lock);
+
 struct dccp_tx_hist *dccp_tx_hist_new(const char *name)
 {
struct dccp_tx_hist *hist = kmalloc(sizeof(*hist), GFP_ATOMIC);
@@ -101,6 +103,7 @@ int dccp_tx_hist_get_send_time(struct dccp_tx_hist *hist,
struct dccp_tx_hist_entry *entry;
int found = 0;
 
+   write_lock_bh(&dccp_tx_hist_lock);
list_for_each_entry(entry, list, dccphtx_node)
if (entry->dccphtx_seqno == seq) {
found = 1;
@@ -110,24 +113,38 @@ int dccp_tx_hist_get_send_time(struct dccp_tx_hist *hist,
 
if (found)
dccp_tx_hist_purge_older(hist, list, entry);
+   write_unlock_bh(&dccp_tx_hist_lock);
 
return found;
 }
 
 EXPORT_SYMBOL_GPL(dccp_tx_hist_get_send_time);
 
+void dccp_tx_hist_add_entry(struct list_head *list,
+   struct dccp_tx_hist_entry *entry)
+{
+   write_lock_bh(&dccp_tx_hist_lock);
+   list_add(&entry->dccphtx_node, list);
+   write_unlock_bh(&dccp_tx_hist_lock);
+}
+
+EXPORT_SYMBOL_GPL(dccp_tx_hist_add_entry);
+
 void dccp_tx_hist_purge(struct dccp_tx_hist *hist, struct list_head *list)
 {
struct dccp_tx_hist_entry *entry, *next;
 
+   write_lock_bh(&dccp_tx_hist_lock);
list_for_each_entry_safe(entry, next, list, dccphtx_node) {
list_del_init(&entry->dccphtx_node);
dccp_tx_hist_entry_delete(hist, entry);
}
+   write_unlock_bh(&dccp_tx_hist_lock);
 }
 
 EXPORT_SYMBOL_GPL(dccp_tx_hist_purge);
 
+/* XXX careful, this one is not lock-protected */
 void dccp_tx_hist_purge_older(struct dccp_tx_hist *hist,
  struct list_head *list,
  struct dccp_tx_hist_entry *packet)
diff --git a/net/dccp/ccids/lib/packet_history.h 
b/net/dccp/ccids/lib/packet_history.h
index da81709..402ac90 100644
--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccid

[PATCH 18/29] Reduce time spent while write-lock is held

2007-04-12 Thread Arnaldo Carvalho de Melo
This minimizes the time during which the `write_lock' is held: by using a
temporary list_head and re-adjusting the pointers of the old list accordingly.
The clean-up operation can then safely be done outside the lock, since the
newly created list is then `homeless'.

 1. For dccp_tx_hist_purge, this is conveniently done by using 
list_replace_init,

 2. the case for dccp_tx_hist_purge_older is analogous and, in the absence of a
matching library function, done manually;

 3. the `dangerous' function dccp_tx_hist_purge_older has been removed by
merging its functionality into dccp_tx_hist_get_send_time.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/lib/packet_history.c |   47 ++
 net/dccp/ccids/lib/packet_history.h |4 ---
 2 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/net/dccp/ccids/lib/packet_history.c 
b/net/dccp/ccids/lib/packet_history.c
index a5dc450..02bf58f 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -87,6 +87,16 @@ void dccp_tx_hist_delete(struct dccp_tx_hist *hist)
 
 EXPORT_SYMBOL_GPL(dccp_tx_hist_delete);
 
+static void __dccp_tx_hist_wipe(struct list_head *head, struct kmem_cache 
*slab)
+{
+   struct dccp_tx_hist_entry *entry, *next;
+
+   list_for_each_entry_safe(entry, next, head, dccphtx_node) {
+   list_del(&entry->dccphtx_node);
+   kmem_cache_free(slab, entry);
+   }
+}
+
 /**
  *  dccp_tx_hist_get_send_time  -  Retrieve timestamp of sent packet
  *
@@ -101,6 +111,7 @@ int dccp_tx_hist_get_send_time(struct dccp_tx_hist *hist,
   struct timeval *t_send)
 {
struct dccp_tx_hist_entry *entry;
+   struct list_head free_this;
int found = 0;
 
write_lock_bh(&dccp_tx_hist_lock);
@@ -108,12 +119,20 @@ int dccp_tx_hist_get_send_time(struct dccp_tx_hist *hist,
if (entry->dccphtx_seqno == seq) {
found = 1;
*t_send = entry->dccphtx_tstamp;
+   /* forget/free all entries older than this one */
+   free_this.next  = entry->dccphtx_node.next;
+   free_this.next->prev= &free_this;
+   free_this.prev  = list->prev;
+   free_this.prev->next= &free_this;
+
+   entry->dccphtx_node.next = list;
+   list->prev   = &entry->dccphtx_node;
break;
}
+   write_unlock_bh(&dccp_tx_hist_lock);
 
if (found)
-   dccp_tx_hist_purge_older(hist, list, entry);
-   write_unlock_bh(&dccp_tx_hist_lock);
+   __dccp_tx_hist_wipe(&free_this, hist->dccptxh_slab);
 
return found;
 }
@@ -132,32 +151,16 @@ EXPORT_SYMBOL_GPL(dccp_tx_hist_add_entry);
 
 void dccp_tx_hist_purge(struct dccp_tx_hist *hist, struct list_head *list)
 {
-   struct dccp_tx_hist_entry *entry, *next;
+   struct list_head free_this;
 
write_lock_bh(&dccp_tx_hist_lock);
-   list_for_each_entry_safe(entry, next, list, dccphtx_node) {
-   list_del_init(&entry->dccphtx_node);
-   dccp_tx_hist_entry_delete(hist, entry);
-   }
+   list_replace_init(list, &free_this);
write_unlock_bh(&dccp_tx_hist_lock);
-}
-
-EXPORT_SYMBOL_GPL(dccp_tx_hist_purge);
 
-/* XXX careful, this one is not lock-protected */
-void dccp_tx_hist_purge_older(struct dccp_tx_hist *hist,
- struct list_head *list,
- struct dccp_tx_hist_entry *packet)
-{
-   struct dccp_tx_hist_entry *next;
-
-   list_for_each_entry_safe_continue(packet, next, list, dccphtx_node) {
-   list_del_init(&packet->dccphtx_node);
-   dccp_tx_hist_entry_delete(hist, packet);
-   }
+   __dccp_tx_hist_wipe(&free_this, hist->dccptxh_slab);
 }
 
-EXPORT_SYMBOL_GPL(dccp_tx_hist_purge_older);
+EXPORT_SYMBOL_GPL(dccp_tx_hist_purge);
 
 /*
  * Receiver History Routines
diff --git a/net/dccp/ccids/lib/packet_history.h 
b/net/dccp/ccids/lib/packet_history.h
index 402ac90..6208188 100644
--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -95,10 +95,6 @@ static inline void dccp_tx_hist_entry_delete(struct 
dccp_tx_hist *hist,
 extern void dccp_tx_hist_purge(struct dccp_tx_hist *hist,
   struct list_head *list);
 
-extern void dccp_tx_hist_purge_older(struct dccp_tx_hist *hist,
-struct list_head *list,
-struct dccp_tx_hist_entry *next);
-
 /*
  * Receiver History data structures and declarations
  */
-- 
1.5.0.6

-
To u

[PATCH 12/29] Use t_RTO as indicator for feedback reception

2007-04-12 Thread Arnaldo Carvalho de Melo
 This patch uses t_RTO instead of ccid3hctx_state to check whether upon
 expiry of the nofeedback timer feedback has been received yet.

 Justification:

  Functionally, there is no change yet using this patch, but it is necessary
  for the support of nofeedback handling as per rfc3448bis (subsequent patch).

  The main advantage is that ccid3hctx_state can now be used to toggle between
  the reception of feedback and no feedback as in "the nofeedback timer has
  expired". This is more efficient, since the state "packet sent but no initial
  feedback" happens only once during the lifetime of a connection, whereas the
  other state change (feedback <-> nofeedback timer expires) happens often.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |   19 ++-
 1 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index e166f5c..471db59 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -244,9 +244,6 @@ static void ccid3_hc_tx_no_feedback_timer(unsigned long 
data)
   "bytes/s\n", dccp_role(sk), sk,
   ccid3_tx_state_name(hctx->ccid3hctx_state),
   (unsigned)(hctx->ccid3hctx_x >> 6));
-   /* The value of R is still undefined and so we can not recompute
-* the timout value. Keep initial value as per [RFC 4342, 5]. */
-   t_nfb = TFRC_INITIAL_TIMEOUT;
ccid3_update_send_interval(hctx);
break;
case TFRC_SSTATE_FBACK:
@@ -278,12 +275,6 @@ static void ccid3_hc_tx_no_feedback_timer(unsigned long 
data)
}
/* Now recalculate X [RFC 3448, 4.3, step (4)] */
ccid3_hc_tx_update_x(sk, &now);
-   /*
-* Schedule no feedback timer to expire in
-* max(t_RTO, 2 * s/X)  =  max(t_RTO, 2 * t_ipi)
-* See comments in packet_recv() regarding the value of t_RTO.
-*/
-   t_nfb = max(hctx->ccid3hctx_t_rto, 2 * hctx->ccid3hctx_t_ipi);
break;
case TFRC_SSTATE_NO_SENT:
DCCP_BUG("%s(%p) - Illegal state NO_SENT", dccp_role(sk), sk);
@@ -294,6 +285,15 @@ static void ccid3_hc_tx_no_feedback_timer(unsigned long 
data)
 
hctx->ccid3hctx_idle = 1;
 
+   /*
+* Set new timeout for the nofeedback timer.
+* See comments in packet_recv() regarding the value of t_RTO.
+*/
+   if (unlikely(hctx->ccid3hctx_t_rto == 0))   /* no feedback yet */
+   t_nfb = TFRC_INITIAL_TIMEOUT;
+   else
+   t_nfb = max(hctx->ccid3hctx_t_rto, 2 * hctx->ccid3hctx_t_ipi);
+
 restart_timer:
sk_reset_timer(sk, &hctx->ccid3hctx_no_feedback_timer,
   jiffies + usecs_to_jiffies(t_nfb));
@@ -631,6 +631,7 @@ static int ccid3_hc_tx_init(struct ccid *ccid, struct sock 
*sk)
 
hctx->ccid3hctx_s = 0;
hctx->ccid3hctx_rtt   = 0;
+   hctx->ccid3hctx_t_rto = 0;
hctx->ccid3hctx_state = TFRC_SSTATE_NO_SENT;
INIT_LIST_HEAD(&hctx->ccid3hctx_hist);
 
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 19/29] Remove unused fields in packet history structure

2007-04-12 Thread Arnaldo Carvalho de Melo
This removes two fields of the TX packet history structure which
are not referenced by the CCID 3 code and are not needed:

 * dccphtx_sent  -  is redundant since the fact that an entry is
   present in the TX history in itself is an indication that the
   packet has been sent (cf. dccp_write_xmit and ccid3_hc_tx_packet_sent);

 * dccphtx_rtt  -  is nowhere referenced and is not even required:
   the `Preventing Oscillations' mechanism in [RFC 3448, 4.5] uses a
   moving-average, but does not require to memorize past RTTs;
   the history field has no further use;

As a further benefit, the history entry size is reduced.

NB : Adding the newly created entry is now at the end of packet_sent - if it is
 in the history before being fully filled in, list corruption is possible.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Acked-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c  |4 +---
 net/dccp/ccids/lib/packet_history.h |   12 ++--
 2 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 0029979..cd3565f 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -408,13 +408,11 @@ static void ccid3_hc_tx_packet_sent(struct sock *sk, int 
more,
DCCP_CRIT("packet history - out of memory!");
return;
}
-   dccp_tx_hist_add_entry(&hctx->ccid3hctx_hist, packet);
 
do_gettimeofday(&now);
packet->dccphtx_tstamp = now;
packet->dccphtx_seqno  = dccp_sk(sk)->dccps_gss;
-   packet->dccphtx_rtt= hctx->ccid3hctx_rtt;
-   packet->dccphtx_sent   = 1;
+   dccp_tx_hist_add_entry(&hctx->ccid3hctx_hist, packet);
 }
 
 static void ccid3_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
diff --git a/net/dccp/ccids/lib/packet_history.h 
b/net/dccp/ccids/lib/packet_history.h
index 6208188..78ef50f 100644
--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -54,9 +54,7 @@
  */
 struct dccp_tx_hist_entry {
struct list_head dccphtx_node;
-   u64  dccphtx_seqno:48,
-dccphtx_sent:1;
-   u32  dccphtx_rtt;
+   u64  dccphtx_seqno;
struct timeval   dccphtx_tstamp;
 };
 
@@ -71,13 +69,7 @@ static inline struct dccp_tx_hist_entry *
dccp_tx_hist_entry_new(struct dccp_tx_hist *hist,
   const gfp_t prio)
 {
-   struct dccp_tx_hist_entry *entry = kmem_cache_alloc(hist->dccptxh_slab,
-   prio);
-
-   if (entry != NULL)
-   entry->dccphtx_sent = 0;
-
-   return entry;
+   return kmem_cache_alloc(hist->dccptxh_slab, prio);
 }
 
 extern int dccp_tx_hist_get_send_time(struct dccp_tx_hist *, struct list_head 
*,
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 20/29] Add socket option to query the current MPS

2007-04-12 Thread Arnaldo Carvalho de Melo
This enables applications to query the current value of the Maximum Packet
Size via a socket option, suggested as a SHOULD in (RFC 4340, p. 102).

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 Documentation/networking/dccp.txt |3 +++
 include/linux/dccp.h  |1 +
 net/dccp/proto.c  |4 
 3 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/Documentation/networking/dccp.txt 
b/Documentation/networking/dccp.txt
index 4504cc5..521e2dd 100644
--- a/Documentation/networking/dccp.txt
+++ b/Documentation/networking/dccp.txt
@@ -41,6 +41,9 @@ the socket will fall back to 0 (which means that no 
meaningful service code
 is present). Connecting sockets set at most one service option; for
 listening sockets, multiple service codes can be specified.
 
+DCCP_SOCKOPT_GET_CUR_MPS is read-only and retrieves the current maximum packet
+size (application payload size) in bytes, see RFC 4340, section 14.
+
 DCCP_SOCKOPT_SEND_CSCOV and DCCP_SOCKOPT_RECV_CSCOV are used for setting the
 partial checksum coverage (RFC 4340, sec. 9.2). The default is that checksums
 always cover the entire packet and that only fully covered application data is
diff --git a/include/linux/dccp.h b/include/linux/dccp.h
index c297212..dcaa1ac 100644
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -204,6 +204,7 @@ struct dccp_so_feat {
 #define DCCP_SOCKOPT_SERVICE   2
 #define DCCP_SOCKOPT_CHANGE_L  3
 #define DCCP_SOCKOPT_CHANGE_R  4
+#define DCCP_SOCKOPT_GET_CUR_MPS   5
 #define DCCP_SOCKOPT_SEND_CSCOV10
 #define DCCP_SOCKOPT_RECV_CSCOV11
 #define DCCP_SOCKOPT_CCID_RX_INFO  128
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 7337e39..e00359d 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -586,6 +586,10 @@ static int do_dccp_getsockopt(struct sock *sk, int level, 
int optname,
case DCCP_SOCKOPT_SERVICE:
return dccp_getsockopt_service(sk, len,
   (__be32 __user *)optval, optlen);
+   case DCCP_SOCKOPT_GET_CUR_MPS:
+   val = dp->dccps_mss_cache;
+   len = sizeof(val);
+   break;
case DCCP_SOCKOPT_SEND_CSCOV:
val = dp->dccps_pcslen;
len = sizeof(val);
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 21/29] Wait for CCID

2007-04-12 Thread Arnaldo Carvalho de Melo
This performs a minor optimisation: when ccid_hc_tx_send_packet returns a value
greater than zero, then the same call previously was done again at the begin of
the while loop in dccp_wait_for_ccid.

This patch exploits the available information and schedule-timeouts directly
instead.

Documentation also added.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/output.c |   34 +++---
 1 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/net/dccp/output.c b/net/dccp/output.c
index 0978bc2..9ca44cb 100644
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -174,34 +174,38 @@ void dccp_write_space(struct sock *sk)
 
 /**
  * dccp_wait_for_ccid - Wait for ccid to tell us we can send a packet
- * @sk: socket to wait for
+ * @sk:socket to wait for
+ * @skb:   current skb to pass on for waiting
+ * @delay: sleep timeout in milliseconds (> 0)
+ * This function is called by default when the socket is closed, and
+ * when a non-zero linger time is set on the socket. For consistency
  */
-static int dccp_wait_for_ccid(struct sock *sk, struct sk_buff *skb)
+static int dccp_wait_for_ccid(struct sock *sk, struct sk_buff *skb, int delay)
 {
struct dccp_sock *dp = dccp_sk(sk);
DEFINE_WAIT(wait);
-   unsigned long delay;
+   unsigned long jiffdelay;
int rc;
 
-   while (1) {
+   do {
+   dccp_pr_debug("delayed send by %d msec\n", delay);
+   jiffdelay = msecs_to_jiffies(delay);
+
prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
 
+   sk->sk_write_pending++;
+   release_sock(sk);
+   schedule_timeout(jiffdelay);
+   lock_sock(sk);
+   sk->sk_write_pending--;
+
if (sk->sk_err)
goto do_error;
if (signal_pending(current))
goto do_interrupted;
 
rc = ccid_hc_tx_send_packet(dp->dccps_hc_tx_ccid, sk, skb);
-   if (rc <= 0)
-   break;
-   dccp_pr_debug("delayed send by %d msec\n", rc);
-   delay = msecs_to_jiffies(rc);
-   sk->sk_write_pending++;
-   release_sock(sk);
-   schedule_timeout(delay);
-   lock_sock(sk);
-   sk->sk_write_pending--;
-   }
+   } while ((delay = rc) > 0);
 out:
finish_wait(sk->sk_sleep, &wait);
return rc;
@@ -228,7 +232,7 @@ void dccp_write_xmit(struct sock *sk, int block)
msecs_to_jiffies(err)+jiffies);
break;
} else
-   err = dccp_wait_for_ccid(sk, skb);
+   err = dccp_wait_for_ccid(sk, skb, err);
if (err && err != -EINTR)
DCCP_BUG("err=%d after dccp_wait_for_ccid", 
err);
}
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 22/29] Make all `debug' parameters bool

2007-04-12 Thread Arnaldo Carvalho de Melo
This just sets the parameter to bool, since debugging messages
are either on or off.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Acked-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid2.c |2 +-
 net/dccp/ccids/ccid3.c |2 +-
 net/dccp/proto.c   |2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/dccp/ccids/ccid2.c b/net/dccp/ccids/ccid2.c
index 248d20f..67a3d23 100644
--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -835,7 +835,7 @@ static struct ccid_operations ccid2 = {
 };
 
 #ifdef CONFIG_IP_DCCP_CCID2_DEBUG
-module_param(ccid2_debug, int, 0444);
+module_param(ccid2_debug, bool, 0444);
 MODULE_PARM_DESC(ccid2_debug, "Enable debug messages");
 #endif
 
diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index cd3565f..590c002 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -1215,7 +1215,7 @@ static struct ccid_operations ccid3 = {
 };
 
 #ifdef CONFIG_IP_DCCP_CCID3_DEBUG
-module_param(ccid3_debug, int, 0444);
+module_param(ccid3_debug, bool, 0444);
 MODULE_PARM_DESC(ccid3_debug, "Enable debug messages");
 #endif
 
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index e00359d..a93bb3d 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -991,7 +991,7 @@ MODULE_PARM_DESC(thash_entries, "Number of ehash buckets");
 
 #ifdef CONFIG_IP_DCCP_DEBUG
 int dccp_debug;
-module_param(dccp_debug, int, 0444);
+module_param(dccp_debug, bool, 0444);
 MODULE_PARM_DESC(dccp_debug, "Enable debug messages");
 
 EXPORT_SYMBOL_GPL(dccp_debug);
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 23/29] Unaligned pointer access

2007-04-12 Thread Arnaldo Carvalho de Melo
This fixes `unaligned (read) access' errors of the type

 Kernel unaligned access at TPC[100f970c] dccp_parse_options+0x4f4/0x7e0 [dccp]
 Kernel unaligned access at TPC[1011f2e4] ccid3_hc_tx_parse_options+0x1ac/0x380 
[dccp_ccid3]
 Kernel unaligned access at TPC[100f9898] dccp_parse_options+0x680/0x880 [dccp]

by using the get_unaligned macro for parsing options.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |9 +
 net/dccp/ccids/ccid3.h |1 +
 net/dccp/options.c |   34 ++
 3 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 590c002..9474331 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -560,6 +560,7 @@ static int ccid3_hc_tx_parse_options(struct sock *sk, 
unsigned char option,
const struct dccp_sock *dp = dccp_sk(sk);
struct ccid3_hc_tx_sock *hctx = ccid3_hc_tx_sk(sk);
struct ccid3_options_received *opt_recv;
+   u32 opt_val;
 
BUG_ON(hctx == NULL);
 
@@ -581,8 +582,8 @@ static int ccid3_hc_tx_parse_options(struct sock *sk, 
unsigned char option,
  dccp_role(sk), sk, len);
rc = -EINVAL;
} else {
-   opt_recv->ccid3or_loss_event_rate =
-   ntohl(*(__be32 *)value);
+   opt_val = get_unaligned((u32 *)value);
+   opt_recv->ccid3or_loss_event_rate = ntohl(opt_val);
ccid3_pr_debug("%s(%p), LOSS_EVENT_RATE=%u\n",
   dccp_role(sk), sk,
   opt_recv->ccid3or_loss_event_rate);
@@ -603,8 +604,8 @@ static int ccid3_hc_tx_parse_options(struct sock *sk, 
unsigned char option,
  dccp_role(sk), sk, len);
rc = -EINVAL;
} else {
-   opt_recv->ccid3or_receive_rate =
-   ntohl(*(__be32 *)value);
+   opt_val = get_unaligned((u32 *)value);
+   opt_recv->ccid3or_receive_rate = ntohl(opt_val);
ccid3_pr_debug("%s(%p), RECEIVE_RATE=%u\n",
   dccp_role(sk), sk,
   opt_recv->ccid3or_receive_rate);
diff --git a/net/dccp/ccids/ccid3.h b/net/dccp/ccids/ccid3.h
index 8d31b38..fbef100 100644
--- a/net/dccp/ccids/ccid3.h
+++ b/net/dccp/ccids/ccid3.h
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "../ccid.h"
 
 /* Two seconds as per RFC 3448 4.2 */
diff --git a/net/dccp/options.c b/net/dccp/options.c
index 72c11ef..3e11f18 100644
--- a/net/dccp/options.c
+++ b/net/dccp/options.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -68,7 +69,7 @@ int dccp_parse_options(struct sock *sk, struct sk_buff *skb)
struct dccp_options_received *opt_recv = &dp->dccps_options_received;
unsigned char opt, len;
unsigned char *value;
-   u32 elapsed_time;
+   u32 elapsed_time, opt_val;
int rc;
int mandatory = 0;
 
@@ -155,7 +156,8 @@ int dccp_parse_options(struct sock *sk, struct sk_buff *skb)
if (len != 4)
goto out_invalid_option;
 
-   opt_recv->dccpor_timestamp = ntohl(*(__be32 *)value);
+   opt_val = get_unaligned((u32 *)value);
+   opt_recv->dccpor_timestamp = ntohl(opt_val);
 
dp->dccps_timestamp_echo = opt_recv->dccpor_timestamp;
do_gettimeofday(&dp->dccps_timestamp_time);
@@ -169,7 +171,8 @@ int dccp_parse_options(struct sock *sk, struct sk_buff *skb)
if (len != 4 && len != 6 && len != 8)
goto out_invalid_option;
 
-   opt_recv->dccpor_timestamp_echo = ntohl(*(__be32 
*)value);
+   opt_val = get_unaligned((u32 *)value);
+   opt_recv->dccpor_timestamp_echo = ntohl(opt_val);
 
dccp_pr_debug("%s rx opt: TIMESTAMP_ECHO=%u, len=%d, "
  "ackno=%llu", dccp_role(sk),
@@ -178,16 +181,20 @@ int dccp_parse_options(struct sock *sk, struct sk_buff 
*skb)
  (unsigned long long)
  DCCP_SKB_CB(skb)->dccpd_ack_seq);
 
+   value += 4;
 
-   if (len == 4) {
+   if (len == 4) { /* no elapsed time included */

[PATCH 24/29] Macro for moving average

2007-04-12 Thread Arnaldo Carvalho de Melo
The moving average computation occurs so frequently in the CCID 3 code that
it merits a macro of its own.

Committer note: changed the patch to have it as an inline that returns the new
value, keeping the logic.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Acked-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c|8 +++-
 net/dccp/ccids/lib/tfrc.h |   11 +++
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 9474331..428aa92 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -183,7 +183,7 @@ static inline void ccid3_hc_tx_update_s(struct 
ccid3_hc_tx_sock *hctx, int len)
 {
const u16 old_s = hctx->ccid3hctx_s;
 
-   hctx->ccid3hctx_s = old_s == 0 ? len : (9 * old_s + len) / 10;
+   hctx->ccid3hctx_s = tfrc_ewma(hctx->ccid3hctx_s, len, 9);
 
if (hctx->ccid3hctx_s != old_s)
ccid3_update_send_interval(hctx);
@@ -474,8 +474,7 @@ static void ccid3_hc_tx_packet_recv(struct sock *sk, struct 
sk_buff *skb)
 *
 * q is a constant, RFC 3448 recommends 0.9
 */
-   hctx->ccid3hctx_rtt = hctx->ccid3hctx_rtt == 0 ?  r_sample
-   : (9 * hctx->ccid3hctx_rtt + r_sample) / 10;
+   hctx->ccid3hctx_rtt = tfrc_ewma(hctx->ccid3hctx_rtt, r_sample, 9);
 
/*
 * Update allowed sending rate as per draft rfc3448bis, 4.2/4.3
@@ -724,8 +723,7 @@ static inline void ccid3_hc_rx_update_s(struct 
ccid3_hc_rx_sock *hcrx, int len)
if (unlikely(len == 0)) /* don't update on empty packets (e.g. ACKs) */
ccid3_pr_debug("Packet payload length is 0 - not updating\n");
else
-   hcrx->ccid3hcrx_s = hcrx->ccid3hcrx_s == 0 ? len :
-   (9 * hcrx->ccid3hcrx_s + len) / 10;
+   hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9);
 }
 
 static void ccid3_hc_rx_send_feedback(struct sock *sk)
diff --git a/net/dccp/ccids/lib/tfrc.h b/net/dccp/ccids/lib/tfrc.h
index faf5f7e..6addc23 100644
--- a/net/dccp/ccids/lib/tfrc.h
+++ b/net/dccp/ccids/lib/tfrc.h
@@ -37,6 +37,17 @@ static inline u32 scaled_div32(u64 a, u32 b)
return result;
 }
 
+/**
+ * Exponentially weighted moving average
+ * @weight: Weight to be used as damping factor, in units of 1/10
+ */
+static inline u32 tfrc_ewma(const u32 val, const u32 newval, const u8 weight)
+{
+   if (val != 0)
+   return (weight * val + (10 - weight) * newval) / 10;
+   return newval;
+}
+
 extern u32 tfrc_calc_x(u16 s, u32 R, u32 p);
 extern u32 tfrc_calc_x_reverse_lookup(u32 fvalue);
 
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 25/29] Add test function for data packets

2007-04-12 Thread Arnaldo Carvalho de Melo
This adds a function which returns `true' when an skb contains one of the packet
types specified in [RFC 4340, 7.7] as `data packet'.

NB - Resisted the temptation to define
 int dccp_non_data_packet(skb) { return !dccp_data_packet(skb); },
 since maybe someone will define new packet types.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Acked-by: Ian McDonald <[EMAIL PROTECTED]
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/dccp.h |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index b00fc7a..0dc4b7e 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -320,6 +320,7 @@ struct dccp_skb_cb {
 
 #define DCCP_SKB_CB(__skb) ((struct dccp_skb_cb *)&((__skb)->cb[0]))
 
+/* RFC 4340, sec. 7.7 */
 static inline int dccp_non_data_packet(const struct sk_buff *skb)
 {
const __u8 type = DCCP_SKB_CB(skb)->dccpd_type;
@@ -332,6 +333,17 @@ static inline int dccp_non_data_packet(const struct 
sk_buff *skb)
   type == DCCP_PKT_SYNCACK;
 }
 
+/* RFC 4340, sec. 7.7 */
+static inline int dccp_data_packet(const struct sk_buff *skb)
+{
+   const __u8 type = DCCP_SKB_CB(skb)->dccpd_type;
+
+   return type == DCCP_PKT_DATA ||
+  type == DCCP_PKT_DATAACK  ||
+  type == DCCP_PKT_REQUEST  ||
+  type == DCCP_PKT_RESPONSE;
+}
+
 static inline int dccp_packet_without_ack(const struct sk_buff *skb)
 {
const __u8 type = DCCP_SKB_CB(skb)->dccpd_type;
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 26/29] Modulo-16 arithmetic for window counters

2007-04-12 Thread Arnaldo Carvalho de Melo
This implements modulo-16 arithmetic which is needed to compare
the 4-bit CCID 3 window counter (CCVal) values; it respects
circular wraparound and returns a-b mod 16.

Implemented as a macro, since required in several places.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c  |4 +---
 net/dccp/ccids/lib/packet_history.c |4 +---
 net/dccp/ccids/lib/packet_history.h |3 ++-
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 428aa92..6b1cc1b 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -841,9 +841,7 @@ static u32 ccid3_hc_rx_calc_first_li(struct sock *sk)
step = 1;
break;
case 1:
-   interval = win_count - entry->dccphrx_ccval;
-   if (interval < 0)
-   interval += TFRC_WIN_COUNT_LIMIT;
+   interval = SUB16(win_count, 
entry->dccphrx_ccval);
if (interval > 4)
goto found;
break;
diff --git a/net/dccp/ccids/lib/packet_history.c 
b/net/dccp/ccids/lib/packet_history.c
index 02bf58f..13cae61 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -291,9 +291,7 @@ void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist,
win_count = entry->dccphrx_ccval;
break;
case 2:
-   tmp = win_count - entry->dccphrx_ccval;
-   if (tmp < 0)
-   tmp += TFRC_WIN_COUNT_LIMIT;
+   tmp = SUB16(win_count, 
entry->dccphrx_ccval);
if (tmp > TFRC_WIN_COUNT_PER_RTT + 1) {
/*
 * We have found a packet older
diff --git a/net/dccp/ccids/lib/packet_history.h 
b/net/dccp/ccids/lib/packet_history.h
index 78ef50f..9a27665 100644
--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -47,7 +47,8 @@
 #define TFRC_RECV_NUM_LATE_LOSS 3
 
 #define TFRC_WIN_COUNT_PER_RTT  4
-#define TFRC_WIN_COUNT_LIMIT   16
+/* Subtraction a-b modulo-16, respects circular wrap-around */
+#define SUB16(a, b) (((a) + 16 - (b)) & 0xF)
 
 /*
  * Transmitter History data structures and declarations
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 27/29] Receiver does not set CCVal

2007-04-12 Thread Arnaldo Carvalho de Melo
This removes an unnecessary statement, since the receiver of a half-connection
does not set window counter values: as specified in RFCs 3448 and 4342,
the sender provides the RTT estimates to the receiver [RFC 3448, 3.2.1], which
in the particular case of CCID3 means that only the sender sets window counters
[RFC 4342, sections 5 and 8.1].

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 6b1cc1b..e7d2b62 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -792,8 +792,6 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, 
struct sk_buff *skb)
if (!(sk->sk_state == DCCP_OPEN || sk->sk_state == DCCP_PARTOPEN))
return 0;
 
-   DCCP_SKB_CB(skb)->dccpd_ccval = hcrx->ccid3hcrx_ccval_last_counter;
-
if (dccp_packet_without_ack(skb))
return 0;
 
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 28/29] Use a function to update p_inv, and p is never used

2007-04-12 Thread Arnaldo Carvalho de Melo
This
 1) concentrates previously scattered computation of p_inv into one function;
 2) removes the `p' element of the CCID3 RX sock (it is redundant);
 3) makes the tfrc_rx_info structure standalone, only used on demand.

The reason for (2) is that the code essentially only uses p_inv = 1/p. It does 
not
need p, since all the relevant information is already in p_inv.
The motivation for (3) is that the embedded info structure will not be used 
often,
so that the extra cost of keeping p in synch with p_inv (which has to be done at
each packet reception) is overhead most of the time.

Advantages that this buys:

 * the RX sock loses further weight;
 * computation of p_inv is no longer scattered in different places;
 * not having to keep p in sync with p_inv speeds up computation
   (important, as p_inv has to be recomputed for each new data packet);
 * several test cases can now be removed since all is in one function;
 * the send_feedback and packet_recv code becomes simpler.

Committer note: Reorganized the struct a bit using pahole so that we can save
8 bytes on 64bit architectures, before it was 104 bytes, with
my change it got down to 96.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Acked-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |   57 +--
 net/dccp/ccids/ccid3.h |   21 +++--
 2 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index e7d2b62..26e8870 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -726,6 +726,25 @@ static inline void ccid3_hc_rx_update_s(struct 
ccid3_hc_rx_sock *hcrx, int len)
hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9);
 }
 
+/* returns: 1 when p > p_prev (i.e. when feedback is required); 0 else */
+static int ccid3_hc_rx_update_p(struct ccid3_hc_rx_sock *hcrx)
+{
+   struct list_head *li_hist = &hcrx->ccid3hcrx_li_hist;
+   u32 pinv_prev = hcrx->ccid3hcrx_pinv;
+
+   if (list_empty(li_hist)) /* only empty if no loss so far, i.e. p == 0 */
+   return 0;
+
+   hcrx->ccid3hcrx_pinv = dccp_li_hist_calc_i_mean(li_hist);
+   if (hcrx->ccid3hcrx_pinv == 0) {
+   DCCP_BUG("non-empty LI history and yet I_mean == 0!");
+   return 0;
+   }
+
+   /* exploit that  p > p_prev  <=>  1/p < 1/p_prev  */
+   return (hcrx->ccid3hcrx_pinv < pinv_prev);
+}
+
 static void ccid3_hc_rx_send_feedback(struct sock *sk)
 {
struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
@@ -741,6 +760,7 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk)
switch (hcrx->ccid3hcrx_state) {
case TFRC_RSTATE_NO_DATA:
hcrx->ccid3hcrx_x_recv = 0;
+   hcrx->ccid3hcrx_pinv   = ~0U;   /* see RFC 4342, 8.5 */
break;
case TFRC_RSTATE_DATA:
delta = timeval_delta(&now,
@@ -770,14 +790,6 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk)
DCCP_BUG_ON(delta < 0);
hcrx->ccid3hcrx_elapsed_time = delta / 10;
 
-   if (hcrx->ccid3hcrx_p == 0)
-   hcrx->ccid3hcrx_pinv = ~0U; /* see RFC 4342, 8.5 */
-   else if (hcrx->ccid3hcrx_p > 100) {
-   DCCP_WARN("p (%u) > 100%%\n", hcrx->ccid3hcrx_p);
-   hcrx->ccid3hcrx_pinv = 1;   /* use 100% in this case */
-   } else
-   hcrx->ccid3hcrx_pinv = 100 / hcrx->ccid3hcrx_p;
-
dp->dccps_hc_rx_insert_options = 1;
dccp_send_ack(sk);
 }
@@ -1016,7 +1028,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, 
struct sk_buff *skb)
const struct dccp_options_received *opt_recv;
struct dccp_rx_hist_entry *packet;
struct timeval now;
-   u32 p_prev, r_sample, rtt_prev;
+   u32 r_sample, rtt_prev;
int loss, payload_size;
 
BUG_ON(hcrx == NULL);
@@ -1096,22 +1108,8 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, 
struct sk_buff *skb)
ccid3_pr_debug("%s(%p, state=%s), data loss! Reacting...\n",
   dccp_role(sk), sk, dccp_state_name(sk->sk_state));
 
-   p_prev = hcrx->ccid3hcrx_p;
-
-   /* Calculate loss event rate */
-   if (!list_empty(&hcrx->ccid3hcrx_li_hist)) {
-   u32 i_mean = dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist);
-
-   /* Scaling up by 100 as fixed decimal */
-   if (i_mean != 0)
-   hcrx->ccid3hcrx_p = 100 / i_mean;
-   } else
-   DCCP_BUG("empty loss history");
-
-   if (hcrx->ccid3hcrx_p > p_prev) {
+   if (ccid3_hc_rx_update_p(hcrx))
ccid3_hc_rx_send_feedba

[PATCH 29/29] History access is redundant when sending feedback

2007-04-12 Thread Arnaldo Carvalho de Melo
This is a simplification to reduce the number of history accesses: when
receiving a new packet, the receiver needs to access the RX history twice:

 * first to create / add a new entry
 * then, when sending feedback, the just-created entry must be `found'
   via history lookup.

Meaning that, for the same information, it first needs an exclusive
write-access and then a (shared) read access. Without locking in between
these calls, the history information may have already changed.

This patch bypasses the double lookup by directly using the available data
from skb.  In addition, some more informative debugging information was added.

(NB: using the `packet' entry from rx_packet_recv() is not a good idea, since 
this
 can interfer with the RX history garbage collection: the entry may be 
garbage-
 collected at the same instant that send_feedback() tries to use it.)

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |   35 ++-
 1 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 26e8870..6e94496 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -745,15 +745,12 @@ static int ccid3_hc_rx_update_p(struct ccid3_hc_rx_sock 
*hcrx)
return (hcrx->ccid3hcrx_pinv < pinv_prev);
 }
 
-static void ccid3_hc_rx_send_feedback(struct sock *sk)
+static void ccid3_hc_rx_send_feedback(struct sock *sk, struct sk_buff *skb)
 {
struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
struct dccp_sock *dp = dccp_sk(sk);
-   struct dccp_rx_hist_entry *packet;
-   struct timeval now;
-   suseconds_t delta;
-
-   ccid3_pr_debug("%s(%p) - entry \n", dccp_role(sk), sk);
+   struct timeval now, t_recv;
+   suseconds_t delta = 0;
 
do_gettimeofday(&now);
 
@@ -773,23 +770,19 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk)
DCCP_BUG("%s(%p) - Illegal state TERM", dccp_role(sk), sk);
return;
}
+   ccid3_pr_debug("Interval %ldusec, X_recv=%u, 1/p=%u\n", (long)delta,
+  hcrx->ccid3hcrx_x_recv, hcrx->ccid3hcrx_pinv);
 
-   packet = dccp_rx_hist_find_data_packet(&hcrx->ccid3hcrx_hist);
-   if (unlikely(packet == NULL)) {
-   DCCP_WARN("%s(%p), no data packet in history!\n",
- dccp_role(sk), sk);
-   return;
-   }
+   /* Elapsed time information [RFC 4340, 13.2] in units of 10 * usecs */
+   skb_get_timestamp(skb, &t_recv);
+   delta = timeval_delta(&now, &t_recv);
+   DCCP_BUG_ON(delta < 0);
 
+   hcrx->ccid3hcrx_elapsed_time = delta / 10;
hcrx->ccid3hcrx_tstamp_last_feedback = now;
-   hcrx->ccid3hcrx_ccval_last_counter   = packet->dccphrx_ccval;
+   hcrx->ccid3hcrx_ccval_last_counter   = dccp_hdr(skb)->dccph_ccval;
hcrx->ccid3hcrx_bytes_recv   = 0;
 
-   /* Elapsed time information [RFC 4340, 13.2] in units of 10 * usecs */
-   delta = timeval_delta(&now, &packet->dccphrx_tstamp);
-   DCCP_BUG_ON(delta < 0);
-   hcrx->ccid3hcrx_elapsed_time = delta / 10;
-
dp->dccps_hc_rx_insert_options = 1;
dccp_send_ack(sk);
 }
@@ -1084,7 +1077,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, 
struct sk_buff *skb)
ccid3_pr_debug("%s(%p, state=%s), skb=%p, sending initial "
   "feedback\n", dccp_role(sk), sk,
   dccp_state_name(sk->sk_state), skb);
-   ccid3_hc_rx_send_feedback(sk);
+   ccid3_hc_rx_send_feedback(sk, skb);
ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
return;
case TFRC_RSTATE_DATA:
@@ -1096,7 +1089,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, 
struct sk_buff *skb)
if ((timeval_delta(&now, &hcrx->ccid3hcrx_tstamp_last_ack) -
 (suseconds_t)hcrx->ccid3hcrx_rtt) >= 0) {
hcrx->ccid3hcrx_tstamp_last_ack = now;
-   ccid3_hc_rx_send_feedback(sk);
+   ccid3_hc_rx_send_feedback(sk, skb);
}
return;
case TFRC_RSTATE_TERM:
@@ -1109,7 +1102,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, 
struct sk_buff *skb)
   dccp_role(sk), sk, dccp_state_name(sk->sk_state));
 
if (ccid3_hc_rx_update_p(hcrx))
-   ccid3_hc_rx_send_feedback(sk);
+   ccid3_hc_rx_send_feedback(sk, skb);
 }
 
 static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk)
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/29] Use skb timestamp for receiver side

2007-04-12 Thread Arnaldo Carvalho de Melo

On 4/12/07, David Miller <[EMAIL PROTECTED]> wrote:

From: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Date: Thu, 12 Apr 2007 18:14:54 -0300

>  This patch replaces calls to do_gettimeofday at the receiver CCID 3
>  with skb timestamps. The skb timestamps are computed earlier in the
>  receive path, experimental measurements have shown that up to several
>  hundred microseconds can lie between the skb receive timestamp and the
>  timestamp taken when CCID 3 receives the packet. This difference has
>  a negative impact on RTT estimation (reduced accuracy).
>
> Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
> Acked-by: Ian McDonald <[EMAIL PROTECTED]>
> Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>

And this creates an even deeper dependency on the global packet
timestamping facility.  Those are supposed to be enabled only
in obscure circumstances because timestamping every packet is
incredibly expensive.

Please find a better way.


Well, there seems to be just one better way, to timestamp the packet
as soon as it enters the dccp stack, in dccp_v[46]_rcv, and consider
the time from the driver -> ip v[46] -> dccp_v[46]_ as "in the wire".

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/29] Honour initial RTT estimate

2007-04-12 Thread Arnaldo Carvalho de Melo

On 4/12/07, David Miller <[EMAIL PROTECTED]> wrote:

From: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Date: Thu, 12 Apr 2007 18:15:05 -0300

> This is a small optimisation which improves the accuracy of TX
> RTT sampling when an initial RTT sample (e.g. from the intial
> Request/Response exchange) is available.
>
> Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
> Acked-by: Ian McDonald <[EMAIL PROTECTED]>
> Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>

Ok, even more...

I'm not pulling this stuff in, you guys have to work out a way
to implement this without killing performance on the rest of
the system.


I'm removing this one as well.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3]: DCCP patches

2007-04-12 Thread Arnaldo Carvalho de Melo
Hi,

Ok, baby steps this time, please consider pulling from:

master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.22

Regards,

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] Fix bug in the calculation of very low sending rates

2007-04-12 Thread Arnaldo Carvalho de Melo
This fixes an error in the calculation of t_ipi when X converges towards
very low sending rates (between 1 and 64 bytes per second).

Although this case may not sound likely, it can be reproduced by connecting,
hitting enter (1 byte sent) and waiting for some time, during which the
nofeedback timer halves the sending rate until finally it reaches the region
1..64 bytes/sec. Computing X is handled correctly (tested separately); but by
dividing X _before_ entering the calculation of t_ipi, X becomes zero as
a result.  This in turn triggers a BUG condition caught in scaled_div().

Fixed by replacing with equivalent statement and explicit typecast for good
measure.

Calculation verified and effect of patch tested - reduced never below 1 byte
per 64 seconds afterwards, i.e. not allowing divide-by-zero.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Acked-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index c3abd76..d7d9ce7 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -102,8 +102,8 @@ static inline u64 rfc3390_initial_rate(struct sock *sk)
 static inline void ccid3_update_send_interval(struct ccid3_hc_tx_sock *hctx)
 {
/* Calculate new t_ipi = s / X_inst (X_inst is in 64 * bytes/second) */
-   hctx->ccid3hctx_t_ipi = scaled_div(hctx->ccid3hctx_s,
-  hctx->ccid3hctx_x >> 6);
+   hctx->ccid3hctx_t_ipi = scaled_div32(((u64)hctx->ccid3hctx_s) << 6,
+hctx->ccid3hctx_x);
 
/* Calculate new delta by delta = min(t_ipi / 2, t_gran / 2) */
hctx->ccid3hctx_delta = min_t(u32, hctx->ccid3hctx_t_ipi / 2,
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] Debug statements for Elapsed Time option

2007-04-12 Thread Arnaldo Carvalho de Melo
This prints the value of the parsed Elapsed Time when received via a
Timestamp Echo option [RFC 4342, 13.3].

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Acked-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/options.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/dccp/options.c b/net/dccp/options.c
index 14b6212..34d536d 100644
--- a/net/dccp/options.c
+++ b/net/dccp/options.c
@@ -172,21 +172,25 @@ int dccp_parse_options(struct sock *sk, struct sk_buff 
*skb)
opt_recv->dccpor_timestamp_echo = ntohl(*(__be32 
*)value);
 
dccp_pr_debug("%s rx opt: TIMESTAMP_ECHO=%u, len=%d, "
- "ackno=%llu, ",  dccp_role(sk),
+ "ackno=%llu", dccp_role(sk),
  opt_recv->dccpor_timestamp_echo,
  len + 2,
  (unsigned long long)
  DCCP_SKB_CB(skb)->dccpd_ack_seq);
 
 
-   if (len == 4)
+   if (len == 4) {
+   dccp_pr_debug_cat("\n");
break;
+   }
 
if (len == 6)
elapsed_time = ntohs(*(__be16 *)(value + 4));
else
elapsed_time = ntohl(*(__be32 *)(value + 4));
 
+   dccp_pr_debug_cat(", ELAPSED_TIME=%d\n", elapsed_time);
+
/* Give precedence to the biggest ELAPSED_TIME */
if (elapsed_time > opt_recv->dccpor_elapsed_time)
opt_recv->dccpor_elapsed_time = elapsed_time;
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] Complete documentation of dccp_sock

2007-04-12 Thread Arnaldo Carvalho de Melo
This fills in missing documentation for dccp_sock fields.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 include/linux/dccp.h |   21 ++---
 1 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/include/linux/dccp.h b/include/linux/dccp.h
index e668cf5..fda2148 100644
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -465,21 +465,20 @@ struct dccp_ackvec;
  * @dccps_service_list - second .. last service code on passive socket
  * @dccps_timestamp_time - time of latest TIMESTAMP option
  * @dccps_timestamp_echo - latest timestamp received on a TIMESTAMP option
- * @dccps_l_ack_ratio -
- * @dccps_r_ack_ratio -
+ * @dccps_l_ack_ratio - feature-local Ack Ratio
+ * @dccps_r_ack_ratio - feature-remote Ack Ratio
  * @dccps_pcslen - sender   partial checksum coverage (via sockopt)
  * @dccps_pcrlen - receiver partial checksum coverage (via sockopt)
  * @dccps_ndp_count - number of Non Data Packets since last data packet
- * @dccps_mss_cache -
- * @dccps_minisock -
+ * @dccps_mss_cache - current value of MSS (path MTU minus header sizes)
+ * @dccps_minisock - associated minisock (accessed via dccp_msk)
  * @dccps_hc_rx_ackvec - rx half connection ack vector
- * @dccps_hc_rx_ccid -
- * @dccps_hc_tx_ccid -
- * @dccps_options_received -
- * @dccps_epoch -
- * @dccps_role - Role of this sock, one of %dccp_role
- * @dccps_hc_rx_insert_options -
- * @dccps_hc_tx_insert_options -
+ * @dccps_hc_rx_ccid - CCID used for the receiver (or receiving 
half-connection)
+ * @dccps_hc_tx_ccid - CCID used for the sender (or sending half-connection)
+ * @dccps_options_received - parsed set of retrieved options
+ * @dccps_role - role of this sock, one of %dccp_role
+ * @dccps_hc_rx_insert_options - receiver wants to add options when acking
+ * @dccps_hc_tx_insert_options - sender wants to add options when sending
  * @dccps_xmit_timer - timer for when CCID is not ready to send
  * @dccps_syn_rtt - RTT sample from Request/Response exchange (in usecs)
  */
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/29] Use skb timestamp for receiver side

2007-04-13 Thread Arnaldo Carvalho de Melo

On 4/13/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:

Quoting Ian McDonald:



|  > If for instance you read the documentation accompanying that patch, there 
is a
|  > difference between 100 usec ... 1msec between skb_get_timestamp and taking 
the timestamp
|  > in the CCID3 receiver. It all adds up.
|  >
|  I think we need to look at whether the protocol is right for LANs but
|  I think if we carry out Eddie's suggestions it may well be right.
|  Against earlier code a couple of months ago with a couple of my
|  patches I was achieving around 90 Mbits/sec on iperf which matched TCP
|  performance and it was stable. It responded gracefully to loss also.



I have two boxes with Gbit cards here and saw speeds up to 750..820 Mbit.
It seems that the computation overhead allows it to reach up to 80..90% of the
link bandwidth (with coarse-granularity problems still unresolved).


|  I have not tested lately and tried to replicate but I know that the
|  state of tree before current patches couldn't do that any more... As
|  time permits I will redo the work.



Do you remember when the `bidirectional' patch was reverted? - after that the
CCID3 sender slowed down again to 75..80 Mbits/sec on a 100Mbit link.
This comes from the processing overhead, and was the original motivation for
this patch.


How many connections? Up to now, when I was more involved in DCCP
development, for the sake of testing the correctness of the protocol I
mostly tested with just a few connections, most of the time with just
one, which is OK while we're not yet feeling so good about the overal
correctness of the implementation, and because I mostly reused the TCP
machinery, but for performance we really have to test with many
connections, and in fact in combination with TCP connections, so that
we can see how DCCP affects overal system performance/stability.


That incidentally was the second `moronic' patch I had submitted (as it limited
connections to being one-directional). Without Andre's help it would probably 
still
be in. This really is the point I was trying to make in the email - I think we 
need to
focus more on the code. Here are two cases where the missing review/input came 
from outside.

I am not claiming to be any special. My hacking skills are rather moderate 
(i.e. you
have been warned about them patches :)


Neither me, and everybody commit mistakes, I should have known better
that net_enable_timestamp was not supposed to be enabled for normal
connections, just for things like tcpdump, etc, but in the end I
pushed for Dave trying to move things a bit forward, my bad, we really
have to take into account decisions we make that affects the rest of
the system :-\

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 15/25]: Disable softirq when sending and notifying CCID

2007-04-13 Thread Arnaldo Carvalho de Melo

On 3/21/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:

[DCCP]: Disable softirq when sending and notifying CCID

This disables softirqs when performing the CCID-specific send operation
in dccp_write_xmit, so that actual sending, and calling the CCID post-send
routine becomes an atomic unit.

Why this needs to be done:
--
 The function dccp_write_xmit can be called both in user context (via
 dccp_sendmsg) and via timer softirq (dccp_write_xmit_timer). It does

  1. call the CCID-specific `pre-send' routine ccid_hc_tx_send_packet()
  2. ship the skb via dccp_transmit_skb
  3. call the CCID-specific `post-send' routine ccid_hc_tx_packet_sent().

 The last one does e.g. accounting by updating data records (as in CCID 3).

 The transition from 2 ... 3 should be atomic and not be interrupted
 by softirqs.  The reason is that the TX and RX halves of the CCID modules
 share data structures and both halves change state. If the sending process is
 allowed to be interrupted by the reception of a DCCP packet via softirq
 handler, then state and data structures of the sender can become corrupted.

 Here is an actual example whose effects were observed and lead to this patch:
 in CCID 3 the sender records a timestamp when ccid_hc_tx_packet_sent() is 
called.
 If the application is sending via dccp_sendmsg, it may be interrupted and run a
 little while later. Suppose that such interruption happens between steps (2) 
and
 (3) above: the packet has been sent, and immediately afterwards dccp_sendmsg is
 interrupted. Meanwhile the transmitted skb reaches the other side, and an Ack
 comes back; this Ack is processed via softirq (which is allowed to interrupt
 dccp_sendmsg); only then step (3) is performed, but too late: the timestamp
 taken in ccid3_hc_tx_packet_sent is now /after/ the Ack has come in.

 In the observed case, negative RTT samples (i.e. Acks arriving before the
 sent packet was registered) were the result.


Gerrit,

I looked at this one now and there and I noticed that we already
take a timestamp in the dccp_insert_options code path, so I think that
the right fix is to avoid taking two timestamps and using the one
taken at insert option time, i.e. when it gets to
dccp_hc_tx_packet_sent time we already have a timestamp in the packet,
i.e. avoid taking the timestamp after the packet is sent, just like
you did for the REQUEST case in dccp_insert_options, makes sense?

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/25]: Cheaper & smaller timestamping

2007-04-13 Thread Arnaldo Carvalho de Melo

On 4/13/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:

Quoting Arnaldo Carvalho de Melo:
|  On 4/2/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:
|  > | > [DCCP]: Cheaper & smaller timestamping
|  >
|  > | A couple of comments though:
|  > | - this is how I had the code originally in many cases and Arnaldo
|  > | changed - can't remember why.
|  > Arnaldo's code uses a reference point, the skb_get_timestamp() also did
|  > that. May be useful against old incarnations across reboots.
|
|  I got to this cset now in today's merge session, the cset that did that was:
|
|  b0e567806d16586629468c824dfb2e71155df7da
|
|  Comment was, unfortunately, not so clear:
|
|  
|  [EMAIL PROTECTED] net-2.6.22]$ git-show 
b0e567806d16586629468c824dfb2e71155df7da
|  commit b0e567806d16586629468c824dfb2e71155df7da
|  Author: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
|  Date: Fri Sep 9 02:38:35 2005 -0300
|
|   [DCCP] Introduce dccp_timestamp
|
|   To start the timestamps with 0.0ms, easing the integer maths in
|  the CCIDs, this
|   probably will be reworked to use the to be introduced struct timeval_offset
|   infrastructure out of skb_get_timestamp, etc.
|
|   Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
|  ---
|
|  What are the architectures you've been testing? All 64 bits? IIRC the
|  problem was related to overflows, suseconds_t is unsigned long, so 32
|  bits on i386...
I have tested on various i386 instances (where it is `long') and on sparc64 
(where it is `int', as
in parisc).  I was just calculating - there is indeed a chance to produce 
overflow, since we add
in timeval_add_usecs (since it is a signed type, the same problem seems to 
reappear in timeval_sub_usecs).

I don't have accurate figures at the moment with regard to overflow, but I 
assume that it is better to revert
this patch?

There is a related question  - with the new timesystem, should we convert to 
__get_realtime_clock_ts(), as
the comment above do_gettimeofday() says in kernel/timer.c ?

I would be glad for clarification, since there are quite a few patches to 
update.


I'd say leave the dccp_timestamp alone for now, I have to read a bit
more on the new time system and what was done on the net schedulers,
etc before being able to say something meaningful on this aspect.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 15/25]: Disable softirq when sending and notifying CCID

2007-04-13 Thread Arnaldo Carvalho de Melo

On 4/13/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:

Quoting Arnaldo Carvalho de Melo:
|  On 3/21/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:
|  > [DCCP]: Disable softirq when sending and notifying CCID
|  >
|  > This disables softirqs when performing the CCID-specific send operation
|  > in dccp_write_xmit, so that actual sending, and calling the CCID post-send
|  > routine becomes an atomic unit.
|  >
|  > Why this needs to be done:
|  > --
|  >  The function dccp_write_xmit can be called both in user context (via
|  >  dccp_sendmsg) and via timer softirq (dccp_write_xmit_timer). It does
|  >
|  >   1. call the CCID-specific `pre-send' routine ccid_hc_tx_send_packet()
|  >   2. ship the skb via dccp_transmit_skb
|  >   3. call the CCID-specific `post-send' routine ccid_hc_tx_packet_sent().
|  >
|  >  The last one does e.g. accounting by updating data records (as in CCID 3).
|  >
|  >  The transition from 2 ... 3 should be atomic and not be interrupted
|  >  by softirqs.  The reason is that the TX and RX halves of the CCID modules
|  >  share data structures and both halves change state. If the sending 
process is
|  >  allowed to be interrupted by the reception of a DCCP packet via softirq
|  >  handler, then state and data structures of the sender can become 
corrupted.
|  >
|  >  Here is an actual example whose effects were observed and lead to this 
patch:
|  >  in CCID 3 the sender records a timestamp when ccid_hc_tx_packet_sent() is 
called.
|  >  If the application is sending via dccp_sendmsg, it may be interrupted and 
run a
|  >  little while later. Suppose that such interruption happens between steps 
(2) and
|  >  (3) above: the packet has been sent, and immediately afterwards 
dccp_sendmsg is
|  >  interrupted. Meanwhile the transmitted skb reaches the other side, and an 
Ack
|  >  comes back; this Ack is processed via softirq (which is allowed to 
interrupt
|  >  dccp_sendmsg); only then step (3) is performed, but too late: the 
timestamp
|  >  taken in ccid3_hc_tx_packet_sent is now /after/ the Ack has come in.
|  >
|  >  In the observed case, negative RTT samples (i.e. Acks arriving before the
|  >  sent packet was registered) were the result.
|
|  Gerrit,
|
|   I looked at this one now and there and I noticed that we already
|  take a timestamp in the dccp_insert_options code path, so I think that
|  the right fix is to avoid taking two timestamps and using the one
|  taken at insert option time, i.e. when it gets to
|  dccp_hc_tx_packet_sent time we already have a timestamp in the packet,
|  i.e. avoid taking the timestamp after the packet is sent, just like
|  you did for the REQUEST case in dccp_insert_options, makes sense?
|
I think you are making a good point - there is potential to integrate the use of
timestamps (I was thinking of the icsk_ack.lrcvtime field to keep a 32-bit 
timestamp).


icsk is per socket, we need something per packet, that will be put
into the DCCP_OPT_TIMESTAMP and used in the packet history.


I have a problem here - where in the code path do you mean? The only place 
dccp_insert_option_timestamp
is called is in ccid3_hc_rx_insert_options. Maybe I am confusing something here.


dccp_transmit_skb
 dccp_insert_options
 dccp_insert_timestamp (now just for the REQUEST packet, but we
can change this)


Is your intention to re-enable softirqs?


Intention is to reduce the window when softirqs are disabled, if they
need to be disabled at all.


I am quite a bit uneasy about it. When observing the
negative timestamps, I triggered a dump_stack. This kind of problem always 
happened when release_sock
called the backlog handler.


The problem you mentioned is that we want to put it into the TX
history only when we know for sure that the packet was sent, and after
we send the packet and before we put it into TX history we could be
softirq interrupted and process an ACK for this exact packet (which
_seems_ unlikely), so we would need to disable softirqs to make the
sequence:

   err = icsk->icsk_af_ops->queue_xmit(skb, 0);
   dccp_tx_packet_sent()

Uninterrupted by softirq, so it seems we need to disable softirq just
on this smaller window, no?

But how could that happen? If we receive the ACK before the packet was
put into TX history we will not find it in the TX history in the first
place, no?

I.e. if we are interrupted before putting the packet into the TX
history and the ACK is for another packet and this other packet is in
the TX history we won't get negative RTTs.

Please bear with me, I'm just trying to get back at Net/DCCP
development, may have said something stupid as I don't have that much
time right now for checking all this in detail.


There is so much interplay going on between CCID3 receiver and transmitter 
side, several timers, two
different history structures, that I thi

Re: [PATCH 4/25]: Cheaper & smaller timestamping

2007-04-13 Thread Arnaldo Carvalho de Melo

On 4/13/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:

Quoting Arnaldo Carvalho de Melo:
|  > I have tested on various i386 instances (where it is `long') and on 
sparc64 (where it is `int', as
|  > in parisc). I was just calculating - there is indeed a chance to produce 
overflow, since we add
|  > in timeval_add_usecs (since it is a signed type, the same problem seems to 
reappear in timeval_sub_usecs).
|  >
|  > I don't have accurate figures at the moment with regard to overflow, but I 
assume that it is better to revert
|  > this patch?
|  >
|  > There is a related question - with the new timesystem, should we convert 
to __get_realtime_clock_ts(), as
|  > the comment above do_gettimeofday() says in kernel/timer.c ?
|  >
|  > I would be glad for clarification, since there are quite a few patches to 
update.
|
|  I'd say leave the dccp_timestamp alone for now, I have to read a bit
|  more on the new time system and what was done on the net schedulers,
|  etc before being able to say something meaningful on this aspect.
|
Just to be on the safe side then I will revert this patch 4/25 so that 
everything again uses dccp_epoch, ok?


Yes

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH-UPDATE]: Changes after retracting timestamping patches

2007-04-13 Thread Arnaldo Carvalho de Melo

On 4/13/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:

Following the earlier discussions of patches, here is the update
after removing the skb_get_timestamp() patches.


OK, starting with the first, how can it be that it honours the initial
exchange if the timestamp is taken now not from the skb timestamp but
at that point in time? I.e. from a quick reading the comment doesn't
make sense anymore, as we're not getting the timestamp taken at packet
arrival, as dccp_timestamp just does a do_gettimeofday.

I think we should take the timestamp at dccp_v[46]_rcv time, into the
skb, that won't be  as precise as
skb_get_timestamp+net_enable_timestamp, but at least will eliminate
the time the packet sits in the backlog (if it sits at all), no?

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] DCCP: Shift code around

2007-05-10 Thread Arnaldo Carvalho de Melo

On 5/10/07, Ian McDonald <[EMAIL PROTECTED]> wrote:

This is just shifting code around and involves no renames except for module
initialisation. Also add/remove static, includes, EXPORT_SYMBOL_GPL as
needed. We remove dccp_li_hist_entry_delete as it's not actually used
anywhere!


Ian,

 Could you please split this up into several patches:

- one that _just_ moves code around, i.e. no code change at all

- another one that moves ccid3_li_hist to
net/dccp/ccids/lib/loss_interval.c and please check this:

-
[EMAIL PROTECTED] net-2.6]$ find net/dccp/  -type f | xargs grep ccid3_li_hist
net/dccp/ccids/ccid3.c: dccp_li_hist_purge(ccid3_li_hist,
&hcrx->ccid3hcrx_li_hist);
net/dccp/ccids/lib/loss_interval.c:struct dccp_li_hist *ccid3_li_hist;
net/dccp/ccids/lib/loss_interval.c:EXPORT_SYMBOL_GPL(ccid3_li_hist);
net/dccp/ccids/lib/loss_interval.c: if
(!dccp_li_hist_interval_new(ccid3_li_hist,
net/dccp/ccids/lib/loss_interval.c: entry =
dccp_li_hist_entry_new(ccid3_li_hist, GFP_ATOMIC);
net/dccp/ccids/lib/loss_interval.c:
kmem_cache_free(ccid3_li_hist->dccplih_slab, tail);
net/dccp/ccids/lib/loss_interval.c: ccid3_li_hist =
dccp_li_hist_new("ccid3");
net/dccp/ccids/lib/loss_interval.c: if (ccid3_li_hist == NULL)
net/dccp/ccids/lib/loss_interval.c: if (ccid3_li_hist != NULL) {
net/dccp/ccids/lib/loss_interval.c:
dccp_li_hist_delete(ccid3_li_hist);
net/dccp/ccids/lib/loss_interval.c: ccid3_li_hist = NULL;
net/dccp/ccids/lib/loss_interval.h:extern struct dccp_li_hist *ccid3_li_hist;
[EMAIL PROTECTED] net-2.6]$
-

dccp_li_hist_purge() with your patch becomes the only reason for
ccid3_li_hist to be exported, so please just don't pass it as this
function is _already_ in loss_interval.c where ccid3_li_hist is. Then
make ccid3_li_hist static, unexport it and rename it to dccp_li_hist,
just like all the other code and variables in loss_interval.c.

- then the last one, delete unused functions.

Patches that do many things are always bad for reviewing, sorry.

More comments below, but I'll wait till you do what I suggested for a
more thorough review.

No need to resend the first one in the series, the copyrights one,
I've already applied that to my local tree.

Thanks,

- Arnaldo




diff --git a/net/dccp/ccids/lib/loss_interval.c
b/net/dccp/ccids/lib/loss_interval.c
index 3829afc..d067d4a 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c





+static __init int li_module_init(void)
+{
+   int rc = -ENOBUFS;
+
+   ccid3_li_hist = dccp_li_hist_new("ccid3");
+   if (ccid3_li_hist == NULL)
+   return rc;
+   else
+   return 0;
+}


Too convoluted. The rc variable has no reason to exist, please rewrite it as:

static __init int li_module_init(void)
{
   dccp_li_hist = dccp_li_hist_new("ccid3");
   return dccp_li_hist == NULL ? -ENOBUFS : 0;
}

The rename to dccp_li_hist is as suggested above.


-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] DCCP: Shift code around

2007-05-10 Thread Arnaldo Carvalho de Melo

On 5/10/07, Ian McDonald <[EMAIL PROTECTED]> wrote:

On 5/11/07, Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> wrote:
> On 5/10/07, Ian McDonald <[EMAIL PROTECTED]> wrote:
> > This is just shifting code around and involves no renames except for module
> > initialisation. Also add/remove static, includes, EXPORT_SYMBOL_GPL as
> > needed. We remove dccp_li_hist_entry_delete as it's not actually used
> > anywhere!
>
> Ian,
>
>   Could you please split this up into several patches:

Working on that at moment. About to post one patch which stands alone OK.

> -
> [EMAIL PROTECTED] net-2.6]$ find net/dccp/  -type f | xargs grep ccid3_li_hist
> net/dccp/ccids/ccid3.c: dccp_li_hist_purge(ccid3_li_hist,
> &hcrx->ccid3hcrx_li_hist);
> net/dccp/ccids/lib/loss_interval.c:struct dccp_li_hist *ccid3_li_hist;
> net/dccp/ccids/lib/loss_interval.c:EXPORT_SYMBOL_GPL(ccid3_li_hist);
> net/dccp/ccids/lib/loss_interval.c: if
> (!dccp_li_hist_interval_new(ccid3_li_hist,
> net/dccp/ccids/lib/loss_interval.c: entry =
> dccp_li_hist_entry_new(ccid3_li_hist, GFP_ATOMIC);
> net/dccp/ccids/lib/loss_interval.c:
> kmem_cache_free(ccid3_li_hist->dccplih_slab, tail);
> net/dccp/ccids/lib/loss_interval.c: ccid3_li_hist =
> dccp_li_hist_new("ccid3");
> net/dccp/ccids/lib/loss_interval.c: if (ccid3_li_hist == NULL)
> net/dccp/ccids/lib/loss_interval.c: if (ccid3_li_hist != NULL) {
> net/dccp/ccids/lib/loss_interval.c:
> dccp_li_hist_delete(ccid3_li_hist);
> net/dccp/ccids/lib/loss_interval.c: ccid3_li_hist = NULL;
> net/dccp/ccids/lib/loss_interval.h:extern struct dccp_li_hist *ccid3_li_hist;
> [EMAIL PROTECTED] net-2.6]$
> -
>
> dccp_li_hist_purge() with your patch becomes the only reason for
> ccid3_li_hist to be exported, so please just don't pass it as this
> function is _already_ in loss_interval.c where ccid3_li_hist is. Then
> make ccid3_li_hist static, unexport it and rename it to dccp_li_hist,
> just like all the other code and variables in loss_interval.c.

I actually had picked that up but saved it for later Was trying
not to overwhelm you with patches but backfired a little ;-)

> Patches that do many things are always bad for reviewing, sorry.

Understand. You said that last time when it was 3, and now it's about
10-15 patches I think after I do this work... I should have learnt by
now!


not a problem, the problem is with complex patches, that takes more
time. If you make it easy by leaving the most complex ones to the tail
and submit small, self contained patches, I can eat them ok :-)


- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/7] DCCP: Fix dccp_sum_coverage

2007-05-28 Thread Arnaldo Carvalho de Melo

On 5/19/07, Ian McDonald <[EMAIL PROTECTED]> wrote:

When compiling with EXTRA_CFLAGS=-W notice that we have signed/unsigned
issue in dccp.h.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index d8ad27b..e2d74cd 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -184,7 +184,7 @@ DECLARE_SNMP_STAT(struct dccp_mib, dccp_statistics);
 /*
  * Checksumming routines
  */
-static inline int dccp_csum_coverage(const struct sk_buff *skb)
+static inline unsigned int dccp_csum_coverage(const struct sk_buff *skb)
 {
const struct dccp_hdr* dh = dccp_hdr(skb);

@@ -195,7 +195,7 @@ static inline int dccp_csum_coverage(const struct sk_buff
*skb)

 static inline void dccp_csum_outgoing(struct sk_buff *skb)
 {
-   int cov = dccp_csum_coverage(skb);
+   unsigned int cov = dccp_csum_coverage(skb);

if (cov >= skb->len)
dccp_hdr(skb)->dccph_cscov = 0;



Thanks, applied.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] DCCP: Shift ccid3_li_hist

2007-05-28 Thread Arnaldo Carvalho de Melo

On 5/11/07, Ian McDonald <[EMAIL PROTECTED]> wrote:

This shifts ccid3_li_hist.

We temporarily export the symbol but remove this in later patch.

Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>


Ian,

  Finally looking at this, sorry for taking so long. But I have
some issues with this loss intervals code reshuflling.

  The idea of having dccp_tfrc_lib.ko is to have things that are
not  specific to any CCID implementation, so that if some new CCID
uses the TFRC algorithms it would share code with CCID3, etc.

  So it is not really correct for the code in net/dccp/ccids/lib/
to access the half connection internal state, as it really don't know
what is the layout of such internal state.

In your patches you move ccid3_hc_rx_update_li from
net/dccp/ccids/ccid3.c to net/dccp/ccids/lib/loss_interval.c and then
renamed it to dccp_li_update_li, but this function still uses
ccid3_hc_rx_sk, not good.

So I'm looking at the intent of the patches and will post here
how I think it should have been done and the sequence of patches
preparing for such change, etc.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] DCCP: Shift ccid3_li_hist

2007-05-28 Thread Arnaldo Carvalho de Melo

On 5/28/07, Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> wrote:

On 5/11/07, Ian McDonald <[EMAIL PROTECTED]> wrote:
> This shifts ccid3_li_hist.
>
> We temporarily export the symbol but remove this in later patch.
>
> Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>

Ian,

   Finally looking at this, sorry for taking so long. But I have
some issues with this loss intervals code reshuflling.

   The idea of having dccp_tfrc_lib.ko is to have things that are
not  specific to any CCID implementation, so that if some new CCID
uses the TFRC algorithms it would share code with CCID3, etc.

   So it is not really correct for the code in net/dccp/ccids/lib/
to access the half connection internal state, as it really don't know
what is the layout of such internal state.

 In your patches you move ccid3_hc_rx_update_li from
net/dccp/ccids/ccid3.c to net/dccp/ccids/lib/loss_interval.c and then
renamed it to dccp_li_update_li, but this function still uses
ccid3_hc_rx_sk, not good.

 So I'm looking at the intent of the patches and will post here
how I think it should have been done and the sequence of patches
preparing for such change, etc.


Please take a look my tree at:

http://git.kernel.org/?p=linux/kernel/git/acme/net-2.6.git

I rewrote your patches to achieve the same result while keeping
loss_interval.c CCID agnostic Please also take a look at how the
patches go on paving the way for the next  ones, without having to
make dccp_li_hist temporarily exported, etc.

This tree has just some copyright and compiler warning fixes besides
these csets, so if you agree on it just provide me the signed-off or
acked by lines and I'll add those and republish the tree.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] DCCP: Shift ccid3_li_hist

2007-05-30 Thread Arnaldo Carvalho de Melo

On 5/30/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:

Quoting Arnaldo Carvalho de Melo:
|  The idea of having dccp_tfrc_lib.ko is to have things that are
|  not specific to any CCID implementation, so that if some new CCID
|  uses the TFRC algorithms it would share code with CCID3, etc.
|
|   So it is not really correct for the code in net/dccp/ccids/lib/
|  to access the half connection internal state, as it really don't know
|  what is the layout of such internal state.
|
|   In your patches you move ccid3_hc_rx_update_li from
|  net/dccp/ccids/ccid3.c to net/dccp/ccids/lib/loss_interval.c and then
|  renamed it to dccp_li_update_li, but this function still uses
|  ccid3_hc_rx_sk, not good.
|
|   So I'm looking at the intent of the patches and will post here
|  how I think it should have been done and the sequence of patches
|  preparing for such change, etc.
I understand your train of thought, since the same issue had bugged me;
I think I may have a solution for that in my patch set.

The problem is that
 * TFRC (dccp_tfrc_lib.ko) needs to know the length of the first loss
   interval (which is computed by ccid3_hc_rx_update_li)
 * fully agree that the TFRC module should not (need to) know the internals
   of the calling module - when CCID4 comes, the same problem will come again

The choice I have been using was to use a function pointer for calc_first_li:

/** tfrc_lh_interval_new  -  Insert new record into the Loss Interval database


if it inserts it should be named foo_add or foo_insert, _new, at least
in the convention I used in DCCP is just for constructors, just to
allocate space and initialize the fields. In other areas of the kernel
people use _aloc, like in sock_alloc with the same semantics. Only
after the object is created we use something like _add or _insert to
put it into some list, hash table, etc.


 * @lh:Loss Interval database


And who allocated lh, i.e. what is its constructor? is it embedded
into some other struct? like

struct ccid3_hc_rx_sock {
   struct tfrc_loss_hist ccid3hcrx_lh;
.
.
.
};

and then you call tfrc_lh_interval_new(&hcrx->ccid3hcrx_lh, ...) ?


 * @rh:Receive history containing a fresh loss event
 * @calc_first_li: Caller-dependent routine to compute length of first interval
 */
int tfrc_lh_interval_new(struct tfrc_loss_hist *lh, struct tfrc_rx_hist *rh,
 u32 (*calc_first_li)(struct sock *), struct sock *sk)

The price to pay is that the argument for *calc_first_li needs to be carried, 
too,
but other than that it separates the CCID3 and TFRC interfaces.


What are the reasons for sk to be needed? In the recent patches I kept
it just because we still use the ugly dccp_timestamp() interface that
needs it, but we have to solve that and at some point I think the best
thing is for the dccp_tfrc_lib code to not know a thing about struct
sock. But I digress... ok, we need some context in the ccid specific
code, it could be a void pointer that would be casted to, say, struct
ccid3_hc_rx_sock in the ccid3 case. Nah, leave it as sock, from were
we can get the whole context, even if we get to look at the other half
connection internal state somehow, the timestamp case has to be solved
but play no role here.



This function is not actually used (though visible) in CCID3, the one that is 
called is


Why visible?



**
 *  tfrc_rx_handle_loss  -  Loss detection and further processing
 *  @h: The non-empty RX history object
 *  @lh:Loss Intervals database to update
 *  @skb:   Currently received packet
 *  @ndp:   The NDP count belonging to @skb
 *  @calc_first_li: Caller-dependent computation of first loss interval in @lh
 *  Chooses action according to pending loss, updates LI database when a new
 *  loss was detected, and does required post-processing. Returns 1 when caller
 *  should send feedback, 0 otherwise.
 */
int tfrc_rx_handle_loss(struct tfrc_rx_hist *h,
struct tfrc_loss_hist *lh,
struct sk_buff *skb, u32 ndp,
u32 (*calc_first_li)(struct sock *), struct sock *sk)

This function is a `wrapper', which will call tfrc_lh_interval_new() when there 
is a
loss event (3 duplicate packets received). This is a single-function interface
to the TFRC module, covering al the loss processing.


Oh, I see, with the same intent as what is in my tree, i.e. Ian's
rewritten patches:

http://git.kernel.org/?p=linux/kernel/git/acme/net-2.6.git;a=blob_plain;f=net/dccp/ccids/lib/loss_interval.h;hb=e3370bb96ed004618a9398b6098421c7b05ac943

There are just tree functions, no loss interval
constructor/destructor, no loss event constructor/destructor, just
three functions:

extern void dccp_li_hist_purge(struct list_head *list);

extern u32 dccp_li_hist_calc_i_mean(struct list_head *list);

extern void dccp_li_update_li(struct sock *sk,
   

Re: [dccp] Re: problem with CCID3 loss events

2007-06-02 Thread Arnaldo Carvalho de Melo

On 6/1/07, Ian McDonald <[EMAIL PROTECTED]> wrote:

On 6/2/07, Patrick Andrieux <[EMAIL PROTECTED]> wrote:
> Hi again,
>
> I find out what was the problem. I recompile the latest 2.6.20 kernel
> with Ian's patches, and my iperf client doesn't freeze anymore. So I
> guess there is something wrong in Dave Miller's tree (I didn't use
> Ian's patches).
> dccp_probe module doesn't work either with his tree, it seg fault
> trying to insert it.
>
> Patrick.
>

Can you send your kernel .config files and also the scripts you use to
load dccp and netem so I can try and replicate here and fix.

I can't promise a quick fix but will look into it. Given Dave M's tree
for DCCP is the same as what Linus has I would like to fix it...

Do you have any more info from the segfault or freeze in your logs?


Please, send all this information to the list.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] DCCP: Remove dccp_li_hist_entry_delete

2007-06-03 Thread Arnaldo Carvalho de Melo

On 6/3/07, Ian McDonald <[EMAIL PROTECTED]> wrote:

This removes dccp_li_hist_entry_delete as it's not used anywhere.


Yes, it is not being used, question is if it should be removed or if
we have to convert the open coded calls to
kmem_cache_free(dccp_li_cachep, entry) to use it, I think we should
use it, its the logical counterpart to dccp_li_hist_entry_new.

That or we kill also dccp_li_entry_new and use kmem_cache_alloc where
it is being used now. I prefer to keep both.

- Arnaldo



Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
diff --git a/net/dccp/ccids/lib/loss_interval.c
b/net/dccp/ccids/lib/loss_interval.c
index 83b5504..fd9c6d9 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -34,12 +34,6 @@ static inline struct dccp_li_hist_entry
*dccp_li_hist_entry_new(const gfp_t prio
return kmem_cache_alloc(dccp_li_cachep, prio);
 }

-static inline void dccp_li_hist_entry_delete(struct dccp_li_hist_entry
*entry)
-{
-   if (entry != NULL)
-   kmem_cache_free(dccp_li_cachep, entry);
-}
-
 void dccp_li_hist_purge(struct list_head *list)
 {
struct dccp_li_hist_entry *entry, *next;
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: segfault : modprobe dccp_probe

2007-06-05 Thread Arnaldo Carvalho de Melo

On 6/5/07, Patrick Andrieux <[EMAIL PROTECTED]> wrote:

Hi,

I have a segfault when inserting dccp_probe module on Dave Miller's
tree *without* Ian's patches. I got it using this cmd line :


Question, does it also segfaults _with_ Ian's patches?


do_page_fault+0x436/0x509
Jun  5 12:26:55 dccpServer kernel: [ 1027.751869]  []
error_code+0x6a/0x70
Jun  5 12:26:55 dccpServer kernel: [ 1027.751873]  []
__register_kprobe+0x23a/0x294
Jun  5 12:26:55 dccpServer kernel: [ 1027.751876]  []
register_jprobe+0x19/0x1b


I ask this because it is segfaulting at dccpprobe_init time, when it
is registering the jprobe. To me it looks like something changed in
the jprobes infrastructure and we are seeing a fallout from that. But
I haven't investigated this more than looking at this backtrace.


Jun  5 12:26:55 dccpServer kernel: [  1027.751880]  []
dccpprobe_init+0x76/0xb4 [dccp_probe]
Jun  5 12:26:55 dccpServer kernel: [ 1027.751885]  []
sys_init_module+0x12d5/0x1415
Jun  5 12:26:55 dccpServer kernel: [ 1027.751890 ]  []
sysenter_past_esp+0x5f/0x99
Jun  5 12:26:55 dccpServer kernel: [ 1027.751893]  ===

Let me know if you need anything else.

rgds,
Patrick.



-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: segfault : modprobe dccp_probe

2007-06-05 Thread Arnaldo Carvalho de Melo

On 6/5/07, Patrick Andrieux <[EMAIL PROTECTED]> wrote:

No, I only tried on my arch, not a laptop but a desktop, intel P4.

And you are right, I've the same problem with `modprobe -v tcp_probe'.
Do you think the problem could come from my 2.6.22 kernel config ?
Because it works on 2.6.20-12 kernel with Ian's patches...

As I said to Arnaldo, I'll try with Ian's patches.


You'll probably get the same results :)

This extra information that the problem happens as well with tcp_probe
seems to strongly point to a problem with kprobes, perhaps something
done after 2.6.20 that changes kprobes ABI and that was not reflected
even on the in-kernel kprobes/jprobes users.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 'BUG: scheduling while atomic' during dccp transfer

2007-06-07 Thread Arnaldo Carvalho de Melo

On 6/7/07, Florian Westphal <[EMAIL PROTECTED]> wrote:

While transmitting data via dccp (Linux 2.6.21) I encountered this:

BUG: scheduling while atomic: firefox-bin/0x1100/6792
 [] schedule+0x43c/0x5c0
 [] skb_checksum+0x4f/0x2a0
 [] handle_level_irq+0x7a/0xc0
 [] __cond_resched+0x16/0x40
 [] cond_resched+0x2a/0x40
 [] __kmalloc+0x6a/0x80
 [] ccid2_hc_tx_alloc_seq+0x4a/0xd0 [dccp_ccid2]
 [] ccid2_hc_tx_packet_sent+0x13c/0x170 [dccp_ccid2]
 [] ccid2_hc_tx_packet_sent+0x0/0x170 [dccp_ccid2]
 [] dccp_write_xmit+0x207/0x300 [dccp]
 [] dccp_write_xmit_timer+0x37/0x40 [dccp]
 [] run_timer_softirq+0x123/0x180
 [] dccp_write_xmit_timer+0x0/0x40 [dccp]
 [] dccp_write_xmit_timer+0x0/0x40 [dccp]
 [] __do_softirq+0x42/0x90
 [] do_softirq+0x26/0x30
 [] irq_exit+0x5a/0x60
 [] do_IRQ+0x3d/0x70
 [] sys_gettimeofday+0x28/0x80
 [] common_interrupt+0x23/0x28
 [] km_policy_expired+0x10/0x50
 ===

Please CC me on replies, i'm not subscribed.
Thanks, Florian.


Can you try with the attached patch?

- Arnaldo
diff --git a/net/dccp/ccids/ccid2.c b/net/dccp/ccids/ccid2.c
index 248d20f..fb5710e 100644
--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -83,8 +83,7 @@ static void ccid2_hc_tx_check_sanity(const struct ccid2_hc_tx_sock *hctx)
 #define ccid2_hc_tx_check_sanity(hctx)
 #endif
 
-static int ccid2_hc_tx_alloc_seq(struct ccid2_hc_tx_sock *hctx, int num,
- gfp_t gfp)
+static int ccid2_hc_tx_alloc_seq(struct ccid2_hc_tx_sock *hctx, int num)
 {
 	struct ccid2_seq *seqp;
 	int i;
@@ -95,7 +94,7 @@ static int ccid2_hc_tx_alloc_seq(struct ccid2_hc_tx_sock *hctx, int num,
 		return -ENOMEM;
 
 	/* allocate buffer and initialize linked list */
-	seqp = kmalloc(sizeof(*seqp) * num, gfp);
+	seqp = kmalloc(sizeof(*seqp) * num, GFP_ATOMIC);
 	if (seqp == NULL)
 		return -ENOMEM;
 
@@ -298,7 +297,7 @@ static void ccid2_hc_tx_packet_sent(struct sock *sk, int more, unsigned int len)
 		int rc;
 
 		ccid2_pr_debug("allocating more space in history\n");
-		rc = ccid2_hc_tx_alloc_seq(hctx, CCID2_SEQBUF_LEN, GFP_KERNEL);
+		rc = ccid2_hc_tx_alloc_seq(hctx, CCID2_SEQBUF_LEN);
 		BUG_ON(rc); /* XXX what do we do? */
 
 		next = hctx->ccid2hctx_seqh->ccid2s_next;
@@ -771,7 +770,7 @@ static int ccid2_hc_tx_init(struct ccid *ccid, struct sock *sk)
 	hctx->ccid2hctx_seqbufc   = 0;
 
 	/* XXX init ~ to window size... */
-	if (ccid2_hc_tx_alloc_seq(hctx, CCID2_SEQBUF_LEN, GFP_ATOMIC) != 0)
+	if (ccid2_hc_tx_alloc_seq(hctx, CCID2_SEQBUF_LEN) != 0)
 		return -ENOMEM;
 
 	hctx->ccid2hctx_sent	 = 0;


Re: 'BUG: scheduling while atomic' during dccp transfer

2007-06-07 Thread Arnaldo Carvalho de Melo

On 6/7/07, Florian Westphal <[EMAIL PROTECTED]> wrote:

Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> wrote:
> >While transmitting data via dccp (Linux 2.6.21) I encountered this:
> >
> >BUG: scheduling while atomic: firefox-bin/0x1100/6792
[..]

> Can you try with the attached patch?

I haven't been able to reproduce it so far.


You mean with this patch applied?

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/10] First DCCP batch for 2.6.23

2007-06-15 Thread Arnaldo Carvalho de Melo
Hi David,

 Please consider pulling from:

master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.23

Regards,

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 01/10] ccid3: Update copyrights

2007-06-15 Thread Arnaldo Carvalho de Melo
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |4 ++--
 net/dccp/ccids/lib/loss_interval.c |4 ++--
 net/dccp/ccids/lib/loss_interval.h |4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index ec7fa4d..2d203ae 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -1,8 +1,8 @@
 /*
  *  net/dccp/ccids/ccid3.c
  *
- *  Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
- *  Copyright (c) 2005-6 Ian McDonald <[EMAIL PROTECTED]>
+ *  Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand.
+ *  Copyright (c) 2005-7 Ian McDonald <[EMAIL PROTECTED]>
  *
  *  An implementation of the DCCP protocol
  *
diff --git a/net/dccp/ccids/lib/loss_interval.c 
b/net/dccp/ccids/lib/loss_interval.c
index 372d7e7..3829afc 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -1,8 +1,8 @@
 /*
  *  net/dccp/ccids/lib/loss_interval.c
  *
- *  Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
- *  Copyright (c) 2005-6 Ian McDonald <[EMAIL PROTECTED]>
+ *  Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand.
+ *  Copyright (c) 2005-7 Ian McDonald <[EMAIL PROTECTED]>
  *  Copyright (c) 2005 Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
  *
  *  This program is free software; you can redistribute it and/or modify
diff --git a/net/dccp/ccids/lib/loss_interval.h 
b/net/dccp/ccids/lib/loss_interval.h
index eb25701..1e48fe3 100644
--- a/net/dccp/ccids/lib/loss_interval.h
+++ b/net/dccp/ccids/lib/loss_interval.h
@@ -3,8 +3,8 @@
 /*
  *  net/dccp/ccids/lib/loss_interval.h
  *
- *  Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
- *  Copyright (c) 2005 Ian McDonald <[EMAIL PROTECTED]>
+ *  Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand.
+ *  Copyright (c) 2005-7 Ian McDonald <[EMAIL PROTECTED]>
  *  Copyright (c) 2005 Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
  *
  *  This program is free software; you can redistribute it and/or modify it
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 02/10] Fix dccp_sum_coverage

2007-06-15 Thread Arnaldo Carvalho de Melo
When compiling with EXTRA_CFLAGS=-W notice that we have signed/unsigned issue
in dccp.h.

Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
 net/dccp/dccp.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index d8ad27b..e2d74cd 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -184,7 +184,7 @@ DECLARE_SNMP_STAT(struct dccp_mib, dccp_statistics);
 /*
  * Checksumming routines
  */
-static inline int dccp_csum_coverage(const struct sk_buff *skb)
+static inline unsigned int dccp_csum_coverage(const struct sk_buff *skb)
 {
const struct dccp_hdr* dh = dccp_hdr(skb);
 
@@ -195,7 +195,7 @@ static inline int dccp_csum_coverage(const struct sk_buff 
*skb)
 
 static inline void dccp_csum_outgoing(struct sk_buff *skb)
 {
-   int cov = dccp_csum_coverage(skb);
+   unsigned int cov = dccp_csum_coverage(skb);
 
if (cov >= skb->len)
dccp_hdr(skb)->dccph_cscov = 0;
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 03/10] loss_interval: Fix timeval initialisation

2007-06-15 Thread Arnaldo Carvalho de Melo
When compiling with EXTRA_CFLAGS=-W noticed that tstamp is not initialised
correctly in dccp_li_calc_first_li.

Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 2d203ae..9686a8d 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -829,7 +829,7 @@ static u32 ccid3_hc_rx_calc_first_li(struct sock *sk)
struct dccp_rx_hist_entry *entry, *next, *tail = NULL;
u32 x_recv, p;
suseconds_t rtt, delta;
-   struct timeval tstamp = { 0, };
+   struct timeval tstamp = { 0, 0 };
int interval = 0;
int win_count = 0;
int step = 0;
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/10] Remove accesses to ccid3_hc_rx_sock in ccid3_hc_rx_{update,calc_first}_li

2007-06-15 Thread Arnaldo Carvalho de Melo
This is a preparatory patch for moving these loss interval functions from
net/dccp/ccids/ccid3.c to net/dccp/ccids/lib/loss_interval.c.

Based on a patch by Ian McDonald.

Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |   62 +++
 1 files changed, 41 insertions(+), 21 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 9686a8d..fb500d3 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -823,9 +823,12 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, 
struct sk_buff *skb)
  *
  * returns estimated loss interval in usecs */
 
-static u32 ccid3_hc_rx_calc_first_li(struct sock *sk)
+static u32 ccid3_hc_rx_calc_first_li(struct sock *sk,
+struct list_head *hist_list,
+struct timeval *last_feedback,
+u16 s, u32 bytes_recv,
+u32 previous_x_recv)
 {
-   struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
struct dccp_rx_hist_entry *entry, *next, *tail = NULL;
u32 x_recv, p;
suseconds_t rtt, delta;
@@ -835,8 +838,7 @@ static u32 ccid3_hc_rx_calc_first_li(struct sock *sk)
int step = 0;
u64 fval;
 
-   list_for_each_entry_safe(entry, next, &hcrx->ccid3hcrx_hist,
-dccphrx_node) {
+   list_for_each_entry_safe(entry, next, hist_list, dccphrx_node) {
if (dccp_rx_hist_entry_data_packet(entry)) {
tail = entry;
 
@@ -895,19 +897,20 @@ found:
}
 
dccp_timestamp(sk, &tstamp);
-   delta = timeval_delta(&tstamp, &hcrx->ccid3hcrx_tstamp_last_feedback);
+   delta = timeval_delta(&tstamp, last_feedback);
DCCP_BUG_ON(delta <= 0);
 
-   x_recv = scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta);
+   x_recv = scaled_div32(bytes_recv, delta);
if (x_recv == 0) {  /* would also trigger divide-by-zero */
DCCP_WARN("X_recv==0\n");
-   if ((x_recv = hcrx->ccid3hcrx_x_recv) == 0) {
+   if (previous_x_recv == 0) {
DCCP_BUG("stored value of X_recv is zero");
return ~0;
}
+   x_recv = previous_x_recv;
}
 
-   fval = scaled_div(hcrx->ccid3hcrx_s, rtt);
+   fval = scaled_div(s, rtt);
fval = scaled_div32(fval, x_recv);
p = tfrc_calc_x_reverse_lookup(fval);
 
@@ -920,26 +923,36 @@ found:
return 100 / p;
 }
 
-static void ccid3_hc_rx_update_li(struct sock *sk, u64 seq_loss, u8 win_loss)
+static void ccid3_hc_rx_update_li(struct sock *sk,
+ struct list_head *li_hist_list,
+ struct list_head *hist_list,
+ struct timeval *last_feedback,
+ u16 s, u32 bytes_recv,
+ u32 previous_x_recv,
+ u64 seq_loss, u8 win_loss)
 {
-   struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
struct dccp_li_hist_entry *head;
u64 seq_temp;
 
-   if (list_empty(&hcrx->ccid3hcrx_li_hist)) {
+   if (list_empty(li_hist_list)) {
if (!dccp_li_hist_interval_new(ccid3_li_hist,
-  &hcrx->ccid3hcrx_li_hist, seq_loss, win_loss))
+  li_hist_list, seq_loss,
+  win_loss))
return;
 
-   head = list_entry(hcrx->ccid3hcrx_li_hist.next,
-  struct dccp_li_hist_entry, dccplih_node);
-   head->dccplih_interval = ccid3_hc_rx_calc_first_li(sk);
+   head = list_entry(li_hist_list->next, struct dccp_li_hist_entry,
+ dccplih_node);
+   head->dccplih_interval =
+   ccid3_hc_rx_calc_first_li(sk, hist_list,
+ last_feedback, s,
+ bytes_recv,
+ previous_x_recv);
} else {
struct dccp_li_hist_entry *entry;
struct list_head *tail;
 
-   head = list_entry(hcrx->ccid3hcrx_li_hist.next,
-  struct dccp_li_hist_entry, dccplih_node);
+   head = list_entry(li_hist_list->next, struct dccp_li_hist_entry,
+ dccplih_node);
/* FIXME win count check removed as was wrong */
/* should make this check with receive history */
/* and compare there as per section 10.2 of RFC4342 */
@

[PATCH 05/10] [CCID3]: Pass ccid3_li_hist to ccid3_hc_rx_update_li

2007-06-15 Thread Arnaldo Carvalho de Melo
Now ccid3_hc_rx_update_li is ready to be moved to
net/dccp/ccids/lib/loss_interval, it uses the same interface as the other
functions there.

Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index fb500d3..52a71a9 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -924,6 +924,7 @@ found:
 }
 
 static void ccid3_hc_rx_update_li(struct sock *sk,
+ struct dccp_li_hist *li_hist,
  struct list_head *li_hist_list,
  struct list_head *hist_list,
  struct timeval *last_feedback,
@@ -935,9 +936,8 @@ static void ccid3_hc_rx_update_li(struct sock *sk,
u64 seq_temp;
 
if (list_empty(li_hist_list)) {
-   if (!dccp_li_hist_interval_new(ccid3_li_hist,
-  li_hist_list, seq_loss,
-  win_loss))
+   if (!dccp_li_hist_interval_new(li_hist, li_hist_list,
+  seq_loss, win_loss))
return;
 
head = list_entry(li_hist_list->next, struct dccp_li_hist_entry,
@@ -960,7 +960,7 @@ static void ccid3_hc_rx_update_li(struct sock *sk,
/* new loss event detected */
/* calculate last interval length */
seq_temp = dccp_delta_seqno(head->dccplih_seqno, seq_loss);
-   entry = dccp_li_hist_entry_new(ccid3_li_hist, GFP_ATOMIC);
+   entry = dccp_li_hist_entry_new(li_hist, GFP_ATOMIC);
 
if (entry == NULL) {
DCCP_BUG("out of memory - can not allocate entry");
@@ -971,7 +971,7 @@ static void ccid3_hc_rx_update_li(struct sock *sk,
 
tail = li_hist_list->prev;
list_del(tail);
-   kmem_cache_free(ccid3_li_hist->dccplih_slab, tail);
+   kmem_cache_free(li_hist->dccplih_slab, tail);
 
/* Create the newest interval */
entry->dccplih_seqno = seq_loss;
@@ -1005,7 +1005,7 @@ static int ccid3_hc_rx_detect_loss(struct sock *sk,
while (dccp_delta_seqno(hcrx->ccid3hcrx_seqno_nonloss, seqno)
   > TFRC_RECV_NUM_LATE_LOSS) {
loss = 1;
-   ccid3_hc_rx_update_li(sk,
+   ccid3_hc_rx_update_li(sk, ccid3_li_hist,
  &hcrx->ccid3hcrx_li_hist,
  &hcrx->ccid3hcrx_hist,
  &hcrx->ccid3hcrx_tstamp_last_feedback,
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 06/10] [DCCP] loss_interval: Move ccid3_hc_rx_update_li to loss_interval

2007-06-15 Thread Arnaldo Carvalho de Melo
Renaming it to dccp_li_update_li.

Also based on previous work by Ian McDonald.

Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |  179 ++--
 net/dccp/ccids/lib/loss_interval.c |  160 
 net/dccp/ccids/lib/loss_interval.h |7 ++
 3 files changed, 176 insertions(+), 170 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 52a71a9..9d2e2c1 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -819,167 +819,6 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, 
struct sk_buff *skb)
return 0;
 }
 
-/* calculate first loss interval
- *
- * returns estimated loss interval in usecs */
-
-static u32 ccid3_hc_rx_calc_first_li(struct sock *sk,
-struct list_head *hist_list,
-struct timeval *last_feedback,
-u16 s, u32 bytes_recv,
-u32 previous_x_recv)
-{
-   struct dccp_rx_hist_entry *entry, *next, *tail = NULL;
-   u32 x_recv, p;
-   suseconds_t rtt, delta;
-   struct timeval tstamp = { 0, 0 };
-   int interval = 0;
-   int win_count = 0;
-   int step = 0;
-   u64 fval;
-
-   list_for_each_entry_safe(entry, next, hist_list, dccphrx_node) {
-   if (dccp_rx_hist_entry_data_packet(entry)) {
-   tail = entry;
-
-   switch (step) {
-   case 0:
-   tstamp= entry->dccphrx_tstamp;
-   win_count = entry->dccphrx_ccval;
-   step = 1;
-   break;
-   case 1:
-   interval = win_count - entry->dccphrx_ccval;
-   if (interval < 0)
-   interval += TFRC_WIN_COUNT_LIMIT;
-   if (interval > 4)
-   goto found;
-   break;
-   }
-   }
-   }
-
-   if (unlikely(step == 0)) {
-   DCCP_WARN("%s(%p), packet history has no data packets!\n",
- dccp_role(sk), sk);
-   return ~0;
-   }
-
-   if (unlikely(interval == 0)) {
-   DCCP_WARN("%s(%p), Could not find a win_count interval > 0."
- "Defaulting to 1\n", dccp_role(sk), sk);
-   interval = 1;
-   }
-found:
-   if (!tail) {
-   DCCP_CRIT("tail is null\n");
-   return ~0;
-   }
-
-   delta = timeval_delta(&tstamp, &tail->dccphrx_tstamp);
-   DCCP_BUG_ON(delta < 0);
-
-   rtt = delta * 4 / interval;
-   ccid3_pr_debug("%s(%p), approximated RTT to %dus\n",
-  dccp_role(sk), sk, (int)rtt);
-
-   /*
-* Determine the length of the first loss interval via inverse lookup.
-* Assume that X_recv can be computed by the throughput equation
-*  s
-*  X_recv = 
-*   R * fval
-* Find some p such that f(p) = fval; return 1/p [RFC 3448, 6.3.1].
-*/
-   if (rtt == 0) { /* would result in divide-by-zero */
-   DCCP_WARN("RTT==0\n");
-   return ~0;
-   }
-
-   dccp_timestamp(sk, &tstamp);
-   delta = timeval_delta(&tstamp, last_feedback);
-   DCCP_BUG_ON(delta <= 0);
-
-   x_recv = scaled_div32(bytes_recv, delta);
-   if (x_recv == 0) {  /* would also trigger divide-by-zero */
-   DCCP_WARN("X_recv==0\n");
-   if (previous_x_recv == 0) {
-   DCCP_BUG("stored value of X_recv is zero");
-   return ~0;
-   }
-   x_recv = previous_x_recv;
-   }
-
-   fval = scaled_div(s, rtt);
-   fval = scaled_div32(fval, x_recv);
-   p = tfrc_calc_x_reverse_lookup(fval);
-
-   ccid3_pr_debug("%s(%p), receive rate=%u bytes/s, implied "
-  "loss rate=%u\n", dccp_role(sk), sk, x_recv, p);
-
-   if (p == 0)
-   return ~0;
-   else
-   return 100 / p;
-}
-
-static void ccid3_hc_rx_update_li(struct sock *sk,
- struct dccp_li_hist *li_hist,
- struct list_head *li_hist_list,
- struct list_head *hist_list,
- struct timeval *last_feedback,
- u16 s, u32 bytes_recv,
- u32 previous_x_recv,
- u64 

[PATCH 07/10] loss_interval: unexport dccp_li_hist_interval_new

2007-06-15 Thread Arnaldo Carvalho de Melo
Now its only used inside the loss_interval code.

Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/lib/loss_interval.c |7 +++
 net/dccp/ccids/lib/loss_interval.h |3 ---
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/net/dccp/ccids/lib/loss_interval.c 
b/net/dccp/ccids/lib/loss_interval.c
index ee59fde..8ac68c6 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -120,8 +120,9 @@ u32 dccp_li_hist_calc_i_mean(struct list_head *list)
 
 EXPORT_SYMBOL_GPL(dccp_li_hist_calc_i_mean);
 
-int dccp_li_hist_interval_new(struct dccp_li_hist *hist,
-   struct list_head *list, const u64 seq_loss, const u8 win_loss)
+static int dccp_li_hist_interval_new(struct dccp_li_hist *hist,
+struct list_head *list,
+const u64 seq_loss, const u8 win_loss)
 {
struct dccp_li_hist_entry *entry;
int i;
@@ -142,8 +143,6 @@ int dccp_li_hist_interval_new(struct dccp_li_hist *hist,
return 1;
 }
 
-EXPORT_SYMBOL_GPL(dccp_li_hist_interval_new);
-
 /* calculate first loss interval
  *
  * returns estimated loss interval in usecs */
diff --git a/net/dccp/ccids/lib/loss_interval.h 
b/net/dccp/ccids/lib/loss_interval.h
index 17f173a..653328d 100644
--- a/net/dccp/ccids/lib/loss_interval.h
+++ b/net/dccp/ccids/lib/loss_interval.h
@@ -52,9 +52,6 @@ extern void dccp_li_hist_purge(struct dccp_li_hist *hist,
 
 extern u32 dccp_li_hist_calc_i_mean(struct list_head *list);
 
-extern int dccp_li_hist_interval_new(struct dccp_li_hist *hist,
-   struct list_head *list, const u64 seq_loss, const u8 win_loss);
-
 extern void dccp_li_update_li(struct sock *sk, struct dccp_li_hist *li_hist,
  struct list_head *li_hist_list,
  struct list_head *hist_list,
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 08/10] loss_interval: Make dccp_li_hist_entry_{new,delete} private

2007-06-15 Thread Arnaldo Carvalho de Melo
Not used outside the loss_interval code anymore.

Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/lib/loss_interval.c |   14 ++
 net/dccp/ccids/lib/loss_interval.h |   14 --
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/net/dccp/ccids/lib/loss_interval.c 
b/net/dccp/ccids/lib/loss_interval.c
index 8ac68c6..28eac9b 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -62,6 +62,20 @@ void dccp_li_hist_delete(struct dccp_li_hist *hist)
 
 EXPORT_SYMBOL_GPL(dccp_li_hist_delete);
 
+static inline struct dccp_li_hist_entry *
+   dccp_li_hist_entry_new(struct dccp_li_hist *hist,
+  const gfp_t prio)
+{
+   return kmem_cache_alloc(hist->dccplih_slab, prio);
+}
+
+static inline void dccp_li_hist_entry_delete(struct dccp_li_hist *hist,
+struct dccp_li_hist_entry *entry)
+{
+   if (entry != NULL)
+   kmem_cache_free(hist->dccplih_slab, entry);
+}
+
 void dccp_li_hist_purge(struct dccp_li_hist *hist, struct list_head *list)
 {
struct dccp_li_hist_entry *entry, *next;
diff --git a/net/dccp/ccids/lib/loss_interval.h 
b/net/dccp/ccids/lib/loss_interval.h
index 653328d..8d3c9bf 100644
--- a/net/dccp/ccids/lib/loss_interval.h
+++ b/net/dccp/ccids/lib/loss_interval.h
@@ -33,20 +33,6 @@ struct dccp_li_hist_entry {
u32  dccplih_interval;
 };
 
-static inline struct dccp_li_hist_entry *
-   dccp_li_hist_entry_new(struct dccp_li_hist *hist,
-  const gfp_t prio)
-{
-   return kmem_cache_alloc(hist->dccplih_slab, prio);
-}
-
-static inline void dccp_li_hist_entry_delete(struct dccp_li_hist *hist,
-struct dccp_li_hist_entry *entry)
-{
-   if (entry != NULL)
-   kmem_cache_free(hist->dccplih_slab, entry);
-}
-
 extern void dccp_li_hist_purge(struct dccp_li_hist *hist,
   struct list_head *list);
 
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 09/10] loss_interval: Nuke dccp_li_hist

2007-06-15 Thread Arnaldo Carvalho de Melo
It had just a slab cache, so, for the sake of simplicity just make
dccp_trfc_lib module init routine create the slab cache, no need for users of
the lib to create a private loss_interval object.

Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |   18 +--
 net/dccp/ccids/lib/loss_interval.c |   92 
 net/dccp/ccids/lib/loss_interval.h |   12 +
 3 files changed, 36 insertions(+), 86 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 9d2e2c1..407f10c 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -49,7 +49,6 @@ static int ccid3_debug;
 
 static struct dccp_tx_hist *ccid3_tx_hist;
 static struct dccp_rx_hist *ccid3_rx_hist;
-static struct dccp_li_hist *ccid3_li_hist;
 
 /*
  * Transmitter Half-Connection Routines
@@ -844,7 +843,7 @@ static int ccid3_hc_rx_detect_loss(struct sock *sk,
while (dccp_delta_seqno(hcrx->ccid3hcrx_seqno_nonloss, seqno)
   > TFRC_RECV_NUM_LATE_LOSS) {
loss = 1;
-   dccp_li_update_li(sk, ccid3_li_hist,
+   dccp_li_update_li(sk,
  &hcrx->ccid3hcrx_li_hist,
  &hcrx->ccid3hcrx_hist,
  &hcrx->ccid3hcrx_tstamp_last_feedback,
@@ -1011,7 +1010,7 @@ static void ccid3_hc_rx_exit(struct sock *sk)
dccp_rx_hist_purge(ccid3_rx_hist, &hcrx->ccid3hcrx_hist);
 
/* Empty loss interval history */
-   dccp_li_hist_purge(ccid3_li_hist, &hcrx->ccid3hcrx_li_hist);
+   dccp_li_hist_purge(&hcrx->ccid3hcrx_li_hist);
 }
 
 static void ccid3_hc_rx_get_info(struct sock *sk, struct tcp_info *info)
@@ -1095,19 +1094,12 @@ static __init int ccid3_module_init(void)
if (ccid3_tx_hist == NULL)
goto out_free_rx;
 
-   ccid3_li_hist = dccp_li_hist_new("ccid3");
-   if (ccid3_li_hist == NULL)
-   goto out_free_tx;
-
rc = ccid_register(&ccid3);
if (rc != 0)
-   goto out_free_loss_interval_history;
+   goto out_free_tx;
 out:
return rc;
 
-out_free_loss_interval_history:
-   dccp_li_hist_delete(ccid3_li_hist);
-   ccid3_li_hist = NULL;
 out_free_tx:
dccp_tx_hist_delete(ccid3_tx_hist);
ccid3_tx_hist = NULL;
@@ -1130,10 +1122,6 @@ static __exit void ccid3_module_exit(void)
dccp_rx_hist_delete(ccid3_rx_hist);
ccid3_rx_hist = NULL;
}
-   if (ccid3_li_hist != NULL) {
-   dccp_li_hist_delete(ccid3_li_hist);
-   ccid3_li_hist = NULL;
-   }
 }
 module_exit(ccid3_module_exit);
 
diff --git a/net/dccp/ccids/lib/loss_interval.c 
b/net/dccp/ccids/lib/loss_interval.c
index 28eac9b..e6b1f0c 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -18,71 +18,26 @@
 #include "packet_history.h"
 #include "tfrc.h"
 
-struct dccp_li_hist *dccp_li_hist_new(const char *name)
-{
-   struct dccp_li_hist *hist = kmalloc(sizeof(*hist), GFP_ATOMIC);
-   static const char dccp_li_hist_mask[] = "li_hist_%s";
-   char *slab_name;
-
-   if (hist == NULL)
-   goto out;
-
-   slab_name = kmalloc(strlen(name) + sizeof(dccp_li_hist_mask) - 1,
-   GFP_ATOMIC);
-   if (slab_name == NULL)
-   goto out_free_hist;
-
-   sprintf(slab_name, dccp_li_hist_mask, name);
-   hist->dccplih_slab = kmem_cache_create(slab_name,
-sizeof(struct dccp_li_hist_entry),
-  0, SLAB_HWCACHE_ALIGN,
-  NULL, NULL);
-   if (hist->dccplih_slab == NULL)
-   goto out_free_slab_name;
-out:
-   return hist;
-out_free_slab_name:
-   kfree(slab_name);
-out_free_hist:
-   kfree(hist);
-   hist = NULL;
-   goto out;
-}
-
-EXPORT_SYMBOL_GPL(dccp_li_hist_new);
-
-void dccp_li_hist_delete(struct dccp_li_hist *hist)
-{
-   const char* name = kmem_cache_name(hist->dccplih_slab);
-
-   kmem_cache_destroy(hist->dccplih_slab);
-   kfree(name);
-   kfree(hist);
-}
-
-EXPORT_SYMBOL_GPL(dccp_li_hist_delete);
+struct kmem_cache *dccp_li_cachep __read_mostly;
 
-static inline struct dccp_li_hist_entry *
-   dccp_li_hist_entry_new(struct dccp_li_hist *hist,
-  const gfp_t prio)
+static inline struct dccp_li_hist_entry *dccp_li_hist_entry_new(const gfp_t 
prio)
 {
-   return kmem_cache_alloc(hist->dccplih_slab, prio);
+   return kmem_cache_alloc(dccp_li_cachep, prio);
 }
 
-static inline void dccp_li_hist_entry_delete(struct dccp_li_hist *hist,
-struct dccp_li_hist_entry *en

[PATCH 10/10] loss_interval: make struct dccp_li_hist_entry private

2007-06-15 Thread Arnaldo Carvalho de Melo
net/dccp/ccids/lib/loss_interval.c is the only place where this struct is used.

Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/lib/loss_interval.c |9 +
 net/dccp/ccids/lib/loss_interval.h |   10 --
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/net/dccp/ccids/lib/loss_interval.c 
b/net/dccp/ccids/lib/loss_interval.c
index e6b1f0c..01c1edb 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -18,6 +18,15 @@
 #include "packet_history.h"
 #include "tfrc.h"
 
+#define DCCP_LI_HIST_IVAL_F_LENGTH  8
+
+struct dccp_li_hist_entry {
+   struct list_head dccplih_node;
+   u64  dccplih_seqno:48,
+dccplih_win_count:4;
+   u32  dccplih_interval;
+};
+
 struct kmem_cache *dccp_li_cachep __read_mostly;
 
 static inline struct dccp_li_hist_entry *dccp_li_hist_entry_new(const gfp_t 
prio)
diff --git a/net/dccp/ccids/lib/loss_interval.h 
b/net/dccp/ccids/lib/loss_interval.h
index f35c111..906c806 100644
--- a/net/dccp/ccids/lib/loss_interval.h
+++ b/net/dccp/ccids/lib/loss_interval.h
@@ -14,18 +14,8 @@
  */
 
 #include 
-#include 
 #include 
 
-#define DCCP_LI_HIST_IVAL_F_LENGTH  8
-
-struct dccp_li_hist_entry {
-   struct list_head dccplih_node;
-   u64  dccplih_seqno:48,
-dccplih_win_count:4;
-   u32  dccplih_interval;
-};
-
 extern void dccp_li_hist_purge(struct list_head *list);
 
 extern u32 dccp_li_hist_calc_i_mean(struct list_head *list);
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3]: Inline for time delta

2007-06-16 Thread Arnaldo Carvalho de Melo

On 6/11/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:

Quoting Ian McDonald:
|  > --- a/net/dccp/dccp.h
|  > +++ b/net/dccp/dccp.h
|  > @@ -421,6 +421,11 @@ static inline suseconds_t timeval_delta(
|  > return secs * USEC_PER_SEC + usecs;
|  > }
|  >
|  > +static inline s64 ktime_delta(ktime_t later, ktime_t earlier)
|  > +{
|  > + return ktime_to_us(ktime_sub(later, earlier));
|  > +}
|  > +
|  > static inline void timeval_add_usecs(struct timeval *tv,
|  > const suseconds_t usecs)
|  > {
|
|  Judging from patches I've seen rejected recently I'd say this should
|  be shifted into the same place as ktime_sub is defined. Also rename to
|  ktime_sub_us.
|
ktime_sub is defined in include/linux/ktime.h - the file only contains generic
definitions, everything is in nanoseconds, struct timeval, or struct timespec.

The other place is include/linux/skbuff.h, where net_timedelta() is defined.
This is probably due to the skb->tstamp field - as a utility function to be used
with skbs.

Neither place seems appropriate for above function, it is more specific than the
ones in ktime.h. I am at a loss where else if not the DCCP code, where it is
specifically used, to place it.

The name was chosen for consistency with timeval_delta(), which is the current
function for microsecond time differences in net/dccp/dccp.h.


I'm going to rename ktime_delta() to ktime_us_delta(), as it receives
two ktime_t values, but returns a delta in microseconds, not in
ktime_t.

Don't worry, I'll fixup the later patches, after a while we get
everything sorted out.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3]: Sending time: update to ktime_t and bug-fix

2007-06-16 Thread Arnaldo Carvalho de Melo

On 6/9/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:

[CCID3]: Sending time: update to ktime_t and bug-fix

This updates the computation of t_nom and t_last_win_count to use the newer
gettimeofday interface.

The second point of this patch is to fix a bug in the send time processing:

  ccid3_hc_tx_send_packet currently returns 0 when the time difference
  between current time and t_nom is less than 1000 microseconds.

  In this case the packet is sent immediately; but, unlike other packets
  that can be emitted on first attempt, it will not have its window
  counter updated and its options set as required. This is a bug.

Fix: Require the time difference to be at least 1000 microseconds. The
 algorithm then converges: time differences > 1000 microseconds trigger the
 timer in dccp_write_xmit; after timer expiry this function is tried again;
 when the time difference is less than 1000, the packet will have its 
options
 added and window counter updated as required.


I'm breaking this into two patches for you this time :-)

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3]: Inline for time delta

2007-06-16 Thread Arnaldo Carvalho de Melo

On 6/16/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:

Quoting Arnaldo Carvalho de Melo:
|  > The name was chosen for consistency with timeval_delta(), which is the 
current
|  > function for microsecond time differences in net/dccp/dccp.h.
|
|  I'm going to rename ktime_delta() to ktime_us_delta(), as it receives
|  two ktime_t values, but returns a delta in microseconds, not in
|  ktime_t.
I am going to change that in my tree as well, since I am busy updating with 
regard
to most recent changes anyway.


I just checked with Thomas Gleixner, the ktime_t guy and he is ok with
adding ktime_us_delta() to ktime.h, which I have in my tree already.

I'm now staring at this:
  /* set the nominal send time for the next following packet */
-timeval_add_usecs(&hctx->ccid3hctx_t_nom, hctx->ccid3hctx_t_ipi);
+   hctx->ccid3hctx_t_nom = ktime_add_ns(hctx->ccid3hctx_t_nom,
+hctx->ccid3hctx_t_ipi * 1000);

I'm thinking about just keeping t_ipi in nanoseconds to avoid doing
the multiplies back and forth, but haven't fully looked at the other
t_ipi uses, quick thoughts? Stupid idea?

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/10] First DCCP batch for 2.6.23

2007-06-16 Thread Arnaldo Carvalho de Melo

On 6/16/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:

Thanks for the update.
Just for information - I will be updating all my patches based on this set of
10, as I expect them to be in David Miller's tree soon.

If people would like me to resubmit the updates, please give me a shout as I
will otherwise prepare these updates for the test tree.


David already merged these :-) I'm now going over the ktime_t stuff,
see other messages from today.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3]: Inline for time delta

2007-06-16 Thread Arnaldo Carvalho de Melo

On 6/16/07, Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> wrote:

On 6/16/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:
> Quoting Arnaldo Carvalho de Melo:
> |  > The name was chosen for consistency with timeval_delta(), which is the 
current
> |  > function for microsecond time differences in net/dccp/dccp.h.
> |
> |  I'm going to rename ktime_delta() to ktime_us_delta(), as it receives
> |  two ktime_t values, but returns a delta in microseconds, not in
> |  ktime_t.
> I am going to change that in my tree as well, since I am busy updating with 
regard
> to most recent changes anyway.

I just checked with Thomas Gleixner, the ktime_t guy and he is ok with
adding ktime_us_delta() to ktime.h, which I have in my tree already.

I'm now staring at this:
   /* set the nominal send time for the next following packet */
-timeval_add_usecs(&hctx->ccid3hctx_t_nom, hctx->ccid3hctx_t_ipi);
+   hctx->ccid3hctx_t_nom = ktime_add_ns(hctx->ccid3hctx_t_nom,
+hctx->ccid3hctx_t_ipi * 1000);

I'm thinking about just keeping t_ipi in nanoseconds to avoid doing
the multiplies back and forth, but haven't fully looked at the other
t_ipi uses, quick thoughts? Stupid idea?


Nah, will just add ktime_add_us to the ktime infrastructure (also
ACKed by Thomas).

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3]: Inline for time delta

2007-06-16 Thread Arnaldo Carvalho de Melo

On 6/16/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:

Quoting Arnaldo Carvalho de Melo:
|  > I am going to change that in my tree as well, since I am busy updating 
with regard
|  > to most recent changes anyway.
|
|  I just checked with Thomas Gleixner, the ktime_t guy and he is ok with
|  adding ktime_us_delta() to ktime.h, which I have in my tree already.
Ok when that comes through, the patch can simply be dropped.

|  I'm now staring at this:
|   /* set the nominal send time for the next following packet */
|  - timeval_add_usecs(&hctx->ccid3hctx_t_nom, hctx->ccid3hctx_t_ipi);
|  + hctx->ccid3hctx_t_nom = ktime_add_ns(hctx->ccid3hctx_t_nom,
|  + hctx->ccid3hctx_t_ipi * 1000);
|
|  I'm thinking about just keeping t_ipi in nanoseconds to avoid doing
|  the multiplies back and forth, but haven't fully looked at the other
|  t_ipi uses, quick thoughts? Stupid idea?
Please keep it for the moment. The packet scheduler needs an overhaul in any 
case,
so the above IMO is the simplest way of aligning the old code with the new 
interface.

Overhauling the packet scheduler will cost some work, but it is one of the next 
items
that should be addressed when through with the current set of patches.


OK! I'm just adding this to ktime.h:

static inline ktime_t ktime_add_us(const ktime_t kt, const u64 usec)
{
   return ktime_add_ns(kt, usec * 1000);
}

And will use it where needed.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3]: Update computation of X to use newer timeofday interface

2007-06-16 Thread Arnaldo Carvalho de Melo

On 6/11/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:

Quoting Ian McDonald:
|  This looks OK but...
|  > -static void ccid3_hc_tx_update_x(struct sock *sk, struct timeval *now)
|  > +static void ccid3_hc_tx_update_x(struct sock *sk, ktime_t *stamp)
|  >
|  I don't see callers updated. Is this what you mean by patches being
|  interdependent?
That is right - the callers are updated in the next bundle of patches. This
set contains 29 patches in total, divided into the following sets:
 1. ktime_t update   [sent]
 2. RTT and timestamping patches [sent]
 3. CCID3 TX history - locking and migration to singly-linked list
   (suggestion by Arnaldo)  => to be sent
 4. Update of computation wrt draft rfc3448bis-00   => to be sent
 5. Miscellaneous CCID3 patches => to be sent
 6. Miscellaneous DCCP patches  => to be sent

The callers are updated in subset (4) - I will send that later today.

These patches are first and foremost meant for the upcoming `experimental' tree,
I was thinking that sending one or two bundles at a time is better than sending
the whole 29 patches at once.

Once they are in the tree, after going through this stage, the relationships are
easier to see.

Thus, interdependencies are unavoidable at this stage, but they have been 
minimised
to quite a great degree: with some labour, 5 patches that overlapped with others
were merged across 50 or so patches.

Doing even more integration would make them less readable.


We'll have to avoid it, I'm looking at how to do it...


I'm just stuck with t_ld not having being converted to ktime_t, so
I'll fix this up by...

-   hctx->ccid3hctx_t_ld = now;
+   hctx->ccid3hctx_t_ld = ktime_to_timeval(now);

With this the second patch builds, now to look at the 3rd.


Gerrit, have you ever used git-bisect? Think about what people that
build distro kernels, where our DCCP stuff is built, will think about
us when doing a bisect to find the changeset that introduced a bug in
a completely unrelated area such as sysrq+M oopsing on machines with
sparse memory maps (example: myself last week 8) ) and the build
breaks because of such patch interdependency...

So I'll try to help you in finding ways for never, ever having the
tree not building at any point in time.

This is OK when in a rush and in a private tree, not in something that
we expect to merge :-)

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3]: Update computation of X to use newer timeofday interface

2007-06-16 Thread Arnaldo Carvalho de Melo

On 6/16/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:

Quoting Arnaldo Carvalho de Melo:
|  We'll have to avoid it, I'm looking at how to do it...
|
|
|  I'm just stuck with t_ld not having being converted to ktime_t, so
|  I'll fix this up by...
|
|  - hctx->ccid3hctx_t_ld = now;
|  + hctx->ccid3hctx_t_ld = ktime_to_timeval(now);
|
|  With this the second patch builds, now to look at the 3rd.
|
|
|  Gerrit, have you ever used git-bisect? Think about what people that
|  build distro kernels, where our DCCP stuff is built, will think about
|  us when doing a bisect to find the changeset that introduced a bug in
|  a completely unrelated area such as sysrq+M oopsing on machines with
|  sparse memory maps (example: myself last week 8) ) and the build
|  breaks because of such patch interdependency...
|
|  So I'll try to help you in finding ways for never, ever having the
|  tree not building at any point in time.
|
|  This is OK when in a rush and in a private tree, not in something that
|  we expect to merge :-)
In that case please don't consider these patches for merging. I may not
have been clear, but these patches are first and foremost destined for
the experimental tree, and there are still 12 patches outstanding.


OK, but doing the simple ktime_to_timeval above I was able to merge
something that moves us further, its clear and self contained, so I
loved it and merged.



I.e. you are trying to build on something which is not yet complete.


I made it self contained (the second patch), split it so that the one
after it fixes the bug you described, which improves what we have in
the tree.


We had these inter-dependencies before and they have been giving me (and
from what I read not only me) a lot of hell. It is all inter-related
and it is extremely hard to make patches both readable, split in logical
units, and compile as well.


I know,


I agree with your point, but trying to dissect these patches while in the
middle of a submission is madness.


Hey, think that what we're doing is: you're working on the
experimental tree, I'm trying to cherry pick what I think its small,
self contained and fixes things, reducing the size of the experimental
tree.


My idea was to put that into the test tree and offer it for testing.

I myself have been testing this code for 4..5 months now with good results.


Thank you for that!


I appreciate your help but I think this patch set can wait - at least until
I have had time to submit the rest, please.


I'll continue reading it, and will discuss with you in advance the
ones I think I can merge, ok?

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5]: Shorten statement for updating p

2007-06-16 Thread Arnaldo Carvalho de Melo

On 6/11/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:

[CCID 3]: Shorten statement for updating p

This shortens the statement for updating the loss event
rate p when a feedback packet is received.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -436,12 +436,9 @@ static void ccid3_hc_tx_packet_recv(stru
hctx->ccid3hctx_x_recv = opt_recv->ccid3or_receive_rate;
hctx->ccid3hctx_x_recv <<= 6;

-   /* Update loss event rate */
+   /* Update loss event rate (scaled by 1e6), cf. RFC 4342, 8.5 */
pinv = opt_recv->ccid3or_loss_event_rate;
-   if (pinv == ~0U || pinv == 0)  /* see RFC 4342, 8.5   */
-   hctx->ccid3hctx_p = 0;
-   else   /* can not exceed 100% */
-   hctx->ccid3hctx_p = 100 / pinv;
+   hctx->ccid3hctx_p = (pinv == ~0U || pinv == 0)? 0 : 
scaled_div(1, pinv);

/*
 * Calculate new RTT sample and update moving average


I love '?:', but not when things get complex. I think this should be
kept as if+else.

I'll pick the conversion to scaled_div tho, looks right.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4]: DCCP ktime_t initial work

2007-06-17 Thread Arnaldo Carvalho de Melo
Hi David,

Please consider pulling from:

master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.23

I briefed Thomas Gleixner about the new ktime_us_delta and
ktime_add_us functions and he is OK with having them added to ktime.h.

More work on this are going on Gerrit Renker's experimental tree
that I plan to review and cherry pick soon, thanks Gerrit for all the
hard work!

Best Regards,

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] [KTIME]: Introduce ktime_us_delta

2007-06-17 Thread Arnaldo Carvalho de Melo
This provides a reusable time difference function which returns the difference 
in
microseconds, as often used in the DCCP code.

Commiter note: renamed ktime_delta to ktime_us_delta and put it in ktime.h.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 include/linux/ktime.h |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/include/linux/ktime.h b/include/linux/ktime.h
index c762954..a208f9f 100644
--- a/include/linux/ktime.h
+++ b/include/linux/ktime.h
@@ -267,6 +267,11 @@ static inline s64 ktime_to_us(const ktime_t kt)
return (s64) tv.tv_sec * USEC_PER_SEC + tv.tv_usec;
 }
 
+static inline s64 ktime_us_delta(const ktime_t later, const ktime_t earlier)
+{
+   return ktime_to_us(ktime_sub(later, earlier));
+}
+
 /*
  * The resolution of the clocks. The resolution value is returned in
  * the clock_getres() system call to give application programmers an
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] [KTIME]: Introduce ktime_add_us

2007-06-17 Thread Arnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 include/linux/ktime.h |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/include/linux/ktime.h b/include/linux/ktime.h
index a208f9f..9800bae 100644
--- a/include/linux/ktime.h
+++ b/include/linux/ktime.h
@@ -272,6 +272,11 @@ static inline s64 ktime_us_delta(const ktime_t later, 
const ktime_t earlier)
return ktime_to_us(ktime_sub(later, earlier));
 }
 
+static inline ktime_t ktime_add_us(const ktime_t kt, const u64 usec)
+{
+   return ktime_add_ns(kt, usec * 1000);
+}
+
 /*
  * The resolution of the clocks. The resolution value is returned in
  * the clock_getres() system call to give application programmers an
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] [CCID3]: Sending time: update to ktime_t

2007-06-17 Thread Arnaldo Carvalho de Melo
This updates the computation of t_nom and t_last_win_count to use the newer
gettimeofday interface.

Committer note: used ktime_to_timeval to set the 'now' variable to t_ld in
ccid3hctx_no_feedback_timer

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |   33 +
 net/dccp/ccids/ccid3.h |5 +++--
 2 files changed, 16 insertions(+), 22 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 407f10c..94b3a1a 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -193,25 +193,20 @@ static inline void ccid3_hc_tx_update_s(struct 
ccid3_hc_tx_sock *hctx, int len)
  * The algorithm is not applicable if RTT < 4 microseconds.
  */
 static inline void ccid3_hc_tx_update_win_count(struct ccid3_hc_tx_sock *hctx,
-   struct timeval *now)
+   ktime_t now)
 {
-   suseconds_t delta;
u32 quarter_rtts;
 
if (unlikely(hctx->ccid3hctx_rtt < 4))  /* avoid divide-by-zero */
return;
 
-   delta = timeval_delta(now, &hctx->ccid3hctx_t_last_win_count);
-   DCCP_BUG_ON(delta < 0);
-
-   quarter_rtts = (u32)delta / (hctx->ccid3hctx_rtt / 4);
+   quarter_rtts = ktime_us_delta(now, hctx->ccid3hctx_t_last_win_count);
+   quarter_rtts /= hctx->ccid3hctx_rtt / 4;
 
if (quarter_rtts > 0) {
-   hctx->ccid3hctx_t_last_win_count = *now;
+   hctx->ccid3hctx_t_last_win_count = now;
hctx->ccid3hctx_last_win_count  += min_t(u32, quarter_rtts, 5);
hctx->ccid3hctx_last_win_count  &= 0xF; /* mod 16 */
-
-   ccid3_pr_debug("now at %#X\n", hctx->ccid3hctx_last_win_count);
}
 }
 
@@ -311,8 +306,8 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct 
sk_buff *skb)
 {
struct dccp_sock *dp = dccp_sk(sk);
struct ccid3_hc_tx_sock *hctx = ccid3_hc_tx_sk(sk);
-   struct timeval now;
-   suseconds_t delay;
+   ktime_t now = ktime_get_real();
+   s64 delay;
 
BUG_ON(hctx == NULL);
 
@@ -324,8 +319,6 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct 
sk_buff *skb)
if (unlikely(skb->len == 0))
return -EBADMSG;
 
-   dccp_timestamp(sk, &now);
-
switch (hctx->ccid3hctx_state) {
case TFRC_SSTATE_NO_SENT:
sk_reset_timer(sk, &hctx->ccid3hctx_no_feedback_timer,
@@ -348,7 +341,7 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct 
sk_buff *skb)
ccid3_pr_debug("SYN RTT = %uus\n", dp->dccps_syn_rtt);
hctx->ccid3hctx_rtt  = dp->dccps_syn_rtt;
hctx->ccid3hctx_x= rfc3390_initial_rate(sk);
-   hctx->ccid3hctx_t_ld = now;
+   hctx->ccid3hctx_t_ld = ktime_to_timeval(now);
} else {
/* Sender does not have RTT sample: X = MSS/second */
hctx->ccid3hctx_x = dp->dccps_mss_cache;
@@ -360,7 +353,7 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct 
sk_buff *skb)
break;
case TFRC_SSTATE_NO_FBACK:
case TFRC_SSTATE_FBACK:
-   delay = timeval_delta(&hctx->ccid3hctx_t_nom, &now);
+   delay = ktime_us_delta(hctx->ccid3hctx_t_nom, now);
ccid3_pr_debug("delay=%ld\n", (long)delay);
/*
 *  Scheduling of packet transmissions [RFC 3448, 4.6]
@@ -370,10 +363,10 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, 
struct sk_buff *skb)
 * else
 *   // send the packet in (t_nom - t_now) milliseconds.
 */
-   if (delay - (suseconds_t)hctx->ccid3hctx_delta >= 0)
-   return delay / 1000L;
+   if (delay - (s64)hctx->ccid3hctx_delta >= 0)
+   return (u32)delay / 1000L;
 
-   ccid3_hc_tx_update_win_count(hctx, &now);
+   ccid3_hc_tx_update_win_count(hctx, now);
break;
case TFRC_SSTATE_TERM:
DCCP_BUG("%s(%p) - Illegal state TERM", dccp_role(sk), sk);
@@ -386,8 +379,8 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct 
sk_buff *skb)
hctx->ccid3hctx_idle = 0;
 
/* set the nominal send time for the next following packet */
-   timeval_add_usecs(&hctx->ccid3hctx_t_nom, hctx->ccid3hctx_t_ipi);
-
+   hctx->ccid3hctx_t_nom = ktime_add_us(hctx->ccid3hctx_t_nom,
+hctx->ccid3hctx_t_ipi);
return 0;
 }
 
diff --gi

[PATCH 4/4] [CCID3]: Fix a bug in the send time processing

2007-06-17 Thread Arnaldo Carvalho de Melo
ccid3_hc_tx_send_packet currently returns 0 when the time difference between
current time and t_nom is less than 1000 microseconds.

In this case the packet is sent immediately; but, unlike other packets that can
be emitted on first attempt, it will not have its window counter updated and
its options set as required. This is a bug.

Fix: Require the time difference to be at least 1000 microseconds. The
algorithm then converges: time differences > 1000 microseconds trigger the
timer in dccp_write_xmit; after timer expiry this function is tried again; when
the time difference is less than 1000, the packet will have its options added
and window counter updated as required.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/ccids/ccid3.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 94b3a1a..e91c2b9 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -363,7 +363,7 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct 
sk_buff *skb)
 * else
 *   // send the packet in (t_nom - t_now) milliseconds.
 */
-   if (delay - (s64)hctx->ccid3hctx_delta >= 0)
+   if (delay - (s64)hctx->ccid3hctx_delta >= 1000)
return (u32)delay / 1000L;
 
ccid3_hc_tx_update_win_count(hctx, now);
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/1]: Current BUGs

2007-06-19 Thread Arnaldo Carvalho de Melo

On 6/19/07, Gerrit Renker <[EMAIL PROTECTED]> wrote:

I received note from Tommi Saviranta with bug information which is copied below.
One bug we had recently (reported by Florian Westphal), I attach my patch for
it (having observed the same thing at home); now there is a third occurrence.

I believe we should fix this soon.


1. Write queue not empty

|   "KERNEL: assertion (skb_queue_empty(&sk->sk_write_queue))
|  failed at net/core/stream.c (276)" in system log.
I observed this also at some time - but with TCP.


Humm, this means that when we call sk_stream_kill_queues, that now is
only called from inet_csk_destroy_sock (the other user is out of the
tree, in LLC patches I never got enough time to polish and submit)
that is called in three places:

-> when we are killing childs that we're almost finishing the
connection setup (in inet_csk_listen_stop, called from dccp_close on
the master socket or in dccp_disconnect)

-> in dccp_close for a client socket

-> in dccp_done, that is when the socket is in TIME_WAIT, finally
having its last remnants released or in error conditions (write error
-> timeout)

The BUG_TRAP basically means that we have packet(s) in the
sk_write_queue, that we should have purged before, ideas?


2. Out-of-order segments

|  At some point I've also had the following line in syslog, possibly
|  related to failing full duplex:
|
|   dccp_check_seqno: DCCP: Step 6 failed for ACK packet,
|   (LSWL(194687531369580) <= P.seqno(194687531369777) <= 
S.SWH(194687531369679))
|   and (P.ackno exists
|   or LAWL(195643175609843) <= P.ackno(195643175713728) <= 
S.AWH(195643175713921),
|   sending SYNC...
Ian observed this in December - the most recent occurrence was the Sync-flood 
fixes
(which will be resubmitted soon).


OK, try to make it applicable to what we have in net-2.6.23, i.e.
independent of the stuff we have now in the experimental tree.


3. Memory allocation while in atomic context (the bug)
--
|  At worst case scenario, such as when running iperf,
|   host2% ./iperf --protocol DCCP -l 500 -c 192.168.1.1 -p 5001 -t 60
|  results in kernel panic which totally kills networking:
|
|  
|  CCID: Registered CCID 2 (ccid2)
|  BUG: sleeping function called from invalid context at mm/slab.c:3035
|  in_atomic():1, irqs_disabled():0
|  [] __kmalloc+0x42/0x7d
|  [] ccid2_hc_tx_alloc_seq+0x23/0xa4 [dccp_ccid2]
|  [] ccid2_hc_tx_packet_sent+0x8d/0x13f [dccp_ccid2]
|  [] ccid2_hc_tx_packet_sent+0x0/0x13f [dccp_ccid2]
|  [] dccp_write_xmit+0x20e/0x2c4 [dccp]
|  [] hrtimer_run_queues+0x127/0x141
|  [] dccp_write_xmit_timer+0x0/0x51 [dccp]
|  [] dccp_write_xmit_timer+0x33/0x51 [dccp]
|  [] run_timer_softirq+0x101/0x164
|  [] net_rx_action+0xca/0x185
|  [] __do_softirq+0x5d/0xba
|  [] do_softirq+0x59/0xb1
|  [] handle_level_irq+0x0/0xdf
|  [] do_IRQ+0xc6/0xdd
|  [] common_interrupt+0x23/0x28
|  [] find_busiest_group+0x1d2/0x4c3
|  [] lock_sock_nested+0x20/0xa3
|  [] copy_from_user+0x3a/0x66
|  [] dccp_sendmsg+0x2c/0x156 [dccp]
|  [] inet_sendmsg+0x3b/0x45
|  [] sock_aio_write+0xf9/0x105
|  [] do_sync_write+0xc7/0x10a
|  [] autoremove_wake_function+0x0/0x35
|  [] vfs_write+0xbc/0x154
|  [] sys_write+0x41/0x67
|  [] syscall_call+0x7/0xb
|  ===
|  
|
This was observed first on 
http://www.mail-archive.com/dccp@vger.kernel.org/msg01811.html
A patch is attached - Arnaldo came up with an independent solution.


Doh, I just applied my patch, will be in net-2.6.23 and I'll ask DaveM
to have it in 2.6.22 and the [EMAIL PROTECTED] guys to get it into
stable as well.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


<    1   2   3   4   5   6   7   8   >