Re: [PATCH v2 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
> On 3. Aug 2018, at 22:30, Marcelo Ricardo Leitner > wrote: > > On Fri, Aug 03, 2018 at 04:43:28PM +, David Laight wrote: >> From: Konstantin Khorenko >>> Sent: 03 August 2018 17:21 >>> >>> Each SCTP association can have up to 65535 input and output streams. >>> For each stream type an array of sctp_stream_in or sctp_stream_out >>> structures is allocated using kmalloc_array() function. This function >>> allocates physically contiguous memory regions, so this can lead >>> to allocation of memory regions of very high order, i.e.: >> ... >> >> Given how useless SCTP streams are, does anything actually use >> more than about 4? > > Maybe Michael can help us with that. I'm also curious now. In the context of SIGTRAN I have seen 17 streams... In the context of WebRTC I have seen more streams. In general, the streams concept seems to be useful. QUIC has lots of streams. So I'm wondering why they are considered useless. David, can you elaborate on this? Best regards Michael > > Marcelo
Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs
> On 29. May 2018, at 13:41, Neil Horman wrote: > > On Mon, May 28, 2018 at 04:43:15PM -0300, Marcelo Ricardo Leitner wrote: >> On Sat, May 26, 2018 at 09:01:00PM -0400, Neil Horman wrote: >>> On Sat, May 26, 2018 at 05:50:39PM +0200, Dmitry Vyukov wrote: >>>> On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen >>>> wrote: >>>>>> On 25. May 2018, at 21:13, Neil Horman wrote: >>>>>> >>>>>> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: >>>>>>> syzbot reported a rcu_sched self-detected stall on CPU which is caused >>>>>>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this >>>>>>> value, hb_timer will get stuck there, as in its timer handler it starts >>>>>>> this timer again with this value, then goes to the timer handler again. >>>>>>> >>>>>>> This problem is there since very beginning, and thanks to Eric for the >>>>>>> reproducer shared from a syzbot mail. >>>>>>> >>>>>>> This patch fixes it by not allowing to set rto_min with a value below >>>>>>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl. >>>>>>> >>>>>>> Reported-by: syzbot+3dcd59a1f907245f8...@syzkaller.appspotmail.com >>>>>>> Suggested-by: Marcelo Ricardo Leitner >>>>>>> Signed-off-by: Xin Long >>>>>>> --- >>>>>>> include/net/sctp/constants.h | 1 + >>>>>>> net/sctp/socket.c| 10 +++--- >>>>>>> net/sctp/sysctl.c| 3 ++- >>>>>>> 3 files changed, 10 insertions(+), 4 deletions(-) >>>>>>> >>>>>>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h >>>>>>> index 20ff237..2ee7a7b 100644 >>>>>>> --- a/include/net/sctp/constants.h >>>>>>> +++ b/include/net/sctp/constants.h >>>>>>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; >>>>>>> #define SCTP_RTO_INITIAL (3 * 1000) >>>>>>> #define SCTP_RTO_MIN (1 * 1000) >>>>>>> #define SCTP_RTO_MAX (60 * 1000) >>>>>>> +#define SCTP_RTO_HARD_MIN 200 >>>>>>> >>>>>>> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right >>>>>>> shifts. */ >>>>>>> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right >>>>>>> shifts. */ >>>>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c >>>>>>> index ae7e7c6..6ef12c7 100644 >>>>>>> --- a/net/sctp/socket.c >>>>>>> +++ b/net/sctp/socket.c >>>>>>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock >>>>>>> *sk, char __user *optval, >>>>>>> * be changed. >>>>>>> * >>>>>>> */ >>>>>>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user >>>>>>> *optval, unsigned int optlen) >>>>>>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user >>>>>>> *optval, >>>>>>> + unsigned int optlen) >>>>>>> { >>>>>>> struct sctp_rtoinfo rtoinfo; >>>>>>> struct sctp_association *asoc; >>>>>>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock >>>>>>> *sk, char __user *optval, unsigne >>>>>>> else >>>>>>> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; >>>>>>> >>>>>>> -if (rto_min) >>>>>>> +if (rto_min) { >>>>>>> +if (rto_min < SCTP_RTO_HARD_MIN) >>>>>>> +return -EINVAL; >>>>>>> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; >>>>>>> -else >>>>>>> +} else { >>>>>>> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; >>>>>>> +} >>>>>>> >>>>>>> if (rto_min > rto_max) >>>>>>> return -EINVAL; >>>
Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs
> On 26. May 2018, at 17:50, Dmitry Vyukov <dvyu...@google.com> wrote: > > On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen > <michael.tue...@lurchi.franken.de> wrote: >>> On 25. May 2018, at 21:13, Neil Horman <nhor...@tuxdriver.com> wrote: >>> >>> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: >>>> syzbot reported a rcu_sched self-detected stall on CPU which is caused >>>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this >>>> value, hb_timer will get stuck there, as in its timer handler it starts >>>> this timer again with this value, then goes to the timer handler again. >>>> >>>> This problem is there since very beginning, and thanks to Eric for the >>>> reproducer shared from a syzbot mail. >>>> >>>> This patch fixes it by not allowing to set rto_min with a value below >>>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl. >>>> >>>> Reported-by: syzbot+3dcd59a1f907245f8...@syzkaller.appspotmail.com >>>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com> >>>> Signed-off-by: Xin Long <lucien@gmail.com> >>>> --- >>>> include/net/sctp/constants.h | 1 + >>>> net/sctp/socket.c| 10 +++--- >>>> net/sctp/sysctl.c| 3 ++- >>>> 3 files changed, 10 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h >>>> index 20ff237..2ee7a7b 100644 >>>> --- a/include/net/sctp/constants.h >>>> +++ b/include/net/sctp/constants.h >>>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; >>>> #define SCTP_RTO_INITIAL (3 * 1000) >>>> #define SCTP_RTO_MIN (1 * 1000) >>>> #define SCTP_RTO_MAX (60 * 1000) >>>> +#define SCTP_RTO_HARD_MIN 200 >>>> >>>> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. >>>> */ >>>> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. >>>> */ >>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c >>>> index ae7e7c6..6ef12c7 100644 >>>> --- a/net/sctp/socket.c >>>> +++ b/net/sctp/socket.c >>>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, >>>> char __user *optval, >>>> * be changed. >>>> * >>>> */ >>>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, >>>> unsigned int optlen) >>>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, >>>> + unsigned int optlen) >>>> { >>>> struct sctp_rtoinfo rtoinfo; >>>> struct sctp_association *asoc; >>>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock >>>> *sk, char __user *optval, unsigne >>>> else >>>> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; >>>> >>>> -if (rto_min) >>>> +if (rto_min) { >>>> +if (rto_min < SCTP_RTO_HARD_MIN) >>>> +return -EINVAL; >>>> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; >>>> -else >>>> +} else { >>>> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; >>>> +} >>>> >>>> if (rto_min > rto_max) >>>> return -EINVAL; >>>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c >>>> index 33ca5b7..7ec854a 100644 >>>> --- a/net/sctp/sysctl.c >>>> +++ b/net/sctp/sysctl.c >>>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; >>>> static int rto_beta_min = 0; >>>> static int rto_alpha_max = 1000; >>>> static int rto_beta_max = 1000; >>>> +static int rto_hard_min = SCTP_RTO_HARD_MIN; >>>> >>>> static unsigned long max_autoclose_min = 0; >>>> static unsigned long max_autoclose_max = >>>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { >>>> .maxlen = sizeof(unsigned int), >>>> .mode = 0644, >>>> .proc_handler = proc_sctp_do_rto_min, >>>> -.extra1 = , >>>> +.extra1
Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs
> On 25. May 2018, at 21:13, Neil Hormanwrote: > > On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: >> syzbot reported a rcu_sched self-detected stall on CPU which is caused >> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this >> value, hb_timer will get stuck there, as in its timer handler it starts >> this timer again with this value, then goes to the timer handler again. >> >> This problem is there since very beginning, and thanks to Eric for the >> reproducer shared from a syzbot mail. >> >> This patch fixes it by not allowing to set rto_min with a value below >> 200 msecs, which is based on TCP's, by either setsockopt or sysctl. >> >> Reported-by: syzbot+3dcd59a1f907245f8...@syzkaller.appspotmail.com >> Suggested-by: Marcelo Ricardo Leitner >> Signed-off-by: Xin Long >> --- >> include/net/sctp/constants.h | 1 + >> net/sctp/socket.c| 10 +++--- >> net/sctp/sysctl.c| 3 ++- >> 3 files changed, 10 insertions(+), 4 deletions(-) >> >> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h >> index 20ff237..2ee7a7b 100644 >> --- a/include/net/sctp/constants.h >> +++ b/include/net/sctp/constants.h >> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; >> #define SCTP_RTO_INITIAL (3 * 1000) >> #define SCTP_RTO_MIN (1 * 1000) >> #define SCTP_RTO_MAX (60 * 1000) >> +#define SCTP_RTO_HARD_MIN 200 >> >> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ >> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ >> diff --git a/net/sctp/socket.c b/net/sctp/socket.c >> index ae7e7c6..6ef12c7 100644 >> --- a/net/sctp/socket.c >> +++ b/net/sctp/socket.c >> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, >> char __user *optval, >> * be changed. >> * >> */ >> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, >> unsigned int optlen) >> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, >> + unsigned int optlen) >> { >> struct sctp_rtoinfo rtoinfo; >> struct sctp_association *asoc; >> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, >> char __user *optval, unsigne >> else >> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; >> >> -if (rto_min) >> +if (rto_min) { >> +if (rto_min < SCTP_RTO_HARD_MIN) >> +return -EINVAL; >> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; >> -else >> +} else { >> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; >> +} >> >> if (rto_min > rto_max) >> return -EINVAL; >> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c >> index 33ca5b7..7ec854a 100644 >> --- a/net/sctp/sysctl.c >> +++ b/net/sctp/sysctl.c >> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; >> static int rto_beta_min = 0; >> static int rto_alpha_max = 1000; >> static int rto_beta_max = 1000; >> +static int rto_hard_min = SCTP_RTO_HARD_MIN; >> >> static unsigned long max_autoclose_min = 0; >> static unsigned long max_autoclose_max = >> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { >> .maxlen = sizeof(unsigned int), >> .mode = 0644, >> .proc_handler = proc_sctp_do_rto_min, >> -.extra1 = , >> +.extra1 = _hard_min, >> .extra2 = _net.sctp.rto_max >> }, >> { >> -- >> 2.1.0 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > Patch looks fine, you probably want to note this hard minimum in man(7) sctp > as > well > I'm aware of some signalling networks which use RTO.min of smaller values than 200ms. So could this be reduced? Best regards Michael > Acked-by: Neil Horman > > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] sctp: add support for SCTP_REUSE_PORT sockopt
> On 21. May 2018, at 15:48, Neil Horman <nhor...@tuxdriver.com> wrote: > > On Mon, May 21, 2018 at 02:16:56PM +0200, Michael Tuexen wrote: >>> On 21. May 2018, at 13:39, Neil Horman <nhor...@tuxdriver.com> wrote: >>> >>> On Sun, May 20, 2018 at 10:54:04PM -0300, Marcelo Ricardo Leitner wrote: >>>> On Sun, May 20, 2018 at 08:50:59PM -0400, Neil Horman wrote: >>>>> On Sat, May 19, 2018 at 03:44:40PM +0800, Xin Long wrote: >>>>>> This feature is actually already supported by sk->sk_reuse which can be >>>>>> set by SO_REUSEADDR. But it's not working exactly as RFC6458 demands in >>>>>> section 8.1.27, like: >>>>>> >>>>>> - This option only supports one-to-one style SCTP sockets >>>>>> - This socket option must not be used after calling bind() >>>>>> or sctp_bindx(). >>>>>> >>>>>> Besides, SCTP_REUSE_PORT sockopt should be provided for user's programs. >>>>>> Otherwise, the programs with SCTP_REUSE_PORT from other systems will not >>>>>> work in linux. >>>>>> >>>>>> This patch reuses sk->sk_reuse and works pretty much as SO_REUSEADDR, >>>>>> just with some extra setup limitations that are neeeded when it is being >>>>>> enabled. >>>>>> >>>>>> "It should be noted that the behavior of the socket-level socket option >>>>>> to reuse ports and/or addresses for SCTP sockets is unspecified", so it >>>>>> leaves SO_REUSEADDR as is for the compatibility. >>>>>> >>>>>> Signed-off-by: Xin Long <lucien@gmail.com> >>>>>> --- >>>>>> include/uapi/linux/sctp.h | 1 + >>>>>> net/sctp/socket.c | 48 >>>>>> +++ >>>>>> 2 files changed, 49 insertions(+) >>>>>> >>>>> A few things: >>>>> >>>>> 1) I agree with Tom, this feature is a complete duplication of the >>>>> SK_REUSEPORT >>>>> socket option. I understand that this is an implementation of the option >>>>> in the >>>>> RFC, but its definately a duplication of a feature, which makes several >>>>> things >>>>> really messy. >>>>> >>>>> 2) The overloading of the sk_reuse opeion is a bad idea, for several >>>>> reasons. >>>>> Chief among them is the behavioral interference between this patch and the >>>>> SO_REUSEADDR socket level option, that also sets this feature. If you set >>>>> sk_reuse via SO_REUSEADDR, you will set the SCTP port reuse feature >>>>> regardless >>>>> of the bind or 1:1/1:m state of the socket. Vice versa, if you set this >>>>> socket >>>>> option via the SCTP_PORT_REUSE option you will inadvertently turn on >>>>> address >>>>> reuse for the socket. We can't do that. >>>> >>>> Given your comments, going a bit further here, one other big >>>> implication is that a port would never be able to be considered to >>>> fully meet SCTP standards regarding reuse because a rogue application >>>> may always abuse of the socket level opt to gain access to the port. >>>> >>>> IOW, the patch allows the application to use such restrictions against >>>> itself and nothing else, which undermines the patch idea. >>>> >>> Agreed. >>> >>>> I lack the knowledge on why the SCTP option was proposed in the RFC. I >>>> guess they had a good reason to add the restriction on 1:1/1:m style. >>>> Does the usage of the current imply in any risk to SCTP sockets? If >>>> yes, that would give some grounds for going forward with the SCTP >>>> option. >>>> >>> I'm also not privy to why the sctp option was proposed, though I expect >>> that the >>> lack of standardization of SO_REUSEPORT probably had something to do with >>> it. >>> As for the reasoning behind restriction to only 1:1 sockets, if I had to >>> guess, >>> I would say it likely because it creates ordering difficulty at the >>> application >>> level. >>> >>> CC-ing Michael Tuxen, who I believe had some input on this RFC. Hopefully >>> he >>>
Re: [PATCH net-next] sctp: add support for SCTP_REUSE_PORT sockopt
> On 21. May 2018, at 13:39, Neil Hormanwrote: > > On Sun, May 20, 2018 at 10:54:04PM -0300, Marcelo Ricardo Leitner wrote: >> On Sun, May 20, 2018 at 08:50:59PM -0400, Neil Horman wrote: >>> On Sat, May 19, 2018 at 03:44:40PM +0800, Xin Long wrote: This feature is actually already supported by sk->sk_reuse which can be set by SO_REUSEADDR. But it's not working exactly as RFC6458 demands in section 8.1.27, like: - This option only supports one-to-one style SCTP sockets - This socket option must not be used after calling bind() or sctp_bindx(). Besides, SCTP_REUSE_PORT sockopt should be provided for user's programs. Otherwise, the programs with SCTP_REUSE_PORT from other systems will not work in linux. This patch reuses sk->sk_reuse and works pretty much as SO_REUSEADDR, just with some extra setup limitations that are neeeded when it is being enabled. "It should be noted that the behavior of the socket-level socket option to reuse ports and/or addresses for SCTP sockets is unspecified", so it leaves SO_REUSEADDR as is for the compatibility. Signed-off-by: Xin Long --- include/uapi/linux/sctp.h | 1 + net/sctp/socket.c | 48 +++ 2 files changed, 49 insertions(+) >>> A few things: >>> >>> 1) I agree with Tom, this feature is a complete duplication of the >>> SK_REUSEPORT >>> socket option. I understand that this is an implementation of the option >>> in the >>> RFC, but its definately a duplication of a feature, which makes several >>> things >>> really messy. >>> >>> 2) The overloading of the sk_reuse opeion is a bad idea, for several >>> reasons. >>> Chief among them is the behavioral interference between this patch and the >>> SO_REUSEADDR socket level option, that also sets this feature. If you set >>> sk_reuse via SO_REUSEADDR, you will set the SCTP port reuse feature >>> regardless >>> of the bind or 1:1/1:m state of the socket. Vice versa, if you set this >>> socket >>> option via the SCTP_PORT_REUSE option you will inadvertently turn on address >>> reuse for the socket. We can't do that. >> >> Given your comments, going a bit further here, one other big >> implication is that a port would never be able to be considered to >> fully meet SCTP standards regarding reuse because a rogue application >> may always abuse of the socket level opt to gain access to the port. >> >> IOW, the patch allows the application to use such restrictions against >> itself and nothing else, which undermines the patch idea. >> > Agreed. > >> I lack the knowledge on why the SCTP option was proposed in the RFC. I >> guess they had a good reason to add the restriction on 1:1/1:m style. >> Does the usage of the current imply in any risk to SCTP sockets? If >> yes, that would give some grounds for going forward with the SCTP >> option. >> > I'm also not privy to why the sctp option was proposed, though I expect that > the > lack of standardization of SO_REUSEPORT probably had something to do with it. > As for the reasoning behind restriction to only 1:1 sockets, if I had to > guess, > I would say it likely because it creates ordering difficulty at the > application > level. > > CC-ing Michael Tuxen, who I believe had some input on this RFC. Hopefully he > can shed some light on this. Dear all, the reason this was added is to have a specified way to allow a system to behave like a client and server making use of the INIT collision. For 1-to-many style sockets you can do this by creating a socket, binding it, calling listen on it and trying to connect to the peer. For 1-to-1 style sockets you need two sockets for it. One listener and one you use to connect (and close it in case of failure, open a new one...). It was not clear if one can achieve this with SO_REUSEPORT and/or SO_REUSEADDR on all platforms. We left that unspecified. I hope this makes the intention clearer. Best regards Michael > > Neil > >>> >>> Its a bit frustrating, since SO_REUSEPORT is widely available on multiple >>> operating systems, but isn't standard (AFAIK). I would say however, given >>> the >>> prevalence of the socket level option, we should likely advocate for the >>> removal >>> of the sctp specific option, or at the least implement and document it as >>> being >> >> Is it possible, to remove/deprecate an option once it is published on a RFC? >> >>> an alias for SO_REUSEPORT >>> >>> >>> As this stands however, its a NACK from me. >>> >>> Neil >>> >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html smime.p7s Description: S/MIME cryptographic signature
Re: [RFC PATCH net-next 0/3] sctp: add GSO support
> On 29 Jan 2016, at 12:26, Marcelo Ricardo Leitner <marcelo.leit...@gmail.com> > wrote: > > On Fri, Jan 29, 2016 at 11:57:46AM +0100, Michael Tuexen wrote: >>> On 29 Jan 2016, at 02:18, Marcelo Ricardo Leitner >>> <marcelo.leit...@gmail.com> wrote: >>> >>> On Fri, Jan 29, 2016 at 12:36:05AM +0100, Michael Tuexen wrote: >>>> >>>>> On 28 Jan 2016, at 22:03, Marcelo Ricardo Leitner >>>>> <marcelo.leit...@gmail.com> wrote: >>>>> >>>>> On Thu, Jan 28, 2016 at 06:54:06PM +0100, Michael Tuexen wrote: >>>>>>> On 28 Jan 2016, at 14:51, David Laight <david.lai...@aculab.com> wrote: >>>>>>> >>>>>>> From: Marcelo Ricardo Leitner >>>>>>>> Sent: 27 January 2016 17:07 >>>>>>>> This patchset is merely a RFC for the moment. There are some >>>>>>>> controversial points that I'd like to discuss before actually proposing >>>>>>>> the patches. >>>>>>> >>>>>>> You also need to look at how a 'user' can actually get SCTP to >>>>>>> merge data chunks in the first place. >>>>>>> >>>>>>> With Nagle disabled (and it probably has to be since the data flow >>>>>>> is unlikely to be 'command-response' or 'unidirectional bulk') >>>>>>> it is currently almost impossible to get more than one chunk >>>>>>> into an ethernet frame. >>>>>>> >>>>>>> Support for MSG_MORE would help. >>>>>> What about adding support for the explicit EOR mode as specified in >>>>>> https://tools.ietf.org/html/rfc6458#section-8.1.26 >>>>> >>>>> Seizing the moment to clarify my understanding on that. :) >>>>> Such multiple calls to send system calls will result in a single data >>>>> chunk. Is that so? That's what I get from that text and also from this >>>> No. It results in a single user message. This means you can send >>>> a user message larger than the send buffer size. How the user message >>>> is fragmented in DATA chunks is transparent to the upper layer. >>>> >>>> Does this make things clearer? >>> >>> I think so, yes. So it allows delaying setting the Ending fragment bit >>> until the application set SCTP_EOR. All the rest before this stays as >>> before: first send() will generate a chunk with Beginning bit set and >>> may generate some other middle-fragments (no B nor E bit set) if >>> necessary, second to N-1 call to send will generate only middle >>> fragments, while the last send, with SCTP_EOF, will then set the Ending >>> fragment in the last one. Right? >> Yes. But there are no restrictions on the user data provided in send() >> calls and DATA chunks. So you can >> send(10 byte, no SCTP_EOR) >> resulting in one DATA chunk with the B bit, several with no B and no E bit. >> send(10 byte, no SCTP_EOR) >> resulting in several chunks with no B and no E bit. >> send(10 byte, SCTP_EOR) >> resulting in several chunks with no B and no E bit and one (the last) chunk >> with the E bit. >> >> On the other hand you can do >> send(1 byte, no SCTP_EOR) >> resulting in a single DATA chunk with the E bit set. >> send(1 byte, no SCTP_EOR) >> send(1 byte, no SCTP_EOR) >> send(1 byte, no SCTP_EOR) >> send(1 byte, no SCTP_EOR) >> send(1 byte, no SCTP_EOR) >> All resulting in a single DATA chunk with 5 bytes user data and no B or E >> bit. >> (For example if Nagle is enabled and only after the last send call the SACK >> arrives). >> send(1 byte, SCTP_EOR) >> results in a single DATA chunk with the E bist set. > > Cool, thanks Michael. It will be quite fun to mix this with MSG_MORE > logic, I think :) Don't know. In FreeBSD we do support SCTP_EOR, but not MSG_MORE, which seems to be Linux specific. Best regards Michael > > Best regards, > Marcelo > >
Re: [RFC PATCH net-next 0/3] sctp: add GSO support
> On 29 Jan 2016, at 02:18, Marcelo Ricardo Leitner <marcelo.leit...@gmail.com> > wrote: > > On Fri, Jan 29, 2016 at 12:36:05AM +0100, Michael Tuexen wrote: >> >>> On 28 Jan 2016, at 22:03, Marcelo Ricardo Leitner >>> <marcelo.leit...@gmail.com> wrote: >>> >>> On Thu, Jan 28, 2016 at 06:54:06PM +0100, Michael Tuexen wrote: >>>>> On 28 Jan 2016, at 14:51, David Laight <david.lai...@aculab.com> wrote: >>>>> >>>>> From: Marcelo Ricardo Leitner >>>>>> Sent: 27 January 2016 17:07 >>>>>> This patchset is merely a RFC for the moment. There are some >>>>>> controversial points that I'd like to discuss before actually proposing >>>>>> the patches. >>>>> >>>>> You also need to look at how a 'user' can actually get SCTP to >>>>> merge data chunks in the first place. >>>>> >>>>> With Nagle disabled (and it probably has to be since the data flow >>>>> is unlikely to be 'command-response' or 'unidirectional bulk') >>>>> it is currently almost impossible to get more than one chunk >>>>> into an ethernet frame. >>>>> >>>>> Support for MSG_MORE would help. >>>> What about adding support for the explicit EOR mode as specified in >>>> https://tools.ietf.org/html/rfc6458#section-8.1.26 >>> >>> Seizing the moment to clarify my understanding on that. :) >>> Such multiple calls to send system calls will result in a single data >>> chunk. Is that so? That's what I get from that text and also from this >> No. It results in a single user message. This means you can send >> a user message larger than the send buffer size. How the user message >> is fragmented in DATA chunks is transparent to the upper layer. >> >> Does this make things clearer? > > I think so, yes. So it allows delaying setting the Ending fragment bit > until the application set SCTP_EOR. All the rest before this stays as > before: first send() will generate a chunk with Beginning bit set and > may generate some other middle-fragments (no B nor E bit set) if > necessary, second to N-1 call to send will generate only middle > fragments, while the last send, with SCTP_EOF, will then set the Ending > fragment in the last one. Right? Yes. But there are no restrictions on the user data provided in send() calls and DATA chunks. So you can send(10 byte, no SCTP_EOR) resulting in one DATA chunk with the B bit, several with no B and no E bit. send(10 byte, no SCTP_EOR) resulting in several chunks with no B and no E bit. send(10 byte, SCTP_EOR) resulting in several chunks with no B and no E bit and one (the last) chunk with the E bit. On the other hand you can do send(1 byte, no SCTP_EOR) resulting in a single DATA chunk with the E bit set. send(1 byte, no SCTP_EOR) send(1 byte, no SCTP_EOR) send(1 byte, no SCTP_EOR) send(1 byte, no SCTP_EOR) send(1 byte, no SCTP_EOR) All resulting in a single DATA chunk with 5 bytes user data and no B or E bit. (For example if Nagle is enabled and only after the last send call the SACK arrives). send(1 byte, SCTP_EOR) results in a single DATA chunk with the E bist set. Best regards Michael > > Thanks, > Marcelo > >> >> Best regards >> Michael >>> snippet: >>> "Sending a message using sendmsg() is atomic unless explicit end of >>> record (EOR) marking is enabled on the socket specified by sd (see >>> Section 8.1.26)." >>> >>> Best regards, >>> Marcelo >>> >>>> Best regards >>>> Michael >>>>> >>>>> Given the current implementation you can get almost the required >>>>> behaviour by turning nagle off and on repeatedly. >>>>> >>>>> I did wonder whether the queued data could actually be picked up >>>>> be a Heartbeat chunk that is probing a different remote address >>>>> (which would be bad news). >>>>> >>>>> David >>>>> >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in >>>>> the body of a message to majord...@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>> >>>> >>> >> >
Re: [RFC PATCH net-next 0/3] sctp: add GSO support
> On 28 Jan 2016, at 22:03, Marcelo Ricardo Leitner <marcelo.leit...@gmail.com> > wrote: > > On Thu, Jan 28, 2016 at 06:54:06PM +0100, Michael Tuexen wrote: >>> On 28 Jan 2016, at 14:51, David Laight <david.lai...@aculab.com> wrote: >>> >>> From: Marcelo Ricardo Leitner >>>> Sent: 27 January 2016 17:07 >>>> This patchset is merely a RFC for the moment. There are some >>>> controversial points that I'd like to discuss before actually proposing >>>> the patches. >>> >>> You also need to look at how a 'user' can actually get SCTP to >>> merge data chunks in the first place. >>> >>> With Nagle disabled (and it probably has to be since the data flow >>> is unlikely to be 'command-response' or 'unidirectional bulk') >>> it is currently almost impossible to get more than one chunk >>> into an ethernet frame. >>> >>> Support for MSG_MORE would help. >> What about adding support for the explicit EOR mode as specified in >> https://tools.ietf.org/html/rfc6458#section-8.1.26 > > Seizing the moment to clarify my understanding on that. :) > Such multiple calls to send system calls will result in a single data > chunk. Is that so? That's what I get from that text and also from this No. It results in a single user message. This means you can send a user message larger than the send buffer size. How the user message is fragmented in DATA chunks is transparent to the upper layer. Does this make things clearer? Best regards Michael > snippet: > "Sending a message using sendmsg() is atomic unless explicit end of > record (EOR) marking is enabled on the socket specified by sd (see > Section 8.1.26)." > > Best regards, > Marcelo > >> Best regards >> Michael >>> >>> Given the current implementation you can get almost the required >>> behaviour by turning nagle off and on repeatedly. >>> >>> I did wonder whether the queued data could actually be picked up >>> be a Heartbeat chunk that is probing a different remote address >>> (which would be bad news). >>> >>> David >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in >>> the body of a message to majord...@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >> >
Re: [RFC PATCH net-next 0/3] sctp: add GSO support
> On 28 Jan 2016, at 14:51, David Laightwrote: > > From: Marcelo Ricardo Leitner >> Sent: 27 January 2016 17:07 >> This patchset is merely a RFC for the moment. There are some >> controversial points that I'd like to discuss before actually proposing >> the patches. > > You also need to look at how a 'user' can actually get SCTP to > merge data chunks in the first place. > > With Nagle disabled (and it probably has to be since the data flow > is unlikely to be 'command-response' or 'unidirectional bulk') > it is currently almost impossible to get more than one chunk > into an ethernet frame. > > Support for MSG_MORE would help. What about adding support for the explicit EOR mode as specified in https://tools.ietf.org/html/rfc6458#section-8.1.26 Best regards Michael > > Given the current implementation you can get almost the required > behaviour by turning nagle off and on repeatedly. > > I did wonder whether the queued data could actually be picked up > be a Heartbeat chunk that is probing a different remote address > (which would be bad news). > > David > > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
Re: [RFC net-next 1/1] : sctp: denoise deprecation log on SCTP_EVENTS
On 22 Jul 2015, at 14:04, Daniel Borkmann dan...@iogearbox.net wrote: On 07/22/2015 01:48 PM, David Laight wrote: From: Jamal Hadi Salim Sent: 09 July 2015 11:38 In the newer kernels this message is extremely noisy. After a quick discussion with Daniel it seems to me it will be very hard to get existing apps that nobody is going to update to continue to work (i.e no forward compat). And newer apps that desire to play in both older kernels and new kernels will have to play some tricks to work (i.e weak backward compatibility). These are good reasons to totally get rid of this message. At minimal to neutre it. The attached change tries to do that. However, if you had multiple apps, you will only get warning for the first one. Never mind spanning the kernel log, there will be performance issues with systems that are sending significant amounts of SCTP data. I'm not sure you'll get 1+/sec through the trace system. What do you mean by trace system (there's no backtrace)? Anyway, as previously stated, I doubt we'd get rid of the deprecated SCTP uapi from the RFC any time soon, so, imho, we should just remove these rate-limited messages altogether to stop spamming the klog ... ... and we will not remove it in FreeBSD either. Best regards Michael I'm going to have to find which version this trace went into and make sure we tell customers not to use the relevant kernel versions. David -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net] net: sctp: stop spamming klog with rfc6458, 5.3.2. deprecation warnings
On 22 Jul 2015, at 16:31, Daniel Borkmann dan...@iogearbox.net wrote: Back then when we added support for SCTP_SNDINFO/SCTP_RCVINFO from RFC6458 5.3.4/5.3.5, we decided to add a deprecation warning for the (as per RFC deprecated) SCTP_SNDRCV via commit bbbea41d5e53 (net: sctp: deprecate rfc6458, 5.3.2. SCTP_SNDRCV support), see [1]. Imho, it was not a good idea, and we should just revert that message for a couple of reasons: 1) It's uapi and therefore set in stone forever. 2) To be able to run on older and newer kernels, an SCTP application would need to probe for both, SCTP_SNDRCV, but also SCTP_SNDINFO/ SCTP_RCVINFO support, so that on older kernels, it can make use of SCTP_SNDRCV, and on newer kernels SCTP_SNDINFO/SCTP_RCVINFO. In my (limited) experience, a lot of SCTP appliances are migrating to newer kernels only ve(ee)ry slowly. 3) Some people don't have the chance to change their applications, f.e. due to proprietary legacy stuff. So, they'll hit this warning in fast path and are stuck with older kernels. But i.e. due to point 1) I really fail to see the benefit of a warning. So just revert that for now, the issue was reported up Jamal. [1] http://thread.gmane.org/gmane.linux.network/321960/ Looks good to me. Best regards Michael Reported-by: Jamal Hadi Salim j...@mojatatu.com Signed-off-by: Daniel Borkmann dan...@iogearbox.net Cc: Michael Tuexen tue...@fh-muenster.de --- net/sctp/socket.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 1425ec2..17bef01 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -2200,12 +2200,6 @@ static int sctp_setsockopt_events(struct sock *sk, char __user *optval, if (copy_from_user(sctp_sk(sk)-subscribe, optval, optlen)) return -EFAULT; - if (sctp_sk(sk)-subscribe.sctp_data_io_event) - pr_warn_ratelimited(DEPRECATED %s (pid %d) - Requested SCTP_SNDRCVINFO event.\n - Use SCTP_RCVINFO through SCTP_RECVRCVINFO option instead.\n, - current-comm, task_pid_nr(current)); - /* At the time when a user app subscribes to SCTP_SENDER_DRY_EVENT, * if there is no data to be sent or retransmit, the stack will * immediately send up this notification. -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH net-next] sctp: fix src address selection if using secondary addresses
On 09 Jul 2015, at 18:54, Marcelo Ricardo Leitner mleit...@redhat.com wrote: Cc'ing Michael too. I'm not familiar with the Linux kernel code, so I can't comment on it. But making sure to use a source address belonging to the emitting interface makes sense for me. Best regards Michael On Tue, Jul 07, 2015 at 02:42:25PM -0300, Marcelo Ricardo Leitner wrote: Hi folks, This is an attempt to better choose a src address for sctp packets as peers with rp_filter could be dropping our packets in some situations. With this patch, we try to respect and use a src address that belongs to the interface we are putting the packet out. I have that feeling that there is be a better way to do this, but I just couldn't see it. This patch has been tested with and without gateways between the peers and also just two peers connected via two subnets and results were pretty good. One could think that this limits the address combination we can use, but such combinations probably are just bogus anyway. Like, if you have an host with address A1 and B1 and another with A2 and B2, you cannot expect that A can use A1 to reach B2 through subnet B, because the return path would be via the other link which, when this switch happens, we are thinking it's broken. Thanks, Marcelo ---8--- In short, sctp is likely to incorrectly choose src address if socket is bound to secondary addresses. This patch fixes it by adding a new check that tries to anticipate if the src address would be expected by the next hop/peer on this interface by doing reverse routing. Also took the shot to reduce the indentation level on this code. Details: Currently, sctp will do a routing attempt without specifying the src address and compare the returned value (preferred source) with the addresses that the socket is bound to. When using secondary addresses, this will not match. Then it will try specifying each of the addresses that the socket is bound to and re-routing, checking if that address is valid as src for that dst. Thing is, this check alone is weak: # ip r l 192.168.100.0/24 dev eth1 proto kernel scope link src 192.168.100.149 192.168.122.0/24 dev eth0 proto kernel scope link src 192.168.122.147 # ip a l 1: lo: LOOPBACK,UP,LOWER_UP mtu 65536 qdisc noqueue state UNKNOWN group default link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 52:54:00:15:18:6a brd ff:ff:ff:ff:ff:ff inet 192.168.122.147/24 brd 192.168.122.255 scope global dynamic eth0 valid_lft 2160sec preferred_lft 2160sec inet 192.168.122.148/24 scope global secondary eth0 valid_lft forever preferred_lft forever inet6 fe80::5054:ff:fe15:186a/64 scope link valid_lft forever preferred_lft forever 3: eth1: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 52:54:00:b3:91:46 brd ff:ff:ff:ff:ff:ff inet 192.168.100.149/24 brd 192.168.100.255 scope global dynamic eth1 valid_lft 2162sec preferred_lft 2162sec inet 192.168.100.148/24 scope global secondary eth1 valid_lft forever preferred_lft forever inet6 fe80::5054:ff:feb3:9146/64 scope link valid_lft forever preferred_lft forever 4: ens9: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 52:54:00:05:47:ee brd ff:ff:ff:ff:ff:ff inet6 fe80::5054:ff:fe05:47ee/64 scope link valid_lft forever preferred_lft forever # ip r g 192.168.100.193 from 192.168.122.148 192.168.100.193 from 192.168.122.148 dev eth1 cache Even if you specify an interface: # ip r g 192.168.100.193 from 192.168.122.148 oif eth1 192.168.100.193 from 192.168.122.148 dev eth1 cache Although this would be valid, peers using rp_filter will drop such packets as their src doesn't match the routes for that interface. So we fix this by adding an extra check, we try to do the reverse routing and check if the interface used would be the same. If not, we skip such address. If yes, we use it. Signed-off-by: Marcelo Ricardo Leitner marcelo.leit...@gmail.com --- net/sctp/protocol.c | 55 +++-- 1 file changed, 41 insertions(+), 14 deletions(-) diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index 59e80356672bdf89777265ae1f8c384792dfb98c..e52fd6f77963426a7cf3e83ca01a9cdae1cb2c01 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -53,6 +53,7 @@ #include net/net_namespace.h #include net/protocol.h #include net/ip.h +#include net/ip_fib.h #include net/ipv6.h #include net/route.h #include net/sctp/sctp.h @@ -487,23 +488,49 @@ static void
Re: [Lksctp-developers] [PATCH] SCTP: drop SACK if ctsn is not less than the next tsn of assoc
Hi Wei, see my comments in-line. Best regards Michael On Aug 1, 2007, at 3:06 AM, Wei Yongjun wrote: On Tue, 2007-07-31 at 07:37 -0400, Neil Horman wrote: On Tue, Jul 31, 2007 at 12:44:27PM +0800, Wei Yongjun wrote: If SCTP data sender received a SACK which contains Cumulative TSN Ack is not less than the Cumulative TSN Ack Point, and if this Cumulative TSN Ack is not used by the data sender, SCTP data sender still accept this SACK , and next SACK which send correctly to DATA sender be dropped, because it is less than the new Cumulative TSN Ack Point. After received this SACK, data will be retrans again and again even if correct SACK is received. So I think this SACK must be dropped to let data transmit correctly. Following is the tcpdump of my test. And patch in this mail can avoid this problem. 02:19:38.233278 sctp (1) [INIT] [init tag: 1250461886] [rwnd: 54784] [OS: 10] [MIS: 65535] [init TSN: 217114040] 02:19:39.782160 sctp (1) [INIT ACK] [init tag: 1] [rwnd: 54784] [OS: 100] [MIS: 65535] [init TSN: 100] 02:19:39.798583 sctp (1) [COOKIE ECHO] 02:19:40.082125 sctp (1) [COOKIE ACK] 02:19:40.097859 sctp (1) [DATA] (B)(E) [TSN: 217114040] [SID: 0] [SSEQ 0] [PPID 0xf192090b] 02:19:40.100162 sctp (1) [DATA] (B)(E) [TSN: 217114041] [SID: 0] [SSEQ 1] [PPID 0x3e467007] 02:19:40.100779 sctp (1) [DATA] (B)(E) [TSN: 217114042] [SID: 0] [SSEQ 2] [PPID 0x11b12a0a] 02:19:40.101200 sctp (1) [DATA] (B)(E) [TSN: 217114043] [SID: 0] [SSEQ 3] [PPID 0x30e7d979] 02:19:40.561147 sctp (1) [SACK] [cum ack 217114040] [a_rwnd 54784] [#gap acks 0] [#dup tsns 0] 02:19:40.568498 sctp (1) [DATA] (B)(E) [TSN: 217114044] [SID: 0] [SSEQ 4] [PPID 0x251ff86f] 02:19:40.569308 sctp (1) [DATA] (B)(E) [TSN: 217114045] [SID: 0] [SSEQ 5] [PPID 0xe5d5da5d] 02:19:40.700584 sctp (1) [SACK] [cum ack 290855864] [a_rwnd 54784] [#gap acks 0] [#dup tsns 0] 02:19:40.701562 sctp (1) [DATA] (B)(E) [TSN: 217114046] [SID: 0] [SSEQ 6] [PPID 0x87d8b423] 02:19:40.701567 sctp (1) [DATA] (B)(E) [TSN: 217114047] [SID: 0] [SSEQ 7] [PPID 0xca47e645] 02:19:40.701569 sctp (1) [DATA] (B)(E) [TSN: 217114048] [SID: 0] [SSEQ 8] [PPID 0x6c0ea150] 02:19:40.701576 sctp (1) [DATA] (B)(E) [TSN: 217114049] [SID: 0] [SSEQ 9] [PPID 0x9cc1994f] 02:19:40.701585 sctp (1) [DATA] (B)(E) [TSN: 217114050] [SID: 0] [SSEQ 10] [PPID 0xb1df4129] 02:19:41.098201 sctp (1) [SACK] [cum ack 217114041] [a_rwnd 54784] [#gap acks 0] [#dup tsns 0] 02:19:41.283257 sctp (1) [SACK] [cum ack 217114042] [a_rwnd 54784] [#gap acks 0] [#dup tsns 0] 02:19:41.457217 sctp (1) [SACK] [cum ack 217114043] [a_rwnd 54784] [#gap acks 0] [#dup tsns 0] 02:19:41.691528 sctp (1) [SACK] [cum ack 217114044] [a_rwnd 54784] [#gap acks 0] [#dup tsns 0] 02:19:41.849636 sctp (1) [SACK] [cum ack 217114045] [a_rwnd 54784] [#gap acks 0] [#dup tsns 0] 02:19:41.975473 sctp (1) [DATA] (B)(E) [TSN: 217114046] [SID: 0] [SSEQ 6] [PPID 0x87d8b423] 02:19:42.021229 sctp (1) [SACK] [cum ack 217114046] [a_rwnd 54784] [#gap acks 0] [#dup tsns 0] 02:19:42.196495 sctp (1) [SACK] [cum ack 217114047] [a_rwnd 54784] [#gap acks 0] [#dup tsns 0] 02:19:42.424319 sctp (1) [SACK] [cum ack 217114048] [a_rwnd 54784] [#gap acks 0] [#dup tsns 0] 02:19:42.586924 sctp (1) [SACK] [cum ack 217114049] [a_rwnd 54784] [#gap acks 0] [#dup tsns 0] 02:19:42.744810 sctp (1) [SACK] [cum ack 217114050] [a_rwnd 54784] [#gap acks 0] [#dup tsns 0] 02:19:42.965536 sctp (1) [SACK] [cum ack 217114046] [a_rwnd 54784] [#gap acks 0] [#dup tsns 0] 02:19:43.106385 sctp (1) [DATA] (B)(E) [TSN: 217114046] [SID: 0] [SSEQ 6] [PPID 0x87d8b423] 02:19:43.218969 sctp (1) [SACK] [cum ack 217114046] [a_rwnd 54784] [#gap acks 0] [#dup tsns 0] 02:19:45.374101 sctp (1) [DATA] (B)(E) [TSN: 217114046] [SID: 0] [SSEQ 6] [PPID 0x87d8b423] 02:19:45.489258 sctp (1) [SACK] [cum ack 217114046] [a_rwnd 54784] [#gap acks 0] [#dup tsns 0] 02:19:49.830116 sctp (1) [DATA] (B)(E) [TSN: 217114046] [SID: 0] [SSEQ 6] [PPID 0x87d8b423] 02:19:49.984577 sctp (1) [SACK] [cum ack 217114046] [a_rwnd 54784] [#gap acks 0] [#dup tsns 0] 02:19:58.760300 sctp (1) [DATA] (B)(E) [TSN: 217114046] [SID: 0] [SSEQ 6] [PPID 0x87d8b423] 02:19:58.931690 sctp (1) [SACK] [cum ack 217114046] [a_rwnd 54784] [#gap acks 0] [#dup tsns 0] Signed-off-by: Wei Yongjun [EMAIL PROTECTED] --- net/sctp/sm_statefuns.c.orig 2007-07-29 18:11:01.0 -0400 +++ net/sctp/sm_statefuns.c 2007-07-29 18:14:49.0 -0400 @@ -2880,6 +2880,15 @@ sctp_disposition_t sctp_sf_eat_sack_6_2( return SCTP_DISPOSITION_DISCARD; } + /* If Cumulative TSN Ack is not less than the Cumulative TSN +* Ack which will be send in the next data, drop the SACK. +*/ + if (!TSN_lt(ctsn, asoc-next_tsn)) { + SCTP_DEBUG_PRINTK(ctsn %x\n, ctsn); + SCTP_DEBUG_PRINTK(next_tsn %x\n, asoc-next_tsn); + return SCTP_DISPOSITION_DISCARD; + } + /* Return this