Re: [PATCH v2 0/2] net/sctp: Avoid allocating high order memory with kmalloc()

2018-08-03 Thread Michael Tuexen



> On 3. Aug 2018, at 22:30, Marcelo Ricardo Leitner  
> wrote:
> 
> On Fri, Aug 03, 2018 at 04:43:28PM +, David Laight wrote:
>> From: Konstantin Khorenko
>>> Sent: 03 August 2018 17:21
>>> 
>>> Each SCTP association can have up to 65535 input and output streams.
>>> For each stream type an array of sctp_stream_in or sctp_stream_out
>>> structures is allocated using kmalloc_array() function. This function
>>> allocates physically contiguous memory regions, so this can lead
>>> to allocation of memory regions of very high order, i.e.:
>> ...
>> 
>> Given how useless SCTP streams are, does anything actually use
>> more than about 4?
> 
> Maybe Michael can help us with that. I'm also curious now.
In the context of SIGTRAN I have seen 17 streams...

In the context of WebRTC I have seen more streams. In general,
the streams concept seems to be useful. QUIC has lots of streams.

So I'm wondering why they are considered useless.
David, can you elaborate on this?

Best regards
Michael
> 
>  Marcelo



Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs

2018-05-29 Thread Michael Tuexen
> On 29. May 2018, at 13:41, Neil Horman  wrote:
> 
> On Mon, May 28, 2018 at 04:43:15PM -0300, Marcelo Ricardo Leitner wrote:
>> On Sat, May 26, 2018 at 09:01:00PM -0400, Neil Horman wrote:
>>> On Sat, May 26, 2018 at 05:50:39PM +0200, Dmitry Vyukov wrote:
>>>> On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen
>>>>  wrote:
>>>>>> On 25. May 2018, at 21:13, Neil Horman  wrote:
>>>>>> 
>>>>>> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote:
>>>>>>> syzbot reported a rcu_sched self-detected stall on CPU which is caused
>>>>>>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this
>>>>>>> value, hb_timer will get stuck there, as in its timer handler it starts
>>>>>>> this timer again with this value, then goes to the timer handler again.
>>>>>>> 
>>>>>>> This problem is there since very beginning, and thanks to Eric for the
>>>>>>> reproducer shared from a syzbot mail.
>>>>>>> 
>>>>>>> This patch fixes it by not allowing to set rto_min with a value below
>>>>>>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl.
>>>>>>> 
>>>>>>> Reported-by: syzbot+3dcd59a1f907245f8...@syzkaller.appspotmail.com
>>>>>>> Suggested-by: Marcelo Ricardo Leitner 
>>>>>>> Signed-off-by: Xin Long 
>>>>>>> ---
>>>>>>> include/net/sctp/constants.h |  1 +
>>>>>>> net/sctp/socket.c| 10 +++---
>>>>>>> net/sctp/sysctl.c|  3 ++-
>>>>>>> 3 files changed, 10 insertions(+), 4 deletions(-)
>>>>>>> 
>>>>>>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
>>>>>>> index 20ff237..2ee7a7b 100644
>>>>>>> --- a/include/net/sctp/constants.h
>>>>>>> +++ b/include/net/sctp/constants.h
>>>>>>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 };
>>>>>>> #define SCTP_RTO_INITIAL (3 * 1000)
>>>>>>> #define SCTP_RTO_MIN (1 * 1000)
>>>>>>> #define SCTP_RTO_MAX (60 * 1000)
>>>>>>> +#define SCTP_RTO_HARD_MIN   200
>>>>>>> 
>>>>>>> #define SCTP_RTO_ALPHA  3   /* 1/8 when converted to right 
>>>>>>> shifts. */
>>>>>>> #define SCTP_RTO_BETA   2   /* 1/4 when converted to right 
>>>>>>> shifts. */
>>>>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>>>>>> index ae7e7c6..6ef12c7 100644
>>>>>>> --- a/net/sctp/socket.c
>>>>>>> +++ b/net/sctp/socket.c
>>>>>>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock 
>>>>>>> *sk, char __user *optval,
>>>>>>> * be changed.
>>>>>>> *
>>>>>>> */
>>>>>>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user 
>>>>>>> *optval, unsigned int optlen)
>>>>>>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user 
>>>>>>> *optval,
>>>>>>> +   unsigned int optlen)
>>>>>>> {
>>>>>>> struct sctp_rtoinfo rtoinfo;
>>>>>>> struct sctp_association *asoc;
>>>>>>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock 
>>>>>>> *sk, char __user *optval, unsigne
>>>>>>> else
>>>>>>> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max;
>>>>>>> 
>>>>>>> -if (rto_min)
>>>>>>> +if (rto_min) {
>>>>>>> +if (rto_min < SCTP_RTO_HARD_MIN)
>>>>>>> +return -EINVAL;
>>>>>>> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min;
>>>>>>> -else
>>>>>>> +} else {
>>>>>>> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min;
>>>>>>> +}
>>>>>>> 
>>>>>>> if (rto_min > rto_max)
>>>>>>> return -EINVAL;
>>>

Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs

2018-05-27 Thread Michael Tuexen
> On 26. May 2018, at 17:50, Dmitry Vyukov <dvyu...@google.com> wrote:
> 
> On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen
> <michael.tue...@lurchi.franken.de> wrote:
>>> On 25. May 2018, at 21:13, Neil Horman <nhor...@tuxdriver.com> wrote:
>>> 
>>> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote:
>>>> syzbot reported a rcu_sched self-detected stall on CPU which is caused
>>>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this
>>>> value, hb_timer will get stuck there, as in its timer handler it starts
>>>> this timer again with this value, then goes to the timer handler again.
>>>> 
>>>> This problem is there since very beginning, and thanks to Eric for the
>>>> reproducer shared from a syzbot mail.
>>>> 
>>>> This patch fixes it by not allowing to set rto_min with a value below
>>>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl.
>>>> 
>>>> Reported-by: syzbot+3dcd59a1f907245f8...@syzkaller.appspotmail.com
>>>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com>
>>>> Signed-off-by: Xin Long <lucien@gmail.com>
>>>> ---
>>>> include/net/sctp/constants.h |  1 +
>>>> net/sctp/socket.c| 10 +++---
>>>> net/sctp/sysctl.c|  3 ++-
>>>> 3 files changed, 10 insertions(+), 4 deletions(-)
>>>> 
>>>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
>>>> index 20ff237..2ee7a7b 100644
>>>> --- a/include/net/sctp/constants.h
>>>> +++ b/include/net/sctp/constants.h
>>>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 };
>>>> #define SCTP_RTO_INITIAL (3 * 1000)
>>>> #define SCTP_RTO_MIN (1 * 1000)
>>>> #define SCTP_RTO_MAX (60 * 1000)
>>>> +#define SCTP_RTO_HARD_MIN   200
>>>> 
>>>> #define SCTP_RTO_ALPHA  3   /* 1/8 when converted to right shifts. 
>>>> */
>>>> #define SCTP_RTO_BETA   2   /* 1/4 when converted to right shifts. 
>>>> */
>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>>> index ae7e7c6..6ef12c7 100644
>>>> --- a/net/sctp/socket.c
>>>> +++ b/net/sctp/socket.c
>>>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, 
>>>> char __user *optval,
>>>> * be changed.
>>>> *
>>>> */
>>>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, 
>>>> unsigned int optlen)
>>>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval,
>>>> +   unsigned int optlen)
>>>> {
>>>> struct sctp_rtoinfo rtoinfo;
>>>> struct sctp_association *asoc;
>>>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock 
>>>> *sk, char __user *optval, unsigne
>>>> else
>>>> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max;
>>>> 
>>>> -if (rto_min)
>>>> +if (rto_min) {
>>>> +if (rto_min < SCTP_RTO_HARD_MIN)
>>>> +return -EINVAL;
>>>> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min;
>>>> -else
>>>> +} else {
>>>> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min;
>>>> +}
>>>> 
>>>> if (rto_min > rto_max)
>>>> return -EINVAL;
>>>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
>>>> index 33ca5b7..7ec854a 100644
>>>> --- a/net/sctp/sysctl.c
>>>> +++ b/net/sctp/sysctl.c
>>>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0;
>>>> static int rto_beta_min = 0;
>>>> static int rto_alpha_max = 1000;
>>>> static int rto_beta_max = 1000;
>>>> +static int rto_hard_min = SCTP_RTO_HARD_MIN;
>>>> 
>>>> static unsigned long max_autoclose_min = 0;
>>>> static unsigned long max_autoclose_max =
>>>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = {
>>>> .maxlen = sizeof(unsigned int),
>>>> .mode   = 0644,
>>>> .proc_handler   = proc_sctp_do_rto_min,
>>>> -.extra1 = ,
>>>> +.extra1  

Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs

2018-05-26 Thread Michael Tuexen
> On 25. May 2018, at 21:13, Neil Horman  wrote:
> 
> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote:
>> syzbot reported a rcu_sched self-detected stall on CPU which is caused
>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this
>> value, hb_timer will get stuck there, as in its timer handler it starts
>> this timer again with this value, then goes to the timer handler again.
>> 
>> This problem is there since very beginning, and thanks to Eric for the
>> reproducer shared from a syzbot mail.
>> 
>> This patch fixes it by not allowing to set rto_min with a value below
>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl.
>> 
>> Reported-by: syzbot+3dcd59a1f907245f8...@syzkaller.appspotmail.com
>> Suggested-by: Marcelo Ricardo Leitner 
>> Signed-off-by: Xin Long 
>> ---
>> include/net/sctp/constants.h |  1 +
>> net/sctp/socket.c| 10 +++---
>> net/sctp/sysctl.c|  3 ++-
>> 3 files changed, 10 insertions(+), 4 deletions(-)
>> 
>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
>> index 20ff237..2ee7a7b 100644
>> --- a/include/net/sctp/constants.h
>> +++ b/include/net/sctp/constants.h
>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 };
>> #define SCTP_RTO_INITIAL (3 * 1000)
>> #define SCTP_RTO_MIN (1 * 1000)
>> #define SCTP_RTO_MAX (60 * 1000)
>> +#define SCTP_RTO_HARD_MIN   200
>> 
>> #define SCTP_RTO_ALPHA  3   /* 1/8 when converted to right shifts. */
>> #define SCTP_RTO_BETA   2   /* 1/4 when converted to right shifts. */
>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>> index ae7e7c6..6ef12c7 100644
>> --- a/net/sctp/socket.c
>> +++ b/net/sctp/socket.c
>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, 
>> char __user *optval,
>>  * be changed.
>>  *
>>  */
>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, 
>> unsigned int optlen)
>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval,
>> +   unsigned int optlen)
>> {
>>  struct sctp_rtoinfo rtoinfo;
>>  struct sctp_association *asoc;
>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, 
>> char __user *optval, unsigne
>>  else
>>  rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max;
>> 
>> -if (rto_min)
>> +if (rto_min) {
>> +if (rto_min < SCTP_RTO_HARD_MIN)
>> +return -EINVAL;
>>  rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min;
>> -else
>> +} else {
>>  rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min;
>> +}
>> 
>>  if (rto_min > rto_max)
>>  return -EINVAL;
>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
>> index 33ca5b7..7ec854a 100644
>> --- a/net/sctp/sysctl.c
>> +++ b/net/sctp/sysctl.c
>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0;
>> static int rto_beta_min = 0;
>> static int rto_alpha_max = 1000;
>> static int rto_beta_max = 1000;
>> +static int rto_hard_min = SCTP_RTO_HARD_MIN;
>> 
>> static unsigned long max_autoclose_min = 0;
>> static unsigned long max_autoclose_max =
>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = {
>>  .maxlen = sizeof(unsigned int),
>>  .mode   = 0644,
>>  .proc_handler   = proc_sctp_do_rto_min,
>> -.extra1 = ,
>> +.extra1 = _hard_min,
>>  .extra2 = _net.sctp.rto_max
>>  },
>>  {
>> -- 
>> 2.1.0
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
> Patch looks fine, you probably want to note this hard minimum in man(7) sctp 
> as
> well
> 
I'm aware of some signalling networks which use RTO.min of smaller values than 
200ms.
So could this be reduced?

Best regards
Michael
> Acked-by: Neil Horman 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH net-next] sctp: add support for SCTP_REUSE_PORT sockopt

2018-05-21 Thread Michael Tuexen
> On 21. May 2018, at 15:48, Neil Horman <nhor...@tuxdriver.com> wrote:
> 
> On Mon, May 21, 2018 at 02:16:56PM +0200, Michael Tuexen wrote:
>>> On 21. May 2018, at 13:39, Neil Horman <nhor...@tuxdriver.com> wrote:
>>> 
>>> On Sun, May 20, 2018 at 10:54:04PM -0300, Marcelo Ricardo Leitner wrote:
>>>> On Sun, May 20, 2018 at 08:50:59PM -0400, Neil Horman wrote:
>>>>> On Sat, May 19, 2018 at 03:44:40PM +0800, Xin Long wrote:
>>>>>> This feature is actually already supported by sk->sk_reuse which can be
>>>>>> set by SO_REUSEADDR. But it's not working exactly as RFC6458 demands in
>>>>>> section 8.1.27, like:
>>>>>> 
>>>>>> - This option only supports one-to-one style SCTP sockets
>>>>>> - This socket option must not be used after calling bind()
>>>>>>   or sctp_bindx().
>>>>>> 
>>>>>> Besides, SCTP_REUSE_PORT sockopt should be provided for user's programs.
>>>>>> Otherwise, the programs with SCTP_REUSE_PORT from other systems will not
>>>>>> work in linux.
>>>>>> 
>>>>>> This patch reuses sk->sk_reuse and works pretty much as SO_REUSEADDR,
>>>>>> just with some extra setup limitations that are neeeded when it is being
>>>>>> enabled.
>>>>>> 
>>>>>> "It should be noted that the behavior of the socket-level socket option
>>>>>> to reuse ports and/or addresses for SCTP sockets is unspecified", so it
>>>>>> leaves SO_REUSEADDR as is for the compatibility.
>>>>>> 
>>>>>> Signed-off-by: Xin Long <lucien@gmail.com>
>>>>>> ---
>>>>>> include/uapi/linux/sctp.h |  1 +
>>>>>> net/sctp/socket.c | 48 
>>>>>> +++
>>>>>> 2 files changed, 49 insertions(+)
>>>>>> 
>>>>> A few things:
>>>>> 
>>>>> 1) I agree with Tom, this feature is a complete duplication of the 
>>>>> SK_REUSEPORT
>>>>> socket option.  I understand that this is an implementation of the option 
>>>>> in the
>>>>> RFC, but its definately a duplication of a feature, which makes several 
>>>>> things
>>>>> really messy.
>>>>> 
>>>>> 2) The overloading of the sk_reuse opeion is a bad idea, for several 
>>>>> reasons.
>>>>> Chief among them is the behavioral interference between this patch and the
>>>>> SO_REUSEADDR socket level option, that also sets this feature.  If you set
>>>>> sk_reuse via SO_REUSEADDR, you will set the SCTP port reuse feature 
>>>>> regardless
>>>>> of the bind or 1:1/1:m state of the socket.  Vice versa, if you set this 
>>>>> socket
>>>>> option via the SCTP_PORT_REUSE option you will inadvertently turn on 
>>>>> address
>>>>> reuse for the socket.  We can't do that.
>>>> 
>>>> Given your comments, going a bit further here, one other big
>>>> implication is that a port would never be able to be considered to
>>>> fully meet SCTP standards regarding reuse because a rogue application
>>>> may always abuse of the socket level opt to gain access to the port.
>>>> 
>>>> IOW, the patch allows the application to use such restrictions against
>>>> itself and nothing else, which undermines the patch idea.
>>>> 
>>> Agreed.
>>> 
>>>> I lack the knowledge on why the SCTP option was proposed in the RFC. I
>>>> guess they had a good reason to add the restriction on 1:1/1:m style.
>>>> Does the usage of the current imply in any risk to SCTP sockets? If
>>>> yes, that would give some grounds for going forward with the SCTP
>>>> option.
>>>> 
>>> I'm also not privy to why the sctp option was proposed, though I expect 
>>> that the
>>> lack of standardization of SO_REUSEPORT probably had something to do with 
>>> it.
>>> As for the reasoning behind restriction to only 1:1 sockets, if I had to 
>>> guess,
>>> I would say it likely because it creates ordering difficulty at the 
>>> application
>>> level.
>>> 
>>> CC-ing Michael Tuxen, who I believe had some input on this RFC.  Hopefully 
>>> he
>>>

Re: [PATCH net-next] sctp: add support for SCTP_REUSE_PORT sockopt

2018-05-21 Thread Michael Tuexen
> On 21. May 2018, at 13:39, Neil Horman  wrote:
> 
> On Sun, May 20, 2018 at 10:54:04PM -0300, Marcelo Ricardo Leitner wrote:
>> On Sun, May 20, 2018 at 08:50:59PM -0400, Neil Horman wrote:
>>> On Sat, May 19, 2018 at 03:44:40PM +0800, Xin Long wrote:
 This feature is actually already supported by sk->sk_reuse which can be
 set by SO_REUSEADDR. But it's not working exactly as RFC6458 demands in
 section 8.1.27, like:
 
  - This option only supports one-to-one style SCTP sockets
  - This socket option must not be used after calling bind()
or sctp_bindx().
 
 Besides, SCTP_REUSE_PORT sockopt should be provided for user's programs.
 Otherwise, the programs with SCTP_REUSE_PORT from other systems will not
 work in linux.
 
 This patch reuses sk->sk_reuse and works pretty much as SO_REUSEADDR,
 just with some extra setup limitations that are neeeded when it is being
 enabled.
 
 "It should be noted that the behavior of the socket-level socket option
 to reuse ports and/or addresses for SCTP sockets is unspecified", so it
 leaves SO_REUSEADDR as is for the compatibility.
 
 Signed-off-by: Xin Long 
 ---
 include/uapi/linux/sctp.h |  1 +
 net/sctp/socket.c | 48 
 +++
 2 files changed, 49 insertions(+)
 
>>> A few things:
>>> 
>>> 1) I agree with Tom, this feature is a complete duplication of the 
>>> SK_REUSEPORT
>>> socket option.  I understand that this is an implementation of the option 
>>> in the
>>> RFC, but its definately a duplication of a feature, which makes several 
>>> things
>>> really messy.
>>> 
>>> 2) The overloading of the sk_reuse opeion is a bad idea, for several 
>>> reasons.
>>> Chief among them is the behavioral interference between this patch and the
>>> SO_REUSEADDR socket level option, that also sets this feature.  If you set
>>> sk_reuse via SO_REUSEADDR, you will set the SCTP port reuse feature 
>>> regardless
>>> of the bind or 1:1/1:m state of the socket.  Vice versa, if you set this 
>>> socket
>>> option via the SCTP_PORT_REUSE option you will inadvertently turn on address
>>> reuse for the socket.  We can't do that.
>> 
>> Given your comments, going a bit further here, one other big
>> implication is that a port would never be able to be considered to
>> fully meet SCTP standards regarding reuse because a rogue application
>> may always abuse of the socket level opt to gain access to the port.
>> 
>> IOW, the patch allows the application to use such restrictions against
>> itself and nothing else, which undermines the patch idea.
>> 
> Agreed.
> 
>> I lack the knowledge on why the SCTP option was proposed in the RFC. I
>> guess they had a good reason to add the restriction on 1:1/1:m style.
>> Does the usage of the current imply in any risk to SCTP sockets? If
>> yes, that would give some grounds for going forward with the SCTP
>> option.
>> 
> I'm also not privy to why the sctp option was proposed, though I expect that 
> the
> lack of standardization of SO_REUSEPORT probably had something to do with it.
> As for the reasoning behind restriction to only 1:1 sockets, if I had to 
> guess,
> I would say it likely because it creates ordering difficulty at the 
> application
> level.
> 
> CC-ing Michael Tuxen, who I believe had some input on this RFC.  Hopefully he
> can shed some light on this.
Dear all,

the reason this was added is to have a specified way to allow a system to
behave like a client and server making use of the INIT collision.

For 1-to-many style sockets you can do this by creating a socket, binding it,
calling listen on it and trying to connect to the peer.

For 1-to-1 style sockets you need two sockets for it. One listener and one
you use to connect (and close it in case of failure, open a new one...).

It was not clear if one can achieve this with SO_REUSEPORT and/or SO_REUSEADDR
on all platforms. We left that unspecified.

I hope this makes the intention clearer.

Best regards
Michael
> 
> Neil
> 
>>> 
>>> Its a bit frustrating, since SO_REUSEPORT is widely available on multiple
>>> operating systems, but isn't standard (AFAIK).  I would say however, given 
>>> the
>>> prevalence of the socket level option, we should likely advocate for the 
>>> removal
>>> of the sctp specific option, or at the least implement and document it as 
>>> being
>> 
>> Is it possible, to remove/deprecate an option once it is published on a RFC?
>> 
>>> an alias for SO_REUSEPORT
>>> 
>>> 
>>> As this stands however, its a NACK from me.
>>> 
>>> Neil
>>> 
>> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



smime.p7s
Description: S/MIME cryptographic signature


Re: [RFC PATCH net-next 0/3] sctp: add GSO support

2016-01-29 Thread Michael Tuexen

> On 29 Jan 2016, at 12:26, Marcelo Ricardo Leitner <marcelo.leit...@gmail.com> 
> wrote:
> 
> On Fri, Jan 29, 2016 at 11:57:46AM +0100, Michael Tuexen wrote:
>>> On 29 Jan 2016, at 02:18, Marcelo Ricardo Leitner 
>>> <marcelo.leit...@gmail.com> wrote:
>>> 
>>> On Fri, Jan 29, 2016 at 12:36:05AM +0100, Michael Tuexen wrote:
>>>> 
>>>>> On 28 Jan 2016, at 22:03, Marcelo Ricardo Leitner 
>>>>> <marcelo.leit...@gmail.com> wrote:
>>>>> 
>>>>> On Thu, Jan 28, 2016 at 06:54:06PM +0100, Michael Tuexen wrote:
>>>>>>> On 28 Jan 2016, at 14:51, David Laight <david.lai...@aculab.com> wrote:
>>>>>>> 
>>>>>>> From: Marcelo Ricardo Leitner
>>>>>>>> Sent: 27 January 2016 17:07
>>>>>>>> This patchset is merely a RFC for the moment. There are some
>>>>>>>> controversial points that I'd like to discuss before actually proposing
>>>>>>>> the patches.
>>>>>>> 
>>>>>>> You also need to look at how a 'user' can actually get SCTP to
>>>>>>> merge data chunks in the first place.
>>>>>>> 
>>>>>>> With Nagle disabled (and it probably has to be since the data flow
>>>>>>> is unlikely to be 'command-response' or 'unidirectional bulk')
>>>>>>> it is currently almost impossible to get more than one chunk
>>>>>>> into an ethernet frame.
>>>>>>> 
>>>>>>> Support for MSG_MORE would help.
>>>>>> What about adding support for the explicit EOR mode as specified in
>>>>>> https://tools.ietf.org/html/rfc6458#section-8.1.26
>>>>> 
>>>>> Seizing the moment to clarify my understanding on that. :)
>>>>> Such multiple calls to send system calls will result in a single data
>>>>> chunk. Is that so? That's what I get from that text and also from this
>>>> No. It results in a single user message. This means you can send
>>>> a user message larger than the send buffer size. How the user message
>>>> is fragmented in DATA chunks is transparent to the upper layer.
>>>> 
>>>> Does this make things clearer?
>>> 
>>> I think so, yes. So it allows delaying setting the Ending fragment bit
>>> until the application set SCTP_EOR. All the rest before this stays as
>>> before: first send() will generate a chunk with Beginning bit set and
>>> may generate some other middle-fragments (no B nor E bit set) if
>>> necessary, second to N-1 call to send will generate only middle
>>> fragments, while the last send, with SCTP_EOF, will then set the Ending
>>> fragment in the last one. Right?
>> Yes. But there are no restrictions on the user data provided in send()
>> calls and DATA chunks. So you can
>> send(10 byte, no SCTP_EOR)
>> resulting in one DATA chunk with the B bit, several with no B and no E bit.
>> send(10 byte, no SCTP_EOR)
>> resulting in several chunks with no B and no E bit.
>> send(10 byte, SCTP_EOR)
>> resulting in several chunks with no B and no E bit and one (the last) chunk
>> with the E bit.
>> 
>> On the other hand you can do
>> send(1 byte, no SCTP_EOR)
>> resulting in a single DATA chunk with the E bit set.
>> send(1 byte, no SCTP_EOR)
>> send(1 byte, no SCTP_EOR)
>> send(1 byte, no SCTP_EOR)
>> send(1 byte, no SCTP_EOR)
>> send(1 byte, no SCTP_EOR)
>> All resulting in a single DATA chunk with 5 bytes user data and no B or E 
>> bit.
>> (For example if Nagle is enabled and only after the last send call the SACK 
>> arrives).
>> send(1 byte, SCTP_EOR)
>> results in a single DATA chunk with the E bist set.
> 
> Cool, thanks Michael. It will be quite fun to mix this with MSG_MORE
> logic, I think :)
Don't know. In FreeBSD we do support SCTP_EOR, but not MSG_MORE, which seems
to be Linux specific.

Best regards
Michael
> 
> Best regards,
> Marcelo
> 
> 



Re: [RFC PATCH net-next 0/3] sctp: add GSO support

2016-01-29 Thread Michael Tuexen
> On 29 Jan 2016, at 02:18, Marcelo Ricardo Leitner <marcelo.leit...@gmail.com> 
> wrote:
> 
> On Fri, Jan 29, 2016 at 12:36:05AM +0100, Michael Tuexen wrote:
>> 
>>> On 28 Jan 2016, at 22:03, Marcelo Ricardo Leitner 
>>> <marcelo.leit...@gmail.com> wrote:
>>> 
>>> On Thu, Jan 28, 2016 at 06:54:06PM +0100, Michael Tuexen wrote:
>>>>> On 28 Jan 2016, at 14:51, David Laight <david.lai...@aculab.com> wrote:
>>>>> 
>>>>> From: Marcelo Ricardo Leitner
>>>>>> Sent: 27 January 2016 17:07
>>>>>> This patchset is merely a RFC for the moment. There are some
>>>>>> controversial points that I'd like to discuss before actually proposing
>>>>>> the patches.
>>>>> 
>>>>> You also need to look at how a 'user' can actually get SCTP to
>>>>> merge data chunks in the first place.
>>>>> 
>>>>> With Nagle disabled (and it probably has to be since the data flow
>>>>> is unlikely to be 'command-response' or 'unidirectional bulk')
>>>>> it is currently almost impossible to get more than one chunk
>>>>> into an ethernet frame.
>>>>> 
>>>>> Support for MSG_MORE would help.
>>>> What about adding support for the explicit EOR mode as specified in
>>>> https://tools.ietf.org/html/rfc6458#section-8.1.26
>>> 
>>> Seizing the moment to clarify my understanding on that. :)
>>> Such multiple calls to send system calls will result in a single data
>>> chunk. Is that so? That's what I get from that text and also from this
>> No. It results in a single user message. This means you can send
>> a user message larger than the send buffer size. How the user message
>> is fragmented in DATA chunks is transparent to the upper layer.
>> 
>> Does this make things clearer?
> 
> I think so, yes. So it allows delaying setting the Ending fragment bit
> until the application set SCTP_EOR. All the rest before this stays as
> before: first send() will generate a chunk with Beginning bit set and
> may generate some other middle-fragments (no B nor E bit set) if
> necessary, second to N-1 call to send will generate only middle
> fragments, while the last send, with SCTP_EOF, will then set the Ending
> fragment in the last one. Right?
Yes. But there are no restrictions on the user data provided in send()
calls and DATA chunks. So you can
send(10 byte, no SCTP_EOR)
resulting in one DATA chunk with the B bit, several with no B and no E bit.
send(10 byte, no SCTP_EOR)
resulting in several chunks with no B and no E bit.
send(10 byte, SCTP_EOR)
resulting in several chunks with no B and no E bit and one (the last) chunk
with the E bit.

On the other hand you can do
send(1 byte, no SCTP_EOR)
resulting in a single DATA chunk with the E bit set.
send(1 byte, no SCTP_EOR)
send(1 byte, no SCTP_EOR)
send(1 byte, no SCTP_EOR)
send(1 byte, no SCTP_EOR)
send(1 byte, no SCTP_EOR)
All resulting in a single DATA chunk with 5 bytes user data and no B or E bit.
(For example if Nagle is enabled and only after the last send call the SACK 
arrives).
send(1 byte, SCTP_EOR)
results in a single DATA chunk with the E bist set.

Best regards
Michael
> 
> Thanks,
> Marcelo
> 
>> 
>> Best regards
>> Michael
>>> snippet:
>>> "Sending a message using sendmsg() is atomic unless explicit end of
>>> record (EOR) marking is enabled on the socket specified by sd (see
>>> Section 8.1.26)."
>>> 
>>> Best regards,
>>> Marcelo
>>> 
>>>> Best regards
>>>> Michael
>>>>> 
>>>>> Given the current implementation you can get almost the required
>>>>> behaviour by turning nagle off and on repeatedly.
>>>>> 
>>>>> I did wonder whether the queued data could actually be picked up
>>>>> be a Heartbeat chunk that is probing a different remote address
>>>>> (which would be bad news).
>>>>> 
>>>>>   David
>>>>> 
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>>>>> the body of a message to majord...@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>> 
>>>> 
>>> 
>> 
> 



Re: [RFC PATCH net-next 0/3] sctp: add GSO support

2016-01-28 Thread Michael Tuexen

> On 28 Jan 2016, at 22:03, Marcelo Ricardo Leitner <marcelo.leit...@gmail.com> 
> wrote:
> 
> On Thu, Jan 28, 2016 at 06:54:06PM +0100, Michael Tuexen wrote:
>>> On 28 Jan 2016, at 14:51, David Laight <david.lai...@aculab.com> wrote:
>>> 
>>> From: Marcelo Ricardo Leitner
>>>> Sent: 27 January 2016 17:07
>>>> This patchset is merely a RFC for the moment. There are some
>>>> controversial points that I'd like to discuss before actually proposing
>>>> the patches.
>>> 
>>> You also need to look at how a 'user' can actually get SCTP to
>>> merge data chunks in the first place.
>>> 
>>> With Nagle disabled (and it probably has to be since the data flow
>>> is unlikely to be 'command-response' or 'unidirectional bulk')
>>> it is currently almost impossible to get more than one chunk
>>> into an ethernet frame.
>>> 
>>> Support for MSG_MORE would help.
>> What about adding support for the explicit EOR mode as specified in
>> https://tools.ietf.org/html/rfc6458#section-8.1.26
> 
> Seizing the moment to clarify my understanding on that. :)
> Such multiple calls to send system calls will result in a single data
> chunk. Is that so? That's what I get from that text and also from this
No. It results in a single user message. This means you can send
a user message larger than the send buffer size. How the user message
is fragmented in DATA chunks is transparent to the upper layer.

Does this make things clearer?

Best regards
Michael
> snippet:
> "Sending a message using sendmsg() is atomic unless explicit end of
> record (EOR) marking is enabled on the socket specified by sd (see
> Section 8.1.26)."
> 
> Best regards,
> Marcelo
> 
>> Best regards
>> Michael
>>> 
>>> Given the current implementation you can get almost the required
>>> behaviour by turning nagle off and on repeatedly.
>>> 
>>> I did wonder whether the queued data could actually be picked up
>>> be a Heartbeat chunk that is probing a different remote address
>>> (which would be bad news).
>>> 
>>> David
>>> 
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> 
>> 
> 



Re: [RFC PATCH net-next 0/3] sctp: add GSO support

2016-01-28 Thread Michael Tuexen
> On 28 Jan 2016, at 14:51, David Laight  wrote:
> 
> From: Marcelo Ricardo Leitner
>> Sent: 27 January 2016 17:07
>> This patchset is merely a RFC for the moment. There are some
>> controversial points that I'd like to discuss before actually proposing
>> the patches.
> 
> You also need to look at how a 'user' can actually get SCTP to
> merge data chunks in the first place.
> 
> With Nagle disabled (and it probably has to be since the data flow
> is unlikely to be 'command-response' or 'unidirectional bulk')
> it is currently almost impossible to get more than one chunk
> into an ethernet frame.
> 
> Support for MSG_MORE would help.
What about adding support for the explicit EOR mode as specified in
https://tools.ietf.org/html/rfc6458#section-8.1.26

Best regards
Michael
> 
> Given the current implementation you can get almost the required
> behaviour by turning nagle off and on repeatedly.
> 
> I did wonder whether the queued data could actually be picked up
> be a Heartbeat chunk that is probing a different remote address
> (which would be bad news).
> 
>   David
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



Re: [RFC net-next 1/1] : sctp: denoise deprecation log on SCTP_EVENTS

2015-07-22 Thread Michael Tuexen
 On 22 Jul 2015, at 14:04, Daniel Borkmann dan...@iogearbox.net wrote:
 
 On 07/22/2015 01:48 PM, David Laight wrote:
 From: Jamal Hadi Salim
 Sent: 09 July 2015 11:38
 
 In the newer kernels this message is extremely noisy. After a quick
 discussion with Daniel it seems to me it will be very hard to get
 existing apps that nobody is going to update to continue to work
 (i.e no forward compat). And newer apps that desire to play in both
 older kernels and new kernels will have to play some tricks to work
 (i.e weak backward compatibility).  These are good reasons
 to totally get rid of this message. At minimal to neutre it.
 The attached change tries to do that. However, if you had multiple
 apps, you will only get warning for the first one.
 
 Never mind spanning the kernel log, there will be performance
 issues with systems that are sending significant amounts of SCTP data.
 I'm not sure you'll get 1+/sec through the trace system.
 
 What do you mean by trace system (there's no backtrace)?
 
 Anyway, as previously stated, I doubt we'd get rid of the deprecated
 SCTP uapi from the RFC any time soon, so, imho, we should just remove
 these rate-limited messages altogether to stop spamming the klog ...
... and we will not remove it in FreeBSD either.

Best regards
Michael
 
 I'm going to have to find which version this trace went into and
 make sure we tell customers not to use the relevant kernel versions.
 
  David
 
 
 

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] net: sctp: stop spamming klog with rfc6458, 5.3.2. deprecation warnings

2015-07-22 Thread Michael Tuexen
 On 22 Jul 2015, at 16:31, Daniel Borkmann dan...@iogearbox.net wrote:
 
 Back then when we added support for SCTP_SNDINFO/SCTP_RCVINFO from
 RFC6458 5.3.4/5.3.5, we decided to add a deprecation warning for the
 (as per RFC deprecated) SCTP_SNDRCV via commit bbbea41d5e53 (net:
 sctp: deprecate rfc6458, 5.3.2. SCTP_SNDRCV support), see [1].
 
 Imho, it was not a good idea, and we should just revert that message
 for a couple of reasons:
 
  1) It's uapi and therefore set in stone forever.
 
  2) To be able to run on older and newer kernels, an SCTP application
 would need to probe for both, SCTP_SNDRCV, but also SCTP_SNDINFO/
 SCTP_RCVINFO support, so that on older kernels, it can make use
 of SCTP_SNDRCV, and on newer kernels SCTP_SNDINFO/SCTP_RCVINFO.
 In my (limited) experience, a lot of SCTP appliances are migrating
 to newer kernels only ve(ee)ry slowly.
 
  3) Some people don't have the chance to change their applications,
 f.e. due to proprietary legacy stuff. So, they'll hit this warning
 in fast path and are stuck with older kernels.
 
 But i.e. due to point 1) I really fail to see the benefit of a warning.
 So just revert that for now, the issue was reported up Jamal.
 
  [1] http://thread.gmane.org/gmane.linux.network/321960/
Looks good to me.

Best regards
Michael
 
 Reported-by: Jamal Hadi Salim j...@mojatatu.com
 Signed-off-by: Daniel Borkmann dan...@iogearbox.net
 Cc: Michael Tuexen tue...@fh-muenster.de
 ---
 net/sctp/socket.c | 6 --
 1 file changed, 6 deletions(-)
 
 diff --git a/net/sctp/socket.c b/net/sctp/socket.c
 index 1425ec2..17bef01 100644
 --- a/net/sctp/socket.c
 +++ b/net/sctp/socket.c
 @@ -2200,12 +2200,6 @@ static int sctp_setsockopt_events(struct sock *sk, 
 char __user *optval,
   if (copy_from_user(sctp_sk(sk)-subscribe, optval, optlen))
   return -EFAULT;
 
 - if (sctp_sk(sk)-subscribe.sctp_data_io_event)
 - pr_warn_ratelimited(DEPRECATED %s (pid %d) 
 - Requested SCTP_SNDRCVINFO event.\n
 - Use SCTP_RCVINFO through SCTP_RECVRCVINFO 
 option instead.\n,
 - current-comm, task_pid_nr(current));
 -
   /* At the time when a user app subscribes to SCTP_SENDER_DRY_EVENT,
* if there is no data to be sent or retransmit, the stack will
* immediately send up this notification.
 -- 
 1.9.3
 
 

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH net-next] sctp: fix src address selection if using secondary addresses

2015-07-09 Thread Michael Tuexen
 On 09 Jul 2015, at 18:54, Marcelo Ricardo Leitner mleit...@redhat.com wrote:
 
 Cc'ing Michael too.
I'm not familiar with the Linux kernel code, so I can't comment on it.
But making sure to use a source address belonging to the emitting
interface makes sense for me.

Best regards
Michael
 
 On Tue, Jul 07, 2015 at 02:42:25PM -0300, Marcelo Ricardo Leitner wrote:
 Hi folks,
 
 This is an attempt to better choose a src address for sctp packets as
 peers with rp_filter could be dropping our packets in some situations.
 With this patch, we try to respect and use a src address that belongs to
 the interface we are putting the packet out.
 
 I have that feeling that there is be a better way to do this, but I
 just couldn't see it.
 
 This patch has been tested with and without gateways between the peers
 and also just two peers connected via two subnets and results were
 pretty good.
 
 One could think that this limits the address combination we can use, but
 such combinations probably are just bogus anyway. Like, if you have an
 host with address A1 and B1 and another with A2 and B2, you cannot
 expect that A can use A1 to reach B2 through subnet B, because the
 return path would be via the other link which, when this switch happens,
 we are thinking it's broken.
 
 Thanks,
 Marcelo
 
 ---8---
 
 In short, sctp is likely to incorrectly choose src address if socket is
 bound to secondary addresses. This patch fixes it by adding a new check
 that tries to anticipate if the src address would be expected by the
 next hop/peer on this interface by doing reverse routing.
 
 Also took the shot to reduce the indentation level on this code.
 
 Details:
 
 Currently, sctp will do a routing attempt without specifying the src
 address and compare the returned value (preferred source) with the
 addresses that the socket is bound to. When using secondary addresses,
 this will not match.
 
 Then it will try specifying each of the addresses that the socket is
 bound to and re-routing, checking if that address is valid as src for
 that dst. Thing is, this check alone is weak:
 
 # ip r l
 192.168.100.0/24 dev eth1  proto kernel  scope link  src 192.168.100.149
 192.168.122.0/24 dev eth0  proto kernel  scope link  src 192.168.122.147
 
 # ip a l
 1: lo: LOOPBACK,UP,LOWER_UP mtu 65536 qdisc noqueue state UNKNOWN group 
 default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
   valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
   valid_lft forever preferred_lft forever
 2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc pfifo_fast state 
 UP group default qlen 1000
link/ether 52:54:00:15:18:6a brd ff:ff:ff:ff:ff:ff
inet 192.168.122.147/24 brd 192.168.122.255 scope global dynamic eth0
   valid_lft 2160sec preferred_lft 2160sec
inet 192.168.122.148/24 scope global secondary eth0
   valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe15:186a/64 scope link
   valid_lft forever preferred_lft forever
 3: eth1: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc pfifo_fast state 
 UP group default qlen 1000
link/ether 52:54:00:b3:91:46 brd ff:ff:ff:ff:ff:ff
inet 192.168.100.149/24 brd 192.168.100.255 scope global dynamic eth1
   valid_lft 2162sec preferred_lft 2162sec
inet 192.168.100.148/24 scope global secondary eth1
   valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:feb3:9146/64 scope link
   valid_lft forever preferred_lft forever
 4: ens9: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc pfifo_fast state 
 UP group default qlen 1000
link/ether 52:54:00:05:47:ee brd ff:ff:ff:ff:ff:ff
inet6 fe80::5054:ff:fe05:47ee/64 scope link
   valid_lft forever preferred_lft forever
 
 # ip r g 192.168.100.193 from 192.168.122.148
 192.168.100.193 from 192.168.122.148 dev eth1
cache
 
 Even if you specify an interface:
 
 # ip r g 192.168.100.193 from 192.168.122.148 oif eth1
 192.168.100.193 from 192.168.122.148 dev eth1
cache
 
 Although this would be valid, peers using rp_filter will drop such
 packets as their src doesn't match the routes for that interface.
 
 So we fix this by adding an extra check, we try to do the reverse
 routing and check if the interface used would be the same. If not, we
 skip such address. If yes, we use it.
 
 Signed-off-by: Marcelo Ricardo Leitner marcelo.leit...@gmail.com
 ---
 net/sctp/protocol.c | 55 
 +++--
 1 file changed, 41 insertions(+), 14 deletions(-)
 
 diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
 index 
 59e80356672bdf89777265ae1f8c384792dfb98c..e52fd6f77963426a7cf3e83ca01a9cdae1cb2c01
  100644
 --- a/net/sctp/protocol.c
 +++ b/net/sctp/protocol.c
 @@ -53,6 +53,7 @@
 #include net/net_namespace.h
 #include net/protocol.h
 #include net/ip.h
 +#include net/ip_fib.h
 #include net/ipv6.h
 #include net/route.h
 #include net/sctp/sctp.h
 @@ -487,23 +488,49 @@ static void 

Re: [Lksctp-developers] [PATCH] SCTP: drop SACK if ctsn is not less than the next tsn of assoc

2007-08-01 Thread Michael Tuexen

Hi Wei,

see my comments in-line.

Best regards
Michael

On Aug 1, 2007, at 3:06 AM, Wei Yongjun wrote:




On Tue, 2007-07-31 at 07:37 -0400, Neil Horman wrote:


On Tue, Jul 31, 2007 at 12:44:27PM +0800, Wei Yongjun wrote:

If SCTP data sender received a SACK which contains Cumulative  
TSN Ack is
not less than the Cumulative TSN Ack Point, and if this  
Cumulative TSN
Ack is not used by the data sender, SCTP data sender still  
accept this
SACK , and next SACK which send correctly to DATA sender be  
dropped,

because it is less than the new Cumulative TSN Ack Point.
After received this SACK, data will be retrans again and again  
even if

correct SACK is received.
So I think this SACK must be dropped to let data transmit   
correctly.


Following is the tcpdump of my test. And patch in this mail can  
avoid

this problem.

02:19:38.233278 sctp (1) [INIT] [init tag: 1250461886] [rwnd:  
54784] [OS: 10] [MIS: 65535] [init TSN: 217114040]
02:19:39.782160 sctp (1) [INIT ACK] [init tag: 1] [rwnd: 54784]  
[OS: 100] [MIS: 65535] [init TSN: 100]

02:19:39.798583 sctp (1) [COOKIE ECHO]
02:19:40.082125 sctp (1) [COOKIE ACK]
02:19:40.097859 sctp (1) [DATA] (B)(E) [TSN: 217114040] [SID: 0]  
[SSEQ 0] [PPID 0xf192090b]
02:19:40.100162 sctp (1) [DATA] (B)(E) [TSN: 217114041] [SID: 0]  
[SSEQ 1] [PPID 0x3e467007]
02:19:40.100779 sctp (1) [DATA] (B)(E) [TSN: 217114042] [SID: 0]  
[SSEQ 2] [PPID 0x11b12a0a]
02:19:40.101200 sctp (1) [DATA] (B)(E) [TSN: 217114043] [SID: 0]  
[SSEQ 3] [PPID 0x30e7d979]
02:19:40.561147 sctp (1) [SACK] [cum ack 217114040] [a_rwnd  
54784] [#gap acks 0] [#dup tsns 0]
02:19:40.568498 sctp (1) [DATA] (B)(E) [TSN: 217114044] [SID: 0]  
[SSEQ 4] [PPID 0x251ff86f]
02:19:40.569308 sctp (1) [DATA] (B)(E) [TSN: 217114045] [SID: 0]  
[SSEQ 5] [PPID 0xe5d5da5d]
02:19:40.700584 sctp (1) [SACK] [cum ack 290855864] [a_rwnd  
54784] [#gap acks 0] [#dup tsns 0]
02:19:40.701562 sctp (1) [DATA] (B)(E) [TSN: 217114046] [SID: 0]  
[SSEQ 6] [PPID 0x87d8b423]
02:19:40.701567 sctp (1) [DATA] (B)(E) [TSN: 217114047] [SID: 0]  
[SSEQ 7] [PPID 0xca47e645]
02:19:40.701569 sctp (1) [DATA] (B)(E) [TSN: 217114048] [SID: 0]  
[SSEQ 8] [PPID 0x6c0ea150]
02:19:40.701576 sctp (1) [DATA] (B)(E) [TSN: 217114049] [SID: 0]  
[SSEQ 9] [PPID 0x9cc1994f]
02:19:40.701585 sctp (1) [DATA] (B)(E) [TSN: 217114050] [SID: 0]  
[SSEQ 10] [PPID 0xb1df4129]
02:19:41.098201 sctp (1) [SACK] [cum ack 217114041] [a_rwnd  
54784] [#gap acks 0] [#dup tsns 0]
02:19:41.283257 sctp (1) [SACK] [cum ack 217114042] [a_rwnd  
54784] [#gap acks 0] [#dup tsns 0]
02:19:41.457217 sctp (1) [SACK] [cum ack 217114043] [a_rwnd  
54784] [#gap acks 0] [#dup tsns 0]
02:19:41.691528 sctp (1) [SACK] [cum ack 217114044] [a_rwnd  
54784] [#gap acks 0] [#dup tsns 0]
02:19:41.849636 sctp (1) [SACK] [cum ack 217114045] [a_rwnd  
54784] [#gap acks 0] [#dup tsns 0]
02:19:41.975473 sctp (1) [DATA] (B)(E) [TSN: 217114046] [SID: 0]  
[SSEQ 6] [PPID 0x87d8b423]
02:19:42.021229 sctp (1) [SACK] [cum ack 217114046] [a_rwnd  
54784] [#gap acks 0] [#dup tsns 0]
02:19:42.196495 sctp (1) [SACK] [cum ack 217114047] [a_rwnd  
54784] [#gap acks 0] [#dup tsns 0]
02:19:42.424319 sctp (1) [SACK] [cum ack 217114048] [a_rwnd  
54784] [#gap acks 0] [#dup tsns 0]
02:19:42.586924 sctp (1) [SACK] [cum ack 217114049] [a_rwnd  
54784] [#gap acks 0] [#dup tsns 0]
02:19:42.744810 sctp (1) [SACK] [cum ack 217114050] [a_rwnd  
54784] [#gap acks 0] [#dup tsns 0]
02:19:42.965536 sctp (1) [SACK] [cum ack 217114046] [a_rwnd  
54784] [#gap acks 0] [#dup tsns 0]
02:19:43.106385 sctp (1) [DATA] (B)(E) [TSN: 217114046] [SID: 0]  
[SSEQ 6] [PPID 0x87d8b423]
02:19:43.218969 sctp (1) [SACK] [cum ack 217114046] [a_rwnd  
54784] [#gap acks 0] [#dup tsns 0]
02:19:45.374101 sctp (1) [DATA] (B)(E) [TSN: 217114046] [SID: 0]  
[SSEQ 6] [PPID 0x87d8b423]
02:19:45.489258 sctp (1) [SACK] [cum ack 217114046] [a_rwnd  
54784] [#gap acks 0] [#dup tsns 0]
02:19:49.830116 sctp (1) [DATA] (B)(E) [TSN: 217114046] [SID: 0]  
[SSEQ 6] [PPID 0x87d8b423]
02:19:49.984577 sctp (1) [SACK] [cum ack 217114046] [a_rwnd  
54784] [#gap acks 0] [#dup tsns 0]
02:19:58.760300 sctp (1) [DATA] (B)(E) [TSN: 217114046] [SID: 0]  
[SSEQ 6] [PPID 0x87d8b423]
02:19:58.931690 sctp (1) [SACK] [cum ack 217114046] [a_rwnd  
54784] [#gap acks 0] [#dup tsns 0]



Signed-off-by: Wei Yongjun [EMAIL PROTECTED]

--- net/sctp/sm_statefuns.c.orig	2007-07-29 18:11:01.0  
-0400

+++ net/sctp/sm_statefuns.c 2007-07-29 18:14:49.0 -0400
@@ -2880,6 +2880,15 @@ sctp_disposition_t sctp_sf_eat_sack_6_2(
return SCTP_DISPOSITION_DISCARD;
}

+   /* If Cumulative TSN Ack is not less than the Cumulative TSN
+* Ack which will be send in the next data, drop the SACK.
+*/
+   if (!TSN_lt(ctsn, asoc-next_tsn)) {
+   SCTP_DEBUG_PRINTK(ctsn %x\n, ctsn);
+   SCTP_DEBUG_PRINTK(next_tsn %x\n, asoc-next_tsn);
+   return SCTP_DISPOSITION_DISCARD;
+   }
+
/* Return this