Re: [PATCH v2 4.19] tcp: fix TCP socks unreleased in BBR mode

2020-08-11 Thread Eric Dumazet



On 8/11/20 3:37 AM, Jason Xing wrote:
> Hi everyone,
> 
> Could anyone take a look at this issue? I believe it is of high importance.
> Though Eric gave the proper patch a few months ago, the stable branch
> still hasn't applied or merged this fix. It seems this patch was
> forgotten :(


Sure, I'll take care of this shortly.

Thanks.

> 
> Thanks,
> Jason
> 
> On Thu, Jun 4, 2020 at 9:47 PM Jason Xing  wrote:
>>
>> On Thu, Jun 4, 2020 at 9:10 PM Eric Dumazet  wrote:
>>>
>>> On Thu, Jun 4, 2020 at 2:01 AM  wrote:

>>>> From: Jason Xing
>>>>
>>>> When using BBR mode, too many TCP socks cannot be released because of
>>>> duplicate sock_hold() calls in tcp_internal_pacing() when an RTO
>>>> happens. As a result, slab memory grows rapidly and repeatedly
>>>> triggers OOM until the system crashes.
>>>>
>>>> Besides BBR mode, any mode that uses the pacing function could
>>>> trigger the problem described above.
>>>>
>>>> Reproduce procedure:
>>>> 0) cat /proc/slabinfo | grep TCP
>>>> 1) switch net.ipv4.tcp_congestion_control to bbr
>>>> 2) use the wrk tool (or something similar) to send packets
>>>> 3) use tc to increase the delay and loss to simulate the RTO case
>>>> 4) cat /proc/slabinfo | grep TCP
>>>> 5) kill the wrk command and observe the number of objects and slabs in
>>>> TCP
>>>> 6) finally, notice that the number does not decrease
>>>>
>>>> v2: extend the timer to cover all the related potential risks
>>>> (suggested by Eric Dumazet and Neal Cardwell)
>>>>
>>>> Signed-off-by: Jason Xing
>>>> Signed-off-by: liweishi
>>>> Signed-off-by: Shujin Li
>>>
>>> That is not how things work really.
>>>
>>> I will submit this properly so that stable teams do not have to guess
>>> how to backport this to various kernels.
>>>
>>> Changelog is misleading, this has nothing to do with BBR, we need to be 
>>> precise.
>>>
>>
>> Thanks for your help. I can finally apply this patch into my kernel.
>>
>> Looking forward to your patchset :)
>>
>> Jason
>>
>>> Thank you.


Re: [PATCH v2 4.19] tcp: fix TCP socks unreleased in BBR mode

2020-08-11 Thread Jason Xing

Hi everyone,

Could anyone take a look at this issue? I believe it is of high importance.
Though Eric gave the proper patch a few months ago, the stable branch
still hasn't applied or merged this fix. It seems this patch was
forgotten :(

Thanks,
Jason

On Thu, Jun 4, 2020 at 9:47 PM Jason Xing  wrote:
>
> On Thu, Jun 4, 2020 at 9:10 PM Eric Dumazet  wrote:
> >
> > On Thu, Jun 4, 2020 at 2:01 AM  wrote:
> > >
> > > From: Jason Xing 
> > >
> > > When using BBR mode, too many TCP socks cannot be released because of
> > > duplicate sock_hold() calls in tcp_internal_pacing() when an RTO
> > > happens. As a result, slab memory grows rapidly and repeatedly
> > > triggers OOM until the system crashes.
> > >
> > > Besides BBR mode, any mode that uses the pacing function could
> > > trigger the problem described above.
> > >
> > > Reproduce procedure:
> > > 0) cat /proc/slabinfo | grep TCP
> > > 1) switch net.ipv4.tcp_congestion_control to bbr
> > > 2) use the wrk tool (or something similar) to send packets
> > > 3) use tc to increase the delay and loss to simulate the RTO case
> > > 4) cat /proc/slabinfo | grep TCP
> > > 5) kill the wrk command and observe the number of objects and slabs in
> > > TCP
> > > 6) finally, notice that the number does not decrease
> > >
> > > v2: extend the timer to cover all the related potential risks
> > > (suggested by Eric Dumazet and Neal Cardwell)
> > >
> > > Signed-off-by: Jason Xing 
> > > Signed-off-by: liweishi 
> > > Signed-off-by: Shujin Li 
> >
> > That is not how things work really.
> >
> > I will submit this properly so that stable teams do not have to guess
> > how to backport this to various kernels.
> >
> > Changelog is misleading, this has nothing to do with BBR, we need to be 
> > precise.
> >
>
> Thanks for your help. I can finally apply this patch into my kernel.
>
> Looking forward to your patchset :)
>
> Jason
>
> > Thank you.


Re: [PATCH v2 4.19] tcp: fix TCP socks unreleased in BBR mode

2020-06-04 Thread Jason Xing

On Thu, Jun 4, 2020 at 9:10 PM Eric Dumazet  wrote:
>
> On Thu, Jun 4, 2020 at 2:01 AM  wrote:
> >
> > From: Jason Xing 
> >
> > When using BBR mode, too many TCP socks cannot be released because of
> > duplicate sock_hold() calls in tcp_internal_pacing() when an RTO
> > happens. As a result, slab memory grows rapidly and repeatedly
> > triggers OOM until the system crashes.
> >
> > Besides BBR mode, any mode that uses the pacing function could
> > trigger the problem described above.
> >
> > Reproduce procedure:
> > 0) cat /proc/slabinfo | grep TCP
> > 1) switch net.ipv4.tcp_congestion_control to bbr
> > 2) use the wrk tool (or something similar) to send packets
> > 3) use tc to increase the delay and loss to simulate the RTO case
> > 4) cat /proc/slabinfo | grep TCP
> > 5) kill the wrk command and observe the number of objects and slabs in
> > TCP
> > 6) finally, notice that the number does not decrease
> >
> > v2: extend the timer to cover all the related potential risks
> > (suggested by Eric Dumazet and Neal Cardwell)
> >
> > Signed-off-by: Jason Xing 
> > Signed-off-by: liweishi 
> > Signed-off-by: Shujin Li 
>
> That is not how things work really.
>
> I will submit this properly so that stable teams do not have to guess
> how to backport this to various kernels.
>
> Changelog is misleading, this has nothing to do with BBR, we need to be 
> precise.
>

Thanks for your help. I can finally apply this patch into my kernel.

Looking forward to your patchset :)

Jason

> Thank you.
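
The leak described in the quoted changelog can be pictured with a toy
reference-count model. This is only a sketch under stated assumptions:
ToySock, arm_pacing_timer(), timer_fired(), and leaked_refs() are
hypothetical names, not kernel code. It assumes that arming an already
pending timer still takes a new reference, while the callback runs once and
drops one. The guard flag merely shows how skipping the extra hold keeps the
count balanced; it is not presented as the fix that was eventually merged.

```python
class ToySock:
    """Toy stand-in for a refcounted socket (hypothetical, not kernel code)."""

    def __init__(self):
        self.refcnt = 1             # reference held by the socket's owner
        self.timer_pending = False

    def hold(self):
        self.refcnt += 1

    def put(self):
        self.refcnt -= 1


def arm_pacing_timer(sk, guard):
    """Arm the toy timer and take a reference for it.

    With guard=True, an already pending timer is left alone, so no second
    reference is taken.
    """
    if guard and sk.timer_pending:
        return
    sk.timer_pending = True
    sk.hold()


def timer_fired(sk):
    """The callback runs once per pending timer and drops one reference."""
    if sk.timer_pending:
        sk.timer_pending = False
        sk.put()


def leaked_refs(guard):
    """Return the references left after the owner releases the socket."""
    sk = ToySock()
    arm_pacing_timer(sk, guard)     # first arm on the normal transmit path
    arm_pacing_timer(sk, guard)     # assumed second arm, e.g. after an RTO
    timer_fired(sk)                 # single callback, single put
    sk.put()                        # owner closes the socket
    return sk.refcnt                # 0 means the socket could be freed
```

In this model leaked_refs(False) leaves one reference behind, so the toy
socket is never freed, while leaked_refs(True) returns to zero.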


Re: [PATCH v2 4.19] tcp: fix TCP socks unreleased in BBR mode

2020-06-04 Thread Eric Dumazet

On Thu, Jun 4, 2020 at 2:01 AM  wrote:
>
> From: Jason Xing 
>
> When using BBR mode, too many TCP socks cannot be released because of
> duplicate sock_hold() calls in tcp_internal_pacing() when an RTO
> happens. As a result, slab memory grows rapidly and repeatedly
> triggers OOM until the system crashes.
>
> Besides BBR mode, any mode that uses the pacing function could
> trigger the problem described above.
>
> Reproduce procedure:
> 0) cat /proc/slabinfo | grep TCP
> 1) switch net.ipv4.tcp_congestion_control to bbr
> 2) use the wrk tool (or something similar) to send packets
> 3) use tc to increase the delay and loss to simulate the RTO case
> 4) cat /proc/slabinfo | grep TCP
> 5) kill the wrk command and observe the number of objects and slabs in
> TCP
> 6) finally, notice that the number does not decrease
>
> v2: extend the timer to cover all the related potential risks
> (suggested by Eric Dumazet and Neal Cardwell)
>
> Signed-off-by: Jason Xing 
> Signed-off-by: liweishi 
> Signed-off-by: Shujin Li 

That is not how things work really.

I will submit this properly so that stable teams do not have to guess
how to backport this to various kernels.

Changelog is misleading, this has nothing to do with BBR, we need to be precise.

Thank you.
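
Steps 0, 4, and 5 of the quoted reproduce procedure compare the TCP lines of
/proc/slabinfo before and after the load. A minimal sketch of that comparison
follows, assuming the usual slabinfo layout in which each data line starts
with the cache name, then the active and total object counts. The helper
names tcp_slab_counts() and read_tcp_slab_counts() are hypothetical, and
reading /proc/slabinfo normally requires root.

```python
def tcp_slab_counts(slabinfo_text):
    """Map each slab cache whose name contains "TCP" to (active_objs, num_objs).

    Mirrors `cat /proc/slabinfo | grep TCP`, but skips the two header lines.
    """
    counts = {}
    for line in slabinfo_text.splitlines():
        # Header lines start with "slabinfo" (version) or "#" (column names).
        if line.startswith(("#", "slabinfo")):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        name = fields[0]
        if "TCP" in name:
            counts[name] = (int(fields[1]), int(fields[2]))
    return counts


def read_tcp_slab_counts(path="/proc/slabinfo"):
    """Read the live file; the path is the standard procfs location."""
    with open(path) as f:
        return tcp_slab_counts(f.read())
```

Calling read_tcp_slab_counts() before the test and again some time after wrk
has been killed gives two dictionaries to compare; per the report above, the
counts would be expected to fall back on an unaffected kernel and to stay
high on an affected one.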