Re: [PATCH v2 4.19] tcp: fix TCP socks unreleased in BBR mode
On 8/11/20 3:37 AM, Jason Xing wrote:
> Hi everyone,
>
> Could anyone take a look at this issue? I believe it is of high-importance.
> Though Eric gave the proper patch a few months ago, the stable branch
> still hasn't applied or merged this fix. It seems this patch was
> forgotten :(

Sure, I'll take care of this shortly.

Thanks.

>
> Thanks,
> Jason
>
> On Thu, Jun 4, 2020 at 9:47 PM Jason Xing wrote:
>>
>> On Thu, Jun 4, 2020 at 9:10 PM Eric Dumazet wrote:
>>>
>>> On Thu, Jun 4, 2020 at 2:01 AM wrote:
>>>>
>>>> From: Jason Xing
>>>>
>>>> When using BBR mode, too many tcp socks cannot be released because of
>>>> duplicate use of the sock_hold() in the manner of tcp_internal_pacing()
>>>> when RTO happens. Therefore, this situation maddly increases the slab
>>>> memory and then constantly triggers the OOM until crash.
>>>>
>>>> Besides, in addition to BBR mode, if some mode applies pacing function,
>>>> it could trigger what we've discussed above,
>>>>
>>>> Reproduce procedure:
>>>> 0) cat /proc/slabinfo | grep TCP
>>>> 1) switch net.ipv4.tcp_congestion_control to bbr
>>>> 2) using wrk tool something like that to send packages
>>>> 3) using tc to increase the delay and loss to simulate the RTO case.
>>>> 4) cat /proc/slabinfo | grep TCP
>>>> 5) kill the wrk command and observe the number of objects and slabs in
>>>> TCP.
>>>> 6) at last, you could notice that the number would not decrease.
>>>>
>>>> v2: extend the timer which could cover all those related potential risks
>>>> (suggested by Eric Dumazet and Neal Cardwell)
>>>>
>>>> Signed-off-by: Jason Xing
>>>> Signed-off-by: liweishi
>>>> Signed-off-by: Shujin Li
>>>
>>> That is not how things work really.
>>>
>>> I will submit this properly so that stable teams do not have to guess
>>> how to backport this to various kernels.
>>>
>>> Changelog is misleading, this has nothing to do with BBR, we need to be
>>> precise.
>>>
>>
>> Thanks for your help. I can finally apply this patch into my kernel.
>>
>> Looking forward to your patchset :)
>>
>> Jason
>>
>>> Thank you.
Re: [PATCH v2 4.19] tcp: fix TCP socks unreleased in BBR mode
Hi everyone,

Could anyone take a look at this issue? I believe it is of high-importance.
Though Eric gave the proper patch a few months ago, the stable branch
still hasn't applied or merged this fix. It seems this patch was
forgotten :(

Thanks,
Jason

On Thu, Jun 4, 2020 at 9:47 PM Jason Xing wrote:
>
> On Thu, Jun 4, 2020 at 9:10 PM Eric Dumazet wrote:
> >
> > On Thu, Jun 4, 2020 at 2:01 AM wrote:
> > >
> > > From: Jason Xing
> > >
> > > When using BBR mode, too many tcp socks cannot be released because of
> > > duplicate use of the sock_hold() in the manner of tcp_internal_pacing()
> > > when RTO happens. Therefore, this situation maddly increases the slab
> > > memory and then constantly triggers the OOM until crash.
> > >
> > > Besides, in addition to BBR mode, if some mode applies pacing function,
> > > it could trigger what we've discussed above,
> > >
> > > Reproduce procedure:
> > > 0) cat /proc/slabinfo | grep TCP
> > > 1) switch net.ipv4.tcp_congestion_control to bbr
> > > 2) using wrk tool something like that to send packages
> > > 3) using tc to increase the delay and loss to simulate the RTO case.
> > > 4) cat /proc/slabinfo | grep TCP
> > > 5) kill the wrk command and observe the number of objects and slabs in
> > > TCP.
> > > 6) at last, you could notice that the number would not decrease.
> > >
> > > v2: extend the timer which could cover all those related potential risks
> > > (suggested by Eric Dumazet and Neal Cardwell)
> > >
> > > Signed-off-by: Jason Xing
> > > Signed-off-by: liweishi
> > > Signed-off-by: Shujin Li
> >
> > That is not how things work really.
> >
> > I will submit this properly so that stable teams do not have to guess
> > how to backport this to various kernels.
> >
> > Changelog is misleading, this has nothing to do with BBR, we need to be
> > precise.
> >
>
> Thanks for your help. I can finally apply this patch into my kernel.
>
> Looking forward to your patchset :)
>
> Jason
>
> > Thank you.
Re: [PATCH v2 4.19] tcp: fix TCP socks unreleased in BBR mode
On Thu, Jun 4, 2020 at 9:10 PM Eric Dumazet wrote:
>
> On Thu, Jun 4, 2020 at 2:01 AM wrote:
> >
> > From: Jason Xing
> >
> > When using BBR mode, too many tcp socks cannot be released because of
> > duplicate use of the sock_hold() in the manner of tcp_internal_pacing()
> > when RTO happens. Therefore, this situation maddly increases the slab
> > memory and then constantly triggers the OOM until crash.
> >
> > Besides, in addition to BBR mode, if some mode applies pacing function,
> > it could trigger what we've discussed above,
> >
> > Reproduce procedure:
> > 0) cat /proc/slabinfo | grep TCP
> > 1) switch net.ipv4.tcp_congestion_control to bbr
> > 2) using wrk tool something like that to send packages
> > 3) using tc to increase the delay and loss to simulate the RTO case.
> > 4) cat /proc/slabinfo | grep TCP
> > 5) kill the wrk command and observe the number of objects and slabs in
> > TCP.
> > 6) at last, you could notice that the number would not decrease.
> >
> > v2: extend the timer which could cover all those related potential risks
> > (suggested by Eric Dumazet and Neal Cardwell)
> >
> > Signed-off-by: Jason Xing
> > Signed-off-by: liweishi
> > Signed-off-by: Shujin Li
>
> That is not how things work really.
>
> I will submit this properly so that stable teams do not have to guess
> how to backport this to various kernels.
>
> Changelog is misleading, this has nothing to do with BBR, we need to be
> precise.
>

Thanks for your help. I can finally apply this patch into my kernel.

Looking forward to your patchset :)

Jason

> Thank you.
Re: [PATCH v2 4.19] tcp: fix TCP socks unreleased in BBR mode
On Thu, Jun 4, 2020 at 2:01 AM wrote:
>
> From: Jason Xing
>
> When using BBR mode, too many tcp socks cannot be released because of
> duplicate use of the sock_hold() in the manner of tcp_internal_pacing()
> when RTO happens. Therefore, this situation maddly increases the slab
> memory and then constantly triggers the OOM until crash.
>
> Besides, in addition to BBR mode, if some mode applies pacing function,
> it could trigger what we've discussed above,
>
> Reproduce procedure:
> 0) cat /proc/slabinfo | grep TCP
> 1) switch net.ipv4.tcp_congestion_control to bbr
> 2) using wrk tool something like that to send packages
> 3) using tc to increase the delay and loss to simulate the RTO case.
> 4) cat /proc/slabinfo | grep TCP
> 5) kill the wrk command and observe the number of objects and slabs in
> TCP.
> 6) at last, you could notice that the number would not decrease.
>
> v2: extend the timer which could cover all those related potential risks
> (suggested by Eric Dumazet and Neal Cardwell)
>
> Signed-off-by: Jason Xing
> Signed-off-by: liweishi
> Signed-off-by: Shujin Li

That is not how things work really.

I will submit this properly so that stable teams do not have to guess
how to backport this to various kernels.

Changelog is misleading, this has nothing to do with BBR, we need to be
precise.

Thank you.
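
For anyone repeating steps 0), 4) and 5) of the reproduce procedure quoted in this thread, the before/after comparison of `cat /proc/slabinfo | grep TCP` could be scripted roughly as below. This is only a sketch and not part of the patch: the function names (`tcp_slab_counts`, `growth`, `snapshot`) are hypothetical, and it assumes the usual slabinfo 2.x layout in which each data line starts with the cache name followed by `<active_objs>` and `<num_objs>`.

```python
def tcp_slab_counts(slabinfo_text):
    """Return {cache_name: (active_objs, num_objs)} for slab caches whose
    name contains "TCP", mirroring `grep TCP` on /proc/slabinfo.

    Assumes the slabinfo 2.x layout: name, active_objs, num_objs, ...
    """
    counts = {}
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Skip blank/short lines, the version banner and the "# name ..." header.
        if len(fields) < 3 or fields[0] == "#" or fields[0] == "slabinfo":
            continue
        name = fields[0]
        if "TCP" not in name:
            continue
        # Ignore lines whose counters are not plain integers.
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        counts[name] = (int(fields[1]), int(fields[2]))
    return counts


def growth(before, after):
    """Per-cache change in active objects between two snapshots.

    A cache missing from `before` is treated as having had 0 active objects.
    """
    return {
        name: active - before.get(name, (0, 0))[0]
        for name, (active, _total) in after.items()
    }


def snapshot(path="/proc/slabinfo"):
    """Read and parse the live slabinfo file (usually requires root)."""
    with open(path) as f:
        return tcp_slab_counts(f.read())
```

Taking one `snapshot()` before switching the congestion control and starting the load, and another some time after the load generator has been killed, then passing both to `growth()`, gives the per-cache difference in active objects. A value that stays large and never drops back would match the symptom described in step 6).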