Re: Soft lockup in inet_put_port on 4.6
> On Dec 19, 2016, at 11:52 PM, Eric Dumazet wrote:
>
> On Tue, 2016-12-20 at 03:40 +0000, Josef Bacik wrote:
>>> On Dec 19, 2016, at 9:42 PM, Eric Dumazet wrote:
>>>
>>>> On Mon, 2016-12-19 at 18:07 -0800, Tom Herbert wrote:
>>>>
>>>> When sockets created SO_REUSEPORT move to TW state they are placed
>>>> back on the tb->owners. fastreuse port is no longer set so we have
>>>> to walk a potentially long list of sockets in tb->owners to open a new
>>>> listener socket. I imagine this happens when we try to open a new
>>>> listener with SO_REUSEPORT after the system has been running a while
>>>> and so we hit the long tb->owners list.
>>>
>>> Hmm... __inet_twsk_hashdance() does not change tb->fastreuse
>>>
>>> So where is tb->fastreuse cleared?
>>>
>>> If all your sockets have SO_REUSEPORT set, this should not happen.
>>>
>>
>> The app starts out with no SO_REUSEPORT, and then we restart it with
>> that option enabled.
>
> But... why would the application do this dance?
>
> I now better understand why we never had these issues...
>

It doesn't do it as part of its normal operation.  The old version didn't
use SO_REUSEPORT and then somebody added support for it, restarted the
service with the new option enabled and boom.  They immediately stopped
doing anything and gave it to me to figure out.

>
>> What I suspect is we have all the twsks from the original service,
>> and the fastreuse stuff is cleared.  My naive patch resets it once we
>> add a reuseport sk to the tb and that makes the problem go away.  I'm
>> reworking all of this logic and adding some extra info to the tb to
>> make the reset actually safe.  I'll send those patches out tomorrow.
>> Thanks,
>
> Okay, we will review them ;)
>
> Note that Willy Tarreau wants some mechanism to be able to freeze a
> listener, to allow haproxy to be replaced without closing any sessions.
>

I assume that's what these guys would want as well.  They have some weird
handoff thing they do when the app starts but I'm not entirely convinced
it does what they think it does.  Thanks,

Josef
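For readers unfamiliar with the scenario being described: the restarted
service generation simply turns on SO_REUSEPORT before bind(), along the
lines of the minimal userspace sketch below (function name, port handling
and backlog are illustrative, not taken from the service in question).

#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int open_reuseport_listener(unsigned short port)
{
	struct sockaddr_in addr;
	int one = 1;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	/* Must be set before bind(); this is the option the restarted
	 * service enabled while the old generation's sockets were still
	 * sitting in TIME_WAIT on the same port. */
	setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 128) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}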
Re: Soft lockup in inet_put_port on 4.6
On Tue, 2016-12-20 at 03:40 +0000, Josef Bacik wrote:
> > On Dec 19, 2016, at 9:42 PM, Eric Dumazet wrote:
> >
> >> On Mon, 2016-12-19 at 18:07 -0800, Tom Herbert wrote:
> >>
> >> When sockets created SO_REUSEPORT move to TW state they are placed
> >> back on the tb->owners. fastreuse port is no longer set so we have
> >> to walk a potentially long list of sockets in tb->owners to open a new
> >> listener socket. I imagine this happens when we try to open a new
> >> listener with SO_REUSEPORT after the system has been running a while
> >> and so we hit the long tb->owners list.
> >
> > Hmm... __inet_twsk_hashdance() does not change tb->fastreuse
> >
> > So where is tb->fastreuse cleared?
> >
> > If all your sockets have SO_REUSEPORT set, this should not happen.
> >
>
> The app starts out with no SO_REUSEPORT, and then we restart it with
> that option enabled.

But... why would the application do this dance?

I now better understand why we never had these issues...

> What I suspect is we have all the twsks from the original service,
> and the fastreuse stuff is cleared.  My naive patch resets it once we
> add a reuseport sk to the tb and that makes the problem go away.  I'm
> reworking all of this logic and adding some extra info to the tb to
> make the reset actually safe.  I'll send those patches out tomorrow.
> Thanks,

Okay, we will review them ;)

Note that Willy Tarreau wants some mechanism to be able to freeze a
listener, to allow haproxy to be replaced without closing any sessions.
Re: Soft lockup in inet_put_port on 4.6
> On Dec 19, 2016, at 9:42 PM, Eric Dumazet wrote:
>
>> On Mon, 2016-12-19 at 18:07 -0800, Tom Herbert wrote:
>>
>> When sockets created SO_REUSEPORT move to TW state they are placed
>> back on the tb->owners. fastreuse port is no longer set so we have to
>> walk a potentially long list of sockets in tb->owners to open a new
>> listener socket. I imagine this happens when we try to open a new
>> listener with SO_REUSEPORT after the system has been running a while
>> and so we hit the long tb->owners list.
>
> Hmm... __inet_twsk_hashdance() does not change tb->fastreuse
>
> So where is tb->fastreuse cleared?
>
> If all your sockets have SO_REUSEPORT set, this should not happen.
>

The app starts out with no SO_REUSEPORT, and then we restart it with that
option enabled.  What I suspect is we have all the twsks from the original
service, and the fastreuse stuff is cleared.  My naive patch resets it
once we add a reuseport sk to the tb and that makes the problem go away.
I'm reworking all of this logic and adding some extra info to the tb to
make the reset actually safe.  I'll send those patches out tomorrow.
Thanks,

Josef
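The naive patch itself is not quoted in this message.  Purely as an
illustration of the idea being described (re-arming the fast path once a
SO_REUSEPORT socket has passed the bind conflict check), the change might
look roughly like the following addition near the success: label of
inet_csk_get_port(); this is a hypothetical sketch, not the actual patch,
and it is exactly the reset whose safety the extra per-bucket state is
meant to guarantee.

	/* Hypothetical sketch: the socket being bound uses SO_REUSEPORT
	 * and has just passed ->bind_conflict(), so re-arm the
	 * fastreuseport hint that was cleared by the old, non-reuseport
	 * generation of the service.  Whether this is always safe (e.g.
	 * with unrelated timewait owners still on tb->owners) is the open
	 * question in this thread.
	 */
	if (sk->sk_reuseport && !rcu_access_pointer(sk->sk_reuseport_cb)) {
		tb->fastreuseport = 1;
		tb->fastuid = uid;
	}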
Re: Soft lockup in inet_put_port on 4.6
On Mon, 2016-12-19 at 18:07 -0800, Tom Herbert wrote:

> When sockets created SO_REUSEPORT move to TW state they are placed
> back on the tb->owners. fastreuse port is no longer set so we have
> to walk a potentially long list of sockets in tb->owners to open a new
> listener socket. I imagine this happens when we try to open a new
> listener with SO_REUSEPORT after the system has been running a while
> and so we hit the long tb->owners list.

Hmm... __inet_twsk_hashdance() does not change tb->fastreuse

So where is tb->fastreuse cleared?

If all your sockets have SO_REUSEPORT set, this should not happen.
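For what it's worth, the clearing happens at bind time rather than in the
timewait code: when a socket without SO_REUSE/SO_REUSEPORT binds into an
already-populated bucket, inet_csk_get_port() drops the hints.
Paraphrased from the 4.6 sources (not a verbatim quote):

	if (hlist_empty(&tb->owners)) {
		tb->fastreuse = (sk->sk_reuse && sk->sk_state != TCP_LISTEN);
		if (sk->sk_reuseport) {
			tb->fastreuseport = 1;
			tb->fastuid = uid;
		} else {
			tb->fastreuseport = 0;
		}
	} else {
		/* A non-reuse(port) owner poisons the fast path for every
		 * later bind to this port. */
		if (tb->fastreuse &&
		    (!sk->sk_reuse || sk->sk_state == TCP_LISTEN))
			tb->fastreuse = 0;
		if (tb->fastreuseport &&
		    (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid)))
			tb->fastreuseport = 0;
	}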
Re: Soft lockup in inet_put_port on 4.6
On Mon, Dec 19, 2016 at 5:56 PM, David Miller wrote:
> From: Josef Bacik
> Date: Sat, 17 Dec 2016 13:26:00 +0000
>
>> So take my current duct tape fix and augment it with more
>> information in the bind bucket?  I'm not sure how to make this work
>> without at least having a list of the bound addrs as well to make
>> sure we are really ok.  I suppose we could save the fastreuseport
>> address that last succeeded to make it work properly, but I'd have
>> to make it protocol agnostic and then have a callback to have the
>> protocol make sure we don't have to do the bind_conflict run.  Is
>> that what you were thinking of?  Thanks,
>
> So there isn't a deadlock or lockup here, something is just running
> really slow, right?

Correct.

> And that "something" is a scan of the sockets on a tb list, and
> there's lots of timewait sockets hung off of that tb.

Yes.

> As far as I can tell, this scan is happening in
> inet_csk_bind_conflict().

Yes.

> Furthermore, reuseport is somehow required to make this problem
> happen.  How exactly?

When sockets created SO_REUSEPORT move to TW state they are placed back
on the tb->owners.  fastreuse port is no longer set so we have to walk a
potentially long list of sockets in tb->owners to open a new listener
socket.  I imagine this happens when we try to open a new listener with
SO_REUSEPORT after the system has been running a while and so we hit the
long tb->owners list.

Tom
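For context on why the timewait sockets are on tb->owners at all: when a
connection enters TIME_WAIT, the timewait sock inherits the bind bucket
and is added to the owners list, and it stays there until the timewait
timer finally destroys it.  Roughly, from the 4.6 __inet_twsk_hashdance()
(paraphrased, not a verbatim quote):

	/* Step 1: Put TW into bind hash. Original socket stays there too. */
	bhead = &hashinfo->bhash[inet_bhashfn(twsk_net(tw), inet->inet_num,
					      hashinfo->bhash_size)];
	spin_lock(&bhead->lock);
	tw->tw_tb = icsk->icsk_bind_hash;
	WARN_ON(!icsk->icsk_bind_hash);
	inet_twsk_add_bind_node(tw, &tw->tw_tb->owners);
	spin_unlock(&bhead->lock);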
Re: Soft lockup in inet_put_port on 4.6
From: Josef Bacik
Date: Sat, 17 Dec 2016 13:26:00 +0000

> So take my current duct tape fix and augment it with more
> information in the bind bucket?  I'm not sure how to make this work
> without at least having a list of the bound addrs as well to make
> sure we are really ok.  I suppose we could save the fastreuseport
> address that last succeeded to make it work properly, but I'd have
> to make it protocol agnostic and then have a callback to have the
> protocol make sure we don't have to do the bind_conflict run.  Is
> that what you were thinking of?  Thanks,

So there isn't a deadlock or lockup here, something is just running
really slow, right?

And that "something" is a scan of the sockets on a tb list, and there's
lots of timewait sockets hung off of that tb.

As far as I can tell, this scan is happening in inet_csk_bind_conflict().

Furthermore, reuseport is somehow required to make this problem happen.
How exactly?
Re: Soft lockup in inet_put_port on 4.6
> On Dec 17, 2016, at 6:09 AM, Hannes Frederic Sowa >wrote: > >> On 16.12.2016 23:50, Josef Bacik wrote: >>> On Fri, Dec 16, 2016 at 5:18 PM, Tom Herbert wrote: On Fri, Dec 16, 2016 at 2:08 PM, Josef Bacik wrote: > On Fri, Dec 16, 2016 at 10:21 AM, Josef Bacik wrote: > >> On Fri, Dec 16, 2016 at 9:54 AM, Josef Bacik wrote: >> >> On Thu, Dec 15, 2016 at 7:07 PM, Hannes Frederic Sowa >> wrote: >>> >>> Hi Josef, >>> On 15.12.2016 19:53, Josef Bacik wrote: On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert wrote: > > On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek > > wrote: >> >> On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert >> >> wrote: >>> >>> I think there may be some suspicious code in >>> inet_csk_get_port. At >>> tb_found there is: >>> >>> if (((tb->fastreuse > 0 && reuse) || >>>(tb->fastreuseport > 0 && >>> >>> !rcu_access_pointer(sk->sk_reuseport_cb) && >>> sk->sk_reuseport && uid_eq(tb->fastuid, >>> uid))) && >>> smallest_size == -1) >>> goto success; >>> if >>> (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, >>> tb, true)) { >>> if ((reuse || >>>(tb->fastreuseport > 0 && >>> sk->sk_reuseport && >>> >>> !rcu_access_pointer(sk->sk_reuseport_cb) && >>> uid_eq(tb->fastuid, uid))) && >>> smallest_size != -1 && >>> --attempts >= >>> 0) { >>> spin_unlock_bh(>lock); >>> goto again; >>> } >>> goto fail_unlock; >>> } >>> >>> AFAICT there is redundancy in these two conditionals. The >>> same >>> clause >>> is being checked in both: (tb->fastreuseport > 0 && >>> !rcu_access_pointer(sk->sk_reuseport_cb) && >>> sk->sk_reuseport && >>> uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this >>> is true >>> the >>> first conditional should be hit, goto done, and the second >>> will >>> never >>> evaluate that part to true-- unless the sk is changed (do >>> we need >>> READ_ONCE for sk->sk_reuseport_cb?). >> >> That's an interesting point... It looks like this function also >> changed in 4.6 from using a single local_bh_disable() at the >> beginning >> with several spin_lock(>lock) to exclusively >> spin_lock_bh(>lock) at each locking point. Perhaps >> the full >> bh >> disable variant was preventing the timers in your stack >> trace from >> running interleaved with this function before? > > > Could be, although dropping the lock shouldn't be able to > affect the > search state. TBH, I'm a little lost in reading function, the > SO_REUSEPORT handling is pretty complicated. For instance, > rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in > that > function and also in every call to inet_csk_bind_conflict. I > wonder > if > we can simply this under the assumption that SO_REUSEPORT is only > allowed if the port number (snum) is explicitly specified. Ok first I have data for you Hannes, here's the time distributions before during and after the lockup (with all the debugging in place the box eventually recovers). I've attached it as a text file since it is long. >>> >>> >>> Thanks a lot! >>> Second is I was thinking about why we would spend so much time doing the ->owners list, and obviously it's because of the massive amount of timewait sockets on the owners list. I wrote the following dumb patch and tested it and the problem has disappeared completely. Now I don't know if this is right at all, but I thought it was weird we weren't copying the soreuseport option from the original socket onto the twsk. Is there are reason
Re: Soft lockup in inet_put_port on 4.6
On 16.12.2016 23:50, Josef Bacik wrote: > On Fri, Dec 16, 2016 at 5:18 PM, Tom Herbertwrote: >> On Fri, Dec 16, 2016 at 2:08 PM, Josef Bacik wrote: >>> On Fri, Dec 16, 2016 at 10:21 AM, Josef Bacik wrote: On Fri, Dec 16, 2016 at 9:54 AM, Josef Bacik wrote: > > On Thu, Dec 15, 2016 at 7:07 PM, Hannes Frederic Sowa > wrote: >> >> Hi Josef, >> >> On 15.12.2016 19:53, Josef Bacik wrote: >>> >>> On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert >>> wrote: On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek wrote: > >On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert > > wrote: >> >>I think there may be some suspicious code in >> inet_csk_get_port. At >>tb_found there is: >> >>if (((tb->fastreuse > 0 && reuse) || >> (tb->fastreuseport > 0 && >> >> !rcu_access_pointer(sk->sk_reuseport_cb) && >> sk->sk_reuseport && uid_eq(tb->fastuid, >> uid))) && >>smallest_size == -1) >>goto success; >>if >> (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, >> tb, true)) { >>if ((reuse || >> (tb->fastreuseport > 0 && >> sk->sk_reuseport && >> >> !rcu_access_pointer(sk->sk_reuseport_cb) && >> uid_eq(tb->fastuid, uid))) && >>smallest_size != -1 && >> --attempts >= >> 0) { >>spin_unlock_bh(>lock); >>goto again; >>} >>goto fail_unlock; >>} >> >>AFAICT there is redundancy in these two conditionals. The >> same >> clause >>is being checked in both: (tb->fastreuseport > 0 && >>!rcu_access_pointer(sk->sk_reuseport_cb) && >> sk->sk_reuseport && >>uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this >> is true >> the >>first conditional should be hit, goto done, and the second >> will >> never >>evaluate that part to true-- unless the sk is changed (do >> we need >>READ_ONCE for sk->sk_reuseport_cb?). > >That's an interesting point... It looks like this function also >changed in 4.6 from using a single local_bh_disable() at the > beginning >with several spin_lock(>lock) to exclusively >spin_lock_bh(>lock) at each locking point. Perhaps > the full > bh >disable variant was preventing the timers in your stack > trace from >running interleaved with this function before? Could be, although dropping the lock shouldn't be able to affect the search state. TBH, I'm a little lost in reading function, the SO_REUSEPORT handling is pretty complicated. For instance, rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that function and also in every call to inet_csk_bind_conflict. I wonder if we can simply this under the assumption that SO_REUSEPORT is only allowed if the port number (snum) is explicitly specified. >>> >>> >>> Ok first I have data for you Hannes, here's the time distributions >>> before during and after the lockup (with all the debugging in >>> place >>> the >>> box eventually recovers). I've attached it as a text file >>> since it is >>> long. >> >> >> Thanks a lot! >> >>> Second is I was thinking about why we would spend so much time >>> doing >>> the >>> ->owners list, and obviously it's because of the massive amount of >>> timewait sockets on the owners list. I wrote the following >>> dumb patch >>> and tested it and the problem has disappeared completely. Now >>> I don't >>> know if this is right at all, but I thought it was weird we >>> weren't >>> copying the soreuseport option from the original socket onto >>> the twsk. >>> Is there are reason we aren't doing this currently? Does this >>> help >>> explain what is happening? Thanks, >> >> >> The patch is
Re: Soft lockup in inet_put_port on 4.6
On Fri, Dec 16, 2016 at 5:18 PM, Tom Herbertwrote: On Fri, Dec 16, 2016 at 2:08 PM, Josef Bacik wrote: On Fri, Dec 16, 2016 at 10:21 AM, Josef Bacik wrote: On Fri, Dec 16, 2016 at 9:54 AM, Josef Bacik wrote: On Thu, Dec 15, 2016 at 7:07 PM, Hannes Frederic Sowa wrote: Hi Josef, On 15.12.2016 19:53, Josef Bacik wrote: On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert wrote: On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek wrote: On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert wrote: I think there may be some suspicious code in inet_csk_get_port. At tb_found there is: if (((tb->fastreuse > 0 && reuse) || (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && smallest_size == -1) goto success; if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, true)) { if ((reuse || (tb->fastreuseport > 0 && sk->sk_reuseport && !rcu_access_pointer(sk->sk_reuseport_cb) && uid_eq(tb->fastuid, uid))) && smallest_size != -1 && --attempts >= 0) { spin_unlock_bh(>lock); goto again; } goto fail_unlock; } AFAICT there is redundancy in these two conditionals. The same clause is being checked in both: (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the first conditional should be hit, goto done, and the second will never evaluate that part to true-- unless the sk is changed (do we need READ_ONCE for sk->sk_reuseport_cb?). That's an interesting point... It looks like this function also changed in 4.6 from using a single local_bh_disable() at the beginning with several spin_lock(>lock) to exclusively spin_lock_bh(>lock) at each locking point. Perhaps the full bh disable variant was preventing the timers in your stack trace from running interleaved with this function before? Could be, although dropping the lock shouldn't be able to affect the search state. TBH, I'm a little lost in reading function, the SO_REUSEPORT handling is pretty complicated. For instance, rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that function and also in every call to inet_csk_bind_conflict. I wonder if we can simply this under the assumption that SO_REUSEPORT is only allowed if the port number (snum) is explicitly specified. Ok first I have data for you Hannes, here's the time distributions before during and after the lockup (with all the debugging in place the box eventually recovers). I've attached it as a text file since it is long. Thanks a lot! Second is I was thinking about why we would spend so much time doing the ->owners list, and obviously it's because of the massive amount of timewait sockets on the owners list. I wrote the following dumb patch and tested it and the problem has disappeared completely. Now I don't know if this is right at all, but I thought it was weird we weren't copying the soreuseport option from the original socket onto the twsk. Is there are reason we aren't doing this currently? Does this help explain what is happening? Thanks, The patch is interesting and a good clue, but I am immediately a bit concerned that we don't copy/tag the socket with the uid also to keep the security properties for SO_REUSEPORT. I have to think a bit more about this. We have seen hangs during connect. I am afraid this patch wouldn't help there while also guaranteeing uniqueness. Yeah so I looked at the code some more and actually my patch is really bad. 
If sk2->sk_reuseport is set we'll look at sk2->sk_reuseport_cb, which is
outside of the timewait sock, so that's definitely bad.  But we should at
least be setting it to 0 so that we don't do this normally.
Unfortunately simply setting it to 0 doesn't fix the problem.  So for
some reason having ->sk_reuseport set to 1 on a timewait socket makes
this problem non-existent, which is strange.  So back to the drawing
board I guess.  I wonder if doing what Craig suggested and batching the
timewait timer expirations so it hurts less would accomplish the same
results.  Thanks,

Wait no, I lied, we access sk->sk_reuseport_cb, not sk2's.  This is the code
Re: Soft lockup in inet_put_port on 4.6
On Fri, Dec 16, 2016 at 2:08 PM, Josef Bacikwrote: > On Fri, Dec 16, 2016 at 10:21 AM, Josef Bacik wrote: >> >> On Fri, Dec 16, 2016 at 9:54 AM, Josef Bacik wrote: >>> >>> On Thu, Dec 15, 2016 at 7:07 PM, Hannes Frederic Sowa >>> wrote: Hi Josef, On 15.12.2016 19:53, Josef Bacik wrote: > > On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert > wrote: >> >> On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek >> wrote: >>> >>> On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert >>> wrote: I think there may be some suspicious code in inet_csk_get_port. At tb_found there is: if (((tb->fastreuse > 0 && reuse) || (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && smallest_size == -1) goto success; if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, true)) { if ((reuse || (tb->fastreuseport > 0 && sk->sk_reuseport && !rcu_access_pointer(sk->sk_reuseport_cb) && uid_eq(tb->fastuid, uid))) && smallest_size != -1 && --attempts >= 0) { spin_unlock_bh(>lock); goto again; } goto fail_unlock; } AFAICT there is redundancy in these two conditionals. The same clause is being checked in both: (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the first conditional should be hit, goto done, and the second will never evaluate that part to true-- unless the sk is changed (do we need READ_ONCE for sk->sk_reuseport_cb?). >>> >>> That's an interesting point... It looks like this function also >>> changed in 4.6 from using a single local_bh_disable() at the >>> beginning >>> with several spin_lock(>lock) to exclusively >>> spin_lock_bh(>lock) at each locking point. Perhaps the full >>> bh >>> disable variant was preventing the timers in your stack trace from >>> running interleaved with this function before? >> >> >> Could be, although dropping the lock shouldn't be able to affect the >> search state. TBH, I'm a little lost in reading function, the >> SO_REUSEPORT handling is pretty complicated. For instance, >> rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in >> that >> function and also in every call to inet_csk_bind_conflict. I wonder >> if >> we can simply this under the assumption that SO_REUSEPORT is only >> allowed if the port number (snum) is explicitly specified. > > > Ok first I have data for you Hannes, here's the time distributions > before during and after the lockup (with all the debugging in place > the > box eventually recovers). I've attached it as a text file since it is > long. Thanks a lot! > Second is I was thinking about why we would spend so much time doing > the > ->owners list, and obviously it's because of the massive amount of > timewait sockets on the owners list. I wrote the following dumb patch > and tested it and the problem has disappeared completely. Now I don't > know if this is right at all, but I thought it was weird we weren't > copying the soreuseport option from the original socket onto the twsk. > Is there are reason we aren't doing this currently? Does this help > explain what is happening? Thanks, The patch is interesting and a good clue, but I am immediately a bit concerned that we don't copy/tag the socket with the uid also to keep the security properties for SO_REUSEPORT. I have to think a bit more about this. We have seen hangs during connect. I am afraid this patch wouldn't help there while also guaranteeing uniqueness. 
>>> >>> >>> >>> Yeah so I looked at the code some more and actually my patch is really >>> bad. If sk2->sk_reuseport is set we'll look at sk2->sk_reuseport_cb, which >>> is outside of the timewait sock, so that's definitely bad. >>> >>> But we should at least be
Re: Soft lockup in inet_put_port on 4.6
On Fri, Dec 16, 2016 at 10:21 AM, Josef Bacikwrote: On Fri, Dec 16, 2016 at 9:54 AM, Josef Bacik wrote: On Thu, Dec 15, 2016 at 7:07 PM, Hannes Frederic Sowa wrote: Hi Josef, On 15.12.2016 19:53, Josef Bacik wrote: On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert wrote: On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek wrote: On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert wrote: I think there may be some suspicious code in inet_csk_get_port. At tb_found there is: if (((tb->fastreuse > 0 && reuse) || (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && smallest_size == -1) goto success; if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, true)) { if ((reuse || (tb->fastreuseport > 0 && sk->sk_reuseport && !rcu_access_pointer(sk->sk_reuseport_cb) && uid_eq(tb->fastuid, uid))) && smallest_size != -1 && --attempts >= 0) { spin_unlock_bh(>lock); goto again; } goto fail_unlock; } AFAICT there is redundancy in these two conditionals. The same clause is being checked in both: (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the first conditional should be hit, goto done, and the second will never evaluate that part to true-- unless the sk is changed (do we need READ_ONCE for sk->sk_reuseport_cb?). That's an interesting point... It looks like this function also changed in 4.6 from using a single local_bh_disable() at the beginning with several spin_lock(>lock) to exclusively spin_lock_bh(>lock) at each locking point. Perhaps the full bh disable variant was preventing the timers in your stack trace from running interleaved with this function before? Could be, although dropping the lock shouldn't be able to affect the search state. TBH, I'm a little lost in reading function, the SO_REUSEPORT handling is pretty complicated. For instance, rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that function and also in every call to inet_csk_bind_conflict. I wonder if we can simply this under the assumption that SO_REUSEPORT is only allowed if the port number (snum) is explicitly specified. Ok first I have data for you Hannes, here's the time distributions before during and after the lockup (with all the debugging in place the box eventually recovers). I've attached it as a text file since it is long. Thanks a lot! Second is I was thinking about why we would spend so much time doing the ->owners list, and obviously it's because of the massive amount of timewait sockets on the owners list. I wrote the following dumb patch and tested it and the problem has disappeared completely. Now I don't know if this is right at all, but I thought it was weird we weren't copying the soreuseport option from the original socket onto the twsk. Is there are reason we aren't doing this currently? Does this help explain what is happening? Thanks, The patch is interesting and a good clue, but I am immediately a bit concerned that we don't copy/tag the socket with the uid also to keep the security properties for SO_REUSEPORT. I have to think a bit more about this. We have seen hangs during connect. I am afraid this patch wouldn't help there while also guaranteeing uniqueness. Yeah so I looked at the code some more and actually my patch is really bad. If sk2->sk_reuseport is set we'll look at sk2->sk_reuseport_cb, which is outside of the timewait sock, so that's definitely bad. But we should at least be setting it to 0 so that we don't do this normally. 
Unfortunately simply setting it to 0 doesn't fix the problem. So for some reason having ->sk_reuseport set to 1 on a timewait socket makes this problem non-existent, which is strange. So back to the drawing board I guess. I wonder if doing what craig suggested and batching the timewait timer expires so it hurts less would accomplish the same results. Thanks, Wait no I lied, we access the sk->sk_reuseport_cb, not sk2's. This is the code if ((!reuse || !sk2->sk_reuse || sk2->sk_state == TCP_LISTEN) && (!reuseport || !sk2->sk_reuseport || rcu_access_pointer(sk->sk_reuseport_cb) ||
Re: Soft lockup in inet_put_port on 4.6
On Fri, Dec 16, 2016 at 9:54 AM, Josef Bacikwrote: On Thu, Dec 15, 2016 at 7:07 PM, Hannes Frederic Sowa wrote: Hi Josef, On 15.12.2016 19:53, Josef Bacik wrote: On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert wrote: On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek wrote: On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert wrote: I think there may be some suspicious code in inet_csk_get_port. At tb_found there is: if (((tb->fastreuse > 0 && reuse) || (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && smallest_size == -1) goto success; if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, true)) { if ((reuse || (tb->fastreuseport > 0 && sk->sk_reuseport && !rcu_access_pointer(sk->sk_reuseport_cb) && uid_eq(tb->fastuid, uid))) && smallest_size != -1 && --attempts >= 0) { spin_unlock_bh(>lock); goto again; } goto fail_unlock; } AFAICT there is redundancy in these two conditionals. The same clause is being checked in both: (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the first conditional should be hit, goto done, and the second will never evaluate that part to true-- unless the sk is changed (do we need READ_ONCE for sk->sk_reuseport_cb?). That's an interesting point... It looks like this function also changed in 4.6 from using a single local_bh_disable() at the beginning with several spin_lock(>lock) to exclusively spin_lock_bh(>lock) at each locking point. Perhaps the full bh disable variant was preventing the timers in your stack trace from running interleaved with this function before? Could be, although dropping the lock shouldn't be able to affect the search state. TBH, I'm a little lost in reading function, the SO_REUSEPORT handling is pretty complicated. For instance, rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that function and also in every call to inet_csk_bind_conflict. I wonder if we can simply this under the assumption that SO_REUSEPORT is only allowed if the port number (snum) is explicitly specified. Ok first I have data for you Hannes, here's the time distributions before during and after the lockup (with all the debugging in place the box eventually recovers). I've attached it as a text file since it is long. Thanks a lot! Second is I was thinking about why we would spend so much time doing the ->owners list, and obviously it's because of the massive amount of timewait sockets on the owners list. I wrote the following dumb patch and tested it and the problem has disappeared completely. Now I don't know if this is right at all, but I thought it was weird we weren't copying the soreuseport option from the original socket onto the twsk. Is there are reason we aren't doing this currently? Does this help explain what is happening? Thanks, The patch is interesting and a good clue, but I am immediately a bit concerned that we don't copy/tag the socket with the uid also to keep the security properties for SO_REUSEPORT. I have to think a bit more about this. We have seen hangs during connect. I am afraid this patch wouldn't help there while also guaranteeing uniqueness. Yeah so I looked at the code some more and actually my patch is really bad. If sk2->sk_reuseport is set we'll look at sk2->sk_reuseport_cb, which is outside of the timewait sock, so that's definitely bad. But we should at least be setting it to 0 so that we don't do this normally. 
Unfortunately simply setting it to 0 doesn't fix the problem. So for some reason having ->sk_reuseport set to 1 on a timewait socket makes this problem non-existent, which is strange. So back to the drawing board I guess. I wonder if doing what craig suggested and batching the timewait timer expires so it hurts less would accomplish the same results. Thanks, Wait no I lied, we access the sk->sk_reuseport_cb, not sk2's. This is the code if ((!reuse || !sk2->sk_reuse || sk2->sk_state == TCP_LISTEN) && (!reuseport || !sk2->sk_reuseport || rcu_access_pointer(sk->sk_reuseport_cb) || (sk2->sk_state != TCP_TIME_WAIT &&
Re: Soft lockup in inet_put_port on 4.6
On Thu, Dec 15, 2016 at 7:07 PM, Hannes Frederic Sowawrote: Hi Josef, On 15.12.2016 19:53, Josef Bacik wrote: On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert wrote: On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek wrote: On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert wrote: I think there may be some suspicious code in inet_csk_get_port. At tb_found there is: if (((tb->fastreuse > 0 && reuse) || (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && smallest_size == -1) goto success; if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, true)) { if ((reuse || (tb->fastreuseport > 0 && sk->sk_reuseport && !rcu_access_pointer(sk->sk_reuseport_cb) && uid_eq(tb->fastuid, uid))) && smallest_size != -1 && --attempts >= 0) { spin_unlock_bh(>lock); goto again; } goto fail_unlock; } AFAICT there is redundancy in these two conditionals. The same clause is being checked in both: (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the first conditional should be hit, goto done, and the second will never evaluate that part to true-- unless the sk is changed (do we need READ_ONCE for sk->sk_reuseport_cb?). That's an interesting point... It looks like this function also changed in 4.6 from using a single local_bh_disable() at the beginning with several spin_lock(>lock) to exclusively spin_lock_bh(>lock) at each locking point. Perhaps the full bh disable variant was preventing the timers in your stack trace from running interleaved with this function before? Could be, although dropping the lock shouldn't be able to affect the search state. TBH, I'm a little lost in reading function, the SO_REUSEPORT handling is pretty complicated. For instance, rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that function and also in every call to inet_csk_bind_conflict. I wonder if we can simply this under the assumption that SO_REUSEPORT is only allowed if the port number (snum) is explicitly specified. Ok first I have data for you Hannes, here's the time distributions before during and after the lockup (with all the debugging in place the box eventually recovers). I've attached it as a text file since it is long. Thanks a lot! Second is I was thinking about why we would spend so much time doing the ->owners list, and obviously it's because of the massive amount of timewait sockets on the owners list. I wrote the following dumb patch and tested it and the problem has disappeared completely. Now I don't know if this is right at all, but I thought it was weird we weren't copying the soreuseport option from the original socket onto the twsk. Is there are reason we aren't doing this currently? Does this help explain what is happening? Thanks, The patch is interesting and a good clue, but I am immediately a bit concerned that we don't copy/tag the socket with the uid also to keep the security properties for SO_REUSEPORT. I have to think a bit more about this. We have seen hangs during connect. I am afraid this patch wouldn't help there while also guaranteeing uniqueness. Yeah so I looked at the code some more and actually my patch is really bad. If sk2->sk_reuseport is set we'll look at sk2->sk_reuseport_cb, which is outside of the timewait sock, so that's definitely bad. But we should at least be setting it to 0 so that we don't do this normally. Unfortunately simply setting it to 0 doesn't fix the problem. 
So for some reason having ->sk_reuseport set to 1 on a timewait socket
makes this problem non-existent, which is strange.  So back to the
drawing board I guess.  I wonder if doing what Craig suggested and
batching the timewait timer expirations so it hurts less would accomplish
the same results.  Thanks,

Josef
Re: Soft lockup in inet_put_port on 4.6
Hi Josef, On 15.12.2016 19:53, Josef Bacik wrote: > On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbertwrote: >> On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek >> wrote: >>> On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert >>> wrote: I think there may be some suspicious code in inet_csk_get_port. At tb_found there is: if (((tb->fastreuse > 0 && reuse) || (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && smallest_size == -1) goto success; if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, true)) { if ((reuse || (tb->fastreuseport > 0 && sk->sk_reuseport && !rcu_access_pointer(sk->sk_reuseport_cb) && uid_eq(tb->fastuid, uid))) && smallest_size != -1 && --attempts >= 0) { spin_unlock_bh(>lock); goto again; } goto fail_unlock; } AFAICT there is redundancy in these two conditionals. The same clause is being checked in both: (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the first conditional should be hit, goto done, and the second will never evaluate that part to true-- unless the sk is changed (do we need READ_ONCE for sk->sk_reuseport_cb?). >>> That's an interesting point... It looks like this function also >>> changed in 4.6 from using a single local_bh_disable() at the beginning >>> with several spin_lock(>lock) to exclusively >>> spin_lock_bh(>lock) at each locking point. Perhaps the full bh >>> disable variant was preventing the timers in your stack trace from >>> running interleaved with this function before? >> >> Could be, although dropping the lock shouldn't be able to affect the >> search state. TBH, I'm a little lost in reading function, the >> SO_REUSEPORT handling is pretty complicated. For instance, >> rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that >> function and also in every call to inet_csk_bind_conflict. I wonder if >> we can simply this under the assumption that SO_REUSEPORT is only >> allowed if the port number (snum) is explicitly specified. > > Ok first I have data for you Hannes, here's the time distributions > before during and after the lockup (with all the debugging in place the > box eventually recovers). I've attached it as a text file since it is > long. Thanks a lot! > Second is I was thinking about why we would spend so much time doing the > ->owners list, and obviously it's because of the massive amount of > timewait sockets on the owners list. I wrote the following dumb patch > and tested it and the problem has disappeared completely. Now I don't > know if this is right at all, but I thought it was weird we weren't > copying the soreuseport option from the original socket onto the twsk. > Is there are reason we aren't doing this currently? Does this help > explain what is happening? Thanks, The patch is interesting and a good clue, but I am immediately a bit concerned that we don't copy/tag the socket with the uid also to keep the security properties for SO_REUSEPORT. I have to think a bit more about this. We have seen hangs during connect. I am afraid this patch wouldn't help there while also guaranteeing uniqueness. Thanks a lot!
Re: Soft lockup in inet_put_port on 4.6
On Thu, Dec 15, 2016 at 5:39 PM, Tom Herbertwrote: > On Thu, Dec 15, 2016 at 10:53 AM, Josef Bacik wrote: >> On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert wrote: >>> >>> On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek >>> wrote: On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert wrote: > > I think there may be some suspicious code in inet_csk_get_port. At > tb_found there is: > > if (((tb->fastreuse > 0 && reuse) || > (tb->fastreuseport > 0 && >!rcu_access_pointer(sk->sk_reuseport_cb) && >sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && > smallest_size == -1) > goto success; > if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, > true)) { > if ((reuse || > (tb->fastreuseport > 0 && >sk->sk_reuseport && >!rcu_access_pointer(sk->sk_reuseport_cb) > && >uid_eq(tb->fastuid, uid))) && > smallest_size != -1 && --attempts >= 0) { > spin_unlock_bh(>lock); > goto again; > } > goto fail_unlock; > } > > AFAICT there is redundancy in these two conditionals. The same clause > is being checked in both: (tb->fastreuseport > 0 && > !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && > uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the > first conditional should be hit, goto done, and the second will never > evaluate that part to true-- unless the sk is changed (do we need > READ_ONCE for sk->sk_reuseport_cb?). That's an interesting point... It looks like this function also changed in 4.6 from using a single local_bh_disable() at the beginning with several spin_lock(>lock) to exclusively spin_lock_bh(>lock) at each locking point. Perhaps the full bh disable variant was preventing the timers in your stack trace from running interleaved with this function before? >>> >>> >>> Could be, although dropping the lock shouldn't be able to affect the >>> search state. TBH, I'm a little lost in reading function, the >>> SO_REUSEPORT handling is pretty complicated. For instance, >>> rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that >>> function and also in every call to inet_csk_bind_conflict. I wonder if >>> we can simply this under the assumption that SO_REUSEPORT is only >>> allowed if the port number (snum) is explicitly specified. >> >> >> Ok first I have data for you Hannes, here's the time distributions before >> during and after the lockup (with all the debugging in place the box >> eventually recovers). I've attached it as a text file since it is long. >> >> Second is I was thinking about why we would spend so much time doing the >> ->owners list, and obviously it's because of the massive amount of timewait >> sockets on the owners list. I wrote the following dumb patch and tested it >> and the problem has disappeared completely. Now I don't know if this is >> right at all, but I thought it was weird we weren't copying the soreuseport >> option from the original socket onto the twsk. Is there are reason we >> aren't doing this currently? Does this help explain what is happening? >> Thanks, >> > I think that would explain it. We would be walking long lists of TW > sockets in inet_bind_bucket_for_each(tb, >chain). This should > break, although now I'm wondering if there's other ways we can get > into this situation. reuseport ensures that we can have long lists of > sockets in a single bucket, TW sockets can make that list really long. What if the time-wait timer implementation was changed to do more opportunistic removals? In this case, you seem to have a coordinated timer event causing many independent locking events on the bucket in question. 
If one of those firing events realized it could handle all of them, you
could greatly reduce the contention.  The fact that they all hash to the
same bucket may make this even easier...

> Tom
>
>> Josef
Re: Soft lockup in inet_put_port on 4.6
On Thu, Dec 15, 2016 at 10:53 AM, Josef Bacikwrote: > On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert wrote: >> >> On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek >> wrote: >>> >>> On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert >>> wrote: I think there may be some suspicious code in inet_csk_get_port. At tb_found there is: if (((tb->fastreuse > 0 && reuse) || (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && smallest_size == -1) goto success; if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, true)) { if ((reuse || (tb->fastreuseport > 0 && sk->sk_reuseport && !rcu_access_pointer(sk->sk_reuseport_cb) && uid_eq(tb->fastuid, uid))) && smallest_size != -1 && --attempts >= 0) { spin_unlock_bh(>lock); goto again; } goto fail_unlock; } AFAICT there is redundancy in these two conditionals. The same clause is being checked in both: (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the first conditional should be hit, goto done, and the second will never evaluate that part to true-- unless the sk is changed (do we need READ_ONCE for sk->sk_reuseport_cb?). >>> >>> That's an interesting point... It looks like this function also >>> changed in 4.6 from using a single local_bh_disable() at the beginning >>> with several spin_lock(>lock) to exclusively >>> spin_lock_bh(>lock) at each locking point. Perhaps the full bh >>> disable variant was preventing the timers in your stack trace from >>> running interleaved with this function before? >> >> >> Could be, although dropping the lock shouldn't be able to affect the >> search state. TBH, I'm a little lost in reading function, the >> SO_REUSEPORT handling is pretty complicated. For instance, >> rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that >> function and also in every call to inet_csk_bind_conflict. I wonder if >> we can simply this under the assumption that SO_REUSEPORT is only >> allowed if the port number (snum) is explicitly specified. > > > Ok first I have data for you Hannes, here's the time distributions before > during and after the lockup (with all the debugging in place the box > eventually recovers). I've attached it as a text file since it is long. > > Second is I was thinking about why we would spend so much time doing the > ->owners list, and obviously it's because of the massive amount of timewait > sockets on the owners list. I wrote the following dumb patch and tested it > and the problem has disappeared completely. Now I don't know if this is > right at all, but I thought it was weird we weren't copying the soreuseport > option from the original socket onto the twsk. Is there are reason we > aren't doing this currently? Does this help explain what is happening? > Thanks, > I think that would explain it. We would be walking long lists of TW sockets in inet_bind_bucket_for_each(tb, >chain). This should break, although now I'm wondering if there's other ways we can get into this situation. reuseport ensures that we can have long lists of sockets in a single bucket, TW sockets can make that list really long. Tom > Josef
Re: Soft lockup in inet_put_port on 4.6
On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbertwrote: On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek wrote: On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert wrote: I think there may be some suspicious code in inet_csk_get_port. At tb_found there is: if (((tb->fastreuse > 0 && reuse) || (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && smallest_size == -1) goto success; if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, true)) { if ((reuse || (tb->fastreuseport > 0 && sk->sk_reuseport && !rcu_access_pointer(sk->sk_reuseport_cb) && uid_eq(tb->fastuid, uid))) && smallest_size != -1 && --attempts >= 0) { spin_unlock_bh(>lock); goto again; } goto fail_unlock; } AFAICT there is redundancy in these two conditionals. The same clause is being checked in both: (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the first conditional should be hit, goto done, and the second will never evaluate that part to true-- unless the sk is changed (do we need READ_ONCE for sk->sk_reuseport_cb?). That's an interesting point... It looks like this function also changed in 4.6 from using a single local_bh_disable() at the beginning with several spin_lock(>lock) to exclusively spin_lock_bh(>lock) at each locking point. Perhaps the full bh disable variant was preventing the timers in your stack trace from running interleaved with this function before? Could be, although dropping the lock shouldn't be able to affect the search state. TBH, I'm a little lost in reading function, the SO_REUSEPORT handling is pretty complicated. For instance, rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that function and also in every call to inet_csk_bind_conflict. I wonder if we can simply this under the assumption that SO_REUSEPORT is only allowed if the port number (snum) is explicitly specified. Ok first I have data for you Hannes, here's the time distributions before during and after the lockup (with all the debugging in place the box eventually recovers). I've attached it as a text file since it is long. Second is I was thinking about why we would spend so much time doing the ->owners list, and obviously it's because of the massive amount of timewait sockets on the owners list. I wrote the following dumb patch and tested it and the problem has disappeared completely. Now I don't know if this is right at all, but I thought it was weird we weren't copying the soreuseport option from the original socket onto the twsk. Is there are reason we aren't doing this currently? Does this help explain what is happening? Thanks, Josef inet_csk_get_port : count distribution 0 -> 1 : 0|| 2 -> 3 : 0|| 4 -> 7 : 0|| 8 -> 15 : 0|| 16 -> 31 : 0|| 32 -> 63 : 0|| 64 -> 127: 0|| 128 -> 255: 0|| 256 -> 511: 0|| 512 -> 1023 : 0|| 1024 -> 2047 : 4|* | 2048 -> 4095 : 100 || 4096 -> 8191 : 64 |* | 8192 -> 16383 : 35 |** | 16384 -> 32767 : 2|| inet_csk_bind_conflict : count distribution 0 -> 1 : 0|| 2 -> 3 : 0|| 4 -> 7 : 0|| 8 -> 15 : 0|| 16 -> 31 : 0||
Re: Soft lockup in inet_put_port on 4.6
On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallekwrote: > On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert wrote: >> I think there may be some suspicious code in inet_csk_get_port. At >> tb_found there is: >> >> if (((tb->fastreuse > 0 && reuse) || >> (tb->fastreuseport > 0 && >> !rcu_access_pointer(sk->sk_reuseport_cb) && >> sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && >> smallest_size == -1) >> goto success; >> if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, true)) { >> if ((reuse || >> (tb->fastreuseport > 0 && >> sk->sk_reuseport && >> !rcu_access_pointer(sk->sk_reuseport_cb) && >> uid_eq(tb->fastuid, uid))) && >> smallest_size != -1 && --attempts >= 0) { >> spin_unlock_bh(>lock); >> goto again; >> } >> goto fail_unlock; >> } >> >> AFAICT there is redundancy in these two conditionals. The same clause >> is being checked in both: (tb->fastreuseport > 0 && >> !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && >> uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the >> first conditional should be hit, goto done, and the second will never >> evaluate that part to true-- unless the sk is changed (do we need >> READ_ONCE for sk->sk_reuseport_cb?). > That's an interesting point... It looks like this function also > changed in 4.6 from using a single local_bh_disable() at the beginning > with several spin_lock(>lock) to exclusively > spin_lock_bh(>lock) at each locking point. Perhaps the full bh > disable variant was preventing the timers in your stack trace from > running interleaved with this function before? Could be, although dropping the lock shouldn't be able to affect the search state. TBH, I'm a little lost in reading function, the SO_REUSEPORT handling is pretty complicated. For instance, rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that function and also in every call to inet_csk_bind_conflict. I wonder if we can simply this under the assumption that SO_REUSEPORT is only allowed if the port number (snum) is explicitly specified. Tom
Re: Soft lockup in inet_put_port on 4.6
On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbertwrote: > I think there may be some suspicious code in inet_csk_get_port. At > tb_found there is: > > if (((tb->fastreuse > 0 && reuse) || > (tb->fastreuseport > 0 && > !rcu_access_pointer(sk->sk_reuseport_cb) && > sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && > smallest_size == -1) > goto success; > if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, true)) { > if ((reuse || > (tb->fastreuseport > 0 && > sk->sk_reuseport && > !rcu_access_pointer(sk->sk_reuseport_cb) && > uid_eq(tb->fastuid, uid))) && > smallest_size != -1 && --attempts >= 0) { > spin_unlock_bh(>lock); > goto again; > } > goto fail_unlock; > } > > AFAICT there is redundancy in these two conditionals. The same clause > is being checked in both: (tb->fastreuseport > 0 && > !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && > uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the > first conditional should be hit, goto done, and the second will never > evaluate that part to true-- unless the sk is changed (do we need > READ_ONCE for sk->sk_reuseport_cb?). That's an interesting point... It looks like this function also changed in 4.6 from using a single local_bh_disable() at the beginning with several spin_lock(>lock) to exclusively spin_lock_bh(>lock) at each locking point. Perhaps the full bh disable variant was preventing the timers in your stack trace from running interleaved with this function before?
Re: Soft lockup in inet_put_port on 4.6
I think there may be some suspicious code in inet_csk_get_port. At tb_found there is: if (((tb->fastreuse > 0 && reuse) || (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && smallest_size == -1) goto success; if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, true)) { if ((reuse || (tb->fastreuseport > 0 && sk->sk_reuseport && !rcu_access_pointer(sk->sk_reuseport_cb) && uid_eq(tb->fastuid, uid))) && smallest_size != -1 && --attempts >= 0) { spin_unlock_bh(>lock); goto again; } goto fail_unlock; } AFAICT there is redundancy in these two conditionals. The same clause is being checked in both: (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the first conditional should be hit, goto done, and the second will never evaluate that part to true-- unless the sk is changed (do we need READ_ONCE for sk->sk_reuseport_cb?). Another potential issue is the that the goto again goes back to doing the port scan, but if snum had been set originally that doesn't seem like what we want. Thanks, Tom On Mon, Dec 12, 2016 at 2:24 PM, Josef Bacikwrote: > > On Mon, Dec 12, 2016 at 1:44 PM, Hannes Frederic Sowa > wrote: >> >> On 12.12.2016 19:05, Josef Bacik wrote: >>> >>> On Fri, Dec 9, 2016 at 11:14 PM, Eric Dumazet >>> wrote: On Fri, 2016-12-09 at 19:47 -0800, Eric Dumazet wrote: > > Hmm... Is your ephemeral port range includes the port your load > balancing app is using ? I suspect that you might have processes doing bind( port = 0) that are trapped into the bind_conflict() scan ? With 100,000 + timewaits there, this possibly hurts. Can you try the following loop breaker ? >>> >>> >>> It doesn't appear that the app is doing bind(port = 0) during normal >>> operation. I tested this patch and it made no difference. I'm going to >>> test simply restarting the app without changing to the SO_REUSEPORT >>> option. Thanks, >> >> >> Would it be possible to trace the time the function uses with trace? If >> we don't see the number growing considerably over time we probably can >> rule out that we loop somewhere in there (I would instrument >> inet_csk_bind_conflict, __inet_hash_connect and inet_csk_get_port). >> >> __inet_hash_connect -> __inet_check_established also takes a lock >> (inet_ehash_lockp) which can be locked from inet_diag code path during >> socket diag info dumping. >> >> Unfortunately we couldn't reproduce it so far. :/ > > > So I had a bcc script running to time how long we spent in > inet_csk_bind_conflict, __inet_hash_connect and inet_csk_get_port, but of > course I'm an idiot and didn't actually separate out the stats so I could > tell _which_ one was taking forever. But anyway here's a normal > distribution on the box > > Some shit : count distribution > 0 -> 1 : 0| | > 2 -> 3 : 0| | > 4 -> 7 : 0| | > 8 -> 15 : 0| | >16 -> 31 : 0| | >32 -> 63 : 0| | >64 -> 127: 0| | > 128 -> 255: 0| | > 256 -> 511: 0| | > 512 -> 1023 : 0| | > 1024 -> 2047 : 74 | | > 2048 -> 4095 : 10537 > || > 4096 -> 8191 : 8497 | | > 8192 -> 16383 : 3745 |** | > 16384 -> 32767 : 300 |* | > 32768 -> 65535 : 250 | | > 65536 -> 131071 : 180 | | >131072 -> 262143 : 71 | | >262144 -> 524287 : 18 | | >
Re: Soft lockup in inet_put_port on 4.6
On Mon, Dec 12, 2016 at 1:44 PM, Hannes Frederic Sowawrote: On 12.12.2016 19:05, Josef Bacik wrote: On Fri, Dec 9, 2016 at 11:14 PM, Eric Dumazet wrote: On Fri, 2016-12-09 at 19:47 -0800, Eric Dumazet wrote: Hmm... Is your ephemeral port range includes the port your load balancing app is using ? I suspect that you might have processes doing bind( port = 0) that are trapped into the bind_conflict() scan ? With 100,000 + timewaits there, this possibly hurts. Can you try the following loop breaker ? It doesn't appear that the app is doing bind(port = 0) during normal operation. I tested this patch and it made no difference. I'm going to test simply restarting the app without changing to the SO_REUSEPORT option. Thanks, Would it be possible to trace the time the function uses with trace? If we don't see the number growing considerably over time we probably can rule out that we loop somewhere in there (I would instrument inet_csk_bind_conflict, __inet_hash_connect and inet_csk_get_port). __inet_hash_connect -> __inet_check_established also takes a lock (inet_ehash_lockp) which can be locked from inet_diag code path during socket diag info dumping. Unfortunately we couldn't reproduce it so far. :/ So I had a bcc script running to time how long we spent in inet_csk_bind_conflict, __inet_hash_connect and inet_csk_get_port, but of course I'm an idiot and didn't actually separate out the stats so I could tell _which_ one was taking forever. But anyway here's a normal distribution on the box Some shit : count distribution 0 -> 1 : 0| | 2 -> 3 : 0| | 4 -> 7 : 0| | 8 -> 15 : 0| | 16 -> 31 : 0| | 32 -> 63 : 0| | 64 -> 127: 0| | 128 -> 255: 0| | 256 -> 511: 0| | 512 -> 1023 : 0| | 1024 -> 2047 : 74 | | 2048 -> 4095 : 10537 || 4096 -> 8191 : 8497 | | 8192 -> 16383 : 3745 |** | 16384 -> 32767 : 300 |* | 32768 -> 65535 : 250 | | 65536 -> 131071 : 180 | | 131072 -> 262143 : 71 | | 262144 -> 524287 : 18 | | 524288 -> 1048575: 5| | With the times in nanoseconds, and here's the distribution during the problem Some shit : count distribution 0 -> 1 : 0| | 2 -> 3 : 0| | 4 -> 7 : 0| | 8 -> 15 : 0| | 16 -> 31 : 0| | 32 -> 63 : 0| | 64 -> 127: 0| | 128 -> 255: 0| | 256 -> 511: 0| | 512 -> 1023 : 0| | 1024 -> 2047 : 21 | | 2048 -> 4095 : 21820 || 4096 -> 8191 : 11598|* | 8192 -> 16383 : 4337 |*** | 16384 -> 32767 : 290 | | 32768 -> 65535 : 59 | | 65536 -> 131071 : 23 | | 131072 -> 262143 : 12 | | 262144 -> 524287 : 6| | 524288 -> 1048575: 19 | | 1048576 -> 2097151: 1079 |* |
Re: Soft lockup in inet_put_port on 4.6
On Mon, Dec 12, 2016 at 1:44 PM, Hannes Frederic Sowa wrote:
>
> On 12.12.2016 19:05, Josef Bacik wrote:
>>
>> On Fri, Dec 9, 2016 at 11:14 PM, Eric Dumazet wrote:
>>> On Fri, 2016-12-09 at 19:47 -0800, Eric Dumazet wrote:
>>>>
>>>> Hmm... Does your ephemeral port range include the port your load
>>>> balancing app is using ?
>>>
>>> I suspect that you might have processes doing bind(port = 0) that are
>>> trapped into the bind_conflict() scan ?
>>>
>>> With 100,000+ timewaits there, this possibly hurts.
>>>
>>> Can you try the following loop breaker ?
>>
>> It doesn't appear that the app is doing bind(port = 0) during normal
>> operation. I tested this patch and it made no difference. I'm going to
>> test simply restarting the app without changing to the SO_REUSEPORT
>> option. Thanks,
>
> Would it be possible to trace the time the function uses with trace? If
> we don't see the number growing considerably over time we probably can
> rule out that we loop somewhere in there (I would instrument
> inet_csk_bind_conflict, __inet_hash_connect and inet_csk_get_port).
>
> __inet_hash_connect -> __inet_check_established also takes a lock
> (inet_ehash_lockp) which can be locked from inet_diag code path during
> socket diag info dumping.
>
> Unfortunately we couldn't reproduce it so far. :/

Working on getting the timing info, will probably be tomorrow due to
meetings. I did test simply restarting the app without changing to the
config that enabled the use of SO_REUSEPORT and the problem didn't
occur, so it definitely has something to do with SO_REUSEPORT. Thanks,

Josef
Re: Soft lockup in inet_put_port on 4.6
On 12.12.2016 19:05, Josef Bacik wrote:
> On Fri, Dec 9, 2016 at 11:14 PM, Eric Dumazet wrote:
>> On Fri, 2016-12-09 at 19:47 -0800, Eric Dumazet wrote:
>>>
>>> Hmm... Does your ephemeral port range include the port your load
>>> balancing app is using ?
>>
>> I suspect that you might have processes doing bind(port = 0) that are
>> trapped into the bind_conflict() scan ?
>>
>> With 100,000+ timewaits there, this possibly hurts.
>>
>> Can you try the following loop breaker ?
>
> It doesn't appear that the app is doing bind(port = 0) during normal
> operation. I tested this patch and it made no difference. I'm going to
> test simply restarting the app without changing to the SO_REUSEPORT
> option. Thanks,

Would it be possible to trace the time the function uses with trace? If
we don't see the number growing considerably over time we probably can
rule out that we loop somewhere in there (I would instrument
inet_csk_bind_conflict, __inet_hash_connect and inet_csk_get_port).

__inet_hash_connect -> __inet_check_established also takes a lock
(inet_ehash_lockp) which can be locked from inet_diag code path during
socket diag info dumping.

Unfortunately we couldn't reproduce it so far. :/

Thanks,
Hannes
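One way to get that kind of per-function timing, if bcc isn't available on
the affected boxes, is a throwaway kretprobe module. This is a minimal
sketch only, assuming a 4.x kernel; the file name, the 1 ms reporting
threshold, and the maxactive value are arbitrary choices, and it is not
the instrumentation actually used in this thread.

/* latency_kretprobe.c -- hypothetical sketch: report slow calls to
 * inet_csk_get_port via a kretprobe. */
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/ktime.h>
#include <linux/timekeeping.h>

struct entry_data {
	u64 ts;
};

static int entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	struct entry_data *d = (struct entry_data *)ri->data;

	d->ts = ktime_get_ns();		/* timestamp taken on function entry */
	return 0;
}

static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	struct entry_data *d = (struct entry_data *)ri->data;
	u64 delta = ktime_get_ns() - d->ts;

	if (delta > 1000000)		/* log anything slower than 1 ms */
		pr_info("inet_csk_get_port took %llu ns\n", delta);
	return 0;
}

static struct kretprobe my_kretprobe = {
	.kp.symbol_name	= "inet_csk_get_port",
	.entry_handler	= entry_handler,
	.handler	= ret_handler,
	.data_size	= sizeof(struct entry_data),
	.maxactive	= 64,
};

static int __init latency_init(void)
{
	return register_kretprobe(&my_kretprobe);
}

static void __exit latency_exit(void)
{
	unregister_kretprobe(&my_kretprobe);
}

module_init(latency_init);
module_exit(latency_exit);
MODULE_LICENSE("GPL");

The same skeleton would cover inet_csk_bind_conflict and
__inet_hash_connect by registering one kretprobe per symbol_name.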
Re: Soft lockup in inet_put_port on 4.6
On Fri, Dec 9, 2016 at 11:14 PM, Eric Dumazet wrote:
> On Fri, 2016-12-09 at 19:47 -0800, Eric Dumazet wrote:
>>
>> Hmm... Does your ephemeral port range include the port your load
>> balancing app is using ?
>
> I suspect that you might have processes doing bind(port = 0) that are
> trapped into the bind_conflict() scan ?
>
> With 100,000+ timewaits there, this possibly hurts.
>
> Can you try the following loop breaker ?

It doesn't appear that the app is doing bind(port = 0) during normal
operation. I tested this patch and it made no difference. I'm going to
test simply restarting the app without changing to the SO_REUSEPORT
option. Thanks,

Josef
Re: Soft lockup in inet_put_port on 4.6
On Fri, 2016-12-09 at 19:47 -0800, Eric Dumazet wrote:
>
> Hmm... Does your ephemeral port range include the port your load
> balancing app is using ?

I suspect that you might have processes doing bind(port = 0) that are
trapped into the bind_conflict() scan ?

With 100,000+ timewaits there, this possibly hurts.

Can you try the following loop breaker ?

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index d5d3ead0a6c31e42e8843d30f8c643324a91b8e9..74f0f5ee6a02c624edb0263b9ddd27813f68d0a5 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -51,7 +51,7 @@ int inet_csk_bind_conflict(const struct sock *sk,
 	int reuse = sk->sk_reuse;
 	int reuseport = sk->sk_reuseport;
 	kuid_t uid = sock_i_uid((struct sock *)sk);
-
+	unsigned int max_count;
 	/*
 	 * Unlike other sk lookup places we do not check
 	 * for sk_net here, since _all_ the socks listed
@@ -59,6 +59,7 @@ int inet_csk_bind_conflict(const struct sock *sk,
 	 * one this bucket belongs to.
 	 */
 
+	max_count = relax ? ~0U : 100;
 	sk_for_each_bound(sk2, &tb->owners) {
 		if (sk != sk2 &&
 		    !inet_v6_ipv6only(sk2) &&
@@ -84,6 +85,8 @@ int inet_csk_bind_conflict(const struct sock *sk,
 				break;
 			}
 		}
+		if (--max_count == 0)
+			return 1;
 	}
 	return sk2 != NULL;
 }
diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index 1c86c478f578b49373e61a4c397f23f3dc7f3fc6..4f63d06e0d601da94eb3f2b35a988abd060e156c 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -35,12 +35,14 @@ int inet6_csk_bind_conflict(const struct sock *sk,
 	int reuse = sk->sk_reuse;
 	int reuseport = sk->sk_reuseport;
 	kuid_t uid = sock_i_uid((struct sock *)sk);
+	unsigned int max_count;
 
 	/* We must walk the whole port owner list in this case. -DaveM */
 	/*
 	 * See comment in inet_csk_bind_conflict about sock lookup
 	 * vs net namespaces issues.
 	 */
+	max_count = relax ? ~0U : 100;
 	sk_for_each_bound(sk2, &tb->owners) {
 		if (sk != sk2 &&
 		    (!sk->sk_bound_dev_if ||
@@ -61,6 +63,8 @@ int inet6_csk_bind_conflict(const struct sock *sk,
 		     ipv6_rcv_saddr_equal(sk, sk2, true))
 			break;
 		}
+		if (--max_count == 0)
+			return 1;
 	}
 	return sk2 != NULL;
Re: Soft lockup in inet_put_port on 4.6
On Fri, 2016-12-09 at 20:59 -0500, Josef Bacik wrote:
> On Thu, Dec 8, 2016 at 8:01 PM, Josef Bacik wrote:
>
> >> On Dec 8, 2016, at 7:32 PM, Eric Dumazet wrote:
> >>
> >>> On Thu, 2016-12-08 at 16:36 -0500, Josef Bacik wrote:
> >>>
> >>> We can reproduce the problem at will, still trying to run down the
> >>> problem. I'll try and find one of the boxes that dumped a core and get
> >>> a bt of everybody. Thanks,
> >>
> >> OK, sounds good.
> >>
> >> I had a look and:
> >> - could not spot a fix that came after 4.6.
> >> - could not spot an obvious bug.
> >>
> >> Anything special in the program triggering the issue ?
> >> SO_REUSEPORT and/or special socket options ?
> >
> > So they recently started using SO_REUSEPORT, that's what triggered
> > it, if they don't use it then everything is fine.
> >
> > I added some instrumentation for get_port to see if it was looping in
> > there and none of my printk's triggered. The softlockup messages are
> > always on the inet_bind_bucket lock, sometimes in the process context
> > in get_port or in the softirq context either through inet_put_port or
> > inet_kill_twsk. On the box that I have a coredump for there's only
> > one processor in the inet code so I'm not sure what to make of that.
> > That was a box from last week so I'll look at a more recent core and
> > see if it's different. Thanks,
>
> Ok more investigation today, a few bullet points
>
> - With all the debugging turned on the boxes seem to recover after
> about a minute. I'd get the spam of the soft lockup messages all on
> the inet_bind_bucket, and then the box would be fine.
> - I looked at a core I had from before I started investigating things
> and there's only one process trying to get the inet_bind_bucket of all
> the 48 cpus.
> - I noticed that there was over 100k twsk's in that original core.
> - I put a global counter of the twsk's (since most of the softlockup
> messages have the twsk timers in the stack) and noticed with the
> debugging kernel it started around 16k twsk's and once it recovered it
> was down to less than a thousand. There's a jump where it goes from 8k
> to 2k and then there's only one more softlockup message and the box is
> fine.
> - This happens when we restart the service with the config option to
> start using SO_REUSEPORT.
>
> The application is our load balancing app, so obviously has lots of
> connections opened at any given time. What I'm wondering and will test
> on Monday is if the SO_REUSEPORT change even matters, or if simply
> restarting the service is what triggers the problem. One thing I
> forgot to mention is that it's also using TCP_FASTOPEN in both the
> non-reuseport and reuseport variants.
>
> What I suspect is happening is the service stops, all of the sockets it
> had open go into TIMEWAIT with relatively the same timer period, and
> then suddenly all wake up at the same time which coupled with the
> massive amount of traffic that we see per box anyway results in so much
> contention and ksoftirqd usage that the box livelocks for a while.
> With the lock debugging and stuff turned on we aren't able to service
> as much traffic so it recovers relatively quickly, whereas a normal
> production kernel never recovers.
>
> Please keep in mind that I'm a file system developer so my conclusions
> may be completely insane, any guidance would be welcome. I'll continue
> hammering on this on Monday. Thanks,

Hmm... Does your ephemeral port range include the port your load
balancing app is using ?
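That question can be answered quickly on an affected box: the ephemeral
range is exported in procfs. A trivial sketch, equivalent to running
sysctl net.ipv4.ip_local_port_range, follows; the file format ("low high")
is the standard one.

/* check_port_range.c -- print the ephemeral port range so it can be
 * compared against the load balancer's listening port. */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/sys/net/ipv4/ip_local_port_range", "r");
	int lo, hi;

	if (!f || fscanf(f, "%d %d", &lo, &hi) != 2) {
		perror("ip_local_port_range");
		return 1;
	}
	printf("ephemeral ports: %d-%d\n", lo, hi);
	fclose(f);
	return 0;
}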
Re: Soft lockup in inet_put_port on 4.6
On Thu, Dec 8, 2016 at 8:01 PM, Josef Bacik wrote:
>
>> On Dec 8, 2016, at 7:32 PM, Eric Dumazet wrote:
>>
>>> On Thu, 2016-12-08 at 16:36 -0500, Josef Bacik wrote:
>>>
>>> We can reproduce the problem at will, still trying to run down the
>>> problem. I'll try and find one of the boxes that dumped a core and get
>>> a bt of everybody. Thanks,
>>
>> OK, sounds good.
>>
>> I had a look and:
>> - could not spot a fix that came after 4.6.
>> - could not spot an obvious bug.
>>
>> Anything special in the program triggering the issue ?
>> SO_REUSEPORT and/or special socket options ?
>
> So they recently started using SO_REUSEPORT, that's what triggered
> it, if they don't use it then everything is fine.
>
> I added some instrumentation for get_port to see if it was looping in
> there and none of my printk's triggered. The softlockup messages are
> always on the inet_bind_bucket lock, sometimes in the process context
> in get_port or in the softirq context either through inet_put_port or
> inet_kill_twsk. On the box that I have a coredump for there's only
> one processor in the inet code so I'm not sure what to make of that.
> That was a box from last week so I'll look at a more recent core and
> see if it's different. Thanks,

Ok more investigation today, a few bullet points

- With all the debugging turned on the boxes seem to recover after
about a minute. I'd get the spam of the soft lockup messages all on
the inet_bind_bucket, and then the box would be fine.
- I looked at a core I had from before I started investigating things
and there's only one process trying to get the inet_bind_bucket of all
the 48 cpus.
- I noticed that there was over 100k twsk's in that original core.
- I put a global counter of the twsk's (since most of the softlockup
messages have the twsk timers in the stack) and noticed with the
debugging kernel it started around 16k twsk's and once it recovered it
was down to less than a thousand. There's a jump where it goes from 8k
to 2k and then there's only one more softlockup message and the box is
fine.
- This happens when we restart the service with the config option to
start using SO_REUSEPORT.

The application is our load balancing app, so obviously has lots of
connections opened at any given time. What I'm wondering and will test
on Monday is if the SO_REUSEPORT change even matters, or if simply
restarting the service is what triggers the problem. One thing I
forgot to mention is that it's also using TCP_FASTOPEN in both the
non-reuseport and reuseport variants.

What I suspect is happening is the service stops, all of the sockets it
had open go into TIMEWAIT with relatively the same timer period, and
then suddenly all wake up at the same time which coupled with the
massive amount of traffic that we see per box anyway results in so much
contention and ksoftirqd usage that the box livelocks for a while.
With the lock debugging and stuff turned on we aren't able to service
as much traffic so it recovers relatively quickly, whereas a normal
production kernel never recovers.

Please keep in mind that I'm a file system developer so my conclusions
may be completely insane, any guidance would be welcome. I'll continue
hammering on this on Monday. Thanks,

Josef
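For anyone who wants to watch the same number without patching the kernel,
the current timewait count is also exported in /proc/net/sockstat (the
"tw" field of the TCP: line). A throwaway sketch, with the field layout
assumed from current kernels, is below; polling it in a loop across a
service restart would show whether the count really collapses when the
box recovers.

/* twcount.c -- sketch: read the kernel's timewait socket count from
 * /proc/net/sockstat. */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[256];
	FILE *f = fopen("/proc/net/sockstat", "r");

	if (!f) {
		perror("sockstat");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, "TCP:", 4)) {
			long tw = -1;
			char *p = strstr(line, " tw ");

			if (p)
				sscanf(p, " tw %ld", &tw);
			printf("timewait sockets: %ld\n", tw);
		}
	}
	fclose(f);
	return 0;
}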
Re: Soft lockup in inet_put_port on 4.6
> On Dec 8, 2016, at 7:32 PM, Eric Dumazet wrote:
>
>> On Thu, 2016-12-08 at 16:36 -0500, Josef Bacik wrote:
>>
>> We can reproduce the problem at will, still trying to run down the
>> problem. I'll try and find one of the boxes that dumped a core and get
>> a bt of everybody. Thanks,
>
> OK, sounds good.
>
> I had a look and:
> - could not spot a fix that came after 4.6.
> - could not spot an obvious bug.
>
> Anything special in the program triggering the issue ?
> SO_REUSEPORT and/or special socket options ?
>

So they recently started using SO_REUSEPORT, that's what triggered it,
if they don't use it then everything is fine.

I added some instrumentation for get_port to see if it was looping in
there and none of my printk's triggered. The softlockup messages are
always on the inet_bind_bucket lock, sometimes in the process context
in get_port or in the softirq context either through inet_put_port or
inet_kill_twsk. On the box that I have a coredump for there's only one
processor in the inet code so I'm not sure what to make of that. That
was a box from last week so I'll look at a more recent core and see if
it's different. Thanks,

Josef
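For readers who haven't used the option, the change being described
amounts to something like the following on the listener side. This is a
minimal sketch, not the actual load balancer code; the port, backlog, and
fallback SO_REUSEPORT value are made up for illustration.

/* reuseport_listener.c -- sketch: set SO_REUSEPORT before bind(). */
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_REUSEPORT
#define SO_REUSEPORT 15		/* x86 Linux value, for older libc headers */
#endif

int main(void)
{
	int one = 1;
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0) {
		perror("socket");
		return 1;
	}
	/* SO_REUSEPORT (Linux >= 3.9): several sockets may bind the same
	 * ip:port as long as they all set the option and share the uid. */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0)
		perror("SO_REUSEPORT");

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(8080);

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 128) < 0) {
		perror("bind/listen");
		return 1;
	}
	pause();		/* keep the listener alive */
	close(fd);
	return 0;
}

Running several copies of this at once is the usual SO_REUSEPORT pattern;
the scenario in this thread is the mixed case, where the previous instance
bound the port without the option.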
Re: Soft lockup in inet_put_port on 4.6
On Thu, 2016-12-08 at 16:36 -0500, Josef Bacik wrote:
> We can reproduce the problem at will, still trying to run down the
> problem. I'll try and find one of the boxes that dumped a core and get
> a bt of everybody. Thanks,

OK, sounds good.

I had a look and:
- could not spot a fix that came after 4.6.
- could not spot an obvious bug.

Anything special in the program triggering the issue ?
SO_REUSEPORT and/or special socket options ?

Thanks.
Re: Soft lockup in inet_put_port on 4.6
On Thu, Dec 8, 2016 at 4:03 PM, Hannes Frederic Sowa wrote:
> Hello Tom,
>
> On Wed, Dec 7, 2016, at 00:06, Tom Herbert wrote:
>> We are seeing a fair number of machines getting into softlockup in the
>> 4.6 kernel. As near as I can tell this is happening on the spinlock in
>> the bind hash bucket. When inet_csk_get_port exits and does
>> spin_unlock_bh the TCP timer runs and we hit lockup in inet_put_port
>> (presumably on the same lock). It seems like the lock isn't properly
>> being unlocked somewhere but I don't readily see it.
>>
>> Any ideas?
>
> Likewise we received reports that pretty much look the same on our
> heavily patched kernel. Did you have a chance to investigate or
> reproduce the problem?
>
> I am wondering if you would be able to take a complete thread stack
> dump if you can reproduce this to check if one of the user space
> processes is looping inside finding a free port?

We can reproduce the problem at will, still trying to run down the
problem. I'll try and find one of the boxes that dumped a core and get
a bt of everybody. Thanks,

Josef
Re: Soft lockup in inet_put_port on 4.6
Hello Tom,

On Wed, Dec 7, 2016, at 00:06, Tom Herbert wrote:
> We are seeing a fair number of machines getting into softlockup in the
> 4.6 kernel. As near as I can tell this is happening on the spinlock in
> the bind hash bucket. When inet_csk_get_port exits and does
> spin_unlock_bh the TCP timer runs and we hit lockup in inet_put_port
> (presumably on the same lock). It seems like the lock isn't properly
> being unlocked somewhere but I don't readily see it.
>
> Any ideas?

Likewise we received reports that pretty much look the same on our
heavily patched kernel. Did you have a chance to investigate or
reproduce the problem?

I am wondering if you would be able to take a complete thread stack dump
if you can reproduce this to check if one of the user space processes is
looping inside finding a free port?

Thanks,
Hannes
Soft lockup in inet_put_port on 4.6
Hello,

We are seeing a fair number of machines getting into softlockup in the
4.6 kernel. As near as I can tell this is happening on the spinlock in
the bind hash bucket. When inet_csk_get_port exits and does
spin_unlock_bh the TCP timer runs and we hit lockup in inet_put_port
(presumably on the same lock). It seems like the lock isn't properly
being unlocked somewhere but I don't readily see it.

Any ideas?

Thanks,
Tom

NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [proxygend:4152094]
Modules linked in: fuse nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 nf_log_ipv4
 nf_log_common xt_LOG ipt_REJECT nf_reject_ipv4 xt_limit xt_multiport ipip
 ip_tunnel tunnel4 ip6_tunnel tunnel6 coretemp mptctl mptbase cls_bpf
 ipmi_watchdog tcp_diag inet_diag ip6table_filter xt_NFLOG nfnetlink_log
 xt_comment xt_statistic iptable_filter xt_mark tpm_crb i2c_piix4 dm_crypt
 loop ipmi_devintf acpi_cpufreq iTCO_wdt iTCO_vendor_support ipmi_si
 ipmi_msghandler efivars i2c_i801 sg lpc_ich mfd_core hpilo xhci_pci
 xhci_hcd button nvme nvme_core
CPU: 22 PID: 4152094 Comm: proxygend Tainted: G W L 4.6.7-13_fbk3_1119_g367d67b #13
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015
task: 88168c52d100 ti: 881c12fb task.ti: 881c12fb
RIP: 0010:[] [] queued_spin_lock_slowpath+0xf8/0x170
RSP: 0018:883fff303da0 EFLAGS: 0246
RAX:  RBX: 881257163e00 RCX: 0001
RDX: 883fff375e40 RSI: 005c RDI: c90018d6bae0
RBP: 883fff303da0 R08: 883fff315e40 R09:
R10: 0020 R11: 01c0 R12: c90018d6bae0
R13: 820f8a80 R14: 881257163f30 R15:
FS: 7fa7bb7ff700() GS:883fff30() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7ff4be114d90 CR3: 00243f99c000 CR4: 003406e0
DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400
Stack:
 883fff303db0 817e5910 883fff303dd8 8172f6b4
 881257163e00 0007 0004 883fff303e00
 81733237 881257163e00 81ce7cc0
Call Trace:
 [] _raw_spin_lock+0x20/0x30
 [] inet_put_port+0x54/0xb0
 [] tcp_set_state+0x67/0xc0
 [] tcp_done+0x33/0x90
 [] tcp_write_err+0x31/0x50
 [] tcp_retransmit_timer+0x119/0x7d0
 [] ? tcp_write_timer_handler+0x1e0/0x1e0
 [] tcp_write_timer_handler+0x8e/0x1e0
 [] tcp_write_timer+0x67/0x70
 [] call_timer_fn+0x35/0x120
 [] ? tcp_write_timer_handler+0x1e0/0x1e0
 [] run_timer_softirq+0x1fc/0x2b0
 [] __do_softirq+0xcc/0x26c
 [] do_softirq_own_stack+0x1c/0x30
 [] do_softirq+0x31/0x40
 [] __local_bh_enable_ip+0x78/0x80
 [] _raw_spin_unlock_bh+0x1a/0x20
 [] inet_csk_get_port+0x1c1/0x5a0
 [] ? sock_poll+0x47/0xb0
 [] inet_csk_listen_start+0x65/0xc0
 [] inet_listen+0x9c/0xe0
 [] SyS_listen+0x80/0x90
 [] entry_SYSCALL_64_fastpath+0x13/0x8f
Code: c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 5e 01 00 48 03 14 c5 c0 d4 d1 81 4c 89 02 41 8b 40 08 85 c0 75 0a f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 08 41 0f 0d 09 eb 02 f3 90 8b