Re: svn commit: r303037 - head/sys/kern
On 07/20/16 18:35, Gleb Smirnoff wrote: Randall, I have just tested and r303037 brings the TCP panic back. I got two crashes during 2.5 hours. Hi Gleb, There is more than one way to fix the TCP panic. You don't _need_ to change the return values of callout_stop(). You _can_ make a new function for this, which doesn't cause problems in case anyone out there is depending on the current callout_stop() behaviour. In your email [1] you are right, there is regression that (-1) return value is lost. This problem was worked on in the PR 210884, and we were very close to commiting the fix. The whole 11.0-RELEASE cycle strongly depends on this change. We don't want to release with TCP panic, and of course we want the regression described in 210884 to be fixed. Your backout mixed with extra code really made things messy. Since I don't want to go with commit war, on behalf of RE we are asking for explicit agreement to back out r303037. Then we will proceed with latest patch from 210884. Is that okay? I think Randall is right trying to preserve the callout API. My wish Gleb, is that you add a new function to handle the new behaviour. [1] https://lists.freebsd.org/pipermail/svn-src-head/2016-July/089313.html --HPS ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r303037 - head/sys/kern
You are most welcome to backout anything you like.. as far as I am concerned you own the code.. R > On Jul 20, 2016, at 6:35 PM, Gleb Smirnoff wrote: > > Randall, > > I have just tested and r303037 brings the TCP panic back. I > got two crashes during 2.5 hours. > > In your email [1] you are right, there is regression that (-1) > return value is lost. This problem was worked on in the PR 210884, > and we were very close to commiting the fix. > > The whole 11.0-RELEASE cycle strongly depends on this change. We > don't want to release with TCP panic, and of course we want the > regression described in 210884 to be fixed. > > Your backout mixed with extra code really made things messy. Since > I don't want to go with commit war, on behalf of RE we are asking > for explicit agreement to back out r303037. Then we will proceed with > latest patch from 210884. Is that okay? > > [1] https://lists.freebsd.org/pipermail/svn-src-head/2016-July/089313.html > > On Wed, Jul 20, 2016 at 03:33:37PM +0200, Randall Stewart wrote: > R> Gleb > R> > R> I wish you would have responded earlier.. I am more than glad to hand > R> off all kern_timeout.c to you… please take it commit what you want to > R> it and have it. I hate the code and I dislike having to touch it. > R> > R> Its yours.. I can assure you I will not touch it again. > R> > R> R > R> > On Jul 20, 2016, at 8:53 AM, Gleb Smirnoff wrote: > R> > > R> > On Tue, Jul 19, 2016 at 06:31:19PM +, Randall Stewart wrote: > R> > R> Author: rrs > R> > R> Date: Tue Jul 19 18:31:19 2016 > R> > R> New Revision: 303037 > R> > R> URL: https://svnweb.freebsd.org/changeset/base/303037 > R> > R> > R> > R> Log: > R> > R> This reverts out Gleb's changes and adds three small > R> > R> fixes that I think closes up the races Gleb was > R> > R> looking for. This is running quite nicely in Netflix and > R> > R> now no longer causes TCP-tcb leaks. > R> > R> > R> > R> Differential Revision: 7135 > R> > > R> > Just to notice that I am completely pissed of by this commit > R> > war, that you started. > R> > > R> > I've been testing my changes properly, I gave people time to > R> > review my changes. You didn't. > R> > > R> > From your explanation in other emails I see that you've been > R> > testing your changes with a version of FreeBSD that is a heavily > R> > modified FreeBSD 10, not 11. > R> > > R> > The new code you mixed with revert of mine, doesn't fix the > R> > problem observed. It fixes another problem that you imagined, > R> > which might exist, but isn't observed. We already discussed that > R> > and you didn't prove it wrong. > R> > > R> > Your change doesn't even revert my change completely. > R> > > R> > -- > R> > Totus tuus, Glebius. > R> > R> > R> Randall Stewart > R> r...@netflix.com > R> 803-317-4952 > R> > R> > R> > R> > R> > R> > > -- > Totus tuus, Glebius. Randall Stewart r...@netflix.com 803-317-4952 ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r303037 - head/sys/kern
Randall, I have just tested and r303037 brings the TCP panic back. I got two crashes during 2.5 hours. In your email [1] you are right, there is regression that (-1) return value is lost. This problem was worked on in the PR 210884, and we were very close to commiting the fix. The whole 11.0-RELEASE cycle strongly depends on this change. We don't want to release with TCP panic, and of course we want the regression described in 210884 to be fixed. Your backout mixed with extra code really made things messy. Since I don't want to go with commit war, on behalf of RE we are asking for explicit agreement to back out r303037. Then we will proceed with latest patch from 210884. Is that okay? [1] https://lists.freebsd.org/pipermail/svn-src-head/2016-July/089313.html On Wed, Jul 20, 2016 at 03:33:37PM +0200, Randall Stewart wrote: R> Gleb R> R> I wish you would have responded earlier.. I am more than glad to hand R> off all kern_timeout.c to you… please take it commit what you want to R> it and have it. I hate the code and I dislike having to touch it. R> R> Its yours.. I can assure you I will not touch it again. R> R> R R> > On Jul 20, 2016, at 8:53 AM, Gleb Smirnoff wrote: R> > R> > On Tue, Jul 19, 2016 at 06:31:19PM +, Randall Stewart wrote: R> > R> Author: rrs R> > R> Date: Tue Jul 19 18:31:19 2016 R> > R> New Revision: 303037 R> > R> URL: https://svnweb.freebsd.org/changeset/base/303037 R> > R> R> > R> Log: R> > R> This reverts out Gleb's changes and adds three small R> > R> fixes that I think closes up the races Gleb was R> > R> looking for. This is running quite nicely in Netflix and R> > R> now no longer causes TCP-tcb leaks. R> > R> R> > R> Differential Revision:7135 R> > R> > Just to notice that I am completely pissed of by this commit R> > war, that you started. R> > R> > I've been testing my changes properly, I gave people time to R> > review my changes. You didn't. R> > R> > From your explanation in other emails I see that you've been R> > testing your changes with a version of FreeBSD that is a heavily R> > modified FreeBSD 10, not 11. R> > R> > The new code you mixed with revert of mine, doesn't fix the R> > problem observed. It fixes another problem that you imagined, R> > which might exist, but isn't observed. We already discussed that R> > and you didn't prove it wrong. R> > R> > Your change doesn't even revert my change completely. R> > R> > -- R> > Totus tuus, Glebius. R> R> R> Randall Stewart R> r...@netflix.com R> 803-317-4952 R> R> R> R> R> R> -- Totus tuus, Glebius. ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r303037 - head/sys/kern
Gleb I wish you would have responded earlier.. I am more than glad to hand off all kern_timeout.c to you… please take it commit what you want to it and have it. I hate the code and I dislike having to touch it. Its yours.. I can assure you I will not touch it again. R > On Jul 20, 2016, at 8:53 AM, Gleb Smirnoff wrote: > > On Tue, Jul 19, 2016 at 06:31:19PM +, Randall Stewart wrote: > R> Author: rrs > R> Date: Tue Jul 19 18:31:19 2016 > R> New Revision: 303037 > R> URL: https://svnweb.freebsd.org/changeset/base/303037 > R> > R> Log: > R> This reverts out Gleb's changes and adds three small > R> fixes that I think closes up the races Gleb was > R> looking for. This is running quite nicely in Netflix and > R> now no longer causes TCP-tcb leaks. > R> > R> Differential Revision: 7135 > > Just to notice that I am completely pissed of by this commit > war, that you started. > > I've been testing my changes properly, I gave people time to > review my changes. You didn't. > > From your explanation in other emails I see that you've been > testing your changes with a version of FreeBSD that is a heavily > modified FreeBSD 10, not 11. > > The new code you mixed with revert of mine, doesn't fix the > problem observed. It fixes another problem that you imagined, > which might exist, but isn't observed. We already discussed that > and you didn't prove it wrong. > > Your change doesn't even revert my change completely. > > -- > Totus tuus, Glebius. Randall Stewart r...@netflix.com 803-317-4952 ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r303037 - head/sys/kern
On Tue, Jul 19, 2016 at 06:31:19PM +, Randall Stewart wrote: R> Author: rrs R> Date: Tue Jul 19 18:31:19 2016 R> New Revision: 303037 R> URL: https://svnweb.freebsd.org/changeset/base/303037 R> R> Log: R> This reverts out Gleb's changes and adds three small R> fixes that I think closes up the races Gleb was R> looking for. This is running quite nicely in Netflix and R> now no longer causes TCP-tcb leaks. R> R> Differential Revision: 7135 Just to notice that I am completely pissed of by this commit war, that you started. I've been testing my changes properly, I gave people time to review my changes. You didn't. >From your explanation in other emails I see that you've been testing your changes with a version of FreeBSD that is a heavily modified FreeBSD 10, not 11. The new code you mixed with revert of mine, doesn't fix the problem observed. It fixes another problem that you imagined, which might exist, but isn't observed. We already discussed that and you didn't prove it wrong. Your change doesn't even revert my change completely. -- Totus tuus, Glebius. ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r303037 - head/sys/kern
> On Jul 19, 2016, at 11:31, Randall Stewart wrote: > > Author: rrs > Date: Tue Jul 19 18:31:19 2016 > New Revision: 303037 > URL: https://svnweb.freebsd.org/changeset/base/303037 > > Log: > This reverts out Gleb's changes and adds three small > fixes that I think closes up the races Gleb was > looking for. This is running quite nicely in Netflix and > now no longer causes TCP-tcb leaks. > > Differential Revision: 7135 This needs to be MFCed to ^/stable/11. Thanks, -Ngie signature.asc Description: Message signed with OpenPGP using GPGMail
svn commit: r303037 - head/sys/kern
Author: rrs Date: Tue Jul 19 18:31:19 2016 New Revision: 303037 URL: https://svnweb.freebsd.org/changeset/base/303037 Log: This reverts out Gleb's changes and adds three small fixes that I think closes up the races Gleb was looking for. This is running quite nicely in Netflix and now no longer causes TCP-tcb leaks. Differential Revision:7135 Modified: head/sys/kern/kern_timeout.c Modified: head/sys/kern/kern_timeout.c == --- head/sys/kern/kern_timeout.cTue Jul 19 18:15:22 2016 (r303036) +++ head/sys/kern/kern_timeout.cTue Jul 19 18:31:19 2016 (r303037) @@ -1050,7 +1050,7 @@ callout_reset_sbt_on(struct callout *c, */ if (c->c_lock != NULL && !cc_exec_cancel(cc, direct)) cancelled = cc_exec_cancel(cc, direct) = true; - if (cc_exec_waiting(cc, direct)) { + if (cc_exec_waiting(cc, direct) || cc_exec_drain(cc, direct)) { /* * Someone has called callout_drain to kill this * callout. Don't reschedule. @@ -1166,7 +1166,7 @@ _callout_stop_safe(struct callout *c, in struct callout_cpu *cc, *old_cc; struct lock_class *class; int direct, sq_locked, use_lock; - int cancelled, not_on_a_list; + int not_on_a_list; if ((flags & CS_DRAIN) != 0) WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, c->c_lock, @@ -1234,17 +1234,47 @@ again: panic("migration should not happen"); #endif } - + if ((drain != NULL) && (c->c_iflags & CALLOUT_PENDING) && + (cc_exec_curr(cc, direct) != c)) { + /* +* This callout is executing and we are draining. +* The only way this can happen is if its also +* been rescheduled to run on one thread *and* asked to drain +* on this thread (at the same time it is waiting to execute). +*/ + if ((c->c_iflags & CALLOUT_PROCESSED) == 0) { + if (cc_exec_next(cc) == c) + cc_exec_next(cc) = LIST_NEXT(c, c_links.le); + LIST_REMOVE(c, c_links.le); + } else { + TAILQ_REMOVE(&cc->cc_expireq, c, c_links.tqe); + } + c->c_iflags &= ~CALLOUT_PENDING; + c->c_flags &= ~CALLOUT_ACTIVE; + } /* -* If the callout is running, try to stop it or drain it. +* If the callout isn't pending, it's not on the queue, so +* don't attempt to remove it from the queue. We can try to +* stop it by other means however. */ - if (cc_exec_curr(cc, direct) == c) { + if (!(c->c_iflags & CALLOUT_PENDING)) { /* -* Succeed we to stop it or not, we must clear the -* active flag - this is what API users expect. +* If it wasn't on the queue and it isn't the current +* callout, then we can't stop it, so just bail. +* It probably has already been run (if locking +* is properly done). You could get here if the caller +* calls stop twice in a row for example. The second +* call would fall here without CALLOUT_ACTIVE set. */ c->c_flags &= ~CALLOUT_ACTIVE; - + if (cc_exec_curr(cc, direct) != c) { + CTR3(KTR_CALLOUT, "failed to stop %p func %p arg %p", + c, c->c_func, c->c_arg); + CC_UNLOCK(cc); + if (sq_locked) + sleepq_release(&cc_exec_waiting(cc, direct)); + return (-1); + } if ((flags & CS_DRAIN) != 0) { /* * The current callout is running (or just @@ -1278,7 +1308,6 @@ again: old_cc = cc; goto again; } - /* * Migration could be cancelled here, but * as long as it is still not sure when it @@ -1362,6 +1391,8 @@ again: cc_exec_drain(cc, direct) = drain; } CC_UNLOCK(cc); + if (drain) + return(0); return ((flags & CS_EXECUTING) != 0); } CTR3(KTR_CALLOUT, "failed to stop %p func %p arg %p", @@ -1369,20 +1400,12 @@ again: if (drain) { cc_exec_drain(cc, direct) = drain; } - KASSERT(!sq_locked,