Re: Instant panic while trying run ports-mgmt/poudriere
John-Mark, with all due respect, I have to invoke the forest-vs-trees argument here:
- it is established that in the knote() loop the current knote member of the klist can be removed
- it's a fact that getting a pointer to the next element from a removed element is an illegal operation
- FOREACH_SAFE is specifically designed to handle exactly this kind of iteration
--
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
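The three points above can be illustrated with a small userland sketch. Everything in it is hypothetical: struct item, ITEM_FOREACH_SAFE and the helper functions are simplified local stand-ins, not the kernel's struct knote or the real SLIST_FOREACH_SAFE from <sys/queue.h>. The only idea it borrows is that the next pointer is read before the loop body gets a chance to remove the current element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list element; not the kernel's struct knote. */
struct item {
	int value;
	struct item *next;
};

struct item_list {
	struct item *first;
};

/*
 * Simplified local stand-in for SLIST_FOREACH_SAFE: 'tvar' receives the
 * next pointer before the loop body runs, so the body may free 'var'.
 */
#define ITEM_FOREACH_SAFE(var, head, tvar)				\
	for ((var) = (head)->first;					\
	    (var) != NULL && ((tvar) = (var)->next, 1);			\
	    (var) = (tvar))

/* Insert a new element at the head of the list. */
static void
item_push(struct item_list *head, int value)
{
	struct item *it = malloc(sizeof(*it));

	assert(it != NULL);
	it->value = value;
	it->next = head->first;
	head->first = it;
}

/* Unlink 'victim' from the list (if present) and free it. */
static void
item_remove(struct item_list *head, struct item *victim)
{
	struct item **pp = &head->first;

	while (*pp != NULL && *pp != victim)
		pp = &(*pp)->next;
	if (*pp == victim) {
		*pp = victim->next;
		free(victim);
	}
}

/* Count the elements currently on the list. */
static int
item_count(const struct item_list *head)
{
	const struct item *it;
	int n = 0;

	for (it = head->first; it != NULL; it = it->next)
		n++;
	return (n);
}

/* Sum all values, removing odd-valued elements while iterating. */
static int
sum_and_drop_odd(struct item_list *head)
{
	struct item *it, *tmp;
	int sum = 0;

	ITEM_FOREACH_SAFE(it, head, tmp) {
		sum += it->value;
		if (it->value % 2 != 0)
			item_remove(head, it);	/* 'it' is freed here */
		/* The step expression uses 'tmp', never it->next. */
	}
	return (sum);
}
```

A plain FOREACH would evaluate it->next in the step expression after item_remove() has already freed 'it', which is the illegal operation described in the second point.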
Re: Instant panic while trying run ports-mgmt/poudriere
On 1 September 2015 at 15:01, John-Mark Gurney wrote:
>
> But I would ask you to respect my maintainership of the code... Just
> because you get paid to work on FreeBSD full time does not mean you
> get to run roughshod over other people's work and force them to work
> on your time frame... Other people have jobs, and families and
> responsibilities too...

A quick comment on this point, on behalf of the FreeBSD Foundation (and not core): working for the Foundation as either permanent staff or on a project grant conveys no special status with respect to making changes in FreeBSD. Staff and project developers are expected to abide by the same rules and social conventions when interacting with the FreeBSD community.

That said, the discussion and diagnosis of this issue have been ongoing for about ten days, and avg provided a detailed sequence of events five days ago. In this case the patch fixed a panic that several people were experiencing, was tested by several people who experienced the panic, and received review. In my opinion r287366 was handled in a fair and reasonable fashion.
Re: Instant panic while trying run ports-mgmt/poudriere
Konstantin Belousov wrote this message on Tue, Sep 01, 2015 at 21:44 +0300: > On Tue, Sep 01, 2015 at 11:24:06AM -0700, John-Mark Gurney wrote: > > Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 23:21 +0300: > > > On 27/08/2015 21:09, John-Mark Gurney wrote: > > > > Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 10:21 +0300: > > > >> On 27/08/2015 02:36, John-Mark Gurney wrote: > > > >>> We should/cannot get here w/ an empty list. If we do, then there is > > > >>> something seriously wrong... The current kn (which we must have as we > > > >>> are here) MUST be on the list, but as you just showed, there are no > > > >>> knotes on the list. > > > >>> > > > >>> Can you get me a print of the knote? That way I can see what flags > > > >>> are on it? > > > >> > > > >> Apologies if the following might sound a little bit patronizing, but it > > > >> seems that you have got all the facts correctly, but somehow the > > > >> connection between them did not become clear. > > > >> > > > >> So: > > > >> 1. The list originally is NOT empty. I guess that it has one entry, > > > >> but > > > >> that's an unimportant detail. > > > >> 2. This is why the loop is entered. It's a fact that it is entered. > > > >> 3. The list becomes empty precisely because the entry is removed during > > > >> the iteration in the loop (as kib has explained). It's a fact that the > > > >> list became empty at least in the panic that I reported. > > > > > > > > On you're latest dump, you said: > > > > Here is another +1 with r286922. 
> > > > > > > > I can add a couple of bits of debugging data: > > > > > > > > > > > > > > > > (kgdb) fr 8 > > > > > > > > #8 0x80639d60 in knote (list=0xf8019a733ea0, > > > > > > > > hint=2147483648, lockflags=) at > > > > > > > > /usr/src/sys/kern/kern_event.c:1964 > > > > > > > > 1964} else if ((lockflags & KNF_NOKQLOCK) != 0) { > > > > > > > > > > > > First off, that can't be r286922, per: > > > > https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922 > > > > > > > > line 1964 is blank... The line of code above should be at line 1884, > > > > so not sure what is wrong here... > > > > > > No, it can not be indeed, because I am running head. > > > r286922 was the latest version of the repository, not the head branch, > > > at the moment when I pulled the repository via git. > > > > > > > Assuming that the pc really is at the line, f_event has not yet been > > > > called, > > > > > > Even on the second loop iteration? > > > > > > >which is why I said that the list cannot be empty yet, as > > > > f_event hasn't been called yet to remove the knote... It could be that > > > > optimization moved stuff around, but if that is the case, then the > > > > above wasn't useful.. > > > > > > I provided the disassembly of the code as well, it's very obvious how > > > the code was translated. > > > > > > >> 4. The element is not only unlinked from the list, but its memory is > > > >> also freed. > > > > > > > > Where is the memory freed? A knote MUST NOT be freed in an f_event > > > > handler. The only location that a list element is allowed to be > > > > freed is in knote_drop, which must happen after f_detach is called, > > > > but that can't/won't happen from knote (I believe the timer handles > > > > this specially, but we are talking about normal knlist type filters).. > > > > > > Well, right. knote()->filt_proc()->knlist_remove_inevent() just removes > > > the knote from the list. 
But then there is KNOTE_ACTIVATE() that passes > > > the knote to a different owner (so to say). And given that the knote has > > > EV_ONESHOT set on it (in filt_proc) and that poudriere can put quite a > > > stress load on a system, I am not surprised that another thread gets a > > > chance to call knote_drop() on the knote before the original thread > > > proceeds to the next iteration. > > > > Ok, I think I have identified the race that you guys were trying to > > tell me about, and though the _SAFE macro would be a similar fix, I'm > > going to rewrite the loop so that this is more explicit on what > > is happening here... > > > > So, the race is this... In knote, when the note is removed by > > f_event, things are find until the KQ lock is dropped... Once this > > lock is dropped, effective ownership of the knote is transfered > > from the knlist to the kq lock as the _DETACHED flag is now set, > > which means that reading any fields from that note is undefined.. > > > > Once the kq lock is released in knote, then it is possible for a > > functional like kqueue_scan to endup knote_drop'ing the note..
Re: Instant panic while trying run ports-mgmt/poudriere
On Tue, Sep 01, 2015 at 11:24:06AM -0700, John-Mark Gurney wrote: > Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 23:21 +0300: > > On 27/08/2015 21:09, John-Mark Gurney wrote: > > > Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 10:21 +0300: > > >> On 27/08/2015 02:36, John-Mark Gurney wrote: > > >>> We should/cannot get here w/ an empty list. If we do, then there is > > >>> something seriously wrong... The current kn (which we must have as we > > >>> are here) MUST be on the list, but as you just showed, there are no > > >>> knotes on the list. > > >>> > > >>> Can you get me a print of the knote? That way I can see what flags > > >>> are on it? > > >> > > >> Apologies if the following might sound a little bit patronizing, but it > > >> seems that you have got all the facts correctly, but somehow the > > >> connection between them did not become clear. > > >> > > >> So: > > >> 1. The list originally is NOT empty. I guess that it has one entry, but > > >> that's an unimportant detail. > > >> 2. This is why the loop is entered. It's a fact that it is entered. > > >> 3. The list becomes empty precisely because the entry is removed during > > >> the iteration in the loop (as kib has explained). It's a fact that the > > >> list became empty at least in the panic that I reported. > > > > > > On you're latest dump, you said: > > > Here is another +1 with r286922. > > > > > > I can add a couple of bits of debugging data: > > > > > > > > > > > > (kgdb) fr 8 > > > > > > #8 0x80639d60 in knote (list=0xf8019a733ea0, > > > > > > hint=2147483648, lockflags=) at > > > > > > /usr/src/sys/kern/kern_event.c:1964 > > > > > > 1964} else if ((lockflags & KNF_NOKQLOCK) != 0) { > > > > > > > > > First off, that can't be r286922, per: > > > https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922 > > > > > > line 1964 is blank... The line of code above should be at line 1884, > > > so not sure what is wrong here... 
> > > > No, it can not be indeed, because I am running head. > > r286922 was the latest version of the repository, not the head branch, > > at the moment when I pulled the repository via git. > > > > > Assuming that the pc really is at the line, f_event has not yet been > > > called, > > > > Even on the second loop iteration? > > > > >which is why I said that the list cannot be empty yet, as > > > f_event hasn't been called yet to remove the knote... It could be that > > > optimization moved stuff around, but if that is the case, then the > > > above wasn't useful.. > > > > I provided the disassembly of the code as well, it's very obvious how > > the code was translated. > > > > >> 4. The element is not only unlinked from the list, but its memory is > > >> also freed. > > > > > > Where is the memory freed? A knote MUST NOT be freed in an f_event > > > handler. The only location that a list element is allowed to be > > > freed is in knote_drop, which must happen after f_detach is called, > > > but that can't/won't happen from knote (I believe the timer handles > > > this specially, but we are talking about normal knlist type filters).. > > > > Well, right. knote()->filt_proc()->knlist_remove_inevent() just removes > > the knote from the list. But then there is KNOTE_ACTIVATE() that passes > > the knote to a different owner (so to say). And given that the knote has > > EV_ONESHOT set on it (in filt_proc) and that poudriere can put quite a > > stress load on a system, I am not surprised that another thread gets a > > chance to call knote_drop() on the knote before the original thread > > proceeds to the next iteration. > > Ok, I think I have identified the race that you guys were trying to > tell me about, and though the _SAFE macro would be a similar fix, I'm > going to rewrite the loop so that this is more explicit on what > is happening here... > > So, the race is this... 
> In knote, when the note is removed by
> f_event, things are fine until the KQ lock is dropped... Once this
> lock is dropped, effective ownership of the knote is transferred
> from the knlist to the kq lock as the _DETACHED flag is now set,
> which means that reading any fields from that note is undefined..
>
> Once the kq lock is released in knote, then it is possible for a
> function like kqueue_scan to end up knote_drop'ing the note...

Did you read the commit message and my previous N messages on the subject? Can you point me at a difference between the commit message and the above text?

I object to your pointless and fact-less backout request and have no intention of complying with it.
Re: Instant panic while trying run ports-mgmt/poudriere
Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 23:21 +0300: > On 27/08/2015 21:09, John-Mark Gurney wrote: > > Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 10:21 +0300: > >> On 27/08/2015 02:36, John-Mark Gurney wrote: > >>> We should/cannot get here w/ an empty list. If we do, then there is > >>> something seriously wrong... The current kn (which we must have as we > >>> are here) MUST be on the list, but as you just showed, there are no > >>> knotes on the list. > >>> > >>> Can you get me a print of the knote? That way I can see what flags > >>> are on it? > >> > >> Apologies if the following might sound a little bit patronizing, but it > >> seems that you have got all the facts correctly, but somehow the > >> connection between them did not become clear. > >> > >> So: > >> 1. The list originally is NOT empty. I guess that it has one entry, but > >> that's an unimportant detail. > >> 2. This is why the loop is entered. It's a fact that it is entered. > >> 3. The list becomes empty precisely because the entry is removed during > >> the iteration in the loop (as kib has explained). It's a fact that the > >> list became empty at least in the panic that I reported. > > > > On you're latest dump, you said: > > Here is another +1 with r286922. > > > > I can add a couple of bits of debugging data: > > > > > > > > (kgdb) fr 8 > > > > #8 0x80639d60 in knote (list=0xf8019a733ea0, > > > > hint=2147483648, lockflags=) at > > > > /usr/src/sys/kern/kern_event.c:1964 > > > > 1964} else if ((lockflags & KNF_NOKQLOCK) != 0) { > > > > > > First off, that can't be r286922, per: > > https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922 > > > > line 1964 is blank... The line of code above should be at line 1884, > > so not sure what is wrong here... > > No, it can not be indeed, because I am running head. > r286922 was the latest version of the repository, not the head branch, > at the moment when I pulled the repository via git. 
> > > Assuming that the pc really is at the line, f_event has not yet been > > called, > > Even on the second loop iteration? > > >which is why I said that the list cannot be empty yet, as > > f_event hasn't been called yet to remove the knote... It could be that > > optimization moved stuff around, but if that is the case, then the > > above wasn't useful.. > > I provided the disassembly of the code as well, it's very obvious how > the code was translated. > > >> 4. The element is not only unlinked from the list, but its memory is > >> also freed. > > > > Where is the memory freed? A knote MUST NOT be freed in an f_event > > handler. The only location that a list element is allowed to be > > freed is in knote_drop, which must happen after f_detach is called, > > but that can't/won't happen from knote (I believe the timer handles > > this specially, but we are talking about normal knlist type filters).. > > Well, right. knote()->filt_proc()->knlist_remove_inevent() just removes > the knote from the list. But then there is KNOTE_ACTIVATE() that passes > the knote to a different owner (so to say). And given that the knote has > EV_ONESHOT set on it (in filt_proc) and that poudriere can put quite a > stress load on a system, I am not surprised that another thread gets a > chance to call knote_drop() on the knote before the original thread > proceeds to the next iteration. Ok, I think I have identified the race that you guys were trying to tell me about, and though the _SAFE macro would be a similar fix, I'm going to rewrite the loop so that this is more explicit on what is happening here... So, the race is this... In knote, when the note is removed by f_event, things are find until the KQ lock is dropped... Once this lock is dropped, effective ownership of the knote is transfered from the knlist to the kq lock as the _DETACHED flag is now set, which means that reading any fields from that note is undefined.. 
Once the kq lock is released in knote, then it is possible for a function like kqueue_scan to end up knote_drop'ing the note...

Upon further examination, we may have another race: in knote_drop, when we call f_detach, we don't have the list locked, nor kq, which means that knlist_remove_inevent could be modifying the list at the same time that kqueue_register could be modifying it to remove a _DELETED note...

I'd like to close both races at the same time since they go hand in hand...

> > The rest of your explanation is invalid due to the invalid assumption
> > of this point...
>
> Eagerly waiting for your explanation...
>
> > If you can provide to
Re: Instant panic while trying run ports-mgmt/poudriere
On 08/23/15 22:54, Konstantin Belousov wrote: > On Sun, Aug 23, 2015 at 12:08:16PM +0300, Konstantin Belousov wrote: >> On Sun, Aug 23, 2015 at 09:54:28AM +0300, Andriy Gapon wrote: >>> On 12/08/2015 17:11, Lawrence Stewart wrote: On 08/07/15 07:33, Pawel Pekala wrote: > Hi K., > > On 2015-08-06 12:33 -0700, "K. Macy" wrote: >> Is this still happening? > > Still crashes: +1 for me running r286617 >>> >>> Here is another +1 with r286922. >>> I can add a couple of bits of debugging data: >>> >>> (kgdb) fr 8 >>> #8 0x80639d60 in knote (list=0xf8019a733ea0, >>> hint=2147483648, lockflags=) at >>> /usr/src/sys/kern/kern_event.c:1964 >>> 1964} else if ((lockflags & KNF_NOKQLOCK) != 0) { >>> (kgdb) p *list >>> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0 >>> , kl_unlock = 0x8063a200 , >>> kl_assert_locked = 0x8063a220 , >>> kl_assert_unlocked = 0x8063a240 , >>> kl_lockarg = 0xf8019a733bb0} >>> (kgdb) disassemble >>> Dump of assembler code for function knote: >>> 0x80639d00 : push %rbp >>> 0x80639d01 : mov%rsp,%rbp >>> 0x80639d04 : push %r15 >>> 0x80639d06 : push %r14 >>> 0x80639d08 : push %r13 >>> 0x80639d0a : push %r12 >>> 0x80639d0c : push %rbx >>> 0x80639d0d : sub$0x18,%rsp >>> 0x80639d11 : mov%edx,%r12d >>> 0x80639d14 : mov%rsi,-0x30(%rbp) >>> 0x80639d18 : mov%rdi,%rbx >>> 0x80639d1b : test %rbx,%rbx >>> 0x80639d1e : je 0x80639ef6 >>> 0x80639d24 : mov%r12d,%eax >>> 0x80639d27 : and$0x1,%eax >>> 0x80639d2a : mov%eax,-0x3c(%rbp) >>> 0x80639d2d : mov0x28(%rbx),%rdi >>> 0x80639d31 : je 0x80639d38 >>> 0x80639d33 : callq *0x18(%rbx) >>> 0x80639d36 : jmp0x80639d42 >>> 0x80639d38 : callq *0x20(%rbx) >>> 0x80639d3b : mov0x28(%rbx),%rdi >>> 0x80639d3f : callq *0x8(%rbx) >>> 0x80639d42 : mov%rbx,-0x38(%rbp) >>> 0x80639d46 : mov(%rbx),%rbx >>> 0x80639d49 : test %rbx,%rbx >>> 0x80639d4c : je 0x80639ee5 >>> 0x80639d52 : and$0x2,%r12d >>> 0x80639d56 : nopw %cs:0x0(%rax,%rax,1) >>> 0x80639d60 : mov0x28(%rbx),%r14 >>> >>> Panic is in the last quoted instruction. 
>>> And: >>> (kgdb) i reg >>> rax0x246582 >>> rbx0xdeadc0dedeadc0de -2401050962867404578 >>> rcx0x0 0 >>> rdx0x12e302 >>> rsi0x80a26a5a -2136839590 >>> rdi0x80e81b80 -2132272256 >>> rbp0xfe02b7efea20 0xfe02b7efea20 >>> rsp0xfe02b7efe9e0 0xfe02b7efe9e0 >>> r8 0x80a269ce -2136839730 >>> r9 0x80e82838 -2132269000 >>> r100x1 65536 >>> r110x80fabd10 -2131051248 >>> r120x0 0 >>> r130xf801ff84a818 -8787511171048 >>> r140xf801ff84a800 -8787511171072 >>> r150xf8019a6974f0 -8789207452432 >>> rip0x80639d60 0x80639d60 >>> eflags 0x10286 66182 >>> >>> I think that $rbx stands out here (this is a kernel with INVARIANTS). >>> >>> Looking at the code, is it possible that one of the calls from within >>> the loop's body modifies the list? If that is so and provided that is a >>> valid behavior, then maybe using SLIST_FOREACH_SAFE would help. >> >> This is first time a useful debugging data was posted. >> >> The 0x28 offset may indicate either kn_kq member access of the struct >> knote, or kq_list of the struct kqueue. >> >> kl_list.slh_first of the list parameter is NULL, how would a list >> iteration loop even start ? Can you look up the list argument value >> from the previous frame (%rdi is overwritten, so debugger might be >> confused) ? > > After looking at your data closely, I think you are right. The panic > occurs when the exit1(9) does KNOTE_LOCKED(NOTE_EXIT). This is the > only case in the tree where filter uses knlist_remove_inevent() to detach > processed note, so indeed the slist is modified under the iterator. > > Below is the patch with the suggested change and unrelated cleanup of > the uma(9) KPI use. Please test, everybody who has a panic with the > backtrace pointing to the sys_exit(). Fixes the panic for me too, thanks Kostik. Cheers, Lawrence ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
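As a side note on why $rbx stands out in the register dump above: to my understanding, a kernel built with INVARIANTS overwrites freed allocator memory with a repeating 32-bit 0xdeadc0de pattern, so a pointer-sized field read from a freed element comes back as 0xdeadc0dedeadc0de, and the following load at offset 0x28 from that value faults. The userland sketch below only models that effect. struct fake_note, trash() and the field layout are hypothetical and are not the real struct knote or the kernel's trashing code; nothing is actually freed here, the buffer is just overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill pattern; assumed to match the one a debug kernel writes on free. */
#define JUNK32	UINT32_C(0xdeadc0de)

/* Hypothetical element layout; NOT the real struct knote. */
struct fake_note {
	uint64_t next;		/* pointer-sized link to the next element */
	uint64_t pad[4];	/* filler so that 'owner' sits at offset 0x28 */
	uint64_t owner;		/* first field a loop body might read */
};

/* Overwrite a buffer with the junk pattern, as a debug allocator might. */
static void
trash(void *mem, size_t size)
{
	const uint32_t junk = JUNK32;
	unsigned char *p = mem;
	size_t i;

	assert(size % sizeof(junk) == 0);
	for (i = 0; i < size; i += sizeof(junk))
		memcpy(p + i, &junk, sizeof(junk));
}

/* Value a stale 'next' link holds once its element has been trashed. */
static uint64_t
stale_next_after_trash(void)
{
	struct fake_note n = { .next = 0x1000, .owner = 0x2000 };

	trash(&n, sizeof(n));
	return (n.next);
}
```

In this model stale_next_after_trash() yields the same value that appears in rbx, which is consistent with the iterator having picked up its next pointer from memory that had already been freed.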
Re: Instant panic while trying run ports-mgmt/poudriere
On 08/27/15 17:15, Don Lewis wrote: > On 27 Aug, Don Lewis wrote: >> On 27 Aug, Lawrence Stewart wrote: >>> On 08/27/15 09:36, John-Mark Gurney wrote: Andriy Gapon wrote this message on Sun, Aug 23, 2015 at 09:54 +0300: > On 12/08/2015 17:11, Lawrence Stewart wrote: >> On 08/07/15 07:33, Pawel Pekala wrote: >>> Hi K., >>> >>> On 2015-08-06 12:33 -0700, "K. Macy" wrote: Is this still happening? >>> >>> Still crashes: >> >> +1 for me running r286617 > > Here is another +1 with r286922. > I can add a couple of bits of debugging data: > > (kgdb) fr 8 > #8 0x80639d60 in knote (list=0xf8019a733ea0, > hint=2147483648, lockflags=) at > /usr/src/sys/kern/kern_event.c:1964 > 1964} else if ((lockflags & KNF_NOKQLOCK) != 0) { > (kgdb) p *list > $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0 We should/cannot get here w/ an empty list. If we do, then there is something seriously wrong... The current kn (which we must have as we are here) MUST be on the list, but as you just showed, there are no knotes on the list. Can you get me a print of the knote? That way I can see what flags are on it? 
>>> >>> I quickly tried to get this info for you by building my kernel with -O0 >>> and reproducing, but I get an insta-panic on boot with the new kernel: >>> >>> Fatal double fault >>> rip = 0x8218c794 >>> rsp = 0xfe044cdc9fe0 >>> rbp = 0xfe044cdca110 >>> cpuid = 2; apic id = 02 >>> panic: double fault >>> cpuid = 2 >>> KDB: stack backtrace: >>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame >>> 0xfe03dcfffe30 >>> vpanic() at vpanic+0x189/frame 0xfe03dcfffeb0 >>> panic() at panic+0x43/frame 0xfe03dc10 >>> dblfault_handler() at dblfault_handler+0xa2/frame 0xfe03dc30 >>> Xdblfault() at Xdblfault+0xac/frame 0xfe03dc30 >>> --- trap 0x17, rip = 0x8218c794, rsp = 0xfe044cdc9fe0, rbp = >>> 0xfe044cdca110 --- >>> vdev_queue_aggregate() at vdev_queue_aggregate+0x34/frame 0xfe044cdca110 >>> vdev_queue_io_to_issue() at vdev_queue_io_to_issue+0x1f5/frame >>> 0xfe044cdca560 >>> vdev_queue_io() at vdev_queue_io+0x19a/frame 0xfe044cdca5b0 >>> zio_vdev_io_start() at zio_vdev_io_start+0x81f/frame 0xfe044cdca6e0 >>> zio_execute() at zio_execute+0x23b/frame 0xfe044cdca730 >>> zio_nowait() at zio_nowait+0xbe/frame 0xfe044cdca760 >>> vdev_mirror_io_start() at vdev_mirror_io_start+0x140/frame >>> 0xfe044cdca800 >>> zio_vdev_io_start() at zio_vdev_io_start+0x12f/frame 0xfe044cdca930 >>> zio_execute() at zio_execute+0x23b/frame 0xfe044cdca980 >>> zio_nowait() at zio_nowait+0xbe/frame 0xfe044cdca9b0 >>> spa_load_verify_cb() at spa_load_verify_cb+0x37d/frame 0xfe044cdcaa50 >>> traverse_visitbp() at traverse_visitbp+0x5a5/frame 0xfe044cdcac60 >>> traverse_dnode() at traverse_dnode+0x98/frame 0xfe044cdcacd0 >>> traverse_visitbp() at traverse_visitbp+0xc66/frame 0xfe044cdcaee0 >>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb0f0 >>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb300 >>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb510 >>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb720 >>> traverse_visitbp() at 
traverse_visitbp+0x930/frame 0xfe044cdcb930 >>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcbb40 >>> traverse_dnode() at traverse_dnode+0x98/frame 0xfe044cdcbbb0 >>> traverse_visitbp() at traverse_visitbp+0xe59/frame 0xfe044cdcbdc0 >>> traverse_impl() at traverse_impl+0x79d/frame 0xfe044cdcbfd0 >>> traverse_dataset() at traverse_dataset+0x93/frame 0xfe044cdcc040 >>> traverse_pool() at traverse_pool+0x1f2/frame 0xfe044cdcc140 >>> spa_load_verify() at spa_load_verify+0xf3/frame 0xfe044cdcc1f0 >>> spa_load_impl() at spa_load_impl+0x2069/frame 0xfe044cdcc610 >>> spa_load() at spa_load+0x320/frame 0xfe044cdcc6d0 >>> spa_load_impl() at spa_load_impl+0x150b/frame 0xfe044cdccaf0 >>> spa_load() at spa_load+0x320/frame 0xfe044cdccbb0 >>> spa_load_best() at spa_load_best+0xc6/frame 0xfe044cdccc50 >>> spa_open_common() at spa_open_common+0x246/frame 0xfe044cdccd40 >>> spa_open() at spa_open+0x35/frame 0xfe044cdccd70 >>> dsl_pool_hold() at dsl_pool_hold+0x2d/frame 0xfe044cdccdb0 >>> dmu_objset_own() at dmu_objset_own+0x2e/frame 0xfe044cdcce30 >>> zfsvfs_create() at zfsvfs_create+0x100/frame 0xfe044cdcd050 >>> zfs_domount() at zfs_domount+0xa7/frame 0xfe044cdcd0e0 >>> zfs_mount() at zfs_mount+0x6c3/frame 0xfe044cdcd390 >>> vfs_donmount() at vfs_donmount+0x1330/frame 0xfe044cdcd660 >>> kernel_mount() at kernel_mount+0x62/frame 0xfe044cdcd6c0 >>> parse_mount() at parse_mount+0x668/frame 0xfe044cdcd810 >>> vfs_mountroot() at vfs_mountroot+0x85c/frame 0xfe044cdcd9d
Re: Instant panic while trying run ports-mgmt/poudriere
On 27/08/2015 23:21, Andriy Gapon wrote:
>> > First off, that can't be r286922, per:
>> > https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922
>> >
>> > line 1964 is blank... The line of code above should be at line 1884,
>> > so not sure what is wrong here...
> No, it can not be indeed, because I am running head.
> r286922 was the latest version of the repository, not the head branch,
> at the moment when I pulled the repository via git.

Hrm, a small nit (irrelevant for me, but probably not for you): r286922 is actually a head commit:
https://svnweb.freebsd.org/base?view=revision&revision=286922
And:
https://svnweb.freebsd.org/base/head/sys/kern/kern_event.c?annotate=286922#l1964
Not sure why you chose to look at stable/10 (given the mailing list).
--
Andriy Gapon
Re: Instant panic while trying run ports-mgmt/poudriere
On Thu, Aug 27, 2015 at 11:09:45AM -0700, John-Mark Gurney wrote: > Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 10:21 +0300: > > On 27/08/2015 02:36, John-Mark Gurney wrote: > > > We should/cannot get here w/ an empty list. If we do, then there is > > > something seriously wrong... The current kn (which we must have as we > > > are here) MUST be on the list, but as you just showed, there are no > > > knotes on the list. > > > > > > Can you get me a print of the knote? That way I can see what flags > > > are on it? > > > > Apologies if the following might sound a little bit patronizing, but it > > seems that you have got all the facts correctly, but somehow the > > connection between them did not become clear. > > > > So: > > 1. The list originally is NOT empty. I guess that it has one entry, but > > that's an unimportant detail. > > 2. This is why the loop is entered. It's a fact that it is entered. > > 3. The list becomes empty precisely because the entry is removed during > > the iteration in the loop (as kib has explained). It's a fact that the > > list became empty at least in the panic that I reported. > > On you're latest dump, you said: > Here is another +1 with r286922. > > I can add a couple of bits of debugging data: > > > > (kgdb) fr 8 > > #8 0x80639d60 in knote (list=0xf8019a733ea0, > > hint=2147483648, lockflags=) at > > /usr/src/sys/kern/kern_event.c:1964 > > 1964} else if ((lockflags & KNF_NOKQLOCK) != 0) { > > > First off, that can't be r286922, per: > https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922 > > line 1964 is blank... The line of code above should be at line 1884, > so not sure what is wrong here... > > Assuming that the pc really is at the line, f_event has not yet been > called, which is why I said that the list cannot be empty yet, as > f_event hasn't been called yet to remove the knote... 
It could be that > optimization moved stuff around, but if that is the case, then the > above wasn't useful.. > > > 4. The element is not only unlinked from the list, but its memory is > > also freed. > > Where is the memory freed? A knote MUST NOT be freed in an f_event > handler. The only location that a list element is allowed to be > freed is in knote_drop, which must happen after f_detach is called, > but that can't/won't happen from knote (I believe the timer handles > this specially, but we are talking about normal knlist type filters).. > > The rest of your explination is invalid due to the invalid assumption > of this point... > > If you can provide to me where the knote is free'd in knote, w/ > function/line number stack trace (does not have to be dump, but a > sample call path), then I'll reconsider, and fix that bug... Sigh. Did you ever read the mails I sent ? Look at the filt_proc()->knlist_remove_inevent(). > > 5. That's why we have the use after free: SLIST_FOREACH is trying to get > > a pointer to a next element from the freed memory. > > 6. This is why the commit for trashing the freed memory made all the > > difference: previously the freed memory was unlikely to be re-used / > > modified, so the use-after-free had a high chance of succeeding. It's a > > fact that in my panic there was an attempt to dereference a trashed pointer. > > 7. Finally, this is why SLIST_FOREACH_SAFE helps here: we stash the > > pointer to the next element beforehand and, thus, we do not access the > > freed memory. > > > > Please let me know if you see any fault in above reasoning or if > > something is still no clear. > > -- > John-Mark GurneyVoice: +1 415 225 5579 > > "All that I will do, has been done, All that I have, has not." 
Re: Instant panic while trying run ports-mgmt/poudriere
On 27/08/2015 21:09, John-Mark Gurney wrote:
> Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 10:21 +0300:
>> On 27/08/2015 02:36, John-Mark Gurney wrote:
>>> We should/cannot get here w/ an empty list.  If we do, then there is
>>> something seriously wrong...  The current kn (which we must have as we
>>> are here) MUST be on the list, but as you just showed, there are no
>>> knotes on the list.
>>>
>>> Can you get me a print of the knote?  That way I can see what flags
>>> are on it?
>>
>> Apologies if the following might sound a little bit patronizing, but it
>> seems that you have got all the facts correctly, but somehow the
>> connection between them did not become clear.
>>
>> So:
>> 1. The list originally is NOT empty.  I guess that it has one entry, but
>> that's an unimportant detail.
>> 2. This is why the loop is entered.  It's a fact that it is entered.
>> 3. The list becomes empty precisely because the entry is removed during
>> the iteration in the loop (as kib has explained).  It's a fact that the
>> list became empty at least in the panic that I reported.
>
> On your latest dump, you said:
> Here is another +1 with r286922.
> I can add a couple of bits of debugging data:
>
> (kgdb) fr 8
> #8  0x80639d60 in knote (list=0xf8019a733ea0,
> hint=2147483648, lockflags=) at
> /usr/src/sys/kern/kern_event.c:1964
> 1964                } else if ((lockflags & KNF_NOKQLOCK) != 0) {
>
> First off, that can't be r286922, per:
> https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922
>
> line 1964 is blank...  The line of code above should be at line 1884,
> so not sure what is wrong here...

No, it can not be indeed, because I am running head.
r286922 was the latest version of the repository, not the head branch, at the
moment when I pulled the repository via git.

> Assuming that the pc really is at the line, f_event has not yet been
> called,

Even on the second loop iteration?

> which is why I said that the list cannot be empty yet, as
> f_event hasn't been called yet to remove the knote...  It could be that
> optimization moved stuff around, but if that is the case, then the
> above wasn't useful..

I provided the disassembly of the code as well, it's very obvious how the code
was translated.

>> 4. The element is not only unlinked from the list, but its memory is
>> also freed.
>
> Where is the memory freed?  A knote MUST NOT be freed in an f_event
> handler.  The only location that a list element is allowed to be
> freed is in knote_drop, which must happen after f_detach is called,
> but that can't/won't happen from knote (I believe the timer handles
> this specially, but we are talking about normal knlist type filters)..

Well, right.  knote()->filt_proc()->knlist_remove_inevent() just removes the
knote from the list.  But then there is KNOTE_ACTIVATE() that passes the knote
to a different owner (so to say).  And given that the knote has EV_ONESHOT set
on it (in filt_proc) and that poudriere can put quite a stress load on a system,
I am not surprised that another thread gets a chance to call knote_drop() on the
knote before the original thread proceeds to the next iteration.

> The rest of your explanation is invalid due to the invalid assumption
> of this point...

Eagerly waiting for your explanation...

> If you can provide to me where the knote is free'd in knote, w/
> function/line number stack trace (does not have to be dump, but a
> sample call path), then I'll reconsider, and fix that bug...

>> 5. That's why we have the use after free: SLIST_FOREACH is trying to get
>> a pointer to a next element from the freed memory.
>> 6. This is why the commit for trashing the freed memory made all the
>> difference: previously the freed memory was unlikely to be re-used /
>> modified, so the use-after-free had a high chance of succeeding.  It's a
>> fact that in my panic there was an attempt to dereference a trashed pointer.
>> 7. Finally, this is why SLIST_FOREACH_SAFE helps here: we stash the
>> pointer to the next element beforehand and, thus, we do not access the
>> freed memory.
>>
>> Please let me know if you see any fault in the above reasoning or if
>> something is still not clear.
>

--
Andriy Gapon
Re: Instant panic while trying run ports-mgmt/poudriere
Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 10:21 +0300:
> On 27/08/2015 02:36, John-Mark Gurney wrote:
> > We should/cannot get here w/ an empty list.  If we do, then there is
> > something seriously wrong...  The current kn (which we must have as we
> > are here) MUST be on the list, but as you just showed, there are no
> > knotes on the list.
> >
> > Can you get me a print of the knote?  That way I can see what flags
> > are on it?
>
> Apologies if the following might sound a little bit patronizing, but it
> seems that you have got all the facts correctly, but somehow the
> connection between them did not become clear.
>
> So:
> 1. The list originally is NOT empty.  I guess that it has one entry, but
> that's an unimportant detail.
> 2. This is why the loop is entered.  It's a fact that it is entered.
> 3. The list becomes empty precisely because the entry is removed during
> the iteration in the loop (as kib has explained).  It's a fact that the
> list became empty at least in the panic that I reported.

On your latest dump, you said:
Here is another +1 with r286922.
I can add a couple of bits of debugging data:

(kgdb) fr 8
#8  0x80639d60 in knote (list=0xf8019a733ea0,
hint=2147483648, lockflags=) at
/usr/src/sys/kern/kern_event.c:1964
1964                } else if ((lockflags & KNF_NOKQLOCK) != 0) {

First off, that can't be r286922, per:
https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922

line 1964 is blank...  The line of code above should be at line 1884,
so not sure what is wrong here...

Assuming that the pc really is at the line, f_event has not yet been
called, which is why I said that the list cannot be empty yet, as
f_event hasn't been called yet to remove the knote...  It could be that
optimization moved stuff around, but if that is the case, then the
above wasn't useful..

> 4. The element is not only unlinked from the list, but its memory is
> also freed.

Where is the memory freed?  A knote MUST NOT be freed in an f_event
handler.  The only location that a list element is allowed to be
freed is in knote_drop, which must happen after f_detach is called,
but that can't/won't happen from knote (I believe the timer handles
this specially, but we are talking about normal knlist type filters)..

The rest of your explanation is invalid due to the invalid assumption
of this point...

If you can provide to me where the knote is free'd in knote, w/
function/line number stack trace (does not have to be dump, but a
sample call path), then I'll reconsider, and fix that bug...

> 5. That's why we have the use after free: SLIST_FOREACH is trying to get
> a pointer to a next element from the freed memory.
> 6. This is why the commit for trashing the freed memory made all the
> difference: previously the freed memory was unlikely to be re-used /
> modified, so the use-after-free had a high chance of succeeding.  It's a
> fact that in my panic there was an attempt to dereference a trashed pointer.
> 7. Finally, this is why SLIST_FOREACH_SAFE helps here: we stash the
> pointer to the next element beforehand and, thus, we do not access the
> freed memory.
>
> Please let me know if you see any fault in the above reasoning or if
> something is still not clear.

--
  John-Mark Gurney                              Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
Re: Instant panic while trying run ports-mgmt/poudriere
On Thu, Aug 27, 2015 at 10:21:47AM +0300, Andriy Gapon wrote:
> On 27/08/2015 02:36, John-Mark Gurney wrote:
> > We should/cannot get here w/ an empty list.  If we do, then there is
> > something seriously wrong...  The current kn (which we must have as we
> > are here) MUST be on the list, but as you just showed, there are no
> > knotes on the list.
> >
> > Can you get me a print of the knote?  That way I can see what flags
> > are on it?
>
> Apologies if the following might sound a little bit patronizing, but it
> seems that you have got all the facts correctly, but somehow the
> connection between them did not become clear.
>
> So:
> 1. The list originally is NOT empty.  I guess that it has one entry, but
> that's an unimportant detail.
> 2. This is why the loop is entered.  It's a fact that it is entered.
> 3. The list becomes empty precisely because the entry is removed during
> the iteration in the loop (as kib has explained).  It's a fact that the
> list became empty at least in the panic that I reported.
The only detail I can add to this explanation, which is probably the third (?)
time, is that the removal is done in the filt_proc() event method, by the
call to knlist_remove_inevent().

> 4. The element is not only unlinked from the list, but its memory is
> also freed.
> 5. That's why we have the use after free: SLIST_FOREACH is trying to get
> a pointer to a next element from the freed memory.
> 6. This is why the commit for trashing the freed memory made all the
> difference: previously the freed memory was unlikely to be re-used /
> modified, so the use-after-free had a high chance of succeeding.  It's a
> fact that in my panic there was an attempt to dereference a trashed pointer.
> 7. Finally, this is why SLIST_FOREACH_SAFE helps here: we stash the
> pointer to the next element beforehand and, thus, we do not access the
> freed memory.
The additional, eighth item should explain why the change to _SAFE() is the
correct fix, and not just papering over the problem.  Nobody except the
current thread can modify the knlist, because the knlist is locked.  As a
consequence, only the current element can be unlinked and removed.  So
the _SAFE() iterator actually works.

>
> Please let me know if you see any fault in the above reasoning or if
> something is still not clear.
Re: Instant panic while trying run ports-mgmt/poudriere
On 27/08/2015 02:36, John-Mark Gurney wrote:
> We should/cannot get here w/ an empty list.  If we do, then there is
> something seriously wrong...  The current kn (which we must have as we
> are here) MUST be on the list, but as you just showed, there are no
> knotes on the list.
>
> Can you get me a print of the knote?  That way I can see what flags
> are on it?

Apologies if the following might sound a little bit patronizing, but it
seems that you have got all the facts correctly, but somehow the
connection between them did not become clear.

So:
1. The list originally is NOT empty.  I guess that it has one entry, but
that's an unimportant detail.
2. This is why the loop is entered.  It's a fact that it is entered.
3. The list becomes empty precisely because the entry is removed during
the iteration in the loop (as kib has explained).  It's a fact that the
list became empty at least in the panic that I reported.
4. The element is not only unlinked from the list, but its memory is
also freed.
5. That's why we have the use after free: SLIST_FOREACH is trying to get
a pointer to a next element from the freed memory.
6. This is why the commit for trashing the freed memory made all the
difference: previously the freed memory was unlikely to be re-used /
modified, so the use-after-free had a high chance of succeeding.  It's a
fact that in my panic there was an attempt to dereference a trashed pointer.
7. Finally, this is why SLIST_FOREACH_SAFE helps here: we stash the
pointer to the next element beforehand and, thus, we do not access the
freed memory.

Please let me know if you see any fault in the above reasoning or if
something is still not clear.

--
Andriy Gapon
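[Editor's sketch] Points 5 to 7 of the message above can be illustrated in
userland C.  This is only a minimal sketch under stated assumptions: struct
node, FOREACH_SAFE_SKETCH, push() and notify_all() are hypothetical names
invented for the illustration and are not kernel code.  The macro merely
mirrors the idea of SLIST_FOREACH_SAFE, which is to save the successor before
the loop body runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a knote; not the kernel's struct knote. */
struct node {
        int oneshot;            /* nonzero: the loop body drops this node */
        struct node *next;
};

/*
 * Same idea as SLIST_FOREACH_SAFE: the successor is saved in tvar
 * before the loop body runs, so the body may unlink and free var.
 */
#define FOREACH_SAFE_SKETCH(var, head, tvar)                            \
        for ((var) = (head);                                            \
            (var) != NULL && ((tvar) = (var)->next, 1);                 \
            (var) = (tvar))

/* Push a new node on the front of the list; returns the new head. */
static struct node *
push(struct node *head, int oneshot)
{
        struct node *n = malloc(sizeof(*n));

        assert(n != NULL);
        n->oneshot = oneshot;
        n->next = head;
        return (n);
}

/*
 * Visit every node; unlink and free the "oneshot" ones while iterating.
 * Returns the number of nodes visited; *headp is updated in place.
 */
static int
notify_all(struct node **headp)
{
        struct node *n, *tn, **prevp = headp;
        int visited = 0;

        FOREACH_SAFE_SKETCH(n, *headp, tn) {
                visited++;
                if (n->oneshot) {
                        *prevp = tn;    /* unlink the current node */
                        free(n);        /* n must not be touched after this */
                } else {
                        prevp = &n->next;
                }
        }
        return (visited);
}
```

With a plain "n = n->next" step in place of the saved tn, the step expression
would read from the node that was just freed, which is the access described in
point 5.  The sketch only covers removal of the current element by the
iterating thread; it says nothing about other threads changing the list.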
Re: Instant panic while trying run ports-mgmt/poudriere
On 08/27/15 09:36, John-Mark Gurney wrote:
> Andriy Gapon wrote this message on Sun, Aug 23, 2015 at 09:54 +0300:
>> On 12/08/2015 17:11, Lawrence Stewart wrote:
>>> On 08/07/15 07:33, Pawel Pekala wrote:
>>>> Hi K.,
>>>>
>>>> On 2015-08-06 12:33 -0700, "K. Macy" wrote:
>>>>> Is this still happening?
>>>>
>>>> Still crashes:
>>>
>>> +1 for me running r286617
>>
>> Here is another +1 with r286922.
>> I can add a couple of bits of debugging data:
>>
>> (kgdb) fr 8
>> #8  0x80639d60 in knote (list=0xf8019a733ea0,
>> hint=2147483648, lockflags=) at
>> /usr/src/sys/kern/kern_event.c:1964
>> 1964                } else if ((lockflags & KNF_NOKQLOCK) != 0) {
>> (kgdb) p *list
>> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0
>
> We should/cannot get here w/ an empty list.  If we do, then there is
> something seriously wrong...  The current kn (which we must have as we
> are here) MUST be on the list, but as you just showed, there are no
> knotes on the list.
>
> Can you get me a print of the knote?  That way I can see what flags
> are on it?

I quickly tried to get this info for you by building my kernel with -O0
and reproducing, but I get an insta-panic on boot with the new kernel:

Fatal double fault
rip = 0x8218c794
rsp = 0xfe044cdc9fe0
rbp = 0xfe044cdca110
cpuid = 2; apic id = 02
panic: double fault
cpuid = 2
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe03dcfffe30
vpanic() at vpanic+0x189/frame 0xfe03dcfffeb0
panic() at panic+0x43/frame 0xfe03dc10
dblfault_handler() at dblfault_handler+0xa2/frame 0xfe03dc30
Xdblfault() at Xdblfault+0xac/frame 0xfe03dc30
--- trap 0x17, rip = 0x8218c794, rsp = 0xfe044cdc9fe0, rbp = 0xfe044cdca110 ---
vdev_queue_aggregate() at vdev_queue_aggregate+0x34/frame 0xfe044cdca110
vdev_queue_io_to_issue() at vdev_queue_io_to_issue+0x1f5/frame 0xfe044cdca560
vdev_queue_io() at vdev_queue_io+0x19a/frame 0xfe044cdca5b0
zio_vdev_io_start() at zio_vdev_io_start+0x81f/frame 0xfe044cdca6e0
zio_execute() at zio_execute+0x23b/frame 0xfe044cdca730
zio_nowait() at zio_nowait+0xbe/frame 0xfe044cdca760
vdev_mirror_io_start() at vdev_mirror_io_start+0x140/frame 0xfe044cdca800
zio_vdev_io_start() at zio_vdev_io_start+0x12f/frame 0xfe044cdca930
zio_execute() at zio_execute+0x23b/frame 0xfe044cdca980
zio_nowait() at zio_nowait+0xbe/frame 0xfe044cdca9b0
spa_load_verify_cb() at spa_load_verify_cb+0x37d/frame 0xfe044cdcaa50
traverse_visitbp() at traverse_visitbp+0x5a5/frame 0xfe044cdcac60
traverse_dnode() at traverse_dnode+0x98/frame 0xfe044cdcacd0
traverse_visitbp() at traverse_visitbp+0xc66/frame 0xfe044cdcaee0
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb0f0
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb300
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb510
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb720
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb930
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcbb40
traverse_dnode() at traverse_dnode+0x98/frame 0xfe044cdcbbb0
traverse_visitbp() at traverse_visitbp+0xe59/frame 0xfe044cdcbdc0
traverse_impl() at traverse_impl+0x79d/frame 0xfe044cdcbfd0
traverse_dataset() at traverse_dataset+0x93/frame 0xfe044cdcc040
traverse_pool() at traverse_pool+0x1f2/frame 0xfe044cdcc140
spa_load_verify() at spa_load_verify+0xf3/frame 0xfe044cdcc1f0
spa_load_impl() at spa_load_impl+0x2069/frame 0xfe044cdcc610
spa_load() at spa_load+0x320/frame 0xfe044cdcc6d0
spa_load_impl() at spa_load_impl+0x150b/frame 0xfe044cdccaf0
spa_load() at spa_load+0x320/frame 0xfe044cdccbb0
spa_load_best() at spa_load_best+0xc6/frame 0xfe044cdccc50
spa_open_common() at spa_open_common+0x246/frame 0xfe044cdccd40
spa_open() at spa_open+0x35/frame 0xfe044cdccd70
dsl_pool_hold() at dsl_pool_hold+0x2d/frame 0xfe044cdccdb0
dmu_objset_own() at dmu_objset_own+0x2e/frame 0xfe044cdcce30
zfsvfs_create() at zfsvfs_create+0x100/frame 0xfe044cdcd050
zfs_domount() at zfs_domount+0xa7/frame 0xfe044cdcd0e0
zfs_mount() at zfs_mount+0x6c3/frame 0xfe044cdcd390
vfs_donmount() at vfs_donmount+0x1330/frame 0xfe044cdcd660
kernel_mount() at kernel_mount+0x62/frame 0xfe044cdcd6c0
parse_mount() at parse_mount+0x668/frame 0xfe044cdcd810
vfs_mountroot() at vfs_mountroot+0x85c/frame 0xfe044cdcd9d0
start_init() at start_init+0x62/frame 0xfe044cdcda70
fork_exit() at fork_exit+0x84/frame 0xfe044cdcdab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe044cdcdab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

Didn't get a core because it panics before dumpdev is set.

Is anyone else able to run -O0 kernels or do I have something set to evil?

Cheers,
Lawrence
Re: Instant panic while trying run ports-mgmt/poudriere
Konstantin Belousov wrote this message on Mon, Aug 24, 2015 at 11:10 +0300:
> On Sun, Aug 23, 2015 at 10:35:44PM -0700, John-Mark Gurney wrote:
> > Konstantin Belousov wrote this message on Sun, Aug 23, 2015 at 15:54 +0300:
> > >          if (kev->flags & EV_ADD)
> > > -                tkn = knote_alloc(waitok);      /* prevent waiting with locks */
> > > +                /*
> > > +                 * Prevent waiting with locks.  Non-sleepable
> > > +                 * allocation failures are handled in the loop, only
> > > +                 * if the spare knote appears to be actually required.
> > > +                 */
> > > +                tkn = knote_alloc(waitok);
> >
> > if you add this comment, please add curly braces around the block...
> Ok.
>
> > >
> > >          else
> > >                  tkn = NULL;
> > >
> > > @@ -1310,8 +1315,7 @@ done:
> > >          FILEDESC_XUNLOCK(td->td_proc->p_fd);
> > >          if (fp != NULL)
> > >                  fdrop(fp, td);
> > > -        if (tkn != NULL)
> > > -                knote_free(tkn);
> > > +        knote_free(tkn);
> >
> > Probably should just change knote_free to a static inline that does
> > a uma_zfree as uma_zfree also does nothing if the input is NULL...
> This was already done in the patch (the removal of the NULL check in
> knote_free()).  I usually do not add excessive inline keywords.  Compilers
> are good, sometimes even too good, at figuring out the possibilities for
> inlining.  knote_free() is inlined automatically.

Though it is, if we really change knote_free to a bare uma_zfree, then
either mark it inline (to be explicit about its behavior), or make a
macro out of it...  I don't particularly like functions that contain
one line of simple code...

> > > @@ -1948,7 +1948,7 @@ knote(struct knlist *list, long hint, int lockflags)
> > >           * only safe if you want to remove the current item, which we are
> > >           * not doing.
> > >           */
> > > -        SLIST_FOREACH(kn, &list->kl_list, kn_selnext) {
> > > +        SLIST_FOREACH_SAFE(kn, &list->kl_list, kn_selnext, tkn) {
> >
> > Clearly you didn't read the comment that precedes this line, or at
> > least didn't update it:
> >  * SLIST_FOREACH, SLIST_FOREACH_SAFE is not safe in our case, it is
> >  * only safe if you want to remove the current item, which we are
> >  * not doing.
> >
> > So, you'll need to be more specific in why this needs to change...
> > When I wrote this code, I spent a lot of time looking at this, and
> > reasoned as to why SLIST_FOREACH_SAFE was NOT correct usage here...
> I explained what happens in the message.  The knote list is modified
> by the filter, see knlist_remove_inevent() call in filt_proc().
>
> > >                  kq = kn->kn_kq;
> > >                  KQ_LOCK(kq);
> > >                  if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
> > > @@ -2385,15 +2385,16 @@ SYSINIT(knote, SI_SUB_PSEUDO, SI_ORDER_ANY, knote_init, NULL);
> > >  static struct knote *
> > >  knote_alloc(int waitok)
> > >  {
> > > -        return ((struct knote *)uma_zalloc(knote_zone,
> > > -                (waitok ? M_WAITOK : M_NOWAIT)|M_ZERO));
> > > +
> > > +        return (uma_zalloc(knote_zone, (waitok ? M_WAITOK : M_NOWAIT) |
> > > +            M_ZERO));
> > >  }
> > >
> > >  static void
> >
> > per above, we should add inline here...
> >
> > >  knote_free(struct knote *kn)
> > >  {
> > > -        if (kn != NULL)
> > > -                uma_zfree(knote_zone, kn);
> > > +
> > > +        uma_zfree(knote_zone, kn);
> > >  }
> > >
> > >  /*
> >
> > I agree w/ all the non-SLIST changes, but I disagree w/ the SLIST
> > change as I don't believe that all cases were considered...
> What cases do you mean?
>
> The patch does not unlock knlist lock in the iteration.  As such, the
> only thread which could remove elements from the knlist, or rearrange
> the list, while the loop is active, is the current thread.  So I claim that
> only the current iterating element can be removed, and the next list
> element stays valid.  This is enough for _SAFE loop to work.
>
> Why do you think that _SAFE is incorrect?  Comment talks about very

I can't think of the reason right now, but I do remember puzzling over
this issue for some hours when I wrote this code, and I had proved to
myself that _SAFE was NOT _SAFE for this use case...  In the quick look
I just had, I have not been able to decide one way or the other, but
I'm suspicious that this is a recent issue, as this code has been running
for close to a decade w/o any issues, and wonder if there was some other
change that triggered the issue...

The reason I'm cautious about changing this is that the code has been
running fine for over a decade...  Have you done a full test to validate
that nothing else breaks?

Ok, after looking more at the original dump, this is a use after free
bug...  As I said in another email, it should not be possible to get
into the _FOREACH loop where knlist is an empty list.  If it does, then
there is another major bug that needs to be found...  A simple change to
_SAFE will not fix this use after free bug...

> different case, where the knlist lock is dropp
Re: Instant panic while trying run ports-mgmt/poudriere
Andriy Gapon wrote this message on Sun, Aug 23, 2015 at 09:54 +0300:
> On 12/08/2015 17:11, Lawrence Stewart wrote:
> > On 08/07/15 07:33, Pawel Pekala wrote:
> >> Hi K.,
> >>
> >> On 2015-08-06 12:33 -0700, "K. Macy" wrote:
> >>> Is this still happening?
> >>
> >> Still crashes:
> >
> > +1 for me running r286617
>
> Here is another +1 with r286922.
> I can add a couple of bits of debugging data:
>
> (kgdb) fr 8
> #8  0x80639d60 in knote (list=0xf8019a733ea0,
> hint=2147483648, lockflags=) at
> /usr/src/sys/kern/kern_event.c:1964
> 1964                } else if ((lockflags & KNF_NOKQLOCK) != 0) {
> (kgdb) p *list
> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0

We should/cannot get here w/ an empty list.  If we do, then there is
something seriously wrong...  The current kn (which we must have as we
are here) MUST be on the list, but as you just showed, there are no
knotes on the list.

Can you get me a print of the knote?  That way I can see what flags
are on it?

> , kl_unlock = 0x8063a200 ,
> kl_assert_locked = 0x8063a220 ,
> kl_assert_unlocked = 0x8063a240 ,
> kl_lockarg = 0xf8019a733bb0}
> (kgdb) disassemble
> Dump of assembler code for function knote:
> 0x80639d00:  push   %rbp
> 0x80639d01:  mov    %rsp,%rbp
> 0x80639d04:  push   %r15
> 0x80639d06:  push   %r14
> 0x80639d08:  push   %r13
> 0x80639d0a:  push   %r12
> 0x80639d0c:  push   %rbx
> 0x80639d0d:  sub    $0x18,%rsp
> 0x80639d11:  mov    %edx,%r12d
> 0x80639d14:  mov    %rsi,-0x30(%rbp)
> 0x80639d18:  mov    %rdi,%rbx
> 0x80639d1b:  test   %rbx,%rbx
> 0x80639d1e:  je     0x80639ef6
> 0x80639d24:  mov    %r12d,%eax
> 0x80639d27:  and    $0x1,%eax
> 0x80639d2a:  mov    %eax,-0x3c(%rbp)
> 0x80639d2d:  mov    0x28(%rbx),%rdi
> 0x80639d31:  je     0x80639d38
> 0x80639d33:  callq  *0x18(%rbx)
> 0x80639d36:  jmp    0x80639d42
> 0x80639d38:  callq  *0x20(%rbx)
> 0x80639d3b:  mov    0x28(%rbx),%rdi
> 0x80639d3f:  callq  *0x8(%rbx)
> 0x80639d42:  mov    %rbx,-0x38(%rbp)
> 0x80639d46:  mov    (%rbx),%rbx
> 0x80639d49:  test   %rbx,%rbx
> 0x80639d4c:  je     0x80639ee5
> 0x80639d52:  and    $0x2,%r12d
> 0x80639d56:  nopw   %cs:0x0(%rax,%rax,1)
> 0x80639d60:  mov    0x28(%rbx),%r14
>
> Panic is in the last quoted instruction.
> And:
> (kgdb) i reg
> rax            0x246    582
> rbx            0xdeadc0dedeadc0de       -2401050962867404578
> rcx            0x0      0
> rdx            0x12e    302
> rsi            0x80a26a5a       -2136839590
> rdi            0x80e81b80       -2132272256
> rbp            0xfe02b7efea20   0xfe02b7efea20
> rsp            0xfe02b7efe9e0   0xfe02b7efe9e0
> r8             0x80a269ce       -2136839730
> r9             0x80e82838       -2132269000
> r10            0x1      65536
> r11            0x80fabd10       -2131051248
> r12            0x0      0
> r13            0xf801ff84a818   -8787511171048
> r14            0xf801ff84a800   -8787511171072
> r15            0xf8019a6974f0   -8789207452432
> rip            0x80639d60       0x80639d60
> eflags         0x10286  66182
>
> I think that $rbx stands out here (this is a kernel with INVARIANTS).

Yeh, it was probably r284861 that I added to catch use after free bugs
like this...  You could try reverting r284861 to see if the bug goes
away...  If it does, then this is most likely a use after free bug...

> Looking at the code, is it possible that one of the calls from within
> the loop's body modifies the list?  If that is so and provided that is a
> valid behavior, then maybe using SLIST_FOREACH_SAFE would help.

--
  John-Mark Gurney                              Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
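[Editor's sketch] The $rbx value in the register dump above is the 32-bit
pattern 0xdeadc0de repeated twice, which is what a pointer-sized load from
memory filled with that pattern would produce.  The following is only a
minimal userland sketch of that idea, not the kernel's INVARIANTS code:
POISON_WORD, trash_sketch(), looks_trashed() and struct elem are hypothetical
names, and the buffer is owned by the caller rather than really freed, so
that inspecting it afterwards is well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_WORD     0xdeadc0deU     /* pattern seen in $rbx above */

/* Fill a buffer with the poison word, as a debug allocator might on free. */
static void
trash_sketch(void *mem, size_t size)
{
        uint32_t w = POISON_WORD;
        size_t off;

        for (off = 0; off + sizeof(w) <= size; off += sizeof(w))
                memcpy((char *)mem + off, &w, sizeof(w));
}

/* Does a 64-bit value look like it was loaded from trashed memory? */
static int
looks_trashed(uint64_t v)
{
        return ((uint32_t)v == POISON_WORD &&
            (uint32_t)(v >> 32) == POISON_WORD);
}

/* Hypothetical list element; the link is kept as an integer on purpose. */
struct elem {
        uint64_t next;  /* never dereferenced in this sketch */
        uint64_t data;
};
```

After trash_sketch() runs over an element, its next field reads back as
0xdeadc0dedeadc0de, so a later attempt to follow that link faults at once
instead of quietly walking stale memory.  That fits the suggestion above of
reverting r284861 to see whether the panic goes away.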
Re: Instant panic while trying run ports-mgmt/poudriere
Hi Konstantin,

On 2015-08-23 15:54 +0300, Konstantin Belousov wrote:
>After looking at your data closely, I think you are right.  The panic
>occurs when the exit1(9) does KNOTE_LOCKED(NOTE_EXIT).  This is the
>only case in the tree where filter uses knlist_remove_inevent() to
>detach processed note, so indeed the slist is modified under the
>iterator.
>
>Below is the patch with the suggested change and unrelated cleanup of
>the uma(9) KPI use.  Please test, everybody who has a panic with the
>backtrace pointing to the sys_exit().

This patch fixes issue for me. Thank you.

--
pozdrawiam / with regards
Paweł Pękala
Re: Instant panic while trying run ports-mgmt/poudriere
On Sun, Aug 23, 2015 at 10:35:44PM -0700, John-Mark Gurney wrote:
> Konstantin Belousov wrote this message on Sun, Aug 23, 2015 at 15:54 +0300:
> >     if (kev->flags & EV_ADD)
> > -       tkn = knote_alloc(waitok);  /* prevent waiting with locks */
> > +       /*
> > +        * Prevent waiting with locks. Non-sleepable
> > +        * allocation failures are handled in the loop, only
> > +        * if the spare knote appears to be actually required.
> > +        */
> > +       tkn = knote_alloc(waitok);
>
> if you add this comment, please add curly braces around the block...
Ok.

> >     else
> >         tkn = NULL;
> >
> > @@ -1310,8 +1315,7 @@ done:
> >         FILEDESC_XUNLOCK(td->td_proc->p_fd);
> >     if (fp != NULL)
> >         fdrop(fp, td);
> > -   if (tkn != NULL)
> > -       knote_free(tkn);
> > +   knote_free(tkn);
>
> Probably should just change knote_free to a static inline that does
> a uma_zfree as uma_zfree also does nothing if the input is NULL...
This was already done in the patch (the removal of the NULL check in
knote_free()).

I usually do not add excessive inline keywords. Compilers are good,
sometimes even too good, at figuring out the possibilities for inlining.
knote_free() is inlined automatically.

> > @@ -1948,7 +1948,7 @@ knote(struct knlist *list, long hint, int lockflags)
> >      * only safe if you want to remove the current item, which we are
> >      * not doing.
> >      */
> > -   SLIST_FOREACH(kn, &list->kl_list, kn_selnext) {
> > +   SLIST_FOREACH_SAFE(kn, &list->kl_list, kn_selnext, tkn) {
>
> Clearly you didn't read the comment that precedes this line, or at
> least didn't update it:
>  * SLIST_FOREACH, SLIST_FOREACH_SAFE is not safe in our case, it is
>  * only safe if you want to remove the current item, which we are
>  * not doing.
>
> So, you'll need to be more specific in why this needs to change...
> When I wrote this code, I spent a lot of time looking at this, and
> reasoned as to why SLIST_FOREACH_SAFE was NOT correct usage here...
I explained what happens in the message. The knote list is modified by
the filter, see the knlist_remove_inevent() call in filt_proc().

>
> >         kq = kn->kn_kq;
> >         KQ_LOCK(kq);
> >         if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
> > @@ -2385,15 +2385,16 @@ SYSINIT(knote, SI_SUB_PSEUDO, SI_ORDER_ANY,
> > knote_init, NULL);
> >  static struct knote *
> >  knote_alloc(int waitok)
> >  {
> > -   return ((struct knote *)uma_zalloc(knote_zone,
> > -       (waitok ? M_WAITOK : M_NOWAIT)|M_ZERO));
> > +
> > +   return (uma_zalloc(knote_zone, (waitok ? M_WAITOK : M_NOWAIT) |
> > +       M_ZERO));
> >  }
> >
> >  static void
>
> per above, we should add inline here...
>
> >  knote_free(struct knote *kn)
> >  {
> > -   if (kn != NULL)
> > -       uma_zfree(knote_zone, kn);
> > +
> > +   uma_zfree(knote_zone, kn);
> >  }
> >
> >  /*
>
> I agree w/ all the non-SLIST changes, but I disagree w/ the SLIST
> change as I don't believe that all cases were considered...
What cases do you mean ?

The patch does not unlock the knlist lock in the iteration. As such, the
only thread which could remove elements from the knlist, or rearrange
the list, while the loop is active, is the current thread. So I claim
that only the current iterating element can be removed, and the next
list element stays valid. This is enough for the _SAFE loop to work.

Why do you think that _SAFE is incorrect ?

The comment talks about a very different case, where the knlist lock is
dropped. Then indeed, another thread may iterate in parallel, and
invalidate the memoized next element while KN_INFLUX is set for the
current element and the knlist lock is dropped. But _SAFE in sys/queue.h
never means 'safe for parallel mutators', it only means 'safe for the
current iterator removing current element'.

I preferred not to touch the comment until it is confirmed that the
change helps. I reformulated it now, trying to keep the note about
unlock (but is it useful ?).

>
> Anyways, the other changes shouldn't be committed w/ the SLIST change
> as they are unrelated...
Sure, I posted the diff against the WIP branch. The commits will be
split.

diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
index a4388aa..0e26a78 100644
--- a/sys/kern/kern_event.c
+++ b/sys/kern/kern_event.c
@@ -1105,10 +1105,16 @@ kqueue_register(struct kqueue *kq, struct kevent *kev, struct thread *td, int wa
    if (fops == NULL)
        return EINVAL;

-   if (kev->flags & EV_ADD)
-       tkn = knote_alloc(waitok);  /* prevent waiting with locks */
-   else
+   if (kev->flags & EV_ADD) {
+       /*
+        * Prevent waiting with locks. Non-sleepable
+        * allocation failures are handled in the loop, only
+        * if the spare knote appears to be actually required.
+        */
+       tkn = knote_al
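The _SAFE semantics argued over in this message can be sketched in user space. This is only an illustration under stated assumptions, not the kernel code: struct note, fire(), notify_all() and run_demo() are hypothetical names, the "filter" frees the very element it is called on (standing in for knlist_remove_inevent() in filt_proc()), and no locking is modeled. The fallback macro is included because some <sys/queue.h> versions do not ship the _SAFE variants.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Fallback for <sys/queue.h> versions without the _SAFE variant. */
#ifndef SLIST_FOREACH_SAFE
#define SLIST_FOREACH_SAFE(var, head, field, tvar)              \
    for ((var) = SLIST_FIRST((head));                           \
        (var) && ((tvar) = SLIST_NEXT((var), field), 1);        \
        (var) = (tvar))
#endif

/* Hypothetical stand-in for struct knote. */
struct note {
    int id;
    int oneshot;                /* detach itself when the event fires */
    SLIST_ENTRY(note) link;
};
SLIST_HEAD(notelist, note);

/*
 * Hypothetical "filter": a one-shot note unlinks and frees itself,
 * so the caller must not touch n afterwards.  Returns 1 if removed.
 */
static int
fire(struct notelist *list, struct note *n)
{
    if (!n->oneshot)
        return (0);
    SLIST_REMOVE(list, n, note, link);
    free(n);
    return (1);
}

/*
 * The next pointer is saved in tn before fire() runs, so removing the
 * current element is fine.  Nothing else mutates the list meanwhile.
 */
static int
notify_all(struct notelist *list)
{
    struct note *n, *tn;
    int removed = 0;

    SLIST_FOREACH_SAFE(n, list, link, tn)
        removed += fire(list, n);
    return (removed);
}

/* Build the list 1, 2, 3 where notes 1 and 3 are one-shot, then notify. */
static int
run_demo(int *remaining)
{
    struct notelist list = SLIST_HEAD_INITIALIZER(list);
    struct note *n;
    int i, removed;

    for (i = 3; i >= 1; i--) {
        n = malloc(sizeof(*n));
        if (n == NULL)
            abort();
        n->id = i;
        n->oneshot = (i != 2);
        SLIST_INSERT_HEAD(&list, n, link);
    }
    removed = notify_all(&list);

    /* Count and release whatever survived. */
    *remaining = 0;
    while ((n = SLIST_FIRST(&list)) != NULL) {
        SLIST_REMOVE_HEAD(&list, link);
        free(n);
        (*remaining)++;
    }
    return (removed);
}
```

With plain SLIST_FOREACH the loop step would read n->link after fire() had freed n, which matches the use-after-free seen in the panic. As the message notes, the saved next pointer only protects against the iterating thread removing the current element; it gives no protection against other threads changing the list.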
Re: Instant panic while trying run ports-mgmt/poudriere
On 23/08/2015 15:54, Konstantin Belousov wrote:
> After looking at your data closely, I think you are right. The panic
> occurs when the exit1(9) does KNOTE_LOCKED(NOTE_EXIT). This is the
> only case in the tree where filter uses knlist_remove_inevent() to detach
> processed note, so indeed the slist is modified under the iterator.
>
> Below is the patch with the suggested change and unrelated cleanup of
> the uma(9) KPI use. Please test, everybody who has a panic with the
> backtrace pointing to the sys_exit().

Thank you very much!
I no longer get the panic in the test case that previously triggered it.

-- 
Andriy Gapon
Re: Instant panic while trying run ports-mgmt/poudriere
On Sun, Aug 23, 2015 at 12:08:16PM +0300, Konstantin Belousov wrote: > On Sun, Aug 23, 2015 at 09:54:28AM +0300, Andriy Gapon wrote: > > On 12/08/2015 17:11, Lawrence Stewart wrote: > > > On 08/07/15 07:33, Pawel Pekala wrote: > > >> Hi K., > > >> > > >> On 2015-08-06 12:33 -0700, "K. Macy" wrote: > > >>> Is this still happening? > > >> > > >> Still crashes: > > > > > > +1 for me running r286617 > > > > Here is another +1 with r286922. > > I can add a couple of bits of debugging data: > > > > (kgdb) fr 8 > > #8 0x80639d60 in knote (list=0xf8019a733ea0, > > hint=2147483648, lockflags=) at > > /usr/src/sys/kern/kern_event.c:1964 > > 1964} else if ((lockflags & KNF_NOKQLOCK) != 0) { > > (kgdb) p *list > > $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0 > > , kl_unlock = 0x8063a200 , > > kl_assert_locked = 0x8063a220 , > > kl_assert_unlocked = 0x8063a240 , > > kl_lockarg = 0xf8019a733bb0} > > (kgdb) disassemble > > Dump of assembler code for function knote: > > 0x80639d00 : push %rbp > > 0x80639d01 : mov%rsp,%rbp > > 0x80639d04 : push %r15 > > 0x80639d06 : push %r14 > > 0x80639d08 : push %r13 > > 0x80639d0a : push %r12 > > 0x80639d0c : push %rbx > > 0x80639d0d : sub$0x18,%rsp > > 0x80639d11 : mov%edx,%r12d > > 0x80639d14 : mov%rsi,-0x30(%rbp) > > 0x80639d18 : mov%rdi,%rbx > > 0x80639d1b : test %rbx,%rbx > > 0x80639d1e : je 0x80639ef6 > > 0x80639d24 : mov%r12d,%eax > > 0x80639d27 : and$0x1,%eax > > 0x80639d2a : mov%eax,-0x3c(%rbp) > > 0x80639d2d : mov0x28(%rbx),%rdi > > 0x80639d31 : je 0x80639d38 > > 0x80639d33 : callq *0x18(%rbx) > > 0x80639d36 : jmp0x80639d42 > > 0x80639d38 : callq *0x20(%rbx) > > 0x80639d3b : mov0x28(%rbx),%rdi > > 0x80639d3f : callq *0x8(%rbx) > > 0x80639d42 : mov%rbx,-0x38(%rbp) > > 0x80639d46 : mov(%rbx),%rbx > > 0x80639d49 : test %rbx,%rbx > > 0x80639d4c : je 0x80639ee5 > > 0x80639d52 : and$0x2,%r12d > > 0x80639d56 : nopw %cs:0x0(%rax,%rax,1) > > 0x80639d60 : mov0x28(%rbx),%r14 > > > > Panic is in the last quoted instruction. 
> > And: > > (kgdb) i reg > > rax0x246582 > > rbx0xdeadc0dedeadc0de -2401050962867404578 > > rcx0x0 0 > > rdx0x12e302 > > rsi0x80a26a5a -2136839590 > > rdi0x80e81b80 -2132272256 > > rbp0xfe02b7efea20 0xfe02b7efea20 > > rsp0xfe02b7efe9e0 0xfe02b7efe9e0 > > r8 0x80a269ce -2136839730 > > r9 0x80e82838 -2132269000 > > r100x1 65536 > > r110x80fabd10 -2131051248 > > r120x0 0 > > r130xf801ff84a818 -8787511171048 > > r140xf801ff84a800 -8787511171072 > > r150xf8019a6974f0 -8789207452432 > > rip0x80639d60 0x80639d60 > > eflags 0x10286 66182 > > > > I think that $rbx stands out here (this is a kernel with INVARIANTS). > > > > Looking at the code, is it possible that one of the calls from within > > the loop's body modifies the list? If that is so and provided that is a > > valid behavior, then maybe using SLIST_FOREACH_SAFE would help. > > This is first time a useful debugging data was posted. > > The 0x28 offset may indicate either kn_kq member access of the struct > knote, or kq_list of the struct kqueue. > > kl_list.slh_first of the list parameter is NULL, how would a list > iteration loop even start ? Can you look up the list argument value > from the previous frame (%rdi is overwritten, so debugger might be > confused) ? After looking at your data closely, I think you are right. The panic occurs when the exit1(9) does KNOTE_LOCKED(NOTE_EXIT). This is the only case in the tree where filter uses knlist_remove_inevent() to detach processed note, so indeed the slist is modified under the iterator. Below is the patch with the suggested change and unrelated cleanup of the uma(9) KPI use. Please test, everybody who has a panic with the backtrace pointing to the sys_exit(). 
diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c index a4388aa..2f15f7f 100644 --- a/sys/kern/kern_event.c +++ b/sys/kern/kern_event.c @@ -1106,7 +1106,12 @@ kqueue_register(struct kqueue *kq, struct kevent *kev, struct thread *td, int wa return EINVAL; if (kev->flags & EV_ADD) - tkn = knote_alloc(waitok); /* prevent waiting with locks */ + /* +*
Re: Instant panic while trying run ports-mgmt/poudriere
On Sun, Aug 23, 2015 at 09:54:28AM +0300, Andriy Gapon wrote: > On 12/08/2015 17:11, Lawrence Stewart wrote: > > On 08/07/15 07:33, Pawel Pekala wrote: > >> Hi K., > >> > >> On 2015-08-06 12:33 -0700, "K. Macy" wrote: > >>> Is this still happening? > >> > >> Still crashes: > > > > +1 for me running r286617 > > Here is another +1 with r286922. > I can add a couple of bits of debugging data: > > (kgdb) fr 8 > #8 0x80639d60 in knote (list=0xf8019a733ea0, > hint=2147483648, lockflags=) at > /usr/src/sys/kern/kern_event.c:1964 > 1964} else if ((lockflags & KNF_NOKQLOCK) != 0) { > (kgdb) p *list > $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0 > , kl_unlock = 0x8063a200 , > kl_assert_locked = 0x8063a220 , > kl_assert_unlocked = 0x8063a240 , > kl_lockarg = 0xf8019a733bb0} > (kgdb) disassemble > Dump of assembler code for function knote: > 0x80639d00 : push %rbp > 0x80639d01 : mov%rsp,%rbp > 0x80639d04 : push %r15 > 0x80639d06 : push %r14 > 0x80639d08 : push %r13 > 0x80639d0a : push %r12 > 0x80639d0c : push %rbx > 0x80639d0d : sub$0x18,%rsp > 0x80639d11 : mov%edx,%r12d > 0x80639d14 : mov%rsi,-0x30(%rbp) > 0x80639d18 : mov%rdi,%rbx > 0x80639d1b : test %rbx,%rbx > 0x80639d1e : je 0x80639ef6 > 0x80639d24 : mov%r12d,%eax > 0x80639d27 : and$0x1,%eax > 0x80639d2a : mov%eax,-0x3c(%rbp) > 0x80639d2d : mov0x28(%rbx),%rdi > 0x80639d31 : je 0x80639d38 > 0x80639d33 : callq *0x18(%rbx) > 0x80639d36 : jmp0x80639d42 > 0x80639d38 : callq *0x20(%rbx) > 0x80639d3b : mov0x28(%rbx),%rdi > 0x80639d3f : callq *0x8(%rbx) > 0x80639d42 : mov%rbx,-0x38(%rbp) > 0x80639d46 : mov(%rbx),%rbx > 0x80639d49 : test %rbx,%rbx > 0x80639d4c : je 0x80639ee5 > 0x80639d52 : and$0x2,%r12d > 0x80639d56 : nopw %cs:0x0(%rax,%rax,1) > 0x80639d60 : mov0x28(%rbx),%r14 > > Panic is in the last quoted instruction. 
> And: > (kgdb) i reg > rax0x246582 > rbx0xdeadc0dedeadc0de -2401050962867404578 > rcx0x0 0 > rdx0x12e302 > rsi0x80a26a5a -2136839590 > rdi0x80e81b80 -2132272256 > rbp0xfe02b7efea20 0xfe02b7efea20 > rsp0xfe02b7efe9e0 0xfe02b7efe9e0 > r8 0x80a269ce -2136839730 > r9 0x80e82838 -2132269000 > r100x1 65536 > r110x80fabd10 -2131051248 > r120x0 0 > r130xf801ff84a818 -8787511171048 > r140xf801ff84a800 -8787511171072 > r150xf8019a6974f0 -8789207452432 > rip0x80639d60 0x80639d60 > eflags 0x10286 66182 > > I think that $rbx stands out here (this is a kernel with INVARIANTS). > > Looking at the code, is it possible that one of the calls from within > the loop's body modifies the list? If that is so and provided that is a > valid behavior, then maybe using SLIST_FOREACH_SAFE would help. This is first time a useful debugging data was posted. The 0x28 offset may indicate either kn_kq member access of the struct knote, or kq_list of the struct kqueue. kl_list.slh_first of the list parameter is NULL, how would a list iteration loop even start ? Can you look up the list argument value from the previous frame (%rdi is overwritten, so debugger might be confused) ? ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Instant panic while trying run ports-mgmt/poudriere
On 12/08/2015 17:11, Lawrence Stewart wrote: > On 08/07/15 07:33, Pawel Pekala wrote: >> Hi K., >> >> On 2015-08-06 12:33 -0700, "K. Macy" wrote: >>> Is this still happening? >> >> Still crashes: > > +1 for me running r286617 Here is another +1 with r286922. I can add a couple of bits of debugging data: (kgdb) fr 8 #8 0x80639d60 in knote (list=0xf8019a733ea0, hint=2147483648, lockflags=) at /usr/src/sys/kern/kern_event.c:1964 1964} else if ((lockflags & KNF_NOKQLOCK) != 0) { (kgdb) p *list $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0 , kl_unlock = 0x8063a200 , kl_assert_locked = 0x8063a220 , kl_assert_unlocked = 0x8063a240 , kl_lockarg = 0xf8019a733bb0} (kgdb) disassemble Dump of assembler code for function knote: 0x80639d00 : push %rbp 0x80639d01 : mov%rsp,%rbp 0x80639d04 : push %r15 0x80639d06 : push %r14 0x80639d08 : push %r13 0x80639d0a : push %r12 0x80639d0c : push %rbx 0x80639d0d : sub$0x18,%rsp 0x80639d11 : mov%edx,%r12d 0x80639d14 : mov%rsi,-0x30(%rbp) 0x80639d18 : mov%rdi,%rbx 0x80639d1b : test %rbx,%rbx 0x80639d1e : je 0x80639ef6 0x80639d24 : mov%r12d,%eax 0x80639d27 : and$0x1,%eax 0x80639d2a : mov%eax,-0x3c(%rbp) 0x80639d2d : mov0x28(%rbx),%rdi 0x80639d31 : je 0x80639d38 0x80639d33 : callq *0x18(%rbx) 0x80639d36 : jmp0x80639d42 0x80639d38 : callq *0x20(%rbx) 0x80639d3b : mov0x28(%rbx),%rdi 0x80639d3f : callq *0x8(%rbx) 0x80639d42 : mov%rbx,-0x38(%rbp) 0x80639d46 : mov(%rbx),%rbx 0x80639d49 : test %rbx,%rbx 0x80639d4c : je 0x80639ee5 0x80639d52 : and$0x2,%r12d 0x80639d56 : nopw %cs:0x0(%rax,%rax,1) 0x80639d60 : mov0x28(%rbx),%r14 Panic is in the last quoted instruction. 
And: (kgdb) i reg rax0x246582 rbx0xdeadc0dedeadc0de -2401050962867404578 rcx0x0 0 rdx0x12e302 rsi0x80a26a5a -2136839590 rdi0x80e81b80 -2132272256 rbp0xfe02b7efea20 0xfe02b7efea20 rsp0xfe02b7efe9e0 0xfe02b7efe9e0 r8 0x80a269ce -2136839730 r9 0x80e82838 -2132269000 r100x1 65536 r110x80fabd10 -2131051248 r120x0 0 r130xf801ff84a818 -8787511171048 r140xf801ff84a800 -8787511171072 r150xf8019a6974f0 -8789207452432 rip0x80639d60 0x80639d60 eflags 0x10286 66182 I think that $rbx stands out here (this is a kernel with INVARIANTS). Looking at the code, is it possible that one of the calls from within the loop's body modifies the list? If that is so and provided that is a valid behavior, then maybe using SLIST_FOREACH_SAFE would help. -- Andriy Gapon ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
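Why the $rbx value stands out can be shown with a small sketch. It assumes that the debugging allocator overwrites freed memory with the repeating 32-bit pattern 0xdeadc0de, which is what the INVARIANTS remark hints at; struct node, poison() and next_after_poison() are hypothetical names, and poison() merely stands in for the free path without actually freeing anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical list element; only the link pointer matters here. */
struct node {
    struct node *next;
    int value;
};

/* Fill a buffer with a repeating 32-bit junk pattern (assumed value). */
static void
poison(void *mem, size_t len)
{
    const uint32_t junk = 0xdeadc0de;
    unsigned char *p = mem;
    size_t i;

    for (i = 0; i + sizeof(junk) <= len; i += sizeof(junk))
        memcpy(p + i, &junk, sizeof(junk));
}

/*
 * Return the raw bits an iterator would load as "next" from an element
 * that was poisoned while the iterator still pointed at it.
 */
static uintptr_t
next_after_poison(void)
{
    struct node b = { NULL, 2 };
    struct node a = { &b, 1 };
    uintptr_t bits;

    poison(&a, sizeof(a));      /* stands in for free() of a */
    memcpy(&bits, &a.next, sizeof(bits));
    return (bits);
}
```

On a 64-bit machine the returned bits are 0xdeadc0dedeadc0de, the same value as $rbx in the register dump, which is consistent with the loop having loaded its next pointer from an element that had already been freed.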
Re: Instant panic while trying run ports-mgmt/poudriere
On 8/12/15 7:11 AM, Lawrence Stewart wrote:
> On 08/07/15 07:33, Pawel Pekala wrote:
>> Hi K.,
>>
>> On 2015-08-06 12:33 -0700, "K. Macy" wrote:
>>> Is this still happening?
>>
>> Still crashes:
>
> +1 for me running r286617
>

r286510 has been stable in the package build cluster. r286593 is stable
on my own system.

-- 
Regards,
Bryan Drewery
Re: Instant panic while trying run ports-mgmt/poudriere
On 08/07/15 07:33, Pawel Pekala wrote:
> Hi K.,
>
> On 2015-08-06 12:33 -0700, "K. Macy" wrote:
>> Is this still happening?
>
> Still crashes:

+1 for me running r286617

Cheers,
Lawrence
Re: Instant panic while trying run ports-mgmt/poudriere
On 8/10/15 2:47 PM, Pawel Pekala wrote:
> Hi Mateusz,
>
> On 2015-08-06 23:44 +0200, Mateusz Guzik wrote:
>> Sorry, I completely forgot about this.
>>
>> Can you please modify debug flags in your kernel config file to be
>> "-O0 -g3" and reproduce with that? This should allow kgdb to obtain
>> full info (along with exact crash site for inspection) without further
>> tinkering or guessing.
>
> I'm unable to provide this for you, kernel compiled with this flags
> panics during boot at zfs root mount.
>

Try raising kern.kstack_pages to 5 or 6 in the loader prompt too.

-- 
Regards,
Bryan Drewery
Re: Instant panic while trying run ports-mgmt/poudriere
Hi Mateusz,

On 2015-08-06 23:44 +0200, Mateusz Guzik wrote:
>Sorry, I completely forgot about this.
>
>Can you please modify debug flags in your kernel config file to be
>"-O0 -g3" and reproduce with that? This should allow kgdb to obtain
>full info (along with exact crash site for inspection) without further
>tinkering or guessing.

I'm unable to provide this for you: a kernel compiled with these flags
panics during boot at zfs root mount.

-- 
pozdrawiam / with regards
Paweł Pękala
Re: Instant panic while trying run ports-mgmt/poudriere
On Thu, Aug 06, 2015 at 11:33:28PM +0200, Pawel Pekala wrote:
> Hi K.,
>
> On 2015-08-06 12:33 -0700, "K. Macy" wrote:
> >Is this still happening?
>
> Still crashes:
>
> Thu Aug 6 23:22:05 CEST 2015
>
> FreeBSD blaviken.slowicza.org 11.0-CURRENT FreeBSD 11.0-CURRENT #50 r286370:
> Thu Aug 6 19:55:29 CEST 2015
> r...@blaviken.slowicza.org:/usr/obj/hdd/src/sys/GENERIC amd64
>
> panic:
>
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
>
>
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 2; apic id = 02
> instruction pointer = 0x20:0x809d6b80
> stack pointer = 0x28:0xfe046cc68a00
> frame pointer = 0x28:0xfe046cc68a50
> code segment = base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 2147 (sh)
>
> #8 0x80e44652 in calltrap ()
> at /hdd/src/sys/amd64/amd64/exception.S:235
> #9 0x809d6b80 in knote (list=0xf801dbebfea0, hint=2147483648,
> lockflags=) at /hdd/src/sys/kern/kern_event.c:1920
> #10 0x809dc424 in exit1 (td=0xf802bd0559a0,
> rval=, signo=0) at /hdd/src/sys/kern/kern_exit.c:564
> #11 0x809db8cd in sys_sys_exit (td=0x0, uap=)
> at /hdd/src/sys/kern/kern_exit.c:178
> #12 0x80e64c22 in amd64_syscall (td=0xf802bd0559a0, traced=0)
> at subr_syscall.c:133
> #13 0x80e4493b in Xfast_syscall ()
> at /hdd/src/sys/amd64/amd64/exception.S:395
> #14 0x000800922eea in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> Current language: auto; currently minimal
> (kgdb)
>

Sorry, I completely forgot about this.

Can you please modify debug flags in your kernel config file to be
"-O0 -g3" and reproduce with that? This should allow kgdb to obtain
full info (along with exact crash site for inspection) without further
tinkering or guessing.

-- 
Mateusz Guzik
Re: Instant panic while trying run ports-mgmt/poudriere
Hi K., On 2015-08-06 12:33 -0700, "K. Macy" wrote: >Is this still happening? Still crashes: Thu Aug 6 23:22:05 CEST 2015 FreeBSD blaviken.slowicza.org 11.0-CURRENT FreeBSD 11.0-CURRENT #50 r286370: Thu Aug 6 19:55:29 CEST 2015 r...@blaviken.slowicza.org:/usr/obj/hdd/src/sys/GENERIC amd64 panic: GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... Unread portion of the kernel message buffer: Fatal trap 9: general protection fault while in kernel mode cpuid = 2; apic id = 02 instruction pointer = 0x20:0x809d6b80 stack pointer = 0x28:0xfe046cc68a00 frame pointer = 0x28:0xfe046cc68a50 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 2147 (sh) Reading symbols from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for /boot/kernel/zfs.ko.symbols Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for /boot/kernel/opensolaris.ko.symbols Reading symbols from /boot/kernel/amdtemp.ko.symbols...done. Loaded symbols for /boot/kernel/amdtemp.ko.symbols Reading symbols from /boot/modules/cuse4bsd.ko...done. Loaded symbols for /boot/modules/cuse4bsd.ko Reading symbols from /boot/kernel/fuse.ko.symbols...done. Loaded symbols for /boot/kernel/fuse.ko.symbols Reading symbols from /boot/kernel/tmpfs.ko.symbols...done. Loaded symbols for /boot/kernel/tmpfs.ko.symbols Reading symbols from /boot/kernel/radeonkms.ko.symbols...done. Loaded symbols for /boot/kernel/radeonkms.ko.symbols Reading symbols from /boot/kernel/iicbb.ko.symbols...done. 
Loaded symbols for /boot/kernel/iicbb.ko.symbols Reading symbols from /boot/kernel/iicbus.ko.symbols...done. Loaded symbols for /boot/kernel/iicbus.ko.symbols Reading symbols from /boot/kernel/iic.ko.symbols...done. Loaded symbols for /boot/kernel/iic.ko.symbols Reading symbols from /boot/kernel/drm2.ko.symbols...done. Loaded symbols for /boot/kernel/drm2.ko.symbols Reading symbols from /boot/kernel/radeonkmsfw_BARTS_pfp.ko.symbols...done. Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_pfp.ko.symbols Reading symbols from /boot/kernel/radeonkmsfw_BARTS_me.ko.symbols...done. Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_me.ko.symbols Reading symbols from /boot/kernel/radeonkmsfw_BTC_rlc.ko.symbols...done. Loaded symbols for /boot/kernel/radeonkmsfw_BTC_rlc.ko.symbols Reading symbols from /boot/kernel/radeonkmsfw_BARTS_mc.ko.symbols...done. Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_mc.ko.symbols Reading symbols from /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for /boot/kernel/fdescfs.ko.symbols Reading symbols from /boot/kernel/ums.ko.symbols...done. Loaded symbols for /boot/kernel/ums.ko.symbols Reading symbols from /boot/kernel/uhid.ko.symbols...done. Loaded symbols for /boot/kernel/uhid.ko.symbols Reading symbols from /boot/modules/vboxnetflt.ko...done. Loaded symbols for /boot/modules/vboxnetflt.ko Reading symbols from /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for /boot/kernel/netgraph.ko.symbols Reading symbols from /boot/modules/vboxdrv.ko...done. Loaded symbols for /boot/modules/vboxdrv.ko Reading symbols from /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for /boot/kernel/ng_ether.ko.symbols Reading symbols from /boot/modules/vboxnetadp.ko...done. Loaded symbols for /boot/modules/vboxnetadp.ko Reading symbols from /boot/kernel/linux.ko.symbols...done. Loaded symbols for /boot/kernel/linux.ko.symbols Reading symbols from /boot/kernel/linux_common.ko.symbols...done. 
Loaded symbols for /boot/kernel/linux_common.ko.symbols Reading symbols from /boot/kernel/nullfs.ko.symbols...done. Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading symbols from /boot/kernel/linprocfs.ko.symbols...done. Loaded symbols for /boot/kernel/linprocfs.ko.symbols Reading symbols from /boot/kernel/sem.ko.symbols...done. Loaded symbols for /boot/kernel/sem.ko.symbols #0 doadump (textdump=0) at pcpu.h:221 221 pcpu.h: No such file or directory. in pcpu.h (kgdb) #0 doadump (textdump=0) at pcpu.h:221 #1 0x80377f5e in db_dump (dummy=, dummy2=false, dummy3=0, dummy4=0x0) at /hdd/src/sys/ddb/db_command.c:533 #2 0x80377ad1 in db_command (cmd_table=0x0) at /hdd/src/sys/ddb/db_command.c:440 #3 0x80377764 in db_command_loop () at /hdd/src/sys/ddb/db_command.c:493 #4 0x8037a31b in db_trap (type=, code=0) at /hdd/src/sys/ddb/db_main.c:251 #5 0x80a57074 in kdb_trap (type=9, code=0, tf=) at /hdd/src/sys/kern/sub
Re: Instant panic while trying run ports-mgmt/poudriere
Is this still happening? On Jul 15, 2015 1:41 PM, "Pawel Pekala" wrote: > Hi John-Mark, > > On 2015-07-15 11:05 -0700, John-Mark Gurney wrote: > >Please repost the entire panic message, and the back trace w/o X > >running... Also, if you could share the core and kernel w/ me (you can > >email me directly if you'd like), that'd help. > > Fatal trap 9: general protection fault while in kernel mode > cpuid = 1; apic id = 01 > instruction pointer = 0x20:0x809338c0 > stack pointer = 0x28:0xfe046c818a00 > frame pointer = 0x28:0xfe046c818a50 > code segment= base 0x0, limit 0xf, type 0x1b > = DPL 0, pres 1, long 1, def32 0, gran 1 > processor eflags= interrupt enabled, resume, IOPL = 0 > current process = 1491 (sh) > > Reading symbols from /boot/kernel/zfs.ko.symbols...done. > Loaded symbols for /boot/kernel/zfs.ko.symbols > Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. > Loaded symbols for /boot/kernel/opensolaris.ko.symbols > Reading symbols from /boot/kernel/amdtemp.ko.symbols...done. > Loaded symbols for /boot/kernel/amdtemp.ko.symbols > Reading symbols from /boot/modules/cuse4bsd.ko...done. > Loaded symbols for /boot/modules/cuse4bsd.ko > Reading symbols from /boot/kernel/fuse.ko.symbols...done. > Loaded symbols for /boot/kernel/fuse.ko.symbols > Reading symbols from /boot/kernel/tmpfs.ko.symbols...done. > Loaded symbols for /boot/kernel/tmpfs.ko.symbols > Reading symbols from /boot/kernel/radeonkms.ko.symbols...done. > Loaded symbols for /boot/kernel/radeonkms.ko.symbols > Reading symbols from /boot/kernel/iicbb.ko.symbols...done. > Loaded symbols for /boot/kernel/iicbb.ko.symbols > Reading symbols from /boot/kernel/iicbus.ko.symbols...done. > Loaded symbols for /boot/kernel/iicbus.ko.symbols > Reading symbols from /boot/kernel/iic.ko.symbols...done. > Loaded symbols for /boot/kernel/iic.ko.symbols > Reading symbols from /boot/kernel/drm2.ko.symbols...done. 
> Loaded symbols for /boot/kernel/drm2.ko.symbols > Reading symbols from /boot/kernel/radeonkmsfw_BARTS_pfp.ko.symbols...done. > Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_pfp.ko.symbols > Reading symbols from /boot/kernel/radeonkmsfw_BARTS_me.ko.symbols...done. > Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_me.ko.symbols > Reading symbols from /boot/kernel/radeonkmsfw_BTC_rlc.ko.symbols...done. > Loaded symbols for /boot/kernel/radeonkmsfw_BTC_rlc.ko.symbols > Reading symbols from /boot/kernel/radeonkmsfw_BARTS_mc.ko.symbols...done. > Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_mc.ko.symbols > Reading symbols from /boot/kernel/fdescfs.ko.symbols...done. > Loaded symbols for /boot/kernel/fdescfs.ko.symbols > Reading symbols from /boot/kernel/ums.ko.symbols...done. > Loaded symbols for /boot/kernel/ums.ko.symbols > Reading symbols from /boot/kernel/uhid.ko.symbols...done. > Loaded symbols for /boot/kernel/uhid.ko.symbols > Reading symbols from /boot/kernel/linux.ko.symbols...done. > Loaded symbols for /boot/kernel/linux.ko.symbols > Reading symbols from /boot/kernel/linux_common.ko.symbols...done. > Loaded symbols for /boot/kernel/linux_common.ko.symbols > Reading symbols from /boot/kernel/nullfs.ko.symbols...done. > Loaded symbols for /boot/kernel/nullfs.ko.symbols > Reading symbols from /boot/kernel/linprocfs.ko.symbols...done. > Loaded symbols for /boot/kernel/linprocfs.ko.symbols > Reading symbols from /boot/kernel/sem.ko.symbols...done. > Loaded symbols for /boot/kernel/sem.ko.symbols > #0 doadump (textdump=0) at pcpu.h:221 > 221 pcpu.h: No such file or directory. 
> in pcpu.h > (kgdb) #0 doadump (textdump=0) at pcpu.h:221 > #1 0x8035b45e in db_dump (dummy=, > dummy2=false, > dummy3=0, dummy4=0x0) at /hdd/src/sys/ddb/db_command.c:533 > #2 0x8035afd1 in db_command (cmd_table=0x0) > at /hdd/src/sys/ddb/db_command.c:440 > #3 0x8035ac64 in db_command_loop () > at /hdd/src/sys/ddb/db_command.c:493 > #4 0x8035d7fb in db_trap (type=, code=0) > at /hdd/src/sys/ddb/db_main.c:251 > #5 0x809b4094 in kdb_trap (type=9, code=0, tf= out>) > at /hdd/src/sys/kern/subr_kdb.c:654 > #6 0x80d9e065 in trap_fatal (frame=0xfe046c818950, > eva=) at /hdd/src/sys/amd64/amd64/trap.c:848 > #7 0x80d9dd33 in trap (frame=) > at /hdd/src/sys/amd64/amd64/trap.c:201 > #8 0x80d7ecb2 in calltrap () > at /hdd/src/sys/amd64/amd64/exception.S:235 > #9 0x809338c0 in knote (list=0xf80013ae4408, hint=2147483648, > lockflags=) at /hdd/src/sys/kern/kern_event.c:1920 > #10 0x80938ef1 in exit1 (td=0xf800135c5980, > rv=) at /hdd/src/sys/kern/kern_exit.c:559 > #11 0x809383be in sys_sys_exit (td=0x0, uap=) > at /hdd/src/sys/kern/kern_exit.c:177 > #12 0x80d9e8d2 in amd64_syscall (td=0xf800135c5980, traced=0) > at subr_syscall.c:133 > #13 0x80d7ef9b in Xfast_syscall () > at /hd
Re: Instant panic while trying run ports-mgmt/poudriere
Hi John-Mark,

On 2015-07-15 11:05 -0700, John-Mark Gurney wrote:
>Please repost the entire panic message, and the back trace w/o X
>running... Also, if you could share the core and kernel w/ me (you can
>email me directly if you'd like), that'd help.

Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; apic id = 01
instruction pointer     = 0x20:0x809338c0
stack pointer           = 0x28:0xfe046c818a00
frame pointer           = 0x28:0xfe046c818a50
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1491 (sh)

Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
Reading symbols from /boot/kernel/amdtemp.ko.symbols...done.
Loaded symbols for /boot/kernel/amdtemp.ko.symbols
Reading symbols from /boot/modules/cuse4bsd.ko...done.
Loaded symbols for /boot/modules/cuse4bsd.ko
Reading symbols from /boot/kernel/fuse.ko.symbols...done.
Loaded symbols for /boot/kernel/fuse.ko.symbols
Reading symbols from /boot/kernel/tmpfs.ko.symbols...done.
Loaded symbols for /boot/kernel/tmpfs.ko.symbols
Reading symbols from /boot/kernel/radeonkms.ko.symbols...done.
Loaded symbols for /boot/kernel/radeonkms.ko.symbols
Reading symbols from /boot/kernel/iicbb.ko.symbols...done.
Loaded symbols for /boot/kernel/iicbb.ko.symbols
Reading symbols from /boot/kernel/iicbus.ko.symbols...done.
Loaded symbols for /boot/kernel/iicbus.ko.symbols
Reading symbols from /boot/kernel/iic.ko.symbols...done.
Loaded symbols for /boot/kernel/iic.ko.symbols
Reading symbols from /boot/kernel/drm2.ko.symbols...done.
Loaded symbols for /boot/kernel/drm2.ko.symbols
Reading symbols from /boot/kernel/radeonkmsfw_BARTS_pfp.ko.symbols...done.
Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_pfp.ko.symbols
Reading symbols from /boot/kernel/radeonkmsfw_BARTS_me.ko.symbols...done.
Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_me.ko.symbols
Reading symbols from /boot/kernel/radeonkmsfw_BTC_rlc.ko.symbols...done.
Loaded symbols for /boot/kernel/radeonkmsfw_BTC_rlc.ko.symbols
Reading symbols from /boot/kernel/radeonkmsfw_BARTS_mc.ko.symbols...done.
Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_mc.ko.symbols
Reading symbols from /boot/kernel/fdescfs.ko.symbols...done.
Loaded symbols for /boot/kernel/fdescfs.ko.symbols
Reading symbols from /boot/kernel/ums.ko.symbols...done.
Loaded symbols for /boot/kernel/ums.ko.symbols
Reading symbols from /boot/kernel/uhid.ko.symbols...done.
Loaded symbols for /boot/kernel/uhid.ko.symbols
Reading symbols from /boot/kernel/linux.ko.symbols...done.
Loaded symbols for /boot/kernel/linux.ko.symbols
Reading symbols from /boot/kernel/linux_common.ko.symbols...done.
Loaded symbols for /boot/kernel/linux_common.ko.symbols
Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
Loaded symbols for /boot/kernel/nullfs.ko.symbols
Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
Loaded symbols for /boot/kernel/linprocfs.ko.symbols
Reading symbols from /boot/kernel/sem.ko.symbols...done.
Loaded symbols for /boot/kernel/sem.ko.symbols
#0 doadump (textdump=0) at pcpu.h:221
221 pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0 doadump (textdump=0) at pcpu.h:221
#1 0x8035b45e in db_dump (dummy=<value optimized out>, dummy2=false,
    dummy3=0, dummy4=0x0) at /hdd/src/sys/ddb/db_command.c:533
#2 0x8035afd1 in db_command (cmd_table=0x0)
    at /hdd/src/sys/ddb/db_command.c:440
#3 0x8035ac64 in db_command_loop ()
    at /hdd/src/sys/ddb/db_command.c:493
#4 0x8035d7fb in db_trap (type=<value optimized out>, code=0)
    at /hdd/src/sys/ddb/db_main.c:251
#5 0x809b4094 in kdb_trap (type=9, code=0, tf=<value optimized out>)
    at /hdd/src/sys/kern/subr_kdb.c:654
#6 0x80d9e065 in trap_fatal (frame=0xfe046c818950,
    eva=<value optimized out>) at /hdd/src/sys/amd64/amd64/trap.c:848
#7 0x80d9dd33 in trap (frame=<value optimized out>)
    at /hdd/src/sys/amd64/amd64/trap.c:201
#8 0x80d7ecb2 in calltrap ()
    at /hdd/src/sys/amd64/amd64/exception.S:235
#9 0x809338c0 in knote (list=0xf80013ae4408, hint=2147483648,
    lockflags=<value optimized out>) at /hdd/src/sys/kern/kern_event.c:1920
#10 0x80938ef1 in exit1 (td=0xf800135c5980, rv=<value optimized out>)
    at /hdd/src/sys/kern/kern_exit.c:559
#11 0x809383be in sys_sys_exit (td=0x0, uap=<value optimized out>)
    at /hdd/src/sys/kern/kern_exit.c:177
#12 0x80d9e8d2 in amd64_syscall (td=0xf800135c5980, traced=0)
    at subr_syscall.c:133
#13 0x80d7ef9b in Xfast_syscall ()
    at /hdd/src/sys/amd64/amd64/exception.S:395
#14 0x000800922f3a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language: auto; currently minimal

-- 
pozdrawiam / with regards
Paweł Pękala
Re: Instant panic while trying run ports-mgmt/poudriere
Pawel Pekala wrote this message on Wed, Jul 15, 2015 at 17:46 +0200:
> On 2015-07-14 15:38 -0700, John-Mark Gurney wrote:
> >Pawel Pekala wrote this message on Mon, Jul 13, 2015 at 23:12 +0200:
> >> Let me know if you need more details.
> >
> >Were you running X at the time of the crash? and if so, can you try
> >to reproduce w/o X running? It's hard to know if the panic (and you
> >didn't include the panic string) is due to kern_event, or trying to
> >do too much in the console driver.
> >
> >Thanks.
>
> Last tests were done with X running yes. Today I did same test with all
> services commented out in rc.conf (including X) and did get same result.
> Poudriere causes kernel panic always in the same spot:
>
> [00:00:39] >> Calculating ports order and dependencies

Please repost the entire panic message, and the back trace w/o X
running... Also, if you could share the core and kernel w/ me (you can
email me directly if you'd like), that'd help.

Thanks.

-- 
John-Mark Gurney                Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
Re: Instant panic while trying run ports-mgmt/poudriere
Hi John-Mark,

On 2015-07-14 15:38 -0700, John-Mark Gurney wrote:
>Pawel Pekala wrote this message on Mon, Jul 13, 2015 at 23:12 +0200:
>> Let me know if you need more details.
>
>Were you running X at the time of the crash? and if so, can you try
>to reproduce w/o X running? It's hard to know if the panic (and you
>didn't include the panic string) is due to kern_event, or trying to
>do too much in the console driver.
>
>Thanks.

The last tests were done with X running, yes. Today I ran the same test
with all services commented out in rc.conf (including X) and got the same
result. Poudriere always causes a kernel panic at the same spot:

[00:00:39] >> Calculating ports order and dependencies

-- 
pozdrawiam / with regards
Paweł Pękala
Re: Instant panic while trying run ports-mgmt/poudriere
Hi John-Mark,

On 2015-07-14 15:27 -0700, John-Mark Gurney wrote:
>Pawel Pekala wrote this message on Tue, Jul 14, 2015 at 22:47 +0200:
>> On 2015-07-13 23:28 +0200, Mateusz Guzik wrote:
>> >On Mon, Jul 13, 2015 at 11:12:05PM +0200, Pawel Pekala wrote:
>> >> Hi
>> >>
>> >> I'm getting 100% reproducible kernel crash while trying build
>> >> ports with poudriere on my system. This started to show up about
>> >> 2-3 weeks ago. I upgrade my system on weekly basis usually on
>> >> saturday. Here's backtrace:
>> >>
>> >> (kgdb) bt
>> >[..]
>> >>     at /hdd/src/sys/amd64/amd64/trap.c:201
>> >> #25 0x80dace32 in calltrap ()
>> >>     at /hdd/src/sys/amd64/amd64/exception.S:235
>> >> #26 0x80941430 in knote (list=0xf801a2589408, hint=2147483648,
>> >>     lockflags=<value optimized out>)
>> >>     at /hdd/src/sys/kern/kern_event.c:1920
>> >> #27 0x80946a51 in exit1 (td=0xf801b84014d0,
>> >>     rv=<value optimized out>)
>> >>     at /hdd/src/sys/kern/kern_exit.c:560
>> >> #28 0x80945f1e in sys_sys_exit (td=0x0,
>> >>     uap=<value optimized out>)
>> >>     at /hdd/src/sys/kern/kern_exit.c:178
>> >> #29 0x80dcdaa2 in amd64_syscall (td=0xf801b84014d0, traced=0)
>> >>     at subr_syscall.c:133
>> >> #30 0x80dad11b in Xfast_syscall ()
>> >>     at /hdd/src/sys/amd64/amd64/exception.S:395
>> >> #31 0x000800922eea in ?? ()
>> >> Previous frame inner to this frame (corrupt stack?)
>> >> Current language: auto; currently minimal
>> >>
>> >> Let me know if you need more details.
>> >
>> >
>> >Well, if the problem is really that reproducible it would be best if
>> >you narrowed it down to the exact commit.
>> >
>> >However, quick look suggests you may be a "victim" of r284861.
>>
>> After further testing I can confirm that this panic was introduced in
>> r284861, thanks for the hint!
>
>Can you tell me what your line 1920 of kern_event.c is? (and the
>context around it? Or at least the $FreeBSD$ line from
>kern_event.c? Because in HEAD, the line is:
>        } else if ((lockflags & KNF_NOKQLOCK) != 0) {
>
>and there isn't a way to fault on that code...

Yes, this is strange.
        if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
            /*
             * Do not process the influx notes, except for
             * the influx coming from the kq unlock in the
             * kqueue_scan(). In the later case, we do
             * not interfere with the scan, since the code
             * fragment in kqueue_scan() locks the knlist,
             * and cannot proceed until we finished.
             */
            KQ_UNLOCK(kq);
===> line 1920
        } else if ((lockflags & KNF_NOKQLOCK) != 0) {
            kn->kn_status |= KN_INFLUX;
            KQ_UNLOCK(kq);
            error = kn->kn_fop->f_event(kn, hint);
            KQ_LOCK(kq);
            kn->kn_status &= ~KN_INFLUX;
            if (error)
                KNOTE_ACTIVATE(kn, 1);
            KQ_UNLOCK_FLUX(kq);
        } else {

Id line:

__FBSDID("$FreeBSD: head/sys/kern/kern_event.c 284215 2015-06-10 10:48:12Z mjg $");

-- 
pozdrawiam / with regards
Paweł Pękala
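On the question of how a fault can be reported at line 1920: the `} else if` test itself does not dereference anything, but a fault attributed to it is consistent with the diagnosis given elsewhere in this thread, namely that the knote being visited can be removed while the loop runs, so the loop's advance step reads a link from a note that is already gone. The following is only a minimal user-space sketch of that pattern in C, not the kernel code. `struct note`, `struct note_list` and all function names are hypothetical and exist purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a knote: only the list linkage matters here. */
struct note {
	int id;
	struct note *next;
};

/* Hypothetical stand-in for the list head. */
struct note_list {
	struct note *first;
};

/* Insert a new note at the head; returns 0 on success, -1 if malloc fails. */
static int
note_list_push(struct note_list *list, int id)
{
	struct note *n = malloc(sizeof(*n));

	if (n == NULL)
		return (-1);
	n->id = id;
	n->next = list->first;
	list->first = n;
	return (0);
}

/* Unlink and free one note, as an event handler might do to its own note. */
static void
note_list_remove(struct note_list *list, struct note *victim)
{
	struct note **pp = &list->first;

	while (*pp != NULL && *pp != victim)
		pp = &(*pp)->next;
	if (*pp != NULL) {
		*pp = victim->next;
		free(victim);
	}
}

/*
 * Visit every note while the "handler" removes the note being visited.
 * The successor is saved before the handler runs, which is the idea behind
 * the *_FOREACH_SAFE macros; reading n->next after the removal would be a
 * use-after-free.  Returns the number of notes visited.
 */
int
visit_and_remove_all(struct note_list *list)
{
	struct note *n, *next;
	int visited = 0;

	for (n = list->first; n != NULL; n = next) {
		next = n->next;		/* must happen before n is freed */
		note_list_remove(list, n);
		visited++;
	}
	return (visited);
}

/* Build a list of count notes, run the traversal, report notes visited. */
int
demo_remove_all(int count)
{
	struct note_list list = { NULL };
	int i, visited;

	for (i = 0; i < count; i++) {
		if (note_list_push(&list, i) != 0) {
			/* Allocation failed: free what was built and give up. */
			(void)visit_and_remove_all(&list);
			return (-1);
		}
	}
	visited = visit_and_remove_all(&list);
	assert(list.first == NULL);	/* every note was unlinked and freed */
	return (visited);
}
```

Calling `demo_remove_all(3)` from a test program walks and empties a three-element list. Moving the `next = n->next` assignment into the loop's increment expression would reproduce the unsafe variant, where the link is read from freed memory.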
Re: Instant panic while trying run ports-mgmt/poudriere
Pawel Pekala wrote this message on Mon, Jul 13, 2015 at 23:12 +0200:
> Let me know if you need more details.

Were you running X at the time of the crash? And if so, can you try
to reproduce w/o X running? It's hard to know if the panic (and you
didn't include the panic string) is due to kern_event, or trying to
do too much in the console driver.

Thanks.

-- 
John-Mark Gurney                Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
Re: Instant panic while trying run ports-mgmt/poudriere
Pawel Pekala wrote this message on Tue, Jul 14, 2015 at 22:47 +0200:
> On 2015-07-13 23:28 +0200, Mateusz Guzik wrote:
> >On Mon, Jul 13, 2015 at 11:12:05PM +0200, Pawel Pekala wrote:
> >> Hi
> >>
> >> I'm getting 100% reproducible kernel crash while trying build ports
> >> with poudriere on my system. This started to show up about 2-3 weeks
> >> ago. I upgrade my system on weekly basis usually on saturday.
> >> Here's backtrace:
> >>
> >> (kgdb) bt
> >[..]
> >>     at /hdd/src/sys/amd64/amd64/trap.c:201
> >> #25 0x80dace32 in calltrap ()
> >>     at /hdd/src/sys/amd64/amd64/exception.S:235
> >> #26 0x80941430 in knote (list=0xf801a2589408, hint=2147483648,
> >>     lockflags=<value optimized out>)
> >>     at /hdd/src/sys/kern/kern_event.c:1920
> >> #27 0x80946a51 in exit1 (td=0xf801b84014d0,
> >>     rv=<value optimized out>)
> >>     at /hdd/src/sys/kern/kern_exit.c:560
> >> #28 0x80945f1e in sys_sys_exit (td=0x0,
> >>     uap=<value optimized out>)
> >>     at /hdd/src/sys/kern/kern_exit.c:178
> >> #29 0x80dcdaa2 in amd64_syscall (td=0xf801b84014d0, traced=0)
> >>     at subr_syscall.c:133
> >> #30 0x80dad11b in Xfast_syscall ()
> >>     at /hdd/src/sys/amd64/amd64/exception.S:395
> >> #31 0x000800922eea in ?? ()
> >> Previous frame inner to this frame (corrupt stack?)
> >> Current language: auto; currently minimal
> >>
> >> Let me know if you need more details.
> >
> >
> >Well, if the problem is really that reproducible it would be best if
> >you narrowed it down to the exact commit.
> >
> >However, quick look suggests you may be a "victim" of r284861.
>
> After further testing I can confirm that this panic was introduced in
> r284861, thanks for the hint!

Can you tell me what your line 1920 of kern_event.c is? (And the
context around it?) Or at least the $FreeBSD$ line from
kern_event.c? Because in HEAD, the line is:
        } else if ((lockflags & KNF_NOKQLOCK) != 0) {

and there isn't a way to fault on that code...

-- 
John-Mark Gurney                Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
Re: Instant panic while trying run ports-mgmt/poudriere
Hi Mateusz,

On 2015-07-13 23:28 +0200, Mateusz Guzik wrote:
>On Mon, Jul 13, 2015 at 11:12:05PM +0200, Pawel Pekala wrote:
>> Hi
>>
>> I'm getting 100% reproducible kernel crash while trying build ports
>> with poudriere on my system. This started to show up about 2-3 weeks
>> ago. I upgrade my system on weekly basis usually on saturday.
>> Here's backtrace:
>>
>> (kgdb) bt
>[..]
>>     at /hdd/src/sys/amd64/amd64/trap.c:201
>> #25 0x80dace32 in calltrap ()
>>     at /hdd/src/sys/amd64/amd64/exception.S:235
>> #26 0x80941430 in knote (list=0xf801a2589408, hint=2147483648,
>>     lockflags=<value optimized out>)
>>     at /hdd/src/sys/kern/kern_event.c:1920
>> #27 0x80946a51 in exit1 (td=0xf801b84014d0, rv=<value optimized out>)
>>     at /hdd/src/sys/kern/kern_exit.c:560
>> #28 0x80945f1e in sys_sys_exit (td=0x0, uap=<value optimized out>)
>>     at /hdd/src/sys/kern/kern_exit.c:178
>> #29 0x80dcdaa2 in amd64_syscall (td=0xf801b84014d0, traced=0)
>>     at subr_syscall.c:133
>> #30 0x80dad11b in Xfast_syscall ()
>>     at /hdd/src/sys/amd64/amd64/exception.S:395
>> #31 0x000800922eea in ?? ()
>> Previous frame inner to this frame (corrupt stack?)
>> Current language: auto; currently minimal
>>
>> Let me know if you need more details.
>
>
>Well, if the problem is really that reproducible it would be best if
>you narrowed it down to the exact commit.
>
>However, quick look suggests you may be a "victim" of r284861.

After further testing I can confirm that this panic was introduced in
r284861, thanks for the hint!

-- 
pozdrawiam / with regards
Paweł Pękala
Re: Instant panic while trying run ports-mgmt/poudriere
Hi Mateusz,

On 2015-07-13 23:28 +0200, Mateusz Guzik wrote:
>On Mon, Jul 13, 2015 at 11:12:05PM +0200, Pawel Pekala wrote:
>> Hi
>>
>> I'm getting 100% reproducible kernel crash while trying build ports
>> with poudriere on my system. This started to show up about 2-3 weeks
>> ago. I upgrade my system on weekly basis usually on saturday.
>> Here's backtrace:
>>
>> (kgdb) bt
>[..]
>>     at /hdd/src/sys/amd64/amd64/trap.c:201
>> #25 0x80dace32 in calltrap ()
>>     at /hdd/src/sys/amd64/amd64/exception.S:235
>> #26 0x80941430 in knote (list=0xf801a2589408, hint=2147483648,
>>     lockflags=<value optimized out>)
>>     at /hdd/src/sys/kern/kern_event.c:1920
>> #27 0x80946a51 in exit1 (td=0xf801b84014d0, rv=<value optimized out>)
>>     at /hdd/src/sys/kern/kern_exit.c:560
>> #28 0x80945f1e in sys_sys_exit (td=0x0, uap=<value optimized out>)
>>     at /hdd/src/sys/kern/kern_exit.c:178
>> #29 0x80dcdaa2 in amd64_syscall (td=0xf801b84014d0, traced=0)
>>     at subr_syscall.c:133
>> #30 0x80dad11b in Xfast_syscall ()
>>     at /hdd/src/sys/amd64/amd64/exception.S:395
>> #31 0x000800922eea in ?? ()
>> Previous frame inner to this frame (corrupt stack?)
>> Current language: auto; currently minimal
>>
>> Let me know if you need more details.
>
>
>Well, if the problem is really that reproducible it would be best if
>you narrowed it down to the exact commit.
>
>However, quick look suggests you may be a "victim" of r284861.
>
>Can you enter kgdb and:
>f 26
>p *list
>
>?
(kgdb) f 26
#26 0x80941430 in knote (list=0xf801a2589408, hint=2147483648,
    lockflags=<value optimized out>) at /hdd/src/sys/kern/kern_event.c:1920
1920            } else if ((lockflags & KNF_NOKQLOCK) != 0) {
Current language: auto; currently minimal
(kgdb) p *list
$1 = {kl_list = {slh_first = 0x0}, kl_lock = 0x809418e0,
  kl_unlock = 0x80941900, kl_assert_locked = 0x80941920,
  kl_assert_unlocked = 0x80941940, kl_lockarg = 0xf801a2589120}

Forgot to add my uname -a last time:

FreeBSD blaviken.slowicza.org 11.0-CURRENT FreeBSD 11.0-CURRENT #44 r285509: Mon Jul 13 22:38:11 CEST 2015 c...@blaviken.slowicza.org:/usr/obj/hdd/src/sys/GENERIC amd64

-- 
pozdrawiam / with regards
Paweł Pękala
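For anyone reading the `p *list` output: it prints the list head that was passed to knote(), and `slh_first = 0x0` means the list held no entries by the time the dump was taken. A rough C sketch of that layout follows, using the field names from the kgdb output. The types, the `_stub` names, the `kn_next` link and the helper functions are assumptions made for illustration; they are not the definitions from the FreeBSD headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list element; kn_next is an invented link field name. */
struct knote_stub {
	struct knote_stub *kn_next;
};

/* Singly linked list head, as suggested by the slh_first member. */
struct klist_stub {
	struct knote_stub *slh_first;
};

/* Field names follow the kgdb output; the types are assumed. */
struct knlist_stub {
	struct klist_stub kl_list;
	void (*kl_lock)(void *);
	void (*kl_unlock)(void *);
	void (*kl_assert_locked)(void *);
	void (*kl_assert_unlocked)(void *);
	void *kl_lockarg;
};

/* Count the entries reachable from the list head. */
int
knlist_stub_count(const struct knlist_stub *knl)
{
	const struct knote_stub *kn;
	int n = 0;

	for (kn = knl->kl_list.slh_first; kn != NULL; kn = kn->kn_next)
		n++;
	return (n);
}

/* A head like the one in the dump: slh_first is NULL, so nothing to visit. */
int
demo_dump_like_list(void)
{
	struct knlist_stub knl = { .kl_list = { .slh_first = NULL } };

	return (knlist_stub_count(&knl));
}

/* For contrast, a head with a single entry hanging off it. */
int
demo_one_entry(void)
{
	struct knote_stub kn = { .kn_next = NULL };
	struct knlist_stub knl = { .kl_list = { .slh_first = &kn } };

	return (knlist_stub_count(&knl));
}
```

A loop over a head in this state would not execute its body at all, so a fault inside the loop points to the list having changed after the iteration started.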
Re: Instant panic while trying run ports-mgmt/poudriere
On Mon, Jul 13, 2015 at 11:12:05PM +0200, Pawel Pekala wrote:
> Hi
>
> I'm getting 100% reproducible kernel crash while trying build ports
> with poudriere on my system. This started to show up about 2-3 weeks
> ago. I upgrade my system on weekly basis usually on saturday.
> Here's backtrace:
>
> (kgdb) bt
[..]
>     at /hdd/src/sys/amd64/amd64/trap.c:201
> #25 0x80dace32 in calltrap ()
>     at /hdd/src/sys/amd64/amd64/exception.S:235
> #26 0x80941430 in knote (list=0xf801a2589408, hint=2147483648,
>     lockflags=<value optimized out>) at /hdd/src/sys/kern/kern_event.c:1920
> #27 0x80946a51 in exit1 (td=0xf801b84014d0, rv=<value optimized out>)
>     at /hdd/src/sys/kern/kern_exit.c:560
> #28 0x80945f1e in sys_sys_exit (td=0x0, uap=<value optimized out>)
>     at /hdd/src/sys/kern/kern_exit.c:178
> #29 0x80dcdaa2 in amd64_syscall (td=0xf801b84014d0, traced=0)
>     at subr_syscall.c:133
> #30 0x80dad11b in Xfast_syscall ()
>     at /hdd/src/sys/amd64/amd64/exception.S:395
> #31 0x000800922eea in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> Current language: auto; currently minimal
>
> Let me know if you need more details.

Well, if the problem is really that reproducible, it would be best if
you narrowed it down to the exact commit.

However, a quick look suggests you may be a "victim" of r284861.

Can you enter kgdb and:
f 26
p *list

?

-- 
Mateusz Guzik
Instant panic while trying run ports-mgmt/poudriere
Hi

I'm getting a 100% reproducible kernel crash while trying to build ports
with poudriere on my system. This started to show up about 2-3 weeks
ago. I upgrade my system on a weekly basis, usually on Saturday.
Here's the backtrace:

(kgdb) bt
#0 doadump (textdump=1) at pcpu.h:221
#1 0x80984625 in kern_reboot (howto=260)
    at /hdd/src/sys/kern/kern_shutdown.c:447
#2 0x80984c18 in vpanic (fmt=<value optimized out>,
    ap=<value optimized out>) at /hdd/src/sys/kern/kern_shutdown.c:744
#3 0x80984c63 in panic (fmt=0x0)
    at /hdd/src/sys/kern/kern_shutdown.c:675
#4 0x8098e281 in mi_switch (flags=<value optimized out>,
    newtd=<value optimized out>) at /hdd/src/sys/kern/kern_synch.c:406
#5 0x809d5991 in turnstile_wait (ts=<value optimized out>, owner=0x0,
    queue=<value optimized out>) at /hdd/src/sys/kern/subr_turnstile.c:751
#6 0x8098234d in __rw_wlock_hard (c=0x8185bd18, tid=18446735285002704080,
    file=<value optimized out>, line=<value optimized out>)
    at /hdd/src/sys/kern/kern_rwlock.c:898
#7 0x80981f74 in _rw_wlock_cookie (c=<value optimized out>,
    file=0x810e0c29 "/hdd/src/sys/amd64/amd64/pmap.c", line=3690)
    at /hdd/src/sys/kern/kern_rwlock.c:268
#8 0x80dbcf1e in pmap_remove_all (m=0xf8041a03e450)
    at /hdd/src/sys/amd64/amd64/pmap.c:3690
#9 0x80c30986 in cdev_pager_free_page (object=<value optimized out>,
    m=0xf8041a03e450) at /hdd/src/sys/vm/device_pager.c:215
#10 0x8223ce30 in ttm_bo_release_mmap (bo=0xf800cd06e848)
    at /hdd/src/sys/modules/drm2/drm2/../../../dev/drm2/ttm/ttm_bo_vm.c:390
#11 0x82238a7c in ttm_bo_unmap_virtual (bo=0xf800cd06e848)
    at /hdd/src/sys/modules/drm2/drm2/../../../dev/drm2/ttm/ttm_bo.c:1651
#12 0x82081365 in radeon_pm_set_clocks (rdev=0xfe000133d000)
    at /hdd/src/sys/modules/drm2/radeonkms/../../../dev/drm2/radeon/radeon_pm.c:146
#13 0x82081e4e in radeon_pm_compute_clocks (rdev=<value optimized out>)
    at /hdd/src/sys/modules/drm2/radeonkms/../../../dev/drm2/radeon/radeon_pm.c:777
#14 0x82093b63 in atombios_crtc_dpms (crtc=<value optimized out>,
    mode=<value optimized out>)
    at /hdd/src/sys/modules/drm2/radeonkms/../../../dev/drm2/radeon/atombios_crtc.c:277
#15 0x820955f9 in atombios_crtc_prepare (crtc=0xf80005c7f000)
    at /hdd/src/sys/modules/drm2/radeonkms/../../../dev/drm2/radeon/atombios_crtc.c:1829
#16 0x8221e938 in drm_crtc_helper_set_mode (crtc=0xf80005c7f000,
    mode=0xf80005775d00, x=0, y=0, old_fb=0xf80005c63100)
    at /hdd/src/sys/modules/drm2/drm2/../../../dev/drm2/drm_crtc_helper.c:454
#17 0x8221f504 in drm_crtc_helper_set_config (set=0xf80005742000)
    at /hdd/src/sys/modules/drm2/drm2/../../../dev/drm2/drm_crtc_helper.c:752
#18 0x822255c6 in vt_kms_postswitch (arg=<value optimized out>)
    at /hdd/src/sys/modules/drm2/drm2/../../../dev/drm2/drm_fb_helper.c:344
#19 0x8081f249 in vt_window_switch (vw=0x81558208)
    at /hdd/src/sys/dev/vt/vt_core.c:531
#20 0x8081ce83 in vtterm_cngrab (tm=<value optimized out>)
    at /hdd/src/sys/dev/vt/vt_core.c:1423
#21 0x8092f225 in cngrab () at /hdd/src/sys/kern/kern_cons.c:368
#22 0x809c1a9a in kdb_trap (type=9, code=0, tf=<value optimized out>)
    at /hdd/src/sys/kern/subr_kdb.c:651
#23 0x80dcd235 in trap_fatal (frame=0xfe046cfad950,
    eva=<value optimized out>) at /hdd/src/sys/amd64/amd64/trap.c:848
#24 0x80dccf03 in trap (frame=<value optimized out>)
    at /hdd/src/sys/amd64/amd64/trap.c:201
#25 0x80dace32 in calltrap ()
    at /hdd/src/sys/amd64/amd64/exception.S:235
#26 0x80941430 in knote (list=0xf801a2589408, hint=2147483648,
    lockflags=<value optimized out>) at /hdd/src/sys/kern/kern_event.c:1920
#27 0x80946a51 in exit1 (td=0xf801b84014d0, rv=<value optimized out>)
    at /hdd/src/sys/kern/kern_exit.c:560
#28 0x80945f1e in sys_sys_exit (td=0x0, uap=<value optimized out>)
    at /hdd/src/sys/kern/kern_exit.c:178
#29 0x80dcdaa2 in amd64_syscall (td=0xf801b84014d0, traced=0)
    at subr_syscall.c:133
#30 0x80dad11b in Xfast_syscall ()
    at /hdd/src/sys/amd64/amd64/exception.S:395
#31 0x000800922eea in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language: auto; currently minimal

Let me know if you need more details.

-- 
pozdrawiam / with regards
Paweł Pękala