Re: Instant panic while trying run ports-mgmt/poudriere

2015-09-01 Thread Andriy Gapon


John-Mark,

with all due respect, I have to invoke the forest-vs-trees argument here:

- it is established that in the knote() loop the current knote member of the
klist can be removed
- it's a fact that getting a pointer to the next element from a removed element
is an illegal operation
- FOREACH_SAFE is specifically designed to handle exactly this kind of
iteration
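
The three points above can be sketched in userland C. This is a minimal illustration and not the kernel code: `struct note`, `deliver_all()` and `run_demo()` are hypothetical stand-ins for the knote, the knote() loop and a caller, and the fallback macro is only an assumption for systems whose `<sys/queue.h>` lacks the _SAFE variant.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Some libc headers omit the _SAFE variant; this fallback follows the BSD form. */
#ifndef SLIST_FOREACH_SAFE
#define SLIST_FOREACH_SAFE(var, head, field, tvar)              \
    for ((var) = SLIST_FIRST((head));                           \
        (var) && ((tvar) = SLIST_NEXT((var), field), 1);        \
        (var) = (tvar))
#endif

/* Hypothetical stand-in for struct knote. */
struct note {
    int oneshot;                /* remove and free after delivery */
    SLIST_ENTRY(note) link;
};
SLIST_HEAD(note_list, note);

/*
 * Visit every note; one-shot notes are unlinked and freed inside the loop.
 * The next pointer is saved in tmp before the body runs, so the freed
 * element is never read again.  Returns the number of notes visited.
 */
static int
deliver_all(struct note_list *list)
{
    struct note *n, *tmp;
    int visited = 0;

    SLIST_FOREACH_SAFE(n, list, link, tmp) {
        visited++;
        if (n->oneshot) {
            SLIST_REMOVE(list, n, note, link);
            free(n);
        }
    }
    return (visited);
}

/* Build a three-element list, deliver, and report what is left. */
static void
run_demo(int *visited, int *remaining)
{
    struct note_list list = SLIST_HEAD_INITIALIZER(list);
    struct note *n;
    int i;

    for (i = 0; i < 3; i++) {
        n = malloc(sizeof(*n));
        assert(n != NULL);
        n->oneshot = (i != 1);  /* keep only the middle note */
        SLIST_INSERT_HEAD(&list, n, link);
    }
    *visited = deliver_all(&list);
    *remaining = 0;
    while ((n = SLIST_FIRST(&list)) != NULL) {
        SLIST_REMOVE_HEAD(&list, link);
        free(n);
        (*remaining)++;
    }
}
```

With plain SLIST_FOREACH the loop step would evaluate SLIST_NEXT(n, link) on the element that was just freed, which is the illegal operation named in the second point.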



-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: Instant panic while trying run ports-mgmt/poudriere

2015-09-01 Thread Ed Maste

On 1 September 2015 at 15:01, John-Mark Gurney  wrote:
>
> But I would ask you to respect my maintainership of the code... Just
> because you get paid to work on FreeBSD full time does not mean you
> get to run roughshod over other people's work and force them to work
> on your time frame...  Other people have jobs, and families and
> responsibilities too...

A quick comment on this point, on behalf of the FreeBSD Foundation
(and not core): working for the Foundation as either permanent staff
or on a project grant conveys no special status with respect to making
changes in FreeBSD. Staff and project developers are expected to abide
by the same rules and social conventions when interacting with the
FreeBSD community.

That said, the discussion and diagnosis of this issue has been ongoing
for about ten days, and avg provided a detailed sequence of events
five days ago. In this case the patch fixed a panic that several
people were experiencing, was tested by several people who experienced
the panic, and received review. In my opinion r287366 was handled in a
fair and reasonable fashion.

Re: Instant panic while trying run ports-mgmt/poudriere

2015-09-01 Thread John-Mark Gurney

Konstantin Belousov wrote this message on Tue, Sep 01, 2015 at 21:44 +0300:
> On Tue, Sep 01, 2015 at 11:24:06AM -0700, John-Mark Gurney wrote:
> > Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 23:21 +0300:
> > > On 27/08/2015 21:09, John-Mark Gurney wrote:
> > > > Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 10:21 +0300:
> > > >> On 27/08/2015 02:36, John-Mark Gurney wrote:
> > > >>> We should/cannot get here w/ an empty list.  If we do, then there is
> > > >>> something seriously wrong...  The current kn (which we must have as we
> > > >>> are here) MUST be on the list, but as you just showed, there are no
> > > >>> knotes on the list.
> > > >>>
> > > >>> Can you get me a print of the knote?  That way I can see what flags
> > > >>> are on it?
> > > >>
> > > >> Apologies if the following might sound a little bit patronizing, but it
> > > >> seems that you have got all the facts correctly, but somehow the
> > > >> connection between them did not become clear.
> > > >>
> > > >> So:
> > > >> 1. The list originally is NOT empty.  I guess that it has one entry, but
> > > >> that's an unimportant detail.
> > > >> 2. This is why the loop is entered. It's a fact that it is entered.
> > > >> 3. The list becomes empty precisely because the entry is removed during
> > > >> the iteration in the loop (as kib has explained).  It's a fact that the
> > > >> list became empty at least in the panic that I reported.
> > > > 
> > > > On your latest dump, you said:
> > > > Here is another +1 with r286922.
> > > > I can add a couple of bits of debugging data:
> > > >
> > > > (kgdb) fr 8
> > > > #8  0x80639d60 in knote (list=0xf8019a733ea0,
> > > > hint=2147483648, lockflags=) at
> > > > /usr/src/sys/kern/kern_event.c:1964
> > > > 1964    } else if ((lockflags & KNF_NOKQLOCK) != 0) {
> > > > 
> > > > First off, that can't be r286922, per:
> > > > https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922
> > > > 
> > > > line 1964 is blank...  The line of code above should be at line 1884,
> > > > so not sure what is wrong here...
> > > 
> > > No, it can not be indeed, because I am running head.
> > > r286922 was the latest version of the repository, not the head branch,
> > > at the moment when I pulled the repository via git.
> > > 
> > > > Assuming that the pc really is at the line, f_event has not yet been
> > > > called,
> > > 
> > > Even on the second loop iteration?
> > > 
> > > >which is why I said that the list cannot be empty yet, as
> > > > f_event hasn't been called yet to remove the knote...  It could be that
> > > > optimization moved stuff around, but if that is the case, then the
> > > > above wasn't useful..
> > > 
> > > I provided the disassembly of the code as well, it's very obvious how
> > > the code was translated.
> > > 
> > > >> 4. The element is not only unlinked from the list, but its memory is
> > > >> also freed.
> > > > 
> > > > Where is the memory freed?  A knote MUST NOT be freed in an f_event
> > > > handler.  The only location that a list element is allowed to be
> > > > freed is in knote_drop, which must happen after f_detach is called,
> > > > but that can't/won't happen from knote (I believe the timer handles
> > > > this specially, but we are talking about normal knlist type filters)..
> > > 
> > > Well, right.  knote()->filt_proc()->knlist_remove_inevent() just removes
> > > the knote from the list.  But then there is KNOTE_ACTIVATE() that passes
> > > the knote to a different owner (so to say). And given that the knote has
> > > EV_ONESHOT set on it (in filt_proc) and that poudriere can put quite a
> > > stress load on a system, I am not surprised that another thread gets a
> > > chance to call knote_drop() on the knote before the original thread
> > > proceeds to the next iteration.
> > 
> > Ok, I think I have identified the race that you guys were trying to
> > tell me about, and though the _SAFE macro would be a similar fix, I'm
> > going to rewrite the loop so that this is more explicit on what
> > is happening here...
> > 
> > So, the race is this...  In knote, when the note is removed by
> > f_event, things are fine until the KQ lock is dropped...  Once this
> > lock is dropped, effective ownership of the knote is transferred
> > from the knlist to the kq lock as the _DETACHED flag is now set,
> > which means that reading any fields from that note is undefined..
> > 
> > Once the kq lock is released in knote, then it is possible for a
> > function like kqueue_scan to end up knote_drop'ing the note..

Re: Instant panic while trying run ports-mgmt/poudriere

2015-09-01 Thread Konstantin Belousov

On Tue, Sep 01, 2015 at 11:24:06AM -0700, John-Mark Gurney wrote:
> Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 23:21 +0300:
> > On 27/08/2015 21:09, John-Mark Gurney wrote:
> > > Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 10:21 +0300:
> > >> On 27/08/2015 02:36, John-Mark Gurney wrote:
> > >>> We should/cannot get here w/ an empty list.  If we do, then there is
> > >>> something seriously wrong...  The current kn (which we must have as we
> > >>> are here) MUST be on the list, but as you just showed, there are no
> > >>> knotes on the list.
> > >>>
> > >>> Can you get me a print of the knote?  That way I can see what flags
> > >>> are on it?
> > >>
> > >> Apologies if the following might sound a little bit patronizing, but it
> > >> seems that you have got all the facts correctly, but somehow the
> > >> connection between them did not become clear.
> > >>
> > >> So:
> > >> 1. The list originally is NOT empty.  I guess that it has one entry, but
> > >> that's an unimportant detail.
> > >> 2. This is why the loop is entered. It's a fact that it is entered.
> > >> 3. The list becomes empty precisely because the entry is removed during
> > >> the iteration in the loop (as kib has explained).  It's a fact that the
> > >> list became empty at least in the panic that I reported.
> > > 
> > > On your latest dump, you said:
> > > Here is another +1 with r286922.
> > > I can add a couple of bits of debugging data:
> > >
> > > (kgdb) fr 8
> > > #8  0x80639d60 in knote (list=0xf8019a733ea0,
> > > hint=2147483648, lockflags=) at
> > > /usr/src/sys/kern/kern_event.c:1964
> > > 1964    } else if ((lockflags & KNF_NOKQLOCK) != 0) {
> > > 
> > > First off, that can't be r286922, per:
> > > https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922
> > > 
> > > line 1964 is blank...  The line of code above should be at line 1884,
> > > so not sure what is wrong here...
> > 
> > No, it can not be indeed, because I am running head.
> > r286922 was the latest version of the repository, not the head branch,
> > at the moment when I pulled the repository via git.
> > 
> > > Assuming that the pc really is at the line, f_event has not yet been
> > > called,
> > 
> > Even on the second loop iteration?
> > 
> > >which is why I said that the list cannot be empty yet, as
> > > f_event hasn't been called yet to remove the knote...  It could be that
> > > optimization moved stuff around, but if that is the case, then the
> > > above wasn't useful..
> > 
> > I provided the disassembly of the code as well, it's very obvious how
> > the code was translated.
> > 
> > >> 4. The element is not only unlinked from the list, but its memory is
> > >> also freed.
> > > 
> > > Where is the memory freed?  A knote MUST NOT be freed in an f_event
> > > handler.  The only location that a list element is allowed to be
> > > freed is in knote_drop, which must happen after f_detach is called,
> > > but that can't/won't happen from knote (I believe the timer handles
> > > this specially, but we are talking about normal knlist type filters)..
> > 
> > Well, right.  knote()->filt_proc()->knlist_remove_inevent() just removes
> > the knote from the list.  But then there is KNOTE_ACTIVATE() that passes
> > the knote to a different owner (so to say). And given that the knote has
> > EV_ONESHOT set on it (in filt_proc) and that poudriere can put quite a
> > stress load on a system, I am not surprised that another thread gets a
> > chance to call knote_drop() on the knote before the original thread
> > proceeds to the next iteration.
> 
> Ok, I think I have identified the race that you guys were trying to
> tell me about, and though the _SAFE macro would be a similar fix, I'm
> going to rewrite the loop so that this is more explicit on what
> is happening here...
> 
> So, the race is this...  In knote, when the note is removed by
> f_event, things are fine until the KQ lock is dropped...  Once this
> lock is dropped, effective ownership of the knote is transferred
> from the knlist to the kq lock as the _DETACHED flag is now set,
> which means that reading any fields from that note is undefined..
> 
> Once the kq lock is released in knote, then it is possible for a
> function like kqueue_scan to end up knote_drop'ing the note...
Did you read the commit message and my previous N messages about the
subject?  Can you point me at a difference between the commit message
and the above text?

I object to your pointless and fact-less backout request
and have no intention of complying with it.

> 
>

Re: Instant panic while trying run ports-mgmt/poudriere

2015-09-01 Thread John-Mark Gurney

Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 23:21 +0300:
> On 27/08/2015 21:09, John-Mark Gurney wrote:
> > Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 10:21 +0300:
> >> On 27/08/2015 02:36, John-Mark Gurney wrote:
> >>> We should/cannot get here w/ an empty list.  If we do, then there is
> >>> something seriously wrong...  The current kn (which we must have as we
> >>> are here) MUST be on the list, but as you just showed, there are no
> >>> knotes on the list.
> >>>
> >>> Can you get me a print of the knote?  That way I can see what flags
> >>> are on it?
> >>
> >> Apologies if the following might sound a little bit patronizing, but it
> >> seems that you have got all the facts correctly, but somehow the
> >> connection between them did not become clear.
> >>
> >> So:
> >> 1. The list originally is NOT empty.  I guess that it has one entry, but
> >> that's an unimportant detail.
> >> 2. This is why the loop is entered. It's a fact that it is entered.
> >> 3. The list becomes empty precisely because the entry is removed during
> >> the iteration in the loop (as kib has explained).  It's a fact that the
> >> list became empty at least in the panic that I reported.
> > 
> > On your latest dump, you said:
> > Here is another +1 with r286922.
> > I can add a couple of bits of debugging data:
> >
> > (kgdb) fr 8
> > #8  0x80639d60 in knote (list=0xf8019a733ea0,
> > hint=2147483648, lockflags=) at
> > /usr/src/sys/kern/kern_event.c:1964
> > 1964    } else if ((lockflags & KNF_NOKQLOCK) != 0) {
> > 
> > First off, that can't be r286922, per:
> > https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922
> > 
> > line 1964 is blank...  The line of code above should be at line 1884,
> > so not sure what is wrong here...
> 
> No, it can not be indeed, because I am running head.
> r286922 was the latest version of the repository, not the head branch,
> at the moment when I pulled the repository via git.
> 
> > Assuming that the pc really is at the line, f_event has not yet been
> > called,
> 
> Even on the second loop iteration?
> 
> >which is why I said that the list cannot be empty yet, as
> > f_event hasn't been called yet to remove the knote...  It could be that
> > optimization moved stuff around, but if that is the case, then the
> > above wasn't useful..
> 
> I provided the disassembly of the code as well, it's very obvious how
> the code was translated.
> 
> >> 4. The element is not only unlinked from the list, but its memory is
> >> also freed.
> > 
> > Where is the memory freed?  A knote MUST NOT be freed in an f_event
> > handler.  The only location that a list element is allowed to be
> > freed is in knote_drop, which must happen after f_detach is called,
> > but that can't/won't happen from knote (I believe the timer handles
> > this specially, but we are talking about normal knlist type filters)..
> 
> Well, right.  knote()->filt_proc()->knlist_remove_inevent() just removes
> the knote from the list.  But then there is KNOTE_ACTIVATE() that passes
> the knote to a different owner (so to say). And given that the knote has
> EV_ONESHOT set on it (in filt_proc) and that poudriere can put quite a
> stress load on a system, I am not surprised that another thread gets a
> chance to call knote_drop() on the knote before the original thread
> proceeds to the next iteration.

Ok, I think I have identified the race that you guys were trying to
tell me about, and though the _SAFE macro would be a similar fix, I'm
going to rewrite the loop so that this is more explicit on what
is happening here...

So, the race is this...  In knote, when the note is removed by
f_event, things are fine until the KQ lock is dropped...  Once this
lock is dropped, effective ownership of the knote is transferred
from the knlist to the kq lock as the _DETACHED flag is now set,
which means that reading any fields from that note is undefined..

Once the kq lock is released in knote, then it is possible for a
function like kqueue_scan to end up knote_drop'ing the note...

Upon further examination, we may have another race as in knote_drop,
when we call f_detach, we don't have the list locked, nor kq, which
means that knlist_remove_inevent could be modifying the list at the same
time that kqueue_register could be modifying it to remove a _DELETED
note...

I'd like to close both races at the same time since they go
hand in hand...
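
One way to make the hand-off explicit is to write the loop out by hand and read the successor before the handler runs. The following is only a sketch under assumed names (`struct item`, `walk_explicit()`, `consume_odd()` and `push()` are hypothetical, and no locking is modelled); it is not the rewrite proposed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical singly linked element; stands in for a knote on a knlist. */
struct item {
    int value;
    struct item *next;
};

/* Handler contract: return true if it unlinked the item and gave it away. */
typedef bool (*event_fn)(struct item **head, struct item *it);

/* Example handler: unlinks and frees items with odd values. */
static bool
consume_odd(struct item **head, struct item *it)
{
    struct item **pp;

    if (it->value % 2 == 0)
        return (false);
    for (pp = head; *pp != it; pp = &(*pp)->next)
        ;
    *pp = it->next;
    free(it);
    return (true);
}

/*
 * Explicit loop: the successor is read while the current item is still
 * owned by the list.  After fn() returns true the item must not be touched.
 * Returns how many items the handler consumed.
 */
static int
walk_explicit(struct item **head, event_fn fn)
{
    struct item *it, *next;
    int consumed = 0;

    for (it = *head; it != NULL; it = next) {
        next = it->next;        /* read before ownership can move */
        if (fn(head, it))
            consumed++;         /* 'it' is gone; use only 'next' */
    }
    return (consumed);
}

/* Push a new item at the head; returns false on allocation failure. */
static bool
push(struct item **head, int value)
{
    struct item *it = malloc(sizeof(*it));

    if (it == NULL)
        return (false);
    it->value = value;
    it->next = *head;
    *head = it;
    return (true);
}
```

The effect is the same as the _SAFE macro; the only difference is that the point where the element stops being safe to read is visible in the loop body.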

> > The rest of your explanation is invalid due to the invalid assumption
> > of this point...
> 
> Eagerly waiting for your explanation...
> 
> > If you can provide to

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-27 Thread Lawrence Stewart

On 08/23/15 22:54, Konstantin Belousov wrote:
> On Sun, Aug 23, 2015 at 12:08:16PM +0300, Konstantin Belousov wrote:
>> On Sun, Aug 23, 2015 at 09:54:28AM +0300, Andriy Gapon wrote:
>>> On 12/08/2015 17:11, Lawrence Stewart wrote:
>>>> On 08/07/15 07:33, Pawel Pekala wrote:
>>>>> Hi K.,
>>>>>
>>>>> On 2015-08-06 12:33 -0700, "K. Macy"  wrote:
>>>>>> Is this still happening?
>>>>>
>>>>> Still crashes:
>>>>
>>>> +1 for me running r286617
>>>
>>> Here is another +1 with r286922.
>>> I can add a couple of bits of debugging data:
>>>
>>> (kgdb) fr 8
>>> #8  0x80639d60 in knote (list=0xf8019a733ea0,
>>> hint=2147483648, lockflags=) at
>>> /usr/src/sys/kern/kern_event.c:1964
>>> 1964    } else if ((lockflags & KNF_NOKQLOCK) != 0) {
>>> (kgdb) p *list
>>> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0
>>> , kl_unlock = 0x8063a200 ,
>>>   kl_assert_locked = 0x8063a220 ,
>>> kl_assert_unlocked = 0x8063a240 ,
>>>   kl_lockarg = 0xf8019a733bb0}
>>> (kgdb) disassemble
>>> Dump of assembler code for function knote:
>>> 0x80639d00 :  push   %rbp
>>> 0x80639d01 :  mov    %rsp,%rbp
>>> 0x80639d04 :  push   %r15
>>> 0x80639d06 :  push   %r14
>>> 0x80639d08 :  push   %r13
>>> 0x80639d0a :  push   %r12
>>> 0x80639d0c :  push   %rbx
>>> 0x80639d0d :  sub    $0x18,%rsp
>>> 0x80639d11 :  mov    %edx,%r12d
>>> 0x80639d14 :  mov    %rsi,-0x30(%rbp)
>>> 0x80639d18 :  mov    %rdi,%rbx
>>> 0x80639d1b :  test   %rbx,%rbx
>>> 0x80639d1e :  je     0x80639ef6
>>> 0x80639d24 :  mov    %r12d,%eax
>>> 0x80639d27 :  and    $0x1,%eax
>>> 0x80639d2a :  mov    %eax,-0x3c(%rbp)
>>> 0x80639d2d :  mov    0x28(%rbx),%rdi
>>> 0x80639d31 :  je     0x80639d38
>>> 0x80639d33 :  callq  *0x18(%rbx)
>>> 0x80639d36 :  jmp    0x80639d42
>>> 0x80639d38 :  callq  *0x20(%rbx)
>>> 0x80639d3b :  mov    0x28(%rbx),%rdi
>>> 0x80639d3f :  callq  *0x8(%rbx)
>>> 0x80639d42 :  mov    %rbx,-0x38(%rbp)
>>> 0x80639d46 :  mov    (%rbx),%rbx
>>> 0x80639d49 :  test   %rbx,%rbx
>>> 0x80639d4c :  je     0x80639ee5
>>> 0x80639d52 :  and    $0x2,%r12d
>>> 0x80639d56 :  nopw   %cs:0x0(%rax,%rax,1)
>>> 0x80639d60 :  mov    0x28(%rbx),%r14
>>>
>>> Panic is in the last quoted instruction.
>>> And:
>>> (kgdb) i reg
>>> rax            0x246    582
>>> rbx            0xdeadc0dedeadc0de       -2401050962867404578
>>> rcx            0x0      0
>>> rdx            0x12e    302
>>> rsi            0x80a26a5a       -2136839590
>>> rdi            0x80e81b80       -2132272256
>>> rbp            0xfe02b7efea20   0xfe02b7efea20
>>> rsp            0xfe02b7efe9e0   0xfe02b7efe9e0
>>> r8             0x80a269ce       -2136839730
>>> r9             0x80e82838       -2132269000
>>> r10            0x1  65536
>>> r11            0x80fabd10       -2131051248
>>> r12            0x0      0
>>> r13            0xf801ff84a818   -8787511171048
>>> r14            0xf801ff84a800   -8787511171072
>>> r15            0xf8019a6974f0   -8789207452432
>>> rip            0x80639d60       0x80639d60
>>> eflags         0x10286  66182
>>>
>>> I think that $rbx stands out here (this is a kernel with INVARIANTS).
>>>
>>> Looking at the code, is it possible that one of the calls from within
>>> the loop's body modifies the list?  If that is so and provided that is a
>>> valid behavior, then maybe using SLIST_FOREACH_SAFE would help.
>>
>> This is the first time useful debugging data has been posted.
>>
>> The 0x28 offset may indicate either kn_kq member access of the struct
>> knote, or kq_list of the struct kqueue.
>>
>> kl_list.slh_first of the list parameter is NULL, how would a list
>> iteration loop even start ?  Can you look up the list argument value
>> from the previous frame (%rdi is overwritten, so debugger might be
>> confused) ?
> 
> After looking at your data closely, I think you are right.  The panic
> occurs when the exit1(9) does KNOTE_LOCKED(NOTE_EXIT).  This is the
> only case in the tree where filter uses knlist_remove_inevent() to detach
> processed note, so indeed the slist is modified under the iterator.
> 
> Below is the patch with the suggested change and unrelated cleanup of
> the uma(9) KPI use.  Please test, everybody who has a panic with the
> backtrace pointing to the sys_exit().

Fixes the panic for me too, thanks Kostik.

Cheers,
Lawrence

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-27 Thread Lawrence Stewart

On 08/27/15 17:15, Don Lewis wrote:
> On 27 Aug, Don Lewis wrote:
>> On 27 Aug, Lawrence Stewart wrote:
>>> On 08/27/15 09:36, John-Mark Gurney wrote:
>>>> Andriy Gapon wrote this message on Sun, Aug 23, 2015 at 09:54 +0300:
>>>>> On 12/08/2015 17:11, Lawrence Stewart wrote:
>>>>>> On 08/07/15 07:33, Pawel Pekala wrote:
>>>>>>> Hi K.,
>>>>>>>
>>>>>>> On 2015-08-06 12:33 -0700, "K. Macy"  wrote:
>>>>>>>> Is this still happening?
>>>>>>>
>>>>>>> Still crashes:
>>>>>>
>>>>>> +1 for me running r286617
>>>>>
>>>>> Here is another +1 with r286922.
>>>>> I can add a couple of bits of debugging data:
>>>>>
>>>>> (kgdb) fr 8
>>>>> #8  0x80639d60 in knote (list=0xf8019a733ea0,
>>>>> hint=2147483648, lockflags=) at
>>>>> /usr/src/sys/kern/kern_event.c:1964
>>>>> 1964    } else if ((lockflags & KNF_NOKQLOCK) != 0) {
>>>>> (kgdb) p *list
>>>>> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0
>>>>
>>>> We should/cannot get here w/ an empty list.  If we do, then there is
>>>> something seriously wrong...  The current kn (which we must have as we
>>>> are here) MUST be on the list, but as you just showed, there are no
>>>> knotes on the list.
>>>>
>>>> Can you get me a print of the knote?  That way I can see what flags
>>>> are on it?
>>>
>>> I quickly tried to get this info for you by building my kernel with -O0
>>> and reproducing, but I get an insta-panic on boot with the new kernel:
>>>
>>> Fatal double fault
>>> rip = 0x8218c794
>>> rsp = 0xfe044cdc9fe0
>>> rbp = 0xfe044cdca110
>>> cpuid = 2; apic id = 02
>>> panic: double fault
>>> cpuid = 2
>>> KDB: stack backtrace:
>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>>> 0xfe03dcfffe30
>>> vpanic() at vpanic+0x189/frame 0xfe03dcfffeb0
>>> panic() at panic+0x43/frame 0xfe03dc10
>>> dblfault_handler() at dblfault_handler+0xa2/frame 0xfe03dc30
>>> Xdblfault() at Xdblfault+0xac/frame 0xfe03dc30
>>> --- trap 0x17, rip = 0x8218c794, rsp = 0xfe044cdc9fe0, rbp =
>>> 0xfe044cdca110 ---
>>> vdev_queue_aggregate() at vdev_queue_aggregate+0x34/frame 0xfe044cdca110
>>> vdev_queue_io_to_issue() at vdev_queue_io_to_issue+0x1f5/frame
>>> 0xfe044cdca560
>>> vdev_queue_io() at vdev_queue_io+0x19a/frame 0xfe044cdca5b0
>>> zio_vdev_io_start() at zio_vdev_io_start+0x81f/frame 0xfe044cdca6e0
>>> zio_execute() at zio_execute+0x23b/frame 0xfe044cdca730
>>> zio_nowait() at zio_nowait+0xbe/frame 0xfe044cdca760
>>> vdev_mirror_io_start() at vdev_mirror_io_start+0x140/frame
>>> 0xfe044cdca800
>>> zio_vdev_io_start() at zio_vdev_io_start+0x12f/frame 0xfe044cdca930
>>> zio_execute() at zio_execute+0x23b/frame 0xfe044cdca980
>>> zio_nowait() at zio_nowait+0xbe/frame 0xfe044cdca9b0
>>> spa_load_verify_cb() at spa_load_verify_cb+0x37d/frame 0xfe044cdcaa50
>>> traverse_visitbp() at traverse_visitbp+0x5a5/frame 0xfe044cdcac60
>>> traverse_dnode() at traverse_dnode+0x98/frame 0xfe044cdcacd0
>>> traverse_visitbp() at traverse_visitbp+0xc66/frame 0xfe044cdcaee0
>>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb0f0
>>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb300
>>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb510
>>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb720
>>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb930
>>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcbb40
>>> traverse_dnode() at traverse_dnode+0x98/frame 0xfe044cdcbbb0
>>> traverse_visitbp() at traverse_visitbp+0xe59/frame 0xfe044cdcbdc0
>>> traverse_impl() at traverse_impl+0x79d/frame 0xfe044cdcbfd0
>>> traverse_dataset() at traverse_dataset+0x93/frame 0xfe044cdcc040
>>> traverse_pool() at traverse_pool+0x1f2/frame 0xfe044cdcc140
>>> spa_load_verify() at spa_load_verify+0xf3/frame 0xfe044cdcc1f0
>>> spa_load_impl() at spa_load_impl+0x2069/frame 0xfe044cdcc610
>>> spa_load() at spa_load+0x320/frame 0xfe044cdcc6d0
>>> spa_load_impl() at spa_load_impl+0x150b/frame 0xfe044cdccaf0
>>> spa_load() at spa_load+0x320/frame 0xfe044cdccbb0
>>> spa_load_best() at spa_load_best+0xc6/frame 0xfe044cdccc50
>>> spa_open_common() at spa_open_common+0x246/frame 0xfe044cdccd40
>>> spa_open() at spa_open+0x35/frame 0xfe044cdccd70
>>> dsl_pool_hold() at dsl_pool_hold+0x2d/frame 0xfe044cdccdb0
>>> dmu_objset_own() at dmu_objset_own+0x2e/frame 0xfe044cdcce30
>>> zfsvfs_create() at zfsvfs_create+0x100/frame 0xfe044cdcd050
>>> zfs_domount() at zfs_domount+0xa7/frame 0xfe044cdcd0e0
>>> zfs_mount() at zfs_mount+0x6c3/frame 0xfe044cdcd390
>>> vfs_donmount() at vfs_donmount+0x1330/frame 0xfe044cdcd660
>>> kernel_mount() at kernel_mount+0x62/frame 0xfe044cdcd6c0
>>> parse_mount() at parse_mount+0x668/frame 0xfe044cdcd810
>>> vfs_mountroot() at vfs_mountroot+0x85c/frame 0xfe044cdcd9d

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-27 Thread Andriy Gapon

On 27/08/2015 23:21, Andriy Gapon wrote:
>> > First off, that can't be r286922, per:
>> > https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922
>> > 
>> > line 1964 is blank...  The line of code above should be at line 1884,
>> > so not sure what is wrong here...
> No, it can not be indeed, because I am running head.
> r286922 was the latest version of the repository, not the head branch,
> at the moment when I pulled the repository via git.


Hrm, a small - irrelevant for me, but probably not for you - nit:
r286922 is actually a head commit:
https://svnweb.freebsd.org/base?view=revision&revision=286922

And:
https://svnweb.freebsd.org/base/head/sys/kern/kern_event.c?annotate=286922#l1964

Not sure why you chose to look at stable/10 (given the mailing list).

-- 
Andriy Gapon

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-27 Thread Konstantin Belousov

On Thu, Aug 27, 2015 at 11:09:45AM -0700, John-Mark Gurney wrote:
> Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 10:21 +0300:
> > On 27/08/2015 02:36, John-Mark Gurney wrote:
> > > We should/cannot get here w/ an empty list.  If we do, then there is
> > > something seriously wrong...  The current kn (which we must have as we
> > > are here) MUST be on the list, but as you just showed, there are no
> > > knotes on the list.
> > > 
> > > Can you get me a print of the knote?  That way I can see what flags
> > > are on it?
> > 
> > Apologies if the following might sound a little bit patronizing, but it
> > seems that you have got all the facts correctly, but somehow the
> > connection between them did not become clear.
> > 
> > So:
> > 1. The list originally is NOT empty.  I guess that it has one entry, but
> > that's an unimportant detail.
> > 2. This is why the loop is entered. It's a fact that it is entered.
> > 3. The list becomes empty precisely because the entry is removed during
> > the iteration in the loop (as kib has explained).  It's a fact that the
> > list became empty at least in the panic that I reported.
> 
> On your latest dump, you said:
> Here is another +1 with r286922.
> I can add a couple of bits of debugging data:
>
> (kgdb) fr 8
> #8  0x80639d60 in knote (list=0xf8019a733ea0,
> hint=2147483648, lockflags=) at
> /usr/src/sys/kern/kern_event.c:1964
> 1964    } else if ((lockflags & KNF_NOKQLOCK) != 0) {
> 
> First off, that can't be r286922, per:
> https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922
> 
> line 1964 is blank...  The line of code above should be at line 1884,
> so not sure what is wrong here...
> 
> Assuming that the pc really is at the line, f_event has not yet been
> called, which is why I said that the list cannot be empty yet, as
> f_event hasn't been called yet to remove the knote...  It could be that
> optimization moved stuff around, but if that is the case, then the
> above wasn't useful..
> 
> > 4. The element is not only unlinked from the list, but its memory is
> > also freed.
> 
> Where is the memory freed?  A knote MUST NOT be freed in an f_event
> handler.  The only location that a list element is allowed to be
> freed is in knote_drop, which must happen after f_detach is called,
> but that can't/won't happen from knote (I believe the timer handles
> this specially, but we are talking about normal knlist type filters)..
> 
> The rest of your explanation is invalid due to the invalid assumption
> of this point...
> 
> If you can provide to me where the knote is free'd in knote, w/
> function/line number stack trace (does not have to be dump, but a
> sample call path), then I'll reconsider, and fix that bug...
Sigh.  Did you ever read the mails I sent ?

Look at the filt_proc()->knlist_remove_inevent().

> > 5. That's why we have the use after free: SLIST_FOREACH is trying to get
> > a pointer to a next element from the freed memory.
> > 6. This is why the commit for trashing the freed memory made all the
> > difference: previously the freed memory was unlikely to be re-used /
> > modified, so the use-after-free had a high chance of succeeding.  It's a
> > fact that in my panic there was an attempt to dereference a trashed pointer.
> > 7. Finally, this is why SLIST_FOREACH_SAFE helps here: we stash the
> > pointer to the next element beforehand and, thus, we do not access the
> > freed memory.
> > 
> > Please let me know if you see any fault in the above reasoning or if
> > something is still not clear.
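
Point 6 of the quoted list can be illustrated with a toy allocator that trashes objects on free, in the spirit of the 0xdeadc0de pattern visible in $rbx in the register dump. This is a userland sketch with hypothetical names (`toy_alloc()`, `toy_free()`, `looks_trashed()`); the slots live in a static pool so that reading a released slot stays well defined here, which is not true of real freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRASH_WORD 0xdeadc0deU  /* pattern seen in the register dump */
#define POOL_SLOTS 4

/* Hypothetical list element. */
struct node {
    struct node *next;
    int payload;
};

/* Slots are unions so they can be filled with whole 32-bit trash words. */
union slot {
    struct node node;
    uint32_t words[(sizeof(struct node) + 3) / 4];
};

static union slot pool[POOL_SLOTS];
static bool in_use[POOL_SLOTS];

/* Hand out a zeroed free slot, or NULL when the pool is exhausted. */
static struct node *
toy_alloc(void)
{
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (!in_use[i]) {
            in_use[i] = true;
            memset(&pool[i], 0, sizeof(pool[i]));
            return (&pool[i].node);
        }
    }
    return (NULL);
}

/* Release a slot and overwrite it with the trash pattern. */
static void
toy_free(struct node *n)
{
    union slot *s = (union slot *)n;
    size_t nwords = sizeof(s->words) / sizeof(s->words[0]);

    for (size_t i = 0; i < nwords; i++)
        s->words[i] = TRASH_WORD;
    in_use[s - pool] = false;
}

/* True if every 32-bit chunk of the field holds the trash word. */
static bool
looks_trashed(const void *field, size_t len)
{
    uint32_t w;

    for (size_t off = 0; off + sizeof(w) <= len; off += sizeof(w)) {
        memcpy(&w, (const char *)field + off, sizeof(w));
        if (w != TRASH_WORD)
            return (false);
    }
    return (len >= sizeof(w));
}
```

Without the trashing step a stale walker would load the old, still plausible `next` value and carry on; with it, the same load yields a value made of 0xdeadc0de words, which has the same shape as the $rbx value in the reported panic.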
> 
> -- 
>   John-Mark GurneyVoice: +1 415 225 5579
> 
>  "All that I will do, has been done, All that I have, has not."

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-27 Thread Andriy Gapon

On 27/08/2015 21:09, John-Mark Gurney wrote:
> Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 10:21 +0300:
>> On 27/08/2015 02:36, John-Mark Gurney wrote:
>>> We should/cannot get here w/ an empty list.  If we do, then there is
>>> something seriously wrong...  The current kn (which we must have as we
>>> are here) MUST be on the list, but as you just showed, there are no
>>> knotes on the list.
>>>
>>> Can you get me a print of the knote?  That way I can see what flags
>>> are on it?
>>
>> Apologies if the following might sound a little bit patronizing, but it
>> seems that you have got all the facts correctly, but somehow the
>> connection between them did not become clear.
>>
>> So:
>> 1. The list originally is NOT empty.  I guess that it has one entry, but
>> that's an unimportant detail.
>> 2. This is why the loop is entered. It's a fact that it is entered.
>> 3. The list becomes empty precisely because the entry is removed during
>> the iteration in the loop (as kib has explained).  It's a fact that the
>> list became empty at least in the panic that I reported.
> 
> On your latest dump, you said:
> Here is another +1 with r286922.
> I can add a couple of bits of debugging data:
>
> (kgdb) fr 8
> #8  0x80639d60 in knote (list=0xf8019a733ea0,
> hint=2147483648, lockflags=) at
> /usr/src/sys/kern/kern_event.c:1964
> 1964    } else if ((lockflags & KNF_NOKQLOCK) != 0) {
> 
> First off, that can't be r286922, per:
> https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922
> 
> line 1964 is blank...  The line of code above should be at line 1884,
> so not sure what is wrong here...

No, indeed it cannot be, because I am running head.
r286922 was the latest revision of the repository as a whole, not of the
head branch, at the moment when I pulled the repository via git.

> Assuming that the pc really is at the line, f_event has not yet been
> called,

Even on the second loop iteration?

> which is why I said that the list cannot be empty yet, as
> f_event hasn't been called yet to remove the knote...  It could be that
> optimization moved stuff around, but if that is the case, then the
> above wasn't useful..

I provided the disassembly of the code as well, it's very obvious how
the code was translated.

>> 4. The element is not only unlinked from the list, but its memory is
>> also freed.
> 
> Where is the memory freed?  A knote MUST NOT be freed in an f_event
> handler.  The only location that a list element is allowed to be
> freed is in knote_drop, which must happen after f_detach is called,
> but that can't/won't happen from knote (I believe the timer handles
> this specially, but we are talking about normal knlist type filters)..

Well, right.  knote()->filt_proc()->knlist_remove_inevent() just removes
the knote from the list.  But then there is KNOTE_ACTIVATE() that passes
the knote to a different owner (so to say). And given that the knote has
EV_ONESHOT set on it (in filt_proc) and that poudriere can put quite a
stress load on a system, I am not surprised that another thread gets a
chance to call knote_drop() on the knote before the original thread
proceeds to the next iteration.

> The rest of your explanation is invalid due to the invalid assumption
> of this point...

Eagerly waiting for your explanation...

> If you can provide to me where the knote is free'd in knote, w/
> function/line number stack trace (does not have to be dump, but a
> sample call path), then I'll reconsider, and fix that bug...
>> 5. That's why we have the use after free: SLIST_FOREACH is trying to get
>> a pointer to a next element from the freed memory.
>> 6. This is why the commit for trashing the freed memory made all the
>> difference: previously the freed memory was unlikely to be re-used /
>> modified, so the use-after-free had a high chance of succeeding.  It's a
>> fact that in my panic there was an attempt to dereference a trashed pointer.
>> 7. Finally, this is why SLIST_FOREACH_SAFE helps here: we stash the
>> pointer to the next element beforehand and, thus, we do not access the
>> freed memory.
>>
>> Please let me know if you see any fault in the above reasoning or if
>> something is still not clear.
> 


-- 
Andriy Gapon

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-27 Thread John-Mark Gurney

Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 10:21 +0300:
> On 27/08/2015 02:36, John-Mark Gurney wrote:
> > We should/cannot get here w/ an empty list.  If we do, then there is
> > something seriously wrong...  The current kn (which we must have as we
> > are here) MUST be on the list, but as you just showed, there are no
> > knotes on the list.
> > 
> > Can you get me a print of the knote?  That way I can see what flags
> > are on it?
> 
> Apologies if the following might sound a little bit patronizing, but it
> seems that you have got all the facts correctly, but somehow the
> connection between them did not become clear.
> 
> So:
> 1. The list originally is NOT empty.  I guess that it has one entry, but
> that's an unimportant detail.
> 2. This is why the loop is entered. It's a fact that it is entered.
> 3. The list becomes empty precisely because the entry is removed during
> the iteration in the loop (as kib has explained).  It's a fact that the
> list became empty at least in the panic that I reported.

On your latest dump, you said:
Here is another +1 with r286922.
I can add a couple of bits of debugging data:   

(kgdb) fr 8 
#8  0x80639d60 in knote (list=0xf8019a733ea0,   
hint=2147483648, lockflags=) at
/usr/src/sys/kern/kern_event.c:1964 
1964} else if ((lockflags & KNF_NOKQLOCK) != 0) {   

First off, that can't be r286922, per:
https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922

line 1964 is blank...  The line of code above should be at line 1884,
so not sure what is wrong here...

Assuming that the pc really is at the line, f_event has not yet been
called, which is why I said that the list cannot be empty yet, as
f_event hasn't been called yet to remove the knote...  It could be that
optimization moved stuff around, but if that is the case, then the
above wasn't useful..

> 4. The element is not only unlinked from the list, but its memory is
> also freed.

Where is the memory freed?  A knote MUST NOT be freed in an f_event
handler.  The only location that a list element is allowed to be
freed is in knote_drop, which must happen after f_detach is called,
but that can't/won't happen from knote (I believe the timer handles
this specially, but we are talking about normal knlist type filters)..

The rest of your explanation is invalid due to the invalid assumption
of this point...

If you can provide to me where the knote is free'd in knote, w/
function/line number stack trace (does not have to be dump, but a
sample call path), then I'll reconsider, and fix that bug...
> 5. That's why we have the use after free: SLIST_FOREACH is trying to get
> a pointer to a next element from the freed memory.
> 6. This is why the commit for trashing the freed memory made all the
> difference: previously the freed memory was unlikely to be re-used /
> modified, so the use-after-free had a high chance of succeeding.  It's a
> fact that in my panic there was an attempt to dereference a trashed pointer.
> 7. Finally, this is why SLIST_FOREACH_SAFE helps here: we stash the
> pointer to the next element beforehand and, thus, we do not access the
> freed memory.
> 
> Please let me know if you see any fault in the above reasoning or if
> something is still not clear.

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-27 Thread Konstantin Belousov

On Thu, Aug 27, 2015 at 10:21:47AM +0300, Andriy Gapon wrote:
> On 27/08/2015 02:36, John-Mark Gurney wrote:
> > We should/cannot get here w/ an empty list.  If we do, then there is
> > something seriously wrong...  The current kn (which we must have as we
> > are here) MUST be on the list, but as you just showed, there are no
> > knotes on the list.
> > 
> > Can you get me a print of the knote?  That way I can see what flags
> > are on it?
> 
> Apologies if the following might sound a little bit patronizing, but it
> seems that you have got all the facts correctly, but somehow the
> connection between them did not become clear.
> 
> So:
> 1. The list originally is NOT empty.  I guess that it has one entry, but
> that's an unimportant detail.
> 2. This is why the loop is entered. It's a fact that it is entered.
> 3. The list becomes empty precisely because the entry is removed during
> the iteration in the loop (as kib has explained).  It's a fact that the
> list became empty at least in the panic that I reported.
The only detail I can add to this explanation, which is probably being given
for the third (?) time, is that the removal is done in the filt_proc() event
method, by the call to knlist_remove_inevent().

> 4. The element is not only unlinked from the list, but its memory is
> also freed.
> 5. That's why we have the use after free: SLIST_FOREACH is trying to get
> a pointer to a next element from the freed memory.
> 6. This is why the commit for trashing the freed memory made all the
> difference: previously the freed memory was unlikely to be re-used /
> modified, so the use-after-free had a high chance of succeeding.  It's a
> fact that in my panic there was an attempt to dereference a trashed pointer.
> 7. Finally, this is why SLIST_FOREACH_SAFE helps here: we stash the
> pointer to the next element beforehand and, thus, we do not access the
> freed memory.
The additional, eighth item should explain why the change to _SAFE() is
the correct fix, and not just papering over the problem. Nobody except
the current thread can modify the knlist, because the knlist is locked. As
a consequence, only the current element can be unlinked and removed. So
the _SAFE() iterator actually works.
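
The limit of that argument can be sketched in userland C. This is only an illustration under stated assumptions, not the kern_event.c code: `struct elem`, `unlink_and_release()`, `safe_walk_is_sound()` and `link_up()` are hypothetical names, and a `freed` flag stands in for a real free so that nothing released is ever dereferenced. The walk stashes the successor first, the way a _SAFE iterator does, and reports whether the stash ever went stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct elem {
	struct elem *next;
	bool freed;		/* marks memory a real kernel would have released */
};

enum victim { REMOVE_CURRENT, REMOVE_NEXT };

/* Unlink e from the list rooted at *headp and mark it as released. */
static void
unlink_and_release(struct elem **headp, struct elem *e)
{
	struct elem **pp;

	assert(e != NULL);
	for (pp = headp; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			e->freed = true;
			return;
		}
	}
}

/*
 * Walk the list as a _SAFE iterator does: stash the successor, then run
 * the "handler".  Returns false if the walk ever lands on a released
 * element, i.e. if the stashed pointer went stale.
 */
static bool
safe_walk_is_sound(struct elem **headp, enum victim who)
{
	struct elem *e, *stash;

	for (e = *headp; e != NULL; e = stash) {
		if (e->freed)
			return (false);
		stash = e->next;
		if (who == REMOVE_CURRENT)
			unlink_and_release(headp, e);
		else if (e->next != NULL)
			unlink_and_release(headp, e->next);
	}
	return (true);
}

/* Chain the array elements into a list and return its head. */
static struct elem *
link_up(struct elem *v, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		v[i].freed = false;
		v[i].next = (i + 1 < n) ? &v[i + 1] : NULL;
	}
	return (n > 0 ? &v[0] : NULL);
}
```

With REMOVE_CURRENT the walk visits every element and empties the list; with REMOVE_NEXT the stashed pointer is stale on the very next step. The stash is only trustworthy because the held lock rules out the second case.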

> 
> Please let me know if you see any fault in the above reasoning or if
> something is still not clear.


Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-27 Thread Don Lewis

On 27 Aug, Lawrence Stewart wrote:
> On 08/27/15 09:36, John-Mark Gurney wrote:
>> Andriy Gapon wrote this message on Sun, Aug 23, 2015 at 09:54 +0300:
>>> On 12/08/2015 17:11, Lawrence Stewart wrote:
>>>> On 08/07/15 07:33, Pawel Pekala wrote:
>>>>> Hi K.,
>>>>>
>>>>> On 2015-08-06 12:33 -0700, "K. Macy"  wrote:
>>>>>> Is this still happening?
>>>>>
>>>>> Still crashes:
>>>>
>>>> +1 for me running r286617
>>>
>>> Here is another +1 with r286922.
>>> I can add a couple of bits of debugging data:
>>>
>>> (kgdb) fr 8
>>> #8  0x80639d60 in knote (list=0xf8019a733ea0,
>>> hint=2147483648, lockflags=) at
>>> /usr/src/sys/kern/kern_event.c:1964
>>> 1964} else if ((lockflags & KNF_NOKQLOCK) != 0) {
>>> (kgdb) p *list
>>> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0
>> 
>> We should/cannot get here w/ an empty list.  If we do, then there is
>> something seriously wrong...  The current kn (which we must have as we
>> are here) MUST be on the list, but as you just showed, there are no
>> knotes on the list.
>> 
>> Can you get me a print of the knote?  That way I can see what flags
>> are on it?
> 
> I quickly tried to get this info for you by building my kernel with -O0
> and reproducing, but I get an insta-panic on boot with the new kernel:
> 
> Fatal double fault
> rip = 0x8218c794
> rsp = 0xfe044cdc9fe0
> rbp = 0xfe044cdca110
> cpuid = 2; apic id = 02
> panic: double fault
> cpuid = 2
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfe03dcfffe30
> vpanic() at vpanic+0x189/frame 0xfe03dcfffeb0
> panic() at panic+0x43/frame 0xfe03dc10
> dblfault_handler() at dblfault_handler+0xa2/frame 0xfe03dc30
> Xdblfault() at Xdblfault+0xac/frame 0xfe03dc30
> --- trap 0x17, rip = 0x8218c794, rsp = 0xfe044cdc9fe0, rbp =
> 0xfe044cdca110 ---
> vdev_queue_aggregate() at vdev_queue_aggregate+0x34/frame 0xfe044cdca110
> vdev_queue_io_to_issue() at vdev_queue_io_to_issue+0x1f5/frame
> 0xfe044cdca560
> vdev_queue_io() at vdev_queue_io+0x19a/frame 0xfe044cdca5b0
> zio_vdev_io_start() at zio_vdev_io_start+0x81f/frame 0xfe044cdca6e0
> zio_execute() at zio_execute+0x23b/frame 0xfe044cdca730
> zio_nowait() at zio_nowait+0xbe/frame 0xfe044cdca760
> vdev_mirror_io_start() at vdev_mirror_io_start+0x140/frame
> 0xfe044cdca800
> zio_vdev_io_start() at zio_vdev_io_start+0x12f/frame 0xfe044cdca930
> zio_execute() at zio_execute+0x23b/frame 0xfe044cdca980
> zio_nowait() at zio_nowait+0xbe/frame 0xfe044cdca9b0
> spa_load_verify_cb() at spa_load_verify_cb+0x37d/frame 0xfe044cdcaa50
> traverse_visitbp() at traverse_visitbp+0x5a5/frame 0xfe044cdcac60
> traverse_dnode() at traverse_dnode+0x98/frame 0xfe044cdcacd0
> traverse_visitbp() at traverse_visitbp+0xc66/frame 0xfe044cdcaee0
> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb0f0
> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb300
> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb510
> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb720
> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb930
> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcbb40
> traverse_dnode() at traverse_dnode+0x98/frame 0xfe044cdcbbb0
> traverse_visitbp() at traverse_visitbp+0xe59/frame 0xfe044cdcbdc0
> traverse_impl() at traverse_impl+0x79d/frame 0xfe044cdcbfd0
> traverse_dataset() at traverse_dataset+0x93/frame 0xfe044cdcc040
> traverse_pool() at traverse_pool+0x1f2/frame 0xfe044cdcc140
> spa_load_verify() at spa_load_verify+0xf3/frame 0xfe044cdcc1f0
> spa_load_impl() at spa_load_impl+0x2069/frame 0xfe044cdcc610
> spa_load() at spa_load+0x320/frame 0xfe044cdcc6d0
> spa_load_impl() at spa_load_impl+0x150b/frame 0xfe044cdccaf0
> spa_load() at spa_load+0x320/frame 0xfe044cdccbb0
> spa_load_best() at spa_load_best+0xc6/frame 0xfe044cdccc50
> spa_open_common() at spa_open_common+0x246/frame 0xfe044cdccd40
> spa_open() at spa_open+0x35/frame 0xfe044cdccd70
> dsl_pool_hold() at dsl_pool_hold+0x2d/frame 0xfe044cdccdb0
> dmu_objset_own() at dmu_objset_own+0x2e/frame 0xfe044cdcce30
> zfsvfs_create() at zfsvfs_create+0x100/frame 0xfe044cdcd050
> zfs_domount() at zfs_domount+0xa7/frame 0xfe044cdcd0e0
> zfs_mount() at zfs_mount+0x6c3/frame 0xfe044cdcd390
> vfs_donmount() at vfs_donmount+0x1330/frame 0xfe044cdcd660
> kernel_mount() at kernel_mount+0x62/frame 0xfe044cdcd6c0
> parse_mount() at parse_mount+0x668/frame 0xfe044cdcd810
> vfs_mountroot() at vfs_mountroot+0x85c/frame 0xfe044cdcd9d0
> start_init() at start_init+0x62/frame 0xfe044cdcda70
> fork_exit() at fork_exit+0x84/frame 0xfe044cdcdab0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe044cdcdab0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
> 
>

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-27 Thread Andriy Gapon

On 27/08/2015 02:36, John-Mark Gurney wrote:
> We should/cannot get here w/ an empty list.  If we do, then there is
> something seriously wrong...  The current kn (which we must have as we
> are here) MUST be on the list, but as you just showed, there are no
> knotes on the list.
> 
> Can you get me a print of the knote?  That way I can see what flags
> are on it?

Apologies if the following might sound a little bit patronizing, but it
seems that you have got all the facts correctly, but somehow the
connection between them did not become clear.

So:
1. The list originally is NOT empty.  I guess that it has one entry, but
that's an unimportant detail.
2. This is why the loop is entered. It's a fact that it is entered.
3. The list becomes empty precisely because the entry is removed during
the iteration in the loop (as kib has explained).  It's a fact that the
list became empty at least in the panic that I reported.
4. The element is not only unlinked from the list, but its memory is
also freed.
5. That's why we have the use after free: SLIST_FOREACH is trying to get
a pointer to a next element from the freed memory.
6. This is why the commit for trashing the freed memory made all the
difference: previously the freed memory was unlikely to be re-used /
modified, so the use-after-free had a high chance of succeeding.  It's a
fact that in my panic there was an attempt to dereference a trashed pointer.
7. Finally, this is why SLIST_FOREACH_SAFE helps here: we stash the
pointer to the next element beforehand and, thus, we do not access the
freed memory.
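
The difference between points 5 and 7 can be sketched in userland C. This is a minimal illustration under stated assumptions, not the kern_event.c code: `struct node`, `handler_removes_current()`, `plain_walk_touches_freed()`, `safe_walk_visits()` and `fill()` are hypothetical names, and a `freed` flag stands in for the real free so that the sketch never dereferences released memory. The fallback macro is there because some userland <sys/queue.h> versions lack the _SAFE variant; it follows the usual BSD definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

#ifndef SLIST_FOREACH_SAFE
#define	SLIST_FOREACH_SAFE(var, head, field, tvar)			\
	for ((var) = SLIST_FIRST((head));				\
	    (var) && ((tvar) = SLIST_NEXT((var), field), 1);		\
	    (var) = (tvar))
#endif

struct node {
	SLIST_ENTRY(node) link;
	bool freed;	/* stands in for "returned to the allocator" */
};
SLIST_HEAD(nodelist, node);

/* Hypothetical event handler: unlinks the current node and "frees" it. */
static void
handler_removes_current(struct nodelist *head, struct node *n)
{
	SLIST_REMOVE(head, n, node, link);
	n->freed = true;
}

/*
 * The SLIST_FOREACH pattern spelled out by hand: the next pointer is read
 * from n after the handler ran.  Returns true if that read would touch a
 * "freed" node.
 */
static bool
plain_walk_touches_freed(struct nodelist *head)
{
	struct node *n;

	for (n = SLIST_FIRST(head); n != NULL; ) {
		handler_removes_current(head, n);
		if (n->freed)
			return (true);	/* SLIST_NEXT(n, link) would be the bug */
		n = SLIST_NEXT(n, link);
	}
	return (false);
}

/* The _SAFE pattern: the successor is stashed before the handler runs. */
static int
safe_walk_visits(struct nodelist *head)
{
	struct node *n, *tn;
	int visited = 0;

	SLIST_FOREACH_SAFE(n, head, link, tn) {
		assert(!n->freed);	/* we only ever land on live nodes */
		handler_removes_current(head, n);
		visited++;		/* n is not looked at again */
	}
	return (visited);
}

/* Put count nodes from the array onto an empty list. */
static void
fill(struct nodelist *head, struct node *nodes, size_t count)
{
	SLIST_INIT(head);
	for (size_t i = 0; i < count; i++) {
		nodes[i].freed = false;
		SLIST_INSERT_HEAD(head, &nodes[i], link);
	}
}
```

On a three-node list the plain walk reports a stale read on the first iteration, while the _SAFE walk visits all three nodes and leaves the list empty.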

Please let me know if you see any fault in the above reasoning or if
something is still not clear.

-- 
Andriy Gapon

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-27 Thread Don Lewis

On 27 Aug, Don Lewis wrote:
> On 27 Aug, Lawrence Stewart wrote:
>> On 08/27/15 09:36, John-Mark Gurney wrote:
>>> Andriy Gapon wrote this message on Sun, Aug 23, 2015 at 09:54 +0300:
>>>> On 12/08/2015 17:11, Lawrence Stewart wrote:
>>>>> On 08/07/15 07:33, Pawel Pekala wrote:
>>>>>> Hi K.,
>>>>>>
>>>>>> On 2015-08-06 12:33 -0700, "K. Macy"  wrote:
>>>>>>> Is this still happening?
>>>>>>
>>>>>> Still crashes:
>>>>>
>>>>> +1 for me running r286617
>>>>
>>>> Here is another +1 with r286922.
>>>> I can add a couple of bits of debugging data:
>>>>
>>>> (kgdb) fr 8
>>>> #8  0x80639d60 in knote (list=0xf8019a733ea0,
>>>> hint=2147483648, lockflags=) at
>>>> /usr/src/sys/kern/kern_event.c:1964
>>>> 1964            } else if ((lockflags & KNF_NOKQLOCK) != 0) {
>>>> (kgdb) p *list
>>>> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0
>>> 
>>> We should/cannot get here w/ an empty list.  If we do, then there is
>>> something seriously wrong...  The current kn (which we must have as we
>>> are here) MUST be on the list, but as you just showed, there are no
>>> knotes on the list.
>>> 
>>> Can you get me a print of the knote?  That way I can see what flags
>>> are on it?
>> 
>> I quickly tried to get this info for you by building my kernel with -O0
>> and reproducing, but I get an insta-panic on boot with the new kernel:
>> 
>> Fatal double fault
>> rip = 0x8218c794
>> rsp = 0xfe044cdc9fe0
>> rbp = 0xfe044cdca110
>> cpuid = 2; apic id = 02
>> panic: double fault
>> cpuid = 2
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>> 0xfe03dcfffe30
>> vpanic() at vpanic+0x189/frame 0xfe03dcfffeb0
>> panic() at panic+0x43/frame 0xfe03dc10
>> dblfault_handler() at dblfault_handler+0xa2/frame 0xfe03dc30
>> Xdblfault() at Xdblfault+0xac/frame 0xfe03dc30
>> --- trap 0x17, rip = 0x8218c794, rsp = 0xfe044cdc9fe0, rbp =
>> 0xfe044cdca110 ---
>> vdev_queue_aggregate() at vdev_queue_aggregate+0x34/frame 0xfe044cdca110
>> vdev_queue_io_to_issue() at vdev_queue_io_to_issue+0x1f5/frame
>> 0xfe044cdca560
>> vdev_queue_io() at vdev_queue_io+0x19a/frame 0xfe044cdca5b0
>> zio_vdev_io_start() at zio_vdev_io_start+0x81f/frame 0xfe044cdca6e0
>> zio_execute() at zio_execute+0x23b/frame 0xfe044cdca730
>> zio_nowait() at zio_nowait+0xbe/frame 0xfe044cdca760
>> vdev_mirror_io_start() at vdev_mirror_io_start+0x140/frame
>> 0xfe044cdca800
>> zio_vdev_io_start() at zio_vdev_io_start+0x12f/frame 0xfe044cdca930
>> zio_execute() at zio_execute+0x23b/frame 0xfe044cdca980
>> zio_nowait() at zio_nowait+0xbe/frame 0xfe044cdca9b0
>> spa_load_verify_cb() at spa_load_verify_cb+0x37d/frame 0xfe044cdcaa50
>> traverse_visitbp() at traverse_visitbp+0x5a5/frame 0xfe044cdcac60
>> traverse_dnode() at traverse_dnode+0x98/frame 0xfe044cdcacd0
>> traverse_visitbp() at traverse_visitbp+0xc66/frame 0xfe044cdcaee0
>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb0f0
>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb300
>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb510
>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb720
>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb930
>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcbb40
>> traverse_dnode() at traverse_dnode+0x98/frame 0xfe044cdcbbb0
>> traverse_visitbp() at traverse_visitbp+0xe59/frame 0xfe044cdcbdc0
>> traverse_impl() at traverse_impl+0x79d/frame 0xfe044cdcbfd0
>> traverse_dataset() at traverse_dataset+0x93/frame 0xfe044cdcc040
>> traverse_pool() at traverse_pool+0x1f2/frame 0xfe044cdcc140
>> spa_load_verify() at spa_load_verify+0xf3/frame 0xfe044cdcc1f0
>> spa_load_impl() at spa_load_impl+0x2069/frame 0xfe044cdcc610
>> spa_load() at spa_load+0x320/frame 0xfe044cdcc6d0
>> spa_load_impl() at spa_load_impl+0x150b/frame 0xfe044cdccaf0
>> spa_load() at spa_load+0x320/frame 0xfe044cdccbb0
>> spa_load_best() at spa_load_best+0xc6/frame 0xfe044cdccc50
>> spa_open_common() at spa_open_common+0x246/frame 0xfe044cdccd40
>> spa_open() at spa_open+0x35/frame 0xfe044cdccd70
>> dsl_pool_hold() at dsl_pool_hold+0x2d/frame 0xfe044cdccdb0
>> dmu_objset_own() at dmu_objset_own+0x2e/frame 0xfe044cdcce30
>> zfsvfs_create() at zfsvfs_create+0x100/frame 0xfe044cdcd050
>> zfs_domount() at zfs_domount+0xa7/frame 0xfe044cdcd0e0
>> zfs_mount() at zfs_mount+0x6c3/frame 0xfe044cdcd390
>> vfs_donmount() at vfs_donmount+0x1330/frame 0xfe044cdcd660
>> kernel_mount() at kernel_mount+0x62/frame 0xfe044cdcd6c0
>> parse_mount() at parse_mount+0x668/frame 0xfe044cdcd810
>> vfs_mountroot() at vfs_mountroot+0x85c/frame 0xfe044cdcd9d0
>> start_init() at start_init+0x62/frame 0xfe044cdcda70
>> fork_exit() at fork_exit+0x84/frame 0xfe044cdcdab0
>> fork_tr

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-26 Thread Lawrence Stewart

On 08/27/15 09:36, John-Mark Gurney wrote:
> Andriy Gapon wrote this message on Sun, Aug 23, 2015 at 09:54 +0300:
>> On 12/08/2015 17:11, Lawrence Stewart wrote:
>>> On 08/07/15 07:33, Pawel Pekala wrote:
>>>> Hi K.,
>>>>
>>>> On 2015-08-06 12:33 -0700, "K. Macy"  wrote:
>>>>> Is this still happening?
>>>>
>>>> Still crashes:
>>>
>>> +1 for me running r286617
>>
>> Here is another +1 with r286922.
>> I can add a couple of bits of debugging data:
>>
>> (kgdb) fr 8
>> #8  0x80639d60 in knote (list=0xf8019a733ea0,
>> hint=2147483648, lockflags=) at
>> /usr/src/sys/kern/kern_event.c:1964
>> 1964} else if ((lockflags & KNF_NOKQLOCK) != 0) {
>> (kgdb) p *list
>> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0
> 
> We should/cannot get here w/ an empty list.  If we do, then there is
> something seriously wrong...  The current kn (which we must have as we
> are here) MUST be on the list, but as you just showed, there are no
> knotes on the list.
> 
> Can you get me a print of the knote?  That way I can see what flags
> are on it?

I quickly tried to get this info for you by building my kernel with -O0
and reproducing, but I get an insta-panic on boot with the new kernel:

Fatal double fault
rip = 0x8218c794
rsp = 0xfe044cdc9fe0
rbp = 0xfe044cdca110
cpuid = 2; apic id = 02
panic: double fault
cpuid = 2
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
0xfe03dcfffe30
vpanic() at vpanic+0x189/frame 0xfe03dcfffeb0
panic() at panic+0x43/frame 0xfe03dc10
dblfault_handler() at dblfault_handler+0xa2/frame 0xfe03dc30
Xdblfault() at Xdblfault+0xac/frame 0xfe03dc30
--- trap 0x17, rip = 0x8218c794, rsp = 0xfe044cdc9fe0, rbp =
0xfe044cdca110 ---
vdev_queue_aggregate() at vdev_queue_aggregate+0x34/frame 0xfe044cdca110
vdev_queue_io_to_issue() at vdev_queue_io_to_issue+0x1f5/frame
0xfe044cdca560
vdev_queue_io() at vdev_queue_io+0x19a/frame 0xfe044cdca5b0
zio_vdev_io_start() at zio_vdev_io_start+0x81f/frame 0xfe044cdca6e0
zio_execute() at zio_execute+0x23b/frame 0xfe044cdca730
zio_nowait() at zio_nowait+0xbe/frame 0xfe044cdca760
vdev_mirror_io_start() at vdev_mirror_io_start+0x140/frame
0xfe044cdca800
zio_vdev_io_start() at zio_vdev_io_start+0x12f/frame 0xfe044cdca930
zio_execute() at zio_execute+0x23b/frame 0xfe044cdca980
zio_nowait() at zio_nowait+0xbe/frame 0xfe044cdca9b0
spa_load_verify_cb() at spa_load_verify_cb+0x37d/frame 0xfe044cdcaa50
traverse_visitbp() at traverse_visitbp+0x5a5/frame 0xfe044cdcac60
traverse_dnode() at traverse_dnode+0x98/frame 0xfe044cdcacd0
traverse_visitbp() at traverse_visitbp+0xc66/frame 0xfe044cdcaee0
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb0f0
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb300
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb510
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb720
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcb930
traverse_visitbp() at traverse_visitbp+0x930/frame 0xfe044cdcbb40
traverse_dnode() at traverse_dnode+0x98/frame 0xfe044cdcbbb0
traverse_visitbp() at traverse_visitbp+0xe59/frame 0xfe044cdcbdc0
traverse_impl() at traverse_impl+0x79d/frame 0xfe044cdcbfd0
traverse_dataset() at traverse_dataset+0x93/frame 0xfe044cdcc040
traverse_pool() at traverse_pool+0x1f2/frame 0xfe044cdcc140
spa_load_verify() at spa_load_verify+0xf3/frame 0xfe044cdcc1f0
spa_load_impl() at spa_load_impl+0x2069/frame 0xfe044cdcc610
spa_load() at spa_load+0x320/frame 0xfe044cdcc6d0
spa_load_impl() at spa_load_impl+0x150b/frame 0xfe044cdccaf0
spa_load() at spa_load+0x320/frame 0xfe044cdccbb0
spa_load_best() at spa_load_best+0xc6/frame 0xfe044cdccc50
spa_open_common() at spa_open_common+0x246/frame 0xfe044cdccd40
spa_open() at spa_open+0x35/frame 0xfe044cdccd70
dsl_pool_hold() at dsl_pool_hold+0x2d/frame 0xfe044cdccdb0
dmu_objset_own() at dmu_objset_own+0x2e/frame 0xfe044cdcce30
zfsvfs_create() at zfsvfs_create+0x100/frame 0xfe044cdcd050
zfs_domount() at zfs_domount+0xa7/frame 0xfe044cdcd0e0
zfs_mount() at zfs_mount+0x6c3/frame 0xfe044cdcd390
vfs_donmount() at vfs_donmount+0x1330/frame 0xfe044cdcd660
kernel_mount() at kernel_mount+0x62/frame 0xfe044cdcd6c0
parse_mount() at parse_mount+0x668/frame 0xfe044cdcd810
vfs_mountroot() at vfs_mountroot+0x85c/frame 0xfe044cdcd9d0
start_init() at start_init+0x62/frame 0xfe044cdcda70
fork_exit() at fork_exit+0x84/frame 0xfe044cdcdab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe044cdcdab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

Didn't get a core because it panics before dumpdev is set.

Is anyone else able to run -O0 kernels or do I have something set to evil?

Cheers,
Lawrence
___
freebsd-

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-26 Thread John-Mark Gurney

Konstantin Belousov wrote this message on Mon, Aug 24, 2015 at 11:10 +0300:
> On Sun, Aug 23, 2015 at 10:35:44PM -0700, John-Mark Gurney wrote:
> > Konstantin Belousov wrote this message on Sun, Aug 23, 2015 at 15:54 +0300:
> > >   if (kev->flags & EV_ADD)
> > > - tkn = knote_alloc(waitok);  /* prevent waiting with locks */
> > > + /*
> > > +  * Prevent waiting with locks.  Non-sleepable
> > > +  * allocation failures are handled in the loop, only
> > > +  * if the spare knote appears to be actually required.
> > > +  */
> > > + tkn = knote_alloc(waitok);
> > 
> > if you add this comment, please add curly braces around the block...
> Ok.
> 
> > 
> > >   else
> > >   tkn = NULL;
> > >  
> > > @@ -1310,8 +1315,7 @@ done:
> > >   FILEDESC_XUNLOCK(td->td_proc->p_fd);
> > >   if (fp != NULL)
> > >   fdrop(fp, td);
> > > - if (tkn != NULL)
> > > - knote_free(tkn);
> > > + knote_free(tkn);
> > 
> > Probably should just change knote_free to a static inline that does
> > > a uma_zfree as uma_zfree also does nothing if the input is NULL...
> This was already done in the patch (the removal of the NULL check in
> knote_free()). I usually do not add excessive inline keywords. Compilers
> are good, sometimes even too good, at figuring out the possibilities for
> inlining. knote_free() is inlined automatically.

Though it is, if we really change knote_free to a bare uma_zfree, then
either mark it inline (to be explicit about its behavior), or make a
macro out of it... I don't particularly like functions that contain one
line of simple code...
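
The two spellings under discussion can be sketched with the standard allocator standing in for UMA. This is a userland analogy only: `struct item`, `item_alloc()`, `item_free()`, `ITEM_FREE()` and `item_demo()` are hypothetical names, and the sketch relies on the C guarantee that free(NULL) does nothing, which plays the role that the NULL-tolerant uma_zfree() plays in the patch.

```c
#include <assert.h>
#include <stdlib.h>

struct item {
	int value;
};

/* Zeroed allocation, loosely analogous to uma_zalloc(..., M_ZERO). */
static inline struct item *
item_alloc(void)
{
	return (calloc(1, sizeof(struct item)));
}

/* free(NULL) is defined to do nothing, so no NULL check is needed. */
static inline void
item_free(struct item *it)
{
	free(it);
}

/* The macro alternative: same effect, no function at all. */
#define	ITEM_FREE(it)	free(it)

/* Exercise both spellings, including the NULL case. */
static int
item_demo(void)
{
	struct item *it = item_alloc();

	assert(it == NULL || it->value == 0);
	item_free(it);		/* fine even if the allocation failed */
	item_free(NULL);	/* explicit no-op */
	ITEM_FREE(NULL);
	return (0);
}
```

Either form lets callers drop their own `if (ptr != NULL)` guard, which is what the hunk at the `done:` label does.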

> > > @@ -1948,7 +1948,7 @@ knote(struct knlist *list, long hint, int lockflags)
> > >* only safe if you want to remove the current item, which we are
> > >* not doing.
> > >*/
> > > - SLIST_FOREACH(kn, &list->kl_list, kn_selnext) {
> > > + SLIST_FOREACH_SAFE(kn, &list->kl_list, kn_selnext, tkn) {
> > 
> > Clearly you didn't read the comment that precedes this line, or at
> > least didn't update it:
> >  * SLIST_FOREACH, SLIST_FOREACH_SAFE is not safe in our case, it is
> >  * only safe if you want to remove the current item, which we are
> >  * not doing.
> > 
> > So, you'll need to be more specific in why this needs to change...
> > When I wrote this code, I spent a lot of time looking at this, and
> > reasoned as to why SLIST_FOREACH_SAFE was NOT correct usage here...
> I explained what happens in the message.  The knote list is modified
> by the filter, see knlist_remove_inevent() call in filt_proc().
>
> > >   kq = kn->kn_kq;
> > >   KQ_LOCK(kq);
> > >   if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
> > > @@ -2385,15 +2385,16 @@ SYSINIT(knote, SI_SUB_PSEUDO, SI_ORDER_ANY, 
> > > knote_init, NULL);
> > >  static struct knote *
> > >  knote_alloc(int waitok)
> > >  {
> > > - return ((struct knote *)uma_zalloc(knote_zone,
> > > - (waitok ? M_WAITOK : M_NOWAIT)|M_ZERO));
> > > +
> > > + return (uma_zalloc(knote_zone, (waitok ? M_WAITOK : M_NOWAIT) |
> > > + M_ZERO));
> > >  }
> > >  
> > >  static void
> > 
> > per above, we should add inline here...
> > 
> > >  knote_free(struct knote *kn)
> > >  {
> > > - if (kn != NULL)
> > > - uma_zfree(knote_zone, kn);
> > > +
> > > + uma_zfree(knote_zone, kn);
> > >  }
> > >  
> > >  /*
> > 
> > I agree w/ all the non-SLIST changes, but I disagree w/ the SLIST
> > change as I don't believe that all cases were considered...
> What cases do you mean ?
> 
> The patch does not unlock knlist lock in the iteration. As such, the
> only thread which could remove elements from the knlist, or rearrange
> the list, while loop is active, is the current thread. So I claim that
> only the current iterating element can be removed, and the next list
> element stays valid. This is enough for _SAFE loop to work.
>
> Why do you think that _SAFE is incorrect ? Comment talks about very

I can't think of the reason right now, but I do remember puzzling over
this issue for some hours when I wrote this code, and I had proved
to myself that _SAFE was NOT _SAFE for this use case...

In the quick look I just had, I have not been able to decide one way
or the other, but I'm suspicious that this is a recent issue, as this
code has been running for close to a decade w/o any issues, and wonder
if there was some other change that triggered the issue...

The reason I'm cautious about changing this is that the code has been
running fine for over a decade...  Have you done a full test to
validate that nothing else breaks?

Ok, after looking more at the original dump, this is a use after free
bug...  As I said in another email, it should not be possible to get
into the _FOREACH loop where knlist is an empty list.  If it does,
then there is another major bug that needs to be found...  A simple
change to _SAFE will not fix this use after free bug...

> different case, where the knlist lock is dropp

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-26 Thread John-Mark Gurney

Andriy Gapon wrote this message on Sun, Aug 23, 2015 at 09:54 +0300:
> On 12/08/2015 17:11, Lawrence Stewart wrote:
> > On 08/07/15 07:33, Pawel Pekala wrote:
> >> Hi K.,
> >>
> >> On 2015-08-06 12:33 -0700, "K. Macy"  wrote:
> >>> Is this still happening?
> >>
> >> Still crashes:
> > 
> > +1 for me running r286617
> 
> Here is another +1 with r286922.
> I can add a couple of bits of debugging data:
> 
> (kgdb) fr 8
> #8  0x80639d60 in knote (list=0xf8019a733ea0,
> hint=2147483648, lockflags=) at
> /usr/src/sys/kern/kern_event.c:1964
> 1964} else if ((lockflags & KNF_NOKQLOCK) != 0) {
> (kgdb) p *list
> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0

We should/cannot get here w/ an empty list.  If we do, then there is
something seriously wrong...  The current kn (which we must have as we
are here) MUST be on the list, but as you just showed, there are no
knotes on the list.

Can you get me a print of the knote?  That way I can see what flags
are on it?

> , kl_unlock = 0x8063a200 ,
>   kl_assert_locked = 0x8063a220 ,
> kl_assert_unlocked = 0x8063a240 ,
>   kl_lockarg = 0xf8019a733bb0}
> (kgdb) disassemble
> Dump of assembler code for function knote:
> 0x80639d00:  push   %rbp
> 0x80639d01:  mov    %rsp,%rbp
> 0x80639d04:  push   %r15
> 0x80639d06:  push   %r14
> 0x80639d08:  push   %r13
> 0x80639d0a:  push   %r12
> 0x80639d0c:  push   %rbx
> 0x80639d0d:  sub    $0x18,%rsp
> 0x80639d11:  mov    %edx,%r12d
> 0x80639d14:  mov    %rsi,-0x30(%rbp)
> 0x80639d18:  mov    %rdi,%rbx
> 0x80639d1b:  test   %rbx,%rbx
> 0x80639d1e:  je     0x80639ef6
> 0x80639d24:  mov    %r12d,%eax
> 0x80639d27:  and    $0x1,%eax
> 0x80639d2a:  mov    %eax,-0x3c(%rbp)
> 0x80639d2d:  mov    0x28(%rbx),%rdi
> 0x80639d31:  je     0x80639d38
> 0x80639d33:  callq  *0x18(%rbx)
> 0x80639d36:  jmp    0x80639d42
> 0x80639d38:  callq  *0x20(%rbx)
> 0x80639d3b:  mov    0x28(%rbx),%rdi
> 0x80639d3f:  callq  *0x8(%rbx)
> 0x80639d42:  mov    %rbx,-0x38(%rbp)
> 0x80639d46:  mov    (%rbx),%rbx
> 0x80639d49:  test   %rbx,%rbx
> 0x80639d4c:  je     0x80639ee5
> 0x80639d52:  and    $0x2,%r12d
> 0x80639d56:  nopw   %cs:0x0(%rax,%rax,1)
> 0x80639d60:  mov    0x28(%rbx),%r14
> 
> Panic is in the last quoted instruction.
> And:
> (kgdb) i reg
> rax0x246582
> rbx0xdeadc0dedeadc0de   -2401050962867404578
> rcx0x0  0
> rdx0x12e302
> rsi0x80a26a5a   -2136839590
> rdi0x80e81b80   -2132272256
> rbp0xfe02b7efea20   0xfe02b7efea20
> rsp0xfe02b7efe9e0   0xfe02b7efe9e0
> r8 0x80a269ce   -2136839730
> r9 0x80e82838   -2132269000
> r100x1  65536
> r110x80fabd10   -2131051248
> r120x0  0
> r130xf801ff84a818   -8787511171048
> r140xf801ff84a800   -8787511171072
> r150xf8019a6974f0   -8789207452432
> rip0x80639d60   0x80639d60 
> eflags 0x10286  66182
> 
> I think that $rbx stands out here (this is a kernel with INVARIANTS).

Yeh, it was probably r284861 that I added to catch use after free bugs
like this...  You could try reverting r284861 to see if the bug goes
away... If it does, then this is most likely a use after free bug...
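
For readers following along, the effect being described can be sketched in
user-space C.  This is only an illustration under stated assumptions:
struct node, list_remove(), POISON_PTR and unsafe_walk_hits_poison() are
hypothetical names, and the code is not taken from r284861 or from
sys/queue.h.  It shows how poisoning the link of a removed element makes a
stale traversal detectable instead of silently following a dangling pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct node {
    struct node *next;
    int id;
};

/* Illustrative poison value, chosen to resemble the pattern seen in %rbx. */
#define POISON_PTR ((struct node *)(uintptr_t)0xdeadc0deUL)

/* Unlink n from the list at *head and poison its link (debugging aid). */
static void
list_remove(struct node **head, struct node *n)
{
    struct node **pp;

    for (pp = head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            n->next = POISON_PTR;   /* stale readers now see garbage */
            return;
        }
    }
}

/*
 * Walk the list the non-_SAFE way while the loop body detaches the
 * current element.  Returns 1 if the walk is about to follow a poisoned
 * link, which is where a kernel would fault on the next dereference.
 */
static int
unsafe_walk_hits_poison(void)
{
    struct node c = { NULL, 3 };
    struct node b = { &c, 2 };
    struct node a = { &b, 1 };
    struct node *head = &a, *n;

    for (n = head; n != NULL; n = n->next) {
        list_remove(&head, n);      /* stands in for a filter detaching n */
        if (n->next == POISON_PTR)
            return (1);
    }
    return (0);
}
```

Calling unsafe_walk_hits_poison() returns 1 on the first iteration: the
element was unlinked by the loop body, so the pointer the loop would follow
next is the poison value rather than a valid element.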

> Looking at the code, is it possible that one of the calls from within
> the loop's body modifies the list?  If that is so and provided that is a
> valid behavior, then maybe using SLIST_FOREACH_SAFE would help.

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-26 Thread Pawel Pekala

Hi Konstantin,

On 2015-08-23 15:54 +0300, Konstantin Belousov 
wrote:
>After looking at your data closely, I think you are right.  The panic
>occurs when the exit1(9) does KNOTE_LOCKED(NOTE_EXIT).  This is the
>only case in the tree where filter uses knlist_remove_inevent() to
>detach processed note, so indeed the slist is modified under the
>iterator.
>
>Below is the patch with the suggested change and unrelated cleanup of
>the uma(9) KPI use.  Please test, everybody who has a panic with the
>backtrace pointing to the sys_exit().

This patch fixes the issue for me. Thank you.

-- 
pozdrawiam / with regards
Paweł Pękala

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-24 Thread Konstantin Belousov

On Sun, Aug 23, 2015 at 10:35:44PM -0700, John-Mark Gurney wrote:
> Konstantin Belousov wrote this message on Sun, Aug 23, 2015 at 15:54 +0300:
> > if (kev->flags & EV_ADD)
> > -   tkn = knote_alloc(waitok);  /* prevent waiting with locks */
> > +   /*
> > +* Prevent waiting with locks.  Non-sleepable
> > +* allocation failures are handled in the loop, only
> > +* if the spare knote appears to be actually required.
> > +*/
> > +   tkn = knote_alloc(waitok);
> 
> if you add this comment, please add curly braces around the block...
Ok.

> 
> > else
> > tkn = NULL;
> >  
> > @@ -1310,8 +1315,7 @@ done:
> > FILEDESC_XUNLOCK(td->td_proc->p_fd);
> > if (fp != NULL)
> > fdrop(fp, td);
> > -   if (tkn != NULL)
> > -   knote_free(tkn);
> > +   knote_free(tkn);
> 
> Probably should just change knote_free to a static inline that does
> a uma_zfree as uma_zfree also does nothing if the input is NULL...
This was already done in the patch (the removal of the NULL check in
knote_free()). I usually do not add excessive inline keywords. Compilers
are good, sometimes even too good, at figuring out the possibilities for
inlining. knote_free() is inlined automatically.

> > @@ -1948,7 +1948,7 @@ knote(struct knlist *list, long hint, int lockflags)
> >  * only safe if you want to remove the current item, which we are
> >  * not doing.
> >  */
> > -   SLIST_FOREACH(kn, &list->kl_list, kn_selnext) {
> > +   SLIST_FOREACH_SAFE(kn, &list->kl_list, kn_selnext, tkn) {
> 
> Clearly you didn't read the comment that precedes this line, or at
> least didn't update it:
>  * SLIST_FOREACH, SLIST_FOREACH_SAFE is not safe in our case, it is
>  * only safe if you want to remove the current item, which we are
>  * not doing.
> 
> So, you'll need to be more specific in why this needs to change...
> When I wrote this code, I spent a lot of time looking at this, and
> reasoned as to why SLIST_FOREACH_SAFE was NOT correct usage here...
I explained what happens in the message.  The knote list is modified
by the filter, see knlist_remove_inevent() call in filt_proc().

> 
> > kq = kn->kn_kq;
> > KQ_LOCK(kq);
> > if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
> > @@ -2385,15 +2385,16 @@ SYSINIT(knote, SI_SUB_PSEUDO, SI_ORDER_ANY, 
> > knote_init, NULL);
> >  static struct knote *
> >  knote_alloc(int waitok)
> >  {
> > -   return ((struct knote *)uma_zalloc(knote_zone,
> > -   (waitok ? M_WAITOK : M_NOWAIT)|M_ZERO));
> > +
> > +   return (uma_zalloc(knote_zone, (waitok ? M_WAITOK : M_NOWAIT) |
> > +   M_ZERO));
> >  }
> >  
> >  static void
> 
> per above, we should add inline here...
> 
> >  knote_free(struct knote *kn)
> >  {
> > -   if (kn != NULL)
> > -   uma_zfree(knote_zone, kn);
> > +
> > +   uma_zfree(knote_zone, kn);
> >  }
> >  
> >  /*
> 
> I agree w/ the all the non-SLIST changes, but I disagree w/ the SLIST
> change as I don't believe that all cases were considered...
What cases do you mean ?

The patch does not unlock the knlist lock in the iteration. As such, the
only thread which could remove elements from the knlist, or rearrange
the list, while the loop is active, is the current thread. So I claim that
only the current iterating element can be removed, and the next list
element stays valid. This is enough for the _SAFE loop to work.

Why do you think that _SAFE is incorrect?  The comment talks about a very
different case, where the knlist lock is dropped.  Then indeed, another
thread may iterate in parallel and invalidate the memoized next element
while KN_INFLUX is set for the current element and the knlist lock is
dropped.  But _SAFE in sys/queue.h never means 'safe for parallel
mutators', it only means 'safe for the current iterator removing the
current element'.
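
The single-threaded case argued above can be sketched in user-space C.
The names struct note, filter_detach(), walk_and_detach() and demo() are
hypothetical stand-ins, not the kern_event.c code; the fallback macro
mirrors the FreeBSD definition for systems whose <sys/queue.h> lacks the
_SAFE variants.  The point is only that the next pointer is saved before
the loop body runs, so the body may remove and free the current element.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Fallback for libcs whose <sys/queue.h> has no _SAFE variants. */
#ifndef SLIST_FOREACH_SAFE
#define SLIST_FOREACH_SAFE(var, head, field, tvar)              \
    for ((var) = SLIST_FIRST((head));                           \
        (var) && ((tvar) = SLIST_NEXT((var), field), 1);        \
        (var) = (tvar))
#endif

/* Hypothetical stand-in for struct knote. */
struct note {
    int id;
    SLIST_ENTRY(note) link;
};
SLIST_HEAD(note_list, note);

/* Hypothetical stand-in for a filter that detaches its own note. */
static void
filter_detach(struct note_list *list, struct note *n)
{
    SLIST_REMOVE(list, n, note, link);
    free(n);
}

/* Walk the list while every element removes itself; returns the visits. */
static int
walk_and_detach(struct note_list *list)
{
    struct note *n, *tn;
    int visited = 0;

    SLIST_FOREACH_SAFE(n, list, link, tn) {
        visited++;
        filter_detach(list, n);     /* n is gone; tn was saved beforehand */
    }
    return (visited);
}

/* Build a list of count notes, run the walk, and check the list is empty. */
static int
demo(int count)
{
    struct note_list list = SLIST_HEAD_INITIALIZER(list);
    int i, visited;

    for (i = 0; i < count; i++) {
        struct note *n = malloc(sizeof(*n));

        assert(n != NULL);
        n->id = i;
        SLIST_INSERT_HEAD(&list, n, link);
    }
    visited = walk_and_detach(&list);
    assert(SLIST_EMPTY(&list));
    return (visited);
}
```

With plain SLIST_FOREACH the loop increment would read the link field of
the element that was just freed.  Note that this sketch says nothing about
parallel mutators; as stated above, _SAFE does not cover that case.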

I preferred not to touch the comment until it is confirmed that the
change helps.  I reformulated it now, trying to keep the note about
unlock (but is it useful ?).

> 
> Anyways, the other changes shouldn't be committed w/ the SLIST change
> as they are unrelated...
Sure, I posted the diff against the WIP branch.  The commits will be split.

diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
index a4388aa..0e26a78 100644
--- a/sys/kern/kern_event.c
+++ b/sys/kern/kern_event.c
@@ -1105,10 +1105,16 @@ kqueue_register(struct kqueue *kq, struct kevent *kev, 
struct thread *td, int wa
if (fops == NULL)
return EINVAL;
 
-   if (kev->flags & EV_ADD)
-   tkn = knote_alloc(waitok);  /* prevent waiting with locks */
-   else
+   if (kev->flags & EV_ADD) {
+   /*
+* Prevent waiting with locks.  Non-sleepable
+* allocation failures are handled in the loop, only
+* if the spare knote appears to be actually required.
+*/
+   tkn = knote_al

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-23 Thread Andriy Gapon

On 23/08/2015 15:54, Konstantin Belousov wrote:
> After looking at your data closely, I think you are right.  The panic
> occurs when the exit1(9) does KNOTE_LOCKED(NOTE_EXIT).  This is the
> only case in the tree where filter uses knlist_remove_inevent() to detach
> processed note, so indeed the slist is modified under the iterator.
> 
> Below is the patch with the suggested change and unrelated cleanup of
> the uma(9) KPI use.  Please test, everybody who has a panic with the
> backtrace pointing to the sys_exit().

Thank you very much!
I no longer get the panic in the test case that previously triggered it.

-- 
Andriy Gapon

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-23 Thread John-Mark Gurney

Konstantin Belousov wrote this message on Sun, Aug 23, 2015 at 15:54 +0300:
> On Sun, Aug 23, 2015 at 12:08:16PM +0300, Konstantin Belousov wrote:
> > On Sun, Aug 23, 2015 at 09:54:28AM +0300, Andriy Gapon wrote:
> > > On 12/08/2015 17:11, Lawrence Stewart wrote:
> > > > On 08/07/15 07:33, Pawel Pekala wrote:
> > > >> Hi K.,
> > > >>
> > > >> On 2015-08-06 12:33 -0700, "K. Macy"  wrote:
> > > >>> Is this still happening?
> > > >>
> > > >> Still crashes:
> > > > 
> > > > +1 for me running r286617
> > > 
> > > Here is another +1 with r286922.
> > > I can add a couple of bits of debugging data:
> > > 
> > > (kgdb) fr 8
> > > #8  0x80639d60 in knote (list=0xf8019a733ea0,
> > > hint=2147483648, lockflags=) at
> > > /usr/src/sys/kern/kern_event.c:1964
> > > 1964} else if ((lockflags & KNF_NOKQLOCK) != 0) {
> > > (kgdb) p *list
> > > $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0
> > > , kl_unlock = 0x8063a200 ,
> > >   kl_assert_locked = 0x8063a220 ,
> > > kl_assert_unlocked = 0x8063a240 ,
> > >   kl_lockarg = 0xf8019a733bb0}
> > > (kgdb) disassemble
> > > Dump of assembler code for function knote:
> > > 0x80639d00 :   push   %rbp
> > > 0x80639d01 :   mov%rsp,%rbp
> > > 0x80639d04 :   push   %r15
> > > 0x80639d06 :   push   %r14
> > > 0x80639d08 :   push   %r13
> > > 0x80639d0a :  push   %r12
> > > 0x80639d0c :  push   %rbx
> > > 0x80639d0d :  sub$0x18,%rsp
> > > 0x80639d11 :  mov%edx,%r12d
> > > 0x80639d14 :  mov%rsi,-0x30(%rbp)
> > > 0x80639d18 :  mov%rdi,%rbx
> > > 0x80639d1b :  test   %rbx,%rbx
> > > 0x80639d1e :  je 0x80639ef6 
> > > 0x80639d24 :  mov%r12d,%eax
> > > 0x80639d27 :  and$0x1,%eax
> > > 0x80639d2a :  mov%eax,-0x3c(%rbp)
> > > 0x80639d2d :  mov0x28(%rbx),%rdi
> > > 0x80639d31 :  je 0x80639d38 
> > > 0x80639d33 :  callq  *0x18(%rbx)
> > > 0x80639d36 :  jmp0x80639d42 
> > > 0x80639d38 :  callq  *0x20(%rbx)
> > > 0x80639d3b :  mov0x28(%rbx),%rdi
> > > 0x80639d3f :  callq  *0x8(%rbx)
> > > 0x80639d42 :  mov%rbx,-0x38(%rbp)
> > > 0x80639d46 :  mov(%rbx),%rbx
> > > 0x80639d49 :  test   %rbx,%rbx
> > > 0x80639d4c :  je 0x80639ee5 
> > > 0x80639d52 :  and$0x2,%r12d
> > > 0x80639d56 :  nopw   %cs:0x0(%rax,%rax,1)
> > > 0x80639d60 :  mov0x28(%rbx),%r14
> > > 
> > > Panic is in the last quoted instruction.
> > > And:
> > > (kgdb) i reg
> > > rax0x246582
> > > rbx0xdeadc0dedeadc0de   -2401050962867404578
> > > rcx0x0  0
> > > rdx0x12e302
> > > rsi0x80a26a5a   -2136839590
> > > rdi0x80e81b80   -2132272256
> > > rbp0xfe02b7efea20   0xfe02b7efea20
> > > rsp0xfe02b7efe9e0   0xfe02b7efe9e0
> > > r8 0x80a269ce   -2136839730
> > > r9 0x80e82838   -2132269000
> > > r100x1  65536
> > > r110x80fabd10   -2131051248
> > > r120x0  0
> > > r130xf801ff84a818   -8787511171048
> > > r140xf801ff84a800   -8787511171072
> > > r150xf8019a6974f0   -8789207452432
> > > rip0x80639d60   0x80639d60 
> > > eflags 0x10286  66182
> > > 
> > > I think that $rbx stands out here (this is a kernel with INVARIANTS).
> > > 
> > > Looking at the code, is it possible that one of the calls from within
> > > the loop's body modifies the list?  If that is so and provided that is a
> > > valid behavior, then maybe using SLIST_FOREACH_SAFE would help.
> > 
> > This is the first time useful debugging data was posted.
> > 
> > The 0x28 offset may indicate either kn_kq member access of the struct
> > knote, or kq_list of the struct kqueue.
> > 
> > kl_list.slh_first of the list parameter is NULL, how would a list
> > iteration loop even start ?  Can you look up the list argument value
> > from the previous frame (%rdi is overwritten, so debugger might be
> > confused) ?
> 
> After looking at your data closely, I think you are right.  The panic
> occurs when the exit1(9) does KNOTE_LOCKED(NOTE_EXIT).  This is the
> only case in the tree where filter uses knlist_remove_inevent() to detach
> processed note, so indeed the slist is modified under the iterator.
> 
> Below is the patch with the suggested change and unrelated cleanup of
> the uma(9) KPI use.  Please test, everybody who has a panic with the
> backtrace pointing to the sys_exit().
> 
> diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
> index a4388aa..2f15f7f 100644
> --- a/sys/kern/kern_event.c
> +++ b/sys/kern/kern_

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-23 Thread Konstantin Belousov

On Sun, Aug 23, 2015 at 12:08:16PM +0300, Konstantin Belousov wrote:
> On Sun, Aug 23, 2015 at 09:54:28AM +0300, Andriy Gapon wrote:
> > On 12/08/2015 17:11, Lawrence Stewart wrote:
> > > On 08/07/15 07:33, Pawel Pekala wrote:
> > >> Hi K.,
> > >>
> > >> On 2015-08-06 12:33 -0700, "K. Macy"  wrote:
> > >>> Is this still happening?
> > >>
> > >> Still crashes:
> > > 
> > > +1 for me running r286617
> > 
> > Here is another +1 with r286922.
> > I can add a couple of bits of debugging data:
> > 
> > (kgdb) fr 8
> > #8  0x80639d60 in knote (list=0xf8019a733ea0,
> > hint=2147483648, lockflags=) at
> > /usr/src/sys/kern/kern_event.c:1964
> > 1964} else if ((lockflags & KNF_NOKQLOCK) != 0) {
> > (kgdb) p *list
> > $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0
> > , kl_unlock = 0x8063a200 ,
> >   kl_assert_locked = 0x8063a220 ,
> > kl_assert_unlocked = 0x8063a240 ,
> >   kl_lockarg = 0xf8019a733bb0}
> > (kgdb) disassemble
> > Dump of assembler code for function knote:
> > 0x80639d00 :   push   %rbp
> > 0x80639d01 :   mov%rsp,%rbp
> > 0x80639d04 :   push   %r15
> > 0x80639d06 :   push   %r14
> > 0x80639d08 :   push   %r13
> > 0x80639d0a :  push   %r12
> > 0x80639d0c :  push   %rbx
> > 0x80639d0d :  sub$0x18,%rsp
> > 0x80639d11 :  mov%edx,%r12d
> > 0x80639d14 :  mov%rsi,-0x30(%rbp)
> > 0x80639d18 :  mov%rdi,%rbx
> > 0x80639d1b :  test   %rbx,%rbx
> > 0x80639d1e :  je 0x80639ef6 
> > 0x80639d24 :  mov%r12d,%eax
> > 0x80639d27 :  and$0x1,%eax
> > 0x80639d2a :  mov%eax,-0x3c(%rbp)
> > 0x80639d2d :  mov0x28(%rbx),%rdi
> > 0x80639d31 :  je 0x80639d38 
> > 0x80639d33 :  callq  *0x18(%rbx)
> > 0x80639d36 :  jmp0x80639d42 
> > 0x80639d38 :  callq  *0x20(%rbx)
> > 0x80639d3b :  mov0x28(%rbx),%rdi
> > 0x80639d3f :  callq  *0x8(%rbx)
> > 0x80639d42 :  mov%rbx,-0x38(%rbp)
> > 0x80639d46 :  mov(%rbx),%rbx
> > 0x80639d49 :  test   %rbx,%rbx
> > 0x80639d4c :  je 0x80639ee5 
> > 0x80639d52 :  and$0x2,%r12d
> > 0x80639d56 :  nopw   %cs:0x0(%rax,%rax,1)
> > 0x80639d60 :  mov0x28(%rbx),%r14
> > 
> > Panic is in the last quoted instruction.
> > And:
> > (kgdb) i reg
> > rax0x246582
> > rbx0xdeadc0dedeadc0de   -2401050962867404578
> > rcx0x0  0
> > rdx0x12e302
> > rsi0x80a26a5a   -2136839590
> > rdi0x80e81b80   -2132272256
> > rbp0xfe02b7efea20   0xfe02b7efea20
> > rsp0xfe02b7efe9e0   0xfe02b7efe9e0
> > r8 0x80a269ce   -2136839730
> > r9 0x80e82838   -2132269000
> > r100x1  65536
> > r110x80fabd10   -2131051248
> > r120x0  0
> > r130xf801ff84a818   -8787511171048
> > r140xf801ff84a800   -8787511171072
> > r150xf8019a6974f0   -8789207452432
> > rip0x80639d60   0x80639d60 
> > eflags 0x10286  66182
> > 
> > I think that $rbx stands out here (this is a kernel with INVARIANTS).
> > 
> > Looking at the code, is it possible that one of the calls from within
> > the loop's body modifies the list?  If that is so and provided that is a
> > valid behavior, then maybe using SLIST_FOREACH_SAFE would help.
> 
> This is the first time useful debugging data was posted.
> 
> The 0x28 offset may indicate either kn_kq member access of the struct
> knote, or kq_list of the struct kqueue.
> 
> kl_list.slh_first of the list parameter is NULL, how would a list
> iteration loop even start ?  Can you look up the list argument value
> from the previous frame (%rdi is overwritten, so debugger might be
> confused) ?

After looking at your data closely, I think you are right.  The panic
occurs when the exit1(9) does KNOTE_LOCKED(NOTE_EXIT).  This is the
only case in the tree where filter uses knlist_remove_inevent() to detach
processed note, so indeed the slist is modified under the iterator.

Below is the patch with the suggested change and unrelated cleanup of
the uma(9) KPI use.  Please test, everybody who has a panic with the
backtrace pointing to the sys_exit().

diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
index a4388aa..2f15f7f 100644
--- a/sys/kern/kern_event.c
+++ b/sys/kern/kern_event.c
@@ -1106,7 +1106,12 @@ kqueue_register(struct kqueue *kq, struct kevent *kev, 
struct thread *td, int wa
return EINVAL;
 
if (kev->flags & EV_ADD)
-   tkn = knote_alloc(waitok);  /* prevent waiting with locks */
+   /*
+*

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-23 Thread Konstantin Belousov

On Sun, Aug 23, 2015 at 09:54:28AM +0300, Andriy Gapon wrote:
> On 12/08/2015 17:11, Lawrence Stewart wrote:
> > On 08/07/15 07:33, Pawel Pekala wrote:
> >> Hi K.,
> >>
> >> On 2015-08-06 12:33 -0700, "K. Macy"  wrote:
> >>> Is this still happening?
> >>
> >> Still crashes:
> > 
> > +1 for me running r286617
> 
> Here is another +1 with r286922.
> I can add a couple of bits of debugging data:
> 
> (kgdb) fr 8
> #8  0x80639d60 in knote (list=0xf8019a733ea0,
> hint=2147483648, lockflags=) at
> /usr/src/sys/kern/kern_event.c:1964
> 1964} else if ((lockflags & KNF_NOKQLOCK) != 0) {
> (kgdb) p *list
> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0
> , kl_unlock = 0x8063a200 ,
>   kl_assert_locked = 0x8063a220 ,
> kl_assert_unlocked = 0x8063a240 ,
>   kl_lockarg = 0xf8019a733bb0}
> (kgdb) disassemble
> Dump of assembler code for function knote:
> 0x80639d00 :   push   %rbp
> 0x80639d01 :   mov%rsp,%rbp
> 0x80639d04 :   push   %r15
> 0x80639d06 :   push   %r14
> 0x80639d08 :   push   %r13
> 0x80639d0a :  push   %r12
> 0x80639d0c :  push   %rbx
> 0x80639d0d :  sub$0x18,%rsp
> 0x80639d11 :  mov%edx,%r12d
> 0x80639d14 :  mov%rsi,-0x30(%rbp)
> 0x80639d18 :  mov%rdi,%rbx
> 0x80639d1b :  test   %rbx,%rbx
> 0x80639d1e :  je 0x80639ef6 
> 0x80639d24 :  mov%r12d,%eax
> 0x80639d27 :  and$0x1,%eax
> 0x80639d2a :  mov%eax,-0x3c(%rbp)
> 0x80639d2d :  mov0x28(%rbx),%rdi
> 0x80639d31 :  je 0x80639d38 
> 0x80639d33 :  callq  *0x18(%rbx)
> 0x80639d36 :  jmp0x80639d42 
> 0x80639d38 :  callq  *0x20(%rbx)
> 0x80639d3b :  mov0x28(%rbx),%rdi
> 0x80639d3f :  callq  *0x8(%rbx)
> 0x80639d42 :  mov%rbx,-0x38(%rbp)
> 0x80639d46 :  mov(%rbx),%rbx
> 0x80639d49 :  test   %rbx,%rbx
> 0x80639d4c :  je 0x80639ee5 
> 0x80639d52 :  and$0x2,%r12d
> 0x80639d56 :  nopw   %cs:0x0(%rax,%rax,1)
> 0x80639d60 :  mov0x28(%rbx),%r14
> 
> Panic is in the last quoted instruction.
> And:
> (kgdb) i reg
> rax0x246582
> rbx0xdeadc0dedeadc0de   -2401050962867404578
> rcx0x0  0
> rdx0x12e302
> rsi0x80a26a5a   -2136839590
> rdi0x80e81b80   -2132272256
> rbp0xfe02b7efea20   0xfe02b7efea20
> rsp0xfe02b7efe9e0   0xfe02b7efe9e0
> r8 0x80a269ce   -2136839730
> r9 0x80e82838   -2132269000
> r100x1  65536
> r110x80fabd10   -2131051248
> r120x0  0
> r130xf801ff84a818   -8787511171048
> r140xf801ff84a800   -8787511171072
> r150xf8019a6974f0   -8789207452432
> rip0x80639d60   0x80639d60 
> eflags 0x10286  66182
> 
> I think that $rbx stands out here (this is a kernel with INVARIANTS).
> 
> Looking at the code, is it possible that one of the calls from within
> the loop's body modifies the list?  If that is so and provided that is a
> valid behavior, then maybe using SLIST_FOREACH_SAFE would help.

This is the first time useful debugging data was posted.

The 0x28 offset may indicate either kn_kq member access of the struct
knote, or kq_list of the struct kqueue.

kl_list.slh_first of the list parameter is NULL, how would a list
iteration loop even start ?  Can you look up the list argument value
from the previous frame (%rdi is overwritten, so debugger might be
confused) ?

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-22 Thread Andriy Gapon

On 12/08/2015 17:11, Lawrence Stewart wrote:
> On 08/07/15 07:33, Pawel Pekala wrote:
>> Hi K.,
>>
>> On 2015-08-06 12:33 -0700, "K. Macy"  wrote:
>>> Is this still happening?
>>
>> Still crashes:
> 
> +1 for me running r286617

Here is another +1 with r286922.
I can add a couple of bits of debugging data:

(kgdb) fr 8
#8  0x80639d60 in knote (list=0xf8019a733ea0,
hint=2147483648, lockflags=) at
/usr/src/sys/kern/kern_event.c:1964
1964} else if ((lockflags & KNF_NOKQLOCK) != 0) {
(kgdb) p *list
$2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0
, kl_unlock = 0x8063a200 ,
  kl_assert_locked = 0x8063a220 ,
kl_assert_unlocked = 0x8063a240 ,
  kl_lockarg = 0xf8019a733bb0}
(kgdb) disassemble
Dump of assembler code for function knote:
0x80639d00:  push   %rbp
0x80639d01:  mov    %rsp,%rbp
0x80639d04:  push   %r15
0x80639d06:  push   %r14
0x80639d08:  push   %r13
0x80639d0a:  push   %r12
0x80639d0c:  push   %rbx
0x80639d0d:  sub    $0x18,%rsp
0x80639d11:  mov    %edx,%r12d
0x80639d14:  mov    %rsi,-0x30(%rbp)
0x80639d18:  mov    %rdi,%rbx
0x80639d1b:  test   %rbx,%rbx
0x80639d1e:  je     0x80639ef6
0x80639d24:  mov    %r12d,%eax
0x80639d27:  and    $0x1,%eax
0x80639d2a:  mov    %eax,-0x3c(%rbp)
0x80639d2d:  mov    0x28(%rbx),%rdi
0x80639d31:  je     0x80639d38
0x80639d33:  callq  *0x18(%rbx)
0x80639d36:  jmp    0x80639d42
0x80639d38:  callq  *0x20(%rbx)
0x80639d3b:  mov    0x28(%rbx),%rdi
0x80639d3f:  callq  *0x8(%rbx)
0x80639d42:  mov    %rbx,-0x38(%rbp)
0x80639d46:  mov    (%rbx),%rbx
0x80639d49:  test   %rbx,%rbx
0x80639d4c:  je     0x80639ee5
0x80639d52:  and    $0x2,%r12d
0x80639d56:  nopw   %cs:0x0(%rax,%rax,1)
0x80639d60:  mov    0x28(%rbx),%r14

Panic is in the last quoted instruction.
And:
(kgdb) i reg
rax            0x246               582
rbx            0xdeadc0dedeadc0de  -2401050962867404578
rcx            0x0                 0
rdx            0x12e               302
rsi            0x80a26a5a          -2136839590
rdi            0x80e81b80          -2132272256
rbp            0xfe02b7efea20      0xfe02b7efea20
rsp            0xfe02b7efe9e0      0xfe02b7efe9e0
r8             0x80a269ce          -2136839730
r9             0x80e82838          -2132269000
r10            0x1                 65536
r11            0x80fabd10          -2131051248
r12            0x0                 0
r13            0xf801ff84a818      -8787511171048
r14            0xf801ff84a800      -8787511171072
r15            0xf8019a6974f0      -8789207452432
rip            0x80639d60          0x80639d60
eflags         0x10286             66182

I think that $rbx stands out here (this is a kernel with INVARIANTS).
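
As a side note on why this value shows up as a general protection fault
(trap 9) rather than a page fault: on amd64, a load through a
non-canonical address raises #GP.  A minimal check, assuming 48-bit
virtual addresses (4-level paging); is_canonical48() is a hypothetical
helper written for this illustration, not a kernel function:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if addr is canonical for 48-bit virtual addressing, that is,
 * if bits 63..47 are either all zero or all one.
 */
static int
is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;      /* the 17 most significant bits */

    return (top == 0 || top == 0x1ffff);
}
```

is_canonical48(0xdeadc0dedeadc0de) is 0, so the instruction
mov 0x28(%rbx),%r14 cannot complete with that %rbx value.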

Looking at the code, is it possible that one of the calls from within
the loop's body modifies the list?  If that is so and provided that is a
valid behavior, then maybe using SLIST_FOREACH_SAFE would help.

-- 
Andriy Gapon

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-13 Thread Bryan Drewery

On 8/12/15 7:11 AM, Lawrence Stewart wrote:
> On 08/07/15 07:33, Pawel Pekala wrote:
>> Hi K.,
>>
>> On 2015-08-06 12:33 -0700, "K. Macy"  wrote:
>>> Is this still happening?
>>
>> Still crashes:
> 
> +1 for me running r286617
> 

r286510 has been stable in the package build cluster. r286593 is stable
on my own system.


-- 
Regards,
Bryan Drewery

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-12 Thread Lawrence Stewart

On 08/07/15 07:33, Pawel Pekala wrote:
> Hi K.,
> 
> On 2015-08-06 12:33 -0700, "K. Macy"  wrote:
>> Is this still happening?
> 
> Still crashes:

+1 for me running r286617

Cheers,
Lawrence

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-10 Thread Bryan Drewery

On 8/10/15 2:47 PM, Pawel Pekala wrote:
> Hi Mateusz,
> 
> On 2015-08-06 23:44 +0200, Mateusz Guzik  wrote:
>> Sorry, I completely forgot about this.
>>
>> Can you please modify debug flags in your kernel config file to be
>> "-O0 -g3" and reproduce with that? This should allow kgdb to obtain
> >> full info (along with exact crash site for inspection) without further
>> tinkering or guessing.
> 
> I'm unable to provide this for you, kernel compiled with these flags
> panics during boot at zfs root mount.
> 

Try raising kern.kstack_pages to 5 or 6 in the loader prompt too.

-- 
Regards,
Bryan Drewery

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-10 Thread Pawel Pekala

Hi Mateusz,

On 2015-08-06 23:44 +0200, Mateusz Guzik  wrote:
>Sorry, I completely forgot about this.
>
>Can you please modify debug flags in your kernel config file to be
>"-O0 -g3" and reproduce with that? This should allow kgdb to obtain
>full info (along with exact crash site for inspection) without further
>tinkering or guessing.

I'm unable to provide this for you, kernel compiled with these flags
panics during boot at zfs root mount.

-- 
pozdrawiam / with regards
Paweł Pękala

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-06 Thread Mateusz Guzik

On Thu, Aug 06, 2015 at 11:33:28PM +0200, Pawel Pekala wrote:
> Hi K.,
> 
> On 2015-08-06 12:33 -0700, "K. Macy"  wrote:
> >Is this still happening?
> 
> Still crashes:
> 
> Thu Aug  6 23:22:05 CEST 2015
> 
> FreeBSD blaviken.slowicza.org 11.0-CURRENT FreeBSD 11.0-CURRENT #50 r286370: 
> Thu Aug  6 19:55:29 CEST 2015 
> r...@blaviken.slowicza.org:/usr/obj/hdd/src/sys/GENERIC  amd64
> 
> panic: 
> 
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> 
> 
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 2; apic id = 02
> instruction pointer   = 0x20:0x809d6b80
> stack pointer = 0x28:0xfe046cc68a00
> frame pointer = 0x28:0xfe046cc68a50
> code segment  = base 0x0, limit 0xf, type 0x1b
>   = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags  = interrupt enabled, resume, IOPL = 0
> current process   = 2147 (sh)
> 
> #8  0x80e44652 in calltrap ()
> at /hdd/src/sys/amd64/amd64/exception.S:235
> #9  0x809d6b80 in knote (list=0xf801dbebfea0, hint=2147483648, 
> lockflags=) at /hdd/src/sys/kern/kern_event.c:1920
> #10 0x809dc424 in exit1 (td=0xf802bd0559a0, 
> rval=, signo=0) at /hdd/src/sys/kern/kern_exit.c:564
> #11 0x809db8cd in sys_sys_exit (td=0x0, uap=)
> at /hdd/src/sys/kern/kern_exit.c:178
> #12 0x80e64c22 in amd64_syscall (td=0xf802bd0559a0, traced=0)
> at subr_syscall.c:133
> #13 0x80e4493b in Xfast_syscall ()
> at /hdd/src/sys/amd64/amd64/exception.S:395
> #14 0x000800922eea in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> Current language:  auto; currently minimal
> (kgdb) 
> 

Sorry, I completely forgot about this.

Can you please modify debug flags in your kernel config file to be "-O0 -g3"
and reproduce with that? This should allow kgdb to obtain full info
(along with exact crash site for inspection) without further tinkering or
guessing.

-- 
Mateusz Guzik 

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-06 Thread Pawel Pekala

Hi K.,

On 2015-08-06 12:33 -0700, "K. Macy"  wrote:
>Is this still happening?

Still crashes:

Thu Aug  6 23:22:05 CEST 2015

FreeBSD blaviken.slowicza.org 11.0-CURRENT FreeBSD 11.0-CURRENT #50 r286370: 
Thu Aug  6 19:55:29 CEST 2015 
r...@blaviken.slowicza.org:/usr/obj/hdd/src/sys/GENERIC  amd64

panic: 

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 9: general protection fault while in kernel mode
cpuid = 2; apic id = 02
instruction pointer = 0x20:0x809d6b80
stack pointer   = 0x28:0xfe046cc68a00
frame pointer   = 0x28:0xfe046cc68a50
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 2147 (sh)

Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
Reading symbols from /boot/kernel/amdtemp.ko.symbols...done.
Loaded symbols for /boot/kernel/amdtemp.ko.symbols
Reading symbols from /boot/modules/cuse4bsd.ko...done.
Loaded symbols for /boot/modules/cuse4bsd.ko
Reading symbols from /boot/kernel/fuse.ko.symbols...done.
Loaded symbols for /boot/kernel/fuse.ko.symbols
Reading symbols from /boot/kernel/tmpfs.ko.symbols...done.
Loaded symbols for /boot/kernel/tmpfs.ko.symbols
Reading symbols from /boot/kernel/radeonkms.ko.symbols...done.
Loaded symbols for /boot/kernel/radeonkms.ko.symbols
Reading symbols from /boot/kernel/iicbb.ko.symbols...done.
Loaded symbols for /boot/kernel/iicbb.ko.symbols
Reading symbols from /boot/kernel/iicbus.ko.symbols...done.
Loaded symbols for /boot/kernel/iicbus.ko.symbols
Reading symbols from /boot/kernel/iic.ko.symbols...done.
Loaded symbols for /boot/kernel/iic.ko.symbols
Reading symbols from /boot/kernel/drm2.ko.symbols...done.
Loaded symbols for /boot/kernel/drm2.ko.symbols
Reading symbols from /boot/kernel/radeonkmsfw_BARTS_pfp.ko.symbols...done.
Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_pfp.ko.symbols
Reading symbols from /boot/kernel/radeonkmsfw_BARTS_me.ko.symbols...done.
Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_me.ko.symbols
Reading symbols from /boot/kernel/radeonkmsfw_BTC_rlc.ko.symbols...done.
Loaded symbols for /boot/kernel/radeonkmsfw_BTC_rlc.ko.symbols
Reading symbols from /boot/kernel/radeonkmsfw_BARTS_mc.ko.symbols...done.
Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_mc.ko.symbols
Reading symbols from /boot/kernel/fdescfs.ko.symbols...done.
Loaded symbols for /boot/kernel/fdescfs.ko.symbols
Reading symbols from /boot/kernel/ums.ko.symbols...done.
Loaded symbols for /boot/kernel/ums.ko.symbols
Reading symbols from /boot/kernel/uhid.ko.symbols...done.
Loaded symbols for /boot/kernel/uhid.ko.symbols
Reading symbols from /boot/modules/vboxnetflt.ko...done.
Loaded symbols for /boot/modules/vboxnetflt.ko
Reading symbols from /boot/kernel/netgraph.ko.symbols...done.
Loaded symbols for /boot/kernel/netgraph.ko.symbols
Reading symbols from /boot/modules/vboxdrv.ko...done.
Loaded symbols for /boot/modules/vboxdrv.ko
Reading symbols from /boot/kernel/ng_ether.ko.symbols...done.
Loaded symbols for /boot/kernel/ng_ether.ko.symbols
Reading symbols from /boot/modules/vboxnetadp.ko...done.
Loaded symbols for /boot/modules/vboxnetadp.ko
Reading symbols from /boot/kernel/linux.ko.symbols...done.
Loaded symbols for /boot/kernel/linux.ko.symbols
Reading symbols from /boot/kernel/linux_common.ko.symbols...done.
Loaded symbols for /boot/kernel/linux_common.ko.symbols
Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
Loaded symbols for /boot/kernel/nullfs.ko.symbols
Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
Loaded symbols for /boot/kernel/linprocfs.ko.symbols
Reading symbols from /boot/kernel/sem.ko.symbols...done.
Loaded symbols for /boot/kernel/sem.ko.symbols
#0  doadump (textdump=0) at pcpu.h:221
221 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) #0  doadump (textdump=0) at pcpu.h:221
#1  0x80377f5e in db_dump (dummy=<value optimized out>, dummy2=false,
dummy3=0, dummy4=0x0) at /hdd/src/sys/ddb/db_command.c:533
#2  0x80377ad1 in db_command (cmd_table=0x0)
at /hdd/src/sys/ddb/db_command.c:440
#3  0x80377764 in db_command_loop ()
at /hdd/src/sys/ddb/db_command.c:493
#4  0x8037a31b in db_trap (type=<value optimized out>, code=0)
at /hdd/src/sys/ddb/db_main.c:251
#5  0x80a57074 in kdb_trap (type=9, code=0, tf=<value optimized out>)
at /hdd/src/sys/kern/sub

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-06 Thread K. Macy

Is this still happening?
On Jul 15, 2015 1:41 PM, "Pawel Pekala"  wrote:

> Hi John-Mark,
>
> On 2015-07-15 11:05 -0700, John-Mark Gurney  wrote:
> >Please repost the entire panic message, and the back trace w/o X
> >running...  Also, if you could share the core and kernel w/ me (you can
> >email me directly if you'd like), that'd help.
>
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 1; apic id = 01
> instruction pointer = 0x20:0x809338c0
> stack pointer   = 0x28:0xfe046c818a00
> frame pointer   = 0x28:0xfe046c818a50
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 1491 (sh)
>
> Reading symbols from /boot/kernel/zfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/zfs.ko.symbols
> Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
> Loaded symbols for /boot/kernel/opensolaris.ko.symbols
> Reading symbols from /boot/kernel/amdtemp.ko.symbols...done.
> Loaded symbols for /boot/kernel/amdtemp.ko.symbols
> Reading symbols from /boot/modules/cuse4bsd.ko...done.
> Loaded symbols for /boot/modules/cuse4bsd.ko
> Reading symbols from /boot/kernel/fuse.ko.symbols...done.
> Loaded symbols for /boot/kernel/fuse.ko.symbols
> Reading symbols from /boot/kernel/tmpfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/tmpfs.ko.symbols
> Reading symbols from /boot/kernel/radeonkms.ko.symbols...done.
> Loaded symbols for /boot/kernel/radeonkms.ko.symbols
> Reading symbols from /boot/kernel/iicbb.ko.symbols...done.
> Loaded symbols for /boot/kernel/iicbb.ko.symbols
> Reading symbols from /boot/kernel/iicbus.ko.symbols...done.
> Loaded symbols for /boot/kernel/iicbus.ko.symbols
> Reading symbols from /boot/kernel/iic.ko.symbols...done.
> Loaded symbols for /boot/kernel/iic.ko.symbols
> Reading symbols from /boot/kernel/drm2.ko.symbols...done.
> Loaded symbols for /boot/kernel/drm2.ko.symbols
> Reading symbols from /boot/kernel/radeonkmsfw_BARTS_pfp.ko.symbols...done.
> Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_pfp.ko.symbols
> Reading symbols from /boot/kernel/radeonkmsfw_BARTS_me.ko.symbols...done.
> Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_me.ko.symbols
> Reading symbols from /boot/kernel/radeonkmsfw_BTC_rlc.ko.symbols...done.
> Loaded symbols for /boot/kernel/radeonkmsfw_BTC_rlc.ko.symbols
> Reading symbols from /boot/kernel/radeonkmsfw_BARTS_mc.ko.symbols...done.
> Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_mc.ko.symbols
> Reading symbols from /boot/kernel/fdescfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/fdescfs.ko.symbols
> Reading symbols from /boot/kernel/ums.ko.symbols...done.
> Loaded symbols for /boot/kernel/ums.ko.symbols
> Reading symbols from /boot/kernel/uhid.ko.symbols...done.
> Loaded symbols for /boot/kernel/uhid.ko.symbols
> Reading symbols from /boot/kernel/linux.ko.symbols...done.
> Loaded symbols for /boot/kernel/linux.ko.symbols
> Reading symbols from /boot/kernel/linux_common.ko.symbols...done.
> Loaded symbols for /boot/kernel/linux_common.ko.symbols
> Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/nullfs.ko.symbols
> Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/linprocfs.ko.symbols
> Reading symbols from /boot/kernel/sem.ko.symbols...done.
> Loaded symbols for /boot/kernel/sem.ko.symbols
> #0  doadump (textdump=0) at pcpu.h:221
> 221 pcpu.h: No such file or directory.
> in pcpu.h
> (kgdb) #0  doadump (textdump=0) at pcpu.h:221
> #1  0x8035b45e in db_dump (dummy=<value optimized out>, dummy2=false,
> dummy3=0, dummy4=0x0) at /hdd/src/sys/ddb/db_command.c:533
> #2  0x8035afd1 in db_command (cmd_table=0x0)
> at /hdd/src/sys/ddb/db_command.c:440
> #3  0x8035ac64 in db_command_loop ()
> at /hdd/src/sys/ddb/db_command.c:493
> #4  0x8035d7fb in db_trap (type=<value optimized out>, code=0)
> at /hdd/src/sys/ddb/db_main.c:251
> #5  0x809b4094 in kdb_trap (type=9, code=0, tf=<value optimized out>)
> at /hdd/src/sys/kern/subr_kdb.c:654
> #6  0x80d9e065 in trap_fatal (frame=0xfe046c818950,
> eva=<value optimized out>) at /hdd/src/sys/amd64/amd64/trap.c:848
> #7  0x80d9dd33 in trap (frame=<value optimized out>)
> at /hdd/src/sys/amd64/amd64/trap.c:201
> #8  0x80d7ecb2 in calltrap ()
> at /hdd/src/sys/amd64/amd64/exception.S:235
> #9  0x809338c0 in knote (list=0xf80013ae4408, hint=2147483648,
> lockflags=<value optimized out>) at /hdd/src/sys/kern/kern_event.c:1920
> #10 0x80938ef1 in exit1 (td=0xf800135c5980,
> rv=<value optimized out>) at /hdd/src/sys/kern/kern_exit.c:559
> #11 0x809383be in sys_sys_exit (td=0x0, uap=<value optimized out>)
> at /hdd/src/sys/kern/kern_exit.c:177
> #12 0x80d9e8d2 in amd64_syscall (td=0xf800135c5980, traced=0)
> at subr_syscall.c:133
> #13 0x80d7ef9b in Xfast_syscall ()
> at /hd

Re: Instant panic while trying run ports-mgmt/poudriere

2015-07-15 Thread Pawel Pekala

Hi John-Mark,

On 2015-07-15 11:05 -0700, John-Mark Gurney  wrote:
>Please repost the entire panic message, and the back trace w/o X
>running...  Also, if you could share the core and kernel w/ me (you can
>email me directly if you'd like), that'd help.

Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; apic id = 01
instruction pointer = 0x20:0x809338c0
stack pointer   = 0x28:0xfe046c818a00
frame pointer   = 0x28:0xfe046c818a50
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 1491 (sh)

Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
Reading symbols from /boot/kernel/amdtemp.ko.symbols...done.
Loaded symbols for /boot/kernel/amdtemp.ko.symbols
Reading symbols from /boot/modules/cuse4bsd.ko...done.
Loaded symbols for /boot/modules/cuse4bsd.ko
Reading symbols from /boot/kernel/fuse.ko.symbols...done.
Loaded symbols for /boot/kernel/fuse.ko.symbols
Reading symbols from /boot/kernel/tmpfs.ko.symbols...done.
Loaded symbols for /boot/kernel/tmpfs.ko.symbols
Reading symbols from /boot/kernel/radeonkms.ko.symbols...done.
Loaded symbols for /boot/kernel/radeonkms.ko.symbols
Reading symbols from /boot/kernel/iicbb.ko.symbols...done.
Loaded symbols for /boot/kernel/iicbb.ko.symbols
Reading symbols from /boot/kernel/iicbus.ko.symbols...done.
Loaded symbols for /boot/kernel/iicbus.ko.symbols
Reading symbols from /boot/kernel/iic.ko.symbols...done.
Loaded symbols for /boot/kernel/iic.ko.symbols
Reading symbols from /boot/kernel/drm2.ko.symbols...done.
Loaded symbols for /boot/kernel/drm2.ko.symbols
Reading symbols from /boot/kernel/radeonkmsfw_BARTS_pfp.ko.symbols...done.
Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_pfp.ko.symbols
Reading symbols from /boot/kernel/radeonkmsfw_BARTS_me.ko.symbols...done.
Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_me.ko.symbols
Reading symbols from /boot/kernel/radeonkmsfw_BTC_rlc.ko.symbols...done.
Loaded symbols for /boot/kernel/radeonkmsfw_BTC_rlc.ko.symbols
Reading symbols from /boot/kernel/radeonkmsfw_BARTS_mc.ko.symbols...done.
Loaded symbols for /boot/kernel/radeonkmsfw_BARTS_mc.ko.symbols
Reading symbols from /boot/kernel/fdescfs.ko.symbols...done.
Loaded symbols for /boot/kernel/fdescfs.ko.symbols
Reading symbols from /boot/kernel/ums.ko.symbols...done.
Loaded symbols for /boot/kernel/ums.ko.symbols
Reading symbols from /boot/kernel/uhid.ko.symbols...done.
Loaded symbols for /boot/kernel/uhid.ko.symbols
Reading symbols from /boot/kernel/linux.ko.symbols...done.
Loaded symbols for /boot/kernel/linux.ko.symbols
Reading symbols from /boot/kernel/linux_common.ko.symbols...done.
Loaded symbols for /boot/kernel/linux_common.ko.symbols
Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
Loaded symbols for /boot/kernel/nullfs.ko.symbols
Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
Loaded symbols for /boot/kernel/linprocfs.ko.symbols
Reading symbols from /boot/kernel/sem.ko.symbols...done.
Loaded symbols for /boot/kernel/sem.ko.symbols
#0  doadump (textdump=0) at pcpu.h:221
221 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) #0  doadump (textdump=0) at pcpu.h:221
#1  0x8035b45e in db_dump (dummy=<value optimized out>, dummy2=false,
dummy3=0, dummy4=0x0) at /hdd/src/sys/ddb/db_command.c:533
#2  0x8035afd1 in db_command (cmd_table=0x0)
at /hdd/src/sys/ddb/db_command.c:440
#3  0x8035ac64 in db_command_loop ()
at /hdd/src/sys/ddb/db_command.c:493
#4  0x8035d7fb in db_trap (type=<value optimized out>, code=0)
at /hdd/src/sys/ddb/db_main.c:251
#5  0x809b4094 in kdb_trap (type=9, code=0, tf=<value optimized out>)
at /hdd/src/sys/kern/subr_kdb.c:654
#6  0x80d9e065 in trap_fatal (frame=0xfe046c818950,
eva=<value optimized out>) at /hdd/src/sys/amd64/amd64/trap.c:848
#7  0x80d9dd33 in trap (frame=<value optimized out>)
at /hdd/src/sys/amd64/amd64/trap.c:201
#8  0x80d7ecb2 in calltrap ()
at /hdd/src/sys/amd64/amd64/exception.S:235
#9  0x809338c0 in knote (list=0xf80013ae4408, hint=2147483648,
lockflags=<value optimized out>) at /hdd/src/sys/kern/kern_event.c:1920
#10 0x80938ef1 in exit1 (td=0xf800135c5980,
rv=<value optimized out>) at /hdd/src/sys/kern/kern_exit.c:559
#11 0x809383be in sys_sys_exit (td=0x0, uap=<value optimized out>)
at /hdd/src/sys/kern/kern_exit.c:177
#12 0x80d9e8d2 in amd64_syscall (td=0xf800135c5980, traced=0)
at subr_syscall.c:133
#13 0x80d7ef9b in Xfast_syscall ()
at /hdd/src/sys/amd64/amd64/exception.S:395
#14 0x000800922f3a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal

-- 
pozdrawiam / with regards
Paweł Pękala

Re: Instant panic while trying run ports-mgmt/poudriere

2015-07-15 Thread John-Mark Gurney

Pawel Pekala wrote this message on Wed, Jul 15, 2015 at 17:46 +0200:
> On 2015-07-14 15:38 -0700, John-Mark Gurney  wrote:
> >Pawel Pekala wrote this message on Mon, Jul 13, 2015 at 23:12 +0200:
> >> Let me know if you need more details.
> >
> >Were you running X at the time of the crash?  and if so, can you try
> >to reproduce w/o X running?  It's hard to know if the panic (and you
> >didn't include the panic string) is due to kern_event, or trying to
> >do too much in the console driver.
> >
> >Thanks.
> 
> Last tests were done with X running yes. Today I did same test with all
> services commented out in rc.conf (including X) and did get same result.
> Poudriere causes kernel panic always in the same spot:
> 
> [00:00:39] >> Calculating ports order and dependencies

Please repost the entire panic message, and the back trace w/o X
running...  Also, if you could share the core and kernel w/ me (you can
email me directly if you'd like), that'd help.

Thanks.

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: Instant panic while trying run ports-mgmt/poudriere

2015-07-15 Thread Pawel Pekala

Hi John-Mark,

On 2015-07-14 15:38 -0700, John-Mark Gurney  wrote:
>Pawel Pekala wrote this message on Mon, Jul 13, 2015 at 23:12 +0200:
>> Let me know if you need more details.
>
>Were you running X at the time of the crash?  and if so, can you try
>to reproduce w/o X running?  It's hard to know if the panic (and you
>didn't include the panic string) is due to kern_event, or trying to
>do too much in the console driver.
>
>Thanks.

Yes, the last tests were done with X running. Today I ran the same test with
all services commented out in rc.conf (including X) and got the same result.
Poudriere always causes the kernel panic at the same spot:

[00:00:39] >> Calculating ports order and dependencies

-- 
pozdrawiam / with regards
Paweł Pękala

Re: Instant panic while trying run ports-mgmt/poudriere

2015-07-15 Thread Pawel Pekala

Hi John-Mark,

On 2015-07-14 15:27 -0700, John-Mark Gurney  wrote:
>Pawel Pekala wrote this message on Tue, Jul 14, 2015 at 22:47 +0200:
>> On 2015-07-13 23:28 +0200, Mateusz Guzik  wrote:
>> >On Mon, Jul 13, 2015 at 11:12:05PM +0200, Pawel Pekala wrote:
>> >> Hi
>> >> 
>> >> I'm getting 100% reproducible kernel crash while trying build
>> >> ports with poudriere on my system. This started to show up about
>> >> 2-3 weeks ago. I upgrade my system on weekly basis usually on
>> >> saturday. Here's backtrace:
>> >> 
>> >> (kgdb) bt
>> >[..]
>> >> at /hdd/src/sys/amd64/amd64/trap.c:201
>> >> #25 0x80dace32 in calltrap ()
>> >> at /hdd/src/sys/amd64/amd64/exception.S:235
>> >> #26 0x80941430 in knote (list=0xf801a2589408, hint=2147483648,
>> >> lockflags=<value optimized out>) at /hdd/src/sys/kern/kern_event.c:1920
>> >> #27 0x80946a51 in exit1 (td=0xf801b84014d0, rv=<value optimized out>)
>> >> at /hdd/src/sys/kern/kern_exit.c:560
>> >> #28 0x80945f1e in sys_sys_exit (td=0x0, uap=<value optimized out>)
>> >> at /hdd/src/sys/kern/kern_exit.c:178
>> >> #29 0x80dcdaa2 in amd64_syscall (td=0xf801b84014d0, traced=0)
>> >> at subr_syscall.c:133
>> >> #30 0x80dad11b in Xfast_syscall ()
>> >> at /hdd/src/sys/amd64/amd64/exception.S:395
>> >> #31 0x000800922eea in ?? ()
>> >> Previous frame inner to this frame (corrupt stack?)
>> >> Current language:  auto; currently minimal
>> >> 
>> >> Let me know if you need more details.
>> >
>> >
>> >Well, if the problem is really that reproducible it would be best if
>> >you narrowed it down to the exact commit.
>> >
>> >However, quick look suggests you may be a "victim" of r284861.
>> 
>> After further testing I can confirm that this panic was introduced in
>> r284861, thanks for the hint!
>
>Can you tell me what your line 1920 of kern_event.c is? (and the
>context around it?   Or at least the $FreeBSD$ line from
>kern_event.c?  Because in HEAD, the line is:
>   } else if ((lockflags & KNF_NOKQLOCK) != 0) {
>
>and there isn't a way to fault on that code...

Yes, this is strange.

                if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
                        /*
                         * Do not process the influx notes, except for
                         * the influx coming from the kq unlock in the
                         * kqueue_scan().  In the later case, we do
                         * not interfere with the scan, since the code
                         * fragment in kqueue_scan() locks the knlist,
                         * and cannot proceed until we finished.
                         */
                        KQ_UNLOCK(kq);
===> line 1920  } else if ((lockflags & KNF_NOKQLOCK) != 0) {
                        kn->kn_status |= KN_INFLUX;
                        KQ_UNLOCK(kq);
                        error = kn->kn_fop->f_event(kn, hint);
                        KQ_LOCK(kq);
                        kn->kn_status &= ~KN_INFLUX;
                        if (error)
                                KNOTE_ACTIVATE(kn, 1);
                        KQ_UNLOCK_FLUX(kq);
                } else {

Id line:

__FBSDID("$FreeBSD: head/sys/kern/kern_event.c 284215 2015-06-10 10:48:12Z mjg $");

-- 
pozdrawiam / with regards
Paweł Pękala

Re: Instant panic while trying run ports-mgmt/poudriere

2015-07-14 Thread John-Mark Gurney

Pawel Pekala wrote this message on Mon, Jul 13, 2015 at 23:12 +0200:
> Let me know if you need more details.

Were you running X at the time of the crash?  and if so, can you try
to reproduce w/o X running?  It's hard to know if the panic (and you
didn't include the panic string) is due to kern_event, or trying to
do too much in the console driver.

Thanks.

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."

Re: Instant panic while trying run ports-mgmt/poudriere

2015-07-14 Thread John-Mark Gurney

Pawel Pekala wrote this message on Tue, Jul 14, 2015 at 22:47 +0200:
> On 2015-07-13 23:28 +0200, Mateusz Guzik  wrote:
> >On Mon, Jul 13, 2015 at 11:12:05PM +0200, Pawel Pekala wrote:
> >> Hi
> >> 
> >> I'm getting 100% reproducible kernel crash while trying build ports
> >> with poudriere on my system. This started to show up about 2-3 weeks
> >> ago. I upgrade my system on weekly basis usually on saturday.
> >> Here's backtrace:
> >> 
> >> (kgdb) bt
> >[..]
> >> at /hdd/src/sys/amd64/amd64/trap.c:201
> >> #25 0x80dace32 in calltrap ()
> >> at /hdd/src/sys/amd64/amd64/exception.S:235
> >> #26 0x80941430 in knote (list=0xf801a2589408, hint=2147483648,
> >> lockflags=<value optimized out>) at /hdd/src/sys/kern/kern_event.c:1920
> >> #27 0x80946a51 in exit1 (td=0xf801b84014d0, rv=<value optimized out>)
> >> at /hdd/src/sys/kern/kern_exit.c:560
> >> #28 0x80945f1e in sys_sys_exit (td=0x0, uap=<value optimized out>)
> >> at /hdd/src/sys/kern/kern_exit.c:178
> >> #29 0x80dcdaa2 in amd64_syscall (td=0xf801b84014d0, traced=0)
> >> at subr_syscall.c:133
> >> #30 0x80dad11b in Xfast_syscall ()
> >> at /hdd/src/sys/amd64/amd64/exception.S:395
> >> #31 0x000800922eea in ?? ()
> >> Previous frame inner to this frame (corrupt stack?)
> >> Current language:  auto; currently minimal
> >> 
> >> Let me know if you need more details.
> >
> >
> >Well, if the problem is really that reproducible it would be best if
> >you narrowed it down to the exact commit.
> >
> >However, quick look suggests you may be a "victim" of r284861.
> 
> After further testing I can confirm that this panic was introduced in
> r284861, thanks for the hint!

Can you tell me what your line 1920 of kern_event.c is (and the context
around it)?  Or at least the $FreeBSD$ line from kern_event.c?  Because
in HEAD, the line is:
} else if ((lockflags & KNF_NOKQLOCK) != 0) {

and there isn't a way to fault on that code...

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."

Re: Instant panic while trying run ports-mgmt/poudriere

2015-07-14 Thread Pawel Pekala

Hi Mateusz,

On 2015-07-13 23:28 +0200, Mateusz Guzik  wrote:
>On Mon, Jul 13, 2015 at 11:12:05PM +0200, Pawel Pekala wrote:
>> Hi
>> 
>> I'm getting 100% reproducible kernel crash while trying build ports
>> with poudriere on my system. This started to show up about 2-3 weeks
>> ago. I upgrade my system on weekly basis usually on saturday.
>> Here's backtrace:
>> 
>> (kgdb) bt
>[..]
>> at /hdd/src/sys/amd64/amd64/trap.c:201
>> #25 0x80dace32 in calltrap ()
>> at /hdd/src/sys/amd64/amd64/exception.S:235
>> #26 0x80941430 in knote (list=0xf801a2589408, hint=2147483648,
>> lockflags=<value optimized out>) at /hdd/src/sys/kern/kern_event.c:1920
>> #27 0x80946a51 in exit1 (td=0xf801b84014d0, rv=<value optimized out>)
>> at /hdd/src/sys/kern/kern_exit.c:560
>> #28 0x80945f1e in sys_sys_exit (td=0x0, uap=<value optimized out>)
>> at /hdd/src/sys/kern/kern_exit.c:178
>> #29 0x80dcdaa2 in amd64_syscall (td=0xf801b84014d0, traced=0)
>> at subr_syscall.c:133
>> #30 0x80dad11b in Xfast_syscall ()
>> at /hdd/src/sys/amd64/amd64/exception.S:395
>> #31 0x000800922eea in ?? ()
>> Previous frame inner to this frame (corrupt stack?)
>> Current language:  auto; currently minimal
>> 
>> Let me know if you need more details.
>
>
>Well, if the problem is really that reproducible it would be best if
>you narrowed it down to the exact commit.
>
>However, quick look suggests you may be a "victim" of r284861.

After further testing I can confirm that this panic was introduced in
r284861, thanks for the hint!
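
For anyone wanting to confirm a culprit revision in the same way: the thread does not say how the testing was done, but narrowing a regression to one commit is usually a binary search between a known-good and a known-bad revision. A minimal sketch in C, where first_bad(), the predicate type and panics_since_r284861() are hypothetical names, and the predicate stands in for "build that revision, boot it, run poudriere":

```c
#include <assert.h>

/* Hypothetical predicate: nonzero if the kernel built at `rev` panics. */
typedef int (*is_bad_fn)(long rev);

/*
 * Return the first bad revision in (good, bad], assuming `good` is
 * known good, `bad` is known bad, and every revision after the first
 * bad one is also bad.
 */
long
first_bad(long good, long bad, is_bad_fn is_bad)
{
	while (bad - good > 1) {
		long mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;	/* culprit is at or before mid */
		else
			good = mid;	/* culprit is after mid */
	}
	return (bad);
}

/* Stand-in for a real build-and-test cycle; the culprit is hard-coded. */
int
panics_since_r284861(long rev)
{
	return (rev >= 284861);
}
```

Calling first_bad() with a good bound below r284861 and r285509 (the revision from the uname output) as the bad bound returns 284861 after about ten test cycles. The choice of good bound here is only illustrative.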

-- 
pozdrawiam / with regards
Paweł Pękala

Re: Instant panic while trying run ports-mgmt/poudriere

2015-07-13 Thread Pawel Pekala

Hi Mateusz,

On 2015-07-13 23:28 +0200, Mateusz Guzik  wrote:
>On Mon, Jul 13, 2015 at 11:12:05PM +0200, Pawel Pekala wrote:
>> Hi
>> 
>> I'm getting 100% reproducible kernel crash while trying build ports
>> with poudriere on my system. This started to show up about 2-3 weeks
>> ago. I upgrade my system on weekly basis usually on saturday.
>> Here's backtrace:
>> 
>> (kgdb) bt
>[..]
>> at /hdd/src/sys/amd64/amd64/trap.c:201
>> #25 0x80dace32 in calltrap ()
>> at /hdd/src/sys/amd64/amd64/exception.S:235
>> #26 0x80941430 in knote (list=0xf801a2589408, hint=2147483648,
>> lockflags=<value optimized out>) at /hdd/src/sys/kern/kern_event.c:1920
>> #27 0x80946a51 in exit1 (td=0xf801b84014d0, rv=<value optimized out>)
>> at /hdd/src/sys/kern/kern_exit.c:560
>> #28 0x80945f1e in sys_sys_exit (td=0x0, uap=<value optimized out>)
>> at /hdd/src/sys/kern/kern_exit.c:178
>> #29 0x80dcdaa2 in amd64_syscall (td=0xf801b84014d0, traced=0)
>> at subr_syscall.c:133
>> #30 0x80dad11b in Xfast_syscall ()
>> at /hdd/src/sys/amd64/amd64/exception.S:395
>> #31 0x000800922eea in ?? ()
>> Previous frame inner to this frame (corrupt stack?)
>> Current language:  auto; currently minimal
>> 
>> Let me know if you need more details.
>
>
>Well, if the problem is really that reproducible it would be best if
>you narrowed it down to the exact commit.
>
>However, quick look suggests you may be a "victim" of r284861.
>
>Can you enter kgdb and:
>f 26
>p *list
>
>?

(kgdb) f 26
#26 0x80941430 in knote (list=0xf801a2589408, hint=2147483648,
lockflags=<value optimized out>) at /hdd/src/sys/kern/kern_event.c:1920
1920            } else if ((lockflags & KNF_NOKQLOCK) != 0) {
Current language:  auto; currently minimal
(kgdb) p *list
$1 = {kl_list = {slh_first = 0x0}, kl_lock = 0x809418e0 
, 
  kl_unlock = 0x80941900 , 
  kl_assert_locked = 0x80941920 , 
  kl_assert_unlocked = 0x80941940 , 
  kl_lockarg = 0xf801a2589120}
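
The empty list head above (slh_first = 0x0) would be consistent with the diagnosis reached later in the thread: the current knote can be removed from the klist while knote() is still walking it, and fetching the next pointer from a removed element is illegal. A minimal sketch of the safe pattern in plain C follows. It is not the kernel code; struct node, push(), remove_odd(), length() and demo() are hypothetical names used only for illustration, and the list is hand-rolled rather than built on the queue(3) macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a knote on a singly linked klist. */
struct node {
	int id;
	struct node *next;
};

/* Push a new node at the head of the list; returns the new head. */
struct node *
push(struct node *head, int id)
{
	struct node *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->id = id;
	n->next = head;
	return (n);
}

/*
 * Walk the list and free every node whose id is odd.
 * The next pointer is saved before the body runs, so freeing the
 * current node never forces a read through freed memory.  The unsafe
 * form, "for (n = *headp; n != NULL; n = n->next)", would read
 * n->next after free(n).
 * Returns the number of nodes removed.
 */
int
remove_odd(struct node **headp)
{
	struct node *n, *tmp, **prevp = headp;
	int removed = 0;

	for (n = *headp; n != NULL; n = tmp) {
		tmp = n->next;		/* saved while n is still valid */
		if (n->id % 2 != 0) {
			*prevp = tmp;	/* unlink n */
			free(n);
			removed++;
		} else {
			prevp = &n->next;
		}
	}
	return (removed);
}

/* Count the nodes left on the list. */
int
length(const struct node *head)
{
	int len = 0;

	for (; head != NULL; head = head->next)
		len++;
	return (len);
}

/* Build ids 1..5, remove the odd ones, return how many nodes remain. */
int
demo(void)
{
	struct node *head = NULL, *n;
	int i, left;

	for (i = 1; i <= 5; i++)
		head = push(head, i);
	remove_odd(&head);
	left = length(head);
	while ((n = head) != NULL) {	/* free the remainder */
		head = n->next;
		free(n);
	}
	return (left);
}
```

Saving the successor in tmp before the loop body runs is the same idea the queue(3) FOREACH_SAFE macros implement.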


Forgot to add my uname -a last time:

FreeBSD blaviken.slowicza.org 11.0-CURRENT FreeBSD 11.0-CURRENT #44 r285509: 
Mon Jul 13 22:38:11 CEST 2015 
c...@blaviken.slowicza.org:/usr/obj/hdd/src/sys/GENERIC  amd64

-- 
pozdrawiam / with regards
Paweł Pękala

Re: Instant panic while trying run ports-mgmt/poudriere

2015-07-13 Thread Mateusz Guzik

On Mon, Jul 13, 2015 at 11:12:05PM +0200, Pawel Pekala wrote:
> Hi
> 
> I'm getting 100% reproducible kernel crash while trying build ports
> with poudriere on my system. This started to show up about 2-3 weeks
> ago. I upgrade my system on weekly basis usually on saturday.
> Here's backtrace:
> 
> (kgdb) bt
[..]
> at /hdd/src/sys/amd64/amd64/trap.c:201
> #25 0x80dace32 in calltrap () at 
> /hdd/src/sys/amd64/amd64/exception.S:235
> #26 0x80941430 in knote (list=0xf801a2589408, hint=2147483648,
> lockflags=<value optimized out>) at /hdd/src/sys/kern/kern_event.c:1920
> #27 0x80946a51 in exit1 (td=0xf801b84014d0, rv=<value optimized out>)
> at /hdd/src/sys/kern/kern_exit.c:560
> #28 0x80945f1e in sys_sys_exit (td=0x0, uap=<value optimized out>)
> at /hdd/src/sys/kern/kern_exit.c:178
> #29 0x80dcdaa2 in amd64_syscall (td=0xf801b84014d0, traced=0)
> at subr_syscall.c:133
> #30 0x80dad11b in Xfast_syscall () at 
> /hdd/src/sys/amd64/amd64/exception.S:395
> #31 0x000800922eea in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> Current language:  auto; currently minimal
> 
> Let me know if you need more details.


Well, if the problem is really that reproducible it would be best if you
narrowed it down to the exact commit.

However, quick look suggests you may be a "victim" of r284861.

Can you enter kgdb and:
f 26
p *list

?


-- 
Mateusz Guzik 
