Re: ahci panics when detaching...

2014-06-24 Thread John-Mark Gurney
John Baldwin wrote this message on Tue, Jun 24, 2014 at 09:51 -0400:
> On Monday, June 23, 2014 9:06:26 pm John-Mark Gurney wrote:
> > John Baldwin wrote this message on Mon, Jun 23, 2014 at 10:49 -0400:
> > > On Monday, June 23, 2014 9:44:08 am John-Mark Gurney wrote:
> > > > So, when I try to eject a ESATA card, the machine panics...  I am able
> > > > to successfully eject other cards, an ethernet (re) and a serial card
> > > > (uart), and both handle the removal of their device w/o issue and with
> > > > out crashes...
> > > > 
> > > > When I try w/ ahci, I get a panic...  The panic backtrace is:
> > > > #8  0x80ced4e2 in calltrap () at ../../../amd64/amd64/exception.S:231
> > > > #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
> > > > at ../../../kern/subr_rman.c:979
> > > > #10 0x8092b888 in resource_list_release_active (rl=0xf80006d39c08,
> > > > bus=0xf80002cd9000, child=0xf80006b6d700, type=3)
> > > > at ../../../kern/subr_bus.c:3419
> > > > #11 0x8065d7a1 in pci_child_detached (dev=0xf80002cd9000,
> > > > child=0xf80006b6d700) at ../../../dev/pci/pci.c:4133
> > > > ---Type <return> to continue, or q <return> to quit---
> > > > #12 0x80929708 in device_detach (dev=0xf80006b6d700)
> > > > at bus_if.h:181
> > > > #13 0x8065f9f7 in pci_delete_child (dev=0xf80002cd9000,
> > > > child=0xf80006b6d700) at ../../../dev/pci/pci.c:4710
> > > > 
> > > > In frame 9:
> > > > (kgdb) fr 9
> > > > #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
> > > > at ../../../kern/subr_rman.c:979
> > > > 979 return (r->__r_i->r_rid);
> > > > (kgdb) print r
> > > > $1 = (struct resource *) 0xf800064c9380
> > > > (kgdb) print/x *r
> > > > $4 = {__r_i = 0xdeadc0dedeadc0de, r_bustag = 0xdeadc0dedeadc0de, 
> > > >   r_bushandle = 0xdeadc0dedeadc0de}
> > > > 
> > > > So, looks like something is corrupted the resource data...
> > > 
> > > This is the malloc junking on free.  However, I wonder if the
> > > problem is that the resource was freed without being properly
> > > cleared from the resource_list in the PCI ivars.  Is this with local
> > > patches that you have?
> > 
> > Yes, but I didn't patch any of the pci code, or the resource code, so
> > this bug is in the original code...  My patches only effect the attach
> > case, don't touch the detach case...
> 
> What did you change in attach? :)  If the resource list isn't setup the same 
> then that could cause this.  In particular, the PCI bus pre-reserves resources
> for BARs so that they are allocated even if a driver hasn't allocated them.

What I mean by that is that I set up a few things in pci_attach_common:
if the device has a slot that can hotplug, I attach an interrupt,
enable interrupts and do a couple of bookkeeping items... But that code
shouldn't change anything for ahci..

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: ahci panics when detaching...

2014-06-24 Thread John Baldwin
On Monday, June 23, 2014 9:06:26 pm John-Mark Gurney wrote:
> John Baldwin wrote this message on Mon, Jun 23, 2014 at 10:49 -0400:
> > On Monday, June 23, 2014 9:44:08 am John-Mark Gurney wrote:
> > > So, when I try to eject a ESATA card, the machine panics...  I am able
> > > to successfully eject other cards, an ethernet (re) and a serial card
> > > (uart), and both handle the removal of their device w/o issue and with
> > > out crashes...
> > > 
> > > When I try w/ ahci, I get a panic...  The panic backtrace is:
> > > #8  0x80ced4e2 in calltrap () at ../../../amd64/amd64/exception.S:231
> > > #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
> > > at ../../../kern/subr_rman.c:979
> > > #10 0x8092b888 in resource_list_release_active (rl=0xf80006d39c08,
> > > bus=0xf80002cd9000, child=0xf80006b6d700, type=3)
> > > at ../../../kern/subr_bus.c:3419
> > > #11 0x8065d7a1 in pci_child_detached (dev=0xf80002cd9000,
> > > child=0xf80006b6d700) at ../../../dev/pci/pci.c:4133
> > > ---Type <return> to continue, or q <return> to quit---
> > > #12 0x80929708 in device_detach (dev=0xf80006b6d700)
> > > at bus_if.h:181
> > > #13 0x8065f9f7 in pci_delete_child (dev=0xf80002cd9000,
> > > child=0xf80006b6d700) at ../../../dev/pci/pci.c:4710
> > > 
> > > In frame 9:
> > > (kgdb) fr 9
> > > #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
> > > at ../../../kern/subr_rman.c:979
> > > 979 return (r->__r_i->r_rid);
> > > (kgdb) print r
> > > $1 = (struct resource *) 0xf800064c9380
> > > (kgdb) print/x *r
> > > $4 = {__r_i = 0xdeadc0dedeadc0de, r_bustag = 0xdeadc0dedeadc0de, 
> > >   r_bushandle = 0xdeadc0dedeadc0de}
> > > 
> > > So, looks like something is corrupted the resource data...
> > 
> > This is the malloc junking on free.  However, I wonder if the
> > problem is that the resource was freed without being properly
> > cleared from the resource_list in the PCI ivars.  Is this with local
> > patches that you have?
> 
> Yes, but I didn't patch any of the pci code, or the resource code, so
> this bug is in the original code...  My patches only effect the attach
> case, don't touch the detach case...

What did you change in attach? :)  If the resource list isn't set up the same
way then that could cause this.  In particular, the PCI bus pre-reserves resources
for BARs so that they are allocated even if a driver hasn't allocated them.
 
-- 
John Baldwin


Re: ahci panics when detaching...

2014-06-23 Thread John-Mark Gurney
John Baldwin wrote this message on Mon, Jun 23, 2014 at 10:49 -0400:
> On Monday, June 23, 2014 9:44:08 am John-Mark Gurney wrote:
> > So, when I try to eject a ESATA card, the machine panics...  I am able
> > to successfully eject other cards, an ethernet (re) and a serial card
> > (uart), and both handle the removal of their device w/o issue and with
> > out crashes...
> > 
> > When I try w/ ahci, I get a panic...  The panic backtrace is:
> > #8  0x80ced4e2 in calltrap () at ../../../amd64/amd64/exception.S:231
> > #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
> > at ../../../kern/subr_rman.c:979
> > #10 0x8092b888 in resource_list_release_active (rl=0xf80006d39c08,
> > bus=0xf80002cd9000, child=0xf80006b6d700, type=3)
> > at ../../../kern/subr_bus.c:3419
> > #11 0x8065d7a1 in pci_child_detached (dev=0xf80002cd9000,
> > child=0xf80006b6d700) at ../../../dev/pci/pci.c:4133
> > ---Type <return> to continue, or q <return> to quit---
> > #12 0x80929708 in device_detach (dev=0xf80006b6d700)
> > at bus_if.h:181
> > #13 0x8065f9f7 in pci_delete_child (dev=0xf80002cd9000,
> > child=0xf80006b6d700) at ../../../dev/pci/pci.c:4710
> > 
> > In frame 9:
> > (kgdb) fr 9
> > #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
> > at ../../../kern/subr_rman.c:979
> > 979 return (r->__r_i->r_rid);
> > (kgdb) print r
> > $1 = (struct resource *) 0xf800064c9380
> > (kgdb) print/x *r
> > $4 = {__r_i = 0xdeadc0dedeadc0de, r_bustag = 0xdeadc0dedeadc0de, 
> >   r_bushandle = 0xdeadc0dedeadc0de}
> > 
> > So, looks like something is corrupted the resource data...
> 
> This is the malloc junking on free.  However, I wonder if the
> problem is that the resource was freed without being properly
> cleared from the resource_list in the PCI ivars.  Is this with local
> patches that you have?

Yes, but I didn't patch any of the pci code, or the resource code, so
this bug is in the original code...  My patches only affect the attach
case, and don't touch the detach case...

I was hoping someone who knows the code would say, yeah, I do remember
that place in the code where we free something, but don't properly
NULL out the pointer, etc...

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."


Re: ahci panics when detaching...

2014-06-23 Thread John Baldwin
On Monday, June 23, 2014 9:44:08 am John-Mark Gurney wrote:
> So, when I try to eject a ESATA card, the machine panics...  I am able
> to successfully eject other cards, an ethernet (re) and a serial card
> (uart), and both handle the removal of their device w/o issue and with
> out crashes...
> 
> When I try w/ ahci, I get a panic...  The panic backtrace is:
> #8  0x80ced4e2 in calltrap () at ../../../amd64/amd64/exception.S:231
> #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
> at ../../../kern/subr_rman.c:979
> #10 0x8092b888 in resource_list_release_active (rl=0xf80006d39c08,
> bus=0xf80002cd9000, child=0xf80006b6d700, type=3)
> at ../../../kern/subr_bus.c:3419
> #11 0x8065d7a1 in pci_child_detached (dev=0xf80002cd9000,
> child=0xf80006b6d700) at ../../../dev/pci/pci.c:4133
> ---Type <return> to continue, or q <return> to quit---
> #12 0x80929708 in device_detach (dev=0xf80006b6d700)
> at bus_if.h:181
> #13 0x8065f9f7 in pci_delete_child (dev=0xf80002cd9000,
> child=0xf80006b6d700) at ../../../dev/pci/pci.c:4710
> 
> In frame 9:
> (kgdb) fr 9
> #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
> at ../../../kern/subr_rman.c:979
> 979 return (r->__r_i->r_rid);
> (kgdb) print r
> $1 = (struct resource *) 0xf800064c9380
> (kgdb) print/x *r
> $4 = {__r_i = 0xdeadc0dedeadc0de, r_bustag = 0xdeadc0dedeadc0de, 
>   r_bushandle = 0xdeadc0dedeadc0de}
> 
> So, looks like something is corrupted the resource data...

This is the malloc junking on free.  However, I wonder if the
problem is that the resource was freed without being properly
cleared from the resource_list in the PCI ivars.  Is this with local
patches that you have?
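
To make the suspected failure mode concrete, here is a minimal userland
sketch of it, not the kernel code. The toy_* names and structs are made up
for illustration and only mimic the three fields shown in the kgdb dump;
"freeing" here just junk-fills the memory so it stays safe to inspect.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill pattern seen in the kgdb dump; the macro name is hypothetical. */
#define TOY_JUNK UINT64_C(0xdeadc0dedeadc0de)

/* Stand-in for struct resource; NOT the kernel definition. */
struct toy_resource {
	uint64_t r_i;		/* plays the role of the __r_i pointer */
	uint64_t r_bustag;
	uint64_t r_bushandle;
};

/* Stand-in for one resource_list entry; res is NULL when unallocated. */
struct toy_entry {
	struct toy_resource *res;
};

/* Mimic junk-on-free: overwrite the object, but keep the memory mapped. */
static void
toy_junk_free(struct toy_resource *r)
{
	r->r_i = TOY_JUNK;
	r->r_bustag = TOY_JUNK;
	r->r_bushandle = TOY_JUNK;
}

/* Suspected bug: the resource is freed but the entry still points at it. */
static void
toy_release_buggy(struct toy_entry *e)
{
	toy_junk_free(e->res);
}

/* Expected behaviour: free the resource and clear the entry. */
static void
toy_release(struct toy_entry *e)
{
	toy_junk_free(e->res);
	e->res = NULL;
}

/* What a later walk over the list would trip on. */
static int
toy_entry_is_dangling(const struct toy_entry *e)
{
	return (e->res != NULL && e->res->r_i == TOY_JUNK);
}
```

After toy_release_buggy() the entry still looks "active" to a later walk,
which then reads junk through it; after toy_release() the walk would skip it.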

-- 
John Baldwin


Re: ahci panics when detaching...

2014-06-23 Thread John-Mark Gurney
Eric van Gyzen wrote this message on Mon, Jun 23, 2014 at 08:57 -0500:
> On 06/23/2014 08:44, John-Mark Gurney wrote:
> > So, when I try to eject a ESATA card, the machine panics...  I am able
> > to successfully eject other cards, an ethernet (re) and a serial card
> > (uart), and both handle the removal of their device w/o issue and with
> > out crashes...
> >
> > When I try w/ ahci, I get a panic...  The panic backtrace is:
> > #8  0x80ced4e2 in calltrap () at ../../../amd64/amd64/exception.S:231
> > #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
> > at ../../../kern/subr_rman.c:979
> > #10 0x8092b888 in resource_list_release_active (rl=0xf80006d39c08,
> > bus=0xf80002cd9000, child=0xf80006b6d700, type=3)
> > at ../../../kern/subr_bus.c:3419
> > #11 0x8065d7a1 in pci_child_detached (dev=0xf80002cd9000,
> > child=0xf80006b6d700) at ../../../dev/pci/pci.c:4133
> > ---Type <return> to continue, or q <return> to quit---
> > #12 0x80929708 in device_detach (dev=0xf80006b6d700)
> > at bus_if.h:181
> > #13 0x8065f9f7 in pci_delete_child (dev=0xf80002cd9000,
> > child=0xf80006b6d700) at ../../../dev/pci/pci.c:4710
> >
> > In frame 9:
> > (kgdb) fr 9
> > #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
> > at ../../../kern/subr_rman.c:979
> > 979 return (r->__r_i->r_rid);
> > (kgdb) print r
> > $1 = (struct resource *) 0xf800064c9380
> > (kgdb) print/x *r
> > $4 = {__r_i = 0xdeadc0dedeadc0de, r_bustag = 0xdeadc0dedeadc0de, 
> >   r_bushandle = 0xdeadc0dedeadc0de}
> >
> > So, looks like something is corrupted the resource data...
> 
> The resource data has been freed.

Well, that is a type of corruption.. :)  If we free it, why wasn't
it removed from the list? or properly NULL'd out?

> > Attach dmesg:
> > atapci0:  at device 0.0 on pci2
> > ahci1:  at channel -1 on atapci0
> > ahci1: AHCI v1.00 with 2 3Gbps ports, Port Multiplier supported
> > ahci1: quirks=0x1
> > ahcich6:  at channel 0 on ahci1
> > ahcich7:  at channel 1 on ahci1
> > ata2:  at channel 0 on atapci0
> > [eject card]
> > ahcich6: stopping AHCI engine failed
> > ahcich6: stopping AHCI FR engine failed
> > ahcich6: detached
> > ahcich7: stopping AHCI engine failed
> > ahcich7: stopping AHCI FR engine failed
> > ahcich7: detached
> > ahci1: detached
> > ata2: detached
> > atapci0: detached
> >
> >
> > Fatal trap 9: general protection fault while in kernel mode
> >
> > Also, has anyone thought about adding a case in your trap
> > handler that when we hit the deadc0de address, to print up a
> > special message or something?  At least flag it, or do we not get
> > the faulting address?
> >
> > This is HEAD as of r266429.
> >
> > Let me know if there is anything else you need to know.
> 
> The full stack trace might be useful.

I could give it to you, but it contains code I can't release (at
least not yet)...  It's basically an interrupt that calls
pci_delete_child, so there isn't any more useful information there..

I'm just puzzled why uart and re don't have this same problem..

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."


Re: ahci panics when detaching...

2014-06-23 Thread Eric van Gyzen
On 06/23/2014 08:44, John-Mark Gurney wrote:
> So, when I try to eject a ESATA card, the machine panics...  I am able
> to successfully eject other cards, an ethernet (re) and a serial card
> (uart), and both handle the removal of their device w/o issue and with
> out crashes...
>
> When I try w/ ahci, I get a panic...  The panic backtrace is:
> #8  0x80ced4e2 in calltrap () at ../../../amd64/amd64/exception.S:231
> #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
> at ../../../kern/subr_rman.c:979
> #10 0x8092b888 in resource_list_release_active (rl=0xf80006d39c08,
> bus=0xf80002cd9000, child=0xf80006b6d700, type=3)
> at ../../../kern/subr_bus.c:3419
> #11 0x8065d7a1 in pci_child_detached (dev=0xf80002cd9000,
> child=0xf80006b6d700) at ../../../dev/pci/pci.c:4133
> ---Type <return> to continue, or q <return> to quit---
> #12 0x80929708 in device_detach (dev=0xf80006b6d700)
> at bus_if.h:181
> #13 0x8065f9f7 in pci_delete_child (dev=0xf80002cd9000,
> child=0xf80006b6d700) at ../../../dev/pci/pci.c:4710
>
> In frame 9:
> (kgdb) fr 9
> #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
> at ../../../kern/subr_rman.c:979
> 979 return (r->__r_i->r_rid);
> (kgdb) print r
> $1 = (struct resource *) 0xf800064c9380
> (kgdb) print/x *r
> $4 = {__r_i = 0xdeadc0dedeadc0de, r_bustag = 0xdeadc0dedeadc0de, 
>   r_bushandle = 0xdeadc0dedeadc0de}
>
> So, looks like something is corrupted the resource data...

The resource data has been freed.

> Attach dmesg:
> atapci0:  at device 0.0 on pci2
> ahci1:  at channel -1 on atapci0
> ahci1: AHCI v1.00 with 2 3Gbps ports, Port Multiplier supported
> ahci1: quirks=0x1
> ahcich6:  at channel 0 on ahci1
> ahcich7:  at channel 1 on ahci1
> ata2:  at channel 0 on atapci0
> [eject card]
> ahcich6: stopping AHCI engine failed
> ahcich6: stopping AHCI FR engine failed
> ahcich6: detached
> ahcich7: stopping AHCI engine failed
> ahcich7: stopping AHCI FR engine failed
> ahcich7: detached
> ahci1: detached
> ata2: detached
> atapci0: detached
>
>
> Fatal trap 9: general protection fault while in kernel mode
>
> Also, has anyone thought about adding a case in your trap
> handler that when we hit the deadc0de address, to print up a
> special message or something?  At least flag it, or do we not get
> the faulting address?
>
> This is HEAD as of r266429.
>
> Let me know if there is anything else you need to know.

The full stack trace might be useful.
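
On the quoted question about flagging deadc0de in the trap handler: as far
as I know, on amd64 a dereference of a non-canonical address raises a
general protection fault rather than a page fault, and no faulting address
is reported, which would match the "Fatal trap 9" above. A handler could
still compare register values against the junk pattern. A rough sketch of
the two checks, assuming 48-bit virtual addresses; the sketch_* names are
hypothetical and none of this is existing kernel code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Junk pattern from the kgdb dump; the macro name is made up. */
#define SKETCH_JUNK64 UINT64_C(0xdeadc0dedeadc0de)

/*
 * Assumption: 48-bit virtual addresses, so bits 63..47 of a canonical
 * address are either all zero or all one.
 */
static bool
sketch_is_canonical(uint64_t va)
{
	uint64_t top = va >> 47;	/* the 17 high bits */

	return (top == 0 || top == 0x1ffff);
}

/*
 * True if a value looks like it was read from junk-filled memory: either
 * the full 64-bit pattern or a low 32-bit half equal to 0xdeadc0de.
 */
static bool
sketch_looks_like_junk(uint64_t val)
{
	return (val == SKETCH_JUNK64 ||
	    (uint32_t)val == UINT32_C(0xdeadc0de));
}
```

The 32-bit comparison is deliberately loose, so it can flag unrelated values
whose low half happens to match; it is meant as a hint in the panic message,
not as proof of a use-after-free.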

Eric