Re: ahci panics when detaching...

2014-06-24 Thread John Baldwin
On Monday, June 23, 2014 9:06:26 pm John-Mark Gurney wrote:
 John Baldwin wrote this message on Mon, Jun 23, 2014 at 10:49 -0400:
  On Monday, June 23, 2014 9:44:08 am John-Mark Gurney wrote:
   So, when I try to eject an eSATA card, the machine panics...  I am able
   to successfully eject other cards, an ethernet (re) and a serial card
   (uart), and both handle the removal of their device w/o issue and
   without crashes...
   
   When I try w/ ahci, I get a panic...  The panic backtrace is:
   #8  0x80ced4e2 in calltrap () at ../../../amd64/amd64/exception.S:231
   #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
   at ../../../kern/subr_rman.c:979
   #10 0x8092b888 in resource_list_release_active (rl=0xf80006d39c08,
   bus=0xf80002cd9000, child=0xf80006b6d700, type=3)
   at ../../../kern/subr_bus.c:3419
   #11 0x8065d7a1 in pci_child_detached (dev=0xf80002cd9000,
   child=0xf80006b6d700) at ../../../dev/pci/pci.c:4133
   ---Type <return> to continue, or q <return> to quit---
   #12 0x80929708 in device_detach (dev=0xf80006b6d700)
   at bus_if.h:181
   #13 0x8065f9f7 in pci_delete_child (dev=0xf80002cd9000,
   child=0xf80006b6d700) at ../../../dev/pci/pci.c:4710
   
   In frame 9:
   (kgdb) fr 9
   #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
   at ../../../kern/subr_rman.c:979
   979 return (r->__r_i->r_rid);
   (kgdb) print r
   $1 = (struct resource *) 0xf800064c9380
   (kgdb) print/x *r
   $4 = {__r_i = 0xdeadc0dedeadc0de, r_bustag = 0xdeadc0dedeadc0de, 
 r_bushandle = 0xdeadc0dedeadc0de}
   
   So, it looks like something has corrupted the resource data...
  
  This is the malloc junking on free.  However, I wonder if the
  problem is that the resource was freed without being properly
  cleared from the resource_list in the PCI ivars.  Is this with local
  patches that you have?
 
 Yes, but I didn't patch any of the pci code, or the resource code, so
 this bug is in the original code...  My patches only affect the attach
 case, and don't touch the detach case...

What did you change in attach? :)  If the resource list isn't set up the same
then that could cause this.  In particular, the PCI bus pre-reserves resources
for BARs so that they are allocated even if a driver hasn't allocated them.
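A rough model of the BAR reservation described above, under simplifying assumptions: one entry per BAR, two flags, and a boolean standing in for the attached resource object. The flag names are borrowed from sys/bus.h, but `struct bar_entry` and the functions below are hypothetical. It illustrates why a reserved BAR keeps a resource attached to its list entry even when no driver holds it, so detach-time cleanup has to tell the bus's own reservation apart from a driver leak; if attach leaves an entry in an unexpected state, that check can misfire.

```c
#include <assert.h>
#include <stdbool.h>

/* Names borrowed from sys/bus.h; values here are only for this model. */
#define RLE_RESERVED	0x1	/* bus reserved the range at enumeration */
#define RLE_ALLOCATED	0x2	/* a driver currently owns it */

/* Hypothetical, simplified resource list entry for one BAR. */
struct bar_entry {
	int	flags;
	bool	has_res;	/* a resource object is attached to the entry */
};

/* Bus side: reserve the BAR as soon as the device is found. */
static void
bus_reserve(struct bar_entry *e)
{
	e->flags = RLE_RESERVED;
	e->has_res = true;
}

/*
 * Driver side: allocating or releasing a reserved BAR only flips the
 * allocated bit; the resource itself stays attached to the entry.
 */
static void
driver_alloc(struct bar_entry *e)
{
	assert((e->flags & RLE_ALLOCATED) == 0);	/* no double alloc */
	e->flags |= RLE_ALLOCATED;
}

static void
driver_release(struct bar_entry *e)
{
	assert(e->flags & RLE_ALLOCATED);	/* must own it to release it */
	e->flags &= ~RLE_ALLOCATED;
}

/* Detach-time check: is this entry something the driver leaked? */
static bool
needs_forced_release(const struct bar_entry *e)
{
	if (!e->has_res)
		return (false);		/* nothing attached */
	if ((e->flags & (RLE_RESERVED | RLE_ALLOCATED)) == RLE_RESERVED)
		return (false);		/* the bus's own reservation */
	return (true);			/* still held by the driver */
}
```

Walking one entry through reserve, alloc and release shows that only the middle state counts as a leak; after the driver releases it, the entry still has a resource attached but is skipped.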
 
-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: ahci panics when detaching...

2014-06-24 Thread John-Mark Gurney
John Baldwin wrote this message on Tue, Jun 24, 2014 at 09:51 -0400:
 On Monday, June 23, 2014 9:06:26 pm John-Mark Gurney wrote:
  John Baldwin wrote this message on Mon, Jun 23, 2014 at 10:49 -0400:
   On Monday, June 23, 2014 9:44:08 am John-Mark Gurney wrote:
So, when I try to eject an eSATA card, the machine panics...  I am able
to successfully eject other cards, an ethernet (re) and a serial card
(uart), and both handle the removal of their device w/o issue and
without crashes...

When I try w/ ahci, I get a panic...  The panic backtrace is:
#8  0x80ced4e2 in calltrap () at ../../../amd64/amd64/exception.S:231
#9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
at ../../../kern/subr_rman.c:979
#10 0x8092b888 in resource_list_release_active (rl=0xf80006d39c08,
bus=0xf80002cd9000, child=0xf80006b6d700, type=3)
at ../../../kern/subr_bus.c:3419
#11 0x8065d7a1 in pci_child_detached (dev=0xf80002cd9000,
child=0xf80006b6d700) at ../../../dev/pci/pci.c:4133
---Type <return> to continue, or q <return> to quit---
#12 0x80929708 in device_detach (dev=0xf80006b6d700)
at bus_if.h:181
#13 0x8065f9f7 in pci_delete_child (dev=0xf80002cd9000,
child=0xf80006b6d700) at ../../../dev/pci/pci.c:4710

In frame 9:
(kgdb) fr 9
#9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
at ../../../kern/subr_rman.c:979
979 return (r->__r_i->r_rid);
(kgdb) print r
$1 = (struct resource *) 0xf800064c9380
(kgdb) print/x *r
$4 = {__r_i = 0xdeadc0dedeadc0de, r_bustag = 0xdeadc0dedeadc0de, 
  r_bushandle = 0xdeadc0dedeadc0de}

So, it looks like something has corrupted the resource data...
   
   This is the malloc junking on free.  However, I wonder if the
   problem is that the resource was freed without being properly
   cleared from the resource_list in the PCI ivars.  Is this with local
   patches that you have?
  
  Yes, but I didn't patch any of the pci code, or the resource code, so
  this bug is in the original code...  My patches only affect the attach
  case, and don't touch the detach case...
 
 What did you change in attach? :)  If the resource list isn't set up the same
 then that could cause this.  In particular, the PCI bus pre-reserves resources
 for BARs so that they are allocated even if a driver hasn't allocated them.

What I mean by that is that I set up a few things in pci_attach_common:
if the device has a hotplug-capable slot, I attach an interrupt, enable
interrupts, and do a couple of bookkeeping items...  But that code
shouldn't change anything for ahci...

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 All that I will do, has been done, All that I have, has not.


ahci panics when detaching...

2014-06-23 Thread John-Mark Gurney
So, when I try to eject an eSATA card, the machine panics...  I am able
to successfully eject other cards, an ethernet (re) and a serial card
(uart), and both handle the removal of their device w/o issue and
without crashes...

When I try w/ ahci, I get a panic...  The panic backtrace is:
#8  0x80ced4e2 in calltrap () at ../../../amd64/amd64/exception.S:231
#9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
at ../../../kern/subr_rman.c:979
#10 0x8092b888 in resource_list_release_active (rl=0xf80006d39c08,
bus=0xf80002cd9000, child=0xf80006b6d700, type=3)
at ../../../kern/subr_bus.c:3419
#11 0x8065d7a1 in pci_child_detached (dev=0xf80002cd9000,
child=0xf80006b6d700) at ../../../dev/pci/pci.c:4133
---Type <return> to continue, or q <return> to quit---
#12 0x80929708 in device_detach (dev=0xf80006b6d700)
at bus_if.h:181
#13 0x8065f9f7 in pci_delete_child (dev=0xf80002cd9000,
child=0xf80006b6d700) at ../../../dev/pci/pci.c:4710

In frame 9:
(kgdb) fr 9
#9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
at ../../../kern/subr_rman.c:979
979 return (r->__r_i->r_rid);
(kgdb) print r
$1 = (struct resource *) 0xf800064c9380
(kgdb) print/x *r
$4 = {__r_i = 0xdeadc0dedeadc0de, r_bustag = 0xdeadc0dedeadc0de, 
  r_bushandle = 0xdeadc0dedeadc0de}

So, it looks like something has corrupted the resource data...


Attach dmesg:
atapci0: <JMicron JMB363 UDMA133 controller> at device 0.0 on pci2
ahci1: <JMicron JMB363 AHCI SATA controller> at channel -1 on atapci0
ahci1: AHCI v1.00 with 2 3Gbps ports, Port Multiplier supported
ahci1: quirks=0x1<NOFORCE>
ahcich6: <AHCI channel> at channel 0 on ahci1
ahcich7: <AHCI channel> at channel 1 on ahci1
ata2: <ATA channel> at channel 0 on atapci0
[eject card]
ahcich6: stopping AHCI engine failed
ahcich6: stopping AHCI FR engine failed
ahcich6: detached
ahcich7: stopping AHCI engine failed
ahcich7: stopping AHCI FR engine failed
ahcich7: detached
ahci1: detached
ata2: detached
atapci0: detached


Fatal trap 9: general protection fault while in kernel mode

Also, has anyone thought about adding a case to the trap
handler so that when we hit the deadc0de address, it prints a
special message or something?  At least flag it, or do we not get
the faulting address?
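On the last question: on amd64 a general protection fault, unlike a page fault, does not load CR2, so there is no faulting address for the trap handler to report; a check like this would have to look at register contents instead. The junk value is also a non-canonical address, which is consistent with the crash showing up as trap 9 rather than as a page fault. Below is a minimal userland sketch of such a check, assuming the candidate value is already in hand and assuming 48-bit virtual addresses; `classify()`, `is_canonical()` and `hint_message()` are hypothetical helpers, not part of FreeBSD's trap code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Junk pattern seen in the kgdb dump, repeated to fill a 64-bit word. */
#define JUNK_WORD	UINT64_C(0xdeadc0de)
#define JUNK64		((JUNK_WORD << 32) | JUNK_WORD)
static_assert(JUNK64 == UINT64_C(0xdeadc0dedeadc0de),
    "pattern should match the __r_i value printed by kgdb");

enum fault_hint {
	HINT_NONE,		/* nothing obviously wrong with the value */
	HINT_NONCANONICAL,	/* dereferencing it raises #GP, not #PF */
	HINT_FREED_MEMORY	/* looks like a pointer read from junked memory */
};

/*
 * Assumes 48-bit virtual addresses: an address is canonical only if
 * bits 63..47 are all zero or all one.
 */
static bool
is_canonical(uint64_t va)
{
	uint64_t top = va >> 47;	/* the 17 high bits */

	return (top == 0 || top == UINT64_C(0x1ffff));
}

static enum fault_hint
classify(uint64_t v)
{
	if (v == JUNK64)
		return (HINT_FREED_MEMORY);
	if (!is_canonical(v))
		return (HINT_NONCANONICAL);
	return (HINT_NONE);
}

static const char *
hint_message(enum fault_hint h)
{
	switch (h) {
	case HINT_FREED_MEMORY:
		return ("value matches the 0xdeadc0de junk pattern: "
		    "likely use after free");
	case HINT_NONCANONICAL:
		return ("non-canonical address");
	default:
		return ("");
	}
}
```

Checking for the junk pattern before the canonical test matters, since the junk value is itself non-canonical and would otherwise only get the less specific hint.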

This is HEAD as of r266429.

Let me know if there is anything else you need to know.

Thanks.

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 All that I will do, has been done, All that I have, has not.


Re: ahci panics when detaching...

2014-06-23 Thread Eric van Gyzen
On 06/23/2014 08:44, John-Mark Gurney wrote:
 So, when I try to eject an eSATA card, the machine panics...  I am able
 to successfully eject other cards, an ethernet (re) and a serial card
 (uart), and both handle the removal of their device w/o issue and
 without crashes...

 When I try w/ ahci, I get a panic...  The panic backtrace is:
 #8  0x80ced4e2 in calltrap () at ../../../amd64/amd64/exception.S:231
 #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
 at ../../../kern/subr_rman.c:979
 #10 0x8092b888 in resource_list_release_active (rl=0xf80006d39c08,
 bus=0xf80002cd9000, child=0xf80006b6d700, type=3)
 at ../../../kern/subr_bus.c:3419
 #11 0x8065d7a1 in pci_child_detached (dev=0xf80002cd9000,
 child=0xf80006b6d700) at ../../../dev/pci/pci.c:4133
 ---Type <return> to continue, or q <return> to quit---
 #12 0x80929708 in device_detach (dev=0xf80006b6d700)
 at bus_if.h:181
 #13 0x8065f9f7 in pci_delete_child (dev=0xf80002cd9000,
 child=0xf80006b6d700) at ../../../dev/pci/pci.c:4710

 In frame 9:
 (kgdb) fr 9
 #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
 at ../../../kern/subr_rman.c:979
 979 return (r->__r_i->r_rid);
 (kgdb) print r
 $1 = (struct resource *) 0xf800064c9380
 (kgdb) print/x *r
 $4 = {__r_i = 0xdeadc0dedeadc0de, r_bustag = 0xdeadc0dedeadc0de, 
   r_bushandle = 0xdeadc0dedeadc0de}

 So, it looks like something has corrupted the resource data...

The resource data has been freed.

 Attach dmesg:
 atapci0: <JMicron JMB363 UDMA133 controller> at device 0.0 on pci2
 ahci1: <JMicron JMB363 AHCI SATA controller> at channel -1 on atapci0
 ahci1: AHCI v1.00 with 2 3Gbps ports, Port Multiplier supported
 ahci1: quirks=0x1<NOFORCE>
 ahcich6: <AHCI channel> at channel 0 on ahci1
 ahcich7: <AHCI channel> at channel 1 on ahci1
 ata2: <ATA channel> at channel 0 on atapci0
 [eject card]
 ahcich6: stopping AHCI engine failed
 ahcich6: stopping AHCI FR engine failed
 ahcich6: detached
 ahcich7: stopping AHCI engine failed
 ahcich7: stopping AHCI FR engine failed
 ahcich7: detached
 ahci1: detached
 ata2: detached
 atapci0: detached


 Fatal trap 9: general protection fault while in kernel mode

 Also, has anyone thought about adding a case to the trap
 handler so that when we hit the deadc0de address, it prints a
 special message or something?  At least flag it, or do we not get
 the faulting address?

 This is HEAD as of r266429.

 Let me know if there is anything else you need to know.

The full stack trace might be useful.

Eric


Re: ahci panics when detaching...

2014-06-23 Thread John-Mark Gurney
Eric van Gyzen wrote this message on Mon, Jun 23, 2014 at 08:57 -0500:
 On 06/23/2014 08:44, John-Mark Gurney wrote:
  So, when I try to eject an eSATA card, the machine panics...  I am able
  to successfully eject other cards, an ethernet (re) and a serial card
  (uart), and both handle the removal of their device w/o issue and
  without crashes...
 
  When I try w/ ahci, I get a panic...  The panic backtrace is:
  #8  0x80ced4e2 in calltrap () at ../../../amd64/amd64/exception.S:231
  #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
  at ../../../kern/subr_rman.c:979
  #10 0x8092b888 in resource_list_release_active (rl=0xf80006d39c08,
  bus=0xf80002cd9000, child=0xf80006b6d700, type=3)
  at ../../../kern/subr_bus.c:3419
  #11 0x8065d7a1 in pci_child_detached (dev=0xf80002cd9000,
  child=0xf80006b6d700) at ../../../dev/pci/pci.c:4133
  ---Type <return> to continue, or q <return> to quit---
  #12 0x80929708 in device_detach (dev=0xf80006b6d700)
  at bus_if.h:181
  #13 0x8065f9f7 in pci_delete_child (dev=0xf80002cd9000,
  child=0xf80006b6d700) at ../../../dev/pci/pci.c:4710
 
  In frame 9:
  (kgdb) fr 9
  #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
  at ../../../kern/subr_rman.c:979
  979 return (r->__r_i->r_rid);
  (kgdb) print r
  $1 = (struct resource *) 0xf800064c9380
  (kgdb) print/x *r
  $4 = {__r_i = 0xdeadc0dedeadc0de, r_bustag = 0xdeadc0dedeadc0de, 
r_bushandle = 0xdeadc0dedeadc0de}
 
  So, it looks like something has corrupted the resource data...
 
 The resource data has been freed.

Well, that is a type of corruption... :)  If we free it, why wasn't
it removed from the list, or properly NULL'd out?

  Attach dmesg:
  atapci0: <JMicron JMB363 UDMA133 controller> at device 0.0 on pci2
  ahci1: <JMicron JMB363 AHCI SATA controller> at channel -1 on atapci0
  ahci1: AHCI v1.00 with 2 3Gbps ports, Port Multiplier supported
  ahci1: quirks=0x1<NOFORCE>
  ahcich6: <AHCI channel> at channel 0 on ahci1
  ahcich7: <AHCI channel> at channel 1 on ahci1
  ata2: <ATA channel> at channel 0 on atapci0
  [eject card]
  ahcich6: stopping AHCI engine failed
  ahcich6: stopping AHCI FR engine failed
  ahcich6: detached
  ahcich7: stopping AHCI engine failed
  ahcich7: stopping AHCI FR engine failed
  ahcich7: detached
  ahci1: detached
  ata2: detached
  atapci0: detached
 
 
  Fatal trap 9: general protection fault while in kernel mode
 
  Also, has anyone thought about adding a case to the trap
  handler so that when we hit the deadc0de address, it prints a
  special message or something?  At least flag it, or do we not get
  the faulting address?
 
  This is HEAD as of r266429.
 
  Let me know if there is anything else you need to know.
 
 The full stack trace might be useful.

I could give it to you, but it contains code I can't release (at
least not yet)...  It's basically an interrupt that calls
pci_delete_child, so there isn't any more useful information there...

I'm just puzzled why uart and re don't have the same problem...

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 All that I will do, has been done, All that I have, has not.


Re: ahci panics when detaching...

2014-06-23 Thread John Baldwin
On Monday, June 23, 2014 9:44:08 am John-Mark Gurney wrote:
 So, when I try to eject an eSATA card, the machine panics...  I am able
 to successfully eject other cards, an ethernet (re) and a serial card
 (uart), and both handle the removal of their device w/o issue and
 without crashes...
 
 When I try w/ ahci, I get a panic...  The panic backtrace is:
 #8  0x80ced4e2 in calltrap () at ../../../amd64/amd64/exception.S:231
 #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
 at ../../../kern/subr_rman.c:979
 #10 0x8092b888 in resource_list_release_active (rl=0xf80006d39c08,
 bus=0xf80002cd9000, child=0xf80006b6d700, type=3)
 at ../../../kern/subr_bus.c:3419
 #11 0x8065d7a1 in pci_child_detached (dev=0xf80002cd9000,
 child=0xf80006b6d700) at ../../../dev/pci/pci.c:4133
 ---Type <return> to continue, or q <return> to quit---
 #12 0x80929708 in device_detach (dev=0xf80006b6d700)
 at bus_if.h:181
 #13 0x8065f9f7 in pci_delete_child (dev=0xf80002cd9000,
 child=0xf80006b6d700) at ../../../dev/pci/pci.c:4710
 
 In frame 9:
 (kgdb) fr 9
 #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
 at ../../../kern/subr_rman.c:979
 979 return (r->__r_i->r_rid);
 (kgdb) print r
 $1 = (struct resource *) 0xf800064c9380
 (kgdb) print/x *r
 $4 = {__r_i = 0xdeadc0dedeadc0de, r_bustag = 0xdeadc0dedeadc0de, 
   r_bushandle = 0xdeadc0dedeadc0de}
 
 So, it looks like something has corrupted the resource data...

This is the malloc junking on free.  However, I wonder if the
problem is that the resource was freed without being properly
cleared from the resource_list in the PCI ivars.  Is this with local
patches that you have?
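A minimal userland model of the failure mode suggested above. In frame #10, `type=3` is SYS_RES_MEMORY, so the stale entry would be a memory (BAR) resource. The structs and functions here (`struct fake_res`, `struct fake_rle`, `release_without_clearing()`, `release_and_clear()`, `release_active()`) are hypothetical simplifications, not the real `struct resource_list_entry` or `resource_list_release_active()` from subr_bus.c. Because reading truly freed memory is undefined behavior, `fake_free()` poisons the object in place with the junk value instead of freeing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_RES_MEMORY	3	/* mirrors SYS_RES_MEMORY */
#define JUNK		UINT32_C(0xdeadc0de)

/* Hypothetical stand-in for struct resource. */
struct fake_res {
	uint32_t rid;
	bool	 junked;	/* set by fake_free() */
};

/* Hypothetical stand-in for a resource list entry. */
struct fake_rle {
	int		 type;
	struct fake_res	*res;	/* NULL when nothing is held */
};

/* Imitate the allocator junking an object on free, without freeing it. */
static void
fake_free(struct fake_res *r)
{
	r->rid = JUNK;
	r->junked = true;
}

/* The suspected bug: the object is freed but the entry still points at it. */
static void
release_without_clearing(struct fake_rle *e)
{
	fake_free(e->res);
}

/* The fix: free the object and clear the list's pointer in one place. */
static void
release_and_clear(struct fake_rle *e)
{
	assert(e != NULL && e->res != NULL);
	fake_free(e->res);
	e->res = NULL;
}

/*
 * Detach-time cleanup: release every entry of the given type that still
 * holds a resource.  Returns how many of those had already been junked;
 * in the kernel, each one would be a dereference of freed memory.
 */
static int
release_active(struct fake_rle *list, size_t n, int type)
{
	int stale = 0;

	for (size_t i = 0; i < n; i++) {
		struct fake_rle *e = &list[i];

		if (e->type != type || e->res == NULL)
			continue;
		if (e->res->junked)
			stale++;
		else
			fake_free(e->res);
		e->res = NULL;
	}
	return (stale);
}
```

With one entry released through `release_without_clearing()` and another through `release_and_clear()`, `release_active()` reports exactly one stale entry: the one whose pointer was never cleared.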

-- 
John Baldwin


Re: ahci panics when detaching...

2014-06-23 Thread John-Mark Gurney
John Baldwin wrote this message on Mon, Jun 23, 2014 at 10:49 -0400:
 On Monday, June 23, 2014 9:44:08 am John-Mark Gurney wrote:
  So, when I try to eject an eSATA card, the machine panics...  I am able
  to successfully eject other cards, an ethernet (re) and a serial card
  (uart), and both handle the removal of their device w/o issue and
  without crashes...
  
  When I try w/ ahci, I get a panic...  The panic backtrace is:
  #8  0x80ced4e2 in calltrap () at ../../../amd64/amd64/exception.S:231
  #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
  at ../../../kern/subr_rman.c:979
  #10 0x8092b888 in resource_list_release_active (rl=0xf80006d39c08,
  bus=0xf80002cd9000, child=0xf80006b6d700, type=3)
  at ../../../kern/subr_bus.c:3419
  #11 0x8065d7a1 in pci_child_detached (dev=0xf80002cd9000,
  child=0xf80006b6d700) at ../../../dev/pci/pci.c:4133
  ---Type <return> to continue, or q <return> to quit---
  #12 0x80929708 in device_detach (dev=0xf80006b6d700)
  at bus_if.h:181
  #13 0x8065f9f7 in pci_delete_child (dev=0xf80002cd9000,
  child=0xf80006b6d700) at ../../../dev/pci/pci.c:4710
  
  In frame 9:
  (kgdb) fr 9
  #9  0x8093d037 in rman_get_rid (r=0xf800064c9380)
  at ../../../kern/subr_rman.c:979
  979 return (r->__r_i->r_rid);
  (kgdb) print r
  $1 = (struct resource *) 0xf800064c9380
  (kgdb) print/x *r
  $4 = {__r_i = 0xdeadc0dedeadc0de, r_bustag = 0xdeadc0dedeadc0de, 
r_bushandle = 0xdeadc0dedeadc0de}
  
  So, it looks like something has corrupted the resource data...
 
 This is the malloc junking on free.  However, I wonder if the
 problem is that the resource was freed without being properly
 cleared from the resource_list in the PCI ivars.  Is this with local
 patches that you have?

Yes, but I didn't patch any of the pci code, or the resource code, so
this bug is in the original code...  My patches only affect the attach
case, and don't touch the detach case...

I was hoping someone who knows the code would say, "yeah, I do remember
that place in the code where we free something but don't properly
NULL out the pointer," etc...

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 All that I will do, has been done, All that I have, has not.